IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    arXiv Paper Daily: Tue, 13 Dec 2016

    我爱机器学习(52ml.net)发表于 2016-12-13 00:00:00
    love 0

    Neural and Evolutionary Computing

    Neurogenesis Deep Learning

    Timothy J. Draelos, Nadine E. Miner, Christopher C. Lamb, Craig M. Vineyard, Kristofor D. Carlson, Conrad D. James, James B. Aimone
    Comments: Submitted to IJCNN 2017
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)

    Neural machine learning methods, such as deep neural networks (DNN), have
    achieved remarkable success in a number of complex data processing tasks. These
    methods have arguably had their strongest impact on tasks such as image and
    audio processing – data processing domains in which humans have long held clear
    advantages over conventional algorithms. In contrast to biological neural
    systems, which are capable of learning continuously, deep artificial networks
    have a limited ability for incorporating new information in an already trained
    network. As a result, methods for continuous learning are potentially highly
    impactful in enabling the application of deep networks to dynamic data sets.
    Here, inspired by the process of adult neurogenesis in the hippocampus, we
    explore the potential for adding new neurons to deep layers of artificial
    neural networks in order to facilitate their acquisition of novel information
    while preserving previously trained data representations. Our results on the
    MNIST handwritten digit dataset and the NIST SD 19 dataset, which includes
    lower and upper case letters and digits, demonstrate that neurogenesis is well
    suited for addressing the stability-plasticity dilemma that has long challenged
    adaptive machine learning algorithms.

    Empirical Evaluation of A New Approach to Simplifying Long Short-term Memory (LSTM)

    Yuzhen Lu
    Comments: 5 pages, 5 figures
    Subjects: Neural and Evolutionary Computing (cs.NE)

    The standard LSTM, although it succeeds in the modeling long-range
    dependences, suffers from a highly complex structure that can be simplified
    through modifications to its gate units. This paper was to perform an empirical
    comparison between the standard LSTM and three new simplified variants that
    were obtained by eliminating input signal, bias and hidden unit signal from
    individual gates, on the tasks of modeling two sequence datasets. The
    experiments show that the three variants, with reduced parameters, can achieve
    comparable performance with the standard LSTM. Due attention should be paid to
    turning the learning rate to achieve high accuracies

    Improved Quick Hypervolume Algorithm

    Andrzej Jaszkiewicz
    Subjects: Neural and Evolutionary Computing (cs.NE)

    In this paper, we present an improved version of recently proposed Quick
    Hypervolume algorithm for calculating exact hypervolume of the space dominated
    by a set of d-dimensional points. This value is often used as a quality
    indicator in multiobjective evolutionary algorithms and the efficiency of
    calculating this indicator is of crucial importance especially in the case of
    large set or many dimensional spaces. We use a similar divide and conquer
    scheme as in the original Quick Hypervolume algorithm, we modify, however, the
    way the problem is split into smaller sub-problems. Through both theoretical
    analysis and computational study we show that our approach improves
    computational complexity of the algorithm and running time of its
    implementation.

    Self-calibrating Neural Networks for Dimensionality Reduction

    Yuansi Chen, Cengiz Pehlevan, Dmitri B. Chklovskii
    Comments: 2016 Asilomar Conference on Signals, Systems and Computers
    Subjects: Neurons and Cognition (q-bio.NC); Neural and Evolutionary Computing (cs.NE)

    Recently, a novel family of biologically plausible online algorithms for
    reducing the dimensionality of streaming data has been derived from the
    similarity matching principle. In these algorithms, the number of output
    dimensions can be determined adaptively by thresholding the singular values of
    the input data matrix. However, setting such threshold requires knowing the
    magnitude of the desired singular values in advance. Here we propose online
    algorithms where the threshold is self-calibrating based on the singular values
    computed from the existing observations. To derive these algorithms from the
    similarity matching cost function we propose novel regularizers. As before,
    these online algorithms can be implemented by Hebbian/anti-Hebbian neural
    networks in which the learning rule depends on the chosen regularizer. We
    demonstrate both mathematically and via simulation the effectiveness of these
    online algorithms in various settings.

    Generalized Deep Image to Image Regression

    Venkataraman Santhanam, Vlad I. Morariu, Larry S. Davis
    Comments: Submitted to CVPR on November 15th, 2016. Code will be made available soon
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We present a Deep Convolutional Neural Network architecture which serves as a
    generic image-to-image regressor that can be trained end-to-end without any
    further machinery. Our proposed architecture: the Recursively Branched
    Deconvolutional Network (RBDN) develops a cheap multi-context image
    representation very early on using an efficient recursive branching scheme with
    extensive parameter sharing and learnable upsampling. This multi-context
    representation is subjected to a highly non-linear locality preserving
    transformation by the remainder of our network comprising of a series of
    convolutions/deconvolutions without any spatial downsampling. The RBDN
    architecture is fully convolutional and can handle variable sized images during
    inference. We provide qualitative/quantitative results on (3) diverse tasks:
    relighting, denoising and colorization and show that our proposed RBDN
    architecture obtains comparable results to the state-of-the-art on each of
    these tasks when used off-the-shelf without any post processing or
    task-specific architectural modifications.

    Towards deep learning with spiking neurons in energy based models with contrastive Hebbian plasticity

    Thomas Mesnard, Wulfram Gerstner, Johanni Brea
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    In machine learning, error back-propagation in multi-layer neural networks
    (deep learning) has been impressively successful in supervised and
    reinforcement learning tasks. As a model for learning in the brain, however,
    deep learning has long been regarded as implausible, since it relies in its
    basic form on a non-local plasticity rule. To overcome this problem,
    energy-based models with local contrastive Hebbian learning were proposed and
    tested on a classification task with networks of rate neurons. We extended this
    work by implementing and testing such a model with networks of leaky
    integrate-and-fire neurons. Preliminary results indicate that it is possible to
    learn a non-linear regression task with hidden layers, spiking neurons and a
    local synaptic plasticity rule.


    Computer Vision and Pattern Recognition

    Deep Supervised Hashing with Triplet Labels

    Xiaofang Wang, Yi Shi, Kris M. Kitani
    Comments: Appear in ACCV 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Hashing is one of the most popular and powerful approximate nearest neighbor
    search techniques for large-scale image retrieval. Most traditional hashing
    methods first represent images as off-the-shelf visual features and then
    produce hashing codes in a separate stage. However, off-the-shelf visual
    features may not be optimally compatible with the hash code learning procedure,
    which may result in sub-optimal hash codes. Recently, deep hashing methods have
    been proposed to simultaneously learn image features and hash codes using deep
    neural networks and have shown superior performance over traditional hashing
    methods. Most deep hashing methods are given supervised information in the form
    of pairwise labels or triplet labels. The current state-of-the-art deep hashing
    method DPSH~cite{li2015feature}, which is based on pairwise labels, performs
    image feature learning and hash code learning simultaneously by maximizing the
    likelihood of pairwise similarities. Inspired by DPSH~cite{li2015feature}, we
    propose a triplet label based deep hashing method which aims to maximize the
    likelihood of the given triplet labels. Experimental results show that our
    method outperforms all the baselines on CIFAR-10 and NUS-WIDE datasets,
    including the state-of-the-art method DPSH~cite{li2015feature} and all the
    previous triplet label based deep hashing methods.

    Inverse Compositional Spatial Transformer Networks

    Chen-Hsuan Lin, Simon Lucey
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    In this paper, we establish a theoretical connection between the classical
    Lucas & Kanade (LK) algorithm and the emerging topic of Spatial Transformer
    Networks (STNs). STNs are of interest to the vision and learning communities
    due to their natural ability to combine alignment and classification within the
    same theoretical framework. Inspired by the Inverse Compositional (IC) variant
    of the LK algorithm, we present Inverse Compositional Spatial Transformer
    Networks (IC-STNs). We demonstrate that IC-STNs can achieve better performance
    than conventional STNs with less model capacity; in particular, we show
    superior performance in pure image alignment tasks as well as joint
    alignment/classification problems on real-world problems.

    PoseAgent: Budget-Constrained 6D Object Pose Estimation via Reinforcement Learning

    Alexander Krull, Eric Brachmann, Sebastian Nowozin, Frank Michel, Jamie Shotton, Carsten Rother
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    State-of-the-art computer vision algorithms often achieve efficiency by
    making discrete choices about which hypotheses to explore next. This allows
    allocation of computational resources to promising candidates, however, such
    decisions are non-differentiable. As a result, these algorithms are hard to
    train in an end-to-end fashion. In this work we propose to learn an efficient
    algorithm for the task of 6D object pose estimation. Our system optimizes the
    parameters of an existing state-of-the art pose estimation system using
    reinforcement learning, where the pose estimation system now becomes the
    stochastic policy, parametrized by a CNN. Additionally, we present an efficient
    training algorithm that dramatically reduces computation time. We show
    empirically that our learned pose estimation procedure makes better use of
    limited resources and improves upon the state-of-the-art on challenging
    datasets. Our approach enables differentiable end-to-end training of complex
    algorithmic pipelines and learns to make optimal use of a given computational
    budget.

    Next-Flow: Hybrid Multi-Tasking with Next-Frame Prediction to Boost Optical-Flow Estimation in the Wild

    Nima Sedaghat
    Comments: 8 pages, 5 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    CNN-based optical flow estimators have attracted attentions recently, mainly
    due to their impressive speed. As successful as they’ve been on synthetic
    datasets, they are still far behind the classical methods in real-world
    scenarios, mainly due to lack of flow ground-truth. In the current work, we
    seek to boost CNN-based flow estimation in real scenes with the help of the
    freely available self-supervised task of next-frame prediction. To this end we
    train the network in a hybrid way, providing it with a mixture of synthetic and
    real videos. With the help of a novel time-variant multi-tasking architecture,
    the network decides which of the tasks to learn at each point of time,
    depending on the availability of ground-truth. Our proposed method improves
    optical flow estimation in real scenes dramatically. We also experiment with
    prediction of “next-flow” instead of estimation of the current flow, which is
    intuitively more related to the task of next-frame prediction and improve the
    results even more. We report and evaluate results both qualitatively and
    quantitatively. The latter is done by training a single-stream action
    classifier on the estimated flow fields on UCF101 & HMDB51 and demonstrate high
    improvements of accuracy over the baseline: 10.2% and 9.6% respectively. Also
    as a side product of this work, we report significant improvements over
    state-of-the-art results in the task of next-frame prediction.

    COCO-Stuff: Thing and Stuff Classes in Context

    Holger Caesar, Jasper Uijlings, Vittorio Ferrari
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Semantic classes can be either things (objects with a well-defined shape,
    e.g. car, person) or stuff (amorphous background regions, e.g. grass, sky).
    While lots of classification and detection works focus on thing classes, less
    attention has been given to stuff classes. Nonetheless, stuff classes are
    important as they allow to explain important aspects of an image, including (1)
    scene type; (2) which thing classes are likely to be present and their location
    (determined through contextual reasoning); (3) physical attributes, material
    types and geometric properties of the scene. To understand stuff and things in
    context we annotate 10,000 images of the COCO dataset with a broad range of
    stuff classes, using a specialized stuff annotation protocol allowing us to
    efficiently label each pixel. On this dataset, we analyze several aspects: (a)
    the importance of stuff and thing classes in terms of their surface cover and
    how frequently they are mentioned in image captions; (b) the importance of
    several visual criteria to discriminate stuff and thing classes; (c) we study
    the spatial relations between stuff and things, highlighting the rich
    contextual relations that make our dataset unique. Furthermore, we show
    experimentally how modern semantic segmentation methods perform on stuff and
    thing classes and answer the question whether stuff is easier to segment than
    things. We release our new dataset and the trained models online, hopefully
    promoting further research on stuff and stuff-thing contextual relations.

    Segmentation of large images based on super-pixels and community detection in graphs

    Oscar A. C. Linares, Glenda Michele Botelho, Francisco Aparecido Rodrigues, João Batista Neto
    Comments: 20 pages, 12 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Image segmentation has many applications which range from machine learning to
    medical diagnosis. In this paper, we propose a framework for the segmentation
    of images based on super-pixels and algorithms for community identification in
    graphs. The super-pixel pre-segmentation step reduces the number of nodes in
    the graph, rendering the method the ability to process large images. Moreover,
    community detection algorithms provide more accurate segmentation than
    traditional approaches, such as those based on spectral graph partition. We
    also compare our method with two algorithms: a) the graph-based approach by
    Felzenszwalb and Huttenlocher and b) the contour-based method by Arbelaez.
    Results have shown that our method provides more precise segmentation and is
    faster than both of them.

    Analysis and Optimization of Loss Functions for Multiclass, Top-k, and Multilabel Classification

    Maksim Lapin, Matthias Hein, Bernt Schiele
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Top-k error is currently a popular performance measure on large scale image
    classification benchmarks such as ImageNet and Places. Despite its wide
    acceptance, our understanding of this metric is limited as most of the previous
    research is focused on its special case, the top-1 error. In this work, we
    explore two directions that shed more light on the top-k error. First, we
    provide an in-depth analysis of established and recently proposed single-label
    multiclass methods along with a detailed account of efficient optimization
    algorithms for them. Our results indicate that the softmax loss and the smooth
    multiclass SVM are surprisingly competitive in top-k error uniformly across all
    k, which can be explained by our analysis of multiclass top-k calibration.
    Further improvements for a specific k are possible with a number of proposed
    top-k loss functions. Second, we use the top-k methods to explore the
    transition from multiclass to multilabel learning. In particular, we find that
    it is possible to obtain effective multilabel classifiers on Pascal VOC using a
    single label per image for training, while the gap between multiclass and
    multilabel methods on MS COCO is more significant. Finally, our contribution of
    efficient algorithms for training with the considered top-k and multilabel loss
    functions is of independent interest.

    A Binary Convolutional Encoder-decoder Network for Real-time Natural Scene Text Processing

    Zichuan Liu, Yixing Li, Fengbo Ren, Hao Yu
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we develop a binary convolutional encoder-decoder network
    (B-CEDNet) for natural scene text processing (NSTP). It converts a text image
    to a class-distinguished salience map that reveals the categorical, spatial and
    morphological information of characters. The existing solutions are either
    memory consuming or run-time consuming that cannot be applied to real-time
    applications on resource-constrained devices such as advanced driver assistance
    systems. The developed network can process multiple regions containing
    characters by one-off forward operation, and is trained to have binary weights
    and binary feature maps, which lead to both remarkable inference run-time
    speedup and memory usage reduction. By training with over 200, 000 synthesis
    scene text images (size of (32 imes128)), it can achieve (90\%) and (91\%)
    pixel-wise accuracy on ICDAR-03 and ICDAR-13 datasets. It only consumes (4.59
    ms) inference run-time realized on GPU with a small network size of 2.14 MB,
    which is up to (8 imes) faster and (96\%) smaller than it full-precision
    version.

    VIBIKNet: Visual Bidirectional Kernelized Network for Visual Question Answering

    Marc Bolaños, Álvaro Peris, Francisco Casacuberta, Petia Radeva
    Comments: Submitted to IbPRIA’17, 8 pages, 3 figures, 1 table
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

    In this paper, we address the problem of visual question answering by
    proposing a novel model, called VIBIKNet. Our model is based on integrating
    Kernelized Convolutional Neural Networks and Long-Short Term Memory units to
    generate an answer given a question about an image. We prove that VIBIKNet is
    an optimal trade-off between accuracy and computational load, in terms of
    memory and time consumption. We validate our method on the VQA challenge
    dataset and compare it to the top performing methods in order to illustrate its
    performance and speed.

    Statistics of Visual Responses to Object Stimuli from Primate AIT Neurons to DNN Neurons

    Qiulei Dong, Zhanyi Hu
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Cadieu et al. (Cadieu,2014) reported that deep neural networks(DNNs) could
    rival the representation of primate inferotemporal cortex for object
    recognition. Lehky et al. (Lehky,2011) provided a statistical analysis on
    neural responses to object stimuli in primate AIT cortex. They found the
    intrinsic dimensionality of object representations in AIT cortex is around 100
    (Lehky,2014). Considering the outstanding performance of DNNs in object
    recognition, it is worthwhile investigating whether the responses of DNN
    neurons have similar response statistics to those of AIT neurons. Following
    Lehky et al.’s works, we analyze the response statistics to image stimuli and
    the intrinsic dimensionality of object representations of DNN neurons. Our
    findings show in terms of kurtosis and Pareto tail index, the response
    statistics on single-neuron selectivity and population sparseness of DNN
    neurons are fundamentally different from those of IT neurons except some
    special cases. By increasing the number of neurons and stimuli, the conclusions
    could alter substantially. In addition, with the ascendancy of the
    convolutional layers of DNNs, the single-neuron selectivity and population
    sparseness of DNN neurons increase, indicating the last convolutional layer is
    to learn features for object representations, while the following
    fully-connected layers are to learn categorization features. It is also found
    that a sufficiently large number of stimuli and neurons are necessary for
    obtaining a stable dimensionality. To our knowledge, this is the first work to
    analyze the response statistics of DNN neurons comparing with AIT neurons, and
    our results provide not only some insights into the discrepancy of DNN neurons
    with respect to IT neurons in object representation, but also shed some light
    on possible outcomes of IT neurons when the number of recorded neurons and
    stimuli is beyond the level in (Lehky,2011,2014).

    Text-guided Attention Model for Image Captioning

    Jonghwan Mun, Minsu Cho, Bohyung Han
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Visual attention plays an important role to understand images and
    demonstrates its effectiveness in generating natural language descriptions of
    images. On the other hand, recent studies show that language associated with an
    image can steer visual attention in the scene during our cognitive process.
    Inspired by this, we introduce a text-guided attention model for image
    captioning, which learns to drive visual attention using associated captions.
    For this model, we propose an exemplar-based learning approach that retrieves
    from training data associated captions with each image, and use them to learn
    attention on visual features. Our attention model enables to describe a
    detailed state of scenes by distinguishing small or confusable objects
    effectively. We validate our model on MS-COCO Captioning benchmark and achieve
    the state-of-the-art performance in standard metrics.

    PIGMIL: Positive Instance Detection via Graph Updating for Multiple Instance Learning

    Dongkuan Xu, Jia Wu, Wei Zhang, Yingjie Tian
    Comments: 11 pages, 9 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Positive instance detection, especially for these in positive bags (true
    positive instances, TPIs), plays a key role for multiple instance learning
    (MIL) arising from a specific classification problem only provided with bag (a
    set of instances) label information. However, most previous MIL methods on this
    issue ignore the global similarity among positive instances and that negative
    instances are non-i.i.d., usually resulting in the detection of TPI not precise
    and sensitive to outliers. To the end, we propose a positive instance detection
    via graph updating for multiple instance learning, called PIGMIL, to detect TPI
    accurately. PIGMIL selects instances from working sets (WSs) of some working
    bags (WBs) as positive candidate pool (PCP). The global similarity among
    positive instances and the robust discrimination of instances of PCP from
    negative instances are measured to construct the consistent similarity and
    discrimination graph (CSDG). As a result, the primary goal (i.e. TPI detection)
    is transformed into PCP updating, which is approximated efficiently by updating
    CSDG with a random walk ranking algorithm and an instance updating strategy. At
    last bags are transformed into feature representation vector based on the
    identified TPIs to train a classifier. Extensive experiments demonstrate the
    high precision of PIGMIL’s detection of TPIs and its excellent performance
    compared to classic baseline MIL methods.

    Recurrent Attentional Model for No-Reference Image Quality Assessment

    Diqi Chen, Yizhou Wang, Tianfu Wu, Wen Gao
    Comments: 9 pages, 7 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents a recurrent attentional model (RAM) for general
    no-reference image quality assessment (NR-IQA), that is to predict the
    perceptual quality score for an input image without using any reference image
    and/or prior knowledge regarding underlying distortions. The proposed RAM is
    inspired by the well known visual attention mechanism, both covert and overt,
    which affects many aspects of visual perception including image quality
    assessment. The attentional mechanism is, however, largely ignored in the
    NR-IQA literature. The proposed RAM hypothesizes that the attentional scanning
    path in an image should contain intrinsic information for IQA. The RAM thus
    consists of three components: a glimpse sub-network analyzing the quality at a
    fixation using multi-scale information, a location sub-network selecting where
    to look next by sampling a stochastic node, and a recurrent network aggregating
    information along the scanning path to compute the final prediction. The RAM is
    formulated under multi-task learning for the joint prediction of distortion
    type and image quality score and for the REINFORCE
    rule~cite{williams1992simple} used to handle the stochastic node. The RAM is
    trained through back-propagation. In experiments, the RAM is tested on the
    TID2008 dataset with promising performance obtained, which shows the
    effectiveness of the proposed RAM. Furthermore, the RAM is very efficient in
    the sense that a small number of glimpses are used usually in testing.

    On Choosing Training and Testing Data for Supervised Algorithms in Ground Penetrating Radar Data for Buried Threat Detection

    Daniël Reichman, Leslie M. Collins, Jordan M. Malof
    Comments: 9 pages, 8 figures, journal paper
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Ground penetrating radar (GPR) is one of the most popular and successful
    sensing modalities that has been investigated for landmine and subsurface
    threat detection. Many of the detection algorithms applied to this task are
    supervised and therefore require labeled examples of target and non-target data
    for training. Training data most often consists of 2-dimensional images (or
    patches) of GPR data, from which features are extracted, and provided to the
    classifier during training and testing. Identifying desirable training and
    testing locations to extract patches, which we term “keypoints”, is well
    established in the literature. In contrast however, a large variety of
    strategies have been proposed regarding keypoint utilization (e.g., how many of
    the identified keypoints should be used at targets, or non-target, locations).
    Given the variety keypoint utilization strategies that are available, it is
    very unclear (i) which strategies are best, or (ii) whether the choice of
    strategy has a large impact on classifier performance. We address these
    questions by presenting a taxonomy of existing utilization strategies, and then
    evaluating their effectiveness on a large dataset using many different
    classifiers and features. We analyze the results and propose a new strategy,
    called PatchSelect, which outperforms other strategies across all experiments.

    Motion Detection Robust to Severe Illumination Changes

    Sahar Yousefi, M.T. Manzuri Shalmani
    Comments: 15 pages, 14 figures, 3 Tables, Journal
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recently, there has been a considerable attention to motion detection due to
    the explosive growth of its application in video analysis and surveillance
    systems. While good results can be achieved in the previous approaches,
    accurate motion detection remains a challenging task due to the difficulties
    raised by illumination variations, occlusion, camouflage, burst physical
    motion, dynamic texture and environmental changes such as those on climate
    changes, sunlight changes during a day, etc. In this paper, we propose a novel
    perpixel motion descriptor for both motion detection and dynamic texture
    segmentation which outperforms the literature in severe scenarios. The proposed
    descriptor is based on two complementary three-dimensional-discrete wavelet
    transform (3D-DWT) and three dimensional wavelet leader. In this approach, a
    feature vector is extracted for each pixel by applying a novel three
    dimensional wavelet based motion descriptor. Then, the extracted features are
    clustered by the well-known k-means algorithm. The experimental results
    demonstrate the effectiveness of our proposed method compared to the literature
    motion detection approaches.

    Multiple Instance Learning: A Survey of Problem Characteristics and Applications

    Marc-André Carbonneau, Veronika Cheplygina, Eric Granger, Ghyslain Gagnon
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

    Multiple instance learning (MIL) is a form of weakly supervised learning
    where training instances are arranged in sets, called bags, and a label is
    provided for the entire bag. This formulation is gaining interest because it
    naturally fits various problems and allows to leverage weakly labeled data.
    Consequently, it has been used in diverse application fields such as computer
    vision and document classification. However, learning from bags raises
    important challenges that are unique to MIL. This paper provides a
    comprehensive survey of the characteristics which define and differentiate the
    types of MIL problems. Until now, these problem characteristics have not been
    formally identified and described. As a result, the variations in performance
    of MIL algorithms from one data set to another are difficult to explain. In
    this paper, MIL problem characteristics are grouped into four broad categories:
    the composition of the bags, the types of data distribution, the ambiguity of
    instance labels, and the task to be performed. Methods specialized to address
    each category are reviewed. Then, the extent to which these characteristics
    manifest themselves in key MIL application areas are described. Finally,
    experiments are conducted to compare the performance of 16 state-of-the-art MIL
    methods on selected problem characteristics. This paper provides insight on how
    the problem characteristics affect MIL algorithms, recommendations for future
    benchmarking and promising avenues for research.

    Salient Region Detection with Convex Hull Overlap

    Yongqing Liang, Cheng Jin, Yuejie Zhang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we establish a novel bottom-up cue named Convex Hull Overlap
    (CHO), and then propose an effective approach to detect salient regions using
    the combination of the CHO cue and global contrast cue. Our scheme
    significantly differs from other earlier work in: 1) The hierarchical
    segmentation model based on Normalized Graph-Cut fits the splitting and merging
    processes in human visual perception; 2) Previous work only focuses on color
    and texture cues, while our CHO cue makes up the obvious gap between the
    spatial region covering and the region saliency. CHO is a kind of improved and
    enhanced Gestalt cue, while other popular figure-ground cues such as convexity
    and surroundedness can be regarded as the special cases of CHO. Our experiments
    on a large number of public data have obtained very positive results.

    Towards an Automated Image De-fencing Algorithm Using Sparsity

    Sankaraganesh Jonna, Krishna K. Nakka, Rajiv R. Sahay
    Comments: The paper was accepted in VISAPP-2015
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Conventional approaches to image de-fencing suffer from non-robust fence
    detection and are limited to processing images of static scenes. In this
    position paper, we propose an automatic de-fencing algorithm for images of
    dynamic scenes. We divide the problem of image de-fencing into the tasks of
    automated fence detection, motion estimation and fusion of data from multiple
    frames of a captured video of the dynamic scene. Fences are detected
    automatically using two approaches, namely, employing Gabor filter and a
    machine learning method. We cast the fence removal problem in an optimization
    framework, by modeling the formation of the degraded observations. The inverse
    problem is solved using split Bregman technique assuming total variation of the
    de-fenced image as the regularization constraint.

    Generalized Deep Image to Image Regression

    Venkataraman Santhanam, Vlad I. Morariu, Larry S. Davis
    Comments: Submitted to CVPR on November 15th, 2016. Code will be made available soon
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We present a Deep Convolutional Neural Network architecture which serves as a
    generic image-to-image regressor that can be trained end-to-end without any
    further machinery. Our proposed architecture: the Recursively Branched
    Deconvolutional Network (RBDN) develops a cheap multi-context image
    representation very early on using an efficient recursive branching scheme with
    extensive parameter sharing and learnable upsampling. This multi-context
    representation is subjected to a highly non-linear locality preserving
    transformation by the remainder of our network comprising of a series of
    convolutions/deconvolutions without any spatial downsampling. The RBDN
    architecture is fully convolutional and can handle variable sized images during
    inference. We provide qualitative/quantitative results on (3) diverse tasks:
    relighting, denoising and colorization and show that our proposed RBDN
    architecture obtains comparable results to the state-of-the-art on each of
    these tasks when used off-the-shelf without any post processing or
    task-specific architectural modifications.

    StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks

    Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaolei Huang, Xiaogang Wang, Dimitris Metaxas
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Synthesizing photo-realistic images from text descriptions is a challenging
    problem in computer vision and has many practical applications. Samples
    generated by existing text-to-image approaches can roughly reflect the meaning
    of the given descriptions, but they fail to contain necessary details and vivid
    object parts. In this paper, we propose stacked Generative Adversarial Networks
    (StackGAN) to generate photo-realistic images conditioned on text descriptions.
    The Stage-I GAN sketches the primitive shape and basic colors of the object
    based on the given text description, yielding Stage-I low resolution images.
    The Stage-II GAN takes Stage-I results and text descriptions as inputs, and
    generates high resolution images with photo-realistic details. The Stage-II GAN
    is able to rectify defects and add compelling details with the refinement
    process. Samples generated by StackGAN are more plausible than those generated
    by existing approaches. Importantly, our StackGAN for the first time generates
    realistic 256 x 256 images conditioned on only text descriptions, while
    state-of-the-art methods can generate at most 128 x 128 images. To demonstrate
    the effectiveness of the proposed StackGAN, extensive experiments are conducted
    on CUB and Oxford-102 datasets, which contain enough object appearance
    variations and are widely-used for text-to-image generation analysis.

    Co-localization with Category-Consistent CNN Features and Geodesic Distance Co-Propagation

    Hieu Le, Chen-Ping Yu, Gregory Zelinsky, Dimitris Samaras
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Co-localization is the problem of localizing categorical objects using only
    positive sets of example images, without any form of further supervision. This
    is a challenging task as there is no pixel-level annotations. Motivated by
    human visual learning, we find the common features of an object category from
    convolutional kernels of a pre-trained convolutional neural network (CNN). We
    call these category-consistent CNN features. Then, we co-propagate their
    activated spatial regions using superpixel geodesic distances for localization.
    In our first set of experiments, we show that the proposed method achieves
    state-of-the-art performance on three related benchmarks: PASCAL 2007,
    PASCAL-2012, and the Object Discovery dataset. We also show that our method is
    able to detect and localize truly unseen categories, using six held-out ImagNet
    subset of categories with state-of-the-art accuracies. Our intuitive approach
    achieves this success without any region proposals or object detectors, and can
    be based on a CNN that was pre-trained purely on image classification tasks
    without further fine-tuning.

    Automatic Lymphocyte Detection in H&E Images with Deep Neural Networks

    Jianxu Chen, Chukka Srinivas
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Automatic detection of lymphocyte in H&E images is a necessary first step in
    lots of tissue image analysis algorithms. An accurate and robust automated
    lymphocyte detection approach is of great importance in both computer science
    and clinical studies. Most of the existing approaches for lymphocyte detection
    are based on traditional image processing algorithms and/or classic machine
    learning methods. In the recent years, deep learning techniques have
    fundamentally transformed the way that a computer interprets images and have
    become a matchless solution in various pattern recognition problems. In this
    work, we design a new deep neural network model which extends the fully
    convolutional network by combining the ideas in several recent techniques, such
    as shortcut links. Also, we design a new training scheme taking the prior
    knowledge about lymphocytes into consideration. The training scheme not only
    efficiently exploits the limited amount of free-form annotations from
    pathologists, but also naturally supports efficient fine-tuning. As a
    consequence, our model has the potential of self-improvement by leveraging the
    errors collected during real applications. Our experiments show that our deep
    neural network model achieves good performance in the images of different
    staining conditions or different types of tissues.

    Generalizable Features From Unsupervised Learning

    Mehdi Mirza, Aaron Courville, Yoshua Bengio
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Humans learn a predictive model of the world and use this model to reason
    about future events and the consequences of actions. In contrast to most
    machine predictors, we exhibit an impressive ability to generalize to unseen
    scenarios and reason intelligently in these settings. One important aspect of
    this ability is physical intuition(Lake et al., 2016). In this work, we explore
    the potential of unsupervised learning to find features that promote better
    generalization to settings outside the supervised training distribution. Our
    task is predicting the stability of towers of square blocks. We demonstrate
    that an unsupervised model, trained to predict future frames of a video
    sequence of stable and unstable block configurations, can yield features that
    support extrapolating stability prediction to blocks configurations outside the
    training set distribution

    A probabilistic graphical model approach in 30 m land cover mapping with multiple data sources

    Jie Wang, Luyan Ji, Xiaomeng Huang, Haohuan Fu, Shiming Xu, Congcong Li
    Comments: 38 pages, 5 figures
    Subjects: Applications (stat.AP); Computer Vision and Pattern Recognition (cs.CV)

    There is a trend to acquire high accuracy land-cover maps using multi-source
    classification methods, most of which are based on data fusion, especially
    pixel- or feature-level fusions. A probabilistic graphical model (PGM) approach
    is proposed in this research for 30 m resolution land-cover mapping with
    multi-temporal Landsat and MODerate Resolution Imaging Spectroradiometer
    (MODIS) data. Independent classifiers were applied to two single-date Landsat 8
    scenes and the MODIS time-series data, respectively, for probability
    estimation. A PGM was created for each pixel in Landsat 8 data. Conditional
    probability distributions were computed based on data quality and reliability
    by using information selectively. Using the administrative territory of Beijing
    City (Area-1) and a coastal region of Shandong province, China (Area-2) as
    study areas, multiple land-cover maps were generated for comparison.
    Quantitative results show the effectiveness of the proposed method. Overall
    accuracies promoted from 74.0% (maps acquired from single-temporal Landsat
    images) to 81.8% (output of the PGM) for Area-1. Improvements can also be seen
    when using MODIS data and only a single-temporal Landsat image as input
    (overall accuracy: 78.4% versus 74.0% for Area-1, and 86.8% versus 83.0% for
    Area-2). Information from MODIS data did not help much when the PGM was applied
    to cloud free regions of. One of the advantages of the proposed method is that
    it can be applied where multi-temporal data cannot be simply stacked as a
    multi-layered image.


    Artificial Intelligence

    Knowledge Completion for Generics using Guided Tensor Factorization

    Hanie Sedghi, Ashish Sabharwal
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    We consider knowledge base (KB) completion for KBs rich in facts about common
    nouns (generics), such as “trees produce oxygen” or “dogs have tails”. While KB
    completion has received much attention for named entity KBs, little emphasis
    has been placed on generics despite their importance for capturing general
    knowledge. Compared with named entity KBs such as Freebase, KBs about generics
    have more complex underlying regularities, are substantially more incomplete,
    and violate the commonly used locally closed world assumption (LCWA).
    Consequently, existing completion methods struggle with this new task, and the
    commonly used evaluation metrics become less meaningful. To address these
    challenges, we make three contributions: (a) a tensor factorization approach
    that achieves state-of-the-art results by incorporating external knowledge
    about relation schema and entity taxonomy, (b) taxonomy guided submodular
    active learning to efficiently collect additional annotations to address KB
    incompleteness, and (c) a more appropriate metric (yield at high precision)
    along with a constant-factor deterministic approximation algorithm to compute
    it cost-effectively with only a logarithmic number of human annotations. An
    empirical evaluation shows that our techniques achieve state-of-the-art results
    on two novel generics KBs about elementary level science.

    DeepMind Lab

    Charles Beattie, Joel Z. Leibo, Denis Teplyashin, Tom Ward, Marcus Wainwright, Heinrich Küttler, Andrew Lefrancq, Simon Green, Víctor Valdés, Amir Sadik, Julian Schrittwieser, Keith Anderson, Sarah York, Max Cant, Adam Cain, Adrian Bolton, Stephen Gaffney, Helen King, Demis Hassabis, Shane Legg, Stig Petersen
    Comments: 11 pages, 8 figures
    Subjects: Artificial Intelligence (cs.AI)

    DeepMind Lab is a first-person 3D game platform designed for research and
    development of general artificial intelligence and machine learning systems.
    DeepMind Lab can be used to study how autonomous artificial agents may learn
    complex tasks in large, partially observed, and visually diverse worlds.
    DeepMind Lab has a simple and flexible API enabling creative task-designs and
    novel AI-designs to be explored and quickly iterated upon. It is powered by a
    fast and widely recognised game engine, and tailored for effective use by the
    research community.

    Online Reinforcement Learning for Real-Time Exploration in Continuous State and Action Markov Decision Processes

    Ludovic Hofer, Hugo Gimbert
    Journal-ref: ICAPS 26th, PlanRob 4th (Workshop) (2016) 37-48
    Subjects: Artificial Intelligence (cs.AI)

    This paper presents a new method to learn online policies in continuous
    state, continuous action, model-free Markov decision processes, with two
    properties that are crucial for practical applications. First, the policies are
    implementable with a very low computational cost: once the policy is computed,
    the action corresponding to a given state is obtained in logarithmic time with
    respect to the number of samples used. Second, our method is versatile: it does
    not rely on any a priori knowledge of the structure of optimal policies. We
    build upon the Fitted Q-iteration algorithm which represents the (Q)-value as
    the average of several regression trees. Our algorithm, the Fitted Policy
    Forest algorithm (FPF), computes a regression forest representing the Q-value
    and transforms it into a single tree representing the policy, while keeping
    control on the size of the policy using resampling and leaf merging. We
    introduce an adaptation of Multi-Resolution Exploration (MRE) which is
    particularly suited to FPF. We assess the performance of FPF on three classical
    benchmarks for reinforcement learning: the “Inverted Pendulum”, the “Double
    Integrator” and “Car on the Hill” and show that FPF equals or outperforms other
    algorithms, although these algorithms rely on the use of particular
    representations of the policies, especially chosen in order to fit each of the
    three problems. Finally, we exhibit that the combination of FPF and MRE allows
    to find nearly optimal solutions in problems where (epsilon)-greedy approaches
    would fail.

    Learning to Drive using Inverse Reinforcement Learning and Deep Q-Networks

    Sahand Sharifzadeh, Ioannis Chiotellis, Rudolph Triebel, Daniel Cremers
    Comments: NIPS workshop on Deep Learning for Action and Interaction, 2016
    Subjects: Artificial Intelligence (cs.AI)

    We propose an inverse reinforcement learning (IRL) approach using Deep
    Q-Networks to extract the rewards in problems with large state spaces. We
    evaluate the performance of this approach in a simulation-based autonomous
    driving scenario. Our results resemble the intuitive relation between the
    reward function and readings of distance sensors mounted at different poses on
    the car. We also show that, after a few learning rounds, our simulated agent
    generates collision-free motions and performs human-like lane change behaviour.

    Flu Detector: Estimating influenza-like illness rates from online user-generated content

    Vasileios Lampos
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    We provide a brief technical description of an online platform for disease
    monitoring, titled as the Flu Detector (fludetector.cs.ucl.ac.uk). Flu
    Detector, in its current version (v.0.5), uses either Twitter or Google search
    data in conjunction with statistical Natural Language Processing models to
    estimate the rate of influenza-like illness in the population of England. Its
    back-end is a live service that collects online data, utilises modern
    technologies for large-scale text processing, and finally applies statistical
    inference models that are trained offline. The front-end visualises the various
    disease rate estimates. Notably, the models based on Google data achieve a high
    level of accuracy with respect to the most recent four flu seasons in England
    (2012/13 to 2015/16). This highlighted Flu Detector as having a great potential
    of becoming a complementary source to the domestic traditional flu surveillance
    schemes.

    Reinforcement Learning With Temporal Logic Rewards

    Xiao Li, Cristian-Ioan Vasile, Calin Belta
    Subjects: Artificial Intelligence (cs.AI)

    The reward function plays a critical role in rein- forcement learning (RL).
    It is a place where designers specify the desired behavior and impose important
    constraints for the system. While most reward functions used in the current RL
    literature have been quite simple for relatively simple tasks, real world
    applications typically involve tasks that are logically more complex. In this
    paper we take advantage of the expressive power of temporal logic (TL) to
    express complex rules the system should follow, or incorporate domain knowledge
    into learning. We propose a TL that we argue is suitable for robotic task
    specification in RL. We also present the concept of one-step robustness, and
    show that the problem of sparse reward that occurs when using TL specification
    as the reward function can be alleviated while leaving the resultant policy
    invariant. Simulated manipulation tasks are used to illustrate RL with the
    proposed TL and the one-step robustness.

    Technical Report: A Generalized Matching Pursuit Approach for Graph-Structured Sparsity

    Feng Chen, Baojian Zhou
    Comments: International Joint Conference on Artificial Intelligence, 2016
    Subjects: Artificial Intelligence (cs.AI)

    Sparsity-constrained optimization is an important and challenging problem
    that has wide applicability in data mining, machine learning, and statistics.
    In this paper, we focus on sparsity-constrained optimization in cases where the
    cost function is a general nonlinear function and, in particular, the sparsity
    constraint is defined by a graph-structured sparsity model. Existing methods
    explore this problem in the context of sparse estimation in linear models. To
    the best of our knowledge, this is the first work to present an efficient
    approximation algorithm, namely, Graph-structured Matching Pursuit (Graph-Mp),
    to optimize a general nonlinear function subject to graph-structured
    constraints. We prove that our algorithm enjoys the strong guarantees analogous
    to those designed for linear models in terms of convergence rate and
    approximation accuracy. As a case study, we specialize Graph-Mp to optimize a
    number of well-known graph scan statistic models for the connected subgraph
    detection task, and empirical evidence demonstrates that our general algorithm
    performs superior over state-of-the-art methods that are designed specifically
    for the task of connected subgraph detection.

    FOCA: A Methodology for Ontology Evaluation

    Judson Bandeira, Ig Ibert Bittencourt, Patricia Espinheira, Seiji Isotani
    Subjects: Artificial Intelligence (cs.AI)

    Modeling an ontology is a hard and time-consuming task. Although
    methodologies are useful for ontologists to create good ontologies, they do not
    help with the task of evaluating the quality of the ontology to be reused. For
    these reasons, it is imperative to evaluate the quality of the ontology after
    constructing it or before reusing it. Few studies usually present only a set of
    criteria and questions, but no guidelines to evaluate the ontology. The effort
    to evaluate an ontology is very high as there is a huge dependence on the
    evaluator’s expertise to understand the criteria and questions in depth.
    Moreover, the evaluation is still very subjective. This study presents a novel
    methodology for ontology evaluation, taking into account three fundamental
    principles: i) it is based on the Goal, Question, Metric approach for empirical
    evaluation; ii) the goals of the methodologies are based on the roles of
    knowledge representations combined with specific evaluation criteria; iii) each
    ontology is evaluated according to the type of ontology. The methodology was
    empirically evaluated using different ontologists and ontologies of the same
    domain. The main contributions of this study are: i) defining a step-by-step
    approach to evaluate the quality of an ontology; ii) proposing an evaluation
    based on the roles of knowledge representations; iii) the explicit difference
    of the evaluation according to the type of the ontology iii) a questionnaire to
    evaluate the ontologies; iv) a statistical model that automatically calculates
    the quality of the ontologies.

    Knowledge Elicitation via Sequential Probabilistic Inference for High-Dimensional Prediction

    Pedram Daee, Tomi Peltola, Marta Soare, Samuel Kaski
    Comments: 22 pages, 9 figures, all codes and data available at this https URL
    Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Learning (cs.LG); Machine Learning (stat.ML)

    Prediction in a small-sized sample with a large number of covariates, the
    “small n, large p” problem, is challenging. This setting is encountered in
    multiple applications, such as precision medicine, where obtaining additional
    samples can be extremely costly or even impossible, and extensive research
    effort has recently been dedicated to finding principled solutions for accurate
    prediction. However, a valuable source of additional information, domain
    experts, has not yet been efficiently exploited. We formulate knowledge
    elicitation generally as a probabilistic inference process, where expert
    knowledge is sequentially queried to improve predictions. In the specific case
    of sparse linear regression, where we assume the expert has knowledge about the
    values of the regression coefficients or about the relevance of the features,
    we propose an algorithm and computational approximation for fast and efficient
    interaction, which sequentially identifies the most informative features on
    which to query expert knowledge. Evaluations of our method in experiments with
    simulated and real users show improved prediction accuracy already with a small
    effort from the expert.

    DeepCancer: Detecting Cancer through Gene Expressions via Deep Generative Learnin

    Rajendra Rana Bhat, Vivek Viswanath, Xiaolin Li
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Genomics (q-bio.GN)

    Transcriptional profiling on microarrays to obtain gene expressions has been
    used to facilitate cancer diagnosis. We propose a deep generative machine
    learning architecture (called DeepCancer) that learn features from unlabeled
    microarray data. These models have been used in conjunction with conventional
    classifiers that perform classification of the tissue samples as either being
    cancerous or non-cancerous. The proposed model has been tested on two different
    clinical datasets. The evaluation demonstrates that DeepCancer model achieves a
    very high precision score, while significantly controlling the false positive
    and false negative scores.

    A Unit Selection Methodology for Music Generation Using Deep Neural Networks

    Mason Bretan, Gil Weinberg, Larry Heck
    Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)

    Several methods exist for a computer to generate music based on data
    including Markov chains, recurrent neural networks, recombinancy, and grammars.
    We explore the use of unit selection and concatenation as a means of generating
    music using a procedure based on ranking, where, we consider a unit to be a
    variable length number of measures of music. We first examine whether a unit
    selection method, that is restricted to a finite size unit library, can be
    sufficient for encompassing a wide spectrum of music. We do this by developing
    a deep autoencoder that encodes a musical input and reconstructs the input by
    selecting from the library. We then describe a generative model that combines a
    deep structured semantic model (DSSM) with an LSTM to predict the next unit,
    where units consist of four, two, and one measures of music. We evaluate the
    generative model using objective metrics including mean rank and accuracy and
    with a subjective listening test in which expert musicians are asked to
    complete a forced-choiced ranking task. We compare our model to a note-level
    generative baseline that consists of a stacked LSTM trained to predict forward
    by one note.

    Segmentation of large images based on super-pixels and community detection in graphs

    Oscar A. C. Linares, Glenda Michele Botelho, Francisco Aparecido Rodrigues, João Batista Neto
    Comments: 20 pages, 12 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Image segmentation has many applications which range from machine learning to
    medical diagnosis. In this paper, we propose a framework for the segmentation
    of images based on super-pixels and algorithms for community identification in
    graphs. The super-pixel pre-segmentation step reduces the number of nodes in
    the graph, rendering the method the ability to process large images. Moreover,
    community detection algorithms provide more accurate segmentation than
    traditional approaches, such as those based on spectral graph partition. We
    also compare our method with two algorithms: a) the graph-based approach by
    Felzenszwalb and Huttenlocher and b) the contour-based method by Arbelaez.
    Results have shown that our method provides more precise segmentation and is
    faster than both of them.

    Reading Comprehension using Entity-based Memory Network

    Xun Wang, Katsuhito Sudoh, Masaaki Nagata, Tomohide Shibata, Kawahara Daisuke, Kurohashi Sadao
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    This paper introduces a novel neural network model for question answering,
    the emph{entity-based memory network}. It enhances neural networks’ ability of
    representing and calculating information over a long period by keeping records
    of entities contained in text. The core component is a memory pool which
    comprises entities’ states. These entities’ states are continuously updated
    according to the input text. Questions with regard to the input text are used
    to search the memory pool for related entities and answers are further
    predicted based on the states of retrieved entities. Compared with previous
    memory network models, the proposed model is capable of handling fine-grained
    information and more sophisticated relations based on entities. We formulated
    several different tasks as question answering problems and tested the proposed
    model. Experiments reported satisfying results.

    A Model of Multi-Agent Consensus for Vague and Uncertain Beliefs

    Michael Crosscombe, Jonathan Lawry
    Journal-ref: Adaptive Behavior. 24 (4). pp 249-260. 2016
    Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI)

    Consensus formation is investigated for multi-agent systems in which agents’
    beliefs are both vague and uncertain. Vagueness is represented by a third truth
    state meaning emph{borderline}. This is combined with a probabilistic model of
    uncertainty. A belief combination operator is then proposed which exploits
    borderline truth values to enable agents with conflicting beliefs to reach a
    compromise. A number of simulation experiments are carried out in which agents
    apply this operator in pairwise interactions, under the bounded confidence
    restriction that the two agents’ beliefs must be sufficiently consistent with
    each other before agreement can be reached. As well as studying the consensus
    operator in isolation we also investigate scenarios in which agents are
    influenced either directly or indirectly by the state of the world. For the
    former we conduct simulations which combine consensus formation with belief
    updating based on evidence. For the latter we investigate the effect of
    assuming that the closer an agent’s beliefs are to the truth the more visible
    they are in the consensus building process. In all cases applying the consensus
    operators results in the population converging to a single shared belief which
    is both crisp and certain. Furthermore, simulations which combine consensus
    formation with evidential updating converge faster to a shared opinion which is
    closer to the actual state of the world than those in which beliefs are only
    changed as a result of directly receiving new evidence. Finally, if agent
    interactions are guided by belief quality measured as similarity to the true
    state of the world, then applying the consensus operator alone results in the
    population converging to a high quality shared belief.

    Multiple Instance Learning: A Survey of Problem Characteristics and Applications

    Marc-André Carbonneau, Veronika Cheplygina, Eric Granger, Ghyslain Gagnon
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

    Multiple instance learning (MIL) is a form of weakly supervised learning
    where training instances are arranged in sets, called bags, and a label is
    provided for the entire bag. This formulation is gaining interest because it
    naturally fits various problems and allows to leverage weakly labeled data.
    Consequently, it has been used in diverse application fields such as computer
    vision and document classification. However, learning from bags raises
    important challenges that are unique to MIL. This paper provides a
    comprehensive survey of the characteristics which define and differentiate the
    types of MIL problems. Until now, these problem characteristics have not been
    formally identified and described. As a result, the variations in performance
    of MIL algorithms from one data set to another are difficult to explain. In
    this paper, MIL problem characteristics are grouped into four broad categories:
    the composition of the bags, the types of data distribution, the ambiguity of
    instance labels, and the task to be performed. Methods specialized to address
    each category are reviewed. Then, the extent to which these characteristics
    manifest themselves in key MIL application areas are described. Finally,
    experiments are conducted to compare the performance of 16 state-of-the-art MIL
    methods on selected problem characteristics. This paper provides insight on how
    the problem characteristics affect MIL algorithms, recommendations for future
    benchmarking and promising avenues for research.

    StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks

    Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaolei Huang, Xiaogang Wang, Dimitris Metaxas
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Synthesizing photo-realistic images from text descriptions is a challenging
    problem in computer vision and has many practical applications. Samples
    generated by existing text-to-image approaches can roughly reflect the meaning
    of the given descriptions, but they fail to contain necessary details and vivid
    object parts. In this paper, we propose stacked Generative Adversarial Networks
    (StackGAN) to generate photo-realistic images conditioned on text descriptions.
    The Stage-I GAN sketches the primitive shape and basic colors of the object
    based on the given text description, yielding Stage-I low resolution images.
    The Stage-II GAN takes Stage-I results and text descriptions as inputs, and
    generates high resolution images with photo-realistic details. The Stage-II GAN
    is able to rectify defects and add compelling details with the refinement
    process. Samples generated by StackGAN are more plausible than those generated
    by existing approaches. Importantly, our StackGAN for the first time generates
    realistic 256 x 256 images conditioned on only text descriptions, while
    state-of-the-art methods can generate at most 128 x 128 images. To demonstrate
    the effectiveness of the proposed StackGAN, extensive experiments are conducted
    on CUB and Oxford-102 datasets, which contain enough object appearance
    variations and are widely-used for text-to-image generation analysis.

    How to Read Less: Better Machine Assisted Reading Methods for Systematic Literature Reviews

    Zhe Yu, Nicholas A. Kraft, Tim Menzies
    Comments: 16 pages, 13 figures, submitted to Information and Software Technology
    Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

    Context: Systematic literature reviews (SLRs) are the primary method for
    aggregating and synthesizing evidence in evidence-based software engineering.
    Primary study selection is a critical and time-consuming SLR step in which
    reviewers use titles, abstracts, or even full texts to evaluate thousands of
    studies to find the dozens of them that are relevant to the research questions.
    Objective: We seek to reduce the effort of primary study selection in SLRs with
    machine assisted reading techniques. Method: In this paper we explore and
    refactor the state-of-the-art machine assisted reading techniques from both
    evidence-based medicine and legal electronic discovery to support SLRs. By
    refactoring those methods, we discovered FASTREAD, which is a new
    state-of-the-art in machine assisted primary studies for SLRs. Tested on two
    data sets generated from existing SLRs of Hall, Wahono, et al., FASTREAD
    outperforms the current state-of-the-art methods. Results: Using FASTREAD, it
    is possible to find 90% of the studies found by standard manual methods, but
    after only reading less than 10% of the candidate studies. Conclusions: With
    the help of FASTREAD, conducting an SLR is much more efficient and less
    difficult. Software Engineering researchers now have no excuse: they should
    conduct SLRs.


    Information Retrieval

    Search Personalization with Embeddings

    Thanh Vu, Dat Quoc Nguyen, Mark Johnson, Dawei Song, Alistair Willis
    Comments: In Proceedings of the 39th European Conference on Information Retrieval, ECIR 2017, to appear
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    Recent research has shown that the performance of search personalization
    depends on the richness of user profiles which normally represent the user’s
    topical interests. In this paper, we propose a new embedding approach to
    learning user profiles, where users are embedded on a topical interest space.
    We then directly utilize the user profiles for search personalization.
    Experiments on query logs from a major commercial web search engine demonstrate
    that our embedding approach improves the performance of the search engine and
    also achieves better search performance than other strong baselines.

    Label Visualization and Exploration in IR

    Omar Alonso
    Subjects: Information Retrieval (cs.IR)

    There is a renaissance in visual analytics systems for data analysis and
    sharing, in particular, in the current wave of big data applications. We
    introduce RAVE, a prototype that automates the generation of an interface that
    uses facets and visualization techniques for exploring and analyzing relevance
    assessments data sets collected via crowdsourcing. We present a technical
    description of the main components and demonstrate its use.

    Data Curation APIs

    Seyed-Mehdi-Reza Beheshti, Alireza Tabebordbar, Boualem Benatallah, Reza Nouri
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    Understanding and analyzing big data is firmly recognized as a powerful and
    strategic priority. For deeper interpretation of and better intelligence with
    big data, it is important to transform raw data (unstructured, semi-structured
    and structured data sources, e.g., text, video, image data sets) into curated
    data: contextualized data and knowledge that is maintained and made available
    for use by end-users and applications. In particular, data curation acts as the
    glue between raw data and analytics, providing an abstraction layer that
    relieves users from time consuming, tedious and error prone curation tasks. In
    this context, the data curation process becomes a vital analytics asset for
    increasing added value and insights.

    In this paper, we identify and implement a set of curation APIs and make them
    available (on GitHub) to researchers and developers to assist them transforming
    their raw data into curated data. The curation APIs enable developers to easily
    add features – such as extracting keyword, part of speech, and named entities
    such as Persons, Locations, Organizations, Companies, Products, Diseases,
    Drugs, etc.; providing synonyms and stems for extracted information items
    leveraging lexical knowledge bases for the English language such as WordNet;
    linking extracted entities to external knowledge bases such as Google Knowledge
    Graph and Wikidata; discovering similarity among the extracted information
    items, such as calculating similarity between string, number, date and time
    data; classifying, sorting and categorizing data into various types, forms or
    any other distinct class; and indexing structured and unstructured data – into
    their applications.

    A natural language interface to a graph-based bibliographic information retrieval system

    Yongjun Zhu, Erjia Yan, Il-Yeol Song
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    With the ever-increasing scientific literature, there is a need on a natural
    language interface to bibliographic information retrieval systems to retrieve
    related information effectively. In this paper, we propose a natural language
    interface, NLI-GIBIR, to a graph-based bibliographic information retrieval
    system. In designing NLI-GIBIR, we developed a novel framework that can be
    applicable to graph-based bibliographic information retrieval systems. Our
    framework integrates algorithms/heuristics for interpreting and analyzing
    natural language bibliographic queries. NLI-GIBIR allows users to search for a
    variety of bibliographic data through natural language. A series of text- and
    linguistic-based techniques are used to analyze and answer natural language
    queries, including tokenization, named entity recognition, and syntactic
    analysis. We find that our framework can effectively represents and addresses
    complex bibliographic information needs. Thus, the contributions of this paper
    are as follows: First, to our knowledge, it is the first attempt to propose a
    natural language interface to graph-based bibliographic information retrieval.
    Second, we propose a novel customized natural language processing framework
    that integrates a few original algorithms/heuristics for interpreting and
    analyzing natural language bibliographic queries. Third, we show that the
    proposed framework and natural language interface provide a practical solution
    in building real-world natural language interface-based bibliographic
    information retrieval systems. Our experimental results show that the presented
    system can correctly answer 39 out of 40 example natural language queries with
    varying lengths and complexities.

    Fun Facts: Automatic Trivia Fact Extraction from Wikipedia

    David Tsurel, Dan Pelleg, Ido Guy, Dafna Shahaf
    Comments: To appear in Proceedings of tenth ACM International Conference on Web Search and Data Mining, WSDM 2017
    Subjects: Social and Information Networks (cs.SI); Databases (cs.DB); Information Retrieval (cs.IR)

    A significant portion of web search queries directly refers to named
    entities. Search engines explore various ways to improve the user experience
    for such queries. We suggest augmenting search results with {em trivia facts}
    about the searched entity. Trivia is widely played throughout the world, and
    was shown to increase users’ engagement and retention.

    Most random facts are not suitable for the trivia section. There is skill
    (and art) to curating good trivia. In this paper, we formalize a notion of
    emph{trivia-worthiness} and propose an algorithm that automatically mines
    trivia facts from Wikipedia. We take advantage of Wikipedia’s category
    structure, and rank an entity’s categories by their trivia-quality.

    Our algorithm is capable of finding interesting facts, such as Obama’s Grammy
    or Elvis’ stint as a tank gunner. In user studies, our algorithm captures the
    intuitive notion of “good trivia” 45\% higher than prior work. Search-page
    tests show a 22\% decrease in bounce rates and a 12\% increase in dwell time,
    proving our facts hold users’ attention.

    Connection Discovery using Shared Images by Gaussian Relational Topic Model

    Xiaopeng Li, Ming Cheung, James She
    Comments: IEEE International Conference on Big Data 2016
    Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR)

    Social graphs, representing online friendships among users, are one of the
    fundamental types of data for many applications, such as recommendation,
    virality prediction and marketing in social media. However, this data may be
    unavailable due to the privacy concerns of users, or kept private by social
    network operators, which makes such applications difficult. Inferring user
    interests and discovering user connections through their shared multimedia
    content has attracted more and more attention in recent years. This paper
    proposes a Gaussian relational topic model for connection discovery using user
    shared images in social media. The proposed model not only models user
    interests as latent variables through their shared images, but also considers
    the connections between users as a result of their shared images. It explicitly
    relates user shared images to user connections in a hierarchical, systematic
    and supervisory way and provides an end-to-end solution for the problem. This
    paper also derives efficient variational inference and learning algorithms for
    the posterior of the latent variables and model parameters. It is demonstrated
    through experiments with over 200k images from Flickr that the proposed method
    significantly outperforms the methods in previous works.

    Multiple Instance Learning: A Survey of Problem Characteristics and Applications

    Marc-André Carbonneau, Veronika Cheplygina, Eric Granger, Ghyslain Gagnon
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

    Multiple instance learning (MIL) is a form of weakly supervised learning
    where training instances are arranged in sets, called bags, and a label is
    provided for the entire bag. This formulation is gaining interest because it
    naturally fits various problems and allows to leverage weakly labeled data.
    Consequently, it has been used in diverse application fields such as computer
    vision and document classification. However, learning from bags raises
    important challenges that are unique to MIL. This paper provides a
    comprehensive survey of the characteristics which define and differentiate the
    types of MIL problems. Until now, these problem characteristics have not been
    formally identified and described. As a result, the variations in performance
    of MIL algorithms from one data set to another are difficult to explain. In
    this paper, MIL problem characteristics are grouped into four broad categories:
    the composition of the bags, the types of data distribution, the ambiguity of
    instance labels, and the task to be performed. Methods specialized to address
    each category are reviewed. Then, the extent to which these characteristics
    manifest themselves in key MIL application areas are described. Finally,
    experiments are conducted to compare the performance of 16 state-of-the-art MIL
    methods on selected problem characteristics. This paper provides insight on how
    the problem characteristics affect MIL algorithms, recommendations for future
    benchmarking and promising avenues for research.


    Computation and Language

    Neural Machine Translation by Minimising the Bayes-risk with Respect to Syntactic Translation Lattices

    Felix Stahlberg, Adrià de Gispert, Eva Hasler, Bill Byrne
    Comments: Submitted as EACL2017 short paper
    Subjects: Computation and Language (cs.CL)

    We present a novel scheme to combine neural machine translation (NMT) with
    traditional statistical machine translation (SMT). Our approach borrows ideas
    from linearised lattice minimum Bayes-risk decoding for SMT. The NMT score is
    combined with the Bayes-risk of the translation according the SMT lattice. This
    makes our approach much more flexible than (n)-best list or lattice rescoring
    as the neural decoder is not restricted to the SMT search space. We show an
    efficient and simple way to integrate risk estimation into the NMT decoder. We
    test our method on English-German and Japanese-English and report significant
    gains over lattice rescoring on several data sets for both single and ensembled
    NMT.

    Context-aware Sentiment Word Identification: sentiword2vec

    Yushi Yao, Guangjian Li
    Comments: 15 pages
    Subjects: Computation and Language (cs.CL)

    Traditional sentiment analysis often uses sentiment dictionary to extract
    sentiment information in text and classify documents. However, emerging
    informal words and phrases in user generated content call for analysis aware to
    the context. Usually, they have special meanings in a particular context.
    Because of its great performance in representing inter-word relation, we use
    sentiment word vectors to identify the special words. Based on the distributed
    language model word2vec, in this paper we represent a novel method about
    sentiment representation of word under particular context, to be detailed, to
    identify the words with abnormal sentiment polarity in long answers. Result
    shows the improved model shows better performance in representing the words
    with special meaning, while keep doing well in representing special idiomatic
    pattern. Finally, we will discuss the meaning of vectors representing in the
    field of sentiment, which may be different from general object-based
    conditions.

    From narrative descriptions to MedDRA: automagically encoding adverse drug reactions

    Carlo Combi, Margherita Zorzi, Gabriele Pozzani, Ugo Moretti
    Comments: arXiv admin note: substantial text overlap with arXiv:1506.08052
    Subjects: Computation and Language (cs.CL)

    The collection of narrative spontaneous reports is an irreplaceable source
    for the prompt detection of suspected adverse drug reactions (ADRs): qualified
    domain experts manually revise a huge amount of narrative descriptions and then
    encode texts according to MedDRA standard terminology. The manual annotation of
    narrative documents with medical terminology is a subtle and expensive task,
    since the number of reports is growing up day-by-day. MagiCoder, a Natural
    Language Processing algorithm, is proposed for the automatic encoding of
    free-text descriptions into MedDRA terms. MagiCoder procedure is efficient in
    terms of computational complexity (in particular, it is linear in the size of
    the narrative input and the terminology). We tested it on a large dataset of
    about 4500 manually revised reports, by performing an automated comparison
    between human and MagiCoder revisions. For the current base version of
    MagiCoder, we measured: on short descriptions, an average recall of (86\%) and
    an average precision of (88\%); on medium-long descriptions (up to 255
    characters), an average recall of (64\%) and an average precision of (63\%).
    From a practical point of view, MagiCoder reduces the time required for
    encoding ADR reports. Pharmacologists have simply to review and validate the
    MagiCoder terms proposed by the application, instead of choosing the right
    terms among the 70K low level terms of MedDRA. Such improvement in the
    efficiency of pharmacologists’ work has a relevant impact also on the quality
    of the subsequent data analysis. We developed MagiCoder for the Italian
    pharmacovigilance language. However, our proposal is based on a general
    approach, not depending on the considered language nor the term dictionary.

    Unraveling reported dreams with text analytics

    Iris Hendrickx, Louis Onrust, Florian Kunneman, Ali Hürriyetoğlu, Antal van den Bosch, Wessel Stoop
    Subjects: Computation and Language (cs.CL)

    We investigate what distinguishes reported dreams from other personal
    narratives. The continuity hypothesis, stemming from psychological dream
    analysis work, states that most dreams refer to a person’s daily life and
    personal concerns, similar to other personal narratives such as diary entries.
    Differences between the two texts may reveal the linguistic markers of dream
    text, which could be the basis for new dream analysis work and for the
    automatic detection of dream descriptions. We used three text analytics
    methods: text classification, topic modeling, and text coherence analysis, and
    applied these methods to a balanced set of texts representing dreams, diary
    entries, and other personal stories. We observed that dream texts could be
    distinguished from other personal narratives nearly perfectly, mostly based on
    the presence of uncertainty markers and descriptions of scenes. Important
    markers for non-dream narratives are specific time expressions and
    conversational expressions. Dream texts also exhibit a lower discourse
    coherence than other personal narratives.

    FastText.zip: Compressing text classification models

    Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Hérve Jégou, Tomas Mikolov
    Comments: Submitted to ICLR 2017
    Subjects: Computation and Language (cs.CL)

    We consider the problem of producing compact architectures for text
    classification, such that the full model fits in a limited amount of memory.
    After considering different solutions inspired by the hashing literature, we
    propose a method built upon product quantization to store word embeddings.
    While the original technique leads to a loss in accuracy, we adapt this method
    to circumvent quantization artefacts. Our experiments carried out on several
    benchmarks show that our approach typically requires two orders of magnitude
    less memory than fastText while being only slightly inferior with respect to
    accuracy. As a result, it outperforms the state of the art by a good margin in
    terms of the compromise between memory usage and accuracy.

    Reading Comprehension using Entity-based Memory Network

    Xun Wang, Katsuhito Sudoh, Masaaki Nagata, Tomohide Shibata, Kawahara Daisuke, Kurohashi Sadao
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    This paper introduces a novel neural network model for question answering,
    the emph{entity-based memory network}. It enhances neural networks’ ability of
    representing and calculating information over a long period by keeping records
    of entities contained in text. The core component is a memory pool which
    comprises entities’ states. These entities’ states are continuously updated
    according to the input text. Questions with regard to the input text are used
    to search the memory pool for related entities and answers are further
    predicted based on the states of retrieved entities. Compared with previous
    memory network models, the proposed model is capable of handling fine-grained
    information and more sophisticated relations based on entities. We formulated
    several different tasks as question answering problems and tested the proposed
    model. Experiments reported satisfying results.

    A Character-Word Compositional Neural Language Model for Finnish

    Matti Lankinen, Hannes Heikinheimo, Pyry Takala, Tapani Raiko, Juha Karhunen
    Subjects: Computation and Language (cs.CL)

    Inspired by recent research, we explore ways to model the highly
    morphological Finnish language at the level of characters while maintaining the
    performance of word-level models. We propose a new
    Character-to-Word-to-Character (C2W2C) compositional language model that uses
    characters as input and output while still internally processing word level
    embeddings. Our preliminary experiments, using the Finnish Europarl V7 corpus,
    indicate that C2W2C can respond well to the challenges of morphologically rich
    languages such as high out of vocabulary rates, the prediction of novel words,
    and growing vocabulary size. Notably, the model is able to correctly score
    inflectional forms that are not present in the training data and sample
    grammatically and semantically correct Finnish sentences character by
    character.

    Active Learning for Speech Recognition: the Power of Gradients

    Jiaji Huang, Rewon Child, Vinay Rao, Hairong Liu, Sanjeev Satheesh, Adam Coates
    Comments: published as a workshop paper at NIPS 2016
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    In training speech recognition systems, labeling audio clips can be
    expensive, and not all data is equally valuable. Active learning aims to label
    only the most informative samples to reduce cost. For speech recognition,
    confidence scores and other likelihood-based active learning methods have been
    shown to be effective. Gradient-based active learning methods, however, are
    still not well-understood. This work investigates the Expected Gradient Length
    (EGL) approach in active learning for end-to-end speech recognition. We justify
    EGL from a variance reduction perspective, and observe that EGL’s measure of
    informativeness picks novel samples uncorrelated with confidence scores.
    Experimentally, we show that EGL can reduce word errors by 11\%, or
    alternatively, reduce the number of samples to label by 50\%, when compared to
    random sampling.

    #HashtagWars: Learning a Sense of Humor

    Peter Potash, Alexey Romanov, Anna Rumshisky
    Comments: 10 Pages
    Subjects: Computation and Language (cs.CL)

    In this work, we present a new dataset for computational humor, specifically
    comparative humor ranking, which attempts to eschew the ubiquitous binary
    approach to humor detection. The dataset consists of tweets that are humorous
    responses to a given hashtag. We describe the motivation for this new dataset,
    as well as the collection process, which includes a description of our
    semi-automated system for data collection. We also present initial experiments
    for this dataset using both unsupervised and supervised approaches. Our best
    supervised system achieved 63.7% accuracy, suggesting that this task is much
    more difficult than comparable humor detection tasks. Initial experiments
    indicate that a character-level model is more suitable for this task than a
    token-level model, likely due to a large amount of puns that can be captured by
    a character-level model.

    Evaluating Creative Language Generation: The Case of Rap Lyric Ghostwriting

    Peter Potash, Alexey Romanov, Anna Rumshisky
    Comments: 10 pages
    Subjects: Computation and Language (cs.CL)

    Language generation tasks that seek to mimic human ability to use language
    creatively are difficult to evaluate, since one must consider creativity,
    style, and other non-trivial aspects of the generated text. The goal of this
    paper is to develop evaluation methods for one such task, ghostwriting of rap
    lyrics, and to provide an explicit, quantifiable foundation for the goals and
    future directions of this task. Ghostwriting must produce text that is similar
    in style to the emulated artist, yet distinct in content. We develop a novel
    evaluation methodology that addresses several complementary aspects of this
    task, and illustrate how such evaluation can be used to meaningfully analyze
    system performance. We provide a corpus of lyrics for 13 rap artists, annotated
    for stylistic similarity, which allows us to assess the feasibility of manual
    evaluation for generated verse.

    VIBIKNet: Visual Bidirectional Kernelized Network for Visual Question Answering

    Marc Bolaños, Álvaro Peris, Francisco Casacuberta, Petia Radeva
    Comments: Submitted to IbPRIA’17, 8 pages, 3 figures, 1 table
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

    In this paper, we address the problem of visual question answering by
    proposing a novel model, called VIBIKNet. Our model is based on integrating
    Kernelized Convolutional Neural Networks and Long-Short Term Memory units to
    generate an answer given a question about an image. We prove that VIBIKNet is
    an optimal trade-off between accuracy and computational load, in terms of
    memory and time consumption. We validate our method on the VQA challenge
    dataset and compare it to the top performing methods in order to illustrate its
    performance and speed.

    Search Personalization with Embeddings

    Thanh Vu, Dat Quoc Nguyen, Mark Johnson, Dawei Song, Alistair Willis
    Comments: In Proceedings of the 39th European Conference on Information Retrieval, ECIR 2017, to appear
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    Recent research has shown that the performance of search personalization
    depends on the richness of user profiles which normally represent the user’s
    topical interests. In this paper, we propose a new embedding approach to
    learning user profiles, where users are embedded on a topical interest space.
    We then directly utilize the user profiles for search personalization.
    Experiments on query logs from a major commercial web search engine demonstrate
    that our embedding approach improves the performance of the search engine and
    also achieves better search performance than other strong baselines.

    Flu Detector: Estimating influenza-like illness rates from online user-generated content

    Vasileios Lampos
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    We provide a brief technical description of an online platform for disease
    monitoring, titled as the Flu Detector (fludetector.cs.ucl.ac.uk). Flu
    Detector, in its current version (v.0.5), uses either Twitter or Google search
    data in conjunction with statistical Natural Language Processing models to
    estimate the rate of influenza-like illness in the population of England. Its
    back-end is a live service that collects online data, utilises modern
    technologies for large-scale text processing, and finally applies statistical
    inference models that are trained offline. The front-end visualises the various
    disease rate estimates. Notably, the models based on Google data achieve a high
    level of accuracy with respect to the most recent four flu seasons in England
    (2012/13 to 2015/16). This highlighted Flu Detector as having a great potential
    of becoming a complementary source to the domestic traditional flu surveillance
    schemes.

    Data Curation APIs

    Seyed-Mehdi-Reza Beheshti, Alireza Tabebordbar, Boualem Benatallah, Reza Nouri
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    Understanding and analyzing big data is firmly recognized as a powerful and
    strategic priority. For deeper interpretation of and better intelligence with
    big data, it is important to transform raw data (unstructured, semi-structured
    and structured data sources, e.g., text, video, image data sets) into curated
    data: contextualized data and knowledge that is maintained and made available
    for use by end-users and applications. In particular, data curation acts as the
    glue between raw data and analytics, providing an abstraction layer that
    relieves users from time consuming, tedious and error prone curation tasks. In
    this context, the data curation process becomes a vital analytics asset for
    increasing added value and insights.

    In this paper, we identify and implement a set of curation APIs and make them
    available (on GitHub) to researchers and developers to assist them transforming
    their raw data into curated data. The curation APIs enable developers to easily
    add features – such as extracting keyword, part of speech, and named entities
    such as Persons, Locations, Organizations, Companies, Products, Diseases,
    Drugs, etc.; providing synonyms and stems for extracted information items
    leveraging lexical knowledge bases for the English language such as WordNet;
    linking extracted entities to external knowledge bases such as Google Knowledge
    Graph and Wikidata; discovering similarity among the extracted information
    items, such as calculating similarity between string, number, date and time
    data; classifying, sorting and categorizing data into various types, forms or
    any other distinct class; and indexing structured and unstructured data – into
    their applications.

    A natural language interface to a graph-based bibliographic information retrieval system

    Yongjun Zhu, Erjia Yan, Il-Yeol Song
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    With the ever-increasing scientific literature, there is a need on a natural
    language interface to bibliographic information retrieval systems to retrieve
    related information effectively. In this paper, we propose a natural language
    interface, NLI-GIBIR, to a graph-based bibliographic information retrieval
    system. In designing NLI-GIBIR, we developed a novel framework that can be
    applicable to graph-based bibliographic information retrieval systems. Our
    framework integrates algorithms/heuristics for interpreting and analyzing
    natural language bibliographic queries. NLI-GIBIR allows users to search for a
    variety of bibliographic data through natural language. A series of text- and
    linguistic-based techniques are used to analyze and answer natural language
    queries, including tokenization, named entity recognition, and syntactic
    analysis. We find that our framework can effectively represents and addresses
    complex bibliographic information needs. Thus, the contributions of this paper
    are as follows: First, to our knowledge, it is the first attempt to propose a
    natural language interface to graph-based bibliographic information retrieval.
    Second, we propose a novel customized natural language processing framework
    that integrates a few original algorithms/heuristics for interpreting and
    analyzing natural language bibliographic queries. Third, we show that the
    proposed framework and natural language interface provide a practical solution
    in building real-world natural language interface-based bibliographic
    information retrieval systems. Our experimental results show that the presented
    system can correctly answer 39 out of 40 example natural language queries with
    varying lengths and complexities.


    Distributed, Parallel, and Cluster Computing

    Smart Scheduling of Continuous Data-Intensive Workflows with Machine Learning Triggered Execution

    Sérgio Esteves, Helena Galhardas, Luís Veiga
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    To extract value from evergrowing volumes of data, coming from a number of
    different sources, and to drive decision making, organizations frequently
    resort to the composition of data processing workflows, since they are
    expressive, flexible, and scalable. The typical workflow model enforces strict
    temporal synchronization across processing steps without accounting the actual
    effect of intermediate computations on the final workflow output. However, this
    is not the most desirable behavior in a multitude of scenarios. We identify a
    class of applications for continuous data processing where workflow output
    changes slowly and without great significance in a short-to-medium time window,
    thus wasting compute resources and energy with current approaches.

    To overcome such inefficiency, we introduce a novel workflow model, for
    continuous and data-intensive processing, capable of relaxing triggering
    semantics according to the impact input data is assessed to have on changing
    the workflow output. To assess this impact, learn the correlation between input
    and output variation, and guarantee correctness within a given tolerated error
    constant, we rely on Machine Learning. The functionality of this workflow model
    is implemented in SmartFlux, a middleware framework which can be effortlessly
    integrated with existing workflow managers. Experimental results indicate we
    are able to save a significant amount of resources while not deviating the
    workflow output beyond a small error constant with high confidence level.

    Consus: Taming the Paxi

    Robert Escriva, Robbert van Renesse
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Consus is a strictly serializable geo-replicated transactional key-value
    store. The key contribution of Consus is a new commit protocol that reduces the
    cost of executing a transaction to three wide area message delays in the common
    case. Augmenting the commit protocol are multiple Paxos implementations
    optimized for different purposes. Together the different implementations and
    optimizations comprise a cohesive system that provides low latency, high
    availability, and strong guarantees. This paper describes the techniques
    implemented in the open source release of Consus, and lays the groundwork for
    evaluating Consus once the system implementation is sufficiently robust for a
    thorough evaluation.

    Efficient Methods and Parallel Execution for Algorithm Sensitivity Analysis with Parameter Tuning on Microscopy Imaging Datasets

    George Teodoro, Tahsin Kurc, Luis F. R. Taveira, Alba C. M. A. Melo, Jun Kong, Joel Saltz
    Comments: 36 pages, 10 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Background: We describe an informatics framework for researchers and clinical
    investigators to efficiently perform parameter sensitivity analysis and
    auto-tuning for algorithms that segment and classify image features in a large
    dataset of high-resolution images. The computational cost of the sensitivity
    analysis process can be very high, because the process requires processing the
    input dataset several times to systematically evaluate how output varies when
    input parameters are varied. Thus, high performance computing techniques are
    required to quickly execute the sensitivity analysis process.

    Results: We carried out an empirical evaluation of the proposed method on
    high performance computing clusters with multi-core CPUs and co-processors
    (GPUs and Intel Xeon Phis). Our results show that (1) the framework achieves
    excellent scalability and efficiency on a high performance computing cluster —
    execution efficiency remained above 85% in all experiments; (2) the parameter
    auto-tuning methods are able to converge by visiting only a small fraction
    (0.0009%) of the search space with limited impact to the algorithm output
    (0.56% on average).

    Conclusions: The sensitivity analysis framework provides a range of
    strategies for the efficient exploration of the parameter space, as well as
    multiple indexes to evaluate the effect of parameter modification to outputs or
    even correlation between parameters. Our work demonstrates the feasibility of
    performing sensitivity analyses, parameter studies, and auto-tuning with large
    datasets with the use of high-performance systems and techniques. The proposed
    technologies will enable the quantification of error estimations and output
    variations in these pipelines, which may be used in application specific ways
    to assess uncertainty of conclusions extracted from data generated by these
    image analysis pipelines.

    Gradient Coding

    Rashish Tandon, Qi Lei, Alexandros G. Dimakis, Nikos Karampatziakis
    Comments: 13 pages, Presented at the Machine Learning Systems Workshop at NIPS 2016
    Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Computation (stat.CO)

    We propose a novel coding theoretic framework for mitigating stragglers in
    distributed learning. We show how carefully replicating data blocks and coding
    across gradients can provide tolerance to failures and stragglers for
    synchronous Gradient Descent. We implement our scheme in MPI and show how we
    compare against baseline architectures in running time and generalization
    error.


    Learning

    Design and Development of Bayes' Minimax Linear Classification Systems

    Denise M. Reeves
    Comments: 122 pages, 25 figures. arXiv admin note: text overlap with arXiv:1511.05102
    Subjects: Learning (cs.LG)

    This paper considers the design and development of Bayes’ minimax, linear
    classification systems using linear discriminant functions that are Bayes’
    equalizer rules. Bayes’ equalizer rules divide two-class feature spaces into
    decision regions that have equal classification errors. I will formulate the
    problem of learning unknown linear discriminant functions from data as a locus
    problem, thereby formulating geometric locus methods within a statistical
    framework. Solving locus problems involves finding the equation of a curve or
    surface defined by a given property, and finding the graph or locus of a given
    equation. I will devise a system of locus equations that determines Bayes’
    equalizer rules which is based on a variant of the inequality constrained
    optimization problem for linear kernel support vector machines. Thereby, I will
    define a class of learning machines which are fundamental building blocks for
    Bayes’ minimax pattern recognition systems.

    Orthogonal Tensor Decompositions via Two-Mode Higher-Order SVD (HOSVD)

    Miaoyan Wang, Yun S. Song
    Comments: 33 pages, 5 figures
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Tensor decompositions have rich applications in statistics and machine
    learning, and developing efficient, accurate algorithms for the problem has
    received much attention recently. Here, we present a new method built on
    Kruskal’s uniqueness theorem to decompose symmetric, nearly orthogonally
    decomposable tensors. Unlike the classical higher-order singular value
    decomposition which unfolds a tensor along a single mode, we consider
    unfoldings along two modes and use rank-1 constraints to characterize the
    underlying components. This tensor decomposition method provably handles a
    greater level of noise compared to previous methods and achieves a high
    estimation accuracy. Numerical results demonstrate that our algorithm is robust
    to various noise distributions and that it performs especially favorably as the
    order increases.

    Kernel-based Reconstruction of Space-time Functions on Dynamic Graphs

    Daniel Romero, Vassilis N. Ioannidis, Georgios B. Giannakis
    Comments: Submitted to IEEE Journal of Selected Topics in Signal processing, Oct. 2016
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Graph-based methods pervade the inference toolkits of numerous disciplines
    including sociology, biology, neuroscience, physics, chemistry, and
    engineering. A challenging problem encountered in this context pertains to
    determining the attributes of a set of vertices given those of another subset
    at possibly different time instants. Leveraging spatiotemporal dynamics can
    drastically reduce the number of observed vertices, and hence the cost of
    sampling. Alleviating the limited flexibility of existing approaches, the
    present paper broadens the existing kernel-based graph function reconstruction
    framework to accommodate time-evolving functions over possibly time-evolving
    topologies. This approach inherits the versatility and generality of
    kernel-based methods, for which no knowledge on distributions or second-order
    statistics is required. Systematic guidelines are provided to construct two
    families of space-time kernels with complementary strengths. The first
    facilitates judicious control of regularization on a space-time frequency
    plane, whereas the second can afford time-varying topologies. Batch and online
    estimators are also put forth, and a novel kernel Kalman filter is developed to
    obtain these estimates at affordable computational cost. Numerical tests with
    real data sets corroborate the merits of the proposed methods relative to
    competing alternatives.

    Noisy subspace clustering via matching pursuits

    Michael Tschannen, Helmut Bölcskei
    Subjects: Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)

    Sparsity-based subspace clustering algorithms have attracted significant
    attention thanks to their excellent performance in practical applications. A
    prominent example is the sparse subspace clustering (SSC) algorithm by
    Elhamifar and Vidal, which performs spectral clustering based on an adjacency
    matrix obtained by sparsely representing each data point in terms of all the
    other data points via the Lasso. When the number of data points is large or the
    dimension of the ambient space is high, the computational complexity of SSC
    quickly becomes prohibitive. Dyer et al. observed that SSC-OMP obtained by
    replacing the Lasso by the greedy orthogonal matching pursuit (OMP) algorithm
    results in significantly lower computational complexity, while often yielding
    comparable performance. The central goal of this paper is an analytical
    performance characterization of SSC-OMP for noisy data. Moreover, we introduce
    and analyze the SSC-MP algorithm, which employs matching pursuit (MP) in lieu
    of OMP. Both SSC-OMP and SSC-MP are proven to succeed even when the subspaces
    intersect and when the data points are contaminated by severe noise. The
    clustering conditions we obtain for SSC-OMP and SSC-MP are similar to those for
    SSC and for the thresholding-based subspace clustering (TSC) algorithm due to
    Heckel and B”olcskei. Analytical results in combination with numerical results
    indicate that both SSC-OMP and SSC-MP with a data-dependent stopping criterion
    automatically detect the dimensions of the subspaces underlying the data.
    Moreover, experiments on synthetic and real data show that SSC-MP compares very
    favorably to SSC, SSC-OMP, TSC, and the nearest subspace neighbor (NSN)
    algorithm, both in terms of clustering performance and running time. In
    addition, we find that, in contrast to SSC-OMP, the performance of SSC-MP is
    very robust with respect to the choice of parameters in the stopping criteria.

    Non-Redundant Spectral Dimensionality Reduction

    Yochai Blau, Tomer Michaeli
    Subjects: Learning (cs.LG)

    Spectral dimensionality reduction algorithms are widely used in numerous
    domains, including for recognition, segmentation, tracking and visualization.
    However, despite their popularity, these algorithms suffer from a major
    limitation known as the “repeated Eigen-directions” phenomenon. That is, many
    of the embedding coordinates they produce typically capture the same direction
    along the data manifold. This leads to redundant and inefficient
    representations that do not reveal the true intrinsic dimensionality of the
    data. In this paper, we propose a general method for avoiding redundancy in
    spectral algorithms. Our approach relies on replacing the orthogonality
    constraints underlying those methods by unpredictability constraints.
    Specifically, we require that each embedding coordinate be unpredictable (in
    the statistical sense) from all previous ones. We prove that these constraints
    necessarily prevent redundancy, and provide a simple technique to incorporate
    them into existing methods. As we illustrate on challenging high-dimensional
    scenarios, our approach produces significantly more informative and compact
    representations, which substantially improve visualization and classification
    tasks.

    Towards deep learning with spiking neurons in energy based models with contrastive Hebbian plasticity

    Thomas Mesnard, Wulfram Gerstner, Johanni Brea
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    In machine learning, error back-propagation in multi-layer neural networks
    (deep learning) has been impressively successful in supervised and
    reinforcement learning tasks. As a model for learning in the brain, however,
    deep learning has long been regarded as implausible, since it relies in its
    basic form on a non-local plasticity rule. To overcome this problem,
    energy-based models with local contrastive Hebbian learning were proposed and
    tested on a classification task with networks of rate neurons. We extended this
    work by implementing and testing such a model with networks of leaky
    integrate-and-fire neurons. Preliminary results indicate that it is possible to
    learn a non-linear regression task with hidden layers, spiking neurons and a
    local synaptic plasticity rule.

    Inverse Compositional Spatial Transformer Networks

    Chen-Hsuan Lin, Simon Lucey
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    In this paper, we establish a theoretical connection between the classical
    Lucas & Kanade (LK) algorithm and the emerging topic of Spatial Transformer
    Networks (STNs). STNs are of interest to the vision and learning communities
    due to their natural ability to combine alignment and classification within the
    same theoretical framework. Inspired by the Inverse Compositional (IC) variant
    of the LK algorithm, we present Inverse Compositional Spatial Transformer
    Networks (IC-STNs). We demonstrate that IC-STNs can achieve better performance
    than conventional STNs with less model capacity; in particular, we show
    superior performance in pure image alignment tasks as well as joint
    alignment/classification problems on real-world problems.

    Knowledge Completion for Generics using Guided Tensor Factorization

    Hanie Sedghi, Ashish Sabharwal
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    We consider knowledge base (KB) completion for KBs rich in facts about common
    nouns (generics), such as “trees produce oxygen” or “dogs have tails”. While KB
    completion has received much attention for named entity KBs, little emphasis
    has been placed on generics despite their importance for capturing general
    knowledge. Compared with named entity KBs such as Freebase, KBs about generics
    have more complex underlying regularities, are substantially more incomplete,
    and violate the commonly used locally closed world assumption (LCWA).
    Consequently, existing completion methods struggle with this new task, and the
    commonly used evaluation metrics become less meaningful. To address these
    challenges, we make three contributions: (a) a tensor factorization approach
    that achieves state-of-the-art results by incorporating external knowledge
    about relation schema and entity taxonomy, (b) taxonomy guided submodular
    active learning to efficiently collect additional annotations to address KB
    incompleteness, and (c) a more appropriate metric (yield at high precision)
    along with a constant-factor deterministic approximation algorithm to compute
    it cost-effectively with only a logarithmic number of human annotations. An
    empirical evaluation shows that our techniques achieve state-of-the-art results
    on two novel generics KBs about elementary level science.

    Generalizable Features From Unsupervised Learning

    Mehdi Mirza, Aaron Courville, Yoshua Bengio
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Humans learn a predictive model of the world and use this model to reason
    about future events and the consequences of actions. In contrast to most
    machine predictors, we exhibit an impressive ability to generalize to unseen
    scenarios and reason intelligently in these settings. One important aspect of
    this ability is physical intuition(Lake et al., 2016). In this work, we explore
    the potential of unsupervised learning to find features that promote better
    generalization to settings outside the supervised training distribution. Our
    task is predicting the stability of towers of square blocks. We demonstrate
    that an unsupervised model, trained to predict future frames of a video
    sequence of stable and unstable block configurations, can yield features that
    support extrapolating stability prediction to blocks configurations outside the
    training set distribution

    Neurogenesis Deep Learning

    Timothy J. Draelos, Nadine E. Miner, Christopher C. Lamb, Craig M. Vineyard, Kristofor D. Carlson, Conrad D. James, James B. Aimone
    Comments: Submitted to IJCNN 2017
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)

    Neural machine learning methods, such as deep neural networks (DNN), have
    achieved remarkable success in a number of complex data processing tasks. These
    methods have arguably had their strongest impact on tasks such as image and
    audio processing – data processing domains in which humans have long held clear
    advantages over conventional algorithms. In contrast to biological neural
    systems, which are capable of learning continuously, deep artificial networks
    have a limited ability for incorporating new information in an already trained
    network. As a result, methods for continuous learning are potentially highly
    impactful in enabling the application of deep networks to dynamic data sets.
    Here, inspired by the process of adult neurogenesis in the hippocampus, we
    explore the potential for adding new neurons to deep layers of artificial
    neural networks in order to facilitate their acquisition of novel information
    while preserving previously trained data representations. Our results on the
    MNIST handwritten digit dataset and the NIST SD 19 dataset, which includes
    lower and upper case letters and digits, demonstrate that neurogenesis is well
    suited for addressing the stability-plasticity dilemma that has long challenged
    adaptive machine learning algorithms.

    Non-negative Factorization of the Occurrence Tensor from Financial Contracts

    Zheng Xu, Furong Huang, Louiqa Raschid, Tom Goldstein
    Comments: NIPS tensor workshop
    Subjects: Computational Engineering, Finance, and Science (cs.CE); Learning (cs.LG)

    We propose an algorithm for the non-negative factorization of an occurrence
    tensor built from heterogeneous networks. We use l0 norm to model sparse errors
    over discrete values (occurrences), and use decomposed factors to model the
    embedded groups of nodes. An efficient splitting method is developed to
    optimize the nonconvex and nonsmooth objective. We study both synthetic
    problems and a new dataset built from financial documents, resMBS.

    An Empirical Study of ADMM for Nonconvex Problems

    Zheng Xu, Soham De, Mario Figueiredo, Christoph Studer, Tom Goldstein
    Comments: NIPS nonconvex workshop
    Subjects: Optimization and Control (math.OC); Learning (cs.LG)

    The alternating direction method of multipliers (ADMM) is a common
    optimization tool for solving constrained and non-differentiable problems. We
    provide an empirical study of the practical performance of ADMM on several
    nonconvex applications, including l0 regularized linear regression, l0
    regularized image denoising, phase retrieval, and eigenvector computation. Our
    experiments suggest that ADMM performs well on a broad class of non-convex
    problems. Moreover, recently proposed adaptive ADMM methods, which
    automatically tune penalty parameters as the method runs, can improve algorithm
    efficiency and solution quality compared to ADMM with a non-tuned penalty.

    Knowledge Elicitation via Sequential Probabilistic Inference for High-Dimensional Prediction

    Pedram Daee, Tomi Peltola, Marta Soare, Samuel Kaski
    Comments: 22 pages, 9 figures, all codes and data available at this https URL
    Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Learning (cs.LG); Machine Learning (stat.ML)

    Prediction in a small-sized sample with a large number of covariates, the
    “small n, large p” problem, is challenging. This setting is encountered in
    multiple applications, such as precision medicine, where obtaining additional
    samples can be extremely costly or even impossible, and extensive research
    effort has recently been dedicated to finding principled solutions for accurate
    prediction. However, a valuable source of additional information, domain
    experts, has not yet been efficiently exploited. We formulate knowledge
    elicitation generally as a probabilistic inference process, where expert
    knowledge is sequentially queried to improve predictions. In the specific case
    of sparse linear regression, where we assume the expert has knowledge about the
    values of the regression coefficients or about the relevance of the features,
    we propose an algorithm and computational approximation for fast and efficient
    interaction, which sequentially identifies the most informative features on
    which to query expert knowledge. Evaluations of our method in experiments with
    simulated and real users show improved prediction accuracy already with a small
    effort from the expert.

    Generalized Deep Image to Image Regression

    Venkataraman Santhanam, Vlad I. Morariu, Larry S. Davis
    Comments: Submitted to CVPR on November 15th, 2016. Code will be made available soon
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We present a Deep Convolutional Neural Network architecture which serves as a
    generic image-to-image regressor that can be trained end-to-end without any
    further machinery. Our proposed architecture: the Recursively Branched
    Deconvolutional Network (RBDN) develops a cheap multi-context image
    representation very early on using an efficient recursive branching scheme with
    extensive parameter sharing and learnable upsampling. This multi-context
    representation is subjected to a highly non-linear locality preserving
    transformation by the remainder of our network comprising of a series of
    convolutions/deconvolutions without any spatial downsampling. The RBDN
    architecture is fully convolutional and can handle variable sized images during
    inference. We provide qualitative/quantitative results on (3) diverse tasks:
    relighting, denoising and colorization and show that our proposed RBDN
    architecture obtains comparable results to the state-of-the-art on each of
    these tasks when used off-the-shelf without any post processing or
    task-specific architectural modifications.

    Active Learning for Speech Recognition: the Power of Gradients

    Jiaji Huang, Rewon Child, Vinay Rao, Hairong Liu, Sanjeev Satheesh, Adam Coates
    Comments: published as a workshop paper at NIPS 2016
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    In training speech recognition systems, labeling audio clips can be
    expensive, and not all data is equally valuable. Active learning aims to label
    only the most informative samples to reduce cost. For speech recognition,
    confidence scores and other likelihood-based active learning methods have been
    shown to be effective. Gradient-based active learning methods, however, are
    still not well-understood. This work investigates the Expected Gradient Length
    (EGL) approach in active learning for end-to-end speech recognition. We justify
    EGL from a variance reduction perspective, and observe that EGL’s measure of
    informativeness picks novel samples uncorrelated with confidence scores.
    Experimentally, we show that EGL can reduce word errors by 11\%, or
    alternatively, reduce the number of samples to label by 50\%, when compared to
    random sampling.

    DeepCancer: Detecting Cancer through Gene Expressions via Deep Generative Learnin

    Rajendra Rana Bhat, Vivek Viswanath, Xiaolin Li
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Genomics (q-bio.GN)

    Transcriptional profiling on microarrays to obtain gene expressions has been
    used to facilitate cancer diagnosis. We propose a deep generative machine
    learning architecture (called DeepCancer) that learn features from unlabeled
    microarray data. These models have been used in conjunction with conventional
    classifiers that perform classification of the tissue samples as either being
    cancerous or non-cancerous. The proposed model has been tested on two different
    clinical datasets. The evaluation demonstrates that DeepCancer model achieves a
    very high precision score, while significantly controlling the false positive
    and false negative scores.

    Low-Rank Inducing Norms with Optimality Interpretations

    Christian Grussler, Pontus Giselsson
    Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)

    Optimization problems with rank constraints appear in many diverse fields
    such as control, machine learning and image analysis. Since the rank constraint
    is non-convex, these problems are often approximately solved via convex
    relaxations. Nuclear norm regularization is the prevailing convexifying
    technique for dealing with these types of problem. This paper introduces a
    family of low-rank inducing norms and regularizers which includes the nuclear
    norm as a special case. A posteriori guarantees on solving an underlying rank
    constrained optimization problem with these convex relaxations are provided. We
    evaluate the performance of the low-rank inducing norms on three matrix
    completion problems. In all examples, the nuclear norm heuristic is
    outperformed by convex relaxations based on other low-rank inducing norms. For
    two of the problems there exist low-rank inducing norms that succeed in
    recovering the partially unknown matrix, while the nuclear norm fails. These
    low-rank inducing norms are shown to be representable as semi-definite programs
    and to have cheaply computable proximal mappings. The latter makes it possible
    to also solve problems of large size with the help of scalable first-order
    methods. Finally, it is proven that our findings extend to the more general
    class of atomic norms. In particular, this allows us to solve corresponding
    vector-valued problems, as well as problems with other non-convex constraints.


    Information Theory

    Enhancing the Physical Layer Security of Non-orthogonal Multiple Access in Large-Scale Networks

    Yuanwei Liu, Zhijin Qin, Maged Elkashlan, Yue Gao, Lajos Hanzo
    Comments: IEEE Transactions on Wireless Communications, revision
    Subjects: Information Theory (cs.IT)

    This paper investigates the physical layer security of non-orthogonal
    multiple access (NOMA) in large-scale networks with invoking stochastic
    geometry. Both single-antenna and multiple-antenna aided transmission scenarios
    are considered, where the base station (BS) communicates with randomly
    distributed NOMA users. In the single-antenna scenario, we adopt a protected
    zone around the BS to establish an eavesdropper-exclusion area with the aid of
    careful channel-ordering of the NOMA users. In the multiple-antenna scenario,
    artificial noise is generated at the BS for further improving the security of a
    beamforming-aided system. In order to characterize the secrecy performance, we
    derive new exact expressions of the security outage probability for both
    single-antenna and multiple-antenna aided scenarios. To obtain further
    insights, 1) for the single antenna scenario, we perform secrecy diversity
    order analysis of the selected user pair. The analytical results derived
    demonstrate that the secrecy diversity order is determined by the specific user
    having the worse channel condition among the selected user pair; and 2) for the
    multiple-antenna scenario, we derive the asymptotic secrecy outage probability,
    when the number of transmit antennas tends to infinity. Monte Carlo simulations
    are provided for verifying the analytical results derived and to show that:
    i)~The security performance of the NOMA networks can be improved by invoking
    the protected zone and by generating artificial noise at the BS; and ii)~The
    asymptotic secrecy outage probability is close to the exact secrecy outage
    probability.

    Unified Framework for the Effective Rate Analysis of Wireless Communication Systems over MISO Fading Channels

    Minglei You, Hongjian Sun, Jing Jiang, Jiayi Zhang
    Subjects: Information Theory (cs.IT)

    This paper proposes a unified framework for the effective rate analysis over
    arbitrary correlated and not necessarily identical multiple inputs single
    output (MISO) fading channels, which uses moment generating function (MGF)
    based approach and H transform representation. The proposed framework has the
    potential to simplify the cumbersome analysis procedure compared to the
    probability density function (PDF) based approach. Moreover, the effective
    rates over two specific fading scenarios are investigated, namely independent
    but not necessarily identical distributed (i.n.i.d.) MISO hyper Fox’s H fading
    channels and arbitrary correlated generalized K fading channels. The exact
    analytical representations for these two scenarios are also presented. By
    substituting corresponding parameters, the effective rates in various practical
    fading scenarios, such as Rayleigh, Nakagami-m, Weibull/Gamma and generalized K
    fading channels, are readily available. In addition, asymptotic approximations
    are provided for the proposed H transform and MGF based approach as well as for
    the effective rate over i.n.i.d. MISO hyper Fox’s H fading channels.
    Simulations under various fading scenarios are also presented, which support
    the validity of the proposed method.

    On the SNR Variability in Noisy Compressed Sensing

    Anastasia Lavrenko, Florian Roemer, Giovanni Del Galdo Member, Reiner Thomae
    Comments: 4 pages + references
    Subjects: Information Theory (cs.IT)

    Compressed sensing is a sampling paradigm that allows to simultaneously
    measure and compress signals that are sparse or compressible in some domain.
    The choice of a sensing matrix that carries out the measurement has a defining
    impact on the system performance and it is often advocated to draw its elements
    randomly. It has been noted that in the presence of input (signal) noise, the
    application of the sensing matrix causes SNR degradation due to the noise
    folding effect. In fact, it might also result in the variations of the output
    SNR in compressive measurements over the support of the input signal. In this
    work, we study the impact of a distribution from which the elements of a
    sensing matrix are drawn on the spread of the output SNR. We derive analytic
    expressions for several common types of sensing matrices and show that SNR
    variations can be significant, especially for lower dimensional problems.
    Hence, this effect should not be ignored during system design and performance
    evaluation.

    Robust MIMO Beamforming for Cellular and Radar Coexistence

    Fan Liu, Christos Masouros, Ang Li, Jianming Zhou
    Subjects: Information Theory (cs.IT)

    In this letter, we consider the coexistence and spectrum sharing between
    downlink multiple-input-multiple-output (MIMO) communication and a MIMO radar.
    Based on the knowledge of the communication and interference channels, the
    communication beamforming matrix is designed to minimize the transmit power
    while guaranteeing the performance requirement of the radar and downlink
    communication users. We investigate the beamforming design for perfect Channel
    State Information (CSI) and the robust beamforming for imperfect CSI case.
    Simulation results verify that the proposed approach facilitates the
    coexistence between radar and communication links, and illustrates a scalable
    trade-off of the two systems’ performance.

    Cache-enabled Uplink Transmission in Wireless Small Cell Networks

    Zhanzhan Zhang, Zhiyong Chen, Hao Feng, Bin Xia, Weiliang Xie, Yong Zhao
    Comments: submitted to IEEE Trans. Veh. Tech., Sep. 2016
    Subjects: Information Theory (cs.IT)

    It is starting to become a big trend in the era of social networking that
    people produce and upload user-generated contents to Internet via wireless
    networks, bringing a significant burden on wireless uplink networks. In this
    paper, we contribute to designing and theoretical understanding of wireless
    cache-enabled upload transmission in a delay-tolerant small cell network to
    relieve the burden, and then propose the corresponding scheduling policies for
    the small base station (SBS) under the infinite and finite cache sizes.
    Specifically, the cache ability introduced by SBS enables SBS to eliminate the
    redundancy among the upload contents from users. This strategy not only
    alleviates the wireless backhual traffic congestion from SBS to a macro base
    station (MBS), but also improves the transmission efficiency of SBS. We then
    investigate the scheduling schemes of SBS to offload more data traffic under
    caching size constraint. Moreover, two operational regions for the wireless
    cache-enabled upload network, namely, the delay-limited region and the
    cache-limited region, are established to reveal the fundamental tradeoff
    between the delay tolerance and the cache ability. Finally, numerical results
    are provided to demonstrate the significant performance gains of the proposed
    wireless cache-enabled upload network.

    A Note on Hamming distance of constacyclic codes of length (p^s) over (mathbb F_{p^m} + umathbb F_{p^m})

    Hongwei Liu, Maouche Youcef
    Subjects: Information Theory (cs.IT)

    For any prime (p), (lambda)-constacyclic codes of length (p^s) over ({cal
    R}=mathbb{F}_{p^m} + umathbb{F}_{p^m}) are precisely the ideals of the local
    ring ({cal R}_{lambda}=frac{{cal R}[x]}{leftlangle x^{p^s}-lambda

    ight
    angle}), where (u^2=0). In this paper, we first investigate the Hamming
    distances of cyclic codes of length (p^s) over ({cal R}). The minimum Hamming
    distances of all cyclic codes of length (p^s) over ({cal R}) are determined.
    Moreover, an isometry between cyclic and (alpha)-constacyclic codes of length
    (p^s) over ({cal R}) is established, where (alpha) is a nonzero element of
    (mathbb{F}_{p^m}), which carries over the results regarding cyclic codes
    corresponding to (alpha)-constacyclic codes of length (p^s) over ({cal R}).

    Circulant Matrix Representation of PN-sequences with Ideal Autocorrelation Property

    Mohammad J. Khojasteh, Morteza H. Shoreh, Jawad A. Salehi
    Comments: Communication and Information Theory (IWCIT), 2015 Iran Workshop on. IEEE, 2015
    Subjects: Information Theory (cs.IT); Rings and Algebras (math.RA)

    In this paper, we investigate PN-sequences with ideal autocorrelation
    property and the consequences of this property on the number of +1s and -1s and
    run structure of sequences. We begin by discussing and surveying about the
    length of PNsequences with ideal autocorrelation property. From our discussion
    and survey we introduce circulant matrix representation of PN-sequence. Through
    circulant matrix representation we obtain system of non-linear equations that
    lead to ideal autocorrelation property. Rewriting PN-sequence and its
    autocorrelation property in {0,1} leads to a definition based on Hamming weight
    and Hamming distance and hence we can easily prove some results on the
    PN-sequences with ideal autocorrelation property.

    Corruption Robust Phase Retrieval via Linear Programming

    Paul Hand, Vladislav Voroninski
    Subjects: Information Theory (cs.IT); Optimization and Control (math.OC); Probability (math.PR)

    We consider the problem of phase retrieval from corrupted magnitude
    observations. In particular we show that a fixed (x_0 in mathbb{R}^n) can be
    recovered exactly from corrupted magnitude measurements (|langle a_i, x_0

    angle | + eta_i, quad i =1,2ldots m) with high probability for (m = O(n)),
    where (a_i in mathbb{R}^n) are i.i.d standard Gaussian and (eta in
    mathbb{R}^m) has fixed sparse support and is otherwise arbitrary, by using a
    version of the PhaseMax algorithm augmented with slack variables subject to a
    penalty. This linear programming formulation, which we call RobustPhaseMax,
    operates in the natural parameter space, and our proofs rely on a direct
    analysis of the optimality conditions using concentration inequalities.

    LP Bounds for Rate-Distortion with Variable Side Information

    Sinem Unal, Aaron B. Wagner
    Comments: Presented in part at the IEEE Int. Symposium on Information Theory (ISIT), Barcelona, July 2016 and submitted for presentation in part to Data Compression Conference (DCC), Snowbird, UT April, 2017. Submitted to IEEE Transactions on Information Theory
    Subjects: Information Theory (cs.IT)

    We consider a rate-distortion problem with side information at multiple
    decoders. Several upper and lower bounds have been proposed for this general
    problem or special cases of it. We provide an upper bound for general instances
    of this problem, which takes the form of a linear program, by utilizing random
    binning and simultaneous decoding techniques and compare it with the existing
    bounds. We also provide a lower bound for the general problem, which was
    inspired by a linear-programming lower bound for index coding, and show that it
    subsumes most of the lower bounds in literature. Using these upper and lower
    bounds, we explicitly characterize the rate-distortion function of a problem
    that can be seen as a Gaussian analogue of the “odd-cycle” index coding
    problem.

    Vector Gaussian Rate-Distortion with Variable Side Information

    Sinem Unal, Aaron B. Wagner
    Comments: Presented in parts at the IEEE Int. Symposium on Information Theory (ISIT), Hawaii, July 2014, the Information Theory and Applications Workshop, Jan./Feb. 2016, and the Conference on Information Sciences and Systems (CISS), Princeton, March 2016. Submitted to IEEE Transactions on Information Theory
    Subjects: Information Theory (cs.IT)

    We consider rate-distortion with two decoders, each with distinct side
    information. This problem is well understood when the side information at the
    decoders satisfies a certain degradedness condition. We consider cases in which
    this degradedness condition is violated but the source and the side information
    consist of jointly Gaussian vectors. We provide a hierarchy of four lower
    bounds on the optimal rate. These bounds are then used to determine the optimal
    rate for several classes of instances.

    Information Theory of Molecular Communication: Directions and Challenges

    Amin Gohari, Mahtab Mirmohseni, Masoumeh Nasiri-Kenari
    Comments: Accepted for publication in the IEEE Transactions on Molecular, Biological, and Multi-Scale Communications as a Shannon Centennial Special Issue
    Subjects: Information Theory (cs.IT)

    Molecular Communication (MC) is a communication strategy that uses molecules
    as carriers of information, and is widely used by biological cells. As an
    interdisciplinary topic, it has been studied by biologists, communication
    theorists and a growing number of information theorists. This paper aims to
    specifically bring MC to the attention of information theorists. To do this, we
    first highlight the unique mathematical challenges of studying the capacity of
    molecular channels. Addressing these problems require use of known, or
    development of new mathematical tools. Toward this goal, we review a subjective
    selection of the existing literature on information theoretic aspect of
    molecular communication. The emphasis here is on the mathematical techniques
    used, rather than on the setup or modeling of a specific paper. Finally, as an
    example, we propose a concrete information theoretic problem that was motivated
    by our study of molecular communication.

    Limited Feedback in MISO Systems with Finite-Bit ADCs

    Jianhua Mo, Robert W. Heath Jr
    Comments: To appear in the Proceedings of 50th Asilomar Conference on Signals, Systems and Computers
    Subjects: Information Theory (cs.IT)

    We analyze limited feedback in systems where a multiple-antenna transmitter
    sends signals to single-antenna receivers with finite-bit ADCs. If channel
    state information (CSI) is not available with high resolution at the
    transmitter and the precoding is not well designed, the inter-user interference
    is a big decoding challenge for receivers with low-resolution quantization. In
    this paper, we derive achievable rates with finite-bit ADCs and finite-bit CSI
    feedback. The performance loss compared to the case with perfect CSI is then
    analyzed. The results show that the number of bits per feedback should increase
    linearly with the ADC resolution to restrict the loss.

    Graph based linear error correcting codes

    Monika Polak, Eustrat Zhupa
    Comments: 9 pages, 5 figures, 2 tables
    Subjects: Information Theory (cs.IT)

    In this article we present a construction of error correcting codes, that
    have representation as very sparse matrices and belong to the class of Low
    Density Parity Check Codes. LDPC codes are in the classical Hamming metric.
    They are very close to well known Shannon bound. The ability to use graphs for
    code construction was first discussed by Tanner in 1981 and has been used in a
    number of very effective implementations. We describe how to construct such
    codes by using special a family of graphs introduced by Ustimenko and Woldar.
    Graphs that we used are bipartite, bi-regular, very sparse and do not have
    short cycles C 4 . Due to the very low density of such graphs, the obtained
    codes are fast decodable. We describe how to choose parameters to obtain a
    desired code rate. We also show results of computer simulations of BER (bit
    error rate) of the obtained codes in order to compare them with other known
    LDPC codes.

    Optimal Design of Energy and Spectral Efficiency Tradeoff in One-Bit Massive MIMO Systems

    Yongzhi Li, Cheng Tao, Amine Mezghani, A. Lee Swindlehurst, Gonzalo Seco-Granados, Liu Liu
    Comments: 14 pages, 7 figures
    Subjects: Information Theory (cs.IT)

    This paper considers a single-cell massive multiple-input multiple-output
    (MIMO) system equipped with a base station (BS) that uses one-bit quantization
    and investigates the energy efficiency (EE) and spectral efficiency (SE)
    trade-off by simultaneously looking at the uplink and downlink transmission. To
    this end, we first propose a new precoding scheme and downlink power allocation
    strategy, which makes the uplink-downlink SINR duality hold in the one-bit MIMO
    systems. Then, by taking into account the effect of the imperfect channel state
    information (CSI), we obtain approximate closed-form expressions for the uplink
    achievable rate with maximum ratio combining (MRC) and zero-forcing (ZF)
    receivers, which, according to the duality property, can also be achieved in
    the downlink transmission. By employing the multiple objective optimization
    (MOO) framework, we focus on the optimal design for the EE and SE trade-off
    that jointly selects the number of active terminals, pilot training duration
    and operating power to maximize both EE and SE. The weighted Chebyshev method
    is used to obtain the Pareto boundary of EE and SE, which allows the system to
    know all the possible operating points and balance the EE and SE in an
    efficient way. In order to go beyond the Pareto boundary and actually solve the
    MOO problem, this problem is transformed into a single-objective problem using
    the a priori method such as the weighted product method, where the operating EE
    and SE are chosen with equal importance by adjusting the weights. Numerical
    results are presented to verify our analytical results and demonstrate the
    fundamental tradeoff between EE and SE for different parameter settings.

    Delay-Optimal Scheduling and Power Control for Instantaneous-Interference-Limited CRs

    Ahmed Ewaisha, Cihan Tepedelenlioglu
    Comments: Asilomar 2016 (to appear). arXiv admin note: text overlap with arXiv:1601.00608, arXiv:1602.08010
    Subjects: Information Theory (cs.IT)

    We study an uplink multi secondary user (SU) cognitive radio system suffering
    statistical heterogeneity among SUs’ channels. This heterogeneity may result in
    differentiated delay performances to these SUs and result in harmful
    interference to the PU. We first derive an explicit closed-form expression for
    the average delay in terms of an arbitrary power-control policy. Then, we
    propose a delay-optimal closed-form scheduling and power-control policy that
    can provide the required average delay guarantees to all SUs besides protecting
    the PU from harmful interference. We support our findings by extensive system
    simulations and show that it outperforms existing policies substantially.

    Dynamic Scheduling for Delay Guarantees for Heterogeneous Cognitive Radio Users

    Ahmed Ewaisha, Cihan Tepedelenlioglu
    Comments: Asilomar 2015. arXiv admin note: text overlap with arXiv:1602.08010
    Subjects: Information Theory (cs.IT)

    We study an uplink multi secondary user (SU) system having statistical delay
    constraints, and an average interference constraint to the primary user (PU).
    SUs with heterogeneous interference channel statistics, to the PU, experience
    heterogeneous delay performances since SUs causing low interference are
    scheduled more frequently than those causing high interference. We propose a
    scheduling algorithm that can provide arbitrary average delay guarantees to SUs
    irrespective of their statistical channel qualities. We derive the algorithm
    using the Lyapunov technique and show that it yields bounded queues and satisfy
    the interference constraints. Using simulations, we show its superiority over
    the Max-Weight algorithm.

    Locally Recoverable Codes with Availability (tgeq 2) from Fiber Products of Curves

    Kathryn Haymaker, Beth Malmskog, Gretchen Matthews
    Subjects: Number Theory (math.NT); Information Theory (cs.IT)

    We generalize the construction of locally recoverable codes on algebraic
    curves given by Barg, Tamo and Vladut to those with arbitrarily many recovery
    sets by exploiting the structure of fiber products of curves. Employing maximal
    curves, we create several new families of locally recoverable codes with
    multiple recovery sets, including codes with two recovery sets from the
    generalized Giulietti and Korchm’aros (GK) curves and the Suzuki curves, and a
    new locally recoverable code with many recovery sets based on the Hermitian
    curve, using a fiber product construction of Van der Geer and Van der Vlugt. In
    addition, we consider the relationship between local error recovery and global
    error correction, and the availability required to locally recover any pattern
    of a fixed number of erasures.

    Symmetric blind information reconciliation for quantum key distribution

    E.O. Kiktenko, A.S. Trushechkin, C.C.W. Lim, Y.V. Kurochkin, A.K. Fedorov
    Comments: 10 pages, 4 figures
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    Quantum key distribution (QKD) is a quantum-proof key exchange scheme which
    is fast approaching the communication industry. An essential component in QKD
    is the information reconciliation step, which is used for correcting the
    quantum channel noise errors. The recently suggested blind reconciliation
    technique, based on low-density parity-check (LDPC) codes, offers remarkable
    prospectives for efficient information reconciliation without an a priori error
    rate estimation. In the present work, we suggest an improvement of the blind
    information reconciliation protocol allowing significant increase the
    efficiency of the procedure and reducing its interactivity. The proposed
    technique is based on introducing symmetry in operations of parties, and the
    consideration of results of unsuccessful belief propagation decodings.

    Trace reconstruction with (exp( O( n^{1/3} ) )) samples

    Fedor Nazarov, Yuval Peres
    Comments: 9 pages
    Subjects: Probability (math.PR); Information Theory (cs.IT); Statistics Theory (math.ST)

    In the trace reconstruction problem, an unknown bit string (x in {0,1}^n)
    is observed through the deletion channel, which deletes each bit of (x) with
    some constant probability (q), yielding a contracted string (widetilde{x}).
    How many independent copies of (widetilde{x}) are needed to reconstruct (x)
    with high probability? Prior to this work, the best upper bound, due to
    Holenstein, Mitzenmacher, Panigrahy, and Wieder (2008), was
    (exp(widetilde{O}(n^{1/2}))). We improve this bound to (exp(O(n^{1/3})))
    using statistics of individual bits in the output and show that this bound is
    sharp in the restricted model where this is the only information used. Our
    method, that uses elementary complex analysis, can also handle insertions.

    Noisy subspace clustering via matching pursuits

    Michael Tschannen, Helmut Bölcskei
    Subjects: Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)

    Sparsity-based subspace clustering algorithms have attracted significant
    attention thanks to their excellent performance in practical applications. A
    prominent example is the sparse subspace clustering (SSC) algorithm by
    Elhamifar and Vidal, which performs spectral clustering based on an adjacency
    matrix obtained by sparsely representing each data point in terms of all the
    other data points via the Lasso. When the number of data points is large or the
    dimension of the ambient space is high, the computational complexity of SSC
    quickly becomes prohibitive. Dyer et al. observed that SSC-OMP obtained by
    replacing the Lasso by the greedy orthogonal matching pursuit (OMP) algorithm
    results in significantly lower computational complexity, while often yielding
    comparable performance. The central goal of this paper is an analytical
    performance characterization of SSC-OMP for noisy data. Moreover, we introduce
    and analyze the SSC-MP algorithm, which employs matching pursuit (MP) in lieu
    of OMP. Both SSC-OMP and SSC-MP are proven to succeed even when the subspaces
    intersect and when the data points are contaminated by severe noise. The
    clustering conditions we obtain for SSC-OMP and SSC-MP are similar to those for
    SSC and for the thresholding-based subspace clustering (TSC) algorithm due to
    Heckel and B”olcskei. Analytical results in combination with numerical results
    indicate that both SSC-OMP and SSC-MP with a data-dependent stopping criterion
    automatically detect the dimensions of the subspaces underlying the data.
    Moreover, experiments on synthetic and real data show that SSC-MP compares very
    favorably to SSC, SSC-OMP, TSC, and the nearest subspace neighbor (NSN)
    algorithm, both in terms of clustering performance and running time. In
    addition, we find that, in contrast to SSC-OMP, the performance of SSC-MP is
    very robust with respect to the choice of parameters in the stopping criteria.

    Quantum dynamical entropy, chaotic unitaries and complex Hadamard matrices

    Wojciech Słomczyński, Anna Szczepanek
    Comments: 8 double column pages, 4 figures
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Mathematical Physics (math-ph)

    We introduce two information-theoretical invariants for the projective
    unitary group acting on a finite-dimensional complex Hilbert space: PVM- and
    POVM-dynamical (quantum) entropies. They quantify the randomness of the
    successive quantum measurement results in the case where the evolution of the
    system between each two consecutive measurements is described by a given
    unitary operator. We study the class of chaotic unitaries, i.e., the ones of
    maximal entropy or, equivalently, such that they can be represented by suitably
    rescaled complex Hadamard matrices in some orthonormal bases. We provide
    necessary conditions for a unitary operator to be chaotic, which become also
    sufficient for qubits and qutrits. These conditions are expressed in terms of
    the relation between the trace and the determinant of the operator. We also
    compute the volume of the set of chaotic unitaries in dimensions two and three,
    and the average PVM-dynamical entropy over the unitary group in dimension two.
    We prove that this mean value behaves as the logarithm of the dimension of the
    Hilbert space, which implies that the probability that the dynamical entropy of
    a unitary is almost as large as possible approaches unity as the dimension
    tends to infinity.




沪ICP备19023445号-2号
友情链接