
    arXiv Paper Daily: Thu, 27 Oct 2016

    Published by 我爱机器学习 (52ml.net) on 2016-10-27 00:00:00

    Computer Vision and Pattern Recognition

    Mask-off: Synthesizing Face Images in the Presence of Head-mounted Displays

    Yajie Zhao, Qingguo Xu, Xinyu Huang, Ruigang Yang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A head-mounted display (HMD) could be an important component of an augmented
    reality system. However, as the upper face region is seriously occluded by the
    device, the user experience could be affected in applications such as
    telecommunication and multi-player video games. In this paper, we first present
    a novel experimental setup that consists of two near-infrared (NIR) cameras to
    point to the eye regions and one visible-light RGB camera to capture the
    visible face region. The main purpose of this paper is to synthesize realistic
    face images without occlusions based on the images captured by these cameras.
    To this end, we propose a novel synthesis framework that contains four modules:
    3D head reconstruction, face alignment and tracking, face synthesis, and eye
    synthesis. In face synthesis, we propose a novel algorithm that can robustly
    align and track a personalized 3D head model given a face that is severely
    occluded by the HMD. In eye synthesis, in order to generate accurate eye
    movements and dynamic wrinkle variations around eye regions, we propose another
    novel algorithm to colorize the NIR eye images and further remove the “red eye”
    effects caused by the colorization. Results show that both the hardware setup and
    the system framework are robust enough to synthesize realistic face images in video
    sequences.

    Estimating the concentration of gold nanoparticles incorporated on Natural Rubber membranes using Multi-Level Starlet Optimal Segmentation

    Alexandre Fioravante de Siqueira, Flávio Camargo Cabrera, Aylton Pagamisse, Aldo Eloizo Job
    Comments: 22 pages, 8 figures
    Journal-ref: J Nanopart Res (2014) 16: 2809
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This study consolidates Multi-Level Starlet Segmentation (MLSS) and
    Multi-Level Starlet Optimal Segmentation (MLSOS), techniques for
    photomicrograph segmentation that use starlet wavelet detail levels to separate
    areas of interest in an input image. Several segmentation levels can be
    obtained using Multi-Level Starlet Segmentation; after that, Matthews
    correlation coefficient (MCC) is used to choose an optimal segmentation level,
    giving rise to Multi-Level Starlet Optimal Segmentation. In this paper, MLSOS
    is employed to estimate the concentration of gold nanoparticles with diameter
    around 47 nm, reduced on natural rubber membranes. These samples were used in
    the construction of SERS/SERRS substrates and in the study of the influence of
    natural rubber membranes with incorporated gold nanoparticles on Leishmania
    braziliensis physiology. Precision, recall and accuracy are used to evaluate
    the segmentation performance, and MLSOS presents accuracy greater than 88% for
    this application.
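
    The MLSOS selection step lends itself to a short illustration. Below is a minimal sketch (not the authors' code), assuming the candidate segmentations from the starlet detail levels are already computed, that picks the level maximizing the Matthews correlation coefficient against a ground-truth mask:

    ```python
    import numpy as np
    from sklearn.metrics import matthews_corrcoef

    def select_optimal_level(candidate_masks, ground_truth):
        """Pick the segmentation level with the highest MCC (the MLSOS selection step).

        candidate_masks: list of binary arrays, one per starlet detail level (assumed precomputed)
        ground_truth:    binary array of the same shape
        """
        gt = ground_truth.ravel().astype(int)
        scores = [matthews_corrcoef(gt, m.ravel().astype(int)) for m in candidate_masks]
        best = int(np.argmax(scores))
        return best, scores[best]

    # toy usage: three fake candidate segmentations of a 64x64 photomicrograph
    rng = np.random.default_rng(0)
    truth = rng.random((64, 64)) > 0.7
    candidates = [truth ^ (rng.random((64, 64)) > p) for p in (0.80, 0.90, 0.97)]
    level, mcc = select_optimal_level(candidates, truth)
    print(f"optimal level: {level}, MCC = {mcc:.3f}")
    ```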

    Universal adversarial perturbations

    Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, Pascal Frossard
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Given a state-of-the-art deep neural network classifier, we show the
    existence of a universal (image-agnostic) and very small perturbation vector
    that causes natural images to be misclassified with high probability. We
    propose a systematic algorithm for computing universal perturbations, and show
    that state-of-the-art deep neural networks are highly vulnerable to such
    perturbations, albeit being quasi-imperceptible to the human eye. We further
    empirically analyze these universal perturbations and show, in particular, that
    they generalize very well across neural networks. The surprising existence of
    universal perturbations reveals important geometric correlations among the
    high-dimensional decision boundary of classifiers. It further outlines
    potential security breaches with the existence of single directions in the
    input space that adversaries can possibly exploit to break a classifier on most
    natural images.
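
    A hedged sketch of how such a perturbation is used once computed (the computation algorithm itself is not reproduced here): the same vector v is added to every image, clipped to a valid pixel range, and the fooling rate is the fraction of images whose predicted label changes. The classifier and perturbation below are toy placeholders:

    ```python
    import numpy as np

    def fooling_rate(images, perturbation, predict):
        """Fraction of images whose label changes after adding one universal perturbation.

        images:       array of shape (N, H, W, C), pixel values in [0, 255]
        perturbation: array of shape (H, W, C), the universal (image-agnostic) vector v
        predict:      any callable mapping a batch of images to integer class labels
        """
        clean_labels = predict(images)
        adv_labels = predict(np.clip(images + perturbation, 0, 255))
        return float(np.mean(clean_labels != adv_labels))

    # toy usage with a dummy "classifier" that thresholds mean intensity
    dummy_predict = lambda x: (x.reshape(len(x), -1).mean(axis=1) > 127).astype(int)
    images = np.random.randint(0, 256, size=(8, 32, 32, 3)).astype(float)
    v = np.random.uniform(-10, 10, size=(32, 32, 3))  # stands in for a computed perturbation
    print("fooling rate:", fooling_rate(images, v, dummy_predict))
    ```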

    Video Analysis of "YouTube Funnies" to Aid the Study of Human Gait and Falls – Preliminary Results and Proof of Concept

    Babak Taati, Pranay Lohia, Avril Mansfield, Ahmed Ashraf
    Comments: 4 pages, 3 figures, 2 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Because falls are funny, YouTube and other video sharing sites contain a
    large repository of real-life falls. We propose extracting gait and balance
    information from these videos to help us better understand some of the factors
    that contribute to falls. Proof-of-concept is explored in a single video
    containing multiple (n=14) falls/non-falls in the presence of an unexpected
    obstacle. The analysis explores: computing spatiotemporal parameters of gait in
    a video captured from an arbitrary viewpoint; the relationship between
    parameters of gait from the last few steps before the obstacle and falling vs.
    not falling; and the predictive capacity of a multivariate model in predicting
    a fall in the presence of an unexpected obstacle. Homography transformations
    correct the perspective projection distortion and allow for the consistent
    tracking of gait parameters as an individual walks in an arbitrary direction in
    the scene. A synthetic top view allows for computing the average stride length
    and a synthetic side view allows for measuring up and down motions of the head.
    In leave-one-out cross-validation, we were able to correctly predict whether a
    person would fall or not in 11 out of the 14 cases (78.6%), just by looking at
    the average stride length and the range of vertical head motion during the 1-4
    most recent steps prior to reaching the obstacle.
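
    The homography step described above can be illustrated with a small OpenCV sketch, not the authors' pipeline: four hand-picked ground-plane correspondences (the coordinates below are made up) map the arbitrary camera view to a synthetic top view in which stride length can be measured in consistent units:

    ```python
    import numpy as np
    import cv2

    # four ground-plane points in the video frame (pixels) and their positions in a
    # synthetic top view (e.g. centimetres); the values here are made-up placeholders
    image_pts = np.float32([[320, 700], [960, 690], [880, 420], [400, 430]])
    world_pts = np.float32([[0, 0], [300, 0], [300, 500], [0, 500]])
    H, _ = cv2.findHomography(image_pts, world_pts)

    def to_top_view(points_px):
        """Map pixel coordinates on the ground plane into the synthetic top view."""
        pts = np.float32(points_px).reshape(-1, 1, 2)
        return cv2.perspectiveTransform(pts, H).reshape(-1, 2)

    # e.g. stride length from two consecutive heel strikes tracked in the frame
    p1, p2 = to_top_view([(500, 660), (585, 648)])
    print("approximate stride length:", np.linalg.norm(p2 - p1))
    ```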

    Incremental Nonparametric Weighted Feature Extraction for Online Subspace Pattern Classification

    Hamid Abrishami Moghaddam, Elaheh Raisi
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, a new online method based on nonparametric weighted feature
    extraction (NWFE) is proposed. NWFE was introduced to enjoy optimum
    characteristics of linear discriminant analysis (LDA) and nonparametric
    discriminant analysis (NDA) while rectifying their drawbacks. It emphasizes the
    points near the decision boundary by putting greater weights on them and
    deemphasizes other points. Incremental nonparametric weighted feature
    extraction (INWFE) is the online version of NWFE. INWFE has advantages of NWFE
    method such as extracting more than L-1 features in contrast to LDA. It is
    independent of the class distribution and performs well on data with complex
    distributions. The effects of outliers are reduced due to the nature of its
    nonparametric scatter matrix. Furthermore, it is possible to add new samples
    asynchronously, i.e. whenever a new sample becomes available at any given time,
    it can be added to the algorithm. This is useful for many real world
    applications since all data cannot be available in advance. This method is
    implemented on Gaussian and non-Gaussian multidimensional data, a number of UCI
    datasets, and the Indian Pines dataset. Results are compared with NWFE in terms of
    classification accuracy and execution time. For the nearest neighbour classifier,
    the results show that this technique converges to NWFE at the end of the learning process. In
    addition, the computational complexity is reduced in comparison with NWFE in
    terms of execution time.

    Predicting First Impressions with Deep Learning

    Mel McCurrie, Fernando Beletti, Lucas Parzianello, Allen Westendorp, Samuel Anthony, Walter Scheirer
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Describable visual facial attributes are now commonplace in human biometrics
    and affective computing, with existing algorithms even reaching a sufficient
    point of maturity for placement into commercial products. These algorithms
    model objective facets of facial appearance, such as hair and eye color,
    expression, and aspects of the geometry of the face. A natural extension, which
    has not been studied to any great extent thus far, is the ability to model
    subjective attributes that are assigned to a face based purely on visual
    judgements. For instance, with just a glance, our first impression of a face
    may lead us to believe that a person is smart, worthy of our trust, and perhaps
    even our admiration – regardless of the underlying truth behind such
    attributes. Psychologists believe that these judgements are based on a variety
    of factors such as emotional states, personality traits, and other physiognomic
    cues. But work in this direction leads to an interesting question: how do we
    create models for problems where there is no ground truth, only measurable
    behavior? In this paper, we introduce a new convolutional neural network-based
    regression framework that allows us to train predictive models of crowd
    behavior for social attribute assignment. Over images from the AFLW face
    database, these models demonstrate strong correlations with human crowd
    ratings.

    The Event-Camera Dataset: Event-based Data for Pose Estimation, Visual Odometry, and SLAM

    Elias Mueggler, Henri Rebecq, Guillermo Gallego, Tobi Delbruck, Davide Scaramuzza
    Comments: 7 pages, 4 figures, 3 tables
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

    New vision sensors, such as the Dynamic and Active-pixel Vision sensor
    (DAVIS), incorporate a conventional global-shutter camera and an event-based
    sensor in the same pixel array. These sensors have great potential for
    high-speed robotics and computer vision because they allow us to combine the
    benefits of conventional cameras with those of event-based sensors: low
    latency, high temporal resolution, and very high dynamic range. However, new
    algorithms are required to exploit the sensor characteristics and cope with its
    unconventional output, which consists of a stream of asynchronous brightness
    changes (called “events”) and synchronous grayscale frames. For this purpose,
    we present and release a collection of datasets captured with a DAVIS in a
    variety of synthetic and real environments, which we hope will motivate
    research on new algorithms for high-speed and high-dynamic-range robotics and
    computer-vision applications. In addition to global-shutter intensity images
    and asynchronous events, we also provide inertial measurements and ground truth
    from a motion-capture system. All the data are released both as standard text
    files and binary files (i.e., rosbag). This paper provides an overview of the
    available data.
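
    A short loading sketch; the column layout assumed here (one event per line as timestamp, x, y, polarity, at DAVIS240 resolution) is an assumption about the released text files, not something stated in the abstract:

    ```python
    import numpy as np

    def load_events(path):
        """Load events from a text file assumed to hold one event per line: t x y polarity."""
        data = np.loadtxt(path)
        return data[:, 0], data[:, 1].astype(int), data[:, 2].astype(int), data[:, 3].astype(int)

    def accumulate_frame(x, y, p, width=240, height=180):
        """Integrate event polarities into a 2D image (DAVIS240 resolution assumed)."""
        frame = np.zeros((height, width), dtype=np.int32)
        np.add.at(frame, (y, x), np.where(p > 0, 1, -1))
        return frame

    # usage (file name is hypothetical): t, x, y, p = load_events("events.txt")
    #                                    img = accumulate_frame(x, y, p)
    ```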

    Image Segmentation for Fruit Detection and Yield Estimation in Apple Orchards

    Suchet Bargoti, James Underwood
    Comments: This paper is the initial version of the manuscript submitted to The Journal of Field Robotics in May 2016. Following reviews and revisions, the paper has been accepted for publication. The reviewed version includes extended comparison between the different classification frameworks and a more in-depth literature review
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Ground vehicles equipped with monocular vision systems are a valuable source
    of high resolution image data for precision agriculture applications in
    orchards. This paper presents an image processing framework for fruit detection
    and counting using orchard image data. A general purpose image segmentation
    approach is used, including two feature learning algorithms: multi-scale
    Multi-Layered Perceptrons (MLP) and Convolutional Neural Networks (CNN). These
    networks were extended by including contextual information about how the image
    data was captured (metadata), which correlates with some of the appearance
    variations and/or class distributions observed in the data. The pixel-wise
    fruit segmentation output is processed using the Watershed Segmentation (WS)
    and Circular Hough Transform (CHT) algorithms to detect and count individual
    fruits. Experiments were conducted in a commercial apple orchard near
    Melbourne, Australia. The results show an improvement in fruit segmentation
    performance with the inclusion of metadata on the previously benchmarked MLP
    network. We extend this work with CNNs, bringing agrovision closer to the
    state-of-the-art in computer vision, where although metadata had negligible
    influence, the best pixel-wise F1-score of 0.791 was achieved. The WS
    algorithm produced the best apple detection and counting results, with a
    detection F1-score of 0.858. As a final step, image fruit counts were
    accumulated over multiple rows at the orchard and compared against the
    post-harvest fruit counts that were obtained from a grading and counting
    machine. The count estimates using CNN and WS resulted in the best performance
    for this dataset, with a squared correlation coefficient of r^2 = 0.826.
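
    The detection stage can be sketched compactly, assuming a pixel-wise fruit mask is already available (e.g. thresholded MLP/CNN output). A distance-transform watershed, one standard way to realize the WS step rather than the authors' exact implementation, splits touching fruit before counting:

    ```python
    import numpy as np
    from scipy import ndimage as ndi
    from skimage.feature import peak_local_max
    from skimage.segmentation import watershed  # a recent scikit-image is assumed

    def count_fruit(mask):
        """Count fruit regions in a binary segmentation mask via distance-transform watershed."""
        distance = ndi.distance_transform_edt(mask)
        # local maxima of the distance map serve as one marker per (assumed) fruit centre
        peaks = peak_local_max(distance, min_distance=10, labels=mask)
        markers = np.zeros_like(mask, dtype=int)
        markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
        labels = watershed(-distance, markers, mask=mask)
        return labels.max(), labels

    # toy usage: two touching discs should be split and counted as two fruit
    yy, xx = np.mgrid[0:100, 0:100]
    mask = ((xx - 40) ** 2 + (yy - 50) ** 2 < 15 ** 2) | ((xx - 62) ** 2 + (yy - 50) ** 2 < 15 ** 2)
    n, _ = count_fruit(mask)
    print("fruit count:", n)
    ```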


    Artificial Intelligence

    New Liftable Classes for First-Order Probabilistic Inference

    Seyed Mehran Kazemi, Angelika Kimmig, Guy Van den Broeck, David Poole
    Comments: Accepted at NIPS-2016. 22 pages, 1 figure
    Subjects: Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Statistical relational models provide compact encodings of probabilistic
    dependencies in relational domains, but result in highly intractable graphical
    models. The goal of lifted inference is to carry out probabilistic inference
    without needing to reason about each individual separately, by instead treating
    exchangeable, undistinguished objects as a whole. In this paper, we study the
    domain recursion inference rule, which, despite its central role in early
    theoretical results on domain-lifted inference, has later been believed
    redundant. We show that this rule is more powerful than expected, and in fact
    significantly extends the range of models for which lifted inference runs in
    time polynomial in the number of individuals in the domain. This includes an
    open problem called S4, the symmetric transitivity model, and a first-order
    logic encoding of the birthday paradox. We further identify new classes S2FO2
    and S2RU of domain-liftable theories, which respectively subsume FO2 and
    recursively unary theories, the largest classes of domain-liftable theories
    known so far, and show that using domain recursion can achieve exponential
    speedup even in theories that cannot fully be lifted with the existing set of
    inference rules.

    A self-tuning Firefly algorithm to tune the parameters of Ant Colony System (ACSFA)

    M. K. A. Ariyaratne, T. G. I. Fernando, S. Weerakoon
    Comments: 18 pages, 21 references, 5 figures
    Subjects: Artificial Intelligence (cs.AI)

    Ant colony system (ACS) is a promising approach which has been widely used in
    problems such as Travelling Salesman Problems (TSP), Job shop scheduling
    problems (JSP) and Quadratic Assignment problems (QAP). In its original
    implementation, parameters of the algorithm were selected by a trial-and-error
    approach. Over the last few years, novel approaches have been proposed for
    adapting the parameters of ACS to improve its performance. The aim of this
    paper is to use a framework introduced for self-tuning optimization algorithms
    combined with the firefly algorithm (FA) to tune the parameters of the ACS
    solving symmetric TSP problems. The FA optimizes the problem specific
    parameters of ACS while the parameters of the FA are tuned by the selected
    framework itself. With this approach, the user neither has to work with the
    parameters of ACS nor the parameters of FA. Using common symmetric TSP problems
    we demonstrate that the framework fits well for the ACS. A detailed statistical
    analysis further verifies that the new ACS outperforms both the existing ACS and
    the other techniques used to tune the parameters of ACS.

    A Physician Advisory System for Chronic Heart Failure Management Based on Knowledge Patterns

    Zhuo Chen, Kyle Marple, Elmer Salazar, Gopal Gupta, Lakshman Tamil
    Comments: Paper presented at the 32nd International Conference on Logic Programming (ICLP 2016), New York City, USA, 16-21 October 2016, 14 pages, LaTeX
    Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Programming Languages (cs.PL)

    Management of chronic diseases such as heart failure, diabetes, and chronic
    obstructive pulmonary disease (COPD) is a major problem in health care. A
    standard approach that the medical community has devised to manage widely
    prevalent chronic diseases such as chronic heart failure (CHF) is to have a
    committee of experts develop guidelines that all physicians should follow.
    These guidelines typically consist of a series of complex rules that make
    recommendations based on a patient’s information. Due to their complexity,
    often the guidelines are either ignored or not complied with at all, which can
    result in poor medical practices. It is not even clear whether it is humanly
    possible to follow these guidelines due to their length and complexity. In the
    case of CHF management, the guidelines run nearly 80 pages. In this paper we
    describe a physician-advisory system for CHF management that codes the entire
    set of clinical practice guidelines for CHF using answer set programming. Our
    approach is based on developing reasoning templates (that we call knowledge
    patterns) and using these patterns to systematically code the clinical guidelines
    for CHF as ASP rules. Use of the knowledge patterns greatly facilitates the
    development of our system. Given a patient’s medical information, our system
    generates a recommendation for treatment just as a human physician would, using
    the guidelines. Our system will work even in the presence of incomplete
    information. Our work makes two contributions: (i) it shows that highly complex
    guidelines can be successfully coded as ASP rules, and (ii) it develops a
    series of knowledge patterns that facilitate the coding of knowledge expressed
    in a natural language and that can be used for other application domains. This
    paper is under consideration for acceptance in TPLP.

    Kissing Cuisines: Exploring Worldwide Culinary Habits on the Web

    Sina Sajadmanesh, Sina Jafarzadeh, Seyed Ali Ossia, Hamid R. Rabiee, Hamed Haddadi, Yelena Mejova, Mirco Musolesi, Emiliano De Cristofaro, Gianluca Stringhini
    Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)

    As food becomes an important part of modern life, recipes shared on the web
    are a great indicator of civilizations and culinary attitudes in different
    countries. Similarly, ingredients, flavors, and nutrition information are
    strong signals of the taste preferences of individuals from various parts of
    the world. Yet, we do not have a thorough understanding of these palate
    varieties.

    In this paper, we present a large-scale study of recipes published on the Web
    and their content, aiming to understand cuisines and culinary habits around the
    world. Using a database of more than 157K recipes from over 200 different
    cuisines, we analyze ingredients, flavors, and nutritional values which
    distinguish dishes from different regions, and use this knowledge to assess the
    predictability of recipes from different cuisines. We then use country health
    statistics to understand the relation between these factors and health
    indicators of different nations, such as obesity, diabetes, migration, and
    health expenditure. Our results confirm the strong effects of geographical and
    cultural similarities on recipes, health indicators, and culinary preferences
    between countries around the world.

    Universal adversarial perturbations

    Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, Pascal Frossard
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Given a state-of-the-art deep neural network classifier, we show the
    existence of a universal (image-agnostic) and very small perturbation vector
    that causes natural images to be misclassified with high probability. We
    propose a systematic algorithm for computing universal perturbations, and show
    that state-of-the-art deep neural networks are highly vulnerable to such
    perturbations, albeit being quasi-imperceptible to the human eye. We further
    empirically analyze these universal perturbations and show, in particular, that
    they generalize very well across neural networks. The surprising existence of
    universal perturbations reveals important geometric correlations among the
    high-dimensional decision boundary of classifiers. It further outlines
    potential security breaches with the existence of single directions in the
    input space that adversaries can possibly exploit to break a classifier on most
    natural images.

    Quantum-enhanced machine learning

    Vedran Dunjko, Jacob M. Taylor, Hans J. Briegel
    Comments: 5+15 pages. This paper builds upon and mostly supersedes arXiv:1507.08482. In addition to results provided in this previous work, here we achieve learning improvements in more general environments, and provide connections to other work in quantum machine learning. Explicit constructions of oracularized environments given in arXiv:1507.08482 are omitted in this version
    Journal-ref: Phys. Rev. Lett. 117, 130501 (2016)
    Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Learning (cs.LG)

    The emerging field of quantum machine learning has the potential to
    substantially aid in the problems and scope of artificial intelligence. This is
    only enhanced by recent successes in the field of classical machine learning.
    In this work we propose an approach for the systematic treatment of machine
    learning, from the perspective of quantum information. Our approach is general
    and covers all three main branches of machine learning: supervised,
    unsupervised and reinforcement learning. While quantum improvements in
    supervised and unsupervised learning have been reported, reinforcement learning
    has received much less attention. Within our approach, we tackle the problem of
    quantum enhancements in reinforcement learning as well, and propose a
    systematic scheme for providing improvements. As an example, we show that
    quadratic improvements in learning efficiency, and exponential improvements in
    performance over limited time periods, can be obtained for a broad class of
    learning problems.

    Infinite-dimensional Log-Determinant divergences II: Alpha-Beta divergences

    Minh Ha Quang
    Comments: 60 pages
    Subjects: Functional Analysis (math.FA); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)

    This work presents a parametrized family of divergences, namely Alpha-Beta
    Log-Determinant (Log-Det) divergences, between positive definite unitized
    trace class operators on a Hilbert space. This is a generalization of the
    Alpha-Beta Log-Determinant divergences between symmetric, positive definite
    matrices to the infinite-dimensional setting. The family of Alpha-Beta Log-Det
    divergences is highly general and contains many divergences as special cases,
    including the recently formulated infinite dimensional affine-invariant
    Riemannian distance and the infinite-dimensional Alpha Log-Det divergences
    between positive definite unitized trace class operators. In particular, it
    includes a parametrized family of metrics between positive definite trace class
    operators, with the affine-invariant Riemannian distance and the square root of
    the symmetric Stein divergence being special cases. For the Alpha-Beta Log-Det
    divergences between covariance operators on a Reproducing Kernel Hilbert Space
    (RKHS), we obtain closed form formulas via the corresponding Gram matrices.


    Information Retrieval

    Inferring individual attributes from search engine queries and auxiliary information

    Luca Soldaini, Elad Yom-Tov
    Subjects: Information Retrieval (cs.IR)

    Internet data has surfaced as a primary source for investigation of different
    aspects of human behavior. A crucial step in such studies is finding a suitable
    cohort (i.e., a set of users) that shares a common trait of interest to
    researchers. However, direct identification of users sharing this trait is
    often impossible, as the data available to researchers is usually anonymized to
    preserve user privacy. To facilitate research on specific topics of interest,
    especially in medicine, we introduce an algorithm for identifying a trait of
    interest in anonymous users. We illustrate how a small set of labeled examples,
    together with statistical information about the entire population, can be
    aggregated to obtain labels on unseen examples. We validate our approach using
    labeled data from the political domain.

    We provide two applications of the proposed algorithm to the medical domain.
    In the first, we demonstrate how to identify users whose search patterns
    indicate they might be suffering from certain types of cancer. In the second,
    we detail an algorithm to predict the distribution of diseases given their
    incidence in a subset of the population at study, making it possible to predict
    disease spread from partial epidemiological data.

    Learning to Match Using Local and Distributed Representations of Text for Web Search

    Bhaskar Mitra, Fernando Diaz, Nick Craswell
    Subjects: Information Retrieval (cs.IR)

    Models such as latent semantic analysis and those based on neural embeddings
    learn distributed representations of text, and match the query against the
    document in the latent semantic space. In traditional information retrieval
    models, on the other hand, terms have discrete or local representations, and
    the relevance of a document is determined by the exact matches of query terms
    in the body text. We hypothesize that matching with distributed representations
    complements matching with traditional local representations, and that a
    combination of the two is favorable. We propose a novel document ranking model
    composed of two separate deep neural networks, one that matches the query and
    the document using a local representation, and another that matches the query
    and the document using learned distributed representations. The two networks
    are jointly trained as part of a single neural network. We show that this
    combination or ‘duet’ performs significantly better than either neural network
    individually on a Web page ranking task, and also significantly outperforms
    traditional baselines and other recently proposed models based on neural
    networks.

    A recommender system for efficient discovery of new anomalies in large-scale access logs

    Heju Jiang, Scott Algatt, Parvez Ahammad
    Comments: 9 pages, 6 figures
    Subjects: Information Retrieval (cs.IR); Cryptography and Security (cs.CR)

    We present a novel, non-standard recommender system for large-scale security
    policy management (SPM). Our system Helios discovers and recommends unknown and
    unseen anomalies in large-scale access logs with minimal supervision and no
    starting information on users and items. Typical recommender systems assume
    availability of user- and item-related information, but such information is not
    usually available in access logs. To resolve this problem, we first use
    discrete categorical labels to construct categorical combinations from access
    logs in a bootstrapping manner. Then, we utilize rank statistics of entity rank
    and order categorical combinations for recommendation. From a double-sided cold
    start, with minimal supervision, Helios learns to recommend the most salient
    anomalies at large scale, and provides visualizations to security experts to
    explain the rationale behind the recommendations. Our experiments show Helios to be
    suitable for large-scale applications: from cold starts, in less than 60
    minutes, Helios can analyze roughly 4.6 billion records in logs of 400GB with
    about 300 million potential categorical combinations, then generate ranked
    categorical combinations as recommended discoveries. We also show that, even
    with limited computing resources, Helios accelerates unknown and unseen anomaly
    discovery process for SPM by 1 to 3 orders of magnitude, depending on use
    cases. In addition, Helios’ design is flexible with metrics and measurement
    fields used for discoveries and recommendations. Overall, our system leads to
    more efficient and customizable SPM processes with faster discoveries of unseen
    and unknown anomalies.

    Modeling Ambiguity, Subjectivity, and Diverging Viewpoints in Opinion Question Answering Systems

    Mengting Wan, Julian McAuley
    Comments: 10 pages, accepted by ICDM’2016
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    Product review websites provide an incredible lens into the wide variety of
    opinions and experiences of different people, and play a critical role in
    helping users discover products that match their personal needs and
    preferences. To help address questions that can’t easily be answered by reading
    others’ reviews, some review websites also allow users to pose questions to the
    community via a question-answering (QA) system. As one would expect, just as
    opinions diverge among different reviewers, answers to such questions may also
    be subjective, opinionated, and divergent. This means that answering such
    questions automatically is quite different from traditional QA tasks, where it
    is assumed that a single ‘correct’ answer is available. While recent work
    introduced the idea of question-answering using product reviews, it did not
    account for two aspects that we consider in this paper: (1) Questions have
    multiple, often divergent, answers, and this full spectrum of answers should
    somehow be used to train the system; and (2) What makes a ‘good’ answer depends
    on the asker and the answerer, and these factors should be incorporated in
    order for the system to be more personalized. Here we build a new QA dataset
    with 800 thousand questions—and over 3.1 million answers—and show that
    explicitly accounting for personalization and ambiguity leads not only to
    quantitatively better answers but also to a more nuanced view of the range of
    supporting, but subjective, opinions.

    Dis-S2V: Discourse Informed Sen2Vec

    Tanay Kumar Saha, Shafiq Joty, Naeemul Hassan, Mohammad Al Hasan
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Vector representation of sentences is important for many text processing
    tasks that involve clustering, classifying, or ranking sentences. Recently,
    distributed representation of sentences learned by neural models from unlabeled
    data has been shown to outperform the traditional bag-of-words representation.
    However, most of these learning methods consider only the content of a sentence
    and disregard the relations among sentences in a discourse by and large.

    In this paper, we propose a series of novel models for learning latent
    representations of sentences (Sen2Vec) that consider the content of a sentence
    as well as inter-sentence relations. We first represent the inter-sentence
    relations with a language network and then use the network to induce contextual
    information into the content-based Sen2Vec models. Two different approaches are
    introduced to exploit the information in the network. Our first approach
    retrofits (already trained) Sen2Vec vectors with respect to the network in two
    different ways: (1) using the adjacency relations of a node, and (2) using a
    stochastic sampling method which is more flexible in sampling neighbors of a
    node. The second approach uses a regularizer to encode the information in the
    network into the existing Sen2Vec model. Experimental results show that our
    proposed models outperform existing methods in three fundamental information
    system tasks demonstrating the effectiveness of our approach. The models
    leverage the computational power of multi-core CPUs to achieve fine-grained
    computational efficiency. We make our code publicly available upon acceptance.
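
    The first (retrofitting) approach can be illustrated with a small sketch under assumptions: pre-trained Sen2Vec vectors are nudged toward their neighbours in the discourse network with a Faruqui-style iterative update; the uniform weights and the toy network below are placeholders, not the paper's:

    ```python
    import numpy as np

    def retrofit(vectors, neighbours, alpha=1.0, beta=1.0, iters=10):
        """Nudge pre-trained sentence vectors toward their discourse-network neighbours.

        vectors:    dict sentence_id -> np.ndarray (pre-trained Sen2Vec vectors)
        neighbours: dict sentence_id -> list of adjacent sentence_ids in the language network
        alpha/beta: weights on staying near the original vector vs. agreeing with neighbours
                    (uniform weights are an assumption of this sketch)
        """
        new = {k: v.copy() for k, v in vectors.items()}
        for _ in range(iters):
            for s, nbrs in neighbours.items():
                nbrs = [n for n in nbrs if n in new]
                if not nbrs:
                    continue
                new[s] = (alpha * vectors[s] + beta * sum(new[n] for n in nbrs)) / (alpha + beta * len(nbrs))
        return new

    # toy usage: three sentences, the middle one adjacent to both others
    vecs = {i: np.random.randn(5) for i in range(3)}
    net = {0: [1], 1: [0, 2], 2: [1]}
    retrofitted = retrofit(vecs, net)
    ```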


    Computation and Language

    Distraction-Based Neural Networks for Document Summarization

    Qian Chen, Xiaodan Zhu, Zhenhua Ling, Si Wei, Hui Jiang
    Comments: Published in IJCAI-2016: the 25th International Joint Conference on Artificial Intelligence
    Journal-ref: IJCAI, 2016
    Subjects: Computation and Language (cs.CL)

    Distributed representations learned with neural networks have recently been shown
    to be effective in modeling natural languages at fine granularities such as words,
    phrases, and even sentences. Whether and how such an approach can be extended
    to help model larger spans of text, e.g., documents, is intriguing, and further
    investigation would still be desirable. This paper aims to enhance neural
    network models for such a purpose. A typical problem of document-level modeling
    is automatic summarization, which aims to model documents in order to generate
    summaries. In this paper, we propose neural models to train computers not just
    to pay attention to specific regions and content of input documents with
    attention models, but also to distract them to traverse between different content
    of a document so as to better grasp the overall meaning for summarization.
    Without engineering any features, we train the models on two large datasets.
    The models achieve the state-of-the-art performance, and they significantly
    benefit from the distraction modeling, particularly when input documents are
    long.

    Broad Context Language Modeling as Reading Comprehension

    Zewei Chu, Hai Wang, Kevin Gimpel, David McAllester
    Subjects: Computation and Language (cs.CL)

    Progress in text understanding has been driven by the availability of large
    datasets that test particular capabilities, like recent datasets for assessing
    reading comprehension. We focus here on the LAMBADA dataset, a word prediction
    task requiring broader context than the immediate sentence. We view the LAMBADA
    task as a reading comprehension problem and apply off-the-shelf comprehension
    models based on neural networks. Though these models are constrained to choose
    a word from the context, they improve the state of the art on LAMBADA from 7.3%
    to 45.4%. We analyze 100 instances, finding that neural network readers perform
    well in cases that involve selecting a name from the context based on dialogue
    or discourse cues but struggle when coreference resolution or external
    knowledge is needed.

    Content Selection in Data-to-Text Systems: A Survey

    Dimitra Gkatzia
    Subjects: Computation and Language (cs.CL)

    Data-to-text systems are powerful in generating reports from data
    automatically and thus they simplify the presentation of complex data. Rather
    than presenting data using visualisation techniques, data-to-text systems use
    natural (human) language, which is the most common way for human-human
    communication. In addition, data-to-text systems can adapt their output content
    to users’ preferences, background or interests and therefore they can be
    pleasant for users to interact with. Content selection is an important part of
    every data-to-text system, because it is the module that determines what part of
    the available information should be conveyed to the user. This survey initially
    introduces the field of data-to-text generation, describes the general
    data-to-text system architecture and then it reviews the state-of-the-art
    content selection methods. Finally, it provides recommendations for choosing an
    approach and discusses opportunities for future research.

    Dis-S2V: Discourse Informed Sen2Vec

    Tanay Kumar Saha, Shafiq Joty, Naeemul Hassan, Mohammad Al Hasan
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Vector representation of sentences is important for many text processing
    tasks that involve clustering, classifying, or ranking sentences. Recently,
    distributed representation of sentences learned by neural models from unlabeled
    data has been shown to outperform the traditional bag-of-words representation.
    However, most of these learning methods consider only the content of a sentence
    and disregard the relations among sentences in a discourse by and large.

    In this paper, we propose a series of novel models for learning latent
    representations of sentences (Sen2Vec) that consider the content of a sentence
    as well as inter-sentence relations. We first represent the inter-sentence
    relations with a language network and then use the network to induce contextual
    information into the content-based Sen2Vec models. Two different approaches are
    introduced to exploit the information in the network. Our first approach
    retrofits (already trained) Sen2Vec vectors with respect to the network in two
    different ways: (1) using the adjacency relations of a node, and (2) using a
    stochastic sampling method which is more flexible in sampling neighbors of a
    node. The second approach uses a regularizer to encode the information in the
    network into the existing Sen2Vec model. Experimental results show that our
    proposed models outperform existing methods in three fundamental information
    system tasks demonstrating the effectiveness of our approach. The models
    leverage the computational power of multi-core CPUs to achieve fine-grained
    computational efficiency. We make our code publicly available upon acceptance.

    Word Embeddings and Their Use In Sentence Classification Tasks

    Amit Mandelbaum, Adi Shalev
    Comments: 13 pages, 10 figures
    Subjects: Learning (cs.LG); Computation and Language (cs.CL)

    This paper has two parts. In the first part we discuss word embeddings. We
    discuss the need for them, some of the methods to create them, and some of
    their interesting properties. We also compare them to image embeddings and see
    how word embedding and image embedding can be combined to perform different
    tasks. In the second part we implement a convolutional neural network trained
    on top of pre-trained word vectors. The network is used for several
    sentence-level classification tasks, and achieves state-of-the-art (or comparable)
    results, demonstrating the great power of pre-trained word embeddings over
    random ones.
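
    A condensed sketch of the second part's setup, in the spirit of a Kim-style sentence CNN over word vectors; the embedding size, filter widths and random inputs below are placeholder assumptions, and the embeddings are random rather than pre-trained:

    ```python
    import torch
    import torch.nn as nn

    class SentenceCNN(nn.Module):
        def __init__(self, vocab_size, emb_dim=300, n_filters=100, widths=(3, 4, 5), n_classes=2):
            super().__init__()
            # in the paper's setting the embedding would be initialised from pre-trained vectors
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.convs = nn.ModuleList([nn.Conv1d(emb_dim, n_filters, w) for w in widths])
            self.fc = nn.Linear(n_filters * len(widths), n_classes)

        def forward(self, tokens):                 # tokens: (batch, seq_len) word ids
            x = self.emb(tokens).transpose(1, 2)   # -> (batch, emb_dim, seq_len)
            feats = [torch.relu(c(x)).max(dim=2).values for c in self.convs]  # max-over-time pooling
            return self.fc(torch.cat(feats, dim=1))  # class logits

    # toy forward pass on random token ids
    model = SentenceCNN(vocab_size=10000)
    logits = model(torch.randint(0, 10000, (4, 50)))
    print(logits.shape)  # torch.Size([4, 2])
    ```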

    Modeling Ambiguity, Subjectivity, and Diverging Viewpoints in Opinion Question Answering Systems

    Mengting Wan, Julian McAuley
    Comments: 10 pages, accepted by ICDM’2016
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    Product review websites provide an incredible lens into the wide variety of
    opinions and experiences of different people, and play a critical role in
    helping users discover products that match their personal needs and
    preferences. To help address questions that can’t easily be answered by reading
    others’ reviews, some review websites also allow users to pose questions to the
    community via a question-answering (QA) system. As one would expect, just as
    opinions diverge among different reviewers, answers to such questions may also
    be subjective, opinionated, and divergent. This means that answering such
    questions automatically is quite different from traditional QA tasks, where it
    is assumed that a single ‘correct’ answer is available. While recent work
    introduced the idea of question-answering using product reviews, it did not
    account for two aspects that we consider in this paper: (1) Questions have
    multiple, often divergent, answers, and this full spectrum of answers should
    somehow be used to train the system; and (2) What makes a ‘good’ answer depends
    on the asker and the answerer, and these factors should be incorporated in
    order for the system to be more personalized. Here we build a new QA dataset
    with 800 thousand questions—and over 3.1 million answers—and show that
    explicitly accounting for personalization and ambiguity leads not only to
    quantitatively better answers but also to a more nuanced view of the range of
    supporting, but subjective, opinions.


    Distributed, Parallel, and Cluster Computing

    Havens: Explicit Reliable Memory Regions for HPC Applications

    Saurabh Hukerikar, Christian Engelmann
    Comments: 2016 IEEE High Performance Extreme Computing Conference (HPEC ’16), September 2016, Waltham, MA, USA
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Supporting error resilience in future exascale-class supercomputing systems
    is a critical challenge. Due to transistor scaling trends and increasing memory
    density, scientific simulations are expected to experience more interruptions
    caused by transient errors in the system memory. Existing hardware-based
    detection and recovery techniques will be inadequate to manage the presence of
    high memory fault rates.

    In this paper we propose a partial memory protection scheme based on
    region-based memory management. We define the concept of regions called havens
    that provide fault protection for program objects. We provide reliability for
    the regions through a software-based parity protection mechanism. Our approach
    enables critical program objects to be placed in these havens. The fault
    coverage provided by our approach is application agnostic, unlike
    algorithm-based fault tolerance techniques.
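
    A toy sketch of the parity idea in Python rather than an HPC runtime: one XOR parity word per haven is enough to reconstruct a single corrupted word once its index is known. The region layout and word size are assumptions for illustration:

    ```python
    from functools import reduce

    class Haven:
        """A toy 'haven': a protected region of words guarded by one XOR parity word."""

        def __init__(self, words):
            self.words = list(words)
            self.parity = reduce(lambda a, b: a ^ b, self.words, 0)

        def write(self, i, value):
            # keep the parity word consistent on every update
            self.parity ^= self.words[i] ^ value
            self.words[i] = value

        def recover(self, i):
            """Reconstruct word i after it has been detected as corrupted or lost."""
            others = reduce(lambda a, b: a ^ b, (w for j, w in enumerate(self.words) if j != i), 0)
            return self.parity ^ others

    # usage: place a "critical object" in a haven, overwrite a word, then recover it
    h = Haven([0xDEADBEEF, 0x12345678, 0xCAFEBABE])
    h.write(1, 0x0BADF00D)
    assert h.recover(1) == 0x0BADF00D
    ```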

    Oh-RAM! One and a Half Round Atomic Memory

    Theophanis Hadjistasi, Nicolas Nicolaou, Alexander Schwarzmann
    Comments: A Brief Announcement related to this work was presented at ACM PODC, 2016
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Emulating atomic read/write shared objects in a message-passing system is a
    fundamental problem in distributed computing. Considering that network
    communication is the most expensive resource, efficiency is measured first of
    all in terms of the communication needed to implement read and write
    operations. It is well known that 2 communication round-trip phases involving
    in total 4 message exchanges are sufficient to implement atomic operations.
    It is also known that under certain constraints on the number of readers with
    respect to the numbers of replica servers and failures it is possible to
    implement single-writer atomic objects such that each operation involves one
    round-trip phase. We present algorithms that allow operations to complete in 3
    communication exchanges without imposing any constraints on the number of
    readers and writers. Specifically, we present an atomic memory implementation
    for the SWMR setting, where reads complete in 3 communication exchanges and
    writes complete in 2 exchanges. We pose the question of whether it is possible
    to implement MWMR memory where operations complete in at most 3 communication
    exchanges. We answer this question in the negative by showing that an atomic
    memory implementation is impossible if both read and write operations take 3
    communication exchanges, even when assuming two writers, two readers, and a
    single replica server failure. Motivated by this impossibility result, we
    provide a MWMR atomic memory implementation where reads involve 3 and writes 4
    communication exchanges. In light of our impossibility result these algorithms
    are optimal in terms of the number of communication exchanges. We rigorously
    reason about the correctness of the algorithms.

    An Enhanced Model for Stochastic Coordination

    Nuno Oliveira (HASLab INESC TEC), Luis Soares Barbosa (HASLab INESC TEC)
    Comments: In Proceedings iFMCloud 2016, arXiv:1610.07700
    Journal-ref: EPTCS 228, 2016, pp. 35-45
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Logic in Computer Science (cs.LO); Software Engineering (cs.SE)

    Applications developed over the cloud coordinate several, often anonymous,
    computational resources, distributed over different execution nodes, within
    flexible architectures. Coordination models able to represent quantitative data
    provide a powerful basis for their analysis and validation. This paper extends
    IMCreo, a semantic model for Stochastic Reo based on interactive Markov chains,
    to enhance its scalability, by regarding each channel and node, as well as
    interface components, as independent stochastic processes that may (or may not)
    synchronise with the rest of the coordination circuit.

    Modeling Deployment Decisions for Elastic Services with ABS

    Einar Broch Johnsen, Ka I Pun, S. Lizeth Tapia Tarifa
    Comments: In Proceedings iFMCloud 2016, arXiv:1610.07700
    Journal-ref: EPTCS 228, 2016, pp. 16-26
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE)

    The use of cloud technology can offer significant savings for the deployment
    of services, provided that the service is able to make efficient use of the
    available virtual resources to meet service-level requirements. To avoid
    software designs that scale poorly, it is important to make deployment
    decisions for the service at design time, early in the development of the
    service itself. ABS offers a formal, model-based approach which integrates the
    design of services with the modeling of deployment decisions. In this paper, we
    illustrate the main concepts of this approach by modeling a scalable pool of
    workers with an auto-scaling strategy and by using the model to compare
    deployment decisions with respect to client traffic with peak loads.

    Static Analysis Using the Cloud

    Rahul Kumar (Microsoft Research, Redmond, WA, USA), Chetan Bansal (Microsoft Research, Redmond, WA, USA), Jakob Lichtenberg (Microsoft, Redmond, WA, USA)
    Comments: In Proceedings iFMCloud 2016, arXiv:1610.07700
    Journal-ref: EPTCS 228, 2016, pp. 2-15
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE)

    In this paper we describe our experience of using Microsoft Azure cloud
    computing platform for static analysis. We start by extending Static Driver
    Verifier to operate in the Microsoft Azure cloud with significant improvements
    in performance and scalability. We present our results of using SDV on single
    drivers and driver suites using various configurations of the cloud relative to
    a local machine. Finally, we describe the Static Module Verifier platform, a
    highly extensible and configurable platform for static analysis of generic
    modules, where we have integrated support for verification using a cloud
    services provider (Microsoft Azure in this case).

    The Reverse Cuthill-McKee Algorithm in Distributed-Memory

    Ariful Azad, Mathias Jacquelin, Aydin Buluc, Esmond G. Ng
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Numerical Analysis (cs.NA); Numerical Analysis (math.NA)

    Ordering the vertices of a graph is key to minimizing fill-in and data structure
    size in sparse direct solvers, maximizing locality in iterative solvers, and
    improving performance in graph algorithms. Except for naturally parallelizable
    ordering methods such as nested dissection, many important ordering methods
    have not been efficiently mapped to distributed-memory architectures. In this
    paper, we present the first-ever distributed-memory implementation of the
    reverse Cuthill-McKee (RCM) algorithm for reducing the profile of a sparse
    matrix. Our parallelization uses a two-dimensional sparse matrix decomposition.
    We achieve high performance by decomposing the problem into a small number of
    primitives and utilizing optimized implementations of these primitives. Our
    implementation shows strong scaling up to 1024 cores for smaller matrices and
    up to 4096 cores for larger matrices.
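
    The distributed algorithm itself is not sketched here; the following single-node illustration uses SciPy's serial RCM to show the bandwidth/profile reduction that the method targets, on a random matrix standing in for a real problem:

    ```python
    import numpy as np
    from scipy.sparse import random as sparse_random
    from scipy.sparse.csgraph import reverse_cuthill_mckee

    def bandwidth(A):
        """Maximum distance of a nonzero from the diagonal (a proxy for the profile)."""
        coo = A.tocoo()
        return int(np.abs(coo.row - coo.col).max())

    # random symmetric sparse matrix standing in for a real problem matrix
    A = sparse_random(200, 200, density=0.02, random_state=0, format="csr")
    A = (A + A.T).tocsr()

    perm = reverse_cuthill_mckee(A, symmetric_mode=True)
    A_rcm = A[perm, :][:, perm]
    print("bandwidth before:", bandwidth(A), "after RCM:", bandwidth(A_rcm))
    ```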

    Evaluating load balancing policies for performance and energy-efficiency

    Freek van den Berg (University of Twente), Björn F. Postema (University of Twente), Boudewijn R. Haverkort (University of Twente)
    Comments: In Proceedings QAPL’16, arXiv:1610.07696
    Journal-ref: EPTCS 227, 2016, pp. 98-117
    Subjects: Performance (cs.PF); Distributed, Parallel, and Cluster Computing (cs.DC)

    Nowadays, increasingly hard computations are performed in challenging fields
    like weather forecasting, oil and gas exploration, and cryptanalysis. Many such
    computations can be implemented using a computer cluster with a large number of
    servers. Incoming computation requests are then distributed over the servers,
    via a so-called load balancing policy, to ensure optimal performance.
    Additionally, being able to switch off some servers during periods of low
    workload offers the potential for reduced energy consumption.
    Therefore, load balancing forms, albeit indirectly, a trade-off between
    performance and energy consumption. In this paper, we introduce a syntax for
    load-balancing policies to dynamically select a server for each request based
    on relevant criteria, including the number of jobs queued in servers, power
    states of servers, and transition delays between power states of servers. To
    evaluate many policies, we implement two load balancers in: (i) iDSL, a
    language and tool-chain for evaluating service-oriented systems, and (ii) a
    simulation framework in AnyLogic. Both implementations are successfully
    validated by comparison of the results.
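
    A compact sketch (not iDSL or AnyLogic) of the kind of policy the proposed syntax describes: each request is routed by a score over queue length, power state and wake-up delay; the weights and the server model are illustrative assumptions:

    ```python
    from dataclasses import dataclass

    @dataclass
    class Server:
        queue: int = 0           # jobs currently queued
        awake: bool = True       # power state
        wake_delay: float = 2.0  # transition delay paid when the server must be switched on

    def score(server, w_queue=1.0, w_wake=5.0):
        """Lower is better: penalise long queues and servers that must first wake up."""
        return w_queue * server.queue + (0.0 if server.awake else w_wake * server.wake_delay)

    def dispatch(servers):
        """Route one incoming request according to the policy above."""
        target = min(servers, key=score)
        if not target.awake:
            target.awake = True  # the wake-up cost is modelled only as a score penalty here
        target.queue += 1
        return target

    # toy usage: three servers, one asleep; send ten requests
    pool = [Server(queue=3), Server(queue=0, awake=False), Server(queue=1)]
    for _ in range(10):
        dispatch(pool)
    print([(s.queue, s.awake) for s in pool])
    ```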

    Parameterized Dataflow (Extended Abstract)

    Dominic Duggan (Stevens Institute of Technology), Jianhua Yao (Stevens Institute of Technology)
    Comments: In Proceedings QAPL’16, arXiv:1610.07696
    Journal-ref: EPTCS 227, 2016, pp. 63-81
    Subjects: Programming Languages (cs.PL); Distributed, Parallel, and Cluster Computing (cs.DC)

    Dataflow networks have application in various forms of stream processing, for
    example for parallel processing of multimedia data. The description of dataflow
    graphs, including their firing behavior, is typically non-compositional and not
    amenable to separate compilation. This article considers a dataflow language
    with a type and effect system that captures the firing behavior of actors. This
    system allows definitions to abstract over actor firing rates, supporting the
    definition and safe composition of actor definitions where firing rates are not
    instantiated until a dataflow graph is launched.


    Learning

    Adaptive matching pursuit for sparse signal recovery

    Tiep H. Vu, Hojjat S. Mousavi, Vishal Monga
    Comments: ICASSP
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Spike and Slab priors have been of much recent interest in signal processing
    as a means of inducing sparsity in Bayesian inference. Application domains
    that benefit from the use of these priors include sparse recovery, regression
    and classification. It is well-known that solving for the sparse coefficient
    vector to maximize these priors results in a hard non-convex and mixed integer
    programming problem. Most existing solutions to this optimization problem
    either involve simplifying assumptions/relaxations or are computationally
    expensive. We propose a new greedy and adaptive matching pursuit (AMP)
    algorithm to directly solve this hard problem. Essentially, in each step of the
    algorithm, the set of active elements would be updated by either adding or
    removing one index, whichever results in better improvement. In addition, the
    intermediate steps of the algorithm are calculated via an inexpensive Cholesky
    decomposition which makes the algorithm much faster. Results on simulated data
    sets as well as real-world image recovery challenges confirm the benefits of
    the proposed AMP, particularly in providing a superior cost-quality trade-off
    over existing alternatives.
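
    A simplified greedy sketch of the add-or-remove idea: at each step the active set gains or loses the single index that most reduces the residual, with coefficients refit by least squares. The Spike-and-Slab objective and the Cholesky-based updates of the actual AMP algorithm are replaced here by plain residual norms and numpy least squares for brevity:

    ```python
    import numpy as np

    def greedy_add_remove(A, y, k_max=10, tol=1e-6):
        """Toy adaptive pursuit: grow or shrink the active set by one index per step."""
        n = A.shape[1]

        def residual(idx):
            if not idx:
                return np.linalg.norm(y)
            S = sorted(idx)
            x, *_ = np.linalg.lstsq(A[:, S], y, rcond=None)
            return np.linalg.norm(y - A[:, S] @ x)

        active, best = set(), residual(set())
        while len(active) < k_max:
            moves = [active | {j} for j in range(n) if j not in active]
            moves += [active - {j} for j in active]
            cand = min(moves, key=residual)
            if residual(cand) >= best - tol:
                break  # no single add or remove improves the fit
            active, best = cand, residual(cand)
        return sorted(active), best

    # toy usage: recover the support of a 3-sparse signal
    rng = np.random.default_rng(1)
    A = rng.standard_normal((40, 80))
    x_true = np.zeros(80)
    x_true[[5, 17, 60]] = [1.5, -2.0, 0.7]
    support, res = greedy_add_remove(A, A @ x_true)
    print("recovered support:", support, "residual:", round(res, 6))
    ```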

    An Improved Approach for Prediction of Parkinson's Disease using Machine Learning Techniques

    Kamal Nayan Reddy Challa, Venkata Sasank Pagolu, Ganapati Panda, Babita Majhi
    Comments: Conference Paper
    Subjects: Learning (cs.LG)

    Parkinson’s disease (PD) is one of the major public health problems in the
    world. It is a well-known fact that around one million people suffer from
    Parkinson’s disease in the United States whereas the number of people suffering
    from Parkinson’s disease worldwide is around 5 million. Thus, it is important
    to predict Parkinson’s disease in its early stages so that an early plan for the
    necessary treatment can be made. People are mostly familiar with the motor
    symptoms of Parkinson’s disease; however, an increasing amount of research is
    being done to predict Parkinson’s disease from non-motor symptoms that
    precede the motor ones. If an early and reliable prediction is possible then a
    patient can get a proper treatment at the right time. Nonmotor symptoms
    considered are Rapid Eye Movement (REM) sleep Behaviour Disorder (RBD) and
    olfactory loss. Developing machine learning models that can help us in
    predicting the disease can play a vital role in early prediction. In this
    paper, we extend prior work that used non-motor features such as RBD and
    olfactory loss; along with these, the extended work also uses important
    biomarkers. We try to model this classifier using different
    machine learning models that have not been used before. We developed automated
    diagnostic models using Multilayer Perceptron, BayesNet, Random Forest and
    Boosted Logistic Regression. It has been observed that Boosted Logistic
    Regression provides the best performance, with an impressive accuracy of 97.159%
    and an area under the ROC curve of 98.9%. Thus, it is concluded that these
    models can be used for early prediction of Parkinson’s disease.
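
    A brief sketch of the modelling setup with scikit-learn stand-ins; the feature matrix below is a random placeholder for the RBD/olfactory/biomarker inputs, and gradient boosting stands in for the boosted logistic regression reported above:

    ```python
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

    # placeholder features standing in for RBD scores, olfactory tests and biomarkers
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 6))
    y = (X[:, 0] + 0.5 * X[:, 3] + 0.3 * rng.standard_normal(200) > 0).astype(int)  # synthetic labels

    models = {
        "Multilayer Perceptron": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
        "Boosting (stand-in)": GradientBoostingClassifier(random_state=0),
    }
    for name, model in models.items():
        auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
        print(f"{name}: AUC = {auc.mean():.3f} +/- {auc.std():.3f}")
    ```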

    Things Bayes can't do

    Daniil Ryabko
    Journal-ref: In Proceedings of ALT, LNCS 9925, pp.253-260, Bari, Italy, 2016
    Subjects: Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)

    The problem of forecasting conditional probabilities of the next event given
    the past is considered in a general probabilistic setting. Given an arbitrary
    (large, uncountable) set C of predictors, we would like to construct a single
    predictor that performs asymptotically as well as the best predictor in C, on
    any data. Here we show that there are sets C for which such predictors exist,
    but none of them is a Bayesian predictor with a prior concentrated on C. In
    other words, there is a predictor with sublinear regret, but every Bayesian
    predictor must have a linear regret. This negative finding is in sharp contrast
    with previous results that establish the opposite for the case when one of the
    predictors in C achieves asymptotically vanishing error. In such a case, if
    there is a predictor that achieves asymptotically vanishing error for any
    measure in C, then there is a Bayesian predictor that also has this property,
    and whose prior is concentrated on (a countable subset of) C.

    Word Embeddings and Their Use In Sentence Classification Tasks

    Amit Mandelbaum, Adi Shalev
    Comments: 13 pages, 10 figures
    Subjects: Learning (cs.LG); Computation and Language (cs.CL)

    This paper has two parts. In the first part we discuss word embeddings: the
    need for them, some of the methods used to create them, and some of their
    interesting properties. We also compare them to image embeddings and show how
    word and image embeddings can be combined to perform different tasks. In the
    second part we implement a convolutional neural network trained on top of
    pre-trained word vectors. The network is used for several sentence-level
    classification tasks and achieves state-of-the-art (or comparable) results,
    demonstrating the advantage of pre-trained word embeddings over randomly
    initialized ones.
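
    A minimal sketch of the second part's setup: a convolutional network for
    sentence classification on top of pre-trained word vectors, in the style of
    Kim-type text CNNs. The vocabulary size, the "pre-trained" embedding matrix
    and the number of classes below are placeholders, not the paper's actual
    configuration.

        # Hypothetical sketch: a small text CNN over pre-trained word embeddings.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class TextCNN(nn.Module):
            def __init__(self, pretrained, num_classes, kernel_sizes=(3, 4, 5), channels=100):
                super().__init__()
                # pretrained: (vocab_size, embed_dim) tensor, e.g. word2vec/GloVe vectors
                self.embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)
                embed_dim = pretrained.size(1)
                self.convs = nn.ModuleList(
                    [nn.Conv1d(embed_dim, channels, k) for k in kernel_sizes])
                self.fc = nn.Linear(channels * len(kernel_sizes), num_classes)

            def forward(self, token_ids):                        # (batch, seq_len)
                x = self.embedding(token_ids).transpose(1, 2)    # (batch, embed_dim, seq_len)
                feats = [F.relu(c(x)).max(dim=2).values for c in self.convs]
                return self.fc(torch.cat(feats, dim=1))          # (batch, num_classes)

        vectors = torch.randn(5000, 300)       # placeholder "pre-trained" vectors
        model = TextCNN(vectors, num_classes=2)
        logits = model(torch.randint(0, 5000, (8, 40)))   # 8 sentences of length 40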

    Fast Bayesian Non-Negative Matrix Factorisation and Tri-Factorisation

    Thomas Brouwer, Jes Frellsen, Pietro Lio'
    Comments: NIPS 2016 Workshop on Advances in Approximate Bayesian Inference
    Subjects: Learning (cs.LG); Numerical Analysis (cs.NA); Machine Learning (stat.ML)

    We present a fast variational Bayesian algorithm for performing non-negative
    matrix factorisation and tri-factorisation. We show that our approach achieves
    faster convergence per iteration and per unit of wall-clock time than Gibbs
    sampling and non-probabilistic approaches, and does not require additional
    samples to estimate the posterior. We show that convergence is particularly
    difficult for matrix tri-factorisation, but that our variational Bayesian
    approach offers a fast solution, allowing tri-factorisation to be used more
    effectively.
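
    For reference, a minimal non-probabilistic NMF baseline of the kind such
    methods are compared against: the classical multiplicative-update algorithm.
    This is not the paper's variational Bayesian model, and the matrix R below is
    a random placeholder.

        # Hypothetical sketch: multiplicative-update NMF (Lee-Seung, squared loss),
        # factorising a non-negative R (n x m) into U (n x k) and V (k x m).
        import numpy as np

        def nmf(R, k, iters=500, eps=1e-9, seed=0):
            rng = np.random.default_rng(seed)
            n, m = R.shape
            U, V = rng.random((n, k)), rng.random((k, m))
            for _ in range(iters):
                V *= (U.T @ R) / (U.T @ U @ V + eps)
                U *= (R @ V.T) / (U @ V @ V.T + eps)
            return U, V

        R = np.abs(np.random.default_rng(1).normal(size=(50, 40)))   # placeholder data
        U, V = nmf(R, k=5)
        print("reconstruction error:", round(float(np.linalg.norm(R - U @ V)), 3))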

    Socratic Learning

    Rose Yu, Paroma Varma, Dan Iter, Christopher De Sa, Christopher Ré
    Comments: 3 figures
    Subjects: Learning (cs.LG)

    Modern machine learning techniques, such as deep learning, often use
    discriminative models that require large amounts of labeled data. An
    alternative approach is to use a generative model, which leverages heuristics
    from domain experts to train on unlabeled data. Domain experts often prefer to
    use generative models because they “tell a story” about their data.
    Unfortunately, generative models are typically less accurate than
    discriminative models. Several recent approaches combine both types of model to
    exploit their strengths. In this setting, a misspecified generative model can
    hurt the performance of subsequent discriminative training. To address this
    issue, we propose a framework called Socratic learning that automatically uses
    information from the discriminative model to correct generative model
    misspecification. Furthermore, this process provides users with interpretable
    feedback about how to improve their generative model. We evaluate Socratic
    learning on real-world relation extraction tasks and observe an immediate
    improvement in classification accuracy that could otherwise require several
    weeks of effort by domain experts.

    Fairness Beyond Disparate Treatment & Disparate Impact: Learning Classification without Disparate Mistreatment

    Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, Krishna P. Gummadi
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    As the use of automated decision-making systems becomes widespread, there is
    growing concern about their potential unfairness towards people with certain
    traits. Anti-discrimination laws in various countries prohibit unfair treatment
    of individuals based on specific traits, also called sensitive attributes
    (e.g., gender, race). In many learning scenarios, the trained algorithms
    (classifiers) make decisions with certain inaccuracy (misclassification rate).
    As learning mechanisms target minimizing the error rate for all decisions, it
    is quite possible that the optimally trained algorithm makes decisions for
    users belonging to different sensitive attribute groups with different error
    rates (e.g., decision errors for females are higher than for males). To account
    for and avoid such unfairness when learning, in this paper, we introduce a new
    notion of unfairness, disparate mistreatment, which is defined in terms of
    misclassification rates. We then propose an intuitive measure of disparate
    mistreatment for decision boundary-based classifiers, which can be easily
    incorporated into their formulation as a convex-concave constraint. Experiments
    on synthetic as well as real world datasets show that our methodology is
    effective at avoiding disparate mistreatment, often at a small cost in terms of
    accuracy.
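
    A minimal sketch of how disparate mistreatment could be quantified in
    practice, as gaps in group-conditional false positive and false negative
    rates. The arrays below are random placeholders, and this is only the
    measurement step, not the paper's convex-concave training constraint.

        # Hypothetical sketch: disparate mistreatment as differences in
        # group-conditional misclassification rates.
        import numpy as np

        def error_rates(y_true, y_pred, mask):
            t, p = y_true[mask], y_pred[mask]
            fpr = float(np.mean(p[t == 0] == 1)) if np.any(t == 0) else 0.0
            fnr = float(np.mean(p[t == 1] == 0)) if np.any(t == 1) else 0.0
            return fpr, fnr

        rng = np.random.default_rng(0)
        y_true = rng.integers(0, 2, 1000)    # placeholder ground-truth labels
        y_pred = rng.integers(0, 2, 1000)    # placeholder classifier decisions
        group = rng.integers(0, 2, 1000)     # placeholder sensitive attribute

        fpr0, fnr0 = error_rates(y_true, y_pred, group == 0)
        fpr1, fnr1 = error_rates(y_true, y_pred, group == 1)
        print("FPR gap:", abs(fpr0 - fpr1), "FNR gap:", abs(fnr0 - fnr1))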

    Counterfactual Reasoning about Intent for Interactive Navigation in Dynamic Environments

    A. Bordallo, F. Previtali, N. Nardelli, S. Ramamoorthy
    Journal-ref: 2015 IEEE/RSJ International Conference on Intelligent Robots and
    Systems (IROS 2015), pp. 2943-2950
    Subjects: Robotics (cs.RO); Learning (cs.LG)

    Many modern robotics applications require robots to function autonomously in
    dynamic environments including other decision making agents, such as people or
    other robots. This calls for fast and scalable interactive motion planning,
    with models that take the other agents’ intended actions into consideration
    in one’s own planning. We present a real-time motion planning framework
    that brings together a few key components including intention inference by
    reasoning counterfactually about potential motion of the other agents as they
    work towards different goals. By using a light-weight motion model, we achieve
    efficient iterative planning for fluid motion when avoiding pedestrians, in
    parallel with goal inference for longer range movement prediction. This
    inference framework is coupled with a novel distributed visual tracking method
    that provides reliable and robust models for the current belief-state of the
    monitored environment. This combined approach represents a computationally
    efficient alternative to previously studied policy learning methods that often
    require significant offline training or calibration and do not yet scale to
    densely populated environments. We validate this framework with experiments
    involving multi-robot and human-robot navigation. We further validate the
    tracker component separately on much larger scale unconstrained pedestrian data
    sets.

    Universal adversarial perturbations

    Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, Pascal Frossard
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Given a state-of-the-art deep neural network classifier, we show the
    existence of a universal (image-agnostic) and very small perturbation vector
    that causes natural images to be misclassified with high probability. We
    propose a systematic algorithm for computing universal perturbations, and show
    that state-of-the-art deep neural networks are highly vulnerable to such
    perturbations, albeit being quasi-imperceptible to the human eye. We further
    empirically analyze these universal perturbations and show, in particular, that
    they generalize very well across neural networks. The surprising existence of
    universal perturbations reveals important geometric correlations among the
    high-dimensional decision boundary of classifiers. It further outlines
    potential security breaches with the existence of single directions in the
    input space that adversaries can possibly exploit to break a classifier on most
    natural images.

    Quantum-enhanced machine learning

    Vedran Dunjko, Jacob M. Taylor, Hans J. Briegel
    Comments: 5+15 pages. This paper builds upon and mostly supersedes arXiv:1507.08482. In addition to results provided in this previous work, here we achieve learning improvements in more general environments, and provide connections to other work in quantum machine learning. Explicit constructions of oracularized environments given in arXiv:1507.08482 are omitted in this version
    Journal-ref: Phys. Rev. Lett. 117, 130501 (2016)
    Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Learning (cs.LG)

    The emerging field of quantum machine learning has the potential to
    substantially aid in the problems and scope of artificial intelligence. This is
    only enhanced by recent successes in the field of classical machine learning.
    In this work we propose an approach for the systematic treatment of machine
    learning, from the perspective of quantum information. Our approach is general
    and covers all three main branches of machine learning: supervised,
    unsupervised and reinforcement learning. While quantum improvements in
    supervised and unsupervised learning have been reported, reinforcement learning
    has received much less attention. Within our approach, we tackle the problem of
    quantum enhancements in reinforcement learning as well, and propose a
    systematic scheme for providing improvements. As an example, we show that
    quadratic improvements in learning efficiency, and exponential improvements in
    performance over limited time periods, can be obtained for a broad class of
    learning problems.

    Universality of Bayesian mixture predictors

    Daniil Ryabko
    Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Learning (cs.LG)

    The problem is that of sequential probability forecasting for finite-valued
    time series. The data is generated by an unknown probability distribution over
    the space of all one-way infinite sequences. It is known that this measure
    belongs to a given set C, but the latter is completely arbitrary (uncountably
    infinite, without any structure given). The performance is measured with
    asymptotic average log loss. In this work it is shown that the minimax
    asymptotic performance is always attainable, and it is attained by a convex
    combination of countably many measures from the set C (a Bayesian mixture).
    This was previously only known for the case when the best achievable asymptotic
    error is 0. This also contrasts previous results that show that in the
    non-realizable case all Bayesian mixtures may be suboptimal, while there is a
    predictor that achieves the optimal performance.

    Automatic measurement of vowel duration via structured prediction

    Yossi Adi, Joseph Keshet, Emily Cibelli, Erin Gustafson, Cynthia Clopper, Matthew Goldrick
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Sound (cs.SD)

    A key barrier to making phonetic studies scalable and replicable is the need
    to rely on subjective, manual annotation. To help meet this challenge, a
    machine learning algorithm was developed for automatic measurement of a widely
    used phonetic measure: vowel duration. Manually-annotated data were used to
    train a model that takes as input an arbitrary length segment of the acoustic
    signal containing a single vowel that is preceded and followed by consonants
    and outputs the duration of the vowel. The model is based on the structured
    prediction framework. The input signal and a hypothesized set of a vowel’s
    onset and offset are mapped to an abstract vector space by a set of acoustic
    feature functions. The learning algorithm is trained in this space to minimize
    the difference in expectations between predicted and manually-measured vowel
    durations. The trained model can then automatically estimate vowel durations
    without phonetic or orthographic transcription. Results comparing the model to
    three sets of manually annotated data suggest it outperformed the current gold
    standard for duration measurement, an HMM-based forced aligner (which requires
    orthographic or phonetic transcription as an input).

    Image Segmentation for Fruit Detection and Yield Estimation in Apple Orchards

    Suchet Bargoti, James Underwood
    Comments: This paper is the initial version of the manuscript submitted to The Journal of Field Robotics in May 2016. Following reviews and revisions, the paper has been accepted for publication. The reviewed version includes extended comparison between the different classification frameworks and a more in-depth literature review
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Ground vehicles equipped with monocular vision systems are a valuable source
    of high resolution image data for precision agriculture applications in
    orchards. This paper presents an image processing framework for fruit detection
    and counting using orchard image data. A general purpose image segmentation
    approach is used, including two feature learning algorithms: multi-scale
    Multi-Layered Perceptrons (MLP) and Convolutional Neural Networks (CNN). These
    networks were extended by including contextual information about how the image
    data was captured (metadata), which correlates with some of the appearance
    variations and/or class distributions observed in the data. The pixel-wise
    fruit segmentation output is processed using the Watershed Segmentation (WS)
    and Circular Hough Transform (CHT) algorithms to detect and count individual
    fruits. Experiments were conducted in a commercial apple orchard near
    Melbourne, Australia. The results show an improvement in fruit segmentation
    performance with the inclusion of metadata on the previously benchmarked MLP
    network. We extend this work with CNNs, bringing agrovision closer to the
    state-of-the-art in computer vision, where although metadata had negligible
    influence, the best pixel-wise F1-score of 0.791 was achieved. The WS
    algorithm produced the best apple detection and counting results, with a
    detection F1-score of 0.858. As a final step, image fruit counts were
    accumulated over multiple rows at the orchard and compared against the
    post-harvest fruit counts that were obtained from a grading and counting
    machine. The count estimates using CNN and WS resulted in the best performance
    for this dataset, with a squared correlation coefficient of \(r^2 = 0.826\).
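
    A minimal sketch of the detection and counting step described above, applying
    a distance transform and watershed to a binary fruit-segmentation mask. The
    mask here is a synthetic placeholder with two circular blobs; it is loosely
    analogous to, not a reproduction of, the paper's WS pipeline.

        # Hypothetical sketch: counting fruit blobs in a binary segmentation mask
        # via distance transform + watershed.
        import numpy as np
        from scipy import ndimage as ndi
        from skimage.feature import peak_local_max
        from skimage.segmentation import watershed

        def count_fruit(mask, min_distance=10):
            """mask: 2-D boolean array, True where pixels are classified as fruit."""
            distance = ndi.distance_transform_edt(mask)
            coords = peak_local_max(distance, min_distance=min_distance, labels=mask)
            markers = np.zeros(mask.shape, dtype=int)
            markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)
            labels = watershed(-distance, markers, mask=mask)
            return int(labels.max()), labels

        yy, xx = np.mgrid[0:100, 0:100]                       # synthetic mask with
        mask = ((yy - 30) ** 2 + (xx - 30) ** 2 < 150) | \
               ((yy - 70) ** 2 + (xx - 65) ** 2 < 150)        # two circular "apples"
        n_fruit, _ = count_fruit(mask)
        print("detected fruit:", n_fruit)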

    A statistical framework for fair predictive algorithms

    Kristian Lum, James Johndrow
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Predictive modeling is increasingly being employed to assist human
    decision-makers. One purported advantage of replacing human judgment with
    computer models in high-stakes settings, such as sentencing, hiring, policing,
    college admissions, and parole decisions, is the perceived “neutrality” of
    computers. It is argued that because computer models do not hold personal
    prejudice, the predictions they produce will be equally free from prejudice.
    There is growing recognition that employing algorithms does not remove the
    potential for bias, and can even amplify it, since training data were
    inevitably generated by a process that is itself biased. In this paper, we
    provide a probabilistic definition of algorithmic bias. We propose a method to
    remove bias from predictive models by removing all information regarding
    protected variables from the permitted training data. Unlike previous work in
    this area, our framework is general enough to accommodate arbitrary data types,
    e.g. binary, continuous, etc. Motivated by models currently in use in the
    criminal justice system that inform decisions on pre-trial release and
    paroling, we apply our proposed method to a dataset on the criminal histories
    of individuals at the time of sentencing to produce “race-neutral” predictions
    of re-arrest. In the process, we demonstrate that the most common approach to
    creating “race-neutral” models, omitting race as a covariate, still results
    in racially disparate predictions. We then demonstrate that the application of
    our proposed method to these data removes racial disparities from predictions
    with minimal impact on predictive accuracy.


    Information Theory

    Low-Delay Distributed Source Coding for Time-Varying Sources with Unknown Statistics

    Fangzhou Chen, Bin Li, Can Emre Koksal
    Subjects: Information Theory (cs.IT)

    We consider a system in which two nodes take correlated measurements of a
    random source with time-varying and unknown statistics. The observations of the
    source at the first node are to be losslessly replicated with a given
    probability of outage at the second node, which receives data from the first
    node over a constant-rate errorless channel. We develop a system and associated
    strategies for joint distributed source coding (encoding and decoding) and
    transmission control in order to achieve low end-to-end delay. Slepian-Wolf
    coding in its traditional form cannot be applied in our scenario, since the
    encoder requires the joint statistics of the observations and the associated
    decoding delay is very high. We analytically evaluate the performance of our
    strategies and show that the delay they achieve is order optimal as the
    conditional entropy of the source approaches the channel rate. We also
    evaluate the performance of our algorithms in real-world experiments using two
    cameras recording videos of a scene from different angles. Having implemented
    our schemes, we demonstrate that, even with a very low-complexity quantizer, a
    compression ratio of approximately 50% is achievable for lossless replication
    at the decoder, at an average delay of a few seconds.

    Sum-networks: Dependency on Characteristic of the Finite Field under Linear Network Coding

    Niladri Das, Brijesh Kumar Rai
    Subjects: Information Theory (cs.IT)

    Sum-networks are networks where all the terminals demand the sum of the
    symbols generated at the sources. It has been shown that for any finite
    set/co-finite set of prime numbers, there exists a sum-network which has a
    vector linear solution if and only if the characteristic of the finite field
    belongs to the given set. It has also been shown that for any positive
    rational number \(k/n\), there exists a sum-network whose capacity equals
    \(k/n\). It is natural to ask whether, for any positive rational number
    \(k/n\) and any finite or co-finite set of primes \(\{p_1,p_2,\ldots,p_l\}\),
    there exists a sum-network which has a capacity-achieving rate-\(k/n\)
    fractional linear network coding solution if and only if the characteristic
    of the finite field belongs to the given set. We answer this question
    affirmatively by constructing such a sum-network.

    Joint MOO of Transmit Precoding and Receiver Design in a Downlink Time Switching MISO SWIPT System

    Nafiseh Janatian, Ivan Stupia, Luc Vandendorpe
    Subjects: Information Theory (cs.IT)

    In this paper, we consider a time-switching (TS) co-located simultaneous
    wireless information and power transfer (SWIPT) system consisting of multiple
    multi-antenna access points which serve multiple single antenna users. In this
    scenario, we propose a multi-objective optimization (MOO) framework to design
    jointly the Pareto optimal beamforming vector and the TS ratio for each
    receiver. The objective is to maximize the utility vector including the
    achieved data rates and the harvested energies of all users simultaneously.
    This problem is a non-convex rank-constrained MOO problem which is relaxed and
    transformed into a non-convex semidefinite program (SDP) based on the weighted
    Chebycheff method. The majorization-minimization algorithm is utilized to solve
    the nonconvex SDP and the optimal solution is proved to satisfy the rank
    constraint. We also study the problem of optimizing the beamforming vectors in
    a fixed TS ratio scenario with the same approach. Numerical results are
    provided for two coordinated access points with MISO configuration. The results
    illustrate the trade-off between harvested energy and information data rate
    objectives and show the effect of optimizing the precoding strategy and TS
    ratio on this trade-off.

    Cube-Split: Structured Quantizers on the Grassmannian of Lines

    Alexis Decurninge, Maxime Guillaud
    Comments: submitted to IEEE Wireless Communications and Networking Conference 2017
    Subjects: Information Theory (cs.IT)

    This paper introduces a new quantization scheme for real and complex
    Grassmannian sources. The proposed approach relies on a structured codebook
    based on a geometric construction of a collection of bent grids defined from an
    initial mesh on the unit-norm sphere. The associated encoding and decoding
    algorithms have very low complexity (equivalent to a scalar quantizer), while
    their efficiency (in terms of the achieved distortion) is on par with the best
    known structured approaches, and compares well with the theoretical bounds.
    These properties make this codebook suitable for high-resolution, real-time
    applications such as channel state feedback in massive multiple-input
    multiple-output (MIMO) wireless communication systems.
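
    For context, a toy quantizer on the Grassmannian of complex lines using a
    random codebook and chordal distance. It illustrates only the
    encode-by-nearest-codeword problem; the structured cube-split codebook and
    its low-complexity encoder are not reproduced here.

        # Hypothetical sketch: quantizing a complex line (unit-norm channel vector,
        # up to phase) with a random Grassmannian codebook under chordal distance.
        import numpy as np

        def random_codebook(dim, size, seed=0):
            rng = np.random.default_rng(seed)
            C = rng.normal(size=(size, dim)) + 1j * rng.normal(size=(size, dim))
            return C / np.linalg.norm(C, axis=1, keepdims=True)

        def quantize(h, codebook):
            h = h / np.linalg.norm(h)
            corr = np.abs(codebook.conj() @ h)        # |<c_i, h>| for every codeword
            idx = int(np.argmax(corr))                # max correlation = min chordal distance
            return idx, float(1.0 - corr[idx] ** 2)   # sin^2 of the principal angle

        rng = np.random.default_rng(1)
        codebook = random_codebook(dim=4, size=2 ** 10)        # 10-bit codebook
        h = rng.normal(size=4) + 1j * rng.normal(size=4)       # placeholder channel
        idx, distortion = quantize(h, codebook)
        print("codeword index:", idx, "chordal distortion:", round(distortion, 4))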

    A New Piggybacking Design for Systematic MDS Storage Codes

    Chong Shangguan, Gennian Ge
    Comments: 7 pages, submitted
    Subjects: Information Theory (cs.IT)

    Distributed storage codes have important applications in the design of modern
    storage systems. In a distributed storage system, every storage node may fail,
    and once an individual node fails, it must be reconstructed using data stored
    in the surviving nodes. Computation load and network bandwidth are two
    important concerns when repairing a failed node. Traditional maximum distance
    separable (MDS) storage codes have low repair complexity but high repair
    bandwidth. In contrast, minimum storage regenerating (MSR) codes have low
    repair bandwidth but high repair complexity. Fortunately, the recently
    introduced piggyback codes combine the advantages of both.

    In this paper, by introducing a novel piggybacking design framework for
    systematic MDS codes, we construct a storage code whose average repair
    bandwidth rate, i.e., the ratio of the average repair bandwidth to the amount
    of original data, can be as low as \(\frac{\sqrt{2r-1}}{r}\), which
    significantly improves on the ratio \(\frac{r-1}{2r-1}\) of the previous
    result. Meanwhile, every failed systematic node of the new code can be
    reconstructed quickly using the decoding algorithm of an MDS code, only with
    some additional additions over the underlying finite field. This is very fast
    compared with the complex matrix multiplications needed in the repair of a
    failed node of an MSR code.
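
    To illustrate the MDS-repair baseline that piggybacking designs improve on, a
    toy systematic code over GF(257) with Vandermonde-style parities. The field
    size and parameters are arbitrary choices for this example, and none of the
    piggybacking modifications are shown; repair simply downloads k symbols and
    re-decodes, as in the classical MDS case.

        # Hypothetical sketch: a toy systematic MDS-style code over GF(257),
        # illustrating "download k symbols, decode, re-encode" node repair.
        P, K, R = 257, 4, 2        # prime field, data nodes, parity nodes

        def generator_row(node):
            # Systematic rows are unit vectors; parity node K+i evaluates at point i+1.
            if node < K:
                return [1 if j == node else 0 for j in range(K)]
            return [pow(node - K + 1, j, P) for j in range(K)]

        def encode(data):
            return [sum(g * d for g, d in zip(generator_row(n), data)) % P
                    for n in range(K + R)]

        def solve_mod(A, b):
            # Gauss-Jordan elimination over GF(P); A is K x K and invertible mod P.
            M = [row[:] + [bi] for row, bi in zip(A, b)]
            for col in range(K):
                piv = next(r for r in range(col, K) if M[r][col] % P)
                M[col], M[piv] = M[piv], M[col]
                inv = pow(M[col][col], -1, P)
                M[col] = [x * inv % P for x in M[col]]
                for r in range(K):
                    if r != col and M[r][col] % P:
                        f = M[r][col]
                        M[r] = [(x - f * y) % P for x, y in zip(M[r], M[col])]
            return [row[K] for row in M]

        def repair(codeword, failed, survivors):
            # Classical MDS repair: read K surviving symbols, recover the data,
            # then re-encode the failed node's symbol.
            data = solve_mod([generator_row(s) for s in survivors],
                             [codeword[s] for s in survivors])
            return sum(g * d for g, d in zip(generator_row(failed), data)) % P

        cw = encode([10, 20, 30, 40])
        print("repaired node 0:", repair(cw, 0, survivors=[1, 2, 3, 4]),
              "expected:", cw[0])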

    The reversible negacyclic codes over finite fields

    Shixin Zhu, Binbin Pang, Zhonghua Sun
    Subjects: Information Theory (cs.IT)

    In this paper, by investigating the factorization of \(x^n+1\), we deduce the
    structure of reversible negacyclic codes over the finite field
    \(\mathbb{F}_{q}\), where \(q\) is an odd prime power. Through studying
    \(q\)-cyclotomic cosets modulo \(2n\), we obtain the parameters of negacyclic
    BCH codes of length \(n=\frac{q^{\ell}+1}{2}\), \(n=\frac{q^m-1}{2(q-1)}\) and
    \(n=\frac{q^{t\cdot 2^{\tau}}-1}{2(q^t+1)}\). Some optimal linear codes from
    negacyclic codes are given. Finally, we discuss a class of MDS LCD negacyclic
    codes.
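
    A small sketch of the q-cyclotomic coset computation underlying the parameter
    analysis above; the example values of q and n below are illustrative choices,
    not the specific code lengths studied in the paper.

        # Hypothetical sketch: q-cyclotomic cosets modulo 2n, the basic objects
        # used when analysing negacyclic (BCH) codes of length n over GF(q).
        def cyclotomic_cosets(q, modulus):
            seen, cosets = set(), []
            for s in range(modulus):
                if s in seen:
                    continue
                coset, x = [], s
                while x not in coset:
                    coset.append(x)
                    x = (x * q) % modulus
                cosets.append(sorted(coset))
                seen.update(coset)
            return cosets

        q = 5
        n = (q ** 3 + 1) // 2          # example length n = (q^ell + 1)/2 with ell = 3
        for coset in cyclotomic_cosets(q, 2 * n)[:6]:
            print(coset)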

    Multi-Antenna Transmission in Downlink Heterogeneous Cellular Networks under A Threshold-based Mobile Association Policy

    Tong-Xing Zheng, Hui-Ming Wang, Moon Ho Lee
    Comments: Journal paper, double column, 13 pages, 9 figures, accepted to appear on IEEE Transactions on Communications
    Subjects: Information Theory (cs.IT)

    With the recent emergence of the 5G era, heterogeneous cellular networks
    (HCNs) have attracted considerable research interest. In this paper, we
    provide a comprehensive analysis of multi-antenna transmission in a multi-tier
    downlink HCN. We first propose a reliability-oriented, threshold-based mobile
    association policy, where each user connects to the base station from which it
    obtains the largest truncated long-term received power. Under our
    mobile association policy, we derive analytical expressions for the exact
    outage probability of an arbitrary randomly located user, along with
    computationally convenient lower and upper bounds. Asymptotic analysis on the
    outage probability shows that introducing a large access threshold into mobile
    association significantly decreases the outage probability. We further
    investigate the spectrum efficiency and the energy efficiency of the HCN. Our
    theoretic analysis and numerical validations show that both the spectrum and
    energy efficiencies can be improved by properly choosing the access threshold.

    Transfer entropy in continuous time, with applications to jump and neural spiking processes

    Richard E. Spinney, Mikhail Prokopenko, Joseph T. Lizier
    Comments: 21 pages, 2 figures
    Subjects: Information Theory (cs.IT); Adaptation and Self-Organizing Systems (nlin.AO); Data Analysis, Statistics and Probability (physics.data-an); Neurons and Cognition (q-bio.NC)

    Transfer entropy has been used to quantify the directed flow of information
    between source and target variables in many complex systems. Although transfer
    entropy was originally formulated in discrete time, here we provide a
    framework for considering it in continuous-time systems. By appealing to a
    measure-theoretic
    formulation we generalise transfer entropy, describing it in terms of
    Radon-Nikodym derivatives between measures of complete path realisations. The
    resulting formalism introduces and emphasises the idea that transfer entropy is
    an expectation of an individually fluctuating quantity along a path, in the
    same way we consider the expectation of physical quantities such as work and
    heat. We recognise that transfer entropy is a quantity accumulated over a
    finite time interval, whilst permitting an associated instantaneous transfer
    entropy rate. We use this approach to produce an explicit form for the transfer
    entropy for pure jump processes, and highlight the simplified form in the
    specific case of point processes (frequently used in neuroscience to model
    neural spike trains). We contrast our approach with previous attempts to
    formulate information flow between continuous time point processes within a
    discrete time framework, which incur issues that our continuous time approach
    naturally avoids. Finally, we present two synthetic spiking neuron model
    examples to exhibit the pertinent features of our formalism, namely that the
    information flow for point processes consists of discontinuous jump
    contributions (at spikes in the target) interrupting a continuously varying
    contribution (relating to waiting times between target spikes).
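
    For contrast with the continuous-time formalism, a minimal plug-in estimator
    of discrete-time transfer entropy between two binned binary spike trains with
    history length 1. The spike trains are random placeholders, and this is the
    standard discrete-time quantity that the continuous-time treatment
    generalises, not the paper's continuous-time estimator.

        # Hypothetical sketch: plug-in estimate of discrete-time transfer entropy
        # TE(X -> Y) with history length 1 on binned binary spike trains, i.e.
        # sum over (y1, y0, x0) of p(y1, y0, x0) * log2[ p(y1|y0, x0) / p(y1|y0) ].
        import numpy as np
        from collections import Counter

        def transfer_entropy(x, y):
            x, y = list(map(int, x)), list(map(int, y))
            triples = Counter(zip(y[1:], y[:-1], x[:-1]))
            pairs_yx = Counter(zip(y[:-1], x[:-1]))
            pairs_yy = Counter(zip(y[1:], y[:-1]))
            singles = Counter(y[:-1])
            n = len(y) - 1
            te = 0.0
            for (y1, y0, x0), c in triples.items():
                p_full = c / pairs_yx[(y0, x0)]
                p_self = pairs_yy[(y1, y0)] / singles[y0]
                te += (c / n) * np.log2(p_full / p_self)
            return te

        rng = np.random.default_rng(0)
        x = (rng.random(10000) < 0.2).astype(int)          # placeholder source spikes
        y = np.roll(x, 1) | (rng.random(10000) < 0.05)     # target partly driven by x
        print("TE(x -> y):", round(transfer_entropy(x, y), 4), "bits per bin")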

    Controlled Barrage Regions: Stochastic Modeling, Analysis, and Optimization

    Salvatore Talarico, Matthew C. Valenti, Thomas R. Halford
    Comments: 7 pages, 3 images, to appear in Military Communication Conference (MILCOM). arXiv admin note: text overlap with arXiv:1408.5928
    Subjects: Information Theory (cs.IT)

    A barrage relay network (BRN) is a broadcast oriented ad hoc network
    involving autonomous cooperative communication, a slotted time-division frame
    format, and a coarse slot-level synchronization. While inherently a broadcast
    protocol, BRNs can support unicast transmission by superimposing a plurality of
    controlled barrage regions (CBRs) onto the network. Within each CBR, a new
    packet is injected by the unicast source during the first time slot of each
    new radio frame. When a CBR is long enough that a packet might not reach the
    other end within a radio frame, multiple packets can be active at the same
    time via spatial pipelining, resulting in interference within the CBR. In
    this paper, the dynamics of packet transmission within a CBR are described as
    a Markov process, and the outage probability of each link within the CBR is
    evaluated in closed form, thereby accounting for fading and
    co-channel interference. In order to account for the linkage between
    simultaneous active packets and their temporal correlation, a Viterbi-like
    algorithm is used. Using this accurate analytical framework, a line network is
    optimized, which identifies the code rate, the number of relays, and the length
    of a radio frame that maximizes the transport capacity.

    Asynchronous Transmission over Gaussian Interference Channels with Stochastic Data Arrival

    Kamyar Moshksar
    Subjects: Information Theory (cs.IT)

    This paper addresses a Gaussian interference channel with two
    transmitter-receiver (Tx-Rx) pairs under stochastic data arrival (GIC-SDA).
    Information bits arrive at the transmitters according to independent and
    asynchronous Bernoulli processes (Tx-Tx asynchrony). Each information source
    turns off after generating a given total number of bits. The transmissions are
    asynchronous (Tx-Rx asynchrony) in the sense that each Tx sends a codeword to
    its Rx immediately after there are enough bits available in its buffer. This
    asynchronous style of transmission is shown to significantly reduce the
    transmission delay in comparison with existing Tx-Rx synchronous transmission
    schemes. The receivers learn the activity frames of both transmitters by
    employing sequential joint-typicality detection. As a consequence, the
    GIC-SDA under Tx-Rx asynchrony is represented by a standard GIC with state
    known at the receivers. The cardinality of the state space is
    \(\binom{2N_1+2N_2}{2N_2}\), in which \(N_1, N_2\) are the numbers of
    codewords transmitted by the two transmitters. Each realization of the state
    imposes two sets of constraints on \(N_1, N_2\), referred to as the geometric
    and reliability constraints. In a scenario where the transmitters are only
    aware of the statistics of Tx-Tx asynchrony, it is shown how one designs
    \(N_1, N_2\) to
    achieve target transmission rates for both users and minimize the probability
    of unsuccessful decoding.

    Optimal Power Allocation and Active Interference Mitigation for Spatial Multiplexed MIMO Cognitive Systems

    Nikolaos I. Miridakis, Minghua Xia, Theodoros A. Tsiftsis
    Subjects: Information Theory (cs.IT)

    In this paper, the performance of an underlay multiple-input multiple-output
    (MIMO) cognitive radio system is analytically studied. In particular, the
    secondary transmitter operates in a spatial multiplexing transmission mode,
    while a zero-forcing (ZF) detector is employed at the secondary receiver.
    Additionally, the secondary system is interfered by multiple randomly
    distributed single-antenna primary users (PUs). To enhance the performance of
    secondary transmission, optimal power allocation is performed at the secondary
    transmitter with a constraint on the interference temperature (IT) specified by
    the PUs. The outage probability of the secondary receiver is explicitly derived
    in an exact closed-form expression. Also, some special cases of practical
    interest, including co-located PUs and massive MIMO, are discussed. Further, to
    mitigate instantaneous excessive interference onto PUs caused by the
    time-average IT, an iterative antenna reduction algorithm is developed for the
    secondary transmitter and, accordingly, the average number of transmit antennas
    is analytically computed. Extensive numerical and simulation results
    corroborate the effectiveness of our analysis.

    Distributed Spatial Multiplexing Systems with Hardware Impairments and Imperfect Channel Estimation under Rank-1 Rician Fading Channels

    Nikolaos I. Miridakis, Theodoros A. Tsiftsis, Corbett Rowell
    Subjects: Information Theory (cs.IT)

    The performance of a multiuser communication system with single-antenna
    transmitting terminals and a multi-antenna base-station receiver is
    analytically investigated. The system operates under independent and
    non-identically distributed rank-1 Rician fading channels with imperfect
    channel estimation and residual hardware impairments (compensation algorithms
    are assumed, which mitigate the main impairments) at the transceiver. The
    spatial multiplexing mode of operation is considered where all the users are
    simultaneously transmitting their streams to the receiver. Zero-forcing is
    applied along with successive interference cancellation (SIC) as a means for
    efficient detection of the received streams. New analytical closed-form
    expressions are derived for some important performance metrics, namely, the
    outage probability and ergodic capacity of the entire system. Both the
    analytical expressions and simulation results show the impact of imperfect
    channel estimation and hardware impairments to the overall system performance
    in the usage scenarios of massive MIMO and mmWave communication systems.

    Low rank matrix recovery from Clifford orbits

    Richard Kueng, Huangjun Zhu, David Gross
    Comments: 19 pages, no figure
    Subjects: Information Theory (cs.IT); Quantum Physics (quant-ph)

    We prove that low-rank matrices can be recovered efficiently from a small
    number of measurements that are sampled from orbits of a certain matrix group.
    As a special case, our theory makes statements about the phase retrieval
    problem. Here, the task is to recover a vector given only the amplitudes of its
    inner product with a small number of vectors from an orbit. Variants of the
    group in question have appeared under different names in many areas of
    mathematics. In coding theory and quantum information, it is the complex
    Clifford group; in time-frequency analysis the oscillator group; and in
    mathematical physics the metaplectic group. It affords one particularly small
    and highly structured orbit that includes and generalizes the discrete Fourier
    basis: While the Fourier vectors have coefficients of constant modulus and
    phases that depend linearly on their index, the vectors in said orbit have
    phases with a quadratic dependence. In quantum information, the orbit is used
    extensively and is known as the set of stabilizer states. We argue that due to
    their rich geometric structure and their near-optimal recovery properties,
    stabilizer states form an ideal model for structured measurements for phase
    retrieval. Our results hold for \(m \geq C \kappa_r r d \log(d)\)
    measurements, where the oversampling factor \(\kappa_r\) varies between
    \(\kappa_r = 1\) and \(\kappa_r = r^2\) depending on the orbit. The
    reconstruction is stable towards both additive
    noise and deviations from the assumption of low rank. If the matrices of
    interest are in addition positive semidefinite, reconstruction may be performed
    by a simple constrained least squares regression. Our proof methods could be
    adapted to cover orbits of other groups.
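
    The closing remark, that positive semidefinite matrices may be recovered by a
    simple constrained least squares regression, can be illustrated with a
    generic projected-gradient sketch. The rank-one Gaussian measurements below
    are placeholders for the structured stabilizer-state orbit studied in the
    paper, and the step size is a crude heuristic rather than a tuned choice.

        # Hypothetical sketch: recover a low-rank PSD matrix X from measurements
        # b_i = a_i^T X a_i by least squares with projection onto the PSD cone.
        import numpy as np

        rng = np.random.default_rng(0)
        d, r, m = 8, 1, 80
        U = rng.normal(size=(d, r))
        X_true = U @ U.T                                  # rank-r PSD ground truth
        A = rng.normal(size=(m, d))                       # measurement vectors a_i (rows)
        b = np.einsum("mi,ij,mj->m", A, X_true, A)        # b_i = a_i^T X_true a_i

        def project_psd(M):
            w, V = np.linalg.eigh((M + M.T) / 2)
            return (V * np.clip(w, 0, None)) @ V.T        # zero out negative eigenvalues

        X = np.zeros((d, d))
        step = 1.0 / np.linalg.norm(A, ord=2) ** 4        # crude step-size heuristic
        for _ in range(2000):
            resid = np.einsum("mi,ij,mj->m", A, X, A) - b
            grad = A.T @ (resid[:, None] * A)             # sum_i resid_i * a_i a_i^T
            X = project_psd(X - step * grad)

        print("relative error:",
              round(float(np.linalg.norm(X - X_true) / np.linalg.norm(X_true)), 4))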

    Universality of Bayesian mixture predictors

    Daniil Ryabko
    Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Learning (cs.LG)

    The problem is that of sequential probability forecasting for finite-valued
    time series. The data is generated by an unknown probability distribution over
    the space of all one-way infinite sequences. It is known that this measure
    belongs to a given set C, but the latter is completely arbitrary (uncountably
    infinite, without any structure given). The performance is measured with
    asymptotic average log loss. In this work it is shown that the minimax
    asymptotic performance is always attainable, and it is attained by a convex
    combination of countably many measures from the set C (a Bayesian mixture).
    This was previously only known for the case when the best achievable asymptotic
    error is 0. This also contrasts previous results that show that in the
    non-realizable case all Bayesian mixtures may be suboptimal, while there is a
    predictor that achieves the optimal performance.

    Infinite-dimensional Log-Determinant divergences II: Alpha-Beta divergences

    Minh Ha Quang
    Comments: 60 pages
    Subjects: Functional Analysis (math.FA); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)

    This work presents a parametrized family of divergences, namely Alpha-Beta
    Log-Determinant (Log-Det) divergences, between positive definite unitized
    trace class operators on a Hilbert space. This is a generalization of the
    Alpha-Beta Log-Determinant divergences between symmetric, positive definite
    matrices to the infinite-dimensional setting. The family of Alpha-Beta Log-Det
    divergences is highly general and contains many divergences as special cases,
    including the recently formulated infinite dimensional affine-invariant
    Riemannian distance and the infinite-dimensional Alpha Log-Det divergences
    between positive definite unitized trace class operators. In particular, it
    includes a parametrized family of metrics between positive definite trace class
    operators, with the affine-invariant Riemannian distance and the square root of
    the symmetric Stein divergence being special cases. For the Alpha-Beta Log-Det
    divergences between covariance operators on a Reproducing Kernel Hilbert Space
    (RKHS), we obtain closed form formulas via the corresponding Gram matrices.
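
    As a finite-dimensional point of reference for the special cases mentioned
    above, the symmetric Stein divergence and the affine-invariant Riemannian
    distance can be computed directly for SPD matrices. The sketch below uses
    random SPD matrices and is not the infinite-dimensional (unitized trace
    class) operator formulation of the paper.

        # Hypothetical sketch: two special cases of Log-Det-type divergences on
        # SPD matrices -- the symmetric Stein divergence and the affine-invariant
        # Riemannian distance -- evaluated on random SPD matrices.
        import numpy as np
        from scipy.linalg import logm, sqrtm

        def stein_divergence(A, B):
            # S(A, B) = log det((A + B)/2) - (1/2) [log det A + log det B]
            return (np.linalg.slogdet((A + B) / 2)[1]
                    - 0.5 * (np.linalg.slogdet(A)[1] + np.linalg.slogdet(B)[1]))

        def affine_invariant_distance(A, B):
            # d(A, B) = || log(A^{-1/2} B A^{-1/2}) ||_F
            A_inv_sqrt = np.linalg.inv(sqrtm(A))
            return np.linalg.norm(logm(A_inv_sqrt @ B @ A_inv_sqrt), "fro")

        def random_spd(d, seed):
            G = np.random.default_rng(seed).normal(size=(d, d))
            return G @ G.T + d * np.eye(d)

        A, B = random_spd(5, 0), random_spd(5, 1)
        print("symmetric Stein divergence:", round(float(stein_divergence(A, B)), 4))
        print("affine-invariant distance:", round(float(affine_invariant_distance(A, B)), 4))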



