IT博客汇

    arXiv Paper Daily: Wed, 1 Feb 2017

    Published by 我爱机器学习 (52ml.net) on 2017-02-01 00:00:00

    Neural and Evolutionary Computing

    Skip Connections as Effective Symmetry-Breaking

    A. Emin Orhan
    Comments: 16 pages, 12 figures, 1 supplementary figure
    Subjects: Neural and Evolutionary Computing (cs.NE)

    Skip connections made the training of very deep neural networks possible and
    have become an indispensable component in a variety of neural architectures. A
    satisfactory explanation for their success remains elusive. Here, we present an
    explanation for the benefits of skip connections in training very deep neural
    networks. We argue that skip connections help break symmetries inherent in the
    loss landscapes of deep networks, leading to drastically simplified landscapes.
    In particular, skip connections between adjacent layers in a multilayer network
    break the permutation symmetry of nodes in a given layer, and the recently
    proposed DenseNet architecture, where each layer projects skip connections to
    every layer above it, also breaks the rescaling symmetry of connectivity
    matrices between different layers. This hypothesis is supported by evidence
    from a toy model with binary weights and from experiments with fully-connected
    networks suggesting (i) that skip connections do not necessarily improve
    training unless they help break symmetries and (ii) that alternative ways of
    breaking the symmetries also lead to significant performance improvements in
    training deep networks, hence there is nothing special about skip connections
    in this respect. We find, however, that skip connections confer additional
    benefits over and above symmetry-breaking, such as the ability to deal
    effectively with the vanishing gradients problem.
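The permutation symmetry the authors describe can be seen in a tiny numpy sketch (illustrative only; the toy network, seed, and skip placement are our own choices, not the paper's experimental setup): permuting the hidden units of a two-layer network leaves its output unchanged, while an identity skip connection breaks that invariance.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)            # input vector
W1 = rng.standard_normal((4, 4))      # hidden-layer weights
W2 = rng.standard_normal((4, 4))      # output-layer weights
act = np.tanh                         # elementwise nonlinearity

def plain(W1, W2, x):
    # two-layer network without skip connections
    return W2 @ act(W1 @ x)

def skip(W1, W2, x):
    # identity skip connection from the hidden layer to the output
    h = act(W1 @ x)
    return W2 @ h + h

# Permute the hidden units: rows of W1 and, correspondingly, columns of W2.
P = np.eye(4)[[2, 0, 3, 1]]
W1p, W2p = P @ W1, W2 @ P.T

assert np.allclose(plain(W1, W2, x), plain(W1p, W2p, x))    # symmetry holds
assert not np.allclose(skip(W1, W2, x), skip(W1p, W2p, x))  # symmetry broken
```

Since the nonlinearity acts elementwise, it commutes with the permutation, so the plain network's output is invariant; the added identity term is not re-permuted, which is exactly the symmetry-breaking effect the abstract argues for.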

    A Hybrid Approach for Secured Optimal Power Flow and Voltage Stability with TCSC Placement

    Sheila Mahapatra, Nitin Malik
    Comments: Google Scholar Indexed Australian journal, ISSN: 2005-4238, 8 pages, 2 figures, 1 table
    Journal-ref: International Journal of Advanced Science and Technology, vol.89,
    pp.1-8, April 2016
    Subjects: Neural and Evolutionary Computing (cs.NE); Systems and Control (cs.SY)

    This paper proposes a hybrid technique for secured optimal power flow coupled
    with enhanced voltage stability through FACTS device installation. The
    performance of a hybrid approach combining the Improved Gravitational Search
    Algorithm (IGSA) and the Firefly Algorithm (FA) is analyzed by optimally
    placing a TCSC controller. The algorithm is implemented in the MATLAB working
    platform, and power flow security and voltage stability are evaluated on the
    IEEE 30-bus transmission system. The optimal results generated are compared
    with those available in the literature; the superior performance of the
    algorithm is demonstrated by minimum generation cost and reduced real power
    losses while sustaining voltage stability.

    An Extremal Optimization approach to parallel resonance constrained capacitor placement problem

    André R. Goncalves, Celso Cavelucci, Christiano Lyra Filho, Fernando J. Von Zuben
    Comments: Paper published in the 6th IEEE/PES Transmission and Distribution: Latin America, 2012, Montevideo, Uruguay
    Subjects: Optimization and Control (math.OC); Neural and Evolutionary Computing (cs.NE)

    Installing capacitors in distribution networks is one of the most widely used
    procedures to compensate the reactive power generated by loads and,
    consequently, to reduce technical losses. The problem thus consists of
    identifying the optimal placement and sizing of capacitors, known in the
    literature as the optimal capacitor placement problem. Nevertheless, depending
    on its location and size, a capacitor may become a harmonic source, entering
    into resonance with the distribution network and causing several undesired
    side effects. In this work we propose a parsimonious method for the capacitor
    placement problem that incorporates resonance constraints, ensuring that no
    allocated capacitor will act as a harmonic source. The proposed algorithm is
    based on a physics-inspired metaheuristic known as Extremal Optimization. The
    results show that this proposal achieves significant gains compared with
    approaches that attempt to repair, in a post-optimization stage, previously
    obtained solutions that violate the resonance constraints.

    Mixed Low-precision Deep Learning Inference using Dynamic Fixed Point

    Naveen Mellempudi, Abhisek Kundu, Dipankar Das, Dheevatsa Mudigere, Bharat Kaul
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We propose a cluster-based quantization method to convert pre-trained full
    precision weights into ternary weights with minimal impact on the accuracy. In
    addition, we also constrain the activations to 8-bits thus enabling sub 8-bit
    full integer inference pipeline. Our method uses smaller clusters of N filters
    with a common scaling factor to minimize the quantization loss, while also
    maximizing the number of ternary operations. We show that with a cluster size
    of N=4 on ResNet-101 we can achieve 71.8% top-1 accuracy, within 6% of the
    best full-precision results, while replacing approximately 85% of all
    multiplications with 8-bit accumulations. Using the same method with 4-bit
    weights achieves 76.3% top-1 accuracy, which is within 2% of the
    full-precision result. We also study the impact of cluster size on both
    performance and accuracy: larger cluster sizes (N=64) can replace
    approximately 98% of the multiplications with ternary operations but
    introduce a significant drop in accuracy, which necessitates fine-tuning the
    parameters by retraining the network at lower precision. To address this, we
    also trained a low-precision ResNet-50 with 8-bit activations and ternary
    weights by pre-initializing the network with full-precision weights,
    achieving 68.9% top-1 accuracy within 4 additional epochs.
    Our final quantized model can run on a full 8-bit compute pipeline, with a
    potential 16x improvement in performance compared to baseline full-precision
    models.
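As an illustration of cluster-wise ternary quantization, here is a minimal numpy sketch. The threshold rule and the scale estimate follow the common threshold-based ternary scheme and are assumptions, not the authors' exact procedure:

```python
import numpy as np

def ternarize(weights, threshold_ratio=0.7):
    """Quantize one cluster of weights to {-1, 0, +1} with a shared scale.
    (Sketch of threshold-based ternary quantization; the paper's rule may differ.)"""
    delta = threshold_ratio * np.mean(np.abs(weights))  # sparsity threshold
    mask = np.abs(weights) > delta                      # weights kept non-zero
    ternary = (np.sign(weights) * mask).astype(np.int8)
    # One full-precision scaling factor per cluster: mean magnitude of kept weights.
    alpha = np.abs(weights[mask]).mean() if mask.any() else 0.0
    return ternary, alpha

w = np.array([0.9, -0.05, 0.4, -0.8, 0.02, 0.6])
t, a = ternarize(w)
# t is [1, 0, 1, -1, 0, 1]; a is the mean of |0.9|, |0.4|, |0.8|, |0.6| = 0.675
```

At inference time, multiplications by `t * a` reduce to sign-conditional accumulations followed by one multiply by `a` per cluster, which is the source of the speedup the abstract describes.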


    Computer Vision and Pattern Recognition

    DeepNav: Learning to Navigate Large Cities

    Samarth Brahmbhatt, James Hays
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present DeepNav, a Convolutional Neural Network (CNN) based algorithm for
    navigating large cities using locally visible street-view images. The DeepNav
    agent learns to reach its destination quickly by making the correct navigation
    decisions at intersections. We collect a large-scale dataset of street-view
    images organized in a graph where nodes are connected by roads. This dataset
    contains 10 city graphs and more than 1 million street-view images. We propose
    3 supervised learning approaches for the navigation task and show how A* search
    in the city graph can be used to generate supervision for the learning. Our
    annotation process is fully automated using publicly available mapping services
    and requires no human input. We evaluate the proposed DeepNav models on 4
    held-out cities for navigating to 5 different types of destinations. Our
    algorithms outperform previous work that uses hand-crafted features and Support
    Vector Regression (SVR)[19].
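The automated supervision scheme described above can be sketched as follows. Since the city graph is unweighted, plain breadth-first search computes the same shortest-path distances that A* would; the toy graph and function names are illustrative, not from the paper:

```python
from collections import deque

def supervision_labels(graph, goal):
    """Label every node with the neighbor lying on a shortest path to the goal.
    BFS on the unweighted city graph stands in for the paper's A* search."""
    dist = {goal: 0}
    q = deque([goal])
    while q:                          # BFS outward from the goal
        u = q.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    # Best navigation action at each node: move to the neighbor nearest the goal.
    return {u: min(graph[u], key=dist.get) for u in graph if u != goal}

# Tiny chain-shaped "city": a - b - c - d
g = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b', 'd'], 'd': ['c']}
labels = supervision_labels(g, 'd')   # {'a': 'b', 'b': 'c', 'c': 'd'}
```

Each (street-view image, best action) pair produced this way becomes one training example, which is how supervision can be generated with no human input.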

    A New Method for Removing the Moiré Pattern from Images

    Seyede Mahya Hazavei, Hamid Reza Shahdoosti
    Comments: 6 pages, 12 figures, conference
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    During the last decades, denoising methods have attracted much attention from
    researchers. The conventional method for removing the Moiré pattern from
    images is to use notch filters in the frequency domain. In this paper a new
    method is proposed that achieves better performance than the traditional
    method: a median filter is applied to part of the spectrum of noisy images to
    reduce the noise. In the second part of this paper, to demonstrate the
    robustness of the proposed method, it is applied to several noisy images that
    exhibit a Moiré pattern. Experiments on noisy images with different
    characteristics show that the proposed method increases the PSNR values
    compared with previous methods.
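A minimal numpy sketch of the spectral median-filter idea; the window size, the filtered region (here the whole magnitude spectrum), and the DC handling are our assumptions, and the paper's exact design may differ:

```python
import numpy as np

def suppress_moire(img, win=3):
    """Median-filter the magnitude spectrum of an image, keep the phase,
    and invert the FFT (a sketch of frequency-domain Moiré suppression)."""
    F = np.fft.fftshift(np.fft.fft2(img))
    mag, phase = np.abs(F), np.angle(F)
    pad = win // 2
    padded = np.pad(mag, pad, mode='edge')
    out = np.empty_like(mag)
    h, w = mag.shape
    for i in range(h):                # sliding-window median over the magnitude
        for j in range(w):
            out[i, j] = np.median(padded[i:i + win, j:j + win])
    out[h // 2, w // 2] = mag[h // 2, w // 2]   # keep the DC term intact
    F_clean = out * np.exp(1j * phase)
    return np.real(np.fft.ifft2(np.fft.ifftshift(F_clean)))

# A small test image: a smooth ramp plus a periodic (Moiré-like) pattern.
y, x = np.mgrid[0:32, 0:32]
img = (x / 31.0) + 0.3 * np.sin(2 * np.pi * x / 4.0)
clean = suppress_moire(img)
```

The median flattens the isolated spectral peaks that a periodic Moiré pattern produces, while the smooth bulk of the spectrum (and thus the underlying image content) is largely preserved.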

    A novel method for automatic localization of joint area on knee plain radiographs

    Aleksei Tiulpin, Jérôme Thevenot, Esa Rahtu, Simo Saarakkala
    Comments: Submitted to Scandinavian Conference on Image Analysis (SCIA) 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Osteoarthritis (OA) is a common musculoskeletal condition typically diagnosed
    from radiographic assessment after clinical examination. However, a visual
    evaluation made by a practitioner suffers from subjectivity and is highly
    dependent on experience. Computer-aided diagnostics (CAD) could improve the
    objectivity of knee radiographic examination. The first essential step of knee
    OA CAD is to automatically localize the joint area; however, according to the
    literature this task itself remains challenging. The aim of this study was to
    develop a novel and computationally efficient method to tackle the issue.
    Three different datasets of knee radiographs were used (n = 473/93/77) to
    validate the overall performance of the method. Our pipeline consists of two
    parts: anatomically-based joint area proposals and their evaluation using
    Histogram of Oriented Gradients features and pre-trained Support Vector
    Machine classifier scores. The obtained results for the datasets show mean
    intersection over union scores of 0.84, 0.79 and 0.78. Using a high-end
    computer, the method can automatically annotate conventional knee radiographs
    within 14-16 ms and high-resolution ones within 170 ms. Our results
    demonstrate that the developed method is suitable for large-scale analyses.

    Deep Multitask Architecture for Integrated 2D and 3D Human Sensing

    Alin-Ionut Popa, Mihai Zanfir, Cristian Sminchisescu
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a deep multitask architecture for fully automatic 2d and 3d human
    sensing (DMHS), including recognition and reconstruction, in monocular
    images. The system computes the figure-ground segmentation,
    semantically identifies the human body parts at pixel level, and estimates the
    2d and 3d pose of the person. The model supports the joint training of all
    components by means of multi-task losses where early processing stages
    recursively feed into advanced ones for increasingly complex calculations,
    accuracy and robustness. The design allows us to tie a complete training
    protocol, by taking advantage of multiple datasets that would otherwise
    restrictively cover only some of the model components: complex 2d image data
    with no body part labeling and without associated 3d ground truth, or complex
    3d data with limited 2d background variability. In detailed experiments based
    on several challenging 2d and 3d datasets (LSP, HumanEva, Human3.6M), we
    evaluate the sub-structures of the model, the effect of various types of
    training data in the multitask loss, and demonstrate that state-of-the-art
    results can be achieved at all processing levels. We also show that, in the
    wild, our monocular RGB architecture is perceptually competitive with a
    state-of-the-art (commercial) Kinect system based on RGB-D data.

    Towards Adversarial Retinal Image Synthesis

    Pedro Costa, Adrian Galdran, Maria Inês Meyer, Michael David Abràmoff, Meindert Niemeijer, Ana Maria Mendonça, Aurélio Campilho
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    Synthesizing images of the eye fundus is a challenging task that has been
    previously approached by formulating complex models of the anatomy of the eye.
    New images can then be generated by sampling a suitable parameter space. In
    this work, we propose a method that learns to synthesize eye fundus images
    directly from data. For that, we pair true eye fundus images with their
    respective vessel trees, by means of a vessel segmentation technique. These
    pairs are then used to learn a mapping from a binary vessel tree to a new
    retinal image. For this purpose, we use a recent image-to-image translation
    technique, based on the idea of adversarial learning. Experimental results show
    that the original and the generated images are visually different in terms of
    their global appearance, in spite of sharing the same vessel tree.
    Additionally, a quantitative quality analysis of the synthetic retinal images
    confirms that the produced images retain a high proportion of the true image
    set quality.

    Supervised Learning in Automatic Channel Selection for Epileptic Seizure Detection

    Nhan Truong, Levin Kuhlmann, Mohammad Reza Bonyadi, Jiawei Yang, Andrew Faulks, Omid Kavehei
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Detecting seizures using brain neuroactivations recorded by intracranial
    electroencephalography (iEEG) has been widely used for monitoring, diagnosing,
    and closed-loop therapy of epileptic patients. However, computational
    efficiency gains are needed if state-of-the-art methods are to be implemented
    in implanted devices. We present a novel method for automatic seizure
    detection based on iEEG data that outperforms current state-of-the-art seizure
    detection methods in terms of computational efficiency while maintaining
    accuracy.
    The proposed algorithm incorporates an automatic channel selection (ACS) engine
    as a pre-processing stage to the seizure detection procedure. The ACS engine
    consists of supervised classifiers which aim to find the iEEG channels which
    contribute the most to a seizure. The seizure detection stage involves feature
    extraction and classification. Feature extraction is performed in both the
    frequency and time domains, where spectral power and correlation between
    channel pairs are calculated. A Random Forest is used for classification of
    the interictal, ictal and early ictal periods of iEEG signals. Seizure
    detection in this paper is retrospective and patient-specific. The iEEG data
    is accessed via Kaggle, provided by the International Epilepsy
    Electrophysiology Portal. The dataset includes a training set of 6.5 hours of
    interictal data and 41 min of ictal data, and a test set of 9.14 hours.
    Compared to the state-of-the-art on the same dataset, we achieve a 49.4%
    increase in computational efficiency and a 400 min better average detection
    delay. The proposed model is able to detect a seizure onset at 91.95%
    sensitivity and 94.05% specificity with a mean detection delay of 2.77 s. The
    area under the curve (AUC) is 96.44%, which is comparable to the current
    state-of-the-art AUC of 96.29%.
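The feature-extraction step (spectral power plus channel-pair correlation) can be sketched in numpy as below; the band edges, sampling rate, segment length, and channel count are illustrative assumptions, not the authors' configuration:

```python
import numpy as np

def extract_features(segment, fs=400):
    """Features for one iEEG segment of shape (channels, samples):
    per-channel band power plus pairwise channel correlations (a sketch)."""
    bands = [(1, 4), (4, 8), (8, 12), (12, 30), (30, 70)]  # Hz, illustrative
    freqs = np.fft.rfftfreq(segment.shape[1], d=1.0 / fs)
    psd = np.abs(np.fft.rfft(segment, axis=1)) ** 2        # per-channel spectrum
    power = [psd[:, (freqs >= lo) & (freqs < hi)].sum(axis=1) for lo, hi in bands]
    corr = np.corrcoef(segment)                            # channel-pair correlation
    upper = corr[np.triu_indices_from(corr, k=1)]          # unique pairs only
    return np.concatenate([np.concatenate(power), upper])

seg = np.random.default_rng(1).standard_normal((4, 400))   # 4 channels, 1 s
feats = extract_features(seg)
# 4 channels x 5 bands + C(4,2) correlation pairs = 26 features
```

A vector like this, computed per time window, is what a Random Forest can then classify into interictal, early ictal, and ictal periods.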

    Deep Reinforcement Learning for Visual Object Tracking in Videos

    Da Zhang, Hamid Maei, Xin Wang, Yuan-Fang Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Convolutional neural network (CNN) models have achieved tremendous success in
    many visual detection and recognition tasks. Unfortunately, visual tracking, a
    fundamental computer vision problem, is not handled well using the existing CNN
    models, because most object trackers implemented with CNN do not effectively
    leverage temporal and contextual information among consecutive frames.
    Recurrent neural network (RNN) models, on the other hand, are often used to
    process text and voice data due to their ability to learn intrinsic
    representations of sequential and temporal data. Here, we propose a novel
    neural network tracking model that is capable of integrating information over
    time and tracking a selected target in video. It comprises three components: a
    CNN extracting best tracking features in each video frame, an RNN constructing
    video memory state, and a reinforcement learning (RL) agent making target
    location decisions. The tracking problem is formulated as a decision-making
    process, and our model can be trained with RL algorithms to learn good tracking
    policies that pay attention to continuous, inter-frame correlation and maximize
    tracking performance in the long run. We compare our model with an existing
    neural-network based tracking method and show that the proposed tracking
    approach works well in various scenarios by performing rigorous validation
    experiments on artificial video sequences with ground truth. To the best of our
    knowledge, our tracker is the first neural-network tracker that combines
    convolutional and recurrent networks with RL algorithms.

    Co-segmentation for Space-Time Co-located Collections

    Hadar Averbuch-Elor, Johannes Kopf, Tamir Hazan, Daniel Cohen-Or
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a co-segmentation technique for space-time co-located image
    collections. These prevalent collections capture various dynamic events,
    usually by multiple photographers, and may contain multiple co-occurring
    objects which are not necessarily part of the intended foreground object,
    resulting in ambiguities for traditional co-segmentation techniques. Thus, to
    disambiguate what the common foreground object is, we introduce a
    weakly-supervised technique, where we assume only a small seed, given in the
    form of a single segmented image. We take a distributed approach, where local
    belief models are propagated and reinforced with similar images. Our technique
    progressively expands the foreground and background belief models across the
    entire collection. The technique exploits the power of the entire set of images
    without building a global model, and thus successfully overcomes large
    variability in appearance of the common foreground object. We demonstrate that
    our method outperforms previous co-segmentation techniques on challenging
    space-time co-located collections, including dense benchmark datasets which
    were adapted for our novel problem setting.

    Feature Selection based on PCA and PSO for Multimodal Medical Image Fusion using DTCWT

    Padmavathi K, Mahima Bhat, Maya V Karki
    Comments: 8 pages, 6 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Multimodal medical image fusion helps to increase efficiency in medical
    diagnosis. This paper presents multimodal medical image fusion by selecting
    relevant features using Principal Component Analysis (PCA) and Particle Swarm
    Optimization (PSO) techniques. The dual-tree complex wavelet transform
    (DTCWT) is used to decompose the images into low- and high-frequency
    coefficients. Fusion rules such as combinations of minimum, maximum and
    simple averaging are applied to the approximation and detail coefficients.
    The fused image is reconstructed by the inverse DTCWT. Performance metrics
    are evaluated, showing that DTCWT-PCA performs better than DTCWT-PSO in terms
    of Structural Similarity Index Measure (SSIM) and Cross Correlation (CC).
    Computation time and feature vector size are reduced in DTCWT-PCA compared to
    DTCWT-PSO, demonstrating its robustness and lower storage requirements.

    3D Shape Retrieval via Irrelevance Filtering and Similarity Ranking (IF/SR)

    Xiaqing Pan, Yueru Chen, C.-C. Jay Kuo
    Comments: arXiv admin note: text overlap with arXiv:1603.01942
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A novel solution for the content-based 3D shape retrieval problem using an
    unsupervised clustering approach, which does not need any label information of
    3D shapes, is presented in this work. The proposed shape retrieval system
    consists of two modules in cascade: the irrelevance filtering (IF) module and
    the similarity ranking (SR) module. The IF module attempts to cluster gallery
    shapes that are similar to each other by examining global and local features
    simultaneously. However, shapes that are close in the local feature space can
    be distant in the global feature space, and vice versa. To resolve this issue,
    we propose a joint cost function that strikes a balance between two distances.
    Irrelevant samples that are close in the local feature space but distant in the
    global feature space can be removed in this stage. The remaining gallery
    samples are ranked in the SR module using the local feature. The superior
    performance of the proposed IF/SR method is demonstrated by extensive
    experiments conducted on the popular SHREC12 dataset.

    Language Independent Single Document Image Super-Resolution using CNN for improved recognition

    Ram Krishna Pandey, A G Ramakrishnan
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recognition of document images has important applications in restoring old
    and classical texts. The problem involves improving image quality before
    passing the image to a properly trained OCR to obtain accurate recognition of
    the text. Image enhancement and quality improvement constitute important
    steps, as subsequent recognition depends upon the quality of the input image.
    There are scenarios where high-resolution images are not available, and our
    experiments show that OCR accuracy reduces significantly with decreasing
    spatial resolution of document images. Thus the only option is to improve the
    resolution of such document images. The goal is to construct a high-resolution
    image, given a single low-resolution binary image, which constitutes the
    problem of single image super-resolution. Most previous work in
    super-resolution deals with natural images, which have more information
    content than document images. Here, we use a Convolutional Neural Network to
    learn the mapping between low- and the corresponding high-resolution images.
    We experiment with different numbers of layers, parameter settings and
    non-linear functions to build a fast end-to-end framework for document image
    super-resolution. Our proposed model shows a very good PSNR improvement of
    about 4 dB on 75 dpi Tamil images, resulting in a 3% improvement in
    word-level accuracy by the OCR. It takes less time than a recent
    sparsity-based natural image super-resolution technique, making it useful for
    real-time document recognition applications.

    Fully Convolutional Architectures for Multi-Class Segmentation in Chest Radiographs

    Alexey A. Novikov, David Major, Dimitrios Lenis, Jiri Hladůvka, Maria Wimmer, Katja Bühler
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The recent success of Deep Convolutional Neural Networks on image
    classification and recognition tasks has led to new applications in highly
    diverse contexts. One of these is medical imaging, where the scarcity and
    imbalance of training data have hindered the rapid development of neural
    network applications.

    This paper investigates and proposes neural network architectures for the
    automated segmentation of anatomical organs in chest radiographs, namely the
    lungs, clavicles and heart. By relating prior class data distributions to the
    objective function, sparsely represented structures are methodologically
    emphasized. Scarce training data is countered with data augmentation and
    aggressive regularization. The problem of highly imbalanced target object
    appearance in the input data is solved by modifying the objective function.

    The models are trained and tested on the publicly available JSRT database,
    consisting of 247 X-ray images for which ground-truth masks are available in
    the SCR database. The networks have been trained in a multi-class setup with
    three target classes.

    Our best performing model, trained with the negative Dice loss function, was
    able to reach mean Jaccard overlap scores of 94.1% for lungs, 86.6% for heart
    and 88.7% for clavicles in the multi-label setup, thereby outperforming the
    best state-of-the-art methods for heart and clavicles, and the human observer
    on lung and heart segmentation tasks.
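For reference, the negative Dice objective named above can be written as a small numpy sketch (per-class soft Dice on probability maps; the smoothing constant is our assumption):

```python
import numpy as np

def negative_dice(pred, target, eps=1e-7):
    """Negative soft Dice coefficient for one class map.
    pred: predicted probabilities, target: binary ground-truth mask."""
    intersection = np.sum(pred * target)
    return -(2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

target = np.array([[1, 1], [0, 0]], float)
assert abs(negative_dice(target, target) + 1.0) < 1e-6   # perfect overlap -> -1
```

Because the loss normalizes by the total predicted and true foreground mass rather than by pixel count, small structures such as clavicles contribute as strongly as large ones, which is one way to counter class imbalance.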

    SenseGen: A Deep Learning Architecture for Synthetic Sensor Data Generation

    Moustafa Alzantot, Supriyo Chakraborty, Mani B. Srivastava
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    Our ability to synthesize sensory data that preserves specific statistical
    properties of the real data has had tremendous implications for data privacy
    and big data analytics. The synthetic data can be used as a substitute for
    selected real data segments that are sensitive to the user, thus protecting
    privacy and resulting in improved analytics. However, the increasingly
    adversarial roles taken by data recipients such as mobile apps, or other
    cloud-based analytics services, mandate that the synthetic data, in addition
    to preserving statistical properties, should also be difficult to distinguish
    from the real data. Typically, visual inspection has been used as a test to
    distinguish between datasets. But more recently, sophisticated classifier
    models (discriminators), corresponding to a set of events, have also been
    employed to distinguish between synthesized and real data. The model operates
    on both datasets and the respective event outputs are compared for
    consistency. In this paper, we take a step towards generating sensory data
    that can pass a deep-learning-based discriminator model test, and make two
    specific contributions. First, we present a deep learning based architecture
    for synthesizing sensory data; this architecture comprises a generator model,
    which is a stack of multiple Long Short-Term Memory (LSTM) networks and a
    Mixture Density Network. Second, we use another LSTM-network-based
    discriminator model for distinguishing between the true and the synthesized
    data. Using a dataset of accelerometer traces, collected using the
    smartphones of users going about their daily activities, we show that the
    deep-learning-based discriminator model can only distinguish between the real
    and synthesized traces with an accuracy in the neighborhood of 50%.


    Artificial Intelligence

    On the Semantics and Complexity of Probabilistic Logic Programs

    Fabio Gagliardi Cozman, Denis Deratani Mauá
    Subjects: Artificial Intelligence (cs.AI)

    We examine the meaning and the complexity of probabilistic logic programs
    that consist of a set of rules and a set of independent probabilistic facts
    (that is, programs based on Sato’s distribution semantics). We focus on two
    semantics, respectively based on stable and on well-founded models. We show
    that the semantics based on stable models (referred to as the “credal
    semantics”) produces sets of probability models that dominate infinitely
    monotone Choquet capacities, and we describe several useful consequences of
    this result. We then examine the complexity of inference with probabilistic
    logic
    programs. We distinguish between the complexity of inference when a
    probabilistic program and a query are given (the inferential complexity), and
    the complexity of inference when the probabilistic program is fixed and the
    query is given (the query complexity, akin to data complexity as used in
    database theory). We obtain results on the inferential and query complexity
    for acyclic, stratified, and cyclic propositional and relational programs;
    the complexity reaches various levels of the counting hierarchy and even
    exponential levels.

    Interaction Information for Causal Inference: The Case of Directed Triangle

    AmirEmad Ghassami, Negar Kiyavash
    Subjects: Artificial Intelligence (cs.AI)

    Interaction information is one of the multivariate generalizations of mutual
    information; it expresses the amount of information shared among a set of
    variables beyond the information shared in any proper subset of those
    variables. Unlike (conditional) mutual information, which is always
    non-negative, interaction information can be negative. We utilize this
    property to find the direction of causal influences among variables in a
    triangle topology under some mild assumptions.
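Using the inclusion-exclusion (co-information) form over marginal entropies, the quantity and its possible negativity can be checked on the classic XOR example. This is a sketch; note that the sign convention for interaction information varies across the literature:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def interaction_information(pxyz):
    """I(X;Y;Z) from a joint pmf indexed [x, y, z], via inclusion-exclusion:
    H(X)+H(Y)+H(Z) - H(X,Y)-H(X,Z)-H(Y,Z) + H(X,Y,Z)."""
    Hx = entropy(pxyz.sum(axis=(1, 2)))
    Hy = entropy(pxyz.sum(axis=(0, 2)))
    Hz = entropy(pxyz.sum(axis=(0, 1)))
    Hxy = entropy(pxyz.sum(axis=2).ravel())
    Hxz = entropy(pxyz.sum(axis=1).ravel())
    Hyz = entropy(pxyz.sum(axis=0).ravel())
    Hxyz = entropy(pxyz.ravel())
    return Hx + Hy + Hz - Hxy - Hxz - Hyz + Hxyz

# Z = X XOR Y with uniform independent X, Y: the textbook negative case.
p = np.zeros((2, 2, 2))
for x in range(2):
    for y in range(2):
        p[x, y, x ^ y] = 0.25
ii = interaction_information(p)   # -1 bit under this convention
```

Here every pair of variables is independent, yet any two jointly determine the third; the resulting -1 bit is exactly the kind of negativity the abstract exploits for causal orientation.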

    Expert Level control of Ramp Metering based on Multi-task Deep Reinforcement Learning

    Francois Belletti, Daniel Haziza, Gabriel Gomes, Alexandre M. Bayen
    Subjects: Artificial Intelligence (cs.AI)

    This article shows how the recent breakthroughs in Reinforcement Learning
    (RL) that have enabled robots to learn to play arcade video games, walk or
    assemble colored bricks, can be used to perform other tasks that are currently
    at the core of engineering cyberphysical systems. We present the first use of
    RL for the control of systems modeled by discretized non-linear Partial
    Differential Equations (PDEs) and devise a novel algorithm to use
    non-parametric control techniques for large multi-agent systems. We show how
    neural network based RL enables the control of discretized PDEs whose
    parameters are unknown, random, and time-varying. We introduce an algorithm of
    Mutual Weight Regularization (MWR) which alleviates the curse of dimensionality
    of multi-agent control schemes by sharing experience between agents while
    giving each agent the opportunity to specialize its action policy so as to
    tailor it to the local parameters of the part of the system it is located in.

    Robust Multilingual Named Entity Recognition with Shallow Semi-Supervised Features

    Rodrigo Agerri, German Rigau
    Comments: 26 pages, 19 tables (submitted for publication on September 2015), Artificial Intelligence (2016)
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    We present a multilingual Named Entity Recognition approach based on a robust
    and general set of features across languages and datasets. Our system combines
    shallow local information with clustering semi-supervised features induced on
    large amounts of unlabeled text. Understanding via empirical experimentation
    how to effectively combine various types of clustering features allows us to
    seamlessly export our system to other datasets and languages. The result is a
    simple but highly competitive system which obtains state of the art results
    across five languages and twelve datasets. The results are reported on standard
    shared task evaluation data such as CoNLL for English, Spanish and Dutch.
    Furthermore, and despite the lack of linguistically motivated features, we also
    report best results for languages such as Basque and German. In addition, we
    demonstrate that our method also obtains very competitive results even when the
    amount of supervised data is cut by half, alleviating the dependency on
    manually annotated data. Finally, the results show that our emphasis on
    clustering features is crucial to develop robust out-of-domain models. The
    system and models are freely available to facilitate their use and guarantee
    the reproducibility of results.

    Efficient Rank Aggregation via Lehmer Codes

    Pan Li, Arya Mazumdar, Olgica Milenkovic
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    We propose a novel rank aggregation method based on converting permutations
    into their corresponding Lehmer codes or other subdiagonal images. Lehmer
    codes, also known as inversion vectors, are vector representations of
    permutations in which each coordinate can take values not restricted by the
    values of other coordinates. This transformation allows for decoupling of the
    coordinates and for performing aggregation via simple scalar median or mode
    computations. We present simulation results illustrating the performance of
    this completely parallelizable approach and analytically prove that both the
    mode and median aggregation procedure recover the correct centroid aggregate
    with small sample complexity when the permutations are drawn according to the
    well-known Mallows models. The proposed Lehmer code approach may also be used
    on partial rankings, with similar performance guarantees.
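
    The encode/aggregate/decode pipeline described above can be sketched in a
    few lines of Python. This is an illustrative sketch, not the paper's
    implementation; the Lehmer-code convention used here (counting smaller
    elements to the right of each position) is one of several equivalent
    choices.

```python
import statistics

def lehmer_code(perm):
    """Lehmer code: c_i = number of elements right of position i that are smaller."""
    return [sum(perm[j] < perm[i] for j in range(i + 1, len(perm)))
            for i in range(len(perm))]

def lehmer_decode(code):
    """Invert the code: perm[i] is the c_i-th smallest of the unused symbols."""
    available = sorted(range(len(code)))
    return [available.pop(c) for c in code]

def aggregate(perms):
    """Rank aggregation via coordinate-wise medians in Lehmer space.

    Coordinate i can take any value in 0..n-1-i independently of the other
    coordinates, so each one is aggregated by a simple scalar median."""
    n = len(perms[0])
    codes = [lehmer_code(p) for p in perms]
    med = [min(int(statistics.median(c[i] for c in codes)), n - 1 - i)
           for i in range(n)]
    return lehmer_decode(med)
```

    Because the coordinates decouple, the median step parallelizes trivially
    across coordinates, which is the property the abstract highlights.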

    Comparing Dataset Characteristics that Favor the Apriori, Eclat or FP-Growth Frequent Itemset Mining Algorithms

    Jeff Heaton
    Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)

    Frequent itemset mining is a popular data mining technique. Apriori, Eclat,
    and FP-Growth are among the most common algorithms for frequent itemset mining.
    Considerable research has been performed to compare the relative performance
    between these three algorithms, by evaluating the scalability of each algorithm
    as the dataset size increases. While scalability as data size increases is
    important, previous papers have not examined the performance impact of
    similarly sized datasets that contain different itemset characteristics. This
    paper explores the effects that two dataset characteristics can have on the
    performance of these three frequent itemset algorithms. To perform this
    empirical analysis, a dataset generator is created to measure the effects of
    frequent item density and the maximum transaction size on performance. The
    generated datasets contain the same number of rows. This provides some insight
    into dataset characteristics that are conducive to each algorithm. The results
    of this paper’s research demonstrate Eclat and FP-Growth both handle increases
    in maximum transaction size and frequent itemset density considerably better
    than the Apriori algorithm.
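
    For readers comparing the three algorithms, the level-wise search that
    characterizes Apriori (and that struggles as transactions grow longer and
    denser) can be sketched minimally in Python; this toy version omits the
    usual subset-based candidate pruning.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Minimal level-wise Apriori: returns frequent itemsets -> support count."""
    tsets = [frozenset(t) for t in transactions]
    items = {i for t in tsets for i in t}
    freq = {}
    # start from frequent 1-itemsets
    current = [frozenset([i]) for i in items
               if sum(i in t for t in tsets) >= min_support]
    k = 1
    while current:
        counts = {c: sum(c <= t for t in tsets) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        freq.update(level)
        # candidate generation: join frequent k-itemsets into (k+1)-itemsets
        keys = list(level)
        current = list({a | b for a, b in combinations(keys, 2)
                        if len(a | b) == k + 1})
        k += 1
    return freq
```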

    CommAI: Evaluating the first steps towards a useful general AI

    Marco Baroni, Armand Joulin, Allan Jabri, Germàn Kruszewski, Angeliki Lazaridou, Klemen Simonic, Tomas Mikolov
    Comments: Submitted to ICLR 2017 Workshop Track
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    With machine learning successfully applied to new daunting problems almost
    every day, general AI starts looking like an attainable goal. However, most
    current research focuses instead on important but narrow applications, such as
    image classification or machine translation. We believe this to be largely due
    to the lack of objective ways to measure progress towards broad machine
    intelligence. In order to fill this gap, we propose here a set of concrete
    desiderata for general AI, together with a platform to test machines on how
    well they satisfy such desiderata, while keeping all further complexities to a
    minimum.

    Deep Reinforcement Learning for Robotic Manipulation-The state of the art

    Smruti Amarjyoti
    Comments: 18 pages
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

    The focus of this work is to enumerate the various approaches and algorithms
    that center around the application of reinforcement learning in robotic
    manipulation tasks. Earlier methods utilized specialized policy
    representations and human demonstrations to constrain the policy. Such
    methods worked well with the continuous state and policy spaces of robots
    but failed to produce generalized policies. Subsequently, high-dimensional
    non-linear function approximators like neural networks have been used to
    learn policies from scratch. Several novel and recent approaches have also
    combined the control policy with efficient perceptual representations using
    deep learning. This has led to the emergence of a new branch of dynamic
    robot control called deep reinforcement learning (DRL). This work embodies a
    survey of the most recent algorithms, architectures and their
    implementations in simulations and on real-world robotic platforms. The
    gamut of DRL architectures is partitioned into two branches, namely discrete
    action space algorithms (DAS) and continuous action space algorithms (CAS).
    Further, the CAS algorithms are divided into stochastic continuous action
    space (SCAS) and deterministic continuous action space (DCAS) algorithms.
    Along with elucidating an organisation of the DRL algorithms, this work also
    presents some of the state-of-the-art applications of these approaches in
    robotic manipulation tasks.

    Algorithm selection of off-policy reinforcement learning algorithm

    Laroche Romain, Feraud Raphael
    Comments: 27 pages
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Optimization and Control (math.OC)

    Dialogue systems rely on careful reinforcement learning design choices: the
    learning algorithm and its state space representation. Lacking more
    rigorous knowledge, the designer resorts to practical experience to choose
    the best option. In order to automate and improve this process, this
    article formalises the problem of online off-policy reinforcement learning
    algorithm selection. The meta-algorithm takes as input a portfolio of
    several off-policy reinforcement learning algorithms. At the beginning of
    each new trajectory, it determines which algorithm in the portfolio
    controls the behaviour during that full trajectory, in order to maximise
    the return. The article presents a novel meta-algorithm, called Epochal
    Stochastic Bandit Algorithm Selection (ESBAS). Its principle is to freeze
    the policy updates at each epoch, and to leave a rebooted stochastic bandit
    in charge of the algorithm selection. Under some assumptions, a thorough
    theoretical analysis demonstrates its near-optimality given the structural
    sampling budget limitations. ESBAS is then put to the test in a set of
    experiments with various portfolios on a negotiation dialogue game. The
    results show the practical benefits of algorithm selection for dialogue
    systems, in most cases even outperforming the best algorithm in the
    portfolio, even when the aforementioned assumptions are violated.
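
    The selection mechanism can be illustrated with a toy sketch: within one
    epoch the policies are frozen, and a freshly reset UCB1 bandit chooses, per
    trajectory, which portfolio algorithm is in control. This is a simplified
    illustration using plain UCB1, not the full ESBAS meta-algorithm or its
    analysis.

```python
import math

def esbas_epoch(portfolio, trajectories_per_epoch, sample_return):
    """One epoch (sketch): policies frozen, a reset UCB1 bandit picks the
    controlling algorithm for each trajectory and observes its return."""
    counts = [0] * len(portfolio)
    sums = [0.0] * len(portfolio)
    for t in range(1, trajectories_per_epoch + 1):
        # UCB1 index; untried arms are selected first
        ucb = [float('inf') if counts[a] == 0
               else sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])
               for a in range(len(portfolio))]
        a = max(range(len(portfolio)), key=lambda i: ucb[i])
        counts[a] += 1
        sums[a] += sample_return(portfolio[a])
    return counts
```

    With deterministic toy returns, the bandit concentrates its trajectory
    budget on the better algorithm while still exploring the other.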

    C3A: A Cognitive Collaborative Control Architecture For an Intelligent Wheelchair

    Rupam Bhattacharyya, Adity Saikia, Shyamanta M. Hazarika
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

    Retention of residual skills for persons who partially lose their cognitive
    or physical ability is of utmost importance. Research is focused on developing
    systems that provide need-based assistance for retention of such residual
    skills. This paper describes a novel cognitive collaborative control
    architecture, C3A, designed to address the challenges of developing
    need-based assistance for wheelchair navigation. The organization of C3A is
    detailed and results from a simulation of the proposed architecture are
    presented. For
    simulation of our proposed architecture, we have used ROS (Robot Operating
    System) as a control framework and a 3D robotic simulator called USARSim
    (Unified System for Automation and Robot Simulation).


    Information Retrieval

    Towards A Time Based Video Search Engine for Al Quran Interpretation

    Maged M. Eljazzar, Afnan Hassan, Amira A. AlSharkawy
    Subjects: Information Retrieval (cs.IR); Computers and Society (cs.CY)

    The number of Muslim Internet users is increasing remarkably across the
    world. There are many structured and well-documented text resources for
    Quran interpretation, Tafsir, on the Internet, in several languages.
    Nevertheless, when searching for the meaning of specific words, many users
    prefer watching short videos rather than reading a script or a book. This
    paper introduces a solution to the challenge of partitioning common Tafsir
    videos into short clips according to the search query and sharing the
    resulting videos on social networks. Furthermore, we let users comment on a
    specific time-based frame of a video or a specific verse in a particular
    subject. It would be very valuable to apply current technologies to the
    Holy Quran and Tafsir to ease querying for verses, understanding their
    meaning, and sharing them on different social media.

    Integrating Reviews into Personalized Ranking for Cold Start Recommendation

    Guang-Neng Hu
    Comments: 12 pages, PAKDD 2017
    Subjects: Information Retrieval (cs.IR)

    The item recommendation task predicts a personalized ranking over a set of
    items for an individual user. One paradigm is the rating-based methods,
    which concentrate on explicit feedback and hence face difficulties in
    collecting it. Meanwhile, the ranking-based methods are presented with
    rated items and then rank the rated above the unrated. This paradigm uses
    widely available implicit feedback, but it usually ignores an important
    source of information: item reviews. Item reviews not only justify the
    preferences of users, but also help alleviate the cold-start problem that
    defeats collaborative filtering. In this paper, we
    propose two novel and simple models to integrate item reviews into matrix
    factorization based Bayesian personalized ranking (BPR-MF). In each model, we
    make use of text features extracted from item reviews via word embeddings. On
    top of text features we uncover the review dimensions that explain the
    variation in users’ feedback and these review factors represent a prior
    preference of a user. Experiments on real-world data sets show the benefits of
    leveraging item reviews on ranking prediction. We also conduct analyses to
    understand the proposed models.
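
    As background, a single SGD step of the BPR-MF backbone that both proposed
    models extend can be sketched as follows. This is plain BPR without the
    review-based features, and the learning-rate and regularization values are
    illustrative, not the paper's settings.

```python
import numpy as np

def bpr_step(U, V, u, i, j, lr=0.05, reg=0.01):
    """One SGD step of matrix-factorization BPR on a (user u, preferred item i,
    less-preferred item j) triple: push x_ui = <U[u], V[i]> above x_uj."""
    wu, hi, hj = U[u].copy(), V[i].copy(), V[j].copy()
    g = 1.0 / (1.0 + np.exp(wu @ (hi - hj)))   # sigmoid(-x_uij)
    U[u] += lr * (g * (hi - hj) - reg * wu)
    V[i] += lr * (g * wu - reg * hi)
    V[j] += lr * (-g * wu - reg * hj)
```

    Iterating this step over sampled triples raises the score of observed
    (positive) items above unobserved ones for each user.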

    Ties That Bind – Characterizing Classes by Attributes and Social Ties

    Aria Rezaei, Bryan Perozzi, Leman Akoglu
    Comments: WWW’17 Web Science, 9 pages
    Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR); Physics and Society (physics.soc-ph)

    Given a set of attributed subgraphs known to be from different classes, how
    can we discover their differences? There are many cases where collections of
    subgraphs may be contrasted against each other. For example, they may be
    assigned ground truth labels (spam/not-spam), or it may be desired to directly
    compare the biological networks of different species or compound networks of
    different chemicals.

    In this work we introduce the problem of characterizing the differences
    between attributed subgraphs that belong to different classes. We define this
    characterization problem as one of partitioning the attributes into as many
    groups as the number of classes, while maximizing the total attributed quality
    score of all the given subgraphs.

    We show that our attribute-to-class assignment problem is NP-hard and an
    optimal ((1 – 1/e))-approximation algorithm exists. We also propose two
    different faster heuristics that are linear-time in the number of attributes
    and subgraphs. Unlike previous work where only attributes were taken into
    account for characterization, here we exploit both attributes and social ties
    (i.e. graph structure).

    Through extensive experiments, we compare our proposed algorithms, show
    findings that agree with human intuition on datasets from Amazon co-purchases,
    Congressional bill sponsorships, and DBLP co-authorships. We also show that our
    approach of characterizing subgraphs is better suited for sense-making than
    discriminating classification approaches.


    Computation and Language

    Robust Multilingual Named Entity Recognition with Shallow Semi-Supervised Features

    Rodrigo Agerri, German Rigau
    Comments: 26 pages, 19 tables (submitted for publication on September 2015), Artificial Intelligence (2016)
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    We present a multilingual Named Entity Recognition approach based on a robust
    and general set of features across languages and datasets. Our system combines
    shallow local information with clustering semi-supervised features induced on
    large amounts of unlabeled text. Understanding via empirical experimentation
    how to effectively combine various types of clustering features allows us to
    seamlessly export our system to other datasets and languages. The result is a
    simple but highly competitive system which obtains state of the art results
    across five languages and twelve datasets. The results are reported on standard
    shared task evaluation data such as CoNLL for English, Spanish and Dutch.
    Furthermore, and despite the lack of linguistically motivated features, we also
    report best results for languages such as Basque and German. In addition, we
    demonstrate that our method also obtains very competitive results even when the
    amount of supervised data is cut by half, alleviating the dependency on
    manually annotated data. Finally, the results show that our emphasis on
    clustering features is crucial to develop robust out-of-domain models. The
    system and models are freely available to facilitate their use and guarantee
    reproducibility of results.

    CommAI: Evaluating the first steps towards a useful general AI

    Marco Baroni, Armand Joulin, Allan Jabri, Germàn Kruszewski, Angeliki Lazaridou, Klemen Simonic, Tomas Mikolov
    Comments: Submitted to ICLR 2017 Workshop Track
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    With machine learning successfully applied to new daunting problems almost
    every day, general AI starts looking like an attainable goal. However, most
    current research focuses instead on important but narrow applications, such as
    image classification or machine translation. We believe this to be largely due
    to the lack of objective ways to measure progress towards broad machine
    intelligence. In order to fill this gap, we propose here a set of concrete
    desiderata for general AI, together with a platform to test machines on how
    well they satisfy such desiderata, while keeping all further complexities to a
    minimum.


    Distributed, Parallel, and Cluster Computing

    Mutual Inclusivity of the Critical Path and its Partial Schedule on Heterogeneous Systems

    Aravind Vasudevan, David Gregg
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    The critical path of a group of tasks is an important measure that is
    commonly used to guide task allocation and scheduling on parallel computers.
    The critical path is the longest chain of dependencies in an acyclic task
    dependence graph. A problem arises on heterogeneous parallel machines where
    computation and communication costs can vary between different types of
    processor. Existing solutions for heterogeneous machines attempt to estimate
    the critical path using average values of computation and communication costs.
    However, this ignores opportunities to match specific tasks to specific classes
    of processor and communication links, and can result in quite misleading paths
    being identified as critical. We argue that an accurate critical path must
    consider the mapping of tasks to classes of processor and communication links.
    We formulate a polynomial time algorithm to find such a critical path. Our
    Critical Earliest Finish Time (CEFT) algorithm finds both the length of the
    critical path and an allocation of tasks to processors on that path. We
    compared CEFT experimentally to existing approaches such as averaging execution
    times across processors. The latter approach fails to accurately model the
    execution cost of tasks, and as a result fails to identify a correct critical
    path in 83.99% of cases in our experiments. We also adapted a critical
    path-oriented scheduling algorithm (CPOP) to use our critical path algorithm
    and found that the resulting schedules are faster.
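
    For contrast, the classical homogeneous-cost critical path that the
    averaging approaches approximate is simply a longest path over the task
    DAG, computable by memoized recursion. This sketch is not the authors' CEFT
    algorithm, which additionally maps each task to a processor class.

```python
from functools import lru_cache

def critical_path_length(cost, succ):
    """Length of the most expensive dependency chain in an acyclic task graph.
    cost: task -> execution cost; succ: task -> list of dependent tasks."""
    @lru_cache(maxsize=None)
    def chain(v):
        # cost of v plus the heaviest chain among its successors
        return cost[v] + max((chain(s) for s in succ.get(v, [])), default=0)
    return max(chain(v) for v in cost)
```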

    A parallel approach to bi-objective integer programming

    William Pettersson, Melih Ozlen
    Comments: 7 pages
    Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

    To obtain a better understanding of the trade-offs between various
    objectives, Bi-Objective Integer Programming (BOIP) algorithms calculate the
    set of all non-dominated vectors and present these as the solution to a BOIP
    problem. Historically, these algorithms have been compared in terms of the
    number of single-objective IPs solved and total CPU time taken to produce the
    solution to a problem. This is equitable, as researchers can often have access
    to widely differing amounts of computing power. However, the real world has
    recently seen a large uptake of multi-core processors in computers, laptops,
    tablets and even mobile phones. With this in mind, we look at how to best
    utilise parallel processing to improve the elapsed time of optimisation
    algorithms. We present two methods of parallelising the recursive algorithm
    presented by Ozlen, Burton and MacRae. Both new methods utilise two threads and
    improve running times. One of the new methods, the Meeting algorithm, halves
    running time to achieve near-perfect parallelisation. The results are compared
    with the efficiency of parallelisation within the commercial IP solver IBM ILOG
    CPLEX, and the new methods are both shown to perform better.


    Learning

    A Dirichlet Mixture Model of Hawkes Processes for Event Sequence Clustering

    Hongteng Xu, Hongyuan Zha
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We propose an effective method for the event sequence clustering problem,
    based on a novel Dirichlet mixture model of a special but significant type
    of point process, the Hawkes process. In this model, each event sequence
    belonging to a cluster is generated via the same Hawkes process with specific
    parameters, and different clusters correspond to different Hawkes processes.
    The prior distribution of the Hawkes processes is controlled via a Dirichlet
    process. We learn the model via a maximum likelihood estimator (MLE) and
    propose an effective variational Bayesian inference algorithm. We specifically
    analyze the resulting EM-type algorithm in the context of inner-outer
    iterations and discuss several inner iteration allocation strategies. The
    identifiability of our model, the convergence of our learning method, and its
    sample complexity are analyzed in both theoretical and empirical ways, which
    demonstrate the superiority of our method to other competitors. The proposed
    method learns the number of clusters automatically and is robust to model
    misspecification. Experiments on both synthetic and real-world data show that
    our method can learn diverse triggering patterns hidden in asynchronous event
    sequences and achieve encouraging performance on clustering purity and
    consistency.
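
    For background, the conditional intensity that defines a univariate Hawkes
    process can be written down directly; the exponential kernel used below is
    a common choice and an assumption of this sketch, since the abstract does
    not fix a kernel.

```python
import math

def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity of a univariate Hawkes process with exponential
    kernel: lambda(t) = mu + alpha * sum_i exp(-beta * (t - t_i)), where the
    sum runs over past event times t_i < t."""
    return mu + alpha * sum(math.exp(-beta * (t - ti)) for ti in events if ti < t)
```

    Each past event transiently raises the intensity, producing the
    self-exciting, clustered event patterns the model captures.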

    Efficient Rank Aggregation via Lehmer Codes

    Pan Li, Arya Mazumdar, Olgica Milenkovic
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    We propose a novel rank aggregation method based on converting permutations
    into their corresponding Lehmer codes or other subdiagonal images. Lehmer
    codes, also known as inversion vectors, are vector representations of
    permutations in which each coordinate can take values not restricted by the
    values of other coordinates. This transformation allows for decoupling of the
    coordinates and for performing aggregation via simple scalar median or mode
    computations. We present simulation results illustrating the performance of
    this completely parallelizable approach and analytically prove that both the
    mode and median aggregation procedure recover the correct centroid aggregate
    with small sample complexity when the permutations are drawn according to the
    well-known Mallows models. The proposed Lehmer code approach may also be used
    on partial rankings, with similar performance guarantees.

    Mixed Low-precision Deep Learning Inference using Dynamic Fixed Point

    Naveen Mellempudi, Abhisek Kundu, Dipankar Das, Dheevatsa Mudigere, Bharat Kaul
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We propose a cluster-based quantization method to convert pre-trained full
    precision weights into ternary weights with minimal impact on the accuracy. In
    addition, we also constrain the activations to 8-bits thus enabling sub 8-bit
    full integer inference pipeline. Our method uses smaller clusters of N filters
    with a common scaling factor to minimize the quantization loss, while also
    maximizing the number of ternary operations. We show that with a cluster
    size of N=4 on Resnet-101, we can achieve 71.8% TOP-1 accuracy, within 6% of
    the best full precision result, while replacing approx. 85% of all
    multiplications with 8-bit accumulations. Using the same method with 4-bit
    weights achieves 76.3% TOP-1 accuracy, which is within 2% of the full
    precision result. We also study the impact of cluster size on both
    performance and accuracy: larger cluster sizes (N=64) can replace approx.
    98% of the multiplications with ternary operations but introduce a
    significant drop in accuracy, which necessitates fine-tuning the parameters
    by retraining the network at lower precision. To
    address this we have also trained low-precision Resnet-50 with 8-bit
    activations and ternary weights by pre-initializing the network with full
    precision weights and achieve 68.9% TOP-1 accuracy within 4 additional epochs.
    Our final quantized model can run on a full 8-bit compute pipeline, with a
    potential 16x improvement in performance compared to baseline full-precision
    models.
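
    A per-cluster ternarization step in this spirit can be sketched as follows.
    The threshold and scaling heuristic below follows the well-known Ternary
    Weight Networks recipe and may differ from the paper's exact cluster-wise
    procedure.

```python
import numpy as np

def ternarize(weights, delta_factor=0.7):
    """Quantize a block of weights sharing one scaling factor to
    {-alpha, 0, +alpha}. Threshold: delta_factor * mean|w|; scale alpha is the
    mean magnitude of the weights that survive the threshold."""
    w = np.asarray(weights, dtype=float)
    delta = delta_factor * np.abs(w).mean()
    mask = np.abs(w) > delta
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return alpha * np.sign(w) * mask
```

    Applying this per cluster of N filters, rather than per layer, is what
    lets the scaling factor track local weight statistics.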

    CommAI: Evaluating the first steps towards a useful general AI

    Marco Baroni, Armand Joulin, Allan Jabri, Germàn Kruszewski, Angeliki Lazaridou, Klemen Simonic, Tomas Mikolov
    Comments: Submitted to ICLR 2017 Workshop Track
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    With machine learning successfully applied to new daunting problems almost
    every day, general AI starts looking like an attainable goal. However, most
    current research focuses instead on important but narrow applications, such as
    image classification or machine translation. We believe this to be largely due
    to the lack of objective ways to measure progress towards broad machine
    intelligence. In order to fill this gap, we propose here a set of concrete
    desiderata for general AI, together with a platform to test machines on how
    well they satisfy such desiderata, while keeping all further complexities to a
    minimum.

    Deep Submodular Functions

    Jeffrey Bilmes, Wenruo Bai
    Subjects: Learning (cs.LG)

    We start with an overview of a class of submodular functions called SCMMs
    (sums of concave composed with non-negative modular functions plus a final
    arbitrary modular). We then define a new class of submodular functions we call
    “deep submodular functions” or DSFs. We show that DSFs are a flexible
    parametric family of submodular functions that share many of the properties and
    advantages of deep neural networks (DNNs). DSFs can be motivated by considering
    a hierarchy of descriptive concepts over ground elements and where one wishes
    to allow submodular interaction throughout this hierarchy. Results in this
    paper show that DSFs constitute a strictly larger class of submodular functions
    than SCMMs. We show that, for any integer (k>0), there are (k)-layer DSFs that
    cannot be represented by a (k’)-layer DSF for any (k'<k). This implies that,
    like DNNs, there is a utility to depth, but unlike DNNs, the family of DSFs
    strictly increases with depth. Despite this, we show (using a “backpropagation”
    like method) that DSFs, even with arbitrarily large (k), do not comprise all
    submodular functions. In offering the above results, we also define the notion
    of an antitone superdifferential of a concave function and show how this
    relates to submodular functions (in general), DSFs (in particular), negative
    second-order partial derivatives, continuous submodularity, and concave
    extensions. To further motivate our analysis, we provide various special case
    results from matroid theory, comparing DSFs with forms of matroid rank, in
    particular the laminar matroid. Lastly, we discuss strategies to learn DSFs,
    and define the classes of deep supermodular functions, deep difference of
    submodular functions, and deep multivariate submodular functions, and discuss
    where these can be useful in applications.
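
    A one-layer instance of the SCMM building block (a concave function
    composed with a non-negative modular function) is easy to evaluate
    directly; sqrt is used here as one concrete concave choice for
    illustration.

```python
import math

def scmm(S, modulars, weights, phi=math.sqrt):
    """f(S) = sum_j w_j * phi( m_j(S) ), where m_j(S) = sum_{e in S} m_j[e]
    is a non-negative modular function; f is submodular for concave phi."""
    return sum(w * phi(sum(m[e] for e in S)) for w, m in zip(weights, modulars))
```

    For concave phi, the marginal gain of adding an element shrinks as the set
    grows, which is exactly the diminishing-returns form of submodularity; DSFs
    stack such layers.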

    SenseGen: A Deep Learning Architecture for Synthetic Sensor Data Generation

    Moustafa Alzantot, Supriyo Chakraborty, Mani B. Srivastava
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    Our ability to synthesize sensory data that preserves specific statistical
    properties of the real data has had tremendous implications on data privacy and
    big data analytics. The synthetic data can be used as a substitute for
    selective real data segments that are sensitive to the user, thus protecting
    privacy and resulting in improved analytics. However, increasingly adversarial
    roles taken by data recipients such as mobile apps, or other cloud-based
    analytics services, mandate that the synthetic data, in addition to preserving
    statistical properties, should also be difficult to distinguish from the real
    data. Typically, visual inspection has been used as a test to distinguish
    between datasets. But more recently, sophisticated classifier models
    (discriminators), corresponding to a set of events, have also been employed to
    distinguish between synthesized and real data. The model operates on both
    datasets and the respective event outputs are compared for consistency. In this
    paper, we take a step towards generating sensory data that can pass a deep
    learning based discriminator model test, and make two specific contributions:
    First, we present a deep learning based architecture for synthesizing
    sensory data. This architecture comprises a generator model, which is a
    stack of multiple Long Short-Term Memory (LSTM) networks and a Mixture
    Density Network. Second, we use another LSTM network based discriminator
    model for
    distinguishing between the true and the synthesized data. Using a dataset of
    accelerometer traces, collected using smartphones of users doing their daily
    activities, we show that the deep learning based discriminator model can only
    distinguish between the real and synthesized traces with an accuracy in the
    neighborhood of 50%.

    Spatial Projection of Multiple Climate Variables using Hierarchical Multitask Learning

    André R. Gonçalves, Arindam Banerjee, Fernando J. Von Zuben
    Comments: Accepted for the 31st AAAI Conference on Artificial Intelligence (AAAI-17)
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Future projection of climate is typically obtained by combining outputs from
    multiple Earth System Models (ESMs) for several climate variables such as
    temperature and precipitation. While IPCC has traditionally used a simple model
    output average, recent work has illustrated potential advantages of using a
    multitask learning (MTL) framework for projections of individual climate
    variables. In this paper we introduce a framework for hierarchical multitask
    learning (HMTL) with two levels of tasks such that each super-task, i.e., task
    at the top level, is itself a multitask learning problem over sub-tasks. For
    climate projections, each super-task focuses on projections of specific climate
    variables spatially using an MTL formulation. For the proposed HMTL approach, a
    group lasso regularization is added to couple parameters across the
    super-tasks, which in the climate context helps exploit relationships among the
    behavior of different climate variables at a given spatial location. We show
    that some recent works on MTL based on learning task dependency structures can
    be viewed as special cases of HMTL. Experiments on synthetic and real climate
    data show that HMTL produces better results than decoupled MTL methods applied
    separately on the super-tasks and HMTL significantly outperforms baselines for
    climate projection.
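
    The group lasso coupling across super-tasks is typically handled with
    block soft-thresholding, i.e. the proximal operator of the penalty. This is
    a generic sketch for a single parameter group, not the authors' full HMTL
    solver.

```python
import numpy as np

def group_lasso_prox(theta, lam):
    """Proximal operator of lam * ||theta||_2 for one parameter group:
    shrink the whole group toward zero, zeroing it entirely when its
    norm falls below lam."""
    norm = np.linalg.norm(theta)
    if norm <= lam:
        return np.zeros_like(theta)
    return (1 - lam / norm) * theta
```

    Because the whole group is shrunk (or zeroed) together, the penalty ties
    the corresponding parameters across super-tasks, which in the climate
    setting couples the behaviour of different variables at a location.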

    Emergence of Selective Invariance in Hierarchical Feed Forward Networks

    Dipan K. Pal, Vishnu Boddeti, Marios Savvides
    Subjects: Learning (cs.LG)

    Many theories have emerged which investigate how invariance is generated in
    hierarchical networks through simple schemes such as max and mean pooling.
    The restriction to max/mean pooling in theoretical and empirical studies has
    diverted attention away from a more general way of generating invariance to
    nuisance transformations. We conjecture that hierarchically building
    selective invariance (i.e. carefully choosing the range of the
    transformation to be invariant to at each layer of a hierarchical network)
    is important for pattern recognition. We utilize a novel pooling layer
    called adaptive pooling to find linear pooling weights within networks.
    These networks with the learnt pooling weights have performances on object
    categorization tasks that are comparable to max/mean pooling networks.
    Interestingly, adaptive pooling can converge to mean pooling (when
    initialized with random pooling weights), find more general linear pooling
    schemes or even decide not to pool at all. We illustrate the general notion
    of selective invariance through object categorization experiments on
    large-scale datasets such as SVHN and ILSVRC 2012.
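
    For a single one-dimensional window, the idea of learnt linear pooling
    weights reduces to a weighted sum that contains mean pooling (uniform
    weights) as a special case. This is a minimal numpy sketch, not the paper's
    in-network trainable layer.

```python
import numpy as np

def linear_pool(x, w):
    """Pool non-overlapping windows of a 1-D signal with linear weights w.
    w uniform recovers mean pooling; other weight vectors give more general
    linear pooling schemes (or no pooling at all for a one-hot w)."""
    k = len(w)
    x = np.asarray(x, dtype=float).reshape(-1, k)  # one row per window
    return x @ np.asarray(w, dtype=float)
```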

    Learning from various labeling strategies for suicide-related messages on social media: An experimental study

    Tong Liu, Qijin Cheng, Christopher M. Homan, Vincent M.B. Silenzio
    Comments: 8 pages, 4 figures, 7 tables
    Subjects: Learning (cs.LG); Computers and Society (cs.CY); Social and Information Networks (cs.SI)

    Suicide is an important but often misunderstood problem, one that researchers
    are now seeking to better understand through social media. Due in large part to
    the fuzzy nature of what constitutes suicidal risks, most supervised approaches
    for learning to automatically detect suicide-related activity in social media
    require a great deal of human labor to train. However, humans themselves have
    diverse or conflicting views on what constitutes suicidal thoughts, so
    obtaining reliable gold-standard labels is fundamentally challenging and, we
    hypothesize, depends largely on what is asked of the annotators and what slice
    of the data they label. We conducted multiple rounds of data labeling and
    collected annotations from crowdsourcing workers and domain experts. We
    aggregated the resulting labels in various ways to train a series of supervised
    models. Our preliminary evaluations show that using unanimously agreed labels
    from multiple annotators is helpful to achieve robust machine models.
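
    The "unanimously agreed labels" strategy reduces to a one-line filter over
    the annotations; a trivial sketch (the dict layout is a hypothetical data
    format, not the authors'):

```python
def unanimous_labels(annotations):
    """Keep only the items on which every annotator agrees.

    annotations : dict mapping item id -> list of labels from annotators.
    Returns a dict of item id -> the unanimously agreed label.
    """
    return {item: labels[0]
            for item, labels in annotations.items()
            if len(set(labels)) == 1}

gold = unanimous_labels({"post-1": [1, 1, 1], "post-2": [1, 0, 1]})
```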

    A Mixture Model and Task Allocation Scheme for Crowdsourcing

    Irineo Cabreros, Karan Singh, Angela Zhou
    Comments: Presented at the 2016 Data Efficient Machine Learning Workshop at International Conference on Machine Learning
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Aggregating crowdsourced labels to generate reliable estimates of the true
    answer is an important challenge for crowdsourcing systems. We introduce the
    mixture of experts model, an extension of the classical Dawid-Skene [2] model,
    where each question is comprised of a mixture of k topics upon which each user
    has varying expertise. Under this new model, we (1) provide a spectral
    algorithm for estimating true labels with provable asymptotic guarantees and
    (2) introduce a task assignment algorithm in a quasi-online setting. We test
    these algorithms on simulated data, finding that both methods perform well for
    large budget allocations and that the assignment algorithm boosts performance
    in the low-sampling regime. As a consequence of this work, we also develop a
    simple method for efficiently fitting the mixture of experts model that may be
    more widely applicable.
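
    The paper's spectral estimator and per-topic expertise are not reproduced
    here; as background, an EM fit of the simpler one-coin Dawid-Skene model (a
    deliberate simplification: one accuracy per worker rather than a mixture of
    k topics) can be sketched as:

```python
import numpy as np

def dawid_skene_binary(votes, n_iter=20):
    """One-coin Dawid-Skene EM for binary crowd labels (sketch).

    votes : (n_items, n_workers) 0/1 array.
    Returns the posterior P(true label = 1) for each item.
    """
    q = votes.mean(axis=1)                   # init: majority-vote fractions
    for _ in range(n_iter):
        # M-step: each worker's accuracy under the current soft labels.
        acc = (q[:, None] * votes + (1 - q[:, None]) * (1 - votes)).mean(axis=0)
        acc = np.clip(acc, 1e-6, 1 - 1e-6)
        # E-step: posterior of the true label given all votes.
        log1 = (votes * np.log(acc) + (1 - votes) * np.log1p(-acc)).sum(axis=1)
        log0 = ((1 - votes) * np.log(acc) + votes * np.log1p(-acc)).sum(axis=1)
        q = 1.0 / (1.0 + np.exp(log0 - log1))
    return q

# Simulated crowd: three workers with accuracies 0.95, 0.9 and 0.85.
rng = np.random.default_rng(0)
truth = rng.integers(0, 2, size=300)
votes = np.stack([np.where(rng.random(300) < a, truth, 1 - truth)
                  for a in (0.95, 0.9, 0.85)], axis=1)
labels = (dawid_skene_binary(votes) > 0.5).astype(int)
```

    Initializing at the majority vote keeps the EM iterations oriented toward
    the true labels; the paper's mixture-of-experts model replaces the single
    accuracy `acc` with topic-dependent expertise.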

    Bayesian Learning of Consumer Preferences for Residential Demand Response

    Mikhail V. Goubko, Sergey O. Kuznetsov, Alexey A. Neznanov, Dmitry I. Ignatov
    Journal-ref: IFAC-PapersOnLine, 49(32), 2016, p. 24-29, ISSN 2405-8963
    Subjects: Learning (cs.LG); Systems and Control (cs.SY); Machine Learning (stat.ML)

    In coming years residential consumers will face real-time electricity tariffs
    with energy prices varying day to day, and effective energy saving will require
    automation – a recommender system, which learns the consumer’s preferences from her
    actions. A consumer chooses a scenario of home appliance use to balance her
    comfort level and the energy bill. We propose a Bayesian learning algorithm to
    estimate the comfort level function from the history of appliance use. In
    numeric experiments with datasets generated from a simulation model of a
    consumer interacting with small home appliances, the algorithm outperforms
    popular regression analysis tools. Our approach can be extended to control an
    air heating and conditioning system, which is responsible for up to half of a
    household’s energy bill.

    Towards Adversarial Retinal Image Synthesis

    Pedro Costa, Adrian Galdran, Maria Inês Meyer, Michael David Abràmoff, Meindert Niemeijer, Ana Maria Mendonça, Aurélio Campilho
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    Synthesizing images of the eye fundus is a challenging task that has been
    previously approached by formulating complex models of the anatomy of the eye.
    New images can then be generated by sampling a suitable parameter space. In
    this work, we propose a method that learns to synthesize eye fundus images
    directly from data. For that, we pair true eye fundus images with their
    respective vessel trees, by means of a vessel segmentation technique. These
    pairs are then used to learn a mapping from a binary vessel tree to a new
    retinal image. For this purpose, we use a recent image-to-image translation
    technique, based on the idea of adversarial learning. Experimental results show
    that the original and the generated images are visually different in terms of
    their global appearance, in spite of sharing the same vessel tree.
    Additionally, a quantitative quality analysis of the synthetic retinal images
    confirms that the produced images retain a high proportion of the true image
    set quality.

    Variable selection for clustering with Gaussian mixture models: state of the art

    Abdelghafour Talibi, Boujemâa Achchab, Rafik Lasri
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Mixture models have become widely used in clustering, given the
    probabilistic framework on which they are based. However, for modern
    databases, which are characterized by their large size, these models behave
    disappointingly when fitting the model, making the selection of relevant
    variables essential for this type of clustering. After recalling the basics
    of model-based clustering, this article reviews the variable selection
    methods for model-based clustering and presents opportunities for improving
    these methods.

    Deep Reinforcement Learning for Visual Object Tracking in Videos

    Da Zhang, Hamid Maei, Xin Wang, Yuan-Fang Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Convolutional neural network (CNN) models have achieved tremendous success in
    many visual detection and recognition tasks. Unfortunately, visual tracking, a
    fundamental computer vision problem, is not handled well using the existing CNN
    models, because most object trackers implemented with CNN do not effectively
    leverage temporal and contextual information among consecutive frames.
    Recurrent neural network (RNN) models, on the other hand, are often used to
    process text and voice data due to their ability to learn intrinsic
    representations of sequential and temporal data. Here, we propose a novel
    neural network tracking model that is capable of integrating information over
    time and tracking a selected target in video. It comprises three components: a
    CNN extracting the best tracking features in each video frame, an RNN
    constructing the video memory state, and a reinforcement learning (RL) agent making target
    location decisions. The tracking problem is formulated as a decision-making
    process, and our model can be trained with RL algorithms to learn good tracking
    policies that pay attention to continuous, inter-frame correlation and maximize
    tracking performance in the long run. We compare our model with an existing
    neural-network based tracking method and show that the proposed tracking
    approach works well in various scenarios by performing rigorous validation
    experiments on artificial video sequences with ground truth. To the best of our
    knowledge, our tracker is the first neural-network tracker that combines
    convolutional and recurrent networks with RL algorithms.

    Algorithm selection of off-policy reinforcement learning algorithm

    Laroche Romain, Feraud Raphael
    Comments: 27 pages
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Optimization and Control (math.OC)

    Dialogue systems rely on a careful reinforcement learning design: the
    learning algorithm and its state space representation. In the absence of more
    rigorous knowledge, the designer resorts to practical experience to choose the
    best option. In order to automate and improve the performance of the
    aforementioned process, this article formalises the problem of online
    off-policy reinforcement learning algorithm selection. A meta-algorithm
    receives as input a portfolio of several off-policy reinforcement learning
    algorithms. At the beginning of each new trajectory, it determines which
    algorithm in the portfolio controls the behaviour during the full next
    trajectory, in order to maximise the return. The article presents a novel
    meta-algorithm, called Epochal Stochastic Bandit Algorithm Selection (ESBAS).
    Its principle is to freeze the policy updates at each epoch, and to leave a
    rebooted stochastic bandit in charge of the algorithm selection. Under some
    assumptions, a thorough theoretical analysis demonstrates its near-optimality
    given the structural sampling budget limitations. ESBAS is then put to the
    test in a set of experiments with various portfolios on a negotiation dialogue
    game. The results show the practical benefits of algorithm selection for
    dialogue systems, in most cases even outperforming the best algorithm in the
    portfolio, even when the aforementioned assumptions are violated.
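
    ESBAS's actual pseudo-code and regret analysis are in the paper; purely to
    illustrate the "rebooted bandit per epoch" principle, here is a toy sketch
    in which the portfolio members are abstracted into a reward function and
    UCB1 does the per-trajectory selection (the interface and reward scaling
    are hypothetical, and the within-epoch policy freeze is not modelled):

```python
import math

def esbas_sketch(portfolio, n_epochs, trajs_per_epoch, run_trajectory):
    """Toy epoch-based bandit over a portfolio (an ESBAS-style sketch).

    At each epoch the bandit statistics are rebooted, and UCB1 picks,
    per trajectory, which portfolio member controls the behaviour.
    run_trajectory(member) -> scalar return of one trajectory.
    """
    K = len(portfolio)
    history = []
    for epoch in range(n_epochs):
        counts, sums = [0] * K, [0.0] * K    # rebooted each epoch
        for t in range(trajs_per_epoch):
            if t < K:                        # play each member once first
                k = t
            else:                            # UCB1 selection
                k = max(range(K), key=lambda i: sums[i] / counts[i]
                        + math.sqrt(2 * math.log(t) / counts[i]))
            r = run_trajectory(portfolio[k])
            counts[k] += 1
            sums[k] += r
            history.append((epoch, k, r))
    return history

# Two dummy "algorithms" whose trajectories return fixed rewards 0 and 1.
hist = esbas_sketch([0.0, 1.0], n_epochs=1, trajs_per_epoch=100,
                    run_trajectory=lambda member: member)
picks = [k for _, k, _ in hist]
```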

    A Compressed Sensing Based Decomposition of Electrodermal Activity Signals

    Swayambhoo Jain, Urvashi Oswal, Kevin S. Xu, Brian Eriksson, Jarvis Haupt
    Comments: To appear in IEEE Transactions on Biomedical Engineering
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Applications (stat.AP)

    The measurement and analysis of Electrodermal Activity (EDA) offers
    applications in diverse areas ranging from market research, to seizure
    detection, to human stress analysis. Unfortunately, the analysis of EDA signals
    is made difficult by the superposition of numerous components which can obscure
    the signal information related to a user’s response to a stimulus. We show how
    simple pre-processing followed by a novel compressed sensing based
    decomposition can mitigate the effects of the undesired noise components and
    help reveal the underlying physiological signal. The proposed framework allows
    for decomposition of EDA signals with provable bounds on the recovery of user
    responses. We test our procedure on both synthetic and real-world EDA signals
    from wearable sensors and demonstrate that our approach allows for more
    accurate recovery of user responses as compared to the existing techniques.
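
    The paper's decomposition and its recovery guarantees are not reproduced
    here; as a generic stand-in for recovering a sparse driver convolved with a
    smooth response, a lasso solved by iterative soft thresholding (ISTA) can be
    sketched as follows (the exponential kernel and all parameters below are
    hypothetical, not the authors' EDA model):

```python
import numpy as np

def ista(A, y, lam, n_iter=2000):
    """Solve the lasso 0.5*||Ax - y||^2 + lam*||x||_1 by ISTA."""
    L = np.linalg.norm(A, 2) ** 2            # step size from Lipschitz constant
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L        # gradient step on the quadratic
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

# Toy signal: two sparse "responses" convolved with a decaying kernel.
kernel = 0.5 ** np.arange(5)
A = np.zeros((20, 16))
for j in range(16):
    A[j:j + 5, j] = kernel                   # convolution matrix
x_true = np.zeros(16)
x_true[[3, 10]] = 1.0
y = A @ x_true
x_hat = ista(A, y, lam=0.01)
```

    With a small regularization weight the two spikes are recovered at the
    correct locations, up to the usual lasso shrinkage of their amplitudes.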


    Information Theory

    Automatic Kolmogorov complexity and normality revisited

    Alexander Shen
    Subjects: Information Theory (cs.IT); Formal Languages and Automata Theory (cs.FL)

    It is well known that normality (all factors of given length appear in an
    infinite sequence with the same frequency) can be described as
    incompressibility via finite automata. Still, the statement and proof of this
    result as given by Becher and Heiber in terms of “lossless finite-state
    compressors” do not follow the standard scheme of Kolmogorov complexity
    definition (the automaton is used for compression, not decompression). We
    modify this approach to make it more similar to the traditional Kolmogorov
    complexity theory (and simpler) by explicitly defining the notion of automatic
    Kolmogorov complexity and using its simple properties. Other known notions
    (Shallit–Wang, Calude–Salomaa–Roblot) of description complexity related to
    finite automata are discussed (see the last section).

    As a byproduct, we obtain simple proofs of classical results about normality
    (equivalence of definitions with aligned occurrences and all occurrences, Wall’s
    theorem saying that a normal number remains normal when multiplied by a
    rational number, and Agafonov’s result saying that normality is preserved by
    automatic selection rules).

    Fairness-Aware Caching and Radio Resource Allocation for the Downlink of Multi-Cell OFDMA Systems

    Sepehr Rezvani, Nader Mokari, Mohammad R. Javan
    Comments: 19 pages, 10 figures
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    The unprecedented growth of internet content, especially social media, poses
    a challenge to the load of cellular networks. In addition, quality of
    experience (QoE), of which downloading delay is an important factor, has
    nowadays become a more practical norm than quality of service (QoS). To
    satisfy this demand, we propose two resource allocation (RA) algorithms that
    jointly optimize the placement of social media content in the caches of the
    base stations (BSs) and the radio resources (i.e., transmit powers and
    subcarriers) in a multi-cell Orthogonal Frequency Division Multiple Access
    (OFDMA) based system. In the first scheme, the total downloading delay of the
    network is minimized, while in the second scheme, a fairness-aware scheme is
    proposed in which the maximum downloading delay is minimized. We propose
    iterative algorithms to solve each problem, in which the content placement
    problem and the joint subcarrier and transmit power allocation problem are
    iteratively optimized. We also prove that the proposed approaches converge to
    a near-optimal solution as the number of iterations increases.

    A proposal about the meaning of scale, scope and resolution in the context of the interpretation process

    Gerardo Febres
    Comments: 25 pages, 7 figues, 5 tables
    Subjects: Information Theory (cs.IT)

    When considering perceptions, the observation scale and resolution are
    closely related properties. There is consensus in considering resolution as the
    density of elementary pieces of information in a specified information space.
    Differently, with the concept of scale, several conceptions compete for a
    consistent meaning. Scale is typically regarded as a way to indicate the degree
    of detail in which an observation is performed. But surprisingly, there is not
    a unified definition of scale as a description’s property. This paper offers a
    precise definition of scale, and a method to quantify it as a property
    associated to the interpretation of a description. To complete the parameters
    needed to describe the perception of a description, the concepts of scope and
    resolution are also given exact meanings. A model describing the
    recursive process of interpretation, based on evolving steps of scale, scope
    and resolution, is introduced. The model relies on the conception of
    observation scale and its association to the selection of symbols. Five
    experiments illustrate the application of these concepts, showing that
    resolution, scale and scope integrate the set of properties to define any point
    of view from which an observation is performed and interpreted.

    Lower Bounds on the Number of Writing Operations by ILIFC with Inversion Cells

    Akira Yamawaki, Hiroshi Kamabe, Shan Lu
    Comments: 8 pages, 3 figures, submitted to ISIT2017
    Subjects: Information Theory (cs.IT)

    Index-less Indexed Flash Code (ILIFC) is a coding scheme for flash memories,
    in which one bit of a data sequence is stored in a slice consisting of several
    cells but the index of the bit is stored implicitly. Although several modified
    ILIFC schemes have been proposed, in this research we consider an ILIFC with
    inversion cells (I-ILIFC). The I-ILIFC reduces the total number of cell-level
    changes at each writing request. Computer simulation is used to show that the
    I-ILIFC improves the average performance of the ILIFC in many cases. This paper
    presents our derivation of the lower bounds on the number of writing operations
    by I-ILIFC and shows that the worst-case performance of the I-ILIFC is better
    than that of the ILIFC if the code length is sufficiently large. Additionally,
    we consider the tight lower bounds thereon. The results show that the threshold
    of the code length that determines whether the I-ILIFC improves the worst-case
    performance of the ILIFC is smaller than that in the first lower bounds.

    Generic Cospark of a Matrix Can Be Computed in Polynomial Time

    Sichen Zhong, Yue Zhao
    Subjects: Information Theory (cs.IT); Computational Complexity (cs.CC)

    The cospark of a matrix is the cardinality of the sparsest vector in the
    column space of the matrix. Computing the cospark of a matrix is well known to
    be an NP hard problem. Given the sparsity pattern (i.e., the locations of the
    non-zero entries) of a matrix, if the non-zero entries are drawn from
    independently distributed continuous probability distributions, we prove that
    the cospark of the matrix equals, with probability one, a particular number
    termed the generic cospark of the matrix. The generic cospark also equals
    the maximum cospark of matrices consistent with the given sparsity pattern. We
    prove that the generic cospark of a matrix can be computed in polynomial time,
    and offer an algorithm that achieves this.
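
    The paper's polynomial-time algorithm for the generic cospark is not given
    in the abstract; to make the object concrete, here is an exhaustive
    (exponential-time) check of the definition, i.e. the fewest nonzeros of any
    nonzero vector in the column space, assuming full column rank:

```python
import itertools

import numpy as np

def cospark(A):
    """Cardinality of the sparsest nonzero vector in col(A), by exhaustion.

    A nonzero x with (A x) vanishing on a row set S exists iff the rows
    indexed by S are rank deficient (rank < number of columns), so we
    search for the largest such S. Exponential time -- this illustrates
    the definition only, not the paper's polynomial-time algorithm for
    the generic cospark. Assumes A has full column rank.
    """
    n, k = A.shape
    for zeros in range(n - 1, -1, -1):       # try many zeroed rows first
        for S in itertools.combinations(range(n), zeros):
            if np.linalg.matrix_rank(A[list(S), :]) < k:
                return n - zeros
    return n  # unreachable: zeros == 0 always succeeds (rank 0 < k)
```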

    On-off Analog Beamforming for Massive MIMO

    Shengli Zhang, Chongtao Guo, Taotao Wang, Wei Zhang
    Comments: 10 pages, 9 figures
    Subjects: Information Theory (cs.IT)

    This paper investigates a new analog beamforming architecture for massive
    multiple-input multiple-output (MIMO) systems, where each of the multiple
    transmit antennas is switched to be on or off to form a beam according to the
    channel state information at transmitters. This on-off analog beamforming
    (OABF) scheme has advantages in terms of both hardware complexity and
    algorithmic complexity. OABF can completely remove the high-cost,
    power-consuming and bulky analog phase-shifters that are extensively employed
    by traditional analog beamforming schemes; it only requires the deployment of
    low-cost analog switches that are easy to implement. Moreover, we show that the
    beams formed by such simple antenna on-off switch operations can achieve rather
    good performance with low-complexity beamforming algorithms. Specifically, we
    first propose two optimal signal-to-noise ratio maximization algorithms to
    determine the on-off state of each switch under the per-antenna power
    constraint and the total power constraint, respectively. After that, we
    theoretically prove that OABF can achieve the full diversity gain and the full
    array gain with complexities up to a polynomial order. Numerical results are
    consistent with our theoretical analysis. We believe that the simple structure
    of OABF makes massive MIMO much easier to implement in real systems.
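
    The paper's SNR-maximization algorithms run in polynomial time; to show the
    objective they optimize, here is an exhaustive sketch for the total power
    constraint, where power is split evenly over the active antennas so SNR is
    proportional to |sum of the selected channel gains|^2 divided by the number
    of active antennas (the even power split is an assumption of this sketch):

```python
import itertools

def best_on_off(h, total_power=1.0):
    """Exhaustively pick the on/off antenna subset maximising SNR.

    h : list of (possibly complex) channel gains, one per antenna.
    Exponential search -- a definition-level sketch, not the paper's
    polynomial-time algorithms.
    """
    n = len(h)
    best, best_snr = None, -1.0
    for r in range(1, n + 1):
        for S in itertools.combinations(range(n), r):
            s = sum(h[i] for i in S)
            snr = total_power * abs(s) ** 2 / r
            if snr > best_snr:
                best, best_snr = set(S), snr
    return best, best_snr

subset, snr = best_on_off([1.0, 0.9, -0.2])
```

    Here the out-of-phase third antenna is switched off: activating it would
    add destructively while still claiming a share of the power.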

    Covert Communication with Finite Blocklength in AWGN Channels

    Shihao Yan, Biao He, Yirui Cong, Xiangyun Zhou
    Comments: Accepted by IEEE ICC 2017
    Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)

    Covert communication aims to achieve a reliable transmission from a transmitter
    to a receiver while guaranteeing an arbitrarily small probability of this
    transmission being detected by a warden. In this work, we study the covert
    communication in AWGN channels with finite blocklength, in which the number of
    channel uses is finite. Specifically, we analytically prove that the entire
    block (all available channel uses) should be utilized to maximize the effective
    throughput of the transmission subject to a predetermined covert requirement.
    This is a nontrivial result because more channel uses result in more
    observations at the warden for detecting the transmission. We also determine
    the maximum allowable transmit power per channel use, which is shown to
    decrease as the blocklength increases. Despite the decrease in the maximum
    allowable transmit power per channel use, the maximum allowable total power
    over the entire block is proved to increase with the blocklength, which leads
    to the fact that the effective throughput increases with the blocklength.

    A Covert Queueing Channel in Round Robin Schedulers

    AmirEmad Ghassami, Ali Yekkehkhany, Negar Kiyavash, Yi Lu
    Subjects: Information Theory (cs.IT)

    We study a covert queueing channel between two users sharing a round robin
    scheduler. Such a covert channel can arise when users share a resource such as
    a computer processor or a router arbitrated by a round robin policy. We present
    an information-theoretic framework to model and derive the maximum reliable
    data transmission rate, i.e., the capacity of this channel for both noiseless
    and noisy scenarios. Our results show that seemingly isolated users can
    communicate with high rate over the covert channel. Furthermore, we propose a
    practical finite-length code construction, which achieves the capacity limit.

    New Lower Bound on the Ergodic Capacity of Optical MIMO Channels

    Rémi Bonnefoi, Amor Nafkha
    Comments: 4 pages, 4 figures, accepted at IEEE ICC 2017 Optical Networks and Systems Symposium
    Subjects: Information Theory (cs.IT)

    In this paper, we present an analytical lower bound on the ergodic capacity
    of optical multiple-input multiple-output (MIMO) channels. It turns out that
    the optical MIMO channel matrix, which couples the (m_t) inputs (modes/cores)
    into (m_r) outputs (modes/cores), can be modeled as a sub-matrix of an
    (m x m) Haar-distributed unitary matrix, where (m > m_t, m_r). Using the fact
    that the probability density of the eigenvalues of a random matrix from the
    unitary ensemble can be expressed in terms of the Christoffel-Darboux kernel,
    we provide a new analytical expression for the ergodic capacity as a function
    of the signal-to-noise ratio (SNR). Moreover, we derive a closed-form
    lower-bound expression for the ergodic capacity. In addition, we also derive
    an approximation to the ergodic capacity in the low-SNR regime. Finally, we
    present numerical results supporting the derived expressions.

    Performance Characterization of a Real-Time Massive MIMO System with LOS Mobile Channels

    Paul Harris, Steffen Malkowsky, Joao Vieira, Fredrik Tufvesson, Wael Boukley Hassan, Liang Liu, Mark Beach, Simon Armour, Ove Edfors
    Comments: Submitted to the 2017 IEEE JSAC Special Issue on Deployment Issues and Performance Challenges for 5G
    Subjects: Information Theory (cs.IT)

    The first measured results for massive MIMO performance in a line-of-sight
    (LOS) scenario with moderate mobility are presented, with 8 users served in
    real-time using a 100-antenna base station (BS) at 3.7 GHz. When such a large
    number of channels dynamically change, the inherent propagation and processing
    delay has a critical relationship with the rate of change, as the use of
    outdated channel information can result in severe detection and precoding
    inaccuracies. For the downlink (DL) in particular, a time division duplex (TDD)
    configuration synonymous with massive multiple-input, multiple-output (MIMO)
    deployments could mean only the uplink (UL) is usable in extreme cases.
    Therefore, it is of great interest to investigate the impact of mobility on
    massive MIMO performance and consider ways to combat the potential limitations.
    In a mobile scenario with moving cars and pedestrians, the massive MIMO channel
    is sampled across many points in space to build a picture of the overall user
    orthogonality, and the impact of both azimuth and elevation array
    configurations is considered. Temporal analysis is also conducted for vehicles
    moving up to 29km/h and real-time bit error rates (BERs) for both the UL and DL
    without power control are presented.

    On the Capacity of the AWGN Channel with Additive Radar Interference

    Sara Shahi, Daniela Tuninetti, Natasha Devroye
    Comments: Under submission
    Subjects: Information Theory (cs.IT)

    This paper investigates the capacity of a communications channel that, in
    addition to additive white Gaussian noise, also suffers from interference
    caused by a co-existing radar transmission. The radar interference (of short
    duty-cycle and of much wider bandwidth than the intended communication signal)
    is modeled as an additive term whose amplitude is known and constant, but whose
    phase is independent and identically uniformly distributed at each channel use.
    The capacity-achieving input distribution, under the standard average power
    constraint, is shown to have independent modulus and phase. The phase is
    uniformly distributed in ([0, 2\pi]). The modulus is discrete with countably
    infinitely many mass points, but only finitely many in any bounded interval.
    From numerical evaluations, a proper-complex Gaussian input is seen to perform
    quite well for weak radar interference. We also show that for very large radar
    interference, capacity is equal to (\frac{1}{2}\log(1 + S)) and a proper-complex
    Gaussian input achieves it. It is concluded that the presence of the radar
    interference results in a loss of half of the degrees of freedom compared to an
    AWGN channel without radar interference.

    Universal Tree Source Coding Using Grammar-Based Compression

    Danny Hucke, Markus Lohrey
    Subjects: Information Theory (cs.IT)

    We apply so-called tree straight-line programs to the problem of lossless
    compression of binary trees. We derive upper bounds on the maximal pointwise
    redundancy (or worst-case redundancy) that improve previous bounds obtained by
    Zhang, Yang, and Kieffer using directed acyclic graphs. Using this, we obtain
    universal codes for new classes of structured tree sources.
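
    The tree straight-line programs themselves are not sketched here; as a
    baseline, the directed-acyclic-graph compression behind the earlier
    Zhang-Yang-Kieffer bounds shares identical subtrees via hash-consing (the
    tuple encoding of trees is an assumption of this sketch):

```python
def tree_to_dag(tree, table=None):
    """Compress a binary tree into its minimal DAG by hash-consing:
    every distinct subtree is stored exactly once and shared.

    tree : None (a leaf) or a (left, right) pair of subtrees.
    Returns (node_id, table); len(table) is the size of the DAG.
    """
    if table is None:
        table = {}
    if tree is None:
        key = "leaf"
    else:
        left_id, _ = tree_to_dag(tree[0], table)
        right_id, _ = tree_to_dag(tree[1], table)
        key = (left_id, right_id)
    if key not in table:
        table[key] = len(table)
    return table[key], table

# The complete binary tree with 8 leaves (15 nodes) compresses to 4 DAG nodes,
# one per level, since all subtrees at a level are identical.
level1 = (None, None)
level2 = (level1, level1)
level3 = (level2, level2)
_, dag = tree_to_dag(level3)
```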

    On the Capacity of the Slotted Strongly Asynchronous Channel with a Bursty User

    Sara Shahi, Daniela Tuninetti, Natasha Devroye
    Comments: Under submission
    Subjects: Information Theory (cs.IT)

    The slotted strongly asynchronous channel with a bursty user consists of
    (A_n = e^{n\alpha}) blocks of length (n) channel uses. A user transmits one
    randomly selected message among (M_n = e^{nR}) different ones in exactly
    (K_n = e^{n\nu}) randomly selected but distinct blocks of the available (A_n)
    blocks. This paper studies the trade-off between ((R, \alpha, \nu)). For the
    pure synchronization problem, i.e., (R = 0), the receiver must locate, with
    vanishing error probability in (n), each of the (K_n) blocks in which the
    user is transmitting; the trade-off between (\alpha) and (\nu) is exactly
    characterized. For the case with (R > 0), the receiver is also required to
    reliably decode all the (K_n) transmitted messages; outer and inner regions
    for ((R, \alpha, \nu)) are proposed.

    Sparse phase retrieval of one-dimensional signals by Prony's method

    Robert Beinert, Gerlind Plonka
    Subjects: Numerical Analysis (math.NA); Information Theory (cs.IT)

    In this paper, we show that sparse signals f representable as a linear
    combination of a finite number N of spikes at arbitrary real locations or as a
    finite linear combination of B-splines of order m with arbitrary real knots can
    be almost surely recovered from O(N^2) Fourier intensity measurements up to
    trivial ambiguities. The constructive proof consists of two steps, where in the
    first step the Prony method is applied to recover all parameters of the
    autocorrelation function and in the second step the parameters of f are
    derived. Moreover, we present an algorithm to evaluate f from its Fourier
    intensities and illustrate it with different numerical examples.
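
    The paper applies Prony's method to the parameters of the autocorrelation
    function; the classical Prony step itself, recovering the nodes of an
    exponential sum f(t) = sum_j c_j z_j^t from 2N equally spaced samples, can
    be sketched as follows (this is the generic method, not the authors' full
    phase-retrieval pipeline):

```python
import numpy as np

def prony_nodes(samples, N):
    """Recover the N nodes z_j of f(t) = sum_j c_j z_j^t from the
    2N samples f(0), ..., f(2N-1) by Prony's method.
    """
    f = np.asarray(samples, dtype=complex)
    # f satisfies a length-N linear recurrence whose characteristic
    # (Prony) polynomial p(z) = z^N + a_{N-1} z^{N-1} + ... + a_0
    # has the nodes z_j as roots. Solve the Hankel system for a.
    H = np.array([[f[i + j] for j in range(N)] for i in range(N)])
    a = np.linalg.solve(H, -f[N:2 * N])
    # np.roots expects coefficients from the highest degree down.
    return np.roots(np.concatenate(([1.0], a[::-1])))

# f(t) = 2^t + 0.5^t sampled at t = 0, 1, 2, 3.
roots = prony_nodes([2.0, 2.5, 4.25, 8.125], N=2)
```

    Once the nodes are known, the coefficients c_j follow from a linear
    least-squares fit, which is the second step the abstract alludes to.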

    Chentsov's theorem for exponential families

    James G. Dowty
    Comments: 14 pages
    Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Differential Geometry (math.DG); Probability (math.PR)

    Chentsov’s theorem characterizes the Fisher information metrics on
    statistical models as essentially the only Riemannian metrics that are
    invariant under sufficient statistics. This implies that statistical models are
    naturally equipped with geometric structures, so Chentsov’s theorem explains
    why many statistical properties can be described in geometric terms. However,
    despite being one of the foundational theorems of statistics, Chentsov’s
    theorem has only been proved previously in very restricted settings or under
    relatively strong regularity and invariance assumptions. We therefore prove a
    version of this theorem for the important case of exponential families. In
    particular, we characterise the Fisher information metric as the only
    Riemannian metric (up to rescaling) on an exponential family and its derived
    families that is invariant under independent and identically distributed
    extensions and canonical sufficient statistics. Our approach is based on the
    central limit theorem, so it gives a unified proof for both discrete and
    continuous exponential families, and it is less technical than previous
    approaches.



