IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    arXiv Paper Daily: Mon, 12 Dec 2016

    我爱机器学习(52ml.net)发表于 2016-12-12 00:00:00
    love 0

    Neural and Evolutionary Computing

    Field-Programmable Crossbar Array (FPCA) for Reconfigurable Computing

    Mohammed A. Zidan, YeonJoo Jeong, Jong Hong Shin, Chao Du, Wei D. Lu
    Subjects: Emerging Technologies (cs.ET); Hardware Architecture (cs.AR); Neural and Evolutionary Computing (cs.NE)

    For decades, advances in electronics were directly related to the scaling of
    CMOS transistors according to Moore’s law. However, both the CMOS scaling and
    the classical computer architecture are approaching fundamental and practical
    limits, and new computing architectures based on emerging devices, such as
    non-volatile memories e.g. resistive memory (RRAM) devices, are expected to
    sustain the exponential growth of computing capability. Here we propose a novel
    memory-centric, reconfigurable, general purpose computing platform to handle
    the exploding amount of data in a fast and energy-efficient manner. The
    proposed computing architecture is based on a single physical resistive
    memory-centric fabric that can be optimally reconfigured and utilized to
    perform different computing and data storage tasks in a massively parallel
    approach. The system can be tailored to achieve maximal energy efficiency based
    on the data flow by dynamically allocating the basic computing fabric to
    storage, arithmetic, and analog including neuromorphic computing tasks.

    A Review of Intelligent Practices for Irrigation Prediction

    Hans Krupakar, Akshay Jayakumar, Dhivya G
    Comments: 18 pages, 3 figures, 1 table, In National Conference on Computational Intelligence and High-Performance Computing (NCCIHPC)
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Population growth and increasing droughts are creating unprecedented strain
    on the continued availability of water resources. Since irrigation is a major
    consumer of fresh water, wastage of resources in this sector could have strong
    consequences. To address this issue, irrigation water management and prediction
    techniques need to be employed effectively and should be able to account for
    the variabilities present in the environment. The different techniques surveyed
    in this paper can be classified into two categories: computational and
    statistical. Computational methods deal with scientific correlations between
    physical parameters whereas statistical methods involve specific prediction
    algorithms that can be used to automate the process of irrigation water
    prediction. These algorithms interpret semantic relationships between the
    various parameters of temperature, pressure, evapotranspiration etc. and store
    them as numerical precomputed entities specific to the conditions and the area
    used as the data for the training corpus used to train it. We focus on
    reviewing the computational methods used to determine Evapotranspiration and
    its implications. We compare the efficiencies of different data mining and
    machine learning methods implemented in this area, such as Logistic Regression,
    Decision Tress Classifier, SysFor, Support Vector Machine(SVM), Fuzzy Logic
    techniques, Artifical Neural Networks(ANNs) and various hybrids of Genetic
    Algorithms (GA) applied to irrigation prediction. We also recommend a possible
    technique for the same based on its superior results in other such time series
    analysis tasks.


    Computer Vision and Pattern Recognition

    Panoptic Studio: A Massively Multiview System for Social Interaction Capture

    Hanbyul Joo, Tomas Simon, Xulong Li, Hao Liu, Lei Tan, Lin Gui, Sean Banerjee, Timothy Godisart, Bart Nabbe, Iain Matthews, Takeo Kanade, Shohei Nobuhara, Yaser Sheikh
    Comments: Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present an approach to capture the 3D motion of a group of people engaged
    in a social interaction. The core challenges in capturing social interactions
    are: (1) occlusion is functional and frequent; (2) subtle motion needs to be
    measured over a space large enough to host a social group; (3) human appearance
    and configuration variation is immense; and (4) attaching markers to the body
    may prime the nature of interactions. The Panoptic Studio is a system organized
    around the thesis that social interactions should be measured through the
    integration of perceptual analyses over a large variety of view points. We
    present a modularized system designed around this principle, consisting of
    integrated structural, hardware, and software innovations. The system takes, as
    input, 480 synchronized video streams of multiple people engaged in social
    activities, and produces, as output, the labeled time-varying 3D structure of
    anatomical landmarks on individuals in the space. Our algorithm is designed to
    fuse the “weak” perceptual processes in the large number of views by
    progressively generating skeletal proposals from low-level appearance cues, and
    a framework for temporal refinement is also presented by associating body parts
    to reconstructed dense 3D trajectory stream. Our system and method are the
    first in reconstructing full body motion of more than five people engaged in
    social interactions without using markers. We also empirically demonstrate the
    impact of the number of views in achieving this goal.

    Feature Pyramid Networks for Object Detection

    Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, Serge Belongie
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Feature pyramids are a basic component in recognition systems for detecting
    objects at different scales. But recent deep learning object detectors have
    avoided pyramid representations, in part because they are compute and memory
    intensive. In this paper, we exploit the inherent multi-scale, pyramidal
    hierarchy of deep convolutional networks to construct feature pyramids with
    marginal extra cost. A top-down architecture with lateral connections is
    developed for building high-level semantic feature maps at all scales. This
    architecture, called a Feature Pyramid Network (FPN), shows significant
    improvement as a generic feature extractor in several applications. Using FPN
    in a basic Faster R-CNN system, our method achieves state-of-the-art
    single-model results on the COCO detection benchmark without bells and
    whistles, surpassing all existing single-model entries including those from the
    COCO 2016 challenge winners. In addition, our method can run at 5 FPS on a GPU
    and thus is a practical and accurate solution to multi-scale object detection.
    Code will be made publicly available.

    Quantifying and Predicting Image Scenicness

    Scott Workman, Richard Souvenir, Nathan Jacobs
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Capturing the beauty of outdoor scenes in an image motivates many amateur and
    professional photographers and serves as the basis for many image sharing
    sites. While natural beauty is often considered a subjective property of
    images, in this paper, we take an objective approach and provide methods for
    quantifying and predicting the scenicness of an image. Using a dataset
    containing hundreds of thousands of outdoor images captured throughout Great
    Britain with crowdsourced ratings of natural beauty, we propose an approach to
    predict scenicness which explicitly accounts for the variance of human raters.
    We demonstrate that quantitative measures of scenicness can benefit semantic
    image understanding, content-aware image processing, and a novel application of
    cross-view mapping, where the sparsity of labeled ground-level images can be
    addressed by incorporating unlabeled aerial images in the training and
    prediction steps. For each application, our methods for scenicness prediction
    result in quantitative and qualitative improvements over baseline approaches.

    Shape-aware Instance Segmentation

    Zeeshan Hayder, Xuming He, Mathieu Salzmann
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We address the problem of instance-level semantic segmentation, which aims at
    jointly detecting, segmenting and classifying every individual object in an
    image. In this context, existing methods typically propose candidate objects,
    usually as bounding boxes, and directly predict a binary mask within each such
    proposal. As a consequence, they cannot recover from errors in the object
    candidate generation process, such as too small or shifted boxes. In this
    paper, we introduce a novel object segment representation based on the distance
    transform of the object masks. We then design an object mask network (OMN) with
    a new residual-deconvolution architecture that infers such a representation and
    decodes it into the final binary object mask. This allows us to predict masks
    that go beyond the scope of the bounding boxes and are thus robust to
    inaccurate object candidates. We integrate our OMN into a Multitask Network
    Cascade framework, and learn the resulting shape-aware instance segmentation
    (SAIS) network in an end-to-end manner. Our experiments on the PASCAL VOC 2012
    and the CityScapes datasets demonstrate the benefits of our approach, which
    outperforms the state-of-the-art in both object proposal generation and
    instance segmentation.

    Following Gaze Across Views

    Adrià Recasens, Carl Vondrick, Aditya Khosla, Antonio Torralba
    Comments: 9 pages, 8 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Following the gaze of people inside videos is an important signal for
    understanding people and their actions. In this paper, we present an approach
    for following gaze across views by predicting where a particular person is
    looking throughout a scene. We collect VideoGaze, a new dataset which we use as
    a benchmark to both train and evaluate models. Given one view with a person in
    it and a second view of the scene, our model estimates a density for gaze
    location in the second view. A key aspect of our approach is an end-to-end
    model that solves the following sub-problems: saliency, gaze pose, and
    geometric relationships between views. Although our model is supervised only
    with gaze, we show that the model learns to solve these subproblems
    automatically without supervision. Experiments suggest that our approach
    follows gaze better than standard baselines and produces plausible results for
    everyday situations.

    ActionFlowNet: Learning Motion Representation for Action Recognition

    Joe Yue-Hei Ng, Jonghyun Choi, Jan Neumann, Larry S. Davis
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Even with the recent advances in convolutional neural networks (CNN) in
    various visual recognition tasks, the state-of-the-art action recognition
    system still relies on hand crafted motion feature such as optical flow to
    achieve the best performance. We propose a multitask learning model
    ActionFlowNet to train a single stream network directly from raw pixels to
    jointly estimate optical flow while recognizing actions with convolutional
    neural networks, capturing both appearance and motion in a single model. We
    additionally provide insights to how the quality of the learned optical flow
    affects the action recognition. Our model not only significantly improves
    action recognition accuracy by a large margin (17%) compared to
    state-of-the-art CNN-based action recognition models trained without external
    large scale data and additional optical flow input, but also produces the
    optical flow as a side product.

    Automatic Model Based Dataset Generation for Fast and Accurate Crop and Weeds Detection

    Maurilio Di Cicco, Ciro Potena, Giorgio Grisetti, Alberto Pretto
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

    Selective weeding is one of the key challenges in the field of agriculture
    robotics: in order to accomplish this task, a farm robot should be able to
    accurately detect plants and to distinguish them between crop and weeds. In
    this paper, we face this problem by proposing a novel and effective approach
    that aims to dramatically minimize the human intervention needed to train the
    detection and classification algorithms. The idea is to synthetically generate
    the training datasets by using state-of-the-art computer graphics methods and
    tools. We explicitly model the relevant aspects of the target environment
    (i.e., crop and weeds species, type of soil, light conditions): by changing the
    model parameters, it is possible to render a huge number of photo-realistic
    views of the environment, to be used as training data. The proposed methodology
    has been validated exploiting a very recent deep learning based image
    segmentation architecture called SegNet [Kendall et al.]. We trained and tested
    different custom-built variants of SegNet using both real and synthetically
    generated datasets, the reported results confirm the effectiveness and the
    potentiality of our approach.

    Facial Expression Recognition using Convolutional Neural Networks: State of the Art

    Christopher Pramerdorfer, Martin Kampel
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The ability to recognize facial expressions automatically enables novel
    applications in human-computer interaction and other areas. Consequently, there
    has been active research in this field, with several recent works utilizing
    Convolutional Neural Networks (CNNs) for feature extraction and inference.
    These works differ significantly in terms of CNN architectures and other
    factors. Based on the reported results alone, the performance impact of these
    factors is unclear. In this paper, we review the state of the art in
    image-based facial expression recognition using CNNs and highlight algorithmic
    differences and their performance impact. On this basis, we identify existing
    bottlenecks and consequently directions for advancing this research field.
    Furthermore, we demonstrate that overcoming one of these bottlenecks – the
    comparatively basic architectures of the CNNs utilized in this field – leads to
    a substantial performance increase. By forming an ensemble of modern deep CNNs,
    we obtain a FER2013 test accuracy of 75.2%, outperforming previous works
    without requiring auxiliary training data or face registration.

    Gesture-based Bootstrapping for Egocentric Hand Segmentation

    Yubo Zhang, Vishnu Naresh Boddeti, Kris M. Kitani
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Accurately identifying hands in images is a key sub-task for human activity
    understanding with wearable first-person point-of-view cameras. Traditional
    hand segmentation approaches rely on a large corpus of manually labeled data to
    generate robust hand detectors. However, these approaches still face challenges
    as the appearance of the hand varies greatly across users, tasks, environments
    or illumination conditions. A key observation in the case of many wearable
    applications and interfaces is that, it is only necessary to accurately detect
    the user’s hands in a specific situational context. Based on this observation,
    we introduce an interactive approach to learn a person-specific hand
    segmentation model that does not require any manually labeled training data.
    Our approach proceeds in two steps, an interactive bootstrapping step for
    identifying moving hand regions, followed by learning a personalized user
    specific hand appearance model. Concretely, our approach uses two convolutional
    neural networks: (1) a gesture network that uses pre-defined motion information
    to detect the hand region; and (2) an appearance network that learns a person
    specific model of the hand region based on the output of the gesture network.
    During training, to make the appearance network robust to errors in the gesture
    network, the loss function of the former network incorporates the confidence of
    the gesture network while learning. Experiments demonstrate the robustness of
    our approach with an F1 score over 0.8 on all challenging datasets across a
    wide range of illumination and hand appearance variations, improving over a
    baseline approach by over 10%.

    Fast Fourier single-pixel imaging using binary illumination

    Zibang Zhang, Xueying Wang, Jingang Zhong
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Fourier single-pixel imaging (FSI) has proven capable of reconstructing
    high-quality two-dimensional and three-dimensional images. The utilization of
    the sparsity of natural images in Fourier domain allows high-resolution images
    to be reconstructed from far fewer measurements than effective image pixels.
    However, applying original FSI in digital micro-mirror device (DMD) based
    high-speed imaging system turns out to be challenging, because the original FSI
    uses grayscale Fourier basis patterns for illumination while DMDs generate
    grayscale patterns at a relatively low rate. DMDs are a binary device which can
    only generate a black-and-white pattern at each instance. In this paper, we
    adopt binary Fourier patterns for illumination to achieve DMD-based high-speed
    single-pixel imaging. Binary Fourier patterns are generated by upsampling and
    then applying error diffusion based dithering to the grayscale patterns.
    Experiments demonstrate the proposed technique able to achieve static imaging
    with high quality and dynamic imaging in real time. The proposed technique
    potentially allows high-quality and high-speed imaging over broad wavebands.

    Exploiting 2D Floorplan for Building-scale Panorama RGBD Alignment

    Erik Wijmans, Yasutaka Furukawa
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents a novel algorithm that utilizes a 2D floorplan to align
    panorama RGBD scans. While effective panorama RGBD alignment techniques exist,
    such a system requires extremely dense RGBD image sampling. Our approach can
    significantly reduce the number of necessary scans with the aid of a floorplan
    image. We formulate a novel Markov Random Field inference problem as a scan
    placement over the floorplan, as opposed to the conventional scan-to-scan
    alignment. The technical contributions lie in multi-modal image correspondence
    cues (between scans and schematic floorplan) as well as a novel coverage
    potential avoiding an inherent stacking bias. The proposed approach has been
    evaluated on five challenging large indoor spaces. To the best of our
    knowledge, we present the first effective system that utilizes a 2D floorplan
    image for building-scale 3D pointcloud alignment. The source code and the data
    will be shared with the community to further enhance indoor mapping research.

    Deep TEN: Texture Encoding Network

    Hang Zhang, Jia Xue, Kristin Dana
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a Deep Texture Encoding Network (Deep-TEN) with a novel Encoding
    Layer integrated on top of convolutional layers, which ports the entire
    dictionary learning and encoding pipeline into a single model. Current methods
    build from distinct components, using standard encoders with separate
    off-the-shelf features such as SIFT descriptors or pre-trained CNN features for
    material recognition. Our new approach provides an end-to-end learning
    framework, where the inherent visual vocabularies are learned directly from the
    loss function. The features, dictionaries and the encoding representation for
    the classifier are all learned simultaneously. The representation is orderless
    and therefore is particularly useful for material and texture recognition. The
    Encoding Layer generalizes robust residual encoders such as VLAD and Fisher
    Vectors, and has the property of discarding domain specific information which
    makes the learned convolutional features easier to transfer. Additionally,
    joint training using multiple datasets of varied sizes and class labels is
    supported resulting in increased recognition performance. The experimental
    results show superior performance as compared to state-of-the-art methods using
    gold-standard databases such as MINC-2500, Flickr Material Database,
    KTH-TIPS-2b, and two recent databases 4D-Light-Field-Material and GTOS. The
    source code for the complete system are publicly available.

    A series of maximum entropy upper bounds of the differential entropy

    Frank Nielsen, Richard Nock
    Comments: 18 pages
    Subjects: Information Theory (cs.IT); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We present a series of closed-form maximum entropy upper bounds for the
    differential entropy of a continuous univariate random variable and study the
    properties of that series. We then show how to use those generic bounds for
    upper bounding the differential entropy of Gaussian mixture models. This
    requires to calculate the raw moments and raw absolute moments of Gaussian
    mixtures in closed-form that may also be handy in statistical machine learning
    and information theory. We report on our experiments and discuss on the
    tightness of those bounds.


    Artificial Intelligence

    Measuring Adverse Drug Effects on Multimorbity using Tractable Bayesian Networks

    Jessa Bekker, Arjen Hommersom, Martijn Lappenschaar, Jesse Davis
    Comments: Machine Learning for Health @ NIPS 2016
    Subjects: Artificial Intelligence (cs.AI)

    Managing patients with multimorbidity often results in polypharmacy: the
    prescription of multiple drugs. However, the long-term effects of specific
    combinations of drugs and diseases are typically unknown. In particular, drugs
    prescribed for one condition may result in adverse effects for the other. To
    investigate which types of drugs may affect the further progression of
    multimorbidity, we query models of diseases and prescriptions that are learned
    from primary care data. State-of-the-art tractable Bayesian network
    representations, on which such complex queries can be computed efficiently, are
    employed for these large medical networks. Our results confirm that
    prescriptions may lead to unintended negative consequences in further
    development of multimorbidity in cardiovascular diseases. Moreover, a drug
    treatment for one disease group may affect diseases of another group.

    GOTM: a Goal-oriented Framework for Capturing Uncertainty of Medical Treatments

    Davoud Mougouei, David Powers
    Comments: Idea Paper
    Subjects: Artificial Intelligence (cs.AI)

    It has been widely recognized that uncertainty is an inevitable aspect of
    diagnosis and treatment of medical disorders. Such uncertainties hence, need to
    be considered in computerized medical models. The existing medical modeling
    techniques however, have mainly focused on capturing uncertainty associated
    with diagnosis of medical disorders while ignoring uncertainty of treatments.
    To tackle this issue, we have proposed using a fuzzy-based modeling and
    description technique for capturing uncertainties in treatment plans. We have
    further contributed a formal framework which allows for goal-oriented modeling
    and analysis of medical treatments.

    Learning representations through stochastic gradient descent in cross-validation error

    Richard S. Sutton, Vivek Veeriah
    Comments: preliminary draft
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Representations are fundamental to Artificial Intelligence. Typically, the
    performance of a learning system depends on its data representation. These data
    representations are usually hand-engineered based on some prior domain
    knowledge regarding the task. More recently, the trend is to learn these
    representations through deep neural networks as these can produce dramatical
    performance improvements over hand-engineered data representations. In this
    paper, we present a new incremental learning algorithm, called crossprop, for
    learning representations based on prior learning experiences. Unlike
    backpropagation, crossprop minimizes the cross-validation error. Specifically,
    our algorithm considers the influences of all the past weights on the current
    squared error, and uses this gradient for incrementally learning the weights in
    a neural network. This idea is similar to that of tuning the learning system
    through an offline cross-validation procedure. Crossprop is applicable to
    incremental learning tasks, where a sequence of examples are encountered by the
    learning system and they need to be processed one by one and then discarded.
    The learning system can use each example only once and can spend only a limited
    amount of computation for an example. From our preliminary experiments, we
    concluce that crossprop is a promising alternative for backprop for
    representation learning.

    Advancing Bayesian Optimization: The Mixed-Global-Local (MGL) Kernel and Length-Scale Cool Down

    Kim Peter Wabersich, Marc Toussaint
    Comments: Long version of accepted NIPS BayesOpt 2016 paper
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Bayesian Optimization (BO) has become a core method for solving expensive
    black-box optimization problems. While much research focussed on the choice of
    the acquisition function, we focus on online length-scale adaption and the
    choice of kernel function. Instead of choosing hyperparameters in view of
    maximum likelihood on past data, we propose to use the acquisition function to
    decide on hyperparameter adaptation more robustly and in view of the future
    optimization progress. Further, we propose a particular kernel function that
    includes non-stationarity and local anisotropy and thereby implicitly
    integrates the efficiency of local convex optimization with global Bayesian
    optimization. Comparisons to state-of-the art BO methods underline the
    efficiency of these mechanisms on global optimization benchmarks.


    Distributed, Parallel, and Cluster Computing

    Clipper: A Low-Latency Online Prediction Serving System

    Daniel Crankshaw, Xin Wang, Giulio Zhou, Michael J. Franklin, Joseph E. Gonzalez, Ion Stoica
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

    Machine learning is being deployed in a growing number of applications which
    demand real-time, accurate, and robust predictions under heavy query load.
    However, most machine learning frameworks and systems only address model
    training and not deployment.

    In this paper, we introduce Clipper, the first general-purpose low-latency
    prediction serving system. Interposing between end-user applications and a wide
    range of machine learning frameworks, Clipper introduces a modular architecture
    to simplify model deployment across frameworks. Furthermore, by introducing
    caching, batching, and adaptive model selection techniques, Clipper reduces
    prediction latency and improves prediction throughput, accuracy, and robustness
    without modifying the underlying machine learning frameworks. We evaluate
    Clipper on four common machine learning benchmark datasets and demonstrate its
    ability to meet the latency, accuracy, and throughput demands of online serving
    applications. Finally, we compare Clipper to the TensorFlow Serving system and
    demonstrate comparable prediction throughput and latency on a range of models
    while enabling new functionality, improved accuracy, and robustness.

    Tuple spaces implementations and their efficiency

    Vitaly Buravlev, Rocco De Nicola, Claudio Antares Mezzina
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Among the paradigms for parallel and distributed computing, the one
    popularized with Linda, and based on tuple spaces, is one of the least used,
    despite the fact of being intuitive, easy to understand and to use. A tuple
    space is a repository, where processes can add, withdraw or read tuples by
    means of atomic operations. Tuples may contain different values, and processes
    can inspect their content via pattern matching. The lack of a reference
    implementation for this paradigm has prevented its widespread. In this paper,
    first we perform an extensive analysis of a number of actual implementations of
    the tuple space paradigm and summarise their main features. Then, we select
    four such implementations and compare their performances on four different case
    studies that aim at stressing different aspects of computing such as
    communication, data manipulation, and cpu usage. After reasoning on strengths
    and weaknesses of the four implementations, we conclude with some
    recommendations for future work towards building an effective implementation of
    the tuple space paradigm.

    Solidus: An Incentive-compatible Cryptocurrency Based on Permissionless Byzantine Consensus

    Ittai Abraham, Dahlia Malkhi, Kartik Nayak, Ling Ren, Alexander Spiegelman
    Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Computer Science and Game Theory (cs.GT)

    The decentralized cryptocurrency Bitcoin has experienced great success but
    also encountered many challenges. One of the challenges has been the long
    confirmation time and low transaction throughput. Another challenge is the lack
    of incentives at certain steps of the protocol, raising concerns for
    transaction withholding, selfish mining, etc. To address these challenges, we
    propose Solidus, a decentralized cryptocurrency based on permissionless
    Byzantine consensus. A core technique in Solidus is to use proof of work for
    leader election to adapt the Practical Byzantine Fault Tolerance (PBFT)
    protocol to a permissionless setting. We also design Solidus to be incentive
    compatible and to mitigate selfish mining. Solidus improves on Bitcoin in
    confirmation time and provides safety and liveness assuming Byzantine players
    and the largest coalition of rational players collectively control less than
    one-third of the computation power.


    Learning

    Square Hellinger Subadditivity for Bayesian Networks and its Applications to Identity Testing

    Constantinos Daskalakis, Qinxuan Pan
    Subjects: Learning (cs.LG); Information Theory (cs.IT); Probability (math.PR); Statistics Theory (math.ST); Machine Learning (stat.ML)

    We show that the square Hellinger distance between two Bayesian networks on
    the same directed graph, (G), is subadditive with respect to the neighborhoods
    of (G). Namely, if (P) and (Q) are the probability distributions defined by two
    Bayesian networks on the same DAG, our inequality states that the square
    Hellinger distance, (H^2(P,Q)), between (P) and (Q) is upper bounded by the
    sum, (sum_v H^2(P_{{v} cup Pi_v}, Q_{{v} cup Pi_v})), of the square
    Hellinger distances between the marginals of (P) and (Q) on every node (v) and
    its parents (Pi_v) in the DAG. Importantly, our bound does not involve the
    conditionals but the marginals of (P) and (Q). We derive a similar inequality
    for more general Markov Random Fields.

    As an application of our inequality, we show that distinguishing whether two
    Bayesian networks (P) and (Q) on the same (but potentially unknown) DAG satisfy
    (P=Q) vs (d_{
    m TV}(P,Q)>epsilon) can be performed from
    ( ilde{O}(|Sigma|^{3/4(d+1)} cdot n/epsilon^2)) samples, where (d) is the
    maximum in-degree of the DAG and (Sigma) the domain of each variable of the
    Bayesian networks. If (P) and (Q) are defined on potentially different and
    potentially unknown trees, the sample complexity becomes
    ( ilde{O}(|Sigma|^{4.5} n/epsilon^2)), whose dependence on (n, epsilon) is
    optimal up to logarithmic factors. Lastly, if (P) and (Q) are product
    distributions over ({0,1}^n) and (Q) is known, the sample complexity becomes
    (O(sqrt{n}/epsilon^2)), which is optimal up to constant factors.

    Advancing Bayesian Optimization: The Mixed-Global-Local (MGL) Kernel and Length-Scale Cool Down

    Kim Peter Wabersich, Marc Toussaint
    Comments: Long version of accepted NIPS BayesOpt 2016 paper
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Bayesian Optimization (BO) has become a core method for solving expensive
    black-box optimization problems. While much research focussed on the choice of
    the acquisition function, we focus on online length-scale adaption and the
    choice of kernel function. Instead of choosing hyperparameters in view of
    maximum likelihood on past data, we propose to use the acquisition function to
    decide on hyperparameter adaptation more robustly and in view of the future
    optimization progress. Further, we propose a particular kernel function that
    includes non-stationarity and local anisotropy and thereby implicitly
    integrates the efficiency of local convex optimization with global Bayesian
    optimization. Comparisons to state-of-the art BO methods underline the
    efficiency of these mechanisms on global optimization benchmarks.

    Analytical Stacked Gaussian Process Model

    Kareem Abdelfatah, Junshu Bao, Gabriel Terejanu
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    A probabilistic model is proposed by stacking a set of independently trained
    Gaussian processes to obtain prediction of quantities of interests that require
    composition of functions. Analytical derivations are provided for first and
    second-order moments of the stacked Gaussian process using RBF and polynomial
    kernels. The StackedGP model can be extended to any number of layers and nodes
    per layer, and it provides flexibility in kernel selection for each node. The
    proposed nonparametric stacked model is validated using different synthetic
    datasets and its performance is measured in two real-world applications.

    A Review of Intelligent Practices for Irrigation Prediction

    Hans Krupakar, Akshay Jayakumar, Dhivya G
    Comments: 18 pages, 3 figures, 1 table, In National Conference on Computational Intelligence and High-Performance Computing (NCCIHPC)
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Population growth and increasing droughts are creating unprecedented strain
    on the continued availability of water resources. Since irrigation is a major
    consumer of fresh water, wastage of resources in this sector could have strong
    consequences. To address this issue, irrigation water management and prediction
    techniques need to be employed effectively and should be able to account for
    the variabilities present in the environment. The different techniques surveyed
    in this paper can be classified into two categories: computational and
    statistical. Computational methods deal with scientific correlations between
    physical parameters whereas statistical methods involve specific prediction
    algorithms that can be used to automate the process of irrigation water
    prediction. These algorithms interpret semantic relationships between the
    various parameters of temperature, pressure, evapotranspiration etc. and store
    them as numerical precomputed entities specific to the conditions and the area
    used as the data for the training corpus used to train it. We focus on
    reviewing the computational methods used to determine Evapotranspiration and
    its implications. We compare the efficiencies of different data mining and
    machine learning methods implemented in this area, such as Logistic Regression,
    Decision Tress Classifier, SysFor, Support Vector Machine(SVM), Fuzzy Logic
    techniques, Artifical Neural Networks(ANNs) and various hybrids of Genetic
    Algorithms (GA) applied to irrigation prediction. We also recommend a possible
    technique for the same based on its superior results in other such time series
    analysis tasks.

    Testing Bayesian Networks

    Clement Canonne, Ilias Diakonikolas, Daniel Kane, Alistair Stewart
    Subjects: Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG); Statistics Theory (math.ST)

    This work initiates a systematic investigation of testing {em
    high-dimensional} structured distributions by focusing on testing {em Bayesian
    networks} — the prototypical family of directed graphical models. A Bayesian
    network is defined by a directed acyclic graph, where we associate a random
    variable with each node. The value at any particular node is conditionally
    independent of all the other non-descendant nodes once its parents are fixed.
    Specifically, we study the properties of identity testing and closeness testing
    of Bayesian networks. Our main contribution is the first non-trivial efficient
    testing algorithms for these problems and corresponding information-theoretic
    lower bounds. For a wide range of parameter settings, our testing algorithms
    have sample complexity {em sublinear} in the dimension and are sample-optimal,
    up to constant factors.

    Optimal mean-based algorithms for trace reconstruction

    Anindya De, Ryan O'Donnell, Rocco Servedio
    Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Learning (cs.LG)

    In the (deletion-channel) trace reconstruction problem, there is an unknown
    (n)-bit source string (x). An algorithm is given access to independent traces
    of (x), where a trace is formed by deleting each bit of~(x) independently with
    probability~(delta). The goal of the algorithm is to recover~(x) exactly (with
    high probability), while minimizing samples (number of traces) and running
    time.

    Previously, the best known algorithm for the trace reconstruction problem was
    due to Holenstein~et~al.; it uses (exp( ilde{O}(n^{1/2}))) samples and
    running time for any fixed (0 < delta < 1). It is also what we call a
    “mean-based algorithm”, meaning that it only uses the empirical means of the
    individual bits of the traces. Holenstein~et~al.~also gave a lower bound,
    showing that any mean-based algorithm must use at least (n^{ ilde{Omega}(log
    n)}) samples.

    In this paper we improve both of these results, obtaining matching upper and
    lower bounds for mean-based trace reconstruction. For any constant deletion
    rate (0 < delta < 1), we give a mean-based algorithm that uses
    (exp(O(n^{1/3}))) time and traces; we also prove that any mean-based algorithm
    must use at least (exp(Omega(n^{1/3}))) traces. In fact, we obtain matching
    upper and lower bounds even for (delta) subconstant and (
    ho := 1-delta)
    subconstant: when ((log^3 n)/n ll delta leq 1/2) the bound is
    (exp(-Theta(delta n)^{1/3})), and when (1/sqrt{n} ll
    ho leq 1/2) the
    bound is (exp(-Theta(n/
    ho)^{1/3})).

    Our proofs involve estimates for the maxima of Littlewood polynomials on
    complex disks. We show that these techniques can also be used to perform trace
    reconstruction with random insertions and bit-flips in addition to deletions.
    We also find a surprising result: for deletion probabilities (delta > 1/2),
    the presence of insertions can actually help with trace reconstruction.

    Testing Ising Models

    Constantinos Daskalakis, Nishanth Dikkala, Gautam Kamath
    Subjects: Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST)

    Given samples from an unknown multivariate distribution (p), is it possible
    to distinguish whether (p) is the product of its marginals versus (p) being far
    from every product distribution? Similarly, is it possible to distinguish
    whether (p) equals a given distribution (q) versus (p) and (q) being far from
    each other? These problems of testing independence and goodness-of-fit have
    received enormous attention in statistics, information theory, and theoretical
    computer science, with sample-optimal algorithms known in several interesting
    regimes of parameters. Unfortunately, it has also been understood that these
    problems become intractable in large dimensions, necessitating exponential
    sample complexity.

    Motivated by the exponential lower bounds for general distributions as well
    as the ubiquity of Markov Random Fields (MRFs) in the modeling of
    high-dimensional distributions, we initiate the study of distribution testing
    on structured multivariate distributions, and in particular the prototypical
    example of MRFs: the Ising Model. We demonstrate that, in this structured
    setting, we can avoid the curse of dimensionality, obtaining sample and time
    efficient testers for independence and goodness-of-fit. Along the way, we
    develop new tools for establishing concentration of functions of the Ising
    model, using the exchangeable pairs framework developed by Chatterjee, and
    improving upon this framework. In particular, we prove tighter concentration
    results for multi-linear functions of the Ising model in the high-temperature
    regime.

    Phase transitions in Restricted Boltzmann Machines with generic priors

    Adriano Barra, Giuseppe Genovese, Peter Sollich, Daniele Tantari
    Comments: 5 pages, 4 figures
    Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)

    We study generalised restricted Boltzmann machines with generic priors for
    units and weights, interpolating between Boolean and Gaussian variables. We
    present a complete analysis of the replica symmetric phase diagram of these
    models, which can be regarded as generalised Hopfield models. We show the way
    the paramagnetic phase boundary is directly related to the optimal size of the
    training set necessary for good generalisation in a teacher- student scenario.
    Moreover we underline the role of the retrieval phase for both inference and
    learning processes. We show that retrieval is robust for a large class of
    weight and unit priors, beyond the standard Hopfield scenario.

    Clipper: A Low-Latency Online Prediction Serving System

    Daniel Crankshaw, Xin Wang, Giulio Zhou, Michael J. Franklin, Joseph E. Gonzalez, Ion Stoica
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

    Machine learning is being deployed in a growing number of applications which
    demand real-time, accurate, and robust predictions under heavy query load.
    However, most machine learning frameworks and systems only address model
    training and not deployment.

    In this paper, we introduce Clipper, the first general-purpose low-latency
    prediction serving system. Interposing between end-user applications and a wide
    range of machine learning frameworks, Clipper introduces a modular architecture
    to simplify model deployment across frameworks. Furthermore, by introducing
    caching, batching, and adaptive model selection techniques, Clipper reduces
    prediction latency and improves prediction throughput, accuracy, and robustness
    without modifying the underlying machine learning frameworks. We evaluate
    Clipper on four common machine learning benchmark datasets and demonstrate its
    ability to meet the latency, accuracy, and throughput demands of online serving
    applications. Finally, we compare Clipper to the TensorFlow Serving system and
    demonstrate comparable prediction throughput and latency on a range of models
    while enabling new functionality, improved accuracy, and robustness.

    BaTFLED: Bayesian Tensor Factorization Linked to External Data

    Nathan H Lazar, Mehmet Gönen, Kemal Sönmez
    Comments: 4 main pages with 14 supplemental pages
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Quantitative Methods (q-bio.QM)

    The vast majority of current machine learning algorithms are designed to
    predict single responses or a vector of responses, yet many types of response
    are more naturally organized as matrices or higher-order tensor objects where
    characteristics are shared across modes. We present a new machine learning
    algorithm BaTFLED (Bayesian Tensor Factorization Linked to External Data) that
    predicts values in a three-dimensional response tensor using input features for
    each of the dimensions. BaTFLED uses a probabilistic Bayesian framework to
    learn projection matrices mapping input features for each mode into latent
    representations that multiply to form the response tensor. By utilizing a
    Tucker decomposition, the model can capture weights for interactions between
    latent factors for each mode in a small core tensor. Priors that encourage
    sparsity in the projection matrices and core tensor allow for feature selection
    and model regularization. This method is shown to far outperform elastic net
    and neural net models on ‘cold start’ tasks from data simulated in a three-mode
    structure. Additionally, we apply the model to predict dose-response curves in
    a panel of breast cancer cell lines treated with drug compounds that was used
    as a Dialogue for Reverse Engineering Assessments and Methods (DREAM)
    challenge.

    A series of maximum entropy upper bounds of the differential entropy

    Frank Nielsen, Richard Nock
    Comments: 18 pages
    Subjects: Information Theory (cs.IT); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We present a series of closed-form maximum entropy upper bounds for the
    differential entropy of a continuous univariate random variable and study the
    properties of that series. We then show how to use those generic bounds for
    upper bounding the differential entropy of Gaussian mixture models. This
    requires to calculate the raw moments and raw absolute moments of Gaussian
    mixtures in closed-form that may also be handy in statistical machine learning
    and information theory. We report on our experiments and discuss on the
    tightness of those bounds.

    Learning representations through stochastic gradient descent in cross-validation error

    Richard S. Sutton, Vivek Veeriah
    Comments: preliminary draft
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Representations are fundamental to Artificial Intelligence. Typically, the
    performance of a learning system depends on its data representation. These data
    representations are usually hand-engineered based on some prior domain
    knowledge regarding the task. More recently, the trend is to learn these
    representations through deep neural networks as these can produce dramatical
    performance improvements over hand-engineered data representations. In this
    paper, we present a new incremental learning algorithm, called crossprop, for
    learning representations based on prior learning experiences. Unlike
    backpropagation, crossprop minimizes the cross-validation error. Specifically,
    our algorithm considers the influences of all the past weights on the current
    squared error, and uses this gradient for incrementally learning the weights in
    a neural network. This idea is similar to that of tuning the learning system
    through an offline cross-validation procedure. Crossprop is applicable to
    incremental learning tasks, where a sequence of examples are encountered by the
    learning system and they need to be processed one by one and then discarded.
    The learning system can use each example only once and can spend only a limited
    amount of computation for an example. From our preliminary experiments, we
    concluce that crossprop is a promising alternative for backprop for
    representation learning.

    Tensor-Factor Analysis

    Andrew Stevens, Yunchen Pu, Yannan Sun, Greg Spell, Lawrence Carin
    Comments: Rejected from ICML 2016
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    A nonparametric factor analysis framework is developed for tensor-variate
    data. The dictionary elements (factor loadings) are inferred as tensors, as
    opposed to vectors. Our tensor-factor analysis (TFA) framework is developed in
    the context of the beta-process factor analysis (BPFA) using a nonparametric
    tensor decomposition for each dictionary element. We extend the multiplicative
    gamma prior to allow inference of the shape parameter, which can help control
    model complexity during learning. Moreover, we extend the TFA model to include
    translation invariance and a multi-layer structure—a deep convolutional TFA.
    We test our approach on 2D & 3D image denoising, inpainting, and image
    classification. Our TFA models provide more diverse dictionaries. In
    particular, TFA provides state of the art results for simultaneous image
    inpainting & denoising and on the Caltech 101 benchmark.

    Robust Low-Complexity Randomized Methods for Locating Outliers in Large Matrices

    Xingguo Li, Jarvis Haupt
    Comments: 16 pages, 4 figures
    Subjects: Information Theory (cs.IT); Learning (cs.LG); Machine Learning (stat.ML)

    This paper examines the problem of locating outlier columns in a large,
    otherwise low-rank matrix, in settings where {}{the data} are noisy, or where
    the overall matrix has missing elements. We propose a randomized two-step
    inference framework, and establish sufficient conditions on the required sample
    complexities under which these methods succeed (with high probability) in
    accurately locating the outliers for each task. Comprehensive numerical
    experimental results are provided to verify the theoretical bounds and
    demonstrate the computational efficiency of the proposed algorithm.


    Information Theory

    Low-rank matrix recovery via rank one tight frame measurements

    Holger Rauhut, Ulrich Terstiege
    Comments: 3 pages
    Subjects: Information Theory (cs.IT); Probability (math.PR)

    The task of reconstructing a low rank matrix from incomplete linear
    measurements arises in areas such as machine learning, quantum state tomography
    and in the phase retrieval problem. In this note, we study the particular setup
    that the measurements are taken with respect to rank one matrices constructed
    from the elements of a random tight frame. We consider a convex optimization
    approach and show both robustness of the reconstruction with respect to noise
    on the measurements as well as stability with respect to passing to
    approximately low rank matrices. This is achieved by establishing a version of
    the null space property of the corresponding measurement map.

    Blind Identification of SM and Alamouti STBC-OFDM Signals

    Yahia A. Eldemerdash, Octavia A. Dobre, Bruce J. Liao
    Subjects: Information Theory (cs.IT)

    This paper proposes an efficient identification algorithm for spatial
    multiplexing (SM) and Alamouti (AL) coded orthogonal frequency division
    multiplexing (OFDM) signals. The cross-correlation between the received signals
    from different antennas is exploited to provide a discriminating feature to
    identify SM-OFDM and AL-OFDM signals. The proposed algorithm requires neither
    estimation of the channel coefficients and noise power, nor the modulation of
    the transmitted signal. Moreover, it does not need space-time block code (STBC)
    or OFDM block synchronization. The effectiveness of the proposed algorithm is
    demonstrated through extensive simulation experiments in the presence of
    diverse transmission impairments, such as time and frequency offsets, Doppler
    frequency, and spatially correlated fading.

    Hybrid Analog-Digital Transceiver Designs for Cognitive Large-Scale Antenna Array Systems

    Christos G. Tsinos, Sina Maleki, Symeon Chatzinotas, Bjorn Ottersten
    Comments: Millimeter wave (mmWave), Cognitive Radio, Multiple-Input Multiple-Output (MIMO), Large-scale antenna arrays, Hybrid Pre-coding, Alternating Direction Method of Multipliers (ADMM)
    Subjects: Information Theory (cs.IT)

    Milimeter wave (mmWave) band mobile communications can be a solution to the
    continuously increasing traffic demand in modern wireless systems. Even though
    mmWave bands are scarcely occupied, the design of a prospect transceiver should
    guarantee the efficient coexistence with the incumbent services in these bands.
    To that end, in this paper, two underlay cognitive transceiver designs are
    proposed that enable the mmWave spectrum access while controlling the
    interference to the incumbent users. MmWave systems usually require large
    antenna arrays to achieve satisfactory performance and thus, they cannot
    support fully digital transceiver designs due to high demands in hardware
    complexity and power consumption. Thus, in order to develop efficient
    solutions, the proposed approaches are based on a hybrid analog-digital
    pre-coding architecture. In such hybrid designs, the overall beamformer can be
    factorized in a low dimensional digital counterpart applied in the baseband and
    in an analog one applied in the RF domain. The first cognitive solution
    developed in this paper designs the cognitive hybrid pre-coder by maximizing
    the mutual information between its two ends subject to interference, power and
    hardware constraints related to the analog counterpart. The second solution
    aims at reduced complexity requirements and thus derives the hybrid pre-coder
    by minimizing the Frobenious norm of its difference to the optimal digital only
    one. A novel solution for the post-coder at the cognitive receiver part is
    further proposed here based on a hardware constrained Minimum Mean Square Error
    criterion. Simulations show that the performance of both the proposed hybrid
    approaches is very close to the one of the fully digital solution for typical
    wireless environments.

    A series of maximum entropy upper bounds of the differential entropy

    Frank Nielsen, Richard Nock
    Comments: 18 pages
    Subjects: Information Theory (cs.IT); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We present a series of closed-form maximum entropy upper bounds for the
    differential entropy of a continuous univariate random variable and study the
    properties of that series. We then show how to use those generic bounds for
    upper bounding the differential entropy of Gaussian mixture models. This
    requires to calculate the raw moments and raw absolute moments of Gaussian
    mixtures in closed-form that may also be handy in statistical machine learning
    and information theory. We report on our experiments and discuss on the
    tightness of those bounds.

    MIMO Channel Reconstruction from Lower Dimensional Multiple Antenna Measurements

    Rimvydas Aleksiejunas
    Comments: 16 pages, 6 figures
    Subjects: Information Theory (cs.IT)

    A method for reconstructing multiple-input multiple-output (MIMO) channel
    correlation matrices from lower dimensional channel measurements is presented.
    Exploiting the symmetry of correlation matrix structure enables reproducing
    higher dimensional MIMO channel matrices from available lower order
    measurements. This leads to practically important applications allowing
    prediction of higher dimensional MIMO system capacity. In particular, we study
    Kronecker-type MIMO channels suitable for reconstructing full channel matrices
    from partial information about transmit-receive fading in spatial and
    polarimetric domains and analyze validity conditions for such models. One of
    the important channel conditions is Doppler frequency related to
    non-stationarity in the environment. We present simulations of cluster-type
    scattering model using 2×2 MIMO channel correlation matrices to predict
    performance of 2×4 MIMO system including recovery of angular power spectrum. An
    example of dual circular polarized 2×4 MIMO land mobile satellite measurements
    in 2.5 GHz frequency band illustrates applicability of the method to
    reconstruct spatial and polarimetric channel correlation matrices for
    estimating ergodic channel capacity from single-antenna or uni-polarized
    measurements.

    A Compressive Method for Centralized PSD Map Construction

    Mohammad Eslami, Farah Torkamani-Azar, Esfandiar Mehrshahi
    Comments: The 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017), Ph.D. Forum
    Subjects: Information Theory (cs.IT)

    Spectrum resources are facing huge demands and cognitive radio (CR) can
    improve the spectrum utilization. Recently, power spectral density (PSD) map is
    defined to enable the CR to reuse the frequency resources regarding to the
    area. For this reason, the sensed PSDs are fused by a Fusion Center (FC) which
    the sensed PSDs are collected by the distributed sensors in the area. But, for
    a given zone, the sensed PSD by neighbor CR sensors may contain a shared common
    component for a while. This component can be exploited in the theory of the
    distributed source coding (DSC) to compress sensing data more. In this paper
    based on the distributed compressive sensing (DCS) a method is proposed to
    compress and reconstruct the PSDs of the sensors when the data transmission is
    slightly imperfect. Simulation results show the advantages of using proposed
    method in compressing, reducing overhead and also recovering PSDs. % Proposed
    method can be used to develop a framework when the holding times of the users
    are large in comparison with the rate of the spectrum sensing.

    Square Hellinger Subadditivity for Bayesian Networks and its Applications to Identity Testing

    Constantinos Daskalakis, Qinxuan Pan
    Subjects: Learning (cs.LG); Information Theory (cs.IT); Probability (math.PR); Statistics Theory (math.ST); Machine Learning (stat.ML)

    We show that the square Hellinger distance between two Bayesian networks on
    the same directed graph, (G), is subadditive with respect to the neighborhoods
    of (G). Namely, if (P) and (Q) are the probability distributions defined by two
    Bayesian networks on the same DAG, our inequality states that the square
    Hellinger distance, (H^2(P,Q)), between (P) and (Q) is upper bounded by the
    sum, (sum_v H^2(P_{{v} cup Pi_v}, Q_{{v} cup Pi_v})), of the square
    Hellinger distances between the marginals of (P) and (Q) on every node (v) and
    its parents (Pi_v) in the DAG. Importantly, our bound does not involve the
    conditionals but the marginals of (P) and (Q). We derive a similar inequality
    for more general Markov Random Fields.

    As an application of our inequality, we show that distinguishing whether two
    Bayesian networks (P) and (Q) on the same (but potentially unknown) DAG satisfy
    (P=Q) vs (d_{
    m TV}(P,Q)>epsilon) can be performed from
    ( ilde{O}(|Sigma|^{3/4(d+1)} cdot n/epsilon^2)) samples, where (d) is the
    maximum in-degree of the DAG and (Sigma) the domain of each variable of the
    Bayesian networks. If (P) and (Q) are defined on potentially different and
    potentially unknown trees, the sample complexity becomes
    ( ilde{O}(|Sigma|^{4.5} n/epsilon^2)), whose dependence on (n, epsilon) is
    optimal up to logarithmic factors. Lastly, if (P) and (Q) are product
    distributions over ({0,1}^n) and (Q) is known, the sample complexity becomes
    (O(sqrt{n}/epsilon^2)), which is optimal up to constant factors.

    Testing Bayesian Networks

    Clement Canonne, Ilias Diakonikolas, Daniel Kane, Alistair Stewart
    Subjects: Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG); Statistics Theory (math.ST)

    This work initiates a systematic investigation of testing {em
    high-dimensional} structured distributions by focusing on testing {em Bayesian
    networks} — the prototypical family of directed graphical models. A Bayesian
    network is defined by a directed acyclic graph, where we associate a random
    variable with each node. The value at any particular node is conditionally
    independent of all the other non-descendant nodes once its parents are fixed.
    Specifically, we study the properties of identity testing and closeness testing
    of Bayesian networks. Our main contribution is the first non-trivial efficient
    testing algorithms for these problems and corresponding information-theoretic
    lower bounds. For a wide range of parameter settings, our testing algorithms
    have sample complexity {em sublinear} in the dimension and are sample-optimal,
    up to constant factors.

    Testing Ising Models

    Constantinos Daskalakis, Nishanth Dikkala, Gautam Kamath
    Subjects: Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST)

    Given samples from an unknown multivariate distribution (p), is it possible
    to distinguish whether (p) is the product of its marginals versus (p) being far
    from every product distribution? Similarly, is it possible to distinguish
    whether (p) equals a given distribution (q) versus (p) and (q) being far from
    each other? These problems of testing independence and goodness-of-fit have
    received enormous attention in statistics, information theory, and theoretical
    computer science, with sample-optimal algorithms known in several interesting
    regimes of parameters. Unfortunately, it has also been understood that these
    problems become intractable in large dimensions, necessitating exponential
    sample complexity.

    Motivated by the exponential lower bounds for general distributions as well
    as the ubiquity of Markov Random Fields (MRFs) in the modeling of
    high-dimensional distributions, we initiate the study of distribution testing
    on structured multivariate distributions, and in particular the prototypical
    example of MRFs: the Ising Model. We demonstrate that, in this structured
    setting, we can avoid the curse of dimensionality, obtaining sample and time
    efficient testers for independence and goodness-of-fit. Along the way, we
    develop new tools for establishing concentration of functions of the Ising
    model, using the exchangeable pairs framework developed by Chatterjee, and
    improving upon this framework. In particular, we prove tighter concentration
    results for multi-linear functions of the Ising model in the high-temperature
    regime.




沪ICP备19023445号-2号
友情链接