IT博客汇

    arXiv Paper Daily: Mon, 13 Feb 2017

    Published by 我爱机器学习 (52ml.net) on 2017-02-13 00:00:00

    Neural and Evolutionary Computing

    Stochastic Configuration Networks: Fundamentals and Algorithms

    Dianhui Wang, Ming Li
    Comments: 19 pages, 4 figures and 28 references
    Subjects: Neural and Evolutionary Computing (cs.NE)

    This paper contributes to the development of randomized methods for neural
    networks. The proposed learner model is built incrementally by stochastic
    configuration (SC) algorithms and is termed a Stochastic Configuration Network
    (SCN). In contrast to existing randomized learning algorithms for single-layer
    feed-forward neural networks (SLFNNs), we randomly assign the input weights and
    biases of the hidden nodes under a supervisory mechanism, and the output
    weights are evaluated analytically in either a constructive or a selective
    manner. As a foundation for SCN-based data modelling techniques, we establish
    theoretical results on the universal approximation property. Three versions of
    the SC algorithm are presented for regression problems (and are applicable to
    classification problems as well). Simulation results on both function
    approximation and real-world data regression highlight several merits of the
    proposed SCNs: less human intervention in setting the network size, adaptation
    of the scope of the random parameters, fast learning, and sound generalization.
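    The incremental construction described above can be sketched as follows:
    random candidate hidden nodes are generated, the candidate most correlated with
    the current residual is kept (a simplified stand-in for the paper's supervisory
    mechanism, whose exact inequality condition is not reproduced here), and the
    output weights are re-solved by least squares.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def train_scn_sketch(X, y, max_nodes=25, candidates=20, tol=1e-3):
        """Incrementally add random tanh hidden nodes; keep the candidate whose
        activation correlates best with the current residual, then re-solve the
        output weights by least squares (a sketch, not the authors' algorithm)."""
        n, d = X.shape
        H = np.empty((n, 0))          # hidden-node activations, one column per node
        residual = y.copy()
        for _ in range(max_nodes):
            best, best_score = None, -np.inf
            for _ in range(candidates):
                w = rng.uniform(-1, 1, d)
                b = rng.uniform(-1, 1)
                h = np.tanh(X @ w + b)
                score = abs(h @ residual) / (np.linalg.norm(h) + 1e-12)
                if score > best_score:
                    best, best_score = h, score
            H = np.column_stack([H, best])
            beta, *_ = np.linalg.lstsq(H, y, rcond=None)   # analytic output weights
            residual = y - H @ beta
            if np.linalg.norm(residual) < tol:
                break
        return H, beta, residual

    X = rng.uniform(-1, 1, (200, 1))
    y = np.sin(3 * X[:, 0])
    H, beta, r = train_scn_sketch(X, y)
    print(np.linalg.norm(r) / np.linalg.norm(y))   # relative error shrinks as nodes are added
    ```

    The key point the sketch shows is the division of labor: hidden parameters are
    randomized (subject to an acceptance test), while output weights stay a linear
    least-squares problem.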

    A Deterministic and Generalized Framework for Unsupervised Learning with Restricted Boltzmann Machines

    Eric W. Tramel, Marylou Gabrié, Andre Manoel, Francesco Caltagirone, Florent Krzakala
    Subjects: Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Restricted Boltzmann machines (RBMs) are energy-based neural networks that
    are commonly used as building blocks for deep neural architectures. In this
    work, we derive a deterministic framework for the
    training, evaluation, and use of RBMs based upon the Thouless-Anderson-Palmer
    (TAP) mean-field approximation of widely-connected systems with weak
    interactions coming from spin-glass theory. While the TAP approach has been
    extensively studied for fully-visible binary spin systems, our construction is
    generalized to latent-variable models, as well as to arbitrarily distributed
    real-valued spin systems with bounded support. In our numerical experiments, we
    demonstrate the effective deterministic training of our proposed models and are
    able to show interesting features of unsupervised learning which could not be
    directly observed with sampling. Additionally, we demonstrate how to utilize
    our TAP-based framework for leveraging trained RBMs as joint priors in
    denoising problems.
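    For a binary-binary RBM, the second-order TAP self-consistency equations add
    an Onsager correction to the naive mean-field update; a damped fixed-point
    iteration of that form can be sketched as below (the couplings, damping
    constant, and initialization are illustrative assumptions, not the authors'
    setup).

    ```python
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def tap_magnetizations(W, a, b, iters=200, damping=0.5):
        """Damped fixed-point iteration for TAP mean-field magnetizations of a
        binary-binary RBM: naive mean field plus an Onsager correction term."""
        nv, nh = W.shape
        mv, mh = np.full(nv, 0.5), np.full(nh, 0.5)
        for _ in range(iters):
            var_h = mh - mh**2
            mv_new = sigmoid(a + W @ mh - (mv - 0.5) * (W**2 @ var_h))
            var_v = mv_new - mv_new**2
            mh_new = sigmoid(b + W.T @ mv_new - (mh - 0.5) * ((W**2).T @ var_v))
            mv = damping * mv + (1 - damping) * mv_new
            mh = damping * mh + (1 - damping) * mh_new
        return mv, mh

    rng = np.random.default_rng(1)
    W = rng.normal(0, 0.1, (6, 4))          # weak couplings, as TAP assumes
    a, b = rng.normal(0, 0.1, 6), rng.normal(0, 0.1, 4)
    mv, mh = tap_magnetizations(W, a, b)
    print(mv.round(3), mh.round(3))
    ```

    At a fixed point, the magnetizations can be plugged into the TAP free energy,
    which is what makes fully deterministic training and evaluation possible.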

    Arabic Language Sentiment Analysis on Health Services

    Abdulaziz M. Alayba, Vasile Palade, Matthew England, Rahat Iqbal
    Comments: Authors accepted version of submission for ASAR 2017
    Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Social and Information Networks (cs.SI)

    The social media network phenomenon has led to a massive amount of valuable
    data that is available online and easy to access. Many users share images,
    videos, comments, reviews, news and opinions on different social network
    sites, with Twitter being one of the most popular. Data collected from
    Twitter is highly unstructured, and extracting useful information from tweets
    is a challenging task. Twitter has a huge number of Arabic users, who mostly
    post and write their tweets in Arabic. While there has been a lot of research
    on sentiment analysis in English, the amount of research and the number of
    datasets available in Arabic are limited. This paper introduces an Arabic
    dataset of opinions on health services collected from Twitter. The paper
    first details the process of collecting the data from Twitter, and then the
    filtering, pre-processing and annotation of the Arabic text needed to build a
    large sentiment analysis dataset in Arabic. Several machine learning
    algorithms (Naive Bayes, Support Vector Machines and Logistic Regression),
    alongside deep and convolutional neural networks, were used in our sentiment
    analysis experiments on this health dataset.
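    As an illustration of one of the classifiers listed, here is a minimal
    multinomial Naive Bayes with add-one smoothing. The toy English snippets stand
    in for the tweets; the actual Arabic data and the paper's feature pipeline are
    not reproduced.

    ```python
    from collections import Counter
    import math

    def train_nb(docs, labels):
        """Count class priors and per-class word frequencies."""
        classes = set(labels)
        word_counts = {c: Counter() for c in classes}
        class_counts = Counter(labels)
        vocab = set()
        for doc, lab in zip(docs, labels):
            toks = doc.split()
            word_counts[lab].update(toks)
            vocab.update(toks)
        return classes, word_counts, class_counts, vocab, len(docs)

    def predict_nb(model, doc):
        """Pick the class maximizing log P(c) + sum log P(token|c), Laplace-smoothed."""
        classes, wc, cc, vocab, n = model
        best, best_lp = None, -math.inf
        for c in classes:
            lp = math.log(cc[c] / n)
            total = sum(wc[c].values())
            for tok in doc.split():
                lp += math.log((wc[c][tok] + 1) / (total + len(vocab)))
            if lp > best_lp:
                best, best_lp = c, lp
        return best

    docs = ["great service fast staff", "very helpful clinic",
            "long wait bad service", "rude staff terrible wait"]
    labels = ["pos", "pos", "neg", "neg"]
    model = train_nb(docs, labels)
    print(predict_nb(model, "helpful fast service"))   # prints "pos"
    print(predict_nb(model, "terrible long wait"))     # prints "neg"
    ```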

    Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights

    Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu, Yurong Chen
    Comments: Accepted as a conference track paper by ICLR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    This paper presents incremental network quantization (INQ), a novel method
    for efficiently converting any pre-trained full-precision convolutional
    neural network (CNN) model into a low-precision version whose weights are
    constrained to be either powers of two or zero. Unlike existing methods,
    which struggle with noticeable accuracy loss, INQ has the potential to
    resolve this issue thanks to two innovations. On the one hand, we introduce
    three interdependent operations: weight partition, group-wise quantization
    and re-training. A well-proven measure is employed to divide the weights in
    each layer of a pre-trained CNN model into two disjoint groups. The weights
    in the first group form a low-precision base and are therefore quantized by a
    variable-length encoding method. The weights in the other group compensate
    for the accuracy loss from quantization and are therefore the ones to be
    re-trained. On the other hand, these three operations are repeated
    iteratively on the latest re-trained group until all the weights are
    converted to low precision, acting as an incremental network quantization and
    accuracy enhancement procedure. Extensive experiments on the ImageNet
    classification task with almost all well-known deep CNN architectures,
    including AlexNet, VGG-16, GoogleNet and ResNets, attest to the efficacy of
    the proposed method. Specifically, at 5-bit quantization, our models achieve
    better accuracy than their 32-bit floating-point references. Taking ResNet-18
    as an example, we further show that our quantized models with 4-bit, 3-bit
    and 2-bit ternary weights achieve better or very similar accuracy compared
    with the 32-bit floating-point baseline. Impressive results from combining
    network pruning with INQ are also reported. The code will be made publicly
    available.
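    A bare-bones sketch of one weight-partition-and-quantize step follows. The
    exponent range and the 50/50 magnitude split are illustrative choices; the
    paper derives the exponent bounds from the bit width and the weight
    distribution.

    ```python
    import numpy as np

    def quantize_pow2(w, exp_min=-6, exp_max=0):
        """Project each weight onto {0, +/-2^e : exp_min <= e <= exp_max}.
        A sketch of INQ's power-of-two quantizer with a hypothetical range."""
        sign = np.sign(w)
        mag = np.abs(w)
        exps = np.clip(np.round(np.log2(np.maximum(mag, 1e-12))), exp_min, exp_max)
        q = sign * 2.0 ** exps
        q[mag < 2.0 ** (exp_min - 1)] = 0.0   # weights too small become zero
        return q

    def inq_step(w, frac=0.5):
        """One INQ iteration: partition by magnitude, quantize the larger-magnitude
        group (then frozen), leave the rest full-precision for re-training."""
        thresh = np.quantile(np.abs(w), 1 - frac)
        mask = np.abs(w) >= thresh            # group quantized in this step
        out = w.copy()
        out[mask] = quantize_pow2(w[mask])
        return out, mask

    rng = np.random.default_rng(2)
    w = rng.normal(0, 0.1, 1000)
    w_q, mask = inq_step(w)
    vals = np.abs(w_q[mask])
    nz = vals[vals > 0]
    # every quantized nonzero magnitude is an exact power of two:
    print(np.allclose(np.log2(nz), np.round(np.log2(nz))))
    ```

    In the full procedure the unmasked weights would now be re-trained, and the
    partition/quantize/re-train cycle repeated until every weight is low-precision.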


    Computer Vision and Pattern Recognition

    Dual-Tree Wavelet Scattering Network with Parametric Log Transformation for Object Classification

    Amarjot Singh, Nick Kingsbury
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We introduce a ScatterNet that uses a parametric log transformation with
    Dual-Tree complex wavelets to extract translation invariant representations
    from a multi-resolution image. The parametric transformation aids the OLS
    pruning algorithm by converting the skewed distributions into relatively
    mean-symmetric distributions while the Dual-Tree wavelets improve the
    computational efficiency of the network. The proposed network is shown to
    outperform Mallat’s ScatterNet on two image datasets, both for classification
    accuracy and computational efficiency. The advantages of the proposed network
    over other supervised and some unsupervised methods are also presented using
    experiments performed on different training dataset sizes.
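    The role of the parametric log transformation can be illustrated on synthetic
    skewed data: applying y = log(x + k), with k a parameter (learned per scale in
    the network; the value below is arbitrary), pulls a right-skewed magnitude
    distribution toward symmetry, which is what aids the OLS pruning step.

    ```python
    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.lognormal(0, 1, 10000)       # heavily right-skewed magnitudes

    def skewness(v):
        """Standard third-moment skewness estimate."""
        v = v - v.mean()
        return (v**3).mean() / (v**2).mean()**1.5

    k = 1.0                              # illustrative parameter value
    y = np.log(x + k)                    # parametric log transformation
    print(abs(skewness(y)) < abs(skewness(x)))   # skew is reduced
    ```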

    A clustering approach to heterogeneous change detection

    Luigi Tommaso Luppino, Stian Normann Anfinsen, Gabriele Moser, Robert Jenssen, Filippo Maria Bianchi, Sebastiano Serpico, Gregoire Mercier
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Change detection in heterogeneous multitemporal satellite images is a
    challenging and still little-studied topic in remote sensing and earth
    observation. This paper focuses on the comparison of image pairs covering the
    same geographical area, acquired at two different times by two different
    sensors: an optical radiometer and a synthetic aperture radar. We propose a
    clustering-based technique that detects changes as clusters that split or
    merge across the two images. To evaluate the potential and limitations of our
    method, we perform experiments on real data. Preliminary results confirm the
    relationship between splits and merges of clusters and the occurrence of
    changes. However, it becomes evident that it is necessary to incorporate prior,
    ancillary, or application-specific information to improve the interpretation of
    clustering results and to identify unambiguously the areas of change.
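    The split-or-merge idea can be sketched on synthetic 1-D data: a change makes
    part of one time-1 cluster move, so that cluster's pixels land in more than one
    time-2 cluster. This is a toy k-means illustration, not the paper's actual
    procedure.

    ```python
    import numpy as np

    rng = np.random.default_rng(5)

    def kmeans1d(x, k, iters=50):
        """Tiny 1-D k-means; centers initialized at interior quantiles."""
        centers = np.quantile(x, (np.arange(k) + 0.5) / k)
        for _ in range(iters):
            lab = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
            centers = np.array([x[lab == j].mean() if np.any(lab == j)
                                else centers[j] for j in range(k)])
        return lab

    # Two co-registered "images" of the same area (synthetic stand-ins for the
    # optical and SAR acquisitions at times t1 and t2).
    t1 = np.concatenate([rng.normal(0.2, 0.02, 60), rng.normal(0.8, 0.02, 60)])
    t2 = t1.copy()
    t2[:30] += 0.3            # a change: part of the low cluster moves to ~0.5

    lab1 = kmeans1d(t1, 2)
    lab2 = kmeans1d(t2, 3)
    changed = lab1[0]         # the t1 cluster containing the changed pixels
    # That cluster is split across two t2 clusters -> evidence of change:
    print(len(set(lab2[lab1 == changed])) > 1)
    ```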

    Texture Characterization by Using Shape Co-occurrence Patterns

    Gui-Song Xia, Gang Liu, Xiang Bai, Liangpei Zhang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Texture characterization is a key problem in image understanding and pattern
    recognition. In this paper, we present a flexible shape-based texture
    representation using shape co-occurrence patterns. More precisely, texture
    images are first represented by a tree of shapes, each of which is associated
    with several geometrical and radiometric attributes. Then, four typical kinds
    of shape co-occurrence patterns, based on the hierarchical relationships of
    the shapes in the tree, are learned as codewords. Three different coding
    methods are investigated to learn the codewords, with which any given texture
    image can be encoded into a descriptive vector. In contrast to existing
    works, the proposed method not only inherits from shape-based methods the
    strong ability to depict the geometrical aspects of textures and high
    robustness to variations in imaging conditions, but also provides a flexible
    way to consider shape relationships and to compute high-order statistics on
    the tree. To our knowledge, this is the first use of co-occurrence patterns
    of explicit shapes as a tool for texture analysis. Experiments on various
    texture and scene datasets demonstrate the efficiency of the proposed method.
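    The codeword idea can be illustrated with a toy tree of shapes whose nodes
    carry attribute labels; one natural pattern type counts parent-child attribute
    pairs along the hierarchy. The tree and labels below are invented for
    illustration and stand in for the paper's geometrical/radiometric attributes.

    ```python
    from collections import Counter

    # Toy tree of shapes: parent -> children, plus an attribute label per node.
    tree = {
        "root": ["A", "B"],
        "A": ["A1", "A2"],
        "B": ["B1"],
    }
    attr = {"root": "large", "A": "round", "B": "elongated",
            "A1": "small", "A2": "small", "B1": "small"}

    # Count (parent_attribute, child_attribute) co-occurrences; in the paper,
    # such patterns are clustered into codewords for encoding texture images.
    pairs = Counter()
    for parent, children in tree.items():
        for c in children:
            pairs[(attr[parent], attr[c])] += 1
    print(pairs[("round", "small")])   # prints 2
    ```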

    Graph Fourier Transform with Negative Edges for Depth Image Coding

    Weng-Tai Su, Gene Cheung, Chia-Wen Lin
    Comments: 5 pages, submitted to IEEE International Conference on Image Processing, Beijing, China, September, 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The recent advent of graph signal processing (GSP) has led to the development
    of new graph-based transforms and wavelets for image/video coding, where the
    underlying graph describes inter-pixel correlations. In this paper, we
    develop a new transform called the signed graph Fourier transform (SGFT),
    where the underlying graph G contains negative edges that describe
    anti-correlations between pixel pairs. Specifically, we first construct a
    one-state Markov process that models both inter-pixel correlations and
    anti-correlations. We then derive the corresponding precision matrix, and
    show that the loopy graph Laplacian matrix Q of a graph G with a negative
    edge and two self-loops at its end nodes is approximately equivalent to it.
    This proves that the eigenvectors of Q, which define the SGFT, approximate
    the optimal Karhunen-Loève Transform (KLT). We show the importance of the
    self-loops in G for ensuring that Q is positive semi-definite. We prove that
    the first eigenvector of Q is piecewise constant (PWC), and can thus well
    approximate a piecewise smooth (PWS) signal such as a depth image.
    Experimental results show that a block-based coding scheme based on SGFT
    outperforms a previous scheme using graph transforms with only positive edges
    for several depth images.
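    The construction can be sketched numerically: build a small path graph with
    one negative edge, add self-loops at that edge's end nodes, and check that the
    resulting loopy Laplacian Q is positive semi-definite. The edge and loop
    weights here are illustrative, not the paper's derived values.

    ```python
    import numpy as np

    # Path graph on 4 nodes with one negative edge (anti-correlation)
    # between nodes 1 and 2.
    n = 4
    W = np.zeros((n, n))
    W[0, 1] = W[1, 0] = 1.0
    W[1, 2] = W[2, 1] = -1.0    # the negative edge
    W[2, 3] = W[3, 2] = 1.0

    D = np.diag(np.abs(W).sum(axis=1))   # degrees from |weights|
    Q = D - W                            # signed graph Laplacian
    Q[1, 1] += 2.0                       # self-loops at the negative
    Q[2, 2] += 2.0                       # edge's end nodes

    eigvals, eigvecs = np.linalg.eigh(Q) # columns of eigvecs form the SGFT basis
    print(eigvals.min() >= -1e-8)        # Q is positive semi-definite
    ```

    Projecting a depth-image block onto the columns of `eigvecs` plays the role
    the DCT or KLT plays in a conventional block-transform coder.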

    Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights

    Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu, Yurong Chen
    Comments: Accepted as a conference track paper by ICLR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    (Abstract listed above under Neural and Evolutionary Computing.)

    Reconstruction for Feature Disentanglement in Pose-invariant Face Recognition

    Xi Peng, Xiang Yu, Kihyuk Sohn, Dimitris Metaxas, Manmohan Chandraker
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep neural networks (DNNs) trained on large-scale datasets have recently
    achieved impressive improvements in face recognition. But a persistent
    challenge remains to develop methods capable of handling large pose variations
    that are relatively under-represented in training data. This paper presents a
    method for learning a feature representation that is invariant to pose, without
    requiring extensive pose coverage in training data. We first propose to use a
    synthesis network for generating non-frontal views from a single frontal image,
    in order to increase the diversity of training data while preserving accurate
    facial details that are critical for identity discrimination. Our next
    contribution is a multi-source multi-task DNN that seeks a rich embedding
    representing identity information, as well as information such as pose and
    landmark locations. Finally, we propose a Siamese network to explicitly
    disentangle identity and pose, by demanding alignment between the feature
    reconstructions through various combinations of identity and pose features
    obtained from two images of the same subject. Experiments on face datasets in
    both controlled and wild scenarios, such as MultiPIE, LFW and 300WLP, show that
    our method consistently outperforms the state-of-the-art, especially on images
    with large head pose variations.

    A New Rank Constraint on Multi-view Fundamental Matrices, and its Application to Camera Location Recovery

    Soumyadip Sengupta, Tal Amir, Meirav Galun, Tom Goldstein, David W. Jacobs, Amit Singer, Ronen Basri
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Accurate estimation of camera matrices is an important step in structure from
    motion algorithms. In this paper we introduce a novel rank constraint on
    collections of fundamental matrices in multi-view settings. We show that in
    general, with the selection of proper scale factors, a matrix formed by
    stacking fundamental matrices between pairs of images has rank 6. Moreover,
    this matrix forms the symmetric part of a rank 3 matrix whose factors relate
    directly to the corresponding camera matrices. We use this new characterization
    to produce better estimates of fundamental matrices by optimizing an L1 cost
    function using Iteratively Re-weighted Least Squares and the Alternating
    Direction Method of Multipliers (ADMM). We further show that this procedure can improve the
    recovery of camera locations, particularly in multi-view settings in which
    fewer images are available.

    A large comparison of feature-based approaches for buried target classification in forward-looking ground-penetrating radar

    Joseph A. Camilo, Leslie M. Collins, Jordan M. Malof
    Comments: 11 pages, 14 figures, for submission to IEEE TGARS
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Forward-looking ground-penetrating radar (FLGPR) has recently been
    investigated as a remote sensing modality for buried target detection (e.g.,
    landmines). In this context, raw FLGPR data is beamformed into images and then
    computerized algorithms are applied to automatically detect subsurface buried
    targets. Most existing algorithms are supervised, meaning they are trained to
    discriminate between labeled target and non-target imagery, usually based on
    features extracted from the imagery. A large number of features have been
    proposed for this purpose; however, it is thus far unclear which are the most
    effective. The first goal of this work is to provide a comprehensive
    comparison of detection performance using existing features on a large
    collection of FLGPR data. Fusion of the decisions resulting from processing
    each feature is also considered. The second goal of this work is to
    investigate two modern feature learning approaches from the object
    recognition literature, the bag-of-visual-words and the Fisher vector, for
    FLGPR processing. The results indicate that the new feature learning
    approaches outperform existing methods. Results also show that fusing
    existing features with the new features yields little additional performance
    improvement.


    Artificial Intelligence

    Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning

    Jason D. Williams, Kavosh Asadi, Geoffrey Zweig
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    End-to-end learning of recurrent neural networks (RNNs) is an attractive
    solution for dialog systems; however, current techniques are data-intensive and
    require thousands of dialogs to learn simple behaviors. We introduce Hybrid
    Code Networks (HCNs), which combine an RNN with domain-specific knowledge
    encoded as software and system action templates. Compared to existing
    end-to-end approaches, HCNs considerably reduce the amount of training data
    required, while retaining the key benefit of inferring a latent representation
    of dialog state. In addition, HCNs can be optimized with supervised learning,
    reinforcement learning, or a mixture of both. HCNs attain state-of-the-art
    performance on the bAbI dialog dataset, and outperform two commercially
    deployed customer-facing dialog systems.
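    The interplay of learned scores and hand-coded domain knowledge can be
    sketched as an action mask applied to a distribution over action templates.
    The template names, the mask, and the random stand-in for trained RNN weights
    below are all hypothetical.

    ```python
    import numpy as np

    rng = np.random.default_rng(6)

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    # Hypothetical action templates; the developer's code supplies a mask that
    # forbids actions that make no sense in the current dialog state (e.g. no
    # order confirmation before an order exists).
    templates = ["greet", "ask_item", "confirm_order", "end_call"]
    W = rng.normal(0, 0.5, (4, 8))       # stand-in for a trained output layer
    state = rng.normal(0, 1, 8)          # stand-in for the RNN hidden state

    mask = np.array([1, 1, 0, 1], float) # "confirm_order" not yet allowed
    probs = softmax(W @ state) * mask    # learned scores gated by domain code
    probs /= probs.sum()
    print(probs[templates.index("confirm_order")] == 0.0)   # True
    ```

    Masking shrinks the space the learner must explore, which is one way
    domain-specific software can reduce the training data an end-to-end model needs.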

    Modeling Semantic Expectation: Using Script Knowledge for Referent Prediction

    Ashutosh Modi, Ivan Titov, Vera Demberg, Asad Sayeed, Manfred Pinkal
    Comments: 14 pages, published at TACL, 2017, Volume-5, Pg 31-44, 2017
    Journal-ref: Transactions of ACL, Volume-5, Pg 31-44 (2017)
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Recent research in psycholinguistics has provided increasing evidence that
    humans predict upcoming content. Prediction also affects perception and might
    be a key to robustness in human language processing. In this paper, we
    investigate the factors that affect human prediction by building a
    computational model that can predict upcoming discourse referents based on
    linguistic knowledge alone vs. linguistic knowledge jointly with common-sense
    knowledge in the form of scripts. We find that script knowledge significantly
    improves model estimates of human predictions. In a second study, we test the
    highly controversial hypothesis that predictability influences referring
    expression type but do not find evidence for such an effect.

    Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights

    Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu, Yurong Chen
    Comments: Accepted as a conference track paper by ICLR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    (Abstract listed above under Neural and Evolutionary Computing.)

    Multi-agent Reinforcement Learning in Sequential Social Dilemmas

    Joel Z. Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, Thore Graepel
    Comments: 10 pages, 7 figures
    Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Learning (cs.LG)

    Matrix games like Prisoner’s Dilemma have guided research on social dilemmas
    for decades. However, they necessarily treat the choice to cooperate or defect
    as an atomic action. In real-world social dilemmas these choices are temporally
    extended. Cooperativeness is a property that applies to policies, not
    elementary actions. We introduce sequential social dilemmas that share the
    mixed incentive structure of matrix game social dilemmas but also require
    agents to learn policies that implement their strategic intentions. We analyze
    the dynamics of policies learned by multiple self-interested independent
    learning agents, each using its own deep Q-network, on two Markov games we
    introduce here: 1. a fruit Gathering game and 2. a Wolfpack hunting game. We
    characterize how learned behavior in each domain changes as a function of
    environmental factors including resource abundance. Our experiments show how
    conflict can emerge from competition over shared resources and shed light on
    how the sequential nature of real world social dilemmas affects cooperation.
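    For reference, the atomic-action structure being generalized is the classic
    matrix game: with standard Prisoner's Dilemma payoffs (T > R > P > S),
    defection dominates the one-shot game, whereas in a sequential social dilemma
    "cooperativeness" becomes a property of a whole learned policy.

    ```python
    # Row player's payoffs with textbook values R=3 (mutual cooperation),
    # S=0 (sucker), T=5 (temptation), P=1 (mutual defection).
    payoff = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

    # Defection strictly dominates cooperation in the one-shot matrix game:
    dominates = (payoff[("D", "C")] > payoff[("C", "C")] and
                 payoff[("D", "D")] > payoff[("C", "D")])
    print(dominates)   # True
    ```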

    Learning Semantic Script Knowledge with Event Embeddings

    Ashutosh Modi, Ivan Titov
    Comments: 4 Pages, 1 figure, ICLR Workshop
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)

    Induction of common sense knowledge about prototypical sequences of events
    has recently received much attention. Instead of inducing this knowledge in the
    form of graphs, as in much of the previous work, in our method, distributed
    representations of event realizations are computed based on distributed
    representations of predicates and their arguments, and then these
    representations are used to predict prototypical event orderings. The
    parameters of the compositional process for computing the event representations
    and the ranking component of the model are jointly estimated from texts. We
    show that this approach results in a substantial boost in ordering performance
    with respect to previous methods.
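    The compositional step can be sketched as follows: an event vector is computed
    from predicate and argument embeddings via composition matrices and a
    nonlinearity, and a scoring direction ranks events for their prototypical
    order. All vectors and matrices below are random placeholders for parameters
    that the paper estimates jointly from text.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    d = 8
    # Placeholder word vectors for predicates and arguments.
    vocab = {w: rng.normal(0, 1, d) for w in
             ["enter", "order", "eat", "pay", "customer", "food", "bill"]}

    A = rng.normal(0, 0.3, (d, d))   # predicate composition matrix (random here;
    B = rng.normal(0, 0.3, (d, d))   # learned jointly with the ranker in the paper)
    u = rng.normal(0, 1, d)          # ranking direction: score = u . event

    def event_vec(pred, arg):
        """Compose predicate and argument embeddings into one event vector."""
        return np.tanh(A @ vocab[pred] + B @ vocab[arg])

    script = [("enter", "customer"), ("order", "food"),
              ("eat", "food"), ("pay", "bill")]
    scores = [u @ event_vec(p, a) for p, a in script]
    # Training would push these scores to increase along the prototypical order.
    print(len(scores) == 4)
    ```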


    Information Retrieval

    Mining Electronic Health Records: A Survey

    Pranjul Yadav, Michael Steinbach, Vipin Kumar, Gyorgy Simon
    Subjects: Information Retrieval (cs.IR)

    The continuously increasing cost of the US healthcare system has received
    significant attention. Central to the ideas aimed at curbing this trend is the
    use of technology, in the form of the mandate to implement electronic health
    records (EHRs). EHRs consist of patient information such as demographics,
    medications, laboratory test results, diagnosis codes and procedures. Mining
    EHRs could lead to improvement in patient health management as EHRs contain
    detailed information related to disease prognosis for large patient
    populations. In this manuscript, we provide a structured and comprehensive
    overview of data mining techniques for modeling EHR data. We first provide a
    detailed understanding of the major application areas to which EHR mining has
    been applied and then discuss the nature of EHR data and its accompanying
    challenges. Next, we describe major approaches used for EHR mining, the metrics
    associated with EHRs, and the various study designs. With this foundation, we
    then provide a systematic and methodological organization of existing data
    mining techniques used to model EHRs and discuss ideas for future research. We
    conclude this survey with a comprehensive summary of clinical data mining
    applications of EHR data, as illustrated in the online supplement.


    Computation and Language

    Arabic Language Sentiment Analysis on Health Services

    Abdulaziz M. Alayba, Vasile Palade, Matthew England, Rahat Iqbal
    Comments: Authors accepted version of submission for ASAR 2017
    Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Social and Information Networks (cs.SI)

    (Abstract listed above under Neural and Evolutionary Computing.)

    Universal Semantic Parsing

    Siva Reddy, Oscar Täckström, Slav Petrov, Mark Steedman, Mirella Lapata
    Comments: 15 pages with supplementary
    Subjects: Computation and Language (cs.CL)

    Universal Dependencies (UD) provides a cross-linguistically uniform syntactic
    representation, with the aim of advancing multilingual applications of parsing
    and natural language understanding. Reddy et al. (2016) recently developed a
    semantic interface for (English) Stanford Dependencies, based on the lambda
    calculus. In this work, we introduce UDepLambda, a similar semantic interface
    for UD, which allows mapping natural language to logical forms in an almost
    language-independent framework. We evaluate our approach on semantic parsing
    for the task of question answering against Freebase. To facilitate multilingual
    evaluation, we provide German and Spanish translations of the WebQuestions and
    GraphQuestions datasets. Results show that UDepLambda outperforms strong
    baselines across languages and datasets. For English, it achieves the strongest
    result to date on GraphQuestions, with competitive results on WebQuestions.

    Modeling Semantic Expectation: Using Script Knowledge for Referent Prediction

    Ashutosh Modi, Ivan Titov, Vera Demberg, Asad Sayeed, Manfred Pinkal
    Comments: 14 pages, published at TACL, 2017, Volume-5, Pg 31-44, 2017
    Journal-ref: Transactions of ACL, Volume-5, Pg 31-44 (2017)
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    (Abstract listed above under Artificial Intelligence.)

    Using Word Embedding for Cross-Language Plagiarism Detection

    J. Ferrero, F. Agnes, L. Besacier, D. Schwab
    Comments: Accepted to EACL 2017 (short)
    Subjects: Computation and Language (cs.CL)

    This paper proposes to use distributed representation of words (word
    embeddings) in cross-language textual similarity detection. The main
    contributions of this paper are the following: (a) we introduce new
    cross-language similarity detection methods based on distributed representation
    of words; (b) we combine the different methods proposed to verify their
    complementarity and finally obtain an overall F1 score of 89.15% for
    English-French similarity detection at chunk level (88.5% at sentence level) on
    a very challenging corpus.
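
    As a rough illustration of the first contribution, cross-language similarity
    from averaged word embeddings reduces to a cosine score. The tiny bilingual
    vectors below are invented for the example and stand in for a trained
    cross-lingual embedding model:

```python
import numpy as np

# Toy cross-lingual "embeddings" (invented for illustration): in a shared
# space, a word and its translation get nearby vectors.
EMB = {
    "cat": np.array([0.9, 0.1, 0.0]),  "chat":  np.array([0.88, 0.12, 0.0]),
    "dog": np.array([0.1, 0.9, 0.0]),  "chien": np.array([0.12, 0.88, 0.0]),
}

def chunk_vector(words):
    """Average the embeddings of the words in a chunk."""
    return np.mean([EMB[w] for w in words], axis=0)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def similarity(chunk_a, chunk_b):
    """Cross-language similarity of two chunks via averaged embeddings."""
    return cosine(chunk_vector(chunk_a), chunk_vector(chunk_b))
```

    With shared-space embeddings, translated chunks ("cat"/"chat") score near 1
    while unrelated pairs score much lower.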

    Local System Voting Feature for Machine Translation System Combination

    Markus Freitag, Jan-Thorsten Peter, Stephan Peitz, Minwei Feng, Hermann Ney
    Comments: published WMT 2015
    Subjects: Computation and Language (cs.CL)

    In this paper, we enhance the traditional confusion network system
    combination approach with an additional model trained by a neural network. This
    work is motivated by the fact that the commonly used binary system voting
    models only assign each input system a global weight which is responsible for
    the global impact of each input system on all translations. This prevents
    individual systems with low system weights from having influence on the system
    combination output, although in some situations this could be helpful. Further,
    words which have only been seen by one or a few systems rarely have a chance of
    being present in the combined output. We train a local system voting model by a
    neural network which is based on the words themselves and the combinatorial
    occurrences of the different system outputs. This gives system combination the
    option to prefer other systems at different word positions even for the same
    sentence.

    Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning

    Jason D. Williams, Kavosh Asadi, Geoffrey Zweig
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    End-to-end learning of recurrent neural networks (RNNs) is an attractive
    solution for dialog systems; however, current techniques are data-intensive and
    require thousands of dialogs to learn simple behaviors. We introduce Hybrid
    Code Networks (HCNs), which combine an RNN with domain-specific knowledge
    encoded as software and system action templates. Compared to existing
    end-to-end approaches, HCNs considerably reduce the amount of training data
    required, while retaining the key benefit of inferring a latent representation
    of dialog state. In addition, HCNs can be optimized with supervised learning,
    reinforcement learning, or a mixture of both. HCNs attain state-of-the-art
    performance on the bAbI dialog dataset, and outperform two commercially
    deployed customer-facing dialog systems.


    Distributed, Parallel, and Cluster Computing

    D4M 3.0

    Lauren Milechin, Alexander Chen, Vijay Gadepally, Dylan Hutchison, Siddharth Samsi, Jeremy Kepner
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB)

    The D4M tool is used by hundreds of researchers to perform complex analytics
    on unstructured data. Over the past few years, the D4M toolbox has evolved to
    support connectivity with a variety of database engines, graph analytics in the
    Apache Accumulo database, and an implementation using the Julia programming
    language. In this article, we describe some of our latest additions to the D4M
    toolbox and our upcoming D4M 3.0 release.

    Improving the Performance of Fully Connected Neural Networks by Out-of-Place Matrix Transpose

    Shaohuai Shi, Pengfei Xu, Xiaowen Chu
    Comments: 5 pages
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Fully connected networks are widely used in deep learning, and their
    computation efficiency benefits greatly from the cuBLAS matrix multiplication
    routines on GPU. However, we found that cuBLAS has some drawbacks in
    calculating matrix A multiplied by the transpose of matrix B (i.e., the NT
    operation). To reduce the impact of the NT operation in cuBLAS, we exploit
    the out-of-place transpose of matrix B to avoid the NT operation, and then
    we apply our method to Caffe, a popular deep learning tool. Our contribution
    is two-fold. First, we propose a naive method (TNN) and a model-based method
    (MTNN) to increase the performance of calculating A×B^T, and it
    achieves about 4.7 times performance enhancement in our tested cases on GTX1080
    card. Second, we integrate MTNN method into Caffe to enhance the efficiency in
    training fully connected networks, which achieves about 70% speedup compared to
    the original Caffe in our configured fully connected networks on GTX1080 card.
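
    The core trick can be sketched in NumPy as a stand-in for the cuBLAS calls
    (illustrative only; on GPUs the plain NN kernel can be faster than NT, which
    is what the paper exploits, while both paths give the same result):

```python
import numpy as np

def matmul_nt(A, B):
    """Direct A times B-transpose (the "NT" operation)."""
    return A @ B.T

def matmul_tnn(A, B):
    """Out-of-place transpose of B first, then a plain "NN" multiply."""
    Bt = np.ascontiguousarray(B.T)  # materialize B^T in new memory
    return A @ Bt
```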

    (Leader/Randomization/Signature)-free Byzantine Consensus for Consortium Blockchains

    Tyler Crain, Vincent Gramoli, Mikel Larrea, Michel Raynal
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)

    This paper presents a new Byzantine consensus algorithm targeting consortium
    blockchains. To this end, it first revisits the consensus validity property by
    requiring that the decided value satisfies a predefined predicate, which does
    not systematically exclude a value proposed only by Byzantine processes,
    thereby generalizing the validity properties found in the literature. Then, the
    paper presents a simple and modular Byzantine consensus algorithm that relies
    neither on a leader, nor on signatures, nor on randomization. It features the
    fastest multivalued reduction to binary consensus we know of and a
    time-optimal binary Byzantine consensus algorithm. The multivalued reduction
    runs multiple instances of binary consensus concurrently, which results in a
    bitmask that is
    then applied to a vector of multivalued proposals to filter out a valid
    proposed value that is decided. To ensure eventual decision deterministically,
    the underlying binary consensus algorithm assumes eventual synchrony.
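
    The bitmask filtering step can be sketched as follows (our minimal
    illustration of the reduction's final step, not the authors' full protocol;
    the smallest-index rule is one deterministic choice):

```python
# One binary consensus instance per proposer yields a bitmask; the mask
# filters the vector of multivalued proposals, and a deterministic rule
# (here: smallest index) picks the decided value.
def decide(proposals, bitmask):
    """proposals: list of values; bitmask: 0/1 binary-consensus outputs
    of the same length, with at least one 1. Returns the decided value."""
    assert len(proposals) == len(bitmask) and any(bitmask)
    for value, bit in zip(proposals, bitmask):
        if bit == 1:
            return value
```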

    Finer-grained Locking in Concurrent Dynamic Planar Convex Hulls

    K. Alex Mills, James Smith
    Comments: 4 pages; 2 figures; brief announcement submitted to SPAA 2017
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    The convex hull of a planar point set is the smallest convex polygon
    containing each point in the set. The dynamic convex hull problem concerns
    efficiently maintaining the convex hull of a set of points subject to additions
    and removals. One algorithm for this problem uses two external balanced binary
    search trees (BSTs) (M. H. Overmars, J. van Leeuwen 1981). We present the first
    concurrent solution for this problem, which uses a single BST that stores
    references to intermediate convex hull solutions at each node. We implement and
    evaluate two lock-based approaches: a) fine-grained locking, where each node of
    the tree is protected by a lock, and b) “finer-grained locking”, where each
    node contains a separate lock for each of the left and right chains. In our
    throughput experiments, we observe that finer-grained locking yields an 8-60%
    improvement over fine-grained locking, and a 38-61x improvement over
    coarse-grained locking and software transactional memory (STM). When applied to
    find the convex hull of static point sets, our approach outperforms a parallel
    divide-and-conquer implementation by 2-4x using an equivalent number of
    threads.
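
    For reference, the sequential object being maintained, the planar convex
    hull with its lower and upper chains, can be computed for a static point set
    with Andrew's monotone chain (a standard algorithm, shown here as context,
    not the paper's concurrent data structure):

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0]-o[0]) * (b[1]-o[1]) - (a[1]-o[1]) * (b[0]-o[0])

    lower, upper = [], []
    for p in pts:                      # build the lower chain
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):            # build the upper chain
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]     # drop duplicated endpoints
```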

    Elastic Resource Management with Adaptive State Space Partitioning of Markov Decision Processes

    Konstantinos Lolos, Ioannis Konstantinou, Verena Kantere, Nectarios Koziris
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Modern large-scale computing deployments consist of complex applications
    running over machine clusters. An important issue in these is the offering of
    elasticity, i.e., the dynamic allocation of resources to applications to meet
    fluctuating workload demands. Threshold based approaches are typically
    employed, yet they are difficult to configure and optimize. Approaches based on
    reinforcement learning have been proposed, but they require a large number of
    states in order to model complex application behavior. Methods that adaptively
    partition the state space have been proposed, but their partitioning criteria
    and strategies are sub-optimal. In this work we present MDP_DT, a novel
    full-model based reinforcement learning algorithm for elastic resource
    management that employs adaptive state space partitioning. We propose two novel
    statistical criteria and three strategies and we experimentally prove that they
    correctly decide both where and when to partition, outperforming existing
    approaches. We experimentally evaluate MDP_DT in a real large-scale cluster
    over variable, previously unseen workloads and we show that it takes more informed
    decisions compared to static and model-free approaches, while requiring a
    minimal amount of training data.

    Comparative benchmarking of cloud computing vendors with High Performance Linpack

    Mohammad Mohammadi, Timur Bazhirov
    Subjects: Performance (cs.PF); Distributed, Parallel, and Cluster Computing (cs.DC)

    We present a comparative analysis of the maximum performance achieved by the
    Linpack benchmark on compute intensive hardware publicly available from
    multiple cloud providers. We study both performance within a single compute
    node, and speedup for distributed memory calculations with up to 32 nodes or at
    least 512 computing cores. We distinguish between hyper-threaded and
    non-hyper-threaded scenarios and estimate the performance per single computing
    core. We also compare results with a traditional supercomputing system for
    reference. Our findings provide a way to rank the cloud providers and
    demonstrate the viability of the cloud for high performance computing
    applications.


    Learning

    Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models

    Sergey Ioffe
    Subjects: Learning (cs.LG)

    Batch Normalization is quite effective at accelerating and improving the
    training of deep models. However, its effectiveness diminishes when the
    training minibatches are small, or do not consist of independent samples. We
    hypothesize that this is due to the dependence of model layer inputs on all the
    examples in the minibatch, and different activations being produced between
    training and inference. We propose Batch Renormalization, a simple and
    effective extension to ensure that the training and inference models generate
    the same outputs that depend on individual examples rather than the entire
    minibatch. Models trained with Batch Renormalization perform substantially
    better than batchnorm when training with small or non-i.i.d. minibatches. At
    the same time, Batch Renormalization retains the benefits of batchnorm such as
    insensitivity to initialization and training efficiency.
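
    A minimal sketch of the per-feature correction, following the standard batch
    renormalization recipe: clipped factors r and d computed from moving
    averages mu and sigma and treated as constants during backpropagation. The
    clipping bounds below are illustrative defaults:

```python
import numpy as np

def batch_renorm(x, mu, sigma, r_max=3.0, d_max=5.0, eps=1e-5):
    """Batch renormalization forward pass (rows of x are a minibatch).
    mu, sigma are moving averages of the per-feature mean and std."""
    mu_b = x.mean(axis=0)
    sigma_b = x.std(axis=0) + eps
    # r and d correct the minibatch statistics toward the moving averages;
    # in training they are treated as constants (no gradient flows through).
    r = np.clip(sigma_b / sigma, 1.0 / r_max, r_max)
    d = np.clip((mu_b - mu) / sigma, -d_max, d_max)
    return (x - mu_b) / sigma_b * r + d
```

    When the clipping is inactive, the output equals (x - mu) / sigma, which is
    exactly the inference-time transform, so training and inference agree.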

    A Deterministic and Generalized Framework for Unsupervised Learning with Restricted Boltzmann Machines

    Eric W. Tramel, Marylou Gabrié, Andre Manoel, Francesco Caltagirone, Florent Krzakala
    Subjects: Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Restricted Boltzmann machines (RBMs) are energy-based neural networks which
    are commonly used as building blocks for deep neural architectures. In this
    work, we derive a deterministic framework for the
    training, evaluation, and use of RBMs based upon the Thouless-Anderson-Palmer
    (TAP) mean-field approximation of widely-connected systems with weak
    interactions coming from spin-glass theory. While the TAP approach has been
    extensively studied for fully-visible binary spin systems, our construction is
    generalized to latent-variable models, as well as to arbitrarily distributed
    real-valued spin systems with bounded support. In our numerical experiments, we
    demonstrate the effective deterministic training of our proposed models and are
    able to show interesting features of unsupervised learning which could not be
    directly observed with sampling. Additionally, we demonstrate how to utilize
    our TAP-based framework for leveraging trained RBMs as joint priors in
    denoising problems.

    Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning

    Stefan Elfwing, Eiji Uchibe, Kenji Doya
    Comments: 18 pages, 21 figures
    Subjects: Learning (cs.LG)

    In recent years, neural networks have enjoyed a renaissance as function
    approximators in reinforcement learning. Two decades after Tesauro's TD-Gammon
    achieved near top-level human performance in backgammon, the deep reinforcement
    learning algorithm DQN (combining Q-learning with a deep neural network,
    experience replay, and a separate target network) achieved human-level
    performance in many Atari 2600 games. The purpose of this study is twofold.
    First, based on the expected energy restricted Boltzmann machine (EE-RBM), we
    propose two activation functions for neural network function approximation in
    reinforcement learning: the sigmoid-weighted linear (SiL) unit and its
    derivative function (SiLd1). The activation of the SiL unit is computed by the
    sigmoid function multiplied by its input, which is equal to the contribution to
    the output from one hidden unit in an EE-RBM. Second, we suggest that the more
    traditional approach of using on-policy learning with eligibility traces,
    instead of experience replay, and softmax action selection can be competitive
    with DQN, without the need for a separate target network. We validate our
    proposed approach by, first, achieving new state-of-the-art results in both
    stochastic SZ-Tetris and Tetris with a small 10×10 board, using TD(λ)
    learning and shallow SiLd1 network agents, and then outperforming DQN in the
    Atari 2600 domain by using a deep Sarsa(λ) agent with SiL and SiLd1
    hidden units.
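
    The two proposed activations are simple to state: SiL multiplies the input
    by its sigmoid, and SiLd1 is the derivative of SiL used as an activation in
    its own right:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sil(x):
    """Sigmoid-weighted linear unit: the input times its sigmoid."""
    return x * sigmoid(x)

def sild1(x):
    """Derivative of SiL, itself used as an activation (SiLd1)."""
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))
```

    SiL behaves like a smoothed ReLU: near-linear for large positive inputs and
    close to zero for large negative inputs.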

    Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities

    Ruitong Huang, Tor Lattimore, András György, Csaba Szepesvári
    Subjects: Learning (cs.LG)

    The follow the leader (FTL) algorithm, perhaps the simplest of all online
    learning algorithms, is known to perform well when the loss functions it is
    used on are convex and positively curved. In this paper we ask whether there
    are other “lucky” settings when FTL achieves sublinear, “small” regret. In
    particular, we study the fundamental problem of linear prediction over a
    non-empty convex, compact domain. Amongst other results, we prove that the
    curvature of the boundary of the domain can act as if the losses were curved:
    In this case, we prove that as long as the mean of the loss vectors has
    positive length bounded away from zero, FTL enjoys a logarithmic growth rate
    of regret, while, e.g., for polytope domains and stochastic data it enjoys
    finite expected regret. Building on a previously known meta-algorithm, we also
    get an algorithm that simultaneously enjoys the worst-case guarantees and the
    bound available for FTL.

    Multi-step Off-policy Learning Without Importance Sampling Ratios

    Ashique Rupam Mahmood, Huizhen Yu, Richard S. Sutton
    Comments: 24 pages, 4 figures
    Subjects: Learning (cs.LG)

    To estimate the value functions of policies from exploratory data, most
    model-free off-policy algorithms rely on importance sampling, where the use of
    importance sampling ratios often leads to estimates with severe variance. It is
    thus desirable to learn off-policy without using the ratios. However, such an
    algorithm does not exist for multi-step learning with function approximation.
    In this paper, we introduce the first such algorithm based on
    temporal-difference (TD) learning updates. We show that an explicit use of
    importance sampling ratios can be eliminated by varying the amount of
    bootstrapping in TD updates in an action-dependent manner. Our new algorithm
    achieves stability using a two-timescale gradient-based TD update. A prior
    algorithm based on lookup table representation called Tree Backup can also be
    retrieved using action-dependent bootstrapping, becoming a special case of our
    algorithm. In two challenging off-policy tasks, we demonstrate that our
    algorithm is stable, effectively avoids the large variance issue, and can
    perform substantially better than its state-of-the-art counterpart.
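
    A toy illustration (not the paper's algorithm) of the variance problem being
    avoided: the cumulative importance sampling ratio is a product of per-step
    ratios of target to behavior probabilities, which explodes or vanishes with
    trajectory length. The two policies below are hypothetical:

```python
# Target policy pi and behavior policy mu over a single binary action.
pi = {"left": 0.9, "right": 0.1}   # hypothetical target policy
mu = {"left": 0.5, "right": 0.5}   # hypothetical behavior policy

def trajectory_ratio(actions, pi, mu):
    """Cumulative importance sampling ratio for a sequence of actions:
    the product of pi(a)/mu(a) over the trajectory."""
    rho = 1.0
    for a in actions:
        rho *= pi[a] / mu[a]
    return rho
```

    A ten-step trajectory of "left" actions already gets weight 1.8^10 ≈ 357,
    while ten "right" actions get 0.2^10 ≈ 1e-7; estimates built from such
    weights have severe variance, which motivates eliminating the ratios.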

    Soft tensegrity robots

    John Rieffel, Jean-Baptiste Mouret
    Subjects: Robotics (cs.RO); Learning (cs.LG); Systems and Control (cs.SY)

    Living organisms intertwine soft (e.g., muscle) and hard (e.g., bones)
    materials, giving them an intrinsic flexibility and resiliency often lacking in
    conventional rigid robots. The emerging field of soft robotics seeks to harness
    these same properties in order to create resilient machines. The nature of soft
    materials, however, presents considerable challenges to aspects of design,
    construction, and control — and up until now, the vast majority of gaits for
    soft robots have been hand-designed through empirical trial-and-error. This
    manuscript describes an easy-to-assemble tensegrity-based soft robot capable of
    highly dynamic locomotive gaits and demonstrating structural and behavioral
    resilience in the face of physical damage. Enabling this is the use of a
    machine learning algorithm able to discover novel gaits with a minimal number
    of physical trials. These results lend further credence to soft-robotic
    approaches that seek to harness the interaction of complex material dynamics in
    order to generate a wealth of dynamical behaviors.

    Multi-agent Reinforcement Learning in Sequential Social Dilemmas

    Joel Z. Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, Thore Graepel
    Comments: 10 pages, 7 figures
    Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Learning (cs.LG)

    Matrix games like Prisoner’s Dilemma have guided research on social dilemmas
    for decades. However, they necessarily treat the choice to cooperate or defect
    as an atomic action. In real-world social dilemmas these choices are temporally
    extended. Cooperativeness is a property that applies to policies, not
    elementary actions. We introduce sequential social dilemmas that share the
    mixed incentive structure of matrix game social dilemmas but also require
    agents to learn policies that implement their strategic intentions. We analyze
    the dynamics of policies learned by multiple self-interested independent
    learning agents, each using its own deep Q-network, on two Markov games we
    introduce here: 1. a fruit Gathering game and 2. a Wolfpack hunting game. We
    characterize how learned behavior in each domain changes as a function of
    environmental factors including resource abundance. Our experiments show how
    conflict can emerge from competition over shared resources and shed light on
    how the sequential nature of real world social dilemmas affects cooperation.
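
    For context, the "atomic" matrix-game baseline mentioned above is the
    textbook Prisoner's Dilemma payoff table (values follow the common 3/5/1/0
    convention, with temptation > reward > punishment > sucker), in which
    defection dominates:

```python
# (row payoff, column payoff) for each joint action: C = cooperate, D = defect.
PAYOFF = {
    ("C", "C"): (3, 3),  # mutual cooperation: reward R
    ("C", "D"): (0, 5),  # sucker's payoff S vs temptation T
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),  # mutual defection: punishment P
}

def best_response(opponent_action):
    """Defection dominates: it is the best response to either action."""
    return max("CD", key=lambda a: PAYOFF[(a, opponent_action)][0])
```

    Mutual defection is the unique equilibrium even though mutual cooperation
    pays both players more; the paper's sequential dilemmas keep this mixed
    incentive structure but require temporally extended policies.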

    Fixing an error in Caponnetto and de Vito (2007)

    Dougal J. Sutherland
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Statistics Theory (math.ST)

    The seminal paper of Caponnetto and de Vito (2007) provides minimax-optimal
    rates for kernel ridge regression in a very general setting. Its proof,
    however, contains an error in its bound on the effective dimensionality. In
    this note, we explain the mistake, provide a correct bound, and show that the
    main theorem remains true.


    Information Theory

    Multidimensional Index Modulation in Wireless Communications

    Bharath Shamasundar, Swaroop Jacob, Sandeep Bhat, A. Chockalingam
    Subjects: Information Theory (cs.IT)

    In index modulation schemes, information bits are conveyed through indexing
    of transmission entities such as antennas, subcarriers, time slots, precoders,
    subarrays, and radio frequency (RF) mirrors. Index modulation schemes are
    attractive for their advantages such as good performance, high rates, and
    hardware simplicity. This paper focuses on index modulation schemes in which
    multiple transmission entities, namely, antennas, time slots, and RF mirrors,
    are indexed simultaneously. Recognizing that such
    multidimensional index modulation schemes encourage sparsity in their transmit
    signal vectors, we propose efficient signal detection schemes that use
    compressive sensing based reconstruction algorithms. Results show that, for a
    given rate, improved performance is achieved when the number of indexed
    transmission entities is increased. We also explore indexing opportunities in
    load modulation, which is a modulation scheme that offers power
    efficiency and reduced RF hardware complexity advantages in multiantenna
    systems. Results show that indexing space and time in load modulated
    multiantenna systems can achieve improved performance.
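
    A toy mapper for a single indexed entity (antennas) shows where the sparsity
    exploited by the compressive-sensing detectors comes from; the 4-antenna /
    QPSK numbers are illustrative, not the paper's configuration:

```python
# Gray-style QPSK constellation keyed by two bits (illustrative mapping).
QPSK = {(0, 0): 1 + 1j, (0, 1): -1 + 1j, (1, 1): -1 - 1j, (1, 0): 1 - 1j}

def modulate(bits, n_antennas=4):
    """Spatial-modulation mapper: with 4 antennas, 2 bits select the active
    antenna index and 2 bits select a QPSK symbol, so the transmit vector
    has exactly one nonzero entry."""
    assert len(bits) == 4
    antenna = bits[0] * 2 + bits[1]      # index bits pick the antenna
    symbol = QPSK[(bits[2], bits[3])]    # symbol bits pick the constellation point
    x = [0j] * n_antennas
    x[antenna] = symbol
    return x
```

    Indexing more entities simultaneously (time slots, RF mirrors) multiplies
    the index bits while keeping the transmit vector sparse.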

    Spatial Oversampling in LOS MIMO Systems with 1-Bit Quantization at the Receiver

    Tim Hälsig, Berthold Lankl
    Comments: Accepted at International ITG SCC 2017
    Subjects: Information Theory (cs.IT)

    In this paper we investigate the achievable rate of LOS MIMO systems that use
    1-bit quantization and spatial oversampling at the receiver. We propose that by
    using additional antennas at the receiver, the loss incurred due to the strong
    limitation of the quantization can be decreased. Mutual information results
    show that considerable rate gains can be achieved depending on the number and
    arrangement of the antennas. In some of the cases, even the full available rate
    from the transmitter can be attained. Furthermore, the results also reveal that
    two-dimensional antenna arrays can benefit more from spatial oversampling than
    one-dimensional arrays, when using 1-bit quantization in the LOS MIMO scenario.

    Performance of Cell-Free Massive MIMO Systems with MMSE and LSFD Receivers

    Elina Nayebi, Alexei Ashikhmin, Thomas L. Marzetta, Bhaskar D. Rao
    Subjects: Information Theory (cs.IT)

    Cell-Free Massive MIMO comprises a large number of distributed single-antenna
    access points (APs) serving a much smaller number of users. There is no
    partitioning into cells and each user is served by all APs.

    In this paper, the uplink performance of cell-free systems with minimum mean
    squared error (MMSE) and large scale fading decoding (LSFD) receivers is
    investigated. The main idea of the LSFD receiver is to maximize achievable
    throughput using only large scale fading coefficients between APs and users.
    Capacity lower bounds for MMSE and LSFD receivers are derived. An asymptotic
    approximation for signal-to-interference-plus-noise ratio (SINR) of MMSE
    receiver is derived as a function of large scale fading coefficients only. The
    obtained approximation is accurate even for a small number of antennas. MMSE
    and LSFD receivers demonstrate five-fold and two-fold gains, respectively,
    over the matched filter (MF) receiver in terms of 5%-outage rate.

    Cramér-Rao Lower Bounds for Positioning with Large Intelligent Surfaces

    Sha Hu, Fredrik Rusek, Ove Edfors
    Comments: 5 pages, 7 figures, conference
    Subjects: Information Theory (cs.IT)

    We consider the potential for positioning with a system where antenna arrays
    are deployed as a large intelligent surface (LIS). We derive
    Fisher information and Cramér-Rao lower bounds (CRLB) in closed-form for
    terminals along the central perpendicular line (CPL) of the LIS for all three
    Cartesian dimensions. For terminals at positions other than the CPL,
    closed-form expressions for the Fisher information and CRLBs seem out of
    reach, and we alternatively provide approximations (in closed-form) which are
    shown to be very accurate. We also show that under mild conditions, the CRLBs
    in general decrease quadratically in the surface-area for both the x and y
    dimensions. For the z-dimension (distance from the LIS), the CRLB decreases
    linearly in the surface-area when terminals are along the CPL. However, when
    terminals move away from the CPL, the CRLB is dramatically increased and then
    also decreases quadratically in the surface-area. We also extensively discuss
    the impact of different deployments (centralized and distributed) of the LIS.

    The Potential of Using Large Antenna Arrays on Intelligent Surfaces

    Sha Hu, Fredrik Rusek, Ove Edfors
    Comments: 6 pages, 10 figures,conference
    Subjects: Information Theory (cs.IT)

    In this paper, we consider capacities of single-antenna terminals
    communicating to large antenna arrays that are deployed on surfaces. That is,
    the entire surface is used as an intelligent receiving antenna array. Under the
    condition that the surface area is sufficiently large, the received signal
    after matched-filtering (MF) can be well approximated by an intersymbol
    interference (ISI) channel where channel taps are closely related to a sinc
    function. Based on such an approximation, we have derived the capacities for
    both one-dimensional (terminals on a line) and high dimensional (terminals on a
    plane or in a cube) terminal-deployments. In particular, we analyze the
    normalized capacity C̄, measured in nats/s/Hz/m², under the constraint that
    the transmit power per m², P̄, is fixed. We show that when the user-density
    increases, the limit of C̄, achieved as the wavelength λ approaches 0, is
    P̄/(2N_0) nats/s/Hz/m², where N_0 is the spatial power spectral density (PSD)
    of noise. In addition, we also show that the number of signal dimensions is
    2/λ per meter of deployed surface for the one-dimensional case, and π/λ² per
    m² of deployed surface for two- and three-dimensional terminal-deployments.

    PCA in Data-Dependent Noise (Correlated-PCA): Nearly Optimal Finite Sample Guarantees

    Namrata Vaswani, Praneeth Narayanamurthy
    Comments: 7 pages, 2 figures
    Subjects: Information Theory (cs.IT); Machine Learning (stat.ML)

    We study Principal Component Analysis (PCA) in a setting where a part of the
    corrupting noise is data-dependent and, hence, the noise and the true data are
    correlated. Under a bounded-ness assumption on both the true data and noise,
    and a few assumptions on the data-noise correlation, we obtain a sample
    complexity bound for the most common PCA solution, singular value decomposition
    (SVD). This bound, which is within a logarithmic factor of the best achievable,
    significantly improves upon our bound from recent work (NIPS 2016) where we
    first studied this “correlated-PCA” problem.
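
    The "most common PCA solution" analyzed here, the SVD of the centered data
    matrix, takes only a few lines:

```python
import numpy as np

def pca_svd(X, r):
    """PCA via SVD. Rows of X are samples; returns a (d, r) orthonormal
    basis for the top-r principal subspace."""
    Xc = X - X.mean(axis=0)                        # center the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:r].T                                # top-r right singular vectors
```

    When the noise is data-dependent, as in the paper's setting, it is the
    accuracy of this subspace estimate that the sample complexity bound
    controls.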

    Sparsity/Undersampling Tradeoffs in Anisotropic Undersampling, with Applications in MR Imaging/Spectroscopy

    Hatef Monajemi, David L. Donoho
    Subjects: Information Theory (cs.IT)

    We study anisotropic undersampling schemes like those used in
    multi-dimensional NMR spectroscopy and MR imaging, which sample exhaustively in
    certain time dimensions and randomly in others.

    Our analysis shows that anisotropic undersampling schemes are equivalent to
    certain block-diagonal measurement systems. We develop novel exact formulas for
    the sparsity/undersampling tradeoffs in such measurement systems. Our formulas
    predict finite-N phase transition behavior differing substantially from the
    well-known asymptotic phase transitions for classical Gaussian undersampling.
    Extensive empirical work shows that our formulas accurately describe observed
    finite-N behavior, while the usual formulas based on universality are
    substantially inaccurate.

    We also vary the anisotropy, keeping the total number of samples fixed, and
    for each variation we determine the precise sparsity/undersampling tradeoff
    (phase transition). We show that, other things being equal, the ability to
    recover a sparse object decreases with an increasing number of
    exhaustively-sampled dimensions.

    The ARMA(k) Gaussian Feedback Capacity

    Tao Liu, Guangyue Han
    Subjects: Information Theory (cs.IT)

    Using Kim’s variational formulation (with a slight yet important
    modification), we derive the ARMA(k) Gaussian feedback capacity, i.e., the
    feedback capacity of an additive channel where the noise is a k-th order
    autoregressive moving average Gaussian process. More specifically, the ARMA(k)
    Gaussian feedback capacity is expressed as a simple function evaluated at a
    solution to a system of polynomial equations, which has only finitely many
    solutions for the cases k=1, 2 and possibly beyond.

    Density Functional Estimators with k-Nearest Neighbor Bandwidths

    Weihao Gao, Sewoong Oh, Pramod Viswanath
    Comments: 17 pages, 6 figures
    Subjects: Information Theory (cs.IT)

    Estimating expected polynomials of density functions from samples is a basic
    problem with numerous applications in statistics and information theory.
    Although kernel density estimators are widely used in practice for such
    functional estimation problems, practitioners are left on their own to choose
    an appropriate bandwidth for each application in hand. Further, kernel density
    estimators suffer from boundary biases, which are prevalent in real world data
    with lower dimensional structures. We propose using the fixed-k nearest
    neighbor distances for the bandwidth, which adaptively adjusts to local
    geometry. Further, we propose a novel estimator based on local likelihood
    density estimators, that mitigates the boundary biases. Although such a choice
    of fixed-k nearest neighbor distances to bandwidths results in inconsistent
    estimators, we provide a simple debiasing scheme that precomputes the
    asymptotic bias and divides off this term. With this novel correction, we show
    consistency of this debiased estimator. We provide numerical experiments
    suggesting that it improves upon competing state-of-the-art methods.
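
    The basic fixed-k nearest-neighbor bandwidth, shown here for a
    one-dimensional density estimate without the paper's debiasing step (a
    classical construction, included for orientation):

```python
import numpy as np

def knn_density_1d(x, samples, k=5):
    """Classical kNN density estimate in 1-D: f_hat(x) = k / (n * 2 * r_k),
    where r_k is the distance from x to its k-th nearest sample and 2 * r_k
    is the length of the ball containing the k nearest neighbors."""
    r_k = np.sort(np.abs(samples - x))[k - 1]
    return k / (len(samples) * 2.0 * r_k)
```

    Because r_k shrinks where samples are dense and grows where they are
    sparse, the bandwidth adapts to local geometry automatically, which is the
    property the proposed estimators build on.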

    An Overview of Multi-Processor Approximate Message Passing

    Junan Zhu, Ryan Pilgrim, Dror Baron
    Subjects: Information Theory (cs.IT)

    Approximate message passing (AMP) is an algorithmic framework for solving
    linear inverse problems from noisy measurements, with exciting applications
    such as reconstructing images, audio, hyperspectral images, and various other
    signals, including those acquired in compressive signal acquisition systems. The
    growing prevalence of big data systems has increased interest in large-scale
    problems, which may involve huge measurement matrices that are unsuitable for
    conventional computing systems. To address the challenge of large-scale
    processing, multiprocessor (MP) versions of AMP have been developed. We provide
    an overview of two such MP-AMP variants. In row-MP-AMP, each computing node
    stores a subset of the rows of the matrix and processes corresponding
    measurements. In column-MP-AMP, each node stores a subset of columns, and is
    solely responsible for reconstructing a portion of the signal. We will discuss
    pros and cons of both approaches, summarize recent research results for each,
    and explain when each one may be a viable approach. Aspects that are
    highlighted include some recent results on state evolution for both MP-AMP
    algorithms, and the use of data compression to reduce communication in the MP
    network.
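To make the two partitioning schemes concrete, here is a toy NumPy sketch (a matrix-vector product distributed across three hypothetical nodes, not an AMP implementation) of the row and column splits the abstract describes:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))   # measurement matrix
x = rng.standard_normal(4)        # signal

# row-MP-AMP style: each node holds a block of rows and the corresponding
# measurements, so it computes its own slice of A @ x locally
row_blocks = np.array_split(A, 3, axis=0)
y_row = np.concatenate([blk @ x for blk in row_blocks])

# column-MP-AMP style: each node holds a block of columns and reconstructs
# only its slice of the signal; local partial products must be summed,
# which is where inter-node communication enters
col_blocks = np.array_split(A, 3, axis=1)
x_blocks = np.array_split(x, 3)
y_col = sum(blk @ xb for blk, xb in zip(col_blocks, x_blocks))
```

Both splits recover the same product `A @ x`; the difference is what each node must store and how much it must communicate, which is exactly the trade-off (and the motivation for the data-compression results) highlighted above.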

    Elementary (L^infty) error estimates for super-resolution de-noising

    Weilin Li
    Subjects: Information Theory (cs.IT)

    This paper studies the problem of recovering a discrete complex measure on
    the torus from a finite number of corrupted Fourier samples. We assume the
    support of the unknown discrete measure satisfies a minimum separation
    condition and we use convex regularization methods to recover approximations of
    the original measure. We focus on two well-known convex regularization methods,
    and for both, we establish an error estimate that bounds the smoothed-out error
    in terms of the target resolution and noise level. Our (L^infty) approximation
    rate is entirely new for one of the methods, and improves upon a previously
    established (L^1) estimate for the other. We provide a unified analysis and an
    elementary proof of the theorem.
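For orientation, the standard setup in this line of work (our hedged summary of the usual model, not the paper's exact statement) observes a discrete measure on the torus through noisy low-frequency Fourier samples:

```latex
\mu = \sum_{j=1}^{s} a_j \, \delta_{t_j}, \qquad t_j \in \mathbb{T} = [0,1), \quad a_j \in \mathbb{C},
\qquad
y_k = \hat{\mu}(k) + \eta_k = \sum_{j=1}^{s} a_j e^{-2\pi i k t_j} + \eta_k, \quad |k| \le M,
```

with a minimum separation condition of the form (Delta(mu) = min_{j 
eq l} |t_j - t_l|_{mathbb{T}} ge c/M) and bounded noise (|eta_k| le delta); the error estimates then control the difference between the recovered and true measures after smoothing at the target resolution.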

    Secure Multi-Source Multicast

    Alejandro Cohen, Asaf Cohen, Muriel Medard, Omer Gurewitz
    Subjects: Information Theory (cs.IT)

    The principal mission of Multi-Source Multicast (MSM) is to disseminate all
    messages from all sources in a network to all destinations. MSM is utilized in
    numerous applications. In many of them, securing the messages disseminated is
critical. A common security model considers a network with an eavesdropper who
is able to observe a subset of the network links, and seeks a code that keeps
the eavesdropper ignorant of all the messages. While
    this is solved when all messages are located at a single source, Secure MSM
    (SMSM) is an open problem, and the rates required are hard to characterize in
    general. In this paper, we consider Individual Security, which promises that
    the eavesdropper has zero mutual information with each message individually. We
    completely characterize the rate region for SMSM under individual security, and
    show that such a security level is achievable at the full capacity of the
    network, that is, the cut-set bound is the matching converse, similar to
    non-secure MSM. Moreover, we show that the field size is similar to non-secure
    MSM and does not have to be larger due to the security constraint.

    Conversion of Coherence Number and Concurrence into Entanglement (k)-concurrence

    Seungbeom Chin
    Comments: 17 pages
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

We investigate the (k)-concurrence entanglement convertibility theorem of
coherence. By introducing the coherence number, which is the generalization
of coherence rank to mixed states, we present a necessary and sufficient
    condition for a coherent mixed state to be converted to an entangled state of
    nonzero (k)-concurrence. We also quantitatively compare the amount of
    (k)-concurrence entanglement with coherence concurrence (C_c), a recently
    introduced convex roof monotone of coherence.



