
    arXiv Paper Daily: Wed, 19 Oct 2016

    Published by 我爱机器学习 (52ml.net) on 2016-10-19 00:00:00

    Neural and Evolutionary Computing

    Scaling Up MAP-Elites Using Centroidal Voronoi Tessellations

    Vassilis Vassiliades, Konstantinos Chatzilygeroudis, Jean-Baptiste Mouret
    Subjects: Neural and Evolutionary Computing (cs.NE)

    The recently introduced Multi-dimensional Archive of Phenotypic Elites
    (MAP-Elites) is an evolutionary algorithm capable of producing a large archive
    of diverse, high-performing solutions in a single run. It works by discretizing
    a continuous feature space into unique regions according to the desired
    discretization per dimension. While simple, this algorithm has a main drawback:
    it cannot scale to high-dimensional feature spaces since the number of regions
    increases exponentially with the number of dimensions. In this paper, we address
    this limitation by introducing a simple extension of MAP-Elites that has a
    constant, pre-defined number of regions irrespective of the dimensionality of
    the feature space. Our main insight is that methods from computational geometry
    could partition a high-dimensional space into well-spread geometric regions. In
    particular, our algorithm uses a centroidal Voronoi tessellation (CVT) to
    divide the feature space into a desired number of regions; it then places every
    generated individual in its closest region, replacing a less fit one if the
    region is already occupied. We demonstrate the effectiveness of the new
    “CVT-MAP-Elites” algorithm in high-dimensional feature spaces through
    comparisons against MAP-Elites in a hexapod locomotion task.
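
    A minimal sketch of the archive machinery described above may help. In the
    snippet below (illustrative names and settings, not the authors' code), the
    centroids are obtained by running k-means on uniform samples of the feature
    space, a standard way to approximate a CVT, and each new individual is stored
    in the region of its nearest centroid only if it beats the current elite.

```python
# Hypothetical sketch of a CVT-MAP-Elites archive; names and settings are ours.
import numpy as np
from sklearn.cluster import KMeans

def build_cvt(n_regions, dim, n_samples=25_000, seed=0):
    """Approximate a centroidal Voronoi tessellation of [0, 1]^dim by
    running k-means on uniformly sampled points of the feature space."""
    rng = np.random.default_rng(seed)
    samples = rng.random((n_samples, dim))
    km = KMeans(n_clusters=n_regions, n_init=1, random_state=seed).fit(samples)
    return km.cluster_centers_

class CVTArchive:
    """Archive with a fixed number of regions, independent of `dim`."""
    def __init__(self, centroids):
        self.centroids = centroids
        self.elites = {}  # region index -> (fitness, solution)

    def add(self, solution, feature_descriptor, fitness):
        # Place the individual in its closest region; keep only the fittest one.
        idx = int(np.argmin(np.linalg.norm(self.centroids - feature_descriptor,
                                           axis=1)))
        if idx not in self.elites or fitness > self.elites[idx][0]:
            self.elites[idx] = (fitness, solution)
```

    Because the number of regions is fixed up front, the archive does not grow
    with the dimensionality of the feature space, which is the point of the
    extension.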

    Design Mining Microbial Fuel Cell Cascades

    Richard J. Preen, Jiseon You, Larry Bull, Ioannis A. Ieropoulos
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)

    Microbial fuel cells (MFCs) perform wastewater treatment and electricity
    production through the conversion of organic matter using microorganisms. For
    practical applications, it has been suggested that greater efficiency can be
    achieved by arranging multiple MFC units into physical stacks in a cascade with
    feedstock flowing sequentially between units. In this paper, we investigate the
    use of computational intelligence to physically explore and optimise
    (potentially) heterogeneous MFC designs in a cascade, i.e. without simulation.
    Conductive structures are 3-D printed and inserted into the anodic chamber of
    each MFC unit, augmenting a carbon fibre veil anode and affecting the
    hydrodynamics, including the feedstock volume and hydraulic retention time, as
    well as providing unique habitats for microbial colonisation. We show that it
    is possible to use design mining to identify new conductive inserts that
    increase both the cascade power output and power density.

    Online Contrastive Divergence with Generative Replay: Experience Replay without Storing Data

    Decebal Constantin Mocanu, Maria Torres Vega, Eric Eaton, Peter Stone, Antonio Liotta
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Conceived in the early 1990s, Experience Replay (ER) has been shown to be a
    successful mechanism to allow online learning algorithms to reuse past
    experiences. Traditionally, ER can be applied to all machine learning paradigms
    (i.e., unsupervised, supervised, and reinforcement learning). Recently, ER has
    contributed to improving the performance of deep reinforcement learning. Yet,
    its application to many practical settings is still limited by the memory
    requirements of ER, necessary to explicitly store previous observations. To
    remedy this issue, we explore a novel approach, Online Contrastive Divergence
    with Generative Replay (OCD_GR), which uses the generative capability of
    Restricted Boltzmann Machines (RBMs) instead of recorded past experiences. The
    RBM is trained online, and does not require the system to store any of the
    observed data points. We compare OCD_GR to ER on 9 real-world datasets,
    considering a worst-case scenario (data points arriving in sorted order) as
    well as a more realistic one (sequential random-order data points). Our results
    show that in 64.28% of the cases OCD_GR outperforms ER and in the remaining
    35.72% it has an almost equal performance, while having a considerably reduced
    space complexity (i.e., memory usage) at a comparable time complexity.
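
    As a rough illustration of the generative-replay idea (not the authors' exact
    model, update rule, or hyperparameters), the sketch below trains a small
    Bernoulli RBM online with CD-1 and then Gibbs-samples pseudo-experiences from
    it in place of a stored replay buffer.

```python
# Illustrative sketch of generative replay with an online-trained RBM.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ReplayRBM:
    def __init__(self, n_visible, n_hidden, lr=0.05, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # visible biases
        self.c = np.zeros(n_hidden)   # hidden biases
        self.lr = lr

    def _h_given_v(self, v):
        return sigmoid(v @ self.W + self.c)

    def _v_given_h(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def update(self, v0):
        """One online CD-1 step on a single binary observation v0."""
        ph0 = self._h_given_v(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)
        v1 = self._v_given_h(h0)
        ph1 = self._h_given_v(v1)
        self.W += self.lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
        self.b += self.lr * (v0 - v1)
        self.c += self.lr * (ph0 - ph1)

    def replay(self, n_samples, gibbs_steps=20):
        """Generate pseudo-experiences by Gibbs sampling instead of
        replaying stored observations."""
        v = (self.rng.random((n_samples, self.b.size)) < 0.5).astype(float)
        for _ in range(gibbs_steps):
            ph = self._h_given_v(v)
            h = (self.rng.random(ph.shape) < ph).astype(float)
            pv = self._v_given_h(h)
            v = (self.rng.random(pv.shape) < pv).astype(float)
        return v
```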


    Computer Vision and Pattern Recognition

    Fast L1-NMF for Multiple Parametric Model Estimation

    Mariano Tepper, Guillermo Sapiro
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    In this work we introduce a comprehensive algorithmic pipeline for multiple
    parametric model estimation. The proposed approach analyzes the information
    produced by a random sampling algorithm (e.g., RANSAC) from a machine
    learning/optimization perspective, using a parameterless biclustering
    algorithm based on L1 nonnegative matrix factorization (L1-NMF). The proposed
    framework exploits consistent patterns that naturally arise during the RANSAC
    execution, while explicitly avoiding spurious inconsistencies. Contrary to
    the main trends in the literature, the proposed technique does not impose
    non-intersecting parametric models. A new accelerated algorithm to compute
    L1-NMFs makes it possible to handle medium-sized problems faster while also extending the
    usability of the algorithm to much larger datasets. This accelerated algorithm
    has applications in any other context where an L1-NMF is needed, beyond the
    biclustering approach to parameter estimation here addressed. We accompany the
    algorithmic presentation with theoretical foundations and numerous and diverse
    examples.

    Semantic Decomposition and Recognition of Long and Complex Manipulation Action Sequences

    Eren Erdal Aksoy, Adil Orhan, Florentin Woergoetter
    Comments: IJCV preprint manuscript
    Journal-ref: International Journal of Computer Vision, 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Understanding continuous human actions is a non-trivial but important problem
    in computer vision. Although there exists a large corpus of work in the
    recognition of action sequences, most approaches suffer from problems relating
    to vast variations in motions, action combinations, and scene contexts. In this
    paper, we introduce a novel method for semantic segmentation and recognition of
    long and complex manipulation action tasks, such as “preparing a breakfast” or
    “making a sandwich”. We represent manipulations with our recently introduced
    “Semantic Event Chain” (SEC) concept, which captures the underlying
    spatiotemporal structure of an action invariant to motion, velocity, and scene
    context. Solely based on the spatiotemporal interactions between manipulated
    objects and hands in the extracted SEC, the framework automatically parses
    individual manipulation streams performed either sequentially or concurrently.
    Using event chains, our method further extracts basic primitive elements of
    each parsed manipulation. Without requiring any prior object knowledge, the
    proposed framework can also extract object-like scene entities that exhibit the
    same role in semantically similar manipulations. We conduct extensive
    experiments on various recent datasets to validate the robustness of the
    framework.

    From Traditional to Modern : Domain Adaptation for Action Classification in Short Social Video Clips

    Aditya Singh, Saurabh Saini, Rajvi Shah, P J Narayanan
    Comments: 9 pages, GCPR, 2016
    Journal-ref: Pattern Recognition,38th German Conference, GCPR 2016, Hannover,
    Germany, September 12-15, 2016, Proceedings,pp 245-257
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Short internet video clips like vines exhibit a significantly wilder
    distribution than traditional video datasets. In this paper, we focus on
    the problem of unsupervised action classification in wild vines using
    traditional labeled datasets. To this end, we use a data augmentation based
    simple domain adaptation strategy. We utilise the semantic word2vec space as a
    common subspace to embed video features from both the labeled source domain and
    the unlabelled target domain. Our method incrementally augments the labeled source
    with target samples and iteratively modifies the embedding function to bring
    the source and target distributions together. Additionally, we utilise a
    multi-modal representation that incorporates noisy semantic information
    available in the form of hash-tags. We show the effectiveness of this simple
    adaptation technique on a test set of vines and achieve notable improvements in
    performance.

    Deep Identity-aware Transfer of Facial Attributes

    Mu Li, Wangmeng Zuo, David Zhang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents a deep convolutional network model for Identity-Aware
    Transfer (DIAT) of facial attributes. Given the source input image and the
    reference attribute, DIAT aims to generate a facial image (i.e., target image)
    that not only has the reference attribute but also keeps the same or a similar
    identity to the input image. We develop a two-stage scheme to transfer the
    input image to each reference attribute label. A feed-forward transform network
    is first trained by combining perceptual identity-aware loss and GAN-based
    attribute loss, and a face enhancement network is then introduced to improve
    the visual quality. We further define perceptual identity loss on the
    convolutional feature maps of the attribute discriminator, resulting in a
    DIAT-A model. Our DIAT and DIAT-A models can provide a unified solution for
    several representative facial attribute transfer tasks such as expression
    transfer, accessory removal, age progression, and gender transfer. The
    experimental results validate their effectiveness. Even for some
    identity-related attributes (e.g., gender), our DIAT-A can obtain visually
    impressive results by changing the attribute while retaining most identity
    features of the source image.

    Master's Thesis : Deep Learning for Visual Recognition

    Rémi Cadène, Nicolas Thome, Matthieu Cord
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The goal of our research is to develop methods advancing automatic visual
    recognition. In order to predict the unique or multiple labels associated with
    an image, we study different kinds of Deep Neural Network architectures and
    methods for supervised feature learning. We first draw up a state-of-the-art
    review of Convolutional Neural Networks, aiming to understand the history
    behind this family of statistical models, the limits of modern architectures and
    the novel techniques currently used to train deep CNNs. The originality of our
    work lies in our approach focusing on tasks with a small amount of data. We
    introduce different models and techniques to achieve the best accuracy on
    several kinds of datasets, such as a medium dataset of food recipes (100k
    images) for building a web API, or a small dataset of satellite images (6,000)
    for the DSG online challenge that we won. We also draw up the
    state-of-the-art in Weakly Supervised Learning, introducing different kinds of
    CNNs able to localize regions of interest. Our last contribution is a
    framework, built on top of Torch7, for training and testing deep models on any
    visual recognition task and on datasets of any scale.

    Edge Based Grid Super-Imposition for Crowd Emotion Recognition

    Amol Patwardhan
    Comments: 6 pages, 6 figures, 1 table, emotion, crowd, group, spontaneous, indoor, edge, grid, mesh, tracking, temporal feature
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)

    Most studies of automatic continuous emotion detection systems have examined
    videos and images of individual people expressing emotions. This study examines
    the detection of spontaneous emotions in group and crowd settings. Edge
    detection combined with a superimposed grid of lines was used to extract
    features. Feature movement relative to a reference point was tracked across
    sequences of images from the color channel. Additionally, video data were
    captured of spontaneous emotions invoked in a group of participants watching
    sports events. The method was view and occlusion independent, and the results
    were not affected by the presence of multiple people chaotically expressing
    various emotions. An edge threshold of 0.2 and a grid threshold of 20 gave the best
    accuracy results. The overall accuracy of the group emotion classifier was
    70.9%.
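
    One plausible reading of the feature extraction step is sketched below with
    OpenCV. The grid size and the mapping of the 0.2 edge threshold onto Canny's
    0-255 scale are our guesses based on the abstract, not the authors' exact
    procedure.

```python
# Rough sketch: per-cell edge density after superimposing a grid on the frame.
import cv2
import numpy as np

def grid_edge_features(frame_bgr, grid=20, edge_threshold=0.2):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # Canny expects thresholds in 0-255; map the normalized threshold (a guess).
    low = int(edge_threshold * 255)
    edges = cv2.Canny(gray, low, 2 * low)
    h, w = edges.shape
    feats = np.zeros((grid, grid))
    for i in range(grid):
        for j in range(grid):
            cell = edges[i * h // grid:(i + 1) * h // grid,
                         j * w // grid:(j + 1) * w // grid]
            feats[i, j] = cell.mean() / 255.0  # fraction of edge pixels in the cell
    return feats.ravel()
```

    Tracking can then be done by comparing such feature vectors across consecutive
    frames against a reference, in the spirit of the description above.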

    M2CAI Workflow Challenge: Convolutional Neural Networks with Time Smoothing and Hidden Markov Model for Video Frames Classification

    Rémi Cadène, Thomas Robert, Nicolas Thome, Matthieu Cord
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Our approach is among the three best to tackle the M2CAI Workflow challenge.
    The latter consists in recognizing the operation phase for each frame of
    endoscopic videos. In this technical report, we compare several classification
    models and temporal smoothing methods. Our submitted solution is a fine-tuned
    Residual Network-200 on 80% of the training set with temporal smoothing using
    simple temporal averaging of the predictions and a Hidden Markov Model modeling
    the sequence.
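
    The temporal averaging part is simple enough to sketch; the window length
    below is an assumption, and the HMM stage is not shown.

```python
# Sketch of smoothing per-frame class probabilities before taking the argmax.
import numpy as np

def smooth_predictions(frame_probs, window=15):
    """frame_probs: (n_frames, n_classes) array of per-frame probabilities."""
    n = len(frame_probs)
    smoothed = np.empty_like(frame_probs)
    for t in range(n):
        lo, hi = max(0, t - window), min(n, t + window + 1)
        smoothed[t] = frame_probs[lo:hi].mean(axis=0)  # centred moving average
    return smoothed.argmax(axis=1)
```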

    Shape-based defect classification for Non Destructive Testing

    Gianni D'Angelo, Salvatore Rampone
    Comments: 5 pages, IEEE International Workshop
    Journal-ref: IEEE International Workshop on Metrology for Aerospace, Benevento,
    Italy, June 4-5, 2015
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computational Engineering, Finance, and Science (cs.CE)

    The aim of this work is to classify the aerospace structure defects detected
    by eddy current non-destructive testing. The proposed method is based on the
    assumption that the defect is bound to the reaction of the probe coil impedance
    during the test. Impedance plane analysis is used to extract a feature vector
    from the shape of the coil impedance in the complex plane, through the use of
    some geometric parameters. Shape recognition is tested with three different
    machine-learning based classifiers: decision trees, neural networks and Naive
    Bayes. The performance of the proposed detection system is measured in terms
    of accuracy, sensitivity, specificity, precision and Matthews correlation
    coefficient. Several experiments are performed on a dataset of eddy current
    signal samples for aircraft structures. The obtained results demonstrate the
    usefulness of our approach and its competitiveness against existing descriptors.

    Real-time analysis of cataract surgery videos using statistical models

    Katia Charrière, Gwenolé Quellec, Mathieu Lamard, David Martiano, Guy Cazuguel, Gouenou Coatrieux, Béatrice Cochener
    Comments: This is an extended version of a paper presented at the CBMI 2016 conference
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The automatic analysis of the surgical process, from videos recorded during
    surgeries, could be very useful to surgeons, both for training and for
    acquiring new techniques. The training process could be optimized by
    automatically providing some targeted recommendations or warnings, similar to
    the expert surgeon’s guidance. In this paper, we propose to reuse videos
    recorded and stored during cataract surgeries to perform the analysis. The
    proposed system makes it possible to automatically recognize, in real time, what the
    surgeon is doing: what surgical phase or, more precisely, what surgical step he
    or she is performing. This recognition relies on the inference of a multilevel
    statistical model which uses 1) the conditional relations between levels of
    description (steps and phases) and 2) the temporal relations among steps and
    among phases. The model accepts two types of inputs: 1) the presence of
    surgical tools, manually provided by the surgeons, or 2) motion in videos,
    automatically analyzed through the Content-Based Video Retrieval (CBVR)
    paradigm. Different data-driven statistical models are evaluated in this paper.
    For this project, a dataset of 30 cataract surgery videos was collected at
    Brest University hospital. The system was evaluated in terms of area under the
    ROC curve. Promising results were obtained using either the presence of
    surgical tools ((A_z) = 0.983) or motion analysis ((A_z) = 0.759). The
    generality of the method allows it to be adapted to any kind of surgery. The
    proposed solution could be used in a computer assisted surgery tool to support
    surgeons during the surgery.

    ARTiS: Appearance-based Action Recognition in Task Space

    Markus Eich, Sareh Shirazi, Gordon Wyeth
    Comments: Submitted to ICRA 2017
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

    To have a robot actively supporting a human during a collaborative task, it
    is crucial that robots are able to identify the current action in order to
    predict the next one. Common approaches make use of high-level knowledge, such
    as object affordances, semantics or understanding of actions in terms of pre-
    and post-conditions. These approaches often require hand-coded a priori
    knowledge, time- and resource-intensive or supervised learning techniques. We
    propose to reframe this problem as an appearance-based place recognition
    problem. In our framework, we regard sequences of visual images of human
    actions as a map in analogy to the visual place recognition problem. Observing
    the task for the second time, our approach is able to recognize pre-observed
    actions in a one-shot learning approach and is thereby able to recognize the
    current observation in the task space. We propose two new methods for creating
    and aligning action observations within a task map. We compare and verify our
    approaches with real data of humans assembling an IKEA flat pack drawer.


    Artificial Intelligence

    Deep Amortized Inference for Probabilistic Programs

    Daniel Ritchie, Paul Horsfall, Noah D. Goodman
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Probabilistic programming languages (PPLs) are a powerful modeling tool, able
    to represent any computable probability distribution. Unfortunately,
    probabilistic program inference is often intractable, and existing PPLs mostly
    rely on expensive, approximate sampling-based methods. To alleviate this
    problem, one could try to learn from past inferences, so that future inferences
    run faster. This strategy is known as amortized inference; it has recently been
    applied to Bayesian networks and deep generative models. This paper proposes a
    system for amortized inference in PPLs. In our system, amortization comes in
    the form of a parameterized guide program. Guide programs have similar
    structure to the original program, but can have richer data flow, including
    neural network components. These networks can be optimized so that the guide
    approximately samples from the posterior distribution defined by the original
    program. We present a flexible interface for defining guide programs and a
    stochastic gradient-based scheme for optimizing guide parameters, as well as
    some preliminary results on automatically deriving guide programs. We explore
    in detail the common machine learning pattern in which a ‘local’ model is
    specified by ‘global’ random values and used to generate independent observed
    data points; this gives rise to amortized local inference supporting global
    model learning.
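
    To make the guide idea concrete, here is a toy PyTorch sketch that is far
    narrower than the system in the paper: a small network maps summary statistics
    of a dataset to a Gaussian approximation of the posterior over a single latent
    mean, and is trained with the reparameterization trick across many simulated
    datasets, so that inference on a new dataset reduces to one forward pass. The
    model, network shape and training settings are all illustrative assumptions.

```python
# Toy amortized inference: model is mu ~ N(0, 1), x_i ~ N(mu, 1), i = 1..D.
import math
import torch
import torch.nn as nn

D = 5
guide = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))
opt = torch.optim.Adam(guide.parameters(), lr=1e-2)

def log_joint(mu, x):
    # log p(mu) + log p(x | mu), dropping constants; mu: (B, 1), x: (B, D).
    prior = -0.5 * mu.squeeze(1) ** 2
    lik = -0.5 * ((x - mu) ** 2).sum(dim=1)
    return prior + lik

for step in range(2000):
    mu_true = torch.randn(64, 1)                    # simulate datasets from the model
    x = mu_true + torch.randn(64, D)
    summary = torch.cat([x.mean(1, keepdim=True), x.std(1, keepdim=True)], dim=1)
    out = guide(summary)
    q_mean, q_log_std = out[:, :1], out[:, 1:]
    eps = torch.randn_like(q_mean)
    mu = q_mean + eps * q_log_std.exp()             # reparameterized posterior sample
    log_q = (-0.5 * math.log(2 * math.pi)
             - q_log_std.squeeze(1) - 0.5 * eps.squeeze(1) ** 2)
    loss = -(log_joint(mu, x) - log_q).mean()       # negative ELBO
    opt.zero_grad(); loss.backward(); opt.step()

# After training, guide(summary) instantly yields an approximate posterior for a
# new dataset, which is the amortization pay-off described above.
```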

    Census Signal Temporal Logic Inference for Multi-Agent Group Behavior Analysis

    Zhe Xu, Agung Julius
    Comments: 26 pages, 5 figures
    Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Logic (math.LO); Optimization and Control (math.OC)

    In this paper, we define a novel census signal temporal logic (CensusSTL)
    that focuses on the number of agents in different subsets of a group that
    complete a certain task specified by the signal temporal logic (STL). CensusSTL
    consists of an “inner logic” STL formula and an “outer logic” STL formula. We
    present a new inference algorithm to infer CensusSTL formulae from the
    trajectory data of a group of agents. We first identify the “inner logic” STL
    formula and then infer the subgroups based on whether the agents’ behaviors
    satisfy the “inner logic” formula at each time point. We use two different
    approaches to infer the subgroups based on similarity and complementarity,
    respectively. The “outer logic” CensusSTL formula is inferred from the census
    trajectories of different subgroups. We apply the algorithm in analyzing data
    from a soccer match by inferring the CensusSTL formula for different subgroups
    of a soccer team.

    Identifiability and Transportability in Dynamic Causal Networks

    Gilles Blondel, Marta Arias, Ricard Gavaldà
    Comments: Presented at the 2016 ACM SIGKDD Workshop on Causal Discovery
    Subjects: Artificial Intelligence (cs.AI)

    In this paper we propose a causal analog to the purely observational Dynamic
    Bayesian Networks, which we call Dynamic Causal Networks. We provide a sound
    and complete algorithm for identification of Dynamic Causal Networks, namely,
    for computing the effect of an intervention or experiment, based on passive
    observations only, whenever possible. We note the existence of two types of
    confounder variables that affect in substantially different ways the
    identification procedures, a distinction with no analog in either Dynamic Bayesian
    Networks or standard causal graphs. We further propose a procedure for the
    transportability of causal effects in Dynamic Causal Network settings, where
    the result of causal experiments in a source domain may be used for the
    identification of causal effects in a target domain.

    Weighted Positive Binary Decision Diagrams for Exact Probabilistic Inference

    Giso H. Dal, Peter J.F. Lucas
    Comments: 30 pages
    Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)

    Recent work on weighted model counting has been very successfully applied to
    the problem of probabilistic inference in Bayesian networks. The probability
    distribution is encoded into a Boolean normal form and compiled to a target
    language, in order to represent local structure expressed among conditional
    probabilities more efficiently. We show that further improvements are possible,
    by exploiting the knowledge that is lost during the encoding phase and
    incorporating it into a compiler inspired by Satisfiability Modulo Theories.
    Constraints among variables are used as a background theory, which allows us to
    optimize the Shannon decomposition. We propose a new language, called Weighted
    Positive Binary Decision Diagrams, that reduces the cost of probabilistic
    inference by using this decomposition variant to induce an arithmetic circuit
    of reduced size.

    Diagnosis of aerospace structure defects by a HPC implemented soft computing algorithm

    Gianni D'Angelo, Salvatore Rampone
    Comments: 5 pages, IEEE International Workshop on Metrology for Aerospace
    Journal-ref: IEEE International Workshop on Metrology for Aerospace, Benevento,
    Italy, May 29-30, 2014
    Subjects: Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Data Analysis, Statistics and Probability (physics.data-an)

    This study concerns the diagnosis of aerospace structure defects by
    applying an HPC parallel implementation of a novel learning algorithm, named
    U-BRAIN. The Soft Computing approach allows advanced multi-parameter data
    processing in composite materials testing. The HPC parallel implementation
    overcomes the limits due to the great amount of data and the complexity of data
    processing. Our experimental results illustrate the effectiveness of the
    U-BRAIN parallel implementation as a defect classifier in aerospace structures.
    The resulting system is implemented on a Linux-based cluster with multi-core
    architecture.

    Makespan Optimal Solving of Cooperative Path-Finding via Reductions to Propositional Satisfiability

    Pavel Surynek
    Subjects: Artificial Intelligence (cs.AI)

    The problem of makespan optimal solving of cooperative path finding (CPF) is
    addressed in this paper. The task in CPF is to relocate a group of agents in a
    non-colliding way so that each agent eventually reaches its goal location from
    the given initial location. The abstraction adopted in this work assumes that
    agents are discrete items moving in an undirected graph by traversing edges.
    Makespan optimal solving of CPF means to generate solutions that are as short
    as possible in terms of the total number of time steps required for the
    execution of the solution.

    We show that reducing CPF to propositional satisfiability (SAT) represents a
    viable option for obtaining makespan optimal solutions. Several encodings of
    CPF into propositional formulae are suggested and experimentally evaluated. The
    evaluation indicates that SAT based CPF solving outperforms other makespan
    optimal methods significantly in highly constrained situations (environments
    that are densely occupied by agents).
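
    To give a flavour of such reductions (this is not one of the paper's
    encodings), the sketch below builds CNF clauses for a fixed makespan T: one
    Boolean variable per (agent, vertex, time step), exactly-one and
    collision-freedom constraints, and edge-following transition clauses. Swap
    conflicts along an edge are not forbidden here, and the clauses can be fed to
    any off-the-shelf SAT solver, increasing T until the formula becomes
    satisfiable.

```python
# Simplified CNF encoding of a makespan-T cooperative path-finding instance.
from itertools import combinations

def encode_cpf(graph, starts, goals, T):
    """graph: dict vertex -> set of neighbours (undirected);
    starts/goals: dict agent -> vertex. Returns (clauses, var) where
    var(a, v, t) is the DIMACS id of "agent a is at vertex v at time t"."""
    agents, verts = sorted(starts), sorted(graph)
    index = {(a, v, t): i + 1 for i, (a, v, t) in enumerate(
        (a, v, t) for a in agents for v in verts for t in range(T + 1))}
    var = lambda a, v, t: index[(a, v, t)]
    clauses = []
    for t in range(T + 1):
        for a in agents:
            # Each agent occupies at least one and at most one vertex.
            clauses.append([var(a, v, t) for v in verts])
            clauses += [[-var(a, u, t), -var(a, v, t)]
                        for u, v in combinations(verts, 2)]
        for v in verts:
            # Collision freedom: no two agents share a vertex.
            clauses += [[-var(a, v, t), -var(b, v, t)]
                        for a, b in combinations(agents, 2)]
    for t in range(T):
        for a in agents:
            for v in verts:
                # If at v now, be at v or one of its neighbours next.
                clauses.append([-var(a, v, t)]
                               + [var(a, u, t + 1) for u in graph[v] | {v}])
    for a in agents:
        clauses.append([var(a, starts[a], 0)])  # initial positions
        clauses.append([var(a, goals[a], T)])   # goal positions
    return clauses, var
```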

    VRPBench: A Vehicle Routing Benchmark Tool

    Guilherme A. Zeni, Mauro Menzori, P. S. Martins, Luis A. A. Meira
    Subjects: Artificial Intelligence (cs.AI)

    The number of optimization techniques in the combinatorial domain is large
    and diversified. Nevertheless, there is still a lack of real benchmarks to
    validate optimization algorithms. In this work we introduce VRPBench, a tool to
    create instances and visualize solutions to the Vehicle Routing Problem (VRP)
    in a planar graph embedded in the Euclidean 2D space. We use VRPBench to model
    a real-world mail delivery case of the city of Artur Nogueira. Such scenarios
    were characterized as a multi-objective optimization of the VRP. We extracted a
    weighted graph from a digital map of the city to create a challenging benchmark
    for the VRP. Each instance models one generic day of mail delivery with
    hundreds to thousands of delivery points, thus allowing both the comparison and
    validation of optimization algorithms for routing problems.

    Design Mining Microbial Fuel Cell Cascades

    Richard J. Preen, Jiseon You, Larry Bull, Ioannis A. Ieropoulos
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)

    Microbial fuel cells (MFCs) perform wastewater treatment and electricity
    production through the conversion of organic matter using microorganisms. For
    practical applications, it has been suggested that greater efficiency can be
    achieved by arranging multiple MFC units into physical stacks in a cascade with
    feedstock flowing sequentially between units. In this paper, we investigate the
    use of computational intelligence to physically explore and optimise
    (potentially) heterogeneous MFC designs in a cascade, i.e. without simulation.
    Conductive structures are 3-D printed and inserted into the anodic chamber of
    each MFC unit, augmenting a carbon fibre veil anode and affecting the
    hydrodynamics, including the feedstock volume and hydraulic retention time, as
    well as providing unique habitats for microbial colonisation. We show that it
    is possible to use design mining to identify new conductive inserts that
    increase both the cascade power output and power density.

    Low-rank and Sparse Soft Targets to Learn Better DNN Acoustic Models

    Pranay Dighe, Afsaneh Asaei, Herve Bourlard
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Learning (cs.LG)

    Conventional deep neural networks (DNN) for speech acoustic modeling rely on
    Gaussian mixture models (GMM) and hidden Markov model (HMM) to obtain binary
    class labels as the targets for DNN training. Subword classes in speech
    recognition systems correspond to context-dependent tied states or senones. The
    present work addresses some limitations of GMM-HMM senone alignments for DNN
    training. We hypothesize that the senone probabilities obtained from a DNN
    trained with binary labels can provide more accurate targets to learn better
    acoustic models. However, DNN outputs bear inaccuracies which are exhibited as
    high dimensional unstructured noise, whereas the informative components are
    structured and low-dimensional. We exploit principal component analysis (PCA)
    and sparse coding to characterize the senone subspaces. Enhanced probabilities
    obtained from low-rank and sparse reconstructions are used as soft targets for
    DNN acoustic modeling, which also enables training with untranscribed data.
    Experiments conducted on the AMI corpus show a 4.6% relative reduction in word
    error rate.
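
    A minimal sketch of the low-rank half of this idea (the sparse-coding branch
    and the paper's exact post-processing are omitted, and the number of
    components is an arbitrary choice):

```python
# Low-rank reconstruction of DNN senone posteriors used as soft targets (sketch).
import numpy as np
from sklearn.decomposition import PCA

def low_rank_soft_targets(posteriors, n_components=40):
    """posteriors: (n_frames, n_senones) matrix of DNN output probabilities."""
    pca = PCA(n_components=n_components)
    recon = pca.inverse_transform(pca.fit_transform(posteriors))
    recon = np.clip(recon, 0.0, None)                  # keep probabilities non-negative
    recon /= recon.sum(axis=1, keepdims=True) + 1e-12  # renormalize each frame
    return recon
```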


    Computation and Language

    Low-rank and Sparse Soft Targets to Learn Better DNN Acoustic Models

    Pranay Dighe, Afsaneh Asaei, Herve Bourlard
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Learning (cs.LG)

    Conventional deep neural networks (DNN) for speech acoustic modeling rely on
    Gaussian mixture models (GMM) and hidden Markov model (HMM) to obtain binary
    class labels as the targets for DNN training. Subword classes in speech
    recognition systems correspond to context-dependent tied states or senones. The
    present work addresses some limitations of GMM-HMM senone alignments for DNN
    training. We hypothesize that the senone probabilities obtained from a DNN
    trained with binary labels can provide more accurate targets to learn better
    acoustic models. However, DNN outputs bear inaccuracies which are exhibited as
    high dimensional unstructured noise, whereas the informative components are
    structured and low-dimensional. We exploit principal component analysis (PCA)
    and sparse coding to characterize the senone subspaces. Enhanced probabilities
    obtained from low-rank and sparse reconstructions are used as soft targets for
    DNN acoustic modeling, which also enables training with untranscribed data.
    Experiments conducted on the AMI corpus show a 4.6% relative reduction in word
    error rate.

    Stylometric Analysis of Early Modern Period English Plays

    Mark Eisen, Santiago Segarra, Gabriel Egan, Alejandro Ribeiro
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Function word adjacency networks (WANs) are used to study the authorship of
    plays from the Early Modern English period. In these networks, nodes are
    function words and directed edges between two nodes represent the likelihood of
    ordered co-appearance of the two words. For every analyzed play a WAN is
    constructed and these are aggregated to generate author profile networks. We
    first study the similarity of writing styles between Early English playwrights
    by comparing the profile WANs. The accuracy of using WANs for authorship
    attribution is then demonstrated by attributing known plays among six popular
    playwrights. The WAN method is shown to additionally outperform other
    frequency-based methods on attributing Early English plays. This high
    classification power is then used to investigate the authorship of anonymous
    plays. Moreover, WANs are shown to be reliable classifiers even when
    attributing collaborative plays. For several plays of disputed co-authorship,
    a deeper analysis is performed by attributing every act and scene separately,
    in which we both corroborate existing breakdowns and provide evidence of new
    assignments. Finally, the impact of genre on attribution accuracy is examined
    revealing that the genre of a play partially conditions the choice of the
    function words used in it.
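
    A simplified sketch of how such a network can be built from a tokenized play;
    the function word list and window size are illustrative stand-ins, not the
    ones used in the paper.

```python
# Build a function word adjacency network from a token sequence (sketch).
from collections import defaultdict

FUNCTION_WORDS = {"the", "and", "of", "to", "in", "that", "with", "for", "a"}

def build_wan(tokens, window=5, function_words=FUNCTION_WORDS):
    """Directed edge u -> v weighted by how often v follows u within `window`
    tokens, normalized so each node's outgoing weights sum to one."""
    counts = defaultdict(float)
    positions = [(i, w) for i, w in enumerate(tokens) if w in function_words]
    for k, (i, u) in enumerate(positions):
        for j, v in positions[k + 1:]:
            if j - i > window:
                break
            counts[(u, v)] += 1.0
    out_totals = defaultdict(float)
    for (u, _), c in counts.items():
        out_totals[u] += c
    return {(u, v): c / out_totals[u] for (u, v), c in counts.items()}
```

    Author profiles can then be obtained by aggregating the WANs of all plays
    attributed to an author, and attribution amounts to comparing a play's WAN
    against each profile.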

    Vietnamese Named Entity Recognition using Token Regular Expressions and Bidirectional Inference

    Phuong Le-Hong
    Comments: Submitted to the VLSP Workshop 2016
    Subjects: Computation and Language (cs.CL)

    This paper describes an efficient approach to improve the accuracy of a named
    entity recognition system for Vietnamese. The approach combines regular
    expressions over tokens and a bidirectional inference method in a sequence
    labelling model, which achieves an overall (F_1) score of 89.66% on a test set
    of an evaluation campaign organized in late 2016 by the Vietnamese Language
    and Speech Processing (VLSP) community.

    SYSTRAN's Pure Neural Machine Translation Systems

    Josep Crego, Jungi Kim, Guillaume Klein, Anabel Rebollo, Kathy Yang, Jean Senellart, Egor Akhanov, Patrice Brunelle, Aurelien Coquard, Yongchao Deng, Satoshi Enoue, Chiyo Geiss, Joshua Johanson, Ardas Khalsa, Raoum Khiari, Byeongil Ko, Catherine Kobus, Jean Lorieux, Leidiana Martins, Dang-Chuan Nguyen, Alexandra Priori, Thomas Riccardi, Natalia Segal, Christophe Servan, Cyril Tiquet, Bo Wang, Jin Yang, Dakun Zhang, Jing Zhou, Peter Zoldan
    Subjects: Computation and Language (cs.CL)

    Since the first online demonstration of Neural Machine Translation (NMT) by
    LISA, NMT development has recently moved from laboratory to production systems
    as demonstrated by several entities announcing roll-out of NMT engines to
    replace their existing technologies. NMT systems have a large number of
    training configurations and the training process of such systems is usually
    very long, often a few weeks, so the role of experimentation is critical and
    important to share. In this work, we present our approach to production-ready
    systems simultaneously with the release of online demonstrators covering a large
    variety of languages (12 languages, for 32 language pairs). We explore
    different practical choices: an efficient and evolutive open-source framework;
    data preparation; network architecture; additional implemented features; tuning
    for production; etc. We discuss evaluation methodology, present our first
    findings, and finally outline further work.

    Our ultimate goal is to share our expertise to build competitive production
    systems for “generic” translation. We aim to contribute to setting up a
    collaborative framework to speed up adoption of the technology, foster further
    research efforts, and enable industry to deliver and adopt use-case-specific
    engines integrated in real production workflows. Mastering the technology
    would allow us to build translation engines suited to
    particular needs, outperforming current simplest/uniform systems.

    Addressing Community Question Answering in English and Arabic

    Giovanni Da San Martino, Alberto Barrón-Cedeño, Salvatore Romeo, Alessandro Moschitti, Shafiq Joty, Fahad A. Al Obaidli, Kateryna Tymoshenko, Antonio Uva
    Comments: presented at Second WebQA workshop, SIGIR2016 (this http URL)
    Subjects: Computation and Language (cs.CL)

    This paper studies the impact of different types of features applied to
    learning to re-rank questions in community Question Answering. We tested our
    models on two datasets released in SemEval-2016 Task 3 on “Community Question
    Answering”. Task 3 targeted real-life Web fora both in English and Arabic. Our
    models include bag-of-words features (BoW), syntactic tree kernels (TKs), rank
    features, embeddings, and machine translation evaluation features. To the best
    of our knowledge, structural kernels have barely been applied to the question
    reranking task, where they have to model paraphrase relations. In the case of
    the English question re-ranking task, we compare our learning to rank (L2R)
    algorithms against a strong baseline given by the Google-generated ranking
    (GR). The results show that i) the shallow structures used in our TKs are
    robust enough to handle noisy data and ii) improving GR is possible, but effective BoW
    features and TKs along with an accurate model of GR features in the used L2R
    algorithm are required. In the case of the Arabic question re-ranking task, for
    the first time we applied tree kernels on syntactic trees of Arabic sentences.
    Our approaches to both tasks obtained the second best results on SemEval-2016
    subtasks B on English and D on Arabic.

    Personalized Machine Translation: Preserving Original Author Traits

    Ella Rabinovich, Shachar Mirkin, Raj Nath Patel, Lucia Specia, Shuly Wintner
    Comments: 10 pages
    Subjects: Computation and Language (cs.CL)

    The language that we produce reflects our personality, and various personal
    and demographic characteristics can be detected in natural language texts. We
    focus on one particular personal trait, gender, and study how it is manifested
    in original texts and in translations. We show that gender has a powerful,
    clear signal in originals, but this signal is obfuscated in human and machine
    translation. We then propose simple domain-adaptation techniques that help
    retain the original gender traits in the translation outcome, without harming
    the quality of the translation, thereby creating more personalized machine
    translation systems.

    End-to-end attention-based distant speech recognition with Highway LSTM

    Hassan Taherian
    Comments: 8 pages, 2 figures
    Subjects: Computation and Language (cs.CL)

    End-to-end attention-based models have been shown to be competitive
    alternatives to conventional DNN-HMM models in speech recognition systems. In
    this paper, we extend existing end-to-end attention-based models so that they
    can be applied to the Distant Speech Recognition (DSR) task. Specifically, we
    propose an end-to-end attention-based speech recognizer with multichannel input
    that performs sequence prediction directly at the character level. To gain
    better performance, we also incorporate Highway long short-term memory (HLSTM),
    which outperforms previous models on the AMI distant speech recognition task.

    The infochemical core

    Antoni Hernández-Fernández, Ramon Ferrer-i-Cancho
    Journal-ref: The infochemical core. Journal of Quantitative Linguistics 23 (2),
    133-153 (2016)
    Subjects: Neurons and Cognition (q-bio.NC); Computation and Language (cs.CL)

    Vocalizations and, less often, gestures have been the object of linguistic
    research for decades. However, the development of a general theory of
    communication with human language as a particular case requires a clear
    understanding of the organization of communication through other means.
    Infochemicals are chemical compounds that carry information and are employed by
    small organisms that cannot emit acoustic signals of optimal frequency to
    achieve successful communication. Here the distribution of infochemicals across
    species is investigated when they are ranked by their degree or the number of
    species with which it is associated (because they produce or they are sensitive
    to it). The quality of the fit of different functions to the dependency between
    degree and rank is evaluated with a penalty for the number of parameters of the
    function. Surprisingly, a double Zipf (a Zipf distribution with two regimes
    with a different exponent each) is the model yielding the best fit although it
    is the function with the largest number of parameters. This suggests that the
    worldwide repertoire of infochemicals contains a chemical nucleus shared by
    many species and reminiscent of the core vocabularies found for human language
    in dictionaries or large corpora.
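
    The double Zipf model amounts to a piecewise power law in log-log space. A
    rough sketch of fitting it to rank-degree data (without the parameter-count
    penalty that the authors use for model selection) could look as follows.

```python
# Sketch: fit a two-regime (double Zipf) power law to rank-degree data.
import numpy as np
from scipy.optimize import curve_fit

def double_zipf(log_rank, log_break, c, alpha1, alpha2):
    """Two Zipf regimes with exponents alpha1/alpha2, continuous at the break."""
    first = c - alpha1 * log_rank
    second = c - alpha1 * log_break - alpha2 * (log_rank - log_break)
    return np.where(log_rank <= log_break, first, second)

def fit_double_zipf(degrees):
    deg = np.sort(np.asarray(degrees, dtype=float))[::-1]  # rank by decreasing degree
    log_rank = np.log(np.arange(1, len(deg) + 1))
    log_deg = np.log(deg)                                   # assumes positive degrees
    p0 = [np.median(log_rank), log_deg[0], 1.0, 2.0]        # crude initial guess
    params, _ = curve_fit(double_zipf, log_rank, log_deg, p0=p0)
    return params  # (log_break, c, alpha1, alpha2)
```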

    A General Framework for Content-enhanced Network Representation Learning

    Xiaofei Sun, Jiang Guo, Xiao Ding, Ting Liu
    Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Learning (cs.LG)

    This paper investigates the problem of network embedding, which aims at
    learning low-dimensional vector representation of nodes in networks. Most
    existing network embedding methods rely solely on the network structure, i.e.,
    the linkage relationships between nodes, but ignore the rich content
    information associated with it, which is common in real world networks and
    beneficial to describing the characteristics of a node. In this paper, we
    propose content-enhanced network embedding (CENE), which is capable of jointly
    leveraging the network structure and the content information. Our approach
    integrates text modeling and structure modeling in a general framework by
    treating the content information as a special kind of node. Experiments on
    several real world networks with application to node classification show that
    our models outperform all existing network embedding methods, demonstrating the
    merits of content information and joint learning.


    Distributed, Parallel, and Cluster Computing

    Distributed Computation of Mixing Time

    Anisur Rahaman Molla, Gopal Pandurangan
    Comments: To appear in the Proceedings of ICDCN 2017, 10 pages
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

    The mixing time of a graph is an important metric, which is not only useful
    in analyzing connectivity and expansion properties of the network, but also
    serves as a key parameter in designing efficient algorithms. We present an
    efficient distributed algorithm for computing the mixing time of undirected
    graphs. Our algorithm estimates the mixing time (τ_s) (with respect to a
    source node (s)) of any (n)-node undirected graph in (O(τ_s log n)) rounds.
    Our algorithm is based on random walks, requires very little memory, uses
    lightweight local computations, and works in the CONGEST model. Hence our
    algorithm is scalable under bandwidth constraints and can be a helpful
    building block in the design of topologically aware networks.
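
    A centralized sketch of the quantity being estimated may help: the number of
    lazy random walk steps from the source until the walk distribution is close to
    the stationary one in total variation. The closeness threshold below, and of
    course the distributed CONGEST implementation, differ from the paper.

```python
# Centralized reference computation of a source-specific mixing time (sketch).
import numpy as np

def estimate_mixing_time(adj, source, eps=0.25, max_steps=10_000):
    """adj: symmetric 0/1 adjacency matrix of a connected undirected graph."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    P = 0.5 * np.eye(len(deg)) + 0.5 * adj / deg[:, None]  # lazy random walk
    pi = deg / deg.sum()                                   # stationary distribution
    p = np.zeros(len(deg))
    p[source] = 1.0
    for t in range(1, max_steps + 1):
        p = p @ P
        if 0.5 * np.abs(p - pi).sum() <= eps:              # total variation distance
            return t
    return max_steps
```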

    High-performance K-means Implementation based on a Coarse-grained Map-Reduce Architecture

    Zhehao Li, Jifang Jin, Lingli Wang
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Hardware Architecture (cs.AR)

    The k-means algorithm is one of the most common clustering algorithms and
    widely used in data mining and pattern recognition. The increasing
    computational requirement of big data applications makes hardware acceleration
    for the k-means algorithm necessary. In this paper, a coarse-grained Map-Reduce
    architecture is proposed to implement the k-means algorithm on an FPGA.
    Algorithmic segmentation, data path elaboration and automatic control are
    applied to optimize the architecture for high performance. In addition, a
    high-level synthesis technique is utilized to reduce development cycles and
    complexity. For a single iteration of the k-means algorithm, a throughput of
    28.74 Gbps is achieved. The performance shows at least a 3.93x speedup compared
    with four representative existing FPGA-based implementations and can satisfy
    the demand of big data applications.
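
    A software sketch of the coarse-grained map/reduce split of one k-means pass
    (the paper implements this datapath on an FPGA; the code below only shows the
    decomposition, with illustrative names):

```python
# Map: assign each point to its nearest centroid and emit partial sums.
# Reduce: aggregate the partial sums into new centroids.
import numpy as np
from collections import defaultdict

def kmeans_mapreduce(points, centroids, iterations=10):
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float).copy()
    for _ in range(iterations):
        emitted = defaultdict(lambda: [np.zeros(centroids.shape[1]), 0])
        for x in points:                                   # map phase
            k = int(np.argmin(((centroids - x) ** 2).sum(axis=1)))
            emitted[k][0] += x
            emitted[k][1] += 1
        for k, (s, n) in emitted.items():                  # reduce phase
            centroids[k] = s / n
    return centroids
```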

    Analysis and Implementation of an Asynchronous Optimization Algorithm for the Parameter Server

    Arda Aytekin, Hamid Reza Feyzmahdavian, Mikael Johansson
    Comments: 10 pages, 3 figures
    Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Machine Learning (stat.ML)

    This paper presents an asynchronous incremental aggregated gradient algorithm
    and its implementation in a parameter server framework for solving regularized
    optimization problems. The algorithm can handle both general convex (possibly
    non-smooth) regularizers and general convex constraints. When the empirical
    data loss is strongly convex, we establish a linear convergence rate, give
    explicit expressions for step-size choices that guarantee convergence to the
    optimum, and bound the associated convergence factors. The expressions have an
    explicit dependence on the degree of asynchrony and recover classical results
    under synchronous operation. Simulations and implementations on commercial
    compute clouds validate our findings.


    Learning

    Online Contrastive Divergence with Generative Replay: Experience Replay without Storing Data

    Decebal Constantin Mocanu, Maria Torres Vega, Eric Eaton, Peter Stone, Antonio Liotta
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Conceived in the early 1990s, Experience Replay (ER) has been shown to be a
    successful mechanism to allow online learning algorithms to reuse past
    experiences. Traditionally, ER can be applied to all machine learning paradigms
    (i.e., unsupervised, supervised, and reinforcement learning). Recently, ER has
    contributed to improving the performance of deep reinforcement learning. Yet,
    its application to many practical settings is still limited by the memory
    requirements of ER, necessary to explicitly store previous observations. To
    remedy this issue, we explore a novel approach, Online Contrastive Divergence
    with Generative Replay (OCD_GR), which uses the generative capability of
    Restricted Boltzmann Machines (RBMs) instead of recorded past experiences. The
    RBM is trained online, and does not require the system to store any of the
    observed data points. We compare OCD_GR to ER on 9 real-world datasets,
    considering a worst-case scenario (data points arriving in sorted order) as
    well as a more realistic one (sequential random-order data points). Our results
    show that in 64.28% of the cases OCD_GR outperforms ER and in the remaining
    35.72% it has an almost equal performance, while having a considerably reduced
    space complexity (i.e., memory usage) at a comparable time complexity.

    Federated Learning: Strategies for Improving Communication Efficiency

    Jakub Konečný, H. Brendan McMahan, Felix X. Yu, Peter Richtárik, Ananda Theertha Suresh, Dave Bacon
    Subjects: Learning (cs.LG)

    Federated Learning is a machine learning setting where the goal is to train a
    high-quality centralized model with training data distributed over a large
    number of clients each with unreliable and relatively slow network connections.
    We consider learning algorithms for this setting where on each round, each
    client independently computes an update to the current model based on its local
    data, and communicates this update to a central server, where the client-side
    updates are aggregated to compute a new global model. The typical clients in
    this setting are mobile phones, and communication efficiency is of utmost
    importance. In this paper, we propose two ways to reduce the uplink
    communication costs. The proposed methods are evaluated on the application of
    training a deep neural network to perform image classification. Our best
    approach reduces the upload communication required to train a reasonable model
    by two orders of magnitude.
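
    For orientation, a minimal sketch of one communication round in this setting
    is shown below; the update compression schemes that are the paper's actual
    contribution are not included, and all names and defaults are ours.

```python
# One federated round: sampled clients compute local updates, the server averages.
import numpy as np

def federated_round(global_weights, client_datasets, local_update,
                    sample_frac=0.1, rng=None):
    """global_weights: flat numpy array; local_update(weights, data) -> weights."""
    rng = rng or np.random.default_rng()
    n_clients = len(client_datasets)
    chosen = rng.choice(n_clients, max(1, int(sample_frac * n_clients)),
                        replace=False)
    total = sum(len(client_datasets[c]) for c in chosen)
    new_weights = np.zeros_like(global_weights)
    for c in chosen:
        w_c = local_update(global_weights.copy(), client_datasets[c])
        # Weight each client's model by its share of the sampled data.
        new_weights += (len(client_datasets[c]) / total) * w_c
    return new_weights
```

    The uplink cost of such a round is dominated by sending each client's update
    to the server, which is exactly what the paper's compression techniques target.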

    Provably Good Early Detection of Diseases using Non-Sparse Covariance-Regularized Linear Discriminant Analysis

    Haoyi Xiong, Yanjie Fu, Wenqing Hu, Guanling Chen, Laura E. Barnes
    Subjects: Learning (cs.LG)

    To improve the performance of Linear Discriminant Analysis (LDA) for early
    detection of diseases using Electronic Health Records (EHR) data, we propose
    TheName — a novel framework for EHR-based Early Detection of Diseases built
    on top of Covariance-Regularized LDA models. Specifically, TheName employs a
    non-sparse inverse covariance matrix (namely, precision matrix) estimator
    derived from graphical lasso and incorporates the estimator into LDA
    classifiers to improve classification accuracy. Theoretical analysis of
    TheName shows that it can bound the expected error rate of LDA
    classification, under certain assumptions. Finally, we conducted extensive
    experiments using a large-scale real-world EHR dataset — CHSN. We compared our
    solution with other regularized LDA and downstream classifiers. The results
    show that TheName outperforms all baselines and support our theoretical analysis.
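
    A rough sketch of the general recipe, plugging a graphical-lasso precision
    estimate into a binary LDA discriminant with scikit-learn; the paper's
    estimator, theory and evaluation are more involved, and the regularization
    strength below is an arbitrary choice.

```python
# Covariance-regularized LDA sketch: graphical lasso precision + LDA discriminant.
import numpy as np
from sklearn.covariance import GraphicalLasso

def covariance_regularized_lda(X, y, alpha=0.05):
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    pooled = np.vstack([X0 - mu0, X1 - mu1])   # within-class centred data
    precision = GraphicalLasso(alpha=alpha).fit(pooled).precision_
    w = precision @ (mu1 - mu0)                # LDA direction with regularized precision
    b = -0.5 * (mu1 + mu0) @ w + np.log(len(X1) / len(X0))
    return lambda X_new: (np.asarray(X_new) @ w + b > 0).astype(int)
```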

    Sequential Learning without Feedback

    Manjesh Hanawal, Csaba Szepesvari, Venkatesh Saligrama
    Subjects: Learning (cs.LG)

    In many security and healthcare systems a sequence of features/sensors/tests
    are used for detection and diagnosis. Each test outputs a prediction of the
    latent state, and carries with it inherent costs. Our objective is to learn
    strategies for selecting tests to optimize accuracy & costs.
    Unfortunately it is often impossible to acquire in-situ ground truth
    annotations and we are left with the problem of unsupervised sensor selection
    (USS). We pose USS as a version of stochastic partial monitoring problem with
    an unusual reward structure (even noisy annotations are unavailable).
    Unsurprisingly no learner can achieve sublinear regret without further
    assumptions. To this end we propose the notion of weak-dominance. This is a
    condition on the joint probability distribution of test outputs and latent
    state and says that whenever a test is accurate on an example, a later test in
    the sequence is likely to be accurate as well. We empirically verify that weak
    dominance holds on real datasets and prove that it is a maximal condition for
    achieving sublinear regret. We reduce USS to a special case of multi-armed
    bandit problem with side information and develop polynomial time algorithms
    that achieve sublinear regret.

    Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data

    Nicolas Papernot, Martín Abadi, Úlfar Erlingsson, Ian Goodfellow, Kunal Talwar
    Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Learning (cs.LG)

    Some machine learning applications involve training data that is sensitive,
    such as the medical histories of patients in a clinical trial. A model may
    inadvertently and implicitly store some of its training data; careful analysis
    of the model may therefore reveal sensitive information.

    To address this problem, we demonstrate a generally applicable approach to
    providing strong privacy guarantees for training data. The approach combines,
    in a black-box fashion, multiple models trained with disjoint datasets, such as
    records from different subsets of users. Because they rely directly on
    sensitive data, these models are not published, but instead used as teachers
    for a student model. The student learns to predict an output chosen by noisy
    voting among all of the teachers, and cannot directly access an individual
    teacher or the underlying data or parameters. The student’s privacy properties
    can be understood both intuitively (since no single teacher and thus no single
    dataset dictates the student’s training) and formally, in terms of differential
    privacy. These properties hold even if an adversary can not only query the
    student but also inspect its internal workings.

    Compared with previous work, the approach imposes only weak assumptions on
    how teachers are trained: it applies to any model, including non-convex models
    like DNNs. We achieve state-of-the-art privacy/utility trade-offs on MNIST and
    SVHN thanks to an improved privacy analysis and semi-supervised learning.
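
    The noisy voting step can be sketched in a few lines; the noise scale below is
    illustrative rather than the calibrated value from the paper's privacy
    analysis.

```python
# Aggregate teacher predictions for one query with a noisy argmax (sketch).
import numpy as np

def noisy_teacher_vote(teacher_preds, n_classes, gamma=0.05, rng=None):
    """teacher_preds: integer class predictions, one per teacher;
    gamma: inverse scale of the Laplace noise added to the vote counts."""
    rng = rng or np.random.default_rng()
    counts = np.bincount(teacher_preds, minlength=n_classes).astype(float)
    counts += rng.laplace(loc=0.0, scale=1.0 / gamma, size=n_classes)
    return int(np.argmax(counts))
```

    The student is then trained on inputs labeled by this noisy aggregate, never
    seeing an individual teacher or its data.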

    Deep Amortized Inference for Probabilistic Programs

    Daniel Ritchie, Paul Horsfall, Noah D. Goodman
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Probabilistic programming languages (PPLs) are a powerful modeling tool, able
    to represent any computable probability distribution. Unfortunately,
    probabilistic program inference is often intractable, and existing PPLs mostly
    rely on expensive, approximate sampling-based methods. To alleviate this
    problem, one could try to learn from past inferences, so that future inferences
    run faster. This strategy is known as amortized inference; it has recently been
    applied to Bayesian networks and deep generative models. This paper proposes a
    system for amortized inference in PPLs. In our system, amortization comes in
    the form of a parameterized guide program. Guide programs have similar
    structure to the original program, but can have richer data flow, including
    neural network components. These networks can be optimized so that the guide
    approximately samples from the posterior distribution defined by the original
    program. We present a flexible interface for defining guide programs and a
    stochastic gradient-based scheme for optimizing guide parameters, as well as
    some preliminary results on automatically deriving guide programs. We explore
    in detail the common machine learning pattern in which a ‘local’ model is
    specified by ‘global’ random values and used to generate independent observed
    data points; this gives rise to amortized local inference supporting global
    model learning.

    Fast L1-NMF for Multiple Parametric Model Estimation

    Mariano Tepper, Guillermo Sapiro
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    In this work we introduce a comprehensive algorithmic pipeline for multiple
    parametric model estimation. The proposed approach analyzes the information
    produced by a random sampling algorithm (e.g., RANSAC) from a machine
    learning/optimization perspective, using a parameterless biclustering
    algorithm based on L1 nonnegative matrix factorization (L1-NMF). The proposed
    framework exploits consistent patterns that naturally arise during the RANSAC
    execution, while explicitly avoiding spurious inconsistencies. Contrary to
    the main trends in the literature, the proposed technique does not impose
    non-intersecting parametric models. A new accelerated algorithm to compute
    L1-NMFs makes it possible to handle medium-sized problems faster while also extending the
    usability of the algorithm to much larger datasets. This accelerated algorithm
    has applications in any other context where an L1-NMF is needed, beyond the
    biclustering approach to parameter estimation here addressed. We accompany the
    algorithmic presentation with theoretical foundations and numerous and diverse
    examples.

    Feasibility Based-Large Margin Nearest Neighbor Metric Learning

    Babak Hosseini, Barbara Hammer
    Comments: This is the preprint of a submitted conference paper as provided by the authors
    Subjects: Data Structures and Algorithms (cs.DS); Learning (cs.LG)

    In the area of data classification, one of the prominent algorithms is the
    large margin nearest neighbor (LMNN) approach, which is a metric learning method
    to enhance the performance of the popular k-nearest neighbor classifier. In
    principle, LMNN learns a more efficient metric in the input space by using a
    linear mapping as the outcome of a convex optimization problem. However, one of
    the greatest weak points of LMNN is the strong reliance of its optimization
    paradigm on how the neighboring points are chosen. In this paper, it is
    mathematically proved for the first time that the regular way of choosing the
    target points can lead to non-feasible optimization conditions regardless of
    the number of chosen neighboring points. We present a mathematical approach to
    categorize the target points into feasible and infeasible exemplars, and we also
    provide a feasibility measure for the preference of target candidates. In our
    proposed Feasibility Based-LMNN algorithm, we use the above clue to construct
    the optimization problem based on the most promising general mapping directions
    in the input space. Our empirical results shows that via using the proposed
    FB-LMNN approach the optimization problem will converge in a better optimum
    point, and therefor leads to better classification on the well-known benchmark
    datasets.
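
    For readers unfamiliar with the terminology, the sketch below reproduces the
    standard LMNN ingredients the abstract refers to: target neighbors (the k
    nearest same-class points under the initial Euclidean metric) and impostors
    that violate the unit margin. The paper's feasibility categorization of these
    target candidates is not reproduced here.

    import numpy as np

    def targets_and_impostors(X, y, k=3):
        n = len(X)
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances
        np.fill_diagonal(d2, np.inf)
        targets, n_impostors = {}, 0
        for i in range(n):
            same = np.where(y == y[i])[0]
            same = same[same != i]
            targets[i] = same[np.argsort(d2[i, same])[:k]]    # k nearest same-class points
            diff = np.where(y != y[i])[0]
            for j in targets[i]:
                # impostor l: different label with d(i,l)^2 <= d(i,j)^2 + 1 (margin violation)
                n_impostors += int(np.sum(d2[i, diff] <= d2[i, j] + 1.0))
        return targets, n_impostors

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (30, 2)), rng.normal(1.5, 1.0, (30, 2))])
    y = np.array([0] * 30 + [1] * 30)
    targets, violations = targets_and_impostors(X, y)
    print("margin violations under the initial Euclidean metric:", violations)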

    Low-rank and Sparse Soft Targets to Learn Better DNN Acoustic Models

    Pranay Dighe, Afsaneh Asaei, Herve Bourlard
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Learning (cs.LG)

    Conventional deep neural networks (DNN) for speech acoustic modeling rely on
    Gaussian mixture models (GMM) and hidden Markov model (HMM) to obtain binary
    class labels as the targets for DNN training. Subword classes in speech
    recognition systems correspond to context-dependent tied states or senones. The
    present work addresses some limitations of GMM-HMM senone alignments for DNN
    training. We hypothesize that the senone probabilities obtained from a DNN
    trained with binary labels can provide more accurate targets to learn better
    acoustic models. However, DNN outputs bear inaccuracies which are exhibited as
    high dimensional unstructured noise, whereas the informative components are
    structured and low-dimensional. We exploit principal component analysis (PCA)
    and sparse coding to characterize the senone subspaces. Enhanced probabilities
    obtained from low-rank and sparse reconstructions are used as soft-targets for
    DNN acoustic modeling, which also enables training with untranscribed data.
    Experiments conducted on the AMI corpus show a 4.6% relative reduction in word
    error rate.
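
    A minimal sketch of the low-rank half of this enhancement step is shown below
    (PCA reconstruction only, applied to synthetic low-rank posteriors; the
    sparse-coding branch and the DNN retraining are omitted).

    import numpy as np

    def low_rank_soft_targets(posteriors, rank):
        """posteriors: (frames x senones) row-stochastic matrix from a first-pass DNN."""
        mean = posteriors.mean(axis=0)
        U, s, Vt = np.linalg.svd(posteriors - mean, full_matrices=False)
        recon = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mean    # rank-limited reconstruction
        recon = np.clip(recon, 1e-8, None)                     # keep probabilities positive
        return recon / recon.sum(axis=1, keepdims=True)        # renormalize each frame

    rng = np.random.default_rng(0)
    # synthetic "true" posteriors with low-rank structure: 1000 frames, 50 senones
    clean = rng.dirichlet(np.full(10, 0.2), 1000) @ rng.dirichlet(np.full(50, 0.1), 10)
    noisy = np.clip(clean + rng.normal(0, 0.01, clean.shape), 1e-8, None)
    noisy /= noisy.sum(axis=1, keepdims=True)
    soft = low_rank_soft_targets(noisy, rank=10)
    print("rms error of noisy posteriors:     ", np.sqrt(((noisy - clean) ** 2).mean()))
    print("rms error of low-rank soft targets:", np.sqrt(((soft - clean) ** 2).mean()))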

    Markov Chain Truncation for Doubly-Intractable Inference

    Colin Wei, Iain Murray
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Computing partition functions, the normalizing constants of probability
    distributions, is often hard. Variants of importance sampling give unbiased
    estimates of a normalizer Z; however, unbiased estimates of the reciprocal 1/Z
    are harder to obtain. Unbiased estimates of 1/Z allow Markov chain Monte Carlo
    sampling of “doubly-intractable” distributions, such as the parameter posterior
    for Markov Random Fields or Exponential Random Graphs. We demonstrate how to
    construct unbiased estimates for 1/Z given access to black-box importance
    sampling estimators for Z. We adapt recent work on random series truncation and
    Markov chain coupling, producing estimators with lower variance and a higher
    percentage of positive estimates than before. Our debiasing algorithms are
    simple to implement, and have some theoretical and empirical advantages over
    existing methods.
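
    The sketch below shows the generic random-series-truncation construction that
    the paper refines (this plain version can return negative estimates; the
    paper's coupled variant is designed to lower the variance and the rate of
    negative draws). It uses the identity 1/Z = (1/c) * sum_k (1 - Z/c)^k, valid
    for 0 < Z < 2c, replaces each factor by an independent unbiased estimate of Z,
    and truncates at a geometric level reweighted by its survival probabilities.

    import numpy as np

    rng = np.random.default_rng(0)

    def z_hat():
        # stand-in black-box unbiased estimator of Z (here the true Z is 2.0)
        return 2.0 + rng.normal(0, 0.2)

    def reciprocal_estimate(c=4.0, cont=0.7):
        # truncation level N with P(N >= k) = cont**k; term k is reweighted by cont**-k
        est, prod, k = 1.0 / c, 1.0, 0
        while rng.random() < cont:
            k += 1
            prod *= 1.0 - z_hat() / c
            est += prod / (c * cont**k)
        return est

    samples = [reciprocal_estimate() for _ in range(200000)]
    print("mean of estimates:", np.mean(samples), " true 1/Z:", 0.5)
    print("fraction of negative estimates:", np.mean(np.array(samples) < 0))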

    Stylometric Analysis of Early Modern Period English Plays

    Mark Eisen, Santiago Segarra, Gabriel Egan, Alejandro Ribeiro
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Function word adjacency networks (WANs) are used to study the authorship of
    plays from the Early Modern English period. In these networks, nodes are
    function words and directed edges between two nodes represent the likelihood of
    ordered co-appearance of the two words. For every analyzed play a WAN is
    constructed and these are aggregated to generate author profile networks. We
    first study the similarity of writing styles between Early English playwrights
    by comparing the profile WANs. The accuracy of using WANs for authorship
    attribution is then demonstrated by attributing known plays among six popular
    playwrights. The WAN method is shown to additionally outperform other
    frequency-based methods on attributing Early English plays. This high
    classification power is then used to investigate the authorship of anonymous
    plays. Moreover, WANs are shown to be reliable classifiers even when
    attributing collaborative plays. For several plays of disputed co-authorship,
    a deeper analysis is performed by attributing every act and scene separately,
    in which we both corroborate existing breakdowns and provide evidence of new
    assignments. Finally, the impact of genre on attribution accuracy is examined
    revealing that the genre of a play partially conditions the choice of the
    function words used in it.
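
    A toy sketch of how such a profile network can be built and compared is given
    below; the function-word list, co-occurrence window, and divergence measure are
    illustrative rather than the paper's exact construction.

    import numpy as np

    FUNCTION_WORDS = ["the", "and", "of", "to", "in", "a", "that", "with"]
    IDX = {w: i for i, w in enumerate(FUNCTION_WORDS)}

    def wan(text, window=5):
        # directed edge (u, v): how often function word v follows u within the window
        tokens = text.lower().split()
        M = np.zeros((len(FUNCTION_WORDS), len(FUNCTION_WORDS)))
        for i, u in enumerate(tokens):
            if u not in IDX:
                continue
            for v in tokens[i + 1 : i + 1 + window]:
                if v in IDX:
                    M[IDX[u], IDX[v]] += 1
        M += 1e-6                                    # smoothing for empty rows
        return M / M.sum(axis=1, keepdims=True)      # row-stochastic transition matrix

    def divergence(P, Q):
        # average per-row KL divergence between two profile WANs
        return np.mean(np.sum(P * np.log(P / Q), axis=1))

    play_a = "the king and the queen walk to the castle in a storm that rages"
    play_b = "a storm that rages in the night and the king walks to a castle"
    print("stylistic distance:", divergence(wan(play_a), wan(play_b)))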

    Analysis and Implementation of an Asynchronous Optimization Algorithm for the Parameter Server

    Arda Aytekin, Hamid Reza Feyzmahdavian, Mikael Johansson
    Comments: 10 pages, 3 figures
    Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Machine Learning (stat.ML)

    This paper presents an asynchronous incremental aggregated gradient algorithm
    and its implementation in a parameter server framework for solving regularized
    optimization problems. The algorithm can handle both general convex (possibly
    non-smooth) regularizers and general convex constraints. When the empirical
    data loss is strongly convex, we establish a linear convergence rate, give
    explicit expressions for step-size choices that guarantee convergence to the
    optimum, and bound the associated convergence factors. The expressions have an
    explicit dependence on the degree of asynchrony and recover classical results
    under synchronous operation. Simulations and implementations on commercial
    compute clouds validate our findings.
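
    The following is a serial simulation of the algorithmic pattern (incremental
    aggregated gradients computed at stale iterates plus a proximal step for an l1
    regularizer); it is a sketch of the idea under an artificial fixed delay, not
    the paper's parameter-server implementation.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, lam, delay = 20, 5, 0.01, 3
    A, b = rng.normal(size=(n, d)), rng.normal(size=n)

    def prox_l1(v, t):                   # proximal operator of t * ||.||_1
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    x = np.zeros(d)
    history = [x.copy()]                 # past iterates, used to emulate stale workers
    grad_table = np.array([A[i] * (A[i] @ x - b[i]) for i in range(n)])
    step = 0.5 / (np.linalg.norm(A, 2) ** 2 / n)     # ~1/(2L) for the smooth part

    for it in range(500):
        i = it % n
        # "worker" i reports a gradient computed at an out-of-date iterate
        x_stale = history[max(0, len(history) - 1 - delay)]
        grad_table[i] = A[i] * (A[i] @ x_stale - b[i])
        x = prox_l1(x - step * grad_table.mean(axis=0), step * lam)
        history.append(x.copy())

    print("final objective:",
          0.5 * np.mean((A @ x - b) ** 2) + lam * np.abs(x).sum())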

    An Interactive Machine Learning Framework

    Teng Lee, James Johnson, Steve Cheng
    Subjects: Human-Computer Interaction (cs.HC); Learning (cs.LG)

    Machine learning (ML) is believed to be an effective and efficient tool for
    building reliable prediction models or extracting useful structure from an
    avalanche of data. However, ML is also criticized for its difficulty of
    interpretation and complicated parameter tuning. In contrast, visualization is
    able to organize and visually encode the entangled information in data and
    guide audiences to simpler perceptual inferences and analytic thinking. But large
    scale and high dimensional data will usually lead to the failure of many
    visualization methods. In this paper, we close a loop between ML and
    visualization via interaction between ML algorithm and users, so machine
    intelligence and human intelligence can cooperate and improve each other in a
    mutually rewarding way. In particular, we propose the “transparent boosting
    tree (TBT)”, which visualizes both the model structure and the prediction
    statistics of each step in the learning process of a gradient boosting tree to
    the user, and incorporates the user’s feedback operations on the trees into the
    learning process. In TBT, ML is in charge of updating the weights in the
    learning model and filtering the information shown to the user from the big
    data, while visualization is in charge of providing a visual understanding of
    the ML model to facilitate user exploration. It combines
    the advantages of both ML in big data statistics and human in decision making
    based on domain knowledge. We develop a user-friendly interface for this novel
    learning method, and apply it to two datasets collected from real applications.
    Our study shows that making ML transparent by using interactive visualization
    can significantly improve the exploration of ML algorithms, give rise to novel
    insights into ML models, and integrate machine and human intelligence.

    Modern WLAN Fingerprinting Indoor Positioning Methods and Deployment Challenges

    Ali Khalajmehrabadi, Nikolaos Gatsis, David Akopian
    Subjects: Networking and Internet Architecture (cs.NI); Computational Engineering, Finance, and Science (cs.CE); Learning (cs.LG)

    Wireless Local Area Networks (WLANs) have become a promising choice for indoor
    positioning, as the only existing and established infrastructure able to
    localize mobile and stationary users indoors. However, since WLAN was initially
    designed for wireless networking and not for positioning, localization based on
    WLAN signals poses several challenges. Amongst the WLAN positioning methods,
    WLAN fingerprinting localization has recently attracted great attention due to
    its promising results. WLAN fingerprinting nevertheless faces several practical
    challenges and hence, in this paper, our goal is to overview these challenges
    and the state-of-the-art solutions. This paper consists of three main parts: 1)
    Conventional localization schemes; 2) State-of-the-art approaches; 3) Practical
    deployment challenges. Since the methods proposed in the WLAN literature have
    been developed and tested in different settings, the reported results are not
    directly comparable. We therefore compare some of the main localization schemes
    in a single real environment and assess their localization accuracy,
    positioning error statistics, and complexity. Our results provide an
    illustrative evaluation of WLAN localization systems and point to future
    improvement opportunities.

    Structured Group Sparsity: A Novel Indoor WLAN Localization, Outlier Detection, and Radio Map Interpolation Scheme

    Ali Khalajmehrabadi, Nikolaos Gatsis, David Akopian
    Subjects: Networking and Internet Architecture (cs.NI); Computational Engineering, Finance, and Science (cs.CE); Learning (cs.LG)

    This paper introduces novel schemes for indoor localization, outlier
    detection, and radio map interpolation using Wireless Local Area Networks
    (WLANs). The localization method consists of a novel multicomponent
    optimization technique that minimizes the squared ℓ2-norm of the
    residuals between the radio map and the online Received Signal Strength (RSS)
    measurements, the ℓ1-norm of the user’s location vector, and weighted
    ℓ2-norms of layered groups of Reference Points (RPs). RPs are grouped
    using a new criterion based on the similarity between the so-called Access
    Point (AP) coverage vectors. In addition, since AP readings are prone to
    containing anomalous readings, called outliers, an augmented optimization
    problem is proposed to detect the outliers and localize the user with cleaned
    online measurements. Moreover, a novel scheme to record fingerprints from a
    smaller number of RPs and estimate the radio map at RPs without recorded
    fingerprints is developed using sparse recovery techniques. All localization
    schemes are tested on RSS fingerprints collected from a real environment. The
    overall scheme has comparable complexity with competing approaches, while it
    performs with high accuracy under a small number of APs and finer granularity
    of RPs.

    A Joint Indoor WLAN Localization and Outlier Detection Scheme Using LASSO and Elastic-Net Optimization Techniques

    Ali Khalajmehrabadi, Nikolaos Gatsis, Daniel Pack, David Akopian
    Subjects: Networking and Internet Architecture (cs.NI); Computational Engineering, Finance, and Science (cs.CE); Learning (cs.LG)

    In this paper, we introduce two indoor Wireless Local Area Network (WLAN)
    positioning methods using augmented sparse recovery algorithms. These schemes
    render a sparse user’s position vector, and in parallel, minimize the distance
    between the online measurement and radio map. The overall localization scheme
    for both methods consists of three steps: 1) coarse localization, obtained from
    comparing the online measurements with the clustered radio map. A novel graph-based
    method is proposed to cluster the offline fingerprints. In the online phase, a
    Region Of Interest (ROI) is selected within which we search for the user’s
    location; 2) Access Point (AP) selection; and 3) fine localization through the
    novel sparse recovery algorithms. Since the online measurements are subject to
    inordinate measurement readings, called outliers, the sparse recovery methods
    are modified in order to jointly estimate the outliers and user’s position
    vector. The outlier detection procedure identifies the APs whose readings are
    either not available or erroneous. The proposed localization methods have been
    tested with Received Signal Strength (RSS) measurements in a typical office
    environment, and the results show that they can localize the user with
    accuracy and resolution significantly higher than those of competing WLAN
    fingerprinting localization methods.
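
    A hedged sketch of the joint localization-plus-outlier idea is shown below,
    with illustrative sizes and weights: the position is modeled as a convex
    combination over Reference Points, a separate l1-penalized vector absorbs
    outlying AP readings, and the problem is solved with a generic convex solver
    (cvxpy) rather than the paper's algorithms.

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    n_ap, n_rp = 15, 40
    radio_map = rng.uniform(-90, -40, (n_ap, n_rp))    # offline RSS fingerprints (dBm)
    true_rp = 7
    y = radio_map[:, true_rp] + rng.normal(0, 1, n_ap) # online measurement taken at RP 7
    y[3] -= 30.0                                       # AP 3 reports an outlying value

    theta = cp.Variable(n_rp, nonneg=True)             # position weights over the RPs
    e = cp.Variable(n_ap)                              # sparse outlier vector
    objective = cp.Minimize(cp.sum_squares(y - radio_map @ theta - e) + 5.0 * cp.norm1(e))
    cp.Problem(objective, [cp.sum(theta) == 1]).solve()
    print("estimated RP:", int(np.argmax(theta.value)),
          "  detected outlier AP:", int(np.argmax(np.abs(e.value))))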


    Information Theory

    Uniform Recovery from Subgaussian Multi-Sensor Measurements

    Il Yong Chu, Ben Adcock
    Comments: 39 pages, 7 figures
    Subjects: Information Theory (cs.IT); Functional Analysis (math.FA)

    Parallel acquisition systems are employed successfully in a variety of
    different sensing applications when a single sensor cannot provide enough
    measurements for a high-quality reconstruction. In this paper, we consider
    compressed sensing (CS) for parallel acquisition systems when the individual
    sensors use subgaussian random sampling. Our main results are a series of
    uniform recovery guarantees which relate the number of measurements required to
    the basis in which the solution is sparse and certain characteristics of the
    multi-sensor system, known as sensor profile matrices. In particular, we derive
    sufficient conditions for optimal recovery, in the sense that the number of
    measurements required per sensor decreases linearly with the total number of
    sensors, and demonstrate explicit examples of multi-sensor systems for which
    this holds. We establish these results by proving the so-called Asymmetric
    Restricted Isometry Property (ARIP) for the sensing system and use this to
    derive both nonuniversal and universal recovery guarantees. Compared to
    existing work, our results not only lead to better stability and robustness
    estimates but also provide simpler and sharper constants in the measurement
    conditions. Finally, we show how the problem of CS with block-diagonal sensing
    matrices can be viewed as a particular case of our multi-sensor framework.
    Specializing our results to this setting leads to a new recovery guarantee
    depending only on the incoherence of the sparsity basis. This improves existing
    results in the literature and confirms the existence of deterministic sparsity
    bases for which CS with subgaussian block-diagonal matrices has a comparable
    recovery guarantee to CS with full subgaussian matrices.

    Packet Error Rate Analysis of Uncoded Schemes in Block-Fading Channels using Extreme Value Theory

    Aamir Mahmood, Riku Jäntti
    Subjects: Information Theory (cs.IT)

    We present a generic approximation of the packet error rate (PER) function of
    uncoded schemes in the AWGN channel using extreme value theory (EVT). The PER
    function can assume both the exponential and the Gaussian Q-function bit error
    rate (BER) forms. The EVT approach leads to a closed-form approximation of the
    average PER in block-fading channels that is best in terms of accuracy and
    computational efficiency. The numerical analysis shows that the approximation
    remains tight for any value of SNR and packet length, whereas earlier studies
    approximate the average PER only at asymptotic SNRs and packet lengths.
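
    For intuition, the snippet below contrasts the numerically exact average PER of
    uncoded BPSK in Rayleigh block fading with a standard threshold-style
    approximation (threshold taken as the integral of the conditional PER over
    SNR); it only illustrates the quantity being approximated and is not the
    paper's EVT-based closed form.

    import numpy as np
    from scipy.stats import norm
    from scipy.integrate import quad

    L = 128                                            # packet length in bits
    per = lambda g: 1.0 - (1.0 - norm.sf(np.sqrt(2.0 * g))) ** L   # PER at SNR g (BPSK)

    g_t, _ = quad(per, 0, 50)                          # "waterfall" SNR threshold
    for mean_snr_db in [5, 10, 15, 20]:
        gbar = 10 ** (mean_snr_db / 10)
        exact, _ = quad(lambda g: per(g) * np.exp(-g / gbar) / gbar, 0, 50)
        approx = 1.0 - np.exp(-g_t / gbar)             # P(SNR < threshold) under Rayleigh fading
        print(f"{mean_snr_db:2d} dB: exact {exact:.4f}   threshold approx {approx:.4f}")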

    Tractable Framework for the Analysis of General Multi-Tier Heterogeneous Cellular Networks

    Serkan Ak, Hazer Inaltekin, H. Vincent Poor
    Comments: arXiv admin note: substantial text overlap with arXiv:1601.06023, arXiv:1601.07315
    Subjects: Information Theory (cs.IT)

    This paper investigates the downlink performance of K-tier heterogeneous
    cellular networks (HCNs) under general settings. First, Gaussian approximation
    bounds for the standardized aggregate wireless interference (AWI) in dense
    K-tier HCNs are obtained when base stations (BSs) in each tier are
    distributed over the plane according to a general spatial Poisson point
    process. The Kolmogorov-Smirnov (KS) distance is used to measure deviations of
    the distribution of the standardized AWI from the standard normal distribution.
    An explicit and analytical expression bounding the KS distance between these
    two distributions is obtained as a function of a broad range of network
    parameters such as per-tier transmission power levels, per-tier BS intensity,
    BS locations, general fading statistics, and general bounded path-loss models.
    The bounds achieve a good statistical match between the standardized AWI
    distribution and its normal approximation even for moderately dense HCNs.
    Second, various spatial performance metrics of interest such as outage
    capacity, ergodic capacity and area spectral efficiency in the downlink of
    K-tier HCNs for general signal propagation models are investigated by making
    use of the derived distribution approximation results. Considering two specific
    BS association policies, it is shown that the derived performance bounds track
    the actual performance metrics reasonably well for a wide range of BS
    intensities, with the gap among them becoming negligibly small for denser HCN
    deployments.
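
    A Monte Carlo sketch of the kind of Gaussian approximation being bounded is
    given below (a single-tier Poisson field with Rayleigh fading and a bounded
    path-loss law; the density, radius, and path-loss exponent are illustrative),
    with the KS distance to the standard normal estimated empirically.

    import numpy as np
    from scipy.stats import kstest

    rng = np.random.default_rng(0)
    lam, R = 20.0, 5.0                                 # BS density per unit area, network radius
    pathloss = lambda r: 1.0 / (1.0 + r ** 3.5)        # bounded path-loss model

    def aggregate_interference():
        n = rng.poisson(lam * np.pi * R ** 2)          # number of BSs in the disc
        r = R * np.sqrt(rng.uniform(size=n))           # radii of uniform points in the disc
        h = rng.exponential(size=n)                    # Rayleigh fading (power)
        return np.sum(h * pathloss(r))

    samples = np.array([aggregate_interference() for _ in range(5000)])
    standardized = (samples - samples.mean()) / samples.std()
    print("KS distance to N(0,1):", kstest(standardized, "norm").statistic)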

    An SMDP Approach to Optimal PHY Configuration in Wireless Networks

    Mark Shifrin, Daniel S. Menasché, Asaf Cohen, Omer Gurewitz, Dennis Goeckel
    Comments: arXiv admin note: text overlap with arXiv:1601.06859
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    In this work, we study the optimal configuration of the physical layer in
    wireless networks by means of Semi-Markov Decision Process (SMDP) modeling. In
    particular, assume the physical layer is characterized by a set of potential
    operating points, with each point corresponding to a rate and reliability pair;
    for example, these pairs might be obtained through a now-standard
    diversity-vs-multiplexing trade-off characterization. Given the current network
    state (e.g., buffer occupancies), a Decision Maker (DM) needs to dynamically
    decide which operating point to use. The SMDP problem formulation allows us to
    choose from these pairs an optimal selection, which is expressed by a decision
    rule as a function of the number of packets awaiting in the finite queue at the
    source, the channel state, and the size of the packet to be transmitted. We
    derive a general
    solution which covers various model configurations, including packet size
    distributions and varying channels. For the specific case of exponential
    transmission time, we analytically prove the optimal policy has a threshold
    structure. Numerical results validate this finding, as well as depict
    multi-threshold policies for time-varying channels such as the Gilbert-Elliott
    channel.
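
    The toy value iteration below illustrates this threshold structure in a
    discrete-time caricature of the problem (two hypothetical operating points and
    illustrative arrival, holding-cost, and drop-penalty numbers; not the paper's
    SMDP): with these numbers the costlier, more reliable operating point tends to
    be selected only above a queue-length threshold.

    import numpy as np

    B = 15                                    # buffer capacity (packets)
    p_arr = 0.3                               # packet arrival probability per slot
    points = [(0.5, 0.2), (0.9, 1.0)]         # operating points: (success prob, power cost)
    hold_cost, drop_cost, gamma = 0.1, 5.0, 0.98

    V = np.zeros(B + 1)
    for _ in range(2000):                     # value iteration
        Q = np.zeros((B + 1, len(points)))
        for s in range(B + 1):
            for a, (succ, power) in enumerate(points):
                cost = hold_cost * s + (power if s > 0 else 0.0)
                future = 0.0
                deps = [(0, 1.0)] if s == 0 else [(1, succ), (0, 1.0 - succ)]
                for dep, p_dep in deps:
                    for arr, p_a in [(1, p_arr), (0, 1.0 - p_arr)]:
                        s2 = s - dep + arr
                        penalty = drop_cost if s2 > B else 0.0   # arrival lost when full
                        future += p_dep * p_a * (penalty + V[min(B, s2)])
                Q[s, a] = cost + gamma * future
        V = Q.min(axis=1)
    print("operating point per queue length:", Q.argmin(axis=1))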

    Auxiliary Beam Pair Enabled AoD and AoA Estimation in Closed-Loop Large-Scale mmWave MIMO System

    Dalin Zhu, Junil Choi, Robert W. Heath Jr
    Comments: Submitted to IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    Channel estimation is of critical importance in millimeter-wave (mmWave)
    multiple-input multiple-output (MIMO) systems. Due to the use of large antenna
    arrays, low-complexity mmWave specific channel estimation algorithms are
    required. In this paper, an auxiliary beam pair design is proposed to provide
    high-resolution estimates of the channel’s angle-of-departure (AoD) and
    angle-of-arrival (AoA) for mmWave MIMO systems. By performing an amplitude
    comparison with respect to each auxiliary beam pair, a set of ratio measures
    that characterize the channel’s AoD and AoA are obtained by the receiver.
    Either the best ratio measure or the estimated AoD is quantized and fed back to
    the transmitter via a feedback channel. The proposed technique can be
    incorporated into control channel design to minimize initial access delay.
    Though the design principles are derived assuming a high-power regime,
    evaluations under more realistic assumptions show that by employing the proposed
    method, good angle estimation performance is achieved under various
    signal-to-noise ratio levels and channel conditions.
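
    A toy amplitude-comparison sketch of the auxiliary-beam-pair idea is shown
    below: two beams are steered slightly to either side of a probing direction,
    and the normalized difference of their received amplitudes is inverted through
    a lookup table to recover the angle at a resolution much finer than the beam
    spacing (the array size, beam offset, and noise level are illustrative).

    import numpy as np

    N = 16                                             # ULA antennas, half-wavelength spacing

    def steer(ang):                                    # array response for angle `ang` (radians)
        return np.exp(1j * np.pi * np.arange(N) * np.sin(ang)) / np.sqrt(N)

    def beam_ratio(ang, center, offset=np.deg2rad(3)):
        # amplitudes received on the two auxiliary beams steered to center -/+ offset
        g1 = np.abs(np.vdot(steer(center - offset), steer(ang)))
        g2 = np.abs(np.vdot(steer(center + offset), steer(ang)))
        return (g1 - g2) / (g1 + g2)

    center = np.deg2rad(10)
    grid = np.deg2rad(np.linspace(7, 13, 2001))        # candidate angles between the two beams
    table = np.array([beam_ratio(a, center) for a in grid])

    true_angle = np.deg2rad(11.3)
    observed = beam_ratio(true_angle, center) + 0.01 * np.random.default_rng(0).normal()
    estimate = grid[np.argmin(np.abs(table - observed))]
    print("estimated angle (deg):", round(float(np.rad2deg(estimate)), 2), "  true: 11.3")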

    Conic Quadratic Formulations for Wireless Communications Design

    Quang-Doanh Vu, Markku Juntti, Een-Kee Hong, Le-Nam Tran
    Comments: Submitted for possible publication, 13 pages, 9 figures
    Subjects: Information Theory (cs.IT)

    As a wide class of resource management problems in wireless communications
    are nonconvex and even NP-hard in many cases, finding globally optimal
    solutions to these problems is of little practical interest. Towards more
    pragmatic approaches, there is a rich literature on iterative methods aiming at
    finding a solution satisfying necessary optimality conditions to these
    problems. These approaches have been derived under several similar mathematical
    frameworks such as inner approximation algorithm, concave-convex procedure,
    majorization-minimization algorithm, and successive convex approximation (SCA).
    However, a large portion of existing algorithms arrive at a relatively generic
    program at each iteration, which is less computationally efficient compared to
    a more standard convex formulation. The purpose of this paper is to present
    useful transformations and approximations to deal with nonconvexity in wireless
    communications design problems. More specifically, the proposed formulations
    can approximate nonconvex problems by a series of second-order cone programs
    (SOCPs) in light of the SCA framework. We revisit various design problems in
    wireless communications to demonstrate the advantages of the proposed idea.
    Theoretical analysis and numerical results show that our proposed solutions are
    superior to the existing ones in terms of computational cost.

    On Optimizing Hierarchical Modulation in AWGN channel

    Baicen Xiao, Kexin Xiao, Zhiyong Chen, Hui Liu
    Comments: Submitted to IEEE ICC 2017
    Subjects: Information Theory (cs.IT)

    Hierarchical modulation (HM) is able to provide different levels of
    protection for data streams and achieve a rate region that cannot be realized
    by traditional orthogonal schemes, such as time division (TD). Nevertheless,
    criteria and algorithms for general HM design are not available in the existing
    literature. In this paper, we jointly optimize the constellation positions and
    binary labels for HM to be used in the additive white Gaussian noise (AWGN)
    channel. Based on bit-interleaved coded modulation (BICM) with successive
    interference cancellation (SIC) capacity, our main purpose is to maximize the
    rate of one data stream, subject to power constraints and the constraint that
    the rates of the other data streams are larger than given thresholds. A
    multi-start interior-point algorithm is used to solve the constellation
    optimization problems, and methods to reduce the optimization complexity are
    also proposed in
    this paper. Numerical results verify the performance gains of optimized HM
    compared with optimized quadrature amplitude modulation (QAM) based HM and
    other orthogonal transmission methods.

    Optimal Communication Scheduling and Remote Estimation over an Additive Noise Channel

    Xiaobin Gao, Emrah Akyol, Tamer Basar
    Comments: Extended version of a journal submission
    Subjects: Optimization and Control (math.OC); Information Theory (cs.IT); Systems and Control (cs.SY)

    This paper considers a sequential sensor scheduling and remote estimation
    problem with one sensor and one estimator. The sensor makes sequential
    observations about the state of an underlying memoryless stochastic process and
    makes a decision as to whether or not to send this measurement to the
    estimator. The sensor and the estimator have the common objective of minimizing
    expected distortion in the estimation of the state of the process, over a
    finite time horizon. The sensor is either charged a cost for each transmission
    or constrained on transmission times. As opposed to the prior work where
    communication between the sensor and the estimator was assumed to be perfect
    (noiseless), in this work an additive noise channel with fixed power constraint
    is considered; hence, the sensor has to encode its message before transmission.
    Under some technical assumptions, we obtain the optimal encoding and estimation
    policies in conjunction with the optimal transmission schedule. The impact of
    the presence of a noisy channel is analyzed numerically based on dynamic
    programming. This analysis yields some rather surprising results such as a
    phase transition phenomenon in the number of used transmission opportunities,
    which was not encountered in the noiseless communication setting.



