
    arXiv Paper Daily: Mon, 27 Mar 2017

    Published by 我爱机器学习 (52ml.net) on 2017-03-27 00:00:00

    Neural and Evolutionary Computing

    PonyGE2: Grammatical Evolution in Python

    Michael Fenton, James McDermott, David Fagan, Stefan Forstenlechner, Michael O'Neill, Erik Hemberg
    Comments: 8 pages, 4 figures, submitted to the 2017 GECCO Workshop on Evolutionary Computation Software Systems (EvoSoft)
    Subjects: Neural and Evolutionary Computing (cs.NE)

    Grammatical Evolution (GE) is a population-based evolutionary algorithm,
    where a formal grammar is used in the genotype to phenotype mapping process.
    PonyGE2 is an open source implementation of GE in Python, developed at UCD’s
    Natural Computing Research and Applications group. It is intended as an
    advertisement and a starting-point for those new to GE, a reference for
    students and researchers, a rapid-prototyping medium for our own experiments,
    and a Python workout. As well as providing the characteristic genotype to
    phenotype mapping of GE, a search algorithm engine is also provided. A number
    of sample problems and tutorials on how to use and adapt PonyGE2 have been
    developed.
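
    As a concrete illustration of the genotype-to-phenotype mapping described
    above, here is a minimal Python sketch of the standard GE mapping (purely
    illustrative, not PonyGE2's actual API): each integer codon selects, modulo
    the number of available productions, a rule for the leftmost non-terminal,
    wrapping around the genome a bounded number of times.

        # Toy grammar in BNF-like form: non-terminals map to lists of productions.
        GRAMMAR = {
            "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"]],
            "<op>":   [["+"], ["-"], ["*"]],
            "<var>":  [["x"], ["y"]],
        }

        def map_genotype(genome, start="<expr>", max_wraps=2):
            """Derive a phenotype string from a list of integer codons."""
            derivation = [start]
            used, wraps = 0, 0
            while any(s in GRAMMAR for s in derivation):
                if used == len(genome):          # genome exhausted: wrap around
                    used, wraps = 0, wraps + 1
                    if wraps > max_wraps:
                        return None              # mapping failed: invalid individual
                i = next(k for k, s in enumerate(derivation) if s in GRAMMAR)
                choices = GRAMMAR[derivation[i]]
                derivation[i:i + 1] = choices[genome[used] % len(choices)]
                used += 1
            return " ".join(derivation)

        print(map_genotype([0, 1, 2, 1, 0, 1, 0]))  # -> "x - x + x"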

    Long-Term Evolution of Genetic Programming Populations

    W. B. Langdon
    Comments: Longer version of Langdon:2017:GECCO, July 2017, ACM, Berlin, RN/17/05 this http URL
    Subjects: Neural and Evolutionary Computing (cs.NE)

    We evolve binary mux-6 trees for up to 100000 generations, evolving some
    programs with more than a hundred million nodes. Our unbounded Long-Term
    Evolution Experiment (LTEE) GP appears not to evolve building blocks but does
    suggest a limit to bloat. We do see periods of tens, even hundreds, of
    generations where the population is 100 percent functionally converged. The
    distribution of tree sizes is not as predicted by theory.


    Computer Vision and Pattern Recognition

    Radiomics strategies for risk assessment of tumour failure in head-and-neck cancer

    Martin Vallières (1), Emily Kay-Rivest (2), Léo Jean Perrin (3), Xavier Liem (4), Christophe Furstoss (5), Hugo J. W. L. Aerts (6), Nader Khaouam (5), Phuc Felix Nguyen-Tan (4), Chang-Shu Wang (3), Khalil Sultanem (2), Jan Seuntjens (1), Issam El Naqa (7) ((1) Medical Physics Unit, McGill University, Montréal, Canada, (2) Radiation Oncology Division, Hôpital général juif, Montréal, Canada, (3) Department of Radiation Oncology, Centre hospitalier universitaire de Sherbrooke, Montréal, Canada, (4) Department of Radiation Oncology, Centre hospitalier de l'Université de Montréal, Montréal, Canada, (5) Department of Radiation Oncology, Hôpital Maisonneuve-Rosemont, Montréal, Canada, (6) Departments of Radiation Oncology & Radiology, Dana-Farber Cancer Institute, Boston, USA, (7) Department of Radiation Oncology, Physics Division, University of Michigan, Ann Arbor, USA)
    Comments: (1) Paper: 33 pages, 4 figures, 1 table; (2) SUPP info: 41 pages, 7 figures, 8 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Quantitative extraction of high-dimensional mineable data from medical images
    is a process known as radiomics. Radiomics is foreseen as an essential
    prognostic tool for cancer risk assessment and the quantification of
    intratumoural heterogeneity. In this work, 1615 radiomic features (quantifying
    tumour image intensity, shape, texture) extracted from pre-treatment FDG-PET
    and CT images of 300 patients from four different cohorts were analyzed for the
    risk assessment of locoregional recurrences (LR) and distant metastases (DM) in
    head-and-neck cancer. Prediction models combining radiomic and clinical
    variables were constructed via random forests and imbalance-adjustment
    strategies using two of the four cohorts. Independent validation of the
    prediction and prognostic performance of the models was carried out on the
    other two cohorts (LR: AUC = 0.69 and CI = 0.67; DM: AUC = 0.86 and CI = 0.88).
    Furthermore, the results obtained via Kaplan-Meier analysis demonstrated the
    potential of radiomics for assessing the risk of specific tumour outcomes using
    multiple stratification groups. This could have important clinical impact,
    notably by allowing for a better personalization of chemo-radiation treatments
    for head-and-neck cancer patients from different risk groups.
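
    As a loose illustration of the modelling strategy described above (a sketch
    on synthetic placeholder data, not the authors' actual pipeline), a random
    forest with a simple class-imbalance adjustment can be trained over radiomic
    feature vectors and evaluated by AUC:

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(0)
        X_train = rng.normal(size=(200, 1615))          # 1615 radiomic features per patient
        y_train = (rng.random(200) < 0.2).astype(int)   # rare outcome, e.g. recurrence
        X_test = rng.normal(size=(100, 1615))
        y_test = (rng.random(100) < 0.2).astype(int)

        # class_weight="balanced" is one simple imbalance-adjustment strategy.
        clf = RandomForestClassifier(n_estimators=500, class_weight="balanced",
                                     random_state=0)
        clf.fit(X_train, y_train)
        print("validation AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))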

    Local Deep Neural Networks for Age and Gender Classification

    Zukang Liao, Stavros Petridis, Maja Pantic
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Local deep neural networks have recently been introduced for gender
    recognition. Although they achieve very good performance, they are very
    computationally expensive to train. In this work, we introduce a simplified
    version of local deep neural networks which significantly reduces the training
    time. Instead of using hundreds of patches per image, as suggested by the
    original method, we propose to use 9 overlapping patches per image which cover
    the entire face region. This results in a much reduced training time, since
    just 9 patches are extracted per image instead of hundreds, at the expense of a
    slightly reduced performance. We tested the proposed modified local deep neural
    networks approach on the LFW and Adience databases for the task of gender and
    age classification. For both tasks and both databases the performance is up to
    1% lower compared to the original version of the algorithm. We have also
    investigated which patches are more discriminative for age and gender
    classification. It turns out that the mouth and eyes regions are useful for age
    classification, whereas just the eye region is useful for gender
    classification.
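
    The patch scheme is easy to picture: a 3x3 grid of overlapping windows
    covering the whole face region. Below is a rough numpy sketch under assumed
    geometry (each patch spans half of each image side, so neighbouring patches
    overlap by 50%); the paper's exact patch size and stride may differ.

        import numpy as np

        def nine_overlapping_patches(face, patch_frac=0.5):
            """Return 9 overlapping patches covering the face region."""
            h, w = face.shape[:2]
            ph, pw = int(h * patch_frac), int(w * patch_frac)
            ys = np.linspace(0, h - ph, 3).astype(int)   # 3 rows of anchors
            xs = np.linspace(0, w - pw, 3).astype(int)   # 3 columns of anchors
            return [face[y:y + ph, x:x + pw] for y in ys for x in xs]

        face = np.zeros((128, 128, 3), dtype=np.uint8)   # placeholder aligned face
        patches = nine_overlapping_patches(face)
        print(len(patches), patches[0].shape)            # 9 (64, 64, 3)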

    Multi-stage Multi-recursive-input Fully Convolutional Networks for Neuronal Boundary Detection

    Wei Shen, Bin Wang, Yuan Jiang, Yan Wang, Alan Yuille
    Comments: 10 pages, 7 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In the field of connectomics, neuroscientists seek to identify cortical
    connectivity comprehensively. Neuronal boundary detection from Electron
    Microscopy (EM) images is often performed to assist the automatic
    reconstruction of neuronal circuits. But the segmentation of EM images is a
    challenging problem, as it requires the detector to detect both filament-like
    thin and blob-like thick membranes, while suppressing ambiguous intracellular
    structures. In this paper, we propose multi-stage multi-recursive-input fully
    convolutional networks to address this problem. The multiple recursive inputs
    for one stage, i.e., the multiple side outputs with different receptive field
    sizes learned from the lower stage, provide multi-scale contextual boundary
    information for the consecutive learning. This design is biologically
    plausible: like the human visual system, it compares different possible
    segmentation solutions to resolve ambiguous boundaries. Our multi-stage
    networks are trained end-to-end and achieve promising results on a publicly
    available mouse piriform cortex dataset, significantly outperforming other
    competitors.

    Content-Based Image Retrieval Based on Late Fusion of Binary and Local Descriptors

    Nouman Ali, Danish Ali Mazhar, Zeshan Iqbal, Rehan Ashraf, Jawad Ahmed, Farrukh Zeeshan Khan
    Journal-ref: International Journal of Computer Science and Information Security
    (IJCSIS), Volume 14, Issue 11, Publication date 2016/11
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    One of the challenges in Content-Based Image Retrieval (CBIR) is to reduce
    the semantic gaps between low-level features and high-level semantic concepts.
    In CBIR, the images are represented in the feature space and the performance of
    CBIR depends on the type of selected feature representation. Late fusion,
    also known as visual words integration, is applied to enhance the performance
    of image retrieval. Recent advances in image retrieval have diverted the
    focus of research towards binary descriptors, as they are reported to be
    computationally efficient. In this paper, we aim to investigate the late
    fusion of the Fast Retina Keypoint (FREAK) and Scale Invariant Feature
    Transform (SIFT) descriptors. This fusion of a binary and a local descriptor
    is selected because, among binary descriptors, FREAK has shown good results
    in classification-based problems, while SIFT is robust to translation,
    scaling, rotation and small distortions. The late fusion of FREAK and SIFT
    integrates the strengths of both feature descriptors for effective image
    retrieval. Experimental results and comparisons show that the proposed late
    fusion enhances the performance of image retrieval.
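
    A hedged sketch of such a late fusion is given below: SIFT and FREAK
    descriptors are quantized against two separately fitted visual vocabularies
    (KMeans models, assumed trained beforehand on a descriptor corpus) and the
    normalized histograms are concatenated. It assumes an OpenCV build that
    exposes SIFT and the contrib FREAK descriptor, and clustering binary FREAK
    descriptors with Euclidean KMeans is a simplification of a real pipeline.

        import cv2
        import numpy as np

        sift = cv2.SIFT_create()
        freak = cv2.xfeatures2d.FREAK_create()   # requires the contrib modules
        fast = cv2.FastFeatureDetector_create()  # FREAK describes given keypoints

        def bow_histogram(desc, vocab):
            """Quantize descriptors against a fitted KMeans vocabulary."""
            words = vocab.predict(np.float64(desc))
            hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
            return hist / (hist.sum() + 1e-9)

        def fused_representation(gray, sift_vocab, freak_vocab):
            _, d_sift = sift.detectAndCompute(gray, None)
            kp = fast.detect(gray, None)
            _, d_freak = freak.compute(gray, kp)
            # Late fusion: concatenate the two normalized visual-word histograms.
            return np.concatenate([bow_histogram(d_sift, sift_vocab),
                                   bow_histogram(d_freak, freak_vocab)])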

    Medical Image Retrieval using Deep Convolutional Neural Network

    Adnan Qayyum, Syed Muhammad Anwar, Muhammad Awais, Muhammad Majid
    Comments: Submitted to Neurocomputing
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    With a widespread use of digital imaging data in hospitals, the size of
    medical image repositories is increasing rapidly. This causes difficulty in
    managing and querying these large databases, leading to the need for
    content-based medical image retrieval (CBMIR) systems. A major challenge in CBMIR
    systems is the semantic gap that exists between the low level visual
    information captured by imaging devices and high level semantic information
    perceived by humans. The efficacy of such systems depends crucially on
    feature representations that can fully characterize this high-level
    information. In this paper, we propose a deep learning framework for a CBMIR
    system, using a deep Convolutional Neural Network (CNN) that is trained for
    classification of medical images. An intermodal dataset that contains
    twenty-four classes and five modalities is used to train the network. The
    learned features and the classification results are used to retrieve medical
    images. For retrieval, the best results are achieved when class-based
    predictions are used. An average classification accuracy of 99.77% is
    achieved, along with a mean average precision of 0.69 for the retrieval
    task. The proposed method is best suited to
    retrieve multimodal medical images for different body organs.
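
    The class-based retrieval step described here can be sketched as a
    nearest-neighbour search restricted to database images whose predicted class
    matches the query's; the features, labels and dimensions below are
    placeholders:

        import numpy as np

        def retrieve(query_feat, query_class, db_feats, db_classes, k=5):
            """Return indices of the k nearest database images in the query's class."""
            idx = np.where(db_classes == query_class)[0]
            d = np.linalg.norm(db_feats[idx] - query_feat, axis=1)
            return idx[np.argsort(d)[:k]]

        db_feats = np.random.rand(1000, 4096)        # e.g. CNN penultimate-layer features
        db_classes = np.random.randint(0, 24, 1000)  # 24 modality/organ classes
        query_feat, query_class = np.random.rand(4096), 7
        print(retrieve(query_feat, query_class, db_feats, db_classes))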

    Object Region Mining with Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach

    Yunchao Wei, Jiashi Feng, Xiaodan Liang, Ming-Ming Cheng, Yao Zhao, Shuicheng Yan
    Comments: Accepted to appear in CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We investigate a principled way to progressively mine discriminative object
    regions using classification networks to address the weakly-supervised semantic
    segmentation problems. Classification networks are only responsive to small and
    sparse discriminative regions from the object of interest, which deviates from
    the requirement of the segmentation task that needs to localize dense, interior
    and integral regions for pixel-wise inference. To mitigate this gap, we propose
    a new adversarial erasing approach for localizing and expanding object regions
    progressively. Starting with a single small object region, our proposed
    approach drives the classification network to sequentially discover new and
    complementary object regions by erasing the currently mined regions in an
    adversarial manner. These localized regions eventually constitute a dense and
    complete object region for learning semantic segmentation. To further enhance
    the quality of the discovered regions by adversarial erasing, an online
    prohibitive segmentation learning approach is developed to collaborate with
    adversarial erasing by providing auxiliary segmentation supervision modulated
    by the more reliable classification scores. Despite its apparent simplicity,
    the proposed approach achieves 55.0% and 55.7% mean Intersection-over-Union
    (mIoU) scores on the PASCAL VOC 2012 val and test sets respectively, setting
    a new state of the art.
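
    The erasing loop itself is simple to sketch. In the toy version below,
    compute_cam is a placeholder stand-in for the localization map of a trained
    classification network; in the actual method the network is retrained over
    the whole dataset at each step, and the erased regions supervise a
    segmentation branch.

        import numpy as np

        def compute_cam(img):
            """Placeholder localization map: in the paper this comes from a
            classification network (e.g. CAM); here it just highlights the
            brightest region so the loop below is runnable."""
            g = img.mean(axis=2)
            return (g - g.min()) / (np.ptp(g) + 1e-9)

        def adversarial_erasing(image, steps=3, threshold=0.8):
            mined = np.zeros(image.shape[:2], dtype=bool)
            img = image.astype(float).copy()
            for _ in range(steps):
                cam = compute_cam(img)               # most discriminative region now
                region = cam > threshold
                mined |= region                      # accumulate mined object regions
                img[region] = img.mean(axis=(0, 1))  # erase it, forcing new regions
            return mined                             # dense mask for segmentation training

        img = np.random.rand(64, 64, 3)
        print(adversarial_erasing(img).sum(), "pixels mined")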

    DeepVisage: Making face recognition simple yet with powerful generalization skills

    Abul Hasnat, Julien Bohné, Stéphane Gentric, Liming Chen
    Comments: 9 pages, under review
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Face recognition (FR) methods report significant performance by adopting
    convolutional neural network (CNN) based learning methods. Although CNNs are
    mostly trained by optimizing the softmax loss, the recent trend shows an
    improvement of accuracy with different strategies, such as task-specific CNN
    learning with different loss functions, fine-tuning on target dataset, metric
    learning and concatenating features from multiple CNNs. Incorporating these
    tasks obviously requires additional effort. Moreover, it discourages the
    discovery of efficient CNN models for FR that are trained only with identity
    labels. We focus on this fact and propose an easily trainable and single CNN
    based FR method. Our CNN model exploits the residual learning framework.
    Additionally, it uses normalized features to compute the loss. Our extensive
    experiments show excellent generalization on different datasets. We obtain very
    competitive and state-of-the-art results on the LFW, IJB-A, YouTube faces and
    CACD datasets.
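
    The normalized-feature idea can be sketched as an L2-normalized (and
    rescaled) embedding feeding a plain softmax classifier; the dimensions and
    scale below are placeholders rather than the paper's exact architecture:

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class NormalizedSoftmaxHead(nn.Module):
            def __init__(self, feat_dim, n_identities, scale=24.0):
                super().__init__()
                self.scale = scale                      # keeps logits in a trainable range
                self.fc = nn.Linear(feat_dim, n_identities)

            def forward(self, features):
                z = F.normalize(features, p=2, dim=1)   # unit-length face embedding
                return self.fc(self.scale * z)          # logits for cross-entropy loss

        head = NormalizedSoftmaxHead(512, 10000)
        print(head(torch.randn(8, 512)).shape)          # torch.Size([8, 10000])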

    Feature Fusion using Extended Jaccard Graph and Stochastic Gradient Descent for Robot

    Shenglan Liu, Muxin Sun, Wei Wang, Feilong Wang
    Comments: Assembly Automation
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Robotics (cs.RO)

    Robot vision is fundamental to human-robot interaction and complex robot
    tasks. In this paper, we use Kinect and propose a feature graph fusion
    (FGF) method for robot recognition. Our feature fusion utilizes RGB and depth
    information from Kinect to construct a fused feature. FGF employs
    multi-Jaccard similarity to compute a robust graph and utilizes a word
    embedding method to enhance the recognition results. We also collect a DUT
    RGB-D face dataset and a benchmark dataset to evaluate the effectiveness and
    efficiency of our method. The experimental results illustrate that FGF is
    robust and effective on face and object datasets in robot applications.

    A Hybrid Deep Learning Approach for Texture Analysis

    Hussein Adly, Mohamed Moustafa
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Texture classification is a problem that has various applications, such as
    remote sensing and forest species recognition. Solutions tend to be custom
    fit to the dataset used but fail to generalize. A Convolutional Neural
    Network (CNN) in combination with a Support Vector Machine (SVM) pairs a
    powerful invariant feature extractor with an accurate classifier. This
    fusion of experts provides stability in classification rates across
    different datasets.
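
    A hedged sketch of the CNN+SVM hybrid follows: a (here, placeholder) CNN
    feature extractor feeds a linear SVM classifier. In practice the features
    would come from a pretrained network's penultimate layer rather than the
    flattening stub used below.

        import numpy as np
        from sklearn.svm import SVC

        def cnn_features(images):
            """Placeholder: in practice, activations from a pretrained CNN's
            penultimate layer (e.g. via torchvision) would be returned here."""
            return images.reshape(len(images), -1)

        X_train = np.random.rand(100, 32, 32)   # texture patches (synthetic data)
        y_train = np.random.randint(0, 5, 100)
        svm = SVC(kernel="linear").fit(cnn_features(X_train), y_train)
        print(svm.predict(cnn_features(np.random.rand(3, 32, 32))))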

    Scalable Person Re-identification on Supervised Smoothed Manifold

    Song Bai, Xiang Bai, Qi Tian
    Comments: Accepted as spotlight by CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Most existing person re-identification algorithms either extract robust
    visual features or learn discriminative metrics for person images. However, the
    underlying manifold which those images reside on is rarely investigated. That
    raises a problem that the learned metric is not smooth with respect to the
    local geometry structure of the data manifold.

    In this paper, we study person re-identification with manifold-based affinity
    learning, which has not received enough attention in this area. An
    unconventional manifold-preserving algorithm is proposed, which can 1) make the
    best use of supervision from training data, whose label information is given as
    pairwise constraints; 2) scale up to large repositories with low on-line time
    complexity; and 3) be plugged into most existing algorithms, serving as a
    generic postprocessing procedure to further boost the identification
    accuracies. Extensive experimental results on five popular person
    re-identification benchmarks consistently demonstrate the effectiveness of our
    method. In particular, on the largest benchmarks, CUHK03 and Market-1501, our
    method outperforms the state-of-the-art alternatives by a large margin with
    high efficiency, making it more appropriate for practical applications.

    Improving Classification by Improving Labelling: Introducing Probabilistic Multi-Label Object Interaction Recognition

    Michael Wray, Davide Moltisanti, Walterio Mayol-Cuevas, Dima Damen
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This work deviates from easy-to-define class boundaries for object
    interactions. For the task of object interaction recognition, often captured
    using an egocentric view, we show that semantic ambiguities in verbs and
    recognising sub-interactions along with concurrent interactions result in
    legitimate class overlaps (Figure 1). We thus aim to model the mapping between
    observations and interaction classes, as well as class overlaps, towards a
    probabilistic multi-label classifier that emulates human annotators. Given a
    video segment containing an object interaction, we model the probability for a
    verb, out of a list of possible verbs, to be used to annotate that interaction.
    The probability is learnt from crowdsourced annotations, and is tested on two
    public datasets, comprising 1405 video sequences for which we provide
    annotations on 90 verbs. We outperform conventional single-label
    classification by 11% and 6% on the two datasets respectively, and show that
    learning from annotation probabilities outperforms majority voting and enables
    discovery of co-occurring labels.

    Deep Direct Regression for Multi-Oriented Scene Text Detection

    Wenhao He, Xu-Yao Zhang, Fei Yin, Cheng-Lin Liu
    Comments: 9 pages, 9 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we first provide a new perspective to divide existing high
    performance object detection methods into direct and indirect regressions.
    Direct regression performs boundary regression by predicting the offsets from a
    given point, while indirect regression predicts the offsets from some bounding
    box proposals. Then we analyze the drawbacks of indirect regression, which
    recent state-of-the-art detection structures like Faster-RCNN and SSD follow,
    for multi-oriented scene text detection, and point out the potential
    superiority of direct regression. To verify this point of view, we propose a
    deep direct regression based method for multi-oriented scene text detection.
    Our detection framework is simple and effective with a fully convolutional
    network and one-step post processing. The fully convolutional network is
    optimized in an end-to-end way and has bi-task outputs where one is pixel-wise
    classification between text and non-text, and the other is direct regression to
    determine the vertex coordinates of quadrilateral text boundaries. The proposed
    method is particularly beneficial for localizing incidental scene texts. On the
    ICDAR2015 Incidental Scene Text benchmark, our method achieves the F1-measure
    of 81%, which is a new state-of-the-art and significantly outperforms previous
    approaches. On other standard datasets with focused scene texts, our method
    also reaches the state-of-the-art performance.

    View Adaptive Recurrent Neural Networks for High Performance Human Action Recognition from Skeleton Data

    Pengfei Zhang, Cuiling Lan, Junliang Xing, Wenjun Zeng, Jianru Xue, Nanning Zheng
    Comments: Technique report
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Skeleton-based human action recognition has recently attracted increasing
    attention due to the popularity of 3D skeleton data. One main challenge lies in
    the large view variations in captured human actions. We propose a novel view
    adaptation scheme to automatically regulate observation viewpoints during the
    occurrence of an action. Rather than re-positioning the skeletons based on a
    human defined prior criterion, we design a view adaptive recurrent neural
    network (RNN) with LSTM architecture, which enables the network itself to adapt
    to the most suitable observation viewpoints from end to end. Extensive
    experiment analyses show that the proposed view adaptive RNN model strives to
    (1) transform the skeletons of various views to much more consistent viewpoints
    and (2) maintain the continuity of the action rather than transforming every
    frame to the same position with the same body orientation. Our model achieves
    state-of-the-art performance on three benchmark datasets. On the current
    largest NTU RGB+D dataset, our scheme outperforms the state of the art by an
    impressive 6% gain in accuracy.

    Semi-Automatic Segmentation and Ultrasonic Characterization of Solid Breast Lesions

    Mohammad Saad Billah, Tahmida Binte Mahmud
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Characterization of breast lesions is an essential prerequisite to detect
    breast cancer in an early stage. Automatic segmentation makes this
    categorization method robust by freeing it from subjectivity and human error.
    Both spectral and morphometric features are successfully used for
    differentiating between benign and malignant breast lesions. In this thesis, we
    used the empirical mode decomposition method for semi-automatic segmentation.
    Sonographic features like echogenicity, heterogeneity, FNPA, margin
    definition, Hurst coefficient, compactness, roundness, aspect ratio,
    convexity, solidity and form factor were calculated to be used as our
    characterization parameters. Not all of these parameters gave the desired
    comparative results, but some of them, namely echogenicity, heterogeneity,
    margin definition, aspect ratio and convexity, gave good results and were
    used for characterization.

    Single Image Super-resolution with a Parameter Economic Residual-like Convolutional Neural Network

    Yudong Liang, Ze Yang, Kai Zhang, Yihui He, Jinjun Wang, Nanning Zheng
    Comments: Extension of the MMM 2017 paper
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recent years have witnessed the great success of convolutional neural
    networks (CNNs) for various problems in both low- and high-level vision.
    Especially noteworthy is the residual network, which was originally proposed
    to handle high-level vision problems and enjoys several merits. This paper
    aims to extend the merits of the residual network, such as the fast training
    induced by skip connections, to a typical low-level vision problem, namely
    single image super-resolution. In general, the two main challenges of
    existing deep CNNs for super-resolution lie in the gradient
    exploding/vanishing problem and the large number of parameters or
    computational cost as the CNN goes deeper. Correspondingly, skip connections
    or identity mapping shortcuts are utilized to avoid the gradient
    exploding/vanishing problem. To tackle the second problem, a
    parameter-economic CNN architecture with carefully designed width, depth and
    skip connections is proposed. Different residual-like architectures for image
    super-resolution have also been compared. Experimental results demonstrate
    that the proposed CNN model can not only achieve state-of-the-art PSNR and
    SSIM results for single image super-resolution but also produce visually
    pleasant results. This paper extends the MMM 2017 paper with more experiments
    and explanations.

    On the Robustness of Convolutional Neural Networks to Internal Architecture and Weight Perturbations

    Nicholas Cheney, Martin Schrimpf, Gabriel Kreiman
    Comments: under review at ICML 2017
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    Deep convolutional neural networks are generally regarded as robust function
    approximators. So far, this intuition is based on perturbations to external
    stimuli such as the images to be classified. Here we explore the robustness of
    convolutional neural networks to perturbations to the internal weights and
    architecture of the network itself. We show that convolutional networks are
    surprisingly robust to a number of internal perturbations in the higher
    convolutional layers but the bottom convolutional layers are much more fragile.
    For instance, AlexNet shows less than a 30% decrease in classification
    performance when randomly removing over 70% of weight connections in the top
    convolutional or dense layers, but performance is almost at chance with the same
    perturbation in the first convolutional layer. Finally, we suggest further
    investigations which could continue to inform the robustness of convolutional
    networks to internal perturbations.
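
    A minimal sketch of this kind of internal perturbation (an assumed setup,
    not the authors' exact protocol): zero out a random fraction of one layer's
    weights, then re-measure accuracy, comparing a higher layer against the
    first convolutional layer.

        import torch

        def knock_out_weights(layer, fraction=0.7):
            """Randomly remove (zero) a fraction of a layer's weight connections."""
            with torch.no_grad():
                mask = (torch.rand_like(layer.weight) >= fraction).float()
                layer.weight.mul_(mask)

        model = torch.nn.Sequential(
            torch.nn.Conv2d(3, 16, 3), torch.nn.ReLU(),
            torch.nn.Conv2d(16, 32, 3), torch.nn.ReLU(),
            torch.nn.Flatten(), torch.nn.LazyLinear(10),
        )
        knock_out_weights(model[2], fraction=0.7)   # perturb a higher conv layer
        # ...then evaluate accuracy on held-out data and compare against
        # perturbing model[0], the more fragile first convolutional layer.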


    Artificial Intelligence

    Reasoning by Cases in Structured Argumentation

    Mathieu Beirlaen, Jesse Heyninck, Christian Straßer
    Comments: Proceedings of SAC/KRR 2017
    Subjects: Artificial Intelligence (cs.AI)

    We extend the ASPIC+ framework for structured argumentation so as to allow
    applications of the reasoning-by-cases inference scheme for defeasible
    arguments. Given an argument with conclusion ‘A or B’, an argument based on A
    with conclusion C, and an argument based on B with conclusion C, we allow the
    construction of an argument with conclusion C. We show how our framework
    leads to different results than other approaches in non-monotonic logic for
    dealing with disjunctive information, such as disjunctive default theory or
    approaches based on the OR-rule (which allows one to derive a defeasible rule
    ‘If (A or B) then C’, given two defeasible rules ‘If A then C’ and ‘If B then
    C’). We raise new questions regarding the subtleties of reasoning defeasibly
    with disjunctive information, and show that its formalization is more
    intricate than one would presume.

    Smart Augmentation – Learning an Optimal Data Augmentation Strategy

    Joseph Lemley, Shabab Bazrafkan, Peter Corcoran
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    A recurring problem faced when training neural networks is that there is
    typically not enough data to maximize the generalization capability of deep
    neural networks (DNNs). There are many techniques to address this, including
    data augmentation, dropout, and transfer learning. In this paper, we introduce
    an additional method, which we call Smart Augmentation, and we show how to use
    it to increase the accuracy and reduce overfitting on a target network. Smart
    Augmentation works by creating a network that learns how to generate augmented
    data during the training process of a target network in a way that reduces
    that network's loss. This allows us to learn augmentations that minimize the
    error of that network.

    Smart Augmentation has shown the potential to increase accuracy by
    demonstrably significant measures on all datasets tested. In addition, it has
    shown potential to achieve similar or improved performance levels with
    significantly smaller network sizes in a number of tested cases.

    Overcoming Catastrophic Forgetting by Incremental Moment Matching

    Sang-Woo Lee, Jin-Hwa Kim, Jung-Woo Ha, Byoung-Tak Zhang
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Catastrophic forgetting refers to losing the information of the first task
    after training on a second task in the continual learning of neural networks.
    To resolve this problem, we propose incremental moment matching (IMM), which
    uses the Bayesian neural network framework. IMM assumes that the posterior
    distribution of the parameters of a neural network is approximated with a
    Gaussian distribution, and incrementally matches the moments of the
    posteriors, which are trained for the first and second tasks, respectively.
    To make our Gaussian assumption reasonable, the IMM procedure utilizes various
    transfer learning techniques including weight transfer, L2-norm of old and new
    parameters, and a newly proposed variant of dropout using old parameters. We
    analyze our methods on the MNIST and CIFAR-10 datasets, and then evaluate them
    on a real-world life-log dataset collected using Google Glass. Experimental
    results show that IMM produces state-of-the-art performance in a variety of
    datasets.
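
    In its simplest variant, matching the means of two Gaussian posteriors
    reduces to a weighted average of the two tasks' trained parameters. The
    sketch below shows that mean-matching step only (the paper's mode-IMM
    additionally weights by curvature, and the "task 2" weights here are a
    stand-in for actual training):

        import torch

        def mean_imm(state_dict_a, state_dict_b, alpha=0.5):
            """Merge two trained networks' parameters by weighted averaging."""
            return {k: (1 - alpha) * state_dict_a[k] + alpha * state_dict_b[k]
                    for k in state_dict_a}

        net = torch.nn.Linear(4, 2)
        sd_task1 = {k: v.clone() for k, v in net.state_dict().items()}
        sd_task2 = {k: v + 0.1 for k, v in net.state_dict().items()}  # stand-in
        net.load_state_dict(mean_imm(sd_task1, sd_task2))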

    Calendar.help: Designing a Workflow-Based Scheduling Agent with Humans in the Loop

    Justin Cranshaw, Emad Elwany, Todd Newman, Rafal Kocielnik, Bowen Yu, Sandeep Soni, Jaime Teevan, Andrés Monroy-Hernández
    Comments: 10 pages
    Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Although information workers may complain about meetings, they are an
    essential part of their work life. Consequently, busy people spend a
    significant amount of time scheduling meetings. We present Calendar.help, a
    system that provides fast, efficient scheduling through structured workflows.
    Users interact with the system via email, delegating their scheduling needs to
    the system as if it were a human personal assistant. Common scheduling
    scenarios are broken down using well-defined workflows and completed as a
    series of microtasks that are automated when possible and executed by a human
    otherwise. Unusual scenarios fall back to a trained human assistant who
    executes them as unstructured macrotasks. We describe the iterative approach we
    used to develop Calendar.help, and share the lessons learned from scheduling
    thousands of meetings during a year of real-world deployments. Our findings
    provide insight into how complex information tasks can be broken down into
    repeatable components that can be executed efficiently to improve productivity.

    Supervisor Synthesis of POMDP based on Automata Learning

    Xiaobin Zhang, Bo Wu, Hai Lin
    Subjects: Systems and Control (cs.SY); Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL)

    As a general and thus popular model for autonomous systems, partially
    observable Markov decision process (POMDP) can capture uncertainties from
    different sources like sensing noises, actuation errors, and uncertain
    environments. However, its comprehensiveness makes planning and control in
    POMDPs difficult. Traditional POMDP planning problems aim to find the optimal
    policy maximizing the expectation of accumulated rewards. But for
    safety-critical applications, guarantees of system performance described by
    formal specifications are desired, which motivates us to consider formal
    methods to synthesize supervisors for POMDPs. With system specifications given
    by Probabilistic Computation Tree Logic (PCTL), we propose a supervisory
    control framework with a type of deterministic finite automaton (DFA), the
    za-DFA, as the controller form. While existing work mainly relies on
    optimization techniques to learn fixed-size finite state controllers (FSCs),
    we develop an L* learning based algorithm to determine both the state space
    and the transitions of the za-DFA. Membership queries and different oracles
    for conjectures are defined. The learning algorithm is sound and complete. An
    example is given in detailed steps to illustrate the supervisor synthesis
    algorithm.


    Computation and Language

    Crowdsourcing Universal Part-Of-Speech Tags for Code-Switching

    Victor Soto, Julia Hirschberg
    Comments: Submitted to Interspeech 2017
    Subjects: Computation and Language (cs.CL)

    Code-switching is the phenomenon by which bilingual speakers switch between
    multiple languages during communication. The importance of developing language
    technologies for code-switching data is immense, given the large populations
    that routinely code-switch. High-quality linguistic annotations are extremely
    valuable for any NLP task, and performance is often limited by the amount of
    high-quality labeled data. However, little such data exists for code-switching.
    In this paper, we describe crowd-sourcing universal part-of-speech tags for the
    Miami Bangor Corpus of Spanish-English code-switched speech. We split the
    annotation task into three subtasks: one in which a subset of tokens are
    labeled automatically, one in which questions are specifically designed to
    disambiguate a subset of high frequency words, and a more general cascaded
    approach for the remaining data in which questions are displayed to the worker
    following a decision tree structure. Each subtask is extended and adapted for a
    multilingual setting and the universal tagset. The quality of the annotation
    process is measured using hidden check questions annotated with gold labels.
    The overall agreement between gold standard labels and the majority vote is
    between 0.95 and 0.96 for just three labels and the average recall across
    part-of-speech tags is between 0.87 and 0.99, depending on the task.

    Interactive Natural Language Acquisition in a Multi-modal Recurrent Neural Architecture

    Stefan Heinrich, Stefan Wermter
    Comments: To appear in Connection Science, 2017, 33 pages
    Subjects: Computation and Language (cs.CL); Neurons and Cognition (q-bio.NC)

    The human brain is one of the most complex dynamic systems that enables us to
    communicate in natural language. We have a good understanding of some
    principles underlying natural languages and language processing, some knowledge
    about socio-cultural conditions framing acquisition, and some insights about
    where activity is occurring in the brain. However, we are not yet able to
    understand the behavioural and mechanistic characteristics of natural
    language, nor how mechanisms in the brain allow us to acquire and process
    language.

    In an effort to bridge the gap between insights from behavioural psychology
    and neuroscience, the goal of this paper is to contribute a computational
    understanding of the appropriate characteristics that favour language
    acquisition, in a brain-inspired neural architecture. Accordingly, we provide
    concepts and refinements in cognitive modelling regarding principles and
    mechanisms in the brain – such as the hierarchical abstraction of context – in
    a plausible recurrent architecture. On this basis, we propose a
    neurocognitively plausible model for embodied language acquisition from
    real-world interaction of a humanoid robot with its environment. The model is
    capable of learning language production grounded in both temporal dynamic
    somatosensation and
    vision. In particular, the architecture consists of a continuous-time
    recurrent neural network, where parts have different leakage characteristics
    and thus operate on multiple timescales for every modality, together with an
    association of the higher-level nodes of all modalities into cell assemblies.
    Thus, this model
    features hierarchical concept abstraction in sensation as well as concept
    decomposition in production, multi-modal integration, and self-organisation of
    latent representations.

    Batch-normalized joint training for DNN-based distant speech recognition

    Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio
    Comments: arXiv admin note: text overlap with arXiv:1703.08002
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Improving distant speech recognition is a crucial step towards flexible
    human-machine interfaces. Current technology, however, still exhibits a lack of
    robustness, especially when adverse acoustic conditions are met. Despite the
    significant progress made in the last years on both speech enhancement and
    speech recognition, one potential limitation of state-of-the-art technology
    lies in composing modules that are not well matched because they are not
    trained jointly. To address this concern, a promising approach consists in
    concatenating a speech enhancement and a speech recognition deep neural network
    and to jointly update their parameters as if they were within a single bigger
    network. Unfortunately, joint training can be difficult because the output
    distribution of the speech enhancement system may change substantially during
    the optimization procedure. The speech recognition module would have to deal
    with an input distribution that is non-stationary and unnormalized. To mitigate
    this issue, we propose a joint training approach based on a fully
    batch-normalized architecture. Experiments, conducted using different datasets,
    tasks and acoustic conditions, revealed that the proposed framework
    significantly outperforms other competitive solutions, especially in
    challenging environments.
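
    Schematically, the joint architecture chains an enhancement network into a
    recognition network, with batch normalization inside both so the recognizer
    always sees a normalized input distribution, and a single loss updates both
    modules. The sketch below uses placeholder layer sizes and random tensors:

        import torch
        import torch.nn as nn

        def bn_mlp(d_in, d_hidden, d_out):
            return nn.Sequential(
                nn.Linear(d_in, d_hidden), nn.BatchNorm1d(d_hidden), nn.ReLU(),
                nn.Linear(d_hidden, d_out), nn.BatchNorm1d(d_out), nn.ReLU(),
            )

        enhance = bn_mlp(440, 1024, 440)          # noisy features -> enhanced features
        recognize = nn.Sequential(bn_mlp(440, 1024, 512), nn.Linear(512, 2000))

        noisy = torch.randn(32, 440)              # batch of distant-microphone features
        states = torch.randint(0, 2000, (32,))    # senone targets (placeholder)
        loss = nn.functional.cross_entropy(recognize(enhance(noisy)), states)
        loss.backward()                           # gradients flow through both modules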

    TokTrack: A Complete Token Provenance and Change Tracking Dataset for the English Wikipedia

    Fabian Flöck, Kenan Erdogan, Maribel Acosta
    Subjects: Computation and Language (cs.CL)

    We present a dataset that contains every instance of all tokens (~ words)
    ever written in undeleted, non-redirect English Wikipedia articles until
    October 2016, in total 13,545,349,787 instances. Each token is annotated with
    (i) the article revision it was originally created in, and (ii) lists with all
    the revisions in which the token was ever deleted and (potentially) re-added
    and re-deleted from its article, enabling a complete and straightforward
    tracking of its history. This data would be exceedingly hard for an average
    potential user to create, as (i) it is very expensive to compute and (ii)
    accurately tracking the history of each token in revisioned documents is a
    non-trivial task. Adapting a state-of-the-art algorithm, we have produced a
    dataset that allows for a range of analyses and metrics, already popular in
    research and going beyond, to be generated on complete-Wikipedia scale;
    ensuring quality and allowing researchers to forego expensive text-comparison
    computation, which so far has hindered scalable usage. We show how this data
    enables, on token-level, computation of provenance, measuring survival of
    content over time, very detailed conflict metrics, and fine-grained
    interactions of editors like partial reverts, re-additions and other metrics,
    in the process gaining several novel insights.

    Calendar.help: Designing a Workflow-Based Scheduling Agent with Humans in the Loop

    Justin Cranshaw, Emad Elwany, Todd Newman, Rafal Kocielnik, Bowen Yu, Sandeep Soni, Jaime Teevan, Andrés Monroy-Hernández
    Comments: 10 pages
    Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Although information workers may complain about meetings, they are an
    essential part of their work life. Consequently, busy people spend a
    significant amount of time scheduling meetings. We present Calendar.help, a
    system that provides fast, efficient scheduling through structured workflows.
    Users interact with the system via email, delegating their scheduling needs to
    the system as if it were a human personal assistant. Common scheduling
    scenarios are broken down using well-defined workflows and completed as a
    series of microtasks that are automated when possible and executed by a human
    otherwise. Unusual scenarios fall back to a trained human assistant who
    executes them as unstructured macrotasks. We describe the iterative approach we
    used to develop Calendar.help, and share the lessons learned from scheduling
    thousands of meetings during a year of real-world deployments. Our findings
    provide insight into how complex information tasks can be broken down into
    repeatable components that can be executed efficiently to improve productivity.

    Are crossing dependencies really scarce?

    Ramon Ferrer-i-Cancho, Carlos Gomez-Rodriguez, J.L. Esteban
    Subjects: Physics and Society (physics.soc-ph); Statistical Mechanics (cond-mat.stat-mech); Computation and Language (cs.CL); Data Analysis, Statistics and Probability (physics.data-an)

    The syntactic structure of a sentence can be modelled as a tree, where
    vertices correspond to words and edges indicate syntactic dependencies. It has
    been claimed recurrently that the number of edge crossings in real sentences is
    small. However, a baseline or null hypothesis has been lacking. Here we
    quantify the amount of crossings of real sentences and compare it to the
    predictions of a series of baselines. We conclude that crossings are really
    scarce in real sentences. Their scarcity is unexpected by the hubiness of the
    trees. Indeed, real sentences are close to linear trees, where the potential
    number of crossings is maximized.
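
    Counting crossings is straightforward once dependencies are written as pairs
    of word positions: edges (i, j) and (k, l), with endpoints sorted, cross iff
    i < k < j < l. A small sketch:

        from itertools import combinations

        def crossings(edges):
            """edges: iterable of (head, dependent) word positions, 1-indexed."""
            norm = [tuple(sorted(e)) for e in edges]
            return sum(1 for (i, j), (k, l) in combinations(norm, 2)
                       if i < k < j < l or k < i < l < j)

        # A toy sentence with one crossing pair of dependencies:
        print(crossings([(1, 3), (2, 4), (4, 5)]))   # -> 1, since (1,3) crosses (2,4)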

    Interacting Conceptual Spaces I: Grammatical Composition of Concepts

    Joe Bolt, Bob Coecke, Fabrizio Genovese, Martha Lewis, Dan Marsden, Robin Piedeleu
    Subjects: Logic in Computer Science (cs.LO); Computation and Language (cs.CL)

    The categorical compositional approach to meaning has been successfully
    applied in natural language processing, outperforming other models in
    mainstream empirical language processing tasks. We show how this approach can
    be generalized to conceptual space models of cognition. In order to do this,
    first we introduce the category of convex relations as a new setting for
    categorical compositional semantics, emphasizing the convex structure important
    to conceptual space applications. We then show how to construct conceptual
    spaces for various types such as nouns, adjectives and verbs. Finally we show
    by means of examples how concepts can be systematically combined to establish
    the meanings of composite phrases from the meanings of their constituent parts.
    This provides the mathematical underpinnings of a new compositional approach to
    cognition.


    Distributed, Parallel, and Cluster Computing

    An Algorithmic Approach to the Asynchronous Computability Theorem

    Vikram Saraph, Maurice Herlihy, Eli Gafni
    Comments: 16 pages, 2 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    The asynchronous computability theorem (ACT) uses concepts from combinatorial
    topology to characterize which tasks have wait-free solutions in read-write
    memory. A task can be expressed as a relation between two chromatic simplicial
    complexes. The theorem states that a task has a protocol (algorithm) if and
    only if there is a certain chromatic simplicial map compatible with that
    relation.

    While the original proof of the ACT relied on an involved combinatorial
    argument, Borowsky and Gafni later proposed an alternative proof that relied
    on an algorithmic construction, termed the “convergence algorithm”. The
    description of this algorithm was incomplete, and presented without proof. In
    this paper, we give the first complete description, along with a proof of
    correctness.

    Virtualization technology for distributed time sensitive domains

    Carlos Antonio Perea-Gómez
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Operating Systems (cs.OS)

    This paper reports on the state of the art of virtualization technology for
    both general-purpose domains and real-time domains. There exists no entirely
    instantaneous data transmission or transfer: there is always a delay while
    transmitting data, either in the processing or in the medium itself. However,
    most systems are designed to function appropriately within a delay tolerance.
    This delay is inevitably affected when operating with an extra layer,
    virtualization. For real-time systems it is crucial to know the temporal
    limits in order not to surpass them. Introducing virtualization in the
    real-time domain therefore requires deeper analysis, making use of techniques
    that offer results with deterministic execution times. The study of time in
    systems and its behaviour under various possible circumstances is hence key
    to properly assessing this technology in both domains, especially the
    real-time domain.

    A duality-based approach for distributed min-max optimization with application to demand side management

    Ivano Notarnicola, Mauro Franceschelli, Giuseppe Notarstefano
    Comments: arXiv admin note: substantial text overlap with arXiv:1611.09168
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (cs.SY)

    In this paper we consider a distributed optimization scenario in which a set
    of processors aims at minimizing the maximum of a collection of “separable
    convex functions” subject to local constraints. This set-up is motivated by
    peak-demand minimization problems in smart grids. Here, the goal is to minimize
    the peak value over a finite horizon with: (i) the demand at each time instant
    being the sum of contributions from different devices, and (ii) the local
    states at different time instants being coupled through local dynamics. The
    min-max structure and the double coupling (through the devices and over the
    time horizon) make this problem challenging in a distributed set-up (e.g.,
    well-known distributed dual decomposition approaches cannot be applied). We
    propose a distributed algorithm based on the combination of duality methods and
    properties from min-max optimization. Specifically, we derive a series of
    equivalent problems by introducing ad-hoc slack variables and by going back and
    forth from primal and dual formulations. On the resulting problem we apply a
    dual subgradient method, which turns out to be a distributed algorithm. We
    prove the correctness of the proposed algorithm and show its effectiveness via
    numerical computations.

    A randomized primal distributed algorithm for partitioned and big-data non-convex optimization

    Ivano Notarnicola, Giuseppe Notarstefano
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Numerical Analysis (cs.NA)

    In this paper we consider a distributed optimization scenario in which the
    aggregate objective function to minimize is partitioned, big-data and possibly
    non-convex. Specifically, we focus on a set-up in which the dimension of the
    decision variable depends on the network size as well as the number of local
    functions, but each local function handled by a node depends only on a (small)
    portion of the entire optimization variable. This problem set-up has been shown
    to appear in many interesting network application scenarios. As main paper
    contribution, we develop a simple, primal distributed algorithm to solve the
    optimization problem, based on a randomized descent approach, which works under
    asynchronous gossip communication. We prove that the proposed asynchronous
    algorithm is a proper, ad-hoc version of a coordinate descent method and thus
    converges to a stationary point. To show the effectiveness of the proposed
    algorithm, we also present numerical simulations on a non-convex quadratic
    program, which confirm the theoretical results.

    Taming Tail Latency for Erasure-coded, Distributed Storage Systems

    Vaneet Aggarwal, Abubakr O. Al-Abbasi, Jingxian Fan, Tian Lan
    Comments: 11 pages, 8 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)

    Distributed storage systems are known to be susceptible to long tails in
    response time. In modern online storage systems such as Bing, Facebook, and
    Amazon, the long tails of the service latency are of particular concern, with
    99.9th percentile response times being orders of magnitude worse than the mean.
    As erasure codes emerge as a popular technique to achieve high data reliability
    in distributed storage while attaining space efficiency, taming tail latency
    still remains an open problem due to the lack of mathematical models for
    analyzing such systems. To this end, we propose a framework for quantifying and
    optimizing tail latency in erasure-coded storage systems. In particular, we
    derive upper bounds on tail latency in closed form for arbitrary service time
    distribution and heterogeneous files. Based on the model, we formulate an
    optimization problem to jointly minimize the weighted latency tail probability
    of all files over the placement of files on the servers, and the choice of
    servers to access the requested files. The non-convex problem is solved using
    an efficient, alternating optimization algorithm. Numerical results show
    significant reduction of tail latency for erasure-coded storage systems with a
    realistic workload.

    LRC: Dependency-Aware Cache Management for Data Analytics Clusters

    Yinghao Yu, Wei Wang, Jun Zhang, Khaled Ben Letaief
    Comments: 9 pages
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Memory caches are being aggressively used in today’s data-parallel systems
    such as Spark, Tez, and Piccolo. However, prevalent systems employ rather
    simple cache management policies–notably the Least Recently Used (LRU)
    policy–that are oblivious to the application semantics of data dependency,
    expressed as a directed acyclic graph (DAG). Without this knowledge, memory
    caching can at best be performed by “guessing” the future data access patterns
    based on historical information (e.g., the access recency and/or frequency),
    which frequently results in inefficient, erroneous caching with low hit ratio
    and a long response time. In this paper, we propose a novel cache replacement
    policy, Least Reference Count (LRC), which exploits the application-specific
    DAG information to optimize the cache management. LRC evicts the cached data
    blocks whose reference count is the smallest. The reference count is defined,
    for each data block, as the number of dependent child blocks that have not been
    computed yet. We demonstrate the efficacy of LRC through both empirical
    analysis and cluster deployments against popular benchmarking workloads. Our
    Spark implementation shows that, compared with LRU, LRC speeds up typical
    applications by 60%.
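
    A toy sketch of the policy (illustrative, not the paper's Spark
    implementation): each cached block carries a reference count, i.e. the
    number of not-yet-computed child blocks in the DAG, and eviction removes the
    block with the smallest count.

        class LRCCache:
            def __init__(self, capacity):
                self.capacity = capacity
                self.blocks = {}        # block_id -> data
                self.ref_count = {}     # block_id -> number of unmaterialized children

            def put(self, block_id, data, ref_count):
                if len(self.blocks) >= self.capacity:
                    victim = min(self.blocks, key=lambda b: self.ref_count[b])
                    del self.blocks[victim], self.ref_count[victim]
                self.blocks[block_id] = data
                self.ref_count[block_id] = ref_count

            def child_computed(self, block_id):
                """Called when a dependent child of block_id is materialized."""
                if block_id in self.ref_count:
                    self.ref_count[block_id] -= 1

        cache = LRCCache(capacity=2)
        cache.put("rdd1-p0", "...", ref_count=3)   # 3 downstream blocks still need it
        cache.put("rdd2-p0", "...", ref_count=1)
        cache.put("rdd3-p0", "...", ref_count=2)   # evicts rdd2-p0 (smallest count)
        print(list(cache.blocks))                  # ['rdd1-p0', 'rdd3-p0']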

    Automatically Tuning the GCC Compiler to Optimize the Performance of Applications Running on the ARM Cortex-M3

    Craig Blackmore, Oliver Ray, Kerstin Eder
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)

    This paper introduces a novel method for automatically tuning the selection
    of compiler flags in order to optimize the performance of software that is
    intended to run on particular hardware platforms. Motivated by the rapid recent
    expansion of so-called Internet of Things (IoT) devices, we are particularly
    interested in improving the execution time of code running on embedded system
    architectures. We demonstrate the effectiveness of our approach on code
    compiled by the GNU C Compiler (GCC) for the ARM Cortex-M3 (CM3) processor; and
    we show how our method outperforms the current industry standard -O3
    optimization level across a diverse embedded system benchmark suite called
    BEEBS. We begin by conducting an investigatory study to quantify the potential
    gains by using existing iterative compilation approaches that time-intensively
    search for optimal configurations for each given benchmark. Then we adapt
    iterative compilation to output a single configuration that optimizes
    performance across the entire benchmark suite as a whole. Although this is a
    time-consuming process, our approach eventually constructs a simple variation
    of -O3, which we call -Ocm3, that realizes nearly two thirds of the known
    available gains on the CM3 architecture (beyond -O3) and significantly
    outperforms a far more complex state-of-the-art predictive method. Our approach
    suggests that 26 flags should be removed from -O3 while three other flags
    should be added. We analyze in detail two of the former and explain why turning
    them off improves performance on this processor.
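
    The search loop itself can be sketched as follows (a hedged illustration:
    the flag pool, source file and host-side timing are placeholders; the paper
    times BEEBS benchmarks on the Cortex-M3 itself and explores far more of
    GCC's flag space):

        import random
        import subprocess
        import time

        CANDIDATE_FLAGS = ["-fno-gcse", "-fno-ivopts", "-ftree-partial-pre"]  # examples

        def run_time(flags, src="bench.c"):
            subprocess.run(["gcc", "-O3", *flags, src, "-o", "bench"], check=True)
            t0 = time.perf_counter()
            subprocess.run(["./bench"], check=True)   # on-target timing in practice
            return time.perf_counter() - t0

        best_flags, best_t = [], run_time([])
        for _ in range(100):                          # random search over flag subsets
            flags = [f for f in CANDIDATE_FLAGS if random.random() < 0.5]
            t = run_time(flags)
            if t < best_t:
                best_flags, best_t = flags, t
        print(best_flags, best_t)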

    Speeding up TestU01 with the use of HTCondor

    Joshua Beizer
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Testing random number generators is a very important task that, in the resent
    past, has taken upwards of twelve hours when testing with the current agship
    testing suite TestU01. Through this paper we will discuss the possible
    performance increases to the existing random number generator testing suite
    TestU01 that are available by offering the executable to an HTCondor pool to
    execute on. We will see that with a few modifications we are able to decrease
    the running time of a sample run of Big Crush from about five and a half hours
    to only five and a half minutes. We will also see that not only the time taken
    for the testing to complete is shortened, but also the amount of time the user
    is unable to use their testing computer is reduced to almost none.
    Additionally, during this paper we will look at the standard implementation of
    TestU01 and how it compares with a preexisting Parallel version of TestU01. We
    will be comparing the performance of the standard version of TestU01 with the
    parallel version so that we have a performance baseline to test our own
    modifications against. Finally, this paper will also discuss the use of the
    distributed computing system HTCondor, and cover some basic instructions
    related to setting up and using HTCondor. HTCondor is already known to increase
    the performance of a networked group of computers by allowing super users to
    utilize additional resources from other lesser users idle workstation. We will
    relate the model recommended for HTCondor to a standard computer lab found at a
    University and explain why they are the perfect candidates for the system.

    Flare: Native Compilation for Heterogeneous Workloads in Apache Spark

    Grégory M. Essertel, Ruby Y. Tahboub, James M. Decker, Kevin J. Brown, Kunle Olukotun, Tiark Rompf
    Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF); Programming Languages (cs.PL)

    The need for modern data analytics to combine relational, procedural, and
    map-reduce-style functional processing is widely recognized. State-of-the-art
    systems like Spark have added SQL front-ends and relational query optimization,
    which promise an increase in expressiveness and performance. But how good are
    these extensions at extracting high performance from modern hardware platforms?

    While Spark has made impressive progress, we show that for relational
    workloads, there is still a significant gap compared with best-of-breed query
    engines. And when stepping outside of the relational world, query optimization
    techniques are ineffective if large parts of a computation have to be treated
    as user-defined functions (UDFs).

    We present Flare: a new back-end for Spark that brings performance closer to
    the best SQL engines, without giving up the added expressiveness of Spark. We
    demonstrate order-of-magnitude speedups both for relational workloads such as
    TPC-H, as well as for a range of machine learning kernels that combine
    relational and iterative functional processing.

    Flare achieves these results through (1) compilation to native code, (2)
    replacing parts of the Spark runtime system, and (3) extending the scope of
    optimization and code generation to large classes of UDFs.


    Learning

    Joint Modeling of Event Sequence and Time Series with Attentional Twin Recurrent Neural Networks

    Shuai Xiao, Junchi Yan, Mehrdad Farajtabar, Le Song, Xiaokang Yang, Hongyuan Zha
    Comments: 14 pages
    Subjects: Learning (cs.LG)

    A variety of real-world processes (over networks) produce sequences of data
    whose complex temporal dynamics need to be studied. In particular, the event
    timestamps can carry important information about the underlying network
    dynamics that is otherwise not available from time series evenly sampled from
    continuous signals. Moreover, in most complex processes, event sequences and
    evenly sampled time-series data interact with each other, which makes joint
    modeling of the two sources of data necessary. To tackle these problems, in
    this paper we utilize the rich framework of (temporal) point processes to
    model event data, updating the intensity function in a timely fashion via a
    pair of synergic twin Recurrent Neural Networks (RNNs). In the proposed
    architecture, the intensity function is synergistically modulated by one RNN
    with asynchronous events as input and another RNN with time series as input.
    Furthermore, to enhance the interpretability of the model, an attention
    mechanism for the neural point process is introduced. The whole model, with
    event type and timestamp prediction output layers, can be trained end-to-end
    and allows a black-box treatment for modeling the intensity. We substantiate
    the superiority of our model on synthetic data and three real-world benchmark
    datasets.
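
    To make the architecture concrete, here is a minimal sketch, in PyTorch, of
    the twin-RNN idea: one RNN consumes asynchronous events, the other an evenly
    sampled time series, and their hidden states jointly modulate a point-process
    intensity. The dimensions, GRU cells, and late fusion are illustrative
    assumptions, not the paper's exact model.

        # Minimal twin-RNN intensity sketch (PyTorch); details are assumptions.
        import torch
        import torch.nn as nn

        class TwinRNNIntensity(nn.Module):
            def __init__(self, n_event_types, d=32):
                super().__init__()
                self.embed = nn.Embedding(n_event_types, d)
                self.event_rnn = nn.GRU(d, d, batch_first=True)    # asynchronous events
                self.series_rnn = nn.GRU(1, d, batch_first=True)   # evenly sampled signal
                self.out = nn.Linear(2 * d, 1)

            def forward(self, event_types, series):
                he, _ = self.event_rnn(self.embed(event_types))    # (B, Le, d)
                hs, _ = self.series_rnn(series.unsqueeze(-1))      # (B, Ls, d)
                h = torch.cat([he[:, -1], hs[:, -1]], dim=-1)      # fuse last states
                return nn.functional.softplus(self.out(h))         # intensity > 0

        lam = TwinRNNIntensity(5)(torch.randint(0, 5, (2, 10)), torch.randn(2, 50))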

    Overcoming Catastrophic Forgetting by Incremental Moment Matching

    Sang-Woo Lee, Jin-Hwa Kim, Jung-Woo Ha, Byoung-Tak Zhang
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Catastrophic forgetting refers to a neural network losing the information
    learned on a first task after being trained on a second task in continual
    learning. To resolve this problem, we propose incremental moment matching
    (IMM), which uses the Bayesian neural network framework. IMM assumes that the
    posterior distribution of the parameters of a neural network can be
    approximated by a Gaussian distribution, and incrementally matches the moments
    of the posteriors trained for the first and second task, respectively. To make
    our Gaussian assumption reasonable, the IMM procedure utilizes various
    transfer learning techniques, including weight transfer, the L2 norm of old
    and new parameters, and a newly proposed variant of dropout using old
    parameters. We analyze our methods on the MNIST and CIFAR-10 datasets, and
    then evaluate them on a real-world life-log dataset collected using Google
    Glass. Experimental results show that IMM produces state-of-the-art
    performance on a variety of datasets.
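
    The matching step itself is elementary. As a sketch under simplifying
    assumptions (diagonal covariances, a fixed mixing ratio), merging two Gaussian
    posteriors by matching the first and second moments of their mixture looks
    like this; the transfer-learning techniques the paper layers on top are
    omitted.

        # Moment matching of two Gaussian posteriors over weights (numpy sketch).
        import numpy as np

        def match_moments(mu1, var1, mu2, var2, alpha=0.5):
            """Mean/variance of the mixture alpha*N(mu1,var1) + (1-alpha)*N(mu2,var2)."""
            mu = alpha * mu1 + (1 - alpha) * mu2
            var = alpha * (var1 + mu1**2) + (1 - alpha) * (var2 + mu2**2) - mu**2
            return mu, var

        mu, var = match_moments(np.zeros(4), np.ones(4),
                                np.ones(4), 0.5 * np.ones(4))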

    K-Means Clustering using Tabu Search with Quantized Means

    Kojo Sarfo Gyamfi, James Brusey, Andrew Hunt
    Comments: World Conference on Engineering and Computer Science
    Subjects: Learning (cs.LG)

    The Tabu Search (TS) metaheuristic has been proposed for K-Means clustering
    as an alternative to Lloyd’s algorithm, which for all its ease of
    implementation and fast runtime, has the major drawback of being trapped at
    local optima. While the TS approach can yield superior performance, it involves
    a high computational complexity. Moreover, the difficulty of parameter
    selection in the existing TS approach makes it even less attractive.
    This paper presents an alternative, low-complexity formulation of the TS
    optimization procedure for K-Means clustering. This approach does not require
    many parameter settings. We initially constrain the centers to points in the
    dataset. We then aim at evolving these centers using a unique neighborhood
    structure that makes use of gradient information of the objective function.
    This results in an efficient exploration of the search space, after which the
    means are refined. The proposed scheme is implemented in MATLAB and tested on
    four real-world datasets, and it achieves a significant improvement over the
    existing TS approach in terms of both the intra-cluster sum of squares and
    computational time.
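
    As a flavour of the approach, the numpy sketch below runs a bare-bones tabu
    search with centers constrained to data points; the paper's gradient-informed
    neighborhood structure and the final refinement of the means are replaced
    here by a simple random move for brevity.

        # Bare-bones tabu search for K-means with centers restricted to data points.
        import numpy as np

        def wcss(X, centers):
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            return d.min(1).sum()                     # within-cluster sum of squares

        def tabu_kmeans(X, k, iters=200, tenure=5, seed=0):
            rng = np.random.default_rng(seed)
            idx = rng.choice(len(X), k, replace=False)
            best = idx.copy()
            tabu = []
            for _ in range(iters):
                cand = idx.copy()
                cand[rng.integers(k)] = rng.integers(len(X))  # move one center
                key = tuple(int(i) for i in sorted(cand))
                if key in tabu:
                    continue                                   # tabu: skip revisits
                if wcss(X, X[cand]) < wcss(X, X[idx]):
                    idx = cand
                    if wcss(X, X[idx]) < wcss(X, X[best]):
                        best = idx.copy()
                tabu = (tabu + [key])[-tenure:]                # fixed-tenure list
            return X[best]

        centers = tabu_kmeans(np.random.rand(200, 2), k=3)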

    Linear classifier design under heteroscedasticity in Linear Discriminant Analysis

    Kojo Sarfo Gyamfi, James Brusey, Andrew Hunt, Elena Gaura
    Subjects: Learning (cs.LG)

    Under normality and homoscedasticity assumptions, Linear Discriminant
    Analysis (LDA) is known to be optimal in terms of minimising the Bayes error
    for binary classification. In the heteroscedastic case, LDA is not guaranteed
    to minimise this error. Assuming heteroscedasticity, we derive a linear
    classifier, the Gaussian Linear Discriminant (GLD), that directly minimises the
    Bayes error for binary classification. In addition, we also propose a local
    neighbourhood search (LNS) algorithm to obtain a more robust classifier if the
    data is known to have a non-normal distribution. We evaluate the proposed
    classifiers on two artificial and ten real-world datasets that cut across a
    wide range of application areas including handwriting recognition, medical
    diagnosis and remote sensing, and then compare our algorithm against existing
    LDA approaches and other linear classifiers. The GLD is shown to outperform the
    original LDA procedure in terms of the classification accuracy under
    heteroscedasticity. While it compares favourably with other existing
    heteroscedastic LDA approaches, the GLD requires up to 60 times less training
    time on some datasets. Our comparison with the support vector machine (SVM)
    also shows that the GLD, together with the LNS, requires up to 150 times less
    training time to achieve an equivalent classification accuracy on some of the
    datasets. Thus, our algorithms can provide a cheap and reliable option for
    classification in many expert systems.
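
    The core idea admits a compact numerical sketch: for two Gaussian classes
    with unequal covariances, the Bayes error of a linear rule sign(w.x - b) has
    a closed form in the normal CDF, so it can be minimised directly. This
    generic scipy-based minimisation stands in for the paper's GLD procedure and
    uses made-up class parameters.

        # Directly minimising the two-class Bayes error of a linear classifier.
        import numpy as np
        from scipy.optimize import minimize
        from scipy.stats import norm

        def bayes_error(params, mu1, S1, mu2, S2, p1=0.5):
            w, b = params[:-1], params[-1]
            s1 = np.sqrt(w @ S1 @ w) + 1e-12          # std of w.x under class 1
            s2 = np.sqrt(w @ S2 @ w) + 1e-12          # std of w.x under class 2
            return (p1 * norm.sf((b - w @ mu1) / s1)            # class 1 misread
                    + (1 - p1) * norm.cdf((b - w @ mu2) / s2))  # class 2 misread

        mu1, mu2 = np.zeros(2), np.array([2.0, 0.0])
        S1, S2 = np.eye(2), np.diag([4.0, 0.25])      # heteroscedastic classes
        res = minimize(bayes_error, x0=[1.0, 0.0, 1.0], args=(mu1, S1, mu2, S2))
        w, b = res.x[:-1], res.x[-1]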

    Asymmetric Learning Vector Quantization for Efficient Nearest Neighbor Classification in Dynamic Time Warping Spaces

    Brijnesh Jain, David Schultz
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    The nearest neighbor method together with the dynamic time warping (DTW)
    distance is one of the most popular approaches in time series classification.
    This method suffers from high storage and computation requirements for large
    training sets. As a solution to both drawbacks, this article extends learning
    vector quantization (LVQ) from Euclidean spaces to DTW spaces. The proposed LVQ
    scheme uses asymmetric weighted averaging as its update rule. Empirical
    results show superior performance of asymmetric generalized LVQ (GLVQ) over
    other state-of-the-art prototype generation methods for nearest neighbor
    classification.
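
    A rough sketch of what an LVQ update in a DTW space looks like: align the
    prototype with a training sample by DTW, then move each prototype element
    toward (same class) or away from (different class) the sample values it is
    warped to. The update below is a simplification for illustration, not the
    paper's exact asymmetric weighted averaging rule.

        # LVQ-style prototype update along a DTW warping path (numpy sketch).
        import numpy as np

        def dtw_path(a, b):
            n, m = len(a), len(b)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    D[i, j] = (a[i-1] - b[j-1]) ** 2 + min(D[i-1, j], D[i, j-1], D[i-1, j-1])
            path, i, j = [], n, m                     # trace optimal alignment back
            while i > 0 and j > 0:
                path.append((i - 1, j - 1))
                k = np.argmin([D[i-1, j-1], D[i-1, j], D[i, j-1]])
                i, j = (i-1, j-1) if k == 0 else (i-1, j) if k == 1 else (i, j-1)
            return path[::-1]

        def lvq_update(proto, sample, rate=0.1, attract=True):
            upd = proto.copy()
            for i, j in dtw_path(proto, sample):
                step = rate * (sample[j] - proto[i])
                upd[i] += step if attract else -step  # attract same class, repel other
            return upd

        proto = lvq_update(np.sin(np.linspace(0, 3, 20)),
                           np.sin(np.linspace(0, 3, 25) + 0.2))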

    Multi-Level Discovery of Deep Options

    Roy Fox, Sanjay Krishnan, Ion Stoica, Ken Goldberg
    Subjects: Learning (cs.LG)

    Augmenting an agent’s control with useful higher-level behaviors called
    options can greatly reduce the sample complexity of reinforcement learning, but
    manually designing options is infeasible in high-dimensional and abstract state
    spaces. While recent work has proposed several techniques for automated option
    discovery, they do not scale to multi-level hierarchies and to expressive
    representations such as deep networks. We present Discovery of Deep Options
    (DDO), a policy-gradient algorithm that discovers parametrized options from a
    set of demonstration trajectories, and can be used recursively to discover
    additional levels of the hierarchy. The scalability of our approach to
    multi-level hierarchies stems from the decoupling of low-level option discovery
    from high-level meta-control policy learning, facilitated by
    under-parametrization of the high level. We demonstrate that using the
    discovered options to augment the action space of Deep Q-Network agents can
    accelerate learning by guiding exploration in tasks where random actions are
    unlikely to reach valuable states. We show that DDO is effective in adding
    options that accelerate learning in 4 out of 5 Atari RAM environments chosen in
    our experiments. We also show that DDO can discover structure in robot-assisted
    surgical videos and kinematics that match expert annotation with 72% accuracy.

    Experimental Identification of Hard Data Sets for Classification and Feature Selection Methods with Insights on Method Selection

    Cuiju Luan, Guozhu Dong
    Comments: 19 pages, 3 figures, 12 tables
    Subjects: Learning (cs.LG)

    The paper first reports an experimentally identified list of benchmark data
    sets that are hard for representative classification and feature selection
    methods. This was done after systematically evaluating a total of 54
    combinations of methods, involving nine state-of-the-art classification
    algorithms and six commonly used feature selection methods, on 129 data sets
    from the UCI repository (some data sets with known high classification accuracy
    were excluded). In this paper, a data set for classification is called hard if
    none of the 54 combinations can achieve an AUC over 0.8 and none of them can
    achieve an F-Measure value over 0.8; it is called easy otherwise. A total of 17
    out of the 129 data sets were found to be hard in that sense. This paper also
    compares the performance of different methods, and it produces rankings of
    classification methods, separately on the hard data sets and on the easy data
    sets. This paper is the first to rank methods separately for hard data sets and
    for easy data sets. It turns out that the classifier rankings resulting from
    our experiments are somewhat different from those in the literature and hence
    offer new insights into method selection.

    On the Robustness of Convolutional Neural Networks to Internal Architecture and Weight Perturbations

    Nicholas Cheney, Martin Schrimpf, Gabriel Kreiman
    Comments: under review at ICML 2017
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    Deep convolutional neural networks are generally regarded as robust function
    approximators. So far, this intuition is based on perturbations to external
    stimuli such as the images to be classified. Here we explore the robustness of
    convolutional neural networks to perturbations to the internal weights and
    architecture of the network itself. We show that convolutional networks are
    surprisingly robust to a number of internal perturbations in the higher
    convolutional layers, but the bottom convolutional layers are much more
    fragile. For instance, AlexNet shows less than a 30% decrease in
    classification performance when randomly removing over 70% of weight
    connections in the top convolutional or dense layers, but performance is
    almost at chance with the same
    perturbation in the first convolutional layer. Finally, we suggest further
    investigations which could continue to inform the robustness of convolutional
    networks to internal perturbations.
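
    The perturbation protocol is easy to reproduce in outline: zero a random
    fraction of one layer's weights in place and re-evaluate the network. The
    PyTorch sketch below uses torchvision's AlexNet with fresh weights purely for
    illustration; in practice one would load pretrained weights and measure
    validation accuracy before and after.

        # Randomly remove a fraction of a layer's weight connections (sketch).
        import torch
        from torchvision.models import alexnet

        model = alexnet().eval()

        def ablate(layer, frac):
            with torch.no_grad():
                keep = (torch.rand_like(layer.weight) >= frac).float()
                layer.weight.mul_(keep)               # zero ~frac of the weights

        ablate(model.features[0], 0.7)      # first conv layer: fragile per the paper
        # ablate(model.classifier[6], 0.7)  # top dense layer: robust per the paper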

    Batch-normalized joint training for DNN-based distant speech recognition

    Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio
    Comments: arXiv admin note: text overlap with arXiv:1703.08002
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Improving distant speech recognition is a crucial step towards flexible
    human-machine interfaces. Current technology, however, still lacks
    robustness, especially in adverse acoustic conditions. Despite the
    significant progress made in the last years on both speech enhancement and
    speech recognition, one potential limitation of state-of-the-art technology
    lies in composing modules that are not well matched because they are not
    trained jointly. To address this concern, a promising approach consists in
    concatenating a speech enhancement and a speech recognition deep neural network
    and to jointly update their parameters as if they were within a single bigger
    network. Unfortunately, joint training can be difficult because the output
    distribution of the speech enhancement system may change substantially during
    the optimization procedure. The speech recognition module would have to deal
    with an input distribution that is non-stationary and unnormalized. To mitigate
    this issue, we propose a joint training approach based on a fully
    batch-normalized architecture. Experiments, conducted using different datasets,
    tasks and acoustic conditions, revealed that the proposed framework
    significantly outperforms other competitive solutions, especially in
    challenging environments.
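
    The architecture reduces to a simple pattern: place batch normalisation at
    the interface so the recognizer always sees a normalised input distribution,
    and backpropagate one loss through both modules. A minimal PyTorch sketch,
    with illustrative layer sizes and a made-up feature dimension of 40, follows.

        # Joint training of enhancement + recognition through batch norm (sketch).
        import torch
        import torch.nn as nn

        enhance = nn.Sequential(nn.Linear(40, 256), nn.BatchNorm1d(256), nn.ReLU(),
                                nn.Linear(256, 40), nn.BatchNorm1d(40))
        recognize = nn.Sequential(nn.Linear(40, 256), nn.BatchNorm1d(256), nn.ReLU(),
                                  nn.Linear(256, 30))   # 30 output classes, say

        opt = torch.optim.Adam(list(enhance.parameters()) + list(recognize.parameters()))
        noisy, labels = torch.randn(16, 40), torch.randint(0, 30, (16,))
        loss = nn.functional.cross_entropy(recognize(enhance(noisy)), labels)
        opt.zero_grad(); loss.backward(); opt.step()    # one bigger network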

    Smart Augmentation – Learning an Optimal Data Augmentation Strategy

    Joseph Lemley, Shabab Bazrafkan, Peter Corcoran
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    A recurring problem faced when training neural networks is that there is
    typically not enough data to maximize the generalization capability of deep
    neural networks (DNNs). There are many techniques to address this, including
    data augmentation, dropout, and transfer learning. In this paper, we introduce
    an additional method, which we call Smart Augmentation, and we show how to use
    it to increase accuracy and reduce overfitting on a target network. Smart
    Augmentation works by creating a network that learns how to generate augmented
    data during the training process of a target network in a way that reduces
    that network's loss. This allows us to learn augmentations that minimize the
    error of that network.

    Smart Augmentation has shown the potential to increase accuracy by
    demonstrably significant measures on all datasets tested. In addition, it has
    shown potential to achieve similar or improved performance levels with
    significantly smaller network sizes in a number of tested cases.

    Feature Fusion using Extended Jaccard Graph and Stochastic Gradient Descent for Robot

    Shenglan Liu, Muxin Sun, Wei Wang, Feilong Wang
    Comments: Assembly Automation
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Robotics (cs.RO)

    Robot vision is fundamental to human-robot interaction and to complex robot
    tasks. In this paper, we use Kinect and propose a feature graph fusion (FGF)
    method for robot recognition. Our feature fusion utilizes RGB and depth
    information from Kinect to construct a fused feature. FGF employs
    multi-Jaccard similarity to compute a robust graph and uses a word embedding
    method to enhance the recognition results. We also collect a DUT RGB-D face
    dataset and a benchmark dataset to evaluate the effectiveness and efficiency
    of our method. The experimental results illustrate that FGF is robust and
    effective on face and object datasets in robot applications.


    Information Theory

    Smart Meter Privacy with Renewable Energy and a Storage Device

    Giulio Giaconi, Deniz Gunduz, H. Vincent Poor
    Comments: 31 pages, 7 figures, submitted for Journal publication
    Subjects: Information Theory (cs.IT)

    A smart meter (SM) measures a consumer’s electricity consumption and reports
    it automatically to a utility provider (UP) in almost real time. Despite many
    advantages of SMs, their use also leads to serious concerns about consumer
    privacy. In this paper, SM privacy is studied by considering the presence of a
    renewable energy source (RES) and a battery, which can be used to partially
    hide the consumer’s energy consumption behavior. Privacy is measured by the
    information leakage rate, which denotes the average mutual information between
    the user’s real energy consumption and the energy requested from the grid,
    which the SM reads and reports to the UP. The impact of the knowledge of the
    amount of energy generated by the RES at the UP is also considered. The minimum
    information leakage rate is characterized as a computable information theoretic
    single-letter expression in the two extreme cases, that is, when the battery
    capacity is infinite or zero. Numerical results are presented for the finite
    battery capacity case to illustrate the potential privacy gains from the
    existence of a storage device. It is shown that, while the information leakage
    rate decreases with increasing availability of an RES, larger storage capacity
    is needed to fully exploit the available energy to improve the privacy.

    A new class of three-weight linear codes from weakly regular plateaued functions

    Sihem Mesnager, Ferruh Özbudak, Ahmet Sınak
    Comments: The Extended Abstract of this work was submitted to WCC-2017 (the Tenth International Workshop on Coding and Cryptography)
    Subjects: Information Theory (cs.IT)

    Linear codes with few weights have many applications in secret sharing
    schemes, authentication codes, communication and strongly regular graphs. In
    this paper, we consider linear codes with three weights in arbitrary
    characteristic. To do this, we generalize the recent contribution of Mesnager
    given in [Cryptography and Communications 9(1), 71-84, 2017]. We first present
    a new class of binary linear codes with three weights from plateaued Boolean
    functions and their weight distributions. We next introduce the notion of
    (weakly) regular plateaued functions in odd characteristic (p) and give
    concrete examples of these functions. Moreover, we construct a new class of
    three-weight linear (p)-ary codes from weakly regular plateaued functions and
    determine their weight distributions. We finally analyse the constructed linear
    codes for secret sharing schemes.

    Diffusion L0-norm constraint improved proportionate LMS algorithm for sparse distributed estimation

    Zongsheng Zheng, Zhigang Liu
    Subjects: Information Theory (cs.IT)

    To exploit the sparsity of the considered system, diffusion
    proportionate-type least mean square (PtLMS) algorithms assign different
    gains to each tap in the convergence stage, while diffusion
    sparsity-constrained LMS (ScLMS) algorithms pull the components towards zero
    in the steady-state stage. In this paper, by minimizing a differentiable cost
    function that utilizes the Riemannian distance between the updated and
    previous weight vectors as well as the L0 norm of the weighted updated weight
    vector, we propose a diffusion L0-norm constraint improved proportionate LMS
    (L0-IPLMS) algorithm, which combines the benefits of the diffusion PtLMS and
    ScLMS algorithms and achieves the best performance among them. Simulations in
    a system identification context confirm the improvement offered by the
    proposed algorithm.
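
    For intuition, the two ingredients being combined can be sketched at a single
    node: a proportionate LMS step (per-tap gains) plus the usual approximate L0
    zero attractor. The diffusion (network) aspect and the Riemannian-distance
    cost of the actual algorithm are omitted, and the constants are illustrative.

        # Single-node proportionate LMS with an approximate L0 zero attractor.
        import numpy as np

        rng = np.random.default_rng(0)
        h = np.r_[1.0, np.zeros(14), -0.5]            # sparse unknown system

        def step(w, x, d, mu=0.5, rho=1e-4, beta=5.0):
            e = d - w @ x                             # a-priori error
            g = np.abs(w) + 1e-6
            g /= g.sum()                              # proportionate per-tap gains
            w = w + mu * e * g * x                    # proportionate LMS update
            return w - rho * beta * np.sign(w) * np.exp(-beta * np.abs(w))  # L0 pull

        w = np.zeros_like(h)
        for _ in range(2000):
            x = rng.standard_normal(len(h))
            w = step(w, x, h @ x)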

    Combinatorial metrics: MacWilliams-type identities, isometries and extension property

    Jerry Anderson Pinheiro, Roberto Assis Machado, Marcelo Firer
    Subjects: Information Theory (cs.IT)

    In this work we characterize the combinatorial metrics admitting a
    MacWilliams-type identity and describe the group of linear isometries of such
    metrics. Considering coverings that are not connected, we classify the metrics
    satisfying the MacWilliams extension property.

    SINR and Throughput of Dense Cellular Networks with Stretched Exponential Path Loss

    Ahmad AlAmmouri, Jeffrey G. Andrews, François Baccelli
    Comments: Submitted to IEEE TWC
    Subjects: Information Theory (cs.IT)

    Distance-based attenuation is a critical aspect of wireless communications.
    As opposed to the ubiquitous power-law path loss model, this paper proposes a
    stretched exponential path loss model that is suitable for short-range
    communication. In this model, the signal power attenuates over a distance (r)
    as (e^{-alpha r^{beta}}), where (alpha, beta) are tunable parameters. Using
    experimental propagation measurements, we show that the proposed model is
    accurate for short to moderate distances in the range (r in (5,300)) meters
    and so is a suitable model for dense and ultradense networks. We integrate this
    path loss model into a downlink cellular network with base stations modeled by
    a Poisson point process, and derive expressions for the coverage probability,
    potential throughput, and area spectral efficiency. Although the most general
    result for coverage probability has a double integral, several special cases
    are given where the coverage probability has a compact or even closed form. We
    then show that the potential throughput is maximized for a particular BS
    density and then collapses to zero for high densities, assuming a fixed SINR
    threshold. We next prove that the area spectral efficiency, which assumes an
    adaptive SINR threshold, is non-decreasing with the BS density and converges to
    a constant for high densities.
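
    The model drops straight into a standard stochastic-geometry simulation. The
    Monte Carlo sketch below places base stations as a Poisson point process in a
    disc, applies the gain e^{-alpha r^{beta}} with Rayleigh fading, and
    estimates the coverage probability at the origin; the parameter values are
    illustrative, not the paper's measured fits.

        # Coverage under stretched exponential path loss (Monte Carlo sketch).
        import numpy as np

        rng = np.random.default_rng(0)
        alpha, beta = 0.5, 0.5            # path loss parameters (illustrative)
        lam, T, noise = 1e-3, 1.0, 1e-9   # BS density / m^2, SINR threshold, noise

        def coverage(trials=2000, radius=500.0):
            hits = 0
            for _ in range(trials):
                n = rng.poisson(lam * np.pi * radius ** 2)    # PPP in the disc
                if n == 0:
                    continue
                r = radius * np.sqrt(rng.random(n))           # uniform in the disc
                g = rng.exponential(size=n) * np.exp(-alpha * r ** beta)
                s = g.max()                                   # strongest-BS association
                hits += s / (g.sum() - s + noise) > T
            return hits / trials

        print(coverage())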

    Millimeter Wave Small-Scale Spatial Statistics in an Urban Microcell Scenario

    Shu Sun, Hangsong Yan, George R. MacCartney Jr., Theodore S. Rappaport
    Comments: 7 pages, 11 figures, in 2017 IEEE International Conference on Communications (ICC), May 2017
    Subjects: Information Theory (cs.IT)

    This paper presents outdoor wideband small-scale spatial fading and
    autocorrelation measurements and results in the 73 GHz millimeter-wave (mmWave)
    band conducted in downtown Brooklyn, New York. Both directional and
    omnidirectional receiver (RX) antennas are studied. Two pairs of transmitter
    (TX) and RX locations were tested with one line-of-sight (LOS) and one
    non-line-of-sight (NLOS) environment, where a linear track was employed at
    each RX to move the antenna in half-wavelength increments. Measured data
    reveal that the small-scale spatial fading of the received signal voltage
    amplitude is generally Ricean-distributed for both omnidirectional and
    directional RX antenna patterns under both LOS and NLOS conditions in most
    cases, except for the log-normal distribution for the omnidirectional RX
    antenna pattern in the NLOS environment. Sinusoidal exponential and typical
    exponential functions are found to model the small-scale spatial
    autocorrelation of the received signal voltage amplitude in LOS and NLOS
    environments in most cases, respectively. Furthermore, different decorrelation
    distances were observed for different RX track orientations, i.e., for
    different directions of motion relative to the TX. Results herein are valuable
    for characterizing small-scale spatial fading and autocorrelation properties
    in multiple-input multiple-output (MIMO) systems at fifth-generation (5G)
    mmWave frequencies.

    A Novel Millimeter-Wave Channel Simulator and Applications for 5G Wireless Communications

    Shu Sun, George R. MacCartney Jr., Theodore S. Rappaport
    Comments: 7 pages, 8 figures, in 2017 IEEE International Conference on Communications (ICC), May 2017
    Subjects: Information Theory (cs.IT)

    This paper presents details and applications of a novel channel simulation
    software named NYUSIM, which can be used to generate realistic temporal and
    spatial channel responses to support realistic physical- and link-layer
    simulations and design for fifth-generation (5G) cellular communications.
    NYUSIM is built upon the statistical spatial channel model for broadband
    millimeter-wave (mmWave) wireless communication systems developed by
    researchers at New York University (NYU). The simulator is applicable for a
    wide range of carrier frequencies (500 MHz to 100 GHz), radio frequency (RF)
    bandwidths (0 to 800 MHz), antenna beamwidths (7° to 360° for azimuth and 7°
    to 45° for elevation), and operating scenarios (urban microcell, urban macrocell,
    and rural macrocell), and also incorporates multiple-input multiple-output
    (MIMO) antenna arrays at the transmitter and receiver. This paper also provides
    examples to demonstrate how to use NYUSIM for analyzing MIMO channel conditions
    and spectral efficiencies, which show that NYUSIM offers an alternative and
    more realistic channel model compared to the 3rd Generation Partnership
    Project (3GPP) and other channel models for mmWave bands.

    Millimeter Wave MIMO Channel Estimation Based on Adaptive Compressed Sensing

    Shu Sun, Theodore S. Rappaport
    Comments: 7 pages, 5 figures, in 2017 IEEE International Conference on Communications Workshop (ICCW), May 2017
    Subjects: Information Theory (cs.IT)

    Multiple-input multiple-output (MIMO) systems are well suited for
    millimeter-wave (mmWave) wireless communications where large antenna arrays can
    be integrated in small form factors due to tiny wavelengths, thereby providing
    high array gains while supporting spatial multiplexing, beamforming, or antenna
    diversity. It has been shown that mmWave channels exhibit sparsity due to the
    limited number of dominant propagation paths, thus compressed sensing
    techniques can be leveraged to conduct channel estimation at mmWave
    frequencies. This paper presents a novel approach of constructing beamforming
    dictionary matrices for sparse channel estimation using the continuous basis
    pursuit (CBP) concept, and proposes two novel low-complexity algorithms to
    exploit channel sparsity for adaptively estimating multipath channel parameters
    in mmWave channels. We verify the performance of the proposed CBP-based
    beamforming dictionary and the two algorithms using a simulator built upon a
    three-dimensional mmWave statistical spatial channel model, NYUSIM, that is
    based on real-world propagation measurements. Simulation results show that the
    CBP-based dictionary offers substantially higher estimation accuracy and
    greater spectral efficiency than the grid-based counterpart introduced by
    previous researchers, and that the algorithms proposed here deliver better
    performance while requiring less computational effort than existing
    algorithms.

    Fast and Flexible Successive-Cancellation List Decoders for Polar Codes

    Seyyed Ali Hashemi, Carlo Condo, Warren J. Gross
    Subjects: Information Theory (cs.IT)

    Polar codes have gained a significant amount of attention during the past
    few years and have been selected as a coding scheme for the next-generation
    mobile broadband standard. Among decoding schemes, successive-cancellation list
    (SCL) decoding provides a reasonable trade-off between the error-correction
    performance and hardware implementation complexity when used to decode polar
    codes, at the cost of limited throughput. The simplified SCL (SSCL) and its
    extension SSCL-SPC increase the speed of decoding by removing redundant
    calculations when encountering particular information and frozen bit patterns
    (rate one and single parity check codes), while keeping the error-correction
    performance unaltered. In this paper, we improve SSCL and SSCL-SPC by proving
    that the list size imposes a specific number of bit estimations required to
    decode rate one and single parity check codes. Thus, the number of estimations
    can be limited while guaranteeing exactly the same error-correction performance
    as if all bits of the code were estimated. We call the new decoding algorithms
    Fast-SSCL and Fast-SSCL-SPC. Moreover, we show that the number of bit
    estimations in a practical application can be tuned to achieve desirable speed,
    while keeping the error-correction performance almost unchanged. Hardware
    architectures implementing both algorithms are then described and implemented:
    it is shown that our design can achieve 1.86 Gb/s throughput, higher than the
    best state-of-the-art decoders.

    MSE estimates for multitaper spectral estimation and off-grid compressive sensing

    Luís Daniel Abreu, José Luis Romero
    Comments: This replaces arXiv: 1503.02991
    Subjects: Information Theory (cs.IT); Statistics Theory (math.ST)

    We obtain estimates for the Mean Squared Error (MSE) for the multitaper
    spectral estimator and certain compressive acquisition methods for multi-band
    signals. We confirm a fact discovered by Thomson [Spectrum estimation and
    harmonic analysis, Proc. IEEE, 1982]: assuming bandwidth (W) and (N) time
    domain observations, the average of the square of the first (K=2NW) Slepian
    functions approaches, as (K) grows, an ideal band-pass kernel for the interval
    ([-W,W]). We provide an analytic proof of this fact and measure the
    corresponding rate of convergence in the (L^{1}) norm. This validates a
    heuristic approximation used to control the MSE of the multitaper estimator.
    The estimates also have consequences for the method of compressive
    acquisition of multi-band signals introduced by Davenport and Wakin, giving
    MSE approximation bounds for the dictionary formed by modulation of the
    critical number of prolates.
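
    The band-pass approximation is easy to observe numerically with scipy's DPSS
    implementation: average the squared spectra of the first K = 2NW tapers and
    compare the level inside and outside ([-W, W]). The sizes below are
    illustrative.

        # Average squared spectral window of the first K = 2NW Slepian tapers.
        import numpy as np
        from scipy.signal.windows import dpss

        N, NW = 256, 8                         # time length; W = NW / N
        K = 2 * NW                             # number of tapers
        tapers = dpss(N, NW, Kmax=K)           # orthonormal DPSS tapers, (K, N)

        nfft = 4096
        U = np.fft.fftshift(np.fft.fft(tapers, nfft), axes=-1)
        f = np.fft.fftshift(np.fft.fftfreq(nfft))
        kernel = (np.abs(U) ** 2).mean(axis=0) # average squared spectral window
        inside = np.abs(f) <= NW / N
        print(kernel[inside].mean(), kernel[~inside].mean())  # ~flat vs ~0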

    Taming Tail Latency for Erasure-coded, Distributed Storage Systems

    Vaneet Aggarwal, Abubakr O. Al-Abbasi, Jingxian Fan, Tian Lan
    Comments: 11 pages, 8 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)

    Distributed storage systems are known to be susceptible to long tails in
    response time. In modern online storage systems such as Bing, Facebook, and
    Amazon, the long tails of the service latency are of particular concern, with
    99.9th percentile response times being orders of magnitude worse than the mean.
    As erasure codes emerge as a popular technique to achieve high data reliability
    in distributed storage while attaining space efficiency, taming tail latency
    still remains an open problem due to the lack of mathematical models for
    analyzing such systems. To this end, we propose a framework for quantifying and
    optimizing tail latency in erasure-coded storage systems. In particular, we
    derive upper bounds on tail latency in closed form for arbitrary service time
    distribution and heterogeneous files. Based on the model, we formulate an
    optimization problem to jointly minimize the weighted latency tail probability
    of all files over the placement of files on the servers, and the choice of
    servers to access the requested files. The non-convex problem is solved using
    an efficient, alternating optimization algorithm. Numerical results show
    significant reduction of tail latency for erasure-coded storage systems with a
    realistic workload.
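
    The role of the order statistic is worth making explicit: with an (n, k)
    erasure code a read completes once the fastest k of n chunk fetches return,
    so tail latency is governed by the k-th order statistic of the service times
    rather than the maximum. A Monte Carlo sketch with made-up exponential
    service times:

        # Tail latency of an (n, k) coded read as a k-th order statistic.
        import numpy as np

        rng = np.random.default_rng(1)
        n, k, trials = 9, 6, 200_000
        chunk = rng.exponential(1.0, size=(trials, n))   # per-chunk service times
        latency = np.sort(chunk, axis=1)[:, k - 1]       # k-th fastest completes the read
        print(np.percentile(latency, 99.9))              # coded-read tail
        print(np.percentile(chunk.max(axis=1), 99.9))    # tail if all n were needed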

    Existence of Stein Kernels under a Spectral Gap, and Discrepancy Bound

    Thomas A. Courtade, Max Fathi, Ashwin Pananjady
    Comments: 19 pages, comments are welcome
    Subjects: Probability (math.PR); Information Theory (cs.IT); Functional Analysis (math.FA)

    We establish existence of Stein kernels for probability measures on
    (mathbb{R}^d) satisfying a Poincaré inequality, and obtain bounds on the
    Stein discrepancy of such measures. Applications to quantitative central limit
    theorems are discussed, including a new CLT in Wasserstein distance (W_2) with
    optimal rate and dependence on the dimension. As a byproduct, we obtain a
    stability version of an estimate of the Poincaré constant of probability
    measures under a second moment constraint. The results extend more generally
    to the setting of converse weighted Poincaré inequalities. The proof is based
    on simple arguments of calculus of variations.

    Further, we establish two general properties enjoyed by the Stein
    discrepancy, holding whenever a Stein kernel exists: Stein discrepancy is
    strictly decreasing along the CLT, and it controls the skewness of a random
    vector.



