IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    arXiv Paper Daily: Mon, 23 Jan 2017

    我爱机器学习(52ml.net)发表于 2017-01-23 00:00:00
    love 0

    Neural and Evolutionary Computing

    Fusion of Heterogeneous Data in Convolutional Networks for Urban Semantic Labeling (Invited Paper)

    Nicolas Audebert (Palaiseau, OBELIX), Bertrand Le Saux (Palaiseau), Sébastien Lefèvre (OBELIX)
    Comments: Joint Urban Remote Sensing Event (JURSE), Mar 2017, Dubai, United Arab Emirates. Joint Urban Remote Sensing Event 2017
    Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)

    In this work, we present a novel module to perform fusion of heterogeneous
    data using fully convolutional networks for semantic labeling. We introduce
    residual correction as a way to learn how to fuse predictions coming out of a
    dual stream architecture. Especially, we perform fusion of DSM and IRRG optical
    data on the ISPRS Vaihingen dataset over a urban area and obtain new
    state-of-the-art results.

    Using LLVM-based JIT Compilation in Genetic Programming

    Michal Gregor, Juraj Spalek
    Comments: Link to the IEEE published version: this http URL
    Subjects: Neural and Evolutionary Computing (cs.NE)

    The paper describes an approach to implementing genetic programming, which
    uses the LLVM library to just-in-time compile/interpret the evolved abstract
    syntax trees. The solution is described in some detail, including a parser
    (based on FlexC++ and BisonC++) that can construct the trees from a simple toy
    language with C-like syntax. The approach is compared with a previous
    implementation (based on direct execution of trees using polymorphic functors)
    in terms of execution speed.


    Computer Vision and Pattern Recognition

    End-To-End Visual Speech Recognition With LSTMs

    Stavros Petridis, Zuwei Li, Maja Pantic
    Comments: Accepted for publication, ICASSP 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

    Traditional visual speech recognition systems consist of two stages, feature
    extraction and classification. Recently, several deep learning approaches have
    been presented which automatically extract features from the mouth images and
    aim to replace the feature extraction stage. However, research on joint
    learning of features and classification is very limited. In this work, we
    present an end-to-end visual speech recognition system based on Long-Short
    Memory (LSTM) networks. To the best of our knowledge, this is the first model
    which simultaneously learns to extract features directly from the pixels and
    perform classification and also achieves state-of-the-art performance in visual
    speech classification. The model consists of two streams which extract features
    directly from the mouth and difference images, respectively. The temporal
    dynamics in each stream are modelled by an LSTM and the fusion of the two
    streams takes place via a Bidirectional LSTM (BLSTM). An absolute improvement
    of 9.7% over the base line is reported on the OuluVS2 database, and 1.5% on the
    CUAVE database when compared with other methods which use a similar visual
    front-end.

    A Large-scale Dataset and Benchmark for Similar Trademark Retrieval

    Osman Tursun, Cemal Aker, Sinan Kalkan
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Trademark retrieval (TR) has become an important yet challenging problem due
    to an ever increasing trend in trademark applications and infringement
    incidents. There have been many promising attempts for the TR problem, which,
    however, fell impracticable since they were evaluated with limited and mostly
    trivial datasets. In this paper, we provide a large-scale dataset with
    benchmark queries with which different TR approaches can be evaluated
    systematically. Moreover, we provide a baseline on this benchmark using the
    widely-used methods applied to TR in the literature. Furthermore, we identify
    and correct two important issues in TR approaches that were not addressed
    before: reversal of contrast, and presence of irrelevant text in trademarks
    severely affect the TR methods. Lastly, we applied deep learning, namely,
    several popular Convolutional Neural Network models, to the TR problem. To the
    best of the authors, this is the first attempt to do so.

    Robust Intrinsic and Extrinsic Calibration of RGB-D Cameras

    Filippo Basso, Emanuele Menegatti, Alberto Pretto
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

    Color-depth cameras (RGB-D cameras) have become the primary sensors in most
    robotics systems, from service robotics to industrial robotics applications.
    Typical consumer-grade RGB-D cameras are provided with a coarse intrinsic and
    extrinsic calibration that generally does not meet the accuracy requirements
    needed by many robotics applications (e.g., high accuracy 3D environment
    reconstruction and mapping, high precision object recognition and localization,
    …). In this paper, we propose a human-friendly, reliable and accurate
    calibration framework that enables to easily estimate both the intrinsic and
    extrinsic parameters of a general color-depth sensor couple. Our approach is
    based on a novel, two components, measurement error model that unifies the
    error sources of different RGB-D pairs based on different technologies, such as
    structured-light 3D cameras and time-of-flight cameras. The proposed correction
    model is implemented using two different parametric undistortion maps that
    provide the calibrated readings by means of linear combinations of control
    functions. Our method provides some important advantages compared to other
    state-of-the-art systems: it is general (i.e., well suited for different types
    of sensors), it is based on an easy and stable calibration protocol, it
    provides a greater calibration accuracy, and it has been implemented within the
    ROS robotics framework. We report detailed and comprehensive experimental
    validations and performance comparisons to support our statements.

    Automatic Generation of Typographic Font from a Small Font Subset

    Tomo Miyazaki, Tatsunori Tsuchiya, Yoshihiro Sugaya, Shinichiro Omachi, Masakazu Iwamura, Seiichi Uchida, Koichi Kise
    Comments: 12 pages, 17 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper addresses the automatic generation of a typographic font from a
    subset of characters. Specifically, we use a subset of a typographic font to
    extrapolate additional characters. Consequently, we obtain a complete font
    containing a number of characters sufficient for daily use. The automated
    generation of Japanese fonts is in high demand because a Japanese font requires
    over 1,000 characters. Unfortunately, professional typographers create most
    fonts, resulting in significant financial and time investments for font
    generation. The proposed method can be a great aid for font creation because
    designers do not need to create the majority of the characters for a new font.
    The proposed method uses strokes from given samples for font generation. The
    strokes, from which we construct characters, are extracted by exploiting a
    character skeleton dataset. This study makes three main contributions: a novel
    method of extracting strokes from characters, which is applicable to both
    standard fonts and their variations; a fully automated approach for
    constructing characters; and a selection method for sample characters. We
    demonstrate our proposed method by generating 2,965 characters in 47 fonts.
    Objective and subjective evaluations verify that the generated characters are
    similar to handmade characters.

    Efficient Feature Matching by Progressive Candidate Search

    Sehyung Lee, Jongwoo Lim, Il Hong Suh
    Comments: 9 pages including 1 references, 9 figures, 2 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a novel feature matching algorithm that systematically utilizes
    the geometric properties of features such as position, scale, and orientation,
    in addition to the conventional descriptor vectors. In challenging scenes with
    the presence of repetitive patterns or with a large viewpoint change, it is
    hard to find the correct correspondences using feature descriptors only, since
    the descriptor distances of the correct matches may not be the least among the
    candidates due to appearance changes. Assuming that the layout of the nearby
    features does not changed much, we propose the bidirectional transfer measure
    to gauge the geometric consistency of a pair of feature correspondences. The
    feature matching problem is formulated as a Markov random field (MRF) which
    uses descriptor distances and relative geometric similarities together. The
    unmatched features are explicitly modeled in the MRF to minimize its negative
    impact. For speed and stability, instead of solving the MRF on the entire
    features at once, we start with a small set of confident feature matches, and
    then progressively search the candidates in nearby features and expand the MRF
    with them. Experimental comparisons show that the proposed algorithm finds
    better feature correspondences, i.e. more matches with higher inlier ratio, in
    many challenging scenes with much lower computational cost than the
    state-of-the-art algorithms.

    Dual Recovery Network with Online Compensation for Image Super-Resolution

    Sifeng Xia, Wenhan Yang, Tiesong Zhao, Jiaying Liu
    Comments: submitted to ICME
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The image super-resolution (SR) methods will essentially lead to a loss of
    some high-frequency (HF) information when predicting high-resolution (HR)
    images from low-resolution (LR) images without using external references. To
    address that, we additionally utilize online retrieved data to facilitate image
    SR in a unified deep framework. A novel dual high-frequency recovery network
    (DHN) is proposed to predict an HR image with three parts: an LR image, an
    internal inferred HF (IHF) map (HF missing part inferred solely from the LR
    image) and an external extracted HF (EHF) map. In particular, we infer the HF
    information based on both the LR image and similar HR
    efc{references} which
    is retrieved online. For the EHF map, we align the
    efc{references} with
    affine transformation and then in the aligned {references}, part of HF signals
    are extracted by the proposed DHN to compensate for the HF loss. Extensive
    experimental results demonstrate that our DHN achieves notably better
    performance than state-of-the-art SR methods.

    Holistic Interstitial Lung Disease Detection using Deep Convolutional Neural Networks: Multi-label Learning and Unordered Pooling

    Mingchen Gao, Ziyue Xu, Le Lu, Adam P. Harrison, Ronald M. Summers, Daniel J. Mollura
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Accurately predicting and detecting interstitial lung disease (ILD) patterns
    given any computed tomography (CT) slice without any pre-processing
    prerequisites, such as manually delineated regions of interest (ROIs), is a
    clinically desirable, yet challenging goal. The majority of existing work
    relies on manually-provided ILD ROIs to extract sampled 2D image patches from
    CT slices and, from there, performs patch-based ILD categorization. Acquiring
    manual ROIs is labor intensive and serves as a bottleneck towards
    fully-automated CT imaging ILD screening over large-scale populations.
    Furthermore, despite the considerable high frequency of more than one ILD
    pattern on a single CT slice, previous works are only designed to detect one
    ILD pattern per slice or patch.

    To tackle these two critical challenges, we present multi-label deep
    convolutional neural networks (CNNs) for detecting ILDs from holistic CT slices
    (instead of ROIs or sub-images). Conventional single-labeled CNN models can be
    augmented to cope with the possible presence of multiple ILD pattern labels,
    via 1) continuous-valued deep regression based robust norm loss functions or 2)
    a categorical objective as the sum of element-wise binary logistic losses. Our
    methods are evaluated and validated using a publicly available database of 658
    patient CT scans under five-fold cross-validation, achieving promising
    performance on detecting four major ILD patterns: Ground Glass, Reticular,
    Honeycomb, and Emphysema. We also investigate the effectiveness of a CNN
    activation-based deep-feature encoding scheme using Fisher vector encoding,
    which treats ILD detection as spatially-unordered deep texture classification.

    Fast and Efficient Skin Detection for Facial Detection

    Mohammad Reza Mahmoodi
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, an efficient skin detection system is proposed. The algorithm
    is based on a very fast efficient pre-processing step utilizing the concept of
    ternary conversion in order to identify candidate windows and subsequently, a
    novel local two-stage diffusion method which has F-score accuracy of 0.5978 on
    SDD dataset. The pre-processing step has been proven to be useful to boost the
    speed of the system by eliminating 82% of an image in average. This is obtained
    by keeping the true positive rate above 98%. In addition, a novel segmentation
    algorithm is also designed to process candidate windows which is quantitatively
    and qualitatively proven to be very efficient in term of accuracy. The
    algorithm has been implemented in FPGA to obtain real-time processing speed.
    The system is designed fully pipeline and the inherent parallel structure of
    the algorithm is fully exploited to maximize the performance. The system is
    implemented on a Spartan-6 LXT45 Xilinx FPGA and it is capable of processing 98
    frames of 640*480 24-bit color images per second.

    High Performance Novel Skin Segmentation Algorithm for Images With Complex Background

    Mohammad Reza Mahmoodi
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Skin Segmentation is widely used in biometric applications such as face
    detection, face recognition, face tracking, and hand gesture recognition.
    However, several challenges such as nonlinear illumination, equipment effects,
    personal interferences, ethnicity variations, etc., are involved in detection
    process that result in the inefficiency of color based methods. Even though
    many ideas have already been proposed, the problem has not been satisfactorily
    solved yet. This paper introduces a technique that addresses some limitations
    of the previous works. The proposed algorithm consists of three main steps
    including initial seed generation of skin map, Otsu segmentation in color
    images, and finally a two-stage diffusion. The initial seed of skin pixels is
    provided based on the idea of ternary image as there are certain pixels in
    images which are associated to human complexion with very high probability. The
    Otsu segmentation is performed on several color channels in order to identify
    homogeneous regions. The result accompanying with the edge map of the image is
    utilized in two consecutive diffusion steps in order to annex initially
    unidentified skin pixels to the seed. Both quantitative and qualitative results
    demonstrate the effectiveness of the proposed system in compare with the
    state-of-the-art works.

    Fusion of Heterogeneous Data in Convolutional Networks for Urban Semantic Labeling (Invited Paper)

    Nicolas Audebert (Palaiseau, OBELIX), Bertrand Le Saux (Palaiseau), Sébastien Lefèvre (OBELIX)
    Comments: Joint Urban Remote Sensing Event (JURSE), Mar 2017, Dubai, United Arab Emirates. Joint Urban Remote Sensing Event 2017
    Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)

    In this work, we present a novel module to perform fusion of heterogeneous
    data using fully convolutional networks for semantic labeling. We introduce
    residual correction as a way to learn how to fuse predictions coming out of a
    dual stream architecture. Especially, we perform fusion of DSM and IRRG optical
    data on the ISPRS Vaihingen dataset over a urban area and obtain new
    state-of-the-art results.


    Artificial Intelligence

    Logical Inferences with Contexts of RDF Triples

    Vinh Nguyen, Amit Sheth
    Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB)

    Logical inference, an integral feature of the Semantic Web, is the process of
    deriving new triples by applying entailment rules on knowledge bases. The
    entailment rules are determined by the model-theoretic semantics. Incorporating
    context of an RDF triple (e.g., provenance, time, and location) into the
    inferencing process requires the formal semantics to be capable of describing
    the context of RDF triples also in the form of triples, or in other words, RDF
    contextual triples about triples. The formal semantics should also provide the
    rules that could entail new contextual triples about triples. In this paper, we
    propose the first inferencing mechanism that allows context of RDF triples,
    represented in the form of RDF triples about triples, to be the first-class
    citizens in the model-theoretic semantics and in the logical rules. Our
    inference mechanism is well-formalized with all new concepts being captured in
    the model-theoretic semantics. This formal semantics also allows us to derive a
    new set of entailment rules that could entail new contextual triples about
    triples. To demonstrate the feasibility and the scalability of the proposed
    mechanism, we implement a new tool in which we transform the existing knowledge
    bases to our representation of RDF triples about triples and provide the option
    for this tool to compute the inferred triples for the proposed rules. We
    evaluate the computation of the proposed rules on a large scale using various
    real-world knowledge bases such as Bio2RDF NCBI Genes and DBpedia. The results
    show that the computation of the inferred triples can be highly scalable. On
    average, one billion inferred triples adds 5-6 minutes to the overall
    transformation process. NCBI Genes, with 20 billion triples in total, took only
    232 minutes for the transformation of 12 billion triples and added 42 minutes
    for inferring 8 billion triples to the overall process.


    Information Retrieval

    The Parallel Distributed Image Search Engine (ParaDISE)

    Dimitrios Markonis, Roger Schaer, Alba García Seco de Herrera, Henning Müller
    Comments: 23 pages, 9 figures
    Subjects: Information Retrieval (cs.IR)

    Image retrieval is a complex task that differs according to the context and
    the user requirements in any specific field, for example in a medical
    environment. Search by text is often not possible or optimal and retrieval by
    the visual content does not always succeed in modelling high-level concepts
    that a user is looking for. Modern image retrieval techniques consist of
    multiple steps and aim to retrieve information from large–scale datasets and
    not only based on global image appearance but local features and if possible in
    a connection between visual features and text or semantics. This paper presents
    the Parallel Distributed Image Search Engine (ParaDISE), an image retrieval
    system that combines visual search with text–based retrieval and that is
    available as open source and free of charge. The main design concepts of
    ParaDISE are flexibility, expandability, scalability and interoperability.
    These concepts constitute the system, able to be used both in real-world
    applications and as an image retrieval research platform. Apart from the
    architecture and the implementation of the system, two use cases are described,
    an application of ParaDISE in retrieval of images from the medical literature
    and a visual feature evaluation for medical image retrieval. Future steps
    include the creation of an open source community that will contribute and
    expand this platform based on the existing parts.


    Computation and Language

    CEVO: Comprehensive EVent Ontology Enhancing Cognitive Annotation

    Saeedeh Shekarpour, Valerie Shalin, Krishnaprasad Thirunarayan, Amit P. Sheth
    Subjects: Computation and Language (cs.CL)

    While the general analysis of named entities has received substantial
    research attention, the analysis of relations over named entities has not. In
    fact, a review of the literature on unstructured as well as structured data
    revealed a deficiency in research on the abstract conceptualization required to
    organize relations. We believe that such an abstract conceptualization can
    benefit various communities and applications such as natural language
    processing, information extraction, machine learning and ontology engineering.
    In this paper, we present CEVO (i.e., a Comprehensive EVent Ontology) built on
    Levin’s conceptual hierarchy of English verbs that categorizes verbs with the
    shared meaning and syntactic behavior. We present the fundamental concepts and
    requirements for this ontology. Furthermore, we present three use cases for
    demonstrating the benefits of this ontology on annotation tasks: 1) annotating
    relations in plain text, 2) annotating ontological properties and 3) linking
    textual relations to ontological properties.

    Leveraging Cognitive Features for Sentiment Analysis

    Abhijit Mishra, Diptesh Kanojia, Seema Nagar, Kuntal Dey, Pushpak Bhattacharyya
    Comments: The SIGNLL Conference on Computational Natural Language Learning (CoNLL 2016)
    Subjects: Computation and Language (cs.CL)

    Sentiments expressed in user-generated short text and sentences are nuanced
    by subtleties at lexical, syntactic, semantic and pragmatic levels. To address
    this, we propose to augment traditional features used for sentiment analysis
    and sarcasm detection, with cognitive features derived from the eye-movement
    patterns of readers. Statistical classification using our enhanced feature set
    improves the performance (F-score) of polarity detection by a maximum of 3.7%
    and 9.3% on two datasets, over the systems that use only traditional features.
    We perform feature significance analysis, and experiment on a held-out dataset,
    showing that cognitive features indeed empower sentiment analyzers to handle
    complex constructs.

    Harnessing Cognitive Features for Sarcasm Detection

    Abhijit Mishra, Diptesh Kanojia, Seema Nagar, Kuntal Dey, Pushpak Bhattacharyya
    Comments: The 54th Annual Meeting of The Association for Computational Linguistics (ACL 2016)
    Subjects: Computation and Language (cs.CL)

    In this paper, we propose a novel mechanism for enriching the feature vector,
    for the task of sarcasm detection, with cognitive features extracted from
    eye-movement patterns of human readers. Sarcasm detection has been a
    challenging research problem, and its importance for NLP applications such as
    review summarization, dialog systems and sentiment analysis is well recognized.
    Sarcasm can often be traced to incongruity that becomes apparent as the full
    sentence unfolds. This presence of incongruity- implicit or explicit- affects
    the way readers eyes move through the text. We observe the difference in the
    behaviour of the eye, while reading sarcastic and non sarcastic sentences.
    Motivated by his observation, we augment traditional linguistic and stylistic
    features for sarcasm detection with the cognitive features obtained from
    readers eye movement data. We perform statistical classification using the
    enhanced feature set so obtained. The augmented cognitive features improve
    sarcasm detection by 3.7% (in terms of F-score), over the performance of the
    best reported system.

    End-To-End Visual Speech Recognition With LSTMs

    Stavros Petridis, Zuwei Li, Maja Pantic
    Comments: Accepted for publication, ICASSP 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

    Traditional visual speech recognition systems consist of two stages, feature
    extraction and classification. Recently, several deep learning approaches have
    been presented which automatically extract features from the mouth images and
    aim to replace the feature extraction stage. However, research on joint
    learning of features and classification is very limited. In this work, we
    present an end-to-end visual speech recognition system based on Long-Short
    Memory (LSTM) networks. To the best of our knowledge, this is the first model
    which simultaneously learns to extract features directly from the pixels and
    perform classification and also achieves state-of-the-art performance in visual
    speech classification. The model consists of two streams which extract features
    directly from the mouth and difference images, respectively. The temporal
    dynamics in each stream are modelled by an LSTM and the fusion of the two
    streams takes place via a Bidirectional LSTM (BLSTM). An absolute improvement
    of 9.7% over the base line is reported on the OuluVS2 database, and 1.5% on the
    CUAVE database when compared with other methods which use a similar visual
    front-end.


    Learning

    Git Blame Who?: Stylistic Authorship Attribution of Small, Incomplete Source Code Fragments

    Edwin Dauber, Aylin Caliskan-Islam, Richard Harang, Rachel Greenstadt
    Subjects: Learning (cs.LG); Cryptography and Security (cs.CR)

    Program authorship attribution has implications for the privacy of
    programmers who wish to contribute code anonymously. While previous work has
    shown that complete files that are individually authored can be attributed, we
    show here for the first time that accounts belonging to open source
    contributors containing short, incomplete, and typically uncompilable fragments
    can also be effectively attributed.

    We propose a technique for authorship attribution of contributor accounts
    containing small source code samples, such as those that can be obtained from
    version control systems or other direct comparison of sequential versions. We
    show that while application of previous methods to individual small source code
    samples yields an accuracy of about 73% for 106 programmers as a baseline, by
    ensembling and averaging the classification probabilities of a sufficiently
    large set of samples belonging to the same author we achieve 99% accuracy for
    assigning the set of samples to the correct author. Through these results, we
    demonstrate that attribution is an important threat to privacy for programmers
    even in real-world collaborative environments such as GitHub. Additionally, we
    propose the use of calibration curves to identify samples by unknown and
    previously unencountered authors in the open world setting. We show that we can
    also use these calibration curves in the case that we do not have linking
    information and thus are forced to classify individual samples directly. This
    is because the calibration curves allow us to identify which samples are more
    likely to have been correctly attributed. Using such a curve can help an
    analyst choose a cut-off point which will prevent most misclassifications, at
    the cost of causing the rejection of some of the more dubious correct
    attributions.

    Detectability thresholds in networks with dynamic link and community structure

    Paolo Barucca, Fabrizio Lillo, Piero Mazzarisi, Daniele Tantari
    Comments: 5 pages, 4 figures
    Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Physics and Society (physics.soc-ph); Machine Learning (stat.ML)

    We study the inference of a model of temporal networks in which both
    communities and links keep memory of previous network state. By considering
    maximum likelihood inference from single snapshot observation of the network,
    we show that link persistence decreases the detectability threshold, preventing
    the inference of communities even when they are in principle strong enough,
    while community persistence tends to restore this possibility. Then we show
    that the inferred communities share a maximum overlap with those of a specific
    previous instant of time, corresponding to the maximum of a time-lagged
    assortativity parameter, and therefore they can be closer to those of the past
    than of the present. We analytically characterize these transitions as a
    function of the memory parameters of the model.

    Empirical Study of Drone Sound Detection in Real-Life Environment with Deep Neural Networks

    Sungho Jeon, Jong-Woo Shin, Young-Jun Lee, Woong-Hee Kim, YoungHyoun Kwon, Hae-Yong Yang
    Comments: IEEE 5 Pages, Submitted
    Subjects: Sound (cs.SD); Learning (cs.LG)

    This work aims to investigate the use of deep neural network to detect
    commercial hobby drones in real-life environments by analyzing their sound
    data. The purpose of work is to contribute to a system for detecting drones
    used for malicious purposes, such as for terrorism. Specifically, we present a
    method capable of detecting the presence of commercial hobby drones as a binary
    classification problem based on sound event detection. We recorded the sound
    produced by a few popular commercial hobby drones, and then augmented this data
    with diverse environmental sound data to remedy the scarcity of drone sound
    data in diverse environments. We investigated the effectiveness of
    state-of-the-art event sound classification methods, i.e., a Gaussian Mixture
    Model (GMM), Convolutional Neural Network (CNN), and Recurrent Neural Network
    (RNN), for drone sound detection. Our empirical results, which were obtained
    with a testing dataset collected on an urban street, confirmed the
    effectiveness of these models for operating in a real environment. In summary,
    our RNN models showed the best detection performance with an F-Score of 0.8009
    with 240 ms of input audio with a short processing time, indicating their
    applicability to real-time detection systems.

    A Novel Variable Selection Method based on Frequent Pattern Tree for Real-time Traffic Accident Risk Prediction

    Lei Lin, Qian Wang, Adel W. Sadek
    Comments: OPT-i 2014 – 1st International Conference on Engineering and Applied Sciences Optimization, Proceedings
    Subjects: Applications (stat.AP); Learning (cs.LG)

    Traffic accident data are usually noisy, contain missing values, and
    heterogeneous. How to select the most important variables to improve real-time
    traffic accident risk prediction has become a concern of many recent studies.
    This paper proposes a novel variable selection method based on the Frequent
    Pattern tree (FP tree) algorithm. First, all the frequent patterns in the
    traffic accident dataset are discovered. Then for each frequent pattern, a new
    criterion, called the Relative Object Purity Ratio (ROPR) which we proposed, is
    calculated. This ROPR is added to the importance score of the variables that
    differentiates one frequent pattern from the others. To test the proposed
    method, a dataset was compiled from the traffic accidents records detected by
    only one detector on interstate highway I-64 in Virginia in 2005. This data set
    was then linked to other variables such as real-time traffic information and
    weather conditions. Both the proposed method based on the FP tree algorithm, as
    well as the widely utilized, random forest method, were then used to identify
    the important variables or the Virginia data set. The results indicate that
    there are some differences between the variables deemed important by the FP
    tree and those selected by the random forest method. Following this, two
    baseline models (i.e. a nearest neighbor (k-NN) method and a Bayesian network)
    were developed to predict accident risk based on the variables identified by
    both the FP tree method and the random forest method. The results show that the
    models based on the variable selection using the FP tree performed better than
    those based on the random forest method for several versions of the k-NN and
    Bayesian network models.The best results were derived from a Bayesian network
    model using variables from FP tree. That model could predict 61.11% of
    accidents accurately, while having a false alarm rate of 38.16%.

    Rare Disease Physician Targeting: A Factor Graph Approach

    Yong Cai, Yunlong Wang, Dong Dai
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    In rare disease physician targeting, a major challenge is how to identify
    physicians who are treating diagnosed or underdiagnosed rare diseases patients.
    Rare diseases have extremely low incidence rate. For a specified rare disease,
    only a small number of patients are affected and a fractional of physicians are
    involved. The existing targeting methodologies, such as segmentation and
    profiling, are developed under mass market assumption. They are not suitable
    for rare disease market where the target classes are extremely imbalanced. The
    authors propose a graphical model approach to predict targets by jointly
    modeling physician and patient features from different data spaces and
    utilizing the extra relational information. Through an empirical example with
    medical claim and prescription data, the proposed approach demonstrates better
    accuracy in finding target physicians. The graph representation also provides
    visual interpretability of relationship among physicians and patients. The
    model can be extended to incorporate more complex dependency structures. This
    article contributes to the literature of exploring the benefit of utilizing
    relational dependencies among entities in healthcare industry.

    Poisson–Gamma Dynamical Systems

    Aaron Schein, Mingyuan Zhou, Hanna Wallach
    Comments: Appeared in the Proceedings of the 29th Advances in Neural Information Processing Systems (NIPS 2016)
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We introduce a new dynamical system for sequentially observed multivariate
    count data. This model is based on the gamma–Poisson construction—a natural
    choice for count data—and relies on a novel Bayesian nonparametric prior that
    ties and shrinks the model parameters, thus avoiding overfitting. We present an
    efficient MCMC inference algorithm that advances recent work on augmentation
    schemes for inference in negative binomial models. Finally, we demonstrate the
    model’s inductive bias using a variety of real-world data sets, showing that it
    exhibits superior predictive performance over other models and infers highly
    interpretable latent structure.


    Information Theory

    On the Optimality of Separation between Caching and Delivery in General Cache Networks

    Navid Naderializadeh, Mohammad Ali Maddah-Ali, A. Salman Avestimehr
    Comments: A short version of this work will be submitted to the 2017 IEEE International Symposium on Information Theory (ISIT)
    Subjects: Information Theory (cs.IT)

    We consider a system, containing a library of multiple files and a general
    memoryless communication network through which a server is connected to
    multiple users, each equipped with a local isolated cache of certain size that
    can be used to store part of the library. Each user will ask for one of the
    files in the library, which needs to be delivered by the server through the
    intermediate communication network. The objective is to design the cache
    placement (without prior knowledge of users’ future requests) and the delivery
    phase in order to minimize the (normalized) delivery delay. We assume that the
    delivery phase consists of two steps: (1) generation of a set of multicast
    messages at the server, one for each subset of users, and (2) delivery of the
    multicast messages to the users.

    In this setting, we show that there exists a universal scheme for cache
    placement and multicast message generation, which is independent of the
    underlying communication network between the server and the users, and achieves
    the optimal delivery delay to within a constant factor for all memoryless
    networks. We prove this result, even though the capacity region of the
    underlying communication network is not known, even approximately. This result
    demonstrates that in the aforementioned setting, a separation between caching
    and multicast message generation on one hand, and delivering the multicast
    messages to the users on the other hand is approximately optimal. This result
    has the important practical implication that the prefetching can be done
    independent of network structure in the upcoming delivery phase.

    Mutual Information and Optimality of Approximate Message-Passing in Random Linear Estimation

    Jean Barbier, Nicolas Macris, Mohamad Dia, Florent Krzakala
    Comments: Submitted to the IEEE Transactions in Information Theory
    Subjects: Information Theory (cs.IT); Disordered Systems and Neural Networks (cond-mat.dis-nn); Mathematical Physics (math-ph)

    We consider the estimation of a signal from the knowledge of its noisy linear
    random Gaussian projections. A few examples where this problem is relevant are
    compressed sensing, sparse superposition codes, and code division multiple
    access. There has been a number of works considering the mutual information for
    this problem using the replica method from statistical physics. Here we put
    these considerations on a firm rigorous basis. First, we show, using a
    Guerra-Toninelli type interpolation, that the replica formula yields an upper
    bound to the exact mutual information. Secondly, for many relevant practical
    cases, we present a converse lower bound via a method that uses spatial
    coupling, state evolution analysis and the I-MMSE theorem. This yields a single
    letter formula for the mutual information and the minimal-mean-square error for
    random Gaussian linear estimation of all discrete bounded signals. In addition,
    we prove that the low complexity approximate message-passing algorithm is
    optimal outside of the so-called hard phase, in the sense that it
    asymptotically reaches the minimal-mean-square error. In this work spatial
    coupling is used primarily as a proof technique. However our results also prove
    two important features of spatially coupled noisy linear random Gaussian
    estimation. First there is no algorithmically hard phase. This means that for
    such systems approximate message-passing always reaches the minimal-mean-square
    error. Secondly, in a proper limit the mutual information associated to such
    systems is the same as the one of uncoupled linear random Gaussian estimation.

    Low-complexity Receiver for Multi-Level Polar Coded Modulation in Non-Orthogonal Multiple Access

    Beatrice Tomasi, Frédéric Gabry, Valerio Bioglio, Ingmar Land, Jean-Claude Belfiore
    Subjects: Information Theory (cs.IT)

    Non-orthogonal multiple access (NOMA) schemes have been proved to increase
    the multiple-access achievable rate with respect to orthogonal multiple access
    (OMA). In this paper we propose a novel communication system that combines
    multi-level coded modulation and polar codes in a NOMA scenario. Computational
    complexity decreases with the proposed scheme with respect to state-of-the-art
    solutions. We also highlight the trade-off between error rate performance and
    computational complexity.

    Analysis of Proportional Fair Scheduling Under Bursty On-Off Traffic

    Fei Liu, Janne Riihijärvi, Marina Petrova
    Comments: This is the author’s version of an article that has been accepted for publication in IEEE Communications Letters
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    Proportional fair scheduling (PFS) has been adopted as a standard solution
    for fair resource allocation in modern wireless cellular networks. With the
    emergence of heterogeneous networks with widely varying user loads, it is of
    great importance to characterize the performance of PFS under bursty traffic,
    which is the case in most wireless streaming and data transfer services. In
    this letter, we provide the first analytical solution to the performance of PFS
    under bursty on-off traffic load. We use the Gaussian approximation model to
    derive a closed-form expression of the achievable user data rates. In order to
    further improve the accuracy of our baseline analytical solution for multi-cell
    networks, we design a hybrid approximation by employing multi-interference
    analysis. The simulation results verify that our model guarantees extremely low
    data rate estimation error, which is further insensitive to changes in session
    duration, traffic load and user density.

    The Broadcast Channel with Degraded Message Sets and Unreliable Conference

    Dor Itzhak, Yossef Steinberg
    Subjects: Information Theory (cs.IT)

    As demonstrated in many recent studies, cooperation between users can greatly
    improve the performance of communication systems. Most of the works in the
    literature present models where all the users are aware of the resources
    available for cooperation. However, the scenario where cooperation links are
    sometimes unavailable or that some users cannot be updated whether the
    cooperation links are present or not, is more realistic in today’s dynamic
    ad-hoc communication systems. In such a case we need coding schemes that
    exploit the cooperation links if they are present, and can still operate if
    cooperation is not possible. In this work we study the general broadcast
    channel model with degraded message sets and cooperation links that may be
    absent, and derive it’s capacity region under such uncertainty conditions.

    Physical Layer Security Jamming: Theoretical Limits and Practical Designs in Wireless Networks

    Kanapathippillai Cumanan, Hong Xing, Peng Xu, Gan Zheng, Xuchu Dai, Arumugam Nallanathan, Zhiguo Ding, George K. Karagiannidis
    Subjects: Information Theory (cs.IT)

    Physical layer security has been recently recognized as a promising new
    design paradigm to provide security in wireless networks. In addition to the
    existing conventional cryptographic methods, physical layer security exploits
    the dynamics of fading channels to enhance secured wireless links. In this
    approach, jamming plays a key role by generating noise signals to confuse the
    potential eavesdroppers, and significantly improves quality and reliability of
    secure communications between legitimate terminals. This article presents
    theoretical limits and practical designs of jamming approaches for physical
    layer security. In particular, the theoretical limits explore the achievable
    secrecy rates of user cooperation based jamming whilst the centralized, and
    game theoretic based precoding techniques are reviewed for practical
    implementations. In addition, the emerging wireless energy harvesting
    techniques are exploited to harvest the required energy to transmit jamming
    signals. Future directions of these approaches, and the associated research
    challenges are also briefly outlined.

    Random Caching Based Cooperative Transmission in Heterogeneous Wireless Networks

    Wanli Wen, Ying Cui, Fu-Chun Zheng, Shi Jin, Yanxiang Jiang
    Comments: 30 pages, 7 figures, submitted to ICC, TWC
    Subjects: Information Theory (cs.IT)

    Base station cooperation in heterogeneous wireless networks (HetNets) is a
    promising approach to improve the network performance, but it also imposes a
    significant challenge on backhaul. On the other hand, caching at small base
    stations (SBSs) is considered as an efficient way to reduce backhaul load in
    HetNets. In this paper, we jointly consider SBS caching and cooperation in a
    downlink largescale HetNet. We propose two SBS cooperative transmission schemes
    under random caching at SBSs with the caching distribution as a design
    parameter. Using tools from stochastic geometry and adopting appropriate
    integral transformations, we first derive a tractable expression for the
    successful transmission probability under each scheme. Then, under each scheme,
    we consider the successful transmission probability maximization by optimizing
    the caching distribution, which is a challenging optimization problem with a
    non-convex objective function. By exploring optimality properties and using
    optimization techniques, under each scheme, we obtain a local optimal solution
    in the general case and global optimal solutions in some special cases.
    Compared with some existing caching designs in the literature, e.g., the most
    popular caching, the i.i.d. caching and the uniform caching, the optimal random
    caching under each scheme achieves better successful transmission probability
    performance. The analysis and optimization results provide valuable design
    insights for practical HetNets.

    Age-optimal Information Updates in Multihop Networks

    Ahmed M. Bedewy, Yin Sun, Ness B. Shroff
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    The problem of reducing the age-of-information has been extensively studied
    in the single-hop setting. In this paper, we minimize the age-of-information in
    general multihop networks. We first prove that for exponential packet
    transmission times, a preemptive Last Generated First Served (LGFS) policy
    results in smaller age processes at all nodes of the network (in a stochastic
    ordering sense) than any other causal policy. We then show that for the class
    of non-preemptive work-conserving policies, for any general packet transmission
    time distributions, the non-preemptive LGFS policy minimizes the age processes
    at all nodes of the network (again in a stochastic ordering sense). These
    optimality results not only hold for the age process, but also for any
    non-decreasing functional of the age process.

    Antenna Deployment Method for MIMO Radar under the Situation of Multiple Interference Regions

    Tianxian Zhang, Jiadong Liang, Yichuan Yang, Guolong Cui, Lingjiang Kong, Xiaobo Yang
    Comments: 12 pages
    Subjects: Information Theory (cs.IT)

    In this paper, considering multiple interference regions simultaneously, an
    optimal antenna deployment problem for distributed Multi-Input Multi-Output
    (MIMO) radar is investigated. The optimal antenna deployment problem is solved
    by proposing an antenna deployment method based on Multi-Objective Particle
    Swarm Optimization (MOPSO). Firstly, we construct a multi-objective
    optimization problem for MIMO radar antenna deployment by choosing the
    interference power densities of different regions as objective functions. Then,
    to obtain the optimal deployment result without wasting time and computational
    resources, an iteration convergence criterion based on interval distance is
    proposed. The iteration convergence criterion can be used to stop the MOPSO
    optimization process efficiently when the optimal antenna deployment algorithm
    reaches the desired convergence level. Finally, numerical results are provided
    to verify the validity of the proposed algorithm.

    Improved Algorithms For Structured Sparse Recovery

    Lingxiao Huang, Yifei Jin, Jian Li, Haitao Wang
    Subjects: Information Theory (cs.IT); Data Structures and Algorithms (cs.DS)

    It is known that certain structures of the signal in addition to the standard
    notion of sparsity (called structured sparsity) can improve the sample
    complexity in several compressive sensing applications. Recently, Hegde et al.
    proposed a framework, called approximation-tolerant model-based compressive
    sensing, for recovering signals with structured sparsity. Their framework
    requires two oracles, the head- and the tail-approximation projection oracles.
    The two oracles should return approximate solutions in the model which is
    closest to the query signal. In this paper, we consider two structured sparsity
    models and obtain improved projection algorithms. The first one is the tree
    sparsity model, which captures the support structure in the wavelet
    decomposition of piecewise-smooth signals. We propose a linear time
    ((1-epsilon))-approximation algorithm for head-approximation projection and a
    linear time ((1+epsilon))-approximation algorithm for tail-approximation
    projection. The best previous result is an ( ilde{O}(nlog n)) time
    bicriterion approximation algorithm (meaning that their algorithm may return a
    solution of sparsity larger than (k)) by Hegde et al. Our result provides an
    affirmative answer to the open problem mentioned in the survey of Hegde and
    Indyk. As a corollary, we can recover a constant approximate (k)-sparse signal.
    The other is the Constrained Earth Mover Distance (CEMD) model, which is useful
    to model the situation where the positions of the nonzero coefficients of a
    signal do not change significantly as a function of spatial (or temporal)
    locations. We obtain the first single criterion constant factor approximation
    algorithm for the head-approximation projection. The previous best known
    algorithm is a bicriterion approximation. Using this result, we can get a
    faster constant approximation algorithm with fewer measurements for the
    recovery problem in CEMD model.

    Rigorous Dynamics of Expectation-Propagation-Based Signal Recovery from Unitarily Invariant Measurements

    Keigo Takeuchi
    Comments: Submitted to ISIT2017. A short version of arXiv:1701.05284
    Subjects: Information Theory (cs.IT)

    This paper investigates sparse signal recovery based on expectation
    propagation (EP) from unitarily invariant measurements. A rigorous analysis is
    presented for the state evolution (SE) of an EP-based message-passing algorithm
    in the large system limit, where both input and output dimensions tend to
    infinity at an identical speed. The main result is the justification of an SE
    formula conjectured by Ma and Ping in 2016.

    On the Performance of X-Duplex Relaying

    Shuai Li, Mingxin Zhou, Jianjun Wu, Lingyang Song, Yonghui Li, Hongbin Li
    Comments: 30 pages, 7 figures, journal
    Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

    In this paper, we study a X-duplex relay system with one source, one
    amplify-and-forward (AF) relay and one destination, where the relay is equipped
    with a shared antenna and two radio frequency (RF) chains used for transmission
    or reception. X-duplex relay can adaptively configure the connection between
    its RF chains and antenna to operate in either HD or FD mode, according to the
    instantaneous channel conditions. We first derive the distribution of the
    signal to interference plus noise ratio (SINR), based on which we then analyze
    the outage probability, average symbol error rate (SER), and average sum rate.
    We also investigate the X-duplex relay with power allocation and derive the
    lower bound and upper bound of the corresponding outage probability. Both
    analytical and simulated results show that the X-duplex relay achieves a better
    performance over pure FD and HD schemes in terms of SER, outage probability and
    average sum rate, and the performance floor caused by the residual self
    interference can be eliminated using flexible RF chain configurations.

    High Rate LDPC Codes from Difference Covering Arrays

    D. Donovan, A. Rao, E. Şule Yazıcı
    Comments: 11 pages
    Subjects: Combinatorics (math.CO); Information Theory (cs.IT)

    This paper presents a combinatorial construction of low-density parity-check
    (LDPC) codes from difference covering arrays. While the original construction
    by Gallagher was by randomly allocating bits in a sparse parity-check matrix,
    over the past 20 years researchers have used a variety of more structured
    approaches to construct these codes, with the more recent constructions of
    well-structured LDPC coming from balanced incomplete block designs (BIBDs) and
    from Latin squares over finite fields. However these constructions have
    suffered from the limited orders for which these designs exist. Here we present
    a construction of LDPC codes of length (4n^2 – 2n) for all (n) using the cyclic
    group of order (2n). These codes achieve high information rate (greater than
    0.8) for (n geq 8), have girth at least 6 and have minimum distance 6 for (n)
    odd.

    Duality of channels and codes

    Joseph M. Renes
    Comments: 23 pages, 2 figures
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    For any given channel (W) with classical inputs and possibly quantum outputs,
    a dual classical-input channel (W^perp) can be defined by embedding the
    original into a channel (mathcal N) with quantum inputs and outputs. Here we
    give new uncertainty relations for a general class of entropies that lead to
    very close relationships between the original channel and its dual. Moreover,
    we show that channel duality can be combined with duality of linear codes,
    whereupon the uncertainty relations imply that the performance of a given code
    over a given channel is entirely characterized by the performance of the dual
    code on the dual channel. This has several applications. In the context of
    polar codes, it implies that the rates of polarization to ideal and useless
    channels must be identical. Duality also relates the tasks of channel coding
    and privacy amplification, implying that the finite blocklength performance of
    extractors and codes is precisely linked, and that optimal rate extractors can
    be transformed into capacity-achieving codes, and vice versa. Finally, duality
    also extends to the EXIT function of any channel and code. Here it implies that
    for any channel family, if the EXIT function for a fixed code has a sharp
    transition, then it must be such that the rate of the code equals the capacity
    at the transition. This may give a different route to proving a code family
    achieves capacity by establishing sharp EXIT function transitions.




沪ICP备19023445号-2号
友情链接