IT博客汇

    arXiv Paper Daily: Wed, 8 Feb 2017

    Published by 我爱机器学习 (52ml.net) on 2017-02-08 00:00:00

    Neural and Evolutionary Computing

    Estimation of classrooms occupancy using a multi-layer perceptron

    Eugénio Rodrigues, Luísa Dias Pereira, Adélio Rodrigues Gaspar, Álvaro Gomes, Manuel Carlos Gameiro da Silva
    Comments: 6 pages, 2 figures, conference article
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    This paper presents a multi-layer perceptron model for estimating the number
    of occupants in classrooms from sensed indoor environmental data: relative
    humidity, air temperature, and carbon dioxide concentration. The modelling
    datasets were collected from two classrooms in the Secondary School of Pombal,
    Portugal. The number of occupants and occupation periods were obtained from
    class attendance reports. However, post-class occupancy was unknown, so the
    developed model was used to reconstruct the classrooms' occupancy by filling
    in the unreported periods. Different model structures and combinations of
    environment variables were tested. The most accurate model took as input 10
    variables: five averaged time intervals each of relative humidity and carbon
    dioxide concentration. The model achieved a mean square error of 1.99, a
    coefficient of determination of 0.96 (p-value < 0.001), and a mean absolute
    error of 1 occupant. These results show promising
    estimation capabilities in uncertain indoor environment conditions.
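
The regression task described above can be sketched with a small multi-layer perceptron. The following is an illustrative reconstruction on synthetic data (scikit-learn assumed), not the authors' model; the feature layout mirrors the reported 10-variable input, but the numbers and noise levels are invented.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical sensor features: five averaged time intervals each of relative
# humidity (%) and CO2 concentration (ppm), echoing the 10-variable input of
# the best model reported above. Purely synthetic, for illustration only.
n = 400
occupants = rng.integers(0, 30, size=n).astype(float)
rh = 40 + 0.3 * occupants[:, None] + rng.normal(0, 1.0, (n, 5))
co2 = 420 + 25 * occupants[:, None] + rng.normal(0, 30.0, (n, 5))
X = np.hstack([rh, co2])

model = make_pipeline(
    StandardScaler(),  # scaling helps the MLP converge on raw sensor units
    MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
model.fit(X, occupants)
pred = model.predict(X)
mae = np.abs(pred - occupants).mean()
print(f"training MAE: {mae:.2f} occupants")
```

With a strong CO2-to-occupancy signal, the fitted network tracks the occupant count to within a few people on this toy data.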

    Toward the automated analysis of complex diseases in genome-wide association studies using genetic programming

    Andrew Sohn, Randal S. Olson, Jason H. Moore
    Comments: 9 pages, 4 figures, submitted to GECCO 2017 conference and currently under review
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)

    Machine learning has been gaining traction in recent years to meet the demand
    for tools that can efficiently analyze and make sense of the ever-growing
    databases of biomedical data in health care systems around the world. However,
    effectively using machine learning methods requires considerable domain
    expertise, which can be a barrier to entry for bioinformaticians new to
    computational data science methods. Therefore, off-the-shelf tools that make
    machine learning more accessible can prove invaluable for bioinformaticians. To
    this end, we have developed an open source pipeline optimization tool
    (TPOT-MDR) that uses genetic programming to automatically design machine
    learning pipelines for bioinformatics studies. In TPOT-MDR, we implement
    Multifactor Dimensionality Reduction (MDR) as a feature construction method for
    modeling higher-order feature interactions, and combine it with a new expert
    knowledge-guided feature selector for large biomedical data sets. We
    demonstrate TPOT-MDR’s capabilities using a combination of simulated and real
    world data sets from human genetics and find that TPOT-MDR significantly
    outperforms modern machine learning methods such as logistic regression and
    eXtreme Gradient Boosting (XGBoost). We further analyze the best pipeline
    discovered by TPOT-MDR for a real world problem and highlight TPOT-MDR’s
    ability to produce a high-accuracy solution that is also easily interpretable.

    Characterisation of speech diversity using self-organising maps

    Tom A. F. Anderson, David M. W. Powers
    Comments: 16th Speech Science and Technology Conference (SST2016)
    Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD)

    We report investigations into speaker classification of larger quantities of
    unlabelled speech data using small sets of manually phonemically annotated
    speech. The Kohonen speech typewriter is a semi-supervised method comprising
    self-organising maps (SOMs) that achieves low phoneme error rates. A SOM is a
    2D array of cells that learn vector representations of the data based on
    neighbourhoods. In this paper, we report a method to evaluate pronunciation
    using multilevel SOMs with /hVd/ single syllable utterances for the study of
    vowels, for Australian pronunciation.
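
A minimal self-organising map, with the neighbourhood-based update described above, can be sketched in a few lines of NumPy. This is illustrative only, not the authors' multilevel SOM for /hVd/ utterances; the toy two-cluster data stands in for speech features.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "speech feature" data: two clusters standing in for two vowel classes.
data = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
                  rng.normal(1.0, 0.1, (50, 2))])

grid_h, grid_w, dim = 4, 4, 2
weights = rng.random((grid_h, grid_w, dim))

# Grid coordinates of each cell, used for neighbourhood distances on the map.
coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                              indexing="ij"), axis=-1).astype(float)

def quantization_error(w, xs):
    d = np.linalg.norm(w.reshape(-1, dim)[None, :, :] - xs[:, None, :], axis=2)
    return d.min(axis=1).mean()

err_before = quantization_error(weights, data)

n_steps = 2000
for t in range(n_steps):
    x = data[rng.integers(len(data))]
    # Best matching unit (BMU): the cell whose weight vector is closest to x.
    d = np.linalg.norm(weights - x, axis=2)
    bmu = np.unravel_index(d.argmin(), d.shape)
    # Learning rate and neighbourhood radius decay over time.
    lr = 0.5 * (1 - t / n_steps)
    sigma = 2.0 * (1 - t / n_steps) + 0.1
    g = np.exp(-np.sum((coords - np.array(bmu, float)) ** 2, axis=2)
               / (2 * sigma ** 2))
    # Pull the BMU and its map neighbours toward the sample.
    weights += lr * g[:, :, None] * (x - weights)

err_after = quantization_error(weights, data)
print(f"quantization error: {err_before:.3f} -> {err_after:.3f}")
```

After training, cells specialise to regions of the data, so the average distance from each sample to its nearest cell drops.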


    Computer Vision and Pattern Recognition

    An Implementation of Faster RCNN with Study for Region Sampling

    Xinlei Chen, Abhinav Gupta
    Comments: Technical Report, 3 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We adapted the joint-training scheme of the Faster RCNN framework from Caffe to
    TensorFlow as a baseline implementation for object detection. Our code is made
    publicly available. This report documents the simplifications made to the
    original pipeline, with justifications from ablation analysis on both PASCAL
    VOC 2007 and COCO 2014. We further investigated the role of non-maximal
    suppression (NMS) in selecting regions-of-interest (RoIs) for region
    classification, and found that sampling biased toward small regions helps
    performance and can achieve mAP on par with NMS-based sampling when
    sufficiently converged.
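
The NMS step whose role the report investigates is the standard greedy procedure: keep the highest-scoring box, suppress boxes that overlap it heavily, repeat. A generic NumPy sketch (boxes as x1, y1, x2, y2; not the released TensorFlow code):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of box i with every remaining box.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]  # suppress highly overlapping boxes
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # -> [0, 2]: the second box overlaps the first
```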

    Tracking using Numerous Anchor points

    Tanushri Chakravorty, Guillaume-Alexandre Bilodeau, Eric Granger
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, an online adaptive model-free tracker is proposed to track
    single objects in video sequences to deal with real-world tracking challenges
    like low-resolution, object deformation, occlusion and motion blur. The novelty
    lies in the construction of a strong appearance model that captures features
    from the initialized bounding box, which are then assembled into anchor-point
    features. These features memorize the global pattern of the object and have an
    internal star graph-like structure. They are unique and flexible and help
    track generic and deformable objects, with no limitation to specific object
    classes. In addition, the relevance of each feature is evaluated online using
    short-term consistency and long-term consistency. These parameters are adapted
    to retain consistent features that vote for the object location and that deal
    with outliers for long-term tracking scenarios. Additionally, voting in a
    Gaussian manner helps in tackling inherent noise of the tracking system and in
    accurate object localization. Furthermore, the proposed tracker uses a pairwise
    distance measure to cope with scale variations and combines pixel-level binary
    features and global weighted color features for model update. Finally,
    experimental results on a visual tracking benchmark dataset are presented to
    demonstrate the effectiveness and competitiveness of the proposed tracker.

    Face Aging With Conditional Generative Adversarial Networks

    Grigory Antipov, Moez Baccouche, Jean-Luc Dugelay
    Comments: 5 pages, 3 figures, submitted to ICIP 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    It has been recently shown that Generative Adversarial Networks (GANs) can
    produce synthetic images of exceptional visual fidelity. In this work, we
    propose a GAN-based method for automatic face aging. Contrary to previous
    works employing GANs for altering facial attributes, we place particular
    emphasis on preserving the original person's identity in the aged version of
    their face. To this end, we introduce a novel approach for
    “Identity-Preserving” optimization of GAN’s latent vectors. The objective
    evaluation of the resulting aged and rejuvenated face images by
    state-of-the-art face recognition and age estimation solutions demonstrates
    the high potential of the proposed method.

    Image Reconstruction using Matched Wavelet Estimated from Data Sensed Compressively using Partial Canonical Identity Matrix

    Naushad Ansari, Anubha Gupta
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper proposes a joint framework wherein lifting-based, separable,
    image-matched wavelets are estimated from compressively sensed (CS) images and
    used for the reconstruction of the same. A matched wavelet can be designed
    easily if the full image is available, and it may provide better
    reconstruction results in CS applications than a standard wavelet
    sparsifying basis. Since in CS applications only the compressively sensed
    image is available rather than the full image, existing methods of designing
    matched wavelets cannot be used. Thus, we propose a joint framework that
    estimates the matched wavelet from the compressively sensed images and also
    reconstructs the full images. This paper
    has three significant contributions. First, lifting-based, image-matched
    separable wavelet is designed from compressively sensed images and is also used
    to reconstruct the same. Second, a simple sensing matrix is employed to sample
    data at sub-Nyquist rate such that sensing and reconstruction time is reduced
    considerably without any noticeable degradation in the reconstruction
    performance. Third, a new multi-level L-Pyramid wavelet decomposition strategy
    is provided for separable wavelet implementation on images that leads to
    improved reconstruction performance. Compared to CS-based reconstruction using
    standard wavelets with Gaussian sensing matrix and with existing wavelet
    decomposition strategy, the proposed methodology provides faster and better
    image reconstruction in compressive sensing application.
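
The lifting scheme mentioned above builds a wavelet from predict and update steps on even/odd samples. As a minimal illustration, one level of the Haar wavelet in lifting form (the matched-wavelet design adapts these filters to the image, which this sketch does not do):

```python
import numpy as np

def haar_lifting_forward(x):
    even, odd = x[0::2].astype(float), x[1::2].astype(float)
    d = odd - even      # predict step: detail coefficients
    s = even + d / 2    # update step: approximation (pairwise means)
    return s, d

def haar_lifting_inverse(s, d):
    even = s - d / 2    # undo update
    odd = d + even      # undo predict
    x = np.empty(even.size + odd.size)
    x[0::2], x[1::2] = even, odd
    return x

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 2.0])
s, d = haar_lifting_forward(x)
print(s, d)                        # approximations and details
print(haar_lifting_inverse(s, d))  # perfect reconstruction of x
```

Because each lifting step is trivially invertible, reconstruction is exact regardless of the chosen predict/update filters, which is what makes the scheme attractive for designing image-matched wavelets.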

    Hashing in the Zero Shot Framework via Domain Adaptation

    Shubham Pachori, Ameya Deshpande, Shanmuganathan Raman
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Techniques to learn hash codes which can store and retrieve large dimensional
    multimedia data efficiently have attracted broad research interest in recent
    years. With the rapid explosion of newly emerged concepts and online data,
    existing supervised hashing algorithms suffer from the problem of scarcity of
    ground truth annotations due to the high cost of obtaining manual annotations.
    Therefore, we propose an algorithm to learn a hash function from training
    images belonging to seen classes which can efficiently encode images of unseen
    classes to binary codes. Specifically, we project the image features from
    visual space and semantic features from semantic space into a common Hamming
    subspace. Earlier works to generate hash codes have tried to relax the discrete
    constraints on hash codes and solve the continuous optimization problem.
    However, it often leads to quantization errors. In this work, we use the
    max-margin classifier to learn an efficient hash function. To address the
    concern of domain-shift which may arise due to the introduction of new classes,
    we also introduce an unsupervised domain adaptation model in the proposed
    hashing framework. Results on three image datasets show the advantage of
    using domain adaptation in learning a high-quality hash function and the
    superiority of our method for the task of image retrieval compared to several
    state-of-the-art hashing methods.

    A New Point-set Registration Algorithm for Fingerprint Matching

    A. Pasha Hosseinbor, Renat Zhdanov, Alexander Ushveridze
    Comments: Point pattern matching, point-set registration, fingerprint, minutia matching, alignment
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A novel minutia-based fingerprint matching algorithm is proposed that employs
    iterative global alignment on two minutia sets. The matcher considers all
    possible minutia pairings and iteratively aligns the two sets until the number
    of minutia pairs does not exceed the maximum number of allowable one-to-one
    pairings. The optimal alignment parameters are derived analytically via linear
    least squares. The first alignment establishes a region of overlap between the
    two minutia sets, which is then (iteratively) refined by each successive
    alignment. After each alignment, minutia pairs that exhibit weak correspondence
    are discarded. The process is repeated until the number of remaining pairs no
    longer exceeds the maximum number of allowable one-to-one pairings. The
    proposed algorithm is tested on both the FVC2000 and FVC2002 databases, and the
    results indicate that the proposed matcher is both effective and efficient for
    fingerprint authentication; it is fast and does not utilize any computationally
    expensive mathematical functions (e.g. trigonometric, exponential). In addition
    to the proposed matcher, another contribution of the paper is the analytical
    derivation of the least squares solution for the optimal alignment parameters
    for two point-sets lacking exact correspondence.
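
The closed-form least-squares alignment the paper derives is of the same family as the classical SVD-based (Kabsch/Procrustes) solution for two point sets with known one-to-one correspondence. A sketch of that standard solution in 2D (not the authors' exact formulation, which handles sets lacking exact correspondence):

```python
import numpy as np

def align(P, Q):
    """Return R, t minimising sum ||R @ p + t - q||^2 over paired rows."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)   # cross-covariance of centred point sets
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:    # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cQ - R @ cP
    return R, t

rng = np.random.default_rng(2)
P = rng.random((6, 2)) * 10     # e.g. minutia coordinates
theta = 0.4
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
Q = P @ R_true.T + np.array([3.0, -1.5])   # rotated and translated copy

R, t = align(P, Q)
residual = np.linalg.norm(P @ R.T + t - Q)
print(f"alignment residual: {residual:.2e}")
```

With noiseless correspondences, the recovered rotation and translation match the ground truth to machine precision; no trigonometric or exponential evaluations are needed beyond constructing the test data.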

    Development of JavaScript-based deep learning platform and application to distributed training

    Masatoshi Hidaka, Ken Miura, Tatsuya Harada
    Comments: Workshop paper for ICLR2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)

    Deep learning is increasingly attracting attention for processing big data.
    Existing frameworks for deep learning must be set up on specialized computer
    systems. Gaining sufficient computing resources therefore entails high costs of
    deployment and maintenance. In this work, we implement a matrix library and
    deep learning framework that uses JavaScript. It can run on web browsers
    operating on ordinary personal computers and smartphones. Using JavaScript,
    deep learning can be accomplished in widely diverse environments without the
    necessity for software installation. Using GPGPU via the WebCL framework, our
    framework can train large scale convolutional neural networks such as VGGNet
    and ResNet. In the experiments, we demonstrate its practicality by training
    VGGNet in a distributed manner using web browsers as clients.

    A Region Based Easy Path Wavelet Transform For Sparse Image Representation

    Renato Budinich
    Subjects: Information Theory (cs.IT); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

    The Easy Path Wavelet Transform is an adaptive transform for bivariate
    functions (in particular natural images) which has been proposed in [1]. It
    provides a sparse representation by finding a path in the domain of the
    function leveraging the local correlations of the function values. It then
    applies a one dimensional wavelet transform to the obtained vector, decimates
    the points and iterates the procedure. The main drawback of this method is the
    need to store, for each level of the transform, the path which vectorizes the
    two dimensional data. Here we propose a variation on the method which consists
    of firstly applying a segmentation procedure to the function domain,
    partitioning it into regions where the variation in the function values is low;
    in a second step, inside each such region, a path is found in some
    deterministic way, i.e. not data-dependent. This circumvents the need to store
    the paths at each level, while still obtaining good quality lossy compression.
    This method is particularly well suited to encode a Region of Interest in the
    image with different quality than the rest of the image.

    [1] Gerlind Plonka. The easy path wavelet transform: A new adaptive wavelet
    transform for sparse representation of two-dimensional data. Multiscale
    Modeling & Simulation, 7(3):1474-1496, 2008.

    Low Rank Matrix Recovery with Simultaneous Presence of Outliers and Sparse Corruption

    Mostafa Rahmani, George Atia
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We study a data model in which the data matrix D can be expressed as D = L +
    S + C, where L is a low rank matrix, S an element-wise sparse matrix and C a
    matrix whose non-zero columns are outlying data points. To date, robust PCA
    algorithms have solely considered models with either S or C, but not both. As
    such, existing algorithms cannot account for simultaneous element-wise and
    column-wise corruptions. In this paper, a new robust PCA algorithm that is
    robust to both types of corruption simultaneously is proposed. Our approach hinges on
    the sparse approximation of a sparsely corrupted column so that the sparse
    expansion of a column with respect to the other data points is used to
    distinguish a sparsely corrupted inlier column from an outlying data point. We
    also develop a randomized design which provides a scalable implementation of
    the proposed approach. The core idea of sparse approximation is analyzed
    theoretically, where we show that the underlying ℓ1-norm minimization can
    obtain the representation of an inlier in the presence of sparse corruptions.
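
The core step, representing one column as a sparse combination of the other columns via ℓ1-regularised least squares, can be sketched with ISTA (iterative shrinkage-thresholding). This is illustrative only; the paper's theoretical analysis and randomized design are not reproduced, and the low-rank toy data below is invented.

```python
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, b, lam, n_iter=500):
    """Minimise 0.5 * ||A x - b||^2 + lam * ||x||_1 by ISTA."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    obj = []
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - grad / L, lam / L)
        obj.append(0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x)))
    return x, obj

rng = np.random.default_rng(3)
# Toy low-rank data matrix: every inlier column lies in a rank-3 subspace,
# so it admits a representation in terms of the other columns.
D = rng.normal(size=(30, 3)) @ rng.normal(size=(3, 10))
b = D[:, 4]                      # the column to represent
A = np.delete(D, 4, axis=1)      # the remaining columns
x, obj = ista(A, b, lam=0.1)
print(f"objective: {obj[0]:.3f} -> {obj[-1]:.3f}")
```

With step size 1/L, the ISTA objective is non-increasing, and for an inlier column of a low-rank matrix the residual becomes small with a sparse coefficient vector.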


    Artificial Intelligence

    Extracting Lifted Mutual Exclusion Invariants from Temporal Planning Domains

    Sara Bernardini, Fabio Fagnani, David E. Smith
    Subjects: Artificial Intelligence (cs.AI)

    We present a technique for automatically extracting mutual exclusion
    invariants from temporal planning instances. It first identifies a set of
    invariant templates by inspecting the lifted representation of the domain and
    then checks these templates against properties that assure invariance. Our
    technique builds on other approaches to invariant synthesis presented in the
    literature, but departs from their limited focus on instantaneous actions by
    addressing temporal domains. To deal with time, we formulate invariance
    conditions that account for the entire structure of the actions and the
    possible concurrent interactions between them. As a result, we construct a
    significantly more comprehensive technique than previous methods, which is able
    to find not only invariants for temporal domains, but also a broader set of
    invariants for non-temporal domains. The experimental results reported in this
    paper provide evidence that identifying a broader set of invariants results in
    the generation of fewer multi-valued state variables with larger domains. We
    show that, in turn, this reduction in the number of variables reflects
    positively on the performance of a number of temporal planners that use a
    variable/value representation by significantly reducing their running time.

    ASHACL: Alternative Shapes Constraint Language

    Peter F. Patel-Schneider
    Comments: 17 pages
    Subjects: Artificial Intelligence (cs.AI)

    ASHACL, a variant of the W3C Shapes Constraint Language, is designed to
    determine whether an RDF graph meets some conditions. These conditions are
    grouped into shapes, which validate whether particular RDF terms each meet the
    constraints of the shape. Shapes are themselves expressed as RDF triples in an
    RDF graph, called a shapes graph.

    Solving the Brachistochrone Problem by an Influence Diagram

    Jiří Vomlel
    Comments: 8 pages, 2 figures
    Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI)

    Influence diagrams are a decision-theoretic extension of probabilistic
    graphical models. In this paper we show how they can be used to solve the
    Brachistochrone problem. We present results of numerical experiments on this
    problem and compare the solution provided by the influence diagram with the
    optimal solution. The R code used for the experiments is presented in the
    Appendix.
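
For readers unfamiliar with the problem: a bead sliding under gravity reaches the end point faster along a path that drops steeply at first than along the straight line. A quick numerical check on polyline paths (the endpoints and waypoint here are arbitrary; this is not the influence-diagram solution from the paper):

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def descent_time(path):
    """Travel time along a polyline; x horizontal, y measured downward from
    the start, so energy conservation gives speed v = sqrt(2 g y)."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(path[:-1], path[1:]):
        seg = np.hypot(x1 - x0, y1 - y0)
        if y1 == y0:
            total += seg / np.sqrt(2 * G * y0)
        else:
            # closed form of seg * integral_0^1 dt / sqrt(2 g (y0 + (y1-y0) t))
            total += seg * 2 * (np.sqrt(y1) - np.sqrt(y0)) / ((y1 - y0) * np.sqrt(2 * G))
    return total

straight = [(0.0, 0.0), (1.0, 1.0)]
steep_first = [(0.0, 0.0), (0.2, 0.8), (1.0, 1.0)]
t_straight = descent_time(straight)
t_steep = descent_time(steep_first)
print(f"straight: {t_straight:.4f} s, steep-first: {t_steep:.4f} s")
```

The steep-first polyline already beats the straight line; the true optimum is the cycloid, which a fine enough discretisation (as in the influence-diagram formulation) approaches.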

    Representations of language in a model of visually grounded speech signal

    Grzegorz Chrupała, Lieke Gelderloos, Afra Alishahi
    Comments: 10 pages
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We present a visually grounded model of speech perception which projects
    spoken utterances and images to a joint semantic space. We use a multi-layer
    recurrent highway network to model the temporal nature of spoken speech, and
    show that it learns to extract both form and meaning-based linguistic knowledge
    from the input signal. We carry out an in-depth analysis of the representations
    used by different components of the trained model and show that encoding of
    semantic aspects tends to become richer as we go up the hierarchy of layers,
    whereas encoding of form-related aspects of the language input tends to
    initially increase and then plateau or decrease.

    Learning what matters – Sampling interesting patterns

    Vladimir Dzyuba, Matthijs van Leeuwen
    Comments: PAKDD 2017, extended version
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Databases (cs.DB)

    In the field of exploratory data mining, local structure in data can be
    described by patterns and discovered by mining algorithms. Although many
    solutions have been proposed to address the redundancy problems in pattern
    mining, most of them either provide succinct pattern sets or take the interests
    of the user into account, but not both. Consequently, the analyst has to invest
    substantial effort in identifying those patterns that are relevant to her
    specific interests and goals. To address this problem, we propose a novel
    approach that combines pattern sampling with interactive data mining. In
    particular, we introduce the LetSIP algorithm, which builds upon recent
    advances in 1) weighted sampling in SAT and 2) learning to rank in interactive
    pattern mining. Specifically, it exploits user feedback to directly learn the
    parameters of the sampling distribution that represents the user’s interests.
    We compare the performance of the proposed algorithm to the state-of-the-art in
    interactive pattern mining by emulating the interests of a user. The resulting
    system allows efficient and interleaved learning and sampling, and thus
    user-specific anytime data exploration. Finally, LetSIP demonstrates favourable
    trade-offs concerning both quality-diversity and exploitation-exploration when
    compared to existing methods.


    Information Retrieval

    First Study on Data Readiness Level

    Hui Guan, Thanos Gentimis, Hamid Krim, James Keiser
    Comments: 9 pages
    Subjects: Information Retrieval (cs.IR); General Literature (cs.GL)

    We introduce the idea of Data Readiness Level (DRL) to measure the relative
    richness of data to answer specific questions often encountered by data
    scientists. We first approach the problem in its full generality explaining its
    desired mathematical properties and applications and then we propose and study
    two DRL metrics. Specifically, we define DRL as a function of at least four
    properties of data: Noisiness, Believability, Relevance, and Coherence. Two
    information-theoretic metrics, Cosine Similarity and Document Disparity, are
    proposed as indicators of Relevance and Coherence for a piece of data. The
    proposed metrics are validated through a text-based experiment using Twitter
    data.
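
Of the two proposed indicators, cosine similarity is the more standard and fits in a few lines of NumPy; the Document Disparity metric and the DRL aggregation are not reproduced here, and the toy term-count vectors are invented.

```python
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy term-count vectors for a query topic and two candidate documents.
topic = [3, 1, 0, 2]
doc_relevant = [2, 1, 0, 3]
doc_offtopic = [0, 0, 5, 0]
print(cosine_similarity(topic, doc_relevant))  # high: shares the topic's terms
print(cosine_similarity(topic, doc_offtopic))  # zero: no terms in common
```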

    Volatility Prediction using Financial Disclosures Sentiments with Word Embedding-based IR Models

    Navid Rekabsaz, Mihai Lupu, Artem Baklanov, Allan Hanbury, Alexander Duer, Linda Anderson
    Subjects: Information Retrieval (cs.IR); Computational Engineering, Finance, and Science (cs.CE)

    Volatility prediction, an essential concept in financial markets, has
    recently been addressed using sentiment analysis methods. We investigate the
    sentiment of annual disclosures of companies in stock markets to forecast
    volatility. We specifically explore the use of recent Information Retrieval
    (IR) term weighting models that are effectively extended by related terms using
    word embeddings. In parallel to textual information, factual market data have
    been widely used as the mainstream approach to forecast market risk. We
    therefore study different fusion methods to combine text and market data
    resources. Our word embedding-based approach significantly outperforms
    state-of-the-art methods. In addition, we investigate the characteristics of
    the reports of the companies in different financial sectors.

    Effects of Stop Words Elimination for Arabic Information Retrieval: A Comparative Study

    Ibrahim Abu El-Khair
    Journal-ref: Abu El-Khair, Ibrahim. (2006). Effects of stop words elimination
    for Arabic information retrieval: a comparative study. International Journal
    of Computing & Information Sciences 4 (3), 119-133
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    The effectiveness of three stop word lists for Arabic Information Retrieval
    (a general stoplist, a corpus-based stoplist, and a combined stoplist) was
    investigated in this study. Three popular weighting schemes were examined:
    the inverse document frequency weight, probabilistic weighting, and
    statistical language modelling. The idea is to combine statistical approaches
    with linguistic approaches to reach optimal performance, and to compare their
    effect on retrieval. The LDC (Linguistic Data Consortium) Arabic Newswire
    data set was used with the Lemur Toolkit. The Best Match (BM25) weighting
    scheme used in the Okapi retrieval system had the best overall performance of
    the three weighting algorithms; stoplists improved retrieval effectiveness,
    especially when used with the BM25 weight. The overall performance of the
    general stoplist was better than that of the other two lists.
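
Stop-word elimination itself is a simple preprocessing step applied before weighting. A minimal illustration (the tiny English corpus and stoplist are invented; the study itself uses Arabic newswire and the Lemur Toolkit):

```python
import math

# Hypothetical general stoplist for a toy English corpus.
stoplist = {"the", "of", "in", "a", "and", "is"}

docs = [
    "the effectiveness of stop words in retrieval",
    "retrieval of arabic documents is a hard task",
    "weighting schemes and retrieval effectiveness",
]

def tokenize(text, remove_stopwords):
    return [t for t in text.split()
            if not (remove_stopwords and t in stoplist)]

def idf(term, tokenized_docs):
    """Inverse document frequency: log(N / df)."""
    n = sum(1 for d in tokenized_docs if term in d)
    return math.log(len(tokenized_docs) / n) if n else 0.0

no_stop = [tokenize(d, True) for d in docs]
print(no_stop[0])                           # stop words removed
print(round(idf("retrieval", no_stop), 3))  # appears in all docs: idf 0
```

Because stop words occur in nearly every document, removing them shrinks the index while leaving the discriminative idf weights of content terms unchanged.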


    Computation and Language

    Fast and Accurate Sequence Labeling with Iterated Dilated Convolutions

    Emma Strubell, Patrick Verga, David Belanger, Andrew McCallum
    Subjects: Computation and Language (cs.CL)

    Bi-directional LSTMs have emerged as a standard method for obtaining
    per-token vector representations serving as input to various token labeling
    tasks (whether followed by Viterbi prediction or independent classification).
    This paper proposes an alternative to Bi-LSTMs for this purpose: iterated
    dilated convolutional neural networks (ID-CNNs), which have better capacity
    than traditional CNNs for large context and structured prediction. We describe
    a distinct combination of network structure, parameter sharing and training
    procedures that is not only more accurate than Bi-LSTM-CRFs, but also 8x faster
    at test time on long sequences. Moreover, ID-CNNs with independent
    classification enable a dramatic 14x test-time speedup, while still attaining
    accuracy comparable to the Bi-LSTM-CRF. We further demonstrate the ability of
    ID-CNNs to combine evidence over long sequences by demonstrating their improved
    accuracy on whole-document (rather than per-sentence) inference. Unlike LSTMs
    whose sequential processing on sentences of length N requires O(N) time even in
    the face of parallelism, ID-CNNs permit fixed-depth convolutions to run in
    parallel across entire documents. Today when many companies run basic NLP on
    the entire web and large-volume traffic, faster methods are paramount to saving
    time and energy costs.
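
The operation ID-CNNs stack is the dilated convolution: with dilation d, the filter taps are d positions apart, so a fixed-depth stack with growing dilations covers exponentially growing context. A generic 1-D sketch (not the paper's architecture):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """'Valid' 1-D convolution with the given dilation (no kernel flipping)."""
    k = len(kernel)
    span = (k - 1) * dilation  # input positions the filter reaches across
    out_len = len(x) - span
    return np.array([sum(kernel[j] * x[i + j * dilation] for j in range(k))
                     for i in range(out_len)])

x = np.arange(1.0, 9.0)  # [1, 2, ..., 8]
print(dilated_conv1d(x, [1.0, 1.0], dilation=1))  # sums of adjacent values
print(dilated_conv1d(x, [1.0, 1.0], dilation=2))  # sums of values two apart

# Receptive field of a stack of layers: 1 + sum over layers of (k - 1) * d.
dilations, k = [1, 2, 4, 8], 3
rf = 1 + sum((k - 1) * d for d in dilations)
print(f"receptive field with kernel {k} and dilations {dilations}: {rf}")
```

Four layers of width-3 filters with doubling dilations already see 31 tokens of context, while every output position can be computed in parallel.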

    Characterisation of speech diversity using self-organising maps

    Tom A. F. Anderson, David M. W. Powers
    Comments: 16th Speech Science and Technology Conference (SST2016)
    Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD)

    We report investigations into speaker classification of larger quantities of
    unlabelled speech data using small sets of manually phonemically annotated
    speech. The Kohonen speech typewriter is a semi-supervised method comprising
    self-organising maps (SOMs) that achieves low phoneme error rates. A SOM is a
    2D array of cells that learn vector representations of the data based on
    neighbourhoods. In this paper, we report a method to evaluate pronunciation
    using multilevel SOMs with /hVd/ single syllable utterances for the study of
    vowels, for Australian pronunciation.

    Knowledge Adaptation: Teaching to Adapt

    Sebastian Ruder, Parsa Ghaffari, John G. Breslin
    Comments: 11 pages, 4 figures, 2 tables
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Domain adaptation is crucial in many real-world applications where the
    distribution of the training data differs from the distribution of the test
    data. Previous Deep Learning-based approaches to domain adaptation need to be
    trained jointly on source and target domain data and are therefore unappealing
    in scenarios where models need to be adapted to a large number of domains or
    where a domain is evolving, e.g. spam detection where attackers continuously
    change their tactics.

    To fill this gap, we propose Knowledge Adaptation, an extension of Knowledge
    Distillation (Bucilua et al., 2006; Hinton et al., 2015) to the domain
    adaptation scenario. We show how a student model achieves state-of-the-art
    results on unsupervised domain adaptation from multiple sources on a standard
    sentiment analysis benchmark by taking into account the domain-specific
    expertise of multiple teachers and the similarities between their domains.

    When learning from a single teacher, using domain similarity to gauge
    trustworthiness is inadequate. To this end, we propose a simple metric that
    correlates well with the teacher’s accuracy in the target domain. We
    demonstrate that incorporating high-confidence examples selected by this metric
    enables the student model to achieve state-of-the-art performance in the
    single-source scenario.
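
The mechanism Knowledge Distillation (and hence Knowledge Adaptation) builds on is temperature-scaled soft targets: raising the temperature T softens the teacher's output distribution, exposing inter-class similarity to the student. A NumPy sketch with made-up logits:

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    z = np.asarray(logits, float) / T
    z -= z.max()           # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

teacher_logits = [4.0, 1.0, -2.0]   # hypothetical teacher outputs
hard = softmax_with_temperature(teacher_logits, T=1.0)
soft = softmax_with_temperature(teacher_logits, T=5.0)
print(np.round(hard, 3))  # peaked: probability mass on class 0
print(np.round(soft, 3))  # softened: secondary classes become visible
```

The student is trained against the softened distribution (plus the true labels), so it receives a richer signal than one-hot targets; the predicted class itself is unchanged by the temperature.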

    Representations of language in a model of visually grounded speech signal

    Grzegorz Chrupała, Lieke Gelderloos, Afra Alishahi
    Comments: 10 pages
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We present a visually grounded model of speech perception which projects
    spoken utterances and images to a joint semantic space. We use a multi-layer
    recurrent highway network to model the temporal nature of spoken speech, and
    show that it learns to extract both form and meaning-based linguistic knowledge
    from the input signal. We carry out an in-depth analysis of the representations
    used by different components of the trained model and show that encoding of
    semantic aspects tends to become richer as we go up the hierarchy of layers,
    whereas encoding of form-related aspects of the language input tends to
    initially increase and then plateau or decrease.

    EliXa: A Modular and Flexible ABSA Platform

    Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri
    Comments: 5 pages, conference
    Journal-ref: Proceedings of the 9th International Workshop on Semantic
    Evaluation (SemEval 2015). Association for Computational Linguistics, June
    2015, Denver, Colorado, pp.748-752
    Subjects: Computation and Language (cs.CL)

    This paper presents a supervised Aspect Based Sentiment Analysis (ABSA)
    system. Our aim is to develop a modular platform which makes it easy to conduct
    experiments by replacing modules or adding new features. We obtain the best
    result in the Opinion Target Extraction (OTE) task (slot 2) using an
    off-the-shelf sequence labeler. The target polarity classification (slot 3) is
    addressed by means of a multiclass SVM algorithm which includes lexical based
    features such as the polarity values obtained from domain and open polarity
lexicons. The system obtains accuracies of 0.70 and 0.73 for the restaurant and
laptop domains, respectively, and performs second best on the out-of-domain
hotel reviews, achieving an accuracy of 0.80.

    A Knowledge-Grounded Neural Conversation Model

    Marjan Ghazvininejad, Chris Brockett, Ming-Wei Chang, Bill Dolan, Jianfeng Gao, Wen-tau Yih, Michel Galley
    Comments: 10 pages
    Subjects: Computation and Language (cs.CL)

    Neural network models are capable of generating extremely natural sounding
    conversational interactions. Nevertheless, these models have yet to demonstrate
    that they can incorporate content in the form of factual information or
    entity-grounded opinion that would enable them to serve in more task-oriented
    conversational applications. This paper presents a novel, fully data-driven,
    and knowledge-grounded neural conversation model aimed at producing more
    contentful responses without slot filling. We generalize the widely-used
    Seq2Seq approach by conditioning responses on both conversation history and
    external “facts”, allowing the model to be versatile and applicable in an
    open-domain setting. Our approach yields significant improvements over a
    competitive Seq2Seq baseline. Human judges found that our outputs are
    significantly more informative.

    Effects of Stop Words Elimination for Arabic Information Retrieval: A Comparative Study

    Ibrahim Abu El-Khair
    Journal-ref: Abu El-Khair, Ibrahim. (2006). Effects of stop words elimination
    for Arabic information retrieval: a comparative study. International Journal
    of Computing & Information Sciences 4 (3), 119-133
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

The effectiveness of three stop word lists for Arabic Information
Retrieval (a general stoplist, a corpus-based stoplist, and a combined
stoplist) was investigated in this study. Three popular weighting schemes were
examined: the inverse document frequency weight, probabilistic weighting, and
statistical language modelling. The idea is to combine statistical approaches
with linguistic approaches to reach optimal performance, and to compare their
effect on retrieval. The LDC (Linguistic Data Consortium) Arabic Newswire data
set was used with the Lemur Toolkit. The Best Match (BM25) weighting scheme
used in the Okapi retrieval system had the best overall performance of the
three weighting algorithms used in the study. Stoplists improved retrieval
effectiveness, especially when used with the BM25 weight, and the overall
performance of the general stoplist was better than that of the other two lists.
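The stoplist step itself is mechanically simple: terms on the list are dropped before indexing. A minimal sketch, using a hypothetical, tiny English stand-in for the study's Arabic stoplists:

```python
# Sketch: applying a (hypothetical, tiny) general stoplist before indexing.
# The study used full Arabic stoplists with the Lemur Toolkit; the words
# below are only illustrative English stand-ins.
GENERAL_STOPLIST = {"the", "of", "a", "and", "in"}

def remove_stopwords(tokens, stoplist):
    """Drop stoplist terms; only the remaining terms are indexed."""
    return [t for t in tokens if t.lower() not in stoplist]

doc = "the effects of stop words in a retrieval system".split()
print(remove_stopwords(doc, GENERAL_STOPLIST))
# -> ['effects', 'stop', 'words', 'retrieval', 'system']
```

The study's finding is that removing such high-frequency, low-content terms improves ranking, particularly under BM25 weighting.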

    Comparative Study of CNN and RNN for Natural Language Processing

    Wenpeng Yin, Katharina Kann, Mo Yu, Hinrich Schütze
    Comments: 7 pages, 11 figures
    Subjects: Computation and Language (cs.CL)

    Deep neural networks (DNN) have revolutionized the field of natural language
    processing (NLP). Convolutional neural network (CNN) and recurrent neural
    network (RNN), the two main types of DNN architectures, are widely explored to
handle various NLP tasks. CNNs are reputed to be good at extracting
position-invariant features, and RNNs at modeling units in sequence. The state
of the art on many NLP tasks often alternates between CNNs and RNNs. This work
is the first systematic comparison of CNN and RNN on a wide
    range of representative NLP tasks, aiming to give basic guidance for DNN
    selection.

    The Effect of Different Writing Tasks on Linguistic Style: A Case Study of the ROC Story Cloze Task

    Roy Schwartz, Maarten Sap, Ioannis Konstas, Leila Zilles, Yejin Choi, Noah A. Smith
    Subjects: Computation and Language (cs.CL)

    A writer’s style depends not just on personal traits but also on her intent
    and mental state. In this paper, we show how variants of the same writing task
    can lead to measurable differences in writing style. We present a case study
    based on the story cloze task (Mostafazadeh et al., 2016a), where annotators
    were assigned similar writing tasks with different constraints: (1) writing an
    entire story, (2) adding a story ending for a given story context, and (3)
    adding an incoherent ending to a story. We show that a simple linear classifier
    informed with stylistic features is able to successfully distinguish between
    the three cases, without even looking at the story context. In addition, our
    style-based classifier establishes a new state-of-the-art result on the story
    cloze challenge, substantially higher than previous results based on deep
    learning models. Our results demonstrate that different task framings can
    dramatically affect the way people write.

    Neural Discourse Structure for Text Categorization

    Yangfeng Ji, Noah Smith
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    We show that discourse structure, as defined by Rhetorical Structure Theory
    and provided by an existing discourse parser, benefits text categorization. Our
    approach uses a recursive neural network and a newly proposed attention
    mechanism to compute a representation of the text that focuses on salient
    content, from the perspective of both RST and the task. Experiments consider
    variants of the approach and illustrate its strengths and weaknesses.

    Living a discrete life in a continuous world: Reference with distributed representations

    Gemma Boleda, Sebastian Padó, Nghia The Pham, Marco Baroni
    Comments: Under review at ACL 2017. 6 pages, 1 figure
    Subjects: Computation and Language (cs.CL)

    Reference is the crucial property of language that allows us to connect
    linguistic expressions to the world. Modeling it requires handling both
    continuous and discrete aspects of meaning. Data-driven models excel at the
    former, but struggle with the latter, and the reverse is true for symbolic
    models.

    We propose a fully data-driven, end-to-end trainable model that, while
    operating on continuous multimodal representations, learns to organize them
    into a discrete-like entity library. We also introduce a referential task to
    test it, cross-modal tracking. Our model beats standard neural network
    architectures, but is outperformed by some parametrizations of Memory Networks,
    another model with external memory.

    Beam Search Strategies for Neural Machine Translation

    Markus Freitag, Yaser Al-Onaizan
    Subjects: Computation and Language (cs.CL)

The basic concept in Neural Machine Translation (NMT) is to train a large
neural network that maximizes the translation performance on a given parallel
corpus. NMT then uses a simple left-to-right beam-search decoder to generate
new translations that approximately maximize the trained conditional
probability. The current beam search strategy generates the target sentence
word by word from left to right while keeping a fixed number of active
candidates at each time step. First, this simple search is less adaptive, as it
also expands candidates whose scores are much worse than the current best.
Second, it does not expand hypotheses that fall outside the best-scoring
candidates, even if their scores are close to the best one. The latter can be
avoided by increasing the beam size until no further performance improvement is
observed. While this can reach better performance, it has the drawback of
slower decoding speed. In this paper, we concentrate on speeding up the decoder
by applying a more flexible beam search strategy whose candidate size may vary
at each time step depending on the candidate scores. We speed up the original
decoder by up to 43% for the two language pairs German-English and
Chinese-English without losing any translation quality.
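The flexible strategy described above can be sketched as a per-step pruning rule: cap the beam, but also drop any hypothesis whose score falls too far below the step's best. A minimal sketch, where `rel_threshold` and `max_beam` are illustrative values (not the paper's), and scores are plain probabilities rather than the log-probabilities a real decoder would use:

```python
def prune_beam(candidates, max_beam=10, rel_threshold=0.6):
    """Flexible beam: keep at most max_beam hypotheses, but also drop any
    candidate whose score falls below rel_threshold * (best score at this
    time step), so the number of expanded candidates varies per step."""
    best = max(score for _, score in candidates)
    kept = [(hyp, s) for hyp, s in candidates if s >= rel_threshold * best]
    kept.sort(key=lambda pair: -pair[1])  # best-scoring first
    return kept[:max_beam]

step = [("hyp-a", 0.50), ("hyp-b", 0.45), ("hyp-c", 0.10)]
# hyp-c scores far below the current best and is not expanded further.
print(prune_beam(step))
```

Because weak candidates are discarded early, fewer hypotheses are expanded per step, which is the source of the decoding speed-up.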

    Ensemble Distillation for Neural Machine Translation

    Markus Freitag, Yaser Al-Onaizan, Baskaran Sankaran
    Subjects: Computation and Language (cs.CL)

Knowledge distillation describes a method for training a student network to
perform better by learning from a stronger teacher network. In this work, we
run experiments with different kinds of teacher networks to enhance the
translation performance of a student Neural Machine Translation (NMT) network.
We demonstrate techniques based on an ensemble and on a best-BLEU teacher
network. We also show how to benefit from a teacher network that has the same
architecture and dimensions as the student network. Furthermore, we introduce
a data filtering technique based on the dissimilarity between the forward
translation (obtained during knowledge distillation) of a given source sentence
and its target reference, using TER to measure dissimilarity. Finally, we show
that an ensemble teacher model can significantly reduce the student model size
while still yielding performance improvements over the baseline student
network.
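At the heart of knowledge distillation is training the student against the teacher's soft output distribution rather than hard labels. A minimal sketch of that loss, with illustrative logits (the paper's NMT setup, temperatures, and ensemble averaging are not reproduced here):

```python
import math

def softmax(logits, T=1.0):
    """Numerically stable softmax with temperature T."""
    m = max(logits)
    exps = [math.exp((l - m) / T) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def distill_loss(student_logits, teacher_logits, T=1.0):
    """Cross-entropy of the student against the teacher's soft targets.
    With an ensemble teacher, teacher_logits would be the combined
    output of the ensemble members."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -sum(p * math.log(q) for p, q in zip(p_teacher, p_student))

print(distill_loss([2.0, 0.5, -1.0], [2.2, 0.4, -1.1]))
```

The loss is minimized when the student's distribution matches the teacher's, which is what lets a small student absorb an ensemble teacher's behavior.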

    Multi-task Coupled Attentions for Category-specific Aspect and Opinion Terms Co-extraction

    Wenya Wang, Sinno Jialin Pan, Daniel Dahlmeier
    Subjects: Computation and Language (cs.CL)

    In aspect-based sentiment analysis, most existing methods either focus on
    aspect/opinion terms extraction or aspect terms categorization. However, each
    task by itself only provides partial information to end users. To generate more
    detailed and structured opinion analysis, we propose a finer-grained problem,
    which we call category-specific aspect and opinion terms extraction. This
    problem involves the identification of aspect and opinion terms within each
    sentence, as well as the categorization of the identified terms. To this end,
    we propose an end-to-end multi-task attention model, where each task
    corresponds to aspect/opinion terms extraction for a specific category. Our
    model benefits from exploring the commonalities and relationships among
    different tasks to address the data sparsity issue. We demonstrate its
    state-of-the-art performance on three benchmark datasets.


    Distributed, Parallel, and Cluster Computing

    Achieving Dilution without Knowledge of Coordinates in the SINR Model

    William K. Moses Jr., Shailesh Vaya
    Comments: 10 pages
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Considerable literature has been developed for various fundamental
    distributed problems in the SINR (Signal-to-Interference-plus-Noise-Ratio)
    model for radio transmission. A setting typically studied is when all nodes
    transmit a signal of the same strength, and each device only has access to
    knowledge about the total number of nodes in the network (n), the range from
    which each node’s label is taken ([1,dots,N]), and the label of the device
itself. In addition, an assumption is made that each node also knows its
coordinates in the Euclidean plane. In this paper, we develop a technique which
allows algorithm designers to remove that last assumption; assuming that the
physical coordinates of the nodes are unavailable truly captures the `ad-hoc'
nature of wireless networks.

    Previous work in this area uses a flavor of a technique called dilution, in
    which nodes transmit in a (predetermined) round-robin fashion, and are able to
reach all their neighbors. However, without knowing their physical coordinates,
nodes cannot determine the coordinates of their containing (pivotal) grid box,
and it seems impossible to use dilution to coordinate their transmissions. We
propose a new technique to achieve dilution without using
    the knowledge of physical coordinates. This technique exploits the
    understanding that the transmitting nodes lie in 2-D space, segmented by an
    appropriate pivotal grid, without explicitly referring to the actual physical
    coordinates of these nodes. Using this technique, it is possible for every weak
    device to successfully transmit its message to all of its neighbors in
    (Theta(lg N)) rounds, as long as the density of transmitting nodes in any
    physical grid box is bounded by a known constant. This technique, we feel, is
    an important generic tool for devising practical protocols when physical
    coordinates of the nodes are not known.
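For context, the classical dilution schedule the paper emulates is coordinate-based: a node's grid box determines its transmission round. A minimal sketch of that baseline (the paper's contribution is achieving the same spacing when the box coordinates are unknown, which this sketch does not attempt):

```python
def classical_dilution_slot(x, y, d):
    """Classical coordinate-based dilution: a node whose pivotal grid box
    has coordinates (x, y) transmits only in the round whose index matches
    its box class modulo the dilution parameter d. Boxes a multiple of d
    apart in both axes share a slot, so simultaneous transmitters are
    spatially well separated and interference stays bounded."""
    return (x % d) * d + (y % d)

# Boxes (0, 0) and (3, 3) with d = 3 share slot 0 and transmit together.
print(classical_dilution_slot(0, 0, 3), classical_dilution_slot(3, 3, 3))
```

Every box falls into one of d*d slot classes, giving the round-robin schedule in which all neighbors are eventually reached.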

    Unobtrusive Deferred Update Stabilization for Efficient Geo-Replication

    Chathuri Gunawardhana, Manuel Bravo, Luís Rodrigues
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB)

In this paper we propose a novel approach to manage the throughput vs. latency
tradeoff that emerges when managing updates in geo-replicated systems. Our
approach consists of allowing full concurrency when processing local updates
and using a deferred local serialisation procedure before shipping updates to
remote datacenters. This strategy makes it possible to implement inexpensive
mechanisms that ensure system consistency requirements while avoiding intrusive
effects on update operations, a major performance limitation of previous
systems. We have
    implemented our approach as a variant of Riak KV. Our extensive evaluation
    shows that we outperform sequencer-based approaches by almost an order of
    magnitude in the maximum achievable throughput. Furthermore, unlike previous
    sequencer-free solutions, our approach reaches nearly optimal remote update
    visibility latencies without limiting throughput.

    Model-driven Scheduling for Distributed Stream Processing Systems

    Anshu Shukla, Yogesh Simmhan
    Comments: 54 pages
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Distributed Stream Processing frameworks have become common with the
evolution of the Internet of Things (IoT). These frameworks are designed to
adapt to dynamic input message rates by scaling in/out. Apache Storm,
originally developed by Twitter, is a widely used stream processing engine;
others include Flink and Spark Streaming. To run streaming applications
successfully, the optimal resource requirement must be known, as
over-estimation of resources adds extra cost. We therefore need a strategy to
determine the optimal resource requirement for a given streaming application. In
    this article, we propose a model-driven approach for scheduling streaming
    applications that effectively utilizes a priori knowledge of the applications
    to provide predictable scheduling behavior. Specifically, we use application
    performance models to offer reliable estimates of the resource allocation
    required. Further, this intuition also drives resource mapping, and helps
    narrow the estimated and actual dataflow performance and resource utilization.
    Together, this model-driven scheduling approach gives a predictable application
    performance and resource utilization behavior for executing a given DSPS
    application at a target input stream rate on distributed resources.
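The core of such model-driven allocation is using each task's measured performance model to size its resources for a target input rate. A highly simplified sketch (task names, peak rates, and the target rate below are hypothetical, and the paper's actual models are richer than a single peak-rate number):

```python
import math

def required_slots(input_rate, peak_rate_per_slot):
    """Estimate resource slots for one dataflow task from its performance
    model: enough slots that aggregate peak throughput covers the target
    input rate. A simplification of model-based allocation."""
    return math.ceil(input_rate / peak_rate_per_slot)

# Hypothetical dataflow: task -> peak messages/sec one slot sustains,
# as measured offline by the performance model.
peak = {"parse": 800.0, "filter": 2000.0, "aggregate": 500.0}
target_rate = 1500.0  # target input stream rate, msgs/sec

alloc = {task: required_slots(target_rate, r) for task, r in peak.items()}
print(alloc)  # -> {'parse': 2, 'filter': 1, 'aggregate': 3}
```

Because the allocation comes from a priori measurements rather than reactive scaling, the resulting resource usage and dataflow performance are predictable before deployment.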

    Development of JavaScript-based deep learning platform and application to distributed training

    Masatoshi Hidaka, Ken Miura, Tatsuya Harada
    Comments: Workshop paper for ICLR2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)

Deep learning is increasingly attracting attention for processing big data.
Existing frameworks for deep learning must be set up on specialized computer
systems; gaining sufficient computing resources therefore entails high
deployment and maintenance costs. In this work, we implement a matrix library
and deep learning framework in JavaScript. It can run on web browsers
operating on ordinary personal computers and smartphones. Using JavaScript,
deep learning can be accomplished in widely diverse environments without the
need for software installation. Using GPGPU via the WebCL framework, our
    framework can train large scale convolutional neural networks such as VGGNet
    and ResNet. In the experiments, we demonstrate their practicality by training
    VGGNet in a distributed manner using web browsers as the client.


    Learning

    Preference-based Teaching

    Ziyuan Gao, Christoph Ries, Hans Ulrich Simon, Sandra Zilles
    Comments: 35 pages
    Subjects: Learning (cs.LG)

    We introduce a new model of teaching named “preference-based teaching” and a
    corresponding complexity parameter—the preference-based teaching dimension
    (PBTD)—representing the worst-case number of examples needed to teach any
    concept in a given concept class. Although the PBTD coincides with the
    well-known recursive teaching dimension (RTD) on finite classes, it is
    radically different on infinite ones: the RTD becomes infinite already for
trivial infinite classes (such as half-intervals), whereas the PBTD evaluates
to reasonably small values for a wide collection of infinite classes, including
classes of so-called closed sets w.r.t. a given closure operator, various
classes related to linear sets over (mathbb{N}_0) (whose RTD has been studied
quite recently), and the class of Euclidean half-spaces. On top of presenting
these concrete results, we provide the reader
    with a theoretical framework (of a combinatorial flavor) which helps to derive
    bounds on the PBTD.

    Empirical Risk Minimization for Stochastic Convex Optimization: (O(1/n))- and (O(1/n^2))-type of Risk Bounds

    Lijun Zhang, Tianbao Yang, Rong Jin
    Subjects: Learning (cs.LG)

Although there exist plentiful theories of empirical risk minimization (ERM)
for supervised learning, current theoretical understanding of ERM for a
related problem, stochastic convex optimization (SCO), is limited. In this
work, we strengthen the realm of ERM for SCO by exploiting smoothness and
    strong convexity conditions to improve the risk bounds. First, we establish an
    (widetilde{O}(d/n + sqrt{F_*/n})) risk bound when the random function is
    nonnegative, convex and smooth, and the expected function is Lipschitz
    continuous, where (d) is the dimensionality of the problem, (n) is the number
    of samples, and (F_*) is the minimal risk. Thus, when (F_*) is small we obtain
    an (widetilde{O}(d/n)) risk bound, which is analogous to the
    (widetilde{O}(1/n)) optimistic rate of ERM for supervised learning. Second, if
    the objective function is also (lambda)-strongly convex, we prove an
    (widetilde{O}(d/n + kappa F_*/n )) risk bound where (kappa) is the condition
    number, and improve it to (O(1/[lambda n^2] + kappa F_*/n)) when
    (n=widetilde{Omega}(kappa d)). As a result, we obtain an (O(kappa/n^2))
    risk bound under the condition that (n) is large and (F_*) is small, which to
    the best of our knowledge, is the first (O(1/n^2))-type of risk bound of ERM.
    Third, we stress that the above results are established in a unified framework,
    which allows us to derive new risk bounds under weaker conditions, e.g.,
    without convexity of the random function and Lipschitz continuity of the
    expected function. Finally, we demonstrate that to achieve an (O(1/[lambda
    n^2] + kappa F_*/n)) risk bound for supervised learning, the
    (widetilde{Omega}(kappa d)) requirement on (n) can be replaced with
    (Omega(kappa^2)), which is dimensionality-independent.

    Sparse Algorithm for Robust LSSVM in Primal Space

    Li Chen, Shuisheng Zhou
    Comments: 22 pages, 4 figures
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Thanks to its closed-form solution, the least squares support vector machine
(LSSVM) has been widely used for classification and regression problems, with
performance comparable to other types of SVMs. However, LSSVM has two
drawbacks: it is sensitive to outliers and lacks sparseness. Robust LSSVM
(R-LSSVM) partly overcomes the first drawback via a nonconvex truncated loss
function, but current algorithms for R-LSSVM, whose solutions are dense, still
face the second drawback and are inefficient for training large-scale
problems. In this paper, we interpret the robustness of R-LSSVM from a
re-weighted viewpoint and derive a primal R-LSSVM via the representer theorem.
The new model may have a sparse
    solution if the corresponding kernel matrix has low rank. Then approximating
    the kernel matrix by a low-rank matrix and smoothing the loss function by
    entropy penalty function, we propose a convergent sparse R-LSSVM (SR-LSSVM)
    algorithm to achieve the sparse solution of primal R-LSSVM, which overcomes two
    drawbacks of LSSVM simultaneously. The proposed algorithm has lower complexity
    than the existing algorithms and is very efficient for training large-scale
problems. Extensive experimental results illustrate that SR-LSSVM achieves
better or comparable performance with less training time than related
algorithms, especially on large-scale problems.

    Estimation of classrooms occupancy using a multi-layer perceptron

    Eugénio Rodrigues, Luísa Dias Pereira, Adélio Rodrigues Gaspar, Álvaro Gomes, Manuel Carlos Gameiro da Silva
    Comments: 6 pages, 2 figures, conference article
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

This paper presents a multi-layer perceptron model for estimating the number
of occupants in classrooms from sensed indoor environmental data: relative
humidity, air temperature, and carbon dioxide concentration. The modelling
datasets were collected from two classrooms in the Secondary School of Pombal,
Portugal. The number of occupants and occupation periods were obtained from
class attendance reports. However, post-class occupancy was unknown, and the
developed model is used to reconstruct the classrooms' occupancy by filling in
the unreported periods. Different model structures and combinations of
environmental variables were tested. The most accurate model had an input
vector of 10 variables: five averaged time intervals of relative humidity and
five of carbon dioxide concentration. The model presented a mean square error
of 1.99, a coefficient of determination of 0.96 (p-value < 0.001), and a mean
absolute error of 1 occupant. These results show promising estimation
capabilities under uncertain indoor environmental conditions.
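The 10-variable input described above is just two sensor series, each averaged over five sub-periods. A minimal sketch of constructing that input vector, with made-up readings (the paper's sampling rates and interval lengths are not specified here):

```python
def averaged_intervals(series, n_intervals=5):
    """Average a sensed series over n_intervals equal sub-periods."""
    k = len(series) // n_intervals
    return [sum(series[i * k:(i + 1) * k]) / k for i in range(n_intervals)]

# Hypothetical readings over one occupancy period (values are made up).
rh = [45, 46, 47, 48, 50, 51, 52, 53, 54, 55]               # rel. humidity, %
co2 = [600, 620, 650, 700, 760, 820, 880, 940, 990, 1040]   # CO2, ppm

# The best-performing model's input: five averaged intervals of relative
# humidity plus five of carbon dioxide concentration -> 10 variables.
input_vector = averaged_intervals(rh) + averaged_intervals(co2)
print(len(input_vector))  # 10
```

This vector would then be fed to the multi-layer perceptron, whose output is the estimated occupant count for that period.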

    Knowledge Adaptation: Teaching to Adapt

    Sebastian Ruder, Parsa Ghaffari, John G. Breslin
    Comments: 11 pages, 4 figures, 2 tables
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Domain adaptation is crucial in many real-world applications where the
    distribution of the training data differs from the distribution of the test
    data. Previous Deep Learning-based approaches to domain adaptation need to be
    trained jointly on source and target domain data and are therefore unappealing
    in scenarios where models need to be adapted to a large number of domains or
    where a domain is evolving, e.g. spam detection where attackers continuously
    change their tactics.

    To fill this gap, we propose Knowledge Adaptation, an extension of Knowledge
    Distillation (Bucilua et al., 2006; Hinton et al., 2015) to the domain
    adaptation scenario. We show how a student model achieves state-of-the-art
    results on unsupervised domain adaptation from multiple sources on a standard
    sentiment analysis benchmark by taking into account the domain-specific
    expertise of multiple teachers and the similarities between their domains.

    When learning from a single teacher, using domain similarity to gauge
    trustworthiness is inadequate. To this end, we propose a simple metric that
    correlates well with the teacher’s accuracy in the target domain. We
    demonstrate that incorporating high-confidence examples selected by this metric
    enables the student model to achieve state-of-the-art performance in the
    single-source scenario.

    Truncated Variational EM for Semi-Supervised Neural Simpletrons

    Dennis Forster, Jörg Lücke
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Inference and learning for probabilistic generative networks is often very
    challenging and typically prevents scalability to as large networks as used for
    deep discriminative approaches. To obtain efficiently trainable, large-scale
    and well performing generative networks for semi-supervised learning, we here
    combine two recent developments: a neural network reformulation of hierarchical
    Poisson mixtures (Neural Simpletrons), and a novel truncated variational EM
    approach (TV-EM). TV-EM provides theoretical guarantees for learning in
    generative networks, and its application to Neural Simpletrons results in
    particularly compact, yet approximately optimal, modifications of learning
equations. On standard benchmarks, we empirically find that learning converges
in fewer EM iterations, that the complexity per EM iteration is reduced, and
that final likelihood values are higher on average. For the task of
classification on data sets with few labels, learning improvements result in
consistently lower error rates compared to applications without truncation.
Experiments on the MNIST data set allow for comparison to standard and
state-of-the-art models in the semi-supervised setting. Further experiments on
the NIST SD19 data set show the scalability of the approach when large amounts
of additional unlabeled data are available.

    Gated Multimodal Units for Information Fusion

    John Arevalo, Thamar Solorio, Manuel Montes-y-Gómez, Fabio A. González
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    This paper presents a novel model for multimodal learning based on gated
    neural networks. The Gated Multimodal Unit (GMU) model is intended to be used
    as an internal unit in a neural network architecture whose purpose is to find
    an intermediate representation based on a combination of data from different
    modalities. The GMU learns to decide how modalities influence the activation of
    the unit using multiplicative gates. It was evaluated on a multilabel scenario
    for genre classification of movies using the plot and the poster. The GMU
    improved the macro f-score performance of single-modality approaches and
    outperformed other fusion strategies, including mixture of experts models.
    Along with this work, the MM-IMDb dataset is released which, to the best of our
    knowledge, is the largest publicly available multimodal dataset for genre
    prediction on movies.
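In the bimodal case, the gating mechanism described above can be sketched directly: a learned gate z mixes the two modality-specific hidden representations. A minimal NumPy sketch with illustrative dimensions and random weights (the paper's architecture, training, and weight shapes are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(0)

def gmu(x_visual, x_text, Wv, Wt, Wz):
    """Gated Multimodal Unit, bivariate case: a multiplicative gate z
    decides how much each modality contributes to the fused activation."""
    hv = np.tanh(Wv @ x_visual)   # visual hidden representation
    ht = np.tanh(Wt @ x_text)     # textual hidden representation
    # Gate computed from both modalities jointly (sigmoid in (0, 1)).
    z = 1.0 / (1.0 + np.exp(-(Wz @ np.concatenate([x_visual, x_text]))))
    return z * hv + (1.0 - z) * ht  # gated fusion of the two modalities

dv, dt, dh = 8, 6, 4  # illustrative input/output dimensions
Wv = rng.normal(size=(dh, dv))
Wt = rng.normal(size=(dh, dt))
Wz = rng.normal(size=(dh, dv + dt))
h = gmu(rng.normal(size=dv), rng.normal(size=dt), Wv, Wt, Wz)
print(h.shape)  # (4,)
```

Because the fused activation is a gated convex combination of two tanh outputs, each component stays in [-1, 1]; in the paper the gate weights are learned end-to-end with the rest of the network.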

    Representations of language in a model of visually grounded speech signal

    Grzegorz Chrupała, Lieke Gelderloos, Afra Alishahi
    Comments: 10 pages
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We present a visually grounded model of speech perception which projects
    spoken utterances and images to a joint semantic space. We use a multi-layer
    recurrent highway network to model the temporal nature of spoken speech, and
    show that it learns to extract both form and meaning-based linguistic knowledge
    from the input signal. We carry out an in-depth analysis of the representations
    used by different components of the trained model and show that encoding of
    semantic aspects tends to become richer as we go up the hierarchy of layers,
    whereas encoding of form-related aspects of the language input tends to
    initially increase and then plateau or decrease.

    Continuous-Time User Modeling in the Presence of Badges: A Probabilistic Approach

    Ali Khodadadi, Seyed Abbas Hosseini, Erfan Tavakoli, Hamid R. Rabiee
    Comments: 27 pages, 7 figures
    Subjects: Social and Information Networks (cs.SI); Learning (cs.LG)

    User modeling plays an important role in delivering customized web services
    to the users and improving their engagement. However, most user models in the
    literature do not explicitly consider the temporal behavior of users. More
    recently, continuous-time user modeling has gained considerable attention and
    many user behavior models have been proposed based on temporal point processes.
However, typical point-process-based models consider only the impact of peer
influence and content on user participation and neglect other factors.
Gamification elements are among these neglected factors, even though they
have a strong impact on user participation in online services. In this paper,
    we propose interdependent multi-dimensional temporal point processes that
    capture the impact of badges on user participation besides the peer influence
    and content factors. We extend the proposed processes to model user actions
    over the community based question and answering websites, and propose an
    inference algorithm based on Variational-EM that can efficiently learn the
    model parameters. Extensive experiments on both synthetic and real data
gathered from Stack Overflow show that our inference algorithm learns the
parameters efficiently and that the proposed method predicts user behavior
better than the alternatives.

    Low Rank Matrix Recovery with Simultaneous Presence of Outliers and Sparse Corruption

    Mostafa Rahmani, George Atia
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We study a data model in which the data matrix D can be expressed as D = L +
    S + C, where L is a low rank matrix, S an element-wise sparse matrix and C a
    matrix whose non-zero columns are outlying data points. To date, robust PCA
    algorithms have solely considered models with either S or C, but not both. As
    such, existing algorithms cannot account for simultaneous element-wise and
column-wise corruptions. In this paper, a new robust PCA algorithm that is
robust to both types of corruption simultaneously is proposed. Our approach
hinges on the sparse approximation of a sparsely corrupted column, whereby the
sparse expansion of a column with respect to the other data points is used to
distinguish a sparsely corrupted inlier column from an outlying data point. We
also develop a randomized design which provides a scalable implementation of
the proposed approach. The core idea of sparse approximation is analyzed
analytically, where we show that the underlying (ell_1)-norm minimization can
recover the representation of an inlier in the presence of sparse corruptions.
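The discriminating signal above is that an inlier column admits a good sparse representation in terms of the other columns, while an outlier does not. A small NumPy sketch of that idea using ISTA for the l1-regularized representation (a stand-in for the paper's analysis, not its actual algorithm; data, lam, and iteration count are illustrative):

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_code(A, b, lam=0.1, iters=500):
    """ISTA for min_z 0.5||A z - b||^2 + lam ||z||_1: the l1-sparse
    representation of column b with respect to the columns of A."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant
    z = np.zeros(A.shape[1])
    for _ in range(iters):
        z = soft_threshold(z - step * (A.T @ (A @ z - b)), step * lam)
    return z

rng = np.random.default_rng(1)
U = rng.normal(size=(30, 2))          # rank-2 column space
L = U @ rng.normal(size=(2, 10))      # low-rank inlier columns
inlier = L[:, 0]                      # lies in the span of the rest
outlier = rng.normal(size=30)         # unstructured outlying column

A = L[:, 1:]
res_in = np.linalg.norm(A @ sparse_code(A, inlier) - inlier)
res_out = np.linalg.norm(A @ sparse_code(A, outlier) - outlier)
print(res_in < res_out)  # inlier is far better approximated by the rest
```

The large residual gap is what lets the algorithm flag outlying columns while tolerating element-wise sparse corruption of inliers.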

    Neural Discourse Structure for Text Categorization

    Yangfeng Ji, Noah Smith
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    We show that discourse structure, as defined by Rhetorical Structure Theory
    and provided by an existing discourse parser, benefits text categorization. Our
    approach uses a recursive neural network and a newly proposed attention
    mechanism to compute a representation of the text that focuses on salient
    content, from the perspective of both RST and the task. Experiments consider
    variants of the approach and illustrate its strengths and weaknesses.

    Learning similarity preserving representations with neural similarity encoders

    Franziska Horn, Klaus-Robert Müller
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Many dimensionality reduction or manifold learning algorithms optimize for
    retaining the pairwise similarities, distances, or local neighborhoods of data
    points. Spectral methods like Kernel PCA (kPCA) or isomap achieve this by
    computing the singular value decomposition (SVD) of some similarity matrix to
obtain a low-dimensional representation of the original data. However, this is
computationally expensive when many training examples are available and,
    additionally, representations for new (out-of-sample) data points can only be
    created when the similarities to the original training examples can be
    computed. We introduce similarity encoders (SimEc), which learn similarity
    preserving representations by using a feed-forward neural network to map data
    into an embedding space where the original similarities can be approximated
    linearly. The model optimizes the same objective as kPCA but in the process it
    learns a linear or non-linear embedding function (in the form of the tuned
    neural network), with which the representations of novel data points can be
    computed – even if the original pairwise similarities of the training set were
    generated by an unknown process such as human ratings. By creating embeddings
    for both image and text datasets, we demonstrate that SimEc can, on the one
    hand, reach the same solution as spectral methods, and, on the other hand,
    obtain meaningful embeddings from similarities based on human labels.
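
A minimal numerical sketch of the idea (a linear SimEc trained with plain gradient descent, not the authors' implementation): learn a projection W such that the inner products of the embeddings XW approximate a given similarity matrix S. The data dimensions, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((100, 2))            # latent low-dimensional structure
X = Z @ rng.standard_normal((2, 10))         # observed 10-dimensional data
S = X @ X.T                                  # target pairwise similarities

W = 0.01 * rng.standard_normal((10, 2))      # linear "encoder" (one-layer net)
lr, m = 1e-4, S.size

def loss(W):
    E = X @ W                                # embeddings of all points
    return np.linalg.norm(S - E @ E.T) ** 2 / m

loss0 = loss(W)
for _ in range(5000):
    E = X @ W
    R = E @ E.T - S                          # residual of the similarity fit
    W -= lr * 4.0 * (X.T @ R @ E) / m        # gradient of ||S - E E^T||_F^2 / m
```

After training, the embedding inner products approximate S, and new points are embedded by a single matrix product with W, without recomputing any SVD.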

    Toward the automated analysis of complex diseases in genome-wide association studies using genetic programming

    Andrew Sohn, Randal S. Olson, Jason H. Moore
    Comments: 9 pages, 4 figures, submitted to GECCO 2017 conference and currently under review
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)

    Machine learning has been gaining traction in recent years to meet the demand
    for tools that can efficiently analyze and make sense of the ever-growing
    databases of biomedical data in health care systems around the world. However,
    effectively using machine learning methods requires considerable domain
expertise, which can be a barrier to entry for bioinformaticians new to
    computational data science methods. Therefore, off-the-shelf tools that make
    machine learning more accessible can prove invaluable for bioinformaticians. To
    this end, we have developed an open source pipeline optimization tool
    (TPOT-MDR) that uses genetic programming to automatically design machine
    learning pipelines for bioinformatics studies. In TPOT-MDR, we implement
    Multifactor Dimensionality Reduction (MDR) as a feature construction method for
    modeling higher-order feature interactions, and combine it with a new expert
    knowledge-guided feature selector for large biomedical data sets. We
    demonstrate TPOT-MDR’s capabilities using a combination of simulated and real
    world data sets from human genetics and find that TPOT-MDR significantly
    outperforms modern machine learning methods such as logistic regression and
    eXtreme Gradient Boosting (XGBoost). We further analyze the best pipeline
    discovered by TPOT-MDR for a real world problem and highlight TPOT-MDR’s
    ability to produce a high-accuracy solution that is also easily interpretable.


    Information Theory

    Massive MIMO Beam-forming for High Speed Train Communication: Directivity vs Beamwidth

    Xuhong Chen, Jiaxun Lu, Pingyi Fan
    Subjects: Information Theory (cs.IT)

High-mobility adaptation and massive Multiple-input Multiple-output (MIMO)
    application are two primary evolving objectives for the next generation high
    speed train communication system. In this paper, we consider how to design a
location-aware beam-forming for the massive MIMO system. We first analyze the
    tradeoff between beam directivity and beamwidth, based on which we present the
    sensitivity analysis of positioning accuracy. Then, we derive the maximum beam
    directivity and corresponding beamwidth under the restriction of diverse
positioning accuracies to guarantee highly efficient transmission. Finally, we
    present a low-complexity beam-forming design with positioning robustness
    utilizing location information, which requires neither eigen-decomposing (ED)
    the uplink channel covariance matrix (CCM) nor ED the downlink CCM (DCCM).
Numerical simulations indicate that, as long as the positioning error stays
below a certain threshold, a massive MIMO system can guarantee the required
performance with satisfactory transmission efficiency in the high-mobility scenario.
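
The directivity-beamwidth tradeoff can be illustrated with the textbook uniform linear array (ULA) pattern; this is a generic antenna-theory sketch, not the paper's beam-forming design. For an n-element, half-wavelength-spaced broadside ULA, directivity grows like n while the half-power beamwidth shrinks like 1/n.

```python
import numpy as np

def array_factor(theta, n, spacing=0.5):
    """Normalized broadside array-factor magnitude of an n-element ULA
    with element spacing given in wavelengths."""
    psi = 2 * np.pi * spacing * np.cos(theta)   # inter-element phase shift
    num, den = np.sin(n * psi / 2), n * np.sin(psi / 2)
    return np.abs(np.divide(num, den, out=np.ones_like(num),
                            where=np.abs(den) > 1e-12))

theta = np.linspace(0.0, np.pi, 100001)

def hpbw_deg(n):
    """Half-power beamwidth in degrees (only the main lobe exceeds -3 dB)."""
    above = theta[array_factor(theta, n) >= 1 / np.sqrt(2)]
    return np.degrees(above[-1] - above[0])

def directivity(n):
    """2 / integral of |AF|^2 sin(theta); equals n for lambda/2 spacing."""
    af2 = array_factor(theta, n) ** 2
    return 2.0 / (np.sum(af2 * np.sin(theta)) * (theta[1] - theta[0]))

# Doubling the array narrows the beam but raises the directivity:
# hpbw_deg(8) is about 12.7 degrees, while directivity(8) is about 8.
```

Narrower beams deliver more gain but tolerate less pointing error, which is why positioning accuracy enters the beamwidth design in the high-mobility setting.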

    Robust Regularized ZF in Cooperative Broadcast Channel under Distributed CSIT

    Qianrui Li, Paul de Kerret, David Gesbert, Nicolas Gresset
Comments: 38 pages, 4 figures, submitted to the IEEE Transactions on Information Theory
    Subjects: Information Theory (cs.IT)

In this work, we consider the sum rate performance of a joint processing
coordinated multi-point transmission network (JP-CoMP, a.k.a. Network MIMO) in a
    so-called distributed channel state information (D-CSI) setting. In the D-CSI
    setting, the various transmitters (TXs) acquire a local, TX-dependent, estimate
    of the global multi-user channel state matrix obtained via terminal feedback
    and limited backhauling. The CSI noise across TXs can be independent or
correlated, so as to reflect the degree to which TXs can exchange information
over the backhaul, hence allowing us to model a range of situations bridging
fully distributed and fully centralized CSI settings. In this context, we aim to study
    the price of CSI distributiveness in terms of sum rate at finite SNR when
    compared with conventional centralized scenarios. We consider the family of
    JP-CoMP precoders known as regularized zero-forcing (RZF). We conduct our study
    in the large scale antenna regime, as it is currently envisioned to be used in
real 5G deployments. It is then possible to obtain accurate approximations of
the signal-to-interference-and-noise ratios via so-called deterministic
equivalents. Guided by these deterministic equivalents, we propose an
    approach to derive a RZF scheme that is robust to the distributed aspect of the
    CSI, whereby the key idea lies in the optimization of a TX-dependent power
    level and regularization factor. Our analysis confirms the improved robustness
    of the proposed scheme with respect to CSI inconsistency at different TXs, even
with a moderate number of antennas and receivers (RXs).

    An Efficient Global Algorithm for Single-Group Multicast Beamforming

    Cheng Lu, Ya-Feng Liu
    Comments: 35 pages, 4 figures, 5 tables; submitted for possible publication
    Subjects: Information Theory (cs.IT)

    Consider the single-group multicast beamforming problem, where multiple users
    receive the same data stream simultaneously from a single transmitter. The
problem is NP-hard, and all existing algorithms for it find either suboptimal
approximate solutions or local stationary solutions. In this paper, we propose
    an efficient branch-and-bound algorithm for the problem that is guaranteed to
    find its global solution. To the best of our knowledge, our proposed algorithm
    is the first tailored global algorithm for the single-group multicast
    beamforming problem. Simulation results show that our proposed algorithm is
    computationally efficient (albeit its theoretical worst-case iteration
    complexity is exponential with respect to the number of receivers) and it
    significantly outperforms a state-of-the-art general-purpose global
    optimization solver called Baron. Our proposed algorithm provides an important
    benchmark for performance evaluation of existing algorithms for the same
    problem. By using it as the benchmark, we show that two state-of-the-art
algorithms, the semidefinite relaxation algorithm and the successive linear
approximation algorithm, work well when the problem dimension (i.e., the number
    of antennas at the transmitter and the number of receivers) is small but their
    performance deteriorates quickly as the problem dimension increases.

    A Region Based Easy Path Wavelet Transform For Sparse Image Representation

    Renato Budinich
    Subjects: Information Theory (cs.IT); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

    The Easy Path Wavelet Transform is an adaptive transform for bivariate
    functions (in particular natural images) which has been proposed in [1]. It
    provides a sparse representation by finding a path in the domain of the
    function leveraging the local correlations of the function values. It then
    applies a one dimensional wavelet transform to the obtained vector, decimates
the points and iterates the procedure. The main drawback of this method is the
    need to store, for each level of the transform, the path which vectorizes the
    two dimensional data. Here we propose a variation on the method which consists
    of firstly applying a segmentation procedure to the function domain,
    partitioning it into regions where the variation in the function values is low;
    in a second step, inside each such region, a path is found in some
    deterministic way, i.e. not data-dependent. This circumvents the need to store
    the paths at each level, while still obtaining good quality lossy compression.
    This method is particularly well suited to encode a Region of Interest in the
    image with different quality than the rest of the image.

    [1] Gerlind Plonka. The easy path wavelet transform: A new adaptive wavelet
    transform for sparse representation of two-dimensional data. Multiscale
Modeling & Simulation, 7(3):1474-1496, 2008.

    The Meta Distribution of the SIR for Cellular Networks with Power Control

    Yuanjie Wang, Martin Haenggi, Zhenhui Tan
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    The meta distribution of the signal-to-interference ratio (SIR) provides
    fine-grained information about the performance of individual links in a
    wireless network. This paper focuses on the analysis of the meta distribution
    of the SIR for both the cellular network uplink and downlink with fractional
    power control. For the uplink scenario, an approximation of the interfering
    user point process with a non-homogeneous Poisson point process is used. The
    moments of the meta distribution for both scenarios are calculated. Some
    bounds, the analytical expression, the mean local delay, and the beta
    approximation of the meta distribution are provided. The results give
interesting insights into the effect of power control in both the uplink
    and downlink. Detailed simulations show that the approximations made in the
    analysis are well justified.
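
The beta approximation mentioned above matches the first two moments of the per-link reliability to a Beta(a, b) distribution. A method-of-moments sketch follows; the moment values used here are made up for illustration, not taken from the paper's expressions.

```python
from scipy.stats import beta as beta_dist

def beta_fit(m1, m2):
    """Beta(a, b) parameters matching the first two moments m1 and m2."""
    var = m2 - m1 ** 2
    s = m1 * (1.0 - m1) / var - 1.0    # concentration parameter a + b
    return m1 * s, (1.0 - m1) * s

a, b = beta_fit(0.8, 0.68)             # illustrative moments of P(SIR > threshold)
mean, var = beta_dist.stats(a, b, moments='mv')   # fitted mean and variance
```

By construction the fitted Beta reproduces the target mean and second moment, so the whole meta distribution is summarized by two moment calculations.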

    Joint Pushing and Caching for Bandwidth Utilization Maximization in Wireless Networks

    Yaping Sun, Ying Cui, Hui Liu
    Comments: 5 pages, 1 figure, submitted to IEEE ISIT 2017
    Subjects: Information Theory (cs.IT)

Joint pushing and caching is recognized as an efficient remedy to the problem
of spectrum scarcity caused by the tremendous growth in mobile data traffic. In
this paper, by exploiting storage resources at end-users and the predictability
of users’ demand processes, we design the optimal joint pushing and caching
policy to maximize bandwidth utilization, which is one of the most important
    concerns for network operators. In particular, we formulate the stochastic
    optimization problem as an infinite horizon average cost Markov Decision
    Process (MDP), for which there generally exist only numerical solutions without
    many insights. By structural analysis, we show how the optimal policy achieves
    a balance between the current transmission cost and the future average
    transmission cost. In addition, we show that the optimal average transmission
    cost decreases with the cache size, revealing a tradeoff between the cache size
and the bandwidth utilization. Moreover, because obtaining a numerical optimal
solution suffers from the curse of dimensionality and implementing it requires a
centralized controller and global system information, we develop
    a decentralized policy with polynomial complexity w.r.t. the numbers of users
    and files as well as cache size, by a linear approximation of the value
    function and relaxing the original policy optimization problem. The results in
    this paper offer useful guidelines for designing practical cache-enabled
    content-centric wireless networks.

    Temporal-Spatial Aggregation for Cache-Enabled Wireless Multicasting Networks with Asynchronous Content Requests

    Jifang Xing, Ying Cui, Vincent Lau
    Comments: 5 pages, 2 figures, submitted to IEEE ISIT 2017
    Subjects: Information Theory (cs.IT)

    Existing multicasting schemes for massive content delivery do not fully
    utilize multicasting opportunities in delay tolerant content-oriented
    applications. In this paper, we propose a novel temporal-spatial
    aggregation-based multicasting scheme in a large-scale cache-enabled wireless
    network. The proposed scheme can efficiently exploit multicasting opportunities
    in asynchronous content requests to improve spectral efficiency. By making use
    of the delay tolerance of elastic services, the proposed scheme achieves a
    better energy-throughput-delay tradeoff. Utilizing tools from stochastic
    geometry, we derive a tractable expression for the successful transmission
    probability in the general region. Using asymptotic approximations, we derive
closed-form successful transmission probabilities in the large delay region as
    well as the large and small user density regions. The asymptotic results reveal
    that the successful transmission probability increases and the energy
    consumption decreases at the cost of delay increase in these asymptotic
    regions. The analysis in this paper provides a new understanding of the
    energy-throughput-delay tradeoff for massive content delivery in large-scale
    cache-enabled wireless networks.

    Capacity Bounds on the Downlink of Symmetric, Multi-Relay, Single Receiver C-RAN Networks

    Shirin Saeedi Bidokhti, Gerhard Kramer, Shlomo Shamai (Shitz)
    Subjects: Information Theory (cs.IT)

    The downlink of symmetric Cloud Radio Access Networks (C-RAN) with multiple
    relays and a single receiver is studied. Lower and upper bounds are derived on
the capacity. The lower bound is achieved by Marton’s coding, which facilitates
    dependence among the multiple-access channel inputs. The upper bound uses
    Ozarow’s technique to augment the system with an auxiliary random variable. The
    bounds are studied over scalar Gaussian C-RANs and are shown to meet and
    characterize the capacity for interesting regimes of operation.

    Multiuser Communication Based on the DFT Eigenstructure

    R. M. Campello de Souza, H. M. de Oliveira, R. J. Cintra
    Comments: 5 pages, 2 figures, 3 tables
    Subjects: Information Theory (cs.IT); Applications (stat.AP)

    The eigenstructure of the discrete Fourier transform (DFT) is examined and
    new systematic procedures to generate eigenvectors of the unitary DFT are
    proposed. DFT eigenvectors are suggested as user signatures for data
    communication over the real adder channel (RAC). The proposed multiuser
    communication system over the 2-user RAC is detailed.
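
One classical systematic construction of DFT eigenvectors (a textbook device, possibly distinct from the procedures proposed in the paper) projects a random vector onto an eigenspace of the unitary DFT: since F^4 = I, the eigenvalues lie in {1, -1, i, -i}, and P = (1/4) * sum_m conj(lambda)^m F^m is the orthogonal projector onto the lambda-eigenspace.

```python
import numpy as np

n = 8
a, b = np.meshgrid(np.arange(n), np.arange(n))
F = np.exp(-2j * np.pi * a * b / n) / np.sqrt(n)     # unitary DFT matrix

lam = 1.0                                            # eigenvalue in {1, -1, 1j, -1j}
P = sum(np.conj(lam) ** m * np.linalg.matrix_power(F, m) for m in range(4)) / 4
v = P @ np.random.default_rng(1).standard_normal(n)  # project a random vector
v /= np.linalg.norm(v)                               # unit-norm DFT eigenvector
```

Repeating this with independent random vectors and Gram-Schmidt yields an orthonormal eigenbasis, the kind of signature set envisaged for RAC users.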

    Sequential Coding of Gauss-Markov Sources over Packet-Erasure Channels with Feedback

    Anatoly Khina, Ashish Khisti, Babak Hassibi
    Comments: Extended version of a paper submitted to ISIT 2017
    Subjects: Information Theory (cs.IT)

    We consider the problem of sequential transmission of Gauss-Markov sources
    over packet-erasure channels with a possibly delayed output feedback. For the
    case of instantaneous feedback, we determine the optimal squared error
    distortions for given transmission rates for all time instants, and construct a
    scheme that achieves all of them simultaneously. This establishes the optimal
rate-distortion region for sequential coding of Gauss-Markov sources without
    packet erasures, as a special case. For the case of delayed feedback, we
    connect the problem to that of compression with side information that is known
    at the encoder and may be known at the decoder – where the most recent packets
    serve as side information that may have been erased. We conclude the paper by
    demonstrating that the loss due to a delay by one time instant is rather small.

    Successive Local and Successive Global Omniscience

    Anoosheh Heidarzadeh, Alex Sprintson
    Subjects: Information Theory (cs.IT)

    This paper considers two generalizations of the cooperative data exchange
    problem, referred to as the successive local omniscience (SLO) and the
successive global omniscience (SGO). The users are divided into ℓ nested
sub-groups. Each user initially knows a subset of packets in a ground set X
of size k, and all users wish to learn all packets in X. The users exchange
their packets by broadcasting coded or uncoded packets. In SLO or SGO,
respectively, the l-th (1 ≤ l ≤ ℓ) smallest sub-group of users needs
to learn all packets its members collectively hold or all packets in X, in the
l-th round of transmissions. The problem is to find the minimum sum-rate (i.e.,
the total transmission rate over all users) for each round, subject to
minimizing the sum-rate for the previous round. To solve this problem, we use a
linear-programming approach. For the cases in which the packets are randomly
distributed among users, we construct a system of linear equations whose
solution characterizes the minimum sum-rate for each round with high
probability as k tends to infinity. Moreover, for the special case of two
nested groups, we derive closed-form expressions, which hold with high
probability as k tends to infinity, for the minimum sum-rate for each round.
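
The linear-programming viewpoint can be illustrated on the classic single-round cooperative data exchange problem (the one-group special case; the successive multi-round constraints of SLO/SGO are not modeled here). The cut-set constraints require that, for each proper nonempty subset S of users, the users in S transmit at least as many packets as the complementary group is jointly missing. The example instance below is made up for illustration.

```python
from itertools import combinations
from scipy.optimize import linprog

packets = {0, 1, 2}
knows = [{1, 2}, {0, 2}, {0, 1}]    # user i holds every packet except packet i
n = len(knows)

A_ub, b_ub = [], []
for r in range(1, n):               # proper, nonempty subsets S of users
    for S in combinations(range(n), r):
        rest = set().union(*(knows[i] for i in range(n) if i not in S))
        missing = len(packets - rest)
        # sum_{i in S} rate_i >= missing, rewritten as -sum <= -missing
        A_ub.append([-1.0 if i in S else 0.0 for i in range(n)])
        b_ub.append(-float(missing))

res = linprog(c=[1.0] * n, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * n)
# res.fun == 1.5: each user broadcasts half a coded packet's worth
```

Here every pair of users must jointly send at least one packet's worth, so the optimal fractional rates are (0.5, 0.5, 0.5) with sum-rate 1.5.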

    One shot entanglement assisted classical and quantum communication over noisy quantum channels: A hypothesis testing and convex split approach

    Anurag Anshu, Rahul Jain, Naqueeb Ahmad Warsi
Comments: 29 pages, 8 figures, version 1
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    In this work we study several communication tasks over noisy quantum channels
    and provide bounds on the amount of reliable communication between senders and
    receivers. Our results are in the one-shot setting with the extra resource of
    quantum entanglement between senders and receivers. We show 1) tight bounds on
the amount of reliable communication for the point-to-point channel, 2)
an achievability bound for the point-to-point channel with quantum side
information about the channel at the encoder (a quantum analogue of the
Gel’fand-Pinsker channel), 3) an achievability bound for the point-to-point
channel with limited quantum side information about the channel at the encoder,
and 4) an achievability bound for the broadcast channel, a quantum analogue
of the Marton inner bound.

We take a unified approach to arrive at all our results, primarily using
hypothesis testing and convex split. The forms of our results match those
    of the corresponding results for similar communication tasks over noisy
    classical channels. To obtain the quantum analogue of Marton inner bound we
    prove a (bi-partite) generalization of the convex-split lemma, which may be of
    independent interest.

    Trimming the Independent Fat: Sufficient Statistics, Mutual Information, and Predictability from Effective Channel States

    Ryan G. James, John R. Mahoney, James P. Crutchfield
    Comments: 6 pages, 4 figures; this http URL
    Subjects: Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT); Chaotic Dynamics (nlin.CD); Machine Learning (stat.ML)

    One of the most fundamental questions one can ask about a pair of random
    variables X and Y is the value of their mutual information. Unfortunately, this
    task is often stymied by the extremely large dimension of the variables. We
    might hope to replace each variable by a lower-dimensional representation that
    preserves the relationship with the other variable. The theoretically ideal
    implementation is the use of minimal sufficient statistics, where it is
    well-known that either X or Y can be replaced by their minimal sufficient
    statistic about the other while preserving the mutual information. While
    intuitively reasonable, it is not obvious or straightforward that both
    variables can be replaced simultaneously. We demonstrate that this is in fact
    possible: the information X’s minimal sufficient statistic preserves about Y is
    exactly the information that Y’s minimal sufficient statistic preserves about
    X. As an important corollary, we consider the case where one variable is a
    stochastic process’ past and the other its future and the present is viewed as
    a memoryful channel. In this case, the mutual information is the channel
    transmission rate between the channel’s effective states. That is, the
    past-future mutual information (the excess entropy) is the amount of
    information about the future that can be predicted using the past. Translating
    our result about minimal sufficient statistics, this is equivalent to the
    mutual information between the forward- and reverse-time causal states of
    computational mechanics. We close by discussing multivariate extensions to this
    use of minimal sufficient statistics.
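
The fact that replacing a variable by a sufficient statistic preserves mutual information is easy to check numerically on a toy discrete example (an illustration of the standard fact, not of the paper's process-level construction). Here Y depends on X only through T(X) = X mod 2, so T is sufficient for Y; the pmf values are arbitrary.

```python
import numpy as np

def mutual_information(p):
    """I(X;Y) in bits from a joint pmf table p[x, y]."""
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / (px @ py)[mask])))

# X in {0,1,2,3}; Y depends on X only through T(X) = X mod 2.
p_y_given_t = np.array([[0.9, 0.1],        # distribution of Y given T = 0
                        [0.2, 0.8]])       # distribution of Y given T = 1
p_x = np.array([0.1, 0.2, 0.3, 0.4])
p_xy = p_x[:, None] * p_y_given_t[np.arange(4) % 2]   # joint pmf over (X, Y)

# Collapse X to T(X) by summing the rows with the same parity.
p_ty = np.stack([p_xy[0::2].sum(axis=0), p_xy[1::2].sum(axis=0)])
# mutual_information(p_xy) equals mutual_information(p_ty)
```

Replacing X by the 2-valued statistic T loses none of the information about Y, exactly as the sufficiency argument predicts.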

    A Digital Hardware Fast Algorithm and FPGA-based Prototype for a Novel 16-point Approximate DCT for Image Compression Applications

    F. M. Bayer, R. J. Cintra, A. Edirisuriya, A. Madanayake
    Comments: 17 pages, 6 figures, 6 tables
    Journal-ref: Measurement Science and Technology, Volume 23, Number 11, 2012
    Subjects: Multimedia (cs.MM); Hardware Architecture (cs.AR); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Methodology (stat.ME)

    The discrete cosine transform (DCT) is the key step in many image and video
    coding standards. The 8-point DCT is an important special case, possessing
several widely investigated low-complexity approximations. However, the
16-point DCT has energy compaction advantages. In this sense, this paper
presents a new 16-point DCT approximation with null multiplicative complexity.
The proposed transform matrix is orthogonal and contains only zeros and ones.
The proposed transform outperforms the well-known Walsh-Hadamard transform and
    the current state-of-the-art 16-point approximation. A fast algorithm for the
    proposed transform is also introduced. This fast algorithm is experimentally
    validated using hardware implementations that are physically realized and
    verified on a 40 nm CMOS Xilinx Virtex-6 XC6VLX240T FPGA chip for a maximum
clock rate of 342 MHz. Rapid prototypes on FPGA for 8-bit input word size show
a significant improvement in compressed image quality of up to 1-2 dB at the
cost of only eight adders compared to the state-of-the-art 16-point DCT
approximation algorithm in the literature [S. Bouguezel, M. O. Ahmad, and M. N.
S. Swamy. A novel transform for image compression. In Proceedings of the 53rd
IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), 2010].



