IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    arXiv Paper Daily: Wed, 2 Nov 2016

    我爱机器学习(52ml.net)发表于 2016-11-02 00:00:00
    love 0

    Neural and Evolutionary Computing

    Surrogate-Assisted Partial Order-based Evolutionary Optimisation

    Vanessa Volz, Günter Rudolph, Boris Naujoks
    Subjects: Neural and Evolutionary Computing (cs.NE)

    In this paper, we propose a novel approach (SAPEO) to support the survival
    selection process in multi-objective evolutionary algorithms with surrogate
    models – it dynamically chooses individuals to evaluate exactly based on the
    model uncertainty and the distinctness of the population. We introduce variants
    that differ in terms of the risk they allow when doing survival selection.
    Here, the anytime performance of different SAPEO variants is evaluated in
    conjunction with an SMS-EMOA using the BBOB bi-objective benchmark. We compare
    the obtained results with the performance of the regular SMS-EMOA, as well as
    another surrogate-assisted approach. The results open up general questions
    about the applicability and required conditions for surrogate-assisted
    multi-objective evolutionary algorithms to be tackled in the future.

    Full-Capacity Unitary Recurrent Neural Networks

    Scott Wisdom, Thomas Powers, John R. Hershey, Jonathan Le Roux, Les Atlas
    Comments: 9 pages, to appear in NIPS
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Recurrent neural networks are powerful models for processing sequential data,
    but they are generally plagued by vanishing and exploding gradient problems.
    Unitary recurrent neural networks (uRNNs), which use unitary recurrence
    matrices, have recently been proposed as a means to avoid these issues.
    However, in previous experiments, the recurrence matrices were restricted to be
    a product of parameterized unitary matrices, and an open question remains: when
    does such a parameterization fail to represent all unitary matrices, and how
    does this restricted representational capacity limit what can be learned? To
    address this question, we propose full-capacity uRNNs that optimize their
    recurrence matrix over all unitary matrices, leading to significantly improved
    performance over uRNNs that use a restricted-capacity recurrence matrix. Our
    contribution consists of two main components. First, we provide a theoretical
    argument to determine if a unitary parameterization has restricted capacity.
    Using this argument, we show that a recently proposed unitary parameterization
    has restricted capacity for hidden state dimension greater than 7. Second, we
    show how a complete, full-capacity unitary recurrence matrix can be optimized
    over the differentiable manifold of unitary matrices. The resulting
    multiplicative gradient step is very simple and does not require gradient
    clipping or learning rate adaptation. We confirm the utility of our claims by
    empirically evaluating our new full-capacity uRNNs on both synthetic and
    natural data, achieving superior performance compared to both LSTMs and the
    original restricted-capacity uRNNs.


    Computer Vision and Pattern Recognition

    Structured illumination microscopy with unknown patterns and a statistical prior

    Li-Hao Yeh, Lei Tian, Laura Waller
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Optics (physics.optics)

    Structured illumination microscopy (SIM) improves resolution by
    down-modulating high-frequency information of an object to fit within the
    passband of the optical system. Generally, the reconstruction process requires
    prior knowledge of the illumination patterns, which implies a well-calibrated
    and aberration-free system. Here, we propose a new algorithmic self-calibration
    strategy for SIM that does not need to know the exact patterns a priori, but
    only their covariance. The algorithm, termed PE-SIMS, includes a
    Pattern-Estimation (PE) step and a SIM reconstruction procedure using a
    Statistical prior (SIMS). Additionally, we perform a pixel reassignment process
    (SIMS-PR) to enhance the reconstruction quality. We achieve 2x better
    resolution than a conventional widefield microscope, while remaining
    insensitive to aberration-induced pattern distortion and robust against
    parameter tuning compared to other blind SIM algorithms.

    Dictionary Integration using 3D Morphable Face Models for Pose-invariant Collaborative-representation-based Classification

    Xiaoning Song, Zhen-Hua Feng, Guosheng Hu, Josef Kittler, William Christmas, Xiao-Jun Wu
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The paper presents a dictionary integration algorithm using 3D morphable face
    models (3DMM) for pose-invariant collaborative-representation-based face
    classification. To this end, we first fit a 3DMM to the 2D face images of a
    dictionary to reconstruct the 3D shape and texture of each image. The 3D faces
    are used to render a number of virtual 2D face images with arbitrary pose
    variations to augment the training data, by merging the original and rendered
    virtual samples {color{black}to create} an extended dictionary. Second, to
    reduce the information redundancy of the extended dictionary and improve the
    sparsity of reconstruction coefficient vectors using
    collaborative-representation-based classification (CRC), we exploit an on-line
    elimination scheme to optimise the extended dictionary by identifying the most
    representative training samples for a given query. The final goal is to perform
    pose-invariant face classification using the proposed dictionary integration
    method and the on-line pruning strategy under the CRC framework. Experimental
    results obtained for a set of well-known face datasets demonstrate the merits
    of the proposed method, especially its robustness to pose variations.

    Sliding Dictionary Based Sparse Representation For Action Recognition

    Yashas Annadani, D L Rakshith, Soma Biswas
    Comments: 7 Pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The task of action recognition has been in the forefront of research, given
    its applications in gaming, surveillance and health care. In this work, we
    propose a simple, yet very effective approach which works seamlessly for both
    offline and online action recognition using the skeletal joints. We construct a
    sliding dictionary which has the training data along with their time stamps.
    This is used to compute the sparse coefficients of the input action sequence
    which is divided into overlapping windows and each window gives a probability
    score for each action class. In addition, we compute another simple feature,
    which calibrates each of the action sequences to the training sequences, and
    models the deviation of the action from the each of the training data. Finally,
    a score level fusion of the two heterogeneous but complementary features for
    each window is obtained and the scores for the available windows are
    successively combined to give the confidence scores of each action class. This
    way of combining the scores makes the approach suitable for scenarios where
    only part of the sequence is available. Extensive experimental evaluation on
    three publicly available datasets shows the effectiveness of the proposed
    approach for both offline and online action recognition tasks.

    Best-Buddies Tracking

    Shaul Oron, Denis Suhanov, Shai Avidan
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Best-Buddies Tracking (BBT) applies the Best-Buddies Similarity measure (BBS)
    to the problem of model-free online tracking. BBS was introduced as a
    similarity measure between two point sets and was shown to be very effective
    for template matching. Originally, BBS was designed to work with point sets of
    equal size, and we propose a modification that lets it handle point sets of
    different size. The modified BBS is better suited to handle scale changes in
    the template size, as well as support a variable number of template images. We
    embed the modified BBS in a particle filter framework and obtain good results
    on a number of standard benchmarks.

    Deep fusion of visual signatures for client-server facial analysis

    Binod Bhattarai, Gaurav Sharma, Frederic Jurie
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Facial analysis is a key technology for enabling human-machine interaction.
    In this context, we present a client-server framework, where a client transmits
    the signature of a face to be analyzed to the server, and, in return, the
    server sends back various information describing the face e.g. is the person
    male or female, is she/he bald, does he have a mustache, etc. We assume that a
    client can compute one (or a combination) of visual features; from very simple
    and efficient features, like Local Binary Patterns, to more complex and
    computationally heavy, like Fisher Vectors and CNN based, depending on the
    computing resources available. The challenge addressed in this paper is to
    design a common universal representation such that a single merged signature is
    transmitted to the server, whatever be the type and number of features computed
    by the client, ensuring nonetheless an optimal performance. Our solution is
    based on learning of a common optimal subspace for aligning the different face
    features and merging them into a universal signature. We have validated the
    proposed method on the challenging CelebA dataset, on which our method
    outperforms existing state-of-the-art methods when rich representation is
    available at test time, while giving competitive performance when only simple
    signatures (like LBP) are available at test time due to resource constraints on
    the client.

    Embedding Deep Metric for Person Re-identication A Study Against Large Variations

    Hailin Shi, Yang Yang, Xiangyu Zhu, Shengcai Liao, Zhen Lei, Weishi Zheng, Stan Z. Li
    Comments: Published in ECCV2016. arXiv admin note: substantial text overlap with arXiv:1511.07545
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Person re-identification is challenging due to the large variations of pose,
    illumination, occlusion and camera view. Owing to these variations, the
    pedestrian data is distributed as highly-curved manifolds in the feature space,
    despite the current convolutional neural networks (CNN)’s capability of feature
    extraction. However, the distribution is unknown, so it is difficult to use the
    geodesic distance when comparing two samples. In practice, the current deep
    embedding methods use the Euclidean distance for the training and test. On the
    other hand, the manifold learning methods suggest to use the Euclidean distance
    in the local range, combining with the graphical relationship between samples,
    for approximating the geodesic distance. From this point of view, selecting
    suitable positive i.e. intra-class) training samples within a local range is
    critical for training the CNN embedding, especially when the data has large
    intra-class variations. In this paper, we propose a novel moderate positive
    sample mining method to train robust CNN for person re-identification, dealing
    with the problem of large variation. In addition, we improve the learning by a
    metric weight constraint, so that the learned metric has a better
    generalization ability. Experiments show that these two strategies are
    effective in learning robust deep metrics for person re-identification, and
    accordingly our deep model significantly outperforms the state-of-the-art
    methods on several benchmarks of person re-identification. Therefore, the study
    presented in this paper may be useful in inspiring new designs of deep models
    for person re-identification.

    A Benchmark Dataset and Saliency-guided Stacked Autoencoder for Video-based Salient Object Detection

    Jia Li, Changqun Xia, Xiaowu Chen
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Image-based salient object detection (SOD) has been extensively studied in
    the past decades. However, video-based SOD is much less explored since there
    lacks large-scale video datasets within which salient objects are unambiguously
    defined and annotated. Toward this end, this paper proposes a video-based SOD
    dataset that consists of 200 videos (64 minutes). In constructing the dataset,
    we manually annotate all objects and regions over 7,650 uniformly sampled
    keyframes and collect the eye-tracking data of 23 subjects that free-view all
    videos. From the user data, we find salient objects in video can be defined as
    objects that consistently pop-out throughout the video, and objects with such
    attributes can be unambiguously annotated by combining manually annotated
    object/region masks with eye-tracking data of multiple subjects. To the best of
    our knowledge, it is currently the largest dataset for video-based salient
    object detection.

    Based on the dataset, this paper proposes an unsupervised approach for
    video-based SOD by using a saliency-guided stacked autoencoder. In the proposed
    approach, spatiotemporal saliency cues are first extracted at pixel, superpixel
    and object levels. With these saliency cues, a stacked autoencoder is
    unsupervisedly trained which can automatically infer a saliency score for each
    pixel by progressively encoding the high-dimensional saliency cues gathered
    from the pixel and its spatiotemporal neighbors. Experimental results show that
    the proposed approach outperforms 19 image-based and 5 video-based models on
    the proposed dataset. Moreover, the comprehensive benchmarking results show
    that the proposed dataset is very challenging and have the potential to greatly
    boost the development of video-based SOD.

    Learning recurrent representations for hierarchical behavior modeling

    Eyrun Eyjolfsdottir, Kristin Branson, Yisong Yue, Pietro Perona
    Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    We propose a framework for detecting action patterns from motion sequences
    and modeling the sensory-motor relationship of animals, using a generative
    recurrent neural network. The network has a discriminative part (classifying
    actions) and a generative part (predicting motion), whose recurrent cells are
    laterally connected, allowing higher levels of the network to represent high
    level phenomena. We test our framework on two types of data, fruit fly behavior
    and online handwriting. Our results show that 1) taking advantage of unlabeled
    sequences, by predicting future motion, significantly improves action detection
    performance when training labels are scarce, 2) the network learns to represent
    high level phenomena such as writer identity and fly gender, without
    supervision, and 3) simulated motion trajectories, generated by treating motion
    prediction as input to the network, look realistic and may be used to
    qualitatively evaluate whether the model has learnt generative control rules.

    Exploiting Spatio-Temporal Structure with Recurrent Winner-Take-All Networks

    Eder Santana, Matthew Emigh, Pablo Zerges, Jose C Principe
    Comments: under review
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    We propose a convolutional recurrent neural network, with Winner-Take-All
    dropout for high dimensional unsupervised feature learning in multi-dimensional
    time series. We apply the proposedmethod for object recognition with temporal
    context in videos and obtain better results than comparable methods in the
    literature, including the Deep Predictive Coding Networks previously proposed
    by Chalasani and Principe.Our contributions can be summarized as a scalable
    reinterpretation of the Deep Predictive Coding Networks trained end-to-end with
    backpropagation through time, an extension of the previously proposed
    Winner-Take-All Autoencoders to sequences in time, and a new technique for
    initializing and regularizing convolutional-recurrent neural networks.


    Artificial Intelligence

    Using Artificial Intelligence to Identify State Secrets

    Renato Rocha Souza, Flavio Codeco Coelho, Rohan Shah, Matthew Connelly
    Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

    Whether officials can be trusted to protect national security information has
    become a matter of great public controversy, reigniting a long-standing debate
    about the scope and nature of official secrecy. The declassification of
    millions of electronic records has made it possible to analyze these issues
    with greater rigor and precision. Using machine-learning methods, we examined
    nearly a million State Department cables from the 1970s to identify features of
    records that are more likely to be classified, such as international
    negotiations, military operations, and high-level communications. Even with
    incomplete data, algorithms can use such features to identify 90% of classified
    cables with <11% false positives. But our results also show that there are
    longstanding problems in the identification of sensitive information. Error
    analysis reveals many examples of both overclassification and
    underclassification. This indicates both the need for research on inter-coder
    reliability among officials as to what constitutes classified material and the
    opportunity to develop recommender systems to better manage both classification
    and declassification.

    Detecting Affordances by Visuomotor Simulation

    Wolfram Schenck, Hendrik Hasenbein, Ralf Möller
    Comments: 26 pages, 8 figures
    Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)

    The term “affordance” denotes the behavioral meaning of objects. We propose a
    cognitive architecture for the detection of affordances in the visual modality.
    This model is based on the internal simulation of movement sequences. For each
    movement step, the resulting sensory state is predicted by a forward model,
    which in turn triggers the generation of a new (simulated) motor command by an
    inverse model. Thus, a series of mental images in the sensory and in the motor
    domain is evoked. Starting from a real sensory state, a large number of such
    sequences is simulated in parallel. Final affordance detection is based on the
    generated motor commands. We apply this model to a real-world mobile robot
    which is faced with obstacle arrangements some of which are passable (corridor)
    and some of which are not (dead ends). The robot’s task is to detect the right
    affordance (“pass-through-able” or “non-pass-through-able”). The required
    internal models are acquired in a hierarchical training process. Afterwards,
    the robotic agent is able to distinguish reliably between corridors and dead
    ends. This real-world result enhances the validity of the proposed mental
    simulation approach. In addition, we compare several key factors in the
    simulation process regarding performance and efficiency.

    Local Subspace-Based Outlier Detection using Global Neighbourhoods

    Bas van Stein, Matthijs van Leeuwen, Thomas Bäck
    Comments: Short version accepted at IEEE BigData 2016
    Subjects: Artificial Intelligence (cs.AI)

    Outlier detection in high-dimensional data is a challenging yet important
    task, as it has applications in, e.g., fraud detection and quality control.
    State-of-the-art density-based algorithms perform well because they 1) take the
    local neighbourhoods of data points into account and 2) consider feature
    subspaces. In highly complex and high-dimensional data, however, existing
    methods are likely to overlook important outliers because they do not
    explicitly take into account that the data is often a mixture distribution of
    multiple components.

    We therefore introduce GLOSS, an algorithm that performs local subspace
    outlier detection using global neighbourhoods. Experiments on synthetic data
    demonstrate that GLOSS more accurately detects local outliers in mixed data
    than its competitors. Moreover, experiments on real-world data show that our
    approach identifies relevant outliers overlooked by existing methods,
    confirming that one should keep an eye on the global perspective even when
    doing local outlier detection.

    Learning recurrent representations for hierarchical behavior modeling

    Eyrun Eyjolfsdottir, Kristin Branson, Yisong Yue, Pietro Perona
    Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    We propose a framework for detecting action patterns from motion sequences
    and modeling the sensory-motor relationship of animals, using a generative
    recurrent neural network. The network has a discriminative part (classifying
    actions) and a generative part (predicting motion), whose recurrent cells are
    laterally connected, allowing higher levels of the network to represent high
    level phenomena. We test our framework on two types of data, fruit fly behavior
    and online handwriting. Our results show that 1) taking advantage of unlabeled
    sequences, by predicting future motion, significantly improves action detection
    performance when training labels are scarce, 2) the network learns to represent
    high level phenomena such as writer identity and fly gender, without
    supervision, and 3) simulated motion trajectories, generated by treating motion
    prediction as input to the network, look realistic and may be used to
    qualitatively evaluate whether the model has learnt generative control rules.

    Towards Blended Reactive Planning and Acting using Behavior Trees

    Michele Colledanchise, Diogo Almeida, Petter Ögren
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

    In this paper, we study the problem of using a planning algorithm to
    automatically create and update a Behavior Tree (BT), controlling a robot in a
    dynamic environment. Exploiting the characteristic of BTs, in terms of
    modularity and reactivity, the robot continually acts and plans to achieve a
    given goal using a set of abstract actions and conditions. The construction of
    the BT is based on an extension of the Hybrid Backward-Forward algorithm (HBF)
    that allows us to refine the acting process by mapping the descriptive models
    onto operational models of actions, thus integrating the ability of planning in
    infinite state space of HBF with the continuous modular reactive action
    execution of BTs. We believe that this might be a first step to address the
    recently raised open challenge in automated planning: the need of a
    hierarchical structure and a continuous online planning and acting framework.
    We prove the convergence of the proposed approach as well as the absence of
    deadlocks and livelocks, and we illustrate our approach in two different
    robotics scenarios.

    Towards Lifelong Self-Supervision: A Deep Learning Direction for Robotics

    Jay M. Wong
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

    Despite outstanding success in vision amongst other domains, many of the
    recent deep learning approaches have evident drawbacks for robots. This
    manuscript surveys recent work in the literature that pertain to applying deep
    learning systems to the robotics domain, either as means of estimation or as a
    tool to resolve motor commands directly from raw percepts. These recent
    advances are only a piece to the puzzle. We suggest that deep learning as a
    tool alone is insufficient in building a unified framework to acquire general
    intelligence. For this reason, we complement our survey with insights from
    cognitive development and refer to ideas from classical control theory,
    producing an integrated direction for a lifelong learning architecture.

    Robust Spectral Inference for Joint Stochastic Matrix Factorization

    Moontae Lee, David Bindel, David Mimno
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Spectral inference provides fast algorithms and provable optimality for
    latent topic analysis. But for real data these algorithms require additional
    ad-hoc heuristics, and even then often produce unusable results. We explain
    this poor performance by casting the problem of topic inference in the
    framework of Joint Stochastic Matrix Factorization (JSMF) and showing that
    previous methods violate the theoretical conditions necessary for a good
    solution to exist. We then propose a novel rectification method that learns
    high quality topics and their interactions even on small, noisy data. This
    method achieves results comparable to probabilistic techniques in several
    domains while maintaining scalability and provable optimality.

    Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision

    Chen Liang, Jonathan Berant, Quoc Le, Kenneth D. Forbus, Ni Lao
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Extending the success of deep neural networks to natural language
    understanding and symbolic reasoning requires complex operations and external
    memory. Recent neural program induction approaches have attempted to address
    this problem, but are typically limited to differentiable memory, and
    consequently cannot scale beyond small synthetic tasks. In this work, we
    propose the Manager-Programmer-Computer framework, which integrates neural
    networks with non-differentiable memory to support abstract, scalable and
    precise operations through a friendly neural computer interface. Specifically,
    we introduce a Neural Symbolic Machine, which contains a sequence-to-sequence
    neural “programmer”, and a non-differentiable “computer” that is a Lisp
    interpreter with code assist. To successfully apply REINFORCE for training, we
    augment it with approximate gold programs found by an iterative maximum
    likelihood training process. NSM is able to learn a semantic parser from weak
    supervision over a large knowledge base. It achieves new state-of-the-art
    performance on WebQuestionsSP, a challenging semantic parsing dataset, with
    weak supervision. Compared to previous approaches, NSM is end-to-end, therefore
    does not rely on feature engineering or domain specific knowledge.


    Information Retrieval

    Product-based Neural Networks for User Response Prediction

    Yanru Qu, Han Cai, Kan Ren, Weinan Zhang, Yong Yu, Ying Wen, Jun Wang
    Comments: 6 pages, 5 figures, ICDM2016
    Subjects: Learning (cs.LG); Information Retrieval (cs.IR)

    Predicting user responses, such as clicks and conversions, is of great
    importance and has found its usage in many Web applications including
    recommender systems, web search and online advertising. The data in those
    applications is mostly categorical and contains multiple fields; a typical
    representation is to transform it into a high-dimensional sparse binary feature
    representation via one-hot encoding. Facing with the extreme sparsity,
    traditional models may limit their capacity of mining shallow patterns from the
    data, i.e. low-order feature combinations. Deep models like deep neural
    networks, on the other hand, cannot be directly applied for the
    high-dimensional input because of the huge feature space. In this paper, we
    propose a Product-based Neural Networks (PNN) with an embedding layer to learn
    a distributed representation of the categorical data, a product layer to
    capture interactive patterns between inter-field categories, and further fully
    connected layers to explore high-order feature interactions. Our experimental
    results on two large-scale real-world ad click datasets demonstrate that PNNs
    consistently outperform the state-of-the-art models on various metrics.

    MusicMood: Predicting the mood of music from song lyrics using machine learning

    Sebastian Raschka
    Comments: 9 pages, 5 figures
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Sentiment prediction of contemporary music can have a wide-range of
    applications in modern society, for instance, selecting music for public
    institutions such as hospitals or restaurants to potentially improve the
    emotional well-being of personnel, patients, and customers, respectively. In
    this project, music recommendation system built upon on a naive Bayes
    classifier, trained to predict the sentiment of songs based on song lyrics
    alone. The experimental results show that music corresponding to a happy mood
    can be detected with high precision based on text features obtained from song
    lyrics.

    CBAS: context based arabic stemmer

    Mahmoud El-Defrawy, Yasser El-Sonbaty, Nahla A. Belal
    Journal-ref: International Journal on Natural Language Computing (IJNLC) Vol.
    4, No.3, June 2015
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Arabic morphology encapsulates many valuable features such as word root.
    Arabic roots are being utilized for many tasks; the process of extracting a
    word root is referred to as stemming. Stemming is an essential part of most
    Natural Language Processing tasks, especially for derivative languages such as
    Arabic. However, stemming is faced with the problem of ambiguity, where two or
    more roots could be extracted from the same word. On the other hand,
    distributional semantics is a powerful co-occurrence model. It captures the
    meaning of a word based on its context. In this paper, a distributional
    semantics model utilizing Smoothed Pointwise Mutual Information (SPMI) is
    constructed to investigate its effectiveness on the stemming analysis task. It
    showed an accuracy of 81.5%, with a at least 9.4% improvement over other
    stemmers.


    Computation and Language

    Faster decoding for subword level Phrase-based SMT between related languages

    Anoop Kunchukuttan, Pushpak Bhattacharyya
    Comments: Accepted at VarDial3 (Third Workshop on NLP for Similar Languages, Varieties and Dialects) collocated with COLING 2016; 7 pages
    Subjects: Computation and Language (cs.CL)

    A common and effective way to train translation systems between related
    languages is to consider sub-word level basic units. However, this increases
    the length of the sentences resulting in increased decoding time. The increase
    in length is also impacted by the specific choice of data format for
    representing the sentences as subwords. In a phrase-based SMT framework, we
    investigate different choices of decoder parameters as well as data format and
    their impact on decoding time and translation accuracy. We suggest best options
    for these settings that significantly improve decoding time with little impact
    on the translation accuracy.

    Recurrent Neural Network Language Model Adaptation Derived Document Vector

    Wei Li, Brian Kan, Wing Mak
    Subjects: Computation and Language (cs.CL)

    In many natural language processing (NLP) tasks, a document is commonly
    modeled as a bag of words using the term frequency-inverse document frequency
    (TF-IDF) vector. One major shortcoming of the frequency-based TF-IDF feature
    vector is that it ignores word orders that carry syntactic and semantic
    relationships among the words in a document, and they can be important in some
    NLP tasks such as genre classification. This paper proposes a novel distributed
    vector representation of a document: a simple recurrent-neural-network language
    model (RNN-LM) or a long short-term memory RNN language model (LSTM-LM) is
    first created from all documents in a task; some of the LM parameters are then
    adapted by each document, and the adapted parameters are vectorized to
    represent the document. The new document vectors are labeled as DV-RNN and
    DV-LSTM respectively. We believe that our new document vectors can capture some
    high-level sequential information in the documents, which other current
    document representations fail to capture. The new document vectors were
    evaluated in the genre classification of documents in three corpora: the Brown
    Corpus, the BNC Baby Corpus and an artificially created Penn Treebank dataset.
    Their classification performances are compared with the performance of TF-IDF
    vector and the state-of-the-art distributed memory model of paragraph vector
    (PV-DM). The results show that DV-LSTM significantly outperforms TF-IDF and
    PV-DM in most cases, and combinations of the proposed document vectors with
    TF-IDF or PV-DM may further improve performance.

    Dual Learning for Machine Translation

    Yingce Xia, Di He, Tao Qin, Liwei Wang, Nenghai Yu, Tie-Yan Liu, Wei-Ying Ma
    Journal-ref: NIPS 2016
    Subjects: Computation and Language (cs.CL)

    While neural machine translation (NMT) is making good progress in the past
    two years, tens of millions of bilingual sentence pairs are needed for its
    training. However, human labeling is very costly. To tackle this training data
    bottleneck, we develop a dual-learning mechanism, which can enable an NMT
    system to automatically learn from unlabeled data through a dual-learning game.
    This mechanism is inspired by the following observation: any machine
    translation task has a dual task, e.g., English-to-French translation (primal)
    versus French-to-English translation (dual); the primal and dual tasks can form
    a closed loop, and generate informative feedback signals to train the
    translation models, even if without the involvement of a human labeler. In the
    dual-learning mechanism, we use one agent to represent the model for the primal
    task and the other agent to represent the model for the dual task, then ask
    them to teach each other through a reinforcement learning process. Based on the
    feedback signals generated during this process (e.g., the language-model
    likelihood of the output of a model, and the reconstruction error of the
    original sentence after the primal and dual translations), we can iteratively
    update the two models until convergence (e.g., using the policy gradient
    methods). We call the corresponding approach to neural machine translation
    emph{dual-NMT}. Experiments show that dual-NMT works very well on
    English(leftrightarrow)French translation; especially, by learning from
    monolingual data (with 10% bilingual data for warm start), it achieves a
    comparable accuracy to NMT trained from the full bilingual data for the
    French-to-English translation task.

    Improving Twitter Sentiment Classification via Multi-Level Sentiment-Enriched Word Embeddings

    Shufeng Xiong
    Subjects: Computation and Language (cs.CL)

    Most of existing work learn sentiment-specific word representation for
    improving Twitter sentiment classification, which encoded both n-gram and
    distant supervised tweet sentiment information in learning process. They assume
    all words within a tweet have the same sentiment polarity as the whole tweet,
    which ignores the word its own sentiment polarity. To address this problem, we
    propose to learn sentiment-specific word embedding by exploiting both lexicon
    resource and distant supervised information. We develop a multi-level
    sentiment-enriched word embedding learning method, which uses parallel
    asymmetric neural network to model n-gram, word level sentiment and tweet level
    sentiment in learning process. Experiments on standard benchmarks show our
    approach outperforms state-of-the-art methods.

    RNN Approaches to Text Normalization: A Challenge

    Richard Sproat, Navdeep Jaitly
    Comments: 17 pages, 13 tables, 3 figures
    Subjects: Computation and Language (cs.CL)

    This paper presents a challenge to the community: given a large corpus of
    written text aligned to its normalized spoken form, train an RNN to learn the
    correct normalization function. We present a data set of general text where the
    normalizations were generated using an existing text normalization component of
    a text-to-speech system. This data set will be released open-source in the near
    future.

    We also present our own experiments with this data set with a variety of
    different RNN architectures. While some of the architectures do in fact produce
    very good results when measured in terms of overall accuracy, the errors that
    are produced are problematic, since they would convey completely the wrong
    message if such a system were deployed in a speech application. On the other
    hand, we show that a simple FST-based filter can mitigate those errors, and
    achieve a level of accuracy not achievable by the RNN alone.

    Though our conclusions are largely negative on this point, we are actually
    not arguing that the text normalization problem is intractable using an pure
    RNN approach, merely that it is not going to be something that can be solved
    merely by having huge amounts of annotated text data and feeding that to a
    general RNN model. And when we open-source our data, we will be providing a
    novel data set for sequence-to-sequence modeling in the hopes that the the
    community can find better solutions.

    CBAS: context based arabic stemmer

    Mahmoud El-Defrawy, Yasser El-Sonbaty, Nahla A. Belal
    Journal-ref: International Journal on Natural Language Computing (IJNLC) Vol.
    4, No.3, June 2015
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Arabic morphology encapsulates many valuable features such as word root.
    Arabic roots are being utilized for many tasks; the process of extracting a
    word root is referred to as stemming. Stemming is an essential part of most
    Natural Language Processing tasks, especially for derivative languages such as
    Arabic. However, stemming is faced with the problem of ambiguity, where two or
    more roots could be extracted from the same word. On the other hand,
    distributional semantics is a powerful co-occurrence model. It captures the
    meaning of a word based on its context. In this paper, a distributional
    semantics model utilizing Smoothed Pointwise Mutual Information (SPMI) is
    constructed to investigate its effectiveness on the stemming analysis task. It
    showed an accuracy of 81.5%, with a at least 9.4% improvement over other
    stemmers.

    Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision

    Chen Liang, Jonathan Berant, Quoc Le, Kenneth D. Forbus, Ni Lao
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Extending the success of deep neural networks to natural language
    understanding and symbolic reasoning requires complex operations and external
    memory. Recent neural program induction approaches have attempted to address
    this problem, but are typically limited to differentiable memory, and
    consequently cannot scale beyond small synthetic tasks. In this work, we
    propose the Manager-Programmer-Computer framework, which integrates neural
    networks with non-differentiable memory to support abstract, scalable and
    precise operations through a friendly neural computer interface. Specifically,
    we introduce a Neural Symbolic Machine, which contains a sequence-to-sequence
    neural “programmer”, and a non-differentiable “computer” that is a Lisp
    interpreter with code assist. To successfully apply REINFORCE for training, we
    augment it with approximate gold programs found by an iterative maximum
    likelihood training process. NSM is able to learn a semantic parser from weak
    supervision over a large knowledge base. It achieves new state-of-the-art
    performance on WebQuestionsSP, a challenging semantic parsing dataset, with
    weak supervision. Compared to previous approaches, NSM is end-to-end, therefore
    does not rely on feature engineering or domain specific knowledge.

    MusicMood: Predicting the mood of music from song lyrics using machine learning

    Sebastian Raschka
    Comments: 9 pages, 5 figures
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Sentiment prediction of contemporary music can have a wide-range of
    applications in modern society, for instance, selecting music for public
    institutions such as hospitals or restaurants to potentially improve the
    emotional well-being of personnel, patients, and customers, respectively. In
    this project, music recommendation system built upon on a naive Bayes
    classifier, trained to predict the sentiment of songs based on song lyrics
    alone. The experimental results show that music corresponding to a happy mood
    can be detected with high precision based on text features obtained from song
    lyrics.


    Distributed, Parallel, and Cluster Computing

    Self-Awareness of Cloud Applications

    Alexandru Iosup, Xiaoyun Zhu, Arif Merchant, Eva Kalyvianaki, Martina Maggio, Simon Spinner, Tarek Abdelzaher, Ole Mengshoel, Sara Bouchenak
    Comments: Overall: 40 pages, 1 figure, 135 references. Note: This is an extended survey. A much shorter, revised version of this material will be available in print, as part of a Springer book on “Self-Aware Computing”. The book is due to appear in 2017
    Subjects: Software Engineering (cs.SE); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI); Systems and Control (cs.SY)

    Cloud applications today deliver an increasingly larger portion of the
    Information and Communication Technology (ICT) services. To address the scale,
    growth, and reliability of cloud applications, self-aware management and
    scheduling are becoming commonplace. How are they used in practice? In this
    chapter, we propose a conceptual framework for analyzing state-of-the-art
    self-awareness approaches used in the context of cloud applications. We map
    important applications corresponding to popular and emerging application
    domains to this conceptual framework, and compare the practical
    characteristics, benefits, and drawbacks of self-awareness approaches. Last, we
    propose a roadmap for addressing open challenges in self-aware cloud and
    datacenter applications.


    Learning

    Semi-Supervised Radio Signal Identification

    Timothy J. O'Shea, Nathan West, Matthew Vondal, T. Charles Clancy
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Radio recognition in complex multi-user environments is an important tool for
    optimizing spectrum utilization, identifying and minimizing interference, and
    enforcing spectrum policy. Radio data is readily available and easy to obtain,
    but labeled data is often scarce making supervised learning strategies
    difficult and time consuming to curate. We demonstrate that semi-supervised
    learning techniques can be used to scale learning beyond supervised datasets,
    allowing for both discerning and recalling radio signals of interest by using
    sparse signal representations based on both unsupervised and supervised methods
    for nonlinear feature learning.

    Recurrent Neural Radio Anomaly Detection

    Timothy J O'Shea, T. Charles Clancy, Robert W. McGwier
    Subjects: Learning (cs.LG)

    We introduce a powerful recurrent neural network based method for novelty
    detection to the application of detecting radio anomalies. This approach holds
    promise in significantly increasing the ability of naive anomaly detection to
    detect small anomalies in highly complex complexity multi-user radio bands. We
    demonstrate the efficacy of this approach on a number of common real over the
    air radio communications bands of interest and quantify detection performance
    in terms of probability of detection an false alarm rates across a range of
    interference to band power ratios and compare to baseline methods.

    Stationary time-vertex signal processing

    Andreas Loukas, Nathanaël Perraudin
    Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)

    The goal of this paper is to improve learning for multivariate processes
    whose structure is dependent on some known graph topology. Typically, the graph
    information is incorporated to the learning process via a smoothness assumption
    postulating that the values supported on well connected vertices exhibit small
    variations. We argue that smoothness is not enough. To capture the behavior of
    complex interconnected systems, such as transportation and biological networks,
    it is important to train expressive models, being able to reproduce a wide
    range of graph and temporal behaviors. Motivated by this need, this paper puts
    forth a novel definition of time-vertex wide-sense stationarity, or joint
    stationarity for short. We believe that the proposed definition is natural, at
    it elegantly relates to existing definitions of stationarity in the time and
    vertex domains. We use joint stationarity to regularize learning and to reduce
    computational complexity in both estimation and recovery tasks. In particular,
    we show that for any jointly stationary process: (a) one can learn the
    covariance structure from O(1) samples, and (b) can solve MMSE recovery
    problems, such as interpolation, denoising, forecasting, in complexity that is
    linear to the edges and timesteps. Experiments with three datasets suggest that
    joint stationarity can yield significant accuracy improvements in the
    reconstruction effort.

    Improving a Credit Scoring Model by Incorporating Bank Statement Derived Features

    Rory P. Bunker, Wenjun Zhang, M. Asif Naeem
    Subjects: Learning (cs.LG)

    In this paper, we investigate the extent to which features derived from bank
    statements provided by loan applicants, and which are not declared on an
    application form, can enhance a credit scoring model for a New Zealand lending
    company. Exploring the potential of such information to improve credit scoring
    models in this manner has not been studied previously. We construct a baseline
    model based solely on the existing scoring features obtained from the loan
    application form, and a second baseline model based solely on the new bank
    statement-derived features. A combined feature model is then created by
    augmenting the application form features with the new bank statement derived
    features. Our experimental results using ROC analysis show that a combined
    feature model performs better than both of the two baseline models, and show
    that a number of the bank statement-derived features have value in improving
    the credit scoring model. The target data set used for modelling was highly
    imbalanced, and Naive Bayes was found to be the best performing model, and
    outperformed a number of other classifiers commonly used in credit scoring,
    suggesting its potential for future use on highly imbalanced data sets.

    Robust Spectral Inference for Joint Stochastic Matrix Factorization

    Moontae Lee, David Bindel, David Mimno
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Spectral inference provides fast algorithms and provable optimality for
    latent topic analysis. But for real data these algorithms require additional
    ad-hoc heuristics, and even then often produce unusable results. We explain
    this poor performance by casting the problem of topic inference in the
    framework of Joint Stochastic Matrix Factorization (JSMF) and showing that
    previous methods violate the theoretical conditions necessary for a good
    solution to exist. We then propose a novel rectification method that learns
    high quality topics and their interactions even on small, noisy data. This
    method achieves results comparable to probabilistic techniques in several
    domains while maintaining scalability and provable optimality.

    Product-based Neural Networks for User Response Prediction

    Yanru Qu, Han Cai, Kan Ren, Weinan Zhang, Yong Yu, Ying Wen, Jun Wang
    Comments: 6 pages, 5 figures, ICDM2016
    Subjects: Learning (cs.LG); Information Retrieval (cs.IR)

    Predicting user responses, such as clicks and conversions, is of great
    importance and has found its usage in many Web applications including
    recommender systems, web search and online advertising. The data in those
    applications is mostly categorical and contains multiple fields; a typical
    representation is to transform it into a high-dimensional sparse binary feature
    representation via one-hot encoding. Facing with the extreme sparsity,
    traditional models may limit their capacity of mining shallow patterns from the
    data, i.e. low-order feature combinations. Deep models like deep neural
    networks, on the other hand, cannot be directly applied for the
    high-dimensional input because of the huge feature space. In this paper, we
    propose a Product-based Neural Networks (PNN) with an embedding layer to learn
    a distributed representation of the categorical data, a product layer to
    capture interactive patterns between inter-field categories, and further fully
    connected layers to explore high-order feature interactions. Our experimental
    results on two large-scale real-world ad click datasets demonstrate that PNNs
    consistently outperform the state-of-the-art models on various metrics.

    MusicMood: Predicting the mood of music from song lyrics using machine learning

    Sebastian Raschka
    Comments: 9 pages, 5 figures
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Sentiment prediction of contemporary music can have a wide-range of
    applications in modern society, for instance, selecting music for public
    institutions such as hospitals or restaurants to potentially improve the
    emotional well-being of personnel, patients, and customers, respectively. In
    this project, music recommendation system built upon on a naive Bayes
    classifier, trained to predict the sentiment of songs based on song lyrics
    alone. The experimental results show that music corresponding to a happy mood
    can be detected with high precision based on text features obtained from song
    lyrics.

    Bayesian Adaptive Data Analysis Guarantees from Subgaussianity

    Sam Elder
    Subjects: Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)

    The new field of adaptive data analysis seeks to provide algorithms and
    provable guarantees for models of machine learning that allow researchers to
    reuse their data, which normally falls outside of the usual statistical
    paradigm of static data analysis. In 2014, Dwork, Feldman, Hardt, Pitassi,
    Reingold and Roth introduced one potential model and proposed several solutions
    based on differential privacy. In previous work in 2016, we described a problem
    with this model and instead proposed a Bayesian variant, but also found that
    the analogous Bayesian methods cannot achieve the same statistical guarantees
    as in the static case.

    In this paper, we prove the first positive results for the Bayesian model,
    showing that with a Dirichlet prior, the posterior mean algorithm indeed
    matches the statistical guarantees of the static case. We conjecture that this
    is true for any conjugate prior from the exponential family, but can only prove
    this in special cases.

    The main ingredient, Theorem 4, shows that the ( ext{Beta}(alpha,eta))
    distribution is subgaussian with variance proxy (O(1/(alpha+eta+1))), a
    concentration result also of independent interest. Unlike most moment-based
    concentration techniques, which bound the centered moments, our proof utilizes
    a simple condition on the raw moments of a positive random variable.

    Kernel Bandwidth Selection for SVDD: Peak Criterion Approach for Large Data

    Sergiy Peredriy, Deovrat Kakde, Arin Chaudhuri
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Support Vector Data Description (SVDD) provides a useful approach to
    construct a description of multivariate data for single-class classification
    and outlier detection with various practical applications. Gaussian kernel used
    in SVDD formulation allows flexible data description defined by observations
    designated as support vectors. The data boundary of such description is
    non-spherical and conforms to the geometric features of the data. By varying
    the Gaussian kernel bandwidth parameter, the SVDD-generated boundary can be
    made either smoother (more spherical) or tighter/jagged. The former case may
    lead to under-fitting, whereas the latter may result in overfitting. Peak
    criterion has been proposed to select an optimal value of the kernel bandwidth
    to strike the balance between the data boundary smoothness and its ability to
    capture the general geometric shape of the data. Peak criterion involves
    training SVDD at various values of the kernel bandwidth parameter. When
    training datasets are large, the time required to obtain the optimal value of
    the Gaussian kernel bandwidth parameter according to Peak method can become
    prohibitively large. This paper proposes an extension of Peak method for the
    case of large data. The proposed method gives good results when applied to
    several datasets. Two existing alternative methods of computing the Gaussian
    kernel bandwidth parameter (Coefficient of Variation and Distance to the
    Farthest Neighbor) were modified to allow comparison with the proposed method
    on convergence. Empirical comparison demonstrates the advantage of the proposed
    method.

    Exploiting Spatio-Temporal Structure with Recurrent Winner-Take-All Networks

    Eder Santana, Matthew Emigh, Pablo Zerges, Jose C Principe
    Comments: under review
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    We propose a convolutional recurrent neural network, with Winner-Take-All
    dropout for high dimensional unsupervised feature learning in multi-dimensional
    time series. We apply the proposedmethod for object recognition with temporal
    context in videos and obtain better results than comparable methods in the
    literature, including the Deep Predictive Coding Networks previously proposed
    by Chalasani and Principe.Our contributions can be summarized as a scalable
    reinterpretation of the Deep Predictive Coding Networks trained end-to-end with
    backpropagation through time, an extension of the previously proposed
    Winner-Take-All Autoencoders to sequences in time, and a new technique for
    initializing and regularizing convolutional-recurrent neural networks.

    Computationally Efficient Influence Maximization in Stochastic and Adversarial Models: Algorithms and Analysis

    Justin Khim, Varun Jog, Po-Ling Loh
    Comments: 56 pages, 2 figures, 1 table
    Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Machine Learning (stat.ML)

    We consider the problem of influence maximization in fixed networks, for both
    stochastic and adversarial contagion models. The common goal is to select a
    subset of nodes of a specified size to infect so that the number of infected
    nodes at the conclusion of the epidemic is as large as possible. In the
    stochastic setting, the epidemic spreads according to a general triggering
    model, which includes the popular linear threshold and independent cascade
    models. We establish upper and lower bounds for the influence of an initial
    subset of nodes in the network, where the influence is defined as the expected
    number of infected nodes. Although the problem of exact influence computation
    is NP-hard in general, our bounds may be evaluated efficiently, leading to
    scalable algorithms for influence maximization with rigorous theoretical
    guarantees. In the adversarial spreading setting, an adversary is allowed to
    specify the edges through which contagion may spread, and the player chooses
    sets of nodes to infect in successive rounds. Both the adversary and player may
    behave stochastically, but we limit the adversary to strategies that are
    oblivious of the player’s actions. We establish upper and lower bounds on the
    minimax pseudo-regret in both undirected and directed networks.

    Surpassing Gradient Descent Provably: A Cyclic Incremental Method with Linear Convergence Rate

    Aryan Mokhtari, Mert Gürbüzbalaban, Alejandro Ribeiro
    Subjects: Optimization and Control (math.OC); Learning (cs.LG)

    Recently, there has been growing interest in developing optimization methods
    for solving large-scale machine learning problems. Most of these problems boil
    down to the problem of minimizing an average of a finite set of smooth and
    strongly convex functions where the number of functions (n) is large. Gradient
    descent method (GD) is successful in minimizing convex problems at a fast
    linear rate; however, it is not applicable to the considered large-scale
    optimization setting because of the high computational complexity. Incremental
    methods resolve this drawback of gradient methods by replacing the required
    gradient for the descent direction with an incremental gradient approximation.
    They operate by evaluating one gradient per iteration and executing the average
    of the (n) available gradients as a gradient approximate. Although, incremental
    methods reduce the computational cost of GD, their convergence rates do not
    justify their advantage relative to GD in terms of the total number of gradient
    evaluations until convergence. In this paper, we introduce a Double Incremental
    Aggregated Gradient method (DIAG) that computes the gradient of only one
    function at each iteration, which is chosen based on a cyclic scheme, and uses
    the aggregated average gradient of all the functions to approximate the full
    gradient. The iterates of the proposed DIAG method uses averages of both
    iterates and gradients in oppose to classic incremental methods that utilize
    gradient averages but do not utilize iterate averages. We prove that not only
    the proposed DIAG method converges linearly to the optimal solution, but also
    its linear convergence factor justifies the advantage of incremental methods on
    GD. In particular, we prove that the worst case performance of DIAG is better
    than the worst case performance of GD.

    Stochastic Variational Deep Kernel Learning

    Andrew Gordon Wilson, Zhiting Hu, Ruslan Salakhutdinov, Eric P. Xing
    Comments: 13 pages, 6 tables, 3 figures. Appearing in NIPS 2016
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Methodology (stat.ME)

    Deep kernel learning combines the non-parametric flexibility of kernel
    methods with the inductive biases of deep learning architectures. We propose a
    novel deep kernel learning model and stochastic variational inference procedure
    which generalizes deep kernel learning approaches to enable classification,
    multi-task learning, additive covariance structures, and stochastic gradient
    training. Specifically, we apply additive base kernels to subsets of output
    features from deep neural architectures, and jointly learn the parameters of
    the base kernels and deep network through a Gaussian process marginal
    likelihood objective. Within this framework, we derive an efficient form of
    stochastic variational inference which leverages local kernel interpolation,
    inducing points, and structure exploiting algebra. We show improved performance
    over stand alone deep networks, SVMs, and state of the art scalable Gaussian
    processes on several classification benchmarks, including an airline delay
    dataset containing 6 million training points, CIFAR, and ImageNet.

    The (χ)-Divergence for Approximate Inference

    Adji B. Dieng, Dustin Tran, Rajesh Ranganath, John Paisley, David M. Blei
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Computation (stat.CO); Methodology (stat.ME)

    Variational inference enables Bayesian analysis for complex probabilistic
    models with massive data sets. It works by positing a family of distributions
    and finding the member in the family that is closest to the posterior. While
    successful, variational methods can run into pathologies; for example, they
    typically underestimate posterior uncertainty. We propose CHI-VI, a
    complementary algorithm to traditional variational inference with KL((q) ||
    (p)) and an alternative algorithm to EP. CHI-VI is a black box algorithm that
    minimizes the (chi)-divergence from the posterior to the family of
    approximating distributions. In EP, only local minimization of the KL((p) ||
    (q)) objective is possible. In contrast, CHI-VI optimizes a well-defined global
    objective. It directly minimizes an upper bound to the model evidence that
    equivalently minimizes the (chi)-divergence. In experiments, we illustrate the
    utility of the upper bound for sandwich estimating the model evidence. We also
    compare several probabilistic models and a Cox process for basketball data. We
    find CHI-VI often yields better classification error rates and better posterior
    uncertainty.

    Enhanced Factored Three-Way Restricted Boltzmann Machines for Speech Detection

    Pengfei Sun, Jun Qin
    Comments: 8 pages
    Journal-ref: IEEE Signal Processing Letter 2016
    Subjects: Sound (cs.SD); Learning (cs.LG); Machine Learning (stat.ML)

    In this letter, we propose enhanced factored three way restricted Boltzmann
    machines (EFTW-RBMs) for speech detection. The proposed model incorporates
    conditional feature learning by multiplying the dynamical state of the third
    unit, which allows a modulation over the visible-hidden node pairs. Instead of
    stacking previous frames of speech as the third unit in a recursive manner, the
    correlation related weighting coefficients are assigned to the contextual
    neighboring frames. Specifically, a threshold function is designed to capture
    the long-term features and blend the globally stored speech structure. A
    factored low rank approximation is introduced to reduce the parameters of the
    three-dimensional interaction tensor, on which non-negative constraint is
    imposed to address the sparsity characteristic. The validations through the
    area-under-ROC-curve (AUC) and signal distortion ratio (SDR) show that our
    approach outperforms several existing 1D and 2D (i.e., time and time-frequency
    domain) speech detection algorithms in various noisy environments.

    Application Specific Instrumentation (ASIN): A Bio-inspired Paradigm to Instrumentation using recognition before detection

    Amit Kumar Mishra
    Subjects: Other Computer Science (cs.OH); Learning (cs.LG)

    In this paper we present a new scheme for instrumentation, which has been
    inspired by the way small mammals sense their environment. We call this scheme
    Application Specific Instrumentation (ASIN). A conventional instrumentation
    system focuses on gathering as much information about the scene as possible.
    This, usually, is a generic system whose data can be used by another system to
    take a specific action. ASIN fuses these two steps into one. The major merit of
    the proposed scheme is that it uses low resolution sensors and much less
    computational overhead to give good performance for a highly specialised
    application

    Embedding Deep Metric for Person Re-identication A Study Against Large Variations

    Hailin Shi, Yang Yang, Xiangyu Zhu, Shengcai Liao, Zhen Lei, Weishi Zheng, Stan Z. Li
    Comments: Published in ECCV2016. arXiv admin note: substantial text overlap with arXiv:1511.07545
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Person re-identification is challenging due to the large variations of pose,
    illumination, occlusion and camera view. Owing to these variations, the
    pedestrian data is distributed as highly-curved manifolds in the feature space,
    despite the current convolutional neural networks (CNN)’s capability of feature
    extraction. However, the distribution is unknown, so it is difficult to use the
    geodesic distance when comparing two samples. In practice, the current deep
    embedding methods use the Euclidean distance for the training and test. On the
    other hand, the manifold learning methods suggest to use the Euclidean distance
    in the local range, combining with the graphical relationship between samples,
    for approximating the geodesic distance. From this point of view, selecting
    suitable positive i.e. intra-class) training samples within a local range is
    critical for training the CNN embedding, especially when the data has large
    intra-class variations. In this paper, we propose a novel moderate positive
    sample mining method to train robust CNN for person re-identification, dealing
    with the problem of large variation. In addition, we improve the learning by a
    metric weight constraint, so that the learned metric has a better
    generalization ability. Experiments show that these two strategies are
    effective in learning robust deep metrics for person re-identification, and
    accordingly our deep model significantly outperforms the state-of-the-art
    methods on several benchmarks of person re-identification. Therefore, the study
    presented in this paper may be useful in inspiring new designs of deep models
    for person re-identification.

    Full-Capacity Unitary Recurrent Neural Networks

    Scott Wisdom, Thomas Powers, John R. Hershey, Jonathan Le Roux, Les Atlas
    Comments: 9 pages, to appear in NIPS
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Recurrent neural networks are powerful models for processing sequential data,
    but they are generally plagued by vanishing and exploding gradient problems.
    Unitary recurrent neural networks (uRNNs), which use unitary recurrence
    matrices, have recently been proposed as a means to avoid these issues.
    However, in previous experiments, the recurrence matrices were restricted to be
    a product of parameterized unitary matrices, and an open question remains: when
    does such a parameterization fail to represent all unitary matrices, and how
    does this restricted representational capacity limit what can be learned? To
    address this question, we propose full-capacity uRNNs that optimize their
    recurrence matrix over all unitary matrices, leading to significantly improved
    performance over uRNNs that use a restricted-capacity recurrence matrix. Our
    contribution consists of two main components. First, we provide a theoretical
    argument to determine if a unitary parameterization has restricted capacity.
    Using this argument, we show that a recently proposed unitary parameterization
    has restricted capacity for hidden state dimension greater than 7. Second, we
    show how a complete, full-capacity unitary recurrence matrix can be optimized
    over the differentiable manifold of unitary matrices. The resulting
    multiplicative gradient step is very simple and does not require gradient
    clipping or learning rate adaptation. We confirm the utility of our claims by
    empirically evaluating our new full-capacity uRNNs on both synthetic and
    natural data, achieving superior performance compared to both LSTMs and the
    original restricted-capacity uRNNs.

    Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision

    Chen Liang, Jonathan Berant, Quoc Le, Kenneth D. Forbus, Ni Lao
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Extending the success of deep neural networks to natural language
    understanding and symbolic reasoning requires complex operations and external
    memory. Recent neural program induction approaches have attempted to address
    this problem, but are typically limited to differentiable memory, and
    consequently cannot scale beyond small synthetic tasks. In this work, we
    propose the Manager-Programmer-Computer framework, which integrates neural
    networks with non-differentiable memory to support abstract, scalable and
    precise operations through a friendly neural computer interface. Specifically,
    we introduce a Neural Symbolic Machine, which contains a sequence-to-sequence
    neural “programmer”, and a non-differentiable “computer” that is a Lisp
    interpreter with code assist. To successfully apply REINFORCE for training, we
    augment it with approximate gold programs found by an iterative maximum
    likelihood training process. NSM is able to learn a semantic parser from weak
    supervision over a large knowledge base. It achieves new state-of-the-art
    performance on WebQuestionsSP, a challenging semantic parsing dataset, with
    weak supervision. Compared to previous approaches, NSM is end-to-end, therefore
    does not rely on feature engineering or domain specific knowledge.


    Information Theory

    Energy Efficiency Optimization with Simultaneous Wireless Information and Power Transfer in MIMO Broadcast Channels

    Jie Tang, Daniel K. C. So, Arman Shojaeifard, Kai-Kit Wong
    Subjects: Information Theory (cs.IT)

    Simultaneous wireless information and power transfer (SWIPT) is anticipated
    to have great applications in fifth-generation (5G) and beyond communication
    systems. In this paper, we address the energy efficiency (EE) optimization
    problem for SWIPT multiple-input multiple-output broadcast channel (MIMO-BC)
    with time-switching (TS) receiver design. Our aim is to maximize the EE of the
    system whilst satisfying certain constraints in terms of maximum transmit power
    and minimum harvested energy per user. The coupling of the optimization
    variables, namely, transmit covariance matrices and TS ratios, leads to a EE
    problem which is non-convex, and hence very difficult to solve directly. Hence,
    we transform the original maximization problem with multiple constraints into a
    min-max problem with a single constraint and multiple auxiliary variables. We
    propose a dual inner/outer layer resource allocation framework to tackle the
    problem. For the inner-layer, we invoke an extended SWIPT-based BC-multiple
    access channel (MAC) duality approach and provide two iterative resource
    allocation schemes under fixed auxiliary variables for solving the dual MAC
    problem. A sub-gradient searching scheme is then proposed for the outer-layer
    in order to obtain the optimal auxiliary variables. Numerical results confirm
    the effectiveness of the proposed algorithms and illustrate that significant
    performance gain in terms of EE can be achieved by adopting the proposed
    extended BC-MAC duality-based algorithm.

    Detection of Single vs Multiple Antenna Transmission Systems Using Pilot Data

    Thakshila Wimalajeewa, Pramod K Varshney, Wei Su
    Subjects: Information Theory (cs.IT)

    In this paper, we consider the problem of classifying the transmission system
    when it is not known a priori whether the transmission is via a single antenna
    or multiple antennas. The receiver is assumed to be employed with a known
    number of antennas. In a data frame transmitted by most multiple input multiple
    output (MIMO) systems, some pilot or training data is inserted for symbol
    timing synchronization and estimation of the channel. Our goal is to perform
    MIMO transmit antenna classification using this pilot data. More specifically,
    the problem of determining the transmission system is cast as a multiple
    hypothesis testing problem where the number of hypotheses is equal to the
    maximum number of transmit antennas. Under the assumption of receiver having
    the exact knowledge of pilot data used for timing synchronization and channel
    estimation, we consider maximum likelihood (ML) and correlation test statistics
    to classify the MIMO transmit system. When only probabilistic knowledge of
    pilot data is available at the receiver, a hybrid-maximum likelihood (HML)
    based test statistic is constructed using the expectation-maximization (EM)
    algorithm. The performance of the proposed algorithms is illustrated via
    simulations and comparative merits of different techniques in terms of the
    computational complexity and performance are discussed.

    Generalized Entropy Concentration for Counts

    Kostas N. Oikonomou
    Subjects: Information Theory (cs.IT)

    We consider the phenomenon of entropy concentration under linear constraints
    in a discrete setting, using the “balls and bins” paradigm, but without the
    assumption that the number of balls allocated to the bins is known. Therefore
    instead of frequency vectors and ordinary entropy, we have count vectors with
    unknown sum, and a certain generalized entropy. We show that if the constraints
    bound the allowable sums, this suffices for concentration to occur even in this
    setting. The concentration can be either in terms of deviation from the maximum
    generalized entropy value, or in terms of the norm of the difference from the
    maximum generalized entropy vector. Without any asymptotic considerations, we
    quantify the concentration in terms of various parameters, notably a tolerance
    on the constraints which ensures that they are always satisfied by an integral
    vector. Generalized entropy maximization is not only compatible with ordinary
    MaxEnt, but can also be considered an extension of it, as it allows us to
    address problems that cannot be formulated as MaxEnt problems.

    Joint Antenna Selection and Spatial Switching for Energy Efficient MIMO SWIPT System

    Jie Tang, Daniel K. C. So, Arman Shojaeifard, Kai-Kit Wong, Jinming Wen
    Subjects: Information Theory (cs.IT)

    In this paper, we investigate joint antenna selection and spatial switching
    (SS) for quality-of-service (QoS)-constrained energy efficiency (EE)
    optimization in a multiple-input multiple-output (MIMO) simultaneous wireless
    information and power transfer (SWIPT) system. A practical linear power model
    taking into account the entire transmit-receive chain is accordingly utilized.
    The corresponding fractional-combinatorial and non-convex EE problem, involving
    joint optimization of eigen-channel assignment, power allocation, and active
    receive antenna set selection, subject to satisfying minimum sum-rate and power
    transfer constraints, is extremely difficult to solve directly. In order to
    tackle this, we separate the eigen-channel assignment and power allocation
    procedure with the antenna selection functionality. In particular, we first
    tackle the EE maximization problem under fixed receive antenna set using
    Dinkelbach-based convex programming, iterative joint eigen-channel assignment
    and power allocation, and low-complexity multi-objective optimization
    (MOO)-based approach. On the other hand, the number of active receive antennas
    induces a trade-off in the achievable sum-rate and power transfer versus the
    transmit-independent power consumption. We provide a fundamental study of the
    achievable EE with antenna selection and accordingly develop dynamic optimal
    exhaustive search and Frobenius-norm-based schemes. Simulation results confirm
    the theoretical findings and demonstrate that the proposed resource allocation
    algorithms can efficiently approach the optimal EE.

    When is Noisy State Information at the Encoder as Useless as No Information or as Good as Noise-Free State?

    Rui Xu, Jun Chen, Tsachy Weissman, Jian-Kang Zhang
    Comments: This paper was presented in part at the 2016 IEEE International Symposium on Information Theory. 16 pages, 8 figures
    Subjects: Information Theory (cs.IT)

    For any binary-input channel with perfect state information at the decoder,
    if the mutual information between the noisy state observation at the encoder
    and the true channel state is below a positive threshold determined solely by
    the state distribution, then the capacity is the same as that with no encoder
    side information. A complementary phenomenon is revealed for the generalized
    probing capacity. Extensions beyond binary-input channels are developed.

    An Experimental Comparison of Coded Modulation Strategies for 100 Gbit/s Transceivers

    Eric Sillekens, Alex Alvarado, Chigo Okonkwo, Benn C. Thomsen
    Subjects: Information Theory (cs.IT)

    Coded modulation is a key technique to increase the spectral efficiency of
    coherent optical communication systems. Two popular strategies for coded
    modulation are turbo trellis-coded modulation (TTCM) and bit-interleaved coded
    modulation (BICM) based on low-density parity-check (LDPC) codes. Although BICM
    LDPC is suboptimal, its simplicity makes it very popular in practice. In this
    work, we compare the performance of TTCM and BICM LDPC using
    information-theoretic measures. Our information-theoretic results show that for
    the same overhead and modulation format only a very small penalty (less than
    0.1 dB) is to be expected when an ideal BICM LDPC scheme is used. However, the
    results obtained for the coded modulation schemes implemented in this paper
    show that the TTCM outperforms BICM LDPC by a larger margin. For a 1000 km
    transmission at 100 Gbit/s, the observed gain was 0.4 dB.

    Nonlinear Frequency-Division Multiplexing in the Focusing Regime

    Xianhe Yangzhang, Mansoor I. Yousefi, Alex Alvarado, Domanic Lavery, Polina Bayvel
    Comments: Invited paper to be presented at The Optical Fiber Communications Conference and Exposition (OFC), March 2017
    Subjects: Information Theory (cs.IT)

    Achievable rates of the nonlinear frequency-division multiplexing (NFDM) and
    wavelength-division multiplexing (WDM) subject to the same power and bandwidth
    constraints are computed as a function of transmit power in the standard
    single-mode fiber. NFDM achieves higher rates than WDM.

    Multiuser Media-based Modulation for Massive MIMO Systems

    Bharath Shamasundar, A. Chockalingam
    Subjects: Information Theory (cs.IT)

    In this paper, we consider {em media-based modulation (MBM)}, an attractive
    modulation scheme which is getting increased research attention recently, for
    the uplink of a massive MIMO system. Each user is equipped with one transmit
    antenna with multiple radio frequency (RF) mirrors (parasitic elements) placed
    near it. The base station (BS) is equipped with tens to hundreds of receive
    antennas. MBM with (m_{rf}) RF mirrors and (n_r) receive antennas over a
    multipath channel has been shown to asymptotically (as (m_{rf}
    ightarrow
    infty)) achieve the capacity of (n_r) parallel AWGN channels. This suggests
    that MBM can be attractive for use in massive MIMO systems which typically
    employ a large number of receive antennas at the BS. In this paper, we
    investigate the potential performance advantage of multiuser MBM (MU-MBM) in a
    massive MIMO setting. Our results show that multiuser MBM (MU-MBM) can
    significantly outperform other modulation schemes. For example, a bit error
    performance achieved using 500 receive antennas at the BS in a massive MIMO
    system using conventional modulation can be achieved using just 128 antennas
    using MU-MBM. Even multiuser spatial modulation, and generalized spatial
    modulation in the same massive MIMO settings require more than 200 antennas to
    achieve the same bit error performance. Also, recognizing that the MU-MBM
    signal vectors are inherently sparse, we propose an efficient MU-MBM signal
    detection scheme that uses compressive sensing based reconstruction algorithms
    like orthogonal matching pursuit (OMP), compressive sampling matching pursuit
    (CoSaMP), and subspace pursuit (SP).

    Bounds on Codes with Locality and Availability

    S.B.Balaji, P.Vijay Kumar
    Comments: To be submitted to ISIT 2017
    Subjects: Information Theory (cs.IT)

    In this paper we investigate bounds on rate and minimum distance of codes
    with (t) availability. We present bounds on minimum distance of code with (t)
    availability that are tighter than existing bounds. For bounds on rate of a
    code with (t) availability, we restrict ourself to a sub class of codes with
    (t) availability and derive a tighter rate bound. For (t=3,4), we also present
    a high-rate construction.

    Interference-Constrained Pricing for D2D Networks

    Yuan Liu, Rui Wang, Zhu Han
    Comments: To appear in IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    The concept of device-to-device (D2D) communications underlaying cellular
    networks opens up potential benefits for improving system performance but also
    brings new challenges such as interference management. In this paper, we
    propose a pricing framework for interference management from the D2D users to
    the cellular system, where the base station (BS) protects itself (or its
    serving cellular users) by pricing the crosstier interference caused from the
    D2D users. A Stackelberg game is formulated to model the interactions between
    the BS and D2D users. Specifically, the BS sets prices to a maximize its
    revenue (or any desired utility) subject to an interference temperature
    constraint. For given prices, the D2D users competitively adapt their power
    allocation strategies for individual utility maximization. We first analyze the
    competition among the D2D users by noncooperative game theory and an iterative
    based distributed power allocation algorithm is proposed. Then, depending on
    how much network information the BS knows, we develop two optimal algorithms,
    one for uniform pricing with limited network information and the other for
    differentiated pricing with global network information. The uniform pricing
    algorithm can be implemented by a fully distributed manner and requires minimum
    information exchange between the BS and D2D users, and the differentiated
    pricing algorithm is partially distributed and requires no iteration between
    the BS and D2D users. Then a suboptimal differentiated pricing scheme is
    proposed to reduce complexity and it can be implemented in a fully distributed
    fashion. Extensive simulations are conducted to verify the proposed framework
    and algorithms.

    Rethinking Sketching as Sampling: A Graph Signal Processing Approach

    Fernando Gama, Antonio G. Marques, Gonzalo Mateos, Alejandro Ribeiro
    Comments: Submitted to IEEE Trans. Signal Processing
    Subjects: Information Theory (cs.IT)

    Sampling of bandlimited graph signals has well-documented merits for
    dimensionality reduction, affordable storage, and online processing of
    streaming network data. Most existing sampling methods are designed to minimize
    the error incurred when reconstructing the original signal from its samples.
    Oftentimes these parsimonious signals serve as inputs to
    computationally-intensive linear operator (e.g., graph filters and transforms).
    Hence, interest shifts from reconstructing the signal itself towards instead
    approximating the output of the prescribed linear operator efficiently. In this
    context, we propose a novel sampling scheme that leverages the bandlimitedness
    of the input as well as the transformation whose output we wish to approximate.
    We formulate problems to jointly optimize sample selection and a sketch of the
    target linear transformation, so when the latter is affordably applied to the
    sampled input signal the result is close to the desired output. These designs
    are carried out off line, and several heuristic (sub)optimal solvers are
    proposed to accommodate high-dimensional problems, especially when
    computational resources are at a premium. Similar sketching as sampling ideas
    are also shown effective in the context of linear inverse problems. The
    developed sampling plus reduced-complexity processing pipeline is particularly
    useful for streaming data, where the linear transform has to be applied fast
    and repeatedly to successive inputs or response signals. Numerical tests show
    the effectiveness of the proposed algorithms in classifying handwritten digits
    from as few as 20 out of 784 pixels in the input images, as well as in
    accurately estimating the frequency components of bandlimited graph signals
    sampled at few nodes.

    Optimal Signaling for Secure Communications over Gaussian MIMO Wiretap Channels

    Sergey Loyka, Charalambos D. Charalambous
    Comments: accepted by IEEE Trans. Info. Theory
    Subjects: Information Theory (cs.IT)

    Optimal signalling over the Gaussian MIMO wire-tap channel is studied under
    the total transmit power constraint. A closed-form solution for an optimal
    transmit covariance matrix is obtained when the channel is strictly degraded.
    In combination with the rank-1 solution, this provides the complete
    characterization of the optimal covariance for the case of two transmit
    antennas. The cases of weak eavesdropper and high SNR are considered. It is
    shown that the optimal covariance does not converge to a scaled identity in the
    high-SNR regime. Necessary optimality conditions and a tight upper bound on the
    rank of an optimal covariance matrix are established for the general case,
    along with a lower bound to the secrecy capacity, which is tight in a number of
    scenarios.

    Symmetry, Outer Bounds, and Code Constructions: A Computer-Aided Investigation on the Fundamental Limits of Caching

    Chao Tian
    Comments: 34 pages, 7 figures; submitted to IEEE Trans. Information Theory
    Subjects: Information Theory (cs.IT)

    We considered the optimal memory-transmission-rate tradeoff of caching
    systems. Different from the conventional analytical approach usually seen in
    the information theory literature, we rely on a computer-aided approach in this
    investigation. The linear programming (LP) outer bound of the entropy space
    serves as the starting point of this approach, however our effort goes
    significantly beyond using it to prove information inequalities. We first
    identify and formalize the symmetry structure in the problem, which enables us
    to show the existence of optimal symmetric solutions. A symmetry-reduced linear
    program is then used to identify the boundary of the memory-transmission-rate
    tradeoff for several simple cases, for which we obtain a set of tight outer
    bounds. General hypotheses on the optimal tradeoff region are formed from these
    computed data, which are then analytically proved. This leads to a complete
    characterization of the optimal tradeoff for systems with only two users, and
    certain partial characterization for systems with only two files. Next, we show
    that by carefully analyzing the joint entropy structure of the outer bounds for
    certain cases, a novel code construction can be reverse-engineered, which
    eventually leads to a general class of codes. Finally, we show that strong
    outer bounds can be computed through strategically relaxing the LP. This allows
    us firstly to deduce generic characteristic of the converse proof, and secondly
    to compute outer bounds for larger problem cases, despite the seemingly
    impossible computation scale.




沪ICP备19023445号-2号
友情链接