
    arXiv Paper Daily: Mon, 31 Oct 2016

    Published by 我爱机器学习 (52ml.net) on 2016-10-31 00:00:00

    Neural and Evolutionary Computing

    Learning to Reason With Adaptive Computation

    Mark Neumann, Pontus Stenetorp, Sebastian Riedel
    Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    Multi-hop inference is necessary for machine learning systems to successfully
    solve tasks such as Recognising Textual Entailment and Machine Reading. In this
    work, we demonstrate the effectiveness of adaptive computation for learning
    the number of inference steps required for examples of different complexity,
    and show that learning the correct number of inference steps is difficult. We
    introduce the first model involving Adaptive Computation Time, which provides
    a small performance benefit over a similar model without the adaptive
    component, while also enabling considerable insight into the reasoning process
    of the model.
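
    For readers unfamiliar with the halting mechanism of Adaptive Computation
    Time, the following is a minimal sketch, not the authors' model: a toy
    step_fn and halt_fn (both hypothetical placeholders) stand in for the learned
    inference step and halting unit, and computation stops once the cumulative
    halting probability reaches 1 - eps.

        import numpy as np

        def act_pondering(state, step_fn, halt_fn, max_steps=10, eps=0.01):
            """Run up to max_steps inference steps; halt once the cumulative
            halting probability reaches 1 - eps (ACT-style)."""
            states, probs = [], []
            cum_p = 0.0
            for _ in range(max_steps):
                state = step_fn(state)          # one inference step
                p = halt_fn(state)              # halting probability of this step
                if cum_p + p >= 1.0 - eps or len(states) == max_steps - 1:
                    probs.append(1.0 - cum_p)   # final step uses the remainder
                    states.append(state)
                    break
                probs.append(p)
                states.append(state)
                cum_p += p
            # Output is the halting-probability-weighted mean of the visited states.
            weighted = np.average(np.stack(states), axis=0, weights=np.array(probs))
            return weighted, len(states)

        # Toy run: the "state" is a vector; the halting probability grows with its norm.
        rng = np.random.default_rng(0)
        out, n_steps = act_pondering(
            rng.normal(size=4),
            step_fn=lambda s: 0.9 * s + 0.1,
            halt_fn=lambda s: 1.0 / (1.0 + np.exp(2.0 - np.linalg.norm(s))),
        )
        print(n_steps, out)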


    Computer Vision and Pattern Recognition

    Real-time Online Action Detection Forests using Spatio-temporal Contexts

    Seungryul Baek, Kwang In Kim, Tae-Kyun Kim
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Online action detection (OAD) is challenging since 1) robust yet
    computationally expensive features cannot be straightforwardly used due to the
    real-time processing requirements and 2) the localization and classification of
    actions have to be performed even before they are fully observed. We propose a
    new random forest (RF)-based online action detection framework that addresses
    these challenges. Our algorithm uses computationally efficient skeletal joint
    features. High accuracy is achieved by using robust convolutional neural
    network (CNN)-based features which are extracted from the raw RGBD images, plus
    the temporal relationships between the current frame of interest, and the past
    and future frames. While these high-quality features are not available in a
    real-time testing scenario, we demonstrate that they can be effectively
    exploited in training RF classifiers: we use these spatio-temporal contexts to
    craft the RF’s new split functions, improving the RFs’ leaf-node statistics.
    Experiments
    with challenging MSRAction3D, G3D, and OAD datasets demonstrate that our
    algorithm significantly improves the accuracy over the state-of-the-art online
    action detection algorithms while achieving the real-time efficiency of
    existing skeleton-based RF classifiers.

    The TUM LapChole dataset for the M2CAI 2016 workflow challenge

    Ralf Stauder, Daniel Ostler, Michael Kranzfelder, Sebastian Koller, Hubertus Feußner, Nassir Navab
    Comments: 5 pages, 2 figures, preliminary reference for published dataset (until larger comparison study of workshop organizers is published)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this technical report we present our collected dataset of laparoscopic
    cholecystectomies (LapChole). Laparoscopic videos of a total of 20 surgeries
    were recorded and annotated with surgical phase labels, of which 15 were
    randomly pre-determined as training data, while the remaining 5 videos were
    selected as test data. This dataset was later included as part of the M2CAI
    2016 workflow detection challenge during MICCAI 2016 in Athens.

    Learnable Visual Markers

    Oleg Grinchuk, Vadim Lebedev, Victor Lempitsky
    Comments: NIPS 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a new approach to designing visual markers (analogous to QR-codes,
    markers for augmented reality, and robotic fiducial tags) based on the advances
    in deep generative networks. In our approach, the markers are obtained as color
    images synthesized by a deep network from input bit strings, whereas another
    deep network is trained to recover the bit strings back from the photos of
    these markers. The two networks are trained simultaneously in a joint
    backpropagation process that takes characteristic photometric and geometric
    distortions associated with marker fabrication and marker scanning into
    account. Additionally, a stylization loss based on statistics of activations in
    a pretrained classification network can be inserted into the learning in order
    to shift the marker appearance towards some texture prototype. In the
    experiments, we demonstrate that the markers obtained using our approach are
    capable of retaining bit strings that are long enough to be practical. The
    ability to automatically adapt markers according to the usage scenario and the
    desired capacity as well as the ability to combine information encoding with
    artistic stylization are the unique properties of our approach. As a byproduct,
    our approach provides insight into the structure of patterns that are most
    suitable for recognition by ConvNets and into their ability to distinguish
    composite patterns.

    Judging a Book By its Cover

    Brian Kenji Iwana, Seiichi Uchida
    Comments: 5 pages, 5 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Book covers communicate information to potential readers, but can the same
    information be learned by computers? We propose a method of using a
    Convolutional Neural Network (CNN) to predict the genre of a book based on the
    visual clues provided by its cover. The purpose is to investigate whether
    relationships between books and their covers can be learned. However,
    determining the genre of a book is a difficult task because covers can be
    ambiguous and genres can be overarching. Despite this, we show that a CNN can
    extract features and learn underlying design rules set by the designer to
    define a genre. Using machine learning, the large amount of available
    resources can be brought to bear on the book cover design process.
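
    As an illustration only (the paper's architecture, input size, and number of
    genre classes are not reproduced here), a small Keras CNN of the kind one
    might train for this task could look as follows; n_genres and all
    hyperparameters are assumptions.

        from tensorflow.keras import layers, models

        n_genres = 30  # hypothetical number of genre classes

        model = models.Sequential([
            layers.Input(shape=(224, 224, 3)),          # RGB cover image
            layers.Conv2D(32, 3, activation="relu"),
            layers.MaxPooling2D(),
            layers.Conv2D(64, 3, activation="relu"),
            layers.MaxPooling2D(),
            layers.Conv2D(128, 3, activation="relu"),
            layers.GlobalAveragePooling2D(),
            layers.Dense(n_genres, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        # model.fit(cover_images, genre_labels, ...) once labeled covers are available.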

    Towards automatic pulmonary nodule management in lung cancer screening with deep learning

    Francesco Ciompi, Kaman Chung, Sarah J. van Riel, Arnaud Arindra Adiyoso Setio, Paul K. Gerke, Colin Jacobs, Ernst Th. Scholten, Cornelia Schaefer-Prokop, Mathilde M. W. Wille, Alfonso Marchiano, Ugo Pastorino, Mathias Prokop, Bram van Ginneken
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The introduction of lung cancer screening programs will produce an
    unprecedented amount of chest CT scans in the near future, which radiologists
    will have to read in order to decide on a patient follow-up strategy. According
    to the current guidelines, the workup of screen-detected nodules strongly
    relies on nodule size and nodule type. In this paper, we present a deep
    learning system based on multi-stream multi-scale convolutional networks, which
    automatically classifies all nodule types relevant for nodule workup. The
    system processes raw CT data containing a nodule without the need for any
    additional information such as nodule segmentation or nodule size and learns a
    representation of 3D data by analyzing an arbitrary number of 2D views of a
    given nodule. The deep learning system was trained with data from the Italian
    MILD screening trial and validated on an independent set of data from the
    Danish DLCST screening trial. We analyze the advantage of processing nodules at
    multiple scales with a multi-stream convolutional network architecture, and we
    show that the proposed deep learning system achieves performance at classifying
    nodule type within the inter-observer variability among four experienced human
    observers.

    Recent advances in content based video copy detection

    Sanket Shinde, Girija Chiddarwar
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    With the immense number of videos being uploaded to video sharing sites, the
    issue of copyright infringement arises from the uploading of illicit copies or
    transformed versions of original videos. Safeguarding the copyright of digital
    media has therefore become a matter of concern. To address this concern, a
    video copy detection system is required that is sufficiently robust to detect
    these transformed videos and able to pinpoint the location of copied segments.
    This paper outlines recent advances in content-based video copy detection,
    focusing mainly on the different visual features employed by video copy
    detection systems. Finally, we evaluate the performance of existing video copy
    detection systems.

    Icon: An Interactive Approach to Train Deep Neural Networks for Segmentation of Neuronal Structures

    Felix Gonda, Verena Kaynig, Ray Thouis, Daniel Haehn, Jeff Lichtman, Toufiq Parag, Hanspeter Pfister
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present an interactive approach to train a deep neural network pixel
    classifier for the segmentation of neuronal structures. An interactive training
    scheme reduces the extremely tedious manual annotation task that is typically
    required for deep networks to perform well on image segmentation problems. Our
    proposed method employs a feedback loop that captures sparse annotations using
    a graphical user interface, trains a deep neural network based on recent and
    past annotations, and displays the prediction output to users in almost
    real-time. Our implementation of the algorithm also allows multiple users to
    provide annotations in parallel and receive feedback from the same classifier.
    Quick feedback on classifier performance in an interactive setting enables
    users to identify and label examples that are more important than others for
    segmentation purposes. Our experiments show that an interactively-trained pixel
    classifier produces better region segmentation results on Electron Microscopy
    (EM) images than those generated by a network of the same architecture trained
    offline on exhaustive ground-truth labels.

    Compressive Holographic Video

    Zihao Wang, Leonidas Spinoulas, Kuan He, Huaijin Chen, Lei Tian, Aggelos K. Katsaggelos, Oliver Cossairt
    Comments: 12 pages, 6 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Compressed sensing has been discussed separately in spatial and temporal
    domains. Compressive holography has been introduced as a method that allows 3D
    tomographic reconstruction at different depths from a single 2D image. Coded
    exposure is a temporal compressed sensing method for high speed video
    acquisition. In this work, we combine compressive holography and coded exposure
    techniques and extend the discussion to 4D reconstruction in space and time
    from one coded captured image. In our prototype, digital in-line holography was
    used for imaging macroscopic, fast moving objects. The pixel-wise temporal
    modulation was implemented by a digital micromirror device. In this paper we
    demonstrate \(10\times\) temporal super resolution with multiple depths recovery
    from a single image. Two examples are presented for the purpose of recording
    subtle vibrations and tracking small particles within 5 ms.

    Cross-Modal Scene Networks

    Yusuf Aytar, Lluis Castrejon, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
    Comments: See more at this http URL arXiv admin note: text overlap with arXiv:1607.07295
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Multimedia (cs.MM)

    People can recognize scenes across many different modalities beyond natural
    images. In this paper, we investigate how to learn cross-modal scene
    representations that transfer across modalities. To study this problem, we
    introduce a new cross-modal scene dataset. While convolutional neural networks
    can categorize scenes well, they also learn an intermediate representation not
    aligned across modalities, which is undesirable for cross-modal transfer
    applications. We present methods to regularize cross-modal convolutional neural
    networks so that they have a shared representation that is agnostic of the
    modality. Our experiments suggest that our scene representation can help
    transfer representations across modalities for retrieval. Moreover, our
    visualizations suggest that units emerge in the shared representation that tend
    to activate on consistent concepts independently of the modality.

    SoundNet: Learning Sound Representations from Unlabeled Video

    Yusuf Aytar, Carl Vondrick, Antonio Torralba
    Comments: NIPS 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Sound (cs.SD)

    We learn rich natural sound representations by capitalizing on large amounts
    of unlabeled sound data collected in the wild. We leverage the natural
    synchronization between vision and sound to learn an acoustic representation
    using two-million unlabeled videos. Unlabeled video has the advantage that it
    can be economically acquired at massive scales, yet contains useful signals
    about natural sound. We propose a student-teacher training procedure which
    transfers discriminative visual knowledge from well established visual
    recognition models into the sound modality using unlabeled video as a bridge.
    Our sound representation yields significant performance improvements over the
    state-of-the-art results on standard benchmarks for acoustic scene/object
    classification. Visualizations suggest some high-level semantics automatically
    emerge in the sound network, even though it is trained without ground truth
    labels.


    Artificial Intelligence

    Flexible constrained sampling with guarantees for pattern mining

    Vladimir Dzyuba, Matthijs van Leeuwen, Luc De Raedt
    Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB); Machine Learning (stat.ML)

    Pattern sampling has been proposed as a potential solution to the infamous
    pattern explosion. Instead of enumerating all patterns that satisfy the
    constraints, individual patterns are sampled proportional to a given quality
    measure. Several sampling algorithms have been proposed, but each of them has
    its limitations when it comes to 1) flexibility in terms of quality measures
    and constraints that can be used, and/or 2) guarantees with respect to sampling
    accuracy. We therefore present Flexics, the first flexible pattern sampler that
    supports a broad class of quality measures and constraints, while providing
    strong guarantees regarding sampling accuracy. To achieve this, we leverage the
    perspective on pattern mining as a constraint satisfaction problem and build
    upon the latest advances in sampling solutions in SAT as well as existing
    pattern mining algorithms. Furthermore, the proposed algorithm is applicable to
    a variety of pattern languages, which allows us to introduce and tackle the
    novel task of sampling sets of patterns. We introduce and empirically evaluate
    two variants of Flexics: 1) a generic variant that addresses the well-known
    itemset sampling task and the novel pattern set sampling task as well as a wide
    range of expressive constraints within these tasks, and 2) a specialized
    variant that exploits existing frequent itemset techniques to achieve
    substantial speed-ups. Experiments show that Flexics is both accurate and
    efficient, making it a useful tool for pattern-based data exploration.

    Discovering Blind Spots of Predictive Models: Representations and Policies for Guided Exploration

    Himabindu Lakkaraju, Ece Kamar, Rich Caruana, Eric Horvitz
    Subjects: Artificial Intelligence (cs.AI)

    Predictive models deployed in the world may assign incorrect labels to
    instances with high confidence. Such errors or unknown unknowns are rooted in
    model incompleteness, and typically arise because of the mismatch between
    training data and the cases seen in the open world. As the models are blind to
    such errors, input from an oracle is needed to identify these failures. In this
    paper, we formulate and address the problem of optimizing the discovery of
    unknown unknowns of any predictive model under a fixed budget, which limits the
    number of times an oracle can be queried for true labels. We propose a
    model-agnostic methodology which uses feedback from an oracle to both identify
    unknown unknowns and to intelligently guide the discovery. We employ a
    two-phase approach which first organizes the data into multiple partitions
    based on instance similarity, and then utilizes an explore-exploit strategy for
    discovering unknown unknowns across these partitions. We demonstrate the
    efficacy of our framework by varying the underlying causes of unknown unknowns
    across various applications. To the best of our knowledge, this paper presents
    the first algorithmic approach to the problem of discovering unknown unknowns
    of predictive models.
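
    A hedged sketch of the two-phase idea described above: partition the data by
    similarity, then spend the oracle budget with a simple explore-exploit
    (epsilon-greedy) policy over the partitions. The reward design, the
    model_predict and oracle_label callables, and the confidence threshold are
    illustrative assumptions, not the paper's algorithm.

        import numpy as np
        from sklearn.cluster import KMeans

        def discover_unknown_unknowns(X, model_predict, oracle_label, budget=100,
                                      n_partitions=10, eps=0.1, seed=0):
            """Epsilon-greedy search over similarity-based partitions for instances
            the model labels confidently but incorrectly (illustrative only)."""
            rng = np.random.default_rng(seed)
            parts = KMeans(n_clusters=n_partitions, n_init=10,
                           random_state=seed).fit_predict(X)
            hits = np.zeros(n_partitions)    # confident mistakes found per partition
            pulls = np.ones(n_partitions)    # queries per partition (avoid div by 0)
            found = []
            for _ in range(budget):
                if rng.random() < eps:                        # explore
                    k = rng.integers(n_partitions)
                else:                                         # exploit the best partition
                    k = int(np.argmax(hits / pulls))
                idx = rng.choice(np.flatnonzero(parts == k))  # pick an instance from it
                label, confidence = model_predict(X[idx])     # assumed callable
                if confidence > 0.9 and label != oracle_label(idx):  # assumed callable
                    hits[k] += 1
                    found.append(idx)
                pulls[k] += 1
            return found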

    Improving Sampling from Generative Autoencoders with Markov Chains

    Kai Arulkumaran, Antonia Creswell, Anil Anthony Bharath
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    We focus on generative autoencoders, such as variational or adversarial
    autoencoders, which jointly learn a generative model alongside an inference
    model. We define generative autoencoders as autoencoders which are trained to
    softly enforce a prior on the latent distribution learned by the model.
    However, the model does not necessarily learn to match the prior. We formulate
    a Markov chain Monte Carlo (MCMC) sampling process, equivalent to iteratively
    encoding and decoding, which allows us to sample from the learned latent
    distribution. Using this we can improve the quality of samples drawn from the
    model, especially when the learned distribution is far from the prior. Using
    MCMC sampling, we also reveal previously unseen differences between generative
    autoencoders trained either with or without the denoising criterion.
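
    A minimal sketch of the sampling idea, assuming toy linear maps in place of
    trained encoder/decoder networks: draw an initial latent from the prior and
    alternate decoding and encoding, so the chain moves toward the latent
    distribution the model actually learned.

        import numpy as np

        rng = np.random.default_rng(0)
        d_z, d_x = 2, 5
        W = rng.normal(size=(d_x, d_z))        # toy "decoder" weights
        V = np.linalg.pinv(W)                  # toy "encoder" (pseudo-inverse of W)

        def decode(z):   # stands in for the trained decoder p(x|z)
            return W @ z + 0.05 * rng.normal(size=d_x)

        def encode(x):   # stands in for the trained encoder q(z|x)
            return V @ x + 0.05 * rng.normal(size=d_z)

        z = rng.normal(size=d_z)               # initial draw from the prior N(0, I)
        for _ in range(10):                    # Markov chain: decode, then re-encode
            x = decode(z)
            z = encode(x)
        print("sample after 10 MCMC steps:", decode(z))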

    Text Segmentation using Named Entity Recognition and Co-reference Resolution in English and Greek Texts

    Pavlina Fragkou
    Comments: 32 pages. arXiv admin note: text overlap with arXiv:1308.0661, arXiv:1204.2847 by other authors
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

    In this paper we examine the benefit of performing named entity recognition
    (NER) and co-reference resolution on an English and a Greek corpus used for
    text segmentation. The aim is to examine whether combining text segmentation
    and information extraction can be beneficial for identifying the various
    topics that appear in a document. NER was performed manually on the English
    corpus and compared with the output of publicly available annotation tools,
    while an existing tool was used for the Greek corpus. The annotations produced
    for both corpora were manually corrected and enriched to cover four types of
    named entities. Co-reference resolution, i.e., substituting every reference to
    the same instance with the same named entity identifier, was subsequently
    performed. The evaluation, using five text segmentation algorithms for the
    English corpus and four for the Greek corpus, leads to the conclusion that the
    benefit depends highly on the segment’s topic, the number of named entity
    instances appearing in it, and the segment’s length.

    How Users Explore Ontologies on the Web: A Study of NCBO's BioPortal Usage Logs

    Simon Walk, Lisette Espín-Noboa, Denis Helic, Markus Strohmaier, Mark Musen
    Comments: Under review for WWW’17
    Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Digital Libraries (cs.DL); Human-Computer Interaction (cs.HC)

    Ontologies in the biomedical domain are numerous, highly specialized and very
    expensive to develop. Thus, a crucial prerequisite for ontology adoption and
    reuse is effective support for exploring and finding existing ontologies.
    Towards that goal, the National Center for Biomedical Ontology (NCBO) has
    developed BioPortal—an online repository designed to support users in
    exploring and finding more than 500 existing biomedical ontologies. In 2016,
    BioPortal represents one of the largest portals for exploration of semantic
    biomedical vocabularies and terminologies, which is used by many researchers
    and practitioners. While usage of this portal is high, we know very little
    about how exactly users search and explore ontologies and what kind of usage
    patterns or user groups exist in the first place. Deeper insights into user
    behavior on such portals can provide valuable information to devise strategies
    for a better support of users in exploring and finding existing ontologies, and
    thereby enable better ontology reuse. To that end, we study and group users
    according to their browsing behavior on BioPortal using data mining techniques.
    Additionally, we use the obtained groups to characterize and compare
    exploration strategies across ontologies. In particular, we were able to
    identify seven distinct browsing-behavior types, which all make use of
    different functionality provided by BioPortal. For example, Search Explorers
    make extensive use of the search functionality while Ontology Tree Explorers
    mainly rely on the class hierarchy to explore ontologies. Further, we show that
    specific characteristics of ontologies influence the way users explore and
    interact with the website. Our results may guide the development of more
    user-oriented systems for ontology exploration on the Web.

    Fuzzy Bayesian Learning

    Indranil Pan, Dirk Bester
    Comments: 13 pages, 10 figures, submitted
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI)

    In this paper we propose a novel approach for learning from data using
    rule-based fuzzy inference systems, where the model parameters are estimated
    using Bayesian inference and Markov Chain Monte Carlo (MCMC) techniques. We
    show the applicability of the method for regression and classification tasks
    using synthetic datasets and also a real-world example from the financial
    services industry. We then demonstrate how the method can be extended for
    knowledge extraction to select, in a Bayesian way, the individual rules that
    best explain the given data. Finally we discuss the advantages and pitfalls of
    using this
    method over state-of-the-art techniques and highlight the specific class of
    problems where this would be useful.

    Integrating Topic Models and Latent Factors for Recommendation

    Danis J. Wilson, Wei Zhang
    Comments: 10 pages, 3 figures
    Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

    Research on personalized recommendation techniques has today largely split
    into two mainstream directions, i.e., factorization-based approaches and topic
    models. In practice, they aim to benefit from numerical ratings and textual
    reviews, respectively, which constitute two major information sources in
    various real-world systems. However, although the two approaches are supposed
    to be correlated through their shared goal of accurate recommendation, a clear
    theoretical understanding is still lacking of how their objective functions
    can be mathematically bridged to leverage numerical ratings and textual
    reviews collectively, and why such a bridge is intuitively reasonable for
    matching up their learning procedures for the rating prediction and top-N
    recommendation tasks, respectively.

    In this work, we show through mathematical analysis that vector-level
    randomization functions coordinating the optimization objectives of
    factorizational and topic models unfortunately do not exist at all, although
    they are usually pre-assumed and intuitively designed in the literature.
    Fortunately, we also point out that one can avoid seeking such a randomization
    function by directly optimizing a Joint Factorizational Topic (JFT) model. We
    apply our JFT model to restaurant recommendation and study its performance in
    both normal and cross-city recommendation scenarios, where the latter is an
    extremely difficult task due to its inherent cold-start nature. Experimental
    results on real-world datasets verify the appealing performance of our
    approach compared with previous methods, on both rating prediction and top-N
    recommendation tasks.

    Optimal Belief Approximation

    Reimar H. Leike, Torsten A. Enßlin
    Subjects: Statistics Theory (math.ST); Artificial Intelligence (cs.AI); Data Analysis, Statistics and Probability (physics.data-an)

    In Bayesian statistics probability distributions express beliefs. However,
    for many problems the beliefs cannot be computed analytically and
    approximations of beliefs are needed. We seek a ranking function that
    quantifies how “embarrassing” it is to communicate a given approximation. We
    show that there is only one ranking under the requirements that (1) the best
    ranked approximation is the non-approximated belief and (2) that the ranking
    judges approximations only by their predictions for actual outcomes. We find
    that this ranking is equivalent to the Kullback-Leibler divergence that is
    frequently used in the literature. However, there seems to be confusion about
    the correct order in which its functional arguments, the approximated and
    non-approximated beliefs, should be used. We hope that our elementary
    derivation settles the apparent confusion. We show for example that when
    approximating beliefs with Gaussian distributions the optimal approximation is
    given by moment matching. This is in contrast to many suggested computational
    schemes.
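
    A short worked statement of the Gaussian moment-matching claim, assuming the
    divergence is used as \(D_{\mathrm{KL}}(p\,\|\,q)\) with \(p\) the
    non-approximated belief:

        \[
          D_{\mathrm{KL}}(p \,\|\, q)
            = \int p(x)\,\log\frac{p(x)}{q(x)}\,dx,
          \qquad
          q(x) = \mathcal{N}(x \mid \mu, \sigma^2).
        \]
        Setting \(\partial_{\mu} D_{\mathrm{KL}} = 0\) and
        \(\partial_{\sigma^2} D_{\mathrm{KL}} = 0\) yields
        \[
          \mu^\star = \mathbb{E}_{p}[x],
          \qquad
          (\sigma^2)^\star = \mathrm{Var}_{p}[x],
        \]
        i.e. the best Gaussian approximation matches the first two moments of \(p\).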


    Information Retrieval

    Sentiment Analysis of Twitter Data for Predicting Stock Market Movements

    Venkata Sasank Pagolu, Kamal Nayan Reddy Challa, Ganapati Panda, Babita Majhi
    Comments: 6 pages 4 figures Conference Paper
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Social and Information Networks (cs.SI)

    Predicting stock market movements is a well-known problem of interest.
    Nowadays, social media closely reflects public sentiment and opinion about
    current events. In particular, Twitter has attracted a lot of attention from
    researchers studying public sentiment. Stock market prediction on the basis of
    public sentiment expressed on Twitter has been an intriguing field of
    research. Previous studies have concluded that the aggregate public mood
    collected from Twitter may well be correlated with the Dow Jones Industrial
    Average (DJIA) index. The thesis of this work is to observe how well the
    changes in stock prices of a company, the rises and falls, are correlated with
    the public opinions expressed in tweets about that company. Understanding an
    author’s opinion from a piece of text is the objective of sentiment analysis.
    The present paper employs two different textual representations, Word2vec and
    n-grams, for analyzing public sentiment in tweets. We apply sentiment analysis
    and supervised machine learning principles to tweets extracted from Twitter
    and analyze the correlation between a company’s stock market movements and the
    sentiment expressed in tweets about it. Intuitively, positive news and tweets
    in social media about a company would be expected to encourage people to
    invest in its stock and thereby drive its price up. At the end of the paper,
    it is shown that a strong correlation exists between the rises and falls in
    stock prices and the public sentiment in tweets.
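
    As a hedged illustration of the n-gram representation plus supervised
    learning pipeline (not the authors' exact features, classifier, or data), a
    scikit-learn sketch might look like this:

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        # Tiny illustrative dataset; real experiments would use labeled tweets.
        tweets = ["great quarter for the company", "product recall hurts outlook",
                  "record profits announced", "ceo resigns amid heavy losses"]
        labels = [1, 0, 1, 0]                   # 1 = positive sentiment, 0 = negative

        clf = make_pipeline(
            CountVectorizer(ngram_range=(1, 2), lowercase=True),  # unigrams + bigrams
            LogisticRegression(max_iter=1000),
        )
        clf.fit(tweets, labels)
        print(clf.predict(["profits beat expectations"]))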

    Integrating Topic Models and Latent Factors for Recommendation

    Danis J. Wilson, Wei Zhang
    Comments: 10 pages, 3 figures
    Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

    Research on personalized recommendation techniques has today largely split
    into two mainstream directions, i.e., factorization-based approaches and topic
    models. In practice, they aim to benefit from numerical ratings and textual
    reviews, respectively, which constitute two major information sources in
    various real-world systems. However, although the two approaches are supposed
    to be correlated through their shared goal of accurate recommendation, a clear
    theoretical understanding is still lacking of how their objective functions
    can be mathematically bridged to leverage numerical ratings and textual
    reviews collectively, and why such a bridge is intuitively reasonable for
    matching up their learning procedures for the rating prediction and top-N
    recommendation tasks, respectively.

    In this work, we show through mathematical analysis that vector-level
    randomization functions coordinating the optimization objectives of
    factorizational and topic models unfortunately do not exist at all, although
    they are usually pre-assumed and intuitively designed in the literature.
    Fortunately, we also point out that one can avoid seeking such a randomization
    function by directly optimizing a Joint Factorizational Topic (JFT) model. We
    apply our JFT model to restaurant recommendation and study its performance in
    both normal and cross-city recommendation scenarios, where the latter is an
    extremely difficult task due to its inherent cold-start nature. Experimental
    results on real-world datasets verify the appealing performance of our
    approach compared with previous methods, on both rating prediction and top-N
    recommendation tasks.

    Toward Implicit Sample Noise Modeling: Deviation-driven Matrix Factorization

    Guang-He Lee, Shao-Wen Yang, Shou-De Lin
    Comments: 6 pages + 1 reference page
    Subjects: Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)

    The objective function of a matrix factorization model usually aims to
    minimize the average of a regression error contributed by each element.
    However, given the existence of stochastic noises, the implicit deviations of
    sample data from their true values are almost surely diverse, which makes each
    data point not equally suitable for fitting a model. In this case, simply
    averaging the cost among data in the objective function is not ideal.
    Intuitively, we would like to place more emphasis on the reliable instances
    (i.e., those containing smaller noise) while training a model. Motivated by
    this observation, we derive our formula from a theoretical framework for optimal
    weighting under heteroscedastic noise distribution. Specifically, by modeling
    and learning the deviation of data, we design a novel matrix factorization
    model. Our model has two advantages. First, it jointly learns the deviation and
    conducts dynamic reweighting of instances, allowing the model to converge to a
    better solution. Second, during learning the deviated instances are assigned
    lower weights, which leads to faster convergence since the model does not need
    to overfit the noise. The experiments are conducted on clean recommendation and
    noisy sensor datasets to test the effectiveness of the model in various
    scenarios. The results show that our model outperforms the state-of-the-art
    factorization and deep learning models in both accuracy and efficiency.
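
    A toy sketch of weighting under heteroscedastic noise, assuming the per-entry
    noise level is known rather than learned jointly as in the paper: entries
    with larger noise variance receive a smaller weight 1/sigma^2 in a plain SGD
    matrix-factorization loop. All sizes and learning rates are illustrative.

        import numpy as np

        rng = np.random.default_rng(0)
        n_users, n_items, k = 50, 40, 5
        U_true = rng.normal(size=(n_users, k))
        V_true = rng.normal(size=(n_items, k))
        sigma = rng.uniform(0.2, 2.0, size=(n_users, n_items))   # per-entry noise level
        R = U_true @ V_true.T + sigma * rng.normal(size=(n_users, n_items))

        U = 0.1 * rng.normal(size=(n_users, k))
        V = 0.1 * rng.normal(size=(n_items, k))
        lr = 0.002
        for epoch in range(30):
            for i in range(n_users):
                for j in range(n_items):
                    err = R[i, j] - U[i] @ V[j]
                    w = 1.0 / sigma[i, j] ** 2    # reliable entries get larger weight
                    ui = U[i].copy()
                    U[i] += lr * w * err * V[j]
                    V[j] += lr * w * err * ui
        print("RMSE vs. noise-free ratings:",
              np.sqrt(np.mean((U @ V.T - U_true @ V_true.T) ** 2)))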

    Text Segmentation using Named Entity Recognition and Co-reference Resolution in English and Greek Texts

    Pavlina Fragkou
    Comments: 32 pages. arXiv admin note: text overlap with arXiv:1308.0661, arXiv:1204.2847 by other authors
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

    In this paper we examine the benefit of performing named entity recognition
    (NER) and co-reference resolution on an English and a Greek corpus used for
    text segmentation. The aim is to examine whether combining text segmentation
    and information extraction can be beneficial for identifying the various
    topics that appear in a document. NER was performed manually on the English
    corpus and compared with the output of publicly available annotation tools,
    while an existing tool was used for the Greek corpus. The annotations produced
    for both corpora were manually corrected and enriched to cover four types of
    named entities. Co-reference resolution, i.e., substituting every reference to
    the same instance with the same named entity identifier, was subsequently
    performed. The evaluation, using five text segmentation algorithms for the
    English corpus and four for the Greek corpus, leads to the conclusion that the
    benefit depends highly on the segment’s topic, the number of named entity
    instances appearing in it, and the segment’s length.


    Computation and Language

    Word Embeddings for the Construction Domain

    Antoine J.-P. Tixier, Michalis Vazirgiannis, Matthew R. Hallowell
    Subjects: Computation and Language (cs.CL)

    We introduce word vectors for the construction domain. Our vectors were
    obtained by running word2vec on an 11M-word corpus that we created from scratch
    by leveraging freely-accessible online sources of construction-related text. We
    first explore the embedding space and show that our vectors capture meaningful
    construction-specific concepts. We then evaluate the performance of our vectors
    against that of ones trained on a 100B-word corpus (Google News) within the
    framework of an injury report classification task. Without any parameter
    tuning, our embeddings give competitive results, and outperform the Google News
    vectors in many cases. Using a keyword-based compression of the reports also
    leads to a significant speed-up with only a limited loss in performance. We
    release our corpus and the data set we created for the classification task as
    publicly available, in the hope that they will be used by future studies for
    benchmarking and building on our work.
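
    A hedged sketch of training such domain-specific vectors with gensim's
    word2vec implementation (gensim 4.x parameter names); the corpus file name
    and all hyperparameters below are illustrative assumptions, not the settings
    used in the paper.

        from gensim.models import Word2Vec
        from gensim.utils import simple_preprocess

        # One sentence per line of the (hypothetical) construction-domain corpus file.
        with open("construction_corpus.txt", encoding="utf-8") as f:
            sentences = [simple_preprocess(line) for line in f]

        model = Word2Vec(sentences, vector_size=100, window=5, min_count=5,
                         sg=1, workers=4, epochs=5)
        print(model.wv.most_similar("scaffold", topn=5))  # nearby construction terms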

    Text Segmentation using Named Entity Recognition and Co-reference Resolution in English and Greek Texts

    Pavlina Fragkou
    Comments: 32 pages. arXiv admin note: text overlap with arXiv:1308.0661, arXiv:1204.2847 by other authors
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

    In this paper we examine the benefit of performing named entity recognition
    (NER) and co-reference resolution on an English and a Greek corpus used for
    text segmentation. The aim is to examine whether combining text segmentation
    and information extraction can be beneficial for identifying the various
    topics that appear in a document. NER was performed manually on the English
    corpus and compared with the output of publicly available annotation tools,
    while an existing tool was used for the Greek corpus. The annotations produced
    for both corpora were manually corrected and enriched to cover four types of
    named entities. Co-reference resolution, i.e., substituting every reference to
    the same instance with the same named entity identifier, was subsequently
    performed. The evaluation, using five text segmentation algorithms for the
    English corpus and four for the Greek corpus, leads to the conclusion that the
    benefit depends highly on the segment’s topic, the number of named entity
    instances appearing in it, and the segment’s length.

    Towards a continuous modeling of natural language domains

    Sebastian Ruder, Parsa Ghaffari, John G. Breslin
    Comments: 5 pages, 3 figures, published in Uphill Battles in Language Processing workshop, EMNLP 2016
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Humans continuously adapt their style and language to a variety of domains.
    However, a reliable definition of ‘domain’ has eluded researchers thus far.
    Additionally, the notion of discrete domains stands in contrast to the
    multiplicity of heterogeneous domains that humans navigate, many of which
    overlap. In order to better understand the change and variation of human
    language, we draw on research in domain adaptation and extend the notion of
    discrete domains to the continuous spectrum. We propose representation
    learning-based models that can adapt to continuous domains and detail how these
    can be used to investigate variation in language. To this end, we propose to
    use dialogue modeling as a test bed due to its proximity to language modeling
    and its social component.

    Representation Learning Models for Entity Search

    Shijia E, Yang Xiang, Mohan Zhang
    Comments: submitted to WWW2017
    Subjects: Computation and Language (cs.CL)

    We focus on the problem of learning distributed representations for entity
    search queries, named entities, and their short descriptions. With our
    representation learning models, the entity search query, named entity and
    description can be represented as low-dimensional vectors. Our goal is to
    develop a simple but effective model that can make the distributed
    representations of query related entities similar to the query in the vector
    space. Hence, we propose three kinds of learning strategies, and the difference
    between them mainly lies in how to deal with the relationship between an entity
    and its description. We analyze the strengths and weaknesses of each learning
    strategy and validate our methods on public datasets which contain four kinds
    of named entities, i.e., movies, TV shows, restaurants and celebrities. The
    experimental results indicate that our proposed methods can adapt to different
    types of entity search queries, and outperform the current state-of-the-art
    methods based on keyword matching and vanilla word2vec models. In addition,
    the proposed methods can be trained quickly and easily extended to other
    similar tasks.

    Sentiment Analysis of Twitter Data for Predicting Stock Market Movements

    Venkata Sasank Pagolu, Kamal Nayan Reddy Challa, Ganapati Panda, Babita Majhi
    Comments: 6 pages 4 figures Conference Paper
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Social and Information Networks (cs.SI)

    Predicting stock market movements is a well-known problem of interest.
    Nowadays, social media closely reflects public sentiment and opinion about
    current events. In particular, Twitter has attracted a lot of attention from
    researchers studying public sentiment. Stock market prediction on the basis of
    public sentiment expressed on Twitter has been an intriguing field of
    research. Previous studies have concluded that the aggregate public mood
    collected from Twitter may well be correlated with the Dow Jones Industrial
    Average (DJIA) index. The thesis of this work is to observe how well the
    changes in stock prices of a company, the rises and falls, are correlated with
    the public opinions expressed in tweets about that company. Understanding an
    author’s opinion from a piece of text is the objective of sentiment analysis.
    The present paper employs two different textual representations, Word2vec and
    n-grams, for analyzing public sentiment in tweets. We apply sentiment analysis
    and supervised machine learning principles to tweets extracted from Twitter
    and analyze the correlation between a company’s stock market movements and the
    sentiment expressed in tweets about it. Intuitively, positive news and tweets
    in social media about a company would be expected to encourage people to
    invest in its stock and thereby drive its price up. At the end of the paper,
    it is shown that a strong correlation exists between the rises and falls in
    stock prices and the public sentiment in tweets.


    Distributed, Parallel, and Cluster Computing

    Domain Specific Distributed Search Engine Based on Semantic P2P Networks

    Lican Huang
    Comments: 10 pages, 7 figures , ICNDC2016 conference
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    This paper presents a distributed search engine based on semantic P2P
    networks. Users’ computers join the domains in which the user wants to share
    information; the semantic P2P network is a domain-specific virtual tree
    (VIRGO). Each user’s computer runs a search engine that indexes domain-specific
    information on the local computer or the Internet. All search results can be
    obtained through P2P messages provided by the joined computers. Through
    companies’ efforts, we have implemented a prototype of the distributed search
    engine, which demonstrates easy retrieval of domain-related information
    provided by the joined computers.

    CHOPtrey: contextual online polynomial extrapolation for enhanced multi-core co-simulation of complex systems

    Abir Ben Khaled-El Feki, Laurent Duval, Cyril Faure, Daniel Simon, Mongi Ben Gaid
    Subjects: Systems and Control (cs.SY); Computational Engineering, Finance, and Science (cs.CE); Distributed, Parallel, and Cluster Computing (cs.DC)

    The growing complexity of Cyber-Physical Systems (CPS), together with
    increasingly available parallelism provided by multi-core chips, fosters the
    parallelization of simulation. Simulation speed-ups are expected from
    co-simulation and parallelization based on model splitting into weak-coupled
    sub-models, as for instance in the framework of Functional Mockup Interface
    (FMI). However, slackened synchronization between sub-models and their
    associated solvers running in parallel introduces integration errors, which
    must be kept inside acceptable bounds.

    CHOPtrey denotes a forecasting framework enhancing the performance of complex
    system co-simulation, with a trivalent articulation. First, we consider the
    framework of a Computationally Hasty Online Prediction system (CHOPred). It
    makes it possible to improve the trade-off between integration speed-ups,
    which need large communication steps, and simulation precision, which needs
    frequent updates of the model inputs. Second, smoothed adaptive forward
    prediction improves
    co-simulation accuracy. It is obtained by past-weighted extrapolation based on
    Causal Hopping Oblivious Polynomials (CHOPoly). And third, signal behavior is
    segmented to handle the discontinuities of the exchanged signals: the
    segmentation is performed in a Contextual & Hierarchical Ontology of Patterns
    (CHOPatt).

    Implementation strategies and simulation results demonstrate the framework’s
    ability to adaptively relax data communication constraints beyond
    synchronization points, which appreciably accelerates the simulation. The CHOPtrey
    framework extends the range of applications of standard Lagrange-type methods,
    often deemed unstable. The embedding of predictions in lag-dependent smoothing
    and discontinuity handling demonstrates its practical efficiency.
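
    As a rough illustration of past-weighted polynomial extrapolation over a
    communication step (not the CHOPoly construction itself), one might fit a
    low-order polynomial to the last few exchanged samples, down-weighting older
    ones, and evaluate it one step ahead; the decay factor and order below are
    assumptions.

        import numpy as np

        def extrapolate(t_past, y_past, t_next, order=2, decay=0.7):
            """Predict y(t_next) from past samples, geometrically down-weighting
            older samples (the newest sample gets weight 1)."""
            w = decay ** np.arange(len(t_past) - 1, -1, -1)
            coeffs = np.polyfit(t_past, y_past, deg=order, w=w)
            return np.polyval(coeffs, t_next)

        # Toy exchanged signal sampled at coarse communication steps.
        t = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
        y = np.sin(2 * np.pi * t)
        print(extrapolate(t, y, 0.5), np.sin(2 * np.pi * 0.5))  # prediction vs. truth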

    Performance evaluation of explicit finite difference algorithms with varying amounts of computational and memory intensity

    Satya P. Jammy, Christian T. Jacobs, Neil D. Sandham
    Comments: Author accepted version. Accepted for publication in Journal of Computational Science on 27 October 2016
    Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS); Computational Physics (physics.comp-ph); Fluid Dynamics (physics.flu-dyn)

    Future architectures designed to deliver exascale performance motivate the
    need for novel algorithmic changes in order to fully exploit their
    capabilities. In this paper, the performance of several numerical algorithms,
    characterised by varying degrees of memory and computational intensity, are
    evaluated in the context of finite difference methods for fluid dynamics
    problems. It is shown that, by storing some of the evaluated derivatives as
    single thread- or process-local variables in memory, or recomputing the
    derivatives on-the-fly, a speed-up of ~2 can be obtained compared to
    traditional algorithms that store all derivatives in global arrays.

    Type oriented parallel programming for Exascale

    Nick Brown
    Comments: As presented at the Exascale Applications and Software Conference (EASC), 9th-11th April 2013
    Subjects: Programming Languages (cs.PL); Distributed, Parallel, and Cluster Computing (cs.DC)

    Whilst there have been great advances in HPC hardware and software in recent
    years, the languages and models that we use to program these machines have
    remained much more static. This is not from a lack of effort, but instead by
    virtue of the fact that the foundation that many programming languages are
    built on is not sufficient for the level of expressivity required for parallel
    work. The result is an implicit trade-off between programmability and
    performance which is made worse due to the fact that, whilst many scientific
    users are experts within their own fields, they are not HPC experts.

    Type oriented programming looks to address this by encoding the complexity of
    a language via the type system. Most of the language functionality is contained
    within a loosely coupled type library that can be flexibly used to control many
    aspects such as parallelism. Due to the high level nature of this approach
    there is much information available during compilation which can be used for
    optimisation and, in the absence of type information, the compiler can apply
    sensible default options thus supporting both the expert programmer and novice
    alike.

    We demonstrate that, at no performance or scalability penalty when running on
    up to 8196 cores of a Cray XE6 system, codes written in this type oriented
    manner provide improved programmability. The programmer is able to write
    simple, implicitly parallel HPC code at a high level and then explicitly tune it by
    adding additional type information if required.


    Learning

    Correlated-PCA: Principal Components' Analysis when Data and Noise are Correlated

    Namrata Vaswani, Han Guo
    Comments: To appear in NIPS 2016. Longer version submitted to IEEE Trans. Sig. Proc. is availabe at this http URL arXiv admin note: text overlap with arXiv:1608.04320
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Given a matrix of observed data, Principal Components Analysis (PCA) computes
    a small number of orthogonal directions that contain most of its variability.
    Provably accurate solutions for PCA have been in use for a long time. However,
    to the best of our knowledge, all existing theoretical guarantees for it assume
    that the data and the corrupting noise are mutually independent, or at least
    uncorrelated. This is often valid in practice, but not always. In this paper,
    we study the PCA problem in the setting where the data and noise can be
    correlated. Such noise is often also referred to as “data-dependent noise”. We
    obtain a correctness result for the standard eigenvalue decomposition (EVD)
    based solution to PCA under simple assumptions on the data-noise correlation.
    We also develop and analyze a generalization of EVD, cluster-EVD, that improves
    upon EVD in certain regimes.
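
    For reference, a minimal numpy sketch of the standard EVD-based PCA solution
    mentioned above (with no handling of data-dependent noise, which is the
    paper's contribution); the data and the number of retained directions are toy
    assumptions.

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 20)) @ rng.normal(size=(20, 20))  # toy data, rows = samples
        r = 3                                       # number of principal directions

        Xc = X - X.mean(axis=0)                     # center the data
        C = Xc.T @ Xc / (len(Xc) - 1)               # empirical covariance matrix
        evals, evecs = np.linalg.eigh(C)            # eigenvalues in ascending order
        P = evecs[:, -r:][:, ::-1]                  # top-r principal directions
        print("variance captured:", evals[-r:][::-1])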

    Globally Optimal Training of Generalized Polynomial Neural Networks with Nonlinear Spectral Methods

    Antoine Gautier, Quynh Nguyen, Matthias Hein
    Comments: Long version of NIPS 2016 paper
    Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

    The optimization problem behind neural networks is highly non-convex.
    Training with stochastic gradient descent and variants requires careful
    parameter tuning and provides no guarantee to achieve the global optimum. In
    contrast, we show under quite weak assumptions on the data that a particular
    class of feedforward neural networks can be trained to global optimality with
    a linear convergence rate using our nonlinear spectral method. To our
    knowledge, this is the first practically feasible method that achieves such a
    guarantee.
    While the method can in principle be applied to deep networks, we restrict
    ourselves for simplicity in this paper to one and two hidden layer networks.
    Our experiments confirm that these models are rich enough to achieve good
    performance on a series of real-world datasets.

    Improving Sampling from Generative Autoencoders with Markov Chains

    Kai Arulkumaran, Antonia Creswell, Anil Anthony Bharath
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    We focus on generative autoencoders, such as variational or adversarial
    autoencoders, which jointly learn a generative model alongside an inference
    model. We define generative autoencoders as autoencoders which are trained to
    softly enforce a prior on the latent distribution learned by the model.
    However, the model does not necessarily learn to match the prior. We formulate
    a Markov chain Monte Carlo (MCMC) sampling process, equivalent to iteratively
    encoding and decoding, which allows us to sample from the learned latent
    distribution. Using this we can improve the quality of samples drawn from the
    model, especially when the learned distribution is far from the prior. Using
    MCMC sampling, we also reveal previously unseen differences between generative
    autoencoders trained either with or without the denoising criterion.

    Toward Implicit Sample Noise Modeling: Deviation-driven Matrix Factorization

    Guang-He Lee, Shao-Wen Yang, Shou-De Lin
    Comments: 6 pages + 1 reference page
    Subjects: Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)

    The objective function of a matrix factorization model usually aims to
    minimize the average of a regression error contributed by each element.
    However, given the existence of stochastic noises, the implicit deviations of
    sample data from their true values are almost surely diverse, which makes each
    data point not equally suitable for fitting a model. In this case, simply
    averaging the cost among data in the objective function is not ideal.
    Intuitively, we would like to place more emphasis on the reliable instances
    (i.e., those containing smaller noise) while training a model. Motivated by
    this observation, we derive our formula from a theoretical framework for optimal
    weighting under heteroscedastic noise distribution. Specifically, by modeling
    and learning the deviation of data, we design a novel matrix factorization
    model. Our model has two advantages. First, it jointly learns the deviation and
    conducts dynamic reweighting of instances, allowing the model to converge to a
    better solution. Second, during learning the deviated instances are assigned
    lower weights, which leads to faster convergence since the model does not need
    to overfit the noise. The experiments are conducted on clean recommendation and
    noisy sensor datasets to test the effectiveness of the model in various
    scenarios. The results show that our model outperforms the state-of-the-art
    factorization and deep learning models in both accuracy and efficiency.

    Hierarchical Clustering via Spreading Metrics

    Aurko Roy, Sebastian Pokutta
    Comments: Extended abstract in proceedings of NIPS 2016
    Subjects: Learning (cs.LG)

    We study the cost function for hierarchical clusterings introduced by
    [arXiv:1510.05043] where hierarchies are treated as first-class objects rather
    than deriving their cost from projections into flat clusters. It was also shown
    in [arXiv:1510.05043] that a top-down algorithm returns a hierarchical
    clustering of cost at most \(O\left(\alpha_n \log n\right)\) times the cost of
    the optimal hierarchical clustering, where \(\alpha_n\) is the approximation
    ratio of the Sparsest Cut subroutine used. Thus using the best known
    approximation algorithm for Sparsest Cut due to Arora-Rao-Vazirani, the top
    down algorithm returns a hierarchical clustering of cost at most
    \(O\left(\log^{3/2} n\right)\) times the cost of the optimal solution. We improve
    this by giving an \(O(\log n)\)-approximation algorithm for this problem. Our
    main technical ingredients are a combinatorial characterization of ultrametrics
    induced by this cost function, deriving an Integer Linear Programming (ILP)
    formulation for this family of ultrametrics, and showing how to iteratively
    round an LP relaxation of this formulation by using the idea of sphere
    growing which has been extensively used in the context of graph partitioning.
    We also prove that our algorithm returns an \(O(\log n)\)-approximate
    hierarchical clustering for a generalization of this cost function also studied
    in [arXiv:1510.05043]. Experiments show that the hierarchies found by using the
    ILP formulation as well as our rounding algorithm often have better projections
    into flat clusters than the standard linkage based algorithms. We also give
    constant factor inapproximability results for this problem.
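
    For convenience, the hierarchical-clustering cost function of
    [arXiv:1510.05043] that is being approximated can be stated as follows
    (notation paraphrased from that reference; see the paper for the precise
    definition and the generalization mentioned above):

        \[
          \mathrm{cost}(T) \;=\; \sum_{\{i,j\} \in E} w_{ij}\,
            \bigl|\mathrm{leaves}\bigl(T[i \vee j]\bigr)\bigr| ,
        \]
        where \(T[i \vee j]\) is the subtree of \(T\) rooted at the least common
        ancestor of leaves \(i\) and \(j\), and \(\mathrm{leaves}(\cdot)\) denotes
        its set of leaves.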

    A Conceptual Development of Quench Prediction App build on LSTM and ELQA framework

    Matej Mertik, Maciej Wielgosz, Andrzej Skoczeń
    Subjects: Learning (cs.LG)

    This article presents the development of a web application for quench
    prediction in the TE-MPE-EE group at CERN. The authors describe the ELectrical
    Quality Assurance (ELQA) framework, a platform designed for rapid development
    of web-integrated data analysis applications for the different analyses needed
    during the hardware commissioning of the Large Hadron Collider (LHC). The
    second part of the article describes research carried out on data collected
    from the Quench Detection System using an LSTM recurrent neural network. The
    article discusses and presents conceptual work on implementing a quench
    prediction application for TE-MPE-EE based on ELQA and the quench prediction
    algorithm.

    SOL: A Library for Scalable Online Learning Algorithms

    Yue Wu, Steven C.H. Hoi, Chenghao Liu, Jing Lu, Doyen Sahoo, Nenghai Yu
    Comments: 5 pages
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    SOL is an open-source library for scalable online learning algorithms, and is
    particularly suitable for learning with high-dimensional data. The library
    provides a family of regular and sparse online learning algorithms for
    large-scale binary and multi-class classification tasks with high efficiency,
    scalability, portability, and extensibility. SOL was implemented in C++, and
    provided with a collection of easy-to-use command-line tools, python wrappers
    and library calls for users and developers, as well as comprehensive documents
    for both beginners and advanced users. SOL is not only a practical machine
    learning toolbox, but also a comprehensive experimental platform for online
    learning research. Experiments demonstrate that SOL is highly efficient and
    scalable for large-scale machine learning with high-dimensional data.

    Orthogonal Random Features

    Felix X. Yu, Ananda Theertha Suresh, Krzysztof Choromanski, Daniel Holtmann-Rice, Sanjiv Kumar
    Comments: NIPS 2016
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We present an intriguing discovery related to Random Fourier Features: in
    Gaussian kernel approximation, replacing the random Gaussian matrix by a
    properly scaled random orthogonal matrix significantly decreases kernel
    approximation error. We call this technique Orthogonal Random Features (ORF),
    and provide theoretical and empirical justification for this behavior.
    Motivated by this discovery, we further propose Structured Orthogonal Random
    Features (SORF), which uses a class of structured discrete orthogonal matrices
    to speed up the computation. The method reduces the time cost from
    O(d^2) to O(d log d), where d is the data
    dimensionality, with almost no compromise in kernel approximation quality
    compared to ORF. Experiments on several datasets verify the effectiveness of
    ORF and SORF over the existing methods. We also provide discussions on using
    the same type of discrete orthogonal structure for a broader range of
    applications.
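
    A minimal numpy sketch of the construction described above (assuming the number of
    features D does not exceed the input dimension d; larger D would stack several
    independent blocks): the Gaussian projection matrix of standard random Fourier
    features is replaced by a random orthogonal matrix whose rows are rescaled so their
    norms follow the chi distribution of Gaussian rows.

        import numpy as np

        def gaussian_rff(X, D, sigma=1.0, rng=None):
            # Standard random Fourier features for the Gaussian kernel.
            rng = np.random.default_rng(rng)
            d = X.shape[1]
            W = rng.normal(size=(D, d)) / sigma
            b = rng.uniform(0.0, 2.0 * np.pi, size=D)
            return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

        def orthogonal_rff(X, D, sigma=1.0, rng=None):
            # Orthogonal Random Features: same feature map, but the Gaussian
            # matrix is replaced by a scaled random orthogonal matrix.
            rng = np.random.default_rng(rng)
            d = X.shape[1]
            assert D <= d, "this minimal sketch uses a single orthogonal block"
            Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal matrix
            S = np.sqrt(rng.chisquare(df=d, size=d))       # row norms of a Gaussian matrix
            W = (S[:, None] * Q)[:D] / sigma
            b = rng.uniform(0.0, 2.0 * np.pi, size=D)
            return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)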

    Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes

    Jack W Rae, Jonathan J Hunt, Tim Harley, Ivo Danihelka, Andrew Senior, Greg Wayne, Alex Graves, Timothy P Lillicrap
    Comments: in 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain
    Subjects: Learning (cs.LG)

    Neural networks augmented with external memory have the ability to learn
    algorithmic solutions to complex tasks. These models appear promising for
    applications such as language modeling and machine translation. However, they
    scale poorly in both space and time as the amount of memory grows — limiting
    their applicability to real-world domains. Here, we present an end-to-end
    differentiable memory access scheme, which we call Sparse Access Memory (SAM),
    that retains the representational power of the original approaches whilst
    training efficiently with very large memories. We show that SAM achieves
    asymptotic lower bounds in space and time complexity, and find that an
    implementation runs 1,000 times faster and with 3,000 times less
    physical memory than non-sparse models. SAM learns with comparable data
    efficiency to existing models on a range of synthetic tasks and one-shot
    Omniglot character recognition, and can scale to tasks requiring 100,000s
    of time steps and memories. As well, we show how our approach can be adapted
    for models that maintain temporal associations between memories, as with the
    recently introduced Differentiable Neural Computer.
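
    The core of the sparse access idea is that each read touches only a handful of memory
    slots; a minimal sketch of such a read (illustrative only, not the paper's full scheme,
    which also includes sparse writes and efficient approximate nearest-neighbour indexing):

        import numpy as np

        def sparse_read(memory, query, k=4):
            # Read from only the k most similar memory slots instead of all of them.
            sims = memory @ query                      # dot-product similarities
            idx = np.argpartition(sims, -k)[-k:]       # indices of the top-k slots
            w = np.exp(sims[idx] - sims[idx].max())    # softmax over that subset only
            w /= w.sum()
            return w @ memory[idx], idx, w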

    Homotopy Method for Tensor Principal Component Analysis

    Anima Anandkumar, Yuan Deng, Rong Ge, Hossein Mobahi
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Developing efficient and guaranteed nonconvex algorithms has been an
    important challenge in modern machine learning. Algorithms with good empirical
    performance such as stochastic gradient descent often lack theoretical
    guarantees. In this paper, we analyze the class of homotopy or continuation
    methods for global optimization of nonconvex functions. These methods start
    from an objective function that is efficient to optimize (e.g. convex), and
    progressively modify it to obtain the required objective, and the solutions are
    passed along the homotopy path. For the challenging problem of tensor PCA, we
    prove global convergence of the homotopy method in the “high noise” regime. The
    signal-to-noise requirement for our algorithm is tight in the sense that it
    matches the recovery guarantee for the best degree-4 sum-of-squares algorithm.
    In addition, we prove a phase transition along the homotopy path for tensor
    PCA. This allows us to simplify the homotopy method to a local search algorithm,
    viz., tensor power iterations, with a specific initialization and a noise
    injection procedure, while retaining the theoretical guarantees.
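
    The local-search end point mentioned above is tensor power iteration; a minimal sketch
    for a symmetric third-order tensor follows, with an optional Gaussian noise-injection
    step standing in for the procedure the paper derives (the specific initialization and
    noise schedule are not reproduced here).

        import numpy as np

        def tensor_power_iteration(T, v0, n_iter=100, noise_std=0.0, rng=None):
            # Power iteration v <- T(I, v, v) / ||T(I, v, v)|| for a symmetric
            # third-order tensor, with optional noise injection.
            rng = np.random.default_rng(rng)
            v = v0 / np.linalg.norm(v0)
            for _ in range(n_iter):
                u = np.einsum('ijk,j,k->i', T, v, v)
                if noise_std > 0.0:
                    u = u + noise_std * rng.normal(size=u.shape)
                v = u / np.linalg.norm(u)
            return v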

    Towards a continuous modeling of natural language domains

    Sebastian Ruder, Parsa Ghaffari, John G. Breslin
    Comments: 5 pages, 3 figures, published in Uphill Battles in Language Processing workshop, EMNLP 2016
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Humans continuously adapt their style and language to a variety of domains.
    However, a reliable definition of 'domain' has eluded researchers thus far.
    Additionally, the notion of discrete domains stands in contrast to the
    multiplicity of heterogeneous domains that humans navigate, many of which
    overlap. In order to better understand the change and variation of human
    language, we draw on research in domain adaptation and extend the notion of
    discrete domains to the continuous spectrum. We propose representation
    learning-based models that can adapt to continuous domains and detail how these
    can be used to investigate variation in language. To this end, we propose to
    use dialogue modeling as a test bed due to its proximity to language modeling
    and its social component.

    A framework for adaptive regularization in streaming Lasso models

    Ricardo Pio Monti, Christoforos Anagnostopoulos, Giovanni Montana
    Comments: 18 pages, 4 figures. arXiv admin note: text overlap with arXiv:1511.02187
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Large scale, streaming datasets are ubiquitous in modern machine learning.
    Streaming algorithms must be scalable, amenable to incremental training and
    robust to the presence of non-stationarity. In this work we consider the problem
    of learning ℓ1-regularized linear models in the context of streaming
    data. In particular, the focus of this work is how to select the
    regularization parameter when data arrives sequentially and the underlying
    distribution is non-stationary (implying the choice of optimal regularization
    parameter is itself time-varying). We propose a novel framework through which
    to infer an adaptive regularization parameter. Our approach employs an ℓ1
    penalty constraint where the corresponding sparsity parameter is iteratively
    updated via stochastic gradient descent. This serves to reformulate the choice
    of regularization parameter in a principled framework for online learning and
    allows for the derivation of convergence guarantees in a non-stochastic
    setting. We validate our approach using simulated and real datasets and present
    an application to a neuroimaging dataset.
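
    One way to picture the proposed scheme is an online proximal-gradient lasso in which
    the sparsity parameter is itself nudged by a stochastic (hyper)gradient of the
    one-step-ahead prediction error; the sketch below is illustrative only and does not
    reproduce the authors' exact update rule or convergence conditions.

        import numpy as np

        def soft_threshold(z, t):
            return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

        def streaming_lasso(stream, d, eta=0.05, eta_lam=0.01, lam=0.1):
            # Illustrative only: an online proximal-gradient lasso whose
            # regularisation parameter lam is adjusted by a stochastic
            # (hyper)gradient step on the one-step-ahead squared error.
            beta = np.zeros(d)
            for x, y in stream:
                err = x @ beta - y                   # one-step-ahead prediction error
                lam = max(lam + eta_lam * eta * err * (x @ np.sign(beta)), 0.0)
                beta = soft_threshold(beta - eta * err * x, eta * lam)
                yield beta, lam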

    (f)-Divergence Inequalities via Functional Domination

    Igal Sason, Sergio Verdú
    Comments: A conference paper, 5 pages. To be presented in the 2016 ICSEE International Conference on the Science of Electrical Engineering, Nov. 16–18, Eilat, Israel. See this https URL for the full paper version, published as a journal paper in the IEEE Trans. on Information Theory, vol. 62, no. 11, pp. 5973-6006, November 2016
    Subjects: Information Theory (cs.IT); Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST)

    This paper considers derivation of (f)-divergence inequalities via the
    approach of functional domination. Bounds on an (f)-divergence based on one or
    several other (f)-divergences are introduced, dealing with pairs of probability
    measures defined on arbitrary alphabets. In addition, a variety of bounds are
    shown to hold under boundedness assumptions on the relative information. The
    journal paper, which includes more approaches for the derivation of
    f-divergence inequalities and proofs, is available on the arXiv at
    this https URL, and it has been published in the IEEE Trans.
    on Information Theory, vol. 62, no. 11, pp. 5973-6006, November 2016.

    Missing Data Imputation for Supervised Learning

    Jason Poulos, Rafael Valle
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    This paper compares methods for imputing missing categorical data for
    supervised learning tasks. The ability of researchers to accurately fit a model
    and yield unbiased estimates may be compromised by missing data, which are
    prevalent in survey-based social science research. We experiment on two machine
    learning benchmark datasets with missing categorical data, comparing
    classifiers trained on non-imputed (i.e., one-hot encoded) or imputed data with
    different degrees of missing-data perturbation. The results show imputation
    methods can increase predictive accuracy in the presence of missing-data
    perturbation. Additionally, we find that for imputed models, missing-data
    perturbation can improve prediction accuracy by regularizing the classifier.
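
    A toy sketch of the comparison described above, using made-up data (the dataset,
    split, and classifier are assumptions, not the benchmarks used in the paper): one
    pipeline keeps missingness as its own one-hot category, the other imputes each column
    with its mode before encoding.

        import pandas as pd
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        # Made-up data standing in for a benchmark with missing categorical values.
        df = pd.DataFrame({
            "colour": ["red", "blue", None, "red", "blue", None, "red", "blue"],
            "size":   ["S", None, "M", "L", "M", "S", None, "L"],
            "label":  [1, 0, 1, 1, 0, 0, 1, 0],
        })
        X, y = df.drop(columns="label"), df["label"]

        # (a) keep missingness as its own one-hot category
        X_missing_cat = pd.get_dummies(X, dummy_na=True)
        # (b) impute each column with its mode, then one-hot encode
        X_imputed = pd.get_dummies(X.fillna(X.mode().iloc[0]))

        for name, X_enc in [("missing-as-category", X_missing_cat), ("mode-imputed", X_imputed)]:
            X_tr, X_te, y_tr, y_te = train_test_split(X_enc, y, test_size=0.25,
                                                      random_state=0, stratify=y)
            clf = LogisticRegression().fit(X_tr, y_tr)
            print(name, clf.score(X_te, y_te))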

    Professor Forcing: A New Algorithm for Training Recurrent Networks

    Alex Lamb, Anirudh Goyal, Ying Zhang, Saizheng Zhang, Aaron Courville, Yoshua Bengio
    Comments: NIPS 2016 Accepted Paper
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    The Teacher Forcing algorithm trains recurrent networks by supplying observed
    sequence values as inputs during training and using the network’s own
    one-step-ahead predictions to do multi-step sampling. We introduce the
    Professor Forcing algorithm, which uses adversarial domain adaptation to
    encourage the dynamics of the recurrent network to be the same when training
    the network and when sampling from the network over multiple time steps. We
    apply Professor Forcing to language modeling, vocal synthesis on raw waveforms,
    handwriting generation, and image generation. Empirically we find that
    Professor Forcing acts as a regularizer, improving test likelihood on character
    level Penn Treebank and sequential MNIST. We also find that the model
    qualitatively improves samples, especially when sampling for a large number of
    time steps. This is supported by human evaluation of sample quality. Trade-offs
    between Professor Forcing and Scheduled Sampling are discussed. We produce
    t-SNE visualizations showing that Professor Forcing successfully makes the dynamics
    of the network during training and sampling more similar.

    Operator Variational Inference

    Rajesh Ranganath, Jaan Altosaar, Dustin Tran, David M. Blei
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Computation (stat.CO); Methodology (stat.ME)

    Variational inference is an umbrella term for algorithms which cast Bayesian
    inference as optimization. Classically, variational inference uses the
    Kullback-Leibler divergence to define the optimization. Though this divergence
    has been widely used, the resultant posterior approximation can suffer from
    undesirable statistical properties. To address this, we reexamine variational
    inference from its roots as an optimization problem. We use operators, or
    functions of functions, to design variational objectives. As one example, we
    design a variational objective with a Langevin-Stein operator. We develop a
    black box algorithm, operator variational inference (OPVI), for optimizing any
    operator objective. Importantly, operators enable us to make explicit the
    statistical and computational tradeoffs for variational inference. We can
    characterize different properties of variational objectives, such as objectives
    that admit data subsampling—allowing inference to scale to massive data—as
    well as objectives that admit variational programs—a rich class of posterior
    approximations that does not require a tractable density. We illustrate the
    benefits of OPVI on a mixture model and a generative model of images.

    Cross-Modal Scene Networks

    Yusuf Aytar, Lluis Castrejon, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
    Comments: See more at this http URL arXiv admin note: text overlap with arXiv:1607.07295
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Multimedia (cs.MM)

    People can recognize scenes across many different modalities beyond natural
    images. In this paper, we investigate how to learn cross-modal scene
    representations that transfer across modalities. To study this problem, we
    introduce a new cross-modal scene dataset. While convolutional neural networks
    can categorize scenes well, they also learn an intermediate representation not
    aligned across modalities, which is undesirable for cross-modal transfer
    applications. We present methods to regularize cross-modal convolutional neural
    networks so that they have a shared representation that is agnostic of the
    modality. Our experiments suggest that our scene representation can help
    transfer representations across modalities for retrieval. Moreover, our
    visualizations suggest that units emerge in the shared representation that tend
    to activate on consistent concepts independently of the modality.

    SoundNet: Learning Sound Representations from Unlabeled Video

    Yusuf Aytar, Carl Vondrick, Antonio Torralba
    Comments: NIPS 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Sound (cs.SD)

    We learn rich natural sound representations by capitalizing on large amounts
    of unlabeled sound data collected in the wild. We leverage the natural
    synchronization between vision and sound to learn an acoustic representation
    using two million unlabeled videos. Unlabeled video has the advantage that it
    can be economically acquired at massive scales, yet contains useful signals
    about natural sound. We propose a student-teacher training procedure which
    transfers discriminative visual knowledge from well established visual
    recognition models into the sound modality using unlabeled video as a bridge.
    Our sound representation yields significant performance improvements over the
    state-of-the-art results on standard benchmarks for acoustic scene/object
    classification. Visualizations suggest some high-level semantics automatically
    emerge in the sound network, even though it is trained without ground truth
    labels.
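
    The transfer loss in such student-teacher training is a KL divergence between the
    vision network's class posterior on a video frame and the sound network's output on
    the corresponding audio; a small numpy sketch of that loss (shapes and names are
    illustrative) is:

        import numpy as np

        def distillation_kl(teacher_logits, student_logits):
            # KL(teacher || student), averaged over a batch; used as the transfer
            # loss between the vision network's posterior and the sound network's output.
            t = np.exp(teacher_logits - teacher_logits.max(axis=1, keepdims=True))
            t /= t.sum(axis=1, keepdims=True)
            s = np.exp(student_logits - student_logits.max(axis=1, keepdims=True))
            s /= s.sum(axis=1, keepdims=True)
            return np.mean(np.sum(t * (np.log(t + 1e-12) - np.log(s + 1e-12)), axis=1))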


    Information Theory

    User Cooperation for Enhanced Throughput Fairness in Wireless Powered Communication Networks

    Mingquan Zhong, Suzhi Bi, Xiaohui Lin
    Comments: This paper has been accepted by Springer Wireless Networks. arXiv admin note: text overlap with arXiv:1606.02033
    Subjects: Information Theory (cs.IT)

    This paper studies a novel user cooperation method in a wireless powered
    cooperative communication network (WPCN) in which a pair of distributed
    terminal users first harvest wireless energy broadcasted by one energy node
    (EN) and then use the harvested energy to transmit information to a destination
    node (DN). In particular, the two cooperating users exchange their independent
    information with each other so as to form a virtual antenna array and transmit
    jointly to the DN. By allowing the users to share their harvested energy to
    transmit each other’s information, the proposed method can effectively mitigate
    the inherent user unfairness problem in WPCN, where one user may suffer from
    very low data rate due to poor energy harvesting performance and high energy
    consumption for data transmission. Depending on the availability of channel state
    information at the transmitters, we consider the two users cooperating using
    either coherent or non-coherent data transmissions. In both cases, we derive
    the maximum common throughput achieved by the cooperation schemes through
    optimizing the time allocation on wireless energy transfer, user message
    exchange, and joint information transmissions in a fixed-length time slot. We
    also perform numerical analysis to study the impact of channel conditions on
    the system performance. By comparing with some existing benchmark schemes, our
    results demonstrate the effectiveness of the proposed user cooperation in a
    WPCN under different application scenarios.

    Generalized Common Information: Common Information Extraction and Private Sources Synthesis

    Lei Yu, Houqiang Li, Chang Wen Chen
    Comments: 42 pages. arXiv admin note: text overlap with arXiv:1203.0730 by other authors
    Subjects: Information Theory (cs.IT); Statistics Theory (math.ST)

    In the literature, two different notions of common information were defined by Gács
    and Körner and by Wyner, respectively. In this paper, we generalize and unify
    them, and define a generalized version of common information, the
    information-correlation function, by exploiting maximal correlation as a
    commonness or privacy measure. The Gács-Körner common information and Wyner
    common information are two special and extreme cases of our generalized
    definition. In addition, we study the problems of common information
    extraction and private sources synthesis, and show that the information-correlation
    function is the optimal rate under a given maximal correlation constraint in
    these problems.

    Performance Impact of LOS and NLOS Transmissions in Dense Cellular Networks under Rician Fading

    Amir H. Jafari, Ming Ding, David Lopez-Perez, Jie Zhang
    Comments: 24 pages, 3 figures. Submitted to IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    In this paper, we analyse the performance of dense small cell networks (SCNs).
    We derive analytical expressions for both their coverage probability and their
    area spectral efficiency (ASE) using a path loss model that considers both
    line-of-sight (LOS) and non-LOS (NLOS) components. Due to the close proximity
    of small cell base stations (BSs) and user equipments (UEs) in such dense SCNs,
    we also consider Rician fading as the multi-path fading channel model for both
    the LOS and NLOS transmissions. The Rayleigh fading used in most
    existing works analysing dense SCNs is not accurate enough. Then, we compare
    the performance impact of LOS and NLOS transmissions in dense SCNs under Rician
    fading with that based on Rayleigh fading. The analysis and the simulation
    results show that in dense SCNs where LOS transmissions dominate the
    performance, the impact of Rician fading on the overall system performance is
    minor, and does not help to address the performance losses brought by the
    transition of many interfering signals from NLOS to LOS.

    Generalized I-MMSE for K-User Gaussian Channels

    Samah A. M. Ghanem
    Comments: arXiv admin note: substantial text overlap with arXiv:1504.06884
    Subjects: Information Theory (cs.IT)

    In this paper, we generalize the fundamental relation between the mutual
    information and the minimum mean squared error (MMSE) by Guo, Shamai, and Verdu
    [1] to K-User Gaussian channels. We prove that the derivative of the multiuser
    mutual information with respect to the signal to noise ratio (SNR) is equal to
    the total MMSE plus a covariance term with respect to the cross correlation of
    the multiuser input estimates, the channels and the precoding matrices. We show
    that such a relation is a generalized I-MMSE with one-step lookahead and
    lookback, applied to Successive Interference Cancellation (SIC) in the
    decoding process.
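
    For intuition, the single-user relation being generalized can be checked numerically:
    for a scalar AWGN channel with Gaussian input, I(snr) = 0.5*log(1+snr) and
    mmse(snr) = 1/(1+snr), so dI/dsnr = mmse/2 (in nats). A short numerical check:

        import numpy as np

        # Scalar AWGN channel with Gaussian input (nats): I(snr) = 0.5*log(1+snr),
        # mmse(snr) = 1/(1+snr), and the I-MMSE relation says dI/dsnr = mmse/2.
        snr = np.linspace(0.1, 10.0, 200)
        I = 0.5 * np.log(1.0 + snr)
        mmse = 1.0 / (1.0 + snr)
        dI = np.gradient(I, snr)
        print(np.max(np.abs(dI - mmse / 2)))   # small numerical-differentiation error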

    An Application of Group Theory in Confidential Network Communications

    Juan Antonio Lopez-Ramos, Joachim Rosenthal, Davide Schipani, Reto Schnyder
    Comments: to appear in Mathematical Methods in the Applied Sciences
    Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)

    A new proposal for group key exchange is introduced that proves to be both
    efficient and secure and compares favorably with state-of-the-art protocols.

    Decentralized Power Control for Slotted Spread Spectrum Aloha with Successive Interference Cancellation

    Francisco Lázaro
    Comments: Accepted for publication at the 11th International ITG Conference on Systems, Communications and Coding, SCC 2017
    Subjects: Information Theory (cs.IT)

    In this paper, we study slotted Spread Spectrum Aloha with Successive
    Interference Cancellation at the receiver over a Gaussian channel. We consider
    a decentralized power control setting in which each user chooses its transmit
    power independently at random according to a power distribution with continuous
    support. In this setting, we derive an analytical expression for the expected
    interference power experienced by a user. This allows us to derive analytically
    the power distribution that, during the Successive Interference Cancellation
    process, leads to a constant signal-to-noise-plus-interference ratio for all
    users. We consider both perfect and imperfect interference cancellation.

    Finite-Length Analysis of Frameless ALOHA

    Francisco Lázaro, Čedomir Stefanović
    Comments: Accepted for publication at SCC 2017
    Subjects: Information Theory (cs.IT)

    In this paper we present an exact finite-length analysis of frameless ALOHA,
    obtained through a dynamic programming approach. Monte Carlo
    simulations are performed in order to verify the analysis. Two examples are
    provided that illustrate how the analysis can be used to optimize the
    parameters of frameless ALOHA. To the best of the knowledge of the authors,
    this is the first contribution dealing with an exact finite-length
    characterization of a protocol from the coded slotted ALOHA family of
    protocols.

    Steerable Discrete Cosine Transform

    Giulia Fracastoro, Sophie Marie Fosson, Enrico Magli
    Subjects: Information Theory (cs.IT); Multimedia (cs.MM); Optimization and Control (math.OC)

    In image compression, classical block-based separable transforms tend to be
    inefficient when image blocks contain arbitrarily shaped discontinuities. For
    this reason, transforms incorporating directional information are an appealing
    alternative. In this paper, we propose a new approach to this problem, namely a
    discrete cosine transform (DCT) that can be steered in any chosen direction.
    Such a transform, called the steerable DCT (SDCT), allows pairs of basis vectors to
    be rotated in a flexible way and enables precise matching of directionality in each
    image block, achieving improved coding efficiency. The optimal rotation angles
    for the SDCT can be represented as the solution of a suitable rate-distortion (RD)
    problem. We propose iterative methods to search for such a solution, and we develop a
    fully fledged image encoder to practically compare our techniques with other
    competing transforms. Analytical and numerical results prove that SDCT
    outperforms both DCT and state-of-the-art directional transforms.
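
    A very rough sketch of the idea (not the paper's construction or its RD-driven angle
    selection): since rotating a pair of 2D DCT basis vectors with symmetric frequency
    indices amounts to applying the same rotation to the corresponding pair of transform
    coefficients, a block transform can be steered by Givens-rotating coefficient pairs
    after a standard 2D DCT.

        import numpy as np
        from scipy.fft import dctn

        def steered_block_transform(block, theta):
            # Rough sketch: take a standard 2D DCT, then mix each pair of
            # symmetric-frequency coefficients (i, j) / (j, i) with a Givens
            # rotation by the same angle theta (the actual SDCT chooses angles
            # per pair via a rate-distortion criterion).
            C = dctn(block, norm="ortho")
            n = C.shape[0]
            c, s = np.cos(theta), np.sin(theta)
            for i in range(n):
                for j in range(i + 1, n):
                    a, b = C[i, j], C[j, i]
                    C[i, j], C[j, i] = c * a - s * b, s * a + c * b
            return C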

    Symbol Synchronization for Diffusive Molecular Communication Systems

    Vahid Jamali, Arman Ahmadzadeh, Robert Schober
    Comments: This paper has been submitted for presentation at IEEE International Conference on Communications (ICC) 2017
    Subjects: Information Theory (cs.IT)

    Symbol synchronization refers to the estimation of the start of a symbol
    interval and is needed for reliable detection. In this paper, we develop a
    symbol synchronization framework for molecular communication (MC) systems where
    we consider some practical challenges which have not been addressed in the
    literature yet. In particular, we take into account that in MC systems, the
    transmitter may not be equipped with an internal clock and may not be able to
    emit molecules with a fixed release frequency. Such restrictions hold for
    practical nanotransmitters, e.g. modified cells, where the lengths of the
    symbol intervals may vary due to the inherent randomness in the availability of
    food and energy for molecule generation, the process for molecule production,
    and the release process. To address this issue, we propose to employ two types
    of molecules, one for synchronization and one for data transmission. We derive
    the optimal maximum likelihood (ML) symbol synchronization scheme as a
    performance upper bound. Since ML synchronization entails high complexity, we
    also propose two low-complexity synchronization schemes, namely a peak
    observation-based scheme and a threshold-trigger scheme, which are suitable for
    MC systems with limited computational capabilities. Our simulation results
    reveal the effectiveness of the proposed synchronization schemes and suggest
    that the end-to-end performance of MC systems significantly depends on the
    accuracy of symbol synchronization.

    (f)-Divergence Inequalities via Functional Domination

    Igal Sason, Sergio Verdú
    Comments: A conference paper, 5 pages. To be presented in the 2016 ICSEE International Conference on the Science of Electrical Engineering, Nov. 16–18, Eilat, Israel. See this https URL for the full paper version, published as a journal paper in the IEEE Trans. on Information Theory, vol. 62, no. 11, pp. 5973-6006, November 2016
    Subjects: Information Theory (cs.IT); Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST)

    This paper considers derivation of (f)-divergence inequalities via the
    approach of functional domination. Bounds on an (f)-divergence based on one or
    several other (f)-divergences are introduced, dealing with pairs of probability
    measures defined on arbitrary alphabets. In addition, a variety of bounds are
    shown to hold under boundedness assumptions on the relative information. The
    journal paper, which includes more approaches for the derivation of
    f-divergence inequalities and proofs, is available on the arXiv at
    this https URL, and it has been published in the IEEE Trans.
    on Information Theory, vol. 62, no. 11, pp. 5973-6006, November 2016.

    Broadcast Coded Modulation: Multilevel and Bit-interleaved Construction

    Ahmed Abotabl, Aria Nosratinia
    Subjects: Information Theory (cs.IT)

    The capacity of the AWGN broadcast channel is achieved by superposition
    coding, but superposition of individual coded modulations expands the
    modulation alphabet and distorts its configuration. Coded modulation over a
    broadcast channel subject to a specific channel-input modulation constraint
    remains an important open problem. Some progress has been made in the related
    area of unequal-error protection modulations which can be considered
    single-user broadcast transmission, but it does not approach all points on the
    boundary of the capacity region. This paper studies broadcast coded modulation
    using multilevel coding (MLC) subject to a specific channel input
    constellation. The conditions under which multilevel codes can achieve the
    constellation-constrained capacity of the AWGN broadcast channel are derived.
    For any given constellation, we propose a pragmatic multilevel design technique
    with near constellation-constrained-capacity performance, where the coupling of
    the superposition inner and outer codes is localized to each bit level. It is
    shown that this can be further relaxed to a code coupling on only one bit
    level, with little or no penalty under natural labeling. The rate allocation
    problem between the bit levels of the two users is studied and a pragmatic
    method is proposed, again with near-capacity performance. In further pursuit of
    lower complexity, a hybrid MLC-BICM is proposed, whose performance is shown to
    be very close to the boundary of the constellation-constrained capacity region.
    Simulation results show that good point-to-point LDPC codes produce excellent
    performance in the proposed coded modulation framework.

    Through the Haze: A Non-Convex Approach to Blind Calibration for Linear Random Sensing Models

    Valerio Cambareri, Laurent Jacques
    Comments: 42 pages, 7 figures. A finalised version of this draft is being submitted to Information and Inference: a Journal of the IMA
    Subjects: Information Theory (cs.IT); Optimization and Control (math.OC)

    Computational sensing strategies often suffer from calibration errors in the
    physical implementation of their ideal sensing models. Such uncertainties are
    typically addressed by using multiple, accurately chosen training signals to
    recover the missing information on the sensing model, an approach that can be
    resource-consuming and cumbersome. Conversely, blind calibration does not
    employ any training signal, but corresponds to a bilinear inverse problem whose
    algorithmic solution is an open issue. We here address blind calibration as a
    non-convex problem for linear random sensing models, in which we aim to recover
    an unknown signal from its projections on sub-Gaussian random vectors, each
    subject to an unknown multiplicative factor (or gain). To solve this
    optimisation problem we resort to projected gradient descent starting from a
    suitable, carefully chosen initialisation point. An analysis of this algorithm
    allows us to show that it converges to the global optimum provided a sample
    complexity requirement is met, i.e., relating convergence to the amount of
    information collected during the sensing process. Interestingly, we show that
    this requirement is actually linear (up to log factors) in the number of
    unknowns of the problem. This sample complexity holds both in the absence of
    prior information and when subspace priors are available for both the
    signal and the gains, allowing a further reduction of the number of observations
    required for their provably exact recovery. Moreover, in the presence of noise
    we show how our algorithm yields a solution whose accuracy degrades gracefully
    with the amount of noise affecting the measurements. Finally, we present some
    numerical experiments in an imaging context, for which our algorithm allows for
    a simple solution to blind calibration of the gains in an imaging sensor array.
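
    A simplified picture of the non-convex formulation (not the paper's exact algorithm,
    initialisation, or sensing model): recover a signal x and positive gains g from
    measurements y = g * (A x) by plain gradient descent on the squared residual,
    renormalising g to resolve the inherent scale ambiguity.

        import numpy as np

        def blind_calibration_gd(A, y, n_iter=500, step=0.01):
            # Recover a signal x and positive gains g from y = g * (A @ x) by
            # gradient descent on the squared residual; the mean-one renormalisation
            # of g fixes the scale ambiguity. Simplified illustrative sketch only.
            m, n = A.shape
            x = A.T @ y / m                  # simple initialisation (assumption)
            g = np.ones(m)
            for _ in range(n_iter):
                r = g * (A @ x) - y
                x -= step * (A.T @ (g * r)) / m
                g -= step * ((A @ x) * r) / m
                g = np.maximum(g, 1e-6)
                g *= m / g.sum()
            return x, g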

    Direct-dynamical entanglement-discord relations

    Virginia Feldman, Jonas Maziero, A. Auyuanet
    Comments: 7 pages, 3 figures
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    In this article, by considering Bell-diagonal two-qubit initial states
    submitted to local dynamics generated by the phase damping, bit flip, phase
    flip, bit-phase flip, and depolarizing channels, we report some elegant
    direct-dynamical relations between geometric measures of entanglement and
    discord. The complex scenario appearing already in this simplified case study
    indicates that similarly simple relations are unlikely to be found in more general
    situations.



