    arXiv Paper Daily: Thu, 3 Nov 2016

    Posted by 我爱机器学习 (52ml.net) on 2016-11-03 00:00:00

    Neural and Evolutionary Computing

    Extensions and Limitations of the Neural GPU

    Eric Price, Wojciech Zaremba, Ilya Sutskever
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

    The Neural GPU is a recent model that can learn algorithms such as
    multi-digit binary addition and binary multiplication in a way that generalizes
    to inputs of arbitrary length. We show that there are two simple ways of
    improving the performance of the Neural GPU: by carefully designing a
    curriculum, and by increasing model size. The latter requires careful memory
    management, as a naive implementation of the Neural GPU is memory intensive. We
    find that these techniques increase the set of algorithmic problems that can
    be solved by the Neural GPU: we have been able to learn to perform all the
    arithmetic operations (and generalize to arbitrarily long numbers) when the
    arguments are given in the decimal representation (which, surprisingly, has not
    been possible before). We have also been able to train the Neural GPU to
    evaluate long arithmetic expressions with multiple operands that require
    respecting the precedence order of the operands, although these have succeeded
    only in their binary representation, and not with 100% accuracy.

    In addition, we attempt to gain insight into the Neural GPU by understanding
    its failure modes. We find that Neural GPUs that correctly generalize to
    arbitrarily long numbers still fail to compute the correct answer on
    highly-symmetric, atypical inputs: for example, a Neural GPU that achieves
    near-perfect generalization on decimal multiplication of up to 100-digit long
    numbers can fail on \(000000\dots002 \times 000000\dots002\) while succeeding at
    \(2 \times 2\). These failure modes are reminiscent of adversarial examples.
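
    To make the failure case concrete, the sketch below builds the kind of
    zero-padded decimal operands the abstract describes. The helper name
    `padded_product_case` and the padding width are illustrative assumptions,
    not taken from the paper; the reference answer comes from ordinary integer
    arithmetic, and no Neural GPU is run or modeled here.

```python
from typing import Tuple


def padded_product_case(a: int, b: int, width: int) -> Tuple[str, str, str]:
    """Return zero-padded decimal operands and their exact product.

    Hypothetical helper for illustration: it only constructs a test input.
    """
    if a < 0 or b < 0:
        raise ValueError("operands must be non-negative")
    left = str(a).zfill(width)
    right = str(b).zfill(width)
    # The product of two width-digit numbers has at most 2 * width digits.
    expected = str(a * b).zfill(2 * width)
    return left, right, expected


left, right, expected = padded_product_case(2, 2, width=100)
print(left[:6] + "..." + left[-3:], "x", right[:6] + "..." + right[-3:])
print("exact product:", int(expected))
```

    A model that generalizes correctly should give the same numeric answer
    for the padded operands as for the unpadded pair.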

    Deep counter networks for asynchronous event-based processing

    Jonathan Binas, Giacomo Indiveri, Michael Pfeiffer
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Despite their advantages in terms of computational resources, latency, and
    power consumption, event-based implementations of neural networks have not been
    able to achieve the same performance figures as their equivalent
    state-of-the-art deep network models. We propose counter neurons as minimal
    spiking neuron models which only require addition and comparison operations,
    thus avoiding costly multiplications. We show how inference carried out in deep
    counter networks converges to the same accuracy levels as are achieved with
    state-of-the-art conventional networks. As their event-based style of
    computation leads to reduced latency and sparse updates, counter networks are
    ideally suited for efficient compact and low-power hardware implementation. We
    present theory and training methods for counter networks, and demonstrate on
    the MNIST benchmark that counter networks converge quickly, both in terms of
    time and number of operations required, to state-of-the-art classification
    accuracy.
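
    The abstract does not spell out the update rule of a counter neuron, so
    the following is only a minimal sketch under stated assumptions: an
    integer counter accumulates a signed integer weight for every incoming
    event and emits an output event each time it reaches a threshold. The
    class name, the weights, and the threshold are hypothetical.

```python
class CounterNeuron:
    """Illustrative counter neuron: integer state, add and compare only.

    Assumption (not from the abstract): each input event adds a signed
    integer weight to the counter; an output event is emitted whenever the
    counter reaches the threshold, which is then subtracted.
    """

    def __init__(self, weights, threshold):
        if threshold <= 0:
            raise ValueError("threshold must be positive")
        self.weights = list(weights)
        self.threshold = threshold
        self.counter = 0

    def receive(self, input_index):
        """Process one input event and return the number of output events."""
        self.counter += self.weights[input_index]  # integer addition
        emitted = 0
        while self.counter >= self.threshold:      # integer comparison
            self.counter -= self.threshold         # adding the negated threshold
            emitted += 1
        return emitted


neuron = CounterNeuron(weights=[3, -1, 2], threshold=4)
outputs = [neuron.receive(i) for i in [0, 0, 1, 2, 2]]
print(outputs, neuron.counter)
```

    No multiplication appears anywhere in the update, which is the property
    the abstract highlights.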

    The new hybrid COAW method for solving multi-objective problems

    Zeinab Borhanifar, Elham Shadkam
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

    This article presents the hybrid COAW algorithm, which combines the Cuckoo
    Optimization Algorithm with the simple additive weighting method to solve
    multi-objective problems. The Cuckoo algorithm is an efficient and structured
    method for solving nonlinear continuous problems. The Pareto frontiers
    produced by the proposed COAW algorithm are exact and well dispersed. The
    method finds the Pareto frontiers quickly and correctly identifies their
    beginning and end points. To validate the proposed algorithm, several
    experimental problems were analyzed. The results indicate that the COAW
    algorithm is effective for solving multi-objective problems.
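
    As a rough illustration of how simple additive weighting turns a
    multi-objective problem into a family of single-objective ones, the
    sketch below sweeps the weights and minimizes each weighted sum. A plain
    grid search stands in for the Cuckoo Optimization Algorithm, and the
    two-objective test problem, the function name `saw_front`, and the
    assumption that both objectives share a comparable scale are choices
    made here, not details from the paper.

```python
def saw_front(objectives, candidates, steps=11):
    """Trace an approximate Pareto front by sweeping SAW weights.

    `objectives` is a pair of functions to minimize; `candidates` is the
    finite set of decision values searched (a stand-in for a real optimizer).
    """
    front = []
    for k in range(steps):
        w = k / (steps - 1)
        weights = (w, 1.0 - w)
        # Simple additive weighting: minimize the weighted sum of objectives.
        best = min(
            candidates,
            key=lambda x: sum(wi * f(x) for wi, f in zip(weights, objectives)),
        )
        point = tuple(f(best) for f in objectives)
        if point not in front:
            front.append(point)
    return front


# Hypothetical test problem: f1 pulls x toward 0, f2 pulls x toward 2.
objectives = (lambda x: x * x, lambda x: (x - 2.0) ** 2)
candidates = [2.0 * i / 100 for i in range(101)]
for f1, f2 in saw_front(objectives, candidates):
    print(round(f1, 3), round(f2, 3))
```

    The extreme weights recover the two end points of the frontier, and the
    intermediate weights fill in points between them. A weighted-sum sweep
    like this reaches only the convex part of a Pareto frontier.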

    Deep Neural Networks for HDR imaging

    Kshiteej Sheth
    Comments: 9 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We propose novel methods for solving two tasks using Convolutional Neural
    Networks: first, generating an HDR map of a static scene from differently
    exposed LDR images of the scene captured with conventional cameras; and
    second, finding an optimal tone mapping operator that gives a better score
    on the TMQI metric than existing methods. We quantitatively show the
    performance of our networks and illustrate the cases where our networks
    perform well and where they perform poorly.


    Computer Vision and Pattern Recognition

    Wearable Vision Detection of Environmental Fall Risks using Convolutional Neural Networks

    Mina Nouredanesh, Andrew McCormick, Sunil L. Kukreja, James Tung
    Comments: Accepted paper, the 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2016)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, a method to detect environmental hazards related to a fall
    risk using a mobile vision system is proposed. First-person perspective videos
    are proposed to provide objective evidence on cause and circumstances of
    perturbed balance during activities of daily living, targeted to seniors. A
    classification problem was defined with 12 total classes of potential fall
    risks, including slope changes (e.g., stairs, curbs, ramps) and surfaces (e.g.,
    gravel, grass, concrete). Data was collected using a chest-mounted GoPro
    camera. We developed a convolutional neural network for automatic feature
    extraction, reduction, and classification of frames. Initial results, with a
    mean square error of 8%, are promising.

    Dual Attention Networks for Multimodal Reasoning and Matching

    Hyeonseob Nam, Jung-Woo Ha, Jeonghee Kim
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose Dual Attention Networks (DANs) which jointly leverage visual and
    textual attention mechanisms to capture fine-grained interplay between vision
    and language. DANs attend to specific regions in images and words in text
    through multiple steps and gather essential information from both modalities.
    Based on this framework, we introduce two types of DANs for multimodal
    reasoning and matching, respectively. First, the reasoning model allows visual
    and textual attentions to steer each other during collaborative inference,
    which is useful for tasks such as Visual Question Answering (VQA). Second, the
    matching model exploits the two attention mechanisms to estimate the similarity
    between images and sentences by focusing on their shared semantics. Our
    extensive experiments validate the effectiveness of DANs in combining vision
    and language, achieving the state-of-the-art performance on public benchmarks
    for VQA and image-text matching.

    CRF-CNN: Modeling Structured Information in Human Pose Estimation

    Xiao Chu, Wanli Ouyang, Hongsheng Li, Xiaogang Wang
    Comments: NIPS
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep convolutional neural networks (CNN) have achieved great success. On the
    other hand, modeling structural information has been proved critical in many
    vision problems. It is of great interest to integrate them effectively. In a
    classical neural network, there is no message passing between neurons in the
    same layer. In this paper, we propose a CRF-CNN framework which can
    simultaneously model structural information in both output and hidden feature
    layers in a probabilistic way, and it is applied to human pose estimation. A
    message passing scheme is proposed, so that in various layers each body joint
    receives messages from all the others in an efficient way. Such message passing
    can be implemented with convolution between feature maps in the same layer,
    and it is also integrated with feedforward propagation in neural networks.
    Finally, a neural network implementation of end-to-end learning CRF-CNN is
    provided. Its effectiveness is demonstrated through experiments on two
    benchmark datasets.

    Flood-Filling Networks

    Michał Januszewski, Jeremy Maitin-Shepard, Peter Li, Jörgen Kornfeld, Winfried Denk, Viren Jain
    Comments: 11 pages, 4 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    State-of-the-art image segmentation algorithms generally consist of at least
    two successive and distinct computations: a boundary detection process that
    uses local image information to classify image locations as boundaries between
    objects, followed by a pixel grouping step such as watershed or connected
    components that clusters pixels into segments. Prior work has varied the
    complexity and approach employed in these two steps, including the
    incorporation of multi-layer neural networks to perform boundary prediction,
    and the use of global optimizations during pixel clustering. We propose a
    unified and end-to-end trainable machine learning approach, flood-filling
    networks, in which a recurrent 3d convolutional network directly produces
    individual segments from a raw image. The proposed approach robustly segments
    images with an unknown and variable number of objects as well as highly
    variable object sizes. We demonstrate the approach on a challenging 3d image
    segmentation task, connectomic reconstruction from volume electron microscopy
    data, on which flood-filling neural networks substantially improve accuracy
    over other state-of-the-art methods. The proposed approach can replace complex
    multi-step segmentation pipelines with a single neural network that is learned
    end-to-end.

    Solving Visual Madlibs with Multiple Cues

    Tatiana Tommasi, Arun Mallya, Bryan Plummer, Svetlana Lazebnik, Alexander C. Berg, Tamara L. Berg
    Comments: submitted to IJCV — under review
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents an approach for answering fill-in-the-blank multiple
    choice questions from the Visual Madlibs dataset. Instead of generic and
    commonly used representations trained on the ImageNet dataset, our approach
    employs a combination of networks trained for specialized tasks such as scene
    recognition, person activity classification, and attribute prediction. We also
    present a method for localizing phrases from candidate answers in order to
    provide spatial support for feature extraction. We map each of these features,
    together with candidate answers, to a joint embedding space through normalized
    canonical correlation analysis (CCA). Finally, we solve an optimization problem
    to learn to combine CCA scores from multiple cues to select the best answer.
    Extensive experimental results show a significant improvement over the previous
    state of the art and confirm that answering questions from a wide range of
    types benefits from examining a variety of image cues and carefully choosing
    the spatial support of feature extraction.

    Collaborative Recurrent Autoencoder: Recommend while Learning to Fill in the Blanks

    Hao Wang, Xingjian Shi, Dit-Yan Yeung
    Comments: To appear at NIPS 2016
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Hybrid methods that utilize both content and rating information are commonly
    used in many recommender systems. However, most of them use either handcrafted
    features or the bag-of-words representation as a surrogate for the content
    information but they are neither effective nor natural enough. To address this
    problem, we develop a collaborative recurrent autoencoder (CRAE) which is a
    denoising recurrent autoencoder (DRAE) that models the generation of content
    sequences in the collaborative filtering (CF) setting. The model generalizes
    recent advances in recurrent deep learning from i.i.d. input to non-i.i.d.
    (CF-based) input and provides a new denoising scheme along with a novel
    learnable pooling scheme for the recurrent autoencoder. To do this, we first
    develop a hierarchical Bayesian model for the DRAE and then generalize it to
    the CF setting. The synergy between denoising and CF enables CRAE to make
    accurate recommendations while learning to fill in the blanks in sequences.
    Experiments on real-world datasets from different domains (CiteULike and
    Netflix) show that, by jointly modeling the order-aware generation of sequences
    for the content information and performing CF for the ratings, CRAE is able to
    significantly outperform the state of the art on both the recommendation task
    based on ratings and the sequence generation task based on content information.

    Natural-Parameter Networks: A Class of Probabilistic Neural Networks

    Hao Wang, Xingjian Shi, Dit-Yan Yeung
    Comments: To appear at NIPS 2016
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Neural networks (NN) have achieved state-of-the-art performance in various
    applications. Unfortunately, in applications where training data is
    insufficient, they are often prone to overfitting. One effective way to
    alleviate this problem is to exploit the Bayesian approach by using Bayesian
    neural networks (BNN). Another shortcoming of NN is the lack of flexibility to
    customize different distributions for the weights and neurons according to the
    data, as is often done in probabilistic graphical models. To address these
    problems, we propose a class of probabilistic neural networks, dubbed
    natural-parameter networks (NPN), as a novel and lightweight Bayesian treatment
    of NN. NPN allows the usage of arbitrary exponential-family distributions to
    model the weights and neurons. Different from traditional NN and BNN, NPN takes
    distributions as input and goes through layers of transformation before
    producing distributions to match the target output distributions. As a Bayesian
    treatment, efficient backpropagation (BP) is performed to learn the natural
    parameters for the distributions over both the weights and neurons. The output
    distributions of each layer, as byproducts, may be used as second-order
    representations for the associated tasks such as link prediction. Experiments
    on real-world datasets show that NPN can achieve state-of-the-art performance.


    Artificial Intelligence

    A Framework for Searching for General Artificial Intelligence

    Marek Rosa, Jan Feyereisl, The GoodAI Collective
    Subjects: Artificial Intelligence (cs.AI)

    There is a significant lack of unified approaches to building generally
    intelligent machines. The majority of current artificial intelligence research
    operates within a very narrow field of focus, frequently without considering
    the importance of the ‘big picture’. In this document, we seek to describe and
    unify principles that guide the basis of our development of general artificial
    intelligence. These principles revolve around the idea that intelligence is a
    tool for searching for general solutions to problems. We define intelligence as
    the ability to acquire skills that narrow this search, diversify it and help
    steer it to more promising areas. We also provide suggestions for studying,
    measuring, and testing the various skills and abilities that a human-level
    intelligent machine needs to acquire. The document aims to be both
    implementation agnostic, and to provide an analytic, systematic, and scalable
    way to generate hypotheses that we believe are needed to meet the necessary
    conditions in the search for general artificial intelligence. We believe that
    such a framework is an important stepping stone for bringing together
    definitions, highlighting open problems, connecting researchers willing to
    collaborate, and for unifying the arguably most significant search of this
    century.

    Strong Neutrosophic Graphs and Subgraph Topological Subspaces

    W. B. Vasantha Kandasamy, Ilanthenral K, Florentin Smarandache
    Comments: 226 pages, many graphs, Europa Belgique, 2016
    Subjects: Artificial Intelligence (cs.AI)

    In this book, the authors introduce for the first time the notion of strong
    neutrosophic graphs. They are very different from the usual graphs and
    neutrosophic graphs. Using these new structures, special subgraph topological
    spaces are defined. Further, special lattice graphs of subgraphs of these
    graphs are defined and described. Several interesting properties using
    subgraphs of a strong neutrosophic graph are obtained. Several open
    conjectures are proposed. This new class of strong neutrosophic graphs will
    certainly find applications in Neutrosophic Cognitive Maps (NCM),
    Neutrosophic Relational Maps (NRM) and Neutrosophic Relational Equations
    (NRE) with appropriate modifications.

    Inferring Coupling of Distributed Dynamical Systems via Transfer Entropy

    Oliver M. Cliff, Mikhail Prokopenko, Robert Fitch
    Subjects: Artificial Intelligence (cs.AI)

    In this work, we are interested in structure learning for a set of spatially
    distributed dynamical systems, where individual subsystems are coupled via
    latent variables and observed through a filter. We represent this model as a
    directed acyclic graph (DAG) that characterises the unidirectional coupling
    between subsystems. Standard approaches to structure learning are not
    applicable in this framework due to the hidden variables; however, we can
    exploit the properties of certain dynamical systems to formulate exact methods
    based on state space reconstruction. We approach the problem by using
    reconstruction theorems to analytically derive a tractable expression for the
    KL-divergence of a candidate DAG from the observed dataset. We show this
    measure can be decomposed as a function of two information-theoretic measures,
    transfer entropy and stochastic interaction. We then present two mathematically
    robust scoring functions based on transfer entropy and statistical independence
    tests. These results support the previously held conjecture that transfer
    entropy can be used to infer effective connectivity in complex networks.
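
    For readers unfamiliar with transfer entropy, here is a minimal plug-in
    estimator for discrete time series. It is a sketch under simplifying
    assumptions that are not from the paper: history length 1 for both
    series, symbols counted directly, and no state space reconstruction. The
    function name and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in estimate of transfer entropy source -> target, in bits.

    Uses history length 1 for both series (an illustrative simplification).
    """
    triples = Counter()   # (target[t+1], target[t], source[t])
    pairs_ts = Counter()  # (target[t], source[t])
    pairs_tt = Counter()  # (target[t+1], target[t])
    singles = Counter()   # target[t]
    n = len(target) - 1
    for t in range(n):
        triples[(target[t + 1], target[t], source[t])] += 1
        pairs_ts[(target[t], source[t])] += 1
        pairs_tt[(target[t + 1], target[t])] += 1
        singles[target[t]] += 1
    te = 0.0
    for (x_next, x_now, y_now), count in triples.items():
        p_joint = count / n
        p_given_both = count / pairs_ts[(x_now, y_now)]
        p_given_own = pairs_tt[(x_next, x_now)] / singles[x_now]
        te += p_joint * math.log2(p_given_both / p_given_own)
    return te


rng = random.Random(0)
y = [rng.randint(0, 1) for _ in range(5000)]
x = [0] + y[:-1]  # x copies y with a one-step delay, so y drives x
print("y -> x:", transfer_entropy(y, x))  # expected to be close to 1 bit
print("x -> y:", transfer_entropy(x, y))  # expected to be close to 0 bits
```

    The asymmetry between the two directions is what makes the measure
    usable as a signal of directed coupling.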

    An application of incomplete pairwise comparison matrices for ranking top tennis players

    Sándor Bozóki, László Csató, József Temesi
    Comments: 14 pages, 2 figures
    Journal-ref: European Journal of Operational Research (2016). 248(1): 211-218
    Subjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Applications (stat.AP)

    Pairwise comparison is an important tool in multi-attribute decision making.
    Pairwise comparison matrices (PCM) have been applied for ranking criteria and
    for scoring alternatives according to a given criterion. Our paper presents a
    special application of incomplete PCMs: ranking of professional tennis players
    based on their results against each other. The selected 25 players have been on
    the top of the ATP rankings for a shorter or longer period in the last 40
    years. Some of them have never met on the court. One of the aims of the paper
    is to provide a ranking of the selected players; however, the analysis of
    incomplete pairwise comparison matrices is also a focus. The eigenvector
    method and the logarithmic least squares method were used to calculate weights
    from incomplete PCMs. In our results the top three players of four decades were
    Nadal, Federer and Sampras. Some questions have been raised on the properties
    of incomplete PCMs and remain open for further investigation.
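
    The logarithmic least squares method mentioned above can be sketched for
    an incomplete matrix as follows. This is an illustrative implementation,
    not the authors' code: it minimizes the squared error between log a_ij
    and log w_i - log w_j over the known pairs only, using Gauss-Seidel
    sweeps on the resulting normal equations. The function name, the sweep
    count, and the four-player data are hypothetical.

```python
import math


def llsm_weights(known, n, sweeps=500):
    """Logarithmic least squares weights for an incomplete PCM.

    `known` maps (i, j) to a_ij, meaning item i is a_ij times as strong as j.
    Missing pairs are simply absent; the comparison graph must be connected.
    """
    log_ratio = {}
    for (i, j), a in known.items():
        log_ratio[(i, j)] = math.log(a)
        log_ratio[(j, i)] = -math.log(a)  # reciprocity: a_ji = 1 / a_ij
    neighbours = {
        i: [j for j in range(n) if (i, j) in log_ratio] for i in range(n)
    }
    y = [0.0] * n  # y_i = log w_i
    for _ in range(sweeps):
        # Gauss-Seidel sweep on the normal equations:
        # deg(i) * y_i = sum over known j of (log a_ij + y_j).
        for i in range(n):
            total = sum(log_ratio[(i, j)] + y[j] for j in neighbours[i])
            y[i] = total / len(neighbours[i])
    weights = [math.exp(v) for v in y]
    norm = sum(weights)
    return [w / norm for w in weights]


# Four hypothetical players; players 0 and 3 have never met.
known = {(0, 1): 2.0, (1, 2): 2.0, (2, 3): 2.0, (0, 2): 4.0, (1, 3): 4.0}
print(llsm_weights(known, n=4))
```

    Because the known comparisons in this toy example are mutually
    consistent, the recovered weights are proportional to 8 : 4 : 2 : 1 even
    though one pair is missing.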

    TorchCraft: a Library for Machine Learning Research on Real-Time Strategy Games

    Gabriel Synnaeve, Nantas Nardelli, Alex Auvolat, Soumith Chintala, Timothée Lacroix, Zeming Lin, Florian Richoux, Nicolas Usunier
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    We present TorchCraft, an open-source library that enables deep learning
    research on Real-Time Strategy (RTS) games such as StarCraft: Brood War, by
    making it easier to control these games from a machine learning framework, here
    Torch. This white paper argues for using RTS games as a benchmark for AI
    research, and describes the design and components of TorchCraft.

    Bots as Virtual Confederates: Design and Ethics

    Peter M Krafft, Michael Macy, Alex Pentland
    Comments: Forthcoming in CSCW 2017
    Journal-ref: The 20th ACM Conference on Computer-Supported Cooperative Work and
    Social Computing (CSCW) (2016)
    Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)

    The use of bots as virtual confederates in online field experiments holds
    extreme promise as a new methodological tool in computational social science.
    However, this potential tool comes with inherent ethical challenges. Informed
    consent can be difficult to obtain in many cases, and the use of confederates
    necessarily implies the use of deception. In this work we outline a design
    space for bots as virtual confederates, and we propose a set of guidelines for
    meeting the status quo for ethical experimentation. We draw upon examples from
    prior work in the CSCW community and the broader social science literature for
    illustration. While a handful of prior researchers have used bots in online
    experimentation, our work is meant to inspire future work in this area and
    raise awareness of the associated ethical issues.


    Information Retrieval

    A bioinformatics system for searching Co-Occurrence based on Co-Operational Formation with Advanced Method (COCOFAM)

    Junseok Park, Gwangmin Kim, Dongjin Jang, Sungji Choo, Sunghwa Bae, Doheon Lee
    Comments: 5 pages, 4 figures
    Subjects: Information Retrieval (cs.IR); Distributed, Parallel, and Cluster Computing (cs.DC)

    Literature analysis is a key step in obtaining background information in
    biomedical research. However, it is difficult for researchers to efficiently
    obtain knowledge relevant to their interests because of the massive amount
    of published biomedical literature. Therefore, efficient and systematic
    search strategies are required that allow ready access to this literature.
    In this paper, we propose a novel search system, named Co-Occurrence based
    on Co-Operational Formation with Advanced Method (COCOFAM), which is
    suitable for large-scale literature analysis. COCOFAM integrates Spark for
    local clusters with a global job scheduler to gather crowdsourced
    co-occurrence data on global clusters. It will allow users to obtain
    information relevant to their interests from the substantial amount of
    literature.

    Online bagging for recommendation with incremental matrix factorization

    João Vinagre, Alípio Mário Jorge, João Gama
    Comments: Presented at STREAMEVOLV 2016, held in conjunction with ECML/PKDD 2016, Riva del Garda, Italy, September 23rd, 2016
    Subjects: Information Retrieval (cs.IR)

    Online recommender systems often deal with continuous, potentially fast and
    unbounded flows of data. Ensemble methods for recommender systems have been
    used in the past in batch algorithms; however, they have never been studied with
    incremental algorithms that are capable of processing those data streams on
    the fly. We propose online bagging, using an incremental matrix factorization
    algorithm for positive-only data streams. Using prequential evaluation, we show
    that bagging is able to improve accuracy more than 35% over the baseline with
    small computational overhead.
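
    The abstract does not say which variant of online bagging is used. The
    sketch below assumes the common Poisson(1) formulation, in which every
    base learner sees each incoming example k times with k drawn from a
    Poisson distribution with mean 1. The `ItemCounter` class is a toy
    stand-in for an incremental matrix factorization model, and all names
    here are hypothetical.

```python
import math
import random


def poisson1(rng):
    """Sample from Poisson(1) with Knuth's multiplication method."""
    limit = math.exp(-1.0)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


class ItemCounter:
    """Toy incremental learner standing in for incremental matrix factorization."""

    def __init__(self):
        self.counts = {}

    def update(self, example):
        _user, item = example  # positive-only feedback: (user, item) pairs
        self.counts[item] = self.counts.get(item, 0) + 1

    def predict(self, item):
        return self.counts.get(item, 0)


class OnlineBagging:
    """Online bagging wrapper around incremental base learners."""

    def __init__(self, make_model, n_models, seed=0):
        self.rng = random.Random(seed)
        self.models = [make_model() for _ in range(n_models)]

    def update(self, example):
        # Each base model sees the example k ~ Poisson(1) times, which
        # approximates bootstrap resampling on an unbounded stream.
        for model in self.models:
            for _ in range(poisson1(self.rng)):
                model.update(example)

    def predict(self, query):
        # Average the scores of the base models.
        return sum(m.predict(query) for m in self.models) / len(self.models)


ensemble = OnlineBagging(ItemCounter, n_models=10)
stream = [("u1", "a")] * 200 + [("u2", "b")] * 50
for example in stream:
    ensemble.update(example)
print(ensemble.predict("a"), ensemble.predict("b"))
```

    Each example is processed once as it arrives and then discarded, so the
    ensemble never needs to store the stream.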

    And the Winner is …: Bayesian Twitter-based Prediction on 2016 U.S. Presidential Election

    Elvyna Tunggawan, Yustinus Eko Soelistio
    Comments: This is the non-final version of the paper. The final version is published in the IC3INA 2016 Conference (3-5 Oct. 2016). All citations should be directed to the final version
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    This paper describes a Naive-Bayesian predictive model for the 2016 U.S.
    Presidential Election based on Twitter data. We use 33,708 tweets gathered
    from December 16, 2015 to February 29, 2016. We introduce a simpler data
    preprocessing method to label the data and train the model. The model achieves
    95.8% accuracy on 10-fold cross validation and predicts Ted Cruz and Bernie
    Sanders as the Republican and Democratic nominees, respectively. It achieves
    results comparable to those of competing methods.

    The Deep Journey from Content to Collaborative Filtering

    Oren Barkan, Noam Koenigstein, Eylon Yogev
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Learning (cs.LG)

    In Recommender Systems research, algorithms are often characterized as either
    Collaborative Filtering (CF) or Content Based (CB). CF algorithms are trained
    using a dataset of user explicit or implicit preferences while CB algorithms
    are typically based on item profiles. These approaches harness very different
    data sources; hence, the resulting recommended items are generally also very
    different. This paper presents a novel model that serves as a bridge from items'
    content into their CF representations. We introduce a multiple input deep
    regression model to predict the CF latent embedding vectors of items based on
    their textual description and metadata. We showcase the effectiveness of the
    proposed model by predicting the CF vectors of movies and apps based on their
    textual descriptions. Finally, we show that the model can be further improved
    by incorporating metadata such as the movie release year and tags which
    contribute to a higher accuracy.


    Computation and Language

    Fuzzy paraphrases in learning word representations with a corpus and a lexicon

    Yuanzhi Ke, Masafumi Hagiwara
    Comments: 10 pages, 1 figure, 5 tables. Under review as a conference paper at ICLR 2017
    Subjects: Computation and Language (cs.CL)

    We identify a pitfall that is not carefully addressed in previous work
    using lexicons or ontologies to train or improve distributed word
    representations: for polysemous words and utterances that change meaning in
    different contexts, their paraphrases or related entities in a lexicon or an
    ontology are unreliable and sometimes deteriorate the learning of word
    representations. Thus, we propose an approach to address the problem that
    considers each paraphrase of a word in a lexicon not fully a paraphrase, but a
    fuzzy member (i.e., fuzzy paraphrase) in the paraphrase set whose degree of
    truth (i.e., membership) depends on the contexts. Then we propose an efficient
    method to use the fuzzy paraphrases to learn word embeddings. We approximately
    estimate the local membership of paraphrases, and train word embeddings using a
    lexicon jointly by replacing the words in the contexts with their paraphrases
    randomly subject to the membership of each paraphrase. The experimental results
    show that our method is efficient, overcomes the weaknesses of previous
    related work in extracting semantic information, and outperforms previous
    work on learning word representations using lexicons.
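
    The replacement step described in this abstract (swapping context words for their paraphrases at random, subject to each paraphrase's membership) might look like the sketch below. It is only an illustration under assumptions: the abstract does not say how memberships are estimated or how ties between paraphrases are resolved, so `fuzzy_replace`, the lexicon layout, and the membership values are all hypothetical.

```python
import random


def fuzzy_replace(context, lexicon, rng):
    """Replace each word by one of its paraphrases with probability
    equal to that paraphrase's membership degree (assumed in [0, 1]).

    lexicon maps a word to a list of (paraphrase, membership) pairs.
    """
    replaced = []
    for word in context:
        choice = word
        for paraphrase, membership in lexicon.get(word, []):
            # A paraphrase with higher membership is more likely to be used.
            if rng.random() < membership:
                choice = paraphrase
                break
        replaced.append(choice)
    return replaced


# Hypothetical lexicon: "bank" is polysemous, so its paraphrases are only
# partial members of the paraphrase set.
lexicon = {"bank": [("lender", 0.6), ("shore", 0.2)]}
rng = random.Random(0)
augmented = fuzzy_replace(["the", "bank", "approved", "it"], lexicon, rng)
```

    The augmented contexts would then be fed to an ordinary word-embedding trainer alongside the original ones.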

    Ordinal Common-sense Inference

    Sheng Zhang, Rachel Rudinger, Kevin Duh, Benjamin Van Durme
    Subjects: Computation and Language (cs.CL)

    Humans have the capacity to draw common-sense inferences from natural
    language: various things that are likely but not certain to hold based on
    established discourse, and are rarely stated explicitly. We propose an
    evaluation of automated common-sense inference based on an extension of
    recognizing textual entailment: predicting ordinal human responses of
    subjective likelihood of an inference holding in a given context. We describe a
    framework for extracting common-sense knowledge from corpora, which is then used
    to construct a dataset for this ordinal entailment task, which we then use to
    train and evaluate a sequence to sequence neural network model. Further, we
    annotate subsets of previously established datasets via our ordinal annotation
    protocol in order to then analyze the distinctions between these and what we
    have constructed.

    The Intelligent Voice 2016 Speaker Recognition System

    Abbas Khosravani, Cornelius Glackin, Nazim Dugan, Gérard Chollet, Nigel Cannings
    Comments: 7 pages, 3 figures, NIST SRE 2016 Workshop
    Subjects: Computation and Language (cs.CL); Sound (cs.SD); Machine Learning (stat.ML)

    This paper presents the Intelligent Voice (IV) system submitted to the NIST
    2016 Speaker Recognition Evaluation (SRE). The primary emphasis of SRE this
    year was on developing speaker recognition technology which is robust for novel
    languages that are much more heterogeneous than those used in the current
    state-of-the-art, using significantly less training data, that does not contain
    meta-data from those languages. The system is based on the state-of-the-art
    i-vector/PLDA which is developed on the fixed training condition, and the
    results are reported on the protocol defined on the development set of the
    challenge.

    Detecting Context Dependent Messages in a Conversational Environment

    Chaozhuo Li, Yu Wu, Wei Wu, Chen Xing, Zhoujun Li, Ming Zhou
    Subjects: Computation and Language (cs.CL)

    While automatic response generation for building chatbot systems has drawn a
    lot of attention recently, there is limited understanding on when we need to
    consider the linguistic context of an input text in the generation process. The
    task is challenging, as messages in a conversational environment are short and
    informal, and evidence that can indicate a message is context dependent is
    scarce. After a study of social conversation data crawled from the web, we
    observed that some characteristics estimated from the responses of messages are
    discriminative for identifying context dependent messages. With the
    characteristics as weak supervision, we propose using a Long Short Term Memory
    (LSTM) network to learn a classifier. Our method carries out text
    representation and classifier learning in a unified framework. Experimental
    results show that the proposed method can significantly outperform baseline
    methods on accuracy of classification.

    Towards Sub-Word Level Compositions for Sentiment Analysis of Hindi-English Code Mixed Text

    Ameya Prabhu, Aditya Joshi, Manish Shrivastava, Vasudeva Varma
    Comments: Accepted paper at COLING 2016
    Subjects: Computation and Language (cs.CL)

    Sentiment analysis (SA) using code-mixed data from social media has several
    applications in opinion mining ranging from customer satisfaction to social
    campaign analysis in multilingual societies. Advances in this area are impeded
    by the lack of a suitable annotated dataset. We introduce a Hindi-English
    (Hi-En) code-mixed dataset for sentiment analysis and perform empirical
    analysis comparing the suitability and performance of various state-of-the-art
    SA methods in social media.

    In this paper, we introduce learning sub-word level representations in LSTM
    (Subword-LSTM) architecture instead of character-level or word-level
    representations. This linguistic prior in our architecture enables us to learn
    the information about sentiment value of important morphemes. This also seems
    to work well in highly noisy text containing misspellings, as shown in our
    experiments and demonstrated in the morpheme-level feature maps learned by our
    model. Also, we hypothesize that encoding this linguistic prior in the
    Subword-LSTM architecture leads to the superior performance. Our system attains
    accuracy 4-5% greater than traditional approaches on our dataset, and also
    outperforms the available system for sentiment analysis in Hi-En code-mixed
    text by 18%.

    Structure vs. Language: Investigating the Multi-factors of Asymmetric Opinions on Online Social Interrelationship with a Case Study

    Bo Wang, Yingjun Sun, Yuan Wang
    Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL)

    Though current research often studies the properties of online social
    relationships from an objective view, we also need to understand individuals’
    subjective opinions on their interrelationships in social computing studies.
    Inspired by the theories from sociolinguistics, the latest work indicates that
    interactive language can reveal individuals’ asymmetric opinions on their
    interrelationship. In this work, in order to explain the opinions’ asymmetry on
    interrelationship with more latent factors, we extend the investigation from
    single relationship to the structural context in online social network. We
    analyze the correlation between interactive language features and the
    structural context of interrelationships. The structural contexts of vertices,
    edges and triangles in the social network are considered. With statistical
    analysis on the Enron email dataset, we find that individuals’ opinions (measured by
    interactive language features) on their interrelationship are related to some
    of their important structural context in social network. This result can help
    us to understand and measure the individuals’ opinions on their
    interrelationship with more intrinsic information.

    Measuring Asymmetric Opinions on Online Social Interrelationship by Synthetizing the Interactive Language and Social Network Features

    Bo Wang, Yanshu Yu, Yuan Wang
    Comments: arXiv admin note: text overlap with arXiv:1409.2450 by other authors
    Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL)

    Instead of studying the properties of social relationship from an objective
    view, in this paper, we focus on individuals’ subjective and asymmetric
    opinions on their interrelationships. Inspired by the theories from
    sociolinguistics, we investigate two individuals’ opinions on their
    interrelationship with their interactive language features. Eliminating the
    difference of personal language style, we clarify that the asymmetry of
    interactive language feature values can indicate individuals’ asymmetric
    opinions on their interrelationship. We also discuss how the degree of
    opinions’ asymmetry is related to the individuals’ personality traits.
    Furthermore, to measure the individuals’ asymmetric opinions on
    interrelationship concretely, we develop a novel model synthetizing interactive
    language and social network features. The experimental results with the Enron
    email dataset provide multiple pieces of evidence of the asymmetric opinions on
    interrelationship, and also verify the effectiveness of the proposed model in
    measuring the degree of opinions’ asymmetry.

    Collaborative Recurrent Autoencoder: Recommend while Learning to Fill in the Blanks

    Hao Wang, Xingjian Shi, Dit-Yan Yeung
    Comments: To appear at NIPS 2016
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Hybrid methods that utilize both content and rating information are commonly
    used in many recommender systems. However, most of them use either handcrafted
    features or the bag-of-words representation as a surrogate for the content
    information but they are neither effective nor natural enough. To address this
    problem, we develop a collaborative recurrent autoencoder (CRAE) which is a
    denoising recurrent autoencoder (DRAE) that models the generation of content
    sequences in the collaborative filtering (CF) setting. The model generalizes
    recent advances in recurrent deep learning from i.i.d. input to non-i.i.d.
    (CF-based) input and provides a new denoising scheme along with a novel
    learnable pooling scheme for the recurrent autoencoder. To do this, we first
    develop a hierarchical Bayesian model for the DRAE and then generalize it to
    the CF setting. The synergy between denoising and CF enables CRAE to make
    accurate recommendations while learning to fill in the blanks in sequences.
    Experiments on real-world datasets from different domains (CiteULike and
    Netflix) show that, by jointly modeling the order-aware generation of sequences
    for the content information and performing CF for the ratings, CRAE is able to
    significantly outperform the state of the art on both the recommendation task
    based on ratings and the sequence generation task based on content information.

    Natural-Parameter Networks: A Class of Probabilistic Neural Networks

    Hao Wang, Xingjian Shi, Dit-Yan Yeung
    Comments: To appear at NIPS 2016
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Neural networks (NN) have achieved state-of-the-art performance in various
    applications. Unfortunately in applications where training data is
    insufficient, they are often prone to overfitting. One effective way to
    alleviate this problem is to exploit the Bayesian approach by using Bayesian
    neural networks (BNN). Another shortcoming of NN is the lack of flexibility to
    customize different distributions for the weights and neurons according to the
    data, as is often done in probabilistic graphical models. To address these
    problems, we propose a class of probabilistic neural networks, dubbed
    natural-parameter networks (NPN), as a novel and lightweight Bayesian treatment
    of NN. NPN allows the usage of arbitrary exponential-family distributions to
    model the weights and neurons. Different from traditional NN and BNN, NPN takes
    distributions as input and goes through layers of transformation before
    producing distributions to match the target output distributions. As a Bayesian
    treatment, efficient backpropagation (BP) is performed to learn the natural
    parameters for the distributions over both the weights and neurons. The output
    distributions of each layer, as byproducts, may be used as second-order
    representations for the associated tasks such as link prediction. Experiments
    on real-world datasets show that NPN can achieve state-of-the-art performance.

    And the Winner is …: Bayesian Twitter-based Prediction on 2016 U.S. Presidential Election

    Elvyna Tunggawan, Yustinus Eko Soelistio
    Comments: This is the non-final version of the paper. The final version is published in the IC3INA 2016 Conference (3-5 Oct. 2016, this http URL). All citations should be directed to the final version
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    This paper describes a Naive-Bayesian predictive model for the 2016 U.S.
    Presidential Election based on Twitter data. We use 33,708 tweets gathered
    from December 16, 2015 to February 29, 2016. We introduce a simpler data
    preprocessing method to label the data and train the model. The model achieves
    95.8% accuracy on 10-fold cross validation and predicts Ted Cruz and Bernie
    Sanders as the Republican and Democratic nominees, respectively. It achieves
    results comparable to those of competing methods.

    The Deep Journey from Content to Collaborative Filtering

    Oren Barkan, Noam Koenigstein, Eylon Yogev
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Learning (cs.LG)

    In Recommender Systems research, algorithms are often characterized as either
    Collaborative Filtering (CF) or Content Based (CB). CF algorithms are trained
    using a dataset of user explicit or implicit preferences while CB algorithms
    are typically based on item profiles. These approaches harness very different
    data sources; hence, the resulting recommended items are generally also very
    different. This paper presents a novel model that serves as a bridge from items'
    content into their CF representations. We introduce a multiple input deep
    regression model to predict the CF latent embedding vectors of items based on
    their textual description and metadata. We showcase the effectiveness of the
    proposed model by predicting the CF vectors of movies and apps based on their
    textual descriptions. Finally, we show that the model can be further improved
    by incorporating metadata such as the movie release year and tags which
    contribute to a higher accuracy.


    Distributed, Parallel, and Cluster Computing

    A Balanced Parallel Distributed Sorting Implemented with PGX.D

    Zahra Khatami, Sungpack Hong, Jinsu Lee, Siegfried Depner, Hassan Chafi
    Comments: 8 pages, 12 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Sorting has been one of the most challenging and most studied problems in
    scientific research. Although many techniques and algorithms have been
    proposed for efficient parallel sorting, achieving the desired performance
    on a variety of architectures with a large number of processors is still a
    challenging issue. Maximizing the
    parallelism level in the application can be achieved by minimizing the overhead
    due to load imbalance and waiting time due to the memory latencies. In this
    paper, we present a distributed sorting implemented in PGX.D, a fast
    distributed graph processing system, which outperforms Spark distributed
    sorting by around 2x-3x by hiding communication latencies and minimizing
    unnecessary overheads. Furthermore, we show that the proposed PGX.D sorting
    method handles duplicated data efficiently and always achieves load
    balance across different input data distribution types.

    Per-Server Dominant-Share Fairness (PS-DSF): A Multi-Resource Fair Allocation Mechanism for Heterogeneous Servers

    Jalal Khamse-Ashari, Ioannis Lambadaris, George Kesidis, Bhuvan Urgaonkar, Yiqiang Zhao
    Comments: 10 pages, 7 figures, technical report
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Users of cloud computing platforms pose different types of demands for
    multiple resources on servers (physical or virtual machines). Besides
    differences in their resource capacities, servers may be additionally
    heterogeneous in their ability to service users – certain users’ tasks may only
    be serviced by a subset of the servers. We identify important shortcomings in
    existing multi-resource fair allocation mechanisms – Dominant Resource Fairness
    (DRF) and its follow up work – when used in such environments. We develop a new
    fair allocation mechanism called Per-Server Dominant-Share Fairness (PS-DSF)
    which we show offers all desirable sharing properties that DRF is able to offer
    in the case of a single “resource pool” (i.e., if the resources of all servers
    were pooled together into one hypothetical server). We evaluate the performance
    of PS-DSF through simulations. Our evaluation shows the enhanced efficiency of
    PS-DSF compared to the existing allocation mechanisms. We discuss how our
    proposed allocation mechanism is applicable in cloud computing networks,
    especially large-scale data centers.

    Scalable Semi-Supervised Learning over Networks using Nonsmooth Convex Optimization

    Alexander Jung, Alfred O. Hero III, Alexandru Mara, Sabeur Aridhi
    Subjects: Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

    We propose a scalable method for semi-supervised (transductive) learning from
    massive network-structured datasets. Our approach to semi-supervised learning
    is based on representing the underlying hypothesis as a graph signal with small
    total variation. Requiring a small total variation of the graph signal
    representing the underlying hypothesis corresponds to the central smoothness
    assumption that forms the basis for semi-supervised learning, i.e., input
    points forming clusters have similar output values or labels. We formulate the
    learning problem as a nonsmooth convex optimization problem which we solve by
    appealing to Nesterov's optimal first-order method for nonsmooth optimization.
    We also provide a message passing formulation of the learning method which
    allows for a highly scalable implementation in big data frameworks.
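
    To make the smoothness assumption in this abstract concrete, the sketch below computes the total variation of a graph signal on a small weighted graph. It assumes the common weighted form, the sum over edges of w_ij * |x_i - x_j|; the abstract does not state the exact definition the authors use, and the graph, weights, and function name are hypothetical.

```python
def total_variation(edges, signal):
    """Weighted total variation of a graph signal.

    edges  : iterable of (i, j, weight) tuples for an undirected graph
    signal : sequence where signal[i] is the value (label) at node i
    """
    return sum(w * abs(signal[i] - signal[j]) for i, j, w in edges)


# Two tightly connected clusters {0, 1, 2} and {3, 4, 5} joined by one weak edge.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 0.1), (3, 4, 1.0), (4, 5, 1.0)]

clustered = [0, 0, 0, 1, 1, 1]    # constant within each cluster
alternating = [0, 1, 0, 1, 0, 1]  # changes across every edge

tv_clustered = total_variation(edges, clustered)      # only the weak edge contributes
tv_alternating = total_variation(edges, alternating)  # every edge contributes
```

    A labeling that is constant on well-connected clusters pays only for the weak boundary edge, which is why penalizing total variation favors hypotheses that agree with the cluster structure.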

    Hybrid CPU-GPU generation of the Hamiltonian and Overlap matrices in FLAPW methods

    Diego Fabregat-Traver (1), Davor Davidović (2), Markus Höhnerbach (1), Edoardo Di Napoli (3 and 4) ((1) AICES, RWTH Aachen University, (2) RBI, Zagreb, Croatia, (3) Jülich Supercomputing Centre, (4) Jülich Aachen Research Alliance — High-performance Computing)
    Subjects: Computational Engineering, Finance, and Science (cs.CE); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)

    In this paper we focus on the integration of high-performance numerical
    libraries in ab initio codes and the portability of performance and
    scalability. The target of our work is FLEUR, a software package for electronic
    structure calculations developed at the Forschungszentrum Jülich over the
    course of two decades. The presented work follows up on a previous effort to
    modernize legacy code by re-engineering and rewriting it in terms of highly
    optimized libraries. We illustrate how this initial effort to get efficient and
    portable shared-memory code enables fast porting of the code to emerging
    heterogeneous architectures. More specifically, we port the code to nodes
    equipped with multiple GPUs. We divide our study into two parts. First, we show
    considerable speedups attained by minor and relatively straightforward code
    changes to off-load parts of the computation to the GPUs. Then, we identify
    further possible improvements to achieve even higher performance and
    scalability. On a system consisting of 16 cores and 2 GPUs, we observe speedups
    of up to 5x with respect to our optimized shared-memory code, which in turn
    means between 7.5x and 12.5x speedup with respect to the original FLEUR code.

    A bioinformatics system for searching Co-Occurrence based on Co-Operational Formation with Advanced Method (COCOFAM)

    Junseok Park, Gwangmin Kim, Dongjin Jang, Sungji Choo, Sunghwa Bae, Doheon Lee
    Comments: 5 pages, 4 figures
    Subjects: Information Retrieval (cs.IR); Distributed, Parallel, and Cluster Computing (cs.DC)

    Literature analysis is a key step in obtaining background information in
    biomedical research. However, it is difficult for researchers to obtain
    knowledge of their interests in an efficient manner because of the massive
    amount of the published biomedical literature. Therefore, efficient and
    systematic search strategies are required, which allow ready access to the
    substantial amount of literature. In this paper, we propose a novel search
    system, named Co-Occurrence based on Co-Operational Formation with Advanced
    Method (COCOFAM), which is suitable for large-scale literature analysis.
    COCOFAM is based on integrating both Spark for local clusters and a global job
    scheduler to gather crowdsourced co-occurrence data on global clusters. It will
    allow users to obtain information of their interests from the substantial
    amount of literature.


    Learning

    Why and When Can Deep — but Not Shallow — Networks Avoid the Curse of Dimensionality

    Tomaso Poggio, Hrushikesh Mhaskar, Lorenzo Rosasco, Brando Miranda, Qianli Liao
    Subjects: Learning (cs.LG)

    The paper reviews an emerging body of theoretical results on deep learning
    including the conditions under which it can be exponentially better than
    shallow learning. Deep convolutional networks represent an important special
    case of these conditions, though weight sharing is not the main reason for
    their exponential advantage. Explanation of a few key theorems is provided
    together with new results, open problems and conjectures.

    Scalable Semi-Supervised Learning over Networks using Nonsmooth Convex Optimization

    Alexander Jung, Alfred O. Hero III, Alexandru Mara, Sabeur Aridhi
    Subjects: Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

    We propose a scalable method for semi-supervised (transductive) learning from
    massive network-structured datasets. Our approach to semi-supervised learning
    is based on representing the underlying hypothesis as a graph signal with small
    total variation. Requiring a small total variation of the graph signal
    representing the underlying hypothesis corresponds to the central smoothness
    assumption that forms the basis for semi-supervised learning, i.e., input
    points forming clusters have similar output values or labels. We formulate the
    learning problem as a nonsmooth convex optimization problem which we solve by
    appealing to Nesterov's optimal first-order method for nonsmooth optimization.
    We also provide a message passing formulation of the learning method which
    allows for a highly scalable implementation in big data frameworks.

    The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables

    Chris J. Maddison, Andriy Mnih, Yee Whye Teh
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    The reparameterization trick enables the optimization of large scale
    stochastic computation graphs via gradient descent. The essence of the trick is
    to refactor each stochastic node into a differentiable function of its
    parameters and a random variable with fixed distribution. After refactoring,
    the gradients of the loss propagated by the chain rule through the graph are
    low variance unbiased estimators of the gradients of the expected loss. While
    many continuous random variables have such reparameterizations, discrete random
    variables lack continuous reparameterizations due to the discontinuous nature
    of discrete states. In this work we introduce concrete random variables —
    continuous relaxations of discrete random variables. The concrete distribution
    is a new family of distributions with closed form densities and a simple
    reparameterization. Whenever a discrete stochastic node of a computation graph
    can be refactored into a one-hot bit representation that is treated
    continuously, concrete stochastic nodes can be used with automatic
    differentiation to produce low-variance biased gradients of objectives
    (including objectives that depend on the log-likelihood of latent stochastic
    nodes) on the corresponding discrete graph. We demonstrate their effectiveness
    on density estimation and structured prediction tasks using neural networks.
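
    A relaxed one-hot sample of the kind this abstract describes can be sketched as follows. The sketch assumes the widely used Gumbel-softmax form, softmax((log alpha + G) / temperature) with G standard Gumbel noise; the parameter names `log_alpha` and `temperature` and the function name are illustrative choices, not taken from the abstract.

```python
import math
import random


def sample_concrete(log_alpha, temperature, rng):
    """Draw one relaxed one-hot vector on the probability simplex.

    log_alpha   : unnormalized log-probabilities of the discrete states
    temperature : positive scalar; smaller values give samples closer to one-hot
    """
    logits = []
    for la in log_alpha:
        # Clip the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        logits.append((la + gumbel) / temperature)
    # Numerically stable softmax.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
log_alpha = [math.log(0.2), math.log(0.3), math.log(0.5)]
soft_sample = sample_concrete(log_alpha, temperature=1.0, rng=rng)
sharp_sample = sample_concrete(log_alpha, temperature=0.1, rng=rng)
```

    The noise enters only through the fixed-distribution variable `u`, so the sample is a differentiable function of `log_alpha`; in an automatic differentiation framework the same expression would let gradients flow through the sampling step.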

    TorchCraft: a Library for Machine Learning Research on Real-Time Strategy Games

    Gabriel Synnaeve, Nantas Nardelli, Alex Auvolat, Soumith Chintala, Timothée Lacroix, Zeming Lin, Florian Richoux, Nicolas Usunier
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    We present TorchCraft, an open-source library that enables deep learning
    research on Real-Time Strategy (RTS) games such as StarCraft: Brood War, by
    making it easier to control these games from a machine learning framework, here
    Torch. This white paper argues for using RTS games as a benchmark for AI
    research, and describes the design and components of TorchCraft.

    Online Multi-view Clustering with Incomplete Views

    Weixiang Shao, Lifang He, Chun-Ta Lu, Philip S. Yu
    Subjects: Learning (cs.LG)

    In the era of big data, it is common to have data with multiple modalities or
    coming from multiple sources, known as “multi-view data”. Multi-view clustering
    provides a natural way to generate clusters from such data. Since different
    views share some consistency and complementary information, previous works on
    multi-view clustering mainly focus on how to combine various numbers of views
    to improve clustering performance. However, in reality, each view may be
    incomplete, i.e., instances may be missing from the view. Furthermore, the size
    of the data could be extremely large. It is unrealistic to apply multi-view
    clustering in large real-world applications without considering the
    incompleteness of views and the memory requirement. No previous work has
    addressed all these challenges simultaneously. In this paper, we propose an
    online multi-view
    clustering algorithm, OMVC, which deals with large-scale incomplete views. We
    model the multi-view clustering problem as a joint weighted nonnegative matrix
    factorization problem and process the multi-view data chunk by chunk to reduce
    the memory requirement. OMVC learns the latent feature matrices for all the
    views and pushes them towards a consensus. We further increase the robustness
    of the learned latent feature matrices in OMVC via lasso regularization. To
    minimize the influence of incompleteness, dynamic weight setting is introduced
    to give lower weights to the incoming missing instances in different views.
    More importantly, to reduce the computational time, we incorporate a faster
    projected gradient descent by utilizing the Hessian matrices in OMVC. Extensive
    experiments conducted on four real-world datasets demonstrate the effectiveness of the
    proposed OMVC method.

    Collaborative Recurrent Autoencoder: Recommend while Learning to Fill in the Blanks

    Hao Wang, Xingjian Shi, Dit-Yan Yeung
    Comments: To appear at NIPS 2016
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Hybrid methods that utilize both content and rating information are commonly
    used in many recommender systems. However, most of them use either handcrafted
    features or the bag-of-words representation as a surrogate for the content
    information but they are neither effective nor natural enough. To address this
    problem, we develop a collaborative recurrent autoencoder (CRAE) which is a
    denoising recurrent autoencoder (DRAE) that models the generation of content
    sequences in the collaborative filtering (CF) setting. The model generalizes
    recent advances in recurrent deep learning from i.i.d. input to non-i.i.d.
    (CF-based) input and provides a new denoising scheme along with a novel
    learnable pooling scheme for the recurrent autoencoder. To do this, we first
    develop a hierarchical Bayesian model for the DRAE and then generalize it to
    the CF setting. The synergy between denoising and CF enables CRAE to make
    accurate recommendations while learning to fill in the blanks in sequences.
    Experiments on real-world datasets from different domains (CiteULike and
    Netflix) show that, by jointly modeling the order-aware generation of sequences
    for the content information and performing CF for the ratings, CRAE is able to
    significantly outperform the state of the art on both the recommendation task
    based on ratings and the sequence generation task based on content information.

    Natural-Parameter Networks: A Class of Probabilistic Neural Networks

    Hao Wang, Xingjian Shi, Dit-Yan Yeung
    Comments: To appear at NIPS 2016
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Neural networks (NN) have achieved state-of-the-art performance in various
    applications. Unfortunately, in applications where training data is
    insufficient, they are often prone to overfitting. One effective way to
    alleviate this problem is to exploit the Bayesian approach by using Bayesian
    neural networks (BNN). Another shortcoming of NN is the lack of flexibility to
    customize different distributions for the weights and neurons according to the
    data, as is often done in probabilistic graphical models. To address these
    problems, we propose a class of probabilistic neural networks, dubbed
    natural-parameter networks (NPN), as a novel and lightweight Bayesian treatment
    of NN. NPN allows the usage of arbitrary exponential-family distributions to
    model the weights and neurons. Different from traditional NN and BNN, NPN takes
    distributions as input and goes through layers of transformation before
    producing distributions to match the target output distributions. As a Bayesian
    treatment, efficient backpropagation (BP) is performed to learn the natural
    parameters for the distributions over both the weights and neurons. The output
    distributions of each layer, as byproducts, may be used as second-order
    representations for the associated tasks such as link prediction. Experiments
    on real-world datasets show that NPN can achieve state-of-the-art performance.

    Distributed Mean Estimation with Limited Communication

    Ananda Theertha Suresh, Felix X. Yu, H. Brendan McMahan, Sanjiv Kumar
    Subjects: Learning (cs.LG)

    Motivated by the need for distributed optimization algorithms with low
    communication cost, we study communication efficient algorithms to perform
    distributed mean estimation. We study scenarios in which each client sends one
    bit per dimension. We first show that for \(d\)-dimensional data with \(n\)
    clients, a naive stochastic rounding approach yields a mean squared error
    \(\Theta(d/n)\). We then show by applying a structured random rotation of the
    data (an \(\mathcal{O}(d \log d)\) algorithm), the error can be reduced to
    \(\mathcal{O}((\log d)/n)\). The algorithms and the analysis make no
    distributional assumptions on the data.
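
    The one-bit-per-dimension rounding step can be illustrated with a minimal sketch. It assumes one common form of stochastic rounding, in which each coordinate is rounded to the vector's minimum or maximum with a probability chosen so that the result is unbiased. The function names `stochastic_round` and `estimate_mean` are hypothetical, the abstract does not spell out the exact encoding, and the structured random rotation is not shown.

```python
import random

def stochastic_round(x, rng):
    """Round each coordinate of x to min(x) or max(x): one bit per coordinate.

    Coordinate v becomes max(x) with probability (v - lo) / (hi - lo), so the
    rounded vector equals x in expectation (assumed encoding, see lead-in).
    """
    lo, hi = min(x), max(x)
    if hi == lo:  # constant vector: nothing to round
        return list(x)
    return [hi if rng.random() < (v - lo) / (hi - lo) else lo for v in x]

def estimate_mean(client_vectors, rng):
    """Server side: average the rounded client vectors coordinate-wise."""
    n = len(client_vectors)
    d = len(client_vectors[0])
    total = [0.0] * d
    for x in client_vectors:
        for j, q in enumerate(stochastic_round(x, rng)):
            total[j] += q
    return [t / n for t in total]

if __name__ == "__main__":
    rng = random.Random(0)
    clients = [[0.0, 0.25, 1.0]] * 1000
    print(estimate_mean(clients, rng))
```

    In practice each client would also send its own minimum and maximum alongside the bits; the sketch hides that detail by rounding and averaging in one process.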

    Deep counter networks for asynchronous event-based processing

    Jonathan Binas, Giacomo Indiveri, Michael Pfeiffer
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Despite their advantages in terms of computational resources, latency, and
    power consumption, event-based implementations of neural networks have not been
    able to achieve the same performance figures as their equivalent
    state-of-the-art deep network models. We propose counter neurons as minimal
    spiking neuron models which only require addition and comparison operations,
    thus avoiding costly multiplications. We show how inference carried out in deep
    counter networks converges to the same accuracy levels as are achieved with
    state-of-the-art conventional networks. As their event-based style of
    computation leads to reduced latency and sparse updates, counter networks are
    ideally suited for efficient compact and low-power hardware implementation. We
    present theory and training methods for counter networks, and demonstrate on
    the MNIST benchmark that counter networks converge quickly, both in terms of
    time and number of operations required, to state-of-the-art classification
    accuracy.

    Deep Neural Networks for HDR imaging

    Kshiteej Sheth
    Comments: 9 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We propose novel methods for solving two tasks using Convolutional Neural
    Networks: first, generating an HDR map of a static scene from differently
    exposed LDR images of the scene captured using conventional cameras, and
    second, finding an optimal tone mapping operator that gives a better score
    on the TMQI metric than existing methods. We quantitatively evaluate our
    networks and illustrate the cases where they perform well as well as those
    where they perform poorly.

    The Deep Journey from Content to Collaborative Filtering

    Oren Barkan, Noam Koenigstein, Eylon Yogev
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Learning (cs.LG)

    In Recommender Systems research, algorithms are often characterized as either
    Collaborative Filtering (CF) or Content Based (CB). CF algorithms are trained
    using a dataset of users' explicit or implicit preferences, while CB algorithms
    are typically based on item profiles. These approaches harness very different
    data sources; hence, the resulting recommended items are generally also very
    different. This paper presents a novel model that serves as a bridge from item
    content to CF representations. We introduce a multiple-input deep
    regression model to predict the CF latent embedding vectors of items based on
    their textual description and metadata. We showcase the effectiveness of the
    proposed model by predicting the CF vectors of movies and apps based on their
    textual descriptions. Finally, we show that the model can be further improved
    by incorporating metadata such as the movie release year and tags which
    contribute to a higher accuracy.

    The Machine Learning Algorithm as Creative Musical Tool

    Rebecca Fiebrink, Baptiste Caramiaux
    Comments: Pre-print to appear in the Oxford Handbook on Algorithmic Music. Oxford University Press
    Subjects: Human-Computer Interaction (cs.HC); Learning (cs.LG)

    Machine learning is the capacity of a computational system to learn
    structures from datasets in order to make predictions on newly seen data. Such
    an approach offers a significant advantage in music scenarios in which
    musicians can teach the system to learn an idiosyncratic style, or can break
    the rules to explore the system’s capacity in unexpected ways. In this chapter
    we draw on music, machine learning, and human-computer interaction to elucidate
    an understanding of machine learning algorithms as creative tools for music and
    the sonic arts. We motivate a new understanding of learning algorithms as
    human-computer interfaces. We show that, like other interfaces, learning
    algorithms can be characterised by the ways their affordances intersect with
    goals of human users. We also argue that the nature of interaction between
    users and algorithms impacts the usability and usefulness of those algorithms
    in profound ways. This human-centred view of machine learning motivates our
    concluding discussion of what it means to employ machine learning as a creative
    tool.

    Precise deep neural network computation on imprecise low-power analog hardware

    Jonathan Binas, Daniel Neil, Giacomo Indiveri, Shih-Chii Liu, Michael Pfeiffer
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    There is an urgent need for compact, fast, and power-efficient hardware
    implementations of state-of-the-art artificial intelligence. Here we propose a
    power-efficient approach for real-time inference, in which deep neural networks
    (DNNs) are implemented through low-power analog circuits. Although analog
    implementations can be extremely compact, they have been largely supplanted by
    digital designs, partly because of device mismatch effects due to fabrication.
    We propose a framework that exploits the power of Deep Learning to compensate
    for this mismatch by incorporating the measured variations of the devices as
    constraints in the DNN training process. This eliminates the use of mismatch
    minimization strategies such as the use of very large transistors, and allows
    circuit complexity and power-consumption to be reduced to a minimum. Our
    results, based on large-scale simulations as well as a prototype VLSI chip
    implementation, indicate at least a 3-fold improvement in processing efficiency
    over current digital implementations.


    Information Theory

    A Novel Hybrid Beamforming Algorithm with Unified Analog Beamforming by Subspace Construction Based on Partial CSI for Massive MIMO-OFDM Systems

    Dengkui Zhu, Boyu Li, Ping Liang
    Comments: accepted to journal
    Subjects: Information Theory (cs.IT)

    Hybrid beamforming (HB) has been widely studied for reducing the number of
    costly radio frequency (RF) chains in massive multiple-input multiple-output
    (MIMO) systems. However, previous works on HB are limited to a single user
    equipment (UE) or a single group of UEs, employing the frequency-flat
    first-level analog beamforming (AB) that cannot be applied to multiple groups
    of UEs served in different frequency resources in an orthogonal
    frequency-division multiplexing (OFDM) system. In this paper, a novel HB
    algorithm with unified AB based on the spatial covariance matrix (SCM)
    knowledge of all UEs is proposed for a massive MIMO-OFDM system in order to
    support multiple groups of UEs. The proposed HB method with a much smaller
    number of RF chains can achieve more than 95% performance of full digital
    beamforming. In addition, a novel practical subspace construction (SC)
    algorithm based on partial channel state information is proposed to estimate
    the required SCM. The proposed SC method can offer more than 97% performance of
    the perfect SCM case. With the proposed methods, significant cost and power
    savings can be achieved without large loss in performance. Furthermore, the
    proposed methods can be applied to massive MIMO-OFDM systems in both
    time-division duplex and frequency-division duplex.

    A novel 2D non-stationary wideband massive MIMO channel model

    C.F. Lopez, C.-X. Wang, R. Feng
    Comments: 6 pages, 5 figures, conference
    Journal-ref: IEEE International Workshop on Computer Aided Modelling and Design
    of Communication Links and Networks (CAMAD), Toronto, Canada, Oct. 2016
    Subjects: Information Theory (cs.IT)

    In this paper, a novel two-dimensional (2D) non-stationary wideband
    geometry-based stochastic model (GBSM) for massive multiple-input
    multiple-output (MIMO) communication systems is proposed. Key characteristics
    of massive MIMO channels such as near field effects and cluster evolution along
    the array are addressed in this model. Near field effects are modelled by a
    second-order approximation to spherical wavefronts, i.e., parabolic wavefronts,
    leading to linear drifts of the angles of multipath components (MPCs) and
    non-stationarity along the array. Cluster evolution along the array involving
    cluster (dis)appearance and smooth average power variations is considered.
    Cluster (dis)appearance is modeled by a two-state Markov process and smooth
    average power variations are modelled by a spatial lognormal process.
    Statistical properties of the channel model such as time autocorrelation
    function (ACF), spatial cross-correlation function (CCF), and cluster average
    power and Rician factor variations over the array are derived. Finally,
    simulation results are presented and analyzed, demonstrating that parabolic
    wavefronts and cluster soft evolution are good candidates to model important
    massive MIMO channel characteristics.
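
    The two-state Markov process for cluster (dis)appearance can be sketched as a chain that steps from one antenna element to the next. This is an illustrative sketch only: the per-element transition probabilities `p_appear` and `p_disappear` and both function names are hypothetical and are not taken from the paper, and the lognormal power variation is not modelled.

```python
import random

def simulate_cluster_visibility(num_antennas, p_appear, p_disappear, rng,
                                start_visible=True):
    """Return one boolean per antenna element: is the cluster visible there?

    Moving to the next element, a visible cluster disappears with probability
    p_disappear and an invisible one appears with probability p_appear.
    """
    visible = start_visible
    states = [visible]
    for _ in range(num_antennas - 1):
        if visible:
            visible = rng.random() >= p_disappear
        else:
            visible = rng.random() < p_appear
        states.append(visible)
    return states

def stationary_visible_fraction(p_appear, p_disappear):
    """Long-run fraction of elements that see the cluster (two-state chain)."""
    return p_appear / (p_appear + p_disappear)

if __name__ == "__main__":
    rng = random.Random(1)
    states = simulate_cluster_visibility(32, 0.1, 0.3, rng)
    print("".join("#" if s else "." for s in states))
    print(stationary_visible_fraction(0.1, 0.3))
```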

    A High Throughput Pilot Allocation for M2M Communication in Crowded Massive MIMO Systems

    Huimei Han, Xudong Guo, Ying Li
    Comments: 5 pages, 6 figures
    Subjects: Information Theory (cs.IT)

    A new scheme to resolve the intra-cell pilot collision for M2M communication
    in crowded massive multiple-input multiple-output (MIMO) systems is proposed.
    The proposed scheme permits those failed user equipments (UEs), judged by a
    strongest-user collision resolution (SUCR) protocol, to contend for the idle
    pilots, i.e., the pilots that are not selected by any UE in the initial step.
    This scheme is called SUCR combined idle pilots access (SUCR-IPA). To
    analyze the performance of the SUCR-IPA scheme, we develop a simple method to
    compute the access success probability of the UEs in each random access slot
    (RAST). The simulation results coincide well with the analysis. It is also
    shown that, compared to the SUCR protocol, the proposed SUCR-IPA scheme
    increases the throughput of the system significantly, and thus decreases the
    number of access attempts dramatically.
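
    The notion of idle pilots can be made concrete with a small sketch. It assumes that each UE picks one pilot uniformly at random in the initial step; this is an assumption for illustration and is not the paper's analysis. The names `classify_pilots` and `expected_idle_pilots` are hypothetical, and the SUCR protocol itself is not modelled.

```python
from collections import Counter

def classify_pilots(choices, num_pilots):
    """Split pilot indices into idle (chosen by no UE) and collided (chosen by 2+ UEs).

    choices[i] is the pilot index selected by UE i in the initial step.
    """
    counts = Counter(choices)
    idle = [p for p in range(num_pilots) if counts[p] == 0]
    collided = [p for p in range(num_pilots) if counts[p] > 1]
    return idle, collided

def expected_idle_pilots(num_ues, num_pilots):
    """Expected number of idle pilots under uniform random selection.

    A given pilot is missed by every UE with probability (1 - 1/num_pilots)**num_ues.
    """
    return num_pilots * (1 - 1 / num_pilots) ** num_ues

if __name__ == "__main__":
    idle, collided = classify_pilots([0, 0, 2, 3, 3, 3], num_pilots=5)
    print("idle:", idle, "collided:", collided)
    print(expected_idle_pilots(num_ues=6, num_pilots=5))
```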

    Asynchronous Peak Detection for Demodulation in Molecular Communication

    Adam Noel, Andrew W. Eckford
    Comments: 6 pages, 1 table, 4 figures. Submitted to IEEE ICC 2017
    Subjects: Information Theory (cs.IT)

    Molecular communication requires low-complexity symbol detection algorithms
    to deal with the many sources of uncertainty that are inherent in these
    channels. This paper proposes two variants of a high-performance asynchronous
    peak detection algorithm for a receiver that makes independent observations.
    The first variant has low complexity and measures the largest observation
    within a sampling interval. The second variant adds decision feedback to
    mitigate inter-symbol interference. Although the algorithm does not require
    synchronization between the transmitter and receiver, results demonstrate that
    the bit error performance of symbol-by-symbol detection using the first variant
    is better than using a single sample whose sampling time is chosen a priori.
    The second variant is shown to have performance comparable to that of an energy
    detector. Both variants of the algorithm demonstrate better resilience to
    timing offsets than that of existing detectors.
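
    A rough sketch of the two variants follows. The first function takes the largest observation in each symbol interval and compares it with a threshold. The second adds a very simple form of decision feedback, raising the threshold after a detected 1; this feedback rule and the parameter `isi_gain` are assumptions made for illustration and are not the paper's detector. All function names are hypothetical.

```python
def detect_max(observations, samples_per_symbol, threshold):
    """Variant 1 sketch: bit is 1 if the peak sample in the interval exceeds threshold."""
    bits = []
    last_start = len(observations) - samples_per_symbol
    for start in range(0, last_start + 1, samples_per_symbol):
        peak = max(observations[start:start + samples_per_symbol])
        bits.append(1 if peak > threshold else 0)
    return bits

def detect_max_with_feedback(observations, samples_per_symbol, threshold, isi_gain):
    """Variant 2 sketch: raise the threshold by isi_gain after a detected 1.

    This models leftover molecules from the previous symbol (assumed rule).
    """
    bits = []
    previous_bit = 0
    last_start = len(observations) - samples_per_symbol
    for start in range(0, last_start + 1, samples_per_symbol):
        peak = max(observations[start:start + samples_per_symbol])
        bit = 1 if peak > threshold + isi_gain * previous_bit else 0
        bits.append(bit)
        previous_bit = bit
    return bits

if __name__ == "__main__":
    obs = [0, 6, 3, 1, 2, 4, 3, 1]
    print(detect_max(obs, samples_per_symbol=4, threshold=3))
    print(detect_max_with_feedback(obs, samples_per_symbol=4, threshold=3, isi_gain=2))
```

    In the example the second interval's peak comes from residue of the first symbol, so the two variants disagree on the second bit.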

    Directional Training and Fast Sector-based Processing Schemes for mmWave Channels

    Zheda Li, Nadisanka Rupasinghe, Ozgun Y. Bursalioglu, Chenwei Wang, Haralabos Papadopoulos, Giuseppe Caire
    Comments: 12 pages, 7 figures
    Subjects: Information Theory (cs.IT)

    We consider a single-cell scenario involving a single base station (BS) with
    a massive array serving multi-antenna terminals in the downlink of a mmWave
    channel. We present a class of multiuser MIMO schemes, which rely on
    uplink training from the user terminals, and on uplink/downlink channel
    reciprocity. The BS employs virtual sector-based processing, according to
    which user-channel estimation and data transmission are performed in parallel over
    non-overlapping angular sectors. The uplink training schemes we consider are
    non-orthogonal, that is, we allow multiple users to transmit pilots on the same
    pilot dimension (thereby potentially interfering with one another). Elementary
    processing allows each sector to determine the subset of user channels that can
    be resolved on the sector (effectively pilot contamination free) and, thus, the
    subset of users that can be served by the sector. This allows resolving
    multiple users on the same pilot dimension at different sectors, thereby
    increasing the overall multiplexing gains of the system. Our analysis and
    simulations reveal that, by using appropriately designed directional training
    beams at the user terminals, the sector-based transmission schemes we present
    can yield substantial spatial multiplexing and ergodic user-rates improvements
    with respect to their orthogonal-training counterparts.

    Bounds for the \(l_1\)-distance of \(q\)-ary lattices obtained via Constructions D, \(D'\) and \(\overline{D}\)

    Eleonesio Strey, Sueli I. R. Costa
    Comments: 15 pages
    Subjects: Information Theory (cs.IT)

    Lattices have been used in several problems in coding theory and
    cryptography. In this paper we study \(q\)-ary lattices obtained via
    Constructions D, \(D'\) and \(\overline{D}\). Connections between
    Constructions D and \(D'\) are shown. Bounds for the minimum \(l_1\)-distance
    of the lattices \(\Lambda_{D}\), \(\Lambda_{D'}\) and \(\Lambda_{\overline{D}}\)
    and, under certain conditions, a generator matrix for \(\Lambda_{D'}\) are
    presented. In addition, when the chain of codes used is closed under the
    zero-one addition, we derive explicit expressions for the minimum
    \(l_1\)-distances of the lattices \(\Lambda_{D}\) and \(\Lambda_{\overline{D}}\)
    in terms of the distances of the codes used in these constructions.

    Compositionality Results for Quantitative Information Flow

    Yusuke Kawamoto, Konstantinos Chatzikokolakis, Catuscia Palamidessi
    Comments: 30 pages
    Subjects: Logic in Computer Science (cs.LO); Information Theory (cs.IT)

    In the min-entropy approach to quantitative information flow, the leakage is
    defined in terms of a minimization problem, which, in the case of large
    systems, can be computationally rather heavy. The same happens for the recently
    proposed generalization called g-vulnerability. In this paper we study the case
    in which the channel associated to the system can be decomposed into simpler
    channels, which typically happens when the observables consist of multiple
    components. Our main contribution is the derivation of bounds on the g-leakage
    of the whole system in terms of the g-leakages of its components. We also
    consider the particular cases of min-entropy leakage and of parallel channels,
    generalizing and systematizing results from the literature. We demonstrate the
    effectiveness of our method and evaluate the precision of our bounds using
    examples.
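
    For small channels, min-entropy leakage can be computed directly from its standard definition, the logarithm of the ratio of posterior to prior vulnerability. The sketch below does that and also builds the parallel composition of two channels that share the same secret input. The function names are hypothetical, and the code does not reproduce the paper's bounds or g-leakage; it only evaluates the quantities by brute force.

```python
import math

def min_entropy_leakage(prior, channel):
    """Min-entropy leakage in bits.

    prior[x] = P(X = x); channel[x][y] = P(Y = y | X = x).
    Prior vulnerability is max_x prior[x]; posterior vulnerability is
    sum_y max_x prior[x] * channel[x][y].
    """
    prior_vulnerability = max(prior)
    num_outputs = len(channel[0])
    posterior_vulnerability = sum(
        max(prior[x] * channel[x][y] for x in range(len(prior)))
        for y in range(num_outputs)
    )
    return math.log2(posterior_vulnerability / prior_vulnerability)

def parallel(channel_a, channel_b):
    """Parallel composition: same secret, observable is the pair (y_a, y_b).

    Entry for input x and output pair (y_a, y_b) is channel_a[x][y_a] * channel_b[x][y_b].
    """
    return [[a * b for a in row_a for b in row_b]
            for row_a, row_b in zip(channel_a, channel_b)]

if __name__ == "__main__":
    uniform = [0.5, 0.5]
    noisy = [[0.9, 0.1], [0.1, 0.9]]
    print(min_entropy_leakage(uniform, noisy))
    print(min_entropy_leakage(uniform, parallel(noisy, noisy)))
```

    The brute-force cost grows with the product of the component output spaces, which is the kind of blow-up that compositional bounds are meant to avoid.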



