
    arXiv Paper Daily: Wed, 9 Nov 2016

    Published by 我爱机器学习 (52ml.net) on 2016-11-09 00:00:00

    Neural and Evolutionary Computing

    Adversarial Ladder Networks

    Juan Maroñas Molano, Alberto Albiol Colomer, Roberto Paredes Palacios
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)

    The use of unsupervised data in addition to supervised data when training
    discriminative neural networks has improved the performance of this
    classification scheme. However, the best results were achieved with a
    training process divided into two parts: first an unsupervised pre-training
    step initializes the weights of the network, and then these weights are
    refined using supervised data. On the other hand, adversarial noise has
    improved the results of classical supervised learning. Recently, a new
    neural network topology called the Ladder Network, whose key idea is based
    on some properties of hierarchical latent variable models, has been proposed
    as a technique for training a neural network with supervised and
    unsupervised data at the same time, in what is called semi-supervised
    learning. This technique has reached state-of-the-art classification
    performance. In this work we add adversarial noise to the Ladder Network and
    obtain state-of-the-art classification results, along with several important
    conclusions on how adversarial noise can help, as well as new possible lines
    of investigation. We also propose an alternative way of adding adversarial
    noise to unsupervised data.
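
    Where such adversarial noise comes from can be made concrete. The following
    is a minimal FGSM-style sketch in PyTorch, assuming only a differentiable
    classifier `model`; the exact points at which the paper injects noise into
    the ladder network's corrupted encoder may differ.

```python
import torch
import torch.nn.functional as F

def adversarial_noise(model, x, y, eps=0.1):
    """Fast-gradient-sign perturbation that locally increases the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return eps * x.grad.sign()

# x_adv = x + adversarial_noise(model, x, y) could then replace (or complement)
# the Gaussian noise normally fed to the ladder network's corrupted path.
```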

    Unsupervised Pretraining for Sequence to Sequence Learning

    Prajit Ramachandran, Peter J. Liu, Quoc V. Le
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Sequence to sequence models are successful tools for supervised sequence
    learning tasks, such as machine translation. Despite their success, these
    models still require much labeled data and it is unclear how to improve them
    using unlabeled data, which is much less expensive to obtain. In this paper, we
    present simple changes that lead to a significant improvement in the accuracy
    of seq2seq models when the labeled set is small. Our method initializes the
    encoder and decoder of the seq2seq model with the trained weights of two
    language models, and then all weights are jointly fine-tuned with labeled data.
    An additional language modeling loss can be used to regularize the model during
    fine-tuning. We apply this method to low-resource tasks in machine translation
    and abstractive summarization and find that it significantly improves the
    subsequent supervised models. Our main finding is that the pretraining
    accelerates training and improves generalization of seq2seq models, achieving
    state-of-the-art results on the WMT English→German task. Our model
    obtains an improvement of 1.3 BLEU over the previous best models on both
    WMT’14 and WMT’15 English→German. Our ablation study shows that
    pretraining helps seq2seq models in different ways depending on the nature of
    the task: translation benefits from the improved generalization whereas
    summarization benefits from the improved optimization.
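
    A rough sketch of the initialization scheme described above (hypothetical
    module and checkpoint names; the LM and seq2seq architectures must match
    exactly for the weight copy to be possible):

```python
import torch

def init_seq2seq_from_lms(seq2seq, src_lm_ckpt, tgt_lm_ckpt):
    # Encoder starts from a language model trained on source-side monolingual
    # text, the decoder from one trained on target-side monolingual text.
    seq2seq.encoder.load_state_dict(torch.load(src_lm_ckpt))
    seq2seq.decoder.load_state_dict(torch.load(tgt_lm_ckpt))
    # Nothing is frozen: all weights are subsequently fine-tuned jointly on
    # labeled pairs, optionally with an auxiliary language-modeling loss.
```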

    Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders

    Nat Dilokthanakul, Pedro A.M. Mediano, Marta Garnelo, Matthew C.H. Lee, Hugh Salimbeni, Kai Arulkumaran, Murray Shanahan
    Comments: 12 pages, 4 figures, Under review as a conference paper at ICLR 2017
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    We study a variant of the variational autoencoder model with a Gaussian
    mixture as a prior distribution, with the goal of performing unsupervised
    clustering through deep generative models. We observe that the standard
    variational approach in these models is unsuited for unsupervised clustering,
    and mitigate this problem by leveraging a principled information-theoretic
    regularisation term known as consistency violation. Adding this term to the
    standard variational optimisation objective yields networks with both
    meaningful internal representations and well-defined clusters. We demonstrate
    the performance of this scheme on synthetic data, MNIST and SVHN, showing
    that the obtained clusters are distinct, interpretable, and yield higher
    performance on unsupervised clustering than previous approaches.
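
    The generative side of such a model can be sketched in a few lines
    (hypothetical `decoder` and learned mixture parameters; an illustration of
    the Gaussian-mixture prior, not the authors' code): sample a mixture
    component, then a Gaussian latent code, then decode.

```python
import torch

def sample_gmvae_prior(decoder, means, logvars, n):
    """means, logvars: (K, d) per-component Gaussian parameters."""
    k = torch.randint(means.size(0), (n,))           # pick mixture components
    eps = torch.randn(n, means.size(1))
    z = means[k] + eps * (0.5 * logvars[k]).exp()    # reparameterised draw
    return decoder(z)                                # decode into data space
```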

    The Neural Noisy Channel

    Lei Yu, Phil Blunsom, Chris Dyer, Edward Grefenstette, Tomas Kocisky
    Comments: ICLR 2017 submission
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    We formulate sequence to sequence transduction as a noisy channel decoding
    problem and use recurrent neural networks to parameterise the source and
    channel models. Unlike direct models, which can suffer from explaining-away
    effects during training, noisy channel models must produce outputs that explain
    their inputs, and their component models can be trained with not only paired
    training samples but also unpaired samples from the marginal output
    distribution. Using a latent variable to control how much of the conditioning
    sequence the channel model needs to read in order to generate a subsequent
    symbol, we obtain a tractable and effective beam search decoder. Experimental
    results on abstractive sentence summarisation, morphological inflection, and
    machine translation show that noisy channel models outperform direct models,
    and that they significantly benefit from increased amounts of unpaired output
    data that direct models cannot easily use.
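
    The scoring rule behind this setup fits in a few lines. A hedged sketch,
    assuming hypothetical log-probability functions `channel_logp` for p(x|y)
    and `lm_logp` for p(y); the paper's latent variable, which makes incremental
    beam scoring tractable, is omitted here.

```python
def noisy_channel_score(x, y, channel_logp, lm_logp, lam=1.0):
    # Bayes rule: argmax_y p(y|x) = argmax_y p(x|y) p(y).
    # The channel term forces candidate outputs y to explain the input x,
    # while the language model p(y) can be trained on unpaired output data.
    return channel_logp(x, y) + lam * lm_logp(y)
```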

    Cognitive Discriminative Mappings for Rapid Learning

    Wen-Chieh Fang, Yi-ting Chiang
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Humans can learn concepts or recognize items from just a handful of examples,
    while machines require many more samples to perform the same task. In this
    paper, we build a computational model to investigate the possibility of this
    kind of rapid learning. The proposed method aims to improve the learning task
    of input from sensory memory by leveraging the information retrieved from
    long-term memory. We present a simple and intuitive technique called cognitive
    discriminative mappings (CDM) to explore the cognitive problem. First, CDM
    separates and clusters the data instances retrieved from long-term memory into
    distinct classes with a discrimination method in working memory when a sensory
    input triggers the algorithm. CDM then maps each sensory data instance to be as
    close as possible to the median point of the data group with the same class.
    The experimental results demonstrate that the CDM approach is effective for
    learning the discriminative features of supervised classifications with few
    training sensory input instances.

    An Efficient Approach to Boosting Performance of Deep Spiking Network Training

    Seongsik Park, Sang-gil Lee, Hyunha Nam, Sungroh Yoon
    Comments: 9 pages, 5 figures
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Nowadays deep learning is dominating the field of machine learning with
    state-of-the-art performance in various application areas. Recently, spiking
    neural networks (SNNs) have been attracting a great deal of attention, notably
    owing to their power efficiency, which can potentially allow us to implement a
    low-power deep learning engine suitable for real-time/mobile applications.
    However, implementing SNN-based deep learning remains challenging, especially
    gradient-based training of SNNs by error backpropagation. Errors cannot
    simply be propagated through SNNs in the conventional way because SNNs
    process discrete data in the form of a series of spikes. Consequently, most
    previous studies employ a workaround: they first train a conventional
    weighted-sum deep neural network and then map the learned weights to the SNN
    under training, instead of training the SNN parameters directly. To
    eliminate this workaround, a new class of SNNs named deep spiking networks
    (DSNs) has recently been proposed, which can be trained directly (without a
    mapping from conventional deep networks) by error backpropagation with
    stochastic gradient descent. In this paper, we show that the initialization of
    the membrane potential on the backward path is an important step in DSN
    training, through diverse experiments performed under various conditions.
    Furthermore, we propose a simple and efficient method that can improve DSN
    training by controlling the initial membrane potential on the backward path. In
    our experiments, adopting the proposed approach allowed us to boost the
    performance of DSN training in terms of converging time and accuracy.

    Neural Taylor Approximations: Convergence and Exploration in Rectifier Networks

    David Balduzzi, Brian McWilliams, Tony Butler-Yeoman
    Comments: 13 pages, 6 figures
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Modern convolutional networks, incorporating rectifiers and max-pooling, are
    neither smooth nor convex. Standard guarantees therefore do not apply.
    Nevertheless, methods from convex optimization such as gradient descent and
    Adam are widely used as building blocks for deep learning algorithms. This
    paper provides the first convergence guarantee applicable to modern convnets.
    The guarantee matches a lower bound for convex nonsmooth functions. The key
    technical tool is the neural Taylor approximation — a straightforward
    application of Taylor expansions to neural networks — and the associated
    Taylor loss. Experiments on a range of optimizers, layers, and tasks provide
    evidence that the analysis accurately captures the dynamics of neural
    optimization.

    The second half of the paper applies the Taylor approximation to isolate the
    main difficulty in training rectifier nets: that gradients are shattered. We
    investigate the hypothesis that, by exploring the space of activation
    configurations more thoroughly, adaptive optimizers such as RMSProp and Adam
    are able to converge to better solutions.

    Neuromorphic Silicon Photonics

    Alexander N. Tait, Ellen Zhou, Thomas Ferreira de Lima, Allie X. Wu, Mitchell A. Nahmias, Bhavin J. Shastri, Paul R. Prucnal
    Comments: 10 pages, 6 figures, submitted
    Subjects: Neurons and Cognition (q-bio.NC); Neural and Evolutionary Computing (cs.NE); Optics (physics.optics)

    We report the first observations of an integrated analog photonic network, in
    which connections are configured by microring weight banks, as well as the
    first use of electro-optic modulators as photonic neurons. A mathematical
    isomorphism between the silicon photonic circuit and a continuous neural model
    is demonstrated through dynamical bifurcation analysis. Exploiting this
    isomorphism, existing neural engineering tools can be adapted to silicon
    photonic information processing systems. A 49-node silicon photonic neural
    network programmed using a “neural compiler” is simulated and predicted to
    outperform a conventional approach 1,960-fold in a toy differential system
    emulation task. Photonic neural networks leveraging silicon photonic platforms
    could access new regimes of ultrafast information processing for radio,
    control, and scientific computing.


    Computer Vision and Pattern Recognition

    Multispectral Deep Neural Networks for Pedestrian Detection

    Jingjing Liu, Shaoting Zhang, Shu Wang, Dimitris N. Metaxas
    Comments: 13 pages, 8 figures, BMVC 2016 oral
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Multispectral pedestrian detection is essential for around-the-clock
    applications, e.g., surveillance and autonomous driving. We analyze Faster
    R-CNN in depth for the multispectral pedestrian detection task and model it
    as a convolutional network (ConvNet) fusion problem. We find that
    ConvNet-based pedestrian detectors trained on color or thermal images
    separately provide complementary information for discriminating human
    instances, so there is large potential to improve pedestrian detection by
    using color and thermal images in DNNs simultaneously. We carefully design
    four ConvNet fusion architectures that integrate two-branch ConvNets at
    different DNN stages, all of which yield better performance than the
    baseline detector. Our experimental results on the KAIST pedestrian
    benchmark show that the Halfway Fusion model, which fuses middle-level
    convolutional features, outperforms the baseline method by 11% and yields a
    miss rate 3.5% lower than the other proposed architectures.
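
    The halfway-fusion pattern itself is simple to express. A minimal PyTorch
    sketch with illustrative layer sizes (the paper builds this into a Faster
    R-CNN detector, not shown here):

```python
import torch
import torch.nn as nn

class HalfwayFusion(nn.Module):
    """Two modality branches fused by concatenating mid-level feature maps."""
    def __init__(self):
        super().__init__()
        self.color = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.thermal = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
        self.shared = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())

    def forward(self, rgb, t):
        fused = torch.cat([self.color(rgb), self.thermal(t)], dim=1)
        return self.shared(fused)   # shared layers see both modalities
```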

    Estimating motion with principal component regression strategies

    Felipe P. do Carmo, Vania Vieira Estrela, Joaquim Teixeira de Assis
    Comments: 6 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:1610.02923
    Journal-ref: Proceedings of the IEEE International Workshop on Multimedia
    Signal Processing, 2009, MMSP ’09, 2009
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, two simple principal component regression methods for
    estimating the optical flow between frames of video sequences in a
    pel-recursive manner are introduced. They are easy alternatives for dealing
    with mixtures of motion vectors and with the lack of prior information on
    spatio-temporal statistics (although these are assumed to be locally
    normal). The 2D motion vector estimation approaches take simple image
    properties into consideration and are used to harmonize regularized
    least-squares estimates. Their main advantage is that no knowledge of the
    noise distribution is necessary, although there is an underlying assumption
    of localized smoothness. Preliminary experiments indicate that this approach
    provides robust estimates of the optical flow.
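
    For reference, a minimal numpy sketch of principal component regression,
    the core estimator the paper adapts to pel-recursive motion estimation (the
    displacement-update specifics are omitted):

```python
import numpy as np

def pcr_fit(X, y, k):
    """Regress y on the top-k principal components of X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k].T                                # scores on k components
    beta_z, *_ = np.linalg.lstsq(Z, y - y_mean, rcond=None)
    beta = Vt[:k].T @ beta_z                         # back to input coordinates
    return beta, y_mean - x_mean @ beta              # slope, intercept
```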

    The Loss Surface of Residual Networks: Ensembles and the Role of Batch Normalization

    Etai Littwin, Lior Wolf
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep Residual Networks offer better performance than conventional networks
    of the same depth and are trainable at extreme depths. It
    has recently been shown that Residual Networks behave like ensembles of
    relatively shallow networks. We show that these ensembles are dynamic: while
    initially the virtual ensemble is mostly at depths lower than half the
    network’s depth, as training progresses, it becomes deeper and deeper. The main
    mechanism that controls the dynamic ensemble behavior is the scaling
    introduced, e.g., by the Batch Normalization technique. We explain this
    behavior and demonstrate the driving force behind it. As a main tool in our
    analysis, we employ generalized spin glass models, which we also use in order
    to study the number of critical points in the optimization of Residual
    Networks.

    Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks

    Pichao Wang, Zhaoyang Li, Yonghong Hou, Wanqing Li
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recently, Convolutional Neural Networks (ConvNets) have shown promising
    performance in many computer vision tasks, especially image-based recognition.
    How to effectively use ConvNets for video-based recognition is still an open
    problem. In this paper, we propose a compact, effective yet simple method to
    encode spatio-temporal information carried in (3D) skeleton sequences into
    multiple (2D) images, referred to as Joint Trajectory Maps (JTM), and ConvNets
    are adopted to exploit the discriminative features for real-time human action
    recognition. The proposed method has been evaluated on three public benchmarks,
    i.e., MSRC-12 Kinect gesture dataset (MSRC-12), G3D dataset and UTD multimodal
    human action dataset (UTD-MHAD) and achieved the state-of-the-art results.

    Domain Adaptation with L2 constraints for classifying images from different endoscope systems

    Toru Tamaki, Shoji Sonoyama, Takio Kurita, Tsubasa Hirakawa, Bisser Raytchev, Kazufumi Kaneda, Tetsushi Koide, Shigeto Yoshida, Hiroshi Mieno, Shinji Tanaka, Kazuaki Chayama
    Comments: 28 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    This paper proposes a method for domain adaptation that extends the maximum
    margin domain transfer (MMDT) proposed by Hoffman et al., by introducing L2
    distance constraints between samples of different domains; thus, our method is
    denoted as MMDTL2. Motivated by the differences between the images taken by
    narrow band imaging (NBI) endoscopic devices, we utilize different NBI devices
    as different domains and estimate the transformations between samples of
    different domains, i.e., image samples taken by different NBI endoscope
    systems. We first formulate the problem in the primal form, and then derive
    the dual form, which has much lower computational cost than the naive
    approach. From our experimental results using NBI image datasets from two
    different NBI endoscopic devices, we find that MMDTL2 is more stable than
    MMDT and performs better than support vector machines without adaptation.

    Multiple Object Tracking with Kernelized Correlation Filters in Urban Mixed Traffic

    Yuebin Yang, Guillaume-Alexandre Bilodeau
    Comments: 7 pages, 4 figures, 2 tables, 1 algorithm
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recently, the Kernelized Correlation Filters tracker (KCF) achieved
    competitive performance and robustness in visual object tracking. On the other
    hand, visual trackers are not typically used in multiple object tracking. In
    this paper, we investigate how a robust visual tracker like KCF can improve
    multiple object tracking. Since KCF is a fast tracker, many can be used in
    parallel and still result in fast tracking. We build a multiple object tracking
    system based on KCF and background subtraction. Background subtraction is
    applied to extract moving objects and get their scale and size in combination
    with KCF outputs, while KCF is used for data association and to handle
    fragmentation and occlusion problems. As a result, KCF and background
    subtraction help each other to make tracking decisions at every frame.
    Sometimes the KCF outputs are the most trustworthy (e.g. during occlusion),
    while in other cases the background subtraction outputs are. To validate the
    effectiveness of our system, the algorithm is demonstrated on four urban video
    recordings from a standard dataset. Results show that our method is competitive
    with state-of-the-art trackers even if we use a much simpler data association
    step.

    Quantum spectral analysis: frequency at time

    Mario Mastriani
    Comments: 140 pages, 78 figures, 8 tables. arXiv admin note: text overlap with arXiv:0803.2507 by other authors
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A quantum time-dependent spectrum analysis, or simply quantum spectral
    analysis (QuSA), is presented in this work. It is based on the Schrodinger
    equation, a partial differential equation that describes how the quantum
    state of a non-relativistic physical system changes with time. In the
    classical world it is called frequency at time (FAT), and it is presented
    here in opposition to, and as a complement of, traditional
    frequency-dependent spectral analysis based on Fourier theory. Moreover, FAT
    is a metric that assesses the impact of the flanks of a signal on its
    frequency spectrum, which is not taken into account by Fourier theory, and
    even less so in real time. Furthermore, unlike all tools derived from
    Fourier theory (i.e., the continuous, discrete, fast, short-time, fractional
    and quantum Fourier transforms, as well as the Gabor transform), FAT has the
    following advantages: a) compact support with excellent energy output
    treatment, b) low computational cost, O(N) for signals and O(N^2) for
    images, c) no phase uncertainties (indeterminate phase for magnitude = 0),
    unlike the Discrete and Fast Fourier Transforms (DFT and FFT, respectively),
    d) among others. In fact, FAT constitutes one side of a triangle (which from
    now on is closed), the other sides being the original signal in time and
    spectral analysis based on Fourier theory. This completes a toolbox that is
    essential for all applications of Digital Signal Processing (DSP) and
    Digital Image Processing (DIP); in the latter, FAT even allows edge
    detection (called flank detection in the case of signals), denoising,
    despeckling, compression, and super-resolution of still images. Such
    applications include signals intelligence and imagery intelligence. On the
    other hand, we will present other DIP tools that are also derived from the
    Schrodinger equation.

    Gradients of Counterfactuals

    Mukund Sundararajan, Ankur Taly, Qiqi Yan
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    Gradients have been used to quantify feature importance in machine learning
    models. Unfortunately, in nonlinear deep networks, not only individual neurons
    but also the whole network can saturate, and as a result an important input
    feature can have a tiny gradient. We study various networks, and observe that
    this phenomenon is indeed widespread across many inputs.

    We propose to examine interior gradients, which are gradients of
    counterfactual inputs constructed by scaling down the original input. We apply
    our method to the GoogleNet architecture for object recognition in images, as
    well as a ligand-based virtual screening network with categorical features and
    an LSTM based language model for the Penn Treebank dataset. We visualize how
    interior gradients better capture feature importance. Furthermore, interior
    gradients are applicable to a wide variety of deep networks, and have the
    attribution property that the feature importance scores sum to the
    prediction score.

    Best of all, interior gradients can be computed just as easily as gradients.
    In contrast, previous methods are complex to implement, which hinders practical
    adoption.
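
    A minimal PyTorch sketch of interior gradients (names illustrative;
    `model(x)[..., target]` is assumed to be the scalar prediction score being
    attributed):

```python
import torch

def interior_gradients(model, x, target, steps=20):
    grads = []
    for alpha in torch.linspace(1.0 / steps, 1.0, steps):
        xs = (alpha * x).detach().requires_grad_(True)  # scaled-down input
        model(xs)[..., target].sum().backward()
        grads.append(xs.grad.clone())
    return torch.stack(grads)  # one gradient per scaling factor

# Averaging these gradients and multiplying elementwise by x yields an
# attribution whose components sum (approximately) to the prediction score.
```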


    Artificial Intelligence

    On interestingness measures of formal concepts

    Sergei O. Kuznetsov, Tatiana Makhalova
    Comments: 20 pages, 5 figures, 3 tables
    Subjects: Artificial Intelligence (cs.AI)

    Formal concepts and closed itemsets have proved to be of great importance
    for knowledge discovery, both as a tool for the concise representation of
    association rules and as a tool for clustering and constructing domain
    taxonomies and ontologies. Exponential explosion makes it difficult to
    consider the whole concept lattice arising from data, so one needs to select
    the most useful and interesting concepts. In this paper, interestingness
    measures of concepts are considered and compared with respect to various
    aspects, such as efficiency of computation, applicability to noisy data, and
    ranking correlation.

    Cognitive Discriminative Mappings for Rapid Learning

    Wen-Chieh Fang, Yi-ting Chiang
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Humans can learn concepts or recognize items from just a handful of examples,
    while machines require many more samples to perform the same task. In this
    paper, we build a computational model to investigate the possibility of this
    kind of rapid learning. The proposed method aims to improve the learning task
    of input from sensory memory by leveraging the information retrieved from
    long-term memory. We present a simple and intuitive technique called cognitive
    discriminative mappings (CDM) to explore the cognitive problem. First, CDM
    separates and clusters the data instances retrieved from long-term memory into
    distinct classes with a discrimination method in working memory when a sensory
    input triggers the algorithm. CDM then maps each sensory data instance to be as
    close as possible to the median point of the data group with the same class.
    The experimental results demonstrate that the CDM approach is effective for
    learning the discriminative features of supervised classifications with few
    training sensory input instances.

    The Data Complexity of Description Logic Ontologies

    Carsten Lutz, Frank Wolter
    Subjects: Artificial Intelligence (cs.AI)

    We analyze the data complexity of ontology-mediated querying where the
    ontologies are formulated in a description logic (DL) of the ALC family and
    queries are conjunctive queries, positive existential queries, or acyclic
    conjunctive queries. Our approach is non-uniform in the sense that we aim to
    understand the complexity of each single ontology instead of for all ontologies
    formulated in a certain language. While doing so, we quantify over the queries
    and are interested, for example, in the question whether all queries can be
    evaluated in polynomial time w.r.t. a given ontology. Our results include a
    PTime/coNP-dichotomy for ontologies of depth one in the description logic
    ALCFI, the equivalence of a PTime/coNP-dichotomy for ALC and ALCI-ontologies of
    unrestricted depth to the famous dichotomy conjecture for CSPs by Feder and
    Vardi, and the failure of the PTime/coNP-dichotomy for ALCF-ontologies.
    Regarding the latter DL, we also show that it is undecidable whether a given
    ontology admits PTime query evaluation.

    Proceedings of the First International Workshop on Argumentation in Logic Programming and Non-Monotonic Reasoning (Arg-LPNMR 2016)

    Sarah Alice Gaggl, Juan Carlos Nieves, Hannes Strass
    Subjects: Artificial Intelligence (cs.AI)

    This volume contains the papers presented at Arg-LPNMR 2016: First
    International Workshop on Argumentation in Logic Programming and Nonmonotonic
    Reasoning held on July 8-10, 2016 in New York City, NY.

    Combining observational and experimental data to find heterogeneous treatment effects

    Alexander Peysakhovich, Akos Lada
    Subjects: Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Every design choice will have different effects on different units. However,
    traditional A/B tests are often underpowered to identify these heterogeneous
    effects. This is especially true when the set of unit-level attributes is
    high-dimensional and our priors are weak about which particular covariates are
    important. However, there are often observational data sets available that are
    orders of magnitude larger. We propose a method to combine these two data
    sources to estimate heterogeneous treatment effects. First, we use
    observational time series data to estimate a mapping from covariates to
    unit-level effects. These estimates are likely biased but under some conditions
    the bias preserves unit-level relative rank orderings. If these conditions
    hold, we only need sufficient experimental data to identify a monotonic,
    one-dimensional transformation from observationally predicted treatment effects
    to real treatment effects. This reduces power demands greatly and makes the
    detection of heterogeneous effects much easier. As an application, we show how
    our method can be used to improve Facebook page recommendations.
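
    A hedged sketch of the two-stage procedure with scikit-learn (the model
    choices are illustrative stand-ins, not the authors' estimators): fit a
    flexible effect model on the large observational set, then fit a monotone
    one-dimensional correction on the small experiment.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.isotonic import IsotonicRegression

def fit_heterogeneous_effects(X_obs, tau_obs, X_exp, tau_exp):
    # Stage 1: possibly-biased unit-level effect predictions learned from
    # observational data (tau_obs: observationally estimated effects).
    base = GradientBoostingRegressor().fit(X_obs, tau_obs)
    # Stage 2: if the bias preserves rank order, a monotone 1-D map fitted on
    # the experimental estimates suffices to correct the scale.
    iso = IsotonicRegression(out_of_bounds="clip")
    iso.fit(base.predict(X_exp), tau_exp)
    return lambda X: iso.predict(base.predict(X))
```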

    Sentence Ordering using Recurrent Neural Networks

    Lajanugen Logeswaran, Honglak Lee, Dragomir Radev
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Modeling the structure of coherent texts is a task of great importance in
    NLP. The task of organizing a given set of sentences into a coherent order has
    been commonly used to build and evaluate models that understand such structure.
    In this work we propose an end-to-end neural approach based on the recently
    proposed set to sequence mapping framework to address the sentence ordering
    problem. Our model achieves state-of-the-art performance in the order
    discrimination task on two datasets widely used in the literature. We also
    consider a new interesting task of ordering abstracts from conference papers
    and research proposals and demonstrate strong performance against recent
    methods. Visualizing the sentence representations learned by the model shows
    that the model has captured high level logical structure in these paragraphs.
    The model also learns rich semantic sentence representations by learning to
    order texts, performing comparably to recent unsupervised representation
    learning methods in the sentence similarity and paraphrase detection tasks.

    The Neural Noisy Channel

    Lei Yu, Phil Blunsom, Chris Dyer, Edward Grefenstette, Tomas Kocisky
    Comments: ICLR 2017 submission
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    We formulate sequence to sequence transduction as a noisy channel decoding
    problem and use recurrent neural networks to parameterise the source and
    channel models. Unlike direct models, which can suffer from explaining-away
    effects during training, noisy channel models must produce outputs that explain
    their inputs, and their component models can be trained with not only paired
    training samples but also unpaired samples from the marginal output
    distribution. Using a latent variable to control how much of the conditioning
    sequence the channel model needs to read in order to generate a subsequent
    symbol, we obtain a tractable and effective beam search decoder. Experimental
    results on abstractive sentence summarisation, morphological inflection, and
    machine translation show that noisy channel models outperform direct models,
    and that they significantly benefit from increased amounts of unpaired output
    data that direct models cannot easily use.

    Learning from Untrusted Data

    Moses Charikar, Jacob Steinhardt, Gregory Valiant
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC); Cryptography and Security (cs.CR); Statistics Theory (math.ST)

    The vast majority of theoretical results in machine learning and statistics
    assume that the available training data is a reasonably reliable reflection of
    the phenomena to be learned or estimated. Similarly, the majority of machine
    learning and statistical techniques used in practice are brittle to the
    presence of large amounts of biased or malicious data. In this work we propose
    two novel frameworks in which to study estimation, learning, and optimization
    in the presence of significant fractions of arbitrary data. The first
    framework, which we term list-decodable learning, asks whether it is possible
    to return a list of answers, with the guarantee that at least one of them is
    accurate. For example, given a dataset of $n$ points for which an unknown
    subset of $\alpha n$ points are drawn from a distribution of interest, and
    no assumptions are made about the remaining $(1-\alpha)n$ points, is it
    possible to return a list of $\mathrm{poly}(1/\alpha)$ answers, one of which
    is correct? The second framework, which we term the semi-verified learning
    model, considers the extent to which a small dataset of trusted data (drawn
    from the distribution in question) can be leveraged to enable the accurate
    extraction of information from a much larger but untrusted dataset (of which
    only an $\alpha$-fraction is drawn from the distribution).

    We show strong positive results in both settings, and provide an algorithm
    for robust learning in a very general stochastic optimization setting. This
    general result has immediate implications for robust estimation in a number of
    settings, including for robustly estimating the mean of distributions with
    bounded second moments, robustly learning mixtures of such distributions, and
    robustly finding planted partitions in random graphs in which significant
    portions of the graph have been perturbed by an adversary.

    Normalizing Flows on Riemannian Manifolds

    Mevlana C. Gemici, Danilo Rezende, Shakir Mohamed
    Comments: 3 pages, 2 figures, Workshop on Bayesian Deep Learning at NIPS 2016
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Statistics Theory (math.ST)

    We consider the problem of density estimation on Riemannian manifolds.
    Density estimation on manifolds has many applications in fluid-mechanics,
    optics and plasma physics and it appears often when dealing with angular
    variables (such as used in protein folding, robot limbs, gene-expression) and
    in general directional statistics. In spite of the multitude of algorithms
    available for density estimation in Euclidean spaces $\mathbf{R}^n$ that
    scale to large $n$ (e.g. normalizing flows, kernel methods and variational
    approximations), most of these methods are not immediately suitable for
    density estimation on more general Riemannian manifolds. We revisit
    techniques related to homeomorphisms from differential geometry for
    projecting densities onto sub-manifolds and use them to generalize the idea
    of normalizing flows to more general Riemannian manifolds. The resulting
    algorithm is scalable, simple to implement and suitable for use with
    automatic differentiation. We demonstrate concrete examples of this method
    on the n-sphere $\mathbf{S}^n$.

    Dynamic Coattention Networks For Question Answering

    Caiming Xiong, Victor Zhong, Richard Socher
    Comments: 13 pages, 7 figures
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    Several deep learning models have been proposed for question answering.
    However, due to their single-pass nature, they have no way to recover from
    local maxima corresponding to incorrect answers. To address this problem, we
    introduce the Dynamic Coattention Network (DCN) for question answering. The DCN
    first fuses co-dependent representations of the question and the document in
    order to focus on relevant parts of both. Then a dynamic pointing decoder
    iterates over potential answer spans. This iterative procedure enables the
    model to recover from initial local maxima corresponding to incorrect answers.
    On the Stanford question answering dataset, a single DCN model improves the
    previous state of the art from 71.0% F1 to 75.9%, while a DCN ensemble obtains
    80.4% F1.
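
    The coattention step can be sketched in a few lines of PyTorch (shapes
    illustrative; the DCN wraps this in document/question encoders and the
    dynamic pointing decoder):

```python
import torch

def coattend(D, Q):
    """D: (batch, n, h) document encoding; Q: (batch, m, h) question encoding."""
    L = torch.bmm(D, Q.transpose(1, 2))            # (batch, n, m) affinity
    A_q = torch.softmax(L, dim=1)                  # attention over doc words
    A_d = torch.softmax(L.transpose(1, 2), dim=1)  # attention over question words
    C_q = torch.bmm(A_q.transpose(1, 2), D)        # (batch, m, h) summaries
    C_d = torch.bmm(A_d.transpose(1, 2), torch.cat([Q, C_q], dim=2))
    return C_d                                     # (batch, n, 2h) co-dependent rep
```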

    A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks

    Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, Richard Socher
    Comments: Under review as a conference paper at ICLR 2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    Transfer and multi-task learning have traditionally focused on either a
    single source-target pair or very few, similar tasks. Ideally, the linguistic
    levels of morphology, syntax and semantics would benefit each other by being
    trained in a single model. We introduce such a joint many-task model together
    with a strategy for successively growing its depth to solve increasingly
    complex tasks. All layers include shortcut connections to both word
    representations and lower-level task predictions. We use a simple
    regularization term to allow for optimizing all model weights to improve one
    task’s loss without exhibiting catastrophic interference of the other tasks.
    Our single end-to-end trainable model obtains state-of-the-art results on
    chunking, dependency parsing, semantic relatedness and textual entailment. It
    also performs competitively on POS tagging. Our dependency parsing layer relies
    only on a single feed-forward pass and does not require a beam search.

    Quasi-Recurrent Neural Networks

    James Bradbury, Stephen Merity, Caiming Xiong, Richard Socher
    Comments: Submitted to conference track at ICLR 2017
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

    Recurrent neural networks are a powerful tool for modeling sequential data,
    but the dependence of each timestep’s computation on the previous timestep’s
    output limits parallelism and makes RNNs unwieldy for very long sequences. We
    introduce quasi-recurrent neural networks (QRNNs), an approach to neural
    sequence modeling that alternates convolutional layers, which apply in parallel
    across timesteps, and a minimalist recurrent pooling function that applies in
    parallel across channels. Despite lacking trainable recurrent layers, stacked
    QRNNs have better predictive accuracy than stacked LSTMs of the same hidden
    size. Due to their increased parallelism, they are up to 16 times faster at
    train and test time. Experiments on language modeling, sentiment
    classification, and character-level neural machine translation demonstrate
    these advantages and underline the viability of QRNNs as a basic building block
    for a variety of sequence tasks.
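
    A minimal sketch of one QRNN layer (PyTorch, illustrative sizes): a
    convolution computes candidate values and forget gates for every timestep
    in parallel, leaving only a cheap elementwise recurrence ("f-pooling")
    inside the loop.

```python
import torch
import torch.nn as nn

class QRNNLayer(nn.Module):
    def __init__(self, d_in, d_out, k=2):
        super().__init__()
        self.conv = nn.Conv1d(d_in, 2 * d_out, k, padding=k - 1)
        self.d_out = d_out

    def forward(self, x):                        # x: (batch, d_in, time)
        zf = self.conv(x)[..., : x.size(2)]      # trim to keep causality
        z, f = zf.split(self.d_out, dim=1)
        z, f = torch.tanh(z), torch.sigmoid(f)
        h, hs = torch.zeros_like(z[..., 0]), []
        for t in range(z.size(2)):               # elementwise, no matmul here
            h = f[..., t] * h + (1 - f[..., t]) * z[..., t]
            hs.append(h)
        return torch.stack(hs, dim=2)            # (batch, d_out, time)
```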


    Information Retrieval

    Balotage in Argentina 2015, a sentiment analysis of tweets

    Daniel Robins, Fernando Emmanuel Frati, Jonatan Alvarez, Jose Texier
    Comments: in Spanish. Jornadas de Cloud Computing, La Plata – Argentina. 2016
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Social and Information Networks (cs.SI)

    The Twitter social network contains a large amount of information generated
    by its users. That information is composed of opinions and comments that may
    reflect trends in social behavior. We speak of a trend when it is possible
    to identify opinions and comments oriented in the same direction and shared
    by many people. To determine whether two or more written opinions share the
    same orientation, Natural Language Processing (NLP) techniques are used.
    This paper proposes a methodology for predicting trends reflected in Twitter
    through the use of NLP-based sentiment analysis functions on social
    behaviors. The selected case study was the 2015 presidential ballotage in
    Argentina, and a Big Data software architecture composed of a Vertica
    database with the component called Pulse was used. Through this analysis it
    was possible to detect trends in voting intentions with regard to the
    presidential candidates, achieving greater accuracy in prediction than that
    achieved with traditional polling systems.


    Computation and Language

    Unsupervised Pretraining for Sequence to Sequence Learning

    Prajit Ramachandran, Peter J. Liu, Quoc V. Le
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Sequence to sequence models are successful tools for supervised sequence
    learning tasks, such as machine translation. Despite their success, these
    models still require much labeled data and it is unclear how to improve them
    using unlabeled data, which is much less expensive to obtain. In this paper, we
    present simple changes that lead to a significant improvement in the accuracy
    of seq2seq models when the labeled set is small. Our method initializes the
    encoder and decoder of the seq2seq model with the trained weights of two
    language models, and then all weights are jointly fine-tuned with labeled data.
    An additional language modeling loss can be used to regularize the model during
    fine-tuning. We apply this method to low-resource tasks in machine translation
    and abstractive summarization and find that it significantly improves the
    subsequent supervised models. Our main finding is that the pretraining
    accelerates training and improves generalization of seq2seq models, achieving
    state-of-the-art results on the WMT English→German task. Our model
    obtains an improvement of 1.3 BLEU over the previous best models on both
    WMT’14 and WMT’15 English→German. Our ablation study shows that
    pretraining helps seq2seq models in different ways depending on the nature of
    the task: translation benefits from the improved generalization whereas
    summarization benefits from the improved optimization.

    Sentence Ordering using Recurrent Neural Networks

    Lajanugen Logeswaran, Honglak Lee, Dragomir Radev
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Modeling the structure of coherent texts is a task of great importance in
    NLP. The task of organizing a given set of sentences into a coherent order has
    been commonly used to build and evaluate models that understand such structure.
    In this work we propose an end-to-end neural approach based on the recently
    proposed set to sequence mapping framework to address the sentence ordering
    problem. Our model achieves state-of-the-art performance in the order
    discrimination task on two datasets widely used in the literature. We also
    consider a new interesting task of ordering abstracts from conference papers
    and research proposals and demonstrate strong performance against recent
    methods. Visualizing the sentence representations learned by the model shows
    that the model has captured high level logical structure in these paragraphs.
    The model also learns rich semantic sentence representations by learning to
    order texts, performing comparably to recent unsupervised representation
    learning methods in the sentence similarity and paraphrase detection tasks.

    Veracity Computing from Lexical Cues and Perceived Certainty Trends

    Uwe D. Reichel, Piroska Lendvai
    Comments: to appear in: Proc. 2nd Workshop on Noisy User-generated Text, Osaka, Japan, 2016
    Subjects: Computation and Language (cs.CL)

    We present a data-driven method for determining the veracity of a set of
    rumorous claims on social media data. Tweets from different sources pertaining
    to a rumor are processed on three levels: first, factuality values are assigned
    to each tweet based on four textual cue categories relevant for our journalism
    use case; these amalgamate speaker support in terms of polarity and commitment
    in terms of certainty and speculation. Next, the proportions of these lexical
    cues are utilized as predictors for tweet certainty in a generalized linear
    regression model. Subsequently, lexical cue proportions, predicted certainty,
    as well as their time course characteristics are used to compute veracity for
    each rumor in terms of the identity of the rumor-resolving tweet and its binary
    resolution value judgment. The system operates without access to
    extralinguistic resources. Evaluated on the data portion for which
    hand-labeled examples were available, it achieves a .74 F1-score on
    identifying rumor-resolving tweets and a .76 F1-score on predicting whether
    a rumor is resolved as true or false.
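
    The middle step, regressing tweet certainty on lexical cue proportions,
    might look like the following statsmodels sketch (variable names
    hypothetical; the abstract does not specify the GLM family, so a Gaussian
    family is assumed here):

```python
import statsmodels.api as sm

def fit_certainty_glm(cue_props, certainty):
    """cue_props: (n_tweets, n_cue_categories) proportions; certainty: targets."""
    X = sm.add_constant(cue_props)        # intercept + cue-category predictors
    return sm.GLM(certainty, X, family=sm.families.Gaussian()).fit()

# predictions = fit_certainty_glm(X_tr, y_tr).predict(sm.add_constant(X_te))
```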

    Contradiction Detection for Rumorous Claims

    Piroska Lendvai, Uwe D. Reichel
    Comments: to appear in: Proc. Extra-Propositional Aspects of Meaning (ExProM) in Computational Linguistics, Osaka, Japan, 2016
    Subjects: Computation and Language (cs.CL)

    The utilization of social media material in journalistic workflows is
    increasing, demanding automated methods for the identification of mis- and
    disinformation. Since textual contradiction across social media posts can be a
    signal of rumorousness, we seek to model how claims in Twitter posts are being
    textually contradicted. We identify two different contexts in which
    contradiction emerges: its broader form can be observed across independently
    posted tweets and its more specific form in threaded conversations. We define
    how the two scenarios differ in terms of central elements of argumentation:
    claims and conversation structure. We design and evaluate models for the two
    scenarios uniformly as 3-way Recognizing Textual Entailment tasks in order to
    represent claims and conversation structure implicitly in a generic inference
    model, while previous studies used explicit or no representation of these
    properties. To address noisy text, our classifiers use simple similarity
    features derived from the string and part-of-speech level. Corpus statistics
    reveal distribution differences for these features in contradictory as opposed
    to non-contradictory tweet relations, and the classifiers yield state of the
    art performance.

    The Neural Noisy Channel

    Lei Yu, Phil Blunsom, Chris Dyer, Edward Grefenstette, Tomas Kocisky
    Comments: ICLR 2017 submission
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    We formulate sequence to sequence transduction as a noisy channel decoding
    problem and use recurrent neural networks to parameterise the source and
    channel models. Unlike direct models, which can suffer from explaining-away
    effects during training, noisy channel models must produce outputs that explain
    their inputs, and their component models can be trained with not only paired
    training samples but also unpaired samples from the marginal output
    distribution. Using a latent variable to control how much of the conditioning
    sequence the channel model needs to read in order to generate a subsequent
    symbol, we obtain a tractable and effective beam search decoder. Experimental
    results on abstractive sentence summarisation, morphological inflection, and
    machine translation show that noisy channel models outperform direct models,
    and that they significantly benefit from increased amounts of unpaired output
    data that direct models cannot easily use.

    Discriminative Acoustic Word Embeddings: Recurrent Neural Network-Based Approaches

    Shane Settle, Karen Livescu
    Comments: To appear at SLT 2016
    Subjects: Computation and Language (cs.CL)

    Acoustic word embeddings — fixed-dimensional vector representations of
    variable-length spoken word segments — have begun to be considered for tasks
    such as speech recognition and query-by-example search. Such embeddings can be
    learned discriminatively so that they are similar for speech segments
    corresponding to the same word, while being dissimilar for segments
    corresponding to different words. Recent work has found that acoustic word
    embeddings can outperform dynamic time warping on query-by-example search and
    related word discrimination tasks. However, the space of embedding models and
    training approaches is still relatively unexplored. In this paper we present
    new discriminative embedding models based on recurrent neural networks (RNNs).
    We consider training losses that have been successful in prior work, in
    particular a cross entropy loss for word classification and a contrastive loss
    that explicitly aims to separate same-word and different-word pairs in a
    “Siamese network” training setting. We find that both classifier-based and
    Siamese RNN embeddings improve over previously reported results on a word
    discrimination task, with Siamese RNNs outperforming classification models. In
    addition, we present analyses of the learned embeddings and the effects of
    variables such as dimensionality and network structure.
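
    The contrastive ("Siamese") objective mentioned above can be sketched
    briefly (PyTorch; the margin and cosine-distance choices here are
    illustrative, and `e1`, `e2` would come from the shared RNN encoder):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(e1, e2, same, margin=0.5):
    """e1, e2: (batch, d) embeddings; same: 1.0 for same-word pairs, else 0.0."""
    d = 1 - F.cosine_similarity(e1, e2)   # cosine distance per pair
    # Pull same-word pairs together; push different-word pairs past the margin.
    return (same * d + (1 - same) * torch.clamp(margin - d, min=0)).mean()
```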

    A Surrogate-based Generic Classifier for Chinese Movie Reviews

    Yufeng Ma, Long Xia, Wenqi Shen, Weiguo Fan
    Subjects: Computation and Language (cs.CL)

    With the emergence of various online video platforms such as YouTube, Youku
    and LeTV, online movie reviews are becoming increasingly important for both
    movie viewers and producers. As a result, automatically classifying reviews
    according to different requirements has evolved into a popular research
    topic that is very relevant to daily life. In this paper, we focus on
    reviews of hot TV series in China and successfully train generic classifiers
    based on 8 predefined categories. The experimental results show promising
    performance and effective generalization to different TV series.

    Dependency Sensitive Convolutional Neural Networks for Modeling Sentences and Documents

    Rui Zhang, Honglak Lee, Dragomir Radev
    Comments: NAACL2016
    Subjects: Computation and Language (cs.CL)

    The goal of sentence and document modeling is to accurately represent the
    meaning of sentences and documents for various Natural Language Processing
    tasks. In this work, we present Dependency Sensitive Convolutional Neural
    Networks (DSCNN) as a general-purpose classification system for both sentences
    and documents. DSCNN hierarchically builds textual representations by
    processing pretrained word embeddings via Long Short-Term Memory networks and
    subsequently extracting features with convolution operators. Compared with
    existing recursive neural models with tree structures, DSCNN does not rely on
    parsers and expensive phrase labeling, and thus is not restricted to
    sentence-level tasks. Moreover, unlike other CNN-based models that analyze
    sentences locally by sliding windows, our system captures both the dependency
    information within each sentence and relationships across sentences in the same
    document. Experiment results demonstrate that our approach is achieving
    state-of-the-art performance on several tasks, including sentiment analysis,
    question type classification, and subjectivity classification.
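
    A minimal sketch of the DSCNN pipeline (PyTorch, illustrative sizes): an
    LSTM produces dependency-sensitive states over pretrained embeddings, and a
    convolution plus max-pooling over time extracts the features fed to the
    classifier.

```python
import torch
import torch.nn as nn

class DSCNN(nn.Module):
    def __init__(self, d_emb=300, d_hid=128, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(d_emb, d_hid, batch_first=True)
        self.conv = nn.Conv1d(d_hid, 100, kernel_size=3, padding=1)
        self.out = nn.Linear(100, n_classes)

    def forward(self, emb):                        # emb: (batch, time, d_emb)
        h, _ = self.lstm(emb)                      # dependency-sensitive states
        feats = torch.relu(self.conv(h.transpose(1, 2)))
        return self.out(feats.max(dim=2).values)  # max-pool over time
```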

    Cruciform: Solving Crosswords with Natural Language Processing

    Dragomir Radev, Rui Zhang, Steve Wilson, Derek Van Assche, Henrique Spyra Gubert, Alisa Krivokapic, MeiXing Dong, Chongruo Wu, Spruce Bondera, Luke Brandl, Jeremy Dohmann
    Subjects: Computation and Language (cs.CL)

    Crossword puzzles are popular word games that require not only a large
    vocabulary, but also a broad knowledge of topics. Answering each clue is a
    natural language task on its own as many clues contain nuances, puns, or
    counter-intuitive word definitions. Additionally, it can be extremely difficult
    to ascertain definitive answers without the constraints of the crossword grid
    itself. This task is challenging for both humans and computers. We describe
    here a new crossword solving system, Cruciform. We employ a group of natural
    language components, each of which returns a list of candidate words with
    scores when given a clue. These lists are used in conjunction with the fill
    intersections in the puzzle grid to formulate a constraint satisfaction
    problem, in a manner similar to the one used in the Dr. Fill system. We
    describe the results of several of our experiments with the system.

    A Convolutional Encoder Model for Neural Machine Translation

    Jonas Gehring, Michael Auli, David Grangier, Yann N. Dauphin
    Comments: 13 pages
    Subjects: Computation and Language (cs.CL)

    The prevalent approach to neural machine translation relies on bi-directional
    LSTMs to encode the source sentence. In this paper we present a faster and
    conceptually simpler architecture based on a succession of convolutional
    layers. This allows us to encode the entire source sentence simultaneously,
    in contrast to recurrent networks, whose computation is constrained by temporal
    dependencies. We achieve a new state-of-the-art on WMT’16 English-Romanian
    translation and outperform several recently published results on the WMT’15
    English-German task. We also achieve almost the same accuracy as a very deep
    LSTM setup on WMT’14 English-French translation. Our convolutional encoder
    speeds up CPU decoding by more than a factor of two at the same or higher
    accuracy than a strong bi-directional LSTM baseline.

    Balotage in Argentina 2015, a sentiment analysis of tweets

    Daniel Robins, Fernando Emmanuel Frati, Jonatan Alvarez, Jose Texier
    Comments: in Spanish. Jornadas de Cloud Computing, La Plata – Argentina. 2016
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Social and Information Networks (cs.SI)

    The Twitter social network contains a large amount of information generated
    by its users. That information is composed of opinions and comments that may
    reflect trends in social behavior. We speak of a trend when it is possible
    to identify opinions and comments oriented in the same direction and shared
    by many people. To determine whether two or more written opinions share the
    same orientation, Natural Language Processing (NLP) techniques are used.
    This paper proposes a methodology for predicting trends reflected in Twitter
    through the use of NLP-based sentiment analysis functions on social
    behaviors. The selected case study was the 2015 presidential ballotage in
    Argentina, and a Big Data software architecture composed of a Vertica
    database with the component called Pulse was used. Through this analysis it
    was possible to detect trends in voting intentions with regard to the
    presidential candidates, achieving greater accuracy in prediction than that
    achieved with traditional polling systems.


    Distributed, Parallel, and Cluster Computing

    Optimization and parallelization of B-spline based orbital evaluations in QMC on multi/many-core shared memory processors

    Amrita Mathuriya, Ye Luo, Anouar Benali, Luke Shulenburger, Jeongnim Kim
    Comments: 11 pages, 10 figures, 4 tables
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    B-spline based orbital representations are widely used in Quantum Monte Carlo
    (QMC) simulations of solids, historically taking as much as 50% of the total
    run time. Random accesses to a large four-dimensional array make it challenging
    to efficiently utilize caches and wide vector units of modern CPUs. We present
    node-level optimizations of B-spline evaluations on multi/many-core shared
    memory processors. To increase SIMD efficiency and bandwidth utilization, we
    first apply data layout transformation from array-of-structures to
    structure-of-arrays (SoA). Then by blocking SoA objects, we optimize cache
    reuse and get sustained throughput for a range of problem sizes. We implement
    efficient nested threading in B-spline orbital evaluation kernels, paving the
    way towards enabling strong scaling of QMC simulations. These optimizations
    are portable across four distinct cache-coherent architectures and result in
    up to 5.6x performance enhancements on the Intel Xeon Phi processor 7250P
    (KNL), 5.7x on the Intel Xeon Phi coprocessor 7120P, 10x on an Intel Xeon
    processor E5v4 CPU, and 9.5x on a BlueGene/Q processor. Our nested threading
    implementation shows nearly ideal parallel efficiency on KNL up to 16
    threads. We employ roofline performance analysis to model the impacts of our
    optimizations. This work, combined with our current efforts to optimize
    other QMC kernels, results in a greater than 4.5x speedup of miniQMC on KNL.
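
    The central layout transformation can be illustrated in a few lines of
    numpy (a sketch of the idea, not the production QMC code): in
    array-of-structures form each field is strided by the whole record, while
    structure-of-arrays storage makes every field unit-stride and therefore
    SIMD- and cache-friendly.

```python
import numpy as np

n = 1024
# Array of structures: value plus gradient components per orbital point.
aos = np.zeros(n, dtype=[('v', 'f8'), ('gx', 'f8'), ('gy', 'f8'), ('gz', 'f8')])
assert aos['gx'].strides == (32,)   # each field strided by the 32-byte record

# Structure of arrays: one contiguous array per field, unit-stride loads.
soa = {name: np.ascontiguousarray(aos[name]) for name in aos.dtype.names}
assert soa['gx'].strides == (8,)
```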

    Multidimensional Asymptotic Consensus in Dynamic Networks

    Bernadette Charron-Bost, Matthias Függer, Thomas Nowak
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (cs.SY)

    We study the problem of asymptotic consensus as it occurs in a wide range of
    applications in both man-made and natural systems. In particular, we study
    systems with directed communication graphs that may change over time.

    We recently proposed a new family of convex combination algorithms in
    dimension one whose weights depend on the received values and not only on the
    communication topology. Here, we extend this approach to arbitrarily high
    dimensions by introducing two new algorithms: the ExtremePoint and the Centroid
    algorithm. Contrary to classical convex combination algorithms, both have
    component-wise contraction rates that are constant in the number of agents.
    Paired with a speed-up technique for convex combination algorithms, we get a
    convergence time linear in the number of agents, which is optimal.

    Besides their respective contraction rates, the two algorithms differ in the
    fact that the Centroid algorithm’s update rule is independent of any coordinate
    system while the ExtremePoint algorithm implicitly assumes a common agreed-upon
    coordinate system among agents. The latter assumption may be realistic in some
    man-made multi-agent systems but is highly questionable in systems designed for
    the modelling of natural phenomena.

    Finally we prove that our new algorithms also achieve asymptotic consensus
    under very weak connectivity assumptions, provided that agent interactions are
    bidirectional.
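
    To make the flavor of such convex-combination updates concrete, here is a
    toy centroid-style step (a hedged illustration only; the paper's actual
    ExtremePoint and Centroid update rules and their contraction analysis are
    not reproduced):

        import numpy as np

        def centroid_step(own, received):
            # One convex-combination update: move halfway toward the centroid
            # of all values seen this round; the result stays in the convex
            # hull of the inputs, as convex combination algorithms require.
            pts = np.vstack([own] + received)
            return 0.5 * own + 0.5 * pts.mean(axis=0)

        # Three agents in dimension two, fully connected and bidirectional.
        values = [np.array([0.0, 0.0]), np.array([1.0, 0.0]),
                  np.array([0.0, 1.0])]
        for _ in range(50):
            values = [centroid_step(v, [u for u in values if u is not v])
                      for v in values]
        print(values)  # all agents approach a common point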

    Memory layout in GPU implementation of lattice Boltzmann method for sparse 3D geometries

    Tadeusz Tomczak, Roman G. Szafran
    Comments: 14 pages, 14 figures, sent to Computer Physics Comunications
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)

    We describe a high-performance implementation of the lattice Boltzmann method
    (LBM) for sparse 3D geometries on graphic processors (GPU). The main
    contribution of this work is a data layout that makes it possible to minimise the number
    of redundant memory transactions during the propagation step of LBM. We show
    that by using a uniform mesh of small three-dimensional tiles and a careful
    data placement it is possible to utilise more than 70% of maximum theoretical
    GPU memory bandwidth for D3Q19 lattice and double precision numbers. The
    performance of our implementation is thoroughly examined and compared with
    other GPU implementations of LBM. The proposed method performs the best for
    sparse geometries with good spatial locality.
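
    A hedged sketch of the tiling idea (the tile size, and everything else
    below, is illustrative; the paper's GPU data placement is more involved):

        import numpy as np

        TILE = 4  # tile edge length

        def build_tiles(fluid_mask):
            # Group fluid nodes of a sparse geometry into dense TILE^3 bricks
            # and keep only bricks containing at least one fluid node; dense
            # storage inside a brick keeps neighbor accesses coalesced.
            tiles = {}
            for x, y, z in zip(*np.nonzero(fluid_mask)):
                key = (x // TILE, y // TILE, z // TILE)
                if key not in tiles:
                    tiles[key] = np.zeros((TILE, TILE, TILE), dtype=bool)
                tiles[key][x % TILE, y % TILE, z % TILE] = True
            return tiles

        mask = np.zeros((16, 16, 16), dtype=bool)
        mask[2:6, 2:6, 2:6] = True  # a small sparse fluid region
        print(len(build_tiles(mask)), "non-empty tiles")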

    Automated Application Offloading through Ant-inspired Decision-Making

    Roya Golchay (CITI), Frédéric Le Mouël (CITI), Julien Ponge (CITI), Nicolas Stouls (CITI)
    Journal-ref: Proceedings of the 13th International Conference on New
    Technologies in Distributed Systems (NOTERE’2016), Jul 2016, Paris, France
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)

    The explosive trend of smartphone usage as the most effective and convenient
    communication tool of human life in recent years makes developers build ever
    more complex smartphone applications. Gaming, navigation, video editing,
    augmented reality, and speech recognition applications require considerable
    computational power and energy. Although smartphones have a wide range of
    capabilities – GPS, WiFi, cameras – their inherent limitations – frequent
    disconnections, mobility – and significant constraints – size, lower weight,
    longer battery life – make it difficult to exploit their full potential to
    run complex applications. Several research works have proposed solutions in
    the application offloading domain, but few concern the highly changing
    properties of the environment. To address these issues, we realize an
    automated application offloading middleware, ACOMMA, with a dynamic and
    re-adaptable decision-making engine. The decision engine of ACOMMA is based
    on an ant-inspired algorithm.
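
    A toy pheromone-based chooser conveys the ant-inspired flavor (purely
    illustrative; ACOMMA's actual decision-making algorithm and parameters are
    not reproduced here):

        import random

        pheromone = {"local": 1.0, "remote": 1.0}
        EVAPORATION = 0.1

        def choose_target():
            # Sample an execution target with probability proportional to its
            # pheromone level.
            total = sum(pheromone.values())
            return "local" if random.uniform(0, total) < pheromone["local"] \
                else "remote"

        def reinforce(target, observed_cost):
            # Evaporate all trails, then deposit pheromone inversely
            # proportional to the observed execution cost.
            for k in pheromone:
                pheromone[k] *= 1 - EVAPORATION
            pheromone[target] += 1.0 / (observed_cost + 1e-9)

        for _ in range(100):
            t = choose_target()
            cost = 0.2 if t == "remote" else 1.0  # pretend remote is cheaper
            reinforce(t, cost)
        print(pheromone)  # mass shifts toward the cheaper target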

    An Optimal Ancestry Labeling Scheme with Applications to XML Trees and Universal Posets

    Pierre Fraigniaud (IRIF), Amos Korman (IRIF)
    Journal-ref: Journal of the ACM, ACM, 2016, 63, pp.1 – 31
    Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

    In this paper we solve the ancestry-labeling scheme problem which aims at
    assigning the shortest possible labels (bit strings) to nodes of rooted trees,
    so that ancestry queries between any two nodes can be answered by inspecting
    their assigned labels only. This problem was introduced more than twenty years
    ago by Kannan et al. [STOC ’88], and is among the most well-studied problems in
    the field of informative labeling schemes. We construct an ancestry-labeling
    scheme for n-node trees with label size (\log_2 n + O(\log \log n)) bits,
    thus matching the (\log_2 n + \Omega(\log \log n)) bits lower bound given by
    Alstrup et al. [SODA ’03]. Our scheme is based on a simplified ancestry
    scheme that operates extremely well on a restricted set of trees. In
    particular, for the set of n-node trees with depth at most d, the simplified
    ancestry scheme enjoys label size of (\log_2 n + 2 \log_2 d + O(1)) bits.
    Since the depth of most XML trees is at most some small constant, such an
    ancestry scheme may be of practical use. In addition, we also obtain an
    adjacency-labeling scheme that labels n-node trees of depth d with labels of
    size (\log_2 n + 3 \log_2 d + O(1)) bits. All our schemes assign the labels
    in linear time, and guarantee that any query can be answered in constant
    time. Finally, our ancestry scheme finds applications to the construction of
    small universal partially ordered sets (posets). Specifically, for any fixed
    integer k, it enables the construction of a universal poset of size
    (\tilde{O}(n^k)) for the family of n-element posets with tree-dimension at
    most k. Up to lower order terms, this bound is tight thanks to a lower bound
    of (n^{k-o(1)}) due to Alon and Scheinerman [Order ’88].
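
    For contrast, the folklore interval scheme below achieves ancestry labels
    of roughly (2 \log_2 n) bits, which the paper's scheme improves to
    (\log_2 n + O(\log \log n)); this sketch is the standard construction, not
    the paper's:

        def interval_labels(tree, root):
            # Label each node with its DFS interval [first, last]; u is an
            # ancestor of v iff u's interval contains v's first timestamp.
            labels, clock = {}, 0
            def dfs(u):
                nonlocal clock
                first = clock
                clock += 1
                for child in tree.get(u, []):
                    dfs(child)
                labels[u] = (first, clock - 1)
            dfs(root)
            return labels

        tree = {"r": ["a", "b"], "a": ["c"]}
        lab = interval_labels(tree, "r")
        is_ancestor = lambda u, v: lab[u][0] <= lab[v][0] <= lab[u][1]
        print(is_ancestor("r", "c"), is_ancestor("a", "b"))  # True False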

    GPU-Based Parallel Integration of Large Numbers of Independent ODE Systems

    Kyle E Niemeyer, Chih-Jen Sung
    Comments: 21 pages, 2 figures
    Journal-ref: Numerical Computations with GPUs, Ch. 8 (2014) 159-182. V
    Kindratenko (Ed.)
    Subjects: Mathematical Software (cs.MS); Distributed, Parallel, and Cluster Computing (cs.DC); Computational Physics (physics.comp-ph)

    The task of integrating a large number of independent ODE systems arises in
    various scientific and engineering areas. For nonstiff systems, common explicit
    integration algorithms can be used on GPUs, where individual GPU threads
    concurrently integrate independent ODEs with different initial conditions or
    parameters. One example is the fifth-order adaptive Runge-Kutta-Cash-Karp
    (RKCK) algorithm. In the case of stiff ODEs, standard explicit algorithms
    require impractically small time-step sizes for stability reasons, and implicit
    algorithms are therefore commonly used instead to allow larger time steps and
    reduce the computational expense. However, typical high-order implicit
    algorithms based on backwards differentiation formulae (e.g., VODE, LSODE)
    involve complex logical flow that causes severe thread divergence when
    implemented on GPUs, limiting the performance. Therefore, alternate algorithms
    are needed. A GPU-based Runge-Kutta-Chebyshev (RKC) algorithm can handle
    moderate levels of stiffness and, based on results shown in the literature,
    performs significantly faster than both an equivalent CPU version and a
    CPU-based implicit algorithm (VODE). In this chapter, we present the mathematical
    background, implementation details, and source code for the RKCK and RKC
    algorithms for use integrating large numbers of independent systems of ODEs on
    GPUs. In addition, brief performance comparisons are shown for each algorithm,
    demonstrating the potential benefit of moving to GPU-based ODE integrators.
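
    The one-thread-per-system pattern can be mimicked on a CPU by batching
    systems along an array axis. The sketch below uses a fixed-step classical
    RK4 stage structure rather than the adaptive RKCK scheme of the chapter,
    purely to show the independence across systems:

        import numpy as np

        def rk4_step(f, y, t, h):
            # One explicit Runge-Kutta step applied simultaneously to a batch
            # of independent ODE systems (one row of y per system).
            k1 = f(t, y)
            k2 = f(t + h / 2, y + h / 2 * k1)
            k3 = f(t + h / 2, y + h / 2 * k2)
            k4 = f(t + h, y + h * k3)
            return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

        # dy/dt = -lambda * y with a different rate per system.
        lam = np.linspace(0.5, 5.0, 8).reshape(-1, 1)
        f = lambda t, y: -lam * y
        y = np.ones((8, 1))
        for step in range(100):
            y = rk4_step(f, y, step * 0.01, 0.01)
        print(y.ravel())  # each system decays at its own rate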


    Learning

    Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders

    Nat Dilokthanakul, Pedro A.M. Mediano, Marta Garnelo, Matthew C.H. Lee, Hugh Salimbeni, Kai Arulkumaran, Murray Shanahan
    Comments: 12 pages, 4 figures, Under review as a conference paper at ICLR 2017
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    We study a variant of the variational autoencoder model with a Gaussian
    mixture as a prior distribution, with the goal of performing unsupervised
    clustering through deep generative models. We observe that the standard
    variational approach in these models is unsuited for unsupervised clustering,
    and mitigate this problem by leveraging a principled information-theoretic
    regularisation term known as consistency violation. Adding this term to the
    standard variational optimisation objective yields networks with both
    meaningful internal representations and well-defined clusters. We demonstrate
    the performance of this scheme on synthetic data, MNIST and SVHN, showing that
    the obtained clusters are distinct and interpretable, and result in higher
    performance on unsupervised clustering than previous approaches.
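
    For orientation, a standard way to write a VAE with a Gaussian-mixture
    prior is shown below; this is the generic textbook form, not necessarily
    the exact factorization used in the paper:

        % Gaussian-mixture prior replacing the usual N(0, I) prior,
        % and the resulting evidence lower bound (ELBO)
        p(z) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(z; \mu_k, \Sigma_k)
        \mathcal{L}(x) = \mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)]
                         - \mathrm{KL}(q_\phi(z|x) \,\|\, p(z))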

    Gradients of Counterfactuals

    Mukund Sundararajan, Ankur Taly, Qiqi Yan
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    Gradients have been used to quantify feature importance in machine learning
    models. Unfortunately, in nonlinear deep networks, not only individual neurons
    but also the whole network can saturate, and as a result an important input
    feature can have a tiny gradient. We study various networks, and observe that
    this phenomenon is indeed widespread across many inputs.

    We propose to examine interior gradients, which are gradients of
    counterfactual inputs constructed by scaling down the original input. We apply
    our method to the GoogleNet architecture for object recognition in images, as
    well as a ligand-based virtual screening network with categorical features and
    an LSTM based language model for the Penn Treebank dataset. We visualize how
    interior gradients better capture feature importance. Furthermore, interior
    gradients are applicable to a wide variety of deep networks, and have the
    attribution property that the feature importance scores sum to the
    prediction score.

    Best of all, interior gradients can be computed just as easily as gradients.
    In contrast, previous methods are complex to implement, which hinders practical
    adoption.
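
    A hedged sketch of the construction: scale the input toward zero and
    record the gradient at each scaled counterfactual (the function names and
    the uniform scaling schedule are illustrative):

        import numpy as np

        def interior_gradients(grad_fn, x, steps=10):
            # Gradients of counterfactual inputs obtained by scaling down x;
            # grad_fn(x) returns dF/dx for the model F.
            alphas = np.linspace(1.0 / steps, 1.0, steps)
            return [grad_fn(alpha * x) for alpha in alphas]

        # Toy saturating model F(x) = tanh(3x): its gradient is tiny at x = 2,
        # yet the interior points reveal the feature's importance.
        F_grad = lambda x: 3.0 / np.cosh(3.0 * x) ** 2
        grads = interior_gradients(F_grad, np.array(2.0))
        print(grads[0], grads[-1])  # large interior gradient, tiny endpoint one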

    PixelSNE: Visualizing Fast with Just Enough Precision via Pixel-Aligned Stochastic Neighbor Embedding

    Minjeong Kim, Minsuk Choi, Sunwoong Lee, Jian Tang, Haesun Park, Jaegul Choo
    Subjects: Learning (cs.LG)

    Embedding and visualizing large-scale high-dimensional data in a
    two-dimensional space is an important problem since such visualization can
    reveal deep insights out of complex data. Most of the existing embedding
    approaches, however, run at an excessively high precision, ignoring the fact
    that at the end, embedding outputs are converted into coarse-grained discrete
    pixel coordinates in a screen space. Motivated by such an observation and
    directly considering pixel coordinates in an embedding optimization process, we
    accelerate Barnes-Hut tree-based t-distributed stochastic neighbor embedding
    (BH-SNE), known as a state-of-the-art 2D embedding method, and propose a novel
    method called PixelSNE, a highly-efficient, screen resolution-driven 2D
    embedding method with a linear computational complexity in terms of the number
    of data items. Our experimental results show that PixelSNE runs
    significantly faster than BH-SNE while maintaining minimal degradation in
    the embedding quality. Finally, the source code of our method is publicly
    available at this https URL
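
    The observation motivating the method can be made concrete by snapping
    continuous embedding coordinates to the screen's pixel grid (the rounding
    below only illustrates the precision argument; it is not the paper's
    optimizer):

        import numpy as np

        def to_pixels(emb, width=1920, height=1080):
            # Normalize 2-D embedding coordinates and round them to the
            # coarse integer pixel grid they will eventually be drawn on.
            mn, mx = emb.min(axis=0), emb.max(axis=0)
            unit = (emb - mn) / (mx - mn)
            return np.round(unit * np.array([width - 1, height - 1])).astype(int)

        emb = np.random.default_rng(0).normal(size=(1000, 2))
        print(to_pixels(emb)[:3])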

    An Efficient Approach to Boosting Performance of Deep Spiking Network Training

    Seongsik Park, Sang-gil Lee, Hyunha Nam, Sungroh Yoon
    Comments: 9 pages, 5 figures
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Nowadays deep learning is dominating the field of machine learning with
    state-of-the-art performance in various application areas. Recently, spiking
    neural networks (SNNs) have been attracting a great deal of attention, notably
    owing to their power efficiency, which can potentially allow us to implement a
    low-power deep learning engine suitable for real-time/mobile applications.
    However, implementing SNN-based deep learning remains challenging, especially
    gradient-based training of SNNs by error backpropagation. We cannot simply
    propagate errors through SNNs in the conventional way because SNNs process
    discrete data in the form of a series. Consequently, most of the previous
    studies employ a workaround technique, which first trains a conventional
    weighted-sum deep neural network and then maps the learned weights to the
    SNN under training, instead of training SNN parameters directly. To
    eliminate this workaround, a new class of SNN named deep spiking networks
    (DSNs) has recently been proposed, which can be trained directly (without a
    mapping from conventional deep networks) by error backpropagation with
    stochastic gradient descent. In this paper, we show that the initialization of
    the membrane potential on the backward path is an important step in DSN
    training, through diverse experiments performed under various conditions.
    Furthermore, we propose a simple and efficient method that can improve DSN
    training by controlling the initial membrane potential on the backward path. In
    our experiments, adopting the proposed approach allowed us to boost the
    performance of DSN training in terms of converging time and accuracy.
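
    For readers unfamiliar with the state variable being initialized, a generic
    leaky integrate-and-fire update illustrates what a membrane potential is
    (this is a textbook model, not the paper's DSN equations):

        import numpy as np

        def lif_forward(inputs, v0=0.0, leak=0.9, threshold=1.0):
            # The membrane potential v integrates leaky input; a spike is
            # emitted and v is reset whenever v crosses the threshold.
            v, spikes = v0, []
            for x in inputs:
                v = leak * v + x
                if v >= threshold:
                    spikes.append(1)
                    v -= threshold  # soft reset
                else:
                    spikes.append(0)
            return spikes

        print(lif_forward(np.full(10, 0.3)))  # a discrete series of spikes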

    Learning Dynamic Programming with Split-Merge Networks

    Alex Noway, Joan Bruna
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We consider the learning of algorithmic tasks by mere observation of
    input-output pairs. Rather than studying this as a black-box discrete
    regression problem with no assumption whatsoever on the input-output mapping,
    we concentrate on tasks that are amenable to the principle of divide and
    conquer, and study what are its implications in terms of learning.

    This principle creates a powerful inductive bias that we exploit with neural
    architectures that are defined recursively, by learning two scale-invariant
    atomic operators: how to split a given input into two disjoint sets, and
    how to merge two partially solved tasks into a larger partial solution.
    The scale invariance creates parameter sharing across all stages of the
    architecture, and the dynamic design creates architectures whose complexity can
    be tuned in a differentiable manner.

    As a result, our model is trained by backpropagation not only to minimize the
    errors at the output, but also to do so as efficiently as possible, by
    enforcing shallower computation graphs. Moreover, thanks to the scale
    invariance, the model can be trained with only input/output pairs,
    removing the need to know oracle intermediate split and merge decisions. As it
    turns out, accuracy and complexity are not independent qualities, and we verify
    empirically that when the learnt complexity matches the underlying complexity
    of the task, this results in higher accuracy and better generalization in two
    paradigmatic problems: sorting and finding planar convex hulls.
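
    The recursive skeleton the architecture instantiates can be written down
    directly; here split and merge are hand-coded stand-ins (recovering merge
    sort), whereas the paper learns both operators from input/output pairs:

        def divide_and_conquer(xs, split, merge):
            # Recursively split, solve the halves, and merge the partial
            # solutions; the paper's model learns split and merge.
            if len(xs) <= 1:
                return xs
            left, right = split(xs)
            return merge(divide_and_conquer(left, split, merge),
                         divide_and_conquer(right, split, merge))

        split = lambda xs: (xs[:len(xs) // 2], xs[len(xs) // 2:])

        def merge(a, b):
            out = []
            while a and b:
                out.append(a.pop(0) if a[0] <= b[0] else b.pop(0))
            return out + a + b

        print(divide_and_conquer([3, 1, 4, 1, 5, 9, 2, 6], split, merge))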

    Neural Taylor Approximations: Convergence and Exploration in Rectifier Networks

    David Balduzzi, Brian McWilliams, Tony Butler-Yeoman
    Comments: 13 pages, 6 figures
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Modern convolutional networks, incorporating rectifiers and max-pooling, are
    neither smooth nor convex. Standard guarantees therefore do not apply.
    Nevertheless, methods from convex optimization such as gradient descent and
    Adam are widely used as building blocks for deep learning algorithms. This
    paper provides the first convergence guarantee applicable to modern convnets.
    The guarantee matches a lower bound for convex nonsmooth functions. The key
    technical tool is the neural Taylor approximation — a straightforward
    application of Taylor expansions to neural networks — and the associated
    Taylor loss. Experiments on a range of optimizers, layers, and tasks provide
    evidence that the analysis accurately captures the dynamics of neural
    optimization.

    The second half of the paper applies the Taylor approximation to isolate the
    main difficulty in training rectifier nets: that gradients are shattered. We
    investigate the hypothesis that, by exploring the space of activation
    configurations more thoroughly, adaptive optimizers such as RMSProp and Adam
    are able to converge to better solutions.
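
    As the abstract states, the key tool is a straightforward Taylor expansion;
    in generic first-order form (our notation, not necessarily the paper's),
    the loss at the next iterate is approximated by its linearization at the
    current one:

        % first-order (neural) Taylor approximation of the loss f around w_t
        f(w_{t+1}) \approx f(w_t) + \langle \nabla f(w_t), \, w_{t+1} - w_t \rangle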

    Learning from Untrusted Data

    Moses Charikar, Jacob Steinhardt, Gregory Valiant
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC); Cryptography and Security (cs.CR); Statistics Theory (math.ST)

    The vast majority of theoretical results in machine learning and statistics
    assume that the available training data is a reasonably reliable reflection of
    the phenomena to be learned or estimated. Similarly, the majority of machine
    learning and statistical techniques used in practice are brittle to the
    presence of large amounts of biased or malicious data. In this work we propose
    two novel frameworks in which to study estimation, learning, and optimization
    in the presence of significant fractions of arbitrary data. The first
    framework, which we term list-decodable learning, asks whether it is possible
    to return a list of answers, with the guarantee that at least one of them is
    accurate. For example, given a dataset of (n) points for which an unknown
    subset of (\alpha n) points are drawn from a distribution of interest, and no
    assumptions are made about the remaining ((1-\alpha)n) points, is it possible
    to return a list of (\mathrm{poly}(1/\alpha)) answers, one of which is correct? The
    second framework, which we term the semi-verified learning model, considers the
    extent to which a small dataset of trusted data (drawn from the distribution in
    question) can be leveraged to enable the accurate extraction of information
    from a much larger but untrusted dataset (of which only an (\alpha)-fraction is
    drawn from the distribution).

    We show strong positive results in both settings, and provide an algorithm
    for robust learning in a very general stochastic optimization setting. This
    general result has immediate implications for robust estimation in a number of
    settings, including for robustly estimating the mean of distributions with
    bounded second moments, robustly learning mixtures of such distributions, and
    robustly finding planted partitions in random graphs in which significant
    portions of the graph have been perturbed by an adversary.
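
    A toy rendering of the list-decodable idea: return a short list of
    candidate means so that one of them is close to the mean of the hidden
    (\alpha)-fraction. The plain k-means stand-in below carries no guarantees,
    unlike the paper's algorithms:

        import numpy as np

        def candidate_means(X, k=3, iters=20, seed=0):
            # Lloyd's algorithm; the returned k centers form the "list of
            # answers", one of which should sit near the true mean.
            rng = np.random.default_rng(seed)
            centers = X[rng.choice(len(X), k, replace=False)]
            for _ in range(iters):
                labels = ((X[:, None] - centers) ** 2).sum(-1).argmin(1)
                centers = np.array([X[labels == j].mean(0)
                                    if np.any(labels == j) else centers[j]
                                    for j in range(k)])
            return centers

        good = np.random.default_rng(1).normal(0, 1, (100, 2))  # trusted draw
        bad = np.random.default_rng(2).normal(8, 1, (200, 2))   # arbitrary
        print(candidate_means(np.vstack([good, bad])))  # one center near 0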

    Unsupervised Pretraining for Sequence to Sequence Learning

    Prajit Ramachandran, Peter J. Liu, Quoc V. Le
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Sequence to sequence models are successful tools for supervised sequence
    learning tasks, such as machine translation. Despite their success, these
    models still require much labeled data and it is unclear how to improve them
    using unlabeled data, which is much less expensive to obtain. In this paper, we
    present simple changes that lead to a significant improvement in the accuracy
    of seq2seq models when the labeled set is small. Our method initializes the
    encoder and decoder of the seq2seq model with the trained weights of two
    language models, and then all weights are jointly fine-tuned with labeled data.
    An additional language modeling loss can be used to regularize the model during
    fine-tuning. We apply this method to low-resource tasks in machine translation
    and abstractive summarization and find that it significantly improves the
    subsequent supervised models. Our main finding is that the pretraining
    accelerates training and improves generalization of seq2seq models, achieving
    state-of-the-art results on the WMT English(\rightarrow)German task. Our
    model obtains an improvement of 1.3 BLEU from the previous best models on
    both WMT’14 and WMT’15 English(\rightarrow)German. Our ablation study shows
    that
    pretraining helps seq2seq models in different ways depending on the nature of
    the task: translation benefits from the improved generalization whereas
    summarization benefits from the improved optimization.
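
    A minimal sketch of the initialization scheme, assuming pretrained source-
    and target-side language models whose layer shapes match the seq2seq model
    (all names and shapes here are invented):

        import numpy as np

        src_lm = {"rnn": np.random.randn(128, 128)}
        tgt_lm = {"rnn": np.random.randn(128, 128),
                  "softmax": np.random.randn(128, 10000)}

        # Encoder from the source LM; decoder and softmax from the target LM.
        seq2seq = {
            "encoder": src_lm["rnn"].copy(),
            "decoder": tgt_lm["rnn"].copy(),
            "softmax": tgt_lm["softmax"].copy(),
        }

        def finetune_loss(seq2seq_nll, lm_nll, lam=1.0):
            # Joint objective: supervised seq2seq loss plus the
            # language-modeling loss used as a regularizer during fine-tuning.
            return seq2seq_nll + lam * lm_nll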

    Sentence Ordering using Recurrent Neural Networks

    Lajanugen Logeswaran, Honglak Lee, Dragomir Radev
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Modeling the structure of coherent texts is a task of great importance in
    NLP. The task of organizing a given set of sentences into a coherent order has
    been commonly used to build and evaluate models that understand such structure.
    In this work we propose an end-to-end neural approach based on the recently
    proposed set to sequence mapping framework to address the sentence ordering
    problem. Our model achieves state-of-the-art performance in the order
    discrimination task on two datasets widely used in the literature. We also
    consider a new interesting task of ordering abstracts from conference papers
    and research proposals and demonstrate strong performance against recent
    methods. Visualizing the sentence representations learned by the model shows
    that the model has captured high level logical structure in these paragraphs.
    The model also learns rich semantic sentence representations by learning to
    order texts, performing comparably to recent unsupervised representation
    learning methods in the sentence similarity and paraphrase detection tasks.

    Cognitive Discriminative Mappings for Rapid Learning

    Wen-Chieh Fang, Yi-ting Chiang
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Humans can learn concepts or recognize items from just a handful of examples,
    while machines require many more samples to perform the same task. In this
    paper, we build a computational model to investigate the possibility of this
    kind of rapid learning. The proposed method aims to improve the learning task
    of input from sensory memory by leveraging the information retrieved from
    long-term memory. We present a simple and intuitive technique called cognitive
    discriminative mappings (CDM) to explore the cognitive problem. First, CDM
    separates and clusters the data instances retrieved from long-term memory into
    distinct classes with a discrimination method in working memory when a sensory
    input triggers the algorithm. CDM then maps each sensory data instance to be as
    close as possible to the median point of the data group with the same class.
    The experimental results demonstrate that the CDM approach is effective for
    learning the discriminative features of supervised classifications with few
    training sensory input instances.
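
    One hedged reading of the mapping step, with the separation/clustering
    stage replaced by ground-truth labels for brevity (the actual method's
    discrimination step in working memory is not reproduced):

        import numpy as np

        def cdm_classify(sensory, memory_X, memory_y):
            # Assign each sensory instance to the class whose long-term-memory
            # median point is nearest.
            medians = {c: np.median(memory_X[memory_y == c], axis=0)
                       for c in np.unique(memory_y)}
            nearest = lambda x: min(medians,
                                    key=lambda c: np.linalg.norm(x - medians[c]))
            return [nearest(x) for x in sensory]

        X = np.array([[0, 0], [0, 1], [5, 5], [6, 5]])
        y = np.array([0, 0, 1, 1])
        print(cdm_classify(np.array([[0.2, 0.3], [5.5, 5.1]]), X, y))  # [0, 1]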

    Domain Adaptation with L2 constraints for classifying images from different endoscope systems

    Toru Tamaki, Shoji Sonoyama, Takio Kurita, Tsubasa Hirakawa, Bisser Raytchev, Kazufumi Kaneda, Tetsushi Koide, Shigeto Yoshida, Hiroshi Mieno, Shinji Tanaka, Kazuaki Chayama
    Comments: 28 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    This paper proposes a method for domain adaptation that extends the maximum
    margin domain transfer (MMDT) proposed by Hoffman et al., by introducing L_2
    distance constraints between samples of different domains; thus, our method is
    denoted as MMDTL2. Motivated by the differences between the images taken by
    narrow band imaging (NBI) endoscopic devices, we utilize different NBI devices
    as different domains and estimate the transformations between samples of
    different domains, i.e., image samples taken by different NBI endoscope
    systems. We first formulate the problem in the primal form, and then derive the
    dual form with much lower computational cost compared to the naive
    approach. From our experimental results using NBI image datasets from two
    different NBI endoscopic devices, we find that MMDTL2 is more stable than MMDT
    and better than support vector machines without adaptation.

    An Online Prediction Framework for Non-Stationary Time Series

    Christopher Xie, Avleen Bijral, Juan Lavista Ferres
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We extend an online time series prediction algorithm for ARMA processes to
    describe a framework for time series prediction that can efficiently handle
    non-stationarities that exist in many real time series. We show that
    appropriate transformations to such time series can lead to theoretical and
    empirical gains. To account for the phenomenon of cointegration in the
    multivariate case, we present a novel algorithm EC-VARMA-OGD that estimates
    both the auto-regressive and the cointegrating parameters. Relaxing the
    assumptions for the analysis, we prove a sub-linear regret bound for all the
    methods described. We note that the theoretical guarantees do not provide a
    complete picture, thus we provide a data-dependent analysis of the
    follow-the-leader algorithm for least squares loss that explains the success of
    using non-stationary transformations. We support all of our results with
    experiments on simulated and real data.
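
    The flavor of the underlying online method can be sketched as online
    gradient descent on AR coefficients under squared loss (the order, learning
    rate, and differencing transformation below are illustrative, not the
    paper's EC-VARMA-OGD algorithm):

        import numpy as np

        def ar_ogd(series, p=3, lr=0.01):
            # Online gradient descent on AR(p) coefficients: predict, suffer
            # squared loss, then take a gradient step.
            w, preds = np.zeros(p), []
            for t in range(p, len(series)):
                x = series[t - p:t][::-1]  # most recent value first
                yhat = w @ x
                preds.append(yhat)
                w -= lr * 2 * (yhat - series[t]) * x
            return np.array(preds), w

        rng = np.random.default_rng(0)
        s = np.cumsum(rng.normal(size=500))  # non-stationary random walk
        preds, w = ar_ogd(np.diff(s))        # differencing as a transformation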

    Adversarial Ladder Networks

    Juan Maroñas Molano, Alberto Albiol Colomer, Roberto Paredes Palacios
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)

    The use of unsupervised data in addition to supervised data in training
    discriminative neural networks has improved the performance of this
    classification scheme. However, the best results were achieved with a
    training process that is divided in two parts: first an unsupervised
    pre-training step is done for initializing the weights of the network, and
    afterwards these weights are refined with the use of supervised data. On the
    other hand, adversarial noise has improved the results of classical
    supervised learning. Recently, a new neural network topology called the
    Ladder Network, whose key idea is based on some properties of hierarchical
    latent variable models, has been proposed as a technique to train a neural
    network using supervised and unsupervised data at the same time with what is
    called semi-supervised learning. This technique has reached state-of-the-art
    classification performance. In this work we add adversarial noise to the
    ladder network and get state-of-the-art classification, with several
    important conclusions on how adversarial noise can help, together with new
    possible lines of investigation. We also propose an alternative way to add
    adversarial noise to unsupervised data.
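
    One common way to craft adversarial noise is the fast-gradient-sign
    construction shown below; whether this matches the paper's exact injection
    into the ladder network is an assumption:

        import numpy as np

        def adversarial_noise(x, grad_x, eps=0.1):
            # Perturb each input coordinate a small step in the direction
            # that increases the loss, given dLoss/dx from backpropagation.
            return x + eps * np.sign(grad_x)

        x = np.array([0.2, -0.4, 0.7])
        grad_x = np.array([0.03, -1.2, 0.5])
        print(adversarial_noise(x, grad_x))  # [ 0.3 -0.5  0.8]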

    Learning Influence Functions from Incomplete Observations

    Xinran He, Ke Xu, David Kempe, Yan Liu
    Comments: Full version of paper “Learning Influence Functions from Incomplete Observations” in NIPS16
    Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Machine Learning (stat.ML)

    We study the problem of learning influence functions under incomplete
    observations of node activations. Incomplete observations are a major concern
    as most (online and real-world) social networks are not fully observable. We
    establish both proper and improper PAC learnability of influence functions
    under randomly missing observations. Proper PAC learnability under the
    Discrete-Time Linear Threshold (DLT) and Discrete-Time Independent Cascade
    (DIC) models is established by reducing incomplete observations to complete
    observations in a modified graph. Our improper PAC learnability result applies
    for the DLT and DIC models as well as the Continuous-Time Independent Cascade
    (CIC) model. It is based on a parametrization in terms of reachability
    features, and also gives rise to an efficient and practical heuristic.
    Experiments on synthetic and real-world datasets demonstrate the ability of our
    method to compensate even for a fairly large fraction of missing observations.


    Information Theory

    Tradeoff Caching Strategy of Outage Probability and Fronthaul Usage in Cloud-RAN

    Zhun Ye, Cunhua Pan, Huiling Zhu, Jiangzhou Wang
    Comments: submitted to one journal for possible publication
    Subjects: Information Theory (cs.IT)

    In this paper, an optimal content caching strategy is proposed to jointly
    minimize the cell average outage probability and fronthaul usage in cloud
    radio access network (Cloud-RAN). First, an accurate closed-form expression of the
    outage probability conditioned on the user’s location is presented, and the
    cell average outage probability is obtained through the composite Simpson’s
    integration. The caching strategy for jointly optimizing the cell average
    outage probability and fronthaul usage is then formulated as a weighted sum
    minimization problem, which is a nonlinear 0-1 integer NP-hard problem. In
    order to deal with the NP-hard problem, two heuristic algorithms are proposed
    in this paper. Firstly, a genetic algorithm (GA) based approach is proposed.
    Numerical results show that the performance of the proposed GA-based approach
    with significantly reduced computational complexity is close to the optimal
    performance achieved by exhaustive search based caching strategy. Secondly, in
    order to further reduce the computational complexity, a mode selection approach
    is proposed. Numerical results show that this approach can achieve near-optimal
    performance over a wide range of the weighting factors through a single
    computation.
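
    A toy genetic algorithm over a 0-1 placement vector conveys the approach;
    the fitness below is a made-up popularity score, whereas the paper's
    objective weighs cell average outage probability against fronthaul usage:

        import random

        N_CONTENT, CACHE_SIZE, POP = 20, 5, 30
        popularity = sorted((random.random() for _ in range(N_CONTENT)),
                            reverse=True)

        def fitness(bits):
            # Infeasible placements (cache overflow) are heavily penalized.
            if sum(bits) > CACHE_SIZE:
                return -1.0
            return sum(p for p, b in zip(popularity, bits) if b)

        def mutate(bits):
            i = random.randrange(N_CONTENT)
            return bits[:i] + [1 - bits[i]] + bits[i + 1:]

        pop = [[random.randint(0, 1) for _ in range(N_CONTENT)]
               for _ in range(POP)]
        for _ in range(200):
            pop.sort(key=fitness, reverse=True)
            elite = pop[:POP // 2]
            pop = elite + [mutate(random.choice(elite))
                           for _ in range(POP - len(elite))]
        pop.sort(key=fitness, reverse=True)
        print(fitness(pop[0]))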

    RSB Decoupling Property of MAP Estimators

    Ali Bereyhi, Ralf R. Müller, Hermann Schulz-Baldes
    Comments: 5 pages, presented in Information Theory Workshop 2016
    Subjects: Information Theory (cs.IT)

    The large-system decoupling property of a MAP estimator is studied when it
    estimates the i.i.d. vector (\boldsymbol{x}) from the observation
    (\boldsymbol{y}=\mathbf{A}\boldsymbol{x}+\boldsymbol{z}) with (\mathbf{A})
    being chosen from a wide range of matrix ensembles, and the noise vector
    (\boldsymbol{z}) being i.i.d. and Gaussian. Using the replica method, we show
    that the marginal joint distribution of any two corresponding input and output
    symbols converges to a deterministic distribution which describes the
    input-output distribution of a single user system followed by a MAP estimator.
    Under the (b)RSB assumption, the single user system is a scalar channel with
    additive noise where the noise term is given by the sum of an independent
    Gaussian random variable and (b) correlated interference terms. As the (b)RSB
    assumption reduces to RS, the interference terms vanish, which results in the
    formerly studied RS decoupling principle.

    On the Uplink Spectral Efficiency of Full-Duplex Cooperative OFDMA Systems

    Jafar Banar, S. Mohammad Razavizadeh
    Comments: 5 pages, 4 figures, conference
    Subjects: Information Theory (cs.IT)

    In this paper, we develop a resource allocation algorithm for the uplink of
    in-band full-duplex (FD) cellular networks. The FD cellular network is assumed
    to be based on orthogonal frequency division multiple access (OFDMA) and
    consists of a base station communicating with multiple users. Some of the users
    in the network act as relays for other users and help them to transmit their
    data to the base station. These relays are FD and work based on the
    amplify-and-forward (AF) protocol. By appropriate selection of the relays and optimized
    allocation of subcarriers and powers to all users, we try to maximize the total
    sum rate of the network. During this optimization, we also impose some
    constraints on the users’ quality of service (QoS) and power. We propose a new
    algorithm to select the best relays based on the users’ maximum data rate and
    also use Linear Assignment Problem Jonker-Volgenant (LAPJV) algorithm for
    subcarrier assignment. It is proved that the resulting optimization problem can
    be converted to a convex problem, and hence it can be solved by standard
    numerical methods. The simulation results demonstrate the effect of the
    proposed scheme on the sum rate and coverage of the network.

    Spectrum sharing in energy harvesting cognitive radio networks: A cross-layer perspective

    Tian Zhang, Wei Chen
    Comments: Conference paper
    Subjects: Information Theory (cs.IT)

    In this paper, we present a cross-layer perspective on data transmission in
    energy harvesting cognitive radio networks (CRNs). The physical layer power
    allocation and network layer delay are jointly considered. The delay optimal
    power allocation is studied taking into account the randomness of harvested
    energy, data generation, channel state and the grid price. To guarantee the
    PU’s transmission, its Signal-to-Interference Ratio (SIR) should be no less
    than a threshold. Each user, including the primary user (PU) as well as the
    secondary user (SU), has energy harvesting devices, and the PU can also
    purchase grid power. Each user is rational and selfish, aiming to minimize
    its own buffer delay. We formulate a stochastic Stackelberg game in a
    bilevel manner. After decoupling via rewriting the objective and
    constraints, an equivalent tractable reconstruction is derived. First, we
    give a distributed algorithm to obtain the Nash equilibrium (NE) of the
    lower-level SUs’ noncooperative stochastic game. Thereafter, the stochastic
    Stackelberg game is discussed under the circumstance that there is no
    information exchange between the PU and the SUs. Distributed iterative
    algorithms are designed. Furthermore, a distributed online algorithm is
    proposed. Finally, simulations are carried out to verify the correctness and
    demonstrate the effectiveness of the proposed algorithms.

    Single-Tap Precoders and Decoders for Multi-User MIMO FBMC-OQAM under Strong Channel Frequency Selectivity

    François Rottenberg, Xavier Mestre, François Horlin, Jérôme Louveaux
    Journal-ref: IEEE Transactions on Signal Processing (2016)
    Subjects: Information Theory (cs.IT)

    The design of linear precoders or decoders for multiuser (MU) multiple-input
    multiple-output (MIMO) filterbank multicarrier (FBMC) modulations in the case
    of strong channel frequency selectivity is presented. The users and the base
    station (BS) communicate using space division multiple access (SDMA). The
    proposed low-complexity solution is based on a single-tap per-subcarrier
    precoding/decoding matrix at the base station (BS) in the downlink/uplink. As
    opposed to classical approaches that assume flat channel frequency selectivity
    at the subcarrier level, the BS does not make this assumption and takes into
    account the distortion caused by channel frequency selectivity. The expression
    of the FBMC asymptotic mean squared error (MSE) in the case of strong channel
    selectivity derived in earlier works is developed and extended. The linear
    precoders and decoders are found by optimizing the MSE formula under two design
    criteria, namely zero forcing (ZF) or minimum mean squared error (MMSE).
    Finally, simulation results demonstrate the performance of the optimized
    design. As long as the number of BS antennas is larger than the number of
    users, it is shown that those extra degrees of freedom can be used to
    compensate for the channel frequency selectivity.
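
    As a point of reference, the classical single-tap zero-forcing precoder
    that assumes a flat channel per subcarrier can be computed as below; the
    paper's optimized ZF/MMSE designs additionally correct for the distortion
    this assumption ignores:

        import numpy as np

        rng = np.random.default_rng(0)
        n_rx, n_tx, n_sc = 2, 4, 8  # users, BS antennas, subcarriers
        H = (rng.normal(size=(n_sc, n_rx, n_tx)) +
             1j * rng.normal(size=(n_sc, n_rx, n_tx))) / np.sqrt(2)

        # Per-subcarrier ZF precoder: W_k = H_k^H (H_k H_k^H)^{-1}
        W = np.stack([Hk.conj().T @ np.linalg.inv(Hk @ Hk.conj().T)
                      for Hk in H])
        print(np.allclose(H[0] @ W[0], np.eye(n_rx)))  # interference removed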

    Policy Optimization for Content Push via Energy Harvesting Small Cells in Heterogeneous Networks

    Jie Gong, Sheng Zhou, Zhenyu Zhou, Zhisheng Niu
    Comments: 30 pages, 7 figures, to appear in IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    Motivated by the rapid development of energy harvesting technology and
    content-aware communication in access networks, this paper considers the push
    mechanism design in small-cell base stations (SBSs) powered by renewable
    energy. A user request can be satisfied by either push or unicast from the SBS.
    If the SBS cannot handle the request, the user is blocked by the SBS and is
    served by the macro-cell BS (MBS) instead, which typically consumes more
    energy. We aim to minimize the ratio of user requests blocked by the SBS. With
    finite battery capacity, a Markov decision process based problem is formulated,
    and the optimal policy is found by dynamic programming (DP). Two
    threshold-based policies are proposed: the push-only threshold-based (POTB)
    policy and the energy-efficient threshold-based (EETB) policy, and the
    closed-form blocking probabilities with infinite battery capacity are derived.
    Numerical results show that the proposed policies outperform the conventional
    non-push policy if the content popularity changes slowly or the content request
    generating rate is high, and can achieve the performance of the greedy optimal
    threshold-based (GOTB) policy. In addition, the performance gap between the
    threshold-based policies and the DP optimal policy is small when the energy
    arrival rate is low or the request generating rate is high.
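
    A tiny value-iteration sketch conveys the DP machinery on a battery-level
    state space; all transition details and costs below are invented for
    illustration and do not reproduce the paper's model:

        import numpy as np

        B, P, GAMMA = 6, 0.6, 0.95  # battery levels, harvest prob., discount
        V = np.zeros(B + 1)
        for _ in range(500):
            # Serving spends one energy unit (impossible at level 0);
            # blocking hands the request to the MBS at unit penalty.
            Q_serve = np.array([GAMMA * (P * V[b] + (1 - P) * V[b - 1])
                                if b >= 1 else -np.inf for b in range(B + 1)])
            Q_block = np.array([-1.0 + GAMMA * (P * V[min(b + 1, B)] +
                                                (1 - P) * V[b])
                                for b in range(B + 1)])
            V = np.maximum(Q_serve, Q_block)
        print(["serve" if s >= b else "block"
               for s, b in zip(Q_serve, Q_block)])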

    Low-Complexity QoS-Aware Coordinated Scheduling for Heterogenous Networks

    Jun Zhu, Hong-Chuan Yang
    Comments: 6 pages, 6 figures, accepted by IEEE Transactions on Vehicular Technology
    Subjects: Information Theory (cs.IT)

    In this paper, we consider a heterogeneous network (HetNet), where low-power
    indoor femtocells are deployed in the coverage area of the existing macro base
    station (MBS). This paper proposes a novel coordinated random beamforming and
    user scheduling strategy to improve the throughput of users served by the
    femtocell access point (FAP) while satisfying the quality-of-service (QoS)
    requirements of users served by both MBS and FAP. The strategy, termed as
    QoS-Aware Coordinated Scheduling (QACS), requires limited coordination
    between the MBS and FAP, i.e., only the indexes of the qualified beams are
    shared. An exact statistical analysis of the ergodic achievable rate of both
    FAP and MBS with the proposed strategy is presented. Scheduling fairness is
    also addressed for the proposed QACS.

    On a condition equivalent to the Maximal Distance Separable conjecture

    Jeffery Sun, Steven Damelin, Daniel Kaiser
    Subjects: Information Theory (cs.IT); Combinatorics (math.CO)

    We prove that the following are equivalent. We denote by (\mathcal{P}_q) the
    vector space of functions from a finite field (\mathbb{F}_q) to itself,
    which can be represented as the space (\mathcal{P}_q :=
    \mathbb{F}_q[x]/(x^q-x)) of polynomial functions. We denote by
    (\mathcal{O}_n \subset \mathcal{P}_q) the set of polynomials that are either
    the zero polynomial, or have at most (n) distinct roots in (\mathbb{F}_q).
    Given two subspaces (Y,Z) of (\mathcal{P}_q), we denote by (\langle Y,Z
    \rangle) their span.

    A) Let (k, q) be integers, with (q) a prime power and (2 \leq k \leq q).
    Suppose that either:

    1) (q) is odd, or

    2) (q) is even and (k \notin \{3, q-1\}).

    Then there do not exist distinct subspaces (Y) and (Z) of (\mathcal{P}_q)
    such that:

    1′) (\dim(\langle Y, Z \rangle) = k),

    2′) (\dim(Y) = \dim(Z) = k-1),

    3′) (\langle Y, Z \rangle \subset \mathcal{O}_{k-1}),

    4′) (Y, Z \subset \mathcal{O}_{k-2}),

    5′) (Y \cap Z \subset \mathcal{O}_{k-3}).

    B) The MDS conjecture is true.

    Optimal Dynamic Point Selection for Power Minimization in Multiuser Downlink CoMP

    Duy H. N. Nguyen, Long B. Le, Tho Le-Ngoc
    Comments: 14 pages, 9 figures, accepted to IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    This paper examines a CoMP system where multiple base-stations (BS) employ
    coordinated beamforming to serve multiple mobile-stations (MS). Under the
    dynamic point selection mode, each MS can be assigned to only one BS at any
    time. This work then presents a solution framework to optimize the BS
    associations and coordinated beamformers for all MSs. With target
    signal-to-interference-plus-noise ratios at the MSs, the design objective is to
    minimize either the weighted sum transmit power or the per-BS transmit power
    margin. Since the original optimization problems contain binary variables
    indicating the BS associations, finding their optimal solutions is a
    challenging task. To circumvent this difficulty, we first relax the original
    problems into new optimization problems by expanding their constraint sets.
    Based on the nonconvex quadratic constrained quadratic programming framework,
    we show that these relaxed problems can be solved optimally. Interestingly,
    with the first design objective, the obtained solution from the relaxed problem
    is also optimal to the original problem. With the second design objective, a
    suboptimal solution to the original problem is then proposed, based on the
    obtained solution from the relaxed problem. Simulation results show that the
    resulting jointly optimal BS association and beamforming design significantly
    outperforms fixed BS association schemes.

    Chaos-based Wireless Communication Resisting Multipath Effect

    Junliang Yao, Chen Li, Haipeng Ren
    Comments: 5 pages, 3 figures, 1 table
    Subjects: Information Theory (cs.IT)

    In the additive white Gaussian noise (AWGN) channel, chaos has been proved
    to be the optimal coherent communication waveform in the sense of maximizing
    the signal-to-noise ratio (SNR). In wireless communication systems,
    intersymbol interference (ISI) caused by multipath propagation is one of the
    main obstacles to achieving a high bit transmission rate and low bit error
    rate (BER). How to resist the multipath effect is a fundamental problem in
    chaos-based wireless communication
    system (CWCS). In this paper, a CWCS is built by transmitting chaotic signals
    generated by a hybrid dynamical system and filtering the received signal using
    the corresponding matched filter to decrease noise effect and to detect binary
    information. We find that the multipath effect can be effectively resisted by
    regrouping the return map of received signal and setting the corresponding
    threshold using available information. We show that the optimal threshold is a
    function of the channel parameters and the transmitted information symbols.
    Practically, the channel parameters are time-variant, and the future
    information symbols are unavailable. In this case, a suboptimal threshold (SOT)
    is proposed, and the BER using the SOT is derived analytically. Numerical
    experimental results show that the CWCS achieves a very competitive performance
    even under inaccurate channel parameters.

    Spectral Statistics of Lattice Graph Percolation Models

    Stephen Kruzick, Jose M. F. Moura
    Subjects: Numerical Analysis (cs.NA); Information Theory (cs.IT)

    In graph signal processing, the graph adjacency matrix or the graph Laplacian
    commonly define the shift operator. The spectral decomposition of the shift
    operator plays an important role in that the eigenvalues represent frequencies
    and the eigenvectors provide a spectral basis. This is useful, for example, in
    the design of filters. However, the graph or network may be uncertain due to
    stochastic influences in construction and maintenance, and, under such
    conditions, the eigenvalues of the shift matrix become random variables. This
    paper examines the spectral distribution of the eigenvalues of random networks
    formed by including each link of a D-dimensional lattice supergraph
    independently with identical probability, a percolation model. Using the
    stochastic canonical equation methods developed by Girko for symmetric matrices
    with independent upper triangular entries, a deterministic distribution is
    found that asymptotically approximates the empirical spectral distribution of
    the scaled adjacency matrix for a model with arbitrary parameters. The main
    results characterize the form of the solution to an important system of
    equations that leads to this deterministic distribution function and
    significantly reduce the number of equations that must be solved to find the
    solution for a given set of model parameters. Simulations comparing the
    expected empirical spectral distributions and the computed deterministic
    distributions are provided for sample parameters.

    Analysis of Static Cellular Cooperation between Mutually Nearest Neighboring Nodes

    Luis David Alvarez Corrales, Anastasios Giovanidis, Philippe Martins, Laurent Decreusefond
    Comments: 17 pages, double column, Appendices A-D, 9 Figures, 18 total subfigures. arXiv admin note: text overlap with arXiv:1604.04640
    Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Probability (math.PR)

    Cooperation in cellular networks is a promising scheme to improve system
    performance. Existing works consider that a user dynamically chooses the
    stations that cooperate for his/her service, but such assumption often has
    practical limitations. Instead, cooperation groups can be predefined and
    static, with nodes linked by fixed infrastructure. To analyze such a potential
    network, we propose a grouping method based on node proximity. With the
    Mutually Nearest Neighbour Relation, we allow the formation of singles and
    pairs of nodes. Given an initial topology for the stations, two new point
    processes are defined, one for the singles and one for the pairs. We derive
    structural characteristics for these processes and analyse the resulting
    interference fields. When the node positions follow a Poisson Point Process
    (PPP) the processes of singles and pairs are not Poisson. However, the
    performance of the original model can be approximated by the superposition of
    two PPPs. This allows the derivation of exact expressions for the coverage
    probability. Numerical evaluation shows coverage gains from different
    signal cooperation schemes that can reach up to 15% compared to the
    standard noncooperative
    coverage. The analysis is general and can be applied to any type of cooperation
    in pairs of transmitting nodes.
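
    The grouping rule itself is easy to state in code: stations u and v form a
    pair iff each is the other's nearest neighbour, and all remaining stations
    stay single (a direct illustration of the Mutually Nearest Neighbour
    Relation; positions below are uniform rather than a PPP for brevity):

        import numpy as np

        rng = np.random.default_rng(3)
        pts = rng.uniform(0, 1, size=(10, 2))  # station positions

        d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        nn = d.argmin(axis=1)  # nearest neighbour of each station

        pairs = {tuple(sorted((i, int(nn[i]))))
                 for i in range(len(pts)) if nn[nn[i]] == i}
        singles = set(range(len(pts))) - {i for p in pairs for i in p}
        print(pairs, singles)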



