IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    arXiv Paper Daily: Fri, 30 Dec 2016

    我爱机器学习(52ml.net)发表于 2016-12-30 00:00:00
    love 0

    Neural and Evolutionary Computing

    Deep neural heart rate variability analysis

    Tamas Madl
    Comments: 6 pages in NIPS 2016 Workshop on Machine Learning for Health (ML4HC)
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Despite of the pain and limited accuracy of blood tests for early recognition
    of cardiovascular disease, they dominate risk screening and triage. On the
    other hand, heart rate variability is non-invasive and cheap, but not
    considered accurate enough for clinical practice. Here, we tackle heart beat
    interval based classification with deep learning. We introduce an end to end
    differentiable hybrid architecture, consisting of a layer of biological neuron
    models of cardiac dynamics (modified FitzHugh Nagumo neurons) and several
    layers of a standard feed-forward neural network. The proposed model is
    evaluated on ECGs from 474 stable at-risk (coronary artery disease) patients,
    and 1172 chest pain patients of an emergency department. We show that it can
    significantly outperform models based on traditional heart rate variability
    predictors, as well as approaching or in some cases outperforming clinical
    blood tests, based only on 60 seconds of inter-beat intervals.

    A Basic Recurrent Neural Network Model

    Fathi M. Salem
    Comments: 8 pages
    Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    We present a model of a basic recurrent neural network (or bRNN) that
    includes a separate linear term with a slightly “stable” fixed matrix to
    guarantee bounded solutions and fast dynamic response. We formulate a state
    space viewpoint and adapt the constrained optimization Lagrange Multiplier
    (CLM) technique and the vector Calculus of Variations (CoV) to derive the
    (stochastic) gradient descent. In this process, one avoids the commonly used
    re-application of the circular chain-rule and identifies the error
    back-propagation with the co-state backward dynamic equations. We assert that
    this bRNN can successfully perform regression tracking of time-series.
    Moreover, the “vanishing and exploding” gradients are explicitly quantified and
    explained through the co-state dynamics and the update laws. The adapted CoV
    framework, in addition, can correctly and principally integrate new loss
    functions in the network on any variable and for varied goals, e.g., for
    supervised learning on the outputs and unsupervised learning on the internal
    (hidden) states.

    Stochastic single flux quantum neuromorphic computing using magnetically tunable Josephson junctions

    S. E. Russek, C. A. Donnelly, M. L. Schneider, B. Baek, M. R. Pufall, W. H. Rippard, P. F. Hopkins, P. D. Dresselhaus, S. P. Benz
    Comments: 2016 IEEE International Conference on Rebooting Computing (ICRC)
    Subjects: Superconductivity (cond-mat.supr-con); Neural and Evolutionary Computing (cs.NE)

    Single flux quantum (SFQ) circuits form a natural neuromorphic technology
    with SFQ pulses and superconducting transmission lines simulating action
    potentials and axons, respectively. Here we present a new component, magnetic
    Josephson junctions, that have a tunablility and re-configurability that was
    lacking from previous SFQ neuromorphic circuits. The nanoscale magnetic
    structure acts as a tunable synaptic constituent that modifies the junction
    critical current. These circuits can operate near the thermal limit where
    stochastic firing of the neurons is an essential component of the technology.
    This technology has the ability to create complex neural systems with greater
    than 10^21 neural firings per second with approximately 1 W dissipation.

    Optimization of Test Case Generation using Genetic Algorithm (GA)

    Ahmed Mateen, Marriam Nazir, Salman Afsar Awan
    Comments: 9 pages
    Journal-ref: International Journal of Computer Applications Foundation of
    Computer Science (FCS), NY, USA Volume 151 – Number 7 Year of Publication:
    2016
    Subjects: Software Engineering (cs.SE); Neural and Evolutionary Computing (cs.NE)

    Testing provides means pertaining to assuring software performance. The total
    aim of software industry is actually to make a certain start associated with
    high quality software for the end user. However, associated with software
    testing has quite a few underlying concerns, which are very important and need
    to pay attention on these issues. These issues are effectively generating,
    prioritization of test cases, etc. These issues can be overcome by paying
    attention and focus. Solitary of the greatest Problems in the software testing
    area is usually how to acquire a great proper set associated with cases to
    confirm software. Some other strategies and also methodologies are proposed
    pertaining to shipping care of most of these issues. Genetic Algorithm (GA)
    belongs to evolutionary algorithms. Evolutionary algorithms have a significant
    role in the automatic test generation and many researchers are focusing on it.
    In this study explored software testing related issues by using the GA
    approach. In addition to right after applying some analysis, better solution
    produced, that is feasible and reliable. The particular research presents the
    implementation of GAs because of its generation of optimized test cases. Along
    these lines, this paper gives proficient system for the optimization of test
    case generation using genetic algorithm.

    The Predictron: End-To-End Learning and Planning

    David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto, Thomas Degris
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    One of the key challenges of artificial intelligence is to learn models that
    are effective in the context of planning. In this document we introduce the
    predictron architecture. The predictron consists of a fully abstract model,
    represented by a Markov reward process, that can be rolled forward multiple
    “imagined” planning steps. Each forward pass of the predictron accumulates
    internal rewards and values over multiple planning depths. The predictron is
    trained end-to-end so as to make these accumulated values accurately
    approximate the true value function. We applied the predictron to procedurally
    generated random mazes and a simulator for the game of pool. The predictron
    yielded significantly more accurate predictions than conventional deep neural
    network architectures.


    Computer Vision and Pattern Recognition

    Learning Visual N-Grams from Web Data

    Ang Li, Allan Jabri, Armand Joulin, Laurens van der Maaten
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Real-world image recognition systems need to recognize tens of thousands of
    classes that constitute a plethora of visual concepts. The traditional approach
    of annotating thousands of images per class for training is infeasible in such
    a scenario, prompting the use of webly supervised data. This paper explores the
    training of image-recognition systems on large numbers of images and associated
    user comments. In particular, we develop visual n-gram models that can predict
    arbitrary phrases that are relevant to the content of an image. Our visual
    n-gram models are feed-forward convolutional networks trained using new loss
    functions that are inspired by n-gram models commonly used in language
    modeling. We demonstrate the merits of our models in phrase prediction,
    phrase-based image retrieval, relating images and captions, and zero-shot
    transfer.

    From Virtual to Real World Visual Perception using Domain Adaptation — The DPM as Example

    Antonio M. Lopez, Jiaolong Xu, Jose L. Gomez, David Vazquez, German Ros
    Comments: Invited book chapter to appear in “Domain Adaptation in Computer Vision Applications”, Springer Series: Advances in Computer Vision and Pattern Recognition, Edited by Gabriela Csurka
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Supervised learning tends to produce more accurate classifiers than
    unsupervised learning in general. This implies that training data is preferred
    with annotations. When addressing visual perception challenges, such as
    localizing certain object classes within an image, the learning of the involved
    classifiers turns out to be a practical bottleneck. The reason is that, at
    least, we have to frame object examples with bounding boxes in thousands of
    images. A priori, the more complex the model is regarding its number of
    parameters, the more annotated examples are required. This annotation task is
    performed by human oracles, which ends up in inaccuracies and errors in the
    annotations (aka ground truth) since the task is inherently very cumbersome and
    sometimes ambiguous. As an alternative we have pioneered the use of virtual
    worlds for collecting such annotations automatically and with high precision.
    However, since the models learned with virtual data must operate in the real
    world, we still need to perform domain adaptation (DA). In this chapter we
    revisit the DA of a deformable part-based model (DPM) as an exemplifying case
    of virtual- to-real-world DA. As a use case, we address the challenge of
    vehicle detection for driver assistance, using different publicly available
    virtual-world data. While doing so, we investigate questions such as: how does
    the domain gap behave due to virtual-vs-real data with respect to dominant
    object appearance per domain, as well as the role of photo-realism in the
    virtual world.

    Partial Membership Latent Dirichlet Allocation

    Chao Chen, Alina Zare, Huy Trinh, Gbeng Omotara, J. Tory Cobb, Timotius Lagaunne
    Comments: Version 1, Sent for Review. arXiv admin note: substantial text overlap with arXiv:1511.02821
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Topic models (e.g., pLSA, LDA, sLDA) have been widely used for segmenting
    imagery. However, these models are confined to crisp segmentation, forcing a
    visual word (i.e., an image patch) to belong to one and only one topic. Yet,
    there are many images in which some regions cannot be assigned a crisp
    categorical label (e.g., transition regions between a foggy sky and the ground
    or between sand and water at a beach). In these cases, a visual word is best
    represented with partial memberships across multiple topics. To address this,
    we present a partial membership latent Dirichlet allocation (PM-LDA) model and
    an associated parameter estimation algorithm. This model can be useful for
    imagery where a visual word may be a mixture of multiple topics. Experimental
    results on visual and sonar imagery show that PM-LDA can produce both crisp and
    soft semantic image segmentations; a capability previous topic modeling methods
    do not have.

    Fast color transfer from multiple images

    Asad Khan, Luo Jiang, Wei Li, Ligang Liu
    Comments: arXiv admin note: text overlap with arXiv:1610.04861
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

    Color transfer between images uses the statistics information of image
    effectively. We present a novel approach of local color transfer between images
    based on the simple statistics and locally linear embedding. A sketching
    interface is proposed for quickly and easily specifying the color
    correspondences between target and source image. The user can specify the
    correspondences of local region using scribes, which more accurately transfers
    the target color to the source image while smoothly preserving the boundaries,
    and exhibits more natural output results. Our algorithm is not restricted to
    one-to-one image color transfer and can make use of more than one target images
    to transfer the color in different regions in the source image. Moreover, our
    algorithm does not require to choose the same color style and image size
    between source and target images. We propose the sub-sampling to reduce the
    computational load. Comparing with other approaches, our algorithm is much
    better in color blending in the input data. Our approach preserves the other
    color details in the source image. Various experimental results show that our
    approach specifies the correspondences of local color region in source and
    target images. And it expresses the intention of users and generates more
    actual and natural results of visual effect.

    Unsupervised domain adaptation in brain lesion segmentation with adversarial networks

    Konstantinos Kamnitsas, Christian Baumgartner, Christian Ledig, Virginia F.J. Newcombe, Joanna P. Simpson, Andrew D. Kane, David K. Menon, Aditya Nori, Antonio Criminisi, Daniel Rueckert, Ben Glocker
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Significant advances have been made towards building accurate automatic
    segmentation systems for a variety of biomedical applications using machine
    learning. However, the performance of these systems often degrades when they
    are applied on new data that differ from the training data, for example, due to
    variations in imaging protocols. Manually annotating new data for each test
    domain is not a feasible solution. In this work we investigate unsupervised
    domain adaptation using adversarial neural networks to train a segmentation
    method which is more invariant to differences in the input data, and which does
    not require any annotations on the test domain. Specifically, we learn
    domain-invariant features by learning to counter an adversarial network, which
    attempts to classify the domain of the input data by observing the activations
    of the segmentation network. Furthermore, we propose a multi-connected domain
    discriminator for improved adversarial training. Our system is evaluated using
    two MR databases of subjects with traumatic brain injuries, acquired using
    different scanners and imaging protocols. Using our unsupervised approach, we
    obtain segmentation accuracies which are close to the upper bound of supervised
    domain adaptation.

    Deep Unsupervised Representation Learning for Remote Sensing Images

    DaoYu Lin
    Comments: arXiv admin note: text overlap with arXiv:1511.06434, arXiv:1606.03498, arXiv:1406.2661, arXiv:1508.00092 by other authors
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Scene classification plays a key role in interpreting the remotely sensed
    high-resolution images. With the development of deep learning, supervised
    learning in classification of Remote Sensing with convolutional networks (CNNs)
    has been frequently adopted. However, researchers paid less attention to
    unsupervised learning in remote sensing with CNNs. In order to filling the gap,
    this paper proposes a set of CNNs called extbf{M}ultiple
    l extbf{A}ye extbf{R} fea extbf{T}ure m extbf{A}tching(MARTA) generative
    adversarial networks (GANs) to learn representation using only unlabeled data.
    There will be two models of MARTA GANs involved: (1) a generative model (G)
    that captures the data distribution and provides more training data; (2) a
    discriminative model (D) that estimates the possibility that a sample came from
    the training data rather than (G) and in this way a well-formed representation
    of dataset can be learned. Therefore, MARTA GANs obtain the state-of-the-art
    results which outperform the results got from UC-Merced Land-use dataset and
    Brazilian Coffee Scenes dataset.

    Semantic Video Segmentation by Gated Recurrent Flow Propagation

    David Nilsson, Cristian Sminchisescu
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Semantic video segmentation is challenging due to the sheer amount of data
    that needs to be processed and labeled in order to construct accurate models.
    In this paper we present a deep, end-to-end trainable methodology to video
    segmentation that is capable of leveraging information present in unlabeled
    data in order to improve semantic estimates. Our model combines a convolutional
    architecture and a spatial transformer recurrent layer that are able to
    temporally propagate labeling information by means of optical flow, adaptively
    gated based on its locally estimated uncertainty. The flow, the recogition and
    the gated propagation modules can be trained jointly, end-to-end. The gated
    recurrent flow propagation component of our model can be plugged-in any static
    semantic segmentation architecture and turn it into a weakly supervised video
    processing one. Our extensive experiments in the challenging CityScapes dataset
    indicate that the resulting model can leverage unlabeled temporal frames next
    to a labeled one in order to improve both the video segmentation accuracy and
    the consistency of its temporal labeling, at no additional annotation cost.

    FastMask: Segment Object Multi-scale Candidates in One Shot

    Hexiang Hu, Shiyi Lan, Yuning Jiang, Zhimin Cao, Fei Sha
    Comments: Our implementation is available on this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Objects appear to scale differently in natural images. This fact requires
    methods dealing with object-centric tasks e.g. object proposal to have robust
    performance over scale variances of objects. In the paper we present a novel
    segment proposal framework, namely FastMask, which takes advantage of the
    hierarchical structure in deep convolutional neural network to segment
    multi-scale objects in one shot. Innovatively, we generalize segment proposal
    network into three different functional components (body, neck and head). We
    further propose a weight-shared residual neck module as well as a
    scale-tolerant attentional head module for multi-scale training and efficient
    one-shot inference. On MS COCO benchmark, the proposed FastMask outperforms all
    state-of-the-art segment proposal methods in average recall while keeping 2~5
    times faster. More impressively, with a slight trade-off in accuracy, FastMask
    can segment objects in near real time (~13 fps) at 800( imes)600 resolution
    images, highlighting its potential in practical applications. Our
    implementation is available on this https URL

    Accelerated Convolutions for Efficient Multi-Scale Time to Contact Computation in Julia

    Alexander Amini, Berthold Horn, Alan Edelman
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Performance (cs.PF)

    Convolutions have long been regarded as fundamental to applied mathematics,
    physics and engineering. Their mathematical elegance allows for common tasks
    such as numerical differentiation to be computed efficiently on large data
    sets. Efficient computation of convolutions is critical to artificial
    intelligence in real-time applications, like machine vision, where convolutions
    must be continuously and efficiently computed on tens to hundreds of kilobytes
    per second. In this paper, we explore how convolutions are used in fundamental
    machine vision applications. We present an accelerated n-dimensional
    convolution package in the high performance computing language, Julia, and
    demonstrate its efficacy in solving the time to contact problem for machine
    vision. Results are measured against synthetically generated videos and
    quantitatively assessed according to their mean squared error from the ground
    truth. We achieve over an order of magnitude decrease in compute time and
    allocated memory for comparable machine vision applications. All code is
    packaged and integrated into the official Julia Package Manager to be used in
    various other scenarios.

    Multivariate mixture model for myocardium segmentation combining multi-source images

    Xiahai Zhuang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper proposes a method for simultaneous segmentation of multi-source
    images, using the multivariate mixture model (MvMM) and maximum of
    log-likelihood (LL) framework. The segmentation is a procedure of texture
    classification, and the MvMM is used to model the joint intensity distribution
    of the images. Specifically, the method is applied to the myocardial
    segmentation combining the complementary texture information from
    multi-sequence (MS) cardiac magnetic resonance (CMR) images. Furthermore, there
    exist inter-image mis-registration and intra-image misalignment of slices in
    the MS CMR images. Hence, the MvMM is formulated with transformations, which
    are embedded into the LL framework and optimized simultaneously with the
    segmentation parameters. The proposed method is able to correct the inter- and
    intra-image misalignment by registering each slice of the MS CMR to a virtual
    common space, as well as to delineate the indistinguishable boundaries of
    myocardium consisting of pathologies. Results have shown statistically
    significant improvement in the segmentation performance of the proposed method
    with respect to the conventional approaches which can solely segment each image
    separately. The proposed method has also demonstrated better robustness in the
    incongruent data, where some images may not fully cover the region of interest
    and the full coverage can only be reconstructed combining the images from
    multiple sources.

    Symbolic Representation and Classification of Logos

    D. S. Guru, N. Vinay Kumar
    Comments: 15 pages, 6 figures, 6 tables, Proceedings of International Conference on Computer Vision and Image Processing (CVIP 2016). arXiv admin note: text overlap with arXiv:1609.01414
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, a model for classification of logos based on symbolic
    representation of features is presented. The proposed model makes use of global
    features of logo images such as color, texture, and shape features for
    classification. The logo images are broadly classified into three different
    classes, viz., logo image containing only text, an image with only symbol, and
    an image with both text and a symbol. In each class, the similar looking logo
    images are clustered using K-means clustering algorithm. The intra-cluster
    variations present in each cluster corresponding to each class are then
    preserved using symbolic interval data. Thus referenced logo images are
    represented in the form of interval data. A sample logo image is then
    classified using suitable symbolic classifier. For experimentation purpose,
    relatively large amount of color logo images is created consisting of 5044 logo
    images. The classification results are validated with the help of accuracy,
    precision, recall, F-measure, and time. To check the efficacy of the proposed
    model, the comparative analyses are given against the other models. The results
    show that the proposed model outperforms the other models with respect to time
    and F-measure.

    A novel Gaussian mixture model for superpixel segmentation

    Zhihua Ban, Jianguo Liu, Li Cao
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Superpixel segmentation is used to partition an image into perceptually
    coherence atomic regions. As a preprocessing step of computer vision
    applications, it can enormously reduce the number of entries of subsequent
    algorithms. With each superpixel associated with a Gaussian distribution, we
    assume that a pixel is generated by first randomly choosing one of the
    superpixels, and then the pixel is drawn from the corresponding Gaussian
    density. Unlike most applications of Gaussian mixture model in clustering, data
    points in our model are assumed to be non-identically distributed. Given an
    image, a log-likelihood function is constructed for maximizing. Based on a
    solution derived from the expectation-maximization method, a well designed
    algorithm is proposed. Our method is of linear complexity with respect to the
    number of pixels, and it can be implemented using parallel techniques. To the
    best of our knowledge, our algorithm outperforms the state-of-the-art in
    accuracy and presents a competitive performance in computational efficiency.

    An FFT-based Synchronization Approach to Recognize Human Behaviors using STN-LFP Signal

    Hosein M. Golshan, Adam O. Hebb, Sara J. Hanrahan, Joshua Nedrud, Mohammad H. Mahoor
    Comments: IEEE Conf on ICASSP 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)

    Classification of human behavior is key to developing closed-loop Deep Brain
    Stimulation (DBS) systems, which may be able to decrease the power consumption
    and side effects of the existing systems. Recent studies have shown that the
    Local Field Potential (LFP) signals from both Subthalamic Nuclei (STN) of the
    brain can be used to recognize human behavior. Since the DBS leads implanted in
    each STN can collect three bipolar signals, the selection of a suitable pair of
    LFPs that achieves optimal recognition performance is still an open problem to
    address. Considering the presence of synchronized aggregate activity in the
    basal ganglia, this paper presents an FFT-based synchronization approach to
    automatically select a relevant pair of LFPs and use the pair together with an
    SVM-based MKL classifier for behavior recognition purposes. Our experiments on
    five subjects show the superiority of the proposed approach compared to other
    methods used for behavior classification.

    Quantum Clustering and Gaussian Mixtures

    Mahajabin Rahman, Davi Geiger
    Comments: 16 pages, 13 figures
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV)

    The mixture of Gaussian distributions, a soft version of k-means , is
    considered a state-of-the-art clustering algorithm. It is widely used in
    computer vision for selecting classes, e.g., color, texture, and shapes. In
    this algorithm, each class is described by a Gaussian distribution, defined by
    its mean and covariance. The data is described by a weighted sum of these
    Gaussian distributions. We propose a new method, inspired by quantum
    interference in physics. Instead of modeling each class distribution directly,
    we model a class wave function such that its magnitude square is the class
    Gaussian distribution. We then mix the class wave functions to create the
    mixture wave function. The final mixture distribution is then the magnitude
    square of the mixture wave function. As a result, we observe the quantum class
    interference phenomena, not present in the Gaussian mixture model. We show that
    the quantum method outperforms the Gaussian mixture method in every aspect of
    the estimations. It provides more accurate estimations of all distribution
    parameters, with much less fluctuations, and it is also more robust to data
    deformations from the Gaussian assumptions. We illustrate our method for color
    segmentation as an example application.

    Meta-Unsupervised-Learning: A supervised approach to unsupervised learning

    Vikas K. Garg, Adam Tauman Kalai
    Comments: 22 pages
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    We introduce a new paradigm to investigate unsupervised learning, reducing
    unsupervised learning to supervised learning. Specifically, we mitigate the
    subjectivity in unsupervised decision-making by leveraging knowledge acquired
    from prior, possibly heterogeneous, supervised learning tasks. We demonstrate
    the versatility of our framework via comprehensive expositions and detailed
    experiments on several unsupervised problems such as (a) clustering, (b)
    outlier detection, and (c) similarity prediction under a common umbrella of
    meta-unsupervised-learning. We also provide rigorous PAC-agnostic bounds to
    establish the theoretical foundations of our framework, and show that our
    framing of meta-clustering circumvents Kleinberg’s impossibility theorem for
    clustering.


    Artificial Intelligence

    A hybrid approach to supervised machine learning for algorithmic melody composition

    Rouven Bauer
    Subjects: Artificial Intelligence (cs.AI)

    In this work we present an algorithm for composing monophonic melodies
    similar in style to those of a given, phrase annotated, sample of melodies. For
    implementation, a hybrid approach incorporating parametric Markov models of
    higher order and a contour concept of phrases is used. This work is based on
    the master thesis of Thayabaran Kathiresan (2015). An online listening test
    conducted shows that enhancing a pure Markov model with musically relevant
    context, like count and planed melody contour, improves the result
    significantly.

    Efficient iterative policy optimization

    Nicolas Le Roux
    Comments: 12 pages
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO)

    We tackle the issue of finding a good policy when the number of policy
    updates is limited. This is done by approximating the expected policy reward as
    a sequence of concave lower bounds which can be efficiently maximized,
    drastically reducing the number of policy updates required to achieve good
    performance. We also extend existing methods to negative rewards, enabling the
    use of control variates.

    Automated timetabling for small colleges and high schools using huge integer programs

    Joshua S. Friedman
    Subjects: Artificial Intelligence (cs.AI)

    We formulate an integer program to solve a highly constrained academic
    timetabling problem at the United States Merchant Marine Academy. The IP
    instance that results from our real case study has approximately both 170,000
    rows and columns and solves to near optimality in 12 hours, using a commercial
    solver. Our model is applicable to both high schools and small colleges who
    wish to deviate from group scheduling. We also solve a necessary preprocessing
    student subgrouping problem, which breaks up big groups of students into small
    groups so they can optimally fit into small capacity classes.

    Lifted Relational Algebra with Recursion and Connections to Modal Logic

    Eugenia Ternovska
    Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Databases (cs.DB)

    We propose a new formalism for specifying and reasoning about problems that
    involve heterogeneous “pieces of information” — large collections of data,
    decision procedures of any kind and complexity and connections between them.
    The essence of our proposal is to lift Codd’s relational algebra from
    operations on relational tables to operations on classes of structures (with
    recursion), and to add a direction of information propagation. We observe the
    presence of information propagation in several formalisms for efficient
    reasoning and use it to express unary negation and operations used in graph
    databases. We carefully analyze several reasoning tasks and establish a precise
    connection between a generalized query evaluation and temporal logic model
    checking. Our development allows us to reveal a general correspondence between
    classical and modal logics and may shed a new light on the good computational
    properties of modal logics and related formalisms.

    Deep neural heart rate variability analysis

    Tamas Madl
    Comments: 6 pages in NIPS 2016 Workshop on Machine Learning for Health (ML4HC)
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Despite of the pain and limited accuracy of blood tests for early recognition
    of cardiovascular disease, they dominate risk screening and triage. On the
    other hand, heart rate variability is non-invasive and cheap, but not
    considered accurate enough for clinical practice. Here, we tackle heart beat
    interval based classification with deep learning. We introduce an end to end
    differentiable hybrid architecture, consisting of a layer of biological neuron
    models of cardiac dynamics (modified FitzHugh Nagumo neurons) and several
    layers of a standard feed-forward neural network. The proposed model is
    evaluated on ECGs from 474 stable at-risk (coronary artery disease) patients,
    and 1172 chest pain patients of an emergency department. We show that it can
    significantly outperform models based on traditional heart rate variability
    predictors, as well as approaching or in some cases outperforming clinical
    blood tests, based only on 60 seconds of inter-beat intervals.

    From Virtual to Real World Visual Perception using Domain Adaptation — The DPM as Example

    Antonio M. Lopez, Jiaolong Xu, Jose L. Gomez, David Vazquez, German Ros
    Comments: Invited book chapter to appear in “Domain Adaptation in Computer Vision Applications”, Springer Series: Advances in Computer Vision and Pattern Recognition, Edited by Gabriela Csurka
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Supervised learning tends to produce more accurate classifiers than
    unsupervised learning in general. This implies that training data is preferred
    with annotations. When addressing visual perception challenges, such as
    localizing certain object classes within an image, the learning of the involved
    classifiers turns out to be a practical bottleneck. The reason is that, at
    least, we have to frame object examples with bounding boxes in thousands of
    images. A priori, the more complex the model is regarding its number of
    parameters, the more annotated examples are required. This annotation task is
    performed by human oracles, which ends up in inaccuracies and errors in the
    annotations (aka ground truth) since the task is inherently very cumbersome and
    sometimes ambiguous. As an alternative we have pioneered the use of virtual
    worlds for collecting such annotations automatically and with high precision.
    However, since the models learned with virtual data must operate in the real
    world, we still need to perform domain adaptation (DA). In this chapter we
    revisit the DA of a deformable part-based model (DPM) as an exemplifying case
    of virtual- to-real-world DA. As a use case, we address the challenge of
    vehicle detection for driver assistance, using different publicly available
    virtual-world data. While doing so, we investigate questions such as: how does
    the domain gap behave due to virtual-vs-real data with respect to dominant
    object appearance per domain, as well as the role of photo-realism in the
    virtual world.

    Meta-Unsupervised-Learning: A supervised approach to unsupervised learning

    Vikas K. Garg, Adam Tauman Kalai
    Comments: 22 pages
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    We introduce a new paradigm to investigate unsupervised learning, reducing
    unsupervised learning to supervised learning. Specifically, we mitigate the
    subjectivity in unsupervised decision-making by leveraging knowledge acquired
    from prior, possibly heterogeneous, supervised learning tasks. We demonstrate
    the versatility of our framework via comprehensive expositions and detailed
    experiments on several unsupervised problems such as (a) clustering, (b)
    outlier detection, and (c) similarity prediction under a common umbrella of
    meta-unsupervised-learning. We also provide rigorous PAC-agnostic bounds to
    establish the theoretical foundations of our framework, and show that our
    framing of meta-clustering circumvents Kleinberg’s impossibility theorem for
    clustering.

    The formal-logical characterisation of lies, deception, and associated notions

    Toni Heidenreich
    Comments: Literature review prepared as a student at King’s College London
    Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

    Defining various dishonest notions in a formal way is a key step to enable
    intelligent agents to act in untrustworthy environments. This review evaluates
    the literature for this topic by looking at formal definitions based on modal
    logic as well as other formal approaches. Criteria from philosophical
    groundwork is used to assess the definitions for correctness and completeness.
    The key contribution of this review is to show that only a few definitions
    fully comply with this gold standard and to point out the missing steps towards
    a successful application of these definitions in an actual agent environment.

    FastMask: Segment Object Multi-scale Candidates in One Shot

    Hexiang Hu, Shiyi Lan, Yuning Jiang, Zhimin Cao, Fei Sha
    Comments: Our implementation is available on this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Objects appear to scale differently in natural images. This fact requires
    methods dealing with object-centric tasks e.g. object proposal to have robust
    performance over scale variances of objects. In the paper we present a novel
    segment proposal framework, namely FastMask, which takes advantage of the
    hierarchical structure in deep convolutional neural network to segment
    multi-scale objects in one shot. Innovatively, we generalize segment proposal
    network into three different functional components (body, neck and head). We
    further propose a weight-shared residual neck module as well as a
    scale-tolerant attentional head module for multi-scale training and efficient
    one-shot inference. On MS COCO benchmark, the proposed FastMask outperforms all
    state-of-the-art segment proposal methods in average recall while keeping 2~5
    times faster. More impressively, with a slight trade-off in accuracy, FastMask
    can segment objects in near real time (~13 fps) at 800( imes)600 resolution
    images, highlighting its potential in practical applications. Our
    implementation is available on this https URL

    Accelerated Convolutions for Efficient Multi-Scale Time to Contact Computation in Julia

    Alexander Amini, Berthold Horn, Alan Edelman
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Performance (cs.PF)

    Convolutions have long been regarded as fundamental to applied mathematics,
    physics and engineering. Their mathematical elegance allows for common tasks
    such as numerical differentiation to be computed efficiently on large data
    sets. Efficient computation of convolutions is critical to artificial
    intelligence in real-time applications, like machine vision, where convolutions
    must be continuously and efficiently computed on tens to hundreds of kilobytes
    per second. In this paper, we explore how convolutions are used in fundamental
    machine vision applications. We present an accelerated n-dimensional
    convolution package in the high performance computing language, Julia, and
    demonstrate its efficacy in solving the time to contact problem for machine
    vision. Results are measured against synthetically generated videos and
    quantitatively assessed according to their mean squared error from the ground
    truth. We achieve over an order of magnitude decrease in compute time and
    allocated memory for comparable machine vision applications. All code is
    packaged and integrated into the official Julia Package Manager to be used in
    various other scenarios.

    The Predictron: End-To-End Learning and Planning

    David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto, Thomas Degris
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    One of the key challenges of artificial intelligence is to learn models that
    are effective in the context of planning. In this document we introduce the
    predictron architecture. The predictron consists of a fully abstract model,
    represented by a Markov reward process, that can be rolled forward multiple
    “imagined” planning steps. Each forward pass of the predictron accumulates
    internal rewards and values over multiple planning depths. The predictron is
    trained end-to-end so as to make these accumulated values accurately
    approximate the true value function. We applied the predictron to procedurally
    generated random mazes and a simulator for the game of pool. The predictron
    yielded significantly more accurate predictions than conventional deep neural
    network architectures.


    Information Retrieval

    Condensedly: comprehending article contents through condensed texts

    Chao-Hsuan Ke, Tsung-Lu Michael Lee, Jung-Hsien Chiang
    Subjects: Information Retrieval (cs.IR)

    Summary: Abstracts in biomedical articles can provide a quick overview of the
    articles but detailed information cannot be obtained without reading full-text
    contents. Full-text articles certainly generate more information and contents;
    however, accessing full-text documents is usually time consuming. Condensedly
    is a web-based application, which provides readers an easy and efficient way to
    access full-text paragraphs using sentences in abstracts as fishing bait to
    retrieve the big fish reside in full-text. Condensedly is based on the
    paragraph ranking algorithm, which evaluates and ranks full-text paragraphs
    based on their association scores with sentences in abstracts.

    Availability: this http URL


    Computation and Language

    Verifying Heaps' law using Google Books Ngram data

    Vladimir V. Bochkarev, Eduard Yu.Lerner, Anna V. Shevlyakova
    Comments: 8 pages, 6 figures
    Subjects: Computation and Language (cs.CL); Physics and Society (physics.soc-ph)

    This article is devoted to the verification of the empirical Heaps law in
    European languages using Google Books Ngram corpus data. The connection between
    word distribution frequency and expected dependence of individual word number
    on text size is analysed in terms of a simple probability model of text
    generation. It is shown that the Heaps exponent varies significantly within
    characteristic time intervals of 60-100 years.

    Deep Semi-Supervised Learning with Linguistically Motivated Sequence Labeling Task Hierarchies

    Jonathan Godwin, Pontus Stenetorp, Sebastian Riedel
    Subjects: Computation and Language (cs.CL)

    In this paper we present a novel Neural Network algorithm for conducting
    semi-supervised learning for sequence labeling tasks arranged in a
    linguistically motivated hierarchy. This relationship is exploited to
    regularise the representations of supervised tasks by backpropagating the error
    of the unsupervised task through the supervised tasks. We introduce a neural
    network where lower layers are supervised by junior downstream tasks and the
    final layer task is an auxiliary unsupervised task. The architecture shows
    improvements of up to two percentage points F1 for Chunking compared to a
    plausible baseline.

    Here's My Point: Argumentation Mining with Pointer Networks

    Peter Potash, Alexey Romanov, Anna Rumshisky
    Comments: 10 pages; under review for ICLR
    Subjects: Computation and Language (cs.CL)

    One of the major goals in automated argumentation mining is to uncover the
    argument structure present in argumentative text. In order to determine this
    structure, one must understand how different individual components of the
    overall argument are linked. General consensus in this field dictates that the
    argument components form a hierarchy of persuasion, which manifests itself in a
    tree structure. This work provides the first neural network-based approach to
    argumentation mining, focusing on the two tasks of extracting links between
    argument components, and classifying types of argument components. In order to
    solve this problem, we propose to use a joint model that is based on a Pointer
    Network architecture. A Pointer Network is appealing for this task for the
    following reasons: 1) It takes into account the sequential nature of argument
    components; 2) By construction, it enforces certain properties of the tree
    structure present in argument relations; 3) The hidden representations can be
    applied to auxiliary tasks. In order to extend the contribution of the original
    Pointer Network model, we construct a joint model that simultaneously attempts
    to learn the type of argument component, as well as continuing to predict links
    between argument components. The proposed joint model achieves state-of-the-art
    results on two separate evaluation corpora, achieving far superior performance
    than a regular Pointer Network model. Our results show that optimizing for both
    tasks, and adding a fully-connected layer prior to recurrent neural network
    input, is crucial for high performance.

    Shamela: A Large-Scale Historical Arabic Corpus

    Yonatan Belinkov, Alexander Magidow, Maxim Romanov, Avi Shmidman, Moshe Koppel
    Comments: Slightly expanded version of Coling LT4DH workshop paper
    Subjects: Computation and Language (cs.CL)

    Arabic is a widely-spoken language with a rich and long history spanning more
    than fourteen centuries. Yet existing Arabic corpora largely focus on the
    modern period or lack sufficient diachronic information. We develop a
    large-scale, historical corpus of Arabic of about 1 billion words from diverse
    periods of time. We clean this corpus, process it with a morphological
    analyzer, and enhance it by detecting parallel passages and automatically
    dating undated texts. We demonstrate its utility with selected case-studies in
    which we show its application to the digital humanities.

    The ontogeny of discourse structure mimics the development of literature

    Natalia Bezerra Mota, Sylvia Pinheiro, Mariano Sigman, Diego Fernandez Slezak, Guillermo Cecchi, Mauro Copelli, Sidarta Ribeiro
    Comments: Natalia Bezerra Mota and Sylvia Pinheiro: Equal contribution Sidarta Ribeiro and Mauro Copelli: Corresponding authors
    Subjects: Neurons and Cognition (q-bio.NC); Computation and Language (cs.CL); Physics and Society (physics.soc-ph)

    Discourse varies with age, education, psychiatric state and historical epoch,
    but the ontogenetic and cultural dynamics of discourse structure remain to be
    quantitatively characterized. To this end we investigated word graphs obtained
    from verbal reports of 200 subjects ages 2-58, and 676 literary texts spanning
    ~5,000 years. In healthy subjects, lexical diversity, graph size, and
    long-range recurrence departed from initial near-random levels through a
    monotonic asymptotic increase across ages, while short-range recurrence showed
    a corresponding decrease. These changes were explained by education and suggest
    a hierarchical development of discourse structure: short-range recurrence and
    lexical diversity stabilize after elementary school, but graph size and
    long-range recurrence only stabilize after high school. This gradual maturation
    was blurred in psychotic subjects, who maintained in adulthood a near-random
    structure. In literature, monotonic asymptotic changes over time were
    remarkable: While lexical diversity, long-range recurrence and graph size
    increased away from near-randomness, short-range recurrence declined, from
    above to below random levels. Bronze Age texts are structurally similar to
    childish or psychotic discourses, but subsequent texts converge abruptly to the
    healthy adult pattern around the onset of the Axial Age (800-200 BC), a period
    of pivotal cultural change. Thus, individually as well as historically,
    discourse maturation increases the range of word recurrence away from
    randomness.


    Distributed, Parallel, and Cluster Computing

    Distributed Convex Optimization with Inequality Constraints over Time-varying Unbalanced Digraphs

    Pei Xie, Keyou You, Roberto Tempo, Shiji Song, Cheng Wu
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)

    This paper considers a distributed convex optimization problem with
    inequality constraints over time-varying unbalanced digraphs, where the cost
    function is a sum of local objectives, and each node of the graph only knows
    its local objective and inequality constraints. Although there is a vast
    literature on distributed optimization, most of them require the graph to be
    balanced, which is quite restrictive and not necessary. Very recently, the
    unbalanced problem has been resolved only for either time-invariant graphs or
    unconstrained optimization. This work addresses the unbalancedness by focusing
    on an epigraph form of the constrained optimization. A striking feature is that
    this novel idea can be easily used to study time-varying unbalanced digraphs.
    Under local communications, a simple iterative algorithm is then designed for
    each node. We prove that if the graph is uniformly jointly strongly connected,
    each node asymptotically converges to some common optimal solution.

    Parallel Digital Predistortion Design on Mobile GPU and Embedded Multicore CPU for Mobile Transmitters

    Kaipeng Li, Amanullah Ghazi, Chance Tarver, Jani Boutellier, Mahmoud Abdelaziz, Lauri Anttila, Markku Juntti, Mikko Valkama, Joseph R. Cavallaro
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Digital predistortion (DPD) is a widely adopted baseband processing technique
    in current radio transmitters. While DPD can effectively suppress unwanted
    spurious spectrum emissions stemming from imperfections of analog RF and
    baseband electronics, it also introduces extra processing complexity and poses
    challenges on efficient and flexible implementations, especially for mobile
    cellular transmitters, considering their limited computing power compared to
    basestations. In this paper, we present high data rate implementations of
    broadband DPD on modern embedded processors, such as mobile GPU and multicore
    CPU, by taking advantage of emerging parallel computing techniques for
    exploiting their computing resources. We further verify the suppression effect
    of DPD experimentally on real radio hardware platforms. Performance evaluation
    results of our DPD design demonstrate the high efficacy of modern general
    purpose mobile processors on accelerating DPD processing for a mobile
    transmitter.

    Connectivity based technique for localization of nodes in wireless sensor networks

    Muhammad Farooq-i-Azam, Muhammad Naeem Ayyaz, Saleem Akhtar
    Comments: arXiv admin note: text overlap with arXiv:1611.03420
    Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)

    We propose a localization algorithm for wireless sensor networks, which is
    simple in design, does not involve significant overhead and yet provides
    acceptable position estimates of sensor nodes. The algorithm uses settled nodes
    as beacon nodes so as to increase the number of beacon nodes. The algorithm is
    range free and does not need any additional piece of hardware for ranging. It
    also does not involve any significant communication overhead for localization.
    The simulation and results show that good localization accuracy is achieved for
    outdoor environments.

    Ergodic Effects in Token Circulation

    Adrian Kosowski, Przemysław Uznański
    Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

    We consider a dynamical process in a network which distributes all tokens
    located at a node among its neighbors, in a round-robin manner.

    We show that in the recurrent state of this dynamics, the number of particles
    located on a given edge, averaged over an interval of time, is tightly
    concentrated around the average particle density in the system. Formally, for a
    system of (k) particles in a graph of (m) edges, during any interval of length
    (T), this time-averaged value is (k/m pm widetilde{O}(1/T)), whenever
    (gcd(m,k) = widetilde{O}(1)) (and so, e.g., whenever (m) is a prime number).
    To achieve these bounds, we link the behavior of the studied dynamics to
    ergodic properties of traversals based on Eulerian circuits on a symmetric
    directed graph. These results are proved through sum set methods and are likely
    to be of independent interest.

    As a corollary, we also obtain bounds on the emph{idleness} of the studied
    dynamics, i.e., on the longest possible time between two consecutive
    appearances of a token on an edge, taken over all edges. Designing trajectories
    for (k) tokens in a way which minimizes refresh time is fundamental to the
    study of patrolling in networks. Our results immediately imply a bound of
    (widetilde{O}(m/k)) on the idleness of the studied process, thus showing that
    it is a distributed (widetilde{O}(1))-competitive solution to the patrolling
    task, for all of the covered cases.

    Emerging Security Challenges of Cloud Virtual Infrastructure

    Amani S. Ibrahim, James Hamlyn-Harris, John Grundy
    Comments: 6 pages Published in APSEC 2010 Workshop on Cloud Computing
    Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE)

    The cloud computing model is rapidly transforming the IT landscape. Cloud
    computing is a new computing paradigm that delivers computing resources as a
    set of reliable and scalable internet-based services allowing customers to
    remotely run and manage these services. Infrastructure-as-a-service (IaaS) is
    one of the popular cloud computing services. IaaS allows customers to increase
    their computing resources on the fly without investing in new hardware. IaaS
    adapts virtualization to enable on-demand access to a pool of virtual computing
    resources. Although there are great benefits to be gained from cloud computing,
    cloud computing also enables new categories of threats to be introduced. These
    threats are a result of the cloud virtual infrastructure complexity created by
    the adoption of the virtualization technology.

    Breaching the security of any component in the cloud virtual infrastructure
    significantly impacts on the security of other components and consequently
    affects the overall system security. This paper explores the security problem
    of the cloud platform virtual infrastructure identifying the existing security
    threats and the complexities of this virtual infrastructure. The paper also
    discusses the existing security approaches to secure the cloud virtual
    infrastructure and their drawbacks. Finally, we propose and explore some key
    research challenges of implementing new virtualization-aware security solutions
    that can provide the pre-emptive protection for complex and ever- dynamic cloud
    virtual infrastructure.


    Learning

    Symmetry, Saddle Points, and Global Geometry of Nonconvex Matrix Factorization

    Xingguo Li, Zhaoran Wang, Junwei Lu, Raman Arora, Jarvis Haupt, Han Liu, Tuo Zhao
    Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

    We propose a general theory for studying the geometry of nonconvex objective
    functions with underlying symmetric structures. In specific, we characterize
    the locations of stationary points and the null space of the associated Hessian
    matrices via the lens of invariant groups. As a major motivating example, we
    apply the proposed general theory to characterize the global geometry of the
    low-rank matrix factorization problem. In particular, we illustrate how the
    rotational symmetry group gives rise to infinitely many non-isolated strict
    saddle points and equivalent global minima of the objective function. By
    explicitly identifying all stationary points, we divide the entire parameter
    space into three regions: ((cR_1)) the region containing the neighborhoods of
    all strict saddle points, where the objective has negative curvatures;
    ((cR_2)) the region containing neighborhoods of all global minima, where the
    objective enjoys strong convexity along certain directions; and ((cR_3)) the
    complement of the above regions, where the gradient has sufficiently large
    magnitudes. We further extend our result to the matrix sensing problem. This
    allows us to establish strong global convergence guarantees for popular
    iterative algorithms with arbitrary initial solutions.

    Linear Learning with Sparse Data

    Ofer Dekel
    Subjects: Learning (cs.LG)

    Linear predictors are especially useful when the data is high-dimensional and
    sparse. One of the standard techniques used to train a linear predictor is the
    Averaged Stochastic Gradient Descent (ASGD) algorithm. We present an efficient
    implementation of ASGD that avoids dense vector operations. We also describe a
    translation invariant extension called Centered Averaged Stochastic Gradient
    Descent (CASGD).

    Modeling documents with Generative Adversarial Networks

    John Glover
    Subjects: Learning (cs.LG)

    This paper describes a method for using Generative Adversarial Networks to
    learn distributed representations of natural language documents. We propose a
    model that is based on the recently proposed Energy-Based GAN, but instead uses
    a Denoising Autoencoder as the discriminator network. Document representations
    are extracted from the hidden layer of the discriminator and evaluated both
    quantitatively and qualitatively.

    Selecting Bases in Spectral learning of Predictive State Representations via Model Entropy

    Yunlong Liu, Hexing Zhu
    Comments: 9 papges, 4 figures
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Predictive State Representations (PSRs) are powerful techniques for modelling
    dynamical systems, which represent a state as a vector of predictions about
    future observable events (tests). In PSRs, one of the fundamental problems is
    the learning of the PSR model of the underlying system. Recently, spectral
    methods have been successfully used to address this issue by treating the
    learning problem as the task of computing an singular value decomposition (SVD)
    over a submatrix of a special type of matrix called the Hankel matrix. Under
    the assumptions that the rows and columns of the submatrix of the Hankel Matrix
    are sufficient~(which usually means a very large number of rows and columns,
    and almost fails in practice) and the entries of the matrix can be estimated
    accurately, it has been proven that the spectral approach for learning PSRs is
    statistically consistent and the learned parameters can converge to the true
    parameters. However, in practice, due to the limit of the computation ability,
    only a finite set of rows or columns can be chosen to be used for the spectral
    learning. While different sets of columns usually lead to variant accuracy of
    the learned model, in this paper, we propose an approach for selecting the set
    of columns, namely basis selection, by adopting a concept of model entropy to
    measure the accuracy of the learned model. Experimental results are shown to
    demonstrate the effectiveness of the proposed approach.

    Deep Learning and Hierarchal Generative Models

    Elchanan Mossel
    Subjects: Learning (cs.LG)

    In this paper we propose a new prism for studying deep learning motivated by
    connections between deep learning and evolution. Our main contributions are: 1,
    We introduce of a sequence of increasingly complex hierarchal generative models
    which interpolate between standard Markov models on trees (phylogenetic models)
    and deep learning models. 2. Formal definitions of classes of algorithms that
    are not deep. 3. Rigorous proofs showing that such classes are information
    theoretically much weaker than optimal “deep” learning algorithms. In our
    models, deep learning is performed efficiently and proven to classify correctly
    with high probability. All of the models and results are in the semi-supervised
    setting. Many open problems and future directions are presented.

    Meta-Unsupervised-Learning: A supervised approach to unsupervised learning

    Vikas K. Garg, Adam Tauman Kalai
    Comments: 22 pages
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    We introduce a new paradigm to investigate unsupervised learning, reducing
    unsupervised learning to supervised learning. Specifically, we mitigate the
    subjectivity in unsupervised decision-making by leveraging knowledge acquired
    from prior, possibly heterogeneous, supervised learning tasks. We demonstrate
    the versatility of our framework via comprehensive expositions and detailed
    experiments on several unsupervised problems such as (a) clustering, (b)
    outlier detection, and (c) similarity prediction under a common umbrella of
    meta-unsupervised-learning. We also provide rigorous PAC-agnostic bounds to
    establish the theoretical foundations of our framework, and show that our
    framing of meta-clustering circumvents Kleinberg’s impossibility theorem for
    clustering.

    The Predictron: End-To-End Learning and Planning

    David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto, Thomas Degris
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    One of the key challenges of artificial intelligence is to learn models that
    are effective in the context of planning. In this document we introduce the
    predictron architecture. The predictron consists of a fully abstract model,
    represented by a Markov reward process, that can be rolled forward multiple
    “imagined” planning steps. Each forward pass of the predictron accumulates
    internal rewards and values over multiple planning depths. The predictron is
    trained end-to-end so as to make these accumulated values accurately
    approximate the true value function. We applied the predictron to procedurally
    generated random mazes and a simulator for the game of pool. The predictron
    yielded significantly more accurate predictions than conventional deep neural
    network architectures.

    Provable learning of Noisy-or Networks

    Sanjeev Arora, Rong Ge, Tengyu Ma, Andrej Risteski
    Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)

    Many machine learning applications use latent variable models to explain
    structure in data, whereby visible variables (= coordinates of the given
    datapoint) are explained as a probabilistic function of some hidden variables.
    Finding parameters with the maximum likelihood is NP-hard even in very simple
    settings. In recent years, provably efficient algorithms were nevertheless
    developed for models with linear structures: topic models, mixture models,
    hidden markov models, etc. These algorithms use matrix or tensor decomposition,
    and make some reasonable assumptions about the parameters of the underlying
    model.

    But matrix or tensor decomposition seems of little use when the latent
    variable model has nonlinearities. The current paper shows how to make
    progress: tensor decomposition is applied for learning the single-layer {em
    noisy or} network, which is a textbook example of a Bayes net, and used for
    example in the classic QMR-DT software for diagnosing which disease(s) a
    patient may have by observing the symptoms he/she exhibits.

    The technical novelty here, which should be useful in other settings in
    future, is analysis of tensor decomposition in presence of systematic error
    (i.e., where the noise/error is correlated with the signal, and doesn’t
    decrease as number of samples goes to infinity). This requires rethinking all
    steps of tensor decomposition methods from the ground up.

    For simplicity our analysis is stated assuming that the network parameters
    were chosen from a probability distribution but the method seems more generally
    applicable.

    Automatic composition and optimisation of multicomponent predictive systems

    Manuel Martin Salvador, Marcin Budka, Bogdan Gabrys
    Subjects: Learning (cs.LG)

    Composition and parametrisation of multicomponent predictive systems (MCPSs)
    consisting of chains of data transformation steps is a challenging task. This
    paper is concerned with theoretical considerations and extensive experimental
    analysis for automating the task of building such predictive systems. In the
    theoretical part of the paper, we first propose to adopt the Well-handled and
    Acyclic Workflow (WA-WF) Petri net as a formal representation of MCPSs. We then
    define the optimisation problem in which the search space consists of suitably
    parametrised directed acyclic graphs (i.e. WA-WFs) forming the sought MCPS
    solutions. In the experimental analysis we focus on examining the impact of
    considerably extending the search space resulting from incorporating multiple
    sequential data cleaning and preprocessing steps in the process of composing
    optimised MCPSs, and the quality of the solutions found. In a range of
    extensive experiments three different optimisation strategies are used to
    automatically compose MCPSs for 21 publicly available datasets and 7 datasets
    from real chemical processes. The diversity of the composed MCPSs found is an
    indication that fully and automatically exploiting different combinations of
    data cleaning and preprocessing techniques is possible and highly beneficial
    for different predictive models. Our findings can have a major impact on
    development of high quality predictive models as well as their maintenance and
    scalability aspects needed in modern applications and deployment scenarios.

    Generalized Intersection Kernel

    Ping Li
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Following the very recent line of work on the “generalized min-max” (GMM)
    kernel, this study proposes the “generalized intersection” (GInt) kernel and
    the related “normalized generalized min-max” (NGMM) kernel. In computer
    vision, the (histogram) intersection kernel has been popular, and the GInt
    kernel generalizes it to data which can have both negative and positive
    entries. Through an extensive empirical classification study on 40 datasets
    from the UCI repository, we are able to show that this (tuning-free) GInt
    kernel performs fairly well.

    The empirical results also demonstrate that the NGMM kernel typically
    outperforms the GInt kernel. Interestingly, the NGMM kernel has another
    interpretation — it is the “asymmetrically transformed” version of the GInt
    kernel, based on the idea of “asymmetric hashing”. Just like the GMM kernel,
    the NGMM kernel can be efficiently linearized through (e.g.,) generalized
    consistent weighted sampling (GCWS), as empirically validated in our study.
    Owing to the discrete nature of hashed values, it also provides a scheme for
    approximate near neighbor search.

    Deep neural heart rate variability analysis

    Tamas Madl
    Comments: 6 pages in NIPS 2016 Workshop on Machine Learning for Health (ML4HC)
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Despite of the pain and limited accuracy of blood tests for early recognition
    of cardiovascular disease, they dominate risk screening and triage. On the
    other hand, heart rate variability is non-invasive and cheap, but not
    considered accurate enough for clinical practice. Here, we tackle heart beat
    interval based classification with deep learning. We introduce an end to end
    differentiable hybrid architecture, consisting of a layer of biological neuron
    models of cardiac dynamics (modified FitzHugh Nagumo neurons) and several
    layers of a standard feed-forward neural network. The proposed model is
    evaluated on ECGs from 474 stable at-risk (coronary artery disease) patients,
    and 1172 chest pain patients of an emergency department. We show that it can
    significantly outperform models based on traditional heart rate variability
    predictors, as well as approaching or in some cases outperforming clinical
    blood tests, based only on 60 seconds of inter-beat intervals.

    The interplay between system identification and machine learning

    Gianluigi Pillonetto
    Subjects: Systems and Control (cs.SY); Learning (cs.LG); Machine Learning (stat.ML)

    Learning from examples is one of the key problems in science and engineering.
    It deals with function reconstruction from a finite set of direct and noisy
    samples. Regularization in reproducing kernel Hilbert spaces (RKHSs) is widely
    used to solve this task and includes powerful estimators such as regularization
    networks. Recent achievements include the proof of the statistical consistency
    of these kernel- based approaches. Parallel to this, many different system
    identification techniques have been developed but the interaction with machine
    learning does not appear so strong yet. One reason is that the RKHSs usually
    employed in machine learning do not embed the information available on dynamic
    systems, e.g. BIBO stability. In addition, in system identification the
    independent data assumptions routinely adopted in machine learning are never
    satisfied in practice. This paper provides new results which strengthen the
    connection between system identification and machine learning. Our starting
    point is the introduction of RKHSs of dynamic systems. They contain functionals
    over spaces defined by system inputs and allow to interpret system
    identification as learning from examples. In both linear and nonlinear
    settings, it is shown that this perspective permits to derive in a relatively
    simple way conditions on RKHS stability (i.e. the property of containing only
    BIBO stable systems or predictors), also facilitating the design of new kernels
    for system identification. Furthermore, we prove the convergence of the
    regularized estimator to the optimal predictor under conditions typical of
    dynamic systems.

    Sequence-to-point learning with neural networks for nonintrusive load monitoring

    Chaoyun Zhang, Mingjun Zhong, Zongzuo Wang, Nigel Goddard, Charles Sutton
    Comments: 15 pages, 7 figures
    Subjects: Applications (stat.AP); Learning (cs.LG)

    Energy disaggregation (a.k.a nonintrusive load monitoring, NILM), a
    single-channel blind source separation problem, aims to decompose the mains
    which records the whole electricity consumption into appliance-wise readings.
    This problem is difficult because it is inherently unidentifiable. Recent
    approaches have shown that the identifiability problem could be reduced by
    introducing domain knowledge into the model. Deep neural networks have been
    shown to be promising to tackle this problem in literatures. However, it is not
    clear why and how the neural networks could work for this problem. In this
    paper, we propose sequence-to-point learning for NILM, where the input is a
    window of the mains and the output is a single point of the target appliance.
    We use convolutional neural networks to train the model. Interestingly, we
    systematically show that the convolutional neural networks can inherently learn
    the signatures of the target appliances, which are automatically added into the
    model to reduce the identifiability problem. We applied the proposed neural
    network approaches to a real-world household energy data, and show that the
    methods achieve the state-of-the-art performance.

    Geometric descent method for convex composite minimization

    Shixiang Chen, Shiqian Ma
    Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)

    In this paper, we extend the geometric descent method recently proposed by
    Bubeck, Lee and Singh to solving nonsmooth and strongly convex composite
    problems. We prove that the resulting algorithm, GeoPG, converges with a linear
    rate ((1-1/sqrt{kappa})), thus achieves the optimal rate among first-order
    methods, where (kappa) is the condition number of the problem. Numerical
    results on linear regression and logistic regression with elastic net
    regularization show that GeoPG compares favorably with Nesterov’s accelerated
    proximal gradient method.

    A Deep Learning Approach To Multiple Kernel Fusion

    Huan Song, Jayaraman J. Thiagarajan, Prasanna Sattigeri, Karthikeyan Natesan Ramamurthy, Andreas Spanias
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Kernel fusion is a popular and effective approach for combining multiple
    features that characterize different aspects of data. Traditional approaches
    for Multiple Kernel Learning (MKL) attempt to learn the parameters for
    combining the kernels through sophisticated optimization procedures. In this
    paper, we propose an alternative approach that creates dense embeddings for
    data using the kernel similarities and adopts a deep neural network
    architecture for fusing the embeddings. In order to improve the effectiveness
    of this network, we introduce the kernel dropout regularization strategy
    coupled with the use of an expanded set of composition kernels. Experiment
    results on a real-world activity recognition dataset show that the proposed
    architecture is effective in fusing kernels and achieves state-of-the-art
    performance.

    Efficient iterative policy optimization

    Nicolas Le Roux
    Comments: 12 pages
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO)

    We tackle the issue of finding a good policy when the number of policy
    updates is limited. This is done by approximating the expected policy reward as
    a sequence of concave lower bounds which can be efficiently maximized,
    drastically reducing the number of policy updates required to achieve good
    performance. We also extend existing methods to negative rewards, enabling the
    use of control variates.

    The Pessimistic Limits of Margin-based Losses in Semi-supervised Learning

    Jesse H. Krijthe, Marco Loog
    Comments: 16 pages, 1 figure, 1 table
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We show that for linear classifiers defined by convex margin-based surrogate
    losses that are monotonically decreasing, it is impossible to construct any
    semi-supervised approach that is able to guarantee an improvement over the
    supervised classifier measured by this surrogate loss. For non-monotonically
    decreasing loss functions, we demonstrate safe improvements are possible.


    Information Theory

    Conditional Central Limit Theorems for Gaussian Projections

    Galen Reeves
    Subjects: Information Theory (cs.IT); Statistics Theory (math.ST)

    This paper addresses the question of when projections of a high-dimensional
    random vector are approximately Gaussian. This problem has been studied
    previously in the context of high-dimensional data analysis, where the focus is
    on low-dimensional projections of high-dimensional point clouds. The focus of
    this paper is on the typical behavior when the projections are generated by an
    i.i.d. Gaussian projection matrix. The main results are bounds on the deviation
    between the conditional distribution of the projections and a Gaussian
    approximation, where the conditioning is on the projection matrix. The bounds
    are given in terms of the quadratic Wasserstein distance and relative entropy
    and are stated explicitly as a function of the number of projections and
    certain key properties of the random vector. The proof uses Talagrand’s
    transportation inequality and a general integral-moment inequality for mutual
    information. Applications to random linear estimation and compressed sensing
    are discussed.

    Joint Channel Estimation and Nonlinear Distortion Compensation in OFDM Receivers

    Sergey V. Zhidkov
    Subjects: Information Theory (cs.IT)

    Nonlinear distortion in power amplifiers (PA) can significantly degrade
    performance of orthogonal frequency division multiplexed (OFDM) communication
    systems. This paper presents a joint maximum-likelihood channel frequency
    response and nonlinear PA model estimator for OFDM signals. Derivation of the
    estimator is based on Taylor-series representation of power amplifier
    nonlinearity and is suitable for wide range of memoryless PA models. A
    sub-optimal decision-aided algorithm for adaptive compensation of nonlinear
    distortion effects at the receiver-side is also presented. It is shown that the
    proposed algorithms can be used in IEEE 802.11a/g/p/ac compliant wireless LAN
    receivers without any modifications at the transmitter side. The performance of
    the proposed algorithms is studied by means of computer simulation.

    Weighted information and entropy rates

    Yuri Suhov, Izabella Stuhl
    Subjects: Information Theory (cs.IT); Probability (math.PR)

    The weighted entropy (H^{
    m w}_phi (X)=H^{
    m w}_phi (f)) of a random
    variable (X) with values (x) and a probability-mass/density function (f) is
    defined as the mean value ({mathbb E} I^{
    m w}_phi(X)) of the weighted
    information (I^{
    m w}_phi (x)=-phi (x)log,f(x)). Here (xmapstophi
    (x)in{mathbb R}) is a given weight function (WF) indicating a ‘value’ of
    outcome (x). For an (n)-component random vector
    ({mathbf{X}}_0^{n-1}=(X_0,ldots ,X_{n-1})) produced by a random process
    ({mathbf{X}}=(X_i,iin{mathbb Z})), the weighted information (I^{
    m
    w}_{phi_n}({mathbf x}_0^{n-1})) and weighted entropy (H^{
    m
    w}_{phi_n}({mathbf{X}}_0^{n-1})) are defined similarly, with an WF
    (phi_n({mathbf x}_0^{n-1})). Two types of WFs (phi_n) are considered, based
    on additive and a multiplicative forms ((phi_n({mathbf
    x}_0^{n-1})=sumlimits_{i=0}^{n-1}{varphi} (x_i)) and (phi_n({mathbf
    x}_0^{n-1})=prodlimits_{i=0}^{n-1}{varphi} (x_i)), respectively). The focus
    is upon ({it rates}) of the weighted entropy and information, regarded as
    parameters related to ({mathbf{X}}). We show that, in the context of
    ergodicity, a natural scale for an asymptotically additive/multiplicative WF is
    (frac{1}{n^2}H^{
    m w}_{phi_n}({mathbf{X}}_0^{n-1})) and
    (frac{1}{n}log;H^{
    m w}_{phi_n}({mathbf{X}}_0^{n-1})), respectively. This
    gives rise to ({it primary}) ({it rates}). The next-order terms can also be
    identified, leading to ({it secondary}) ({it rates}). We also consider
    emerging generalisations of the Shannon-McMillan-Breiman theorem.

    Fundamental Limits of Caching: Improved Bounds with Coded Prefetching

    Jesús Gómez-Vilardebó
    Subjects: Information Theory (cs.IT)

    We consider a cache network in which a single server is connected to multiple
    users via a shared error free link. The server has access do a database with
    (N) files of equal length (F), and serves (K) users each with a cache memory of
    (MF) bits. A novel centralized coded caching scheme is proposed for scenarios
    with more users than files (Nleq K) and cache capacities satisfying
    (frac{1}{K}leq Mleqfrac{N}{K}). It is shown, that the proposed scheme
    outperforms the best delivery rate known in the literature.

    Active Hypothesis Testing on A Tree: Anomaly Detection under Hierarchical Observations

    Chao Wang, Qing Zhao, Kobi Cohen
    Subjects: Information Theory (cs.IT)

    The problem of detecting a few anomalous processes among a large number of
    processes is considered. At each time, aggregated observations can be taken
    from a chosen subset of processes, where the chosen subset must conform to a
    given binary tree structure. The random observations are i.i.d. over time with
    a distribution that depends on the size of the chosen subset and the number of
    anomalous processes in the subset. The objective is a sequential search
    strategy that minimizes the sample complexity (i.e., expected detection time)
    subject to a reliability constraint. A sequential test that results in a biased
    random walk on the tree is developed and is shown to be asymptotically optimal
    in the reliability constraint and order optimal in the number of processes
    under certain conditions on the hierarchical observation model.

    Phase Retrieval Via Reweighted Wirtinger Flow

    Ziyang Yuan, Hongxia Wang
    Comments: 21 pages, 11 figures
    Subjects: Information Theory (cs.IT)

    Phase retrieval(PR) problem is a kind of ill-condition inverse problem which
    is arising in various of applications. Based on the Wirtinger flow(WF) method,
    a reweighted Wirtinger flow(RWF) method is proposed to deal with PR problem.
    RWF finds the global optimum by solving a series of sub-PR problems with
    changing weights. Theoretical analyses illustrate that the RWF has a geometric
    convergence from a deliberate initialization when the weights are bounded by 1
    and (frac{10}{9}). Numerical testing shows RWF has a lower sampling complexity
    compared with WF. As an essentially adaptive truncated Wirtinger flow(TWF)
    method, RWF performs better than TWF especially when the ratio between sampling
    number (m) and length of signal (n) is small.

    On Covert Communication with Noise Uncertainty

    Biao He, Shihao Yan, Xiangyun Zhou, Vincent K. N. Lau
    Comments: to appear in IEEE Communications Letters
    Subjects: Information Theory (cs.IT)

    Prior studies on covert communication with noise uncertainty adopted a
    worst-case approach from the warden’s perspective. That is, the worst-case
    detection performance of the warden is used to assess covertness, which is
    overly optimistic. Instead of simply considering the worst limit, in this work,
    we take the distribution of noise uncertainty into account to evaluate the
    overall covertness in a statistical sense. Specifically, we define new metrics
    for measuring the covertness, which are then adopted to analyze the maximum
    achievable rate for a given covertness requirement under both bounded and
    unbounded noise uncertainty models.

    Coding for the Permutation Channel with Insertions, Deletions, Substitutions, and Erasures

    Mladen Kovačević, Vincent Y. F. Tan
    Comments: 5 pages (double-column)
    Subjects: Information Theory (cs.IT); Discrete Mathematics (cs.DM)

    This paper is motivated by the error-control problem in communication
    channels in which the transmitted sequences are subjected to random
    permutations, in addition to being impaired with insertions, deletions,
    substitutions, and erasures of symbols. Bounds on the size of optimal codes in
    this setting are derived, and their asymptotic behavior examined in the
    fixed-minimum-distance regime. A family of codes correcting these types of
    errors is described and is shown to be asymptotically optimal for some sets of
    parameters. The corresponding error-detection problem is also analyzed.
    Applications to data transmission over packet networks based on multipath
    routing are discussed.

    Several Classes of Permutation Trinomials From Niho Exponents

    Nian Li, Tor Helleseth
    Comments: Cryptography and Communications
    Subjects: Information Theory (cs.IT)

    Motivated by recent results on the constructions of permutation polynomials
    with few terms over the finite field (mathbb{F}_{2^n}), where (n) is a
    positive even integer, we focus on the construction of permutation trinomials
    over (mathbb{F}_{2^n}) from Niho exponents. As a consequence, several new
    classes of permutation trinomials over (mathbb{F}_{2^n}) are constructed from
    Niho exponents based on some subtle manipulation of solving equations with low
    degrees over finite fields.

    Non-Linear Programming: Maximize SNR for Designing Spreading Sequence – Part II: Conditions for Optimal Spreading Sequences

    Hirofumi Tsuda, Ken Umeno
    Comments: 11 pages 20 figures
    Subjects: Information Theory (cs.IT)

    Signal to Noise Ratio (SNR) is an important index for wireless
    communications. In CDMA systems, spreading sequences are utilized. This series
    of papers show the method to derive spreading sequences as the solutions of
    non-linear programming: maximize SNR. In this paper, we derive the optimization
    problems with the expression SNR derived in Part I and the necessary conditions
    for the global solutions. We numerically solve the problems and evaluate the
    solutions with two factors, mean-square correlations and maximum mean-square
    correlations.

    Logarithmic Coherence: Operational interpretation of (l_1)-norm Coherence

    Swapan Rana, Preeti Parashar, Andreas Winter, Maciej Lewenstein
    Comments: 5+4 pages, 2 figures, comments welcome
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Mathematical Physics (math-ph)

    We show that the distillable coherence–which is equal to the relative
    entropy of coherence, is, at most up to a constant, always bounded by the
    (l_1)-norm based measure of coherence (which equals to the sum of absolute
    values of off-diagonals). Thus, the latter plays a similar role as logarithmic
    negativity plays in entanglement theory, and this is the best operational
    interpretation from resource-theoretic viewpoint. Consequently the two measures
    are intimately connected to another operational measure, the robustness of
    coherence. We also find the tightest possible bound between the two, namely,
    given (l_1)-coherence we find states having minimum (and maximum for pure
    state) distillable coherence. For any given robustness, the same state also
    minimizes the distillable coherence.




沪ICP备19023445号-2号
友情链接