IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    arXiv Paper Daily: Thu, 1 Jun 2017

    我爱机器学习(52ml.net)发表于 2017-06-01 00:00:00
    love 0

    Neural and Evolutionary Computing

    End-to-end Differentiable Proving

    Tim Rocktäschel, Sebastian Riedel
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG); Logic in Computer Science (cs.LO)

    We introduce neural networks for end-to-end differentiable theorem proving
    that operate on dense vector representations of symbols. These neural networks
    are constructed recursively by taking inspiration from the backward chaining
    algorithm as used in Prolog. Specifically, we replace symbolic unification with
    a differentiable computation on vector representations of symbols using a
    radial basis function kernel, thereby combining symbolic reasoning with
    learning subsymbolic vector representations. By using gradient descent, the
    resulting neural network can be trained to infer facts from a given incomplete
    knowledge base. It learns to (i) place representations of similar symbols in
    close proximity in a vector space, (ii) make use of such similarities to prove
    facts, (iii) induce logical rules, and (iv) use provided and induced logical
    rules for complex multi-hop reasoning. We demonstrate that this architecture
    outperforms ComplEx, a state-of-the-art neural link prediction model, on four
    benchmark knowledge bases while at the same time inducing interpretable
    function-free first-order logic rules.

    SuperSpike: Supervised learning in multi-layer spiking neural networks

    Friedemann Zenke, Surya Ganguli
    Subjects: Neurons and Cognition (q-bio.NC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    A vast majority of computation in the brain is performed by spiking neural
    networks. Despite the ubiquity of such spiking, we currently lack an
    understanding of how biological spiking neural circuits learn and compute
    in-vivo, as well as how we can instantiate such capabilities in artificial
    spiking circuits in-silico. Here we revisit the problem of supervised learning
    in temporally coding multi-layer spiking neural networks. First, by using a
    surrogate gradient approach, we derive SuperSpike, a nonlinear voltage-based
    three factor learning rule capable of training multi-layer networks of
    deterministic integrate-and-fire neurons to perform nonlinear computations on
    spatiotemporal spike patterns. Second, inspired by recent results on feedback
    alignment, we compare the performance of our learning rule under different
    credit assignment strategies for propagating output errors to hidden units.
    Specifically, we test uniform, symmetric and random feedback, finding that
    simpler tasks can be solved with any type of feedback, while more complex tasks
    require symmetric feedback. In summary, our results open the door to obtaining
    a better scientific understanding of learning and computation in spiking neural
    networks by advancing our ability to train them to solve nonlinear problems
    involving transformations between different spatiotemporal spike-time patterns.

    Non-Markovian Control with Gated End-to-End Memory Policy Networks

    Julien Perez, Tomi Silander
    Comments: 11 pages, 1 figure, 1 table
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Partially observable environments present an important open challenge in the
    domain of sequential control learning with delayed rewards. Despite numerous
    attempts during the two last decades, the majority of reinforcement learning
    algorithms and associated approximate models, applied to this context, still
    assume Markovian state transitions. In this paper, we explore the use of a
    recently proposed attention-based model, the Gated End-to-End Memory Network,
    for sequential control. We call the resulting model the Gated End-to-End Memory
    Policy Network. More precisely, we use a model-free value-based algorithm to
    learn policies for partially observed domains using this memory-enhanced neural
    network. This model is end-to-end learnable and it features unbounded memory.
    Indeed, because of its attention mechanism and associated non-parametric
    memory, the proposed model allows us to define an attention mechanism over the
    observation stream unlike recurrent models. We show encouraging results that
    illustrate the capability of our attention-based model in the context of the
    continuous-state non-stationary control problem of stock trading. We also
    present an OpenAI Gym environment for simulated stock exchange and explain its
    relevance as a benchmark for the field of non-Markovian decision process
    learning.

    Adversarial Generation of Natural Language

    Sai Rajeswar, Sandeep Subramanian, Francis Dutil, Christopher Pal, Aaron Courville
    Comments: 11 pages, 3 figures, 5 tables
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Generative Adversarial Networks (GANs) have gathered a lot of attention from
    the computer vision community, yielding impressive results for image
    generation. Advances in the adversarial generation of natural language from
    noise however are not commensurate with the progress made in generating images,
    and still lag far behind likelihood based methods. In this paper, we take a
    step towards generating natural language with a GAN objective alone. We
    introduce a simple baseline that addresses the discrete output space problem
    without relying on gradient estimators and show that it is able to achieve
    state-of-the-art results on a Chinese poem generation dataset. We present
    quantitative results on generating sentences from context-free and
    probabilistic context-free grammars, and qualitative language modeling results.
    A conditional version is also described that can generate sequences conditioned
    on sentence characteristics.

    A Tale of Two Animats: What does it take to have goals?

    Larissa Albantakis
    Comments: This article is a contribution to the FQXi 2016-2017 essay contest “Wandering Towards a Goal”
    Subjects: Neurons and Cognition (q-bio.NC); Neural and Evolutionary Computing (cs.NE)

    What does it take for a system, biological or not, to have goals? Here, this
    question is approached in the context of in silico artificial evolution. By
    examining the informational and causal properties of artificial organisms
    (‘animats’) controlled by small, adaptive neural networks (Markov Brains), this
    essay discusses necessary requirements for intrinsic information, autonomy, and
    meaning. The focus lies on comparing two types of Markov Brains that evolved in
    the same simple environment: one with purely feedforward connections between
    its elements, the other with an integrated set of elements that causally
    constrain each other. While both types of brains ‘process’ information about
    their environment and are equally fit, only the integrated one forms a causally
    autonomous entity above a background of external influences. This suggests that
    to assess whether goals are meaningful for a system itself, it is important to
    understand what the system is, rather than what it does.

    Practical Neural Network Performance Prediction for Early Stopping

    Bowen Baker, Otkrist Gupta, Ramesh Raskar, Nikhil Naik
    Comments: Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    In the neural network domain, methods for hyperparameter optimization and
    meta-modeling are computationally expensive due to the need to train a large
    number of neural network configurations. In this paper, we show that a simple
    regression model, based on support vector machines, can predict the final
    performance of partially trained neural network configurations using features
    based on network architectures, hyperparameters, and time-series validation
    performance data. We use this regression model to develop an early stopping
    strategy for neural network configurations. With this early stopping strategy,
    we obtain significant speedups in both hyperparameter optimization and
    meta-modeling. Particularly in the context of meta-modeling, our method can
    learn to predict the performance of drastically different architectures and is
    seamlessly incorporated into reinforcement learning-based architecture
    selection algorithms. Finally, we show that our method is simpler, faster, and
    more accurate than Bayesian methods for learning curve prediction.


    Computer Vision and Pattern Recognition

    U-Phylogeny: Undirected Provenance Graph Construction in the Wild

    Aparna Bharati, Daniel Moreira, Allan Pinto, Joel Brogan, Kevin Bowyer, Patrick Flynn, Walter Scheirer, Anderson Rocha
    Comments: 5 pages, Accepted in International Conference on Image Processing, 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deriving relationships between images and tracing back their history of
    modifications are at the core of Multimedia Phylogeny solutions, which aim to
    combat misinformation through doctored visual media. Nonetheless, most recent
    image phylogeny solutions cannot properly address cases of forged composite
    images with multiple donors, an area known as multiple parenting phylogeny
    (MPP). This paper presents a preliminary undirected graph construction solution
    for MPP, without any strict assumptions. The algorithm is underpinned by robust
    image representative keypoints and different geometric consistency checks among
    matching regions in both images to provide regions of interest for direct
    comparison. The paper introduces a novel technique to geometrically filter the
    most promising matches as well as to aid in the shared region localization
    task. The strength of the approach is corroborated by experiments with
    real-world cases, with and without image distractors (unrelated cases).

    Long-term Correlation Tracking using Multi-layer Hybrid Features in Sparse and Dense Environments

    Nathanael L. Baisa, Deepayan Bhowmik, Andrew Wallace
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Tracking a target of interest in both sparse and crowded environments is a
    challenging problem, not yet successfully addressed in the literature. In this
    paper, we propose a new long-term visual tracking algorithm, learning
    discriminative correlation filters and using an online classifier, to track a
    target of interest in both sparse and crowded video sequences. First, we learn
    a translation correlation filter using a multi-layer hybrid of convolutional
    neural networks (CNN) and traditional hand-crafted features. We combine
    advantages of both the lower convolutional layer which retains more spatial
    details for precise localization and the higher convolutional layer which
    encodes semantic information for handling appearance variations, and then
    integrate these with histogram of oriented gradients (HOG) and color-naming
    traditional features. Second, we include a re-detection module for overcoming
    tracking failures due to long-term occlusions by training an incremental
    (online) SVM on the most confident frames using hand-engineered features. This
    re-detection module is activated only when the correlation response of the
    object is below some pre-defined threshold. This generates high score detection
    proposals which are temporally filtered using a Gaussian mixture probability
    hypothesis density (GM-PHD) filter to find the detection proposal with the
    maximum weight as the target state estimate by removing the other detection
    proposals as clutter. Finally, we learn a scale correlation filter for
    estimating the scale of a target by constructing a target pyramid around the
    estimated or re-detected position using the HOG features. We carry out
    extensive experiments on both sparse and dense data sets which show that our
    method significantly outperforms state-of-the-art methods.

    Adversarial Inversion: Inverse Graphics with Adversarial Priors

    Hsiao-Yu Fish Tung, Adam Harley, William Seto, Katerina Fragkiadaki
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose adversarial inversion, a weakly supervised neural network model
    that combines inverse rendering with adversarial networks. Given a set of
    images, our inverse rendering encoder predicts a set of latent factors (e.g.,
    depth, camera pose), which a renderer then projects to reconstruct (part of)
    the visual input. Inversion is often ambiguous, e.g., many compositions of 3D
    shape and camera pose give rise to the same 2D projection. To address this
    ambiguity, we impose priors on the predicted latent factors, through an
    adversarial discriminator network trained to discriminate between predicted
    factors and ground-truth ones. Training adversarial inversion does not require
    input-output paired annotations, but merely a collection of ground-truth
    factors, unrelated (unpaired) to the current input. Our model can thus be
    self-supervised by unlabelled image data, by minimizing a joint reconstruction
    and adversarial loss, complementing any direct supervision provided by paired
    annotations. We apply adversarial inversion to 3D human pose estimation and 3D
    structure and egomotion estimation, and outperform models supervised by only
    paired annotations, and/or reconstruction losses, that do not use adversarial
    priors. Applying adversarial inversion to super-resolution and inpainting
    results in automated “visual plastic surgery”. In adversarial super-resolution,
    when the discriminator is provided with young, old, female, male, or Tom Cruise
    faces as ground-truth, our model renders the input face image towards its
    young, old, feminine, masculine or Tom Cruise-like equivalent. In adversarial
    inpainting, when the discriminator is provided with faces with big lips or big
    noses as ground-truth, it creates visual lip or nose augmentations.

    Representation Learning by Rotating Your Faces

    Luan Tran, Xi Yin, Xiaoming Liu
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The large pose discrepancy between two face images is one of the fundamental
    challenges in automatic face recognition. Conventional approaches to
    pose-invariant face recognition either perform face frontalization on, or learn
    a pose-invariant representation from, a non-frontal face image. We argue that
    it is more desirable to perform both tasks jointly to allow them to leverage
    each other. To this end, this paper proposes a Disentangled Representation
    learning-Generative Adversarial Network (DR-GAN) with three distinct novelties.
    First, the encoder-decoder structure of the generator enables DR-GAN to learn a
    representation that is both generative and discriminative, which can be used
    for face image synthesis and pose-invariant face recognition. Second, this
    representation is explicitly disentangled from other face variations such as
    pose, through the pose code provided to the decoder and pose estimation in the
    discriminator. Third, DR-GAN can take one or multiple images as the input, and
    generate one unified identity representation along with an arbitrary number of
    synthetic face images. Extensive quantitative and qualitative evaluation on a
    number of controlled and in-the-wild databases demonstrate the superiority of
    DR-GAN over the state of the art in both learning representations and rotating
    large-pose face images.

    EvaluationNet: Can Human Skill be Evaluated by Deep Networks?

    Seong Tae Kim, Yong Man Ro
    Comments: 8 pages, 6 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    With the recent substantial growth of media such as YouTube, a considerable
    number of instructional videos covering a wide variety of tasks are available
    online. Therefore, online instructional videos have become a rich resource for
    humans to learn everyday skills. In order to improve the effectiveness of the
    learning with instructional video, observation and evaluation of the activity
    are required. However, it is difficult to observe and evaluate every activity
    steps by expert. In this study, a novel deep learning framework which targets
    human activity evaluation for learning from instructional video has been
    proposed. In order to deal with the inherent variability of activities, we
    propose to model activity as a structured process. First, action units are
    encoded from dense trajectories with LSTM network. The variable-length action
    unit features are then evaluated by a Siamese LSTM network. By the comparative
    experiments on public dataset, the effectiveness of the proposed method has
    been demonstrated.

    Neuron Segmentation Using Deep Complete Bipartite Networks

    Jianxu Chen, Sreya Banerjee, Abhinav Grama, Walter J. Scheirer, Danny Z. Chen
    Comments: miccai 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we consider the problem of automatically segmenting neuronal
    cells in dual-color confocal microscopy images. This problem is a key task in
    various quantitative analysis applications in neuroscience, such as tracing
    cell genesis in Danio rerio (zebrafish) brains. Deep learning, especially using
    fully convolutional networks (FCN), has profoundly changed segmentation
    research in biomedical imaging. We face two major challenges in this problem.
    First, neuronal cells may form dense clusters, making it difficult to correctly
    identify all individual cells (even to human experts). Consequently,
    segmentation results of the known FCN-type models are not accurate enough.
    Second, pixel-wise ground truth is difficult to obtain. Only a limited amount
    of approximate instance-wise annotation can be collected, which makes the
    training of FCN models quite cumbersome. We propose a new FCN-type deep
    learning model, called deep complete bipartite networks (CB-Net), and a new
    scheme for leveraging approximate instance-wise annotation to train our
    pixel-wise prediction model. Evaluated using seven real datasets, our proposed
    new CB-Net model outperforms the state-of-the-art FCN models and produces
    neuron segmentation results of remarkable quality

    Deep Supervised Discrete Hashing

    Qi Li, Zhenan Sun, Ran He, Tieniu Tan
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    With the rapid growth of image and video data on the web, hashing has been
    extensively studied for image or video search in recent years. Benefit from
    recent advances in deep learning, deep hashing methods have achieved promising
    results for image retrieval. However, there are some limitations of previous
    deep hashing methods (e.g., the semantic information is not fully exploited).
    In this paper, we develop a deep supervised discrete hashing algorithm based on
    the assumption that the learned binary codes should be ideal for
    classification. Both the pairwise label information and the classification
    information are used to learn the hash codes within one stream framework. We
    constrain the outputs of the last layer to be binary codes directly, which is
    rarely investigated in deep hashing algorithm. Because of the discrete nature
    of hash codes, an alternating minimization method is used to optimize the
    objective function. Experimental results have shown that our method outperforms
    current state-of-the-art methods on benchmark datasets.

    Class Specific Feature Selection for Interval Valued Data Through Interval K-Means Clustering

    D. S. Guru, N. Vinay Kumar
    Comments: 12 Pages, 3 figures, 7 tables
    Journal-ref: RTIP2R 2016, CCIS 709, pp. 228 TO 239, 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, a novel feature selection approach for supervised interval
    valued features is proposed. The proposed approach takes care of selecting the
    class specific features through interval K-Means clustering. The kernel of
    K-Means clustering algorithm is modified to adapt interval valued data. During
    training, a set of samples corresponding to a class is fed into the interval
    K-Means clustering algorithm, which clusters features into K distinct clusters.
    Hence, there are K number of features corresponding to each class.
    Subsequently, corresponding to each class, the cluster representatives are
    chosen. This procedure is repeated for all the samples of remaining classes.
    During testing the feature indices correspond to each class are used for
    validating the given dataset through classification using suitable symbolic
    classifiers. For experimentation, four standard supervised interval datasets
    are used. The results show the superiority of the proposed model when compared
    with the other existing state-of-the-art feature selection methods.

    Bridge Simulation and Metric Estimation on Landmark Manifolds

    Stefan Sommer, Alexis Arnaudon, Line Kuhnel, Sarang Joshi
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present an inference algorithm and connected Monte Carlo based estimation
    procedures for metric estimation from landmark configurations distributed
    according to the transition distribution of a Riemannian Brownian motion
    arising from the Large Deformation Diffeomorphic Metric Mapping (LDDMM) metric.
    The distribution possesses properties similar to the regular Euclidean normal
    distribution but its transition density is governed by a high-dimensional PDE
    with no closed-form solution in the nonlinear case. We show how the density can
    be numerically approximated by Monte Carlo sampling of conditioned Brownian
    bridges, and we use this to estimate parameters of the LDDMM kernel and thus
    the metric structure by maximum likelihood.

    Naturally Combined Shape-Color Moment Invariants under Affine Transformations

    Ming Gong, You Hao, Hanlin Mo, Hua Li
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We proposed a kind of naturally combined shape-color affine moment invariants
    (SCAMI), which consider both shape and color affine transformations
    simultaneously in one single system. In the real scene, color and shape
    deformations always exist in images simultaneously. Simple shape invariants or
    color invariants can not be qualified for this situation. The conventional
    method is just to make a simple linear combination of the two factors.
    Meanwhile, the manual selection of weights is a complex issue. Our construction
    method is based on the multiple integration framework. The integral kernel is
    assigned as the continued product of the shape and color invariant cores. It is
    the first time to directly derive an invariant to dual affine transformations
    of shape and color. The manual selection of weights is no longer necessary, and
    both the shape and color transformations are extended to affine transformation
    group. With the various of invariant cores, a set of lower-order invariants are
    constructed and the completeness and independence are discussed detailedly. A
    set of SCAMIs, which called SCAMI24, are recommended, and the effectiveness and
    robustness have been evaluated on both synthetic and real datasets.

    Weakly Supervised Generative Adversarial Networks for 3D Reconstruction

    JunYoung Gwak, Christopher B. Choy, Animesh Garg, Manmohan Chandraker, Silvio Savarese
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Volumetric 3D reconstruction has witnessed a significant progress in
    performance through the use of deep neural network based methods that address
    some of the limitations of traditional reconstruction algorithms. However, this
    increase in performance requires large scale annotations of 2D/3D data. This
    paper introduces a novel generative model for volumetric 3D reconstruction,
    Weakly supervised Generative Adversarial Network (WS-GAN) which reduces
    reliance on expensive 3D supervision. WS-GAN takes an input image, a sparse set
    of 2D object masks with respective camera parameters, and an unmatched 3D model
    as inputs during training. WS-GAN uses a learned encoding as input to a
    conditional 3D-model generator trained alongside a discriminator, which is
    constrained to the manifold of realistic 3D shapes. We bridge the
    representation gap between 2D masks and 3D volumes through a perspective
    raytrace pooling layer, that enables perspective projection and allows
    backpropagation. We evaluate WS-GAN on ShapeNet, ObjectNet and Stanford Online
    Product dataset for reconstruction with single-view and multi-view cases in
    both synthetic and real images. We compare our method to voxel carving and
    prior work with full 3D supervision. Additionally, we also demonstrate that the
    learned feature representation is semantically meaningful through interpolation
    and manipulation in input space.

    Morphological Error Detection in 3D Segmentations

    David Rolnick, Yaron Meirovitch, Toufiq Parag, Hanspeter Pfister, Viren Jain, Jeff W. Lichtman, Edward S. Boyden, Nir Shavit
    Comments: 13 pages, 6 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)

    Deep learning algorithms for connectomics rely upon localized classification,
    rather than overall morphology. This leads to a high incidence of erroneously
    merged objects. Humans, by contrast, can easily detect such errors by acquiring
    intuition for the correct morphology of objects. Biological neurons have
    complicated and variable shapes, which are challenging to learn, and merge
    errors take a multitude of different forms. We present an algorithm, MergeNet,
    that shows 3D ConvNets can, in fact, detect merge errors from high-level
    neuronal morphology. MergeNet follows unsupervised training and operates across
    datasets. We demonstrate the performance of MergeNet both on a variety of
    connectomics data and on a dataset created from merged MNIST images.

    Working hard to know your neighbor's margins:Local descriptor learning loss

    Anastasiya Mishchuk, Dmytro Mishkin, Filip Radenovic, Jiri Matas
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We introduce a novel loss for learning local feature descriptors that is
    inspired by the SIFT matching scheme. We show that the proposed loss that
    relies on the maximization of the distance between the closest positive and
    closest negative patches can replace more complex regularization methods which
    have been used in local descriptor learning; it works well for both shallow and
    deep convolution network architectures. The resulting descriptor is compact —
    it has the same dimensionality as SIFT (128), it shows state-of-art performance
    on matching, patch verification and retrieval benchmarks and it is fast to
    compute on a GPU.

    Generic Tubelet Proposals for Action Localization

    Jiawei He, Mostafa S. Ibrahim, Zhiwei Deng, Greg Mori
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We develop a novel framework for action localization in videos. We propose
    the Tube Proposal Network (TPN), which can generate generic, class-independent,
    video-level tubelet proposals in videos. The generated tubelet proposals can be
    utilized in various video analysis tasks, including recognizing and localizing
    actions in videos. In particular, we integrate these generic tubelet proposals
    into a unified temporal deep network for action classification. Compared with
    other methods, our generic tubelet proposal method is accurate, general, and is
    fully differentiable under a smoothL1 loss function. We demonstrate the
    performance of our algorithm on the standard UCF-Sports, J-HMDB21, and UCF-101
    datasets. Our class-independent TPN outperforms other tubelet generation
    methods, and our unified temporal deep network achieves state-of-the-art
    localization results on all three datasets.

    PCM-TV-TFV: A Novel Two Stage Framework for Image Reconstruction from Fourier Data

    Weihong Guo, Guohui Song, Yue Zhang
    Comments: 22 pages, 11 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose in this paper a novel two-stage Projection Correction Modeling
    (PCM) framework for image reconstruction from (non-uniform) Fourier
    measurements. PCM consists of a projection stage (P-stage) motivated by the
    multi-scale Galerkin method and a correction stage (C-stage) with an edge
    guided regularity fusing together the advantages of total variation (TV) and
    total fractional variation (TFV). The P-stage allows for continuous modeling of
    the underlying image of interest. The given measurements are projected onto a
    space in which the image is well represented. We then enhance the
    reconstruction result at the C-stage that minimizes an energy functional
    consisting of a fidelity in the transformed domain and a novel edge guided
    regularity. We further develop efficient proximal algorithms to solve the
    corresponding optimization problem. Various numerical results in both 1D
    signals and 2D images have also been presented to demonstrate the superior
    performance of the proposed two-stage method to other classical one-stage
    methods.

    Emergence of Language with Multi-agent Games: Learning to Communicate with Sequences of Symbols

    Serhii Havrylov, Ivan Titov
    Comments: Submitted to NIPS 2017. The extended abstract was presented at ICLR 2017 workshop track
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)

    Learning to communicate through interaction, rather than relying on explicit
    supervision, is often considered a prerequisite for developing a general AI. We
    study a setting where two agents engage in playing a referential game and, from
    scratch, develop a communication protocol necessary to succeed in this game.
    Unlike previous work, we require that messages they exchange, both at train and
    test time, are in the form of a language (i.e. sequences of discrete symbols).
    We compare a reinforcement learning approach and one using a differentiable
    relaxation (straight-through Gumbel-softmax estimator) and observe that the
    latter is much faster to converge and it results in more effective protocols.
    Interestingly, we also observe that the protocol we induce by optimizing the
    communication success exhibits a degree of compositionality and variability
    (i.e. the same information can be phrased in different ways), both properties
    characteristic of natural languages. As the ultimate goal is to ensure that
    communication is accomplished in natural language, we also perform experiments
    where we inject prior information about natural language into our model and
    study properties of the resulting protocol.

    Micro Fourier Transform Profilometry ((μ)FTP): 3D shape measurement at 10,000 frames per second

    Chao Zuo, Tianyang Tao, Shijie Feng, Lei Huang, Anand Asundi, Qian Chen
    Comments: This manuscript was originally submitted on 30th January 17
    Subjects: Instrumentation and Detectors (physics.ins-det); Computer Vision and Pattern Recognition (cs.CV); Optics (physics.optics)

    Recent advances in imaging sensors and digital light projection technology
    have facilitated a rapid progress in 3D optical sensing, enabling 3D surfaces
    of complex-shaped objects to be captured with improved resolution and accuracy.
    However, due to the large number of projection patterns required for phase
    recovery and disambiguation, the maximum fame rates of current 3D shape
    measurement techniques are still limited to the range of hundreds of frames per
    second (fps). Here, we demonstrate a new 3D dynamic imaging technique, Micro
    Fourier Transform Profilometry ((mu)FTP), which can capture 3D surfaces of
    transient events at up to 10,000 fps based on our newly developed high-speed
    fringe projection system. Compared with existing techniques, (mu)FTP has the
    prominent advantage of recovering an accurate, unambiguous, and dense 3D point
    cloud with only two projected patterns. Furthermore, the phase information is
    encoded within a single high-frequency fringe image, thereby allowing
    motion-artifact-free reconstruction of transient events with temporal
    resolution of 50 microseconds. To show (mu)FTP’s broad utility, we use it to
    reconstruct 3D videos of 4 transient scenes: vibrating cantilevers, rotating
    fan blades, bullet fired from a toy gun, and balloon’s explosion triggered by a
    flying dart, which were previously difficult or even unable to be captured with
    conventional approaches.

    Unsupervised Learning of Disentangled Representations from Video

    Emily Denton, Vighnesh Birodkar
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    We present a new model DrNET that learns disentangled image representations
    from video. Our approach leverages the temporal coherence of video and a novel
    adversarial loss to learn a representation that factorizes each frame into a
    stationary part and a temporally varying component. The disentangled
    representation can be used for a range of tasks. For example, applying a
    standard LSTM to the time-vary components enables prediction of future frames.
    We evaluate our approach on a range of synthetic and real videos, demonstrating
    the ability to coherently generate hundreds of steps into the future.

    Practical Neural Network Performance Prediction for Early Stopping

    Bowen Baker, Otkrist Gupta, Ramesh Raskar, Nikhil Naik
    Comments: Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    In the neural network domain, methods for hyperparameter optimization and
    meta-modeling are computationally expensive due to the need to train a large
    number of neural network configurations. In this paper, we show that a simple
    regression model, based on support vector machines, can predict the final
    performance of partially trained neural network configurations using features
    based on network architectures, hyperparameters, and time-series validation
    performance data. We use this regression model to develop an early stopping
    strategy for neural network configurations. With this early stopping strategy,
    we obtain significant speedups in both hyperparameter optimization and
    meta-modeling. Particularly in the context of meta-modeling, our method can
    learn to predict the performance of drastically different architectures and is
    seamlessly incorporated into reinforcement learning-based architecture
    selection algorithms. Finally, we show that our method is simpler, faster, and
    more accurate than Bayesian methods for learning curve prediction.


    Artificial Intelligence

    The Atari Grand Challenge Dataset

    Vitaly Kurin, Sebastian Nowozin, Katja Hofmann, Lucas Beyer, Bastian Leibe
    Subjects: Artificial Intelligence (cs.AI)

    Recent progress in Reinforcement Learning (RL), fueled by its combination,
    with Deep Learning has enabled impressive results in learning to interact with
    complex virtual environments, yet real-world applications of RL are still
    scarce. A key limitation is data efficiency, with current state-of-the-art
    approaches requiring millions of training samples. A promising way to tackle
    this problem is to augment RL with learning from human demonstrations. However,
    human demonstration data is not yet readily available. This hinders progress in
    this direction. The present work addresses this problem as follows. We (i)
    collect and describe a large dataset of human Atari 2600 replays — the largest
    and most diverse such data set publicly released to date, (ii) illustrate an
    example use of this dataset by analyzing the relation between demonstration
    quality and imitation learning performance, and (iii) outline possible research
    directions that are opened up by our work.

    Propositional Knowledge Representation in Restricted Boltzmann Machines

    Son N. Tran
    Subjects: Artificial Intelligence (cs.AI)

    Representing symbolic knowledge into a connectionist network is the key
    element for the integration of scalable learning and sound reasoning. Most of
    the previous studies focus on discriminative neural networks which
    unnecessarily require a separation of input/output variables. Recent
    development of generative neural networks such as restricted Boltzmann machines
    (RBMs) has shown a capability of learning semantic abstractions directly from
    data, posing a promise for general symbolic learning and reasoning. Previous
    work on Penalty logic show a link between propositional logic and symmetric
    connectionist networks, however it is not applicable to RBMs. This paper
    proposes a novel method to represent propositional formulas into RBMs/stack of
    RBMs where Gibbs sampling can be seen as MaxSAT. It also shows a promising use
    of RBMs to learn symbolic knowledge through maximum likelihood estimation.

    Towards Learned Clauses Database Reduction Strategies Based on Dominance Relationship

    Jerry Lonlac, Engelbert Mephu Nguifo
    Subjects: Artificial Intelligence (cs.AI)

    Clause Learning is one of the most important components of a conflict driven
    clause learning (CDCL) SAT solver that is effective on industrial instances.
    Since the number of learned clauses is proved to be exponential in the worse
    case, it is necessary to identify the most relevant clauses to maintain and
    delete the irrelevant ones. As reported in the literature, several learned
    clauses deletion strategies have been proposed. However the diversity in both
    the number of clauses to be removed at each step of reduction and the results
    obtained with each strategy creates confusion to determine which criterion is
    better. Thus, the problem to select which learned clauses are to be removed
    during the search step remains very challenging. In this paper, we propose a
    novel approach to identify the most relevant learned clauses without favoring
    or excluding any of the proposed measures, but by adopting the notion of
    dominance relationship among those measures. Our approach bypasses the problem
    of the diversity of results and reaches a compromise between the assessments of
    these measures. Furthermore, the proposed approach also avoids another
    non-trivial problem which is the amount of clauses to be deleted at each
    reduction of the learned clause database.

    Lifelong Multi-Agent Path Finding for Online Pickup and Delivery Tasks

    Hang Ma, Jiaoyang Li, T. K. Satish Kumar, Sven Koenig
    Comments: In AAMAS 2017
    Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Robotics (cs.RO)

    The multi-agent path-finding (MAPF) problem has recently received a lot of
    attention. However, it does not capture important characteristics of many
    real-world domains, such as automated warehouses, where agents are constantly
    engaged with new tasks. In this paper, we therefore study a lifelong version of
    the MAPF problem, called the multi-agent pickup and delivery (MAPD) problem. In
    the MAPD problem, agents have to attend to a stream of delivery tasks in an
    online setting. One agent has to be assigned to each delivery task. This agent
    has to first move to a given pickup location and then to a given delivery
    location while avoiding collisions with other agents. We present two decoupled
    MAPD algorithms, Token Passing (TP) and Token Passing with Task Swaps (TPTS).
    Theoretically, we show that they solve all well-formed MAPD instances, a
    realistic subclass of MAPD instances. Experimentally, we compare them against a
    centralized strawman MAPD algorithm without this guarantee in a simulated
    warehouse system. TP can easily be extended to a fully distributed MAPD
    algorithm and is the best choice when real-time computation is of primary
    concern since it remains efficient for MAPD instances with hundreds of agents
    and tasks. TPTS requires limited communication among agents and balances well
    between TP and the centralized MAPD algorithm.

    Experience Replay Using Transition Sequences

    Thommen George Karimpanal, Roland Bouffanais
    Comments: 23 pages, 6 figures, Submitted to the journal Artificial Intelligence
    Subjects: Artificial Intelligence (cs.AI)

    Experience replay is one of the most commonly used approaches to improve the
    sample efficiency of reinforcement learning algorithms. In this work, we
    propose an approach to select and replay sequences of transitions in order to
    accelerate the learning of a reinforcement learning agent in an off-policy
    setting. In addition to selecting appropriate sequences, we also artificially
    construct transition sequences using information gathered from previous
    agent-environment interactions. These sequences, when replayed, allow value
    function information to trickle down to larger sections of the
    state/state-action space, thereby making the most of the agent’s experience. We
    demonstrate our approach on modified versions of standard reinforcement
    learning tasks such as the mountain car and puddle world problems and
    empirically show that it enables better learning of value functions as compared
    to other forms of experience replay. Further, we briefly discuss some of the
    possible extensions to this work, as well as applications and situations where
    this approach could be particularly useful.

    The Morphospace of Consciousness

    Xerxes D. Arsiwalla, Clement Moulin-Frier, Ivan Herreros, Marti Sanchez-Fibla, Paul Verschure
    Comments: 19 pages, 3 figures
    Subjects: Neurons and Cognition (q-bio.NC); Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Biological Physics (physics.bio-ph)

    Given recent proposals to synthesize consciousness, how many forms of
    conscious machines can one distinguish and on what grounds? Based on current
    clinical scales of consciousness, that measure cognitive awareness and
    wakefulness, we take a perspective on how contemporary artificially intelligent
    machines and synthetically engineered life forms would measure on these scales.
    To do so, we argue that awareness and wakefulness can be associated to
    computational and autonomous complexity respectively. Then, building on
    insights from cognitive robotics, we ask what function consciousness serves,
    and interpret it as an evolutionary game-theoretic strategy. We make the case
    for a third type of complexity necessary for describing consciousness, namely,
    social complexity. Having identified these complexity types, allows us to
    represent both, biological and synthetic systems in a common morphospace. This
    suggests an embodiment-based taxonomy of consciousness. In particular, we
    distinguish four forms of consciousness, based on embodiment: biological,
    synthetic, group (resulting from group interactions) and simulated
    consciousness (embodied by virtual agents within a simulated reality). Such a
    taxonomy is useful for studying comparative signatures of consciousness across
    domains, in order to highlight design principles necessary to engineer
    conscious machines. This is particularly relevant in the light of recent
    developments at the crossroads of neuroscience, biomedical engineering,
    artificial intelligence and biomimetics.

    Controllable Invariance through Adversarial Feature Learning

    Qizhe Xie, Zihang Dai, Yulun Du, Eduard Hovy, Graham Neubig
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Learning meaningful representations that maintain the content necessary for a
    particular task while filtering away detrimental variations is a problem of
    great interest in machine learning. In this paper, we tackle the problem of
    learning representations invariant to a specific factor or trait of data,
    leading to better generalization. The representation learning process is
    formulated as an adversarial minimax game. We analyze the optimal equilibrium
    of such a game and find that it amounts to maximizing the uncertainty of
    inferring the detrimental factor given the representation while maximizing the
    certainty of making task-specific predictions. On three benchmark tasks, namely
    fair and bias-free classification, language-independent generation, and
    lighting-independent image classification, we show that the proposed framework
    induces an invariant representation, and leads to better generalization
    evidenced by the improved test performance.

    End-to-end Differentiable Proving

    Tim Rocktäschel, Sebastian Riedel
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG); Logic in Computer Science (cs.LO)

    We introduce neural networks for end-to-end differentiable theorem proving
    that operate on dense vector representations of symbols. These neural networks
    are constructed recursively by taking inspiration from the backward chaining
    algorithm as used in Prolog. Specifically, we replace symbolic unification with
    a differentiable computation on vector representations of symbols using a
    radial basis function kernel, thereby combining symbolic reasoning with
    learning subsymbolic vector representations. By using gradient descent, the
    resulting neural network can be trained to infer facts from a given incomplete
    knowledge base. It learns to (i) place representations of similar symbols in
    close proximity in a vector space, (ii) make use of such similarities to prove
    facts, (iii) induce logical rules, and (iv) use provided and induced logical
    rules for complex multi-hop reasoning. We demonstrate that this architecture
    outperforms ComplEx, a state-of-the-art neural link prediction model, on four
    benchmark knowledge bases while at the same time inducing interpretable
    function-free first-order logic rules.

    Non-Markovian Control with Gated End-to-End Memory Policy Networks

    Julien Perez, Tomi Silander
    Comments: 11 pages, 1 figure, 1 table
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Partially observable environments present an important open challenge in the
    domain of sequential control learning with delayed rewards. Despite numerous
    attempts during the two last decades, the majority of reinforcement learning
    algorithms and associated approximate models, applied to this context, still
    assume Markovian state transitions. In this paper, we explore the use of a
    recently proposed attention-based model, the Gated End-to-End Memory Network,
    for sequential control. We call the resulting model the Gated End-to-End Memory
    Policy Network. More precisely, we use a model-free value-based algorithm to
    learn policies for partially observed domains using this memory-enhanced neural
    network. This model is end-to-end learnable and it features unbounded memory.
    Indeed, because of its attention mechanism and associated non-parametric
    memory, the proposed model allows us to define an attention mechanism over the
    observation stream unlike recurrent models. We show encouraging results that
    illustrate the capability of our attention-based model in the context of the
    continuous-state non-stationary control problem of stock trading. We also
    present an OpenAI Gym environment for simulated stock exchange and explain its
    relevance as a benchmark for the field of non-Markovian decision process
    learning.

    Adversarial Generation of Natural Language

    Sai Rajeswar, Sandeep Subramanian, Francis Dutil, Christopher Pal, Aaron Courville
    Comments: 11 pages, 3 figures, 5 tables
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Generative Adversarial Networks (GANs) have gathered a lot of attention from
    the computer vision community, yielding impressive results for image
    generation. Advances in the adversarial generation of natural language from
    noise however are not commensurate with the progress made in generating images,
    and still lag far behind likelihood based methods. In this paper, we take a
    step towards generating natural language with a GAN objective alone. We
    introduce a simple baseline that addresses the discrete output space problem
    without relying on gradient estimators and show that it is able to achieve
    state-of-the-art results on a Chinese poem generation dataset. We present
    quantitative results on generating sentences from context-free and
    probabilistic context-free grammars, and qualitative language modeling results.
    A conditional version is also described that can generate sequences conditioned
    on sentence characteristics.

    Unsupervised Learning of Disentangled Representations from Video

    Emily Denton, Vighnesh Birodkar
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    We present a new model DrNET that learns disentangled image representations
    from video. Our approach leverages the temporal coherence of video and a novel
    adversarial loss to learn a representation that factorizes each frame into a
    stationary part and a temporally varying component. The disentangled
    representation can be used for a range of tasks. For example, applying a
    standard LSTM to the time-vary components enables prediction of future frames.
    We evaluate our approach on a range of synthetic and real videos, demonstrating
    the ability to coherently generate hundreds of steps into the future.

    Morphological Error Detection in 3D Segmentations

    David Rolnick, Yaron Meirovitch, Toufiq Parag, Hanspeter Pfister, Viren Jain, Jeff W. Lichtman, Edward S. Boyden, Nir Shavit
    Comments: 13 pages, 6 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)

    Deep learning algorithms for connectomics rely upon localized classification,
    rather than overall morphology. This leads to a high incidence of erroneously
    merged objects. Humans, by contrast, can easily detect such errors by acquiring
    intuition for the correct morphology of objects. Biological neurons have
    complicated and variable shapes, which are challenging to learn, and merge
    errors take a multitude of different forms. We present an algorithm, MergeNet,
    that shows 3D ConvNets can, in fact, detect merge errors from high-level
    neuronal morphology. MergeNet follows unsupervised training and operates across
    datasets. We demonstrate the performance of MergeNet both on a variety of
    connectomics data and on a dataset created from merged MNIST images.

    Practical Neural Network Performance Prediction for Early Stopping

    Bowen Baker, Otkrist Gupta, Ramesh Raskar, Nikhil Naik
    Comments: Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    In the neural network domain, methods for hyperparameter optimization and
    meta-modeling are computationally expensive due to the need to train a large
    number of neural network configurations. In this paper, we show that a simple
    regression model, based on support vector machines, can predict the final
    performance of partially trained neural network configurations using features
    based on network architectures, hyperparameters, and time-series validation
    performance data. We use this regression model to develop an early stopping
    strategy for neural network configurations. With this early stopping strategy,
    we obtain significant speedups in both hyperparameter optimization and
    meta-modeling. Particularly in the context of meta-modeling, our method can
    learn to predict the performance of drastically different architectures and is
    seamlessly incorporated into reinforcement learning-based architecture
    selection algorithms. Finally, we show that our method is simpler, faster, and
    more accurate than Bayesian methods for learning curve prediction.

    Semi-Supervised Learning for Detecting Human Trafficking

    Hamidreza Alvari, Paulo Shakarian, J.E. Kelly Snyder
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Human trafficking is one of the most atrocious crimes and among the
    challenging problems facing law enforcement which demands attention of global
    magnitude. In this study, we leverage textual data from the website “Backpage”-
    used for classified advertisement- to discern potential patterns of human
    trafficking activities which manifest online and identify advertisements of
    high interest to law enforcement. Due to the lack of ground truth, we rely on a
    human analyst from law enforcement, for hand-labeling a small portion of the
    crawled data. We extend the existing Laplacian SVM and present S3VM-R, by
    adding a regularization term to exploit exogenous information embedded in our
    feature space in favor of the task at hand. We train the proposed method using
    labeled and unlabeled data and evaluate it on a fraction of the unlabeled data,
    herein referred to as unseen data, with our expert’s further verification.
    Results from comparisons between our method and other semi-supervised and
    supervised approaches on the labeled data demonstrate that our learner is
    effective in identifying advertisements of high interest to law enforcement


    Information Retrieval

    LRSE: A Lightweight Efficient Searchable Encryption Scheme using Local and Global Representations

    Ruihui Zhao, Yuanliang Sun, Mizuho Iwaihara
    Comments: 9 pages, submitted to CIKM 2017
    Subjects: Information Retrieval (cs.IR); Cryptography and Security (cs.CR)

    Cloud computing is emerging as a revolutionary computing paradigm, while
    security and privacy become major concerns in the cloud scenario. For which
    Searchable Encryption (SE) technology is proposed to support efficient
    retrieval of encrypted data. However, the absence of lightweight ranked search
    with higher search quality in a harsh adversary model is still a typical
    shortage in existing SE schemes. In this paper, we propose a novel SE scheme
    called LRSE which firstly integrates machine learning methods into the
    framework of SE and combines local and global representations of encrypted
    cloud data to achieve the above design goals. In LRSE, we employ an improved
    secure kNN scheme to guarantee sufficient privacy protection. Our detailed
    security analysis shows that LRSE satisfies our formulated privacy
    requirements. Extensive experiments performed on benchmark datasets demonstrate
    that LRSE indeed achieves state-of-the-art search quality with lowest system
    cost.


    Computation and Language

    Are distributional representations ready for the real world? Evaluating word vectors for grounded perceptual meaning

    Li Lucy, Jon Gauthier
    Comments: Accepted at RoboNLP 2017
    Subjects: Computation and Language (cs.CL)

    Distributional word representation methods exploit word co-occurrences to
    build compact vector encodings of words. While these representations enjoy
    widespread use in modern natural language processing, it is unclear whether
    they accurately encode all necessary facets of conceptual meaning. In this
    paper, we evaluate how well these representations can predict perceptual and
    conceptual features of concrete concepts, drawing on two semantic norm datasets
    sourced from human participants. We find that several standard word
    representations fail to encode many salient perceptual features of concepts,
    and show that these deficits correlate with word-word similarity prediction
    errors. Our analyses provide motivation for grounded and embodied language
    learning approaches, which may help to remedy these deficits.

    Learning When to Attend for Neural Machine Translation

    Junhui Li, Muhua Zhu
    Comments: 5 pages, 2 figures
    Subjects: Computation and Language (cs.CL)

    In the past few years, attention mechanisms have become an indispensable
    component of end-to-end neural machine translation models. However, previous
    attention models always refer to some source words when predicting a target
    word, which contradicts with the fact that some target words have no
    corresponding source words. Motivated by this observation, we propose a novel
    attention model that has the capability of determining when a decoder should
    attend to source words and when it should not. Experimental results on NIST
    Chinese-English translation tasks show that the new model achieves an
    improvement of 0.8 BLEU score over a state-of-the-art baseline.

    Adversarial Ranking for Language Generation

    Kevin Lin, Dianqi Li, Xiaodong He, Zhengyou Zhang, Ming-Ting Sun
    Subjects: Computation and Language (cs.CL)

    Generative adversarial networks (GANs) have great successes on synthesizing
    data. However, the existing GANs restrict the discriminator to be a binary
    classifier, and thus limit their learning capacity for tasks that need to
    synthesize output with rich structures such as natural language descriptions.
    In this paper, we propose a novel generative adversarial network, RankGAN, for
    generating high-quality language descriptions. Rather than train the
    discriminator to learn and assign absolute binary predicate for individual data
    sample, the proposed RankGAN is able to analyze and rank a collection of
    human-written and machine-written sentences by giving a reference group. By
    viewing a set of data samples collectively and evaluating their quality through
    relative ranking scores, the discriminator is able to make better assessment
    which in turn helps to learn a better generator. The proposed RankGAN is
    optimized through the policy gradient technique. Experimental results on
    multiple public datasets clearly demonstrate the effectiveness of the proposed
    approach.

    Analysis of the Effect of Dependency Information on Predicate-Argument Structure Analysis and Zero Anaphora Resolution

    Koichiro Yoshino, Shinsuke Mori, Satoshi Nakamura
    Subjects: Computation and Language (cs.CL)

    This paper investigates and analyzes the effect of dependency information on
    predicate-argument structure analysis (PASA) and zero anaphora resolution (ZAR)
    for Japanese, and shows that a straightforward approach of PASA and ZAR works
    effectively even if dependency information was not available. We constructed an
    analyzer that directly predicts relationships of predicates and arguments with
    their semantic roles from a POS-tagged corpus. The features of the system are
    designed to compensate for the absence of syntactic information by using
    features used in dependency parsing as a reference. We also constructed
    analyzers that use the oracle dependency and the real dependency parsing
    results, and compared with the system that does not use any syntactic
    information to verify that the improvement provided by dependencies is not
    crucial.

    Adversarial Generation of Natural Language

    Sai Rajeswar, Sandeep Subramanian, Francis Dutil, Christopher Pal, Aaron Courville
    Comments: 11 pages, 3 figures, 5 tables
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Generative Adversarial Networks (GANs) have gathered a lot of attention from
    the computer vision community, yielding impressive results for image
    generation. Advances in the adversarial generation of natural language from
    noise however are not commensurate with the progress made in generating images,
    and still lag far behind likelihood based methods. In this paper, we take a
    step towards generating natural language with a GAN objective alone. We
    introduce a simple baseline that addresses the discrete output space problem
    without relying on gradient estimators and show that it is able to achieve
    state-of-the-art results on a Chinese poem generation dataset. We present
    quantitative results on generating sentences from context-free and
    probabilistic context-free grammars, and qualitative language modeling results.
    A conditional version is also described that can generate sequences conditioned
    on sentence characteristics.

    Does the Geometry of Word Embeddings Help Document Classification? A Case Study on Persistent Homology Based Representations

    Paul Michel, Abhilasha Ravichander, Shruti Rijhwani
    Comments: 5 pages, 3 figures. Rep4NLP workshop at ACL 2017
    Subjects: Computation and Language (cs.CL)

    We investigate the pertinence of methods from algebraic topology for text
    data analysis. These methods enable the development of
    mathematically-principled isometric-invariant mappings from a set of vectors to
    a document embedding, which is stable with respect to the geometry of the
    document in the selected metric space. In this work, we evaluate the utility of
    these topology-based document representations in traditional NLP tasks,
    specifically document clustering and sentiment classification. We find that the
    embeddings do not benefit text analysis. In fact, performance is worse than
    simple techniques like ( extit{tf-idf}), indicating that the geometry of the
    document does not provide enough variability for classification on the basis of
    topic or sentiment in the chosen datasets.

    Character Composition Model with Convolutional Neural Networks for Dependency Parsing on Morphologically Rich Languages

    Xiang Yu, Ngoc Thang Vu
    Comments: Accepted in ACL 2017 (Short)
    Subjects: Computation and Language (cs.CL)

    We present a transition-based dependency parser that uses a convolutional
    neural network to compose word representations from characters. The character
    composition model shows great improvement over the word-lookup model,
    especially for parsing agglutinative languages. These improvements are even
    better than using pre-trained word embeddings from extra data. On the SPMRL
    data sets, our system outperforms the previous best greedy parser (Ballesteros
    et al., 2015) by a margin of 3% on average.

    Emergence of Language with Multi-agent Games: Learning to Communicate with Sequences of Symbols

    Serhii Havrylov, Ivan Titov
    Comments: Submitted to NIPS 2017. The extended abstract was presented at ICLR 2017 workshop track
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)

    Learning to communicate through interaction, rather than relying on explicit
    supervision, is often considered a prerequisite for developing a general AI. We
    study a setting where two agents engage in playing a referential game and, from
    scratch, develop a communication protocol necessary to succeed in this game.
    Unlike previous work, we require that messages they exchange, both at train and
    test time, are in the form of a language (i.e. sequences of discrete symbols).
    We compare a reinforcement learning approach and one using a differentiable
    relaxation (straight-through Gumbel-softmax estimator) and observe that the
    latter is much faster to converge and it results in more effective protocols.
    Interestingly, we also observe that the protocol we induce by optimizing the
    communication success exhibits a degree of compositionality and variability
    (i.e. the same information can be phrased in different ways), both properties
    characteristic of natural languages. As the ultimate goal is to ensure that
    communication is accomplished in natural language, we also perform experiments
    where we inject prior information about natural language into our model and
    study properties of the resulting protocol.

    Controllable Invariance through Adversarial Feature Learning

    Qizhe Xie, Zihang Dai, Yulun Du, Eduard Hovy, Graham Neubig
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Learning meaningful representations that maintain the content necessary for a
    particular task while filtering away detrimental variations is a problem of
    great interest in machine learning. In this paper, we tackle the problem of
    learning representations invariant to a specific factor or trait of data,
    leading to better generalization. The representation learning process is
    formulated as an adversarial minimax game. We analyze the optimal equilibrium
    of such a game and find that it amounts to maximizing the uncertainty of
    inferring the detrimental factor given the representation while maximizing the
    certainty of making task-specific predictions. On three benchmark tasks, namely
    fair and bias-free classification, language-independent generation, and
    lighting-independent image classification, we show that the proposed framework
    induces an invariant representation, and leads to better generalization
    evidenced by the improved test performance.

    Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments

    Zixing Zhang, Jürgen Geiger, Jouni Pohjalainen, Amr El-Desoky Mousa, Björn Schuller
    Subjects: Sound (cs.SD); Computation and Language (cs.CL); Learning (cs.LG)

    Eliminating the negative effect of highly non-stationary environmental noise
    is a long-standing research topic for speech recognition but remains an
    important challenge nowadays. To address this issue, traditional unsupervised
    signal processing methods seem to have touched the ceiling. However,
    data-driven based supervised approaches, particularly the ones designed with
    deep learning, have recently emerged as potential alternatives. In this light,
    we are going to comprehensively summarise the recently developed and most
    representative deep learning approaches to deal with the raised problem in this
    article, with the aim of providing guidelines for those who are going deeply
    into the field of environmentally robust speech recognition. To better
    introduce these approaches, we categorise them into single- and multi-channel
    techniques, each of which is specifically described at the front-end, the
    back-end, and the joint framework of speech recognition systems. In the
    meanwhile, we describe the pros and cons of these approaches as well as the
    relationships among them, which can probably benefit future research.


    Distributed, Parallel, and Cluster Computing

    Extreme-Scale De Novo Genome Assembly

    Evangelos Georganas, Steven Hofmeyr, Rob Egan, Aydin Buluc, Leonid Oliker, Daniel Rokhsar, Katherine Yelick
    Comments: To appear as a chapter in Exascale Scientific Applications: Programming Approaches for Scalability, Performance, and Portability, Straatsma, Antypas, Williams (editors), CRC Press, 2017
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    De novo whole genome assembly reconstructs genomic sequence from short,
    overlapping, and potentially erroneous DNA segments and is one of the most
    important computations in modern genomics. This work presents HipMER, a
    high-quality end-to-end de novo assembler designed for extreme scale analysis,
    via efficient parallelization of the Meraculous code. Genome assembly software
    has many components, each of which stresses different components of a computer
    system. This chapter explains the computational challenges involved in each
    step of the HipMer pipeline, the key distributed data structures, and
    communication costs in detail. We present performance results of assembling the
    human genome and the large hexaploid wheat genome on large supercomputers up to
    tens of thousands of cores.

    Implicit Consensus: Blockchain with Unbounded Throughput

    Zhijie Ren, Kelong Cong, Johan Pouwelse, Zekeriya Erkin
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Recently, the blockchain technique was put in the spotlight as it introduced
    a systematic approach for multiple parties to reach consensus without needing
    trust. However, the application of this technique in practice is severely
    restricted due to its limitations in throughput, reliability, storage
    requirement, and privacy. In this paper, we propose a novel consensus model,
    namely the implicit consensus, with a distinctive blockchain-based distributed
    ledger in which each node holds its individual blockchain. In our system, the
    agreement is not on the transactions, but on a special type of blocks called
    Check Points that are used to validate individual transactions. Our system
    achieves superior performance over all existing blockchain techniques in
    multiple aspects with equivalent reliability. Most remarkably, our system
    achieves unbounded throughput which is by far the best throughput achieved by
    any blockchain technique.

    Distributed Simulation Platform for Autonomous Driving

    Jie Tang, Shaoshan Liu, Chao Wang, Quan Wang
    Comments: 12 pages, 7 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Robotics (cs.RO)

    Autonomous vehicle safety and reliability are the paramount requirements when
    developing autonomous vehicles. These requirements are guaranteed by massive
    functional and performance tests. Conducting these tests on real vehicles is
    extremely expensive and time consuming, and thus it is imperative to develop a
    simulation platform to perform these tasks. For simulation, we can utilize the
    Robot Operating System (ROS) for data playback to test newly developed
    algorithms. However, due to the massive amount of simulation data, performing
    simulation on single machines is not practical. Hence, a high-performance
    distributed simulation platform is a critical piece in autonomous driving
    development. In this paper we present our experiences of building a production
    distributed autonomous driving simulation platform. This platform is built upon
    Spark distributed framework, for distributed computing management, and ROS, for
    data playback simulations.

    Minimizing the Cost of Team Exploration

    Dorota Urbańska-Osula
    Comments: 10 pages, 2 figures, 3 pseudo-codes
    Subjects: Discrete Mathematics (cs.DM); Distributed, Parallel, and Cluster Computing (cs.DC); Combinatorics (math.CO)

    A group of mobile entities is given a task to explore an edge-weighted tree
    (T), i.e. every vertex of (T) has to be visited by at least one agent. There is
    no centralized unit to coordinate their actions, but they can freely
    communicate with each other and they know the structure of (T) in advance. The
    goal is to construct a deterministic strategy which allows robots to complete
    their task optimally.

    In this paper we are interested in a cost-optimal strategy, where the cost is
    understood as the sum of the total distance traversed by agents and the cost of
    invoking them. We present an algorithm that finds an optimal solution for a
    given (n)-node tree in (O(n)) time.


    Learning

    Emergence of Language with Multi-agent Games: Learning to Communicate with Sequences of Symbols

    Serhii Havrylov, Ivan Titov
    Comments: Submitted to NIPS 2017. The extended abstract was presented at ICLR 2017 workshop track
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)

    Learning to communicate through interaction, rather than relying on explicit
    supervision, is often considered a prerequisite for developing a general AI. We
    study a setting where two agents engage in playing a referential game and, from
    scratch, develop a communication protocol necessary to succeed in this game.
    Unlike previous work, we require that messages they exchange, both at train and
    test time, are in the form of a language (i.e. sequences of discrete symbols).
    We compare a reinforcement learning approach and one using a differentiable
    relaxation (straight-through Gumbel-softmax estimator) and observe that the
    latter is much faster to converge and it results in more effective protocols.
    Interestingly, we also observe that the protocol we induce by optimizing the
    communication success exhibits a degree of compositionality and variability
    (i.e. the same information can be phrased in different ways), both properties
    characteristic of natural languages. As the ultimate goal is to ensure that
    communication is accomplished in natural language, we also perform experiments
    where we inject prior information about natural language into our model and
    study properties of the resulting protocol.

    Reinforcement Learning for Learning Rate Control

    Chang Xu, Tao Qin, Gang Wang, Tie-Yan Liu
    Comments: 7 pages, 9 figures
    Subjects: Learning (cs.LG)

    Stochastic gradient descent (SGD), which updates the model parameters by
    adding a local gradient times a learning rate at each step, is widely used in
    model training of machine learning algorithms such as neural networks. It is
    observed that the models trained by SGD are sensitive to learning rates and
    good learning rates are problem specific. We propose an algorithm to
    automatically learn learning rates using neural network based actor-critic
    methods from deep reinforcement learning (RL).In particular, we train a policy
    network called actor to decide the learning rate at each step during training,
    and a value network called critic to give feedback about quality of the
    decision (e.g., the goodness of the learning rate outputted by the actor) that
    the actor made. The introduction of auxiliary actor and critic networks helps
    the main network achieve better performance. Experiments on different datasets
    and network architectures show that our approach leads to better convergence of
    SGD than human-designed competitors.

    Controllable Invariance through Adversarial Feature Learning

    Qizhe Xie, Zihang Dai, Yulun Du, Eduard Hovy, Graham Neubig
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Learning meaningful representations that maintain the content necessary for a
    particular task while filtering away detrimental variations is a problem of
    great interest in machine learning. In this paper, we tackle the problem of
    learning representations invariant to a specific factor or trait of data,
    leading to better generalization. The representation learning process is
    formulated as an adversarial minimax game. We analyze the optimal equilibrium
    of such a game and find that it amounts to maximizing the uncertainty of
    inferring the detrimental factor given the representation while maximizing the
    certainty of making task-specific predictions. On three benchmark tasks, namely
    fair and bias-free classification, language-independent generation, and
    lighting-independent image classification, we show that the proposed framework
    induces an invariant representation, and leads to better generalization
    evidenced by the improved test performance.

    Information Theoretic Properties of Markov Random Fields, and their Algorithmic Applications

    Linus Hamilton, Frederic Koehler, Ankur Moitra
    Comments: 25 pages
    Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Statistics Theory (math.ST)

    Markov random fields area popular model for high-dimensional probability
    distributions. Over the years, many mathematical, statistical and algorithmic
    problems on them have been studied. Until recently, the only known algorithms
    for provably learning them relied on exhaustive search, correlation decay or
    various incoherence assumptions. Bresler gave an algorithm for learning general
    Ising models on bounded degree graphs. His approach was based on a structural
    result about mutual information in Ising models.

    Here we take a more conceptual approach to proving lower bounds on the mutual
    information through setting up an appropriate zero-sum game. Our proof
    generalizes well beyond Ising models, to arbitrary Markov random fields with
    higher order interactions. As an application, we obtain algorithms for learning
    Markov random fields on bounded degree graphs on (n) nodes with (r)-order
    interactions in (n^r) time and (log n) sample complexity. The sample
    complexity is information theoretically optimal up to the dependence on the
    maximum degree. The running time is nearly optimal under standard conjectures
    about the hardness of learning parity with noise.

    HiNet: Hierarchical Classification with Neural Network

    Zhenzhou Wu, Sean Saito
    Subjects: Learning (cs.LG)

    Traditionally, classifying large hierarchical labels with more than 10000
    distinct traces can only be achieved with flatten labels. Although flatten
    labels is feasible, it misses the hierarchical information in the labels.
    Hierarchical models like HSVM by cite{vural2004hierarchical} becomes
    impossible to train because of the sheer number of SVMs in the whole
    architecture. We developed a hierarchical architecture based on neural networks
    that is simple to train. Also, we derived an inference algorithm that can
    efficiently infer the MAP (maximum a posteriori) trace guaranteed by our
    theorems. Furthermore, the complexity of the model is only (O(n^2)) compared to
    (O(n^h)) in a flatten model, where (h) is the height of the hierarchy.

    Greedy Algorithms for Cone Constrained Optimization with Convergence Guarantees

    Francesco Locatello, Michael Tschannen, Gunnar Rätsch, Martin Jaggi
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Greedy optimization methods such as Matching Pursuit (MP) and Frank-Wolfe
    (FW) algorithms regained popularity in recent years due to their simplicity,
    effectiveness and theoretical guarantees. MP and FW address optimization over
    the linear span and the convex hull of a set of atoms, respectively. In this
    paper, we consider the intermediate case of optimization over the convex cone,
    parametrized as the conic hull of a generic atom set, leading to the first
    principled definitions of non-negative MP algorithms for which we give explicit
    convergence rates and demonstrate excellent empirical performance. In
    particular, we derive sublinear ((mathcal{O}(1/t))) convergence on general
    smooth and convex objectives, and linear convergence ((mathcal{O}(e^{-t}))) on
    strongly convex objectives, in both cases for general sets of atoms.
    Furthermore, we establish a clear correspondence of our algorithms to known
    algorithms from the MP and FW literature. Our novel algorithms and analyses
    target general atom sets and general objective functions, and hence are
    directly applicable to a large variety of learning settings.

    The ALAMO approach to machine learning

    Zachary T. Wilson, Nikolaos V. Sahinidis
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    ALAMO is a computational methodology for leaning algebraic functions from
    data. Given a data set, the approach begins by building a low-complexity,
    linear model composed of explicit non-linear transformations of the independent
    variables. Linear combinations of these non-linear transformations allow a
    linear model to better approximate complex behavior observed in real processes.
    The model is refined, as additional data are obtained in an adaptive fashion
    through error maximization sampling using derivative-free optimization. Models
    built using ALAMO can enforce constraints on the response variables to
    incorporate first-principles knowledge. The ability of ALAMO to generate simple
    and accurate models for a number of reaction problems is demonstrated. The
    error maximization sampling is compared with Latin hypercube designs to
    demonstrate its sampling efficiency. ALAMO’s constrained regression methodology
    is used to further refine concentration models, resulting in models that
    perform better on validation data and satisfy upper and lower bounds placed on
    model outputs.

    Unsupervised Learning of Disentangled Representations from Video

    Emily Denton, Vighnesh Birodkar
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    We present a new model DrNET that learns disentangled image representations
    from video. Our approach leverages the temporal coherence of video and a novel
    adversarial loss to learn a representation that factorizes each frame into a
    stationary part and a temporally varying component. The disentangled
    representation can be used for a range of tasks. For example, applying a
    standard LSTM to the time-vary components enables prediction of future frames.
    We evaluate our approach on a range of synthetic and real videos, demonstrating
    the ability to coherently generate hundreds of steps into the future.

    High Dimensional Structured Superposition Models

    Qilong Gu, Arindam Banerjee
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    High dimensional superposition models characterize observations using
    parameters which can be written as a sum of multiple component parameters, each
    with its own structure, e.g., sum of low rank and sparse matrices, sum of
    sparse and rotated sparse vectors, etc. In this paper, we consider general
    superposition models which allow sum of any number of component parameters, and
    each component structure can be characterized by any norm. We present a simple
    estimator for such models, give a geometric condition under which the
    components can be accurately estimated, characterize sample complexity of the
    estimator, and give high probability non-asymptotic bounds on the componentwise
    estimation error. We use tools from empirical processes and generic chaining
    for the statistical analysis, and our results, which substantially generalize
    prior work on superposition models, are in terms of Gaussian widths of suitable
    sets.

    Accuracy First: Selecting a Differential Privacy Level for Accuracy-Constrained ERM

    Katrina Ligett, Seth Neel, Aaron Roth, Bo Waggoner, Z. Steven Wu
    Comments: 24 pages single-column
    Subjects: Learning (cs.LG)

    Traditional approaches to differential privacy assume a fixed privacy
    requirement (epsilon) for a computation, and attempt to maximize the accuracy
    of the computation subject to the privacy constraint. As differential privacy
    is increasingly deployed in practical settings, it may often be that there is
    instead a fixed accuracy requirement for a given computation and the data
    analyst would like to maximize the privacy of the computation subject to the
    accuracy constraint. This raises the question of how to find and run a
    maximally private empirical risk minimizer subject to a given accuracy
    requirement. We propose a general “noise reduction” framework that can apply to
    a variety of private empirical risk minimization (ERM) algorithms, using them
    to “search” the space of privacy levels to find the empirically strongest one
    that meets the accuracy constraint, incurring only logarithmic overhead in the
    number of privacy levels searched. The privacy analysis of our algorithm leads
    naturally to a version of differential privacy where the privacy parameters are
    dependent on the data, which we term ex-post privacy, and which is related to
    the recently introduced notion of privacy odometers. We also give an ex-post
    privacy analysis of the classical AboveThreshold privacy tool, modifying it to
    allow for queries chosen depending on the database. Finally, we apply our
    approach to two common objectives, regularized linear and logistic regression,
    and empirically compare our noise reduction methods to (i) inverting the
    theoretical utility guarantees of standard private ERM algorithms and (ii) a
    stronger, empirical baseline based on binary search.

    Practical Neural Network Performance Prediction for Early Stopping

    Bowen Baker, Otkrist Gupta, Ramesh Raskar, Nikhil Naik
    Comments: Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    In the neural network domain, methods for hyperparameter optimization and
    meta-modeling are computationally expensive due to the need to train a large
    number of neural network configurations. In this paper, we show that a simple
    regression model, based on support vector machines, can predict the final
    performance of partially trained neural network configurations using features
    based on network architectures, hyperparameters, and time-series validation
    performance data. We use this regression model to develop an early stopping
    strategy for neural network configurations. With this early stopping strategy,
    we obtain significant speedups in both hyperparameter optimization and
    meta-modeling. Particularly in the context of meta-modeling, our method can
    learn to predict the performance of drastically different architectures and is
    seamlessly incorporated into reinforcement learning-based architecture
    selection algorithms. Finally, we show that our method is simpler, faster, and
    more accurate than Bayesian methods for learning curve prediction.

    Semi-Supervised Learning for Detecting Human Trafficking

    Hamidreza Alvari, Paulo Shakarian, J.E. Kelly Snyder
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Human trafficking is one of the most atrocious crimes and among the
    challenging problems facing law enforcement which demands attention of global
    magnitude. In this study, we leverage textual data from the website “Backpage”-
    used for classified advertisement- to discern potential patterns of human
    trafficking activities which manifest online and identify advertisements of
    high interest to law enforcement. Due to the lack of ground truth, we rely on a
    human analyst from law enforcement, for hand-labeling a small portion of the
    crawled data. We extend the existing Laplacian SVM and present S3VM-R, by
    adding a regularization term to exploit exogenous information embedded in our
    feature space in favor of the task at hand. We train the proposed method using
    labeled and unlabeled data and evaluate it on a fraction of the unlabeled data,
    herein referred to as unseen data, with our expert’s further verification.
    Results from comparisons between our method and other semi-supervised and
    supervised approaches on the labeled data demonstrate that our learner is
    effective in identifying advertisements of high interest to law enforcement

    SuperSpike: Supervised learning in multi-layer spiking neural networks

    Friedemann Zenke, Surya Ganguli
    Subjects: Neurons and Cognition (q-bio.NC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    A vast majority of computation in the brain is performed by spiking neural
    networks. Despite the ubiquity of such spiking, we currently lack an
    understanding of how biological spiking neural circuits learn and compute
    in-vivo, as well as how we can instantiate such capabilities in artificial
    spiking circuits in-silico. Here we revisit the problem of supervised learning
    in temporally coding multi-layer spiking neural networks. First, by using a
    surrogate gradient approach, we derive SuperSpike, a nonlinear voltage-based
    three factor learning rule capable of training multi-layer networks of
    deterministic integrate-and-fire neurons to perform nonlinear computations on
    spatiotemporal spike patterns. Second, inspired by recent results on feedback
    alignment, we compare the performance of our learning rule under different
    credit assignment strategies for propagating output errors to hidden units.
    Specifically, we test uniform, symmetric and random feedback, finding that
    simpler tasks can be solved with any type of feedback, while more complex tasks
    require symmetric feedback. In summary, our results open the door to obtaining
    a better scientific understanding of learning and computation in spiking neural
    networks by advancing our ability to train them to solve nonlinear problems
    involving transformations between different spatiotemporal spike-time patterns.

    End-to-end Differentiable Proving

    Tim Rocktäschel, Sebastian Riedel
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG); Logic in Computer Science (cs.LO)

    We introduce neural networks for end-to-end differentiable theorem proving
    that operate on dense vector representations of symbols. These neural networks
    are constructed recursively by taking inspiration from the backward chaining
    algorithm as used in Prolog. Specifically, we replace symbolic unification with
    a differentiable computation on vector representations of symbols using a
    radial basis function kernel, thereby combining symbolic reasoning with
    learning subsymbolic vector representations. By using gradient descent, the
    resulting neural network can be trained to infer facts from a given incomplete
    knowledge base. It learns to (i) place representations of similar symbols in
    close proximity in a vector space, (ii) make use of such similarities to prove
    facts, (iii) induce logical rules, and (iv) use provided and induced logical
    rules for complex multi-hop reasoning. We demonstrate that this architecture
    outperforms ComplEx, a state-of-the-art neural link prediction model, on four
    benchmark knowledge bases while at the same time inducing interpretable
    function-free first-order logic rules.

    Criticality & Deep Learning II: Momentum Renormalisation Group

    Dan Oprisa, Peter Toth
    Subjects: Statistical Mechanics (cond-mat.stat-mech); Learning (cs.LG)

    Guided by critical systems found in nature we develop a novel mechanism
    consisting of inhomogeneous polynomial regularisation via which we can induce
    scale invariance in deep learning systems. Technically, we map our deep
    learning (DL) setup to a genuine field theory, on which we act with the
    Renormalisation Group (RG) in momentum space and produce the flow equations of
    the couplings; those are translated to constraints and consequently interpreted
    as “critical regularisation” conditions in the optimiser; the resulting
    equations hence prove to be sufficient conditions for – and serve as an elegant
    and simple mechanism to induce scale invariance in any deep learning setup.

    Non-Markovian Control with Gated End-to-End Memory Policy Networks

    Julien Perez, Tomi Silander
    Comments: 11 pages, 1 figure, 1 table
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Partially observable environments present an important open challenge in the
    domain of sequential control learning with delayed rewards. Despite numerous
    attempts during the two last decades, the majority of reinforcement learning
    algorithms and associated approximate models, applied to this context, still
    assume Markovian state transitions. In this paper, we explore the use of a
    recently proposed attention-based model, the Gated End-to-End Memory Network,
    for sequential control. We call the resulting model the Gated End-to-End Memory
    Policy Network. More precisely, we use a model-free value-based algorithm to
    learn policies for partially observed domains using this memory-enhanced neural
    network. This model is end-to-end learnable and it features unbounded memory.
    Indeed, because of its attention mechanism and associated non-parametric
    memory, the proposed model allows us to define an attention mechanism over the
    observation stream unlike recurrent models. We show encouraging results that
    illustrate the capability of our attention-based model in the context of the
    continuous-state non-stationary control problem of stock trading. We also
    present an OpenAI Gym environment for simulated stock exchange and explain its
    relevance as a benchmark for the field of non-Markovian decision process
    learning.

    FALKON: An Optimal Large Scale Kernel Method

    Alessandro Rudi, Luigi Carratino, Lorenzo Rosasco
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Kernel methods provide a principled way to perform non linear, nonparametric
    learning. They rely on solid functional analytic foundations and enjoy optimal
    statistical properties. However, at least in their basic form, they have
    limited applicability in large scale scenarios because of stringent
    computational requirements in terms of time and especially memory. In this
    paper, we take a substantial step in scaling up kernel methods, proposing
    FALKON, a novel algorithm that allows to efficiently process millions of
    points. FALKON is derived combining several algorithmic principles, namely
    stochastic projections, iterative solvers and preconditioning. Our theoretical
    analysis shows that optimal statistical accuracy is achieved requiring
    essentially (O(n)) memory and (O(nsqrt{n})) time. Extensive experiments show
    that state of the art results on available large scale datasets can be achieved
    even on a single machine.

    Spectral Norm Regularization for Improving the Generalizability of Deep Learning

    Yuichi Yoshida, Takeru Miyato
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We investigate the generalizability of deep learning based on the sensitivity
    to input perturbation. We hypothesize that the high sensitivity to the
    perturbation of data degrades the performance on it. To reduce the sensitivity
    to perturbation, we propose a simple and effective regularization method,
    referred to as spectral norm regularization, which penalizes the high spectral
    norm of weight matrices in neural networks. We provide supportive evidence for
    the abovementioned hypothesis by experimentally confirming that the models
    trained using spectral norm regularization exhibit better generalizability than
    other baseline methods.

    Sequential Dynamic Decision Making with Deep Neural Nets on a Test-Time Budget

    Henghui Zhu, Feng Nan, Ioannis Paschalidis, Venkatesh Saligrama
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Deep neural network (DNN) based approaches hold significant potential for
    reinforcement learning (RL) and have already shown remarkable gains over
    state-of-art methods in a number of applications. The effectiveness of DNN
    methods can be attributed to leveraging the abundance of supervised data to
    learn value functions, Q-functions, and policy function approximations without
    the need for feature engineering. Nevertheless, the deployment of DNN-based
    predictors with very deep architectures can pose an issue due to computational
    and other resource constraints at test-time in a number of applications. We
    propose a novel approach for reducing the average latency by learning a
    computationally efficient gating function that is capable of recognizing states
    in a sequential decision process for which policy prescriptions of a shallow
    network suffices and deeper layers of the DNN have little marginal utility. The
    overall system is adaptive in that it dynamically switches control actions
    based on state-estimates in order to reduce average latency without sacrificing
    terminal performance. We experiment with a number of alternative loss-functions
    to train gating functions and shallow policies and show that in a number of
    applications a speed-up of up to almost 5X can be obtained with little loss in
    performance.

    Sparse and low-rank approximations of large symmetric matrices using biharmonic interpolation

    Javier S. Turek, Alexander Huth
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Numerical Analysis (cs.NA)

    Symmetric matrices are widely used in machine learning problems such as
    kernel machines and manifold learning. Using large datasets often requires
    computing low-rank approximations of these symmetric matrices so that they fit
    in memory. In this paper, we present a novel method based on biharmonic
    interpolation for low-rank matrix approximation. The method exploits knowledge
    of the data manifold to learn an interpolation operator that approximates
    values using a subset of randomly selected landmark points. This operator is
    readily sparsified, reducing memory requirements by at least two orders of
    magnitude without significant loss in accuracy. We show that our method can
    approximate very large datasets using twenty times more landmarks than other
    methods. Further, numerical results suggest that our method is stable even when
    numerical difficulties arise for other methods.

    Optimization of Tree Ensembles

    Velibor V. Mišić
    Comments: 48 pages, 10 figures
    Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)

    Tree ensemble models such as random forests and boosted trees are among the
    most widely used and practically successful predictive models in applied
    machine learning and business analytics. Although such models have been used to
    make predictions based on exogenous, uncontrollable independent variables, they
    are increasingly being used to make predictions where the independent variables
    are controllable and are also decision variables. In this paper, we study the
    problem of tree ensemble optimization: given a tree ensemble that predicts some
    dependent variable using controllable independent variables, how should we set
    these variables so as to maximize the predicted value? We formulate the problem
    as a mixed-integer optimization problem. We theoretically examine the strength
    of our formulation, provide a hierarchy of approximate formulations with bounds
    on approximation quality and exploit the structure of the problem to develop
    two large-scale solution methods, one based on Benders decomposition and one
    based on iteratively generating tree split constraints. We test our methodology
    on real data sets, including two case studies in drug design and customized
    pricing, and show that our methodology can efficiently solve large-scale
    instances to near or full optimality, and outperforms solutions obtained by
    heuristic approaches. In our drug design case, we show how our approach can
    identify compounds that efficiently trade-off predicted performance and novelty
    with respect to existing, known compounds. In our customized pricing case, we
    show how our approach can efficiently determine optimal store-level prices
    under a random forest model that delivers excellent predictive accuracy.

    Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments

    Zixing Zhang, Jürgen Geiger, Jouni Pohjalainen, Amr El-Desoky Mousa, Björn Schuller
    Subjects: Sound (cs.SD); Computation and Language (cs.CL); Learning (cs.LG)

    Eliminating the negative effect of highly non-stationary environmental noise
    is a long-standing research topic for speech recognition but remains an
    important challenge nowadays. To address this issue, traditional unsupervised
    signal processing methods seem to have touched the ceiling. However,
    data-driven based supervised approaches, particularly the ones designed with
    deep learning, have recently emerged as potential alternatives. In this light,
    we are going to comprehensively summarise the recently developed and most
    representative deep learning approaches to deal with the raised problem in this
    article, with the aim of providing guidelines for those who are going deeply
    into the field of environmentally robust speech recognition. To better
    introduce these approaches, we categorise them into single- and multi-channel
    techniques, each of which is specifically described at the front-end, the
    back-end, and the joint framework of speech recognition systems. In the
    meanwhile, we describe the pros and cons of these approaches as well as the
    relationships among them, which can probably benefit future research.

    Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models

    Gabriel Lima Guimaraes, Benjamin Sanchez-Lengeling, Pedro Luis Cunha Farias, Alán Aspuru-Guzik
    Comments: 10 pages, 7 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    In unsupervised data generation tasks, besides the generation of a sample
    based on previous observations, one would often like to give hints to the model
    in order to bias the generation towards desirable metrics. We propose a method
    that combines Generative Adversarial Networks (GANs) and reinforcement learning
    (RL) in order to accomplish exactly that. While RL biases the data generation
    process towards arbitrary metrics, the GAN component of the reward function
    ensures that the model still remembers information learned from data. We build
    upon previous results that incorporated GANs and RL in order to generate
    sequence data and test this model in several settings for the generation of
    molecules encoded as text sequences (SMILES) and in the context of music
    generation, showing for each case that we can effectively bias the generation
    process towards desired metrics.


    Information Theory

    Models and information-theoretic bounds for nanopore sequencing

    Wei Mao, Suhas Diggavi, Sreeram Kannan
    Subjects: Information Theory (cs.IT)

    Nanopore sequencing is an emerging new technology for sequencing DNA, which
    can read long fragments of DNA (~50,000 bases) in contrast to most current
    (short-read) sequencing technologies which can only read hundreds of bases.
    While nanopore sequencers can acquire long reads, the high error rates
    (20%-30%) pose a technical challenge. In a nanopore sequencer, a DNA is
    migrated through a nanopore and current variations are measured. The DNA
    sequence is inferred from this observed current pattern using an algorithm
    called a base-caller. In this paper, we propose a mathematical model for the
    “channel” from the input DNA sequence to the observed current, and calculate
    bounds on the information extraction capacity of the nanopore sequencer. This
    model incorporates impairments like (non-linear) inter-symbol interference,
    deletions, as well as random response. The potential practical application of
    such information bounds is two-fold: (1) benchmarking present base-calling
    algorithms against those theoretical bounds, and (2) offering an optimization
    objective for designing better nanopore sequencers.

    On One Generalization of LRC Codes with Availability

    Stanislav Kruglik, Marina Dudina, Valeriya Potapova, Alexey Frolov
    Comments: submitted to ITW 2017. arXiv admin note: text overlap with arXiv:1702.01314
    Subjects: Information Theory (cs.IT)

    We investigate one possible generalization of locally recoverable codes (LRC)
    with all-symbol locality and availability when recovering sets can intersect in
    a small number of coordinates. This feature allows us to increase the
    achievable code rate and still meet load balancing requirements. In this paper
    we derive an upper bound for the rate of such codes and give explicit
    constructions of codes with such a property. These constructions utilize LRC
    codes developed by Wang et al.

    Generalised Precoded Spatial Modulation for Integrated Wireless Information and Power Transfer

    Rong Zhang, Lie-Liang Yang, Lajos Hanzo
    Subjects: Information Theory (cs.IT)

    Conventional wireless information transfer by modulating the amplitude, phase
    or frequency leads to an inevitable Rate-Energy (RE) trade-off in the presence
    of simultaneous Wireless Power Transfer (WPT). In echoing Varshney’s seminal
    concept of jointly transmitting both information and energy, we propose the
    so-called Generalised Precoded Spatial Modulation (GPSM) aided Integrated
    Wireless Information and Power Transfer (IWIPT) concept employing a power-split
    receiver. The principle of GPSM is that a particular subset of Receive Antennas
    (RA) is activated and the activation pattern itself conveys useful information.
    Hence, the novelty of our GPSM aided IWIPT concept is that RA pattern-based
    information transfer is used in addition to the conventional waveform-based
    information carried by the classic M-ary modulation. Following the Radio
    Frequency (RF) to Direct Current (DC) power conversion invoked for WPT at the
    power-split receiver, the non-coherent detector simply compares the remaining
    received power accumulated by each legitimate RA pattern for the sake of
    identifying the most likely RA. This operation is then followed by
    down-conversion and conventional Base Band (BB) M-ary detection. Both our
    analysis and simulations show that the RA pattern based information transfer
    represented in the Spatial Domain (SD) exhibits a beneficial immunity to any
    potential power conversion induced performance degradation and hence improves
    the overall RE trade-off when additionally the waveform-based information
    transfer is also taken into account. Moreover, we investigate the impact of
    realistic imperfect Channel State Information at the Transmitter (CSIT) as well
    as that of the antenna correlations encountered. Finally, the system’s
    asymptotic performance is characterised in the context of large-scale Multiple
    Input Multiple Output (MIMO) systems.

    Max-Min Fair Transmit Precoding for Multi-group Multicasting in Massive MIMO

    Meysam Sadeghi, Emil Björnson, Erik G. Larsson, Chau Yuen, Thomas L. Marzetta
    Subjects: Information Theory (cs.IT)

    This paper considers the downlink precoding for physical layer multicasting
    in massive multiple-input-multiple-output (MIMO) systems. We study the max-min
    fairness (MMF) problem, where channel state information (CSI) at the
    transmitter is used to design precoding vectors that maximize the minimum
    spectral efficiency (SE) of the system, given fixed power budgets for uplink
    training and downlink transmission. Our system model accounts for channel
    estimation, pilot contamination, arbitrary pathlosses, and multi-group
    multicasting. We consider six scenarios with different transmission
    technologies (unicast and multicast), different pilot assignment strategies
    (dedicated or shared pilot assignments), and different precoding schemes
    (maximum ratio transmission and zero forcing), and derive achievable spectral
    efficiencies for all possible combinations. Then we solve the MMF problem for
    each of these scenarios and for any given pilot length we find the SE
    maximizing uplink pilot and downlink data transmission policies, all in
    closed-forms. We use these results to draw a general guideline for massive MIMO
    multicasting design, where for a given number of base station antennas, number
    of users, and coherence interval length, we determine the multicasting scheme
    that shall be used.

    Complex Quadrature Spatial Modulation

    Manar Mohaisen, Saetbyeol Lee
    Comments: 11 pages, 3 tables, 11 figures. ETRI Journal, 2017
    Subjects: Information Theory (cs.IT)

    In this paper, we propose a spatial modulation (SM) scheme referred to as
    complex quadrature spatial modulation (CQSM). In contrast to quadrature spatial
    modulation (QSM), CQSM transmits two complex signal constellation symbols on
    the real and quadrature spatial dimensions at each channel use, increasing the
    spectral efficiency. To this end, signal symbols transmitted at any given time
    instant are drawn from two different modulation sets. The first modulation set
    is any of the conventional QAM/PSK alphabets, while the second is a rotated
    version of it. The optimal rotation angle is obtained through simulations for
    several modulation schemes and analytically proven for the case of QPSK, where
    both results coincide. Simulation results showed that CQSM outperformed QSM and
    generalized SM (GSM) by approximately 5 and 4.5 dB, respectively, for the same
    transmission rate. Its performance was similar to that of QSM; however, it
    achieved higher transmission rates. It was additionally shown numerically and
    analytically that CQSM outperformed QSM for a relatively large number of
    transmit antennas.

    Information Theoretic Properties of Markov Random Fields, and their Algorithmic Applications

    Linus Hamilton, Frederic Koehler, Ankur Moitra
    Comments: 25 pages
    Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Statistics Theory (math.ST)

    Markov random fields area popular model for high-dimensional probability
    distributions. Over the years, many mathematical, statistical and algorithmic
    problems on them have been studied. Until recently, the only known algorithms
    for provably learning them relied on exhaustive search, correlation decay or
    various incoherence assumptions. Bresler gave an algorithm for learning general
    Ising models on bounded degree graphs. His approach was based on a structural
    result about mutual information in Ising models.

    Here we take a more conceptual approach to proving lower bounds on the mutual
    information through setting up an appropriate zero-sum game. Our proof
    generalizes well beyond Ising models, to arbitrary Markov random fields with
    higher order interactions. As an application, we obtain algorithms for learning
    Markov random fields on bounded degree graphs on (n) nodes with (r)-order
    interactions in (n^r) time and (log n) sample complexity. The sample
    complexity is information theoretically optimal up to the dependence on the
    maximum degree. The running time is nearly optimal under standard conjectures
    about the hardness of learning parity with noise.

    A general theory of singular values with applications to signal denoising

    Harm Derksen
    Comments: 59 pages, 27 figures
    Subjects: Numerical Analysis (math.NA); Information Theory (cs.IT); Numerical Analysis (cs.NA); Optimization and Control (math.OC); Statistics Theory (math.ST)

    We study the Pareto frontier for two competing norms (|cdot|_X) and
    (|cdot|_Y) on a vector space. For a given vector (c), the pareto frontier
    describes the possible values of ((|a|_X,|b|_Y)) for a decomposition
    (c=a+b). The singular value decomposition of a matrix is closely related to the
    Pareto frontier for the spectral and nuclear norm. We will develop a general
    theory that extends the notion of singular values of a matrix to arbitrary
    finite dimensional euclidean vector spaces equipped with dual norms. This also
    generalizes the diagonal singular value decompositions for tensors introduced
    by the author in previous work. We can apply the results to denoising, where
    (c) is a noisy signal, (a) is a sparse signal and (b) is noise. Applications
    include 1D total variation denoising, 2D total variation Rudin-Osher-Fatemi
    image denoising, LASSO, basis pursuit denoising and tensor decompositions.




沪ICP备19023445号-2号
友情链接