
    arXiv Paper Daily: Mon, 7 Nov 2016

    Posted by 我爱机器学习 (52ml.net) on 2016-11-07 00:00:00

    Neural and Evolutionary Computing

    Sparsely-Connected Neural Networks: Towards Efficient VLSI Implementation of Deep Neural Networks

    Arash Ardakani, Carlo Condo, Warren J. Gross
    Comments: 13 pages, 3 figures
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Recently deep neural networks have received considerable attention due to
    their ability to extract and represent high-level abstractions in data sets.
    Deep neural networks such as fully-connected and convolutional neural networks
    have shown excellent performance on a wide range of recognition and
    classification tasks. However, their hardware implementations currently suffer
    from large silicon area and high power consumption due to their high degree
    of complexity. The power/energy consumption of neural networks is dominated by
    memory accesses, the majority of which occur in fully-connected networks. In
    fact, they contain most of the deep neural network parameters. In this paper,
    we propose sparsely-connected networks, by showing that the number of
    connections in fully-connected networks can be reduced by up to 90% while
    improving the accuracy performance on three popular datasets (MNIST, CIFAR10
    and SVHN). We then propose an efficient hardware architecture based on
    linear-feedback shift registers to reduce the memory requirements of the
    proposed sparsely-connected networks. The proposed architecture can save up to
    90% of memory compared to the conventional implementations of fully-connected
    neural networks. Moreover, implementation results show up to 84% reduction in
    the energy consumption of a single neuron of the proposed sparsely-connected
    networks compared to a single neuron of fully-connected neural networks.
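
    The abstract does not spell out how the sparse pattern is generated on chip; as a
    rough illustration only, the sketch below applies a fixed pseudo-random binary mask
    to a fully-connected layer, the kind of mask an LFSR-based address generator could
    regenerate on the fly instead of storing it (the layer sizes and density are
    assumptions, not the authors' configuration).

        import numpy as np

        def sparsely_connected_layer(x, W, density=0.1, seed=0):
            """Forward pass of a fully-connected layer whose weights are multiplied
            by a fixed pseudo-random binary mask (roughly 90% of connections removed)."""
            rng = np.random.RandomState(seed)        # reproducible mask; stands in for an LFSR
            mask = (rng.rand(*W.shape) < density)    # keep ~density of the connections
            return x @ (W * mask)

        # toy usage: 784-dimensional inputs, 256 hidden units, ReLU activation
        x = np.random.randn(32, 784)
        W = 0.01 * np.random.randn(784, 256)
        h = np.maximum(sparsely_connected_layer(x, W), 0.0)
        print(h.shape)  # (32, 256)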

    RenderGAN: Generating Realistic Labeled Data

    Leon Sixt, Benjamin Wild, Tim Landgraf
    Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)

    Deep Convolutional Neuronal Networks (DCNN) are showing remarkable
    performance on many computer vision tasks. Due to their large parameter space,
    they require many labeled samples when trained in a supervised setting. The
    costs of annotating data manually can render the usage of DCNNs infeasible. We
    present a novel framework called RenderGAN that can generate large amounts of
    realistic, labeled images by combining a 3D model and the Generative
    Adversarial Network framework. In our approach, image augmentations (e.g.
    lighting, background, and detail) are learned from unlabeled data such that the
    generated images are strikingly realistic while preserving the labels known
    from the 3D model. We apply the RenderGAN framework to generate images of
    barcode-like markers that are attached to honeybees. A DCNN is trained on this
    data only. It performs better on a test set of real data than an equal DCNN
    trained on the limited amounts of real data available.

    A Self-Driving Robot Using Deep Convolutional Neural Networks on Neuromorphic Hardware

    Tiffany Hwu, Jacob Isbell, Nicolas Oros, Jeffrey Krichmar
    Comments: 6 pages, 8 figures
    Subjects: Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)

    Neuromorphic computing is a promising solution for reducing the size, weight
    and power of mobile embedded systems. In this paper, we introduce a realization
    of such a system by creating the first closed-loop battery-powered
    communication system between an IBM TrueNorth NS1e and an autonomous
    Android-Based Robotics platform. Using this system, we constructed a dataset of
    path following behavior by manually driving the Android-Based robot along steep
    mountain trails and recording video frames from the camera mounted on the robot
    along with the corresponding motor commands. We used this dataset to train a
    deep convolutional neural network implemented on the TrueNorth NS1e. The NS1e,
    which was mounted on the robot and powered by the robot’s battery, resulted in
    a self-driving robot that could successfully traverse a steep mountain path in
    real time. To our knowledge, this represents the first time the TrueNorth NS1e
    neuromorphic chip has been embedded on a mobile platform under closed-loop
    control.

    Demystifying ResNet

    Sihan Li, Jiantao Jiao, Yanjun Han, Tsachy Weissman
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)

    We provide a theoretical explanation for the superb performance of ResNet via
    the study of deep linear networks and some nonlinear variants. We show that
    with or without nonlinearities, by adding shortcuts that have depth two, the
    condition number of the Hessian of the loss function at the zero initial point
    is depth-invariant, which makes training very deep models no more difficult
    than shallow ones. Shortcuts of higher depth result in an extremely flat
    (high-order) stationary point initially, from which the optimization algorithm
    is hard to escape. The 1-shortcut, however, is essentially equivalent to no
    shortcuts. Extensive experiments are provided accompanying our theoretical
    results. We show that initializing the network to small weights with
    2-shortcuts achieves significantly better results than random Gaussian (Xavier)
    initialization, orthogonal initialization, and shortcuts of deeper depth, from
    various perspectives ranging from final loss, learning dynamics and stability,
    to the behavior of the Hessian along the learning process.
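
    As a concrete picture of the "2-shortcut" construction analyzed above, the
    hypothetical sketch below (not the authors' code) stacks identity shortcuts that
    each skip two linear layers and initializes all weights near zero, so the whole
    network starts out close to the identity map regardless of its depth.

        import numpy as np

        def forward_2shortcut(x, blocks):
            """Deep linear network with identity shortcuts of depth two:
            h <- h + W2 @ (W1 @ h) for each block."""
            h = x
            for W1, W2 in blocks:
                h = h + W2 @ (W1 @ h)
            return h

        dim, depth = 16, 10
        rng = np.random.RandomState(0)
        blocks = [(0.01 * rng.randn(dim, dim), 0.01 * rng.randn(dim, dim))
                  for _ in range(depth)]
        x = rng.randn(dim, 5)
        # output stays close to the input: the network begins near the identity map
        print(np.linalg.norm(forward_2shortcut(x, blocks) - x))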

    Semantic Noise Modeling for Better Representation Learning

    Hyo-Eun Kim, Sangheum Hwang, Kyunghyun Cho
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Latent representations learned by multi-layered neural networks via
    hierarchical feature abstraction enable the recent success of deep learning. Under
    the deep learning framework, generalization performance highly depends on the
    learned latent representation which is obtained from an appropriate training
    scenario with a task-specific objective on a designed network model. In this
    work, we propose a novel latent space modeling method to learn better latent
    representation. We designed a neural network model based on the assumption that
    good base representation can be attained by maximizing the total correlation
    between the input, latent, and output variables. From the base model, we
    introduce a semantic noise modeling method which enables class-conditional
    perturbation on latent space to enhance the representational power of learned
    latent feature. During training, latent vector representation can be
    stochastically perturbed by a modeled class-conditional additive noise while
    maintaining its original semantic feature. It implicitly brings the effect of
    semantic augmentation on the latent space. The proposed model can be easily
    learned by back-propagation with common gradient-based optimization algorithms.
    Experimental results show that the proposed method helps to achieve performance
    benefits against various previous approaches. We also provide the empirical
    analyses for the proposed class-conditional perturbation process including
    t-SNE visualization.
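
    The abstract does not give the exact form of the class-conditional noise; the
    snippet below is a minimal sketch of the general idea, perturbing each latent
    vector with additive Gaussian noise whose mean and scale depend on the example's
    class (the per-class statistics here are invented purely for illustration).

        import numpy as np

        def perturb_latent(z, labels, class_mu, class_sigma, rng):
            """Add class-conditional Gaussian noise to latent vectors z of shape (N, D)."""
            noise = class_mu[labels] + class_sigma[labels] * rng.randn(*z.shape)
            return z + noise

        rng = np.random.RandomState(0)
        num_classes, latent_dim = 10, 32
        class_mu = 0.1 * rng.randn(num_classes, latent_dim)      # assumed per-class mean shift
        class_sigma = 0.05 * np.ones((num_classes, latent_dim))  # assumed per-class scale

        z = rng.randn(64, latent_dim)                  # latent codes from an encoder
        labels = rng.randint(0, num_classes, size=64)  # class of each example
        print(perturb_latent(z, labels, class_mu, class_sigma, rng).shape)  # (64, 32)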

    Combating Reinforcement Learning's Sisyphean Curse with Intrinsic Fear

    Zachary C. Lipton, Jianfeng Gao, Lihong Li, Jianshu Chen, Li Deng
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    To use deep reinforcement learning in the wild, we might hope for an agent
    that would never make catastrophic mistakes. At the very least, we could hope
    that an agent would eventually learn to avoid old mistakes. Unfortunately, even
    in simple environments, modern deep reinforcement learning techniques are
    doomed by a Sisyphean curse. Owing to the use of function approximation, these
    agents eventually forget experiences as they become exceedingly unlikely under
    a new policy. Consequently, for as long as they continue to train,
    state-aggregating agents may periodically relive catastrophic mistakes. We
    demonstrate unacceptable performance of deep Q-networks on two toy problems. We
    then introduce intrinsic fear, a method that mitigates these problems by
    avoiding dangerous states.
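
    The abstract only names the idea; one plausible reading (purely illustrative, not
    the authors' formulation) is to subtract from the Q-learning target a penalty
    proportional to the predicted probability that the next state leads to catastrophe,
    as in the toy sketch below.

        import numpy as np

        def fear_adjusted_target(reward, q_next, danger_prob, gamma=0.99, fear_coef=1.0):
            """Standard Q-learning target minus an 'intrinsic fear' penalty weighted by
            the danger model's probability that the next state precedes a catastrophe."""
            return reward + gamma * np.max(q_next) - fear_coef * danger_prob

        # toy usage: two possible actions in the next state, danger model says 30% risk
        print(fear_adjusted_target(reward=1.0, q_next=np.array([0.5, 0.2]), danger_prob=0.3))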


    Computer Vision and Pattern Recognition

    UMDFaces: An Annotated Face Dataset for Training Deep Networks

    Ankan Bansal, Anirudh Nanduri, Carlos Castillo, Rajeev Ranjan, Rama Chellappa
    Comments: 9 pages, submitted to WACV 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recent progress in face detection (including keypoint detection) and
    recognition is mainly being driven by (i) deeper convolutional neural network
    architectures, and (ii) larger datasets. However, most of the largest datasets
    are maintained by private companies and are not publicly available. The
    academic computer vision community needs larger and more varied datasets to
    make further progress.

    In this paper we introduce a new face dataset, called UMDFaces, which has
    367,920 face annotations of 8,501 subjects. We discuss how a large dataset can
    be collected and annotated using human annotators and deep networks. We provide
    human curated bounding boxes for faces. We also provide estimated pose (roll,
    pitch and yaw), locations of twenty-one key-points and gender information
    generated by a pre-trained neural network. Finally, we compare the quality of
    the dataset with other publicly available face datasets at similar scales.

    STDP-based spiking deep neural networks for object recognition

    Saeed Reza Kheradpisheh, Mohammad Ganjtabesh, Simon J Thorpe, Timothée Masquelier
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Previous studies have shown that spike-timing-dependent plasticity (STDP) can
    be used in spiking neural networks (SNN) to extract visual features of low or
    intermediate complexity in an unsupervised manner. These studies, however, used
    relatively shallow architectures, and only one layer was trainable. Another
    line of research has demonstrated – using rate-based neural networks trained
    with back-propagation – that having many layers increases the recognition
    robustness, an approach known as deep learning. We thus designed a deep SNN,
    comprising several convolutional (trainable with STDP) and pooling layers. We
    used a temporal coding scheme where the most strongly activated neurons fire
    first, and less activated neurons fire later or not at all. The network was
    exposed to natural images. Thanks to STDP, neurons progressively learned
    features corresponding to prototypical patterns that were both salient and
    frequent. Only a few tens of examples per category were required and no label
    was needed. After learning, the complexity of the extracted features increased
    along the hierarchy, from edge detectors in the first layer to object
    prototypes in the last layer. Coding was very sparse, with only a few thousand
    spikes per image, and in some cases the object category could be reasonably
    well inferred from the activity of a single higher-order neuron. More
    generally, the activity of a few hundred such neurons contained robust
    category information, as demonstrated using a classifier on Caltech 101,
    ETH-80, and MNIST databases. We think that the combination of STDP with latency
    coding is key to understanding the way that the primate visual system learns,
    its remarkable processing speed and its low energy consumption. These
    mechanisms are also interesting for artificial vision systems, particularly for
    hardware solutions.
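
    For readers unfamiliar with the learning rule, a minimal textbook form of the
    simplified STDP update and the latency code described above is sketched below
    (generic parameter values, not those used in the paper): a weight is potentiated
    when its presynaptic spike precedes the postsynaptic spike and depressed otherwise.

        import numpy as np

        def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.01, bounds=(0.0, 1.0)):
            """Additive STDP that depends only on the sign of the timing difference."""
            dw = np.where(t_pre <= t_post, a_plus, -a_minus)
            return np.clip(w + dw, *bounds)

        # latency coding: more strongly activated inputs fire earlier
        activations = np.array([0.9, 0.2, 0.6])
        t_pre = 1.0 - activations            # spike times (earlier == stronger input)
        w = np.full(3, 0.5)
        print(stdp_update(w, t_pre, t_post=0.5))  # [0.51, 0.49, 0.51]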

    Nonnegative Matrix Underapproximation for Robust Multiple Model Fitting

    Mariano Tepper, Guillermo Sapiro
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this work, we introduce the first algorithm to truly address the
    nonnegative matrix underapproximation (NMU) problem, i.e., nonnegative matrix
    factorization (NMF) with an additional underapproximation constraint. NMU
    results are interesting as, compared to traditional NMF, they present
    additional sparsity and part-based behavior, explaining unique data features.
    To show an example of these features in practice, we first present an
    application to the analysis of climate data. We then present an NMU-based
    algorithm to robustly fit multiple parametric models to a dataset. The proposed
    approach delivers state-of-the-art results for the estimation of multiple
    fundamental matrices and homographies, outperforming other alternatives in the
    literature and exemplifying the use of efficient NMU computations.
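
    To make the underapproximation constraint concrete: NMU seeks nonnegative factors
    with W @ H <= X elementwise, not just W @ H ~ X. The sketch below is a generic
    penalty-based heuristic for that constraint (our own toy formulation, not the
    algorithm proposed in the paper).

        import numpy as np

        def nmu_penalty(X, r, iters=500, lr=1e-3, rho=10.0, seed=0):
            """Factor X ~ W @ H with W, H >= 0, penalizing any overshoot W @ H > X."""
            rng = np.random.RandomState(seed)
            m, n = X.shape
            W = 0.1 * np.abs(rng.randn(m, r))
            H = 0.1 * np.abs(rng.randn(r, n))
            for _ in range(iters):
                R = W @ H - X                        # residual
                G = R + rho * np.maximum(R, 0.0)     # extra gradient where we overshoot
                gradW, gradH = G @ H.T, W.T @ G
                W = np.maximum(W - lr * gradW, 0.0)  # projected gradient step, W >= 0
                H = np.maximum(H - lr * gradH, 0.0)
            return W, H

        X = np.abs(np.random.RandomState(1).randn(30, 20))
        W, H = nmu_penalty(X, r=3)
        print(float(np.mean(np.maximum(W @ H - X, 0.0))))  # average overshoot, pushed toward 0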

    Regularized Pel-Recursive Motion Estimation Using Generalized Cross-Validation and Spatial Adaptation

    Vania V. Estrela, Luis A. Rivera, Paulo C. Beggio, Ricardo T. Lopes
    Comments: 8 pages, 6 figures in Proceedings of the XVI Brazilian Symposium on Computer Graphics and Image Processing, 2003. SIBGRAPI 2003. IEEE. arXiv admin note: text overlap with arXiv:1403.7365, arXiv:1611.00960
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The computation of 2-D optical flow by means of regularized pel-recursive
    algorithms raises a host of issues, which include the treatment of outliers,
    motion discontinuities and occlusion among other problems. We propose a new
    approach which allows us to deal with these issues within a common framework.
    Our approach is based on the use of a technique called Generalized
    Cross-Validation to estimate the best regularization scheme for a given pixel.
    In our model, the regularization parameter is a matrix whose entries can
    account for diverse sources of error. The estimation of the motion vectors
    takes into consideration local properties of the image following a spatially
    adaptive approach where each moving pixel is supposed to have its own
    regularization matrix. Preliminary experiments indicate that this approach
    provides robust estimates of the optical flow.

    Learning Identity Mappings with Residual Gates

    Pedro H. P. Savarese
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We propose a technique to augment network layers by adding a linear gating
    mechanism, which provides a way to learn identity mappings by optimizing only
    one parameter. We also introduce a new metric which serves as the basis for the
    technique. It captures the difficulty involved in learning identity mappings
    for different types of network models, and provides a new theoretical intuition
    for the increased depths of models such as Highway and Residual Networks. We
    propose a new model, the Gated Residual Network, which is the result when
    augmenting Residual Networks. Experimental results show that augmenting layers
    grants increased performance, fewer issues with depth, and more layer
    independence — fully removing them does not cripple the model. We evaluate our
    method on MNIST using fully-connected networks and on CIFAR-10 using Wide
    ResNets, achieving a relative error reduction of more than 8% in the latter
    when compared to the original model.
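
    A minimal sketch of a gated residual block with a single scalar gate (our own
    formulation of the idea, not necessarily the paper's exact parameterization) is
    given below; driving the gate parameter to a large negative value recovers the
    identity mapping by optimizing only that one parameter.

        import numpy as np

        def gated_residual_block(x, W, k):
            """y = g * f(x) + x with a single scalar gate g = sigmoid(k)."""
            g = 1.0 / (1.0 + np.exp(-k))     # scalar gate in (0, 1)
            f = np.maximum(x @ W, 0.0)       # the gated layer (ReLU here)
            return g * f + x

        x = np.random.randn(4, 8)
        W = 0.1 * np.random.randn(8, 8)
        print(np.allclose(gated_residual_block(x, W, k=-50.0), x))  # True: gate ~ 0 gives identity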

    Adversarial Machine Learning at Scale

    Alexey Kurakin, Ian Goodfellow, Samy Bengio
    Comments: 15 pages, 5 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Learning (cs.LG); Machine Learning (stat.ML)

    Adversarial examples are malicious inputs designed to fool machine learning
    models. They often transfer from one model to another, allowing attackers to
    mount black box attacks without knowledge of the target model’s parameters.
    Adversarial training is the process of explicitly training a model on
    adversarial examples, in order to make it more robust to attack or to reduce
    its test error on clean inputs. So far, adversarial training has primarily been
    applied to small problems. In this research, we apply adversarial training to
    ImageNet. Our contributions include: (1) recommendations for how to successfully
    scale adversarial training to large models and datasets, (2) the observation
    that adversarial training confers robustness to single-step attack methods, (3)
    the finding that multi-step attack methods are somewhat less transferable than
    single-step attack methods, so single-step attacks are the best for mounting
    black-box attacks, and (4) resolution of a “label leaking” effect that causes
    adversarially trained models to perform better on adversarial examples than on
    clean examples, because the adversarial example construction process uses the
    true label and the model can learn to exploit regularities in the construction
    process.
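
    As a concrete reference point for the "single-step attack methods" discussed
    above, the sketch below implements the fast gradient sign method on a toy
    logistic-regression model (the model and loss are stand-ins; the paper works with
    ImageNet-scale networks).

        import numpy as np

        def fgsm(x, y, w, b, eps):
            """Fast gradient sign method: move x by eps in the direction of the sign
            of the gradient of the cross-entropy loss with respect to the input."""
            p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted probability of class 1
            grad_x = (p - y) * w                    # d(cross-entropy)/dx for logistic regression
            return x + eps * np.sign(grad_x)

        rng = np.random.RandomState(0)
        w, b = rng.randn(10), 0.0
        x, y = rng.randn(10), 1.0
        x_adv = fgsm(x, y, w, b, eps=0.1)
        print(np.max(np.abs(x_adv - x)))  # 0.1: an L_inf-bounded single-step perturbation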

    Integrating Atlas and Graph Cut Methods for LV Segmentation from Cardiac Cine MRI

    Shusil Dangi, Nathan Cahill, Cristian A. Linte
    Comments: Statistical Atlases and Computational Modelling of Heart workshop 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Magnetic Resonance Imaging (MRI) has evolved as a clinical standard-of-care
    imaging modality for cardiac morphology, function assessment, and guidance of
    cardiac interventions. All these applications rely on accurate extraction of
    the myocardial tissue and blood pool from the imaging data. Here we propose a
    framework for left ventricle (LV) segmentation from cardiac cine-MRI. First, we
    segment the LV blood pool using iterative graph cuts, and subsequently use this
    information to segment the myocardium. We formulate the segmentation procedure
    as an energy minimization problem in a graph subject to the shape prior
    obtained by label propagation from an average atlas using affine registration.
    The proposed framework has been validated on 30 patient cardiac cine-MRI
    datasets available through the STACOM LV segmentation challenge and yielded
    fast, robust, and accurate segmentation results.

    Bayesian Modeling of Motion Perception using Dynamical Stochastic Textures

    Jonathan Vacher, Andrew Isaac Meso, Laurent U. Perrinet, Gabriel Peyré
    Subjects: Neurons and Cognition (q-bio.NC); Computer Vision and Pattern Recognition (cs.CV)

    A common practice to account for psychophysical biases in vision is to frame
    them as consequences of a dynamic process relying on optimal inference with
    respect to a generative model. The present study details the complete
    formulation of such a generative model intended to probe visual motion
    perception. It is first derived in a set of axiomatic steps constrained by
    biological plausibility. We then extend previous contributions by detailing
    three equivalent formulations of the Gaussian dynamic texture model. First, the
    composite dynamic textures are constructed by the random aggregation of warped
    patterns, which can be viewed as 3D Gaussian fields. Second, these textures are
    cast as solutions to a stochastic partial differential equation (sPDE). This
    essential step enables real-time, on-the-fly texture synthesis using
    time-discretized autoregressive processes. It also allows for the derivation
    of a local motion-energy model, which corresponds to the log-likelihood of the
    probability density. The log-likelihoods are finally essential for the
    construction of a Bayesian inference framework. We use the model to probe speed
    perception in humans psychophysically using zoom-like changes in stimulus
    spatial frequency content. The likelihood is contained within the generative
    model and we chose a slow speed prior consistent with previous literature. We
    then validated the fitting process of the model using synthesized data. The
    human data replicates previous findings that relative perceived speed is
    positively biased by spatial frequency increments. The effect cannot be fully
    accounted for by previous models, but the current prior acting on the
    spatio-temporal likelihoods has proved necessary in accounting for the
    perceptual bias.

    RenderGAN: Generating Realistic Labeled Data

    Leon Sixt, Benjamin Wild, Tim Landgraf
    Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)

    Deep Convolutional Neuronal Networks (DCNN) are showing remarkable
    performance on many computer vision tasks. Due to their large parameter space,
    they require many labeled samples when trained in a supervised setting. The
    costs of annotating data manually can render the usage of DCNNs infeasible. We
    present a novel framework called RenderGAN that can generate large amounts of
    realistic, labeled images by combining a 3D model and the Generative
    Adversarial Network framework. In our approach, image augmentations (e.g.
    lighting, background, and detail) are learned from unlabeled data such that the
    generated images are strikingly realistic while preserving the labels known
    from the 3D model. We apply the RenderGAN framework to generate images of
    barcode-like markers that are attached to honeybees. A DCNN is trained on this
    data only. It performs better on a test set of real data than an equal DCNN
    trained on the limited amounts of real data available.

    Statistical Inverse Formulation of Optical Flow with Uncertainty Quantification

    Jie Sun, Erik Bollt
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Data Analysis, Statistics and Probability (physics.data-an)

    Optical flow refers to the visual motion observed between two consecutive
    images. Since the degree of freedom is typically much larger than the
    constraints imposed by the image observations, the straightforward formulation
    of optical flow inference is an ill-posed problem. By setting some type of
    additional “regularity” constraints, classical approaches formulate a
    well-posed optical flow inference problem in the form of a parameterized set of
    variational equations. In this work we build a mathematical connection, focused
    on optical flow methods, between classical variational optical flow approaches
    and Bayesian statistical inversion. A classical optical flow solution is in
    fact identical to a maximum a posteriori estimator under the assumptions of
    a linear model with additive independent Gaussian noise and a Gaussian prior
    distribution. Unlike classical approaches, the statistical inversion approach
    to optical flow estimation not only allows for “point” estimates, but also
    provides a distribution of solutions which can be used for ensemble estimation
    and in particular uncertainty quantification.
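
    The stated equivalence can be checked in a small linear example: with a Gaussian
    likelihood and a Gaussian prior, the MAP estimate solves exactly the same
    Tikhonov-regularized least-squares problem as the classical variational
    formulation (the toy operator below is random; it is not the paper's optical-flow
    setup).

        import numpy as np

        rng = np.random.RandomState(0)
        A = rng.randn(20, 5)   # linear observation operator
        b = rng.randn(20)      # observations
        lam = 0.5              # noise-variance / prior-variance ratio

        # classical regularized solution: argmin ||A x - b||^2 + lam * ||x||^2
        x_var = np.linalg.solve(A.T @ A + lam * np.eye(5), A.T @ b)

        # MAP estimate under b ~ N(A x, I) and x ~ N(0, I / lam), obtained by gradient
        # ascent on the log-posterior -0.5 * ||A x - b||^2 - 0.5 * lam * ||x||^2
        x = np.zeros(5)
        for _ in range(5000):
            x += 1e-2 * (-A.T @ (A @ x - b) - lam * x)

        print(np.allclose(x, x_var, atol=1e-6))  # True: the two estimates coincide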


    Artificial Intelligence

    Estimating Causal Direction and Confounding of Two Discrete Variables

    Krzysztof Chalupka, Frederick Eberhardt, Pietro Perona
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We propose a method to classify the causal relationship between two discrete
    variables given only the joint distribution of the variables, acknowledging
    that the method is subject to an inherent baseline error. We assume that the
    causal system is acyclic, but we do allow for hidden common causes. Our
    algorithm presupposes that the probability distribution \(P(C)\) of a cause \(C\)
    is independent of the probability distribution \(P(E \mid C)\) of the
    cause-effect mechanism. While our classifier is trained with a Bayesian
    assumption of flat hyperpriors, we do not make this assumption about our test
    data. This work connects to recent developments on the identifiability of
    causal models over continuous variables under the assumption of “independent
    mechanisms”. Carefully-commented Python notebooks that reproduce all our
    experiments are available online at
    this http URL

    Understanding Deep Neural Networks with Rectified Linear Units

    Raman Arora, Amitabh Basu, Poorya Mianjy, Anirbit Mukherjee
    Subjects: Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC); Machine Learning (stat.ML)

    In this paper we investigate the family of functions representable by deep
    neural networks (DNN) with rectified linear units (ReLU). We give the
    first-ever polynomial time (in the size of data) algorithm to train a ReLU DNN
    with one hidden layer to global optimality. This follows from our complete
    characterization of the ReLU DNN function class whereby we show that an
    \(\mathbb{R}^n \to \mathbb{R}\) function is representable by a ReLU DNN if and
    only if it is a continuous piecewise linear function. The main tool used to
    prove this characterization is an elegant result from tropical geometry.
    Further, for the \(n=1\) case, we show that a single hidden layer suffices to
    express all piecewise linear functions, and we give tight bounds for the size
    of such a ReLU DNN. We follow up with gap results showing that there is a
    smoothly parameterized family of \(\mathbb{R} \to \mathbb{R}\) “hard” functions
    that lead to an exponential blow-up in size if the number of layers is
    decreased by a small amount. An example consequence of our gap theorem is that
    for every natural number \(N\), there exists a function representable by a ReLU
    DNN with depth \(N^2+1\) and total size \(N^3\), such that any ReLU DNN with
    depth at most \(N+1\) will require at least \(\frac{1}{2}N^{N+1}-1\) total nodes.

    Finally, we construct a family of \(\mathbb{R}^n \to \mathbb{R}\) functions for
    \(n \geq 2\) (also smoothly parameterized), whose number of affine pieces scales
    exponentially with the dimension \(n\) at any fixed size and depth. To the best
    of our knowledge, such a construction with exponential dependence on \(n\) has
    not been achieved by previous families of “hard” functions in the neural nets
    literature.
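
    A small worked example of the depth/size trade-off described above: composing the
    one-dimensional "tent" map with itself k times yields a piecewise-linear function
    with 2^k pieces, and each tent map is itself a tiny one-hidden-layer ReLU network,
    so depth buys exponentially many linear pieces (this is the standard sawtooth
    construction, shown only as an illustration of the phenomenon).

        import numpy as np

        def tent(x):
            """Tent map on [0, 1] written as a one-hidden-layer ReLU network:
            tent(x) = 2*relu(x) - 4*relu(x - 0.5)."""
            relu = lambda z: np.maximum(z, 0.0)
            return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

        def deep_sawtooth(x, k):
            """Composing the tent map k times gives a sawtooth with 2**k linear pieces."""
            for _ in range(k):
                x = tent(x)
            return x

        xs = np.linspace(0.0, 1.0, 10001)
        ys = deep_sawtooth(xs, k=4)
        slopes = np.sign(np.diff(ys))                  # slope sign on each grid interval
        pieces = 1 + np.count_nonzero(np.diff(slopes))
        print(pieces)  # 16 == 2**4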

    Ways of Conditioning Generative Adversarial Networks

    Hanock Kwak, Byoung-Tak Zhang
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    GANs are generative models whose random samples realistically reflect
    natural images. They can also generate samples with specific attributes by
    concatenating a condition vector to the input, yet this line of research
    remains under-explored. We propose novel methods of conditioning generative
    adversarial networks (GANs) that achieve state-of-the-art results on MNIST and
    CIFAR-10. We mainly introduce two models: an information retrieving model that
    extracts conditional information from the samples, and a spatial bilinear
    pooling model that forms bilinear features derived from the spatial cross
    product of an image and a condition vector. These methods significantly enhance
    log-likelihood of test data under the conditional distributions compared to the
    methods of concatenation.
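
    The "spatial bilinear pooling" is described only at a high level; one plausible
    reading (our assumption) is to take the outer product between the condition vector
    and the feature vector at every spatial location and then average over locations,
    as sketched below.

        import numpy as np

        def spatial_bilinear_pooling(feat, cond):
            """feat: (H, W, C) feature map; cond: (D,) condition vector.
            Returns the (C, D) outer product averaged over all spatial locations."""
            H, W, C = feat.shape
            flat = feat.reshape(H * W, C)                  # one feature vector per location
            return np.einsum('nc,d->cd', flat, cond) / (H * W)

        feat = np.random.randn(8, 8, 16)
        cond = np.eye(10)[3]                               # e.g. a one-hot class condition
        print(spatial_bilinear_pooling(feat, cond).shape)  # (16, 10)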

    Learning Continuous Semantic Representations of Symbolic Expressions

    Miltiadis Allamanis, Pankajan Chanthirasegaran, Pushmeet Kohli, Charles Sutton
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    The question of how procedural knowledge is represented and inferred is a
    fundamental problem in machine learning and artificial intelligence. Recent
    work on program induction has proposed neural architectures, based on
    abstractions like stacks, Turing machines, and interpreters, that operate on
    abstract computational machines or on execution traces. But the recursive
    abstraction that is central to procedural knowledge is perhaps most naturally
    represented by symbolic representations that have syntactic structure, such as
    logical expressions and source code. Combining abstract, symbolic reasoning
    with continuous neural reasoning is a grand challenge of representation
    learning. As a step in this direction, we propose a new architecture, called
    neural equivalence networks, for the problem of learning continuous semantic
    representations of mathematical and logical expressions. These networks are
    trained to represent semantic equivalence, even of expressions that are
    syntactically very different. The challenge is that semantic representations
    must be computed in a syntax-directed manner, because semantics is
    compositional, but at the same time, small changes in syntax can lead to very
    large changes in semantics, which can be difficult for continuous neural
    architectures. We perform an exhaustive evaluation on the task of checking
    equivalence on a highly diverse class of symbolic algebraic and boolean
    expression types, showing that our model significantly outperforms existing
    architectures.


    Information Retrieval

    Learning to Rank Scientific Documents from the Crowd

    Jesse M Lingeman, Hong Yu
    Comments: 12 pages, 1 figure
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Digital Libraries (cs.DL); Learning (cs.LG); Social and Information Networks (cs.SI)

    Finding related published articles is an important task in any science, but
    with the explosion of new work in the biomedical domain it has become
    especially challenging. Most existing methodologies use text similarity metrics
    to identify whether two articles are related or not. However biomedical
    knowledge discovery is hypothesis-driven. The most related articles may not be
    ones with the highest text similarities. In this study, we first develop an
    innovative crowd-sourcing approach to build an expert-annotated
    document-ranking corpus. Using this corpus as the gold standard, we then
    evaluate the approaches of using text similarity to rank the relatedness of
    articles. Finally, we develop and evaluate a new supervised model to
    automatically rank related scientific articles. Our results show that authors’
    rankings differ significantly from rankings by text-similarity-based models. By
    training a learning-to-rank model on a subset of the annotated corpus, we found
    the best supervised learning-to-rank model (SVM-Rank) significantly surpassed
    state-of-the-art baseline systems.

    URL ordering policies for distributed crawlers: a review

    Deepika, Ashutosh Dixit
    Comments: 6 Pages, 5 figures, 1 Table, International Conference on Recent Trends in Computer and Information Technology Research September 2015
    Subjects: Information Retrieval (cs.IR)

    As the web grows in size, information spreads at an ever larger scale. Search
    engines are the medium for accessing this information, and the crawler is the
    search-engine module responsible for downloading web pages. To fetch fresh
    information and keep the database rich, the crawler should traverse the web in
    some order, known as URL ordering. URL ordering should be done efficiently and
    effectively so that the web is crawled in a proficient manner. In this paper we
    survey some existing methods of URL ordering and conclude with a comparison
    among them.

    Generalized Topic Modeling

    Avrim Blum, Nika Haghtalab
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Data Structures and Algorithms (cs.DS); Information Retrieval (cs.IR)

    Recently there has been significant activity in developing algorithms with
    provable guarantees for topic modeling. In standard topic models, a topic (such
    as sports, business, or politics) is viewed as a probability distribution
    \(\vec{a}_i\) over words, and a document is generated by first selecting a mixture
    \(\vec{w}\) over topics, and then generating words i.i.d. from the associated mixture
    \(A\vec{w}\). Given a large collection of such documents, the goal is to recover
    the topic vectors and then to correctly classify new documents according to
    their topic mixture.

    In this work we consider a broad generalization of this framework in which
    words are no longer assumed to be drawn i.i.d. and instead a topic is a complex
    distribution over sequences of paragraphs. Since one could not hope to even
    represent such a distribution in general (even if paragraphs are given using
    some natural feature representation), we aim instead to directly learn a
    document classifier. That is, we aim to learn a predictor that given a new
    document, accurately predicts its topic mixture, without learning the
    distributions explicitly. We present several natural conditions under which one
    can do this efficiently and discuss issues such as noise tolerance and sample
    complexity in this model. More generally, our model can be viewed as a
    generalization of the multi-view or co-training setting in machine learning.
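
    For reference, the standard generative process recalled in the first paragraph
    can be written in a few lines (a textbook sketch of the classical model, not the
    generalized setting the paper actually studies).

        import numpy as np

        def generate_document(A, alpha, doc_len, rng):
            """Standard topic model: A is a (vocab x topics) matrix whose columns are the
            topic distributions a_i; a document draws a topic mixture w ~ Dirichlet(alpha)
            and samples its words i.i.d. from the mixture A @ w."""
            w = rng.dirichlet(alpha)          # mixture over topics
            word_dist = A @ w                 # resulting distribution over the vocabulary
            return rng.choice(len(word_dist), size=doc_len, p=word_dist), w

        rng = np.random.RandomState(0)
        vocab, topics = 1000, 5
        A = rng.dirichlet(np.ones(vocab), size=topics).T   # each column sums to 1
        words, w = generate_document(A, alpha=np.ones(topics), doc_len=50, rng=rng)
        print(words[:10], w.round(2))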


    Computation and Language

    Sequence to Sequence Transduction with Hard Monotonic Attention

    Roee Aharoni, Yoav Goldberg
    Comments: Under review as a conference paper at ICLR 2017
    Subjects: Computation and Language (cs.CL)

    We present a supervised sequence to sequence transduction model with a hard
    attention mechanism which combines the more traditional statistical alignment
    methods with the power of recurrent neural networks. We evaluate the model on
    the task of morphological inflection generation and show that it provides state
    of the art results in various setups compared to the previous neural and
    non-neural approaches. Eventually we present an analysis of the learned
    representations for both hard and soft attention models, shedding light on the
    features such models extract in order to solve the task.

    Learning Recurrent Span Representations for Extractive Question Answering

    Kenton Lee, Tom Kwiatkowski, Ankur Parikh, Dipanjan Das
    Subjects: Computation and Language (cs.CL)

    The reading comprehension task, which asks questions about a given evidence
    document, is a central problem in natural language understanding. Recent
    formulations of this task have typically focused on answer selection from a set
    of candidates pre-defined manually or through the use of an external NLP
    pipeline. However, Rajpurkar et al. (2016) recently released the SQuAD dataset
    in which the answers can be arbitrary strings from the supplied text. In this
    paper, we focus on this answer extraction task, presenting a novel model
    architecture that efficiently builds fixed length representations of all spans
    in the evidence document with a recurrent network. We show that scoring
    explicit span representations significantly improves performance over other
    approaches that factor the prediction into separate predictions about words or
    start and end markers. Our approach improves upon the best published results of
    Wang & Jiang (2016) by 5% and decreases the error of Rajpurkar et al.’s
    baseline by > 50%.
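
    One way to read "fixed length representations of all spans ... with a recurrent
    network" (our interpretation, not necessarily the paper's exact architecture) is
    to run an encoder over the document and represent each candidate span by the
    concatenation of the hidden states at its two endpoints, as sketched below.

        import numpy as np

        def all_span_representations(h, max_len):
            """h: (T, d) per-token encoder states. Returns a dict mapping each span
            (i, j) with j - i < max_len to the fixed-length vector [h_i ; h_j]."""
            T, d = h.shape
            spans = {}
            for i in range(T):
                for j in range(i, min(T, i + max_len)):
                    spans[(i, j)] = np.concatenate([h[i], h[j]])
            return spans

        T, d = 30, 16
        h = np.random.randn(T, d)               # stand-in for bidirectional RNN outputs
        spans = all_span_representations(h, max_len=10)
        print(len(spans), spans[(3, 7)].shape)  # number of candidate spans, (32,)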

    Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies

    Tal Linzen, Emmanuel Dupoux, Yoav Goldberg
    Comments: 15 pages; to appear in Transactions of the Association for Computational Linguistics
    Subjects: Computation and Language (cs.CL)

    The success of long short-term memory (LSTM) neural networks in language
    processing is typically attributed to their ability to capture long-distance
    statistical regularities. Linguistic regularities are often sensitive to
    syntactic structure; can such dependencies be captured by LSTMs, which do not
    have explicit structural representations? We begin addressing this question
    using number agreement in English subject-verb dependencies. We probe the
    architecture’s grammatical competence both using training objectives with an
    explicit grammatical target (number prediction, grammaticality judgments) and
    using language models. In the strongly supervised settings, the LSTM achieved
    very high overall accuracy (less than 1% errors), but errors increased when
    sequential and structural information conflicted. The frequency of such errors
    rose sharply in the language-modeling setting. We conclude that LSTMs can
    capture a non-trivial amount of grammatical structure given targeted
    supervision, but stronger architectures may be required to further reduce
    errors; furthermore, the language modeling signal is insufficient for capturing
    syntax-sensitive dependencies, and should be supplemented with more direct
    supervision if such dependencies need to be captured.

    Answering Complicated Question Intents Expressed in Decomposed Question Sequences

    Mohit Iyyer, Wen-tau Yih, Ming-Wei Chang
    Subjects: Computation and Language (cs.CL)

    Recent work in semantic parsing for question answering has focused on long
    and complicated questions, many of which would seem unnatural if asked in a
    normal conversation between two humans. In an effort to explore a
    conversational QA setting, we present a more realistic task: answering
    sequences of simple but inter-related questions. We collect a dataset of 6,066
    question sequences that inquire about semi-structured tables from Wikipedia,
    with 17,553 question-answer pairs in total. Existing QA systems face two major
    problems when evaluated on our dataset: (1) handling questions that contain
    coreferences to previous questions or answers, and (2) matching words or
    phrases in a question to corresponding entries in the associated table. We
    conclude by proposing strategies to handle both of these issues.

    Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling

    Hakan Inan, Khashayar Khosravi, Richard Socher
    Comments: 10 pages, 2 figures, 3 tables
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)

    Recurrent neural networks have been very successful at predicting sequences
    of words in tasks such as language modeling. However, all such models are based
    on the conventional classification framework, where the model is trained against
    one-hot targets, and each word is represented both as an input and as an output
    in isolation. This causes inefficiencies in learning both in terms of utilizing
    all of the information and in terms of the number of parameters needed to
    train. We introduce a novel theoretical framework that facilitates better
    learning in language modeling, and show that our framework leads to tying
    together the input embedding and the output projection matrices, greatly
    reducing the number of trainable variables. Our LSTM model lowers the state of
    the art word-level perplexity on the Penn Treebank to 68.5.
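
    The key structural consequence, tying the input embedding to the output
    projection, can be illustrated without the full LSTM: below, the same matrix E is
    used both to look up the input word and (transposed) to produce the output logits
    (a schematic sketch; the paper derives the tying from its loss framework rather
    than imposing it).

        import numpy as np

        rng = np.random.RandomState(0)
        vocab, emb = 10000, 128
        E = 0.01 * rng.randn(vocab, emb)   # shared embedding / output projection matrix
        b = np.zeros(vocab)

        x = E[42]                          # input embedding of the current word (id 42)
        h = np.tanh(x)                     # stand-in for the recurrent hidden state
        logits = h @ E.T + b               # tied output projection reuses E, transposed
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        print(probs.shape)                 # (10000,): next-word distribution

        # An untied model would learn a separate (emb x vocab) output matrix instead,
        # roughly doubling the parameters devoted to the vocabulary.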

    Learning to Rank Scientific Documents from the Crowd

    Jesse M Lingeman, Hong Yu
    Comments: 12 pages, 1 figure
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Digital Libraries (cs.DL); Learning (cs.LG); Social and Information Networks (cs.SI)

    Finding related published articles is an important task in any science, but
    with the explosion of new work in the biomedical domain it has become
    especially challenging. Most existing methodologies use text similarity metrics
    to identify whether two articles are related or not. However biomedical
    knowledge discovery is hypothesis-driven. The most related articles may not be
    ones with the highest text similarities. In this study, we first develop an
    innovative crowd-sourcing approach to build an expert-annotated
    document-ranking corpus. Using this corpus as the gold standard, we then
    evaluate the approaches of using text similarity to rank the relatedness of
    articles. Finally, we develop and evaluate a new supervised model to
    automatically rank related scientific articles. Our results show that authors’
    rankings differ significantly from rankings by text-similarity-based models. By
    training a learning-to-rank model on a subset of the annotated corpus, we found
    the best supervised learning-to-rank model (SVM-Rank) significantly surpassed
    state-of-the-art baseline systems.

    Generalized Topic Modeling

    Avrim Blum, Nika Haghtalab
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Data Structures and Algorithms (cs.DS); Information Retrieval (cs.IR)

    Recently there has been significant activity in developing algorithms with
    provable guarantees for topic modeling. In standard topic models, a topic (such
    as sports, business, or politics) is viewed as a probability distribution
    \(\vec{a}_i\) over words, and a document is generated by first selecting a mixture
    \(\vec{w}\) over topics, and then generating words i.i.d. from the associated mixture
    \(A\vec{w}\). Given a large collection of such documents, the goal is to recover
    the topic vectors and then to correctly classify new documents according to
    their topic mixture.

    In this work we consider a broad generalization of this framework in which
    words are no longer assumed to be drawn i.i.d. and instead a topic is a complex
    distribution over sequences of paragraphs. Since one could not hope to even
    represent such a distribution in general (even if paragraphs are given using
    some natural feature representation), we aim instead to directly learn a
    document classifier. That is, we aim to learn a predictor that given a new
    document, accurately predicts its topic mixture, without learning the
    distributions explicitly. We present several natural conditions under which one
    can do this efficiently and discuss issues such as noise tolerance and sample
    complexity in this model. More generally, our model can be viewed as a
    generalization of the multi-view or co-training setting in machine learning.


    Distributed, Parallel, and Cluster Computing

    Multi-level Simulation of Internet of Things on Smart Territories

    Gabriele D'Angelo, Stefano Ferretti, Vittorio Ghini
    Comments: To appear in Simulation Modelling Practice and Theory, Elsevier
    Subjects: Performance (cs.PF); Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA); Networking and Internet Architecture (cs.NI)

    In this paper, a methodology is presented and employed for simulating the
    Internet of Things (IoT). The requirement for scalability, due to the possibly
    huge amount of involved sensors and devices, and the heterogeneous scenarios
    that might occur, impose resorting to sophisticated modeling and simulation
    techniques. In particular, multi-level simulation is regarded as a main
    framework that allows simulating large-scale IoT environments while keeping
    high levels of detail, when it is needed. We consider a use case based on the
    deployment of smart services in decentralized territories. A two-level
    simulator is employed, which is based on a coarse-grained agent-based, adaptive
    parallel and distributed simulation approach to model the general life of
    simulated entities. However, when needed, a finer-grained simulator (based on
    OMNeT++) is triggered on a restricted portion of the simulated area, which
    allows considering all issues concerned with wireless communications. Based on
    this use case, it is confirmed that ad-hoc wireless networking technologies
    are a principal tool for deploying smart services over decentralized
    countrysides. Moreover, the performance evaluation confirms the viability of
    utilizing multi-level simulation for simulating large scale IoT environments.


    Learning

    Improving Stochastic Gradient Descent with Feedback

    Jayanth Koushik, Hiroaki Hayashi
    Comments: ICLR 2017 Submission
    Subjects: Learning (cs.LG)

    In this paper we propose a simple and efficient method for improving
    stochastic gradient descent methods by using feedback from the objective
    function. The method tracks the relative changes in the objective function with
    a running average, and uses it to adaptively tune the learning rate in
    stochastic gradient descent. We specifically apply this idea to modify Adam, a
    popular algorithm for training deep neural networks. We conduct experiments to
    compare the resulting algorithm, which we call Eve, with state of the art
    methods used for training deep learning models. We train CNNs for image
    classification, and RNNs for language modeling and question answering. Our
    experiments show that Eve outperforms all other algorithms on these benchmark
    tasks. We also analyze the behavior of the feedback mechanism during the
    training process.
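
    A rough sketch of the feedback loop described above, applied to plain SGD for
    brevity (the smoothing coefficient, clipping, and update form are our guesses; the
    actual Eve algorithm modifies Adam and differs in detail):

        import numpy as np

        def sgd_with_feedback(grad_fn, loss_fn, x0, lr=0.1, beta=0.9, steps=200):
            """SGD whose step size is divided by d, a running estimate of how quickly
            the objective is changing relative to its previous value."""
            x, d, prev_loss = x0.astype(float), 1.0, loss_fn(x0)
            for _ in range(steps):
                loss = loss_fn(x)
                rel_change = abs(loss - prev_loss) / max(min(loss, prev_loss), 1e-12)
                d = beta * d + (1 - beta) * (1.0 + np.clip(rel_change, 0.0, 10.0))
                x = x - (lr / d) * grad_fn(x)
                prev_loss = loss
            return x

        # toy quadratic: minimize ||x||^2
        x_star = sgd_with_feedback(grad_fn=lambda x: 2 * x,
                                   loss_fn=lambda x: float(np.sum(x ** 2)),
                                   x0=np.ones(5))
        print(np.linalg.norm(x_star))  # close to 0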

    Protein Secondary Structure Prediction Using Deep Multi-scale Convolutional Neural Networks and Next-Step Conditioning

    Akosua Busia, Jasmine Collins, Navdeep Jaitly
    Comments: 10 pages, 2 figures, submitted to RECOMB 2017
    Subjects: Learning (cs.LG); Biomolecules (q-bio.BM)

    Recently developed deep learning techniques have significantly improved the
    accuracy of various speech and image recognition systems. In this paper we
    adapt some of these techniques for protein secondary structure prediction. We
    first train a series of deep neural networks to predict eight-class secondary
    structure labels given a protein’s amino acid sequence information and find
    that using recent methods for regularization, such as dropout and weight-norm
    constraining, leads to measurable gains in accuracy. We then adapt recent
    convolutional neural network architectures–Inception, ResNet, and DenseNet
    with Batch Normalization–to the problem of protein structure prediction. These
    convolutional architectures make heavy use of multi-scale filter layers that
    simultaneously compute features on several scales, and use residual connections
    to prevent underfitting. Using a carefully modified version of these
    architectures, we achieve state-of-the-art performance of 70.0% per amino acid
    accuracy on the public CB513 benchmark dataset. Finally, we explore additions
    from sequence-to-sequence learning, altering the model to make its predictions
    conditioned on both the protein’s amino acid sequence and its past secondary
    structure labels. We introduce a new method of ensembling such a conditional
    model with our convolutional model, an approach which reaches 70.6% Q8 accuracy
    on CB513. We argue that these results can be further refined for larger boosts
    in prediction accuracy through more sophisticated attempts to control
    overfitting of conditional models. We aim to release the code for these
    experiments as part of the TensorFlow repository.

    Understanding Deep Neural Networks with Rectified Linear Units

    Raman Arora, Amitabh Basu, Poorya Mianjy, Anirbit Mukherjee
    Subjects: Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC); Machine Learning (stat.ML)

    In this paper we investigate the family of functions representable by deep
    neural networks (DNN) with rectified linear units (ReLU). We give the
    first-ever polynomial time (in the size of data) algorithm to train a ReLU DNN
    with one hidden layer to global optimality. This follows from our complete
    characterization of the ReLU DNN function class whereby we show that an
    \(\mathbb{R}^n \to \mathbb{R}\) function is representable by a ReLU DNN if and
    only if it is a continuous piecewise linear function. The main tool used to
    prove this characterization is an elegant result from tropical geometry.
    Further, for the \(n=1\) case, we show that a single hidden layer suffices to
    express all piecewise linear functions, and we give tight bounds for the size
    of such a ReLU DNN. We follow up with gap results showing that there is a
    smoothly parameterized family of \(\mathbb{R} \to \mathbb{R}\) “hard” functions
    that lead to an exponential blow-up in size if the number of layers is
    decreased by a small amount. An example consequence of our gap theorem is that
    for every natural number \(N\), there exists a function representable by a ReLU
    DNN with depth \(N^2+1\) and total size \(N^3\), such that any ReLU DNN with
    depth at most \(N+1\) will require at least \(\frac{1}{2}N^{N+1}-1\) total nodes.

    Finally, we construct a family of \(\mathbb{R}^n \to \mathbb{R}\) functions for
    \(n \geq 2\) (also smoothly parameterized), whose number of affine pieces scales
    exponentially with the dimension \(n\) at any fixed size and depth. To the best
    of our knowledge, such a construction with exponential dependence on \(n\) has
    not been achieved by previous families of “hard” functions in the neural nets
    literature.

    Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling

    Hakan Inan, Khashayar Khosravi, Richard Socher
    Comments: 10 pages, 2 figures, 3 tables
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)

    Recurrent neural networks have been very successful at predicting sequences
    of words in tasks such as language modeling. However, all such models are based
    on the conventional classification framework, where the model is trained against
    one-hot targets, and each word is represented both as an input and as an output
    in isolation. This causes inefficiencies in learning both in terms of utilizing
    all of the information and in terms of the number of parameters needed to
    train. We introduce a novel theoretical framework that facilitates better
    learning in language modeling, and show that our framework leads to tying
    together the input embedding and the output projection matrices, greatly
    reducing the number of trainable variables. Our LSTM model lowers the state of
    the art word-level perplexity on the Penn Treebank to 68.5.

    Multi-task learning with deep model based reinforcement learning

    Asier Mujika
    Subjects: Learning (cs.LG)

    In recent years, model-free methods that use deep learning have achieved
    great success in many different reinforcement learning environments. Most
    successful approaches focus on solving a single task, while multi-task
    reinforcement learning remains an open problem. In this paper, we present a
    model based approach to deep reinforcement learning which we use to solve
    different tasks simultaneously. We show that our approach not only does not
    degrade but actually benefits from learning multiple tasks. For our model, we
    also present a new kind of recurrent neural network inspired by residual
    networks that decouples memory from computation, allowing it to model complex
    environments that do not require lots of memory. The code will be released
    before ICLR 2017.

    Learning heat diffusion graphs

    Dorina Thanou, Xiaowen Dong, Daniel Kressner, Pascal Frossard
    Subjects: Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

    Effective information analysis generally boils down to properly identifying
    the structure or geometry of the data, which is often represented by a graph.
    In some applications, this structure may be partly determined by design
    constraints or pre-determined sensing arrangements, like in road transportation
    networks for example. In general though, the data structure is not readily
    available and becomes pretty difficult to define. In particular, the global
    smoothness assumptions that most of the existing works adopt are often too
    general and unable to properly capture localized properties of data. In this
    paper, we go beyond this classical data model and rather propose to represent
    information as a sparse combination of localized functions that live on a data
    structure represented by a graph. Based on this model, we focus on the problem
    of inferring the connectivity that best explains the data samples at different
    vertices of a graph that is a priori unknown. We concentrate on the case where
    the observed data is actually the sum of heat diffusion processes, which is a
    quite common model for data on networks or other irregular structures. We cast
    a new graph learning problem and solve it with an efficient nonconvex
    optimization algorithm. Experiments on both synthetic and real world data
    finally illustrate the benefits of the proposed graph learning framework and
    confirm that the data structure can be efficiently learned from data
    observations only. We believe that our algorithm will help solving key
    questions in diverse application domains such as social and biological network
    analysis where it is crucial to unveil proper geometry for data understanding
    and inference.
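
    The signal model referred to above can be written compactly: each observation is
    x = sum_s exp(-tau_s * L) h_s, where L is the graph Laplacian and each h_s is a
    sparse vector of initial heat. The snippet below generates such data on a known
    graph (forward model only; the paper's contribution, inferring the unknown graph,
    is not shown).

        import numpy as np
        from scipy.linalg import expm

        rng = np.random.RandomState(0)
        n = 20
        A = (rng.rand(n, n) < 0.2).astype(float)
        A = np.triu(A, 1)
        A = A + A.T                               # symmetric adjacency, no self-loops
        L = np.diag(A.sum(axis=1)) - A            # combinatorial graph Laplacian

        taus = [1.0, 3.0]                         # diffusion times of two heat processes
        h = np.zeros((len(taus), n))
        for s in range(len(taus)):
            h[s, rng.choice(n, size=2, replace=False)] = 1.0   # sparse initial heat

        x = sum(expm(-tau * L) @ h[s] for s, tau in enumerate(taus))   # observed signal
        print(x.shape, float(x.sum()))            # total heat is conserved (sums to 4.0)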

    Ways of Conditioning Generative Adversarial Networks

    Hanock Kwak, Byoung-Tak Zhang
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    GANs are generative models whose random samples realistically reflect
    natural images. They can also generate samples with specific attributes by
    concatenating a condition vector to the input, yet this line of research
    remains under-explored. We propose novel methods of conditioning generative
    adversarial networks (GANs) that achieve state-of-the-art results on MNIST and
    CIFAR-10. We mainly introduce two models: an information retrieving model that
    extracts conditional information from the samples, and a spatial bilinear
    pooling model that forms bilinear features derived from the spatial cross
    product of an image and a condition vector. These methods significantly enhance
    log-likelihood of test data under the conditional distributions compared to the
    methods of concatenation.

    Semi-supervised deep learning by metric embedding

    Elad Hoffer, Nir Ailon
    Subjects: Learning (cs.LG)

    Deep networks are successfully used as classification models yielding
    state-of-the-art results when trained on a large number of labeled samples.
    These models, however, are usually much less suited for semi-supervised
    problems because of their tendency to overfit easily when trained on small
    amounts of data. In this work we explore a new training objective that targets
    a semi-supervised regime with only a small subset of labeled data.
    This criterion is based on a deep metric embedding over distance relations
    within the set of labeled samples, together with constraints over the
    embeddings of the unlabeled set. The final learned representations are
    discriminative in euclidean space, and hence can be used with subsequent
    nearest-neighbor classification using the labeled samples.
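
    The final classification step described above, nearest-neighbor assignment among
    the labeled samples in the learned Euclidean embedding, is simple enough to sketch
    directly (the embedding below is a random projection placeholder; learning it with
    the proposed criterion is the paper's actual contribution).

        import numpy as np

        def nearest_neighbor_classify(embed, x_query, x_labeled, y_labeled):
            """Label each query with the class of its nearest labeled sample in
            Euclidean embedding space."""
            e_q, e_l = embed(x_query), embed(x_labeled)
            d2 = ((e_q[:, None, :] - e_l[None, :, :]) ** 2).sum(-1)
            return y_labeled[np.argmin(d2, axis=1)]

        rng = np.random.RandomState(0)
        P = rng.randn(50, 8)
        embed = lambda x: x @ P                   # placeholder for the learned deep embedding

        x_labeled = rng.randn(100, 50)
        y_labeled = rng.randint(0, 10, size=100)
        x_query = x_labeled[:5] + 0.01 * rng.randn(5, 50)
        print(nearest_neighbor_classify(embed, x_query, x_labeled, y_labeled))  # == y_labeled[:5]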

    Learning Continuous Semantic Representations of Symbolic Expressions

    Miltiadis Allamanis, Pankajan Chanthirasegaran, Pushmeet Kohli, Charles Sutton
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    The question of how procedural knowledge is represented and inferred is a
    fundamental problem in machine learning and artificial intelligence. Recent
    work on program induction has proposed neural architectures, based on
    abstractions like stacks, Turing machines, and interpreters, that operate on
    abstract computational machines or on execution traces. But the recursive
    abstraction that is central to procedural knowledge is perhaps most naturally
    represented by symbolic representations that have syntactic structure, such as
    logical expressions and source code. Combining abstract, symbolic reasoning
    with continuous neural reasoning is a grand challenge of representation
    learning. As a step in this direction, we propose a new architecture, called
    neural equivalence networks, for the problem of learning continuous semantic
    representations of mathematical and logical expressions. These networks are
    trained to represent semantic equivalence, even of expressions that are
    syntactically very different. The challenge is that semantic representations
    must be computed in a syntax-directed manner, because semantics is
    compositional, but at the same time, small changes in syntax can lead to very
    large changes in semantics, which can be difficult for continuous neural
    architectures. We perform an exhaustive evaluation on the task of checking
    equivalence on a highly diverse class of symbolic algebraic and boolean
    expression types, showing that our model significantly outperforms existing
    architectures.
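
    The syntax-directed, compositional computation described above follows the usual
    bottom-up pattern of recursive tree encoders; the sketch below shows only that
    generic pattern (the embedding and combination functions stand in for the paper's
    neural equivalence network components, which are not detailed in the abstract):

        def encode_expression(node, embed, combine):
            # node: ('x',) leaf, or (op, left, right) internal node of an expression tree.
            # embed(symbol) -> vector; combine(op, left_vec, right_vec) -> vector.
            if len(node) == 1:
                return embed(node[0])
            op, left, right = node
            return combine(op,
                           encode_expression(left, embed, combine),
                           encode_expression(right, embed, combine))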

    A Communication-Efficient Parallel Algorithm for Decision Tree

    Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, Tie-Yan Liu
    Subjects: Learning (cs.LG)

    Decision tree (and its extensions such as Gradient Boosting Decision Trees
    and Random Forest) is a widely used machine learning algorithm, due to its
    practical effectiveness and model interpretability. With the emergence of big
    data, there is an increasing need to parallelize the training process of
    decision tree. However, most existing attempts along this line suffer from high
    communication costs. In this paper, we propose a new algorithm, called
    Parallel Voting Decision Tree (PV-Tree), to tackle this challenge. After
    partitioning the training data onto a number of (e.g., (M)) machines, this
    algorithm performs both local voting and global voting in each iteration. For
    local voting, the top-(k) attributes are selected from each machine according
    to its local data. Then, globally top-(2k) attributes are determined by a
    majority voting among these local candidates. Finally, the full-grained
    histograms of the globally top-(2k) attributes are collected from local
    machines in order to identify the best (most informative) attribute and its
    split point. PV-Tree can achieve a very low communication cost (independent of
    the total number of attributes) and thus can scale out very well. Furthermore,
    theoretical analysis shows that this algorithm can learn a near optimal
    decision tree, since it can find the best attribute with a large probability.
    Our experiments on real-world datasets show that PV-Tree significantly
    outperforms the existing parallel decision tree algorithms in the trade-off
    between accuracy and efficiency.
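
    A simplified sketch of the two voting stages (ignoring the full-grained histogram
    merging that the real algorithm performs on the globally selected attributes, and
    using each machine's local split gains as a stand-in for that merge):

        import numpy as np

        def pv_tree_select(local_gains, k):
            # local_gains: (M, A) split gain of each of A attributes on each of M machines.
            M, A = local_gains.shape
            # Local voting: every machine nominates its top-k attributes.
            local_topk = np.argsort(-local_gains, axis=1)[:, :k]
            # Global voting: keep the 2k attributes with the most nominations.
            votes = np.bincount(local_topk.ravel(), minlength=A)
            candidates = np.argsort(-votes)[:2 * k]
            # PV-Tree would now merge histograms of these candidates across machines;
            # summing local gains is a placeholder for that step.
            merged = local_gains[:, candidates].sum(axis=0)
            return candidates[np.argmax(merged)]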

    Semantic Noise Modeling for Better Representation Learning

    Hyo-Eun Kim, Sangheum Hwang, Kyunghyun Cho
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Latent representations learned by multi-layered neural networks via
    hierarchical feature abstraction underlie the recent success of deep learning. Under
    the deep learning framework, generalization performance highly depends on the
    learned latent representation which is obtained from an appropriate training
    scenario with a task-specific objective on a designed network model. In this
    work, we propose a novel latent space modeling method to learn better latent
    representation. We designed a neural network model based on the assumption that
    a good base representation can be attained by maximizing the total correlation
    between the input, latent, and output variables. From the base model, we
    introduce a semantic noise modeling method which enables class-conditional
    perturbation on the latent space to enhance the representational power of the
    learned latent features. During training, the latent vector representation can be
    stochastically perturbed by a modeled class-conditional additive noise while
    maintaining its original semantic features. This implicitly brings the effect of
    semantic augmentation on the latent space. The proposed model can be easily
    learned by back-propagation with common gradient-based optimization algorithms.
    Experimental results show that the proposed method helps to achieve performance
    benefits against various previous approaches. We also provide the empirical
    analyses for the proposed class-conditional perturbation process including
    t-SNE visualization.
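
    As a bare sketch of the class-conditional perturbation step (the actual noise
    model and how it preserves the semantic feature are specified in the paper, so
    the per-class Gaussian scale below is only an assumption):

        import numpy as np

        def perturb_latent(z, y, class_noise_std, rng=np.random.default_rng()):
            # z: (d,) latent vector; y: integer class label;
            # class_noise_std: (num_classes, d) hypothetical per-class noise scales.
            return z + rng.normal(0.0, class_noise_std[y])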

    Generalized Topic Modeling

    Avrim Blum, Nika Haghtalab
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Data Structures and Algorithms (cs.DS); Information Retrieval (cs.IR)

    Recently there has been significant activity in developing algorithms with
    provable guarantees for topic modeling. In standard topic models, a topic (such
    as sports, business, or politics) is viewed as a probability distribution (vec a_i)
    over words, and a document is generated by first selecting a mixture (vec w) over
    topics, and then generating words i.i.d. from the associated mixture (A vec w).
    Given a large collection of such documents, the goal is to recover
    the topic vectors and then to correctly classify new documents according to
    their topic mixture.

    In this work we consider a broad generalization of this framework in which
    words are no longer assumed to be drawn i.i.d. and instead a topic is a complex
    distribution over sequences of paragraphs. Since one could not hope to even
    represent such a distribution in general (even if paragraphs are given using
    some natural feature representation), we aim instead to directly learn a
    document classifier. That is, we aim to learn a predictor that given a new
    document, accurately predicts its topic mixture, without learning the
    distributions explicitly. We present several natural conditions under which one
    can do this efficiently and discuss issues such as noise tolerance and sample
    complexity in this model. More generally, our model can be viewed as a
    generalization of the multi-view or co-training setting in machine learning.

    Sample Efficient Actor-Critic with Experience Replay

    Ziyu Wang, Victor Bapst, Nicolas Heess, Volodymyr Mnih, Remi Munos, Koray Kavukcuoglu, Nando de Freitas
    Comments: 20 pages. Prepared for ICLR 2017
    Subjects: Learning (cs.LG)

    This paper presents an actor-critic deep reinforcement learning agent with
    experience replay that is stable, sample efficient, and performs remarkably
    well on challenging environments, including the discrete 57-game Atari domain
    and several continuous control problems. To achieve this, the paper introduces
    several innovations, including truncated importance sampling with bias
    correction, stochastic dueling network architectures, and a new trust region
    policy optimization method.
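
    The truncated importance sampling with bias correction mentioned above follows the
    general off-policy pattern of splitting an importance weight into a clipped part
    and a correction term; the sketch below shows that split in isolation (the constant
    and where each term is applied are assumptions, not the paper's exact estimator):

        import numpy as np

        def truncated_importance_weights(pi_new, pi_old, c=10.0):
            # pi_new, pi_old: probabilities of the taken actions under the new / behavior policy.
            rho = pi_new / pi_old
            rho_clipped = np.minimum(c, rho)              # bounded weight controls variance
            correction = np.maximum(0.0, 1.0 - c / rho)   # weight on the bias-correction term
            return rho_clipped, correction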

    Combating Reinforcement Learning's Sisyphean Curse with Intrinsic Fear

    Zachary C. Lipton, Jianfeng Gao, Lihong Li, Jianshu Chen, Li Deng
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    To use deep reinforcement learning in the wild, we might hope for an agent
    that would never make catastrophic mistakes. At the very least, we could hope
    that an agent would eventually learn to avoid old mistakes. Unfortunately, even
    in simple environments, modern deep reinforcement learning techniques are
    doomed by a Sisyphean curse. Owing to the use of function approximation, these
    agents eventually forget experiences as they become exceedingly unlikely under
    a new policy. Consequently, for as long as they continue to train,
    state-aggregating agents may periodically relive catastrophic mistakes. We
    demonstrate unacceptable performance of deep Q-networks on two toy problems. We
    then introduce intrinsic fear, a method that mitigates these problems by
    avoiding dangerous states.
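
    One natural reading of intrinsic fear is to penalize the learning target in
    proportion to how dangerous an auxiliary classifier believes the next state to be;
    the target below is an illustrative sketch, not the paper's exact formulation:

        def fear_adjusted_target(reward, gamma, max_q_next, danger_prob, fear_coeff):
            # danger_prob: predicted probability that the next state is near a catastrophe,
            # produced by a separately trained "fear" model; fear_coeff scales the penalty.
            return reward + gamma * max_q_next - fear_coeff * danger_prob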

    PrivLogit: Efficient Privacy-preserving Logistic Regression by Tailoring Numerical Optimizers

    Wei Xie, Yang Wang, Steven M. Boker, Donald E. Brown
    Comments: 24 pages, 4 figures. Work done and circulated since 2015
    Subjects: Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

    Safeguarding privacy in machine learning is highly desirable, especially in
    collaborative studies across many organizations. Privacy-preserving distributed
    machine learning (based on cryptography) is popular to solve the problem.
    However, existing cryptographic protocols still incur excess computational
    overhead. Here, we make a novel observation that this is partially due to naive
    adoption of mainstream numerical optimization (e.g., Newton's method) and a failure
    to tailor it for secure computing. This work presents a contrasting perspective:
    customizing numerical optimization specifically for secure settings. We propose
    a seemingly less-favorable optimization method that can in fact significantly
    accelerate privacy-preserving logistic regression. Leveraging this new method,
    we propose two new secure protocols for conducting logistic regression in a
    privacy-preserving and distributed manner. Extensive theoretical and empirical
    evaluations prove the competitive performance of our two secure proposals
    without compromising accuracy or privacy: with speedups of up to 2.3x and 8.1x,
    respectively, over state-of-the-art; and even faster as data scales up. Such
    drastic speedup is on top of and in addition to performance improvements from
    existing (and future) state-of-the-art cryptography. Our work provides a new
    way towards efficient and practical privacy-preserving logistic regression for
    large-scale studies which are common for modern science.

    Estimating Causal Direction and Confounding of Two Discrete Variables

    Krzysztof Chalupka, Frederick Eberhardt, Pietro Perona
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We propose a method to classify the causal relationship between two discrete
    variables given only the joint distribution of the variables, acknowledging
    that the method is subject to an inherent baseline error. We assume that the
    causal system is acyclic, but we do allow for hidden common causes. Our
    algorithm presupposes that the probability distribution (P(C)) of a cause (C)
    is independent of the probability distribution (P(E | C)) of the
    cause-effect mechanism. While our classifier is trained with a Bayesian
    assumption of flat hyperpriors, we do not make this assumption about our test
    data. This work connects to recent developments on the identifiability of
    causal models over continuous variables under the assumption of “independent
    mechanisms”. Carefully-commented Python notebooks that reproduce all our
    experiments are available online at
    this http URL

    Sparsely-Connected Neural Networks: Towards Efficient VLSI Implementation of Deep Neural Networks

    Arash Ardakani, Carlo Condo, Warren J. Gross
    Comments: 13 pages, 3 figures
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Recently deep neural networks have received considerable attention due to
    their ability to extract and represent high-level abstractions in data sets.
    Deep neural networks such as fully-connected and convolutional neural networks
    have shown excellent performance on a wide range of recognition and
    classification tasks. However, their hardware implementations currently suffer
    from large silicon area and high power consumption due to their high degree
    of complexity. The power/energy consumption of neural networks is dominated by
    memory accesses, the majority of which occur in fully-connected networks. In
    fact, they contain most of the deep neural network parameters. In this paper,
    we propose sparsely-connected networks, by showing that the number of
    connections in fully-connected networks can be reduced by up to 90% while
    improving the accuracy performance on three popular datasets (MNIST, CIFAR10
    and SVHN). We then propose an efficient hardware architecture based on
    linear-feedback shift registers to reduce the memory requirements of the
    proposed sparsely-connected networks. The proposed architecture can save up to
    90% of memory compared to the conventional implementations of fully-connected
    neural networks. Moreover, implementation results show up to 84% reduction in
    the energy consumption of a single neuron of the proposed sparsely-connected
    networks compared to a single neuron of fully-connected neural networks.
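
    The appeal of a linear-feedback shift register here is that a pseudo-random
    connectivity pattern can be regenerated on chip instead of being stored; a software
    sketch of deriving a sparse mask from a 16-bit Fibonacci LFSR (tap positions and
    the thresholding rule are illustrative choices, not the paper's architecture):

        def lfsr_mask(rows, cols, seed=0xACE1, taps=(16, 14, 13, 11), density=0.1):
            # Produce a rows-by-cols binary mask; a 1 keeps the connection, a 0 prunes it.
            state = seed  # any non-zero 16-bit seed
            threshold = int(density * 0xFFFF)

            def step():
                nonlocal state
                bit = 0
                for t in taps:                       # XOR the tapped bits (Fibonacci form)
                    bit ^= (state >> (t - 1)) & 1
                state = ((state << 1) | bit) & 0xFFFF
                return state

            return [[1 if step() < threshold else 0 for _ in range(cols)]
                    for _ in range(rows)]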

    Information-Theoretic Bounds and Approximations in Neural Population Coding

    Wentao Huang, Kechen Zhang
    Subjects: Information Theory (cs.IT); Learning (cs.LG)

    Information theory is a powerful tool for neuroscience and other disciplines.
    Efficient calculation of Shannon’s mutual information (MI) is a key
    computational step that often presents the biggest difficulty for practical
    applications. In this paper, we propose effective approximation methods for
    evaluating MI in the context of neural population coding, especially for
    high-dimensional inputs. We prove several information-theoretic asymptotic
    bounds and approximation formulas for large size neural populations. We also
    discuss how optimization of neural population coding based on these
    approximation formulas leads to a convex problem about the population density
    distribution in neural population parameter space. Several useful techniques,
    including variable transformation and dimensionality reduction, are proposed
    for more efficient computation of the approximations. Our numerical simulation
    results show that our asymptotic formulas are highly accurate for approximating
    MI in neural populations. For some special cases, the approximations by our
    formulas are exactly equal to the true MI. The approximation methods for MI may
    have a wide range of applications in various disciplines.

    Learning to Rank Scientific Documents from the Crowd

    Jesse M Lingeman, Hong Yu
    Comments: 12 pages, 1 figure
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Digital Libraries (cs.DL); Learning (cs.LG); Social and Information Networks (cs.SI)

    Finding related published articles is an important task in any science, but
    with the explosion of new work in the biomedical domain it has become
    especially challenging. Most existing methodologies use text similarity metrics
    to identify whether two articles are related or not. However, biomedical
    knowledge discovery is hypothesis-driven. The most related articles may not be
    the ones with the highest text similarity. In this study, we first develop an
    innovative crowd-sourcing approach to build an expert-annotated
    document-ranking corpus. Using this corpus as the gold standard, we then
    evaluate the approaches of using text similarity to rank the relatedness of
    articles. Finally, we develop and evaluate a new supervised model to
    automatically rank related scientific articles. Our results show that authors’
    rankings differ significantly from rankings by text-similarity-based models. By
    training a learning-to-rank model on a subset of the annotated corpus, we found
    the best supervised learning-to-rank model (SVM-Rank) significantly surpassed
    state-of-the-art baseline systems.

    Information Dropout: learning optimal representations through noise

    Alessandro Achille, Stefano Soatto
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Computation (stat.CO)

    We introduce Information Dropout, a generalization of dropout that is
    motivated by the Information Bottleneck principle and highlights the way in
    which injecting noise in the activations can help in learning optimal
    representations of the data. Information Dropout is rooted in information-theoretic
    principles: it includes as special cases several existing dropout methods, such as
    Gaussian Dropout and Variational Dropout, and, unlike classical dropout, it can
    learn and build representations that are invariant to nuisances of the data, such
    as occlusions and clutter. When the task is the reconstruction
    of the input, we show that the information dropout method yields a variational
    autoencoder as a special case, thus providing a link between representation
    learning, information theory and variational inference. Our experiments
    validate the theoretical intuitions behind our method, and we find that
    information dropout achieves a comparable or better generalization performance
    than binary dropout, especially on smaller models, since it can automatically
    adapt the noise to the structure of the network, as well as to the test sample.
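
    The noise injected by Information Dropout is multiplicative and log-normal, with a
    scale that the network itself outputs per unit; a bare-bones sketch of that
    injection step (the accompanying KL regularization term is omitted):

        import numpy as np

        def information_dropout_noise(activations, log_alpha, rng=np.random.default_rng()):
            # log_alpha: learned per-unit log noise scale, same shape as activations.
            sigma = np.exp(0.5 * log_alpha)
            eps = rng.normal(0.0, 1.0, size=activations.shape)
            return activations * np.exp(sigma * eps)   # log-normal multiplicative noise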

    Learning Identity Mappings with Residual Gates

    Pedro H. P. Savarese
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We propose a technique to augment network layers by adding a linear gating
    mechanism, which provides a way to learn identity mappings by optimizing only
    one parameter. We also introduce a new metric which serves as the basis for the
    technique. It captures the difficulty involved in learning identity mappings
    for different types of network models, and provides a new theoretical intuition
    for the increased depths of models such as Highway and Residual Networks. We
    propose a new model, the Gated Residual Network, which is the result when
    augmenting Residual Networks. Experimental results show that augmenting layers
    yields increased performance, fewer issues with depth, and more layer
    independence: fully removing layers does not cripple the model. We evaluate our
    method on MNIST using fully-connected networks and on CIFAR-10 using Wide
    ResNets, achieving a relative error reduction of more than 8% in the latter
    when compared to the original model.
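
    A minimal reading of the single-parameter gate (the abstract does not give the
    exact gating function, so the sigmoid squashing below is an assumption) is a
    residual block of the form g * F(x) + (1 - g) * x controlled by one scalar k:

        import numpy as np

        def gated_residual(x, residual_fn, k):
            # k is a single learned scalar; driving the gate toward 0 recovers the identity map.
            g = 1.0 / (1.0 + np.exp(-k))
            return g * residual_fn(x) + (1.0 - g) * x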

    Reparameterization trick for discrete variables

    Seiya Tokui, Issei Sato
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Low-variance gradient estimation is crucial for learning directed graphical
    models parameterized by neural networks, where the reparameterization trick is
    widely used for those with continuous variables. While this technique gives
    low-variance gradient estimates, it has not been directly applicable to
    discrete variables, the sampling of which inherently requires discontinuous
    operations. We argue that the discontinuity can be bypassed by marginalizing
    out the variable of interest, which results in a new reparameterization trick
    for discrete variables. This reparameterization greatly reduces the variance,
    which is understood by regarding the method as an application of common random
    numbers to the estimation. The resulting estimator is theoretically guaranteed
    to have a variance not larger than that of the likelihood-ratio method with the
    optimal input-dependent baseline. We give empirical results for variational
    learning of sigmoid belief networks.
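
    Marginalizing the discrete variable out, as described above, replaces a sampled
    discontinuous step with an expectation over its (few) possible values, through
    which gradients can flow; a toy sketch for a categorical latent variable:

        import numpy as np

        def marginalized_objective(probs, per_value_objective):
            # probs: (K,) categorical probabilities of the discrete latent variable.
            # per_value_objective(k): downstream objective with the variable fixed to k.
            return sum(p * per_value_objective(k) for k, p in enumerate(probs))

        # Example: expected squared error when the latent value indexes a codebook entry.
        codebook = np.array([0.0, 1.0, 2.0])
        target = 1.4
        print(marginalized_objective(np.array([0.2, 0.5, 0.3]),
                                     lambda k: (codebook[k] - target) ** 2))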

    Adversarial Machine Learning at Scale

    Alexey Kurakin, Ian Goodfellow, Samy Bengio
    Comments: 15 pages, 5 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Learning (cs.LG); Machine Learning (stat.ML)

    Adversarial examples are malicious inputs designed to fool machine learning
    models. They often transfer from one model to another, allowing attackers to
    mount black box attacks without knowledge of the target model’s parameters.
    Adversarial training is the process of explicitly training a model on
    adversarial examples, in order to make it more robust to attack or to reduce
    its test error on clean inputs. So far, adversarial training has primarily been
    applied to small problems. In this research, we apply adversarial training to
    ImageNet. Our contributions include: (1) recommendations for how to successfully
    scale adversarial training to large models and datasets, (2) the observation
    that adversarial training confers robustness to single-step attack methods, (3)
    the finding that multi-step attack methods are somewhat less transferable than
    single-step attack methods, so single-step attacks are the best for mounting
    black-box attacks, and (4) resolution of a “label leaking” effect that causes
    adversarially trained models to perform better on adversarial examples than on
    clean examples, because the adversarial example construction process uses the
    true label and the model can learn to exploit regularities in the construction
    process.
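
    For concreteness, a standard single-step attack of the kind referred to above is
    the fast gradient sign method; a generic sketch (the clipping range assumes inputs
    normalized to [0, 1]):

        import numpy as np

        def fgsm(x, grad_loss_wrt_x, epsilon):
            # One-step adversarial perturbation in the direction of the loss gradient's sign.
            return np.clip(x + epsilon * np.sign(grad_loss_wrt_x), 0.0, 1.0)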

    Deep Information Propagation

    Samuel S. Schoenholz, Justin Gilmer, Surya Ganguli, Jascha Sohl-Dickstein
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We study the behavior of untrained neural networks whose weights and biases
    are randomly distributed using mean field theory. We show the existence of
    depth scales that naturally limit the maximum depth of signal propagation
    through these random networks. Our main practical result is to show that random
    networks may be trained precisely when information can travel through them.
    Thus, the depth scales that we identify provide bounds on how deep a network
    may be trained for a specific choice of hyperparameters. As a corollary to
    this, we argue that in networks at the edge of chaos, one of these depth scales
    diverges. Thus arbitrarily deep networks may be trained only sufficiently close
    to criticality. We show that the presence of dropout destroys the
    order-to-chaos critical point and therefore strongly limits the maximum
    trainable depth for random networks. Finally, we develop a mean field theory
    for backpropagation and we show that the ordered and chaotic phases correspond
    to regions of vanishing and exploding gradient respectively.

    Conspiracies between Learning Algorithms, Circuit Lower Bounds and Pseudorandomness

    Igor C. Oliveira, Rahul Santhanam
    Subjects: Computational Complexity (cs.CC); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS); Learning (cs.LG)

    We prove several results giving new and stronger connections between
    learning, circuit lower bounds and pseudorandomness. Among other results, we
    show a generic learning speedup lemma, equivalences between various learning
    models in the exponential time and subexponential time regimes, a dichotomy
    between learning and pseudorandomness, consequences of non-trivial learning for
    circuit lower bounds, Karp-Lipton theorems for probabilistic exponential time,
    and NC(^1)-hardness for the Minimum Circuit Size Problem.

    Demystifying ResNet

    Sihan Li, Jiantao Jiao, Yanjun Han, Tsachy Weissman
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)

    We provide a theoretical explanation for the superb performance of ResNet via
    the study of deep linear networks and some nonlinear variants. We show that
    with or without nonlinearities, by adding shortcuts that have depth two, the
    condition number of the Hessian of the loss function at the zero initial point
    is depth-invariant, which makes training very deep models no more difficult
    than shallow ones. Shortcuts of higher depth result in an extremely flat
    (high-order) stationary point initially, from which the optimization algorithm
    is hard to escape. The 1-shortcut, however, is essentially equivalent to no
    shortcuts. Extensive experiments are provided accompanying our theoretical
    results. We show that initializing the network to small weights with
    2-shortcuts achieves significantly better results than random Gaussian (Xavier)
    initialization, orthogonal initialization, and shortcuts of deeper depth, from
    various perspectives ranging from final loss, learning dynamics and stability,
    to the behavior of the Hessian along the learning process.
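
    A minimal sketch of the deep linear setting with depth-two shortcuts discussed
    above: each block adds W2 W1 h back onto h, so with near-zero initial weights the
    whole network starts close to the identity map (dimensions and the initialization
    scale below are illustrative):

        import numpy as np

        def forward_2shortcut(x, weight_pairs):
            # weight_pairs: list of (W1, W2) matrices, each (d, d); x: (d,) input.
            h = x
            for W1, W2 in weight_pairs:
                h = h + W2 @ (W1 @ h)   # shortcut of depth two around two linear layers
            return h

        # Example: 10 blocks with small random weights keep the map near the identity.
        d, blocks = 4, 10
        rng = np.random.default_rng(0)
        pairs = [(0.01 * rng.standard_normal((d, d)), 0.01 * rng.standard_normal((d, d)))
                 for _ in range(blocks)]
        print(forward_2shortcut(np.ones(d), pairs))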


    Information Theory

    Almost universal codes for MIMO wiretap channels

    Laura Luzzi, Roope Vehkalahti, Cong Ling
    Comments: 31 pages, 1 figure
    Subjects: Information Theory (cs.IT); Number Theory (math.NT)

    Despite several works on secrecy coding for fading and MIMO wiretap channels
    from an error probability perspective, the construction of information
    theoretically secure codes over such channels remains an open problem. In
    this paper, we consider a fading wiretap channel model where the transmitter
    has only partial statistical channel state information. We extend the flatness
    factor criterion from the Gaussian wiretap channel to fading and MIMO wiretap
    channels, and propose concrete lattice codes with a vanishing flatness factor
    to achieve information theoretic security. These codes are built from algebraic
    number fields with constant root discriminant for the single-antenna fading
    wiretap channel, and from division algebras centered at such number fields for
    the MIMO wiretap channel, respectively. The proposed lattice codes achieve
    strong secrecy and semantic security over any ergodic stationary fading/MIMO
    wiretap channel with sufficiently fast decay of time correlations, for all
    secrecy rates (R < C_b - C_e - kappa), where (C_b) and (C_e) are Bob and Eve’s
    channel capacities respectively, and (kappa) is an explicit constant gap.
    Moreover, these codes are almost universal in the sense that a fixed code is
    good for secrecy for a wide range of fading models.

    Information-Theoretic Bounds and Approximations in Neural Population Coding

    Wentao Huang, Kechen Zhang
    Subjects: Information Theory (cs.IT); Learning (cs.LG)

    Information theory is a powerful tool for neuroscience and other disciplines.
    Efficient calculation of Shannon’s mutual information (MI) is a key
    computational step that often presents the biggest difficulty for practical
    applications. In this paper, we propose effective approximation methods for
    evaluating MI in the context of neural population coding, especially for
    high-dimensional inputs. We prove several information-theoretic asymptotic
    bounds and approximation formulas for large size neural populations. We also
    discuss how optimization of neural population coding based on these
    approximation formulas leads to a convex problem about the population density
    distribution in neural population parameter space. Several useful techniques,
    including variable transformation and dimensionality reduction, are proposed
    for more efficient computation of the approximations. Our numerical simulation
    results show that our asymptotic formulas are highly accurate for approximating
    MI in neural populations. For some special cases, the approximations by our
    formulas are exactly equal to the true MI. The approximation methods for MI may
    have a wide range of applications in various disciplines.

    Denoising based Vector Approximate Message Passing

    Philip Schniter, Sundeep Rangan, Alyson Fletcher
    Subjects: Information Theory (cs.IT)

    The denoising-based approximate message passing (D-AMP) methodology, recently
    proposed by Metzler, Maleki, and Baraniuk, allows one to plug in sophisticated
    denoisers like BM3D into the AMP algorithm to achieve state-of-the-art
    compressive image recovery. But AMP diverges with small deviations from the
    i.i.d.-Gaussian assumption on the measurement matrix. Recently, the vector AMP
    (VAMP) algorithm has been proposed to fix this problem. In this work, we show
    that the benefits of VAMP extend to D-VAMP.

    Out-of-Band Radiation from Large Antenna Arrays

    Christopher Mollén, Erik G. Larsson, Ulf Gustavsson, Thomas Eriksson, Robert W. Heath Jr
    Subjects: Information Theory (cs.IT)

    Co-existing wireless systems that share a common spectrum need to mitigate
    out-of-band (OOB) radiation to avoid excessive interference. In legacy SISO
    transmitters and small MIMO arrays, OOB radiation is well understood and is
    commonly handled by digital compensation techniques. In large arrays, however,
    new phenomena and hardware limitations have to be considered: First, signals
    can be radiated directionally, which might focus the OOB radiation. Second,
    low-complexity hardware with poor linearity has to be used for cost reasons,
    which increases the relative amount of OOB radiation. Given that massive MIMO
    and millimeter wave communication rely on base stations with a huge number of
    antennas, the spatial behavior of OOB radiation from large arrays will have
    significant implications for future hardware requirements. We show that, if the
    OOB radiation is beamformed, its array gain is never larger than that of the
    in-band signal. In many cases, the OOB radiation is close to isotropic even
    when the in-band signal is highly directive. With the same total radiated
    power, the OOB radiation from large arrays is therefore never more severe than
    from a legacy system with the same adjacent-channel-leakage ratio. The OOB
    radiation is even less detrimental than from a legacy system since the high
    array gain of the in-band signal allows large arrays to radiate less total
    power than legacy systems. We also show how OOB radiation from large arrays
    varies with location in static propagation environments and how these effects
    vanish when averaged over the small-scale fading. A main conclusion is that
    the linearity requirement for large arrays can be relaxed compared to legacy
    systems. Specifically, less stringent linearity requirements on each
    transmitter make it possible to build large arrays from low-complexity
    hardware.

    Phi-Entropic Measures of Correlation

    Salman Beigi, Amin Gohari
    Subjects: Information Theory (cs.IT)

    A measure of correlation is said to have the tensorization property if it is
    unchanged when computed for i.i.d. copies. More precisely, a measure of
    correlation between two random variables ((X, Y)), denoted by (rho(X, Y)), has
    the tensorization property if (rho(X^n, Y^n) = rho(X, Y)), where ((X^n, Y^n)) is
    (n) i.i.d. copies of ((X, Y)). Two well-known examples of such measures are the
    maximal correlation and the hypercontractivity ribbon (HC ribbon). We show that
    the maximal correlation and HC ribbons are special cases of (Phi)-ribbon,
    defined in this paper for any function (Phi) from a class of convex functions
    ((Phi)-ribbon reduces to the HC ribbon and the maximal correlation for special
    choices of (Phi)). Any (Phi)-ribbon is shown to be a measure of correlation
    with the tensorization property. We show that the (Phi)-ribbon also
    characterizes the (Phi)-strong data processing inequality constant introduced
    by Raginsky. We further study the (Phi)-ribbon for the choice of (Phi(t)=t^2)
    and introduce an equivalent characterization of this ribbon.

    On TDMA Optimality in Locally Connected Networks with no CSIT

    Aly El Gamal
    Comments: submitted to IEEE International Conference on Communications (ICC 2017)
    Subjects: Information Theory (cs.IT)

    In this work, we study scenarios of wireless networks where using simple
    Time-Division-Multiple-Access (TDMA) is optimal if no channel state information
    is available at the transmitters (no CSIT). We consider single-hop locally
    connected interference networks where each transmitter is connected to the
    receiver that has the same index as well as L succeeding receivers. The
    considered rate criterion is the per user Degrees of Freedom as the number of
    transmitter-receiver pairs goes to infinity. For the case when L=1, it was
    shown in previous work that TDMA is optimal, even if each message can be
    available at multiple transmitters and linear cooperation schemes are allowed.
    Here, we extend this conclusion to the case where L=2, by proving optimality of
    TDMA in this case as well. We then study the problem for general values of L
    without allowing for cooperative transmission. We prove optimality of TDMA when
    each transmitter is serving the receiver with the same index, and finally
    conjecture – based on an example – that the same conclusion applies even if
    each receiver can be served by any single transmitter connected to it.

    Capacity-Achieving Rate-Compatible Polar Codes for General Channels

    Marco Mondelli, S. Hamed Hassani, Ivana Marić, Dennis Hui, Song-Nam Hong
    Comments: 6 pages, 2 figures, submitted to WCNC’17 workshop on polar coding
    Subjects: Information Theory (cs.IT)

    We present a rate-compatible polar coding scheme that achieves the capacity
    of any family of channels. Our solution generalizes the previous results [1],
    [2] that provide capacity-achieving rate-compatible polar codes for a degraded
    family of channels. The motivation for our extension comes from the fact that
    in many practical scenarios, e.g., MIMO systems and non-Gaussian interference,
    the channels cannot be ordered by degradation. The main technical contribution
    of this paper consists in removing the degradation condition. To do so, we
    exploit the ideas coming from the construction of universal polar codes.

    Our scheme possesses the usual attractive features of polar codes: low
    complexity code construction, encoding, and decoding; super-polynomial scaling
    of the error probability with the block length; and absence of error floors. On
    the negative side, the scaling of the gap to capacity with the block length is
    slower than in standard polar codes, and we prove an upper bound on the scaling
    exponent.

    Hierarchical Overlapping Clustering of Network Data Using Cut Metrics

    Fernando Gama, Santiago Segarra, Alejandro Ribeiro
    Comments: Submitted to IEEE Transactions on Signal and Information Processing Over Networks
    Subjects: Social and Information Networks (cs.SI); Information Theory (cs.IT); Physics and Society (physics.soc-ph)

    A novel method to obtain hierarchical and overlapping clusters from network
    data (i.e., a set of nodes endowed with pairwise dissimilarities) is presented.
    The introduced method is hierarchical in the sense that it outputs a nested
    collection of groupings of the node set depending on the resolution or degree
    of similarity desired, and it is overlapping since it allows nodes to belong to
    more than one group. Our construction is rooted in the facts that a
    hierarchical (non-overlapping) clustering of a network can be equivalently
    represented by a finite ultrametric space and that a convex combination of
    ultrametrics results in a cut metric. By applying a hierarchical
    (non-overlapping) clustering method to multiple dithered versions of a given
    network and then convexly combining the resulting ultrametrics, we obtain a cut
    metric associated to the network of interest. We then show how to extract a
    hierarchical overlapping clustering structure from the aforementioned cut
    metric. Furthermore, the so-called overlapping function is presented as a tool
    for gaining insights about the data by identifying meaningful resolutions of
    the obtained hierarchical structure. Additionally, we explore hierarchical
    overlapping quasi-clustering methods that preserve the asymmetry of the data
    contained in directed networks. Finally, the presented method is illustrated
    via synthetic and real-world classification problems including handwritten
    digit classification and authorship attribution of famous plays.
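
    The core construction above can be sketched in a few lines: run a (non-overlapping)
    hierarchical clustering on several dithered copies of the dissimilarity matrix, then
    convexly combine the resulting ultrametric matrices into a single cut metric (the
    dithering scheme and the clustering method are stand-ins here):

        import numpy as np

        def cut_metric_from_dithers(dissimilarity, ultrametric_fn, num_dithers=10,
                                    noise_scale=0.05, rng=np.random.default_rng(0)):
            # dissimilarity: (N, N) pairwise dissimilarities; ultrametric_fn maps such a
            # matrix to the (N, N) ultrametric induced by a hierarchical clustering method.
            N = dissimilarity.shape[0]
            ultrametrics = []
            for _ in range(num_dithers):
                noise = noise_scale * rng.random((N, N))
                noise = (noise + noise.T) / 2.0      # keep the dithered matrix symmetric
                np.fill_diagonal(noise, 0.0)
                ultrametrics.append(ultrametric_fn(dissimilarity + noise))
            # A (uniform) convex combination of ultrametrics yields a cut metric.
            return sum(ultrametrics) / num_dithers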

    On the primitivity of PRESENT and other lightweight ciphers

    Riccardo Aragona, Marco Calderini, Antonio Tortora, Maria Tota
    Subjects: Group Theory (math.GR); Cryptography and Security (cs.CR); Information Theory (cs.IT)

    We provide two sufficient conditions to guarantee that the round functions of
    a translation based cipher generate a primitive group. Furthermore, if a round
    of the cipher is strongly proper and consists of m-bit S-Boxes, with m = 3, 4
    or 5, we prove that such a group is the alternating group. As an immediate
    consequence, we deduce that the round functions of some lightweight translation
    based ciphers, such as the PRESENT cipher, generate the alternating group.

    Adaptive Geometric Multiscale Approximations for Intrinsically Low-dimensional Data

    Wenjing Liao, Mauro Maggioni
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Statistics Theory (math.ST)

    We consider the problem of efficiently approximating and encoding
    high-dimensional data sampled from a probability distribution (rho) in
    (mathbb{R}^D) that is nearly supported on a (d)-dimensional set (mathcal{M})
    – for example supported on a (d)-dimensional Riemannian manifold. Geometric
    Multi-Resolution Analysis (GMRA) provides a robust and computationally
    efficient procedure to construct low-dimensional geometric approximations of
    (mathcal{M}) at varying resolutions. We introduce a thresholding algorithm on
    the geometric wavelet coefficients, leading to what we call adaptive GMRA
    approximations. We show that these data-driven, empirical approximations
    perform well, when the threshold is chosen as a suitable universal function of
    the number of samples (n), on a wide variety of measures (rho) that are
    allowed to exhibit different regularity at different scales and locations,
    thereby efficiently encoding data from more complex measures than those
    supported on manifolds. These approximations yield a data-driven dictionary,
    together with a fast transform mapping data to coefficients, and an inverse of
    such a map. The algorithms for both the dictionary construction and the
    transforms have complexity (C n log n) with the constant linear in (D) and
    exponential in (d). Our work therefore establishes adaptive GMRA as a fast
    dictionary learning algorithm with approximation guarantees. We include several
    numerical experiments on both synthetic and real data, confirming our
    theoretical results and demonstrating the effectiveness of adaptive GMRA.



