
    arXiv Paper Daily: Mon, 17 Apr 2017

    Published by 我爱机器学习 (52ml.net) on 2017-04-17 00:00:00

    Neural and Evolutionary Computing

    Runtime Analysis of the $(1+(\lambda,\lambda))$ Genetic Algorithm on Random Satisfiable 3-CNF Formulas

    Maxim Buzdalov, Benjamin Doerr
    Comments: An extended abstract of this report will appear in the proceedings of the 2017 Genetic and Evolutionary Computation Conference (GECCO 2017)
    Subjects: Neural and Evolutionary Computing (cs.NE)

    The $(1+(\lambda,\lambda))$ genetic algorithm, first proposed at GECCO 2013,
    showed a surprisingly good performance on some optimization problems. The
    theoretical analysis so far was restricted to the OneMax test function, where
    this GA profited from the perfect fitness-distance correlation. In this work,
    we conduct a rigorous runtime analysis of this GA on random 3-SAT instances in
    the planted solution model having at least logarithmic average degree, which
    are known to have a weaker fitness distance correlation.

    We prove that this GA with a fixed, not too large population size again
    obtains runtimes better than $\Theta(n \log n)$, which is a lower bound for
    most evolutionary algorithms on pseudo-Boolean problems with a unique
    optimum.
    However, the self-adjusting version of the GA risks reaching population sizes
    at which the intermediate selection of the GA, due to the weaker
    fitness-distance correlation, is not able to distinguish a profitable offspring
    from others. We show that this problem can be overcome by equipping the
    self-adjusting GA with an upper limit for the population size. Apart from
    sparse instances, this limit can be chosen in a way that the asymptotic
    performance does not worsen compared to the idealistic OneMax case. Overall,
    this work shows that the $(1+(\lambda,\lambda))$ GA can provably have a good
    performance on combinatorial search and optimization problems also in the
    presence of a weaker fitness-distance correlation.
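
    For concreteness, a minimal Python sketch of the standard
    $(1+(\lambda,\lambda))$ GA with one-fifth-rule self-adjustment, run here on
    OneMax rather than 3-SAT; `lam_cap` (our name) stands in for the
    population-size limit discussed above, and the paper's exact
    parameterization may differ:

        import random

        def one_max(x):
            return sum(x)

        def flip(x, positions):
            y = list(x)
            for i in positions:
                y[i] = 1 - y[i]
            return y

        def one_plus_ll_ga(n, f, lam_cap=None, max_evals=200000):
            """One generation: lambda mutants sharing a radius ell ~ Bin(n, lam/n),
            then lambda biased crossovers between the parent and the best mutant."""
            x = [random.randint(0, 1) for _ in range(n)]
            fx, lam, F, evals = f(x), 1.0, 1.5, 0
            while fx < n and evals < max_evals:
                L = max(1, round(lam))
                ell = sum(random.random() < lam / n for _ in range(n))
                mutants = [flip(x, random.sample(range(n), ell)) for _ in range(L)]
                xp = max(mutants, key=f)
                # Crossover: take each bit from the best mutant with prob. 1/lam.
                offspring = [[b if random.random() < 1.0 / lam else a
                              for a, b in zip(x, xp)] for _ in range(L)]
                y = max(offspring, key=f)
                fy = f(y)
                evals += 2 * L
                # One-fifth success rule; lam_cap is the upper limit on the
                # population size that the abstract proposes for the
                # self-adjusting version.
                lam = max(lam / F, 1.0) if fy > fx else lam * F ** 0.25
                if lam_cap is not None:
                    lam = min(lam, lam_cap)
                if fy >= fx:
                    x, fx = y, fy
            return x, evals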

    Reward-based stochastic self-configuration of neural circuits

    David Kappel, Robert Legenstein, Stefan Habenschuss, Michael Hsieh, Wolfgang Maass
    Subjects: Neurons and Cognition (q-bio.NC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Experimental data suggest that neural circuits configure their synaptic
    connectivity for a given computational task. They also point to dopamine-gated
    stochastic spine dynamics as an important underlying mechanism, and they show
    that the stochastic component of synaptic plasticity is surprisingly strong. We
    propose a model that elucidates how task-dependent self-configuration of neural
    circuits can emerge through these mechanisms. The Fokker-Planck equation allows
    us to relate local stochastic processes at synapses to the stationary
    distribution of network configurations, and thereby to computational properties
    of the network. This framework suggests a new model for reward-gated network
    plasticity, where one replaces the common policy gradient paradigm by
    continuously ongoing stochastic policy search (sampling) from a posterior
    distribution of network configurations. This posterior integrates priors that
    encode for example previously attained knowledge and structural constraints.
    This model can explain the experimentally found capability of neural circuits
    to configure themselves for a given task, and to compensate automatically for
    changes in the network or task. We also show that experimental data on
    dopamine-modulated spine dynamics can be modeled within this theoretical
    framework, and that a strong stochastic component of synaptic plasticity is
    essential for its performance.
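
    The core mechanism can be sketched compactly: the framework treats synaptic
    parameters as performing stochastic (Langevin-type) dynamics whose
    stationary distribution is a posterior over network configurations. A toy
    Euler-Maruyama simulation of such dynamics (all functions and constants
    below are illustrative placeholders, not the paper's model):

        import numpy as np

        rng = np.random.default_rng(0)

        def grad_log_prior(theta, sigma=1.0):
            # Gaussian prior on synaptic parameters (stand-in for structural priors).
            return -theta / sigma**2

        def grad_log_reward(theta):
            # Placeholder for the reward-dependent term; in the paper this role
            # is played by dopamine-gated plasticity. Here: a toy quadratic target.
            return -(theta - np.array([1.0, -0.5, 0.25]))

        # Euler-Maruyama discretization of
        #   d theta = b * d/dtheta log p*(theta) dt + sqrt(2 b T) dW,
        # whose stationary distribution is proportional to p*(theta)^(1/T).
        theta, b, T, dt = rng.normal(size=3), 0.1, 0.5, 1.0
        for _ in range(5000):
            drift = grad_log_prior(theta) + grad_log_reward(theta)
            theta += b * drift * dt + np.sqrt(2 * b * T * dt) * rng.normal(size=3)
        # theta keeps sampling around the posterior mode(s) rather than converging,
        # which is what allows automatic re-configuration when the task changes.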


    Computer Vision and Pattern Recognition

    TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering

    Yunseok Jang, Yale Song, Youngjae Yu, Youngjin Kim, Gunhee Kim
    Comments: Accepted paper at CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Vision and language understanding has emerged as a subject undergoing intense
    study in Artificial Intelligence. Among many tasks in this line of research,
    visual question answering (VQA) has been one of the most successful ones, where
    the goal is to learn a model that understands visual content at region-level
    details and finds their associations with pairs of questions and answers in the
    natural language form. Despite the rapid progress in the past few years, most
    existing work in VQA has focused primarily on images. In this paper, we focus
    on extending VQA to the video domain and contribute to the literature in three
    important ways. First, we propose three new tasks designed specifically for
    video VQA, which require spatio-temporal reasoning from videos to answer
    questions correctly. Next, we introduce a new large-scale dataset for video VQA
    named TGIF-QA that extends existing VQA work with our new tasks. Finally, we
    propose a dual-LSTM based approach with both spatial and temporal attention,
    and show its effectiveness over conventional VQA techniques through empirical
    evaluations.
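
    The attention component can be illustrated generically. This is not the
    authors' exact dual-LSTM architecture, just the usual additive temporal
    attention a video-QA model would place over frame features, conditioned on
    the question encoding (all weights random for the sketch):

        import numpy as np

        def softmax(z):
            e = np.exp(z - z.max())
            return e / e.sum()

        def temporal_attention(frames, question, rng=np.random.default_rng(1)):
            """frames: (T, d) per-frame features; question: (d,) text encoding.
            Returns a question-conditioned weighted summary of the video."""
            T, d = frames.shape
            W_f = rng.normal(scale=d**-0.5, size=(d, d))
            W_q = rng.normal(scale=d**-0.5, size=(d, d))
            v = rng.normal(scale=d**-0.5, size=d)
            scores = np.tanh(frames @ W_f + question @ W_q) @ v  # (T,)
            alpha = softmax(scores)       # attention weights over time steps
            return alpha @ frames, alpha  # summary (d,), weights (T,)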

    Deep Structured Learning for Facial Action Unit Intensity Estimation

    Robert Walecki, Ognjen (Oggi) Rudovic, Vladimir Pavlovic, Björn Schuller, Maja Pantic
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We consider the task of automated estimation of facial expression intensity.
    This involves estimation of multiple output variables (facial action units —
    AUs) that are structurally dependent. Their structure arises from statistically
    induced co-occurrence patterns of AU intensity levels. Modeling this structure
    is critical for improving the estimation performance; however, this performance
    is bounded by the quality of the input features extracted from face images. The
    goal of this paper is to model these structures and estimate complex feature
    representations simultaneously by combining conditional random field (CRF)
    encoded AU dependencies with deep learning. To this end, we propose a novel
    Copula CNN deep learning approach for modeling multivariate ordinal variables.
    Our model accounts for (ordinal) structure in output variables and their
    (non-)linear dependencies via copula functions modeled as cliques of a CRF.
    These are jointly optimized with deep CNN feature encoding layers using a newly
    introduced balanced batch iterative training algorithm. We demonstrate the
    effectiveness of our approach on the task of AU intensity estimation on two
    benchmark datasets. We show that joint learning of the deep features and the
    target output structure results in significant performance gains compared to
    existing deep structured models for analysis of facial expressions.

    3D seismic data denoising using two-dimensional sparse coding scheme

    Ming-Jun Su, Jingbo Chang, Feng Qian, Guangmin Hu, Xiao-Yang Liu
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computational Engineering, Finance, and Science (cs.CE)

    Seismic data denoising is vital to geophysical applications and the
    transform-based function method is one of the most widely used techniques.
    However, it is challenging to design a suitable sparse representation to
    express a transform-based function group due to the complexity of seismic
    data. In this paper, we apply a seismic data denoising method based on
    learning-type overcomplete dictionaries which uses two-dimensional sparse
    coding (2DSC). First, we model the input seismic data and dictionaries as
    third-order tensors and introduce tensor-linear combinations for data
    approximation. Second, we apply a learning-type overcomplete dictionary,
    i.e., optimal sparse data representation is achieved through learning and
    training. Third, we exploit the alternating minimization algorithm to solve
    the optimization problem of seismic denoising. Finally, we evaluate its
    denoising performance on synthetic seismic data and a land data survey.
    Experimental results
    show that the two-dimensional sparse coding scheme reduces computational costs
    and enhances the signal-to-noise ratio.

    Parallel Multi Channel Convolution using General Matrix Multiplication

    Aravind Vasudevan, Andrew Anderson, David Gregg
    Comments: Under review at ASAP 2017 – The 28th Annual IEEE International Conference on Application-specific Systems, Architectures and Processors. 11 pages, 13 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Performance (cs.PF)

    Convolutional neural networks (CNNs) have emerged as one of the most
    successful machine learning technologies for image and video processing. The
    most computationally intensive parts of CNNs are the convolutional layers,
    which convolve multi-channel images with multiple kernels. A common approach to
    implementing convolutional layers is to expand the image into a column matrix
    (im2col) and perform Multiple Channel Multiple Kernel (MCMK) convolution using
    an existing parallel General Matrix Multiplication (GEMM) library. This im2col
    conversion greatly increases the memory footprint of the input matrix and
    reduces data locality.

    In this paper we propose a new approach to MCMK convolution that is based on
    General Matrix Multiplication (GEMM), but not on im2col. Our algorithm
    eliminates the need for data replication on the input. By splitting a single
    call to GEMM into several smaller calls, we can eliminate data size increases
    on either the input or output of the convolution layer. We have implemented
    several variants of our algorithm on CPU and GPU processors. On CPU, our
    algorithm uses much less memory than im2col and in most cases is also faster.
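
    The idea of trading one big GEMM for several small ones can be sketched in
    a few lines. The following numpy toy (not the paper's implementation)
    computes a multi-channel multi-kernel 'same' convolution as one
    (M x C) by (C x HW) GEMM per kernel element, accumulated with shifts, so no
    im2col replication of the input is ever materialized:

        import numpy as np

        def conv_mcmk_gemm(x, k):
            """x: (C, H, W) input; k: (M, C, KH, KW) kernels (odd KH, KW).
            CNN-style convolution (cross-correlation) with zero padding."""
            C, H, W = x.shape
            M, _, KH, KW = k.shape
            x_flat = x.reshape(C, H * W)
            out = np.zeros((M, H, W))
            for u in range(KH):
                for v in range(KW):
                    du, dv = u - KH // 2, v - KW // 2
                    # One small GEMM for this kernel element...
                    part = (k[:, :, u, v] @ x_flat).reshape(M, H, W)
                    # ...shift-accumulated into the output (contributions that
                    # fall off the edges implement the zero padding).
                    src = part[:, max(0, du):H - max(0, -du),
                               max(0, dv):W - max(0, -dv)]
                    out[:, max(0, -du):H - max(0, du),
                        max(0, -dv):W - max(0, dv)] += src
            return out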

    Learning a collaborative multiscale dictionary based on robust empirical mode decomposition

    Rui Chen, Huizhu Jia, Xiaodong Xie, Wen Gao
    Comments: to be published in Neurocomputing
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Dictionary learning is a challenging topic in many image processing areas.
    The basic goal is to learn a sparse representation from an overcomplete
    basis set. By combining the advantages of generic multiscale representations
    with learning-based adaptivity, multiscale dictionary representation
    approaches are powerful at capturing the structural characteristics of
    natural images. However, existing multiscale learning approaches still
    suffer from three main weaknesses: inadaptability to diverse scales of image
    data, sensitivity to noise and outliers, and difficulty in determining the
    optimal dictionary structure. In this paper, we present a novel multiscale
    dictionary learning paradigm for sparse image representations based on an
    improved empirical mode decomposition. This powerful data-driven analysis
    tool for multi-dimensional signals can fully adaptively decompose the image
    into multiscale oscillating components according to the intrinsic modes of
    the data itself. This treatment yields a robust and effective sparse
    representation, and meanwhile generates a raw base dictionary at multiple
    geometric scales and spatial frequency bands. This dictionary is refined by
    selecting optimal oscillating atoms based on frequency clustering. In order
    to further enhance sparsity and generalization, a tolerance dictionary is
    learned using a coherence-regularized model. A fast proximal scheme is
    developed to optimize this model. The multiscale dictionary is taken as the
    product of the oscillating dictionary and the tolerance dictionary.
    Experimental results demonstrate that the proposed learning approach
    achieves superior performance in sparse image representations compared with
    several competing methods. We also show promising results in an image
    denoising application.

    DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents

    Namhoon Lee, Wongun Choi, Paul Vernaza, Christopher B. Choy, Philip H. S. Torr, Manmohan Chandraker
    Comments: Accepted at CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We introduce a Deep Stochastic IOC RNN Encoder-Decoder framework, DESIRE,
    for the task of future prediction of multiple interacting agents in dynamic
    scenes. DESIRE effectively predicts future locations of objects in multiple
    scenes by 1) accounting for the multi-modal nature of the future prediction
    (i.e., given the same context, the future may vary), 2) foreseeing the
    potential future outcomes and making a strategic prediction based on them,
    and 3) reasoning not only from the past motion history, but also from the
    scene context as well as the interactions among the agents. DESIRE achieves
    these in a single
    end-to-end trainable neural network model, while being computationally
    efficient. The model first obtains a diverse set of hypothetical future
    prediction samples employing a conditional variational autoencoder, which are
    ranked and refined by the following RNN scoring-regression module. Samples are
    scored by accounting for accumulated future rewards, which enables better
    long-term strategic decisions similar to IOC frameworks. An RNN scene context
    fusion module jointly captures past motion histories, the semantic scene
    context and interactions among multiple agents. A feedback mechanism iterates
    over the ranking and refinement to further boost the prediction accuracy. We
    evaluate our model on two publicly available datasets: KITTI and Stanford Drone
    Dataset. Our experiments show that the proposed model significantly improves
    the prediction accuracy compared to other baseline methods.

    Camera Calibration by Global Constraints on the Motion of Silhouettes

    Gil Ben-Artzi
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We address the problem of epipolar geometry using the motion of silhouettes.
    Such methods match epipolar lines or frontier points across views, which are
    then used as the set of putative correspondences. We introduce an approach that
    improves by two orders of magnitude the performance over state-of-the-art
    methods, by significantly reducing the number of outliers in the putative
    matching. We model the frontier points’ correspondence problem as constrained
    flow optimization, requiring small differences between their coordinates over
    consecutive frames. Our approach is formulated as a Linear Integer Program and
    we show that due to the nature of our problem, it can be solved efficiently in
    an iterative manner. Our method was validated on four standard datasets
    providing accurate calibrations across very different viewpoints.

    Dataset Augmentation for Pose and Lighting Invariant Face Recognition

    Daniel Crispell, Octavian Biris, Nate Crosswhite, Jeffrey Byrne, Joseph L. Mundy
    Comments: Appeared in 2016 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The performance of modern face recognition systems is a function of the
    dataset on which they are trained. Most datasets are largely biased toward
    “near-frontal” views with benign lighting conditions, negatively affecting
    recognition performance on images that do not meet these criteria. The proposed
    approach demonstrates how a baseline training set can be augmented to increase
    pose and lighting variability using semi-synthetic images with simulated pose
    and lighting conditions. The semi-synthetic images are generated using a fast
    and robust 3-d shape estimation and rendering pipeline which includes the full
    head and background. Various methods of incorporating the semi-synthetic
    renderings into the training procedure of a state of the art deep neural
    network-based recognition system without modifying the structure of the network
    itself are investigated. Quantitative results are presented on the challenging
    IJB-A identification dataset using a state of the art recognition pipeline as a
    baseline.

    CBinfer: Change-Based Inference for Convolutional Neural Networks on Video Data

    Lukas Cavigelli, Philippe Degen, Luca Benini
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Extracting per-frame features using convolutional neural networks for
    real-time processing of video data is currently mainly performed on powerful
    GPU-accelerated workstations and compute clusters. However, there are many
    applications such as smart surveillance cameras that require or would benefit
    from on-site processing. To this end, we propose and evaluate a novel algorithm
    for change-based evaluation of CNNs for video data recorded with a static
    camera setting, exploiting the spatio-temporal sparsity of pixel changes. We
    achieve an average speed-up of 8.6x over a cuDNN baseline on a realistic
    benchmark with a negligible accuracy loss of less than 0.1% and no retraining
    of the network. The resulting energy efficiency is 10x higher than per-frame
    evaluation and reaches an equivalent of 328 GOp/s/W on the Tegra X1 platform.
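
    A toy version of change-based evaluation for a single 3x3 convolution layer
    (numpy, unit stride; the paper's CUDA kernels and exact change-detection
    policy differ) shows the mechanism: threshold the frame difference, dilate
    the change mask to the affected outputs, and recompute only those pixels:

        import numpy as np

        def cbinfer_conv(prev_in, cur_in, prev_out, w, tau=0.04):
            """prev_in/cur_in: (C, H, W) frames; prev_out: (M, H, W) cached
            outputs; w: (M, C, 3, 3) weights. Recomputes only changed pixels."""
            C, H, W = cur_in.shape
            changed = np.abs(cur_in - prev_in).max(axis=0) > tau
            grow = np.zeros_like(changed)
            for dy in (-1, 0, 1):          # a changed input pixel affects the
                for dx in (-1, 0, 1):      # 3x3 output neighborhood around it
                    grow[max(0, -dy):H - max(0, dy), max(0, -dx):W - max(0, dx)] |= \
                        changed[max(0, dy):H - max(0, -dy), max(0, dx):W - max(0, -dx)]
            out = prev_out.copy()
            pad = np.pad(cur_in, ((0, 0), (1, 1), (1, 1)))
            for y, x in zip(*np.nonzero(grow)):
                patch = pad[:, y:y + 3, x:x + 3]
                out[:, y, x] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
            return out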

    FastVentricle: Cardiac Segmentation with ENet

    Jesse Lieman-Sifry, Matthieu Le, Felix Lau, Sean Sall, Daniel Golden
    Comments: 11 pages, 6 figures, Accepted to Functional Imaging and Modeling of the Heart (FIMH) 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Cardiac Magnetic Resonance (CMR) imaging is commonly used to assess cardiac
    structure and function. One disadvantage of CMR is that post-processing of
    exams is tedious. Without automation, precise assessment of cardiac function
    via CMR typically requires an annotator to spend tens of minutes per case
    manually contouring ventricular structures. Automatic contouring can lower the
    required time per patient by generating contour suggestions that can be lightly
    modified by the annotator. Fully convolutional networks (FCNs), a variant of
    convolutional neural networks, have been used to rapidly advance the
    state-of-the-art in automated segmentation, which makes FCNs a natural choice
    for ventricular segmentation. However, FCNs are limited by their computational
    cost, which increases the monetary cost and degrades the user experience of
    production systems. To combat this shortcoming, we have developed the
    FastVentricle architecture, an FCN architecture for ventricular segmentation
    based on the recently developed ENet architecture. FastVentricle is 4x faster
    and runs with 6x less memory than the previous state-of-the-art ventricular
    segmentation architecture while still maintaining excellent clinical accuracy.

    Visual Recognition of Paper Analytical Device Images for Detection of Falsified Pharmaceuticals

    Sandipan Banerjee, James Sweet, Christopher Sweet, Marya Lieberman
    Comments: in Proc. IEEE Winter Conference on Applications of Computer Vision (WACV), 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Falsification of medicines is a big problem in many developing countries,
    where technological infrastructure is inadequate to detect these harmful
    products. We have developed a set of inexpensive paper cards, called Paper
    Analytical Devices (PADs), which can efficiently classify drugs based on their
    chemical composition, as a potential solution to the problem. These cards have
    different reagents embedded in them which produce a set of distinctive color
    descriptors upon reacting with the chemical compounds that constitute
    pharmaceutical dosage forms. If a falsified version of the medicine lacks the
    active ingredient or includes substitute fillers, the difference in color is
    perceivable by humans. However, reading the cards with accuracy takes training
    and practice, which may hamper their scaling and implementation in low resource
    settings. To deal with this, we have developed an automatic visual recognition
    system to read the results from the PAD images. At first, the optimal set of
    reagents was found by running singular value decomposition on the intensity
    values of the color tones in the card images. A dataset of cards embedded with
    these reagents is produced to generate the most distinctive results for a set
    of 26 different active pharmaceutical ingredients (APIs) and excipients. Then,
    we train two popular convolutional neural network (CNN) models with the card
    images. We also extract some “hand-crafted” features from the images and train
    a nearest neighbor classifier and a non-linear support vector machine with
    them. On testing, higher-level features performed much better in accurately
    classifying the PAD images, with the CNN models reaching the highest average
    accuracy of over 94%.

    Applying High-Resolution Visible Imagery to Satellite Melt Pond Fraction Retrieval: A Neural Network Approach

    Qi Liu, Yawen Zhang, Qin Lv, Li Shang
    Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Computer Vision and Pattern Recognition (cs.CV)

    During summer, melt ponds have a significant influence on Arctic sea-ice
    albedo. The melt pond fraction (MPF) can also help forecast the Arctic sea
    ice over a certain period. It is therefore important to retrieve accurate
    MPF from satellite data for Arctic research. This paper
    proposes a satellite MPF retrieval model based on the multi-layer neural
    network, named MPF-NN. Our model uses multi-spectral satellite data as model
    input and MPF information from multi-site and multi-period visible imagery as
    prior knowledge for modeling. It can effectively model melt ponds evolution of
    different regions and periods over the Arctic. Evaluation results show that the
    MPF retrieved from MODIS data using the proposed model has an RMSE of 3.91% and
    a correlation coefficient of 0.73. The seasonal distribution of MPF is also
    consistent with previous results.

    Close Yet Distinctive Domain Adaptation

    Lingkun Luo, Xiaofang Wang, Shiqiang Hu, Chao Wang, Yuxing Tang, Liming Chen
    Comments: 11 pages, 3 figures, ICCV 2017
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Domain adaptation is a form of transfer learning that aims to generalize a
    learning model across training and testing data with different
    distributions. Most previous research tackles this problem by seeking a
    shared feature representation between source and target domains while
    reducing the mismatch of their data distributions. In this paper, we propose
    a close yet discriminative domain adaptation method, namely CDDA, which
    generates a latent feature representation with two interesting properties.
    First, the discrepancy between the source and target domains, measured in
    terms of both marginal and conditional probability distributions via Maximum
    Mean Discrepancy (MMD), is minimized so as to draw the two domains close to
    each other. More importantly, we also design a repulsive force term, which
    maximizes the distance between each label-dependent sub-domain and all
    others so as to push different class-dependent sub-domains far away from
    each other and thereby increase the discriminative
    power of the adapted domain. Moreover, given the fact that the underlying data
    manifold could have complex geometric structure, we further propose the
    constraints of label smoothness and geometric structure consistency for label
    propagation. Extensive experiments are conducted on 36 cross-domain image
    classification tasks over four public datasets. The comprehensive results show
    that the proposed method consistently outperforms the state-of-the-art methods
    with significant margins.
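
    The marginal term of such methods is easy to state concretely. A numpy
    estimator of the (squared) Maximum Mean Discrepancy between source and
    target samples, here with an RBF kernel (the abstract does not fix the
    kernel; this is the common choice):

        import numpy as np

        def rbf_mmd2(xs, xt, gamma=1.0):
            """Biased estimator of MMD^2 between xs (ns, d) and xt (nt, d)."""
            def k(a, b):
                d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
                return np.exp(-gamma * d2)
            return k(xs, xs).mean() + k(xt, xt).mean() - 2 * k(xs, xt).mean()

    The conditional variant applies the same estimator per class (using
    pseudo-labels on the target), and the repulsive term described above
    instead maximizes this quantity between sub-domains carrying different
    labels.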


    Artificial Intelligence

    Incremental learning of high-level concepts by imitation

    Mina Alibeigi, Majid Nili Ahmadabadi, Babak Nadjar Araabi
    Comments: 6 pages, 5 figures, 2 tables, supplementary material, conference
    Journal-ref: IEEE Transactions on Robotics 33.1 (2017): 153-168
    Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)

    Nowadays, robots are becoming companions in everyday life. To be well
    accepted by humans, robots should efficiently understand the meanings of
    their partners’ motions and body language, and respond accordingly. Learning
    concepts by imitation brings them this ability in a user-friendly way.

    This paper presents a fast and robust model for Incremental Learning of
    Concepts by Imitation (ILoCI). In ILoCI, observed multimodal spatio-temporal
    demonstrations are incrementally abstracted and generalized based on both their
    perceptual and functional similarities during the imitation. In this method,
    perceptually similar demonstrations are abstracted by a dynamic model of mirror
    neuron system. An incremental method is proposed to learn their functional
    similarities through a limited number of interactions with the teacher.
    Learning all concepts together by the proposed memory rehearsal enables the
    robot to utilize the common structural relations among concepts, which not
    only expedites the learning process, especially at the initial stages, but
    also improves the generalization ability and the robustness against
    discrepancies between observed demonstrations.

    The performance of ILoCI is assessed using the standard LASA handwriting
    benchmark dataset. The results show the efficiency of ILoCI in concept
    acquisition, recognition and generation, in addition to its robustness
    against variability in demonstrations.

    Environment-Independent Task Specifications via GLTL

    Michael L. Littman, Ufuk Topcu, Jie Fu, Charles Isbell, Min Wen, James MacGlashan
    Subjects: Artificial Intelligence (cs.AI)

    We propose a new task-specification language for Markov decision processes
    that is designed to be an improvement over reward functions by being
    environment independent. The language is a variant of Linear Temporal Logic
    (LTL) that is extended to probabilistic specifications in a way that permits
    approximations to be learned in finite time. We provide several small
    environments that demonstrate the advantages of our geometric LTL (GLTL)
    language and illustrate how it can be used to specify standard
    reinforcement-learning tasks straightforwardly.

    Deep API Programmer: Learning to Program with APIs

    Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, Pushmeet Kohli
    Comments: 8 pages + 4 pages of supplementary material. Submitted to IJCAI 2017
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    We present DAPIP, a Programming-By-Example system that learns to program with
    APIs to perform data transformation tasks. We design a domain-specific language
    (DSL) that allows for arbitrary concatenations of API outputs and constant
    strings. The DSL consists of three families of APIs: regular expression-based
    APIs, lookup APIs, and transformation APIs. We then present a novel neural
    synthesis algorithm to search for programs in the DSL that are consistent with
    a given set of examples. The search algorithm uses recently introduced neural
    architectures to encode input-output examples and to model the program search
    in the DSL. We show that the synthesis algorithm outperforms baseline methods for
    synthesizing programs on both synthetic and real-world benchmarks.

    Graphical Models: An Extension to Random Graphs, Trees, and Other Objects

    Neil Hallonquist
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)

    In this work, we consider an extension of graphical models to random graphs,
    trees, and other objects. To do this, many fundamental concepts for
    multivariate random variables (e.g., marginal variables, Gibbs distribution,
    Markov properties) must be extended to other mathematical objects; it turns out
    that this extension is possible, as we will discuss, if we have a consistent,
    complete system of projections on a given object. Each projection defines a
    marginal random variable, allowing one to specify independence assumptions
    between them. Furthermore, these independencies can be specified in terms of a
    small subset of these marginal variables (which we call the atomic variables),
    allowing the compact representation of independencies by a directed graph.
    Projections also define factors, functions on the projected object space, and
    hence a projection family defines a set of possible factorizations for a
    distribution; these can be compactly represented by an undirected graph.

    The invariances used in graphical models are essential for learning
    distributions, not just on multivariate random variables, but also on other
    objects. When they are applied to random graphs and random trees, the result is
    a general class of models that is applicable to a broad range of problems,
    including those in which the graphs and trees have complicated edge structures.
    These models need not be conditioned on a fixed number of vertices, as is often
    the case in the literature for random graphs, and can be used for problems in
    which attributes are associated with vertices and edges. For graphs,
    applications include the modeling of molecules, neural networks, and relational
    real-world scenes; for trees, applications include the modeling of infectious
    diseases, cell fusion, the structure of language, and the structure of objects
    in visual scenes. Many classic models are particular instances of this
    framework.

    Optimizing Differentiable Relaxations of Coreference Evaluation Metrics

    Phong Le, Ivan Titov
    Comments: 10 pages
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Coreference evaluation metrics are hard to optimize directly as they are
    non-differentiable functions, not easily decomposable into elementary
    decisions. Consequently, most approaches optimize objectives only indirectly
    related to the end goal, resulting in suboptimal performance. Instead, we
    propose a differentiable relaxation that lends itself to gradient-based
    optimisation, thus bypassing the need for reinforcement learning or heuristic
    modification of cross-entropy. We show that by modifying the training objective
    of a competitive neural coreference system, we obtain a substantial gain in
    performance. This suggests that our approach can be regarded as a viable
    alternative to using reinforcement learning or more computationally expensive
    imitation learning.
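
    The relaxation trick itself is generic and small. Hard coreference
    decisions pick one antecedent per mention via argmax, which is
    non-differentiable; replacing the one-hot choice by softmax weights makes
    any metric written in terms of link indicators a smooth function of the
    scores. A toy numpy illustration (not the paper's exact relaxation of the
    coreference metrics):

        import numpy as np

        def softmax(z, T=1.0):
            e = np.exp((z - z.max()) / T)
            return e / e.sum()

        scores = np.array([2.0, 0.5, -1.0])  # antecedent scores for one mention
        gold = np.array([1.0, 0.0, 0.0])     # 1 if the link agrees with gold
        p = softmax(scores, T=1.0)           # relaxed (soft) antecedent choice
        expected_reward = (p * gold).sum()   # differentiable in `scores`
        # Annealing T -> 0 recovers the hard argmax decision.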

    Ultrafast photonic reinforcement learning based on laser chaos

    Makoto Naruse, Yuta Terashima, Atsushi Uchida, Song-Ju Kim
    Subjects: Optics (physics.optics); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)

    Reinforcement learning involves decision making in dynamic and uncertain
    environments, and constitutes one important element of artificial intelligence
    (AI). In this paper, we experimentally demonstrate that the ultrafast chaotic
    oscillatory dynamics of lasers efficiently solve the multi-armed bandit problem
    (MAB), which requires decision making concerning a class of difficult
    trade-offs called the exploration-exploitation dilemma. To solve the MAB, a
    certain degree of randomness is required for exploration purposes. However,
    pseudo-random numbers generated using conventional electronic circuitry
    encounter severe limitations in terms of their data rate and the quality of
    randomness due to their algorithmic foundations. We generate laser chaos
    signals using a semiconductor laser sampled at a maximum rate of 100 GSample/s,
    and combine it with a simple decision-making principle called tug-of-war with a
    variable threshold, to ensure ultrafast, adaptive and accurate decision making
    at a maximum adaptation speed of 1 GHz. We found that decision-making
    performance was maximized with an optimal sampling interval, and we highlight
    the exact coincidence between the negative autocorrelation inherent in laser
    chaos and decision-making performance. This study paves the way for a new realm
    of ultrafast photonics in the age of AI, where the ultrahigh bandwidth of
    photons can provide new value.
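
    The decision principle can be emulated with ordinary pseudo-randomness to
    see the mechanics (Gaussian noise below is a stand-in for the laser-chaos
    signal, and the threshold update is a simplified tug-of-war rule, not the
    authors' exact scheme):

        import random

        def tug_of_war_bandit(p=(0.6, 0.4), steps=10000, delta=1.0, seed=0):
            """Two-armed bandit: compare a random signal against an adaptive
            threshold; rewards pull the threshold toward the winning arm."""
            rng = random.Random(seed)
            tw, pulls = 0.0, [0, 0]
            for _ in range(steps):
                s = rng.gauss(0.0, 1.0)      # surrogate for the chaos sample
                arm = 0 if s >= tw else 1    # threshold decision
                reward = rng.random() < p[arm]
                pulls[arm] += 1
                # Tug of war: a reward makes this arm more likely next time,
                # a miss makes the other arm more likely.
                step_dir = -1.0 if arm == 0 else 1.0
                tw += step_dir * (delta if reward else -delta)
            return pulls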

    CBinfer: Change-Based Inference for Convolutional Neural Networks on Video Data

    Lukas Cavigelli, Philippe Degen, Luca Benini
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Extracting per-frame features using convolutional neural networks for
    real-time processing of video data is currently mainly performed on powerful
    GPU-accelerated workstations and compute clusters. However, there are many
    applications such as smart surveillance cameras that require or would benefit
    from on-site processing. To this end, we propose and evaluate a novel algorithm
    for change-based evaluation of CNNs for video data recorded with a static
    camera setting, exploiting the spatio-temporal sparsity of pixel changes. We
    achieve an average speed-up of 8.6x over a cuDNN baseline on a realistic
    benchmark with a negligible accuracy loss of less than 0.1% and no retraining
    of the network. The resulting energy efficiency is 10x higher than per-frame
    evaluation and reaches an equivalent of 328 GOp/s/W on the Tegra X1 platform.

    Gamma Processes, Stick-Breaking, and Variational Inference

    Anirban Roychowdhury, Brian Kulis
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    While most Bayesian nonparametric models in machine learning have focused on
    the Dirichlet process, the beta process, or their variants, the gamma process
    has recently emerged as a useful nonparametric prior in its own right. Current
    inference schemes for models involving the gamma process are restricted to
    MCMC-based methods, which limits their scalability. In this paper, we present a
    variational inference framework for models involving gamma process priors. Our
    approach is based on a novel stick-breaking constructive definition of the
    gamma process. We prove correctness of this stick-breaking process by using the
    characterization of the gamma process as a completely random measure (CRM), and
    we explicitly derive the rate measure of our construction using Poisson process
    machinery. We also derive error bounds on the truncation of the infinite
    process required for variational inference, similar to the truncation analyses
    for other nonparametric models based on the Dirichlet and beta processes. Our
    representation is then used to derive a variational inference algorithm for a
    particular Bayesian nonparametric latent structure formulation known as the
    infinite Gamma-Poisson model, where the latent variables are drawn from a gamma
    process prior with Poisson likelihoods. Finally, we present results for our
    algorithms on nonnegative matrix factorization tasks on document corpora, and
    show that we compare favorably to both sampling-based techniques and
    variational approaches based on beta-Bernoulli priors.
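
    For reference, the standard completely-random-measure facts the abstract
    builds on (the paper's new stick-breaking construction itself is not
    reproduced here): a gamma process $G$ with concentration $\alpha$ and base
    measure $H$ is the CRM $G = \sum_k \pi_k \delta_{\theta_k}$ whose atoms
    come from a Poisson process with Levy intensity

        $$ \nu(d\pi\, d\theta) = \alpha\, \pi^{-1} e^{-\pi}\, d\pi\, H(d\theta), $$

    so that $G(A) \sim \mathrm{Gamma}(\alpha H(A), 1)$ for measurable $A$, and
    normalizing $G$ yields a Dirichlet process. The paper derives a
    stick-breaking representation with this rate measure, together with
    truncation error bounds for variational inference.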


    Computation and Language

    Cardinal Virtues: Extracting Relation Cardinalities from Text

    Paramita Mirza, Simon Razniewski, Fariz Darari, Gerhard Weikum
    Comments: 5 pages, ACL 2017 (short paper)
    Subjects: Computation and Language (cs.CL)

    Information extraction (IE) from text has largely focused on relations
    between individual entities, such as who has won which award. However, some
    facts are never fully mentioned, and no IE method has perfect recall. Thus, it
    is beneficial to also tap contents about the cardinalities of these relations,
    for example, how many awards someone has won. We introduce this novel problem
    of extracting cardinalities and discuss the specific challenges that set it
    apart from standard IE. We present a distant supervision method using
    conditional random fields. A preliminary evaluation results in precision
    between 3% and 55%, depending on the difficulty of relations.

    Bringing Structure into Summaries: Crowdsourcing a Benchmark Corpus of Concept Maps

    Tobias Falke, Iryna Gurevych
    Subjects: Computation and Language (cs.CL)

    Concept maps can be used to concisely represent important information and
    bring structure into large document collections. Therefore, we study a variant
    of multi-document summarization that produces summaries in the form of concept
    maps. However, suitable evaluation datasets for this task are currently
    missing. To close this gap, we present a newly created corpus of concept maps
    that summarize heterogeneous collections of web documents on educational
    topics. It was created using a novel crowdsourcing approach that allows us to
    efficiently determine important elements in large document collections. We
    release the corpus along with a baseline system and proposed evaluation
    protocol to enable further research on this variant of summarization.

    Optimizing Differentiable Relaxations of Coreference Evaluation Metrics

    Phong Le, Ivan Titov
    Comments: 10 pages
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Coreference evaluation metrics are hard to optimize directly as they are
    non-differentiable functions, not easily decomposable into elementary
    decisions. Consequently, most approaches optimize objectives only indirectly
    related to the end goal, resulting in suboptimal performance. Instead, we
    propose a differentiable relaxation that lends itself to gradient-based
    optimisation, thus bypassing the need for reinforcement learning or heuristic
    modification of cross-entropy. We show that by modifying the training objective
    of a competitive neural coreference system, we obtain a substantial gain in
    performance. This suggests that our approach can be regarded as a viable
    alternative to using reinforcement learning or more computationally expensive
    imitation learning.

    How Robust Are Character-Based Word Embeddings in Tagging and MT Against Wrod Scramlbing or Randdm Nouse?

    Georg Heigold, Günter Neumann, Josef van Genabith
    Comments: 9 pages
    Subjects: Computation and Language (cs.CL)

    This paper investigates the robustness of NLP against perturbed word forms.
    While neural approaches can achieve (almost) human-like accuracy for certain
    tasks and conditions, they often are sensitive to small changes in the input
    such as non-canonical input (e.g., typos). Yet both stability and robustness
    are desired properties in applications involving user-generated content, all
    the more so as humans easily cope with such noisy or adversarial conditions.
    In this
    paper, we study the impact of noisy input. We consider different noise
    distributions (one type of noise, combination of noise types) and mismatched
    noise distributions for training and testing. Moreover, we empirically evaluate
    the robustness of different models (convolutional neural networks, recurrent
    neural networks, non-neural models), different basic units (characters, byte
    pair encoding units), and different NLP tasks (morphological tagging, machine
    translation).
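
    Noise of this kind is easy to generate. A small Python corruptor in the
    spirit of the paper's title (generic swap/scramble perturbations; the
    paper's precise noise distributions are not reproduced here):

        import random

        rng = random.Random(42)

        def swap_noise(word):
            """Swap one adjacent interior character pair (typo-style noise)."""
            if len(word) <= 3:
                return word
            chars = list(word)
            i = rng.randrange(1, len(chars) - 2)
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            return "".join(chars)

        def scramble_noise(word):
            """Shuffle the word's interior, keeping the first and last letter."""
            if len(word) <= 3:
                return word
            mid = list(word[1:-1])
            rng.shuffle(mid)
            return word[0] + "".join(mid) + word[-1]

        def corrupt(sentence, noise_fn, rate=0.5):
            return " ".join(w if rng.random() > rate else noise_fn(w)
                            for w in sentence.split())

        print(corrupt("character based models are surprisingly robust", scramble_noise))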

    Get To The Point: Summarization with Pointer-Generator Networks

    Abigail See, Peter J. Liu, Christopher D. Manning
    Comments: Accepted to ACL 2017
    Subjects: Computation and Language (cs.CL)

    Neural sequence-to-sequence models have provided a viable new approach for
    abstractive text summarization (meaning they are not restricted to simply
    selecting and rearranging passages from the original text). However, these
    models have two shortcomings: they are liable to reproduce factual details
    inaccurately, and they tend to repeat themselves. In this work we propose a
    novel architecture that augments the standard sequence-to-sequence attentional
    model in two orthogonal ways. First, we use a hybrid pointer-generator network
    that can copy words from the source text via pointing, which aids accurate
    reproduction of information, while retaining the ability to produce novel words
    through the generator. Second, we use coverage to keep track of what has been
    summarized, which discourages repetition. We apply our model to the CNN / Daily
    Mail summarization task, outperforming the current abstractive state-of-the-art
    by at least 2 ROUGE points.
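
    In the paper's notation, the two mechanisms are compact. With $a^t$ the
    attention distribution over source positions at decoder step $t$ and
    $p_{\mathrm{gen}} \in [0,1]$ a learned switch, the output distribution and
    the coverage terms are

        $$ P(w) = p_{\mathrm{gen}} P_{\mathrm{vocab}}(w)
                  + (1 - p_{\mathrm{gen}}) \sum_{i : w_i = w} a_i^t, \qquad
           c^t = \sum_{t' < t} a^{t'}, \qquad
           \mathrm{covloss}_t = \sum_i \min(a_i^t, c_i^t), $$

    so out-of-vocabulary source words can still be produced through the copy
    term, and attending again to already-covered positions is penalized.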

    Exploiting Cross-Sentence Context for Neural Machine Translation

    Longyue Wang, Zhaopeng Tu, Andy Way, Qun Liu
    Subjects: Computation and Language (cs.CL)

    In translation, considering the document as a whole allows certain
    ambiguities and inconsistencies to be resolved. In this paper, we propose a
    cross-sentence context-aware approach and investigate the influence of
    historical contextual information on the performance of neural machine
    translation (NMT). First, this history is summarized in a hierarchical way. We
    then integrate the historical representation into NMT in two strategies: 1) a
    warm-start of encoder and decoder states, and 2) an auxiliary context source
    for updating decoder states. Experimental results on a large Chinese-English
    translation task show that our approach significantly improves upon a strong
    attention-based NMT system by up to +2.1 BLEU points.

    An entity-driven recursive neural network model for chinese discourse coherence modeling

    Fan Xu, Shujing Du, Maoxi Li, Mingwen Wang
    Comments: 9 pages
    Journal-ref: International Journal of Artificial Intelligence and Applications
    (IJAIA), Vol.8, No.2, March 2017
    Subjects: Computation and Language (cs.CL)

    Chinese discourse coherence modeling remains a challenging task in the
    Natural Language Processing field. Existing approaches mostly focus on
    feature engineering, adopting sophisticated features to capture the logical,
    syntactic or semantic relationships across sentences within a text. In this
    paper, we present an entity-driven recursive deep model for Chinese
    discourse coherence evaluation based on a current English discourse
    coherence neural network model. Specifically, to overcome the shortcoming of
    the current model in identifying entity (noun) overlap across sentences, our
    combined model successfully incorporates entity information into the
    recursive neural network framework. Evaluation results on both sentence
    ordering and machine translation coherence rating tasks show the
    effectiveness of the proposed model, which significantly outperforms the
    existing strong baseline.

    Identity and Granularity of Events in Text

    Piek Vossen, Agata Cybulska
    Comments: Invited keynote speech by Piek Vossen at Cicling 2016
    Subjects: Computation and Language (cs.CL)

    In this paper we describe a method to detect event descriptions in
    different news articles and to model the semantics of events and their
    components using RDF representations. We compare these descriptions to solve
    a cross-document event coreference task. Our component approach to event
    semantics defines identity and granularity of events at different levels. It
    performs close to state-of-the-art approaches on the cross-document event
    coreference task, while outperforming other works when assuming similar
    quality of event detection. We demonstrate how granularity and identity are
    interconnected and we discuss how semantic anomaly could be used to define
    differences between coreference, subevent and topical relations.


    Distributed, Parallel, and Cluster Computing

    Encoding Cardinality Constraints using Generalized Selection Networks

    Michał Karpiński, Marek Piotrów
    Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

    Boolean cardinality constraints state that at most (at least, or exactly)
    $k$ out of $n$ propositional literals can be true. We propose a new class of
    selection networks that can be used for an efficient encoding of them.
    Several comparator networks have been proposed recently for encoding
    cardinality constraints, and experiments have proved their efficiency. Those
    were based mainly on the odd-even or pairwise comparator networks. We use
    similar ideas, but we extend the model of comparator networks so that the
    basic components are not only comparators (2-sorters) but more general
    $m$-sorters, for $m \geq 2$. The inputs are organized into $m$ columns, in
    which elements are recursively selected and, after that, columns are merged
    using an idea of multi-way merging. We present two algorithms parametrized
    by $m \geq 2$. We call those networks the $m$-Wise Selection Network and the
    $m$-Odd-Even Selection Network. We give a detailed construction of the
    mergers for $m=4$. The construction can be directly applied to any values of
    $k$ and $n$. The proposed encoding of sorters is standard, therefore
    arc-consistency is preserved. We prove correctness of the constructions and
    present a theoretical and experimental evaluation, which shows that the new
    encodings are competitive with the other state-of-the-art encodings.
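
    To make the encoding idea concrete, here is a toy CNF generator using only
    2-sorters and a quadratic bubble network (the paper's $m$-column selection
    networks are far more clause-efficient; this sketch only shows how sorted
    outputs turn "at most $k$" into a single unit clause):

        def comparator(a, b, clauses, state):
            """Half 2-sorter: hi = a OR b, lo = a AND b, encoded with the
            one-directional clauses sufficient for at-most-k propagation."""
            state[0] += 2
            hi, lo = state[0] - 1, state[0]
            clauses += [(-a, hi), (-b, hi), (-a, -b, lo)]
            return hi, lo

        def bubble_sort_net(xs, clauses, state):
            xs = list(xs)
            for i in range(len(xs)):
                for j in range(len(xs) - 1 - i):
                    xs[j], xs[j + 1] = comparator(xs[j], xs[j + 1], clauses, state)
            return xs  # outputs sorted: xs[0] = OR of all inputs, and so on

        def at_most_k(literals, k, clauses, state):
            ys = bubble_sort_net(literals, clauses, state)
            clauses.append((-ys[k],))  # (k+1)-th largest output must be false

        state, clauses = [5], []        # variables 1..5 are the input literals
        at_most_k([1, 2, 3, 4, 5], 2, clauses, state)
        print(len(clauses), "clauses")  # 31: 3 per comparator + 1 unit clause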

    HPTT: A High-Performance Tensor Transposition C++ Library

    Paul Springer, Tong Su, Paolo Bientinesi
    Subjects: Mathematical Software (cs.MS); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)

    Recently we presented TTC, a domain-specific compiler for tensor
    transpositions. Despite the fact that the performance of the generated code is
    nearly optimal, due to its offline nature, TTC cannot be utilized in all the
    application codes in which the tensor sizes and the necessary tensor
    permutations are determined at runtime. To overcome this limitation, we
    introduce the open-source C++ library High-Performance Tensor Transposition
    (HPTT). Similar to TTC, HPTT incorporates optimizations such as blocking,
    multi-threading, and explicit vectorization; furthermore it decomposes any
    transposition into multiple loops around a so called micro-kernel. This modular
    design—inspired by BLIS—makes HPTT easy to port to different architectures,
    by only replacing the hand-vectorized micro-kernel (e.g., a 4×4 transpose).
    HPTT also offers an optional autotuning framework—guided by a performance
    model—that explores a vast search space of implementations at runtime (similar
    to FFTW). Across a wide range of different tensor transpositions and
    architectures (e.g., Intel Ivy Bridge, Intel Knights Landing, ARMv7, IBM
    Power7), HPTT attains a bandwidth comparable to that of SAXPY, and yields
    remarkable speedups over Eigen’s tensor transposition implementation. Most
    importantly, the integration of HPTT into the Cyclops Tensor Framework (CTF)
    improves the overall performance of tensor contractions by up to 3.1x.


    Learning

    Lean From Thy Neighbor: Stochastic & Adversarial Bandits in a Network

    L. Elisa Celis, Farnood Salehi
    Comments: This article was first circulated in January 2015 and presented at ISMP 2015 under the title “Bandit in a Network” (this https URL&mmnno=264&ppnno=85856)
    Subjects: Learning (cs.LG); Social and Information Networks (cs.SI)

    An individual’s decisions are often guided by those of his or her peers,
    i.e., neighbors in a social network. Presumably, being privy to the experiences
    of others aids in learning and decision making, but how much advantage does an
    individual gain by observing her neighbors? Such problems make appearances in
    sociology and economics and, in this paper, we present a novel model to capture
    such decision-making processes and appeal to the classical multi-armed bandit
    framework to analyze it. Each individual, in addition to her own actions, can
    observe the actions and rewards obtained by her neighbors, and can use all of
    this information in order to minimize her own regret. We provide algorithms for
    this setting, both for stochastic and adversarial bandits, and show that their
    regret smoothly interpolates between the regret in the classical bandit setting
    and that of the full-information setting as a function of the neighbors’
    exploration. In the stochastic setting the additional information must simply
    be incorporated into the usual estimation of the rewards, while in the
    adversarial setting this is attained by constructing a new unbiased estimator
    for the rewards and appropriately bounding the amount of additional information
    provided by the neighbors. These algorithms are optimal up to log factors;
    despite the fact that the agents act independently and selfishly, this implies
    that it is an approximate Nash equilibrium for all agents to use our algorithms.
    Further, we show via empirical simulations that our algorithms, often
    significantly, outperform existing algorithms that one could apply to this
    setting.
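
    In the stochastic setting, "incorporating the additional information into
    the usual estimation" can be sketched directly on top of UCB1. This is a
    simplification for illustration, not the paper's algorithm (and the
    adversarial case needs the unbiased estimator described above):

        import math, random

        def network_ucb(adj, arm_p, T=5000, seed=1):
            """Each agent runs UCB1, but its counts and reward sums also include
            every pull it observes from its neighbors. adj: adjacency lists."""
            rng = random.Random(seed)
            n, m = len(adj), len(arm_p)
            cnt = [[0] * m for _ in range(n)]
            tot = [[0.0] * m for _ in range(n)]
            for t in range(1, T + 1):
                plays = []
                for i in range(n):
                    ucb = [tot[i][a] / cnt[i][a]
                           + math.sqrt(2 * math.log(t) / cnt[i][a])
                           if cnt[i][a] else float("inf") for a in range(m)]
                    plays.append(max(range(m), key=ucb.__getitem__))
                for i, a in enumerate(plays):
                    r = 1.0 if rng.random() < arm_p[a] else 0.0
                    for j in [i] + adj[i]:  # the agent and its neighbors observe
                        cnt[j][a] += 1
                        tot[j][a] += r
            return cnt

        # Path graph on three agents: the middle agent sees both others.
        counts = network_ucb(adj=[[1], [0, 2], [1]], arm_p=[0.7, 0.5, 0.3])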

    On Generalized Bellman Equations and Temporal-Difference Learning

    Huizhen Yu, A. Rupam Mahmood, Richard S. Sutton
    Comments: 35 pages; an abridged version to appear at The 30th Canadian Conference on Artificial Intelligence, May, 2017
    Subjects: Learning (cs.LG); Optimization and Control (math.OC)

    We consider off-policy temporal-difference (TD) learning in discounted Markov
    decision processes, where the goal is to evaluate a policy in a model-free way
    by using observations of a state process generated without executing the
    policy. To curb the high variance issue in off-policy TD learning, we propose a
    new scheme of setting the $\lambda$-parameters of TD, based on generalized
    Bellman equations. Our scheme is to set $\lambda$ according to the
    eligibility trace iterates calculated in TD, thereby easily keeping these
    traces in a desired bounded range. Compared to prior works, this scheme is
    more direct and flexible, and allows much larger $\lambda$ values for
    off-policy TD learning with bounded traces. Using Markov chain theory, we
    prove the ergodicity of the joint state-trace process under nonrestrictive
    conditions, and we show that associated with our scheme is a generalized
    Bellman equation (for the policy to be evaluated) that depends on both the
    evolution of $\lambda$ and the unique
    invariant probability measure of the state-trace process. These results not
    only lead immediately to a characterization of the convergence behavior of
    least-squares based implementation of our scheme, but also prepare the ground
    for further analysis of gradient-based implementations.
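
    Schematically, in our notation rather than the paper's exact formulation:
    with importance-sampling ratios $\rho_t$, discounts $\gamma_t$, features
    $\phi$ and TD errors $\delta_t$, off-policy TD($\lambda$) with
    state-dependent $\lambda_t$ iterates

        $$ e_t = \lambda_t \gamma_t \rho_{t-1} e_{t-1} + \phi(S_t), \qquad
           \theta_{t+1} = \theta_t + \alpha_t \rho_t \delta_t e_t, $$

    and the scheme above amounts to choosing $\lambda_t$ as a function of the
    incoming trace, for instance $\lambda_t = \min\{1, C / \|\gamma_t \rho_{t-1}
    e_{t-1}\|\}$ for a constant $C$ (a hypothetical instance), so that the
    traces $e_t$ stay in a bounded range by construction.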

    Close Yet Distinctive Domain Adaptation

    Lingkun Luo, Xiaofang Wang, Shiqiang Hu, Chao Wang, Yuxing Tang, Liming Chen
    Comments: 11 pages, 3 figures, ICCV 2017
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Domain adaptation is a form of transfer learning that aims to generalize a
    learning model across training and testing data with different
    distributions. Most previous research tackles this problem by seeking a
    shared feature representation between source and target domains while
    reducing the mismatch of their data distributions. In this paper, we propose
    a close yet discriminative domain adaptation method, namely CDDA, which
    generates a latent feature representation with two interesting properties.
    First, the discrepancy between the source and target domains, measured in
    terms of both marginal and conditional probability distributions via Maximum
    Mean Discrepancy (MMD), is minimized so as to draw the two domains close to
    each other. More importantly, we also design a repulsive force term, which
    maximizes the distance between each label-dependent sub-domain and all
    others so as to push different class-dependent sub-domains far away from
    each other and thereby increase the discriminative
    power of the adapted domain. Moreover, given the fact that the underlying data
    manifold could have complex geometric structure, we further propose the
    constraints of label smoothness and geometric structure consistency for label
    propagation. Extensive experiments are conducted on 36 cross-domain image
    classification tasks over four public datasets. The comprehensive results show
    that the proposed method consistently outperforms the state-of-the-art methods
    with significant margins.

    Liquid Splash Modeling with Neural Networks

    Kiwon Um, Xiangyu Hu, Nils Thuerey
    Subjects: Graphics (cs.GR); Learning (cs.LG)

    This paper proposes a new data-driven approach for modeling detailed splashes
    for liquid simulations with neural networks. Our model learns to generate
    small-scale splash detail for fluid-implicit-particle methods using training
    data acquired from physically accurate, high-resolution simulations. We use
    neural networks to model the regression of splash formation using a classifier
    together with a velocity modification term. More specifically, we employ a
    heteroscedastic model for the velocity updates. Our simulation results
    demonstrate that our model significantly improves visual fidelity with a large
    amount of realistic droplet formation and yields splash detail much more
    efficiently than finer discretizations. We show this for two different spatial
    scales and simulation setups.

    Optimizing Differentiable Relaxations of Coreference Evaluation Metrics

    Phong Le, Ivan Titov
    Comments: 10 pages
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Coreference evaluation metrics are hard to optimize directly as they are
    non-differentiable functions, not easily decomposable into elementary
    decisions. Consequently, most approaches optimize objectives only indirectly
    related to the end goal, resulting in suboptimal performance. Instead, we
    propose a differentiable relaxation that lends itself to gradient-based
    optimisation, thus bypassing the need for reinforcement learning or heuristic
    modification of cross-entropy. We show that by modifying the training objective
    of a competitive neural coreference system, we obtain a substantial gain in
    performance. This suggests that our approach can be regarded as a viable
    alternative to using reinforcement learning or more computationally expensive
    imitation learning.

    Learning a collaborative multiscale dictionary based on robust empirical mode decomposition

    Rui Chen, Huizhu Jia, Xiaodong Xie, Wen Gao
    Comments: to be published in Neurocomputing
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Dictionary learning is a challenging topic in many image processing areas. The
    basic goal is to learn a sparse representation from an overcomplete basis set.
    By combining the advantages of generic multiscale representations with
    learning-based adaptivity, multiscale dictionary representation approaches are
    powerful in capturing structural characteristics of natural images. However,
    existing multiscale learning approaches still suffer from three main
    weaknesses: inadaptability to diverse scales of image data, sensitivity to
    noise and outliers, and difficulty in determining the optimal dictionary
    structure. In this paper, we present a novel multiscale dictionary learning
    paradigm for sparse image representations based on an improved empirical mode
    decomposition. This powerful data-driven analysis tool for multi-dimensional
    signals can fully adaptively decompose the image into multiscale oscillating
    components according to the intrinsic modes of the data itself. This treatment
    yields a robust and effective sparse representation, and meanwhile generates a
    raw base dictionary at multiple geometric scales and spatial frequency bands.
    This dictionary is refined by selecting optimal oscillating atoms based on
    frequency clustering. To further enhance sparsity and generalization, a
    tolerance dictionary is learned using a coherence-regularized model. A fast
    proximal scheme is developed to optimize this model. The multiscale dictionary
    is taken as the product of the oscillating dictionary and the tolerance
    dictionary. Experimental results demonstrate that the proposed learning
    approach achieves superior performance in sparse image representation compared
    with several competing methods. We also show promising results in an image
    denoising application.
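
    The coherence-regularized model itself is not spelled out in the abstract,
    but its proximal building block is standard. A minimal sketch of one such
    scheme (plain ISTA for the sparse-coding subproblem, without the paper's
    additional coherence terms):

        import numpy as np

        def ista(D, y, lam=0.1, n_iter=100):
            # minimize 0.5*||y - D x||^2 + lam*||x||_1 by proximal gradient descent
            L = np.linalg.norm(D, 2) ** 2   # Lipschitz constant of the gradient
            x = np.zeros(D.shape[1])
            for _ in range(n_iter):
                grad = D.T @ (D @ x - y)
                z = x - grad / L
                x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
            return x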

    Cross-media Similarity Metric Learning with Unified Deep Networks

    Jinwei Qi, Xin Huang, Yuxin Peng
    Comments: 19 pages, submitted to Multimedia Tools and Applications
    Subjects: Multimedia (cs.MM); Learning (cs.LG); Machine Learning (stat.ML)

    As a prominent research topic in the multimedia area, cross-media retrieval
    aims to capture the complex correlations among multiple media types. Learning
    better shared representations and distance metrics for multimedia data is
    important for boosting cross-media retrieval. Motivated by the strong ability
    of deep neural networks to learn feature representations and comparison
    functions, we propose the Unified Network for Cross-media Similarity Metric
    (UNCSM) to associate cross-media shared representation learning with distance
    metric learning in a unified framework. First, we design a two-pathway deep
    network pretrained with a contrastive loss, and employ a double triplet
    similarity loss for fine-tuning to learn the shared representation for each
    media type by modeling the relative semantic similarity. Second, a metric
    network is designed to effectively calculate the cross-media similarity of the
    shared representations by modeling the pairwise similar and dissimilar
    constraints. Compared to existing methods, which mostly ignore the dissimilar
    constraints and separately use a simple distance metric such as Euclidean
    distance, our UNCSM approach unifies representation learning and the distance
    metric, preserving the relative similarity as well as embracing more complex
    similarity functions to further improve cross-media retrieval accuracy. The
    experimental results show that our UNCSM approach outperforms 8
    state-of-the-art methods on 4 widely-used cross-media datasets.
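
    For reference, minimal PyTorch versions of the two loss ingredients named
    above (generic formulations; the paper's double triplet loss and network
    pathways are assumed, not reproduced):

        import torch
        import torch.nn.functional as F

        def contrastive_loss(a, b, same, margin=1.0):
            # same is 1.0 for matching pairs, 0.0 for non-matching pairs
            d = F.pairwise_distance(a, b)
            return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

        def triplet_loss(anchor, positive, negative, margin=0.5):
            d_pos = F.pairwise_distance(anchor, positive)
            d_neg = F.pairwise_distance(anchor, negative)
            return F.relu(d_pos - d_neg + margin).mean()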

    Deep API Programmer: Learning to Program with APIs

    Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, Pushmeet Kohli
    Comments: 8 pages + 4 pages of supplementary material. Submitted to IJCAI 2017
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    We present DAPIP, a Programming-By-Example system that learns to program with
    APIs to perform data transformation tasks. We design a domain-specific language
    (DSL) that allows for arbitrary concatenations of API outputs and constant
    strings. The DSL consists of three families of APIs: regular expression-based
    APIs, lookup APIs, and transformation APIs. We then present a novel neural
    synthesis algorithm to search for programs in the DSL that are consistent with
    a given set of examples. The search algorithm uses recently introduced neural
    architectures to encode input-output examples and to model the program search
    in the DSL. We show that the synthesis algorithm outperforms baseline methods for
    synthesizing programs on both synthetic and real-world benchmarks.
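
    A toy, brute-force Programming-By-Example loop over a made-up string DSL
    illustrates the consistency check that any such synthesizer needs; DAPIP's
    contribution is replacing the blind enumeration below with neural models
    that encode the examples and guide the search:

        from itertools import product

        APIS = {   # hypothetical primitive "APIs"
            "identity": lambda s: s,
            "upper":    lambda s: s.upper(),
            "first":    lambda s: s.split()[0],
            "last":     lambda s: s.split()[-1],
        }

        def synthesize(examples, max_len=2):
            # programs are concatenations of API outputs applied to the input
            for length in range(1, max_len + 1):
                for prog in product(APIS, repeat=length):
                    if all("".join(APIS[p](i) for p in prog) == o
                           for i, o in examples):
                        return prog
            return None

        # consistent with both examples -> ('last',)
        print(synthesize([("grace hopper", "hopper"), ("alan turing", "turing")]))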

    Stochastic Gradient Descent as Approximate Bayesian Inference

    Stephan Mandt, Matthew D. Hoffman, David M. Blei
    Comments: 26 pages + appendix
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Stochastic Gradient Descent with a constant learning rate (constant SGD)
    simulates a Markov chain with a stationary distribution. With this perspective,
    we derive several new results. (1) We show that constant SGD can be used as an
    approximate Bayesian posterior inference algorithm. Specifically, we show how
    to adjust the tuning parameters of constant SGD to best match the stationary
    distribution to a posterior, minimizing the Kullback-Leibler divergence between
    these two distributions. (2) We demonstrate that constant SGD gives rise to a
    new variational EM algorithm that optimizes hyperparameters in complex
    probabilistic models. (3) We also propose SGD with momentum for sampling and
    show how to adjust the damping coefficient accordingly. (4) We analyze MCMC
    algorithms. For Langevin Dynamics and Stochastic Gradient Fisher Scoring, we
    quantify the approximation errors due to finite learning rates. Finally (5), we
    use the stochastic process perspective to give a short proof of why Polyak
    averaging is optimal. Based on this idea, we propose a scalable approximate
    MCMC algorithm, the Averaged Stochastic Gradient Sampler.
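
    A quick numerical illustration of claim (1) on a 1-D quadratic toy problem
    (all constants are illustrative): constant-step SGD with noisy gradients
    settles into a stationary distribution whose variance is governed by the
    learning rate, matching the Ornstein-Uhlenbeck prediction:

        import numpy as np

        rng = np.random.default_rng(0)
        A, noise_std = 2.0, 1.5          # curvature and gradient-noise level
        lr = 0.02                        # constant learning rate
        x, samples = 0.0, []
        for t in range(200000):
            grad = A * x + noise_std * rng.standard_normal()  # stochastic gradient
            x -= lr * grad
            if t > 1000:                 # discard burn-in
                samples.append(x)
        # theory (OU limit): stationary variance ~ lr * noise_std^2 / (2 A)
        print(np.var(samples), lr * noise_std**2 / (2 * A))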

    Reward-based stochastic self-configuration of neural circuits

    David Kappel, Robert Legenstein, Stefan Habenschuss, Michael Hsieh, Wolfgang Maass
    Subjects: Neurons and Cognition (q-bio.NC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Experimental data suggest that neural circuits configure their synaptic
    connectivity for a given computational task. They also point to dopamine-gated
    stochastic spine dynamics as an important underlying mechanism, and they show
    that the stochastic component of synaptic plasticity is surprisingly strong. We
    propose a model that elucidates how task-dependent self-configuration of neural
    circuits can emerge through these mechanisms. The Fokker-Planck equation allows
    us to relate local stochastic processes at synapses to the stationary
    distribution of network configurations, and thereby to computational properties
    of the network. This framework suggests a new model for reward-gated network
    plasticity, where one replaces the common policy gradient paradigm by
    continuously ongoing stochastic policy search (sampling) from a posterior
    distribution of network configurations. This posterior integrates priors that
    encode for example previously attained knowledge and structural constraints.
    This model can explain the experimentally found capability of neural circuits
    to configure themselves for a given task, and to compensate automatically for
    changes in the network or task. We also show that experimental data on
    dopamine-modulated spine dynamics can be modeled within this theoretical
    framework, and that a strong stochastic component of synaptic plasticity is
    essential for its performance.
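
    A minimal caricature of the sampling view, with toy stand-ins for the prior
    and reward (not the paper's spiking-network model): each parameter performs
    Langevin-type dynamics whose stationary distribution is the posterior
    p(w) proportional to prior(w) * exp(reward(w)/T):

        import numpy as np

        rng = np.random.default_rng(1)
        T = 1.0                                      # "temperature"

        def grad_log_post(w):
            # toy posterior: Gaussian prior at 0, reward peaked at w = 1
            # log p(w) = -0.5*||w||^2 - ||w - 1||^2 / T  (up to a constant)
            return -w + (2.0 / T) * (1.0 - w)

        w, eta = np.zeros(3), 1e-3
        for _ in range(50000):                       # continuously ongoing search
            w += eta * grad_log_post(w) + np.sqrt(2 * eta) * rng.standard_normal(3)
        print(w)   # fluctuates around the posterior mode rather than converging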

    Accelerated Stochastic Quasi-Newton Optimization on Riemann Manifolds

    Anirban Roychowdhury, Srinivasan Parthasarathy
    Comments: Note: the proof of Theorem 3.5 will be changed in a future revision
    Subjects: Optimization and Control (math.OC); Learning (cs.LG); Differential Geometry (math.DG); Machine Learning (stat.ML)

    We propose L-BFGS and trust-region algorithms on Riemann manifolds that use
    stochastic variance reduction techniques to speed up convergence. For the
    stochastic L-BFGS algorithm we analyze and prove linear convergence rates for
    geodesically convex problems on the manifold, without resorting to linesearch
    methods designed to satisfy Wolfe conditions on the step size. To the best of
    our knowledge our trust-region method with stochastic variance reduction
    techniques is the first of its kind in the literature. We conduct experiments
    on Karcher mean computation for positive definite matrices and computation of
    leading eigenvectors for both synthetic and real data matrices, and demonstrate
    notably better performance than recently-proposed first-order stochastic
    optimization methods on Riemann manifolds, as well as standard trust-region
    manifold optimization techniques.
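
    For context, a plain (non-stochastic, first-order) Riemannian gradient
    iteration for the Karcher mean of SPD matrices, one of the paper's test
    problems; the paper accelerates this kind of iteration with
    variance-reduced quasi-Newton and trust-region steps, not reproduced here:

        import numpy as np
        from scipy.linalg import expm, logm, sqrtm, inv

        def karcher_mean(mats, n_iter=50, step=1.0):
            X = np.mean(mats, axis=0)                 # initial guess
            for _ in range(n_iter):
                Xh = np.real(sqrtm(X)); Xhi = inv(Xh)
                # Riemannian gradient direction in the tangent space at X
                G = sum(np.real(logm(Xhi @ A @ Xhi)) for A in mats) / len(mats)
                X = Xh @ expm(step * G) @ Xh          # exponential-map update
            return X

        rng = np.random.default_rng(0)
        mats = []
        for _ in range(5):
            B = rng.standard_normal((3, 3))
            mats.append(B @ B.T + np.eye(3))          # random SPD matrices
        print(karcher_mean(mats))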

    Text-Independent Speaker Recognition for Low SNR Environments with Encryption

    Aman Chadha, Divya Jyoti, M. Mani Roja
    Comments: Biometrics, Pattern Recognition, Security, Speaker Individuality, Text-independence, Pitch Extraction, Voice Recognition, Autocorrelation; Published by Foundation of Computer Science, New York, USA
    Journal-ref: International Journal of Computer Applications 31(10):43-50, 2011
    Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Learning (cs.LG)

    Recognition systems are commonly designed to authenticate users at the access
    control levels of a system. A number of voice recognition methods have been
    developed using a pitch estimation process, which is very vulnerable in low
    Signal-to-Noise Ratio (SNR) environments; thus, these programs fail to provide
    the desired level of accuracy and robustness. Also, most text-independent
    speaker recognition programs are incapable of coping with unauthorized attempts
    to gain access by tampering with the samples or reference database. The
    proposed text-independent voice recognition system makes use of multilevel
    cryptography to preserve data integrity while in transit or storage. Encryption
    and decryption follow a transform based approach layered with pseudorandom
    noise addition whereas for pitch detection, a modified version of the
    autocorrelation pitch extraction algorithm is used. The experimental results
    show that the proposed algorithm can decrypt the signal under test with
    exponentially reducing Mean Square Error over an increasing range of SNR.
    Further, it outperforms the conventional algorithms in actual identification
    tasks even in noisy environments. The recognition rate thus obtained using the
    proposed method is compared with other conventional methods used for speaker
    identification.
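
    The unmodified autocorrelation baseline that the paper's modified pitch
    extractor builds on fits in a few lines: pick the lag of the strongest
    autocorrelation peak within a plausible pitch range:

        import numpy as np

        def pitch_autocorr(frame, fs, f_lo=60.0, f_hi=400.0):
            frame = frame - frame.mean()
            ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
            lo, hi = int(fs / f_hi), int(fs / f_lo)   # lag range to search
            lag = lo + np.argmax(ac[lo:hi])
            return fs / lag

        fs = 8000
        t = np.arange(0, 0.04, 1 / fs)
        frame = np.sign(np.sin(2 * np.pi * 120 * t))  # crude 120 Hz "voiced" frame
        print(pitch_autocorr(frame, fs))              # ~120 Hz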


    Information Theory

    Optimal Power Splitting for Simultaneous Information Detection and Energy Harvesting

    Ali Kariminezhad, Soheil Gherekhloo, Aydin Sezgin
    Subjects: Information Theory (cs.IT)

    This letter deals with the joint information and energy processing at a
    receiver of a point-to-point communication channel. In particular, the
    trade-off between the achievable information rate and harvested energy for a
    multiple-antenna power splitting (PS) receiver is investigated. Here, the
    rate-energy region characterization is of particular interest, which is
    intrinsically a non-convex problem. In this letter, an efficient algorithm is
    proposed for obtaining an approximate solution to the problem in polynomial
    time. This algorithm is mainly based on a Taylor approximation in conjunction
    with semidefinite relaxation (SDR), which is solved by interior-point methods.
    Moreover, we utilize the Gaussian randomization procedure to obtain a feasible
    solution for the original problem. It is shown that by proper receiver design
    the rate-energy region can be significantly enlarged compared to the state of
    the art, while at the same time the receiver hardware cost is reduced by using
    fewer energy harvesting circuits.
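
    For intuition, the single-antenna version of the power-splitting trade-off
    is easy to sweep numerically (illustrative constants; the paper's
    multiple-antenna problem is non-convex and needs the SDR machinery
    described above):

        import numpy as np

        # a fraction rho of received power feeds the decoder, the rest the harvester
        P_rx, sigma2, eff = 1e-3, 1e-6, 0.5   # rx power (W), noise, EH efficiency
        for rho in np.linspace(0.1, 0.9, 5):
            rate = np.log2(1 + rho * P_rx / sigma2)   # bit/s/Hz to the decoder
            energy = eff * (1 - rho) * P_rx           # watts to the harvester
            print(f"rho={rho:.1f}  rate={rate:5.2f} bit/s/Hz  "
                  f"harvested={energy*1e3:.3f} mW")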

    Limited Feedback in Single and Multi-user MIMO Systems with Finite-Bit ADCs

    Jianhua Mo, Robert W. Heath Jr
    Comments: 30 pages, 12 figures, submitted to IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    Communication systems with low-resolution analog-to-digital-converters (ADCs)
    can exploit channel state information at the transmitter and receiver. This
    paper presents codebook designs and performance analyses for limited feedback
    MIMO systems with finite-bit ADCs. A point-to-point single-user channel is
    considered first. When the received signal is sliced by 1-bit ADCs, the
    absolute phase at the receiver is important to align the phase of the received
    signals. A new codebook design for beamforming, which separately quantizes the
    channel direction and the residual phase, is therefore proposed. For the
    multi-bit case where the optimal transmission method is unknown, suboptimal
    Gaussian signaling and eigenvector beamforming is assumed to obtain a lower
    bound of the achievable rate. It is found that to limit the rate loss, more
    feedback bits are needed in the medium SNR regime than the low and high SNR
    regimes, which is quite different from the conventional infinite-bit ADC case.
    Second, a multi-user system where a multiple-antenna transmitter sends signals
    to multiple single-antenna receivers with finite-bit ADCs is considered. Based
    on the derived performance loss due to finite-bit ADCs and finite-bit CSI
    feedback, the number of feedback bits should increase linearly with the ADC
    resolution in order to restrict the rate loss.
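
    A generic limited-feedback baseline, random vector quantization of the
    channel direction, shows the feedback mechanism in miniature (the paper's
    1-bit design adds a separate residual-phase quantizer, omitted here):

        import numpy as np

        rng = np.random.default_rng(0)
        Nt, B = 4, 6                                   # tx antennas, feedback bits
        C = rng.standard_normal((2**B, Nt)) + 1j * rng.standard_normal((2**B, Nt))
        C /= np.linalg.norm(C, axis=1, keepdims=True)  # unit-norm codewords

        h = rng.standard_normal(Nt) + 1j * rng.standard_normal(Nt)  # channel
        idx = np.argmax(np.abs(C.conj() @ h))          # best-aligned codeword index
        f = C[idx]                                     # beamformer at the transmitter
        print(idx, np.abs(f.conj() @ h)**2 / np.linalg.norm(h)**2)  # alignment gain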

    Gbps User Rates Using mmWave Relayed Backhaul with High Gain Antennas

    Jinfeng Du, Efe Onaran, Dmitry Chizhik, Sivarama Venkatesan, Reinaldo A. Valenzuela
    Comments: Accepted for publication for IEEE Journal on Selected Areas in Communications (JSAC), Special Issue on Deployment and Performance Challenges for 5G
    Subjects: Information Theory (cs.IT)

    Delivering Gbps high user rate over long distances (around 1 km) is
    challenging, and the abundant spectrum available in the millimeter wave band
    cannot solve the challenge on its own due to the severe path loss and other
    limitations. Since it is economically challenging to deploy wired backhaul
    every few hundred meters, relays (e.g., wireless access points) have been
    proposed to extend the coverage of a base station which has wired connection to
    the core network. These relays, deployed every few hundred meters, serve the
    users in their vicinity and are backhauled to the base station through wireless
    connections. In this work, the wireless relayed backhaul design has been
    formulated as a topology-bandwidth-power joint optimization problem, and the
    influence of path loss, angular spread, array size, and RF power limitation on
    the user rate has been evaluated. It has been shown that for a linear network
    deployed along the street at 28 GHz, when high joint directional gain (50 dBi)
    is available, 1 Gbps user rate within cell range of 1 km can be delivered using
    1.5 GHz of bandwidth (using single polarization antennas). The user rates drop
    precipitously when joint directional gain is reduced, or when the path loss is
    much more severe. When the number of RF chains is limited, the benefit of
    larger arrays will eventually be surpassed by the increased channel estimation
    penalty as the effective beamforming gain saturates owing to the channel
    angular spread.
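
    A back-of-the-envelope link budget (free-space loss only, with assumed
    transmit power and noise figure) confirms that the headline numbers are
    plausible in the idealized limit; the paper's street-level model with
    angular spread and RF-chain limits is far less forgiving:

        import numpy as np

        f, d, B = 28e9, 1000.0, 1.5e9                         # Hz, m, Hz
        fspl_db = 20 * np.log10(4 * np.pi * d * f / 3e8)      # free-space path loss
        ptx_dbm, gain_db, nf_db = 30.0, 50.0, 7.0             # assumed values
        prx_dbm = ptx_dbm + gain_db - fspl_db
        noise_dbm = -174 + 10 * np.log10(B) + nf_db           # thermal noise floor
        snr = 10 ** ((prx_dbm - noise_dbm) / 10)
        print(f"FSPL={fspl_db:.1f} dB  SNR={10*np.log10(snr):.1f} dB  "
              f"rate={B*np.log2(1+snr)/1e9:.2f} Gbps")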

    How Much Spectrum is Too Much in Millimeter Wave Wireless Access

    Jinfeng Du, Reinaldo A. Valenzuela
    Comments: Accepted for publication for IEEE Journal on Selected Areas in Communications (JSAC) Special Issue on Millimeter Wave Communications for Future Mobile Networks
    Subjects: Information Theory (cs.IT)

    A great increase in wireless access rates might be attainable by using the
    large amount of spectrum available in the millimeter wave (mmWave, 30 – 300
    GHz) band. However, due to higher propagation losses inherent in these
    frequencies, using wider bandwidth for transmission at ranges beyond 100
    meters or in non-line-of-sight (NLOS) settings may be ineffective or even
    counterproductive when the penalty for estimating the channel is taken into
    account. In this work we quantify the maximum beneficial bandwidth for mmWave
    transmission in some typical deployment scenarios which use pilot-based channel
    estimation and assume a minimum mean square error (MMSE) channel estimator at
    the receiver. We find that for an I.I.D. block fading model with coherence time
    (T_c) and coherence bandwidth (B_c), for transmitters and receivers equipped
    with a single antenna, the optimal (rate-maximizing) signal-to-noise-ratio is a
    constant that only depends on the product (B_cT_c), which measures the channel
    coherence and equals the average number of orthogonal symbols per independent
    channel coefficient. That is, for fixed channel coherence, the
    optimal bandwidth scales linearly with the received signal power. Under some
    typical deployment scenarios with both transmit and receive side beamforming, 1
    GHz bandwidth can be too much.
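
    The bandwidth sweet spot can be reproduced in a toy model: with fixed
    received power, per-symbol SNR falls as bandwidth grows, and a standard
    estimation-error effective-SNR penalty (rho^2/(1+2*rho), used here purely
    for illustration, with assumed constants) makes the rate peak at a finite
    bandwidth:

        import numpy as np

        P_over_N0 = 1e9          # received SNR per unit bandwidth (assumed)
        S = 1000.0               # coherence block length B_c*T_c in symbols (assumed)
        for B in [0.1e9, 0.5e9, 1e9, 2e9, 5e9]:
            rho = P_over_N0 / B                  # per-symbol SNR shrinks with B
            rho_eff = rho**2 / (1 + 2 * rho)     # penalty from imperfect estimation
            rate = B * (1 - 1 / S) * np.log2(1 + rho_eff)  # one pilot per block
            print(f"B={B/1e9:4.1f} GHz  rate={rate/1e9:6.3f} Gbps")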

    A Low-Complexity Approach to Distributed Cooperative Caching with Geographic Constraints

    Konstantin Avrachenkov, Jasper Goseling, Berksan Serbetci
    Comments: 15 pages, 9 figures, accepted for presentation at SIGMETRICS’17 and for publication in the first issue of the PACM Series on Measurement and Analysis of Computing Systems (POMACS)
    Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

    We consider caching in cellular networks in which each base station is
    equipped with a cache that can store a limited number of files. The popularity
    of the files is known and the goal is to place files in the caches such that
    the probability that a user at an arbitrary location in the plane will find the
    file that she requires in one of the covering caches is maximized.

    We develop distributed asynchronous algorithms for deciding which contents to
    store in which cache. Such cooperative algorithms require communication only
    between caches with overlapping coverage areas and can operate in an
    asynchronous manner. The development of the algorithms is principally based on an
    observation that the problem can be viewed as a potential game. Our basic
    algorithm is derived from the best response dynamics. We demonstrate that the
    complexity of each best response step is independent of the number of files,
    linear in the cache capacity and linear in the maximum number of base stations
    that cover a certain area. Then, we show that the overall algorithm complexity
    for a discrete cache placement is polynomial in both network size and catalog
    size. In practical examples, the algorithm converges in just a few iterations.
    Also, in most cases of interest, the basic algorithm finds the best Nash
    equilibrium corresponding to the global optimum. We provide two extensions of
    our basic algorithm based on stochastic and deterministic simulated annealing
    which find the global optimum.

    Finally, we demonstrate the hit probability evolution on real and synthetic
    networks numerically and show that our distributed caching algorithm performs
    significantly better than storing the most popular content, a probabilistic
    content placement policy, and Multi-LRU caching policies.
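
    A tiny synthetic instance shows the best-response dynamics at work; because
    the hit probability is additive over files once the other caches are fixed,
    each best response reduces to a top-K selection (setup and constants are
    made up):

        import numpy as np

        rng = np.random.default_rng(0)
        F, K, M, L = 20, 3, 4, 30                        # files, cache size, BSs, locations
        pop = 1.0 / np.arange(1, F + 1); pop /= pop.sum()  # Zipf-like popularities
        covers = rng.random((M, L)) < 0.4                # BS-location coverage map
        cache = [set(rng.choice(F, K, replace=False).tolist()) for _ in range(M)]

        def hit_prob():
            tot = 0.0
            for l in range(L):
                union = set().union(*(cache[b] for b in range(M) if covers[b, l]))
                tot += sum(pop[f] for f in union)
            return tot / L

        for sweep in range(20):                          # asynchronous best responses
            changed = False
            for b in range(M):
                score = np.zeros(F)
                for l in np.nonzero(covers[b])[0]:
                    others = set().union(*(cache[j] for j in range(M)
                                           if j != b and covers[j, l]))
                    for f in range(F):
                        if f not in others:
                            score[f] += pop[f]           # marginal gain of caching f
                best = set(np.argsort(score)[-K:].tolist())
                changed = changed or (best != cache[b])
                cache[b] = best
            if not changed:
                break
        print(f"sweeps: {sweep + 1}, hit probability: {hit_prob():.3f}")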

    General three and four person two color Hat Game

    Theo van Uem
    Comments: 7 pages. arXiv admin note: text overlap with arXiv:1612.00276
    Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Information Theory (cs.IT)

    N distinguishable players are randomly fitted with a white or black hat,
    where the probabilities of getting a white or black hat may be different for
    each player, but known to all the players. All players guess simultaneously the
    color of their own hat observing only the hat colors of the other N-1 players.
    It is also allowed for each player to pass: no color is guessed. The team wins
    if at least one player guesses his hat color correctly and none of the players
    has an incorrect guess. No communication of any sort is allowed, except for an
    initial strategy session before the game begins. Our goal is to maximize the
    probability of winning the game and to describe winning strategies, using the
    concept of an adequate set. We find explicit solutions for the cases N = 3 and
    N = 4.
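
    For the symmetric N = 3 case with fair coins, the optimal winning
    probability 3/4 can be verified by brute force over all deterministic
    strategies (the paper's adequate-set method derives such optima
    analytically, including for biased and player-dependent probabilities):

        from itertools import product

        configs = list(product([0, 1], repeat=3))         # all 8 hat assignments
        strategies = list(product([0, 1, 2], repeat=4))   # 2 means "pass"

        def outcome(strat, player, cfg):
            a, b = (h for i, h in enumerate(cfg) if i != player)
            act = strat[2 * a + b]                        # react to the 2 hats seen
            return 0 if act == 2 else (1 if act == cfg[player] else 2)

        # per (strategy, player): bitmask of configs guessed right / guessed wrong
        right = [[0] * 3 for _ in strategies]
        wrong = [[0] * 3 for _ in strategies]
        for si, s in enumerate(strategies):
            for p in range(3):
                for ci, c in enumerate(configs):
                    r = outcome(s, p, c)
                    if r == 1: right[si][p] |= 1 << ci
                    elif r == 2: wrong[si][p] |= 1 << ci

        best = 0
        for s0 in range(81):
            for s1 in range(81):
                w01 = wrong[s0][0] | wrong[s1][1]
                r01 = right[s0][0] | right[s1][1]
                for s2 in range(81):
                    # win on a config iff nobody is wrong and someone is right
                    win = ~(w01 | wrong[s2][2]) & (r01 | right[s2][2]) & 0xFF
                    best = max(best, bin(win).count("1"))
        print(best / 8)   # -> 0.75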

    From Data to Decisions: Distributionally Robust Optimization is Optimal

    Bart P.G. Van Parys, Peyman Mohajerin Esfahani, Daniel Kuhn
    Comments: 30 pages, 2 figures. Submitted to Management Science
    Subjects: Optimization and Control (math.OC); Information Theory (cs.IT)

    We study stochastic programs where the decision-maker cannot observe the
    distribution of the exogenous uncertainties but has access to a finite set of
    independent samples from this distribution. In this setting, the goal is to
    find a procedure that transforms the data to an estimate of the expected cost
    function under the unknown data-generating distribution, i.e., a predictor, and
    an optimizer of the estimated cost function that serves as a near-optimal
    candidate decision, i.e., a prescriptor. As functions of the data, predictors
    and prescriptors constitute statistical estimators. We propose a
    meta-optimization problem to find the least conservative predictors and
    prescriptors subject to constraints on their out-of-sample disappointment. The
    out-of-sample disappointment quantifies the probability that the actual
    expected cost of the candidate decision under the unknown true distribution
    exceeds its predicted cost. Leveraging tools from large deviations theory, we
    prove that this meta-optimization problem admits a unique solution: The best
    predictor-prescriptor pair is obtained by solving a distributionally robust
    optimization problem over all distributions within a given relative entropy
    distance from the empirical distribution of the data.
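
    For a finite sample, the resulting worst-case (distributionally robust)
    predictor over a relative entropy ball has a classic one-dimensional convex
    dual, which the sketch below evaluates on synthetic cost samples:

        import numpy as np
        from scipy.optimize import minimize_scalar

        # sup over {Q : KL(Q||P_hat) <= r} of E_Q[c] equals
        # inf over a > 0 of a*log E_{P_hat}[exp(c/a)] + a*r
        costs = np.array([1.0, 2.0, 2.5, 4.0, 1.5])      # cost samples (synthetic)
        r = 0.1                                          # relative entropy radius

        def dual(a):
            return a * np.log(np.mean(np.exp(costs / a))) + a * r

        res = minimize_scalar(dual, bounds=(0.05, 100.0), method="bounded")
        print(f"empirical mean {costs.mean():.3f}  DRO predictor {res.fun:.3f}")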



