
    arXiv Paper Daily: Fri, 26 May 2017

    Published by 我爱机器学习 (52ml.net) on 2017-05-26 00:00:00

    Neural and Evolutionary Computing

    Neural Decomposition of Time-Series Data for Effective Generalization

    Luke B. Godfrey, Michael S. Gashler
    Comments: 13 pages, 11 figures, IEEE TNNLS Preprint
    Subjects: Neural and Evolutionary Computing (cs.NE)

    We present a neural network technique for the analysis and extrapolation of
    time-series data called Neural Decomposition (ND). Units with a sinusoidal
    activation function are used to perform a Fourier-like decomposition of
    training samples into a sum of sinusoids, augmented by units with nonperiodic
    activation functions to capture linear trends and other nonperiodic components.
    We show how careful weight initialization can be combined with regularization
    to form a simple model that generalizes well. Our method generalizes
    effectively on the Mackey-Glass series, a dataset of unemployment rates as
    reported by the U.S. Department of Labor Statistics, a time-series of monthly
    international airline passengers, the monthly ozone concentration in downtown
    Los Angeles, and an unevenly sampled time-series of oxygen isotope measurements
    from a cave in north India. We find that ND outperforms popular time-series
    forecasting techniques including LSTM, echo state networks, ARIMA, SARIMA, SVR
    with a radial basis function, and Gashler and Ashmore’s model.
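
    A minimal sketch of the core idea (not the authors' code): fit a sum of
    sinusoidal units plus a non-periodic linear unit to a series by gradient
    descent, with frequencies initialized on a Fourier-like grid. All parameter
    values below are illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(0)
        t = np.linspace(0.0, 1.0, 200)
        y = np.sin(2 * np.pi * 3 * t) + 0.5 * t + 0.05 * rng.standard_normal(t.size)

        K = 16                                   # number of sinusoidal units
        freqs = 2 * np.pi * np.arange(1, K + 1)  # Fourier-like frequency initialization
        amps = np.zeros(K)                       # sinusoid amplitudes
        phases = np.zeros(K)                     # sinusoid phases
        slope, bias = 0.0, 0.0                   # non-periodic (linear trend) unit

        lr = 0.05
        for _ in range(2000):
            basis = np.sin(np.outer(t, freqs) + phases)      # (T, K)
            pred = basis @ amps + slope * t + bias
            err = pred - y
            # gradient steps on the mean squared error
            amps -= lr * (basis.T @ err) / t.size
            phases -= lr * ((np.cos(np.outer(t, freqs) + phases) * amps).T @ err) / t.size
            slope -= lr * (t @ err) / t.size
            bias -= lr * err.mean()

        final = np.sin(np.outer(t, freqs) + phases) @ amps + slope * t + bias
        print("final MSE:", float(np.mean((final - y) ** 2)))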

    Deriving Neural Architectures from Sequence and Graph Kernels

    Tao Lei, Wengong Jin, Regina Barzilay, Tommi Jaakkola
    Comments: to appear at ICML 2017; includes additional discussions
    Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL)

    The design of neural architectures for structured objects is typically guided
    by experimental insights rather than a formal process. In this work, we appeal
    to kernels over combinatorial structures, such as sequences and graphs, to
    derive appropriate neural operations. We introduce a class of deep recurrent
    neural operations and formally characterize their associated kernel spaces. Our
    recurrent modules compare the input to virtual reference objects (cf. filters
    in CNN) via the kernels. Similar to traditional neural operations, these
    reference objects are parameterized and directly optimized in end-to-end
    training. We empirically evaluate the proposed class of neural architectures on
    standard applications such as language modeling and molecular graph regression,
    achieving state-of-the-art or competitive results across these applications. We
    also draw connections to existing architectures such as LSTMs.

    Filtering Variational Objectives

    Chris J. Maddison, Dieterich Lawson, George Tucker, Nicolas Heess, Mohammad Norouzi, Andriy Mnih, Arnaud Doucet, Yee Whye Teh
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    The evidence lower bound (ELBO) appears in many algorithms for maximum
    likelihood estimation (MLE) with latent variables because it is a sharp lower
    bound of the marginal log-likelihood. For neural latent variable models,
    optimizing the ELBO jointly in the variational posterior and model parameters
    produces state-of-the-art results. Inspired by the success of the ELBO as a
    surrogate MLE objective, we consider the extension of the ELBO to a family of
    lower bounds defined by a Monte Carlo estimator of the marginal likelihood. We
    show that the tightness of such bounds is asymptotically related to the
    variance of the underlying estimator. We introduce a special case, the
    filtering variational objectives (FIVOs), which takes the same arguments as the
    ELBO and passes them through a particle filter to form a tighter bound. FIVOs
    can be optimized tractably with stochastic gradients, and are particularly
    suited to MLE in sequential latent variable models. In standard sequential
    generative modeling tasks we present uniform improvements over models trained
    with ELBO, including some whole nat-per-timestep improvements.
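
    As a toy illustration of the bound family described above (not the paper's
    code): for a conjugate Gaussian model whose marginal likelihood is known in
    closed form, the log of an averaged importance-weight estimator lower-bounds
    log p(x), and the bound tightens as more samples reduce the estimator's
    variance.

        import numpy as np

        rng = np.random.default_rng(0)

        # Model: z ~ N(0,1), x | z ~ N(z,1)  =>  marginal x ~ N(0,2).
        x = 1.5
        log_px = -0.5 * np.log(2 * np.pi * 2.0) - x**2 / 4.0   # exact log marginal

        def log_weight(z):
            # proposal q(z) = N(0,1) equals the prior, so the weight is p(x|z)
            return -0.5 * np.log(2 * np.pi) - 0.5 * (x - z) ** 2

        for n_samples in (1, 4, 16, 64):
            bounds = []
            for _ in range(2000):
                z = rng.standard_normal(n_samples)
                lw = log_weight(z)
                # log of the average weight (log-sum-exp for numerical stability)
                bounds.append(np.logaddexp.reduce(lw) - np.log(n_samples))
            print(f"K={n_samples:3d}  bound={np.mean(bounds):.4f}  true log p(x)={log_px:.4f}")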


    Computer Vision and Pattern Recognition

    Who Will Share My Image? Predicting the Content Diffusion Path in Online Social Networks

    Wenjian Hu, Krishna Kumar Singh, Fanyi Xiao, Jinyoung Han, Chen-Nee Chuah, Yong Jae Lee
    Comments: 10 pages, 6 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Social and Information Networks (cs.SI)

    Content popularity prediction has been extensively studied due to its
    importance and interest for both users and hosts of social media sites like
    Facebook, Instagram, Twitter, and Pinterest. However, existing work mainly
    focuses on modeling popularity using a single metric such as the total number
    of likes or shares. In this work, we propose Diffusion-LSTM, a memory-based
    deep recurrent network that learns to recursively predict the entire diffusion
    path of an image through a social network. By combining user social features
    and image features, and encoding the diffusion path taken thus far with an
    explicit memory cell, our model predicts the diffusion path of an image more
    accurately compared to alternate baselines that either encode only image or
    social features, or lack memory. By mapping individual users to user
    prototypes, our model can generalize to new users not seen during training.
    Finally, we demonstrate our model’s capability of generating diffusion trees,
    and show that the generated trees closely resemble ground-truth trees.

    Classification of Quantitative Light-Induced Fluorescence Images Using Convolutional Neural Network

    Sultan Imangaliyev, Monique H. van der Veen, Catherine M. C. Volgenant, Bruno G. Loos, Bart J. F. Keijser, Wim Crielaard, Evgeni Levin
    Comments: Full version of ICANN 2017 submission
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Images are an important data source for diagnosis and treatment of oral
    diseases. The manual classification of images may lead to misdiagnosis or
    mistreatment due to subjective errors. In this paper, an image classification
    model based on a Convolutional Neural Network is applied to Quantitative
    Light-induced Fluorescence images. The deep neural network outperforms other
    state-of-the-art shallow classification models in predicting labels derived
    from three different dental plaque assessment scores. The model directly
    benefits from multi-channel representation of the images resulting in improved
    performance when, besides the Red colour channel, additional Green and Blue
    colour channels are used.

    Deep image representations using caption generators

    Konda Reddy Mopuri, Vishal B. Athreya, R. Venkatesh Babu
    Comments: ICME 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep learning exploits large volumes of labeled data to learn powerful
    models. When the target dataset is small, it is a common practice to perform
    transfer learning using pre-trained models to learn new task specific
    representations. However, pre-trained CNNs for image recognition are provided
    with only limited information about the image during training, namely the label alone.
    Tasks such as scene retrieval suffer from features learned from this weak
    supervision and require stronger supervision to better understand the contents
    of the image. In this paper, we exploit the features learned from caption
    generating models to learn novel task specific image representations. In
    particular, we consider the state-of-the-art captioning system Show and
    Tell [SnT-pami-2016] and the dense region description model
    DenseCap [densecap-cvpr-2016]. We demonstrate that, owing to the richer
    supervision provided during the process of training, the features learned by
    the captioning system perform better than those of CNNs. Further, we train a
    siamese network with a modified pair-wise loss to fuse the features learned
    by [SnT-pami-2016] and [densecap-cvpr-2016] and learn image
    representations suitable for retrieval. Experiments show that the proposed
    fusion exploits the complementary nature of the individual features and yields
    state-of-the-art retrieval results on benchmark datasets.
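
    A hedged sketch of the fusion step (the paper's exact architecture and
    pair-wise loss differ): concatenate the two pre-computed feature vectors of
    an image, project them with a shared ("siamese") linear map, and train with a
    contrastive pair loss. Dimensions, the margin, and the learning rate are
    illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(0)
        dim_a, dim_b, dim_out, margin, lr = 64, 32, 16, 1.0, 0.1
        W = rng.standard_normal((dim_a + dim_b, dim_out)) * 0.01   # shared projection

        def embed(feat_caption, feat_densecap):
            return np.concatenate([feat_caption, feat_densecap]) @ W

        def pair_loss_and_grad(x1, x2, same):
            """Contrastive loss on a pair of fused inputs; same=True for matching images."""
            e1, e2 = embed(*x1), embed(*x2)
            diff = e1 - e2
            dist = np.linalg.norm(diff) + 1e-8
            if same:
                loss, ddist = 0.5 * dist**2, dist
            else:
                hinge = max(0.0, margin - dist)
                loss, ddist = 0.5 * hinge**2, -hinge
            # chain rule back to the shared projection W
            ddiff = ddist * diff / dist
            x1v, x2v = np.concatenate(x1), np.concatenate(x2)
            gW = np.outer(x1v, ddiff) - np.outer(x2v, ddiff)
            return loss, gW

        # one toy SGD step on a random "matching" pair
        pair = ((rng.standard_normal(dim_a), rng.standard_normal(dim_b)),
                (rng.standard_normal(dim_a), rng.standard_normal(dim_b)))
        loss, gW = pair_loss_and_grad(*pair, same=True)
        W -= lr * gW
        print("pair loss:", loss)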

    SLAM based Quasi Dense Reconstruction For Minimally Invasive Surgery Scenes

    Nader Mahmoud, Alexandre Hostettler, Toby Collins, Luc Soler, Christophe Doignon, J.M.M. Montiel
    Comments: ICRA 2017 workshop C4 Surgical Robots: Compliant, Continuum, Cognitive, and Collaborative
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recovering surgical scene structure in laparoscopic surgery is a crucial step
    for surgical guidance and augmented reality applications. In this paper, a
    quasi-dense reconstruction algorithm for surgical scenes is proposed. It is
    based on a state-of-the-art SLAM system and exploits the initial
    exploration phase that is typically performed by the surgeon at the beginning
    of the surgery. We show how to convert the sparse SLAM map to a quasi-dense
    scene reconstruction, using pairs of keyframe images and correlation-based
    featureless patch matching. We have validated the approach with a live porcine
    experiment using Computed Tomography as ground truth, yielding a Root Mean
    Squared Error of 4.9 mm.

    Weakly Supervised Semantic Segmentation Based on Co-segmentation

    Tong Shen, Guosheng Lin, Lingqiao Liu, Chunhua Shen, Ian Reid
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Training a Fully Convolutional Network (FCN) for semantic segmentation
    requires a large number of pixel-level masks, which involves a large amount of
    human labour and time for annotation. In contrast, image-level labels are much
    easier to obtain. In this work, we propose a novel method for weakly supervised
    semantic segmentation with only image-level labels. The method relies on a
    large scale co-segmentation framework that can produce object masks for a group
    of images containing objects belonging to the same semantic class. We first
    retrieve images from search engines, e.g. Flickr and Google, using semantic
    class names as queries, e.g. class names in PASCAL VOC 2012. We then use high
    quality masks produced by co-segmentation on the retrieved images as well as
    the target dataset images with image level labels to train segmentation
    networks. We obtain an IoU of 56.9 on the test set of PASCAL VOC 2012, which
    reaches state-of-the-art performance.

    Extraction and Classification of Diving Clips from Continuous Video Footage

    Aiden Nibali, Zhen He, Stuart Morgan, Daniel Greenwood
    Comments: To appear at CVsports 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Due to recent advances in technology, the recording and analysis of video
    data has become an increasingly common component of athlete training
    programmes. Today it is incredibly easy and affordable to set up a fixed camera
    and record athletes in a wide range of sports, such as diving, gymnastics,
    golf, tennis, etc. However, the manual analysis of the obtained footage is a
    time-consuming task which involves isolating actions of interest and
    categorizing them using domain-specific knowledge. In order to automate this
    kind of task, three challenging sub-problems are often encountered: 1)
    temporally cropping events/actions of interest from continuous video; 2)
    tracking the object of interest; and 3) classifying the events/actions of
    interest.

    Most previous work has focused on solving just one of the above sub-problems
    in isolation. In contrast, this paper provides a complete solution to the
    overall action monitoring task in the context of a challenging real-world
    exemplar. Specifically, we address the problem of diving classification. This
    is a challenging problem since the person (diver) of interest typically
    occupies fewer than 1% of the pixels in each frame. The model is required to
    learn the temporal boundaries of a dive, even though other divers and
    bystanders may be in view. Finally, the model must be sensitive to subtle
    changes in body pose over a large number of frames to determine the
    classification code. We provide effective solutions to each of the sub-problems
    which combine to provide a highly functional solution to the task as a whole.
    The techniques proposed can be easily generalized to video footage recorded
    from other sports.

    Plug-and-Play Unplugged: Optimization Free Reconstruction using Consensus Equilibrium

    Gregery T. Buzzard, Suhas Sreehari, Charles A. Bouman
    Comments: 14 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC)

    Regularized inversion methods for image reconstruction are used widely due to
    their tractability and their ability to combine complex physical sensor models
    with useful regularity criteria. Such methods were used in the recently
    developed Plug-and-Play prior method, which provides a framework to use
    advanced denoising algorithms as regularizers in inversion. However, the need
    to formulate regularized inversion as the solution to an optimization problem
    severely limits both the expressiveness of possible regularity conditions and
    the variety of provably convergent Plug-and-Play denoising operators.

    In this paper, we introduce the concept of consensus equilibrium (CE), which
    generalizes regularized inversion to include a much wider variety of regularity
    operators without the need for an optimization formulation. Consensus
    equilibrium is based on the solution of a set of equilibrium equations that
    balance data fit and regularity. In this framework, the problem of MAP
    estimation in regularized inversion is replaced by the problem of solving these
    equilibrium equations, which can be approached in multiple ways, including as a
    fixed point problem that generalizes the ADMM approach used in the
    Plug-and-Play method.

    We present the Douglas-Rachford (DR) algorithm for computing the CE solution
    as a fixed point and prove the convergence of this algorithm under conditions
    that include denoising operators that do not arise from optimization problems
    and that may not be nonexpansive. We give several examples to illustrate the
    idea of consensus equilibrium and the convergence properties of the DR
    algorithm and demonstrate this method on a sparse interpolation problem using
    electron microscopy data.
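
    A minimal sketch, under simplifying assumptions, of a Douglas-Rachford style
    fixed-point iteration alternating between a data-fit proximal operator F and
    a plug-in smoothing "denoiser" G on a toy 1-D sparse-interpolation problem;
    the paper's consensus-equilibrium formulation is more general than this
    two-operator example, and the operators below are illustrative stand-ins.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 100
        truth = np.sin(np.linspace(0, 4 * np.pi, n))
        mask = rng.random(n) < 0.3                      # 30% of samples observed
        obs = truth * mask + 0.05 * rng.standard_normal(n) * mask
        sigma2 = 0.05 ** 2

        def F(v):
            # proximal map of the quadratic data-fit term: pull observed entries toward obs
            out = v.copy()
            out[mask] = (obs[mask] + sigma2 * v[mask]) / (1.0 + sigma2)
            return out

        def G(v):
            # simple smoothing "denoiser", averaged with the identity so that the
            # resulting operator stays firmly nonexpansive for this demo
            kernel = np.ones(5) / 5.0
            return 0.5 * (v + np.convolve(v, kernel, mode="same"))

        v = np.zeros(n)
        for _ in range(200):
            fv = F(v)
            v = v + G(2 * fv - v) - fv          # Douglas-Rachford style update
        x = F(v)
        print("RMSE vs. truth:", float(np.sqrt(np.mean((x - truth) ** 2))))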

    Cultural Diffusion and Trends in Facebook Photographs

    Quanzeng You, Darío García-García, Manohar Paluri, Jiebo Luo, Jungseock Joo
    Comments: 10 pages, To appear in ICWSM 2017 (Full Paper)
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Social and Information Networks (cs.SI)

    Online social media is a social vehicle in which people share various moments
    of their lives with their friends, such as playing sports, cooking dinner or
    just taking a selfie for fun, via visual means, that is, photographs. Our study
    takes a closer look at the popular visual concepts illustrating various
    cultural lifestyles from aggregated, de-identified photographs. We perform
    analysis both at macroscopic and microscopic levels, to gain novel insights
    about global and local visual trends as well as the dynamics of interpersonal
    cultural exchange and diffusion among Facebook friends. We processed images by
    automatically classifying the visual content by a convolutional neural network
    (CNN). Through various statistical tests, we find that socially tied
    individuals are more likely to post images showing similar cultural lifestyles. To
    further identify the main cause of the observed social correlation, we use the
    Shuffle test and the Preference-based Matched Estimation (PME) test to
    distinguish the effects of influence and homophily. The results indicate that
    the visual content of each user’s photographs is temporally, although not
    necessarily causally, correlated with the photographs of their friends, which
    may suggest the effect of influence. Our paper demonstrates that Facebook
    photographs exhibit diverse cultural lifestyles and preferences and that the
    social interaction mediated through the visual channel in social media can be
    an effective mechanism for cultural diffusion.

    Novel Deep Convolution Neural Network Applied to MRI Cardiac Segmentation

    Clement Zotti, Zhiming Luo, Alain Lalande, Olivier Humbert, Pierre-Marc Jodoin
    Comments: 14 pages, 4 tables, 7 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we propose a fully automatic MRI cardiac segmentation method
    based on a novel deep convolutional neural network (CNN). As opposed to most
    cardiac segmentation methods which focus on the left ventricle (and especially
    the left cavity), our method segments the left ventricular cavity, the
    left ventricular epicardium, and the right ventricular cavity. The novelty of
    our network lies in its maximum a posteriori loss function, which is
    specifically designed for the cardiac anatomy. Our loss function incorporates
    the cross-entropy of the predicted labels, the predicted contours, a cardiac
    shape prior, and an a priori term. Our model also includes a cardiac
    center-of-mass regression module which allows for an automatic shape prior
    registration. Also, since our method processes raw MR images without any manual
    preprocessing and/or image cropping, our CNN learns both high-level features
    (useful to distinguish the heart from other organs with a similar shape) and
    low-level features (useful to get accurate segmentation results). Those
    features are learned with a multi-resolution conv-deconv “grid” architecture
    which can be seen as an extension of the U-Net.

    We trained and tested our model on the ACDC MICCAI’17 challenge dataset of
    150 patients whose diastolic and systolic images were manually outlined by 2
    medical experts. Results reveal that our method can segment all three regions
    of a 3D MRI cardiac volume in 0.4 seconds with an average Dice index of
    0.90, which is significantly better than state-of-the-art deep learning
    methods.

    Attention-based Natural Language Person Retrieval

    Tao Zhou, Muhao Chen, Jie Yu, Demetri Terzopoulos
    Comments: CVPR 2017 Workshop (vision meets cognition)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Following the recent progress in image classification and captioning using
    deep learning, we develop a novel natural language person retrieval system
    based on an attention mechanism. More specifically, given the description of a
    person, the goal is to localize the person in an image. To this end, we first
    construct a benchmark dataset for natural language person retrieval. To do so,
    we generate bounding boxes for persons in a public image dataset from the
    segmentation masks, which are then annotated with descriptions and attributes
    using the Amazon Mechanical Turk. We then adopt a region proposal network in
    Faster R-CNN as a candidate region generator. The cropped images based on the
    region proposals as well as the whole images with attention weights are fed
    into Convolutional Neural Networks for visual feature extraction, while the
    natural language expression and attributes are input to Bidirectional Long
    Short-Term Memory (BLSTM) models for text feature extraction. The visual and
    text features are integrated to score region proposals, and the one with the
    highest score is retrieved as the output of our system. The experimental
    results show significant improvement over the state-of-the-art method for
    generic object retrieval and this line of research promises to benefit search
    in surveillance video footage.

    Gated XNOR Networks: Deep Neural Networks with Ternary Weights and Activations under a Unified Discretization Framework

    Lei Deng, Peng Jiao, Jing Pei, Zhenzhi Wu, Guoqi Li
    Comments: 9 pages, 10 figures
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    There is a pressing need to build an architecture that could subsume recently
    proposed binary and ternary networks under a unified framework that achieves
    both higher performance and less overhead. To this end, two fundamental issues
    are yet to be addressed. The
    first one is how to implement the back propagation when neuronal activations
    are discrete. The second one is how to remove the full-precision hidden weights
    in the training phase to break the bottlenecks of memory/computation
    consumption. To address the first issue, we present a multistep neuronal
    activation discretization method and a derivative approximation technique that
    enable the implementation of the back-propagation algorithm on discrete DNNs. For
    the second issue, we propose a discrete state transition (DST) methodology
    to constrain the weights in a discrete space without saving the hidden weights.
    In this way, we build a unified framework that subsumes the binary or ternary
    networks as its special cases. More particularly, we find that when both the
    weights and activations become ternary values, the DNNs can be reduced to gated
    XNOR networks (or sparse binary networks) since only the event of non-zero
    weight and non-zero activation enables the control gate to start the XNOR logic
    operations in the original binary networks. This promises the event-driven
    hardware design for efficient mobile intelligence. We achieve advanced
    performance compared with state-of-the-art algorithms. Furthermore, the
    computational sparsity and the number of states in the discrete space can be
    flexibly modified to make it suitable for various hardware platforms.
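
    The following is a generic sketch of a ternarized forward pass with a
    straight-through style surrogate gradient, which the discretization and
    derivative-approximation ideas above resemble; the paper's DST method differs
    in detail, and the thresholds here are assumptions.

        import numpy as np

        def ternarize(x, delta=0.5):
            # map values to {-1, 0, +1}
            return np.sign(x) * (np.abs(x) > delta)

        def ternary_layer_forward(x, w):
            return ternarize(x) @ ternarize(w)

        def ternary_layer_backward(x, w, grad_out, delta=0.5):
            # straight-through estimator: pass gradients through the discretizer,
            # but only where the full-precision value lies in the window |.| <= 1
            xt, wt = ternarize(x, delta), ternarize(w, delta)
            grad_x = (grad_out @ wt.T) * (np.abs(x) <= 1.0)
            grad_w = (xt.T @ grad_out) * (np.abs(w) <= 1.0)
            return grad_x, grad_w

        rng = np.random.default_rng(0)
        x = rng.standard_normal((8, 32))         # batch of activations
        w = rng.standard_normal((32, 16)) * 0.5  # full-precision "hidden" weights
        y = ternary_layer_forward(x, w)
        gx, gw = ternary_layer_backward(x, w, np.ones_like(y))
        print(y.shape, gx.shape, gw.shape)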

    First-spike based visual categorization using reward-modulated STDP

    Milad Mozafari, Saeed Reza Kheradpisheh, Timothée Masquelier, Abbas Nowzari-Dalini, Mohammad Ganjtabesh
    Subjects: Neurons and Cognition (q-bio.NC); Computer Vision and Pattern Recognition (cs.CV)

    Reinforcement learning (RL) has recently regained popularity, with major
    achievements such as beating the European game of Go champion. Here, for the
    first time, we show that RL can be used efficiently to train a spiking neural
    network (SNN) to perform object recognition in natural images without using an
    external classifier. We used a feedforward convolutional SNN and a temporal
    coding scheme where the most strongly activated neurons fire first, while less
    activated ones fire later, or not at all. In the highest layers, each neuron
    was assigned to an object category, and it was assumed that the stimulus
    category was the category of the first neuron to fire. If this assumption was
    correct, the neuron was rewarded, i.e. spike-timing-dependent plasticity (STDP)
    was applied, which reinforced the neuron’s selectivity. Otherwise, anti-STDP
    was applied, which encouraged the neuron to learn something else. As
    demonstrated on various image datasets (Caltech, ETH-80, and NORB), this reward
    modulated STDP (R-STDP) approach extracted particularly discriminative visual
    features, whereas classic unsupervised STDP extracts any feature that
    consistently repeats. As a result, R-STDP outperformed STDP on these datasets.
    Furthermore, R-STDP is suitable for online learning, and can adapt to drastic
    changes such as label permutations. Finally, it is worth mentioning that both
    feature extraction and classification were done with spikes, using at most one
    spike per neuron. Thus the network is hardware friendly and energy efficient.
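
    A toy sketch of a reward-modulated STDP update of the kind described above;
    the learning rates, the soft-bound factor w(1 - w), and the way presynaptic
    eligibility is determined are illustrative assumptions rather than the
    paper's exact rule.

        import numpy as np

        rng = np.random.default_rng(0)
        n_pre, n_out = 50, 3
        w = rng.random((n_pre, n_out)) * 0.5
        a_plus, a_minus = 0.01, 0.01

        def r_stdp_update(w, pre_fired, winner, rewarded):
            """pre_fired: bool array of presynaptic neurons that spiked before the
            winner; winner: index of the first output neuron to fire;
            rewarded: True if the winner's category matches the label."""
            sign = 1.0 if rewarded else -1.0               # STDP vs. anti-STDP
            dw = np.zeros_like(w)
            dw[pre_fired, winner] = sign * a_plus * (w[pre_fired, winner]
                                                     * (1 - w[pre_fired, winner]))
            dw[~pre_fired, winner] = -sign * a_minus * (w[~pre_fired, winner]
                                                        * (1 - w[~pre_fired, winner]))
            return np.clip(w + dw, 0.0, 1.0)

        # one illustrative update: output neuron 1 fired first and was correct
        pre_fired = rng.random(n_pre) < 0.4
        w = r_stdp_update(w, pre_fired, winner=1, rewarded=True)
        print(w[:3, 1])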

    Visual Servoing from Deep Neural Networks

    Quentin Bateux, Eric Marchand, Jurgen Leitner, Francois Chaumette
    Comments: 6 pages, 7 figures
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

    We present a deep neural network-based method to perform high-precision,
    robust and real-time 6 DOF visual servoing. The paper describes how to create a
    dataset simulating various perturbations (occlusions and lighting conditions)
    from a single real-world image of the scene. A convolutional neural network is
    fine-tuned using this dataset to estimate the relative pose between two images
    of the same scene. The output of the network is then employed in a visual
    servoing control scheme. The method converges robustly even in difficult
    real-world settings with strong lighting variations and occlusions. A
    positioning error of less than one millimeter is obtained in experiments with a
    6 DOF robot.

    Unsupervised Learning Layers for Video Analysis

    Liang Zhao, Yang Wang, Yi Yang, Wei Xu
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    This paper presents two unsupervised learning layers (UL layers) for
    label-free video analysis: one for fully connected layers, and the other for
    convolutional ones. The proposed UL layers can play two roles: they can be the
    cost function layer for providing global training signal; meanwhile they can be
    added to any regular neural network layers for providing local training signals
    and combined with the training signals backpropagated from upper layers for
    extracting both slow and fast changing features at layers of different depths.
    Therefore, the UL layers can be used in either pure unsupervised or
    semi-supervised settings. Both a closed-form solution and an online learning
    algorithm for two UL layers are provided. Experiments with unlabeled synthetic
    and real-world videos demonstrated that the neural networks equipped with UL
    layers and trained with the proposed online learning algorithm can extract
    shape and motion information from video sequences of moving objects. The
    experiments demonstrated the potential applications of UL layers and online
    learning algorithm to head orientation estimation and moving object
    localization.


    Artificial Intelligence

    Neural Attribute Machines for Program Generation

    Matthew Amodio, Swarat Chaudhuri, Thomas Reps
    Subjects: Artificial Intelligence (cs.AI); Programming Languages (cs.PL)

    Recurrent neural networks have achieved remarkable success at generating
    sequences with complex structures, thanks to advances that include richer
    embeddings of input and cures for vanishing gradients. Trained only on
    sequences from a known grammar, though, they can still struggle to learn rules
    and constraints of the grammar.

    Neural Attribute Machines (NAMs) are equipped with a logical machine that
    represents the underlying grammar, which is used to teach the constraints to
    the neural machine by (i) augmenting the input sequence, and (ii) optimizing a
    custom loss function. Unlike traditional RNNs, NAMs are exposed to the grammar,
    as well as samples from the language of the grammar. During generation, NAMs
    make significantly fewer violations of the constraints of the underlying
    grammar than RNNs trained only on samples from the language of the grammar.

    Finding Robust Solutions to Stable Marriage

    Begum Genc, Mohamed Siala, Barry O'Sullivan, Gilles Simonin
    Comments: Accepted for IJCAI 2017
    Subjects: Artificial Intelligence (cs.AI)

    We study the notion of robustness in stable matching problems. We first
    define robustness by introducing (a,b)-supermatches. An (a,b)-supermatch is a
    stable matching in which, if a pairs break up, it is possible to find another
    stable matching by changing the partners of those a pairs and at most b
    other pairs. In this context, we define the most robust stable matching as a
    (1,b)-supermatch where b is minimum. We show that checking whether a given
    stable matching is a (1,b)-supermatch can be done in polynomial time. Next,
    we use this procedure to design a constraint programming model, a local search
    approach, and a genetic algorithm to find the most robust stable matching. Our
    empirical evaluation on large instances shows that local search outperforms the
    other approaches.

    An Empirical Analysis of Approximation Algorithms for the Euclidean Traveling Salesman Problem

    Yihui He, Ming Xiang
    Comments: 4 pages, 5 figures
    Subjects: Artificial Intelligence (cs.AI)

    The traveling salesman problem (TSP)
    is a classical computer science optimization problem with applications to
    industrial engineering, theoretical computer science, bioinformatics, and
    several other disciplines. In recent years, there have been a plethora of novel
    approaches for approximate solutions ranging from simplistic greedy to
    cooperative distributed algorithms derived from artificial intelligence. In
    this paper, we perform an evaluation and analysis of cornerstone algorithms for
    the Euclidean TSP. We evaluate greedy, 2-opt, and genetic algorithms. We use
    several datasets as input for the algorithms including a small dataset, a
    medium-sized dataset representing cities in the United States, and a synthetic
    dataset consisting of 200 cities to test algorithm scalability. We discover
    that the greedy and 2-opt algorithms efficiently calculate solutions for
    smaller datasets. The genetic algorithm has the best performance in terms of
    optimality for medium to large datasets, but generally has a longer runtime. Our
    implementations are publicly available.
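
    For reference, a compact sketch of the greedy and 2-opt heuristics evaluated
    in the paper (a generic textbook implementation, not the authors' code):

        import numpy as np

        def tour_length(points, tour):
            return sum(np.linalg.norm(points[tour[i]] - points[tour[(i + 1) % len(tour)]])
                       for i in range(len(tour)))

        def greedy_tour(points):
            # nearest-neighbour construction starting from city 0
            unvisited, tour = set(range(1, len(points))), [0]
            while unvisited:
                last = tour[-1]
                nxt = min(unvisited, key=lambda j: np.linalg.norm(points[last] - points[j]))
                tour.append(nxt)
                unvisited.remove(nxt)
            return tour

        def two_opt(points, tour):
            # repeatedly reverse a segment whenever doing so shortens the tour
            improved = True
            while improved:
                improved = False
                for i in range(1, len(tour) - 1):
                    for j in range(i + 1, len(tour)):
                        new = tour[:i] + tour[i:j][::-1] + tour[j:]
                        if tour_length(points, new) < tour_length(points, tour):
                            tour, improved = new, True
            return tour

        rng = np.random.default_rng(0)
        pts = rng.random((30, 2))
        t = greedy_tour(pts)
        print("greedy:", round(tour_length(pts, t), 3),
              "2-opt:", round(tour_length(pts, two_opt(pts, t)), 3))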

    Cross-Domain Perceptual Reward Functions

    Ashley D. Edwards, Charles L. Isbell Jr
    Comments: A shorter version of this paper was accepted to RLDM (this http URL)
    Subjects: Artificial Intelligence (cs.AI)

    In reinforcement learning, we often define goals by specifying rewards within
    desirable states. One problem with this approach is that we typically need to
    redefine the rewards each time the goal changes, which often requires some
    understanding of the solution in the agent’s environment. When humans are
    learning to complete tasks, we regularly utilize alternative sources that guide
    our understanding of the problem. Such task representations allow one to
    specify goals on their own terms, thus providing specifications that can be
    appropriately interpreted across various environments. This motivates our own
    work, in which we represent goals in environments that are different from the
    agent’s. We introduce Cross-Domain Perceptual Reward (CDPR) functions, learned
    rewards that represent the visual similarity between an agent’s state and a
    cross-domain goal image. We report results for learning the CDPRs with a deep
    neural network and using them to solve two tasks with deep reinforcement
    learning.

    State Space Decomposition and Subgoal Creation for Transfer in Deep Reinforcement Learning

    Himanshu Sahni, Saurabh Kumar, Farhan Tejani, Yannick Schroecker, Charles Isbell
    Comments: 5 pages, 6 figures; 3rd Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2017), Ann Arbor, Michigan
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Typical reinforcement learning (RL) agents learn to complete tasks specified
    by reward functions tailored to their domain. As such, the policies they learn
    do not generalize even to similar domains. To address this issue, we develop a
    framework through which a deep RL agent learns to generalize policies from
    smaller, simpler domains to more complex ones using a recurrent attention
    mechanism. The task is presented to the agent as an image and an instruction
    specifying the goal. This meta-controller guides the agent towards its goal by
    designing a sequence of smaller subtasks on the part of the state space within
    the attention, effectively decomposing it. As a baseline, we consider a setup
    without attention as well. Our experiments show that the meta-controller learns
    to create subgoals within the attention.

    Logic Tensor Networks for Semantic Image Interpretation

    Ivan Donadello, Luciano Serafini, Artur d'Avila Garcez
    Comments: 14 pages, 2 figures, IJCAI 2017
    Subjects: Artificial Intelligence (cs.AI)

    Semantic Image Interpretation (SII) is the task of extracting structured
    semantic descriptions from images. It is widely agreed that the combined use of
    visual data and background knowledge is of great importance for SII. Recently,
    Statistical Relational Learning (SRL) approaches have been developed for
    reasoning under uncertainty and learning in the presence of data and rich
    knowledge. Logic Tensor Networks (LTNs) are an SRL framework which integrates
    neural networks with first-order fuzzy logic to allow (i) efficient learning
    from noisy data in the presence of logical constraints, and (ii) reasoning with
    logical formulas describing general properties of the data. In this paper, we
    develop and apply LTNs to two of the main tasks of SII, namely, the
    classification of an image’s bounding boxes and the detection of the relevant
    part-of relations between objects. To the best of our knowledge, this is the
    first successful application of SRL to such SII tasks. The proposed approach is
    evaluated on a standard image processing benchmark. Experiments show that the
    use of background knowledge in the form of logical constraints can improve the
    performance of purely data-driven approaches, including the state-of-the-art
    Fast Region-based Convolutional Neural Networks (Fast R-CNN). Moreover, we show
    that the use of logical background knowledge adds robustness to the learning
    system when errors are present in the labels of the training data.

    Efficient, Safe, and Probably Approximately Complete Learning of Action Models

    Roni Stern, Brendan Juba
    Journal-ref: International Joint Conference on Artificial Intelligence (IJCAI)
    2017
    Subjects: Artificial Intelligence (cs.AI)

    In this paper we explore the theoretical boundaries of planning in a setting
    where no model of the agent’s actions is given. Instead of an action model, a
    set of successfully executed plans is given and the task is to generate a plan
    that is safe, i.e., guaranteed to achieve the goal without failing. To this
    end, we show how to learn a conservative model of the world in which actions
    are guaranteed to be applicable. This conservative model is then given to an
    off-the-shelf classical planner, resulting in a plan that is guaranteed to
    achieve the goal. However, this reduction from model-free planning to
    model-based planning is not complete: in some cases a plan will not be found
    even when one exists. We analyze the relation between the number of observed
    plans and the likelihood that our conservative approach will indeed fail to
    solve a solvable problem. Our analysis shows that the number of trajectories
    needed scales gracefully.

    Counterfactual Multi-Agent Policy Gradients

    Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson
    Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

    Cooperative multi-agent systems can be naturally used to model many real
    world problems, such as network packet routing and the coordination of
    autonomous vehicles. There is a great need for new reinforcement learning
    methods that can efficiently learn decentralised policies for such systems. To
    this end, we propose a new multi-agent actor-critic method called
    counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised
    critic to estimate the Q-function and decentralised actors to optimise the
    agents’ policies. In addition, to address the challenges of multi-agent credit
    assignment, it uses a counterfactual baseline that marginalises out a single
    agent’s action, while keeping the other agents’ actions fixed. COMA also uses a
    critic representation that allows the counterfactual baseline to be computed
    efficiently in a single forward pass. We evaluate COMA in the testbed of
    StarCraft unit micromanagement, using a decentralised variant with significant
    partial observability. COMA significantly improves average performance over
    other multi-agent actor-critic methods in this setting, and the best performing
    agents are competitive with state-of-the-art centralised controllers that get
    access to the full state.
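
    A small sketch of COMA's counterfactual advantage for one agent, using an
    explicit tabular critic for clarity; the paper instead learns a centralised
    critic whose output representation lets this marginalisation be computed in a
    single forward pass.

        import numpy as np

        rng = np.random.default_rng(0)
        n_agents, n_actions = 3, 4
        q = rng.random((n_actions,) * n_agents)       # Q(s, u_1, ..., u_n) for a fixed state
        policies = rng.dirichlet(np.ones(n_actions), size=n_agents)   # pi_a(u | s)
        joint_action = [2, 0, 1]

        def coma_advantage(agent):
            u = list(joint_action)
            q_joint = q[tuple(u)]
            # counterfactual baseline: marginalise out this agent's action under
            # its own policy while keeping the other agents' actions fixed
            baseline = 0.0
            for a in range(n_actions):
                u[agent] = a
                baseline += policies[agent][a] * q[tuple(u)]
            return q_joint - baseline

        for i in range(n_agents):
            print(f"agent {i}: advantage = {coma_advantage(i):+.3f}")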

    Filtering Variational Objectives

    Chris J. Maddison, Dieterich Lawson, George Tucker, Nicolas Heess, Mohammad Norouzi, Andriy Mnih, Arnaud Doucet, Yee Whye Teh
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    The evidence lower bound (ELBO) appears in many algorithms for maximum
    likelihood estimation (MLE) with latent variables because it is a sharp lower
    bound of the marginal log-likelihood. For neural latent variable models,
    optimizing the ELBO jointly in the variational posterior and model parameters
    produces state-of-the-art results. Inspired by the success of the ELBO as a
    surrogate MLE objective, we consider the extension of the ELBO to a family of
    lower bounds defined by a Monte Carlo estimator of the marginal likelihood. We
    show that the tightness of such bounds is asymptotically related to the
    variance of the underlying estimator. We introduce a special case, the
    filtering variational objectives (FIVOs), which takes the same arguments as the
    ELBO and passes them through a particle filter to form a tighter bound. FIVOs
    can be optimized tractably with stochastic gradients, and are particularly
    suited to MLE in sequential latent variable models. In standard sequential
    generative modeling tasks we present uniform improvements over models trained
    with ELBO, including some whole nat-per-timestep improvements.

    Online Edge Grafting for Efficient MRF Structure Learning

    Walid Chaabene, Bert Huang
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Incremental methods for structure learning of pairwise Markov random fields
    (MRFs), such as grafting, improve scalability to large systems by avoiding
    inference over the entire feature space in each optimization step. Instead,
    inference is performed over an incrementally grown active set of features. In
    this paper, we address the computational bottlenecks that current techniques
    still suffer by introducing online edge grafting, an incremental, structured
    method that activates edges as groups of features in a streaming setting. The
    framework is based on reservoir sampling of edges that satisfy a necessary
    activation condition, approximating the search for the optimal edge to
    activate. Online edge grafting performs an informed edge search set
    reorganization using search history and structure heuristics. Experiments show
    a significant computational speedup for structure learning and a controllable
    trade-off between the speed and the quality of learning.
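
    A minimal sketch of the reservoir-sampling component (classic Algorithm R)
    applied to a stream of candidate edges; the activation condition and the
    informed search-set reorganization used by online edge grafting are
    abstracted into a placeholder predicate here.

        import random

        def reservoir_of_activatable_edges(edge_stream, satisfies_activation, k):
            """Keep a uniform sample of size k over edges that pass the activation test."""
            reservoir, seen = [], 0
            for edge in edge_stream:
                if not satisfies_activation(edge):
                    continue
                seen += 1
                if len(reservoir) < k:
                    reservoir.append(edge)
                else:
                    j = random.randrange(seen)     # classic Algorithm R replacement rule
                    if j < k:
                        reservoir[j] = edge
            return reservoir

        random.seed(0)
        edges = [(i, j) for i in range(20) for j in range(i + 1, 20)]
        sample = reservoir_of_activatable_edges(iter(edges), lambda e: sum(e) % 3 == 0, k=8)
        print(sample)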

    Principled Hybrids of Generative and Discriminative Domain Adaptation

    Han Zhao, Zhenyao Zhu, Junjie Hu, Adam Coates, Geoff Gordon
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    We propose a probabilistic framework for domain adaptation that blends both
    generative and discriminative modeling in a principled way. By maximizing both
    the marginal and the conditional log-likelihoods, models derived from this
    framework can use both labeled instances from the source domain as well as
    unlabeled instances from both source and target domains. Under this framework,
    we show that the popular reconstruction loss of autoencoder corresponds to an
    upper bound of the negative marginal log-likelihoods of unlabeled instances,
    where marginal distributions are given by proper kernel density estimations.
    This provides a way to interpret the empirical success of autoencoders in
    domain adaptation and semi-supervised learning. We instantiate our framework
    using neural networks, and build a concrete model, DAuto. Empirically, we
    demonstrate the effectiveness of DAuto on text, image and speech datasets,
    showing that it outperforms related competitors when domain adaptation is
    possible.

    Modeling The Intensity Function Of Point Process Via Recurrent Neural Networks

    Shuai Xiao, Junchi Yan, Stephen M. Chu, Xiaokang Yang, Hongyuan Zha
    Comments: Accepted at Thirty-First AAAI Conference on Artificial Intelligence (AAAI17)
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Event sequences, asynchronously generated with random timestamps, are ubiquitous
    among applications. The precise and arbitrary timestamps can carry important
    clues about the underlying dynamics, and they make event data fundamentally
    different from time series, in which the series is indexed at fixed and equal
    time intervals. One expressive mathematical tool for modeling events is the
    point process. The intensity functions of many point processes involve two
    components: the background and the effect of the history. Due to its inherent
    spontaneousness, the background can be treated as a time series, while the
    other component needs to handle the history of events. In this paper, we model
    the background by a
    Recurrent Neural Network (RNN) with its units aligned with time series indexes
    while the history effect is modeled by another RNN whose units are aligned with
    asynchronous events to capture the long-range dynamics. The whole model with
    event type and timestamp prediction output layers can be trained end-to-end.
    Our approach takes an RNN perspective to point process, and models its
    background and history effect. For utility, our method allows a black-box
    treatment for modeling the intensity, which is often a pre-defined parametric
    form in point processes. Meanwhile, end-to-end training opens the avenue for
    reusing rich existing techniques in deep networks for point process modeling. We
    apply our model to the predictive maintenance problem using a log dataset from
    more than 1,000 ATMs of a global bank headquartered in North America.
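
    A conceptual sketch of the two-RNN design described above, with one
    recurrence over the evenly indexed series and another over asynchronous
    events, combined to produce event-type and time-gap predictions; cell sizes,
    input features, and the output heads are assumptions for illustration only.

        import numpy as np

        rng = np.random.default_rng(0)
        d_in, d_h, n_types = 4, 8, 3

        def make_cell():
            return {"Wx": rng.standard_normal((d_in, d_h)) * 0.1,
                    "Wh": rng.standard_normal((d_h, d_h)) * 0.1}

        def rnn_step(cell, x, h):
            return np.tanh(x @ cell["Wx"] + h @ cell["Wh"])

        bg_cell, ev_cell = make_cell(), make_cell()
        W_type = rng.standard_normal((2 * d_h, n_types)) * 0.1
        w_time = rng.standard_normal(2 * d_h) * 0.1

        h_bg = h_ev = np.zeros(d_h)
        for x_series in rng.standard_normal((10, d_in)):    # evenly spaced series features
            h_bg = rnn_step(bg_cell, x_series, h_bg)
        for x_event in rng.standard_normal((5, d_in)):       # asynchronous event features
            h_ev = rnn_step(ev_cell, x_event, h_ev)

        h = np.concatenate([h_bg, h_ev])                      # fuse background + history
        type_logits = h @ W_type
        next_gap = np.exp(h @ w_time)                         # positive time to next event
        print("type probs:", np.exp(type_logits) / np.exp(type_logits).sum(),
              "gap:", next_gap)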

    Compiling Quantum Circuits to Realistic Hardware Architectures using Temporal Planners

    Davide Venturelli, Minh Do, Eleanor Rieffel, Jeremy Frank
    Journal-ref: related to proceedings of IJCAI 2017, and ICAPS SPARK Workshop
    2017
    Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Systems and Control (cs.SY)

    To run quantum algorithms on emerging gate-model quantum hardware, quantum
    circuits must be compiled to take into account constraints on the hardware. For
    near-term hardware, with only limited means to mitigate decoherence, it is
    critical to minimize the duration of the circuit. We investigate the
    application of temporal planners to the problem of compiling quantum circuits
    to newly emerging quantum hardware. While our approach is general, we focus on
    compiling to superconducting hardware architectures with nearest neighbor
    constraints. Our initial experiments focus on compiling Quantum Approximate
    Optimization Algorithm (QAOA) circuits whose high number of commuting gates
    allow great flexibility in the order in which the gates can be applied. That
    freedom makes it more challenging to find optimal compilations but also means
    there is a greater potential win from more optimized compilation than for less
    flexible circuits. We map this quantum circuit compilation problem to a
    temporal planning problem, and generate a test suite of compilation problems
    for QAOA circuits of various sizes targeting a realistic hardware architecture.
    We report compilation results from several state-of-the-art temporal planners
    on this test set. This early empirical evaluation demonstrates that temporal
    planning is a viable approach to quantum circuit compilation.

    Beyond Parity: Fairness Objectives for Collaborative Filtering

    Sirui Yao, Bert Huang
    Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    We study fairness in collaborative-filtering recommender systems, which are
    sensitive to discrimination that exists in historical data. Biased data can
    lead collaborative-filtering methods to make unfair predictions for users from
    minority groups. We identify the insufficiency of existing fairness metrics and
    propose four new metrics that address different forms of unfairness. These
    fairness metrics can be optimized by adding fairness terms to the learning
    objective. Experiments on synthetic and real data show that our new metrics can
    better measure fairness than the baseline, and that the fairness objectives
    effectively help reduce unfairness.
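
    As a sketch of the general recipe of adding a fairness term to the learning
    objective, the snippet below evaluates a matrix-factorization loss plus a
    group error-gap penalty; the four metrics proposed in the paper are defined
    differently, and this penalty, the data, and the weighting are only
    illustrative stand-ins.

        import numpy as np

        rng = np.random.default_rng(0)
        n_users, n_items, k, lam = 60, 40, 5, 0.5
        R = rng.integers(1, 6, size=(n_users, n_items)).astype(float)   # toy ratings
        observed = rng.random((n_users, n_items)) < 0.2                 # observation mask
        group = rng.random(n_users) < 0.3            # minority-group indicator per user

        U = rng.standard_normal((n_users, k)) * 0.1
        V = rng.standard_normal((n_items, k)) * 0.1

        def objective(U, V):
            err = (U @ V.T - R) * observed
            sq = err ** 2
            # gap between mean squared error of the two user groups
            group_gap = abs(sq[group].sum() / observed[group].sum()
                            - sq[~group].sum() / observed[~group].sum())
            return sq.sum() / observed.sum() + lam * group_gap

        print("initial objective:", round(objective(U, V), 3))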


    Computation and Language

    Learning Structured Text Representations

    Yang Liu, Mirella Lapata
    Subjects: Computation and Language (cs.CL)

    In this paper, we focus on learning structure-aware document representations
    from data without recourse to a discourse parser or additional annotations.
    Drawing inspiration from recent efforts to empower neural networks with a
    structural bias, we propose a model that can encode a document while
    automatically inducing rich structural dependencies. Specifically, we embed a
    differentiable non-projective parsing algorithm into a neural model and use
    attention mechanisms to incorporate the structural biases. Experimental
    evaluation across different tasks and datasets shows that the proposed model
    achieves state-of-the-art results on document modeling tasks while inducing
    intermediate structures which are both interpretable and meaningful.

    Jointly Learning Sentence Embeddings and Syntax with Unsupervised Tree-LSTMs

    Jean Maillard, Stephen Clark, Dani Yogatama
    Subjects: Computation and Language (cs.CL)

    We introduce a neural network that represents sentences by composing their
    words according to induced binary parse trees. We use Tree-LSTM as our
    composition function, applied along a tree structure found by a fully
    differentiable natural language chart parser. Our model simultaneously
    optimises both the composition function and the parser, thus eliminating the
    need for externally-provided parse trees which are normally required for
    Tree-LSTM. It can therefore be seen as a tree-based RNN that is unsupervised
    with respect to the parse trees. As it is fully differentiable, our model is
    easily trained with an off-the-shelf gradient descent method and
    backpropagation. We demonstrate that it achieves better performance compared to
    various supervised Tree-LSTM architectures on a textual entailment task and a
    reverse dictionary task.

    Max-Cosine Matching Based Neural Models for Recognizing Textual Entailment

    Zhipeng Xie, Junfeng Hu
    Journal-ref: DASFAA (1) 2017: 295-308
    Subjects: Computation and Language (cs.CL)

    Recognizing textual entailment is a fundamental task in a variety of text
    mining or natural language processing applications. This paper proposes a
    simple neural model for the RTE problem. It first matches each word in the
    hypothesis with its most-similar word in the premise, producing an augmented
    representation of the hypothesis conditioned on the premise as a sequence of
    word pairs. The LSTM model is then used to model this augmented sequence, and
    the final output from the LSTM is fed into a softmax layer to make the
    prediction. Besides the base model, in order to enhance its performance, we
    also propose three techniques: the integration of multiple word-embedding
    libraries, bi-way integration, and an ensemble based on model averaging.
    Experimental results on the SNLI dataset have shown that the three techniques
    are effective in boosting the predictive accuracy and that our method
    outperforms several state-of-the-art ones.
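
    A short sketch of the max-cosine matching step (the downstream LSTM and
    softmax layers are omitted): each hypothesis word is paired with its most
    similar premise word by cosine similarity, yielding the augmented sequence of
    word pairs. The embedding dimension and lengths are arbitrary for illustration.

        import numpy as np

        def max_cosine_matching(premise_vecs, hypothesis_vecs):
            p = premise_vecs / np.linalg.norm(premise_vecs, axis=1, keepdims=True)
            h = hypothesis_vecs / np.linalg.norm(hypothesis_vecs, axis=1, keepdims=True)
            sims = h @ p.T                      # (len_hyp, len_prem) cosine similarities
            best = sims.argmax(axis=1)          # most similar premise word per hypothesis word
            # augmented representation: concatenate each hypothesis word with its match
            return np.concatenate([hypothesis_vecs, premise_vecs[best]], axis=1)

        rng = np.random.default_rng(0)
        premise = rng.standard_normal((7, 50))       # 7 premise word embeddings
        hypothesis = rng.standard_normal((4, 50))    # 4 hypothesis word embeddings
        pairs = max_cosine_matching(premise, hypothesis)
        print(pairs.shape)                           # (4, 100): one word pair per hypothesis word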

    Deep Voice 2: Multi-Speaker Neural Text-to-Speech

    Sercan Arik, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou
    Comments: Submitted to NIPS 2017
    Subjects: Computation and Language (cs.CL)

    We introduce a technique for augmenting neural text-to-speech (TTS) with
    low-dimensional trainable speaker embeddings to generate different voices from a
    single model. As a starting point, we show improvements over the two
    state-of-the-art approaches for single-speaker neural TTS: Deep Voice 1 and
    Tacotron. We introduce Deep Voice 2, which is based on a pipeline similar to
    Deep Voice 1, but constructed with higher-performance building blocks and
    demonstrates a significant audio quality improvement over Deep Voice 1. We
    improve Tacotron by introducing a post-processing neural vocoder, and
    demonstrate a significant audio quality improvement. We then demonstrate our
    technique for multi-speaker speech synthesis for both Deep Voice 2 and Tacotron
    on two multi-speaker TTS datasets. We show that a single neural TTS system can
    learn hundreds of unique voices from less than half an hour of data per
    speaker, while achieving high audio quality synthesis and preserving the
    speaker identities almost perfectly.

    Joint PoS Tagging and Stemming for Agglutinative Languages

    Necva Bölücü, Burcu Can
    Comments: 12 pages with 3 figures, accepted and presented at the CICLING 2017 – 18th International Conference on Intelligent Text Processing and Computational Linguistics
    Journal-ref: CICLING 2017
    Subjects: Computation and Language (cs.CL)

    The number of word forms in agglutinative languages is theoretically infinite
    and this variety in word forms introduces sparsity in many natural language
    processing tasks. Part-of-speech tagging (PoS tagging) is one of these tasks
    that often suffers from sparsity. In this paper, we present an unsupervised
    Bayesian model using Hidden Markov Models (HMMs) for joint PoS tagging and
    stemming for agglutinative languages. We use stemming to reduce sparsity in PoS
    tagging. Two tasks are jointly performed to provide a mutual benefit in both
    tasks. Our results show that joint POS tagging and stemming improves PoS
    tagging scores. We present results for Turkish and Finnish as agglutinative
    languages and English as a morphologically poor language.

    Towards a Knowledge Graph based Speech Interface

    Ashwini Jaya Kumar, Sören Auer, Christoph Schmidt, Joachim Köhler
    Comments: Under Review in International Workshop on Grounding Language Understanding, Satellite of Interspeech 2017
    Subjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL)

    Applications which use human speech as an input require a speech interface
    with high recognition accuracy. The words or phrases in the recognised text are
    annotated with a machine-understandable meaning and linked to knowledge graphs
    for further processing by the target application. These semantic annotations of
    recognised words can be represented as subject-predicate-object triples, which
    collectively form a graph often referred to as a knowledge graph. This type of
    knowledge representation facilitates the use of speech interfaces with any spoken
    input application, since the information is represented in a logical, semantic
    form, and retrieval and storage can be performed using standard web query
    languages. In this work, we develop a methodology for linking speech input to
    knowledge graphs and study the impact of recognition errors in the overall
    process. We show that for a corpus with lower WER, the annotation and linking
    of entities to the DBpedia knowledge graph is considerable. DBpedia Spotlight,
    a tool to interlink text documents with linked open data, is used to link
    the speech recognition output to the DBpedia knowledge graph. Such a
    knowledge-based speech recognition interface is useful for applications such as
    question answering or spoken dialog systems.

    Deriving Neural Architectures from Sequence and Graph Kernels

    Tao Lei, Wengong Jin, Regina Barzilay, Tommi Jaakkola
    Comments: to appear at ICML 2017; includes additional discussions
    Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL)

    The design of neural architectures for structured objects is typically guided
    by experimental insights rather than a formal process. In this work, we appeal
    to kernels over combinatorial structures, such as sequences and graphs, to
    derive appropriate neural operations. We introduce a class of deep recurrent
    neural operations and formally characterize their associated kernel spaces. Our
    recurrent modules compare the input to virtual reference objects (cf. filters
    in CNN) via the kernels. Similar to traditional neural operations, these
    reference objects are parameterized and directly optimized in end-to-end
    training. We empirically evaluate the proposed class of neural architectures on
    standard applications such as language modeling and molecular graph regression,
    achieving state-of-the-art or competitive results across these applications. We
    also draw connections to existing architectures such as LSTMs.

    Matroids Hitting Sets and Unsupervised Dependency Grammar Induction

    Nicholas Harvey, David Karger, Virginia Savova, Leonid Peshkin
    Subjects: Discrete Mathematics (cs.DM); Computation and Language (cs.CL); Data Structures and Algorithms (cs.DS)

    This paper formulates a novel problem on graphs: find the minimal subset of
    edges in a fully connected graph, such that the resulting graph contains all
    spanning trees for a set of specified sub-graphs. This formulation is motivated
    by an unsupervised grammar induction problem from computational linguistics.
    We present a reduction to some known problems and algorithms from graph theory,
    provide computational complexity results, and describe an approximation
    algorithm.


    Distributed, Parallel, and Cluster Computing

    Is Our Model for Contention Resolution Wrong?

    William C. Cooper, Maxwell Young
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

    Randomized binary exponential backoff (BEB) is a popular algorithm for
    coordinating access to a shared channel. With an operational history exceeding
    four decades, BEB is currently an important component of several wireless
    standards.

    Despite this track record, prior theoretical results indicate that under
    bursty traffic (1) BEB yields poor makespan and (2) superior algorithms are
    possible. To date, the degree to which these findings manifest in practice has
    not been resolved.

    To address this issue, we examine one of the strongest cases against BEB: n
    packets that simultaneously begin contending for the wireless channel. Using
    Network Simulator 3, we compare against more recent algorithms that are
    inspired by BEB, but whose makespan guarantees are superior. Surprisingly, we
    discover that these newer algorithms significantly underperform.

    Through further investigation, we identify as the culprit a flawed but common
    abstraction regarding the cost of collisions. Our experimental results are
    complemented by analytical arguments that the number of collisions — and not
    solely makespan — is an important metric to optimize. We believe that these
    findings have implications for the design of contention-resolution algorithms.
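
    A toy slotted-time simulation of randomized binary exponential backoff for n
    packets that start contending simultaneously, reporting both makespan and the
    total number of collisions, the metric the authors argue should not be
    ignored; the backoff details below follow the textbook scheme, not the exact
    model analyzed in the paper.

        import random

        def simulate_beb(n, seed=0):
            rng = random.Random(seed)
            windows = {i: 1 for i in range(n)}      # current backoff window per packet
            next_try = {i: 0 for i in range(n)}     # slot of each packet's next attempt
            slot = collisions = 0
            while next_try:
                attempts = [p for p, t in next_try.items() if t == slot]
                if len(attempts) == 1:               # exactly one sender: success
                    del next_try[attempts[0]]
                elif len(attempts) > 1:              # collision: everyone backs off
                    collisions += len(attempts)
                    for p in attempts:
                        windows[p] *= 2              # double the backoff window
                        next_try[p] = slot + 1 + rng.randrange(windows[p])
                slot += 1
            return slot, collisions

        makespan, collisions = simulate_beb(n=100)
        print("makespan:", makespan, "collisions:", collisions)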

    Load Balancing for Skewed Streams on Heterogeneous Cluster

    Muhammad Anis Uddin Nasir, Hiroshi Horii, Marco Serafini, Nicolas Kourtellis, Rudy Raymond, Sarunas Girdzijauskas, Takayuki Osogami
    Comments: 13 pages, under submission
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Primitive partitioning strategies for streaming applications operate
    efficiently under two very strict assumptions: the resources are homogeneous
    and the messages are drawn from a uniform key distribution. These assumptions
    are often not true in real-world use cases. Dealing with heterogeneity and
    non-uniform workload requires inferring the resource capacities and input
    distribution at run time. However, gathering these statistics and finding an
    optimal placement often become a challenge when microsecond latency is desired.
    In this paper, we address the load balancing problem for streaming engines
    running on a heterogeneous cluster and processing skewed workload. In doing so,
    we propose a novel partitioning strategy called Consistent Grouping (CG) that
    is inspired by traditional consistent hashing. CG is a lightweight distributed
    strategy that enables each processing element instance (PEI) to process the
    workload according to its capacity. The main idea behind CG is the notion of
    equal-sized virtual workers at the sources, which are assigned to workers based
    on their capacities. We provide a theoretical analysis of the proposed
    algorithm and show via extensive empirical evaluation that the proposed scheme
    outperforms the state-of-the-art approaches. In particular, CG achieves 3.44x
    superior performance in terms of latency compared to key grouping, which is the
    state-of-the-art grouping strategy for stateful streaming applications.
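
    The paper's Consistent Grouping strategy is more elaborate, but its core ingredient,
    equal-sized virtual workers assigned to physical workers in proportion to capacity,
    can be sketched with a plain consistent-hashing ring; all names and parameters below
    are illustrative assumptions.

    import bisect
    import hashlib

    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    class VirtualWorkerRing:
        """Toy consistent-hashing ring: each physical worker receives a number of
        equal-sized virtual workers proportional to its capacity."""
        def __init__(self, capacities, slots_per_unit=8):
            self.ring = []                    # sorted (hash, worker) points on the ring
            for worker, cap in capacities.items():
                for v in range(int(cap * slots_per_unit)):
                    self.ring.append((_hash(f"{worker}#{v}"), worker))
            self.ring.sort()
            self.keys = [h for h, _ in self.ring]

        def route(self, key):
            h = _hash(str(key))
            i = bisect.bisect(self.keys, h) % len(self.ring)
            return self.ring[i][1]

    if __name__ == "__main__":
        ring = VirtualWorkerRing({"w_fast": 2.0, "w_slow": 1.0})
        load = {}
        for k in range(10000):
            w = ring.route(k)
            load[w] = load.get(w, 0) + 1
        print(load)   # roughly a 2:1 split between the fast and the slow worker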

    Triangle Finding and Listing in CONGEST Networks

    Taisuke Izumi, François Le Gall
    Comments: To appear in PODC 2017
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

    Triangle-free graphs play a central role in graph theory, and triangle
    detection (or triangle finding) as well as triangle enumeration (triangle
    listing) play central roles in the field of graph algorithms. In distributed
    computing, algorithms with sublinear round complexity for triangle finding and
    listing have recently been developed in the powerful CONGEST clique model,
    where communication is allowed between any two nodes of the network. In this
    paper we present the first algorithms with sublinear complexity for triangle
    finding and triangle listing in the standard CONGEST model, where the
    communication topology is the same as the topology of the network. More
    precisely, we give randomized algorithms for triangle finding and listing with
    round complexity (O(n^{2/3}(log n)^{2/3})) and (O(n^{3/4}log n)),
    respectively, where (n) denotes the number of nodes of the network. We also
    show a lower bound (Omega(n^{1/3}/log n)) on the round complexity of triangle
    listing, which also holds for the CONGEST clique model.
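
    As a point of reference only, the following is a standard centralized
    triangle-listing routine over an adjacency structure; the sublinear-round CONGEST
    algorithms of the paper are, of course, entirely different in nature.

    def list_triangles(edges):
        """List all triangles (u, v, w) with u < v < w in an undirected graph."""
        adj = {}
        for u, v in edges:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        triangles = []
        for u in adj:
            for v in adj[u]:
                if v <= u:
                    continue
                # common neighbours of u and v larger than v close a triangle
                for w in adj[u] & adj[v]:
                    if w > v:
                        triangles.append((u, v, w))
        return triangles

    if __name__ == "__main__":
        print(list_triangles([(0, 1), (1, 2), (0, 2), (2, 3), (1, 3)]))
        # [(0, 1, 2), (1, 2, 3)]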

    Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent

    Xiangru Lian, Ce Zhang, Huan Zhang, Cho-Jio Hsieh, Wei Zhang, Ji Liu
    Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Machine Learning (stat.ML)

    Most distributed machine learning systems nowadays, including TensorFlow and
    CNTK, are built in a centralized fashion. One bottleneck of centralized
    algorithms lies in the high communication cost on the central node. Motivated by
    this, we ask, can decentralized algorithms be faster than their centralized
    counterparts?

    Although decentralized PSGD (D-PSGD) algorithms have been studied by the
    control community, existing analysis and theory do not show any advantage over
    centralized PSGD (C-PSGD) algorithms, simply assuming the application scenario
    where only the decentralized network is available. In this paper, we study a
    D-PSGD algorithm and provide the first theoretical analysis that indicates a
    regime in which decentralized algorithms might outperform centralized
    algorithms for distributed stochastic gradient descent. This is because D-PSGD
    has comparable total computational complexities to C-PSGD but requires much
    less communication cost on the busiest node. We further conduct an empirical
    study to validate our theoretical analysis across multiple frameworks (CNTK and
    Torch), different network configurations, and computation platforms up to 112
    GPUs. On network configurations with low bandwidth or high latency, D-PSGD can
    be up to one order of magnitude faster than its well-optimized centralized
    counterparts.
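
    A minimal numpy sketch of the decentralized flavour discussed here: every node
    averages its model with its two ring neighbours and then takes a local
    stochastic-gradient step on its own data shard. The least-squares problem, the
    uniform mixing weights, and the hyper-parameters are purely illustrative
    assumptions, not the paper's setup.

    import numpy as np

    def decentralized_sgd(features, targets, n_nodes=4, steps=200, lr=0.05, seed=0):
        """Toy D-PSGD on a ring: nodes hold disjoint shards of a least-squares problem."""
        rng = np.random.default_rng(seed)
        shards = np.array_split(np.arange(len(targets)), n_nodes)
        d = features.shape[1]
        x = np.zeros((n_nodes, d))                       # one model copy per node
        for _ in range(steps):
            # gossip averaging with ring neighbours (uniform mixing weights)
            x = (x + np.roll(x, 1, axis=0) + np.roll(x, -1, axis=0)) / 3.0
            # local stochastic gradient step on a random sample from each node's shard
            for i in range(n_nodes):
                j = rng.choice(shards[i])
                grad = (features[j] @ x[i] - targets[j]) * features[j]
                x[i] -= lr * grad
        return x.mean(axis=0)

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        A = rng.normal(size=(400, 5))
        w_true = rng.normal(size=5)
        b = A @ w_true + 0.01 * rng.normal(size=400)
        print(np.linalg.norm(decentralized_sgd(A, b) - w_true))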


    Learning

    Gated XNOR Networks: Deep Neural Networks with Ternary Weights and Activations under a Unified Discretization Framework

    Lei Deng, Peng Jiao, Jing Pei, Zhenzhi Wu, Guoqi Li
    Comments: 9 pages, 10 figures
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    There is a pressing need to build an architecture that could subsume binary and
    ternary networks under a unified framework that achieves both higher performance
    and less overhead. To this end, two fundamental issues are yet to be addressed.
    The first one is how to implement the back propagation when neuronal activations
    are discrete. The second one is how to remove the full-precision hidden weights
    in the training phase to break the bottlenecks of memory/computation
    consumption. To address the first issue, we present a multistep neuronal
    activation discretization method and a derivative approximation technique that
    enable implementing the back propagation algorithm on discrete DNNs. For the
    second issue, we propose a discrete state transition (DST) methodology to
    constrain the weights in a discrete space without saving the hidden weights. In
    this way, we build a unified framework that subsumes the binary or ternary
    networks as its special cases. More particularly, we find that when both the
    weights and activations become ternary values, the DNNs can be reduced to gated
    XNOR networks (or sparse binary networks), since only the event of non-zero
    weight and non-zero activation enables the control gate to start the XNOR logic
    operations in the original binary networks. This promises event-driven hardware
    design for efficient mobile intelligence. We achieve advanced performance
    compared with state-of-the-art algorithms. Furthermore, the computational
    sparsity and the number of states in the discrete space can be flexibly modified
    to make it suitable for various hardware platforms.
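
    The sketch below only illustrates the generic flavour of ternary weights with a
    straight-through gradient estimator; it is not the paper's DST method (which,
    notably, avoids keeping the full-precision copy that this toy keeps). The
    threshold and the toy regression task are assumptions for illustration.

    import numpy as np

    def ternarize(w, threshold=0.05):
        """Map real-valued weights to {-1, 0, +1} with a dead zone around zero."""
        q = np.zeros_like(w)
        q[w > threshold] = 1.0
        q[w < -threshold] = -1.0
        return q

    def train_step(w_fp, x, y_true, lr=0.1):
        """One step of a toy linear regression whose forward pass uses ternary weights.
        Gradients flow 'straight through' the quantizer to a full-precision copy."""
        y = x @ ternarize(w_fp)                  # forward pass with ternary weights
        grad_y = 2.0 * (y - y_true) / len(y)
        grad_w = x.T @ grad_y                    # straight-through estimator
        return w_fp - lr * grad_w

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        x = rng.normal(size=(64, 8))
        w_star = ternarize(rng.normal(size=8), threshold=0.5)
        y = x @ w_star
        w = 0.1 * rng.normal(size=8)
        for _ in range(300):
            w = train_step(w, x, y)
        print("loss:", np.mean((x @ ternarize(w) - y) ** 2))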

    Filtering Variational Objectives

    Chris J. Maddison, Dieterich Lawson, George Tucker, Nicolas Heess, Mohammad Norouzi, Andriy Mnih, Arnaud Doucet, Yee Whye Teh
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    The evidence lower bound (ELBO) appears in many algorithms for maximum
    likelihood estimation (MLE) with latent variables because it is a sharp lower
    bound of the marginal log-likelihood. For neural latent variable models,
    optimizing the ELBO jointly in the variational posterior and model parameters
    produces state-of-the-art results. Inspired by the success of the ELBO as a
    surrogate MLE objective, we consider the extension of the ELBO to a family of
    lower bounds defined by a Monte Carlo estimator of the marginal likelihood. We
    show that the tightness of such bounds is asymptotically related to the
    variance of the underlying estimator. We introduce a special case, the
    filtering variational objectives (FIVOs), which takes the same arguments as the
    ELBO and passes them through a particle filter to form a tighter bound. FIVOs
    can be optimized tractably with stochastic gradients, and are particularly
    suited to MLE in sequential latent variable models. In standard sequential
    generative modeling tasks we present uniform improvements over models trained
    with ELBO, including some whole nat-per-timestep improvements.
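
    The bound construction can be made concrete with a bootstrap particle filter: the
    filter's likelihood estimate is unbiased, so by Jensen's inequality its expected
    logarithm lower-bounds the marginal log-likelihood, which is the kind of objective
    a FIVO-style bound tightens. The 1-D linear-Gaussian model and all parameters
    below are illustrative assumptions.

    import numpy as np

    def particle_filter_loglik(obs, n_particles=100, a=0.9, q=1.0, r=0.5, seed=0):
        """Bootstrap particle filter estimate of log p(obs) for the model
        z_t = a * z_{t-1} + N(0, q),  x_t = z_t + N(0, r)."""
        rng = np.random.default_rng(seed)
        z = rng.normal(0.0, np.sqrt(q), size=n_particles)
        loglik = 0.0
        for x_t in obs:
            # weight particles by the observation likelihood N(x_t; z, r)
            logw = -0.5 * ((x_t - z) ** 2 / r + np.log(2 * np.pi * r))
            m = logw.max()
            w = np.exp(logw - m)
            loglik += m + np.log(w.mean())          # running log-likelihood estimate
            # resample and propagate through the transition model
            idx = rng.choice(n_particles, size=n_particles, p=w / w.sum())
            z = a * z[idx] + rng.normal(0.0, np.sqrt(q), size=n_particles)
        return loglik

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        zs, xs = [rng.normal()], []
        for _ in range(50):
            zs.append(0.9 * zs[-1] + rng.normal())
            xs.append(zs[-1] + rng.normal(0, np.sqrt(0.5)))
        print(particle_filter_loglik(np.array(xs)))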

    Geometric Methods for Robust Data Analysis in High Dimension

    Joseph Anderson
    Comments: 180 Pages, 7 Figures, PhD thesis, Ohio State (2017)
    Subjects: Learning (cs.LG)

    Machine learning and data analysis now find both scientific and industrial
    application in biology, chemistry, geology, medicine, and physics. These
    applications rely on large quantities of data gathered from automated sensors
    and user input. Furthermore, the dimensionality of many datasets is extreme:
    more details are being gathered about single user interactions or sensor
    readings. All of these applications encounter problems with a common theme: use
    observed data to make inferences about the world. Our work obtains the first
    provably efficient algorithms for Independent Component Analysis (ICA) in the
    presence of heavy-tailed data. The main tool in this result is the centroid
    body (a well-known topic in convex geometry), along with optimization and
    random walks for sampling from a convex body. This is the first algorithmic use
    of the centroid body and it is of independent theoretical interest, since it
    effectively replaces the estimation of covariance from samples, and is more
    generally accessible.

    The second part of this work uses ICA as a primitive for learning a simplex from
    uniformly distributed samples. This reduction relies on a non-linear
    transformation of samples from such an
    intersection of halfspaces (i.e. a simplex) to samples which are approximately
    from a linearly transformed product distribution. Through this transformation
    of samples, which can be done efficiently, one can then use an ICA algorithm to
    recover the vertices of the intersection of halfspaces.

    Finally, we again use ICA as an algorithmic primitive to construct an
    efficient solution to the widely-studied problem of learning the parameters of
    a Gaussian mixture model. Our algorithm again transforms samples from a
    Gaussian mixture model into samples which fit into the ICA model and, when
    processed by an ICA algorithm, result in recovery of the mixture parameters.
    Our algorithm is effective even when the number of Gaussians in the mixture
    grows polynomially with the ambient dimension.

    The cost of fairness in classification

    Aditya Krishna Menon, Robert C. Williamson
    Subjects: Learning (cs.LG)

    We study the problem of learning classifiers with a fairness constraint, with
    three main contributions towards the goal of quantifying the problem’s inherent
    tradeoffs. First, we relate two existing fairness measures to cost-sensitive
    risks. Second, we show that for cost-sensitive classification and fairness
    measures, the optimal classifier is an instance-dependent thresholding of the
    class-probability function. Third, we show how the tradeoff between accuracy
    and fairness is determined by the alignment between the class-probabilities for
    the target and sensitive features. Underpinning our analysis is a general
    framework that casts the problem of learning with a fairness requirement as one
    of minimising the difference of two statistical risks.
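
    The thresholding result has a familiar unconstrained analogue that is easy to
    state in code: for a cost-sensitive risk with false-positive cost c and
    false-negative cost 1-c, the Bayes-optimal rule predicts positive exactly when the
    class probability exceeds c. The logistic model below is an illustrative stand-in
    for the class-probability function, not the paper's construction.

    import numpy as np

    def cost_sensitive_predict(eta, c):
        """Bayes-optimal rule for a cost-sensitive risk: predict 1 iff eta(x) > c,
        where c in (0, 1) is the normalized cost of a false positive."""
        return (eta > c).astype(int)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        x = rng.normal(size=1000)
        eta = 1.0 / (1.0 + np.exp(-2.0 * x))          # illustrative class-probability
        y = (rng.uniform(size=1000) < eta).astype(int)
        for c in (0.2, 0.5, 0.8):
            pred = cost_sensitive_predict(eta, c)
            fp = np.mean((pred == 1) & (y == 0))
            fn = np.mean((pred == 0) & (y == 1))
            print(f"c={c}: empirical cost = {c * fp + (1 - c) * fn:.3f}")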

    Online Edge Grafting for Efficient MRF Structure Learning

    Walid Chaabene, Bert Huang
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Incremental methods for structure learning of pairwise Markov random fields
    (MRFs), such as grafting, improve scalability to large systems by avoiding
    inference over the entire feature space in each optimization step. Instead,
    inference is performed over an incrementally grown active set of features. In
    this paper, we address the computational bottlenecks that current techniques
    still suffer by introducing online edge grafting, an incremental, structured
    method that activates edges as groups of features in a streaming setting. The
    framework is based on reservoir sampling of edges that satisfy a necessary
    activation condition, approximating the search for the optimal edge to
    activate. Online edge grafting performs an informed edge search set
    reorganization using search history and structure heuristics. Experiments show
    a significant computational speedup for structure learning and a controllable
    trade-off between the speed and the quality of learning.
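
    The activation-condition test and the structure heuristics are specific to the
    paper, but the reservoir-sampling primitive it builds on is standard; a minimal
    sketch (with a hypothetical stream of candidate MRF edges) follows.

    import random

    def reservoir_sample(stream, k, seed=0):
        """Keep a uniform sample of k items from a stream of unknown length."""
        rng = random.Random(seed)
        reservoir = []
        for i, item in enumerate(stream):
            if i < k:
                reservoir.append(item)
            else:
                j = rng.randrange(i + 1)     # replace an old item with probability k/(i+1)
                if j < k:
                    reservoir[j] = item
        return reservoir

    if __name__ == "__main__":
        # e.g. candidate edges of a pairwise MRF streaming past a structure learner
        candidate_edges = ((u, v) for u in range(100) for v in range(u + 1, 100))
        print(reservoir_sample(candidate_edges, 10))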

    Principled Hybrids of Generative and Discriminative Domain Adaptation

    Han Zhao, Zhenyao Zhu, Junjie Hu, Adam Coates, Geoff Gordon
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    We propose a probabilistic framework for domain adaptation that blends both
    generative and discriminative modeling in a principled way. By maximizing both
    the marginal and the conditional log-likelihoods, models derived from this
    framework can use both labeled instances from the source domain as well as
    unlabeled instances from both source and target domains. Under this framework,
    we show that the popular reconstruction loss of autoencoder corresponds to an
    upper bound of the negative marginal log-likelihoods of unlabeled instances,
    where marginal distributions are given by proper kernel density estimations.
    This provides a way to interpret the empirical success of autoencoders in
    domain adaptation and semi-supervised learning. We instantiate our framework
    using neural networks, and build a concrete model, DAuto. Empirically, we
    demonstrate the effectiveness of DAuto on text, image and speech datasets,
    showing that it outperforms related competitors when domain adaptation is
    possible.

    Approximation and Convergence Properties of Generative Adversarial Learning

    Shuang Liu, Olivier Bousquet, Kamalika Chaudhuri
    Subjects: Learning (cs.LG)

    Generative adversarial networks (GAN) approximate a target data distribution
    by jointly optimizing an objective function through a “two-player game” between
    a generator and a discriminator. Despite their empirical success, however, two
    very basic questions on how well they can approximate the target distribution
    remain unanswered. First, it is not known how restricting the discriminator
    family affects the approximation quality. Second, while a number of different
    objective functions have been proposed, we do not understand when convergence
    to the global minima of the objective function leads to convergence to the
    target distribution under various notions of distributional convergence.

    In this paper, we address these questions in a broad and unified setting by
    defining a notion of adversarial divergences that includes a number of recently
    proposed objective functions. We show that if the objective function is an
    adversarial divergence with some additional conditions, then using a restricted
    discriminator family has a moment-matching effect. Additionally, we show that
    for objective functions that are strict adversarial divergences, convergence in
    the objective function implies weak convergence, thus generalizing previous
    results.

    Modeling The Intensity Function Of Point Process Via Recurrent Neural Networks

    Shuai Xiao, Junchi Yan, Stephen M. Chu, Xiaokang Yang, Hongyuan Zha
    Comments: Accepted at Thirty-First AAAI Conference on Artificial Intelligence (AAAI17)
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Event sequences, asynchronously generated with random timestamps, are ubiquitous
    among applications. The precise and arbitrary timestamps carry important clues
    about the underlying dynamics, and make event data fundamentally different from
    time series, where the series is indexed at fixed and equal time intervals. One
    expressive mathematical tool for modeling events is the point process. The
    intensity functions of many point processes involve two components: the
    background and the effect of the history. Due to its inherent spontaneousness,
    the background can be treated as a time series, while the other needs to handle
    the history events. In this paper, we model the background by a
    Recurrent Neural Network (RNN) with its units aligned with time series indexes
    while the history effect is modeled by another RNN whose units are aligned with
    asynchronous events to capture the long-range dynamics. The whole model with
    event type and timestamp prediction output layers can be trained end-to-end.
    Our approach takes an RNN perspective to point process, and models its
    background and history effect. For utility, our method allows a black-box
    treatment for modeling the intensity which is often a pre-defined parametric
    form in point processes. Meanwhile, end-to-end training opens the avenue for
    reusing the rich existing techniques in deep networks for point process
    modeling. We apply our model to the predictive maintenance problem using a log
    dataset from more than 1000 ATMs of a global bank headquartered in North America.

    Optimal Cooperative Inference

    Scott Cheng-Hsin Yang, Yue Yu, Arash Givchi, Pei Wang, Wai Keen Vong, Patrick Shafto
    Comments: 14 pages (5 pages of Supplementary Material), 1 figure
    Subjects: Learning (cs.LG)

    Cooperative transmission of data fosters rapid accumulation of knowledge by
    efficiently combining experience across learners. Although well studied in
    human learning, there has been less attention to cooperative transmission of
    data in machine learning, and we consequently lack strong formal frameworks
    through which we may reason about the benefits and limitations of cooperative
    inference. We present such a framework. We introduce a novel index for
    measuring the effectiveness of probabilistic information transmission, and
    cooperative information transmission specifically. We relate our cooperative
    index to previous measures of teaching in deterministic settings. We prove
    conditions under which optimal cooperative inference can be achieved, including
    a representation theorem which constrains the form of inductive biases for
    learners optimized for cooperative inference. We conclude by demonstrating how
    these principles may inform the design of machine learning algorithms and
    discuss implications for human learning, machine learning, and human-machine
    learning systems.

    Exploring the Regularity of Sparse Structure in Convolutional Neural Networks

    Huizi Mao, Song Han, Jeff Pool, Wenshuo Li, Xingyu Liu, Yu Wang, William J. Dally
    Comments: submitted to NIPS 2017
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Sparsity helps reduce the computational complexity of deep neural networks by
    skipping zeros. Taking advantage of sparsity is listed as a high priority in
    next generation DNN accelerators such as TPU. The structure of sparsity, i.e.,
    the granularity of pruning, affects the efficiency of hardware accelerator
    design as well as the prediction accuracy. Coarse-grained pruning creates
    regular sparsity patterns, making it more amenable for hardware acceleration
    but more challenging to maintain the same accuracy. In this paper we
    quantitatively measure the trade-off between sparsity regularity and prediction
    accuracy, providing insights into how to maintain accuracy while having a more
    structured sparsity pattern. Our experimental results show that
    coarse-grained pruning can achieve a sparsity ratio similar to unstructured
    pruning without loss of accuracy. Moreover, due to the index saving effect,
    coarse-grained pruning is able to obtain a better compression ratio than
    fine-grained sparsity at the same accuracy threshold. Based on the recent
    sparse convolutional neural network accelerator (SCNN), our experiments further
    demonstrate that coarse-grained sparsity saves about 2x the memory references
    compared to fine-grained sparsity. Since memory reference is more than two
    orders of magnitude more expensive than arithmetic operations, the regularity
    of sparse structure leads to more efficient hardware design.
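
    A tiny numpy illustration of the fine-grained versus coarse-grained distinction:
    both routines below prune to the same target sparsity, one per element and one per
    square block. The block size and the L2 scoring rule are illustrative choices, not
    those of the paper.

    import numpy as np

    def prune_fine(w, sparsity):
        """Zero out the smallest-magnitude individual weights."""
        k = int(sparsity * w.size)
        thresh = np.sort(np.abs(w), axis=None)[k]
        return np.where(np.abs(w) < thresh, 0.0, w)

    def prune_coarse(w, sparsity, block=4):
        """Zero out whole (block x block) tiles with the smallest L2 norm."""
        h, v = w.shape[0] // block, w.shape[1] // block
        tiles = w[:h * block, :v * block].reshape(h, block, v, block)
        norms = np.sqrt((tiles ** 2).sum(axis=(1, 3)))
        k = int(sparsity * norms.size)
        thresh = np.sort(norms, axis=None)[k]
        mask = (norms >= thresh)[:, None, :, None]
        out = w.copy()
        out[:h * block, :v * block] = (tiles * mask).reshape(h * block, v * block)
        return out

    if __name__ == "__main__":
        w = np.random.default_rng(0).normal(size=(16, 16))
        for name, pruned in [("fine", prune_fine(w, 0.75)),
                             ("coarse", prune_coarse(w, 0.75))]:
            print(name, "sparsity:", np.mean(pruned == 0))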

    Unsupervised Learning Layers for Video Analysis

    Liang Zhao, Yang Wang, Yi Yang, Wei Xu
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    This paper presents two unsupervised learning layers (UL layers) for
    label-free video analysis: one for fully connected layers, and the other for
    convolutional ones. The proposed UL layers can play two roles: they can be the
    cost function layer for providing global training signal; meanwhile they can be
    added to any regular neural network layers for providing local training signals
    and combined with the training signals backpropagated from upper layers for
    extracting both slow and fast changing features at layers of different depths.
    Therefore, the UL layers can be used in either pure unsupervised or
    semi-supervised settings. Both a closed-form solution and an online learning
    algorithm for two UL layers are provided. Experiments with unlabeled synthetic
    and real-world videos demonstrated that the neural networks equipped with UL
    layers and trained with the proposed online learning algorithm can extract
    shape and motion information from video sequences of moving objects. The
    experiments demonstrated the potential applications of UL layers and online
    learning algorithm to head orientation estimation and moving object
    localization.

    Implicit Regularization in Matrix Factorization

    Suriya Gunasekar, Blake Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We study implicit regularization when optimizing an underdetermined quadratic
    objective over a matrix (X) with gradient descent on a factorization of (X). We
    conjecture and provide empirical and theoretical evidence that with small
    enough step sizes and initialization close enough to the origin, gradient
    descent on a full dimensional factorization converges to the minimum nuclear
    norm solution.
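
    A small experiment in the spirit of the conjecture can be run in a few lines:
    gradient descent on a full-dimensional symmetric factorization X = U U^T, started
    near the origin and fitted to a subset of entries of a low-rank matrix, with the
    nuclear norm of the result inspected afterwards. This is a qualitative
    illustration only; the problem size, step size, and observation pattern are
    arbitrary choices.

    import numpy as np

    def factored_gd(mask, y, n, steps=3000, lr=0.02, init_scale=1e-3, seed=0):
        """Gradient descent on a full-dimensional factorization X = U U^T, fitting
        only the observed entries of a symmetric matrix y (mask marks observations)."""
        rng = np.random.default_rng(seed)
        U = init_scale * rng.normal(size=(n, n))      # full-rank factor, tiny init
        for _ in range(steps):
            R = mask * (U @ U.T - y)                  # residual on observed entries
            U -= lr * 4.0 * R @ U                     # gradient of the squared residual
        return U @ U.T

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        n, r = 20, 2
        V = rng.normal(size=(n, r))
        X_true = V @ V.T / n                          # rank-2, O(1) spectral norm
        mask = (rng.uniform(size=(n, n)) < 0.5).astype(float)
        mask = np.triu(mask) + np.triu(mask, 1).T     # symmetric observation pattern
        X_hat = factored_gd(mask, mask * X_true, n)
        print("nuclear norms (true, recovered):",
              np.linalg.norm(X_true, "nuc"), np.linalg.norm(X_hat, "nuc"))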

    Asynchronous Parallel Bayesian Optimisation via Thompson Sampling

    Kirthevasan Kandasamy, Akshay Krishnamurthy, Jeff Schneider, Barnabas Poczos
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We design and analyse variations of the classical Thompson sampling (TS)
    procedure for Bayesian optimisation (BO) in settings where function evaluations
    are expensive, but can be performed in parallel. Our theoretical analysis shows
    that a direct application of the sequential Thompson sampling algorithm in
    either synchronous or asynchronous parallel settings yields a surprisingly
    powerful result: making (n) evaluations distributed among (M) workers is
    essentially equivalent to performing (n) evaluations in sequence. Further, by
    modeling the time taken to complete a function evaluation, we show that, under
    a time constraint, asynchronously parallel TS achieves asymptotically lower
    regret than both the synchronous and sequential versions. These results are
    complemented by an experimental analysis, showing that asynchronous TS
    outperforms a suite of existing parallel BO algorithms in simulations and in a
    hyper-parameter tuning application in convolutional neural networks. In
    addition to these, the proposed procedure is conceptually and computationally
    much simpler than existing work for parallel BO.
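
    The paper applies Thompson sampling to Bayesian optimisation with Gaussian-process
    posteriors; the toy below only shows the sequential TS primitive (sample from the
    posterior, act greedily on the sample) on a Bernoulli bandit, to make the loop
    that gets parallelized concrete. The bandit setting and priors are illustrative
    assumptions, not the paper's setup.

    import numpy as np

    def thompson_sampling(true_means, horizon=2000, seed=0):
        """Sequential Thompson sampling for a Bernoulli bandit with Beta(1,1) priors."""
        rng = np.random.default_rng(seed)
        k = len(true_means)
        wins = np.ones(k)       # Beta posterior parameters (successes + 1)
        losses = np.ones(k)     # (failures + 1)
        total = 0.0
        for _ in range(horizon):
            theta = rng.beta(wins, losses)        # one posterior sample per arm
            arm = int(np.argmax(theta))           # act greedily on the sample
            reward = float(rng.uniform() < true_means[arm])
            wins[arm] += reward
            losses[arm] += 1.0 - reward
            total += reward
        return total / horizon

    if __name__ == "__main__":
        print(thompson_sampling([0.3, 0.5, 0.7]))   # average reward should approach 0.7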

    Classification of Quantitative Light-Induced Fluorescence Images Using Convolutional Neural Network

    Sultan Imangaliyev, Monique H. van der Veen, Catherine M. C. Volgenant, Bruno G. Loos, Bart J. F. Keijser, Wim Crielaard, Evgeni Levin
    Comments: Full version of ICANN 2017 submission
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Images are an important data source for diagnosis and treatment of oral
    diseases. The manual classification of images may lead to misdiagnosis or
    mistreatment due to subjective errors. In this paper an image classification
    model based on Convolutional Neural Network is applied to Quantitative
    Light-induced Fluorescence images. The deep neural network outperforms other
    state of the art shallow classification models in predicting labels derived
    from three different dental plaque assessment scores. The model directly
    benefits from multi-channel representation of the images resulting in improved
    performance when, besides the Red colour channel, additional Green and Blue
    colour channels are used.

    Investigation of Using VAE for i-Vector Speaker Verification

    Timur Pekhovsky, Maxim Korenevsky
    Subjects: Sound (cs.SD); Learning (cs.LG); Machine Learning (stat.ML)

    A new system for i-vector speaker recognition based on a variational autoencoder
    (VAE) is investigated. The VAE is a promising approach for developing accurate
    deep nonlinear generative models of complex data. Experiments show that the VAE
    provides speaker embeddings and can be effectively trained in an unsupervised
    manner. An LLR estimate for the VAE is developed. Experiments on NIST SRE 2010
    data demonstrate its correctness. Additionally, we show that the performance of
    the VAE-based system in the i-vector space is close to that of the diagonal
    PLDA. Several interesting results are also observed in the experiments with the
    β-VAE. In particular, we found that for β ≪ 1, the VAE can be trained to capture
    the features of complex input data distributions in an effective way, which is
    hard to obtain with the standard VAE (β=1).
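
    The β-VAE objective mentioned above simply reweights the KL term of the ELBO. A
    minimal per-example version for a diagonal-Gaussian encoder and a Gaussian
    reconstruction likelihood (the latter is an assumption for illustration, not the
    paper's exact model) is:

    import numpy as np

    def beta_vae_loss(x, x_recon, mu, logvar, beta=1.0, recon_var=1.0):
        """Per-example beta-VAE objective: reconstruction NLL + beta * KL(q || N(0, I))."""
        recon_nll = 0.5 * np.sum((x - x_recon) ** 2 / recon_var
                                 + np.log(2 * np.pi * recon_var))
        kl = 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)
        return recon_nll + beta * kl

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        x = rng.normal(size=16)
        mu, logvar = 0.1 * rng.normal(size=4), np.zeros(4)
        for beta in (0.05, 1.0):                 # beta << 1 versus the standard VAE
            print(beta, beta_vae_loss(x, 0.9 * x, mu, logvar, beta=beta))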

    MagNet: a Two-Pronged Defense against Adversarial Examples

    Dongyu Meng, Hao Chen
    Comments: In submission as a conference paper
    Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)

    Deep learning has shown promising results on hard perceptual problems in
    recent years. However, deep learning systems are found to be vulnerable to
    small adversarial perturbations that are nearly imperceptible to human. Such
    specially crafted perturbations cause deep learning systems to output incorrect
    decisions, with potentially disastrous consequences. These vulnerabilities
    hinder the deployment of deep learning systems where safety or security is
    important. Attempts to secure deep learning systems either target specific
    attacks or have been shown to be ineffective.

    In this paper, we propose MagNet, a framework for defending neural network
    classifiers against adversarial examples. MagNet does not modify the protected
    classifier or know the process for generating adversarial examples. MagNet
    includes one or more separate detector networks and a reformer network.
    Different from previous work, MagNet learns to differentiate between normal and
    adversarial examples by approximating the manifold of normal examples. Since it
    does not rely on any process for generating adversarial examples, it has
    substantial generalization power. Moreover, MagNet reconstructs adversarial
    examples by moving them towards the manifold, which is effective for helping
    classify adversarial examples with small perturbation correctly. We discuss the
    intrinsic difficulty in defending against whitebox attack and propose a
    mechanism to defend against graybox attack. Inspired by the use of randomness
    in cryptography, we propose to use diversity to strengthen MagNet. We show
    empirically that MagNet is effective against most advanced state-of-the-art
    attacks in blackbox and graybox scenarios while keeping false positive rate on
    normal examples very low.

    Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent

    Xiangru Lian, Ce Zhang, Huan Zhang, Cho-Jio Hsieh, Wei Zhang, Ji Liu
    Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Machine Learning (stat.ML)

    Most distributed machine learning systems nowadays, including TensorFlow and
    CNTK, are built in a centralized fashion. One bottleneck of centralized
    algorithms lies in the high communication cost on the central node. Motivated by
    this, we ask, can decentralized algorithms be faster than their centralized
    counterparts?

    Although decentralized PSGD (D-PSGD) algorithms have been studied by the
    control community, existing analysis and theory do not show any advantage over
    centralized PSGD (C-PSGD) algorithms, simply assuming the application scenario
    where only the decentralized network is available. In this paper, we study a
    D-PSGD algorithm and provide the first theoretical analysis that indicates a
    regime in which decentralized algorithms might outperform centralized
    algorithms for distributed stochastic gradient descent. This is because D-PSGD
    has comparable total computational complexities to C-PSGD but requires much
    less communication cost on the busiest node. We further conduct an empirical
    study to validate our theoretical analysis across multiple frameworks (CNTK and
    Torch), different network configurations, and computation platforms up to 112
    GPUs. On network configurations with low bandwidth or high latency, D-PSGD can
    be up to one order of magnitude faster than its well-optimized centralized
    counterparts.

    A Clustering-based Consistency Adaptation Strategy for Distributed SDN Controllers

    Mohamed Aslan, Ashraf Matrawy
    Subjects: Networking and Internet Architecture (cs.NI); Learning (cs.LG)

    Distributed controllers are oftentimes used in large-scale SDN deployments
    where they run a myriad of network applications simultaneously. Such
    applications could have different consistency and availability preferences.
    These controllers need to communicate via east/west interfaces in order to
    synchronize their state information. The consistency and the availability of
    the distributed state information are governed by an underlying consistency
    model. Earlier, we suggested the use of adaptively-consistent controllers that
    can autonomously tune their consistency parameters in order to meet the
    performance requirements of a certain application. In this paper, we examine
    the feasibility of employing adaptive controllers that are built on-top of
    tunable consistency models similar to that of Apache Cassandra. We present an
    adaptation strategy that uses clustering techniques (sequential k-means and
    incremental k-means) in order to map a given application performance indicator
    into a feasible consistency level that can be used with the underlying tunable
    consistency model. In the cases that we modeled and tested, our results show
    that for sequential k-means, with a reasonable number of clusters (>= 50), a
    plausible mapping (low RMSE) could be estimated between the application
    performance indicators and the consistency level indicator. For incremental
    k-means, the results also showed that a plausible mapping (low RMSE) could be
    estimated using a similar number of clusters (>= 50) and a small threshold
    (approximately 0.01).
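
    Sequential (online) k-means, one of the two clustering techniques used in the
    adaptation strategy, can be sketched in a few lines; the mapping from application
    performance indicators to consistency levels that the paper builds on top of the
    learned centres is not reproduced here, and the 1-D data is illustrative.

    import numpy as np

    def sequential_kmeans(stream, k, seed=0):
        """Online (sequential) k-means: update the nearest centre with a 1/count step."""
        rng = np.random.default_rng(seed)
        centres, counts = None, None
        for x in stream:
            x = np.atleast_1d(np.asarray(x, dtype=float))
            if centres is None:
                centres = x + 0.01 * rng.normal(size=(k, x.size))   # init near first point
                counts = np.ones(k)
            i = int(np.argmin(np.linalg.norm(centres - x, axis=1)))
            counts[i] += 1
            centres[i] += (x - centres[i]) / counts[i]
        return centres

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        data = np.concatenate([rng.normal(0, 0.1, 500), rng.normal(1, 0.1, 500)])
        rng.shuffle(data)
        print(sequential_kmeans(data, k=2))   # centres near 0 and 1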

    Learning to Pour

    Yongqiang Huang, Yu Sun
    Subjects: Robotics (cs.RO); Learning (cs.LG)

    Pouring is a simple task people perform daily. It is the second most
    frequently executed motion in cooking scenarios, after pick-and-place. We
    present a pouring trajectory generation approach, which uses force feedback
    from the cup to determine the future velocity of pouring. The approach uses
    recurrent neural networks as its building blocks. We collected the pouring
    demonstrations which we used for training. To test our approach in simulation,
    we also created and trained a force estimation system. The simulated
    experiments show that the system is able to generalize to a single unseen
    element of the pouring characteristics.

    State Space Decomposition and Subgoal Creation for Transfer in Deep Reinforcement Learning

    Himanshu Sahni, Saurabh Kumar, Farhan Tejani, Yannick Schroecker, Charles Isbell
    Comments: 5 pages, 6 figures; 3rd Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2017), Ann Arbor, Michigan
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Typical reinforcement learning (RL) agents learn to complete tasks specified
    by reward functions tailored to their domain. As such, the policies they learn
    do not generalize even to similar domains. To address this issue, we develop a
    framework through which a deep RL agent learns to generalize policies from
    smaller, simpler domains to more complex ones using a recurrent attention
    mechanism. The task is presented to the agent as an image and an instruction
    specifying the goal. This meta-controller guides the agent towards its goal by
    designing a sequence of smaller subtasks on the part of the state space within
    the attention, effectively decomposing it. As a baseline, we consider a setup
    without attention as well. Our experiments show that the meta-controller learns
    to create subgoals within the attention.

    Proximity Variational Inference

    Jaan Altosaar, Rajesh Ranganath, David M. Blei
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Computation (stat.CO)

    Variational inference is a powerful approach for approximate posterior
    inference. However, it is sensitive to initialization and can be subject to
    poor local optima. In this paper, we develop proximity variational inference
    (PVI). PVI is a new method for optimizing the variational objective that
    constrains subsequent iterates of the variational parameters to robustify the
    optimization path. Consequently, PVI is less sensitive to initialization and
    optimization quirks and finds better local optima. We demonstrate our method on
    three proximity statistics. We study PVI on a Bernoulli factor model and
    sigmoid belief network with both real and synthetic data and compare to
    deterministic annealing (Katahira et al., 2008). We highlight the flexibility
    of PVI by designing a proximity statistic for Bayesian deep learning models
    such as the variational autoencoder (Kingma and Welling, 2014; Rezende et al.,
    2014). Empirically, we show that PVI consistently finds better local optima and
    gives better predictive performance.

    Consistent Kernel Density Estimation with Non-Vanishing Bandwidth

    Efrén Cruz Cortés, Clayton Scott
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Consistency of the kernel density estimator requires that the kernel
    bandwidth tends to zero as the sample size grows. In this paper we investigate
    the question of whether consistency is still possible when the bandwidth is
    fixed, if we consider a more general class of weighted KDEs. To answer this
    question in the affirmative, we introduce the fixed-bandwidth KDE (fbKDE),
    obtained by solving a quadratic program, that consistently estimates any
    continuous square-integrable density. Rates of convergence for the fbKDE are
    also established for radial kernels and the box kernel under appropriate
    smoothness assumptions. Furthermore, in
    an experimental study we demonstrate that the fbKDE compares favorably to the
    standard KDE and the previously proposed variable bandwidth KDE.

    Beyond Parity: Fairness Objectives for Collaborative Filtering

    Sirui Yao, Bert Huang
    Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    We study fairness in collaborative-filtering recommender systems, which are
    sensitive to discrimination that exists in historical data. Biased data can
    lead collaborative-filtering methods to make unfair predictions for users from
    minority groups. We identify the insufficiency of existing fairness metrics and
    propose four new metrics that address different forms of unfairness. These
    fairness metrics can be optimized by adding fairness terms to the learning
    objective. Experiments on synthetic and real data show that our new metrics can
    better measure fairness than the baseline, and that the fairness objectives
    effectively help reduce unfairness.


    Information Theory

    Performance Optimization of Co-Existing Underlay Secondary Networks

    Pratik Chakraborty, Shankar Prakriya
    Subjects: Information Theory (cs.IT)

    In this paper, we analyze the throughput performance of two co-existing
    downlink multiuser underlay secondary networks that use fixed-rate
    transmissions. We assume that the interference temperature limit (ITL) is
    apportioned to accommodate two concurrent transmissions using an interference
    temperature apportioning parameter so as to ensure that the overall
    interference to the primary receiver does not exceed the ITL. Using the derived
    analytical expressions for throughput, we show that when there is only one
    secondary user in each network, or when the secondary networks do not employ
    opportunistic user selection (for example, when they use round-robin
    scheduling), there exists a critical fixed-rate below which the sum throughput
    with co-existing secondary networks is higher than the throughput with a single
    secondary network. We derive an
    expression for this critical fixed-rate. Below this critical rate, we show that
    careful apportioning of the ITL is critical to maximizing sum throughput of the
    co-existing networks. We derive an expression for this apportioning parameter.
    Throughput is seen to increase with increase in number of users in each of the
    secondary networks. Computer simulations demonstrate accuracy of the derived
    expressions.

    Wireless Powered Communications with Finite Battery and Finite Blocklength

    Onel L. A. López (1), Evelio M. G. Fernández (2), H. Alves (1), R. D. Souza (3) ((1) Centre for Wireless Communications (CWC), University of Oulu, Finland, (2) Federal University of Paraná (UFPR), Curitiba, Brazil, (3) Federal University of Santa Catarina (UFSC), Florianopolis, Brazil)
    Comments: 23 pages, 6 main figures, submitted to IEEE TWC
    Subjects: Information Theory (cs.IT)

    We analyze a wireless communication system with finite block length and
    finite battery energy, under quasi-static Nakagami-m fading. Wireless energy
    transfer is carried out in the downlink while information transfer occurs in
    the uplink. Transmission strategies for scenarios with/without energy
    accumulation between transmission rounds are characterized in terms of error
    probability and energy consumption. A power control protocol for the energy
    accumulation scenario is proposed, and the results show its enormous impact on
    improving system performance, in terms of error probability and energy
    consumption. The numerical results corroborate the existence and uniqueness of
    an optimum target error probability, while showing that a relatively small
    battery could be a limiting factor for some setups, especially when using the
    energy accumulation strategy.

    Energy-Efficient Multi-Pair Two-Way AF Full-Duplex Massive MIMO Relaying

    Ekant Sharma, Rohit Budhiraja, K Vasudevan
    Comments: 32 pages
    Subjects: Information Theory (cs.IT)

    We consider multi-pair two-way amplify and forward (AF) relaying, where
    multiple full-duplex source-node pairs exchange information via a shared
    full-duplex massive multiple-input multiple-output (MIMO) relay. Most of the
    previous massive MIMO relay works maximize spectral efficiency (SE). We
    maximize global energy efficiency (GEE) with quality-of-service (QoS)
    constraints that are expressed as the rate required by source nodes. The
    problem is non-convex and is solved by approximating it as a pseudo-concave
    problem, which is then solved using the Dinkelbach method. We also consider the
    max-min EE problem which maximizes the EE of the worst energy efficient user.
    For solving the EE optimization, we derive approximate closed-form lower bounds
    for the ergodic achievable rate for maximal-ratio and zero-forcing processing
    at the relay by using minimum mean squared error channel estimation. We
    numerically show the accuracy of the derived lower bounds and the improved GEE
    with optimal power allocation, with and without QoS constraints.
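
    The Dinkelbach method referenced here turns a concave-convex fractional program
    max f(x)/g(x) into a sequence of subproblems max f(x) - λ g(x). A scalar toy
    version, with an illustrative rate/power ratio standing in for the paper's
    multi-dimensional GEE problem (scipy is assumed available), is:

    import numpy as np
    from scipy.optimize import minimize_scalar

    def dinkelbach(f, g, x_bounds, tol=1e-8, max_iter=50):
        """Maximize f(x)/g(x) on an interval via Dinkelbach's iteration (requires g > 0)."""
        lam, x = 0.0, sum(x_bounds) / 2.0
        for _ in range(max_iter):
            # inner problem: maximize f(x) - lam * g(x) over the interval
            res = minimize_scalar(lambda t: -(f(t) - lam * g(t)),
                                  bounds=x_bounds, method="bounded")
            x = res.x
            if abs(f(x) - lam * g(x)) < tol:    # optimal ratio reached
                break
            lam = f(x) / g(x)                   # update the ratio estimate
        return x, lam

    if __name__ == "__main__":
        # Toy "energy efficiency": achievable rate over total consumed power.
        rate = lambda p: np.log2(1.0 + 4.0 * p)
        power = lambda p: p + 0.5               # transmit power + circuit power
        p_opt, ee = dinkelbach(rate, power, (1e-6, 10.0))
        print(p_opt, ee)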

    Spectrum Sharing and Cyclical Multiple Access in UAV-Aided Cellular Offloading

    Jiangbin Lyu, Yong Zeng, Rui Zhang
    Comments: 6 pages, 3 figures, submitted for possible publication
    Subjects: Information Theory (cs.IT)

    In conventional terrestrial cellular systems, mobile terminals (MTs) at the
    cell edge often pose the performance bottleneck due to their long distance from
    the ground base station (GBS), especially in hotspot areas. This paper proposes
    a new hybrid network architecture by leveraging the use of unmanned aerial
    vehicle (UAV) as an aerial mobile base station, which flies cyclically along
    the cell edge to serve the cell-edge MTs and help offload the traffic from
    the GBS. To achieve user fairness, we aim to maximize the minimum throughput of
    all MTs in a single cell by jointly optimizing the UAV’s trajectory, as well as
    the bandwidth allocation and user partitioning between the UAV and GBS.
    Numerical results show that the proposed hybrid network with optimized spectrum
    sharing and cyclical multiple access design significantly improves the spatial
    throughput over the conventional cellular network with the GBS only.

    The Dual Graph Shift Operator: Identifying the Support of the Frequency Domain

    Geert Leus, Santiago Segarra, Alejandro Ribeiro, Antonio G. Marques
    Comments: 5 pages, 2 figures
    Subjects: Information Theory (cs.IT)

    Contemporary data is often supported by an irregular structure, which can be
    conveniently captured by a graph. Accounting for this graph support is crucial
    to analyze the data, leading to an area known as graph signal processing (GSP).
    The two most important tools in GSP are the graph shift operator (GSO), which
    is a sparse matrix accounting for the topology of the graph, and the graph
    Fourier transform (GFT), which maps graph signals into a frequency domain
    spanned by a number of graph-related Fourier-like basis vectors. This
    alternative representation of a graph signal is denominated the graph frequency
    signal. Several attempts have been undertaken in order to interpret the support
    of this graph frequency signal, but they all resulted in a one-dimensional
    interpretation. However, if the support of the original signal is captured by a
    graph, why would the graph frequency signal have a simple one-dimensional
    support? That is why, for the first time, we propose an irregular support for
    the graph frequency signal, which we coin the dual graph. The dual GSO leads to
    a better interpretation of the graph frequency signal and its domain, helps to
    understand how the different graph frequencies are related and clustered,
    enables the development of better graph filters and filter banks, and
    facilitates the generalization of classical SP results to the graph domain.
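
    To make the GSO/GFT vocabulary concrete, here is the textbook construction with
    the combinatorial Laplacian as the shift operator and its eigenvectors as the
    Fourier basis; the paper's dual-graph construction is not reproduced here, and
    the 4-cycle example is illustrative.

    import numpy as np

    def graph_fourier(adjacency, signal):
        """GFT of a signal on an undirected graph, using the Laplacian as the GSO."""
        degrees = adjacency.sum(axis=1)
        laplacian = np.diag(degrees) - adjacency       # one common choice of GSO
        freqs, basis = np.linalg.eigh(laplacian)       # graph frequencies / Fourier basis
        return freqs, basis.T @ signal                 # the graph frequency signal

    if __name__ == "__main__":
        # 4-cycle graph and a smooth signal on it
        A = np.array([[0, 1, 0, 1],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [1, 0, 1, 0]], dtype=float)
        x = np.array([1.0, 0.9, 1.1, 1.0])
        freqs, x_hat = graph_fourier(A, x)
        print(np.round(freqs, 3), np.round(x_hat, 3))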

    Communication vs Distributed Computation: an alternative trade-off curve

    Yahya H. Ezzeldin, Mohammed Karmoose, Christina Fragouli
    Subjects: Information Theory (cs.IT)

    In this paper, we revisit the communication vs. distributed computing
    trade-off, studied within the framework of MapReduce in [1]. An implicit
    assumption in the aforementioned work is that each server performs all possible
    computations on all the files stored in its memory. Our starting observation is
    that, if servers can compute only the intermediate values they need, then
    storage constraints do not directly imply computation constraints. We examine
    how this affects the communication-computation trade-off and suggest that the
    trade-off be studied with a predetermined storage constraint. We then proceed
    to examine the case where servers need to perform computationally intensive
    tasks, and may not have sufficient time to perform all computations required by
    the scheme in [1]. Given a threshold that limits the computational load, we
    derive a lower bound on the associated communication load, and propose a
    heuristic scheme that in some cases achieves the lower bound.

    New Results for Provable Dynamic Robust PCA

    Praneeth Narayanamurthy, Namrata Vaswani
    Subjects: Information Theory (cs.IT); Machine Learning (stat.ML)

    Robust PCA (RPCA) is the problem of separating a given data matrix into the
    sum of a sparse matrix and a low-rank matrix. Dynamic RPCA assumes that the
    true data vectors lie in a low-dimensional subspace that can change with time,
    albeit slowly. The goal is to track this changing subspace over time in the
    presence of sparse outliers. This work provides the first guarantee for dynamic
    RPCA that holds under (weakened) standard RPCA assumptions and a realistic
    model of slow subspace change. We analyze an existing method called ReProCS.
    Our result removes the strong assumptions needed by the two previous complete
    guarantees for ReProCS. Both these required an unrealistic model of subspace
    change and very specific assumptions on how the outlier support could change.
    Most importantly, our guarantees show that, because it exploits slow subspace
    change, ReProCS (and its offline counterpart) can provably tolerate much larger
    outlier fractions, are faster than most other provable methods, and have
    near-optimal storage complexity.

    Quantum-secured blockchain

    E.O. Kiktenko, N.O. Pozhar, M.N. Anufriev, A.S. Trushechkin, R.R. Yunusov, Y.V. Kurochkin, A.I. Lvovsky, A.K. Fedorov
    Comments: 6 pages, 2 figures
    Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR); Information Theory (cs.IT)

    Blockchain is a distributed database which is cryptographically protected
    against malicious modifications. While promising for a wide range of
    applications, current blockchain platforms rely on digital signatures, which
    are vulnerable to attacks by means of quantum computers. The same, albeit to a
    lesser extent, applies to cryptographic hash functions that are used in
    preparing new blocks, so parties with access to quantum computation would have
    unfair advantage in procuring mining rewards. Here we propose a possible
    solution to the quantum-era blockchain challenge and report an experimental
    realization of a quantum-safe blockchain platform that utilizes quantum key
    distribution across an urban fiber network for information-theoretically secure
    authentication. These results address important questions about realizability
    and scalability of quantum-safe blockchains for commercial and governmental
    applications.

    Shorter stabilizer circuits via Bruhat decomposition and quantum circuit transformations

    Dmitri Maslov, Martin Roetteler
    Comments: Supersedes arXiv:1703.00874
    Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET); Information Theory (cs.IT)

    In this paper we improve the layered implementation of arbitrary stabilizer
    circuits introduced by Aaronson and Gottesman in Phys. Rev. A 70(052328), 2004:
    to implement a general stabilizer circuit, we reduce their 11-stage computation
    -H-C-P-C-P-C-H-P-C-P-C- over the gate set consisting of Hadamard,
    Controlled-NOT, and Phase gates, into a 7-stage computation of the form
    -C-CZ-P-H-P-CZ-C-. We show arguments in support of using -CZ- stages over the
    -C- stages: not only the use of -CZ- stages allows a shorter layered
    expression, but -CZ- stages are simpler and appear to be easier to implement
    compared to the -C- stages. Based on this decomposition, we develop a two-qubit
    gate depth-(14n-4) implementation of stabilizer circuits over the gate
    library {H, P, CNOT}, executable in the LNN architecture, improving the best
    previously known depth-(25n) circuit, also executable in the LNN architecture.
    Our constructions rely on Bruhat decomposition of the symplectic group and on
    folding arbitrarily long sequences of the form ((-P-C-)^m) into a 3-stage
    computation -P-CZ-C-. Our results include the reduction of the 11-stage
    decomposition -H-C-P-C-P-C-H-P-C-P-C- into a 9-stage decomposition of the
    form -C-P-C-P-H-C-P-C-P-. This reduction is based on the Bruhat decomposition
    of the symplectic group. This result also implies a new normal form for
    stabilizer circuits. We show that a circuit in this normal form is optimal in
    the number of Hadamard gates used. We also show that the normal form has an
    asymptotically optimal number of parameters.



