
    arXiv Paper Daily: Thu, 9 Mar 2017

    Published by 我爱机器学习 (52ml.net) on 2017-03-09 00:00:00

    Neural and Evolutionary Computing

    Deep Reservoir Computing Using Cellular Automata

    Stefano Nichele, Andreas Molund
    Subjects: Neural and Evolutionary Computing (cs.NE); Emerging Technologies (cs.ET)

    Recurrent Neural Networks (RNNs) have been a prominent concept within
    artificial intelligence. They are inspired by Biological Neural Networks (BNNs)
    and provide an intuitive and abstract representation of how BNNs work. Derived
    from the more generic Artificial Neural Networks (ANNs), the recurrent ones are
    meant to be used for temporal tasks, such as speech recognition, because they
    are capable of memorizing historic input. However, such networks are very time
    consuming to train as a result of their inherent nature. Recently, Echo State
    Networks and Liquid State Machines have been proposed as possible RNN
    alternatives, under the name of Reservoir Computing (RC). RCs are far easier
    to train. In this paper, Cellular Automata (CA) are used as the reservoir,
    and are tested on the 5-bit memory task (a well-known benchmark within the
    RC community). The work herein provides a method of mapping binary inputs
    from the task onto the automata, and a recurrent architecture for handling
    the sequential aspects of the task. Furthermore, a layered (deep) reservoir
    architecture is proposed. Its performance is compared with earlier work and
    with its single-layer version. Results show that the single CA reservoir
    system yields results similar to the state of the art. The system comprising
    two layered reservoirs shows a noticeable improvement over a single CA
    reservoir. This indicates potential for further research and provides
    valuable insight on how to design CA reservoir systems.
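
    A minimal sketch can make the reservoir construction concrete. The Python
    below, an illustration rather than the paper's implementation, evolves an
    elementary CA (rule 90 is one plausible choice) after projecting a binary
    input onto random cells; all names and parameters are assumptions:

        import numpy as np

        def ca_step(state, rule=90):
            """One synchronous update of an elementary CA with periodic boundaries."""
            left, right = np.roll(state, 1), np.roll(state, -1)
            idx = 4 * left + 2 * state + right   # 3-cell neighbourhood as a 3-bit number
            table = (rule >> np.arange(8)) & 1   # the rule's lookup table
            return table[idx]

        def reservoir_features(u, width=64, iters=16, rule=90, seed=0):
            """Map binary input u onto random CA cells, evolve, concatenate states."""
            rng = np.random.default_rng(seed)
            state = np.zeros(width, dtype=int)
            pos = rng.choice(width, size=len(u), replace=False)
            state[pos] = u                       # input mapping: a random projection
            history = []
            for _ in range(iters):
                state = ca_step(state, rule)
                history.append(state)
            return np.concatenate(history)       # features for a cheap linear readout

    Only a linear readout (e.g., ridge regression on such features) is trained,
    which is what makes RC systems cheap to train compared to full RNNs.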

    Large-scale image analysis using docker sandboxing

    B Sengupta, E Vazquez, M Sasdelli, Y Qian, M Peniak, L Netherton, G Delfino
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR); Neural and Evolutionary Computing (cs.NE)

    With the advent of specialized hardware such as Graphics Processing Units
    (GPUs), large scale image localization, classification and retrieval have seen
    increased prevalence. Designing scalable software architecture that co-evolves
    with such specialized hardware is a challenge in the commercial setting. In
    this paper, we describe one such architecture (Cortexica) that leverages
    the scalability of GPUs and the sandboxing offered by Docker containers.
    This allows for the flexibility of mixing different computer architectures
    as well as computational algorithms with the security of a trusted
    environment. We illustrate the utility of this framework in a commercial
    setting, i.e., searching for multiple products in an image by combining
    image localisation and retrieval.

    Byzantine-Tolerant Machine Learning

    Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui, Julien Stainer
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)

    The growth of data, the need for scalability, and the complexity of the
    models used in modern machine learning call for distributed
    implementations. Yet, as
    of today, distributed machine learning frameworks have largely ignored the
    possibility of arbitrary (i.e., Byzantine) failures. In this paper, we study
    the robustness to Byzantine failures at the fundamental level of stochastic
    gradient descent (SGD), the heart of most machine learning algorithms. Assuming
    a set of (n) workers, up to (f) of them being Byzantine, we ask how robust can
    SGD be, without limiting the dimension, nor the size of the parameter space.

    We first show that no gradient descent update rule based on a linear
    combination of the vectors proposed by the workers (i.e., current
    approaches) tolerates a single Byzantine failure. We then formulate a
    resilience property of the update rule capturing the basic requirements to
    guarantee convergence despite (f) Byzantine workers. We finally propose
    Krum, an update rule that satisfies the aforementioned resilience property.
    For a (d)-dimensional learning problem, the time complexity of Krum is
    O(n^2 · (d + log n)).
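
    Krum's selection rule is compact enough to sketch. The Python below follows
    the description above (n workers, up to f of them Byzantine): each proposed
    gradient is scored by the summed squared distances to its n - f - 2 closest
    peers, and the lowest-scoring vector becomes the update. A minimal sketch,
    not the authors' code:

        import numpy as np

        def krum(gradients, f):
            """Return the worker gradient closest (in squared L2) to its n-f-2 peers."""
            V = np.asarray(gradients)                            # shape (n, d)
            n = len(V)
            d2 = ((V[:, None, :] - V[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
            scores = []
            for i in range(n):
                others = np.sort(np.delete(d2[i], i))
                scores.append(others[: n - f - 2].sum())         # n-f-2 closest vectors
            return V[int(np.argmin(scores))]

    Because the selected vector must be close to a majority of honest
    proposals, a single outlier cannot drag the update arbitrarily far, unlike
    plain averaging.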

    Customer Life Time Value Prediction Using Embeddings

    Benjamin Paul Chamberlain, Angelo Cardoso, C.H. Bryan Liu, Roberto Pagliari, Marc Peter Deisenroth
    Comments: 9 pages, 10 figures
    Subjects: Learning (cs.LG); Computers and Society (cs.CY); Information Retrieval (cs.IR); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    We describe the Customer Life Time Value (CLTV) prediction system deployed at
    ASOS.com, a global online fashion retailer. CLTV prediction is an important
    problem in e-commerce where an accurate estimate of future value allows
    retailers to effectively allocate marketing spend, identify and nurture high
    value customers and mitigate exposure to losses. The system at ASOS provides
    daily estimates of the future value of every customer and is one of the
    cornerstones of the personalised shopping experience. The state of the art in
    this domain uses large numbers of handcrafted features and ensemble regressors
    to forecast value, predict churn and evaluate customer loyalty. We describe our
    system, which adopts this approach, and our ongoing efforts to further improve
    it. Recently, domains including language, vision and speech have shown dramatic
    advances by replacing handcrafted features with features that are learned
    automatically from data. We show that learning feature representations is a
    promising extension to the state of the art in CLTV modelling. We propose a
    novel way to generate embeddings of customers, which addresses the issue of
    the ever-changing product catalogue, and obtain a significant improvement
    over an exhaustive set of handcrafted features.
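
    The abstract does not spell out the embedding construction, so the
    following is only a hedged illustration of the catalogue-invariance idea:
    factorize the customer-product interaction matrix and keep the
    customer-side factors, so that customer vectors do not depend on a fixed
    product vocabulary. The SVD choice and all names are assumptions, not the
    paper's method:

        import numpy as np
        from scipy.sparse import csr_matrix
        from scipy.sparse.linalg import svds

        def customer_embeddings(rows, cols, n_customers, n_products, dim=32):
            """Low-rank factors of the customer-product matrix; keep the customer side.

            rows[i], cols[i]: customer and product index of interaction i. Products
            only enter through the discarded right factors, so customer vectors
            remain usable as the catalogue churns.
            """
            X = csr_matrix((np.ones(len(rows)), (rows, cols)),
                           shape=(n_customers, n_products))
            U, s, _ = svds(X.astype(float), k=dim)
            return U * s                         # (n_customers, dim) embedding matrix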


    Computer Vision and Pattern Recognition

    QuaSI: Quantile Sparse Image Prior for Spatio-Temporal Denoising of Retinal OCT Data

    Franziska Schirrmacher, Thomas Köhler, Lennart Husvogt, James G. Fujimoto, Joachim Hornegger, Andreas K. Maier
    Comments: submitted to MICCAI’17
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Optical coherence tomography (OCT) enables high-resolution and non-invasive
    3D imaging of the human retina but is inherently impaired by speckle noise.
    This paper introduces a spatio-temporal denoising algorithm for OCT data on a
    B-scan level using a novel quantile sparse image (QuaSI) prior. To remove
    speckle noise while preserving image structures of diagnostic relevance, we
    implement our QuaSI prior via median filter regularization coupled with a Huber
    data fidelity model in a variational approach. For efficient energy
    minimization, we develop an alternating direction method of multipliers (ADMM)
    scheme using a linearization of median filtering. Our spatio-temporal
    method can handle both denoising of single B-scans and of temporally
    consecutive B-scans, to obtain volumetric OCT data with an enhanced
    signal-to-noise ratio. Based on only 4 B-scans, our algorithm achieved
    performance comparable to averaging 13 B-scans and outperformed other
    current denoising methods.
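
    As a reading aid, the variational model described above plausibly takes the
    following form, with med(.) a median filter, phi_delta the Huber penalty,
    and f_t the input B-scans; this is a reconstruction assumed from the
    abstract, not the paper's exact notation:

        \hat{u} = \arg\min_u \sum_t \phi_\delta(u - f_t)
                  + \lambda \,\| u - \mathrm{med}(u) \|_1,
        \qquad
        \phi_\delta(r) = \begin{cases} \tfrac{1}{2} r^2 & |r| \le \delta \\
                          \delta |r| - \tfrac{1}{2}\delta^2 & |r| > \delta \end{cases}

    ADMM then splits the non-smooth prior from the data term, with the median
    filter handled through the linearization mentioned above.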

    Fast Gesture Recognition with Multiple Stream Discrete HMMs on 3D Skeletons

    Guido Borghi, Roberto Vezzani, Rita Cucchiara
    Comments: Accepted in ICPR 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    HMMs are widely used in action and gesture recognition due to their
    implementation simplicity, low computational requirements, scalability and
    high parallelism. They deliver solid performance even with a limited
    training set, and all these characteristics are hard to find together in
    other, often more accurate, methods. In this paper, we propose a novel
    double-stage classification approach, based on Multiple Stream Discrete
    Hidden Markov Models (MSD-HMM) and 3D skeleton joint data, able to reach
    high performance while maintaining all the advantages listed above. The
    approach can both quickly classify pre-segmented gestures (offline
    classification) and perform temporal segmentation on streams of gestures
    (online classification) faster than real time. We test our system on three
    public datasets, MSRAction3D, UTKinect-Action and MSRDailyAction, and on a
    new dataset, Kinteract Dataset, explicitly created for Human Computer
    Interaction (HCI). We obtain state-of-the-art performance on all of them.

    Transformation-Grounded Image Generation Network for Novel 3D View Synthesis

    Eunbyung Park, Jimei Yang, Ersin Yumer, Duygu Ceylan, Alexander C. Berg
    Comments: To appear in CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a transformation-grounded image generation network for novel 3D
    view synthesis from a single image. Instead of taking a ‘blank slate’ approach,
    we first explicitly infer the parts of the geometry visible both in the input
    and novel views and then re-cast the remaining synthesis problem as image
    completion. Specifically, we predict a flow to move the pixels from the
    input to the novel view, along with a novel visibility map that helps deal
    with occlusion/disocclusion. Next, conditioned on those intermediate
    results, we
    hallucinate (infer) parts of the object invisible in the input image. In
    addition to the new network structure, training with a combination of
    adversarial and perceptual loss results in a reduction in common artifacts of
    novel view synthesis such as distortions and holes, while successfully
    generating high frequency details and preserving visual aspects of the input
    image. We evaluate our approach on a wide range of synthetic and real examples.
    Both qualitative and quantitative results show our method achieves
    significantly better results compared to existing methods.
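
    The flow-plus-visibility step can be pictured with a short sketch. The
    numpy code below backward-warps the input by a predicted flow and masks
    disoccluded pixels, leaving them for the completion stage; it uses
    nearest-neighbour sampling for brevity (a real network would use
    differentiable bilinear sampling), and every name is illustrative:

        import numpy as np

        def warp_with_visibility(src, flow, visibility):
            """src: (H, W) image; flow: (H, W, 2) target->source offsets (dy, dx);
            visibility: (H, W) in [0, 1], near 0 where the target pixel is unseen."""
            H, W = src.shape
            yy, xx = np.mgrid[0:H, 0:W]
            sy = np.clip(yy + flow[..., 0], 0, H - 1).round().astype(int)
            sx = np.clip(xx + flow[..., 1], 0, W - 1).round().astype(int)
            warped = src[sy, sx]                 # nearest-neighbour backward warp
            return warped * visibility           # zeros mark regions to hallucinate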

    Large-scale image analysis using docker sandboxing

    B Sengupta, E Vazquez, M Sasdelli, Y Qian, M Peniak, L Netherton, G Delfino
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR); Neural and Evolutionary Computing (cs.NE)

    With the advent of specialized hardware such as Graphics Processing Units
    (GPUs), large scale image localization, classification and retrieval have seen
    increased prevalence. Designing scalable software architecture that co-evolves
    with such specialized hardware is a challenge in the commercial setting. In
    this paper, we describe one such architecture (Cortexica) that leverages
    the scalability of GPUs and the sandboxing offered by Docker containers.
    This allows for the flexibility of mixing different computer architectures
    as well as computational algorithms with the security of a trusted
    environment. We illustrate the utility of this framework in a commercial
    setting, i.e., searching for multiple products in an image by combining
    image localisation and retrieval.

    A Linear Extrinsic Calibration of Kaleidoscopic Imaging System from Single 3D Point

    Kosuke Takahashi, Akihiro Miyata, Shohei Nobuhara, Takashi Matsuyama
    Comments: to appear in CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper proposes a new extrinsic calibration of a kaleidoscopic imaging
    system by estimating the normals and distances of its mirrors. The problem
    to be solved is the simultaneous estimation of all mirror parameters,
    consistent throughout multiple reflections. Unlike conventional methods
    utilizing a pair of direct and mirrored images of a reference 3D object to
    estimate the parameters on a per-mirror basis, our method renders the
    simultaneous estimation problem into solving a linear set of equations. The key
    contribution of this paper is to introduce a linear estimation of multiple
    mirror parameters from kaleidoscopic 2D projections of a single 3D point of
    unknown geometry. Evaluations with synthesized and real images demonstrate the
    performance of the proposed algorithm in comparison with conventional methods.
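
    For intuition on the linearity the paper exploits: reflection across a
    mirror plane with unit normal n and offset d (the plane n^T x + d = 0)
    maps a point p to

        p' = p - 2\,(n^\top p + d)\,n

    This is the standard Householder reflection, stated here as background
    rather than as the paper's derivation; chaining such reflections produces
    the kaleidoscopic virtual copies of a single 3D point, and the projections
    of those copies constrain all (n, d) pairs jointly through a linear system.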

    Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network

    Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    One of the recent trends [30, 31, 14] in network architecture design is
    stacking small filters (e.g., 1×1 or 3×3) in the entire network, because
    stacked small filters are more efficient than a large kernel, given the
    same computational complexity. However, in the field of semantic
    segmentation, where we need to perform dense per-pixel prediction, we find
    that the large kernel (and effective receptive field) plays an important
    role when we have to perform the classification and localization tasks
    simultaneously. Following our design principle, we propose a Global
    Convolutional Network to address both the classification and localization
    issues for semantic segmentation. We also suggest a residual-based boundary
    refinement to further refine the object boundaries. Our approach achieves
    state-of-the-art performance on two public benchmarks and significantly
    outperforms previous results: 82.2% (vs 80.2%) on the PASCAL VOC 2012
    dataset and 76.9% (vs 71.8%) on the Cityscapes dataset.
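
    The "large kernel at low cost" idea is the heart of the GCN module: a
    k x k convolution is approximated by two separable branches, (k x 1
    followed by 1 x k) and (1 x k followed by k x 1), whose outputs are
    summed. A minimal PyTorch sketch with illustrative sizes:

        import torch.nn as nn

        class GCNBlock(nn.Module):
            """Global Convolutional Network block: two cheap separable branches
            emulating a dense k x k kernel over the feature map."""
            def __init__(self, in_ch, out_ch, k=15):
                super().__init__()
                p = k // 2
                self.branch_a = nn.Sequential(
                    nn.Conv2d(in_ch, out_ch, (k, 1), padding=(p, 0)),
                    nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, p)))
                self.branch_b = nn.Sequential(
                    nn.Conv2d(in_ch, out_ch, (1, k), padding=(0, p)),
                    nn.Conv2d(out_ch, out_ch, (k, 1), padding=(p, 0)))

            def forward(self, x):
                return self.branch_a(x) + self.branch_b(x)

    Two stacked 1D convolutions cost O(2k) parameters per channel pair instead
    of O(k^2), which is what makes a 15 x 15 effective kernel affordable.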

    A Pursuit of Temporal Accuracy in General Activity Detection

    Yuanjun Xiong, Yue Zhao, Limin Wang, Dahua Lin, Xiaoou Tang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Detecting activities in untrimmed videos is an important but challenging
    task. The performance of existing methods remains unsatisfactory, e.g., they
    often meet difficulties in locating the beginning and end of a long complex
    action. In this paper, we propose a generic framework that can accurately
    detect a wide variety of activities from untrimmed videos. Our first
    contribution is a novel proposal scheme that can efficiently generate
    candidates with accurate temporal boundaries. The other contribution is a
    cascaded classification pipeline that explicitly distinguishes between
    relevance and completeness of a candidate instance. On two challenging temporal
    activity detection datasets, THUMOS14 and ActivityNet, the proposed framework
    significantly outperforms the existing state-of-the-art methods, demonstrating
    superior accuracy and strong adaptivity in handling activities with various
    temporal structures.

    Tree-Structured Reinforcement Learning for Sequential Object Localization

    Zequn Jie, Xiaodan Liang, Jiashi Feng, Xiaojie Jin, Wen Feng Lu, Shuicheng Yan
    Comments: Advances in Neural Information Processing Systems 2016
    Journal-ref: In Advances in Neural Information Processing Systems (pp. 127-135)
    (2016)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Existing object proposal algorithms usually search for possible object
    regions over multiple locations and scales separately, which ignore the
    interdependency among different objects and deviate from the human perception
    procedure. To incorporate global interdependency between objects into object
    localization, we propose an effective Tree-structured Reinforcement Learning
    (Tree-RL) approach to sequentially search for objects by fully exploiting both
    the current observation and historical search paths. The Tree-RL approach
    learns multiple searching policies through maximizing the long-term reward that
    reflects localization accuracies over all the objects. Starting with taking the
    entire image as a proposal, the Tree-RL approach allows the agent to
    sequentially discover multiple objects via a tree-structured traversing scheme.
    Allowing multiple near-optimal policies, Tree-RL offers more diversity in
    search paths and is able to find multiple objects with a single feed-forward
    pass. Therefore, Tree-RL can better cover objects of various scales, which
    is quite appealing in the context of object proposal. Experiments on PASCAL
    VOC 2007 and 2012 validate the effectiveness of Tree-RL, which achieves
    recalls comparable to those of current object proposal algorithms with far
    fewer candidate windows.

    Texture Classification of MR Images of the Brain in ALS using CoHOG

    G M Mashrur E Elahi, Sanjay Kalra, Yee-Hong Yang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Texture analysis is a well-known research topic in computer vision and image
    processing and has many applications. Gradient-based texture methods have
    become popular in classification problems. For the first time we extend a
    well-known gradient-based method, Co-occurrence Histograms of Oriented
    Gradients (CoHOG) to extract texture features from 2D Magnetic Resonance Images
    (MRI). Unlike the original CoHOG method, we use the whole image instead of
    sub-regions for feature calculation. Also, we use a larger neighborhood size.
    Gradient orientations of the image pixels are calculated using Sobel, Gaussian
    Derivative (GD) and Local Frequency Descriptor Gradient (LFDG) operators. The
    extracted feature vector size is very large and classification using a large
    number of similar features does not provide the best results. In our
    proposed method, for the first time to the best of our knowledge, only a
    minimal number of significant features is selected using thresholds
    (<= 0.01) on the area under the receiver operating characteristic (ROC)
    curve (AUC). In this paper, we
    apply the proposed method to classify Amyotrophic Lateral Sclerosis (ALS)
    patients from the controls. It is observed that selected texture features from
    downsampled images are significantly different between patients and controls.
    These features are used in a linear support vector machine (SVM) classifier to
    determine the classification accuracy. Optimal sensitivity and specificity are
    also calculated. Three different cohort datasets are used in the experiments.
    The performance of the proposed method using three gradient operators and two
    different neighborhood sizes is analyzed. Region based analysis is performed to
    demonstrate that significant changes between patients and controls are limited
    to the motor cortex.
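
    The descriptor itself is easy to sketch: quantize gradient orientations,
    then, for each spatial offset, histogram the co-occurring orientation
    pairs. The numpy code below operates on the whole image, as the method
    above does; the offsets and bin count are illustrative rather than the
    paper's configuration:

        import numpy as np
        from scipy.ndimage import sobel

        def cohog(image, offsets=((0, 1), (1, 0), (1, 1), (1, -1)), n_bins=8):
            """Co-occurrence histograms of oriented gradients (Sobel variant)."""
            gy, gx = sobel(image, axis=0), sobel(image, axis=1)
            ori = np.floor((np.arctan2(gy, gx) + np.pi) / (2 * np.pi) * n_bins)
            ori = np.clip(ori, 0, n_bins - 1).astype(int)
            H, W = ori.shape
            feats = []
            for dy, dx in offsets:
                y0, y1 = max(0, -dy), min(H, H - dy)
                x0, x1 = max(0, -dx), min(W, W - dx)
                a = ori[y0:y1, x0:x1]                       # orientation at p
                b = ori[y0 + dy:y1 + dy, x0 + dx:x1 + dx]   # orientation at p + offset
                hist = np.zeros((n_bins, n_bins))
                np.add.at(hist, (a.ravel(), b.ravel()), 1)  # count orientation pairs
                feats.append(hist.ravel())
            return np.concatenate(feats)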

    Optical Flow Fields: Dense Correspondence Fields for Highly Accurate Large Displacement Optical Flow Estimation

    Christian Bailer, Bertram Taetz, Didier Stricker
    Comments: extended version of conference publication arXiv:1508.05151
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Modern large displacement optical flow algorithms usually use an
    initialization by either sparse descriptor matching techniques or dense
    approximate nearest neighbor fields. While the latter have the advantage of
    being dense, they have the major disadvantage of being very outlier-prone as
    they are not designed to find the optical flow, but the visually most similar
    correspondence. In this article we present a dense correspondence field
    approach that is much less outlier-prone and thus much better suited for
    optical flow estimation than approximate nearest neighbor fields. Our approach
    does not require explicit regularization, smoothing (like median filtering) or
    a new data term. Instead we solely rely on patch matching techniques and a
    novel multi-scale matching strategy. We also present enhancements for outlier
    filtering. We show that our approach is better suited for large displacement
    optical flow estimation than modern descriptor matching techniques. We do
    so by initializing EpicFlow with our approach instead of the
    state-of-the-art descriptor matching technique it originally used. We
    significantly outperform the original EpicFlow on MPI-Sintel, KITTI 2012,
    KITTI 2015 and Middlebury. In this extended version of our earlier
    conference publication, we further improve our approach in matching
    accuracy as well as runtime and present more experiments and insights.

    A Hybrid Deep Learning Architecture for Privacy-Preserving Mobile Analytics

    Seyed Ali Ossia, Ali Shahin Shamsabadi, Ali Taheri, Hamid R. Rabiee, Nic Lane, Hamed Haddadi
    Comments: Technical Report
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    The increasing quality of smartphone cameras and variety of photo editing
    applications, in addition to the rise in popularity of image-centric social
    media, have all led to a phenomenal growth in mobile-based photography.
    Advances in computer vision and machine learning techniques provide a large
    number of cloud-based services with the ability to provide content analysis,
    face recognition, and object detection facilities to third parties. These
    inferences and analytics might come with undesired privacy risks to the
    individuals.

    In this paper, we address a fundamental challenge: Can we utilize the local
    processing capabilities of modern smartphones efficiently to provide desired
    features to approved analytics services, while protecting against undesired
    inference attacks and preserving privacy on the cloud? We propose a hybrid
    architecture for a distributed deep learning model between the smartphone and
    the cloud. We rely on the Siamese network and machine learning approaches for
    providing privacy based on defined privacy constraints. We also use transfer
    learning techniques to evaluate the proposed method. Using the latest deep
    learning models for Face Recognition, Emotion Detection, and Gender
    Classification techniques, we demonstrate the effectiveness of our technique in
    providing highly accurate classification results for the desired analytics,
    while proving strong privacy guarantees.

    Deep Bayesian Active Learning with Image Data

    Yarin Gal, Riashat Islam, Zoubin Ghahramani
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Even though active learning forms an important pillar of machine learning,
    deep learning tools are not prevalent within it. Deep learning poses several
    difficulties when used in an active learning setting. First, active learning
    (AL) methods generally rely on being able to learn and update models from small
    amounts of data. Recent advances in deep learning, on the other hand, are
    notorious for their dependence on large amounts of data. Second, many AL
    acquisition functions rely on model uncertainty, yet deep learning methods
    rarely represent such model uncertainty. In this paper we combine recent
    advances in Bayesian deep learning into the active learning framework in a
    practical way. We develop an active learning framework for high dimensional
    data, a task which has been extremely challenging so far, with very sparse
    existing literature. Taking advantage of specialised models such as Bayesian
    convolutional neural networks, we demonstrate our active learning techniques
    with image data, obtaining a significant improvement on existing active
    learning approaches. We demonstrate this on both the MNIST dataset, as well as
    for skin cancer diagnosis from lesion images (ISIC2016 task).
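
    A representative acquisition function in this framework is BALD estimated
    with Monte-Carlo dropout: score each pool point by the mutual information
    between its prediction and the model parameters, i.e. predictive entropy
    minus expected entropy. A minimal numpy sketch with assumed array shapes:

        import numpy as np

        def bald_scores(mc_probs, eps=1e-12):
            """mc_probs: (T, N, C) class probabilities from T stochastic passes."""
            mean = mc_probs.mean(axis=0)                           # (N, C)
            h_mean = -(mean * np.log(mean + eps)).sum(-1)          # predictive entropy
            h_each = -(mc_probs * np.log(mc_probs + eps)).sum(-1)  # (T, N) per pass
            return h_mean - h_each.mean(axis=0)                    # acquire the argmax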

    A Computational Model of a Single-Photon Avalanche Diode Sensor for Transient Imaging

    Quercus Hernandez, Diego Gutierrez, Adrian Jarabo
    Comments: 7 pages, 11 figures
    Subjects: Instrumentation and Detectors (physics.ins-det); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

    Single-Photon Avalanche Diodes (SPADs) are affordable photodetectors
    capable of collecting extremely fast, low-energy events due to their
    single-photon sensitivity. This makes them very suitable for
    time-of-flight-based range imaging systems, allowing reduced costs and
    power requirements without sacrificing much temporal resolution. In this
    work we describe a computational model that simulates the behaviour of SPAD
    sensors, aiming to provide a realistic camera model for time-resolved light
    transport simulation, with applications in prototyping new reconstruction
    techniques based on SPAD time-of-flight data.
    Our model accounts for the major effects of the sensor on the incoming signal.
    We compare our model against real-world measurements, and apply it to a variety
    of scenarios, including complex multiply-scattered light transport.
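
    A toy version of such a model conveys the flavour of those effects. The
    hedged Python sketch below thins photon arrivals by quantum efficiency,
    adds Gaussian timing jitter and Poisson dark counts, and enforces a dead
    time; the chosen effects and every parameter value are illustrative
    assumptions, not the paper's calibrated model:

        import numpy as np

        def spad_response(photon_times_ns, qe=0.3, jitter_ns=0.05,
                          dark_rate_hz=100.0, dead_time_ns=50.0,
                          window_ns=100.0, seed=0):
            """Toy SPAD: thin arrivals, jitter timestamps, add dark counts, dead time."""
            rng = np.random.default_rng(seed)
            t = np.asarray(photon_times_ns, dtype=float)
            t = t[rng.random(t.size) < qe]                  # quantum efficiency
            t = t + rng.normal(0.0, jitter_ns, t.size)      # timing jitter
            n_dark = rng.poisson(dark_rate_hz * window_ns * 1e-9)
            t = np.sort(np.concatenate([t, rng.uniform(0, window_ns, n_dark)]))
            kept, last = [], -np.inf
            for ti in t:                                    # drop events in dead time
                if ti - last >= dead_time_ns:
                    kept.append(ti)
                    last = ti
            return np.array(kept)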

    Cellulyzer – Automated analysis and interactive visualization/simulation of select cellular processes

    Aliakbar Jafarpour, Holger Lorenz
    Subjects: Biological Physics (physics.bio-ph); Computer Vision and Pattern Recognition (cs.CV); Data Analysis, Statistics and Probability (physics.data-an); Subcellular Processes (q-bio.SC)

    Here we report on a set of programs developed at the ZMBH Bio-Imaging
    Facility for tracking real-life images of cellular processes. These programs
    perform 1) automated tracking; 2) quantitative and comparative track analyses
    of different images in different groups; 3) different interactive visualization
    schemes; and 4) interactive realistic simulation of different cellular
    processes for validation and optimal problem-specific adjustment of image
    acquisition parameters (tradeoff between speed, resolution, and quality with
    feedback from the very final results). The collection of programs is primarily
    developed for the common bio-image analysis software ImageJ (as a single Java
    Plugin). Some programs are also available in other languages (C++ and
    JavaScript) and may be run simply with a web browser, even on a low-end
    tablet or smartphone. The programs are available at
    this https URL


    Artificial Intelligence

    Learning Invariant Feature Spaces to Transfer Skills with Reinforcement Learning

    Abhishek Gupta, Coline Devin, YuXuan Liu, Pieter Abbeel, Sergey Levine
    Comments: Published as a conference paper at ICLR 2017
    Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)

    People can learn a wide range of tasks from their own experience, but can
    also learn from observing other creatures. This can accelerate acquisition of
    new skills even when the observed agent differs substantially from the learning
    agent in terms of morphology. In this paper, we examine how reinforcement
    learning algorithms can transfer knowledge between morphologically different
    agents (e.g., different robots). We introduce a problem formulation where two
    agents are tasked with learning multiple skills by sharing information. Our
    method uses the skills that were learned by both agents to train invariant
    feature spaces that can then be used to transfer other skills from one agent to
    another. The process of learning these invariant feature spaces can be viewed
    as a kind of “analogy making”, or implicit learning of partial correspondences
    between two distinct domains. We evaluate our transfer learning algorithm in
    two simulated robotic manipulation skills, and illustrate that we can transfer
    knowledge between simulated robotic arms with different numbers of links, as
    well as simulated arms with different actuation mechanisms, where one robot is
    torque-driven while the other is tendon-driven.

    A quantum dynamic belief model to explain the interference effects of categorization on decision making

    Zichang He, Wen Jiang
    Comments: 28 pages. arXiv admin note: text overlap with arXiv:1703.02386
    Subjects: Artificial Intelligence (cs.AI); Quantum Physics (quant-ph)

    Categorization is necessary for many decision-making tasks. However, the
    categorization process may interfere with the decision-making result, and
    the law of total probability can be violated in some situations. To predict
    the interference effect of categorization, some models based on quantum
    probability have been proposed. In this paper, a new quantum dynamic belief
    (QDB) model is proposed. Considering that a precise decision may not be
    made during the process, the concept of uncertainty is introduced in our
    model to simulate the real human thinking process. The interference effect
    of categorization can then be predicted by handling the uncertain
    information. The proposed model is applied to a categorization
    decision-making experiment to explain the interference effect of
    categorization. Compared with other models, our model is relatively more
    succinct, and the results show its correctness and effectiveness.
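
    The violation being modelled is conventionally quantified as follows.
    Classically, the law of total probability requires P(A) = P(C_1) P(A|C_1)
    + P(C_2) P(A|C_2) when a decision A follows categorization into C_1 or
    C_2; quantum-probability accounts, on which models like QDB build, add an
    interference term:

        P(A) = P(C_1) P(A|C_1) + P(C_2) P(A|C_2)
               + 2 \sqrt{ P(C_1) P(A|C_1) \, P(C_2) P(A|C_2) } \, \cos\theta

    A phase with cos(theta) = 0 recovers the classical law, while a nonzero
    phase fits the observed deviations. This is the standard quantum-cognition
    form, given as background rather than as the QDB model's exact equations.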

    Memory Enriched Big Bang Big Crunch Optimization Algorithm for Data Clustering

    Kayvan Bijari, Hadi Zare, Hadi Veisi, Hossein Bobarshad
    Comments: 17 pages, 3 figures, 8 tables
    Journal-ref: Neural Comput & Applic (2016)
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    Cluster analysis plays an important role in the decision-making process of
    many knowledge-based systems. There exists a wide variety of approaches to
    clustering applications, including heuristic techniques, probabilistic
    models, and traditional hierarchical algorithms. In this paper, a novel
    heuristic approach based on the big bang-big crunch algorithm is proposed
    for clustering problems. The proposed method not only takes advantage of
    its heuristic nature to alleviate the shortcomings of typical clustering
    algorithms such as k-means, but also benefits from a memory-based scheme
    that similar heuristic techniques lack. Furthermore, the performance of the
    proposed algorithm is investigated on several benchmark test functions as
    well as on well-known datasets. The experimental results show the
    significant superiority of the proposed method over similar algorithms.
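
    The plain big bang-big crunch loop for clustering is short enough to
    sketch: candidates are sets of k centroids, the "crunch" contracts the
    population to its fitness-weighted centre of mass, and the "bang"
    re-scatters candidates with a shrinking radius. The Python below is a
    hedged sketch of the baseline algorithm only; it deliberately omits the
    memory-based scheme that is the paper's contribution:

        import numpy as np

        def bbbc_clustering(X, k, pop=50, iters=100, seed=0):
            """Search over k-centroid candidates; fitness is inverse cluster SSE."""
            rng = np.random.default_rng(seed)
            d = X.shape[1]
            lo, hi = X.min(0), X.max(0)
            centre = rng.uniform(lo, hi, size=(k, d))

            def sse(c):
                labels = ((X[:, None, :] - c) ** 2).sum(-1).argmin(1)
                return ((X - c[labels]) ** 2).sum()

            for t in range(1, iters + 1):
                sigma = (hi - lo) / t            # big bang radius shrinks over time
                cands = centre + rng.normal(0, 1, (pop, k, d)) * sigma
                fit = np.array([1.0 / (1e-9 + sse(c)) for c in cands])
                centre = (fit[:, None, None] * cands).sum(0) / fit.sum()  # big crunch
            return centre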

    An Integrated and Scalable Platform for Proactive Event-Driven Traffic Management

    Alain Kibangou, Alexander Artikis, Evangelos Michelioudakis, Georgios Paliouras, Marius Schmitt, John Lygeros, Chris Baber, Natan Morar, Fabiana Fournier, Inna Skarbovsky
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)

    Traffic on freeways can be managed by means of ramp meters from Road Traffic
    Control rooms. Human operators cannot efficiently manage a network of ramp
    meters. To support them, we present an intelligent platform for traffic
    management which includes a new ramp metering coordination scheme in the
    decision making module, an efficient dashboard for interacting with human
    operators, machine learning tools for learning event definitions and Complex
    Event Processing tools able to deal with uncertainties inherent to the traffic
    use case. Unlike the usual approach, the devised event-driven platform is
    able to predict congestion up to 4 minutes before it actually happens.
    Proactive decision making can then be established, leading to significant
    improvements in traffic conditions.

    Cost-Optimal Learning of Causal Graphs

    Murat Kocaoglu, Alexandros G. Dimakis, Sriram Vishwanath
    Subjects: Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)

    We consider the problem of learning a causal graph over a set of variables
    with interventions. We study the cost-optimal causal graph learning problem:
    For a given skeleton (undirected version of the causal graph), design the set
    of interventions with minimum total cost, that can uniquely identify any causal
    graph with the given skeleton. We show that this problem is solvable in
    polynomial time. Later, we consider the case when the number of interventions
    is limited. For this case, we provide polynomial time algorithms when the
    skeleton is a tree or a clique tree. For a general chordal skeleton, we develop
    an efficient greedy algorithm, which can be improved when the causal graph
    skeleton is an interval graph.

    Learning a Unified Control Policy for Safe Falling

    Visak CV Kumar, Sehoon Ha, C Karen Liu
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Being able to fall safely is a necessary motor skill for humanoids performing
    highly dynamic tasks, such as running and jumping. We propose a new method to
    learn a policy that minimizes the maximal impulse during the fall. The
    optimization solves for both a discrete contact planning problem and a
    continuous optimal control problem. Once trained, the policy can compute the
    optimal next contacting body part (e.g. left foot, right foot, or hands),
    contact location and timing, and the required joint actuation. We represent the
    policy as a mixture of actor-critic neural networks, which consists of n
    control policies and the corresponding value functions. Each actor-critic
    pair is associated with one of the n possible contacting body parts. During
    execution, the policy corresponding to the highest value function is
    executed, and the associated body part makes the next contact with the
    ground. With this mixture of actor-critic architecture, the discrete
    contact sequence planning is solved through the selection of the best
    critics, while the continuous control problem is solved by the optimization
    of actors. We show that our policy can achieve comparable, sometimes even
    higher, rewards than a recursive search of the action space using dynamic
    programming, while enjoying a 50- to 400-fold speed gain during online
    execution.
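
    The execution-time logic of the mixture is compact: evaluate every critic
    at the current state and run the actor whose critic reports the highest
    value, which simultaneously picks the next contacting body part. A minimal
    Python illustration with assumed actor/critic interfaces:

        import numpy as np

        def select_contact_policy(state, actors, critics):
            """actors[i]/critics[i] correspond to contacting body part i (callables)."""
            values = np.array([critic(state) for critic in critics])
            best = int(np.argmax(values))        # discrete contact planning
            return best, actors[best](state)     # body part index, joint actuation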

    Introduction to Formal Concept Analysis and Its Applications in Information Retrieval and Related Fields

    Dmitry I. Ignatov
    Journal-ref: RuSSIR 2014, Nizhniy Novgorod, Russia, CCIS vol. 505, Springer
    42-141
    Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Discrete Mathematics (cs.DM); Machine Learning (stat.ML)

    This paper is a tutorial on Formal Concept Analysis (FCA) and its
    applications. FCA is an applied branch of Lattice Theory, a mathematical
    discipline which enables the formalisation of concepts as basic units of
    human thinking and the analysis of data in object-attribute form. Having
    originated in the early 80s, it has become, over the last three decades, a
    popular human-centred tool for knowledge representation and data analysis
    with numerous applications. Since the tutorial was specially prepared for
    RuSSIR 2014, the covered FCA topics include Information Retrieval with a
    focus on visualisation aspects, Machine Learning, Data Mining and Knowledge
    Discovery, Text Mining and several others.
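
    The object-attribute formalisation at FCA's core is easy to demonstrate.
    The Python sketch below enumerates all formal concepts of a toy binary
    context by closing every subset of objects under the two derivation
    operators; it is exponential and meant only for small illustrative
    contexts:

        from itertools import chain, combinations

        def formal_concepts(context):
            """context maps objects to attribute sets; returns (extent, intent) pairs."""
            objects = list(context)
            all_attrs = set().union(*context.values())

            def intent(objs):                    # attributes shared by all objs
                return (set.intersection(*(context[g] for g in objs))
                        if objs else set(all_attrs))

            def extent(attrs):                   # objects having all attrs
                return {g for g in objects if attrs <= context[g]}

            concepts = set()
            for objs in chain.from_iterable(combinations(objects, r)
                                            for r in range(len(objects) + 1)):
                A = intent(set(objs))
                concepts.add((frozenset(extent(A)), frozenset(A)))
            return concepts

    For instance, formal_concepts({'duck': {'flies', 'swims'}, 'eagle':
    {'flies'}, 'trout': {'swims'}}) contains ({'duck', 'eagle'}, {'flies'}),
    the concept "things that fly".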

    Robust Adversarial Reinforcement Learning

    Lerrel Pinto, James Davidson, Rahul Sukthankar, Abhinav Gupta
    Comments: 10 pages
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Robotics (cs.RO)

    Deep neural networks coupled with fast simulation and improved computation
    have led to recent successes in the field of reinforcement learning (RL).
    However, most current RL-based approaches fail to generalize since: (a) the gap
    between simulation and real world is so large that policy-learning approaches
    fail to transfer; (b) even if policy learning is done in real world, the data
    scarcity leads to failed generalization from training to test scenarios (e.g.,
    due to different friction or object masses). Inspired from H-infinity control
    methods, we note that both modeling errors and differences in training and test
    scenarios can be viewed as extra forces/disturbances in the system. This paper
    proposes the idea of robust adversarial reinforcement learning (RARL), where we
    train an agent to operate in the presence of a destabilizing adversary that
    applies disturbance forces to the system. The jointly trained adversary is
    reinforced — that is, it learns an optimal destabilization policy. We
    formulate the policy learning as a zero-sum, minimax objective function.
    Extensive experiments in multiple environments (InvertedPendulum, HalfCheetah,
    Swimmer, Hopper and Walker2d) conclusively demonstrate that our method (a)
    improves training stability; (b) is robust to differences in training/test
    conditions; and (c) outperforms the baseline even in the absence of the
    adversary.

    Towards Generalization and Simplicity in Continuous Control

    Aravind Rajeswaran, Kendall Lowrey, Emanuel Todorov, Sham Kakade
    Comments: Project page: this https URL
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Systems and Control (cs.SY)

    This work shows that policies with simple linear and RBF parameterizations
    can be trained to solve a variety of continuous control tasks, including the
    OpenAI gym benchmarks. The performance of these trained policies is
    competitive with state-of-the-art results obtained with more elaborate
    parameterizations such as fully connected neural networks. Furthermore,
    existing training and testing scenarios are shown to be very limited and
    prone to over-fitting, thus giving rise to only trajectory-centric
    policies. Training with a diverse initial state distribution is shown to
    produce more global policies with better generalization. This allows for
    interactive control scenarios where the system recovers from large on-line
    perturbations, as shown in the supplementary video.

    Multi-Robot Active Information Gathering with Periodic Communication

    Mikko Lauri, Eero Heinänen, Simone Frintrop
    Comments: IEEE International Conference on Robotics and Automation (ICRA), 2017
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

    A team of robots sharing a common goal can benefit from coordination of the
    activities of team members, helping the team to reach the goal more reliably or
    quickly. We address the problem of coordinating the actions of a team of robots
    with periodic communication capability executing an information gathering task.
    We cast the problem as a multi-agent optimal decision-making problem with an
    information theoretic objective function. We show that appropriate techniques
    for solving decentralized partially observable Markov decision processes
    (Dec-POMDPs) are applicable in such information gathering problems. We quantify
    the usefulness of coordinated information gathering through simulation studies,
    and demonstrate the feasibility of the method in a real-world target tracking
    domain.


    Information Retrieval

    Kaggle Competition: Expedia Hotel Recommendations

    Gourav G. Shenoy, Mangirish A. Wagle, Anwar Shaikh
    Comments: 12 pages, 8 tables, 7 figures
    Subjects: Information Retrieval (cs.IR)

    With hundreds, even thousands, of hotels to choose from at every destination,
    it’s difficult to know which will suit your personal preferences. Expedia wants
    to take the proverbial rabbit hole out of hotel search by providing
    personalized hotel recommendations to their users. This is no small task for a
    site with hundreds of millions of visitors every month! Currently, Expedia
    uses search parameters to adjust its hotel recommendations, but there is
    not enough customer-specific data to personalize them for each user. In
    this project, we have taken up the challenge to contextualize customer data
    and predict the likelihood that a user will stay at each of 100 different
    hotel groups.

    Introduction to Formal Concept Analysis and Its Applications in Information Retrieval and Related Fields

    Dmitry I. Ignatov
    Journal-ref: RuSSIR 2014, Nizhniy Novgorod, Russia, CCIS vol. 505, Springer
    42-141
    Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Discrete Mathematics (cs.DM); Machine Learning (stat.ML)

    This paper is a tutorial on Formal Concept Analysis (FCA) and its
    applications. FCA is an applied branch of Lattice Theory, a mathematical
    discipline which enables the formalisation of concepts as basic units of
    human thinking and the analysis of data in object-attribute form. Having
    originated in the early 80s, it has become, over the last three decades, a
    popular human-centred tool for knowledge representation and data analysis
    with numerous applications. Since the tutorial was specially prepared for
    RuSSIR 2014, the covered FCA topics include Information Retrieval with a
    focus on visualisation aspects, Machine Learning, Data Mining and Knowledge
    Discovery, Text Mining and several others.

    Large-scale image analysis using docker sandboxing

    B Sengupta, E Vazquez, M Sasdelli, Y Qian, M Peniak, L Netherton, G Delfino
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR); Neural and Evolutionary Computing (cs.NE)

    With the advent of specialized hardware such as Graphics Processing Units
    (GPUs), large scale image localization, classification and retrieval have seen
    increased prevalence. Designing scalable software architecture that co-evolves
    with such specialized hardware is a challenge in the commercial setting. In
    this paper, we describe one such architecture (Cortexica) that leverages
    the scalability of GPUs and the sandboxing offered by Docker containers.
    This allows for the flexibility of mixing different computer architectures
    as well as computational algorithms with the security of a trusted
    environment. We illustrate the utility of this framework in a commercial
    setting, i.e., searching for multiple products in an image by combining
    image localisation and retrieval.

    On Sampling from Massive Graph Streams

    Nesreen K. Ahmed, Nick Duffield, Theodore Willke, Ryan A. Rossi
    Subjects: Social and Information Networks (cs.SI); Data Structures and Algorithms (cs.DS); Information Retrieval (cs.IR); Statistics Theory (math.ST)

    We propose Graph Priority Sampling (GPS), a new paradigm for order-based
    reservoir sampling from massive streams of graph edges. GPS provides a general
    way to weight edge sampling according to auxiliary and/or size variables so as
    to accomplish various estimation goals of graph properties. In the context of
    subgraph counting, we show how edge sampling weights can be chosen so as to
    minimize the estimation variance of counts of specified sets of subgraphs. In
    distinction with many prior graph sampling schemes, GPS separates the functions
    of edge sampling and subgraph estimation. We propose two estimation frameworks:
    (1) Post-Stream estimation, to allow GPS to construct a reference sample of
    edges to support retrospective graph queries, and (2) In-Stream estimation, to
    allow GPS to obtain lower variance estimates by incrementally updating the
    subgraph count estimates during stream processing. Unbiasedness of subgraph
    estimators is established through a new martingale formulation of graph
    stream order sampling, which shows that subgraph estimators, written as
    products of constituent edge estimators, are unbiased even when computed at
    different points in the stream. The separation of estimation and sampling
    enables
    significant resource savings relative to previous work. We illustrate our
    framework with applications to triangle and wedge counting. We perform a
    large-scale experimental study on real-world graphs from various domains and
    types. GPS achieves high accuracy with less than 1% error for triangle and
    wedge counting, while storing a small fraction of the graph with average update
    times of a few microseconds per edge. Notably, for a large Twitter graph with
    more than 260M edges, GPS accurately estimates triangle counts with less than
    1% error, while storing only 40K edges.
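
    The order-based sampling step underlying GPS can be sketched briefly: each
    arriving edge draws a priority w/u with u uniform on (0, 1), and the
    reservoir keeps the m highest priorities; the largest discarded priority
    z* then yields Horvitz-Thompson inclusion probabilities min(1, w/z*) for
    unbiased estimation. A hedged Python sketch of that sampling step only
    (edges assumed comparable, e.g. tuples of node ids):

        import heapq, random

        def priority_sample_edges(edge_stream, m, weight=lambda e: 1.0):
            """Keep the m highest-priority edges; return them with threshold z*."""
            heap, z_star = [], 0.0               # min-heap of (priority, edge)
            for e in edge_stream:
                pr = weight(e) / random.random()
                if len(heap) < m:
                    heapq.heappush(heap, (pr, e))
                elif pr > heap[0][0]:
                    z_star = max(z_star, heapq.heappushpop(heap, (pr, e))[0])
                else:
                    z_star = max(z_star, pr)
            return [e for _, e in heap], z_star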

    Customer Life Time Value Prediction Using Embeddings

    Benjamin Paul Chamberlain, Angelo Cardoso, C.H. Bryan Liu, Roberto Pagliari, Marc Peter Deisenroth
    Comments: 9 pages, 10 figures
    Subjects: Learning (cs.LG); Computers and Society (cs.CY); Information Retrieval (cs.IR); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    We describe the Customer Life Time Value (CLTV) prediction system deployed at
    ASOS.com, a global online fashion retailer. CLTV prediction is an important
    problem in e-commerce where an accurate estimate of future value allows
    retailers to effectively allocate marketing spend, identify and nurture high
    value customers and mitigate exposure to losses. The system at ASOS provides
    daily estimates of the future value of every customer and is one of the
    cornerstones of the personalised shopping experience. The state of the art in
    this domain uses large numbers of handcrafted features and ensemble regressors
    to forecast value, predict churn and evaluate customer loyalty. We describe our
    system, which adopts this approach, and our ongoing efforts to further improve
    it. Recently, domains including language, vision and speech have shown dramatic
    advances by replacing handcrafted features with features that are learned
    automatically from data. We show that learning feature representations is a
    promising extension to the state of the art in CLTV modelling. We propose a
    novel way to generate embeddings of customers, which addresses the issue of
    the ever-changing product catalogue, and obtain a significant improvement
    over an exhaustive set of handcrafted features.


    Computation and Language

    Spice up Your Chat: The Intentions and Sentiment Effects of Using Emoji

    Tianran Hu, Han Guo, Hao Sun, Thuy-vy Thi Nguyen, Jiebo Luo
    Comments: 10 pages, published at ICWSM’17
    Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)

    Emojis, as a new way of conveying nonverbal cues, are widely adopted in
    computer-mediated communications. In this paper, first from a message sender
    perspective, we focus on people’s motives in using four types of emojis —
    positive, neutral, negative, and non-facial. We compare the willingness levels
    of using these emoji types for seven typical intentions that people usually
    apply nonverbal cues for in communication. The results of extensive statistical
    hypothesis tests not only report the popularities of the intentions, but also
    uncover the subtle differences between emoji types in terms of intended uses.
    Second, from a perspective of message recipients, we further study the
    sentiment effects of emojis, as well as their duplications, on verbal messages.
    Different from previous studies in emoji sentiment, we study the sentiments of
    emojis and their contexts as a whole. The experiment results indicate that
    the power to convey sentiment differs among the four emoji types, and that
    the sentiment effects of emojis vary with the valence of their contexts.

    A World of Difference: Divergent Word Interpretations among People

    Tianran Hu, Ruihua Song, Philip Ding, Xing Xie, Jiebo Luo
    Comments: 4 pages, 1 figure, published at ICWSM’17
    Subjects: Computation and Language (cs.CL)

    Divergent word usages reflect differences among people. In this paper, we
    present a novel angle for studying word usage divergence — word
    interpretations. We propose an approach that quantifies semantic differences in
    interpretations among different groups of people. The effectiveness of our
    approach is validated by quantitative evaluations. Experiment results indicate
    that divergences in word interpretations exist. We further apply the approach
    to two well studied types of differences between people — gender and region.
    The detected words with divergent interpretations reveal the unique features of
    specific groups of people. For gender, we discover that certain different
    interests, social attitudes, and characters between males and females are
    reflected in their divergent interpretations of many words. For region, we find
    that specific interpretations of certain words reveal the geographical and
    cultural features of different regions.

    Linguistic Knowledge as Memory for Recurrent Neural Networks

    Bhuwan Dhingra, Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov
    Subjects: Computation and Language (cs.CL)

    Training recurrent neural networks to model long term dependencies is
    difficult. Hence, we propose to use external linguistic knowledge as an
    explicit signal to inform the model which memories it should utilize.
    Specifically, external knowledge is used to augment a sequence with typed edges
    between arbitrarily distant elements, and the resulting graph is decomposed
    into directed acyclic subgraphs. We introduce a model that encodes such graphs
    as explicit memory in recurrent neural networks, and use it to model
    coreference relations in text. We apply our model to several text comprehension
    tasks and achieve new state-of-the-art results on all considered benchmarks,
    including CNN, bAbi, and LAMBADA. On the bAbi QA tasks, our model solves 15 out
    of the 20 tasks with only 1000 training examples per task. Analysis of the
    learned representations further demonstrates the ability of our model to encode
    fine-grained entity information across a document.

    Introduction to Formal Concept Analysis and Its Applications in Information Retrieval and Related Fields

    Dmitry I. Ignatov
    Journal-ref: RuSSIR 2014, Nizhniy Novgorod, Russia, CCIS vol. 505, Springer
    42-141
    Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Discrete Mathematics (cs.DM); Machine Learning (stat.ML)

    This paper is a tutorial on Formal Concept Analysis (FCA) and its
    applications. FCA is an applied branch of Lattice Theory, a mathematical
    discipline which enables the formalisation of concepts as basic units of
    human thinking and the analysis of data in object-attribute form. Having
    originated in the early 80s, it has become, over the last three decades, a
    popular human-centred tool for knowledge representation and data analysis
    with numerous applications. Since the tutorial was specially prepared for
    RuSSIR 2014, the covered FCA topics include Information Retrieval with a
    focus on visualisation aspects, Machine Learning, Data Mining and Knowledge
    Discovery, Text Mining and several others.

    Data Noising as Smoothing in Neural Network Language Models

    Ziang Xie, Sida I. Wang, Jiwei Li, Daniel Lévy, Aiming Nie, Dan Jurafsky, Andrew Y. Ng
    Comments: ICLR 2017
    Subjects: Learning (cs.LG); Computation and Language (cs.CL)

    Data noising is an effective technique for regularizing neural network
    models. While noising is widely adopted in application domains such as vision
    and speech, commonly used noising primitives have not been developed for
    discrete sequence-level settings such as language modeling. In this paper, we
    derive a connection between input noising in neural network language models and
    smoothing in (n)-gram models. Using this connection, we draw upon ideas from
    smoothing to develop effective noising schemes. We demonstrate performance
    gains when applying the proposed schemes to language modeling and machine
    translation. Finally, we provide empirical analysis validating the relationship
    between noising and smoothing.
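
    The unigram-noising scheme analyzed in the paper is simple to state: with
    probability gamma, replace a context token by a draw from the unigram
    distribution, mirroring the interpolation with lower-order distributions
    that smoothing performs in n-gram models. A minimal Python sketch with
    assumed data structures:

        import random

        def unigram_noise(tokens, vocab, unigram_weights, gamma=0.2):
            """Swap each token, with prob gamma, for a unigram-distribution sample."""
            return [random.choices(vocab, unigram_weights)[0]
                    if random.random() < gamma else t
                    for t in tokens]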


    Distributed, Parallel, and Cluster Computing

    Evaluation of DVFS techniques on modern HPC processors and accelerators for energy-aware applications

    Enrico Calore, Alessandro Gabbana, Sebastiano Fabio Schifano, Raffaele Tripiccione
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)

    Energy efficiency is becoming increasingly important for computing systems,
    in particular for large scale HPC facilities. In this work we evaluate,
    from a user perspective, the use of Dynamic Voltage and Frequency Scaling
    (DVFS)
    techniques, assisted by the power and energy monitoring capabilities of modern
    processors in order to tune applications for energy efficiency. We run selected
    kernels and a full HPC application on two high-end processors widely used in
    the HPC context, namely an NVIDIA K80 GPU and an Intel Haswell CPU. We evaluate
    the available trade-offs between energy-to-solution and time-to-solution,
    attempting a function-by-function frequency tuning. We finally estimate the
    benefits obtainable by running the full code on an HPC multi-GPU node, with
    respect to default clock frequency governors. We instrument our code to
    accurately monitor power consumption and execution time without the need
    for any additional hardware, and we enable it to change CPU and GPU clock
    frequencies while running. We analyze our results on the different
    architectures using a simple
    energy-performance model, and derive a number of energy saving strategies which
    can be easily adopted on recent high-end HPC systems for generic applications.
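
    The "no additional hardware" style of measurement relies on the energy
    counters modern processors expose. The sketch below, an illustration
    rather than the paper's instrumentation, reads Intel RAPL counters through
    the Linux powercap sysfs interface to obtain energy-to-solution and
    average power for a function (root access is typically required, and
    counter wrap-around is ignored):

        import time, pathlib

        def measure_energy(fn, domain="intel-rapl:0"):
            """Run fn() and report (joules, seconds, average watts) for a domain."""
            counter = pathlib.Path("/sys/class/powercap") / domain / "energy_uj"
            e0, t0 = int(counter.read_text()), time.time()
            fn()
            e1, t1 = int(counter.read_text()), time.time()
            joules = (e1 - e0) / 1e6             # counter is in microjoules
            return joules, t1 - t0, joules / (t1 - t0)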

    Byzantine-Tolerant Machine Learning

    Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui, Julien Stainer
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)

    The growth of data, the need for scalability, and the complexity of the
    models used in modern machine learning call for distributed
    implementations. Yet, as
    of today, distributed machine learning frameworks have largely ignored the
    possibility of arbitrary (i.e., Byzantine) failures. In this paper, we study
    the robustness to Byzantine failures at the fundamental level of stochastic
    gradient descent (SGD), the heart of most machine learning algorithms. Assuming
    a set of (n) workers, up to (f) of them being Byzantine, we ask how robust can
    SGD be, without limiting the dimension, nor the size of the parameter space.

    We first show that no gradient descent update rule based on a linear
    combination of the vectors proposed by the workers (i.e., current
    approaches) tolerates a single Byzantine failure. We then formulate a
    resilience property of the update rule capturing the basic requirements to
    guarantee convergence despite (f) Byzantine workers. We finally propose
    Krum, an update rule that satisfies the aforementioned resilience property.
    For a (d)-dimensional learning problem, the time complexity of Krum is
    O(n^2 · (d + log n)).

    A Scalable Data Streaming Infrastructure for Smart Cities

    Jesus Arias Fisteus, Luis Sanchez Fernandez, Victor Corcoba Magaña, Mario Muñoz Organero, Jorge Yago Fernandez, Juan Antonio Alvarez Garcia
    Comments: Preprint of a paper accepted for publication at this http URL as part of the Proceedings of JARCA 2016 (XVIII Jornadas de ARCA Sistemas Cualitativos y sus Aplicaciones en Diagnosis, Robótica e Inteligencia Ambiental), Almería, Spain, June 2016
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Many of the services a smart city can provide to its citizens rely on the
    ability of its infrastructure to collect and process, in real time, the
    vast amounts of continuous data that sensors deployed throughout the city
    produce. In this paper we present the server infrastructure we have
    designed, in the context of the HERMES project, to collect the data from
    sensors and aggregate it into streams for use in the services of the smart
    city.

    MSF and Connectivity in Limited Variants of the Congested Clique

    Tomasz Jurdzinski, Krzysztof Nowicki
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    The congested clique is a synchronous, message-passing model of distributed
    computing in which each computational unit (node) can, in each round, send
    a message of O(log n) bits to each other node of the network, where n is
    the number of nodes. This model has been considered under two extreme
    scenarios: unicast or broadcast. In the unicast model, a node can send a
    (possibly) different message to each other node of the network. In
    contrast, in the broadcast model each node sends a single (the same)
    message to all other nodes. We study the congested clique model
    parametrized by the range r, the maximum number of different messages a
    node can send in one round. Following recent progress in the design of
    algorithms for graph connectivity and minimum spanning forest (MSF) in the
    unicast congested clique, we study these problems in limited variants of
    the congested clique. We present the first sub-logarithmic algorithm for
    connected components in the broadcast congested clique. Then, we show that
    an efficient deterministic unicast algorithm for MSF and a randomized
    algorithm for connected components can be efficiently implemented in the
    rcast model with range r = 2, the weakest model of the congested clique
    above the broadcast variant (r = 1) in the hierarchy with respect to range.
    More importantly, our algorithms give the first solutions with optimal
    capacity of communication edges, while preserving small round complexity.

    Multi-GPU maximum entropy image synthesis for radio astronomy

    M. Cárcamo, P. Román, S. Casassus, V. Moral, F.R. Rannou
    Comments: 11 pages, 13 figures
    Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Distributed, Parallel, and Cluster Computing (cs.DC)

    The maximum entropy method (MEM) is a well known deconvolution technique in
    radio-interferometry. This method solves a non-linear optimization problem with
    an entropy regularization term. Other heuristics such as CLEAN are faster but
    highly user dependent. Nevertheless, MEM has the following advantages: it is
    unsupervised, it has a statistical basis, and it offers better resolution and
    better image quality under certain conditions. This work presents a high
    performance GPU version of non-gridded MEM, which is tested using
    interferometric and simulated data. We propose a single-GPU and a multi-GPU
    implementation for single and multi-spectral data, respectively. We also make
    use of the Peer-to-Peer and Unified Virtual Addressing features of newer GPUs,
    which allow multiple GPUs to be exploited transparently and efficiently.
    Several ALMA data sets are used to demonstrate the effectiveness in imaging
    and to evaluate GPU performance. The results show that a speedup of 1000 to
    5000 times over a sequential version can be achieved, depending on data and
    image size. This has allowed us to reconstruct the HD142527 CO(6-5) short
    baseline data set in 2.1 minutes, instead of the 2.5 days it takes on CPU.

    Large-scale image analysis using docker sandboxing

    B Sengupta, E Vazquez, M Sasdelli, Y Qian, M Peniak, L Netherton, G Delfino
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR); Neural and Evolutionary Computing (cs.NE)

    With the advent of specialized hardware such as Graphics Processing Units
    (GPUs), large scale image localization, classification and retrieval have seen
    increased prevalence. Designing scalable software architecture that co-evolves
    with such specialized hardware is a challenge in the commercial setting. In
    this paper, we describe one such architecture (\textit{Cortexica}) that
    leverages scalability of GPUs and sandboxing offered by docker containers. This
    allows for the flexibility of mixing different computer architectures as well
    as computational algorithms with the security of a trusted environment. We
    illustrate the utility of this framework in a commercial setting, i.e.,
    searching for multiple products in an image by combining image localisation and
    retrieval.


    Learning

    A Hybrid Deep Learning Architecture for Privacy-Preserving Mobile Analytics

    Seyed Ali Ossia, Ali Shahin Shamsabadi, Ali Taheri, Hamid R. Rabiee, Nic Lane, Hamed Haddadi
    Comments: Technical Report
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    The increasing quality of smartphone cameras and variety of photo editing
    applications, in addition to the rise in popularity of image-centric social
    media, have all led to a phenomenal growth in mobile-based photography.
    Advances in computer vision and machine learning techniques provide a large
    number of cloud-based services with the ability to provide content analysis,
    face recognition, and object detection facilities to third parties. These
    inferences and analytics might come with undesired privacy risks to the
    individuals.

    In this paper, we address a fundamental challenge: Can we utilize the local
    processing capabilities of modern smartphones efficiently to provide desired
    features to approved analytics services, while protecting against undesired
    inference attacks and preserving privacy on the cloud? We propose a hybrid
    architecture for a distributed deep learning model between the smartphone and
    the cloud. We rely on the Siamese network architecture and machine learning
    approaches for providing privacy based on defined privacy constraints. We also
    use transfer learning techniques to evaluate the proposed method. Using the
    latest deep learning models for face recognition, emotion detection, and
    gender classification, we demonstrate the effectiveness of our technique in
    providing highly accurate classification results for the desired analytics,
    while providing strong privacy guarantees.
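
    As a rough illustration of the hybrid device/cloud split described above,
    the sketch below keeps a feature extractor on the phone and ships only a
    low-dimensional embedding to the cloud-side classifier. The weights here
    are random stand-ins; in the paper the on-device part would be a
    Siamese-trained extractor shaped by the chosen privacy constraints.

        import numpy as np

        rng = np.random.default_rng(1)
        d, k, c = 256, 64, 7                      # input dim, embedding dim, classes

        # Hypothetical weights standing in for a Siamese-trained feature extractor;
        # the real model and its privacy constraints are learned as in the paper.
        W_phone = rng.standard_normal((k, d))     # stays on the device
        W_cloud = rng.standard_normal((c, k))     # lives on the analytics server

        def on_device(x):
            # The phone sends only this coarse embedding, never the raw image.
            return np.tanh(W_phone @ x)

        def in_cloud(z):
            # The approved service classifies from the embedding alone.
            return int(np.argmax(W_cloud @ z))

        x = rng.standard_normal(d)                # stand-in for image features
        print(in_cloud(on_device(x)))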

    Nearly-tight VC-dimension bounds for piecewise linear neural networks

    Nick Harvey, Chris Liaw, Abbas Mehrabian
    Comments: 11 pages, 2 figures
    Subjects: Learning (cs.LG)

    We prove new upper and lower bounds on the VC-dimension of deep neural
    networks with the ReLU activation function. These bounds are tight for almost
    the entire range of parameters. Letting (W) be the number of weights and (L) be
    the number of layers, we prove that the VC-dimension is (O(W L \log(W))) and
    (\Omega(W L \log(W/L))). This improves both the previously known upper bounds
    and lower bounds. In terms of the number (U) of non-linear units, we prove a
    tight bound (\Theta(W U)) on the VC-dimension. All of these results generalize
    to arbitrary piecewise linear activation functions.

    Dropout Inference in Bayesian Neural Networks with Alpha-divergences

    Yingzhen Li, Yarin Gal
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    To obtain uncertainty estimates with real-world Bayesian deep learning
    models, practical inference approximations are needed. Dropout variational
    inference (VI) for example has been used for machine vision and medical
    applications, but VI can severely underestimate model uncertainty.
    Alpha-divergences are alternative divergences to VI’s KL objective, which are
    able to avoid VI’s uncertainty underestimation. But these are hard to use in
    practice: existing techniques can only use Gaussian approximating
    distributions, and require existing models to be changed radically, thus are of
    limited use for practitioners. We propose a re-parametrisation of the
    alpha-divergence objectives, deriving a simple inference technique which,
    together with dropout, can be easily implemented with existing models by simply
    changing the loss of the model. We demonstrate improved uncertainty estimates
    and accuracy compared to VI in dropout networks. We study our model’s epistemic
    uncertainty far away from the data using adversarial images, showing that these
    can be distinguished from non-adversarial images by examining our model’s
    uncertainty.
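
    Our reading of the re-parametrised objective, sketched below under the
    assumption that one can collect K stochastic dropout forward passes per data
    point: the only change to an existing dropout model is the loss, which
    aggregates the per-pass log-likelihoods through an alpha-weighted
    log-mean-exp (constants omitted; shapes and data are illustrative).

        import numpy as np

        def alpha_dropout_loss(log_probs, alpha=0.5):
            """Hedged sketch of the re-parametrised alpha-divergence objective.

            log_probs: array (K, N) with log p(y_n | x_n, w_hat_k) for K stochastic
            dropout passes over N points (assumed given). The loss is
                -(1/alpha) * sum_n log mean_k exp(alpha * log_probs[k, n]),
            which recovers the usual dropout-VI objective as alpha -> 0.
            """
            per_point = np.log(np.mean(np.exp(alpha * log_probs), axis=0))
            return -per_point.sum() / alpha

        # Toy usage: random per-pass log-likelihoods for 3 passes over 5 points.
        rng = np.random.default_rng(0)
        print(alpha_dropout_loss(np.log(rng.uniform(0.1, 0.9, size=(3, 5)))))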

    Deep Bayesian Active Learning with Image Data

    Yarin Gal, Riashat Islam, Zoubin Ghahramani
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Even though active learning forms an important pillar of machine learning,
    deep learning tools are not prevalent within it. Deep learning poses several
    difficulties when used in an active learning setting. First, active learning
    (AL) methods generally rely on being able to learn and update models from small
    amounts of data. Recent advances in deep learning, on the other hand, are
    notorious for their dependence on large amounts of data. Second, many AL
    acquisition functions rely on model uncertainty, yet deep learning methods
    rarely represent such model uncertainty. In this paper we combine recent
    advances in Bayesian deep learning into the active learning framework in a
    practical way. We develop an active learning framework for high dimensional
    data, a task which has been extremely challenging so far, with very sparse
    existing literature. Taking advantage of specialised models such as Bayesian
    convolutional neural networks, we demonstrate our active learning techniques
    with image data, obtaining a significant improvement on existing active
    learning approaches. We demonstrate this on the MNIST dataset, as well as on
    skin cancer diagnosis from lesion images (ISIC2016 task).
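
    One common acquisition function in this setting is BALD, the mutual
    information between predictions and model parameters, which is easy to
    estimate from MC dropout samples. A hedged numpy sketch follows (the shapes
    and data are illustrative, and the paper evaluates several acquisition
    functions, not only this one):

        import numpy as np

        def bald_scores(probs):
            """BALD-style acquisition score from MC dropout samples.

            probs: array (K, N, C) of class probabilities from K stochastic
            dropout passes over N unlabelled points (assumed available).
            Score = H[mean_k p] - mean_k H[p]; high scores are queried first.
            """
            eps = 1e-12
            mean_p = probs.mean(axis=0)                                 # (N, C)
            entropy_of_mean = -(mean_p * np.log(mean_p + eps)).sum(-1)
            mean_entropy = -(probs * np.log(probs + eps)).sum(-1).mean(0)
            return entropy_of_mean - mean_entropy                       # (N,)

        # Pick the points with the largest scores for labelling.
        rng = np.random.default_rng(0)
        p = rng.dirichlet(np.ones(10), size=(20, 100))                  # (K, N, C)
        print(np.argsort(bald_scores(p))[::-1][:5])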

    Model-Based Policy Search for Automatic Tuning of Multivariate PID Controllers

    Andreas Doerr, Duy Nguyen-Tuong, Alonso Marco, Stefan Schaal, Sebastian Trimpe
    Comments: Accepted final version to appear in 2017 IEEE International Conference on Robotics and Automation (ICRA)
    Subjects: Learning (cs.LG); Robotics (cs.RO); Systems and Control (cs.SY); Machine Learning (stat.ML)

    PID control architectures are widely used in industrial applications. Despite
    their low number of open parameters, tuning multiple, coupled PID controllers
    can become tedious in practice. In this paper, we extend PILCO, a model-based
    policy search framework, to automatically tune multivariate PID controllers
    purely based on data observed on an otherwise unknown system. The system’s
    state is extended appropriately to frame the PID policy as a static state
    feedback policy. This renders PID tuning possible as the solution of a finite
    horizon optimal control problem without further a priori knowledge. The
    framework is applied to the task of balancing an inverted pendulum on a seven
    degree-of-freedom robotic arm, thereby demonstrating its capabilities of fast
    and data-efficient policy learning, even on complex real world problems.
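
    The key reformulation is that a PID controller is a static linear policy
    u = theta . z on the extended state z = (error, its integral, its
    derivative), which is what makes it amenable to policy search. A minimal
    sketch with an illustrative first-order plant and hand-picked gains (both
    our own assumptions, not the paper's setup):

        import numpy as np

        theta = np.array([2.0, 0.5, 0.1])    # (Kp, Ki, Kd); policy search tunes these
        dt, x, integ, prev_e = 0.01, 0.0, 0.0, 0.0
        setpoint = 1.0

        for _ in range(1000):
            e = setpoint - x
            integ += e * dt
            z = np.array([e, integ, (e - prev_e) / dt])   # extended state
            u = theta @ z                                 # static state feedback = PID
            x += dt * (-x + u)                            # toy first-order plant
            prev_e = e

        print(round(x, 3))                                # settles near the setpoint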

    Inference in Sparse Graphs with Pairwise Measurements and Side Information

    Dylan J. Foster, Daniel Reichman, Karthik Sridharan
    Comments: 33 pages
    Subjects: Learning (cs.LG)

    We consider the statistical problem of recovering a hidden “ground truth”
    binary labeling for the vertices of a graph G up to low Hamming error from
    noisy edge and vertex measurements. We present new algorithms and a sharp
    finite-sample analysis for this problem on trees and sparse graphs with poor
    expansion properties such as hypergrids and ring lattices. Our method
    generalizes and improves over Globerson et al. (2015), who introduced the
    problem for two-dimensional grid lattices.

    For trees we provide a simple, efficient algorithm that infers the ground
    truth with optimal Hamming error and implies recovery results for all connected
    graphs. Here, the presence of side information is critical to obtain a
    non-trivial recovery rate. We then show how to adapt this algorithm to tree
    decompositions of edge-subgraphs of certain graph families such as lattices,
    resulting in optimal recovery error rates that can be obtained in a highly
    efficient manner.

    The thrust of our analysis is to 1) use the tree decomposition along with
    edge measurements to produce a small class of viable vertex labelings and 2)
    apply fast rate results from statistical learning theory to show that we can
    infer the ground truth from this class using vertex measurements. We show the
    power of our method by providing several examples including hypergrids, ring
    lattices, and the Newman-Watts model for small world graphs. For
    two-dimensional grids, our results improve over Globerson et al. (2015) by
    obtaining optimal recovery in the constant-height regime.

    Robust Adversarial Reinforcement Learning

    Lerrel Pinto, James Davidson, Rahul Sukthankar, Abhinav Gupta
    Comments: 10 pages
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Robotics (cs.RO)

    Deep neural networks coupled with fast simulation and improved computation
    have led to recent successes in the field of reinforcement learning (RL).
    However, most current RL-based approaches fail to generalize since: (a) the gap
    between simulation and real world is so large that policy-learning approaches
    fail to transfer; (b) even if policy learning is done in real world, the data
    scarcity leads to failed generalization from training to test scenarios (e.g.,
    due to different friction or object masses). Inspired from H-infinity control
    methods, we note that both modeling errors and differences in training and test
    scenarios can be viewed as extra forces/disturbances in the system. This paper
    proposes the idea of robust adversarial reinforcement learning (RARL), where we
    train an agent to operate in the presence of a destabilizing adversary that
    applies disturbance forces to the system. The jointly trained adversary is
    reinforced — that is, it learns an optimal destabilization policy. We
    formulate the policy learning as a zero-sum, minimax objective function.
    Extensive experiments in multiple environments (InvertedPendulum, HalfCheetah,
    Swimmer, Hopper and Walker2d) conclusively demonstrate that our method (a)
    improves training stability; (b) is robust to differences in training/test
    conditions; and (c) outperforms the baseline even in the absence of the
    adversary.

    Structural Data Recognition with Graph Model Boosting

    Tomo Miyazaki, Shinichiro Omachi
    Comments: 8 pages
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    This paper presents a novel method for structural data recognition using a
    large number of graph models. In general, prevalent methods for structural data
    recognition have two shortcomings: 1) Only a single model is used to capture
    structural variation. 2) Naive recognition methods are used, such as the
    nearest neighbor method. In this paper, we propose strengthening the
    recognition performance of these models as well as their ability to capture
    structural variation. The proposed method constructs a large number of graph
    models and trains decision trees using the models. This paper makes two main
    contributions. The first is a novel graph model that can quickly perform
    calculations, which allows us to construct several models in a feasible amount
    of time. The second contribution is a novel approach to structural data
    recognition: graph model boosting. Comprehensive structural variations can be
    captured with a large number of graph models constructed in a boosting
    framework, and a sophisticated classifier can be formed by aggregating the
    decision trees. Consequently, we can carry out structural data recognition with
    powerful recognition capability in the face of comprehensive structural
    variation. The experiments show that the proposed method achieves impressive
    results and outperforms existing methods on datasets from the IAM graph
    database repository.

    Towards Generalization and Simplicity in Continuous Control

    Aravind Rajeswaran, Kendall Lowrey, Emanuel Todorov, Sham Kakade
    Comments: Project page: this https URL
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Systems and Control (cs.SY)

    This work shows that policies with simple linear and RBF parameterizations
    can be trained to solve a variety of continuous control tasks, including the
    OpenAI gym benchmarks. The performance of these trained policies is
    competitive with state-of-the-art results obtained with more elaborate
    parameterizations such as fully connected neural networks. Furthermore,
    existing training and testing scenarios are shown to be very limited and prone
    to over-fitting, thus giving rise to only trajectory-centric policies. Training
    with a diverse initial state distribution is shown to produce more global
    policies with better generalization. This allows for interactive control
    scenarios where the system recovers from large on-line perturbations, as shown
    in the supplementary video.
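
    A policy of the kind studied here is only a few lines: nonlinear random
    features of the state followed by a linear map to actions. The sketch below
    uses a random-Fourier-style feature construction and illustrative
    dimensions; the exact feature design in the paper may differ.

        import numpy as np

        rng = np.random.default_rng(0)
        state_dim, n_feat, act_dim = 17, 250, 6

        P = rng.standard_normal((n_feat, state_dim))      # random projections
        phase = rng.uniform(0, 2 * np.pi, n_feat)
        W = 0.01 * rng.standard_normal((act_dim, n_feat)) # the learnable parameters

        def policy(s):
            feats = np.sin(P @ s / np.sqrt(state_dim) + phase)  # RBF-like features
            return W @ feats                                    # linear action map

        print(policy(rng.standard_normal(state_dim)).shape)     # (6,)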

    Online Learning Without Prior Information

    Ashok Cutkosky, Kwabena Boahen
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    The vast majority of optimization and online learning algorithms today
    require some prior information about the data (often in the form of bounds on
    gradients or on the optimal parameter value). When this information is not
    available, these algorithms require laborious manual tuning of various
    hyperparameters, motivating the search for algorithms that can adapt to the
    data with no prior information. We describe a frontier of new lower bounds on
    the performance of such algorithms, reflecting a tradeoff between a term that
    depends on the optimal parameter value and a term that depends on the
    gradients’ rate of growth. Further, we construct a family of algorithms whose
    performance matches any desired point on this frontier, which no previous
    algorithm reaches.

    Horde of Bandits using Gaussian Markov Random Fields

    Sharan Vaswani, Mark Schmidt, Laks V.S. Lakshmanan
    Subjects: Learning (cs.LG)

    The gang of bandits (GOB) model \cite{cesa2013gang} is a recent contextual
    bandits framework that shares information between a set of bandit problems,
    related by a known (possibly noisy) graph. This model is useful in problems
    like recommender systems where the large number of users makes it vital to
    transfer information between users. Despite its effectiveness, the existing GOB
    model can only be applied to small problems due to its quadratic
    time-dependence on the number of nodes. Existing solutions to combat the
    scalability issue require an often-unrealistic clustering assumption. By
    exploiting a connection to Gaussian Markov random fields (GMRFs), we show that
    the GOB model can be made to scale to much larger graphs without additional
    assumptions. In addition, we propose a Thompson sampling algorithm which uses
    the recent GMRF sampling-by-perturbation technique, allowing it to scale to
    even larger problems (leading to a “horde” of bandits). We give regret bounds
    and experimental results for GOB with Thompson sampling and epoch-greedy
    algorithms, indicating that these methods are as good as or significantly
    better than ignoring the graph or adopting a clustering-based approach.
    Finally, when an existing graph is not available, we propose a heuristic for
    learning it on the fly and show promising results.

    Online Convex Optimization with Unconstrained Domains and Losses

    Ashok Cutkosky, Kwabena Boahen
    Journal-ref: Advances in Neural Information Processing Systems 29 (2016)
    748-756
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We propose an online convex optimization algorithm (RescaledExp) that
    achieves optimal regret in the unconstrained setting without prior knowledge of
    any bounds on the loss functions. We prove a lower bound showing an exponential
    separation between the regret of existing algorithms that require a known bound
    on the loss functions and any algorithm that does not require such knowledge.
    RescaledExp matches this lower bound asymptotically in the number of
    iterations. RescaledExp is naturally hyperparameter-free and we demonstrate
    empirically that it matches prior optimization algorithms that require
    hyperparameter optimization.

    Bootstrapped Graph Diffusions: Exposing the Power of Nonlinearity

    Eliav Buchnik, Edith Cohen
    Comments: 11 pages, 5 figures, 6 tables
    Subjects: Learning (cs.LG)

    Graph-based semi-supervised learning (SSL) algorithms predict labels for all
    nodes based on provided labels of a small set of seed nodes. Classic methods
    capture the graph structure through some underlying diffusion process that
    propagates through the graph edges. Spectral diffusion, which includes
    personalized page rank and label propagation, propagates through random walks.
    Social diffusion propagates through shortest paths. A common ground to these
    diffusions is their \emph{linearity}, which does not distinguish between the
    contributions of a few “strong” relations and many “weak” relations.

    Recently, non-linear methods such as node embeddings and graph convolutional
    networks (GCN) demonstrated a large gain in quality for SSL tasks. These
    methods introduce multiple components and greatly vary on how the graph
    structure, seed label information, and other features are used.

    We aim here to study the contribution of non-linearity, as an isolated
    ingredient, to the performance gain. To do so, we place classic linear graph
    diffusions in a self-training framework. Surprisingly, we observe that SSL
    using the resulting \emph{bootstrapped diffusions} not only significantly
    improves over the respective non-bootstrapped baselines but also outperforms
    state-of-the-art non-linear SSL methods. Moreover, since the self-training
    wrapper retains the scalability of the base method, we obtain both higher
    quality and better scalability.
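
    A minimal sketch of the bootstrapped-diffusion idea: run a plain linear
    label-propagation diffusion, promote the most confident predictions to
    pseudo-seeds, and repeat. The damping constant, the number of diffusion
    steps, and the promotion fraction below are illustrative choices, not the
    paper's.

        import numpy as np

        def bootstrapped_diffusion(A, seeds, n_classes, rounds=5, top_frac=0.1):
            """A: (n, n) symmetric adjacency matrix; seeds: dict node -> label."""
            n = A.shape[0]
            deg = np.maximum(A.sum(1), 1e-12)
            P = A / deg[:, None]                  # row-stochastic transition matrix
            labels = dict(seeds)
            for _ in range(rounds):
                Y = np.zeros((n, n_classes))
                for v, c in labels.items():
                    Y[v, c] = 1.0
                F = Y.copy()
                for _ in range(20):               # truncated linear diffusion
                    F = 0.85 * (P @ F) + 0.15 * Y
                conf = F.max(1)
                order = np.argsort(conf)[::-1]
                for v in order[: int(top_frac * n)]:
                    labels.setdefault(v, int(F[v].argmax()))   # bootstrap step
            return F.argmax(1)

        A = np.array([[0, 1, 1, 0], [1, 0, 0, 0],
                      [1, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
        print(bootstrapped_diffusion(A, {0: 0, 3: 1}, n_classes=2))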

    Customer Life Time Value Prediction Using Embeddings

    Benjamin Paul Chamberlain, Angelo Cardoso, C.H. Bryan Liu, Roberto Pagliari, Marc Peter Deisenroth
    Comments: 9 pages, 10 figures
    Subjects: Learning (cs.LG); Computers and Society (cs.CY); Information Retrieval (cs.IR); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    We describe the Customer Life Time Value (CLTV) prediction system deployed at
    ASOS.com, a global online fashion retailer. CLTV prediction is an important
    problem in e-commerce where an accurate estimate of future value allows
    retailers to effectively allocate marketing spend, identify and nurture high
    value customers and mitigate exposure to losses. The system at ASOS provides
    daily estimates of the future value of every customer and is one of the
    cornerstones of the personalised shopping experience. The state of the art in
    this domain uses large numbers of handcrafted features and ensemble regressors
    to forecast value, predict churn and evaluate customer loyalty. We describe our
    system, which adopts this approach, and our ongoing efforts to further improve
    it. Recently, domains including language, vision and speech have shown dramatic
    advances by replacing handcrafted features with features that are learned
    automatically from data. We show that learning feature representations is a
    promising extension to the state of the art in CLTV modelling. We propose a
    novel way to generate embeddings of customers, which addresses the issue of the
    ever-changing product catalogue, and we obtain a significant improvement over
    an exhaustive set of handcrafted features.

    Data Noising as Smoothing in Neural Network Language Models

    Ziang Xie, Sida I. Wang, Jiwei Li, Daniel Lévy, Aiming Nie, Dan Jurafsky, Andrew Y. Ng
    Comments: ICLR 2017
    Subjects: Learning (cs.LG); Computation and Language (cs.CL)

    Data noising is an effective technique for regularizing neural network
    models. While noising is widely adopted in application domains such as vision
    and speech, commonly used noising primitives have not been developed for
    discrete sequence-level settings such as language modeling. In this paper, we
    derive a connection between input noising in neural network language models and
    smoothing in (n)-gram models. Using this connection, we draw upon ideas from
    smoothing to develop effective noising schemes. We demonstrate performance
    gains when applying the proposed schemes to language modeling and machine
    translation. Finally, we provide empirical analysis validating the relationship
    between noising and smoothing.
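
    One simple noising primitive in this spirit is unigram replacement: with
    some probability, swap an input token for a draw from the unigram
    distribution. The sketch below fixes that probability to a constant gamma;
    the paper's schemes instead modulate it with smoothing statistics (e.g. the
    number of distinct continuations of a context).

        import random
        from collections import Counter

        def noise_sequence(tokens, unigram_counts, gamma=0.2, rng=random.Random(0)):
            """Replace each token by a unigram draw with probability gamma."""
            vocab, weights = zip(*unigram_counts.items())
            noised = []
            for t in tokens:
                if rng.random() < gamma:
                    noised.append(rng.choices(vocab, weights=weights)[0])
                else:
                    noised.append(t)
            return noised

        corpus = "the cat sat on the mat the dog sat".split()
        print(noise_sequence("the cat sat".split(), Counter(corpus)))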

    Regularising Non-linear Models Using Feature Side-information

    Amina Mollaysa, Pablo Strasser, Alexandros Kalousis
    Comments: 11 page with appendix
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Very often features come with their own vectorial descriptions which provide
    detailed information about their properties. We refer to these vectorial
    descriptions as feature side-information. In the standard learning scenario,
    input is represented as a vector of features and the feature side-information
    is most often ignored or used only for feature selection prior to model
    fitting. We believe that feature side-information which carries information
    about features intrinsic property will help improve model prediction if used in
    a proper way during learning process. In this paper, we propose a framework
    that allows for the incorporation of the feature side-information during the
    learning of very general model families to improve the prediction performance.
    We control the structures of the learned models so that they reflect features
    similarities as these are defined on the basis of the side-information. We
    perform experiments on a number of benchmark datasets which show significant
    predictive performance gains, over a number of baselines, as a result of the
    exploitation of the side-information.

    Unsupervised Ensemble Regression

    Omer Dror, Boaz Nadler, Erhan Bilal, Yuval Kluger
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Consider a regression problem where there is no labeled data and the only
    observations are the predictions (f_i(x_j)) of (m) experts (f_{i}) over many
    samples (x_j). With no knowledge on the accuracy of the experts, is it still
    possible to accurately estimate the unknown responses (y_{j})? Can one still
    detect the least or most accurate experts? In this work we propose a framework
    to study these questions, based on the assumption that the (m) experts have
    uncorrelated deviations from the optimal predictor. Assuming the first two
    moments of the response are known, we develop methods to detect the best and
    worst regressors, and derive U-PCR, a novel principal components approach for
    unsupervised ensemble regression. We provide theoretical support for U-PCR and
    illustrate its improved accuracy over the ensemble mean and median on a variety
    of regression problems.
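
    Under the simplest reading of the model, the uncorrelated-deviations
    assumption already yields a moment-based estimate worth sketching: if
    f_i = y + e_i with mutually uncorrelated e_i, then Cov(f_i, y) = Var(y), so
    each expert's MSE can be estimated from the known first two moments of y
    alone as (mean_i - mu_y)^2 + Var(f_i) - var_y. U-PCR itself goes further
    via principal components; the sketch below covers only this simpler
    best/worst-expert detection step.

        import numpy as np

        def rank_experts(F, mu_y, var_y):
            """F: (m, n) matrix of expert predictions f_i(x_j).

            Estimated MSE_i = bias_i^2 + Var(f_i) - var_y, using only the known
            first two moments of the response. Returns experts, best first.
            """
            bias2 = (F.mean(axis=1) - mu_y) ** 2
            mse = bias2 + F.var(axis=1) - var_y
            return np.argsort(mse)

        rng = np.random.default_rng(0)
        y = rng.standard_normal(500)
        F = y + np.array([0.1, 0.5, 2.0])[:, None] * rng.standard_normal((3, 500))
        print(rank_experts(F, mu_y=0.0, var_y=1.0))   # expect [0 1 2]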

    Learning a Unified Control Policy for Safe Falling

    Visak CV Kumar, Sehoon Ha, C Karen Liu
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Being able to fall safely is a necessary motor skill for humanoids performing
    highly dynamic tasks, such as running and jumping. We propose a new method to
    learn a policy that minimizes the maximal impulse during the fall. The
    optimization solves for both a discrete contact planning problem and a
    continuous optimal control problem. Once trained, the policy can compute the
    optimal next contacting body part (e.g. left foot, right foot, or hands),
    contact location and timing, and the required joint actuation. We represent the
    policy as a mixture of actor-critic neural networks, which consists of n
    control policies and the corresponding value functions. Each actor-critic pair
    is associated with one of the n possible contacting body parts. During
    execution,
    the policy corresponding to the highest value function will be executed while
    the associated body part will be the next contact with the ground. With this
    mixture of actor-critic architecture, the discrete contact sequence planning is
    solved through the selection of the best critics while the continuous control
    problem is solved by the optimization of actors. We show that our policy can
    achieve comparable, sometimes even higher, rewards than a recursive search of
    the action space using dynamic programming, while enjoying a 50 to 400 times
    speed gain during online execution.

    Memory Enriched Big Bang Big Crunch Optimization Algorithm for Data Clustering

    Kayvan Bijari, Hadi Zare, Hadi Veisi, Hossein Bobarshad
    Comments: 17 pages, 3 figures, 8 tables
    Journal-ref: Neural Comput & Applic (2016)
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    Cluster analysis plays an important role in decision making process for many
    knowledge-based systems. There exist a wide variety of different approaches for
    clustering applications including the heuristic techniques, probabilistic
    models, and traditional hierarchical algorithms. In this paper, a novel
    heuristic approach based on big bang-big crunch algorithm is proposed for
    clustering problems. The proposed method not only takes advantage of its
    heuristic nature to alleviate the shortcomings of typical clustering
    algorithms such as k-means, but it also benefits from a memory-based scheme,
    unlike similar heuristic techniques. Furthermore, the performance of the
    proposed algorithm is investigated on several benchmark test functions as
    well as on well-known datasets. The experimental results show the significant
    superiority of the proposed method over similar algorithms.
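
    For intuition, a minimal big bang-big crunch loop for cluster centres looks
    roughly as follows; the memory-based scheme that distinguishes the proposed
    method is omitted, and the population size, shrinking schedule, and fitness
    function are illustrative choices.

        import numpy as np

        def bbbc_clustering(X, k, pop=50, iters=100, seed=0):
            """Big bang: scatter candidate centre sets around the current best
            with a shrinking radius; big crunch: contract to the fitness-weighted
            centre of mass, with fitness = 1 / k-means distortion."""
            rng = np.random.default_rng(seed)
            lo, hi = X.min(0), X.max(0)
            centre = rng.uniform(lo, hi, size=(k, X.shape[1]))
            def distortion(C):
                d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
                return d.min(1).sum()
            for t in range(1, iters + 1):
                radius = (hi - lo) / t                       # shrinking big bang
                cands = centre + radius * rng.standard_normal((pop, k, X.shape[1]))
                fit = np.array([1.0 / (1e-12 + distortion(C)) for C in cands])
                w = fit / fit.sum()
                centre = np.tensordot(w, cands, axes=1)      # big crunch
            return centre

        X = np.vstack([np.random.default_rng(1).normal(m, 0.2, size=(50, 2))
                       for m in (0.0, 3.0)])
        print(bbbc_clustering(X, k=2).round(2))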

    Discriminative models for multi-instance problems with tree-structure

    Tomas Pevny, Petr Somol
    Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)

    Modeling network traffic is gaining importance as a means to counter modern
    threats of ever-increasing sophistication. It is, though, surprisingly
    difficult and costly to construct reliable classifiers on top of telemetry
    data, due to the variety and complexity of signals that no human can manage
    to interpret in full. Obtaining training data with a sufficiently large and
    variable body of labels can thus be seen as a prohibitive problem. The goal
    of this work is to detect infected computers by observing their HTTP(S)
    traffic collected from network sensors, which are typically proxy servers or
    network firewalls, while relying on only minimal human input in the model
    training phase. We propose a discriminative model that makes decisions based
    on all of a computer’s traffic observed during a predefined time window (5
    minutes in our case). The model is trained on traffic samples collected over
    equally sized time windows for a large number of computers, where the only
    labels needed are human verdicts about the computer as a whole (presumed
    infected vs. presumed clean). As part of training, the model itself
    recognizes discriminative patterns in traffic targeted at individual servers
    and constructs the final high-level classifier on top of them. We show that
    the classifier performs with very high precision, while the learned traffic
    patterns can be interpreted as Indicators of Compromise. We implement the
    discriminative model as a neural network with a special structure reflecting
    two stacked multi-instance problems. The main advantages of the proposed
    configuration include not only improved accuracy and the ability to learn
    from gross labels, but also the automatic learning of server types (together
    with their detectors) that are typically visited by infected computers.

    Pretata: predicting TATA binding proteins with novel features and dimensionality reduction strategy

    Quan Zou, Shixiang Wan, Ying Ju, Jijun Tang, Xiangxiang Zeng
    Subjects: Quantitative Methods (q-bio.QM); Learning (cs.LG); Biomolecules (q-bio.BM)

    Background: It is essential to discover protein function from novel primary
    sequences. Wet lab experimental procedures are not only time-consuming, but
    also costly, so predicting protein structure and function reliably based only
    on the amino acid sequence has significant value. TATA-binding protein (TBP)
    is a kind of DNA-binding protein, which plays a key role in transcription
    regulation. Our study proposes an automatic approach for identifying
    TATA-binding proteins efficiently, accurately, and conveniently. This method
    can guide the identification of special proteins with computational
    intelligence strategies. Results: Firstly, we propose novel fingerprint
    features for TBP based on pseudo amino acid composition, physicochemical
    properties, and secondary structure. Secondly, hierarchical feature
    dimensionality reduction strategies are employed to further improve the
    performance. Currently, Pretata achieves 92.92% TATA-binding protein
    prediction accuracy, which is better than all other existing methods.
    Conclusions: The experiments demonstrate that our method can greatly improve
    prediction accuracy and speed, thus making large-scale NGS data prediction
    practical. A web server has been developed to assist other researchers, which
    can be accessed at this http URL

    An Integrated and Scalable Platform for Proactive Event-Driven Traffic Management

    Alain Kibangou, Alexander Artikis, Evangelos Michelioudakis, Georgios Paliouras, Marius Schmitt, John Lygeros, Chris Baber, Natan Morar, Fabiana Fournier, Inna Skarbovsky
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)

    Traffic on freeways can be managed by means of ramp meters from Road Traffic
    Control rooms. Human operators cannot efficiently manage a network of ramp
    meters. To support them, we present an intelligent platform for traffic
    management which includes a new ramp metering coordination scheme in the
    decision making module, an efficient dashboard for interacting with human
    operators, machine learning tools for learning event definitions and Complex
    Event Processing tools able to deal with uncertainties inherent to the traffic
    use case. Unlike the usual approach, the devised event-driven platform is able
    to predict congestion up to 4 minutes before it actually happens. Proactive
    decision making can then be established, leading to a significant improvement
    in traffic conditions.

    Byzantine-Tolerant Machine Learning

    Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui, Julien Stainer
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)

    The growth of data, the need for scalability and the complexity of models
    used in modern machine learning call for distributed implementations. Yet, as
    of today, distributed machine learning frameworks have largely ignored the
    possibility of arbitrary (i.e., Byzantine) failures. In this paper, we study
    the robustness to Byzantine failures at the fundamental level of stochastic
    gradient descent (SGD), the heart of most machine learning algorithms. Assuming
    a set of (n) workers, up to (f) of them being Byzantine, we ask how robust
    SGD can be, without limiting the dimension nor the size of the parameter space.

    We first show that no gradient descent update rule based on a linear
    combination of the vectors proposed by the workers (i.e., current approaches)
    tolerates a single Byzantine failure. We then formulate a resilience property
    of the update rule capturing the basic requirements to guarantee convergence
    despite (f) Byzantine workers. We finally propose Krum, an update rule that
    satisfies the aforementioned resilience property. For a (d)-dimensional
    learning problem, the time complexity of Krum is (O(n^2 \cdot (d + \log n))).

    Guaranteed Tensor PCA with Optimality in Statistics and Computation

    Anru Zhang, Dong Xia
    Subjects: Statistics Theory (math.ST); Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

    Tensors, or high-order arrays, have attracted much attention in recent research. In
    this paper, we propose a general framework for tensor principal component
    analysis (tensor PCA), which focuses on the methodology and theory for
    extracting the hidden low-rank structure from the high-dimensional tensor data.
    A unified solution is provided for tensor PCA with considerations in both
    statistical limits and computational costs. The problem exhibits three
    different phases according to the signal-to-noise ratio (SNR). In particular,
    with strong SNR, we propose a fast spectral power iteration method that
    achieves the minimax optimal rate of convergence in estimation; with weak SNR,
    the information-theoretic lower bound shows that it is impossible to have
    consistent estimation in general; with moderate SNR, we show that non-convex
    maximum likelihood estimation provides an optimal solution, but with NP-hard
    computational cost; moreover, under the hardness hypothesis of hypergraphic
    planted clique detection, there are no polynomial-time algorithms performing
    consistently in general. Simulation studies show that the proposed spectral
    power iteration method has good performance under a variety of settings.
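
    In the strong-SNR phase, spectral power iteration amounts to alternating
    rank-one contractions of the tensor. A hedged numpy sketch for an order-3,
    rank-one spike follows; the initialisation and iteration counts are
    illustrative, and the paper's framework is more general than this toy.

        import numpy as np

        def tensor_power_iteration(T, iters=50, seed=0):
            """Recover (lambda, u, v, w) from T ~ lambda * u x v x w + noise."""
            rng = np.random.default_rng(seed)
            u = rng.standard_normal(T.shape[0]); u /= np.linalg.norm(u)
            v = rng.standard_normal(T.shape[1]); v /= np.linalg.norm(v)
            w = rng.standard_normal(T.shape[2]); w /= np.linalg.norm(w)
            for _ in range(iters):
                u = np.einsum('ijk,j,k->i', T, v, w); u /= np.linalg.norm(u)
                v = np.einsum('ijk,i,k->j', T, u, w); v /= np.linalg.norm(v)
                w = np.einsum('ijk,i,j->k', T, u, v); w /= np.linalg.norm(w)
            lam = np.einsum('ijk,i,j,k->', T, u, v, w)
            return lam, u, v, w

        rng = np.random.default_rng(1)
        u0, v0, w0 = (x / np.linalg.norm(x) for x in rng.standard_normal((3, 30)))
        T = (10.0 * np.einsum('i,j,k->ijk', u0, v0, w0)
             + 0.1 * rng.standard_normal((30, 30, 30)))
        lam, u, v, w = tensor_power_iteration(T)
        print(round(lam, 2), round(abs(u @ u0), 3))   # spike strength, alignment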

    Scalable Greedy Feature Selection via Weak Submodularity

    Rajiv Khanna, Ethan Elenberg, Alexandros G. Dimakis, Sahand Negahban, Joydeep Ghosh
    Comments: To appear in AISTATS 2017
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    Greedy algorithms are widely used for problems in machine learning such as
    feature selection and set function optimization. Unfortunately, for large
    datasets, the running time of even greedy algorithms can be quite high. This is
    because for each greedy step we need to refit a model or calculate a function
    using the previously selected choices and the new candidate.

    Two algorithms that are faster approximations to the greedy forward selection
    were introduced recently ([Mirzasoleiman et al. 2013, 2015]). They achieve
    better performance by exploiting distributed computation and stochastic
    evaluation respectively. Both algorithms have provable performance guarantees
    for submodular functions.

    In this paper we show that, contrary to previously held opinion,
    submodularity is not required to obtain approximation guarantees for these two
    algorithms. Specifically, we show that a generalized concept of weak
    submodularity suffices to give multiplicative approximation guarantees. Our
    result extends the applicability of these algorithms to a larger class of
    functions. Furthermore, we show that a bounded submodularity ratio can be used
    to provide data dependent bounds that can sometimes be tighter also for
    submodular functions. We empirically validate our work by showing superior
    performance of fast greedy approximations versus several established baselines
    on artificial and real datasets.
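
    For concreteness, the greedy forward selection under analysis looks like the
    sketch below, with the R^2 of least squares as the weakly submodular set
    function (a standard example; the distributed and stochastic accelerations
    of [Mirzasoleiman et al. 2013, 2015] restrict the candidate scan rather than
    changing this loop):

        import numpy as np

        def greedy_feature_selection(X, y, k):
            """Greedily add the feature that most increases the R^2 of a least
            squares fit, a weakly submodular objective."""
            selected, remaining = [], list(range(X.shape[1]))
            def r2(S):
                if not S:
                    return 0.0
                beta, *_ = np.linalg.lstsq(X[:, S], y, rcond=None)
                resid = y - X[:, S] @ beta
                return 1.0 - resid @ resid / (y @ y)
            for _ in range(k):
                best = max(remaining, key=lambda j: r2(selected + [j]))
                selected.append(best)
                remaining.remove(best)
            return selected

        rng = np.random.default_rng(0)
        X = rng.standard_normal((200, 30))
        y = X[:, 3] - 2.0 * X[:, 17] + 0.1 * rng.standard_normal(200)
        print(greedy_feature_selection(X, y, k=2))    # expect features 3 and 17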

    On Approximation Guarantees for Greedy Low Rank Optimization

    Rajiv Khanna, Ethan Elenberg, Alexandros G. Dimakis, Sahand Negahban
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    We provide new approximation guarantees for greedy low rank matrix estimation
    under standard assumptions of restricted strong convexity and smoothness. Our
    novel analysis also uncovers previously unknown connections between the low
    rank estimation and combinatorial optimization, so much so that our bounds are
    reminiscent of corresponding approximation bounds in submodular maximization.
    Additionally, we also provide statistical recovery guarantees. Finally, we
    present empirical comparison of greedy estimation with established baselines on
    two important real-world problems.

    Leveraging Sparsity for Efficient Submodular Data Summarization

    Erik M. Lindgren, Shanshan Wu, Alexandros G. Dimakis
    Comments: In NIPS 2016
    Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG)

    The facility location problem is widely used for summarizing large datasets
    and has additional applications in sensor placement, image retrieval, and
    clustering. One difficulty of this problem is that submodular optimization
    algorithms require the calculation of pairwise benefits for all items in the
    dataset. This is infeasible for large problems, so recent work proposed to only
    calculate nearest neighbor benefits. One limitation is that several strong
    assumptions were invoked to obtain provable approximation guarantees. In this
    paper we establish that these extra assumptions are not necessary—solving the
    sparsified problem will be almost optimal under the standard assumptions of the
    problem. We then analyze a different method of sparsification that is a better
    model for methods such as Locality Sensitive Hashing to accelerate the nearest
    neighbor computations and extend the use of the problem to a broader family of
    similarities. We validate our approach by demonstrating that it rapidly
    generates interpretable summaries.

    Exact MAP Inference by Avoiding Fractional Vertices

    Erik M. Lindgren, Alexandros G. Dimakis, Adam Klivans
    Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG)

    Given a graphical model, one essential problem is MAP inference, that is,
    finding the most likely configuration of states according to the model.
    Although this problem is NP-hard, large instances can be solved in practice. A
    major open question is to explain why this is true. We give a natural condition
    under which we can provably perform MAP inference in polynomial time. We
    require that the number of fractional vertices in the LP relaxation whose
    value exceeds the optimal solution is bounded by a polynomial in the problem
    size. This
    resolves an open question by Dimakis, Gohari, and Wainwright. In contrast, for
    general LP relaxations of integer programs, known techniques can only handle a
    constant number of fractional vertices whose value exceeds the optimal
    solution. We experimentally verify this condition and demonstrate how efficient
    various integer programming methods are at removing fractional solutions.

    Sparse Quadratic Logistic Regression in Sub-quadratic Time

    Karthikeyan Shanmugam, Murat Kocaoglu, Alexandros G. Dimakis, Sujay Sanghavi
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    We consider support recovery in the quadratic logistic regression setting,
    where the target depends on both (p) linear terms (x_i) and up to (p^2)
    quadratic terms (x_i x_j). Quadratic terms enable prediction/modeling of
    higher-order
    effects between features and the target, but when incorporated naively may
    involve solving a very large regression problem. We consider the sparse case,
    where at most (s) terms (linear or quadratic) are non-zero, and provide a new
    faster algorithm. It involves (a) identifying the weak support (i.e. all
    relevant variables) and (b) standard logistic regression optimization only on
    these chosen variables. The first step relies on a novel insight about
    correlation tests in the presence of non-linearity, and takes (O(pn)) time for
    (n) samples – giving potentially huge computational gains over the naive
    approach. Motivated by insights from the boolean case, we propose a non-linear
    correlation test for the non-binary finite-support case that involves hashing a
    variable and then correlating it with the output variable. We also provide
    experimental results to demonstrate the effectiveness of our methods.

    Streaming Weak Submodularity: Interpreting Neural Networks on the Fly

    Ethan R. Elenberg, Alexandros G. Dimakis, Moran Feldman, Amin Karbasi
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    In many machine learning applications, it is important to explain the
    predictions of a black-box classifier. For example, why does a deep neural
    network assign an image to a particular class? We cast interpretability of
    black-box classifiers as a combinatorial maximization problem and propose an
    efficient streaming algorithm to solve it subject to cardinality constraints.
    By extending ideas from Badanidiyuru et al. [2014], we provide a constant
    factor approximation guarantee for our algorithm in the case of random stream
    order and a weakly submodular objective function. This is the first such
    theoretical guarantee for this general class of functions, and we also show
    that no such algorithm exists for a worst-case stream order. Our algorithm
    obtains similar explanations of Inception V3 predictions 10 times faster than
    the state-of-the-art LIME framework of Ribeiro et al. [2016].
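
    The flavour of the streaming algorithm can be conveyed with a
    single-threshold sketch: keep an arriving element whenever its marginal gain
    clears a threshold, until k elements are kept. The actual algorithm
    maintains several thresholds in parallel and relies on the random stream
    order for its guarantee; the toy coverage-style objective below is our own.

        import numpy as np

        def stream_select(stream, f, k, threshold):
            """One-pass selection: keep elements whose marginal gain under the
            (weakly submodular) set function f clears the threshold."""
            S, fS = [], f([])
            for e in stream:
                if len(S) >= k:
                    break
                gain = f(S + [e]) - fS
                if gain >= threshold:
                    S.append(e)
                    fS += gain
            return S

        # Toy objective: coverage-like score, each element covers 5 items.
        rng = np.random.default_rng(0)
        universe = rng.integers(0, 50, size=(100, 5))
        f = lambda S: len(set(universe[S].ravel())) if S else 0
        print(stream_select(list(range(100)), f, k=10, threshold=3))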

    Don't Fear the Bit Flips: Optimized Coding Strategies for Binary Classification

    Frederic Sala, Shahroze Kabir, Guy Van den Broeck, Lara Dolecek
    Comments: 11 pages, 4 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    After being trained, classifiers must often operate on data that has been
    corrupted by noise. In this paper, we consider the impact of such noise on the
    features of binary classifiers. Inspired by tools for classifier robustness, we
    introduce the same classification probability (SCP) to measure the resulting
    distortion on the classifier outputs. We introduce a low-complexity estimate of
    the SCP based on quantization and polynomial multiplication. We also study
    channel coding techniques based on replication error-correcting codes. In
    contrast to the traditional channel coding approach, where error-correction is
    meant to preserve the data and is agnostic to the application, our schemes
    specifically aim to maximize the SCP (equivalently minimizing the distortion of
    the classifier output) for the same redundancy overhead.

    Online Learning of Optimal Bidding Strategy in Repeated Multi-Commodity Auctions

    Sevi Baltaoglu, Lang Tong, Qing Zhao
    Subjects: Computer Science and Game Theory (cs.GT); Learning (cs.LG)

    We study the online learning problem of a bidder who participates in repeated
    auctions. With the goal of maximizing his total T-period payoff, the bidder
    wants to determine the optimal allocation of his fixed budget among his bids
    for (K) different goods at each period. As a bidding strategy, we propose a
    polynomial time algorithm, referred to as dynamic programming on discrete set
    (DPDS), which is inspired by the dynamic programming approach to Knapsack
    problems. We show that DPDS achieves the regret order of (O(\sqrt{T\log T})).
    Also, by showing that the regret growth rate is lower bounded by
    (\Omega(\sqrt{T})) for any bidding strategy, we conclude that the DPDS
    algorithm is order optimal up to a (\sqrt{\log T}) term. We also evaluate the
    performance
    of DPDS empirically in the context of virtual bidding in wholesale electricity
    markets by using historical data from the New York energy market.
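
    The dynamic-programming core of DPDS is a knapsack-style allocation of a
    discretised budget across the K goods. A hedged sketch follows; the per-good
    payoff table value[k][b] is a stand-in, whereas in DPDS such estimates are
    learned from past auction outcomes.

        import numpy as np

        def allocate_budget(value, B):
            """Split an integer budget B among K goods, where value[k][b] is the
            estimated payoff of spending b on good k. Returns (bids, payoff)."""
            K = len(value)
            NEG = -1e18
            dp = np.full(B + 1, NEG); dp[0] = 0.0
            choice = np.zeros((K, B + 1), dtype=int)
            for k in range(K):
                new = np.full(B + 1, NEG)
                for spent in range(B + 1):
                    if dp[spent] == NEG:
                        continue
                    for b in range(B - spent + 1):
                        cand = dp[spent] + value[k][b]
                        if cand > new[spent + b]:
                            new[spent + b] = cand
                            choice[k][spent + b] = b
                dp = new
            # Trace back the optimal per-good bids from the best total spend.
            best = int(np.argmax(dp)); bids = []
            for k in reversed(range(K)):
                b = choice[k][best]; bids.append(b); best -= b
            return list(reversed(bids)), dp.max()

        value = [[0, 1, 1.5, 1.6], [0, 0.4, 1.9, 2.0], [0, 0.2, 0.3, 0.4]]
        print(allocate_budget(value, B=3))    # expect bids [1, 2, 0]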

    Distance Metric Learning using Graph Convolutional Networks: Application to Functional Brain Networks

    Sofia Ira Ktena, Sarah Parisot, Enzo Ferrante, Martin Rajchl, Matthew Lee, Ben Glocker, Daniel Rueckert
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Evaluating similarity between graphs is of major importance in several
    computer vision and pattern recognition problems, where graph representations
    are often used to model objects or interactions between elements. The choice of
    a distance or similarity metric is, however, not trivial and can be highly
    dependent on the application at hand. In this work, we propose a novel metric
    learning method to evaluate distance between graphs that leverages the power of
    convolutional neural networks, while exploiting concepts from spectral graph
    theory to allow these operations on irregular graphs. We demonstrate the
    potential of our method in the field of connectomics, where neuronal pathways
    or functional connections between brain regions are commonly modelled as
    graphs. In this problem, the definition of an appropriate graph similarity
    function is critical to unveil patterns of disruptions associated with certain
    brain disorders. Experimental results on the ABIDE dataset show that our method
    can learn a graph similarity metric tailored for a clinical application,
    improving the performance of a simple k-nn classifier by 11.9% compared to a
    traditional distance metric.


    Information Theory

    A Low-Complexity Adaptive Multisine Waveform Design for Wireless Power Transfer

    Bruno Clerckx, Ekaterina Bayguzina
    Comments: submitted for publication
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    Far-field Wireless Power Transfer (WPT) has attracted significant attention
    in the last decade. Recently, channel-adaptive waveforms have been shown to
    significantly increase the DC power level at the output of the rectifier.
    However the design of those waveforms is generally computationally complex and
    does not lend itself easily to practical implementation. We here propose a
    low-complexity channel-adaptive multisine waveform design whose performance is
    very close to that of the optimal design. Performance evaluations confirm the
    benefits of the new design in various rectifier topologies.

    Wirelessly Powered Backscatter Communications: Waveform Design and SNR-Energy Tradeoff

    Bruno Clerckx, Zati Bayani Zawawi, Kaibin Huang
    Comments: submitted for publication
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    This paper shows that wirelessly powered backscatter communications is
    subject to a fundamental tradeoff between the harvested energy at the tag and
    the reliability of the backscatter communication, measured in terms of SNR at
    the reader. Assuming the RF transmit signal is a multisine waveform adaptive to
    the channel state information, we derive a systematic approach to optimize the
    transmit waveform weights (amplitudes and phases) in order to enlarge as much
    as possible the SNR-energy region. Performance evaluations confirm the
    significant benefits of using multiple frequency components in the adaptive
    transmit multisine waveform to exploit the nonlinearity of the rectifier and a
    frequency diversity gain.

    End-to-end Throughput Maximization for Underlay Multi-hop Cognitive Radio Networks with RF Energy Harvesting

    Chi Xu, Meng Zheng, Wei Liang, Haibin Yu, Ying-Chang Liang
    Subjects: Information Theory (cs.IT)

    This paper studies a green paradigm for the underlay coexistence of primary
    users (PUs) and secondary users (SUs) in energy harvesting cognitive radio
    networks (EH-CRNs), wherein battery-free SUs capture both the spectrum and the
    energy of PUs to enhance spectrum efficiency and green energy utilization. To
    lower the transmit powers of SUs, we employ multi-hop transmission with time
    division multiple access, by which SUs first harvest energy from the RF signals
    of PUs and then transmit data in the allocated time concurrently with PUs, all
    in the licensed spectrum. In this way, the available transmit energy of each SU
    mainly depends on the harvested energy before the turn to transmit, namely
    energy causality. Meanwhile, the transmit powers of SUs must be strictly
    controlled to protect PUs from harmful interference. Thus, subject to the
    energy causality constraint and the interference power constraint, we study the
    end-to-end throughput maximization problem for optimal time and power
    allocation. To solve this nonconvex problem, we first equivalently transform it
    into a convex optimization problem and then propose the joint optimal time and
    power allocation (JOTPA) algorithm that iteratively solves a series of
    feasibility problems until convergence. Extensive simulations evaluate the
    performance of EH-CRNs with JOTPA in three typical deployment scenarios and
    validate the superiority of JOTPA by making comparisons with two other resource
    allocation algorithms.

    Performance Scaling Law for Multi-Cell Multi-User Massive MIMO with MRT

    Cheng Zhang, Yindi Jing, Yongming Huang, Luxi Yang
    Comments: 11 pages, 5 figures, submitted to IEEE Transactions on Vehicular Technology
    Subjects: Information Theory (cs.IT)

    This work provides a comprehensive scaling law based performance analysis for
    multi-cell multi-user massive multiple-input-multiple-output (MIMO) downlink
    systems with maximal-ratio transmission (MRT). Imperfect channel state
    information (CSI), pilot contamination, and channel spatial correlation are all
    considered. First, a sum-rate lower bound is derived by exploiting the
    asymptotically deterministic property of the received signal power, while
    keeping the random nature of other components in the
    signal-to-interference-plus-noise-ratio (SINR) intact. Via a general scaling
    model on important network parameters, including the number of users, the
    channel training energy and the data transmission power, with respect to the
    number of base station antennas, the asymptotic scaling law of the effective
    SINR is obtained, which reveals quantitatively the tradeoff of the network
    parameters. More importantly, pilot contamination and pilot contamination
    elimination (PCE) are considered in the analytical framework. In addition, the
    applicability of the derived asymptotic scaling law in practical systems with
    large but finite antenna numbers is discussed. Finally, sufficient conditions
    on the parameter scalings for the SINR to be asymptotically deterministic in
    the sense of mean square convergence are provided, which covers existing
    results on such analysis as special cases and shows the effect of PCE
    explicitly.

    Conditional quantum one-time pad

    Kunal Sharma, Eyuri Wakakuwa, Mark M. Wilde
    Comments: 6 pages
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    Suppose that Alice and Bob are located in distant laboratories, which are
    connected by an ideal quantum channel. Suppose further that they share many
    copies of a quantum state (\rho_{ABE}), such that Alice possesses the (A)
    systems and Bob the (BE) systems. In our model, there is an identifiable part
    of Bob’s laboratory that is insecure: a third party named Eve has infiltrated
    Bob’s laboratory and gained control of the (E) systems. Alice, knowing this,
    would like to use their shared state and the ideal quantum channel to communicate
    a message in such a way that Bob, who has access to the whole of his laboratory
    ((BE) systems), can decode it, while Eve, who has access only to a sector of
    Bob’s laboratory ((E) systems) and the ideal quantum channel connecting Alice
    to Bob, cannot learn anything about Alice’s transmitted message. We call this
    task the conditional one-time pad, and in this paper, we prove that the optimal
    rate of secret communication for this task is equal to the conditional quantum
    mutual information (I(A;B|E)) of their shared state. We thus give the
    conditional quantum mutual information an operational meaning that is different
    from those given in prior works, via state redistribution, conditional erasure,
    or state deconstruction. We also generalize the model and method in several
    ways, one of which demonstrates that the negative tripartite interaction
    information (-I_{3}(A;B;E) = I(A;BE)-I(A;B)-I(A;E)) of a tripartite state
    (\rho_{ABE}) is an achievable rate for a secret-sharing task, i.e., the case in
    which Alice’s message should be secure from someone possessing only the (AB) or
    (AE) systems but should be decodable by someone possessing all systems (A),
    (B), and (E).
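
    For reference, both rates above unpack into standard entropic
    combinations; these identities are textbook facts rather than
    contributions of the paper:

        I(A;B|E) = H(AE) + H(BE) - H(ABE) - H(E),
        -I_3(A;B;E) = I(A;BE) - I(A;B) - I(A;E).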

    Scalable Greedy Feature Selection via Weak Submodularity

    Rajiv Khanna, Ethan Elenberg, Alexandros G. Dimakis, Sahand Negahban, Joydeep Ghosh
    Comments: To appear in AISTATS 2017
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    Greedy algorithms are widely used for problems in machine learning such as
    feature selection and set function optimization. Unfortunately, for large
    datasets, the running time of even greedy algorithms can be quite high. This is
    because for each greedy step we need to refit a model or calculate a function
    using the previously selected choices and the new candidate.

Two algorithms that are faster approximations to greedy forward selection
    were introduced recently ([Mirzasoleiman et al. 2013, 2015]). They achieve
    better performance by exploiting distributed computation and stochastic
    evaluation respectively. Both algorithms have provable performance guarantees
    for submodular functions.

In this paper we show that, contrary to previously held opinion,
    submodularity is not required to obtain approximation guarantees for these two
    algorithms. Specifically, we show that a generalized concept of weak
    submodularity suffices to give multiplicative approximation guarantees. Our
    result extends the applicability of these algorithms to a larger class of
functions. Furthermore, we show that a bounded submodularity ratio can be used
to provide data-dependent bounds that can sometimes be tighter even for
    submodular functions. We empirically validate our work by showing superior
    performance of fast greedy approximations versus several established baselines
    on artificial and real datasets.
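
    As a concrete sketch of the stochastic-greedy idea analyzed here
    (following Mirzasoleiman et al. 2015), each of the (k) steps scores only
    a random sample of candidates rather than the whole ground set. In the
    Python sketch below, (f) is any monotone set function supplied by the
    caller, e.g. the (R^2) of a regression refit on the selected features;
    the function name and defaults are illustrative.

        import math
        import random

        def stochastic_greedy(ground_set, f, k, eps=0.1):
            # Each step evaluates about (n / k) * log(1 / eps) random
            # candidates instead of all n; with (weak) submodularity this
            # still yields a multiplicative approximation guarantee.
            n = len(ground_set)
            sample_size = min(n, math.ceil((n / k) * math.log(1 / eps)))
            selected, remaining = [], set(ground_set)
            for _ in range(k):
                pool = random.sample(sorted(remaining),
                                     min(sample_size, len(remaining)))
                base = f(selected)
                best = max(pool, key=lambda e: f(selected + [e]) - base)
                selected.append(best)
                remaining.remove(best)
            return selected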

    On Approximation Guarantees for Greedy Low Rank Optimization

    Rajiv Khanna, Ethan Elenberg, Alexandros G. Dimakis, Sahand Negahban
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    We provide new approximation guarantees for greedy low rank matrix estimation
    under standard assumptions of restricted strong convexity and smoothness. Our
novel analysis also uncovers previously unknown connections between low-rank
estimation and combinatorial optimization, so much so that our bounds are
    reminiscent of corresponding approximation bounds in submodular maximization.
    Additionally, we also provide statistical recovery guarantees. Finally, we
    present empirical comparison of greedy estimation with established baselines on
    two important real-world problems.
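
    The generic greedy scheme that such guarantees target builds the
    estimate one rank-one atom at a time: take the leading singular pair of
    the current negative gradient and add it in. Here is a bare-bones Python
    sketch for the squared-error objective; for this particular loss the
    procedure collapses to a truncated SVD, which makes it a convenient
    sanity check, whereas the paper's analysis covers general restricted
    strongly convex and smooth losses.

        import numpy as np

        def greedy_low_rank(Y, rank):
            # Greedy rank-one pursuit for min_X 0.5 * ||Y - X||_F^2: the
            # steepest rank-one descent direction at each step is the top
            # singular pair of the residual (the negative gradient).
            X = np.zeros_like(Y, dtype=float)
            for _ in range(rank):
                U, s, Vt = np.linalg.svd(Y - X, full_matrices=False)
                X += s[0] * np.outer(U[:, 0], Vt[0])
            return X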

    Leveraging Sparsity for Efficient Submodular Data Summarization

    Erik M. Lindgren, Shanshan Wu, Alexandros G. Dimakis
    Comments: In NIPS 2016
    Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG)

    The facility location problem is widely used for summarizing large datasets
    and has additional applications in sensor placement, image retrieval, and
    clustering. One difficulty of this problem is that submodular optimization
    algorithms require the calculation of pairwise benefits for all items in the
    dataset. This is infeasible for large problems, so recent work proposed to only
    calculate nearest neighbor benefits. One limitation is that several strong
    assumptions were invoked to obtain provable approximation guarantees. In this
paper we establish that these extra assumptions are not necessary: solving the
    sparsified problem will be almost optimal under the standard assumptions of the
    problem. We then analyze a different method of sparsification that is a better
    model for methods such as Locality Sensitive Hashing to accelerate the nearest
    neighbor computations and extend the use of the problem to a broader family of
    similarities. We validate our approach by demonstrating that it rapidly
    generates interpretable summaries.
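
    To make the setup concrete, the Python sketch below gives the
    facility-location objective together with the kind of nearest-neighbor
    sparsification discussed above: only each item's (t) largest benefits
    are kept, and greedy maximization then runs on the sparse matrix.
    Function names and the parameter (t) are illustrative.

        import numpy as np

        def facility_location(sim, S):
            # f(S) = sum over items of the best benefit offered by S.
            return sim[:, list(S)].max(axis=1).sum() if S else 0.0

        def keep_top_t(sim, t):
            # Zero out all but each row's t largest benefits. The result
            # above says that greedy maximization on this sparsified
            # matrix is near-optimal under the problem's standard
            # assumptions.
            sparse = np.zeros_like(sim)
            cols = np.argsort(sim, axis=1)[:, -t:]
            rows = np.arange(sim.shape[0])[:, None]
            sparse[rows, cols] = sim[rows, cols]
            return sparse

    The sparsified matrix can then be handed to any greedy maximizer, for
    instance the stochastic-greedy sketch shown earlier in this digest.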

    Exact MAP Inference by Avoiding Fractional Vertices

    Erik M. Lindgren, Alexandros G. Dimakis, Adam Klivans
    Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG)

    Given a graphical model, one essential problem is MAP inference, that is,
    finding the most likely configuration of states according to the model.
    Although this problem is NP-hard, large instances can be solved in practice. A
    major open question is to explain why this is true. We give a natural condition
under which we can provably perform MAP inference in polynomial time. We
require that the number of fractional vertices of the LP relaxation whose
value exceeds that of the optimal integral solution is bounded by a
polynomial in the problem size. This
    resolves an open question by Dimakis, Gohari, and Wainwright. In contrast, for
    general LP relaxations of integer programs, known techniques can only handle a
    constant number of fractional vertices whose value exceeds the optimal
    solution. We experimentally verify this condition and demonstrate how efficient
    various integer programming methods are at removing fractional solutions.
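
    The objects in question are vertices of the standard local-polytope LP
    relaxation of MAP. The toy two-variable Python example below shows the
    construction; for a tree-structured model like this one the relaxation
    is known to be tight, so the LP optimum comes out integral. The score
    values are made up.

        import numpy as np
        from scipy.optimize import linprog

        # Variables, in order: mu1(0), mu1(1), mu2(0), mu2(1),
        # mu12(0,0), mu12(0,1), mu12(1,0), mu12(1,1).
        theta1 = np.array([0.2, 0.5])             # unary scores, node 1
        theta2 = np.array([0.1, 0.4])             # unary scores, node 2
        theta12 = np.array([1.0, 0.0, 0.0, 1.0])  # rewards agreement

        c = -np.concatenate([theta1, theta2, theta12])  # linprog minimizes
        A_eq = np.array([
            [1, 1, 0, 0,  0,  0,  0,  0],   # mu1 sums to 1
            [0, 0, 1, 1,  0,  0,  0,  0],   # mu2 sums to 1
            [1, 0, 0, 0, -1, -1,  0,  0],   # mu12 marginalizes to mu1(0)
            [0, 1, 0, 0,  0,  0, -1, -1],   # mu12 marginalizes to mu1(1)
            [0, 0, 1, 0, -1,  0, -1,  0],   # mu12 marginalizes to mu2(0)
            [0, 0, 0, 1,  0, -1,  0, -1],   # mu12 marginalizes to mu2(1)
        ])
        b_eq = np.array([1, 1, 0, 0, 0, 0])

        res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 8)
        print(np.round(res.x, 3))  # integral here: the tree LP is tight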

    Sparse Quadratic Logistic Regression in Sub-quadratic Time

    Karthikeyan Shanmugam, Murat Kocaoglu, Alexandros G. Dimakis, Sujay Sanghavi
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    We consider support recovery in the quadratic logistic regression setting –
where the target depends on both (p) linear terms (x_i) and up to (p^2) quadratic
    terms (x_i x_j). Quadratic terms enable prediction/modeling of higher-order
    effects between features and the target, but when incorporated naively may
    involve solving a very large regression problem. We consider the sparse case,
    where at most (s) terms (linear or quadratic) are non-zero, and provide a new
    faster algorithm. It involves (a) identifying the weak support (i.e. all
    relevant variables) and (b) standard logistic regression optimization only on
    these chosen variables. The first step relies on a novel insight about
    correlation tests in the presence of non-linearity, and takes (O(pn)) time for
    (n) samples – giving potentially huge computational gains over the naive
approach. Motivated by insights from the Boolean case, we propose a non-linear
correlation test for the non-binary finite-support case that involves hashing a
    variable and then correlating with the output variable. We also provide
    experimental results to demonstrate the effectiveness of our methods.
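
    Below is a compact Python sketch of the two-stage pipeline described
    above, with plain label correlation standing in for the paper's
    hashing-based test, so it is a simplification rather than the authors'
    exact procedure; scikit-learn's logistic regression handles step (b).

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def screen_then_fit(X, y, s):
            # (a) O(pn) screening: rank features by absolute correlation
            # with the labels and keep the top s.
            scores = np.abs((X - X.mean(axis=0)).T @ (y - y.mean()))
            keep = np.argsort(scores)[-s:]
            Xs = X[:, keep]
            # (b) ordinary logistic regression on the kept features plus
            # all of their pairwise (and squared) products.
            quad = np.hstack([Xs] + [Xs[:, [i]] * Xs[:, i:]
                                     for i in range(Xs.shape[1])])
            return LogisticRegression(max_iter=1000).fit(quad, y), keep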

    Performance Bounds for Graphical Record Linkage

    Rebecca C. Steorts, Matt Barnes, Willie Neiswanger
    Comments: 11 pages with supplement; 4 figures and 2 tables; to appear in AISTATS 2017
    Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Methodology (stat.ME); Machine Learning (stat.ML)

    Record linkage involves merging records in large, noisy databases to remove
    duplicate entities. It has become an important area because of its widespread
    occurrence in bibliometrics, public health, official statistics production,
    political science, and beyond. Traditional linkage methods directly linking
    records to one another are computationally infeasible as the number of records
    grows. As a result, it is increasingly common for researchers to treat record
    linkage as a clustering task, in which each latent entity is associated with
    one or more noisy database records. We critically assess performance bounds
    using the Kullback-Leibler (KL) divergence under a Bayesian record linkage
    framework, making connections to Kolchin partition models. We provide an upper
    bound using the KL divergence and a lower bound on the minimum probability of
    misclassifying a latent entity. We give insights for when our bounds hold using
    simulated data and provide practical user guidance.
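
    The abstract does not reproduce the exact statements, but lower bounds
    of this kind typically take a Fano-like shape: with (M) candidate
    linkage structures inducing distributions (P_1, ..., P_M) on the data,
    any procedure's error probability obeys something like

        P(error) >= 1 - (max_{j,k} KL(P_j || P_k) + log 2) / log M,

    so that weak KL separation between linkages forces a non-trivial
    misclassification rate. This is only the generic template, stated here
    for orientation.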

    Joint Multichannel Deconvolution and Blind Source Separation

    Ming Jiang, Jérôme Bobin, Jean-Luc Starck
    Subjects: Applications (stat.AP); Information Theory (cs.IT)

    Blind Source Separation (BSS) is a challenging matrix factorization problem
    that plays a central role in multichannel imaging science. In a large number of
    applications, such as astrophysics, current unmixing methods are limited since
    real-world mixtures are generally affected by extra instrumental effects like
    blurring. Therefore, BSS has to be solved jointly with a deconvolution problem,
    which requires tackling a new inverse problem: deconvolution BSS (DBSS). In
    this article, we introduce an innovative DBSS approach, called DecGMCA, based
on sparse signal modeling and an efficient alternating projected least-squares
    algorithm. Numerical results demonstrate that the DecGMCA algorithm performs
    very well on simulations. It further highlights the importance of jointly
    solving BSS and deconvolution instead of considering these two problems
    independently. Furthermore, the performance of the proposed DecGMCA algorithm
    is demonstrated on simulated radio-interferometric data.
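
    For orientation, the Python sketch below shows what the alternating
    structure looks like once the deconvolution step is stripped away: a
    minimal GMCA-flavored sparse BSS loop for (Y approx AS). This is a
    simplification for illustration, not the DecGMCA algorithm itself; the
    threshold (lam) and the iteration count are placeholders.

        import numpy as np

        def sparse_bss(Y, n_sources, n_iter=50, lam=0.1):
            rng = np.random.default_rng(0)
            A = rng.standard_normal((Y.shape[0], n_sources))
            for _ in range(n_iter):
                # Update sources by least squares, then promote sparsity
                # with a soft threshold.
                S = np.linalg.lstsq(A, Y, rcond=None)[0]
                S = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
                # Update the mixing matrix by least squares and normalize
                # its columns to fix the scale ambiguity.
                A = np.linalg.lstsq(S.T, Y.T, rcond=None)[0].T
                A /= np.maximum(np.linalg.norm(A, axis=0), 1e-12)
            return A, S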

    Streaming Weak Submodularity: Interpreting Neural Networks on the Fly

    Ethan R. Elenberg, Alexandros G. Dimakis, Moran Feldman, Amin Karbasi
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    In many machine learning applications, it is important to explain the
    predictions of a black-box classifier. For example, why does a deep neural
    network assign an image to a particular class? We cast interpretability of
    black-box classifiers as a combinatorial maximization problem and propose an
    efficient streaming algorithm to solve it subject to cardinality constraints.
    By extending ideas from Badanidiyuru et al. [2014], we provide a constant
    factor approximation guarantee for our algorithm in the case of random stream
    order and a weakly submodular objective function. This is the first such
    theoretical guarantee for this general class of functions, and we also show
    that no such algorithm exists for a worst case stream order. Our algorithm
obtains similar explanations of Inception V3 predictions 10 times faster than
    the state-of-the-art LIME framework of Ribeiro et al. [2016].
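
    A minimal Python sketch of the single-threshold building block from
    Badanidiyuru et al. [2014] that such streaming algorithms rest on; the
    full method runs many thresholds (v) in parallel and keeps the best
    resulting set, and the function names here are illustrative.

        def stream_select(stream, f, k, v):
            # Keep an arriving element only if its marginal gain clears
            # the threshold (v/2 - f(S)) / (k - |S|); one pass, O(k) memory.
            S = []
            for e in stream:
                if len(S) >= k:
                    break
                gain = f(S + [e]) - f(S)
                if gain >= (v / 2 - f(S)) / (k - len(S)):
                    S.append(e)
            return S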

    Cost-Optimal Learning of Causal Graphs

    Murat Kocaoglu, Alexandros G. Dimakis, Sriram Vishwanath
    Subjects: Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)

    We consider the problem of learning a causal graph over a set of variables
    with interventions. We study the cost-optimal causal graph learning problem:
    For a given skeleton (undirected version of the causal graph), design the set
    of interventions with minimum total cost, that can uniquely identify any causal
    graph with the given skeleton. We show that this problem is solvable in
polynomial time. Next, we consider the case where the number of interventions
    is limited. For this case, we provide polynomial time algorithms when the
    skeleton is a tree or a clique tree. For a general chordal skeleton, we develop
    an efficient greedy algorithm, which can be improved when the causal graph
    skeleton is an interval graph.
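
    For the clique special case with unit costs, a classical
    separating-system construction already shows how few interventions
    suffice: give each node a distinct binary codeword and let intervention
    (i) target the nodes whose (i)-th bit is 1. Every pair of nodes is then
    split by some intervention, which is enough to orient every edge of the
    clique. The Python sketch below illustrates this unit-cost baseline, not
    the paper's cost-optimal algorithm.

        import math

        def clique_interventions(n):
            # ceil(log2 n) interventions; node v joins intervention i
            # exactly when bit i of its codeword (v itself) is set.
            m = max(1, math.ceil(math.log2(n)))
            return [[v for v in range(n) if (v >> i) & 1]
                    for i in range(m)]

        print(clique_interventions(5))  # 3 interventions for a 5-clique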



