
    arXiv Paper Daily: Fri, 3 Mar 2017

    Published by 我爱机器学习 (52ml.net) on 2017-03-03 00:00:00

    Neural and Evolutionary Computing

    Evolving Deep Neural Networks

    Risto Miikkulainen, Jason Liang, Elliot Meyerson, Aditya Rawal, Dan Fink, Olivier Francon, Bala Raju, Arshak Navruzyan, Nigel Duffy, Babak Hodjat
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

    The success of deep learning depends on finding an architecture to fit the
    task. As deep learning has scaled up to more challenging tasks, the
    architectures have become difficult to design by hand. This paper proposes an
    automated method, CoDeepNEAT, for optimizing deep learning architectures
    through evolution. By extending existing neuroevolution methods to topology,
    components, and hyperparameters, this method achieves results comparable to
    best human designs in standard benchmarks in object recognition and language
    modeling. It also supports building a real-world application of automated image
    captioning on a magazine website. Given the anticipated increases in available
    computing power, evolution of deep networks is a promising approach to
    constructing deep learning applications in the future.
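
    As a rough illustration of the evolutionary loop such methods share, here is a minimal Python sketch (not CoDeepNEAT itself, which co-evolves network blueprints and modules; the genome encoding and fitness function below are hypothetical stand-ins):

```python
import random

# Hypothetical genome: a list of layer widths plus a learning rate.
# CoDeepNEAT actually evolves topology, components, and hyperparameters jointly.
def random_genome():
    return {"layers": [random.choice([32, 64, 128])
                       for _ in range(random.randint(1, 4))],
            "lr": 10 ** random.uniform(-4, -2)}

def mutate(genome):
    child = {"layers": list(genome["layers"]), "lr": genome["lr"]}
    if random.random() < 0.5 and len(child["layers"]) < 6:
        child["layers"].append(random.choice([32, 64, 128]))  # grow topology
    else:
        child["lr"] *= 10 ** random.uniform(-0.5, 0.5)        # tweak hyperparam
    return child

def fitness(genome):
    # Stand-in for "train the candidate network briefly and return its
    # validation accuracy", which is the expensive step in practice.
    return -abs(len(genome["layers"]) - 3) - 100 * abs(genome["lr"] - 1e-3)

population = [random_genome() for _ in range(20)]
for generation in range(10):
    population.sort(key=fitness, reverse=True)
    parents = population[:5]                       # truncation selection
    population = parents + [mutate(random.choice(parents)) for _ in range(15)]
print(max(population, key=fitness))
```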

    Conversion Rate Optimization through Evolutionary Computation

    Risto Miikkulainen, Neil Iscoe, Aaron Shagrin, Ron Cordell, Cory Schoolland, Myles Brundage, Jonathan Epstein, Randy Dean, Gurmeet Lamba
    Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    Conversion optimization means designing a web interface so that as many users
    as possible take a desired action on it, such as registering or purchasing. Such
    design is usually done by hand, testing one change at a time through A/B
    testing, or a limited number of combinations through multivariate testing,
    making it possible to evaluate only a small fraction of designs in a vast
    design space. This paper describes Sentient Ascend, an automatic conversion
    optimization system that uses evolutionary optimization to create effective web
    interface designs. Ascend makes it possible to discover and utilize
    interactions between the design elements that are difficult to identify
    otherwise. Moreover, evaluation of design candidates is done in parallel
    online, i.e. with a large number of real users interacting with the system. A
    case study on a lead generation site learnhotobecome.org shows that significant
    improvements (i.e. over 43%) are possible beyond human design. Ascend can
    therefore be seen as an approach to massively multivariate conversion
    optimization, based on a massively parallel interactive evolution.

    Understanding Synthetic Gradients and Decoupled Neural Interfaces

    Wojciech Marian Czarnecki, Grzegorz Świrszcz, Max Jaderberg, Simon Osindero, Oriol Vinyals, Koray Kavukcuoglu
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    When training neural networks, the use of Synthetic Gradients (SG) allows
    layers or modules to be trained without update locking – without waiting for a
    true error gradient to be backpropagated – resulting in Decoupled Neural
    Interfaces (DNIs). This ability to update parts of a neural network
    asynchronously and with only local information was demonstrated to work
    empirically by Jaderberg et al. (2016). However, there has been very
    little demonstration of what changes DNIs and SGs impose from a functional,
    representational, and learning dynamics point of view. In this paper, we study
    DNIs through the use of synthetic gradients on feed-forward networks to better
    understand their behaviour and elucidate their effect on optimisation. We show
    that the incorporation of SGs does not affect the representational strength of
    the learning system for a neural network, and prove the convergence of the
    learning system for linear and deep linear models. On practical problems we
    investigate the mechanism by which synthetic gradient estimators approximate
    the true loss, and, surprisingly, how that leads to drastically different
    layer-wise representations. Finally, we also expose the relationship of using
    synthetic gradients to other error approximation techniques and find a unifying
    language for discussion and comparison.
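
    To make the mechanism concrete, here is a minimal synthetic-gradient update sketched in PyTorch (illustrative only; the module sizes and the plain MSE regression of the SG model onto the true gradient follow the general recipe, not the paper's exact setup):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(10, 10)   # the decoupled module
sg = nn.Linear(10, 10)      # synthetic gradient model: h -> predicted dL/dh
head = nn.Linear(10, 1)     # downstream network producing the true loss
opt_layer = torch.optim.SGD(layer.parameters(), lr=0.1)
opt_sg = torch.optim.SGD(sg.parameters(), lr=0.01)
opt_head = torch.optim.SGD(head.parameters(), lr=0.1)

x, y = torch.randn(32, 10), torch.randn(32, 1)

# 1) Update `layer` immediately with the *predicted* gradient -- no update
#    locking, no waiting for the true error to be backpropagated.
h = layer(x)
h.backward(sg(h).detach())
opt_layer.step(); opt_layer.zero_grad()

# 2) When the true loss eventually arrives, train the head as usual and
#    regress the SG model onto the true gradient of the loss at h.
h2 = layer(x).detach().requires_grad_(True)
loss = nn.functional.mse_loss(head(h2), y)
loss.backward()
opt_head.step(); opt_head.zero_grad()
sg_loss = nn.functional.mse_loss(sg(h2.detach()), h2.grad.detach())
sg_loss.backward()
opt_sg.step(); opt_sg.zero_grad()
```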

    Predictive Business Process Monitoring with LSTM Neural Networks

    Niek Tax, Ilya Verenich, Marcello La Rosa, Marlon Dumas
    Subjects: Applications (stat.AP); Databases (cs.DB); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Predictive business process monitoring methods exploit logs of completed
    cases of a process in order to make predictions about running cases thereof.
    Existing methods in this space are tailor-made for specific prediction tasks.
    Moreover, their relative accuracy is highly sensitive to the dataset at hand,
    thus requiring users to engage in trial-and-error and tuning when applying them
    in a specific setting. This paper investigates Long Short-Term Memory (LSTM)
    neural networks as an approach to build consistently accurate models for a wide
    range of predictive process monitoring tasks. First, we show that LSTMs
    outperform existing techniques to predict the next event of a running case and
    its timestamp. Next, we show how to use models for predicting the next task in
    order to predict the full continuation of a running case. Finally, we apply the
    same approach to predict the remaining time, and show that this approach
    outperforms existing tailor-made methods.
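
    A minimal sketch of the next-event prediction setup in PyTorch (shapes and sizes are illustrative; the paper additionally predicts the timestamp of the next event, which would add a regression head):

```python
import torch
import torch.nn as nn

class NextEventLSTM(nn.Module):
    """Predict the next activity of a running case from its event prefix."""
    def __init__(self, num_activities, emb=16, hid=32):
        super().__init__()
        self.embed = nn.Embedding(num_activities, emb)
        self.lstm = nn.LSTM(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, num_activities)

    def forward(self, prefixes):                 # (batch, time) activity IDs
        h, _ = self.lstm(self.embed(prefixes))   # (batch, time, hid)
        return self.out(h[:, -1])                # logits for the next activity

model = NextEventLSTM(num_activities=8)
prefixes = torch.randint(0, 8, (4, 10))          # 4 running cases, 10 events
next_events = torch.randint(0, 8, (4,))
loss = nn.functional.cross_entropy(model(prefixes), next_events)
loss.backward()   # gradients for one training step
# Predicting the full continuation of a case amounts to feeding the argmax
# prediction back in as the next input until an end-of-case token appears.
```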


    Computer Vision and Pattern Recognition

    Binarized Convolutional Landmark Localizers for Human Pose Estimation and Face Alignment with Limited Resources

    Adrian Bulat, Georgios Tzimiropoulos
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This work is on landmark localization using binarized approximations of
    Convolutional Neural Networks (CNNs). Our goal is to design architectures that
    retain the groundbreaking performance of CNNs for landmark localization and at
    the same time are lightweight, compact and suitable for applications with
    limited computational resources. To this end, we make the following
    contributions: (a) we are the first to study the effect of neural network
    binarization on localization tasks, namely human pose estimation and face
    alignment. We exhaustively evaluate various design choices, identify
    performance bottlenecks, and more importantly propose multiple orthogonal ways
    to boost performance. (b) Based on our analysis, we propose a novel
    hierarchical, parallel and multi-scale residual block architecture that yields
    a large performance improvement over the standard bottleneck block for the
    same number of parameters, thus bridging the gap between the original
    network and its binarized counterpart. (c) We also show that the performance
    boost offered by the proposed architecture is not only observed for the case of
    binary networks but also generalizes for the case of real valued weights and
    activations. (d) We perform a large number of ablation studies that shed light
    on the properties and the performance of the proposed block. (e) We present
    results for experiments on the most challenging datasets for human pose
    estimation and face alignment, reporting in many cases state-of-the-art
    performance. Code can be downloaded from
    this http URL
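
    For readers unfamiliar with network binarization, one common scheme (XNOR-Net-style sign-plus-scaling; the paper's exact design choices differ and are the subject of its ablations) looks like this:

```python
import numpy as np

def binarize(W):
    # Keep only the sign of each weight, rescaled by the mean magnitude so
    # the binary tensor approximates the real-valued one in an L2 sense.
    alpha = np.abs(W).mean()
    return alpha * np.sign(W)

W = np.random.randn(64, 3, 3, 3)       # a real-valued conv kernel bank
Wb = binarize(W)
print(np.unique(Wb).size)              # only two distinct values remain
```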

    Araguaia Medical Vision Lab at ISIC 2017 Skin Lesion Classification Challenge

    Rafael Teixeira Sousa, Larissa Vasconcellos de Moraes
    Comments: Abstract submitted as a requirement to ISIC2017 challenge
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper describes the participation of Araguaia Medical Vision Lab at the
    International Skin Imaging Collaboration 2017 Skin Lesion Challenge. We
    describe the use of deep convolutional neural networks in an attempt to
    classify images of Melanoma and Seborrheic Keratosis lesions. Using finetuned
    GoogleNet and AlexNet, we attained results of 0.950 and 0.846 AUC on Seborrheic
    Keratosis and Melanoma respectively.

    Unsupervised Image-to-Image Translation Networks

    Ming-Yu Liu, Thomas Breuel, Jan Kautz
    Comments: 19 pages, 19 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Most of the existing image-to-image translation frameworks—mapping an image
    in one domain to a corresponding image in another—are based on supervised
    learning, i.e., pairs of corresponding images in two domains are required for
    learning the translation function. This largely limits their applications,
    because capturing corresponding images in two different domains is often a
    difficult task. To address the issue, we propose the UNsupervised
    Image-to-image Translation (UNIT) framework, which is based on variational
    autoencoders and generative adversarial networks. The proposed framework can
    learn the translation function without any corresponding images in two domains.
    We enable this learning capability by combining a weight-sharing constraint and
    an adversarial training objective. Through visualization results from various
    unsupervised image translation tasks, we verify the effectiveness of the
    proposed framework. An ablation study further reveals the critical design
    choices. Moreover, we apply the UNIT framework to the unsupervised domain
    adaptation task and achieve better results than competing algorithms do in
    benchmark datasets.

    Towards CNN Map Compression for camera relocalisation

    Luis Contreras, Walterio Mayol-Cuevas
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents a study on the use of Convolutional Neural Networks for
    camera relocalisation and its application to map compression. We follow
    state-of-the-art visual relocalisation results and evaluate the response to different
    data inputs — namely, depth, grayscale, RGB, spatial position and combinations
    of these. We use a CNN map representation and introduce the notion of CNN map
    compression by using a smaller CNN architecture. We evaluate our proposal in a
    series of publicly available datasets. This formulation allows us to improve
    relocalisation accuracy by increasing the number of training trajectories while
    maintaining a constant-size CNN.

    Face Image Reconstruction from Deep Templates

    Guangcan Mai, Kai Cao, Pong C. Yuen, Anil K. Jain
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    State-of-the-art face recognition systems are based on deep (convolutional)
    neural networks. Therefore, it is imperative to determine to what extent face
    templates derived from deep networks can be inverted to obtain the original
    face image. In this paper, we discuss the vulnerabilities of a face recognition
    system based on deep templates extracted by deep networks, under image
    reconstruction attack. We propose a de-convolutional neural network (D-CNN) to
    reconstruct images of faces from their deep templates. In our experiments, we
    did not assume any knowledge about the target subject or the deep network. To
    train the D-CNN reconstruction models, we augmented existing face datasets with
    a large collection of images synthesized using a face generator. The proposed
    reconstruction method was evaluated using type-I (comparing the reconstructed
    images against the original face images used to generate the deep template) and
    type-II (comparing the reconstructed images against a different face image of
    the same subject) attacks. We conducted a three-trial attack for each target
    face image using three face images reconstructed from three different D-CNNs.
    Each D-CNN was trained on a different dataset (VGG-Face, CASIA-Webface, or
    Multi-PIE). The type-I attack achieved a true accept rate (TAR) of 85.48% at a
    false accept rate (FAR) of 0.1% on the LFW dataset. The corresponding TAR for
    the type-II attack is 14.71%. Our experimental results demonstrate the need to
    secure deep templates in face recognition systems.

    Robust Spatial Filtering with Graph Convolutional Neural Networks

    Felipe Petroski Such, Shagan Sah, Miguel Dominguez, Suhas Pillai, Chao Zhang, Andrew Michael, Nathan Cahill, Raymond Ptucha
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Convolutional Neural Networks (CNNs) have recently led to incredible
    breakthroughs on a variety of pattern recognition problems. Banks of finite
    impulse response filters are learned on a hierarchy of layers, each
    contributing more abstract information than the previous layer. The simplicity
    and elegance of the convolutional filtering process make CNNs a natural fit for
    structured problems such as image, video, or voice, where vertices are
    homogeneous in the sense of number, location, and strength of neighbors. The
    vast majority of classification problems, for example in the pharmaceutical,
    homeland security, and financial domains, are unstructured. As these problems
    are formulated into unstructured graphs, the heterogeneity of these problems,
    such as number of vertices, number of connections per vertex, and edge
    strength, cannot be tackled with standard convolutional techniques. We propose
    a novel neural learning framework that is capable of handling both homogeneous
    and heterogeneous data, while retaining the benefits of traditional CNN
    successes.

    Recently, researchers have proposed variations of CNNs that can handle graph
    data. In an effort to create learnable filter banks of graphs, these methods
    either induce constraints on the data or require considerable preprocessing. As
    opposed to defining filters as spectral multipliers applied to the eigenvectors
    of the graph Laplacian, our framework, which we term Graph-CNNs, defines
    filters as polynomials of functions of the graph adjacency matrix. Graph-CNNs
    can handle both heterogeneous and homogeneous graph data, including graphs
    having entirely different vertex or edge sets. We compare Graph-CNN to
    traditional CNNs using the CIFAR-10 and Imagenet image classification datasets.
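
    The key formulation is easy to state: a Graph-CNN filter is a polynomial in (functions of) the adjacency matrix applied to the vertex features. A minimal numpy sketch of that idea (scalar coefficients here; in the paper the coefficients are learned):

```python
import numpy as np

def graph_conv(A, X, thetas):
    """Filter as a polynomial of the adjacency matrix:
    Y = theta_0 * X + theta_1 * A X + theta_2 * A^2 X + ..."""
    Y, P = np.zeros_like(X), X
    for theta in thetas:
        Y = Y + theta * P
        P = A @ P                # next power of A applied to the features
    return Y

A = np.array([[0, 1, 0],         # a 3-vertex graph
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.random.randn(3, 4)        # one 4-dim feature vector per vertex
print(graph_conv(A, X, thetas=[0.5, 0.3, 0.2]).shape)   # (3, 4)
```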

    Attentive Recurrent Comparators

    Pranav Shyam, Shubham Gupta, Ambedkar Dukkipati
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Many problems in Artificial Intelligence and Machine Learning can be reduced
    to the problem of quantitative comparison of two entities. In Deep Learning the
    ubiquitous architecture used for this task is the Siamese Neural Network which
    maps each entity to a representation through a learnable function and expresses
    similarity through the distances among the entities in the representation
    space. In this paper, we argue that such a static and invariant mapping is both
    naive and unnatural. We develop a novel neural model called Attentive Recurrent
    Comparators (ARCs) that dynamically compares two entities and test the model
    extensively on the Omniglot dataset. In the task of similarity learning, our
    simplistic model that does not use any convolutions performs on par with Deep
    Convolutional Siamese Networks and significantly better when convolutional
    layers are also used. In the challenging task of one-shot learning on the same
    dataset, an ARC based model achieves the first super-human performance for a
    neural method with an error rate of 1.5%.

    BoxCars: Improving Vehicle Fine-Grained Recognition using 3D Bounding Boxes in Traffic Surveillance

    Jakub Sochor, Jakub Špaňhel, Adam Herout
    Comments: under review in IJCV
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we focus on fine-grained recognition of vehicles mainly in
    traffic surveillance applications. We propose an approach orthogonal to recent
    advancements in fine-grained recognition (automatic part discovery, bilinear
    pooling). Also, in contrast to other methods focused on fine-grained
    recognition of vehicles, we do not limit ourselves to frontal/rear viewpoint
    but allow the vehicles to be seen from any viewpoint. Our approach is based on
    3D bounding boxes built around the vehicles. The bounding box can be
    automatically constructed from traffic surveillance data. For scenarios where
    it is not possible to use the precise construction, we propose a method for
    estimation of the 3D bounding box. The 3D bounding box is used to normalize the
    image viewpoint by unpacking the image into a plane. We also propose to randomly
    alter the color of the image and to add a rectangle of random noise at a random
    position in the image while training Convolutional Neural Networks. We have
    collected a large fine-grained vehicle dataset BoxCars116k, with 116k images of
    vehicles from various viewpoints taken by numerous surveillance cameras. We
    performed a number of experiments which show that our proposed method
    significantly improves CNN classification accuracy (the accuracy is increased
    by up to 12 percentage points and the error is reduced by up to 50% compared to
    CNNs without the proposed modifications). We also show that our method
    outperforms state-of-the-art methods for fine-grained recognition.
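
    The two training-time augmentations described above are simple to reproduce. A rough numpy sketch (the parameter ranges are illustrative guesses, not the paper's values):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Randomly alter the image color and paste a random-noise rectangle
    at a random position, as described in the abstract."""
    out = img.astype(np.float32)
    out *= rng.uniform(0.7, 1.3, size=(1, 1, 3))   # per-channel color shift
    h, w = rng.integers(8, 32, size=2)             # rectangle size
    y = rng.integers(0, img.shape[0] - h)
    x = rng.integers(0, img.shape[1] - w)
    out[y:y + h, x:x + w] = rng.uniform(0, 255, size=(h, w, 3))  # noise patch
    return np.clip(out, 0, 255).astype(np.uint8)

image = rng.integers(0, 255, size=(128, 128, 3), dtype=np.uint8)
augmented = augment(image)
```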

    TumorNet: Lung Nodule Characterization Using Multi-View Convolutional Neural Network with Gaussian Process

    Sarfaraz Hussein, Robert Gillies, Kunlin Cao, Qi Song, Ulas Bagci
    Comments: Accepted for publication in IEEE International Symposium on Biomedical Imaging (ISBI) 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Characterization of lung nodules as benign or malignant is one of the most
    important tasks in lung cancer diagnosis, staging and treatment planning. Given
    the large variation in the appearance of nodules, there is a need for a fast
    and robust computer-aided system. In this work, we propose an
    end-to-end trainable multi-view deep Convolutional Neural Network (CNN) for
    nodule characterization. First, we use median intensity projection to obtain a
    2D patch corresponding to each dimension. The three images are then
    concatenated to form a tensor, where the images serve as different channels of
    the input image. In order to increase the number of training samples, we
    perform data augmentation by scaling, rotating and adding noise to the input
    image. The trained network is used to extract features from the input image
    followed by a Gaussian Process (GP) regression to obtain the malignancy score.
    We also empirically establish the significance of different high level nodule
    attributes such as calcification, sphericity and others for malignancy
    determination. These attributes are found to be complementary to the deep
    multi-view CNN features and a significant improvement over other methods is
    obtained.
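
    The two-stage design (CNN features, then GP regression for the malignancy score) can be sketched with scikit-learn; random vectors stand in for the CNN features here, and the kernel choice is illustrative:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
features = rng.normal(size=(100, 64))    # stand-in CNN features, 100 nodules
scores = rng.uniform(1, 5, size=100)     # stand-in radiologist malignancy scores

# GP regression on top of the learned features yields a malignancy score
# together with a predictive uncertainty.
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(features, scores)
mean, std = gp.predict(features[:5], return_std=True)
```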

    A novel image tag completion method based on convolutional neural network

    Yanyan Geng, Weizhi Li, Gaoyuan Liang, Jingbin Wang, Yanbin Wu, Nitin Patil, Jing-Yan Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In the problems of image retrieval and annotation, complete textual tag lists
    of images play critical roles. However, in real-world applications, the image
    tags are usually incomplete, thus it is important to learn the complete tags
    for images. In this paper, we study the problem of image tag completion and
    propose a novel method for this problem based on a popular image
    representation method, convolutional neural network (CNN). The method estimates
    the complete tags from the CNN representations of images based on a linear
    predictor. The CNN parameters, linear predictor, and the complete tags are
    learned jointly by our method. We formulate a minimization problem to encourage the
    consistency between the complete tags and the available incomplete tags, reduce
    the estimation error, and reduce the model complexity. An iterative algorithm
    is developed to solve the minimization problem. Experiments over benchmark
    image data sets show its effectiveness.

    Skin Lesion Analysis Towards Melanoma Detection Using Deep Learning Network

    Yuexiang Li, Linlin Shen
    Comments: ISIC2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Skin lesions are a severe disease of worldwide extent. Reliable automatic
    detection of skin tumors can help increase the accuracy and efficiency of
    pathologists. The International Skin Imaging Collaboration (ISIC) hosts a
    challenge focusing on the automatic analysis of skin lesions. In this brief
    paper, we introduce two deep learning methods to address all the three tasks
    announced in ISIC 2017, i.e. lesion segmentation (task 1), lesion dermoscopic
    feature extraction (task 2) and lesion classification (task 3). A
    fully-convolutional network is proposed to simultaneously address the tasks of
    lesion segmentation and classification and a straight-forward CNN is proposed
    for the dermoscopic feature extraction task. Experimental results on the
    validation set show promising accuracies: 75.1% for task 1, 84.4% for task 2,
    and 90.8% for task 3.

    A Deep Cascade of Convolutional Neural Networks for MR Image Reconstruction

    Jo Schlemper, Jose Caballero, Joseph V. Hajnal, Anthony Price, Daniel Rueckert
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The acquisition of Magnetic Resonance Imaging (MRI) is inherently slow.
    Inspired by recent advances in deep learning, we propose a framework for
    reconstructing MR images from undersampled data using a deep cascade of
    convolutional neural networks to accelerate the data acquisition process. We
    show that for Cartesian undersampling of 2D cardiac MR images, the proposed
    method outperforms the state-of-the-art compressed sensing approaches, such as
    dictionary learning-based MRI (DLMRI) reconstruction, in terms of
    reconstruction error, perceptual quality and reconstruction speed for both
    3-fold and 6-fold undersampling. Compared to DLMRI, the proposed method
    produces approximately half the error, preserving anatomical structures more
    faithfully. Using our method, each image can be
    reconstructed in 23 ms, which is fast enough to enable real-time applications.
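
    The cascade alternates CNN de-aliasing with data-consistency steps that re-impose the acquired k-space samples. A numpy sketch of that interleaving, with the trained CNNs stubbed out:

```python
import numpy as np

def data_consistency(x, k_sampled, mask):
    """Replace the reconstruction's k-space values wherever data was
    actually acquired (noiseless case)."""
    k = np.fft.fft2(x)
    k[mask] = k_sampled[mask]
    return np.abs(np.fft.ifft2(k))

def cnn_denoise(x):
    return x  # stand-in for a trained convolutional de-aliasing network

image = np.random.rand(64, 64)            # toy ground-truth image
mask = np.random.rand(64, 64) < 1 / 3     # ~3-fold undersampling mask
k_sampled = np.fft.fft2(image) * mask     # undersampled acquisition

x = np.abs(np.fft.ifft2(k_sampled))       # zero-filled starting point
for _ in range(5):                        # the deep cascade
    x = data_consistency(cnn_denoise(x), k_sampled, mask)
```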

    Change Detection under Global Viewpoint Uncertainty

    Murase Tomoya, Tanaka Kanji
    Comments: 8 pages, 9 figures, technical report
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper addresses the problem of change detection from a novel perspective
    of long-term map learning. We are particularly interested in designing an
    approach that can scale to large maps and that can function under global
    uncertainty in the viewpoint (i.e., GPS-denied situations). Our approach, which
    utilizes a compact bag-of-words (BoW) scene model, makes several contributions
    to the problem:

    1) Two kinds of prior information are extracted from the view sequence map
    and used for change detection. Further, we propose a novel type of prior,
    called a motion prior, to predict the relative motions of stationary objects
    and to detect anomalous ego-motion. The proposed prior is also useful for
    distinguishing stationary from non-stationary objects.

    2) A small set of good reference images (e.g., 10) are efficiently retrieved
    from the view sequence map by employing the recently developed
    Bag-of-Local-Convolutional-Features (BoLCF) scene model.

    3) Change detection is reformulated as a scene retrieval over these reference
    images to find changed objects using a novel spatial Bag-of-Words (SBoW) scene
    model. Evaluations of the individual techniques and of their combinations,
    conducted on challenging highly dynamic scenes from the publicly available
    Malaga dataset, verify their efficacy.

    Label Refinement Network for Coarse-to-Fine Semantic Segmentation

    Md Amirul Islam, Shujon Naha, Mrigank Rochan, Neil Bruce, Yang Wang
    Comments: 9 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We consider the problem of semantic image segmentation using deep
    convolutional neural networks. We propose a novel network architecture called
    the label refinement network that predicts segmentation labels in a
    coarse-to-fine fashion at several resolutions. The segmentation labels at a
    coarse resolution are used together with convolutional features to obtain finer
    resolution segmentation labels. We define loss functions at several stages of
    the network to provide supervision at each stage. Our experimental
    results on several standard datasets demonstrate that the proposed model
    provides an effective way of producing pixel-wise dense image labeling.

    Skin cancer reorganization and classification with deep neural network

    Hao Chang
    Comments: 5 pages, 2 figures. ISIC2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    As one kind of skin cancer, melanoma is very dangerous. A dermoscopy-based
    early detection and recognition strategy is critical for melanoma therapy.
    However, diagnostic accuracy depends heavily on well-trained dermatologists.
    To address this problem, many efforts focus on developing automatic image
    analysis systems. Here we report a novel strategy based on deep learning
    techniques that achieves very high skin lesion segmentation and melanoma
    diagnosis accuracy: 1) we build a segmentation neural network (skin_segnn),
    which achieves very high lesion boundary detection accuracy; 2) we build
    another very deep neural network based on the Google Inception v3 network
    (skin_recnn) and its well-trained weights. The newly designed
    transfer-learning-based deep neural network skin_inceptions_v3_nn helps
    achieve high prediction accuracy.

    ISIC 2017 – Skin Lesion Analysis Towards Melanoma Detection

    Matt Berseth
    Comments: ISIC2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Our system addresses Part 1, Lesion Segmentation and Part 3, Lesion
    Classification of the ISIC 2017 challenge. Both algorithms make use of deep
    convolutional networks to achieve the challenge objective.

    Making 360° Video Watchable in 2D: Learning Videography for Click Free Viewing

    Yu-Chuan Su, Kristen Grauman
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    360° video requires human viewers to actively control “where” to look
    while watching the video. Although it provides a more immersive experience of
    the visual content, it also introduces an additional burden for viewers; awkward
    interfaces to navigate the video lead to suboptimal viewing experiences.
    Virtual cinematography is an appealing direction to remedy these problems, but
    conventional methods are limited to virtual environments or rely on
    hand-crafted heuristics. We propose a new algorithm for virtual cinematography
    that automatically controls a virtual camera within a 360° video. Compared
    to the state of the art, our algorithm allows more general camera control,
    avoids redundant outputs, and extracts its output videos substantially more
    efficiently. Experimental results on over 7 hours of real “in the wild” video
    show that our generalized camera control is crucial for viewing 360°
    video, while the proposed efficient algorithm is essential for making the
    generalized control computationally tractable.

    Using Synthetic Data to Train Neural Networks is Model-Based Reasoning

    Tuan Anh Le, Atilim Gunes Baydin, Robert Zinkov, Frank Wood
    Comments: 8 pages, 4 figures
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    We draw a formal connection between using synthetic training data to optimize
    neural network parameters and approximate, Bayesian, model-based reasoning. In
    particular, training a neural network using synthetic data can be viewed as
    learning a proposal distribution generator for approximate inference in the
    synthetic-data generative model. We demonstrate this connection in a
    recognition task where we develop a novel Captcha-breaking architecture and
    train it using synthetic data, demonstrating both state-of-the-art performance
    and a way of computing task-specific posterior uncertainty. Using a neural
    network trained this way, we also demonstrate successful breaking of real-world
    Captchas currently used by Facebook and Wikipedia. Reasoning from these
    empirical results and drawing connections with Bayesian modeling, we discuss
    the robustness of synthetic data results and suggest important considerations
    for ensuring good neural network generalization when training with synthetic
    data.

    A Simple, Fast and Fully Automated Approach for Midline Shift Measurement on Brain Computed Tomography

    Huan-Chih Wang, Shih-Hao Ho, Furen Xiao, Jen-Hai Chou
    Subjects: Medical Physics (physics.med-ph); Computer Vision and Pattern Recognition (cs.CV)

    Brain CT has become a standard imaging tool for emergent evaluation of brain
    condition, and measurement of midline shift (MLS) is one of the most important
    features to address for brain CT assessment. We present a simple method to
    estimate MLS and propose a new alternative parameter to MLS: the ratio of MLS
    over the maximal width of intracranial region (MLS/ICWMAX). Three neurosurgeons
    and our automated system were asked to measure MLS and MLS/ICWMAX in the same
    sets of axial CT images obtained from 41 patients admitted to ICU under
    neurosurgical service. A weighted midline (WML) was plotted based on individual
    pixel intensities, with higher weight given to the darker portions. The MLS
    could then be measured as the distance between the WML and ideal midline (IML)
    near the foramen of Monro. The average processing time to output an automatic
    MLS measurement was around 10 seconds. Our automated system achieved an overall
    accuracy of 90.24% when the CT images were calibrated automatically, and
    performed better when the calibrations of head rotation were done manually
    (accuracy: 92.68%). MLS/ICWMAX and MLS both yielded the same confusion
    matrices and produced similar ROC curves. We demonstrated a simple, fast
    and accurate automated system of MLS measurement and introduced a new parameter
    (MLS/ICWMAX) as a good alternative to MLS in terms of estimating the degree of
    brain deformation, especially when non-DICOM images (e.g. JPEG) are more easily
    accessed.
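
    The weighted-midline idea can be sketched in a few lines of numpy (the weighting and preprocessing here are illustrative, not the paper's exact scheme):

```python
import numpy as np

def weighted_midline(ct_slice):
    """For each image row, take the intensity-weighted mean column,
    weighting darker pixels more heavily."""
    weights = ct_slice.max() - ct_slice.astype(np.float64)  # darker => heavier
    cols = np.arange(ct_slice.shape[1])
    return (weights * cols).sum(axis=1) / weights.sum(axis=1)

slice_img = np.random.randint(0, 256, size=(512, 512))  # stand-in CT slice
wml = weighted_midline(slice_img)        # one x-coordinate per row
# MLS ~ deviation of the WML from the ideal midline (here: the image center)
mls = np.abs(wml - slice_img.shape[1] / 2).max()
```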

    Wireless Interference Identification with Convolutional Neural Networks

    Malte Schmidt, Dimitri Block, Uwe Meier
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    The steadily growing use of license-free frequency bands requires reliable
    coexistence management for deterministic medium utilization. For interference
    mitigation, proper wireless interference identification (WII) is essential. In
    this work we propose the first WII approach based upon deep convolutional
    neural networks (CNNs). The CNN naively learns its features through
    self-optimization during an extensive data-driven GPU-based training process.
    We propose a CNN example which is based upon sensing snapshots with a limited
    duration of 12.8 μs and an acquisition bandwidth of 10 MHz. The CNN
    differentiates between 15 classes, representing packet transmissions of IEEE 802.11 b/g,
    IEEE 802.15.4 and IEEE 802.15.1 with overlapping frequency channels within the
    2.4 GHz ISM band. We show that the CNN outperforms state-of-the-art WII
    approaches and has a classification accuracy greater than 95% for
    signal-to-noise ratios of at least -5 dB.

    Introduction to Nonnegative Matrix Factorization

    Nicolas Gillis
    Comments: 18 pages, 4 figures
    Journal-ref: SIAG/OPT Views and News 25 (1), pp. 7-16 (2017)
    Subjects: Optimization and Control (math.OC); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Numerical Analysis (cs.NA); Machine Learning (stat.ML)

    In this paper, we introduce and provide a short overview of nonnegative
    matrix factorization (NMF). Several aspects of NMF are discussed, namely, the
    application in hyperspectral imaging, geometry and uniqueness of NMF solutions,
    complexity, algorithms, and its link with extended formulations of polyhedra.
    In order to put NMF into perspective, the more general problem class of
    constrained low-rank matrix approximation problems is first briefly introduced.
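
    As a companion to the overview, here is the classic Lee-Seung multiplicative-update algorithm for the Frobenius-norm objective min ||X - WH||_F^2 with W, H >= 0 (one baseline among the many algorithms the paper surveys):

```python
import numpy as np

def nmf(X, r, iters=200, eps=1e-9):
    """Lee-Seung multiplicative updates; eps guards against division by zero."""
    m, n = X.shape
    rng = np.random.default_rng(0)
    W, H = rng.random((m, r)), rng.random((r, n))
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update H with W fixed
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update W with H fixed
    return W, H

X = np.abs(np.random.randn(50, 40))    # nonnegative data matrix
W, H = nmf(X, r=5)
print(np.linalg.norm(X - W @ H) / np.linalg.norm(X))  # relative residual
```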

    Learning Social Affordance Grammar from Videos: Transferring Human Interactions to Human-Robot Interactions

    Tianmin Shu, Xiaofeng Gao, Michael S. Ryoo, Song-Chun Zhu
    Comments: The 2017 IEEE International Conference on Robotics and Automation (ICRA)
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we present a general framework for learning social affordance
    grammar as a spatiotemporal AND-OR graph (ST-AOG) from RGB-D videos of human
    interactions, and transfer the grammar to humanoids to enable a real-time
    motion inference for human-robot interaction (HRI). Based on Gibbs sampling,
    our weakly supervised grammar learning can automatically construct a
    hierarchical representation of an interaction with long-term joint sub-tasks of
    both agents and short term atomic actions of individual agents. Based on a new
    RGB-D video dataset with rich instances of human interactions, our experiments
    of Baxter simulation, human evaluation, and real Baxter test demonstrate that
    the model learned from limited training data successfully generates human-like
    behaviors in unseen scenarios and outperforms both baselines.


    Artificial Intelligence

    SLIM: Semi-Lazy Inference Mechanism for Plan Recognition

    Reuth Mirsky, Ya'akov (Kobi) Gal
    Subjects: Artificial Intelligence (cs.AI)

    Plan recognition algorithms must recognize a complete hierarchy
    explaining the agent’s actions and goals. While the output of such algorithms
    is informative to the recognizer, the cost of its calculation is high in
    run-time, space, and completeness. Moreover, performing plan recognition online
    requires the observing agent to reason about future actions that have not yet
    been seen and maintain a set of hypotheses to support all possible options.
    This paper presents a new and efficient algorithm for online plan recognition
    called SLIM (Semi-Lazy Inference Mechanism). It combines bottom-up and
    top-down parsing processes, which allow it to commit only to the minimum
    necessary actions in real-time, while still providing complete hypotheses post
    factum. We show both theoretically and empirically that although the
    computational cost of this process is still exponential, there is a significant
    improvement in run-time compared to a state-of-the-art plan recognition
    algorithm.

    Sampling Variations of Lead Sheets

    Pierre Roy, Alexandre Papadopoulos, François Pachet
    Comments: 16 pages, 11 figures
    Subjects: Artificial Intelligence (cs.AI)

    Machine-learning techniques have been recently used with spectacular results
    to generate artefacts such as music or text. However, these techniques are
    still unable to capture and generate artefacts that are convincingly
    structured. In this paper we present an approach to generate structured musical
    sequences. We introduce a mechanism for efficiently sampling variations of
    musical sequences. Given an input sequence and a statistical model, this
    mechanism samples a set of sequences whose distance to the input sequence is
    approximately within specified bounds. This mechanism is implemented as an
    extension of belief propagation, and uses local fields to bias the generation.
    We show experimentally that sampled sequences are indeed closely correlated to
    the standard musical similarity measure defined by Mongeau and Sankoff. We then
    show how this mechanism can be used to implement composition strategies that
    enforce arbitrary structure on a musical lead sheet generation problem.

    Adaptive Matching for Expert Systems with Uncertain Task Types

    Virag Shah, Lennart Gulikers, Laurent Massoulie, Milan Vojnovic
    Comments: 19 pages
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Online two-sided matching markets such as Q&A forums (e.g. StackOverflow,
    Quora) and online labour platforms (e.g. Upwork) critically rely on the ability
    to propose adequate matches based on imperfect knowledge of the two parties to
    be matched. This prompts the following question: Which matching recommendation
    algorithms can, in the presence of such uncertainty, lead to efficient platform
    operation?

    To answer this question, we develop a model of a task / server matching
    system. For this model, we give a necessary and sufficient condition for an
    incoming stream of tasks to be manageable by the system. We further identify a
    so-called back-pressure policy under which the throughput that the system can
    handle is optimized. We show that this policy achieves strictly larger
    throughput than a natural greedy policy. Finally, we validate our model and
    confirm our theoretical findings with experiments based on logs of
    Math.StackExchange, a StackOverflow forum dedicated to mathematics.
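
    A toy max-weight ("back-pressure"-style) scheduling loop may help fix ideas; it is a generic illustration, not the paper's policy, and the rates and arrival process are made up:

```python
import random

random.seed(0)
# Service rates: how fast each expert resolves each (estimated) task type.
rates = {"junior": {"easy": 1.0, "hard": 0.2},
         "senior": {"easy": 1.2, "hard": 1.0}}
queues = {"easy": 0, "hard": 0}          # backlog per task type

def serve(expert):
    # Max-weight rule: pick the task type whose backlog, weighted by this
    # expert's service rate for it, is largest.
    task = max(queues, key=lambda t: queues[t] * rates[expert][t])
    if queues[task] > 0 and random.random() < rates[expert][task] / 1.2:
        queues[task] -= 1                # service completes this slot

for _ in range(1000):
    queues[random.choice(["easy", "hard"])] += 1   # one arrival per slot
    for expert in rates:
        serve(expert)
print(queues)                            # backlogs stay bounded (stable)
```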

    Unsupervised Image-to-Image Translation Networks

    Ming-Yu Liu, Thomas Breuel, Jan Kautz
    Comments: 19 pages, 19 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Most of the existing image-to-image translation frameworks—mapping an image
    in one domain to a corresponding image in another—are based on supervised
    learning, i.e., pairs of corresponding images in two domains are required for
    learning the translation function. This largely limits their applications,
    because capturing corresponding images in two different domains is often a
    difficult task. To address the issue, we propose the UNsupervised
    Image-to-image Translation (UNIT) framework, which is based on variational
    autoencoders and generative adversarial networks. The proposed framework can
    learn the translation function without any corresponding images in two domains.
    We enable this learning capability by combining a weight-sharing constraint and
    an adversarial training objective. Through visualization results from various
    unsupervised image translation tasks, we verify the effectiveness of the
    proposed framework. An ablation study further reveals the critical design
    choices. Moreover, we apply the UNIT framework to the unsupervised domain
    adaptation task and achieve better results than competing algorithms do in
    benchmark datasets.

    Conversion Rate Optimization through Evolutionary Computation

    Risto Miikkulainen, Neil Iscoe, Aaron Shagrin, Ron Cordell, Cory Schoolland, Myles Brundage, Jonathan Epstein, Randy Dean, Gurmeet Lamba
    Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    Conversion optimization means designing a web interface so that as many users
    as possible take a desired action on it, such as registering or purchasing. Such
    design is usually done by hand, testing one change at a time through A/B
    testing, or a limited number of combinations through multivariate testing,
    making it possible to evaluate only a small fraction of designs in a vast
    design space. This paper describes Sentient Ascend, an automatic conversion
    optimization system that uses evolutionary optimization to create effective web
    interface designs. Ascend makes it possible to discover and utilize
    interactions between the design elements that are difficult to identify
    otherwise. Moreover, evaluation of design candidates is done in parallel
    online, i.e. with a large number of real users interacting with the system. A
    case study on a lead generation site learnhotobecome.org shows that significant
    improvements (i.e. over 43%) are possible beyond human design. Ascend can
    therefore be seen as an approach to massively multivariate conversion
    optimization, based on a massively parallel interactive evolution.

    Evolving Deep Neural Networks

    Risto Miikkulainen, Jason Liang, Elliot Meyerson, Aditya Rawal, Dan Fink, Olivier Francon, Bala Raju, Arshak Navruzyan, Nigel Duffy, Babak Hodjat
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

    The success of deep learning depends on finding an architecture to fit the
    task. As deep learning has scaled up to more challenging tasks, the
    architectures have become difficult to design by hand. This paper proposes an
    automated method, CoDeepNEAT, for optimizing deep learning architectures
    through evolution. By extending existing neuroevolution methods to topology,
    components, and hyperparameters, this method achieves results comparable to
    best human designs in standard benchmarks in object recognition and language
    modeling. It also supports building a real-world application of automated image
    captioning on a magazine website. Given the anticipated increases in available
    computing power, evolution of deep networks is a promising approach to
    constructing deep learning applications in the future.

    PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison

    Randal S. Olson, William La Cava, Patryk Orzechowski, Ryan J. Urbanowicz, Jason H. Moore
    Comments: 14 pages, 5 figures, submitted for review to JMLR
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    The selection, development, or comparison of machine learning methods in data
    mining can be a difficult task based on the target problem and goals of a
    particular study. Numerous publicly available real-world and simulated
    benchmark datasets have emerged from different sources, but their organization
    and adoption as standards have been inconsistent. As such, selecting and
    curating specific benchmarks remains an unnecessary burden on machine learning
    practitioners and data scientists. The present study introduces an accessible,
    curated, and developing public benchmark resource to facilitate identification
    of the strengths and weaknesses of different machine learning methodologies. We
    compare meta-features among the current set of benchmark datasets in this
    resource to characterize the diversity of available data. Finally, we apply a
    number of established machine learning methods to the entire benchmark suite
    and analyze how datasets and algorithms cluster in terms of performance. This
    work is an important first step towards understanding the limitations of
    popular benchmarking suites and developing a resource that connects existing
    benchmarking standards to more diverse and efficient standards in the future.
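
    Using the resource is nearly a one-liner (assuming the accompanying pmlb Python package; the function name and return_X_y flag follow its documented interface, so verify against the current release):

```python
# Pull a PMLB dataset and run a quick baseline on it.
from pmlb import fetch_data
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = fetch_data('mushroom', return_X_y=True)   # downloads and caches the data
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean())
```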

    Learning Social Affordance Grammar from Videos: Transferring Human Interactions to Human-Robot Interactions

    Tianmin Shu, Xiaofeng Gao, Michael S. Ryoo, Song-Chun Zhu
    Comments: The 2017 IEEE International Conference on Robotics and Automation (ICRA)
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we present a general framework for learning social affordance
    grammar as a spatiotemporal AND-OR graph (ST-AOG) from RGB-D videos of human
    interactions, and transfer the grammar to humanoids to enable a real-time
    motion inference for human-robot interaction (HRI). Based on Gibbs sampling,
    our weakly supervised grammar learning can automatically construct a
    hierarchical representation of an interaction with long-term joint sub-tasks of
    both agents and short term atomic actions of individual agents. Based on a new
    RGB-D video dataset with rich instances of human interactions, our experiments
    of Baxter simulation, human evaluation, and real Baxter test demonstrate that
    the model learned from limited training data successfully generates human-like
    behaviors in unseen scenarios and outperforms both baselines.

    Truth and Regret in Online Scheduling

    Shuchi Chawla, Nikhil Devanur, Janardhan Kulkarni, Rad Niazadeh
    Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Learning (cs.LG)

    We consider a scheduling problem where a cloud service provider has multiple
    units of a resource available over time. Selfish clients submit jobs, each with
    an arrival time, deadline, length, and value. The service provider’s goal is to
    implement a truthful online mechanism for scheduling jobs so as to maximize the
    social welfare of the schedule. Recent work shows that under a stochastic
    assumption on job arrivals, there is a single-parameter family of mechanisms
    that achieves near-optimal social welfare. We show that given any such family
    of near-optimal online mechanisms, there exists an online mechanism that in the
    worst case performs nearly as well as the best of the given mechanisms. Our
    mechanism is truthful whenever the mechanisms in the given family are truthful
    and prompt, and achieves optimal (within constant factors) regret.

    We model the problem of competing against a family of online scheduling
    mechanisms as one of learning from expert advice. A primary challenge is that
    any scheduling decisions we make affect not only the payoff at the current
    step, but also the resource availability and payoffs in future steps.
    Furthermore, switching from one algorithm (a.k.a. expert) to another in an
    online fashion is challenging both because it requires synchronization with the
    state of the latter algorithm as well as because it affects the incentive
    structure of the algorithms. We further show how to adapt our algorithm to a
    non-clairvoyant setting where job lengths are unknown until jobs are run to
    completion. Once again, in this setting, we obtain truthfulness along with
    asymptotically optimal regret (within poly-logarithmic factors).


    Information Retrieval

    Identifying leading indicators of product recalls from online reviews using positive unlabeled learning and domain adaptation

    Shreesh Kumara Bhat, Aron Culotta
    Subjects: Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

    Consumer protection agencies are charged with safeguarding the public from
    hazardous products, but the thousands of products under their jurisdiction make
    it challenging to identify and respond to consumer complaints quickly. From the
    consumer’s perspective, online reviews can provide evidence of product defects,
    but manually sifting through hundreds of reviews is not always feasible. In
    this paper, we propose a system to mine Amazon.com reviews to identify products
    that may pose safety or health hazards. Since labeled data for this task are
    scarce, our approach combines positive unlabeled learning with domain
    adaptation to train a classifier from consumer complaints submitted to the U.S.
    Consumer Product Safety Commission. On a validation set of manually annotated
    Amazon product reviews, we find that our approach results in an absolute F1
    score improvement of 8% over the best competing baseline. Furthermore, we apply
    the classifier to Amazon reviews of known recalled products; the classifier
    identifies reviews reporting safety hazards prior to the recall date for 45% of
    the products. This suggests that the system may be able to provide an early
    warning system to alert consumers to hazardous products before an official
    recall is announced.
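
    The positive-unlabeled component can be sketched with the standard Elkan-Noto correction (a common PU method, shown here for illustration; whether it matches the paper's exact estimator is not stated in the abstract, and the domain-adaptation component is omitted):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
true_y = (X[:, 0] + X[:, 1] > 0).astype(int)          # hidden true labels
labeled = (true_y == 1) & (rng.random(1000) < 0.3)    # only some positives labeled
s = labeled.astype(int)                               # observed: labeled vs unlabeled

clf = LogisticRegression(max_iter=1000).fit(X, s)     # models P(s=1 | x)
c = clf.predict_proba(X[labeled])[:, 1].mean()        # label frequency estimate
p_y = np.clip(clf.predict_proba(X)[:, 1] / c, 0, 1)   # P(y=1 | x) = P(s=1 | x) / c
```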

    Scattertext: a Browser-Based Tool for Visualizing how Corpora Differ

    Jason S. Kessler
    Comments: 6 pages, 5 figures. See this https URL for source code and documentation
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Scattertext is an open source tool for visualizing linguistic variation
    between document categories in a language-independent way. The tool presents a
    scatterplot, where each axis corresponds to the rank-frequency at which a term occurs in
    a category of documents. Through a tie-breaking strategy, the tool is able to
    display thousands of visible term-representing points and find space to legibly
    label hundreds of them. Scattertext also lends itself to a query-based
    visualization of how the use of terms with similar embeddings differs between
    document categories, as well as a visualization for comparing the importance
    scores of bag-of-words features to univariate metrics.


    Computation and Language

    A Generic Online Parallel Learning Framework for Large Margin Models

    Shuming Ma, Xu Sun
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    To speed up the training process, many existing systems use parallel
    technology for online learning algorithms. However, most research focuses
    mainly on stochastic gradient descent (SGD) rather than other algorithms. We propose a
    generic online parallel learning framework for large margin models, and also
    analyze our framework on popular large margin algorithms, including MIRA and
    Structured Perceptron. Our framework is lock-free and easy to implement on
    existing systems. Experiments show that systems using our framework gain
    near-linear speed-up as the number of running threads increases, with no loss
    in accuracy.

    Lock-Free Parallel Perceptron for Graph-based Dependency Parsing

    Xu Sun, Shuming Ma
    Subjects: Computation and Language (cs.CL)

    Dependency parsing is an important NLP task. A popular approach to
    dependency parsing is the structured perceptron. However, graph-based
    dependency parsing has O(n^3) time complexity and suffers from slow training.
    To deal with this problem, we propose a parallel algorithm called the parallel
    perceptron. The parallel algorithm can make full use of a multi-core computer,
    which saves a lot of training time. Based on experiments, we observe that
    dependency parsing with the parallel perceptron achieves 8-fold faster training
    than traditional structured perceptron methods when using 10 threads, with no
    loss at all in accuracy.
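
    The lock-free flavor of such training can be illustrated with a Hogwild-style toy: several threads share one weight vector and update it without locks (illustrative only; this is a plain binary perceptron, not the graph-based parser, and CPython threads will not show real speed-up):

```python
import threading
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 20))
y = np.sign(X @ rng.normal(size=20))     # linearly separable toy labels
w = np.zeros(20)                         # shared weights, updated without locks

def worker(indices):
    for i in indices:
        if y[i] * (w @ X[i]) <= 0:            # perceptron mistake condition
            np.add(w, y[i] * X[i], out=w)     # racy in-place update, no lock

threads = [threading.Thread(target=worker, args=(part,))
           for part in np.array_split(np.arange(len(X)), 4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("training accuracy:", np.mean(np.sign(X @ w) == y))
```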

    Discovery of Evolving Semantics through Dynamic Word Embedding Learning

    Zijun Yao, Yifan Sun, Weicong Ding, Nikhil Rao, Hui Xiong
    Subjects: Computation and Language (cs.CL); Machine Learning (stat.ML)

    During the course of human language evolution, the semantic meanings of words
    keep evolving with time. The understanding of evolving semantics enables us to
    capture the true meaning of the words in different usage contexts, and thus is
    critical for various applications, such as machine translation. While it is
    naturally promising to study word semantics in a time-aware manner, traditional
    methods to learn word vector representation do not adequately capture the
    change over time. To this end, in this paper, we aim at learning time-aware
    vector representation of words through dynamic word embedding modeling.
    Specifically, we first propose a method that captures time-specific semantics
    and across-time alignment simultaneously in a way that is robust to data
    sparsity. Then, we solve the resulting optimization problem using a scalable
    coordinate descent method. Finally, we perform an empirical study on New York
    Times data to learn the temporal embeddings and develop multiple evaluations
    that illustrate the semantic evolution of words discovered from news media.
    Moreover, our qualitative and quantitative tests indicate that our method
    not only reliably captures the semantic evolution over time, but also
    consistently outperforms state-of-the-art temporal embedding approaches on both
    semantic accuracy and alignment quality.
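
    A standard device for making embeddings comparable across time slices is orthogonal Procrustes alignment; the paper instead solves a joint objective that handles alignment and data sparsity together, but the sketch below shows the baseline idea:

```python
import numpy as np

def align(emb_t, emb_ref):
    """Rotate one time slice's embeddings onto a reference slice:
    R = argmin over orthogonal R of ||emb_t R - emb_ref||_F."""
    U, _, Vt = np.linalg.svd(emb_t.T @ emb_ref)
    R = U @ Vt
    return emb_t @ R

vocab, dim = 1000, 50
emb_2000 = np.random.randn(vocab, dim)   # stand-in embeddings per time slice
emb_2010 = np.random.randn(vocab, dim)
emb_2010_aligned = align(emb_2010, emb_2000)
# After alignment, a word's vectors can be compared across years (e.g. by
# cosine distance) to trace its semantic drift.
```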

    Structural Embedding of Syntactic Trees for Machine Comprehension

    Rui Liu, Junjie Hu, Wei Wei, Zi Yang, Eric Nyberg
    Subjects: Computation and Language (cs.CL)

    This paper develops a model that addresses syntactic embedding for machine
    comprehension, a key task of natural language understanding. Our proposed
    model, structural embedding of syntactic trees (SEST), takes each word in a
    sentence, constructs a sequence of syntactic nodes extracted from syntactic
    parse trees, and encodes the sequence into a vector representation. The learned
    vector is then incorporated into neural attention models, which allows learning
    the mapping of syntactic structures between question and context pairs. We
    evaluate our approach on the SQuAD dataset and demonstrate that our model can
    accurately identify the syntactic boundaries of sentences and extract
    answers that are more syntactically coherent than those of the baseline methods.

    Scattertext: a Browser-Based Tool for Visualizing how Corpora Differ

    Jason S. Kessler
    Comments: 6 pages, 5 figures. See this https URL for source code and documentation
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Scattertext is an open source tool for visualizing linguistic variation
    between document categories in a language-independent way. The tool presents a
    scatterplot, where each axis corresponds to the rank-frequency at which a term occurs in
    a category of documents. Through a tie-breaking strategy, the tool is able to
    display thousands of visible term-representing points and find space to legibly
    label hundreds of them. Scattertext also lends itself to a query-based
    visualization of how the use of terms with similar embeddings differs between
    document categories, as well as a visualization for comparing the importance
    scores of bag-of-words features to univariate metrics.

    Unsupervised Ensemble Ranking of Terms in Electronic Health Record Notes Based on Their Importance to Patients

    Jinying Chen, Hong Yu
    Subjects: Computation and Language (cs.CL)

    Background: Electronic health record (EHR) notes contain abundant medical
    jargon that can be difficult for patients to comprehend. One way to help
    patients is to reduce information overload and help them focus on medical terms
    that matter most to them.

    Objective: The aim of this work was to develop FIT (Finding Important Terms
    for patients), an unsupervised natural language processing (NLP) system that
    ranks medical terms in EHR notes based on their importance to patients.

    Methods: We built FIT on a new unsupervised ensemble ranking model derived
    from the biased random walk algorithm to combine heterogeneous information
    resources for ranking candidate terms from each EHR note. Specifically, FIT
    integrates four single views for term importance: patient use of medical
    concepts, document-level term salience, word-occurrence based term relatedness,
    and topic coherence. It also incorporates partial information of term
    importance as conveyed by terms’ unfamiliarity levels and semantic types. We
    evaluated FIT on 90 expert-annotated EHR notes and compared it with three
    benchmark unsupervised ensemble ranking methods.

    Results: FIT achieved 0.885 AUC-ROC for ranking candidate terms from EHR
    notes to identify important terms. When including term identification, the
    performance of FIT for identifying important terms from EHR notes was 0.813
    AUC-ROC. It outperformed the three ensemble rankers for most metrics. Its
    performance is relatively insensitive to its parameter.

    Conclusions: FIT can automatically identify EHR terms important to patients
    and may help develop personalized interventions to improve quality of care. By
    using unsupervised learning as well as a robust and flexible framework for
    information fusion, FIT can be readily applied to other domains and
    applications.
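
    A minimal sketch of the kind of biased random walk FIT builds on, assuming a
    personalized-PageRank-style walk: the matrix W stands in for the
    word-occurrence-based relatedness view and the bias vector for the partial
    importance priors; both values below are illustrative, not the paper's.

        import numpy as np

        def biased_walk_rank(W, bias, damping=0.85, iters=100):
            """Rank items by the stationary distribution of a biased random walk."""
            P = W / W.sum(axis=1, keepdims=True)   # row-stochastic transitions
            b = bias / bias.sum()                  # restart (bias) distribution
            r = np.full(len(b), 1.0 / len(b))
            for _ in range(iters):
                r = damping * (P.T @ r) + (1 - damping) * b
            return np.argsort(-r)                  # term indices, best first

        W = np.array([[0, 2, 1], [2, 0, 1], [1, 1, 0]], dtype=float)  # relatedness
        bias = np.array([0.7, 0.2, 0.1])           # e.g., unfamiliarity-based prior
        print(biased_walk_rank(W, bias))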


    Distributed, Parallel, and Cluster Computing

    The RowHammer Problem and Other Issues We May Face as Memory Becomes Denser

    Onur Mutlu
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    As memory scales down to smaller technology nodes, new failure mechanisms
    emerge that threaten its correct operation. If such failure mechanisms are not
    anticipated and corrected, they can not only degrade system reliability and
    availability but also, perhaps even more importantly, open up security
    vulnerabilities: a malicious attacker can exploit the exposed failure mechanism
    to take over the entire system. As such, new failure mechanisms in memory can
    become practical and significant threats to system security.

    In this work, we discuss the RowHammer problem in DRAM, which is a prime (and
    perhaps the first) example of how a circuit-level failure mechanism in DRAM can
    cause a practical and widespread system security vulnerability. RowHammer, as
    it is popularly referred to, is the phenomenon that repeatedly accessing a row
    in a modern DRAM chip causes bit flips in physically-adjacent rows at
    consistently predictable bit locations. It is caused by a hardware failure
    mechanism called DRAM disturbance errors, which is a manifestation of
    circuit-level cell-to-cell interference in a scaled memory technology. We
    analyze the root causes of the RowHammer problem and examine various solutions.
    We also discuss what other vulnerabilities may be lurking in DRAM and other
    types of memories, e.g., NAND flash memory or Phase Change Memory, that can
    potentially threaten the foundations of secure systems, as the memory
    technologies scale to higher densities. We conclude by describing and
    advocating a principled approach to memory reliability and security research
    that can enable us to better anticipate and prevent such vulnerabilities.

    General and Robust Communication-Efficient Algorithms for Distributed Clustering

    Pranjal Awasthi, Maria-Florina Balcan, Colin White
    Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

    As datasets become larger and more distributed, algorithms for distributed
    clustering have become more and more important. In this work, we present a
    general framework for designing distributed clustering algorithms that are
    robust to outliers. Using our framework, we give a distributed approximation
    algorithm for k-means, k-median, or generally any L_p objective, with z
    outliers and/or balance constraints, using O(m(k+z)(d+log n)) bits of
    communication, where m is the number of machines, n is the size of the point
    set, and d is the dimension. This generalizes and improves over previous work
    of Bateni et al. and Malkomes et al. As a special case, we achieve the first
    distributed algorithm for k-median with outliers, answering an open question
    posed by Malkomes et al. For distributed k-means clustering, we provide the
    first dimension-dependent communication complexity lower bound for finding the
    optimal clustering. This improves over the lower bound from Chen et al. which
    is dimension-agnostic.

    Furthermore, we give distributed clustering algorithms which return nearly
    optimal solutions, provided the data satisfies the approximation stability
    condition of Balcan et al. or the spectral stability condition of Kumar and
    Kannan. In certain clustering applications where a local clustering consistent
    among all machines is sufficient, we show that no communication is necessary if
    the data satisfies approximation stability.
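
    The following Python sketch shows only the generic two-round communication
    pattern that such frameworks refine: each machine ships k weighted local
    centers, and a coordinator clusters their union. The outlier and balance
    handling that constitutes the paper's actual contribution is omitted.

        import numpy as np
        from sklearn.cluster import KMeans

        def distributed_kmeans(partitions, k):
            centers, weights = [], []
            for P in partitions:                         # one block per machine
                local = KMeans(n_clusters=k, n_init=10, random_state=0).fit(P)
                centers.append(local.cluster_centers_)   # k points summarize P
                weights.append(np.bincount(local.labels_, minlength=k))
            C, w = np.vstack(centers), np.concatenate(weights)
            coord = KMeans(n_clusters=k, n_init=10, random_state=0)
            return coord.fit(C, sample_weight=w).cluster_centers_

        rng = np.random.default_rng(0)
        blobs = rng.normal(size=(3, 200, 2)) + 5 * np.arange(3)[:, None, None]
        print(distributed_kmeans(list(blobs), k=3))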

    Distributed Bayesian Matrix Factorization with Minimal Communication

    Xiangju Qin, Paul Blomstedt, Eemeli Leppäaho, Pekka Parviainen, Samuel Kaski
    Comments: 10 pages, 11 figures
    Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Numerical Analysis (cs.NA); Methodology (stat.ME)

    Bayesian matrix factorization (BMF) is a powerful tool for producing low-rank
    representations of matrices, and giving principled predictions of missing
    values. However, scaling up MCMC samplers to large matrices has proven to be
    difficult with parallel algorithms that require communication between MCMC
    iterations. On the other hand, designing communication-free algorithms is
    challenging due to the inherent unidentifiability of BMF solutions. We propose
    posterior propagation, an embarrassingly parallel inference procedure, which
    hierarchically introduces dependencies between data subsets and thus alleviates
    the unidentifiability problem.

    Even better correction of genome sequencing data

    Maciej Dlugosz, Sebastian Deorowicz, Marek Kokot
    Subjects: Genomics (q-bio.GN); Distributed, Parallel, and Cluster Computing (cs.DC)

    We introduce an improved version of RECKONER, an error corrector for Illumina
    whole-genome sequencing data. By modifying its workflow we reduce the
    computation time by up to a factor of 10. We also propose a new method for
    determining the (k)-mer length, the key parameter of the (k)-spectrum-based
    family of correctors. The correction algorithms are examined on huge data sets,
    i.e., human and maize genomes, for both Illumina HiSeq and MiSeq instruments.

    Even faster sorting of (not only) integers

    Marek Kokot, Sebastian Deorowicz, Maciej Dlugosz
    Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

    In this paper we introduce RADULS2, the fastest parallel sorter based on the
    radix algorithm. It is optimized to process huge amounts of data on modern
    multicore CPUs. The main novelties include an extremely optimized algorithm for
    handling tiny arrays (of up to about a hundred records), which can appear
    billions of times as subproblems, and improved processing of larger subarrays
    with better use of non-temporal memory stores.
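
    A toy Python sketch of the MSD radix-sort skeleton with a tiny-array cutoff,
    the structural idea highlighted above. The cutoff of 64 and the
    byte-at-a-time digits are assumptions; the real sorter is a heavily tuned
    parallel C++ implementation.

        import random

        TINY_CUTOFF = 64      # below this size, a simple sort wins (assumed value)

        def msd_radix_sort(a, shift=24):
            """MSD radix sort of 32-bit keys, one byte per pass."""
            if len(a) <= TINY_CUTOFF or shift < 0:
                a.sort()      # stand-in for the hand-optimized tiny-array path
                return a
            buckets = [[] for _ in range(256)]
            for x in a:                              # distribute by current byte
                buckets[(x >> shift) & 0xFF].append(x)
            out = []
            for b in buckets:                        # recurse on next byte
                out.extend(msd_radix_sort(b, shift - 8))
            return out

        data = [random.getrandbits(32) for _ in range(100_000)]
        assert msd_radix_sort(data) == sorted(data)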


    Learning

    Being Robust (in High Dimensions) Can Be Practical

    Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, Alistair Stewart
    Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Machine Learning (stat.ML)

    Robust estimation is much more challenging in high dimensions than it is in
    one dimension: Most techniques either lead to intractable optimization problems
    or estimators that can tolerate only a tiny fraction of errors. Recent work in
    theoretical computer science has shown that, in appropriate distributional
    models, it is possible to robustly estimate the mean and covariance with
    polynomial time algorithms that can tolerate a constant fraction of
    corruptions, independent of the dimension. However, the sample and time
    complexity of these algorithms is prohibitively large for high-dimensional
    applications. In this work, we address both of these issues by establishing
    sample complexity bounds that are optimal, up to logarithmic factors, as well
    as giving various refinements that allow the algorithms to tolerate a much
    larger fraction of corruptions. Finally, we show on both synthetic and real
    data that our algorithms have state-of-the-art performance and suddenly make
    high-dimensional robust estimation a realistic possibility.
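
    A minimal sketch of the filtering idea behind such algorithms: while the
    empirical covariance has an abnormally large eigenvalue, discard points with
    extreme projections onto the top eigenvector. The thresholds below are
    illustrative, not the paper's.

        import numpy as np

        def filtered_mean(X, eig_thresh=2.5, max_rounds=20):
            X = X.copy()
            for _ in range(max_rounds):
                mu = X.mean(axis=0)
                eigval, eigvec = np.linalg.eigh(np.cov(X, rowvar=False))
                if eigval[-1] < eig_thresh:          # covariance looks clean: stop
                    break
                proj = np.abs((X - mu) @ eigvec[:, -1])  # top-direction projection
                keep = proj < proj.mean() + 2 * proj.std()
                if keep.all():                       # nothing left to filter
                    break
                X = X[keep]                          # drop suspected outliers
            return X.mean(axis=0)

        rng = np.random.default_rng(0)
        clean = rng.normal(0.0, 1.0, size=(950, 50))
        corrupted = rng.normal(8.0, 1.0, size=(50, 50))  # 5% fraction of outliers
        print(np.linalg.norm(filtered_mean(np.vstack([clean, corrupted]))))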

    How to Escape Saddle Points Efficiently

    Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M. Kakade, Michael I. Jordan
    Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

    This paper shows that a perturbed form of gradient descent converges to a
    second-order stationary point in a number of iterations that depends only
    poly-logarithmically on dimension (i.e., it is almost “dimension-free”). The
    convergence rate of this procedure matches the well-known convergence rate of
    gradient descent to first-order stationary points, up to log factors. When all
    saddle points are non-degenerate, all second-order stationary points are local
    minima, and our result thus shows that perturbed gradient descent can escape
    saddle points almost for free. Our results can be directly applied to many
    machine learning applications, including deep learning. As a particular
    concrete example of such an application, we show that our results can be used
    directly to establish sharp global convergence rates for matrix factorization.
    Our results rely on a novel characterization of the geometry around saddle
    points, which may be of independent interest to the non-convex optimization
    community.
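
    A minimal sketch of perturbed gradient descent on a toy strict-saddle
    function; the step size, gradient threshold, and perturbation radius are
    illustrative choices, not the paper's constants.

        import numpy as np

        def perturbed_gd(grad, x0, eta=0.05, g_thresh=1e-3, radius=0.1, steps=2000):
            x = np.array(x0, dtype=float)
            rng = np.random.default_rng(0)
            for _ in range(steps):
                g = grad(x)
                if np.linalg.norm(g) < g_thresh:   # near a stationary point: perturb
                    x = x + rng.uniform(-radius, radius, size=x.shape)
                else:
                    x = x - eta * g                # otherwise: plain gradient step
            return x

        # f(x, y) = x**4/4 - x**2/2 + y**2/2 has a strict saddle at the origin
        # and minima at (+/-1, 0); started exactly at the saddle, plain GD is
        # stuck, while the perturbation lets the iterate escape to a minimum.
        grad = lambda p: np.array([p[0] ** 3 - p[0], p[1]])
        print(perturbed_gd(grad, [0.0, 0.0]))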

    Using Synthetic Data to Train Neural Networks is Model-Based Reasoning

    Tuan Anh Le, Atilim Gunes Baydin, Robert Zinkov, Frank Wood
    Comments: 8 pages, 4 figures
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    We draw a formal connection between using synthetic training data to optimize
    neural network parameters and approximate, Bayesian, model-based reasoning. In
    particular, training a neural network using synthetic data can be viewed as
    learning a proposal distribution generator for approximate inference in the
    synthetic-data generative model. We demonstrate this connection in a
    recognition task where we develop a novel Captcha-breaking architecture and
    train it using synthetic data, demonstrating both state-of-the-art performance
    and a way of computing task-specific posterior uncertainty. Using a neural
    network trained this way, we also demonstrate successful breaking of real-world
    Captchas currently used by Facebook and Wikipedia. Reasoning from these
    empirical results and drawing connections with Bayesian modeling, we discuss
    the robustness of synthetic data results and suggest important considerations
    for ensuring good neural network generalization when training with synthetic
    data.

    Learning the Structure of Generative Models without Labeled Data

    Stephen H. Bach, Bryan He, Alexander Ratner, Christopher Ré
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Curating labeled training data has become the primary bottleneck in machine
    learning. Recent frameworks address this bottleneck with generative models to
    synthesize labels at scale from weak supervision sources. The generative
    model’s dependency structure directly affects the quality of the estimated
    labels, but selecting a structure automatically without any labeled data is a
    distinct challenge. We propose a structure estimation method that maximizes the
    (ell_1)-regularized marginal pseudolikelihood of the observed data. Our
    analysis shows that the amount of unlabeled data required to identify the true
    structure scales sublinearly in the number of possible dependencies for a broad
    class of models. Experiments on synthetic data show that our method is
    100 times faster than a maximum likelihood approach and selects (1/4) as many
    extraneous dependencies. We also show that our method provides an average of
    1.5 F1 points of improvement over existing, user-developed information
    extraction applications on real-world data such as PubMed journal articles.

    Exact Topology Reconstruction of Radial Dynamical Systems with Applications to Distribution System of the Power Grid

    Saurav Talukdar, Deepjyoti Deka, Donatello Materassi, Murti V. Salapaka
    Comments: 6 pages
    Subjects: Learning (cs.LG); Systems and Control (cs.SY)

    In this article we present a method to reconstruct the interconnectedness of
    dynamically related stochastic processes, where the interactions are
    bi-directional and the underlying topology is a tree. Our approach is based on
    multivariate Wiener filtering which recovers spurious edges apart from the true
    edges in the topology reconstruction. The main contribution of this work is to
    show that all spurious links obtained using Wiener filtering can be eliminated
    if the underlying topology is a tree based on which we present a three stage
    network reconstruction procedure for trees. We illustrate the effectiveness of
    the method developed by applying it on a typical distribution system of the
    electric grid.

    Meta Networks

    Tsendsuren Munkhdalai, Hong Yu
    Comments: initial submission
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Deep neural networks have been successfully applied in applications with a
    large amount of labeled data. However, neural networks have major drawbacks:
    they generalize poorly from small data and struggle to learn new concepts
    continually without forgetting. We present a novel meta learning
    method, Meta Networks (MetaNet), that acquires a meta-level knowledge across
    tasks and shifts its inductive bias via fast parameterization for the rapid
    generalization. When tested on the standard one-shot learning benchmarks, our
    MetaNet models achieved near human-level accuracy. We demonstrated several
    appealing properties of MetaNet relating to generalization and continual
    learning.

    Opening the Black Box of Deep Neural Networks via Information

    Ravid Shwartz-Ziv, Naftali Tishby
    Comments: 9 pages, 7 figures
    Subjects: Learning (cs.LG)

    Despite their great success, there is still no comprehensive theoretical
    understanding of learning with Deep Neural Networks (DNNs) or their inner
    organization. Previous work [Tishby & Zaslavsky (2015)] proposed to analyze
    DNNs in the Information Plane; i.e., the plane of the Mutual Information values
    that each layer preserves on the input and output variables. They suggested
    that the goal of the network is to optimize the Information Bottleneck (IB)
    tradeoff between compression and prediction, successively, for each layer.

    In this work we follow up on this idea and demonstrate the effectiveness of
    the Information-Plane visualization of DNNs. We first show that the stochastic
    gradient descent (SGD) epochs have two distinct phases: fast empirical error
    minimization followed by slow representation compression, for each layer. We
    then argue that the DNN layers end up very close to the IB theoretical bound,
    and present a new theoretical argument for the computational benefit of the
    hidden layers.
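
    A rough sketch of the binning estimator commonly used for such
    Information-Plane analyses (the bin count is an assumption): discretize a
    layer's activations T, then estimate I(X;T) and I(T;Y) from the empirical
    joint distributions.

        import numpy as np

        def mutual_information(a, b):
            """I(A;B) in bits from paired discrete observations."""
            n = len(a)
            joint, pa, pb = {}, {}, {}
            for u, v in zip(a, b):
                joint[(u, v)] = joint.get((u, v), 0) + 1
                pa[u] = pa.get(u, 0) + 1
                pb[v] = pb.get(v, 0) + 1
            return sum((c / n) * np.log2(c * n / (pa[u] * pb[v]))
                       for (u, v), c in joint.items())

        def layer_mi(x_ids, T, y, bins=30):
            """Discretize layer activations T; estimate I(X;T) and I(T;Y)."""
            edges = np.linspace(T.min(), T.max(), bins)
            t_ids = [tuple(row) for row in np.digitize(T, edges)]
            return mutual_information(x_ids, t_ids), mutual_information(t_ids, y)

        # Toy example: six inputs (each its own id), a 2-unit layer, binary labels.
        T = np.array([[.1, .9], [.2, .8], [.9, .1], [.8, .2], [.5, .5], [.4, .6]])
        print(layer_mi(list(range(6)), T, [0, 0, 1, 1, 0, 1]))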

    A Robust Adaptive Stochastic Gradient Method for Deep Learning

    Caglar Gulcehre, Jose Sotelo, Marcin Moczulski, Yoshua Bengio
    Comments: IJCNN 2017 Accepted Paper, An extension of our paper, “ADASECANT: Robust Adaptive Secant Method for Stochastic Gradient”
    Subjects: Learning (cs.LG)

    Stochastic gradient algorithms are the workhorse of large-scale optimization
    problems and have led to important successes in the recent advancement of deep
    learning. The convergence of SGD depends on the careful choice of the learning
    rate and the amount of noise in the stochastic estimates of the gradients. In
    this paper, we propose an adaptive learning rate algorithm, which
    utilizes stochastic curvature information of the loss function for
    automatically tuning the learning rates. The information about the element-wise
    curvature of the loss function is estimated from the local statistics of the
    stochastic first order gradients. We further propose a new variance reduction
    technique to speed up the convergence. In our experiments with deep neural
    networks, we obtained better performance compared to the popular stochastic
    gradient algorithms.

    Predicting Rankings of Software Verification Competitions

    Mike Czech, Eyke Hüllermeier, Marie-Christine Jakobs, Heike Wehrheim
    Subjects: Learning (cs.LG); Software Engineering (cs.SE)

    Software verification competitions, such as the annual SV-COMP, evaluate
    software verification tools with respect to their effectivity and efficiency.
    Typically, the outcome of a competition is a (possibly category-specific)
    ranking of the tools. For many applications, such as building portfolio
    solvers, it would be desirable to have an idea of the (relative) performance of
    verification tools on a given verification task beforehand, i.e., prior to
    actually running all tools on the task.

    In this paper, we present a machine learning approach to predicting rankings
    of tools on verification tasks. The method builds upon so-called label ranking
    algorithms, which we complement with appropriate kernels providing a similarity
    measure for verification tasks. Our kernels employ a graph representation for
    software source code that mixes elements of control flow and program dependence
    graphs with abstract syntax trees. Using data sets from SV-COMP, we demonstrate
    that our rank prediction technique generalizes well and achieves rather high
    predictive accuracy. In particular, our method outperforms a recently proposed
    feature-based approach of Demyanova et al. (when applied to rank predictions).

    Wireless Interference Identification with Convolutional Neural Networks

    Malte Schmidt, Dimitri Block, Uwe Meier
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    The steadily growing use of license-free frequency bands requires reliable
    coexistence management for deterministic medium utilization. For interference
    mitigation, proper wireless interference identification (WII) is essential. In
    this work we propose the first WII approach based upon deep convolutional
    neural networks (CNNs). The CNN naively learns its features through
    self-optimization during an extensive data-driven GPU-based training process.
    We propose a CNN example which is based upon sensing snapshots with a limited
    duration of 12.8 μs and an acquisition bandwidth of 10 MHz. The CNN
    distinguishes between 15 classes. They represent packet transmissions of IEEE 802.11 b/g,
    IEEE 802.15.4 and IEEE 802.15.1 with overlapping frequency channels within the
    2.4 GHz ISM band. We show that the CNN outperforms state-of-the-art WII
    approaches and has a classification accuracy greater than 95% for
    signal-to-noise ratio of at least -5 dB.
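
    An illustrative PyTorch sketch of a classifier of this shape, assuming the
    12.8 μs snapshot at 10 MHz is presented as 128 complex samples, i.e., a
    2-channel I/Q sequence. The layer sizes are guesses, not the paper's
    architecture.

        import torch
        import torch.nn as nn

        model = nn.Sequential(
            nn.Conv1d(2, 32, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.Flatten(),
            nn.Linear(64 * 32, 128), nn.ReLU(),
            nn.Linear(128, 15),          # logits over the 15 interference classes
        )

        dummy = torch.randn(4, 2, 128)   # batch of four 2-channel I/Q snapshots
        print(model(dummy).shape)        # -> torch.Size([4, 15])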

    Mixing Complexity and its Applications to Neural Networks

    Michal Moshkovitz, Naftali Tishby
    Subjects: Learning (cs.LG)

    We suggest analyzing neural networks through the prism of space constraints.
    We observe that most training algorithms applied in practice use bounded
    memory, which enables us to use a new notion introduced in the study of
    space-time tradeoffs that we call mixing complexity. This notion was devised in
    order to measure the (in)ability to learn using a bounded-memory algorithm. In
    this paper we describe how we use mixing complexity to obtain new results on
    what can and cannot be learned using neural networks.

    A Unifying View of Explicit and Implicit Feature Maps for Structured Data: Systematic Studies of Graph Kernels

    Nils M. Kriege, Marion Neumann, Christopher Morris, Kristian Kersting, Petra Mutzel
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Non-linear kernel methods can be approximated by fast linear ones using
    suitable explicit feature maps allowing their application to large scale
    problems. To this end, explicit feature maps of kernels for vectorial data have
    been extensively studied. As many real-world data is structured, various
    kernels for complex data like graphs have been proposed. Indeed, many of them
    directly compute feature maps. However, the kernel trick is employed when the
    number of features is very large or the individual vertices of graphs are
    annotated by real-valued attributes.

    Can we still compute explicit feature maps efficiently under these
    circumstances? Triggered by this question, we investigate how general
    convolution kernels are composed from base kernels and construct corresponding
    feature maps. We apply our results to widely used graph kernels and analyze for
    which kernels and graph properties computation by explicit feature maps is
    feasible and actually more efficient. In particular, we derive feature maps for
    random walk and subgraph matching kernels and apply them to real-world graphs
    with discrete labels. Thereby, our theoretical results are confirmed
    experimentally by observing a phase transition when comparing running time with
    respect to label diversity, walk lengths and subgraph size, respectively.
    Moreover, we derive approximative, explicit feature maps for state-of-the-art
    kernels supporting real-valued attributes including the GraphHopper and Graph
    Invariant kernels. In extensive experiments we show that our approaches often
    achieve a classification accuracy close to the exact methods based on the
    kernel trick, but require only a fraction of their running time.

    In Search of an Entity Resolution OASIS: Optimal Asymptotic Sequential Importance Sampling

    Neil G. Marchant, Benjamin I. P. Rubinstein
    Comments: 13 pages, 5 figures
    Subjects: Learning (cs.LG); Databases (cs.DB); Machine Learning (stat.ML)

    Entity resolution (ER) presents unique challenges for evaluation methodology.
    While crowd sourcing provides a platform to acquire ground truth, sound
    approaches to sampling must drive labelling efforts. In ER, extreme class
    imbalance between matching and non-matching records can lead to enormous
    labelling requirements when seeking statistically consistent estimates of
    population parameters. This paper addresses this important challenge with the
    OASIS algorithm. OASIS draws samples from a (biased) instrumental distribution,
    chosen to have optimal asymptotic variance. As new labels are collected OASIS
    updates this instrumental distribution via a Bayesian latent variable model of
    the annotator oracle, to quickly focus on regions providing more information.
    We prove that resulting estimates of F-measure, precision, recall converge to
    the true population values. Thorough comparisons of sampling methods on a
    variety of ER datasets demonstrate significant labelling reductions of up to
    75% without loss to estimate accuracy.

    Positive-Unlabeled Learning with Non-Negative Risk Estimator

    Ryuichi Kiryo, Gang Niu, Marthinus C. du Plessis, Masashi Sugiyama
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    From only positive (P) and unlabeled (U) data, a binary classifier could be
    trained with PU learning. Unbiased PU learning that is based on unbiased risk
    estimators is now state of the art. However, if its model is very flexible, its
    empirical risk on training data will go negative, and we will suffer from
    overfitting seriously. In this paper, we propose a novel non-negative risk
    estimator for PU learning. When being minimized, it is more robust against
    overfitting, and thus we are able to train very flexible models given limited P
    data. Moreover, we analyze the bias, consistency and mean-squared-error
    reduction of the proposed risk estimator as well as the estimation error of the
    corresponding risk minimizer. Experiments show that the non-negative risk
    estimator outperforms unbiased counterparts when they disagree.
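
    A minimal sketch of the non-negative risk: the unbiased PU estimator
    pi*R_p^+ + (R_u^- - pi*R_p^-) can go negative in its second term, and the
    non-negative version clamps that term at zero. The sigmoid loss and class
    prior below are illustrative.

        import numpy as np

        def sigmoid_loss(z):                 # loss for scoring margin z as positive
            return 1.0 / (1.0 + np.exp(z))

        def nnpu_risk(g_p, g_u, pi):
            """g_p: scores on positive data; g_u: scores on unlabeled data."""
            r_p_pos = sigmoid_loss(g_p).mean()    # positives labeled positive
            r_p_neg = sigmoid_loss(-g_p).mean()   # positives labeled negative
            r_u_neg = sigmoid_loss(-g_u).mean()   # unlabeled labeled negative
            # Unbiased risk uses (r_u_neg - pi * r_p_neg); nnPU clamps it at zero.
            return pi * r_p_pos + max(0.0, r_u_neg - pi * r_p_neg)

        print(nnpu_risk(np.array([2.0, 1.5]), np.array([-1.0, 0.3, -2.0]), pi=0.4))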

    Generalization and Equilibrium in Generative Adversarial Nets (GANs)

    Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, Yi Zhang
    Subjects: Learning (cs.LG)

    This paper makes progress on several open theoretical issues related to
    Generative Adversarial Networks. A definition is provided for what it means for
    the training to generalize, and it is shown that generalization is not
    guaranteed for the popular distances between distributions such as
    Jensen-Shannon or Wasserstein. We introduce a new metric called neural net
    distance for which generalization does occur. We also show that an approximate
    pure equilibrium in the 2-player game exists for a natural training objective
    (Wasserstein). Showing such a result has been an open problem (for any training
    objective).

    Finally, the above theoretical ideas lead us to propose a new training
    protocol, MIX+GAN, which can be combined with any existing method. We present
    experiments showing that it stabilizes and improves some existing methods.

    MoleculeNet: A Benchmark for Molecular Machine Learning

    Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, Vijay Pande
    Subjects: Learning (cs.LG); Chemical Physics (physics.chem-ph); Machine Learning (stat.ML)

    Molecular machine learning has been maturing rapidly over the last few years.
    Improved methods and the presence of larger datasets have enabled machine
    learning algorithms to make increasingly accurate predictions about molecular
    properties. However, algorithmic progress has been limited due to the lack of a
    standard benchmark to compare the efficacy of proposed methods; most new
    algorithms are benchmarked on different datasets making it challenging to gauge
    the quality of proposed methods. This work introduces MoleculeNet, a large
    scale benchmark for molecular machine learning. MoleculeNet curates multiple
    public datasets, establishes metrics for evaluation, and offers high quality
    open-source implementations of multiple previously proposed molecular
    featurization and learning algorithms (released as part of the DeepChem open
    source library). MoleculeNet benchmarks demonstrate that learnable
    representations, and in particular graph convolutional networks, are powerful
    tools for molecular machine learning and broadly offer the best performance.
    However, for quantum mechanical and biophysical datasets, the use of
    physics-aware featurizations can be significantly more important than choice of
    particular learning algorithm.

    Signal-based Bayesian Seismic Monitoring

    David A. Moore, Stuart J. Russell
    Comments: Appearing at AISTATS 2017
    Subjects: Learning (cs.LG); Geophysics (physics.geo-ph)

    Detecting weak seismic events from noisy sensors is a difficult perceptual
    task. We formulate this task as Bayesian inference and propose a generative
    model of seismic events and signals across a network of spatially distributed
    stations. Our system, SIGVISA, is the first to directly model seismic
    waveforms, allowing it to incorporate a rich representation of the physics
    underlying the signal generation process. We use Gaussian processes over
    wavelet parameters to predict detailed waveform fluctuations based on
    historical events, while degrading smoothly to simple parametric envelopes in
    regions with no historical seismicity. Evaluating on data from the western US,
    we recover three times as many events as previous work, and reduce mean
    location errors by a factor of four while greatly increasing sensitivity to
    low-magnitude events.

    An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis

    Yuandong Tian
    Subjects: Learning (cs.LG)

    In this paper, we explore theoretical properties of training a two-layered
    ReLU network (g(mathbf{x}; mathbf{w}) = sum_{j=1}^K
    sigma(mathbf{w}_j^Tmathbf{x})) with centered (d)-dimensional spherical
    Gaussian input (mathbf{x}) ((sigma)=ReLU). We train our network with gradient
    descent on (mathbf{w}) to mimic the output of a teacher network with the same
    architecture and fixed parameters (mathbf{w}^*). We show that its population
    gradient has an analytical formula, leading to interesting theoretical analysis
    of critical points and convergence behaviors. First, we prove that critical
    points outside the hyperplane spanned by the teacher parameters
    (“out-of-plane”) are not isolated and form manifolds, and characterize in-plane
    critical-point-free regions for two ReLU case. On the other hand, convergence
    to (mathbf{w}^*) for one ReLU node is guaranteed with at least
    ((1-epsilon)/2) probability, if weights are initialized randomly with standard
    deviation upper-bounded by (O(epsilon/sqrt{d})), consistent with empirical
    practice. For network with many ReLU nodes, we prove that an infinitesimal
    perturbation of weight initialization results in convergence towards
    (mathbf{w}^*) (or its permutation), a phenomenon known as spontaneous
    symmetry-breaking (SSB) in physics. We assume no independence of ReLU
    activations. Simulation verifies our findings.

    Diffusion Independent Semi-Bandit Influence Maximization

    Sharan Vaswani, Branislav Kveton, Zheng Wen, Mohammad Ghavamzadeh, Laks Lakshmanan, Mark Schmidt
    Subjects: Learning (cs.LG)

    We consider influence maximization (IM) in social networks, which is
    the problem of maximizing the number of users that become aware of a product by
    selecting a set of “seed” users to expose the product to. While prior work
    assumes a known model of information diffusion, we propose a parametrization in
    terms of pairwise reachability which makes our framework agnostic to the
    underlying diffusion model. We give a corresponding monotone, submodular
    surrogate function, and show that it is a good approximation to the original IM
    objective. We also consider the case of a new marketer looking to exploit an
    existing social network, while simultaneously learning the factors governing
    information propagation. For this, we propose a pairwise-influence semi-bandit
    feedback model and develop a LinUCB-based bandit algorithm. Our
    model-independent regret analysis shows that our bound on the cumulative regret
    has a better (as compared to previous work) dependence on the size of the
    network. By using the graph Laplacian eigenbasis to construct features, we
    describe a practical LinUCB implementation. Experimental evaluation suggests
    that our framework is robust to the underlying diffusion model and can
    efficiently learn a near-optimal solution.

    Understanding Synthetic Gradients and Decoupled Neural Interfaces

    Wojciech Marian Czarnecki, Grzegorz Świrszcz, Max Jaderberg, Simon Osindero, Oriol Vinyals, Koray Kavukcuoglu
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    When training neural networks, the use of Synthetic Gradients (SG) allows
    layers or modules to be trained without update locking – without waiting for a
    true error gradient to be backpropagated – resulting in Decoupled Neural
    Interfaces (DNIs). This unlocked ability of being able to update parts of a
    neural network asynchronously and with only local information was demonstrated
    to work empirically in Jaderberg et al (2016). However, there has been very
    little demonstration of what changes DNIs and SGs impose from a functional,
    representational, and learning dynamics point of view. In this paper, we study
    DNIs through the use of synthetic gradients on feed-forward networks to better
    understand their behaviour and elucidate their effect on optimisation. We show
    that the incorporation of SGs does not affect the representational strength of
    the learning system for a neural network, and prove the convergence of the
    learning system for linear and deep linear models. On practical problems we
    investigate the mechanism by which synthetic gradient estimators approximate
    the true loss, and, surprisingly, how that leads to drastically different
    layer-wise representations. Finally, we also expose the relationship of using
    synthetic gradients to other error approximation techniques and find a unifying
    language for discussion and comparison.
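
    A hedged PyTorch sketch of the mechanism on a toy regression task: a linear
    module predicts the gradient at a layer's output, so the layer can update
    without waiting for backpropagation, while the module itself is regressed
    onto the true gradient. Dimensions and the task are assumptions.

        import torch

        layer = torch.nn.Linear(10, 10)
        sg = torch.nn.Linear(10, 10)      # predicts dL/dh from h
        head = torch.nn.Linear(10, 1)
        opt = torch.optim.SGD(
            list(layer.parameters()) + list(sg.parameters())
            + list(head.parameters()), lr=0.01)

        for _ in range(200):
            x, y = torch.randn(32, 10), torch.randn(32, 1)

            h = layer(x)
            h.backward(sg(h.detach()).detach())   # layer update: synthetic grad

            h2 = h.detach().requires_grad_()      # decoupled downstream pass
            loss = ((head(h2) - y) ** 2).mean()
            loss.backward()                       # true gradient lands in h2.grad
            sg_loss = ((sg(h2.detach()) - h2.grad.detach()) ** 2).mean()
            sg_loss.backward()                    # fit SG module to true gradient

            opt.step()
            opt.zero_grad()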

    PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison

    Randal S. Olson, William La Cava, Patryk Orzechowski, Ryan J. Urbanowicz, Jason H. Moore
    Comments: 14 pages, 5 figures, submitted for review to JMLR
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    The selection, development, or comparison of machine learning methods in data
    mining can be a difficult task based on the target problem and goals of a
    particular study. Numerous publicly available real-world and simulated
    benchmark datasets have emerged from different sources, but their organization
    and adoption as standards have been inconsistent. As such, selecting and
    curating specific benchmarks remains an unnecessary burden on machine learning
    practitioners and data scientists. The present study introduces an accessible,
    curated, and developing public benchmark resource to facilitate identification
    of the strengths and weaknesses of different machine learning methodologies. We
    compare meta-features among the current set of benchmark datasets in this
    resource to characterize the diversity of available data. Finally, we apply a
    number of established machine learning methods to the entire benchmark suite
    and analyze how datasets and algorithms cluster in terms of performance. This
    work is an important first step towards understanding the limitations of
    popular benchmarking suites and developing a resource that connects existing
    benchmarking standards to more diverse and efficient standards in the future.
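
    A short usage sketch, assuming the pmlb Python package that accompanies the
    suite ('mushroom' is one of its classification datasets):

        from pmlb import fetch_data
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split

        X, y = fetch_data('mushroom', return_X_y=True)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        print(clf.fit(X_tr, y_tr).score(X_te, y_te))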

    Encrypted accelerated least squares regression

    Pedro M. Esperança, Louis J. M. Aslett, Chris C. Holmes
    Comments: Accepted for AISTATS 2017
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Information that is stored in an encrypted format is, by definition, usually
    not amenable to statistical analysis or machine learning methods. In this paper
    we present detailed analysis of coordinate and accelerated gradient descent
    algorithms which are capable of fitting least squares and penalised ridge
    regression models, using data encrypted under a fully homomorphic encryption
    scheme. Gradient descent is shown to dominate in terms of encrypted
    computational speed, and theoretical results are proven to give parameter
    bounds which ensure correctness of decryption. The characteristics of encrypted
    computation are empirically shown to favour a non-standard acceleration
    technique. This demonstrates the possibility of approximating conventional
    statistical regression methods using encrypted data without compromising
    privacy.

    General and Robust Communication-Efficient Algorithms for Distributed Clustering

    Pranjal Awasthi, Maria-Florina Balcan, Colin White
    Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

    As datasets become larger and more distributed, algorithms for distributed
    clustering have become more and more important. In this work, we present a
    general framework for designing distributed clustering algorithms that are
    robust to outliers. Using our framework, we give a distributed approximation
    algorithm for k-means, k-median, or generally any L_p objective, with z
    outliers and/or balance constraints, using O(m(k+z)(d+log n)) bits of
    communication, where m is the number of machines, n is the size of the point
    set, and d is the dimension. This generalizes and improves over previous work
    of Bateni et al. and Malkomes et al. As a special case, we achieve the first
    distributed algorithm for k-median with outliers, answering an open question
    posed by Malkomes et al. For distributed k-means clustering, we provide the
    first dimension-dependent communication complexity lower bound for finding the
    optimal clustering. This improves over the lower bound from Chen et al. which
    is dimension-agnostic.

    Furthermore, we give distributed clustering algorithms which return nearly
    optimal solutions, provided the data satisfies the approximation stability
    condition of Balcan et al. or the spectral stability condition of Kumar and
    Kannan. In certain clustering applications where a local clustering consistent
    among all machines is sufficient, we show that no communication is necessary if
    the data satisfies approximation stability.

    Unsupervised Steganalysis Based on Artificial Training Sets

    Daniel Lerch-Hostalot, David Megías
    Subjects: Multimedia (cs.MM); Learning (cs.LG)

    In this paper, an unsupervised steganalysis method that combines artificial
    training sets and supervised classification is proposed. We provide a formal
    framework for unsupervised classification of stego and cover images in the
    typical situation of targeted steganalysis (i.e., for a known algorithm and
    approximate embedding bit rate). We also present a complete set of experiments
    using 1) eight different image databases, 2) image features based on Rich
    Models, and 3) three different embedding algorithms: Least Significant Bit
    (LSB) matching, Highly Undetectable steganography (HUGO) and Wavelet Obtained
    Weights (WOW). We show that the experimental results outperform previous
    methods based on Rich Models in the majority of the tested cases. At the same
    time, the proposed approach bypasses the problem of Cover Source Mismatch
    (when the embedding algorithm and bit rate are known), since it removes the
    need for a training database when we have a large enough testing set.
    Furthermore, we provide a generic proof of the proposed framework in the
    machine learning context. Hence, the results of this paper could be extended
    to other classification problems similar to steganalysis.
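
    A rough Python approximation of the artificial-training-set idea: since the
    embedding algorithm and rate are known, re-embedding the test images
    themselves yields labeled once-embedded vs. twice-embedded examples to train
    an ordinary classifier on. The LSB-matching embedder and histogram features
    below are crude stand-ins for the paper's setup and Rich Model features.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)

        def lsb_matching(img, rate=0.4):        # +/-1 embedding at a known rate
            out = img.astype(int).copy()
            mask = rng.random(img.shape) < rate
            out[mask] += rng.choice([-1, 1], size=int(mask.sum()))
            return np.clip(out, 0, 255)

        def features(img):                      # crude stand-in for Rich Models
            return np.histogram(np.diff(img.ravel()), bins=21, range=(-10, 10))[0]

        test_images = [rng.integers(0, 256, size=(64, 64)) for _ in range(50)]

        # Artificial training set from the test images themselves: one extra
        # embedding turns covers into stego-like images, stego into double-stego.
        X0 = [features(lsb_matching(im)) for im in test_images]
        X1 = [features(lsb_matching(lsb_matching(im))) for im in test_images]
        clf = LogisticRegression(max_iter=1000).fit(X0 + X1, [0] * 50 + [1] * 50)
        print(clf.predict([features(im) for im in test_images]))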

    A Generic Online Parallel Learning Framework for Large Margin Models

    Shuming Ma, Xu Sun
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    To speed up the training process, many existing systems use parallel
    technology for online learning algorithms. However, most research mainly
    focuses on stochastic gradient descent (SGD) rather than other algorithms. We
    propose a generic online parallel learning framework for large margin models,
    and also analyze our framework on popular large margin algorithms, including
    MIRA and Structured Perceptron. Our framework is lock-free and easy to
    implement on existing systems. Experiments show that systems with our framework
    can gain a near-linear speed-up by increasing the number of running threads,
    with no loss in accuracy.

    Attentive Recurrent Comparators

    Pranav Shyam, Shubham Gupta, Ambedkar Dukkipati
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Many problems in Artificial Intelligence and Machine Learning can be reduced
    to the problem of quantitative comparison of two entities. In Deep Learning the
    ubiquitous architecture used for this task is the Siamese Neural Network which
    maps each entity to a representation through a learnable function and expresses
    similarity through the distances among the entities in the representation
    space. In this paper, we argue that such a static and invariant mapping is both
    naive and unnatural. We develop a novel neural model called Attentive Recurrent
    Comparators (ARCs) that dynamically compares two entities and test the model
    extensively on the Omniglot dataset. In the task of similarity learning, our
    simplistic model that does not use any convolutions performs on par with Deep
    Convolutional Siamese Networks and significantly better when convolutional
    layers are also used. In the challenging task of one-shot learning on the same
    dataset, an ARC based model achieves the first super-human performance for a
    neural method with an error rate of 1.5%.

    Distributed Bayesian Matrix Factorization with Minimal Communication

    Xiangju Qin, Paul Blomstedt, Eemeli Leppäaho, Pekka Parviainen, Samuel Kaski
    Comments: 10 pages, 11 figures
    Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Numerical Analysis (cs.NA); Methodology (stat.ME)

    Bayesian matrix factorization (BMF) is a powerful tool for producing low-rank
    representations of matrices, and giving principled predictions of missing
    values. However, scaling up MCMC samplers to large matrices has proven to be
    difficult with parallel algorithms that require communication between MCMC
    iterations. On the other hand, designing communication-free algorithms is
    challenging due to the inherent unidentifiability of BMF solutions. We propose
    posterior propagation, an embarrassingly parallel inference procedure, which
    hierarchically introduces dependencies between data subsets and thus alleviates
    the unidentifiability problem.

    Adaptive Matching for Expert Systems with Uncertain Task Types

    Virag Shah, Lennart Gulikers, Laurent Massoulie, Milan Vojnovic
    Comments: 19 pages
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Online two-sided matching markets such as Q&A forums (e.g. StackOverflow,
    Quora) and online labour platforms (e.g. Upwork) critically rely on the ability
    to propose adequate matches based on imperfect knowledge of the two parties to
    be matched. This prompts the following question: Which matching recommendation
    algorithms can, in the presence of such uncertainty, lead to efficient platform
    operation?

    To answer this question, we develop a model of a task / server matching
    system. For this model, we give a necessary and sufficient condition for an
    incoming stream of tasks to be manageable by the system. We further identify a
    so-called back-pressure policy under which the throughput that the system can
    handle is optimized. We show that this policy achieves strictly larger
    throughput than a natural greedy policy. Finally, we validate our model and
    confirm our theoretical findings with experiments based on logs of
    Math.StackExchange, a StackOverflow forum dedicated to mathematics.

    Introduction to Nonnegative Matrix Factorization

    Nicolas Gillis
    Comments: 18 pages, 4 figures
    Journal-ref: SIAG/OPT Views and News 25 (1), pp. 7-16 (2017)
    Subjects: Optimization and Control (math.OC); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Numerical Analysis (cs.NA); Machine Learning (stat.ML)

    In this paper, we introduce and provide a short overview of nonnegative
    matrix factorization (NMF). Several aspects of NMF are discussed, namely, the
    application in hyperspectral imaging, geometry and uniqueness of NMF solutions,
    complexity, algorithms, and its link with extended formulations of polyhedra.
    In order to put NMF into perspective, the more general problem class of
    constrained low-rank matrix approximation problems is first briefly introduced.
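
    As a concrete instance of the algorithms the survey covers, here is the
    classic multiplicative-update scheme of Lee and Seung for X ≈ WH with
    nonnegative factors; a minimal sketch, with eps guarding the divisions.

        import numpy as np

        def nmf(X, r, iters=500, eps=1e-9):
            rng = np.random.default_rng(0)
            m, n = X.shape
            W, H = rng.random((m, r)), rng.random((r, n))
            for _ in range(iters):
                H *= (W.T @ X) / (W.T @ W @ H + eps)   # update H, keeping W fixed
                W *= (X @ H.T) / (W @ H @ H.T + eps)   # update W, keeping H fixed
            return W, H

        X = np.random.default_rng(1).random((20, 30))  # nonnegative data matrix
        W, H = nmf(X, r=5)
        print(np.linalg.norm(X - W @ H) / np.linalg.norm(X))  # relative error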

    Traffic-Aware Transmission Mode Selection in D2D-enabled Cellular Networks with Token System

    Yiling Yuan, Tao Yang, Hui Feng, Bo Hu, Jianqiu Zhang, Bin Wang, Qiyong Lu
    Subjects: Information Theory (cs.IT); Learning (cs.LG)

    We consider a D2D-enabled cellular network where user equipments (UEs) owned
    by rational users are incentivized to form D2D pairs using tokens. They
    exchange tokens electronically to “buy” and “sell” D2D services. Meanwhile the
    devices have the ability to choose the transmission mode, i.e. receiving data
    via cellular links or D2D links. Thus taking the different benefits brought by
    diverse traffic types as a prior, the UEs can utilize their tokens more
    efficiently via transmission mode selection. In this paper, the optimal
    transmission mode selection strategy as well as token collection policy are
    investigated to maximize the long-term utility in the dynamic network
    environment. The optimal policy is proved to be a threshold strategy, and the
    thresholds have a monotonicity property. Numerical simulations verify our
    observations and the gain from transmission mode selection is observed.

    Active Learning for Accurate Estimation of Linear Models

    Carlos Riquelme, Mohammad Ghavamzadeh, Alessandro Lazaric
    Comments: 37 pages, 8 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We explore the sequential decision making problem where the goal is to
    estimate uniformly well a number of linear models, given a shared budget of
    random contexts independently sampled from a known distribution. The decision
    maker must query one of the linear models for each incoming context, and
    receives an observation corrupted by noise levels that are unknown, and depend
    on the model instance. We present Trace-UCB, an adaptive allocation algorithm
    that learns the noise levels while balancing contexts accordingly across the
    different linear functions, and derive guarantees for simple regret in both
    expectation and high-probability. Finally, we extend the algorithm and its
    guarantees to high dimensional settings, where the number of linear models
    times the dimension of the contextual space is higher than the total budget of
    samples. Simulations with real data suggest that Trace-UCB is remarkably
    robust, outperforming a number of baselines even when its assumptions are
    violated.

    Human Interaction with Recommendation Systems: On Bias and Exploration

    Sven Schmit, Carlos Riquelme
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Recommendation systems rely on historical user data to provide suggestions.
    We propose an explicit and simple model for the interaction between users and
    recommendations provided by a platform, and relate this model to the
    multi-armed bandit literature. First, we show that this interaction leads to a
    bias in naive estimators due to selection effects. This bias leads to
    suboptimal outcomes, which we quantify in terms of linear regret. We end the
    first part by discussing ways to obtain unbiased estimates. The second part of
    this work considers exploration of alternatives. We show that although agents
    are myopic, agents’ heterogeneous preferences ensure that recommendation
    systems ‘learn’ about all alternatives without explicitly incentivizing this
    exploration. This work provides new and practical insights relevant to a wide
    range of systems designed to help users make better decisions.

    Truth and Regret in Online Scheduling

    Shuchi Chawla, Nikhil Devanur, Janardhan Kulkarni, Rad Niazadeh
    Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Learning (cs.LG)

    We consider a scheduling problem where a cloud service provider has multiple
    units of a resource available over time. Selfish clients submit jobs, each with
    an arrival time, deadline, length, and value. The service provider’s goal is to
    implement a truthful online mechanism for scheduling jobs so as to maximize the
    social welfare of the schedule. Recent work shows that under a stochastic
    assumption on job arrivals, there is a single-parameter family of mechanisms
    that achieves near-optimal social welfare. We show that given any such family
    of near-optimal online mechanisms, there exists an online mechanism that in the
    worst case performs nearly as well as the best of the given mechanisms. Our
    mechanism is truthful whenever the mechanisms in the given family are truthful
    and prompt, and achieves optimal (within constant factors) regret.

    We model the problem of competing against a family of online scheduling
    mechanisms as one of learning from expert advice. A primary challenge is that
    any scheduling decisions we make affect not only the payoff at the current
    step, but also the resource availability and payoffs in future steps.
    Furthermore, switching from one algorithm (a.k.a. expert) to another in an
    online fashion is challenging both because it requires synchronization with the
    state of the latter algorithm and because it affects the incentive
    structure of the algorithms. We further show how to adapt our algorithm to a
    non-clairvoyant setting where job lengths are unknown until jobs are run to
    completion. Once again, in this setting, we obtain truthfulness along with
    asymptotically optimal regret (within poly-logarithmic factors).

    Reinforcement Learning for Pivoting Task

    Rika Antonova, Silvia Cruciani, Christian Smith, Danica Kragic
    Comments: (Rika Antonova and Silvia Cruciani contributed equally)
    Subjects: Robotics (cs.RO); Learning (cs.LG)

    In this work we propose an approach to learn a robust policy for solving the
    pivoting task. Recently, several model-free continuous control algorithms were
    shown to learn successful policies without prior knowledge of the dynamics of
    the task. However, obtaining successful policies required thousands to millions
    of training episodes, limiting the applicability of these approaches to real
    hardware. We developed a training procedure that allows us to use a simple
    custom simulator to learn policies that are robust to the mismatch between
    simulation and the real robot. In our experiments, we demonstrate that the policy learned in the
    simulator is able to pivot the object to the desired target angle on the real
    robot. We also show generalization to an object with different inertia, shape,
    mass and friction properties than those used during training. This result is a
    step towards making model-free reinforcement learning available for solving
    robotics tasks via pre-training in simulators that offer only an imprecise
    match to the real-world dynamics.


    Information Theory

    On a class of constacyclic codes over the non-principal ideal ring (mathbb{Z}_{p^s}+umathbb{Z}_{p^s})

    Yuan Cao, Yonglin Cao
    Subjects: Information Theory (cs.IT)

    ((1+pw))-constacyclic codes of arbitrary length over the non-principal ideal
    ring (mathbb{Z}_{p^s} + umathbb{Z}_{p^s}) are studied, where (p) is a prime,
    (w in mathbb{Z}_{p^s}^{times}) and (s) is an integer satisfying (s geq 2).
    First, the structure of any ((1+pw))-constacyclic code over (mathbb{Z}_{p^s}
    + umathbb{Z}_{p^s}) is presented. Then enumerations of the number of all such
    codes and of the number of codewords in each code, together with the structure
    of the dual codes, are given. Finally, self-dual ((1+2w))-constacyclic codes
    over (mathbb{Z}_{2^s} + umathbb{Z}_{2^s}) are investigated, where
    (w = 2^{s-2}-1) or (2^{s-1}-1) if (s geq 3), and (w = 1) if (s = 2).

    Peterson-Gorenstein-Zierler algorithm for skew RS codes

    José Gómez-Torrecillas, F. J. Lobillo, Gabriel Navarro
    Subjects: Information Theory (cs.IT)

    We design a non-commutative version of the Peterson-Gorenstein-Zierler
    decoding algorithm for a class of codes that we call skew RS codes. These codes
    are left ideals of a quotient of a skew polynomial ring, which endow them of a
    sort of non-commutative cyclic structure. Since we work over an arbitrary
    field, our techniques may be applied both to linear block codes and
    convolutional codes. In particular, our decoding algorithm applies for block
    codes beyond the classical cyclic case.

    Secrecy and Robustness for Active Attack in Secure Network Coding

    Masahito Hayashi, Masaki Owari, Go Kato, Ning Cai
    Subjects: Information Theory (cs.IT)

    In network coding, we discuss the effect of sequential error injection on
    information leakage. We show that there is no improvement when the network is
    composed of linear operations. However, when the network contains non-linear
    operations, we find a counterexample in which error injection increases Eve’s
    obtained information.
    Further, we discuss the asymptotic rate in the linear network under the secrecy
    and robustness conditions.

    Wireless Power Transfer for Distributed Estimation in Sensor Networks

    Vien V. Mai, Won-Yong Shin, Koji Ishibashi
    Comments: 24 pages, 6 figures, To appear in IEEE Journal of Selected Topics in Signal Processing
    Subjects: Information Theory (cs.IT); Multiagent Systems (cs.MA); Networking and Internet Architecture (cs.NI); Optimization and Control (math.OC)

    This paper studies power allocation for distributed estimation of an unknown
    scalar random source in sensor networks with a multiple-antenna fusion center
    (FC), where wireless sensors are equipped with radio-frequency based energy
    harvesting technology. The sensors’ observation is locally processed by using
    an uncoded amplify-and-forward scheme. The processed signals are then sent to
    the FC, and are coherently combined at the FC, at which the best linear
    unbiased estimator (BLUE) is adopted for reliable estimation. We aim to solve
    the following two power allocation problems: 1) minimizing distortion under
    various power constraints; and 2) minimizing total transmit power under
    distortion constraints, where the distortion is measured in terms of
    mean-squared error of the BLUE. Two iterative algorithms are developed to solve
    the non-convex problems, which converge at least to a local optimum. In
    particular, the above algorithms are designed to jointly optimize the
    amplification coefficients, energy beamforming, and receive filtering. For each
    problem, a suboptimal design, a single-antenna FC scenario, and a common
    harvester deployment for colocated sensors, are also studied. Using the
    powerful semidefinite relaxation framework, our result is shown to be valid for
    any number of sensors, each with different noise power, and for an arbitrary
    number of antennas at the FC.

    Unveiling Bias Compensation in Turbo-Based Algorithms for (Discrete) Compressed Sensing

    Susanne Sparrer, Robert F.H. Fischer
    Subjects: Information Theory (cs.IT)

    In Compressed Sensing, a real-valued sparse vector has to be recovered from
    an underdetermined system of linear equations. In many applications, however,
    the elements of the sparse vector are drawn from a finite set. Adapted
    algorithms incorporating this additional knowledge are required for the
    discrete-valued setup. In this paper, turbo-based algorithms for both cases are
    elucidated and analyzed from a communications engineering perspective, leading
    to a deeper understanding of the algorithm. In particular, we gain the
    intriguing insight that the calculation of extrinsic values is equal to the
    unbiasing of a biased estimate and present an improved algorithm.

    Artificial Noise-Aided Biobjective Transmitter Optimization for Service Integration in Multi-User MIMO Gaussian Broadcast Channel

    Weidong Mei, Zhi Chen, Jun Fang, Shaoqian Li
    Comments: 12 pages, 5 figures
    Subjects: Information Theory (cs.IT)

    This paper considers an artificial noise (AN)-aided transmit design for
    multi-user MIMO systems with integrated services. Specifically, two sorts of
    service messages are combined and served simultaneously: one multicast message
    intended for all receivers and one confidential message intended for only one
    receiver and required to be perfectly secure from other unauthorized receivers.
    Our interest lies in the joint design of input covariances of the multicast
    message, confidential message and artificial noise (AN), such that the
    achievable secrecy rate and multicast rate are simultaneously maximized. This
    problem is identified as a secrecy rate region maximization (SRRM) problem in
    the context of physical-layer service integration. Since this bi-objective
    optimization problem is inherently complex to solve, we put forward two
    different scalarization methods to convert it into a scalar optimization
    problem. First, we propose to prefix the multicast rate as a constant, and
    accordingly, the primal biobjective problem is converted into a secrecy rate
    maximization (SRM) problem with quality of multicast service (QoMS) constraint.
    By varying the constant, we can obtain different Pareto optimal points. The
    resulting SRM problem can be iteratively solved via a provably convergent
    difference-of-concave (DC) algorithm. In the second method, we aim to maximize
    the weighted sum of the secrecy rate and the multicast rate. By varying
    the weight vector, one can also obtain different Pareto optimal points. We
    show that this weighted sum rate maximization (WSRM) problem can be recast into
    a primal decomposable form, which is amenable to alternating optimization (AO).
    Then we compare these two scalarization methods in terms of their overall
    performance and computational complexity via theoretical analysis as well as
    numerical simulation, based on which new insights can be drawn.
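
    To make the two scalarizations concrete, here is a minimal sketch on a toy
    two-rate region (a simple power-splitting channel, not the paper's MIMO secrecy
    setting): the first method fixes one rate and maximizes the other, while the
    second maximizes a weighted sum; both trace out Pareto-optimal points.

    ```python
    # Epsilon-constraint vs. weighted-sum scalarization on a toy rate region.
    import numpy as np
    from scipy.optimize import minimize_scalar

    P = 10.0  # total power budget; rates r_k = log2(1 + p_k), p1 + p2 <= P

    # Method 1 (rate prefixed as a constant): fix r2 = t, maximize r1.
    def max_r1_given_r2(t):
        p2 = 2.0**t - 1.0                  # power needed to achieve r2 = t
        p1 = max(P - p2, 0.0)              # spend the rest on r1
        return np.log2(1.0 + p1)

    # Method 2 (weighted sum): maximize w*r1 + (1-w)*r2 over the power split.
    def max_weighted_sum(w):
        obj = lambda p1: -(w * np.log2(1 + p1) + (1 - w) * np.log2(1 + P - p1))
        res = minimize_scalar(obj, bounds=(0.0, P), method="bounded")
        return np.log2(1 + res.x), np.log2(1 + P - res.x)

    pareto_eps = [(max_r1_given_r2(t), t) for t in np.linspace(0, np.log2(1 + P), 9)]
    pareto_ws = [max_weighted_sum(w) for w in np.linspace(0.05, 0.95, 9)]
    print("epsilon-constraint:", [(round(a, 2), round(b, 2)) for a, b in pareto_eps])
    print("weighted-sum      :", [(round(a, 2), round(b, 2)) for a, b in pareto_ws])
    ```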

    On a User-Centric Base Station Cooperation Scheme for Reliable Communications

    Dong Min Kim, Henning Thomsen, Petar Popovski
    Comments: to be presented in IEEE VTC 2017 Spring
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    In this paper, we describe CoMP2flex, a user-centric base station (BS)
    cooperation scheme that provides improvements in reliability of both uplink
    (UL) and downlink (DL) communications of wireless cellular networks. CoMP2flex
    supports not only cooperation of two BSs with the same traffic direction but
    also cooperation of two BSs serving bidirectional traffic. The reliability
    performance of CoMP2flex is shown with numerical simulations and analytical
    expressions. We quantify and numerically validate the performance of the greedy
    BS pairing algorithm by comparing it with optimal maximum-weight matching,
    implemented via Edmonds' matching algorithm for weighted graphs.
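
    A hedged sketch of the kind of comparison described: a greedy pairing heuristic
    versus an optimal maximum-weight matching computed by networkx's implementation
    of Edmonds' blossom algorithm. The pairing weights are random placeholders
    rather than the paper's gain model.

    ```python
    # Greedy BS pairing vs. optimal maximum-weight matching (toy weights).
    import itertools
    import networkx as nx
    import numpy as np

    rng = np.random.default_rng(2)
    n_bs = 8
    weight = {(i, j): rng.uniform(0, 1)
              for i, j in itertools.combinations(range(n_bs), 2)}

    # Greedy: repeatedly pick the highest-weight pair among unmatched BSs.
    unmatched, greedy_total = set(range(n_bs)), 0.0
    for (i, j), w in sorted(weight.items(), key=lambda kv: -kv[1]):
        if i in unmatched and j in unmatched:
            greedy_total += w
            unmatched -= {i, j}

    # Optimal: Edmonds' maximum-weight matching.
    G = nx.Graph()
    G.add_weighted_edges_from((i, j, w) for (i, j), w in weight.items())
    opt = nx.max_weight_matching(G, maxcardinality=False)
    opt_total = sum(weight[tuple(sorted(e))] for e in opt)

    print(f"greedy total weight : {greedy_total:.3f}")
    print(f"optimal total weight: {opt_total:.3f}")
    ```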

    Traffic-Aware Transmission Mode Selection in D2D-enabled Cellular Networks with Token System

    Yiling Yuan, Tao Yang, Hui Feng, Bo Hu, Jianqiu Zhang, Bin Wang, Qiyong Lu
    Subjects: Information Theory (cs.IT); Learning (cs.LG)

    We consider a D2D-enabled cellular network where user equipments (UEs) owned
    by rational users are incentivized to form D2D pairs using tokens. They
    exchange tokens electronically to “buy” and “sell” D2D services. Meanwhile the
    devices have the ability to choose the transmission mode, i.e. receiving data
    via cellular links or D2D links. Thus, taking the different benefits brought by
    diverse traffic types as prior knowledge, the UEs can utilize their tokens more
    efficiently via transmission mode selection. In this paper, the optimal
    transmission mode selection strategy as well as token collection policy are
    investigated to maximize the long-term utility in the dynamic network
    environment. The optimal policy is proved to be a threshold strategy, and the
    thresholds have a monotonicity property. Numerical simulations verify these
    observations and demonstrate the gain from transmission mode selection.
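
    The threshold structure of such token policies can be reproduced in a toy MDP
    (our simplification, not the paper's model): a UE holds k tokens, requests of
    two traffic types arrive, and value iteration yields a policy that switches
    from cellular to D2D once the token count crosses a type-dependent threshold.
    All probabilities and utilities below are invented for illustration.

    ```python
    # Toy token MDP: value iteration exposes a threshold policy in the token count.
    import numpy as np

    K, gamma = 10, 0.95
    benefits = {"low": 0.3, "high": 1.0}   # extra utility of D2D over cellular
    p_type = {"low": 0.5, "high": 0.5}     # traffic type probabilities
    p_earn = 0.3                            # chance to earn a token (at cost 0.1)

    V = np.zeros(K + 1)
    for _ in range(2000):
        V_new = np.zeros_like(V)
        for k in range(K + 1):
            v = p_earn * (-0.1 + gamma * V[min(k + 1, K)])  # token-earning event
            for t, pt in p_type.items():
                cellular = gamma * V[k]
                d2d = benefits[t] + gamma * V[k - 1] if k > 0 else -np.inf
                v += (1 - p_earn) * pt * max(cellular, d2d)
            V_new[k] = v
        V = V_new

    for t in benefits:  # print mode choice per token level; note the thresholds
        policy = ["D2D" if k > 0 and benefits[t] + gamma * V[k - 1] >= gamma * V[k]
                  else "cell" for k in range(K + 1)]
        print(t, policy)
    ```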

    Learning Mixtures of Sparse Linear Regressions Using Sparse Graph Codes

    Dong Yin, Ramtin Pedarsani, Yudong Chen, Kannan Ramchandran
    Subjects: Information Theory (cs.IT)

    In this paper, we consider the mixture of sparse linear regressions model.
    Let ( \beta^{(1)}, \ldots, \beta^{(L)} \in \mathbb{C}^n ) be ( L ) unknown sparse
    parameter vectors with a total of ( K ) non-zero coefficients. Noisy linear
    measurements are obtained in the form ( y_i = x_i^H \beta^{(\ell_i)} + w_i ),
    each of which is generated randomly from one of the sparse vectors with the
    label ( \ell_i ) unknown. The goal is to estimate the parameter vectors
    efficiently with low sample and computational costs. This problem presents
    significant challenges as one needs to simultaneously solve the demixing
    problem of recovering the labels ( \ell_i ) as well as the estimation problem
    of recovering the sparse vectors ( \beta^{(\ell)} ).

    Our solution to the problem leverages the connection between modern coding
    theory and statistical inference. We introduce a new algorithm, Mixed-Coloring,
    which samples the mixture strategically using query vectors ( x_i )
    constructed based on ideas from sparse graph codes. Our novel code design
    allows for both efficient demixing and parameter estimation. The algorithm
    achieves the order-optimal sample and time complexities of ( \Theta(K) ) in the
    noiseless setting, and near-optimal ( \Theta(K\,\text{polylog}(n)) ) complexities
    in the noisy setting. In one of our experiments, to recover a mixture of two
    regressions with dimension ( n=500 ) and sparsity ( K=50 ), our algorithm is more
    than ( 300 ) times faster than the EM algorithm, with about ( 1/3 ) of its sample
    cost.
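
    For context, here is a minimal version of the EM baseline that the paper
    compares against, applied to a mixture of two linear regressions; we drop the
    sparsity structure and assume a known noise level for brevity.

    ```python
    # Minimal EM for a mixture of two (dense) linear regressions.
    import numpy as np

    rng = np.random.default_rng(3)
    n, d, sigma = 2000, 10, 0.1
    beta_true = rng.normal(size=(2, d))
    labels = rng.integers(0, 2, n)
    X = rng.normal(size=(n, d))
    y = np.einsum("ij,ij->i", X, beta_true[labels]) + sigma * rng.normal(size=n)

    beta = rng.normal(size=(2, d))                      # random initialization
    for _ in range(50):
        # E-step: posterior responsibility of each component for each sample.
        resid = y[:, None] - X @ beta.T                 # shape (n, 2)
        logp = -resid**2 / (2 * sigma**2)
        r = np.exp(logp - logp.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted least squares per component.
        for l in range(2):
            W = r[:, l]
            beta[l] = np.linalg.lstsq(X * W[:, None]**0.5,
                                      y * W**0.5, rcond=None)[0]

    err = min(np.linalg.norm(beta - beta_true),
              np.linalg.norm(beta[::-1] - beta_true))   # up to label swap
    print(f"parameter error after EM: {err:.3f}")
    ```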

    Infinity-Norm Permutation Covering Codes from Cyclic Groups

    Ronen Karni, Moshe Schwartz
    Subjects: Information Theory (cs.IT)

    We study covering codes of permutations with the (ell_infty)-metric. We
    provide a general code construction, which uses smaller building-block codes.
    We study cyclic transitive groups as building blocks, determining their exact
    covering radius and giving linear-time algorithms for finding a covering
    codeword. We also bound the covering radius of relabeled cyclic transitive
    groups under conjugation.
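
    For tiny n the covering radius of the cyclic group of shifts can be checked by
    brute force, which makes the objects in this abstract concrete; this is only an
    illustration, not the paper's general construction.

    ```python
    # Brute-force covering radius of the cyclic shift group under the
    # l_inf metric on permutations: d(pi, sigma) = max_i |pi(i) - sigma(i)|.
    from itertools import permutations

    n = 5
    code = [tuple((i + k) % n for i in range(n)) for k in range(n)]  # cyclic group

    def dist(p, q):
        return max(abs(a - b) for a, b in zip(p, q))

    covering_radius = max(min(dist(p, c) for c in code)
                          for p in permutations(range(n)))
    print(f"n={n}: covering radius of the cyclic group = {covering_radius}")
    ```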

    A Distortion Based Approach for Protecting Inferences

    Chi-Yo Tsai, Gaurav Kumar Agarwal, Christina Fragouli, Suhas Diggavi
    Subjects: Information Theory (cs.IT)

    Eavesdropping attacks in inference systems aim to learn not the raw data, but
    the system inferences to predict and manipulate system actions. We argue that
    conventional entropy measures can be ambiguous about the adversary’s estimation
    abilities, and adopt instead a distortion based framework. We show that
    requiring perfect distortion-based security is more frugal than requiring
    perfect entropy-based secrecy even for block length 1 codes, offering in some
    cases unbounded gains. Within this framework, we design algorithms that enable
    efficient use of shared randomness, and show that each shared random key is
    exponentially useful in security.
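
    A small numerical illustration (our own, not from the paper) of why entropy can
    be ambiguous about estimation ability: two variables with identical entropy can
    differ wildly in the minimum mean-squared distortion an adversary achieves.

    ```python
    # Same entropy, very different best-guess distortion under squared error.
    import numpy as np

    def entropy(p):
        p = np.asarray(p, dtype=float)
        return -(p * np.log2(p)).sum()

    def min_mse(values, p):
        # The adversary's MMSE guess is the mean; the distortion is the variance.
        values, p = np.asarray(values, float), np.asarray(p, float)
        mean = (p * values).sum()
        return (p * (values - mean)**2).sum()

    # Uniform distributions (1 bit of entropy each) over different supports.
    for values in ([0, 1], [0, 100]):
        p = [0.5, 0.5]
        print(values, "entropy:", entropy(p), "bits; adversary MSE:",
              min_mse(values, p))
    ```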

    Multidimensional Sampling of Isotropically Bandlimited Signals

    Erik Agrell, Balázs Csébfalvi
    Subjects: Information Theory (cs.IT); Statistics Theory (math.ST)

    A new lower bound on the average reconstruction error variance of
    multidimensional sampling and reconstruction is presented. It applies to
    sampling on arbitrary lattices in arbitrary dimensions, assuming a stochastic
    process with constant, isotropically bandlimited spectrum and reconstruction by
    the best linear interpolator. The lower bound is exact for any lattice at
    sufficiently high and low sampling rates. The two threshold rates where the
    error variance deviates from the lower bound give two optimality criteria for
    sampling lattices. It is proved that at low rates, near the first threshold,
    the optimal lattice is the dual of the best sphere-covering lattice, which for
    the first time establishes a rigorous relation between optimal sampling and
    optimal sphere covering. A previously known result is confirmed at high rates,
    near the second threshold, namely, that the optimal lattice is the dual of the
    best sphere-packing lattice. Numerical results quantify the performance of
    various lattices for sampling and support the theoretical optimality criteria.

    Being Robust (in High Dimensions) Can Be Practical

    Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, Alistair Stewart
    Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Machine Learning (stat.ML)

    Robust estimation is much more challenging in high dimensions than it is in
    one dimension: Most techniques either lead to intractable optimization problems
    or estimators that can tolerate only a tiny fraction of errors. Recent work in
    theoretical computer science has shown that, in appropriate distributional
    models, it is possible to robustly estimate the mean and covariance with
    polynomial time algorithms that can tolerate a constant fraction of
    corruptions, independent of the dimension. However, the sample and time
    complexity of these algorithms is prohibitively large for high-dimensional
    applications. In this work, we address both of these issues by establishing
    sample complexity bounds that are optimal, up to logarithmic factors, as well
    as giving various refinements that allow the algorithms to tolerate a much
    larger fraction of corruptions. Finally, we show on both synthetic and real
    data that our algorithms have state-of-the-art performance and suddenly make
    high-dimensional robust estimation a realistic possibility.
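
    A simplified sketch of the spectral-filtering idea behind such
    dimension-independent robust estimators (not the authors' exact algorithm):
    while the empirical covariance has an abnormally large top eigenvalue, discard
    the points most responsible for it. The threshold and removal fraction below
    are our own choices.

    ```python
    # Spectral filtering for robust mean estimation (toy version).
    import numpy as np

    rng = np.random.default_rng(4)
    d, n, eps = 50, 2000, 0.1
    X = rng.normal(size=(n, d))                        # inliers: N(0, I), mean 0
    X[: int(eps * n)] += 5.0                           # constant-fraction corruption

    def filtered_mean(X, threshold=1.5, max_iter=20):
        X = X.copy()
        for _ in range(max_iter):
            mu = X.mean(axis=0)
            cov = np.cov(X, rowvar=False)
            eigval, eigvec = np.linalg.eigh(cov)
            if eigval[-1] < threshold:                 # covariance looks benign
                break
            proj = (X - mu) @ eigvec[:, -1]            # scores along worst direction
            keep = np.abs(proj) < np.quantile(np.abs(proj), 0.95)
            X = X[keep]
        return X.mean(axis=0)

    print("naive mean error   :", np.linalg.norm(X.mean(axis=0)))
    print("filtered mean error:", np.linalg.norm(filtered_mean(X)))
    ```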

    Wireless Node Cooperation with Resource Availability Constraints

    Luis David Alvarez Corrales, Anastasios Giovanidis, Philippe Martins, Laurent Decreusefond
    Comments: submitted, 12 pages, double-column, 7 figures, 8 sub-figures in total
    Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Performance (cs.PF); Probability (math.PR)

    Base station cooperation is a promising scheme to improve network performance
    for next-generation cellular networks. Up to this point, research has focused on
    station grouping criteria based solely on geographic proximity. However, for
    the cooperation to be meaningful, each station participating in a group should
    have sufficient available resources to share with others. In this work we
    consider an alternative grouping criterion based on a distance that considers
    both geographic proximity and available resources of the stations. When the
    network is modelled by a Poisson Point Process, we derive analytical formulas
    on the proportion of cooperative pairs or single stations, and the expected sum
    interference from each of the groups. The results illustrate that cooperation
    gains strongly depend on the distribution of available resources over the
    network.
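
    A quick simulation sketch in the spirit of this model (the combined metric and
    the mutual-nearest-neighbor pairing rule below are our assumptions, not the
    paper's exact grouping criterion): stations form a Poisson point process with
    random resource levels, and the cooperative fraction depends on how resources
    are distributed.

    ```python
    # PPP of stations; pair mutually nearest neighbors under a metric that
    # mixes geographic distance with resource scarcity.
    import numpy as np
    from scipy.spatial.distance import cdist

    rng = np.random.default_rng(5)
    lam, L = 1.0, 10.0                           # PPP intensity, square side
    n = rng.poisson(lam * L * L)
    pts = rng.uniform(0, L, size=(n, 2))
    res = rng.uniform(0, 1, size=n)              # available resources in [0, 1]

    D = cdist(pts, pts)                          # geographic distances
    penalty = 1.0 / (res[None, :] + 1e-3)        # scarce partners look farther away
    Dc = D * penalty                             # combined (asymmetric) metric
    np.fill_diagonal(Dc, np.inf)

    nn = Dc.argmin(axis=1)                       # each station's preferred partner
    mutual = np.array([nn[nn[i]] == i for i in range(n)])
    print(f"stations: {n}, cooperative fraction: {mutual.mean():.2f}, "
          f"single fraction: {1 - mutual.mean():.2f}")
    ```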



