IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    arXiv Paper Daily: Thu, 27 Apr 2017

    我爱机器学习(52ml.net)发表于 2017-04-27 00:00:00
    love 0

    Neural and Evolutionary Computing

    A Recurrent Neural Model with Attention for the Recognition of Chinese Implicit Discourse Relations

    Samuel Rönnqvist, Niko Schenk, Christian Chiarcos
    Comments: To appear at ACL2017, code available at this https URL
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We introduce an attention-based Bi-LSTM for Chinese implicit discourse
    relations and demonstrate that modeling argument pairs as a joint sequence can
    outperform word order-agnostic approaches. Our model benefits from a partial
    sampling scheme and is conceptually simple, yet achieves state-of-the-art
    performance on the Chinese Discourse Treebank. We also visualize its attention
    activity to illustrate the model’s ability to selectively focus on the relevant
    parts of an input sequence.

    The loss surface of deep and wide neural networks

    Quynh Nguyen, Matthias Hein
    Comments: 14 pages
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    While the optimization problem behind deep neural networks is highly
    non-convex, it is frequently observed in practice that training deep networks
    seems possible without getting stuck in suboptimal points. It has been argued
    that this is the case as all local minima are close to being globally optimal.
    We show that this is (almost) true, in fact almost all local minima are
    globally optimal, for a fully connected network with squared loss and analytic
    activation function given that the number of hidden units of one layer of the
    network is larger than the number of training points and the network structure
    from this layer on is pyramidal.

    Explaining How a Deep Neural Network Trained with End-to-End Learning Steers a Car

    Mariusz Bojarski, Philip Yeres, Anna Choromanska, Krzysztof Choromanski, Bernhard Firner, Lawrence Jackel, Urs Muller
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)

    As part of a complete software stack for autonomous driving, NVIDIA has
    created a neural-network-based system, known as PilotNet, which outputs
    steering angles given images of the road ahead. PilotNet is trained using road
    images paired with the steering angles generated by a human driving a
    data-collection car. It derives the necessary domain knowledge by observing
    human drivers. This eliminates the need for human engineers to anticipate what
    is important in an image and foresee all the necessary rules for safe driving.
    Road tests demonstrated that PilotNet can successfully perform lane keeping in
    a wide variety of driving conditions, regardless of whether lane markings are
    present or not.

    The goal of the work described here is to explain what PilotNet learns and
    how it makes its decisions. To this end we developed a method for determining
    which elements in the road image most influence PilotNet’s steering decision.
    Results show that PilotNet indeed learns to recognize relevant objects on the
    road.

    In addition to learning the obvious features such as lane markings, edges of
    roads, and other cars, PilotNet learns more subtle features that would be hard
    to anticipate and program by engineers, for example, bushes lining the edge of
    the road and atypical vehicle classes.


    Computer Vision and Pattern Recognition

    C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset

    Aishwarya Agrawal, Aniruddha Kembhavi, Dhruv Batra, Devi Parikh
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

    Visual Question Answering (VQA) has received a lot of attention over the past
    couple of years. A number of deep learning models have been proposed for this
    task. However, it has been shown that these models are heavily driven by
    superficial correlations in the training data and lack compositionality — the
    ability to answer questions about unseen compositions of seen concepts. This
    compositionality is desirable and central to intelligence. In this paper, we
    propose a new setting for Visual Question Answering where the test
    question-answer pairs are compositionally novel compared to training
    question-answer pairs. To facilitate developing models under this setting, we
    present a new compositional split of the VQA v1.0 dataset, which we call
    Compositional VQA (C-VQA). We analyze the distribution of questions and answers
    in the C-VQA splits. Finally, we evaluate several existing VQA models under
    this new setting and show that the performances of these models degrade by a
    significant amount compared to the original VQA setting.

    New region force for variational models in image segmentation and high dimensional data clustering

    Ke Wei, Ke Yin, Xue-Cheng Tai, Tony F. Chan
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose an effective framework for multi-phase image segmentation and
    semi-supervised data clustering by introducing a novel region force term into
    the Potts model. Assume the probability that a pixel or a data point belongs to
    each class is known a priori. We show that the corresponding indicator function
    obeys the Bernoulli distribution and the new region force function can be
    computed as the negative log-likelihood function under the Bernoulli
    distribution. We solve the Potts model by the primal-dual hybrid gradient
    method and the augmented Lagrangian method, which are based on two different
    dual problems of the same primal problem. Empirical evaluations of the Potts
    model with the new region force function on benchmark problems show that it is
    competitive with existing variational methods in both image segmentation and
    semi-supervised data clustering.

    Compact Descriptors for Video Analysis: the Emerging MPEG Standard

    Ling-Yu Duan, Vijay Chandrasekhar, Shiqi Wang, Yihang Lou, Jie Lin, Yan Bai, Tiejun Huang, Alex Chichung Kot, Wen Gao
    Comments: 4 figures, 4 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper provides an overview of the on-going compact descriptors for video
    analysis standard (CDVA) from the ISO/IEC moving pictures experts group (MPEG).
    MPEG-CDVA targets at defining a standardized bitstream syntax to enable
    interoperability in the context of video analysis applications. During the
    developments of MPEGCDVA, a series of techniques aiming to reduce the
    descriptor size and improve the video representation ability have been
    proposed. This article describes the new standard that is being developed and
    reports the performance of these key technical contributions.

    Multimodal MRI brain tumor segmentation using random forests with features learned from fully convolutional neural network

    Mohammadreza Soltaninejad, Lei Zhang, Tryphon Lambrou, Nigel Allinson, Xujiong Ye
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    In this paper, we propose a novel learning based method for automated
    segmenta-tion of brain tumor in multimodal MRI images. The machine learned
    features from fully convolutional neural network (FCN) and hand-designed texton
    fea-tures are used to classify the MRI image voxels. The score map with
    pixel-wise predictions is used as a feature map which is learned from
    multimodal MRI train-ing dataset using the FCN. The learned features are then
    applied to random for-ests to classify each MRI image voxel into normal brain
    tissues and different parts of tumor. The method was evaluated on BRATS 2013
    challenge dataset. The results show that the application of the random forest
    classifier to multimodal MRI images using machine-learned features based on FCN
    and hand-designed features based on textons provides promising segmentations.
    The Dice overlap measure for automatic brain tumor segmentation against ground
    truth is 0.88, 080 and 0.73 for complete tumor, core and enhancing tumor,
    respectively.

    Misdirected Registration Uncertainty

    Jie Luo, Karteek Popuri, Dana Cobzas, Hongyi Ding, William M. Wells, Masashi Sugiyama
    Comments: raw version
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Being a task of establishing spatial correspondences, medical image
    registration is often formalized as finding the optimal transformation that
    best aligns two images. Since the transformation is such an essential component
    of registration, most existing researches conventionally quantify the
    registration uncertainty, which is the confidence in the estimated spatial
    correspondences, by the transformation uncertainty. In this paper, we give
    concrete examples and reveal that using the transformation uncertainty to
    quantify the registration uncertainty is inappropriate and sometimes
    misleading. Based on this finding, we also raise attention to an important yet
    subtle aspect of probabilistic image registration, that is whether it is
    reasonable to determine the correspondence of a registered voxel solely by the
    mode of its transformation distribution.

    A Faster Patch Ordering Method for Image Denoising

    Badre Munir
    Comments: 4 pages, 1 figure, 2 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Among the patch-based image denoising processing methods, smooth ordering of
    local patches (patch ordering) has been shown to give state-of-art results. For
    image denoising the patch ordering method forms two large TSPs (Traveling
    Salesman Problem) comprised of nodes in N-dimensional space. Ten approximate
    solutions of the two large TSPs are then used in a filtering process to form
    the reconstructed image. Use of large TSPs makes patch ordering a
    computationally intensive method. A modified patch ordering method for image
    denoising is proposed. In the proposed method, several smaller-sized TSPs are
    formed and the filtering process varied to work with solutions of these smaller
    TSPs. In terms of PSNR, denoising results of the proposed method differed by
    0.032 dB to 0.016 dB on average. In original method, solving TSPs was observed
    to consume 85% of execution time. In proposed method, the time for solving TSPs
    can be reduced to half of the time required in original method. The proposed
    method can denoise images in 40% less time.

    AutoDIAL: Automatic DomaIn Alignment Layers

    Fabio Maria Carlucci, Lorenzo Porzi, Barbara Caputo, Elisa Ricci, Samuel Rota Bulò
    Comments: arXiv admin note: substantial text overlap with arXiv:1702.06332
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Classifiers trained on given databases perform poorly when tested on data
    acquired in different settings. This is explained in domain adaptation through
    a shift among distributions of the source and target domains. Attempts to align
    them have traditionally resulted in works reducing the domain shift by
    introducing appropriate loss terms, measuring the discrepancies between source
    and target distributions, in the objective function. Here we take a different
    route, proposing to align the learned representations by embedding in any given
    network specific Domain Alignment Layers, designed to match the source and
    target feature distributions to a reference one. Opposite to previous works
    which define a priori in which layers adaptation should be performed, our
    method is able to automatically learn the degree of feature alignment required
    at different levels of the deep network. Thorough experiments on different
    public benchmarks, in the unsupervised setting, confirm the power of our
    approach.

    SphereFace: Deep Hypersphere Embedding for Face Recognition

    Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, Le Song
    Comments: To appear in CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper addresses deep face recognition (FR) problem under open-set
    protocol, where ideal face features are expected to have smaller maximal
    intra-class distance than minimal inter-class distance under a suitably chosen
    metric space. However, few existing algorithms can effectively achieve this
    criterion. To this end, we propose the angular softmax (A-Softmax) loss that
    enables convolutional neural networks (CNNs) to learn angularly discriminative
    features. Geometrically, A-Softmax loss can be viewed as imposing
    discriminative constraints on a hypersphere manifold, which intrinsically
    matches the prior that faces also lie on a manifold. Moreover, the size of
    angular margin can be quantitatively adjusted by a parameter m. We further
    derive specific (m) to approximate the ideal feature criterion. Extensive
    analysis and experiments on Labeled Face in the Wild (LFW), Youtube Faces (YTF)
    and MegaFace Challenge 1 show the superiority of A-Softmax loss in FR tasks.

    Automatic Viseme Vocabulary Construction to Enhance Continuous Lip-reading

    Adriana Fernandez-Lopez, Federico M. Sukno
    Comments: International Conference on Computer Vision Theory and Applications (VISAPP 2017)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Speech is the most common communication method between humans and involves
    the perception of both auditory and visual channels. Automatic speech
    recognition focuses on interpreting the audio signals, but it has been
    demonstrated that video can provide information that is complementary to the
    audio. Thus, the study of automatic lip-reading is important and is still an
    open problem. One of the key challenges is the definition of the visual
    elementary units (the visemes) and their vocabulary. Many researchers have
    analyzed the importance of the phoneme to viseme mapping and have proposed
    viseme vocabularies with lengths between 11 and 15 visemes. These viseme
    vocabularies have usually been manually defined by their linguistic properties
    and in some cases using decision trees or clustering techniques. In this work,
    we focus on the automatic construction of an optimal viseme vocabulary based on
    the association of phonemes with similar appearance. To this end, we construct
    an automatic system that uses local appearance descriptors to extract the main
    characteristics of the mouth region and HMMs to model the statistic relations
    of both viseme and phoneme sequences. To compare the performance of the system
    different descriptors (PCA, DCT and SIFT) are analyzed. We test our system in a
    Spanish corpus of continuous speech. Our results indicate that we are able to
    recognize approximately 58% of the visemes, 47% of the phonemes and 23% of the
    words in a continuous speech scenario and that the optimal viseme vocabulary
    for Spanish is composed by 20 visemes.

    Airway segmentation from 3D chest CT volumes based on volume of interest using gradient vector flow

    Qier Meng, Takayuki Kitasaka, Masahiro Oda, Junji Ueno, Kensaku Mori
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Some lung diseases are related to bronchial airway structures and morphology.
    Although airway segmentation from chest CT volumes is an important task in the
    computer-aided diagnosis and surgery assistance systems for the chest, complete
    3-D airway structure segmentation is a quite challenging task due to its
    complex tree-like structure. In this paper, we propose a new airway
    segmentation method from 3D chest CT volumes based on volume of interests (VOI)
    using gradient vector flow (GVF). This method segments the bronchial regions by
    applying the cavity enhancement filter (CEF) to trace the bronchial tree
    structure from the trachea. It uses the CEF in the VOI to segment each branch.
    And a tube-likeness function based on GVF and the GVF magnitude map in each VOI
    are utilized to assist predicting the positions and directions of child
    branches. By calculating the tube-likeness function based on GVF and the GVF
    magnitude map, the airway-like candidate structures are identified and their
    centrelines are extracted. Based on the extracted centrelines, we can detect
    the branch points of the bifurcations and directions of the airway branches in
    the next level. At the same time, a leakage detection is performed to avoid the
    leakage by analysing the pixel information and the shape information of airway
    candidate regions extracted in the VOI. Finally, we unify all of the extracted
    bronchial regions to form an integrated airway tree. Preliminary experiments
    using four cases of chest CT volumes demonstrated that the proposed method can
    extract more bronchial branches in comparison with other methods.

    Towards Estimating the Upper Bound of Visual-Speech Recognition: The Visual Lip-Reading Feasibility Database

    Adriana Fernandez-Lopez, Oriol Martinez, Federico M. Sukno
    Comments: IEEE International Conference on Automatic Face and Gesture Recognition
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Speech is the most used communication method between humans and it involves
    the perception of auditory and visual channels. Automatic speech recognition
    focuses on interpreting the audio signals, although the video can provide
    information that is complementary to the audio. Exploiting the visual
    information, however, has proven challenging. On one hand, researchers have
    reported that the mapping between phonemes and visemes (visual units) is
    one-to-many because there are phonemes which are visually similar and
    indistinguishable between them. On the other hand, it is known that some people
    are very good lip-readers (e.g: deaf people). We study the limit of visual only
    speech recognition in controlled conditions. With this goal, we designed a new
    database in which the speakers are aware of being read and aim to facilitate
    lip-reading. In the literature, there are discrepancies on whether
    hearing-impaired people are better lip-readers than normal-hearing people.
    Then, we analyze if there are differences between the lip-reading abilities of
    9 hearing-impaired and 15 normal-hearing people. Finally, human abilities are
    compared with the performance of a visual automatic speech recognition system.
    In our tests, hearing-impaired participants outperformed the normal-hearing
    participants but without reaching statistical significance. Human observers
    were able to decode 44% of the spoken message. In contrast, the visual only
    automatic system achieved 20% of word recognition rate. However, if we repeat
    the comparison in terms of phonemes both obtained very similar recognition
    rates, just above 50%. This suggests that the gap between human lip-reading and
    automatic speech-reading might be more related to the use of context than to
    the ability to interpret mouth appearance.

    Anisotropic twicing for single particle reconstruction using autocorrelation analysis

    Tejal Bhamre, Teng Zhang, Amit Singer
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Biomolecules (q-bio.BM); Applications (stat.AP); Methodology (stat.ME)

    The missing phase problem in X-ray crystallography is commonly solved using
    the technique of molecular replacement, which borrows phases from a previously
    solved homologous structure, and appends them to the measured Fourier
    magnitudes of the diffraction patterns of the unknown structure. More recently,
    molecular replacement has been proposed for solving the missing orthogonal
    matrices problem arising in Kam’s autocorrelation analysis for single particle
    reconstruction using X-ray free electron lasers and cryo-EM. In classical
    molecular replacement, it is common to estimate the magnitudes of the unknown
    structure as twice the measured magnitudes minus the magnitudes of the
    homologous structure, a procedure known as `twicing’. Mathematically, this is
    equivalent to finding an unbiased estimator for a complex-valued scalar. We
    generalize this scheme for the case of estimating real or complex valued
    matrices arising in single particle autocorrelation analysis. We name this
    approach “Anisotropic Twicing” because unlike the scalar case, the unbiased
    estimator is not obtained by a simple magnitude isotropic correction. We
    compare the performance of the least squares, twicing and anisotropic twicing
    estimators on synthetic and experimental datasets. We demonstrate 3D homology
    modeling in cryo-EM directly from experimental data without iterative
    refinement or class averaging, for the first time.

    Unsupervised Geometric Learning of Hyperspectral Images

    James M. Murphy, Mauro Maggioni
    Comments: 42 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The problem of unsupervised learning and segmentation of hyperspectral images
    is a significant challenge in remote sensing. The high dimensionality of
    hyperspectral data, presence of substantial noise, and overlap of classes all
    contribute to the difficulty of automatically segment and cluster hyperspectral
    image. In this article, we propose an unsupervised learning technique that
    combines a density-based estimation of class modes with partial least squares
    regression (PLSR) on the learned modes. The density-based learning incorporates
    the geometry of the hyperspectral data by using diffusion distance to promote
    learning a unique mode from each class. These class modes are then used to
    generate class cores which approximate training labels. Partial least squares
    regression using these learned class modes as labeled training points
    consequently determines a labeling of the entire dataset. The proposed method
    is shown to perform competitively against state-of-the-art clustering and
    dimension reduction methods, and often achieves performance comparable to fully
    supervised PLSR.

    Spatio-temporal Person Retrieval via Natural Language Queries

    Masataka Yamaguchi, Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we address the problem of spatio-temporal person retrieval
    from multiple videos using a natural language query, in which we output a tube
    (i.e., a sequence of bounding boxes) which encloses the person described by the
    query. For this problem, we introduce a novel dataset consisting of videos
    containing people annotated with bounding boxes for each second and with five
    natural language descriptions. To retrieve the tube of the person described by
    a given natural language query, we design a model that combines methods for
    spatio-temporal human detection and multimodal retrieval. We conduct
    comprehensive experiments to compare a variety of tube and text representations
    and multimodal retrieval methods, and present a strong baseline in this task as
    well as demonstrate the efficacy of our tube representation and multimodal
    feature embedding technique. Finally, we demonstrate the versatility of our
    model by applying it to two other important tasks.

    Explaining How a Deep Neural Network Trained with End-to-End Learning Steers a Car

    Mariusz Bojarski, Philip Yeres, Anna Choromanska, Krzysztof Choromanski, Bernhard Firner, Lawrence Jackel, Urs Muller
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)

    As part of a complete software stack for autonomous driving, NVIDIA has
    created a neural-network-based system, known as PilotNet, which outputs
    steering angles given images of the road ahead. PilotNet is trained using road
    images paired with the steering angles generated by a human driving a
    data-collection car. It derives the necessary domain knowledge by observing
    human drivers. This eliminates the need for human engineers to anticipate what
    is important in an image and foresee all the necessary rules for safe driving.
    Road tests demonstrated that PilotNet can successfully perform lane keeping in
    a wide variety of driving conditions, regardless of whether lane markings are
    present or not.

    The goal of the work described here is to explain what PilotNet learns and
    how it makes its decisions. To this end we developed a method for determining
    which elements in the road image most influence PilotNet’s steering decision.
    Results show that PilotNet indeed learns to recognize relevant objects on the
    road.

    In addition to learning the obvious features such as lane markings, edges of
    roads, and other cars, PilotNet learns more subtle features that would be hard
    to anticipate and program by engineers, for example, bushes lining the edge of
    the road and atypical vehicle classes.

    Multi-View Dynamic Facial Action Unit Detection

    Andrés Romero, Juan León, Pablo Arbeláez
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a novel convolutional neural network architecture to address the
    fine-grained recognition problem of multi-view dynamic facial action unit
    detection. We leverage recent gains in large-scale object recognition by
    formulating the task of predicting the presence or absence of a specific action
    unit in a still image of a human face as holistic classification. We then
    explore the design space of our approach by considering both shared and
    independent representations for separate action units, and also different CNN
    architectures for combining color and motion information. We then move to the
    novel setup of the FERA 2017 Challenge, in which we propose a multi-view
    extension of our approach that operates by first predicting the viewpoint from
    which the video was taken, and then evaluating an ensemble of action unit
    detectors that were trained for that specific viewpoint. Our approach is
    holistic, efficient, and modular, since new action units can be easily included
    in the overall system. Our approach significantly outperforms the baseline of
    the FERA 2017 Challenge, which was the previous state-of-the-art in multi-view
    dynamic action unit detection, with an absolute improvement of 14%.

    Punny Captions: Witty Wordplay in Image Descriptions

    Arjun Chandrasekaran, Devi Parikh, Mohit Bansal
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    Wit is a quintessential form of rich inter-human interaction, and is often
    grounded in a specific situation (e.g., a comment in response to an event). In
    this work, we attempt to build computational models that can produce witty
    descriptions for a given image. Inspired by a cognitive account of humor
    appreciation, we employ linguistic wordplay, specifically puns. We compare our
    approach against meaningful baseline approaches via human studies. In a Turing
    test style evaluation, people find our model’s description for an image to be
    wittier than a human’s witty description 55% of the time!

    A Generalization of Convolutional Neural Networks to Graph-Structured Data

    Yotam Hechtlinger, Purvasha Chakravarti, Jining Qin
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    This paper introduces a generalization of Convolutional Neural Networks
    (CNNs) from low-dimensional grid data, such as images, to graph-structured
    data. We propose a novel spatial convolution utilizing a random walk to uncover
    the relations within the input, analogous to the way the standard convolution
    uses the spatial neighborhood of a pixel on the grid. The convolution has an
    intuitive interpretation, is efficient and scalable and can also be used on
    data with varying graph structure. Furthermore, this generalization can be
    applied to many standard regression or classification problems, by learning the
    the underlying graph. We empirically demonstrate the performance of the
    proposed CNN on MNIST, and challenge the state-of-the-art on Merck molecular
    activity data set.

    The loss surface of deep and wide neural networks

    Quynh Nguyen, Matthias Hein
    Comments: 14 pages
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    While the optimization problem behind deep neural networks is highly
    non-convex, it is frequently observed in practice that training deep networks
    seems possible without getting stuck in suboptimal points. It has been argued
    that this is the case as all local minima are close to being globally optimal.
    We show that this is (almost) true, in fact almost all local minima are
    globally optimal, for a fully connected network with squared loss and analytic
    activation function given that the number of hidden units of one layer of the
    network is larger than the number of training points and the network structure
    from this layer on is pyramidal.

    Adaptive Cost Function for Pointcloud Registration

    Johan Ekekrantz, John Folkesson, Patric Jensfelt
    Comments: 10 pages, 7 figures, 1 table
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

    In this paper we introduce an adaptive cost function for pointcloud
    registration. The algorithm automatically estimates the sensor noise, which is
    important for generalization across different sensors and environments. Through
    experiments on real and synthetic data, we show significant improvements in
    accuracy and robustness over state-of-the-art solutions.


    Artificial Intelligence

    Using a new parsimonious AHP methodology combined with the Choquet integral: An application for evaluating social housing initiatives

    Francesca Abastante, Salvatore Corrente, Salvatore Greco, Alessio Ishizaka, Isabella Lami
    Subjects: Artificial Intelligence (cs.AI); Optimization and Control (math.OC)

    We propose a development of the Analytic Hierarchy Process (AHP) permitting
    to use the methodology also in cases of decision problems with a very large
    number of alternatives evaluated with respect to several criteria. While the
    application of the original AHP method involves many pairwise comparisons
    between alternatives and criteria, our proposal is composed of three steps: (i)
    direct evaluation of the alternatives at hand on the considered criteria, (ii)
    selection of some reference evaluations; (iii) application of the original AHP
    method to reference evaluations; (iv) revision of the direct evaluation on the
    basis of the prioritization supplied by AHP on reference evaluations. The new
    proposal has been tested and validated in an experiment conducted on a sample
    of university students. The new methodology has been therefore applied to a
    real world problem involving the evaluation of 21 Social Housing initiatives
    sited in the Piedmont region (Italy). To take into account interaction between
    criteria, the Choquet integral preference model has been considered within a
    Non Additive Robust Ordinal Regression approach.

    A Popperian Falsification of AI – Lighthill's Argument Defended

    Steven Meyer
    Comments: 8 Pages
    Subjects: Artificial Intelligence (cs.AI)

    The area of computation called artificial intelligence (AI) is falsified by
    describing a previous 1972 falsification of AI by British applied mathematician
    James Lighthill. It is explained how Lighthill’s arguments continue to apply to
    current AI. It is argued that AI should use the Popperian scientific method in
    which it is the duty of every scientist to attempt to falsify theories and if
    theories are falsified to replace or modify them. The paper describes the
    Popperian method in detail and discusses Paul Nurse’s application of the method
    to cell biology that also involves questions of mechanism and behavior.
    Arguments used by Lighthill in his original 1972 report that falsifed AI are
    discussed. The Lighthill arguments are then shown to apply to current AI. The
    argument uses recent scholarship to explain Lighthill’s assumptions and to show
    how the arguments based on those assumptions continue to falsify modern AI. An
    iimportant focus of the argument involves Hilbert’s philosophical programme
    that defined knowledge and truth as provable formal sentences. Current AI takes
    the Hilbert programme as dogma beyond criticism while Lighthill as a mid 20th
    century applied mathematician had abandoned it. The paper uses recent
    scholarship to explain John von Neumann’s criticism of AI that I claim was
    assumed by Lighthill. The paper discusses computer chess programs to show
    Lighthill’s combinatorial explosion still applies to AI but not humans. An
    argument showing that Turing Machines (TM) are not the correct description of
    computation is given. The paper concludes by advocating studying computation as
    Peter Naur’s Dataology.

    Structured Production System (extended abstract)

    Yi Zhou
    Subjects: Artificial Intelligence (cs.AI)

    In this extended abstract, we propose Structured Production Systems (SPS),
    which extend traditional production systems with well-formed syntactic
    structures. Due to the richness of structures, structured production systems
    significantly enhance the expressive power as well as the flexibility of
    production systems, for instance, to handle uncertainty. We show that different
    rule application strategies can be reduced into the basic one by utilizing
    structures. Also, many fundamental approaches in computer science, including
    automata, grammar and logic, can be captured by structured production systems.

    From Language to Programs: Bridging Reinforcement Learning and Maximum Marginal Likelihood

    Kelvin Guu, Panupong Pasupat, Evan Zheran Liu, Percy Liang
    Comments: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (2017)
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Our goal is to learn a semantic parser that maps natural language utterances
    into executable programs when only indirect supervision is available: examples
    are labeled with the correct execution result, but not the program itself.
    Consequently, we must search the space of programs for those that output the
    correct result, while not being misled by spurious programs: incorrect programs
    that coincidentally output the correct result. We connect two common learning
    paradigms, reinforcement learning (RL) and maximum marginal likelihood (MML),
    and then present a new learning algorithm that combines the strengths of both.
    The new algorithm guards against spurious programs by combining the systematic
    search traditionally employed in MML with the randomized exploration of RL, and
    by updating parameters such that probability is spread more evenly across
    consistent programs. We apply our learning algorithm to a new neural semantic
    parser and show significant gains over existing state-of-the-art results on a
    recent context-dependent semantic parsing task.

    Reinforcement Learning-based Thermal Comfort Control for Vehicle Cabins

    James Brusey, Diana Hintea, Elena Gaura, Neil Beloe
    Subjects: Artificial Intelligence (cs.AI)

    Vehicle climate control systems aim to keep passengers thermally comfortable.
    However, current systems control temperature rather than thermal comfort and
    tend to be energy hungry, which is of particular concern when considering
    electric vehicles. This paper poses energy-efficient vehicle comfort control as
    a Markov Decision Process, which is then solved numerically using
    Sarsa({lambda}) and an empirically validated, single-zone, 1D thermal model of
    the cabin. The resulting controller was tested in simulation using 200 randomly
    selected scenarios and found to exceed the performance of bang-bang,
    proportional, simple fuzzy logic, and commercial controllers with 23%, 43%,
    40%, 56% increase, respectively. Compared to the next best performing
    controller, energy consumption is reduced by 13% while the proportion of time
    spent thermally comfortable is increased by 23%. These results indicate that
    this is a viable approach that promises to translate into substantial comfort
    and energy improvements in the car.

    C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset

    Aishwarya Agrawal, Aniruddha Kembhavi, Dhruv Batra, Devi Parikh
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

    Visual Question Answering (VQA) has received a lot of attention over the past
    couple of years. A number of deep learning models have been proposed for this
    task. However, it has been shown that these models are heavily driven by
    superficial correlations in the training data and lack compositionality — the
    ability to answer questions about unseen compositions of seen concepts. This
    compositionality is desirable and central to intelligence. In this paper, we
    propose a new setting for Visual Question Answering where the test
    question-answer pairs are compositionally novel compared to training
    question-answer pairs. To facilitate developing models under this setting, we
    present a new compositional split of the VQA v1.0 dataset, which we call
    Compositional VQA (C-VQA). We analyze the distribution of questions and answers
    in the C-VQA splits. Finally, we evaluate several existing VQA models under
    this new setting and show that the performances of these models degrade by a
    significant amount compared to the original VQA setting.

    Punny Captions: Witty Wordplay in Image Descriptions

    Arjun Chandrasekaran, Devi Parikh, Mohit Bansal
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    Wit is a quintessential form of rich inter-human interaction, and is often
    grounded in a specific situation (e.g., a comment in response to an event). In
    this work, we attempt to build computational models that can produce witty
    descriptions for a given image. Inspired by a cognitive account of humor
    appreciation, we employ linguistic wordplay, specifically puns. We compare our
    approach against meaningful baseline approaches via human studies. In a Turing
    test style evaluation, people find our model’s description for an image to be
    wittier than a human’s witty description 55% of the time!

    A Generalization of Convolutional Neural Networks to Graph-Structured Data

    Yotam Hechtlinger, Purvasha Chakravarti, Jining Qin
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    This paper introduces a generalization of Convolutional Neural Networks
    (CNNs) from low-dimensional grid data, such as images, to graph-structured
    data. We propose a novel spatial convolution utilizing a random walk to uncover
    the relations within the input, analogous to the way the standard convolution
    uses the spatial neighborhood of a pixel on the grid. The convolution has an
    intuitive interpretation, is efficient and scalable and can also be used on
    data with varying graph structure. Furthermore, this generalization can be
    applied to many standard regression or classification problems, by learning the
    the underlying graph. We empirically demonstrate the performance of the
    proposed CNN on MNIST, and challenge the state-of-the-art on Merck molecular
    activity data set.

    Event Stream-Based Process Discovery using Abstract Representations

    Sebastiaan J. van Zelst, Boudewijn F. van Dongen, Wil M.P. van der Aalst
    Comments: Accepted for publication in “Knowledge and Information Systems; ” (Springer: this http URL)
    Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)

    The aim of process discovery, originating from the area of process mining, is
    to discover a process model based on business process execution data. A
    majority of process discovery techniques relies on an event log as an input. An
    event log is a static source of historical data capturing the execution of a
    business process. In this paper we focus on process discovery relying on online
    streams of business process execution events. Learning process models from
    event streams poses both challenges and opportunities, i.e. we need to handle
    unlimited amounts of data using finite memory and, preferably, constant time.
    We propose a generic architecture that allows for adopting several classes of
    existing process discovery techniques in context of event streams. Moreover, we
    provide several instantiations of the architecture, accompanied by
    implementations in the process mining tool-kit ProM (this http URL).
    Using these instantiations, we evaluate several dimensions of stream-based
    process discovery. The evaluation shows that the proposed architecture allows
    us to lift process discovery to the streaming domain.

    A Recurrent Neural Model with Attention for the Recognition of Chinese Implicit Discourse Relations

    Samuel Rönnqvist, Niko Schenk, Christian Chiarcos
    Comments: To appear at ACL2017, code available at this https URL
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We introduce an attention-based Bi-LSTM for Chinese implicit discourse
    relations and demonstrate that modeling argument pairs as a joint sequence can
    outperform word order-agnostic approaches. Our model benefits from a partial
    sampling scheme and is conceptually simple, yet achieves state-of-the-art
    performance on the Chinese Discourse Treebank. We also visualize its attention
    activity to illustrate the model’s ability to selectively focus on the relevant
    parts of an input sequence.

    The loss surface of deep and wide neural networks

    Quynh Nguyen, Matthias Hein
    Comments: 14 pages
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    While the optimization problem behind deep neural networks is highly
    non-convex, it is frequently observed in practice that training deep networks
    seems possible without getting stuck in suboptimal points. It has been argued
    that this is the case as all local minima are close to being globally optimal.
    We show that this is (almost) true, in fact almost all local minima are
    globally optimal, for a fully connected network with squared loss and analytic
    activation function given that the number of hidden units of one layer of the
    network is larger than the number of training points and the network structure
    from this layer on is pyramidal.


    Information Retrieval

    Ranking in evolving complex networks

    Hao Liao, Manuel Sebastian Mariani, Matus Medo, Yi-Cheng Zhang, Ming-Yang Zhou
    Comments: 54 pages, 16 figures
    Subjects: Physics and Society (physics.soc-ph); Digital Libraries (cs.DL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

    Complex networks have emerged as a simple yet powerful framework to represent
    and analyze a wide range of complex systems. The problem of ranking the nodes
    and the edges in complex networks is critical for a broad range of real-world
    problems because it affects how we access online information and products, how
    success and talent are evaluated in human activities, and how scarce resources
    are allocated by companies and policymakers, among others. This calls for a
    deep understanding of how existing ranking algorithms perform, and which are
    their possible biases that may impair their effectiveness. Well-established
    ranking algorithms (such as the popular Google’s PageRank) are static in nature
    and, as a consequence, they exhibit important shortcomings when applied to real
    networks that rapidly evolve in time. The recent advances in the understanding
    and modeling of evolving networks have enabled the development of a wide and
    diverse range of ranking algorithms that take the temporal dimension into
    account. The aim of this review is to survey the existing ranking algorithms,
    both static and time-aware, and their applications to evolving networks. We
    emphasize both the impact of network evolution on well-established static
    algorithms and the benefits from including the temporal dimension for tasks
    such as prediction of real network traffic, prediction of future links, and
    identification of highly-significant nodes.


    Computation and Language

    Punny Captions: Witty Wordplay in Image Descriptions

    Arjun Chandrasekaran, Devi Parikh, Mohit Bansal
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    Wit is a quintessential form of rich inter-human interaction, and is often
    grounded in a specific situation (e.g., a comment in response to an event). In
    this work, we attempt to build computational models that can produce witty
    descriptions for a given image. Inspired by a cognitive account of humor
    appreciation, we employ linguistic wordplay, specifically puns. We compare our
    approach against meaningful baseline approaches via human studies. In a Turing
    test style evaluation, people find our model’s description for an image to be
    wittier than a human’s witty description 55% of the time!

    A Recurrent Neural Model with Attention for the Recognition of Chinese Implicit Discourse Relations

    Samuel Rönnqvist, Niko Schenk, Christian Chiarcos
    Comments: To appear at ACL2017, code available at this https URL
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We introduce an attention-based Bi-LSTM for Chinese implicit discourse
    relations and demonstrate that modeling argument pairs as a joint sequence can
    outperform word order-agnostic approaches. Our model benefits from a partial
    sampling scheme and is conceptually simple, yet achieves state-of-the-art
    performance on the Chinese Discourse Treebank. We also visualize its attention
    activity to illustrate the model’s ability to selectively focus on the relevant
    parts of an input sequence.

    Enriching Complex Networks with Word Embeddings for Detecting Mild Cognitive Impairment from Speech Transcripts

    Leandro B. dos Santos, Edilson A. Corrêa Jr, Osvaldo N. Oliveira Jr, Diego R. Amancio, Letícia L. Mansur, Sandra M. Aluísio
    Comments: Published in Annual Meeting of the Association for Computational Linguist 2017
    Subjects: Computation and Language (cs.CL)

    Mild Cognitive Impairment (MCI) is a mental disorder difficult to diagnose.
    Linguistic features, mainly from parsers, have been used to detect MCI, but
    this is not suitable for large-scale assessments. MCI disfluencies produce
    non-grammatical speech that requires manual or high precision automatic
    correction of transcripts. In this paper, we modeled transcripts into complex
    networks and enriched them with word embedding (CNE) to better represent short
    texts produced in neuropsychological assessments. The network measurements were
    applied with well-known classifiers to automatically identify MCI in
    transcripts, in a binary classification task. A comparison was made with the
    performance of traditional approaches using Bag of Words (BoW) and linguistic
    features for three datasets: DementiaBank in English, and Cinderella and
    Arizona-Battery in Portuguese. Overall, CNE provided higher accuracy than using
    only complex networks, while Support Vector Machine was superior to other
    classifiers. CNE provided the highest accuracies for DementiaBank and
    Cinderella, but BoW was more efficient for the Arizona-Battery dataset probably
    owing to its short narratives. The approach using linguistic features yielded
    higher accuracy if the transcriptions of the Cinderella dataset were manually
    revised. Taken together, the results indicate that complex networks enriched
    with embedding is promising for detecting MCI in large-scale assessments

    Riemannian Optimization for Skip-Gram Negative Sampling

    Alexander Fonarev, Oleksii Hrinchuk, Gleb Gusev, Pavel Serdyukov, Ivan Oseledets
    Comments: 9 pages, 4 figures, ACL 2017
    Subjects: Computation and Language (cs.CL)

    Skip-Gram Negative Sampling (SGNS) word embedding model, well known by its
    implementation in “word2vec” software, is usually optimized by stochastic
    gradient descent. However, the optimization of SGNS objective can be viewed as
    a problem of searching for a good matrix with the low-rank constraint. The most
    standard way to solve this type of problems is to apply Riemannian optimization
    framework to optimize the SGNS objective over the manifold of required low-rank
    matrices. In this paper, we propose an algorithm that optimizes SGNS objective
    using Riemannian optimization and demonstrates its superiority over popular
    competitors, such as the original method to train SGNS and SVD over SPPMI
    matrix.

    Topically Driven Neural Language Model

    Jey Han Lau, Timothy Baldwin, Trevor Cohn
    Comments: 11 pages, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017) (to appear)
    Subjects: Computation and Language (cs.CL)

    Language models are typically applied at the sentence level, without access
    to the broader document context. We present a neural language model that
    incorporates document context in the form of a topic model-like architecture,
    thus providing a succinct representation of the broader document context
    outside of the current sentence. Experiments over a range of datasets
    demonstrate that our model outperforms a pure sentence-based model in terms of
    language model perplexity, and leads to topics that are potentially more
    coherent than those produced by a standard LDA topic model. Our model also has
    the ability to generate related sentences for a topic, providing another way to
    interpret topics.

    Other Topics You May Also Agree or Disagree: Modeling Inter-Topic Preferences using Tweets and Matrix Factorization

    Akira Sasaki, Kazuaki Hanawa, Naoaki Okazaki, Kentaro Inui
    Comments: To appear in ACL2017
    Subjects: Computation and Language (cs.CL)

    We present in this paper our approach for modeling inter-topic preferences of
    Twitter users: for example, those who agree with the Trans-Pacific Partnership
    (TPP) also agree with free trade. This kind of knowledge is useful not only for
    stance detection across multiple topics but also for various real-world
    applications including public opinion surveys, electoral predictions, electoral
    campaigns, and online debates. In order to extract users’ preferences on
    Twitter, we design linguistic patterns in which people agree and disagree about
    specific topics (e.g., “A is completely wrong”). By applying these linguistic
    patterns to a collection of tweets, we extract statements agreeing and
    disagreeing with various topics. Inspired by previous work on item
    recommendation, we formalize the task of modeling inter-topic preferences as
    matrix factorization: representing users’ preferences as a user-topic matrix
    and mapping both users and topics onto a latent feature space that abstracts
    the preferences. Our experimental results demonstrate both that our proposed
    approach is useful in predicting missing preferences of users and that the
    latent vector representations of topics successfully encode inter-topic
    preferences.

    Automatic Compositor Attribution in the First Folio of Shakespeare

    Maria Ryskina, Hannah Alpert-Abrams, Dan Garrette, Taylor Berg-Kirkpatrick
    Comments: Short paper (6 pages) accepted at ACL 2017
    Subjects: Computation and Language (cs.CL)

    Compositor attribution, the clustering of pages in a historical printed
    document by the individual who set the type, is a bibliographic task that
    relies on analysis of orthographic variation and inspection of visual details
    of the printed page. In this paper, we introduce a novel unsupervised model
    that jointly describes the textual and visual features needed to distinguish
    compositors. Applied to images of Shakespeare’s First Folio, our model predicts
    attributions that agree with the manual judgements of bibliographers with an
    accuracy of 87%, even on text that is the output of OCR.

    C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset

    Aishwarya Agrawal, Aniruddha Kembhavi, Dhruv Batra, Devi Parikh
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

    Visual Question Answering (VQA) has received a lot of attention over the past
    couple of years. A number of deep learning models have been proposed for this
    task. However, it has been shown that these models are heavily driven by
    superficial correlations in the training data and lack compositionality — the
    ability to answer questions about unseen compositions of seen concepts. This
    compositionality is desirable and central to intelligence. In this paper, we
    propose a new setting for Visual Question Answering where the test
    question-answer pairs are compositionally novel compared to training
    question-answer pairs. To facilitate developing models under this setting, we
    present a new compositional split of the VQA v1.0 dataset, which we call
    Compositional VQA (C-VQA). We analyze the distribution of questions and answers
    in the C-VQA splits. Finally, we evaluate several existing VQA models under
    this new setting and show that the performances of these models degrade by a
    significant amount compared to the original VQA setting.

    Friendships, Rivalries, and Trysts: Characterizing Relations between Ideas in Texts

    Chenhao Tan, Dallas Card, Noah A. Smith
    Comments: 11 pages, 9 figures, to appear in Proceedings of ACL 2017, code and data available at this https URL
    Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Physics and Society (physics.soc-ph)

    Understanding how ideas relate to each other is a fundamental question in
    many domains, ranging from intellectual history to public communication.
    Because ideas are naturally embedded in texts, we propose the first framework
    to systematically characterize the relations between ideas based on their
    occurrence in a corpus of documents, independent of how these ideas are
    represented. Combining two statistics — cooccurrence within documents and
    prevalence correlation over time — our approach reveals a number of different
    ways in which ideas can cooperate and compete. For instance, two ideas can
    closely track each other’s prevalence over time, and yet rarely cooccur, almost
    like a “cold war” scenario. We observe that pairwise cooccurrence and
    prevalence correlation exhibit different distributions. We further demonstrate
    that our approach is able to uncover intriguing relations between ideas through
    in-depth case studies on news articles and research papers.


    Distributed, Parallel, and Cluster Computing

    Idle Period Propagation in Message-Passing Applications

    Ivy Bo Peng, Stefano Markidis, Erwin Laure, Gokcen Kestor, Roberto Gioiosa
    Comments: 18th International Conference on High Performance Computing and Communications, IEEE, 2016
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Idle periods on different processes of Message Passing applications are
    unavoidable. While the origin of idle periods on a single process is well
    understood as the effect of system and architectural random delays, yet it is
    unclear how these idle periods propagate from one process to another. It is
    important to understand idle period propagation in Message Passing applications
    as it allows application developers to design communication patterns avoiding
    idle period propagation and the consequent performance degradation in their
    applications. To understand idle period propagation, we introduce a methodology
    to trace idle periods when a process is waiting for data from a remote delayed
    process in MPI applications. We apply this technique in an MPI application that
    solves the heat equation to study idle period propagation on three different
    systems. We confirm that idle periods move between processes in the form of
    waves and that there are different stages in idle period propagation. Our
    methodology enables us to identify a self-synchronization phenomenon that
    occurs on two systems where some processes run slower than the other processes.

    Exploring Application Performance on Emerging Hybrid-Memory Supercomputers

    Ivy Bo Peng, Stefano Markidis, Erwin Laure, Gokcen Kestor, Roberto Gioiosa
    Comments: 18th International Conference on High Performance Computing and Communications, IEEE, 2016
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Next-generation supercomputers will feature more hierarchical and
    heterogeneous memory systems with different memory technologies working
    side-by-side. A critical question is whether at large scale existing HPC
    applications and emerging data-analytics workloads will have performance
    improvement or degradation on these systems. We propose a systematic and fair
    methodology to identify the trend of application performance on emerging
    hybrid-memory systems. We model the memory system of next-generation
    supercomputers as a combination of “fast” and “slow” memories. We then analyze
    performance and dynamic execution characteristics of a variety of workloads,
    from traditional scientific applications to emerging data analytics to compare
    traditional and hybrid-memory systems. Our results show that data analytics
    applications can clearly benefit from the new system design, especially at
    large scale. Moreover, hybrid-memory systems do not penalize traditional
    scientific applications, which may also show performance improvement.

    Models of fault-tolerant distributed computation via dynamic epistemic logic

    Eric Goubault, Sergio Rajsbaum
    Comments: arXiv admin note: text overlap with arXiv:1703.11005
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Logic in Computer Science (cs.LO); Multiagent Systems (cs.MA); Algebraic Topology (math.AT)

    The computability power of a distributed computing model is determined by the
    communication media available to the processes, the timing assumptions about
    processes and communication, and the nature of failures that processes can
    suffer. In a companion paper we showed how dynamic epistemic logic can be used
    to give a formal semantics to a given distributed computing model, to capture
    precisely the knowledge needed to solve a distributed task, such as consensus.
    Furthermore, by moving to a dual model of epistemic logic defined by simplicial
    complexes, topological invariants are exposed, which determine task
    solvability. In this paper we show how to extend the setting above to include
    in the knowledge of the processes, knowledge about the model of computation
    itself. The extension describes the knowledge processes gain about the current
    execution, in problems where processes have no input values at all.

    Systematizing Decentralization and Privacy: Lessons from 15 years of research and deployments

    Carmela Troncoso, George Danezis, Marios Isaakidis, Harry Halpin
    Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

    Decentralized systems are a subset of distributed systems where multiple
    authorities control different components and no authority is fully trusted by
    all. This implies that any component in a decentralized system is potentially
    adversarial. We revise fifteen years of research on decentralization and
    privacy, and provide an overview of key systems. Decentralized designs may lead
    to gains in privacy, integrity and availability, but there are also inherent
    trade-offs that require to be better understood to be addressed. We argue that
    combination of insights from cryptography, distributed systems, and mechanism
    design will be necessary to build scalable and successful decentralized
    systems.

    Video Streaming in Distributed Erasure-coded Storage Systems: Stall Duration Analysis

    Abubakr O. Al-Abbasi, Vaneet Aggarwal
    Comments: 15 pages, 13 figures
    Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT); Multimedia (cs.MM)

    The demand for global video has been burgeoning across industries. With the
    expansion and improvement of video streaming services, cloud-based video is
    evolving into a necessary feature of any successful business for reaching
    internal and external audiences. This paper considers video streaming over
    distributed systems where the video segments are encoded using an erasure code
    for better reliability thus being the first work to our best knowledge that
    considers video streaming over erasure-coded distributed cloud systems. The
    download time of each coded chunk of each video segment is characterized and
    ordered statistics over the choice of the erasure-coded chunks is used to
    obtain the playback time of different video segments. Using the playback times,
    bounds on the moment generating function on the stall duration is used to bound
    the mean stall duration. Moment generating function based bounds on the ordered
    statistics are also used to bound the stall duration tail probability which
    determines the probability that the stall time is greater than a pre-defined
    number. These two metrics, mean stall duration and the stall duration tail
    probability, are important quality of experience (QoE) measures for the end
    users. Based on these metrics, we formulate an optimization problem to jointly
    minimize the convex combination of both the QoE metrics averaged over all
    requests over the placement and access of the video content. The non-convex
    problem is solved using an efficient iterative algorithm. Numerical results
    show significant improvement in QoE metrics for cloud-based video as compared
    to the considered baselines.


    Learning

    Understanding the Feedforward Artificial Neural Network Model From the Perspective of Network Flow

    Dawei Dai, Weimin Tan, Hong Zhan
    Comments: arXiv admin note: text overlap with arXiv:1702.04595 by other authors
    Subjects: Learning (cs.LG)

    In recent years, deep learning based on artificial neural network (ANN) has
    achieved great success in pattern recognition. However, there is no clear
    understanding of such neural computational models. In this paper, we try to
    unravel “black-box” structure of Ann model from network flow. Specifically, we
    consider the feed forward Ann as a network flow model, which consists of many
    directional class-pathways. Each class-pathway encodes one class. The
    class-pathway of a class is obtained by connecting the activated neural nodes
    in each layer from input to output, where activation value of neural node
    (node-value) is defined by the weights of each layer in a trained
    ANN-classifier. From the perspective of the class-pathway, training an
    ANN-classifier can be regarded as the formulation process of class-pathways of
    different classes. By analyzing the the distances of each two class-pathways in
    a trained ANN-classifiers, we try to answer the questions, why the classifier
    performs so? At last, from the neural encodes view, we define the importance of
    each neural node through the class-pathways, which is helpful to optimize the
    structure of a classifier. Experiments for two types of ANN model including
    multi-layer MLP and CNN verify that the network flow based on class-pathway is
    a reasonable explanation for ANN models.

    The loss surface of deep and wide neural networks

    Quynh Nguyen, Matthias Hein
    Comments: 14 pages
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    While the optimization problem behind deep neural networks is highly
    non-convex, it is frequently observed in practice that training deep networks
    seems possible without getting stuck in suboptimal points. It has been argued
    that this is the case as all local minima are close to being globally optimal.
    We show that this is (almost) true, in fact almost all local minima are
    globally optimal, for a fully connected network with squared loss and analytic
    activation function given that the number of hidden units of one layer of the
    network is larger than the number of training points and the network structure
    from this layer on is pyramidal.

    Stochastic Orthant-Wise Limited-Memory Quasi-Newton Method

    Jianqiao Wangni
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    The (ell_1) regularized sparse model has been favourably used in machine
    learning society. Due to the non-smoothness, fast optimizers like quasi-Newton
    methods can not be directly applied. In this paper, we propose the first
    stochastic limited-memory quasi-newton optimizer that specializing in strongly
    convex loss function with (ell_1)-regularization. The optimizer consists three
    alignment steps which are generalized from batch version of OWL-QN optimizer,
    to encourage the parameter update be orthant-wise. We adopt several practical
    features from recent stochastic variants of L-BFGS and variance reduction of
    subsampled gradient, we also employ various sketch techniques on the Hessian
    matrix inversion, squeezing more curvature information and accelerate the
    convergence. We prove a linear convergence rate of our optimizer, and
    experimentally demonstrate that our optimizer outperforms other linear
    convergent optimizers on large-scale sparse logistic regression task.

    On Improving Deep Reinforcement Learning for POMDPs

    Pengfei Zhu, Xin Li, Pascal Poupart
    Comments: 7 pages, 6 figures, 3 tables
    Subjects: Learning (cs.LG)

    Deep Reinforcement Learning (RL) recently emerged as one of the most
    competitive approaches for learning in sequential decision making problems with
    fully observable environments, e.g., computer Go. However, very little work has
    been done in deep RL to handle partially observable environments. We propose a
    new architecture called Action-specific Deep Recurrent Q-Network (ADRQN) to
    enhance learning performance in partially observable domains. Actions are
    encoded by a fully connected layer and coupled with a convolutional observation
    to form an action-observation pair. The time series of action-observation pairs
    are then integrated by an LSTM layer that learns latent states based on which a
    fully connected layer computes Q-values as in conventional Deep Q-Networks
    (DQNs). We demonstrate the effectiveness of our new architecture in several
    partially observable domains, including flickering Atari games.

    Reward Maximization Under Uncertainty: Leveraging Side-Observations on Networks

    Swapna Buccapatnam, Fang Liu, Atilla Eryilmaz, Ness B. Shroff
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We study the stochastic multi-armed bandit (MAB) problem in the presence of
    side-observations across actions that occur as a result of an underlying
    network structure. In our model, a bipartite graph captures the relationship
    between actions and a common set of unknowns such that choosing an action
    reveals observations for the unknowns that it is connected to. This models a
    common scenario in online social networks where users respond to their friends’
    activity, thus providing side information about each other’s preferences. Our
    contributions are as follows: 1) We derive an asymptotic lower bound (with
    respect to time) as a function of the bi-partite network structure on the
    regret of any uniformly good policy that achieves the maximum long-term average
    reward. 2) We propose two policies – a randomized policy; and a policy based on
    the well-known upper confidence bound (UCB) policies – both of which explore
    each action at a rate that is a function of its network position. We show,
    under mild assumptions, that these policies achieve the asymptotic lower bound
    on the regret up to a multiplicative factor, independent of the network
    structure. Finally, we use numerical examples on a real-world social network
    and a routing example network to demonstrate the benefits obtained by our
    policies over other existing policies.

    An ensemble-based online learning algorithm for streaming data

    Tien Thanh Nguyen, Thi Thu Thuy Nguyen, Xuan Cuong Pham, Alan Wee-Chung Liew, James C. Bezdek
    Comments: 19 pages, 3 figures
    Subjects: Learning (cs.LG)

    In this study, we introduce an ensemble-based approach for online machine
    learning. The ensemble of base classifiers in our approach is obtained by
    learning Naive Bayes classifiers on different training sets which are generated
    by projecting the original training set to lower dimensional space. We propose
    a mechanism to learn sequences of data using data chunks paradigm. The
    experiments conducted on a number of UCI datasets and one synthetic dataset
    demonstrate that the proposed approach performs significantly better than some
    well-known online learning algorithms.

    Relative Error Tensor Low Rank Approximation

    Zhao Song, David P. Woodruff, Peilin Zhong
    Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Learning (cs.LG)

    We consider relative error low rank approximation of {it tensors} with
    respect to the Frobenius norm: given an order-(q) tensor (A in
    mathbb{R}^{prod_{i=1}^q n_i}), output a rank-(k) tensor (B) for which
    (|A-B|_F^2 leq (1+epsilon))OPT, where OPT (= inf_{ extrm{rank-}k~A’}
    |A-A’|_F^2). Despite the success on obtaining relative error low rank
    approximations for matrices, no such results were known for tensors. One
    structural issue is that there may be no rank-(k) tensor (A_k) achieving the
    above infinum. Another, computational issue, is that an efficient relative
    error low rank approximation algorithm for tensors would allow one to compute
    the rank of a tensor, which is NP-hard. We bypass these issues via (1)
    bicriteria and (2) parameterized complexity solutions:

    (1) We give an algorithm which outputs a rank (k’ = O((k/epsilon)^{q-1}))
    tensor (B) for which (|A-B|_F^2 leq (1+epsilon))OPT in (nnz(A) + n cdot
    extrm{poly}(k/epsilon)) time in the real RAM model. Here (nnz(A)) is the
    number of non-zero entries in (A).

    (2) We give an algorithm for any (delta >0) which outputs a rank (k) tensor
    (B) for which (|A-B|_F^2 leq (1+epsilon))OPT and runs in ( ( nnz(A) + n
    cdot extrm{poly}(k/epsilon) + exp(k^2/epsilon) ) cdot n^delta) time in
    the unit cost RAM model.

    For outputting a rank-(k) tensor, or even a bicriteria solution with
    rank-(Ck) for a certain constant (C > 1), we show a (2^{Omega(k^{1-o(1)})})
    time lower bound under the Exponential Time Hypothesis.

    Our results give the first relative error low rank approximations for tensors
    for a large number of robust error measures for which nothing was known, as
    well as column row and tube subset selection. We also obtain new results for
    matrices, such as (nnz(A))-time CUR decompositions, improving previous
    (nnz(A)log n)-time algorithms, which may be of independent interest.

    C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset

    Aishwarya Agrawal, Aniruddha Kembhavi, Dhruv Batra, Devi Parikh
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

    Visual Question Answering (VQA) has received a lot of attention over the past
    couple of years. A number of deep learning models have been proposed for this
    task. However, it has been shown that these models are heavily driven by
    superficial correlations in the training data and lack compositionality — the
    ability to answer questions about unseen compositions of seen concepts. This
    compositionality is desirable and central to intelligence. In this paper, we
    propose a new setting for Visual Question Answering where the test
    question-answer pairs are compositionally novel compared to training
    question-answer pairs. To facilitate developing models under this setting, we
    present a new compositional split of the VQA v1.0 dataset, which we call
    Compositional VQA (C-VQA). We analyze the distribution of questions and answers
    in the C-VQA splits. Finally, we evaluate several existing VQA models under
    this new setting and show that the performances of these models degrade by a
    significant amount compared to the original VQA setting.

    Accelerating Stochastic Gradient Descent

    Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford
    Comments: 55 pages, 3 figures, 1 table
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST)

    There is widespread sentiment that it is not possible to effectively utilize
    fast gradient methods (e.g. Nesterov’s acceleration, conjugate gradient, heavy
    ball) for the purposes of stochastic optimization due to their instability and
    error accumulation, a notion made precise in d’Aspremont 2008 and Devolder,
    Glineur, and Nesterov 2014. This work considers these issues for the special
    case of stochastic approximation for the least squares regression problem, and
    our main result refutes the conventional wisdom by showing that acceleration
    can be made robust to statistical errors. In particular, this work introduces
    an accelerated stochastic gradient method that provably achieves the minimax
    optimal statistical risk faster than stochastic gradient descent. Critical to
    the analysis is a sharp characterization of accelerated stochastic gradient
    descent as a stochastic process. We hope this characterization gives insights
    towards the broader question of designing simple and effective accelerated
    stochastic methods for more general convex and non-convex optimization
    problems.

    A Generalization of Convolutional Neural Networks to Graph-Structured Data

    Yotam Hechtlinger, Purvasha Chakravarti, Jining Qin
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    This paper introduces a generalization of Convolutional Neural Networks
    (CNNs) from low-dimensional grid data, such as images, to graph-structured
    data. We propose a novel spatial convolution utilizing a random walk to uncover
    the relations within the input, analogous to the way the standard convolution
    uses the spatial neighborhood of a pixel on the grid. The convolution has an
    intuitive interpretation, is efficient and scalable and can also be used on
    data with varying graph structure. Furthermore, this generalization can be
    applied to many standard regression or classification problems, by learning the
    the underlying graph. We empirically demonstrate the performance of the
    proposed CNN on MNIST, and challenge the state-of-the-art on Merck molecular
    activity data set.

    Multimodal MRI brain tumor segmentation using random forests with features learned from fully convolutional neural network

    Mohammadreza Soltaninejad, Lei Zhang, Tryphon Lambrou, Nigel Allinson, Xujiong Ye
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    In this paper, we propose a novel learning based method for automated
    segmenta-tion of brain tumor in multimodal MRI images. The machine learned
    features from fully convolutional neural network (FCN) and hand-designed texton
    fea-tures are used to classify the MRI image voxels. The score map with
    pixel-wise predictions is used as a feature map which is learned from
    multimodal MRI train-ing dataset using the FCN. The learned features are then
    applied to random for-ests to classify each MRI image voxel into normal brain
    tissues and different parts of tumor. The method was evaluated on BRATS 2013
    challenge dataset. The results show that the application of the random forest
    classifier to multimodal MRI images using machine-learned features based on FCN
    and hand-designed features based on textons provides promising segmentations.
    The Dice overlap measure for automatic brain tumor segmentation against ground
    truth is 0.88, 080 and 0.73 for complete tumor, core and enhancing tumor,
    respectively.

    A Recurrent Neural Model with Attention for the Recognition of Chinese Implicit Discourse Relations

    Samuel Rönnqvist, Niko Schenk, Christian Chiarcos
    Comments: To appear at ACL2017, code available at this https URL
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We introduce an attention-based Bi-LSTM for Chinese implicit discourse
    relations and demonstrate that modeling argument pairs as a joint sequence can
    outperform word order-agnostic approaches. Our model benefits from a partial
    sampling scheme and is conceptually simple, yet achieves state-of-the-art
    performance on the Chinese Discourse Treebank. We also visualize its attention
    activity to illustrate the model’s ability to selectively focus on the relevant
    parts of an input sequence.

    Exploiting random projections and sparsity with random forests and gradient boosting methods — Application to multi-label and multi-output learning, random forest model compression and leveraging input sparsity

    Arnaud Joly
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Within machine learning, the supervised learning field aims at modeling the
    input-output relationship of a system, from past observations of its behavior.
    Decision trees characterize the input-output relationship through a series of
    nested (if-then-else) questions, the testing nodes, leading to a set of
    predictions, the leaf nodes. Several of such trees are often combined together
    for state-of-the-art performance: random forest ensembles average the
    predictions of randomized decision trees trained independently in parallel,
    while tree boosting ensembles train decision trees sequentially to refine the
    predictions made by the previous ones.

    The emergence of new applications requires scalable supervised learning
    algorithms in terms of computational power and memory space with respect to the
    number of inputs, outputs, and observations without sacrificing accuracy. In
    this thesis, we identify three main areas where decision tree methods could be
    improved for which we provide and evaluate original algorithmic solutions: (i)
    learning over high dimensional output spaces, (ii) learning with large sample
    datasets and stringent memory constraints at prediction time and (iii) learning
    over high dimensional sparse input spaces.

    Deep Text Classification Can be Fooled

    Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, Wenchang Shi
    Comments: 7 pages
    Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)

    Deep neural networks (DNNs) play a key role in many applications. Current
    studies focus on crafting adversarial samples against DNN-based image
    classifiers by introducing some imperceptible perturbations to the input.
    However, DNNs for natural language processing have not got the attention they
    deserve. In fact, the existing perturbation algorithms for images cannot be
    directly applied to text. This paper presents a simple but effective method to
    attack DNN-based text classifiers. Three perturbation strategies, namely
    insertion, modification, and removal, are designed to generate an adversarial
    sample for a given text. By computing the cost gradients, what should be
    inserted, modified or removed, where to insert and how to modify are determined
    effectively. The experimental results show that the adversarial samples
    generated by our method can successfully fool a state-of-the-art model to
    misclassify them as any desirable classes without compromising their utilities.
    At the same time, the introduced perturbations are difficult to be perceived.
    Our study demonstrates that DNN-based text classifiers are also prone to the
    adversarial sample attack.

    A Flexible Framework for Hypothesis Testing in High-dimensions

    Adel Javanmard, Jason D. Lee
    Comments: 27 pages
    Subjects: Statistics Theory (math.ST); Learning (cs.LG); Applications (stat.AP); Methodology (stat.ME); Machine Learning (stat.ML)

    Hypothesis testing in the linear regression model is a fundamental
    statistical problem. We consider linear regression in the high-dimensional
    regime where the number of parameters exceeds the number of samples ((p> n))
    and assume that the high-dimensional parameters vector is (s_0) sparse. We
    develop a general and flexible (ell_infty) projection statistic for
    hypothesis testing in this model. Our framework encompasses testing whether the
    parameter lies in a convex cone, testing the signal strength, testing arbitrary
    functionals of the parameter, and testing adaptive hypothesis. We show that the
    proposed procedure controls the type I error under the standard assumption of
    (s_0 (log p)/sqrt{n} o 0), and also analyze the power of the procedure. Our
    numerical experiments confirms our theoretical findings and demonstrate that we
    control false positive rate (type I error) near the nominal level, and have
    high power.

    Linear Convergence of Accelerated Stochastic Gradient Descent for Nonconvex Nonsmooth Optimization

    Feihu Huang, Songcan Chen
    Comments: 21 pages, 3 figures and 1 table
    Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)

    In this paper, we study the stochastic gradient descent (SGD) method for the
    nonconvex nonsmooth optimization, and propose an accelerated SGD method by
    combining the variance reduction technique with Nesterov’s extrapolation
    technique. Moreover, based on the local error bound condition, we establish the
    linear convergence of our method to obtain a stationary point of the nonconvex
    optimization. In particular, we prove that not only the sequence generated
    linearly converges to a stationary point of the problem, but also the
    corresponding sequence of objective values is linearly convergent. Finally,
    some numerical experiments demonstrate the effectiveness of our method. To the
    best of our knowledge, it is first proved that the accelerated SGD method
    converges linearly to the local minimum of the nonconvex optimization.

    From Language to Programs: Bridging Reinforcement Learning and Maximum Marginal Likelihood

    Kelvin Guu, Panupong Pasupat, Evan Zheran Liu, Percy Liang
    Comments: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (2017)
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Our goal is to learn a semantic parser that maps natural language utterances
    into executable programs when only indirect supervision is available: examples
    are labeled with the correct execution result, but not the program itself.
    Consequently, we must search the space of programs for those that output the
    correct result, while not being misled by spurious programs: incorrect programs
    that coincidentally output the correct result. We connect two common learning
    paradigms, reinforcement learning (RL) and maximum marginal likelihood (MML),
    and then present a new learning algorithm that combines the strengths of both.
    The new algorithm guards against spurious programs by combining the systematic
    search traditionally employed in MML with the randomized exploration of RL, and
    by updating parameters such that probability is spread more evenly across
    consistent programs. We apply our learning algorithm to a new neural semantic
    parser and show significant gains over existing state-of-the-art results on a
    recent context-dependent semantic parsing task.

    Explaining How a Deep Neural Network Trained with End-to-End Learning Steers a Car

    Mariusz Bojarski, Philip Yeres, Anna Choromanska, Krzysztof Choromanski, Bernhard Firner, Lawrence Jackel, Urs Muller
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)

    As part of a complete software stack for autonomous driving, NVIDIA has
    created a neural-network-based system, known as PilotNet, which outputs
    steering angles given images of the road ahead. PilotNet is trained using road
    images paired with the steering angles generated by a human driving a
    data-collection car. It derives the necessary domain knowledge by observing
    human drivers. This eliminates the need for human engineers to anticipate what
    is important in an image and foresee all the necessary rules for safe driving.
    Road tests demonstrated that PilotNet can successfully perform lane keeping in
    a wide variety of driving conditions, regardless of whether lane markings are
    present or not.

    The goal of the work described here is to explain what PilotNet learns and
    how it makes its decisions. To this end we developed a method for determining
    which elements in the road image most influence PilotNet’s steering decision.
    Results show that PilotNet indeed learns to recognize relevant objects on the
    road.

    In addition to learning the obvious features such as lane markings, edges of
    roads, and other cars, PilotNet learns more subtle features that would be hard
    to anticipate and program by engineers, for example, bushes lining the edge of
    the road and atypical vehicle classes.

    Stochastic Optimization from Distributed, Streaming Data in Rate-limited Networks

    Matthew Nokleby, Waheed U. Bajwa
    Comments: 13 pages, 1 figure; submitted to IEEE Transactions on Signal and Information Processing over Networks
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Motivated by machine learning applications in networks of sensors,
    internet-of-things (IoT) devices, and autonomous agents, we propose techniques
    for distributed stochastic convex learning from high-rate data streams. The
    setup involves a network of nodes—each one of which has a stream of data
    arriving at a constant rate—that solve a stochastic convex optimization
    problem by collaborating with each other over rate-limited communication links.
    To this end, we present and analyze two algorithms—termed distributed
    stochastic approximation mirror descent (D-SAMD) and {em accelerated}
    distributed stochastic approximation mirror descent (AD-SAMD)—that are based
    on two stochastic variants of mirror descent. The main collaborative step in
    the proposed algorithms is approximate averaging of the local, noisy
    subgradients using distributed consensus. While distributed consensus is well
    suited for collaborative learning, its use in our setup results in perturbed
    subgradient averages due to rate-limited links, which may slow down or prevent
    convergence. Our main contributions in this regard are: (i) bounds on the
    convergence rates of D-SAMD and AD-SAMD in terms of the number of nodes,
    network topology, and ratio of the data streaming and communication rates, and
    (ii) sufficient conditions for order-optimum convergence of D-SAMD and AD-SAMD.
    In particular, we show that there exist regimes under which AD-SAMD, when
    compared to D-SAMD, achieves order-optimum convergence with slower
    communications rates. This is in contrast to the centralized setting in which
    use of accelerated mirror descent results in a modest improvement over regular
    mirror descent for stochastic composite optimization. Finally, we demonstrate
    the effectiveness of the proposed algorithms using numerical experiments.

    Pre-computed Liquid Spaces with Generative Neural Networks and Optical Flow

    Boris Bonev, Lukas Prantl, Nils Thuerey
    Comments: Supplemental Android app: this https URL
    Subjects: Graphics (cs.GR); Learning (cs.LG)

    Liquids exhibit highly complex, non-linear behavior under changing simulation
    conditions such as user interactions. We propose a method to map this complex
    behavior over a parameter range onto a reduced representation based on
    space-time deformations. In order to represent the complexity of the full space
    of inputs, we use aligned deformations from optical flow solves, and we
    leverage the power of generative neural networks to synthesize additional
    deformations for refinement. We introduce a novel deformation-aware loss
    function, which enables optimization in the highly non-linear space of multiple
    deformations. To demonstrate the effectiveness of our approach, we showcase the
    method with several complex examples in two and four dimensions. Our
    representation makes it possible to generate implicit surfaces of liquids very
    efficiently, which allows us to very efficiently display the scene from any
    angle, and to add secondary effects such as particle systems. We have
    implemented a mobile application with our full pipeline to demonstrate that
    real-time interaction is possible with our approach.

    ADMM Penalty Parameter Selection by Residual Balancing

    Brendt Wohlberg
    Subjects: Optimization and Control (math.OC); Learning (cs.LG)

    Appropriate selection of the penalty parameter is crucial to obtaining good
    performance from the Alternating Direction Method of Multipliers (ADMM). While
    analytic results for optimal selection of this parameter are very limited,
    there is a heuristic method that appears to be relatively successful in a
    number of different problems. The contribution of this paper is to demonstrate
    that their is a potentially serious flaw in this heuristic approach, and to
    propose a modification that at least partially addresses it.


    Information Theory

    Coverage and Rate Analysis of Super Wi-Fi Networks Using Stochastic Geometry

    Neelakantan Nurani Krishnan, Gokul Sridharan, Ivan Seskar, Narayan Mandayam
    Comments: Published in IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN), 2017 held at Baltimore, MD
    Subjects: Information Theory (cs.IT)

    Recent regulatory changes proposed by the Federal Communications Commission
    (FCC) permitting unlicensed use of television white space (TVWS) channels
    present new opportunities for designing wireless networks that make efficient
    use of this spectrum. The favorable propagation characteristics of these
    channels and their widespread availability, especially in rural areas, make
    them well-suited for providing broadband services in sparsely populated regions
    where economic factors hinder deployment of such services on licensed spectrum.
    In this context, this paper explores the deployment of an outdoor Wi-Fi-like
    network operating in TVWS channels, referred to commonly as a Super Wi-Fi
    network. Since regulations governing unlicensed use of these channels allow (a)
    mounting fixed devices up to a height of 30 m and operation at transmit powers
    of up to 4 W EIRP, and (b) operation at transmit powers of up to 100 mW EIRP
    for portable devices, such networks can provide extended coverage and higher
    rates than traditional Wi-Fi networks. However, these gains are subject to the
    viability of the uplink from the portable devices (clients) to the fixed
    devices (access points (AP)) because of tighter restrictions on transmit power
    of clients compared to APs. This paper leverages concepts from stochastic
    geometry to study the performance of such networks with specific focus on the
    effect of (a) transmit power asymmetry between APs and clients and its impact
    on uplink viability and coverage, and (b) the interplay between height and
    transmit power of APs in determining the network throughput. Such an analysis
    reveals that (a) maximum coverage of no more than 700 m is obtained even when
    APs are deployed at 30 m height, and (b) operating APs at transmit power of
    more than 1 W is beneficial only at sparse deployment densities when rate is
    prioritized over coverage.

    Hybrid Procoder and Combiner Design for Secure Transmission in mmWave MIMO Systems

    Xiaowen Tian, Ming Li, Zihuan Wang, Qian Liu
    Subjects: Information Theory (cs.IT)

    Millimeter wave (mmWave) communications have been considered as a key
    technology for future 5G wireless networks. In order to overcome the severe
    propagation loss of mmWave channel, mmWave multiple-input multiple-output
    (MIMO) systems with analog/digital hybrid precoding and combining transceiver
    architecture have been widely considered. However, physical layer security
    (PLS) in mmWave MIMO systems and the secure hybrid beamformer design have not
    been well investigated. In this paper, we consider the problem of hybrid
    precoder and combiner design for secure transmission in mmWave MIMO systems in
    order to protect the legitimate transmission from eavesdropping. When
    eavesdropper’s channel state information (CSI) is known, we first propose a
    joint analog precoder and combiner design algorithm which can prevent the
    information leakage to the eavesdropper. Then, the digital precoder and
    combiner are computed based on the obtained effective baseband channel to
    further maximize the secrecy rate. Next, if prior knowledge of the
    eavesdropper’s CSI is unavailable, we develop an artificial noise (AN)-based
    hybrid beamforming approach, which can jam eavesdropper’s reception while
    maintaining the quality-of-service (QoS) of intended receiver at the
    pre-specified level. Simulation results demonstrate that our proposed
    algorithms offer significant secrecy performance improvement compared with
    other hybrid beamforming algorithms.

    Measurement Matrix Design for Phase Retrieval Based on Mutual Information

    Nir Shlezinger, Ron Dabora, Yonina C. Eldar
    Comments: This paper will be presented in part at the 2017 International Symposium on Information Theory
    Subjects: Information Theory (cs.IT)

    In phase retrieval problems, a signal of interest (SOI) is reconstructed
    based on the magnitude of a linear transformation of the SOI observed with
    additive noise. The linear transform is typically referred to as a measurement
    matrix. Many works on phase retrieval assume that the measurement matrix is a
    random Gaussian matrix, which, in the noiseless scenario with sufficiently many
    measurements, guarantees invertability of the transformation between the SOI
    and the observations, up to an inherent phase ambiguity. However, in many
    practical applications, the measurement matrix corresponds to an underlying
    physical setup, and is therefore deterministic, possibly with structural
    constraints. In this work we study the design of deterministic measurement
    matrices, based on maximizing the mutual information between the SOI and the
    observations. We characterize necessary conditions for the optimality of a
    measurement matrix, and analytically obtain the optimal matrix in the low SNR
    regime. Practical methods for designing general measurement matrices and masked
    Fourier measurements are proposed. Simulation tests demonstrate the performance
    gain achieved by the proposed techniques compared to random Gaussian
    measurements for various phase recovery algorithms.

    Replica Symmetry Breaking in Compressive Sensing

    Ali Bereyhi, Ralf Müller, Hermann Schulz-Baldes
    Comments: 7 pages, 3 figures, presented at ITA 2017
    Subjects: Information Theory (cs.IT)

    For noisy compressive sensing systems, the asymptotic distortion with respect
    to an arbitrary distortion function is determined when a general class of
    least-square based reconstruction schemes is employed. The sampling matrix is
    considered to belong to a large ensemble of random matrices including i.i.d.
    and projector matrices, and the source vector is assumed to be i.i.d. with a
    desired distribution. We take a statistical mechanical approach by representing
    the asymptotic distortion as a macroscopic parameter of a spin glass and
    employing the replica method for the large-system analysis. In contrast to
    earlier studies, we evaluate the general replica ansatz which includes the RS
    ansatz as well as RSB. The generality of the solution enables us to study the
    impact of symmetry breaking. Our numerical investigations depict that for the
    reconstruction scheme with the “zero-norm” penalty function, the RS fails to
    predict the asymptotic distortion for relatively large compression rates;
    however, the one-step RSB ansatz gives a valid prediction of the performance
    within a larger regime of compression rates.

    Transmit Filter and Artificial Noise Design for Secure MIMO-OFDM Systems

    Wenfei Liu, Ming Li, Xiaowen Tian, Zihuan Wang, Qian Liu
    Subjects: Information Theory (cs.IT)

    Physical layer security has been considered as an important security approach
    in wireless communications to protect legitimate transmission from passive
    eavesdroppers. This paper investigates the physical layer security of a
    wireless multiple-input multiple-output (MIMO) orthogonal frequency division
    multiplexing (OFDM) communication system in the presence of a multiple-antenna
    eavesdropper. We first propose a transmit-filter-assisted secure MIMO-OFDM
    system which can destroy the orthogonality of eavesdropper’s signals. Our
    proposed transmit filter can disturb the reception of eavesdropper while
    maintaining the quality of legitimate transmission. Then, we propose another
    artificial noise (AN)-assisted secure MIMO-OFDM system to further improve the
    security of the legitimate transmission. The time-domain AN signal is designed
    to disturb the reception of eavesdropper while the legitimate transmission will
    not be affected. Simulation results are presented to demonstrate the security
    performance of the proposed transmit filter design and AN-assisted scheme in
    the MIMO-OFDM system.

    Iterative Hybrid Precoder and Combiner Design for mmWave MIMO-OFDM Systems

    Wenfei Liu, Ming Li, Xiaowen Tian, Zihuan Wang, Qian Liu
    Subjects: Information Theory (cs.IT)

    This paper investigates the problem of hybrid precoder and combiner design
    for multiple-input multiple-output (MIMO) orthogonal frequency division
    multiplexing (OFDM) systems operating in millimeter-wave (mmWave) bands. We
    propose a novel iterative scheme to design the codebook-based analog precoder
    and combiner in forward and reverse channels. During each iteration, we apply
    compressive sensing (CS) technology to efficiently estimate the equivalent
    MIMO-OFDM mmWave channel. Then, the analog precoder or combiner is obtained
    based on the orthogonal matching pursuit (OMP) algorithm to alleviate the
    interference between different data streams as well as maximize the spectral
    efficiency. The digital precoder and combiner are finally obtained based on the
    effective baseband channel to further enhance the spectral efficiency.
    Simulation results demonstrate the proposed iterative hybrid precoder and
    combiner algorithm has significant performance advantages.

    Secure Precise Wireless Transmission with Random-Subcarrier-Selection-based Directional Modulation Transmit Antenna Array

    Feng Shu, Xiaomin Wu, Jinsong Hu, Riqing Chen, Jiangzhou Wang
    Comments: 19 pages, 5 figures
    Subjects: Information Theory (cs.IT)

    In this paper, a practical wireless transmission scheme is proposed to
    transmit confidential messages to the desired user securely and precisely by
    the joint use of multiple tools including artificial noise (AN) projection,
    phase alignment (PA)/beamforming, and random subcarrier selection (RSCS) based
    on OFDM, and directional modulation (DM), short for RSCS-OFDM-DM. This
    RSCS-OFDM-DM scheme provides an extremely low-complexity structure for the
    desired receiver and makes the secure and precise wireless transmission
    realizable in practical. For illegitimate eavesdroppers, the receive power of
    confidential messages is so weak that their receivers cannot intercept these
    confidential messages successfully once it is corrupted by AN. In such a
    scheme, the design of phase alignment/beamforming vector and AN projection
    matrix depend intimately on the desired direction angle and distance. It is
    particularly noted that the use of RSCS leads to a significant outcome that the
    receive power of confidential messages mainly concentrates on the small
    adjacent region around the desired receiver and only its power small fraction
    leaks out to the remaining large broad regions. This concept is called secure
    precise transmission. The probability density function of real-time receive
    signal-to-interference-and-noise ratio (SINR) is derived, and the average SINR
    is attained. From simulation and analysis, it follows that the proposed scheme
    actually can achieve a secure and precise wireless transmission of confidential
    messages in line-of-propagation channel and can be applied to the various
    future wireless scenarios.

    Hybrid Precoder and Combiner Design with One-Bit Quantized Phase Shifters in mmWave MIMO Systems

    Zihuan Wang, Ming Li, Xiaowen Tian, Qian Liu
    Subjects: Information Theory (cs.IT)

    Analog/digital hybrid precoder and combiner have been widely used in
    millimeter wave (mmWave) multiple-input multiple-output (MIMO) systems due to
    its energy-efficient and economic superiorities. Infinite resolution of phase
    shifters (PSs) for the analog beamformer can achieve very close performance
    compared to the full-digital scheme but will result in high complexity and
    intensive power consumption. Thus, more cost effective and energy efficient low
    resolution PSs are typically used in practical mmWave MIMO systems. In this
    paper, we consider the joint hybrid precoder and combiner design with one-bit
    quantized PSs in mmWave MIMO systems. We propose to firstly design the analog
    precoder and combiner pair for each data stream successively, aiming at
    conditionally maximizing the spectral efficiency. We present a novel binary
    analog precoder and combiner optimization algorithm under a Rank-1
    approximation of the interference-included equivalent channel with lower than
    quadratic complexity. Then the digital precoder and combiner are computed based
    on the obtained baseband effective channel to further enhance the spectral
    efficiency. Simulation results demonstrate that the proposed algorithm
    outperforms the existing one-bit PSs based hybrid beamforming scheme.

    Joint Hybrid Precoder and Combiner Design for mmWave Spatial Multiplexing Transmission

    Zihuan Wang, Ming Li, Xiaowen Tian, Qian Liu
    Subjects: Information Theory (cs.IT)

    Millimeter-wave (mmWave) communications have been considered as a key
    technology for future 5G wireless networks because of the orders-of-magnitude
    wider bandwidth than current cellular bands. In this paper, we consider the
    problem of codebook-based joint analog-digital hybrid precoder and combiner
    design for spatial multiplexing transmission in a mmWave multiple-input
    multiple-output (MIMO) system. We propose to jointly select analog precoder and
    combiner pair for each data stream successively aiming at maximizing the
    channel gain while suppressing the interference between different data streams.
    After all analog precoder/combiner pairs have been determined, we can obtain
    the effective baseband channel. Then, the digital precoder and combiner are
    computed based on the obtained effective baseband channel to further mitigate
    the interference and maximize the sum-rate. Simulation results demonstrate that
    our proposed algorithm exhibits prominent advantages in combating interference
    between different data streams and offer satisfactory performance improvement
    compared to the existing codebook-based hybrid beamforming schemes.

    Uplink performance of multi-antenna cellular networks with co-operative base stations and user-centric clustering

    Siddhartan Govindasamy, Itsik Bergel
    Subjects: Information Theory (cs.IT)

    We consider a user-centric co-operative cellular network, where base stations
    close to a mobile co-operate to detect its signal using a (joint) linear
    minimum-mean-square-error receiver. The base stations, which have multiple
    antennas, and mobiles are modeled as independent Poisson Point Processes to
    avoid dependence on specific node locations. Combining stochastic geometry and
    infinite random matrix theory, we derive a simple expression for the spectral
    efficiency of this complex system as the number of antennas grows large. This
    expression is verified by Monte Carlo simulations which support its utility for
    even a moderate number of antennas. This result reveals the influence of
    tangible system parameters such as mobile and base-station densities, number of
    antennas per base station, and number of co-operating base stations on
    achievable spectral efficiencies. For instance, we find that for a given base
    station density and a constraint on the total number of co-operating antennas,
    all co-operating antennas should be located at a single base station. On the
    other hand, in our asymptotic regime, for the same number of co-operating
    antennas, if the network is limited by the area density of antennas, then the
    number of co-operating base stations should be increased with fewer antennas
    per base station.

    From Uncoded Prefetching to Coded Prefetching in Coded Caching

    Chao Tian, Kai Zhang
    Comments: 16 pages, 4 figures
    Subjects: Information Theory (cs.IT)

    In the coded caching framework proposed by Maddah Ali and Niesen, there are
    two classes of coding schemes known in the literature, namely uncoded
    prefetching schemes and coded prefetching schemes. In this work, we provide a
    connection between the uncoded prefetching scheme proposed by Maddah Ali and
    Niesen (and its improved version by Yu et al.) and the coded prefetching scheme
    proposed by Tian and Chen, when the number of files is no larger than that of
    users. We make a critical observation that a coding component in the Tian-Chen
    scheme can be replaced by a binary code, which enables us to view the two
    schemes as the extremes of a more general scheme. The intermediate operating
    points of this general scheme can in fact provide new tradeoff points
    previously not known in the literature, however, explicit characterizing the
    performance of this general scheme appears rather difficult.

    Sub-string/Pattern Matching in Sub-linear Time Using a Sparse Fourier Transform Approach

    Nagaraj T. Janakiraman, Avinash Vem, Krishna R. Narayanan, Jean-Francois Chamberland
    Subjects: Information Theory (cs.IT); Data Structures and Algorithms (cs.DS)

    We consider the problem of querying a string (or, a database) of length (N)
    bits to determine all the locations where a substring (query) of length (M)
    appears either exactly or is within a Hamming distance of (K) from the query.
    We assume that sketches of the original signal can be computed off line and
    stored. Using the sparse Fourier transform computation based approach
    introduced by Pawar and Ramchandran, we show that all such matches can be
    determined with high probability in sub-linear time. Specifically, if the query
    length (M = O(N^mu)) and the number of matches (L=O(N^lambda)), we show that
    for (lambda < 1-mu) all the matching positions can be determined with a
    probability that approaches 1 as (N
    ightarrow infty) for (K leq
    frac{1}{6}M). More importantly our scheme has a worst-case computational
    complexity that is only (Oleft(max{N^{1-mu}log^2 N, N^{mu+lambda}log N
    }
    ight)), which means we can recover all the matching positions in {it
    sub-linear} time for (lambda<1-mu). This is a substantial improvement over
    the best known computational complexity of (Oleft(N^{1-0.359 mu}
    ight)) for
    recovering one matching position by Andoni {em et al.} cite{andoni2013shift}.
    Further, the number of Fourier transform coefficients that need to be computed,
    stored and accessed, i.e., the sketching complexity of this algorithm is only
    (Oleft(N^{1-mu}log N
    ight)). Several extensions of the main theme are also
    discussed.

    Generalized subspace subcodes with application in cryptology

    Thierry P. Berger, Cheikh Thiécoumba Gueye, Jean Belo Klamti
    Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT)

    Most of the codes that have an algebraic decoding algorithm are derived from
    the Reed Solomon codes. They are obtained by taking equivalent codes, for
    example the generalized Reed Solomon codes, or by using the so-called subfield
    subcode method, which leads to Alternant codes and Goppa codes over the
    underlying prime field, or over some intermediate subfield. The main advantages
    of these constructions is to preserve both the minimum distance and the
    decoding algorithm of the underlying Reed Solomon code. In this paper, we
    propose a generalization of the subfield subcode construction by introducing
    the notion of subspace subcodes and a generalization of the equivalence of
    codes which leads to the notion of generalized subspace subcodes. When the
    dimension of the selected subspaces is equal to one, we show that our approach
    gives exactly the family of the codes obtained by equivalence and subfield
    subcode technique. However, our approach highlights the links between the
    subfield subcode of a code defined over an extension field and the operation of
    puncturing the (q)-ary image of this code. When the dimension of the subspaces
    is greater than one, we obtain codes whose alphabet is no longer a finite
    field, but a set of r-uples. We explain why these codes are practically as
    efficient for applications as the codes defined on an extension of degree r. In
    addition, they make it possible to obtain decodable codes over a large alphabet
    having parameters previously inaccessible. As an application, we give some
    examples that can be used in public key cryptosystems such as McEliece.

    Video Streaming in Distributed Erasure-coded Storage Systems: Stall Duration Analysis

    Abubakr O. Al-Abbasi, Vaneet Aggarwal
    Comments: 15 pages, 13 figures
    Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT); Multimedia (cs.MM)

    The demand for global video has been burgeoning across industries. With the
    expansion and improvement of video streaming services, cloud-based video is
    evolving into a necessary feature of any successful business for reaching
    internal and external audiences. This paper considers video streaming over
    distributed systems where the video segments are encoded using an erasure code
    for better reliability thus being the first work to our best knowledge that
    considers video streaming over erasure-coded distributed cloud systems. The
    download time of each coded chunk of each video segment is characterized and
    ordered statistics over the choice of the erasure-coded chunks is used to
    obtain the playback time of different video segments. Using the playback times,
    bounds on the moment generating function on the stall duration is used to bound
    the mean stall duration. Moment generating function based bounds on the ordered
    statistics are also used to bound the stall duration tail probability which
    determines the probability that the stall time is greater than a pre-defined
    number. These two metrics, mean stall duration and the stall duration tail
    probability, are important quality of experience (QoE) measures for the end
    users. Based on these metrics, we formulate an optimization problem to jointly
    minimize the convex combination of both the QoE metrics averaged over all
    requests over the placement and access of the video content. The non-convex
    problem is solved using an efficient iterative algorithm. Numerical results
    show significant improvement in QoE metrics for cloud-based video as compared
    to the considered baselines.




沪ICP备19023445号-2号
友情链接