
    arXiv Paper Daily: Tue, 25 Apr 2017

    我爱机器学习 (52ml.net), published 2017-04-25 00:00:00

    Neural and Evolutionary Computing

    Reinforcement Learning Based Dynamic Selection of Auxiliary Objectives with Preserving of the Best Found Solution

    Irina Petrova, Arina Buzdalova
    Comments: this is a full version of a paper which has been accepted as a student workshop paper to GECCO conference 2017
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    The efficiency of single-objective optimization can be improved by
    introducing some auxiliary objectives. Ideally, auxiliary objectives should
    be helpful. However, in practice, objectives may be efficient at some
    optimization stages but obstructive at others. In this paper, we propose a
    modification of the EA+RL method which dynamically selects optimized
    objectives using reinforcement learning. The proposed modification prevents
    losing the best found solution. We analysed the proposed modification and
    compared it with the EA+RL method and Random Local Search on the XdivK,
    Generalized OneMax and LeadingOnes problems. The proposed modification
    outperforms the EA+RL method on all problem instances. It also outperforms
    the single-objective approach on most problem instances. We also provide a
    detailed analysis of how different components of the considered algorithms
    influence the efficiency of optimization. In addition, we present a
    theoretical analysis of the proposed modification on the XdivK problem.
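
    The core loop is simple enough to sketch. Below is a minimal, hypothetical
    illustration (not the authors' implementation): an epsilon-greedy agent
    chooses which objective a (1+1) evolutionary algorithm optimizes at each
    step, and the best solution under the target objective is archived so it
    can never be lost. The choice of objectives, the reward definition and all
    parameters are illustrative assumptions.

    import random

    def onemax(x):        # target objective: number of ones
        return sum(x)

    def leading_ones(x):  # auxiliary objective (illustrative choice)
        n = 0
        for b in x:
            if b == 0:
                break
            n += 1
        return n

    objectives = [onemax, leading_ones]
    q = [0.0, 0.0]                       # value estimate per objective
    n_bits, eps, alpha = 30, 0.1, 0.5
    x = [random.randint(0, 1) for _ in range(n_bits)]
    best = list(x)                       # elitist archive: best found solution

    for step in range(5000):
        # epsilon-greedy selection of the objective to optimize this step
        k = random.randrange(2) if random.random() < eps else q.index(max(q))
        f = objectives[k]
        y = [b ^ (random.random() < 1.0 / n_bits) for b in x]  # bit-flip mutation
        if f(y) >= f(x):
            x = y
        # reward: improvement of the *target* objective over the archived best
        r = onemax(x) - onemax(best)
        q[k] += alpha * (r - q[k])
        if onemax(x) > onemax(best):     # preservation of the best found solution
            best = list(x)

    print(onemax(best))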

    Elite Bases Regression: A Real-time Algorithm for Symbolic Regression

    Chen Chen, Changtong Luo, Zonglin Jiang
    Subjects: Data Structures and Algorithms (cs.DS); Neural and Evolutionary Computing (cs.NE)

    Symbolic regression is an important but challenging research topic in data
    mining, as it can uncover the underlying mathematical model of the data.
    Genetic programming (GP) is one of the most popular methods for symbolic
    regression. However, its convergence speed might be too slow for
    large-scale problems with a large number of variables. This drawback has
    become a bottleneck in practical applications. In this paper, a new
    non-evolutionary real-time algorithm for symbolic regression, Elite Bases
    Regression (EBR), is proposed. EBR generates a set of candidate basis
    functions coded with a parse-matrix under specific mapping rules.
    Meanwhile, a certain number of elite bases are preserved and updated
    iteratively according to their correlation coefficients with respect to
    the target model. The regression model is then spanned by the elite bases.
    A comparative study between EBR and a recently proposed machine learning
    method for symbolic regression, Fast Function eXtraction (FFX), is
    conducted. Numerical results indicate that EBR can solve symbolic
    regression problems more effectively.
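
    A rough sketch of this select-and-span idea follows. It is hypothetical
    and simplified: instead of the paper's parse-matrix encoding, candidate
    bases here are random compositions of a few primitives, ranked by absolute
    correlation with the target, and the surviving elites are combined by
    ordinary least squares.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-2, 2, 200)
    y = 1.5 * np.sin(x) + 0.5 * x ** 2        # hidden target model

    primitives = [np.sin, np.cos, np.tanh,
                  lambda v: v, lambda v: v ** 2, np.abs]

    def random_basis():
        f = primitives[rng.integers(len(primitives))]
        a = rng.uniform(-3, 3)                # random inner scaling
        return lambda v: f(a * v)

    elites, n_elite = [], 8
    for it in range(300):                     # iteratively refresh the elites
        candidates = elites + [random_basis() for _ in range(20)]
        scored = []
        for b in candidates:                  # rank by |correlation| with y
            c = abs(np.corrcoef(b(x), y)[0, 1])
            if np.isfinite(c):
                scored.append((c, b))
        scored.sort(key=lambda t: -t[0])
        elites = [b for _, b in scored[:n_elite]]

    # the regression model is spanned by the elite bases (plus a bias term)
    A = np.column_stack([b(x) for b in elites] + [np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    print(np.linalg.norm(A @ coef - y))       # residual of the fitted model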

    Semi-supervised Multitask Learning for Sequence Labeling

    Marek Rei
    Comments: ACL 2017
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We propose a sequence labeling framework with a secondary training objective,
    learning to predict surrounding words for every word in the dataset. This
    language modeling objective incentivises the system to learn general-purpose
    patterns of semantic and syntactic composition, which are also useful for
    improving accuracy on different sequence labeling tasks. The architecture was
    evaluated on a range of datasets, covering the tasks of error detection in
    learner texts, named entity recognition, chunking and POS-tagging. The novel
    language modeling objective provided consistent performance improvements on
    every benchmark, without requiring any additional annotated or unannotated
    data.
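
    Concretely, the combined objective is just the tagging loss plus a
    weighted language-modeling loss from two extra output heads (predicting
    the next and the previous word). A minimal sketch with random stand-in
    logits and an assumed weight gamma; the paper's actual encoder and heads
    are not reproduced here.

    import numpy as np

    def xent(logits, targets):
        # mean cross-entropy from unnormalized scores
        z = logits - logits.max(axis=-1, keepdims=True)
        logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
        return -logp[np.arange(len(targets)), targets].mean()

    T, n_tags, n_words = 6, 5, 100
    rng = np.random.default_rng(0)
    tag_logits = rng.normal(size=(T, n_tags))    # tagging head, one per token
    next_logits = rng.normal(size=(T, n_words))  # head predicting next word
    prev_logits = rng.normal(size=(T, n_words))  # head predicting previous word
    tags = rng.integers(n_tags, size=T)
    words = rng.integers(n_words, size=T + 2)    # padded word id sequence

    gamma = 0.1  # weight of the auxiliary language-modeling objective
    loss = (xent(tag_logits, tags)
            + gamma * xent(next_logits, words[2:])   # word to the right
            + gamma * xent(prev_logits, words[:T]))  # word to the left
    print(loss)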

    k-FFNN: A priori knowledge infused Feed-forward Neural Networks

    Sri Harsha Dumpala, Rupayan Chakraborty, Sunil Kumar Kopparapu
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Recurrent neural networks (RNNs) are used extensively in preference to
    feed-forward neural networks (FFNNs) because of their inherent capability
    to capture the temporal relationships that exist in sequential data such
    as speech. This aspect of RNNs is advantageous especially when there is no
    a priori knowledge about the temporal correlations within the data.
    However, RNNs require large amounts of data to learn these temporal
    correlations, limiting their advantage in low-resource scenarios. It is
    not immediately clear (a) how a priori temporal knowledge can be used in
    an FFNN architecture and (b) how an FFNN performs when provided with this
    knowledge about temporal correlations (assuming it is available) during
    training. The objective of this paper is to explore k-FFNN, namely an
    FFNN architecture that can incorporate a priori knowledge of the temporal
    relationships within the data sequence during training, and to compare
    k-FFNN performance with RNNs in a low-resource scenario. We evaluate the
    performance of k-FFNN and RNN by extensive experimentation on MediaEval
    2016 audio data (the “Emotional Impact of Movies” task). Experimental
    results show that the performance of k-FFNN is comparable to RNN, and in
    some scenarios k-FFNN performs better than RNN when temporal knowledge is
    injected into the FFNN architecture. The main contributions of this paper
    are (a) fusing a priori knowledge into an FFNN architecture to construct a
    k-FFNN and (b) analyzing the performance of k-FFNN with respect to RNN for
    different sizes of training data.

    Differentiable Scheduled Sampling for Credit Assignment

    Kartik Goyal, Chris Dyer, Taylor Berg-Kirkpatrick
    Comments: Accepted at ACL2017 (this http URL)
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We demonstrate that a continuous relaxation of the argmax operation can be
    used to create a differentiable approximation to greedy decoding for
    sequence-to-sequence (seq2seq) models. By incorporating this approximation into
    the scheduled sampling training procedure (Bengio et al., 2015), a well-known
    technique for correcting exposure bias, we introduce a new training objective
    that is continuous and differentiable everywhere and that can provide
    informative gradients near points where previous decoding decisions change
    their value. In addition, by using a related approximation, we demonstrate a
    similar approach to sampled-based training. Finally, we show that our approach
    outperforms cross-entropy training and scheduled sampling procedures in two
    sequence prediction tasks: named entity recognition and machine translation.
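
    The relaxation itself is compact: replace the hard argmax embedding lookup
    at each decoding step with a peaked-softmax mixture of embeddings, which
    approaches greedy decoding as the temperature goes to zero. A toy sketch
    of that single step (the embedding matrix and temperatures below are
    illustrative, not the paper's setup):

    import numpy as np

    rng = np.random.default_rng(0)
    V, D = 10, 4
    E = rng.normal(size=(V, D))        # token embedding matrix

    def soft_greedy_input(logits, tau):
        # peaked softmax: continuous, differentiable relaxation of argmax
        z = (logits - logits.max()) / tau
        p = np.exp(z) / np.exp(z).sum()
        return p @ E                    # soft mixture of embeddings

    logits = rng.normal(size=V)
    hard = E[np.argmax(logits)]         # what greedy decoding would feed next
    for tau in (1.0, 0.1, 0.01):
        gap = np.linalg.norm(soft_greedy_input(logits, tau) - hard)
        print(tau, gap)                 # gap shrinks as tau -> 0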

    Translating Neuralese

    Jacob Andreas, Anca Dragan, Dan Klein
    Comments: To appear in ACL 2017
    Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    Several approaches have recently been proposed for learning decentralized
    deep multiagent policies that coordinate via a differentiable communication
    channel. While these policies are effective for many tasks, interpretation of
    their induced communication strategies has remained a challenge. Here we
    propose to interpret agents’ messages by translating them. Unlike in typical
    machine translation problems, we have no parallel data to learn from. Instead
    we develop a translation model based on the insight that agent messages and
    natural language strings mean the same thing if they induce the same belief
    about the world in a listener. We present theoretical guarantees and empirical
    evidence that our approach preserves both the semantics and pragmatics of
    messages by ensuring that players communicating through a translation layer do
    not suffer a substantial loss in reward relative to players with a common
    language.
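
    The translation criterion can be stated in a few lines: translate a
    message to the natural-language string whose induced listener belief is
    closest. A toy sketch with made-up belief distributions; the paper learns
    these beliefs from data, so everything below is purely illustrative.

    import numpy as np

    # listener beliefs over three world states, induced by messages/strings
    beliefs_msg = {"m1": np.array([0.8, 0.1, 0.1]),
                   "m2": np.array([0.2, 0.3, 0.5])}
    beliefs_str = {"the goal is left": np.array([0.75, 0.15, 0.10]),
                   "the goal is right": np.array([0.15, 0.25, 0.60]),
                   "no idea": np.array([1 / 3, 1 / 3, 1 / 3])}

    def kl(p, q):
        return float(np.sum(p * np.log(p / q)))

    def translate(m):
        # same meaning = same induced belief, so pick the closest belief
        return min(beliefs_str, key=lambda s: kl(beliefs_msg[m], beliefs_str[s]))

    for m in beliefs_msg:
        print(m, "->", translate(m))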

    Population Seeding Techniques for Rolling Horizon Evolution in General Video Game Playing

    Rauca D. Gaina, Simon M. Lucas, Diego Perez-Liebana
    Comments: Proceedings of the IEEE Conference on Evolutionary Computation 2017
    Subjects: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    While Monte Carlo Tree Search and closely related methods have dominated
    General Video Game Playing, recent research has demonstrated the promise of
    Rolling Horizon Evolutionary Algorithms as an interesting alternative. However,
    little attention has been paid to population initialization techniques in the
    setting of general real-time video games. Therefore, this paper proposes the
    use of population seeding to improve the performance of Rolling Horizon
    Evolution and presents the results of two methods, One Step Look Ahead and
    Monte Carlo Tree Search, tested on 20 games of the General Video Game AI corpus
    with multiple evolution parameter values (population size and individual
    length). An in-depth analysis is carried out between the results of the seeding
    methods and the vanilla Rolling Horizon Evolution. In addition, the paper
    presents a comparison to a Monte Carlo Tree Search algorithm. The results are
    promising, with seeding able to boost performance significantly over baseline
    evolution and even match the high level of play obtained by the Monte Carlo
    Tree Search.

    A hybrid primal heuristic for Robust Multiperiod Network Design

    Fabio D'Andreagiovanni, Jonatan Krolikowski, Jonad Pulaj
    Comments: This is the authors’ final version of the paper published in: Esparcia-Alcázar A., Mora A. (eds), EvoApplications 2014: Applications of Evolutionary Computation, LNCS 8602, pp. 15-26, 2014. DOI: 10.1007/978-3-662-45523-4_2. The final publication is available at Springer via this http URL arXiv admin note: substantial text overlap with arXiv:1410.5850
    Journal-ref: Esparcia-Alcázar A., Mora A. (Eds), EvoApplications 2014:
    Applications of Evolutionary Computation, Springer LNCS vol. 8602, pp. 15-26,
    2014
    Subjects: Optimization and Control (math.OC); Data Structures and Algorithms (cs.DS); Neural and Evolutionary Computing (cs.NE); Networking and Internet Architecture (cs.NI)

    We investigate the Robust Multiperiod Network Design Problem, a
    generalization of the classical Capacitated Network Design Problem that
    additionally considers multiple design periods and provides solutions protected
    against traffic uncertainty. Given the intrinsic difficulty of the problem,
    which proves challenging even for state-of-the-art commercial solvers, we
    propose a hybrid primal heuristic based on the combination of ant colony
    optimization and an exact large neighborhood search. Computational experiments
    on a set of realistic instances from the SNDlib show that our heuristic can
    find solutions of extremely good quality with low optimality gap.

    A dynamic resource allocation decision model for IT security

    Lotfi Hajjem, Salah Benabdallah, Fouad Ben Abdelaziz
    Comments: 6 pages, 2010 Second International Conference on Engineering System Management and Applications
    Subjects: Cryptography and Security (cs.CR); Neural and Evolutionary Computing (cs.NE)

    Today, with the continued growth in the use of information and
    communication technologies (ICT) for business purposes, business
    organizations are becoming increasingly dependent on their information
    systems. Thus, they need to protect them from the different attacks that
    exploit their vulnerabilities. To do so, an organization has to use
    security technologies, which may be proactive or reactive. Each security
    technology has a relative cost and addresses specific vulnerabilities.
    Therefore, the organization has to put in place the set of security
    technologies that minimizes the information system's vulnerabilities at
    minimal cost. This bi-objective problem can be considered a resource
    allocation problem (RAP) where security technologies represent the
    resources to be allocated. However, the set of vulnerabilities may change
    periodically with the continual appearance of new ones. Therefore, the set
    of security technologies should be flexible enough to face these changes
    in real time, and the problem becomes a dynamic one. In this paper, we
    propose a harmony search based algorithm to solve the bi-objective dynamic
    resource allocation decision model. This approach was compared to a
    genetic algorithm and provided good results.
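
    To make the mechanics concrete, here is a minimal harmony search sketch
    for a scalarized version of such an allocation problem. It is a toy: the
    cost/coverage data are random, the bi-objective is collapsed into a single
    penalized objective, and the parameters (harmony memory size, HMCR, PAR)
    are ordinary textbook defaults, not the authors' settings.

    import numpy as np

    rng = np.random.default_rng(0)
    n_tech = 12
    cost = rng.uniform(1, 5, n_tech)           # cost of each security technology
    covers = rng.random((n_tech, 20)) < 0.25   # vulnerabilities each one covers

    def f(x):
        # scalarized bi-objective: total cost + penalty per uncovered vulnerability
        sel = x.astype(bool)
        uncovered = 20 if not sel.any() else int((~covers[sel].any(axis=0)).sum())
        return cost @ x + 3.0 * uncovered

    hm_size, hmcr, par = 10, 0.9, 0.3
    hm = rng.integers(0, 2, (hm_size, n_tech)).astype(float)  # harmony memory
    fit = np.array([f(x) for x in hm])

    for it in range(2000):
        new = np.empty(n_tech)
        for j in range(n_tech):
            if rng.random() < hmcr:            # memory consideration
                new[j] = hm[rng.integers(hm_size), j]
                if rng.random() < par:         # pitch adjustment: flip the bit
                    new[j] = 1 - new[j]
            else:                              # random selection
                new[j] = float(rng.integers(0, 2))
        w = fit.argmax()                       # replace the worst harmony
        fn = f(new)
        if fn < fit[w]:
            hm[w], fit[w] = new, fn

    print(fit.min())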

    A hybrid exact-ACO algorithm for the joint scheduling, power and cluster assignment in cooperative wireless networks

    Fabio D'Andreagiovanni
    Comments: This is the author’s final version of the paper published in G. Di Caro, G. Theraulaz (eds.), BIONETICS 2012: Bio-Inspired Models of Network, Information, and Computing Systems. LNICST, vol. 134, pp. 3-17. Springer, Heidelberg, 2014, DOI: 10.1007/978-3-319-06944-9_1). The final publication is available at Springer via this http URL
    Journal-ref: G. Di Caro, G. Theraulaz (eds.), BIONETICS 2012: Bio-Inspired
    Models of Network, Information, and Computing Systems. LNICST, vol. 134, pp.
    3-17, Springer, Heidelberg, 2014, DOI: 10.1007/978-3-319-06944-9_1
    Subjects: Optimization and Control (math.OC); Data Structures and Algorithms (cs.DS); Neural and Evolutionary Computing (cs.NE); Networking and Internet Architecture (cs.NI)

    Base station cooperation (BSC) has recently arisen as a promising way to
    increase the capacity of a wireless network. Implementing BSC adds a new design
    dimension to the classical wireless network design problem: how to define the
    subset of base stations (clusters) that coordinate to serve a user. Though the
    problem of forming clusters has been extensively discussed from a technical
    point of view, there is still a lack of effective optimization models for its
    representation and algorithms for its solution. In this work, we make a further
    step towards filling this gap: 1) we generalize the classical network design
    problem by adding cooperation as an additional decision dimension; 2) we
    develop a strong formulation for the resulting problem; 3) we define a new
    hybrid solution algorithm that combines exact large neighborhood search and ant
    colony optimization. Finally, we assess the performance of our new model and
    algorithm on a set of realistic instances of a WiMAX network.


    Computer Vision and Pattern Recognition

    Accelerated Nearest Neighbor Search with Quick ADC

    Fabien André (Technicolor), Anne-Marie Kermarrec (Inria), Nicolas Le Scouarnec (Technicolor)
    Comments: 8 pages, 5 figures, published in Proceedings of ICMR’17, Bucharest, Romania, June 06-09, 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB); Information Retrieval (cs.IR); Multimedia (cs.MM); Performance (cs.PF)

    Efficient Nearest Neighbor (NN) search in high-dimensional spaces is a
    foundation of many multimedia retrieval systems. Because it offers low
    response times, Product Quantization (PQ) is a popular solution. PQ compresses
    high-dimensional vectors into short codes using several sub-quantizers, which
    enables in-RAM storage of large databases. This allows fast answers to NN
    queries, without accessing the SSD or HDD. The key feature of PQ is that it can
    compute distances between short codes and high-dimensional vectors using
    cache-resident lookup tables. The efficiency of this technique, named
    Asymmetric Distance Computation (ADC), remains limited because it performs many
    cache accesses.

    In this paper, we introduce Quick ADC, a novel technique that achieves a 3 to
    6 times speedup over ADC by exploiting Single Instruction Multiple Data (SIMD)
    units available in current CPUs. Efficiently exploiting SIMD requires
    algorithmic changes to the ADC procedure. Namely, Quick ADC relies on two key
    modifications of ADC: (i) the use of 4-bit sub-quantizers instead of the standard
    8-bit sub-quantizers and (ii) the quantization of floating-point distances.
    This allows Quick ADC to exceed the performance of state-of-the-art systems,
    e.g., it achieves a Recall@100 of 0.94 in 3.4 ms on 1 billion SIFT descriptors
    (128-bit codes).
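
    For reference, the table-lookup core of standard ADC fits in a few lines.
    The sketch below (NumPy, illustrative sizes) builds per-sub-quantizer
    lookup tables for a query and scans 8-bit PQ codes; Quick ADC's actual
    contribution, 4-bit codes whose tables fit in SIMD registers plus
    quantized distances, is not shown here.

    import numpy as np

    rng = np.random.default_rng(0)
    M, K, d_sub = 8, 256, 16          # 8 sub-quantizers, 8-bit codes, 16 dims each
    codebooks = rng.normal(size=(M, K, d_sub))
    db_codes = rng.integers(0, K, size=(1000, M))   # compressed database

    def adc_scan(query):
        # build M lookup tables of squared distances query-part -> centroid
        q = query.reshape(M, d_sub)
        luts = ((codebooks - q[:, None, :]) ** 2).sum(axis=2)  # shape (M, K)
        # asymmetric distance to each code = sum of M table lookups
        return luts[np.arange(M)[:, None], db_codes.T].sum(axis=0)

    query = rng.normal(size=M * d_sub)
    print(adc_scan(query).argmin())   # index of the approximate nearest neighbor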

    Detecting and Recognizing Human-Object Interactions

    Georgia Gkioxari, Ross Girshick, Piotr Dollár, Kaiming He
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    To understand the visual world, a machine must not only recognize individual
    object instances but also how they interact. Humans are often at the center of
    such interactions and detecting human-object interactions is an important
    practical and scientific problem. In this paper, we address the task of
    detecting <human, verb, object> triplets in challenging everyday photos. We
    propose a novel model that is driven by a human-centric approach. Our
    hypothesis is that the appearance of a person — their pose, clothing, action
    — is a powerful cue for localizing the objects they are interacting with. To
    exploit this cue, our model learns to predict an action-specific density over
    target object locations based on the appearance of a detected person. Our model
    also jointly learns to detect people and objects, and by fusing these
    predictions it efficiently infers interaction triplets in a clean, jointly
    trained end-to-end system we call Interact. We validate our approach on the
    recently introduced Verbs in COCO (V-COCO) dataset, where we show qualitatively
    and quantitatively compelling results.

    Accurate Optical Flow via Direct Cost Volume Processing

    Jia Xu, René Ranftl, Vladlen Koltun
    Comments: Published at the Conference on Computer Vision and Pattern Recognition (CVPR 2017)
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    We present an optical flow estimation approach that operates on the full
    four-dimensional cost volume. This direct approach shares the structural
    benefits of leading stereo matching pipelines, which are known to yield high
    accuracy. To this day, such approaches have been considered impractical due to
    the size of the cost volume. We show that the full four-dimensional cost volume
    can be constructed in a fraction of a second due to its regularity. We then
    exploit this regularity further by adapting semi-global matching to the
    four-dimensional setting. This yields a pipeline that achieves significantly
    higher accuracy than state-of-the-art optical flow methods while being faster
    than most. Our approach outperforms all published general-purpose optical flow
    methods on both Sintel and KITTI 2015 benchmarks.
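
    As a sketch of the data structure involved, the following toy NumPy code
    builds a full 4D cost volume from two per-pixel feature maps and reads off
    a winner-take-all flow; the regular, shift-based access pattern is what
    makes the construction fast, while the semi-global matching step of the
    paper is omitted. All sizes are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    H, W, C, R = 32, 32, 8, 4          # R = maximum displacement per axis
    f1 = rng.normal(size=(H, W, C))    # feature maps of the two frames
    f2 = rng.normal(size=(H, W, C))
    f2p = np.pad(f2, ((R, R), (R, R), (0, 0)))

    D = 2 * R + 1
    cost = np.empty((H, W, D, D), dtype=np.float32)   # full 4D cost volume
    for v in range(D):
        for u in range(D):
            shifted = f2p[v:v + H, u:u + W]    # frame 2 displaced by (v-R, u-R)
            cost[:, :, v, u] = -(f1 * shifted).sum(axis=2)  # -correlation

    best = cost.reshape(H, W, -1).argmin(axis=2)
    flow = np.stack(np.unravel_index(best, (D, D)), axis=-1) - R
    print(flow.shape)                  # (H, W, 2) winner-take-all flow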

    Detection, Recognition and Tracking of Moving Objects from Real-time Video via SP Theory of Intelligence and Species Inspired PSO

    Kumar S Ray, Sayandip Dutta, Anit Chakraborty
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we address the basic problem of recognizing moving objects
    in video images using the SP Theory of Intelligence. The SP Theory of
    Intelligence, a framework for artificial intelligence, was first
    introduced by Gerard J Wolff, where S stands for Simplicity and P stands
    for Power. Using the concept of multiple alignment, we detect and
    recognize the object of interest in video frames with multilevel
    hierarchical parts and subparts, based on polythetic categories. We track
    the recognized objects using species-based Particle Swarm Optimization
    (PSO). First, we extract the multiple alignments of our object of interest
    from training images. In order to recognize accurately and handle
    occlusion, we use polythetic concepts on the raw data line to omit
    redundant noise, searching for the best alignment representing the
    features among the extracted alignments. We recognize the domain of
    interest from the video scenes in the form of a wide variety of multiple
    alignments to handle scene variability. Unsupervised learning is done in
    the SP model following the DONSVIC principle, and natural structures are
    discovered via information compression and pattern analysis. After
    successful recognition of objects, we use the species-based PSO algorithm,
    as the alignments of our object of interest are analogous to the
    observation likelihood and fitness ability of species. Subsequently, we
    analyze the competition and repulsion among species with annealed
    Gaussian based PSO. We have tested our algorithms on the David, Walking2,
    FaceOcc1, Jogging and Dudek sequences, obtaining very satisfactory and
    competitive results.

    A Real-time Hand Gesture Recognition and Human-Computer Interaction System

    Pei Xu
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this project, we design a real-time human-computer interaction system
    based on hand gestures. The whole system consists of three components:
    hand detection, gesture recognition, and human-computer interaction (HCI)
    based on the recognition; it realizes robust control of mouse and keyboard
    events with high gesture-recognition accuracy. Specifically, we use a
    convolutional neural network (CNN) to recognize gestures, making it
    possible to identify relatively complex gestures using only one cheap
    monocular camera. We introduce the Kalman filter to estimate the hand
    position, based on which mouse cursor control is realized in a stable and
    smooth way. During the HCI stage, we develop a simple strategy to avoid
    false recognition caused by noise (mostly transient, false gestures) and
    thus improve the reliability of interaction. The developed system is
    highly extendable and can be used in human-robotic or other human-machine
    interaction scenarios with more complex command formats than just mouse
    and keyboard events.
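
    The cursor-smoothing step is a standard constant-velocity Kalman filter.
    A self-contained sketch is below; the frame rate, noise covariances and
    the simulated detections are assumptions for illustration, not the
    project's tuned values.

    import numpy as np

    dt = 1.0 / 30                                  # assumed 30 fps camera
    F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                  [0, 0, 1, 0], [0, 0, 0, 1.]])    # constant-velocity model
    H = np.array([[1, 0, 0, 0], [0, 1, 0, 0.]])    # we only measure (x, y)
    Q = np.eye(4) * 1e-3                           # process noise (assumed)
    R = np.eye(2) * 2e-2                           # detection noise (assumed)

    x, P = np.zeros(4), np.eye(4)                  # state: [x, y, vx, vy]

    def kalman_step(x, P, z):
        x, P = F @ x, F @ P @ F.T + Q              # predict
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)             # Kalman gain
        x = x + K @ (z - H @ x)                    # update with detection z
        P = (np.eye(4) - K @ H) @ P
        return x, P

    rng = np.random.default_rng(0)
    for t in range(60):                            # noisy hand detections
        z = np.array([t * 0.01, 0.5]) + rng.normal(scale=0.05, size=2)
        x, P = kalman_step(x, P, z)
    print(x[:2])                                   # smoothed cursor position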

    Measuring the Accuracy of Object Detectors and Trackers

    Tobias Bottger, Patrick Follmann, Michael Fauser
    Comments: 10 pages, 7 Figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The accuracy of object detectors and trackers is most commonly evaluated by
    the Intersection over Union (IoU) criterion. To date, most approaches are
    restricted to axis-aligned or oriented boxes and, as a consequence, many
    datasets are only labeled with boxes. Nevertheless, axis-aligned or oriented
    boxes cannot accurately capture an object’s shape. To address this, a number of
    densely segmented datasets have started to emerge in both the object detection
    and the object tracking communities. However, evaluating the accuracy of object
    detectors and trackers that are restricted to boxes on densely segmented data
    is not straightforward. To close this gap, we introduce the relative
    Intersection over Union (rIoU) accuracy measure. The measure normalizes the IoU
    with the optimal box for the segmentation to generate an accuracy measure that
    ranges between 0 and 1 and allows a more precise measurement of accuracies.
    Furthermore, it enables an efficient and easy way to understand scenes and the
    strengths and weaknesses of an object detection or tracking approach. We
    show how the new measure can be efficiently calculated and present an
    easy-to-use evaluation framework. The framework is tested on the DAVIS and the
    VOT2016 segmentations and has been made available to the community.
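
    The measure itself is easy to state: rIoU divides the IoU of the evaluated
    box by the IoU of the optimal box for that segmentation, so a box-limited
    method can still reach 1.0. A toy sketch, where the optimal box is
    approximated by brute force over a small candidate family rather than the
    paper's exact computation:

    import numpy as np

    def iou_box_mask(box, mask):
        # IoU between an axis-aligned box (y0, x0, y1, x1) and a binary mask
        y0, x0, y1, x1 = box
        box_area = (y1 - y0) * (x1 - x0)
        inter = int(mask[y0:y1, x0:x1].sum())
        return inter / (box_area + int(mask.sum()) - inter)

    def riou(box, mask, candidates):
        # normalize by the best IoU achievable with a box on this mask
        best = max(iou_box_mask(b, mask) for b in candidates)
        return iou_box_mask(box, mask) / best

    yy, xx = np.mgrid[:60, :60]
    mask = (yy - 30) ** 2 + (xx - 30) ** 2 < 15 ** 2   # disc-shaped object

    # brute-force stand-in for the paper's optimal-box computation
    cands = [(30 - r, 30 - r, 30 + r, 30 + r) for r in range(5, 30)]
    print(riou((15, 15, 45, 45), mask, cands))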

    Fast PET reconstruction using Multi-scale Fully Convolutional Neural Networks

    Jieqing Jiao, Sebastien Ourselin
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Reconstruction of PET images is an ill-posed inverse problem and often
    requires iterative algorithms to achieve good image quality for reliable
    clinical use in practice, at huge computational costs. In this paper, we
    consider PET reconstruction as a dense prediction problem where large-scale
    contextual information is essential, and propose a novel architecture of
    multi-scale fully convolutional neural networks (msfCNN) for fast PET image
    reconstruction. The proposed msfCNN gains large receptive fields with both
    memory and computational efficiency, by using a downscaling-upscaling structure
    and dilated convolutions. Instead of pooling and deconvolution, we propose to
    use the periodic shuffling operation from sub-pixel convolution and its inverse
    to scale the size of feature maps without losing resolution. Residual
    connections were added to improve training. We trained the proposed msfCNN
    model with simulated data, and applied it to clinical PET data acquired on a
    Siemens mMR scanner. The results from real oncological and neurodegenerative
    cases show that the proposed msfCNN-based reconstruction outperforms the
    iterative approaches in terms of computational time while achieving comparable
    image quality for quantification. The proposed msfCNN model can be applied to
    other dense prediction tasks, and fast msfCNN-based PET reconstruction could
    facilitate the potential use of molecular imaging in interventional/surgical
    procedures, where cancer surgery can particularly benefit.
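
    The periodic shuffling operation mentioned above is just a lossless
    reshape between spatial resolution and channels. A minimal NumPy sketch of
    the pair of operations (often called space-to-depth and its inverse, as in
    sub-pixel convolution); the surrounding network is of course not shown:

    import numpy as np

    def space_to_depth(x, r):
        # downscale H and W by r without losing resolution: pack into channels
        h, w, c = x.shape
        x = x.reshape(h // r, r, w // r, r, c)
        return x.transpose(0, 2, 1, 3, 4).reshape(h // r, w // r, r * r * c)

    def depth_to_space(x, r):
        # inverse periodic shuffling: unpack channels back into pixels
        h, w, c = x.shape
        x = x.reshape(h, w, r, r, c // (r * r))
        return x.transpose(0, 2, 1, 3, 4).reshape(h * r, w * r, c // (r * r))

    x = np.arange(4 * 4 * 1, dtype=float).reshape(4, 4, 1)
    assert np.allclose(depth_to_space(space_to_depth(x, 2), 2), x)  # lossless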

    Supervised Adversarial Networks for Image Saliency Detection

    Hengyue Pan, Hui Jiang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In the past few years, the Generative Adversarial Network (GAN) has become
    a prevalent research topic. By defining two convolutional neural networks
    (G-Network and D-Network) and introducing an adversarial procedure between
    them during the training process, GANs have the ability to generate
    good-quality images that look like natural images from a random vector.
    Besides image generation, GANs may have the potential to deal with a wide
    range of real-world problems. In this paper, we follow the basic idea of
    GAN and propose a novel model for image saliency detection, called
    Supervised Adversarial Networks (SAN). Specifically, SAN also trains two
    models simultaneously: the G-Network takes natural images as inputs and
    generates corresponding saliency maps (synthetic saliency maps), and the
    D-Network is trained to determine whether a sample is a synthetic saliency
    map or a ground-truth saliency map. However, different from GAN, the
    proposed method uses fully supervised learning to learn both the G-Network
    and the D-Network by applying the class labels of the training set.
    Moreover, a novel kind of layer called the conv-comparison layer is
    introduced into the D-Network to further improve saliency performance by
    forcing the high-level features of synthetic saliency maps and ground
    truths to be as similar as possible. Experimental results on the Pascal
    VOC 2012 database show that the SAN model can generate high-quality
    saliency maps for many complicated natural images.

    Automatic Liver Lesion Segmentation Using A Deep Convolutional Neural Network Method

    Xiao Han
    Comments: Submission for ISBI’2017 LiTS Challenge ISIC2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Liver lesion segmentation is an important step for liver cancer diagnosis,
    treatment planning and treatment evaluation. LiTS (Liver Tumor Segmentation
    Challenge) provides a common testbed for comparing different automatic liver
    lesion segmentation methods. We participate in this challenge by developing a
    deep convolutional neural network (DCNN) method. The particular DCNN model
    works in 2.5D in that it takes a stack of adjacent slices as input and produces
    the segmentation map corresponding to the center slice. The model has 32 layers
    in total and makes use of both long range concatenation connections of U-Net
    [1] and short-range residual connections from ResNet [2]. The model was trained
    using the 130 LiTS training datasets and achieved an average Dice score of 0.67
    when evaluated on the 70 test CT scans, which ranked first for the LiTS
    challenge at the time of the ISBI 2017 conference.

    Monocular Visual Odometry with a Rolling Shutter Camera

    Chang-Ryeol Lee, Kuk-Jin Yoon
    Comments: 14 pages, 8 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Rolling Shutter (RS) cameras have become popular because of their low-cost
    imaging capability. However, RS cameras suffer from undesirable artifacts
    when the camera or the subject is moving, or when the illumination
    changes. For that reason, Monocular Visual Odometry (MVO) with RS cameras
    produces inaccurate ego-motion estimates. Previous works solve this RS
    distortion problem with motion prediction from images and/or inertial
    sensors. However, MVO still has trouble handling RS distortion when the
    camera motion changes abruptly (e.g. vibration of mobile cameras causes
    extremely fast motion instantaneously). To address the problem, we propose
    a novel MVO algorithm that takes the geometric characteristics of RS
    cameras into account. The key idea of the proposed algorithm is a new RS
    essential matrix which incorporates the instantaneous angular and linear
    velocities at each frame. Our algorithm produces accurate and robust
    ego-motion estimates in an online manner, and is applicable to various
    mobile applications with RS cameras. The superiority of the proposed
    algorithm is validated through quantitative and qualitative comparison on
    both synthetic and real datasets.

    Body Joint guided 3D Deep Convolutional Descriptors for Action Recognition

    Congqi Cao, Yifan Zhang, Chunjie Zhang, Hanqing Lu
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Three dimensional convolutional neural networks (3D CNNs) have been
    established as a powerful tool to simultaneously learn features from both
    spatial and temporal dimensions, making them well suited to video-based
    action recognition. In this work, we propose not to directly use the
    activations of fully-connected layers of a 3D CNN as the video feature, but to
    use selective convolutional layer activations to form a discriminative
    descriptor for video. It pools the feature on the convolutional layers under
    the guidance of body joint positions. Two schemes of mapping body joints into
    convolutional feature maps for pooling are discussed. The body joint positions
    can be obtained from any off-the-shelf skeleton estimation algorithm. The
    helpfulness of the body joint guided feature pooling with inaccurate skeleton
    estimation is systematically evaluated. To make the pipeline end-to-end and
    avoid relying on any sophisticated body joint detection algorithm, we
    further propose a
    two-stream bilinear model which can learn the guidance from the body joints and
    capture the spatio-temporal features simultaneously. In this model, the body
    joint guided feature pooling is conveniently formulated as a bilinear product
    operation. Experimental results on three real-world datasets demonstrate the
    effectiveness of body joint guided pooling which achieves promising
    performance.

    Dense 3D Facial Reconstruction from a Single Depth Image in Unconstrained Environment

    Shu Zhang, Hui Yu, Ting Wang, Junyu Dong, Honghai Liu
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    With the increasing demands of virtual reality applications such as 3D
    films, virtual human-machine interactions and virtual agents, 3D human
    face analysis is considered more and more important as a fundamental step
    for these virtual reality tasks. Due to the information provided by the
    additional dimension, 3D facial reconstruction enables the aforementioned
    tasks to be achieved with higher accuracy than those based on 2D facial
    analysis. The denser the 3D facial model is, the more information it can
    provide. However, most existing dense 3D facial reconstruction methods
    require complicated processing and high system cost. To this end, this
    paper presents a novel method that simplifies the process of dense 3D
    facial reconstruction by employing only one frame of depth data obtained
    with an off-the-shelf RGB-D sensor. The experiments show competitive
    results on real-world data.

    Unified Framework for Automated Person Re-identification and Camera Network Topology Inference in Camera Networks

    Y.-J. Cho, J.-H. Park, S.-A. Kim, K. Lee, K.-J. Yoon
    Comments: 10 pages, 10 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Person re-identification (re-id) is the task of recognizing and identifying a
    person across multiple views in multi-camera networks. Although there has been
    much progress in person re-id, the person re-id in large-scale multi-camera
    networks still remains a challenging task because of the spatio-temporal
    uncertainty and high complexity due to large numbers of cameras and people. To
    handle these difficulties, additional information such as camera network
    topology should be provided, which is also difficult to automatically estimate.
    In this paper, we propose a unified framework which jointly solves both person
    re-id and camera network topology inference problems with minimal prior
    knowledge about the environments. The proposed framework takes general
    multi-camera network environments into account and can be applied to online
    person re-id in large-scale multi-camera networks. To effectively show the
    superiority of the proposed framework, we also provide a new person re-id
    dataset with full annotations, named SLP, captured in the synchronized
    multi-camera network. Experimental results using public and our datasets show
    that the proposed methods are promising for both person re-id and camera
    topology inference tasks.

    Target Oriented High Resolution SAR Image Formation via Semantic Information Guided Regularizations

    Biao Hou, Zaidao Wen, Licheng Jiao, Qian Wu
    Comments: Submitted to IEEE TGRS
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The sparsity-regularized synthetic aperture radar (SAR) imaging framework
    has shown remarkable performance in generating feature-enhanced
    high-resolution images, in which a sparsity-inducing regularizer is
    involved by exploiting the sparsity priors of some visual features in the
    underlying image. However, since the simple prior of low-level features is
    insufficient to describe the different semantic contents in the image,
    this type of regularizer is incapable of distinguishing between the target
    of interest and unconcerned background clutter. As a consequence, the
    features belonging to the target and the clutter are affected
    simultaneously in the generated image, without regard to their underlying
    semantic labels. To address this problem, we propose a novel semantic
    information guided framework for target-oriented SAR image formation,
    which aims at enhancing the target scatterers of interest while
    suppressing the background clutter. Firstly, we develop a new
    semantics-specific regularizer for image formation by exploiting the
    statistical properties of different semantic categories in a target scene
    SAR image. In order to infer the semantic label for each pixel in an
    unsupervised way, we further introduce a novel high-level prior-driven
    regularizer and some semantic causal rules derived from prior knowledge.
    Finally, our regularized framework for image formation is derived as a
    simple iteratively reweighted $\ell_1$ minimization problem which can be
    conveniently solved by many off-the-shelf solvers. Experimental results
    demonstrate the effectiveness and superiority of our framework for SAR
    image formation in terms of target enhancement and clutter suppression,
    compared with the state of the art. Additionally, the proposed framework
    opens a new direction of devoting machine learning strategies to image
    formation, which can benefit subsequent decision-making tasks.

    Exploiting Multi-layer Graph Factorization for Multi-attributed Graph Matching

    Han-Mu Park, Kuk-Jin Yoon
    Comments: 10 pages, 4 figures, conference submitted
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Multi-attributed graph matching is a problem of finding correspondences
    between two sets of data while considering their complex properties described
    in multiple attributes. However, the information of multiple attributes is
    likely to be oversimplified during a process that makes an integrated
    attribute, and this degrades the matching accuracy. For that reason, a
    multi-layer graph structure-based algorithm has been proposed recently. It can
    effectively avoid the problem by separating attributes into multiple layers.
    Nonetheless, there are several remaining issues such as a scalability problem
    caused by the huge matrix to describe the multi-layer structure and a
    back-projection problem caused by the continuous relaxation of the quadratic
    assignment problem. In this work, we propose a novel multi-attributed graph
    matching algorithm based on the multi-layer graph factorization. We reformulate
    the problem to be solved with several small matrices that are obtained by
    factorizing the multi-layer structure. Then, we solve the problem using a
    convex-concave relaxation procedure for the multi-layer structure. The proposed
    algorithm exhibits better performance than state-of-the-art algorithms based on
    the single-layer structure.

    Camera Pose Filtering with Local Regression Geodesics on the Riemannian Manifold of Dual Quaternions

    Benjamin Busam, Tolga Birdal, Nassir Navab
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Time-varying, smooth trajectory estimation is of great interest to the vision
    community for accurate and well behaving 3D systems. In this paper, we propose
    a novel principal component local regression filter acting directly on the
    Riemannian manifold of unit dual quaternions $\mathbb{D}\mathbb{H}_1$. We
    first present a numerically stable Lie algebra of the dual quaternions. Then,
    we introduce $\exp$ and $\log$ operators. Unlike state of the art path
    smoothing methods which either operate on $SE(3)$ of rotation matrices or the
    hypersphere $\mathbb{H}_1$ of quaternions, we treat the orientation and
    translation jointly on the dual quaternion quadric in the 7-dimensional real
    projective space $\mathbb{R}\mathbb{P}^7$. In the end, we provide an
    outlier-robust IRLS algorithm for generic pose filtering exploiting this
    manifold structure. Besides our theoretical contributions, our experiments on
    synthetic and real data show the advantages of the manifold aware filtering
    on pose tracking and smoothing.

    A Dual Sparse Decomposition Method for Image Denoising

    Hong Sun, Chen-guang Liu, Cheng-wei Sang
    Comments: 6 pages, 5 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This article addresses the image denoising problem in situations of strong
    noise. We propose a dual sparse decomposition method. This method performs
    a sub-dictionary decomposition on the over-complete dictionary used in the
    sparse decomposition. The sub-dictionary decomposition makes use of a
    novel criterion based on the occurrence frequency of atoms of the
    over-complete dictionary over the data set. The experimental results
    demonstrate that the dual sparse decomposition method surpasses
    state-of-the-art denoising performance in terms of both
    peak-signal-to-noise ratio and structural-similarity-index-metric, as well
    as in subjective visual quality.

    Non-Convex Weighted Schatten p-Norm Minimization based ADMM Framework for Image Restoration

    Zhiyuan Zha, Xinggan Zhang, Yu Wu, Qiong Wang, Lan Tang
    Comments: arXiv admin note: text overlap with arXiv:1611.08983
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Since the matrix formed by nonlocal similar patches in a natural image is
    of low rank, nuclear norm minimization (NNM) has been widely used for
    image restoration. However, NNM tends to over-shrink the rank components
    and treats the different rank components equally, which limits its
    capability and flexibility. This paper proposes a new ADMM-based approach
    for image restoration via non-convex weighted Schatten $p$-norm
    minimization (WSNM). To make the proposed model tractable and robust, we
    develop an alternating direction method of multipliers (ADMM) framework to
    solve the proposed non-convex model. Experimental results on image
    deblurring and image inpainting show that the proposed approach
    outperforms many current state-of-the-art methods in terms of both PSNR
    and visual perception.

    Image Compressive Sensing Recovery Using Group Sparse Coding via Non-convex Weighted Lp Minimization

    Zhiyuan Zha, Xinggan Zhang, Yu Wu, Qiong Wang, Lan Tang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Compressive sensing (CS) has attracted considerable research interest from
    the signal/image processing communities. Recent studies further show that
    structured or group sparsity often leads to more powerful signal
    reconstruction techniques in various CS tasks. Unlike conventional
    sparsity-promoting convex regularization methods, this paper proposes a
    new approach for image compressive sensing recovery using group sparse
    coding via non-convex weighted $\ell_p$ minimization. To make our scheme
    tractable and robust, an iterative shrinkage/thresholding (IST) based
    technique is adopted to solve the above non-convex $\ell_p$ minimization
    problem efficiently. Experimental results show that the proposed algorithm
    outperforms many state-of-the-art techniques for image CS recovery.
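
    The IST template is short enough to sketch. The toy below runs the convex
    $\ell_1$ special case (plain soft-thresholding as the proximal step) on a
    synthetic sparse recovery problem; the paper's non-convex weighted
    $\ell_p$ scheme replaces that step with a generalized shrinkage operator,
    which is not reproduced here. Sizes and the regularization weight are
    illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 100, 40
    A = rng.normal(size=(m, n)) / np.sqrt(m)    # CS measurement matrix
    x_true = np.zeros(n)
    x_true[rng.choice(n, 5, replace=False)] = rng.normal(size=5)
    y = A @ x_true                              # compressive measurements

    lam = 0.05
    L = np.linalg.norm(A, 2) ** 2               # Lipschitz constant of the gradient
    x = np.zeros(n)
    for it in range(500):
        z = x - A.T @ (A @ x - y) / L           # gradient step on the data term
        # soft-thresholding: prox of (lam/L)*||x||_1; the non-convex p < 1
        # case would use a generalized shrinkage operator here instead
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0)

    print(np.linalg.norm(x - x_true))           # reconstruction error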

    Model-based Iterative Restoration for Binary Document Image Compression with Dictionary Learning

    Yandong Guo, Cheng Lu, Jan P. Allebach, Charles A. Bouman
    Comments: CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The inherent noise in an observed (e.g., scanned) binary document image
    degrades the image quality and harms the compression ratio by breaking the
    pattern repetition and adding entropy to the document image. In this
    paper, we design a cost function in a Bayesian framework with dictionary
    learning. Minimizing our cost function produces a restored image which has
    better quality than the observed noisy image, and a dictionary for
    representing and encoding the image. After the restoration, we use this
    dictionary (from the same cost function) to encode the restored image
    following the symbol-dictionary framework of the JBIG2 standard in
    lossless mode.
    Experimental results with a variety of document images demonstrate that our
    method improves the image quality compared with the observed image, and
    simultaneously improves the compression ratio. For the test images with
    synthetic noise, our method reduces the number of flipped pixels by 48.2% and
    improves the compression ratio by 36.36% as compared with the best encoding
    methods. For the test images with real noise, our method visually improves the
    image quality, and outperforms the cutting-edge method by 28.27% in terms of
    the compression ratio.

    Skeleton Key: Image Captioning by Skeleton-Attribute Decomposition

    Yufei Wang, Zhe Lin, Xiaohui Shen, Scott Cohen, Garrison W. Cottrell
    Comments: Accepted by CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recently, there has been a lot of interest in automatically generating
    descriptions for an image. Most existing language-model based approaches for
    this task learn to generate an image description word by word in its original
    word order. However, for humans, it is more natural to locate the objects and
    their relationships first, and then elaborate on each object, describing
    notable attributes. We present a coarse-to-fine method that decomposes the
    original image description into a skeleton sentence and its attributes, and
    generates the skeleton sentence and attribute phrases separately. By this
    decomposition, our method can generate more accurate and novel descriptions
    than the previous state-of-the-art. Experimental results on the MS-COCO and a
    larger scale Stock3M datasets show that our algorithm yields consistent
    improvements across different evaluation metrics, especially on the SPICE
    metric, which has much higher correlation with human ratings than the
    conventional metrics. Furthermore, our algorithm can generate descriptions with
    varied length, benefiting from the separate control of the skeleton and
    attributes. This enables image description generation that better accommodates
    user preferences.

    Proxy Templates for Inverse Compositional Photometric Bundle Adjustment

    Christopher Ham, Simon Lucey, Surya Singh
    Comments: 8 pages, 5 figures, for supplementary material file, see this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recent advances in 3D vision have demonstrated the strengths of photometric
    bundle adjustment. By directly minimizing reprojected pixel errors, instead of
    geometric reprojection errors, such methods can achieve sub-pixel alignment
    accuracy in both high and low textured regions. Typically, these problems are
    solved using a forwards compositional Lucas-Kanade formulation parameterized by
    6-DoF rigid camera poses and a depth per point in the structure. For large
    problems the most CPU-intensive component of the pipeline is the creation and
    factorization of the Hessian matrix at each iteration. For many warps, the
    inverse compositional formulation can offer significant speed-ups since the
    Hessian need only be inverted once. In this paper, we show that an ordinary
    inverse compositional formulation does not work for warps of this type of
    parameterization due to ill-conditioning of its partial derivatives. However,
    we show that it is possible to overcome this limitation by introducing the
    concept of a proxy template image. We show an order of magnitude improvement in
    speed, with little effect on quality, going from forwards to inverse
    compositional in our own photometric bundle adjustment method designed for
    object-centric structure from motion. This means less processing time for large
    systems or denser reconstructions under the same real-time constraints. We
    additionally show that this theory can be readily applied to existing methods
    by integrating it with the recently released Direct Sparse Odometry SLAM
    algorithm.

    Second-order Temporal Pooling for Action Recognition

    Anoop Cherian, Stephen Gould
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Most successful deep learning models for action recognition generate
    predictions for short video clips, which are later aggregated into a longer
    time-frame action descriptor by computing a statistic over these predictions.
    Zeroth-order (max) or first-order (average) statistics are commonly used. In this
    paper, we explore the benefits of using second-order statistics. Specifically,
    we propose a novel end-to-end learnable action pooling scheme, temporal
    correlation pooling, that generates an action descriptor for a video sequence
    by capturing the similarities between the temporal evolution of per-frame CNN
    features across the video. Such a descriptor, while being computationally
    cheap, also naturally encodes the co-activations of multiple CNN features,
    thereby providing a richer characterization of actions than their first-order
    counterparts. We also propose higher-order extensions of this scheme by
    computing correlations after embedding the CNN features in a reproducing kernel
    Hilbert space. We provide experiments on four standard and fine-grained action
    recognition datasets. Our results clearly demonstrate the advantages of
    higher-order pooling schemes, achieving state-of-the-art performance.
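
    The basic second-order pooling step can be sketched directly: stack the
    per-frame CNN features of a clip and take a correlation/covariance-style
    statistic over time. The normalization below is a plain covariance and is
    an assumption; the paper's exact pooling and its kernel extensions differ.

    import numpy as np

    rng = np.random.default_rng(0)
    T, D = 30, 64
    feats = rng.normal(size=(T, D))       # per-frame CNN features of one clip

    def temporal_correlation_pool(F):
        # second-order statistic: D x D co-activation matrix over time
        F = F - F.mean(axis=0)            # center each feature over time
        C = F.T @ F / (F.shape[0] - 1)
        iu = np.triu_indices(C.shape[0])  # symmetric: keep the upper triangle
        return C[iu]

    desc = temporal_correlation_pool(feats)
    print(desc.shape)                     # D*(D+1)/2 dimensional descriptor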

    Residual Attention Network for Image Classification

    Fei Wang, Mengqing Jiang, Chen Qian, Shuo Yang, Cheng Li, Honggang Zhang, Xiaogang Wang, Xiaoou Tang
    Comments: accepted to CVPR2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this work, we propose the “Residual Attention Network”, a convolutional
    neural network with an attention mechanism that can be incorporated into
    state-of-the-art feed-forward network architectures in an end-to-end
    training fashion. Our Residual Attention Network is built by stacking
    Attention Modules which generate attention-aware features. The
    attention-aware features from different modules change adaptively as
    layers go deeper. Inside each Attention Module, a bottom-up top-down
    feedforward structure is used to unfold the feedforward and feedback
    attention process into a single feedforward process. Importantly, we
    propose attention residual learning to train very deep Residual Attention
    Networks which can be easily scaled up to hundreds of layers. Extensive
    analyses are conducted on the CIFAR-10 and CIFAR-100 datasets to verify
    the effectiveness of every module mentioned above. Our Residual Attention
    Network achieves state-of-the-art object recognition performance on three
    benchmark datasets: CIFAR-10 (3.90% error), CIFAR-100 (20.45% error) and
    ImageNet (4.8% single-model, single-crop top-5 error). Notably, our method
    achieves a 0.6% top-1 accuracy improvement with 46% of the trunk depth and
    69% of the forward FLOPs compared to ResNet-200. The experiments also
    demonstrate that our network is robust against noisy labels.
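
    The attention residual learning rule is one line: instead of multiplying
    features by a mask M(x) directly, the module outputs (1 + M(x)) * F(x), so
    the identity path preserves good features even when the mask is near zero.
    A tiny sketch; the real trunk and mask branches are deep bottom-up
    top-down CNNs, and the linear stand-ins below are purely illustrative.

    import numpy as np

    def attention_module(x, trunk, mask_net):
        # attention residual learning: H(x) = (1 + M(x)) * F(x)
        F = trunk(x)                          # trunk branch features
        M = 1 / (1 + np.exp(-mask_net(x)))    # soft mask in (0, 1)
        return (1 + M) * F                    # identity term keeps features

    rng = np.random.default_rng(0)
    W_t = 0.1 * rng.normal(size=(16, 16))     # stand-in trunk weights
    W_m = 0.1 * rng.normal(size=(16, 16))     # stand-in mask weights
    x = rng.normal(size=16)
    h = attention_module(x, lambda v: np.maximum(W_t @ v, 0), lambda v: W_m @ v)
    print(h.shape)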

    Time-Contrastive Networks: Self-Supervised Learning from Multi-View Observation

    Pierre Sermanet, Corey Lynch, Jasmine Hsu, Sergey Levine
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

    We propose a self-supervised approach for learning representations entirely
    from unlabeled videos recorded from multiple viewpoints. This is particularly
    relevant to robotic imitation learning, which requires a viewpoint-invariant
    understanding of the relationships between humans and their environment,
    including object interactions, attributes and body pose. We train our
    representations using a triplet loss, where multiple simultaneous viewpoints of
    the same observation are attracted in the embedding space, while being repelled
    from temporal neighbors which are often visually similar but functionally
    different. This signal encourages our model to discover attributes that do not
    change across viewpoint, but do change across time, while ignoring nuisance
    variables such as occlusions, motion blur, lighting and background. Our
    experiments demonstrate that such a representation even acquires some degree of
    invariance to object instance. We demonstrate that our model can correctly
    identify corresponding steps in complex object interactions, such as pouring,
    across different videos with different instances. We also show what is, to the
    best of our knowledge, the first self-supervised results for end-to-end
    imitation learning of human motions by a real robot.
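
    The training signal reduces to a standard triplet loss with a multi-view
    twist: the anchor and positive are simultaneous frames from two
    viewpoints, and the negative is a temporal neighbor from the anchor's own
    view. A minimal sketch with random stand-in embeddings and an assumed
    margin value:

    import numpy as np

    def multiview_triplet_loss(anchor, positive, negative, margin=0.2):
        # attract simultaneous viewpoints, repel temporal neighbors
        d_pos = np.sum((anchor - positive) ** 2)
        d_neg = np.sum((anchor - negative) ** 2)
        return max(0.0, d_pos - d_neg + margin)

    rng = np.random.default_rng(0)
    emb = lambda: rng.normal(size=32)          # stand-in embedding vectors
    print(multiview_triplet_loss(emb(), emb(), emb()))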

    A Review on Deep Learning Techniques Applied to Semantic Segmentation

    Alberto Garcia-Garcia, Sergio Orts-Escolano, Sergiu Oprea, Victor Villena-Martinez, Jose Garcia-Rodriguez
    Comments: Submitted to TPAMI on Apr. 22, 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Image semantic segmentation is of increasing interest to computer vision
    and machine learning researchers. Many applications on the rise need
    accurate and efficient segmentation mechanisms: autonomous driving, indoor
    navigation, and even virtual or augmented reality systems, to name a few.
    This demand coincides with the rise of deep learning approaches in almost
    every field or application target related to computer vision, including
    semantic segmentation and scene understanding. This paper provides a
    review of deep learning methods for semantic segmentation applied to
    various application areas. Firstly, we describe the terminology of this
    field as well as mandatory background concepts. Next, the main datasets
    and challenges are described to help researchers decide which ones best
    suit their needs and targets. Then, existing methods are reviewed,
    highlighting their contributions and their significance in the field.
    Finally, quantitative results are given for the described methods and the
    datasets on which they were evaluated, following up with a discussion of
    the results. Lastly, we point out a set of promising future works and draw
    our own conclusions about the state of the art of semantic segmentation
    using deep learning techniques.

    On the Two-View Geometry of Unsynchronized Cameras

    Cenek Albl, Zuzana Kukelova, Andrew Fitzgibbon, Jan Heller, Matej Smid, Tomas Pajdla
    Comments: 12 pages, 9 figures, Computer Vision and Pattern Recognition (CVPR) 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present new methods for simultaneously estimating camera geometry and time
    shift from video sequences from multiple unsynchronized cameras. Algorithms for
    simultaneous computation of a fundamental matrix or a homography with unknown
    time shift between images are developed. Our methods use minimal correspondence
    sets (eight for fundamental matrix and four and a half for homography) and
    therefore are suitable for robust estimation using RANSAC. Furthermore, we
    present an iterative algorithm that extends the applicability to sequences
    that are significantly unsynchronized, finding the correct time shift up
    to several seconds. We evaluated the methods on synthetic data and a wide
    range of real-world datasets, and the results show broad applicability to
    the problem of camera synchronization.

    Deep Learning for Medical Image Processing: Overview, Challenges and Future

    Muhammad Imran Razzak, Saeeda Naz, Ahmad Zaib
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The healthcare sector is different from other industries: it is a
    high-priority sector, and people expect the highest level of care and
    service regardless of cost. Even so, it consumes a huge percentage of
    budgets without meeting social expectations. Most interpretation of
    medical data is done by medical experts, and image interpretation by human
    experts is quite limited due to subjectivity, the complexity of the
    images, extensive variation across interpreters, and fatigue. Following
    its success in other real-world applications, deep learning is also
    providing exciting, accurate solutions for medical imaging and is seen as
    a key method for future applications in the health sector. In this
    chapter, we discuss state-of-the-art deep learning architectures and their
    optimization as used for medical image segmentation and classification. In
    the last section, we discuss the challenges of deep learning based methods
    for medical imaging and open research issues.

    Deep Learning based Isolated Arabic Scene Character Recognition

    Saad Bin Ahmed, Saeeda Naz, Muhammad Imran Razzak, Rubiyah Yousaf
    Comments: 6 pages, 8 Figures, 3 Tables, Accepted in IEEE Workshop on Arabic Script Analysis and Recognition (ASAR) 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Technological advancement and sophistication in cameras and gadgets prompt
    researchers to focus on image analysis and text understanding. Deep learning
    techniques have demonstrated strong potential for classifying text from
    natural scene images, as reported in recent years. A variety of deep learning
    approaches address the detection and recognition of text in images
    effectively. In this work, we present Arabic scene text recognition using
    Convolutional Neural Networks (ConvNets) as a deep learning classifier. Since
    scene text data is slanted and skewed, to deal with maximal variation we
    employ five orientations for each occurrence of a character. The training is
    formulated with filter sizes of 3 x 3 and 5 x 5 and stride values of 1 and 2.
    During the text classification phase, we trained the network with distinct
    learning rates. Our approach reports encouraging results on the recognition
    of Arabic characters from segmented Arabic scene images.
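
    A minimal PyTorch sketch in the spirit of the described configuration (3 x 3
    filters with stride 1 and 5 x 5 filters with stride 2); the layer count,
    channel widths, number of classes, and 32 x 32 input size are assumptions,
    not details from the paper:

        import torch
        import torch.nn as nn

        class ArabicCharNet(nn.Module):
            """Toy ConvNet mixing 3x3/stride-1 and 5x5/stride-2 filters."""
            def __init__(self, num_classes):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),   # 3x3, stride 1
                    nn.ReLU(),
                    nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2),  # 5x5, stride 2
                    nn.ReLU(),
                    nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),
                    nn.ReLU(),
                    nn.AdaptiveAvgPool2d(4),
                )
                self.classifier = nn.Linear(64 * 4 * 4, num_classes)

            def forward(self, x):  # x: (batch, 1, H, W) grayscale character crops
                h = self.features(x)
                return self.classifier(h.flatten(1))

        # Five orientations of one character treated as five training samples.
        net = ArabicCharNet(num_classes=28)
        logits = net(torch.randn(5, 1, 32, 32))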

    Deep Learning for Content-Based, Cross-Modal Retrieval of Videos and Music

    Sungeun Hong, Woobin Im, Hyun S. Yang
    Comments: 10 pages, 7 figures, 4 tables, supplementary material link >> this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In the context of multimedia content, a modality can be defined as a type of
    data item such as text, images, music, and videos. Up to now, only limited
    research has been conducted on cross-modal retrieval of suitable music for a
    specified video or vice versa. Moreover, much of the existing research relies
    on metadata such as keywords, tags, or associated descriptions that must be
    individually produced and attached afterwards. This paper introduces a new
    content-based, cross-modal retrieval method for video and music that is
    implemented through deep neural networks. The proposed model consists of a
    two-branch network that extracts features from the two different modalities and
    embeds them into a single embedding space. We train the network via cross-modal
    ranking loss such that videos and music with similar semantics end up close
    together in the embedding space. In addition, to preserve inherent
    characteristics within each modality, the proposed single-modal structure loss
    was also used for training. Owing to the lack of a dataset to evaluate
    cross-modal video-music tasks, we constructed a large-scale video-music pair
    benchmark. Finally, we introduced reasonable quantitative and qualitative
    experimental protocols. The experimental results on our dataset are expected to
    be a baseline for subsequent studies of the less-mature video-to-music and
    music-to-video retrieval tasks.
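
    A minimal PyTorch sketch of a cross-modal ranking loss of the kind described;
    the margin value, cosine similarity, and in-batch negatives are assumptions
    rather than the paper's exact formulation:

        import torch
        import torch.nn.functional as F

        def cross_modal_ranking_loss(video_emb, music_emb, margin=0.2):
            """Pull matched video/music pairs together, push mismatches apart.

            video_emb, music_emb : (batch, dim) embeddings from the two
            branches; row i of each tensor forms a matched pair.
            """
            v = F.normalize(video_emb, dim=1)
            m = F.normalize(music_emb, dim=1)
            sim = v @ m.t()                      # (batch, batch) cosine similarities
            pos = sim.diag().unsqueeze(1)        # matched-pair similarity per row
            mask = 1.0 - torch.eye(sim.size(0))  # drop the positives on the diagonal
            loss_v2m = (F.relu(margin - pos + sim) * mask).mean()      # video query
            loss_m2v = (F.relu(margin - pos + sim.t()) * mask).mean()  # music query
            return loss_v2m + loss_m2v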

    Convolutional Neural Networks for Facial Expression Recognition

    Shima Alizadeh, Azar Fazel
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We have developed convolutional neural networks (CNN) for a facial expression
    recognition task. The goal is to classify each facial image into one of the
    seven facial emotion categories considered in this study. We trained CNN models
    with different depth using gray-scale images. We developed our models in Torch
    and exploited Graphics Processing Unit (GPU) computation in order to expedite
    the training process. In addition to networks operating on raw pixel data, we
    employed a hybrid feature strategy by which we trained a novel CNN
    model with the combination of raw pixel data and Histogram of Oriented
    Gradients (HOG) features. To reduce the overfitting of the models, we utilized
    different techniques including dropout and batch normalization in addition to
    L2 regularization. We applied cross validation to determine the optimal
    hyper-parameters and evaluated the performance of the developed models by
    looking at their training histories. We also present the visualization of
    different layers of a network to show what features of a face can be learned by
    CNN models.
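
    A minimal sketch of the hybrid pixel + HOG input described, not the authors'
    Torch code; it uses scikit-image's HOG, and the 48 x 48 crop size and HOG
    parameters are assumptions:

        import numpy as np
        from skimage.feature import hog

        def hybrid_features(gray_image):
            """Concatenate raw pixels with HOG descriptors for one face crop.

            gray_image : 2D float array, e.g. a 48x48 grayscale face image.
            """
            pixels = gray_image.ravel()                 # raw-pixel branch
            hog_desc = hog(gray_image, orientations=9,
                           pixels_per_cell=(8, 8),
                           cells_per_block=(2, 2))      # HOG branch
            return np.concatenate([pixels, hog_desc])

        features = hybrid_features(np.random.rand(48, 48))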

    ScaleNet: Guiding Object Proposal Generation in Supermarkets and Beyond

    Siyuan Qiao, Wei Shen, Weichao Qiu, Chenxi Liu, Alan Yuille
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Motivated by product detection in supermarkets, this paper studies the
    problem of object proposal generation in supermarket images and other natural
    images. We argue that estimation of object scales in images is helpful for
    generating object proposals, especially for supermarket images where object
    scales are usually within a small range. Therefore, we propose to estimate
    object scales of images before generating object proposals. The proposed method
    for predicting object scales is called ScaleNet. To validate the effectiveness
    of ScaleNet, we build three supermarket datasets, two of which are real-world
    datasets used for testing and the other one is a synthetic dataset used for
    training. In short, we extend previous state-of-the-art object proposal
    methods by adding a scale prediction phase. The resulting method outperforms
    the previous state of the art on the supermarket datasets by a large margin. We
    also show that the approach works for object proposal on other natural images
    and it outperforms the previous state-of-the-art object proposal methods on the
    MS COCO dataset. The supermarket datasets, the virtual supermarkets, and the
    tools for creating more synthetic datasets will be made public.

    On Face Segmentation, Face Swapping, and Face Perception

    Yuval Nirkin, Iacopo Masi, Anh Tuan Tran, Tal Hassner, Gerard Medioni
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We show that even when face images are unconstrained and arbitrarily paired,
    face swapping between them is actually quite simple. To this end, we make the
    following contributions. (a) Instead of tailoring systems for face
    segmentation, as others previously proposed, we show that a standard fully
    convolutional network (FCN) can achieve remarkably fast and accurate
    segmentations, provided that it is trained on a rich enough example set. For
    this purpose, we describe novel data collection and generation routines which
    provide challenging segmented face examples. (b) We use our segmentations to
    enable robust face swapping under unprecedented conditions. (c) Unlike previous
    work, our swapping is robust enough to allow for extensive quantitative tests.
    To this end, we use the Labeled Faces in the Wild (LFW) benchmark and measure
    the effect of intra- and inter-subject face swapping on recognition. We show
    that our intra-subject swapped faces remain as recognizable as their sources,
    testifying to the effectiveness of our method. In line with well known
    perceptual studies, we show that better face swapping produces less
    recognizable inter-subject results. This is the first time this effect was
    quantitatively demonstrated for machine vision systems.

    SREFI: Synthesis of Realistic Example Face Images

    Sandipan Banerjee, John S. Bernhard, Walter J. Scheirer, Kevin W. Bowyer, Patrick J. Flynn
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we propose a novel face synthesis approach that can generate
    an arbitrarily large number of synthetic images of both real and synthetic
    identities. Thus a face image dataset can be expanded in terms of the number of
    identities represented and the number of images per identity using this
    approach, without the identity-labeling and privacy complications that come
    from downloading images from the web. To measure the visual fidelity and
    uniqueness of the synthetic face images and identities, we conducted face
    matching experiments with both human participants and a CNN pre-trained on a
    dataset of 2.6M real face images. To evaluate the stability of these synthetic
    faces, we trained a CNN model with an augmented dataset containing close to
    200,000 synthetic faces. We used a snapshot of this trained CNN to recognize
    extremely challenging frontal (real) face images. Experiments showed training
    with the augmented faces boosted the face recognition performance of the CNN.

    Scatteract: Automated extraction of data from scatter plots

    Mathieu Cliche, David Rosenberg, Dhruv Madeka, Connie Yee
    Comments: Submitted to ECML PKDD 2017 proceedings, 16 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Machine Learning (stat.ML)

    Charts are an excellent way to convey patterns and trends in data, but they
    do not facilitate further modeling of the data or close inspection of
    individual data points. We present a fully automated system for extracting the
    numerical values of data points from images of scatter plots. We use deep
    learning techniques to identify the key components of the chart, and optical
    character recognition together with robust regression to map from pixels to the
    coordinate system of the chart. We focus on scatter plots with linear scales,
    which already have several interesting challenges. Previous work has done fully
    automatic extraction for other types of charts, but to our knowledge this is
    the first approach that is fully automatic for scatter plots. Our method
    performs well, achieving successful data extraction on 89% of the plots in our
    test set.
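
    As a flavour of the pixel-to-coordinate step, a minimal sketch, not the
    paper's pipeline: a robust linear map from detected tick-label pixel
    positions to parsed axis values is fitted with RANSAC, so a single misread
    label does not corrupt the mapping. The tick positions and values below are
    hypothetical:

        import numpy as np
        from sklearn.linear_model import RANSACRegressor

        # Hypothetical OCR output: pixel x-positions of tick labels and the
        # numeric values parsed from them; the last value was misread by OCR.
        pixels = np.array([[52.0], [148.0], [243.0], [339.0], [436.0]])
        values = np.array([0.0, 1.0, 2.0, 3.0, 110.0])

        model = RANSACRegressor()   # robust linear fit; the outlier is ignored
        model.fit(pixels, values)

        # Map an arbitrary data-point pixel coordinate into chart coordinates.
        print(model.predict([[290.0]]))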

    An Analysis of Action Recognition Datasets for Language and Vision Tasks

    Spandana Gella, Frank Keller
    Comments: To appear in Proceedings of ACL 2017, 8 pages
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

    A large amount of recent research has focused on tasks that combine language
    and vision, resulting in a proliferation of datasets and methods. One such
    task is action recognition, whose applications include image annotation,
    scene understanding and image retrieval. In this survey, we categorize the
    existing approaches based on how they conceptualize this problem and provide
    a detailed review of existing datasets, highlighting their diversity as well
    as advantages and disadvantages. We focus on recently developed datasets
    which link visual information with linguistic resources and provide a
    fine-grained syntactic and semantic analysis of actions in images.

    Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets

    Wei-Lun Chao, Hexiang Hu, Fei Sha
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Visual question answering (QA) has attracted a lot of attention lately, seen
    essentially as a form of (visual) Turing test that artificial intelligence
    should strive to achieve. In this paper, we study a crucial component of this
    task: how can we design good datasets for the task? We focus on the design of
    multiple-choice based datasets where the learner has to select the right answer
    from a set of candidate ones including the target (i.e. the correct one) and
    the decoys (i.e. the incorrect ones). Through careful analysis of the results
    attained by state-of-the-art learning models and human annotators on existing
    datasets, we show the design of the decoy answers has a significant impact on
    how and what the learning models learn from the datasets. In particular, the
    resulting learner can ignore the visual information, the question, or both,
    while still doing well on the task. Inspired by this, we propose automatic
    procedures to remedy such design deficiencies. We apply the procedures to
    re-construct decoy answers for two popular visual QA datasets as well as to
    create a new visual QA dataset from the Visual Genome project, resulting in the
    largest dataset for this task. Extensive empirical studies show that the design
    deficiencies have been alleviated in the remedied datasets and the performance
    on them is likely a more faithful indicator of the difference among learning
    models. The datasets are released and publicly available via
    this http URL

    A General Theory for Training Learning Machine

    Hong Zhao
    Comments: 55 pages, 18 figures. arXiv admin note: substantial text overlap with arXiv:1602.03950
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Though deep learning is pushing machine learning to a new stage, basic
    theories of machine learning are still limited. The principle of learning,
    the role of a priori knowledge, the role of neuron bias, and the basis for
    choosing the neural transfer function and cost function, etc., are still far
    from clear. In this paper, we present a general theoretical framework for
    machine learning. We classify the prior knowledge into common and
    problem-dependent parts, and consider that the aim of learning is to
    maximally incorporate them. The principle we suggest for maximizing the
    former is the design risk minimization principle, while the neural transfer
    function, the cost function, as well as the pretreatment of samples, are
    endowed with the role of maximizing the latter. The role of the neuron bias
    is explained from a different angle. We develop a Monte Carlo algorithm to
    establish the input-output responses, and we control the input-output
    sensitivity of a learning machine by controlling that of individual neurons.
    Applications to function approximation and smoothing, pattern recognition
    and classification, are provided to illustrate how to train general learning
    machines based on our theory and algorithm. Our method may in addition
    enable new applications, such as transductive inference.

    Robust, Deep and Inductive Anomaly Detection

    Raghavendra Chalapathy (University of Sydney and Capital Markets Cooperative Research Centre (CMCRC)), Aditya Krishna Menon (Data61/CSIRO and the Australian National University), Sanjay Chawla (Qatar Computing Research Institute, HBKU)
    Comments: Submitted for review ECML PKDD 2017 Skopje, Macedonia 18-22 September the European Conference On Machine Learning & Principles and Practice of Knowledge Discovery
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    PCA is a classical statistical technique whose simplicity and maturity have
    seen it find widespread use as an anomaly detection technique. However, it is
    limited in this regard by being sensitive to gross perturbations of the input,
    and by seeking a linear subspace that captures normal behaviour. The first
    issue has been dealt with by robust PCA, a variant of PCA that explicitly
    allows for some data points to be arbitrarily corrupted, however, this does not
    resolve the second issue, and indeed introduces the new issue that one can no
    longer inductively find anomalies on a test set. This paper addresses both
    issues in a single model, the robust autoencoder. This method learns a
    nonlinear subspace that captures the majority of data points, while allowing
    for some data to have arbitrary corruption. The model is simple to train and
    leverages recent advances in the optimisation of deep neural networks.
    Experiments on a range of real-world datasets highlight the model’s
    effectiveness.
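
    A condensed sketch of a robust-autoencoder-style objective: the data is
    split into a part explained by a nonlinear autoencoder and an L1-penalised
    corruption matrix S, echoing the L + S decomposition of robust PCA. The
    architecture, lambda, and training schedule are arbitrary assumptions, not
    the paper's setup:

        import torch
        import torch.nn as nn

        torch.manual_seed(0)
        X = torch.randn(200, 20)
        X[:5] += 10.0                      # a few grossly corrupted rows

        enc = nn.Sequential(nn.Linear(20, 4), nn.ReLU())
        dec = nn.Linear(4, 20)
        S = torch.zeros_like(X, requires_grad=True)   # explicit outlier part

        opt = torch.optim.Adam(list(enc.parameters())
                               + list(dec.parameters()) + [S], lr=1e-2)
        lam = 0.1
        for _ in range(500):
            opt.zero_grad()
            recon = dec(enc(X - S))        # nonlinear subspace for clean data
            loss = ((X - S - recon) ** 2).mean() + lam * S.abs().mean()
            loss.backward()
            opt.step()

        # Rows assigned large corruption are flagged as anomalies.
        print(S.detach().abs().sum(dim=1).topk(5).indices)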


    Artificial Intelligence

    Stochastic Constraint Programming as Reinforcement Learning

    Steven Prestwich, Roberto Rossi, Armagan Tarim
    Subjects: Artificial Intelligence (cs.AI)

    Stochastic Constraint Programming (SCP) is an extension of Constraint
    Programming (CP) used for modelling and solving problems involving constraints
    and uncertainty. SCP inherits excellent modelling abilities and filtering
    algorithms from CP, but so far it has not been applied to large problems.
    Reinforcement Learning (RL) extends Dynamic Programming to large stochastic
    problems, but is problem-specific and has no generic solvers. We propose a
    hybrid combining the scalability of RL with the modelling and constraint
    filtering methods of CP. We implement a prototype in a CP system and
    demonstrate its usefulness on SCP problems.

    Analysis of Vanilla Rolling Horizon Evolution Parameters in General Video Game Playing

    Raluca D. Gaina, Jialin Liu, Simon M. Lucas, Diego Perez-Liebana
    Journal-ref: Applications of Evolutionary Computation, EvoApplications, Lecture
    Notes in Computer Science, vol. 10199, Springer, Cham., p. 418-434, 2017
    Subjects: Artificial Intelligence (cs.AI)

    Monte Carlo Tree Search techniques have generally dominated General Video
    Game Playing, but recent research has started looking at Evolutionary
    Algorithms and their potential at matching Tree Search level of play or even
    outperforming these methods. Online or Rolling Horizon Evolution is one of the
    options available to evolve sequences of actions for planning in General Video
    Game Playing, but no research to date has explored the capabilities of the
    vanilla version of this algorithm across multiple games. This
    study aims to critically analyse the different configurations regarding
    population size and individual length in a set of 20 games from the General
    Video Game AI corpus. Distinctions are made between deterministic and
    stochastic games, and the implications of using superior time budgets are
    studied. Results show that there is scope for the use of these techniques,
    which in some configurations outperform Monte Carlo Tree Search, and also
    suggest that further research in these methods could boost their performance.
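
    For readers unfamiliar with the vanilla algorithm under analysis, a
    schematic Python sketch of rolling horizon evolution against a generic
    forward model; the model interface (copy/advance/value) and all parameter
    defaults are hypothetical stand-ins, not the GVGAI API:

        import random

        def rolling_horizon_evolution(model, state, n_actions,
                                      pop_size=10, horizon=14, generations=20):
            """Evolve fixed-length action sequences; play only the first action."""
            pop = [[random.randrange(n_actions) for _ in range(horizon)]
                   for _ in range(pop_size)]

            def fitness(seq):
                s = model.copy(state)
                for a in seq:
                    s = model.advance(s, a)    # simulate with the forward model
                return model.value(s)          # heuristic value of reached state

            for _ in range(generations):
                pop.sort(key=fitness, reverse=True)
                elite = pop[: pop_size // 2]
                children = []
                for _ in range(pop_size - len(elite)):
                    child = list(random.choice(elite))
                    child[random.randrange(horizon)] = random.randrange(n_actions)
                    children.append(child)     # point-mutated copies of elites
                pop = elite + children

            return max(pop, key=fitness)[0]    # first action of best sequence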

    Evaluating and Modelling Hanabi-Playing Agents

    Joseph Walton-Rivers, Piers R. Williams, Richard Bartle, Diego Perez-Liebana, Simon M. Lucas
    Comments: Proceedings of the IEEE Conference on Evolutionary Computation (2017)
    Subjects: Artificial Intelligence (cs.AI)

    Agent modelling involves considering how other agents will behave, in order
    to influence your own actions. In this paper, we explore the use of agent
    modelling in the hidden-information, collaborative card game Hanabi. We
    implement a number of rule-based agents, both from the literature and of our
    own devising, in addition to an Information Set Monte Carlo Tree Search
    (IS-MCTS) agent. We observe poor results from IS-MCTS, so construct a new,
    predictor version that uses a model of the agents with which it is paired. We
    observe a significant improvement in game-playing strength from this agent in
    comparison to IS-MCTS, resulting from its consideration of what the other
    agents in a game would do. In addition, we create a flawed rule-based agent to
    highlight the predictor’s capabilities with such an agent.

    General Video Game AI: Learning from Screen Capture

    Kamolwan Kunanusont, Simon M. Lucas, Diego Perez-Liebana
    Comments: Proceedings of the IEEE Conference on Evolutionary Computation 2017
    Subjects: Artificial Intelligence (cs.AI)

    General Video Game Artificial Intelligence is a general game playing
    framework for Artificial General Intelligence research in the video-games
    domain. In this paper, we propose for the first time a screen capture
    learning agent for the General Video Game AI framework. A Deep Q-Network algorithm was
    applied and improved to develop an agent capable of learning to play different
    games in the framework. After testing this algorithm using various games of
    different categories and difficulty levels, the results suggest that our
    proposed screen capture learning agent has the potential to learn many
    different games using only a single learning algorithm.

    Population Seeding Techniques for Rolling Horizon Evolution in General Video Game Playing

    Raluca D. Gaina, Simon M. Lucas, Diego Perez-Liebana
    Comments: Proceedings of the IEEE Conference on Evolutionary Computation 2017
    Subjects: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    While Monte Carlo Tree Search and closely related methods have dominated
    General Video Game Playing, recent research has demonstrated the promise of
    Rolling Horizon Evolutionary Algorithms as an interesting alternative. However,
    there is little attention paid to population initialization techniques in the
    setting of general real-time video games. Therefore, this paper proposes the
    use of population seeding to improve the performance of Rolling Horizon
    Evolution and presents the results of two methods, One Step Look Ahead and
    Monte Carlo Tree Search, tested on 20 games of the General Video Game AI corpus
    with multiple evolution parameter values (population size and individual
    length). An in-depth analysis is carried out between the results of the seeding
    methods and the vanilla Rolling Horizon Evolution. In addition, the paper
    presents a comparison to a Monte Carlo Tree Search algorithm. The results are
    promising, with seeding able to boost performance significantly over baseline
    evolution and even match the high level of play obtained by the Monte Carlo
    Tree Search.

    Multi-Objective Deep Q-Learning with Subsumption Architecture

    Tomasz Tajmajer
    Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)

    In this work we present a method for using Deep Q-Networks (DQNs) in
    multi-objective tasks. Deep Q-Networks provide remarkable performance in single
    objective tasks learning from high-level visual perception. However, in many
    scenarios (e.g. in robotics), the agent needs to pursue multiple objectives
    simultaneously. We propose an architecture in which separate DQNs are used to
    control the agent’s behaviour with respect to particular objectives. In this
    architecture we use signal suppression, known from the (Brooks) subsumption
    architecture, to combine outputs of several DQNs into a single action. Our
    architecture enables the decomposition of the agent’s behaviour into
    controllable and replaceable sub-behaviours learned by distinct modules. To
    evaluate our solution we used a game-like simulator in which an agent –
    provided with high-level visual input – pursues multiple objectives in a 2D
    world. Our solution provides benefits of modularity, while its performance is
    comparable to the monolithic approach.
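
    A toy sketch of the signal-suppression idea: each objective's DQN proposes
    action values, and a higher-priority module may silence lower-priority ones
    before the outputs are combined. The suppression rule and array shapes are
    assumptions:

        import numpy as np

        def subsumption_combine(q_values, suppress):
            """Combine per-objective Q-values into a single action choice.

            q_values : list of (n_actions,) arrays, ordered low -> high priority.
            suppress : list of bools; True means this module suppresses all
                       lower-priority modules, Brooks-style.
            """
            combined = np.zeros_like(q_values[0])
            for q, s in zip(q_values, suppress):
                if s:
                    combined = q.copy()    # higher layer overrides lower layers
                else:
                    combined += q          # otherwise objectives are blended
            return int(np.argmax(combined))

        # Two objectives: navigate (low priority), avoid-collision (high priority).
        action = subsumption_combine(
            [np.array([0.2, 0.9, 0.1]), np.array([1.5, -2.0, 0.3])],
            suppress=[False, True])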

    Governing Governance: A Formal Framework for Analysing Institutional Design and Enactment Governance

    Thomas C. King
    Journal-ref: SIKS Dissertation Series No. 2016-41
    Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

    This dissertation is motivated by the need, in today’s globalist world, for a
    precise way to enable governments, organisations and other regulatory bodies to
    evaluate the constraints they place on themselves and others. An organisation’s
    modus operandi is enacting and fulfilling contracts between itself and its
    participants. Yet, organisational contracts should respect external laws, such
    as those setting out data privacy rights and liberties. Contracts can only be
    enacted by following contract law processes, which often require bilateral
    agreement and consideration. Governments need to legislate whilst understanding
    today’s context of national and international governance hierarchy where law
    makers shun isolationism and seek to influence one another. Governments should
    avoid punishment by respecting constraints from international treaties and
    human rights charters. Governments can only enact legislation by following
    their own, pre-existing, law making procedures. In other words, institutions,
    such as laws and contracts are designed and enacted under constraints.

    Accurate Optical Flow via Direct Cost Volume Processing

    Jia Xu, René Ranftl, Vladlen Koltun
    Comments: Published at the Conference on Computer Vision and Pattern Recognition (CVPR 2017)
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    We present an optical flow estimation approach that operates on the full
    four-dimensional cost volume. This direct approach shares the structural
    benefits of leading stereo matching pipelines, which are known to yield high
    accuracy. To this day, such approaches have been considered impractical due to
    the size of the cost volume. We show that the full four-dimensional cost volume
    can be constructed in a fraction of a second due to its regularity. We then
    exploit this regularity further by adapting semi-global matching to the
    four-dimensional setting. This yields a pipeline that achieves significantly
    higher accuracy than state-of-the-art optical flow methods while being faster
    than most. Our approach outperforms all published general-purpose optical flow
    methods on both Sintel and KITTI 2015 benchmarks.
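
    To make the four-dimensional cost volume concrete, a naive numpy
    construction is sketched below; the paper builds it far more efficiently by
    exploiting its regularity, and the feature maps and displacement range here
    are purely illustrative:

        import numpy as np

        def cost_volume(f1, f2, max_disp=4):
            """Naive 4D matching cost: one cost per pixel and 2D displacement.

            f1, f2 : (H, W, C) feature maps of the two frames.
            Returns costs of shape (H, W, D, D) with D = 2 * max_disp + 1.
            """
            H, W, _ = f1.shape
            D = 2 * max_disp + 1
            pad = np.pad(f2, ((max_disp, max_disp), (max_disp, max_disp), (0, 0)),
                         mode="edge")
            cost = np.empty((H, W, D, D))
            for u in range(D):
                for v in range(D):
                    shifted = pad[u:u + H, v:v + W]
                    cost[:, :, u, v] = np.sum((f1 - shifted) ** 2, axis=2)
            return cost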

    Turing at SemEval-2017 Task 8: Sequential Approach to Rumour Stance Classification with Branch-LSTM

    Elena Kochkina, Maria Liakata, Isabelle Augenstein
    Comments: SemEval 2017 RumourEval: Determining rumour veracity and support for rumours (SemEval 2017 Task 8, Subtask A)
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    This paper describes team Turing’s submission to SemEval 2017 RumourEval:
    Determining rumour veracity and support for rumours (SemEval 2017 Task 8,
    Subtask A). Subtask A addresses the challenge of rumour stance classification,
    which involves identifying the attitude of Twitter users towards the
    truthfulness of the rumour they are discussing. Stance classification is
    considered to be an important step towards rumour verification, therefore
    performing well in this task is expected to be useful in debunking false
    rumours. In this work we classify a set of Twitter posts discussing rumours
    into either supporting, denying, questioning or commenting on the underlying
    rumours. We propose an LSTM-based sequential model that, through modelling
    the conversational structure of tweets, achieves an accuracy of 0.784 on the
    RumourEval test set, outperforming all other systems in Subtask A.

    Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets

    Wei-Lun Chao, Hexiang Hu, Fei Sha
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Visual question answering (QA) has attracted a lot of attention lately, seen
    essentially as a form of (visual) Turing test that artificial intelligence
    should strive to achieve. In this paper, we study a crucial component of this
    task: how can we design good datasets for the task? We focus on the design of
    multiple-choice based datasets where the learner has to select the right answer
    from a set of candidate ones including the target (i.e. the correct one) and
    the decoys (i.e. the incorrect ones). Through careful analysis of the results
    attained by state-of-the-art learning models and human annotators on existing
    datasets, we show the design of the decoy answers has a significant impact on
    how and what the learning models learn from the datasets. In particular, the
    resulting learner can ignore the visual information, the question, or both,
    while still doing well on the task. Inspired by this, we propose automatic
    procedures to remedy such design deficiencies. We apply the procedures to
    re-construct decoy answers for two popular visual QA datasets as well as to
    create a new visual QA dataset from the Visual Genome project, resulting in the
    largest dataset for this task. Extensive empirical studies show that the design
    deficiencies have been alleviated in the remedied datasets and the performance
    on them is likely a more faithful indicator of the difference among learning
    models. The datasets are released and publicly available via
    this http URL

    Naturalizing a Programming Language via Interactive Learning

    Sida I. Wang, Samuel Ginn, Percy Liang, Christopher D. Manning
    Comments: 10 pages, ACL2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Learning (cs.LG)

    Our goal is to create a convenient natural language interface for performing
    well-specified but complex actions such as analyzing data, manipulating text,
    and querying databases. However, existing natural language interfaces for such
    tasks are quite primitive compared to the power one wields with a programming
    language. To bridge this gap, we start with a core programming language and
    allow users to “naturalize” the core language incrementally by defining
    alternative, more natural syntax and increasingly complex concepts in terms of
    compositions of simpler ones. In a voxel world, we show that a community of
    users can simultaneously teach a common system a diverse language and use it to
    build hundreds of complex voxel structures. Over the course of three days,
    these users went from using only the core language to using the naturalized
    language in 85.9% of the last 10K utterances.

    A General Theory for Training Learning Machine

    Hong Zhao
    Comments: 55 pages, 18 figures. arXiv admin note: substantial text overlap with arXiv:1602.03950
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Though deep learning is pushing machine learning to a new stage, basic
    theories of machine learning are still limited. The principle of learning,
    the role of a priori knowledge, the role of neuron bias, and the basis for
    choosing the neural transfer function and cost function, etc., are still far
    from clear. In this paper, we present a general theoretical framework for
    machine learning. We classify the prior knowledge into common and
    problem-dependent parts, and consider that the aim of learning is to
    maximally incorporate them. The principle we suggest for maximizing the
    former is the design risk minimization principle, while the neural transfer
    function, the cost function, as well as the pretreatment of samples, are
    endowed with the role of maximizing the latter. The role of the neuron bias
    is explained from a different angle. We develop a Monte Carlo algorithm to
    establish the input-output responses, and we control the input-output
    sensitivity of a learning machine by controlling that of individual neurons.
    Applications to function approximation and smoothing, pattern recognition
    and classification, are provided to illustrate how to train general learning
    machines based on our theory and algorithm. Our method may in addition
    enable new applications, such as transductive inference.

    A Review on Deep Learning Techniques Applied to Semantic Segmentation

    Alberto Garcia-Garcia, Sergio Orts-Escolano, Sergiu Oprea, Victor Villena-Martinez, Jose Garcia-Rodriguez
    Comments: Submitted to TPAMI on Apr. 22, 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Image semantic segmentation is of increasing interest to computer vision and
    machine learning researchers. Many emerging applications need accurate and
    efficient segmentation mechanisms: autonomous driving, indoor navigation,
    and even virtual or augmented reality systems, to name a few. This demand
    coincides with the rise of deep learning approaches in almost every field or
    application target related to computer vision, including semantic
    segmentation or scene understanding. This paper provides a review of deep
    learning methods for semantic segmentation applied to various application
    areas. Firstly, we describe the terminology of this field as well as
    mandatory background concepts. Next, the main datasets and challenges are
    presented to help researchers decide which ones best suit their needs and
    targets. Then, existing methods are reviewed, highlighting their
    contributions and their significance in the field. Subsequently,
    quantitative results are given for the described methods and the datasets on
    which they were evaluated, followed by a discussion of the results. Finally,
    we point out a set of promising future works and draw our own conclusions
    about the state of the art of semantic segmentation using deep learning
    techniques.

    Risk-Averse Approximate Dynamic Programming with Quantile-Based Risk Measures

    Daniel R. Jiang, Warren B. Powell
    Comments: 51 pages, 9 figures
    Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI)

    In this paper, we consider a finite-horizon Markov decision process (MDP) for
    which the objective at each stage is to minimize a quantile-based risk measure
    (QBRM) of the sequence of future costs; we call the overall objective a dynamic
    quantile-based risk measure (DQBRM). In particular, we consider optimizing
    dynamic risk measures where the one-step risk measures are QBRMs, a class of
    risk measures that includes the popular value at risk (VaR) and the conditional
    value at risk (CVaR). Although there is considerable theoretical development of
    risk-averse MDPs in the literature, the computational challenges have not been
    explored as thoroughly. We propose a data-driven or simulation-based
    approximate dynamic programming (ADP) algorithm to solve the risk-averse
    sequential decision problem. In addition, we address the issue of inefficient
    sampling for risk applications in simulated settings and present a procedure,
    based on importance sampling, to direct samples toward the “risky region” as
    the ADP algorithm progresses. Finally, we show numerical results of our
    algorithms in the context of an application involving risk-averse bidding for
    energy storage.
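
    As a reminder of the one-step risk measures involved, a small numpy
    illustration of empirical VaR and CVaR estimated from sampled costs; the
    quantile level and sample distribution are arbitrary:

        import numpy as np

        def var_cvar(costs, alpha=0.95):
            """Empirical value at risk and conditional value at risk of costs.

            VaR_alpha is the alpha-quantile of the cost distribution; CVaR_alpha
            is the mean cost over the worst (1 - alpha) tail beyond it.
            """
            costs = np.asarray(costs)
            var = np.quantile(costs, alpha)
            return var, costs[costs >= var].mean()

        rng = np.random.default_rng(0)
        print(var_cvar(rng.normal(loc=10.0, scale=3.0, size=100000)))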


    Information Retrieval

    Distant Supervision for Topic Classification of Tweets in Curated Streams

    Salman Mohammed, Nimesh Ghelani, Jimmy Lin
    Subjects: Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

    We tackle the challenge of topic classification of tweets in the context of
    analyzing a large collection of curated streams by news outlets and other
    organizations to deliver relevant content to users. Our approach is novel in
    applying distant supervision based on semi-automatically identifying curated
    streams that are topically focused (for example, on politics, entertainment, or
    sports). These streams provide a source of labeled data to train topic
    classifiers that can then be applied to categorize tweets from more
    topically-diffuse streams. Experiments on both noisy labels and human
    ground-truth judgments demonstrate that our approach yields good topic
    classifiers essentially “for free”, and that topic classifiers trained in this
    manner are able to dynamically adjust for topic drift as news on Twitter
    evolves.

    Accelerated Nearest Neighbor Search with Quick ADC

    Fabien André (Technicolor), Anne-Marie Kermarrec (Inria), Nicolas Le Scouarnec (Technicolor)
    Comments: 8 pages, 5 figures, published in Proceedings of ICMR’17, Bucharest, Romania, June 06-09, 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB); Information Retrieval (cs.IR); Multimedia (cs.MM); Performance (cs.PF)

    Efficient Nearest Neighbor (NN) search in high-dimensional spaces is a
    foundation of many multimedia retrieval systems. Because it offers low
    response times, Product Quantization (PQ) is a popular solution. PQ compresses
    high-dimensional vectors into short codes using several sub-quantizers, which
    enables in-RAM storage of large databases. This allows fast answers to NN
    queries, without accessing the SSD or HDD. The key feature of PQ is that it can
    compute distances between short codes and high-dimensional vectors using
    cache-resident lookup tables. The efficiency of this technique, named
    Asymmetric Distance Computation (ADC), remains limited because it performs many
    cache accesses.

    In this paper, we introduce Quick ADC, a novel technique that achieves a 3 to
    6 times speedup over ADC by exploiting Single Instruction Multiple Data (SIMD)
    units available in current CPUs. Efficiently exploiting SIMD requires
    algorithmic changes to the ADC procedure. Namely, Quick ADC relies on two key
    modifications of ADC: (i) the use of 4-bit sub-quantizers instead of the standard
    8-bit sub-quantizers and (ii) the quantization of floating-point distances.
    This allows Quick ADC to exceed the performance of state-of-the-art systems,
    e.g., it achieves a Recall@100 of 0.94 in 3.4 ms on 1 billion SIFT descriptors
    (128-bit codes).
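
    For context, a plain-numpy sketch of the baseline ADC step that Quick ADC
    accelerates: per-sub-quantizer distance tables are computed once per query,
    and each code's distance is a sum of table lookups. The 8-bit layout shown
    is the standard baseline, not the paper's 4-bit SIMD variant:

        import numpy as np

        def adc_distances(query, codebooks, codes):
            """Asymmetric distance computation for product quantization.

            query     : (M * d,) vector, split into M sub-vectors of dim d.
            codebooks : (M, K, d) sub-quantizer centroids (K = 256 for 8 bits).
            codes     : (n, M) database codes, one centroid index per sub-space.
            """
            M, K, d = codebooks.shape
            sub = query.reshape(M, d)
            # One lookup table per sub-quantizer: squared distance to centroids.
            tables = ((codebooks - sub[:, None, :]) ** 2).sum(axis=2)  # (M, K)
            # Distance to each code is a sum of M cache-resident lookups.
            return tables[np.arange(M), codes].sum(axis=1)             # (n,)

        rng = np.random.default_rng(0)
        cb = rng.normal(size=(8, 256, 4))             # M = 8 sub-quantizers
        db = rng.integers(0, 256, size=(1000, 8))
        print(adc_distances(rng.normal(size=32), cb, db).shape)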

    Exploring the Evolution of Node Neighborhoods in Dynamic Networks

    Günce Keziban Orman, Vincent Labatut (LIA), Ahmet Teoman Naskali
    Comments: Physica A, Elsevier, 2017
    Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR); Physics and Society (physics.soc-ph)

    Dynamic Networks are a popular way of modeling and studying the behavior of
    evolving systems. However, their analysis constitutes a relatively recent
    subfield of Network Science, and the number of available tools is consequently
    much smaller than for static networks. In this work, we propose a method
    specifically designed to take advantage of the longitudinal nature of dynamic
    networks. It characterizes each individual node by studying the evolution of
    its direct neighborhood, based on the assumption that the way this neighborhood
    changes reflects the role and position of the node in the whole network. For
    this purpose, we define the concept of a neighborhood event, which
    corresponds to the various transformations such groups of nodes can undergo,
    and describe an algorithm for detecting such events. We demonstrate the
    interest of our method on three real-world networks: DBLP, LastFM and Enron. We
    apply frequent pattern mining to extract meaningful information from temporal
    sequences of neighborhood events. This results in the identification of
    behavioral trends emerging in the whole network, as well as the individual
    characterization of specific nodes. We also perform a cluster analysis, which
    reveals that, in all three networks, one can distinguish two types of nodes
    exhibiting different behaviors: a very small group of active nodes, whose
    neighborhoods undergo diverse and frequent events, and a very large group of
    stable nodes.

    Ranking with Fairness Constraints

    L. Elisa Celis, Damian Straszak, Nisheeth K. Vishnoi
    Subjects: Data Structures and Algorithms (cs.DS); Computers and Society (cs.CY); Information Retrieval (cs.IR)

    The problem of ranking a set of items is fundamental in today’s data-driven
    world. Ranking algorithms lie at the core of applications such as search
    engines, news feeds, and recommendation systems. However, recent events have
    pointed to the fact that algorithmic bias in rankings, which results in
    decreased fairness or diversity in the type of content presented, can promote
    stereotypes and propagate injustices. Motivated by such applications, we
    initiate the study of the traditional ranking problem with additional fairness
    constraints. Given a collection of items along with 1) the value of placing an
    item at a particular position, 2) the collection of possibly non-disjoint
    attributes (e.g., gender, race) of each item and 3) a collection of fairness
    constraints that upper bound the number of items with each attribute that are
    allowed to appear in the top positions of the ranking, the goal is to output a
    ranking that maximizes value while respecting the constraints. This problem
    encapsulates various well-studied problems related to matching as special cases
    and turns out to be hard to approximate even with simple constraints.

    Our main technical contributions are exact and approximation algorithms and
    hardness results that, together, come close to settling the approximability of
    the constrained ranking maximization problem. Technically, our main results
    rely on novel insights about the standard linear programming relaxation for the
    constrained matching problem when the objective function satisfies certain
    properties that appear in common ranking metrics such as DCG, Spearman’s rho or
    Bradley-Terry, along with the structure of fairness constraints. Overall, our
    results contribute to the growing set of algorithms that can counter
    algorithmic bias, and the structural insights obtained may find use in other
    algorithmic contexts related to ranking problems.
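
    A tiny illustration of the objective and constraints involved, as a toy
    encoding rather than the paper's algorithms: a DCG-style position-discounted
    value plus upper bounds on how many items with each attribute may appear in
    a prefix of the ranking:

        import numpy as np

        def dcg(values):
            """Discounted cumulative gain for per-position item values."""
            ranks = np.arange(1, len(values) + 1)
            return float(np.sum(values / np.log2(ranks + 1)))

        def satisfies_fairness(attrs, upper_bounds, prefix):
            """Among the top `prefix` items, each attribute may appear at most
            upper_bounds[attribute] times (items can carry several attributes)."""
            counts = {}
            for item_attrs in attrs[:prefix]:
                for a in item_attrs:
                    counts[a] = counts.get(a, 0) + 1
            return all(counts.get(a, 0) <= b for a, b in upper_bounds.items())

        values = np.array([3.0, 2.0, 3.0, 1.0])
        attrs = [{"male"}, {"female"}, {"male"}, {"female"}]
        print(dcg(values),
              satisfies_fairness(attrs, {"male": 2, "female": 2}, prefix=4))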

    Geometric Matrix Completion with Recurrent Multi-Graph Neural Networks

    Federico Monti, Michael M. Bronstein, Xavier Bresson
    Subjects: Learning (cs.LG); Information Retrieval (cs.IR); Numerical Analysis (cs.NA); Machine Learning (stat.ML)

    Matrix completion models are among the most common formulations of
    recommender systems. Recent works have shown a performance boost for these
    techniques when pairwise relationships between users/items are introduced in
    the form of graphs, with smoothness priors imposed on them. However,
    such techniques do not fully exploit the local stationarity structures of
    user/item graphs, and the number of parameters to learn is linear w.r.t. the
    number of users and items. We propose a novel approach to overcome these
    limitations by using geometric deep learning on graphs. Our matrix completion
    architecture combines graph convolutional neural networks and recurrent neural
    networks to learn meaningful statistical graph-structured patterns and the
    non-linear diffusion process that generates the known ratings. This neural
    network system requires a constant number of parameters independent of the
    matrix size. We apply our method on both synthetic and real datasets, showing
    that it outperforms state-of-the-art techniques.

    Scatteract: Automated extraction of data from scatter plots

    Mathieu Cliche, David Rosenberg, Dhruv Madeka, Connie Yee
    Comments: Submitted to ECML PKDD 2017 proceedings, 16 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Machine Learning (stat.ML)

    Charts are an excellent way to convey patterns and trends in data, but they
    do not facilitate further modeling of the data or close inspection of
    individual data points. We present a fully automated system for extracting the
    numerical values of data points from images of scatter plots. We use deep
    learning techniques to identify the key components of the chart, and optical
    character recognition together with robust regression to map from pixels to the
    coordinate system of the chart. We focus on scatter plots with linear scales,
    which already have several interesting challenges. Previous work has done fully
    automatic extraction for other types of charts, but to our knowledge this is
    the first approach that is fully automatic for scatter plots. Our method
    performs well, achieving successful data extraction on 89% of the plots in our
    test set.


    Computation and Language

    A Trie-Structured Bayesian Model for Unsupervised Morphological Segmentation

    Murathan Kurfalı, Ahmet Üstün, Burcu Can
    Comments: 12 pages, accepted and presented at the CICLING 2017 – 18th International Conference on Intelligent Text Processing and Computational Linguistics
    Subjects: Computation and Language (cs.CL)

    In this paper, we introduce a trie-structured Bayesian model for unsupervised
    morphological segmentation. We adopt prior information from different sources
    in the model. We use neural word embeddings to discover words that are
    morphologically derived from each other and thereby that are semantically
    similar. We use letter successor variety counts obtained from tries that are
    built by neural word embeddings. Our results show that using different
    information sources such as neural word embeddings and letter successor variety
    as prior information improves morphological segmentation in a Bayesian model.
    Our model outperforms other unsupervised morphological segmentation models on
    Turkish and gives promising results on English and German in scarce-resource
    settings.
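
    As a quick illustration of one of the priors used, letter successor variety
    counts the distinct letters that can follow each prefix in a word list; a
    high count at a prefix hints at a morpheme boundary. A minimal sketch over a
    dict-based trie with a toy vocabulary:

        from collections import defaultdict

        def successor_variety(words):
            """Number of distinct letters observed after each prefix."""
            successors = defaultdict(set)
            for w in words:
                for i in range(1, len(w)):
                    successors[w[:i]].add(w[i])
            return {prefix: len(s) for prefix, s in successors.items()}

        variety = successor_variety(["walked", "walking", "walks", "walker"])
        print(variety["walk"])  # 3 distinct successors ('e', 'i', 's')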

    Joint Modeling of Text and Acoustic-Prosodic Cues for Neural Parsing

    Trang Tran, Shubham Toshniwal, Mohit Bansal, Kevin Gimpel, Karen Livescu, Mari Ostendorf
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Sound (cs.SD)

    In conversational speech, the acoustic signal provides cues that help
    listeners disambiguate difficult parses. For automatically parsing a spoken
    utterance, we introduce a model that integrates transcribed text and
    acoustic-prosodic features using a convolutional neural network over energy and
    pitch trajectories coupled with an attention-based recurrent neural network
    that accepts text and word-based prosodic features. We find that different
    types of acoustic-prosodic features are individually helpful, and together
    improve parse F1 scores significantly over a strong text-only baseline. For
    this study with known sentence boundaries, error analysis shows that the main
    benefit of acoustic-prosodic features is in sentences with disfluencies and
    that attachment errors are most improved.

    Turing at SemEval-2017 Task 8: Sequential Approach to Rumour Stance Classification with Branch-LSTM

    Elena Kochkina, Maria Liakata, Isabelle Augenstein
    Comments: SemEval 2017 RumourEval: Determining rumour veracity and support for rumours (SemEval 2017 Task 8, Subtask A)
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    This paper describes team Turing’s submission to SemEval 2017 RumourEval:
    Determining rumour veracity and support for rumours (SemEval 2017 Task 8,
    Subtask A). Subtask A addresses the challenge of rumour stance classification,
    which involves identifying the attitude of Twitter users towards the
    truthfulness of the rumour they are discussing. Stance classification is
    considered to be an important step towards rumour verification, therefore
    performing well in this task is expected to be useful in debunking false
    rumours. In this work we classify a set of Twitter posts discussing rumours
    into either supporting, denying, questioning or commenting on the underlying
    rumours. We propose an LSTM-based sequential model that, through modelling
    the conversational structure of tweets, achieves an accuracy of 0.784 on the
    RumourEval test set, outperforming all other systems in Subtask A.

    What is the Essence of a Claim? Cross-Domain Claim Identification

    Johannes Daxenberger, Steffen Eger, Ivan Habernal, Christian Stab, Iryna Gurevych
    Comments: Under review
    Subjects: Computation and Language (cs.CL)

    Argument mining has become a popular research area in NLP. It typically
    includes the identification of argumentative components, e.g. claims, as the
    central component of an argument. We perform a qualitative analysis across six
    different datasets and show that these appear to conceptualize claims quite
    differently. To learn about the consequences of such different
    conceptualizations of claims for practical applications, we carried out
    extensive experiments using state-of-the-art feature-rich and deep learning
    systems, to identify claims in a cross-domain fashion. While the divergent
    perception of claims in different datasets is indeed harmful to cross-domain
    classification, we show that there are shared properties on the lexical level
    as well as system configurations that can help to overcome these gaps.

    Watset: Automatic Induction of Synsets from a Graph of Synonyms

    Dmitry Ustalov, Alexander Panchenko, Chris Biemann
    Comments: 12 pages, 3 figures, 6 tables, accepted to ACL 2017
    Subjects: Computation and Language (cs.CL)

    This paper presents a new graph-based approach that induces synsets using
    synonymy dictionaries and word embeddings. First, we build a weighted graph of
    synonyms extracted from commonly available resources, such as Wiktionary.
    Second, we apply word sense induction to deal with ambiguous words. Finally, we
    cluster the disambiguated version of the ambiguous input graph into synsets.
    Our meta-clustering approach lets us use an efficient hard clustering algorithm
    to perform a fuzzy clustering of the graph. Despite its simplicity, our
    approach shows excellent results, outperforming five competitive
    state-of-the-art methods in terms of F-score on three gold standard datasets
    for English and Russian derived from large-scale manually constructed lexical
    resources.

    Semi-supervised Multitask Learning for Sequence Labeling

    Marek Rei
    Comments: ACL 2017
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We propose a sequence labeling framework with a secondary training objective,
    learning to predict surrounding words for every word in the dataset. This
    language modeling objective incentivises the system to learn general-purpose
    patterns of semantic and syntactic composition, which are also useful for
    improving accuracy on different sequence labeling tasks. The architecture was
    evaluated on a range of datasets, covering the tasks of error detection in
    learner texts, named entity recognition, chunking and POS-tagging. The novel
    language modeling objective provided consistent performance improvements on
    every benchmark, without requiring any additional annotated or unannotated
    data.
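
    A compact PyTorch sketch of the kind of joint objective described: a shared
    encoder feeds a tagging head plus auxiliary heads predicting the next and
    previous word. The dimensions and exact wiring are assumptions, not the
    paper's architecture:

        import torch
        import torch.nn as nn

        class MultitaskTagger(nn.Module):
            """Shared BiLSTM with a label head and language-modelling heads."""
            def __init__(self, vocab, n_labels, dim=64):
                super().__init__()
                self.emb = nn.Embedding(vocab, dim)
                self.lstm = nn.LSTM(dim, dim, bidirectional=True,
                                    batch_first=True)
                self.tag = nn.Linear(2 * dim, n_labels)
                self.next_word = nn.Linear(2 * dim, vocab)  # surrounding words
                self.prev_word = nn.Linear(2 * dim, vocab)

            def forward(self, tokens):
                h, _ = self.lstm(self.emb(tokens))
                return self.tag(h), self.next_word(h), self.prev_word(h)

        model = MultitaskTagger(vocab=1000, n_labels=5)
        tokens = torch.randint(0, 1000, (2, 7))
        labels = torch.randint(0, 5, (2, 7))
        tag_logits, nxt, prv = model(tokens)
        ce = nn.CrossEntropyLoss()
        # Joint loss: supervised tagging + the auxiliary language-model terms.
        loss = (ce(tag_logits.reshape(-1, 5), labels.reshape(-1))
                + ce(nxt[:, :-1].reshape(-1, 1000), tokens[:, 1:].reshape(-1))
                + ce(prv[:, 1:].reshape(-1, 1000), tokens[:, :-1].reshape(-1)))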

    Found in Translation: Reconstructing Phylogenetic Language Trees from Translations

    Ella Rabinovich, Noam Ordan, Shuly Wintner
    Comments: ACL2017, 11 pages
    Subjects: Computation and Language (cs.CL)

    Translation has played an important role in trade, law, commerce, politics,
    and literature for thousands of years. Translators have always tried to be
    invisible; ideal translations should look as if they were written originally in
    the target language. We show that traces of the source language remain in the
    translation product to the extent that it is possible to uncover the history of
    the source language by looking only at the translation. Specifically, we
    automatically reconstruct phylogenetic language trees from monolingual texts
    (translated from several source languages). The signal of the source language
    is so powerful that it is retained even after two phases of translation. This
    strongly indicates that source language interference is the most dominant
    characteristic of translated texts, overshadowing the more subtle signals of
    universal properties of translation.

    Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search

    Chris Hokamp, Qun Liu
    Comments: Accepted as a long paper at ACL 2017
    Subjects: Computation and Language (cs.CL)

    We present Grid Beam Search (GBS), an algorithm which extends beam search to
    allow the inclusion of pre-specified lexical constraints. The algorithm can be
    used with any model that generates a sequence \( \hat{\mathbf{y}} =
    \{y_{0} \ldots y_{T}\} \) by maximizing \( p(\mathbf{y} \mid \mathbf{x}) =
    \prod_{t} p(y_{t} \mid \mathbf{x}; \{y_{0} \ldots y_{t-1}\}) \). Lexical
    constraints take the form of phrases or words that must be present in the
    output sequence. This is a very general way to incorporate additional knowledge
    into a model’s output without requiring any modification of the model
    parameters or training data. We demonstrate the feasibility and flexibility of
    Lexically Constrained Decoding by conducting experiments on Neural
    Interactive-Predictive Translation, as well as Domain Adaptation for Neural
    Machine Translation. Experiments show that GBS can provide large improvements
    in translation quality in interactive scenarios, and that, even without any
    user input, GBS can be used to achieve significant gains in performance in
    domain adaptation scenarios.
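
    A heavily simplified sketch of lexically constrained beam search in the
    spirit of GBS: hypotheses are bucketed by how many constraint tokens they
    have covered, and each expansion either follows the model or forces the next
    constraint token. It assumes single-token constraints and a stubbed scoring
    model, so it illustrates the grid idea rather than reproducing the paper's
    algorithm:

        import math

        def constrained_beam_search(step_logprobs, constraints, eos,
                                    beam=4, max_len=10):
            """step_logprobs(prefix) -> {token: log-prob}; constraints is a
            list of tokens that must all appear in the returned sequence."""
            # Bucket hypotheses (score, prefix, covered) by constraints covered.
            beams = {0: [(0.0, [], 0)]}
            for _ in range(max_len):
                new = {c: [] for c in range(len(constraints) + 1)}
                for c, hyps in beams.items():
                    for score, prefix, _ in hyps:
                        if prefix and prefix[-1] == eos:
                            new[c].append((score, prefix, c))  # keep finished
                            continue
                        probs = step_logprobs(prefix)
                        for tok, lp in probs.items():          # free generation
                            new[c].append((score + lp, prefix + [tok], c))
                        if c < len(constraints):               # force constraint
                            tok = constraints[c]
                            lp = probs.get(tok, -math.inf)
                            new[c + 1].append((score + lp, prefix + [tok], c + 1))
                beams = {c: sorted(h, reverse=True)[:beam] for c, h in new.items()}
            # Best hypothesis that covered every constraint (assumes max_len is
            # large enough for at least one such hypothesis to exist).
            done = [h for h in beams[len(constraints)] if h[1][-1:] == [eos]]
            return max(done or beams[len(constraints)])[1]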

    Learning Symmetric Collaborative Dialogue Agents with Dynamic Knowledge Graph Embeddings

    He He, Anusha Balakrishnan, Mihail Eric, Percy Liang
    Comments: ACL 2017
    Subjects: Computation and Language (cs.CL)

    We study a symmetric collaborative dialogue setting in which two agents, each
    with private knowledge, must strategically communicate to achieve a common
    goal. The open-ended dialogue state in this setting poses new challenges for
    existing dialogue systems. We collected a dataset of 11K human-human dialogues,
    which exhibits interesting lexical, semantic, and strategic elements. To model
    both structured knowledge and unstructured language, we propose a neural model
    with dynamic knowledge graph embeddings that evolve as the dialogue progresses.
    Automatic and human evaluations show that our model is both more effective at
    achieving the goal and more human-like than baseline neural and rule-based
    models.

    An Analysis of Action Recognition Datasets for Language and Vision Tasks

    Spandana Gella, Frank Keller
    Comments: To appear in Proceedings of ACL 2017, 8 pages
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

    A large amount of recent research has focused on tasks that combine language
    and vision, resulting in a proliferation of datasets and methods. One such
    task is action recognition, whose applications include image annotation,
    scene understanding and image retrieval. In this survey, we categorize the
    existing approaches based on how they conceptualize this problem and provide
    a detailed review of existing datasets, highlighting their diversity as well
    as advantages and disadvantages. We focus on recently developed datasets
    which link visual information with linguistic resources and provide a
    fine-grained syntactic and semantic analysis of actions in images.

    Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets

    Wei-Lun Chao, Hexiang Hu, Fei Sha
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Visual question answering (QA) has attracted a lot of attention lately, seen
    essentially as a form of (visual) Turing test that artificial intelligence
    should strive to achieve. In this paper, we study a crucial component of this
    task: how can we design good datasets for the task? We focus on the design of
    multiple-choice based datasets where the learner has to select the right answer
    from a set of candidate ones including the target (i.e. the correct one) and
    the decoys (i.e. the incorrect ones). Through careful analysis of the results
    attained by state-of-the-art learning models and human annotators on existing
    datasets, we show the design of the decoy answers has a significant impact on
    how and what the learning models learn from the datasets. In particular, the
    resulting learner can ignore the visual information, the question, or both,
    while still doing well on the task. Inspired by this, we propose automatic
    procedures to remedy such design deficiencies. We apply the procedures to
    re-construct decoy answers for two popular visual QA datasets as well as to
    create a new visual QA dataset from the Visual Genome project, resulting in the
    largest dataset for this task. Extensive empirical studies show that the design
    deficiencies have been alleviated in the remedied datasets and the performance
    on them is likely a more faithful indicator of the difference among learning
    models. The datasets are released and publicly available via
    this http URL

    Robust Incremental Neural Semantic Graph Parsing

    Jan Buys, Phil Blunsom
    Comments: 12 pages; Accepted to ACL 2017
    Subjects: Computation and Language (cs.CL)

    Parsing sentences to linguistically-expressive semantic representations is a
    key goal of Natural Language Processing. Yet statistical parsing has focused
    almost exclusively on bilexical dependencies or domain-specific logical forms.
    We propose a neural encoder-decoder transition-based parser which is the first
    full-coverage semantic graph parser for Minimal Recursion Semantics (MRS). The
    model architecture uses stack-based embedding features, predicting graphs
    jointly with unlexicalized predicates and their token alignments. Our parser is
    more accurate than attention-based baselines on MRS, and on an additional
    Abstract Meaning Representation (AMR) benchmark, and GPU batch processing makes
    it an order of magnitude faster than a high-precision grammar-based parser.
    Further, the 86.69% Smatch score of our MRS parser is higher than the
    upper-bound on AMR parsing, making MRS an attractive choice as a semantic
    representation.

    Selective Encoding for Abstractive Sentence Summarization

    Qingyu Zhou, Nan Yang, Furu Wei, Ming Zhou
    Comments: 10 pages; To appear in ACL 2017
    Subjects: Computation and Language (cs.CL)

    We propose a selective encoding model to extend the sequence-to-sequence
    framework for abstractive sentence summarization. It consists of a sentence
    encoder, a selective gate network, and an attention equipped decoder. The
    sentence encoder and decoder are built with recurrent neural networks. The
    selective gate network constructs a second level sentence representation by
    controlling the information flow from encoder to decoder. The second level
    representation is tailored for the sentence summarization task, which leads to
    better performance. We evaluate our model on the English Gigaword, DUC 2004 and
    MSR abstractive sentence summarization datasets. The experimental results show
    that the proposed selective encoding model outperforms the state-of-the-art
    baseline models.
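
    The gist of the selective gate can be sketched in a few lines of numpy. This is
    a minimal sketch under our own assumptions about dimensions, parameter names,
    and the sigmoid gating form; it is not the authors' exact parametrization:

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        rng = np.random.default_rng(0)
        T, d = 6, 8                        # sequence length, hidden size
        h = rng.standard_normal((T, d))    # first-level encoder states h_1..h_T
        s = np.concatenate([h[-1], h[0]])  # sentence vector (e.g., last/first states)

        # Hypothetical gate parameters; the paper learns these jointly.
        W = 0.1 * rng.standard_normal((d, d))
        U = 0.1 * rng.standard_normal((d, 2 * d))

        gate = sigmoid(h @ W.T + s @ U.T)  # one gate vector per encoder position
        h2 = h * gate                      # second-level representation for decoder
        print(h2.shape)                    # (6, 8)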

    Using Global Constraints and Reranking to Improve Cognates Detection

    Michael Bloodgood, Benjamin Strauss
    Comments: 10 pages, 6 figures, 6 tables; to appear in the Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada, 2017
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    Global constraints and reranking have not been used in cognates detection
    research to date. We propose methods for using global constraints by performing
    rescoring of the score matrices produced by state of the art cognates detection
    systems. Using global constraints to perform rescoring is complementary to
    state of the art methods for performing cognates detection and results in
    significant performance improvements beyond current state of the art
    performance on publicly available datasets with different language pairs and
    various conditions such as different levels of baseline state of the art
    performance and different data size conditions, including with more realistic
    large data size conditions than have been evaluated with in the past.

    Fast and Accurate Neural Word Segmentation for Chinese

    Deng Cai, Hai Zhao, Zhisong Zhang, Yuan Xin, Yongjian Wu, Feiyue Huang
    Comments: To appear in ACL2017
    Subjects: Computation and Language (cs.CL)

    Neural models with minimal feature engineering have achieved competitive
    performance against traditional methods for the task of Chinese word
    segmentation. However, both training and working procedures of the current
    neural models are computationally inefficient. This paper presents a greedy
    neural word segmenter with balanced word and character embedding inputs to
    alleviate the existing drawbacks. Our segmenter is truly end-to-end, capable of
    performing segmentation much faster and even more accurately than
    state-of-the-art neural models on Chinese benchmark datasets.

    Learning to Create and Reuse Words in Open-Vocabulary Neural Language Modeling

    Kazuya Kawakami, Chris Dyer, Phil Blunsom
    Comments: ACL 2017
    Subjects: Computation and Language (cs.CL)

    Fixed-vocabulary language models fail to account for one of the most
    characteristic statistical facts of natural language: the frequent creation and
    reuse of new word types. Although character-level language models offer a
    partial solution in that they can create word types not attested in the
    training corpus, they do not capture the “bursty” distribution of such words.
    In this paper, we augment a hierarchical LSTM language model that generates
    sequences of word tokens character by character with a caching mechanism that
    learns to reuse previously generated words. To validate our model we construct
    a new open-vocabulary language modeling corpus (the Multilingual Wikipedia
    Corpus, MWC) from comparable Wikipedia articles in 7 typologically diverse
    languages and demonstrate the effectiveness of our model across this range of
    languages.

    Differentiable Scheduled Sampling for Credit Assignment

    Kartik Goyal, Chris Dyer, Taylor Berg-Kirkpatrick
    Comments: Accepted at ACL2017 (this http URL)
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We demonstrate that a continuous relaxation of the argmax operation can be
    used to create a differentiable approximation to greedy decoding for
    sequence-to-sequence (seq2seq) models. By incorporating this approximation into
    the scheduled sampling training procedure (Bengio et al., 2015)–a well-known
    technique for correcting exposure bias–we introduce a new training objective
    that is continuous and differentiable everywhere and that can provide
    informative gradients near points where previous decoding decisions change
    their value. In addition, by using a related approximation, we demonstrate a
    similar approach to sampled-based training. Finally, we show that our approach
    outperforms cross-entropy training and scheduled sampling procedures in two
    sequence prediction tasks: named entity recognition and machine translation.
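
    The relaxation at the heart of the method is compact enough to sketch in
    numpy. The temperature alpha and the embedding-mixing form below are our
    assumptions; the point is only that a peaked softmax approaches the hard
    argmax while remaining differentiable:

        import numpy as np

        def soft_argmax_embedding(logits, emb, alpha=10.0):
            # Instead of embedding the argmax word, feed a peaked-softmax
            # mixture of all embeddings; differentiable for any finite alpha.
            p = np.exp(alpha * (logits - logits.max()))
            p /= p.sum()
            return p @ emb  # convex combination of word embeddings

        rng = np.random.default_rng(1)
        vocab, dim = 5, 4
        emb = rng.standard_normal((vocab, dim))
        logits = rng.standard_normal(vocab)

        soft = soft_argmax_embedding(logits, emb, alpha=10.0)
        hard = emb[np.argmax(logits)]
        print(np.abs(soft - hard).max())  # small for large alpha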

    Translating Neuralese

    Jacob Andreas, Anca Dragan, Dan Klein
    Comments: To appear in ACL 2017
    Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    Several approaches have recently been proposed for learning decentralized
    deep multiagent policies that coordinate via a differentiable communication
    channel. While these policies are effective for many tasks, interpretation of
    their induced communication strategies has remained a challenge. Here we
    propose to interpret agents’ messages by translating them. Unlike in typical
    machine translation problems, we have no parallel data to learn from. Instead
    we develop a translation model based on the insight that agent messages and
    natural language strings mean the same thing if they induce the same belief
    about the world in a listener. We present theoretical guarantees and empirical
    evidence that our approach preserves both the semantics and pragmatics of
    messages by ensuring that players communicating through a translation layer do
    not suffer a substantial loss in reward relative to players with a common
    language.

    Naturalizing a Programming Language via Interactive Learning

    Sida I. Wang, Samuel Ginn, Percy Liang, Christoper D. Manning
    Comments: 10 pages, ACL2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Learning (cs.LG)

    Our goal is to create a convenient natural language interface for performing
    well-specified but complex actions such as analyzing data, manipulating text,
    and querying databases. However, existing natural language interfaces for such
    tasks are quite primitive compared to the power one wields with a programming
    language. To bridge this gap, we start with a core programming language and
    allow users to “naturalize” the core language incrementally by defining
    alternative, more natural syntax and increasingly complex concepts in terms of
    compositions of simpler ones. In a voxel world, we show that a community of
    users can simultaneously teach a common system a diverse language and use it to
    build hundreds of complex voxel structures. Over the course of three days,
    these users went from using only the core language to using the naturalized
    language in 85.9% of the last 10K utterances.

    A* CCG Parsing with a Supertag and Dependency Factored Model

    Masashi Yoshikawa, Hiroshi Noji, Yuji Matsumoto
    Comments: long paper (11 pages) accepted to ACL 2017
    Subjects: Computation and Language (cs.CL)

    We propose a new A* CCG parsing model in which the probability of a tree is
    decomposed into factors of CCG categories and its syntactic dependencies both
    defined on bi-directional LSTMs. Our factored model allows the precomputation
    of all probabilities and runs very efficiently, while modeling sentence
    structures explicitly via dependencies. Our model achieves the state-of-the-art
    results on English and Japanese CCG parsing.

    Adversarial Neural Machine Translation

    Lijun Wu, Yingce Xia, Li Zhao, Fei Tian, Tao Qin, Jianhuang Lai, Tie-Yan Liu
    Comments: In submission to IJCAI’17
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    In this paper, we study a new learning paradigm for Neural Machine
    Translation (NMT). Instead of maximizing the likelihood of the human
    translation as in previous works, we minimize the distinction between human
    translation and the translation given by a NMT model. To achieve this goal,
    inspired by the recent success of generative adversarial networks (GANs), we
    employ an adversarial training architecture and name it Adversarial-NMT. In
    Adversarial-NMT, the training of the NMT model is assisted by an adversary,
    which is an elaborately designed Convolutional Neural Network (CNN). The goal
    of the adversary is to differentiate the translation result generated by the
    NMT model from that of a human. The goal of the NMT model is to produce high
    quality translations so as to cheat the adversary. A policy gradient method is
    leveraged to co-train the NMT model and the adversary. Experimental results on
    English(rightarrow)French and German(rightarrow)English translation tasks
    show that Adversarial-NMT can achieve significantly better translation quality
    than several strong baselines.

    Neural Machine Translation via Binary Code Prediction

    Yusuke Oda, Philip Arthur, Graham Neubig, Koichiro Yoshino, Satoshi Nakamura
    Comments: Accepted as a long paper at ACL2017
    Subjects: Computation and Language (cs.CL)

    In this paper, we propose a new method for calculating the output layer in
    neural machine translation systems. The method is based on predicting a binary
    code for each word and can reduce computation time/memory requirements of the
    output layer to be logarithmic in vocabulary size in the best case. In
    addition, we also introduce two advanced approaches to improve the robustness
    of the proposed model: using error-correcting codes and combining softmax and
    binary codes. Experiments on two English-Japanese bidirectional translation
    tasks show that the proposed models achieve BLEU scores that approach those of
    the softmax, while reducing memory usage to less than 1/10 and improving
    decoding speed on CPUs by 5x to 10x.
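
    A stripped-down sketch of the binary-code output idea, in plain numpy with a
    naive id-to-bits coding (the paper's actual codes, error correction, and
    softmax/binary hybrid are more elaborate):

        import numpy as np

        def id_to_bits(word_id, n_bits):
            return np.array([(word_id >> i) & 1 for i in range(n_bits)])

        def bits_to_id(bits):
            return int(sum(int(b) << i for i, b in enumerate(bits)))

        vocab_size = 10000
        n_bits = int(np.ceil(np.log2(vocab_size)))  # 14 bits vs. 10000 logits
        assert bits_to_id(id_to_bits(4242, n_bits)) == 4242  # round-trip coding

        rng = np.random.default_rng(2)
        hidden = rng.standard_normal(256)
        W = 0.05 * rng.standard_normal((n_bits, 256))  # hypothetical output weights

        # Predict each bit independently with a sigmoid, then round to decode.
        bit_probs = 1.0 / (1.0 + np.exp(-(W @ hidden)))
        print(n_bits, bits_to_id(bit_probs > 0.5))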

    Learning weakly supervised multimodal phoneme embeddings

    Rahma Chaabouni, Ewan Dunbar, Neil Zeghidour, Emmanuel Dupoux
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Recent works have explored deep architectures for learning multimodal speech
    representation (e.g. audio and images, articulation and audio) in a supervised
    way. Here we investigate the role of combining different speech modalities,
    i.e. audio and visual information representing the lips movements, in a weakly
    supervised way using Siamese networks and lexical same-different side
    information. In particular, we ask whether one modality can benefit from the
    other to provide a richer representation for phone recognition in a weakly
    supervised setting. We introduce mono-task and multi-task methods for merging
    speech and visual modalities for phone recognition. The mono-task learning
    consists in applying a Siamese network on the concatenation of the two
    modalities, while the multi-task learning receives several different
    combinations of modalities at train time. We show that multi-task learning
    enhances discriminability for visual and multimodal inputs while minimally
    impacting auditory inputs. Furthermore, we present a qualitative analysis of
    the obtained phone embeddings, and show that cross-modal visual input can
    improve the discriminability of phonological features which are visually
    discernable (rounding, open/close, labial place of articulation), resulting in
    representations that are closer to abstract linguistic features than those
    based on audio only.

    Deep Keyphrase Generation

    Rui Meng, Sanqiang Zhao, Shuguang Han, Daqing He, Peter Brusilovsky, Yu Chi
    Comments: 11 pages. Accepted by ACL2017
    Subjects: Computation and Language (cs.CL)

    Keyphrase provides highly-summative information that can be effectively used
    for understanding, organizing and retrieving text content. Though previous
    studies have provided many workable solutions for automated keyphrase
    extraction, they commonly divided the to-be-summarized content into multiple
    text chunks, then ranked and selected the most meaningful ones. These
    approaches could neither identify keyphrases that do not appear in the text,
    nor capture the real semantic meaning behind the text. We propose a generative
    model for keyphrase prediction with an encoder-decoder framework, which can
    effectively overcome the above drawbacks. We name it deep keyphrase
    generation since it attempts to capture the deep semantic meaning of the
    content with a deep learning method. Empirical analysis on six datasets
    demonstrates that our proposed model not only achieves a significant
    performance boost on extracting keyphrases that appear in the source text, but
    also can generate absent keyphrases based on the semantic meaning of the text.
    Code and dataset are available at this https URL

    Learning to Skim Text

    Adams Wei Yu, Hongrae Lee, Quoc V. Le
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Recurrent Neural Networks are showing much promise in many sub-areas of
    natural language processing, ranging from document classification to machine
    translation to automatic question answering. Despite their promise, many
    recurrent models have to read the whole text word by word, making it slow to
    handle long documents. For example, it is difficult to use a recurrent network
    to read a book and answer questions about it. In this paper, we present an
    approach to reading text while skipping irrelevant information if needed. The
    underlying model is a recurrent network that learns how far to jump after
    reading a few words of the input text. We employ a standard policy gradient
    method to train the model to make discrete jumping decisions. In our benchmarks
    on four different tasks, including number prediction, sentiment analysis, news
    article classification and automatic Q&A, our proposed model, a modified LSTM
    with jumping, is up to 6 times faster than the standard sequential LSTM, while
    maintaining the same or even better accuracy.
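
    The jump-reading loop can be caricatured as follows. The uniform jump
    distribution below is a stand-in for the learned policy, which the paper
    trains with policy gradients:

        import numpy as np

        def skim(tokens, read_size=2, max_jump=3, seed=0):
            # Read a small chunk, then sample how far to jump ahead.
            rng = np.random.default_rng(seed)
            jump_probs = np.ones(max_jump + 1) / (max_jump + 1)  # stand-in policy
            i, read = 0, []
            while i < len(tokens):
                read.extend(tokens[i:i + read_size])  # the LSTM would consume this
                i += read_size
                i += rng.choice(max_jump + 1, p=jump_probs)  # jump 0..max_jump
            return read

        text = ["w%d" % j for j in range(20)]
        print(skim(text))  # reads only a subset of the 20 tokens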

    Argument Mining with Structured SVMs and RNNs

    Vlad Niculae, Joonsuk Park, Claire Cardie
    Comments: Accepted for publication at ACL 2017. 11 pages, 5 figures. Code at this https URL and data at this http URL
    Subjects: Computation and Language (cs.CL)

    We propose a novel factor graph model for argument mining, designed for
    settings in which the argumentative relations in a document do not necessarily
    form a tree structure. (This is the case in over 20% of the web comments
    dataset we release.) Our model jointly learns elementary unit type
    classification and argumentative relation prediction. Moreover, our model
    supports SVM and RNN parametrizations, can enforce structure constraints (e.g.,
    transitivity), and can express dependencies between adjacent relations and
    propositions. Our approaches outperform unstructured baselines in both web
    comments and argumentative essay datasets.

    Deep Multitask Learning for Semantic Dependency Parsing

    Hao Peng, Sam Thomson, Noah A. Smith
    Comments: Proceedings of ACL 2017
    Subjects: Computation and Language (cs.CL)

    We present a deep neural architecture that parses sentences into three
    semantic dependency graph formalisms. By using efficient, nearly arc-factored
    inference and a bidirectional-LSTM composed with a multi-layer perceptron, our
    base system is able to significantly improve the state of the art for semantic
    dependency parsing, without using hand-engineered features or syntax. We then
    explore two multitask learning approaches—one that shares parameters across
    formalisms, and one that uses higher-order structures to predict the graphs
    jointly. We find that both approaches improve performance across formalisms on
    average, achieving a new state of the art. Our code is open-source and
    available at this https URL

    Affect-LM: A Neural Language Model for Customizable Affective Text Generation

    Sayan Ghosh, Mathieu Chollet, Eugene Laksana, Louis-Philippe Morency, Stefan Scherer
    Subjects: Computation and Language (cs.CL)

    Human verbal communication includes affective messages which are conveyed
    through the use of emotionally colored words. There has been a lot of research in
    this direction but the problem of integrating state-of-the-art neural language
    models with affective information remains an area ripe for exploration. In this
    paper, we propose an extension to an LSTM (Long Short-Term Memory) language
    model for generating conversational text, conditioned on affect categories. Our
    proposed model, Affect-LM, enables us to customize the degree of emotional
    content in generated sentences through an additional design parameter.
    Perception studies conducted using Amazon Mechanical Turk show that Affect-LM
    generates natural-looking emotional sentences without sacrificing grammatical
    correctness. Affect-LM also learns affect-discriminative word representations,
    and perplexity experiments show that additional affective information in
    conversational text can improve language model prediction.
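
    A toy rendering of the conditioning idea, under our own simplification that an
    affect-specific score is added to the base language model logits with a
    strength parameter beta (all names below are hypothetical):

        import numpy as np

        def affect_lm_logits(context_logits, affect_scores, beta):
            # Bias the next-word distribution toward affect-bearing words;
            # beta is the design parameter controlling emotional strength.
            return context_logits + beta * affect_scores

        rng = np.random.default_rng(3)
        context_logits = rng.standard_normal(6)           # from the base LSTM LM
        affect_scores = np.array([0, 0, 2.0, 0, 1.5, 0])  # e.g., "joy"-colored words

        for beta in (0.0, 1.0, 3.0):
            logits = affect_lm_logits(context_logits, affect_scores, beta)
            p = np.exp(logits - logits.max())
            print(beta, np.round(p / p.sum(), 2))  # mass shifts to affective words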

    Medical Text Classification using Convolutional Neural Networks

    Mark Hughes, Irene Li, Spyros Kotoulas, Toyotaro Suzumura
    Subjects: Computation and Language (cs.CL)

    We present an approach to automatically classify clinical text at a sentence
    level. We use deep convolutional neural networks to represent complex
    features. We train the network on a dataset providing a broad categorization of
    health information. Through a detailed evaluation, we demonstrate that our
    method outperforms several approaches widely used in natural language
    processing tasks by about 15%.

    Sarcasm SIGN: Interpreting Sarcasm with Sentiment Based Monolingual Machine Translation

    Lotem Peled, Roi Reichart
    Subjects: Computation and Language (cs.CL)

    Sarcasm is a form of speech in which speakers say the opposite of what they
    truly mean in order to convey a strong sentiment. In other words, “Sarcasm is
    the giant chasm between what I say, and the person who doesn’t get it.” In
    this paper we present the novel task of sarcasm interpretation, defined as the
    generation of a non-sarcastic utterance conveying the same message as the
    original sarcastic one. We introduce a novel dataset of 3000 sarcastic tweets,
    each interpreted by five human judges. Addressing the task as monolingual
    machine translation (MT), we experiment with MT algorithms and evaluation
    measures. We then present SIGN: an MT based sarcasm interpretation algorithm
    that targets sentiment words, a defining element of textual sarcasm. We show
    that while the scores of n-gram based automatic measures are similar for all
    interpretation models, SIGN’s interpretations are scored higher by humans for
    adequacy and sentiment polarity. We conclude with a discussion on future
    research directions for our new task.

    Lexical Features in Coreference Resolution: To be Used With Caution

    Nafise Sadat Moosavi, Michael Strube
    Comments: 6 pages, ACL 2017
    Subjects: Computation and Language (cs.CL)

    Lexical features are a major source of information in state-of-the-art
    coreference resolvers. Lexical features implicitly model some of the linguistic
    phenomena at a fine granularity level. They are especially useful for
    representing the context of mentions. In this paper we investigate a drawback
    of using many lexical features in state-of-the-art coreference resolvers. We
    show that if coreference resolvers mainly rely on lexical features, they can
    hardly generalize to unseen domains. Furthermore, we show that the current
    coreference resolution evaluation is clearly flawed by only evaluating on a
    specific split of a specific dataset in which there is a notable overlap
    between the training, development and test sets.

    Improving Semantic Composition with Offset Inference

    Thomas Kober, Julie Weeds, Jeremy Reffin, David Weir
    Comments: to appear at ACL 2017 (short papers)
    Subjects: Computation and Language (cs.CL)

    Count-based distributional semantic models suffer from sparsity due to
    unobserved but plausible co-occurrences in any text collection. This problem is
    amplified for models like Anchored Packed Trees (APTs), that take the
    grammatical type of a co-occurrence into account. We therefore introduce a
    novel form of distributional inference that exploits the rich type structure in
    APTs and infers missing data by the same mechanism that is used for semantic
    composition.


    Distributed, Parallel, and Cluster Computing

    Beeping a Maximal Independent Set Fast

    Stephan Holzer, Nancy Lynch
    Comments: Full version of a brief announcement with the same title that appeared at the 30th International Symposium on Distributed Computing (DISC), Paris, France, September 2016. 17 pages
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    We adapt a recent algorithm by Ghaffari [SODA’16] for computing a Maximal
    Independent Set in the LOCAL model, so that it works in the significantly
    weaker BEEP model. For networks with maximum degree (Delta), our algorithm
    terminates locally within time (O((log Delta + log (1/epsilon)) cdot
    log(1/epsilon))), with probability at least (1 – epsilon). The key idea of
    the modification is to replace explicit messages about transmission
    probabilities with estimates based on the number of received messages.

    After the successful introduction (and implicit use) of local analysis, e.g.,
    by Barenboim et al. [JACM’16], Chung et al. [PODC’14], Ghaffari [SODA’16], and
    Halldorsson et al. [PODC’15], we study this concept in the BEEP model for the
    first time.

    By doing so, we improve over local bounds that are implicitly derived from
    previous work (that uses traditional global analysis) on computing a Maximal
    Independent Set in the BEEP model for a large range of values of the parameter
    (Delta). At the same time, we show that our algorithm in the BEEP model only
    needs to pay a (log(1/epsilon)) factor in the runtime compared to the best
    known MIS algorithm in the much more powerful LOCAL model. We demonstrate that
    this overhead is negligible, as communication via beeps can be implemented
    using significantly fewer resources than communication in the LOCAL model. In
    particular, when looking at implementing these models, one round of the LOCAL
    model needs at least (O(Delta)) time units, while one round in the BEEP model
    needs (O(log Delta)) time units, an improvement that diminishes the loss of a
    (log(1/epsilon)) factor in most settings.

    Exploring compression techniques for ROOT IO

    Zhe Zhang, Brian Bockelman
    Comments: Proceedings for 22nd International Conference on Computing in High Energy and Nuclear Physics (CHEP 2016)
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    ROOT provides a flexible format used throughout the HEP community. The
    number of use cases – from an archival data format to end-stage analysis – has
    required a number of tradeoffs to be exposed to the user. For example, a high
    “compression level” in the traditional DEFLATE algorithm will result in a
    smaller file (saving disk space) at the cost of slower decompression (costing
    CPU time when read). At the scale of the LHC experiment, poor design choices
    can result in terabytes of wasted space or wasted CPU time. We explore and
    attempt to quantify some of these tradeoffs. Specifically, we explore: the use
    of alternate compressing algorithms to optimize for read performance; an
    alternate method of compressing individual events to allow efficient random
    access; and a new approach to whole-file compression. Quantitative results are
    given, as well as guidance on how to make compression decisions for different
    use cases.

    Extreme-Scale Block-Structured Adaptive Mesh Refinement

    Florian Schornbaum, Ulrich Rüde
    Comments: 38 pages, 17 figures, 11 tables
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    In this article, we present a novel approach for block-structured adaptive
    mesh refinement (AMR) that is suitable for extreme-scale parallelism. All data
    structures are designed such that the size of the meta data in each distributed
    processor memory remains bounded independent of the processor number. In all
    stages of the AMR process, we use only distributed algorithms. No central
    resources such as a master process or replicated data are employed, so that
    unlimited scalability can be achieved. For the dynamic load balancing in
    particular, we propose to exploit the hierarchical nature of the
    block-structured domain partitioning by creating a lightweight, temporary copy
    of the core data structure. This copy acts as a local and fully distributed
    proxy data structure. It does not contain simulation data, but only provides
    topological information about the domain partitioning into blocks. Ultimately,
    this approach enables an inexpensive, local, diffusion-based dynamic load
    balancing scheme.

    We demonstrate the excellent performance and the full scalability of our new
    AMR implementation for two architecturally different petascale supercomputers.
    Benchmarks on an IBM Blue Gene/Q system with a mesh containing 3.7 trillion
    unknowns distributed to 458,752 processes confirm the applicability for future
    extreme-scale parallel machines. The algorithms proposed in this article
    operate on blocks that result from the domain partitioning. This concept and
    its realization support the storage of arbitrary data. In consequence, the
    software framework can be used for different simulation methods, including mesh
    based and meshless methods. In this article, we demonstrate fluid simulations
    based on the lattice Boltzmann method.

    Towards Distributed Machine Learning in Shared Clusters: A Dynamically-Partitioned Approach

    Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong, Shengen Yan
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Many cluster management systems (CMSs) have been proposed to share a single
    cluster with multiple distributed computing systems. However, none of the
    existing approaches can handle distributed machine learning (ML) workloads
    given the following criteria: high resource utilization, fair resource
    allocation and low sharing overhead. To solve this problem, we propose a new
    CMS named Dorm, incorporating a dynamically-partitioned cluster management
    mechanism and a utilization-fairness optimizer. Specifically, Dorm uses the
    container-based virtualization technique to partition a cluster, runs one
    application per partition, and can dynamically resize each partition at
    application runtime for resource efficiency and fairness. Each application
    directly launches its tasks on the assigned partition without petitioning for
    resources frequently, so Dorm imposes flat sharing overhead. Extensive
    performance evaluations showed that Dorm could simultaneously increase the
    resource utilization by a factor of up to 2.32, reduce the fairness loss by a
    factor of up to 1.52, and speed up popular distributed ML applications by a
    factor of up to 2.72, compared to existing approaches. Dorm’s sharing overhead
    is less than 5% in most cases.

    Scaling Reliably: Improving the Scalability of the Erlang Distributed Actor Platform

    Phil Trinder, Natalia Chechina, Nikolaos Papaspyrou, Konstantinos Sagonas, Simon Thompson, Stephen Adams, Stavros Aronis, Robert Baker, Eva Bihari, Olivier Boudeville, Francesco Cesarini, Maurizio Di Stefano, Sverker Eriksson, Viktoria Fordos, Amir Ghaffari, Aggelos Giantsios, Rickard Green, Csaba Hoch, David Klaftenegger, Huiqing Li, Kenneth Lundin, Kenneth Mackenzie, Katerina Roukounaki, Yiannis Tsiouris, Kjell Winblad
    Subjects: Programming Languages (cs.PL); Distributed, Parallel, and Cluster Computing (cs.DC)

    Distributed actor languages are an effective means of constructing scalable
    reliable systems, and the Erlang programming language has a well-established
    and influential model. While the Erlang model conceptually provides reliable
    scalability, it has some inherent scalability limits and these force developers
    to depart from the model at scale. This article establishes the scalability
    limits of Erlang systems, and reports the work to improve the language
    scalability.

    We systematically study the scalability limits of Erlang and address the
    issues at the virtual machine (VM), language, and tool levels. More
    specifically: (1) We have evolved the Erlang VM so that it can work effectively
    in large scale single-host multicore and NUMA architectures. We have made
    important architectural improvements to the Erlang/OTP. (2) We have designed
    and implemented Scalable Distributed (SD) Erlang libraries to address
    language-level scalability issues, and provided and validated a set of
    semantics for the new language constructs. (3) To make large Erlang systems
    easier to deploy, monitor, and debug we have developed and made open source
    releases of five complementary tools, some specific to SD Erlang.

    Throughout the article we use two case studies to investigate the
    capabilities of our new technologies and tools: a distributed hash table based
    Orbit calculation and Ant Colony Optimisation (ACO). Chaos Monkey experiments
    show that two versions of ACO survive random process failure and hence that SD
    Erlang preserves the Erlang reliability model. Even for programs with no global
    recovery data to maintain, SD Erlang partitions the network to reduce network
    traffic and hence improves performance of the Orbit and ACO benchmarks above 80
    hosts. ACO measurements show that maintaining global recovery data dramatically
    limits scalability; however scalability is recovered by partitioning the
    recovery data.

    Dependent Session Types

    Hanwen Wu, Hongwei Xi
    Subjects: Programming Languages (cs.PL); Distributed, Parallel, and Cluster Computing (cs.DC)

    Session types offer a type-based discipline for enforcing communication
    protocols in distributed programming. We have previously formalized simple
    session types in the setting of multi-threaded (lambda)-calculus with linear
    types. In this work, we build upon our earlier work by presenting a form of
    dependent session types (of DML-style). The type system we formulate provides
    linearity and duality guarantees with no need for any runtime checks or special
    encodings. Our formulation of dependent session types is the first of its kind,
    and it is particularly suitable for practical implementation. As an example, we
    describe one implementation written in ATS that compiles to an Erlang/Elixir
    back-end.

    Proactive Edge Computing in Latency-Constrained Fog Networks

    Mohammed S. Elbamby, Mehdi Bennis, Walid Saad
    Comments: 6 pages, 5 figures, accepted in EuCNC 2017
    Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)

    In this paper, the fundamental problem of distribution and proactive caching
    of computing tasks in fog networks is studied under latency and reliability
    constraints. In the proposed scenario, computing can be executed either locally
    at the user device or offloaded to an edge cloudlet. Moreover, cloudlets
    exploit both their computing and storage capabilities by proactively caching
    popular task computation results to minimize computing latency. To this end, a
    clustering method to group spatially proximate user devices with mutual task
    popularity interests and their serving cloudlets is proposed. Then, cloudlets
    can proactively cache the popular tasks’ computations of their cluster members
    to minimize computing latency. Additionally, the problem of distributing tasks
    to cloudlets is formulated as a matching game in which a cost function of
    computing delay is minimized under latency and reliability constraints.
    Simulation results show that the proposed scheme guarantees reliable
    computations with bounded latency and achieves up to 91% decrease in computing
    latency as compared to baseline schemes.

    Complexity Analysis of the Parallel Guided Ejection Search for the Pickup and Delivery Problem with Time Windows

    Miroslaw Blocho, Jakub Nalepa
    Comments: 4 pages, presented at the Work in Progress Session at PDP 2017
    Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

    This paper presents the pessimistic time complexity analysis of the parallel
    algorithm for minimizing the fleet size in the pickup and delivery problem with
    time windows. We show how to estimate the pessimistic complexity step by step.
    This approach can be easily adapted to other parallel algorithms for solving
    complex transportation problems.


    Learning

    An Aposteriorical Clusterability Criterion for (k)-Means and Simplicity of Clustering

    Mieczysław A. Kłopotek
    Comments: 37 pages
    Subjects: Learning (cs.LG)

    We define the notion of a well-clusterable data set from the point of view of
    the objective of the (k)-means clustering algorithm and common sense. The novelty
    introduced here is that one can a posteriori (after running (k)-means) check if
    the data set is well-clusterable or not.

    k-FFNN: A priori knowledge infused Feed-forward Neural Networks

    Sri Harsha Dumpala, Rupayan Chakraborty, Sunil Kumar Kopparapu
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Recurrent neural networks (RNN) are being extensively used over feed-forward
    neural networks (FFNN) because of their inherent capability to capture temporal
    relationships that exist in sequential data such as speech. This aspect of
    RNN is advantageous especially when there is no a priori knowledge about the
    temporal correlations within the data. However, RNNs require a large amount of
    data to learn these temporal correlations, limiting their advantage in low
    resource scenarios. It is not immediately clear (a) how a priori temporal
    knowledge can be used in a FFNN architecture (b) how a FFNN performs when
    provided with this knowledge about temporal correlations (assuming available)
    during training. The objective of this paper is to explore k-FFNN, namely a
    FFNN architecture that can incorporate the a priori knowledge of the temporal
    relationships within the data sequence during training and compare k-FFNN
    performance with RNN in a low resource scenario. We evaluate the performance of
    k-FFNN and RNN by extensive experimentation on MediaEval 2016 audio data
    (“Emotional Impact of Movies” task). Experimental results show that the
    performance of k-FFNN is comparable to RNN, and in some scenarios k-FFNN
    performs better than RNN when temporal knowledge is injected into the FFNN
    architecture. The main contributions of this paper are (a) fusing a priori
    knowledge into FFNN architecture to construct a k-FFNN and (b) analyzing the
    performance of k-FFNN with respect to RNN for different size of training data.

    Probabilistic Vehicle Trajectory Prediction over Occupancy Grid Map via Recurrent Neural Network

    ByeoungDo Kim, Chang Mook Kang, Seung Hi Lee, Hyunmin Chae, Jaekyum Kim, Chung Choo Chung, Jun Won Choi
    Subjects: Learning (cs.LG)

    In this paper, we propose an efficient vehicle trajectory prediction
    framework based on recurrent neural network. Basically, the characteristic of
    the vehicle’s trajectory is different from that of regular moving objects since
    it is affected by various latent factors including road structure, traffic
    rules, and driver’s intention. Previous state-of-the-art approaches use
    sophisticated vehicle behavior model describing these factors and derive the
    complex trajectory prediction algorithm, which requires a system designer to
    conduct intensive model optimization for practical use. Our approach is
    data-driven and simple to use in that it learns complex behavior of the
    vehicles from the massive amount of trajectory data through deep neural network
    model. The proposed trajectory prediction method employs the recurrent neural
    network called long short-term memory (LSTM) to analyze the temporal behavior
    and predict the future coordinate of the surrounding vehicles. The proposed
    scheme feeds the sequence of vehicles’ coordinates obtained from sensor
    measurements to the LSTM and produces the probabilistic information on the
    future location of the vehicles over occupancy grid map. The experiments
    conducted using the data collected from highway driving show that the proposed
    method can produce a reasonably good estimate of the future trajectory.
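
    A minimal PyTorch sketch of this pipeline; the layer sizes, grid resolution,
    and single-layer setup are our assumptions:

        import torch
        import torch.nn as nn

        class TrajectoryLSTM(nn.Module):
            # Consume past (x, y) coordinates; emit a distribution over
            # occupancy-grid cells for the next position.
            def __init__(self, grid_h=10, grid_w=10, hidden=64):
                super().__init__()
                self.lstm = nn.LSTM(input_size=2, hidden_size=hidden,
                                    batch_first=True)
                self.head = nn.Linear(hidden, grid_h * grid_w)

            def forward(self, coords):          # coords: (batch, time, 2)
                out, _ = self.lstm(coords)
                logits = self.head(out[:, -1])  # use the last time step
                return torch.softmax(logits, dim=-1)

        model = TrajectoryLSTM()
        past = torch.randn(4, 15, 2)  # 4 vehicles, 15 past coordinate samples
        print(model(past).shape)      # torch.Size([4, 100])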

    Misspecified Linear Bandits

    Avishek Ghosh, Sayak Ray Chowdhury, Aditya Gopalan
    Comments: Thirty-First AAAI Conference on Artificial Intelligence, 2017
    Subjects: Learning (cs.LG)

    We consider the problem of online learning in misspecified linear stochastic
    multi-armed bandit problems. Regret guarantees for state-of-the-art linear
    bandit algorithms such as Optimism in the Face of Uncertainty Linear bandit
    (OFUL) hold under the assumption that the arms' expected rewards are perfectly
    linear in their features. It is, however, of interest to investigate the impact
    of potential misspecification in linear bandit models, where the expected
    rewards are perturbed away from the linear subspace determined by the arms'
    features. Although OFUL has recently been shown to be robust to relatively
    small deviations from linearity, we show that any linear bandit algorithm that
    enjoys optimal regret performance in the perfectly linear setting (e.g., OFUL)
    must suffer linear regret under a sparse additive perturbation of the linear
    model. In an attempt to overcome this negative result, we define a natural
    class of bandit models characterized by a non-sparse deviation from linearity.
    We argue that the OFUL algorithm can fail to achieve sublinear regret even
    under models that have non-sparse deviation. We finally develop a novel bandit
    algorithm, comprising a hypothesis test for linearity followed by a decision to
    use either the OFUL or Upper Confidence Bound (UCB) algorithm. For perfectly
    linear bandit models, the algorithm provably exhibits OFUL's favorable regret
    performance, while for misspecified models satisfying the non-sparse deviation
    property, the algorithm avoids the linear regret phenomenon and falls back on
    UCB's sublinear regret scaling. Numerical experiments on synthetic data, and on
    recommendation data from the public Yahoo! Learning to Rank Challenge dataset,
    empirically support our findings.
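
    The test-then-select wrapper can be sketched abstractly in numpy; the
    residual-based linearity check and its threshold below are crude stand-ins for
    the paper's actual test statistic:

        import numpy as np

        def looks_linear(features, rewards, threshold=0.05):
            # Crude check: fit least squares, inspect the mean squared residual.
            coef, _, _, _ = np.linalg.lstsq(features, rewards, rcond=None)
            resid = rewards - features @ coef
            return (resid ** 2).mean() < threshold

        rng = np.random.default_rng(9)
        X = rng.standard_normal((500, 3))
        theta = np.array([1.0, -2.0, 0.5])

        linear_r = X @ theta + 0.1 * rng.standard_normal(500)
        warped_r = linear_r + np.sin(3 * X[:, 0])  # a non-sparse deviation

        for r in (linear_r, warped_r):
            print("OFUL-style" if looks_linear(X, r) else "plain UCB")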

    Testing from One Sample: Is the casino really using a riffle shuffle?

    Constantinos Daskalakis, Nishanth Dikkala, Nick Gravin
    Comments: 35 pages, 5 figures
    Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS)

    Classical distribution testing assumes access to i.i.d. samples from the
    distributions that are being tested. We initiate the study of Markov chain
    testing, assuming access to a single sample from the Markov Chains that are
    being tested. In particular, we get to observe a single trajectory X_0, …, X_t, …
    of an unknown Markov Chain M, for which we do not even get to control the
    distribution of the starting state X_0. Our goal is to test whether M is
    identical to a model Markov Chain M_0. In the first part of the paper, we
    propose a measure of difference between two Markov chains, which captures the
    scaling behavior of the total variation distance between words sampled from the
    Markov chains as the length of these words grows. We provide efficient and
    sample near-optimal testers for identity testing under our proposed measure of
    difference. In the second part of the paper, we study Markov chains whose state
    space is exponential in their description, providing testers for testing
    identity of card shuffles. We apply our results to testing the validity of the
    Gilbert, Shannon, and Reeds model for the riffle shuffle.
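
    The single-sample setting can be illustrated with a crude numpy check; the
    per-row total-variation statistic below is a far weaker measure than the
    word-distance the paper proposes:

        import numpy as np

        def empirical_transitions(traj, n_states):
            counts = np.zeros((n_states, n_states))
            for a, b in zip(traj[:-1], traj[1:]):
                counts[a, b] += 1
            rows = counts.sum(axis=1, keepdims=True)
            return np.divide(counts, rows, out=np.zeros_like(counts),
                             where=rows > 0)

        def max_row_tv(P_hat, P0):
            # Largest per-state total-variation gap, empirical vs. model.
            return 0.5 * np.abs(P_hat - P0).sum(axis=1).max()

        rng = np.random.default_rng(4)
        P0 = np.array([[0.9, 0.1], [0.2, 0.8]])  # model chain M_0

        x, traj = 0, [0]                          # one trajectory from M_0 itself
        for _ in range(5000):
            x = rng.choice(2, p=P0[x])
            traj.append(x)

        print(max_row_tv(empirical_transitions(traj, 2), P0))  # small, as expected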

    Geometric Matrix Completion with Recurrent Multi-Graph Neural Networks

    Federico Monti, Michael M. Bronstein, Xavier Bresson
    Subjects: Learning (cs.LG); Information Retrieval (cs.IR); Numerical Analysis (cs.NA); Machine Learning (stat.ML)

    Matrix completion models are among the most common formulations of
    recommender systems. Recent works have shown a boost in the performance of these
    techniques when introducing the pairwise relationships between users/items in
    the form of graphs, and imposing smoothness priors on these graphs. However,
    such techniques do not fully exploit the local stationarity structures of
    user/item graphs, and the number of parameters to learn is linear w.r.t. the
    number of users and items. We propose a novel approach to overcome these
    limitations by using geometric deep learning on graphs. Our matrix completion
    architecture combines graph convolutional neural networks and recurrent neural
    networks to learn meaningful statistical graph-structured patterns and the
    non-linear diffusion process that generates the known ratings. This neural
    network system requires a constant number of parameters independent of the
    matrix size. We apply our method on both synthetic and real datasets, showing
    that it outperforms state-of-the-art techniques.

    Risk Minimization Framework for Multiple Instance Learning from Positive and Unlabeled Bags

    Han Bao, Tomoya Sakai, Issei Sato, Masashi Sugiyama
    Subjects: Learning (cs.LG)

    Multiple instance learning (MIL) is a variation of traditional supervised
    learning problems where data (referred to as bags) are composed of sub-elements
    (referred to as instances) and only bag labels are available. MIL has a variety
    of applications such as content-based image retrieval, text categorization and
    medical diagnosis. Most of the previous work for MIL assume that the training
    bags are fully labeled. However, it is often difficult to obtain a sufficient
    number of labeled bags in practical situations, while many unlabeled bags are
    available. A learning framework called PU learning (positive and unlabeled
    learning) can address this problem. In this paper, we propose a convex PU
    learning method to solve an MIL problem. We experimentally show that the
    proposed method achieves better performance with significantly lower
    computational costs than an existing method for PU-MIL.

    Robust, Deep and Inductive Anomaly Detection

    Raghavendra Chalapathy (University of Sydney and Capital Markets Cooperative Research Centre (CMCRC)), Aditya Krishna Menon (Data61/CSIRO and the Australian National University), Sanjay Chawla (Qatar Computing Research Institute, HBKU)
    Comments: Submitted for review ECML PKDD 2017 Skopje, Macedonia 18-22 September the European Conference On Machine Learning & Principles and Practice of Knowledge Discovery
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    PCA is a classical statistical technique whose simplicity and maturity have
    seen it find widespread use as an anomaly detection technique. However, it is
    limited in this regard by being sensitive to gross perturbations of the input,
    and by seeking a linear subspace that captures normal behaviour. The first
    issue has been dealt with by robust PCA, a variant of PCA that explicitly
    allows for some data points to be arbitrarily corrupted; however, this does not
    resolve the second issue, and indeed introduces the new issue that one can no
    longer inductively find anomalies on a test set. This paper addresses both
    issues in a single model, the robust autoencoder. This method learns a
    nonlinear subspace that captures the majority of data points, while allowing
    for some data to have arbitrary corruption. The model is simple to train and
    leverages recent advances in the optimisation of deep neural networks.
    Experiments on a range of real-world datasets highlight the model’s
    effectiveness.
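
    The objective can be mimicked in numpy with a linear stand-in for the
    autoencoder (a rank-k SVD reconstruction) plus soft-thresholding for the
    sparse corruption term; this is our own simplified, robust-PCA-style sketch,
    not the paper's deep model:

        import numpy as np

        def soft_threshold(x, lam):
            return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

        def robust_fit(X, rank=2, lam=0.5, iters=20):
            # Alternate between fitting a low-complexity reconstruction of X - S
            # and absorbing large residuals into a sparse matrix S.
            S = np.zeros_like(X)
            for _ in range(iters):
                U, s, Vt = np.linalg.svd(X - S, full_matrices=False)
                L = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # "autoencoder" output
                S = soft_threshold(X - L, lam)
            return L, S

        rng = np.random.default_rng(5)
        X = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 10))  # rank-2
        X[7, 3] += 25.0                                                  # anomaly

        L, S = robust_fit(X)
        print(np.unravel_index(np.abs(S).argmax(), S.shape))  # (7, 3): found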

    Batch-Expansion Training: An Efficient Optimization Paradigm for Machine Learning

    Michal Derezinski, Dhruv Mahajan, S. Sathiya Keerthi, S. V. N. Vishwanathan, Markus Weimer
    Subjects: Learning (cs.LG)

    We propose Batch-Expansion Training (BET), a framework for running a batch
    optimizer on a gradually expanding dataset. As opposed to stochastic
    approaches, batches do not need to be resampled i.i.d. at every iteration, thus
    making BET more resource efficient in a distributed setting, and when
    disk-access is constrained. Moreover, BET can be easily paired with most batch
    optimizers, does not require any parameter-tuning, and compares favorably to
    existing stochastic and batch methods. We show that when the batch size grows
    exponentially with the number of outer iterations, BET achieves optimal
    (tilde{O}(1/epsilon)) data-access convergence rate for strongly convex
    objectives.
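
    A toy rendering of the schedule on a least-squares problem. The exponentially
    doubling batch is from the paper's analysis; the plain gradient inner loop and
    all constants are our simplifications:

        import numpy as np

        rng = np.random.default_rng(6)
        n, d = 4096, 5
        X = rng.standard_normal((n, d))
        w_true = rng.standard_normal(d)
        y = X @ w_true + 0.01 * rng.standard_normal(n)

        w = np.zeros(d)
        batch = 64
        while batch <= n:
            Xb, yb = X[:batch], y[:batch]  # expand the batch; never resample
            for _ in range(50):            # run the inner batch optimizer
                w -= 0.1 * (Xb.T @ (Xb @ w - yb) / batch)
            batch *= 2                     # batch grows exponentially per outer step
        print(np.abs(w - w_true).max())    # close to the true weights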

    Feature selection algorithm based on Catastrophe model to improve the performance of regression analysis

    Mahdi Zarei
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    In this paper we introduce a new feature selection algorithm to remove
    irrelevant or redundant features from data sets. In this algorithm the
    importance of a feature is based on how well it fits the Catastrophe model.
    The Akaike information criterion (AIC) value is used for ranking the features
    in the data set. The proposed algorithm is compared with the well-known RELIEF
    feature selection algorithm. The Breast Cancer, Parkinson Telemonitoring and
    Slice locality data sets are used to evaluate the model.
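
    A bare-bones version of AIC-based feature ranking, with a simple per-feature
    linear fit standing in for the Catastrophe-model fit (our substitution):

        import numpy as np

        def aic_linear(x, y):
            # AIC of a one-feature linear least-squares fit (k = 2 parameters).
            A = np.column_stack([x, np.ones_like(x)])
            coef, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
            rss = ((y - A @ coef) ** 2).sum()
            n, k = len(y), 2
            return n * np.log(rss / n) + 2 * k

        rng = np.random.default_rng(7)
        n = 200
        X = rng.standard_normal((n, 4))
        y = 3 * X[:, 2] + 0.1 * rng.standard_normal(n)  # only feature 2 matters

        scores = [aic_linear(X[:, j], y) for j in range(X.shape[1])]
        print(np.argsort(scores))  # feature 2 ranks first (lowest AIC)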

    A Saddle Point Approach to Structured Low-rank Matrix Learning in Large-scale Applications

    Pratik Jawanpuria, Bamdev Mishra
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We propose a novel optimization approach for learning a low-rank matrix which
    is also constrained to be in a given linear subspace. Low-rank constraints are
    regularly employed in applications such as recommender systems and multi-task
    learning. In addition, several system identification problems require a
    learning matrix with both low-rank and linear subspace constraints. We model
    the classical nuclear norm regularized formulation as an equivalent saddle
    point minimax problem. This decouples the low-rank and the linear subspace
    constraints onto separate factors. Motivated by large-scale problems, we
    reformulate the minimax problem via a rank constrained non-convex surrogate.
    This translates into an optimization problem on the Riemannian spectrahedron
    manifold. We exploit the Riemannian structure to propose efficient first- and
    second-order algorithms. The duality theory allows us to compute the duality gap
    for a candidate solution and our approach easily accommodates popular
    non-smooth loss functions, e.g., the absolute-value loss. We effortlessly scale
    on the Netflix data set on both matrix completion and robust matrix completion
    problems, obtaining state-of-the-art generalization performance. Additionally,
    we demonstrate the efficacy of our approach in Hankel matrix learning and
    multi-task learning problems.

    Joint Modeling of Text and Acoustic-Prosodic Cues for Neural Parsing

    Trang Tran, Shubham Toshniwal, Mohit Bansal, Kevin Gimpel, Karen Livescu, Mari Ostendorf
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Sound (cs.SD)

    In conversational speech, the acoustic signal provides cues that help
    listeners disambiguate difficult parses. For automatically parsing a spoken
    utterance, we introduce a model that integrates transcribed text and
    acoustic-prosodic features using a convolutional neural network over energy and
    pitch trajectories coupled with an attention-based recurrent neural network
    that accepts text and word-based prosodic features. We find that different
    types of acoustic-prosodic features are individually helpful, and together
    improve parse F1 scores significantly over a strong text-only baseline. For
    this study with known sentence boundaries, error analysis shows that the main
    benefit of acoustic-prosodic features is in sentences with disfluencies and
    that attachment errors are most improved.

    Learning from Comparisons and Choices

    Sahand Negahban, Sewoong Oh, Kiran K. Thekumparampil, Jiaming Xu
    Comments: 64 pages, 4 figures. arXiv admin note: substantial text overlap with arXiv:1506.07947
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    When tracking user-specific online activities, each user’s preference is
    revealed in the form of choices and comparisons. For example, a user’s purchase
    history tracks her choices, i.e. which item was chosen among a subset of
    offerings. A user’s comparisons are observed either explicitly as in movie
    ratings or implicitly as in viewing times of news articles. Given such
    individualized ordinal data, we address the problem of collaboratively learning
    representations of the users and the items. The learned features can be used to
    predict a user’s preference of an unseen item to be used in recommendation
    systems. This also allows one to compute similarities among users and items to
    be used for categorization and search. Motivated by the empirical successes of
    the MultiNomial Logit (MNL) model in marketing and transportation, and also
    more recent successes in word embedding and crowdsourced image embedding, we
    pose this problem as learning the MNL model parameters that best explains the
    data. We propose a convex optimization for learning the MNL model, and show
    that it is minimax optimal up to a logarithmic factor by comparing its
    performance to a fundamental lower bound. This characterizes the minimax sample
    complexity of the problem, and proves that the proposed estimator cannot be
    improved upon other than by a logarithmic factor. Further, the analysis
    identifies how the accuracy depends on the topology of sampling via the
    spectrum of the sampling graph. This provides a guideline for designing surveys
    when one can choose which items are to be compared. This is accompanied by
    numerical simulations on synthetic and real datasets confirming our theoretical
    predictions.

    Reinforcement Learning Based Dynamic Selection of Auxiliary Objectives with Preserving of the Best Found Solution

    Irina Petrova, Arina Buzdalova
    Comments: this is a full version of a paper which has been accepted as a student workshop paper to GECCO conference 2017
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Efficiency of single-objective optimization can be improved by introducing
    some auxiliary objectives. Ideally, auxiliary objectives should be helpful.
    However, in practice, objectives may be efficient on some optimization stages
    but obstructive on others. In this paper we propose a modification of the EA+RL
    method which dynamically selects optimized objectives using reinforcement
    learning. The proposed modification prevents losing the best found
    solution. We analysed the proposed modification and compared it with the EA+RL
    method and Random Local Search on XdivK, Generalized OneMax and LeadingOnes
    problems. The proposed modification outperforms the EA+RL method on all problem
    instances. It also outperforms the single-objective approach on most
    problem instances. We also provide a detailed analysis of how different
    components of the considered algorithms influence efficiency of optimization.
    In addition, we present theoretical analysis of the proposed modification on
    the XdivK problem.

    Semi-supervised Multitask Learning for Sequence Labeling

    Marek Rei
    Comments: ACL 2017
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We propose a sequence labeling framework with a secondary training objective,
    learning to predict surrounding words for every word in the dataset. This
    language modeling objective incentivises the system to learn general-purpose
    patterns of semantic and syntactic composition, which are also useful for
    improving accuracy on different sequence labeling tasks. The architecture was
    evaluated on a range of datasets, covering the tasks of error detection in
    learner texts, named entity recognition, chunking and POS-tagging. The novel
    language modeling objective provided consistent performance improvements on
    every benchmark, without requiring any additional annotated or unannotated
    data.

    A Neural Network model with Bidirectional Whitening

    Yuki Fujimoto, Toru Ohira
    Comments: 16pages
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We present here a new model and algorithm which performs an efficient Natural
    gradient descent for Multilayer Perceptrons. Natural gradient descent was
    originally proposed from a point of view of information geometry, and it
    performs the steepest descent updates on manifolds in a Riemannian space. In
    particular, we extend an approach taken by the “Whitened neural networks”
    model. We make the whitening process not only in feed-forward direction as in
    the original model, but also in the back-propagation phase. Its efficacy is
    shown by an application of this “Bidirectional whitened neural networks” model
    to handwritten character recognition data (the MNIST data set).
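
    The whitening transform at the heart of the model is easy to write down in
    numpy; applying the analogous transform to the backpropagated gradients is the
    “bidirectional” twist, which we only indicate in a comment:

        import numpy as np

        def whitening_matrix(A, eps=1e-5):
            # Return mu, W such that (A - mu) @ W has (near-)identity covariance.
            mu = A.mean(axis=0)
            vals, vecs = np.linalg.eigh(np.cov(A - mu, rowvar=False))
            return mu, vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T

        rng = np.random.default_rng(8)
        acts = rng.standard_normal((1000, 3)) @ rng.standard_normal((3, 3))

        mu, W = whitening_matrix(acts)
        white = (acts - mu) @ W
        print(np.round(np.cov(white, rowvar=False), 2))  # approximately identity
        # Bidirectional whitening would apply the same recipe to the gradients
        # flowing backwards through each layer as well.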

    Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets

    Wei-Lun Chao, Hexiang Hu, Fei Sha
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Visual question answering (QA) has attracted a lot of attention lately, seen
    essentially as a form of (visual) Turing test that artificial intelligence
    should strive to achieve. In this paper, we study a crucial component of this
    task: how can we design good datasets for the task? We focus on the design of
    multiple-choice based datasets where the learner has to select the right answer
    from a set of candidate ones including the target (i.e. the correct one) and
    the decoys (i.e. the incorrect ones). Through careful analysis of the results
    attained by state-of-the-art learning models and human annotators on existing
    datasets, we show that the design of the decoy answers has a significant
    impact on how and what the learning models learn from the datasets. In
    particular, the resulting learner can ignore the visual information, the
    question, or both
    while still doing well on the task. Inspired by this, we propose automatic
    procedures to remedy such design deficiencies. We apply the procedures to
    re-construct decoy answers for two popular visual QA datasets as well as to
    create a new visual QA dataset from the Visual Genome project, resulting in the
    largest dataset for this task. Extensive empirical studies show that the design
    deficiencies have been alleviated in the remedied datasets and the performance
    on them is likely a more faithful indicator of the difference among learning
    models. The datasets are released and publicly available via
    this http URL

    Diffusion geometry unravels the emergence of functional clusters in collective phenomena

    Manlio De Domenico
    Comments: 9 pages, 7 figures
    Journal-ref: Phys. Rev. Lett. 118, 168301 (2017)
    Subjects: Physics and Society (physics.soc-ph); Disordered Systems and Neural Networks (cond-mat.dis-nn); Learning (cs.LG); Social and Information Networks (cs.SI)

    Collective phenomena emerge from the interaction of natural or artificial
    units with a complex organization. The interplay between structural patterns
    and dynamics might induce functional clusters that, in general, are different
    from topological ones. In biological systems, like the human brain, the overall
    functionality is often favored by the interplay between connectivity and
    synchronization dynamics, with functional clusters that do not coincide with
    anatomical modules in most cases. In social, socio-technical and engineering
    systems, the quest for consensus favors the emergence of clusters.

    Despite the unquestionable evidence for mesoscale organization of many
    complex systems and the heterogeneity of their inter-connectivity, a way to
    predict and identify the emergence of functional modules in collective
    phenomena continues to elude us. Here, we propose an approach based on random
    walk dynamics to define the diffusion distance between any pair of units in a
    networked system. Such a metric allows us to exploit the underlying diffusion
    geometry to provide a unifying framework for the intimate relationship between
    metastable synchronization, consensus and random search dynamics in complex
    networks, pinpointing the functional mesoscale organization of synthetic and
    biological systems.
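
    A small numpy sketch of the diffusion distance (assuming the common
    definition that weights coordinates by the stationary distribution; the
    paper's exact construction may differ in details):

        import numpy as np

        def diffusion_distance(A, t):
            """Pairwise diffusion distance after t random-walk steps:
            d_t(i,j)^2 = sum_k ((P^t)_ik - (P^t)_jk)^2 / pi_k."""
            deg = A.sum(axis=1)                      # assumes no isolated nodes
            P = A / deg[:, None]                     # row-stochastic transition matrix
            pi = deg / deg.sum()                     # stationary law (undirected graph)
            Pt = np.linalg.matrix_power(P, t)
            diff = Pt[:, None, :] - Pt[None, :, :]   # all pairwise row differences
            return np.sqrt((diff ** 2 / pi).sum(axis=-1))

        # toy graph: two triangles joined by a single edge
        A = np.zeros((6, 6))
        for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
            A[i, j] = A[j, i] = 1
        D = diffusion_distance(A, t=3)
        print(D[0, 1] < D[0, 4])   # nodes in the same functional cluster are closer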

    Using Global Constraints and Reranking to Improve Cognates Detection

    Michael Bloodgood, Benjamin Strauss
    Comments: 10 pages, 6 figures, 6 tables; to appear in the Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada, 2017
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    Global constraints and reranking have not been used in cognates detection
    research to date. We propose methods for using global constraints by performing
    rescoring of the score matrices produced by state of the art cognates detection
    systems. Using global constraints to perform rescoring is complementary to
    state-of-the-art methods for cognates detection and yields significant
    performance improvements beyond the current state of the art on publicly
    available datasets spanning different language pairs and various conditions,
    such as different levels of baseline performance and different data sizes,
    including more realistic large data sizes than have been evaluated in the past.
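
    As an illustration of rescoring under one global constraint (not the
    paper's exact method): treat "each word has at most one cognate" as an
    assignment constraint, solve the optimal assignment over the score matrix,
    and boost the matched pairs.

        import numpy as np
        from scipy.optimize import linear_sum_assignment

        def rescore_with_assignment(S, bonus=0.5):
            """Boost the scores of pairs chosen by the optimal one-to-one
            assignment; a stand-in for the paper's global rescoring."""
            rows, cols = linear_sum_assignment(-S)   # negate to maximize total score
            R = S.copy()
            R[rows, cols] += bonus
            return R

        S = np.array([[0.9, 0.8, 0.1],
                      [0.8, 0.7, 0.2],
                      [0.1, 0.2, 0.6]])
        print(rescore_with_assignment(S))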

    Differentiable Scheduled Sampling for Credit Assignment

    Kartik Goyal, Chris Dyer, Taylor Berg-Kirkpatrick
    Comments: Accepted at ACL2017 (this http URL)
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We demonstrate that a continuous relaxation of the argmax operation can be
    used to create a differentiable approximation to greedy decoding for
    sequence-to-sequence (seq2seq) models. By incorporating this approximation into
    the scheduled sampling training procedure (Bengio et al., 2015)–a well-known
    technique for correcting exposure bias–we introduce a new training objective
    that is continuous and differentiable everywhere and that can provide
    informative gradients near points where previous decoding decisions change
    their value. In addition, by using a related approximation, we demonstrate a
    similar approach to sampling-based training. Finally, we show that our approach
    outperforms cross-entropy training and scheduled sampling procedures in two
    sequence prediction tasks: named entity recognition and machine translation.
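
    The core relaxation can be sketched in a few lines: replace the hard argmax
    embedding lookup with a temperature-controlled softmax mixture of
    embeddings (a simplified stand-in for the paper's soft greedy decoding):

        import numpy as np

        def soft_argmax_embedding(logits, E, tau=0.5):
            """Differentiable surrogate for feeding the argmax embedding:
            a peaked softmax mixture of all embedding rows. As tau -> 0
            this recovers the hard argmax."""
            z = logits / tau
            p = np.exp(z - z.max())
            p /= p.sum()
            return p @ E          # convex combination of embeddings

        vocab, dim = 5, 4
        E = np.random.randn(vocab, dim)          # embedding table
        logits = np.array([2.0, 0.5, 0.1, -1.0, 0.3])
        e_soft = soft_argmax_embedding(logits, E)
        e_hard = E[logits.argmax()]
        print(np.linalg.norm(e_soft - e_hard))   # small when the softmax is peaked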

    Naturalizing a Programming Language via Interactive Learning

    Sida I. Wang, Samuel Ginn, Percy Liang, Christopher D. Manning
    Comments: 10 pages, ACL2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Learning (cs.LG)

    Our goal is to create a convenient natural language interface for performing
    well-specified but complex actions such as analyzing data, manipulating text,
    and querying databases. However, existing natural language interfaces for such
    tasks are quite primitive compared to the power one wields with a programming
    language. To bridge this gap, we start with a core programming language and
    allow users to “naturalize” the core language incrementally by defining
    alternative, more natural syntax and increasingly complex concepts in terms of
    compositions of simpler ones. In a voxel world, we show that a community of
    users can simultaneously teach a common system a diverse language and use it to
    build hundreds of complex voxel structures. Over the course of three days,
    these users went from using only the core language to using the naturalized
    language in 85.9\% of the last 10K utterances.

    Adversarial Neural Machine Translation

    Lijun Wu, Yingce Xia, Li Zhao, Fei Tian, Tao Qin, Jianhuang Lai, Tie-Yan Liu
    Comments: In submission to IJCAI’17
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    In this paper, we study a new learning paradigm for Neural Machine
    Translation (NMT). Instead of maximizing the likelihood of the human
    translation as in previous works, we minimize the distinction between human
    translation and the translation given by an NMT model. To achieve this goal,
    inspired by the recent success of generative adversarial networks (GANs), we
    employ an adversarial training architecture and name it Adversarial-NMT. In
    Adversarial-NMT, the training of the NMT model is assisted by an adversary,
    which is an elaborately designed Convolutional Neural Network (CNN). The goal
    of the adversary is to differentiate the translation result generated by the
    NMT model from that produced by a human. The goal of the NMT model is to
    produce high-quality translations so as to fool the adversary. A policy gradient method is
    leveraged to co-train the NMT model and the adversary. Experimental results on
    English(\rightarrow)French and German(\rightarrow)English translation tasks
    show that Adversarial-NMT can achieve significantly better translation quality
    than several strong baselines.
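
    A toy PyTorch sketch of the policy-gradient step (all modules below are
    hypothetical stand-ins; the paper uses a full NMT decoder and a CNN
    adversary): the generator samples a translation, the adversary scores it,
    and REINFORCE propagates the reward through the non-differentiable sampling.

        import torch
        import torch.nn as nn

        vocab, dim, T = 100, 32, 6
        generator = nn.Linear(dim, vocab)        # stand-in for an NMT decoder
        adversary = nn.Sequential(nn.Linear(T, 16), nn.ReLU(),
                                  nn.Linear(16, 1), nn.Sigmoid())

        src = torch.randn(dim)                   # toy encoded source sentence
        logits = generator(src).log_softmax(-1)
        tokens = torch.multinomial(logits.exp(), T, replacement=True)  # sampled output
        log_prob = logits[tokens].sum()

        reward = adversary(tokens.float()).squeeze()   # P(translation looks human)
        loss = -(reward.detach() - 0.5) * log_prob     # REINFORCE, constant baseline
        loss.backward()                                # updates the generator only
        print(generator.weight.grad.shape)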

    Learning weakly supervised multimodal phoneme embeddings

    Rahma Chaabouni, Ewan Dunbar, Neil Zeghidour, Emmanuel Dupoux
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Recent works have explored deep architectures for learning multimodal speech
    representation (e.g. audio and images, articulation and audio) in a supervised
    way. Here we investigate the role of combining different speech modalities,
    i.e. audio and visual information representing lip movements, in a weakly
    supervised way using Siamese networks and lexical same-different side
    information. In particular, we ask whether one modality can benefit from the
    other to provide a richer representation for phone recognition in a weakly
    supervised setting. We introduce mono-task and multi-task methods for merging
    speech and visual modalities for phone recognition. The mono-task learning
    consists in applying a Siamese network on the concatenation of the two
    modalities, while the multi-task learning receives several different
    combinations of modalities at train time. We show that multi-task learning
    enhances discriminability for visual and multimodal inputs while minimally
    impacting auditory inputs. Furthermore, we present a qualitative analysis of
    the obtained phone embeddings, and show that cross-modal visual input can
    improve the discriminability of phonological features which are visually
    discernable (rounding, open/close, labial place of articulation), resulting in
    representations that are closer to abstract linguistic features than those
    based on audio only.
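
    A minimal numpy sketch of the mono-task setting (the similarity measure,
    margin, and stand-in network below are illustrative choices, not the
    paper's exact configuration):

        import numpy as np

        def cosine(u, v):
            return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

        def same_different_loss(emb_a, emb_b, same, margin=0.5):
            """Weak supervision signal: pull two tokens of the same word
            together; push different words below a similarity cap."""
            c = cosine(emb_a, emb_b)
            return 1 - c if same else max(0.0, c - (1 - margin))

        # mono-task: one Siamese branch over concatenated audio+visual features
        audio, lips = np.random.randn(40), np.random.randn(20)
        x = np.concatenate([audio, lips])        # multimodal input
        W = np.random.randn(16, 60)              # stand-in embedding network
        emb = np.tanh(W @ x)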

    A General Theory for Training Learning Machine

    Hong Zhao
    Comments: 55 pages, 18 figures. arXiv admin note: substantial text overlap with arXiv:1602.03950
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Though deep learning is pushing machine learning to a new stage, the basic
    theories of machine learning are still limited. The principle of learning,
    the role of a priori knowledge, the role of the neuron bias, and the basis
    for choosing the neural transfer function and cost function are still
    far from clear. In this paper, we present a general theoretical framework for
    machine learning. We classify the prior knowledge into common and
    problem-dependent parts, and consider that the aim of learning is to maximally
    incorporate them. The principle we suggest for maximizing the former is the
    design risk minimization principle, while the neural transfer function, the
    cost function, as well as pretreatment of samples, are endowed with the role
    for maximizing the latter. The role of the neuron bias is explained from a
    different angle. We develop a Monte Carlo algorithm to establish the
    input-output responses, and we control the input-output sensitivity of a
    learning machine by controlling that of individual neurons. Applications to
    function approximation and smoothing, pattern recognition, and classification are
    provided to illustrate how to train general learning machines based on our
    theory and algorithm. Our method may also enable new applications, such as
    transductive inference.

    Learning to Skim Text

    Adams Wei Yu, Hongrae Lee, Quoc V. Le
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Recurrent Neural Networks are showing much promise in many sub-areas of
    natural language processing, ranging from document classification to machine
    translation to automatic question answering. Despite their promise, many
    recurrent models have to read the whole text word by word, making it slow to
    handle long documents. For example, it is difficult to use a recurrent network
    to read a book and answer questions about it. In this paper, we present an
    approach to reading text while skipping irrelevant information if needed. The
    underlying model is a recurrent network that learns how far to jump after
    reading a few words of the input text. We employ a standard policy gradient
    method to train the model to make discrete jumping decisions. In our benchmarks
    on four different tasks, including number prediction, sentiment analysis, news
    article classification and automatic Q&A, our proposed model, a modified LSTM
    with jumping, is up to 6 times faster than the standard sequential LSTM, while
    maintaining the same or even better accuracy.
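
    An inference-time sketch of the jumping mechanism (with a uniform stand-in
    policy; in the paper the jump distribution is produced from the LSTM state
    and trained with policy gradient):

        import numpy as np

        def skim(tokens, read_k=3, max_jump=5, rng=np.random.default_rng(0)):
            """Read a small window of words, then sample how far to jump;
            a jump of 0 means keep reading from the next position."""
            pos, read = 0, []
            while pos < len(tokens):
                read.extend(tokens[pos:pos + read_k])   # read a few words
                pos += read_k
                pos += rng.integers(0, max_jump + 1)    # sampled jump
            return read

        doc = [f"w{i}" for i in range(30)]
        print(skim(doc))     # only a subset of the 30 words is actually read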


    Information Theory

    The Structure of One Weight Linear and Cyclic Codes Over Z2^r x (Z2+uZ2)^s

    Ismail Aydogdu
    Subjects: Information Theory (cs.IT)

    Inspired by Z2Z4-additive codes, linear codes over Z2^r x (Z2+uZ2)^s have
    been introduced by Aydogdu et al. recently. Although these families of codes
    are similar to each other, linear codes over Z2^r x (Z2+uZ2)^s have some
    advantages compared to Z2Z4-additive codes. A code is called constant
    weight (one weight) if all the codewords have the same weight. It is well known
    that constant weight or one weight codes have many important applications. In
    this paper, we study the structure of one weight Z2Z2[u]-linear and cyclic
    codes. We classify these types of one weight codes and also give some
    illustrative examples.

    A Simple Proof of Fast Polarization

    Ido Tal
    Subjects: Information Theory (cs.IT)

    Fast polarization is an important and useful property of polar codes. It was
    proved for the binary polarizing (2 \times 2) kernel by Arıkan and Telatar. The
    proof was later generalized by Şaşoğlu. We give a simplified proof.

    Z2Z4Z8-Cyclic Codes

    Ismail Aydogdu, Fatmanur Gursoy
    Subjects: Information Theory (cs.IT)

    In this paper we study Z2Z4Z8-additive codes, which are an extension of the
    recently introduced Z2Z4-additive codes. We determine the standard forms of the
    generator and parity-check matrices of Z2Z4Z8-additive codes. Moreover, we
    investigate Z2Z4Z8-cyclic codes giving their generator polynomials and spanning
    sets. We also give some illustrative examples of both Z2Z4Z8-additive codes and
    Z2Z4Z8-cyclic codes.

    Robust Secure Transmission Using Main-Lobe-Integration-Based Leakage Beamforming in Directional Modulation MU-MIMO Systems

    Feng Shu, Wei Zhu, Xiangwei Zhou, Jun Li, Jinhui Lu
    Subjects: Information Theory (cs.IT)

    In this paper, we investigate robust beamforming for secure
    directional modulation in multi-user multiple-input multiple-output
    (MU-MIMO) systems in the presence of direction angle measurement errors. When
    statistical knowledge of direction angle measurement errors is unavailable, a
    novel robust beamforming scheme of combining main-lobe-integration (MLI) and
    leakage is proposed to simultaneously transmit multiple different independent
    parallel confidential message streams to the associated multiple distinct
    desired users. The proposed scheme includes two steps: designing the
    beamforming vectors of the useful confidential messages and constructing
    artificial noise (AN) projection matrix. Here, in its first step, the
    beamforming vectors for the useful confidential messages of desired user k are
    given by minimizing the useful signal power leakage from main-lobe of desired
    user k to the sum of main-lobes of the remaining desired directions plus
    main-lobes of all eavesdropper directions. In its second step, the AN
    projection matrix is constructed by simultaneously maximizing the AN power
    leakage to all eavesdropper directions such that all eavesdroppers are
    disrupted seriously, where AN is viewed by the transmitter as a useful signal
    for eavesdroppers. Due to independent beamforming vectors for different desired
    users, a more secure transmission is achieved. Compared with conventional
    non-robust methods, the proposed method can provide a significant improvement
    in bit error rate along the desired directions and secrecy-sum-rate towards
    multiple desired users without needing statistical property or distribution of
    angle measurement errors.
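
    The leakage step can be illustrated with a standard generalized-eigenvector
    solution, plainly a textbook signal-to-leakage design rather than the
    paper's MLI construction (which integrates steering vectors over main lobes
    to build the matrices used below):

        import numpy as np
        from scipy.linalg import eigh

        def leakage_beamformer(A_des, B_leak, eps=1e-3):
            # maximize v^H A_des v / v^H B_leak v via a generalized eigenproblem;
            # eps regularizes B_leak so it is positive definite
            w, V = eigh(A_des, B_leak + eps * np.eye(B_leak.shape[0]))
            v = V[:, -1]                         # eigenvector of the largest ratio
            return v / np.linalg.norm(v)

        n = 8
        a = np.exp(1j * np.pi * np.arange(n) * np.sin(0.3))    # desired steering vector
        b = np.exp(1j * np.pi * np.arange(n) * np.sin(-0.7))   # eavesdropper steering vector
        v = leakage_beamformer(np.outer(a, a.conj()), np.outer(b, b.conj()))
        print(abs(np.vdot(a, v)), abs(np.vdot(b, v)))          # strong gain vs. suppressed leakage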

    Regular Decomposition: an information and graph theoretic approach to stochastic block models

    Hannu Reittu, Fülöp Bazsó, Ilkka Norros
    Subjects: Information Theory (cs.IT)

    A method for compression of large graphs and non-negative matrices to a block
    structure is proposed. Szemerédi’s regularity lemma is used as a heuristic
    motivation for the significance of stochastic block models. Another ingredient
    of the method is Rissanen’s minimum description length principle (MDL). We
    propose practical algorithms and provide theoretical results on the accuracy of
    the method.

    Fast systematic encoding of multiplicity codes

    Nicholas Coxon (GRACE)
    Subjects: Information Theory (cs.IT); Computational Complexity (cs.CC); Cryptography and Security (cs.CR)

    We present quasi-linear time systematic encoding algorithms for multiplicity
    codes. The algorithms have their origins in the fast multivariate interpolation
    and evaluation algorithms of van der Hoeven and Schost (2013), which we
    generalise to address certain Hermite-type interpolation and evaluation
    problems. By providing fast encoding algorithms for multiplicity codes, we
    remove an obstruction on the road to the practical application of the private
    information retrieval protocol of Augot, Levy-dit-Vehel and Shikfa (2014).

    (H(X)) vs. (H(f(X)))

    Ferdinando Cicalese, Luisa Gargano, Ugo Vaccaro
    Comments: To appear in ISIT 2017
    Subjects: Information Theory (cs.IT)

    It is well known that the entropy (H(X)) of a finite random variable is
    always greater than or equal to the entropy (H(f(X))) of a function (f) of (X),
    with equality if and only if (f) is one-to-one. In this paper, we give tight bounds
    on (H(f(X))) when the function (f) is not one-to-one, and we illustrate a few
    scenarios where this matters. As an intermediate step towards our main result,
    we prove a lower bound on the entropy of a probability distribution, when only
    a bound on the ratio between the maximum and the minimum probability is known.
    Our lower bound improves previous results in the literature, and it could find
    applications outside the present scenario.
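
    A quick numerical check of the basic inequality (H(f(X)) \le H(X)) for a
    non-one-to-one (f):

        import math

        def H(p):
            return -sum(q * math.log2(q) for q in p if q > 0)

        pX = [0.25, 0.25, 0.25, 0.25]      # X uniform on {0, 1, 2, 3}
        pfX = [0.5, 0.5]                   # f maps {0,1} -> a and {2,3} -> b
        print(H(pX), H(pfX))               # 2.0 vs. 1.0 bits: H(f(X)) <= H(X)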

    Network Slicing Based 5G and Future Mobile Networks: Mobility, Resource Management, and Challenges

    H. Zhang, N. Liu, X. Chu, K. Long, A. Aghvami, V. C. M. Leung
    Comments: to appear in IEEE Communications Magazine
    Subjects: Information Theory (cs.IT)

    The fifth-generation (5G) networks are expected to be able to satisfy users’
    different quality-of-service (QoS) requirements. Network slicing is a promising
    technology for 5G networks to provide services tailored for users’ specific QoS
    demands. Driven by the increased massive wireless data traffic from different
    application scenarios, efficient resource allocation schemes should be
    exploited to improve the flexibility of network resource allocation and
    capacity of 5G networks based on network slicing. Due to the diversity of 5G
    application scenarios, new mobility management schemes are greatly needed to
    guarantee seamless handover in network slicing based 5G systems. In this
    article, we introduce a logical architecture for network slicing based 5G
    systems, and present a scheme for managing mobility between different access
    networks, as well as a joint power and subchannel allocation scheme in
    spectrum-sharing two-tier systems based on network slicing, where both the
    co-tier interference and cross-tier interference are taken into account.
    Simulation results demonstrate that the proposed resource allocation scheme can
    flexibly allocate network resources between different slices in 5G systems.
    Finally, several open issues and challenges in network slicing based 5G
    networks are discussed, including network reconstruction, network slicing
    management and cooperation with other 5G technologies.

    Energy Efficient User Association and Power Allocation in Millimeter Wave Based Ultra Dense Networks with Energy Harvesting Base Stations

    H. Zhang, S. Huang, C. Jiang, K. Long, V. C. M. Leung, H. Vincent Poor
    Comments: to appear, IEEE Journal on Selected Areas in Communications, 2017
    Subjects: Information Theory (cs.IT)

    Millimeter wave (mmWave) communication technologies have recently emerged as
    an attractive solution to meet the exponentially increasing demand on mobile
    data traffic. Moreover, ultra dense networks (UDNs) combined with mmWave
    technology are expected to increase both energy efficiency and spectral
    efficiency. In this paper, user association and power allocation in mmWave
    based UDNs are considered with attention to load balance constraints, energy
    harvesting by base stations, user quality of service requirements, energy
    efficiency, and cross-tier interference limits. The joint user association and
    power optimization problem is modeled as a mixed-integer programming problem,
    which is then transformed into a convex optimization problem by relaxing the
    user association indicator and solved by Lagrangian dual decomposition. An
    iterative gradient user association and power allocation algorithm is proposed
    and shown to converge rapidly to an optimal point. The complexity of the
    proposed algorithm is analyzed and the effectiveness of the proposed scheme
    compared with existing methods is verified by simulations.

    Golden-Coded Index Coding

    Yu-Chih Huang, Yi Hong, Emanuele Viterbo
    Comments: 5 pages, 2 figures. Accepted for publication in 2017 IEEE ISIT
    Subjects: Information Theory (cs.IT)

    We study the problem of constructing good space-time codes for broadcasting
    (K) independent messages over a MIMO network to (L) users, where each user
    demands all the messages and already has a subset of messages as side
    information. As a first attempt, we consider the (2 \times 2) case and propose
    golden-coded index coding by partitioning the golden codes into (K) subcodes,
    one for each message. The proposed scheme is shown to have the property that
    for any side information configuration, the minimum determinant of the code
    increases exponentially with the amount of information contained in the side
    information.

    New Two-Stage Automorphism Group Decoders for Cyclic Codes in the Erasure Channel

    Chanki Kim, Jong-Seon No
    Subjects: Information Theory (cs.IT)

    Recently, error correcting codes in the erasure channel have drawn great
    attention for various applications such as distributed storage systems and
    wireless sensor networks, but many of their decoding algorithms are not
    practical because of their high decoding complexity and long delay. Thus,
    the automorphism group decoder (AGD) for cyclic codes in the erasure channel
    was introduced, which has good erasure decoding performance with low decoding
    complexity. In this paper, we propose new two-stage AGDs (TS-AGDs) for cyclic
    codes in the erasure channel by modifying the parity check matrix and
    introducing the preprocessing stage to the AGD scheme. The proposed TS-AGD has
    been analyzed for the perfect codes, BCH codes, and maximum distance separable
    (MDS) codes. Through numerical analysis, it is shown that the proposed decoding
    algorithm has good erasure decoding performance with lower decoding complexity
    and delay than the conventional AGD. For some cyclic codes, it is shown that
    the proposed TS-AGD achieves perfect decoding in the erasure channel, that
    is, the same decoding performance as the maximum likelihood (ML) decoder. For
    MDS codes, TS-AGDs with the expanded parity check matrix and the submatrix
    inversion are also proposed and analyzed.

    Coherent multiple-antenna block-fading channels at finite blocklength

    Austin Collins, Yury Polyanskiy
    Subjects: Information Theory (cs.IT)

    In this paper we consider a channel model that is often used to describe the
    mobile wireless scenario: multiple-antenna additive white Gaussian noise
    channels subject to random (fading) gain with full channel state information at
    the receiver. Dynamics of the fading process are approximated by a
    piecewise-constant process (frequency non-selective isotropic block fading).
    This work addresses the finite blocklength fundamental limits of this channel
    model. Specifically, we give a formula for the channel dispersion — a quantity
    governing the delay required to achieve capacity. Multiplicative nature of the
    fading disturbance leads to a number of interesting technical difficulties that
    required us to enhance traditional methods for finding channel dispersion.
    Alas, one difficulty remains: the converse (impossibility) part of our result
    holds under an extra constraint on the growth of the peak-power with
    blocklength.

    Our results demonstrate, for example, that while capacities of (n_t \times n_r)
    and (n_r \times n_t) antenna configurations coincide (under fixed received
    power), the coding delay can be quite sensitive to this switch. For example, at
    the received SNR of (20) dB the (16 \times 100) system achieves capacity with
    codes of length (delay) which is only (60\%) of the length required for the
    (100 \times 16) system. Another interesting implication is that for the MISO
    channel, the dispersion-optimal coding schemes require employing orthogonal
    designs such as Alamouti’s scheme — a surprising observation considering the
    fact that Alamouti’s scheme was designed for reducing demodulation errors, not
    improving coding rate.
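
    For context, a sketch of how dispersion enters (the standard normal
    approximation of Polyanskiy, Poor and Verdú; the exact statement for this
    fading channel is the subject of the paper): the maximal number of messages
    (M^*(n, \epsilon)) decodable with error probability (\epsilon) at
    blocklength (n) behaves as

        (\log M^*(n, \epsilon) = nC - \sqrt{nV}\, Q^{-1}(\epsilon) + O(\log n),)

    where (C) is the capacity, (V) the dispersion, and (Q^{-1}) the inverse
    Gaussian tail function, so a larger (V) means longer codes are needed to
    approach capacity.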

    Analyzing Large-Scale Multiuser Molecular Communication via 3D Stochastic Geometry

    Yansha Deng, Adam Noel, Weisi Guo, Arumugam Nallanathan, Maged Elkashlan
    Comments: arXiv admin note: text overlap with arXiv:1605.08311
    Subjects: Information Theory (cs.IT)

    Information delivery using chemical molecules is an integral part of biology
    at multiple distance scales and has attracted recent interest in bioengineering
    and communication theory. An under-explored area is the performance of
    large-scale molecular communications systems, where multiple transmitters are
    randomly distributed over space. The collective signal strength at the receiver
    (i.e., the expected number of observed molecules), resulting from a large
    number of transmitters at random distances (e.g., due to mobility), can have a
    major impact on the reliability and efficiency of a molecular communication
    system. Modelling the transmitters as a homogeneous Poisson point process
    captures the random spatial patterns of transmitters in the free-diffusion fluid
    environment. Analyzing the collective signal from multiple diffusion sources
    can be computationally and analytically challenging. In this paper, we present
    the first tractable analytical model for the collective signal strength due to
    randomly-placed transmitters in a three-dimensional (3D) large-scale molecular
    communication system, either with or without degradation. By applying
    stochastic geometry, we derive analytical expressions for the expected number
    of observed molecules and the bit error probability at a fully absorbing
    receiver and a passive receiver under binary concentration shift keying. Our
    results reveal that the collective signal strength at both types of receivers
    increases proportionally with increasing transmitter density, and the minimum
    bit error probability can be improved by introducing molecule degradation. More
    importantly, the proposed framework dramatically simplifies the analysis, and
    can be generalized to the analysis of other types of receiver and other
    performance characteristics in large-scale molecular communication systems.
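
    A Monte Carlo sketch of the setup (parameter values and the free-diffusion
    concentration formula below are illustrative assumptions; the paper derives
    closed-form expressions via stochastic geometry):

        import numpy as np

        rng = np.random.default_rng(1)

        lam = 1e-3        # transmitter density (per um^3)
        radius = 50.0     # simulation sphere radius (um)
        D = 1.0           # diffusion coefficient (um^2/s)
        k_deg = 0.1       # degradation rate (1/s)
        t = 5.0           # observation time (s)
        N_mol = 1000      # molecules released per transmitter

        vol = 4 / 3 * np.pi * radius ** 3
        n_tx = rng.poisson(lam * vol)                 # Poisson number of transmitters
        r = radius * rng.random(n_tx) ** (1 / 3)      # uniform-in-volume distances

        # expected concentration at the origin from a point release at distance r
        # (3-D diffusion Green's function, with first-order degradation)
        conc = N_mol * np.exp(-r ** 2 / (4 * D * t)) / (4 * np.pi * D * t) ** 1.5
        signal = (conc * np.exp(-k_deg * t)).sum()    # collective signal strength
        print(n_tx, signal)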

    Off-the-grid Two-Dimensional Line Spectral Estimation With Prior Information

    Iman Valiulahi, Hamid Fathi, Sajad Daei, Farzan Haddadi
    Subjects: Information Theory (cs.IT)

    In this paper, we provide a method to recover off-the-grid frequencies of a
    signal in two-dimensional (2-D) line spectral estimation. Most of the
    literature in this field focuses on the case in which the only information is
    spectral sparsity in a continuous domain and does not consider prior
    information. However, in many applications such as radar and sonar, one has
    extra information about the spectrum of the signal of interest. The common way
    of accommodating prior information is to use weighted atomic norm minimization.
    We present a new semidefinite program using the theory of positive
    trigonometric polynomials that incorporates this prior information into 2-D line
    spectral estimation. Specifically, we assume prior knowledge of 2-D frequency
    subbands in which signal frequency components are located. Our approach
    improves the recovery performance compared with the previous work that does not
    consider prior information. Through numerical experiments, we find that the
    amount of this improvement depends on the prior information we have about the
    locations of the true frequencies.

    Exploring Symmetry in Wireless Propagation Channels

    Ehab Salahat, Ahmed Kulaib, Nazar Ali, Raed Shubair
    Comments: Accepted in IEEE European Conference on Networks and Communications (EuCNC17), Oulu, Finland,12-15 Jun. 2017
    Subjects: Information Theory (cs.IT)

    Wireless communications literature is very rich with empirical studies and
    measurement campaigns that study the nature of the wireless propagation
    channel. However, despite their undoubted usefulness, many of these studies
    have omitted a fundamental yet key feature of physical signal propagation,
    namely, wireless propagation asymmetry. This feature does not agree with the
    electromagnetic reciprocity theorem, so the many research papers that assume
    wireless channel symmetry may render their modeling unexpectedly inaccurate.
    Moreover, asymmetry is unquestionably an important characteristic of
    wireless channels, which needs to be accurately characterized for
    vehicular/mobile communications, 5G networks, and associated applications such
    as indoor/outdoor localization. This paper presents a modest and preliminary
    study that reports potential causes of propagation asymmetry. Measurements
    conducted on Khalifa University campus in UAE show that wireless channels are
    symmetric in the absence of symmetry impairments. Therefore, care should be
    taken when considering some practical wireless propagation scenarios. Key
    conclusions and recommendations are summarized. We believe that this study will
    inspire the academic community and trigger further investigation into wireless
    propagation assumptions.

    On the Trade-Off between Computational Load and Reliability for Network Function Virtualization

    Jinkyu Kang, Osvaldo Simeone, Joonhyuk Kang
    Comments: Accepted for publication in IEEE Communications Letters
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    Network Function Virtualization (NFV) enables the “softwarization” of network
    functions, which are implemented on virtual machines hosted on Commercial
    off-the-shelf (COTS) servers. Both the composition of the virtual network
    functions (VNFs) into a forwarding graph (FG) at the logical layer and the
    embedding of the FG on the servers need to take into account the
    less-than-carrier-grade reliability of COTS components. This work investigates
    the trade-off between end-to-end reliability and computational load per server
    via the joint design of VNF chain composition (CC) and FG embedding (FGE) under
    the assumption of a bipartite FG that consists of controller and regular VNFs.
    Evaluating the reliability criterion within a probabilistic model, analytical
    insights are first provided for a simplified disconnected FG. Then, a block
    coordinate descent method based on mixed-integer linear programming is proposed
    to tackle the joint optimization of CC and FGE. Via simulation results, it is
    observed that a joint design of CC and FGE leads to substantial performance
    gains compared to separate optimization approaches.

    A general private information retrieval scheme for MDS coded databases with colluding servers

    Yiwei Zhang, Gennian Ge
    Comments: Submitted to IEEE Transactions on Information Theory
    Subjects: Information Theory (cs.IT)

    The problem of private information retrieval (PIR) has received renewed
    attention in recent years due to its information-theoretic reformulation and
    its applications in distributed storage systems. The PIR capacity is the
    maximal number of bits privately retrieved per bit of downloaded data. The
    capacity has been fully determined for some degenerate cases. For a general
    case where the database is
    both coded and colluded, the exact capacity remains unknown. We build a general
    private information retrieval scheme for MDS coded databases with colluding
    servers. Our scheme achieves the rate ((1+R+R^2+\cdots+R^{M-1})), where
    (R = 1 - \frac{\binom{N-T}{K}}{\binom{N}{K}}). Compared to existing PIR schemes,
    our scheme performs better for a certain range of parameters and is suitable
    for any underlying MDS code used in the distributed storage system.
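
    The quantity (R) from the abstract is easy to evaluate for sample
    parameters (the values below are illustrative only):

        from math import comb

        def pir_rate_terms(N, T, K, M):
            """Evaluate R and the geometric terms 1, R, ..., R^(M-1)."""
            R = 1 - comb(N - T, K) / comb(N, K)
            return R, [R ** m for m in range(M)]

        R, terms = pir_rate_terms(N=8, T=2, K=3, M=4)
        print(R, sum(terms))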

    Joint Computation and Communication Cooperation for Mobile Edge Computing

    Xiaowen Cao, Feng Wang, Jie Xu, Rui Zhang, Shuguang Cui
    Comments: 7 pages, 5 figures
    Subjects: Information Theory (cs.IT)

    This paper considers a three-node mobile edge computing (MEC) system that
    consists of a user node having latency-constrained computation tasks to be
    executed, a helper node, and an access point (AP) attached with an MEC server.
    We propose a novel joint computation and communication cooperation approach,
    where the helper uses its local computation and communication resources to help
    the user’s task execution. In particular, we develop a four-slot protocol to
    enable this approach. In the first slot, the user offloads part of its tasks to
    the helper, such that the helper can cooperatively compute them on behalf of
    the user in the remaining time. In the second and third slots, the helper acts
    as a decode-and-forward (DF) relay for cooperative communication, in order to
    help the user offload another part of the tasks to the AP for remote execution.
    In the fourth slot, the helper and the AP send the computation results back to
    the user. We jointly optimize the computation and communication resource
    allocation at both the user and the helper, so as to minimize their total
    energy consumption while satisfying the user’s computation latency constraint.
    Numerical results show that the proposed joint computation and communication
    cooperation approach outperforms other benchmark schemes without such a joint
    consideration.

    Subspace Tracking Algorithms for Millimeter Wave MIMO Channel Estimation with Hybrid Beamforming

    Stefano Buzzi, Carmen D'Andrea
    Comments: 6 pages. Presented at the 21st ITG Workshop on Smart Antennas, Berlin, March 2017
    Subjects: Information Theory (cs.IT)

    This paper proposes the use of subspace tracking algorithms for performing
    MIMO channel estimation at millimeter wave (mmWave) frequencies. Using a
    subspace approach, we develop a protocol enabling the estimation of the right
    (resp. left) singular vectors at the transmitter (resp. receiver) side; then,
    we adapt the projection approximation subspace tracking with deflation (PASTd)
    and the orthogonal Oja (OOJA) algorithms to our framework and obtain two
    channel estimation algorithms. The hybrid analog/digital nature of the
    beamformer is also explicitly taken into account at the algorithm design stage.
    Numerical results show that the proposed estimation algorithms are effective,
    and that they perform better than two relevant competing alternatives available
    in the open literature.
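
    A textbook-style PASTd sketch (initialization and normalization details
    vary; the paper adapts the algorithm to the hybrid beamforming setting):

        import numpy as np

        def pastd_step(W, d, x, beta=0.95):
            """One PASTd update with deflation: the columns of W track the
            dominant eigenvectors of the input covariance."""
            x = x.astype(complex).copy()
            for i in range(W.shape[1]):
                y = np.vdot(W[:, i], x)              # projection w_i^H x
                d[i] = beta * d[i] + abs(y) ** 2
                e = x - W[:, i] * y                  # instantaneous error
                W[:, i] = W[:, i] + e * np.conj(y) / d[i]
                x = x - W[:, i] * y                  # deflation: remove component i
            return W, d

        rng = np.random.default_rng(0)
        n, r = 8, 2
        W, d = np.eye(n, r, dtype=complex), np.ones(r)
        C = np.diag([5.0, 3.0] + [0.1] * (n - 2))    # covariance with a clear 2-D subspace
        for _ in range(2000):
            W, d = pastd_step(W, d, rng.multivariate_normal(np.zeros(n), C))
        print(np.round(np.abs(W[:2, :]), 2))         # energy concentrates on coords 0, 1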

    Multiuser Millimeter Wave MIMO Channel Estimation with Hybrid Beamforming

    Stefano Buzzi, Carmen D'Andrea
    Comments: 3 pages. To be presented at the Poster Session of the European Conference on Networks and Communications (EuCNC), Oulu, Finland, June 2017
    Subjects: Information Theory (cs.IT)

    This paper focuses on multiuser MIMO channel estimation and data transmission
    at millimeter wave (mmWave) frequencies. The proposed approach relies on the
    time-division-duplex (TDD) protocol and is based on two distinct phases. First
    of all, the Base Station (BS) sends a suitable probing signal so that all the
    Mobile Stations (MSs), using a subspace tracking algorithm, can estimate the
    dominant left singular vectors of their BS-to-MS propagation channel. Then,
    each MS, using the estimated dominant left singular vectors as pre-coding
    beamformers, sends a suitable pilot sequence so that the BS can estimate the
    corresponding right dominant channel singular vectors and the corresponding
    eigenvalues. The low-complexity projection approximation subspace tracking with
    deflation (PASTd) algorithm is used at the MSs for dominant subspace
    estimation, while pilot-matched (PM) and zero-forcing (ZF) reception is used at
    the BS. The proposed algorithms can be used in conjunction with an analog RF
    beamformer and are shown to exhibit very good performance.

    Entropic Trace Estimates for Log Determinants

    Jack Fitzsimons, Diego Granziol, Kurt Cutajar, Michael Osborne, Maurizio Filippone, Stephen Roberts
    Comments: 16 pages, 4 figures, 2 tables, 2 algorithms
    Subjects: Numerical Analysis (cs.NA); Information Theory (cs.IT); Computation (stat.CO); Machine Learning (stat.ML)

    The scalable calculation of matrix determinants has been a bottleneck to the
    widespread application of many machine learning methods such as determinantal
    point processes, Gaussian processes, generalised Markov random fields, graph
    models and many others. In this work, we estimate log determinants under the
    framework of maximum entropy, given information in the form of moment
    constraints from stochastic trace estimation. The estimates demonstrate a
    significant improvement on state-of-the-art alternative methods, as shown on a
    wide variety of UFL sparse matrices. By taking the example of a general Markov
    random field, we also demonstrate how this approach can significantly
    accelerate inference in large-scale learning methods involving the log
    determinant.
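
    The moment constraints can be sketched with Hutchinson-style stochastic
    trace estimation (the maximum-entropy fit to these moments is omitted here):

        import numpy as np

        def moment_constraints(A, order=4, probes=30, rng=np.random.default_rng(0)):
            """Estimate tr(A^k)/n for k = 1..order via Rademacher probes,
            using E[z^T A^k z] = tr(A^k)."""
            n = A.shape[0]
            moments = np.zeros(order)
            for _ in range(probes):
                z = rng.choice([-1.0, 1.0], size=n)
                v = z
                for k in range(order):
                    v = A @ v                  # v = A^(k+1) z
                    moments[k] += z @ v        # unbiased estimate of tr(A^(k+1))
            return moments / (probes * n)

        # example: SPD matrix with known log-determinant for reference
        n = 200
        B = np.random.randn(n, n) / np.sqrt(n)
        A = B @ B.T + 0.5 * np.eye(n)
        print(moment_constraints(A))
        print(np.linalg.slogdet(A)[1] / n)     # what the maxent estimate targets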

    Superadditivity of the classical capacity with limited entanglement assistance

    Elton Yechao Zhu, Quntao Zhuang, Peter W. Shor
    Comments: 12 pages
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    Finding the optimal encoding strategies can be challenging for communication
    using quantum channels, as classical and quantum capacities may be
    superadditive. Entanglement assistance can often simplify this task, as the
    entanglement-assisted classical capacity for any channel is additive, making
    entanglement across channel uses unnecessary. If the entanglement assistance is
    limited, the picture is much more unclear. Suppose the classical capacity is
    superadditive, then the classical capacity with limited entanglement assistance
    could retain superadditivity by continuity arguments. If the classical capacity
    is additive, it is unknown if superadditivity can still be developed with
    limited entanglement assistance. We show that this is possible by providing an
    example: we construct a channel for which the classical capacity is additive,
    but the capacity with limited entanglement assistance can be superadditive.
    This shows that entanglement plays a subtle role in communication, one that we
    still understand very little about.

    Proactive Edge Computing in Latency-Constrained Fog Networks

    Mohammed S. Elbamby, Mehdi Bennis, Walid Saad
    Comments: 6 pages, 5 figures, accepted in EuCNC 2017
    Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)

    In this paper, the fundamental problem of distribution and proactive caching
    of computing tasks in fog networks is studied under latency and reliability
    constraints. In the proposed scenario, computing can be executed either locally
    at the user device or offloaded to an edge cloudlet. Moreover, cloudlets
    exploit both their computing and storage capabilities by proactively caching
    popular task computation results to minimize computing latency. To this end, a
    clustering method to group spatially proximate user devices with mutual task
    popularity interests and their serving cloudlets is proposed. Then, cloudlets
    can proactively cache the popular tasks’ computations of their cluster members
    to minimize computing latency. Additionally, the problem of distributing tasks
    to cloudlets is formulated as a matching game in which a cost function of
    computing delay is minimized under latency and reliability constraints.
    Simulation results show that the proposed scheme guarantees reliable
    computations with bounded latency and achieves up to a 91% decrease in computing
    latency as compared to baseline schemes.



