
    arXiv Paper Daily: Tue, 28 Mar 2017

    Posted by 我爱机器学习 (52ml.net) on 2017-03-28 00:00:00

    Neural and Evolutionary Computing

    Where to put the Image in an Image Caption Generator

    Marc Tanti (1), Albert Gatt (1), Kenneth P. Camilleri (1) ((1) University of Malta)
    Comments: under review, 29 pages, 5 figures, 6 tables
    Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

    When a neural language model is used for caption generation, the image
    information can be fed to the neural network either by directly incorporating
    it in a recurrent neural network — conditioning the language model by
    injecting image features — or in a layer following the recurrent neural
    network — conditioning the language model by merging the image features. While
    merging implies that visual features are bound at the end of the caption
    generation process, injecting can bind the visual features at a variety of stages.
    In this paper we empirically show that late binding is superior to early
    binding in terms of different evaluation metrics. This suggests that the
    different modalities (visual and linguistic) for caption generation should not
    be jointly encoded by the RNN; rather, the multimodal integration should be
    delayed to a subsequent stage. Furthermore, this suggests that recurrent neural
    networks should not be viewed as actually generating text, but only as encoding
    it for prediction in a subsequent layer.

    Deep Deterministic Policy Gradient for Urban Traffic Light Control

    Noe Casas
    Subjects: Neural and Evolutionary Computing (cs.NE)

    Traffic light timing optimization is still an active line of research despite
    the wealth of scientific literature on the topic, and the problem remains
    unsolved for any non-toy scenario. One of the key issues with traffic light
    optimization is the large scale of the input information that is available for
    the controlling agent, namely all the traffic data that is continually sampled
    by the traffic detectors that cover the urban network. This issue has in the
    past forced researchers to focus on agents that work on localized parts of the
    traffic network, typically on individual intersections, and to coordinate every
    individual agent in a multi-agent setup. In order to overcome the large scale
    of the available state information, we propose to rely on the ability of deep
    learning approaches to handle large input spaces, in the form of the Deep
    Deterministic Policy Gradient (DDPG) algorithm. We performed several
    experiments with a range of models, from the very simple one (one intersection)
    to the more complex one (a big city section).

    Surrogate Model of Multi-Period Flexibility from a Home Energy Management System

    Rui Pinto, Ricardo Bessa, Manuel Matos
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

    The operation of near-future electric distribution grids will have to rely on
    demand-side flexibility, both by implementation of demand response strategies
    and by taking advantage of the intelligent management of increasingly common
    small-scale energy storage. Home energy management systems (HEMS) will play a
    crucial role in providing flexibility to both system operators and market
    players like aggregators. Modeling multi-period flexibility from residential
    consumers (HEMS flexibility), such as battery storage and electric water
    heater, while complying with internal constraints (comfort levels, data
    privacy) and uncertainty is a complex task. This paper describes a
    computational method that is capable of efficiently defining and learning the
    feasible flexibility set from controllable resources connected to a HEMS. An
    Evolutionary Particle Swarm Optimization (EPSO) algorithm is adopted and
    reshaped to derive a set of feasible temporal trajectories for the residential
    net-load, considering storage, flexible appliances, and predefined customer
    preferences, as well as load and photovoltaic (PV) forecast uncertainty. A
    support vector data description (SVDD) algorithm is used to build models
    capable of classifying feasible and unfeasible HEMS operating trajectories upon
    request from an optimization/control algorithm operated by a DSO or market
    player.

    Balancing Selection Pressures, Multiple Objectives, and Neural Modularity to Coevolve Cooperative Agent Behavior

    Alex C. Rollins, Jacob Schrum
    Subjects: Neural and Evolutionary Computing (cs.NE)

    Previous research using evolutionary computation in Multi-Agent Systems
    indicates that assigning fitness based on team vs. individual behavior has a
    strong impact on the ability of evolved teams of artificial agents to exhibit
    teamwork in challenging tasks. However, such research only made use of
    single-objective evolution. In contrast, when a multiobjective evolutionary
    algorithm is used, populations can be subject to individual-level objectives,
    team-level objectives, or combinations of the two. This paper explores the
    performance of cooperatively coevolved teams of agents controlled by artificial
    neural networks subject to these types of objectives. Specifically, predator
    agents are evolved to capture scripted prey agents in a torus-shaped grid
    world. Because of the tension between individual and team behaviors, multiple
    modes of behavior can be useful, and thus the effect of modular neural networks
    is also explored. Results demonstrate that fitness rewarding individual
    behavior is superior to fitness rewarding team behavior, despite being applied
    to a cooperative task. However, the use of networks with multiple modules
    allows predators to discover intelligent behavior, regardless of which type of
    objectives are used.
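    The torus-shaped grid world mentioned in this abstract can be illustrated with a small sketch. The abstract gives no grid size or distance metric, so the 10x10 grid in the usage note and the choice of Manhattan distance are assumptions made only for illustration; the function names are hypothetical.

```python
def torus_delta(a, b, size):
    """Shortest signed offset from a to b along one wrapped axis."""
    d = (b - a) % size
    # Going "backwards" across the edge is shorter once d exceeds half the axis.
    return d - size if d > size // 2 else d


def torus_distance(p, q, width, height):
    """Manhattan distance between cells p and q on a wrapped (torus) grid."""
    dx = torus_delta(p[0], q[0], width)
    dy = torus_delta(p[1], q[1], height)
    return abs(dx) + abs(dy)
```

    On a 10x10 grid, `torus_distance((0, 0), (9, 0), 10, 10)` is 1 rather than 9, because an agent can step across the edge to reach the neighbouring column.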

    Comparing Rule-Based and Deep Learning Models for Patient Phenotyping

    Sebastian Gehrmann, Franck Dernoncourt, Yeran Li, Eric T. Carlson, Joy T. Wu, Jonathan Welt, John Foote Jr., Edward T. Moseley, David W. Grant, Patrick D. Tyler, Leo Anthony Celi
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Objective: We investigate whether deep learning techniques for natural
    language processing (NLP) can be used efficiently for patient phenotyping.
    Patient phenotyping is a classification task for determining whether a patient
    has a medical condition, and is a crucial part of secondary analysis of
    healthcare data. We assess the performance of deep learning algorithms and
    compare them with classical NLP approaches.

    Materials and Methods: We compare convolutional neural networks (CNNs),
    n-gram models, and approaches based on cTAKES that extract pre-defined medical
    concepts from clinical notes and use them to predict patient phenotypes. The
    performance is tested on 10 different phenotyping tasks using 1,610 discharge
    summaries extracted from the MIMIC-III database.

    Results: CNNs outperform other phenotyping algorithms in all 10 tasks. The
    average F1-score of our model is 76 (PPV of 83, and sensitivity of 71) with our
    model having an F1-score up to 37 points higher than alternative approaches. We
    additionally assess the interpretability of our model by presenting a method
    that extracts the most salient phrases for a particular prediction.

    Conclusion: We show that NLP methods based on deep learning improve the
    performance of patient phenotyping. Our CNN-based algorithm automatically
    learns the phrases associated with each patient phenotype. As such, it reduces
    the annotation complexity for clinical domain experts, who are normally
    required to develop task-specific annotation rules and identify relevant
    phrases. Our method performs well in terms of both performance and
    interpretability, which indicates that deep learning is an effective approach
    to patient phenotyping based on clinicians’ notes.
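    As a quick consistency check on the reported figures, the F1-score is the harmonic mean of PPV (precision) and sensitivity (recall). The sketch below applies that standard formula to the averaged values quoted in the abstract. Note that the abstract reports an average over 10 tasks, and an average of per-task F1-scores need not equal the F1 of the averaged PPV and sensitivity, so only approximate agreement should be expected.

```python
def f1_score(ppv, sensitivity):
    """Harmonic mean of PPV (precision) and sensitivity (recall)."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Averaged values quoted in the abstract, on a 0-100 scale.
print(round(f1_score(83, 71), 1))  # prints 76.5
```

    The result, about 76.5, is in line with the reported average F1-score of 76.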


    Computer Vision and Pattern Recognition

    Coherent Online Video Style Transfer

    Dongdong Chen, Jing Liao, Yuan Lu, Nenghai Yu, Gang Hua
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Training a feed-forward network for fast neural style transfer of images is
    proven to be successful. However, the naive extension to process video frame by
    frame is prone to producing flickering results. We propose the first end-to-end
    network for online video style transfer, which generates temporally coherent
    stylized video sequences in near real-time. The two key ideas are an efficient
    network that incorporates short-term coherence, and the propagation of short-term
    coherence to the long term, which ensures consistency over a larger period of
    time. Our network can incorporate different image stylization networks. We show
    that the proposed method clearly outperforms the per-frame baseline both
    qualitatively and quantitatively. Moreover, it can achieve visually comparable
    coherence to optimization-based video style transfer, but is three orders of
    magnitude faster in runtime.

    StyleBank: An Explicit Representation for Neural Image Style Transfer

    Dongdong Chen, Yuan Lu, Jing Liao, Nenghai Yu, Gang Hua
    Comments: Accepted by CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose StyleBank, which is composed of multiple convolution filter banks
    and each filter bank explicitly represents one style, for neural image style
    transfer. To transfer an image to a specific style, the corresponding filter
    bank is operated on top of the intermediate feature embedding produced by a
    single auto-encoder. The StyleBank and the auto-encoder are jointly learnt,
    where the learning is conducted in such a way that the auto-encoder does not
    encode any style information thanks to the flexibility introduced by the
    explicit filter bank representation. It also enables us to conduct incremental
    learning to add a new image style by learning a new filter bank while holding
    the auto-encoder fixed. The explicit style representation along with the
    flexible network design enables us to fuse styles at not only the image level,
    but also the region level. Our method is the first style transfer network that
    links back to traditional texton mapping methods, and hence provides new
    understanding on neural style transfer. Our method is easy to train, runs in
    real-time, and produces results that are qualitatively better or at least
    comparable to existing methods.

    Deep Poincare Map For Robust Medical Image Segmentation

    Yuanhan Mo, Fangde Liu, Jingqing Zhang, Guang Yang, Taigang He, Yike Guo
    Comments: 8 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Precise segmentation is a prerequisite for an accurate quantification of the
    imaged objects. It is a very challenging task in many medical imaging
    applications due to relatively poor image quality and data scarcity. In this
    work, we present an innovative segmentation paradigm, named Deep Poincare Map
    (DPM), by coupling the dynamical system theory with a novel deep learning based
    approach. Firstly, we model the image segmentation process as a dynamical
    system, in which a limit cycle models the boundary of the region of interest
    (ROI). Secondly, instead of segmenting the ROI directly, a convolutional neural
    network is employed to predict the vector field of the dynamical system.
    Finally, the boundary of the ROI is identified using the Poincare map and the
    flow integration. We demonstrate that our segmentation model can be built using
    a very limited amount of training data. By cross-validation, we can achieve a
    mean Dice score of 94% compared to the manual delineation (ground truth) of the
    left ventricle ROI defined by clinical experts on a cardiac MRI dataset.
    Compared with other state-of-the-art methods, we can conclude that the proposed
    DPM method is adaptive, accurate and robust. It is straightforward to apply
    this method for other medical imaging applications.
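    The Dice score quoted in this abstract measures the overlap between a predicted mask and the ground truth. The following is a minimal sketch of the standard definition on flattened binary masks; it is not the authors' evaluation code, and returning 1.0 for two empty masks is a convention assumed here.

```python
def dice_score(pred, truth):
    """Dice coefficient of two equal-length binary masks (flattened)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * overlap / total
```

    For example, a prediction of `[1, 1, 0, 0]` against a ground truth of `[1, 0, 0, 0]` shares one pixel out of three marked in total, giving 2 * 1 / 3, or about 0.67.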

    Introduction To The Monogenic Signal

    Christopher P. Bridge
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The monogenic signal is an image analysis methodology that was introduced by
    Felsberg and Sommer in 2001 and has been employed for a variety of purposes in
    image processing and computer vision research. In particular, it has been found
    to be useful in the analysis of ultrasound imagery in several research
    scenarios, mostly in work done within the BioMedIA lab at Oxford. However, the
    literature on the monogenic signal can be difficult to penetrate due to the
    lack of a single resource to explain the various principles from basics. The
    purpose of this document is therefore to introduce the principles, purpose,
    applications, and limitations of the methodology. It assumes some background
    knowledge from the fields of image and signal processing, in particular a good
    knowledge of Fourier transforms as applied to signals and images. We will not
    attempt to provide a thorough mathematical description or derivation of the
    monogenic signal, but rather focus on developing an intuition for understanding
    and using the methodology and refer the reader elsewhere for a more
    mathematical treatment.

    Transfer learning for music classification and regression tasks

    Keunwoo Choi, György Fazekas, Mark Sandler, Kyunghyun Cho
    Comments: 16 pages, single column, NOT iclr submission
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD)

    In this paper, we present a transfer learning approach for music
    classification and regression tasks. We propose to use a pretrained convnet
    feature, a concatenated feature vector using activations of feature maps of
    multiple layers in a trained convolutional network. We show how this
    convnet feature can serve as a general-purpose music representation. In the
    experiment, a convnet is trained for music tagging and then transferred for
    many music-related classification and regression tasks as well as an
    audio-related classification task. In experiments, the convnet feature
    outperforms the baseline MFCC feature in all tasks and many reported approaches
    of aggregating MFCCs and low- and high-level music features.

    A Study on the Extraction and Analysis of a Large Set of Eye Movement Features during Reading

    Ioannis Rigas, Lee Friedman, Oleg Komogortsev
    Comments: 38 pages, 10 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Quantitative Methods (q-bio.QM)

    This work presents a study on the extraction and analysis of a set of 101
    categories of eye movement features from three types of eye movement events:
    fixations, saccades, and post-saccadic oscillations. The eye movements were
    recorded during a reading task. For the categories of features with multiple
    instances in a recording we extract corresponding feature subtypes by
    calculating descriptive statistics on the distributions of these instances. A
    unified framework of detailed descriptions and mathematical formulas is
    provided for the extraction of the feature set. The analysis of feature values
    is performed using a large database of eye movement recordings from a normative
    population of 298 subjects. We demonstrate the central tendency and overall
    variability of feature values over the experimental population, and more
    importantly, we quantify the test-retest reliability (repeatability) of each
    separate feature. The described methods and analysis can provide valuable tools
    in fields exploring eye movements, such as in behavioral studies, attention
    and cognition research, medical research, biometric recognition, and
    human-computer interaction.

    Reweighted Infrared Patch-Tensor Model With Both Non-Local and Local Priors for Single-Frame Small Target Detection

    Yimian Dai, Yiquan Wu
    Comments: Submitted to IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 16 pages, 16 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Many state-of-the-art methods have been proposed for infrared small target
    detection. They work well on the images with homogeneous backgrounds and
    high-contrast targets. However, when facing highly heterogeneous backgrounds,
    they do not perform well, mainly due to: 1) the existence of strong
    edges and other interfering components, and 2) not fully utilizing the priors.
    Inspired by this, we propose a novel method to exploit both local and non-local
    priors simultaneously. Firstly, we employ a new infrared patch-tensor (IPT)
    model to represent the image and preserve its spatial correlations. Exploiting
    the target sparse prior and background non-local self-correlation prior, the
    target-background separation is modeled as a robust low-rank tensor recovery
    problem. Moreover, with the help of the structure tensor and reweighted idea,
    we design an entry-wise local-structure-adaptive and sparsity enhancing weight
    to replace the globally constant weighting parameter. The decomposition could
    be achieved via the element-wise reweighted higher-order robust principal
    component analysis with an additional convergence condition according to the
    practical situation of target detection. Extensive experiments demonstrate that
    our model outperforms other state-of-the-art methods, in particular for images
    with very dim targets and heavy clutter.

    Multi-Path Region-Based Convolutional Neural Network for Accurate Detection of Unconstrained "Hard Faces"

    Yuguang Liu, Martin D. Levine
    Comments: 11 pages, 7 figures, to be presented at CRV 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Large-scale variations still pose a challenge in unconstrained face
    detection. To the best of our knowledge, no current face detection algorithm
    can detect a face as large as 800 x 800 pixels while simultaneously detecting
    another one as small as 8 x 8 pixels within a single image with equally high
    accuracy. We propose a two-stage cascaded face detection framework, Multi-Path
    Region-based Convolutional Neural Network (MP-RCNN), that seamlessly combines a
    deep neural network with a classic learning strategy, to tackle this challenge.
    The first stage is a Multi-Path Region Proposal Network (MP-RPN) that proposes
    faces at three different scales. It simultaneously utilizes three parallel
    outputs of the convolutional feature maps to predict multi-scale candidate face
    regions. The “atrous” convolution trick (convolution with up-sampled filters)
    and a newly proposed sampling layer for “hard” examples are embedded in MP-RPN
    to further boost its performance. The second stage is a Boosted Forests
    classifier, which utilizes deep facial features pooled from inside the
    candidate face regions as well as deep contextual features pooled from a larger
    region surrounding the candidate face regions. This step is included to further
    remove hard negative samples. Experiments show that this approach achieves
    state-of-the-art face detection performance on the WIDER FACE dataset “hard”
    partition, outperforming the former best result by 9.6% for the Average
    Precision.
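    The abstract describes "atrous" convolution as convolution with up-sampled filters. A one-dimensional sketch can make that equivalence concrete: sampling the input at a stride of `rate` inside the kernel gives the same result as inserting zeros between the kernel taps. This is a generic illustration with hypothetical function names, not the MP-RPN implementation, and it uses the cross-correlation form common in CNNs with no padding.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """Valid-mode 1D atrous (dilated) convolution, CNN-style (no kernel flip)."""
    span = (len(kernel) - 1) * rate + 1  # input extent covered by the kernel
    return [
        sum(k * signal[start + i * rate] for i, k in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def upsample_kernel(kernel, rate):
    """Insert rate - 1 zeros between neighbouring kernel taps."""
    out = []
    for i, k in enumerate(kernel):
        out.append(k)
        if i < len(kernel) - 1:
            out.extend([0] * (rate - 1))
    return out
```

    With `signal = [1, 2, 3, 4, 5]` and `kernel = [1, 1, 1]`, a rate of 2 reads positions 0, 2, and 4, and `atrous_conv1d(signal, kernel, 2)` matches `atrous_conv1d(signal, upsample_kernel(kernel, 2), 1)`. The dilated form widens the receptive field without adding weights.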

    Active Convolution: Learning the Shape of Convolution for Image Classification

    Yunho Jeon, Junmo Kim
    Comments: Accepted to appear in CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In recent years, deep learning has achieved great success in many computer
    vision applications. Convolutional neural networks (CNNs) have lately emerged
    as a major approach to image classification. Most research on CNNs thus far has
    focused on developing architectures such as the Inception and residual
    networks. The convolution layer is the core of the CNN, but few studies have
    addressed the convolution unit itself. In this paper, we introduce a
    convolution unit called the active convolution unit (ACU). The new convolution
    has no fixed shape, so we can define any form of convolution. Its
    shape can be learned through backpropagation during training. Our proposed unit
    has a few advantages. First, the ACU is a generalization of convolution; it can
    define not only all conventional convolutions, but also convolutions with
    fractional pixel coordinates. We can freely change the shape of the
    convolution, which provides greater freedom to form CNN structures. Second, the
    shape of the convolution is learned while training and there is no need to tune
    it by hand. Third, the ACU can learn better than a conventional unit, where we
    obtained the improvement simply by changing the conventional convolution to an
    ACU. We tested our proposed method on plain and residual networks, and the
    results showed significant improvement using our method on various datasets and
    architectures in comparison with the baseline.
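    A convolution with fractional pixel coordinates needs some way to read the input between grid points. The abstract does not say how this is done, so the sketch below assumes bilinear interpolation, one common and differentiable choice. The function names, the offset format, and the requirement that all sample positions lie inside the image are assumptions made for illustration.

```python
import math


def bilinear_sample(image, y, x):
    """Read a 2D list-of-lists image at a fractional (y, x) inside its bounds."""
    h, w = len(image), len(image[0])
    y0, x0 = math.floor(y), math.floor(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)  # clamp at the border
    wy, wx = y - y0, x - x0
    top = (1 - wx) * image[y0][x0] + wx * image[y0][x1]
    bottom = (1 - wx) * image[y1][x0] + wx * image[y1][x1]
    return (1 - wy) * top + wy * bottom


def active_response(image, center, offsets, weights):
    """Weighted sum of samples taken at fractional offsets around center."""
    cy, cx = center
    return sum(
        w * bilinear_sample(image, cy + dy, cx + dx)
        for (dy, dx), w in zip(offsets, weights)
    )
```

    Because the interpolated value varies smoothly with the offsets, gradients can flow to the sample positions as well as to the weights, which is what would allow a shape to be learned by backpropagation under this assumption.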

    Efficient Processing of Deep Neural Networks: A Tutorial and Survey

    Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, Joel Emer
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep neural networks (DNNs) are currently widely used for many artificial
    intelligence (AI) applications including computer vision, speech recognition,
    and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it
    comes at the cost of high computational complexity. Accordingly, techniques
    that enable efficient processing of deep neural networks to improve
    energy-efficiency and throughput without sacrificing performance accuracy or
    increasing hardware cost are critical to enabling the wide deployment of DNNs
    in AI systems.

    This article aims to provide a comprehensive tutorial and survey about the
    recent advances towards the goal of enabling efficient processing of DNNs.
    Specifically, it will provide an overview of DNNs, discuss various platforms
    and architectures that support DNNs, and highlight key trends in recent
    efficient processing techniques that reduce the computation cost of DNNs either
    solely via hardware design changes or via joint hardware design and network
    algorithm changes. It will also summarize various development resources that
    can enable researchers and practitioners to quickly get started on DNN design,
    and highlight important benchmarking metrics and design considerations that
    should be used for evaluating the rapidly growing number of DNN hardware
    designs, optionally including algorithmic co-design, being proposed in academia
    and industry.

    The reader will take away the following concepts from this article:
    understand the key design considerations for DNNs; be able to evaluate
    different DNN hardware implementations with benchmarks and comparison metrics;
    understand trade-offs between various architectures and platforms; be able to
    evaluate the utility of various DNN design techniques for efficient processing;
    and understand recent implementation trends and opportunities.

    Trespassing the Boundaries: Labeling Temporal Bounds for Object Interactions in Egocentric Video

    Davide Moltisanti, Michael Wray, Walterio Mayol-Cuevas, Dima Damen
    Comments: 9 pages, 11 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Manual annotations of temporal bounds for object interactions (i.e. start and
    end times) are typical training input to recognition, localization and
    detection algorithms. For three publicly available egocentric datasets, we
    uncover inconsistencies in ground truth temporal bounds within and across
    annotators and datasets. We systematically assess the robustness of
    state-of-the-art approaches to changes in labeled temporal bounds, for object
    interaction recognition. As boundaries are trespassed, a drop of up to 10% is
    observed for both Improved Dense Trajectories and Two-Stream Convolutional
    Neural Network. We demonstrate that such disagreement stems from a limited
    understanding of the distinct phases of an action, and propose annotating based
    on the Rubicon Boundaries, inspired by a similarly named cognitive model, for
    consistent temporal bounds of object interactions. Evaluated on a public
    dataset, we report a 4% increase in overall accuracy, and an increase in
    accuracy for 55% of classes when Rubicon Boundaries are used for temporal
    annotations.

    Simultaneous Perception and Path Generation Using Fully Convolutional Neural Networks

    Luca Caltagirone, Mauro Bellone, Lennart Svensson, Mattias Wahde
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this work, a novel learning-based approach has been developed to generate
    driving paths by integrating LIDAR point clouds, GPS-IMU information, and
    Google driving directions. The system is based on a fully convolutional neural
    network that jointly learns to carry out perception and path generation from
    real-world driving sequences and that is trained using automatically generated
    training examples. Several combinations of input data were tested in order to
    assess the performance gain provided by specific information modalities. The
    fully convolutional neural network trained using all the available sensors
    together with driving directions achieved the best MaxF score of 88.13% when
    considering a region of interest of 60×60 meters. By considering a smaller
    region of interest, the agreement between predicted paths and ground-truth
    increased to 92.60%. The positive results obtained in this work indicate that
    the proposed system may help fill the gap between low-level scene parsing and
    behavior-reflex approaches by generating outputs that are close to vehicle
    control and at the same time human-interpretable.

    Mastering Sketching: Adversarial Augmentation for Structured Prediction

    Edgar Simo-Serra, Satoshi Iizuka, Hiroshi Ishikawa
    Comments: 12 pages, 14 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present an integral framework for training sketch simplification networks
    that convert challenging rough sketches into clean line drawings. Our approach
    augments a simplification network with a discriminator network, training both
    networks jointly so that the discriminator network discerns whether a line
    drawing is real training data or the output of the simplification network,
    which in turn tries to fool it. This approach has two major advantages. First,
    because the discriminator network learns the structure in line drawings, it
    encourages the output sketches of the simplification network to be more similar
    in appearance to the training sketches. Second, we can also train the
    simplification network with additional unsupervised data, using the
    discriminator network as a substitute teacher. Thus, by adding only rough
    sketches without simplified line drawings, or only line drawings without the
    original rough sketches, we can improve the quality of the sketch
    simplification. We show how our framework can be used to train models that
    significantly outperform the state of the art in the sketch simplification
    task, despite using the same architecture for inference. We additionally
    present an approach to optimize for a single image, which improves accuracy at
    the cost of additional computation time. Finally, we show that, using the same
    framework, it is possible to train the network to perform the inverse problem,
    i.e., convert simple line sketches into pencil drawings, which is not possible
    using the standard mean squared error loss. We validate our framework with two
    user tests, where our approach is preferred to the state of the art in sketch
    simplification 92.3% of the time and obtains 1.2 more points on a scale of 1 to
    5.

    Scaling the Scattering Transform: Deep Hybrid Networks

    Edouard Oyallon (DI-ENS), Eugene Belilovsky (CVN, GALEN), Sergey Zagoruyko (ENPC)
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We use the scattering network as a generic and fixed initialization of the
    first layers of a supervised hybrid deep network. We show that early layers do
    not necessarily need to be learned, providing the best results to date with
    pre-defined representations while being competitive with deep CNNs. Using a
    shallow cascade of 1×1 convolutions, which encodes scattering coefficients that
    correspond to spatial windows of very small sizes, makes it possible to obtain AlexNet
    accuracy on ImageNet ILSVRC2012. We demonstrate that this local encoding
    explicitly learns invariance w.r.t. rotations. Combining scattering networks
    with a modern ResNet, we achieve a single-crop top-5 error of 11.4% on ImageNet
    ILSVRC2012, comparable to the ResNet-18 architecture, while utilizing only 10
    layers. We also find that hybrid architectures can yield excellent performance
    in the small sample regime, exceeding their end-to-end counterparts, through
    their ability to incorporate geometrical priors. We demonstrate this on subsets
    of the CIFAR-10 dataset and by setting a new state-of-the-art on the STL-10
    dataset.

    MIHash: Online Hashing with Mutual Information

    Fatih Cakir, Kun He, Sarah Adel Bargal, Stan Sclaroff
    Comments: 16 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Learning-based adaptive hashing methods are widely used for nearest neighbor
    retrieval. Recently, online hashing methods have demonstrated a good
    performance-complexity tradeoff by learning hash functions from streaming data.
    In this paper, we aim to advance the state-of-the-art for online hashing. We
    first address a key challenge that has often been ignored: the binary codes for
    indexed data must be recomputed to keep pace with updates to the hash
    functions. We propose an efficient quality measure for hash functions, based on
    an information-theoretic quantity, mutual information, and use it successfully
    as a criterion to eliminate unnecessary hash table updates. Next, we show that
    mutual information can also be used as an objective in learning hash functions,
    using gradient-based optimization. Experiments on image retrieval benchmarks
    (including a 2.5M image dataset) confirm the effectiveness of our formulation,
    both in reducing hash table recomputations and in learning high-quality hash
    functions.
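    Mutual information, the quantity this abstract builds on, can be estimated from paired discrete observations by counting. The sketch below is a generic plug-in estimator in bits. Which two variables the paper pairs is not stated in the abstract, so the usage note's pairing of a Hamming distance with a neighbor label is an assumption made for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi
```

    If `xs` holds the Hamming distances from a query to four items and `ys` marks which items are true neighbors, then `xs = [0, 0, 1, 1]` with `ys = [1, 1, 0, 0]` gives 1 bit (the distance fully determines the label), while `ys = [0, 1, 0, 1]` gives 0 bits (the distance says nothing about the label).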

    A Visual Measure of Changes to Weighted Self-Organizing Map Patterns

    Younjin Chung, Joachim Gudmundsson, Masahiro Takatsuka
    Comments: 8 pages, 3 figures, conference, llncs style
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Estimating output changes by input changes is the main task in causal
    analysis. In previous work, input and output Self-Organizing Maps (SOMs) were
    associated for causal analysis of multivariate and nonlinear data. Based on the
    association, a weight distribution of the output conditional on a given input
    was obtained over the output map space. Such a weighted SOM pattern of the
    output changes when the input changes. In order to analyze the change, it is
    important to measure the difference of the patterns. Many methods have been
    proposed for the dissimilarity measure of patterns. However, it remains a major
    challenge when attempting to measure how the patterns change. In this paper, we
    propose a visualization approach that simplifies the comparison of the
    difference in terms of the pattern property. Using this approach, the change
    can be analyzed by integrating colors and star glyph shapes representing the
    property dissimilarity. Ecological data is used to demonstrate the usefulness
    of our approach and the experimental results show that our approach provides
    the change information effectively.

    Exploiting Color Name Space for Salient Object Detection

    Jing Lou, Huan Wang, Longtao Chen, Qingyuan Xia, Wei Zhu, Mingwu Ren
    Comments: 13 pages, 10 figures, 2 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we investigate the contribution of color names to
    salient object detection. Each input image is first converted to the color name
    space, which consists of 11 probabilistic channels. By exploring the
    topological structure relationship between the figure and the ground, we obtain
    a saliency map through a linear combination of a set of sequential attention
    maps. To overcome the limitation of only exploiting the surroundedness cue, two
    global cues with respect to color names are invoked for guiding the computation
    of another weighted saliency map. Finally, we integrate the two saliency maps
    into a unified framework to infer the saliency result. In addition, an improved
    post-processing procedure is introduced to effectively suppress the background
    while uniformly highlighting the salient objects. Experimental results show that
    the proposed model produces more accurate saliency maps and performs well
    against 23 saliency models in terms of three evaluation metrics on three public
    datasets.

    Transductive Zero-Shot Learning with Adaptive Structural Embedding

    Yunlong Yu, Zhong Ji, Jichang Guo, Yanwei Pang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Zero-shot learning (ZSL) endows the computer vision system with the
    inferential capability to recognize instances of a new category that has never
    been seen before. Two fundamental challenges in it are visual-semantic embedding and
    domain adaptation in cross-modality learning and unseen class prediction steps,
    respectively. To address both challenges, this paper presents two corresponding
    methods named Adaptive STructural Embedding (ASTE) and Self-PAced Selective
    Strategy (SPASS), respectively. Specifically, ASTE formulates the
    visual-semantic interactions in a latent structural SVM framework to adaptively
    adjust the slack variables to reflect the differing reliability of training
    instances. In this way, the reliable instances receive small
    penalties, whereas the less reliable instances receive more severe
    penalties. Thus, it ensures a more discriminative embedding. On the other
    hand, SPASS offers a framework to alleviate the domain shift problem in ZSL,
    which exploits the unseen data in an easy-to-hard fashion. In particular, SPASS
    borrows the idea from self-paced learning by iteratively selecting the unseen
    instances from reliable to less reliable to gradually adapt the knowledge from
    the seen domain to the unseen domain. Subsequently, by combining SPASS and
    ASTE, we present a self-paced Transductive ASTE (TASTE) method to progressively
    reinforce the classification capacity. Extensive experiments on three benchmark
    datasets (i.e., AwA, CUB, and aPY) demonstrate the superiority of ASTE and
    TASTE. Furthermore, we also propose a fast training (FT) strategy to improve
    the efficiency of most existing ZSL methods. The FT strategy is surprisingly
    simple and general: it can speed up the training of most
    existing methods by 4 to 300 times while preserving their performance.
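
    The easy-to-hard selection idea can be illustrated with a toy pseudo-labeling
    loop. This is only a sketch under strong assumptions and is not SPASS itself:
    features are single floats, each class is a single prototype, "reliability" is
    taken to be distance to the nearest prototype, and all names are hypothetical.

```python
def self_paced_labeling(prototypes, unlabeled, per_round):
    """Label unseen samples from most to least reliable.

    prototypes: dict mapping class name -> initial prototype (float).
    unlabeled:  list of distinct float features from the unseen domain.
    per_round:  how many of the most reliable samples to accept each round.
    """
    protos = dict(prototypes)
    members = {c: [] for c in protos}  # accepted samples per class
    remaining = list(unlabeled)
    labels = {}
    while remaining:
        # Score every remaining sample by distance to its nearest prototype.
        scored = []
        for x in remaining:
            c = min(protos, key=lambda k: abs(x - protos[k]))
            scored.append((abs(x - protos[c]), x, c))
        scored.sort()
        # Accept only the most reliable samples in this round.
        for _, x, c in scored[:per_round]:
            labels[x] = c
            members[c].append(x)
            remaining.remove(x)
        # Adapt each prototype toward the samples accepted so far.
        for c, xs in members.items():
            if xs:
                protos[c] = (prototypes[c] + sum(xs)) / (1 + len(xs))
    return labels
```

    Because the prototypes move after every round, harder samples are judged
    against a model that has already adapted to the easier ones.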

    Transductive Zero-Shot Learning with a Self-training dictionary approach

    Yunlong Yu, Zhong Ji, Xi Li, Jichang Guo, Zhongfei Zhang, Haibin Ling, Fei Wu
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    As an important and challenging problem in computer vision, zero-shot
    learning (ZSL) aims at automatically recognizing the instances from unseen
    object classes without training data. To address this problem, ZSL is usually
    carried out in the following two aspects: 1) capturing the domain distribution
    connections between seen classes data and unseen classes data; and 2) modeling
    the semantic interactions between the image feature space and the label
    embedding space. Motivated by these observations, we propose a bidirectional
    mapping-based semantic relationship modeling scheme that seeks cross-modal
    knowledge transfer by simultaneously projecting the image features and label
    embeddings into a common latent space. Namely, we have a bidirectional
    connection relationship that takes place from the image feature space to the
    latent space as well as from the label embedding space to the latent space. To
    deal with the domain shift problem, we further present a transductive learning
    approach that formulates the class prediction problem in an iterative refining
    process, where the object classification capacity is progressively reinforced
    through bootstrapping-based model updating over highly reliable instances.
    Experimental results on three benchmark datasets (AwA, CUB and SUN) demonstrate
    the effectiveness of the proposed approach against the state-of-the-art
    approaches.

    Multi-View Deep Learning for Consistent Semantic Mapping with RGB-D Cameras

    Lingni Ma, Jörg Stückler, Christian Kerl, Daniel Cremers
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Visual scene understanding is an important capability that enables robots to
    purposefully act in their environment. In this paper, we propose a novel
    approach to object-class segmentation from multiple RGB-D views using deep
    learning. We train a deep neural network to predict object-class semantics that
    is consistent from several viewpoints in a semi-supervised way. At test time,
    the semantics predictions of our network can be fused more consistently in
    semantic keyframe maps than predictions of a network trained on individual
    views. We base our network architecture on a recent single-view deep learning
    approach to RGB and depth fusion for semantic object-class segmentation and
    enhance it with multi-scale loss minimization. We obtain the camera trajectory
    using RGB-D SLAM and warp the predictions of RGB-D images into ground-truth
    annotated frames in order to enforce multi-view consistency during training. At
    test time, predictions from multiple views are fused into keyframes. We propose
    and analyze several methods for enforcing multi-view consistency during
    training and testing. We evaluate the benefit of multi-view consistency
    training and demonstrate that pooling of deep features and fusion over multiple
    views outperforms single-view baselines on the NYUDv2 benchmark for semantic
    segmentation. Our end-to-end trained network achieves state-of-the-art
    performance on the NYUDv2 dataset in single-view segmentation as well as
    multi-view semantic fusion.

    Person Re-Identification by Camera Correlation Aware Feature Augmentation

    Ying-Cong Chen, Xiatian Zhu, Wei-Shi Zheng, Jian-Huang Lai
    Comments: To Appear in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The challenge of person re-identification (re-id) is to match individual
    images of the same person captured by different non-overlapping camera views
    against significant and unknown cross-view feature distortion. While a large
    number of distance metric/subspace learning models have been developed for
    re-id, the cross-view transformations they learned are view-generic and thus
    potentially less effective in quantifying the feature distortion inherent to
    each camera view. Learning view-specific feature transformations for re-id
    (i.e., view-specific re-id), an under-studied approach, becomes an alternative
    resort for this problem. In this work, we formulate a novel view-specific
    person re-identification framework from the feature augmentation point of view,
    called Camera coRrelation Aware Feature augmenTation (CRAFT). Specifically,
    CRAFT performs cross-view adaptation by automatically measuring camera
    correlation from cross-view visual data distribution and adaptively conducting
    feature augmentation to transform the original features into a new adaptive
    space. Through our augmentation framework, view-generic learning algorithms can
    be readily generalized to learn and optimize view-specific sub-models whilst
    simultaneously modelling view-generic discrimination information. Therefore,
    our framework not only inherits the strength of view-generic model learning but
    also provides an effective way to take into account view-specific
    characteristics. Our CRAFT framework can be extended to jointly learn
    view-specific feature transformations for person re-id across a large network
    with more than two cameras, a largely under-investigated but realistic re-id
    setting. Additionally, we present a domain-generic deep person appearance
    representation designed specifically to be view-invariant, which
    facilitates cross-view adaptation by CRAFT.

    Learned multi-patch similarity

    Wilfried Hartmann, Silvano Galliani, Michal Havlena, Konrad Schindler, Luc Van Gool
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Estimating a depth map from multiple views of a scene is a fundamental task
    in computer vision. As soon as more than two viewpoints are available, one
    faces the very basic question of how to measure similarity across more than two
    image patches. Surprisingly, no direct solution exists; instead, it is common to
    fall back to more or less robust averaging of two-view similarities. Encouraged by
    the success of machine learning, and in particular convolutional neural
    networks, we propose to learn a matching function which directly maps multiple
    image patches to a scalar similarity score. Experiments on several multi-view
    datasets demonstrate that this approach has advantages over methods based on
    pairwise patch similarity.

    SCAN: Structure Correcting Adversarial Network for Chest X-rays Organ Segmentation

    Wei Dai, Joseph Doyle, Xiaodan Liang, Hao Zhang, Nanqing Dong, Yuan Li, Eric P. Xing
    Comments: 10 pages, 7 figures, submitted to ICCV 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Chest X-ray (CXR) is one of the most commonly prescribed medical imaging
    procedures, often with over 2-10x more scans than other imaging modalities such
    as MRI, CT scan, and PET scans. These voluminous CXR scans place significant
    workloads on radiologists and medical practitioners. Organ segmentation is a
    crucial step to obtain effective computer-aided detection on CXR. In this work,
    we propose Structure Correcting Adversarial Network (SCAN) to segment lung
    fields and the heart in CXR images. SCAN incorporates a critic network to
    impose on the convolutional segmentation network the structural regularities
    emerging from human physiology. During training, the critic network learns to
    discriminate between the ground truth organ annotations and the masks
    synthesized by the segmentation network. Through this adversarial process the
    critic network learns the higher order structures and guides the segmentation
    model to achieve realistic segmentation outcomes. Extensive experiments show
    that our method produces highly accurate and natural segmentation. Using only
    very limited training data available, our model reaches human-level performance
    without relying on any existing trained model or dataset. Our method also
    generalizes well to CXR images from a different patient population and disease
    profiles, surpassing the current state-of-the-art.

    Open Vocabulary Scene Parsing

    Hang Zhao, Xavier Puig, Bolei Zhou, Sanja Fidler, Antonio Torralba
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Recognizing arbitrary objects in the wild has been a challenging problem due
    to the limitations of existing classification models and datasets. In this
    paper, we propose a new task that aims at parsing scenes with a large and open
    vocabulary, and several evaluation metrics are explored for this problem. Our
    proposed approach to this problem is a joint image pixel and word concept
    embeddings framework, where word concepts are connected by semantic relations.
    We validate the open vocabulary prediction ability of our framework on the ADE20K
    dataset, which covers a wide variety of scenes and objects. We further explore
    the trained joint embedding space to show its interpretability.

    Structured Learning of Tree Potentials in CRF for Image Segmentation

    Fayao Liu, Guosheng Lin, Ruizhi Qiao, Chunhua Shen
    Comments: 10 pages. Appearing in IEEE Transactions on Neural Networks and Learning Systems
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a new approach to image segmentation, which exploits the
    advantages of both conditional random fields (CRFs) and decision trees. In the
    literature, the potential functions of CRFs are mostly defined as a linear
    combination of some pre-defined parametric models, and then methods like
    structured support vector machines (SSVMs) are applied to learn those linear
    coefficients. We instead formulate the unary and pairwise potentials as
    nonparametric forests—ensembles of decision trees, and learn the ensemble
    parameters and the trees in a unified optimization problem within the
    large-margin framework. In this fashion, we easily achieve nonlinear learning
    of potential functions on both unary and pairwise terms in CRFs. Moreover, we
    learn class-wise decision trees for each object that appears in the image. Due
    to the rich structure and flexibility of decision trees, our approach is
    powerful in modelling complex data likelihoods and label relationships. The
    resulting optimization problem is very challenging because it can have
    exponentially many variables and constraints. We show that this challenging
    optimization can be efficiently solved by combining modified column
    generation and cutting-plane techniques. Experimental results on both binary
    (Graz-02, Weizmann horse, Oxford flower) and multi-class (MSRC-21, PASCAL VOC
    2012) segmentation datasets demonstrate the power of the learned nonlinear
    nonparametric potentials.

    Sketch-based Face Editing in Video Using Identity Deformation Transfer

    Long Zhao, Fangda Han, Mubbasir Kapadia, Vladimir Pavlovic, Dimitris Metaxas
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We address the problem of using a hand-drawn sketch to edit facial identity,
    such as enlarging the shape or modifying the position of the eyes or mouth,
    throughout a video. This task is formulated as a 3D face model reconstruction and
    deformation problem. We first introduce a two-stage real-time 3D face model
    fitting schema to recover facial identity and expressions from the video. We
    recognize the user’s editing intention from the input sketch as a set of facial
    modifications. A novel identity deformation algorithm is then proposed to
    transfer these deformations from 2D space to 3D facial identity directly, while
    preserving the facial expressions. Finally, these changes are propagated to the
    whole video with the modified identity. Experimental results demonstrate that
    our method can effectively edit facial identity in video based on the input
    sketch with high consistency and fidelity.

    Count-ception: Counting by Fully Convolutional Redundant Counting

    Joseph Paul Cohen, Henry Z. Lo, Yoshua Bengio
    Comments: Under Review
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    Counting objects in digital images is a process that should be replaced by
    machines. This tedious task is time consuming and prone to errors due to
    fatigue of human annotators. The goal is to have a system that takes as input
    an image and returns a count of the objects inside and justification for the
    prediction in the form of object localization. We repose a problem, originally
    posed by Lempitsky and Zisserman, to instead predict a count map which contains
    redundant counts based on the receptive field of a smaller regression network.
    The regression network predicts a count of the objects that exist inside this
    frame. By processing the image in a fully convolutional way, each pixel is
    accounted for several times: once per window that includes it, which equals
    the size of each window (i.e., 32×32 = 1024). To recover the true
    count, take the average over the redundant predictions. Our contribution is
    redundant counting instead of predicting a density map in order to average over
    errors. We also propose a novel deep neural network architecture adapted from
    the Inception family of networks called the Count-ception network. Together our
    approach results in a 20% gain over the state of the art method by Xie, Noble,
    and Zisserman in 2016.
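
    The redundant-counting arithmetic can be checked with a small sketch. It
    assumes point annotations and an image padded by window - 1 on every side so
    that each pixel falls inside exactly window × window windows; the ground-truth
    count map stands in here for what the regression network would predict, and
    the function names are hypothetical.

```python
def redundant_count_map(points, height, width, window):
    """Count the annotated points inside every sliding window.

    points: list of (row, col) object locations inside a height x width image.
    The map has one cell per window position of the padded image.
    """
    rows = height + window - 1
    cols = width + window - 1
    count_map = [[0] * cols for _ in range(rows)]
    for y, x in points:
        # A point at (y, x) lies inside the windows whose top-left corner,
        # in padded coordinates, is in [y, y + window) x [x, x + window).
        for i in range(y, y + window):
            for j in range(x, x + window):
                count_map[i][j] += 1
    return count_map


def recover_count(count_map, window):
    """Undo the redundancy: every object was counted window * window times."""
    total = sum(sum(row) for row in count_map)
    return total / (window * window)
```

    With three annotated points and a window of 4, each point contributes 16 to
    the map, so the recovered count is 48 / 16 = 3. With predicted rather than
    exact maps, the same division averages the errors of the redundant windows.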

    Improving the Accuracy of the CogniLearn System for Cognitive Behavior Assessment

    Amir Ghaderi, Srujana Gattupalli, Dylan Ebert, Ali Sharifara, Vassilis Athitsos, Fillia Makedon
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    HTKS is a game-like cognitive assessment method, designed for children
    between four and eight years of age. During the HTKS assessment, a child
    responds to a sequence of requests, such as “touch your head” or “touch your
    toes”. The cognitive challenge stems from the fact that the children are
    instructed to interpret these requests not literally, but by touching a
    different body part than the one stated. In prior work, we have developed the
    CogniLearn system, which captures data from subjects performing the HTKS game
    and analyzes the motion of the subjects. In this paper we propose some specific
    improvements that make the motion analysis module more accurate. As a result of
    these improvements, the accuracy in recognizing cases where subjects touch
    their toes has gone from 76.46% in our previous work to 97.19% in this paper.

    Bayesian Optimization for Refining Object Proposals

    Anthony D. Rhodes, Jordan Witte, Melanie Mitchell, Bruno Jedynak
    Comments: 8 pages, 4 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We develop a general-purpose algorithm using a Bayesian optimization
    framework for the efficient refinement of object proposals. While recent
    research has achieved substantial progress for object localization and related
    objectives in computer vision, current state-of-the-art object localization
    procedures are nevertheless encumbered by inefficiency and inaccuracy. We
    present a novel, computationally efficient method for refining inaccurate
    bounding-box proposals for a target object using Bayesian optimization.
    Offline, image features from a convolutional neural network are used to train a
    model to predict the offset distance of an object proposal from a target
    object. Online, this model is used in a Bayesian active search to improve
    inaccurate object proposals. In experiments, we compare our approach to a
    state-of-the-art bounding-box regression method for localization refinement of
    pedestrian object proposals. Our method exhibits a substantial improvement for
    the task of localization refinement over this baseline regression method.

    More is Less: A More Complicated Network with Less Inference Complexity

    Xuanyi Dong, Junshi Huang, Yi Yang, Shuicheng Yan
    Comments: This paper has been accepted by the IEEE CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we present a novel and general network structure towards
    accelerating the inference process of convolutional neural networks, which is
    more complicated in network structure yet with less inference complexity. The
    core idea is to equip each original convolutional layer with another low-cost
    collaborative layer (LCCL), and the element-wise multiplication of the ReLU
    outputs of these two parallel layers produces the layer-wise output. The
    combined layer is potentially more discriminative than the original
    convolutional layer, and its inference is faster for two reasons: 1) the zero
    cells of the LCCL feature maps will remain zero after element-wise
    multiplication, and thus it is safe to skip the calculation of the
    corresponding high-cost convolution in the original convolutional layer, 2)
    LCCL is very fast if it is implemented as a 1×1 convolution or only a single
    filter shared by all channels. Extensive experiments on the CIFAR-10, CIFAR-100
    and ILSVRC-2012 benchmarks show that our proposed network structure can
    accelerate the inference process by 32% on average with negligible performance
    drop.
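
    The skipping argument in reason 1) can be sketched in one dimension. This is
    an illustrative toy, not the paper's layer: the collaborative layer is assumed
    to be a single scalar weight acting as a 1×1 convolution, and the function
    names are hypothetical.

```python
def relu(v):
    """Rectified linear unit for a single float."""
    return v if v > 0.0 else 0.0


def conv1d_at(signal, kernel, pos):
    """One output of a valid 1D correlation, starting at index pos."""
    return sum(k * signal[pos + i] for i, k in enumerate(kernel))


def lccl_layer(signal, main_kernel, cheap_weight):
    """Gate an expensive convolution with a cheap collaborative response.

    Returns (outputs, computed), where computed is the number of positions
    at which the expensive convolution was actually evaluated.
    """
    n_out = len(signal) - len(main_kernel) + 1
    outputs, computed = [], 0
    for pos in range(n_out):
        gate = relu(cheap_weight * signal[pos])  # cheap 1x1 response
        if gate == 0.0:
            outputs.append(0.0)  # product is zero whatever the conv returns
            continue
        computed += 1
        outputs.append(relu(conv1d_at(signal, main_kernel, pos)) * gate)
    return outputs, computed
```

    The output is identical to multiplying the two dense ReLU maps, but the
    expensive kernel is evaluated only where the cheap gate is nonzero, so the
    saving grows with the sparsity of the gate.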

    AMAT: Medial Axis Transform for Natural Images

    Stavros Tsogkas, Sven Dickinson
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The medial axis transform (MAT) is a powerful shape abstraction that has been
    successfully used in shape editing, matching and retrieval. Despite its long
    history, the MAT has not found widespread use in tasks involving natural
    images, due to the lack of a generalization that accommodates color and
    texture. In this paper we introduce Appearance-MAT (AMAT), by framing the MAT
    of natural images as a weighted geometric set cover problem. We make the
    following contributions: i) we extend previous medial point detection methods
    for color images, by associating each medial point with a local scale; ii)
    inspired by the invertibility property of the binary MAT, we also associate
    each medial point with a local encoding that allows us to invert the AMAT,
    reconstructing the input image; iii) we describe a clustering scheme that takes
    advantage of the additional scale and appearance information to group
    individual points into medial branches, providing a shape decomposition of the
    underlying image regions. In our experiments, we show state-of-the-art
    performance in medial point detection on Berkeley Medial AXes (BMAX500), a new
    dataset of medial axes based on the established BSDS500 database. We also
    measure the quality of reconstructed images from the same dataset, obtained by
    inverting their computed AMAT. Our approach delivers significantly better
    reconstruction quality with respect to three baselines, using just 10% of the
    image pixels. Our code is available at this https URL
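
    Weighted set cover, the problem the AMAT is framed as, is commonly
    approximated with a greedy rule: repeatedly pick the candidate with the lowest
    cost per newly covered element. The sketch below shows that generic heuristic
    on abstract sets; the abstract does not state which solver the authors use, and
    the candidate names and costs are hypothetical stand-ins for medial disks and
    their encoding errors.

```python
def greedy_weighted_set_cover(universe, candidates):
    """Greedy approximation to weighted set cover.

    universe:   iterable of elements (for example pixel indices) to cover.
    candidates: dict mapping name -> (cost, set of elements it covers).
    Returns the list of chosen candidate names in selection order.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best_name, best_ratio = None, float("inf")
        for name, (cost, elems) in candidates.items():
            gain = len(elems & uncovered)
            # Prefer the lowest cost per newly covered element.
            if gain and cost / gain < best_ratio:
                best_name, best_ratio = name, cost / gain
        if best_name is None:
            raise ValueError("candidates cannot cover the universe")
        chosen.append(best_name)
        uncovered -= candidates[best_name][1]
    return chosen
```

    In an image setting, each candidate would correspond to a disk at some
    position and scale, which is how a scale becomes attached to every selected
    medial point.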

    Temporal Non-Volume Preserving Approach to Facial Age-Progression and Age-Invariant Face Recognition

    Chi Nhan Duong, Kha Gia Quach, Khoa Luu, T. Hoang Ngan Le, Marios Savvides
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Modeling the long-term facial aging process is extremely challenging due to
    the presence of large and non-linear variations during the face development
    stages. In order to efficiently address the problem, this work first decomposes
    the aging process into multiple short-term stages. Then, a novel generative
    probabilistic model, named Temporal Non-Volume Preserving (TNVP)
    transformation, is presented to model the facial aging process at each stage.
    Unlike Generative Adversarial Networks (GANs), which require an empirical
    balance threshold, and Restricted Boltzmann Machines (RBMs), which are
    intractable, our proposed TNVP approach guarantees a tractable density function,
    exact inference and evaluation for embedding the feature transformations
    between faces in consecutive stages. Our model shows its advantages not only in
    capturing the non-linear age related variance in each stage but also producing
    a smooth synthesis in age progression across faces. Our approach can model any
    face in the wild provided with only four basic landmark points. Moreover, the
    structure can be transformed into a deep convolutional network while keeping
    the advantages of probabilistic models with tractable log-likelihood density
    estimation. Our method is evaluated in terms of both synthesizing
    age-progressed faces and cross-age face verification, and it consistently shows
    state-of-the-art results on various face aging databases, i.e. FG-NET, MORPH,
    AginG Faces in the Wild (AGFW), and Cross-Age Celebrity Dataset (CACD). A
    large-scale face verification on Megaface challenge 1 is also performed to
    further show the advantages of our proposed approach.

    Adversarial Examples for Semantic Segmentation and Object Detection

    Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, Alan Yuille
    Comments: Submitted to ICCV 2017 (10 pages, 6 figures)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    It has been well demonstrated that adversarial examples, i.e., natural images
    with visually imperceptible perturbations added, can cause deep
    networks to fail on image classification. In this paper, we extend adversarial
    examples to semantic segmentation and object detection which are much more
    difficult. Our observation is that both segmentation and detection are based on
    classifying multiple targets on an image (e.g., the basic target is a pixel or
    a receptive field in segmentation, and an object proposal in detection), which
    inspires us to optimize a loss function over a set of pixels/proposals for
    generating adversarial perturbations. Based on this idea, we propose a novel
    algorithm named Dense Adversary Generation (DAG), which generates a large
    family of adversarial examples, and applies to a wide range of state-of-the-art
    deep networks for segmentation and detection. We also find that the adversarial
    perturbations can be transferred across networks with different training data,
    based on different architectures, and even for different recognition tasks. In
    particular, the transferability across networks with the same architecture is
    more significant than in other cases. Besides, summing up heterogeneous
    perturbations often leads to better transfer performance, which provides an
    effective method of black-box adversarial attack.

    Deep Residual Learning for Instrument Segmentation in Robotic Surgery

    Daniil Pakhomov, Vittal Premachandran, Max Allan, Mahdi Azizian, Nassir Navab
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Detection, tracking, and pose estimation of surgical instruments are crucial
    tasks for computer assistance during minimally invasive robotic surgery. In the
    majority of cases, the first step is the automatic segmentation of surgical
    tools. Prior work has focused on binary segmentation, where the objective is to
    label every pixel in an image as tool or background. We improve upon previous
    work in two major ways. First, we leverage recent techniques such as deep
    residual learning and dilated convolutions to advance binary-segmentation
    performance. Second, we extend the approach to multi-class segmentation, which
    lets us segment different parts of the tool, in addition to background. We
    demonstrate the performance of this method on the MICCAI Endoscopic Vision
    Challenge Robotic Instruments dataset.

    A Dynamic Programming Solution to Bounded Dejittering Problems

    Lukas F. Lang
    Comments: The final publication is available at link.springer.com
    Subjects: Optimization and Control (math.OC); Computer Vision and Pattern Recognition (cs.CV)

    We propose a dynamic programming solution to image dejittering problems with
    bounded displacements and obtain efficient algorithms for the removal of line
    jitter, line pixel jitter, and pixel jitter.

    Where to put the Image in an Image Caption Generator

    Marc Tanti (1), Albert Gatt (1), Kenneth P. Camilleri (1) ((1) University of Malta)
    Comments: under review, 29 pages, 5 figures, 6 tables
    Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

    When a neural language model is used for caption generation, the image
    information can be fed to the neural network either by directly incorporating
    it in a recurrent neural network — conditioning the language model by
    injecting image features — or in a layer following the recurrent neural
    network — conditioning the language model by merging the image features. While
    merging implies that visual features are bound at the end of the caption
    generation process, injecting can bind the visual features at a variety of stages.
    In this paper we empirically show that late binding is superior to early
    binding in terms of different evaluation metrics. This suggests that the
    different modalities (visual and linguistic) for caption generation should not
    be jointly encoded by the RNN; rather, the multimodal integration should be
    delayed to a subsequent stage. Furthermore, this suggests that recurrent neural
    networks should not be viewed as actually generating text, but only as encoding
    it for prediction in a subsequent layer.

    Inferring The Latent Structure of Human Decision-Making from Raw Visual Inputs

    Yunzhu Li, Jiaming Song, Stefano Ermon
    Comments: 10 pages, 6 figures
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    The goal of imitation learning is to match example expert behavior, without
    access to a reinforcement signal. Expert demonstrations provided by humans,
    however, often show significant variability due to latent factors that are not
    explicitly modeled. We introduce an extension to the Generative Adversarial
    Imitation Learning method that can infer the latent structure of human
    decision-making in an unsupervised way. Our method can not only imitate complex
    behaviors, but also learn interpretable and meaningful representations. We
    demonstrate that the approach is applicable to high-dimensional environments
    including raw visual inputs. In the highway driving domain, we show that a
    model learned from demonstrations is able to both produce different styles of
    human-like driving behaviors and accurately anticipate human actions. Our
    method surpasses various baselines in terms of performance and functionality.

    Who Said What: Modeling Individual Labelers Improves Classification

    Melody Y. Guan, Varun Gulshan, Andrew M. Dai, Geoffrey E. Hinton
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    Data are often labeled by many different experts with each expert only
    labeling a small fraction of the data and each data point being labeled by
    several experts. This reduces the workload on individual experts and also gives
    a better estimate of the unobserved ground truth. When experts disagree, the
    standard approaches are to treat the majority opinion as the correct label or
    to model the correct label as a distribution. These approaches, however, do not
    make any use of potentially valuable information about which expert produced
    which label. To make use of this extra information, we propose modeling the
    experts individually and then learning averaging weights for combining them,
    possibly in sample-specific ways. This allows us to give more weight to more
    reliable experts and take advantage of the unique strengths of individual
    experts at classifying certain types of data. Here we show that our approach
    leads to improvements in computer-aided diagnosis of diabetic retinopathy. We
    also show that our method performs better than competing algorithms by Welinder
    and Perona, and by Mnih and Hinton. Our work offers an innovative approach for
    dealing with the myriad real-world settings that use expert opinions to define
    labels for training.
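
    The weighting idea in the abstract above can be illustrated with a minimal
    Python sketch. This is not the authors' model: the function names, the
    softmax parameterization of the weights, and the toy numbers are all
    assumptions made for illustration. It only shows how per-expert class
    distributions might be averaged with weights that favor some experts.

```python
import math


def softmax(scores):
    """Turn arbitrary real scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_experts(expert_probs, weight_logits):
    """Weighted average of per-expert class distributions.

    expert_probs: one class-probability list per modeled expert.
    weight_logits: one hypothetical learned score per expert.
    """
    weights = softmax(weight_logits)
    n_classes = len(expert_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, expert_probs))
        for c in range(n_classes)
    ]


# Three modeled experts voting on a two-class problem (toy numbers).
expert_probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
uniform = combine_experts(expert_probs, [0.0, 0.0, 0.0])
trusting_first = combine_experts(expert_probs, [2.0, 0.0, 0.0])
```

    With equal scores the result is a plain average; raising the first expert's
    score pulls the combined distribution toward that expert's opinion.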

    Multivariate Regression with Gross Errors on Manifold-valued Data

    Xiaowei Zhang, Xudong Shi, Yu Sun, Li Cheng
    Comments: Submitted to a journal
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC)

    We consider the topic of multivariate regression on manifold-valued output,
    that is, for a multivariate observation, its output response lies on a
    manifold. Moreover, we propose a new regression model to deal with the presence
    of grossly corrupted manifold-valued responses, a bottleneck issue commonly
    encountered in practical scenarios. Our model first takes a correction step on
    the grossly corrupted responses via geodesic curves on the manifold, and then
    performs multivariate linear regression on the corrected data. This results in
    a nonconvex and nonsmooth optimization problem on manifolds. To this end, we
    propose a dedicated approach named PALMR, by utilizing and extending the
    proximal alternating linearized minimization techniques. Theoretically, we
    investigate its convergence property, where it is shown to converge to a
    critical point under mild conditions. Empirically, we test our model on both
    synthetic and real diffusion tensor imaging data, and show that our model
    outperforms other multivariate regression models when manifold-valued responses
    contain gross errors, and is effective in identifying gross errors.


    Artificial Intelligence

    On Automating the Doctrine of Double Effect

    Naveen Sundar Govindarajulu, Selmer Bringsjord
    Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Robotics (cs.RO)

    The doctrine of double effect ($\mathcal{DDE}$) is a long-studied ethical
    principle that governs when actions that have both positive and negative
    effects are to be allowed. The goal in this paper is to automate
    $\mathcal{DDE}$. We briefly present $\mathcal{DDE}$, and use a first-order
    modal logic, the deontic cognitive event calculus, as our framework to
    formalize the doctrine. We present formalizations of increasingly stronger
    versions of the principle, including what is known as the doctrine of triple
    effect. We then use our framework to simulate successfully scenarios that have
    been used to test for the presence of the principle in human subjects. Our
    framework can be used in two different modes: One can use it to build
    $\mathcal{DDE}$-compliant autonomous systems from scratch, or one can use it to
    verify that a given AI system is $\mathcal{DDE}$-compliant, by applying a
    $\mathcal{DDE}$ layer on an existing system or model. For the latter mode, the
    underlying AI system can be built using any architecture (planners, deep neural
    networks, Bayesian networks, knowledge-representation systems, or a hybrid); as
    long as the system exposes a few parameters in its model, such verification is
    possible. The role of the $\mathcal{DDE}$ layer here is akin to a (dynamic or
    static) software verifier that examines existing software modules. Finally, we
    end by presenting initial work on how one can apply our $\mathcal{DDE}$ layer
    to the STRIPS-style planning model, and to a modified POMDP model.

    Team Formation for Scheduling Educational Material in Massive Online Classes

    Sanaz Bahargam, Dóra Erdos, Azer Bestavros, Evimaria Terzi
    Subjects: Artificial Intelligence (cs.AI)

    Whether teaching in a classroom or a Massive Open Online Course, it is crucial
    to present the material in a way that benefits the audience as a whole. We
    identify two important tasks to solve towards this objective: (1) group students
    so that they can maximally benefit from peer interaction, and (2) find an optimal
    schedule of the educational material for each group. Thus, in this paper, we
    solve the problem of team formation and content scheduling for education. Given
    a time frame d, a set of students S with their required need to learn different
    activities T, and k as the number of desired groups, we study the problem
    of finding k groups of students. The goal is to teach students within time frame
    d such that their potential for learning is maximized and to find the best
    schedule for each group. We show this problem to be NP-hard and develop a
    polynomial algorithm for it. We show our algorithm to be effective on both
    synthetic and real data sets. For our experiments, we use real data on
    students’ grades in a Computer Science department. As part of our contribution,
    we release a semi-synthetic dataset that mimics the properties of the real
    data.

    Transfer learning for music classification and regression tasks

    Keunwoo Choi, György Fazekas, Mark Sandler, Kyunghyun Cho
    Comments: 16 pages, single column, NOT iclr submission
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD)

    In this paper, we present a transfer learning approach for music
    classification and regression tasks. We propose to use a pretrained convnet
    feature, a concatenated feature vector using activations of feature maps of
    multiple layers in a trained convolutional network. We show how this
    convnet feature can serve as a general-purpose music representation. In the
    experiment, a convnet is trained for music tagging and then transferred for
    many music-related classification and regression tasks as well as an
    audio-related classification task. In experiments, the convnet feature
    outperforms the baseline MFCC feature in all tasks and many reported approaches
    of aggregating MFCCs and low- and high-level music features.

    Intelligent bidirectional rapidly-exploring random trees for optimal motion planning in complex cluttered environments

    Ahmed Hussain Qureshi, Yasar Ayaz
    Comments: The article is published in Elsevier Journal of Robotics and Autonomous Systems
    Journal-ref: Robotics and Autonomous Systems 68 (2015): 1-11
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

    The sampling-based motion planning algorithm known as Rapidly-exploring
    Random Trees (RRT) has gained the attention of many researchers due to its
    computational efficiency and effectiveness. Recently, a variant of RRT called
    RRT* has been proposed that ensures asymptotic optimality. Subsequently, its
    bidirectional version, known as Bidirectional-RRT* (B-RRT*), has also been
    introduced in the literature. We introduce a new variant called Intelligent
    Bidirectional-RRT* (IB-RRT*), which is an improved variant of the optimal RRT*
    and bidirectional version of RRT* (B-RRT*) algorithms and is specially designed
    for complex cluttered environments. IB-RRT* utilizes the bidirectional trees
    approach and introduces an intelligent sample insertion heuristic for fast
    convergence to the optimal path solution using uniform sampling heuristics. The
    proposed algorithm is evaluated theoretically, and experimental results are
    presented that compare IB-RRT* with RRT* and B-RRT*. Moreover, experimental
    results demonstrate the superior efficiency of IB-RRT* in comparison with RRT*
    and B-RRT* in complex cluttered environments.

    Socially Aware Motion Planning with Deep Reinforcement Learning

    Yu Fan Chen, Michael Everett, Miao Liu, Jonathan P. How
    Comments: 8 pages
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

    For robotic vehicles to navigate safely and efficiently in pedestrian-rich
    environments, it is important to model subtle human behaviors and navigation
    rules. However, while instinctive to humans, socially compliant navigation is
    still difficult to quantify due to the stochasticity in people’s behaviors.
    Existing works are mostly focused on using feature-matching techniques to
    describe and imitate human paths, but often do not generalize well since the
    feature values can vary from person to person, and even run to run. This work
    notes that while it is challenging to directly specify the details of what to
    do (precise mechanisms of human navigation), it is straightforward to specify
    what not to do (violations of social norms). Specifically, using deep
    reinforcement learning, this work develops a time-efficient navigation policy
    that respects common social norms. The proposed method is shown to enable fully
    autonomous navigation of a robotic vehicle moving at human walking speed in an
    environment with many pedestrians.

    Surrogate Model of Multi-Period Flexibility from a Home Energy Management System

    Rui Pinto, Ricardo Bessa, Manuel Matos
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

    Near-future electric distribution grid operation will have to rely on
    demand-side flexibility, both by implementation of demand response strategies
    and by taking advantage of the intelligent management of increasingly common
    small-scale energy storage. Home energy management systems (HEMS) will play a
    crucial role in the flexibility provision to both system operators and market
    players like aggregators. Modeling multi-period flexibility from residential
    consumers (HEMS flexibility), such as battery storage and electric water
    heaters, while complying with internal constraints (comfort levels, data
    privacy) and uncertainty is a complex task. This paper describes a
    computational method that is capable of efficiently defining and learning the
    feasible flexibility set from controllable resources connected to a HEMS. An
    Evolutionary Particle Swarm Optimization (EPSO) algorithm is adopted and
    reshaped to derive a set of feasible temporal trajectories for the residential
    net-load, considering storage, flexible appliances, and predefined customer
    preferences, as well as load and photovoltaic (PV) forecast uncertainty. A
    support vector data description (SVDD) algorithm is used to build models
    capable of classifying feasible and unfeasible HEMS operating trajectories upon
    request from an optimization/control algorithm operated by a DSO or market
    player.

    Open Vocabulary Scene Parsing

    Hang Zhao, Xavier Puig, Bolei Zhou, Sanja Fidler, Antonio Torralba
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Recognizing arbitrary objects in the wild has been a challenging problem due
    to the limitations of existing classification models and datasets. In this
    paper, we propose a new task that aims at parsing scenes with a large and open
    vocabulary, and several evaluation metrics are explored for this problem. Our
    proposed approach to this problem is a joint image pixel and word concept
    embeddings framework, where word concepts are connected by semantic relations.
    We validate the open vocabulary prediction ability of our framework on the ADE20K
    dataset, which covers a wide variety of scenes and objects. We further explore
    the trained joint embedding space to show its interpretability.

    Comparing Rule-Based and Deep Learning Models for Patient Phenotyping

    Sebastian Gehrmann, Franck Dernoncourt, Yeran Li, Eric T. Carlson, Joy T. Wu, Jonathan Welt, John Foote Jr., Edward T. Moseley, David W. Grant, Patrick D. Tyler, Leo Anthony Celi
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Objective: We investigate whether deep learning techniques for natural
    language processing (NLP) can be used efficiently for patient phenotyping.
    Patient phenotyping is a classification task for determining whether a patient
    has a medical condition, and is a crucial part of secondary analysis of
    healthcare data. We assess the performance of deep learning algorithms and
    compare them with classical NLP approaches.

    Materials and Methods: We compare convolutional neural networks (CNNs),
    n-gram models, and approaches based on cTAKES that extract pre-defined medical
    concepts from clinical notes and use them to predict patient phenotypes. The
    performance is tested on 10 different phenotyping tasks using 1,610 discharge
    summaries extracted from the MIMIC-III database.

    Results: CNNs outperform other phenotyping algorithms in all 10 tasks. The
    average F1-score of our model is 76 (PPV of 83, and sensitivity of 71) with our
    model having an F1-score up to 37 points higher than alternative approaches. We
    additionally assess the interpretability of our model by presenting a method
    that extracts the most salient phrases for a particular prediction.

    Conclusion: We show that NLP methods based on deep learning improve the
    performance of patient phenotyping. Our CNN-based algorithm automatically
    learns the phrases associated with each patient phenotype. As such, it reduces
    the annotation complexity for clinical domain experts, who are normally
    required to develop task-specific annotation rules and identify relevant
    phrases. Our method performs well in terms of both performance and
    interpretability, which indicates that deep learning is an effective approach
    to patient phenotyping based on clinicians’ notes.
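
    As a rough consistency check on the Results figures above, the F1-score is
    the harmonic mean of precision (PPV) and recall (sensitivity). The short
    Python sketch below applies that standard formula to the reported averages;
    the function name is hypothetical. Because the abstract reports averages
    over 10 tasks, the harmonic mean of the averaged values need not equal the
    averaged F1 exactly, so only approximate agreement should be expected.

```python
def f1_from_ppv_and_sensitivity(ppv, sensitivity):
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Averages quoted in the abstract: PPV of 83 and sensitivity of 71.
f1 = f1_from_ppv_and_sensitivity(83, 71)
```

    The computed value is about 76.5, in line with the reported average
    F1-score of 76.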


    Information Retrieval

    Towards Effective Research-Paper Recommender Systems and User Modeling based on Mind Maps

    Joeran Beel
    Comments: PhD Thesis, Otto-von-Guericke University Magdeburg, Germany
    Subjects: Information Retrieval (cs.IR)

    While user-modeling and recommender systems successfully utilize items like
    emails, news, and movies, they widely neglect mind-maps as a source for user
    modeling. We consider this a serious shortcoming since we assume user modeling
    based on mind maps to be equally effective as user modeling based on other
    items. Hence, millions of mind-mapping users could benefit from user-modeling
    applications such as recommender systems. The objective of this doctoral thesis
    is to develop an effective user-modeling approach based on mind maps. To
    achieve this objective, we integrate a recommender system in our mind-mapping
    and reference-management software Docear. The recommender system builds user
    models based on the mind maps, and recommends research papers based on the user
    models. As part of our research, we identify several variables relating to
    mind-map-based user modeling, and evaluate the variables’ impact on
    user-modeling effectiveness with an offline evaluation, a user study, and an
    online evaluation based on 430,893 recommendations displayed to 4,700 users. We
    find, among other things, that the number of analyzed nodes, modification time,
    visibility of nodes, relations between nodes, and number of children and
    siblings of a node affect the effectiveness of user modeling. When all
    variables are combined in a favorable way, this novel approach achieves
    click-through rates of 7.20%, which is nearly twice as effective as the best
    baseline. In addition, we show that user modeling based on mind maps performs
    about as well as user modeling based on other items, namely the research
    articles users downloaded or cited. Our findings lead us to conclude that user
    modeling based on mind maps is a promising research field, and that developers
    of mind-mapping applications should integrate recommender systems into their
    applications. Such systems could create additional value for millions of
    mind-mapping users.

    Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia

    Joeran Beel, Bela Gipp, Akiko Aizawa
    Comments: Accepted for publication at the JCDL conference 2017
    Subjects: Information Retrieval (cs.IR); Digital Libraries (cs.DL)

    Recommender systems for research papers are offered by only a few digital
    libraries and reference managers, although they could help users of digital
    libraries etc. to better deal with information overload. One reason might be
    that operators of digital libraries do not have the resources to develop and
    maintain a recommender system. In this paper, we introduce Mr. DLib’s
    recommender-system as-a-service. Mr. DLib’s service allows digital libraries
    and reference managers to easily integrate a recommender system. The effort is
    low, and no knowledge about recommender systems is required. Mr. DLib’s first
    pilot partner is the digital library Sowiport. Between September 2016 and
    February 2017, Mr. DLib delivered 60 million recommendations to Sowiport with a
    click-through rate of 0.15% on average. Mr. DLib is open source, non-profit,
    and supports open data.

    Apache Lucene as Content-Based-Filtering Recommender System: 3 Lessons Learned

    Stefan Langer, Joeran Beel
    Comments: Accepted for publication at the 5th International Workshop on Bibliometric-enhanced Information Retrieval (BIR2017)
    Subjects: Information Retrieval (cs.IR)

    For the past few years, we used Apache Lucene as a recommendation framework in
    our scholarly-literature recommender system of the reference-management
    software Docear. In this paper, we share three lessons learned from our work
    with Lucene. First, recommendations with relevance scores below 0.025 tend to
    have significantly lower click-through rates than recommendations with
    relevance scores above 0.025. Second, when ten recommendations were picked
    randomly from Lucene’s top 50 search results, the click-through rate decreased
    by 15% compared to recommending the top 10 results. Third, the number of returned
    search results tends to predict how high click-through rates will be: when
    Lucene returns fewer than 1,000 search results, click-through rates tend to be
    around half as high as when 1,000+ results are returned.
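
    The first two lessons suggest a simple selection rule, sketched below in
    plain Python. It does not use Lucene's API: the function name, the
    (doc_id, score) input format, and the choice to keep scores exactly equal
    to 0.025 are assumptions for illustration. The sketch drops low-scoring
    results and then recommends the top-ranked ones rather than a random draw.

```python
RELEVANCE_THRESHOLD = 0.025  # score cut-off mentioned in the abstract


def select_recommendations(results, k=10, threshold=RELEVANCE_THRESHOLD):
    """Keep the k highest-scoring results whose score clears the threshold.

    results: list of (doc_id, relevance_score) pairs, in any order.
    """
    ranked = sorted(results, key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, score in ranked if score >= threshold][:k]


# Toy search results; "b" and "d" fall below the threshold.
results = [("a", 0.40), ("b", 0.01), ("c", 0.03), ("d", 0.024), ("e", 0.12)]
picked = select_recommendations(results, k=3)
```

    Here picked contains "a", "e", and "c", in that order.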

    Analyzing Evolving Stories in News Articles

    Roberto Camacho Barranco, Arnold P. Boedihardjo, M. Shahriar Hossain
    Comments: submitted to KDD 2017, 9 pages, 10 figures
    Subjects: Information Retrieval (cs.IR); Information Theory (cs.IT)

    There is an overwhelming number of news articles published every day around
    the globe. Following the evolution of a news story is a difficult task given
    that no mechanism is available to track back in time to study the
    diffusion of the relevant events in digital news feeds. The techniques
    developed so far to extract meaningful information from a massive corpus rely
    on similarity search, which results in a myopic loopback to the same topic
    without providing the needed insights to hypothesize the origin of a story that
    may be completely different than the news today. In this paper, we present an
    algorithm that mines historical data to detect the origin of an event, segments
    the timeline into disjoint groups of coherent news articles, and outlines the
    most important documents in a timeline with a soft probability to provide a
    better understanding of the evolution of a story. Qualitative and quantitative
    approaches to evaluate our framework demonstrate that our algorithm discovers
    statistically significant and meaningful stories in reasonable time.
    Additionally, a relevant case study on a set of news articles demonstrates that
    the generated output of the algorithm holds the promise to aid prediction of
    future entities in a story.


    Computation and Language

    A Sentence Simplification System for Improving Relation Extraction

    Christina Niklaus, Bernhard Bermeitinger, Siegfried Handschuh, André Freitas
    Comments: 26th International Conference on Computational Linguistics (COLING 2016)
    Subjects: Computation and Language (cs.CL)

    In this demo paper, we present a text simplification approach that is
    directed at improving the performance of state-of-the-art Open Relation
    Extraction (RE) systems. As syntactically complex sentences often pose a
    challenge for current Open RE approaches, we have developed a simplification
    framework that performs a pre-processing step by taking a single sentence as
    input and using a set of syntactic-based transformation rules to create a
    textual input that is easier to process for subsequently applied Open RE
    systems.

    Question Answering from Unstructured Text by Retrieval and Comprehension

    Yusuke Watanabe, Bhuwan Dhingra, Ruslan Salakhutdinov
    Subjects: Computation and Language (cs.CL)

    Open domain Question Answering (QA) systems must interact with external
    knowledge sources, such as web pages, to find relevant information. Information
    sources like Wikipedia, however, are not well structured and are difficult to
    utilize in comparison with Knowledge Bases (KBs). In this work we present a
    two-step approach to question answering from unstructured text, consisting of a
    retrieval step and a comprehension step. For comprehension, we present an RNN
    based attention model with a novel mixture mechanism for selecting answers from
    either retrieved articles or a fixed vocabulary. For retrieval we introduce a
    hand-crafted model and a neural model for ranking relevant articles. We achieve
    state-of-the-art performance on the WikiMovies dataset, reducing the error by
    40%. Our experimental results further demonstrate the importance of each of the
    introduced components.

    Learning Simpler Language Models with the Delta Recurrent Neural Network Framework

    Alexander G. Ororbia II, Tomas Mikolov, David Reitter
    Subjects: Computation and Language (cs.CL)

    Learning useful information across long time lags is a critical and difficult
    problem for temporal neural models in tasks like language modeling. Existing
    architectures that address the issue are often complex and costly to train. The
    Delta Recurrent Neural Network (Delta-RNN) framework is a simple and
    high-performing design that unifies previously proposed gated neural models.
    The Delta-RNN models maintain longer-term memory by learning to interpolate
    between a fast-changing data-driven representation and a slowly changing,
    implicitly stable state. This requires hardly any more parameters than a
    classical simple recurrent network. The models outperform popular complex
    architectures, such as the Long Short Term Memory (LSTM) and the Gated
    Recurrent Unit (GRU), and achieve state-of-the-art performance in language
    modeling at character and word levels and yield comparable performance at the
    subword level.
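
    The interpolation described in the abstract above can be pictured with a
    heavily simplified Python sketch. It is not the Delta-RNN equations from
    the paper: the gate here is a fixed constant per unit rather than a learned
    quantity, tanh of the raw input stands in for the data-driven
    representation, and all names are hypothetical. It only shows how a gate
    close to 1 yields a slowly changing state.

```python
import math


def interpolate_state(prev_state, proposal, gate):
    """Elementwise mix: a gate near 1 keeps the old state, near 0 takes the proposal."""
    return [g * h + (1.0 - g) * z for h, z, g in zip(prev_state, proposal, gate)]


def run(inputs, gate):
    """Roll the interpolation over a sequence of input vectors."""
    state = [0.0] * len(gate)
    for x in inputs:
        proposal = [math.tanh(v) for v in x]  # stand-in for the data-driven term
        state = interpolate_state(state, proposal, gate)
    return state


# Unit 0 has a high gate (slow, memory-like); unit 1 has a low gate (fast).
inputs = [[1.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
final_state = run(inputs, gate=[0.9, 0.1])
```

    After the input returns to zero, the slow unit still holds a trace of the
    first input while the fast unit has almost forgotten it.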

    LEPOR: An Augmented Machine Translation Evaluation Metric

    Lifeng Han
    Comments: 132 pages, thesis
    Subjects: Computation and Language (cs.CL)

    Machine translation (MT) has become one of the hottest research topics
    in the natural language processing (NLP) literature. One important issue in MT
    is how to evaluate an MT system reasonably and tell whether the
    translation system makes an improvement or not. The traditional manual judgment
    methods are expensive, time-consuming, unrepeatable, and sometimes show low
    agreement. On the other hand, the popular automatic MT evaluation methods have
    some weaknesses. Firstly, they tend to perform well on language pairs with
    English as the target language, but poorly when English is the source.
    Secondly, some methods rely on many additional linguistic features to achieve
    good performance, which makes the metrics hard to replicate and apply to other
    language pairs. Thirdly, some popular metrics utilize incomprehensive
    factors, which results in low performance on some practical tasks. In this
    thesis, to address the existing problems, we design novel MT evaluation methods
    and investigate their performance on different languages. Firstly, we design
    augmented factors to yield highly accurate evaluation. Secondly, we design a
    tunable evaluation model where the weighting of factors can be optimised according
    to the characteristics of languages. Thirdly, in the enhanced version of our
    methods, we design a concise linguistic feature using POS to show that our
    methods can yield even higher performance when using some external linguistic
    resources. Finally, we introduce the practical performance of our metrics in
    the ACL-WMT workshop shared tasks, which shows that the proposed methods are
    robust across different languages.

    Morphological Analysis for the Maltese Language: The Challenges of a Hybrid System

    Claudia Borg, Albert Gatt
    Comments: 11 pages, Proceedings of the 3rd Arabic Natural Language Processing Workshop (WANLP’17)
    Subjects: Computation and Language (cs.CL)

    Maltese is a morphologically rich language with a hybrid morphological system
    which features both concatenative and non-concatenative processes. This paper
    analyses the impact of this hybridity on the performance of machine learning
    techniques for morphological labelling and clustering. In particular, we
    analyse a dataset of morphologically related word clusters to evaluate the
    difference in results for concatenative and nonconcatenative clusters. We also
    describe research carried out in morphological labelling, with a particular
    focus on the verb category. Two evaluations were carried out, one using an
    unseen dataset, and another one using a gold standard dataset which was
    manually labelled. The gold standard dataset was split into concatenative and
    non-concatenative to analyse the difference in results between the two
    morphological systems.

    Simplifying the Bible and Wikipedia Using Statistical Machine Translation

    Yohan Jo
    Subjects: Computation and Language (cs.CL)

    I started this work with the hope of generating a text synthesizer (like a
    musical synthesizer) that can imitate certain linguistic styles. Most of the
    report focuses on text simplification using statistical machine translation
    (SMT) techniques. I applied MOSES to a parallel corpus of the Bible (King James
    Version and Easy-to-Read Version) and that of Wikipedia articles (normal and
    simplified). I report the importance of the three main components of
    SMT (phrase translation, language model, and reordering) by changing their
    weights and comparing the resulting quality of simplified text in terms of
    METEOR and BLEU. Toward the end of the report, I present some examples
    of text “synthesized” into the King James style.

    Sequence-to-Sequence Models Can Directly Transcribe Foreign Speech

    Ron J. Weiss, Jan Chorowski, Navdeep Jaitly, Yonghui Wu, Zhifeng Chen
    Comments: Submitted to Interspeech 2017
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    We present a recurrent encoder-decoder deep neural network architecture that
    directly translates speech in one language into text in another. The model does
    not explicitly transcribe the speech into text in the source language, nor does
    it require supervision from the ground truth source language transcription
    during training. We apply a slightly modified sequence-to-sequence with
    attention architecture that has previously been used for speech recognition and
    show that it can be repurposed for this more complex task, illustrating the
    power of attention-based models. A single model trained end-to-end obtains
    state-of-the-art performance on the Fisher Callhome Spanish-English speech
    translation task, outperforming a cascade of independently trained
    sequence-to-sequence speech recognition and machine translation models by 1.8
    BLEU points on the Fisher test set. In addition, we find that making use of the
    training data in both languages by multi-task training sequence-to-sequence
    speech translation and recognition models with a shared encoder network can
    improve performance by a further 1.4 BLEU points.

    Bootstrapping a Lexicon for Emotional Arousal in Software Engineering

    Mika V. Mäntylä, Nicole Novielli, Filippo Lanubile, Maëlick Claes, Miikka Kuutila
    Comments: 5 pages. Accepted version. Copyright IEEE
    Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL)

    Emotional arousal increases activation and performance but may also lead to
    burnout in software development. We present the first version of a Software
    Engineering Arousal lexicon (SEA) that is specifically designed to address the
    problem of emotional arousal in the software developer ecosystem. SEA is built
    using a bootstrapping approach that combines a word embedding model trained on
    issue-tracking data with manual scoring of items in the lexicon. We show that
    our lexicon is able to differentiate between issue priorities, which are a
    source of emotional activation and thus act as a proxy for arousal. The best
    performance is obtained by combining SEA (428 words) with a previously created
    general purpose lexicon by Warriner et al. (13,915 words) and it achieves
    Cohen’s d effect sizes up to 0.5.

    D.TRUMP: Data-mining Textual Responses to Uncover Misconception Patterns

    Joshua J. Michalenko, Andrew S. Lan, Richard G. Baraniuk
    Comments: 7 Pages, Submitted to EDM 2017, Workshop version accepted to L@S 2017
    Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL)

    An important, yet largely unstudied, problem in student data analysis is to
    detect misconceptions from students’ responses to open-response questions.
    Misconception detection enables instructors to deliver more targeted feedback
    on the misconceptions exhibited by many students in their class, thus improving
    the quality of instruction. In this paper, we propose D.TRUMP, a new natural
    language processing-based framework to detect the common misconceptions among
    students’ textual responses to short-answer questions. We propose a
    probabilistic model for students’ textual responses involving misconceptions
    and experimentally validate it on a real-world student-response dataset.
    Experimental results show that D.TRUMP excels at classifying whether a response
    exhibits one or more misconceptions. More importantly, it can also
    automatically detect the common misconceptions exhibited across responses from
    multiple students to multiple questions; this property is especially important
    at large scale, since instructors will no longer need to manually specify all
    possible misconceptions that students might exhibit.


    Distributed, Parallel, and Cluster Computing

    Private Learning on Networks: Part II

    Shripad Gade, Nitin H. Vaidya
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Optimization and Control (math.OC)

    Widespread deployment of distributed machine learning algorithms has raised
    new privacy challenges. The focus of this paper is on improving privacy of each
    participant’s local information (such as dataset or loss function) while
    collaboratively learning an underlying model. We present two iterative algorithms
    for privacy-preserving distributed learning. Our algorithms involve adding
    structured randomization to the state estimates. We prove deterministic
    correctness (in every execution) of our algorithms despite the iterates being
    perturbed by non-zero mean random variables. We motivate privacy using privacy
    analysis of a special case of our algorithm referred to as Function Sharing
    strategy (presented in [1]).

    MURS: Mitigating Memory Pressure in Data Processing Systems for Service

    Xuanhua Shi, Xiong Zhang, Ligang He, Hai Jin, Zhixiang Ke, Song Wu
    Comments: 10 pages, 7 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    It has been shown that in-memory computing systems suffer from serious memory
    pressure. The memory pressure will affect all submitted jobs. Memory pressure
    comes from the running tasks as they produce massive long-living data objects
    in the limited memory space. The long-living objects incur significant memory
    and CPU overheads. Some tasks cause heavy memory pressure because of the
    operations and datasets they process, which in turn affects all running tasks
    in the system. Our studies show that a task often calls several API functions,
    some of which need constant memory space while others need linear memory
    space. As different models have different impacts on memory pressure, we
    propose a method of classifying the models that the tasks belong to. The method
    uses the memory usage rate as the classification criterion. Further, we design a
    scheduler called MURS to mitigate the memory pressure. We implement MURS in
    Spark and conduct experiments to evaluate the performance of MURS. The
    results show that, compared to Spark, our scheduler can 1) decrease the
    execution time of submitted jobs by up to 65.8%, 2) mitigate the memory
    pressure in the server by decreasing the garbage collection time by up to 81%,
    and 3) reduce the data spilling, and hence disk I/Os, by approximately 90%.

    Analysis of Different Approaches of Parallel Block Processing for K-Means Clustering Algorithm

    C Rashmi
    Comments: 11 pages
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Distributed computation has been a recent trend in engineering research.
    Parallel computation is widely used in areas such as data mining, image
    processing, model simulation, and aerodynamics. One major use of parallel
    processing is clustering satellite images with dimensions larger than
    1000×1000 on a legacy system. This paper focuses on different approaches to
    parallel block processing: row-shaped, column-shaped, and square-shaped
    blocks. These approaches are applied to a classification problem using the
    K-Means clustering algorithm, which is widely used for feature detection in
    high-resolution orthoimagery satellite images. The analysis shows that the
    different approaches reduce execution time and improve measured performance
    compared to the sequential K-Means clustering algorithm.
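
    The three block shapes can be illustrated with a short sketch. It assumes
    that “row-shaped” means horizontal strips, “column-shaped” means vertical
    strips, and “square-shaped” means tiles; the function name `make_blocks` and
    the `size` parameter are hypothetical and not taken from the paper. Each
    block could then be handed to a separate worker for clustering.

```python
def make_blocks(height, width, shape, size):
    """Split a height x width image into blocks.

    Returns (row_start, row_stop, col_start, col_stop) tuples with
    half-open ranges. `shape` is "row", "column", or "square".
    """
    rows = [(r, min(r + size, height)) for r in range(0, height, size)]
    cols = [(c, min(c + size, width)) for c in range(0, width, size)]
    if shape == "row":      # horizontal strips spanning the full width
        return [(r0, r1, 0, width) for r0, r1 in rows]
    if shape == "column":   # vertical strips spanning the full height
        return [(0, height, c0, c1) for c0, c1 in cols]
    if shape == "square":   # size x size tiles (edge tiles may be smaller)
        return [(r0, r1, c0, c1) for r0, r1 in rows for c0, c1 in cols]
    raise ValueError(f"unknown block shape: {shape!r}")


if __name__ == "__main__":
    for shape in ("row", "column", "square"):
        blocks = make_blocks(1000, 1000, shape, 250)
        print(shape, len(blocks), blocks[0])
```

    Whatever the shape, the blocks cover every pixel exactly once, so the
    choice only changes how the work is divided among workers.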

    Multileader WAN Paxos: Ruling the Archipelago with Fast Consensus

    Ailidani Ailijiang, Aleksey Charapko, Murat Demirbas, Tevfik Kosar
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    We present WPaxos, a multileader wide area network (WAN) Paxos protocol that
    achieves low-latency, high-throughput consensus across WAN deployments. WPaxos
    dynamically partitions the global object-space across multiple concurrent
    leaders that are deployed strategically using flexible quorums. This
    partitioning and emphasis on local operations allow our protocol to
    significantly outperform leaderless approaches, such as EPaxos, while
    maintaining the same consistency guarantees. Unlike statically partitioned
    multiple Paxos deployments, WPaxos adapts dynamically to the changing access
    locality through adaptive object stealing. The ability to quickly react to
    changing access locality not only speeds up the protocol, but also enables
    support for mini-transactions.

    We implemented WPaxos and evaluated it across WAN deployments using the
    benchmarks introduced in the EPaxos work. Our results show that WPaxos achieves
    up to 18 times faster average request latency and 65 times faster median
    latency than EPaxos due to the reduction in WAN communication.

    Distributed Voting/Ranking with Optimal Number of States per Node

    Saber Salehkaleybar, Arsalan Sharif-Nassab, S. Jamaloddin Golestani
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

    Considering a network with $n$ nodes, where each node initially votes for one
    (or more) choices out of $K$ possible choices, we present a Distributed
    Multi-choice Voting/Ranking (DMVR) algorithm either to determine the choice
    with maximum vote (the voting problem) or to rank all the choices in terms of
    their acquired votes (the ranking problem). The algorithm consolidates node
    votes across the network by updating the states of interacting nodes using two
    key operations, the union and the intersection. The proposed algorithm is
    simple, independent of network size, and easily scalable in terms of the
    number of choices $K$, using only $K \times 2^{K-1}$ nodal states for voting
    and $K \times K!$ nodal states for ranking. We prove the number of states to be
    optimal in the ranking case; this optimality is conjectured to also apply to
    the voting case. The time complexity of the algorithm is analyzed in complete
    graphs. We show that the time complexity for both ranking and voting is
    $O(\log(n))$ for given vote percentages, and is inversely proportional to the
    minimum of the vote percentage differences among various choices.
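
    To get a feel for how the two state counts quoted above grow with the
    number of choices, the following sketch simply evaluates the formulas. The
    function names are hypothetical; only the formulas come from the abstract.

```python
from math import factorial


def voting_states(k: int) -> int:
    """Nodal states quoted for the voting problem: K * 2^(K-1)."""
    return k * 2 ** (k - 1)


def ranking_states(k: int) -> int:
    """Nodal states quoted for the ranking problem: K * K!."""
    return k * factorial(k)


if __name__ == "__main__":
    for k in range(2, 7):
        print(k, voting_states(k), ranking_states(k))
```

    Both counts depend only on the number of choices, not on the number of
    nodes, which matches the claim that the algorithm is independent of
    network size.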

    Token-based Function Computation with Memory

    Saber Salehkaleybar, S. Jamaloddin Golestani
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)

    In distributed function computation, each node has an initial value and the
    goal is to compute a function of these values in a distributed manner. In this
    paper, we propose a novel token-based approach to compute a wide class of
    target functions, which we refer to as the “Token-based function Computation
    with Memory” (TCM) algorithm. In this approach, node values are attached to
    tokens that travel across the network. Each pair of travelling tokens
    coalesces when they meet, forming a token with a new value as a function of
    the original token values. In contrast to the Coalescing Random Walk (CRW)
    algorithm, where token movement is governed by a random walk, the meeting of
    tokens in our scheme is accelerated by a novel chasing mechanism. We prove
    that, compared to the CRW algorithm, the TCM algorithm reduces time complexity
    by a factor of at least $\sqrt{n/\log(n)}$ in Erdős-Rényi and complete
    graphs, and by a factor of $\log(n)/\log(\log(n))$ in torus networks.
    Simulation results show that there is at least a constant factor improvement in
    the message complexity of the TCM algorithm in all considered topologies.
    Robustness of the CRW and TCM algorithms in the presence of node failure is
    analyzed. We show that their robustness can be improved by running multiple
    instances of the algorithms in parallel.
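
    As a point of reference, the baseline idea of coalescing tokens can be
    sketched as follows. This is a minimal illustration of a coalescing random
    walk on a complete graph, not the TCM algorithm: the chasing mechanism is
    not modelled, the asynchronous one-token-per-step schedule is an assumption,
    and the function name is hypothetical. The combining function is assumed to
    be commutative and associative (for example sum or max), and the input is
    assumed to be non-empty.

```python
import random


def coalescing_random_walk(values, combine, rng):
    """Combine all node values on a complete graph by coalescing tokens.

    Every node starts with one token carrying its value. At each step one
    token moves to a uniformly random other node; if that node already
    holds a token, the two coalesce into one token with the combined value.
    Returns the final value and the number of moves taken.
    """
    n = len(values)
    tokens = dict(enumerate(values))  # node index -> value of the token there
    steps = 0
    while len(tokens) > 1:
        src = rng.choice(sorted(tokens))
        dst = rng.choice([v for v in range(n) if v != src])
        value = tokens.pop(src)
        tokens[dst] = combine(tokens[dst], value) if dst in tokens else value
        steps += 1
    (result,) = tokens.values()
    return result, steps


if __name__ == "__main__":
    rng = random.Random(0)
    total, steps = coalescing_random_walk([3, 1, 4, 1, 5], lambda a, b: a + b, rng)
    print("sum:", total, "moves:", steps)
```

    The number of moves is random, which is why the time complexity of such
    schemes is stated in expectation or with high probability.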

    Randomized Load Balancing on Networks with Stochastic Inputs

    Leran Cai, Thomas Sauerwald
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

    Iterative load balancing algorithms for indivisible tokens have been studied
    intensively in the past. Complementing previous worst-case analyses, we study
    an average-case scenario where the load inputs are drawn from a fixed
    probability distribution. For cycles, tori, hypercubes and expanders, we obtain
    almost matching upper and lower bounds on the discrepancy, the difference
    between the maximum and the minimum load. Our bounds hold for a variety of
    probability distributions including the uniform and binomial distribution but
    also distributions with unbounded range such as the Poisson and geometric
    distribution. For graphs with slow convergence like cycles and tori, our
    results demonstrate a substantial difference between the convergence in the
    worst- and average-case. An important ingredient in our analysis is a new upper
    bound on the t-step transition probability of a general Markov chain, which is
    derived by invoking the evolving set process.
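
    The notion of discrepancy can be made concrete with a small simulation.
    The sketch below assumes one common balancing model (neighbouring pairs on
    a cycle average their loads, with a coin flip deciding who keeps an odd
    token) and uniformly distributed initial loads; the paper's exact protocol
    may differ, and all names here are hypothetical.

```python
import random


def discrepancy(load):
    """Difference between the maximum and the minimum load."""
    return max(load) - min(load)


def balance_round(load, rng):
    """One matching round on a cycle with an even number of nodes.

    Neighbouring pairs (i, i+1) are matched starting from a random offset;
    each pair splits its tokens as evenly as possible, and a coin flip
    decides which node keeps the extra token when the sum is odd.
    """
    n = len(load)
    if n % 2:
        raise ValueError("this sketch assumes an even number of nodes")
    new = list(load)
    offset = rng.randrange(2)
    for i in range(offset, n, 2):
        j = (i + 1) % n
        total = new[i] + new[j]
        low, high = total // 2, total - total // 2
        new[i], new[j] = (low, high) if rng.random() < 0.5 else (high, low)
    return new


if __name__ == "__main__":
    rng = random.Random(1)
    load = [rng.randint(0, 100) for _ in range(16)]  # uniform initial loads
    print("initial discrepancy:", discrepancy(load))
    for _ in range(200):
        load = balance_round(load, rng)
    print("final discrepancy:", discrepancy(load))
```

    Tokens are indivisible, so the total load is conserved exactly and the
    discrepancy never increases from one round to the next in this model.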


    Learning

    Deep Architectures for Modulation Recognition

    Nathan E West, Timothy J. O'Shea
    Comments: 7 pages, 14 figures, to be published in proceedings of IEEE DySPAN 2017
    Subjects: Learning (cs.LG)

    We survey the latest advances in machine learning with deep neural networks
    by applying them to the task of radio modulation recognition. Results show that
    radio modulation recognition is not limited by network depth and further work
    should focus on improving learned synchronization and equalization. Advances in
    these areas will likely come from novel architectures designed for these tasks
    or through novel training methods.

    GPU Activity Prediction using Representation Learning

    Aswin Raghavan, Mohamed Amer, Timothy Shields, David Zhang, Sek Chai
    Comments: Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016. JMLR: W&CP volume 48. Copyright 2016 by the author(s)
    Subjects: Learning (cs.LG)

    GPU activity prediction is an important and complex problem. This is due to
    the high level of contention among thousands of parallel threads. This problem
    has mostly been addressed using heuristics. We propose a representation learning
    approach to address this problem. We model any performance metric as a temporal
    function of the executed instructions with the intuition that the flow of
    instructions can be identified as distinct activities of the code. Our
    experiments show high accuracy and non-trivial predictive power of
    representation learning on a benchmark.

    Automatic Decomposition of Self-Triggering Kernels of Hawkes Processes

    Rafael Lima, Jaesik Choi
    Subjects: Learning (cs.LG)

    Hawkes Processes capture self- and mutual-excitation between events when the
    arrival of one event makes future ones more likely to happen in time-series
    data. Identification of the temporal covariance kernel can reveal the
    underlying structure to better predict future events. In this paper, we present
    a new framework to represent time-series events with a composition of
    self-triggering kernels of Hawkes Processes. That is, the input time-series
    events are decomposed into multiple Hawkes Processes with heterogeneous
    kernels. Our automatic decomposition procedure is composed of three main steps:
    (1) discretized kernel estimation through frequency domain inversion equation
    associated with the covariance density, (2) greedy kernel decomposition through
    four base kernels and their combinations (addition and multiplication), and (3)
    automated report generation. We demonstrate that the new automatic
    decomposition procedure predicts future events better than the existing
    framework on real-world data.
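
    The role of a self-triggering kernel can be sketched with the standard
    Hawkes conditional intensity, a baseline rate plus the kernel evaluated at
    the time elapsed since each past event. The abstract does not name its four
    base kernels, so the exponential kernel below is an assumption chosen for
    illustration, and the helper names are hypothetical.

```python
import math


def exp_kernel(alpha, beta):
    """Exponential triggering kernel: alpha * exp(-beta * dt)."""
    return lambda dt: alpha * math.exp(-beta * dt)


def add_kernels(k1, k2):
    """Compose two kernels by addition."""
    return lambda dt: k1(dt) + k2(dt)


def mul_kernels(k1, k2):
    """Compose two kernels by multiplication."""
    return lambda dt: k1(dt) * k2(dt)


def intensity(t, events, mu, kernel):
    """Conditional intensity at time t given earlier event times."""
    return mu + sum(kernel(t - ti) for ti in events if ti < t)


if __name__ == "__main__":
    kernel = add_kernels(exp_kernel(0.5, 1.0), exp_kernel(0.2, 0.1))
    events = [0.0, 0.4, 1.3]
    for t in (0.5, 1.0, 2.0, 5.0):
        print(t, round(intensity(t, events, mu=0.2, kernel=kernel), 4))
```

    Each past event raises the intensity, and the kernel controls how quickly
    that excitation fades; composing kernels mixes several decay behaviours.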

    Multimodal deep learning approach for joint EEG-EMG data compression and classification

    Ahmed Ben Said, Amr Mohamed, Tarek Elfouly, Khaled Harras, Z. Jane Wang
    Comments: IEEE Wireless Communications and Networking Conference (WCNC), 2017
    Subjects: Learning (cs.LG)

    In this paper, we present a joint compression and classification approach of
    EEG and EMG signals using a deep learning approach. Specifically, we build our
    system based on the deep autoencoder architecture which is designed not only to
    extract discriminant features in the multimodal data representation but also to
    reconstruct the data from the latent representation using encoder-decoder
    layers. Since an autoencoder can be seen as a compression approach, we extend it
    to handle multimodal data at the encoder layer, reconstructed and retrieved at
    the decoder layer. We show through experimental results that exploiting both
    multimodal data intercorrelation and intracorrelation 1) significantly reduces
    signal distortion, particularly for high compression levels, and 2) achieves
    better accuracy in classifying EEG and EMG signals recorded and labeled
    according to the sentiments of the volunteer.

    Multiple Instance Learning with the Optimal Sub-Pattern Assignment Metric

    Quang N. Tran, Ba-Ngu Vo, Dinh Phung, Ba-Tuong Vo, Thuong Nguyen
    Subjects: Learning (cs.LG)

    Multiple instance data are sets or multi-sets of unordered elements. Using
    metrics or distances for sets, we propose an approach to several multiple
    instance learning tasks, such as clustering (unsupervised learning),
    classification (supervised learning), and novelty detection (semi-supervised
    learning). In particular, we introduce the Optimal Sub-Pattern Assignment
    metric to multiple instance learning so as to provide versatile design choices.
    Numerical experiments on both simulated and real data are presented to
    illustrate the versatility of the proposed solution.
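
    For readers unfamiliar with the metric, here is a small sketch of the OSPA
    distance as it is commonly defined in the multi-target tracking literature:
    elements of the smaller set are optimally assigned to elements of the larger
    set, per-pair distances are cut off at `c`, and each unassigned element costs
    `c`. The abstract does not state the formula, so the parameter names (`c`,
    `p`) and the brute-force assignment, which is only practical for small sets,
    are assumptions of this illustration.

```python
from itertools import permutations
import math


def ospa(xs, ys, c=1.0, p=1.0):
    """OSPA distance between two finite sets of points (tuples of floats).

    c is the cut-off distance and p the order of the metric.
    """
    if len(xs) > len(ys):
        xs, ys = ys, xs           # make xs the smaller set
    m, n = len(xs), len(ys)
    if n == 0:
        return 0.0                # both sets are empty
    # Best assignment of the m points in xs to distinct points in ys.
    best = min(
        sum(min(math.dist(x, ys[j]), c) ** p for x, j in zip(xs, perm))
        for perm in permutations(range(n), m)
    )
    # Every unmatched point in the larger set costs the cut-off c.
    return ((best + (c ** p) * (n - m)) / n) ** (1.0 / p)


if __name__ == "__main__":
    a = [(0.0, 0.0), (1.0, 0.0)]
    b = [(1.0, 0.0), (0.0, 0.1), (5.0, 5.0)]
    print(ospa(a, b, c=2.0, p=2.0))
```

    Because the distance is defined directly on sets, it can be plugged into
    distance-based clustering, nearest-neighbour classification, or novelty
    detection without ordering the elements first.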

    Inferring The Latent Structure of Human Decision-Making from Raw Visual Inputs

    Yunzhu Li, Jiaming Song, Stefano Ermon
    Comments: 10 pages, 6 figures
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    The goal of imitation learning is to match example expert behavior, without
    access to a reinforcement signal. Expert demonstrations provided by humans,
    however, often show significant variability due to latent factors that are not
    explicitly modeled. We introduce an extension to the Generative Adversarial
    Imitation Learning method that can infer the latent structure of human
    decision-making in an unsupervised way. Our method can not only imitate complex
    behaviors, but also learn interpretable and meaningful representations. We
    demonstrate that the approach is applicable to high-dimensional environments
    including raw visual inputs. In the highway driving domain, we show that a
    model learned from demonstrations is able to both produce different styles of
    human-like driving behaviors and accurately anticipate human actions. Our
    method surpasses various baselines in terms of performance and functionality.

    Uncertainty Quantification in the Classification of High Dimensional Data

    Andrea L. Bertozzi, Xiyang Luo, Andrew M. Stuart, Konstantinos C. Zygalakis
    Comments: 33 pages, 14 figures
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Classification of high dimensional data finds wide-ranging applications. In
    many of these applications equipping the resulting classification with a
    measure of uncertainty may be as important as the classification itself. In
    this paper we introduce, develop algorithms for, and investigate the properties
    of, a variety of Bayesian models for the task of binary classification; via the
    posterior distribution on the classification labels, these methods
    automatically give measures of uncertainty. The methods are all based around
    the graph formulation of semi-supervised learning.

    We provide a unified framework which brings together a variety of methods
    which have been introduced in different communities within the mathematical
    sciences. We study probit classification, generalize the level-set method for
    Bayesian inverse problems to the classification setting, and generalize the
    Ginzburg-Landau optimization-based classifier to a Bayesian setting; we also
    show that the probit and level set approaches are natural relaxations of the
    harmonic function approach.

    We introduce efficient numerical methods, suited to large data-sets, for both
    MCMC-based sampling as well as gradient-based MAP estimation. Through numerical
    experiments we study classification accuracy and uncertainty quantification for
    our models; these experiments showcase a suite of datasets commonly used to
    evaluate graph-based semi-supervised learning algorithms.

    Who Said What: Modeling Individual Labelers Improves Classification

    Melody Y. Guan, Varun Gulshan, Andrew M. Dai, Geoffrey E. Hinton
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    Data are often labeled by many different experts with each expert only
    labeling a small fraction of the data and each data point being labeled by
    several experts. This reduces the workload on individual experts and also gives
    a better estimate of the unobserved ground truth. When experts disagree, the
    standard approaches are to treat the majority opinion as the correct label or
    to model the correct label as a distribution. These approaches, however, do not
    make any use of potentially valuable information about which expert produced
    which label. To make use of this extra information, we propose modeling the
    experts individually and then learning averaging weights for combining them,
    possibly in sample-specific ways. This allows us to give more weight to more
    reliable experts and take advantage of the unique strengths of individual
    experts at classifying certain types of data. Here we show that our approach
    leads to improvements in computer-aided diagnosis of diabetic retinopathy. We
    also show that our method performs better than competing algorithms by Welinder
    and Perona, and by Mnih and Hinton. Our work offers an innovative approach for
    dealing with the myriad real-world settings that use expert opinions to define
    labels for training.

    Exploration–Exploitation in MDPs with Options

    Ronan Fruit, Alessandro Lazaric
    Subjects: Learning (cs.LG)

    While a large body of empirical results shows that temporally-extended actions
    and options may significantly affect the learning performance of an agent, the
    theoretical understanding of how and when options can be beneficial in online
    reinforcement learning is relatively limited. In this paper, we derive an upper
    and lower bound on the regret of a variant of UCRL using options. While we
    first analyze the algorithm in the general case of semi-Markov decision
    processes (SMDPs), we show how these results can be translated to the specific
    case of MDPs with options and we illustrate simple scenarios in which the
    regret of learning with options can be provably much smaller than the
    regret suffered when learning with primitive actions.

    Low Precision Neural Networks using Subband Decomposition

    Sek Chai, Aswin Raghavan, David Zhang, Mohamed Amer, Tim Shields
    Comments: Presented at CogArch Workshop, Atlanta, GA, April 2016
    Subjects: Learning (cs.LG)

    Large-scale deep neural networks (DNN) have been successfully used in a
    number of tasks from image recognition to natural language processing. They are
    trained using large training sets on large models, making them computationally
    and memory intensive. As such, there is much interest in research development
    for faster training and test time. In this paper, we present a unique approach
    using lower precision weights for a more efficient and faster training phase. We
    separate imagery into different frequency bands (e.g. with different
    information content) such that the neural net can better learn using fewer bits.
    We present this approach as a complement to existing methods such as pruning
    network connections and encoding learning weights. We show results where this
    approach supports more stable learning with 2-4X reduction in precision with
    17X reduction in DNN parameters.

    Biologically inspired protection of deep networks from adversarial attacks

    Aran Nayebi, Surya Ganguli
    Comments: 11 pages
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neurons and Cognition (q-bio.NC)

    Inspired by biophysical principles underlying nonlinear dendritic computation
    in neural circuits, we develop a scheme to train deep neural networks to make
    them robust to adversarial attacks. Our scheme generates highly nonlinear,
    saturated neural networks that achieve state of the art performance on gradient
    based adversarial examples on MNIST, despite never being exposed to
    adversarially chosen examples during training. Moreover, these networks exhibit
    unprecedented robustness to targeted, iterative schemes for generating
    adversarial examples, including second-order methods. We further identify
    principles governing how these networks achieve their robustness, drawing on
    methods from information geometry. We find these networks progressively create
    highly flat and compressed internal representations that are sensitive to very
    few input dimensions, while still solving the task. Moreover, they employ
    highly kurtotic weight distributions, also found in the brain, and we
    demonstrate how such kurtosis can protect even linear classifiers from
    adversarial attack.

    Sticking the Landing: An Asymptotically Zero-Variance Gradient Estimator for Variational Inference

    Geoffrey Roeder, Yuhuai Wu, David Duvenaud
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We propose a simple and general variant of the standard reparameterized
    gradient estimator for the variational evidence lower bound. Specifically, we
    remove a part of the total derivative with respect to the variational
    parameters that corresponds to the score function. Removing this term produces
    an unbiased gradient estimator whose variance approaches zero as the
    approximate posterior approaches the exact posterior. We analyze the behavior
    of this gradient estimator theoretically and empirically, and generalize it to
    more complex variational distributions such as mixtures and importance-weighted
    posteriors.
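
    The effect of dropping the score-function term can be seen in a toy case
    worked out by hand. The sketch below assumes a one-dimensional Gaussian
    approximate posterior with mean `mu` and standard deviation `sigma`, a
    Gaussian exact posterior with mean `m` and standard deviation `s`, and looks
    only at the gradient with respect to `mu`; these choices and all names are
    assumptions for illustration, not the paper's experiments.

```python
import random
import statistics


def grad_mu_samples(mu, sigma, m, s, n, rng):
    """Per-sample gradients w.r.t. mu of log p(z) - log q(z), z = mu + sigma * eps.

    Returns two lists: the standard (total-derivative) estimator and the
    variant that omits the score-function term.
    """
    total, path = [], []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps
        dlogp_dz = -(z - m) / s ** 2        # target proportional to N(m, s^2)
        dlogq_dz = -(z - mu) / sigma ** 2   # through z, parameters held fixed
        score_mu = (z - mu) / sigma ** 2    # d log q / d mu at fixed z
        # Standard estimator: total derivative, including the score term.
        total.append(dlogp_dz - (dlogq_dz + score_mu))
        # Variant: keep only the dependence through the sample z.
        path.append(dlogp_dz - dlogq_dz)
    return total, path


if __name__ == "__main__":
    rng = random.Random(0)
    # q equals the exact posterior: mu == m and sigma == s.
    total, path = grad_mu_samples(1.0, 2.0, 1.0, 2.0, 10_000, rng)
    print("variance with score term:   ", statistics.pvariance(total))
    print("variance without score term:", statistics.pvariance(path))
```

    The score term has zero mean, so both estimators share the same
    expectation; when the approximation matches the posterior, only the variant
    without it is zero for every sample.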

    Scaling the Scattering Transform: Deep Hybrid Networks

    Edouard Oyallon (DI-ENS), Eugene Belilovsky (CVN, GALEN), Sergey Zagoruyko (ENPC)
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We use the scattering network as a generic and fixed initialization of the
    first layers of a supervised hybrid deep network. We show that early layers do
    not necessarily need to be learned, providing the best results to date with
    pre-defined representations while being competitive with deep CNNs. Using a
    shallow cascade of 1×1 convolutions, which encodes scattering coefficients that
    correspond to spatial windows of very small sizes, we obtain AlexNet
    accuracy on ImageNet ILSVRC2012. We demonstrate that this local encoding
    explicitly learns invariance w.r.t. rotations. Combining scattering networks
    with a modern ResNet, we achieve a single-crop top-5 error of 11.4% on ImageNet
    ILSVRC2012, comparable to the ResNet-18 architecture, while utilizing only 10
    layers. We also find that hybrid architectures can yield excellent performance
    in the small sample regime, exceeding their end-to-end counterparts, through
    their ability to incorporate geometrical priors. We demonstrate this on subsets
    of the CIFAR-10 dataset and by setting a new state-of-the-art on the STL-10
    dataset.

    Learned multi-patch similarity

    Wilfried Hartmann, Silvano Galliani, Michal Havlena, Konrad Schindler, Luc Van Gool
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Estimating a depth map from multiple views of a scene is a fundamental task
    in computer vision. As soon as more than two viewpoints are available, one
    faces the very basic question of how to measure similarity across more than two
    image patches. Surprisingly, no direct solution exists; instead, it is common to fall
    back to more or less robust averaging of two-view similarities. Encouraged by
    the success of machine learning, and in particular convolutional neural
    networks, we propose to learn a matching function which directly maps multiple
    image patches to a scalar similarity score. Experiments on several multi-view
    datasets demonstrate that this approach has advantages over methods based on
    pairwise patch similarity.

    Count-ception: Counting by Fully Convolutional Redundant Counting

    Joseph Paul Cohen, Henry Z. Lo, Yoshua Bengio
    Comments: Under Review
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    Counting objects in digital images is a process that should be replaced by
    machines. This tedious task is time consuming and prone to errors due to
    fatigue of human annotators. The goal is to have a system that takes as input
    an image and returns a count of the objects inside and justification for the
    prediction in the form of object localization. We repose a problem, originally
    posed by Lempitsky and Zisserman, to instead predict a count map which contains
    redundant counts based on the receptive field of a smaller regression network.
    The regression network predicts a count of the objects that exist inside this
    frame. By processing the image in a fully convolutional way, each pixel is
    accounted for some number of times: the number of windows that include it,
    which is the size of each window (i.e., 32×32 = 1024). To recover the true
    count, we take the average over the redundant predictions. Our contribution is
    redundant counting instead of predicting a density map in order to average over
    errors. We also propose a novel deep neural network architecture adapted from
    the Inception family of networks called the Count-ception network. Together, our
    approach yields a 20% gain over the state-of-the-art method by Xie, Noble,
    and Zisserman in 2016.
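
    The redundant-counting idea can be illustrated with a small sketch. It
    builds an ideal count map directly from known object positions rather than
    from a regression network, and it assumes the image is padded by r - 1
    pixels on every side so that each pixel is covered by exactly r × r windows.
    The function names, the padding choice and the toy data are assumptions made
    for illustration, not details from the paper.

```python
def redundant_count_map(points, height, width, r):
    """Ideal count map for r x r windows slid with stride 1 over an image
    padded by r - 1 pixels on every side. Entry (i, j) is the number of
    objects inside the window whose top-left corner is (i, j) in padded
    coordinates."""
    out_h, out_w = height + r - 1, width + r - 1
    count_map = [[0] * out_w for _ in range(out_h)]
    for y, x in points:
        # An object at image position (y, x) lies inside the r * r windows
        # whose padded top-left corners are (y..y+r-1, x..x+r-1).
        for i in range(y, y + r):
            for j in range(x, x + r):
                count_map[i][j] += 1
    return count_map


def recover_count(count_map, r):
    """Each object was counted r * r times, so divide the total by r * r."""
    return sum(map(sum, count_map)) / (r * r)


if __name__ == "__main__":
    objects = [(0, 0), (2, 3), (4, 4)]  # toy object positions (row, col)
    cmap = redundant_count_map(objects, height=5, width=5, r=3)
    print(recover_count(cmap, r=3))
```

    With a learned regressor the per-window predictions are noisy, and dividing
    the summed map by the window area is what averages over those errors.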

    Jointly Optimizing Placement and Inference for Beacon-based Localization

    Charles Schaff, David Yunis, Ayan Chakrabarti, Matthew R. Walter
    Subjects: Robotics (cs.RO); Learning (cs.LG)

    The ability of robots to estimate their location is crucial for a wide
    variety of autonomous operations. In settings where GPS is unavailable, range-
    or bearing-only observations relative to a set of fixed beacons provide an
    effective means of estimating a robot’s location as it navigates. The accuracy
    of such a beacon-based localization system depends both on how beacons are
    spatially distributed in the environment, and how the robot’s location is
    inferred based on noisy measurements of range or bearing. However, it is
    computationally challenging to search for a placement and an inference strategy
    that, together, are optimal. Existing methods decouple these decisions,
    forgoing optimality for tractability. We propose a new optimization approach to
    jointly determine the beacon placement and inference algorithm. We model
    inference as a neural network and incorporate beacon placement as a
    differentiable neural layer. This formulation allows us to optimize placement
    and inference by jointly training the inference network and beacon layer. We
    evaluate our method on different localization problems and demonstrate
    performance that exceeds hand-crafted baselines.

    Sequence-to-Sequence Models Can Directly Transcribe Foreign Speech

    Ron J. Weiss, Jan Chorowski, Navdeep Jaitly, Yonghui Wu, Zhifeng Chen
    Comments: Submitted to Interspeech 2017
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    We present a recurrent encoder-decoder deep neural network architecture that
    directly translates speech in one language into text in another. The model does
    not explicitly transcribe the speech into text in the source language, nor does
    it require supervision from the ground truth source language transcription
    during training. We apply a slightly modified sequence-to-sequence with
    attention architecture that has previously been used for speech recognition and
    show that it can be repurposed for this more complex task, illustrating the
    power of attention-based models. A single model trained end-to-end obtains
    state-of-the-art performance on the Fisher Callhome Spanish-English speech
    translation task, outperforming a cascade of independently trained
    sequence-to-sequence speech recognition and machine translation models by 1.8
    BLEU points on the Fisher test set. In addition, we find that making use of the
    training data in both languages by multi-task training sequence-to-sequence
    speech translation and recognition models with a shared encoder network can
    improve performance by a further 1.4 BLEU points.


    Information Theory

    Generalized Gabidulin codes over fields of any characteristic

    Daniel Augot, Pierre Loidreau, Gwezheneg Robert
    Subjects: Information Theory (cs.IT)

    We generalize Gabidulin codes to the case of infinite fields, possibly with
    characteristic zero. For this purpose, we consider an abstract field extension
    and any automorphism in the Galois group. We derive some conditions on the
    automorphism to be able to have a proper notion of rank metric which is in
    coherence with linearized polynomials. Under these conditions, we generalize
    Gabidulin codes and provide a decoding algorithm which decodes both errors and
    erasures. Then, we focus on codes over integer rings and how to decode them. We
    are then faced with the problem of the exponential growth of intermediate
    values, and to circumvent the problem, it is natural to propose to do
    computations modulo a prime ideal. For this, we study the reduction of
    generalized Gabidulin codes modulo a prime ideal, and
    show they are classical Gabidulin codes. As a consequence, knowing side
    information on the size of the errors or the message, we can reduce the
    decoding problem over the integer ring to a decoding problem over a finite
    field. We also give examples and timings.

    Multiple Access for 5G New Radio: Categorization, Evaluation, and Challenges

    Hyunsoo Kim, Yeon-Geun Lim, Chan-Byoung Chae, Daesik Hong
    Comments: 9 pages, 4 figures, 2 tables
    Subjects: Information Theory (cs.IT)

    Next generation wireless networks require massive uplink connections as well
    as high spectral efficiency. It is well known that, theoretically, it is not
    possible to achieve the sum capacity of multi-user communications with
    orthogonal multiple access. To meet the challenging requirements of next
    generation networks, researchers have explored non-orthogonal and overloaded
    transmission technologies, known as new radio multiple access (NR-MA)
    schemes, for fifth generation (5G) networks. In this article, we discuss the key
    features of the promising NR-MA schemes for the massive uplink connections. The
    candidate schemes of NR-MA can be characterized by multiple access signatures
    (MA-signatures), such as codebook, sequence, and interleaver/scrambler. At the
    receiver side, advanced multi-user detection (MUD) schemes are employed to
    extract each user’s data from non-orthogonally superposed data according to
    MA-signatures. Through link-level simulations, we compare the performances of
    NR-MA candidates under the same conditions. We further evaluate the sum-rate
    performance of the NR-MA schemes using a system-level simulator based on a
    3-dimensional (3D) ray-tracing tool that reflects realistic environments.
    Lastly, we offer tips for system operation and call attention to the
    remaining technical challenges.

    One- and Two-Way Relay Optimization for MIMO Interference Networks

    Muhammad R A Khandaker, Kai-Kit Wong
    Comments: Accepted in EURASIP Journal on Advances in Signal Processing
    Journal-ref: EURASIP J. Adv. Signal Process., 2017:24
    Subjects: Information Theory (cs.IT); Emerging Technologies (cs.ET)

    This paper considers multiple-input multiple-output (MIMO) relay
    communication in multi-cellular (interference) systems in which MIMO
    source-destination pairs communicate simultaneously. It is assumed that due to
    severe attenuation and/or shadowing effects, communication links can be
    established only with the aid of a relay node. The aim is to minimize the
    maximal mean-square-error (MSE) among all the receiving nodes under constrained
    source and relay transmit powers. Both one- and two-way amplify-and-forward
    (AF) relaying mechanisms are considered. Since the exactly optimal solution for
    this practically appealing problem is intractable, we first propose optimizing
    the source, relay, and receiver matrices in an alternating fashion. Then we
    contrive a simplified semidefinite programming (SDP) solution based on the
    error covariance matrix decomposition technique, avoiding the high complexity
    of the iterative process. Numerical results reveal the effectiveness of the
    proposed schemes.

    Group Cooperation with Optimal Resource Allocation in Wireless Powered Communication Networks

    Ke Xiong, Chen Chen, Gang Qu, Pingyi Fan (IEEE), Khaled Ben Letaief
    Comments: 13 pages, 14 figures, to appear in IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    This paper considers a wireless powered communication network (WPCN) with
    group cooperation, where two communication groups cooperate with each other via
    wireless power transfer and time sharing to fulfill their expected information
    delivery and achieve “win-win” collaboration. To explore the system
    performance limits, we formulate optimization problems to respectively maximize
    the weighted sum-rate and minimize the total consumed power. The time
    assignment, beamforming vector and power allocation are jointly optimized under
    available power and quality of service requirement constraints of both groups.
    For the WSR-maximization, both fixed and flexible power scenarios are
    investigated. As all problems are non-convex and have no known solution
    methods, we solve them by using proper variable substitutions and the
    semi-definite relaxation. We theoretically prove that our proposed solution
    method guarantees the global optimum for each problem. Numerical results are
    presented to show the system performance behaviors, which provide some useful
    insights for future WPCN design. The results show that in such a group
    cooperation-aware WPCN, optimal time assignment has a greater effect on
    system performance than other factors.

    Physical Layer Security in Wireless Ad Hoc Networks Under A Hybrid Full-/Half-Duplex Receiver Deployment Strategy

    Tong-Xing Zheng, Hui-Ming Wang, Jinhong Yuan, Zhu Han, Moon Ho Lee
    Comments: Journal paper, double-column 12 pages, 9 figures, accepted by IEEE Transactions on Wireless Communications, 2017
    Subjects: Information Theory (cs.IT)

    This paper studies physical layer security in a wireless ad hoc network with
    numerous legitimate transmitter-receiver pairs and eavesdroppers. A hybrid
    full-/half-duplex receiver deployment strategy is proposed to secure legitimate
    transmissions, by letting a fraction of legitimate receivers work in the
    full-duplex (FD) mode sending jamming signals to confuse eavesdroppers upon
    their information receptions, and letting the other receivers work in the
    half-duplex mode just receiving their desired signals. The objective of this
    paper is to choose properly the fraction of FD receivers for achieving the
    optimal network security performance. Both accurate expressions and tractable
    approximations for the connection outage probability and the secrecy outage
    probability of an arbitrary legitimate link are derived, based on which the
    area secure link number, network-wide secrecy throughput and network-wide
    secrecy energy efficiency are optimized respectively. Various insights into the
    optimal fraction are further developed and its closed-form expressions are also
    derived under perfect self-interference cancellation or in a dense network. It
    is concluded that the fraction of FD receivers triggers a non-trivial trade-off
    between reliability and secrecy, and the proposed strategy can significantly
    enhance the network security performance.

    Some new bounds of placement delivery arrays

    X. Niu, H. Cao
    Comments: Coded caching scheme, placement delivery array, optimal
    Subjects: Information Theory (cs.IT)

    Coded caching is a technique that reduces the load during peak traffic
    times in a wireless network system. The placement delivery array (PDA for short)
    was first introduced by Yan et al. It can be used to design coded caching schemes.
    In this paper, we prove some lower bounds of PDA on the element and some lower
    bounds of PDA on the column. We also give some constructions for optimal PDAs.

    A categorical characterization of relative entropy on Polish spaces

    Nicolas Gagne, Prakash Panangaden
    Comments: 19 pages
    Subjects: Information Theory (cs.IT)

    We give a categorical treatment, in the spirit of Baez and Fritz, of relative
    entropy for probability distributions defined on Polish spaces. We define a
    category called PolStat suitable for reasoning about statistical inference on
    Polish spaces. We define relative entropy as a functor into Lawvere’s category
    \([0,\infty]\) and we show convexity, lower semicontinuity and uniqueness.

    A Unified Ensemble of Concatenated Convolutional Codes

    Saeedeh Moloudi, Michael Lentmaier, Alexandre Graell i Amat
    Subjects: Information Theory (cs.IT)

    We introduce a unified ensemble for turbo-like codes (TCs) that contains the
    four main classes of TCs: parallel concatenated codes, serially concatenated
    codes, hybrid concatenated codes, and braided convolutional codes. We show that
    for each of the original classes of TCs, it is possible to find an equivalent
    ensemble by proper selection of the design parameters in the unified ensemble.
    We also derive the density evolution (DE) equations for this ensemble over the
    binary erasure channel. The thresholds obtained from the DE indicate that the
    TC ensembles from the unified ensemble have similar asymptotic behavior to the
    original TC ensembles.

    Denoising-based Turbo Compressed Sensing

    Zhipeng Xue, Junjie Ma, Xiaojun Yuan
    Comments: 11 pages, 10 figures
    Subjects: Information Theory (cs.IT)

    Turbo compressed sensing (Turbo-CS) is an efficient iterative algorithm for
    sparse signal recovery with partial orthogonal sensing matrices. In this paper,
    we extend the Turbo-CS algorithm to solve compressed sensing problems involving
    more general signal structure, including compressive image recovery and
    low-rank matrix recovery. A main difficulty for such an extension is that the
    original Turbo-CS algorithm requires prior knowledge of the signal distribution
    that is usually unavailable in practice. To overcome this difficulty, we
    propose to redesign the Turbo-CS algorithm by employing a generic denoiser that
    does not depend on the prior distribution and hence the name denoising-based
    Turbo-CS (D-Turbo-CS). We then derive the extrinsic information for a generic
    denoiser by following the Turbo-CS principle. Based on that, we optimize the
    parametric extrinsic denoisers to minimize the output mean-square error (MSE).
    Explicit expressions are derived for the extrinsic SURE-LET denoiser used in
    compressive image denoising and also for the singular value thresholding (SVT)
    denoiser used in low-rank matrix denoising. We find that the dynamics of
    D-Turbo-CS can be well described by a scalar recursion called MSE evolution,
    similar to the case for Turbo-CS. Numerical results demonstrate that D-Turbo-CS
    considerably outperforms the counterpart algorithms in both reconstruction
    quality and running time.

    Multipair Massive MIMO Relaying Systems with One-Bit ADCs and DACs

    Chuili Kong, Amine Mezghani, Caijun Zhong, A. Lee Swindlehurst, Zhaoyang Zhang
    Comments: 14 pages, 10 figures, submitted to IEEE Trans. Signal Processing
    Subjects: Information Theory (cs.IT)

    This paper considers a multipair amplify-and-forward massive MIMO relaying
    system with one-bit ADCs and one-bit DACs at the relay. The channel state
    information is estimated via pilot training, and then utilized by the relay to
    perform simple maximum-ratio combining/maximum-ratio transmission processing.
    Leveraging the Bussgang decomposition, an exact achievable rate is derived
    for the system with correlated quantization noise. Based on this, a closed-form
    asymptotic approximation for the achievable rate is presented, thereby enabling
    efficient evaluation of the impact of key parameters on the system performance.
    Furthermore, power scaling laws are characterized to study the potential energy
    efficiency associated with deploying massive one-bit antenna arrays at the
    relay. In addition, a power allocation strategy is designed to compensate for
    the rate degradation caused by the coarse quantization. Our results suggest
    that the quality of the channel estimates depends on the specific orthogonal
    pilot sequences that are used, contrary to unquantized systems where any set of
    orthogonal pilot sequences gives the same result. Moreover, the sum rate gap
    between the double-quantized relay system and an ideal non-quantized system is
    a moderate factor of \(4/\pi^2\) in the low power regime.

    Regularized Gradient Descent: A Nonconvex Recipe for Fast Joint Blind Deconvolution and Demixing

    Shuyang Ling, Thomas Strohmer
    Subjects: Information Theory (cs.IT)

    We study the question of extracting a sequence of functions
    \(\{\boldsymbol{f}_i, \boldsymbol{g}_i\}_{i=1}^s\) from observing only the sum of
    their convolutions, i.e., from \(\boldsymbol{y} = \sum_{i=1}^s
    \boldsymbol{f}_i \ast \boldsymbol{g}_i\). While convex optimization techniques
    are able to solve this joint blind deconvolution-demixing problem provably and
    robustly under certain conditions, for medium-size or large-size problems we
    need computationally faster methods without sacrificing the benefits of
    mathematical rigor that come with convex methods. In this paper we present a
    non-convex algorithm which guarantees exact recovery under conditions that are
    competitive with convex optimization methods, with the additional advantage of
    being computationally much more efficient. Our two-step algorithm converges to
    the global minimum linearly and is also robust in the presence of additive
    noise. While the derived performance bounds are suboptimal in terms of the
    information-theoretic limit, numerical simulations show remarkable performance
    even if the number of measurements is close to the number of degrees of
    freedom. We discuss an application of the proposed framework in wireless
    communications in connection with the Internet-of-Things.
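
    The measurement model in this abstract, a single observation formed as the
    sum of s convolutions, can be sketched as follows. This shows only the
    forward model, not the recovery algorithm, and it assumes circular
    convolution on equal-length real vectors; that convention and the function
    names are assumptions made here for illustration.

```python
def circular_convolve(f, g):
    """Circular convolution of two equal-length sequences."""
    n = len(f)
    return [sum(f[k] * g[(t - k) % n] for k in range(n)) for t in range(n)]


def mixed_observation(pairs):
    """Return y = sum_i f_i * g_i for a list of (f_i, g_i) pairs."""
    n = len(pairs[0][0])
    y = [0.0] * n
    for f, g in pairs:
        conv = circular_convolve(f, g)
        y = [a + b for a, b in zip(y, conv)]
    return y


if __name__ == "__main__":
    # Two toy signal pairs; only their summed convolutions are observed.
    pairs = [([1, 0, 0], [1, 2, 3]), ([0, 1, 0], [1, 0, 0])]
    print(mixed_observation(pairs))
```

    The recovery problem studied in the paper runs in the opposite direction:
    given only y, estimate every f_i and g_i.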

    Computing the capacity of a Markoff channel with perfect feedback is PSPACE-hard

    Mukul Agarwal, Sanjoy Mitter
    Subjects: Information Theory (cs.IT)

    It will be proved that computing the capacity of a Markoff channel with
    perfect feedback is PSPACE-hard.

    Channel Impulse Response-based Distributed Physical Layer Authentication

    Ammar Mahmood, Waqas Aman, M. Ozair Iqbal, M. Mahboob Ur Rahman, Qammer H. Abbasi
    Comments: 6 pages, 5 figures, accepted for presentation at IEEE VTC 2017 Spring
    Subjects: Information Theory (cs.IT)

    In this preliminary work, we study the problem of distributed
    authentication in wireless networks. Specifically, we consider a system where
    multiple Bob (sensor) nodes listen to a channel and report their
    correlated measurements to a Fusion Center (FC) which makes the ultimate
    authentication decision. For the feature-based authentication at the FC,
    channel impulse response has been utilized as the device fingerprint.
    Additionally, the correlated measurements by the Bob nodes allow us to
    invoke compressed sensing to significantly reduce the reporting overhead to the
    FC. Numerical results show that: i) the detection performance of the FC is
    superior to that of a single Bob-node, ii) compressed sensing leads to at least
    20% overhead reduction on the reporting channel at the expense of a small
    (<1 dB) SNR margin to achieve the same detection performance.

    On period polynomials of degree \(2^m\) and weight distributions of certain irreducible cyclic codes

    Ioulia N. Baoulina
    Comments: 17 pages, 5 tables
    Subjects: Number Theory (math.NT); Information Theory (cs.IT)

    We explicitly determine the values of reduced cyclotomic periods of order
    \(2^m\), \(m \ge 4\), for finite fields of characteristic \(p \equiv 3\) or
    \(5 \pmod{8}\). These evaluations are applied to obtain explicit factorizations of
    the corresponding reduced period polynomials. As another application, the
    weight distributions of certain irreducible cyclic codes are described.

    Multi-sensor Transmission Management for Remote State Estimation under Coordination

    Kemi Ding, Yuzhe Li, Subhrakanti Dey, Ling Shi
    Subjects: Methodology (stat.ME); Information Theory (cs.IT)

    This paper considers the remote state estimation in a cyber-physical system
    (CPS) using multiple sensors. The measurements of each sensor are transmitted
    to a remote estimator over a shared channel, where simultaneous transmissions
    from other sensors are regarded as interference signals. In such a competitive
    environment, each sensor needs to choose its transmission power for sending
    data packets, taking into account other sensors’ behavior. To model this
    interactive decision-making process among the sensors, we introduce a
    multi-player non-cooperative game framework. To overcome the inefficiency
    arising from the Nash equilibrium (NE) solution, we propose a correlation
    policy, along with the notion of correlation equilibrium (CE). An analytical
    comparison of the game value between the NE and the CE is provided,
    with/without the power expenditure constraints for each sensor. Also, numerical
    simulations demonstrate the comparison results.

    TCP in 5G mmWave Networks: Link Level Retransmissions and MP-TCP

    Michele Polese, Rittwik Jana, Michele Zorzi
    Comments: 6 pages, 11 figures, accepted for presentation at the 2017 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)
    Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

    MmWave communications, one of the cornerstones of future 5G mobile networks,
    are characterized at the same time by a potential multi-gigabit capacity and by
    a very dynamic channel, sensitive to blockage, wide fluctuations in the
    received signal quality, and possibly also sudden link disruption. While the
    performance of physical and MAC layer schemes that address these issues has
    been thoroughly investigated in the literature, the complex interactions
    between mmWave links and transport layer protocols such as TCP are still
    relatively unexplored. This paper uses the ns-3 mmWave module, with its channel
    model based on real measurements in New York City, to analyze the performance
    of the Linux TCP/IP stack (i) with and without link-layer retransmissions,
    showing that they are fundamental to reach a high TCP throughput on mmWave
    links and (ii) with Multipath TCP (MP-TCP) over multiple LTE and mmWave links,
    illustrating which are the throughput-optimal combinations of secondary paths
    and congestion control algorithms in different conditions.

    Analyzing Evolving Stories in News Articles

    Roberto Camacho Barranco, Arnold P. Boedihardjo, M. Shahriar Hossain
    Comments: submitted to KDD 2017, 9 pages, 10 figures
    Subjects: Information Retrieval (cs.IR); Information Theory (cs.IT)

    There is an overwhelming number of news articles published every day around
    the globe. Following the evolution of a news story is a difficult task given
    that no mechanism is available to track back in time to study the
    diffusion of the relevant events in digital news feeds. The techniques
    developed so far to extract meaningful information from a massive corpus rely
    on similarity search, which results in a myopic loopback to the same topic
    without providing the needed insights to hypothesize the origin of a story that
    may be completely different from today's news. In this paper, we present an
    algorithm that mines historical data to detect the origin of an event, segments
    the timeline into disjoint groups of coherent news articles, and outlines the
    most important documents in a timeline with a soft probability to provide a
    better understanding of the evolution of a story. Qualitative and quantitative
    approaches to evaluate our framework demonstrate that our algorithm discovers
    statistically significant and meaningful stories in reasonable time.
    Additionally, a relevant case study on a set of news articles demonstrates that
    the generated output of the algorithm holds the promise to aid prediction of
    future entities in a story.



