
    arXiv Paper Daily: Tue, 30 May 2017

    Published by 我爱机器学习 (52ml.net) on 2017-05-30 00:00:00

    Neural and Evolutionary Computing

    Deep Complex Networks

    Chiheb Trabelsi, Olexa Bilaniuk, Dmitriy Serdyuk, Sandeep Subramanian, João Felipe Santos, Soroush Mehri, Negar Rostamzadeh, Yoshua Bengio, Christopher J Pal
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    At present, the vast majority of building blocks, techniques, and
    architectures for deep learning are based on real-valued operations and
    representations. However, recent work on recurrent neural networks and older
    fundamental theoretical analysis suggests that complex numbers could have a
    richer representational capacity and could also facilitate noise-robust memory
    retrieval mechanisms. Despite their attractive properties and potential for
    opening up entirely new neural architectures, complex-valued deep neural
    networks have been marginalized due to the absence of the building blocks
    required to design such models. In this work, we provide the key atomic
    components for complex-valued deep neural networks and apply them to
    convolutional feed-forward networks. More precisely, we rely on complex
    convolutions and present algorithms for complex batch normalization and complex
    weight initialization strategies for complex-valued neural nets, and we use them
    in experiments with end-to-end training schemes. We demonstrate that such
    complex-valued models are able to achieve comparable or better performance than
    their real-valued counterparts. We test deep complex models on several computer
    vision tasks and on music transcription using the MusicNet dataset, where we
    achieve state-of-the-art performance.
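
    As a concrete illustration of the complex building blocks involved, a
    complex filter W = A + iB applied to a complex input h = x + iy can be
    assembled from two real-valued convolutions. This is a minimal sketch
    only; layer sizes and padding are illustrative, not the paper's
    configuration.

```python
# Complex convolution from two real convolutions:
# W*h = (A+iB)*(x+iy) = (A*x - B*y) + i(B*x + A*y)
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3, padding=1):
        super().__init__()
        self.conv_re = nn.Conv2d(in_ch, out_ch, k, padding=padding)  # A
        self.conv_im = nn.Conv2d(in_ch, out_ch, k, padding=padding)  # B

    def forward(self, x_re, x_im):
        out_re = self.conv_re(x_re) - self.conv_im(x_im)  # real part
        out_im = self.conv_im(x_re) + self.conv_re(x_im)  # imaginary part
        return out_re, out_im

x_re, x_im = torch.randn(1, 4, 8, 8), torch.randn(1, 4, 8, 8)
y_re, y_im = ComplexConv2d(4, 8)(x_re, x_im)
print(y_re.shape, y_im.shape)  # torch.Size([1, 8, 8, 8]) twice
```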

    Latent Intention Dialogue Models

    Tsung-Hsien Wen, Yishu Miao, Phil Blunsom, Steve Young
    Comments: Accepted at ICML 2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Developing a dialogue agent that is capable of making autonomous decisions
    and communicating by natural language is one of the long-term goals of machine
    learning research. Traditional approaches either rely on hand-crafting a small
    state-action set for applying reinforcement learning, which is not scalable, or
    construct deterministic models for learning dialogue sentences, which fail to
    capture natural conversational variability. In this paper, we propose a Latent
    Intention Dialogue Model (LIDM) that employs a discrete latent variable to
    learn underlying dialogue intentions in the framework of neural variational
    inference. In a goal-oriented dialogue scenario, these latent intentions can be
    interpreted as actions guiding the generation of machine responses, which can
    be further refined autonomously by reinforcement learning. The experimental
    evaluation of LIDM shows that the model outperforms published benchmarks on
    both corpus-based and human evaluation, demonstrating the effectiveness of
    discrete latent variable models for learning goal-oriented dialogues.
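
    For intuition, the core mechanism can be sketched in a few lines: a
    discrete latent intention is sampled from an inference distribution and
    conditions the response decoder, with its log-probability available for
    REINFORCE-style refinement. This is a toy illustration, not the authors'
    exact model; all module names and sizes are hypothetical.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

K, H, V = 10, 64, 1000               # latent intentions, hidden size, vocab

intent_net = nn.Linear(H, K)         # dialogue state -> intention logits
intent_emb = nn.Embedding(K, H)      # embed the sampled intention
decoder = nn.Linear(2 * H, V)        # (state, intention) -> token logits

state = torch.randn(1, H)            # stand-in for an encoded dialogue state
q_z = Categorical(logits=intent_net(state))
z = q_z.sample()                     # discrete latent intention
dec_in = torch.cat([state, intent_emb(z)], dim=-1)
token_logits = decoder(dec_in)
log_q = q_z.log_prob(z)              # usable for policy-gradient refinement
print(z.item(), token_logits.shape)
```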

    On Multilingual Training of Neural Dependency Parsers

    Michał Zapotoczny, Paweł Rychlikowski, Jan Chorowski
    Comments: preprint accepted into the TSD2017
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We show that a recently proposed neural dependency parser can be improved by
    joint training on multiple languages from the same family. The parser is
    implemented as a deep neural network whose only input is orthographic
    representations of words. In order to successfully parse, the network has to
    discover how linguistically relevant concepts can be inferred from word
    spellings. We analyze the representations of characters and words that are
    learned by the network to establish which properties of languages were
    accounted for. In particular we show that the parser has approximately learned
    to associate Latin characters with their Cyrillic counterparts and that it can
    group Polish and Russian words that have a similar grammatical function.
    Finally, we evaluate the parser on selected languages from the Universal
    Dependencies dataset and show that it is competitive with other recently
    proposed state-of-the-art methods, while having a simple structure.
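
    A minimal sketch of an orthographic word encoder of the kind described
    (sizes are illustrative and the paper's exact architecture may differ):
    characters are embedded and composed, here by an LSTM, into a word vector
    from which the parser must infer linguistic properties.

```python
import torch
import torch.nn as nn

class CharWordEncoder(nn.Module):
    def __init__(self, n_chars=128, char_dim=16, word_dim=64):
        super().__init__()
        self.emb = nn.Embedding(n_chars, char_dim)
        self.lstm = nn.LSTM(char_dim, word_dim, batch_first=True)

    def forward(self, char_ids):          # (n_words, max_word_len)
        _, (h, _) = self.lstm(self.emb(char_ids))
        return h[-1]                      # (n_words, word_dim)

words = torch.randint(0, 128, (3, 10))    # 3 words as character ids
print(CharWordEncoder()(words).shape)     # torch.Size([3, 64])
```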

    Implicit Variational Inference with Kernel Density Ratio Fitting

    Jiaxin Shi, Shengyang Sun, Jun Zhu
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Recent progress in variational inference has paid much attention to the
    flexibility of variational posteriors. Work has been done to use implicit
    distributions, i.e., distributions without tractable likelihoods, as the
    variational posterior. However, existing methods for implicit posteriors still
    face challenges of noisy estimation and can hardly scale to high-dimensional
    latent variable models. In this paper, we present an implicit variational
    inference approach with kernel density ratio fitting that addresses these
    challenges. To our knowledge, this is the first time implicit variational
    inference has been successfully applied to Bayesian neural networks, and it
    shows promising results on both regression and classification tasks.
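
    The density-ratio idea at the heart of implicit variational inference can
    be illustrated with the classification trick on a toy problem (note the
    paper instead fits the ratio with kernel methods): a probabilistic
    classifier trained to separate samples of q from samples of p recovers
    log q(z) - log p(z) as its logit.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
q = rng.normal(loc=1.0, size=(5000, 1))   # "variational posterior" samples
p = rng.normal(loc=0.0, size=(5000, 1))   # "prior" samples

X = np.vstack([q, p])
y = np.concatenate([np.ones(5000), np.zeros(5000)])
clf = LogisticRegression().fit(X, y)

z = np.array([[0.5]])
est = clf.decision_function(z)[0]         # estimated log q(z) - log p(z)
true = (-0.5 * (z - 1) ** 2 + 0.5 * z ** 2).item()  # exact log-ratio: z - 0.5
print(est, true)                          # both close to 0.0
```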

    BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet

    Haojin Yang, Martin Fritzsche, Christian Bartz, Christoph Meinel
    Comments: 4 pages
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Binary Neural Networks (BNNs) can drastically reduce memory size and accesses
    by applying bit-wise operations instead of standard arithmetic operations.
    They can therefore significantly improve efficiency and lower the energy
    consumption at runtime, which enables the application of state-of-the-art deep
    learning models on low power devices. BMXNet is an open-source BNN library
    based on MXNet, which supports both XNOR-Networks and Quantized Neural
    Networks. The developed BNN layers can be seamlessly applied with other
    standard library components and work in both GPU and CPU mode. BMXNet is
    maintained and developed by the multimedia research group at Hasso Plattner
    Institute and released under Apache license. Extensive experiments validate the
    efficiency and effectiveness of our implementation. The BMXNet library, several
    sample projects, and a collection of pre-trained binary deep models are
    available for download at this https URL
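
    The bit-wise trick underlying XNOR-style layers can be checked in a few
    lines: for vectors with entries in {-1, +1}, a dot product reduces to
    counting bit disagreements (numpy here is a stand-in for packed bit
    operations on real hardware).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
a = rng.choice([-1, 1], size=n)
b = rng.choice([-1, 1], size=n)

bits_a = a < 0                          # encode -1 as bit 1
bits_b = b < 0
d = np.count_nonzero(bits_a ^ bits_b)   # number of disagreeing positions
assert n - 2 * d == a @ b               # dot product via XOR + popcount
print(n - 2 * d)
```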


    Computer Vision and Pattern Recognition

    Feature Incay for Representation Regularization

    Yuhui Yuan, Kuiyuan Yang, Chao Zhang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Softmax loss is widely used in deep neural networks for multi-class
    classification, where each class is represented by a weight vector, a sample is
    represented as a feature vector, and the feature vector has the largest
    projection on the weight vector of the correct category when the model
    correctly classifies a sample. To ensure generalization, weight decay that
    shrinks the weight norm is often used as a regularizer. Different from
    traditional learning algorithms, where features are fixed and only weights are
    tunable, in deep learning features are also tunable through representation learning.
    Thus, we propose feature incay to also regularize representation learning,
    which favors feature vectors with large norm when the samples can be correctly
    classified. With the feature incay, feature vectors are further pushed away
    from the origin along the direction of their corresponding weight vectors,
    which achieves better inter-class separability. In addition, the proposed
    feature incay encourages intra-class compactness along the directions of weight
    vectors by increasing small feature norms faster than large ones.
    Empirical results on MNIST, CIFAR10 and CIFAR100 demonstrate feature incay can
    improve the generalization ability.
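
    A hedged sketch of a feature-incay style regularizer (the exact
    functional form in the paper may differ): penalizing the reciprocal
    squared norm of correctly classified features rewards large norms and,
    because the gradient is larger for small norms, grows them faster.

```python
import torch
import torch.nn.functional as F

def loss_with_incay(logits, features, labels, lam=0.1, eps=1e-6):
    ce = F.cross_entropy(logits, labels)
    correct = logits.argmax(dim=1) == labels   # only well-classified samples
    if correct.any():
        norms_sq = features[correct].pow(2).sum(dim=1)
        incay = (1.0 / (norms_sq + eps)).mean()   # large when norms are small
    else:
        incay = features.new_zeros(())
    return ce + lam * incay

logits = torch.randn(8, 10, requires_grad=True)
feats = torch.randn(8, 32, requires_grad=True)
labels = torch.randint(0, 10, (8,))
print(loss_with_incay(logits, feats, labels))
```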

    Pose-Aware Person Recognition

    Vijay Kumar, Anoop Namboodiri, Manohar Paluri, C V Jawahar
    Comments: To appear in CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Person recognition methods that use multiple body regions have shown
    significant improvements over traditional face-based recognition. One of the
    primary challenges in full-body person recognition is the extreme variation in
    pose and view point. In this work, (i) we present an approach that tackles pose
    variations utilizing multiple models that are trained on specific poses, and
    combined using pose-aware weights during testing. (ii) For learning a person
    representation, we propose a network that jointly optimizes a single loss over
    multiple body regions. (iii) Finally, we introduce new benchmarks to evaluate
    person recognition in diverse scenarios and show significant improvements over
    previously proposed approaches on all the benchmarks including the photo album
    setting of PIPA.

    Beyond Counting: Comparisons of Density Maps for Crowd Analysis Tasks – Counting, Detection, and Tracking

    Di Kang, Zheng Ma, Antoni B. Chan
    Comments: 14 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    For crowded scenes, the accuracy of object-based computer vision methods
    declines when the images are low-resolution and objects have severe occlusions.
    Taking counting methods as an example, almost all the recent state-of-the-art
    counting methods bypass explicit detection and adopt regression-based methods
    to directly count the objects of interest. Among regression-based methods,
    density map estimation, where the number of objects inside a subregion is the
    integral of the density map over that subregion, is especially promising
    because it preserves spatial information, which makes it useful for both
    counting and localization (detection and tracking). With the power of deep
    convolutional neural networks (CNNs), the counting performance has improved
    steadily. The goal of this paper is to evaluate density maps generated by
    density estimation methods on a variety of crowd analysis tasks, including
    counting, detection, and tracking. Most existing CNN methods produce density
    maps with resolution that is smaller than the original images, due to the
    downsample strides in the convolution/pooling operations. To produce an
    original-resolution density map, we also evaluate a classical CNN that uses a
    sliding window regressor to predict the density for every pixel in the image.
    We also consider a fully convolutional (FCNN) adaptation, with skip connections
    from lower convolutional layers to compensate for loss in spatial information
    during upsampling. In our experiments, we found that the lower-resolution
    density maps sometimes have better counting performance. In contrast, the
    original-resolution density maps improved localization tasks, such as detection
    and tracking, compared to bilinearly upsampling the lower-resolution density
    maps. Finally, we also propose several metrics for measuring the quality of a
    density map, and relate them to experimental results on counting and
    localization.
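
    For reference, the standard density-map construction this line of work
    builds on (positions and sigma below are illustrative): a unit impulse at
    each annotated object position, blurred by a Gaussian, so the map
    integrates to the object count and region sums give region counts.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

H, W = 120, 160
points = [(30, 40), (60, 90), (100, 150)]   # (row, col) annotations

density = np.zeros((H, W))
for r, c in points:
    density[r, c] += 1.0
density = gaussian_filter(density, sigma=4.0)

print(density.sum())           # ~3.0, the total object count
print(density[:, :80].sum())   # count inside a subregion
```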

    On the Power Spectral Density Applied to the Analysis of Old Canvases

    Francisco J. Simois, Juan J. Murillo-Fuentes
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Spectral Theory (math.SP)

    A routine task for art historians is painting diagnostics, such as dating or
    attribution. Signal processing of the X-ray image of a canvas provides useful
    information about its fabric. However, previous methods may fail when very old
    and deteriorated artworks or simply canvases of small size are studied. We
    present a new framework to analyze and further characterize the paintings from
    their radiographs. First, we start from a general analysis of lattices and
    provide new unifying results about the theoretical spectra of weaves. Then, we
    use these results to infer the main structure of the fabric, like the type of
    weave and the thread densities. We propose a practical estimation of these
    theoretical results from paintings with the averaged power spectral density
    (PSD), which provides a more robust tool. Furthermore, we found that the PSD
    provides a fingerprint that characterizes the whole canvas. We search and
    discuss some distinctive features we may find in that fingerprint. We apply
    these results to several masterpieces of the 17th and 18th centuries from the
    Museo Nacional del Prado to show that this approach yields accurate results in
    thread counting and is very useful for comparing paintings, even in situations
    where previous methods fail.
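
    A minimal sketch of an averaged power spectral density of the kind such
    an analysis relies on (patch size, step and window are illustrative):
    averaging |FFT|^2 over windowed patches gives a 2D Welch-style estimate
    in which the weave's spatial frequencies appear as peaks.

```python
import numpy as np

def averaged_psd(img, patch=128, step=64):
    win = np.hanning(patch)[:, None] * np.hanning(patch)[None, :]
    acc, n = np.zeros((patch, patch)), 0
    for r in range(0, img.shape[0] - patch + 1, step):
        for c in range(0, img.shape[1] - patch + 1, step):
            p = img[r:r + patch, c:c + patch] * win
            acc += np.abs(np.fft.fft2(p)) ** 2
            n += 1
    return np.fft.fftshift(acc / n)   # low frequencies at the centre

img = np.random.rand(512, 512)        # stand-in for a canvas radiograph
print(averaged_psd(img).shape)
```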

    Towards Metamerism via Foveated Style Transfer

    Arturo Deza, Aditya Jonnalagadda, Miguel Eckstein
    Comments: Submitted to NIPS 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

    Given the recent successes of deep learning applied to style transfer and
    texture synthesis, we propose a new theoretical framework to construct visual
    metamers: a family of perceptually identical, yet physically different
    images. We review work both in neuroscience related to metameric stimuli and
    in computer vision research on style transfer. We propose our NeuroFovea
    metamer model that is based on a mixture of peripheral representations and
    style transfer forward-pass algorithms for any image from the recent
    work on Adaptive Instance Normalization (Huang & Belongie). Our model is
    parametrized by a VGG-Net versus a set of joint statistics of complex wavelet
    coefficients which allows us to encode images in high dimensional space and
    interpolate between the content and texture information. We empirically show
    that human observers discriminate our metamers at a similar rate as the
    metamers of Freeman & Simoncelli (FS). In addition, our NeuroFovea metamer
    model gives us the benefit of near real-time generation, which presents a
    ×1000 speed-up compared to previous work. Critically, psychophysical
    studies show that both the FS and NeuroFovea metamers are discriminable from
    the original images highlighting an important limitation of current metamer
    generation methods.
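
    The Adaptive Instance Normalization operation referenced above
    (Huang & Belongie) is compact enough to state directly: content features
    are re-normalized to match the per-channel statistics of the style
    features.

```python
import torch

def adain(content, style, eps=1e-5):
    # content, style: (N, C, H, W) feature maps from an encoder such as VGG
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean

c, s = torch.randn(1, 512, 32, 32), torch.randn(1, 512, 32, 32)
print(adain(c, s).shape)
```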

    Ensemble of Part Detectors for Simultaneous Classification and Localization

    Xiaopeng Zhang, Hongkai Xiong, Weiyao Lin, Qi Tian
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Part-based representation has been proven to be effective for a variety of
    visual applications. However, automatic discovery of discriminative parts
    without object/part-level annotations is challenging. This paper proposes a
    discriminative mid-level representation paradigm based on the responses of a
    collection of part detectors, which only requires the image-level labels.
    Towards this goal, we first develop a detector-based spectral clustering method
    to mine the representative and discriminative mid-level patterns for detector
    initialization. The advantage of the proposed pattern mining technology is that
    the distance metric based on detectors only focuses on discriminative details,
    and a set of such grouped detectors offers an effective way for consistent
    pattern mining. Relying on the discovered patterns, we further formulate the
    detector learning process as a confidence-loss sparse Multiple Instance
    Learning (cls-MIL) task, which considers the diversity of the positive samples,
    while avoiding drift away from the well-localized ones by assigning a confidence
    value to each positive sample. The responses of the learned detectors can form
    an effective mid-level image representation for both image classification and
    object localization. Experiments conducted on benchmark datasets demonstrate
    the superiority of our method over existing approaches.

    Data Driven Coded Aperture Design for Depth Recovery

    Prasan A Shedligeri, Sreyas Mohan, Kaushik Mitra
    Comments: 4 pages, 4 figures, ICIP 2017 conference accepted
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Coded apertures have been used for recovering an all-in-focus image and a layered
    depth map simultaneously from a single captured image. The non-trivial task of
    finding an optimal code has driven researchers to make some simplifying
    assumptions over the image distribution which may not hold under all practical
    scenarios. In this work we propose a data-driven approach to find the optimal
    code for depth recovery. We also learn a neural network that can efficiently
    recover the depth map given the coded aperture photograph. We jointly train a
    neural network for code selection as well as for depth estimation. We find that
    the code learned by the network performs better in all metrics that have been
    proposed previously.

    Robust Online Matrix Factorization for Dynamic Background Subtraction

    Hongwei Yong, Deyu Meng, Wangmeng Zuo, Lei Zhang
    Comments: 14 pages, 13 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose an effective online background subtraction method, which can be
    robustly applied to practical videos that have variations in both foreground
    and background. Different from previous methods which often model the
    foreground as Gaussian or Laplacian distributions, we model the foreground for
    each frame with a specific mixture of Gaussians (MoG) distribution, which is
    updated online frame by frame. Particularly, our MoG model in each frame is
    regularized by the learned foreground/background knowledge in previous frames.
    This makes our online MoG model highly robust, stable and adaptive to practical
    foreground and background variations. The proposed model can be formulated as a
    concise probabilistic MAP model, which can be readily solved by an EM algorithm.
    We further embed an affine transformation operator into the proposed model,
    which can be automatically adjusted to fit a wide range of video background
    transformations and make the method more robust to camera movements. Using
    the sub-sampling technique, the proposed method can be accelerated to execute
    more than 250 frames per second on average, meeting the requirement of
    real-time background subtraction for practical video processing tasks. The
    superiority of the proposed method is substantiated by extensive experiments
    implemented on synthetic and real videos, as compared with state-of-the-art
    online and offline background subtraction methods.
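
    For illustration only, a plain EM fit of a per-frame mixture of Gaussians
    to the foreground residual, without the paper's temporal regularization
    or affine alignment; the background estimate and component count are
    stand-ins.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
frame = rng.normal(size=(60, 80))
background = np.zeros((60, 80))            # stand-in background estimate

resid = (frame - background).reshape(-1, 1)
mog = GaussianMixture(n_components=3, random_state=0).fit(resid)
print(mog.means_.ravel(), mog.weights_)
```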

    Conditional CycleGAN for Attribute Guided Face Image Generation

    Yongyi Lu, Yu-Wing Tai, Chi-Keung Tang
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    State-of-the-art techniques in Generative Adversarial Networks (GANs) such as
    CycleGAN are able to learn the mapping of one image domain (X) to another image
    domain (Y) using unpaired image data. We extend CycleGAN to Conditional
    CycleGAN, such that the mapping from (X) to (Y) is subjected to an
    attribute condition (Z). Using face image generation as an application example,
    where (X) is a low resolution face image, (Y) is a high resolution face image,
    and (Z) is a set of attributes related to facial appearance (e.g. gender, hair
    color, smile), we present our method to incorporate (Z) into the network, such
    that the hallucinated high-resolution face image (Y’) not only satisfies the
    low-resolution constraint inherent in (X), but also the attribute condition
    prescribed by (Z). Using a face feature vector extracted from a face verification
    network as (Z), we demonstrate the efficacy of our approach on
    identity-preserving face image super-resolution. Our approach is general and
    applicable to high-quality face image generation where specific facial
    attributes can be controlled easily in the automatically generated results.
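
    A minimal sketch of the conditioning mechanism described (the generator
    below is a stand-in single layer, and how (Z) is injected in the actual
    network may differ): tile the attribute vector over the spatial grid and
    concatenate it with the low-resolution input.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 16, 16)                 # low-resolution face (X)
z = torch.randn(1, 10)                        # attribute vector (Z)
z_map = z[:, :, None, None].expand(-1, -1, 16, 16)
gen_in = torch.cat([x, z_map], dim=1)         # (1, 13, 16, 16)
generator = nn.Conv2d(13, 3, 3, padding=1)    # stand-in for the real G
print(generator(gen_in).shape)
```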

    L1-norm Error Function Robustness and Outlier Regularization

    Chris Ding, Bo Jiang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In many real-world applications, data come with corruptions, large errors or
    outliers. One popular approach is to use the L1-norm function. However, the
    robustness of the L1-norm function is still not well understood. In this paper, we
    present a new outlier regularization framework to understand and analyze the
    robustness of L1-norm function. There are two main features for the proposed
    outlier regularization. (1) A key property of outlier regularization is that
    how far an outlier lies away from its theoretically predicted value does not
    affect the final regularization and analysis results. (2) Another important
    feature of outlier regularization is that it has an equivalent continuous
    representation that closely relates to L1 function. This provides a new way to
    understand and analyze the robustness of L1 function. We apply our outlier
    regularization framework to PCA and propose an outlier regularized PCA (ORPCA)
    model. Compared to the trace-norm based robust PCA, ORPCA has several benefits:
    (1) It does not suffer from singular value suppression. (2) It can retain small high
    rank components which help retain fine details of data. (3) ORPCA can be
    computed more efficiently.
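
    One standard identity that illustrates the L1/outlier connection (the
    generic result, not necessarily the paper's exact formulation):
    minimizing 0.5*(r - o)^2 + lam*|o| over an explicit outlier variable o
    yields the Huber loss in r, which a quick numerical check confirms.

```python
import numpy as np

def huber(r, lam):
    return np.where(np.abs(r) <= lam,
                    0.5 * r ** 2,
                    lam * np.abs(r) - 0.5 * lam ** 2)

lam = 1.0
r = np.linspace(-4, 4, 9)
o = np.linspace(-10, 10, 200001)
numeric = np.array([np.min(0.5 * (ri - o) ** 2 + lam * np.abs(o)) for ri in r])
print(np.max(np.abs(numeric - huber(r, lam))))   # ~0 up to grid error
```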

    Dilated Residual Networks

    Fisher Yu, Vladlen Koltun, Thomas Funkhouser
    Comments: Published at the Conference on Computer Vision and Pattern Recognition (CVPR 2017)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Convolutional networks for image classification progressively reduce
    resolution until the image is represented by tiny feature maps in which the
    spatial structure of the scene is no longer discernible. Such loss of spatial
    acuity can limit image classification accuracy and complicate the transfer of
    the model to downstream applications that require detailed scene understanding.
    These problems can be alleviated by dilation, which increases the resolution of
    output feature maps without reducing the receptive field of individual neurons.
    We show that dilated residual networks (DRNs) outperform their non-dilated
    counterparts in image classification without increasing the model’s depth or
    complexity. We then study gridding artifacts introduced by dilation, develop an
    approach to removing these artifacts (“degridding”), and show that this further
    increases the performance of DRNs. In addition, we show that the accuracy
    advantage of DRNs is further magnified in downstream applications such as
    object localization and semantic segmentation.
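
    The mechanism is easy to see in isolation: with kernel size 3, dilation 2
    and padding 2, a convolution keeps the spatial resolution while its
    receptive field grows, in contrast to striding, which shrinks the feature
    map.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)
conv_dilated = nn.Conv2d(64, 64, kernel_size=3, dilation=2, padding=2)
print(conv_dilated(x).shape)   # torch.Size([1, 64, 56, 56]), resolution kept
```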

    Multi-channel Weighted Nuclear Norm Minimization for Real Color Image Denoising

    Jun Xu, Lei Zhang, David Zhang, Xiangchu Feng
    Comments: 9 pages, 4 figures, submitted
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Most of the existing denoising algorithms are developed for grayscale images,
    while it is not a trivial work to extend them for color image denoising because
    the noise statistics in R, G, B channels can be very different for real noisy
    images. In this paper, we propose a multi-channel (MC) optimization model for
    real color image denoising under the weighted nuclear norm minimization (WNNM)
    framework. We concatenate the RGB patches to make use of the channel
    redundancy, and introduce a weight matrix to balance the data fidelity of the
    three channels in consideration of their different noise statistics. The
    proposed MC-WNNM model does not have an analytical solution. We reformulate it
    into a linear equality-constrained problem and solve it with the alternating
    direction method of multipliers. Each alternating update step has a closed-form
    solution and the convergence can be guaranteed. Extensive experiments on both
    synthetic and real noisy image datasets demonstrate the superiority of the
    proposed MC-WNNM over state-of-the-art denoising methods.
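
    The numerical core of weighted nuclear norm models is weighted singular
    value shrinkage, sketched standalone below; the constant and the
    WNNM-style reweighting are illustrative, and the full MC-WNNM ADMM
    solver wraps this step with the channel weight matrix and constraints.

```python
import numpy as np

def weighted_svt(Y, c=2.0, eps=1e-6):
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    w = c / (s + eps)                   # larger singular values shrunk less
    s_shrunk = np.maximum(s - w, 0.0)
    return U @ (s_shrunk[:, None] * Vt)

Y = np.random.randn(64, 30)             # e.g., stacked similar patches
X = weighted_svt(Y)
print(np.linalg.svd(X, compute_uv=False)[:5])   # shrunk spectrum
```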

    Continuous Video to Simple Signals for Swimming Stroke Detection with Convolutional Neural Networks

    Brandon Victor, Zhen He, Stuart Morgan, Dino Miniutti
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In many sports, it is useful to analyse video of an athlete in competition
    for training purposes. In swimming, stroke rate is a common metric used by
    coaches, but it requires laborious labelling of each individual stroke. We show
    that using a Convolutional Neural Network (CNN) we can automatically detect
    discrete events in continuous video (in this case, swimming strokes). We create
    a CNN that learns a mapping from a window of frames to a point on a smooth 1D
    target signal, with peaks denoting the location of a stroke, evaluated as a
    sliding window. To our knowledge this process of training and utilizing a CNN
    has not been investigated before, either in sports or in fundamental computer
    vision research. Most research has been focused on action recognition and using
    it to classify many clips in continuous video for action localisation.

    In this paper we demonstrate our process works well on the task of detecting
    swimming strokes in the wild. However, without modifying the model architecture
    or training method, the process is also shown to work equally well on detecting
    tennis strokes, implying that this is a general process.

    The outputs of our system are surprisingly smooth signals that predict an
    arbitrary event at least as accurately as humans (manually evaluated from a
    sample of negative results). A number of different architectures are evaluated,
    pertaining to slightly different problem formulations and signal targets.
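
    A sketch of the kind of smooth 1D target this setup regresses (frame
    indices, width and threshold are illustrative): a Gaussian bump at each
    labelled stroke, from which stroke locations are recovered as local
    peaks.

```python
import numpy as np

n_frames = 300
stroke_frames = [40, 95, 150, 210, 265]
sigma = 5.0

t = np.arange(n_frames)
target = np.zeros(n_frames)
for f in stroke_frames:
    target = np.maximum(target, np.exp(-0.5 * ((t - f) / sigma) ** 2))

peaks = [i for i in range(1, n_frames - 1)
         if target[i] > target[i - 1] and target[i] >= target[i + 1]
         and target[i] > 0.5]
print(peaks)   # recovers the labelled stroke frames
```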

    Care about you: towards large-scale human-centric visual relationship detection

    Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, Anton van den Hengel
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Visual relationship detection aims to capture interactions between pairs of
    objects in images. Relationships between objects and humans represent a
    particularly important subset of this problem, with implications for challenges
    such as understanding human behaviour, and identifying affordances, amongst
    others. In addressing this problem we first construct a large-scale
    human-centric visual relationship detection dataset (HCVRD), which provides
    many more types of relationship annotation (nearly 10K categories) than
    previously released datasets.

    This large label space better reflects the reality of human-object
    interactions, but gives rise to a long-tail distribution problem, which in turn
    demands a zero-shot approach to labels appearing only in the test set. This is
    the first time this issue has been addressed. We propose a webly-supervised
    approach to these problems and demonstrate that the proposed model provides a
    strong baseline on our HCVRD dataset.

    Cross-modal Subspace Learning for Fine-grained Sketch-based Image Retrieval

    Peng Xu, Qiyue Yin, Yongye Huang, Yi-Zhe Song, Zhanyu Ma, Liang Wang, Tao Xiang, W. Bastiaan Kleijn, Jun Guo
    Comments: Accepted by Neurocomputing
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Sketch-based image retrieval (SBIR) is challenging due to the inherent
    domain-gap between sketch and photo. Compared with pixel-perfect depictions of
    photos, sketches are iconic, highly abstract renderings of the real world.
    Therefore, matching sketches and photos directly using low-level visual cues is
    insufficient, since a common low-level subspace that traverses semantically
    across the two modalities is non-trivial to establish. Most existing SBIR
    studies do not directly tackle this cross-modal problem. This naturally
    motivates us to explore the effectiveness of cross-modal retrieval methods in
    SBIR, which have been applied successfully to image-text matching. In this
    paper, we introduce and compare a series of state-of-the-art cross-modal
    subspace learning methods and benchmark them on two recently released
    fine-grained SBIR datasets. Through thorough examination of the experimental
    results, we demonstrate that subspace learning can effectively model
    the sketch-photo domain-gap. In addition, we draw a few key insights to drive
    future research.
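
    As a concrete example of one classical subspace method of the kind
    benchmarked here, CCA on synthetic paired features (real experiments
    would use learned sketch and photo descriptors): paired items become
    correlated in the shared space, enabling cross-modal retrieval.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
z = rng.normal(size=(200, 10))                     # shared latent structure
sketch = z @ rng.normal(size=(10, 64)) + 0.1 * rng.normal(size=(200, 64))
photo = z @ rng.normal(size=(10, 128)) + 0.1 * rng.normal(size=(200, 128))

cca = CCA(n_components=10)
s_proj, p_proj = cca.fit_transform(sketch, photo)

d = np.linalg.norm(p_proj - s_proj[0], axis=1)     # retrieve for sketch 0
print(int(d.argmin()))   # typically 0, the paired photo, in this toy setup
```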

    Vocabulary-informed Extreme Value Learning

    Yanwei Fu, HanZe Dong, Yu-feng Ma, Zhengjun Zhang, Xiangyang Xue
    Comments: nips 2017 submission
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Statistics Theory (math.ST); Machine Learning (stat.ML)

    The novel unseen classes can be formulated as the extreme values of known
    classes. This inspired the recent works on open-set recognition
    [Scheirer_2013_TPAMI, Scheirer_2014_TPAMIb, EVM], which, however, have no way
    of naming the novel unseen classes. To solve this problem, we propose the
    Extreme Value Learning (EVL) formulation to learn the mapping from visual
    feature to semantic space. To model the margin and coverage distributions of
    each class, Vocabulary-informed Learning (ViL) is adopted, using a vast open
    vocabulary in the semantic space. Essentially, by incorporating EVL and ViL,
    we for the first time propose a novel semantic embedding paradigm,
    Vocabulary-informed Extreme Value Learning (ViEVL), which embeds the visual
    features into semantic space in a probabilistic way. The learned embedding can
    be directly used to solve supervised learning, zero-shot and open-set
    recognition simultaneously. Experiments on two benchmark datasets demonstrate
    the effectiveness of the proposed frameworks.
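
    The extreme-value idea can be sketched in isolation (a hedged
    simplification, fitting a Weibull to one class's boundary distances
    rather than the paper's full ViEVL model): the fitted tail distribution
    gives a probabilistic class boundary for flagging unseen classes.

```python
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 16))           # features of one known class
centre = feats.mean(axis=0)
d = np.linalg.norm(feats - centre, axis=1)
tail = np.sort(d)[-50:]                      # extreme (boundary) distances

shape, loc, scale = weibull_min.fit(tail)
query_d = 8.0
p_beyond = 1 - weibull_min.cdf(query_d, shape, loc=loc, scale=scale)
print(p_beyond)   # ~0: the query lies outside the class's extreme boundary
```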

    Person Depth ReID: Robust Person Re-identification with Commodity Depth Sensors

    Nikolaos Karianakis, Zicheng Liu, Yinpeng Chen, Stefano Soatto
    Comments: 13 pages, 6 figures, 5 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This work targets person re-identification (ReID) from depth sensors such as
    Kinect. Since depth is invariant to illumination and less sensitive than color
    to day-by-day appearance changes, a natural question is whether depth is an
    effective modality for Person ReID, especially in scenarios where individuals
    wear differently colored clothes or are observed over a period of several months. We explore
    the use of recurrent Deep Neural Networks for learning high-level shape
    information from low-resolution depth images. In order to tackle the small
    sample size problem, we introduce regularization and a hard temporal attention
    unit. The whole model can be trained end to end with a hybrid supervised loss.
    We carry out a thorough experimental evaluation of the proposed method on three
    person re-identification datasets, which include side views, views from the top
    and sequences with varying degree of partial occlusion, pose and viewpoint
    variations. To that end, we introduce a new dataset with RGB-D and skeleton
    data. In a scenario where subjects are recorded after three months with new
    clothes, we demonstrate large performance gains attained using Depth ReID
    compared to a state-of-the-art Color ReID method. Finally, we show further
    improvements using the temporal attention unit in the multi-shot setting.

    Probabilistic Global Scale Estimation for MonoSLAM Based on Generic Object Detection

    Edgar Sucar, Jean-Bernard Hayet
    Comments: Int. Workshop on Visual Odometry, CVPR, (July 2017)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper proposes a novel method to estimate the global scale of a 3D
    reconstructed model within a Kalman filtering-based monocular SLAM algorithm.
    Our Bayesian framework integrates height priors over the detected objects
    belonging to a set of broad predefined classes, based on recent advances in
    fast generic object detection. Each observation is produced on a single frame,
    so we do not need a data association process along video frames. This is
    because we associate the height priors with the image region sizes at image
    places where the projections of map features fall within the object detection regions.
    We present very promising results of this approach obtained on several
    experiments with different object classes.

    Abnormality Detection and Localization in Chest X-Rays using Deep Convolutional Neural Networks

    Mohammad Tariqul Islam, Md Abdul Aowal, Ahmed Tahseen Minhaz, Khalid Ashraf
    Comments: 9 figures, 12 pages, 6 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Chest X-Rays are widely used for diagnosing abnormalities in the heart and
    lung area. Automatically detecting these abnormalities with high accuracy could
    greatly enhance real-world diagnosis processes. The lack of a standard publicly
    available dataset and benchmark studies, however, makes it difficult to compare
    various detection methods. In order to overcome these difficulties, we have
    used a publicly available Indiana chest X-Ray dataset and studied the
    performance of known deep convolutional network (DCN) architectures on
    different abnormalities. We find that the same DCN architecture doesn’t perform
    well across all abnormalities. Shallow features or earlier layers consistently
    provide higher detection accuracy compared to deep features. We have also found
    ensemble models to improve classification significantly compared to a single
    model. Combining these insights, we report the highest accuracy on chest X-ray
    abnormality detection on this dataset. Our localization experiments using these
    trained classifiers show that for spatially spread out abnormalities like
    cardiomegaly and pulmonary edema, the network can localize the abnormalities
    successfully most of the time. We find that in the cardiomegaly classification
    task, the deep learning method improves the accuracy by a staggering 17
    percentage points compared to rule-based methods. One remarkable result of the
    cardiomegaly localization is that the heart and its surrounding region are most
    responsible for cardiomegaly detection, in contrast to the rule-based models
    where the ratio of heart and lung area is used as the measure. We believe that
    through deep learning based classification and localization, we will discover
    many more interesting features in medical image diagnosis that are not
    considered traditionally.

    Global hard thresholding algorithms for joint sparse image representation and denoising

    Reza Borhani, Jeremy Watt, Aggelos Katsaggelos
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Sparse coding of images is traditionally done by cutting them into small
    patches and representing each patch individually over some dictionary given a
    pre-determined number of nonzero coefficients to use for each patch. Lacking
    a way to effectively distribute a total number (or global budget) of nonzero
    coefficients across all patches, current sparse recovery algorithms distribute
    the global budget equally across all patches despite the wide range of
    differences in structural complexity among them. In this work we propose a new
    framework for joint sparse representation and recovery of all image patches
    simultaneously. We also present two novel global hard thresholding algorithms,
    based on the notion of variable splitting, for solving the joint sparse model.
    Experimentation using both synthetic and real data shows effectiveness of the
    proposed framework for sparse image representation and denoising tasks.
    Additionally, time complexity analysis of the proposed algorithms indicates
    high scalability, making them favorable for use on large megapixel images.
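
    The contrast the paper draws can be made concrete in a few lines
    (illustrative sizes; the proposed algorithms solve the joint problem
    rather than doing simple magnitude selection): a per-patch budget gives
    every patch the same number of nonzeros, while a global budget
    concentrates them on complex patches.

```python
import numpy as np

rng = np.random.default_rng(0)
coeffs = rng.laplace(size=(100, 64))       # 100 patches, 64 atoms each

# Per-patch budget: exactly 4 nonzeros per patch, regardless of content
k_local = 4
idx = np.argsort(-np.abs(coeffs), axis=1)[:, :k_local]
local = np.zeros_like(coeffs)
np.put_along_axis(local, idx, np.take_along_axis(coeffs, idx, axis=1), axis=1)

# Global budget: 400 nonzeros placed wherever magnitudes are largest
k_global = 400
flat = coeffs.ravel()
keep = np.argpartition(-np.abs(flat), k_global)[:k_global]
glob = np.zeros_like(flat)
glob[keep] = flat[keep]
glob = glob.reshape(coeffs.shape)

# Per-patch nonzero counts: constant for local, varying for global
print((local != 0).sum(axis=1).std(), (glob != 0).sum(axis=1).std())
```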

    Nearest Neighbour Radial Basis Function Solvers for Deep Neural Networks

    Benjamin J. Meyer, Ben Harwood, Tom Drummond
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a radial basis function solver for convolutional neural networks
    that can be directly applied to both image classification and distance metric
    learning problems. Our method treats all training features from a deep neural
    network as radial basis function centres and computes loss by summing the
    influence of a feature’s nearby centres in the embedding space. Having a radial
    basis function centred on each training feature is made scalable by treating it
    as an approximate nearest neighbour search problem. End-to-end learning of the
    network and solver is carried out, mapping high dimensional features into
    clusters of the same class. This results in a well-formed embedding space,
    where semantically related instances are likely to be located near one another,
    regardless of whether or not the network was trained on those classes. The same
    loss function is used for both the metric learning and classification problems.
    We show that our radial basis function solver sets state-of-the-art embedding
    results on the Stanford Cars196 and CUB-200-2011 datasets. Additionally, we
    show that when used as a classifier, our method outperforms a conventional
    softmax classifier on the Caltech-256 object recognition dataset and the
    fine-grained recognition dataset CUB-200-2011.
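
    A hedged sketch of the loss described, with every centre contributing
    for clarity (the paper makes this scalable with approximate
    nearest-neighbour search; sigma and sizes are illustrative): class
    scores are sums of Gaussian influences of training centres, and the
    loss is the negative log of the correct class's normalized score.

```python
import numpy as np

rng = np.random.default_rng(0)
centres = rng.normal(size=(50, 8))             # training-feature centres
centre_labels = rng.integers(0, 5, size=50)    # 5 classes
sigma = 1.0

def rbf_class_loss(f, label):
    influence = np.exp(-np.sum((centres - f) ** 2, axis=1) / (2 * sigma ** 2))
    class_scores = np.array([influence[centre_labels == c].sum()
                             for c in range(5)])
    p = class_scores / class_scores.sum()
    return -np.log(p[label] + 1e-12)

f = centres[3] + 0.1 * rng.normal(size=8)      # query near a known centre
print(rbf_class_loss(f, centre_labels[3]))     # small loss near own class
```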

    Deep Matching and Validation Network — An End-to-End Solution to Constrained Image Splicing Localization and Detection

    Yue Wu, Wael AbdAlmageed, Prem Natarajan
    Comments: 9 pages, 10 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)

    Image splicing is a very common image manipulation technique that is
    sometimes used for malicious purposes. A splicing detection and localization
    algorithm usually takes an input image and produces a binary decision
    indicating whether the input image has been manipulated, and also a
    segmentation mask that corresponds to the spliced region. Most existing
    splicing detection and localization pipelines suffer from two main
    shortcomings: 1) they use handcrafted features that are not robust against
    subsequent processing (e.g., compression), and 2) each stage of the pipeline is
    usually optimized independently. In this paper we extend the formulation of the
    underlying splicing problem to consider two input images, a query image and a
    potential donor image. Here the task is to estimate the probability that the
    donor image has been used to splice the query image, and obtain the splicing
    masks for both the query and donor images. We introduce a novel deep
    convolutional neural network architecture, called Deep Matching and Validation
    Network (DMVN), which simultaneously localizes and detects image splicing. The
    proposed approach does not depend on handcrafted features and uses raw input
    images to create deep learned representations. Furthermore, the DMVN is
    end-to-end optimized to produce the probability estimates and the
    segmentation masks. Our extensive experiments demonstrate that this approach
    outperforms state-of-the-art splicing detection methods by a large margin in
    terms of both AUC score and speed.

    CASENet: Deep Category-Aware Semantic Edge Detection

    Zhiding Yu, Chen Feng, Ming-Yu Liu, Srikumar Ramalingam
    Comments: Accepted to CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Boundary and edge cues are highly beneficial in improving a wide variety of
    vision tasks such as semantic segmentation, object recognition, stereo, and
    object proposal generation. Recently, the problem of edge detection has been
    revisited and significant progress has been made with deep learning. While
    classical edge detection is a challenging binary problem in itself, the
    category-aware semantic edge detection by nature is an even more challenging
    multi-label problem. We model the problem such that each edge pixel can be
    associated with more than one class as they appear in contours or junctions
    belonging to two or more semantic classes. To this end, we propose a novel
    end-to-end deep semantic edge learning architecture based on ResNet and a new
    skip-layer architecture where category-wise edge activations at the top
    convolution layer share and are fused with the same set of bottom layer
    features. We then propose a multi-label loss function to supervise the fused
    activations. We show that our proposed architecture benefits this problem with
    better performance, and we outperform the current state-of-the-art semantic
    edge detection methods by a large margin on standard data sets such as SBD and
    Cityscapes.
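
    The multi-label supervision can be illustrated simply (a generic
    multi-label edge loss, not necessarily the paper's exact weighting):
    each category channel is trained with an independent sigmoid binary
    cross-entropy, so an edge pixel may be positive for several classes at
    once.

```python
import torch
import torch.nn as nn

K, H, W = 20, 32, 32                        # categories, spatial size
activations = torch.randn(1, K, H, W)       # fused category-wise edge maps
labels = torch.randint(0, 2, (1, K, H, W)).float()   # multi-hot per pixel

loss = nn.BCEWithLogitsLoss()(activations, labels)
print(loss.item())
```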

    Direct Estimation of Regional Wall Thicknesses via Residual Recurrent Neural Network

    Wufeng Xue, Ilanit Ben Nachum, Sachin Pandey, James Warrington, Stephanie Leung, Shuo Li
    Comments: To appear as an oral paper in IPMI2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Accurate estimation of regional wall thicknesses (RWT) of left ventricular
    (LV) myocardium from cardiac MR sequences is of significant importance for
    identification and diagnosis of cardiac disease. Existing RWT estimation still
    relies on segmentation of LV myocardium, which requires strong prior
    information and user interaction. No work has been devoted to direct
    estimation of RWT from cardiac MR images due to the diverse shapes and
    structures for various subjects and cardiac diseases, as well as the complex
    regional deformation of LV myocardium during the systole and diastole phases of
    the cardiac cycle. In this paper, we present a newly proposed Residual
    Recurrent Neural Network (ResRNN) that fully leverages the spatial and temporal
    dynamics of LV myocardium to achieve accurate frame-wise RWT estimation. Our
    ResRNN comprises two paths: 1) a feed-forward convolutional neural network (CNN)
    for effective and robust CNN embedding learning of various cardiac images and
    preliminary estimation of RWT from each frame itself independently, and 2) a
    recurrent neural network (RNN) for further improving the estimation by modeling
    spatial and temporal dynamics of LV myocardium. For the RNN path, we design for
    cardiac sequences a Circle-RNN to eliminate the effect of null hidden input for
    the first time-step. Our ResRNN is capable of obtaining accurate estimation of
    cardiac RWT with Mean Absolute Error of 1.44mm (less than 1-pixel error) when
    validated on cardiac MR sequences of 145 subjects, evidencing its great
    potential in clinical cardiac function assessment.

    Towards Visual Ego-motion Learning in Robots

    Sudeep Pillai, John J. Leonard
    Comments: Conference paper; Submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017, Vancouver CA; 8 pages, 8 figures, 2 tables
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    Many model-based Visual Odometry (VO) algorithms have been proposed in the
    past decade, often restricted to the type of camera optics, or the underlying
    motion manifold observed. We envision robots to be able to learn and perform
    these tasks, in a minimally supervised setting, as they gain more experience.
    To this end, we propose a fully trainable solution to visual ego-motion
    estimation for varied camera optics. We propose a visual ego-motion learning
    architecture that maps observed optical flow vectors to an ego-motion density
    estimate via a Mixture Density Network (MDN). By modeling the architecture as a
    Conditional Variational Autoencoder (C-VAE), our model is able to provide
    introspective reasoning and prediction for ego-motion induced scene-flow.
    Additionally, our proposed model is especially amenable to bootstrapped
    ego-motion learning in robots where the supervision in ego-motion estimation
    for a particular camera sensor can be obtained from standard navigation-based
    sensor fusion strategies (GPS/INS and wheel-odometry fusion). Through
    experiments, we show the utility of our proposed approach in enabling the
    concept of self-supervised learning for visual ego-motion estimation in
    autonomous robots.
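
    A minimal Mixture Density Network head of the kind named above (sizes
    and the flattened-flow input are illustrative): the network outputs
    mixture weights, means and scales, and is trained by negative
    log-likelihood of the observed ego-motion.

```python
import math
import torch
import torch.nn as nn

class MDNHead(nn.Module):
    def __init__(self, in_dim, out_dim, n_mix=5):
        super().__init__()
        self.n_mix, self.out_dim = n_mix, out_dim
        self.pi = nn.Linear(in_dim, n_mix)                 # mixture logits
        self.mu = nn.Linear(in_dim, n_mix * out_dim)       # component means
        self.log_sigma = nn.Linear(in_dim, n_mix * out_dim)

    def nll(self, x, y):
        N = x.shape[0]
        log_pi = torch.log_softmax(self.pi(x), dim=1)          # (N, M)
        mu = self.mu(x).view(N, self.n_mix, self.out_dim)
        log_s = self.log_sigma(x).view(N, self.n_mix, self.out_dim)
        # Diagonal Gaussian log-density per component, summed over dims
        comp = -0.5 * (((y[:, None] - mu) / log_s.exp()) ** 2
                       + 2 * log_s + math.log(2 * math.pi)).sum(-1)
        return -torch.logsumexp(log_pi + comp, dim=1).mean()

head = MDNHead(in_dim=100, out_dim=6)     # e.g., 6-DoF ego-motion
x, y = torch.randn(16, 100), torch.randn(16, 6)
print(head.nll(x, y))
```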

    LAP: a Linearize and Project Method for Solving Inverse Problems with Coupled Variables

    James Herring, James Nagy, Lars Ruthotto
    Comments: 21 pages, 6 figures, 3 tables
    Subjects: Numerical Analysis (math.NA); Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (cs.NA); Optimization and Control (math.OC)

    Many inverse problems involve two or more sets of variables that represent
    different physical quantities but are tightly coupled with each other. For
    example, image super-resolution requires joint estimation of image and motion
    parameters from noisy measurements. Exploiting this structure is key for
    efficiently solving large-scale problems to avoid, e.g., ill-conditioned
    optimization problems.

    In this paper, we present a new method called Linearize And Project (LAP)
    that offers a flexible framework for solving inverse problems with coupled
    variables. LAP is most promising for cases when the subproblem corresponding to
    one of the variables is considerably easier to solve than the other. LAP is
    based on a Gauss-Newton method, and thus after linearizing the residual, it
    eliminates one block of variables through projection. Due to the linearization,
    the block can be chosen freely and can represent quadratic as well as nonlinear
    variables. Further, LAP supports direct, iterative, and hybrid regularization
    as well as constraints. Therefore LAP is attractive, e.g., for ill-posed
    imaging problems. These traits differentiate LAP from common alternatives for
    this type of problem, such as variable projection (VarPro) and block coordinate
    descent (BCD). Our numerical experiments compare the performance of LAP to BCD
    and VarPro using four coupled problems with one quadratic and one nonlinear set
    of variables.


    PVEs: Position-Velocity Encoders for Unsupervised Learning of Structured State Representations

    Rico Jonschkowski, Roland Hafner, Jonathan Scholz, Martin Riedmiller
    Comments: Submitted to Robotics: Science and Systems (RSS 2017) Workshop New Frontiers for Deep Learning in Robotics this http URL
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We propose position-velocity encoders (PVEs) which learn—without
    supervision—to encode images to positions and velocities of task-relevant
    objects. PVEs encode a single image into a low-dimensional position state and
    compute the velocity state from finite differences in position. In contrast to
    autoencoders, position-velocity encoders are not trained by image
    reconstruction, but by making the position-velocity representation consistent
    with priors about interacting with the physical world. We applied PVEs to
    several simulated control tasks from pixels and achieved promising preliminary
    results.
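
    The core of the representation is small enough to sketch (the encoder
    is a stand-in; actual PVE training optimizes physics-based prior losses
    rather than reconstruction): each image maps to a position, and the
    velocity is a finite difference of consecutive positions.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))  # -> position
frames = torch.randn(5, 3, 32, 32)          # a short image sequence
pos = encoder(frames)                       # (5, 2) position states
vel = pos[1:] - pos[:-1]                    # finite-difference velocities
state = torch.cat([pos[1:], vel], dim=1)    # (4, 4) position-velocity state
print(state.shape)
```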

    LiDAR-Camera Calibration using 3D-3D Point correspondences

    Ankit Dhall, Kunal Chelani, Vishnu Radhakrishnan, K.M. Krishna
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

    With the advent of autonomous vehicles, LiDAR and cameras have become an
    indispensable combination of sensors. They both provide rich and complementary
    data which can be used by various algorithms and machine learning to sense and
    make vital inferences about the surroundings. We propose a novel pipeline and
    experimental setup to find accurate rigid-body transformation for extrinsically
    calibrating a LiDAR and a camera. The pipeline uses 3D-3D point correspondences
    in the LiDAR and camera frames and gives a closed-form solution. We further show the
    accuracy of the estimate by fusing point clouds from two stereo cameras which
    align perfectly with the rotation and translation estimated by our method,
    confirming the accuracy of our method’s estimates both mathematically and
    visually. Taking our idea of extrinsic LiDAR-camera calibration forward, we
    demonstrate how two cameras with no overlapping field-of-view can also be
    calibrated extrinsically using 3D point correspondences. The code has been made
    available as open-source software in the form of a ROS package, more
    information about which can be sought here:
    this https URL .
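
    The closed-form estimate from 3D-3D correspondences that such a pipeline
    relies on is the classic Kabsch/Horn alignment (a generic sketch, not
    the authors' exact code): SVD of the cross-covariance of centred point
    sets gives the rotation, and the centroids give the translation.

```python
import numpy as np

def rigid_from_correspondences(P, Q):
    # Find R, t minimizing sum ||R p_i + t - q_i||^2 (Kabsch/Horn)
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)               # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                      # guard against reflections
    t = cq - R @ cp
    return R, t

rng = np.random.default_rng(0)
P = rng.normal(size=(10, 3))                # e.g., LiDAR-frame points
R_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R_true) < 0:               # ensure a proper rotation
    R_true[:, 0] *= -1
t_true = np.array([0.5, -0.2, 1.0])
Q = P @ R_true.T + t_true                   # camera-frame points

R, t = rigid_from_correspondences(P, Q)
print(np.allclose(R, R_true), np.allclose(t, t_true))
```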

    A Multi-strength Adversarial Training Method to Mitigate Adversarial Attacks

    Chang Song, Hsin-Pai Cheng, Chunpeng Wu, Hai Li, Yiran Chen, Qing Wu
    Comments: Submitted to NIPS 2017
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    Some recent works revealed that deep neural networks (DNNs) are vulnerable to
    so-called adversarial attacks where input examples are intentionally perturbed
    to fool DNNs. In this work, we revisit the DNN training process that includes
    adversarial examples into the training dataset so as to improve DNN’s
    resilience to adversarial attacks, namely, adversarial training. Our
    experiments show that different adversarial strengths, i.e., perturbation
    levels of adversarial examples, have different working zones to resist the
    attack. Based on the observation, we propose a multi-strength adversarial
    training method (MAT) that combines the adversarial training examples with
    different adversarial strengths to defend against adversarial attacks. Two training
    structures – mixed MAT and parallel MAT – are developed to facilitate the
    tradeoffs between training time and memory occupation. Our results show that
    MAT can substantially minimize the accuracy degradation of deep learning
    systems to adversarial attacks on MNIST, CIFAR-10, CIFAR-100, and SVHN.
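
    A hedged sketch of the “mixed” variant (FGSM stands in for whichever
    attack generates the adversarial examples; the model, epsilons and data
    are illustrative): each batch is augmented with adversarial examples
    crafted at several perturbation strengths.

```python
import torch
import torch.nn as nn

def fgsm(model, x, y, eps, loss_fn=nn.CrossEntropyLoss()):
    # One-step attack: perturb the input along the sign of its gradient
    x = x.clone().detach().requires_grad_(True)
    loss_fn(model(x), y).backward()
    return (x + eps * x.grad.sign()).detach()

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(32, 1, 28, 28)               # stand-in for MNIST images
y = torch.randint(0, 10, (32,))

for _ in range(3):                          # a few illustrative steps
    batches = [x] + [fgsm(model, x, y, eps) for eps in (0.05, 0.1, 0.2)]
    opt.zero_grad()
    loss = sum(loss_fn(model(xb), y) for xb in batches) / len(batches)
    loss.backward()
    opt.step()
print(loss.item())
```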


    Artificial Intelligence

    Learning Belief Network Structure From Data under Causal Insufficiency

    Mieczysław Kłopotek
    Comments: A short version of this paper appeared in [Klopotek:94m] M.A. Kłopotek: Learning Belief Network Structure From Data under Causal Insufficiency. [in:] F. Bergadano, L. De Raedt, Eds.: Machine Learning: ECML-94, Proc. 13th European Conference on Machine Learning, Catania, Italy, 6-8 April 1994, Lecture Notes in Artificial Intelligence 784, Springer-Verlag, 1994, pp. 379-382
    Subjects: Artificial Intelligence (cs.AI)

    Though a belief network (a representation of the joint probability
    distribution, see [3]) and a causal network (a representation of causal
    relationships [14]) are intended to mean different things, they are closely
    related. Both assume an underlying DAG (directed acyclic graph) structure of
    relations among variables, and if the Markov condition and the faithfulness
    condition [15] are met, then a causal network is in fact a belief network. The
    difference becomes apparent when we recover belief network and causal network
    structures from data.

    A causal network structure may be impossible to recover completely from data
    as not all directions of causal links may be uniquely determined [15].
    Fortunately, if we deal with causally sufficient sets of variables (that is,
    whenever significant influencing variables are not omitted from observation),
    then there exists the possibility to identify the family of belief networks a
    causal network belongs to [16]. Regrettably, to our knowledge, a similar result
    is not directly known for causally insufficient sets of variables. Spirtes,
    Glymour and Scheines developed a CI algorithm to handle this situation, but it
    leaves some important questions open.

    The big open question is whether or not the bidirectional edges (that is,
    indications of a common cause) are the only ones necessary to develop a belief
    network out of the product of CI, or must there be some other hidden variables
    added (e.g. by guessing). This paper is devoted to settling this question.

    Automatic White-Box Testing of First-Order Logic Ontologies

    Javier Álvez, Montserrat Hermo, Paqui Lucio, German Rigau
    Comments: 41 pages
    Subjects: Artificial Intelligence (cs.AI)

    A long-standing dream of Artificial Intelligence (AI) has been to encode
    commonsense knowledge into computer programs, enabling machines to reason about
    our world and its problems. This work offers a new practical insight towards the
    automatic testing of first-order logic (FOL) ontologies. We introduce a novel
    fully automatic white-box testing framework for first-order logic (FOL)
    ontologies. The application of the proposed testing method is fully automatic
    since a) the automated generation of tests is only guided by the syntax of
    axioms and b) the evaluation of tests is performed by automated theorem
    provers. Our proposal enables the detection of defective axioms and,
    additionally, it also serves to demonstrate the suitability for reasoning
    purposes of those formulas included into FOL ontologies. We validate our
    proposal by its practical application to different FOL ontologies. In
    particular, DOLCE (consisting of around 200 axioms), FPK (a formal proof of
    the Kepler conjecture, derived from the Flyspeck project for its use in the
    CADE ATP System Competition CASC-J8), and Adimen-SUMO (an ontology with more
    than 7,000 axioms derived from SUMO). As a result, we
    have detected several non-trivial defects that were hidden in those ontologies.
    Further, we have obtained an improved version of Adimen-SUMO (v2.6) by
    correcting all the defects detected during the practical application of our
    white-box testing method.

    Black-box Testing of First-Order Logic Ontologies Using WordNet

    Javier Álvez, Paqui Lucio, German Rigau
    Comments: 53 pages
    Subjects: Artificial Intelligence (cs.AI)

    A long-standing dream of Artificial Intelligence (AI) has been to enrich
    computer programs with commonsense knowledge enabling machines to reason about
    our world. This paper offers a new practical insight towards the automation of
    commonsense reasoning with first-order logic (FOL) ontologies. We propose a new
    black-box testing methodology of FOL SUMO-based ontologies by exploiting
    WordNet and its mapping into SUMO. Our proposal includes a method for the
    (semi-)automatic creation of a very large set of tests and a procedure for its
    automated evaluation by using automated theorem provers (ATPs). Applying our
    testing proposal, we are able to successfully evaluate a) the competency of
    several translations of SUMO into FOL and b) the performance of various
    ATPs. In addition, we are also able to evaluate the resulting set of
    tests according to different quality criteria.

    Machine Learned Learning Machines

    Leigh Sheneman, Arend Hintze
    Subjects: Artificial Intelligence (cs.AI)

    There are two common approaches for optimizing the performance of a machine:
    genetic algorithms and machine learning. A genetic algorithm is applied over
    many generations whereas machine learning works by applying feedback until the
    system meets a performance threshold. Though these methods typically
    operate separately, we combine evolutionary adaptation and machine learning
    into one approach. Our focus is on machines that can learn during their
    lifetime, but instead of equipping them with a machine learning algorithm we
    aim to let them evolve their ability to learn by themselves. We use evolvable
    networks of probabilistic and deterministic logic gates, known as Markov
    Brains, as our computational model organism. The ability of Markov Brains to
    learn is augmented by a novel adaptive component that can change its
    computational behavior based on feedback. We show that Markov Brains can indeed
    evolve to incorporate these feedback gates to improve their adaptability to
    variable environments. By combining these two methods, we have also
    implemented a computational model that can be used to study the evolution
    of learning.
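
    To make the adaptive component concrete, the following is a minimal,
    hypothetical sketch (our own toy, not the authors' implementation) of a
    probabilistic logic gate whose row-stochastic truth table is nudged by
    scalar feedback, in the spirit of the feedback gates described above:

        import random

        class FeedbackGate:
            # One gate of a Markov-Brain-like network: each input pattern
            # indexes a row of output probabilities that reward can adapt.
            def __init__(self, n_inputs, n_outputs):
                rows, cols = 2 ** n_inputs, 2 ** n_outputs
                self.table = [[1.0 / cols] * cols for _ in range(rows)]

            def fire(self, in_pattern):
                # sample an output pattern according to the current row
                probs = self.table[in_pattern]
                return random.choices(range(len(probs)), weights=probs)[0]

            def feedback(self, in_pattern, out_pattern, reward, lr=0.1):
                # reinforce (or weaken) the sampled output, then renormalise
                row = self.table[in_pattern]
                row[out_pattern] = max(row[out_pattern] + lr * reward, 1e-6)
                total = sum(row)
                self.table[in_pattern] = [p / total for p in row]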

    Abstract Argumentation / Persuasion / Dynamics

    Ryuta Arisaka, Ken Satoh
    Subjects: Artificial Intelligence (cs.AI)

    The act of persuasion, a key component in rhetoric argumentation, may be
    viewed as a dynamics modifier. Such modifiers are well-known in other research
    fields: recall dynamic epistemic logic where operators modify possible world
    accessibilities, or recall side effects and concurrency in programming
    languages. We consider persuasion in abstract argumentation as undertaking a
    similar role. We extend Dung’s frameworks with acts of persuasion among agents
    into Abstract Persuasion Argumentation (APA), and set forth properties related
    to arguments’ admissibilities. We show a way of enriching our basic notion of
    admissibility through CTL (computation tree logic) encoding, which also permits
    importation of the theoretical results known to the logic into our
    argumentation frameworks.

    Should Robots be Obedient?

    Smitha Milli, Dylan Hadfield-Menell, Anca Dragan, Stuart Russell
    Comments: Accepted to IJCAI 2017
    Subjects: Artificial Intelligence (cs.AI)

    Intuitively, obedience — following the order that a human gives — seems
    like a good property for a robot to have. But, we humans are not perfect and we
    may give orders that are not well aligned with our preferences. We show
    that if a human is not perfectly rational, then a robot that tries to infer
    and act according to the human’s underlying preferences can always perform better than
    a robot that simply follows the human’s literal order. Thus, there is a
    tradeoff between the obedience of a robot and the value it can attain for its
    owner. We investigate how this tradeoff is impacted by the way the robot infers
    the human’s preferences, showing that some methods err more on the side of
    obedience than others. We then analyze how performance degrades when the robot
    has a misspecified model of the features that the human cares about or the
    level of rationality of the human. Finally, we study how robots can start
    detecting such model misspecification. Overall, our work suggests that there
    might be a middle ground in which robots intelligently decide when to obey
    human orders, but err on the side of obedience.

    Probabilistic Program Abstractions

    Steven Holtzen, Todd Millstein, Guy Van den Broeck
    Subjects: Artificial Intelligence (cs.AI)

    Abstraction is a fundamental tool for reasoning about complex systems.
    Program abstraction has been utilized to great effect for analyzing
    deterministic programs. At the heart of program abstraction is the
    relationship between a concrete program, which is difficult to analyze, and an
    abstraction, which is more tractable. We generalize non-deterministic program
    abstractions to probabilistic program abstractions by explicitly quantifying
    the non-deterministic choices made by traditional program abstractions. We
    upgrade key theoretical program abstraction insights to the probabilistic
    context. Probabilistic program abstractions provide avenues for utilizing
    abstraction techniques from the programming languages community to improve the
    analysis of probabilistic programs.
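
    As a toy illustration (our own example, not the paper's formalism),
    consider a Boolean predicate abstraction in which the non-deterministic
    branch of a traditional abstraction is replaced by an explicit Bernoulli
    choice whose parameter is estimated under an assumed input distribution:

        import random

        def concrete(x):
            # concrete program: the branch depends on an integer input
            return "big" if x * x > 100 else "small"

        def abstract(p):
            # the predicate abstraction keeps only b = (x*x > 100); the
            # probabilistic abstraction quantifies b by a Bernoulli parameter p
            b = random.random() < p
            return "big" if b else "small"

        # estimate p under a uniform input model on [-20, 20] (an assumption)
        p_hat = sum(concrete(x) == "big" for x in range(-20, 21)) / 41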

    Bayesian Unification of Gradient and Bandit-based Learning for Accelerated Global Optimisation

    Ole-Christoffer Granmo
    Comments: 15th IEEE International Conference on Machine Learning and Applications (ICMLA 2016)
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    Bandit based optimisation has a remarkable advantage over gradient based
    approaches due to its global perspective, which eliminates the danger of
    getting stuck at local optima. However, for continuous optimisation problems or
    problems with a large number of actions, bandit based approaches can be
    hindered by slow learning. Gradient based approaches, on the other hand,
    navigate quickly in high-dimensional continuous spaces through local
    optimisation, following the gradient in fine grained steps. Yet, apart from
    being susceptible to local optima, these schemes are less suited for online
    learning due to their reliance on extensive trial-and-error before the optimum
    can be identified. In this paper, we propose a Bayesian approach that unifies
    the above two paradigms in one single framework, with the aim of combining
    their advantages. At the heart of our approach we find a stochastic linear
    approximation of the function to be optimised, where both the gradient and
    values of the function are explicitly captured. This allows us to learn from
    both noisy function and gradient observations, and predict these properties
    across the action space to support optimisation. We further propose an
    accompanying bandit driven exploration scheme that uses Bayesian credible
    bounds to trade off exploration against exploitation. Our empirical results
    demonstrate that by unifying bandit and gradient based learning, one obtains
    consistently improved performance across a wide spectrum of problem
    environments. Furthermore, even when gradient feedback is unavailable, the
    flexibility of our model, including gradient prediction, still allows us to
    outperform competing approaches, although with a smaller margin. Due to the
    pervasiveness of bandit based optimisation, our scheme opens up for improved
    performance both in meta-optimisation and in applications where gradient
    related information is readily available.

    Inexpensive Cost-Optimized Measurement Proposal for Sequential Model-Based Diagnosis

    Patrick Rodler, Wolfgang Schmid, Konstantin Schekotihin
    Subjects: Artificial Intelligence (cs.AI)

    In this work we present strategies for (optimal) measurement selection in
    model-based sequential diagnosis. In particular, assuming a set of leading
    diagnoses being given, we show how queries (sets of measurements) can be
    computed and optimized along two dimensions: expected number of queries and
    cost per query. By means of a suitable decoupling of two optimizations and a
    clever search space reduction the computations are done without any inference
    engine calls. For the full search space, we give a method requiring only a
    polynomial number of inferences and guaranteeing query properties existing
    methods cannot provide. Evaluation results using real-world problems indicate
    that the new method computes (virtually) optimal queries instantly,
    independently of the size and complexity of the considered diagnosis problems.

    Quadratic Unconstrained Binary Optimization Problem Preprocessing: Theory and Empirical Analysis

    Mark Lewis, Fred Glover
    Comments: Benchmark problems used are available from the first author
    Subjects: Artificial Intelligence (cs.AI)

    The Quadratic Unconstrained Binary Optimization problem (QUBO) has become a
    unifying model for representing a wide range of combinatorial optimization
    problems, and for linking a variety of disciplines that face these problems. A
    new class of quantum annealing computer that maps QUBO onto a physical qubit
    network structure with specific size and edge density restrictions is
    generating a growing interest in ways to transform the underlying QUBO
    structure into an equivalent graph having fewer nodes and edges. In this paper
    we present rules for reducing the size of the QUBO matrix by identifying
    variables whose value at optimality can be predetermined. We verify that the
    reductions improve both solution quality and time to solution and, in the case
    of metaheuristic methods where optimal solutions cannot be guaranteed, the
    quality of solutions obtained within reasonable time limits.

    We discuss the general QUBO structural characteristics that can take
    advantage of these reduction techniques and perform careful experimental design
    and analysis to identify and quantify the specific characteristics most
    affecting reduction. The rules make it possible to dramatically improve
    solution times on a new set of problems using both the exact Cplex solver and a
    tabu search metaheuristic.
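
    To give a flavour of such preprocessing, the following sketch implements a
    textbook first-order persistency rule (stated under our own conventions,
    not the authors' full rule set): a variable can be fixed whenever the gain
    from setting it to 1 has a constant sign over all completions.

        import numpy as np

        def fix_variables(Q):
            # maximize x^T Q x over x in {0,1}^n, linear terms on the diagonal;
            # returns {index: value} for variables fixable at some optimum
            n = Q.shape[0]
            C = Q + Q.T  # C[i, j] = q_ij + q_ji for i != j
            fixed = {}
            for i in range(n):
                off = np.delete(C[i], i)
                gain_lo = Q[i, i] + off[off < 0].sum()  # worst-case gain of x_i = 1
                gain_hi = Q[i, i] + off[off > 0].sum()  # best-case gain of x_i = 1
                if gain_lo >= 0:
                    fixed[i] = 1  # setting x_i = 1 can never hurt
                elif gain_hi <= 0:
                    fixed[i] = 0  # setting x_i = 1 can never help
            return fixed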

    Multi-shot ASP solving with clingo

    Martin Gebser, Roland Kaminski, Benjamin Kaufmann, Torsten Schaub
    Comments: Submitted to TPLP
    Subjects: Artificial Intelligence (cs.AI)

    We introduce a new flexible paradigm of grounding and solving in Answer Set
    Programming (ASP), which we refer to as multi-shot ASP solving, and present its
    implementation in the ASP system clingo.

    Multi-shot ASP solving features grounding and solving processes that deal
    with continuously changing logic programs. In doing so, they remain operative
    and accommodate changes in a seamless way. For instance, such processes allow
    for advanced forms of search, as in optimization or theory solving, or
    interaction with an environment, as in robotics or query-answering. Common to
    them is that the problem specification evolves during the reasoning process,
    either because data or constraints are added, deleted, or replaced. This
    evolutionary aspect adds another dimension to ASP since it brings about state
    changing operations. We address this issue by providing an operational
    semantics that characterizes grounding and solving processes in multi-shot ASP
    solving. This characterization provides a semantic account of grounder and
    solver states along with the operations manipulating them.

    The operative nature of multi-shot solving avoids redundancies in relaunching
    grounder and solver programs and benefits from the solver’s learning
    capacities. clingo accomplishes this by complementing ASP’s declarative input
    language with control capacities. On the declarative side, a new directive
    allows for structuring logic programs into named and parameterizable
    subprograms. The grounding and integration of these subprograms into the
    solving process is completely modular and fully controllable from the
    procedural side. To this end, clingo offers a new application programming
    interface that is conveniently accessible via scripting languages.
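
    A minimal sketch of this workflow through clingo's Python API (the logic
    program is a made-up toy; the #program directive and the API calls are as
    documented for clingo 5):

        import clingo

        ctl = clingo.Control()
        ctl.add("base", [], "start(0).")
        ctl.add("step", ["t"], "reached(t) :- start(0).")  # named, parameterizable

        ctl.ground([("base", [])])
        for t in range(1, 4):
            # ground one more slice; grounder and solver state persist
            ctl.ground([("step", [clingo.Number(t)])])
            ctl.solve(on_model=print)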

    Contextual Explanation Networks

    Maruan Al-Shedivat, Avinava Dubey, Eric P. Xing
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    We introduce contextual explanation networks (CENs)—a class of models that
    learn to predict by generating and leveraging intermediate explanations. CENs
    combine deep networks with context-specific probabilistic models and construct
    explanations in the form of locally-correct hypotheses. Contrary to the
    existing post-hoc model-explanation tools, CENs learn to predict and to explain
    jointly. Our approach offers two major advantages: (i) for each prediction,
    valid instance-specific explanations are generated with no computational
    overhead and (ii) prediction via explanation acts as a regularization and
    boosts performance in low-resource settings. We prove that local approximations
    to the decision boundary of our networks are consistent with the generated
    explanations. Our results on image and text classification and survival
    analysis tasks demonstrate that CENs can easily match or outperform the
    state-of-the-art while offering additional insights behind each prediction,
    valuable for decision support.

    Towards Visual Ego-motion Learning in Robots

    Sudeep Pillai, John J. Leonard
    Comments: Conference paper; Submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017, Vancouver CA; 8 pages, 8 figures, 2 tables
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    Many model-based Visual Odometry (VO) algorithms have been proposed in the
    past decade, often restricted to the type of camera optics, or the underlying
    motion manifold observed. We envision robots to be able to learn and perform
    these tasks, in a minimally supervised setting, as they gain more experience.
    To this end, we propose a fully trainable solution to visual ego-motion
    estimation for varied camera optics. We propose a visual ego-motion learning
    architecture that maps observed optical flow vectors to an ego-motion density
    estimate via a Mixture Density Network (MDN). By modeling the architecture as a
    Conditional Variational Autoencoder (C-VAE), our model is able to provide
    introspective reasoning and prediction for ego-motion induced scene-flow.
    Additionally, our proposed model is especially amenable to bootstrapped
    ego-motion learning in robots where the supervision in ego-motion estimation
    for a particular camera sensor can be obtained from standard navigation-based
    sensor fusion strategies (GPS/INS and wheel-odometry fusion). Through
    experiments, we show the utility of our proposed approach in enabling the
    concept of self-supervised learning for visual ego-motion estimation in
    autonomous robots.
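
    For readers unfamiliar with MDNs, a generic mixture-density head (a sketch
    in PyTorch under assumed dimensions, not the authors' exact architecture)
    maps features to the parameters of a K-component Gaussian mixture:

        import torch
        import torch.nn as nn

        class MDNHead(nn.Module):
            # maps flow features (in_dim) to a k-component diagonal Gaussian
            # mixture over an ego-motion vector (out_dim)
            def __init__(self, in_dim, out_dim, k):
                super().__init__()
                self.k, self.out_dim = k, out_dim
                self.pi = nn.Linear(in_dim, k)
                self.mu = nn.Linear(in_dim, k * out_dim)
                self.log_sigma = nn.Linear(in_dim, k * out_dim)

            def forward(self, h):
                pi = torch.softmax(self.pi(h), dim=-1)          # mixture weights
                mu = self.mu(h).view(-1, self.k, self.out_dim)  # component means
                sigma = self.log_sigma(h).view(-1, self.k, self.out_dim).exp()
                return pi, mu, sigma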

    Latent Intention Dialogue Models

    Tsung-Hsien Wen, Yishu Miao, Phil Blunsom, Steve Young
    Comments: Accepted at ICML 2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Developing a dialogue agent that is capable of making autonomous decisions
    and communicating by natural language is one of the long-term goals of machine
    learning research. Traditional approaches either rely on hand-crafting a small
    state-action set for applying reinforcement learning that is not scalable or
    constructing deterministic models for learning dialogue sentences that fail to
    capture natural conversational variability. In this paper, we propose a Latent
    Intention Dialogue Model (LIDM) that employs a discrete latent variable to
    learn underlying dialogue intentions in the framework of neural variational
    inference. In a goal-oriented dialogue scenario, these latent intentions can be
    interpreted as actions guiding the generation of machine responses, which can
    be further refined autonomously by reinforcement learning. The experimental
    evaluation of LIDM shows that the model out-performs published benchmarks for
    both corpus-based and human evaluation, demonstrating the effectiveness of
    discrete latent variable models for learning goal-oriented dialogues.

    Mining Process Model Descriptions of Daily Life through Event Abstraction

    Niek Tax, Natalia Sidorova, Reinder Haakma, Wil M.P. van der Aalst
    Comments: arXiv admin note: substantial text overlap with arXiv:1606.07283
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB)

    Process mining techniques focus on extracting insights into processes from
    event logs. Process mining has the potential to provide valuable insights
    into (un)healthy habits and to contribute to ambient assisted living
    solutions when applied to data from smart home environments. However,
    events recorded in smart home environments are at the level of sensor
    triggers, at which process discovery algorithms produce overgeneralizing
    process models that allow for too much behavior and that are difficult for
    human experts to interpret. We show
    that abstracting the events to a higher-level interpretation can enable
    discovery of more precise and more comprehensible models. We present a
    framework, based on the XES IEEE standard for event logs, for the
    extraction of features that can be used for abstraction with supervised
    learning methods. This framework can automatically abstract sensor-level
    events to their
    interpretation at the human activity level, after training it on training data
    for which both the sensor and human activity events are known. We demonstrate
    our abstraction framework on three real-life smart home event logs and show
    that the process models that can be discovered after abstraction are
    indeed more precise.

    Implicit Variational Inference with Kernel Density Ratio Fitting

    Jiaxin Shi, Shengyang Sun, Jun Zhu
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Recent progress in variational inference has paid much attention to the
    flexibility of variational posteriors. Work has been done to use implicit
    distributions, i.e., distributions without tractable likelihoods, as the
    variational posterior. However, existing methods on implicit posteriors still
    face challenges of noisy estimation and can hardly scale to high-dimensional
    latent variable models. In this paper, we present an implicit variational
    inference approach with kernel density ratio fitting that addresses these
    challenges. As far as we know, for the first time implicit variational
    inference is successfully applied to Bayesian neural networks, which shows
    promising results on both regression and classification tasks.

    Role Playing Learning for Socially Concomitant Mobile Robot Navigation

    Mingming Li, Rui Jiang, Shuzhi Sam Ge, Tong Heng Lee
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

    In this paper, we present the Role Playing Learning (RPL) scheme for a mobile
    robot to navigate socially with its human companion in populated environments.
    Neural networks (NN) are constructed to parameterize a stochastic policy that
    directly maps sensory data collected by the robot to its velocity outputs,
    while respecting a set of social norms. An efficient simulative learning
    environment is built with maps and pedestrian trajectories collected from a
    number of real-world crowd data sets. In each learning iteration, a robot
    equipped with the NN policy is created virtually in the learning environment to
    play the role of an accompanying pedestrian and navigate towards a goal in a socially
    concomitant manner. Thus, we call this process Role Playing Learning, which is
    formulated under a reinforcement learning (RL) framework. The NN policy is
    optimized end-to-end using Trust Region Policy Optimization (TRPO), taking
    into account the imperfection of the robot’s sensor measurements. Simulative
    and experimental results are provided to demonstrate the efficacy and
    superiority of our method.

    AMPNet: Asynchronous Model-Parallel Training for Dynamic Neural Networks

    Alex Gaunt, Matthew Johnson, Maik Riechert, Daniel Tarlow, Ryota Tomioka, Dimitrios Vytiniotis, Sam Webster
    Comments: 17 pages, 13 figures
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)

    New types of machine learning hardware in development and entering the market
    hold the promise of revolutionizing deep learning in a manner as profound as
    GPUs. However, existing software frameworks and training algorithms for deep
    learning have yet to evolve to fully leverage the capability of the new wave of
    silicon. We already see the limitations of existing algorithms for models that
    exploit structured input via complex and instance-dependent control flow, which
    prohibits minibatching. We present an asynchronous model-parallel (AMP)
    training algorithm that is specifically motivated by training on networks of
    interconnected devices. Through an implementation on multi-core CPUs, we show
    that AMP training converges to the same accuracy as conventional synchronous
    training algorithms in a similar number of epochs, but utilizes the available
    hardware more efficiently even for small minibatch sizes, resulting in
    significantly shorter overall training times. Our framework opens the door for
    scaling up a new class of deep learning models that cannot be efficiently
    trained today.

    Good Semi-supervised Learning that Requires a Bad GAN

    Zihang Dai, Zhilin Yang, Fan Yang, William W. Cohen, Ruslan Salakhutdinov
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Semi-supervised learning methods based on generative adversarial networks
    (GANs) obtained strong empirical results, but it is not clear 1) how the
    discriminator benefits from joint training with a generator, and 2) why good
    semi-supervised classification performance and a good generator cannot be
    obtained at the same time. Theoretically, we show that given the discriminator
    objective, good semisupervised learning indeed requires a bad generator, and
    propose the definition of a preferred generator. Empirically, we derive a novel
    formulation based on our analysis that substantially improves over feature
    matching GANs, obtaining state-of-the-art results on multiple benchmark
    datasets.

    Multiple Source Domain Adaptation with Adversarial Training of Neural Networks

    Han Zhao, Shanghang Zhang, Guanhang Wu, João P. Costeira, José M. F. Moura, Geoffrey J. Gordon
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    We propose a new generalization bound for domain adaptation when there are
    multiple source domains with labeled instances and one target domain with
    unlabeled instances. The new bound has an interesting interpretation and
    reduces to an existing bound when there is only one source domain. Compared
    with existing bounds, the new bound does not require expert knowledge about the
    target distribution, nor the optimal combination rule for multisource domains.
    Interestingly, our theory also leads to an efficient implementation using
    adversarial neural networks: we show how to interpret it as learning feature
    representations that are invariant to the multiple domain shifts while still
    being discriminative for the learning task. To this end, we propose two models,
    both of which we call multisource domain adversarial networks (MDANs): the
    first model directly optimizes our bound, while the second model is a smoothed
    approximation of the first one, leading to a more data-efficient and
    task-adaptive model. The optimization tasks of both models are minimax saddle
    point problems that can be optimized by adversarial training. To demonstrate
    the effectiveness of MDANs, we conduct extensive experiments showing superior
    adaptation performance on three real-world datasets: sentiment analysis, digit
    classification, and vehicle counting.


    Information Retrieval

    KlusTree: Clustering Answer Trees from Keyword Search on Graphs

    Madhulika Mohanty, Maya Ramanath
    Comments: 16 pages, 2 Figures
    Subjects: Information Retrieval (cs.IR)

    Graph structured data on the web is now massive as well as diverse, ranging
    from social networks, web graphs to knowledge-bases. Effectively querying this
    graph structured data is non-trivial and has led to research in a variety of
    directions — structured queries, keyword and natural language queries,
    automatic translation of these queries to structured queries, etc. We are
    concerned with a class of queries called relationship queries, which are
    usually expressed as a set of keywords (each keyword denoting a named entity).
    The results returned are a set of ranked trees, each of which denotes
    relationships among the various keywords. The result list could consist of
    hundreds of answers. The problem of keyword search on graphs has been explored
    for over a decade now, but an important aspect that is not as extensively
    studied is that of user experience. We propose KlusTree, which presents
    clustered results to the users instead of a list of all the results. In our
    approach, the result trees are represented using language models and are
    clustered using JS divergence as a distance measure. We compare KlusTree with
    the well-known approaches based on isomorphism and tree-edit distance based
    clustering. The user evaluations show that KlusTree outperforms the other two
    in providing better clustering, thereby enriching user experience, revealing
    interesting patterns and improving result interpretation by the user.
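
    For reference, the Jensen-Shannon divergence used as the distance measure
    can be computed as follows (a self-contained sketch; building the language
    models from answer trees is the paper's contribution and is not reproduced
    here):

        import numpy as np

        def js_divergence(p, q):
            # symmetric, bounded divergence between two term distributions;
            # assumes p and q are smoothed language models over one vocabulary
            p = np.asarray(p, dtype=float)
            q = np.asarray(q, dtype=float)
            p, q = p / p.sum(), q / q.sum()
            m = 0.5 * (p + q)

            def kl(a, b):
                mask = a > 0
                return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

            return 0.5 * kl(p, m) + 0.5 * kl(q, m)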


    Computation and Language

    Who's to say what's funny? A computer using Language Models and Deep Learning, That's Who!

    Xinru Yan, Ted Pedersen
    Comments: 3 pages, Appears in the Proceedings of the Women and Underrepresented Minorities in Natural Language Processing Workshop (WiNLP-2017), July 30, 2017, Vancouver, BC
    Subjects: Computation and Language (cs.CL)

    Humor is a defining characteristic of human beings. Our goal is to develop
    methods that automatically detect humorous statements and rank them on a
    continuous scale. In this paper we report on results using a Language Model
    approach, and outline our plans for using methods from Deep Learning.

    Latent Intention Dialogue Models

    Tsung-Hsien Wen, Yishu Miao, Phil Blunsom, Steve Young
    Comments: Accepted at ICML 2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Developing a dialogue agent that is capable of making autonomous decisions
    and communicating by natural language is one of the long-term goals of machine
    learning research. Traditional approaches either rely on hand-crafting a small
    state-action set for applying reinforcement learning that is not scalable or
    constructing deterministic models for learning dialogue sentences that fail to
    capture natural conversational variability. In this paper, we propose a Latent
    Intention Dialogue Model (LIDM) that employs a discrete latent variable to
    learn underlying dialogue intentions in the framework of neural variational
    inference. In a goal-oriented dialogue scenario, these latent intentions can be
    interpreted as actions guiding the generation of machine responses, which can
    be further refined autonomously by reinforcement learning. The experimental
    evaluation of LIDM shows that the model out-performs published benchmarks for
    both corpus-based and human evaluation, demonstrating the effectiveness of
    discrete latent variable models for learning goal-oriented dialogues.

    On Multilingual Training of Neural Dependency Parsers

    Michał Zapotoczny, Paweł Rychlikowski, Jan Chorowski
    Comments: preprint accepted into the TSD2017
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We show that a recently proposed neural dependency parser can be improved by
    joint training on multiple languages from the same family. The parser is
    implemented as a deep neural network whose only input is orthographic
    representations of words. In order to successfully parse, the network has to
    discover how linguistically relevant concepts can be inferred from word
    spellings. We analyze the representations of characters and words that are
    learned by the network to establish which properties of languages were
    accounted for. In particular we show that the parser has approximately learned
    to associate Latin characters with their Cyrillic counterparts and that it can
    group Polish and Russian words that have a similar grammatical function.
    Finally, we evaluate the parser on selected languages from the Universal
    Dependencies dataset and show that it is competitive with other recently
    proposed state-of-the art methods, while having a simple structure.

    An Automatic Contextual Analysis and Clustering Classifiers Ensemble approach to Sentiment Analysis

    Murtadha Talib AL-Sharuee, Fei Liu, Mahardhika Pratama
    Comments: This article is submitted to a journal
    Subjects: Computation and Language (cs.CL)

    Products reviews are one of the major resources to determine the public
    sentiment. The existing literature on reviews sentiment analysis mainly
    utilizes supervised paradigm, which needs labeled data to be trained on and
    suffers from domain-dependency. This article addresses these issues by
    describing a completely automatic approach for sentiment analysis based on
    unsupervised ensemble learning. The method consists of two phases. The first
    phase is contextual analysis, which has five processes, namely (1) data
    preparation; (2) spelling correction; (3) intensifier handling; (4) negation
    handling and (5) contrast handling. The second phase comprises the unsupervised
    learning approach, which is an ensemble of clustering classifiers using a
    majority voting mechanism with different weight schemes. The base classifier of
    the ensemble method is a modified k-means algorithm. The base classifier is
    modified by extracting initial centroids from the feature set via
    SentiWordNet (SWN). We also introduce new sentiment analysis problems of
    Australian airlines and home builders which offer potential benchmark problems
    in the sentiment analysis field. Our experiments on datasets from different
    domains show that the contextual analysis and ensemble phases improve the
    clustering performance in terms of accuracy, stability and generalization
    ability.
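
    One way to picture the modified base classifier (our own sketch; the seed
    vectors stand in for the SentiWordNet-derived centroids described above)
    is k-means started from lexicon-based centroids instead of random ones:

        import numpy as np
        from sklearn.cluster import KMeans

        def seeded_sentiment_kmeans(X, pos_seed, neg_seed):
            # X: document-feature matrix; pos_seed/neg_seed: mean feature
            # vectors of clearly positive / negative lexicon terms
            init = np.vstack([pos_seed, neg_seed])
            km = KMeans(n_clusters=2, init=init, n_init=1).fit(X)
            return km.labels_  # 0 ~ positive cluster, 1 ~ negative cluster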

    Dynamics of core of language vocabulary

    Valery D. Solovyev, Vladimir V. Bochkarev, Anna V. Shevlyakova
    Comments: This report was presented at the Workshop “Computational linguistics and language science”, Moscow, Russia on April 25, 2016
    Subjects: Computation and Language (cs.CL)

    Studies of the overall structure of vocabulary and its dynamics became
    possible due to the creation of diachronic text corpora, especially Google Books
    Ngram. This article discusses the question of core change rate and the degree
    to which the core words cover the texts. Different periods of the last three
    centuries and six main European languages presented in Google Books Ngram are
    compared. The main result is high stability of core change rate, which is
    analogous to stability of the Swadesh list.

    Supervised Complementary Entity Recognition with Augmented Key-value Pairs of Knowledge

    Hu Xu, Lei Shu, Philip S. Yu
    Subjects: Computation and Language (cs.CL)

    Extracting opinion targets is an important task in sentiment analysis on
    product reviews and complementary entities (products) are one important type of
    opinion targets that may work together with the reviewed product. In this
    paper, we address the problem of Complementary Entity Recognition (CER) as
    a supervised sequence labeling task with the capability of expanding domain knowledge
    as key-value pairs from unlabeled reviews, by automatically learning and
    enhancing knowledge-based features. We use Conditional Random Field (CRF) as
    the base learner and augment CRF with knowledge-based features (called the
    Knowledge-based CRF or KCRF for short). We conduct experiments to show that
    KCRF effectively improves the performance of supervised CER task.

    Subject Specific Stream Classification Preprocessing Algorithm for Twitter Data Stream

    Nisansa de Silva, Danaja Maldeniya, Chamilka Wijeratne
    Comments: 6 pages
    Subjects: Computation and Language (cs.CL)

    The micro-blogging service Twitter is a lucrative source for data mining
    applications on global sentiment. But due to the omnifariousness of the
    subjects mentioned in each data item, it is inefficient to run a data
    mining algorithm on the raw data. This paper discusses an algorithm to
    accurately classify the entire stream into a given number of mutually
    exclusive, collectively exhaustive streams, upon each of which the data
    mining algorithm can be run separately, yielding more relevant results
    with high efficiency.

    Deep Learning for User Comment Moderation

    John Pavlopoulos, Prodromos Malakasiotis, Ion Androutsopoulos
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Experimenting with a new dataset of 1.6M user comments from a Greek news
    portal and existing datasets of English Wikipedia comments, we show that an RNN
    outperforms the previous state of the art in moderation. A deep,
    classification-specific attention mechanism improves further the overall
    performance of the RNN. We also compare against a CNN and a word-list baseline,
    considering both fully automatic and semi-automatic moderation.

    Neural Semantic Parsing by Character-based Translation: Experiments with Abstract Meaning Representations

    Rik van Noord, Johan Bos
    Comments: In review for CLIN Journal
    Subjects: Computation and Language (cs.CL)

    We evaluate the character-level translation method for neural semantic
    parsing on a large corpus of sentences annotated with Abstract Meaning
    Representations (AMRs). Using a seq2seq model, and some trivial preprocessing
    and postprocessing of AMRs, we obtain a baseline accuracy of 53.1 (F-score on
    AMR-triples). We examine four different approaches to improve this baseline
    result: (i) reordering AMR branches to match the word order of the input
    sentence increases performance to 58.3; (ii) adding part-of-speech tags
    (automatically produced) to the input shows improvement as well (57.2);
    (iii) introducing super characters (conflating frequent sequences of
    characters into a single character) reaches 57.4; (iv) adding silver-standard
    training data obtained by an off-the-shelf parser yields the biggest
    improvement, resulting in an F-score of 64.0. Combining all four techniques
    leads to an F-score of 69.0, which is state-of-the-art in AMR parsing. This is
    remarkable because of the relative simplicity of the approach: the only
    explicit linguistic knowledge that we use is part-of-speech tags.
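
    To illustrate technique (iii), one plausible implementation of super
    characters (our own reading, not the authors' code) conflates frequent
    character n-grams into single private-use symbols before training:

        from collections import Counter

        def find_super_chars(corpus, n=3, top=100):
            # count frequent character n-grams as candidates for conflation
            counts = Counter()
            for sent in corpus:
                for i in range(len(sent) - n + 1):
                    counts[sent[i:i + n]] += 1
            return [g for g, _ in counts.most_common(top)]

        def apply_super_chars(text, super_chars):
            # replace each frequent n-gram by one private-use codepoint
            for idx, g in enumerate(super_chars):
                text = text.replace(g, chr(0xE000 + idx))
            return text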

    The placement of the head that maximizes predictability. An information theoretic approach

    Ramon Ferrer-i-Cancho
    Subjects: Computation and Language (cs.CL); Adaptation and Self-Organizing Systems (nlin.AO); Physics and Society (physics.soc-ph); Neurons and Cognition (q-bio.NC)

    The minimization of the length of syntactic dependencies is a well-established
    principle of word order and the basis of a mathematical theory of word order.
    Here we complete that theory from the perspective of information theory, adding
    a competing word order principle: the maximization of predictability of a
    target element. These two principles are in conflict: to maximize the
    predictability of the head, the head should appear last, which maximizes the
    costs with respect to dependency length minimization. The implications of such
    a broad theoretical framework to understand the optimality, diversity and
    evolution of the six possible orderings of subject, object and verb are
    reviewed.

    Listen, Interact and Talk: Learning to Speak via Interaction

    Haichao Zhang, Haonan Yu, Wei Xu
    Subjects: Computation and Language (cs.CL)

    One of the long-term goals of artificial intelligence is to build an agent
    that can communicate intelligently with humans in natural language. Most
    existing work on natural language learning relies heavily on training over a
    pre-collected dataset with annotated labels, leading to an agent that
    essentially captures the statistics of the fixed external training data. As the
    training data is essentially a static snapshot representation of the knowledge
    from the annotator, the agent trained this way is limited in adaptiveness and
    generalization of its behavior. Moreover, this is very different from the
    language learning process of humans, where language is acquired during
    communication by taking speaking action and learning from the consequences of
    speaking action in an interactive manner. This paper presents an interactive
    setting for grounded natural language learning, where an agent learns natural
    language by interacting with a teacher and learning from feedback, thus
    learning and improving language skills while taking part in the conversation.
    To achieve this goal, we propose a model which incorporates both imitation
    and reinforcement by jointly leveraging sentence and reward feedback from the
    teacher. Experiments are conducted to validate the effectiveness of the
    proposed approach.

    Understanding Abuse: A Typology of Abusive Language Detection Subtasks

    Zeerak Waseem, Thomas Davidson, Dana Warmsley, Ingmar Weber
    Comments: To appear in the proceedings of the 1st Workshop on Abusive Language Online. Please cite that version
    Subjects: Computation and Language (cs.CL)

    As the body of research on abusive language detection and analysis grows,
    there is a need for critical consideration of the relationships between
    different subtasks that have been grouped under this label. Based on work on
    hate speech, cyberbullying, and online abuse we propose a typology that
    captures central similarities and differences between subtasks and we discuss
    its implications for data annotation and feature construction. We emphasize the
    practical actions that can be taken by researchers to best approach their
    abusive language detection subtask of interest.

    On the relation between dependency distance, crossing dependencies, and parsing. Comment on "Dependency distance: a new perspective on syntactic patterns in natural languages" by Haitao Liu et al

    Carlos Gómez-Rodríguez
    Comments: This is a comment on the article “Dependency distance: a new perspective on syntactic patterns in natural languages”, by Haitao Liu et al. Submitted to Physics of Life Reviews
    Subjects: Computation and Language (cs.CL)

    Liu et al. (2017) provide a comprehensive account of research on dependency
    distance in human languages. While the article is a very rich and useful report
    on this complex subject, here I will expand on a few specific issues where
    research in computational linguistics (specifically natural language
    processing) can inform DDM research, and vice versa. These aspects have not
    been explored much in the article by Liu et al. or elsewhere, probably due to
    the little overlap between both research communities, but they may provide
    interesting insights for improving our understanding of the evolution of human
    languages, the mechanisms by which the brain processes and understands
    language, and the construction of effective computer systems to achieve this
    goal.

    word2vec Skip-Gram with Negative Sampling is a Weighted Logistic PCA

    Andrew J. Landgraf, Jeremy Bellay
    Subjects: Computation and Language (cs.CL); Machine Learning (stat.ML)

    We show that the skip-gram formulation of word2vec trained with negative
    sampling is equivalent to a weighted logistic PCA. This connection allows us to
    better understand the objective, compare it to other word embedding methods,
    and extend it to higher dimensional models.
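
    The connection is easiest to see from the SGNS training objective, in the
    form given by Levy and Goldberg (2014): for word-context pairs with corpus
    counts #(w,c), corpus size |D| and k negative samples per pair,

    \[
    \ell = \sum_{w}\sum_{c} \Big[ \#(w,c)\,\log\sigma(\vec{w}\cdot\vec{c})
         + k\,\frac{\#(w)\,\#(c)}{|D|}\,\log\sigma(-\vec{w}\cdot\vec{c}) \Big],
    \]

    a Bernoulli (logistic) log-likelihood of the low-rank matrix of inner
    products, weighted by co-occurrence statistics, which is exactly the form
    of a weighted logistic PCA.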

    Semi-Supervised Model Training for Unbounded Conversational Speech Recognition

    Shane Walker, Morten Pedersen, Iroro Orife, Jason Flaks
    Subjects: Computation and Language (cs.CL)

    For conversational large-vocabulary continuous speech recognition (LVCSR)
    tasks, up to about two thousand hours of audio is commonly used to train state
    of the art models. Collection of labeled conversational audio however, is
    prohibitively expensive, laborious and error-prone. Furthermore, academic
    corpora like Fisher English (2004) or Switchboard (1992) are inadequate to
    train models with sufficient accuracy in the unbounded space of conversational
    speech. These corpora are also timeworn due to dated acoustic telephony
    features and the rapid advancement of colloquial vocabulary and idiomatic
    speech over the last decades. Utilizing the colossal scale of our unlabeled
    telephony dataset, we propose a technique to construct a modern, high quality
    conversational speech training corpus on the order of hundreds of millions of
    utterances (or tens of thousands of hours) for both acoustic and language model
    training. We describe the data collection, selection and training, evaluating
    the results of our updated speech recognition system on a test corpus of 7K
    manually transcribed utterances. We show relative word error rate (WER)
    reductions of {35%, 19%} on {agent, caller} utterances over our seed model and
    5% absolute WER improvements over IBM Watson STT on this conversational speech
    task.
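
    For completeness, the WER figures above are word-level edit distances
    normalised by reference length; a minimal implementation:

        def wer(ref, hyp):
            # word error rate: Levenshtein distance over tokens, divided by
            # the reference length
            r, h = ref.split(), hyp.split()
            d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
            for i in range(len(r) + 1):
                d[i][0] = i
            for j in range(len(h) + 1):
                d[0][j] = j
            for i in range(1, len(r) + 1):
                for j in range(1, len(h) + 1):
                    sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
                    d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
            return d[len(r)][len(h)] / max(len(r), 1)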

    A Deep Multi-View Learning Framework for City Event Extraction from Twitter Data Streams

    Nazli Farajidavar, Sefki Kolozali, Payam Barnaghi
    Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL)

    Cities have been a thriving place for citizens over the centuries due to
    their complex infrastructure. The emergence of the Cyber-Physical-Social
    Systems (CPSS) and context-aware technologies boost a growing interest in
    analysing, extracting and eventually understanding city events which
    subsequently can be utilised to leverage the citizen observations of their
    cities. In this paper, we investigate the feasibility of using Twitter textual
    streams for extracting city events. We propose a hierarchical multi-view deep
    learning approach to contextualise citizen observations of various city systems
    and services. Our goal has been to build a flexible architecture that can learn
    representations useful for tasks, thus avoiding excessive task-specific feature
    engineering. We apply our approach to a real-world dataset consisting of
    over four months of event reports and tweets from the San Francisco Bay
    Area, and to additional datasets collected from London. The results of our
    evaluations show
    that our proposed solution outperforms the existing models and can be used for
    extracting city related events with an averaged accuracy of 81% over all
    classes. To further evaluate the impact of our Twitter event extraction model,
    we have used two sources of authorised reports: road traffic disruption
    data collected from the Transport for London API, and sociocultural events
    parsed from the Time Out London website. The analysis showed that 49.5% of
    the Twitter traffic comments are reported approximately five hours prior
    to the authorities’ official records. Moreover, we discovered that amongst
    the scheduled sociocultural event topics, tweets reporting transportation,
    cultural and
    social events are 31.75% more likely to influence the distribution of the
    Twitter comments than sport, weather and crime topics.

    Multiplex model of mental lexicon reveals explosive learning in humans

    Massimo Stella, Nicole M. Beckage, Markus Brede, Manlio De Domenico
    Comments: 11 pages, 4 figures and 1 table
    Subjects: Physics and Society (physics.soc-ph); Computation and Language (cs.CL); Social and Information Networks (cs.SI); Adaptation and Self-Organizing Systems (nlin.AO)

    Similarities among words affect language acquisition and processing in a
    multi-relational way barely accounted for in the literature. We propose a
    multiplex network representation of word similarities in a mental lexicon as a
    natural framework for investigating large-scale cognitive patterns. Our model
    accounts for semantic, taxonomic, and phonological interactions and identifies
    a cluster of words of higher frequency, easier to identify, memorise and learn
    and with more meanings than expected at random. This cluster emerges around age
    7 yr through an explosive transition not reproduced by null models. We relate
    this phenomenon to polysemy, i.e. redundancy in word meanings. We show that the
    word cluster acts as a core for the lexicon, increasing both its navigability
    and robustness to degradation in cognitive impairments. Our findings provide
    quantitative confirmation of existing psycholinguistic conjectures about core
    structure in the mental lexicon and the importance of integrating
    multi-relational word-word interactions in suitable frameworks.

    Community Identity and User Engagement in a Multi-Community Landscape

    Justine Zhang, William L. Hamilton, Cristian Danescu-Niculescu-Mizil, Dan Jurafsky, Jure Leskovec
    Comments: 10 page, 3 figures, To appear in the Proceedings of the 11th International Conference On Web And Social Media, ICWSM 2017; this version has subtle differences with the proceedings version, including an introductory quote
    Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Computers and Society (cs.CY); Physics and Society (physics.soc-ph)

    A community’s identity defines and shapes its internal dynamics. Our current
    understanding of this interplay is mostly limited to glimpses gathered from
    isolated studies of individual communities. In this work we provide a
    systematic exploration of the nature of this relation across a wide variety of
    online communities. To this end we introduce a quantitative, language-based
    typology reflecting two key aspects of a community’s identity: how distinctive,
    and how temporally dynamic it is. By mapping almost 300 Reddit communities into
    the landscape induced by this typology, we reveal regularities in how patterns
    of user engagement vary with the characteristics of a community.

    Our results suggest that the way new and existing users engage with a
    community depends strongly and systematically on the nature of the collective
    identity it fosters, in ways that are highly consequential to community
    maintainers. For example, communities with distinctive and highly dynamic
    identities are more likely to retain their users. However, such niche
    communities also exhibit much larger acculturation gaps between existing users
    and newcomers, which potentially hinder the integration of the latter.

    More generally, our methodology reveals differences in how various social
    phenomena manifest across communities, and shows that structuring the
    multi-community landscape can lead to a better understanding of the systematic
    nature of this diversity.


    Distributed, Parallel, and Cluster Computing

    Increasing the Efficiency of Sparse Matrix-Matrix Multiplication with a 2.5D Algorithm and One-Sided MPI

    Alfio Lazzaro, Joost VandeVondele, Juerg Hutter, Ole Schuett
    Comments: In Proceedings of PASC ’17, Lugano, Switzerland, June 26-28, 2017, 10 pages, 4 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Matrix-matrix multiplication is a basic operation in linear algebra and an
    essential building block for a wide range of algorithms in various scientific
    fields. Theory and implementation for the dense, square matrix case are
    well-developed. If matrices are sparse, with application-specific sparsity
    patterns, the optimal implementation remains an open question. Here, we explore
    the performance of communication reducing 2.5D algorithms and one-sided MPI
    communication in the context of linear scaling electronic structure theory. In
    particular, we extend the DBCSR sparse matrix library, which is the basic
    building block for linear scaling electronic structure theory and low scaling
    correlated methods in CP2K. The library is specifically designed to efficiently
    perform block-sparse matrix-matrix multiplication of matrices with a relatively
    large occupation. Here, we compare the performance of the original
    implementation based on Cannon’s algorithm and MPI point-to-point
    communication, with an implementation based on MPI one-sided communications
    (RMA), in both a 2D and a 2.5D approach. The 2.5D approach trades memory and
    auxiliary operations for reduced communication, which can lead to a speedup if
    communication is dominant. The 2.5D algorithm is somewhat easier to implement
    with one-sided communications. A detailed description of the implementation is
    provided, also for non-ideal processor topologies, since this is important for
    actual applications. Given the importance of the precise sparsity pattern, and
    even the actual matrix data, which decides the effective fill-in upon
    multiplication, the tests are performed within the CP2K package with
    application benchmarks. Results show a substantial boost in performance for the
    RMA based 2.5D algorithm, up to 1.80x, which is observed to increase with the
    number of involved processes in the parallelization.
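
    To give a flavour of one-sided MPI (RMA) as opposed to point-to-point
    messaging, here is a minimal mpi4py sketch (unrelated to the actual DBCSR
    code):

        from mpi4py import MPI
        import numpy as np

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        # expose a local buffer through an RMA window
        local = np.full(1, rank, dtype='i')
        win = MPI.Win.Create(local, comm=comm)

        # read a neighbour's buffer without it posting a matching receive
        target = (rank + 1) % size
        remote = np.empty(1, dtype='i')
        win.Lock(target)
        win.Get(remote, target)
        win.Unlock(target)
        win.Free()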

    Dependency-Aware Rollback and Checkpoint-Restart for Distributed Task-Based Runtimes

    Kiril Dichev, Herbert Jordan, Konstantinos Tovletoglou, Thomas Heller, Dimitrios S. Nikolopoulos, Georgios Karakonstantis, Charles Gillan
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    With the increase in compute nodes in large compute platforms, a proportional
    increase in node failures will follow. Many application-based
    checkpoint/restart (C/R) techniques have been proposed for MPI applications to
    target the reduced mean time between failures. However, rollback as part of the
    recovery remains a dominant cost even in highly optimised MPI applications
    employing C/R techniques. Continuing execution past a checkpoint (that is,
    reducing rollback) is possible in message-passing runtimes, but extremely
    complex to design and implement. Our work focuses on task-based runtimes, where
    task dependencies are explicit and message passing is implicit. We see an
    opportunity for reducing rollback for such runtimes: we explore task
    dependencies in the rollback, which we call dependency-aware rollback. We also
    design a new C/R technique, which is influenced by recursive decomposition of
    tasks, and combine it with dependency-aware rollback. We expect the
    dependency-aware rollback to cancel and recompute fewer tasks in the presence of
    node failures. We describe, implement and validate the proposed protocol in a
    simulator, which confirms these expectations. In addition, we consistently
    observe faster overall execution time for dependency-aware rollback in the
    presence of faults, despite the fact that reduced task cancellation does not
    guarantee reduced overall execution time.

    Deterministic subgraph detection in broadcast CONGEST

    Janne H. Korhonen, Joel Rybicki
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

    We present simple deterministic algorithms for subgraph finding and
    enumeration in the broadcast CONGEST model of distributed computation:

    — For any constant (k), detecting (k)-paths and trees on (k) nodes can be
    done in constantly many rounds, and (k)-cycles in (O(n)) rounds.

    — On (d)-degenerate graphs, cliques and (4)-cycles can be enumerated in (O(d
    + log n)) rounds, and (5)-cycles in (O(d^2 + log n)) rounds.

    In many cases, these bounds are tight up to logarithmic factors. Moreover, we
    show that the algorithms for (d)-degenerate graphs can be improved to optimal
    complexity (O(d/log n)) and (O(d^2/log n)), respectively, in the supported
    CONGEST model, which can be seen as an intermediate model between CONGEST and
    the congested clique.

    Fully distributed PageRank computation with exponential convergence

    Liang Dai, Nikolaos M. Freris
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (cs.SY)

    This work studies a fully distributed algorithm for computing the PageRank
    vector. The algorithm is inspired by Matching Pursuit and features: 1)
    fully distributed operation; 2) convergence in expectation at an
    exponential rate; 3) a low storage requirement (two scalar values per
    page). Illustrative experiments are conducted to verify the findings.
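
    For contrast, the classical centralised power iteration that such
    distributed schemes aim to replace can be sketched as follows (a standard
    textbook version, not the algorithm proposed in the paper):

        import numpy as np

        def pagerank(A, d=0.85, tol=1e-10):
            # A[i, j] = 1 if page j links to page i; dangling nodes are
            # ignored here for brevity
            n = A.shape[0]
            M = A / np.maximum(A.sum(axis=0), 1)
            r = np.full(n, 1.0 / n)
            while True:
                r_new = (1 - d) / n + d * M @ r
                if np.abs(r_new - r).sum() < tol:
                    return r_new
                r = r_new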

    A Unified Optimization Approach for Sparse Tensor Operations on GPUs

    Bangtian Liu, Chengyao Wen, Anand D. Sarwate, Maryam Mehri Dehnavi
    Subjects: Mathematical Software (cs.MS); Distributed, Parallel, and Cluster Computing (cs.DC)

    Sparse tensors appear in many large-scale applications with multidimensional
    and sparse data. While multidimensional sparse data often need to be processed
    on manycore processors, attempts to develop highly-optimized GPU-based
    implementations of sparse tensor operations are rare. The irregular computation
    patterns and sparsity structures as well as the large memory footprints of
    sparse tensor operations make such implementations challenging. We leverage the
    fact that sparse tensor operations share similar computation patterns to
    propose a unified tensor representation called F-COO. Combined with
    GPU-specific optimizations, F-COO provides highly-optimized implementations of
    sparse tensor computations on GPUs. The performance of the proposed unified
    approach is demonstrated for tensor-based kernels such as the Sparse Matricized
    Tensor-Times-Khatri-Rao Product (SpMTTKRP) and the Sparse Tensor-Times-Matrix
    Multiply (SpTTM) and is used in tensor decomposition algorithms. Compared to
    state-of-the-art work we improve the performance of SpTTM and SpMTTKRP up to
    3.7 and 30.6 times respectively on NVIDIA Titan-X GPUs. We implement a
    CANDECOMP/PARAFAC (CP) decomposition and achieve up to 14.9 times speedup using
    the unified method over state-of-the-art libraries on NVIDIA Titan-X GPUs.

    Spreading a Confirmed Rumor: A Case for Oscillatory Dynamics

    Bartlomiej Dudek, Adrian Kosowski (GANG)
    Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC); Dynamical Systems (math.DS)

    We consider an information spreading problem in which a population of (n)
    agents is to determine, through random pairwise interactions, whether an
    authoritative rumor source (X) is present in the population or not. The studied
    problem is a generalization of the rumor spreading problem, in which we
    additionally impose that the rumor should disappear when the rumor source no
    longer exists. It is also a generalization of the self-stabilizing broadcasting
    problem and has direct application to amplifying trace concentrations in
    chemical reaction networks. We show that there exists a protocol such that,
    starting from any possible initial state configuration, in the absence of a
    rumor source all agents reach a designated “uninformed” state after (O(log^2
    n)) rounds w.h.p., whereas in the presence of the rumor source, at any time
    after at least (O(log n)) rounds from the moment (X) appears, at least
    ((1-ε)n) agents are in an “informed” state with probability (1 - O(1/n)),
    where (ε > 0) may be arbitrarily fixed. The protocol uses a
    constant number of states and its operation relies on an underlying oscillatory
    dynamics with a closed limit orbit. On the negative side, we show that any
    system which has such an ability to “suppress false rumors” in sub-polynomial
    time must either exhibit significant and perpetual variations of opinion over
    time in the presence of the rumor source, or use a super-constant number of
    states.

    AMPNet: Asynchronous Model-Parallel Training for Dynamic Neural Networks

    Alex Gaunt, Matthew Johnson, Maik Riechert, Daniel Tarlow, Ryota Tomioka, Dimitrios Vytiniotis, Sam Webster
    Comments: 17 pages, 13 figures
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)

    New types of machine learning hardware in development and entering the market
    hold the promise of revolutionizing deep learning in a manner as profound as
    GPUs. However, existing software frameworks and training algorithms for deep
    learning have yet to evolve to fully leverage the capability of the new wave of
    silicon. We already see the limitations of existing algorithms for models that
    exploit structured input via complex and instance-dependent control flow, which
    prohibits minibatching. We present an {\em asynchronous model-parallel} (AMP)
    training algorithm that is specifically motivated by training on networks of
    interconnected devices. Through an implementation on multi-core CPUs, we show
    that AMP training converges to the same accuracy as conventional synchronous
    training algorithms in a similar number of epochs, but utilizes the available
    hardware more efficiently even for small minibatch sizes, resulting in
    significantly shorter overall training times. Our framework opens the door for
    scaling up a new class of deep learning models that cannot be efficiently
    trained today.


    Learning

    Contextual Explanation Networks

    Maruan Al-Shedivat, Avinava Dubey, Eric P. Xing
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    We introduce contextual explanation networks (CENs)—a class of models that
    learn to predict by generating and leveraging intermediate explanations. CENs
    combine deep networks with context-specific probabilistic models and construct
    explanations in the form of locally-correct hypotheses. Contrary to the
    existing post-hoc model-explanation tools, CENs learn to predict and to explain
    jointly. Our approach offers two major advantages: (i) for each prediction,
    valid instance-specific explanations are generated with no computational
    overhead and (ii) prediction via explanation acts as a regularization and
    boosts performance in low-resource settings. We prove that local approximations
    to the decision boundary of our networks are consistent with the generated
    explanations. Our results on image and text classification and survival
    analysis tasks demonstrate that CENs can easily match or outperform the
    state-of-the-art while offering additional insights behind each prediction,
    valuable for decision support.

    Boltzmann Exploration Done Right

    Nicolò Cesa-Bianchi, Claudio Gentile, Gábor Lugosi, Gergely Neu
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Boltzmann exploration is a classic strategy for sequential decision-making
    under uncertainty, and is one of the most standard tools in Reinforcement
    Learning (RL). Despite its widespread use, there is virtually no theoretical
    understanding about the limitations or the actual benefits of this exploration
    scheme. Does it drive exploration in a meaningful way? Is it prone to
    misidentifying the optimal actions or spending too much time exploring the
    suboptimal ones? What is the right tuning for the learning rate? In this paper,
    we address several of these questions in the classic setup of stochastic
    multi-armed bandits. One of our main results is showing that the Boltzmann
    exploration strategy with any monotone learning-rate sequence will induce
    suboptimal behavior. As a remedy, we offer a simple non-monotone schedule that
    guarantees near-optimal performance, albeit only when given prior access to key
    problem parameters that are typically not available in practical situations
    (like the time horizon (T) and the suboptimality gap (\Delta)). More
    importantly, we propose a novel variant that uses different learning rates for
    different arms, and achieves a distribution-dependent regret bound of order
    (\frac{K \log^2 T}{\Delta}) and a distribution-independent bound of order
    (\sqrt{KT} \log K) without requiring such prior knowledge. To demonstrate the
    flexibility of our technique, we also propose a variant that guarantees the
    same performance bounds even if the rewards are heavy-tailed.
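
    For concreteness, the vanilla scheme being analyzed samples each arm with
    probability proportional to exp(eta_t * empirical mean). A minimal
    simulation is sketched below; the monotone schedule eta_t = sqrt(t) is
    purely illustrative (the paper shows any monotone schedule can induce
    suboptimal behavior, and proposes per-arm learning rates instead).

    ```python
    import numpy as np

    def boltzmann_bandit(means, T=10000, seed=0):
        """Vanilla Boltzmann exploration on a Gaussian bandit: play arm a
        with probability proportional to exp(eta_t * mu_hat[a])."""
        rng = np.random.default_rng(seed)
        K = len(means)
        counts = np.ones(K)                       # one pull each to initialize
        mu_hat = rng.normal(means)
        regret = 0.0
        for t in range(1, T + 1):
            logits = np.sqrt(t) * mu_hat          # illustrative monotone schedule
            p = np.exp(logits - logits.max())
            p /= p.sum()
            a = rng.choice(K, p=p)
            counts[a] += 1
            mu_hat[a] += (rng.normal(means[a]) - mu_hat[a]) / counts[a]
            regret += max(means) - means[a]
        return regret

    print(boltzmann_bandit([0.5, 0.4, 0.1]))
    ```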

    Deep Learning for Patient-Specific Kidney Graft Survival Analysis

    Margaux Luck, Tristan Sylvain, Héloïse Cardinal, Andrea Lodi, Yoshua Bengio
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    An accurate model of patient-specific kidney graft survival distributions can
    help to improve shared-decision making in the treatment and care of patients.
    In this paper, we propose a deep learning method that directly models the
    survival function instead of estimating the hazard function to predict survival
    times for graft patients based on the principle of multi-task learning. By
    learning to jointly predict the time of the event and its rank in the Cox
    partial log-likelihood framework, our deep learning approach outperforms, in
    terms of survival time prediction quality and concordance index, other common
    methods for survival analysis, including the Cox Proportional Hazards model and
    a network trained on the Cox partial log-likelihood.
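
    The Cox partial log-likelihood that the ranking objective builds on has a
    compact form. Below is a minimal numpy sketch of the negative partial
    log-likelihood for a vector of risk scores (Breslow-style handling of ties,
    illustrative variable names); the paper's multi-task network itself is not
    reproduced here.

    ```python
    import numpy as np

    def neg_cox_partial_loglik(scores, times, events):
        """For each observed event i (events[i] == 1):
        -(f_i - log sum_{j: t_j >= t_i} exp(f_j)), summed over events."""
        order = np.argsort(-times)               # sort by descending time
        s = scores[order]
        log_risk = np.logaddexp.accumulate(s)    # log-sum-exp over risk sets
        e = events[order].astype(bool)
        return -(s[e] - log_risk[e]).sum()

    scores = np.array([0.2, -1.0, 0.5, 0.1])     # model outputs f(x_i)
    times = np.array([5.0, 8.0, 3.0, 6.0])       # follow-up times
    events = np.array([1, 0, 1, 1])              # 1 = event, 0 = censored
    print(neg_cox_partial_loglik(scores, times, events))
    ```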

    Mining Process Model Descriptions of Daily Life through Event Abstraction

    Niek Tax, Natalia Sidorova, Reinder Haakma, Wil M.P. van der Aalst
    Comments: arXiv admin note: substantial text overlap with arXiv:1606.07283
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB)

    Process mining techniques focus on extracting insight in processes from event
    logs. Process mining has the potential to provide valuable insights in
    (un)healthy habits and to contribute to ambient assisted living solutions when
    applied on data from smart home environments. However, events recorded in smart
    home environments are on the level of sensor triggers, at which process
    discovery algorithms produce overgeneralizing process models that allow for too
    much behavior and that are difficult to interpret for human experts. We show
    that abstracting the events to a higher-level interpretation can enable
    discovery of more precise and more comprehensible models. We present a
    framework, based on the XES IEEE standard for event logs, for extracting
    features that can be used for abstraction with supervised learning methods.
    This framework can automatically abstract sensor-level events to their
    interpretation at the human activity level, after training it on training data
    for which both the sensor and human activity events are known. We demonstrate
    our abstraction framework on three real-life smart home event logs and show
    that the process models that can be discovered after abstraction are indeed
    more precise.

    Kronecker Recurrent Units

    Cijo Jose, Moustapha Cisse, François Fleuret
    Subjects: Learning (cs.LG)

    Our work addresses two important issues with recurrent neural networks: (1)
    they are over-parameterized, and (2) the recurrence matrix is ill-conditioned.
    The former increases the sample complexity of learning and the training time.
    The latter causes the vanishing and exploding gradient problem.

    We present a flexible recurrent neural network model called Kronecker
    Recurrent Units (KRU). KRU achieves parameter efficiency in RNNs through a
    Kronecker factored recurrent matrix. It overcomes the ill-conditioning of the
    recurrent matrix by enforcing soft unitary constraints on the factors. Thanks
    to the small dimensionality of the factors, maintaining these constraints is
    computationally efficient.

    Our experimental results on five standard data-sets reveal that KRU can
    reduce the number of parameters by three orders of magnitude in the recurrent
    weight matrix compared to existing recurrent models, without sacrificing
    statistical performance.

    These results in particular show that while there are advantages in having a
    high dimensional recurrent space, the capacity of the recurrent part of the
    model can be dramatically reduced.
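
    The source of the parameter savings is that a Kronecker-factored matrix is
    never materialized: applying W = A ⊗ B to a vector uses the standard
    identity (A ⊗ B) vec(X) = vec(B X Aᵀ). The numpy check below illustrates
    only this identity, not the KRU model itself (which additionally enforces
    soft unitary constraints on the factors).

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))          # 9 parameters
    B = rng.standard_normal((4, 4))          # 16 parameters
    h = rng.standard_normal(12)              # hidden state, h = vec(X)

    X = h.reshape(3, 4).T                    # column-major vec: X is 4 x 3
    fast = (B @ X @ A.T).T.reshape(-1)       # never forms the 12 x 12 matrix
    slow = np.kron(A, B) @ h                 # 144-entry matrix, same result
    print(np.allclose(fast, slow))           # True: 25 parameters replace 144
    ```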

    Distributed Convolutional Sparse Coding

    Thomas Moreau, Laurent Oudre, Nicolas Vayatis
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We consider the problem of building shift-invariant representations for long
    signals in the context of distributed processing. We propose an asynchronous
    algorithm based on coordinate descent called DICOD to efficiently solve the
    (\ell_1)-minimization problems involved in convolutional sparse coding. This
    algorithm leverages the weak temporal dependency of the convolution to reduce
    the interprocess communication to a few local messages. We prove that this
    algorithm converges to the optimal solution and that it scales with superlinear
    speedup, up to a certain limit. These properties are illustrated with numerical
    experiments and our algorithm is compared to the state-of-the-art methods used
    for convolutional sparse coding.

    Learning Network Structures from Contagion

    Adisak Supeesun (Kasetsart University, Bangkok, Thailand), Jittat Fakcharoenphol (Kasetsart University, Bangkok, Thailand)
    Journal-ref: Information Processing Letters, Volume 121, May 2017, Pages 11-16
    Subjects: Learning (cs.LG); Social and Information Networks (cs.SI)

    In 2014, Amin, Heidari, and Kearns proved that tree networks can be learned
    by observing only the infected set of vertices of the contagion process under
    the independent cascade model, in both the active and passive query models.
    They also showed empirically that simple extensions of their algorithms work on
    sparse networks. In this work, we focus on the active model. We prove that a
    simple modification of Amin et al.’s algorithm works on more general classes of
    networks, namely (i) networks with large girth and low path growth rate, and
    (ii) networks with bounded degree. This also provides partial theoretical
    explanation for Amin et al.’s experiments on sparse networks.

    Improving the Expected Improvement Algorithm

    Chao Qin, Diego Klabjan, Daniel Russo
    Comments: Submitted to NIPS 2017
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    The expected improvement (EI) algorithm is a popular strategy for information
    collection in optimization under uncertainty. The algorithm is widely known to
    be too greedy, but nevertheless enjoys wide use due to its simplicity and
    ability to handle uncertainty and noise in a coherent decision theoretic
    framework. To provide rigorous insight into EI, we study its properties in a
    simple setting of Bayesian optimization where the domain consists of a finite
    grid of points. This is the so-called best-arm identification problem, where
    the goal is to allocate measurement effort wisely to confidently identify the
    best arm using a small number of measurements. In this framework, one can show
    formally that EI is far from optimal. To overcome this shortcoming, we
    introduce a simple modification of the expected improvement algorithm.
    Surprisingly, this simple change results in an algorithm that is asymptotically
    optimal for Gaussian best-arm identification problems, and provably outperforms
    standard EI by an order of magnitude.
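
    For reference, the standard EI acquisition analyzed here has a closed form
    under a Gaussian posterior. The sketch below computes it with scipy; the
    paper's modified variant is not reproduced.

    ```python
    import numpy as np
    from scipy.stats import norm

    def expected_improvement(mu, sigma, best):
        """EI for posterior N(mu, sigma^2) at each arm, relative to the
        current best observed value (maximization convention):
        EI = (mu - best) * Phi(z) + sigma * phi(z), z = (mu - best)/sigma."""
        sigma = np.maximum(sigma, 1e-12)     # guard against zero variance
        z = (mu - best) / sigma
        return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

    mu = np.array([0.20, 0.50, 0.45])
    sigma = np.array([0.30, 0.05, 0.40])
    print(expected_improvement(mu, sigma, best=0.5))
    ```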

    Learning Data Manifolds with a Cutting Plane Method

    SueYeon Chung, Uri Cohen, Haim Sompolinsky, Daniel D. Lee
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We consider the problem of classifying data manifolds where each manifold
    represents invariances that are parameterized by continuous degrees of freedom.
    Conventional data augmentation methods rely upon sampling large numbers of
    training examples from these manifolds; instead, we propose an iterative
    algorithm called (M_{CP}) based upon a cutting-plane approach that efficiently
    solves a quadratic semi-infinite programming problem to find the maximum margin
    solution. We provide a proof of convergence as well as a polynomial bound on
    the number of iterations required for a desired tolerance in the objective
    function. The efficiency and performance of (M_{CP}) are demonstrated in
    high-dimensional simulations and on image manifolds generated from the ImageNet
    dataset. Our results indicate that (M_{CP}) is able to rapidly learn good
    classifiers and shows superior generalization performance compared with
    conventional maximum margin methods based on data augmentation.

    Convergence Analysis of Two-layer Neural Networks with ReLU Activation

    Yuanzhi Li, Yang Yuan
    Subjects: Learning (cs.LG)

    In recent years, stochastic gradient descent (SGD) based techniques have
    become the standard tools for training neural networks. However, a formal
    theoretical understanding of why SGD can train neural networks in practice is
    largely missing.

    In this paper, we make progress on understanding this mystery by providing a
    convergence analysis for SGD on a rich subset of two-layer feedforward networks
    with ReLU activations. This subset is characterized by a special structure
    called “identity mapping”. We prove that, if the input follows a Gaussian
    distribution, then with standard (O(1/\sqrt{d})) initialization of the weights,
    SGD converges to the global minimum in a polynomial number of steps. Unlike
    normal
    vanilla networks, the “identity mapping” makes our network asymmetric and thus
    the global minimum is unique. To complement our theory, we are also able to
    show experimentally that multi-layer networks with this mapping have better
    performance compared with normal vanilla networks.

    Our convergence theorem differs from traditional non-convex optimization
    techniques. We show that SGD converges to the optimum in “two phases”: in
    phase I, the gradient points in the wrong direction, but a potential function (g)
    gradually decreases. Then in phase II, SGD enters a nice one point convex
    region and converges. We also show that the identity mapping is necessary for
    convergence, as it moves the initial point to a better place for optimization.
    Experiments verify our claims.

    BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet

    Haojin Yang, Martin Fritzsche, Christian Bartz, Christoph Meinel
    Comments: 4 pages
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Binary Neural Networks (BNNs) can drastically reduce memory size and accesses
    by applying bit-wise operations instead of standard arithmetic operations.
    They can therefore significantly improve efficiency and lower energy
    consumption at runtime, which enables the application of state-of-the-art
    deep learning models on low-power devices. BMXNet is an open-source BNN
    library
    based on MXNet, which supports both XNOR-Networks and Quantized Neural
    Networks. The developed BNN layers can be seamlessly applied with other
    standard library components and work in both GPU and CPU mode. BMXNet is
    maintained and developed by the multimedia research group at Hasso Plattner
    Institute and released under Apache license. Extensive experiments validate the
    efficiency and effectiveness of our implementation. The BMXNet library, several
    sample projects, and a collection of pre-trained binary deep models are
    available for download at this https URL
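
    The bit-wise substitution that BNNs exploit is easy to see in isolation:
    for vectors with entries in {-1, +1}, a dot product reduces to an XNOR (or
    XOR) plus a popcount. The numpy sketch below illustrates the arithmetic
    identity only; it is not BMXNet's API.

    ```python
    import numpy as np

    def binary_dot(a_bits, b_bits, n):
        """Dot product of two {-1,+1} vectors stored as packed sign bits:
        dot = n - 2 * popcount(a XOR b)."""
        diff = np.unpackbits(np.bitwise_xor(a_bits, b_bits)).sum()
        return n - 2 * int(diff)

    rng = np.random.default_rng(0)
    n = 64
    a = rng.choice([-1, 1], n)
    b = rng.choice([-1, 1], n)
    a_bits = np.packbits(a < 0)              # one bit encodes each sign
    b_bits = np.packbits(b < 0)
    print(binary_dot(a_bits, b_bits, n), int(a @ b))  # identical values
    ```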

    AMPNet: Asynchronous Model-Parallel Training for Dynamic Neural Networks

    Alex Gaunt, Matthew Johnson, Maik Riechert, Daniel Tarlow, Ryota Tomioka, Dimitrios Vytiniotis, Sam Webster
    Comments: 17 pages, 13 figures
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)

    New types of machine learning hardware in development and entering the market
    hold the promise of revolutionizing deep learning in a manner as profound as
    GPUs. However, existing software frameworks and training algorithms for deep
    learning have yet to evolve to fully leverage the capability of the new wave of
    silicon. We already see the limitations of existing algorithms for models that
    exploit structured input via complex and instance-dependent control flow, which
    prohibits minibatching. We present an {\em asynchronous model-parallel} (AMP)
    training algorithm that is specifically motivated by training on networks of
    interconnected devices. Through an implementation on multi-core CPUs, we show
    that AMP training converges to the same accuracy as conventional synchronous
    training algorithms in a similar number of epochs, but utilizes the available
    hardware more efficiently even for small minibatch sizes, resulting in
    significantly shorter overall training times. Our framework opens the door for
    scaling up a new class of deep learning models that cannot be efficiently
    trained today.

    Good Semi-supervised Learning that Requires a Bad GAN

    Zihang Dai, Zhilin Yang, Fan Yang, William W. Cohen, Ruslan Salakhutdinov
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Semi-supervised learning methods based on generative adversarial networks
    (GANs) obtained strong empirical results, but it is not clear 1) how the
    discriminator benefits from joint training with a generator, and 2) why good
    semi-supervised classification performance and a good generator cannot be
    obtained at the same time. Theoretically, we show that given the discriminator
    objective, good semisupervised learning indeed requires a bad generator, and
    propose the definition of a preferred generator. Empirically, we derive a novel
    formulation based on our analysis that substantially improves over feature
    matching GANs, obtaining state-of-the-art results on multiple benchmark
    datasets.

    A Multi-strength Adversarial Training Method to Mitigate Adversarial Attacks

    Chang Song, Hsin-Pai Cheng, Chunpeng Wu, Hai Li, Yiran Chen, Qing Wu
    Comments: Submitted to NIPS 2017
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    Some recent works revealed that deep neural networks (DNNs) are vulnerable to
    so-called adversarial attacks where input examples are intentionally perturbed
    to fool DNNs. In this work, we revisit the DNN training process that includes
    adversarial examples into the training dataset so as to improve DNN’s
    resilience to adversarial attacks, namely, adversarial training. Our
    experiments show that different adversarial strengths, i.e., perturbation
    levels of adversarial examples, have different working zones to resist the
    attack. Based on the observation, we propose a multi-strength adversarial
    training method (MAT) that combines the adversarial training examples with
    different adversarial strengths to defend adversarial attacks. Two training
    structures – mixed MAT and parallel MAT – are developed to facilitate the
    tradeoffs between training time and memory occupation. Our results show that
    MAT can substantially minimize the accuracy degradation of deep learning
    systems to adversarial attacks on MNIST, CIFAR-10, CIFAR-100, and SVHN.
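
    As a rough sketch of the multi-strength idea, the snippet below generates
    FGSM-style adversarial copies of each example at several perturbation
    strengths and mixes them into one batch, in the spirit of mixed MAT. A
    logistic model is used purely to keep the example self-contained; the
    function names and setup are illustrative assumptions, not the paper's DNN
    training procedure.

    ```python
    import numpy as np

    def fgsm(x, y, w, eps):
        """FGSM step for the logistic loss log(1 + exp(-y * w.x)):
        perturb x by eps along the sign of the input gradient."""
        grad = -y * w / (1.0 + np.exp(y * np.dot(w, x)))
        return x + eps * np.sign(grad)

    def multi_strength_batch(X, y, w, strengths=(0.01, 0.05, 0.1)):
        """Augment clean examples with adversarial copies at several
        strengths, mixed-MAT style."""
        X_adv = np.array([fgsm(x, yi, w, e)
                          for e in strengths for x, yi in zip(X, y)])
        y_adv = np.tile(y, len(strengths))
        return np.vstack([X, X_adv]), np.concatenate([y, y_adv])

    rng = np.random.default_rng(0)
    X, y = rng.standard_normal((4, 3)), np.array([1, -1, 1, -1])
    w = np.array([0.5, -0.2, 0.1])
    Xb, yb = multi_strength_batch(X, y, w)
    print(Xb.shape, yb.shape)                # (16, 3) (16,)
    ```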

    Multiple Source Domain Adaptation with Adversarial Training of Neural Networks

    Han Zhao, Shanghang Zhang, Guanhang Wu, João P. Costeira, José M. F. Moura, Geoffrey J. Gordon
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    We propose a new generalization bound for domain adaptation when there are
    multiple source domains with labeled instances and one target domain with
    unlabeled instances. The new bound has an interesting interpretation and
    reduces to an existing bound when there is only one source domain. Compared
    with existing bounds, the new bound does not require expert knowledge about the
    target distribution, nor the optimal combination rule for multisource domains.
    Interestingly, our theory also leads to an efficient implementation using
    adversarial neural networks: we show how to interpret it as learning feature
    representations that are invariant to the multiple domain shifts while still
    being discriminative for the learning task. To this end, we propose two models,
    both of which we call multisource domain adversarial networks (MDANs): the
    first model directly optimizes our bound, while the second model is a smoothed
    approximation of the first one, leading to a more data-efficient and
    task-adaptive model. The optimization tasks of both models are minimax saddle
    point problems that can be optimized by adversarial training. To demonstrate
    the effectiveness of MDANs, we conduct extensive experiments showing superior
    adaptation performance on three real-world datasets: sentiment analysis, digit
    classification, and vehicle counting.

    Fisher GAN

    Youssef Mroueh, Tom Sercu
    Comments: submitted to NIPS 2017
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Generative Adversarial Networks (GANs) are powerful models for learning
    complex distributions. Stable training of GANs has been addressed in many
    recent works which explore different metrics between distributions. In this
    paper we introduce Fisher GAN which fits within the Integral Probability
    Metrics (IPM) framework for training GANs. Fisher GAN defines a critic with a
    data dependent constraint on its second order moments. We show in this paper
    that Fisher GAN allows for stable and time efficient training that does not
    compromise the capacity of the critic, and does not need data independent
    constraints such as weight clipping. We analyze our Fisher IPM theoretically
    and provide an algorithm based on Augmented Lagrangian for Fisher GAN. We
    validate our claims on both image sample generation and semi-supervised
    classification using Fisher GAN.

    Fast Single-Class Classification and the Principle of Logit Separation

    Gil Keren, Sivan Sabato, Björn Schuller
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We consider neural network training, in applications in which there are many
    possible classes, but at test time the task is to identify whether the example
    belongs to one specific class, e.g., when searching for photos of a specific
    person. We focus on reducing the computational burden at test-time in such
    applications. We define the Single Logit Classification (SLC) task: training
    the network so that at test time, it would be possible to accurately identify
    if the example belongs to a given class, based only on the output logit of the
    trained model for this class. We propose a natural principle, the Principle of
    Logit Separation (PoLS), as a guideline for choosing and designing losses
    suitable for the SLC. We study previously suggested losses and their alignment
    with the PoLS. We further derive new batch versions of known losses, and show
    that unlike the standard versions, these new versions satisfy the PoLS. Our
    experiments show that the losses aligned with the PoLS overwhelmingly
    outperform the other losses on the SLC task. Tensorflow code for optimizing the
    new batch losses is publicly available in
    this https URL

    Latent Intention Dialogue Models

    Tsung-Hsien Wen, Yishu Miao, Phil Blunsom, Steve Young
    Comments: Accepted at ICML 2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Developing a dialogue agent that is capable of making autonomous decisions
    and communicating by natural language is one of the long-term goals of machine
    learning research. Traditional approaches either rely on hand-crafting a small
    state-action set for applying reinforcement learning that is not scalable or
    constructing deterministic models for learning dialogue sentences that fail to
    capture natural conversational variability. In this paper, we propose a Latent
    Intention Dialogue Model (LIDM) that employs a discrete latent variable to
    learn underlying dialogue intentions in the framework of neural variational
    inference. In a goal-oriented dialogue scenario, these latent intentions can be
    interpreted as actions guiding the generation of machine responses, which can
    be further refined autonomously by reinforcement learning. The experimental
    evaluation of LIDM shows that the model out-performs published benchmarks for
    both corpus-based and human evaluation, demonstrating the effectiveness of
    discrete latent variable models for learning goal-oriented dialogues.

    On Multilingual Training of Neural Dependency Parsers

    Michał Zapotoczny, Paweł Rychlikowski, Jan Chorowski
    Comments: preprint accepted into the TSD2017
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We show that a recently proposed neural dependency parser can be improved by
    joint training on multiple languages from the same family. The parser is
    implemented as a deep neural network whose only input is orthographic
    representations of words. In order to successfully parse, the network has to
    discover how linguistically relevant concepts can be inferred from word
    spellings. We analyze the representations of characters and words that are
    learned by the network to establish which properties of languages were
    accounted for. In particular we show that the parser has approximately learned
    to associate Latin characters with their Cyrillic counterparts and that it can
    group Polish and Russian words that have a similar grammatical function.
    Finally, we evaluate the parser on selected languages from the Universal
    Dependencies dataset and show that it is competitive with other recently
    proposed state-of-the-art methods, while having a simple structure.

    Adaptive Classification for Prediction Under a Budget

    Feng Nan, Venkatesh Saligrama
    Comments: arXiv admin note: substantial text overlap with arXiv:1704.07505
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We propose a novel adaptive approximation approach for test-time
    resource-constrained prediction. Given an input instance at test-time, a gating
    function identifies a prediction model for the input among a collection of
    models. Our objective is to minimize overall average cost without sacrificing
    accuracy. We learn gating and prediction models on fully labeled training data
    by means of a bottom-up strategy. Our novel bottom-up method first trains a
    high-accuracy complex model. Then a low-complexity gating and prediction model
    are subsequently learned to adaptively approximate the high-accuracy model in
    regions where low-cost models are capable of making highly accurate
    predictions. We pose an empirical loss minimization problem with cost
    constraints to jointly train gating and prediction models. On a number of
    benchmark datasets our method outperforms state-of-the-art achieving higher
    accuracy for the same cost.

    Tangent Cones to TT Varieties

    Benjamin Kutschan
    Subjects: Optimization and Control (math.OC); Learning (cs.LG); Algebraic Geometry (math.AG); Numerical Analysis (math.NA)

    As already done for the matrix case for example in [Joe Harris, Algebraic
    Geometry – A first course, p.256] we give a parametrization of the Bouligand
    tangent cone of the variety of tensors of bounded TT rank. We discuss how the
    proof generalizes to any binary hierarchical format. The parametrization can be
    rewritten as an orthogonal sum of TT tensors. Its retraction onto the variety
    is particularly easy to compose. We also give an implicit description of the
    tangent cone as the solution of a system of polynomial equations.

    On Residual CNN in text-dependent speaker verification task

    Egor Malykh, Sergey Novoselov, Oleg Kudashev
    Comments: Accepted for Specom 2017
    Subjects: Sound (cs.SD); Learning (cs.LG)

    Deep learning approaches are still not very common in the speaker
    verification field. We investigate the possibility of using deep residual
    convolutional neural network with spectrograms as an input features in the
    text-dependent speaker verification task. Despite the fact that we were not
    able to surpass the baseline system in quality, we achieved quite good
    results for such a new approach, obtaining a 5.23% EER on the RSR2015
    evaluation part. Fusion of the baseline and proposed systems outperformed the
    best individual system by 18% relative.

    Implicit Variational Inference with Kernel Density Ratio Fitting

    Jiaxin Shi, Shengyang Sun, Jun Zhu
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Recent progress in variational inference has paid much attention to the
    flexibility of variational posteriors. Work has been done to use implicit
    distributions, i.e., distributions without tractable likelihoods, as the
    variational posterior. However, existing methods for implicit posteriors still
    face challenges of noisy estimation and can hardly scale to high-dimensional
    latent variable models. In this paper, we present an implicit variational
    inference approach with kernel density ratio fitting that addresses these
    challenges. As far as we know, for the first time implicit variational
    inference is successfully applied to Bayesian neural networks, which shows
    promising results on both regression and classification tasks.

    Coreset Construction via Randomized Matrix Multiplication

    Jiasen Yang, Agniva Chowdhury, Petros Drineas
    Comments: 27 pages, 11 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Coresets are small sets of points that approximate the properties of a larger
    point-set. For example, given a compact set (\mathcal{S} \subseteq
    \mathbb{R}^d), a coreset could be defined as a (weighted) subset of
    (\mathcal{S}) that approximates the sum of squared distances from
    (\mathcal{S}) to every linear subspace of (\mathbb{R}^d). As such, coresets
    can be used as a
    proxy to the full dataset and provide an important technique to speed up
    algorithms for solving problems including principal component analysis, latent
    semantic indexing, etc. In this paper, we provide a structural result that
    connects the construction of such coresets to approximating matrix products.
    This structural result implies a simple, randomized algorithm that constructs
    coresets whose sizes are independent of the number and dimensionality of the
    input points. The expected size of the resulting coresets yields an improvement
    over the state-of-the-art deterministic approach. Finally, we evaluate the
    proposed randomized algorithm on synthetic and real data, and demonstrate its
    effective performance relative to its deterministic counterpart.

    Temporal anomaly detection: calibrating the surprise

    Eyal Gutflaish, Aryeh Kontorovich, Sivan Sabato, Ofer Biller, Oded Sofer
    Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)

    We propose a hybrid approach to temporal anomaly detection in user-database
    access data — or more generally, any kind of subject-object co-occurrence
    data. Our methodology allows identifying anomalies based on a single stationary
    model, instead of requiring a full temporal one, which would be prohibitive in
    our setting. We learn our low-rank stationary model from the high-dimensional
    training data, and then fit a regression model for predicting the expected
    likelihood score of normal access patterns in the future. The disparity between
    the predicted and the observed likelihood scores is used to assess the
    “surprise”. This approach enables calibration of the anomaly score so that
    time-varying normal behavior patterns are not considered anomalous. We provide
    a detailed description of the algorithm, including a convergence analysis, and
    report encouraging empirical results. One of the datasets we tested is new for
    the public domain. It consists of two months’ worth of database access records
    from a live system. This dataset will be made publicly available, and is
    provided in the supplementary material.

    Deep Learning for User Comment Moderation

    John Pavlopoulos, Prodromos Malakasiotis, Ion Androutsopoulos
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Experimenting with a new dataset of 1.6M user comments from a Greek news
    portal and existing datasets of English Wikipedia comments, we show that an RNN
    outperforms the previous state of the art in moderation. A deep,
    classification-specific attention mechanism improves further the overall
    performance of the RNN. We also compare against a CNN and a word-list baseline,
    considering both fully automatic and semi-automatic moderation.

    Conditional CycleGAN for Attribute Guided Face Image Generation

    Yongyi Lu, Yu-Wing Tai, Chi-Keung Tang
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    State-of-the-art techniques in Generative Adversarial Networks (GANs) such as
    cycleGAN are able to learn the mapping of one image domain (X) to another image
    domain (Y) using unpaired image data. We extend the cycleGAN to {\it
    Conditional} cycleGAN such that the mapping from (X) to (Y) is subjected to
    attribute condition (Z). Using face image generation as an application example,
    where (X) is a low resolution face image, (Y) is a high resolution face image,
    and (Z) is a set of attributes related to facial appearance (e.g. gender, hair
    color, smile), we present our method to incorporate (Z) into the network, such
    that the hallucinated high resolution face image (Y’) not only satisfies the
    low resolution constraint inherent in (X), but also the attribute condition
    prescribed by (Z). Using face feature vector extracted from face verification
    network as (Z), we demonstrate the efficacy of our approach on
    identity-preserving face image super-resolution. Our approach is general and
    applicable to high-quality face image generation where specific facial
    attributes can be controlled easily in the automatically generated results.

    Bayesian Unification of Gradient and Bandit-based Learning for Accelerated Global Optimisation

    Ole-Christoffer Granmo
    Comments: 15th IEEE International Conference on Machine Learning and Applications (ICMLA 2016)
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    Bandit based optimisation has a remarkable advantage over gradient based
    approaches due to its global perspective, which eliminates the danger of
    getting stuck at local optima. However, for continuous optimisation problems or
    problems with a large number of actions, bandit based approaches can be
    hindered by slow learning. Gradient based approaches, on the other hand,
    navigate quickly in high-dimensional continuous spaces through local
    optimisation, following the gradient in fine grained steps. Yet, apart from
    being susceptible to local optima, these schemes are less suited for online
    learning due to their reliance on extensive trial-and-error before the optimum
    can be identified. In this paper, we propose a Bayesian approach that unifies
    the above two paradigms in one single framework, with the aim of combining
    their advantages. At the heart of our approach we find a stochastic linear
    approximation of the function to be optimised, where both the gradient and
    values of the function are explicitly captured. This allows us to learn from
    both noisy function and gradient observations, and predict these properties
    across the action space to support optimisation. We further propose an
    accompanying bandit driven exploration scheme that uses Bayesian credible
    bounds to trade off exploration against exploitation. Our empirical results
    demonstrate that by unifying bandit and gradient based learning, one obtains
    consistently improved performance across a wide spectrum of problem
    environments. Furthermore, even when gradient feedback is unavailable, the
    flexibility of our model, including gradient prediction, still allows us to
    outperform competing approaches, although with a smaller margin. Due to the
    pervasiveness of bandit based optimisation, our scheme opens up for improved
    performance both in meta-optimisation and in applications where gradient
    related information is readily available.

    Dimensionality reduction for acoustic vehicle classification with spectral clustering

    Justin Sunu, Allon G. Percus
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)

    Classification of vehicles has broad applications, ranging from traffic flow
    management to military target recognition. We propose a method for automated
    identification of moving vehicles from roadside audio sensors. We use a
    short-time Fourier transform to decompose audio signals, and treat the
    frequency signature at each time window as an individual data point to be
    classified. Using spectral clustering, we then decrease the dimensionality of
    the data sufficiently for K-nearest neighbors to provide accurate vehicle
    identification.
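
    A toy end-to-end version of this pipeline fits in a few lines with numpy
    and scikit-learn; the synthetic two-tone "vehicles", the window length, and
    the embedding dimension below are illustrative stand-ins for the roadside
    audio setup.

    ```python
    import numpy as np
    from sklearn.manifold import SpectralEmbedding
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    t = np.arange(4096)
    signal = np.concatenate([np.sin(2 * np.pi * 0.05 * t),    # "vehicle" 0
                             np.sin(2 * np.pi * 0.15 * t)])   # "vehicle" 1
    frames = signal.reshape(-1, 256)              # non-overlapping windows
    spectra = np.abs(np.fft.rfft(frames, axis=1)) # per-frame frequency signature
    labels = np.repeat([0, 1], len(frames) // 2)

    emb = SpectralEmbedding(n_components=3, random_state=0).fit_transform(spectra)
    knn = KNeighborsClassifier(n_neighbors=5).fit(emb[::2], labels[::2])
    print("held-out accuracy:", knn.score(emb[1::2], labels[1::2]))
    ```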

    Efficient Modeling of Latent Information in Supervised Learning using Gaussian Processes

    Zhenwen Dai, Mauricio A. Álvarez, Neil D. Lawrence
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Often in machine learning, data are collected as a combination of multiple
    conditions, e.g., the voice recordings of multiple persons, each labeled with
    an ID. How could we build a model that captures the latent information related
    to these conditions and generalize to a new one with few data? We present a
    new model called Latent Variable Multiple Output Gaussian Processes (LVMOGP)
    that allows us to jointly model multiple conditions for regression and to
    generalize to a new condition with a few data points at test time. LVMOGP
    infers the
    posteriors of Gaussian processes together with a latent space representing the
    information about different conditions. We derive an efficient variational
    inference method for LVMOGP, whose computational complexity is as low as that
    of sparse Gaussian processes. We show that LVMOGP significantly outperforms
    related Gaussian process methods on various tasks with both synthetic and real
    data.

    Lifelong Generative Modeling

    Jason Ramapuram, Magda Gregorova, Alexandros Kalousis
    Comments: 14 pages, 9 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Lifelong learning is the problem of learning multiple consecutive tasks in an
    online manner and is essential to the development of intelligent machines
    that can adapt to their surroundings. In this work we focus on learning a
    lifelong approach to generative modeling whereby we continuously incorporate
    newly observed distributions into our model representation. We utilize two
    models, aptly named the student and the teacher, in order to aggregate
    information about all past distributions without the preservation of any of the
    past data or previous models. The teacher is utilized as a form of compressed
    memory in order to allow for the student model to learn over the past as well
    as present data. We demonstrate why a naive approach to lifelong generative
    modeling fails and introduce a regularizer with which we demonstrate learning
    across a long range of distributions.

    Global hard thresholding algorithms for joint sparse image representation and denoising

    Reza Borhani, Jeremy Watt, Aggelos Katsaggelos
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Sparse coding of images is traditionally done by cutting them into small
    patches and representing each patch individually over some dictionary given a
    pre-determined number of nonzero coefficients to use for each patch. Lacking
    a way to effectively distribute a total number (or global budget) of nonzero
    coefficients across all patches, current sparse recovery algorithms distribute
    the global budget equally across all patches despite the wide range of
    differences in structural complexity among them. In this work we propose a new
    framework for joint sparse representation and recovery of all image patches
    simultaneously. We also present two novel global hard thresholding algorithms,
    based on the notion of variable splitting, for solving the joint sparse model.
    Experimentation using both synthetic and real data shows effectiveness of the
    proposed framework for sparse image representation and denoising tasks.
    Additionally, time complexity analysis of the proposed algorithms indicates
    high scalability of both algorithms, making them favorable for use on large
    megapixel images.
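
    Independently of the two splitting-based solvers, the global-budget idea
    itself can be illustrated in a few lines: threshold by the magnitude of the
    budget-th largest coefficient across all patches jointly, rather than
    keeping a fixed count per patch. A minimal numpy sketch (ties at the cutoff
    may keep a few extra entries):

    ```python
    import numpy as np

    def global_hard_threshold(coeff_patches, budget):
        """Keep the 'budget' largest-magnitude coefficients across ALL
        patches jointly; zero out the rest."""
        flat = np.abs(np.concatenate([c.ravel() for c in coeff_patches]))
        if budget >= flat.size:
            return coeff_patches
        cutoff = np.partition(flat, -budget)[-budget]
        return [np.where(np.abs(c) >= cutoff, c, 0.0) for c in coeff_patches]

    patches = [np.array([3.0, -0.1, 0.2]), np.array([0.05, -2.5, 0.3])]
    print(global_hard_threshold(patches, budget=3))
    ```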

    PVEs: Position-Velocity Encoders for Unsupervised Learning of Structured State Representations

    Rico Jonschkowski, Roland Hafner, Jonathan Scholz, Martin Riedmiller
    Comments: Submitted to Robotics: Science and Systems (RSS 2017) Workshop New Frontiers for Deep Learning in Robotics this http URL
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We propose position-velocity encoders (PVEs) which learn—without
    supervision—to encode images to positions and velocities of task-relevant
    objects. PVEs encode a single image into a low-dimensional position state and
    compute the velocity state from finite differences in position. In contrast to
    autoencoders, position-velocity encoders are not trained by image
    reconstruction, but by making the position-velocity representation consistent
    with priors about interacting with the physical world. We applied PVEs to
    several simulated control tasks from pixels and achieved promising preliminary
    results.

    Growth-Optimal Portfolio Selection under CVaR Constraints

    Guy Uziel, Ran El-Yaniv
    Subjects: Mathematical Finance (q-fin.MF); Learning (cs.LG)

    Online portfolio selection research has so far focused mainly on minimizing
    regret defined in terms of wealth growth. Practical financial decision making,
    however, is deeply concerned with both wealth and risk. We consider online
    learning of portfolios of stocks whose prices are governed by arbitrary
    (unknown) stationary and ergodic processes, where the goal is to maximize
    wealth while keeping the conditional value at risk (CVaR) below a desired
    threshold. We characterize the asymptotically optimal risk-adjusted
    performance and present an investment strategy whose portfolios are guaranteed
    to achieve the asymptotic optimal solution while fulfilling the desired risk
    constraint. We also numerically demonstrate and validate the viability of our
    method on standard datasets.

    Deep Complex Networks

    Chiheb Trabelsi, Olexa Bilaniuk, Dmitriy Serdyuk, Sandeep Subramanian, João Felipe Santos, Soroush Mehri, Negar Rostamzadeh, Yoshua Bengio, Christopher J Pal
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    At present, the vast majority of building blocks, techniques, and
    architectures for deep learning are based on real-valued operations and
    representations. However, recent work on recurrent neural networks and older
    fundamental theoretical analysis suggests that complex numbers could have a
    richer representational capacity and could also facilitate noise-robust memory
    retrieval mechanisms. Despite their attractive properties and potential for
    opening up entirely new neural architectures, complex-valued deep neural
    networks have been marginalized due to the absence of the building blocks
    required to design such models. In this work, we provide the key atomic
    components for complex-valued deep neural networks and apply them to
    convolutional feed-forward networks. More precisely, we rely on complex
    convolutions and present algorithms for complex batch-normalization, complex
    weight initialization strategies for complex-valued neural nets and we use them
    in experiments with end-to-end training schemes. We demonstrate that such
    complex-valued models are able to achieve comparable or better performance than
    their real-valued counterparts. We test deep complex models on several computer
    vision tasks and on music transcription using the MusicNet dataset where we
    achieve state of the art performance.

    Stochastic Feedback Control of Systems with Unknown Nonlinear Dynamics

    Dan Yu, Mohammadhussein Rafieisakhaei, Suman Chakravorty
    Comments: 7 pages, 7 figures, submitted to 56th IEEE Conference on Decision and Control (CDC), 2017
    Subjects: Systems and Control (cs.SY); Learning (cs.LG)

    This paper studies the stochastic optimal control problem for systems with
    unknown dynamics. First, an open-loop deterministic trajectory optimization
    problem is solved without knowing the explicit form of the dynamical system.
    Next, a Linear Quadratic Gaussian (LQG) controller is designed for the nominal
    trajectory-dependent linearized system, such that under a small noise
    assumption, the actual states remain close to the optimal trajectory. The
    trajectory-dependent linearized system is identified using input-output
    experimental data consisting of the impulse responses of the nominal system. A
    computational example is given to illustrate the performance of the proposed
    approach.

    Online Auctions and Multi-scale Online Learning

    Sébastien Bubeck, Nikhil R. Devanur, Zhiyi Huang, Rad Niazadeh
    Comments: In Proceeding of 18th ACM conference on Economics and Computation (EC 2017)
    Subjects: Computer Science and Game Theory (cs.GT); Data Structures and Algorithms (cs.DS); Learning (cs.LG); Machine Learning (stat.ML)

    We consider revenue maximization in online auctions and pricing. A seller
    sells an identical item in each period to a new buyer, or a new set of buyers.
    For the online posted pricing problem, we show regret bounds that scale with
    the best fixed price, rather than the range of the values. We also show regret
    bounds that are almost scale free, and match the offline sample complexity,
    when comparing to a benchmark that requires a lower bound on the market share.
    These results are obtained by generalizing the classical learning from experts
    and multi-armed bandit problems to their multi-scale versions. In this version,
    the reward of each action is in a different range, and the regret w.r.t. a
    given action scales with its own range, rather than the maximum range.

    Deriving Neural Architectures from Sequence and Graph Kernels

    Tao Lei, Wengong Jin, Regina Barzilay, Tommi Jaakkola
    Comments: to appear at ICML 2017; includes additional discussions
    Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL); Learning (cs.LG)

    The design of neural architectures for structured objects is typically guided
    by experimental insights rather than a formal process. In this work, we appeal
    to kernels over combinatorial structures, such as sequences and graphs, to
    derive appropriate neural operations. We introduce a class of deep recurrent
    neural operations and formally characterize their associated kernel spaces. Our
    recurrent modules compare the input to virtual reference objects (cf. filters
    in CNN) via the kernels. Similar to traditional neural operations, these
    reference objects are parameterized and directly optimized in end-to-end
    training. We empirically evaluate the proposed class of neural architectures on
    standard applications such as language modeling and molecular graph regression,
    achieving state-of-the-art or competitive results across these applications. We
    also draw connections to existing architectures such as LSTMs.

    Deep Descriptor Transforming for Image Co-Localization

    Xiu-Shen Wei, Chen-Lin Zhang, Yao Li, Chen-Wei Xie, Jianxin Wu, Chunhua Shen, Zhi-Hua Zhou
    Comments: Accepted by IJCAI 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Reusable model design becomes desirable with the rapid expansion of machine
    learning applications. In this paper, we focus on the reusability of
    pre-trained deep convolutional models. Specifically, different from treating
    pre-trained models as feature extractors, we reveal more treasures beneath
    convolutional layers, i.e., the convolutional activations could act as a
    detector for the common object in the image co-localization problem. We propose
    a simple but effective method, named Deep Descriptor Transforming (DDT), for
    evaluating the correlations of descriptors and then obtaining the
    category-consistent regions, which can accurately locate the common object in a
    set of images. Empirical studies validate the effectiveness of the proposed DDT
    method. On benchmark image co-localization datasets, DDT consistently
    outperforms existing state-of-the-art methods by a large margin. Moreover, DDT
    also demonstrates good generalization ability for unseen categories and
    robustness for dealing with noisy data.


    Information Theory

    Near Optimal Online Distortion Minimization for Energy Harvesting Nodes

    Ahmed Arafa, Sennur Ulukus
    Comments: To appear in the 2017 IEEE International Symposium on Information Theory
    Subjects: Information Theory (cs.IT)

    We consider online scheduling for an energy harvesting communication system
    where a sensor node collects samples from a Gaussian source and sends them to a
    destination node over a Gaussian channel. The sensor is equipped with a
    finite-sized battery that is recharged by an independent and identically
    distributed (i.i.d.) energy harvesting process over time. The goal is to
    minimize the long term average distortion of the source samples received at the
    destination. We study two problems: the first is when sampling is cost-free,
    and the second is when there is a sampling cost incurred whenever samples are
    collected. We show that fixed fraction policies [Shaviv-Ozgur], in which a
    fixed fraction of the battery state is consumed in each time slot, are
    near-optimal in the sense that they achieve a long term average distortion that
    lies within a constant additive gap from the optimal solution for all energy
    arrivals and battery sizes. For the problem with sampling costs, the
    transmission policy is bursty; the sensor can collect samples and transmit for
    only a portion of the time.
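
    The fixed fraction policy referenced above is simple to simulate: in each
    slot, harvest (clipped by the battery size) and then spend a fixed fraction
    of the current battery state. The sketch below follows that description;
    the fraction q and the arrival process are illustrative.

    ```python
    import numpy as np

    def fixed_fraction_policy(arrivals, battery_cap, q=0.5):
        """Per-slot energy allocations under a fixed fraction policy."""
        b, spent = 0.0, []
        for e in arrivals:
            b = min(b + e, battery_cap)   # harvest, clipped by battery size
            use = q * b                   # spend a fixed fraction of the state
            b -= use
            spent.append(use)
        return np.array(spent)

    rng = np.random.default_rng(0)
    print(fixed_fraction_policy(rng.exponential(1.0, 10), battery_cap=5.0))
    ```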

    Robustness to unknown error in sparse regularization

    Simone Brugiapaglia, Ben Adcock
    Subjects: Information Theory (cs.IT); Numerical Analysis (math.NA)

    Quadratically-constrained basis pursuit has become a popular device in sparse
    regularization; in particular, in the context of compressed sensing. However,
    the majority of theoretical error estimates for this regularizer assume an a
    priori bound on the noise level, which is usually lacking in practice. In this
    paper, we develop stability and robustness estimates which remove this
    assumption. First, we introduce an abstract framework and show that robust
    instance optimality of any decoder in the noise-aware setting implies stability
    and robustness in the noise-blind setting. This is based on certain sup-inf
    constants referred to as quotients, strictly related to the quotient property
    of compressed sensing. We then apply this theory to prove the robustness of
    quadratically-constrained basis pursuit under unknown error in the cases of
    random Gaussian matrices and of random matrices with heavy-tailed rows, such as
    random sampling matrices from bounded orthonormal systems. We illustrate our
    results in several cases of practical importance, including subsampled Fourier
    measurements and recovery of sparse polynomial expansions.
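
    For concreteness, quadratically-constrained basis pursuit is the program
    min ||x||_1 subject to ||Ax - y||_2 <= eta. A small cvxpy sketch is given
    below; the point of the paper is precisely that eta is usually an unknown,
    user-guessed noise bound, and the value used here is arbitrary.

    ```python
    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    m, n, s = 40, 100, 5
    A = rng.standard_normal((m, n)) / np.sqrt(m)   # random Gaussian matrix
    x_true = np.zeros(n)
    x_true[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
    y = A @ x_true + 0.01 * rng.standard_normal(m)

    x = cp.Variable(n)
    prob = cp.Problem(cp.Minimize(cp.norm1(x)),
                      [cp.norm2(A @ x - y) <= 0.1])  # eta: a guess
    prob.solve()
    print("recovery error:", np.linalg.norm(x.value - x_true))
    ```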

    Session-Based Cooperation in Cognitive Radio Networks: A Network-Level Approach

    Haichuan Ding, Chi Zhang, Xuanheng Li, Jianqing Liu, Miao Pan, Yuguang Fang, Shigang Chen
    Subjects: Information Theory (cs.IT)

    In cognitive radio networks (CRNs), secondary users (SUs) can proactively
    obtain spectrum access opportunities by helping with primary users’ (PUs’) data
    transmissions. Currently, such kind of spectrum access is implemented via a
    cooperative communications based link-level frame-based cooperative (LLC)
    approach where individual SUs independently serve as relays for PUs in order to
    gain spectrum access opportunities. Unfortunately, this LLC approach cannot
    fully exploit spectrum access opportunities to enhance the throughput of CRNs
    and fails to motivate PUs to join the spectrum sharing processes. To address
    these challenges, we propose a network-level session-based cooperative (NLC)
    approach where SUs are grouped together to cooperate with PUs session by
    session, instead of frame by frame as what has been done in existing works, for
    spectrum access opportunities of the corresponding group. Thanks to our
    group-based session-by-session cooperating strategy, our NLC approach is able
    to address all those challenges in the LLC approach. To articulate our NLC
    approach, we further develop an NLC scheme under a cognitive capacity
    harvesting network (CCHN) architecture. We formulate the cooperative mechanism
    design as a cross-layer optimization problem with constraints on primary
    session selection, flow routing and link scheduling. To search for solutions to
    the optimization problem, we propose an augmented scheduling index ordering
    based (SIO-based) algorithm to identify maximal independent sets. Through
    extensive simulations, we demonstrate the effectiveness of the proposed NLC
    approach and the superiority of the augmented SIO-based algorithm over the
    traditional method.

    Energy-Efficient Transponder Configuration for FMF-based Elastic Optical Networks

    Mohammad Hadi, Mohammad Reza Pakravan
    Comments: 4 pages, 4 figures. arXiv admin note: text overlap with arXiv:1705.06891
    Subjects: Information Theory (cs.IT)

    We propose an energy-efficient procedure for transponder configuration in
    FMF-based elastic optical networks in which quality of service and physical
    constraints are guaranteed and the joint optimization of transmit optical
    power and temporal, spatial and spectral variables is addressed. We use
    geometric
    convexification techniques to provide convex representations for quality of
    service, transponder power consumption and transponder configuration problem.
    Simulation results show that our convex formulation is considerably faster than
    its mixed-integer nonlinear counterpart and that its ability to optimize
    transmit optical power reduces total transponder power consumption by up to
    32%. We also
    analyze the effect of mode coupling and number of available modes on power
    consumption of different network elements.
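
    Geometric convexification means recasting posynomial power and QoS models
    in log-log form, where they become convex. The toy geometric program below
    shows the mechanics in cvxpy's GP mode; the objective and QoS constraint
    are hypothetical placeholders, not the paper's transponder model.

        # Toy geometric program: minimize a posynomial power model subject to
        # a monomial QoS constraint. All constants here are hypothetical.
        import cvxpy as cp

        p = cp.Variable(pos=True)                 # transmit optical power
        b = cp.Variable(pos=True)                 # spectral share

        total_power = 0.5 * p + 2.0 * b + 1.0     # posynomial power model
        qos = p / b ** 0.5                        # monomial QoS proxy

        problem = cp.Problem(cp.Minimize(total_power),
                             [qos >= 3.0, b >= 0.1])
        problem.solve(gp=True)                    # solved as a convex GP
        print(p.value, b.value, problem.value)

    The gp=True flag makes cvxpy apply the log-log change of variables, which
    is exactly the convexification step the abstract refers to.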

    Rate ((n-1)/n) Systematic MDS Convolutional Codes over (GF(2^m))

    Ángela Barbero, Øyvind Ytrehus
    Subjects: Information Theory (cs.IT)

    A systematic convolutional encoder of rate ((n-1)/n) and maximum degree (D)
    generates a code of free distance at most ({cal D} = D+2) and, at best, a
    column distance profile (CDP) of ([2,3,ldots,{cal D}]). A code is
    Maximum Distance Separable (MDS) if it possesses this CDP. Applied to a
    communication channel over which packets are transmitted sequentially and which
    loses (erases) packets randomly, such a code allows the recovery from any
    pattern of (j) erasures in the first (j) (n)-packet blocks for (j<{cal D}),
    with a delay of at most (j) blocks counting from the first erasure. This paper
    addresses the problem of finding the largest ({cal D}) for which a systematic
    rate ((n-1)/n) code over (GF(2^m)) exists, for given (n) and (m). In
    particular, constructions for rates ((2^m-1)/2^m) and ((2^{m-1}-1)/2^{m-1}) are
    presented which provide optimum values of ({cal D}) equal to 3 and 4,
    respectively. A search algorithm is also developed, which produces new codes
    for ({cal D}) for field sizes (2^m leq 2^{14}). Using a complete search
    version of the algorithm, the maximum value of ({cal D}), and codes that
    achieve it, are determined for all code rates (geq 1/2) and every field size
    (GF(2^m)) for (m leq 5) (and for some rates for (m=6)).

    Symmetry Group of Ordered Hamming Block Space

    Luciano Panek, Nayene Michele Paião Panek
    Subjects: Information Theory (cs.IT)

    Let (P = ({1,2,ldots,n},leq)) be a poset that is a union of disjoint
    chains of the same length and (V=mathbb{F}_q^N) be the space of (N)-tuples
    over the finite field (mathbb{F}_q). Let (V_i = mathbb{F}_q^{k_i}), (1 leq i
    leq n), be a family of finite-dimensional linear spaces such that
    (k_1+k_2+ldots +k_n = N) and let (V = V_1 oplus V_2 oplus ldots oplus V_n)
    be endowed with the poset block metric (d_{(P,pi)}) induced by the poset (P) and
    the partition (pi=(k_1,k_2,ldots,k_n)), encompassing both the
    Niederreiter-Rosenbloom-Tsfasman metric and the error-block metric. In this
    paper, we give a complete description of the group of symmetries of the
    metric space ((V,d_{(P,pi)})), called the ordered Hamming block space. In
    particular, we recover the group of symmetries of the
    Niederreiter-Rosenbloom-Tsfasman space and obtain the group of symmetries
    of the error-block metric space.

    User Selection and Widely Linear Multiuser Precoding for One-dimensional Signalling

    Majid Bavand, Steven D. Blostein
    Subjects: Information Theory (cs.IT)

    Massive deployment of low data rate Internet of Things and e-health devices
    prompts the development of more practical precoding and user selection
    techniques that meet their requirements. Moreover, it is known that when the data
    is real-valued and the observation is complex-valued, widely linear (WL)
    estimation can be employed in lieu of linear estimation to improve the
    performance. With these motivations, in this paper, we study the transmit
    precoding (beamforming) in multiuser multiple-input single-output
    communications systems assuming the transmit signal is one-dimensionally
    modulated and widely linear estimation is performed at the receivers.
    Closed-form solutions for widely linear maximum ratio transmission (MRT), WL
    zero-forcing (ZF), WL minimum mean square error (MMSE), and WL maximum
    signal-to-leakage-and-noise ratio (MSLNR) precoding are obtained. It is shown that
    widely linear processing can potentially double the number of simultaneous
    users compared to the linear processing of one-dimensionally modulated signals.
    Furthermore, to deal with the increasing number of communication devices, a
    user selection algorithm compatible with widely linear processing of
    one-dimensionally modulated signals is proposed. The proposed user selection
    algorithm can double the number of simultaneously selected users compared to
    conventional user selection methods.
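
    The doubling effect has a short linear-algebra explanation: when each user
    detects only the real part of its received signal, the complex M-antenna
    channel acts as a real channel with 2M degrees of freedom, so zero-forcing
    can separate up to 2M real-valued streams. The sketch below illustrates
    this with a generic WL zero-forcing construction; it is not the paper's
    closed-form designs.

        # Widely linear ZF for one-dimensional (real) symbols: 4 antennas can
        # serve 8 users because the real representation has 2M = 8 dimensions.
        import numpy as np

        rng = np.random.default_rng(0)
        M, K = 4, 8
        H = (rng.normal(size=(K, M)) + 1j * rng.normal(size=(K, M))) / np.sqrt(2)

        # User k detects Re(y_k); with x = x_R + j*x_I, the effective real
        # channel row is [Re(h_k), -Im(h_k)] acting on [x_R; x_I].
        H_eff = np.hstack([H.real, -H.imag])      # K x 2M real matrix

        s = rng.choice([-1.0, 1.0], size=K)       # one-dimensional symbols
        x_ri = H_eff.T @ np.linalg.solve(H_eff @ H_eff.T, s)
        x = x_ri[:M] + 1j * x_ri[M:]              # transmitted complex signal

        print(np.allclose((H @ x).real, s))       # True: all K = 2M users served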

    Local Large Deviations, McMillan Theorem for Multitype Galton-Watson Processes

    Kwabena Doku-Amponsah
    Comments: 6 pages
    Subjects: Information Theory (cs.IT)

    In this article we prove a local large deviation principle (LLDP) for the
    critical multitype Galton-Watson process from a spectral potential point of
    view. We define the so-called spectral potential (U_{mathbb{Q}}(cdot,pi))
    for the Galton-Watson process, where (pi) is the normalized eigenvector
    corresponding to the leading eigenvalue (1) of the transition matrix
    (A(cdot,cdot)) defined from the transition kernel ({mathbb{Q}}). We show
    that the Kullback action, or deviation function, (J(pi,nu)) with respect to
    an empirical offspring measure (nu) is the Legendre dual of
    (U_{mathbb{Q}}(cdot,pi)). From the LLDP we deduce a full large deviation
    principle and a weak variant of the classical McMillan Theorem for the
    multitype Galton-Watson process. To be specific, given any empirical
    offspring measure (varpi), we show that the number of critical multitype
    Galton-Watson processes on (n) vertices is approximately
    (e^{nlangle H_{varpi},pi rangle}), where (H_{varpi}) is a suitably defined
    entropy.

    Near-optimal matrix recovery from random linear measurements

    Elad Romanov, Matan Gavish
    Subjects: Information Theory (cs.IT)

    In matrix recovery from random linear measurements, one is interested in
    recovering an unknown (M)-by-(N) matrix (X_0) from (n<MN) measurements
    (y_i=Tr(A_i^T X_0)) where each (A_i) is an (M)-by-(N) measurement matrix with
    i.i.d. random entries, (i=1,ldots,n). We present a novel matrix recovery
    algorithm, based on approximate message passing, which iteratively applies an
    optimal singular value shrinker — a nonconvex nonlinearity tailored
    specifically for matrix estimation. Our algorithm often converges exponentially
    fast, offering a significant speedup over previously suggested matrix recovery
    algorithms, such as iterative solvers for Nuclear Norm Minimization (NNM). It
    is well known that there is a recovery tradeoff between the information content
    of the object (X_0) to be recovered (specifically, its matrix rank (r)) and the
    number of linear measurements (n) from which recovery is to be attempted. The
    precise tradeoff between (r) and (n), beyond which recovery by a given
    algorithm becomes possible, traces the so-called phase transition curve of that
    algorithm in the ((r,n)) plane. The phase transition curve of our algorithm is
    noticeably better than that of NNM. Interestingly, it is close to the
    information-theoretic lower bound for the minimal number of measurements needed
    for matrix recovery, making it not only the fastest algorithm known, but also
    near-optimal in terms of the matrices it successfully recovers.
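
    A simplified cousin of such algorithms is easy to state: alternate a
    gradient step on the measurement residual with a low-rank SVD shrinkage.
    The sketch below uses plain rank-r truncation (a singular value projection
    style iteration); the paper's method is AMP with an optimal shrinker and an
    Onsager correction, which this sketch does not reproduce. Sizes and the
    step size are illustrative.

        # Iterative matrix recovery from y_i = Tr(A_i^T X0): Landweber step
        # plus rank-r SVD truncation. With enough measurements this simple
        # iteration converges; the paper's AMP variant converges much faster.
        import numpy as np

        rng = np.random.default_rng(1)
        M, N, r = 60, 60, 2
        n = 6 * r * (M + N)                       # n < M*N measurements

        X0 = rng.normal(size=(M, r)) @ rng.normal(size=(r, N))
        A = rng.normal(size=(n, M, N)) / np.sqrt(n)
        y = np.einsum('imn,mn->i', A, X0)

        X = np.zeros((M, N))
        for _ in range(150):
            resid = y - np.einsum('imn,mn->i', A, X)
            G = X + 0.75 * np.einsum('i,imn->mn', resid, A)   # gradient step
            U, sv, Vt = np.linalg.svd(G, full_matrices=False)
            X = (U[:, :r] * sv[:r]) @ Vt[:r]      # keep top-r singular values

        print(np.linalg.norm(X - X0) / np.linalg.norm(X0))    # small error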

    Projection Theorems of Divergences and Likelihood Maximization Methods

    Atin Gayen, M. Ashok Kumar
    Comments: 20 pages
    Subjects: Information Theory (cs.IT); Probability (math.PR); Statistics Theory (math.ST)

    Projection theorems of divergences enable us to find the reverse projection
    of a divergence on a specific statistical model as a forward projection of
    the divergence on a different but rather “simpler” statistical model, which,
    in turn, results in solving a system of linear equations. Reverse
    projections of divergences are closely related to various estimation
    methods such as maximum likelihood estimation or its variants in robust
    statistics. We consider
    projection theorems of three parametric families of divergences that are widely
    used in robust statistics, namely the Renyi divergences (or the Cressie-Reed
    Power Divergences), Density Power Divergences, and the Relative alpha-Entropy
    (or the Logarithmic Density Power Divergences). We explore these projection
    theorems from the usual likelihood maximization approach and from the principle
    of sufficiency. In particular, we show the equivalence of solving the
    estimation problems by the projection theorems of the respective divergences
    and by directly solving the corresponding estimating equations.
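
    The simplest classical instance of this equivalence is the Kullback-Leibler
    case: the reverse projection (maximum likelihood) onto an exponential
    family reduces to a moment-matching estimating equation. A small numerical
    sketch, with an illustrative one-parameter family on a finite alphabet (the
    alphabet, statistic, and data below are made up for the example):

        # MLE on a one-parameter exponential family p_theta(x) ~ exp(theta*T(x))
        # over a finite alphabet: solve E_theta[T] = empirical mean of T.
        import numpy as np
        from scipy.optimize import brentq

        alphabet = np.array([0.0, 1.0, 2.0, 3.0])
        T = alphabet                              # sufficient statistic
        data = np.array([0, 1, 1, 2, 3, 1, 0, 2]) # observed samples

        def model_mean(theta):
            w = np.exp(theta * T)
            return (w @ T) / w.sum()

        theta_hat = brentq(lambda t: model_mean(t) - data.mean(), -10, 10)
        print(theta_hat, model_mean(theta_hat), data.mean())   # means agree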

    On shortened and punctured cyclic codes

    Arti Yardi, Ruud Pellikaan
    Subjects: Information Theory (cs.IT)

    Whether the family of cyclic codes is asymptotically good is a
    long-standing open problem in coding theory. It is known that some families
    of cyclic codes, such as BCH codes and Reed-Solomon codes, are
    asymptotically bad; in general, however, the answer to this question is not
    known. A recent result by Nelson and Van Zwam shows that all linear codes
    can be obtained by a sequence of puncturing and/or shortening of a
    collection of asymptotically good codes [Nelson_2015]. In this paper, we
    prove that any linear code can be obtained by a sequence of puncturing
    and/or shortening of some cyclic code. Therefore, the fact that all codes
    can be obtained by shortening and/or puncturing cyclic codes leaves open
    the possibility that cyclic codes are asymptotically good.
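
    The two operations in the statement are elementary on generator matrices,
    which makes the result easy to experiment with. A small GF(2) sketch using
    the [7,4] cyclic Hamming code:

        # Puncturing deletes a coordinate; shortening keeps the codewords that
        # vanish there and then deletes it. Shown on the cyclic [7,4] code.
        import numpy as np

        def cyclic_generator_matrix(g, n):
            """Generator matrix of the length-n cyclic code with generator
            polynomial g (GF(2) coefficients, lowest degree first)."""
            k = n - (len(g) - 1)
            G = np.zeros((k, n), dtype=int)
            for i in range(k):
                G[i, i:i + len(g)] = g
            return G

        G = cyclic_generator_matrix([1, 1, 0, 1], 7)   # g(x) = 1 + x + x^3

        # Puncture at the last coordinate: delete that column.
        G_punct = np.delete(G, 6, axis=1)          # a [6,4] punctured code

        # Shorten at coordinate 0: rows 1..3 already vanish there, so they
        # span the subcode with c_0 = 0; then delete the coordinate.
        G_short = np.delete(G[1:], 0, axis=1)      # a [6,3] shortened code

        print(G_punct.shape, G_short.shape)        # (4, 6) (3, 6)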

    Maximizing Indoor Wireless Coverage Using UAVs Equipped with Directional Antennas

    Hazim Shakhatreh, Abdallah Khreishah
    Comments: 19 pages, 17 figures
    Subjects: Information Theory (cs.IT)

    Unmanned aerial vehicles (UAVs) can be used to provide wireless coverage
    during emergencies, with each UAV serving as an aerial wireless base station
    when the cellular network goes down. They can also be used to supplement the
    ground base station in order to provide better coverage and higher data rates
    for the users. In this paper, we aim to maximize the indoor wireless coverage
    using UAVs equipped with directional antennas. We study the case where the
    UAVs share one channel; thus, to maximize the total indoor wireless coverage,
    we avoid any overlap in their coverage volumes. We present two
    methods to place the UAVs; providing wireless coverage from one building side
    and from two building sides. In the first method, we utilize circle packing
    theory to determine the 3-D locations of the UAVs in a way that the total
    coverage area is maximized. In the second method, we place the UAVs in front of
    two building sides and efficiently arrange the UAVs in alternating
    upside-down arrangements. We show that the upside-down arrangement problem
    can be transformed from 3D to 2D, and based on that we present an efficient
    algorithm to solve it. Our results show that the upside-down arrangement of
    UAVs can improve the maximum total coverage by 100% compared to providing
    wireless coverage from one building side.
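
    A rough feel for the first method: with a given coverage-circle radius
    (fixed by the antenna aperture and the UAV-to-wall distance), packing
    non-overlapping circles hexagonally on the building side already covers a
    large fraction of it. The sketch below is a hedged geometric toy that takes
    the radius as given; it is not the paper's placement procedure.

        # Hexagonal packing of equal non-overlapping coverage circles on a
        # W x H building side; the radius r is assumed fixed by the antenna.
        import math

        def hex_pack_centers(W, H, r):
            centers, row, y = [], 0, r
            while y <= H - r:
                x = r + (r if row % 2 else 0)      # offset alternate rows
                while x <= W - r:
                    centers.append((x, y))
                    x += 2 * r
                y += r * math.sqrt(3)              # hexagonal row spacing
                row += 1
            return centers

        centers = hex_pack_centers(W=60.0, H=40.0, r=5.0)
        covered = len(centers) * math.pi * 5.0 ** 2
        print(len(centers), f"{covered / (60.0 * 40.0):.0%} of the wall covered")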

    The Indoor Mobile Coverage Problem Using UAVs

    Hazim Shakhatreh, Abdallah Khreishah, Issa Khalil
    Comments: 27 pages, 14 figures
    Subjects: Information Theory (cs.IT)

    Unmanned aerial vehicles (UAVs) can be used as aerial wireless base stations
    when cellular networks are not operational due to natural disasters. They can
    also be used to supplement the ground base station in order to provide better
    coverage and higher data rates for the users. Prior studies on UAV-based
    wireless coverage typically consider an Air-to-Ground path loss model, which
    assumes that the users are outdoors and located on a 2D plane. In this paper, we
    propose using UAVs to provide wireless coverage for indoor users inside a
    high-rise building. First, we present realistic Outdoor-Indoor path loss models
    and describe the tradeoff introduced by these models. Then we study the problem
    of efficient placement of a single UAV, where the objective is to minimize the
    total transmit power required to cover the entire high-rise building. The
    formulated problem is non-convex and is generally difficult to solve. To that
    end, we consider three cases of practical interest and provide efficient
    solutions to the formulated problem in these cases. Then we study the
    problem of minimizing the number of UAVs required to provide wireless coverage
    to high-rise buildings and prove that this problem is NP-complete. Due to the
    intractability of the problem, we use clustering to minimize the number of UAVs
    required to cover the indoor users. We demonstrate through simulations that the
    method that clusters the building into regular structures and places the UAVs
    in each cluster requires 80% more UAVs than our clustering algorithm.
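
    The tradeoff these path loss models introduce can be made concrete: moving
    the UAV closer to the building shortens the free-space distance but
    steepens the incidence angle, and the angle-dependent building penetration
    loss grows. The sketch below uses a generic ITU-flavoured model with
    hypothetical constants, not the exact model of the paper.

        # Outdoor-Indoor path loss toy: free-space loss + angle-dependent wall
        # penetration loss + indoor loss. Constants are placeholders.
        import numpy as np

        def path_loss_db(horiz, uav_h, user_h, indoor_d=10.0, f_hz=2.0e9):
            d_out = np.hypot(horiz, uav_h - user_h)         # outdoor distance
            theta = np.arcsin(abs(uav_h - user_h) / d_out)  # incidence angle
            fspl = (20 * np.log10(d_out + indoor_d)
                    + 20 * np.log10(f_hz) - 147.55)         # free-space loss (dB)
            wall = 14 + 15 * (1 - np.cos(theta)) ** 2       # wall loss grows with angle
            return fspl + wall + 0.5 * indoor_d             # plus indoor loss

        for horiz in (10.0, 50.0, 200.0):                   # UAV stand-off distance
            print(horiz, round(path_loss_db(horiz, uav_h=60.0, user_h=15.0), 1))

    Hovering far away nearly removes the wall term but pays in distance; the
    placement problem balances the two.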

    Providing Wireless Coverage to High-rise Buildings Using UAVs

    Hazim Shakhatreh, Abdallah Khreishah, Bo Ji
    Comments: 6 pages, 5 figures
    Subjects: Information Theory (cs.IT)

    Unmanned aerial vehicles (UAVs) can be used as aerial wireless base stations
    when cellular networks go down. Prior studies on UAV-based wireless coverage
    typically consider an Air-to-Ground path loss model, which assumes that the
    users are outdoors and located on a 2D plane. In this paper, we propose
    using a single UAV to provide wireless coverage for indoor users inside a
    high-rise building under disaster situations (such as earthquakes or floods),
    when cellular networks are down. First, we present a realistic Outdoor-Indoor
    path loss model and describe the tradeoff introduced by this model. Then, we
    study the problem of efficient UAV placement, where the objective is to
    minimize the total transmit power required to cover the entire high-rise
    building. The formulated problem is non-convex and is generally difficult to
    solve. To that end, we consider two cases of practical interest and provide
    efficient solutions to the formulated problem in these cases. In the first
    case, we aim to find the minimum transmit power such that an indoor user with
    the maximum path loss can be covered. In the second case, we assume that the
    locations of indoor users are symmetric across the dimensions of each floor.

    Efficient 3D Placement of a UAV Using Particle Swarm Optimization

    Hazim Shakhatreh, Abdallah Khreishah, Ayoub Alsarhan, Issa Khalil, Ahmad Sawalmeh, Noor Shamsiah Othman
    Comments: 6 pages, 7 figures
    Subjects: Information Theory (cs.IT)

    Unmanned aerial vehicles (UAVs) can be used as aerial wireless base stations
    when cellular networks go down. Prior studies on UAV-based wireless coverage
    typically consider an Air-to-Ground path loss model, which assumes that the
    users are outdoors and located on a 2D plane. In this paper, we propose
    using a single UAV to provide wireless coverage for indoor users inside a
    high-rise building under disaster situations (such as earthquakes or floods),
    when cellular networks are down. We assume that the locations of indoor users
    are uniformly distributed on each floor, and we propose a particle swarm
    optimization algorithm to find an efficient 3D placement of a UAV that
    minimizes the total transmit power required to cover the indoor users.
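
    A bare-bones particle swarm optimizer is a few lines of numpy; the sketch
    below searches 3D space for a UAV position, with a placeholder objective
    (sum of squared distances to the users as a crude proxy for required
    power) instead of the paper's path-loss-based objective.

        # Minimal PSO for 3D UAV placement with a placeholder power proxy.
        import numpy as np

        rng = np.random.default_rng(2)
        users = rng.uniform([0, 0, 0], [50, 50, 60], size=(30, 3))

        def cost(p):                               # proxy for transmit power
            return np.sum((users - p) ** 2)

        P, iters = 20, 100                         # particles, iterations
        x = rng.uniform([-50, 0, 0], [100, 50, 120], size=(P, 3))
        v = np.zeros_like(x)
        pbest = x.copy()
        pbest_cost = np.array([cost(p) for p in x])
        g = pbest[pbest_cost.argmin()]

        for _ in range(iters):
            r1, r2 = rng.random((P, 1)), rng.random((P, 1))
            v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
            x = x + v
            c = np.array([cost(p) for p in x])
            better = c < pbest_cost
            pbest[better], pbest_cost[better] = x[better], c[better]
            g = pbest[pbest_cost.argmin()]

        print(g)    # converges near the users' centroid for this proxy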

    On The Continuous Coverage Problem for a Swarm of UAVs

    Hazim Shakhatreh, Abdallah Khreishah, Jacob Chakareski, Haythem Bany Salameh, Issa Khalil
    Comments: 6 pages, 6 figures
    Subjects: Information Theory (cs.IT)

    Unmanned aerial vehicles (UAVs) can be used to provide wireless network and
    remote surveillance coverage for disaster-affected areas. During such a
    situation, the UAVs need to return periodically to a charging station for
    recharging, due to their limited battery capacity. We study the problem of
    minimizing the number of UAVs required for a continuous coverage of a given
    area, given the recharging requirement. We prove that this problem is
    NP-complete. Due to its intractability, we study partitioning the coverage
    graph into cycles that start at the charging station. We first characterize the
    minimum number of UAVs to cover such a cycle based on the charging time, the
    traveling time, and the number of subareas to be covered by the cycle. Based on
    this analysis, we then develop an efficient algorithm, the cycles with limited
    energy algorithm. The straightforward method to continuously cover a given area
    is to split it into N subareas and cover it by N cycles using N additional
    UAVs. Our simulation results examine the importance of critical system
    parameters: the energy capacity of the UAVs, the number of subareas in the
    covered area, and the UAV charging and traveling times. We demonstrate that
    the cycles with limited energy algorithm requires 69%-94% fewer additional
    UAVs than the straightforward method as the energy capacity of the UAVs
    increases, and 67%-71% fewer additional UAVs as the number of subareas
    increases.

    On the Capacity of Fractal Wireless Networks With Direct Social Interactions

    Ying Chen, Rongpeng Li, Zhifeng Zhao, Honggang Zhang
    Subjects: Information Theory (cs.IT)

    The capacity of a fractal wireless network with direct social interactions is
    studied in this paper. Specifically, we mathematically formulate the
    self-similarity of a fractal wireless network by a power-law degree
    distribution ( P(k) ), and we capture the connection feature between two nodes
    with degree ( k_{1} ) and ( k_{2} ) by a joint probability distribution (
    P(k_{1},k_{2}) ). It is proved that if the source node communicates with one
    of its direct contacts randomly, the maximum capacity is consistent with the
    classical result ( Thetaleft(frac{1}{sqrt{nlog n}}right) ) achieved by Gupta
    and Kumar [Gupta2000The]. On the other hand, if two nodes at distance ( d )
    communicate with probability proportional to ( d^{-beta} ), the maximum
    capacity can reach up to ( Thetaleft(frac{1}{log n}right) ), which exhibits
    a remarkable improvement over the well-known result in [Gupta2000The].
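
    The size of the improvement is easy to tabulate: the ratio of the two
    scalings grows like (sqrt{n/log n}), constants ignored.

        # Ratio of Theta(1/log n) to Theta(1/sqrt(n log n)), constants ignored.
        import math

        for n in (10**3, 10**5, 10**7):
            ratio = math.sqrt(n / math.log(n))
            print(n, f"{ratio:.0f}x")              # ~12x, 95x, 788x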

    Optimal Transport Theory for Cell Association in UAV-Enabled Cellular Networks

    Mohammad Mozaffari, Walid Saad, Mehdi Bennis, Merouane Debbah
    Comments: Accepted in IEEE Communications Letters
    Subjects: Information Theory (cs.IT)

    In this paper, a novel framework for delay-optimal cell association in
    unmanned aerial vehicle (UAV)-enabled cellular networks is proposed. In
    particular, to minimize the average network delay under any arbitrary spatial
    distribution of the ground users, the optimal cell partitions of UAVs and
    terrestrial base stations (BSs) are determined. To this end, using the powerful
    mathematical tools of optimal transport theory, the existence of the solution
    to the optimal cell association problem is proved and the solution space is
    completely characterized. The analytical and simulation results show that the
    proposed approach yields substantial improvements in average network delay.
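
    In a discrete setting the same idea is a transportation linear program:
    move the user mass to the cells at minimum total delay cost, with the cell
    capacities as marginals. The sketch below is a hedged toy with Euclidean
    distance standing in for delay and made-up capacities; the paper treats
    continuous user distributions with optimal transport machinery.

        # Cell association as discrete optimal transport, solved as an LP.
        import numpy as np
        from scipy.optimize import linprog

        rng = np.random.default_rng(3)
        users = rng.uniform(0, 1000, size=(40, 2))     # ground user locations
        stations = np.array([[250.0, 500.0], [750.0, 500.0], [500.0, 900.0]])

        cost = np.linalg.norm(users[:, None] - stations[None], axis=2)
        n_u, n_s = cost.shape
        a = np.full(n_u, 1.0 / n_u)                    # user mass
        b = np.array([0.4, 0.4, 0.2])                  # cell capacities

        # Transport plan p_ij >= 0 with row sums a and column sums b.
        A_eq = np.zeros((n_u + n_s, n_u * n_s))
        for i in range(n_u):
            A_eq[i, i * n_s:(i + 1) * n_s] = 1
        for j in range(n_s):
            A_eq[n_u + j, j::n_s] = 1
        res = linprog(cost.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                      bounds=(0, None), method="highs")
        plan = res.x.reshape(n_u, n_s)
        print(plan.argmax(axis=1))                     # induced cell association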

    Control and Energy Management System in Microgrids

    Hajir Pourbabak, Tao Chen, Bowen Zhang, Wencong Su
    Comments: 25 pages, 12 figures, Book Chapter in Clean Energy Microgrids, Editors: Shin’ya Obara, Jorge Morel (2017)
    Subjects: Systems and Control (cs.SY); Information Theory (cs.IT)

    As a cutting-edge technology, microgrids feature intelligent energy
    management systems (EMSs) and sophisticated control, which will dramatically
    change our energy infrastructure. Modern microgrids are a relatively recent
    development with high potential to bring distributed generation, distributed
    energy storage (DES) devices, controllable loads, communication
    infrastructure, and many new technologies into the mainstream. As a more
    controllable and intelligent entity, a microgrid has more growth potential
    than ever before. However, there are still many open questions, such as the
    future business models and economics. What is the
    cost-benefit to the end-user? How should we systematically evaluate the
    potential benefits and costs of control and energy management in a microgrid?

    Maximum Number of Common Zeros of Homogeneous Polynomials over Finite Fields

    Peter Beelen, Mrinmoy Datta, Sudhir R. Ghorpade
    Comments: 15 pages
    Subjects: Algebraic Geometry (math.AG); Information Theory (cs.IT)

    About two decades ago, Tsfasman and Boguslavsky conjectured a formula for the
    maximum number of common zeros that (r) linearly independent homogeneous
    polynomials of degree (d) in (m+1) variables with coefficients in a finite
    field with (q) elements can have in the corresponding (m)-dimensional
    projective space. Recently, it has been shown by Datta and Ghorpade that this
    conjecture is valid if (r) is at most (m+1) and can be invalid otherwise.
    Moreover, a new conjecture was proposed for many values of (r) beyond (m+1). In
    this paper, we prove that this new conjecture holds true for several values of
    (r). In particular, this settles the new conjecture completely when (d=3). Our
    result also includes the positive result of Datta and Ghorpade as a special
    case. Further, we determine the maximum number of zeros in certain cases not
    covered by the earlier conjectures and results, namely, the case of (d=q-1) and
    of (d=q). All these results are directly applicable to the determination of the
    maximum number of points on sections of Veronese varieties by linear
    subvarieties of a fixed dimension, and also the determination of generalized
    Hamming weights of projective Reed-Muller codes.

    Measurement uncertainty relations for position and momentum: Relative entropy formulation

    Alberto Barchielli, Matteo Gregoratti, Alessandro Toigo
    Comments: 33 pages
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Mathematical Physics (math-ph)

    Heisenberg’s uncertainty principle has recently led to general measurement
    uncertainty relations for quantum systems: incompatible observables can be
    measured jointly or in sequence only with some unavoidable approximation, which
    can be quantified in various ways. The relative entropy is the natural
    theoretical quantifier of the information loss when a ‘true’ probability
    distribution is replaced by an approximating one. In this paper, we provide a
    lower bound for the amount of information that is lost by replacing the
    distributions of the sharp position and momentum observables, as they could be
    obtained with two separate experiments, by the marginals of any smeared joint
    measurement. The bound is obtained by introducing an entropic error function,
    and optimizing it over a suitable class of covariant approximate joint
    measurements. We fully exploit two cases of target observables: (1)
    (n)-dimensional position and momentum vectors; (2) two components of position
    and momentum along different directions. In (1), we connect the quantum bound
    to the dimension (n); in (2), going from parallel to orthogonal directions, we
    show the transition from highly incompatible observables to compatible ones.
    For simplicity, we develop the theory only for Gaussian states and
    measurements.
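
    For intuition on the entropic error, the Gaussian case has a closed form:
    smearing a sharp position distribution N(0, v0) into N(0, v0 + v1) loses a
    computable amount of relative entropy. The sketch below is only a
    classical illustration of the quantifier, not the paper's quantum bound.

        # Relative entropy (nats) between a sharp zero-mean Gaussian and its
        # smeared version: D(N(0,v0) || N(0,v1)) = 0.5*(log(v1/v0) + v0/v1 - 1).
        import math

        def kl_gauss(v_true, v_approx):
            return 0.5 * (math.log(v_approx / v_true) + v_true / v_approx - 1.0)

        v_sharp = 1.0
        for v_noise in (0.1, 1.0, 10.0):           # increasing smearing
            print(v_noise, round(kl_gauss(v_sharp, v_sharp + v_noise), 4))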



