
    arXiv Paper Daily: Wed, 22 Mar 2017

    Posted by 我爱机器学习 (52ml.net) on 2017-03-22 00:00:00

    Neural and Evolutionary Computing

    An Accelerated Analog Neuromorphic Hardware System Emulating NMDA- and Calcium-Based Non-Linear Dendrites

    Johannes Schemmel, Laura Kriener, Paul Müller, Karlheinz Meier
    Comments: Accepted at IJCNN 2017
    Subjects: Neural and Evolutionary Computing (cs.NE); Emerging Technologies (cs.ET)

    This paper presents an extension of the BrainScaleS accelerated analog
    neuromorphic hardware model. The scalable neuromorphic architecture is extended
    by the support for multi-compartment models and non-linear dendrites. These
    features are part of a 65 nm prototype ASIC. It allows the emulation of
    different spike types observed in cortical pyramidal neurons: NMDA
    plateau potentials, calcium and sodium spikes. By replicating some of the
    structures of these cells, they can be configured to perform coincidence
    detection within a single neuron. Built-in plasticity mechanisms can modify not
    only the synaptic weights, but also the dendritic synaptic composition to
    efficiently train large multi-compartment neurons. Transistor-level simulations
    demonstrate the functionality of the analog implementation and illustrate
    analogies to biological measurements.

    Evolving Parsimonious Networks by Mixing Activation Functions

    Alexander Hagg, Maximilian Mensing, Alexander Asteroth
    Subjects: Neural and Evolutionary Computing (cs.NE)

    Neuroevolution methods evolve the weights of a neural network, and in some
    cases the topology, but little work has been done to analyze the effect of
    evolving the activation functions of individual nodes on network size, which is
    important when training networks with a small number of samples. In this work
    we extend the neuroevolution algorithm NEAT to evolve the activation function
    of neurons in addition to the topology and weights of the network. The size and
    performance of networks produced using NEAT with uniform activation in all
    nodes, or homogeneous networks, are compared to networks which contain a mixture
    of activation functions, or heterogeneous networks. For a number of regression
    and classification benchmarks it is shown that, (1) qualitatively different
    activation functions lead to different results in homogeneous networks, (2) the
    heterogeneous version of NEAT is able to select well performing activation
    functions, (3) producing heterogeneous networks that are significantly smaller
    than homogeneous networks.
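
    As a rough illustration of evolving per-node activation functions alongside topology and weights, the Python sketch below shows one possible mutation operator. It is not the authors' NEAT extension: the activation pool, the `mutate_activation` helper, and the per-node mutation rate are all assumptions made for the example.

```python
import math
import random

# Hypothetical pool of activation functions; the paper's actual set may differ.
ACTIVATIONS = {
    "sigmoid": lambda x: 0.5 * (1.0 + math.tanh(0.5 * x)),  # numerically stable form
    "tanh": math.tanh,
    "relu": lambda x: max(0.0, x),
    "identity": lambda x: x,
}

def mutate_activation(node_activations, rate, rng):
    """Return a copy in which each node's activation name is resampled with probability `rate`."""
    mutated = dict(node_activations)
    for node_id in mutated:
        if rng.random() < rate:
            mutated[node_id] = rng.choice(sorted(ACTIVATIONS))
    return mutated

rng = random.Random(0)
homogeneous = {0: "sigmoid", 1: "sigmoid", 2: "sigmoid"}
heterogeneous = mutate_activation(homogeneous, rate=1.0, rng=rng)
unchanged = mutate_activation(homogeneous, rate=0.0, rng=rng)
```

    A homogeneous network in this picture is simply a genome whose nodes all map to the same name, so the same operator covers both cases compared in the abstract.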

    Ensemble representation learning: an analysis of fitness and survival for wrapper-based genetic programming methods

    William La Cava, Jason H. Moore
    Comments: Genetic and Evolutionary Computation Conference (GECCO) 2017, Berlin, Germany
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)

    Recently we proposed a general, ensemble-based feature engineering wrapper
    (FEW) that was paired with a number of machine learning methods to solve
    regression problems. Here, we adapt FEW for supervised classification and
    perform a thorough analysis of fitness and survival methods within this
    framework. Our tests demonstrate that two fitness metrics, one introduced as an
    adaptation of the silhouette score, outperform the more commonly used Fisher
    criterion. We analyze survival methods and demonstrate that ε-lexicase
    survival works best across our test problems, followed by random survival which
    outperforms both tournament and deterministic crowding. We conduct
    hyper-parameter optimization for several classification methods using a large
    set of problems to benchmark the ability of FEW to improve data
    representations. The results show that FEW can improve the best classifier
    performance on several problems. We show that FEW generates readable and
    meaningful features for a biomedical problem with different ML pairings.
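
    For readers unfamiliar with ε-lexicase, the sketch below shows a single selection step in Python: training cases are visited in random order, and at each case only the individuals within ε of the best error on that case are kept. This is a minimal sketch rather than the FEW implementation; the `epsilon_lexicase_select` name and the fixed, user-supplied `epsilon` are assumptions, and the abstract applies the idea to survival rather than to a single pick.

```python
import random

def epsilon_lexicase_select(errors, epsilon, rng):
    """Pick one individual index.

    errors[i][j] is the error of individual i on training case j.
    """
    candidates = list(range(len(errors)))
    cases = list(range(len(errors[0])))
    rng.shuffle(cases)  # a fresh random case ordering for every selection
    for case in cases:
        best = min(errors[i][case] for i in candidates)
        # Keep only individuals that are within epsilon of the best on this case.
        candidates = [i for i in candidates if errors[i][case] <= best + epsilon]
        if len(candidates) == 1:
            break
    return rng.choice(candidates)

rng = random.Random(1)
errors = [[0.0, 0.0], [1.0, 1.0], [0.05, 0.0]]
picks = {epsilon_lexicase_select(errors, 0.1, rng) for _ in range(50)}
strict_pick = epsilon_lexicase_select([[0.0, 0.0], [1.0, 1.0]], 0.0, rng)
```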

    One-Shot Imitation Learning

    Yan Duan, Marcin Andrychowicz, Bradly Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, Pieter Abbeel, Wojciech Zaremba
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)

    Imitation learning has been commonly applied to solve different tasks in
    isolation. This usually requires either careful feature engineering, or a
    significant number of samples. This is far from what we desire: ideally, robots
    should be able to learn from very few demonstrations of any given task, and
    instantly generalize to new situations of the same task, without requiring
    task-specific engineering. In this paper, we propose a meta-learning framework
    for achieving such capability, which we call one-shot imitation learning.

    Specifically, we consider the setting where there is a very large set of
    tasks, and each task has many instantiations. For example, a task could be to
    stack all blocks on a table into a single tower, another task could be to place
    all blocks on a table into two-block towers, etc. In each case, different
    instances of the task would consist of different sets of blocks with different
    initial states. At training time, our algorithm is presented with pairs of
    demonstrations for a subset of all tasks. A neural net is trained that takes as
    input one demonstration and the current state (which initially is the initial
    state of the other demonstration of the pair), and outputs an action with the
    goal that the resulting sequence of states and actions matches as closely as
    possible with the second demonstration. At test time, a demonstration of a
    single instance of a new task is presented, and the neural net is expected to
    perform well on new instances of this new task. The use of soft attention
    allows the model to generalize to conditions and tasks unseen in the training
    data. We anticipate that by training this model on a much greater variety of
    tasks and settings, we will obtain a general system that can turn any
    demonstrations into robust policies that can accomplish an overwhelming variety
    of tasks.

    Videos available at this https URL

    Dance Dance Convolution

    Chris Donahue, Zachary C. Lipton, Julian McAuley
    Subjects: Learning (cs.LG); Multimedia (cs.MM); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Machine Learning (stat.ML)

    Dance Dance Revolution (DDR) is a popular rhythm-based video game. Players
    perform steps on a dance platform in synchronization with music as directed by
    on-screen step charts. While many step charts are available in standardized
    packs, users may grow tired of existing charts, or wish to dance to a song for
    which no chart exists. We introduce the task of learning to choreograph. Given
    a raw audio track, the goal is to produce a new step chart. This task
    decomposes naturally into two subtasks: deciding when to place steps and
    deciding which steps to select. For the step placement task, we combine
    recurrent and convolutional neural networks to ingest spectrograms of low-level
    audio features to predict steps, conditioned on chart difficulty. For step
    selection, we present a conditional LSTM generative model that substantially
    outperforms n-gram and fixed-window approaches.


    Computer Vision and Pattern Recognition

    Pop-up SLAM: Semantic Monocular Plane SLAM for Low-texture Environments

    Shichao Yang, Yu Song, Michael Kaess, Sebastian Scherer
    Comments: International Conference on Intelligent Robots and Systems (IROS) 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

    Existing simultaneous localization and mapping (SLAM) algorithms are not
    robust in challenging low-texture environments because there are only few
    salient features. The resulting sparse or semi-dense map also conveys little
    information for motion planning. Though some works utilize plane or scene layout
    for dense map regularization, they require decent state estimation from other
    sources. In this paper, we propose real-time monocular plane SLAM to
    demonstrate that scene understanding could improve both state estimation and
    dense mapping especially in low-texture environments. The plane measurements
    come from a pop-up 3D plane model applied to each single image. We also combine
    planes with point based SLAM to improve robustness. On a public TUM dataset,
    our algorithm generates a dense semantic 3D model with pixel depth error of 6.2
    cm while existing SLAM algorithms fail. On a 60 m long dataset with loops, our
    method creates a much better 3D model with state estimation error of 0.67%.

    How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks)

    Adrian Bulat, Georgios Tzimiropoulos
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper investigates how far a very deep neural network is from attaining
    close to saturating performance on existing 2D and 3D face alignment datasets.
    To this end, we make the following five contributions: (a) we construct, for
    the first time, a very strong baseline by combining a state-of-the-art
    architecture for landmark localization with a state-of-the-art residual block,
    train it on a very large yet synthetically expanded 2D facial landmark dataset
    and finally evaluate it on all other 2D facial landmark datasets. (b) We create
    a network, guided by 2D landmarks, which converts 2D landmark annotations to 3D
    and unifies all existing datasets, leading to the creation of LS3D-W, the
    largest and most challenging 3D facial landmark dataset to date (~230,000
    images). (c) Following that, we train a neural network for 3D face alignment
    and evaluate it on the newly introduced LS3D-W. (d) We further look into the
    effect of all “traditional” factors affecting face alignment performance like
    large pose, initialization and resolution, and introduce a “new” one, namely
    the size of the network. (e) We show that both 2D and 3D face alignment
    networks achieve performance of remarkable accuracy which is probably close to
    saturating the datasets used. Demo code and pre-trained models can be
    downloaded from this http URL

    License Plate Detection and Recognition Using Deeply Learned Convolutional Neural Networks

    Syed Zain Masood, Guang Shu, Afshin Dehghan, Enrique G. Ortiz
    Comments: 10 pages, 4 figures, 2 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This work details Sighthound's fully automated license plate detection and
    recognition system. The core technology of the system is built using a sequence
    of deep Convolutional Neural Networks (CNNs) interlaced with accurate and
    efficient algorithms. The CNNs are trained and fine-tuned so that they are
    robust under different conditions (e.g. variations in pose, lighting,
    occlusion, etc.) and can work across a variety of license plate templates (e.g.
    sizes, backgrounds, fonts, etc.). For quantitative analysis, we show that our
    system outperforms the leading license plate detection and recognition
    technology, i.e. ALPR, on several benchmarks. Our system is available to
    developers through the Sighthound Cloud API at
    this https URL

    Robust classification of different fingerprint copies with deep neural networks for database penetration rate reduction

    Daniel Peralta, Isaac Triguero, Salvador García, Yvan Saeys, Jose M. Benitez, Francisco Herrera
    Comments: Preprint submitted to Pattern Recognition
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The growth of fingerprint databases creates a need for strategies to reduce
    the identification time. Fingerprint classification reduces the search
    penetration rate by grouping the fingerprints into several classes. Typically,
    features describing the visual patterns of a fingerprint are extracted and fed
    to a classifier. The extraction can be time-consuming and error-prone,
    especially for fingerprints whose visual classification is dubious, and often
    includes a criterion to reject ambiguous fingerprints. In this paper, we
    propose to improve on this manually designed process by using deep neural
    networks, which extract implicit features directly from the images and perform
    the classification within a single learning process. An extensive experimental
    study assesses that convolutional neural networks outperform all other tested
    approaches by achieving a very high accuracy with no rejection. Moreover,
    multiple copies of the same fingerprint are consistently classified. The
    runtime of convolutional networks is also lower than that of combining feature
    extraction procedures with classification algorithms.

    ZM-Net: Real-time Zero-shot Image Manipulation Network

    Hao Wang, Xiaodan Liang, Hao Zhang, Dit-Yan Yeung, Eric P. Xing
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Learning (cs.LG); Machine Learning (stat.ML)

    Many problems in image processing and computer vision (e.g. colorization,
    style transfer) can be posed as ‘manipulating’ an input image into a
    corresponding output image given a user-specified guiding signal. A holy-grail
    solution towards generic image manipulation should be able to efficiently alter
    an input image with any personalized signals (even signals unseen during
    training), such as diverse paintings and arbitrary descriptive attributes.
    However, existing methods are either inefficient to simultaneously process
    multiple signals (let alone generalize to unseen signals), or unable to handle
    signals from other modalities. In this paper, we make the first attempt to
    address the zero-shot image manipulation task. We cast this problem as
    manipulating an input image according to a parametric model whose key
    parameters can be conditionally generated from any guiding signal (even unseen
    ones). To this end, we propose the Zero-shot Manipulation Net (ZM-Net), a
    fully-differentiable architecture that jointly optimizes an
    image-transformation network (TNet) and a parameter network (PNet). The PNet
    learns to generate key transformation parameters for the TNet given any guiding
    signal while the TNet performs fast zero-shot image manipulation according to
    both signal-dependent parameters from the PNet and signal-invariant parameters
    from the TNet itself. Extensive experiments show that our ZM-Net can perform
    high-quality image manipulation conditioned on different forms of guiding
    signals (e.g. style images and attributes) in real-time (tens of milliseconds
    per image) even for unseen signals. Moreover, a large-scale style dataset with
    over 20,000 style images is also constructed to promote further research.

    Improving Person Re-identification by Attribute and Identity Learning

    Yutian Lin, Liang Zheng, Zhedong Zheng, Yu Wu, Yi Yang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Person re-identification (re-ID) and attribute recognition share a common
    target, the description of pedestrians. Their difference lies in the
    granularity. Attribute recognition focuses on local aspects of a person while
    person re-ID usually extracts global representations. Considering their
    similarity and difference, this paper proposes a very simple convolutional
    neural network (CNN) that learns a re-ID embedding and predicts the pedestrian
    attributes simultaneously. This multi-task method integrates an ID
    classification loss and a number of attribute classification losses, and
    back-propagates the weighted sum of the individual losses.
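
    The weighted sum of losses described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the single `id_weight` parameter, the averaging over attribute losses, and the function names are all hypothetical.

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the true label under predicted probabilities."""
    return -math.log(probs[label])

def multitask_loss(id_probs, id_label, attr_probs, attr_labels, id_weight):
    """Weighted sum of one ID loss and the mean of several attribute losses."""
    id_loss = cross_entropy(id_probs, id_label)
    attr_losses = [cross_entropy(p, y) for p, y in zip(attr_probs, attr_labels)]
    attr_loss = sum(attr_losses) / len(attr_losses)
    # Assumed weighting: id_weight on identity, the remainder on attributes.
    return id_weight * id_loss + (1.0 - id_weight) * attr_loss

# Two identities and two binary attributes, all predicted with probability 0.5.
loss = multitask_loss(
    id_probs=[0.5, 0.5],
    id_label=0,
    attr_probs=[[0.5, 0.5], [0.5, 0.5]],
    attr_labels=[1, 0],
    id_weight=0.7,
)
```

    In a real network the returned scalar would be the quantity that is back-propagated through the shared CNN.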

    Albeit simple, we demonstrate on two pedestrian benchmarks that by learning a
    more discriminative representation, our method significantly improves the re-ID
    baseline and is scalable on large galleries. We report competitive re-ID
    performance compared with the state-of-the-art methods on the two datasets.

    GP-GAN: Towards Realistic High-Resolution Image Blending

    Huikai Wu, Shuai Zheng, Junge Zhang, Kaiqi Huang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recent advances in generative adversarial networks (GANs) have shown
    promising potential in conditional image generation. However, how to generate
    high-resolution images remains an open problem. In this paper, we aim at
    generating high-resolution well-blended images given composited copy-and-paste
    ones, i.e. realistic high-resolution image blending. To achieve this goal, we
    propose Gaussian-Poisson GAN (GP-GAN), a framework that combines the strengths
    of classical gradient-based approaches and GANs, which is the first work that
    explores the capability of GANs in the high-resolution image blending task, to the
    best of our knowledge. In particular, we propose the Gaussian-Poisson Equation to
    formulate the high-resolution image blending problem, which is a joint
    optimisation constrained by the gradient and colour information. Gradient
    filters can obtain gradient information. For generating the colour information,
    we propose Blending GAN to learn the mapping between the composited image and
    the well-blended one. Compared to the alternative methods, our approach can
    deliver high-resolution, realistic images with fewer bleedings and unpleasant
    artefacts. Experiments confirm that our approach achieves the state-of-the-art
    performance on the Transient Attributes dataset. A user study on Amazon Mechanical
    Turk finds that the majority of workers are in favour of the proposed approach.

    Proposal Flow: Semantic Correspondences from Object Proposals

    Bumsub Ham, Minsu Cho, Cordelia Schmid, Jean Ponce
    Comments: arXiv admin note: text overlap with arXiv:1511.05065
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Finding image correspondences remains a challenging problem in the presence
    of intra-class variations and large changes in scene layout. Semantic flow
    methods are designed to handle images depicting different instances of the same
    object or scene category. We introduce a novel approach to semantic flow,
    dubbed proposal flow, that establishes reliable correspondences using object
    proposals. Unlike prevailing semantic flow approaches that operate on pixels or
    regularly sampled local regions, proposal flow benefits from the
    characteristics of modern object proposals, that exhibit high repeatability at
    multiple scales, and can take advantage of both local and geometric consistency
    constraints among proposals. We also show that the corresponding sparse
    proposal flow can effectively be transformed into a conventional dense flow
    field. We introduce two new challenging datasets that can be used to evaluate
    both general semantic flow techniques and region-based approaches such as
    proposal flow. We use these benchmarks to compare different matching
    algorithms, object proposals, and region features within proposal flow, to the
    state of the art in semantic flow. This comparison, along with experiments on
    standard datasets, demonstrates that proposal flow significantly outperforms
    existing semantic flow methods in various settings.

    Deep generative-contrastive networks for facial expression recognition

    Youngsung Kim, ByungIn Yoo, Youngjun Kwak, Changkyu Choi, Junmo Kim
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    As the expressive depth of an emotional face differs with individuals,
    expressions, or situations, recognizing an expression using a single facial
    image at a moment is difficult. One of the approaches to alleviate this
    difficulty is using a video-based method that utilizes multiple frames to
    extract temporal information between facial expression images. In this paper,
    we attempt to utilize a generative image that is estimated based on a given
    single image. Then, we propose to utilize a contrastive representation that
    explains an expression difference for discriminative purposes. The contrastive
    representation is calculated at the embedding layer of a deep network by
    comparing a single given image with a reference sample generated by a deep
    encoder-decoder network. Consequently, we deploy deep neural networks that
    embed a combination of a generative model, a contrastive model, and a
    discriminative model. In our proposed networks, we attempt to disentangle a
    facial expressive factor in two steps including learning of a reference
    generator network and learning of a contrastive encoder network. We conducted
    extensive experiments on three publicly available face expression databases
    (CK+, MMI, and Oulu-CASIA) that have been widely adopted in the recent
    literature. The proposed method outperforms the known state-of-the-art methods
    in terms of recognition accuracy.

    Knowledge distillation using unlabeled mismatched images

    Mandar Kulkarni, Kalpesh Patil, Shirish Karande
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    Current approaches for Knowledge Distillation (KD) either directly use
    training data or sample from the training data distribution. In this paper, we
    demonstrate the effectiveness of ‘mismatched’ unlabeled stimulus to perform KD for
    image classification networks. For illustration, we consider scenarios where
    there is a complete absence of training data, or mismatched stimulus has to be
    used for augmenting a small amount of training data. We demonstrate that
    stimulus complexity is a key factor for distillation’s good performance. Our
    examples include use of various datasets for stimulating MNIST and CIFAR
    teachers.
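
    The reason unlabeled, mismatched images can serve as stimulus is that a distillation loss compares the student only to the teacher's outputs, never to ground-truth labels. The Python sketch below shows a common soft-target form of that loss; the temperature parameter and the function names are assumptions for illustration and are not taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student's soft predictions against the teacher's soft targets."""
    targets = softmax(teacher_logits, temperature)
    predictions = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(targets, predictions))

# Logits for one unlabeled stimulus image; no class label appears anywhere.
teacher = [2.0, 0.0, -1.0]
matched = distillation_loss(teacher, [2.0, 0.0, -1.0])
mismatched = distillation_loss(teacher, [-1.0, 0.0, 2.0])
```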

    High-Resolution Breast Cancer Screening with Multi-View Deep Convolutional Neural Networks

    Krzysztof J. Geras, Stacey Wolfson, S. Gene Kim, Linda Moy, Kyunghyun Cho
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    Recent advances in deep learning for object recognition in natural images have
    prompted a surge of interest in applying a similar set of techniques to medical
    images. Most of the initial attempts largely focused on replacing the input to
    such a deep convolutional neural network from a natural image to a medical
    image. This, however, does not take into consideration the fundamental
    differences between these two types of data. More specifically, detection or
    recognition of an anomaly in medical images depends significantly on fine
    details, unlike object recognition in natural images where coarser, more global
    structures matter more. This difference makes it inadequate to use the existing
    deep convolutional neural network architectures, which were developed for
    natural images, because they rely on heavily downsampling an image to a much
    lower resolution to reduce the memory requirements. This hides details
    necessary to make accurate predictions for medical images. Furthermore, a
    single exam in medical imaging often comes with a set of different views which
    must be seamlessly fused in order to reach a correct conclusion. In our work,
    we propose to use a multi-view deep convolutional neural network that handles a
    set of more than one high-resolution medical image. We evaluate this network on
    large-scale mammography-based breast cancer screening (BI-RADS prediction)
    using 103 thousand images. We focus on investigating the impact of training set
    sizes and image sizes on the prediction accuracy. Our results highlight that
    performance clearly increases with the size of training set, and that the best
    performance can only be achieved using the images in the original resolution.
    This suggests the future direction of medical imaging research using deep
    neural networks is to utilize as much data as possible with the least amount of
    potentially harmful preprocessing.

    Encouraging LSTMs to Anticipate Actions Very Early

    Mohammad Sadegh Aliakbarian, Fatemehsadat Saleh, Mathieu Salzmann, Basura Fernando, Lars Petersson, Lars Andersson
    Comments: 13 Pages, 7 Figures, 9 Tables. arXiv admin note: substantial text overlap with arXiv:1611.05520
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In contrast to the widely studied problem of recognizing an action given a
    complete sequence, action anticipation aims to identify the action from only
    partially available videos. As such, it is key to the success of
    computer vision applications that must react as early as possible, such as
    autonomous navigation. In this paper, we propose a new action anticipation
    method that achieves high prediction accuracy even in the presence of a very
    small percentage of a video sequence. To this end, we develop a multi-stage
    LSTM architecture that leverages context- and action-aware features, and
    introduce a novel loss function that encourages the model to predict the
    correct class as early as possible. Our experiments on standard benchmark
    datasets evidence the benefits of our approach; we outperform the
    state-of-the-art action anticipation methods for early prediction by a relative
    increase in accuracy of 22.0% on JHMDB-21, 14.0% on UT-Interaction and 49.9% on
    UCF-101.

    Recurrent Topic-Transition GAN for Visual Paragraph Generation

    Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric P. Xing
    Comments: 10 pages, 6 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

    A natural image usually conveys rich semantic content and can be viewed from
    different angles. Existing image description methods are largely restricted by
    small sets of biased visual paragraph annotations, and fail to cover rich
    underlying semantics. In this paper, we investigate a semi-supervised paragraph
    generative framework that is able to synthesize diverse and semantically
    coherent paragraph descriptions by reasoning over local semantic regions and
    exploiting linguistic knowledge. The proposed Recurrent Topic-Transition
    Generative Adversarial Network (RTT-GAN) builds an adversarial framework
    between a structured paragraph generator and multi-level paragraph
    discriminators. The paragraph generator generates sentences recurrently by
    incorporating region-based visual and language attention mechanisms at each
    step. The quality of generated paragraph sentences is assessed by multi-level
    adversarial discriminators from two aspects, namely, plausibility at sentence
    level and topic-transition coherence at paragraph level. The joint adversarial
    training of RTT-GAN drives the model to generate realistic paragraphs with
    smooth logical transition between sentence topics. Extensive quantitative
    experiments on image and video paragraph datasets demonstrate the effectiveness
    of our RTT-GAN in both supervised and semi-supervised settings. Qualitative
    results on telling diverse stories for an image also verify the
    interpretability of RTT-GAN.

    Spatio-Temporal Facial Expression Recognition Using Convolutional Neural Networks and Conditional Random Fields

    Behzad Hasani, Mohammad H. Mahoor
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Automated Facial Expression Recognition (FER) has been a challenging task for
    decades. Many of the existing works use hand-crafted features such as LBP, HOG,
    LPQ, and Histogram of Optical Flow (HOF) combined with classifiers such as
    Support Vector Machines for expression recognition. These methods often require
    rigorous hyperparameter tuning to achieve good results. Recently Deep Neural
    Networks (DNN) have been shown to outperform traditional methods in visual object
    recognition. In this paper, we propose a two-part network consisting of a
    DNN-based architecture followed by a Conditional Random Field (CRF) module for
    facial expression recognition in videos. The first part captures the spatial
    relation within facial images using convolutional layers followed by three
    Inception-ResNet modules and two fully-connected layers. To capture the
    temporal relation between the image frames, we use linear chain CRF in the
    second part of our network. We evaluate our proposed network on three publicly
    available databases, viz. CK+, MMI, and FERA. Experiments are performed in
    subject-independent and cross-database manners. Our experimental results show
    that cascading the deep network architecture with the CRF module considerably
    increases the recognition of facial expressions in videos and in particular it
    outperforms the state-of-the-art methods in the cross-database experiments and
    yields comparable results in the subject-independent experiments.

    SORT: Second-Order Response Transform for Visual Recognition

    Yan Wang, Lingxi Xie, Chenxi Liu, Ya Zhang, Wenjun Zhang, Alan Yuille
    Comments: Submitted to ICCV 2017 (10 pages, 4 figures)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we reveal the importance and benefits of introducing
    second-order operations into deep neural networks. We propose a novel approach
    named Second-Order Response Transform (SORT), which appends an element-wise
    product transform to the linear sum of a two-branch network module. A direct
    advantage of SORT is to facilitate cross-branch response propagation, so that
    each branch can update its weights based on the current status of the other
    branch. Moreover, SORT augments the family of transform operations and
    increases the nonlinearity of the network, making it possible to learn flexible
    functions to fit the complicated distribution of feature space. SORT can be
    applied to a wide range of network architectures, including a branched variant
    of a chain-styled network and a residual network, with very lightweight
    modifications. We observe consistent accuracy gain on both small (CIFAR10,
    CIFAR100 and SVHN) and big (ILSVRC2012) datasets. In addition, SORT is very
    efficient, as the extra computation overhead is less than 5%.
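
    To make the idea concrete, the Python sketch below merges two branch responses as their linear sum plus their element-wise product, which is the form the abstract describes. It is a simplified illustration: the paper's exact second-order term may include further normalization, and the function names are hypothetical. The gradient helper shows why the product term couples the branches: the derivative with respect to one branch depends on the current value of the other.

```python
def sort_merge(branch_a, branch_b):
    """Linear sum of two branch responses plus their element-wise product."""
    return [a + b + a * b for a, b in zip(branch_a, branch_b)]

def grad_wrt_branch_a(branch_b):
    """Element-wise derivative d(a + b + a*b)/da = 1 + b."""
    return [1.0 + b for b in branch_b]

merged = sort_merge([1.0, 2.0], [3.0, -1.0])
grads = grad_wrt_branch_a([3.0, -1.0])
```

    With a plain sum the gradient for branch A would be 1 everywhere, regardless of what branch B computes.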

    Active Decision Boundary Annotation with Deep Generative Models

    Miriam W. Huijser, Jan C. van Gemert
    Comments: ICCV submission
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    This paper is on active learning where the goal is to reduce the data
    annotation burden by interacting with a (human) oracle during training.
    Standard active learning methods ask the oracle to annotate data samples.
    Instead, we take a profoundly different approach: we ask for annotations of the
    decision boundary. We achieve this using a deep generative model to create
    novel instances along a 1D line. A point on the decision boundary is revealed
    where the instances change class. Experimentally we show on three data sets
    that our method can be plugged into other active learning schemes, that human
    oracles can effectively annotate points on the decision boundary, that our
    method is robust to annotation noise, and that decision boundary annotations
    improve over annotating data samples.
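
    One way to picture a boundary annotation is as a search along the line between two points for the place where the label flips. The Python sketch below uses bisection for that search. It is an assumption-laden illustration, not the paper's procedure: `classify` stands in for the human oracle inspecting generated instances, the points are plain coordinate lists rather than latent codes of a generative model, and bisection presumes exactly one class change on the segment.

```python
def find_boundary(point_a, point_b, classify, tolerance=1e-3):
    """Bisect t in [0, 1] along a + t * (b - a) to locate where the label changes."""
    def at(t):
        return [a + t * (b - a) for a, b in zip(point_a, point_b)]

    label_a = classify(at(0.0))
    if classify(at(1.0)) == label_a:
        raise ValueError("endpoints share a label; no boundary on this segment")
    lo, hi = 0.0, 1.0
    while hi - lo > tolerance:
        mid = (lo + hi) / 2.0
        if classify(at(mid)) == label_a:
            lo = mid  # still on point_a's side of the boundary
        else:
            hi = mid  # already on point_b's side
    return at((lo + hi) / 2.0)

# Toy oracle: the class changes where x + y crosses 1.
oracle = lambda p: p[0] + p[1] > 1.0
boundary = find_boundary([0.0, 0.0], [2.0, 2.0], oracle)
```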

    Multi-style Generative Network for Real-time Transfer

    Hang Zhang, Kristin Dana
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recent work in style transfer learns a feed-forward generative network to
    approximate the prior optimization-based approaches, resulting in real-time
    performance. However, these methods require training separate networks for
    different target styles which greatly limits the scalability. We introduce a
    Multi-style Generative Network (MSG-Net) with a novel Inspiration Layer, which
    retains the functionality of optimization-based approaches and has the fast
    speed of feed-forward networks. The proposed Inspiration Layer explicitly
    matches the feature statistics with the target styles at run time, which
    dramatically improves the versatility of existing generative networks, so that
    multiple styles can be realized within one network. The proposed MSG-Net
    matches image styles at multiple scales and puts the computational burden into
    the training. The learned generator is a compact feed-forward network that runs
    in real-time after training. Compared to previous work, the proposed network
    can achieve fast style transfer with at least comparable quality using a single
    network. The experimental results have covered (but are not limited to)
    simultaneous training of twenty different styles in a single network. The
    complete software system and pre-trained models will be publicly available upon
    publication.

    Fast Spectral Ranking for Similarity Search

    Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, Teddy Furon, Ondrej Chum
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Despite the success of deep learning on representing images for particular
    object retrieval, recent studies show that the learned representations still
    lie on manifolds in a high dimensional space. Therefore, nearest neighbor
    search cannot be expected to be optimal for this task. Even if a nearest
    neighbor graph is computed offline, exploring the manifolds online remains
    expensive. This work introduces an explicit embedding reducing manifold search
    to Euclidean search followed by dot product similarity search. We show this is
    equivalent to linear graph filtering of a sparse signal in the frequency
    domain, and we introduce a scalable offline computation of an approximate
    Fourier basis of the graph. We improve the state of the art on standard particular
    object retrieval datasets including a challenging one containing small objects.
    At a scale of 10^5 images, the offline cost is only a few hours, while query
    time is comparable to standard similarity search.

    Learning Correspondence Structures for Person Re-identification

    Weiyao Lin, Yang Shen, Junchi Yan, Mingliang Xu, Jianxin Wu, Jingdong Wang, Ke Lu
    Comments: accepted by IEEE Trans. Image Processing. arXiv admin note: text overlap with arXiv:1504.06243
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)

    This paper addresses the problem of handling spatial misalignments due to
    camera-view changes or human-pose variations in person re-identification. We
    first introduce a boosting-based approach to learn a correspondence structure
    which indicates the patch-wise matching probabilities between images from a
    target camera pair. The learned correspondence structure can not only capture
    the spatial correspondence pattern between cameras but also handle the
    viewpoint or human-pose variation in individual images. We further introduce a
    global constraint-based matching process. It integrates a global matching
    constraint over the learned correspondence structure to exclude cross-view
    misalignments during the image patch matching process, hence achieving a more
    reliable matching score between images. Finally, we also extend our approach by
    introducing a multi-structure scheme, which learns a set of local
    correspondence structures to capture the spatial correspondence sub-patterns
    between a camera pair, so as to handle the spatial misalignments between
    individual images in a more precise way. Experimental results on various
    datasets demonstrate the effectiveness of our approach.

    On the Interplay between Strong Regularity and Graph Densification

    Marco Fiorucci, Alessandro Torcinovich, Manuel Curado, Francisco Escolano, Marcello Pelillo
    Comments: GbR2017 to appear in Lecture Notes in Computer Science (LNCS)
    Subjects: Data Structures and Algorithms (cs.DS); Computer Vision and Pattern Recognition (cs.CV)

    In this paper we analyze the practical implications of Szemerédi's
    regularity lemma in the preservation of metric information contained in large
    graphs. To this end, we present a heuristic algorithm to find regular
    partitions. Our experiments show that this method is quite robust to the
    natural sparsification of proximity graphs. In addition, this robustness can be
    enforced by graph densification.

    Cross-modal Deep Metric Learning with Multi-task Regularization

    Xin Huang, Yuxin Peng
    Comments: 6 pages, 1 figure, to appear in the proceedings of the IEEE International Conference on Multimedia and Expo (ICME), Jul 10, 2017 – Jul 14, 2017, Hong Kong, Hong Kong
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    DNN-based cross-modal retrieval has become a research hotspot, by which users
    can search results across various modalities like image and text. However,
    existing methods mainly focus on the pairwise correlation and reconstruction
    error of labeled data. They ignore the semantically similar and dissimilar
    constraints between different modalities, and cannot take advantage of
    unlabeled data. This paper proposes Cross-modal Deep Metric Learning with
    Multi-task Regularization (CDMLMR), which integrates quadruplet ranking loss
    and semi-supervised contrastive loss for modeling cross-modal semantic
    similarity in a unified multi-task learning architecture. The quadruplet
    ranking loss can model the semantically similar and dissimilar constraints to
    preserve cross-modal relative similarity ranking information. The
    semi-supervised contrastive loss is able to maximize the semantic similarity on
    both labeled and unlabeled data. Compared to the existing methods, CDMLMR
    exploits not only the similarity ranking information but also unlabeled
    cross-modal data, and thus boosts cross-modal retrieval accuracy.


    Artificial Intelligence

    One-Shot Imitation Learning

    Yan Duan, Marcin Andrychowicz, Bradly Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, Pieter Abbeel, Wojciech Zaremba
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)

    Imitation learning has been commonly applied to solve different tasks in
    isolation. This usually requires either careful feature engineering, or a
    significant number of samples. This is far from what we desire: ideally, robots
    should be able to learn from very few demonstrations of any given task, and
    instantly generalize to new situations of the same task, without requiring
    task-specific engineering. In this paper, we propose a meta-learning framework
    for achieving such capability, which we call one-shot imitation learning.

    Specifically, we consider the setting where there is a very large set of
    tasks, and each task has many instantiations. For example, a task could be to
    stack all blocks on a table into a single tower, another task could be to place
    all blocks on a table into two-block towers, etc. In each case, different
    instances of the task would consist of different sets of blocks with different
    initial states. At training time, our algorithm is presented with pairs of
    demonstrations for a subset of all tasks. A neural net is trained that takes as
    input one demonstration and the current state (which initially is the initial
    state of the other demonstration of the pair), and outputs an action with the
    goal that the resulting sequence of states and actions matches as closely as
    possible with the second demonstration. At test time, a demonstration of a
    single instance of a new task is presented, and the neural net is expected to
    perform well on new instances of this new task. The use of soft attention
    allows the model to generalize to conditions and tasks unseen in the training
    data. We anticipate that by training this model on a much greater variety of
    tasks and settings, we will obtain a general system that can turn any
    demonstrations into robust policies that can accomplish an overwhelming variety
    of tasks.

    Videos available at this https URL

    Pseudorehearsal in value function approximation

    Vladimir Marochko, Leonard Johard, Manuel Mazzara
    Journal-ref: 11th International Conference on Agents and Multi-agent Systems Technologies and Applications, 2017
    Subjects: Artificial Intelligence (cs.AI)

    Catastrophic forgetting is of special importance in reinforcement learning,
    as the data distribution is generally non-stationary over time. We study and
    compare several pseudorehearsal approaches for Q-learning with function
    approximation in a pole balancing task. We have found that pseudorehearsal
    seems to assist learning even in such very simple problems, given proper
    initialization of the rehearsal parameters.

    Distributed Constraint Problems for Utilitarian Agents with Privacy Concerns, Recast as POMDPs

    Julien Savaux, Julien Vion, Sylvain Piechowiak, René Mandiau, Toshihiro Matsui, Katsutoshi Hirayama, Makoto Yokoo, Shakre Elmane, Marius Silaghi
    Subjects: Artificial Intelligence (cs.AI)

    Privacy has traditionally been a major motivation for distributed problem
    solving. Distributed Constraint Satisfaction Problem (DisCSP) as well as
    Distributed Constraint Optimization Problem (DCOP) are fundamental models used
    to solve various families of distributed problems. Even though several
    approaches have been proposed to quantify and preserve privacy in such
    problems, none of them is exempt from limitations. Here we approach the problem
    by assuming that computation is performed among utilitarian agents. We
    introduce a utilitarian approach where the utility of each state is estimated
    as the difference between the reward for reaching an agreement on assignments
    of shared variables and the cost of privacy loss. We investigate extensions to
    solvers where agents integrate the utility function to guide their search and
    decide which action to perform, defining thereby their policy. We show that
    these extended solvers succeed in significantly reducing privacy loss without
    significant degradation of the solution quality.
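
    The utility described above (reward for agreement minus cost of privacy
    loss) can be written down directly. The sketch below is a toy reading of it,
    not the authors' solver: the per-value privacy costs, the rule "reveal only
    if utility stays positive", and the function names are all assumptions made
    for illustration.

```python
def state_utility(agreement_reward, privacy_costs, revealed):
    """Reward for reaching agreement minus the cost of everything revealed."""
    return agreement_reward - sum(privacy_costs[v] for v in revealed)


def choose_reveal(agreement_reward, privacy_costs, revealed, candidates):
    """Pick the candidate whose disclosure leaves the highest positive utility.

    Returns None when no disclosure keeps the utility above zero
    (hypothetical stopping rule).
    """
    best, best_utility = None, 0.0
    for value in candidates:
        utility = state_utility(
            agreement_reward, privacy_costs, set(revealed) | {value}
        )
        if utility > best_utility:
            best, best_utility = value, utility
    return best


# Hypothetical meeting-scheduling costs for disclosing each available slot.
costs = {"mon": 1.0, "tue": 4.0, "wed": 2.0}
choice = choose_reveal(5.0, costs, {"mon"}, ["tue", "wed"])
```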

    ZM-Net: Real-time Zero-shot Image Manipulation Network

    Hao Wang, Xiaodan Liang, Hao Zhang, Dit-Yan Yeung, Eric P. Xing
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Learning (cs.LG); Machine Learning (stat.ML)

    Many problems in image processing and computer vision (e.g. colorization,
    style transfer) can be posed as ‘manipulating’ an input image into a
    corresponding output image given a user-specified guiding signal. A holy-grail
    solution towards generic image manipulation should be able to efficiently alter
    an input image with any personalized signals (even signals unseen during
    training), such as diverse paintings and arbitrary descriptive attributes.
    However, existing methods are either inefficient to simultaneously process
    multiple signals (let alone generalize to unseen signals), or unable to handle
    signals from other modalities. In this paper, we make the first attempt to
    address the zero-shot image manipulation task. We cast this problem as
    manipulating an input image according to a parametric model whose key
    parameters can be conditionally generated from any guiding signal (even unseen
    ones). To this end, we propose the Zero-shot Manipulation Net (ZM-Net), a
    fully-differentiable architecture that jointly optimizes an
    image-transformation network (TNet) and a parameter network (PNet). The PNet
    learns to generate key transformation parameters for the TNet given any guiding
    signal while the TNet performs fast zero-shot image manipulation according to
    both signal-dependent parameters from the PNet and signal-invariant parameters
    from the TNet itself. Extensive experiments show that our ZM-Net can perform
    high-quality image manipulation conditioned on different forms of guiding
    signals (e.g. style images and attributes) in real-time (tens of milliseconds
    per image) even for unseen signals. Moreover, a large-scale style dataset with
    over 20,000 style images is also constructed to promote further research.

    Interest-Driven Discovery of Local Process Models

    Niek Tax, Benjamin Dalmas, Natalia Sidorova, Wil M P van der Aalst, Sylvie Norre
    Comments: submitted to the International Conference on Business Process Management (BPM) 2017
    Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Local Process Models (LPM) describe structured fragments of process behavior
    occurring in the context of less structured business processes. Traditional LPM
    discovery aims to generate a collection of process models that describe highly
    frequent behavior, but these models do not always provide useful answers for
    questions posed by process analysts aiming at business process improvement. We
    propose a framework for goal-driven LPM discovery, based on utility functions
    and constraints. We describe four scopes on which these utility functions and
    constraints can be defined, and show that utility functions and constraints on
    different scopes can be combined to form composite utility
    functions/constraints. Finally, we demonstrate the applicability of our
    approach by presenting several actionable business insights discovered with LPM
    discovery on two real life data sets.

    Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems

    Xiujun Li, Yun-Nung Chen, Lihong Li, Jianfeng Gao, Asli Celikyilmaz
    Comments: 5 pages, 5 figures
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Language understanding is a key component in a spoken dialogue system. In
    this paper, we investigate how the language understanding module influences the
    dialogue system performance by conducting a series of systematic experiments on
    a task-oriented neural dialogue system in a reinforcement learning based
    setting. The empirical study shows that among different types of language
    understanding errors, slot-level errors can have more impact on the overall
    performance of a dialogue system compared to intent-level errors. In addition,
    our experiments demonstrate that the reinforcement learning based dialogue
    system is able to learn when and what to confirm in order to achieve better
    performance and greater robustness.

    Recurrent Topic-Transition GAN for Visual Paragraph Generation

    Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric P. Xing
    Comments: 10 pages, 6 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

    A natural image usually conveys rich semantic content and can be viewed from
    different angles. Existing image description methods are largely restricted by
    small sets of biased visual paragraph annotations, and fail to cover rich
    underlying semantics. In this paper, we investigate a semi-supervised paragraph
    generative framework that is able to synthesize diverse and semantically
    coherent paragraph descriptions by reasoning over local semantic regions and
    exploiting linguistic knowledge. The proposed Recurrent Topic-Transition
    Generative Adversarial Network (RTT-GAN) builds an adversarial framework
    between a structured paragraph generator and multi-level paragraph
    discriminators. The paragraph generator generates sentences recurrently by
    incorporating region-based visual and language attention mechanisms at each
    step. The quality of generated paragraph sentences is assessed by multi-level
    adversarial discriminators from two aspects, namely, plausibility at sentence
    level and topic-transition coherence at paragraph level. The joint adversarial
    training of RTT-GAN drives the model to generate realistic paragraphs with
    smooth logical transition between sentence topics. Extensive quantitative
    experiments on image and video paragraph datasets demonstrate the effectiveness
    of our RTT-GAN in both supervised and semi-supervised settings. Qualitative
    results on telling diverse stories for an image also verify the
    interpretability of RTT-GAN.

    Learning Correspondence Structures for Person Re-identification

    Weiyao Lin, Yang Shen, Junchi Yan, Mingliang Xu, Jianxin Wu, Jingdong Wang, Ke Lu
    Comments: accepted by IEEE Trans. Image Processing. arXiv admin note: text overlap with arXiv:1504.06243
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)

    This paper addresses the problem of handling spatial misalignments due to
    camera-view changes or human-pose variations in person re-identification. We
    first introduce a boosting-based approach to learn a correspondence structure
    which indicates the patch-wise matching probabilities between images from a
    target camera pair. The learned correspondence structure can not only capture
    the spatial correspondence pattern between cameras but also handle the
    viewpoint or human-pose variation in individual images. We further introduce a
    global constraint-based matching process. It integrates a global matching
    constraint over the learned correspondence structure to exclude cross-view
    misalignments during the image patch matching process, hence achieving a more
    reliable matching score between images. Finally, we also extend our approach by
    introducing a multi-structure scheme, which learns a set of local
    correspondence structures to capture the spatial correspondence sub-patterns
    between a camera pair, so as to handle the spatial misalignments between
    individual images in a more precise way. Experimental results on various
    datasets demonstrate the effectiveness of our approach.


    Computation and Language

    Deep LSTM for Large Vocabulary Continuous Speech Recognition

    Xu Tian, Jun Zhang, Zejun Ma, Yi He, Juan Wei, Peihao Wu, Wenchang Situ, Shuai Li, Yang Zhang
    Comments: 8 pages. arXiv admin note: text overlap with arXiv:1703.01024
    Subjects: Computation and Language (cs.CL)

    Recurrent neural networks (RNNs), especially long short-term memory (LSTM)
    RNNs, are effective networks for sequential tasks such as speech recognition. Deeper
    LSTM models perform well on large vocabulary continuous speech recognition,
    because of their impressive learning ability. However, it is more difficult to
    train a deeper network. We introduce a training framework with layer-wise
    training and exponential moving average methods for deeper LSTM models. The
    framework is competitive: LSTM models of more than 7 layers are successfully
    trained on Shenma voice search data in Mandarin, and they outperform the deep
    LSTM models trained by the conventional approach. Moreover, for online
    streaming speech recognition applications, a shallow model with a low real-time
    factor is distilled from the very deep model, with little loss of recognition
    accuracy in the distillation process. As a result, the model trained with the
    proposed framework achieves a 14% relative reduction in character error rate
    compared to the original model, which has similar real-time capability.
    Furthermore, a novel transfer learning strategy with segmental Minimum
    Bayes-Risk is also introduced in the framework. The strategy makes it possible
    for training with only a small part of the dataset to outperform full-dataset
    training from the beginning.
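
    For readers unfamiliar with the exponential moving average mentioned above,
    here is a minimal sketch of the generic update applied to a flat list of
    parameters. The abstract does not say how the average is used during
    training, so the decay value and the function name are assumptions.

```python
def ema_update(averaged, current, decay=0.99):
    """One step of an exponential moving average over model parameters.

    averaged: running average from the previous step
    current:  parameters after the latest training step
    decay:    weight kept on the old average (hypothetical default)
    """
    return [decay * a + (1.0 - decay) * c for a, c in zip(averaged, current)]


shadow = [0.0, 0.0]
for step_params in ([1.0, 2.0], [1.0, 2.0]):
    shadow = ema_update(shadow, step_params, decay=0.9)
```

    A decay close to 1 makes the averaged parameters change slowly, which
    smooths out step-to-step noise in the raw parameters.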

    Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems

    Xiujun Li, Yun-Nung Chen, Lihong Li, Jianfeng Gao, Asli Celikyilmaz
    Comments: 5 pages, 5 figures
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Language understanding is a key component in a spoken dialogue system. In
    this paper, we investigate how the language understanding module influences the
    dialogue system performance by conducting a series of systematic experiments on
    a task-oriented neural dialogue system in a reinforcement learning based
    setting. The empirical study shows that among different types of language
    understanding errors, slot-level errors can have more impact on the overall
    performance of a dialogue system compared to intent-level errors. In addition,
    our experiments demonstrate that the reinforcement learning based dialogue
    system is able to learn when and what to confirm in order to achieve better
    performance and greater robustness.


    Distributed, Parallel, and Cluster Computing

    PriMaL: A Privacy-Preserving Machine Learning Method for Event Detection in Distributed Sensor Networks

    Stefano Bennati, Catholijn M. Jonker
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)

    This paper introduces PriMaL, a general PRIvacy-preserving MAchine-Learning
    method for reducing the privacy cost of information transmitted through a
    network. Distributed sensor networks are often used for automated
    classification and detection of abnormal events in high-stakes situations, e.g.
    fire in buildings, earthquakes, or crowd disasters. Such networks might
    transmit privacy-sensitive information, e.g. GPS location of smartphones, which
    might be disclosed if the network is compromised. Privacy concerns might slow
    down the adoption of the technology, in particular in the scenario of social
    sensing where participation is voluntary, thus solutions are needed which
    improve privacy without compromising on the event detection accuracy. PriMaL is
    implemented as a machine-learning layer that works on top of an existing event
    detection algorithm. Experiments are run in a general simulation framework, for
    several network topologies and parameter values. The privacy footprint of
    state-of-the-art event detection algorithms is compared within the proposed
    framework. Results show that PriMaL is able to reduce the privacy cost of a
    distributed event detection algorithm below that of the corresponding
    centralized algorithm, within the bounds of some assumptions about the
    protocol. Moreover the performance of the distributed algorithm is not
    statistically worse than that of the centralized algorithm.


    Learning

    On The Projection Operator to A Three-view Cardinality Constrained Set

    Haichuan Yang, Shupeng Gui, Chuyang Ke, Daniel Stefankovic, Ryohei Fujimaki, Ji Liu
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    The cardinality constraint is an intrinsic way to restrict the solution
    structure in many domains, for example, sparse learning, feature selection, and
    compressed sensing. To solve a cardinality constrained problem, the key
    challenge is to solve the projection onto the cardinality constraint set, which
    is NP-hard in general when there exist multiple overlapped cardinality
    constraints. In this paper, we consider the scenario where overlapped
    cardinality constraints satisfy a Three-view Cardinality Structure (TVCS),
    which reflects the natural restriction in many applications, such as
    identification of gene regulatory networks and task-worker assignment problem.
    We cast the projection onto the TVCS set as a linear program, and prove
    that its solution can be obtained by finding an integer solution to this linear
    program. We further prove that such an integer solution can be found with the
    complexity proportional to the problem scale. We finally use synthetic
    experiments and two interesting applications in bioinformatics and
    crowdsourcing to validate the proposed TVCS model and method.
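
    As background, the projection is easy when there is only one cardinality
    constraint: keep the k entries of largest magnitude and zero the rest. The
    sketch below shows that simple case only; it is not the TVCS projection from
    the paper, which handles several overlapped constraints via linear
    programming. The function name is hypothetical.

```python
def project_onto_cardinality(x, k):
    """Euclidean projection of x onto {v : v has at most k nonzeros}.

    Keeps the k largest-magnitude entries; ties are broken by position.
    """
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [value if i in keep else 0.0 for i, value in enumerate(x)]


projected = project_onto_cardinality([3.0, -5.0, 1.0, 2.0], 2)
```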

    Layer-wise training of deep networks using kernel similarity

    Mandar Kulkarni, Shirish Karande
    Journal-ref: Deep Learning for Pattern Recognition (DLPR) workshop at ICPR 2016
    Subjects: Learning (cs.LG)

    Deep learning has shown promising results in many machine learning
    applications. The hierarchical feature representation built by deep networks
    enables compact and precise encoding of the data. A kernel analysis of the
    trained deep networks demonstrated that with deeper layers, simpler and
    more accurate data representations are obtained. In this paper, we propose an
    approach for layer-wise training of a deep network for the supervised
    classification task. A transformation matrix of each layer is obtained by
    solving an optimization aimed at a better representation where a subsequent
    layer builds its representation on the top of the features produced by a
    previous layer. We compared the performance of our approach with a DNN trained
    using back-propagation which has the same architecture as ours. Experimental
    results on the real image datasets demonstrate the efficacy of our approach. We
    also performed kernel analysis of layer representations to validate the claim
    of better feature encoding.

    SMILES Enumeration as Data Augmentation for Neural Network Modeling of Molecules

    Esben Jannik Bjerrum
    Subjects: Learning (cs.LG)

    Simplified Molecular Input Line Entry System (SMILES) is a single line text
    representation of a unique molecule. One molecule can, however, have multiple
    SMILES strings, which is one reason canonical SMILES have been defined: they
    ensure a one-to-one correspondence between SMILES string and molecule.
    Here the fact that multiple SMILES represent the same molecule is explored as a
    technique for data augmentation of a molecular QSAR dataset modeled by a long
    short term memory (LSTM) cell based neural network. The augmented dataset was
    130 times bigger than the original. The network trained with the augmented
    dataset shows better performance on a test set when compared to a model built
    with only one canonical SMILES string per molecule. The correlation coefficient
    R2 on the test set was improved from 0.56 to 0.66 when using SMILES
    enumeration, and the root mean square error (RMS) likewise fell from 0.62 to
    0.55. The technique also works in the prediction phase. By taking the average
    per molecule of the predictions for the enumerated SMILES a further improvement
    to a correlation coefficient of 0.68 and a RMS of 0.52 was found.
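
    The prediction-phase averaging described above amounts to grouping
    predictions by molecule and taking the mean. A minimal sketch follows; the
    three strings are equivalent SMILES spellings of ethanol, while the
    predicted values and the function name are made up for illustration and do
    not come from the paper.

```python
from collections import defaultdict


def average_per_molecule(predictions):
    """Average model outputs over all enumerated SMILES of each molecule.

    predictions: iterable of (molecule_id, smiles, predicted_value) tuples.
    """
    grouped = defaultdict(list)
    for molecule_id, _smiles, value in predictions:
        grouped[molecule_id].append(value)
    return {mol: sum(vals) / len(vals) for mol, vals in grouped.items()}


# Hypothetical per-SMILES predictions for a single molecule.
preds = [
    ("ethanol", "CCO", 1.0),
    ("ethanol", "OCC", 2.0),
    ("ethanol", "C(O)C", 3.0),
]
averaged = average_per_molecule(preds)
```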

    Nonparametric Variational Auto-encoders for Hierarchical Representation Learning

    Prasoon Goyal, Zhiting Hu, Xiaodan Liang, Chenyu Wang, Eric Xing
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    The recently developed variational autoencoders (VAEs) have proved to be an
    effective confluence of the rich representational power of neural networks with
    Bayesian methods. However, most work on VAEs uses a rather simple prior over the
    latent variables, such as a standard normal distribution, thereby restricting
    their applications to relatively simple phenomena. In this work, we propose
    hierarchical nonparametric variational autoencoders, which combine
    tree-structured Bayesian nonparametric priors with VAEs, to enable infinite
    flexibility of the latent representation space. Both the neural parameters and
    Bayesian priors are learned jointly using tailored variational inference. The
    resulting model induces a hierarchical structure of latent semantic concepts
    underlying the data corpus, and infers accurate representations of data
    instances. We apply our model in video representation learning. Our method is
    able to discover highly interpretable activity hierarchies, and obtain improved
    clustering accuracy and generalization capacity based on the learned rich
    representations.

    Cross-modal Deep Metric Learning with Multi-task Regularization

    Xin Huang, Yuxin Peng
    Comments: 6 pages, 1 figure, to appear in the proceedings of the IEEE International Conference on Multimedia and Expo (ICME), Jul 10, 2017 – Jul 14, 2017, Hong Kong, Hong Kong
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    DNN-based cross-modal retrieval has become a research hotspot, by which users
    can search results across various modalities like image and text. However,
    existing methods mainly focus on the pairwise correlation and reconstruction
    error of labeled data. They ignore the semantically similar and dissimilar
    constraints between different modalities, and cannot take advantage of
    unlabeled data. This paper proposes Cross-modal Deep Metric Learning with
    Multi-task Regularization (CDMLMR), which integrates quadruplet ranking loss
    and semi-supervised contrastive loss for modeling cross-modal semantic
    similarity in a unified multi-task learning architecture. The quadruplet
    ranking loss can model the semantically similar and dissimilar constraints to
    preserve cross-modal relative similarity ranking information. The
    semi-supervised contrastive loss is able to maximize the semantic similarity on
    both labeled and unlabeled data. Compared to the existing methods, CDMLMR
    exploits not only the similarity ranking information but also unlabeled
    cross-modal data, and thus boosts cross-modal retrieval accuracy.

    Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks

    Guokun Lai, Wei-Cheng Chang, Yiming Yang, Hanxiao Liu
    Subjects: Learning (cs.LG)

    Multivariate time series forecasting is an important machine learning problem
    across many domains, including predictions of solar plant energy output,
    electricity consumption, and traffic jam situations. Temporal data arising in
    these real-world applications often involve a mixture of long-term and
    short-term patterns, for which traditional approaches such as autoregressive
    models and Gaussian processes may fail. In this paper, we propose a novel deep
    learning framework, namely Long- and Short-term Time-series network (LSTNet),
    to address this open challenge. LSTNet uses the Convolutional Neural Network
    (CNN) to extract short-term local dependency patterns among variables, and the
    Recurrent Neural Network (RNN) to discover long-term patterns and trends. In
    our evaluation on real-world data with complex mixtures of repetitive patterns,
    LSTNet achieved significant performance improvements over that of several
    state-of-the-art baseline methods.

    The Use of Autoencoders for Discovering Patient Phenotypes

    Harini Suresh, Peter Szolovits, Marzyeh Ghassemi
    Journal-ref: NIPS Workshop on Machine Learning for Healthcare (NIPS ML4HC) 2016
    Subjects: Learning (cs.LG)

    We use autoencoders to create low-dimensional embeddings of underlying
    patient phenotypes that we hypothesize are a governing factor in determining
    how different patients will react to different interventions. We compare the
    performance of autoencoders that take fixed length sequences of concatenated
    timesteps as input with a recurrent sequence-to-sequence autoencoder. We
    evaluate our methods on around 35,500 patients from the latest MIMIC III
    dataset from Beth Israel Deaconess Hospital.

    Metalearning for Feature Selection

    Ben Goertzel, Nil Geisweiller, Chris Poulin
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    A general formulation of optimization problems in which various candidate
    solutions may use different feature-sets is presented, encompassing supervised
    classification, automated program learning and other cases. A novel
    characterization of the concept of a “good quality feature” for such an
    optimization problem is provided; and a proposal regarding the integration of
    quality based feature selection into metalearning is suggested, wherein the
    quality of a feature for a problem is estimated using knowledge about related
    features in the context of related problems. Results are presented regarding
    extensive testing of this “feature metalearning” approach on supervised text
    classification problems; it is demonstrated that, in this context, feature
    metalearning can provide significant and sometimes dramatic speedup over
    standard feature selection heuristics.

    CSI: A Hybrid Deep Model for Fake News

    Natali Ruchansky, Sungyong Seo, Yan Liu
    Subjects: Learning (cs.LG); Social and Information Networks (cs.SI)

    In the recent political climate, the topic of fake news has drawn attention
    both from the public and the academic communities. Such misinformation has been
    cited to have a strong impact on public opinion, presenting the opportunity for
    malicious manipulation. Detecting fake news is an important, yet challenging
    problem since it is often difficult for humans to distinguish misinformation.
    However, there have been three generally agreed upon characteristics of fake
    news: the text, the response received, and the source users promoting it.
    Existing work has largely focused on tailoring solutions to a particular
    characteristic, but the complexity of the fake news epidemic has limited their
    success and generality.

    In this work, we propose a model that combines all three characteristics for
    a more accurate and automated prediction. Specifically, we incorporate the
    behavior of both parties, users and articles, and the group behavior of users
    who propagate fake news. Motivated by the three characteristics, we propose a
    model called CSI, which is composed of three modules: Capture, Score, and
    Integrate. The first module uses a Recurrent Neural Network (RNN) to capture
    the temporal pattern of user activity that occurred with a given article, and
    the second captures the behavior of users over time. The two are then
    integrated with the third module to classify an article as fake or not. Through
    experimental analysis on real-world data, we demonstrate that CSI achieves
    higher accuracy than existing models. Further, we show that each module
    captures relevant behavioral information both on users and articles with
    respect to the propagation of fake news.

    Black-Box Optimization in Machine Learning with Trust Region Based Derivative Free Algorithm

    Hiva Ghanbari, Katya Scheinberg
    Subjects: Learning (cs.LG)

    In this work, we utilize a Trust Region based Derivative Free Optimization
    (DFO-TR) method to directly maximize the Area Under Receiver Operating
    Characteristic Curve (AUC), which is a nonsmooth, noisy function. We show that
    AUC is a smooth function, in expectation, if the distributions of the positive
    and negative data points obey a jointly normal distribution. The practical
    performance of this algorithm is compared to three prominent Bayesian
    optimization methods and random search. The presented numerical results show
    that DFO-TR surpasses Bayesian optimization and random search on various
    black-box optimization problems, such as maximizing AUC and hyperparameter
    tuning.
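
    To make the nonsmoothness of AUC concrete, here is a minimal Python sketch
    that computes the empirical AUC of a linear scorer as the fraction of
    (positive, negative) pairs it ranks correctly. The function name
    `empirical_auc`, the toy data, and the convention of counting ties as one
    half are illustrative assumptions, not taken from the paper.

```python
def empirical_auc(weights, positives, negatives):
    """Fraction of (positive, negative) pairs that a linear scorer ranks correctly.

    Ties count as half a correct pair (an assumed, commonly used convention).
    """
    def score(x):
        return sum(w * xi for w, xi in zip(weights, x))

    pos_scores = [score(x) for x in positives]
    neg_scores = [score(x) for x in negatives]
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


# Hypothetical toy data: two positive and two negative points in 2-D.
positives = [(2.0, 1.0), (1.5, 0.5)]
negatives = [(0.5, 1.0), (1.0, 2.0)]
auc_a = empirical_auc((1.0, 0.0), positives, negatives)  # scores by first feature
auc_b = empirical_auc((0.0, 1.0), positives, negatives)  # scores by second feature
```

    Because the value only changes when the ranking of some pair flips, the
    empirical AUC is piecewise constant in the weights, so its gradient is zero
    wherever it exists. That is why a derivative-free method is a natural fit.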

    Applying Deep Machine Learning for psycho-demographic profiling of Internet users using O.C.E.A.N. model of personality

    Iaroslav Omelianenko
    Comments: arXiv admin note: text overlap with arXiv:1207.0580 by other authors
    Subjects: Learning (cs.LG); Computers and Society (cs.CY)

    In the modern era, each Internet user leaves enormous amounts of auxiliary
    digital residuals (footprints) by using a variety of on-line services. All this
    data has been collected and stored for many years. Recent works have
    demonstrated that it is possible to apply simple machine learning methods to
    analyze collected digital footprints and to create psychological profiles of
    individuals. However, while these works clearly demonstrated the applicability
    of machine learning methods for such an analysis, the simple prediction
    models they created still lack the accuracy necessary to be successfully
    applied to practical needs. We have assumed that using advanced deep machine
    learning methods may considerably increase the accuracy of predictions. We
    started with simple machine learning methods to estimate basic prediction
    performance and moved further by applying advanced methods based on shallow
    and deep neural networks. Then we compared the prediction power of the studied
    models and drew conclusions about their performance. Finally, we formed
    hypotheses about how prediction accuracy can be further improved. As a result
    of this work, we provide the full source code used in the experiments for all
    interested researchers and practitioners in the corresponding GitHub
    repository. We believe that applying deep machine learning for psychological
    profiling may have an enormous impact on society (for better or worse), and
    by providing the full source code of our research we hope to intensify
    further research by a wider circle of scholars.

    Dance Dance Convolution

    Chris Donahue, Zachary C. Lipton, Julian McAuley
    Subjects: Learning (cs.LG); Multimedia (cs.MM); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Machine Learning (stat.ML)

    Dance Dance Revolution (DDR) is a popular rhythm-based video game. Players
    perform steps on a dance platform in synchronization with music as directed by
    on-screen step charts. While many step charts are available in standardized
    packs, users may grow tired of existing charts, or wish to dance to a song for
    which no chart exists. We introduce the task of learning to choreograph. Given
    a raw audio track, the goal is to produce a new step chart. This task
    decomposes naturally into two subtasks: deciding when to place steps and
    deciding which steps to select. For the step placement task, we combine
    recurrent and convolutional neural networks to ingest spectrograms of low-level
    audio features to predict steps, conditioned on chart difficulty. For step
    selection, we present a conditional LSTM generative model that substantially
    outperforms n-gram and fixed-window approaches.

    One-Shot Imitation Learning

    Yan Duan, Marcin Andrychowicz, Bradly Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, Pieter Abbeel, Wojciech Zaremba
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)

    Imitation learning has been commonly applied to solve different tasks in
    isolation. This usually requires either careful feature engineering, or a
    significant number of samples. This is far from what we desire: ideally, robots
    should be able to learn from very few demonstrations of any given task, and
    instantly generalize to new situations of the same task, without requiring
    task-specific engineering. In this paper, we propose a meta-learning framework
    for achieving such capability, which we call one-shot imitation learning.

    Specifically, we consider the setting where there is a very large set of
    tasks, and each task has many instantiations. For example, a task could be to
    stack all blocks on a table into a single tower, another task could be to place
    all blocks on a table into two-block towers, etc. In each case, different
    instances of the task would consist of different sets of blocks with different
    initial states. At training time, our algorithm is presented with pairs of
    demonstrations for a subset of all tasks. A neural net is trained that takes as
    input one demonstration and the current state (which initially is the initial
    state of the other demonstration of the pair), and outputs an action with the
    goal that the resulting sequence of states and actions matches as closely as
    possible with the second demonstration. At test time, a demonstration of a
    single instance of a new task is presented, and the neural net is expected to
    perform well on new instances of this new task. The use of soft attention
    allows the model to generalize to conditions and tasks unseen in the training
    data. We anticipate that by training this model on a much greater variety of
    tasks and settings, we will obtain a general system that can turn any
    demonstrations into robust policies that can accomplish an overwhelming variety
    of tasks.

    Videos available at this https URL

    From safe screening rules to working sets for faster Lasso-type solvers

    Mathurin Massias, Alexandre Gramfort, Joseph Salmon
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Optimization and Control (math.OC); Computation (stat.CO)

    Convex sparsity-promoting regularizations are ubiquitous in modern
    statistical learning. By construction, they yield solutions with few non-zero
    coefficients, which correspond to saturated constraints in the dual
    optimization formulation. Working set (WS) strategies are generic optimization
    techniques that consist in solving simpler problems that only consider a subset
    of constraints, whose indices form the WS. Working set methods therefore
    involve two nested iterations: the outer loop corresponds to the definition of
    the WS and the inner loop calls a solver for the subproblems. For the Lasso
    estimator a WS is a set of features, while for a Group Lasso it refers to a set
    of groups. In practice, WS are generally small in this context so the
    associated feature Gram matrix can fit in memory. Here we show that the
    Gauss-Southwell rule (a greedy strategy for block coordinate descent
    techniques) leads to fast solvers in this case. Combined with a working set
    strategy based on an aggressive use of so-called Gap Safe screening rules, we
    propose a solver achieving state-of-the-art performance on sparse learning
    problems. Results are presented on Lasso and multi-task Lasso estimators.
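
    The inner solver described above can be illustrated with a minimal Python
    sketch of greedy coordinate descent for the Lasso on a precomputed Gram
    matrix. This is not the authors' solver: the function names, the objective
    scaling (0.5 * ||y - Xw||^2 + lam * ||w||_1), and the choice of the
    "largest update" variant of the Gauss-Southwell rule are all assumptions
    made for illustration.

```python
def soft_threshold(value, threshold):
    """Proximal operator of threshold * |.| (shrinks value toward zero)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def greedy_cd_lasso(gram, xty, lam, n_iter=100):
    """Greedy coordinate descent on 0.5 * ||y - Xw||^2 + lam * ||w||_1.

    gram is X^T X (list of lists) and xty is X^T y, both restricted to the
    working set, so they are assumed small enough to hold in memory.
    """
    n = len(xty)
    w = [0.0] * n
    grad = [-q for q in xty]  # gradient of the smooth part at w = 0
    for _ in range(n_iter):
        best_j, best_delta = -1, 0.0
        for j in range(n):
            # Exact minimizer along coordinate j, all other coordinates fixed.
            candidate = soft_threshold(w[j] - grad[j] / gram[j][j], lam / gram[j][j])
            delta = candidate - w[j]
            if abs(delta) > abs(best_delta):
                best_j, best_delta = j, delta
        if best_j < 0 or abs(best_delta) < 1e-12:
            break  # no coordinate can move: optimality reached
        w[best_j] += best_delta
        for k in range(n):
            grad[k] += gram[k][best_j] * best_delta  # cheap update via the Gram matrix
    return w


# Hypothetical orthonormal design: the solution is plain soft-thresholding.
solution = greedy_cd_lasso([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
```

    Holding the Gram matrix in memory is what makes the greedy selection
    affordable here: after each update, the full gradient is refreshed using a
    single column of the matrix.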

    Black-Box Data-efficient Policy Search for Robotics

    Konstantinos Chatzilygeroudis, Roberto Rama, Rituraj Kaushik, Dorian Goepp, Vassilis Vassiliades, Jean-Baptiste Mouret
    Comments: 8 pages, 6 figures
    Subjects: Robotics (cs.RO); Learning (cs.LG)

    The most data-efficient algorithms for reinforcement learning (RL) in
    robotics are based on uncertain dynamical models: after each episode, they
    first learn a dynamical model of the robot, then they use an optimization
    algorithm to find a policy that maximizes the expected return given the model
    and its uncertainties. It is often believed that this optimization can be
    tractable only if analytical, gradient-based algorithms are used; however,
    these algorithms require using specific families of reward functions and
    policies, which greatly limits the flexibility of the overall approach. In this
    paper, we introduce a novel model-based RL algorithm, called Black-DROPS
    (Black-box Data-efficient RObot Policy Search) that: (1) does not impose any
    constraint on the reward function or the policy (they are treated as
    black-boxes), (2) is as data-efficient as the state-of-the-art algorithm for
    data-efficient RL in robotics, and (3) is as fast as (or faster than) analytical
    approaches when several cores are available. The key idea is to replace the
    gradient-based optimization algorithm with a parallel, black-box algorithm that
    takes into account the model uncertainties. We demonstrate the performance of
    our new algorithm on two standard control benchmark problems (in simulation)
    and a low-cost robotic manipulator (with a real robot).

    ZM-Net: Real-time Zero-shot Image Manipulation Network

    Hao Wang, Xiaodan Liang, Hao Zhang, Dit-Yan Yeung, Eric P. Xing
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Learning (cs.LG); Machine Learning (stat.ML)

    Many problems in image processing and computer vision (e.g. colorization,
    style transfer) can be posed as ‘manipulating’ an input image into a
    corresponding output image given a user-specified guiding signal. A holy-grail
    solution towards generic image manipulation should be able to efficiently alter
    an input image with any personalized signals (even signals unseen during
    training), such as diverse paintings and arbitrary descriptive attributes.
    However, existing methods are either inefficient to simultaneously process
    multiple signals (let alone generalize to unseen signals), or unable to handle
    signals from other modalities. In this paper, we make the first attempt to
    address the zero-shot image manipulation task. We cast this problem as
    manipulating an input image according to a parametric model whose key
    parameters can be conditionally generated from any guiding signal (even unseen
    ones). To this end, we propose the Zero-shot Manipulation Net (ZM-Net), a
    fully-differentiable architecture that jointly optimizes an
    image-transformation network (TNet) and a parameter network (PNet). The PNet
    learns to generate key transformation parameters for the TNet given any guiding
    signal while the TNet performs fast zero-shot image manipulation according to
    both signal-dependent parameters from the PNet and signal-invariant parameters
    from the TNet itself. Extensive experiments show that our ZM-Net can perform
    high-quality image manipulation conditioned on different forms of guiding
    signals (e.g. style images and attributes) in real-time (tens of milliseconds
    per image) even for unseen signals. Moreover, a large-scale style dataset with
    over 20,000 style images is also constructed to promote further research.

    Knowledge distillation using unlabeled mismatched images

    Mandar Kulkarni, Kalpesh Patil, Shirish Karande
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    Current approaches for Knowledge Distillation (KD) either directly use
    training data or sample from the training data distribution. In this paper, we
    demonstrate the effectiveness of ‘mismatched’ unlabeled stimulus to perform KD
    for image classification networks. For illustration, we consider scenarios
    where there is a complete absence of training data, or mismatched stimulus has
    to be used for augmenting a small amount of training data. We demonstrate that
    stimulus complexity is a key factor for distillation’s good performance. Our
    examples include use of various datasets for stimulating MNIST and CIFAR
    teachers.

    Interest-Driven Discovery of Local Process Models

    Niek Tax, Benjamin Dalmas, Natalia Sidorova, Wil M P van der Aalst, Sylvie Norre
    Comments: submitted to the International Conference on Business Process Management (BPM) 2017
    Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Local Process Models (LPM) describe structured fragments of process behavior
    occurring in the context of less structured business processes. Traditional LPM
    discovery aims to generate a collection of process models that describe highly
    frequent behavior, but these models do not always provide useful answers for
    questions posed by process analysts aiming at business process improvement. We
    propose a framework for goal-driven LPM discovery, based on utility functions
    and constraints. We describe four scopes on which these utility functions and
    constraints can be defined, and show that utility functions and constraints on
    different scopes can be combined to form composite utility
    functions/constraints. Finally, we demonstrate the applicability of our
    approach by presenting several actionable business insights discovered with LPM
    discovery on two real life data sets.

    Stochastic Primal Dual Coordinate Method with Non-Uniform Sampling Based on Optimality Violations

    Atsushi Shibagaki, Ichiro Takeuchi
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Optimization and Control (math.OC)

    We study primal-dual type stochastic optimization algorithms with non-uniform
    sampling. Our main theoretical contribution in this paper is to present a
    convergence analysis of Stochastic Primal Dual Coordinate (SPDC) Method with
    arbitrary sampling. Based on this theoretical framework, we propose Optimality
    Violation-based Sampling SPDC (ovsSPDC), a non-uniform sampling method based on
    Optimality Violation. We also propose two efficient heuristic variants of
    ovsSPDC called ovsSPDC+ and ovsSPDC++. Through intensive numerical experiments,
    we demonstrate that the proposed method and its variants are faster than other
    state-of-the-art primal-dual type stochastic optimization methods.

    Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems

    Xiujun Li, Yun-Nung Chen, Lihong Li, Jianfeng Gao, Asli Celikyilmaz
    Comments: 5 pages, 5 figures
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Language understanding is a key component in a spoken dialogue system. In
    this paper, we investigate how the language understanding module influences the
    dialogue system performance by conducting a series of systematic experiments on
    a task-oriented neural dialogue system in a reinforcement learning based
    setting. The empirical study shows that among different types of language
    understanding errors, slot-level errors can have more impact on the overall
    performance of a dialogue system compared to intent-level errors. In addition,
    our experiments demonstrate that the reinforcement learning based dialogue
    system is able to learn when and what to confirm in order to achieve better
    performance and greater robustness.

    High-Resolution Breast Cancer Screening with Multi-View Deep Convolutional Neural Networks

    Krzysztof J. Geras, Stacey Wolfson, S. Gene Kim, Linda Moy, Kyunghyun Cho
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    Recent advances in deep learning for object recognition in natural images have
    prompted a surge of interest in applying a similar set of techniques to medical
    images. Most of the initial attempts largely focused on replacing the input to
    such a deep convolutional neural network from a natural image to a medical
    image. This, however, does not take into consideration the fundamental
    differences between these two types of data. More specifically, detection or
    recognition of an anomaly in medical images depends significantly on fine
    details, unlike object recognition in natural images where coarser, more global
    structures matter more. This difference makes it inadequate to use the existing
    deep convolutional neural network architectures, which were developed for
    natural images, because they rely on heavily downsampling an image to a much
    lower resolution to reduce the memory requirements. This hides details
    necessary to make accurate predictions for medical images. Furthermore, a
    single exam in medical imaging often comes with a set of different views which
    must be seamlessly fused in order to reach a correct conclusion. In our work,
    we propose to use a multi-view deep convolutional neural network that handles a
    set of more than one high-resolution medical image. We evaluate this network on
    large-scale mammography-based breast cancer screening (BI-RADS prediction)
    using 103 thousand images. We focus on investigating the impact of training set
    sizes and image sizes on the prediction accuracy. Our results highlight that
    performance clearly increases with the size of training set, and that the best
    performance can only be achieved using the images in the original resolution.
    This suggests the future direction of medical imaging research using deep
    neural networks is to utilize as much data as possible with the least amount of
    potentially harmful preprocessing.

    Recurrent Topic-Transition GAN for Visual Paragraph Generation

    Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric P. Xing
    Comments: 10 pages, 6 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

    A natural image usually conveys rich semantic content and can be viewed from
    different angles. Existing image description methods are largely restricted by
    small sets of biased visual paragraph annotations, and fail to cover rich
    underlying semantics. In this paper, we investigate a semi-supervised paragraph
    generative framework that is able to synthesize diverse and semantically
    coherent paragraph descriptions by reasoning over local semantic regions and
    exploiting linguistic knowledge. The proposed Recurrent Topic-Transition
    Generative Adversarial Network (RTT-GAN) builds an adversarial framework
    between a structured paragraph generator and multi-level paragraph
    discriminators. The paragraph generator generates sentences recurrently by
    incorporating region-based visual and language attention mechanisms at each
    step. The quality of generated paragraph sentences is assessed by multi-level
    adversarial discriminators from two aspects, namely, plausibility at sentence
    level and topic-transition coherence at paragraph level. The joint adversarial
    training of RTT-GAN drives the model to generate realistic paragraphs with
    smooth logical transition between sentence topics. Extensive quantitative
    experiments on image and video paragraph datasets demonstrate the effectiveness
    of our RTT-GAN in both supervised and semi-supervised settings. Qualitative
    results on telling diverse stories for an image also verify the
    interpretability of RTT-GAN.

    Learning to Generate Samples from Noise through Infusion Training

    Florian Bordes, Sina Honari, Pascal Vincent
    Comments: Published as a conference paper at ICLR 2017
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    In this work, we investigate a novel training procedure to learn a generative
    model as the transition operator of a Markov chain, such that, when applied
    repeatedly on an unstructured random noise sample, it will denoise it into a
    sample that matches the target distribution from the training set. The novel
    training procedure to learn this progressive denoising operation involves
    sampling from a slightly different chain than the model chain used for
    generation in the absence of a denoising target. In the training chain we
    infuse information from the training target example that we would like the
    chains to reach with a high probability. The thus learned transition operator
    is able to produce quality and varied samples in a small number of steps.
    Experiments show competitive results compared to the samples generated with a
    basic Generative Adversarial Net.

    Active Decision Boundary Annotation with Deep Generative Models

    Miriam W. Huijser, Jan C. van Gemert
    Comments: ICCV submission
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    This paper is on active learning where the goal is to reduce the data
    annotation burden by interacting with a (human) oracle during training.
    Standard active learning methods ask the oracle to annotate data samples.
    Instead, we take a profoundly different approach: we ask for annotations of the
    decision boundary. We achieve this using a deep generative model to create
    novel instances along a 1D line. A point on the decision boundary is revealed
    where the instances change class. Experimentally we show on three data sets
    that our method can be plugged into other active learning schemes, that human
    oracles can effectively annotate points on the decision boundary, that our
    method is robust to annotation noise, and that decision boundary annotations
    improve over annotating data samples.

    Ensemble representation learning: an analysis of fitness and survival for wrapper-based genetic programming methods

    William La Cava, Jason H. Moore
    Comments: Genetic and Evolutionary Computation Conference (GECCO) 2017, Berlin, Germany
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)

    Recently we proposed a general, ensemble-based feature engineering wrapper
    (FEW) that was paired with a number of machine learning methods to solve
    regression problems. Here, we adapt FEW for supervised classification and
    perform a thorough analysis of fitness and survival methods within this
    framework. Our tests demonstrate that two fitness metrics, one introduced as an
    adaptation of the silhouette score, outperform the more commonly used Fisher
    criterion. We analyze survival methods and demonstrate that $\epsilon$-lexicase
    survival works best across our test problems, followed by random survival which
    outperforms both tournament and deterministic crowding. We conduct
    hyper-parameter optimization for several classification methods using a large
    set of problems to benchmark the ability of FEW to improve data
    representations. The results show that FEW can improve the best classifier
    performance on several problems. We show that FEW generates readable and
    meaningful features for a biomedical problem with different ML pairings.

    Application of backpropagation neural networks to both stages of fingerprinting based WIPS

    Caifa Zhou, Andreas Wieser
    Comments: 11 pages, 11 figures, published in proceedings UPINLBS 2016
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We propose a scheme to employ backpropagation neural networks (BPNNs) for
    both stages of fingerprinting-based indoor positioning using WLAN/WiFi signal
    strengths (FWIPS): radio map construction during the offline stage, and
    localization during the online stage. Given a training radio map (TRM), i.e., a
    set of coordinate vectors and associated WLAN/WiFi signal strengths of the
    available access points, a BPNN can be trained to output the expected signal
    strengths for any input position within the region of interest (BPNN-RM). This
    can be used to provide a continuous representation of the radio map and to
    filter, densify or decimate a discrete radio map. Correspondingly, the TRM can
    also be used to train another BPNN to output the expected position within the
    region of interest for any input vector of recorded signal strengths and thus
    carry out localization (BPNN-LA). Key aspects of the design of such artificial
    neural networks for a specific application are the selection of design
    parameters like the number of hidden layers and nodes within the network, and
    the training procedure. Summarizing extensive numerical simulations, based on
    real measurements in a testbed, we analyze the impact of these design choices
    on the performance of the BPNN and compare the results in particular to those
    obtained using the $k$ nearest neighbors ($k$NN) and weighted $k$ nearest
    neighbors approaches to FWIPS.
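
    For readers unfamiliar with the baseline mentioned above, the following is a
    minimal Python sketch of weighted k-nearest-neighbor localization on a radio
    map. The function name `weighted_knn_locate`, the toy radio map, the
    Euclidean distance in signal space, and the inverse-distance weighting are
    assumptions chosen for illustration; the paper may use different choices.

```python
import math


def weighted_knn_locate(radio_map, observed, k=3, eps=1e-9):
    """Estimate a 2-D position from observed signal strengths.

    radio_map: list of ((x, y), [signal strengths per access point]).
    The k fingerprints closest in signal space are averaged, each weighted
    by the inverse of its signal-space distance (an assumed weighting).
    """
    nearest = sorted(
        (math.dist(rss, observed), position) for position, rss in radio_map
    )[:k]
    weights = [1.0 / (distance + eps) for distance, _ in nearest]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, nearest)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, nearest)) / total
    return x, y


# Hypothetical radio map: two reference points, two access points (dBm values).
radio_map = [
    ((0.0, 0.0), [-40.0, -70.0]),
    ((4.0, 0.0), [-70.0, -40.0]),
]
estimate = weighted_knn_locate(radio_map, [-55.0, -55.0], k=2)
```

    In this toy case the observation is equally far from both fingerprints in
    signal space, so the estimate falls midway between the two reference points.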

    Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World

    Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, Pieter Abbeel
    Comments: 8 pages, 7 figures. Submitted to 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017)
    Subjects: Robotics (cs.RO); Learning (cs.LG)

    Bridging the ‘reality gap’ that separates simulated robotics from experiments
    on hardware could accelerate robotic research through improved data
    availability. This paper explores domain randomization, a simple technique for
    training models on simulated images that transfer to real images by randomizing
    rendering in the simulator. With enough variability in the simulator, the real
    world may appear to the model as just another variation. We focus on the task
    of object localization, which is a stepping stone to general robotic
    manipulation skills. We find that it is possible to train a real-world object
    detector that is accurate to 1.5 cm and robust to distractors and partial
    occlusions using only data from a simulator with non-realistic random textures.
    To demonstrate the capabilities of our detectors, we show they can be used to
    perform grasping in a cluttered environment. To our knowledge, this is the
    first successful transfer of a deep neural network trained only on simulated
    RGB images (without pre-training on real images) to the real world for the
    purpose of robotic control.

    A Comparison of deep learning methods for environmental sound

    Juncheng Li, Wei Dai, Florian Metze, Shuhui Qu, Samarjit Das
    Comments: 5 pages including reference
    Journal-ref: published at ICASSP 2017
    Subjects: Sound (cs.SD); Learning (cs.LG)

    Environmental sound detection is a challenging application of machine
    learning because of the noisy nature of the signal, and the small amount of
    (labeled) data that is typically available. This work thus presents a
    comparison of several state-of-the-art Deep Learning models on the IEEE
    challenge on Detection and Classification of Acoustic Scenes and Events (DCASE)
    2016 challenge task and data, classifying sounds into one of fifteen common
    indoor and outdoor acoustic scenes, such as bus, cafe, car, city center, forest
    path, library, train, etc. In total, 13 hours of stereo audio recordings are
    available, making this one of the largest datasets available. We perform
    experiments on six sets of features, including standard Mel-frequency cepstral
    coefficients (MFCC), Binaural MFCC, log Mel-spectrum and two different
    large-scale temporal pooling features extracted using OpenSMILE. On these
    features, we apply five models: Gaussian Mixture Model (GMM), Deep Neural
    Network (DNN), Recurrent Neural Network (RNN), Convolutional Deep Neural
    Network (CNN) and i-vector. Using the late-fusion approach, we improve the
    performance of the baseline 72.5% by 15.6% in 4-fold Cross Validation (CV)
    avg. accuracy and 11% in test accuracy, which matches the best result of the
    DCASE 2016 challenge. With large feature sets, deep neural network models
    outperform traditional
    methods and achieve the best performance among all the studied methods.
    Consistent with other work, the best performing single model is the
    non-temporal DNN model, which we take as evidence that sounds in the DCASE
    challenge do not exhibit strong temporal dynamics.
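
    The abstract does not say how the late fusion is performed. As one plausible
    illustration, the Python sketch below averages the class-probability vectors
    produced by several models and picks the class with the highest fused
    probability. The function name `late_fusion`, the optional per-model weights,
    and the toy probabilities are assumptions.

```python
def late_fusion(prob_vectors, weights=None):
    """Weighted average of per-model class probabilities.

    Returns (fused probabilities, index of the predicted class).
    Averaging posteriors is an assumed fusion rule, not the paper's stated one.
    """
    n_models = len(prob_vectors)
    if weights is None:
        weights = [1.0] * n_models  # treat all models equally by default
    total = sum(weights)
    n_classes = len(prob_vectors[0])
    fused = [
        sum(w * probs[c] for w, probs in zip(weights, prob_vectors)) / total
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=fused.__getitem__)
    return fused, predicted


# Hypothetical outputs of three models on a two-class toy problem.
model_outputs = [[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
fused, predicted = late_fusion(model_outputs)
```

    Here the first model alone would pick class 0, but the fused prediction
    follows the other two models and picks class 1.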


    Information Theory

    Optimal DoF region of the K-User MISO BC with Partial CSIT

    Enrico Piovano, Bruno Clerckx
    Subjects: Information Theory (cs.IT)

    We consider the \(K\)-User Multiple-Input-Single-Output (MISO) Broadcast
    Channel (BC) where the transmitter, equipped with \(M\) antennas, serves \(K\)
    users, with \(K \leq M\). The transmitter has access to partial channel state
    information of the users. This is modelled by letting the variance of the
    Channel State Information at the Transmitter (CSIT) error of user \(i\) scale as
    \(O(P^{-\alpha_i})\) for the Signal-to-Noise Ratio (SNR) \(P\) and some constant
    \(\alpha_i \geq 0\). In this work we derive the optimal Degrees-of-Freedom (DoF)
    region in this setting.

    Performance analysis of RF-FSO multi-hop networks

    Behrooz Makki, Tommy Svensson, Maite Brandt-Pearce, Mohamed-Slim Alouini
    Comments: Presented at IEEE WCNC 2017
    Subjects: Information Theory (cs.IT)

    We study the performance of multi-hop networks composed of millimeter wave
    (MMW)-based radio frequency (RF) and free-space optical (FSO) links. The
    results are obtained in the cases with and without hybrid automatic repeat
    request (HARQ). Taking the MMW characteristics of the RF links into account, we
    derive closed-form expressions for the network outage probability. We also
    evaluate the effect of various parameters such as power amplifier efficiency,
    number of antennas as well as different coherence times of the RF and the FSO
    links on the system performance. Finally, we present mappings between the
    performance of RF-FSO multi-hop networks and the ones using only the RF- or the
    FSO-based communication, in the sense that with appropriate parameter settings
    the same outage probability is achieved in these setups. The results show the
    efficiency of the RF-FSO setups in different conditions. Moreover, HARQ can
    effectively improve the outage probability/energy efficiency, and compensate
    for the effect of hardware impairments in RF-FSO networks. For common parameter
    settings of the RF-FSO dual-hop networks, outage probability \(10^{-4}\) and code
    rate 3 nats-per-channel-use, the implementation of HARQ with a maximum of 2 and
    3 retransmissions reduces the required power, compared to the cases with no
    HARQ, by 13 and 17 dB, respectively.

    Frequency Offset Estimation for OFDM Systems with a Novel Frequency Domain Training Sequence

    Yanxiang Jiang, Xiqi Gao, Xiaohu You
    Comments: 11 pages, 9 figures, IEICE Trans. Commun., 2006
    Journal-ref: IEICE Trans. Commun., Apr. 2006
    Subjects: Information Theory (cs.IT)

    A novel frequency domain training sequence and the corresponding carrier
    frequency offset (CFO) estimator are proposed for orthogonal frequency division
    multiplexing (OFDM) systems over frequency-selective fading channels. The
    proposed frequency domain training sequence comprises two types of pilot tones,
    namely distinctively spaced pilot tones with high energies and uniformly spaced
    ones with low energies. Based on the distinctively spaced pilot tones, integer
    CFO estimation is accomplished. After the subcarriers occupied by the
    distinctively spaced pilot tones and their adjacent subcarriers are nulled for
    the sake of interference cancellation, fractional CFO estimation is executed
    according to the uniformly spaced pilot tones. By exploiting a predefined
    lookup table that takes advantage of the structure of the distinctively spaced
    pilot tones, the computational complexity of the proposed CFO estimator can be
    decreased considerably. With the aid of the uniformly spaced pilot tones
    generated from a Chu sequence with the cyclic orthogonality property, the
    ability of the proposed estimator to combat multipath effects is enhanced to a
    great extent. Simulation
    results illustrate the good performance of the proposed CFO estimator.
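
    The cyclic orthogonality of a Chu sequence can be checked numerically.
    The sketch below uses the textbook Zadoff-Chu definition (not necessarily
    the exact pilot construction in the paper) and verifies that the periodic
    autocorrelation vanishes at every non-zero lag. The helper names
    chu_sequence and periodic_autocorrelation, and the length 16, are chosen
    here for illustration only.

```python
import numpy as np
from math import gcd

def chu_sequence(N, r=1):
    """Length-N Chu sequence with root r (r must be coprime to N)."""
    if gcd(r, N) != 1:
        raise ValueError("root r must be coprime to the length N")
    n = np.arange(N)
    if N % 2 == 0:
        phase = np.pi * r * n * n / N        # even-length definition
    else:
        phase = np.pi * r * n * (n + 1) / N  # odd-length definition
    return np.exp(1j * phase)

def periodic_autocorrelation(x):
    """Correlation of x with each of its cyclic shifts (lags 0..N-1)."""
    return np.array([np.vdot(x, np.roll(x, k)) for k in range(len(x))])

seq = chu_sequence(16)
acorr = periodic_autocorrelation(seq)
# Lag 0 carries all the energy; every other lag is zero up to rounding error
print(abs(acorr[0]), np.max(np.abs(acorr[1:])))
```

    A delayed multipath copy of the pilot acts as a cyclic shift, so zero
    correlation at non-zero lags is what lets an estimator separate the paths.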

    Full-duplex Amplify-and-Forward Relaying: Power and Location Optimization

    Shuai Li, Kun Yang, Mingxin Zhou, Jianjun Wu, Lingyang Song, Yonghui Li, Hongbin Li
    Comments: tvt,journal
    Subjects: Information Theory (cs.IT)

    In this paper, we consider a full-duplex (FD) amplify-and-forward (AF) relay
    system and optimize its power allocation and relay location to minimize the
    system symbol error rate (SER). We first derive the asymptotic expressions of
    the outage probability and SER performance by taking into account the residual
    self interference (RSI) in FD systems. We then formulate the optimization
    problem based on the minimal SER criterion. Analytical and numerical results
    show that optimized relay location and power allocation can greatly improve
    system SER performance, and the performance floor caused by the RSI can be
    significantly reduced via optimizing relay location or power allocation.

    Simplified Frequency Offset Estimation for MIMO OFDM Systems

    Yanxiang Jiang, Hlaing Minn, Xiaohu You, Xiqi Gao
    Comments: 5 pages, 3 figures, IEEE TVT, 2008
    Journal-ref: IEEE TVT, Sept. 2008
    Subjects: Information Theory (cs.IT)

    This paper addresses a simplified frequency offset estimator for
    multiple-input multiple-output (MIMO) orthogonal frequency division
    multiplexing (OFDM) systems over frequency selective fading channels. By
    exploiting the good correlation property of the training sequences, which are
    constructed from the Chu sequence, carrier frequency offset (CFO) estimation is
    obtained through factor decomposition for the derivative of the cost function
    with great complexity reduction. The mean-squared error (MSE) of the CFO
    estimation is derived to optimize the key parameter of the simplified estimator
    and also to evaluate the estimator performance. Simulation results confirm the
    good performance of the training-assisted CFO estimator.

    SNR Degradation due to Carrier Frequency Offset in OFDM based Amplify-and-Forward Relay Systems

    Yanxiang Jiang, Yanxing Hu, Xiaohu You
    Comments: 4 pages, 4 figures, IEICE Trans. Commun
    Journal-ref: IEICE Trans. Commun., Jan. 2012
    Subjects: Information Theory (cs.IT)

    In this letter, signal-to-noise ratio (SNR) performance is analyzed for
    orthogonal frequency division multiplexing (OFDM) based amplify-and-forward
    (AF) relay systems in the presence of carrier frequency offset (CFO) for fading
    channels. The SNR expression is derived for the one-relay-node scenario, and is
    further extended to the multiple-relay-node scenario. Analytical results show that
    the SNR is quite sensitive to CFO and the sensitivity of the SNR to CFO is
    mainly determined by the power of the corresponding link channel and gain
    factor.

    Energy Efficient Power Allocation in Massive MIMO Systems based on Standard Interference Function

    Jiadian Zhang, Yanxiang Jiang, Peng Li, Fuchun Zheng, Xiaohu You
    Comments: 6 pages, 5 figures, IEEE VTC 2016-S
    Journal-ref: IEEE VTC 2016-S, May 2016
    Subjects: Information Theory (cs.IT)

    In this paper, energy efficient power allocation for downlink massive MIMO
    systems is investigated. A constrained non-convex optimization problem is
    formulated to maximize the energy efficiency (EE), which takes into account the
    quality of service (QoS) requirements. By exploiting the properties of
    fractional programming and the lower bound of the user data rate, the
    non-convex optimization problem is transformed into a convex optimization
    problem. The Lagrangian dual function method is utilized to convert the
    constrained convex problem into an unconstrained convex one. Due to the
    multi-variable coupling problem caused by the intra-user interference, it is
    intractable to derive an explicit solution to the above optimization problem.
    Exploiting the standard interference function, we propose an implicit iterative
    algorithm to solve the unconstrained convex optimization problem and obtain the
    optimal power allocation scheme. Simulation results show that the proposed
    iterative algorithm converges in just a few iterations, and demonstrate the
    impact of the number of users and the number of antennas on the EE.

    A CMDP-based Approach for Energy Efficient Power Allocation in Massive MIMO Systems

    Peng Li, Yanxiang Jiang, Wei Li, Fuchun Zheng, Xiaohu You
    Comments: 6 pages, 5 figures, IEEE Wireless Communications and Networking Conference (WCNC’16)
    Journal-ref: IEEE Wireless Communications and Networking Conference (WCNC’16), April 2016
    Subjects: Information Theory (cs.IT)

    In this paper, energy efficient power allocation for the uplink of a
    multi-cell massive MIMO system is investigated. With the simplified power
    consumption model, the problem of power allocation is formulated as a
    constrained Markov decision process (CMDP) framework with infinite-horizon
    expected discounted total reward, which takes into account different quality of
    service (QoS) requirements for each user terminal (UT). We propose an offline
    solution containing the value iteration and Q-learning algorithms, which can
    obtain the global optimum power allocation policy. Simulation results show that
    our proposed policy performs very close to the ergodic optimal policy.
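
    For readers unfamiliar with value iteration, here is a minimal sketch on
    a toy discounted MDP. It is not the paper's CMDP: the QoS constraints are
    left out, and the two states, two actions (think "low" and "high" transmit
    power), transition probabilities and rewards are all made-up numbers for
    illustration.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Value iteration for an unconstrained discounted MDP.

    P: transition probabilities, shape (n_actions, n_states, n_states)
    R: expected immediate rewards, shape (n_actions, n_states)
    Returns the optimal state values and a greedy policy (action per state).
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        Q = R + gamma * (P @ V)      # action values, shape (n_actions, n_states)
        V_new = Q.max(axis=0)        # Bellman optimality backup
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + gamma * (P @ V)
    return V, Q.argmax(axis=0)

# Hypothetical toy model: 2 channel states, 2 power levels
P = np.array([
    [[0.9, 0.1], [0.4, 0.6]],   # action 0
    [[0.2, 0.8], [0.1, 0.9]],   # action 1
])
R = np.array([
    [1.0, 0.0],                 # reward of action 0 in each state
    [0.5, 2.0],                 # reward of action 1 in each state
])
V, policy = value_iteration(P, R)
print(V, policy)
```

    Value iteration needs the transition model P; Q-learning, also named in
    the abstract, estimates the same action values from sampled transitions
    when P is unknown.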

    Energy Efficient Joint Resource Allocation and Power Control for D2D Communications

    Yanxiang Jiang, Qiang Liu, Fuchun Zheng, Xiqi Gao, Xiaohu You
    Comments: 9 pages, 5 figures, this paper has been published by IEEE Transactions on Vehicular Technology
    Journal-ref: IEEE Transactions on Vehicular Technology, August 2016
    Subjects: Information Theory (cs.IT)

    In this paper, joint resource allocation and power control for energy
    efficient device-to-device (D2D) communications underlaying cellular networks
    are investigated. The resource and power are optimized for maximization of the
    energy efficiency (EE) of D2D communications. Exploiting the properties of
    fractional programming, we transform the original nonconvex optimization
    problem in fractional form into an equivalent optimization problem in
    subtractive form. Then, an efficient iterative resource allocation and power
    control scheme is proposed. In each iteration, part of the constraints of the
    EE optimization problem is removed by exploiting the penalty function approach.
    We further propose a novel two-layer approach which allows the optimum to be found
    at each iteration by decoupling the EE optimization problem of joint resource
    allocation and power control into two separate steps. In the first layer, the
    optimal power values are obtained by solving a series of maximization problems
    through root-finding with or without considering the loss of cellular users’
    rates. In the second layer, the formulated optimization problem belongs to a
    classical resource allocation problem with single allocation format which
    admits a network flow formulation so that it can be solved to optimality.
    Simulation results demonstrate the remarkable improvements in terms of EE by
    using the proposed iterative resource allocation and power control scheme.
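
    The fractional-to-subtractive transformation described above is commonly
    carried out with Dinkelbach's method. The sketch below applies it to a
    deliberately simplified single-link problem, maximising
    log2(1 + gain * p) / (p + p_circuit) over 0 <= p <= p_max. This is an
    assumption made for illustration, not the paper's joint resource
    allocation problem; dinkelbach_ee and its parameters are hypothetical.

```python
import math

def dinkelbach_ee(gain, p_circuit, p_max, tol=1e-9, max_iter=100):
    """Maximise rate(p) / (p + p_circuit) with Dinkelbach's iteration.

    Requires gain > 0 and p_circuit > 0. Returns (p, q): the transmit power
    and the energy efficiency achieved at that power.
    """
    q = 0.0
    p = p_max
    for _ in range(max_iter):
        # Subtractive-form inner problem: max_p log2(1 + gain*p) - q*(p + p_circuit).
        # It is concave in p, so the stationary point clipped to [0, p_max] is optimal.
        if q > 0:
            p = 1.0 / (q * math.log(2)) - 1.0 / gain
            p = min(max(p, 0.0), p_max)
        else:
            p = p_max  # with q = 0 the objective is just the rate, which increases in p
        rate = math.log2(1.0 + gain * p)
        residual = rate - q * (p + p_circuit)
        q = rate / (p + p_circuit)   # update the efficiency estimate
        if abs(residual) < tol:      # residual of zero marks the optimal ratio
            break
    return p, q

p_opt, ee_opt = dinkelbach_ee(gain=10.0, p_circuit=1.0, p_max=10.0)
print(p_opt, ee_opt)
```

    Each outer iteration solves an easier non-fractional problem, and the
    residual reaches zero exactly when q equals the optimal ratio.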

    Pattern Division Multiple Access with Large-scale Antenna Array

    Peng Li, Yanxiang Jiang, Shaoli Kang, Fuchun Zheng, Xiaohu You
    Comments: 6 pages, 5 figures, this paper has been accepted by IEEE VTC 2017-Spring
    Subjects: Information Theory (cs.IT)

    In this paper, pattern division multiple access with large-scale antenna
    array (LSA-PDMA) is proposed as a novel non-orthogonal multiple access (NOMA)
    scheme. In the proposed scheme, pattern is designed in both beam domain and
    power domain in a joint manner. At the transmitter, pattern mapping utilizes
    power allocation to improve the system sum rate and beam allocation to enhance
    the access connectivity and realize the integration of LSA into multiple access
    spontaneously. At the receiver, hybrid detection of spatial filter (SF) and
    successive interference cancellation (SIC) is employed to separate the
    superposed multiple-domain signals. Furthermore, we formulate the sum rate
    maximization problem to obtain the optimal pattern mapping policy, and the
    optimization problem is proved to be convex through proper mathematical
    manipulations. Simulation results show that the proposed LSA-PDMA scheme
    achieves significant performance gain on system sum rate compared to both the
    orthogonal multiple access scheme and the power-domain NOMA scheme.

    Sparse Channel Estimation for Massive MIMO System Based on Dirichlet Process and Combined Message Passing

    Zhengdao Yuan
    Comments: arXiv admin note: text overlap with arXiv:1409.4671 by other authors
    Subjects: Information Theory (cs.IT)

    This paper investigates the problem of estimating sparse channels in massive
    MIMO systems. Most wireless channels are sparse with large delay spread, while
    some channels are observed to share a common support within a certain area of
    the antenna array. This common support property is attractive when it comes to
    the estimation of a large number of channels in massive MIMO systems. In this
    paper, we propose a novel channel estimation approach which utilizes the common
    support by placing a Dirichlet process (DP) prior over the sparse Bayesian
    learning (SBL) model. In addition, this Dirichlet process is modeled based on a
    factor graph and combined BP-MF message passing. Compared to the variational
    Bayesian (VB) method in the literature, the proposed method can improve the
    performance while significantly reducing the complexity. Simulation results
    demonstrate that the proposed algorithm outperforms other reported ones in both
    performance and complexity.

    A Unified Performance Analysis of the Effective Capacity of Dispersed Spectrum Cognitive Radio Systems over Generalized Fading Channels

    K. Denia Kanellopoulou, Kostas P. Peppas, P. Takis Mathiopoulos
    Subjects: Information Theory (cs.IT)

    The effective capacity (EC) has been recently established as a rigorous
    alternative to the classical Shannon’s ergodic capacity since it accounts for
    the delay constraints imposed by future wireless applications and their impact
    on the overall system performance. This paper develops a novel unified approach
    for the EC analysis of dispersed spectrum cognitive radio (CR) with equal gain
    combining (EGC) and maximal ratio combining (MRC) diversity receivers over
    generalized fading channels under a maximum delay constraint. The mathematical
    formalism is validated with selected numerical and equivalent simulation
    performance evaluation results, thus confirming the correctness of the proposed
    unified approach.



