
    arXiv Paper Daily: Tue, 21 Mar 2017

    Posted by 我爱机器学习 (52ml.net) on 2017-03-21 00:00:00

    Neural and Evolutionary Computing

    Empirical Analysis of the Necessary and Sufficient Conditions of the Echo State Property

    Sebastián Basterrech
    Comments: 23 pages, 14 figures, accepted paper for the IEEE IJCNN, 2017
    Subjects: Neural and Evolutionary Computing (cs.NE)

    The Echo State Network (ESN) is a specific recurrent network which has
    gained popularity in recent years. The model contains a recurrent network,
    named the reservoir, that is fixed during the learning process. The
    reservoir is used to transform the input space into a larger space. A
    fundamental property with an impact on model accuracy is the Echo State
    Property (ESP). There are two main theoretical results related to the ESP.
    First, a sufficient condition for the existence of the ESP, which involves
    the singular values of the reservoir matrix. Second, a necessary condition
    for the ESP: the ESP is violated when the spectral radius of the reservoir
    matrix exceeds a critical value. There is a theoretical gap between these
    necessary and sufficient conditions. This article presents an empirical
    analysis of the accuracy and the projections of reservoirs that fall within
    this theoretical gap, and gives some insights into the generation of the
    reservoir matrix. From previous work, it is already known that optimal
    accuracy is obtained near the edge of stability of the dynamics. According
    to our empirical results, this border seems to be closer to the sufficient
    condition than to the necessary condition of the ESP.
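
    As a minimal illustration of the two conditions (matrix size and scaling
    below are arbitrary), the sufficient condition bounds the largest singular
    value of the reservoir matrix and the necessary condition bounds its
    spectral radius; reservoirs satisfying only the latter fall into the gap
    studied in the paper:

        import numpy as np

        rng = np.random.default_rng(0)
        W = 0.1 * rng.standard_normal((100, 100))          # reservoir weight matrix

        sigma_max = np.linalg.svd(W, compute_uv=False)[0]  # largest singular value
        rho = np.max(np.abs(np.linalg.eigvals(W)))         # spectral radius

        # Sufficient condition for the ESP: sigma_max < 1.
        # Necessary condition: rho < 1 (with rho >= 1 the ESP can be violated).
        # Reservoirs with rho < 1 <= sigma_max lie in the theoretical gap.
        print(f"sigma_max={sigma_max:.3f}  rho={rho:.3f}")
        print("sufficient:", sigma_max < 1, " necessary:", rho < 1)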

    A wake-sleep algorithm for recurrent, spiking neural networks

    Johannes Thiele, Peter Diehl, Matthew Cook
    Comments: Presented at the NIPS 2016 workshop “Computing with Spikes”
    Subjects: Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)

    We investigate a recently proposed model for cortical computation which
    performs relational inference. It consists of several interconnected,
    structurally equivalent populations of leaky integrate-and-fire (LIF) neurons,
    which are trained in a self-organized fashion with spike-timing dependent
    plasticity (STDP). Despite its robust learning dynamics, the model is
    susceptible to a problem typical of recurrent networks that use a
    correlation-based (Hebbian) learning rule: if trained with high learning rates,
    recurrent connections can cause strong feedback loops in the network dynamics,
    which lead to the emergence of attractor states. This causes a strong reduction
    in the number of representable patterns and a decay in the inference ability of
    the network. As a solution, we introduce a conceptually very simple
    “wake-sleep” algorithm: during the wake phase, training is executed normally,
    while during the sleep phase, the network “dreams” samples from its generative
    model, which are induced by random input. This process allows us to activate
    the attractor states in the network, which can then be unlearned effectively by
    an anti-Hebbian mechanism. The algorithm allows us to increase learning rates
    up to a factor of ten while avoiding clustering, which allows the network to
    learn several times faster. Even for low learning rates, where clustering is
    not an issue, it improves convergence speed and reduces the final inference
    error.
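
    A toy, rate-based sketch of the wake-sleep alternation described above
    (the paper's spiking LIF/STDP dynamics are replaced here by a plain
    Hebbian/anti-Hebbian rule; all sizes and rates are illustrative):

        import numpy as np

        rng = np.random.default_rng(1)
        n = 50
        W = rng.random((n, n)) * 0.01              # recurrent weights
        eta = 0.1                                  # learning rate

        def step(x, W):
            return np.tanh(W @ x)                  # toy rate-based dynamics

        for epoch in range(100):
            # --- wake phase: Hebbian learning on real input ---
            x = rng.random(n)                      # stand-in for a training pattern
            y = step(x, W)
            W += eta * np.outer(y, x)              # strengthen co-active pairs
            # --- sleep phase: the network "dreams" from random input ---
            x_dream = 0.1 * rng.random(n)
            y_dream = step(x_dream, W)             # activates attractor states
            W -= eta * np.outer(y_dream, x_dream)  # anti-Hebbian unlearning
            np.clip(W, 0.0, 1.0, out=W)            # keep weights bounded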

    An Adaptive Framework to Tune the Coordinate Systems in Evolutionary Algorithms

    Zhi-Zhong Liu, Yong Wang, Shengxiang Yang, Ke Tang
    Comments: This paper provides a new point of view toward how to describe an evolutionary operator in the original coordinate system, and also offers a convenient transformation from an evolutionary operator in the original coordinate system to the corresponding evolutionary operator in the Eigen coordinate system
    Subjects: Neural and Evolutionary Computing (cs.NE)

    In the evolutionary computation research community, the performance of most
    evolutionary algorithms (EAs) depends strongly on their implemented coordinate
    system. However, the commonly used coordinate system is fixed and not well
    suited to different function landscapes; as a result, EAs might not search
    efficiently. To overcome this shortcoming, in this paper we propose a
    framework, named ACoS, to adaptively tune the coordinate systems in EAs. In
    ACoS, an Eigen coordinate system is established by making use of the cumulative
    population distribution information, which can be obtained based on a
    covariance matrix adaptation strategy and an additional archiving mechanism.
    Since the population distribution information can reflect the features of the
    function landscape to some extent, EAs in the Eigen coordinate system have the
    capability to identify the modality of the function landscape. In addition, the
    Eigen coordinate system is coupled with the original coordinate system, and
    they are selected according to a probability vector. The probability vector
    aims to determine the selection ratio of each coordinate system for each
    individual, and is adaptively updated based on the collected information from
    the offspring. ACoS has been applied to two of the most popular EA paradigms,
    i.e., particle swarm optimization (PSO) and differential evolution (DE), for
    solving 30 test functions with 30 and 50 dimensions at the 2014 IEEE Congress
    on Evolutionary Computation. The experimental studies demonstrate its
    effectiveness.
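
    A minimal sketch of the coordinate rotation at the heart of ACoS: the
    covariance below is computed from the current population only, whereas the
    paper uses a cumulative covariance matrix adaptation strategy with an
    archive and an adaptive probability vector, all omitted here:

        import numpy as np

        def apply_in_eigen_system(population, operator, rng):
            """Run an evolutionary operator in the Eigen coordinate system:
            rotate the population by the eigenvectors of its covariance
            matrix, apply the unchanged operator, rotate offspring back."""
            C = np.cov(population, rowvar=False)   # population distribution info
            _, B = np.linalg.eigh(C)               # columns of B span the Eigen system
            return operator(population @ B, rng) @ B.T

        def de_rand_1(pop, rng, F=0.5):
            """DE/rand/1 mutation, as a stand-in operator."""
            out = np.empty_like(pop)
            for i in range(len(pop)):
                r1, r2, r3 = rng.choice(len(pop), size=3, replace=False)
                out[i] = pop[r1] + F * (pop[r2] - pop[r3])
            return out

        rng = np.random.default_rng(0)
        pop = rng.standard_normal((20, 10))        # 20 individuals, 10 dimensions
        offspring = apply_in_eigen_system(pop, de_rand_1, rng)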

    Curriculum Dropout

    Pietro Morerio, Jacopo Cavazza, Riccardo Volpi, Rene Vidal, Vittorio Murino
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)

    Dropout is a very effective way of regularizing neural networks.
    Stochastically “dropping out” units with a certain probability discourages
    over-specific co-adaptations of feature detectors, preventing overfitting and
    improving network generalization. Moreover, Dropout can be interpreted as an
    approximate model aggregation technique, where an exponential number of smaller
    networks are averaged in order to get a more powerful ensemble. In this paper,
    we show that using a fixed dropout probability during training is a suboptimal
    choice. We thus propose a time scheduling for the probability of retaining
    neurons in the network. This induces an adaptive regularization scheme that
    smoothly increases the difficulty of the optimization problem. This idea of
    “starting easy” and adaptively increasing the difficulty of the learning
    problem has its roots in curriculum learning and allows one to train better
    models. Indeed, we prove that our optimization strategy implements a very
    general curriculum scheme, by gradually adding noise to both the input and
    intermediate feature representations within the network architecture.
    Experiments on seven image classification datasets and different network
    architectures show that our method, named Curriculum Dropout, frequently
    yields better generalization and, at worst, performs just as well as the
    standard Dropout method.
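
    One plausible retain-probability schedule consistent with the abstract is
    to start with no dropout and decay smoothly toward the usual dropout
    strength; the exponential form and parameter names below are assumptions:

        import numpy as np

        def retain_probability(t, p_final=0.5, gamma=1e-3):
            """Time-scheduled probability of *retaining* a unit: starts at 1
            (no dropout, easy problem) and decays smoothly toward p_final,
            gradually adding noise as training proceeds."""
            return (1.0 - p_final) * np.exp(-gamma * t) + p_final

        # Usage inside a training loop (mask applied to activations h):
        rng = np.random.default_rng(3)
        h = rng.standard_normal(256)
        for t in range(0, 3000, 1000):
            p = retain_probability(t)
            mask = rng.random(h.shape) < p
            h_dropped = h * mask / p               # inverted dropout scaling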

    Boosting Dilated Convolutional Networks with Mixed Tensor Decompositions

    Nadav Cohen, Ronen Tamari, Amnon Shashua
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Expressive efficiency is a concept that allows formally reasoning about the
    representational capacity of deep network architectures. A network architecture
    is expressively efficient with respect to an alternative architecture if the
    latter must grow super-linearly in order to represent functions realized by the
    former. A well-known example is the exponential expressive efficiency of depth,
    namely, that in many cases shallow networks must grow exponentially large in
    order to represent functions realized by deep networks. In this paper we study
    the expressive efficiency brought forth by the architectural feature of
    connectivity, motivated by the observation that nearly all state-of-the-art
    networks these days employ elaborate connection schemes, running layers in
    parallel while splitting and merging them in various ways. A formal treatment
    of this question would shed light on the effectiveness of modern connectivity
    schemes, and in addition, could provide new tools for network design. We focus
    on dilated convolutional networks, a family of deep models gaining increased
    attention, underlying state-of-the-art architectures like Google’s WaveNet and
    ByteNet. By introducing and studying the concept of mixed tensor
    decompositions, we prove that interconnecting dilated convolutional networks
    can lead to expressive efficiency. In particular, we show that a single
    connection between intermediate layers can already lead to an almost quadratic
    gap, which in large-scale settings typically makes the difference between a
    model that is practical and one that is not.

    QMDP-Net: Deep Learning for Planning under Partial Observability

    Peter Karkus, David Hsu, Wee Sun Lee
    Comments: 9 pages, 5 figures, 1 table
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    This paper introduces QMDP-net, a neural network architecture for planning
    under partial observability. The QMDP-net combines the strengths of model-free
    learning and model-based planning. It is a recurrent policy network, but it
    represents a policy by connecting a model with a planning algorithm that solves
    the model, thus embedding the solution structure of planning in the network
    architecture. The QMDP-net is fully differentiable and allows end-to-end
    training. We train a QMDP-net over a set of different environments so that it
    can generalize over new ones. In preliminary experiments, QMDP-net showed
    strong performance on several robotic tasks in simulation. Interestingly, it
    also sometimes outperformed the QMDP algorithm, which generated the data for
    learning, because of QMDP-net’s robustness resulting from end-to-end learning.

    Generating Multi-label Discrete Electronic Health Records using Generative Adversarial Networks

    Edward Choi, Siddharth Biswal, Bradley Malin, Jon Duke, Walter F. Stewart, Jimeng Sun
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Access to electronic health records (EHR) data has motivated computational
    advances in medical research. However, various concerns, particularly over
    privacy, can limit access to and collaborative use of EHR data. Sharing
    synthetic EHR data could mitigate risk. In this paper, we propose a new
    approach, medical Generative Adversarial Network (medGAN), to generate
    realistic synthetic EHRs. Based on an input EHR dataset, medGAN can generate
    high-dimensional discrete variables (e.g., binary and count features) via a
    combination of an autoencoder and generative adversarial networks. We also
    propose minibatch averaging to efficiently avoid mode collapse, and increase
    the learning efficiency with batch normalization and shortcut connections. To
    demonstrate feasibility, we show that medGAN generates synthetic EHR datasets
    that achieve comparable performance to real data on many experiments including
    distribution statistics, predictive modeling tasks and medical expert review.
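
    A sketch of minibatch averaging as we read it from the abstract: each
    sample shown to the discriminator is concatenated with the minibatch mean,
    so a mode-collapsed batch becomes easy to detect (names are illustrative):

        import numpy as np

        def with_minibatch_average(batch):
            """Append the minibatch mean to every sample.  Seeing the batch
            statistic lets the discriminator penalize a generator whose
            samples all collapse to one mode, since a collapsed batch has a
            telltale mean."""
            mean = batch.mean(axis=0, keepdims=True)                   # (1, d)
            return np.concatenate(
                [batch, np.repeat(mean, len(batch), axis=0)], axis=1)  # (n, 2d)

        fake = np.random.default_rng(4).random((32, 128))  # generator output batch
        disc_input = with_minibatch_average(fake)          # shape (32, 256)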

    An Automated Auto-encoder Correlation-based Health-Monitoring and Prognostic Method for Machine Bearings

    Ramin M. Hasani, Guodong Wang, Radu Grosu
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    This paper presents an intelligent technique for health monitoring and
    prognostics of common rotary machine components, particularly bearings.
    During a run-to-failure experiment, rich unsupervised features are extracted
    from vibration sensory data by a trained sparse auto-encoder. Then, the
    correlation of the extracted attributes of the initial samples (presumably
    healthy at the beginning of the test) with the succeeding samples is
    calculated and passed through a moving-average filter. The normalized output,
    named the auto-encoder correlation-based (AEC) rate, is an informative
    attribute of the system that depicts its health status and precisely
    identifies the degradation starting point. We show that the AEC technique
    generalizes well across several run-to-failure tests. AEC collects rich
    unsupervised features from the vibration data fully autonomously. We
    demonstrate the superiority of AEC over many other state-of-the-art
    approaches for the health monitoring and prognostics of machine bearings.
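
    A numpy sketch of the AEC rate under this reading of the abstract; the
    window size, baseline length and normalization are illustrative choices:

        import numpy as np

        def aec_rate(features, n_healthy=10, window=5):
            """features: (T, d) auto-encoder features, one row per vibration
            sample over a run-to-failure test.  The first n_healthy samples
            are presumed healthy; each sample is scored by its Pearson
            correlation with the healthy baseline, then smoothed with a
            moving-average filter."""
            baseline = features[:n_healthy].mean(axis=0)
            corr = np.array([np.corrcoef(baseline, f)[0, 1] for f in features])
            kernel = np.ones(window) / window
            smoothed = np.convolve(corr, kernel, mode="same")  # moving average
            return smoothed / smoothed[:n_healthy].mean()      # ~1 when healthy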

    SIM-CE: An Advanced Simulink Platform for Studying the Brain of Caenorhabditis elegans

    Ramin M. Hasani, Victoria Beneder, Magdalena Fuchs, David Lung, Radu Grosu
    Subjects: Neurons and Cognition (q-bio.NC); Neural and Evolutionary Computing (cs.NE); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)

    We introduce SIM-CE, an advanced, user-friendly modeling and simulation
    environment in Simulink for performing multi-scale behavioral analysis of the
    nervous system of Caenorhabditis elegans (C. elegans). SIM-CE contains an
    implementation of the mathematical models of C. elegans’s neurons and synapses,
    in Simulink, which can be easily extended and particularized by the user. The
    Simulink model is able to capture both complex dynamics of ion channels and
    additional biophysical detail such as intracellular calcium concentration. We
    demonstrate the performance of SIM-CE by carrying out neuronal, synaptic and
    neural-circuit-level behavioral simulations. Such an environment enables the user
    to capture unknown properties of the neural circuits, test hypotheses and
    determine the origin of many behavioral plasticities exhibited by the worm.

    Non-Associative Learning Representation in the Nervous System of the Nematode Caenorhabditis elegans

    Ramin M. Hasani, Magdalena Fuchs, Victoria Beneder, Radu Grosu
    Subjects: Neurons and Cognition (q-bio.NC); Neural and Evolutionary Computing (cs.NE); Quantitative Methods (q-bio.QM)

    Caenorhabditis elegans (C. elegans) exhibits remarkable behavioral
    plasticity, including complex non-associative and associative learning
    representations. Understanding the principles of such mechanisms presumably
    leads to constructive inspiration for the design of efficient learning
    algorithms. In the present study, we propose a novel approach to modeling
    single neurons and synapses in order to study the mechanisms underlying
    learning in the C. elegans nervous system. In this regard, we construct a
    precise mathematical model of sensory neurons where we include multi-scale
    details from genes, ion channels and ion pumps, together with a dynamic model
    of synapses comprising neurotransmitter and receptor kinetics. We
    recapitulate the mechanosensory habituation mechanism, a non-associative
    learning process, in which elements of the neural network tune their
    parameters as a result of repeated input stimuli.
    Accordingly, we quantitatively demonstrate the roots of such plasticity in the
    neuronal and synaptic-level representations. Our findings can potentially give
    rise to the development of new bio-inspired learning algorithms.

    Deciding How to Decide: Dynamic Routing in Artificial Neural Networks

    Mason McGill, Pietro Perona
    Comments: Submitted to ICML 2017
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We propose and systematically evaluate three strategies for training
    dynamically-routed artificial neural networks: graphs of learned
    transformations through which different input signals may take different paths.
    Though some approaches have advantages over others, the resulting networks are
    often qualitatively similar. We find that, in dynamically-routed networks
    trained to classify images, layers and branches become specialized to process
    distinct categories of images. Additionally, given a fixed computational
    budget, dynamically-routed networks tend to perform better than comparable
    statically-routed networks.


    Computer Vision and Pattern Recognition

    Mask R-CNN

    Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick
    Comments: Technical report
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a conceptually simple, flexible, and general framework for object
    instance segmentation. Our approach efficiently detects objects in an image
    while simultaneously generating a high-quality segmentation mask for each
    instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a
    branch for predicting an object mask in parallel with the existing branch for
    bounding box recognition. Mask R-CNN is simple to train and adds only a small
    overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to
    generalize to other tasks, e.g., allowing us to estimate human poses in the
    same framework. We show top results in all three tracks of the COCO suite of
    challenges, including instance segmentation, bounding-box object detection, and
    person keypoint detection. Without tricks, Mask R-CNN outperforms all existing,
    single-model entries on every task, including the COCO 2016 challenge winners.
    We hope our simple and effective approach will serve as a solid baseline and
    help ease future research in instance-level recognition. Code will be made
    available.

    Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization

    Xun Huang, Serge Belongie
    Comments: An extended abstract of this paper was accepted to ICLR workshop. Code is available: this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Gatys et al. recently introduced a neural algorithm that renders a content
    image in the style of another image, achieving so-called style transfer.
    However, their framework requires a slow iterative optimization process, which
    limits its practical application. Fast approximations with feed-forward neural
    networks have been proposed to speed up neural style transfer. Unfortunately,
    the speed improvement comes at a cost: the network is usually tied to a fixed
    set of styles and cannot adapt to arbitrary new styles. In this paper, we
    present a simple yet effective approach that for the first time enables
    arbitrary style transfer in real-time. At the heart of our method is a novel
    adaptive instance normalization (AdaIN) layer that aligns the mean and variance
    of the content features with those of the style features. Our method achieves
    speed comparable to the fastest existing approach, without the restriction to a
    pre-defined set of styles. In addition, our approach allows flexible user
    controls such as content-style trade-off, style interpolation, color & spatial
    controls, all using a single feed-forward neural network.
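
    The AdaIN layer itself is a short statistic alignment; a sketch for a
    single feature map of shape (C, H, W):

        import numpy as np

        def adain(content, style, eps=1e-5):
            """Adaptive instance normalization: normalize each content channel
            to zero mean / unit variance over its spatial dimensions, then
            rescale with the per-channel mean and std of the style features."""
            c_mean = content.mean(axis=(1, 2), keepdims=True)
            c_std = content.std(axis=(1, 2), keepdims=True)
            s_mean = style.mean(axis=(1, 2), keepdims=True)
            s_std = style.std(axis=(1, 2), keepdims=True)
            return s_std * (content - c_mean) / (c_std + eps) + s_mean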

    Deep Neural Networks Do Not Recognize Negative Images

    Hossein Hosseini, Radha Poovendran
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    Deep Neural Networks (DNNs) have achieved remarkable performance on a variety
    of pattern-recognition tasks, particularly visual classification problems,
    where new algorithms are reported to achieve or even surpass human performance.
    In this paper, we test the state-of-the-art DNNs with negative images and show
    that the accuracy drops to the level of random classification. This leads us to
    the conjecture that the DNNs, which are merely trained on raw data, do not
    recognize the semantics of the objects, but rather memorize the inputs. We
    suggest that negative images can be thought of as “semantic adversarial examples”,
    which we define as transformed inputs that semantically represent the same
    objects, but the model does not classify them correctly.
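
    The test reduces to one image transformation; a sketch in which
    model.predict stands in for whatever classifier API is actually used:

        import numpy as np

        def to_negative(images):
            """Negate 8-bit images: every pixel v becomes 255 - v.  A human
            still reads the same semantics; pixel statistics change completely."""
            return 255 - images

        def accuracy(model, images, labels):
            preds = model.predict(images)     # hypothetical classifier API
            return float(np.mean(preds == labels))

        # The paper's experiment amounts to comparing these two numbers and
        # observing that the second drops to chance level:
        #   accuracy(model, x_test, y_test)
        #   accuracy(model, to_negative(x_test), y_test)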

    Second-order Convolutional Neural Networks

    Kaicheng Yu, Mathieu Salzmann
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Convolutional Neural Networks (CNNs) have been successfully applied to many
    computer vision tasks, such as image classification. By performing linear
    combinations and element-wise nonlinear operations, these networks can be
    thought of as extracting solely first-order information from an input image. In
    the past, however, second-order statistics computed from handcrafted features,
    e.g., covariances, have proven highly effective in diverse recognition tasks.
    In this paper, we introduce a novel class of CNNs that exploit second-order
    statistics. To this end, we design a series of new layers that (i) extract a
    covariance matrix from convolutional activations, (ii) compute a parametric,
    second-order transformation of a matrix, and (iii) perform a parametric
    vectorization of a matrix. These operations can be assembled to form a
    Covariance Descriptor Unit (CDU), which replaces the fully-connected layers of
    standard CNNs. Our experiments demonstrate the benefits of our new
    architecture, which outperforms first-order CNNs while relying on up to 90%
    fewer parameters.
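
    A sketch of step (i), extracting a covariance matrix from convolutional
    activations; the parametric transformation and vectorization layers (ii)
    and (iii) are not shown:

        import numpy as np

        def covariance_descriptor(activations):
            """Covariance of convolutional activations: for a (C, H, W) map,
            each of the H*W spatial locations is one C-dimensional sample."""
            C, H, W = activations.shape
            X = activations.reshape(C, H * W)       # (C, N) with N = H*W
            X = X - X.mean(axis=1, keepdims=True)   # center each channel
            return (X @ X.T) / (H * W - 1)          # (C, C) second-order stats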

    I2T2I: Learning Text to Image Synthesis with Textual Data Augmentation

    Hao Dong, Jingqing Zhang, Douglas McIlwraith, Yike Guo
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

    Translating information between text and image is a fundamental problem in
    artificial intelligence that connects natural language processing and computer
    vision. In the past few years, performance in image caption generation has seen
    significant improvement through the adoption of recurrent neural networks
    (RNN). Meanwhile, text-to-image generation has begun to generate plausible images
    using datasets of specific categories like birds and flowers. We’ve even seen
    image generation from multi-category datasets such as the Microsoft Common
    Objects in Context (MSCOCO) through the use of generative adversarial networks
    (GANs). Synthesizing objects with a complex shape, however, is still
    challenging. For example, animals and humans have many degrees of freedom,
    which means that they can take on many complex shapes. We propose a new
    training method called Image-Text-Image (I2T2I) which integrates text-to-image
    and image-to-text (image captioning) synthesis to improve the performance of
    text-to-image synthesis. We demonstrate that I2T2I can generate better
    multi-category images on MSCOCO than the state-of-the-art. We also
    demonstrate that I2T2I can achieve transfer learning by using a pre-trained
    image captioning module to generate human images on the MPII Human Pose
    dataset.

    Twitter100k: A Real-world Dataset for Weakly Supervised Cross-Media Retrieval

    Yuting Hu, Liang Zheng, Yi Yang, Yongfeng Huang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper contributes a new large-scale dataset for weakly supervised
    cross-media retrieval, named Twitter100k. Current datasets, such as Wikipedia,
    NUS Wide and Flickr30k, have two major limitations. First, these datasets are
    lacking in content diversity, i.e., only some pre-defined classes are covered.
    Second, texts in these datasets are written in well-organized language, leading
    to inconsistency with realistic applications. To overcome these drawbacks, the
    proposed Twitter100k dataset is characterized by two aspects: 1) it has 100,000
    image-text pairs randomly crawled from Twitter and thus has no constraint on
    the image categories; 2) text in Twitter100k is written in informal language by
    the users.

    Since strongly supervised methods leverage the class labels that may be
    missing in practice, this paper focuses on weakly supervised learning for
    cross-media retrieval, in which only text-image pairs are exploited during
    training. We extensively benchmark the performance of four subspace learning
    methods and three variants of the Correspondence AutoEncoder, along with
    various text features on Wikipedia, Flickr30k and Twitter100k. Novel insights
    are provided. As a minor contribution, inspired by the characteristic of
    Twitter100k, we propose an OCR-based cross-media retrieval method. In
    experiments, we show that the proposed OCR-based method improves the baseline
    performance.

    Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning

    Abhishek Das, Satwik Kottur, José M. F. Moura, Stefan Lee, Dhruv Batra
    Comments: 11 pages, 4 figures, 2 tables, webpage: this http URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

    We introduce the first goal-driven training for visual question answering and
    dialog agents. Specifically, we pose a cooperative ‘image guessing’ game
    between two agents — Qbot and Abot — who communicate in natural language
    dialog so that Qbot can select an unseen image from a lineup of images. We use
    deep reinforcement learning (RL) to learn the policies of these agents
    end-to-end — from pixels to multi-agent multi-round dialog to game reward.

    We demonstrate two experimental results.

    First, as a ‘sanity check’ demonstration of pure RL (from scratch), we show
    results on a synthetic world, where the agents communicate in ungrounded
    vocabulary, i.e., symbols with no pre-specified meanings (X, Y, Z). We find
    that two bots invent their own communication protocol and start using certain
    symbols to ask/answer about certain visual attributes (shape/color/size). Thus,
    we demonstrate the emergence of grounded language and communication among
    ‘visual’ dialog agents with no human supervision at all.

    Second, we conduct large-scale real-image experiments on the VisDial dataset,
    where we pretrain with supervised dialog data and show that the RL ‘fine-tuned’
    agents significantly outperform SL agents. Interestingly, the RL Qbot learns to
    ask questions that Abot is good at, ultimately resulting in more informative
    dialog and a better team.

    Object category understanding via eye fixations on freehand sketches

    Ravi Kiran Sarvadevabhatla, Sudharshan Suresh, R. Venkatesh Babu
    Comments: Accepted for publication in Transactions on Image Processing (this http URL)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The study of eye gaze fixations on photographic images is an active research
    area. In contrast, the image subcategory of freehand sketches has not received
    as much attention for such studies. In this paper, we analyze the results of a
    free-viewing gaze fixation study conducted on 3904 freehand sketches
    distributed across 160 object categories. Our analysis shows that fixation
    sequences exhibit marked consistency within a sketch, across sketches of a
    category and even across suitably grouped sets of categories. This multi-level
    consistency is remarkable given the variability in depiction and extreme image
    content sparsity that characterizes hand-drawn object sketches. In our paper,
    we show that the multi-level consistency in the fixation data can be exploited
    to (a) predict a test sketch’s category given only its fixation sequence and
    (b) build a computational model which predicts part-labels underlying fixations
    on objects. We hope that our findings motivate the community to deem
    sketch-like representations worthy of gaze-based studies vis-a-vis photographic
    images.

    Vision-based Real-Time Aerial Object Localization and Tracking for UAV Sensing System

    Yuanwei Wu, Yao Sui, Guanghui Wang
    Comments: 8 pages, 7 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The paper focuses on the problem of vision-based obstacle detection and
    tracking for unmanned aerial vehicle navigation. A real-time object
    localization and tracking strategy from monocular image sequences is developed
    by effectively integrating the object detection and tracking into a dynamic
    Kalman model. At the detection stage, the object of interest is automatically
    detected and localized from a saliency map computed via the image background
    connectivity cue at each frame; at the tracking stage, a Kalman filter is
    employed to provide a coarse prediction of the object state, which is further
    refined via a local detector incorporating the saliency map and the temporal
    information between two consecutive frames. Compared to existing methods, the
    proposed approach does not require any manual initialization for tracking, runs
    much faster than the state-of-the-art trackers of its kind, and achieves
    competitive tracking performance on a large number of image sequences.
    Extensive experiments demonstrate the effectiveness and superior performance of
    the proposed approach.
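
    A sketch of the dynamic Kalman model described above, assuming a
    constant-velocity state; the noise covariances are placeholders and the
    saliency-based detector that produces the measurement z is not shown:

        import numpy as np

        # State (x, y, vx, vy); the detection supplies the position measurement.
        dt = 1.0
        F = np.array([[1, 0, dt, 0],    # state transition
                      [0, 1, 0, dt],
                      [0, 0, 1,  0],
                      [0, 0, 0,  1]], float)
        H = np.array([[1, 0, 0, 0],     # we only measure position
                      [0, 1, 0, 0]], float)
        Q = np.eye(4) * 1e-2            # process noise (assumed)
        R = np.eye(2) * 1.0             # measurement noise (assumed)

        def kalman_step(x, P, z):
            # predict: coarse estimate of the object state for the next frame
            x_pred = F @ x
            P_pred = F @ P @ F.T + Q
            # update: refine with the measurement z from the local detector
            S = H @ P_pred @ H.T + R
            K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
            x_new = x_pred + K @ (z - H @ x_pred)
            P_new = (np.eye(4) - K @ H) @ P_pred
            return x_new, P_new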

    Detecting Oriented Text in Natural Images by Linking Segments

    Baoguang Shi, Xiang Bai, Serge Belongie
    Comments: Accepted by CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Most state-of-the-art text detection methods are specific to horizontal text
    in Latin scripts and are not fast enough for real-time applications. We
    introduce Segment Linking (SegLink), an oriented text detection method. The
    main idea is to decompose text into two locally detectable elements, namely
    segments and links. A segment is an oriented bounding box that covers a part of
    a word or text line; a link connects two adjacent segments, indicating that
    they belong to the same word or line. Both elements are detected densely at
    multiple scales by an end-to-end trained, fully-convolutional neural network.
    Final detections are the combinations of segments that are connected by links.
    Compared with previous methods, our method improves along the dimensions of
    accuracy, speed and ease of training. It achieves an f-measure of 75.0% on the
    standard ICDAR 2015 Incidental (Challenge 4) benchmark, outperforming the
    previous best by a large margin. It runs at over 20 FPS on 512×512 input
    images. In addition, our method is able to detect non-Latin text in long lines.

    VQABQ: Visual Question Answering by Basic Questions

    Jia-Hong Huang, Modar Alfadly, Bernard Ghanem
    Comments: Submitted ICCV 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

    Given an image and a question about that image as input, our method outputs
    a text-based answer to the question; this task is called Visual Question
    Answering (VQA). There are two main modules in our algorithm. Given a natural
    language question about an image, the first module takes the question as
    input and outputs basic questions related to the main (given) question. The
    second module takes the main question, the image and these basic questions as
    input and outputs the text-based answer to the main question. We formulate
    the basic-question generation problem as a LASSO optimization problem, and
    also propose a criterion for how to exploit these basic questions to help
    answer the main question. Our method is evaluated on the challenging VQA
    dataset and yields competitive performance compared to the state-of-the-art.
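
    A sketch of the LASSO formulation for basic-question selection, assuming
    questions are embedded as fixed-length vectors; the embeddings, sparsity
    weight and top-k criterion are illustrative, not the paper's exact setup:

        import numpy as np
        from sklearn.linear_model import Lasso

        def select_basic_questions(main_q, pool, alpha=0.01, k=3):
            """Express the main question as a sparse combination of a
            question pool (a LASSO problem), then keep the k questions with
            the largest weights.
            main_q: (d,) embedding; pool: (n, d) candidate embeddings."""
            model = Lasso(alpha=alpha, positive=True, max_iter=10000)
            model.fit(pool.T, main_q)       # main_q ≈ pool.T @ w, w sparse
            w = model.coef_
            top = np.argsort(w)[::-1][:k]
            return top, w[top]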

    Deep Neural Networks for Semantic Segmentation of Multispectral Remote Sensing Imagery

    Ronald Kemker, Christopher Kanan
    Comments: 9 pages, submitted to ICCV 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    A semantic segmentation algorithm must assign a label to every pixel in an
    image. Recently, semantic segmentation of RGB imagery has advanced
    significantly due to deep learning. Because creating datasets for semantic
    segmentation is laborious, these datasets tend to be significantly smaller than
    object recognition datasets. This makes it difficult to directly train a deep
    neural network for semantic segmentation, because it will be prone to
    overfitting. To cope with this, deep learning models typically use
    convolutional neural networks pre-trained on large-scale image classification
    datasets, which are then fine-tuned for semantic segmentation. For non-RGB
    imagery, this is currently not possible because large-scale labeled non-RGB
    datasets do not exist. In this paper, we develop two deep neural networks for
    semantic segmentation of multispectral remote sensing imagery. Prior to
    training on the target dataset, we initialize the networks with large amounts
    of synthetic multispectral imagery. We show that this significantly improves
    results on real-world remote sensing imagery, and we establish a new
    state-of-the-art result on the challenging Hamlin Beach State Park Dataset.

    A Fully-Automated Pipeline for Detection and Segmentation of Liver Lesions and Pathological Lymph Nodes

    Assaf Hoogi, John W. Lambert, Yefeng Zheng, Dorin Comaniciu, Daniel L. Rubin
    Comments: Workshop on Machine Learning in Healthcare, Neural Information Processing Systems (NIPS). Barcelona, Spain, 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a fully-automated method for accurate and robust detection and
    segmentation of potentially cancerous lesions found in the liver and in lymph
    nodes. The process is performed in three steps, including organ detection,
    lesion detection and lesion segmentation. Our method applies machine learning
    techniques such as marginal space learning and convolutional neural networks,
    as well as active contour models. The method proves to be robust in its
    handling of extremely high lesion diversity. We tested our method on volumetric
    computed tomography (CT) images, including 42 volumes containing liver lesions
    and 86 volumes containing 595 pathological lymph nodes. Preliminary results
    under 10-fold cross validation show that for both the liver lesions and the
    lymph nodes, a total detection sensitivity of 0.53 and an average Dice score
    of 0.71 ± 0.15 for segmentation were obtained.
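
    For reference, the Dice score reported above is computed as follows for a
    pair of binary segmentation masks:

        import numpy as np

        def dice_score(pred, truth):
            """Dice coefficient between two binary masks:
            2|A ∩ B| / (|A| + |B|), the segmentation metric reported above."""
            pred, truth = pred.astype(bool), truth.astype(bool)
            intersection = np.logical_and(pred, truth).sum()
            denom = pred.sum() + truth.sum()
            return 2.0 * intersection / denom if denom else 1.0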

    TAC-GAN – Text Conditioned Auxiliary Classifier Generative Adversarial Network

    Ayushman Dash, John Cristian Borges Gamboa, Sheraz Ahmed, Muhammad Zeshan Afzal, Marcus Liwicki
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this work, we present the Text Conditioned Auxiliary Classifier Generative
    Adversarial Network (TAC-GAN), a text-to-image Generative Adversarial Network
    (GAN) for synthesizing images from their text descriptions. Prior approaches
    have tried to condition the generative process on the textual data, but
    combining this with the use of class information, which is known to diversify
    the generated samples and improve their structural coherence, has not been
    explored. We trained the
    presented TAC-GAN model on the Oxford-102 dataset of flowers, and evaluated the
    discriminability of the generated images with Inception-Score, as well as their
    diversity using the Multi-Scale Structural Similarity Index (MS-SSIM). Our
    approach outperforms the state-of-the-art models, i.e., its inception score is
    3.45, corresponding to a relative increase of 7.8% compared to the recently
    introduced StackGAN. A comparison of the mean MS-SSIM scores of the training
    and generated samples per class shows that our approach is able to generate
    highly diverse images with an average MS-SSIM of 0.14 over all generated
    classes.

    Multilevel Context Representation for Improving Object Recognition

    Andreas Kölsch, Muhammad Zeshan Afzal, Marcus Liwicki
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this work, we propose the combined usage of low- and high-level blocks of
    convolutional neural networks (CNNs) for improving object recognition. While
    recent research has focused on either propagating the context from all layers
    (e.g., ResNet, including the very low-level layers) or having multiple loss
    layers (e.g., GoogLeNet), the importance of the features close to the higher
    layers is ignored. This paper postulates that using context closer to the
    high-level
    layers provides the scale and translation invariance and works better than
    using the top layer only. In particular, we extend AlexNet and GoogLeNet by
    additional connections in the top n layers. In order to demonstrate the
    effectiveness of the proposed approach, we evaluated it on the standard
    ImageNet task. The relative reduction of the classification error is around
    1-2% without affecting the computational cost. Furthermore, we show that this
    approach is orthogonal to typical test data augmentation techniques, as
    recently introduced by Szegedy et al. (leading to a runtime reduction of 144
    during test time).

    Zero-Shot Learning by Generating Pseudo Feature Representations

    Jiang Lu, Jin Li, Ziang Yan, Changshui Zhang
    Comments: 9 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Zero-shot learning (ZSL) is a challenging task aiming at recognizing novel
    classes without any training instances. In this paper we present a simple but
    high-performance ZSL approach by generating pseudo feature representations
    (GPFR). Given the dataset of seen classes and side information of unseen
    classes (e.g. attributes), we synthesize feature-level pseudo representations
    for novel concepts, which gives us access to the formulation of an
    unseen-class predictor. First, we design a Joint Attribute Feature Extractor
    (JAFE) to acquire an understanding of attributes, then construct a cognitive
    repository of attributes filtered by confidence margins, and finally generate
    pseudo feature representations using a probability-based sampling strategy to
    facilitate the subsequent training of the class predictor. We demonstrate the
    effectiveness in ZSL settings and the extensibility in supervised recognition
    scenario of our method on a synthetic colored MNIST dataset (C-MNIST). For
    several popular ZSL benchmark datasets, our approach also shows compelling
    results on the zero-shot recognition task, especially leading to a tremendous
    improvement over the state-of-the-art mAP on the zero-shot retrieval task.

    Direct Monocular Odometry Using Points and Lines

    Shichao Yang, Sebastian Scherer
    Comments: ICRA 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

    Most visual odometry algorithms for a monocular camera focus on points,
    either through feature matching or direct alignment of pixel intensity, while
    ignoring a common but important geometric entity: edges. In this paper, we
    propose an odometry algorithm that combines points and edges to benefit from
    the advantages of both direct and feature-based methods. It works better in
    texture-less environments and is also more robust to lighting changes and
    fast motion, owing to an increased convergence basin. We maintain a depth map
    for the keyframe; in the tracking part, the camera pose is then recovered by
    minimizing both the photometric error and the geometric error to the matched
    edge in a probabilistic framework. In the mapping part, edges are used to
    speed up stereo matching and increase its accuracy. On various public
    datasets, our algorithm achieves performance better than or comparable to
    state-of-the-art monocular odometry methods. In some challenging texture-less
    environments, our algorithm reduces the state-estimation error by over 50%.

    Recent Advances in Features Extraction and Description Algorithms: A Comprehensive Survey

    Ehab Salahat, Murad Qasaimeh
    Comments: Annual IEEE Industrial Electronics Society's 18th International Conf. on Industrial Technology (ICIT), 22-25 March, 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)

    Computer vision is one of the most active research fields in information
    technology today. Giving machines and robots the ability to see and comprehend
    the surrounding world at the speed of sight creates endless potential
    applications and opportunities. Feature detection and description algorithms
    can be indeed considered as the retina of the eyes of such machines and robots.
    However, these algorithms are typically computationally intensive, which
    prevents them from achieving speed-of-sight real-time performance. In
    addition, they differ in their capabilities, and some may favor and work
    better on a specific type of input than others. As such, it is essential to
    compactly report their pros and cons as well as their performance and recent
    advances. This paper is dedicated to providing a comprehensive overview of
    the state-of-the-art and recent advances in feature detection and description
    algorithms. Specifically, it starts by reviewing fundamental concepts. It
    then compares, reports and discusses their performance and capabilities. The
    Maximally Stable Extremal Regions algorithm and the Scale Invariant Feature
    Transform algorithms, being two of the best of their type, are selected to
    report their recent algorithmic derivatives.

    Weakly-supervised DCNN for RGB-D Object Recognition in Real-World Applications Which Lack Large-scale Annotated Training Data

    Li Sun, Cheng Zhao, Rustam Stolkin
    Comments: 8 pages, 5 figures, submitted to conference
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper addresses the problem of RGBD object recognition in real-world
    applications, where large amounts of annotated training data are typically
    unavailable. To overcome this problem, we propose a novel, weakly-supervised
    learning architecture (DCNN-GPC) which combines parametric models (a pair of
    Deep Convolutional Neural Networks (DCNN) for RGB and D modalities) with
    non-parametric models (Gaussian Process Classification). Our system is
    initially trained using a small amount of labeled data, and then automatically
    propagates labels to large-scale unlabeled data. We first run 3D-based
    objectness detection on RGBD videos to acquire many unlabeled object proposals,
    and then employ DCNN-GPC to label them. As a result, our multi-modal DCNN can
    be trained end-to-end using only a small amount of human annotation. Finally,
    our 3D-based objectness detection and multi-modal DCNN are integrated into a
    real-time detection and recognition pipeline. In our approach, bounding-box
    annotations are not required and boundary-aware detection is achieved. We also
    propose a novel way to pretrain a DCNN for the depth modality, by training on
    virtual depth images projected from CAD models. We pretrain our multi-modal
    DCNN on public 3D datasets, achieving performance comparable to
    state-of-the-art methods on the Washington RGB-D Dataset. We then finetune the
    network by further training on a small amount of annotated data from our novel
    dataset of industrial objects (nuclear waste simulants). Our weakly-supervised
    approach has been demonstrated to be highly effective in solving a novel RGBD
    object recognition application which lacks human annotations.

    PatternNet: Visual Pattern Mining with Deep Neural Network

    Hongzhi Li, Joseph G. Ellis, Lei Zhang, Shih-Fu Chang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Visual patterns represent the discernible regularity in the visual world.
    They capture the essential nature of visual objects or scenes. Understanding
    and modeling visual patterns is a fundamental problem in visual recognition
    that has wide ranging applications. In this paper, we study the problem of
    visual pattern mining and propose a novel deep neural network architecture
    called PatternNet for discovering these patterns that are both discriminative
    and representative. The proposed PatternNet leverages the filters in the last
    convolution layer of a convolutional neural network to find locally consistent
    visual patches, and by combining these filters we can effectively discover
    unique visual patterns. In addition, PatternNet can discover visual patterns
    efficiently without performing expensive image patch sampling, and this
    advantage provides an order of magnitude speedup compared to most other
    approaches. We evaluate the proposed PatternNet subjectively by showing
    randomly selected visual patterns which are discovered by our method and
    quantitatively by performing image classification with the identified visual
    patterns and comparing our performance with the current state-of-the-art. We
    also directly evaluate the quality of the discovered visual patterns by
    leveraging the identified patterns as proposed objects in an image and compare
    with other relevant methods. Our proposed network and procedure, PatternNet,
    is able to outperform competing methods for the tasks described.

    Recognition in-the-Tail: Training Detectors for Unusual Pedestrians with Synthetic Imposters

    Shiyu Huang, Deva Ramanan
    Comments: To appear in CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    As autonomous vehicles become an every-day reality, high-accuracy pedestrian
    detection is of paramount practical importance. Pedestrian detection is a
    highly researched topic with mature methods, but most datasets focus on common
    scenes of people engaged in typical walking poses on sidewalks. But performance
    is most crucial for dangerous scenarios, such as children playing in the street
    or people using bicycles/skateboards in unexpected ways. Such “in-the-tail”
    data is notoriously hard to observe, making both training and testing
    difficult. To analyze this problem, we have collected a novel annotated dataset
    of dangerous scenarios called the Precarious Pedestrian dataset. Even given a
    dedicated collection effort, it is relatively small by contemporary standards
    (around 1000 images). To explore large-scale data-driven learning, we explore
    the use of synthetic data generated by a game engine. A significant challenge
    is selecting the right “priors” or parameters for synthesis: we would like
    realistic data with realistic poses and object configurations. Inspired by
    Generative Adversarial Networks, we generate a massive amount of synthetic data
    and train a discriminative classifier to select a realistic subset, which we
    deem Synthetic Imposters. We demonstrate that this pipeline allows one to
    generate realistic training data by making use of rendering/animation engines.
    Interestingly, we also demonstrate that such data can be used to rank
    algorithms, suggesting that Synthetic Imposters can also be used for
    “in-the-tail” validation at test-time, a notoriously difficult challenge for
    real-world deployment.

    Single image super-resolution using self-optimizing mask via fractional-order gradient interpolation and reconstruction

    Qi Yang, Yanzhu Zhang, Tiebiao Zhao, YangQuan Chen
    Comments: 24 pages, 13 figures, it is to appear in ISA Transactions
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Single image super-resolution aims to recover detailed information from
    low-resolution images and reconstruct them into high-resolution images.
    Due to the limited amount of data and information retrieved from low-resolution
    images, it is difficult to restore clear, artifact-free images, while still
    preserving enough structure of the image such as the texture. This paper
    presents a new single image super-resolution method which is based on adaptive
    fractional-order gradient interpolation and reconstruction. The interpolated
    image gradient via optimal fractional-order gradient is first constructed
    according to the image similarity and afterwards the minimum energy function is
    employed to reconstruct the final high-resolution image. Fractional-order
    gradient-based interpolation methods provide an additional degree of freedom,
    which helps optimize the implementation quality, since an extra free
    parameter, the fractional order α, is available. The proposed method is able to
    produce a rich texture detail while still being able to maintain structural
    similarity even under large zoom conditions. Experimental results show that the
    proposed method performs better than current single image super-resolution
    techniques.

    A Fast HOG Descriptor Using Lookup Table and Integral Image

    Chunde Huang, Jiaxiang Huang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The histogram of oriented gradients (HOG) is a widely used feature descriptor
    in computer vision for the purpose of object detection. In this paper, a
    modified HOG descriptor is described that uses a lookup table and the
    integral-image method to speed up detection by a factor of 5~10. By
    exploiting the special hardware features of a given platform (e.g., a digital
    signal processor), further improvement can be made to the HOG descriptor in
    order to have real-time object detection and tracking.
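
    A sketch of the integral-image part of the idea, assuming unsigned
    gradient orientations in [0, 180); one integral image per orientation bin
    makes any cell histogram a constant-time query (the lookup-table part for
    fast binning is not shown):

        import numpy as np

        def hog_cell_histograms(magnitude, orientation, n_bins=9, cell=8):
            """Build one integral image per orientation bin, so the histogram
            of any rectangular cell costs four lookups per bin instead of a
            scan over its pixels."""
            h, w = magnitude.shape
            bins = (orientation / 180.0 * n_bins).astype(int) % n_bins
            integrals = np.zeros((n_bins, h + 1, w + 1))
            for b in range(n_bins):
                layer = np.where(bins == b, magnitude, 0.0)
                integrals[b, 1:, 1:] = layer.cumsum(0).cumsum(1)

            def cell_hist(y, x):
                """Histogram of the cell whose top-left corner is (y, x)."""
                y2, x2 = y + cell, x + cell
                ii = integrals
                return ii[:, y2, x2] - ii[:, y, x2] - ii[:, y2, x] + ii[:, y, x]

            return cell_hist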

    Towards Context-aware Interaction Recognition

    Bohan Zhuang, Lingqiao Liu, Chunhua Shen, Ian Reid
    Comments: 12 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recognizing how objects interact with each other is a crucial task in visual
    recognition. If we define the context of the interaction to be the objects
    involved, then most current methods can be categorized as either: (i) training
    a single classifier on the combination of the interaction and its context; or
    (ii) aiming to recognize the interaction independently of its explicit context.
    Both methods suffer limitations: the former scales poorly with the number of
    combinations and fails to generalize to unseen combinations, while the latter
    often leads to poor interaction recognition performance due to the difficulty
    of designing a context-independent interaction classifier. To mitigate those
    drawbacks, this paper proposes an alternative, context-aware interaction
    recognition framework. The key to our method is to explicitly construct an
    interaction classifier which combines the context, and the interaction. The
    context is encoded via word2vec into a semantic space, and is used to derive a
    classification result for the interaction.

    The proposed method still builds one classifier for one interaction (as per
    type (ii) above), but the classifier built is adaptive to context via weights
    which are context dependent. The benefit of using the semantic space is that it
    naturally leads to zero-shot generalizations in which semantically similar
    contexts (subject-object pairs) can be recognized as suitable contexts for an
    interaction, even if they were not observed in the training set.

    RoomNet: End-to-End Room Layout Estimation

    Chen-Yu Lee, Vijay Badrinarayanan, Tomasz Malisiewicz, Andrew Rabinovich
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper focuses on the task of room layout estimation from a monocular RGB
    image. Prior works break the problem into two sub-tasks: semantic segmentation
    of floor, walls, ceiling to produce layout hypotheses, followed by an iterative
    optimization step to rank these hypotheses. In contrast, we adopt a more direct
    formulation of this problem as one of estimating an ordered set of room layout
    keypoints. The room layout and the corresponding segmentation are completely
    specified given the locations of these ordered keypoints. We predict the
    locations of the room layout keypoints using RoomNet, an end-to-end trainable
    encoder-decoder network. On the challenging benchmark datasets Hedau and LSUN,
    we achieve state-of-the-art performance along with 200x to 600x speedup
    compared to the most recent work. Additionally, we present optional extensions
    to the RoomNet architecture such as including recurrent computations and memory
    units to refine the keypoint locations under the same parametric capacity.

    Recurrent Models for Situation Recognition

    Arun Mallya, Svetlana Lazebnik
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This work proposes Recurrent Neural Network (RNN) models to predict
    structured image situations – actions and noun entities fulfilling semantic
    roles related to the action. We transform the task of noun entity prediction to
    that of sequential prediction and use RNNs in contrast to prior work that uses
    Conditional Random Fields (CRFs). By using an action prediction network and a
    separate network with an RNN for noun prediction, we obtain state-of-the-art
    accuracy on the challenging recent imSitu dataset, beating CRF-based models,
    including ones trained with additional data. Further, we show that specialized
    features learned from situation prediction can be transferred to the task of
    image captioning to more accurately describe human-object interactions.

    Deformable Convolutional Networks

    Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Convolutional neural networks (CNNs) are inherently limited in modeling
    geometric transformations due to the fixed geometric structures in their
    building modules. In this work, we introduce two new modules to enhance the
    transformation modeling capacity of CNNs, namely, deformable convolution and
    deformable RoI pooling. Both are based on the idea of augmenting the spatial
    sampling locations in the modules with additional offsets and learning the
    offsets from target tasks, without additional supervision. The new modules can
    readily replace their plain counterparts in existing CNNs and can be easily
    trained end-to-end by standard back-propagation, giving rise to deformable
    convolutional networks. Extensive experiments validate the effectiveness of our
    approach on sophisticated vision tasks of object detection and semantic
    segmentation. The code will be released.
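
    A naive single-channel sketch of deformable convolution: each kernel tap
    reads the input at its regular grid position plus a fractional offset via
    bilinear interpolation. Here the offsets are taken as given; in the paper
    they are produced by an extra convolutional layer and learned end-to-end:

        import numpy as np

        def bilinear(img, y, x):
            """Sample a 2-D array at fractional coordinates, zero outside."""
            h, w = img.shape
            y0, x0 = int(np.floor(y)), int(np.floor(x))
            def at(r, c):
                return img[r, c] if 0 <= r < h and 0 <= c < w else 0.0
            wy, wx = y - y0, x - x0
            return ((1 - wy) * (1 - wx) * at(y0, x0)
                    + (1 - wy) * wx * at(y0, x0 + 1)
                    + wy * (1 - wx) * at(y0 + 1, x0)
                    + wy * wx * at(y0 + 1, x0 + 1))

        def deformable_conv2d(img, kernel, offsets):
            """offsets[y, x, i, j] = (dy, dx): the shift for kernel tap
            (i, j) at output location (y, x)."""
            kh, kw = kernel.shape
            h, w = img.shape
            out = np.zeros_like(img)
            for y in range(h):
                for x in range(w):
                    for i in range(kh):
                        for j in range(kw):
                            dy, dx = offsets[y, x, i, j]
                            out[y, x] += kernel[i, j] * bilinear(
                                img, y + i - kh // 2 + dy, x + j - kw // 2 + dx)
            return out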

    TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals

    Jiyang Gao, Zhenheng Yang, Chen Sun, Kan Chen, Ram Nevatia
    Comments: 8 Pages , 7 Figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Temporal Action Proposal (TAP) generation is an important problem, as fast
    and accurate extraction of semantically important (e.g. human actions) segments
    from untrimmed videos is an important step for large-scale video analysis. We
    propose a novel Temporal Unit Regression Network (TURN) model. There are two
    salient aspects of TURN: (1) TURN jointly predicts action proposals and refines
    the temporal boundaries by temporal coordinate regression; (2) Fast computation
    is enabled by unit feature reuse: a long untrimmed video is decomposed into
    video units, which are reused as basic building blocks of temporal proposals.
    TURN outperforms the state-of-the-art methods under average recall (AR) by a
    large margin on THUMOS-14 and ActivityNet datasets, and runs at over 880 frames
    per second (FPS) on a TITAN X GPU. We further apply TURN as the proposal
    generation stage in existing temporal action localization pipelines, where it
    outperforms the state-of-the-art on THUMOS-14 and ActivityNet.

    Hyperspectral Unmixing with Endmember Variability using Semi-supervised Partial Membership Latent Dirichlet Allocation

    Sheng Zou, Hao Sun, Alina Zare
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A semi-supervised Partial Membership Latent Dirichlet Allocation approach is
    developed for hyperspectral unmixing and endmember estimation while accounting
    for spectral variability and spatial information. Partial Membership Latent
    Dirichlet Allocation is an effective approach for spectral unmixing while
    representing spectral variability and leveraging spatial information. In this
    work, we extend Partial Membership Latent Dirichlet Allocation to incorporate
    any available (imprecise) label information to help guide unmixing.
    Experimental results on two hyperspectral datasets show that the proposed
    semi-supervised PM-LDA can yield improved hyperspectral unmixing and endmember
    estimation results.

    Deciding How to Decide: Dynamic Routing in Artificial Neural Networks

    Mason McGill, Pietro Perona
    Comments: Submitted to ICML 2017
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We propose and systematically evaluate three strategies for training
    dynamically-routed artificial neural networks: graphs of learned
    transformations through which different input signals may take different paths.
    Though some approaches have advantages over others, the resulting networks are
    often qualitatively similar. We find that, in dynamically-routed networks
    trained to classify images, layers and branches become specialized to process
    distinct categories of images. Additionally, given a fixed computational
    budget, dynamically-routed networks tend to perform better than comparable
    statically-routed networks.
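
    A minimal sketch of the routing primitive itself (a generic hard gate, not
    one of the three training strategies the paper evaluates; training through
    the non-differentiable decision is precisely what those strategies address):

        import torch
        import torch.nn as nn

        class RoutedBlock(nn.Module):
            """A small gate picks which branch processes each input, so
            different signals take different paths through the graph."""
            def __init__(self, dim, n_branches=2):
                super().__init__()
                self.branches = nn.ModuleList(
                    nn.Linear(dim, dim) for _ in range(n_branches))
                self.gate = nn.Linear(dim, n_branches)

            def forward(self, x):
                route = self.gate(x).argmax(dim=-1)  # hard routing decision
                out = torch.stack([b(x) for b in self.branches], dim=1)
                return out[torch.arange(x.size(0)), route]

        x = torch.randn(4, 8)
        print(RoutedBlock(8)(x).shape)  # torch.Size([4, 8])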


    Artificial Intelligence

    Foundations for a Probabilistic Event Calculus

    Fabio Aurelio D'Asaro, Antonis Bikakis, Luke Dickens, Rob Miller
    Comments: Technical report
    Subjects: Artificial Intelligence (cs.AI)

    We present PEC, an Event Calculus (EC) style action language for reasoning
    about probabilistic causal and narrative information. It has an action language
    style syntax similar to that of the EC variant Modular-E. Its semantics is
    given in terms of possible worlds which constitute possible evolutions of the
    domain, and builds on that of EFEC, an epistemic extension of EC. We also
    describe an ASP implementation of PEC and show the sense in which this is sound
    and complete.

    QMDP-Net: Deep Learning for Planning under Partial Observability

    Peter Karkus, David Hsu, Wee Sun Lee
    Comments: 9 pages, 5 figures, 1 table
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    This paper introduces QMDP-net, a neural network architecture for planning
    under partial observability. The QMDP-net combines the strengths of model-free
    learning and model-based planning. It is a recurrent policy network, but it
    represents a policy by connecting a model with a planning algorithm that solves
    the model, thus embedding the solution structure of planning in the network
    architecture. The QMDP-net is fully differentiable and allows end-to-end
    training. We train a QMDP-net over a set of different environments so that it
    can generalize over new ones. In preliminary experiments, QMDP-net showed
    strong performance on several robotic tasks in simulation. Interestingly, it
    also sometimes outperformed the QMDP algorithm, which generated the data for
    learning, because of QMDP-net’s robustness resulting from end-to-end learning.
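
    For reference, a plain-numpy sketch of the underlying QMDP approximation
    that the network embeds as a differentiable module (toy dynamics and
    rewards; this is the classical algorithm, not the learned architecture):

        import numpy as np

        def qmdp_policy(T, R, belief, gamma=0.95, iters=100):
            # T: (A, S, S) transitions, R: (S, A) rewards, belief: (S,)
            S, A = R.shape
            V = np.zeros(S)
            for _ in range(iters):  # value iteration on the underlying MDP
                Q = R + gamma * np.einsum('ast,t->sa', T, V)
                V = Q.max(axis=1)
            return int(np.argmax(belief @ Q))  # QMDP: argmax_a E_b[Q(s, a)]

        A, S = 2, 3
        T = np.full((A, S, S), 1.0 / S)  # toy dynamics
        R = np.array([[0., 1.], [1., 0.], [0., 0.]])
        b = np.array([0.6, 0.3, 0.1])  # belief over states
        print(qmdp_policy(T, R, b))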

    Towards a Quantum World Wide Web

    Diederik Aerts, Jonito Aerts Arguelles, Lester Beltran, Lyneth Beltran, Isaac Distrito, Massimiliano Sassoli de Bianchi, Sandro Sozzo, Tomas Veloz
    Comments: 20 pages, no figures
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Quantum Physics (quant-ph)

    We elaborate a quantum model for corpora of written documents, like the pages
    forming the World Wide Web. To that end, we are guided by how physicists
    constructed quantum theory for microscopic entities, which unlike classical
    objects cannot be fully represented in our spatial theater. We suggest that a
    similar construction needs to be carried out by linguists and computational
    scientists, to capture the full meaning content of collections of documental
    entities. More precisely, we show how to associate a quantum-like ‘entity of
    meaning’ to a ‘language entity formed by printed documents’, considering the
    latter as the collection of traces that are left by the former, in specific
    results of search actions that we describe as measurements. In other words, we
    offer a perspective where a collection of documents, like the Web, is described
    as the space of manifestation of a more complex entity – the QWeb – which is
    the object of our modeling, drawing its inspiration from previous studies on
    operational-realistic approaches to quantum physics and quantum modeling of
    human cognition and decision-making. We emphasize that a consistent QWeb model
    needs to account for the observed correlations between words appearing in
    printed documents, e.g., co-occurrences, as the latter would depend on the
    ‘meaning connections’ existing between the concepts that are associated with
    these words. In that respect, we show that both ‘context and interference
    (quantum) effects’ are required to explain the probabilities calculated by
    counting the relative number of documents containing certain words and
    co-occurrences of words.

    Artificial Intelligence and Economic Theories

    Tshilidzi Marwala, Evan Hurwitz
    Comments: Marwala, T. and Hurwitz, E. (2017) Artificial Intelligence and Economic Theory: Skynet in the Market. Springer. (Accepted)
    Subjects: Artificial Intelligence (cs.AI)

    The advent of artificial intelligence has changed many disciplines such as
    engineering, social science and economics. Artificial intelligence is a
    computational technique inspired by natural intelligence, such as the
    swarming of birds, the working of the brain and the pathfinding of ants.
    These techniques have an impact on economic theories. This book studies the
    impact of artificial intelligence on economic theories, a subject that has
    not been extensively studied. The theories that are considered are: demand
    and supply, asymmetrical information, pricing, rational choice, rational
    expectation, game theory, efficient market hypotheses, mechanism design,
    prospect, bounded rationality, portfolio theory, rational counterfactual
    and causality. The benefit of this book is that it evaluates existing
    theories of economics and updates them based on developments in the field
    of artificial intelligence.

    Evidence Updating for Stream-Processing in Big-Data: Robust Conditioning in Soft and Hard Fusion Environments

    Thanuka Wickramarathne
    Subjects: Artificial Intelligence (cs.AI)

    Conditioning is the primary method for belief revision in data fusion systems
    employing probabilistic inferencing. However, big-data environments, where soft
    (i.e., human or human-based) sources are commonly utilized in addition to
    hard (i.e., physics-based sensor) sources, pose several challenges to
    traditional conditioning tasks, primarily due to the numerous data/source
    imperfections that
    are characteristic of such data. The objective of this paper is to investigate
    the most natural extension of Bayes conditioning based evidence updates in the
    presence of such large-scale data uncertainties and source/sensor
    imperfections. By viewing the evidence updating process as a thought
    experiment, we devise an elegant strategy for robust evidence updating in the
    presence of extreme uncertainties characteristic of big-data environments. In
    particular, we look at the formulation of a belief theoretic evidence updating
    mechanism that is derived as a natural extension of Bayes conditional approach
    when the incoming evidence takes the form of a general belief function.
    The proposed method generalizes the belief-theoretic Fagin-Halpern
    conditional notion, and provides a novel evidence updating strategy derived
    as a natural extension of the Bayes conditional for the highly uncertain
    and complex fusion scenarios characteristic of big-data environments. The
    presented extension differs fundamentally from the previously published
    work on the Conditional Update Equation (CUE) as well as the authors' own
    work. An overview of
    this development is provided via illustrative examples. Furthermore, insights
    into parameter selection under various fusion contexts are also provided.

    Multi-Timescale, Gradient Descent, Temporal Difference Learning with Linear Options

    Peeyush Kumar, Doina Precup
    Subjects: Artificial Intelligence (cs.AI)

    Deliberating over large or continuous state spaces has been a long-standing
    challenge in reinforcement learning. Temporal abstraction has made this
    somewhat tractable, but planning efficiently with temporal abstractions
    still remains an issue. Moreover, using spatial abstractions to learn
    policies for various situations at once while using temporal abstraction
    models is an open problem. We propose an efficient algorithm which is
    convergent under linear function approximation while planning using
    temporally abstract actions. We show how this algorithm can be used along
    with randomly generated option models over multiple time scales to plan for
    agents which need to act in real time. Using these randomly generated
    option models over multiple time scales is shown to reduce the number of
    decision epochs required to solve the given task, hence effectively
    reducing the time needed for deliberation.
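
    For context, a bare-bones linear TD(0) policy-evaluation sketch with
    primitive actions only (one-hot features on a toy random walk; the paper's
    multi-timescale option models are not reproduced here):

        import numpy as np

        rng = np.random.default_rng(0)
        n_states, alpha, gamma = 5, 0.1, 0.9
        phi = np.eye(n_states)  # one-hot features: tabular TD as linear FA
        w = np.zeros(n_states)

        s = 2
        for _ in range(5000):
            s_next = min(max(s + rng.choice([-1, 1]), 0), n_states - 1)
            r = 1.0 if s_next == n_states - 1 else 0.0
            td_error = r + gamma * (w @ phi[s_next]) - (w @ phi[s])
            w += alpha * td_error * phi[s]  # gradient-descent TD step
            s = 0 if s_next == n_states - 1 else s_next  # restart at the goal
        print(np.round(w, 2))  # value estimates per state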

    Goal Conflict in Designing an Autonomous Artificial System

    Mark Muraven
    Subjects: Artificial Intelligence (cs.AI)

    Research on human self-regulation has shown that people hold many goals
    simultaneously and have complex self-regulation mechanisms to deal with this
    goal conflict. Artificial autonomous systems may also need to find ways to cope
    with conflicting goals. Indeed, the intricate interplay among different goals
    may be critical to the design as well as long-term safety and stability of
    artificial autonomous systems. I discuss some of the critical features of the
    human self-regulation system and how it might be applied to an artificial
    system. Furthermore, the implications of goal conflict for the reliability and
    stability of artificial autonomous systems and ensuring their alignment with
    human goals and ethics are examined.

    Solving the Goddard problem by an influence diagram

    Jiří Vomlel, Václav Kratochvíl
    Comments: 10 pages, 2 figures
    Subjects: Artificial Intelligence (cs.AI)

    Influence diagrams are a decision-theoretic extension of probabilistic
    graphical models. In this paper we show how they can be used to solve the
    Goddard problem. We present results of numerical experiments with this problem
    and compare the solutions provided by influence diagrams with the optimal
    solution.

    Evolving Game Skill-Depth using General Video Game AI Agents

    Jialin Liu, Julian Togelius, Diego Perez-Liebana, Simon M. Lucas
    Comments: 9 pages, 17 figures, CEC2017
    Subjects: Artificial Intelligence (cs.AI)

    Most games have, or can be generalised to have, a number of parameters that
    may be varied in order to provide instances of games that lead to very
    different player experiences. The space of possible parameter settings can be
    seen as a search space, and we can therefore use a Random Mutation Hill
    Climbing algorithm or other search methods to find the parameter settings that
    induce the best games. One of the hardest parts of this approach is defining a
    suitable fitness function. In this paper we explore the possibility of using
    one of a growing set of General Video Game AI agents to perform automatic
    play-testing. This enables a very general approach to game evaluation based on
    estimating the skill-depth of a game. Agent-based play-testing is
    computationally expensive, so we compare two simple but efficient optimisation
    algorithms: the Random Mutation Hill-Climber and the Multi-Armed Bandit Random
    Mutation Hill-Climber. For the test game we use a space-battle game in order to
    provide a suitable balance between simulation speed and potential skill-depth.
    Results show that both algorithms are able to rapidly evolve game versions with
    significant skill-depth, but that choosing a suitable resampling number is
    essential in order to combat the effects of noise.
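
    A generic Random Mutation Hill-Climber with resampling (a stand-in noisy
    fitness replaces the paper's agent-based play-testing, and the parameter
    encoding is hypothetical):

        import random

        def noisy_fitness(params):
            true_val = -sum((p - 0.3) ** 2 for p in params)
            return true_val + random.gauss(0, 0.05)  # play-test noise

        def rmhc(dim=4, iters=200, resamples=5):
            best = [random.random() for _ in range(dim)]
            best_f = sum(noisy_fitness(best) for _ in range(resamples)) / resamples
            for _ in range(iters):
                cand = list(best)
                i = random.randrange(dim)  # mutate a single parameter
                cand[i] = min(1.0, max(0.0, cand[i] + random.gauss(0, 0.1)))
                f = sum(noisy_fitness(cand) for _ in range(resamples)) / resamples
                if f >= best_f:  # resampling combats fitness noise
                    best, best_f = cand, f
            return best, best_f

        print(rmhc())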

    Cooperating with Machines

    Jacob W. Crandall, Mayada Oudah, Tennom, Fatimah Ishowo-Oloko, Sherief Abdallah, Jean-François Bonnefon, Manuel Cebrian, Azim Shariff, Michael A. Goodrich, Iyad Rahwan
    Subjects: Artificial Intelligence (cs.AI)

    Since Alan Turing envisioned Artificial Intelligence (AI) [1], a major
    driving force behind technical progress has been competition with human
    cognition. Historical milestones have been frequently associated with computers
    matching or outperforming humans in difficult cognitive tasks (e.g. face
    recognition [2], personality classification [3], driving cars [4], or playing
    video games [5]), or defeating humans in strategic zero-sum encounters (e.g.
    Chess [6], Checkers [7], Jeopardy! [8], Poker [9], or Go [10]). In contrast,
    less attention has been given to developing autonomous machines that establish
    mutually cooperative relationships with people who may not share the machine’s
    preferences. A main challenge has been that human cooperation does not require
    sheer computational power, but rather relies on intuition [11], cultural norms
    [12], emotions and signals [13, 14, 15, 16], and pre-evolved dispositions
    toward cooperation [17], common-sense mechanisms that are difficult to encode
    in machines for arbitrary contexts. Here, we combine a state-of-the-art
    machine-learning algorithm with novel mechanisms for generating and acting on
    signals to produce a new learning algorithm that cooperates with people and
    other machines at levels that rival human cooperation in a variety of
    two-player repeated stochastic games. This is the first general-purpose
    algorithm that is capable, given a description of a previously unseen game
    environment, of learning to cooperate with people within short timescales in
    scenarios previously unanticipated by algorithm designers. This is achieved
    without complex opponent modeling or higher-order theories of mind, thus
    showing that flexible, fast, and general human-machine cooperation is
    computationally achievable using a non-trivial, but ultimately simple, set of
    algorithmic mechanisms.

    Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning

    Abhishek Das, Satwik Kottur, José M. F. Moura, Stefan Lee, Dhruv Batra
    Comments: 11 pages, 4 figures, 2 tables, webpage: this http URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

    We introduce the first goal-driven training for visual question answering and
    dialog agents. Specifically, we pose a cooperative ‘image guessing’ game
    between two agents — Qbot and Abot — who communicate in natural language
    dialog so that Qbot can select an unseen image from a lineup of images. We use
    deep reinforcement learning (RL) to learn the policies of these agents
    end-to-end — from pixels to multi-agent multi-round dialog to game reward.

    We demonstrate two experimental results.

    First, as a ‘sanity check’ demonstration of pure RL (from scratch), we show
    results on a synthetic world, where the agents communicate in ungrounded
    vocabulary, i.e., symbols with no pre-specified meanings (X, Y, Z). We find
    that two bots invent their own communication protocol and start using certain
    symbols to ask/answer about certain visual attributes (shape/color/size). Thus,
    we demonstrate the emergence of grounded language and communication among
    ‘visual’ dialog agents with no human supervision at all.

    Second, we conduct large-scale real-image experiments on the VisDial dataset,
    where we pretrain with supervised dialog data and show that the RL ‘fine-tuned’
    agents significantly outperform SL agents. Interestingly, the RL Qbot learns to
    ask questions that Abot is good at, ultimately resulting in more informative
    dialog and a better team.

    Deep Neural Networks for Semantic Segmentation of Multispectral Remote Sensing Imagery

    Ronald Kemker, Christopher Kanan
    Comments: 9 pages, submitted to ICCV 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    A semantic segmentation algorithm must assign a label to every pixel in an
    image. Recently, semantic segmentation of RGB imagery has advanced
    significantly due to deep learning. Because creating datasets for semantic
    segmentation is laborious, these datasets tend to be significantly smaller than
    object recognition datasets. This makes it difficult to directly train a deep
    neural network for semantic segmentation, because it will be prone to
    overfitting. To cope with this, deep learning models typically use
    convolutional neural networks pre-trained on large-scale image classification
    datasets, which are then fine-tuned for semantic segmentation. For non-RGB
    imagery, this is currently not possible because large-scale labeled non-RGB
    datasets do not exist. In this paper, we developed two deep neural networks for
    semantic segmentation of multispectral remote sensing imagery. Prior to
    training on the target dataset, we initialize the networks with large amounts
    of synthetic multispectral imagery. We show that this significantly improves
    results on real-world remote sensing imagery, and we establish a new
    state-of-the-art result on the challenging Hamlin Beach State Park Dataset.

    Recognition in-the-Tail: Training Detectors for Unusual Pedestrians with Synthetic Imposters

    Shiyu Huang, Deva Ramanan
    Comments: To appear in CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    As autonomous vehicles become an every-day reality, high-accuracy pedestrian
    detection is of paramount practical importance. Pedestrian detection is a
    highly researched topic with mature methods, but most datasets focus on common
    scenes of people engaged in typical walking poses on sidewalks. But performance
    is most crucial for dangerous scenarios, such as children playing in the street
    or people using bicycles/skateboards in unexpected ways. Such “in-the-tail”
    data is notoriously hard to observe, making both training and testing
    difficult. To analyze this problem, we have collected a novel annotated dataset
    of dangerous scenarios called the Precarious Pedestrian dataset. Even given a
    dedicated collection effort, it is relatively small by contemporary standards
    (around 1000 images). To enable large-scale data-driven learning, we
    explore the use of synthetic data generated by a game engine. A significant
    challenge is selecting the right “priors” or parameters for synthesis: we would like
    realistic data with realistic poses and object configurations. Inspired by
    Generative Adversarial Networks, we generate a massive amount of synthetic data
    and train a discriminative classifier to select a realistic subset, which we
    deem Synthetic Imposters. We demonstrate that this pipeline allows one to
    generate realistic training data by making use of rendering/animation engines.
    Interestingly, we also demonstrate that such data can be used to rank
    algorithms, suggesting that Synthetic Imposters can also be used for
    “in-the-tail” validation at test-time, a notoriously difficult challenge for
    real-world deployment.

    Deep Decentralized Multi-task Multi-Agent RL under Partial Observability

    Shayegan Omidshafiei, Jason Pazis, Christopher Amato, Jonathan P. How, John Vian
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Many real-world tasks involve multiple agents with partial observability and
    limited communication. Learning is challenging in these settings due to local
    viewpoints of agents, which perceive the world as non-stationary due to
    concurrently-exploring teammates. Approaches that learn specialized policies
    for individual tasks face major problems when applied to the real world: not
    only do agents have to learn and store a distinct policy for each task, but in
    practice the identity of the task is often non-observable, making these
    algorithms inapplicable. This paper formalizes and addresses the problem of
    multi-task multi-agent reinforcement learning under partial observability. We
    introduce a decentralized single-task learning approach that is robust to
    concurrent interactions of teammates, and present an approach for distilling
    single-task policies into a unified policy that performs well across multiple
    related tasks, without explicit provision of task identity.


    Information Retrieval

    Automatic Text Summarization Approaches to Speed up Topic Model Learning Process

    Mohamed Morchid, Juan-Manuel Torres-Moreno, Richard Dufour, Javier Ramírez-Rodríguez, Georges Linarès
    Comments: 16 pages, 4 tables, 8 figures
    Journal-ref: International Journal of Computational Linguistics and
    Applications, 7(2):87-109, 2016
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    The number of documents available on the Internet grows every day. For this
    reason, processing this amount of information effectively becomes a major
    concern for companies and scientists. Methods that represent a textual
    document by a topic representation are widely used in Information Retrieval
    (IR) to process big data such as Wikipedia articles. One of the main
    difficulties in using topic models on huge data collections is the material
    resources (CPU time and memory) required for model estimation. To deal with
    this issue, we propose to build topic spaces from summarized documents. In
    this paper, we present a study of topic space representation in the context
    of big data. The topic space representation behavior is analyzed on
    different languages. Experiments show that topic spaces estimated from text
    summaries are as relevant as those estimated from the complete documents.
    The real advantage of such an approach is the gain in processing time: we
    show that processing time can be drastically reduced using summarized
    documents (by more than 60% in general). This study finally points out the
    differences between thematic representations of documents depending on the
    targeted languages, such as English or Latin languages.
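
    A toy sketch of the recipe (a crude length-based extractive summarizer
    stands in for the paper's summarization systems; corpus and sizes are
    placeholders):

        from sklearn.decomposition import LatentDirichletAllocation
        from sklearn.feature_extraction.text import CountVectorizer

        def summarize(doc, n_sentences=1):
            sents = [s.strip() for s in doc.split(".") if s.strip()]
            return ". ".join(sorted(sents, key=len, reverse=True)[:n_sentences])

        docs = [
            "topic models can be slow to estimate. big collections are common.",
            "summaries keep salient content. they are shorter than full articles.",
        ] * 10
        summaries = [summarize(d) for d in docs]  # estimate topics on summaries
        X = CountVectorizer().fit_transform(summaries)
        lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
        print(lda.components_.shape)  # (2, vocabulary size)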

    Paper2vec: Citation-Context Based Document Distributed Representation for Scholar Recommendation

    Han Tian, Hankz Hankui Zhuo
    Subjects: Information Retrieval (cs.IR)

    Due to the availability of references of research papers and the rich
    information contained in papers, various citation analysis approaches have
    been proposed to identify similar documents for scholar recommendation.
    Despite the success of previous approaches, they are, however, based on the
    co-occurrence of items; once no co-occurring items are available in
    documents, they do not work well. Inspired by distributed representations
    of words in the natural language processing literature, we propose a novel
    approach to measuring the similarity of papers based on distributed
    representations learned from the citation context of papers. We view the
    set of papers as the vocabulary, define the weighted citation context of
    papers, and convert it to a weight matrix similar to the word-word
    co-occurrence matrix in natural language processing. We then explore a
    variant of the matrix factorization approach to train distributed
    representations of papers on the matrix, and leverage the distributed
    representations to measure the similarities of papers. In experiments, we
    show that our approach outperforms state-of-the-art citation-based
    approaches by 25%, and performs better than other methods based on
    distributed representations.
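
    A toy sketch of the pipeline under assumed weighting (plain co-occurrence
    counts and a rank-2 SVD stand in for the paper's weighted citation context
    and its matrix factorization variant):

        import numpy as np

        papers = ["p0", "p1", "p2", "p3"]
        # citation contexts: papers cited together in the same passage
        contexts = [("p0", "p1"), ("p0", "p1"), ("p1", "p2"), ("p2", "p3")]

        idx = {p: i for i, p in enumerate(papers)}
        M = np.zeros((len(papers), len(papers)))
        for a, b in contexts:
            M[idx[a], idx[b]] += 1.0
            M[idx[b], idx[a]] += 1.0

        U, s, _ = np.linalg.svd(M)
        emb = U[:, :2] * s[:2]  # rank-2 paper embeddings
        cos = lambda u, v: u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
        print(cos(emb[idx["p0"]], emb[idx["p1"]]))  # co-cited papers score high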

    Deep Tensor Encoding

    B Sengupta, E Vasquez, Y Qian
    Subjects: Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)

    Learning an encoding of feature vectors in terms of an over-complete
    dictionary or a probabilistic information-geometric construct (Fisher
    vectors) is widespread in statistical signal processing and computer
    vision. In content-based information retrieval using deep-learning
    classifiers, such encodings are learnt on the flattened last layer, without
    adherence to the multi-linear structure of the underlying feature tensor.
    We illustrate a variety of feature encodings, including sparse dictionary
    coding and Fisher vectors, and propose a structured tensor factorization
    scheme that enables retrieval on par, in terms of average precision, with
    Fisher-vector-encoded image signatures. In short, we illustrate how
    structural constraints increase retrieval fidelity.

    Effective Evaluation using Logged Bandit Feedback from Multiple Loggers

    Aman Agarwal, Soumya Basu, Tobias Schnabel, Thorsten Joachims
    Subjects: Learning (cs.LG); Information Retrieval (cs.IR)

    Accurately evaluating new policies (e.g. ad-placement models, ranking
    functions, recommendation functions) is one of the key prerequisites for
    improving interactive systems. While the conventional approach to evaluation
    relies on online A/B tests, recent work has shown that counterfactual
    estimators can provide an inexpensive and fast alternative, since they can be
    applied offline using log data that was collected from a different policy
    fielded in the past. In this paper, we address the question of how to estimate
    the performance of a new policy when we have log data from multiple historic
    policies. This question is of great relevance in practice, since policies get
    updated frequently in most online systems. We show that naively combining data
    from multiple logging policies can be highly suboptimal. In particular, we find
    that the standard Inverse Propensity Score (IPS) estimator suffers especially
    when logging and evaluation policies diverge — to a point where throwing away
    data improves the variance of the estimator. We therefore propose two
    alternative estimators which we characterize theoretically and compare
    experimentally. We find that the new estimators can provide substantially
    improved estimation accuracy.
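
    A hedged toy of the failure mode (the mixture-weighted estimator shown is
    one natural fix, not necessarily the paper's proposed estimators):

        import numpy as np

        rng = np.random.default_rng(0)
        K, n = 5, 5000
        pi_new = np.full(K, 1.0 / K)  # target policy to evaluate
        pi_a = np.array([0.76, 0.06, 0.06, 0.06, 0.06])  # near-greedy logger
        pi_b = np.full(K, 1.0 / K)  # uniform logger
        reward = np.linspace(1.0, 0.2, K)  # true reward per action

        acts = np.concatenate([rng.choice(K, n, p=pi_a), rng.choice(K, n, p=pi_b)])
        rews = reward[acts]
        p_own = np.concatenate([pi_a[acts[:n]], pi_b[acts[n:]]])  # per-sample logger
        p_mix = 0.5 * pi_a[acts] + 0.5 * pi_b[acts]  # pooled mixture propensity

        print("naive IPS   :", (rews * pi_new[acts] / p_own).mean())
        print("mixture IPS :", (rews * pi_new[acts] / p_mix).mean())
        print("true value  :", reward.mean())  # uniform target policy

    Both estimates are unbiased here, but the mixture weighting damps the large
    inverse-propensity weights from the near-greedy logger, which is the
    variance effect the paper analyzes.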


    Computation and Language

    Native Language Identification using Stacked Generalization

    Shervin Malmasi, Mark Dras
    Subjects: Computation and Language (cs.CL)

    Ensemble methods using multiple classifiers have proven to be the most
    successful approach for the task of Native Language Identification (NLI),
    achieving the current state of the art. However, a systematic examination of
    ensemble methods for NLI has yet to be conducted. Additionally, deeper ensemble
    architectures such as classifier stacking have not been closely evaluated. We
    present a set of experiments using three ensemble-based models, testing each
    with multiple configurations and algorithms. This includes a rigorous
    application of meta-classification models for NLI, achieving state-of-the-art
    results on three datasets from different languages. We also present the first
    use of statistical significance testing for comparing NLI systems, showing that
    our results are significantly better than the previous state of the art. We
    make available a collection of test set predictions to facilitate future
    statistical tests.
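
    A generic classifier-stacking sketch built from scikit-learn components
    (not the authors' NLI system; the texts, labels, and features are
    stand-ins):

        from sklearn.ensemble import StackingClassifier
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import LinearSVC

        texts = ["ze essay is writed with mistakes",
                 "the essay was written well"] * 50
        labels = [0, 1] * 50  # stand-in native-language labels

        base = [
            ("char", make_pipeline(
                TfidfVectorizer(analyzer="char", ngram_range=(2, 4)), LinearSVC())),
            ("word", make_pipeline(
                TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())),
        ]
        stack = StackingClassifier(estimators=base,
                                   final_estimator=LogisticRegression(), cv=5)
        stack.fit(texts, labels)  # meta-classifier trained on base outputs
        print(stack.score(texts, labels))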

    Métodos de Otimização Combinatória Aplicados ao Problema de Compressão MultiFrases (Combinatorial Optimization Methods Applied to the Multi-Sentence Compression Problem)

    Elvys Linhares Pontes, Thiago Gouveia da Silva, Andréa Carneiro Linhares, Juan-Manuel Torres-Moreno, Stéphane Huet
    Comments: 12 pages, 1 figure, 3 tables (paper in Portuguese), Preprint of XLVIII Simpósio Brasileiro de Pesquisa Operacional, 2016, Vitória, ES (Brazil)
    Subjects: Computation and Language (cs.CL)

    The Internet has led to a dramatic increase in the amount of available
    information. In this context, reading and understanding this flow of
    information have become costly tasks. In recent years, to help people
    understand textual data, various Natural Language Processing (NLP)
    applications based on Combinatorial Optimization have been devised.
    However, for Multi-Sentence Compression (MSC), a method which reduces
    sentence length without removing core information, the use of optimization
    methods requires further study to improve MSC performance. This article
    describes a method for MSC using Combinatorial Optimization and Graph
    Theory to generate more informative sentences while maintaining their
    grammaticality. An experiment conducted on a corpus of 40 clusters of
    sentences shows that our system achieves very good quality and outperforms
    the state-of-the-art.

    Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks

    Zhilin Yang, Ruslan Salakhutdinov, William W. Cohen
    Comments: Accepted as a conference paper at ICLR 2017. This is an extended version of the original paper (this https URL). The original paper proposes a new architecture, while this version focuses on transfer learning for a general model class
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Recent papers have shown that neural networks obtain state-of-the-art
    performance on several different sequence tagging tasks. One appealing property
    of such systems is their generality, as excellent performance can be achieved
    with a unified architecture and without task-specific feature engineering.
    However, it is unclear if such systems can be used for tasks without large
    amounts of training data. In this paper we explore the problem of transfer
    learning for neural sequence taggers, where a source task with plentiful
    annotations (e.g., POS tagging on Penn Treebank) is used to improve performance
    on a target task with fewer available annotations (e.g., POS tagging for
    microblogs). We examine the effects of transfer learning for deep hierarchical
    recurrent networks across domains, applications, and languages, and show that
    significant improvement can often be obtained. These gains translate into
    improvements over the current state-of-the-art on several well-studied tasks.
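
    One plausible form of the cross-task parameter sharing, sketched in
    PyTorch (architecture details are assumptions; the paper studies transfer
    for a general model class rather than this exact network):

        import torch
        import torch.nn as nn

        class SharedTagger(nn.Module):
            """Source and target taggers share the embedding and BiLSTM
            encoder; only the output layers are task-specific."""
            def __init__(self, vocab, n_src_tags, n_tgt_tags, emb=64, hid=64):
                super().__init__()
                self.embed = nn.Embedding(vocab, emb)
                self.encoder = nn.LSTM(emb, hid, bidirectional=True,
                                       batch_first=True)
                self.src_head = nn.Linear(2 * hid, n_src_tags)
                self.tgt_head = nn.Linear(2 * hid, n_tgt_tags)

            def forward(self, tokens, task):
                h, _ = self.encoder(self.embed(tokens))
                return self.src_head(h) if task == "src" else self.tgt_head(h)

        model = SharedTagger(vocab=1000, n_src_tags=45, n_tgt_tags=12)
        logits = model(torch.randint(0, 1000, (8, 20)), task="tgt")
        print(logits.shape)  # torch.Size([8, 20, 12]); alternate tasks per batch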

    I2T2I: Learning Text to Image Synthesis with Textual Data Augmentation

    Hao Dong, Jingqing Zhang, Douglas McIlwraith, Yike Guo
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

    Translating information between text and image is a fundamental problem in
    artificial intelligence that connects natural language processing and computer
    vision. In the past few years, performance in image caption generation has seen
    significant improvement through the adoption of recurrent neural networks
    (RNN). Meanwhile, text-to-image generation has begun to produce plausible images
    using datasets of specific categories like birds and flowers. We’ve even seen
    image generation from multi-category datasets such as the Microsoft Common
    Objects in Context (MSCOCO) through the use of generative adversarial networks
    (GANs). Synthesizing objects with a complex shape, however, is still
    challenging. For example, animals and humans have many degrees of freedom,
    which means that they can take on many complex shapes. We propose a new
    training method called Image-Text-Image (I2T2I) which integrates text-to-image
    and image-to-text (image captioning) synthesis to improve the performance of
    text-to-image synthesis. We demonstrate that I2T2I can generate better
    multi-category images using MSCOCO than the state-of-the-art. We also
    demonstrate that I2T2I can achieve transfer learning by using a pre-trained
    image captioning module to generate human images on the MPII Human Pose
    dataset.

    Towards a Quantum World Wide Web

    Diederik Aerts, Jonito Aerts Arguelles, Lester Beltran, Lyneth Beltran, Isaac Distrito, Massimiliano Sassoli de Bianchi, Sandro Sozzo, Tomas Veloz
    Comments: 20 pages, no figures
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Quantum Physics (quant-ph)

    We elaborate a quantum model for corpora of written documents, like the pages
    forming the World Wide Web. To that end, we are guided by how physicists
    constructed quantum theory for microscopic entities, which unlike classical
    objects cannot be fully represented in our spatial theater. We suggest that a
    similar construction needs to be carried out by linguists and computational
    scientists, to capture the full meaning content of collections of documental
    entities. More precisely, we show how to associate a quantum-like ‘entity of
    meaning’ to a ‘language entity formed by printed documents’, considering the
    latter as the collection of traces that are left by the former, in specific
    results of search actions that we describe as measurements. In other words, we
    offer a perspective where a collection of documents, like the Web, is described
    as the space of manifestation of a more complex entity – the QWeb – which is
    the object of our modeling, drawing its inspiration from previous studies on
    operational-realistic approaches to quantum physics and quantum modeling of
    human cognition and decision-making. We emphasize that a consistent QWeb model
    needs to account for the observed correlations between words appearing in
    printed documents, e.g., co-occurrences, as the latter would depend on the
    ‘meaning connections’ existing between the concepts that are associated with
    these words. In that respect, we show that both ‘context and interference
    (quantum) effects’ are required to explain the probabilities calculated by
    counting the relative number of documents containing certain words and
    co-occurrences of words.

    Automatic Text Summarization Approaches to Speed up Topic Model Learning Process

    Mohamed Morchid, Juan-Manuel Torres-Moreno, Richard Dufour, Javier Ramírez-Rodríguez, Georges Linarès
    Comments: 16 pages, 4 tables, 8 figures
    Journal-ref: International Journal of Computational Linguistics and
    Applications, 7(2):87-109, 2016
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    The number of documents available on the Internet grows every day. For this
    reason, processing this amount of information effectively becomes a major
    concern for companies and scientists. Methods that represent a textual
    document by a topic representation are widely used in Information Retrieval
    (IR) to process big data such as Wikipedia articles. One of the main
    difficulties in using topic models on huge data collections is the material
    resources (CPU time and memory) required for model estimation. To deal with
    this issue, we propose to build topic spaces from summarized documents. In
    this paper, we present a study of topic space representation in the context
    of big data. The topic space representation behavior is analyzed on
    different languages. Experiments show that topic spaces estimated from text
    summaries are as relevant as those estimated from the complete documents.
    The real advantage of such an approach is the gain in processing time: we
    show that processing time can be drastically reduced using summarized
    documents (by more than 60% in general). This study finally points out the
    differences between thematic representations of documents depending on the
    targeted languages, such as English or Latin languages.

    Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning

    Abhishek Das, Satwik Kottur, José M. F. Moura, Stefan Lee, Dhruv Batra
    Comments: 11 pages, 4 figures, 2 tables, webpage: this http URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

    We introduce the first goal-driven training for visual question answering and
    dialog agents. Specifically, we pose a cooperative ‘image guessing’ game
    between two agents — Qbot and Abot — who communicate in natural language
    dialog so that Qbot can select an unseen image from a lineup of images. We use
    deep reinforcement learning (RL) to learn the policies of these agents
    end-to-end — from pixels to multi-agent multi-round dialog to game reward.

    We demonstrate two experimental results.

    First, as a ‘sanity check’ demonstration of pure RL (from scratch), we show
    results on a synthetic world, where the agents communicate in ungrounded
    vocabulary, i.e., symbols with no pre-specified meanings (X, Y, Z). We find
    that two bots invent their own communication protocol and start using certain
    symbols to ask/answer about certain visual attributes (shape/color/size). Thus,
    we demonstrate the emergence of grounded language and communication among
    ‘visual’ dialog agents with no human supervision at all.

    Second, we conduct large-scale real-image experiments on the VisDial dataset,
    where we pretrain with supervised dialog data and show that the RL ‘fine-tuned’
    agents significantly outperform SL agents. Interestingly, the RL Qbot learns to
    ask questions that Abot is good at, ultimately resulting in more informative
    dialog and a better team.

    VQABQ: Visual Question Answering by Basic Questions

    Jia-Hong Huang, Modar Alfadly, Bernard Ghanem
    Comments: Submitted ICCV 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

    Taking an image and a question as input, our method outputs a text-based
    answer to the question about the given image; this task is known as Visual
    Question Answering (VQA). There are two main modules in our algorithm.
    Given a natural language question about an image, the first module takes
    the question as input and outputs basic questions related to the main
    (given) question. The second module takes the main question, the image, and
    these basic questions as input and outputs the text-based answer to the
    main question. We formulate the basic question generation problem as a
    LASSO optimization problem, and also propose a criterion for how to exploit
    these basic questions to help answer the main question. Our method is
    evaluated on the challenging VQA dataset and yields competitive performance
    compared to the state-of-the-art.


    Distributed, Parallel, and Cluster Computing

    Parallel Sort-Based Matching for Data Distribution Management on Shared-Memory Multiprocessors

    Moreno Marzolla, Gabriele D'Angelo
    Comments: Submitted to IEEE/ACM DS-RT 2017
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Multiagent Systems (cs.MA)

    In this paper we consider the problem of identifying intersections between
    two sets of d-dimensional axis-parallel rectangles. This is a common problem
    that arises in many agent-based simulation studies, and is of central
    importance in the context of High Level Architecture (HLA), where it is at the
    core of the Data Distribution Management (DDM) service. Several realizations of
    the DDM service have been proposed; however, many of them are either
    inefficient or inherently sequential. These are serious limitations since
    multicore processors are now ubiquitous, and DDM algorithms — being
    CPU-intensive — could benefit from additional computing power. We propose a
    parallel version of the Sort-Based Matching algorithm for shared-memory
    multiprocessors. Sort-Based Matching is one of the most efficient serial
    algorithms for the DDM problem, but is quite difficult to parallelize due to
    data dependencies. We describe the algorithm and compute its asymptotic running
    time; we complete the analysis by assessing its performance and scalability
    through extensive experiments on two commodity multicore systems based on a
    dual socket Intel Xeon processor, and a single socket Intel Core i7 processor.
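
    For orientation, a serial sketch of the Sort-Based Matching primitive the
    paper parallelizes (1-D intervals; d dimensions are typically handled axis
    by axis):

        def sort_based_matching(set_a, set_b):
            events = []  # (coordinate, is_end, set_id, index)
            for sid, intervals in ((0, set_a), (1, set_b)):
                for i, (lo, hi) in enumerate(intervals):
                    events.append((lo, 0, sid, i))
                    events.append((hi, 1, sid, i))
            events.sort()  # on ties, starts sort before ends
            open_a, open_b, matches = set(), set(), set()
            for _, is_end, sid, i in events:
                mine, other = (open_a, open_b) if sid == 0 else (open_b, open_a)
                if is_end:
                    mine.discard(i)
                else:
                    for j in other:  # a starting interval overlaps all open ones
                        matches.add((i, j) if sid == 0 else (j, i))
                    mine.add(i)
            return matches

        A = [(0, 5), (4, 9)]
        B = [(3, 6), (8, 12)]
        print(sorted(sort_based_matching(A, B)))  # [(0, 0), (1, 0), (1, 1)]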

    A Flexible Privacy-preserving Framework for Singular Value Decomposition under Internet of Things Environment

    Shuo Chen, Rongxing Lu, Jie Zhang
    Comments: 23 pages, 4 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)

    The singular value decomposition (SVD) is a widely used matrix factorization
    tool which underlies plenty of useful applications, e.g. recommendation system,
    abnormal detection and data compression. In the emerging Internet of Things
    (IoT) environment, there will be an increasing demand for data analysis to
    better people's lives and create new economic growth points. Moreover, due
    to the large scope of IoT, most of the data analysis work should be done at
    the network edge, i.e. handled by fog computing. However, the devices which
    provide fog computing may not be trustworthy, while data privacy is often a
    significant concern of IoT application users. Thus, when performing SVD for
    data analysis purpose, the privacy of user data should be preserved. Based on
    the above reasons, in this paper, we propose a privacy-preserving fog computing
    framework for SVD computation. The security and performance analysis shows the
    practicability of the proposed framework. Furthermore, since the SVD
    computation is the tool for data analysis rather than the ultimate goal, three
    different applications are introduced to show how the framework could flexibly
    achieve the purposes of different applications, which indicates the flexibility
    of the design.

    Internet of Things: An Overview

    Farzad Khodadadi, Amir Vahid Dastjerdi, Rajkumar Buyya
    Comments: Keywords: Internet of Things; IoT; Web of Things; Cloud of Things
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)

    As technology proceeds and the number of smart devices continues to grow
    substantially, the need for ubiquitous, context-aware platforms that
    support interconnected, heterogeneous, and distributed networks of devices
    has given rise to what is referred to today as the Internet of Things.
    However, paving the path for achieving the aforementioned objectives and
    making the IoT paradigm more
    tangible requires integration and convergence of different knowledge and
    research domains, covering aspects from identification and communication to
    resource discovery and service integration. In this chapter, we aim to
    highlight research on topics including proposed architectures, security and
    privacy, and network communication means and protocols, and we conclude by
    providing future directions and open challenges facing IoT development.

    Proceedings International Workshop on Formal Engineering approaches to Software Components and Architectures

    Jan Kofroň (Charles University), Jana Tumova (KTH Royal Institute of Technology)
    Journal-ref: EPTCS 245, 2017
    Subjects: Software Engineering (cs.SE); Distributed, Parallel, and Cluster Computing (cs.DC)

    These are the proceedings of the 14th International Workshop on Formal
    Engineering approaches to Software Components and Architectures (FESCA). The
    workshop was held on April 22, 2017 in Uppsala (Sweden) as a satellite event to
    the European Joint Conference on Theory and Practice of Software (ETAPS’17).

    The aim of the FESCA workshop is to bring together junior researchers from
    formal methods, software engineering, and industry interested in the
    development and application of formal modelling approaches as well as
    associated analysis and reasoning techniques with practical benefits for
    software engineering.

    In recent years, the growing importance of functional correctness and the
    increased relevance of system quality properties (e.g. performance,
    reliability, security) have stimulated the emergence of analytical and
    modelling techniques for the design and development of software systems. With
    the increasing complexity and utilization of today’s software systems, FESCA
    aims at addressing two research questions: (1) what role the software
    design phase plays in systematically addressing the analytical and
    modelling challenges, and (2) how formal and semi-formal techniques can be
    effectively applied to make these issues easier to address automatically,
    with less human intervention.

    The Unheralded Value of the Multiway Rendezvous: Illustration with the Production Cell Benchmark

    Hubert Garavel, Wendelin Serwe
    Comments: In Proceedings MARS 2017, arXiv:1703.05812
    Journal-ref: EPTCS 244, 2017, pp. 230-270
    Subjects: Programming Languages (cs.PL); Distributed, Parallel, and Cluster Computing (cs.DC)

    The multiway rendezvous introduced in Theoretical CSP is a powerful paradigm
    to achieve synchronization and communication among a group of (possibly more
    than two) processes. We illustrate the advantages of this paradigm on the
    production cell benchmark, a model of a real metal processing plant, for which
    we propose a compositional software controller, which is written in LNT and
    LOTOS, and makes intensive use of the multiway rendezvous.

    A Passivity-Based Distributed Reference Governor for Constrained Robotic Networks

    Tam Nguyen, Takeshi Hatanaka, Mamoru Doi, Emanuele Garone, Masayuki Fujita
    Comments: 8 pages, 8 figures, International Federation of Automatic Control (IFAC) 2017
    Subjects: Multiagent Systems (cs.MA); Distributed, Parallel, and Cluster Computing (cs.DC); Robotics (cs.RO); Systems and Control (cs.SY)

    This paper focuses on a passivity-based distributed reference governor (RG)
    applied to a pre-stabilized mobile robotic network. The novelty of this paper
    lies in the method used to solve the RG problem, where a passivity-based
    distributed optimization scheme is proposed. In particular, the gradient
    descent method minimizes the global objective function while the dual ascent
    method maximizes the Hamiltonian. To make the agents converge to the agreed
    optimal solution, a proportional-integral consensus estimator is used. This
    paper proves the convergence of the state estimates of the RG to the optimal
    solution through passivity arguments, treating the physical system as static.
    Then, the effectiveness of the scheme considering the dynamics of the physical
    system is demonstrated through simulations and experiments.

    Multirole Logic (Extended Abstract)

    Hongwei Xi, Hanwen Wu
    Subjects: Logic (math.LO); Distributed, Parallel, and Cluster Computing (cs.DC); Logic in Computer Science (cs.LO); Programming Languages (cs.PL)

    We identify multirole logic as a new form of logic in which
    conjunction/disjunction is interpreted as an ultrafilter on the power set of
    some underlying set (of roles) and the notion of negation is generalized to
    endomorphisms on this underlying set. We formalize both multirole logic (MRL)
    and linear multirole logic (LMRL) as natural generalizations of classical logic
    (CL) and classical linear logic (CLL), respectively, and also present a
    filter-based interpretation for intuitionism in multirole logic. Among various
    meta-properties established for MRL and LMRL, we obtain one named multiparty
    cut-elimination stating that every cut involving one or more sequents (as a
    generalization of a (binary) cut involving exactly two sequents) can be
    eliminated, thus extending the celebrated result of cut-elimination by Gentzen.

    Recent Advances in Features Extraction and Description Algorithms: A Comprehensive Survey

    Ehab Salahat, Murad Qasaimeh
    Comments: Annual IEEE Industrial Electronics Society's 18th International Conference on Industrial Technology (ICIT), 22-25 March 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)

    Computer vision is one of the most active research fields in information
    technology today. Giving machines and robots the ability to see and comprehend
    the surrounding world at the speed of sight creates endless potential
    applications and opportunities. Feature detection and description algorithms
    can be indeed considered as the retina of the eyes of such machines and robots.
    However, these algorithms are typically computationally intensive, which
    prevents them from achieving this speed-of-sight real-time performance. In
    addition, they differ in their capabilities, and some may favor or work
    better on a specific type of input compared to others. As such, it is
    essential to compactly report their pros and cons as well as their
    performance and recent advances. This paper is dedicated to providing a
    comprehensive overview of the state-of-the-art and recent advances in
    feature detection and description algorithms. Specifically, it starts by
    overviewing fundamental concepts. It
    then compares, reports and discusses their performance and capabilities. The
    Maximally Stable Extremal Regions algorithm and the Scale Invariant Feature
    Transform algorithms, being two of the best of their type, are selected to
    report their recent algorithmic derivatives.


    Learning

    Boosting Dilated Convolutional Networks with Mixed Tensor Decompositions

    Nadav Cohen, Ronen Tamari, Amnon Shashua
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Expressive efficiency is a concept that allows formally reasoning about the
    representational capacity of deep network architectures. A network architecture
    is expressively efficient with respect to an alternative architecture if the
    latter must grow super-linearly in order to represent functions realized by the
    former. A well-known example is the exponential expressive efficiency of depth,
    namely, that in many cases shallow networks must grow exponentially large in
    order to represent functions realized by deep networks. In this paper we study
    the expressive efficiency brought forth by the architectural feature of
    connectivity, motivated by the observation that nearly all state of the art
    networks these days employ elaborate connection schemes, running layers in
    parallel while splitting and merging them in various ways. A formal treatment
    of this question would shed light on the effectiveness of modern connectivity
    schemes, and in addition, could provide new tools for network design. We focus
    on dilated convolutional networks, a family of deep models gaining increased
    attention, underlying state of the art architectures like Google’s WaveNet and
    ByteNet. By introducing and studying the concept of mixed tensor
    decompositions, we prove that interconnecting dilated convolutional networks
    can lead to expressive efficiency. In particular, we show that a single
    connection between intermediate layers can already lead to an almost quadratic
    gap, which in large-scale settings typically makes the difference between a
    model that is practical and one that is not.
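
    A minimal sketch of the architectural feature in question (two dilated 1-D
    conv stacks joined by a single intermediate interconnection; channel counts
    and dilations are arbitrary, and no claim is made about matching the
    paper's tensor-decomposition analysis):

        import torch
        import torch.nn as nn

        class MixedDilatedNet(nn.Module):
            def __init__(self, ch=16):
                super().__init__()
                conv = lambda d: nn.Conv1d(ch, ch, 3, dilation=d, padding=d)
                self.a1, self.a2 = conv(1), conv(2)  # stack A
                self.b1, self.b2 = conv(2), conv(4)  # stack B

            def forward(self, x):
                a, b = self.a1(x), self.b1(x)
                mixed = a + b  # the single inter-stack connection
                return self.a2(mixed) + self.b2(mixed)

        x = torch.randn(1, 16, 64)
        print(MixedDilatedNet()(x).shape)  # torch.Size([1, 16, 64])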

    Variance Reduced Stochastic Gradient Descent with Sufficient Decrease

    Fanhua Shang, Yuanyuan Liu, James Cheng, Kelvin Kai Wing Ng, Yuichi Yoshida
    Comments: 22 pages, 9 figures
    Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

    The sufficient decrease technique has been widely used in deterministic
    optimization, even for non-convex problems, e.g., in line-search
    techniques. Motivated by those successes, we propose a novel sufficient
    decrease framework for a class of variance reduced stochastic gradient
    descent (VR-SGD) methods such as SVRG and SAGA. In order to achieve
    sufficient decrease in stochastic optimization, we design a new sufficient
    decrease criterion. We then introduce a coefficient $\theta$ to satisfy the
    sufficient decrease property, which takes the decision to shrink, expand or
    move in the opposite direction (i.e., $\theta x$ for the variable $x$), and
    give two specific update rules for Lasso and ridge regression. Moreover, we
    analyze the convergence
    properties of our algorithms for strongly convex problems, which show that both
    of our algorithms attain linear convergence rates. We also provide the
    convergence guarantees of both of our algorithms for non-strongly convex
    problems. Our experimental results further verify that our algorithms achieve
    better performance than their counterparts.
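
    A loose sketch of the idea for ridge regression (the acceptance test below
    is an assumed simplification, not the paper's exact criterion or its Lasso
    and ridge update rules):

        import numpy as np

        rng = np.random.default_rng(1)
        n, d, lam, eta = 200, 10, 0.1, 0.05
        X = rng.standard_normal((n, d))
        w_true = rng.standard_normal(d)
        y = X @ w_true + 0.1 * rng.standard_normal(n)

        f = lambda w: 0.5 * np.mean((X @ w - y) ** 2) + 0.5 * lam * w @ w
        grad_i = lambda w, i: X[i] * (X[i] @ w - y[i]) + lam * w
        full_grad = lambda w: X.T @ (X @ w - y) / n + lam * w

        w = np.zeros(d)
        for epoch in range(30):
            w_snap, g_snap = w.copy(), full_grad(w)  # SVRG snapshot
            for _ in range(n):
                i = rng.integers(n)
                w = w - eta * (grad_i(w, i) - grad_i(w_snap, i) + g_snap)
            for theta in (1.0, 0.5, 1.5, -0.5):  # keep, shrink, expand, reverse
                if f(theta * w) <= f(w_snap):  # crude sufficient decrease check
                    w = theta * w
                    break
        print(round(f(w), 4), round(f(w_true), 4))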

    On the Use of Default Parameter Settings in the Empirical Evaluation of Classification Algorithms

    Anthony Bagnall, Gavin C. Cawley
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We demonstrate that, for a range of state-of-the-art machine learning
    algorithms, the differences in generalisation performance obtained using
    default parameter settings and using parameters tuned via cross-validation can
    be similar in magnitude to the differences in performance observed between
    state-of-the-art and uncompetitive learning systems. This means that fair and
    rigorous evaluation of new learning algorithms requires performance comparison
    against benchmark methods with best-practice model selection procedures, rather
    than using default parameter settings. We investigate the sensitivity of three
    key machine learning algorithms (support vector machine, random forest and
    rotation forest) to their default parameter settings, and provide guidance on
    determining sensible default parameter values for implementations of these
    algorithms. We also conduct an experimental comparison of these three
    algorithms on 121 classification problems and find that, perhaps surprisingly,
    rotation forest is significantly more accurate on average than both random
    forest and a support vector machine.
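
    A small illustration of the point with scikit-learn (not the paper's
    121-dataset benchmark; the dataset and parameter grid are arbitrary):

        from sklearn.datasets import load_breast_cancer
        from sklearn.model_selection import GridSearchCV, cross_val_score
        from sklearn.svm import SVC

        X, y = load_breast_cancer(return_X_y=True)

        default_acc = cross_val_score(SVC(), X, y, cv=5).mean()

        grid = {"C": [0.1, 1, 10, 100], "gamma": ["scale", 1e-3, 1e-4]}
        tuned = GridSearchCV(SVC(), grid, cv=5)
        tuned_acc = cross_val_score(tuned, X, y, cv=5).mean()  # nested CV

        print(f"default: {default_acc:.3f}  tuned: {tuned_acc:.3f}")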

    Tactics of Adversarial Attack on Deep Reinforcement Learning Agents

    Yen-Chen Lin, Zhang-Wei Hong, Yuan-Hong Liao, Meng-Li Shih, Ming-Yu Liu, Min Sun
    Subjects: Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

    We introduce two tactics to attack agents trained by deep reinforcement
    learning algorithms using adversarial examples, namely the strategically-timed
    attack and the enchanting attack. In the strategically-timed attack, the
    adversary aims at minimizing the agent’s reward by only attacking the agent at
    a small subset of time steps in an episode. Limiting the attack activity to
    this subset helps prevent detection of the attack by the agent. We propose a
    novel method to determine when an adversarial example should be crafted and
    applied. In the enchanting attack, the adversary aims at luring the agent to a
    designated target state. This is achieved by combining a generative model and a
    planning algorithm: while the generative model predicts the future states, the
    planning algorithm generates a preferred sequence of actions for luring the
    agent. A sequence of adversarial examples is then crafted to lure the agent to
    take the preferred sequence of actions. We apply the two tactics to the agents
    trained by state-of-the-art deep reinforcement learning algorithms including
    DQN and A3C. In 5 Atari games, our strategically timed attack reduces as much
    reward as the uniform attack (i.e., attacking at every time step) does by
    attacking the agent 4 times less often. Our enchanting attack lures the agent
    toward designated target states with a more than 70% success rate. Videos are
    available at this http URL
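    A hedged sketch of one plausible trigger for the strategically-timed attack
    (the preference-gap criterion and the threshold are our assumptions, not the
    paper's exact rule for when to craft an adversarial example):

        import numpy as np

        def should_attack(action_probs, threshold=0.8):
            """Attack only when the policy strongly prefers one action over
            another, i.e. when flipping the action is most damaging; staying
            idle otherwise keeps the attack harder to detect."""
            gap = np.max(action_probs) - np.min(action_probs)
            return gap > threshold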

    On the effect of pooling on the geometry of representations

    Gary Bécigneul
    Subjects: Learning (cs.LG)

    In machine learning and neuroscience, certain computational structures and
    algorithms are known to yield disentangled representations without us
    understanding why, the most striking examples being perhaps convolutional
    neural networks and the ventral stream of the visual cortex in humans and
    primates. As for the latter, it was conjectured that representations may be
    disentangled by being flattened progressively and at a local scale. An attempt
    at a formalization of the role of invariance in learning representations was
    made recently, referred to as I-theory. In this framework and using the
    language of differential geometry, we show that pooling over a group of
    transformations of the input contracts the metric and reduces its curvature,
    and provide quantitative bounds, with the aim of moving towards a theoretical
    understanding of how to disentangle representations.

    Independence clustering (without a matrix)

    Daniil Ryabko
    Subjects: Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)

    The independence clustering problem is considered in the following
    formulation: given a set (S) of random variables, it is required to find the
    finest partitioning ({U_1,dots,U_k}) of (S) into clusters such that the
    clusters (U_1,dots,U_k) are mutually independent. Since mutual independence is
    the target, pairwise similarity measurements are of no use, and thus
    traditional clustering algorithms are inapplicable. The distribution of the
    random variables in (S) is, in general, unknown, but a sample is available.
    Thus, the problem is cast in terms of time series. Two forms of sampling are
    considered: i.i.d. and stationary time series, with the main emphasis being on
    the latter, more general, case. A consistent, computationally tractable
    algorithm for each of the settings is proposed, and a number of open directions
    for further research are outlined.

    A Systematic Study of Online Class Imbalance Learning with Concept Drift

    Shuo Wang, Leandro L. Minku, Xin Yao
    Subjects: Learning (cs.LG)

    As an emerging research topic, online class imbalance learning often combines
    the challenges of both class imbalance and concept drift. It deals with data
    streams having very skewed class distributions, where concept drift may occur.
    It has recently received increased research attention; however, very little
    work addresses the combined problem where both class imbalance and concept
    drift coexist. As the first systematic study of handling concept drift in
    class-imbalanced data streams, this paper first provides a comprehensive review
    of current research progress in this field, including current research focuses
    and open challenges. Then, an in-depth experimental study is performed, with
    the goal of understanding how to best overcome concept drift in online learning
    with class imbalance. Based on the analysis, a general guideline is proposed
    for the development of an effective algorithm.

    The Relationship Between Agnostic Selective Classification, Active Learning and the Disagreement Coefficient

    Roei Gelbhart, Ran El-Yaniv
    Subjects: Learning (cs.LG)

    A selective classifier (f,g) comprises a classification function f and a
    binary selection function g, which determines if the classifier abstains from
    prediction, or uses f to predict. The classifier is called
    pointwise-competitive if it classifies each point identically to the best
    classifier in hindsight (from the same class), whenever it does not abstain.
    The quality of such a classifier is quantified by its rejection mass, defined
    to be the probability mass of the points it rejects. A “fast” rejection rate is
    achieved if the rejection mass is bounded from above by O(1/m) where m is the
    number of labeled examples used to train the classifier (and O hides
    logarithmic factors). Pointwise-competitive selective (PCS) classifiers are
    intimately related to disagreement-based active learning and it is known that
    in the realizable case, a fast rejection rate of a known PCS algorithm (called
    Consistent Selective Strategy) is equivalent to an exponential speedup of the
    well-known CAL active algorithm.
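    A minimal sketch of the (f, g) interface described above (the wrapper and
    its names are illustrative assumptions, not the LESS/ILESS algorithms
    discussed below):

        import numpy as np

        def selective_predict(f, g, X, reject_label=None):
            """Predict with f wherever the selection function g accepts;
            abstain otherwise. Returns predictions and the empirical
            rejection mass (fraction of abstentions)."""
            preds = [f(x) if g(x) else reject_label for x in X]
            rejection_mass = float(np.mean([0.0 if g(x) else 1.0 for x in X]))
            return preds, rejection_mass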

    We focus on the agnostic setting, for which there is a known algorithm called
    LESS that learns a PCS classifier and achieves a fast rejection rate (depending
    on Hanneke’s disagreement coefficient) under strong assumptions. We present an
    improved PCS learning algorithm called ILESS for which we show a fast rate
    (depending on Hanneke’s disagreement coefficient) without any assumptions. Our
    rejection bound smoothly interpolates the realizable and agnostic settings. The
    main result of this paper is an equivalence between the following three
    entities: (i) the existence of a fast rejection rate for any PCS learning
    algorithm (such as ILESS); (ii) a poly-logarithmic bound for Hanneke’s
    disagreement coefficient; and (iii) an exponential speedup for a new
    disagreement-based active learner called ActiveiLESS.

    Recurrent Collective Classification

    Shuangfei Fan, Bert Huang
    Subjects: Learning (cs.LG)

    We propose a new method for training iterative collective classifiers for
    labeling nodes in network data. The iterative classification algorithm (ICA) is
    a canonical method for incorporating relational information into
    classification. Yet, existing methods for training ICA models rely on the
    assumption that relational features reflect the true labels of the nodes. This
    unrealistic assumption introduces a bias that is inconsistent with the actual
    prediction algorithm. In this paper, we introduce recurrent collective
    classification (RCC), a variant of ICA analogous to recurrent neural network
    prediction. RCC accommodates any differentiable local classifier and relational
    feature functions. We provide gradient-based strategies for optimizing over
    model parameters to more directly minimize the loss function. In our
    experiments, this direct loss minimization translates to improved accuracy and
    robustness on real network data. We demonstrate the robustness of RCC in
    settings where local classification is very noisy, settings that are
    particularly challenging for ICA.

    Bernoulli Rank-(1) Bandits for Click Feedback

    Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Claire Vernade, Zheng Wen
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    The probability that a user will click a search result depends both on its
    relevance and its position on the results page. The position based model
    explains this behavior by ascribing to every item an attraction probability,
    and to every position an examination probability. To be clicked, a result must
    be both attractive and examined. The probabilities of an item-position pair
    being clicked thus form the entries of a rank-(1) matrix. We propose the
    learning problem of a Bernoulli rank-(1) bandit where at each step, the
    learning agent chooses a pair of row and column arms, and receives the product
    of their Bernoulli-distributed values as a reward. This is a special case of
    the stochastic rank-(1) bandit problem considered in recent work that proposed
    an elimination based algorithm Rank1Elim, and showed that Rank1Elim’s regret
    scales linearly with the number of rows and columns on “benign” instances.
    These are the instances where the minimum of the average row and column rewards
    (mu) is bounded away from zero. The issue with Rank1Elim is that it fails to
    be competitive with straightforward bandit strategies as (mu rightarrow 0).
    In this paper we propose Rank1ElimKL which simply replaces the (crude)
    confidence intervals of Rank1Elim with confidence intervals based on
    Kullback-Leibler (KL) divergences, and with the help of a novel result
    concerning the scaling of KL divergences we prove that with this change, our
    algorithm will be competitive no matter the value of (mu). Experiments with
    synthetic data confirm that on benign instances the performance of Rank1ElimKL
    is significantly better than that of even Rank1Elim, while experiments with
    models derived from real data confirm that the improvements are significant
    across the board, regardless of whether the data is benign or not.
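    The interaction protocol is simple enough to state in a few lines; the toy
    environment below is our own illustration of the position-based model, not
    code from the paper:

        import numpy as np

        rng = np.random.default_rng(0)

        def rank1_bandit_step(u, v, i, j):
            """One round: pick row arm i (item) and column arm j (position);
            the reward is the product of two independent Bernoulli draws with
            means u[i] (attraction) and v[j] (examination)."""
            return rng.binomial(1, u[i]) * rng.binomial(1, v[j])

        u = np.array([0.9, 0.5, 0.2])   # attraction probabilities
        v = np.array([0.8, 0.3])        # examination probabilities
        # Expected reward of pair (i, j) is u[i] * v[j]: a rank-1 matrix.
        rewards = [rank1_bandit_step(u, v, 0, 0) for _ in range(10000)]
        print(np.mean(rewards))         # close to 0.72 = 0.9 * 0.8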

    Generating Multi-label Discrete Electronic Health Records using Generative Adversarial Networks

    Edward Choi, Siddharth Biswal, Bradley Malin, Jon Duke, Walter F. Stewart, Jimeng Sun
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Access to electronic health records (EHR) data has motivated computational
    advances in medical research. However, various concerns, particularly over
    privacy, can limit access to and collaborative use of EHR data. Sharing
    synthetic EHR data could mitigate risk. In this paper, we propose a new
    approach, medical Generative Adversarial Network (medGAN), to generate
    realistic synthetic EHRs. Based on an input EHR dataset, medGAN can generate
    high-dimensional discrete variables (e.g., binary and count features) via a
    combination of an autoencoder and generative adversarial networks. We also
    propose minibatch averaging to efficiently avoid mode collapse, and increase
    the learning efficiency with batch normalization and shortcut connections. To
    demonstrate feasibility, we showed that medGAN generates synthetic EHR datasets
    that achieve comparable performance to real data on many experiments including
    distribution statistics, predictive modeling tasks and medical expert review.
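    A hedged PyTorch sketch of the architecture described above (layer sizes
    and the two-layer networks are our assumptions; the training loops,
    autoencoder pretraining and shortcut connections are omitted):

        import torch
        import torch.nn as nn

        d_code, d_ehr = 128, 1071  # assumed code and feature dimensions

        # Decoder of a (pretrained) autoencoder maps continuous codes to
        # high-dimensional binary EHR features; the generator works in code space.
        decoder = nn.Sequential(nn.Linear(d_code, d_ehr), nn.Sigmoid())
        generator = nn.Sequential(nn.Linear(d_code, d_code), nn.ReLU(),
                                  nn.Linear(d_code, d_code))

        def minibatch_average(x):
            """Append the minibatch mean to every example so the discriminator
            can spot mode collapse (all generated samples looking alike)."""
            mean = x.mean(dim=0, keepdim=True).expand_as(x)
            return torch.cat([x, mean], dim=1)

        discriminator = nn.Sequential(nn.Linear(2 * d_ehr, 256), nn.ReLU(),
                                      nn.Linear(256, 1))

        z = torch.randn(32, d_code)
        fake_ehr = decoder(generator(z))        # values in [0, 1]
        score = discriminator(minibatch_average(fake_ehr))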

    Near Optimal Hamiltonian-Control and Learning via Chattering

    Peeyush Kumar, Wolf Kohn, Zelda B. Zabinsky
    Subjects: Learning (cs.LG)

    Many applications require solving non-linear control problems that are
    classically not well behaved. This paper develops a simple and efficient
    chattering algorithm that learns near optimal decision policies through an
    open-loop feedback strategy. The optimal control problem reduces to a series of
    linear optimization programs that can be easily solved to recover a relaxed
    optimal trajectory. This algorithm is implemented on a real-time enterprise
    scheduling and control process.

    Semi-Supervised Learning with Competitive Infection Models

    Nir Rosenfeld, Amir Globerson
    Subjects: Learning (cs.LG)

    The goal of semi-supervised learning methods is to effectively combine
    labeled and unlabeled data to arrive at a better model. Many such methods rely
    on graph-based approaches, where continuous labels are propagated through a
    graph on the input points. Here we argue that it is more effective to consider
    infection processes on these graphs, whereby at any point in time nodes can
    infect other nodes with their labels. Since the dynamics of these processes are
    stochastic, we develop algorithms for efficiently estimating the expected
    labels over time. We show that our approach addresses many of the limitations
    of graph based learning, and is also empirically effective.

    An Automated Auto-encoder Correlation-based Health-Monitoring and Prognostic Method for Machine Bearings

    Ramin M. Hasani, Guodong Wang, Radu Grosu
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    This paper studies an intelligent technique for health-monitoring and
    prognostics of common rotary machine components, particularly bearings.
    During a run-to-failure experiment, rich unsupervised features from vibration
    sensory data are extracted by a trained sparse auto-encoder. Then, the
    correlation of the extracted attributes of the initial samples (presumably
    healthy at the beginning of the test) with the succeeding samples is calculated
    and passed through a moving-average filter. The normalized output is named the
    auto-encoder correlation-based (AEC) rate, which serves as an informative
    attribute of the system, depicting its health status and precisely identifying
    the degradation starting point. We show that the AEC technique generalizes well
    across several run-to-failure tests. AEC collects rich unsupervised features
    from the vibration data fully autonomously. We demonstrate the superiority of the AEC over
    many other state-of-the-art approaches for the health monitoring and prognostic
    of machine bearings.

    Deep Decentralized Multi-task Multi-Agent RL under Partial Observability

    Shayegan Omidshafiei, Jason Pazis, Christopher Amato, Jonathan P. How, John Vian
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Many real-world tasks involve multiple agents with partial observability and
    limited communication. Learning is challenging in these settings due to local
    viewpoints of agents, which perceive the world as non-stationary due to
    concurrently-exploring teammates. Approaches that learn specialized policies
    for individual tasks face major problems when applied to the real world: not
    only do agents have to learn and store a distinct policy for each task, but in
    practice the identity of the task is often non-observable, making these
    algorithms inapplicable. This paper formalizes and addresses the problem of
    multi-task multi-agent reinforcement learning under partial observability. We
    introduce a decentralized single-task learning approach that is robust to
    concurrent interactions of teammates, and present an approach for distilling
    single-task policies into a unified policy that performs well across multiple
    related tasks, without explicit provision of task identity.

    Effective Evaluation using Logged Bandit Feedback from Multiple Loggers

    Aman Agarwal, Soumya Basu, Tobias Schnabel, Thorsten Joachims
    Subjects: Learning (cs.LG); Information Retrieval (cs.IR)

    Accurately evaluating new policies (e.g. ad-placement models, ranking
    functions, recommendation functions) is one of the key prerequisites for
    improving interactive systems. While the conventional approach to evaluation
    relies on online A/B tests, recent work has shown that counterfactual
    estimators can provide an inexpensive and fast alternative, since they can be
    applied offline using log data that was collected from a different policy
    fielded in the past. In this paper, we address the question of how to estimate
    the performance of a new policy when we have log data from multiple historic
    policies. This question is of great relevance in practice, since policies get
    updated frequently in most online systems. We show that naively combining data
    from multiple logging policies can be highly suboptimal. In particular, we find
    that the standard Inverse Propensity Score (IPS) estimator suffers especially
    when logging and evaluation policies diverge — to a point where throwing away
    data improves the variance of the estimator. We therefore propose two
    alternative estimators which we characterize theoretically and compare
    experimentally. We find that the new estimators can provide substantially
    improved estimation accuracy.
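    For concreteness, a sketch of the baseline estimator discussed above and one
    simple stabilization (the mixture-propensity variant illustrates why naive
    pooling matters; it is not necessarily the paper's proposed estimator):

        import numpy as np

        def naive_ips(logs, target_prob):
            """Pooled IPS: logs hold tuples (x, a, r, p) where p is the
            propensity of action a under the policy that logged it."""
            return np.mean([r * target_prob(x, a) / p for (x, a, r, p) in logs])

        def mixture_ips(logs, target_prob, mix_prob):
            """Weight by the mixture of all logging policies instead of each
            entry's own propensity: mix_prob(x, a) = sum_k (n_k / n) p_k(a|x).
            This can sharply reduce variance when one logger diverges from
            the evaluation policy."""
            return np.mean([r * target_prob(x, a) / mix_prob(x, a)
                            for (x, a, r, _) in logs])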

    Deep Neural Networks Do Not Recognize Negative Images

    Hossein Hosseini, Radha Poovendran
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    Deep Neural Networks (DNNs) have achieved remarkable performance on a variety
    of pattern-recognition tasks, particularly visual classification problems,
    where new algorithms are reported to achieve or even surpass human performance.
    In this paper, we test the state-of-the-art DNNs with negative images and show
    that the accuracy drops to the level of random classification. This leads us to
    the conjecture that the DNNs, which are merely trained on raw data, do not
    recognize the semantics of the objects, but rather memorize the inputs. We
    suggest that negative images can be thought of as “semantic adversarial examples”,
    which we define as transformed inputs that semantically represent the same
    objects, but the model does not classify them correctly.
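    The transformation in question is just photometric negation; a one-line
    sketch for 8-bit images:

        import numpy as np

        def negative_image(img):
            """Map each pixel v to 255 - v. A human still recognizes the
            object, yet the abstract reports DNN accuracy drops to chance."""
            return 255 - np.asarray(img, dtype=np.uint8)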

    Counterfactual Fairness

    Matt J. Kusner, Joshua R. Loftus, Chris Russell, Ricardo Silva
    Subjects: Machine Learning (stat.ML); Computers and Society (cs.CY); Learning (cs.LG)

    Machine learning has matured to the point where it is now being considered
    for automating decisions in loan lending, employee hiring, and predictive
    policing. In many of these scenarios however, previous decisions have been made
    that are unfairly biased against certain subpopulations (e.g., those of a
    particular race, gender, or sexual orientation). Because this past data is
    often biased, machine learning predictors must account for this to avoid
    perpetuating discriminatory practices (or incidentally making new ones). In
    this paper, we develop a framework for modeling fairness in any dataset using
    tools from counterfactual inference. We propose a definition called
    counterfactual fairness that captures the intuition that a decision is fair
    towards an individual if it gives the same predictions in (a) the observed
    world and (b) a world where the individual had always belonged to a different
    demographic group, other background causes of the outcome being equal. We
    demonstrate our framework on two real-world problems: fair prediction of law
    school success, and fair modeling of an individual’s criminality in policing
    data.
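    Spelled out in standard LaTeX notation (with (A) the protected attribute,
    (X) the remaining features, (U) the background variables, and the subscript
    denoting an intervention on (A)), we believe the definition sketched above
    reads, for all (y, x, a, a'):

        P\big(\hat{Y}_{A \leftarrow a}(U) = y \mid X = x, A = a\big)
          = P\big(\hat{Y}_{A \leftarrow a'}(U) = y \mid X = x, A = a\big)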

    Efficient variational Bayesian neural network ensembles for outlier detection

    Nick Pawlowski, Miguel Jaques, Ben Glocker
    Comments: Under review for Workshop track – ICLR 2017
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    In this work we perform outlier detection using ensembles of neural networks
    obtained by variational approximation of the posterior in a Bayesian neural
    network setting. The variational parameters are obtained by sampling from the
    true posterior by gradient descent. We show our outlier detection results are
    better than those obtained using other efficient ensembling methods.

    QMDP-Net: Deep Learning for Planning under Partial Observability

    Peter Karkus, David Hsu, Wee Sun Lee
    Comments: 9 pages, 5 figures, 1 table
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    This paper introduces QMDP-net, a neural network architecture for planning
    under partial observability. The QMDP-net combines the strengths of model-free
    learning and model-based planning. It is a recurrent policy network, but it
    represents a policy by connecting a model with a planning algorithm that solves
    the model, thus embedding the solution structure of planning in the network
    architecture. The QMDP-net is fully differentiable and allows end-to-end
    training. We train a QMDP-net over a set of different environments so that it
    can generalize over new ones. In preliminary experiments, QMDP-net showed
    strong performance on several robotic tasks in simulation. Interestingly, it
    also sometimes outperformed the QMDP algorithm, which generated the data for
    learning, because of QMDP-net’s robustness resulting from end-to-end learning.

    Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning

    Abhishek Das, Satwik Kottur, José M. F. Moura, Stefan Lee, Dhruv Batra
    Comments: 11 pages, 4 figures, 2 tables, webpage: this http URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

    We introduce the first goal-driven training for visual question answering and
    dialog agents. Specifically, we pose a cooperative ‘image guessing’ game
    between two agents — Qbot and Abot — who communicate in natural language
    dialog so that Qbot can select an unseen image from a lineup of images. We use
    deep reinforcement learning (RL) to learn the policies of these agents
    end-to-end — from pixels to multi-agent multi-round dialog to game reward.

    We demonstrate two experimental results.

    First, as a ‘sanity check’ demonstration of pure RL (from scratch), we show
    results on a synthetic world, where the agents communicate in ungrounded
    vocabulary, i.e., symbols with no pre-specified meanings (X, Y, Z). We find
    that two bots invent their own communication protocol and start using certain
    symbols to ask/answer about certain visual attributes (shape/color/size). Thus,
    we demonstrate the emergence of grounded language and communication among
    ‘visual’ dialog agents with no human supervision at all.

    Second, we conduct large-scale real-image experiments on the VisDial dataset,
    where we pretrain with supervised dialog data and show that the RL ‘fine-tuned’
    agents significantly outperform SL agents. Interestingly, the RL Qbot learns to
    ask questions that Abot is good at, ultimately resulting in more informative
    dialog and a better team.

    Optimal Learning from Multiple Information Sources

    Annie Liang, Xiaosheng Mu, Vasilis Syrgkanis
    Subjects: Computer Science and Game Theory (cs.GT); Learning (cs.LG); Statistics Theory (math.ST)

    Decision-makers often learn by acquiring information from distinct sources
    that possibly provide complementary information. We consider a decision-maker
    who sequentially samples from a finite set of Gaussian signals, and wants to
    predict a persistent multi-dimensional state at an unknown final period. What
    signal should he choose to observe in each period? Related problems about
    optimal experimentation and dynamic learning tend to have solutions that can
    only be approximated or implicitly characterized. In contrast, we find that in
    our problem, the dynamically optimal path of signal acquisitions generically:
    (1) eventually coincides at every period with the myopic path of signal
    acquisitions, and (2) eventually achieves “total optimality,” so that at every
    large period, the decision-maker will not want to revise his previous signal
    acquisitions, even if given this opportunity. In special classes of
    environments that we describe, these properties hold not only eventually, but
    from period 1. Finally, we characterize the asymptotic frequency with which
    each signal is chosen, and how this depends on primitives of the informational
    environment.

    Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks

    Zhilin Yang, Ruslan Salakhutdinov, William W. Cohen
    Comments: Accepted as a conference paper at ICLR 2017. This is an extended version of the original paper (this https URL). The original paper proposes a new architecture, while this version focuses on transfer learning for a general model class
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Recent papers have shown that neural networks obtain state-of-the-art
    performance on several different sequence tagging tasks. One appealing property
    of such systems is their generality, as excellent performance can be achieved
    with a unified architecture and without task-specific feature engineering.
    However, it is unclear if such systems can be used for tasks without large
    amounts of training data. In this paper we explore the problem of transfer
    learning for neural sequence taggers, where a source task with plentiful
    annotations (e.g., POS tagging on Penn Treebank) is used to improve performance
    on a target task with fewer available annotations (e.g., POS tagging for
    microblogs). We examine the effects of transfer learning for deep hierarchical
    recurrent networks across domains, applications, and languages, and show that
    significant improvement can often be obtained. These gains lead to
    improvements over the current state-of-the-art on several well-studied tasks.

    Spectrum Estimation from a Few Entries

    Ashish Khetan, Sewoong Oh
    Comments: 52 pages; 15 figures
    Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Learning (cs.LG); Numerical Analysis (cs.NA)

    Singular values of data in matrix form provide insights into the structure
    of the data, its effective dimensionality, and the choice of hyper-parameters
    for higher-level data analysis tools. However, in many practical applications
    such as collaborative filtering and network analysis, we only get a partial
    observation. Under such scenarios, we consider the fundamental problem of
    recovering spectral properties of the underlying matrix from a sampling of its
    entries. We are particularly interested in directly recovering the spectrum,
    which is the set of singular values, and also in sample-efficient approaches
    for recovering a spectral sum function, which is an aggregate sum of the same
    function applied to each of the singular values. We propose first estimating
    the Schatten (k)-norms of a matrix, and then applying Chebyshev approximation
    to the spectral sum function or applying moment matching in Wasserstein
    distance to recover the singular values. The main technical challenge is in
    accurately estimating the Schatten norms from a sampling of a matrix. We
    introduce a novel unbiased estimator based on counting small structures in a
    graph and provide guarantees that match its empirical performance. Our
    theoretical analysis shows that Schatten norms can be recovered accurately from
    a strictly smaller number of samples than what is needed to recover the
    underlying low-rank matrix. Numerical experiments suggest that we significantly
    improve upon a competing approach of using matrix completion methods.

    Deep Tensor Encoding

    B Sengupta, E Vasquez, Y Qian
    Subjects: Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)

    Learning an encoding of feature vectors in terms of an over-complete
    dictionary or a probabilistic information geometric (Fisher vectors) construct
    is widespread in statistical signal processing and computer vision. In
    content-based information retrieval using deep-learning classifiers, such
    encodings are learnt on the flattened last layer, without adherence to the
    multi-linear structure of the underlying feature tensor. We illustrate a
    variety of feature encodings, including sparse dictionary coding and Fisher
    vectors, and propose that a structured tensor factorization scheme enables us
    to perform retrieval that is on par, in terms of average precision, with Fisher vector
    encoded image signatures. In short, we illustrate how structural constraints
    increase retrieval fidelity.

    Multi-talker Speech Separation and Tracing with Permutation Invariant Training of Deep Recurrent Neural Networks

    Morten Kolbæk, Dong Yu, Zheng-Hua Tan, Jesper Jensen
    Subjects: Sound (cs.SD); Learning (cs.LG)

    Despite the significant progress made in recent years in dictating
    single-talker speech, the progress made in speaker-independent multi-talker
    mixed speech separation and tracing, often referred to as the cocktail-party
    problem, has been less impressive. In this paper we propose a novel technique
    for attacking this problem. The core of our technique is permutation invariant
    training (PIT), which aims at minimizing the source stream reconstruction error
    no matter how labels are ordered. This is achieved by aligning labels to the
    output streams automatically during the training time. This strategy
    effectively solves the label permutation problem observed in deep learning
    based techniques for speech separation. More interestingly, our approach can
    integrate speaker tracing in the PIT framework so that separation and tracing
    can be carried out in one step and trained end-to-end. This is achieved using
    recurrent neural networks (RNNs) by forcing separated frames belonging to the
    same speaker to be aligned to the same output layer during training.
    Furthermore, the computational cost introduced by PIT is very small compared to
    the RNN computation during training and is zero during separation. We evaluated
    PIT on the WSJ0 and Danish two- and three-talker mixed-speech separation tasks
    and found that it compares favorably to non-negative matrix factorization
    (NMF), computational auditory scene analysis (CASA), deep clustering (DPCL) and
    deep attractor network (DANet), and generalizes well over unseen speakers and
    languages.
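    A minimal sketch of the PIT criterion at the heart of the method (shapes and
    the squared-error loss are our assumptions; speaker tracing and the RNN are
    omitted):

        import itertools
        import numpy as np

        def pit_loss(outputs, targets):
            """Score every assignment of output streams to reference speakers
            and train on the cheapest one, so label ordering does not matter.
            outputs, targets: arrays of shape (num_speakers, frames, bins)."""
            S = outputs.shape[0]
            best = np.inf
            for perm in itertools.permutations(range(S)):
                err = np.mean((outputs - targets[list(perm)]) ** 2)
                best = min(best, err)
            return best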

    Curriculum Dropout

    Pietro Morerio, Jacopo Cavazza, Riccardo Volpi, Rene Vidal, Vittorio Murino
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)

    Dropout is a very effective way of regularizing neural networks.
    Stochastically “dropping out” units with a certain probability discourages
    over-specific co-adaptations of feature detectors, preventing overfitting and
    improving network generalization. Besides, Dropout can be interpreted as an
    approximate model aggregation technique, where an exponential number of smaller
    networks are averaged in order to get a more powerful ensemble. In this paper,
    we show that using a fixed dropout probability during training is a suboptimal
    choice. We thus propose a time scheduling for the probability of retaining
    neurons in the network. This induces an adaptive regularization scheme that
    smoothly increases the difficulty of the optimization problem. This idea of
    “starting easy” and adaptively increasing the difficulty of the learning
    problem has its roots in curriculum learning and allows one to train better
    models. Indeed, we prove that our optimization strategy implements a very
    general curriculum scheme, by gradually adding noise to both the input and
    intermediate feature representations within the network architecture.
    Experiments on seven image classification datasets and different network
    architectures show that our method, named Curriculum Dropout, frequently yields
    better generalization and, at worst, performs just as well as the standard
    Dropout method.
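    One schedule consistent with the “starting easy” idea above (the exponential
    shape and the constants are our assumptions about the general scheme):

        import numpy as np

        def retain_prob(t, theta_bar=0.5, gamma=1e-3):
            """Probability of retaining a unit at training step t: start at 1
            (no dropout) and decay towards the usual value theta_bar, so noise
            is added gradually as training progresses."""
            return (1.0 - theta_bar) * np.exp(-gamma * t) + theta_bar

        def curriculum_dropout(x, t, theta_bar=0.5, gamma=1e-3):
            """Inverted dropout with the time-scheduled retain probability."""
            p = retain_prob(t, theta_bar, gamma)
            mask = np.random.binomial(1, p, size=x.shape)
            return x * mask / p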

    Deciding How to Decide: Dynamic Routing in Artificial Neural Networks

    Mason McGill, Pietro Perona
    Comments: Submitted to ICML 2017
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We propose and systematically evaluate three strategies for training
    dynamically-routed artificial neural networks: graphs of learned
    transformations through which different input signals may take different paths.
    Though some approaches have advantages over others, the resulting networks are
    often qualitatively similar. We find that, in dynamically-routed networks
    trained to classify images, layers and branches become specialized to process
    distinct categories of images. Additionally, given a fixed computational
    budget, dynamically-routed networks tend to perform better than comparable
    statically-routed networks.


    Information Theory

    Analog Transmit Signal Optimization for Undersampled Delay-Doppler Estimation

    Andreas Lenz, Manuel Stein, Lee Swindlehurst
    Comments: 5 pages, 4 figures
    Subjects: Information Theory (cs.IT)

    In this work, the optimization of the analog transmit waveform for joint
    delay-Doppler estimation under sub-Nyquist conditions is considered. In
    particular, we derive an estimation theoretic design rule for the Fourier
    coefficients of the analog transmit signal when violating the sampling theorem
    at the receiver by using a wide analog pre-filtering bandwidth. For a wireless
    delay-Doppler channel, we derive an optimization problem based on the Bayesian
    Cramér-Rao lower bound (BCRLB), which allows us to solve the transmitter
    design problem using an eigenvalue decomposition. Our approach enables one to
    explore the Pareto-optimal design region spanned by the optimized waveforms.
    Furthermore, we demonstrate how our framework can be used to reduce the
    sampling rate at the receiver while maintaining high estimation accuracy.
    Finally, we verify the practical impact by Monte-Carlo simulations of a channel
    estimation algorithm.

    Generalized Compute-Compress-and-Forward

    Hai Cheng, Xiaojun Yuan, Yihua Tan
    Comments: 27 pages, 11 figures, submitted to IEEE Transaction on Information Theory
    Subjects: Information Theory (cs.IT)

    Compute-and-forward (CF) harnesses interference in wireless communications by
    exploiting structured coding. The key idea of CF is to compute integer
    combinations of codewords from multiple source nodes, rather than to decode
    individual codewords by treating others as noise. Compute-compress-and-forward
    (CCF) can further enhance the network performance by introducing compression
    operations at receivers. In this paper, we develop a more general compression
    framework, termed generalized compute-compress-and-forward (GCCF), where the
    compression function involves multiple quantization-and-modulo lattice
    operations. We show that GCCF achieves a broader compression rate region than
    CCF. We also compare our compression rate region with the fundamental
    Slepian-Wolf (SW) region. We show that GCCF is optimal in the sense of
    achieving the minimum total compression rate. We also establish the criteria
    under which GCCF achieves the SW region. In addition, we consider a two-hop
    relay network employing the GCCF scheme. We formulate a sum-rate maximization
    problem and develop an approximate algorithm to solve the problem. Numerical
    results are presented to demonstrate the performance superiority of GCCF over
    CCF and other schemes.

    Inference-Based Distributed Channel Allocation in Wireless Sensor Networks

    Panos N. Alevizos, Efthymios A. Vlachos, Aggelos Bletsas
    Subjects: Information Theory (cs.IT)

    Interference-aware resource allocation of time slots and frequency channels
    in single-antenna, half-duplex radio wireless sensor networks (WSN) is
    challenging. Devising distributed algorithms for such a task further complicates
    the problem. This work studies WSN joint time and frequency channel allocation
    for a given routing tree, such that: a) allocation is performed in a fully
    distributed way, i.e., information exchange is only performed among neighboring
    WSN terminals, within communication up to two hops, and b) detection of
    potential interfering terminals is simplified and can be practically realized.
    The algorithm imprints space, time, frequency and radio hardware constraints
    into a loopy factor graph and performs iterative message passing/loopy belief
    propagation (BP) with randomized initial priors. Sufficient conditions for
    convergence to a valid solution are offered, for the first time in the
    literature, exploiting the structure of the proposed factor graph. Based on
    theoretical findings, modifications of BP are devised that i) accelerate
    convergence to a valid solution and ii) reduce computation cost. Simulations
    reveal promising throughput results of the proposed distributed algorithm, even
    though it utilizes simplified interfering terminals set detection. Future work
    could modify the constraints such that other disruptive wireless technologies
    (e.g., full-duplex radios or network coding) could be accommodated within the
    same inference framework.

    Power Beacon-Assisted Millimeter Wave Ad Hoc Networks

    Xiaohui Zhou, Jing Guo, Salman Durrani, Marco Di Renzo
    Comments: submitted for possible journal publication
    Subjects: Information Theory (cs.IT)

    Deployment of low cost power beacons (PBs) is a promising solution for
    dedicated wireless power transfer (WPT) in future wireless networks. In this
    paper, we present a tractable model for PB-assisted millimeter wave (mmWave)
    wireless ad hoc networks, where each transmitter (TX) harvests energy from all
    PBs and then uses the harvested energy to transmit information to its desired
    receiver. Our model accounts for realistic aspects of WPT and mmWave
    transmissions, such as power circuit activation threshold, allowed maximum
    harvested power, maximum transmit power, beamforming and blockage. Using
    stochastic geometry, we derive the Laplace transform of the aggregate received
    power at the TX to calculate the power coverage probability. We approximate and
    discretize the transmit power of each TX into a finite number of discrete power
    levels in log scale to compute the channel and total coverage probability. We
    compare our analytical predictions to simulations and observe good accuracy.
    The proposed model provides insights into the effect of system parameters, such
    as the transmit power of PBs, the main-lobe beamwidth and the power circuit
    activation threshold, on the overall coverage probability. The results confirm that it is
    feasible and safe to power TXs in a mmWave ad hoc network using PBs.

    Full-Duplex Cooperative Cognitive Radio Networks with Wireless Energy Harvesting

    Rui Zhang, He Chen, Phee Lep Yeoh, Yonghui Li, Branka Vucetic
    Comments: 6 pages, 3 figures, conference
    Subjects: Information Theory (cs.IT)

    This paper proposes and analyzes a new full-duplex (FD) cooperative cognitive
    radio network with wireless energy harvesting (EH). We consider that the
    secondary receiver is equipped with a FD radio and acts as a FD hybrid access
    point (HAP), which aims to collect information from its associated EH secondary
    transmitter (ST) and relay the signals. The ST is assumed to be equipped with
    an EH unit and a rechargeable battery such that it can harvest and accumulate
    energy from radio frequency (RF) signals transmitted by the primary transmitter
    (PT) and the HAP. We develop a novel cooperative spectrum sharing (CSS)
    protocol for the considered system. In the proposed protocol, thanks to its FD
    capability, the HAP can receive the PT’s signals and transmit energy-bearing
    signals to charge the ST simultaneously, or forward the PT’s signals and
    receive the ST’s signals at the same time. We derive analytical expressions for
    the achievable throughput of both primary and secondary links by characterizing
    the dynamic charging/discharging behaviors of the ST battery as a finite-state
    Markov chain. We present numerical results to validate our theoretical analysis
    and demonstrate the merits of the proposed protocol over its non-cooperative
    counterpart.

    Fast Sequential Decoding of Polar Codes

    Peter Trifonov, Vera Miloslavskaya, Ruslan Morozov
    Subjects: Information Theory (cs.IT)

    An extension of the stack decoding algorithm for polar codes is presented.
    The paper introduces a new score function, which enables one to accurately
    compare paths of different length. This results in significant complexity
    reduction with respect to the original stack algorithm at the expense of
    negligible performance loss.

    Truth-Telling Mechanism for Secure Two-Way Relay Communications with Energy-Harvesting Revenue

    Muhammad R. A. Khandaker, Kai-Kit Wong, Gan Zheng
    Comments: Accepted in IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    This paper introduces the novel idea of paying utility to the winning agents
    in terms of some physical entity in cooperative communications. Our setting is
    a secret two-way communication channel where two transmitters exchange
    information in the presence of an eavesdropper. The relays are selected from a
    set of interested parties such that the secrecy sum rate is maximized. In
    return, the selected relay nodes’ energy harvesting requirements will be
    fulfilled up to a certain threshold through their own payoff so that they have
    the natural incentive to be selected and involved in the communication.
    However, relays may exaggerate their private information in order to improve
    their chance to be selected. Our objective is to develop a mechanism for relay
    selection that enforces them to reveal the truth since otherwise they may be
    penalized. We also propose a joint cooperative relay beamforming and transmit
    power optimization scheme based on an alternating optimization approach. Note
    that the problem is highly non-convex since the objective function appears as a
    product of three correlated Rayleigh quotients. While a common practice in the
    existing literature is to optimize the relay beamforming vector for given
    transmit power via rank relaxation, we propose a second-order cone programming
    (SOCP)-based approach in this paper which incurs a significantly lower
    computational cost. The performance of the incentive control mechanism and the
    optimization algorithm has been evaluated through numerical simulations.

    Scalable Content Delivery with Coded Caching in Multi-Antenna Fading Channels

    Khac-Hoang Ngo, Sheng Yang, Mari Kobayashi
    Comments: 30 pages, 5 figures, submitted to the IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    We consider the content delivery problem in a fading multi-input
    single-output channel with cache-aided users. We are interested in the
    scalability of the content delivery rate when the number of users, K, is large.
    Analytical results show that, using coded caching and wireless multicasting,
    without channel state information at the transmitter (CSIT), linear scaling of
    the content delivery rate with respect to K can be achieved in three different
    ways. First, with quasi-static fading, it can be achieved when the number of
    transmit antennas grows logarithmically with K. Second, even with a fixed
    number of antennas, we can achieve linear scaling with a threshold-based
    user selection requiring only one-bit feedback from the users. Third, if the
    multicast transmission can span over multiple independent coherence blocks
    (with block fading), we show that linear scaling can be obtained when the
    product of the number of coherence blocks and the number of transmit antennas
    scales logarithmically with K. When CSIT is available, we propose a mixed
    strategy that combines spatial multiplexing and multicasting. Numerical results
    show that, by optimizing the power split between spatial multiplexing and
    multicasting, we can achieve a significant gain in the content delivery rate
    with moderate cache size.

    Enhancing Physical Layer Security in Dual-Hop Multiuser Transmission

    Waqas Aman, Guftaar Ahmad Sardar Sidhu, Tayyaba Jabeen, Feifei Gao, Shi Jin
    Subjects: Information Theory (cs.IT)

    In this paper, we consider the Physical Layer Security (PLS) problem in
    orthogonal frequency division multiple access (OFDMA) based dual-hop system
    which consists of multiple users, multiple amplify and forward relays, and an
    eavesdropper. The aim is to enhance PLS of the entire system by maximizing sum
    secrecy rate of secret users through optimal resource allocation under various
    practical constraints. Specifically, the sub-carrier allocation to different
    users, the relay assignments, and the power loading over different sub-carriers
    at transmitting nodes are optimized. The joint optimization problem is modeled
    as a mixed binary integer programming problem subject to exclusive sub-carrier
    allocation and separate power budget constraints at each node. A joint
    optimization solution is obtained through Lagrangian dual decomposition where
    KKT conditions are exploited to find the optimal power allocation at base
    station. Further, to reduce the complexity, a sub-optimal scheme is presented
    where the optimal power allocation is derived under fixed sub-carrier-relay
    assignment. Simulation results are also provided to validate the performance of
    proposed schemes.

    First- and Second-Order Hypothesis Testing for Mixed Memoryless Sources with General Mixture

    Te Sun Han, Ryo Nomura
    Comments: 23 pages
    Subjects: Information Theory (cs.IT)

    The first- and second-order optimum achievable exponents in the simple
    hypothesis testing problem are investigated. The optimum achievable exponent
    for type II error probability, under the constraint that the type I error
    probability is allowed asymptotically up to epsilon, is called the
    epsilon-optimum exponent. In this paper, we first give the second-order
    epsilon-exponent in the case where the null hypothesis and the alternative
    hypothesis are a mixed memoryless source and a stationary memoryless source,
    respectively. We next generalize this setting to the case where the alternative
    hypothesis is also a mixed memoryless source. We address the first-order
    epsilon-optimum exponent in this setting. In addition, an extension of our
    results to more general settings, such as hypothesis testing with a mixed
    general source and the relationship with the general compound hypothesis
    testing problem are also discussed.

    Preserving Data-Privacy with Added Noises: Optimal Estimation and Privacy Analysis

    Jianping He, Lin Cai, Xinping Guan
    Comments: 32 pages, 2 figures
    Subjects: Information Theory (cs.IT)

    Networked systems often rely on distributed algorithms to achieve a global
    computation goal with iterative local information exchanges between neighbor
    nodes. To preserve data privacy, a node may add random noise to its original
    data for information exchange at each iteration. Nevertheless, a neighbor node
    can estimate another's original data based on the information it receives. The
    estimation accuracy and data privacy can be measured in terms of ((epsilon,
    delta))-data-privacy, defined as the requirement that the probability of
    (epsilon)-accurate estimation (the difference between an estimate and the
    original data being within (epsilon)) is no larger than (delta) (the
    disclosure probability). How to
    optimize the estimation and analyze data privacy is a critical and open issue.
    In this paper, a theoretical framework is developed to investigate how to
    optimize the estimation of a neighbor’s original data using the local information
    received, named optimal distributed estimation. Then, we study the disclosure
    probability under the optimal estimation for data privacy analysis. We further
    apply the developed framework to analyze the data privacy of the
    privacy-preserving average consensus algorithm and identify the optimal noises
    for the algorithm.
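    To make the ((epsilon, delta)) definition concrete, a Monte-Carlo sketch in
    which the neighbor simply takes the released value as its estimate (an
    assumed estimator, far simpler than the paper's optimal distributed
    estimation):

        import numpy as np

        rng = np.random.default_rng(1)

        def disclosure_probability(noise_std, epsilon, trials=100000):
            """Estimate delta: the probability that an estimate of the
            original data x (here, the noisy release x + w itself) falls
            within epsilon of x, for Gaussian noise w."""
            x = 0.0
            released = x + rng.normal(0.0, noise_std, size=trials)
            return float(np.mean(np.abs(released - x) <= epsilon))

        print(disclosure_probability(noise_std=1.0, epsilon=0.5))  # approx 0.38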

    The geometry of hypothesis testing over convex cones: Generalized likelihood tests and minimax radii

    Yuting Wei, Martin J. Wainwright, Adityanand Guntuboyina
    Subjects: Statistics Theory (math.ST); Information Theory (cs.IT)

    We consider a compound testing problem within the Gaussian sequence model in
    which the null and alternative are specified by a pair of closed, convex cones.
    Such cone testing problems arise in various applications, including the detection of
    treatment effects, trend detection in econometrics, signal detection in radar
    processing, and shape-constrained inference in non-parametric statistics. We
    provide a sharp characterization of the GLRT testing radius up to a universal
    multiplicative constant in terms of the geometric structure of the underlying
    convex cones. When applied to concrete examples, this result reveals some
    interesting phenomena that do not arise in the analogous problems of estimation
    under convex constraints. In particular, in contrast to estimation error, the
    testing error no longer depends purely on the problem complexity via a
    volume-based measure (such as metric entropy or Gaussian complexity), other
    geometric properties of the cones also play an important role. To address the
    issue of optimality, we prove information-theoretic lower bounds for minimax
    testing radius again in terms of geometric quantities. Our general theorems are
    illustrated by examples including the cases of monotone and orthant cones, and
    involve some results of independent interest.

    Independence clustering (without a matrix)

    Daniil Ryabko
    Subjects: Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)

    The independence clustering problem is considered in the following
    formulation: given a set (S) of random variables, it is required to find the
    finest partitioning ({U_1,dots,U_k}) of (S) into clusters such that the
    clusters (U_1,dots,U_k) are mutually independent. Since mutual independence is
    the target, pairwise similarity measurements are of no use, and thus
    traditional clustering algorithms are inapplicable. The distribution of the
    random variables in (S) is, in general, unknown, but a sample is available.
    Thus, the problem is cast in terms of time series. Two forms of sampling are
    considered: i.i.d. and stationary time series, with the main emphasis being on
    the latter, more general, case. A consistent, computationally tractable
    algorithm for each of the settings is proposed, and a number of open directions
    for further research are outlined.

    A 594 Gbps LDPC Decoder Based on Finite-Alphabet Message Passing

    Reza Ghanaatian, Alexios Balatsoukas-Stimming, Christoph Muller, Michael Meidlinger, Gerald Matz, Adam Teman, Andreas Burg
    Comments: 11 pages, submitted to T-VLSI
    Subjects: Hardware Architecture (cs.AR); Information Theory (cs.IT)

    An ultra-high throughput low-density parity check (LDPC) decoder with an
    unrolled full-parallel architecture is proposed, which achieves the highest
    decoding throughput compared to previously reported LDPC decoders in the
    literature. The decoder benefits from a serial message-transfer approach
    between the decoding stages to alleviate the well-known routing congestion
    problem in parallel LDPC decoders. Furthermore, a finite-alphabet message
    passing algorithm is employed to replace the variable node update rule of
    standard min-sum decoders with optimized look-up tables, designed in a way that
    maximizes the mutual information between decoding messages. The proposed
    algorithm results in an architecture with reduced bit-width messages, leading
    to a significantly higher decoding throughput and to a lower area as compared
    to a min-sum decoder when serial message-transfer is used. The architecture is
    placed and routed for the standard min-sum reference decoder and for the
    proposed finite-alphabet decoder using a custom pseudo-hierarchical backend
    design strategy to further alleviate routing congestion and to handle the
    large design. Post-layout results show that the finite-alphabet decoder with
    the serial message-transfer architecture achieves a throughput as large as 594
    Gbps with an area of 16.2 mm(^2) and dissipates an average power of 22.7 pJ per
    decoded bit in a 28 nm FD-SOI library. Compared to the reference min-sum
    decoder, this corresponds to 3.3 times smaller area and 2 times better energy
    efficiency.



