IT博客汇 (IT Blog Hub)

    arXiv Paper Daily: Thu, 17 Nov 2016

    Posted by 我爱机器学习 ("I Love Machine Learning", 52ml.net) on 2016-11-17 00:00:00

    Neural and Evolutionary Computing

    Training Spiking Deep Networks for Neuromorphic Hardware

    Eric Hunsberger, Chris Eliasmith
    Comments: 10 pages, 3 figures, 4 tables; the “methods” section of this article draws heavily on arXiv:1510.08829
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    We describe a method to train spiking deep networks that can be run using
    leaky integrate-and-fire (LIF) neurons, achieving state-of-the-art results for
    spiking LIF networks on five datasets, including the large ImageNet ILSVRC-2012
    benchmark. Our method for transforming deep artificial neural networks into
    spiking networks is scalable and works with a wide range of neural
    nonlinearities. We achieve these results by softening the neural response
    function, such that its derivative remains bounded, and by training the network
    with noise to provide robustness against the variability introduced by spikes.
    Our analysis shows that implementations of these networks on neuromorphic
    hardware will be many times more power-efficient than the equivalent
    non-spiking networks on traditional hardware.
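
    The "softened" response function can be sketched as follows. This is a minimal illustration, assuming a softplus-smoothed LIF rate in the spirit of the soft LIF described in the authors' related work (arXiv:1510.08829): the hard rectification max(j - 1, 0) inside the LIF rate equation is replaced by a softplus with smoothing parameter gamma. The time constants and gamma below are illustrative assumptions, not values reported in this paper.

```python
import math


def lif_rate(j, tau_ref=0.002, tau_rc=0.02):
    """Steady-state LIF firing rate (Hz) for input current j normalized so the threshold is 1."""
    if j <= 1.0:
        return 0.0
    return 1.0 / (tau_ref + tau_rc * math.log1p(1.0 / (j - 1.0)))


def softplus(x, gamma):
    """Smooth approximation of max(x, 0); approaches it as gamma -> 0."""
    z = x / gamma
    if z > 30.0:  # avoid overflow in exp; softplus(x) is numerically x here
        return x
    return gamma * math.log1p(math.exp(z))


def soft_lif_rate(j, tau_ref=0.002, tau_rc=0.02, gamma=0.02):
    """LIF rate with max(j - 1, 0) replaced by softplus, so the slope near threshold stays bounded."""
    p = softplus(j - 1.0, gamma)
    if p == 0.0:  # exp underflowed far below threshold
        return 0.0
    return 1.0 / (tau_ref + tau_rc * math.log1p(1.0 / p))
```

    Below threshold the hard model returns exactly zero while the soft model returns a small positive rate; well above threshold the two coincide. The hard model's derivative grows without bound as j approaches 1 from above, which is what makes it awkward for gradient-based training.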

    Reinforcement Learning with Unsupervised Auxiliary Tasks

    Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z Leibo, David Silver, Koray Kavukcuoglu
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Deep reinforcement learning agents have achieved state-of-the-art results by
    directly maximising cumulative reward. However, environments contain a much
    wider variety of possible training signals. In this paper, we introduce an
    agent that also maximises many other pseudo-reward functions simultaneously by
    reinforcement learning. All of these tasks share a common representation that,
    like unsupervised learning, continues to develop in the absence of extrinsic
    rewards. We also introduce a novel mechanism for focusing this representation
    upon extrinsic rewards, so that learning can rapidly adapt to the most relevant
    aspects of the actual task. Our agent significantly outperforms the previous
    state-of-the-art on Atari, averaging 880% expert human performance, and on a
    challenging suite of first-person, three-dimensional Labyrinth tasks,
    leading to a mean speedup in learning of 10× and averaging 87% expert
    human performance on Labyrinth.
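
    As one concrete example of a pseudo-reward, the sketch below computes pixel-change rewards: the mean absolute intensity change inside each cell of a grid laid over two consecutive grayscale frames. It is modeled on the pixel-control auxiliary task as we understand it from the full paper; the cell size and the nested-list frame representation are assumptions chosen for a dependency-free illustration. An auxiliary policy would then be trained to maximize the reward of a chosen cell.

```python
def pixel_change_rewards(prev, curr, cell=4):
    """Mean absolute intensity change in each cell x cell block of two equally sized frames."""
    h, w = len(curr), len(curr[0])
    rows, cols = h // cell, w // cell
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for i in range(r * cell, (r + 1) * cell):
                for j in range(c * cell, (c + 1) * cell):
                    total += abs(curr[i][j] - prev[i][j])
            out[r][c] = total / (cell * cell)
    return out
```

    Each grid cell yields its own pseudo-reward, so a single pair of frames provides many training signals even when the extrinsic reward is zero.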


    Computer Vision and Pattern Recognition

    Convolutional Gated Recurrent Networks for Video Segmentation

    Mennatullah Siam, Sepehr Valipour, Martin Jagersand, Nilanjan Ray
    Comments: arXiv admin note: substantial text overlap with arXiv:1606.00487
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Semantic segmentation has recently witnessed major progress, with fully
    convolutional neural networks shown to perform well. However, most of the
    previous work focused on improving single image segmentation. To our knowledge,
    no prior work has made use of temporal video information in a recurrent
    network. In this paper, we propose and implement a novel method for online
    semantic segmentation of video sequences that utilizes temporal data. The
    network combines a fully convolutional network and a gated recurrent unit that
    works on a sliding window over consecutive frames. The convolutional gated
    recurrent unit is used to preserve spatial information and reduce the
    parameters learned. Our method has the advantage that it can work in an online
    fashion instead of operating over the whole input batch of video frames. This
    architecture is tested for both binary and semantic video segmentation tasks.
    Experiments are conducted on the recent benchmarks SegTrack V2, Davis,
    CityScapes, and Synthia. Over a baseline plain fully convolutional network, the
    method improves F-measure by 5% on SegTrack and by 3% on Davis. It also
    improves mean IoU by 5.7% on Synthia and mean category IoU by 3.5% on
    CityScapes over the baseline network. The performance of the recurrent fully
    convolutional network (RFCN) depends on its baseline fully convolutional
    network. Thus the RFCN architecture can be seen as a method to improve its
    baseline segmentation network by exploiting spatiotemporal information in videos.
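
    A convolutional GRU replaces the matrix products of a standard GRU with convolutions, so the hidden state keeps its spatial layout and the weights are shared across positions. The minimal sketch below is single-channel and one-dimensional; real implementations operate on multi-channel 2-D feature maps. The kernel names (Wz, Uz, Wr, Ur, Wh, Uh), kernel values, "same" zero padding and the update convention h = (1 - z)·h_prev + z·h̃ are assumptions, not details given in the abstract.

```python
import math


def conv1d_same(x, k):
    """1-D cross-correlation with zero padding so the output has len(x) elements (odd-length kernel)."""
    pad = len(k) // 2
    n = len(x)
    out = []
    for i in range(n):
        s = 0.0
        for t, w in enumerate(k):
            idx = i + t - pad
            if 0 <= idx < n:
                s += w * x[idx]
        out.append(s)
    return out


def add(a, b):
    return [ai + bi for ai, bi in zip(a, b)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def conv_gru_step(x, h, p):
    """One ConvGRU update; p maps hypothetical kernel names to 1-D kernels."""
    z = [sigmoid(v) for v in add(conv1d_same(x, p["Wz"]), conv1d_same(h, p["Uz"]))]  # update gate
    r = [sigmoid(v) for v in add(conv1d_same(x, p["Wr"]), conv1d_same(h, p["Ur"]))]  # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]
    h_tilde = [math.tanh(v) for v in add(conv1d_same(x, p["Wh"]), conv1d_same(rh, p["Uh"]))]
    return [(1 - zi) * hi + zi * hti for zi, hi, hti in zip(z, h, h_tilde)]


# Online use: feed one frame (here a 1-D row of features) at a time.
frames = [[0.1, 0.5, 0.9, 0.5, 0.1], [0.2, 0.6, 1.0, 0.6, 0.2]]
params = {name: [0.1, 0.2, 0.1] for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
h = [0.0] * 5
for f in frames:
    h = conv_gru_step(f, h, params)
```

    Because every gate is a convolution, the parameter count depends only on the kernel size, not on the spatial size of the frame, which is the parameter saving the abstract refers to.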

    Aggregated Residual Transformations for Deep Neural Networks

    Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He
    Comments: Tech report
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a simple, highly modularized network architecture for image
    classification. Our network is constructed by repeating a building block that
    aggregates a set of transformations with the same topology. Our simple design
    results in a homogeneous, multi-branch architecture that has only a few
    hyper-parameters to set. This strategy exposes a new dimension, which we call
    “cardinality” (the size of the set of transformations), as an essential factor
    in addition to the dimensions of depth and width. On the ImageNet-1K dataset,
    we empirically show that even under the restricted condition of maintaining
    complexity, increasing cardinality is able to improve classification accuracy.
    Moreover, increasing cardinality is more effective than going deeper or wider
    when we increase the capacity. Our models, codenamed ResNeXt, are the
    foundations of our entry to the ILSVRC 2016 classification task in which we
    secured 2nd place. We further investigate ResNeXt on an ImageNet-5K set and the
    COCO detection set, also showing better results than its ResNet counterpart.
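
    The claim that cardinality can grow while complexity stays fixed can be checked by counting weights. The sketch below compares a ResNet bottleneck block (256-d input, 64-d bottleneck) with an aggregated block of 32 parallel paths, each 4 channels wide; these template sizes are the example commonly used to illustrate ResNeXt and are stated here as assumptions. Biases and batch normalization are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights in a k x k convolution without bias."""
    return c_in * c_out * k * k


def bottleneck_params(d, w):
    """ResNet bottleneck: 1x1 d->w, 3x3 w->w, 1x1 w->d."""
    return conv_params(d, w, 1) + conv_params(w, w, 3) + conv_params(w, d, 1)


def aggregated_params(d, cardinality, w):
    """ResNeXt-style block: `cardinality` parallel bottleneck paths of width w whose outputs are summed."""
    return cardinality * bottleneck_params(d, w)


print(bottleneck_params(256, 64))     # 69632
print(aggregated_params(256, 32, 4))  # 70144
```

    Both blocks come to roughly 70k weights, so the aggregated block trades width per path for more paths at about the same cost.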

    Associative Embedding: End-to-End Learning for Joint Detection and Grouping

    Alejandro Newell, Jia Deng
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We introduce associative embedding, a novel method for supervising
    convolutional neural networks for the task of detection and grouping. A number
    of computer vision problems can be framed in this manner including multi-person
    pose estimation, instance segmentation, and multi-object tracking. Usually the
    grouping of detections is achieved with multi-stage pipelines; instead, we
    propose an approach that teaches a network to simultaneously output detections
    and group assignments. This technique can be easily integrated into any
    state-of-the-art network architecture that produces pixel-wise predictions. We
    show how to apply this method to both multi-person pose estimation and instance
    segmentation. We present results for both tasks, and report state-of-the-art
    performance for multi-person pose.
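
    The grouping signal can be illustrated with scalar tags: detections of the same instance are pulled toward their mean tag, while the mean tags of different instances are pushed apart. The sketch below is a simplified pull/push loss in the spirit of associative embedding; the Gaussian push term, sigma = 1, skipping same-instance pairs in the push sum, and the normalization are assumptions for illustration, not the paper's exact formulation.

```python
import math


def grouping_loss(groups, sigma=1.0):
    """groups: list of instances, each a non-empty list of scalar tags predicted for its detections.
    Returns (pull, push)."""
    means = [sum(g) / len(g) for g in groups]
    n = len(groups)
    # Pull: squared distance of every tag to its own instance's mean tag.
    pull = sum((t - m) ** 2 for g, m in zip(groups, means) for t in g) / n
    # Push: penalize pairs of distinct instances whose mean tags are close.
    push = 0.0
    for a in range(n):
        for b in range(n):
            if a != b:
                push += math.exp(-((means[a] - means[b]) ** 2) / (2 * sigma ** 2))
    push /= n * n
    return pull, push
```

    At inference, detections can then be grouped by assigning each one to the instance whose tag is nearest.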

    VisualBackProp: visualizing CNNs for autonomous driving

    Mariusz Bojarski, Anna Choromanska, Krzysztof Choromanski, Bernhard Firner, Larry Jackel, Urs Muller, Karol Zieba
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper proposes a new method, which we call VisualBackProp, for
    visualizing which sets of pixels of the input image contribute most to the
    predictions made by the convolutional neural network (CNN). The method hinges
    on the intuition that, moving deeper into the network, the feature maps contain
    progressively less information that is irrelevant to the prediction decision.
    The technique we propose was developed as a debugging tool for
    CNN-based systems for steering self-driving cars and is therefore required to
    run in real-time, i.e. it was designed to require less computation than a
    forward propagation. This makes the presented visualization method a valuable
    debugging tool which can be easily used during both training and inference. We
    furthermore justify our approach with theoretical arguments and theoretically
    confirm that the proposed method identifies sets of input pixels, rather than
    individual pixels, that collaboratively contribute to the prediction. Our
    theoretical findings stand in agreement with experimental results. The
    empirical evaluation shows the plausibility of the proposed approach on road
    data.
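
    The averaging-and-multiplying idea can be sketched as follows: average each layer's feature maps over channels, then, starting from the deepest layer, upscale the running mask to the next shallower resolution and multiply it point-wise with that layer's averaged map. As we understand the full paper, the upscaling there is done with deconvolution; this sketch swaps in nearest-neighbour upsampling, assumes each layer is an integer factor larger than the next, and omits other details such as normalization, so it shows the structure rather than reproducing the method.

```python
def average_channels(fmap):
    """fmap: list of channels, each an H x W nested list. Returns the H x W channel mean."""
    c = len(fmap)
    h, w = len(fmap[0]), len(fmap[0][0])
    return [[sum(fmap[k][i][j] for k in range(c)) / c for j in range(w)] for i in range(h)]


def upsample_nearest(m, factor):
    """Nearest-neighbour upsampling (stand-in for the paper's deconvolution)."""
    return [[m[i // factor][j // factor] for j in range(len(m[0]) * factor)]
            for i in range(len(m) * factor)]


def visual_backprop(layers):
    """layers: feature maps ordered shallow -> deep. Returns a mask at the shallowest resolution."""
    avgs = [average_channels(f) for f in layers]
    mask = avgs[-1]
    for shallower in reversed(avgs[:-1]):
        factor = len(shallower) // len(mask)
        up = upsample_nearest(mask, factor)
        mask = [[up[i][j] * shallower[i][j] for j in range(len(shallower[0]))]
                for i in range(len(shallower))]
    return mask
```

    Only channel means, upsampling and point-wise products are involved, which is consistent with the abstract's point that the visualization costs less than a forward pass.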

    Dynamic Attention-controlled Cascaded Shape Regression Exploiting Training Data Augmentation and Fuzzy-set Sample Weighting

    Zhen-Hua Feng, Josef Kittler, William Christmas, Patrik Huber, Xiao-Jun Wu
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a new Cascaded Shape Regression (CSR) architecture, namely Dynamic
    Attention-Controlled CSR (DAC-CSR), for robust facial landmark detection on
    unconstrained faces. Our DAC-CSR divides facial landmark detection into three
    cascaded sub-tasks: face bounding box refinement, general CSR and
    attention-controlled CSR. The first two stages refine initial face bounding
    boxes and output intermediate facial landmarks. Then, an online dynamic model
    selection method is used to choose appropriate domain-specific CSRs for further
    landmark refinement. The key innovation of our DAC-CSR is the fault-tolerant
    mechanism, using fuzzy set sample weighting for attention-controlled
    domain-specific model training. Moreover, we advocate data augmentation with a
    simple but effective 2D profile face generator, and context-aware feature
    extraction for better facial feature representation. Experimental results
    obtained on challenging datasets demonstrate the merits of our DAC-CSR over the
    state-of-the-art.

    Fully-adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification

    Yongxi Lu, Abhishek Kumar, Shuangfei Zhai, Yu Cheng, Tara Javidi, Rogerio Feris
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Multi-task learning aims to improve generalization performance of multiple
    prediction tasks by appropriately sharing relevant information across them. In
    the context of deep neural networks, this idea is often realized by
    hand-designed network architectures with layers that are shared across tasks
    and branches that encode task-specific features. However, the space of possible
    multi-task deep architectures is combinatorially large and often the final
    architecture is arrived at by manual exploration of this space, subject to the
    designer’s bias, which can be both error-prone and tedious. In this work, we
    propose a principled approach for designing compact multi-task deep learning
    architectures. Our approach starts with a thin network and dynamically widens
    it in a greedy manner during training using a novel criterion that promotes
    grouping of similar tasks together. Our extensive evaluation on person
    attributes classification tasks involving facial and clothing attributes
    suggests that the models produced by the proposed method are fast, compact and
    can closely match or exceed the state-of-the-art accuracy of strong baselines
    built on much more expensive models.

    Fast On-Line Kernel Density Estimation for Active Object Localization

    Anthony D. Rhodes, Max H. Quinn, Melanie Mitchell
    Comments: arXiv admin note: text overlap with arXiv:1607.00548
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    A major goal of computer vision is to enable computers to interpret visual
    situations—abstract concepts (e.g., “a person walking a dog,” “a crowd
    waiting for a bus,” “a picnic”) whose image instantiations are linked more by
    their common spatial and semantic structure than by low-level visual
    similarity. In this paper, we propose a novel method for prior learning and
    active object localization for this kind of knowledge-driven search in static
    images. In our system, prior situation knowledge is captured by a set of
    flexible, kernel-based density estimations—a situation model—that represent
    the expected spatial structure of the given situation. These estimations are
    efficiently updated by information gained as the system searches for relevant
    objects, allowing the system to use context as it is discovered to narrow the
    search.

    More specifically, at any given time in a run on a test image, our system
    uses image features plus contextual information it has discovered to identify a
    small subset of training images—an importance cluster—that is deemed most
    similar to the given test image, given the context. This subset is used to
    generate an updated situation model in an on-line fashion, using an efficient
    multipole expansion technique.

    As a proof of concept, we apply our algorithm to a highly varied and
    challenging dataset consisting of instances of a “dog-walking” situation. Our
    results support the hypothesis that dynamically-rendered, context-based
    probability models can support efficient object localization in visual
    situations. Moreover, our approach is general enough to be applied to diverse
    machine learning paradigms requiring interpretable, probabilistic
    representations generated from partially observed data.
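
    The situation model is a set of kernel density estimates over where related objects are expected to appear. A direct Gaussian KDE over 2-D locations is sketched below. The bandwidth and the sample offsets are made-up illustrative values, and the paper's fast multipole expansion, which accelerates exactly this kind of kernel sum, is replaced here by a plain sum over all samples per query.

```python
import math


def gaussian_kde_2d(samples, bandwidth):
    """Return a density function built from 2-D sample points with an isotropic Gaussian kernel."""
    n = len(samples)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        s = 0.0
        for sx, sy in samples:
            d2 = (x - sx) ** 2 + (y - sy) ** 2
            s += math.exp(-d2 / (2.0 * bandwidth ** 2))
        return norm * s

    return density


# Hypothetical offsets of a dog relative to a detected walker in the retrieved training images.
offsets = [(30.0, 10.0), (35.0, 12.0), (28.0, 8.0)]
pdf = gaussian_kde_2d(offsets, bandwidth=5.0)
candidates = [(31.0, 10.0), (-40.0, 0.0)]
best = max(candidates, key=lambda c: pdf(*c))  # search the most probable location first
```

    Rebuilding `pdf` from a different subset of training images (the importance cluster) is what lets the model change as context is discovered.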

    Neural Style Representations and the Large-Scale Classification of Artistic Style

    Jeremiah Johnson
    Comments: 10 pages, 4 figures, 2 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Applications (stat.AP); Machine Learning (stat.ML)

    The artistic style of a painting is a subtle aesthetic judgment used by art
    historians for grouping and classifying artwork. The recently introduced
    ‘neural-style’ algorithm substantially succeeds in merging the perceived
    artistic style of one image or set of images with the perceived content of
    another. In light of this and other recent developments in image analysis via
    convolutional neural networks, we investigate the effectiveness of a
    ‘neural-style’ representation for classifying the artistic style of paintings.
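
    In the neural-style algorithm, the style of an image at a given layer is summarized by the Gram matrix of that layer's feature maps: the inner products between every pair of channels over all spatial positions. The sketch below computes it for feature maps flattened to one list per channel. Which layers are used, and how the matrices are fed to a classifier, are not specified in the abstract; flattening the upper triangle into a feature vector is an assumption.

```python
def gram_matrix(features):
    """features: list of C channels, each a flat list of the same H*W activations. Returns C x C."""
    c = len(features)
    return [[sum(a * b for a, b in zip(features[i], features[j])) for j in range(c)]
            for i in range(c)]


def flatten_upper(gram):
    """Gram matrices are symmetric, so the upper triangle suffices as a style feature vector."""
    return [gram[i][j] for i in range(len(gram)) for j in range(i, len(gram))]
```

    Summing over positions discards where features occur and keeps only which features co-occur, which is why the representation captures style rather than content.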

    Am I a Baller? Basketball Skill Assessment using First-Person Cameras

    Gedas Bertasius, Stella X. Yu, Hyun Soo Park, Jianbo Shi
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Skill assessment is a fundamental problem in sports like basketball.
    Nowadays, basketball skill assessment is handled by basketball experts who
    evaluate a player’s skill from unscripted third-person basketball game videos.
    However, due to the large distance between the camera and the players, a
    third-person video captures a low-resolution view of the players, which makes
    it difficult to 1) identify specific players in the video and 2) recognize
    what they are doing.

    To address these issues, we use first-person cameras, which 1) provide a
    high-resolution view of a player’s actions, and 2) also eliminate the need to
    track each player. Despite this, learning a basketball skill assessment model
    from the first-person data is still challenging, because 1) a player’s actions
    of interest occur rarely, and 2) labeling the data requires basketball
    experts, which is costly.

    To counter these problems, we introduce the concept of basketball elements,
    which 1) addresses the issue of limited player activity data, and 2) eliminates
    the reliance on basketball experts. Basketball elements define simple basketball
    concepts, making labeling easy even for non-experts. Basketball elements are
    also prevalent in the first-person data, which allows us to learn them and use
    them for a player’s basketball activity recognition and his basketball skill
    assessment.

    Thus, our contributions include (1) a new task of assessing a player’s
    basketball skill from an unscripted first-person basketball game video, (2) a
    new 10.3-hour first-person basketball video dataset capturing 48 players
    and (3) a data-driven model that assesses a player’s basketball skill without
    relying on basketball expert labelers.

    Lip Reading Sentences in the Wild

    Joon Son Chung, Andrew Senior, Oriol Vinyals, Andrew Zisserman
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The goal of this work is to recognise phrases and sentences being spoken by a
    talking face, with or without the audio. Unlike previous works that have
    focussed on recognising a limited number of words or phrases, we tackle lip
    reading as an open-world problem: unconstrained natural language sentences
    and in-the-wild videos.

    Our key contributions are: (1) a ‘Watch, Listen, Attend and Spell’ (WLAS)
    network that learns to transcribe videos of mouth motion to characters; (2) a
    curriculum learning strategy to accelerate training and to reduce overfitting;
    (3) a ‘Lip Reading Sentences’ (LRS) dataset for visual speech recognition,
    consisting of over 100,000 natural sentences from British television.

    The WLAS model trained on the LRS dataset surpasses the performance of all
    previous work on standard lip reading benchmark datasets, often by a
    significant margin. This lip reading performance beats a professional lip
    reader on videos from BBC television, and we also demonstrate that visual
    information helps to improve speech recognition performance even when the audio
    is available.

    Weakly Supervised Top-down Salient Object Detection

    Hisham Cholakkal, Jubin Johnson, Deepu Rajan
    Comments: 14 pages, 12 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Top-down saliency models produce a probability map that peaks at target
    locations specified by a task/goal such as object detection. They are usually
    trained in a fully supervised setting involving pixel-level annotations of
    objects. We propose a weakly supervised top-down saliency framework using only
    binary labels that indicate the presence/absence of an object in an image.
    First, the probabilistic contribution of each image region to the confidence of
    a CNN-based image classifier is computed through a backtracking strategy to
    produce top-down saliency. From a set of saliency maps of an image produced by
    fast bottom-up saliency approaches, we select the best saliency map suitable
    for the top-down task. The selected bottom-up saliency map is combined with the
    top-down saliency map. Features having high combined saliency are used to train
    a linear SVM classifier to estimate feature saliency. This is integrated with
    combined saliency and further refined through a multi-scale
    superpixel-averaging of saliency map. We evaluate the performance of the
    proposed weakly supervised top-down saliency against fully supervised
    approaches and achieve state-of-the-art performance. Experiments are carried
    out on seven challenging datasets and quantitative results are compared with 36
    closely related approaches across 4 different applications.

    Exploiting Visual-Spatial First-Person Co-Occurrence for Action-Object Detection without Labels

    Gedas Bertasius, Stella X. Yu, Jianbo Shi
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Many first-person vision tasks such as activity recognition or video
    summarization require knowing which objects the camera wearer is interacting
    with (i.e. action-objects). The standard way to obtain this information is via
    manual annotation, which is costly and time-consuming. Also, whereas for
    third-person tasks such as object detection the annotator can be anybody, the
    action-object detection task requires the camera wearer to annotate the data,
    because a third person may not know what the camera wearer was thinking. Such a
    constraint makes it even more difficult to obtain first-person annotations.

    To address this problem, we propose a Visual-Spatial Network (VSN) that
    detects action-objects without using any first-person labels. We do so (1) by
    exploiting the visual-spatial co-occurrence in the first-person data and (2) by
    employing an alternating cross-pathway supervision between the visual and
    spatial pathways of our VSN. During training, we use a selected action-object
    prior location to initialize the pseudo action-object ground truth, which is
    then used to optimize both pathways in an alternating fashion. The predictions
    from the spatial pathway are used to update the pseudo ground truth for the
    visual pathway and vice versa, which allows both pathways to improve each
    other. We show our method’s success on two different action-object datasets,
    where our method achieves similar or better results than the supervised
    methods. We also show that our method can be successfully used as pretraining
    for a supervised action-object detection task.

    Bootstrap, Review, Decode: Using Out-of-Domain Textual Data to Improve Image Captioning

    Wenhu Chen, Aurelien Lucchi, Thomas Hofmann
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a novel way of using out-of-domain textual data to enhance the
    performance of existing image captioning systems. We evaluate this learning
    approach on a newly designed model that uses – and improves upon – building
    blocks from state-of-the-art methods. This model starts by detecting visual
    concepts present in an image, which are then fed to a reviewer-decoder
    architecture with an attention mechanism. Unlike previous approaches that
    encode visual concepts using word embeddings, we instead suggest using regional
    image features which capture more intrinsic information. The main benefit of
    this architecture is that it synthesizes meaningful thought vectors that
    capture salient image properties and then applies a soft attentive decoder to
    decode the thought vectors and generate image captions. We evaluate our model
    on both Microsoft COCO and Flickr30K datasets and demonstrate that this model
    combined with our bootstrap learning method can substantially improve performance and
    help the model to generate more accurate and diverse captions.
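
    The soft attention used while decoding can be illustrated as a softmax-weighted average of region feature vectors, with weights computed from each region's similarity to the current decoder state. Dot-product scoring is an assumption here; the abstract does not give the paper's scoring function.

```python
import math


def soft_attention(query, regions):
    """Return (weights, context): softmax over dot-product scores and the weighted sum of region vectors."""
    scores = [sum(q * r for q, r in zip(query, reg)) for reg in regions]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(regions[0])
    context = [sum(w * reg[d] for w, reg in zip(weights, regions)) for d in range(dim)]
    return weights, context
```

    Because the weights are a softmax rather than a hard choice, the context vector stays differentiable and the decoder can blend several regions when generating a word.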

    Guidefill: GPU Accelerated, Artist Guided Geometric Inpainting for 3D Conversion

    L. Robert Hocking, Russell MacKenzie, Carola-Bibiane Schoenlieb
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The conversion of traditional film into stereo 3D has become an important
    problem in the past decade. One of the main bottlenecks is a disocclusion step,
    which in commercial 3D conversion is usually done by teams of artists armed
    with a toolbox of inpainting algorithms. A current difficulty in this is that
    most available algorithms are either too slow for interactive use, or provide
    no intuitive means for users to tweak the output.

    In this paper we present a new fast inpainting algorithm based on
    transporting along automatically detected splines, which the user may edit. Our
    algorithm is implemented on the GPU and fills the inpainting domain in
    successive shells that adapt their shape on the fly. In order to allocate GPU
    resources as efficiently as possible, we propose a parallel algorithm to track
    the inpainting interface as it evolves, ensuring that no resources are wasted
    on pixels that are not currently being worked on. Theoretical analysis of the
    time and processor complexity of our algorithm without and with tracking (as
    well as numerous numerical experiments) demonstrates the merits of the latter.

    Our transport mechanism is similar to the one used in coherence transport,
    but improves upon it by correcting a “kinking” phenomenon whereby extrapolated
    isophotes may bend at the boundary of the inpainting domain. Theoretical
    results explaining this phenomenon and its resolution are presented.

    Although our method ignores texture, in many cases this is not a problem due
    to the thin inpainting domains in 3D conversion. Experimental results show that
    our method can achieve a visual quality that is competitive with the
    state-of-the-art while maintaining interactive speeds and providing the user
    with an intuitive interface to tweak the results.
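
    Filling the inpainting domain shell by shell (often called onion-peel order) can be sketched on a grayscale grid: each pass fills every unknown pixel that touches at least one known pixel, and the next pass treats those as known. This is only an illustration of the fill order under simplifying assumptions: Guidefill's transport weights along user-editable splines, its adaptive shell shapes, GPU parallelism and interface tracking are all omitted, and each boundary pixel simply takes the mean of its known 8-neighbours.

```python
def fill_in_shells(img, mask):
    """img: 2-D list of floats; mask: 2-D list of bools, True where the pixel is unknown.
    Returns (filled copy, number of shells used)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    unknown = {(i, j) for i in range(h) for j in range(w) if mask[i][j]}
    shells = 0
    while unknown:
        shell = {}
        for (i, j) in unknown:
            vals = [out[a][b]
                    for a in range(i - 1, i + 2) for b in range(j - 1, j + 2)
                    if 0 <= a < h and 0 <= b < w and (a, b) != (i, j) and (a, b) not in unknown]
            if vals:  # pixel lies on the current boundary of the unknown region
                shell[(i, j)] = sum(vals) / len(vals)
        if not shell:  # remaining pixels have no known neighbours at all
            break
        for (i, j), v in shell.items():
            out[i][j] = v
        unknown.difference_update(shell)
        shells += 1
    return out, shells
```

    Only the pixels of the current shell do useful work in each pass, which is the motivation for tracking the moving interface rather than visiting the whole domain every time.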

    Generalisation and Sharing in Triplet Convnets for Sketch based Visual Search

    Tu Bui, Leonardo Ribeiro, Moacir Ponti, John Collomosse
    Comments: submitted to CVPR2017 on 15Nov16
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose and evaluate several triplet CNN architectures for measuring the
    similarity between sketches and photographs, within the context of the sketch
    based image retrieval (SBIR) task. In contrast to recent fine-grained SBIR
    work, we study the ability of our networks to generalise across diverse object
    categories from limited training data, and explore in detail strategies for
    weight sharing, pre-processing, data augmentation and dimensionality reduction.
    We exceed the performance of pre-existing techniques on both the Flickr15k
    category level SBIR benchmark by 18%, and the TU-Berlin SBIR benchmark by
    $\sim 10\,\mathcal{T}_b$, when trained on the 250 category TU-Berlin
    classification dataset augmented with 25k corresponding photographs harvested
    from the Internet.
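
    Triplet networks are trained so that a sketch (the anchor) lies closer to a photo of the same category (the positive) than to a photo of a different category (the negative), by some margin. The standard hinge form of that loss over embedding vectors is sketched below; the margin and the use of plain Euclidean distance are assumptions, and the branch architectures and weight-sharing variants that the paper compares are not modeled.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss: zero once d(anchor, negative) exceeds d(anchor, positive) by at least `margin`."""
    return max(0.0, margin + euclidean(anchor, positive) - euclidean(anchor, negative))
```

    Some formulations use squared distances instead; the hinge structure is the same.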

    DeMeshNet: Blind Face Inpainting for Deep MeshFace Verification

    Shu Zhang, Ran He, Tieniu Tan
    Comments: 10 pages, submitted to CVPR 17
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    MeshFace photos have been widely used in many Chinese business organizations
    to protect ID face photos from being misused. The occlusions caused by random
    meshes severely degrade the performance of face verification systems, which
    raises the MeshFace verification problem between MeshFace and daily photos.
    Previous methods cast this problem as a typical low-level vision problem, i.e.
    blind inpainting. They recover perceptually pleasing clear ID photos from
    MeshFaces by enforcing pixel level similarity between the recovered ID images
    and the ground-truth clear ID images and then perform face verification on
    them. Essentially, face verification is conducted on a compact feature space
    rather than the image pixel space. Therefore, this paper argues that pixel
    level similarity and feature level similarity jointly offer the key to improve
    the verification performance. Based on this insight, we offer a novel feature
    oriented blind face inpainting framework. Specifically, we implement this by
    establishing a novel DeMeshNet, which consists of three parts. The first part
    addresses blind inpainting of the MeshFaces by implicitly exploiting extra
    supervision from the occlusion position to enforce pixel level similarity. The
    second part explicitly enforces a feature level similarity in the compact
    feature space, which can explore informative supervision from the feature space
    to produce better inpainting results for verification. The last part copes with
    face alignment within the net via a customized spatial transformer module when
    extracting deep facial features. All the three parts are implemented within an
    end-to-end network that facilitates efficient optimization. Extensive
    experiments on two MeshFace datasets demonstrate the effectiveness of the
    proposed DeMeshNet as well as the insight of this paper.

    Temporal Convolutional Networks for Action Segmentation and Detection

    Colin Lea, Michael D. Flynn, Rene Vidal, Austin Reiter, Gregory D. Hager
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The ability to identify and temporally segment fine-grained human actions
    throughout a video is crucial for robotics, surveillance, education, and
    beyond. Typical approaches decouple this problem by first extracting local
    spatiotemporal features from video frames and then feeding them into a temporal
    classifier that captures high-level temporal patterns. We introduce a new class
    of temporal models, which we call Temporal Convolutional Networks (TCNs), that
    use a hierarchy of temporal convolutions to perform fine-grained action
    segmentation or detection. Our Encoder-Decoder TCN uses pooling and upsampling
    to efficiently capture long-range temporal patterns whereas our Dilated TCN
    uses dilated convolutions. We show that TCNs are capable of capturing action
    compositions, segment durations, and long-range dependencies, and are over an
    order of magnitude faster to train than competing LSTM-based Recurrent Neural Networks.
    We apply these models to three challenging fine-grained datasets and show large
    improvements over the state of the art.
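
    Stacking dilated temporal convolutions lets the receptive field grow exponentially with depth while the number of weights per layer stays fixed. The sketch below implements a causal dilated 1-D convolution (zero-padded on the left) and the receptive field of a stack with doubling dilations; causal padding, kernel size 2 and the dilation schedule are assumptions for illustration, not settings taken from the paper.

```python
def dilated_causal_conv(x, kernel, dilation):
    """y[t] = sum_k kernel[k] * x[t - k*dilation], treating indices before 0 as zeros."""
    y = []
    for t in range(len(x)):
        s = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                s += w * x[idx]
        y.append(s)
    return y


def receptive_field(kernel_size, dilations):
    """Number of input steps that can influence one output after stacking the given layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


print(receptive_field(2, [1, 2, 4, 8]))  # 16
```

    Four layers with kernel size 2 already see 16 time steps; an undilated stack would need 15 layers for the same reach.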

    Real-Time Video Super-Resolution with Spatio-Temporal Networks and Motion Compensation

    Jose Caballero, Christian Ledig, Andrew Aitken, Alejandro Acosta, Johannes Totz, Zehan Wang, Wenzhe Shi
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Convolutional neural networks have enabled accurate image super-resolution in
    real-time. However, recent attempts to benefit from temporal correlations in
    video super-resolution have been limited to naive or inefficient architectures.
    In this paper, we introduce spatio-temporal sub-pixel convolution networks that
    effectively exploit temporal redundancies and improve reconstruction accuracy
    while maintaining real-time speed. Specifically, we discuss the use of early
    fusion, slow fusion and 3D convolutions for the joint processing of multiple
    consecutive video frames. We also propose a novel joint motion compensation and
    video super-resolution algorithm that is orders of magnitude more efficient
    than competing methods, relying on a fast multi-resolution spatial transformer
    module that is end-to-end trainable. These contributions provide both higher
    accuracy and temporally more consistent videos, which we confirm qualitatively
    and quantitatively. Relative to single-frame models, spatio-temporal networks
    can either reduce the computational cost by 30% whilst maintaining the same
    quality or provide a 0.2dB gain for a similar computational cost. Results on
    publicly available datasets demonstrate that the proposed algorithms surpass
    current state-of-the-art performance in both accuracy and efficiency.
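
    Sub-pixel convolution produces r² low-resolution output channels and then rearranges them into a single image r times larger in each dimension, so all convolutions run at low resolution. The rearrangement step is sketched below; the mapping from channel index to sub-pixel offset is one common convention and is assumed here.

```python
def pixel_shuffle(channels, r):
    """channels: r*r maps of size H x W. Returns one (r*H) x (r*W) map.
    Channel c fills sub-pixel offset (c // r, c % r) of each r x r output block."""
    assert len(channels) == r * r
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0.0] * (w * r) for _ in range(h * r)]
    for c, fmap in enumerate(channels):
        dy, dx = divmod(c, r)
        for i in range(h):
            for j in range(w):
                out[i * r + dy][j * r + dx] = fmap[i][j]
    return out
```

    The shuffle itself has no weights; the learned part is the preceding low-resolution convolution that produces the r² channels.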

    Deep Transfer Learning for Person Re-identification

    Mengyue Geng, Yaowei Wang, Tao Xiang, Yonghong Tian
    Comments: 10 pages, 1 figure
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Person re-identification (Re-ID) poses a unique challenge to deep learning:
    how to learn a deep model with millions of parameters on a small training set
    of few or no labels. In this paper, a number of deep transfer learning models
    are proposed to address the data sparsity problem. First, a deep network
    architecture is designed which differs from existing deep Re-ID models in that
    (a) it is more suitable for transferring representations learned from large
    image classification datasets, and (b) classification loss and verification
    loss are combined, each of which adopts a different dropout strategy. Second, a
    two-step fine-tuning strategy is developed to transfer knowledge from
    auxiliary datasets. Third, given an unlabelled Re-ID dataset, a novel
    unsupervised deep transfer learning model is developed based on co-training.
    The proposed models outperform the state-of-the-art deep Re-ID models by large
    margins: we achieve Rank-1 accuracy of 85.4%, 83.7% and 56.3% on CUHK03,
    Market1501, and VIPeR respectively, whilst on VIPeR, our unsupervised model
    (45.1%) beats most supervised models.
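
    Rank-1 accuracy is the fraction of query images whose nearest gallery image, in the learned feature space, shows the same person. A minimal sketch with Euclidean distance follows; the features are assumed to be precomputed, and benchmark protocol details such as excluding same-camera gallery matches are omitted.

```python
import math


def rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids):
    """Fraction of queries whose closest gallery feature has the same identity."""
    hits = 0
    for qf, qid in zip(query_feats, query_ids):
        dists = [math.dist(qf, gf) for gf in gallery_feats]
        nearest = min(range(len(dists)), key=dists.__getitem__)
        hits += gallery_ids[nearest] == qid
    return hits / len(query_ids)
```

    `math.dist` requires Python 3.8 or later.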

    A Combinatorial Solution to Non-Rigid 3D Shape-to-Image Matching

    Florian Bernard, Frank R. Schmidt, Johan Thunberg, Daniel Cremers
    Comments: 10 pages, 7 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC)

    We propose a combinatorial solution for the problem of non-rigidly matching a
    3D shape to 3D image data. To this end, we model the shape as a triangular mesh
    and allow each triangle of this mesh to be rigidly transformed to achieve a
    suitable matching to the image. By penalising the distance and the relative
    rotation between neighbouring triangles, our matching compromises between the
    image and the shape information. In this paper, we resolve two major
    challenges: Firstly, we address the resulting large and NP-hard combinatorial
    problem with a suitable graph-theoretic approach. Secondly, we propose an
    efficient discretisation of the unbounded 6-dimensional Lie group SE(3). To our
    knowledge this is the first combinatorial formulation for non-rigid 3D
    shape-to-image matching. In contrast to existing local (gradient descent)
    optimisation methods, we obtain solutions that do not require a good
    initialisation and that are within a bound of the optimal solution. We evaluate
    the proposed combinatorial method on the two problems of non-rigid 3D
    shape-to-shape and non-rigid 3D shape-to-image registration and demonstrate
    that it provides promising results.

    shuttleNet: A biologically-inspired RNN with loop connection and parameter sharing

    Yemin Shi, Yonghong Tian, Yaowei Wang, Tiejun Huang
    Comments: 10 pages, 9 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Despite considerable research effort in recent years, efficiently learning
    long-term dependencies from sequences remains a challenging task. Recurrent
    neural networks (RNNs), one of the key models for sequence learning, and
    their variants such as long short-term memory (LSTM) and gated recurrent
    units (GRU) are still not powerful enough in practice. One possible reason is
    that they have only feedforward connections, unlike biological neural
    networks, which are typically composed of both feedforward and feedback
    connections. To address the problem, this paper proposes a
    biologically-inspired RNN structure, called shuttleNet, by introducing loop
    connections in the network and utilizing parameter sharing to prevent
    overfitting. Unlike the traditional RNNs, the cells of shuttleNet are loop
    connected to mimic the brain’s feedforward and feedback connections. The
    structure is then stretched in the depth dimension to generate a deeper model
    with multiple information flow paths, while the parameters are shared to
    prevent shuttleNet from overfitting. An attention mechanism is then applied
    to select the best information path. Extensive experiments are conducted on
    two datasets for action recognition: UCF101 and HMDB51. We find that our
    model outperforms LSTMs and GRUs by a remarkable margin. Even when only the
    LSTMs in a CNN-RNN network are replaced with shuttleNet, we still achieve
    state-of-the-art performance on both datasets.

    Joint Network based Attention for Action Recognition

    Yemin Shi, Yonghong Tian, Yaowei Wang, Tiejun Huang
    Comments: 8 pages, 5 figures, JNA
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    By extracting spatial and temporal characteristics in one network, the
    two-stream ConvNets can achieve the state-of-the-art performance in action
    recognition. However, such a framework typically suffers from the separate
    processing of spatial and temporal information in the two standalone
    streams, and it struggles to capture the long-term temporal dependence of an
    action. More importantly, it is incapable of finding the salient portions of
    an action, e.g., the frames that are the most discriminative for identifying
    the action. To address these problems, a joint network based attention (JNA)
    is proposed in this study. We find that fully-connected fusion, branch
    selection and spatial attention mechanisms are unsuitable for action
    recognition. Thus, in our joint network, the spatial and temporal branches
    share some information during the training stage. We also introduce an
    attention mechanism in the temporal domain to capture the long-term
    dependence while finding the salient portions. Extensive experiments are conducted on
    two benchmark datasets, UCF101 and HMDB51. Experimental results show that our
    method can improve the action recognition performance significantly and
    achieves the state-of-the-art results on both datasets.

    Will People Like Your Image?

    Katharina Schwarz, Patrick Wieschollek, Hendrik P.A. Lensch
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The wide distribution of digital devices as well as cheap storage allow us to
    take series of photos making sure not to miss any specific beautiful moment.
    Thereby, the huge and constantly growing image assembly makes it quite
    time-consuming to manually pick the best shots afterwards. Even more
    challenging, finding the most aesthetically pleasing images that might also be
    worth sharing is a largely subjective task in which general rules rarely apply.
    Nowadays, online platforms allow users to “like” or favor certain content with
    a single click. As we aim to predict the aesthetic quality of images, we now
    make use of such multi-user agreements. More precisely, we assemble a large
    data set of 380K images with associated meta information and derive a score to
    rate how visually pleasing a given photo is. To predict the aesthetic quality of
    any arbitrary image or video, we transfer the … Our proposed model of aesthetics
    is validated in a user study. We demonstrate our results on applications for
    resorting photo collections, capturing the best shot on mobile devices and
    aesthetic key-frame extraction from videos.

    One-Shot Video Object Segmentation

    Sergi Caelles, Kevis-Kokitsi Maninis, Jordi Pont-Tuset, Laura Leal-Taixé, Daniel Cremers, Luc Van Gool
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper tackles the task of semi-supervised video object segmentation,
    i.e., the separation of an object from the background in a video, given the
    mask of the first frame. We present One-Shot Video Object Segmentation (OSVOS),
    based on a fully-convolutional neural network architecture that is able to
    successively transfer generic semantic information, learned on ImageNet, to the
    task of foreground segmentation, and finally to learning the appearance of a
    single annotated object of the test sequence (hence one-shot). Although all
    frames are processed independently, the results are temporally coherent and
    stable. We perform experiments on three annotated video segmentation databases,
    which show that OSVOS is fast and improves the state of the art by a
    significant margin (79.8% vs 68.0%).

    Variational Deep Embedding: A Generative Approach to Clustering

    Zhuxi Jiang, Yin Zheng, Huachun Tan, Bangsheng Tang, Hanning Zhou
    Comments: 8 pages, 5 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Clustering is among the most fundamental tasks in computer vision and machine
    learning. In this paper, we propose Variational Deep Embedding (VaDE), a novel
    unsupervised generative clustering approach within the framework of Variational
    Auto-Encoder (VAE). Specifically, VaDE models the data generative procedure
    with a Gaussian Mixture Model (GMM) and a deep neural network (DNN): 1) the GMM
    picks a cluster; 2) from which a latent embedding is generated; 3) then the DNN
    decodes the latent embedding into observables. Inference in VaDE is done in a
    variational way: a different DNN is used to encode observables to latent
    embeddings, so that the evidence lower bound (ELBO) can be optimized using
    Stochastic Gradient Variational Bayes (SGVB) estimator and the
    reparameterization trick. Quantitative comparisons with strong baselines are
    included in this paper, and experimental results show that VaDE significantly
    outperforms the state-of-the-art clustering methods on 4 benchmarks from
    various modalities. Moreover, owing to VaDE’s generative nature, we show its
    capability of generating highly realistic samples for any specified cluster,
    without using supervised information during training. Lastly, VaDE is a
    flexible and extensible framework for unsupervised generative clustering:
    mixture models more general than the GMM can easily be plugged in.
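    The three-step generative procedure above can be illustrated with a small
    numpy sketch. It samples a cluster from a Gaussian mixture prior, draws a
    latent embedding from that component, and decodes it; it also computes the
    cluster responsibilities p(c | z) under the same prior, which is, as we
    understand it, the form VaDE uses to assign clusters from an encoded z. The
    decoder here is a hypothetical random tanh layer standing in for the DNN,
    and no encoder or ELBO training is shown.

```python
import numpy as np

def sample_vade(pi, mu, sigma, decode, n, rng):
    """Draw n samples following the story: cluster -> latent embedding -> observation."""
    k, d = mu.shape
    c = rng.choice(k, size=n, p=pi)                       # 1) the GMM picks a cluster
    z = mu[c] + sigma[c] * rng.standard_normal((n, d))    # 2) latent embedding from that component
    x = decode(z)                                         # 3) a decoder maps z to observables
    return c, z, x

def cluster_posterior(z, pi, mu, sigma):
    """Responsibilities p(c | z) under the diagonal-Gaussian mixture prior."""
    diff = z[:, None, :] - mu[None, :, :]                 # shape (n, k, d)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * sigma**2)[None] + diff**2 / sigma[None]**2, axis=2)
    log_joint = np.log(pi)[None, :] + log_norm            # log p(c) + log p(z | c)
    log_joint -= log_joint.max(axis=1, keepdims=True)     # numerical stability
    resp = np.exp(log_joint)
    return resp / resp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.5])
mu = np.array([[-5.0, 0.0], [5.0, 0.0]])
sigma = np.ones((2, 2))
W_dec = rng.standard_normal((2, 4))
decode = lambda z: np.tanh(z @ W_dec)   # hypothetical stand-in for the decoder DNN
c, z, x = sample_vade(pi, mu, sigma, decode, 200, rng)
pred = cluster_posterior(z, pi, mu, sigma).argmax(axis=1)
print("agreement with the sampled clusters:", (pred == c).mean())
```

    Sampling a fixed c instead of drawing it from pi is what lets a generative
    model of this kind produce samples "for any specified cluster".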

    Cost-Sensitive Deep Learning with Layer-Wise Cost Estimation

    Yu-An Chung, Hsuan-Tien Lin
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    While deep neural networks have succeeded in several visual applications,
    such as object recognition, detection, and localization, by reaching very high
    classification accuracies, it is important to note that many real-world
    applications demand varying costs for different types of misclassification
    errors, thus requiring cost-sensitive classification algorithms. Current models
    of deep neural networks for cost-sensitive classification are restricted to
    some specific network structures and limited depth. In this paper, we propose a
    novel framework that can be applied to deep neural networks with any structure
    to facilitate their learning of meaningful representations for cost-sensitive
    classification problems. Furthermore, the framework allows end-to-end training
    of deeper networks directly. The framework is designed by augmenting auxiliary
    neurons to the output of each hidden layer for layer-wise cost estimation, and
    including the total estimation loss within the optimization objective.
    Experimental results on public benchmark visual data sets with two cost
    information settings demonstrate that the proposed framework outperforms
    state-of-the-art cost-sensitive deep learning models.
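    The layer-wise cost estimation objective can be sketched as follows, assuming
    a cost matrix C where C[y, k] is the cost of predicting class k when the true
    class is y. Each auxiliary head (one per hidden layer, plus the output)
    regresses the cost row C[y], the per-head losses are summed, and the final
    prediction is the class with the lowest estimated cost. Squared error is our
    assumption for the regression loss, and the head outputs below are
    hypothetical numbers rather than real network activations.

```python
import numpy as np

def cost_regression_loss(estimates, cost_row):
    """Squared error between estimated per-class costs and the true cost vector.
    (Squared error is an assumption; other regression losses could be used.)"""
    return float(np.sum((estimates - cost_row) ** 2))

def total_layerwise_loss(layer_estimates, y, C):
    """Sum the cost-estimation loss over every auxiliary head and the output layer."""
    target = C[y]  # row y: cost of predicting each class when the truth is y
    return sum(cost_regression_loss(e, target) for e in layer_estimates)

def predict(final_estimates):
    """Cost-sensitive decision: choose the class with the lowest estimated cost."""
    return int(np.argmin(final_estimates))

# Hypothetical 3-class cost matrix: confusing classes 0 and 2 is expensive.
C = np.array([[0.0, 1.0, 10.0],
              [1.0, 0.0, 1.0],
              [10.0, 1.0, 0.0]])
heads = [np.array([0.5, 1.2, 8.0]),   # auxiliary head after hidden layer 1
         np.array([0.2, 1.1, 9.5]),   # auxiliary head after hidden layer 2
         np.array([0.1, 0.9, 9.9])]   # output-layer estimate
loss = total_layerwise_loss(heads, y=0, C=C)
print(loss, predict(heads[-1]))
```

    Because every hidden layer receives its own cost-estimation signal, the
    gradient reaches early layers directly, which is one plausible reading of
    why such auxiliary neurons help deeper networks train end to end.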

    Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning

    Tien-Ju Yang, Yu-Hsin Chen, Vivienne Sze
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep convolutional neural networks (CNNs) are indispensable to
    state-of-the-art computer vision algorithms. However, they are still rarely
    deployed on battery-powered mobile devices, such as smartphones and wearable
    gadgets, where vision algorithms can enable many revolutionary real-world
    applications. The key limiting factor is the high energy consumption of CNN
    processing due to its high computational complexity. While there are many
    previous efforts that try to reduce the CNN model size or amount of
    computation, we find that reducing these quantities does not necessarily
    lower energy consumption; model size and computation are therefore not good
    metrics for estimating energy cost.

    To close the gap between CNN design and energy consumption optimization, we
    propose an energy-aware pruning algorithm for CNNs that directly uses energy
    consumption estimation of a CNN to guide the pruning process. The energy
    estimation methodology uses parameters extrapolated from actual hardware
    measurements that target realistic battery-powered system setups. The proposed
    layer-by-layer pruning algorithm also prunes more aggressively than previously
    proposed pruning methods by minimizing the error in output feature maps instead
    of filter weights. For each layer, the weights are first pruned and then
    locally fine-tuned with a closed-form least-squares solution to quickly restore
    the accuracy. After all layers are pruned, the entire network is further
    globally fine-tuned using back-propagation. With the proposed pruning method,
    the energy consumption of AlexNet and GoogLeNet is reduced by 3.7x and 1.6x,
    respectively, with less than 1% top-5 accuracy loss. Finally, we show that
    pruning the AlexNet with a reduced number of target classes can greatly
    decrease the number of weights but the energy reduction is limited.
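    The per-layer "prune, then restore with a closed-form least-squares solution"
    step can be sketched for a layer written as a matrix product Y = X W (for a
    convolution, X would hold im2col patches). The sketch keeps the largest-magnitude
    weights per output channel, which is our assumption about the selection rule,
    and then refits only the surviving weights to minimise the error in the output
    feature map rather than in the weights. The energy model that decides which
    layers to prune first is not shown.

```python
import numpy as np

def prune_and_refit(X, W, keep_ratio):
    """Prune a linear layer Y = X @ W, then restore accuracy with least squares.

    X: (n, d_in) layer inputs, W: (d_in, d_out) weights.
    Weights are kept by magnitude (an assumption); the refit minimises
    ||Y[:, j] - X[:, support] @ w|| for each output channel j.
    """
    Y = X @ W                                      # original outputs to reproduce
    d_in, d_out = W.shape
    k = max(1, int(round(keep_ratio * d_in)))      # surviving weights per output channel
    W_pruned = np.zeros_like(W)
    W_refit = np.zeros_like(W)
    for j in range(d_out):
        support = np.argsort(-np.abs(W[:, j]))[:k] # indices of the largest-magnitude weights
        W_pruned[support, j] = W[support, j]
        # Closed-form least squares restricted to the surviving weights
        w, *_ = np.linalg.lstsq(X[:, support], Y[:, j], rcond=None)
        W_refit[support, j] = w
    return Y, W_pruned, W_refit

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 32)) @ rng.standard_normal((32, 32))  # correlated inputs
W = rng.standard_normal((32, 16))
Y, W_p, W_r = prune_and_refit(X, W, keep_ratio=0.25)
err_pruned = np.linalg.norm(Y - X @ W_p)
err_refit = np.linalg.norm(Y - X @ W_r)
print(f"output error: pruned only {err_pruned:.1f}, after refit {err_refit:.1f}")
```

    Since the pruned weights are themselves a feasible point of the least-squares
    problem, the refit can never increase the output error; with correlated
    inputs it usually recovers a large part of it.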

    Learning To Score Olympic Events

    Paritosh Parmar, Brendan Tran Morris
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    While action recognition has been addressed extensively in the field of
    computer vision, action quality assessment has not been given much attention.
    Estimating action quality is crucial in areas such as sports and health care,
    while being useful in other areas like video retrieval. Unlike action
    recognition, which has millions of examples to learn from, the action quality
    datasets that are currently available are small, typically consisting of only
    a few hundred samples. We develop quality assessment frameworks which use SVR,
    LSTM and LSTM-SVR on top of spatiotemporal features learned using 3D
    convolutional neural networks (C3D). We demonstrate an efficient training
    mechanism for action quality LSTM suitable for limited data scenarios. The
    proposed systems show significant improvement over existing quality assessment
    approaches on the task of predicting scores of Olympic events both with
    short-time length actions (10m platform diving) and long-time length actions
    (figure skating short program). While SVR-based frameworks yield better
    results, LSTM-based frameworks are more intuitive and natural for describing
    the action, and can be used for improvement feedback.

    The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives

    Mohit Iyyer, Varun Manjunatha, Anupam Guha, Yogarshi Vyas, Jordan Boyd-Graber, Hal Daumé III, Larry Davis
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

    Visual narrative is often a combination of explicit information and judicious
    omissions, relying on the viewer to supply missing details. In comics, most
    movements in time and space are hidden in the “gutters” between panels. To
    follow the story, readers logically connect panels together by inferring unseen
    actions through a process called “closure”. While computers can now describe
    the content of natural images, in this paper we examine whether they can
    understand the closure-driven narratives conveyed by stylized artwork and
    dialogue in comic book panels. We collect a dataset, COMICS, that consists of
    over 1.2 million panels (120 GB) paired with automatic textbox transcriptions.
    An in-depth analysis of COMICS demonstrates that neither text nor image alone
    can tell a comic book story, so a computer must understand both modalities to
    keep up with the plot. We introduce three cloze-style tasks that ask models to
    predict narrative and character-centric aspects of a panel given n preceding
    panels as context. Various deep neural architectures underperform human
    baselines on these tasks, suggesting that COMICS contains fundamental
    challenges for both vision and language.

    Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations

    Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, Teddy Furon, Ondrej Chum
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Query expansion is a popular method to improve the quality of image retrieval
    with both conventional and CNN representations. So far, it has been limited to
    global image similarity. This work focuses on diffusion, a mechanism that
    captures the image manifold in the feature space. The diffusion is carried out
    on descriptors of overlapping image regions rather than on a global image
    descriptor as in previous approaches. An efficient off-line stage allows an
    optional reduction in the number of stored regions. In the on-line stage, the
    proposed handling of queries unseen during indexing removes the need for
    additional computation to adjust the precomputed data. A novel way to perform diffusion
    through a sparse linear system solver yields practical query times well below
    one second. Experimentally, we observe a significant boost in performance of
    image retrieval with compact CNN descriptors on standard benchmarks, especially
    when the query object covers only a small part of the image. Small objects have
    been a common failure case of CNN-based retrieval.
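    Diffusion as a linear system can be sketched with the classic manifold-ranking
    formulation: build a k-NN affinity graph W over descriptors, normalise it to
    S = D^-1/2 W D^-1/2, and solve (I - alpha S) f = y, where y marks the query.
    The sketch uses conjugate gradient, a standard solver for this symmetric
    positive-definite system; the graph parameters (k, gamma, alpha) and the dense
    matrices are assumptions for a small example, whereas a real index would store
    S sparsely and would diffuse over region descriptors rather than whole images.

```python
import numpy as np

def knn_affinity(feats, k=5, gamma=3):
    """Symmetric k-NN graph over L2-normalised descriptors, weights = similarity**gamma."""
    sims = feats @ feats.T
    np.fill_diagonal(sims, -np.inf)                  # a node is not its own neighbour
    W = np.zeros(sims.shape)
    for i in range(len(sims)):
        nn = np.argsort(-sims[i])[:k]                # k most similar descriptors
        W[i, nn] = np.maximum(sims[i, nn], 0.0) ** gamma
    return np.maximum(W, W.T)                        # symmetrise the graph

def normalized_affinity(W):
    """S = D^-1/2 W D^-1/2 (isolated nodes get zero rows)."""
    deg = W.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    return d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = A @ p
        step = rs / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def diffuse(W, query_idx, alpha=0.99):
    """Diffusion scores f solving (I - alpha * S) f = y, with y marking the query."""
    S = normalized_affinity(W)
    y = np.zeros(len(W))
    y[query_idx] = 1.0
    A = np.eye(len(W)) - alpha * S                   # SPD because eig(S) lies in [-1, 1]
    return conjugate_gradient(A, y)

rng = np.random.default_rng(0)
centers = rng.standard_normal((2, 16))
feats = np.vstack([c + 0.3 * rng.standard_normal((20, 16)) for c in centers])
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
W = knn_affinity(feats, k=5)
f = diffuse(W, query_idx=0)
print("mean score, query's manifold vs other:", f[:20].mean(), f[20:].mean())
```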

    Low-rank Bilinear Pooling for Fine-Grained Classification

    Shu Kong, Charless Fowlkes
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Pooling second-order local feature statistics to form a high-dimensional
    bilinear feature has been shown to achieve state-of-the-art performance on a
    variety of fine-grained classification tasks. To address the computational
    demands of high feature dimensionality, we propose to represent the covariance
    features as a matrix and apply a low-rank bilinear classifier. The resulting
    classifier can be evaluated without explicitly computing the bilinear feature
    map which allows for a large reduction in the compute time as well as
    decreasing the effective number of parameters to be learned.

    To further compress the model, we propose classifier co-decomposition that
    factorizes the collection of bilinear classifiers into a common factor and
    compact per-class terms. The co-decomposition idea can be deployed through two
    convolutional layers and trained in an end-to-end architecture. We suggest a
    simple yet effective initialization that avoids explicitly first training and
    factorizing the larger bilinear classifiers. Through extensive experiments, we
    show that our model achieves state-of-the-art performance on several public
    datasets for fine-grained classification trained with only category labels.
    Importantly, our final model is an order of magnitude smaller than the recently
    proposed compact bilinear model, and three orders of magnitude smaller than
    the standard bilinear CNN model.
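    The claim that a low-rank bilinear classifier "can be evaluated without
    explicitly computing the bilinear feature map" follows from a trace identity.
    For local features X (n locations by c channels), the bilinear feature is
    X^T X, and for a symmetric classifier written as W = U+ U+^T - U- U-^T we have
    <W, X^T X> = ||X U+||_F^2 - ||X U-||_F^2, which costs O(n c r) instead of
    O(n c^2). The sketch checks this numerically and adds a co-decomposition
    variant in which every class shares a projection P and keeps only small
    factors; this particular parametrisation is our assumption about how the
    idea could be written, not necessarily the paper's exact layer design.

```python
import numpy as np

def bilinear_score_explicit(X, W):
    """Score via the explicit bilinear feature: <W, X^T X> (Frobenius inner product)."""
    B = X.T @ X                        # c x c second-order (bilinear) feature
    return float(np.sum(W * B))

def bilinear_score_lowrank(X, U_pos, U_neg):
    """Same score for W = U_pos U_pos^T - U_neg U_neg^T, without forming X^T X."""
    return float(np.sum((X @ U_pos) ** 2) - np.sum((X @ U_neg) ** 2))

def scores_codecomposed(X, P, V_pos, V_neg):
    """Co-decomposition sketch: all classes share P (c x m); class k keeps only
    V_pos[k], V_neg[k] (m x r), i.e. U_pos = P @ V_pos[k], U_neg = P @ V_neg[k]."""
    Z = X @ P                          # computed once for all classes
    return np.array([np.sum((Z @ vp) ** 2) - np.sum((Z @ vn) ** 2)
                     for vp, vn in zip(V_pos, V_neg)])

rng = np.random.default_rng(0)
X = rng.standard_normal((196, 64))     # e.g. a 14x14 grid of 64-channel features
U_pos = rng.standard_normal((64, 4))
U_neg = rng.standard_normal((64, 4))
W = U_pos @ U_pos.T - U_neg @ U_neg.T
s_exp = bilinear_score_explicit(X, W)
s_low = bilinear_score_lowrank(X, U_pos, U_neg)

P = rng.standard_normal((64, 8))
V_pos = rng.standard_normal((3, 8, 2))
V_neg = rng.standard_normal((3, 8, 2))
class_scores = scores_codecomposed(X, P, V_pos, V_neg)
print(s_exp, s_low, class_scores)
```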

    Learning a Deep Embedding Model for Zero-Shot Learning

    Li Zhang, Tao Xiang, Shaogang Gong
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Zero-shot learning (ZSL) models rely on learning a joint embedding space
    onto which both the textual/semantic descriptions of object classes and the
    visual representations of object images can be projected for nearest-neighbour
    search. Despite the success of deep neural networks that learn an end-to-end
    model between text and images in other vision problems such as image
    captioning, very few deep ZSL models exist, and they show little advantage over
    ZSL models that utilise deep feature representations but do not learn an
    end-to-end embedding. In this paper we argue that the key to making deep ZSL
    models succeed is to choose the right embedding space. Instead of embedding
    into a semantic space or an intermediate space, we propose to use the visual
    space as the embedding space. This is because in this space, the
    subsequent nearest neighbour search would suffer much less from the hubness
    problem and thus become more effective. This model design also provides a
    natural mechanism for multiple semantic modalities (e.g., attributes and
    sentence descriptions) to be fused and optimised jointly in an end-to-end
    manner. Extensive experiments on four benchmarks show that our model
    significantly outperforms the existing models.
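    Using the visual space as the embedding space means that class semantics are
    mapped into visual-feature space and test images are classified by their
    nearest projected class prototype. The sketch below replaces the paper's deep
    embedding network with a linear ridge-regression map (an explicit
    simplification) and uses synthetic data, so it only illustrates the direction
    of the projection and the nearest-neighbour step.

```python
import numpy as np

def fit_semantic_to_visual(A, V, lam=1e-2):
    """Ridge regression M minimising ||A M - V||^2 + lam ||M||^2.
    A: (n_classes, d_sem) class semantic vectors; V: (n_classes, d_vis) visual features.
    A linear map is a stand-in for a deep semantic-to-visual embedding network."""
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ V)

def zero_shot_predict(x_vis, A_unseen, M):
    """Project unseen-class semantics into the visual space and pick the nearest one."""
    prototypes = A_unseen @ M
    dists = np.linalg.norm(prototypes - x_vis[None, :], axis=1)
    return int(np.argmin(dists))

rng = np.random.default_rng(0)
d_sem, d_vis = 8, 32
M_true = rng.standard_normal((d_sem, d_vis))          # synthetic "true" relation
A_seen = rng.standard_normal((20, d_sem))             # semantics of training classes
V_seen = A_seen @ M_true + 0.1 * rng.standard_normal((20, d_vis))
M = fit_semantic_to_visual(A_seen, V_seen)
A_unseen = rng.standard_normal((5, d_sem))            # classes never seen in training
x_test = A_unseen[2] @ M_true + 0.1 * rng.standard_normal(d_vis)
print(zero_shot_predict(x_test, A_unseen, M))
```

    Searching among visual-space prototypes, rather than projecting images into
    a lower-dimensional semantic space, is the design choice the abstract links
    to reduced hubness.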

    Learning Detailed Face Reconstruction from a Single Image

    Elad Richardson, Matan Sela, Roy Or-El, Ron Kimmel
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Reconstructing the detailed geometric structure of a face from a given image
    is key to many computer vision and graphics applications, such as motion
    capture and reenactment. The reconstruction task is challenging as human faces
    vary extensively when considering expressions, poses, textures, and intrinsic
    geometry. While many approaches tackle this complexity by using additional data
    to reconstruct the face of a single subject, extracting facial surface from a
    single image remains a difficult problem. As a result, single-image based
    methods can usually provide only a rough estimate of the facial geometry. In
    contrast, we propose to leverage the power of convolutional neural networks to
    produce a highly detailed face reconstruction from a single image. For this
    purpose, we introduce an end-to-end CNN framework which derives the shape in a
    coarse-to-fine fashion. The proposed architecture is composed of two main
    blocks, a network that recovers the coarse facial geometry (CoarseNet),
    followed by a CNN that refines the facial features of that geometry (FineNet).
    The proposed networks are connected by a novel layer which renders a depth
    image given a mesh in 3D. Unlike object recognition and detection problems,
    there are no suitable datasets for training CNNs to perform face geometry
    reconstruction. Therefore, our training regime begins with a supervised phase,
    based on synthetic images, followed by an unsupervised phase that uses only
    unconstrained facial images. The accuracy and robustness of the proposed model
    are demonstrated by both qualitative and quantitative evaluation tests.

    Image Credibility Analysis with Effective Domain Transferred Deep Networks

    Zhiwei Jin, Juan Cao, Jiebo Luo, Yongdong Zhang
    Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)

    Numerous fake images spread on social media today and can severely jeopardize
    the credibility of online content to the public. In this paper, we employ deep
    networks to learn distinct fake image related features. In contrast to
    authentic images, fake images tend to be eye-catching and visually striking.
    Compared with traditional visual recognition tasks, it is extremely challenging
    to understand these psychologically triggered visual patterns in fake images.
    Traditional general image classification datasets, such as ImageNet, are
    designed for feature learning at the object level but are not suitable for
    learning the hyper-features that would be required by image credibility
    analysis. In order to overcome the scarcity of training samples of fake images,
    we first construct a large-scale auxiliary dataset indirectly related to this
    task. This auxiliary dataset contains 0.6 million weakly-labeled fake and real
    images collected automatically from social media. Through an AdaBoost-like
    transfer learning algorithm, we train a CNN model with a few instances in the
    target training set and 0.6 million images in the collected auxiliary set. This
    learning algorithm is able to leverage knowledge from the auxiliary set and
    gradually transfer it to the target task. Experiments on a real-world testing
    set show that our proposed domain transferred CNN model outperforms several
    competing baselines. It obtains superior results over transfer learning
    methods based on the general ImageNet set. Moreover, case studies show that our
    proposed method reveals some interesting patterns for distinguishing fake and
    authentic images.
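    The abstract calls its algorithm "AdaBoost-like" without giving the update
    rule. A well-known algorithm of that family is TrAdaBoost (Dai et al., 2007),
    swapped in here purely for illustration: after each weak learner,
    misclassified auxiliary samples are down-weighted (they look unlike the target
    task) while misclassified target samples are up-weighted as in AdaBoost. The
    function name and the toy weights are hypothetical, and only one round of the
    weight update is shown.

```python
import numpy as np

def tradaboost_round(w_aux, w_tgt, err_aux, err_tgt, n_rounds):
    """One TrAdaBoost-style weight update.

    err_aux / err_tgt: 0/1 arrays marking which auxiliary / target samples the
    current weak learner misclassified. n_rounds is the total boosting rounds.
    """
    n_aux = len(w_aux)
    beta_aux = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n_aux) / n_rounds))  # fixed, < 1
    eps = np.sum(w_tgt * err_tgt) / np.sum(w_tgt)   # weighted error on target data
    eps = min(max(eps, 1e-10), 0.499)                # keep beta_tgt in (0, 1)
    beta_tgt = eps / (1.0 - eps)
    new_aux = w_aux * beta_aux ** err_aux            # shrink misclassified auxiliary samples
    new_tgt = w_tgt * beta_tgt ** (-err_tgt)         # boost misclassified target samples
    total = new_aux.sum() + new_tgt.sum()
    return new_aux / total, new_tgt / total          # renormalise to a distribution

w_aux = np.full(6, 0.1)    # e.g. weakly-labelled auxiliary images
w_tgt = np.full(4, 0.1)    # e.g. the few labelled target images
err_aux = np.array([1, 0, 0, 0, 0, 1])
err_tgt = np.array([1, 0, 0, 0])
new_aux, new_tgt = tradaboost_round(w_aux, w_tgt, err_aux, err_tgt, n_rounds=10)
print(new_aux, new_tgt)
```

    Repeating this over many rounds gradually shifts weight away from auxiliary
    samples that conflict with the target task, which matches the abstract's
    description of transferring knowledge "gradually".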

    Deep Variational Inference Without Pixel-Wise Reconstruction

    Siddharth Agrawal, Ambedkar Dukkipati
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Variational autoencoders (VAEs), which are built upon deep neural networks,
    have emerged as popular generative models in computer vision. Most of the work
    towards improving variational autoencoders has focused mainly on making the
    approximations to the posterior flexible and accurate, leading to tremendous
    progress. However, there have been limited efforts to replace pixel-wise
    reconstruction, which has known shortcomings. In this work, we use real-valued
    non-volume preserving transformations (real NVP) to exactly compute the
    conditional likelihood of the data given the latent distribution. We show that
    a simple VAE with this form of reconstruction is competitive with complicated
    VAE structures, on image modeling tasks. As part of our model, we develop
    powerful conditional coupling layers that enable real NVP to learn with fewer
    intermediate layers.
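    Computing the conditional likelihood exactly relies on the real NVP affine
    coupling layer: half of the input passes through unchanged, the other half is
    scaled and shifted by functions of the first half, so the Jacobian is
    triangular and its log-determinant is simply the sum of the log-scales. The
    sketch conditions the scale and shift on a latent code z as well; the tiny
    one-hidden-layer networks and the class name are hypothetical, and the paper's
    "conditional coupling layers" may be built differently.

```python
import numpy as np

class ConditionalAffineCoupling:
    """y1 = x1, y2 = x2 * exp(s) + t, where (s, t) depend on x1 and a latent z.
    log|det dy/dx| = sum(s), so log p(x | z) is exact."""

    def __init__(self, d, d_z, rng, hidden=16):
        self.h = d // 2
        self.W1 = 0.1 * rng.standard_normal((self.h + d_z, hidden))
        self.Ws = 0.1 * rng.standard_normal((hidden, d - self.h))
        self.Wt = 0.1 * rng.standard_normal((hidden, d - self.h))

    def _scale_shift(self, x1, z):
        hid = np.tanh(np.concatenate([x1, z], axis=1) @ self.W1)
        return np.tanh(hid @ self.Ws), hid @ self.Wt   # tanh keeps the log-scale bounded

    def forward(self, x, z):
        x1, x2 = x[:, :self.h], x[:, self.h:]
        s, t = self._scale_shift(x1, z)
        y = np.concatenate([x1, x2 * np.exp(s) + t], axis=1)
        return y, s.sum(axis=1)                        # output and per-sample log-determinant

    def inverse(self, y, z):
        y1, y2 = y[:, :self.h], y[:, self.h:]
        s, t = self._scale_shift(y1, z)                # y1 == x1, so s and t are recomputable
        return np.concatenate([y1, (y2 - t) * np.exp(-s)], axis=1)

def log_likelihood(x, z, layer):
    """log p(x | z) by change of variables with a standard normal base density."""
    y, log_det = layer.forward(x, z)
    log_base = -0.5 * np.sum(y**2 + np.log(2 * np.pi), axis=1)
    return log_base + log_det

rng = np.random.default_rng(0)
layer = ConditionalAffineCoupling(d=6, d_z=3, rng=rng)
x = rng.standard_normal((5, 6))
z = rng.standard_normal((5, 3))
y, log_det = layer.forward(x, z)
x_rec = layer.inverse(y, z)
print(log_likelihood(x, z, layer))
```

    Stacking several such layers (alternating which half is transformed) gives a
    flexible yet exactly invertible density; conditioning on z is what turns it
    into a replacement for the pixel-wise reconstruction term.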

    S3Pool: Pooling with Stochastic Spatial Sampling

    Shuangfei Zhai, Hui Wu, Abhishek Kumar, Yu Cheng, Yongxi Lu, Zhongfei Zhang, Rogerio Feris
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    Feature pooling layers (e.g., max pooling) in convolutional neural networks
    (CNNs) serve the dual purpose of providing increasingly abstract
    representations as well as yielding computational savings in subsequent
    convolutional layers. We view the pooling operation in CNNs as a two-step
    procedure: first, a pooling window (e.g., 2 × 2) slides over the feature
    map with stride one which leaves the spatial resolution intact, and second,
    downsampling is performed by selecting one pixel from each non-overlapping
    pooling window in an often uniform and deterministic (e.g., top-left) manner.
    Our starting point in this work is the observation that this regularly spaced
    downsampling arising from non-overlapping windows, although intuitive from a
    signal processing perspective (which has the goal of signal reconstruction), is
    not necessarily optimal for learning (where the goal is to generalize).
    We study this aspect and propose a novel pooling strategy with stochastic
    spatial sampling (S3Pool), where the regular downsampling is replaced by a more
    general stochastic version. We observe that this general stochasticity acts as
    a strong regularizer, and can also be seen as doing implicit data augmentation
    by introducing distortions in the feature maps. We further introduce a
    mechanism to control the amount of distortion to suit different datasets and
    architectures. To demonstrate the effectiveness of the proposed approach, we
    perform extensive experiments on several popular image classification
    benchmarks, observing excellent improvements over baseline models. Experimental
    code is available at this https URL
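    The two-step view of pooling can be sketched on a single-channel map: first a
    stride-one max pool that keeps the resolution, then a downsampling step where
    the regular "top-left of each window" choice is replaced by random row and
    column picks. Here each axis is split into `grid` blocks and, inside each
    block, block_size / stride sorted positions are sampled without replacement;
    more blocks means less distortion. The window size (equal to the stride), the
    edge padding and the parameter names are our assumptions, and test-time
    behaviour is not shown.

```python
import numpy as np

def max_pool_stride1(x, k=2):
    """k x k max pooling with stride 1; edge padding keeps the spatial size unchanged."""
    h, w = x.shape
    padded = np.pad(x, ((0, k - 1), (0, k - 1)), mode="edge")
    out = np.full((h, w), -np.inf)
    for di in range(k):
        for dj in range(k):
            out = np.maximum(out, padded[di:di + h, dj:dj + w])
    return out

def stochastic_indices(n, stride, grid, rng):
    """Split [0, n) into `grid` blocks; sample (block // stride) sorted indices per block."""
    block = n // grid
    picks = []
    for b in range(grid):
        chosen = rng.choice(block, size=block // stride, replace=False)
        picks.extend(b * block + np.sort(chosen))
    return np.array(picks)

def s3pool(x, stride=2, grid=2, rng=None):
    """Stochastic spatial sampling pooling on a single-channel (h, w) feature map."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = x.shape
    if h % (grid * stride) or w % (grid * stride):
        raise ValueError("feature map size must be divisible by grid * stride")
    pooled = max_pool_stride1(x, k=stride)           # step 1: resolution stays intact
    rows = stochastic_indices(h, stride, grid, rng)  # step 2: random, not top-left, rows
    cols = stochastic_indices(w, stride, grid, rng)  # ... and columns
    return pooled[np.ix_(rows, cols)]

rng = np.random.default_rng(0)
x = np.arange(64, dtype=float).reshape(8, 8)
out = s3pool(x, stride=2, grid=2, rng=rng)
rows = stochastic_indices(8, 2, 4, rng)   # grid = n / stride: one pick per 2-wide window
print(out)
print(rows)
```

    With grid equal to n / stride, each window contributes exactly one randomly
    placed index, the mildest distortion; grid = 1 lets the sampled rows and
    columns drift anywhere, acting as a stronger regulariser.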


    Artificial Intelligence

    ProjE: Embedding Projection for Knowledge Graph Completion

    Baoxu Shi, Tim Weninger
    Comments: 14 pages, Accepted to AAAI 2017
    Subjects: Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    With the large volume of new information created every day, determining the
    validity of information in a knowledge graph and filling in its missing parts
    are crucial tasks for many researchers and practitioners. To address this
    challenge, a number of knowledge graph completion methods have been developed
    using low-dimensional graph embeddings. Although researchers continue to
    improve these models using an increasingly complex feature space, we show that
    simple changes in the architecture of the underlying model can outperform
    state-of-the-art models without the need for complex feature engineering. In
    this work, we present a shared variable neural network model called ProjE that
    fills in missing information in a knowledge graph by learning joint embeddings
    of the knowledge graph’s entities and edges, and through subtle, but important,
    changes to the standard loss function. In doing so, ProjE has a parameter size
    smaller than that of 11 out of 15 existing methods while performing 37%
    better than the current-best method on standard datasets. We also show, via a
    new fact checking task, that ProjE is capable of accurately determining the
    veracity of many declarative statements.
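    One way to read "embedding projection" is to combine a head-entity embedding
    and a relation embedding into a single vector and then project it onto all
    candidate entity embeddings at once, yielding a score for every possible tail.
    The sketch below assumes a diagonal (element-wise) combination, a tanh
    nonlinearity and a softmax over candidates; these choices and all names are
    our assumptions rather than details stated in the abstract, but they show how
    such a model adds only a few vectors of parameters beyond the embeddings.

```python
import numpy as np

def softmax(v):
    v = v - v.max()                   # numerical stability
    e = np.exp(v)
    return e / e.sum()

def projection_scores(E, R, head, rel, d_e, d_r, b_c):
    """Score every entity as the tail of (head, rel, ?).

    E: (n_entities, k) entity embeddings; R: (n_relations, k) relation embeddings.
    d_e, d_r: length-k vectors used as diagonal weights; b_c: length-k bias.
    """
    combined = np.tanh(d_e * E[head] + d_r * R[rel] + b_c)  # combine head and relation
    return softmax(E @ combined)      # project onto all candidates: one probability each

rng = np.random.default_rng(0)
k = 8
E = rng.standard_normal((50, k))      # hypothetical entity embeddings
R = rng.standard_normal((4, k))       # hypothetical relation embeddings
d_e, d_r, b_c = np.ones(k), np.ones(k), np.zeros(k)
p = projection_scores(E, R, head=3, rel=1, d_e=d_e, d_r=d_r, b_c=b_c)
print("top-5 candidate tails:", np.argsort(-p)[:5])
```

    Beyond the embedding tables, this scorer has only 3k extra parameters
    (d_e, d_r, b_c), which illustrates how a projection-style model can stay small.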

    PCT and Beyond: Towards a Computational Framework for ‘Intelligent’ Communicative Systems

    Prof. Roger K. Moore
    Comments: To appear in A. McElhone & W. Mansell (Eds.), Living Control Systems IV: Perceptual Control Theory and the Future of the Life and Social Sciences, Benchmark Publications Inc
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Robotics (cs.RO)

    Recent years have witnessed increasing interest in the potential benefits of
    ‘intelligent’ autonomous machines such as robots. Honda’s Asimo humanoid robot,
    iRobot’s Roomba robot vacuum cleaner and Google’s driverless cars have fired
    the imagination of the general public, and social media buzz with speculation
    about a utopian world of helpful robot assistants or the coming robot
    apocalypse! However, there is a long way to go before autonomous systems reach
    the level of capabilities required for even the simplest of tasks involving
    human-robot interaction – especially if it involves communicative behaviour
    such as speech and language. Of course the field of Artificial Intelligence
    (AI) has made great strides in these areas, and has moved on from abstract
    high-level rule-based paradigms to embodied architectures whose operations are
    grounded in real physical environments. What is still missing, however, is an
    overarching theory of intelligent communicative behaviour that informs
    system-level design decisions in order to provide a more coherent approach to
    system integration. This chapter introduces the beginnings of such a framework
    inspired by the principles of Perceptual Control Theory (PCT). In particular,
    it is observed that PCT has hitherto tended to view perceptual processes as a
    relatively straightforward series of transformations from sensation to
    perception, and has overlooked the potential of powerful generative model-based
    solutions that have emerged in practical fields such as visual or auditory
    scene analysis. Starting from first principles, a sequence of arguments is
    presented which not only shows how these ideas might be integrated into PCT,
    but also extends PCT towards a remarkably symmetric architecture for a
    needs-driven communicative agent. It is concluded that, if behaviour is the
    control of perception, then perception is the simulation of behaviour.

    Driving CDCL Search

    Carmine Dodaro, Philip Gasteiger, Nicola Leone, Benjamin Musitsch, Francesco Ricca, Konstantin Schekotihin
    Comments: Paper presented at the 1st Workshop on Trends and Applications of Answer Set Programming (TAASP 2016), Klagenfurt, Austria, 26 September 2016, 15 pages, LaTeX, 5 figures
    Subjects: Artificial Intelligence (cs.AI)

    The CDCL algorithm is the leading solution adopted by state-of-the-art
    solvers for SAT, SMT, ASP, and others. Experiments show that the performance of
    CDCL solvers can be significantly boosted by embedding domain-specific
    heuristics, especially on large real-world problems. However, a proper
    integration of such criteria in off-the-shelf CDCL implementations is not
    obvious. In this paper, we distill the key ingredients that drive the search of
    CDCL solvers, and propose a general framework for designing and implementing
    new heuristics. We implemented our strategy in an ASP solver, and we
    experimented on two industrial domains. On hard problem instances,
    state-of-the-art implementations fail to find any solution in acceptable time,
    whereas our implementation is very successful and finds all solutions.

    The Effects of Relative Importance of User Constraints in Cloud of Things Resource Discovery: A Case Study

    Luiz H. Nunes, Julio C. Estrella, Alexandre C. B. Delbem, Charith Perera, Stephan Reiff-Marganiec
    Comments: Proceedings of the 9th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2016) Shanghai, China, December, 2016
    Journal-ref: Proceedings of the 9th IEEE/ACM International Conference on
    Utility and Cloud Computing (UCC 2016) Shanghai, China, December, 2016
    Subjects: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

    Over the last few years, the number of smart objects connected to the
    Internet has grown exponentially in comparison to the number of services and
    applications. The integration of Cloud Computing and the Internet of Things,
    known as the Cloud of Things, plays a key role in managing connected things,
    their data and their services. One of the main challenges in the Cloud of Things
    is resource discovery of smart objects and their reuse in different contexts.
    Most existing work uses some kind of multi-criteria decision analysis
    algorithm to perform resource discovery, but does not evaluate the impact
    that user constraints have on the final solution. In this paper, we analyse
    the behaviour of the SAW, TOPSIS and VIKOR multi-criteria decision analysis
    algorithms and the impact of user constraints on them. We evaluated the quality
    of the proposed solutions using the Pareto-optimality concept.
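    SAW and TOPSIS are standard multi-criteria ranking rules, and Pareto
    dominance is the usual yardstick for judging their output. The sketch below is
    a minimal plain-Python version of both plus a Pareto filter, assuming strictly
    positive criterion values, each flagged as a benefit (larger is better) or
    cost (smaller is better) criterion. The weights stand in for the relative
    importance of user constraints; the criteria and numbers in the usage lines
    are hypothetical, and VIKOR is omitted.

```python
import math

def saw(matrix, weights, benefit):
    """Simple Additive Weighting: linear normalization per criterion, then a weighted sum."""
    cols = list(zip(*matrix))
    scores = []
    for row in matrix:
        total = 0.0
        for j, value in enumerate(row):
            if benefit[j]:
                norm = value / max(cols[j])      # larger is better
            else:
                norm = min(cols[j]) / value      # smaller is better (e.g. latency, price)
            total += weights[j] * norm
        scores.append(total)
    return scores

def topsis(matrix, weights, benefit):
    """TOPSIS: relative closeness of each alternative to the ideal solution (1 = ideal)."""
    cols = list(zip(*matrix))
    norms = [math.sqrt(sum(v * v for v in col)) for col in cols]
    # Vector-normalize each criterion, then scale it by its weight.
    v = [[weights[j] * row[j] / norms[j] for j in range(len(row))] for row in matrix]
    vcols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(vcols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(vcols)]
    scores = []
    for r in v:
        d_plus, d_minus = math.dist(r, ideal), math.dist(r, anti)
        scores.append(d_minus / (d_plus + d_minus))
    return scores

def pareto_front(matrix, benefit):
    """Indices of alternatives that no other alternative dominates."""
    def at_least_as_good(a, b):
        return all((x >= y) if ben else (x <= y) for x, y, ben in zip(a, b, benefit))
    return [i for i, a in enumerate(matrix)
            if not any(at_least_as_good(b, a) and b != a for b in matrix)]

# Hypothetical smart objects scored on (accuracy, latency in ms, price).
objects = [[0.90, 120, 5.0], [0.70, 40, 2.0], [0.95, 300, 8.0]]
benefit = [True, False, False]
print(topsis(objects, [0.6, 0.2, 0.2], benefit))  # accuracy-heavy user constraints
print(topsis(objects, [0.2, 0.6, 0.2], benefit))  # latency-heavy user constraints
print(pareto_front(objects, benefit))
```

    Shifting weight between criteria can reorder the TOPSIS scores, which is the
    kind of sensitivity to user constraints the study examines.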

    Machine Learning Approach for Skill Evaluation in Robotic-Assisted Surgery

    Mahtab J. Fard, Sattar Ameri, Ratna B. Chinnam, Abhilash K. Pandya, Michael D. Klein, R. Darin Ellis
    Journal-ref: Lecture Notes in Engineering and Computer Science: Proceedings of
    The World Congress on Engineering and Computer Science 2016, 19-21 October,
    2016, San Francisco, USA
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Evaluating surgeon skill has predominantly been a subjective task, and the
    development of objective methods for surgical skill assessment is of increasing
    interest. Recently, with technological advances such as robotic-assisted
    minimally invasive surgery (RMIS), new opportunities for objective and
    automated assessment frameworks have arisen. In this paper, we apply machine
    learning methods to automatically evaluate the performance of surgeons in RMIS.
    Six important movement features were used in the evaluation: completion
    time, path length, depth perception, speed, smoothness and curvature.
    Different classification methods were applied to discriminate expert and
    novice surgeons. We test our method on real surgical data for a suturing task
    and compare the classification results with the ground truth data (obtained by
    manual labeling). The experimental results show that the proposed framework can
    classify surgical skill level with a relatively high accuracy of 85.7%. This
    study demonstrates the ability of machine learning methods to automatically
    classify expert and novice surgeons using movement features for different RMIS
    tasks. Due to the simplicity and generalizability of the introduced
    classification method, it is easy to implement in existing trainers.
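    Several of the movement features listed above can be computed directly from a
    sampled tool-tip trajectory. A minimal sketch, assuming uniformly sampled 3-D
    positions and using mean squared jerk as the smoothness measure (a common
    choice, not necessarily the paper's definition); depth perception and
    curvature are left out. The resulting feature vectors would then be passed to
    whichever classifier is being compared.

```python
import math

def movement_features(positions, dt):
    """Kinematic features of a tool-tip trajectory sampled every `dt` seconds.

    positions: list of (x, y, z) tuples. Returns completion time, path length,
    mean speed and mean squared jerk (a smoothness proxy; lower is smoother).
    """
    completion_time = (len(positions) - 1) * dt
    path_length = sum(math.dist(p, q) for p, q in zip(positions, positions[1:]))
    mean_speed = path_length / completion_time

    def derivative(samples):
        # Forward finite difference, applied per axis.
        return [tuple((b[k] - a[k]) / dt for k in range(3))
                for a, b in zip(samples, samples[1:])]

    jerk = derivative(derivative(derivative(positions)))
    mean_sq_jerk = sum(sum(c * c for c in j) for j in jerk) / len(jerk) if jerk else 0.0
    return {
        "completion_time": completion_time,
        "path_length": path_length,
        "mean_speed": mean_speed,
        "mean_squared_jerk": mean_sq_jerk,
    }

# Hypothetical trajectory: a straight stroke sampled every 0.5 s.
print(movement_features([(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)], 0.5))
```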

    Grammar Argumented LSTM Neural Networks with Note-Level Encoding for Music Composition

    Zheng Sun, Jiaqi Liu, Zewang Zhang, Jingwen Chen, Zhao Huo, Ching Hua Lee, Xiao Zhang
    Comments: 5 pages, 5 figures
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD)

    Creating an aesthetically pleasing piece of art, such as music, has long
    been a dream of artificial intelligence research. Building on the recent success of
    long short-term memory (LSTM) networks in sequence learning, we put forward a novel
    system that reflects the thinking pattern of a musician. For data representation,
    we propose a note-level encoding method, which enables our model to simulate
    how humans compose and polish musical phrases. To avoid violating music
    theory, we introduce a novel method, the grammar argumented (GA) method, which
    teaches the machine basic composing principles. In this method, we propose three
    rules as argumented grammars and three metrics for evaluating machine-made music.
    Results show that, compared to a basic LSTM, the grammar argumented model’s
    compositions contain higher proportions of diatonic scale notes, short pitch
    intervals, and chords.
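    The abstract does not define its evaluation metrics precisely, so the
    following is only a sketch of two of them under stated assumptions: melodies
    are encoded note by note as hypothetical (MIDI pitch, duration in beats)
    pairs, the key is C major, and a "short" interval is taken to mean at most two
    semitones.

```python
# Pitch classes of the C major scale (C D E F G A B), with C = 0.
C_MAJOR = {0, 2, 4, 5, 7, 9, 11}

def diatonic_ratio(notes, scale=C_MAJOR):
    """Fraction of notes whose pitch class belongs to the given scale."""
    return sum(1 for pitch, _ in notes if pitch % 12 in scale) / len(notes)

def short_interval_ratio(notes, max_semitones=2):
    """Fraction of consecutive note pairs separated by at most `max_semitones`."""
    pitches = [pitch for pitch, _ in notes]
    intervals = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    return sum(1 for i in intervals if i <= max_semitones) / len(intervals)

# Hypothetical note-level melody: (MIDI pitch, duration in beats).
melody = [(60, 1.0), (62, 1.0), (64, 0.5), (65, 0.5), (66, 1.0), (72, 2.0)]
print(diatonic_ratio(melody))        # 5 of 6 notes are in C major (F# is not)
print(short_interval_ratio(melody))  # 4 of 5 intervals are at most 2 semitones
```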

    Neural Style Representations and the Large-Scale Classification of Artistic Style

    Jeremiah Johnson
    Comments: 10 pages, 4 figures, 2 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Applications (stat.AP); Machine Learning (stat.ML)

    The artistic style of a painting is a subtle aesthetic judgment used by art
    historians for grouping and classifying artwork. The recently introduced
    ‘neural-style’ algorithm substantially succeeds in merging the perceived
    artistic style of one image or set of images with the perceived content of
    another. In light of this and other recent developments in image analysis via
    convolutional neural networks, we investigate the effectiveness of a
    ‘neural-style’ representation for classifying the artistic style of paintings.
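    In the original neural-style algorithm, style is represented by Gram matrices
    of convolutional feature maps, i.e. correlations between channels with the
    spatial layout summed out. A minimal NumPy sketch of turning one layer's
    activations into a fixed-length style vector that a classifier could consume,
    assuming a (channels, height, width) array and normalization by spatial size
    (implementations differ on this constant); the random array is a hypothetical
    stand-in for real CNN features.

```python
import numpy as np

def gram_style_vector(feature_map):
    """Style descriptor from one CNN layer, in the spirit of neural-style.

    feature_map: array of shape (channels, height, width).
    Returns the upper triangle of the channel-by-channel Gram matrix as a flat vector.
    """
    c, h, w = feature_map.shape
    flat = feature_map.reshape(c, h * w)
    gram = flat @ flat.T / (h * w)      # normalization by spatial size is one common choice
    rows, cols = np.triu_indices(c)     # the Gram matrix is symmetric, so keep one triangle
    return gram[rows, cols]

# Hypothetical activations standing in for one layer of a CNN applied to a painting.
rng = np.random.default_rng(0)
features = rng.random((64, 14, 14))
style = gram_style_vector(features)
print(style.shape)  # (2080,) = 64 * 65 / 2
```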

    Variable Neighborhood Search Algorithms for the multi-depot dial-a-ride problem with heterogeneous vehicles and users

    Paolo Detti, Garazi Zabalo Manrique de Lara
    Subjects: Discrete Mathematics (cs.DM); Artificial Intelligence (cs.AI)

    In this work, a study of Variable Neighborhood Search (VNS) algorithms for
    multi-depot dial-a-ride problems is presented. In dial-a-ride problems, patients
    need to be transported from pre-specified pickup locations to pre-specified
    delivery locations under different considerations. The addressed problem
    presents several constraints and features, such as heterogeneous vehicles
    distributed across different depots, and heterogeneous patients. The aim is to
    minimize the total routing cost while respecting time-window, ride-time,
    capacity and route-duration constraints. The objective of the study is to
    determine the best algorithm configuration in terms of initial solution,
    neighborhood and local search procedures. To this end, two different procedures
    for computing an initial solution, six different types of neighborhoods
    and five local search procedures, in which only intra-route changes are made,
    have been considered and compared.

    We have also evaluated an “adjusting procedure” that aims to produce feasible
    solutions from infeasible solutions with small constraint violations. The
    different VNS algorithms have been tested on instances from literature as well
    as on random instances arising from a real-world healthcare application.
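    The core VNS loop alternates a random "shake" in neighborhood k with local
    search, returning to the first neighborhood after an improvement and moving to
    the next one otherwise. The sketch below shows only that control structure on
    a single closed route with the depot fixed at position 0. It uses hypothetical
    swap, relocate and segment-reversal neighborhoods with 2-opt as the local
    search, and deliberately ignores the time-window, ride-time, capacity,
    duration and multi-depot constraints of the actual problem.

```python
import math
import random

def route_cost(route, coords):
    """Length of the closed tour route[0] -> ... -> route[-1] -> route[0]."""
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(route, route[1:] + route[:1]))

def shake(route, k, rng):
    """Random move in neighborhood k: 0 = swap, 1 = relocate, 2 = reverse a segment.
    Position 0 (the depot) is never moved."""
    r = route[:]
    i, j = sorted(rng.sample(range(1, len(r)), 2))
    if k == 0:
        r[i], r[j] = r[j], r[i]
    elif k == 1:
        r.insert(j, r.pop(i))
    else:
        r[i:j + 1] = r[i:j + 1][::-1]
    return r

def local_search(route, coords):
    """First-improvement 2-opt (intra-route segment reversal) until no move improves."""
    best, best_cost = route[:], route_cost(route, coords)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cost = route_cost(candidate, coords)
                if cost < best_cost - 1e-12:
                    best, best_cost, improved = candidate, cost, True
    return best, best_cost

def vns(route, coords, k_max=3, iterations=50, seed=0):
    """Basic VNS: shake in neighborhood k, apply local search, then move or advance k."""
    rng = random.Random(seed)
    best, best_cost = local_search(route, coords)
    for _ in range(iterations):
        k = 0
        while k < k_max:
            candidate, cost = local_search(shake(best, k, rng), coords)
            if cost < best_cost - 1e-12:
                best, best_cost, k = candidate, cost, 0   # improvement: restart from first neighborhood
            else:
                k += 1                                    # otherwise try the next neighborhood
    return best, best_cost

# Hypothetical example: depot 0 plus three stops on the corners of a unit square.
coords = [(0, 0), (0, 1), (1, 1), (1, 0)]
route, cost = vns([0, 2, 1, 3], coords)
print(route, cost)
```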


    Information Retrieval

    Simple Yet Effective Methods for Large-Scale Scholarly Publication Ranking

    Drahomira Herrmannova, Petr Knoth
    Comments: WSDM Cup 2016 – Entity Ranking Challenge. The 9th ACM International Conference on Web Search and Data Mining, San Francisco, CA, USA. February 22-25, 2016
    Subjects: Information Retrieval (cs.IR); Digital Libraries (cs.DL)

    With the growing amount of published research, automatic evaluation of
    scholarly publications is becoming an important task. In this paper we address
    this problem and present a simple and transparent approach for evaluating the
    importance of scholarly publications. Our method has been ranked among the top
    performers in the WSDM Cup 2016 Challenge. The first part of this paper
    describes our method. In the second part we present potential improvements to
    the method and analyse the evaluation setup which was provided during the
    challenge. Finally, we discuss future challenges in automatic evaluation of
    papers, including the use of full-text-based evaluation methods.

    Neural Style Representations and the Large-Scale Classification of Artistic Style

    Jeremiah Johnson
    Comments: 10 pages, 4 figures, 2 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Applications (stat.AP); Machine Learning (stat.ML)

    The artistic style of a painting is a subtle aesthetic judgment used by art
    historians for grouping and classifying artwork. The recently introduced
    ‘neural-style’ algorithm substantially succeeds in merging the perceived
    artistic style of one image or set of images with the perceived content of
    another. In light of this and other recent developments in image analysis via
    convolutional neural networks, we investigate the effectiveness of a
    ‘neural-style’ representation for classifying the artistic style of paintings.


    Computation and Language

    A Long Dependency Aware Deep Architecture for Joint Chinese Word Segmentation and POS Tagging

    Xinchi Chen, Xipeng Qiu, Xuanjing Huang
    Subjects: Computation and Language (cs.CL)

    Long-term context is crucial to the joint Chinese word segmentation and POS
    tagging (S&T) task. However, most machine-learning-based methods extract
    features from a window of characters. Due to the limited window size,
    these methods cannot exploit long-distance information. In this work, we
    propose a long-dependency-aware deep architecture for the joint S&T task.
    Specifically, to simulate the feature templates of traditional models based on
    discrete features, we use different filters to model complex compositional
    features with convolutional and pooling layers, and then exploit long-distance
    dependency information with a recurrent layer. Experimental results on five
    different datasets show the effectiveness of our proposed model.
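    Joint S&T models are commonly trained on per-character tags that combine a
    word-boundary position with a POS label, so one sequence labeler yields both
    the segmentation and the tags. Whether this paper uses exactly this scheme is
    an assumption; the sketch below decodes hypothetical B/M/E/S-plus-POS tags
    (B = begin, M = middle, E = end, S = single-character word, with
    CTB-style POS labels chosen for illustration) into (word, POS) pairs.

```python
def decode_joint_tags(chars, tags):
    """Convert per-character joint tags such as 'B-NR', 'E-NR', 'S-VV' into (word, POS) pairs.

    Boundary prefixes: B = begin, M = middle, E = end, S = single-character word.
    """
    words, current, pos = [], "", None
    for ch, tag in zip(chars, tags):
        boundary, label = tag.split("-", 1)
        if boundary in ("B", "S"):
            if current:                      # close a word left open by an ill-formed sequence
                words.append((current, pos))
            current, pos = ch, label
        else:                                # M or E extends the current word
            current += ch
            pos = pos or label
        if boundary in ("E", "S"):
            words.append((current, pos))
            current, pos = "", None
    if current:                              # flush a trailing unfinished word
        words.append((current, pos))
    return words

# "I love Beijing": two single-character words followed by a two-character proper noun.
print(decode_joint_tags("我爱北京", ["S-PN", "S-VV", "B-NR", "E-NR"]))
# [('我', 'PN'), ('爱', 'VV'), ('北京', 'NR')]
```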

    The Life of Lazarillo de Tormes and of His Machine Learning Adversities

    Javier de la Rosa, Juan-Luis Suárez
    Comments: 66 pages, 11 figures
    Journal-ref: Lemir: Revista de Literatura Española Medieval y del
    Renacimiento, 20 (2016)
    Subjects: Computation and Language (cs.CL)

    Summit work of the Spanish Golden Age and forefather of the so-called
    picaresque novel, The Life of Lazarillo de Tormes and of His Fortunes and
    Adversities still remains an anonymous text. Although distinguished scholars
    have tried to attribute it to different authors based on a variety of criteria,
    a consensus has yet to be reached. The list of candidates is long and not all
    of them enjoy the same support within the scholarly community. Analyzing their
    works from a data-driven perspective and applying machine learning techniques
    for style and text fingerprinting, we shed light on the authorship of the
    Lazarillo. As in a state-of-the-art survey, we discuss the methods used and how
    they perform in our specific case. According to our methodology, the most
    likely author seems to be Juan Arce de Otálora, closely followed by Alfonso
    de Valdés. The method states that no certain attribution can be made with
    the given corpus.
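    One classic style-fingerprinting method is Burrows' Delta: compare z-scored
    relative frequencies of frequent words between a disputed text and each
    candidate's writing. The sketch below is that textbook method, not
    necessarily the paper's pipeline; it assumes whitespace tokenization, a small
    hand-picked vocabulary, and hypothetical candidate labels and texts.

```python
import statistics
from collections import Counter

def relative_frequencies(text, vocabulary):
    """Relative frequency of each vocabulary word in a whitespace-tokenized, lowercased text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    return [counts[word] / len(tokens) for word in vocabulary]

def burrows_delta(candidates, disputed, vocabulary):
    """Burrows' Delta between a disputed text and each candidate's text (lower = closer)."""
    profiles = {author: relative_frequencies(text, vocabulary) for author, text in candidates.items()}
    columns = list(zip(*profiles.values()))
    means = [statistics.mean(col) for col in columns]
    # Guard against zero spread when a word has identical frequency in every candidate.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]

    def z_scores(freqs):
        return [(f - m) / s for f, m, s in zip(freqs, means, stdevs)]

    target = z_scores(relative_frequencies(disputed, vocabulary))
    return {
        author: sum(abs(a - b) for a, b in zip(z_scores(freqs), target)) / len(vocabulary)
        for author, freqs in profiles.items()
    }

# Hypothetical toy corpus; real studies use thousands of words per author.
vocabulary = ["el", "la", "de", "que", "y", "en", "por"]
candidates = {"A": "el la de el la que", "B": "y en y en por por que"}
print(burrows_delta(candidates, "la el que de la el", vocabulary))
```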

    How to do lexical quality estimation of a large OCRed historical Finnish newspaper collection with scarce resources

    Kimmo Kettunen, Tuula Pääkkönen
    Comments: 24 pages, 6 tables, 6 figures
    Subjects: Computation and Language (cs.CL)

    The National Library of Finland has digitized the historical newspapers
    published in Finland between 1771 and 1910. This collection contains
    approximately 1.95 million pages in Finnish and Swedish. The Finnish part of the
    collection consists of about 2.40 billion words. The National Library’s Digital
    Collections are offered via the digi.kansalliskirjasto.fi web service, also
    known as Digi. Part of the newspaper material (from 1771 to 1874) is also
    freely available for download in The Language Bank of Finland provided by the
    FINCLARIN consortium. The collection can also be accessed through the Korp
    environment, which has been developed by Språkbanken at the University of
    Gothenburg and extended by the FINCLARIN team at the University of Helsinki to
    provide concordances of text resources. A Cranfield style information retrieval
    test collection has also been produced out of a small part of the Digi
    newspaper material at the University of Tampere.

    Quality of OCRed collections is an important topic in digital humanities, as
    it affects general usability and searchability of collections. There is no
    single available method to assess quality of large collections, but different
    methods can be used to approximate quality. This paper discusses different
    corpus analysis style methods to approximate overall lexical quality of the
    Finnish part of the Digi collection. Methods include usage of parallel samples
    and word error rates, usage of morphological analyzers, frequency analysis of
    words and comparisons to comparable edited lexical data. Our aim in the quality
    analysis is twofold: firstly to analyze the present state of the lexical data
    and secondly, to establish a set of assessment methods that build up a compact
    procedure for quality assessment after e.g. new OCRing or post correction of
    the material. In the discussion part of the paper we shall synthesize results
    of our different analyses.
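    For the parallel-sample method, word error rate is the word-level edit
    distance between OCR output and a hand-corrected version of the same text,
    divided by the length of the corrected text. A minimal sketch, assuming simple
    whitespace tokenization (real evaluations usually normalize punctuation and
    case first); the sample strings are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))          # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # delete a reference word
                         cur[j - 1] + 1,           # insert a hypothesis word
                         prev[j - 1] + (r != h))   # substitute, or match at no cost
        prev = cur
    return prev[-1] / len(ref)

corrected = "suomen kansa on vanha"   # hypothetical hand-corrected line
ocr_output = "suomen kausa on vauha"  # hypothetical OCR output with two misread words
print(word_error_rate(corrected, ocr_output))  # 0.5
```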

    A Way out of the Odyssey: Analyzing and Combining Recent Insights for LSTMs

    Shayne Longpre, Sabeek Pradhan, Caiming Xiong, Richard Socher
    Subjects: Computation and Language (cs.CL)

    LSTMs have become a basic building block for many deep NLP models. In recent
    years, many improvements and variations have been proposed for deep sequence
    models in general, and LSTMs in particular. We propose and analyze a series of
    architectural modifications for LSTM networks, resulting in improved performance
    on text classification datasets. We observe compounding improvements on
    traditional LSTMs using Monte Carlo test-time model averaging, deep vector
    averaging (DVA), and residual connections, along with four other suggested
    modifications. Our analysis provides a simple, reliable, and high quality
    baseline model.

    PCT and Beyond: Towards a Computational Framework for ‘Intelligent’ Communicative Systems

    Prof. Roger K. Moore
    Comments: To appear in A. McElhone & W. Mansell (Eds.), Living Control Systems IV: Perceptual Control Theory and the Future of the Life and Social Sciences, Benchmark Publications Inc
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Robotics (cs.RO)

    Recent years have witnessed increasing interest in the potential benefits of
    ‘intelligent’ autonomous machines such as robots. Honda’s Asimo humanoid robot,
    iRobot’s Roomba robot vacuum cleaner and Google’s driverless cars have fired
    the imagination of the general public, and social media buzz with speculation
    about a utopian world of helpful robot assistants or the coming robot
    apocalypse! However, there is a long way to go before autonomous systems reach
    the level of capabilities required for even the simplest of tasks involving
    human-robot interaction – especially if it involves communicative behaviour
    such as speech and language. Of course the field of Artificial Intelligence
    (AI) has made great strides in these areas, and has moved on from abstract
    high-level rule-based paradigms to embodied architectures whose operations are
    grounded in real physical environments. What is still missing, however, is an
    overarching theory of intelligent communicative behaviour that informs
    system-level design decisions in order to provide a more coherent approach to
    system integration. This chapter introduces the beginnings of such a framework
    inspired by the principles of Perceptual Control Theory (PCT). In particular,
    it is observed that PCT has hitherto tended to view perceptual processes as a
    relatively straightforward series of transformations from sensation to
    perception, and has overlooked the potential of powerful generative model-based
    solutions that have emerged in practical fields such as visual or auditory
    scene analysis. Starting from first principles, a sequence of arguments is
    presented which not only shows how these ideas might be integrated into PCT,
    but which also extends PCT towards a remarkably symmetric architecture for a
    needs-driven communicative agent. It is concluded that, if behaviour is the
    control of perception, then perception is the simulation of behaviour.

    The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives

    Mohit Iyyer, Varun Manjunatha, Anupam Guha, Yogarshi Vyas, Jordan Boyd-Graber, Hal Daumé III, Larry Davis
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

    Visual narrative is often a combination of explicit information and judicious
    omissions, relying on the viewer to supply missing details. In comics, most
    movements in time and space are hidden in the “gutters” between panels. To
    follow the story, readers logically connect panels together by inferring unseen
    actions through a process called “closure”. While computers can now describe
    the content of natural images, in this paper we examine whether they can
    understand the closure-driven narratives conveyed by stylized artwork and
    dialogue in comic book panels. We collect a dataset, COMICS, that consists of
    over 1.2 million panels (120 GB) paired with automatic textbox transcriptions.
    An in-depth analysis of COMICS demonstrates that neither text nor image alone
    can tell a comic book story, so a computer must understand both modalities to
    keep up with the plot. We introduce three cloze-style tasks that ask models to
    predict narrative and character-centric aspects of a panel given n preceding
    panels as context. Various deep neural architectures underperform human
    baselines on these tasks, suggesting that COMICS contains fundamental
    challenges for both vision and language.


    Distributed, Parallel, and Cluster Computing

    File Synchronization Systems Survey

    Zulqarnain Mehdi, Hani Ragab-Hassen
    Comments: The Sixth International Conference on Computer Science, Engineering & Applications (ICCSEA 2016)
    Journal-ref: ICCSEA 2016
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Several solutions exist for file storage, sharing, and synchronization. Many
    of them involve a central server, or a collection of servers, that either store
    the files, or act as a gateway for them to be shared. Some systems take a
    decentralized approach, wherein interconnected users form a peer-to-peer (P2P)
    network, and partake in the sharing process: they share the files they possess
    with others, and can obtain the files owned by other peers. In this paper, we
    survey various technologies, both cloud-based and P2P-based, that users use to
    synchronize their files across the network, and discuss their strengths and
    weaknesses.

    Possibility and Impossibility of Reliable Broadcast in the Bounded Model

    Danny Dolev, Meir Spielrien
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    The Reliable Broadcast concept allows an honest party to send a message to
    all other parties and to make sure that all honest parties receive this
    message. In addition, it allows an honest party that received a message to know
    that all other honest parties would also receive the same message. This
    technique is important to ensure distributed consistency when facing failures.

    In the current paper, we study the ability to use Reliable Broadcast to consistently
    transmit a sequence of input values in an asynchronous environment with a
    designated sender. The task can be easily achieved using counters, but cannot
    be achieved with bounded memory in the presence of failures. We weaken the problem and
    ask whether the receivers can at least share a common suffix. We prove that in
    a standard (lossless) asynchronous system no bounded memory protocol can
    guarantee a common suffix at all receivers for every input sequence if a single
    party might crash.

    We further study the problem in the presence of transient faults. When the
    problem is limited to transmitting a stream of a single value sent
    repeatedly, we show a bounded-memory self-stabilizing protocol that can ensure a
    common suffix even in the presence of transient faults and an arbitrary number
    of crash faults. We further prove that this last problem is not solvable in the
    presence of a single Byzantine fault. Thus, this problem separates
    Byzantine behavior from crash faults in an asynchronous environment.

    The Effects of Relative Importance of User Constraints in Cloud of Things Resource Discovery: A Case Study

    Luiz H. Nunes, Julio C. Estrella, Alexandre C. B. Delbem, Charith Perera, Stephan Reiff-Marganiec
    Comments: Proceedings of the 9th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2016) Shanghai, China, December, 2016
    Journal-ref: Proceedings of the 9th IEEE/ACM International Conference on
    Utility and Cloud Computing (UCC 2016) Shanghai, China, December, 2016
    Subjects: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

    Over the last few years, the number of smart objects connected to the
    Internet has grown exponentially in comparison to the number of services and
    applications. The integration of Cloud Computing and the Internet of Things,
    known as the Cloud of Things, plays a key role in managing connected things,
    their data and their services. One of the main challenges in the Cloud of Things
    is resource discovery of smart objects and their reuse in different contexts.
    Most existing work uses some kind of multi-criteria decision analysis
    algorithm to perform resource discovery, but does not evaluate the impact
    that user constraints have on the final solution. In this paper, we analyse
    the behaviour of the SAW, TOPSIS and VIKOR multi-criteria decision analysis
    algorithms and the impact of user constraints on them. We evaluated the quality
    of the proposed solutions using the Pareto-optimality concept.


    Learning

    Grammar Argumented LSTM Neural Networks with Note-Level Encoding for Music Composition

    Zheng Sun, Jiaqi Liu, Zewang Zhang, Jingwen Chen, Zhao Huo, Ching Hua Lee, Xiao Zhang
    Comments: 5 pages, 5 figures
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD)

    Creating an aesthetically pleasing piece of art, such as music, has long
    been a dream of artificial intelligence research. Building on the recent success of
    long short-term memory (LSTM) networks in sequence learning, we put forward a novel
    system that reflects the thinking pattern of a musician. For data representation,
    we propose a note-level encoding method, which enables our model to simulate
    how humans compose and polish musical phrases. To avoid violating music
    theory, we introduce a novel method, the grammar argumented (GA) method, which
    teaches the machine basic composing principles. In this method, we propose three
    rules as argumented grammars and three metrics for evaluating machine-made music.
    Results show that, compared to a basic LSTM, the grammar argumented model’s
    compositions contain higher proportions of diatonic scale notes, short pitch
    intervals, and chords.

    ZipML: An End-to-end Bitwise Framework for Dense Generalized Linear Models

    Hantian Zhang, Kaan Kara, Jerry Li, Dan Alistarh, Ji Liu, Ce Zhang
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We present ZipML, the first framework for training dense generalized linear
    models using end-to-end low-precision representation: in ZipML, all movements
    of data, including those for input samples, model, and gradients, are
    represented using as little as two bits per component. Within our framework, we
    have successfully compressed, separately, the input data by 16x, gradient by
    16x, and model by 16x while still getting the same training result. Even for
    the most challenging datasets, we find that robust convergence can be ensured
    using only an end-to-end 8-bit representation or a 6-bit representation if only
    samples are quantized.

    Our work builds on previous research on using low-precision representations
    for gradient and model in the context of stochastic gradient descent. Our main
    technical contribution is a new set of techniques which allow the training
    samples to be processed with low precision, without affecting the convergence
    of the algorithm. In turn, this leads to a system where all data items move in
    a quantized, low precision format. In particular, we first establish that
    randomized rounding, while sufficient when quantizing the model and the
    gradients, is biased when quantizing samples, and thus leads to a different
    training result. We propose two new data representations which converge to the
    same solution as in the original data representation both in theory and
    empirically and require as little as 2 bits per component. As a result, if the
    original data is stored as 32-bit floats, we decrease the bandwidth footprint
    for each training iteration by up to 16x. Our results hold for models such as
    linear regression and least squares SVM.

    ZipML raises interesting theoretical questions related to the robustness of
    SGD to approximate data, model, and gradient representations. We conclude this
    working paper with a description of ongoing work extending these preliminary
    results.
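    The bias mentioned above is easy to see in one dimension. Randomized rounding
    gives E[Q(x)] = x, but for a product of the same quantized sample
    E[Q(x)^2] = x^2 + Var[Q(x)], and least-squares gradients contain exactly such
    x x^T terms. Using two independent quantizations removes the extra variance
    term, since E[Q1(x) Q2(x)] = x^2. The NumPy sketch below checks this
    numerically with a 1-bit grid for clarity; whether this matches either of the
    paper's two data representations is not stated in the abstract.

```python
import numpy as np

def stochastic_round(x, levels, rng):
    """Randomized rounding of values in [0, 1] onto `levels` evenly spaced points.

    Rounds up with probability equal to the fractional distance, so E[Q(x)] = x.
    """
    scaled = x * (levels - 1)
    low = np.floor(scaled)
    round_up = rng.random(x.shape) < (scaled - low)
    return (low + round_up) / (levels - 1)

rng = np.random.default_rng(0)
x = np.full(200_000, 0.5)          # many copies of one value, to estimate expectations
q1 = stochastic_round(x, 2, rng)   # 1-bit grid {0, 1}
q2 = stochastic_round(x, 2, rng)   # an independent second quantization

print(q1.mean())         # close to x = 0.5: unbiased when used linearly
print((q1 * q1).mean())  # close to 0.5, not x**2 = 0.25: biased for products
print((q1 * q2).mean())  # close to 0.25: independent copies remove the bias
```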

    Reinforcement Learning with Unsupervised Auxiliary Tasks

    Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z Leibo, David Silver, Koray Kavukcuoglu
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Deep reinforcement learning agents have achieved state-of-the-art results by
    directly maximising cumulative reward. However, environments contain a much
    wider variety of possible training signals. In this paper, we introduce an
    agent that also maximises many other pseudo-reward functions simultaneously by
    reinforcement learning. All of these tasks share a common representation that,
    like unsupervised learning, continues to develop in the absence of extrinsic
    rewards. We also introduce a novel mechanism for focusing this representation
    upon extrinsic rewards, so that learning can rapidly adapt to the most relevant
    aspects of the actual task. Our agent significantly outperforms the previous
    state-of-the-art on Atari, averaging 880% expert human performance, and on a
    challenging suite of first-person, three-dimensional Labyrinth tasks,
    leading to a mean speedup in learning of 10× and averaging 87% expert
    human performance on Labyrinth.

    Spectral Convolution Networks

    Maria Francesca, Arthur Hughes, David Gregg
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Previous research has shown that computation of convolution in the frequency
    domain provides a significant speedup versus traditional convolution network
    implementations. However, this performance increase comes at the expense of
    repeatedly computing the transform and its inverse in order to apply other
    network operations such as activation, pooling, and dropout. We show,
    mathematically, how convolution and activation can both be implemented in the
    frequency domain using either the Fourier or Laplace transformation. The main
    contributions are a description of spectral activation under the Fourier
    transform and a further description of an efficient algorithm for computing
    both convolution and activation under the Laplace transform. By computing both
    the convolution and activation functions in the frequency domain, we can reduce
    the number of transforms required, as well as reducing overall complexity. Our
    description of a spectral activation function, together with existing spectral
    analogs of other network functions may then be used to compose a fully spectral
    implementation of a convolution network.
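    The speedup rests on the convolution theorem: a pointwise product of discrete
    Fourier transforms corresponds to circular convolution in the original domain.
    Elementwise nonlinearities such as ReLU have no such pointwise spectral
    counterpart, which is why conventional spectral implementations transform back
    and forth. A minimal 1-D NumPy check of the theorem (2-D layers work the same
    way with np.fft.fft2); it does not implement the paper's spectral activation.

```python
import numpy as np

def circular_conv_direct(x, k):
    """Direct O(n^2) circular convolution: y[i] = sum_m x[m] * k[(i - m) mod n]."""
    n = len(x)
    return np.array([sum(x[m] * k[(i - m) % n] for m in range(n)) for i in range(n)])

def circular_conv_fft(x, k):
    """Same result via the convolution theorem: transform, multiply pointwise, invert."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)))

x = np.array([1.0, 2.0, 3.0, 4.0])
k = np.array([1.0, 0.0, 0.0, 1.0])
print(circular_conv_direct(x, k))                                        # [3. 5. 7. 5.]
print(np.allclose(circular_conv_direct(x, k), circular_conv_fft(x, k)))  # True
```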

    Vote Aggregation as a Clustering Problem

    Abhay Gupta
    Comments: 8 pages, 1 figure
    Subjects: Learning (cs.LG)

    An important way to make large training sets is to gather noisy labels from
    crowds of non-experts. We propose a method to aggregate noisy labels collected
    from a crowd of workers or annotators. Eliciting labels is important in tasks
    such as judging web search quality and rating products. Our method assumes that
    labels are generated by a probability distribution over items and labels. We
    formulate the method by drawing parallels between Gaussian Mixture Models
    (GMMs) and Restricted Boltzmann Machines (RBMs) and show that the problem of
    vote aggregation can be viewed as one of clustering. We use K-RBMs to perform
    clustering. We finally show some empirical evaluations over real datasets.

    A Learning Scheme for Microgrid Islanding and Reconnection

    Carter Lassetter, Eduardo Cotilla-Sanchez, Jinsub Kim
    Comments: 8 pages, 4 figures
    Subjects: Learning (cs.LG); Systems and Control (cs.SY)

    This paper introduces a robust learning scheme that can dynamically predict
    the stability of the reconnection of sub-networks to a main grid. As the future
    electrical power systems tend towards smarter and greener technology, the
    deployment of self-sufficient networks, or microgrids, becomes more likely.
    Microgrids may operate on their own or synchronized with the main grid, thus
    control methods need to take into account islanding and reconnecting said
    networks. The ability to optimally and safely reconnect a portion of the grid
    is not well understood and, as of now, limited to raw synchronization between
    interconnection points. A support vector machine (SVM) leveraging real-time
    data from phasor measurement units (PMUs) is proposed to predict in real time
    whether the reconnection of a sub-network to the main grid would lead to
    stability or instability. A dynamics simulator fed with pre-acquired system
    parameters is used to create training data for the SVM in various operating
    states. The classifier was tested on a variety of cases and operating points to
    ensure diversity. Accuracies of approximately 90% were observed throughout most
    conditions when making dynamic predictions of a given network.

    Bayesian optimization of hyper-parameters in reservoir computing

    Jan Yperman, Thijs Becker
    Subjects: Learning (cs.LG)

    We describe a method for searching for the optimal hyper-parameters in reservoir
    computing, which consists of a Gaussian process with Bayesian optimization. It
    provides an alternative to other frequently used optimization methods such as
    grid, random, or manual search. In addition to a set of optimal
    hyper-parameters, the method also provides a probability distribution of the
    cost function as a function of the hyper-parameters. We apply this method to
    two types of reservoirs: nonlinear delay nodes and echo state networks. It
    shows excellent performance on all considered benchmarks, either matching or
    significantly surpassing expert human optimization. We find that some values
    for hyper-parameters that have become standard in the research community are
    in fact suboptimal for most of the problems we considered. In general, the
    algorithm achieves optimal results in fewer iterations when compared to other
    optimization methods, and scales well with increasing dimensionality of the
    hyper-parameter space. Due to its automated nature, this method significantly
    reduces the need for expert knowledge when optimizing the hyper-parameters in
    reservoir computing. Existing software libraries for Bayesian optimization make
    the implementation of the algorithm straightforward.
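    The abstract pairs a Gaussian process with Bayesian optimization. As a rough sketch of that general technique (not the authors' implementation, and without any reservoir), the Python below fits a zero-mean GP with a squared-exponential kernel to a 1-D toy cost and picks each next point by expected improvement over a grid. The cost function `toy_cost`, the length scale, the noise level and the grid are all illustrative assumptions; in practice an existing Bayesian optimization library would replace this hand-written linear algebra.

```python
import math

def rbf(a, b, length=0.3):
    # Squared-exponential kernel between two scalar inputs
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(m):
    # Plain Cholesky factorisation of a symmetric positive-definite matrix
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(m[i][i] - s)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    # Forward substitution: solve L y = b
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    # Back substitution: solve L^T x = y
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-4):
    # Posterior mean and standard deviation of a zero-mean GP at x_new
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, ys))  # alpha = K^{-1} y
    k_star = [rbf(x_new, a) for a in xs]
    mean = sum(k * al for k, al in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x_new, x_new) - sum(vi * vi for vi in v), 1e-12)
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best):
    # EI for minimisation: E[max(best - f, 0)] with f ~ N(mean, std^2)
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf

def toy_cost(x):
    # Hypothetical stand-in for a reservoir's validation error
    return (x - 0.7) ** 2

def bayes_opt(n_iter=10):
    xs = [0.0, 1.0]
    ys = [toy_cost(x) for x in xs]
    grid = [i / 100 for i in range(101)]
    for _ in range(n_iter):
        best = min(ys)
        candidates = [x for x in grid if x not in xs]
        scores = [expected_improvement(*gp_posterior(xs, ys, x), best) for x in candidates]
        x_next = candidates[scores.index(max(scores))]
        xs.append(x_next)
        ys.append(toy_cost(x_next))
    i_best = ys.index(min(ys))
    return xs[i_best], ys[i_best]
```

    Calling `bayes_opt()` returns the best grid point found and its cost; with this toy cost the search should move toward the minimum at x = 0.7.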

    Graph Learning from Data under Structural and Laplacian Constraints

    Hilmi E. Egilmez, Eduardo Pavez, Antonio Ortega
    Comments: This paper has been submitted to IEEE Trans. on Selected Topics in Signal Processing
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Graphs are fundamental mathematical structures used in various fields to
    represent data, signals and processes. In this paper, we propose a novel
    framework for learning/estimating graphs from data. The proposed framework
    includes (i) formulation of various graph learning problems, (ii) their
    probabilistic interpretations and (iii) efficient algorithms to solve them. We
    specifically focus on graph learning problems where the goal is to estimate a
    graph Laplacian matrix from some observed data under given structural
    constraints (e.g., graph connectivity and sparsity). Our experimental results
    demonstrate that the proposed algorithms outperform the current
    state-of-the-art methods in terms of graph learning performance.
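    The Laplacian matrices estimated here have a simple structure that a small example makes concrete. In the combinatorial case, L = D - W, where W is a symmetric matrix of non-negative edge weights and D is the diagonal degree matrix; such an L is symmetric, has zero row sums and non-positive off-diagonal entries, and its quadratic form x^T L x equals the sum over edges of w_ij (x_i - x_j)^2, a measure of how smooth a signal x is on the graph. The pure-Python sketch below, on a hypothetical 4-node path graph, only checks these properties; the paper's estimation algorithms and its other Laplacian variants are not reproduced.

```python
def laplacian(W):
    # Combinatorial Laplacian L = D - W; any self-loop weights on W's diagonal are ignored
    n = len(W)
    L = [[-W[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        L[i][i] = sum(W[i][j] for j in range(n) if j != i)
    return L

def is_laplacian(L, tol=1e-12):
    # Symmetric, zero row sums, non-positive off-diagonal entries
    n = len(L)
    sym = all(abs(L[i][j] - L[j][i]) <= tol for i in range(n) for j in range(n))
    zero_rows = all(abs(sum(row)) <= tol for row in L)
    nonpos_off = all(L[i][j] <= tol for i in range(n) for j in range(n) if i != j)
    return sym and zero_rows and nonpos_off

def quadratic_form(L, x):
    # x^T L x, equal to the sum over edges of w_ij * (x_i - x_j)^2
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))
```

    For the path graph 0-1-2-3 with weights 1, 2, 1, a constant signal has zero quadratic form, and the signal [0, 1, 1, 1] changes only across the first edge, giving 1.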

    Net-Trim: A Layer-wise Convex Pruning of Deep Neural Networks

    Alireza Aghasi, Nam Nguyen, Justin Romberg
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Model reduction is a highly desirable process for deep neural networks. While
    large networks are theoretically capable of learning arbitrarily complex
    models, overfitting and model redundancy negatively affect the prediction
    accuracy and model variance. Net-Trim is a layer-wise convex framework to prune
    (sparsify) deep neural networks. The method is applicable to neural networks
    operating with the rectified linear unit (ReLU) as the nonlinear activation.
    The basic idea is to retrain the network layer by layer keeping the layer
    inputs and outputs close to the originally trained model, while seeking a
    sparse transform matrix. We present both the parallel and cascade versions of
    the algorithm. While the former enjoys computational distributability, the
    latter is capable of achieving simpler models. In both cases, we mathematically
    show a consistency between the retrained model and the initial trained network.
    We also derive the general sufficient conditions for the recovery of a sparse
    transform matrix. In the case of standard Gaussian training samples of
    dimension $N$ being fed to a layer, and $s$ being the maximum number of nonzero
    terms across all columns of the transform matrix, we show that
    $\mathcal{O}(s\log N)$ samples are enough to accurately learn the layer model.

    A Semi-Markov Switching Linear Gaussian Model for Censored Physiological Data

    Ahmed M. Alaa, Jinsung Yoon, Scott Hu, Mihaela van der Schaar
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Critically ill patients in regular wards are vulnerable to unanticipated
    clinical deterioration which requires timely transfer to the intensive care
    unit (ICU). To allow for risk scoring and patient monitoring in such a setting,
    we develop a novel Semi-Markov Switching Linear Gaussian Model (SSLGM) for the
    inpatients’ physiology. The model captures the patients’ latent clinical
    states and their corresponding observable lab tests and vital signs. We present
    an efficient unsupervised learning algorithm that capitalizes on the
    informatively censored data in the electronic health records (EHR) to learn the
    parameters of the SSLGM; the learned model is then used to assess the new
    inpatients’ risk for clinical deterioration in an online fashion, allowing for
    timely ICU admission. Experiments conducted on a heterogeneous cohort of
    6,094 patients admitted to a large academic medical center show that the
    proposed model significantly outperforms the currently deployed risk scores
    such as Rothman index, MEWS, SOFA and APACHE.

    S3Pool: Pooling with Stochastic Spatial Sampling

    Shuangfei Zhai, Hui Wu, Abhishek Kumar, Yu Cheng, Yongxi Lu, Zhongfei Zhang, Rogerio Feris
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    Feature pooling layers (e.g., max pooling) in convolutional neural networks
    (CNNs) serve the dual purpose of providing increasingly abstract
    representations as well as yielding computational savings in subsequent
    convolutional layers. We view the pooling operation in CNNs as a two-step
    procedure: first, a pooling window (e.g., $2 \times 2$) slides over the feature
    map with stride one which leaves the spatial resolution intact, and second,
    downsampling is performed by selecting one pixel from each non-overlapping
    pooling window in an often uniform and deterministic (e.g., top-left) manner.
    Our starting point in this work is the observation that this regularly spaced
    downsampling arising from non-overlapping windows, although intuitive from a
    signal processing perspective (which has the goal of signal reconstruction), is
    not necessarily optimal for learning (where the goal is to generalize).
    We study this aspect and propose a novel pooling strategy with stochastic
    spatial sampling (S3Pool), where the regular downsampling is replaced by a more
    general stochastic version. We observe that this general stochasticity acts as
    a strong regularizer, and can also be seen as doing implicit data augmentation
    by introducing distortions in the feature maps. We further introduce a
    mechanism to control the amount of distortion to suit different datasets and
    architectures. To demonstrate the effectiveness of the proposed approach, we
    perform extensive experiments on several popular image classification
    benchmarks, observing excellent improvements over baseline models. Experimental
    code is available at this https URL
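    The two-step view of pooling described above can be sketched in plain Python. This is a simplified reading of the abstract, not the released S3Pool code: step one applies 2×2 max pooling with stride one (handling the border by edge replication, an assumption), and step two keeps a subset of rows and columns. Standard pooling keeps every second row and column; the stochastic variant here assumes rows and columns are sampled at random within fixed-size cells of `grid` indices. Larger `grid` values allow more irregular spacing, which we treat as an assumed analogue of the paper's distortion-control mechanism; its exact form may differ.

```python
import random

def max_pool_stride1(x, k=2):
    # Step 1: k x k max pooling with stride one. Indices past the border are
    # clamped (edge replication, an assumption), so the spatial size is unchanged.
    h, w = len(x), len(x[0])
    return [[max(x[min(i + di, h - 1)][min(j + dj, w - 1)]
                 for di in range(k) for dj in range(k))
             for j in range(w)]
            for i in range(h)]

def deterministic_downsample(y, stride=2):
    # Step 2, standard pooling: keep the top-left pixel of each stride x stride window
    return [row[::stride] for row in y[::stride]]

def stochastic_downsample(y, grid=4, stride=2, rng=random):
    # Step 2, stochastic variant (simplified): split the row and column indices into
    # cells of `grid` indices and keep grid // stride randomly chosen ones per cell,
    # so the output size matches the deterministic version when sizes divide evenly.
    def pick(n):
        keep = []
        for start in range(0, n, grid):
            cell = list(range(start, min(start + grid, n)))
            keep.extend(sorted(rng.sample(cell, max(1, len(cell) // stride))))
        return keep
    rows, cols = pick(len(y)), pick(len(y[0]))
    return [[y[i][j] for j in cols] for i in rows]
```

    On an 8×8 map both variants return 4×4 outputs; only the stochastic one changes from call to call, which is the source of the implicit data augmentation described above.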

    Convergence rate of stochastic k-means

    Cheng Tang, Claire Monteleoni
    Comments: arXiv admin note: substantial text overlap with arXiv:1610.04900
    Subjects: Learning (cs.LG)

    We analyze online [Bottou and Bengio] and mini-batch [Sculley] $k$-means
    variants. Both scale up the widely used $k$-means algorithm via stochastic
    approximation, and have become popular for large-scale clustering and
    unsupervised feature learning. We show, for the first time, that starting with
    any initial solution, they converge to a “local optimum” at rate
    $O(1/t)$ (in terms of the $k$-means objective) under general
    conditions. In addition, we show that if the dataset is clusterable and the
    algorithm is initialized with a simple and scalable seeding algorithm,
    mini-batch $k$-means converges to an optimal $k$-means solution at rate
    $O(1/t)$ with high probability. The $k$-means objective is non-convex and
    non-differentiable: we exploit ideas from recent work on stochastic gradient
    descent for non-convex problems [Ge et al.; Balsubramani et al., 2013] by
    providing a novel characterization of the trajectory of the $k$-means
    algorithm on its solution space, and circumvent the non-differentiability
    problem via geometric insights about the $k$-means update.
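    For concreteness, a Sculley-style mini-batch k-means update can be sketched as follows: each sampled point is assigned to its nearest center, and that center moves toward the point with a per-center step size of 1/(number of points it has absorbed so far), so the step size decays roughly like 1/t. The naive random seeding, the optional `init` argument, the batch size and the iteration count are illustrative assumptions; the clusterability-based seeding algorithm analysed in the paper is not reproduced.

```python
import random

def nearest(point, centers):
    # Index of the closest center under squared Euclidean distance
    return min(range(len(centers)),
               key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centers[c])))

def minibatch_kmeans(data, k, batch_size=10, n_iter=100, seed=0, init=None):
    rng = random.Random(seed)
    # Naive random seeding unless explicit initial centers are supplied
    centers = [list(c) for c in (init if init is not None else rng.sample(data, k))]
    counts = [0] * len(centers)
    for _ in range(n_iter):
        batch = [rng.choice(data) for _ in range(batch_size)]
        # Assign the whole batch before moving any center
        assignments = [nearest(x, centers) for x in batch]
        for x, c in zip(batch, assignments):
            counts[c] += 1
            eta = 1.0 / counts[c]  # per-center learning rate
            centers[c] = [(1 - eta) * ci + eta * xi for ci, xi in zip(centers[c], x)]
    return centers

def kmeans_cost(data, centers):
    # The k-means objective: sum of squared distances to the nearest center
    return sum(min(sum((p - q) ** 2 for p, q in zip(x, c)) for c in centers) for x in data)
```

    With this step-size rule each center is exactly the running mean of the points assigned to it so far, which is why the update can be read as stochastic approximation of the batch k-means step.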

    Learning Dexterous Manipulation Policies from Experience and Imitation

    Vikash Kumar, Abhishek Gupta, Emanuel Todorov, Sergey Levine
    Comments: Initial draft for a journal submission
    Subjects: Learning (cs.LG); Robotics (cs.RO); Systems and Control (cs.SY)

    We explore learning-based approaches for feedback control of a dexterous
    five-finger hand performing non-prehensile manipulation. First, we learn local
    controllers that are able to perform the task starting at a predefined initial
    state. These controllers are constructed using trajectory optimization with
    respect to locally-linear time-varying models learned directly from sensor
    data. In some cases, we initialize the optimizer with human demonstrations
    collected via teleoperation in a virtual environment. We demonstrate that such
    controllers can perform the task robustly, both in simulation and on the
    physical platform, for a limited range of initial conditions around the trained
    starting state. We then consider two interpolation methods for generalizing to
    a wider range of initial conditions: deep learning, and nearest neighbors. We
    find that nearest neighbors achieve higher performance. Nevertheless, the
    neural network has its advantages: it uses only tactile and proprioceptive
    feedback but no visual feedback about the object (i.e., it performs the task
    blind) and learns a time-invariant policy. In contrast, the nearest neighbors
    method switches between time-varying local controllers based on the proximity
    of initial object states sensed via motion capture. While both generalization
    methods leave room for improvement, our work shows that (i) local
    trajectory-based controllers for complex non-prehensile manipulation tasks can
    be constructed from surprisingly small amounts of training data, and (ii)
    collections of such controllers can be interpolated to form more global
    controllers. Results are summarized in the supplementary video:
    this https URL

    Fully-adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification

    Yongxi Lu, Abhishek Kumar, Shuangfei Zhai, Yu Cheng, Tara Javidi, Rogerio Feris
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Multi-task learning aims to improve generalization performance of multiple
    prediction tasks by appropriately sharing relevant information across them. In
    the context of deep neural networks, this idea is often realized by
    hand-designed network architectures with layers that are shared across tasks
    and branches that encode task-specific features. However, the space of possible
    multi-task deep architectures is combinatorially large and often the final
    architecture is arrived at by manual exploration of this space subject to the
    designer’s bias, which can be both error-prone and tedious. In this work, we
    propose a principled approach for designing compact multi-task deep learning
    architectures. Our approach starts with a thin network and dynamically widens
    it in a greedy manner during training using a novel criterion that promotes
    grouping of similar tasks together. Our extensive evaluation on person
    attribute classification tasks involving facial and clothing attributes
    suggests that the models produced by the proposed method are fast, compact and
    can closely match or exceed the state-of-the-art accuracy of strong baselines
    that rely on much more expensive models.

    DeepCas: an End-to-end Predictor of Information Cascades

    Cheng Li, Jiaqi Ma, Xiaoxiao Guo, Qiaozhu Mei
    Subjects: Social and Information Networks (cs.SI); Learning (cs.LG)

    Information cascades, effectively facilitated by most social network
    platforms, are recognized as a major factor in almost every social success and
    disaster in these networks. Can cascades be predicted? While many believe that
    they are inherently unpredictable, recent work has shown that some key
    properties of information cascades, such as size, growth, and shape, can be
    predicted by a machine learning algorithm that combines many features. These
    predictors all depend on a bag of hand-crafted features to represent the
    cascade network and the global network structure. Such features, always
    carefully and sometimes mysteriously designed, are not easy to extend or to
    generalize to a different platform or domain.

    Inspired by the recent successes of deep learning in multiple data mining
    tasks, we investigate whether an end-to-end deep learning approach could
    effectively predict the future size of cascades. Such a method automatically
    learns the representation of individual cascade graphs in the context of the
    global network structure, without hand-crafted features and heuristics. We find
    that node embeddings fall short of predictive power, and it is critical to
    learn the representation of a cascade graph as a whole. We present algorithms
    that learn the representation of cascade graphs in an end-to-end manner, which
    significantly improve the performance of cascade prediction over strong
    baselines that include feature based methods, node embedding methods, and graph
    kernel methods. Our results also provide interesting implications for cascade
    prediction in general.

    Fast On-Line Kernel Density Estimation for Active Object Localization

    Anthony D. Rhodes, Max H. Quinn, Melanie Mitchell
    Comments: arXiv admin note: text overlap with arXiv:1607.00548
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    A major goal of computer vision is to enable computers to interpret visual
    situations—abstract concepts (e.g., “a person walking a dog,” “a crowd
    waiting for a bus,” “a picnic”) whose image instantiations are linked more by
    their common spatial and semantic structure than by low-level visual
    similarity. In this paper, we propose a novel method for prior learning and
    active object localization for this kind of knowledge-driven search in static
    images. In our system, prior situation knowledge is captured by a set of
    flexible, kernel-based density estimations—a situation model—that represent
    the expected spatial structure of the given situation. These estimations are
    efficiently updated by information gained as the system searches for relevant
    objects, allowing the system to use context as it is discovered to narrow the
    search.

    More specifically, at any given time in a run on a test image, our system
    uses image features plus contextual information it has discovered to identify a
    small subset of training images—an importance cluster—that is deemed most
    similar to the given test image, given the context. This subset is used to
    generate an updated situation model in an on-line fashion, using an efficient
    multipole expansion technique.

    As a proof of concept, we apply our algorithm to a highly varied and
    challenging dataset consisting of instances of a “dog-walking” situation. Our
    results support the hypothesis that dynamically-rendered, context-based
    probability models can support efficient object localization in visual
    situations. Moreover, our approach is general enough to be applied to diverse
    machine learning paradigms requiring interpretable, probabilistic
    representations generated from partially observed data.
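    The situation model is described as a set of kernel-based density estimates. As a generic illustration only (a plain Gaussian kernel density estimate, without the importance-cluster selection or the multipole expansion the authors use for speed), the sketch below turns a few hypothetical object offsets, for example a dog's position relative to a detected person in normalized image coordinates, into a 2-D density. The bandwidth `h` and the sample offsets are assumptions.

```python
import math

def gaussian_kde_2d(samples, h):
    # Returns p(x, y) = (1/n) * sum_i N((x, y); s_i, h^2 I), an isotropic Gaussian KDE
    n = len(samples)
    norm = 1.0 / (n * 2.0 * math.pi * h * h)
    def density(x, y):
        return norm * sum(math.exp(-((x - sx) ** 2 + (y - sy) ** 2) / (2.0 * h * h))
                          for sx, sy in samples)
    return density

# Hypothetical offsets of a dog relative to a person in three training images
offsets = [(0.3, 0.1), (0.35, 0.05), (-0.3, 0.1)]
prior = gaussian_kde_2d(offsets, 0.05)
```

    Re-estimating the density from a different subset of training images, as the importance-cluster step does, amounts to calling `gaussian_kde_2d` again on that subset.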

    Deep Variational Inference Without Pixel-Wise Reconstruction

    Siddharth Agrawal, Ambedkar Dukkipati
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Variational autoencoders (VAEs), which are built upon deep neural networks,
    have emerged as popular generative models in computer vision. Most of the work
    towards improving variational autoencoders has focused mainly on making the
    approximations to the posterior flexible and accurate, leading to tremendous
    progress. However, there have been limited efforts to replace pixel-wise
    reconstruction, which has known shortcomings. In this work, we use real-valued
    non-volume preserving transformations (real NVP) to exactly compute the
    conditional likelihood of the data given the latent distribution. We show that
    a simple VAE with this form of reconstruction is competitive with complicated
    VAE structures on image modeling tasks. As part of our model, we develop
    powerful conditional coupling layers that enable real NVP to learn with fewer
    intermediate layers.
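    The exact likelihood that real NVP provides comes from its affine coupling layers, whose Jacobian is triangular. Below is a minimal sketch of one standard real NVP coupling layer, not the conditional coupling layers introduced in this paper; the scale and translation functions `s_fn` and `t_fn` are arbitrary hypothetical stand-ins for the neural networks used in practice, and the input dimension is assumed to be even.

```python
import math

def s_fn(x1):
    # Hypothetical scale "network"; any function of the untouched half is allowed
    return [math.tanh(v) for v in x1]

def t_fn(x1):
    # Hypothetical translation "network"
    return [0.5 * v for v in x1]

def coupling_forward(x):
    # Keep the first half x1 and map x2 -> x2 * exp(s(x1)) + t(x1)
    assert len(x) % 2 == 0, "this sketch assumes an even dimension"
    d = len(x) // 2
    x1, x2 = x[:d], x[d:]
    s, t = s_fn(x1), t_fn(x1)
    y2 = [v * math.exp(si) + ti for v, si, ti in zip(x2, s, t)]
    # The Jacobian is triangular with diagonal (1, ..., 1, exp(s)), so
    # log|det J| is simply the sum of the scales
    return x1 + y2, sum(s)

def coupling_inverse(y):
    # Exact inverse: y1 == x1, so s and t can be recomputed from it
    assert len(y) % 2 == 0, "this sketch assumes an even dimension"
    d = len(y) // 2
    y1, y2 = y[:d], y[d:]
    s, t = s_fn(y1), t_fn(y1)
    return y1 + [(v - ti) * math.exp(-si) for v, si, ti in zip(y2, s, t)]

def log_likelihood(x):
    # Change of variables: log p(x) = log N(y; 0, I) + log|det J|
    y, log_det = coupling_forward(x)
    log_base = sum(-0.5 * v * v - 0.5 * math.log(2.0 * math.pi) for v in y)
    return log_base + log_det
```

    Because the log-determinant is a plain sum, stacking several such layers (alternating which half is kept) still yields an exact, cheap likelihood.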

    Machine Learning Approach for Skill Evaluation in Robotic-Assisted Surgery

    Mahtab J. Fard, Sattar Ameri, Ratna B. Chinnam, Abhilash K. Pandya, Michael D. Klein, R. Darin Ellis
    Journal-ref: Lecture Notes in Engineering and Computer Science: Proceedings of
    The World Congress on Engineering and Computer Science 2016, 19-21 October,
    2016, San Francisco, USA
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Evaluating surgeon skill has predominantly been a subjective task.
    Development of objective methods for surgical skill assessment is of increasing
    interest. Recently, with technological advances such as robotic-assisted
    minimally invasive surgery (RMIS), new opportunities for objective and
    automated assessment frameworks have arisen. In this paper, we apply machine
    learning methods to automatically evaluate the performance of surgeons in RMIS.
    Six important movement features are used in the evaluation: completion
    time, path length, depth perception, speed, smoothness and curvature.
    Different classification methods are applied to discriminate expert and
    novice surgeons. We test our method on real surgical data for a suturing task and
    compare the classification results with the ground truth data (obtained by
    manual labeling). The experimental results show that the proposed framework can
    classify surgical skill level with a relatively high accuracy of 85.7%. This
    study demonstrates the ability of machine learning methods to automatically
    classify expert and novice surgeons using movement features for different RMIS
    tasks. Due to the simplicity and generalizability of the introduced
    classification method, it is easy to implement in existing trainers.
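    Two of the listed movement features, completion time and path length, are straightforward to compute from a sampled tool-tip trajectory, and mean speed follows from them. The sketch below assumes a hypothetical list of (t, x, y, z) samples; the paper's exact feature definitions and units, and the remaining features (depth perception, smoothness, curvature), are not reproduced.

```python
import math

def completion_time(traj):
    # Time between the first and last (t, x, y, z) sample
    return traj[-1][0] - traj[0][0]

def path_length(traj):
    # Sum of Euclidean distances between consecutive tool-tip positions
    return sum(math.dist(a[1:], b[1:]) for a, b in zip(traj, traj[1:]))

def mean_speed(traj):
    # Average speed along the path
    return path_length(traj) / completion_time(traj)
```

    Feature vectors built this way for each trial could then be fed to any standard classifier to separate expert and novice trials.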

    Probabilistic Failure Analysis in Model Validation & Verification

    Ning Ge, Marc Panten, Xavier Crégut
    Journal-ref: International Conference on Embedded Real Time Software and
    Systems (ERTS 2014)
    Subjects: Software Engineering (cs.SE); Learning (cs.LG)

    Automated fault localization is an important issue in model validation and
    verification. It helps the end users in analyzing the origin of failure. In
    this work, we present early experiments with probabilistic analysis approaches
    in fault localization. Inspired by the Kullback-Leibler divergence from
    Bayesian probability theory, we propose a suspiciousness factor to compute
    the fault contribution of the transitions in the reachability graph of model
    checking, which is then used to rank potentially faulty transitions. To
    automatically locate design faults in the simulation model of detailed design,
    we propose using a Hidden Markov Model (HMM), a statistical model that
    provides information statistically identical to the component’s real behavior.
    The core of this method is a fault localization algorithm that outputs a
    ranked set of suspicious faulty components and a backward algorithm that
    computes the matching degree between the HMM and the simulation model to
    evaluate the confidence degree of the localization conclusion.
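    The suspiciousness factor is described as inspired by the Kullback-Leibler divergence. For reference, the standard discrete KL divergence is sketched below, together with a hypothetical ranking helper. How the paper maps transitions of the reachability graph to probability distributions is not specified in the abstract, so the per-transition distributions and the reference distribution used here are purely illustrative assumptions.

```python
import math

def kl_divergence(p, q):
    # D_KL(P || Q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute 0
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf  # P puts mass where Q has none
            total += pi * math.log(pi / qi)
    return total

def rank_by_divergence(profiles, reference):
    # Hypothetical use: rank items (e.g. transitions) by how far their observed
    # distribution deviates from a reference distribution, most suspicious first
    scores = {name: kl_divergence(p, reference) for name, p in profiles.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

    Note that KL divergence is asymmetric, so the choice of which distribution plays the role of P matters for any ranking built on it.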


    Information Theory

    On the Spectral Efficiency and Security Enhancements of NOMA Assisted Multicast-Unicast Streaming

    Zhiguo Ding, Zhongyuan Zhao, Mugen Peng, H. Vincent Poor
    Subjects: Information Theory (cs.IT)

    This paper considers the application of non-orthogonal multiple access (NOMA)
    to a multi-user network with mixed multicasting and unicasting traffic. The
    proposed design of beamforming and power allocation ensures that the unicasting
    performance is improved while maintaining the reception reliability of
    multicasting. Both analytical and simulation results are provided to
    demonstrate that the use of the NOMA assisted multicast-unicast scheme yields a
    significant improvement in spectral efficiency compared to orthogonal multiple
    access (OMA) schemes which realize multicasting and unicasting services
    separately. Since unicast messages are broadcast to all users, the paper also
    investigates how NOMA can prevent the multicast receivers from intercepting
    the unicast messages; it is shown that the secrecy unicasting rate achieved
    by NOMA is always larger than or equal to that of OMA.
    This security gain is mainly due to the fact that the multicasting messages can
    be used as jamming signals to prevent potential eavesdropping when the
    multicasting and unicasting messages are superimposed together following the
    NOMA principle.
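    The superposition principle behind NOMA can be illustrated with the textbook two-user downlink rate expressions; this is not the paper's multicast-unicast beamforming and power allocation design. Assuming a single-antenna transmitter, unit noise power, a fraction `a` of the power P for the weak user and `1 - a` for the strong user, and successive interference cancellation (SIC) at the strong user, the weak user achieves log2(1 + aPg_w / ((1 - a)Pg_w + 1)) and the strong user log2(1 + (1 - a)Pg_s). The OMA baseline here gives each user half the time at full power. All channel gains and the power split are hypothetical.

```python
import math

def noma_rates(P, g_weak, g_strong, a):
    # Weak user decodes its own message, treating the strong user's signal as noise
    r_weak = math.log2(1 + a * P * g_weak / ((1 - a) * P * g_weak + 1))
    # Strong user first removes the weak user's message via SIC (feasible when
    # g_strong >= g_weak), then decodes its own message interference-free
    r_strong = math.log2(1 + (1 - a) * P * g_strong)
    return r_weak, r_strong

def oma_rates(P, g_weak, g_strong):
    # Orthogonal access with equal time slots and full power in each slot
    return 0.5 * math.log2(1 + P * g_weak), 0.5 * math.log2(1 + P * g_strong)
```

    With P = 100, g_weak = 0.1, g_strong = 10 and a = 0.8, both users obtain a higher rate under NOMA than under this OMA baseline.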

    Optimizing DF Cognitive Radio Networks with Full-Duplex-Enabled Energy Access Points

    Hong Xing, Xin Kang, Kai-Kit Wong, Arumugam Nallanathan
    Comments: 30 pages, 7 figures, submitted for possible journal publication
    Subjects: Information Theory (cs.IT)

    With the recent advances in radio frequency (RF) energy harvesting (EH)
    technologies, wireless powered cooperative cognitive radio network (CCRN) has
    drawn an upsurge of interest for improving the spectrum utilization with
    incentive to motivate joint information and energy cooperation between the
    primary and secondary systems. Dedicated energy beamforming (EB) is aimed at
    remedying the low efficiency of wireless power transfer (WPT), which
    nevertheless introduces out-of-band EH phases and thus low cooperation
    efficiency. To address this issue, in this paper, we consider a novel RF EH
    CCRN aided by full-duplex (FD)-enabled energy access points (EAPs) that can
    cooperate to wirelessly charge the secondary transmitter (ST) while concurrently receiving
    primary transmitter (PT)’s signal in the first transmission phase, and to
    perform decode-and-forward (DF) relaying in the second transmission phase. We
    investigate a weighted sum-rate maximization problem subject to the
    transmitting power constraints as well as a total cost constraint using
    successive convex approximation (SCA) techniques. A zero-forcing (ZF) based
    suboptimal scheme that is locally optimal at the EAPs is also derived. Various
    tradeoffs between the weighted sum-rate and other system parameters are
    provided in numerical results to corroborate the effectiveness of the proposed
    solutions against the benchmark schemes.

    Approximate Capacity Region of the Two-User Gaussian Interference Channel with Noisy Channel-Output Feedback

    Victor Quintero, Samir M. Perlaza, Iñaki Esnaola, Jean-Marie Gorce
    Comments: This work was submitted to the IEEE Transactions on Information Theory on November 10, 2016. Part of this work was presented at the IEEE International Workshop on Information Theory (ITW), Cambridge, United Kingdom, September, 2016 (arXiv:1603.07554), and at the IEEE International Workshop on Information Theory (ITW), Jeju Island, Korea, October, 2015 (arXiv:1502.04649). Parts of this work appear in INRIA Research Reports 0456 (arXiv:1608.08920) and 8861 (arXiv:1608.08907)
    Subjects: Information Theory (cs.IT)

    In this paper, the capacity region of the linear deterministic interference
    channel with noisy channel-output feedback (LD-IC-NOF) is fully characterized.
    A capacity-achieving scheme is obtained using a random coding argument and
    three well-known techniques: rate splitting, superposition coding and backward
    decoding. The converse region is obtained using some of the existing outer
    bounds as well as a set of new outer bounds that are obtained by using
    genie-aided models of the original LD-IC-NOF. Using the insights gained from
    the analysis of the LD-IC-NOF, an achievability region and a converse region
    for the two-user Gaussian interference channel with noisy channel-output
    feedback (G-IC-NOF) are presented. Finally, the achievability region and the
    converse region approximate the capacity region of the G-IC-NOF to within 4.4
    bits.

    Hilbert Transform, Analytic Signal, and Modulation Analysis for Graph Signal Processing

    Arun Venkitaraman, Saikat Chatterjee, Peter Händel
    Comments: Submitted to IEEE JSTSP
    Subjects: Information Theory (cs.IT); Social and Information Networks (cs.SI)

    We propose constructions of the Hilbert transform (HT) and analytic signal (AS)
    for signals over graphs. This is motivated by the popularity of HT, AS, and
    modulation analysis in conventional signal processing, and the observation that
    complementary insight is often obtained by viewing conventional signals in the
    graph setting. Our definitions of HT and AS use a conjugate-symmetry-like
    property exhibited by the graph Fourier transform (GFT). We show that a real
    graph signal (GS) can be represented using a smaller number of GFT coefficients
    than the signal length. We show that the graph HT (GHT) and graph AS (GAS)
    operations are linear and shift-invariant over graphs. Using the GAS, we define
    the amplitude, phase, and frequency modulations for a graph signal (GS).
    Further, we use convex optimization to develop an alternative definition of
    envelope for a GS. We illustrate the proposed concepts by showing applications
    to synthesized and real-world signals. For example, we show that the GHT is
    suitable for anomaly detection/analysis over networks and that GAS reveals
    complementary information in speech signals.

    Joint Energy-Bandwidth Allocation for Multi-User Channels with Cooperating Hybrid Energy Nodes

    Vaneet Aggarwal, Mark R. Bell, Anis Elgabli, Xiaodong Wang, Shan Zhong
    Comments: arXiv admin note: text overlap with arXiv:1502.04391 by other authors
    Subjects: Information Theory (cs.IT)

    In this paper, we consider the energy-bandwidth allocation for a network of
    multiple users, where the transmitters, each powered by both an energy harvester
    and the conventional grid, access the network orthogonally on their assigned
    frequency bands. We assume that the energy harvesting state and channel gain of
    each transmitter can be predicted for $K$ time slots a priori. The different
    transmitters can cooperate by donating energy to each other. The tradeoff among
    the weighted sum throughput, the use of grid energy, and the amount of energy
    cooperation is studied through an optimization objective which is a linear
    combination of these quantities. This leads to an optimization problem with
    $O(N^2K)$ constraints, where $N$ is the total number of transmitter-receiver
    pairs, and the optimization is over seven sets of variables that denote energy
    and bandwidth allocation, grid energy utilization, and energy cooperation. To
    solve the problem efficiently, an iterative algorithm is proposed using the
    Proximal Jacobian ADMM. The optimization sub-problems corresponding to Proximal
    Jacobian ADMM steps are solved in closed form. We show that this algorithm
    converges to the optimal solution with an overall complexity of $O(N^2K^2)$.
    Numerical results show that the proposed algorithms can make efficient use of
    the harvested energy, grid energy, energy cooperation, and the available
    bandwidth.

    The Fluctuating Two-Ray Fading Model: Statistical Characterization and Performance Analysis

    Juan M. Romero-Jerez, F. Javier Lopez-Martinez, José F. Paris, Andrea J. Goldsmith
    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
    Subjects: Information Theory (cs.IT)

    We introduce the Fluctuating Two-Ray (FTR) fading model, a new statistical
    channel model that consists of two fluctuating specular components with random
    phases plus a diffuse component. The FTR model arises as the natural
    generalization of the two-wave with diffuse power (TWDP) fading model; this
    generalization allows its two specular components to exhibit a random amplitude
    fluctuation. Unlike the TWDP model, all the chief probability functions of the
    FTR fading model (PDF, CDF and MGF) are expressed in closed form, having a
    functional form similar to other state-of-the-art fading models. We also
    provide approximate closed-form expressions for the PDF and CDF in terms of a
    finite number of elementary functions, which allow for a simple evaluation of
    these statistics to an arbitrary level of precision. We show that the FTR
    fading model provides a much better fit than Rician fading for recent
    small-scale fading measurements in 28 GHz outdoor millimeter-wave channels.
    Finally, the performance of wireless communication systems over FTR fading is
    evaluated in terms of the bit error rate and the outage capacity, and the
    interplay between the FTR fading model parameters and the system performance is
    discussed. Monte Carlo simulations have been carried out in order to validate
    the obtained theoretical expressions.
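    The FTR definition lends itself to a small Monte Carlo sampler of the kind used for validation above. Under the model as we understand it, the received signal is the sum of two specular components V1·e^{jφ1} and V2·e^{jφ2}, both scaled by sqrt(ζ), where ζ is a unit-mean Gamma variable with shape m, plus a complex Gaussian diffuse component with variance σ² per dimension; φ1 and φ2 are independent and uniform on [0, 2π). This is a sketch under those assumptions, not the authors' simulation code, and the parameter values are arbitrary. Because the cross terms average out, the sample mean power should approach V1² + V2² + 2σ².

```python
import cmath
import math
import random

def ftr_sample(V1, V2, sigma, m, rng):
    # One complex channel sample of the (assumed) FTR model
    zeta = rng.gammavariate(m, 1.0 / m)  # unit-mean Gamma fluctuation, shape m
    phi1 = rng.uniform(0.0, 2.0 * math.pi)
    phi2 = rng.uniform(0.0, 2.0 * math.pi)
    specular = math.sqrt(zeta) * (V1 * cmath.exp(1j * phi1) + V2 * cmath.exp(1j * phi2))
    diffuse = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
    return specular + diffuse

def ftr_power_samples(V1, V2, sigma, m, n, seed=0):
    # Received power |h|^2 for n independent samples
    rng = random.Random(seed)
    return [abs(ftr_sample(V1, V2, sigma, m, rng)) ** 2 for _ in range(n)]

def outage_probability(powers, threshold):
    # Fraction of samples whose received power falls below the threshold
    return sum(p < threshold for p in powers) / len(powers)
```

    With V1 = 1, V2 = 0.8, σ = 0.3 and m = 4, the empirical mean power should be close to 1.82, and `outage_probability` gives a Monte Carlo estimate that could be compared against closed-form outage expressions.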

    Market Segmentation for Privacy Differentiated "Free" Services

    Chong Huang, Lalitha Sankar
    Subjects: Computer Science and Game Theory (cs.GT); Information Theory (cs.IT); Social and Information Networks (cs.SI)

    The emerging marketplace for online free services in which service providers
    earn revenue from using consumer data in direct and indirect ways has led to
    significant privacy concerns. This raises the following
    question: can the marketplace sustain multiple service providers (SPs) that
    offer privacy-differentiated free services? This paper studies this problem of
    market segmentation for the free online services market by augmenting the
    classical Hotelling model for market segmentation analysis to include the fact
    that for the free services market, a consumer values service not in monetized
    terms but by its quality of service (QoS) and that the differentiator of
    services is not product price but the privacy risk advertised by a SP. Building
    upon the Hotelling model, this paper presents a parametrized model for SP
    profit and consumer valuation of service for both the two- and multi-SP
    problems to show that: (i) when consumers place a high value on privacy, it
    leads to a lower use of private data by SPs (i.e., their advertised privacy
    risk reduces), and thus, SPs compete on the QoS; (ii) SPs that are capable of
    differentiating on services that do not directly use consumer data (untargeted
    services) gain larger market share; and (iii) a higher valuation of privacy by
    consumers forces SPs with smaller untargeted revenue to offer lower privacy
    risk to attract more consumers. The work also illustrates the market
    segmentation problem for more than two SPs and highlights the instability of
    such markets.
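    The abstract builds on the classical Hotelling model. As a sketch of the basic
    mechanics only (not the authors' parametrized model), the snippet below finds
    the indifferent consumer on a unit line. The privacy variant, its names
    (`privacy_share`, `privacy_weight`) and the linear cost form are hypothetical
    assumptions that put advertised privacy risk and QoS in the slot that price
    occupies in the classical model.

    ```python
    def hotelling_share(cost_a: float, cost_b: float, t: float) -> float:
        """Share of consumers choosing provider A.

        Classical Hotelling line: consumers uniform on [0, 1], A at 0, B at 1,
        linear travel cost t per unit distance. A consumer at x picks A when
        cost_a + t*x < cost_b + t*(1 - x); the indifferent consumer sits at
        x* = (cost_b - cost_a + t) / (2t), clipped to [0, 1].
        """
        if t <= 0:
            raise ValueError("travel cost t must be positive")
        x_star = (cost_b - cost_a + t) / (2.0 * t)
        return min(1.0, max(0.0, x_star))


    def privacy_share(qos_a, qos_b, risk_a, risk_b, privacy_weight, t):
        """Hypothetical 'free service' variant (assumption, not the paper's model):
        no price is charged, so each provider's effective cost to a consumer is
        privacy_weight * advertised risk minus the QoS the consumer receives."""
        cost_a = privacy_weight * risk_a - qos_a
        cost_b = privacy_weight * risk_b - qos_b
        return hotelling_share(cost_a, cost_b, t)


    if __name__ == "__main__":
        # Identical providers split the market evenly.
        print(hotelling_share(0.0, 0.0, 1.0))  # 0.5
        # Equal QoS, A advertises lower risk: A's share grows with privacy_weight.
        for w in (0.5, 1.0, 2.0):
            print(w, privacy_share(1.0, 1.0, 0.2, 0.4, w, 1.0))
    ```

    In this toy form, raising the weight consumers place on privacy widens the
    lower-risk provider's share, which is the intuition behind point (i) above.
    
    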

    Estimation Theory Based Robust Phase Offset Estimation in the Presence of Delay Attacks

    Anantha K. Karthik, Rick S. Blum
    Comments: 30 pages, 4 figures, Journal paper
    Subjects: Applications (stat.AP); Information Theory (cs.IT)

    This paper addresses the problem of robust clock phase offset estimation for
    the IEEE 1588 precision time protocol (PTP) in the presence of delay attacks.
    Delay attacks are among the most effective cyber attacks on PTP, as they
    cannot be mitigated using typical security measures. In this paper, we consider
    the case where the slave node can exchange synchronization messages with
    multiple master nodes synchronized to the same clock. We first provide lower
    bounds on the best achievable performance for any phase offset estimation
    scheme in the presence of delay attacks. We then present a novel phase offset
    estimation scheme that employs the Expectation-Maximization algorithm for
    detecting which of the master-slave communication links have been subject to
    delay attacks. After discarding information from the links identified as
    attacked, which we show to be optimal, the optimal vector location parameter
    estimator is employed to estimate the phase offset of the slave node.
    Simulation results are presented to show that the proposed phase offset
    estimation scheme exhibits performance close to the lower bounds in a wide
    variety of scenarios.
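    For readers unfamiliar with PTP, the sketch below shows the standard two-way
    offset computation from the four timestamps t1..t4, which assumes symmetric
    path delays, and why a delay attack is hard to catch: adding delay to one
    direction shifts the offset estimate by half the added delay while looking like
    an ordinary longer path. The helper names and the simulated link are
    hypothetical, and the final median step across masters is a simple robust
    stand-in, not the EM-based detection plus optimal vector location estimator
    the paper proposes.

    ```python
    import statistics
    from dataclasses import dataclass


    @dataclass
    class Exchange:
        """Timestamps of one two-way PTP exchange."""
        t1: float  # master sends Sync (master time)
        t2: float  # slave receives Sync (slave time)
        t3: float  # slave sends Delay_Req (slave time)
        t4: float  # master receives Delay_Req (master time)


    def offset_and_delay(ex: Exchange) -> tuple[float, float]:
        """Classic PTP estimates; valid only if both path delays are equal."""
        master_to_slave = ex.t2 - ex.t1
        slave_to_master = ex.t4 - ex.t3
        offset = (master_to_slave - slave_to_master) / 2
        delay = (master_to_slave + slave_to_master) / 2
        return offset, delay


    def simulate(theta: float, d: float, attack: float = 0.0) -> Exchange:
        """Hypothetical link: slave clock = master clock + theta, symmetric path
        delay d, and an attacker adding `attack` seconds to the Sync direction."""
        t1 = 0.0
        t2 = t1 + d + attack + theta
        t3 = t2 + 1.0  # slave replies one second later (arbitrary)
        t4 = t3 - theta + d
        return Exchange(t1, t2, t3, t4)


    if __name__ == "__main__":
        theta, d, a = 0.5, 2.0, 1.0
        print(offset_and_delay(simulate(theta, d)))     # (0.5, 2.0)
        print(offset_and_delay(simulate(theta, d, a)))  # (1.0, 2.5): biased by a/2
        # Five masters, two of them attacked; the median (a simple robust
        # stand-in, NOT the paper's EM-based scheme) recovers theta.
        offsets = [offset_and_delay(simulate(theta, d, a if i < 2 else 0.0))[0]
                   for i in range(5)]
        print(statistics.median(offsets))  # 0.5
    ```

    The median only works while fewer than half of the masters are attacked; the
    paper's approach instead identifies and discards the attacked links explicitly.
    
    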

    Neural stochastic codes, encoding and decoding

    Hugo Gabriel Eyherabide
    Comments: 14 Pages, 9 Figures, 1 Table
    Subjects: Neurons and Cognition (q-bio.NC); Information Theory (cs.IT); Quantitative Methods (q-bio.QM); Applications (stat.AP)

    Identifying informative aspects of brain activity has traditionally been
    thought to provide insight into how brains may perform optimal computations.
    However, here we show that this need not be the case when studying spike-time
    precision or response discrimination, among other activity aspects beyond noise
    correlations. Our results show that decoders designed with noisy data may
    perform optimally on high-quality data, thereby potentially yielding
    experimental and computational savings.
