
    arXiv Paper Daily: Mon, 3 Apr 2017

    Posted by 我爱机器学习 (52ml.net) on 2017-04-03 00:00:00

    Neural and Evolutionary Computing

    On Self-Adaptive Mutation Restarts for Evolutionary Robotics with Real Rotorcraft

    Gerard David Howard
    Comments: 8 pages
    Subjects: Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)

    Self-adaptive parameters are increasingly used in the field of Evolutionary
    Robotics, as they allow key evolutionary rates to vary autonomously in a
    context-sensitive manner throughout the optimisation process. A significant
    limitation to self-adaptive mutation is that rates can be set unfavourably,
    which hinders convergence. Rate restarts are typically employed to remedy this,
    but so far they have been applied in Evolutionary Robotics only to mutation-only
    algorithms. This paper focuses on the level at which evolutionary rate restarts
    are applied in population-based algorithms with more than one evolutionary
    operator. After testing on a real hexacopter hovering task, we conclude that
    individual-level restarting results in higher-fitness solutions without fitness
    stagnation, and population restarts provide a more stable rate evolution.
    Without restarts, experiments can become stuck in suboptimal controller/rate
    combinations that can be difficult to escape.
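
    The following is a minimal Python sketch of log-normal self-adaptive mutation with an
    individual-level restart. Whenever an individual's mutated rate leaves a fixed band,
    the rate is reset instead of being inherited. The bounds, the reset value, the
    learning rate `tau` and the toy fitness function are all assumptions made for
    illustration. The paper evolves real hexacopter controllers with more than one
    operator, and its restart trigger may differ. Only mutation is modelled here.

```python
import math
import random

# Hypothetical bounds and reset value for the self-adaptive mutation rate.
RATE_MIN, RATE_MAX, RATE_RESET = 1e-3, 0.5, 0.1

def mutate(genome, rate, rng, tau):
    """Log-normal self-adaptation: perturb the rate first, then the genome."""
    new_rate = rate * math.exp(tau * rng.gauss(0.0, 1.0))
    # Individual-level restart: an unfavourable rate is reset, not inherited.
    if not (RATE_MIN <= new_rate <= RATE_MAX):
        new_rate = RATE_RESET
    child = [g + new_rate * rng.gauss(0.0, 1.0) for g in genome]
    return child, new_rate

def evolve(fitness, dim=4, pop_size=10, generations=50, seed=0):
    """Elitist (mu + lambda) loop; each individual is a (genome, rate) pair."""
    rng = random.Random(seed)
    tau = 1.0 / math.sqrt(dim)
    pop = [([rng.uniform(-1, 1) for _ in range(dim)], RATE_RESET)
           for _ in range(pop_size)]
    for _ in range(generations):
        children = [mutate(g, r, rng, tau) for g, r in pop]
        # Keep the best pop_size of parents + children (higher fitness is better).
        pop = sorted(pop + children, key=lambda ind: fitness(ind[0]),
                     reverse=True)[:pop_size]
    return pop[0]
```

    For example, `evolve(lambda x: -sum(v * v for v in x))` maximises a negated sphere
    function. A population-level variant would instead reset every rate at once when the
    whole population stagnates.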

    Diabetic Retinopathy Detection via Deep Convolutional Networks for Discriminative Localization and Visual Explanation

    Zhiguang Wang, Jianbo Yang
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We propose a deep learning method for interpretable diabetic retinopathy
    (DR) detection. The visual interpretability of the proposed method is
    achieved by adding a regression activation map (RAM) after the global
    average pooling layer of the convolutional network (CNN). With RAM, the
    proposed model can localize the discriminative regions of a retina image to
    show the specific regions of interest in terms of severity level. We believe
    this property is highly desirable for DR detection because, in practice,
    users are not only interested in high prediction performance but are also
    keen to gain insight into DR detection and why the adopted model works. In
    experiments on a large-scale retina image dataset, we show that the proposed
    CNN model achieves high DR detection performance compared with the state of
    the art, while additionally providing the RAM to highlight the salient
    regions of the input image.
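
    The RAM sits after global average pooling, so it can be computed the same way as a
    class activation map. Each final feature map is weighted by the regression weight it
    receives after pooling, and the weighted maps are summed over channels. The NumPy
    sketch below assumes a single scalar severity output with weights `weights` and bias
    `bias`. The shapes and names are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def regression_activation_map(features, weights):
    """features: (K, H, W) last-conv feature maps; weights: (K,) regression weights.

    The weighted sum over channels keeps the spatial layout, giving an (H, W) map.
    """
    return np.tensordot(weights, features, axes=([0], [0]))

def predict_severity(features, weights, bias):
    """Scalar regression output computed from globally average-pooled features."""
    pooled = features.mean(axis=(1, 2))  # global average pooling -> (K,)
    return float(pooled @ weights + bias)
```

    A useful sanity check is that the spatial mean of the map plus the bias reproduces
    the scalar prediction, because averaging and the weighted channel sum commute.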

    Factorization tricks for LSTM networks

    Oleksii Kuchaiev, Boris Ginsburg
    Comments: accepted to ICLR 2017 Workshop
    Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    We present two simple ways of reducing the number of parameters and
    accelerating the training of large Long Short-Term Memory (LSTM) networks: the
    first is “matrix factorization by design” of the LSTM matrix into the product
    of two smaller matrices, and the second is partitioning the LSTM matrix, its
    inputs and its states into independent groups. Both approaches allow us to
    train large LSTM networks significantly faster to state-of-the-art
    perplexity. On the One Billion Word Benchmark we improve single-model
    perplexity down to 24.29.
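
    The parameter savings of both tricks can be checked with simple arithmetic. Assume
    input size `m` and hidden size `n`, and ignore biases. The full LSTM matrix then has
    4n(m+n) entries. A factorization with inner rank `r` stores r(m+n) + 4nr entries.
    `G` independent groups store G * 4(n/G)(m/G + n/G) = 4n(m+n)/G entries. The sizes in
    the example are illustrative and are not the configurations used in the paper.

```python
def lstm_params(m, n):
    """Weights of the full LSTM matrix mapping [x, h] (size m + n) to 4 gates of size n."""
    return 4 * n * (m + n)

def f_lstm_params(m, n, r):
    """Factorized matrix W ~ W2 @ W1, with W1: r x (m + n) and W2: 4n x r."""
    return r * (m + n) + 4 * n * r

def g_lstm_params(m, n, groups):
    """Block-diagonal version: `groups` independent small LSTM matrices."""
    assert m % groups == 0 and n % groups == 0
    return groups * 4 * (n // groups) * (m // groups + n // groups)
```

    With m = n = 1024, a rank-256 factorization keeps 1,572,864 of 8,388,608 weights
    (18.75%), and 4 groups keep 2,097,152 (25%).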

    Deep Neural Network Optimized to Resistive Memory with Nonlinear Current-Voltage Characteristics

    Hyungjun Kim, Taesu Kim, Jinseok Kim, Jae-Joon Kim
    Comments: 14 pages
    Subjects: Emerging Technologies (cs.ET); Neural and Evolutionary Computing (cs.NE)

    Artificial Neural Network computation relies on intensive vector-matrix
    multiplications. Recently, emerging nonvolatile memory (NVM) crossbar arrays
    have shown the feasibility of implementing such operations with high energy
    efficiency, and many works have therefore explored using emerging NVM
    crossbar arrays as analog vector-matrix multipliers. However, their nonlinear
    I-V characteristics restrict critical design parameters, such as the read
    voltage and weight range, resulting in substantial accuracy loss. In this
    paper, instead of optimizing hardware parameters for a given neural network,
    we propose a methodology for reconstructing the neural network itself so that
    it is optimized for resistive memory crossbar arrays. To verify the validity of
    the proposed method, we simulated various neural networks on the MNIST and
    CIFAR-10 datasets using two different Resistive Random Access Memory (RRAM)
    models. Simulation results show that our proposed neural network achieves
    significantly higher inference accuracy than a conventional neural network
    when the synapse devices have nonlinear I-V characteristics.
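
    The accuracy loss can be illustrated with an idealized crossbar. In such a crossbar,
    each column current is the sum of conductance times input voltage (Ohm's law plus
    Kirchhoff's current law). The nonlinear device below uses a sinh-shaped I-V curve.
    This is a common phenomenological choice but an assumption here, not one of the two
    RRAM models simulated in the paper. The parameter `k` controls the strength of the
    nonlinearity.

```python
import numpy as np

def crossbar_output(conductance, voltages, k=0.0):
    """Column currents of a crossbar.

    conductance: (rows, cols) device conductances; voltages: (rows,) input voltages.
    k = 0 is the ideal Ohmic device, so the output equals voltages @ conductance.
    """
    if k == 0.0:
        device_current = conductance * voltages[:, None]
    else:
        # Hypothetical sinh-shaped I-V curve: ~linear for small V, super-linear for large V.
        device_current = conductance * np.sinh(k * voltages[:, None]) / k
    return device_current.sum(axis=0)  # Kirchhoff's current law sums each column
```

    With uniform inputs the relative error is exactly sinh(kV)/(kV) - 1. For k = 2 this
    is about 0.7% at 0.1 V and 17.5% at 0.5 V, which is why the read voltage and weight
    range become constrained.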


    Computer Vision and Pattern Recognition

    InverseFaceNet: Deep Single-Shot Inverse Face Rendering From A Single Image

    Hyeongwoo Kim, Michael Zollhöfer, Ayush Tewari, Justus Thies, Christian Richardt, Christian Theobalt
    Comments: 10 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We introduce InverseFaceNet, a deep convolutional inverse rendering framework
    for faces that jointly estimates facial pose, shape, expression, reflectance
    and illumination from a single input image in a single shot. By estimating all
    these parameters from just a single image, advanced editing possibilities on a
    single face image, such as appearance editing and relighting, become feasible.
    Previous learning-based face reconstruction approaches do not jointly recover
    all dimensions, or are severely limited in terms of visual quality. In
    contrast, we propose to recover high-quality facial pose, shape, expression,
    reflectance and illumination using a deep neural network that is trained using
    a large, synthetically created dataset. Our approach builds on a novel loss
    function that measures model-space similarity directly in parameter space and
    significantly improves reconstruction accuracy. In addition, we propose an
    analysis-by-synthesis breeding approach which iteratively updates the synthetic
    training corpus based on the distribution of real-world images, and we
    demonstrate that this strategy outperforms completely synthetically trained
    networks. Finally, we show high-quality reconstructions and compare our
    approach to several state-of-the-art approaches.

    Quicksilver: Fast Predictive Image Registration – a Deep Learning Approach

    Xiao Yang, Roland Kwitt, Marc Niethammer
    Comments: Neuroimage Journal submission. Removed line number
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper introduces Quicksilver, a fast deformable image registration
    method. Quicksilver registration for image-pairs works by patch-wise prediction
    of a deformation model based directly on image appearance. A deep
    encoder-decoder network is used as the prediction model. While the prediction
    strategy is general, we focus on predictions for the Large Deformation
    Diffeomorphic Metric Mapping (LDDMM) model. Specifically, we predict the
    momentum-parameterization of LDDMM, which facilitates a patch-wise prediction
    strategy while maintaining the theoretical properties of LDDMM, such as
    guaranteed diffeomorphic mappings for sufficiently strong regularization. We
    also provide a probabilistic version of our prediction network which can be
    sampled during test time to calculate uncertainties in the predicted
    deformations. Finally, we introduce a new correction network which greatly
    increases the prediction accuracy of an already existing prediction network.
    Experiments are conducted for both atlas-to-image and image-to-image
    registrations. These experiments show that our method accurately predicts
    registrations obtained by numerical optimization, is very fast, and achieves
    state-of-the-art registration results on four standard validation datasets.
    Quicksilver is freely available as open-source software.
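
    Patch-wise prediction is usually paired with a stitching step that combines
    overlapping patch outputs into one field. The sketch below averages overlapping
    predictions on a 2D grid. Quicksilver itself works on 3D images and predicts LDDMM
    momenta. The `predict_patch` callable, the patch size and the stride here are
    hypothetical placeholders for the trained network and its settings.

```python
import numpy as np

def predict_patchwise(image, predict_patch, patch=8, stride=4):
    """Tile a 2D image with overlapping patches, predict each, and average overlaps."""
    h, w = image.shape
    out = np.zeros_like(image, dtype=float)
    count = np.zeros_like(image, dtype=float)
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            out[y:y + patch, x:x + patch] += predict_patch(image[y:y + patch, x:x + patch])
            count[y:y + patch, x:x + patch] += 1
    # Pixels never covered (if the size is not aligned to the stride) stay 0.
    return np.divide(out, count, out=np.zeros_like(out), where=count > 0)
```

    With an identity `predict_patch`, the stitched output reproduces the input exactly
    when the patches cover the whole image. This makes a convenient check on the tiling.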

    Fast Predictive Multimodal Image Registration

    Xiao Yang, Roland Kwitt, Martin Styner, Marc Niethammer
    Comments: Accepted as a conference paper for ISBI 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We introduce a deep encoder-decoder architecture for image deformation
    prediction from multimodal images. Specifically, we design an image-patch-based
    deep network that jointly learns (i) an image similarity measure and (ii) the
    relationship between image patches and deformation parameters. While our method
    can be applied to general image registration formulations, we focus on the
    Large Deformation Diffeomorphic Metric Mapping (LDDMM) registration model. By
    predicting the initial momentum of the shooting formulation of LDDMM, we
    preserve its mathematical properties and drastically reduce the computation
    time, compared to optimization-based approaches. Furthermore, we create a
    Bayesian probabilistic version of the network that allows evaluation of
    registration uncertainty via sampling of the network at test time. We evaluate
    our method on a 3D brain MRI dataset using both T1- and T2-weighted images. Our
    experiments show that our method generates accurate predictions and that
    learning the similarity measure leads to more consistent registrations than
    relying on generic multimodal image similarity measures, such as mutual
    information. Our approach is an order of magnitude faster than
    optimization-based LDDMM.
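
    Evaluating uncertainty by sampling the network at test time amounts to running a
    stochastic forward pass several times and summarizing the spread. The sketch below
    assumes a Monte Carlo dropout style network, represented by a hypothetical one-layer
    `toy_dropout_net`. It reports the per-element mean and standard deviation. The
    paper's Bayesian network and its sample count are not reproduced.

```python
import numpy as np

def mc_predict(predict_stochastic, x, num_samples=50, seed=0):
    """Run a stochastic predictor repeatedly; return per-element mean and std."""
    rng = np.random.default_rng(seed)
    samples = np.stack([predict_stochastic(x, rng) for _ in range(num_samples)])
    return samples.mean(axis=0), samples.std(axis=0)

def toy_dropout_net(x, rng, weight=2.0, p=0.2):
    """Hypothetical one-layer 'network' with dropout kept on at test time."""
    mask = rng.random(x.shape) >= p
    return weight * x * mask / (1.0 - p)  # inverted-dropout scaling
```

    The standard deviation map plays the role of the registration uncertainty. With
    dropout disabled (p = 0) it collapses to zero.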

    Unsupervised learning from video to detect foreground objects in single images

    Ioana Croitoru (1), Simion-Vlad Bogolin (1), Marius Leordeanu (1 and 2) ((1) Institute of Mathematics of the Romanian Academy, (2) University "Politehnica" of Bucharest)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Unsupervised learning from visual data is one of the most difficult
    challenges in computer vision, being a fundamental task for understanding how
    visual recognition works. From a practical point of view, learning from
    unsupervised visual input has immense practical value, as very large
    quantities of unlabeled videos can be collected at low cost. In this paper, we
    address the task of unsupervised learning to detect and segment foreground
    objects in single images. We achieve our goal by training a student pathway,
    consisting of a deep neural network, that learns to predict, from a single
    input image (a video frame), the output that a teacher pathway performing
    unsupervised object discovery in video produces for that frame. Our approach
    differs from the published literature, which performs unsupervised discovery in
    videos or in collections of images at test time. We move the unsupervised
    discovery phase to the training stage, while at test time we apply standard
    feed-forward processing along the student pathway. This has a dual benefit.
    First, it allows in principle unlimited possibilities of learning and
    generalization during training, while remaining very fast at test time.
    Second, the student not only detects objects in single images significantly
    better than its unsupervised video discovery teacher, but it also achieves
    state-of-the-art results on two important current benchmarks, the YouTube
    Objects and Object Discovery datasets. Moreover, at test time, our system is at
    least two orders of magnitude faster than previous methods.

    Thin-Slicing Network: A Deep Structured Model for Pose Estimation in Videos

    Jie Song, Limin Wang, Luc Van Gool, Otmar Hilliges
    Comments: Preliminary version to appear in CVPR2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep ConvNets have been shown to be effective for the task of human pose
    estimation from single images. However, several challenging issues arise in the
    video-based case such as self-occlusion, motion blur, and uncommon poses with
    few or no examples in training data sets. Temporal information can provide
    additional cues about the location of body joints and help to alleviate these
    issues. In this paper, we propose a deep structured model to estimate a
    sequence of human poses in unconstrained videos. This model can be efficiently
    trained in an end-to-end manner and is capable of representing the appearance of
    body joints and their spatio-temporal relationships simultaneously. Domain
    knowledge about the human body is explicitly incorporated into the network
    providing effective priors to regularize the skeletal structure and to enforce
    temporal consistency. The proposed end-to-end architecture is evaluated on two
    widely used benchmarks (Penn Action dataset and JHMDB dataset) for video-based
    pose estimation. Our approach significantly outperforms the existing
    state-of-the-art methods.

    BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth

    Mahdi Rad, Vincent Lepetit
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We introduce a novel method for 3D object detection and pose estimation from
    color images only. We first use segmentation to detect the objects of interest
    in 2D, even in the presence of partial occlusions and cluttered background. In
    contrast to recent patch-based methods, we rely on a “holistic” approach: we
    apply to the detected objects a Convolutional Neural Network (CNN) trained to
    predict their 3D poses in the form of 2D projections of the corners of their 3D
    bounding boxes. This, however, is not sufficient for handling objects from the
    recent T-LESS dataset: these objects exhibit an axis of rotational symmetry,
    and the similarity of two images of such an object under two different poses
    makes training the CNN challenging. We solve this problem by restricting the
    range of poses used for training, and by introducing a classifier to identify
    the range of a pose at run-time before estimating it. We also use an optional
    additional step that refines the predicted poses. We improve the state of the
    art on the LINEMOD dataset from 73.7% to 89.3% of correctly registered RGB
    frames. We are also the first to report results on the Occlusion dataset using
    color images only. We obtain 54% of frames passing the Pose 6D criterion on
    average on several sequences of the T-LESS dataset, compared to 67% for the
    state-of-the-art method on the same sequences, which uses both color and depth.
    The full approach is also scalable, as a single network can be trained for
    multiple objects simultaneously.
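
    The pose representation can be illustrated by computing the training targets. The 8
    corners of an object's 3D bounding box are projected into the image with a pinhole
    camera. The intrinsics `K`, the rotation `R`, the translation `t` and the box size
    below are made-up values. At test time, a PnP solver (for example OpenCV's
    `cv2.solvePnP`) would recover the pose from the 8 predicted 2D points. That step is
    not shown.

```python
import itertools
import numpy as np

def box_corners(size):
    """8 corners of an axis-aligned 3D box centred at the object origin."""
    half = np.asarray(size, dtype=float) / 2.0
    return np.array([half * np.array(s) for s in itertools.product((-1, 1), repeat=3)])

def project(points, K, R, t):
    """Pinhole projection of (N, 3) object-frame points to (N, 2) pixel coordinates."""
    cam = points @ R.T + t            # object frame -> camera frame
    uvw = cam @ K.T                   # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide
```

    The 16 numbers returned by `project` (8 corners times 2 coordinates) are what the
    CNN is trained to regress.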

    Single Image Super Resolution – When Model Adaptation Matters

    Yudong Liang, Radu Timofte, Jinjun Wang, Yihong Gong, Nanning Zheng
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In recent years, impressive advances have been made in single image
    super-resolution. Deep learning is behind a big part of this success. Deep(er)
    architecture design and external prior modeling are the key ingredients. The
    internal contents of the low-resolution input image are neglected by deep
    modeling, despite earlier works showing the power of using such internal
    priors. In this paper we propose a novel deep convolutional neural network
    carefully designed for robustness and efficiency at both learning and testing.
    Moreover, we propose a couple of strategies for adapting the model to the
    internal contents of the low-resolution input image and analyze their
    strengths and weaknesses. By trading off runtime and using internal priors, we
    achieve 0.1 to 0.3 dB PSNR improvements over the best reported results on
    standard datasets. Our adaptation especially favors images with repetitive
    structures or at large resolutions. Moreover, it can be combined with other
    simple techniques, such as back-projection or enhanced prediction, for further
    improvements.
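
    To put the 0.1 to 0.3 dB gains in perspective, recall that PSNR = 10 log10(MAX^2 / MSE).
    A gain of D dB therefore multiplies the MSE by 10^(-D/10). A 0.1 dB gain corresponds
    to roughly a 2.3% lower MSE, and a 0.3 dB gain to roughly 6.7%. Here is a minimal
    NumPy implementation, assuming 8-bit images (MAX = 255).

```python
import numpy as np

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    diff = np.asarray(reference, dtype=float) - np.asarray(estimate, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)
```

    Halving the MSE raises PSNR by 10 log10(2), about 3.01 dB. This shows how small an
    error reduction a few tenths of a dB represent.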

    (DE)²CO: Deep Depth Colorization

    F. M. Carlucci, P. Russo, S. M. Baharlou, B. Caputo
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Object recognition on depth images using convolutional neural networks
    requires mapping the data collected with depth sensors into three-channel
    images. This makes them processable by deep architectures pre-trained on
    large-scale RGB databases like ImageNet. Current mappings are based on
    heuristic assumptions about which depth properties should be preserved most,
    often resulting in cumbersome data visualizations and likely in sub-optimal
    recognition results. Here we take an alternative route and instead attempt
    to learn an optimal colorization mapping for any given pre-trained
    architecture, using a reference RGB-D database as training data. We propose a
    deep network architecture, exploiting the residual paradigm, that learns how to
    map depth data to three-channel images from a reference database. A qualitative
    analysis of the images obtained with this approach clearly indicates that
    learning the optimal mapping for depth data preserves the richness of depth
    information much better than the hand-crafted approaches currently in use.
    Experiments on the Washington, JHUIT-50 and BigBIRD public benchmark databases,
    using AlexNet, VGG-16, GoogleNet, ResNet and SqueezeNet, clearly showcase the
    power of our approach, with gains in performance of up to 17% compared to
    the state of the art.

    End-To-End Face Detection and Recognition

    Liying Chi, Hongxin Zhang, Mingxiu Chen
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Many face detection and recognition methods have been proposed over the past
    decades, with impressive results. A common face recognition pipeline consists
    of: 1) face detection, 2) face alignment, 3) feature extraction, and 4)
    similarity calculation, which are separate and independent of each other. These
    separate face analysis stages introduce redundant computation into the model
    and are hard to train end-to-end. In this paper, we propose a novel end-to-end
    trainable convolutional network framework for face detection and recognition,
    in which a geometric transformation matrix is learned directly to align the
    faces, instead of predicting facial landmarks. In the training stage, our single
    CNN model is supervised only by face bounding boxes and personal identities,
    which are publicly available from the WIDER FACE [Yang2016] and CASIA-WebFace
    [Yi2014] datasets. Tested on the Face Detection Dataset and Benchmark (FDDB)
    [Jain2010] and Labeled Faces in the Wild (LFW) [Huang2007] datasets, our model
    achieves 89.24% recall for face detection and 98.63% verification accuracy for
    face recognition simultaneously, which is comparable to state-of-the-art results.
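
    Learning a transformation matrix instead of landmarks means the network outputs the
    parameters of a warp. The aligned face is then obtained by resampling the image at
    the transformed coordinates, in the style of a spatial transformer. The sketch below
    applies a 2x3 affine matrix `theta` in normalized [-1, 1] coordinates and uses
    nearest-neighbour sampling for brevity. Two things are outside this sketch: the
    network that predicts `theta`, and the bilinear interpolation a trainable version
    would need.

```python
import numpy as np

def affine_warp(image, theta, out_shape):
    """Sample `image` at theta @ [x, y, 1] for each output pixel (normalized coords)."""
    h_out, w_out = out_shape
    ys, xs = np.meshgrid(np.linspace(-1, 1, h_out), np.linspace(-1, 1, w_out),
                         indexing="ij")
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])  # (3, N)
    src_x, src_y = theta @ grid                                  # (2, N) source coords
    h_in, w_in = image.shape
    # Map normalized coords back to pixel indices (nearest neighbour, clamped to bounds).
    col = np.clip(np.rint((src_x + 1) * (w_in - 1) / 2).astype(int), 0, w_in - 1)
    row = np.clip(np.rint((src_y + 1) * (h_in - 1) / 2).astype(int), 0, h_in - 1)
    return image[row, col].reshape(h_out, w_out)
```

    The identity matrix `[[1, 0, 0], [0, 1, 0]]` returns the input unchanged, and
    negating the first entry mirrors the face horizontally.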

    Semantic-driven Generation of Hyperlapse from 360° Video

    Wei-Sheng Lai, Yujia Huang, Neel Joshi, Chris Buehler, Ming-Hsuan Yang, Sing Bing Kang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a system for converting a fully panoramic (360°) video into
    a normal field-of-view (NFOV) hyperlapse for an optimal viewing experience. Our
    system exploits visual saliency and semantics to non-uniformly sample in space
    and time for generating hyperlapses. In addition, users can optionally choose
    objects of interest for customizing the hyperlapses. We first stabilize an
    input 360° video by smoothing the rotation between adjacent frames and
    then compute regions of interest and saliency scores. An initial hyperlapse is
    generated by optimizing the saliency and motion smoothness followed by the
    saliency-aware frame selection. It is smoothed further using an efficient 2D
    video stabilization approach that adaptively selects the motion model to
    generate the final hyperlapse. We validate the design of our system by showing
    results for a variety of scenes and comparing against the state-of-the-art
    method through a user study.
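
    Saliency-aware frame selection can be framed as a shortest-path problem over frames
    and solved by dynamic programming. The cost below rewards saliency and penalizes
    deviation from a target speed-up. This objective and the `target_skip`, `max_skip`
    and `weight` parameters are simplifying assumptions, not the paper's formulation,
    which also accounts for motion smoothness.

```python
def select_frames(saliency, target_skip=4, max_skip=8, weight=1.0):
    """Pick a frame path 0 -> n-1 trading saliency against a uniform speed-up (DP)."""
    n = len(saliency)
    cost = [float("inf")] * n
    prev = [-1] * n
    cost[0] = -saliency[0]
    for j in range(1, n):
        for i in range(max(0, j - max_skip), j):
            # Reward landing on salient frames; penalize jumps far from the target skip.
            c = cost[i] - saliency[j] + weight * (j - i - target_skip) ** 2
            if c < cost[j]:
                cost[j], prev[j] = c, i
    # Backtrack from the last frame to recover the selected frame indices.
    path, j = [], n - 1
    while j != -1:
        path.append(j)
        j = prev[j]
    return path[::-1]
```

    With uniform saliency the path is evenly spaced, for example `[0, 4, 8]` for 9 frames.
    A single highly salient frame pulls the path towards it.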

    A Hybrid Data Association Framework for Robust Online Multi-Object Tracking

    Min Yang, Yuwei Wu, Yunde Jia
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Global optimization algorithms have shown impressive performance in
    data-association based multi-object tracking, but handling online data remains
    a difficult hurdle to overcome. In this paper, we present a hybrid data
    association framework with a min-cost multi-commodity network flow for robust
    online multi-object tracking. We build local target-specific models interleaved
    with global optimization of the optimal data association over multiple video
    frames. More specifically, in the min-cost multi-commodity network flow, the
    target-specific similarities are learned online to enforce local
    consistency and reduce the complexity of the global data association.
    Meanwhile, the global data association taking multiple video frames into
    account alleviates irrecoverable errors caused by the local data association
    between adjacent frames. To ensure the efficiency of online tracking, we give
    an efficient near-optimal solution to the proposed min-cost multi-commodity
    flow problem, and provide the empirical proof of its sub-optimality. The
    comprehensive experiments on real data demonstrate the superior tracking
    performance of our approach in various challenging situations.

    Novel Framework for Spectral Clustering using Topological Node Features (TNF)

    Lalith Srikanth Chintalapati, Raghunatha Sarma Rachakonda
    Comments: 8 pages, This work is under consideration at Pattern Recognition Letters
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Spectral clustering has gained importance in recent years due to its ability
    to cluster complex data, as it requires only pairwise similarity among data
    points, and due to its ease of implementation. The central point in spectral
    clustering is the process of capturing pairwise similarity. In the literature,
    many techniques have been proposed for the effective construction of an
    affinity matrix with suitable pairwise similarity. In this paper, a general
    framework for capturing pairwise affinity using local features such as density,
    proximity and structural similarity is proposed. Topological Node Features
    are exploited to define the notions of density and local structure. These local
    features are incorporated into the construction of the affinity matrix.
    Experimental results on widely used datasets, such as synthetic shape datasets,
    UCI real datasets and the MNIST handwritten digit dataset, show that the
    proposed framework outperforms standard spectral clustering methods.
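
    For reference, a common form of the standard spectral clustering baseline builds a
    Gaussian affinity matrix. It then splits the data using eigenvectors of the
    normalized graph Laplacian. The two-cluster sketch below uses the second-smallest
    eigenvector of L_sym = I - D^(-1/2) W D^(-1/2). It does not implement the TNF
    density or structural features, and `sigma` is a hypothetical bandwidth.

```python
import numpy as np

def gaussian_affinity(points, sigma=1.0):
    """Dense Gaussian (RBF) affinity matrix with a zero diagonal."""
    sq = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def spectral_bipartition(W):
    """Two-way split from the second-smallest eigenvector of the normalized Laplacian."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)       # eigenvalues in ascending order
    fiedler = d_inv_sqrt * eigvecs[:, 1]     # rescale back to the random-walk form
    return (fiedler > 0).astype(int)
```

    For k clusters one would instead take the k smallest eigenvectors and run k-means on
    their rows. The TNF framework, as described above, targets how `W` is constructed.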

    Unsupervised Holistic Image Generation from Key Local Patches

    Donghoon Lee, Sangdoo Yun, Sungjoon Choi, Hwiyeon Yoo, Ming-Hsuan Yang, Songhwai Oh
    Comments: 16 pages, 14 figures, ICCV 2017 submission
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We introduce a new problem of generating an image based on a small number of
    key local patches without any geometric prior. In this work, key local patches
    are defined as informative regions of the target object or scene. This is a
    challenging problem since it requires generating realistic images and
    predicting locations of parts at the same time. We construct adversarial
    networks to tackle this problem. A generator network generates a fake image as
    well as a mask based on the encoder-decoder framework. On the other hand, a
    discriminator network aims to detect fake images. The network is trained with
    three losses to consider spatial, appearance, and adversarial information. The
    spatial loss determines whether the locations of predicted parts are correct.
    Input patches are restored in the output image without much modification due to
    the appearance loss. The adversarial loss ensures output images are realistic.
    The proposed network is trained without supervisory signals since no labels of
    key parts are required. Experimental results on six datasets demonstrate that
    the proposed algorithm performs favorably on challenging objects and scenes.

    Deep Domain Adaptation Based Video Smoke Detection using Synthetic Smoke Images

    Gao Xu, Yongming Zhang, Qixing Zhang, Gaohua Lin, Jinjun Wang
    Comments: The manuscript, approved by all authors, is our original work and has previously been submitted to Fire Safety Journal for peer review. There are 4516 words, 8 figures and 2 tables in this manuscript
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, a deep domain adaptation based method for video smoke
    detection is proposed to extract a powerful feature representation of smoke.
    Because smoke image samples are limited in scale and diversity for deep CNN
    training, we systematically produced adequate synthetic smoke images with a
    wide variation in smoke shape, background and lighting conditions.
    Considering that the appearance gap (dataset bias) between synthetic and real
    smoke images significantly degrades the performance of the trained model on a
    test set composed entirely of real images, we build deep architectures based on
    domain adaptation to confuse the distributions of features extracted from
    synthetic and real smoke images. This approach expands the domain-invariant
    feature space for smoke image samples. With the feature distributions
    approximated in this way (for non-smoke images as well), the recognition rate
    of the trained model improves significantly compared to a model trained
    directly on a mixed dataset of synthetic and real images. Experimentally,
    several deep architectures with different design choices are applied to the
    smoke detector. The final framework achieves a satisfactory result on the test
    set. We believe that our approach is a first step toward utilizing deep neural
    networks enhanced with synthetic smoke images for video smoke detection.

    Deep 3D Face Identification

    Donghyun Kim, Matthias Hernandez, Jongmoo Choi, Gerard Medioni
    Comments: 9 pages, 5 figures, 2 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a novel 3D face recognition algorithm using a deep convolutional
    neural network (DCNN) and a 3D augmentation technique. The performance of 2D
    face recognition algorithms has significantly increased by leveraging the
    representational power of deep neural networks and the use of large-scale
    labeled training data. As opposed to 2D face recognition, training
    discriminative deep features for 3D face recognition is very difficult due to
    the lack of large-scale 3D face datasets. In this paper, we show that transfer
    learning from a CNN trained on 2D face images can effectively work for 3D face
    recognition by fine-tuning the CNN with a relatively small number of 3D facial
    scans. We also propose a 3D face augmentation technique which synthesizes a
    number of different facial expressions from a single 3D face scan. Our proposed
    method shows excellent recognition results on Bosphorus, BU-3DFE, and 3D-TEC
    datasets, without using hand-crafted features. The 3D identification using our
    deep features also scales well for large databases.

    Concurrent Segmentation and Localization for Tracking of Surgical Instruments

    Iro Laina, Nicola Rieke, Christian Rupprecht, Josué Page Vizcaíno, Abouzar Eslami, Federico Tombari, Nassir Navab
    Comments: I. Laina and N. Rieke contributed equally to this work. Submitted to MICCAI 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Instrument tracking is an essential requirement for various computer-assisted
    interventions. To overcome problems such as specular reflection and motion
    blur, we propose a novel method that takes advantage of the interdependency
    between localization and segmentation of the tool. In particular, we
    reformulate the 2D pose estimation as a heatmap regression and thereby enable a
    robust, concurrent regression of both tasks. Through experiments,
    we demonstrate that this modeling leads to significantly higher accuracy than
    directly regressing the tool’s coordinates. The performance is compared with the
    state of the art on a Retinal Microsurgery benchmark and the EndoVis Challenge.
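
    Reformulating localization as heatmap regression means the network predicts an
    image-sized map whose peak marks the keypoint, rather than two raw coordinates. Below
    is a minimal sketch of the training target and the decoding step. It assumes a
    Gaussian target with a hypothetical width `sigma`; the paper's exact target is not
    specified here.

```python
import numpy as np

def gaussian_heatmap(shape, center, sigma=2.0):
    """Training target: a 2D Gaussian with peak value 1 at `center` = (row, col)."""
    rows, cols = np.indices(shape)
    cy, cx = center
    return np.exp(-((rows - cy) ** 2 + (cols - cx) ** 2) / (2.0 * sigma ** 2))

def decode_heatmap(heatmap):
    """Recover the (row, col) keypoint as the location of the maximum response."""
    return np.unravel_index(np.argmax(heatmap), heatmap.shape)
```

    Because the target shares the image grid, the heatmap output can live in the same
    decoder as a segmentation mask. This is one way the two tasks can be regressed
    concurrently.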

    TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition

    Chih-Yao Ma, Min-Hung Chen, Zsolt Kira, Ghassan AlRegib
    Comments: 16 pages, 11 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recent two-stream deep Convolutional Neural Networks (ConvNets) have made
    significant progress in recognizing human actions in videos. Despite their
    success, methods extending the basic two-stream ConvNet have not systematically
    explored possible network architectures to further exploit spatiotemporal
    dynamics within video sequences. Further, such networks often use different
    baseline two-stream networks. Therefore, the differences and the distinguishing
    factors between various methods using Recurrent Neural Networks (RNN) or
    convolutional networks on temporally-constructed feature vectors
    (Temporal-ConvNet) are unclear. In this work, we first demonstrate a strong
    baseline two-stream ConvNet using ResNet-101. We use this baseline to
    thoroughly examine the use of both RNNs and Temporal-ConvNets for extracting
    spatiotemporal information. Building upon our experimental results, we then
    propose and investigate two different networks to further integrate
    spatiotemporal information: 1) temporal segment RNN and 2) Inception-style
    Temporal-ConvNet. We demonstrate that both RNNs (using LSTMs) and
    Temporal-ConvNets on spatiotemporal feature matrices are able to exploit
    spatiotemporal dynamics to improve the overall performance. However, each of
    these methods requires proper care to achieve state-of-the-art performance; for
    example, LSTMs require pre-segmented data or else they cannot fully exploit
    temporal information. Our analysis identifies specific limitations for each
    method that could form the basis of future work. Our experimental results on the
    UCF101 and HMDB51 datasets achieve state-of-the-art performance of 94.1% and
    69.0%, respectively, without requiring extensive temporal augmentation.

    Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos

    Rui Hou, Chen Chen, Mubarak Shah
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep learning has been demonstrated to achieve excellent results for image
    classification and object detection. However, the impact of deep learning on
    video analysis (e.g., action detection and recognition) has been limited due to
    the complexity of video data and the lack of annotations. Previous convolutional
    neural network (CNN) based video action detection approaches usually consist of two
    major steps: frame-level action proposal detection and association of proposals
    across frames. Also, these methods employ a two-stream CNN framework to handle
    spatial and temporal features separately. In this paper, we propose an
    end-to-end deep network called Tube Convolutional Neural Network (T-CNN) for
    action detection in videos. The proposed architecture is a unified network that
    is able to recognize and localize actions based on 3D convolution features. A
    video is first divided into equal-length clips, and for each clip a set of tube
    proposals is generated based on 3D Convolutional Network (ConvNet)
    features. Finally, the tube proposals of different clips are linked together
    using network flow, and spatio-temporal action detection is performed using
    these linked video proposals. Extensive experiments on several video datasets
    demonstrate the superior performance of T-CNN for classifying and localizing
    actions in both trimmed and untrimmed videos compared to state-of-the-art methods.
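    The abstract links tube proposals across clips with network flow. The sketch below uses a simpler stand-in, dynamic programming, and is not the paper's network-flow formulation. It picks one proposal per clip so that the sum of proposal scores, plus the IoU between the last-frame box of one tube and the first-frame box of the next, is maximal. The proposal tuple layout and this scoring rule are assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Proposal = Tuple[float, Box, Box]         # (score, box in first frame, box in last frame)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def link_tubes(clips: List[List[Proposal]]) -> List[int]:
    """Index of the chosen proposal in each clip along the highest-scoring path."""
    best = [score for score, _, _ in clips[0]]  # best path score ending at each proposal
    backpointers: List[List[int]] = []
    for prev, cur in zip(clips, clips[1:]):
        new_best, pointers = [], []
        for score, first_box, _ in cur:
            # Score of extending each previous path with this proposal.
            cands = [best[i] + iou(prev[i][2], first_box) for i in range(len(prev))]
            i_best = max(range(len(prev)), key=cands.__getitem__)
            new_best.append(cands[i_best] + score)
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # Backtrack from the best final proposal.
    j = max(range(len(best)), key=best.__getitem__)
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    return path[::-1]


clip_a = [(0.9, (0, 0, 10, 10), (0, 0, 10, 10)),
          (0.8, (50, 50, 60, 60), (50, 50, 60, 60))]
clip_b = [(0.5, (50, 50, 60, 60), (50, 50, 60, 60)),
          (0.6, (0, 0, 10, 10), (0, 0, 10, 10))]
print(link_tubes([clip_a, clip_b]))  # [0, 1]
```

    This DP recovers only a single best path. A network-flow formulation can additionally link several tubes jointly.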

    Towards a Visual Privacy Advisor: Understanding and Predicting Privacy Risks in Images

    Tribhuvanesh Orekondy, Bernt Schiele, Mario Fritz
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Computers and Society (cs.CY); Social and Information Networks (cs.SI)

    With an increasing number of users sharing information online, the privacy
    implications of such actions are a major concern. For explicit content,
    such as user profile or GPS data, devices (e.g., mobile phones) as well as web
    services (e.g., Facebook) let users set privacy settings in order to enforce
    their privacy preferences. We propose the first approach that extends this
    concept to image content in the spirit of a Visual Privacy Advisor. First, we
    categorize personal information in images into 68 image attributes and collect
    a dataset, which allows us to train models that predict such information
    directly from images. Second, we run a user study to understand the privacy
    preferences of different users with respect to such attributes. Third, we propose models
    that predict user-specific privacy scores from images in order to enforce the
    users’ privacy preferences. Our model is trained to predict the user-specific
    privacy risk and even outperforms the judgment of the users, who often fail to
    follow their own privacy preferences on image data.

    Relevance Subject Machine: A Novel Person Re-identification Framework

    Igor Fedorov, Ritwik Giri, Bhaskar D. Rao, Truong Q. Nguyen
    Comments: Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a novel method called the Relevance Subject Machine (RSM) to solve
    the person re-identification (re-id) problem. RSM falls under the category of
    Bayesian sparse recovery algorithms and uses the sparse representation of the
    input video under a pre-defined dictionary to identify the subject in the
    video. Our approach focuses on the multi-shot re-id problem, which is the
    prevalent problem in many video analytics applications. RSM captures the
    essence of the multi-shot re-id problem by constraining the support of the
    sparse codes for each input video frame to be the same. Our proposed approach
    is also robust enough to deal with time-varying outliers and occlusions by
    introducing a sparse, non-stationary noise term in the model error. We provide
    a novel Variational Bayesian based inference procedure along with an intuitive
    interpretation of the proposed update rules. We evaluate our approach over
    several commonly used re-id datasets and show superior performance over current
    state-of-the-art algorithms. Specifically, for ILIDS-VID, a recent large scale
    re-id dataset, RSM shows significant improvement over all published approaches,
    achieving an 11.5% (absolute) improvement in rank 1 accuracy over the closest
    competing algorithm considered.

    Interpretable Learning for Self-Driving Cars by Visualizing Causal Attention

    Jinkyu Kim, John Canny
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Deep neural perception and control networks are likely to be a key component
    of self-driving vehicles. These models need to be explainable (they should
    provide easy-to-interpret rationales for their behavior) so that passengers,
    insurance companies, law enforcement, developers, etc. can understand what
    triggered a particular behavior. Here we explore the use of visual
    explanations. These explanations take the form of real-time highlighted regions
    of an image that causally influence the network’s output (steering control).
    Our approach is two-stage. In the first stage, we use a visual attention model
    to train a convolutional network end-to-end from images to steering angle. The
    attention model highlights image regions that potentially influence the
    network’s output. Some of these are true influences, but some are spurious. We
    then apply a causal filtering step to determine which input regions actually
    influence the output. This produces more succinct visual explanations and more
    accurately exposes the network’s behavior. We demonstrate the effectiveness of
    our model on three datasets totaling 16 hours of driving. We first show that
    training with attention does not degrade the performance of the end-to-end
    network. Then we show that the network causally cues on a variety of features
    that are used by humans while driving.
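    The causal filtering step checks whether an attended region actually influences the output. One simple way to illustrate the idea is to mask each region and measure how much the prediction moves. The sketch below is not claimed to be the authors' exact procedure. The constant fill value, the threshold, and the toy steering model are all hypothetical.

```python
from typing import Callable, List, Sequence, Set, Tuple

Image = List[List[float]]
Region = Set[Tuple[int, int]]  # (row, col) pixels of one attention blob


def mask_region(image: Image, region: Region, fill: float = 0.0) -> Image:
    """Copy of the image with one region's pixels replaced by a constant fill value."""
    return [[fill if (r, c) in region else value for c, value in enumerate(row)]
            for r, row in enumerate(image)]


def causal_filter(model: Callable[[Image], float], image: Image,
                  regions: Sequence[Region], threshold: float) -> List[int]:
    """Indices of attended regions whose masking changes the output by more than threshold."""
    baseline = model(image)
    return [i for i, region in enumerate(regions)
            if abs(model(mask_region(image, region)) - baseline) > threshold]


# Hypothetical toy "steering model" that only looks at the two left-most columns.
def toy_model(image: Image) -> float:
    return 0.1 * sum(row[0] + row[1] for row in image)


frame = [[1.0] * 4 for _ in range(4)]
left_blob = {(0, 0), (1, 0)}    # attended and causal
right_blob = {(0, 3), (1, 3)}   # attended but spurious
print(causal_filter(toy_model, frame, [left_blob, right_blob], threshold=0.05))  # [0]
```

    Regions that the attention map highlights but whose removal leaves the steering output unchanged are discarded as spurious.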


    Artificial Intelligence

    Diversity of preferences can increase collective welfare in sequential exploration problems

    Pantelis P. Analytis, Hrvoje Stojic, Alexandros Gelastopoulos, Mehdi Moussaïd
    Comments: 4 pages, 1 figure, originally presented at the Collective Intelligence (CI) conference in June 2017
    Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

    We advance a novel model of choice in online interfaces in which agents with
    diverse yet correlated preferences search the alternatives in order of
    popularity and choose the first alternative with utility higher than a certain
    satisficing threshold. The model goes beyond existing accounts in that (i) it
    suggests a cognitive process through which social influence plays out in these
    markets, (ii) it is bolstered by a rich utility framework and is thus amenable
    to welfare analysis, and (iii) it facilitates comparisons with scenarios
    without social influence. Using agent-based simulations, we find that social
    interaction leads to a larger increase in the average consumer welfare
    when there is at least some diversity of preferences in the consumer
    population.
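    The choice model lends itself to a short agent-based simulation. In this sketch, each agent's utility for an item is a shared quality term plus `diversity` times private Gaussian noise. With social influence, agents inspect items in decreasing order of past popularity; without it, they use a random order. Each agent takes the first item whose utility reaches the satisficing threshold. The Gaussian utilities and the fallback when no item qualifies (take the best one) are assumptions, not details from the abstract.

```python
import random
from typing import List


def simulate_welfare(n_agents: int, n_items: int, diversity: float,
                     threshold: float, social: bool, seed: int = 0) -> float:
    """Average utility of the items chosen by a sequence of satisficing agents."""
    rng = random.Random(seed)
    # Common component of utility shared by all agents (correlated preferences).
    quality: List[float] = [rng.gauss(0.0, 1.0) for _ in range(n_items)]
    counts = [0] * n_items  # how often each item has been chosen so far
    total_utility = 0.0
    for _ in range(n_agents):
        # Agent-specific utilities: shared quality plus scaled private noise.
        utility = [q + diversity * rng.gauss(0.0, 1.0) for q in quality]
        order = list(range(n_items))
        rng.shuffle(order)  # random search order; also breaks popularity ties
        if social:
            # Stable sort keeps the shuffled order among equally popular items.
            order.sort(key=lambda j: counts[j], reverse=True)
        # Satisficing: first item at or above the threshold; if none qualifies,
        # fall back to the best inspected item (an assumption of this sketch).
        choice = next((j for j in order if utility[j] >= threshold),
                      max(order, key=lambda j: utility[j]))
        counts[choice] += 1
        total_utility += utility[choice]
    return total_utility / n_agents


for d in (0.0, 0.5, 1.0):
    with_social = simulate_welfare(500, 20, d, 1.0, social=True)
    without_social = simulate_welfare(500, 20, d, 1.0, social=False)
    print(f"diversity={d}: social={with_social:.3f} random order={without_social:.3f}")
```

    Sweeping `diversity` and comparing the two conditions mirrors the kind of comparison the abstract describes. The printed numbers come from this toy setup only and are not the paper's results.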

    Learning Visual Servoing with Deep Features and Fitted Q-Iteration

    Alex X. Lee, Sergey Levine, Pieter Abbeel
    Comments: ICLR 2017
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

    Visual servoing involves choosing actions that move a robot in response to
    observations from a camera, in order to reach a goal configuration in the
    world. Standard visual servoing approaches typically rely on manually designed
    features and analytical dynamics models, which limits their generalization
    capability and often requires extensive application-specific feature and model
    engineering. In this work, we study how learned visual features, learned
    predictive dynamics models, and reinforcement learning can be combined to learn
    visual servoing mechanisms. We focus on target following, with the goal of
    designing algorithms that can learn a visual servo using small amounts of data of
    the target in question, to enable quick adaptation to new targets. Our approach
    is based on servoing the camera in the space of learned visual features, rather
    than image pixels or manually-designed keypoints. We demonstrate that standard
    deep features, in our case taken from a model trained for object
    classification, can be used together with a bilinear predictive model to learn
    an effective visual servo that is robust to visual variation, changes in
    viewing angle and appearance, and occlusions. A key component of our approach
    is to use a sample-efficient fitted Q-iteration algorithm to learn which
    features are best suited for the task at hand. We show that we can learn an
    effective visual servo on a complex synthetic car following benchmark using
    just 20 training trajectory samples for reinforcement learning. We demonstrate
    substantial improvement over a conventional approach based on image pixels or
    hand-designed keypoints, and we show an improvement in sample-efficiency of
    more than two orders of magnitude over standard model-free deep reinforcement
    learning algorithms. Videos are available at
    this http URL.

    EMULATOR vs REAL PHONE: Android Malware Detection Using Machine Learning

    Mohammed K. Alzaylaee, Suleiman Y. Yerima, Sakir Sezer
    Comments: IWSPA 2017 Proceedings of the 3rd ACM International Workshop on Security and Privacy Analytics, co-located with CODASPY’17, Scottsdale, Arizona, USA, March 24, 2017, pages 65-72
    Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

    The Android operating system has become the most popular operating system for
    smartphones and tablets, leading to a rapid rise in malware. Sophisticated
    Android malware employs detection avoidance techniques in order to hide its
    malicious activities from analysis tools. These include a wide range of
    anti-emulator techniques, where the malware programs attempt to hide their
    malicious activities by detecting the emulator. For this reason,
    countermeasures against anti-emulation are becoming increasingly important in
    Android malware detection. Analysis and detection based on real devices can
    alleviate the problems of anti-emulation as well as improve the effectiveness
    of dynamic analysis. Hence, in this paper we present an investigation of
    machine learning based malware detection using dynamic analysis on real
    devices. A tool is implemented to automatically extract dynamic features from
    Android phones, and through several experiments we undertake a comparative
    analysis of emulator-based vs. device-based detection using several machine
    learning algorithms. Our study shows that several features could be
    extracted more effectively from the on-device dynamic analysis compared to
    emulators. It was also found that approximately 24% more apps were successfully
    analysed on the phone. Furthermore, all of the studied machine-learning-based
    detection methods performed better when applied to features extracted from the
    on-device dynamic analysis.

    MidiNet: A Convolutional Generative Adversarial Network for Symbolic-domain Music Generation using 1D and 2D Conditions

    Li-Chia Yang, Szu-Yu Chou, Yi-Hsuan Yang
    Comments: 6 pages
    Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)

    In this paper, we present MidiNet, a deep convolutional neural network (CNN)
    based generative adversarial network (GAN) that is intended to provide a
    general, highly adaptive network structure for symbolic-domain music
    generation. The network takes random noise as input and generates a melody
    sequence one measure (bar) after another. Moreover, it has a novel reflective
    CNN sub-model that allows us to guide the generation process by providing not
    only 1D but also 2D conditions. In our implementation, we used the intended
    chord of the current bar as a 1D condition to provide a harmonic context, and
    the melody generated for the preceding bar as a 2D condition to
    provide sequential information. The output of the network is a 16 by 128 matrix
    each time, representing the presence of each of the 128 MIDI notes in the
    generated melody sequence of that bar, with the smallest temporal unit being
    the sixteenth note. MidiNet can generate music with an arbitrary number of bars by
    concatenating these 16 by 128 matrices. The melody sequence can then be played
    back with a synthesizer. We provide example clips showing the effectiveness of
    MidiNet in generating harmonic music.
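    The 16 by 128 output format is a per-bar piano roll: one row per sixteenth-note step and one column per MIDI note. The sketch below stacks bars into one roll and turns it into note events that a synthesizer could play. Merging consecutive active steps of the same pitch into one held note is an assumption of this sketch; the abstract does not say how durations are decoded.

```python
from typing import List, Tuple

STEPS_PER_BAR = 16   # one row per sixteenth note
N_PITCHES = 128      # one column per MIDI note number (0-127)

Matrix = List[List[int]]


def empty_bar() -> Matrix:
    return [[0] * N_PITCHES for _ in range(STEPS_PER_BAR)]


def concatenate_bars(bars: List[Matrix]) -> Matrix:
    """Stack per-bar 16 x 128 matrices into one (16 * n_bars) x 128 piano roll."""
    roll: Matrix = []
    for bar in bars:
        if len(bar) != STEPS_PER_BAR or any(len(row) != N_PITCHES for row in bar):
            raise ValueError("each bar must be a 16 x 128 matrix")
        roll.extend(bar)
    return roll


def roll_to_notes(roll: Matrix) -> List[Tuple[int, int, int]]:
    """Convert a piano roll to sorted (start_step, midi_pitch, duration_in_steps) notes."""
    notes = []
    silent = [0] * N_PITCHES  # sentinel row that closes any note still sounding
    for pitch in range(N_PITCHES):
        start = None
        for step, row in enumerate(roll + [silent]):
            if row[pitch] and start is None:
                start = step                      # note onset
            elif not row[pitch] and start is not None:
                notes.append((start, pitch, step - start))
                start = None                      # note released
    return sorted(notes)


bar1, bar2 = empty_bar(), empty_bar()
for step in range(4):
    bar1[step][60] = 1   # C4 for the first quarter note of bar 1
for step in range(8):
    bar2[step][64] = 1   # E4 for the first half note of bar 2
print(roll_to_notes(concatenate_bars([bar1, bar2])))  # [(0, 60, 4), (16, 64, 8)]
```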

    What-If Reasoning with Counterfactual Gaussian Processes

    Peter Schulam, Suchi Saria
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Answering “What if?” questions is important in many domains. For example,
    would a patient’s disease progression slow down if I were to give them a dose
    of drug A? Ideally, we would answer such a question with an experiment, but this is not
    always possible (e.g., it may be unethical). As an alternative, we can use
    non-experimental data to learn models that make counterfactual predictions of
    what we would observe had we run an experiment. In this paper, we propose a
    model to make counterfactual predictions about how continuous-time trajectories
    (time series) respond to sequences of actions taken in continuous time. We
    develop our model within the potential outcomes framework of Neyman and Rubin.
    One challenge is that the assumptions commonly made to learn potential outcome
    (counterfactual) models from observational data are not applicable in
    continuous time as-is. We therefore propose a model using marked point
    processes and Gaussian processes, and develop alternative assumptions that
    allow us to learn counterfactual models from continuous-time observational
    data. We evaluate our approach on two tasks from health care: disease
    trajectory prediction and personalized treatment planning.

    Fundamental Parameters of Main-Sequence Stars in an Instant with Machine Learning

    Earl P. Bellinger, George C. Angelou, Saskia Hekker, Sarbani Basu, Warrick Ball, Elisabeth Guggenberger
    Comments: 26 pages, 18 figures, accepted for publication in ApJ
    Subjects: Solar and Stellar Astrophysics (astro-ph.SR); Instrumentation and Methods for Astrophysics (astro-ph.IM); Artificial Intelligence (cs.AI)

    Owing to the remarkable photometric precision of space observatories like
    Kepler, stellar and planetary systems beyond our own are now being
    characterized en masse for the first time. These characterizations are pivotal
    for endeavors such as searching for Earth-like planets and solar twins,
    understanding the mechanisms that govern stellar evolution, and tracing the
    dynamics of our Galaxy. The volume of data that is becoming available, however,
    brings with it the need to process this information accurately and rapidly.
    While existing methods can constrain fundamental stellar parameters such as
    ages, masses, and radii from these observations, they require substantial
    computational efforts to do so.

    We develop a method based on machine learning for rapidly estimating
    fundamental parameters of main-sequence solar-like stars from classical and
    asteroseismic observations. We first demonstrate this method on a
    hare-and-hound exercise and then apply it to the Sun, 16 Cyg A & B, and 34
    planet-hosting candidates that have been observed by the Kepler spacecraft. We
    find that our estimates and their associated uncertainties are comparable to
    the results of other methods, but with the additional benefit of being able to
    explore many more stellar parameters while using much less computation time. We
    furthermore use this method to present evidence for an empirical diffusion-mass
    relation. Our method is open source and freely available for the community to
    use.

    The source code for all analyses and for all figures appearing in this
    manuscript can be found electronically at
    this https URL


    Computation and Language

    Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders

    Tiancheng Zhao, Ran Zhao, Maxine Eskenazi
    Comments: Accepted as a long paper in ACL 2017
    Subjects: Computation and Language (cs.CL)

    While recent neural encoder-decoder models have shown great promise in
    modeling open-domain conversations, they often generate dull and generic
    responses. Unlike past work that has focused on diversifying the output of the
    decoder at word-level to alleviate this problem, we present a novel framework
    based on conditional variational autoencoders that captures the discourse-level
    diversity in the encoder. Our model uses latent variables to learn a
    distribution over potential conversational intents and generates diverse
    responses using only greedy decoders. We have further developed a novel variant
    that is integrated with linguistic prior knowledge for better performance.
    Finally, the training procedure is improved by introducing a bag-of-word loss.
    Our proposed models have been validated to generate significantly more diverse
    responses than baseline approaches and exhibit competence in discourse-level
    decision-making.
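    The bag-of-word loss asks the model to predict, in one shot and ignoring word order, all the words of the response from a single vocabulary-sized score vector. As we understand it, this vector is produced by a network from the latent variable and the context. In the sketch below the logits are simply given as input. The function names are illustrative and are not the authors' code.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def bag_of_words_loss(logits: List[float], response_ids: List[int]) -> float:
    """Negative log-likelihood of every response word under one order-free distribution."""
    log_probs = log_softmax(logits)
    return -sum(log_probs[w] for w in response_ids)


# Uniform scores over a 4-word vocabulary: each word costs log(4).
print(bag_of_words_loss([0.0, 0.0, 0.0, 0.0], [1, 2]))
```

    Because the loss cannot be reduced by an autoregressive decoder alone, it pushes information about the response into the latent variable.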

    Sentence Simplification with Deep Reinforcement Learning

    Xingxing Zhang, Mirella Lapata
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Sentence simplification aims to make sentences easier to read and understand.
    Most recent approaches draw on insights from machine translation to learn
    simplification rewrites from monolingual corpora of complex and simple
    sentences. We address the simplification problem with an encoder-decoder model
    coupled with a deep reinforcement learning framework. Our model explores the
    space of possible simplifications while learning to optimize a reward function
    that encourages outputs which are simple, fluent, and preserve the meaning of
    the input. Experiments on three datasets demonstrate that our model brings
    significant improvements over the state of the art.

    Joining Hands: Exploiting Monolingual Treebanks for Parsing of Code-mixing Data

    Irshad Ahmad Bhat, Riyaz Ahmad Bhat, Manish Shrivastava, Dipti Misra Sharma
    Comments: 5 pages, EACL 2017 short paper
    Subjects: Computation and Language (cs.CL)

    In this paper, we propose efficient and less resource-intensive strategies
    for parsing of code-mixed data. These strategies are not constrained by
    in-domain annotations, rather they leverage pre-existing monolingual annotated
    resources for training. We show that these methods can produce significantly
    better results than an informed baseline. We also present a
    data set of 450 Hindi and English code-mixed tweets of Hindi multilingual
    speakers for evaluation. The data set is manually annotated with Universal
    Dependencies.

    N-gram Language Modeling using Recurrent Neural Network Estimation

    Ciprian Chelba, Mohammad Norouzi, Samy Bengio
    Comments: 10 pages, including references
    Subjects: Computation and Language (cs.CL)

    We investigate the effective memory depth of RNN models by using them for
    n-gram language model (LM) smoothing.

    Experiments on a small corpus (UPenn Treebank, one million words of training
    data and 10k vocabulary) have found the LSTM cell with dropout to be the best
    model for encoding the n-gram state when compared with feed-forward and vanilla
    RNN models.

    When preserving the sentence independence assumption the LSTM n-gram matches
    the LSTM LM performance for n=9 and slightly outperforms it for n=13. When
    allowing dependencies across sentence boundaries, the LSTM 13-gram almost
    matches the perplexity of the unlimited history LSTM LM.

    LSTM n-gram smoothing also has the desirable property of improving with
    increasing n-gram order, unlike the Katz or Kneser-Ney back-off estimators.
    Using multinomial distributions as targets in training instead of the usual
    one-hot target is only slightly beneficial for low n-gram orders.

    Experiments on the One Billion Words benchmark show that the results hold at
    larger scale.

    Building LSTM n-gram LMs may be appealing for some practical situations: the
    state in an n-gram LM can be succinctly represented with (n-1)*4 bytes storing
    the identity of the words in the context and batches of n-gram contexts can be
    processed in parallel. On the downside, the n-gram context encoding computed by
    the LSTM is discarded, making the model more expensive than a regular recurrent
    LSTM LM.
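    The (n-1)*4-byte state is simply the n-1 context word ids stored as 4-byte integers. The sketch below shows that encoding with Python's `struct` module, assuming unsigned little-endian 32-bit ids. The word ids are hypothetical. Packed states are hashable, so they can key caches or be batched and processed in parallel.

```python
import struct
from typing import Sequence, Tuple


def pack_context(word_ids: Sequence[int]) -> bytes:
    """Encode an n-gram context (the n-1 preceding word ids) as 4 bytes per word."""
    return struct.pack(f"<{len(word_ids)}I", *word_ids)


def unpack_context(state: bytes) -> Tuple[int, ...]:
    """Inverse of pack_context."""
    return struct.unpack(f"<{len(state) // 4}I", state)


n = 13
context = list(range(1000, 1000 + n - 1))  # 12 hypothetical word ids
state = pack_context(context)
print(len(state))  # 48 == (n - 1) * 4
```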

    Factorization tricks for LSTM networks

    Oleksii Kuchaiev, Boris Ginsburg
    Comments: accepted to ICLR 2017 Workshop
    Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    We present two simple ways of reducing the number of parameters and
    accelerating the training of large Long Short-Term Memory (LSTM) networks: the
    first is “matrix factorization by design” of the LSTM matrix into the product
    of two smaller matrices, and the second is partitioning of the LSTM matrix, its
    inputs and states into independent groups. Both approaches allow us to
    train large LSTM networks significantly faster to state-of-the-art
    perplexity. On the One Billion Word Benchmark we improve single-model
    perplexity down to 24.29.
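    The parameter savings of both tricks can be checked with simple arithmetic. This sketch assumes, for illustration, that the input and the recurrent (projected) state both have size p. Under that assumption the LSTM matrix maps 2p values to 4n gate pre-activations. Biases are ignored, and the layer sizes below are illustrative only.

```python
def full_params(p: int, n: int) -> int:
    """Weights of the affine map from [x, h] (2p values) to the 4 gates (4n values)."""
    return 2 * p * 4 * n


def factorized_params(p: int, n: int, r: int) -> int:
    """Factorization: W (2p x 4n) replaced by W1 (2p x r) followed by W2 (r x 4n)."""
    return 2 * p * r + r * 4 * n


def grouped_params(p: int, n: int, groups: int) -> int:
    """Grouping: inputs, state and gates split into independent blocks."""
    if (2 * p) % groups or (4 * n) % groups:
        raise ValueError("groups must divide both 2p and 4n")
    return groups * (2 * p // groups) * (4 * n // groups)


p, n = 1024, 8192
print(full_params(p, n))             # 67108864
print(factorized_params(p, n, 512))  # 17825792
print(grouped_params(p, n, 4))       # 16777216, i.e. full / 4
```

    Grouping divides the count by the number of groups. Factorization pays r*(2p + 4n) instead of 8pn, which is a saving whenever r is well below 8pn / (2p + 4n).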

    Neutral evolution and turnover over centuries of English word popularity

    Damian Ruck, R. Alexander Bentley, Alberto Acerbi, Philip Garnett, Daniel J. Hruschka
    Comments: 12 pages, 5 figures, 1 table
    Subjects: Computation and Language (cs.CL); Physics and Society (physics.soc-ph)

    Here we test neutral models against the evolution of English word frequency
    and vocabulary at the population scale, as recorded in annual word frequencies
    from three centuries of English language books. Against these data, we test
    both static and dynamic predictions of two neutral models, including the
    relation between corpus size and vocabulary size, frequency distributions, and
    turnover within those frequency distributions. Although a commonly used neutral
    model fails to replicate all these emergent properties at once, we find that a
    modified two-stage neutral model does replicate the static and dynamic
    properties of the corpus data. This two-stage model is meant to represent a
    relatively small corpus (population) of English books, analogous to a ‘canon’,
    sampled by an exponentially increasing corpus of books in the wider population
    of authors. More broadly, this model (a smaller neutral model within a larger
    neutral model) could represent situations where mass
    attention is focused on a small subset of the cultural variants.
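    The abstract does not define its neutral models. A common baseline in cultural evolution is unbiased random copying with an innovation rate `mu`, with turnover measured as the number of new entries into the top-y most frequent variants each generation. The sketch below implements that standard one-stage model as an assumption. It is not the authors' two-stage variant, and all parameter values are hypothetical.

```python
import random
from collections import Counter
from typing import List, Set


def neutral_turnover(n: int, mu: float, generations: int, top_y: int,
                     seed: int = 0) -> List[int]:
    """Unbiased copying with innovation; new entries into the top-y list per generation."""
    rng = random.Random(seed)
    population = list(range(n))  # start with n distinct variants (e.g. words)
    next_variant = n             # label for the next newly invented variant
    previous_top: Set[int] = set()
    turnover: List[int] = []
    for g in range(generations):
        new_population = []
        for _ in range(n):
            if rng.random() < mu:
                new_population.append(next_variant)  # innovation: a brand-new variant
                next_variant += 1
            else:
                new_population.append(rng.choice(population))  # copy a random token
        population = new_population
        top = {v for v, _ in Counter(population).most_common(top_y)}
        if g > 0:
            turnover.append(len(top - previous_top))  # variants new to the top list
        previous_top = top
    return turnover


print(neutral_turnover(n=1000, mu=0.01, generations=20, top_y=10))
```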

    BanglaLekha-Isolated: A Comprehensive Bangla Handwritten Character Dataset

    Mithun Biswas, Rafiqul Islam, Gautam Kumar Shom, Md Shopon, Nabeel Mohammed, Sifat Momen, Md Anowarul Abedin
    Comments: Bangla Handwriting Dataset, OCR
    Subjects: Computation and Language (cs.CL)

    Bangla handwriting recognition is becoming an increasingly important problem,
    especially for the Bangla-speaking population of Bangladesh and West Bengal.
    With this in mind, we introduce a comprehensive Bangla handwritten character
    dataset named BanglaLekha-Isolated. This dataset contains Bangla handwritten
    numerals, basic characters and compound characters. It was collected from
    multiple geographical locations within Bangladesh and includes samples
    collected from a variety of age groups. This dataset can also be used for other
    classification problems, e.g., gender, age, and district. This is the largest
    dataset on Bangla handwritten characters yet.


    Distributed, Parallel, and Cluster Computing

    A simplicial complex model of dynamic epistemic logic for fault-tolerant distributed computing

    Eric Goubault, Sergio Rajsbaum
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA)

    The usual epistemic S5 model for multi-agent systems is a Kripke graph, whose
    edges are labeled with the agents that do not distinguish between two states.
    We propose to uncover the higher dimensional information implicit in the Kripke
    graph, by using as a model its dual, a chromatic simplicial complex. For each
    state of the Kripke model there is a facet in the complex, with one vertex per
    agent. If an edge (u,v) is labeled with a set of agents S, the facets
    corresponding to u and v intersect in a simplex consisting of one vertex for
    each agent of S. Then we use dynamic epistemic logic to study how the
    simplicial complex epistemic model changes after the agents communicate with
    each other. We show that there are topological invariants preserved from the
    initial epistemic complex to the epistemic complex after an action model is
    applied, that depend on how reliable the communication is. In turn these
    topological properties determine the knowledge that the agents may gain after
    the communication happens.
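    The duality between the Kripke graph and the chromatic simplicial complex can be made concrete. This sketch assumes that each global state is given by every agent's local view and that an agent cannot distinguish two states exactly when its view is the same in both. That choice makes indistinguishability an equivalence relation, as S5 requires. Each state becomes a facet with one vertex (agent, view) per agent, and the agents labelling a Kripke edge are the colours of the face shared by the two facets. The two-agent example is hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, Set, Tuple

Vertex = Tuple[str, Hashable]  # (agent, local view of that agent)


def facet(state: Dict[str, Hashable]) -> FrozenSet[Vertex]:
    """The facet dual to a Kripke state: one vertex per agent, coloured by agent name."""
    return frozenset(state.items())


def edge_label(u: Dict[str, Hashable], v: Dict[str, Hashable]) -> Set[str]:
    """Agents that cannot distinguish u from v = colours of the shared face of the two facets."""
    return {agent for agent, _ in facet(u) & facet(v)}


# Two agents, each seeing only its own input bit.
s0 = {"a": 0, "b": 0}
s1 = {"a": 0, "b": 1}
s2 = {"a": 1, "b": 1}
print(edge_label(s0, s1))  # {'a'}: the facets share the vertex ('a', 0)
print(edge_label(s1, s2))  # {'b'}
print(edge_label(s0, s2))  # set(): no Kripke edge, disjoint facets
```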

    Parallelism, Concurrency and Distribution in Constraint Handling Rules: A Survey (Draft)

    Thom Fruehwirth
    Comments: Draft of survey submitted to a journal 2017
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Programming Languages (cs.PL)

    Constraint Handling Rules is an effective concurrent declarative programming
    language and a versatile computational logic formalism. CHR programs consist of
    guarded reactive rules that transform multisets of constraints. One of the main
    features of CHR is its inherent concurrency. Intuitively, rules can be applied
    to parts of a multiset in parallel.

    In this comprehensive survey, we give an overview of concurrent and parallel
    as well as distributed CHR semantics, standard and more exotic, that have been
    proposed over the years at various levels of refinement. These semantics range
    from the abstract to the concrete. They are related by formal soundness
    results. Their correctness is established as correspondence between parallel
    and sequential computations.

    We present common concise sample CHR programs that have been widely used in
    experiments and benchmarks. We review parallel CHR implementations in software
    and hardware. The experimental results obtained show a consistent parallel
    speedup. Most implementations are available online.

    The CHR formalism can also be used to implement and reason with models for
    concurrency. To this end, the Software Transaction Model, the Actor Model,
    Colored Petri Nets and the Join-Calculus have been faithfully encoded in CHR.
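    To show what "guarded rules that transform multisets of constraints" look like, take the well-known gcd program from the CHR literature. Its rules are `gcd(0) <=> true.` and `gcd(N) \ gcd(M) <=> 0 < N, N =< M | gcd(M - N).` The Python sketch below applies these two rules sequentially to a multiset of non-negative integers until neither applies. It is an illustration of the semantics, not a CHR implementation. Each application touches only the constraints it matches, which is why applications to disjoint parts of the store could in principle run in parallel.

```python
from typing import List


def chr_gcd(store: List[int]) -> List[int]:
    """Exhaustively apply the two gcd rules to a multiset of non-negative ints."""
    store = list(store)  # work on a copy of the constraint store
    while True:
        if 0 in store:
            # gcd1 @ gcd(0) <=> true.  The constraint is simply removed.
            store.remove(0)
            continue
        match = next(((i, j) for i, n in enumerate(store)
                      for j, m in enumerate(store)
                      if i != j and 0 < n <= m), None)
        if match is None:
            return store  # no rule applies: final store
        i, j = match
        # gcd2 @ gcd(N) \ gcd(M) <=> 0 < N, N =< M | gcd(M - N).
        # gcd(N) is kept (simpagation head); gcd(M) is replaced by gcd(M - N).
        store[j] = store[j] - store[i]


print(chr_gcd([12, 18, 30]))  # [6]
```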

    Study on Resource Efficiency of Distributed Graph Processing

    Miguel E. Coimbra, Alexandre P. Francisco, Luis Veiga
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

    Graphs may be used to represent many different problem domains; a concrete
    example is that of detecting communities in social networks, which are
    represented as graphs. With big data and more sophisticated applications
    becoming widespread in recent years, graph processing has seen an emergence of
    requirements pertaining to data volume and volatility. This multidisciplinary
    study presents a review of relevant distributed graph processing systems.
    Herein they are presented in groups defined by common traits (distributed
    processing paradigm, type of graph operations, among others), with an overview
    of each system’s strengths and weaknesses. The set of systems is then narrowed
    down to two, upon which quantitative analysis is performed. For this
    quantitative comparison of systems, focus is cast on evaluating the
    performance of algorithms for the problem of detecting communities. To help
    further understand the evaluations performed, a background is provided on graph
    clustering.

    Architecture of processing and analysis system for big astronomical data

    Ivan Kolosov, Sergey Gerasimov, Alexander Meshcheryakov
    Comments: 4 pages, to appear in the Proceedings of ADASS 2016, Astronomical Society of the Pacific (ASP) Conference Series
    Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Distributed, Parallel, and Cluster Computing (cs.DC)

    This work explores the use of big data technologies deployed in the cloud for
    processing of astronomical data. We have applied Hadoop and Spark to the task
    of co-adding astronomical images. We compared the overhead and execution time
    of these frameworks. We conclude that the performance of both frameworks is
    generally on par. The Spark API is more flexible, which allows one to easily
    construct astronomical data processing pipelines.

    An analysis of budgeted parallel search on conditional Galton-Watson trees

    David Avis, Luc Devroye
    Comments: 14 pages, 2 figures, 2 tables
    Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

    Recently Avis and Jordan have demonstrated the efficiency of a simple
    technique called budgeting for the parallelization of a number of tree search
    algorithms. The idea is to limit the amount of work that a processor performs
    before it terminates its search and returns any unexplored nodes to a master
    process. This limit is set by a critical budget parameter which determines the
    overhead of the process. In this paper we study the behaviour of the budget
    parameter on conditional Galton-Watson trees obtaining asymptotically tight
    bounds on this overhead. We present empirical results to show that this bound
    is surprisingly accurate in practice.
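    The budgeting technique can be simulated in a few lines. A master keeps a queue of unexplored subtree roots. Each job (in the parallel setting, a worker) searches depth-first from one root. Once it has visited `budget` nodes, it stops and returns every node still on its stack to the master. This sequential sketch runs the jobs one after another and counts them, which is a rough proxy for the overhead the budget controls. The tree is supplied through a hypothetical `children` callback, and the sketch does not generate conditional Galton-Watson trees.

```python
from collections import deque
from typing import Callable, Hashable, List, Tuple


def budgeted_search(root: Hashable,
                    children: Callable[[Hashable], List[Hashable]],
                    budget: int) -> Tuple[int, int]:
    """Simulate budgeted tree search; return (nodes visited, jobs handed out by the master)."""
    if budget < 1:
        raise ValueError("budget must be positive")
    master = deque([root])  # unexplored subtrees waiting to be assigned
    visited = jobs = 0
    while master:
        stack = [master.popleft()]  # a worker receives one subtree root
        jobs += 1
        work = 0
        while stack and work < budget:
            node = stack.pop()      # depth-first within the worker
            visited += 1
            work += 1
            stack.extend(children(node))
        master.extend(stack)        # budget spent: return unexplored nodes
    return visited, jobs


# Complete binary tree of depth 3 (15 nodes); a node is represented by its depth.
def binary_children(depth: int) -> List[int]:
    return [depth + 1, depth + 1] if depth < 3 else []


print(budgeted_search(0, binary_children, budget=100))  # (15, 1)
print(budgeted_search(0, binary_children, budget=1))    # (15, 15)
```

    Every node is visited exactly once whatever the budget. A small budget yields many short jobs and good load balance but more master traffic; a large budget does the reverse.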


    Learning

    Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data

    Gintare Karolina Dziugaite, Daniel M. Roy
    Comments: 16 pages, 1 table
    Subjects: Learning (cs.LG)

    One of the defining properties of deep learning is that models are chosen to
    have many more parameters than available training data. In light of this
    capacity for overfitting, it is remarkable that simple algorithms like SGD
    reliably return solutions with low test error. One roadblock to explaining
    these phenomena in terms of implicit regularization, structural properties of
    the solution, and/or easiness of the data is that many learning bounds are
    quantitatively vacuous in this “deep learning” regime. In order to explain
    generalization, we need nonvacuous bounds. We return to an idea by Langford and
    Caruana (2001), who used PAC-Bayes bounds to compute nonvacuous numerical
    bounds on generalization error for stochastic two-layer two-hidden-unit neural
    networks via a sensitivity analysis. By optimizing the PAC-Bayes bound
    directly, we are able to extend their approach and obtain nonvacuous
    generalization bounds for deep stochastic neural network classifiers with
    millions of parameters trained on only tens of thousands of examples. We
    connect our findings to recent and old work on flat minima and MDL-based
    explanations of generalization.
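    To make "nonvacuous" concrete, the sketch below turns an empirical error, a divergence KL(Q||P), a sample size m and a confidence level δ into a numerical upper bound on the true error of a stochastic classifier Q. It assumes the standard PAC-Bayes-kl form kl(ê || e) ≤ (KL(Q||P) + ln(2√m/δ))/m (Maurer's version), whose constants may differ from the bound optimized in the paper, and inverts the binary kl by bisection. The input numbers are hypothetical.

```python
import math


def binary_kl(q, p):
    """kl(q || p) between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    out = 0.0
    if q > 0:
        out += q * math.log(q / p)
    if q < 1:
        out += (1 - q) * math.log((1 - q) / (1 - p))
    return out


def kl_inverse_upper(q, c, tol=1e-9):
    """Largest p in [q, 1] with kl(q || p) <= c (kl is increasing in p there)."""
    lo, hi = q, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_kl(q, mid) <= c:
            lo = mid
        else:
            hi = mid
    return lo


def pac_bayes_bound(emp_err, kl_qp, m, delta):
    """Upper bound on the true error of Q, assuming the PAC-Bayes-kl form
    kl(emp_err || err) <= (KL(Q||P) + ln(2 sqrt(m) / delta)) / m."""
    c = (kl_qp + math.log(2 * math.sqrt(m) / delta)) / m
    return kl_inverse_upper(emp_err, c)


# Hypothetical inputs: 3% training error, KL(Q||P) = 5000 nats, 55000 examples.
bound = pac_bayes_bound(0.03, 5000.0, 55000, 0.025)
```

    The bound stays well below 1 only while KL(Q||P) is small relative to m, which is why optimizing the bound directly (trading training error against KL) matters.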

    Learning Visual Servoing with Deep Features and Fitted Q-Iteration

    Alex X. Lee, Sergey Levine, Pieter Abbeel
    Comments: ICLR 2017
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

    Visual servoing involves choosing actions that move a robot in response to
    observations from a camera, in order to reach a goal configuration in the
    world. Standard visual servoing approaches typically rely on manually designed
    features and analytical dynamics models, which limits their generalization
    capability and often requires extensive application-specific feature and model
    engineering. In this work, we study how learned visual features, learned
    predictive dynamics models, and reinforcement learning can be combined to learn
    visual servoing mechanisms. We focus on target following, with the goal of
    designing algorithms that can learn a visual servo using small amounts of data of
    the target in question, to enable quick adaptation to new targets. Our approach
    is based on servoing the camera in the space of learned visual features, rather
    than image pixels or manually-designed keypoints. We demonstrate that standard
    deep features, in our case taken from a model trained for object
    classification, can be used together with a bilinear predictive model to learn
    an effective visual servo that is robust to visual variation, changes in
    viewing angle and appearance, and occlusions. A key component of our approach
    is to use a sample-efficient fitted Q-iteration algorithm to learn which
    features are best suited for the task at hand. We show that we can learn an
    effective visual servo on a complex synthetic car following benchmark using
    just 20 training trajectory samples for reinforcement learning. We demonstrate
    substantial improvement over a conventional approach based on image pixels or
    hand-designed keypoints, and we show an improvement in sample-efficiency of
    more than two orders of magnitude over standard model-free deep reinforcement
    learning algorithms. Videos are available at
    this http URL.
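    Fitted Q-iteration repeatedly regresses Bellman targets onto features of state-action pairs from a fixed batch of transitions. The sketch below is a generic version with a linear Q-function and ridge regression on a hypothetical 5-state chain; it does not reproduce the paper's bilinear dynamics model or its learned feature weighting.

```python
import numpy as np


def fitted_q_iteration(transitions, phi, n_actions, gamma=0.9, n_iters=50, ridge=1e-3):
    """Fitted Q-iteration with a linear Q-function Q(s, a) = w @ phi(s, a).

    transitions: list of (s, a, r, s_next, done) tuples collected beforehand.
    Each iteration regresses the targets r + gamma * max_b Q(s_next, b)
    onto the features with ridge-regularized least squares."""
    X = np.array([phi(s, a) for s, a, _, _, _ in transitions])
    w = np.zeros(X.shape[1])
    A = X.T @ X + ridge * np.eye(X.shape[1])
    for _ in range(n_iters):
        y = np.array([
            r if done else r + gamma * max(w @ phi(s2, b) for b in range(n_actions))
            for _, _, r, s2, done in transitions
        ])
        w = np.linalg.solve(A, X.T @ y)
    return w


# Hypothetical toy problem: states 0..4 on a line, action 1 moves right, action 0
# moves left; reaching state 4 gives reward 1 and ends the episode.
N_STATES, N_ACTIONS = 5, 2


def one_hot(s, a):
    v = np.zeros(N_STATES * N_ACTIONS)
    v[s * N_ACTIONS + a] = 1.0
    return v


transitions = []
for s in range(N_STATES - 1):
    for a in range(N_ACTIONS):
        s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
        transitions.append((s, a, float(s2 == N_STATES - 1), s2, s2 == N_STATES - 1))

w = fitted_q_iteration(transitions, one_hot, N_ACTIONS)
policy = [int(np.argmax([w @ one_hot(s, a) for a in range(N_ACTIONS)]))
          for s in range(N_STATES - 1)]
```

    Because the transitions are reused at every iteration, no new environment interaction is needed, which is the property that makes this family of methods sample-efficient.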

    Fundamental Conditions for Low-CP-Rank Tensor Completion

    Morteza Ashraphijuo, Xiaodong Wang
    Comments: arXiv admin note: text overlap with arXiv:1703.07698
    Subjects: Learning (cs.LG); Numerical Analysis (cs.NA); Numerical Analysis (math.NA); Machine Learning (stat.ML)

    We consider the problem of low canonical polyadic (CP) rank tensor
    completion. A completion is a tensor whose entries agree with the observed
    entries and whose rank matches the given CP rank. We analyze the manifold
    structure corresponding to the tensors with the given rank and define a set of
    polynomials based on the sampling pattern and CP decomposition. Then, we show
    that finite completability of the sampled tensor is equivalent to having a
    certain number of algebraically independent polynomials among the defined
    polynomials. Our proposed approach results in characterizing the maximum number
    of algebraically independent polynomials in terms of a simple geometric
    structure of the sampling pattern, and therefore we obtain the deterministic
    necessary and sufficient condition on the sampling pattern for finite
    completability of the sampled tensor. Moreover, assuming that the entries of
    the tensor are sampled independently with probability (p) and using the
    mentioned deterministic analysis, we propose a combinatorial method to derive a
    lower bound on the sampling probability (p), or equivalently, the number of
    sampled entries that guarantees finite completability with high probability. We
    also show that the existing result for the matrix completion problem can be
    used to obtain a loose lower bound on the sampling probability (p). In
    addition, we obtain deterministic and probabilistic conditions for unique
    completability. It is seen that the number of samples required for finite or
    unique completability obtained by the proposed analysis on the CP manifold is
    orders-of-magnitude lower than that obtained by the existing analysis on the
    Grassmannian manifold.
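    A simple counting argument gives a feel for the sampling requirement: finite completability needs at least as many observed entries as the dimension of the set of rank-r tensors, and we assume here (for generic, non-defective sizes) that this dimension equals the naive count r(n_1 + ... + n_d - d + 1). The sketch builds a random CP-rank-2 tensor and samples each entry independently with probability p; the sizes are hypothetical, and this is not the paper's combinatorial bound.

```python
import numpy as np


def random_cp_tensor(shape, rank, rng):
    """Order-3 tensor with CP rank at most `rank`: a sum of `rank` outer products."""
    A, B, C = (rng.standard_normal((n, rank)) for n in shape)
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def cp_param_count(shape, rank):
    """Naive degrees of freedom r * (n_1 + ... + n_d - d + 1): each rank-one term
    has sum(n_i) entries minus (d - 1) redundant scalings."""
    return rank * (sum(shape) - len(shape) + 1)


rng = np.random.default_rng(0)
shape, rank, p = (20, 20, 20), 2, 0.05
T = random_cp_tensor(shape, rank, rng)
mask = rng.random(shape) < p          # each entry observed independently with probability p
n_observed = int(mask.sum())
print(n_observed, "observed entries vs", cp_param_count(shape, rank), "degrees of freedom")
```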

    BEGAN: Boundary Equilibrium Generative Adversarial Networks

    David Berthelot, Tom Schumm, Luke Metz
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We propose a new equilibrium enforcing method paired with a loss derived from
    the Wasserstein distance for training auto-encoder based Generative Adversarial
    Networks. This method balances the generator and discriminator during training.
    Additionally, it provides a new approximate convergence measure, fast and
    stable training and high visual quality. We also derive a way of controlling
    the trade-off between image diversity and visual quality. We focus on the image
    generation task, setting a new milestone in visual quality, even at higher
    resolutions. This is achieved while using a relatively simple model
    architecture and a standard training procedure.
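    The equilibrium mechanism is not spelled out in this abstract; as we understand the full BEGAN paper, the discriminator is an auto-encoder with reconstruction loss L, a control variable k is updated proportionally toward γL(x) = L(G(z)), and γ sets the diversity/quality trade-off. The bookkeeping below restates those rules as a sketch (network training omitted); λ_k = 0.001 and γ = 0.5 are assumed defaults.

```python
def began_step(loss_real, loss_fake, k, gamma=0.5, lambda_k=0.001):
    """One BEGAN bookkeeping step from auto-encoder reconstruction losses.

    loss_real = L(x) on real images, loss_fake = L(G(z)) on generated images.
    Returns (discriminator loss, generator loss, updated k, convergence measure)."""
    loss_d = loss_real - k * loss_fake            # discriminator objective
    loss_g = loss_fake                            # generator objective
    balance = gamma * loss_real - loss_fake       # zero at the desired equilibrium
    k_next = min(max(k + lambda_k * balance, 0.0), 1.0)  # proportional control, clipped to [0, 1]
    m_global = loss_real + abs(balance)           # approximate convergence measure
    return loss_d, loss_g, k_next, m_global
```

    A smaller γ pushes the discriminator to focus more on reconstructing real images, which the BEGAN paper associates with lower diversity but higher visual quality.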

    Applying Ricci Flow to Manifold Learning

    Yangyang Li
    Comments: 10 pages, 1 figure
    Subjects: Learning (cs.LG)

    Traditional manifold learning algorithms often assume that the local
    neighborhood of any point on the embedded manifold is roughly equal to the
    tangent space at that point, without considering the curvature. This
    curvature-indifferent treatment of the manifold often makes traditional
    dimension reduction poorly neighborhood-preserving. To overcome this drawback,
    we propose a new algorithm called RF-ML that operates on the manifold with the
    help of Ricci flow before reducing its dimension.

    QoS-Aware Multi-Armed Bandits

    Lenz Belzner, Thomas Gabor
    Comments: Accepted at IEEE Workshop on Quality Assurance for Self-adaptive Self-organising Systems, FAS* 2016
    Subjects: Learning (cs.LG); Software Engineering (cs.SE)

    Motivated by runtime verification of QoS requirements in self-adaptive and
    self-organizing systems that are able to reconfigure their structure and
    behavior in response to runtime data, we propose a QoS-aware variant of
    Thompson sampling for multi-armed bandits. It is applicable in settings where
    QoS satisfaction of an arm has to be ensured efficiently and with high
    confidence, rather than finding the optimal arm while minimizing regret. Preliminary
    experimental results encourage further research in the field of QoS-aware
    decision making.
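    One way to picture "ensuring QoS satisfaction with high confidence" is a Beta-Bernoulli Thompson sampler that stops as soon as some arm's posterior probability of exceeding a QoS threshold reaches a confidence level. This is our own generic sketch, not Belzner and Gabor's algorithm; the threshold, confidence and arm success rates are hypothetical, and the posterior tail is estimated by Monte Carlo.

```python
import random


def prob_above(alpha, beta, threshold, rng, n_samples=2000):
    """Monte Carlo estimate of P(theta > threshold) for theta ~ Beta(alpha, beta)."""
    hits = sum(rng.betavariate(alpha, beta) > threshold for _ in range(n_samples))
    return hits / n_samples


def qos_thompson(pull, n_arms, threshold, confidence, max_steps, seed=0):
    """Thompson sampling with Beta(1, 1) priors on Bernoulli 'QoS satisfied' outcomes.

    Stops as soon as the pulled arm is certified, i.e. its posterior probability of
    having success rate > threshold reaches `confidence`.
    Returns the certified arm index, or None if none is certified in max_steps."""
    rng = random.Random(seed)
    alpha, beta = [1.0] * n_arms, [1.0] * n_arms
    for _ in range(max_steps):
        draws = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=draws.__getitem__)
        if pull(arm, rng):
            alpha[arm] += 1
        else:
            beta[arm] += 1
        if prob_above(alpha[arm], beta[arm], threshold, rng) >= confidence:
            return arm
    return None


# Hypothetical arms: arm 1 satisfies the requirement 95% of the time, arm 0 only 30%.
success_rates = [0.3, 0.95]
certified = qos_thompson(lambda a, rng: rng.random() < success_rates[a], 2, 0.8, 0.95, 2000)
```

    The stopping rule replaces regret minimization: sampling ends once one arm is certified, not when the best arm has been identified.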

    Comparison of multi-task convolutional neural network (MT-CNN) and a few other methods for toxicity prediction

    Kedi Wu, Guo-Wei Wei
    Subjects: Quantitative Methods (q-bio.QM); Learning (cs.LG); Machine Learning (stat.ML)

    Toxicity analysis and prediction are of paramount importance to human health
    and environmental protection. Existing computational methods are built from a
    wide variety of descriptors and regressors, which makes their performance
    analysis difficult. For example, the deep neural network (DNN), a successful
    approach on many occasions, acts like a black box and offers little conceptual
    elegance or physical understanding. The present work constructs a common set of
    microscopic descriptors based on established physical models for charges,
    surface areas and free energies to assess the performance of multi-task
    convolutional neural network (MT-CNN) architectures and a few other approaches,
    including random forest (RF) and gradient boosting decision tree (GBDT), on an
    equal footing. Comparison is also given to convolutional neural network (CNN)
    and non-convolutional deep neural network (DNN) algorithms. Four benchmark
    toxicity data sets (i.e., endpoints) are used to evaluate various approaches.
    Extensive numerical studies indicate that the present MT-CNN architecture is
    able to outperform the state-of-the-art methods.

    Sentence Simplification with Deep Reinforcement Learning

    Xingxing Zhang, Mirella Lapata
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Sentence simplification aims to make sentences easier to read and understand.
    Most recent approaches draw on insights from machine translation to learn
    simplification rewrites from monolingual corpora of complex and simple
    sentences. We address the simplification problem with an encoder-decoder model
    coupled with a deep reinforcement learning framework. Our model explores the
    space of possible simplifications while learning to optimize a reward function
    that encourages outputs which are simple, fluent, and preserve the meaning of
    the input. Experiments on three datasets demonstrate that our model brings
    significant improvements over the state of the art.

    Feature functional theory – binding predictor (FFT-BP) for the blind prediction of binding free energies

    Bao Wang, Zhixiong Zhao, Duc D. Nguyen, Guo-Wei Wei
    Comments: 25 pages, 11 figures
    Subjects: Quantitative Methods (q-bio.QM); Learning (cs.LG); Chemical Physics (physics.chem-ph)

    We present a feature functional theory – binding predictor (FFT-BP) for the
    protein-ligand binding affinity prediction. The underpinning assumptions of
    FFT-BP are as follows: i) representability: there exists a microscopic feature
    vector that can uniquely characterize and distinguish one protein-ligand
    complex from another; ii) feature-function relationship: the macroscopic
    features of a complex, including binding free energy, are functionals of
    microscopic feature vectors; and iii) similarity: molecules with similar
    microscopic features have similar macroscopic features, such as binding
    affinity. Physical models, such as implicit solvent models and quantum theory,
    are utilized to extract microscopic features, while machine learning algorithms
    are employed to rank the similarity among protein-ligand complexes. A large
    variety of numerical validations and tests confirms the accuracy and robustness
    of the proposed FFT-BP model. The root mean square errors (RMSEs) of FFT-BP
    blind predictions of a benchmark set of 100 complexes, the PDBBind v2007 core
    set of 195 complexes and the PDBBind v2015 core set of 195 complexes are 1.99,
    2.02 and 1.92 kcal/mol, respectively. Their corresponding Pearson correlation
    coefficients are 0.75, 0.80, and 0.78, respectively.
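    The similarity assumption (iii) and the two reported metrics can be illustrated together. The sketch below uses a plain k-nearest-neighbour regressor as a stand-in for FFT-BP's learner (an assumption; the abstract does not name the algorithm), together with the RMSE (here in kcal/mol) and Pearson correlation used to score blind predictions. Feature vectors and affinities are hypothetical.

```python
import math


def knn_predict(train_X, train_y, x, k=3):
    """Predict an affinity as the mean of the k most similar (closest) training complexes."""
    nearest = sorted(zip((math.dist(x, xi) for xi in train_X), train_y))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def rmse(pred, true):
    """Root mean square error between predicted and experimental values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```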

    Bi-class classification of humpback whale sound units against complex background noise with Deep Convolution Neural Network

    Cazau Dorian, Riwal Lefort, Julien Bonnel, Jean-Luc Zarader, Olivier Adam
    Comments: arXiv admin note: text overlap with arXiv:1702.02741 by other authors
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Sound (cs.SD)

    Automatically detecting sound units of humpback whales in complex,
    time-varying background noise is a current challenge for scientists. In this
    paper, we explore the applicability of Convolutional Neural Networks (CNNs)
    to this task. In the evaluation stage, we present six binary classification
    experiments of whale sound detection against different background noise
    types (e.g., rain, wind). Compared with classical FFT-based representations
    such as spectrograms, we show that image-based pretrained CNN features yield
    better performance in discriminating whale sounds from background
    noise.
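    The classical FFT-based baseline is a spectrogram. As a reference point, the sketch below computes a Hann-windowed short-time Fourier transform magnitude with NumPy; the FFT size, hop length and test tone are hypothetical choices, not the paper's settings.

```python
import numpy as np


def magnitude_spectrogram(signal, n_fft=256, hop=128):
    """Magnitude STFT with a Hann window; returns shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


# Hypothetical check: a 100 Hz tone sampled at 1 kHz peaks near the 100 Hz bin.
fs = 1000
t = np.arange(2000) / fs
S = magnitude_spectrogram(np.sin(2 * np.pi * 100 * t))
peak_hz = S.mean(axis=0).argmax() * fs / 256
```

    Such a time-frequency array can be treated as an image, which is what makes image-pretrained CNN features applicable to audio in the first place.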

    Diabetic Retinopathy Detection via Deep Convolutional Networks for Discriminative Localization and Visual Explanation

    Zhiguang Wang, Jianbo Yang
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We propose a deep learning method for interpretable diabetic retinopathy
    (DR) detection. The visual interpretability of the proposed method is
    achieved by adding a regression activation map (RAM) after the global
    average pooling layer of the convolutional network (CNN). With RAM, the
    proposed model can localize the discriminative regions of a retinal image to
    show the specific regions of interest in terms of severity level. We believe
    this advantage of the proposed deep learning model is highly desirable for DR
    detection because, in practice, users are not only interested in high
    prediction performance, but are also keen to understand the insights of DR
    detection and why the adopted learning model works. In experiments
    conducted on a large-scale retinal image dataset, we show that the proposed
    CNN model achieves high DR detection performance compared with the
    state of the art while also providing the RAM to highlight
    the salient regions of the input image.
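    The RAM can be understood by analogy with class activation mapping: if the severity score is a linear function of globally average-pooled feature maps, the same weights applied to the unpooled maps give a spatial map whose mean reproduces the score. The sketch assumes a single linear regression output; array sizes are hypothetical.

```python
import numpy as np


def regression_activation_map(feature_maps, weights, bias=0.0):
    """feature_maps: (C, H, W) output of the last conv layer.
    weights: (C,) weights of the linear regression head after global average pooling.
    Returns (prediction, ram) where ram has shape (H, W)."""
    pooled = feature_maps.mean(axis=(1, 2))             # global average pooling -> (C,)
    prediction = float(pooled @ weights + bias)         # scalar severity score
    ram = np.tensordot(weights, feature_maps, axes=1)   # sum_c w_c * F_c -> (H, W)
    return prediction, ram


rng = np.random.default_rng(0)
F = rng.random((16, 7, 7))
wts = rng.standard_normal(16)
pred, ram = regression_activation_map(F, wts, bias=0.3)
```

    Because pooling and the linear head commute, `ram.mean() + bias` equals the prediction, so high-valued RAM locations are exactly the ones pushing the score up.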

    Near Perfect Protein Multi-Label Classification with Deep Neural Networks

    Balazs Szalkai, Vince Grolmusz
    Subjects: Biomolecules (q-bio.BM); Learning (cs.LG); Machine Learning (stat.ML)

    Artificial neural networks (ANNs) have gained a well-deserved popularity
    among machine learning tools upon their recent successful applications in
    image and sound processing and classification problems. ANNs have also been
    applied to predicting the family or function of a protein from its residue
    sequence. Here we present two new ANNs with multi-label classification ability,
    showing impressive accuracy when classifying protein sequences into 698 UniProt
    families (AUC=99.99%) and 983 Gene Ontology classes (AUC=99.45%).

    What-If Reasoning with Counterfactual Gaussian Processes

    Peter Schulam, Suchi Saria
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Answering “What if?” questions is important in many domains. For example,
    would a patient’s disease progression slow down if I were to give them a dose
    of drug A? Ideally, we answer our question using an experiment, but this is not
    always possible (e.g., it may be unethical). As an alternative, we can use
    non-experimental data to learn models that make counterfactual predictions of
    what we would observe had we run an experiment. In this paper, we propose a
    model to make counterfactual predictions about how continuous-time trajectories
    (time series) respond to sequences of actions taken in continuous-time. We
    develop our model within the potential outcomes framework of Neyman and Rubin.
    One challenge is that the assumptions commonly made to learn potential outcome
    (counterfactual) models from observational data are not applicable in
    continuous-time as-is. We therefore propose a model using marked point
    processes and Gaussian processes, and develop alternative assumptions that
    allow us to learn counterfactual models from continuous-time observational
    data. We evaluate our approach on two tasks from health care: disease
    trajectory prediction and personalized treatment planning.

    Interpretable Learning for Self-Driving Cars by Visualizing Causal Attention

    Jinkyu Kim, John Canny
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Deep neural perception and control networks are likely to be a key component
    of self-driving vehicles. These models need to be explainable – they should
    provide easy-to-interpret rationales for their behavior – so that passengers,
    insurance companies, law enforcement, developers, etc., can understand what
    triggered a particular behavior. Here we explore the use of visual
    explanations. These explanations take the form of real-time highlighted regions
    of an image that causally influence the network’s output (steering control).
    Our approach is two-stage. In the first stage, we use a visual attention model
    to train a convolutional network end-to-end from images to steering angle. The
    attention model highlights image regions that potentially influence the
    network’s output. Some of these are true influences, but some are spurious. We
    then apply a causal filtering step to determine which input regions actually
    influence the output. This produces more succinct visual explanations and more
    accurately exposes the network’s behavior. We demonstrate the effectiveness of
    our model on three datasets totaling 16 hours of driving. We first show that
    training with attention does not degrade the performance of the end-to-end
    network. Then we show that the network causally cues on a variety of features
    that are used by humans while driving.

    Diving into the shallows: a computational perspective on large-scale shallow learning

    Siyuan Ma, Mikhail Belkin
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    The remarkable success of deep neural networks has not been easy to analyze
    theoretically. It has been particularly hard to disentangle the relative
    significance of architecture and optimization in achieving accurate
    classification on large datasets. On the flip side, shallow methods have
    encountered obstacles in scaling to large data, despite excellent performance
    on smaller datasets, and extensive theoretical analysis. Practical methods,
    such as variants of gradient descent used so successfully in deep learning,
    seem to perform below par when applied to kernel methods. This difficulty has
    sometimes been attributed to the limitations of shallow architecture.

    In this paper we identify a basic limitation in gradient descent-based
    optimization in conjunction with smooth kernels. An analysis demonstrates that
    only a vanishingly small fraction of the function space is reachable after a
    fixed number of iterations, drastically limiting its power and resulting in
    severe over-regularization. The issue is purely algorithmic, persisting even in
    the limit of infinite data.

    To address this issue, we introduce EigenPro iteration, based on a simple
    preconditioning scheme using a small number of approximately computed
    eigenvectors. It turns out that even this small amount of approximate
    second-order information results in significant improvement of performance for
    large-scale kernel methods. Using EigenPro in conjunction with stochastic
    gradient descent we demonstrate scalable state-of-the-art results for kernel
    methods on a modest computational budget.

    Finally, these results indicate a need for a broader computational
    perspective on modern large-scale learning to complement more traditional
    statistical and convergence analyses. In particular, systematic analysis
    concentrating on the approximation power of algorithms with a fixed computation
    budget will lead to progress both in theory and practice.
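    The preconditioning idea can be sketched in a few lines. Gradient descent for kernel least squares iterates α ← α - η(Kα - y), and its step size is capped by the largest eigenvalue of K. Flattening the top k eigenvalues down to λ_{k+1} allows a much larger step. This is a full-batch sketch with exact eigenvectors for clarity; EigenPro itself uses approximate eigenvectors from a subsample together with SGD. Data and bandwidth are hypothetical.

```python
import numpy as np


def gaussian_kernel(X, Z, bandwidth=1.0):
    """Gaussian kernel matrix k(x, z) = exp(-||x - z||^2 / (2 * bandwidth^2))."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))


def kernel_gd(K, y, n_iters, top_k=0):
    """Iterate alpha <- alpha - eta * P (K @ alpha - y), i.e. gradient descent in the
    RKHS on the square loss. With top_k > 0, P flattens the top_k eigenvalues of K
    to the (top_k + 1)-th one, allowing eta = 1 / lambda_{top_k + 1}."""
    eigvals, eigvecs = np.linalg.eigh(K)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # sort descending
    V = eigvecs[:, :top_k]
    shrink = 1.0 - eigvals[top_k] / eigvals[:top_k]      # per-direction damping
    eta = 1.0 / eigvals[top_k]                           # largest eigenvalue after preconditioning
    alpha = np.zeros_like(y)
    for _ in range(n_iters):
        residual = K @ alpha - y
        alpha -= eta * (residual - V @ (shrink * (V.T @ residual)))
    return alpha


rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0])
K = gaussian_kernel(X, X, bandwidth=0.5)
res_plain = np.linalg.norm(K @ kernel_gd(K, y, 50, top_k=0) - y)
res_precond = np.linalg.norm(K @ kernel_gd(K, y, 50, top_k=3) - y)
```

    With the same 50 iterations, every eigen-component of the residual shrinks at least as fast under preconditioning, and the top three are removed in a single step.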


    Information Theory

    All Cognitive MIMO: A New Multiuser Detection Approach with Different Priorities

    Nikolaos I. Miridakis, Theodoros A. Tsiftsis, Dimitrios D. Vergados, Angelos Michalas
    Subjects: Information Theory (cs.IT)

    A new detection scheme for multiuser multiple-input multiple-output (MIMO)
    systems is analytically presented. In particular, the transmitting users are
    categorized into two distinct priority service groups, while they
    communicate directly with a multi-antenna receiver. The linear zero-forcing
    scheme is applied in two consecutive detection stages upon the signal
    reception. In the first stage, the signals of one service group are detected,
    followed by the second stage including the corresponding detection of the
    remaining signals. An appropriate switching scheme based on specific
    transmission quality requirements is utilized prior to the detection so as to
    allocate the signals of a given service group to the suitable detection stage.
    The objective is the enhancement of the reception quality for both service
    groups. The proposed approach can be implemented directly in cognitive radio
    communication assigning the secondary users to the appropriate service group.
    The exact outage probability of the considered system is derived in closed
    form. The special case of massive MIMO is further studied, yielding some useful
    engineering outcomes: the effective channel coherence time and a certain
    optimality condition defining both the transmission quality and the effective
    number of independent transmissions.
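    The abstract does not spell out how the two zero-forcing stages interact. The sketch below assumes that stage 1 applies ZF to the full channel and keeps only the first service group's decisions, and that stage 2 cancels those decisions from the received signal before zero-forcing the remaining users. QPSK symbols, the group assignment and the 8×4 channel are hypothetical, and the switching rule is omitted.

```python
import numpy as np


def qpsk_slice(z):
    """Hard decision to the nearest unit-energy QPSK symbol."""
    return (np.sign(z.real) + 1j * np.sign(z.imag)) / np.sqrt(2)


def two_stage_zf(H, y, first_group):
    """H: (n_rx, n_users) channel, y: received vector, first_group: user indices
    detected in stage 1. Returns hard symbol decisions for all users."""
    n_users = H.shape[1]
    second_group = [i for i in range(n_users) if i not in first_group]
    x_hat = np.zeros(n_users, dtype=complex)
    # Stage 1: zero-forcing over all users; keep only the first group's estimates.
    z1 = np.linalg.pinv(H) @ y
    x_hat[first_group] = qpsk_slice(z1[first_group])
    # Stage 2: cancel the detected signals, then zero-force the remaining users.
    y_res = y - H[:, first_group] @ x_hat[first_group]
    z2 = np.linalg.pinv(H[:, second_group]) @ y_res
    x_hat[second_group] = qpsk_slice(z2)
    return x_hat


rng = np.random.default_rng(1)
H = (rng.standard_normal((8, 4)) + 1j * rng.standard_normal((8, 4))) / np.sqrt(2)
x = qpsk_slice(rng.standard_normal(4) + 1j * rng.standard_normal(4))
x_hat = two_stage_zf(H, H @ x, first_group=[0, 2])   # noiseless check
```

    In the second stage the ZF filter only has to null two users instead of four, which improves its noise enhancement; that is one plausible source of the quality gain for the later-detected group.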

    Optimal Robust Precoders for Tracking the AoD and AoA of a mm-Wave Path

    Nil Garcia, Henk Wymeersch, Dirk Slock
    Comments: 14 pages
    Subjects: Information Theory (cs.IT)

    Due to high penetration losses at millimeter wave frequencies, channels are
    usually sparse in the sense that only a few paths carry non-negligible energy.
    Such channel structure is exploited by most channel estimation procedures
    which, in general, sound the channel in multiple directions and identify those
    yielding the largest power. The prior knowledge on the multipath parameters is
    then carried on to subsequent iterations in order to track the channel. Whether
    in initial access or tracking mode, the beams for sweeping the
    angles-of-departure and angles-of-arrival at the transmitter and receiver,
    respectively, of the multipath usually have a “sector shape”, meaning that
    their gain is large for a small range of angles and low for all other angles.
    Such beams are heuristic in nature and may not lead to the best channel
    estimation/tracking performance. In this paper, we focus on the tracking phase,
    and investigate which precoders are optimal for estimating the parameters of
    a single path according to the well-known Cramér-Rao lower bound. A procedure
    based on orthogonal matching pursuit (OMP) is proposed for generating such
    optimal precoders in a hybrid analog-digital architecture. Contrary to previous
    approaches which relied on approximations of OMP, we show that OMP can be
    computed exactly, leading to a substantial decrease in the number of required
    RF chains. To validate the theoretical results, the maximum likelihood
    estimator (MLE) and quasi-optimal estimators of the channel parameters are
    derived and their accuracy evaluated.
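    For readers unfamiliar with OMP, the sketch below is the textbook algorithm: greedily pick the dictionary column most correlated with the residual, then re-fit all chosen columns by least squares. In hybrid precoding the dictionary would hold candidate analog beams; here it is a hypothetical 4× oversampled DFT dictionary for a 16-antenna array. It does not reproduce the paper's exact OMP formulation.

```python
import numpy as np


def omp(A, target, n_atoms):
    """Orthogonal matching pursuit: target ~= A[:, support] @ coeffs."""
    residual = target.astype(complex)
    support = []
    coeffs = np.zeros(0, dtype=complex)
    for _ in range(n_atoms):
        correlations = np.abs(A.conj().T @ residual)
        correlations[support] = 0.0          # never pick the same atom twice
        support.append(int(np.argmax(correlations)))
        sub = A[:, support]
        coeffs = np.linalg.lstsq(sub, target, rcond=None)[0]   # re-fit on the support
        residual = target - sub @ coeffs
    return support, coeffs, residual


# Hypothetical dictionary: 64 unit-norm beams for a 16-element array.
n = np.arange(16)[:, None]
k = np.arange(64)[None, :]
A = np.exp(2j * np.pi * n * k / 64) / np.sqrt(16)
target = 2 * A[:, 5] + 1j * A[:, 40]
support, coeffs, residual = omp(A, target, 2)
```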

    Unifying Message Passing Algorithms Under the Framework of Constrained Bethe Free Energy Minimization

    Dan Zhang, Wenjin Wang, Gerhard Fettweis, Xiqi Gao
    Subjects: Information Theory (cs.IT)

    Variational message passing (VMP), belief propagation (BP), expectation
    propagation (EP) and more recent generalized approximate message passing (GAMP)
    have found their wide uses in complex statistical inference problems. In
    addition to viewing them as a class of algorithms operating on graphical models,
    this paper unifies them under an optimization framework, namely, Bethe free
    energy minimization with differently and appropriately imposed constraints.
    This new perspective in terms of constraint manipulation can offer additional
    insights on the connection between message passing algorithms and it is valid
    for a generic statistical model, e.g., without requiring a fully separable
    a-priori density or likelihood function. Furthermore, it provides a
    theoretical framework for systematically deriving hybrid message passing that
    achieves a better compromise between inference performance and complexity.

    Study of cost functionals for ptychographic phase retrieval to improve the robustness against noise, and a proposal for another noise-robust ptychographic phase retrieval scheme

    A.P. Konijnenberg, W.M.J. Coene, H.P. Urbach
    Subjects: Information Theory (cs.IT)

    Recently, efforts have been made to improve ptychographic phase retrieval
    algorithms so that they are more robust against noise. Often the algorithm is
    adapted by changing the cost functional that needs to be minimized. In
    particular, it has been suggested that the cost functional should be obtained
    using a maximum-likelihood approach that takes the noise statistics into
    account. Here, we consider the different choices of cost functional, and how
    they affect the reconstruction results. We find that seemingly the only
    consistently reliable way to improve reconstruction results in the presence of
    noise is to reduce the step size of the update function. In addition, we
    propose a noise-robust ptychographic reconstruction method that relies
    on adapting the intensity constraints.

    Measurement Results for Millimeter Wave pure LOS MIMO Channels

    Tim Hälsig, Darko Cvetkovski, Eckhard Grass, Berthold Lankl
    Comments: Accepted at IEEE WCNC 2017
    Subjects: Information Theory (cs.IT)

    In this paper we present measurement results for pure line-of-sight MIMO
    links operating in the millimeter wave range. We show that the estimated
    condition numbers and capacities of the measured channels are in good agreement
    with the theory for various transmission distances and antenna setups.
    Furthermore, the results show that orthogonal channel vectors can be observed
    if the spacing criterion is fulfilled, thus facilitating spatial multiplexing
    and achieving high spectral efficiencies even over fairly long distances.
    On the other hand, spacings that generate ill-conditioned channel matrices show
    significantly reduced performance.
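    The role of antenna spacing can be reproduced with a spherical-wavefront model. The sketch assumes two broadside-aligned uniform linear arrays and the commonly used Rayleigh spacing criterion d_t·d_r = λD/N, under which the channel columns become (nearly) orthogonal; the 60 GHz carrier, 10 m distance and 2×2 setup are hypothetical, not the measured configurations.

```python
import numpy as np


def los_channel(n_ant, spacing_tx, spacing_rx, distance, wavelength):
    """Pure line-of-sight channel between two broadside-aligned ULAs (spherical wavefront)."""
    tx = np.arange(n_ant) * spacing_tx
    rx = np.arange(n_ant) * spacing_rx
    r = np.sqrt(distance ** 2 + (rx[:, None] - tx[None, :]) ** 2)   # path lengths
    return np.exp(-2j * np.pi * r / wavelength)


def capacity_bits(H, snr):
    """Equal-power capacity log2 det(I + snr / N_t * H H^H) in bit/s/Hz."""
    n_rx, n_tx = H.shape
    G = np.eye(n_rx) + (snr / n_tx) * H @ H.conj().T
    return float(np.real(np.log2(np.linalg.det(G))))


wavelength, D, N = 3e8 / 60e9, 10.0, 2
d_opt = np.sqrt(wavelength * D / N)          # Rayleigh spacing with d_t = d_r
H_opt = los_channel(N, d_opt, d_opt, D, wavelength)
H_bad = los_channel(N, d_opt / 10, d_opt / 10, D, wavelength)
print(np.linalg.cond(H_opt), np.linalg.cond(H_bad))
print(capacity_bits(H_opt, 100.0), capacity_bits(H_bad, 100.0))
```

    Shrinking the spacing tenfold makes the two columns almost parallel, so the condition number blows up and one spatial stream is effectively lost.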

    On cyclic codes of composite length and the minimal distance

    Maosheng Xiong
    Subjects: Information Theory (cs.IT)

    In an interesting paper, Professor Cunsheng Ding provided three constructions
    of cyclic codes whose length is a product of two primes. Numerical data shows
    that many codes from these constructions are best cyclic codes of the same
    length and dimension over the same finite field. However, not much is known
    about these codes. In this paper we explain some of the mysteries of the
    numerical data by developing a general method for cyclic codes of composite
    length and on estimating the minimal distance. Inspired by the new method, we
    also provide a general construction of cyclic codes of composite length.
    Numerical data shows that it produces many best cyclic codes as well. Finally,
    we point out how these cyclic codes can be used to construct convolutional
    codes with large free distance.
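    For small parameters, numerical data of this kind can be checked by brute force: a binary cyclic code of length n is generated by a polynomial g(x) dividing x^n - 1, and its minimal distance is the smallest weight of a nonzero multiple m(x)g(x) with deg m < n - deg g. The sketch uses the standard [15, 7] BCH code (length 15 = 3·5 is composite) as an example; it is not one of Ding's or the paper's constructions.

```python
def gf2_mul(a, b):
    """Multiply GF(2) polynomials stored as bitmasks (bit i = coefficient of x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def gf2_mod(a, b):
    """Remainder of GF(2) polynomial division a mod b."""
    db = b.bit_length()
    while a.bit_length() >= db:
        a ^= b << (a.bit_length() - db)
    return a


def min_distance_cyclic(n, g):
    """Brute-force minimal distance of the binary cyclic code of length n generated by g."""
    assert gf2_mod((1 << n) | 1, g) == 0, "g(x) must divide x^n - 1"
    k = n - (g.bit_length() - 1)               # code dimension
    return min(bin(gf2_mul(m, g)).count("1") for m in range(1, 1 << k))


# g(x) = (x^4 + x + 1)(x^4 + x^3 + x^2 + x + 1) generates the [15, 7] BCH code.
g_bch = gf2_mul(0b10011, 0b11111)
```

    Exhaustive search costs 2^k codeword evaluations, so it only serves as a sanity check; estimating the minimal distance for larger lengths is exactly where general methods are needed.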

    Advanced Quantizer Designs for FDD-based FD-MIMO Systems Using Uniform Planar Arrays

    Jiho Song, Junil Choi, Taeyoung Kim, David J. Love
    Comments: 13 pages, 6 figures
    Subjects: Information Theory (cs.IT)

    Massive multiple-input multiple-output (MIMO) systems, which utilize a large
    number of antennas at the base station, are expected to enhance network
    throughput by enabling improved multiuser MIMO techniques. To deploy many
    antennas in reasonable form factors, base stations are expected to employ
    antenna arrays in both horizontal and vertical dimensions, which is known as
    full-dimension (FD) MIMO. The most promising two-dimensional array is the
    uniform planar array (UPA), where antennas are placed in a grid pattern. To
    exploit the full benefit of massive MIMO in frequency division duplexing (FDD),
    the downlink channel state information (CSI) should be estimated, quantized,
    and fed back from the receiver to the transmitter. However, it is difficult to
    accurately quantize the channel in a computationally efficient manner due to
    the high dimensionality of the massive MIMO channel. In this paper, we develop
    both narrowband and wideband CSI quantizers for FD-MIMO taking the properties
    of realistic channels and the UPA into consideration. To improve quantization
    quality, we focus on not only quantizing dominant radio paths in the channel,
    but also combining the quantized beams. We also develop a hierarchical beam
    search approach, which scans both vertical and horizontal domains jointly with
    moderate computational complexity. Numerical simulations verify that the
    performance of the proposed quantizers is better than that of previous CSI
    quantization techniques.
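    The UPA structure is what makes joint vertical/horizontal processing tractable: with half-wavelength spacing, a UPA response is the Kronecker product of a vertical and a horizontal ULA response. The sketch builds an oversampled DFT-style codebook from that structure and quantizes a single-path channel to the best codeword. The grid, oversampling factor and 4×8 array are hypothetical, and the paper's beam combining and hierarchical search are not reproduced.

```python
import numpy as np


def ula_response(n, u):
    """Half-wavelength ULA response for spatial frequency u (u = sin of the angle)."""
    return np.exp(1j * np.pi * np.arange(n) * u) / np.sqrt(n)


def upa_response(n_v, n_h, u_v, u_h):
    """UPA response: Kronecker product of the vertical and horizontal ULA responses."""
    return np.kron(ula_response(n_v, u_v), ula_response(n_h, u_h))


def upa_codebook(n_v, n_h, oversample=2):
    """Columns are UPA beams on an oversampled grid of (u_v, u_h) in [-1, 1)."""
    grid_v = -1 + 2 * np.arange(oversample * n_v) / (oversample * n_v)
    grid_h = -1 + 2 * np.arange(oversample * n_h) / (oversample * n_h)
    return np.stack([upa_response(n_v, n_h, a, b) for a in grid_v for b in grid_h], axis=1)


def quantize_direction(h, codebook):
    """Index of the codeword with the largest beamforming gain |c^H h|."""
    return int(np.argmax(np.abs(codebook.conj().T @ h)))


C = upa_codebook(4, 8)
h = np.exp(0.3j) * C[:, 37]          # hypothetical single-path channel on the grid
index = quantize_direction(h, C)
```

    Only the index needs to be fed back; exhaustive search over all vertical/horizontal pairs is what a hierarchical search would try to avoid as arrays grow.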

    How to Scale Up the Spectral Efficiency of Multi-way Massive MIMO Relaying?

    Chung Duc Ho, Hien Quoc Ngo, Michail Matthaiou, Trung Q. Duong
    Subjects: Information Theory (cs.IT)

    This paper considers a decode-and-forward (DF) multi-way massive
    multiple-input multiple-output (MIMO) relay system where many users exchange
    their data with the aid of a relay station equipped with a massive antenna
    array. We propose a new transmission protocol which leverages successive
    cancellation decoding and zero-forcing (ZF) at the users. By using properties of
    massive MIMO, a tight analytical approximation of the spectral efficiency is
    derived. We show that our proposed scheme uses only half of the time-slots
    required in the conventional scheme (in which the number of time-slots is equal
    to the number of users [1]), to exchange data across different users. As a
    result, the sum spectral efficiency of our proposed scheme is nearly double
    that of the conventional scheme, thereby boosting the performance of multi-way
    massive MIMO to unprecedented levels.

    Interference Exploitation in Full Duplex Communications: Trading Interference Power for Both Uplink and Downlink Power Savings

    Mahmoud T. Kabir, Muhammad R. A. Khandaker, Christos Masouros
    Comments: Submitted to IEEE Transactions on Signal Processing, March 2017
    Subjects: Information Theory (cs.IT)

    This paper considers a multiuser full-duplex (FD) wireless communication
    system, where a FD radio base station (BS) serves multiple single-antenna
    half-duplex (HD) uplink and downlink users simultaneously. Unlike conventional
    interference mitigation approaches, we propose to use the knowledge of the data
    symbols and the channel state information (CSI) at the FD radio BS to exploit
    the multi-user interference constructively rather than to suppress it. We
    formulate a multi-objective optimisation problem (MOOP) via the weighted
    Tchebycheff method to study the trade-off between two desirable system
    design objectives, namely total downlink transmit power minimisation and
    total uplink transmit power minimisation, while ensuring the
    required quality-of-service (QoS) for all users. In the proposed MOOP, we adapt
    the QoS constraints for the downlink users to accommodate constructive
    interference (CI) for both generic phase shift keying (PSK) modulated signals
    as well as for quadrature amplitude modulated (QAM) signals. We also extend
    our work to a robust design to study the system with imperfect uplink, downlink
    and self-interference CSI. Simulation results and analysis show that
    significant power savings can be obtained. More importantly, however, the MOOP
    approach here allows for the power saved to be traded off for both uplink and
    downlink power savings, leading to an overall energy efficiency improvement in
    the wireless link.

    Millimeter Wave communication with out-of-band information

    Nuria González Prelcic, Anum Ali, Vutha Va, Robert W. Heath Jr
    Comments: 14 pages, 6 figures
    Subjects: Information Theory (cs.IT)

    Configuring the antenna arrays is the main source of overhead in millimeter
    wave (mmWave) communication systems. In high mobility scenarios, the problem is
    exacerbated, as achieving the highest rates requires frequent link
    reconfiguration. One solution is to exploit spatial congruence between signals
    at different frequency bands and extract mmWave channel parameters from side
    information obtained in another band. In this paper we propose the concept of
    out-of-band information aided mmWave communication. We analyze different
    strategies to leverage information derived from sensors or from other
    communication systems operating at sub-6 GHz bands to help configure the mmWave
    communication link. The overhead reductions that can be obtained when
    exploiting out-of-band information are characterized in a preliminary study.
    Finally, the challenges associated with using out-of-band signals as a source
    of side information at mmWave are analyzed in detail.
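    One simple way to picture the overhead reduction is beam training with a restricted search: if a sub-6 GHz link provides a coarse estimate of the dominant angle, only the mmWave beams near that angle need to be trained. The sketch below assumes a hypothetical 64-beam uniform codebook over a ±60° field of view and a fixed ±10° uncertainty window; these parameters, the function names, and the assumption that the strategy takes exactly this form are illustrative, not the paper's method.

```python
# Toy model of out-of-band aided beam selection (HYPOTHETICAL parameters):
# a uniform codebook of N_BEAMS beams covers the field of view, and a coarse
# angle estimate from a sub-6 GHz link narrows which beams must be trained.

N_BEAMS = 64
FOV_DEG = (-60.0, 60.0)

def beam_centers(n_beams=N_BEAMS, fov=FOV_DEG):
    """Pointing angles (degrees) of a uniform codebook spanning the field of view."""
    lo, hi = fov
    width = (hi - lo) / n_beams
    return [lo + (i + 0.5) * width for i in range(n_beams)]

def candidate_beams(est_angle_deg, half_window_deg, centers):
    """Indices of beams whose centers lie within +/- half_window of the estimate."""
    return [i for i, c in enumerate(centers)
            if abs(c - est_angle_deg) <= half_window_deg]

def best_beam(true_angle_deg, centers, candidates=None):
    """Trained beam pointing closest to the true angle (ideal, noiseless training)."""
    pool = range(len(centers)) if candidates is None else candidates
    return min(pool, key=lambda i: abs(centers[i] - true_angle_deg))

if __name__ == "__main__":
    centers = beam_centers()
    cands = candidate_beams(est_angle_deg=10.0, half_window_deg=10.0, centers=centers)
    print(f"beams trained: {len(cands)} of {len(centers)}")
    print("same choice as exhaustive search:",
          best_beam(12.0, centers, cands) == best_beam(12.0, centers))
```

    The saving depends entirely on how reliable the out-of-band estimate is: if the true angle falls outside the window, the restricted search misses the best beam, which is one of the challenges the abstract alludes to.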

    3D MIMO Outdoor-to-Indoor Propagation Channel Measurement

    Vinod Kristem, Seun Sangodoyin, C. U. Bas, Martin Kaeske, Juho Lee, Christian Schneider, Gerd Sommerkorn, J. Zhang, Reiner S. Thomae, Andreas F. Molisch
    Subjects: Information Theory (cs.IT)

    3-dimensional Multiple-Input Multiple-Output (3D MIMO) systems have received
    great interest recently because of their spatial diversity advantage and
    their capability for full-dimensional beamforming, making them promising
    candidates for the practical realization of massive MIMO. In this paper, we
    present low-cost test equipment (a channel sounder) and post-processing
    algorithms suitable for investigating 3D MIMO channels, as well as the results
    of a measurement campaign for obtaining elevation and azimuth characteristics
    in an outdoor-to-indoor (O2I) environment. Due to limitations in the available
    antenna switches, our channel sounder consists of a hybrid switched/virtual
    cylindrical array with effectively 480 antenna elements at the base station
    (BS). The virtual setup increased the overall MIMO measurement duration,
    thereby introducing phase drift errors into the measurements. Using
    measurements from a reference antenna, we estimate and correct for these
    phase errors during post-processing. We provide the elevation and azimuth
    angular spreads for measurements in urban macro-cellular (UMa) and urban
    micro-cellular (UMi) environments, and study their dependence on the height
    of the user equipment (UE).
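    A minimal sketch of reference-antenna phase-drift correction, assuming that at every virtual-array snapshot the same unknown phase rotation affects both the measured element and a static reference antenna. Under that assumption the drift relative to the first snapshot is arg(r_t · conj(r_0)), and de-rotating each measurement by it removes the drift up to one common phase. The function names and the simulated data are hypothetical, and the paper's actual post-processing may differ.

```python
import cmath
import random

def correct_phase_drift(meas, ref):
    """De-rotate each snapshot by the reference antenna's phase drift.

    meas[t]: complex channel coefficient of the element measured at snapshot t.
    ref[t]:  reference-antenna coefficient captured at the same snapshot t.
    ASSUMPTION: both share the same drift phi[t], so the drift relative to
    snapshot 0 is arg(ref[t] * conj(ref[0])).
    """
    r0 = ref[0]
    corrected = []
    for h, r in zip(meas, ref):
        drift = cmath.phase(r * r0.conjugate())
        corrected.append(h * cmath.exp(-1j * drift))
    return corrected

def simulate(n=480, step_std=0.05, seed=0):
    """Hypothetical test data: random true channels plus a random-walk phase drift."""
    rng = random.Random(seed)
    true_h = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    ref_static = 0.8 + 0.6j  # static reference channel (arbitrary value)
    phi, p = [], 0.3
    for _ in range(n):
        phi.append(p)
        p += rng.gauss(0, step_std)
    meas = [h * cmath.exp(1j * a) for h, a in zip(true_h, phi)]
    ref = [ref_static * cmath.exp(1j * a) for a in phi]
    return true_h, meas, ref, phi[0]

if __name__ == "__main__":
    true_h, meas, ref, phi0 = simulate()
    fixed = correct_phase_drift(meas, ref)
    common = cmath.exp(1j * phi0)  # residual phase shared by all snapshots
    print("max residual error:", max(abs(f - h * common) for f, h in zip(fixed, true_h)))
```

    The leftover common phase is harmless for angular-spread estimation, because a rotation applied equally to every element does not change relative phases across the array.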

    Based on measurements with the UE placed on different floors, we study the
    feasibility of separating users in the elevation domain. The measured channel
    impulse responses are also used to study channel hardening in massive MIMO
    and the optimality of the Maximum Ratio Combining (MRC) receiver.
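    Channel hardening means that the normalized channel gain ||h||²/E[||h||²] concentrates around 1 as the number of BS antennas M grows. For i.i.d. Rayleigh fading, ||h||² is a sum of M unit-mean exponential terms, so the variance of the normalized gain is exactly 1/M. The sketch below checks this with synthetic channels; i.i.d. Rayleigh fading is an idealized assumption used only for illustration, and measured channels such as those in this campaign need not follow it.

```python
import random

def hardening_variance(m_antennas, trials=2000, seed=0):
    """Sample variance of ||h||^2 / E||h||^2 for h with i.i.d. CN(0, 1) entries.

    E||h||^2 = M here, so the normalized gain is ||h||^2 / M.
    ASSUMPTION: idealized i.i.d. Rayleigh fading, not measured channels.
    """
    rng = random.Random(seed)
    s = 0.5 ** 0.5  # std of real and imaginary parts, so E|h_m|^2 = 1
    vals = []
    for _ in range(trials):
        gain = sum(rng.gauss(0, s) ** 2 + rng.gauss(0, s) ** 2
                   for _ in range(m_antennas))
        vals.append(gain / m_antennas)
    mean = sum(vals) / trials
    return sum((v - mean) ** 2 for v in vals) / (trials - 1)

if __name__ == "__main__":
    for m in (4, 16, 64):
        print(f"M={m:3d}  variance={hardening_variance(m):.4f}  theory 1/M={1 / m:.4f}")
```

    A shrinking variance is what makes simple receivers such as MRC attractive with many antennas: the effective gain becomes nearly deterministic, so little is lost by not adapting to its small-scale fluctuations.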

    Time-triggering versus event-triggering control over communication channels

    Mohammad Javad Khojasteh, Pavankumar Tallapragada, Jorge Cortes, Massimo Franceschetti
    Comments: arXiv admin note: text overlap with arXiv:1609.09594
    Subjects: Optimization and Control (math.OC); Information Theory (cs.IT); Systems and Control (cs.SY)

    Time-triggered and event-triggered control strategies for stabilization of an
    unstable plant over a rate-limited communication channel subject to unknown,
    bounded delay are studied and compared. Event triggering carries implicit
    information, revealing the state of the plant. However, the delay in the
    communication channel causes information loss, as it makes the state
    information out of date. There is a critical delay value at which the loss of
    information due to the communication delay exactly offsets the implicit
    information carried by the triggering events. This occurs when the maximum
    delay equals the inverse of the entropy rate of the plant. In this context,
    extensions of our previous results for event-triggering strategies are
    presented for vector systems and are compared with the data-rate theorem for
    time-triggered control, which is extended here to a setting with unknown delay.
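    For the time-triggered benchmark, the classical data-rate theorem for a continuous-time linear plant dx/dt = Ax + Bu over a noiseless, delay-free digital channel requires the rate to exceed the plant's entropy rate, the sum of Re λ_i(A) over the unstable eigenvalues divided by ln 2, in bits per second. The sketch below computes that quantity only; it does not model the unknown bounded delay or the event-triggered strategies analysed in the paper, and the function names are hypothetical.

```python
import math

import numpy as np

def entropy_rate_bits(A):
    """Entropy rate of dx/dt = A x in bits/s: sum of unstable Re(eigenvalues) / ln 2."""
    eigvals = np.linalg.eigvals(np.asarray(A, dtype=float))
    return sum(max(float(lam.real), 0.0) for lam in eigvals) / math.log(2)

def rate_exceeds_entropy(A, rate_bits_per_s):
    """Delay-free data-rate condition used as the time-triggered benchmark."""
    return rate_bits_per_s > entropy_rate_bits(A)

if __name__ == "__main__":
    A = [[0.0, 1.0], [-2.0, 3.0]]  # hypothetical plant, eigenvalues 1 and 2
    print(f"entropy rate = {entropy_rate_bits(A):.3f} bits/s")
    for R in (4.0, 5.0):
        print(f"R = {R} bits/s exceeds entropy rate: {rate_exceeds_entropy(A, R)}")
```

    Stable modes contribute nothing to the sum, so only the unstable part of the plant determines the minimum rate in this delay-free setting.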

    The Informativeness of k-Means for Learning Gaussian Mixture Models

    Zhaoqiang Liu, Vincent Y. F. Tan
    Comments: 10 pages, 3 figures
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG); Methodology (stat.ME)

    The learning of Gaussian mixture models (GMMs) is a classical problem in
    machine learning and applied statistics. This can also be interpreted as a
    clustering problem. Indeed, given data samples independently generated from a
    GMM, we would like to find the correct target clustering of the samples
    according to which Gaussian they were generated from. Despite the large number
    of algorithms designed to find the correct target clustering, many
    practitioners prefer to use the k-means algorithm because of its simplicity.
    k-means tries to find an optimal clustering that minimizes the sum of squared
    distances between each point and its cluster center. In this paper, we provide
    sufficient conditions under which any optimal clustering is close to the
    correct target clustering of samples independently generated from a GMM.
    Moreover, to achieve significantly faster running time and reduced
    memory usage, we show that under weaker conditions on the GMM, any optimal
    clustering for the samples with reduced dimensionality is also close to the
    correct target clustering. These results provide intuition for the
    informativeness of k-means as an algorithm for learning a GMM, further
    substantiating the conclusions in Kumar and Kannan [2010]. We verify our
    theorems using numerical experiments and show, on datasets with reduced
    dimensionality, significant speed-ups in the time required to perform
    clustering.
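    The experiment described above can be sketched as follows: sample from a spherical GMM, run Lloyd's k-means (with k-means++ seeding and several restarts), and measure the misclustering rate against the true components, both in the original space and after projecting onto the top k principal components. The use of PCA for dimensionality reduction, the cluster separation, and all function names are assumptions for illustration, not the paper's exact setup or guarantees.

```python
import itertools

import numpy as np

def sample_gmm(means, sigma, n_per, rng):
    """n_per draws from each spherical Gaussian N(mu_j, sigma^2 I), plus true labels."""
    k, d = means.shape
    X = np.vstack([mu + sigma * rng.standard_normal((n_per, d)) for mu in means])
    return X, np.repeat(np.arange(k), n_per)

def kmeans(X, k, rng, n_init=10, n_iter=100):
    """Lloyd's algorithm with k-means++ seeding; keeps the restart with the lowest SSE."""
    best_sse, best_labels = np.inf, None
    for _ in range(n_init):
        # k-means++ seeding: pick each new center with probability proportional to D(x)^2.
        centers = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(axis=2).min(axis=1)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centers = np.array(centers)
        # Lloyd iterations: assign to nearest center, then move centers to cluster means.
        for _ in range(n_iter):
            labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        d2 = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2)
        labels, sse = d2.argmin(axis=1), d2.min(axis=1).sum()
        if sse < best_sse:
            best_sse, best_labels = sse, labels
    return best_labels

def pca_reduce(X, k):
    """Project centered data onto its top-k principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T

def error_rate(pred, truth, k):
    """Fraction of misclustered points under the best relabeling of clusters."""
    return min(np.mean(np.array(perm)[pred] != truth)
               for perm in itertools.permutations(range(k)))

def demo(k=3, d=50, n_per=100, sep=5.0, sigma=1.0, seed=0):
    """Misclustering rate of k-means on the full data and on PCA-reduced data."""
    rng = np.random.default_rng(seed)
    means = sep * rng.standard_normal((k, d))
    X, y = sample_gmm(means, sigma, n_per, rng)
    err_full = error_rate(kmeans(X, k, rng), y, k)
    err_reduced = error_rate(kmeans(pca_reduce(X, k), k, rng), y, k)
    return err_full, err_reduced

if __name__ == "__main__":
    print("error (full, reduced):", demo())
```

    With well-separated components both runs should recover the target clustering; shrinking `sep` toward `sigma` is a quick way to see where k-means begins to fail, and running on the k-dimensional projection is where the time and memory savings come from.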



