    arXiv Paper Daily: Wed, 1 Mar 2017

    Published by 我爱机器学习 (52ml.net) on 2017-03-01 00:00:00

    Computer Vision and Pattern Recognition

    Predicting Slice-to-Volume Transformation in Presence of Arbitrary Subject Motion

    Benjamin Hou, Amir Alansary, Steven McDonagh, Daniel Rueckert, Ben Glocker, Bernhard Kainz
    Comments: 8 pages, 4 figures, 6 pages supplemental material, currently under review for MICCAI 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper aims to solve a fundamental problem in intensity-based 2D/3D
    registration, which concerns the limited capture range and need for very good
    initialization of state-of-the-art image registration methods. We propose a
    regression approach that learns to predict rotation and translations of
    arbitrary 2D image slices from 3D volumes, with respect to a learned canonical
    atlas co-ordinate system. To this end, we utilize Convolutional Neural Networks
    (CNNs) to learn the highly complex regression function that maps 2D image
    slices into their correct position and orientation in 3D space. Our approach is
    attractive in challenging imaging scenarios, where significant subject motion
    complicates reconstruction performance of 3D volumes from 2D slice data. We
    extensively evaluate the effectiveness of our approach quantitatively on
    simulated MRI brain data with extreme random motion. We further demonstrate
    qualitative results on fetal MRI where our method is integrated into a full
    reconstruction and motion compensation pipeline. With our CNN regression
    approach we obtain an average prediction error of 7mm on simulated data, and
    convincing reconstruction quality of images of very young fetuses where
    previous methods fail. We further discuss applications to Computed Tomography
    and X-ray projections. Our approach is a general solution to the 2D/3D
    initialization problem. It is computationally efficient, with prediction times
    per slice of a few milliseconds, making it suitable for real-time scenarios.
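
    As a rough illustration of the regression setup described above (a hedged
    sketch, not the authors' architecture; all layer sizes are placeholders), a
    minimal PyTorch CNN that maps a 2D slice to six pose parameters could look
    like this:

    import torch
    import torch.nn as nn

    class SliceToVolumeRegressor(nn.Module):
        """Toy CNN regressing 6 pose parameters (3 rotations plus 3
        translations) from a single 2D slice. Illustrative only."""
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.head = nn.Linear(64, 6)  # rx, ry, rz, tx, ty, tz

        def forward(self, x):
            return self.head(self.features(x).flatten(1))

    model = SliceToVolumeRegressor()
    slices = torch.randn(4, 1, 128, 128)  # a batch of 2D slices
    poses = model(slices)                 # (4, 6) predicted transformations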

    Unsupervised Triplet Hashing for Fast Image Retrieval

    Shanshan Huang, Yichao Xiong, Ya Zhang, Jia Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Hashing has played a pivotal role in large-scale image retrieval. With the
    development of Convolutional Neural Network (CNN), hashing learning has shown
    great promise. However, existing methods are mostly tuned for classification
    and are not optimized for retrieval tasks, especially instance-level retrieval.
    In this study, we propose a novel hashing method for large-scale image
    retrieval. Considering the difficulty of obtaining labeled datasets for image
    retrieval at large scale, we propose a novel CNN-based unsupervised
    hashing method, namely Unsupervised Triplet Hashing (UTH). The unsupervised
    hashing network is designed under the following three principles: 1) more
    discriminative representations for image retrieval; 2) minimum quantization
    loss between the original real-valued feature descriptors and the learned hash
    codes; 3) maximum information entropy for the learned hash codes. Extensive
    experiments on CIFAR-10, MNIST and In-shop datasets have shown that UTH
    outperforms several state-of-the-art unsupervised hashing methods in terms of
    retrieval accuracy.
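
    The three design principles translate naturally into a composite training
    objective. The sketch below is a hedged PyTorch rendition (the exact UTH
    formulation may differ, and the 0.1 weights are arbitrary): a triplet
    ranking term for discriminative codes, a quantization term pulling
    activations toward {-1, +1}, and an entropy term keeping each bit balanced:

    import torch
    import torch.nn.functional as F

    def uth_style_loss(anchor, positive, negative, margin=0.5):
        """Illustrative combination of the three stated principles on
        real-valued hash-layer outputs in [-1, 1]."""
        triplet = F.triplet_margin_loss(anchor, positive, negative,
                                        margin=margin)
        codes = torch.cat([anchor, positive, negative], dim=0)
        # principle 2: push activations toward binary values {-1, +1}
        quantization = ((codes.abs() - 1.0) ** 2).mean()
        # principle 3: each bit should be +1 or -1 equally often,
        # i.e. the mean activation of every bit should be near zero
        entropy = (codes.mean(dim=0) ** 2).mean()
        return triplet + 0.1 * quantization + 0.1 * entropy

    a, p, n = (torch.tanh(torch.randn(8, 48)) for _ in range(3))
    print(uth_style_loss(a, p, n))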

    ShaResNet: reducing residual network parameter number by sharing weights

    Alexandre Boulch
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Deep Residual Networks have reached the state of the art in many image
    processing tasks, such as image classification. However, the cost of a gain in
    accuracy, in terms of depth and memory, is prohibitive, as it requires a higher
    number of residual blocks, up to double the initial value. To tackle this
    problem, we propose in this paper a way to reduce the redundant information of
    the networks. We share the weights of convolutional layers between residual
    blocks operating at the same spatial scale. The signal flows multiple times in
    the same convolutional layer. The resulting architecture, called ShaResNet,
    contains block-specific layers and shared layers. These ShaResNets are trained
    in exactly the same fashion as commonly used residual networks. We show, on
    the one hand, that they are almost as efficient as their sequential
    counterparts while involving fewer parameters, and on the other hand that they
    are more efficient than a residual network with the same number of parameters.
    For example, a 152-layer-deep residual network can be reduced to 106
    convolutional layers, i.e. a parameter reduction of 39%, while losing less than
    0.2% accuracy on ImageNet.
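
    The weight-sharing idea is easy to picture in code. The toy PyTorch sketch
    below (not the released ShaResNet) creates one 3x3 convolution per spatial
    scale and reuses it in every residual block of that stage, so only the
    block-specific 1x1 layers add parameters:

    import torch
    import torch.nn as nn

    class SharedBlock(nn.Module):
        """Residual block whose middle 3x3 convolution is shared across
        all blocks operating at the same spatial scale."""
        def __init__(self, shared_conv, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 1)  # block-specific
            self.shared = shared_conv                      # shared weights
            self.conv2 = nn.Conv2d(channels, channels, 1)  # block-specific

        def forward(self, x):
            h = torch.relu(self.conv1(x))
            h = torch.relu(self.shared(h))  # signal flows through the
            return x + self.conv2(h)        # same conv in every block

    channels = 64
    shared = nn.Conv2d(channels, channels, 3, padding=1)  # one weight set
    stage = nn.Sequential(*[SharedBlock(shared, channels) for _ in range(4)])
    out = stage(torch.randn(1, channels, 32, 32))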

    MILD: Multi-Index hashing for Loop closure Detection

    Lei Han, Lu Fang
    Comments: 6 pages, 5 figures; accepted by IEEE ICME 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Loop Closure Detection (LCD) has proven to be extremely useful in globally
    consistent visual Simultaneous Localization and Mapping (SLAM) and
    appearance-based robot relocalization. Methods exploiting binary features in
    bag of words representation have recently gained a lot of popularity for their
    efficiency, but suffer from low recall due to the inherent drawback that high
    dimensional binary feature descriptors lack well-defined centroids. In this
    paper, we propose a real-time LCD approach called MILD (Multi-Index Hashing for
    Loop closure Detection), in which image similarity is measured by direct
    feature matching, achieving high recall without introducing extra
    computational complexity, with the aid of Multi-Index Hashing (MIH). A
    theoretical analysis of the approximate image similarity measurement using MIH
    is presented, which reveals the trade-off between efficiency and accuracy from
    a probabilistic perspective. Extensive comparisons with state-of-the-art LCD
    methods demonstrate the superiority of MILD in both efficiency and accuracy.
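
    To make the multi-index hashing ingredient concrete: a binary descriptor is
    split into m disjoint substrings, each indexed in its own hash table, and by
    the pigeonhole principle any descriptor within Hamming radius r < m of a
    query matches it exactly in at least one substring. A toy Python sketch of
    this idea (not the MILD implementation):

    from collections import defaultdict

    def split_code(code, m):
        """Split a binary code (tuple of bits) into m substrings."""
        step = len(code) // m
        return [tuple(code[i * step:(i + 1) * step]) for i in range(m)]

    class MultiIndexHash:
        def __init__(self, m=4):
            self.m = m
            self.tables = [defaultdict(list) for _ in range(m)]

        def add(self, key, code):
            for table, sub in zip(self.tables, split_code(code, self.m)):
                table[sub].append((key, code))

        def query(self, code):
            # union of exact substring matches = candidate set
            candidates = {}
            for table, sub in zip(self.tables, split_code(code, self.m)):
                for key, stored in table.get(sub, []):
                    candidates[key] = stored
            return candidates

    mih = MultiIndexHash(m=4)
    mih.add("img0", (0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1))
    # a query 1 bit away still hits "img0" via three of the four tables
    print(mih.query((0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1)))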

    Weakly- and Semi-Supervised Object Detection with Expectation-Maximization Algorithm

    Ziang Yan, Jian Liang, Weishen Pan, Jin Li, Changshui Zhang
    Comments: 9 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Object detection when provided image-level labels instead of instance-level
    labels (i.e., bounding boxes) during training is an important problem in
    computer vision, since large scale image datasets with instance-level labels
    are extremely costly to obtain. In this paper, we address this challenging
    problem by developing an Expectation-Maximization (EM) based object detection
    method using deep convolutional neural networks (CNNs). Our method is
    applicable to both the weakly-supervised and semi-supervised settings.
    Extensive experiments on PASCAL VOC 2007 benchmark show that (1) in the weakly
    supervised setting, our method provides significant detection performance
    improvement over current state-of-the-art methods, (2) having access to a small
    number of strongly (instance-level) annotated images, our method can almost
    match the performance of the fully supervised Fast RCNN. We share our source
    code at this https URL
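
    A toy rendition of the hard-EM idea behind such methods (the paper trains a
    deep CNN detector; the logistic regression and synthetic features here are
    stand-ins): the E-step selects the highest-scoring proposal in each positive
    image as a pseudo-label, and the M-step retrains the detector on those
    selections:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n_images, n_props, dim = 50, 8, 16
    # proposal 0 of each image truly contains the object (mean-shifted)
    obj = rng.normal(1.0, 1.0, (n_images, 1, dim))
    bg = rng.normal(0.0, 1.0, (n_images, n_props - 1, dim))
    proposals = np.concatenate([obj, bg], axis=1)
    negatives = rng.normal(0.0, 1.0, (200, dim))  # negative-image features

    clf = LogisticRegression(max_iter=1000)
    pseudo = rng.integers(0, n_props, n_images)   # random initial choice
    for _ in range(5):
        # M-step: fit the detector on currently selected proposals
        pos = proposals[np.arange(n_images), pseudo]
        X = np.vstack([pos, negatives])
        y = np.concatenate([np.ones(n_images), np.zeros(len(negatives))])
        clf.fit(X, y)
        # E-step: re-select the highest-scoring proposal per image
        scores = clf.decision_function(proposals.reshape(-1, dim))
        pseudo = scores.reshape(n_images, n_props).argmax(axis=1)
    print("true proposal recovered in", (pseudo == 0).mean(), "of images")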

    Billion-scale similarity search with GPUs

    Jeff Johnson, Matthijs Douze, Hervé Jégou
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB); Data Structures and Algorithms (cs.DS)

    Similarity search finds application in specialized database systems handling
    complex data such as images or videos, which are typically represented by
    high-dimensional features and require specific indexing structures. This paper
    tackles the problem of better utilizing GPUs for this task. While GPUs excel at
    data-parallel tasks, prior approaches are bottlenecked by algorithms that
    expose less parallelism, such as k-min selection, or make poor use of the
    memory hierarchy.

    We propose a design for k-selection that operates at up to 55% of theoretical
    peak performance, enabling a nearest neighbor implementation that is 8.5x
    faster than prior GPU state of the art. We apply it in different similarity
    search scenarios, proposing optimized designs for brute-force, approximate and
    compressed-domain search based on product quantization. In all these
    setups, we outperform the state of the art by large margins. Our implementation
    enables the construction of a high accuracy k-NN graph on 95 million images
    from the Yfcc100M dataset in 35 minutes, and of a graph connecting 1 billion
    vectors in less than 12 hours on 4 Maxwell Titan X GPUs. We have open-sourced
    our approach for the sake of comparison and reproducibility.
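
    Assuming the open-sourced code is the Faiss library and its standard API, a
    minimal usage sketch of the exact (brute-force) index looks like this; the
    GPU wrapper is shown commented out since it requires a GPU build:

    import numpy as np
    import faiss  # the similarity-search library released with this work

    d, nb, nq, k = 64, 100_000, 5, 10
    xb = np.random.rand(nb, d).astype("float32")  # database vectors
    xq = np.random.rand(nq, d).astype("float32")  # query vectors

    index = faiss.IndexFlatL2(d)  # exact L2 nearest-neighbor search
    index.add(xb)
    distances, ids = index.search(xq, k)  # k neighbors per query

    # To run the same index on GPU 0 with a GPU-enabled build:
    # res = faiss.StandardGpuResources()
    # gpu_index = faiss.index_cpu_to_gpu(res, 0, index)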

    An Extensive Technique to Detect and Analyze Melanoma: A Challenge at the International Symposium on Biomedical Imaging (ISBI) 2017

    G Wiselin Jiji, P Johnson Durai Raj
    Comments: 4 Page Abstract for ISIC 2017 Challenge
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    An automated method to detect and analyze melanoma is presented to improve
    diagnosis, which in turn leads to the appropriate treatment. Image processing
    techniques such as segmentation, feature descriptors and classification models
    are involved in this method. In the first phase, the lesion region is segmented
    using CIELAB color-space-based segmentation. Then feature descriptors such as
    shape, color and texture are extracted. Finally, in the third phase, the lesion
    region is classified as melanoma, seborrheic keratosis or nevus using a
    multi-class O-A SVM model. Experiments were carried out on the ISIC 2017
    Archive skin image database and the results were analyzed.

    II-FCN for skin lesion analysis towards melanoma detection

    Hongdiao Wen
    Comments: 4 page abstract about our solution to the challenge of Lesion Segmentation in ISBI 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Dermoscopy image detection remains a tough task due to the weakly
    distinguishable properties of the object. Although deep convolutional neural
    networks have significantly boosted performance on prevalent computer vision
    tasks in recent years, there remains room to explore more robust and precise
    models for the problem of low-contrast image segmentation. For the Lesion
    Segmentation challenge in ISBI 2017, we built a symmetrical identity inception
    fully convolutional network based on only 10 reversible inception blocks, every
    block composed of four convolution branches combining different layer depths
    and kernel sizes to extract varied semantic features. We then propose an
    approximate loss function for the Jaccard index metric to train our model. To
    overcome the drawbacks of traditional convolution, we adopt dilated
    convolution and a conditional random field method to refine our segmentation.
    We also introduce multiple ways to prevent overfitting. The experimental
    results show that our model achieved a Jaccard index of 0.82 and kept learning
    from epoch to epoch.

    Cascade one-vs-rest detection network for fine-grained recognition without part annotations

    Long Chen, Junyu Dong, ShengKe Wang, Kin-Man Lam, Muwei Jian, Hua Zhang, XiaoChun Cao
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Fine-grained recognition is a challenging task due to the small
    inter-category variances. Most top-performing fine-grained recognition
    methods leverage parts of objects for better performance, and therefore
    require part annotations, which are extremely expensive to obtain. In this
    paper, we propose a novel cascaded deep CNN detection framework for
    fine-grained recognition which is trained to detect the whole object without
    considering parts. Nevertheless, most of current top-performing detection
    networks use the N+1 class (N object categories plus background) softmax loss,
    and the background category with much more training samples dominates the
    feature learning progress so that the features are not good for object
    categories with fewer samples. To bridge this gap, we introduce a cascaded
    structure to eliminate background and exploit a one-vs-rest loss to capture
    more minute variances among different subordinate categories. Experiments show
    that our proposed recognition framework achieves comparable performance with
    state-of-the-art, part-free, fine-grained recognition methods on the
    CUB-200-2011 Bird dataset. Moreover, our method even outperforms most
    part-based methods, while requiring no part annotations at the training stage
    and no annotations at all at the test stage.

    Borrowing Treasures from the Wealthy: Deep Transfer Learning through Selective Joint Fine-tuning

    Weifeng Ge, Yizhou Yu
    Comments: To appear in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep neural networks require a large amount of labeled training data during
    supervised learning. However, collecting and labeling so much data might be
    infeasible in many cases. In this paper, we introduce a source-target selective
    joint fine-tuning scheme for improving the performance of deep learning tasks
    with insufficient training data. In this scheme, a target learning task with
    insufficient training data is carried out simultaneously with another source
    learning task with abundant training data. However, the source learning task
    does not use all existing training data. Our core idea is to identify and use a
    subset of training images from the original source learning task whose
    low-level characteristics are similar to those from the target learning task,
    and jointly fine-tune shared convolutional layers for both tasks. Specifically,
    we compute descriptors from linear or nonlinear filter bank responses on
    training images from both tasks, and use such descriptors to search for a
    desired subset of training samples for the source learning task.

    Experiments demonstrate that our selective joint fine-tuning scheme achieves
    state-of-the-art performance on multiple visual classification tasks with
    insufficient training data for deep learning. Such tasks include Caltech 256,
    MIT Indoor 67, Oxford Flowers 102 and Stanford Dogs 120. In comparison to
    fine-tuning without a source domain, the proposed method can improve the
    classification accuracy by 2% – 10% using a single model.
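
    The source-selection step can be pictured with a small sketch: compute a
    histogram descriptor of filter-bank responses per image, then keep, for each
    target image, its nearest source images. Everything below (the random filter
    bank, patch stride, histogram range and neighbor count) is a placeholder
    rather than the paper's configuration:

    import numpy as np

    rng = np.random.default_rng(0)

    def filter_bank_descriptor(img, filters, bins=8):
        """Histogram of linear filter responses over image patches;
        a stand-in for the paper's low-level descriptors."""
        k = filters.shape[1]
        patches = np.array([img[i:i + k, j:j + k].ravel()
                            for i in range(0, img.shape[0] - k + 1, k)
                            for j in range(0, img.shape[1] - k + 1, k)])
        responses = patches @ filters.reshape(len(filters), -1).T
        hist, _ = np.histogram(responses, bins=bins, range=(-10, 10))
        return hist / hist.sum()

    filters = rng.normal(size=(6, 5, 5))  # toy filter bank
    target = [rng.normal(size=(64, 64)) for _ in range(3)]
    source = [rng.normal(size=(64, 64)) for _ in range(100)]
    t = np.array([filter_bank_descriptor(x, filters) for x in target])
    s = np.array([filter_bank_descriptor(x, filters) for x in source])
    dist = ((t[:, None] - s[None]) ** 2).sum(-1)   # target-source distances
    selected = np.unique(np.argsort(dist, axis=1)[:, :10])
    print(len(selected), "source images selected for joint fine-tuning")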

    MIML-FCN+: Multi-instance Multi-label Learning via Fully Convolutional Networks with Privileged Information

    Hao Yang, Joey Tianyi Zhou, Jianfei Cai, Yew Soon Ong
    Comments: Accepted in CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Multi-instance multi-label (MIML) learning has many interesting applications
    in computer vision, including multi-object recognition and automatic image
    tagging. In these applications, additional information such as bounding boxes,
    image captions and descriptions is often available during the training phase;
    this is referred to as privileged information (PI). However, as existing works on
    learning using PI only consider instance-level PI (privileged instances), they
    fail to make use of bag-level PI (privileged bags) available in MIML learning.
    Therefore, in this paper, we propose a two-stream fully convolutional network,
    named MIML-FCN+, unified by a novel PI loss to solve the problem of MIML
    learning with privileged bags. Compared to the previous works on PI, the
    proposed MIML-FCN+ utilizes the readily available privileged bags, instead of
    hard-to-obtain privileged instances, making the system more general and
    practical in real-world applications. As the proposed PI loss is convex and
    SGD-compatible, and the framework itself is a fully convolutional network,
    MIML-FCN+ can be easily integrated with state-of-the-art deep learning networks.
    Moreover, the flexibility of convolutional layers allows us to exploit
    structured correlations among instances to facilitate more effective training
    and testing. Experimental results on three benchmark datasets demonstrate the
    effectiveness of the proposed MIML-FCN+, outperforming state-of-the-art methods
    in the application of multi-object recognition.

    3D Shape Segmentation via Shape Fully Convolutional Networks

    Pengyu Wang, Yuan Gan, Yan Zhang, Panpan Shui
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a novel fully convolutional network architecture for shapes,
    denoted Shape Fully Convolutional Networks (SFCN). Similar to convolution
    and pooling operations on images, the 3D shape is represented as a graph
    structure in the SFCN architecture, on which we first propose and
    implement shape convolution and pooling operations. Meanwhile, to build our
    SFCN architecture on top of the original image-segmentation FCN architecture,
    we also design and implement a generating operation with a bridging function.
    This ensures that the convolution and pooling operations we designed can be
    successfully applied in the original FCN architecture. In this paper, we also
    present a new shape segmentation approach based on SFCN. In contrast to
    existing state-of-the-art shape segmentation methods that require the same
    types of shapes as input, we allow more general and challenging inputs, such
    as mixed datasets of different types of shapes. In our approach, SFCNs are
    first trained end-to-end, triangle-to-triangle, on three low-level geometric
    features. Then, based on the trained SFCNs, we can complete the shape
    segmentation task with high quality. Finally, feature-voting-based multi-label
    graph cuts are adopted to optimize the segmentation results obtained by SFCN
    prediction. The experimental results show that our method can effectively
    learn and predict mixed shape datasets of either similar or different
    characteristics, and achieves excellent segmentation results.

    Scene Flow to Action Map: A New Representation for RGB-D based Action Recognition with Convolutional Neural Networks

    Pichao Wang, Wanqing Li, Zhimin Gao, Yuyao Zhang, Chang Tang, Philip Ogunbona
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Scene flow describes the motion of 3D objects in the real world and could
    potentially be the basis of a good feature for 3D action recognition. However,
    its use for action recognition, especially in the context of convolutional
    neural networks (ConvNets), has not been previously studied. In this paper, we
    propose the extraction and use of scene flow for action recognition from RGB-D
    data. Previous works have considered the depth and RGB modalities as separate
    channels and extracted features for later fusion. We take a different approach
    and consider the modalities as one entity, thus allowing feature extraction
    for action recognition from the start. Two key questions about the use of scene
    flow for action recognition are addressed: how to organize the scene flow
    vectors and how to represent the long term dynamics of videos based on scene
    flow. In order to calculate the scene flow correctly on the available datasets,
    we propose an effective self-calibration method to align the RGB and depth data
    spatially without knowledge of the camera parameters. Based on the scene flow
    vectors, we propose a new representation, namely, Scene Flow to Action Map
    (SFAM), that describes several long term spatio-temporal dynamics for action
    recognition. We adopt a channel transform kernel to transform the scene flow
    vectors into an optimal color space analogous to RGB. This transformation takes
    better advantage of ConvNet models pre-trained on ImageNet. Experimental
    results indicate that this new representation can surpass the performance of
    state-of-the-art methods on two large public datasets.

    Joint Spatio-Temporal Boundary Detection and Boundary Flow Prediction with a Fully Convolutional Siamese Network

    Peng Lei, Fuxin Li, Sinisa Todorovic
    Comments: 10 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper addresses a new problem of joint object boundary detection and
    boundary motion estimation in videos, which we named boundary flow estimation.
    Boundary flow is an important mid-level visual cue as boundaries characterize
    spatial extents of objects, and the flow indicates motions and interactions of
    objects. Yet, most prior work on motion estimation has focused on dense object
    motion or feature points that may not necessarily reside on boundaries. For
    boundary flow estimation, we specify a new fully convolutional Siamese network
    (FCSN) that jointly estimates object-level boundaries in two consecutive
    frames. Boundary correspondences in the two frames are predicted by the same
    FCSN with a new, unconventional deconvolution approach. Finally, the boundary
    flow estimate is improved with edgelet-based filtering. Evaluation is
    conducted on three tasks: boundary detection in videos, boundary flow
    estimation, and optical flow estimation. On boundary detection, we achieve the
    state-of-the-art performance on the benchmark VSB100 dataset. On boundary flow
    estimation, we present the first results on the Sintel training dataset. For
    optical flow estimation, we run the recent approach CPM-Flow but on the
    augmented input with our boundary flow matches, and achieve significant
    performance improvement on the Sintel benchmark.

    Selective Video Cutout using Global Pyramid Models and Local Uncertainty Propagation

    Wenguan Wang, Jianbing Shen, Fatih Porikli
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Conventional video segmentation approaches rely heavily on appearance models.
    Such methods often use appearance descriptors that have limited discriminative
    power under complex scenarios. To improve the segmentation performance, this
    paper presents a pyramid histogram based confidence map that incorporates
    structure information into appearance statistics. It also combines geodesic
    distance based dynamic models. Then, it employs an efficient measure of
    uncertainty propagation using local classifiers to determine the image regions
    where the object labels might be ambiguous. The final foreground cutout is
    obtained by refining on the uncertain regions. Additionally, to reduce manual
    labeling, our method determines the frames to be labeled by the human operator
    in a principled manner, which further boosts the segmentation performance and
    minimizes the labeling effort. Our extensive experimental analyses on two big
    benchmarks demonstrate that our solution achieves superior performance,
    favorable computational efficiency, and reduced manual labeling in comparison
    to the state-of-the-art.

    Super-Trajectory for Video Segmentation

    Wenguan Wang, Shenjian Bing
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a semi-supervised video segmentation method via an efficient video
    representation, called a “super-trajectory”. Each super-trajectory corresponds
    to a group of compact trajectories that exhibit consistent motion patterns,
    similar appearance and close spatiotemporal relationships. To handle occlusions
    and drifts, we develop a trajectory generation method based on a probabilistic
    model, which is more reasonable and interpretable than traditional trajectory
    methods that use hard thresholding. We then modify a density-peaks-based
    clustering algorithm for reliably grouping trajectories, thus capturing a rich
    set of spatial and temporal relations among trajectories. Via this
    discriminative video representation, manual effort on the first frame can be
    efficiently propagated to the rest of the frames. Experimental results on
    challenging benchmarks demonstrate that the proposed approach is capable of
    distinguishing objects from complex backgrounds and even re-identifying
    objects after long-term occlusions.

    An Inexact Proximal Alternating Direction Method for Non-convex and Non-smooth Matrix Factorization and Beyond

    Yiyang Wang, Risheng Liu, Xiaoliang Song, Zhixun Su
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC)

    Since Non-convex and Non-smooth Matrix Factorization (NNMF) problems have
    great practical significance in applications, they attract extensive attention
    in the fields of image processing and machine learning. In this paper, we
    propose an inexact proximal alternating direction (IPAD) method for solving
    various complex NNMF problems. Our IPAD method is not a single algorithm, but a
    general and flexible framework into which various numerical methods can be
    fused. With a specially designed error condition, the convergence properties of
    IPAD are analyzed for a general formulation, and can be extended to a wider
    range of problems. Moreover, an implementation method for checking the
    inexactness criterion is theoretically analyzed, which is more practical than
    the naive criteria used previously. Our IPAD algorithm is applied to the widely
    studied sparse dictionary learning problem on both synthetic and real-world
    data. Experimental results with detailed analyses and discussions are given to
    verify the efficiency of the IPAD method.

    The Active Atlas: Combining 3D Anatomical Models with Texture Detectors

    Yuncong Chen, David Kleinfeld, Yoav Freund
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    While modern imaging technologies such as fMRI have opened exciting new
    possibilities for studying the brain in vivo, histological sections remain the
    best way to study the anatomy of the brain at the level of single neurons. The
    histological atlas has changed little since 1909, and localizing brain regions
    is still a labor-intensive process performed only by experienced
    neuroanatomists. Existing digital atlases such as the Allen Brain Atlas are
    limited to low-resolution images in which the detailed structure of neurons
    cannot be identified.
    We have developed a digital atlas methodology that combines information about
    the 3D organization of the brain and the detailed texture of neurons in
    different structures. Using the methodology we developed an atlas for the mouse
    brainstem and mid-brain, two regions for which there are currently no good
    atlases. Our atlas is “active” in that it can be used to automatically align a
    histological stack to the atlas, thus reducing the work of the neuroanatomist.

    Accurate, Scalable and Parallel Structure from Motion

    Siyu Zhu, Tianwei Shen, Lei Zhou, Runze Zhang, Tian Fang, Long Quan
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we tackle the accurate Structure from Motion (SfM) problem, in
    particular camera registration, far exceeding the memory of a single compute
    node. Different from the previous methods which drastically simplify the
    parameters of SfM, we preserve as many cameras, tracks and their corresponding
    connectivity as possible for a highly consistent and accurate SfM. By means of
    a camera clustering algorithm, we divide all the cameras and associated images
    into clusters and leverage such formulation to process the subsequent track
    generation, local incremental SfM and final bundle adjustment in a scalable and
    parallel scheme. Taking advantage of both incremental and global SfM
    methods, we apply the relative motions from local incremental SfM to the global
    motion averaging framework and obtain more accurate and robust global camera
    poses than state-of-the-art methods. We extensively demonstrate the
    superior performance of our method on benchmark, Internet and our own
    challenging city-scale datasets.

    Enabling Sparse Winograd Convolution by Native Pruning

    Sheng Li, Jongsoo Park, Ping Tak Peter Tang
    Comments: 10 pages, 2 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Sparse methods and the use of Winograd convolutions are two orthogonal
    approaches each of which significantly accelerates convolution computations in
    modern CNNs. Sparse Winograd merges these two and thus has the potential to
    offer a combined performance benefit. Nevertheless, training convolution layers
    so that the resulting Winograd kernels are sparse has not hitherto been very
    successful. We introduce here a novel construction of Sparse Winograd by
    introducing a substitute for the convolution layer that we call, naturally, the
    Winograd layer. Such a construction allows us to prune parameters natively in
    the Winograd domain. This paper presents the technical details related to the
    Winograd layer and our train-and-prune procedures. We achieved more than 90%
    sparsity in the Winograd parameters while maintaining the original accuracy of
    AlexNet on the ImageNet dataset. Our detailed performance model also projects a
    speedup over convolution by dense Winograd kernels in excess of twofold.
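
    For intuition: in Winograd convolution, a 3x3 spatial kernel g enters the
    computation as its transform U = G g G^T, so pruning g does not make U
    sparse. A Winograd layer instead treats the 4x4 U as the trainable
    parameters and prunes it directly. A numpy sketch with the standard
    F(2x2, 3x3) transform (the training and retraining procedure is omitted):

    import numpy as np

    # standard kernel-transform matrix for Winograd F(2x2, 3x3)
    G = np.array([[1.0, 0.0, 0.0],
                  [0.5, 0.5, 0.5],
                  [0.5, -0.5, 0.5],
                  [0.0, 0.0, 1.0]])

    g = np.random.randn(3, 3)  # a spatial 3x3 kernel
    U = G @ g @ G.T            # its 4x4 Winograd-domain counterpart

    # native pruning: zero small entries of U itself, so the sparsity
    # is exploitable by the Winograd element-wise multiplication
    mask = np.abs(U) >= np.quantile(np.abs(U), 0.9)
    U_sparse = U * mask
    print(f"Winograd-domain sparsity: {1 - mask.mean():.0%}")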

    DepthSynth: Real-Time Realistic Synthetic Data Generation from CAD Models for 2.5D Recognition

    Benjamin Planche, Ziyan Wu, Kai Ma, Shanhui Sun, Stefan Kluckner, Terrence Chen, Andreas Hutter, Sergey Zakharov, Harald Kosch, Jan Ernst
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recent progress in computer vision has been dominated by deep neural networks
    trained with large amounts of labeled data. Collecting and annotating such
    datasets is, however, a tedious and in some contexts impossible task; hence the
    recent surge in approaches that rely solely on synthetically generated data
    from 3D models for their training. For depth images, however, the discrepancies
    with real scans noticeably affect the performance of such methods. In this
    paper, we propose an innovative end-to-end framework which simulates the whole
    mechanism of these devices, synthetically generating realistic depth data from
    3D CAD models by comprehensively modeling vital factors such as sensor noise,
    material reflectance, surface geometry, etc. Besides covering a wider range of
    sensors than state-of-the-art methods, the proposed one also results in more
    realistic data. Going further than previous works, we not only qualitatively
    evaluate the generated scans, but also quantitatively measure through extensive
    experiments and comparisons how they impact the training of neural network
    algorithms for different 3D recognition tasks, demonstrating how our pipeline
    seamlessly integrates such architectures; and how it consistently and
    significantly enhances their performance, irrespective of the selected feature
    space or intermediate representations.

    Lensless computational imaging through deep learning

    Ayan Sinha, Justin Lee, Shuai Li, George Barbastathis
    Comments: 6 pages, 8 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep learning has been proven to yield reliably generalizable answers to
    numerous classification and decision tasks. Here, we demonstrate for the first
    time, to our knowledge, that deep neural networks (DNNs) can be trained to
    solve inverse problems in computational imaging. We experimentally demonstrate
    a lens-less imaging system where a DNN was trained to recover a phase object
    given a raw intensity image recorded some distance away.

    Learning Deep Visual Object Models From Noisy Web Data: How to Make it Work

    Nizar Massouh, Francesca Babiloni, Tatiana Tommasi, Jay Young, Nick Hawes, Barbara Caputo
    Comments: 8 pages, 7 figures, 3 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB); Learning (cs.LG); Robotics (cs.RO)

    Deep networks thrive when trained on large scale data collections. This has
    given ImageNet a central role in the development of deep architectures for
    visual object classification. However, ImageNet was created during a specific
    period in time, and as such it is prone to aging, as well as dataset bias
    issues. Moving beyond fixed training datasets will lead to more robust visual
    systems, especially when deployed on robots in new environments which must
    train on the objects they encounter there. To make this possible, it is
    important to break free from the need for manual annotators. Recent work has
    begun to investigate how to use the massive amount of images available on the
    Web in place of manual image annotations. We contribute to this research thread
    with two findings: (1) a study correlating a given level of label noise with
    the expected drop in accuracy, for two deep architectures and two different
    types of noise, which clearly identifies GoogLeNet as a suitable architecture
    for learning from Web data; (2) a recipe for the creation of Web datasets with
    minimal noise and maximum visual variability, based on a visual and natural
    language processing concept expansion strategy. By combining these two results,
    we obtain a method for learning powerful deep object models automatically from
    the Web. We confirm the effectiveness of our approach through object
    categorization experiments using our Web-derived version of ImageNet on a
    popular robot vision benchmark database, and on a lifelong object discovery
    task on a mobile robot.

    Understanding Convolution for Semantic Segmentation

    Panqu Wang, Pengfei Chen, Ye Yuan, Ding Liu, Zehua Huang, Xiaodi Hou, Garrison Cottrell
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recent advances in deep learning, especially deep convolutional neural
    networks (CNNs), have led to significant improvement over previous semantic
    segmentation systems. Here we show how to improve pixel-wise semantic
    segmentation by manipulating convolution-related operations that are better for
    practical use. First, we implement dense upsampling convolution (DUC) to
    generate pixel-level prediction, which is able to capture and decode more
    detailed information that is generally missing in bilinear upsampling. Second,
    we propose a hybrid dilated convolution (HDC) framework in the encoding phase.
    This framework 1) effectively enlarges the receptive fields of the network to
    aggregate global information; 2) alleviates what we call the “gridding issue”
    caused by the standard dilated convolution operation. We evaluate our
    approaches thoroughly on the Cityscapes dataset, and achieve a new
    state-of-the-art result of 80.1% mIoU on the test set. We are also
    state-of-the-art overall on the KITTI road estimation benchmark and the PASCAL
    VOC2012 segmentation task.
    Pretrained models are available at this https URL
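
    The gridding issue and its HDC remedy are easy to sketch: stacking 3x3
    convolutions with one constant dilation rate leaves periodic holes in the
    effective receptive field, while varying the rates lets consecutive layers
    cover each other's gaps. The rates below are illustrative, not necessarily
    those used in the paper:

    import torch
    import torch.nn as nn

    def dilated_stack(channels, rates):
        """Stack of 3x3 dilated convolutions with the given rates."""
        layers = []
        for r in rates:
            layers += [nn.Conv2d(channels, channels, 3,
                                 padding=r, dilation=r),
                       nn.ReLU()]
        return nn.Sequential(*layers)

    gridded = dilated_stack(32, [2, 2, 2])  # constant rate: gridding
    hdc = dilated_stack(32, [1, 2, 3])      # hybrid dilated convolution
    x = torch.randn(1, 32, 64, 64)
    print(hdc(x).shape)  # spatial size preserved by matching padding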

    Memory-Efficient Global Refinement of Decision-Tree Ensembles and its Application to Face Alignment

    Nenad Markuš, Ivan Gogić, Igor S. Pandžić, Jörgen Ahlberg
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Ren et al. recently introduced a method for aggregating multiple decision
    trees into a strong predictor by interpreting a path taken by a sample down
    each tree as a binary vector and performing linear regression on top of these
    vectors stacked together. They provided experimental evidence that the method
    offers advantages over the usual approaches for combining decision trees
    (random forests and boosting). The method truly shines when the regression
    target is a large vector with correlated dimensions, such as a 2D face shape
    represented with the positions of several facial landmarks. However, we argue
    that their basic method is not applicable in many practical scenarios due to
    large memory requirements. This paper shows how this issue can be solved
    through the use of quantization and architectural changes of the predictor that
    maps decision tree-derived encodings to the desired output.
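
    The base method of Ren et al. that this paper compresses is simple to
    reproduce with off-the-shelf tools: encode each sample by the leaf it
    reaches in every tree (a sparse binary "path" vector) and fit a linear model
    on top. A sketch on synthetic data (the paper's quantization and
    architectural changes are not shown):

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import Ridge
    from sklearn.preprocessing import OneHotEncoder

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 10))
    Y = X @ rng.normal(size=(10, 4))  # vector-valued target, e.g. landmarks

    forest = RandomForestRegressor(n_estimators=20, max_depth=5).fit(X, Y)
    leaves = forest.apply(X)          # leaf index per sample per tree
    encoder = OneHotEncoder(handle_unknown="ignore").fit(leaves)
    Z = encoder.transform(leaves)     # sparse binary path encoding
    refined = Ridge(alpha=1.0).fit(Z, Y)  # global linear refinement
    print(refined.score(Z, Y))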

    Show, Attend and Interact: Perceivable Human-Robot Social Interaction through Neural Attention Q-Network

    Ahmed Hussain Qureshi, Yutaka Nakamura, Yuichiro Yoshikawa, Hiroshi Ishiguro
    Comments: 7 pages, 5 figures, accepted by IEEE-RAS ICRA’17
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    For safe, natural and effective human-robot social interaction, it is
    essential to develop a system that allows a robot to demonstrate perceivable,
    responsive behaviors to complex human behaviors. We introduce the Multimodal
    Deep Attention Recurrent Q-Network (MDARQN), with which the robot exhibits
    human-like social interaction skills after 14 days of interacting with people
    in an uncontrolled real-world setting. On each of the 14 days, the system
    gathered robot interaction experiences with people through a trial-and-error
    method and then trained the MDARQN on these experiences using an end-to-end
    reinforcement learning approach. The results of interaction-based learning
    indicate that the robot has learned to respond to complex human behaviors in a
    perceivable and socially acceptable manner.

    Image Analysis Using a Dual-Tree M-Band Wavelet Transform

    Caroline Chaux, Laurent Duval, Jean-Christophe Pesquet
    Journal-ref: IEEE Transactions on Image Processing, August 2006, Volume 15,
    Issue 8, p. 2397-2412
    Subjects: Data Analysis, Statistics and Probability (physics.data-an); Computer Vision and Pattern Recognition (cs.CV); Functional Analysis (math.FA)

    We propose a 2D generalization to the M-band case of the dual-tree
    decomposition structure (initially proposed by N. Kingsbury and further
    investigated by I. Selesnick) based on a Hilbert pair of wavelets. We
    particularly address (i) the construction of the dual basis and (ii) the
    resulting directional analysis. We also revisit the necessary pre-processing
    stage in the M-band case. While several reconstructions are possible because of
    the redundancy of the representation, we propose a new optimal signal
    reconstruction technique, which minimizes potential estimation errors. The
    effectiveness of the proposed M-band decomposition is demonstrated via
    denoising comparisons on several image types (natural, texture, seismics),
    with various M-band wavelets and thresholding strategies.
    Significant improvements in terms of both overall noise reduction and direction
    preservation are observed.


    Artificial Intelligence

    Bridging the Gap Between Value and Policy Based Reinforcement Learning

    Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    We formulate a new notion of softmax temporal consistency that generalizes
    the standard hard-max Bellman consistency usually considered in value based
    reinforcement learning (RL). In particular, we show how softmax consistent
    action values correspond to optimal policies that maximize entropy regularized
    expected reward. More importantly, we establish that softmax consistent action
    values and the optimal policy must satisfy a mutual compatibility property that
    holds across any state-action subsequence. Based on this observation, we
    develop a new RL algorithm, Path Consistency Learning (PCL), that minimizes the
    total inconsistency measured along multi-step subsequences extracted from
    both on- and off-policy traces. An experimental evaluation demonstrates that PCL
    significantly outperforms strong actor-critic and Q-learning baselines across
    several benchmark tasks.
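
    As a hedged paraphrase of the consistency property summarized above (the
    precise statement is in the paper), the entropy-regularized optimal value
    function V and policy \pi satisfy, along any sub-trajectory
    s_t, a_t, ..., s_{t+d},

        V(s_t) - \gamma^d V(s_{t+d})
            = \sum_{j=0}^{d-1} \gamma^j \left[ r(s_{t+j}, a_{t+j})
              - \tau \log \pi(a_{t+j} \mid s_{t+j}) \right],

    where \tau is the entropy-regularization temperature. PCL trains by
    minimizing the squared deviation from this identity over sub-trajectories
    drawn from both on- and off-policy traces.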

    Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning

    Jakob Foerster, Nantas Nardelli, Gregory Farquhar, Philip. H. S. Torr, Pushmeet Kohli, Shimon Whiteson
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Multiagent Systems (cs.MA)

    Many real-world problems, such as network packet routing and urban traffic
    control, are naturally modeled as multi-agent reinforcement learning (RL)
    problems. However, existing multi-agent RL methods typically scale poorly with
    the problem size. Therefore, a key challenge is to translate the success of
    deep learning on single-agent RL to the multi-agent setting. A key stumbling
    block is that independent Q-learning, the most popular multi-agent RL method,
    introduces nonstationarity that makes it incompatible with the experience
    replay memory on which deep RL relies. This paper proposes two methods that
    address this problem: 1) conditioning each agent’s value function on a
    fingerprint that disambiguates the age of the data sampled from the replay memory
    and 2) using a multi-agent variant of importance sampling to naturally decay
    obsolete data. Results on a challenging decentralised variant of StarCraft unit
    micromanagement confirm that these methods enable the successful combination of
    experience replay with multi-agent RL.
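
    Method (1) is straightforward to sketch: each stored transition carries a
    fingerprint of the other agents' non-stationary training state at collection
    time, for instance the iteration number and exploration rate, and the value
    network conditions on it. The buffer below is a minimal illustration (field
    choices are assumptions, not the paper's exact fingerprint):

    import random
    from collections import deque, namedtuple

    Transition = namedtuple(
        "Transition", "obs action reward next_obs fingerprint")

    class FingerprintedReplay:
        """Replay buffer whose entries record when they were collected,
        letting a Q-network disambiguate the age of sampled data."""
        def __init__(self, capacity=10_000):
            self.buffer = deque(maxlen=capacity)

        def push(self, obs, action, reward, next_obs, iteration, epsilon):
            fingerprint = (iteration, epsilon)
            self.buffer.append(
                Transition(obs, action, reward, next_obs, fingerprint))

        def sample(self, batch_size):
            return random.sample(list(self.buffer), batch_size)

    buf = FingerprintedReplay()
    for it in range(100):
        buf.push(obs=[0.0], action=1, reward=0.5, next_obs=[0.1],
                 iteration=it, epsilon=max(0.05, 1.0 - it / 100))
    batch = buf.sample(8)  # fingerprints become extra Q-network inputs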

    Optimal Categorical Attribute Transformation for Granularity Change in Relational Databases for Binary Decision Problems in Educational Data Mining

    Paulo J. L. Adeodato, Fábio C. Pereira, Rosalvo F. Oliveira Neto
    Comments: 5 pages, 2 figures, 2 tables
    Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB)

    This paper presents an approach for transforming data granularity in
    hierarchical databases for binary decision problems by applying regression to
    categorical attributes at the lower grain levels. Attributes from a lower
    hierarchy entity in the relational database have their information content
    optimized through regression on the categories histogram trained on a small
    exclusive labelled sample, instead of the usual mode category of the
    distribution. The paper validates the approach on a binary decision task for
    assessing the quality of secondary schools, focusing on how logistic regression
    transforms the students’ and teachers’ attributes into school attributes.
    Experiments were carried out on public datasets of Brazilian schools via
    10-fold cross-validation, comparing the ranking scores, also produced by
    logistic regression. The proposed approach achieved higher performance than the
    usual distribution-mode transformation and performance equal to the expert
    weighting approach, measured by the maximum Kolmogorov-Smirnov distance and the
    area under the ROC curve at the 0.01 significance level.
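
    A toy rendition of the transformation on synthetic data (the feature
    meanings, sample sizes and label-generating rule are all invented for
    illustration): the per-school histogram of a categorical student attribute
    is fed to a logistic regression trained on a small labelled sample, and the
    fitted probability, rather than the usual mode category, becomes the
    school-level attribute:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n_schools, n_categories = 200, 5
    # lower-grain data: per-school histogram of a categorical attribute
    hist = rng.dirichlet(np.ones(n_categories), n_schools)
    # toy binary school-quality label driven by the histogram
    quality = (hist @ np.array([0.1, 0.3, 0.5, 0.7, 0.9])
               + rng.normal(0, 0.05, n_schools)) > 0.5

    # a small exclusive labelled sample trains the regression
    lr = LogisticRegression().fit(hist[:50], quality[:50])
    # the fitted probability replaces the mode category as the
    # optimized school-level attribute
    school_attribute = lr.predict_proba(hist)[:, 1]
    print(school_attribute[:5])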

    Don't Fear the Reaper: Refuting Bostrom's Superintelligence Argument

    Sebastian Benthall
    Subjects: Artificial Intelligence (cs.AI)

    In recent years prominent intellectuals have raised ethical concerns about
    the consequences of artificial intelligence. One concern is that an autonomous
    agent might modify itself to become “superintelligent” and, in supremely
    effective pursuit of poorly specified goals, destroy all of humanity. This
    paper considers and rejects the possibility of this outcome. We argue that this
    scenario depends on an agent’s capacity to rapidly improve its ability to
    predict its environment through self-modification. Using a Bayesian model of a
    reasoning agent, we show that there are important limitations to how an agent
    may improve its predictive ability through self-modification alone. We conclude
    that concern about this artificial intelligence outcome is misplaced and better
    directed at policy questions around data access and storage.

    Monte Carlo Action Programming

    Lenz Belzner
    Subjects: Artificial Intelligence (cs.AI); Programming Languages (cs.PL)

    This paper proposes Monte Carlo Action Programming, a programming language
    framework for autonomous systems that act in large probabilistic state spaces
    with high branching factors. It comprises formal syntax and semantics of a
    nondeterministic action programming language. The language is interpreted
    stochastically via Monte Carlo Tree Search. Effectiveness of the approach is
    shown empirically.

    Proportional Representation in Vote Streams

    Palash Dey, Nimrod Talmon, Otniel van Handel
    Comments: Accepted as a full paper in AAMAS 2017
    Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Multiagent Systems (cs.MA)

    We consider elections where the voters come one at a time, in a streaming
    fashion, and devise space-efficient algorithms which identify an approximate
    winning committee with respect to common multiwinner proportional
    representation voting rules; specifically, we consider the Approval-based and
    the Borda-based variants of both the Chamberlin–Courant rule and the Monroe
    rule. We complement our algorithms with lower bounds. Somewhat surprisingly,
    our results imply that, using space which does not depend on the number of
    voters, it is possible to efficiently identify an approximate representative
    committee of fixed size over vote streams with a huge number of voters.

    Robust Budget Allocation via Continuous Submodular Functions

    Matthew Staib, Stefanie Jegelka
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Social and Information Networks (cs.SI); Optimization and Control (math.OC)

    The optimal allocation of resources for maximizing influence, spread of
    information or coverage, has gained attention in the past years, in particular
    in machine learning and data mining. But, in applications, the parameters of
    the problem are rarely known exactly, and using wrong parameters can lead to
    undesirable outcomes. We hence revisit a continuous version of the Budget
    Allocation or Bipartite Influence Maximization problem introduced by Alon et
    al. (2012) from a robust optimization perspective, where an adversary may
    choose the least favorable parameters within a confidence set. The resulting
    problem is a nonconvex-concave saddle point problem (or game). We show that
    this nonconvex problem can be solved exactly by leveraging connections to
    continuous submodular functions, and by solving a constrained submodular
    minimization problem. Although constrained submodular minimization is hard in
    general, here we establish conditions under which such a problem can be solved
    to arbitrary precision ε.

    Stacked Thompson Bandits

    Lenz Belzner, Thomas Gabor
    Comments: Accepted at SEsCPS @ ICSE 2017
    Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Systems and Control (cs.SY)

    We introduce Stacked Thompson Bandits (STB) for efficiently generating plans
    that are likely to satisfy a given bounded temporal logic requirement. STB uses
    a simulation for evaluation of plans, and takes a Bayesian approach to using
    the resulting information to guide its search. In particular, we show that
    stacking multiarmed bandits and using Thompson sampling to guide the action
    selection process for each bandit enables STB to generate plans that satisfy
    requirements with a high probability while only searching a fraction of the
    search space.
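
    A minimal sketch of the stacking idea on a toy problem: one Beta-Bernoulli
    bandit per plan step, Thompson sampling to pick the action at every step,
    and a stand-in simulator (here, a hypothetical requirement satisfied only by
    the all-twos plan) returning binary feedback:

    import numpy as np

    rng = np.random.default_rng(0)
    horizon, n_actions, n_iters = 3, 4, 2000
    alpha = np.ones((horizon, n_actions))  # Beta posterior per step/action
    beta = np.ones((horizon, n_actions))

    def simulate(plan):
        """Stand-in simulator: 1 iff the plan meets the requirement."""
        return int(all(a == 2 for a in plan))

    for _ in range(n_iters):
        theta = rng.beta(alpha, beta)   # Thompson sample per bandit
        plan = theta.argmax(axis=1)     # greedy w.r.t. the sample
        success = simulate(plan)
        alpha[np.arange(horizon), plan] += success
        beta[np.arange(horizon), plan] += 1 - success

    print("plan estimate:", (alpha / (alpha + beta)).argmax(axis=1))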

    Bayesian Verification under Model Uncertainty

    Lenz Belzner, Thomas Gabor
    Comments: Accepted at SEsCPS @ ICSE 2017
    Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

    Machine learning enables systems to build and update domain models based on
    runtime observations. In this paper, we study statistical model checking and
    runtime verification for systems with this ability. Two challenges arise: (1)
    Models built from limited runtime data yield uncertainty to be dealt with. (2)
    There is no definition of satisfaction w.r.t. uncertain hypotheses. We propose
    such a definition of subjective satisfaction based on recently introduced
    satisfaction functions. We also propose the BV algorithm as a Bayesian solution
    to runtime verification of subjective satisfaction under model uncertainty. BV
    provides user-definable stochastic bounds for type I and II errors. We discuss
    empirical results from an example application to illustrate our ideas.

    On architectural choices in deep learning: From network structure to gradient convergence and parameter estimation

    Vamsi K Ithapu, Sathya N Ravi, Vikas Singh
    Comments: 87 Pages; 14 figures; Under review
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)

    We study mechanisms to characterize how the asymptotic convergence of
    backpropagation in deep architectures, in general, is related to the network
    structure, and how it may be influenced by other design choices including
    activation type, denoising and dropout rate. We seek to analyze whether network
    architecture and input data statistics may guide the choices of learning
    parameters and vice versa. Given the broad applicability of deep architectures,
    this issue is interesting from both a theoretical and a practical standpoint.
    Using properties of general nonconvex objectives (with first-order
    information), we first build the association between structural, distributional
    and learnability aspects of the network vis-à-vis their interaction with
    parameter convergence rates. We identify a nice relationship between feature
    denoising and dropout, and construct families of networks that achieve the same
    level of convergence. We then derive a workflow that provides systematic
    guidance regarding the choice of network sizes and learning parameters, often
    mediated by input statistics. Our technical results are corroborated by an
    extensive set of evaluations, presented in this paper as well as independent
    empirical observations reported by other groups. We also perform experiments
    showing the practical implications of our framework for choosing the best
    fully-connected design for a given problem.

    Learning What Data to Learn

    Yang Fan, Fei Tian, Tao Qin, Jiang Bian, Tie-Yan Liu
    Comments: A preliminary version will appear in ICLR 2017, workshop track. this https URL&noteId=SyJNmVqgg
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Machine learning is essentially the science of playing with data. An
    adaptive data selection strategy, which dynamically chooses different data
    at various training stages, can reach a more effective model in a more
    efficient way. In this paper, we propose a deep reinforcement learning
    framework, which we call Neural Data Filter (NDF), to explore automatic and
    adaptive data selection in the training process. In particular, NDF takes
    advantage of a deep neural network to adaptively select and filter important
    data instances from a sequential stream of training data, such that the future
    accumulative reward (e.g., the convergence speed) is maximized. In contrast to
    previous studies on data selection, which are mainly based on heuristic
    strategies, NDF is quite generic and thus widely suitable for many machine
    learning tasks. Taking neural
    network training with stochastic gradient descent (SGD) as an example,
    comprehensive experiments with respect to various neural network modeling
    (e.g., multi-layer perceptron networks, convolutional neural networks and
    recurrent neural networks) and several applications (e.g., image classification
    and text understanding) demonstrate that NDF-powered SGD can achieve accuracy
    comparable to that of the standard SGD process while using less data and fewer
    iterations.
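
    A toy sketch of the filtering mechanism only (not a trained NDF policy): a
    stochastic policy scores each instance's state features and keeps it with
    the corresponding probability; NDF learns such a policy with reinforcement
    learning, rewarding e.g. convergence speed. The feature choices and weights
    here are placeholders:

    import numpy as np

    rng = np.random.default_rng(0)

    def data_filter(features, w, b=0.0):
        """Keep each instance with probability sigmoid(w . features + b)."""
        p_keep = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        keep = rng.random(len(p_keep)) < p_keep
        return keep, p_keep

    # per-instance state features, e.g. current loss and prediction margin
    feats = rng.normal(size=(32, 2))
    keep, p = data_filter(feats, w=np.array([1.0, -0.5]))
    print(f"kept {keep.sum()} of {len(keep)} instances this minibatch")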

    Analysis of Agent Expertise in Ms. Pac-Man using Value-of-Information-based Policies

    Isaac J. Sledge, Jose C. Principe
    Comments: Submitted to the IEEE Transactions on Computational Intelligence and Artificial Intelligence in Games
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT)

    Conventional reinforcement learning methods for Markov decision processes
    rely on weakly-guided, stochastic searches to drive the learning process. It
    can therefore be difficult to predict what agent behaviors might emerge. In
    this paper, we consider an information-theoretic approach for performing
    constrained stochastic searches that promote the formation of risk-averse to
    risk-favoring behaviors. Our approach is based on the value of information, a
    criterion that provides an optimal trade-off between the expected return of a
    policy and the policy’s complexity. As the policy complexity is reduced, there
    is a high chance that the agents will eschew risky actions that increase the
    long-term rewards. The agents instead focus on simply completing their main
    objective in an expeditious fashion. As the policy complexity increases, the
    agents will take actions, regardless of the risk, that seek to decrease the
    long-term costs. A minimal-cost policy is sought in either case; the obtainable
    cost depends on a single, tunable parameter that regulates the degree of policy
    complexity.

    We evaluate the performance of value-of-information-based policies on a
    stochastic version of Ms. Pac-Man. A major component of this paper is
    demonstrating that ranges of policy complexity values yield different game-play
    styles and analyzing why this occurs. We show that low-complexity policies aim
    to only clear the environment of pellets while avoiding invulnerable ghosts.
    Higher-complexity policies implement multi-modal strategies that compel the
    agent to seek power-ups and chase after vulnerable ghosts, both of which reduce
    the long-term costs.

    Show, Attend and Interact: Perceivable Human-Robot Social Interaction through Neural Attention Q-Network

    Ahmed Hussain Qureshi, Yutaka Nakamura, Yuichiro Yoshikawa, Hiroshi Ishiguro
    Comments: 7 pages, 5 figures, accepted by IEEE-RAS ICRA’17
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    For safe, natural and effective human-robot social interaction, it is
    essential to develop a system that allows a robot to demonstrate perceivable,
    responsive behaviors to complex human behaviors. We introduce the Multimodal
    Deep Attention Recurrent Q-Network (MDARQN), with which the robot exhibits
    human-like social interaction skills after 14 days of interacting with people
    in an uncontrolled real-world setting. On each of the 14 days, the system
    gathered robot interaction experiences with people through a trial-and-error
    method and then trained the MDARQN on these experiences using an end-to-end
    reinforcement learning approach. The results of interaction-based learning
    indicate that the robot has learned to respond to complex human behaviors in a
    perceivable and socially acceptable manner.

    A Roadmap for a Rigorous Science of Interpretability

    Finale Doshi-Velez, Been Kim
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    As machine learning systems become ubiquitous, there has been a surge of
    interest in interpretable machine learning: systems that provide explanation
    for their outputs. These explanations are often used to qualitatively assess
    other criteria such as safety or non-discrimination. However, despite the
    interest in interpretability, there is very little consensus on what
    interpretable machine learning is and how it should be measured. In this
    position paper, we first define interpretability and describe when
    interpretability is needed (and when it is not). Next, we suggest a taxonomy
    for rigorous evaluation and expose open questions towards a more rigorous
    science of interpretable machine learning.

    Optimal Experiment Design for Causal Discovery from Fixed Number of Experiments

    AmirEmad Ghassami, Saber Salehkaleybar, Negar Kiyavash
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    We study the problem of causal structure learning over a set of random
    variables when the experimenter is allowed to perform at most (M) experiments
    in a non-adaptive manner. We consider the optimal learning strategy in terms of
    minimizing the portions of the structure that remain unknown given the limited
    number of experiments, in both the Bayesian and minimax settings. We characterize the
    theoretical optimal solution and propose an algorithm, which designs the
    experiments efficiently in terms of time complexity. We show that for bounded
    degree graphs, in the minimax case and in the Bayesian case with uniform
    priors, our proposed algorithm is a (rho)-approximation algorithm, where
    (rho) is independent of the order of the underlying graph. Simulations on both
    synthetic and real data show that the performance of our algorithm is very
    close to the optimal solution.

    Boosted Generative Models

    Aditya Grover, Stefano Ermon
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    We propose a new approach for using unsupervised boosting to create an
    ensemble of generative models, where models are trained in sequence to correct
    earlier mistakes. Our meta-algorithmic framework can leverage any existing base
    learner that permits likelihood evaluation, including recent latent variable
    models. Further, our approach allows the ensemble to include discriminative
    models trained to distinguish real data from model-generated data. We show
    theoretical conditions under which incorporating a new model in the ensemble
    will improve the fit and empirically demonstrate the effectiveness of boosting
    on density estimation and sample generation on synthetic and benchmark real
    datasets.


    Computation and Language

    Studying Positive Speech on Twitter

    Marina Sokolova, Vera Sazonova, Kanyi Huang, Rudraneel Chakraboty, Stan Matwin
    Comments: 13 pages, 6 tables
    Subjects: Computation and Language (cs.CL)

    We present results of empirical studies on positive speech on Twitter. By
    positive speech we understand speech that works for the betterment of a given
    situation, in this case relations between different communities in a
    conflict-prone country. We worked with four Twitter data sets. Through
    semi-manual opinion mining, we found that positive speech accounted for < 1% of
    the data. In fully automated studies, we tested two approaches: unsupervised
    statistical analysis, and supervised text classification based on distributed
    word representation. We discuss benefits and challenges of those approaches and
    report empirical evidence obtained in the study.

    Scaffolding Networks for Teaching and Learning to Comprehend

    Asli Celikyilmaz, Li Deng, Lihong Li, Chong Wang
    Comments: 8 pages + Abstract + 3 figures
    Subjects: Computation and Language (cs.CL)

    In scaffolding teaching, students are gradually asked questions to build
    background knowledge, clear up confusions, learn to be attentive, and improve
    comprehension. Inspired by this approach, we explore methods for teaching
    machines to learn to reason over text documents by asking questions about
    previously seen information. We address three key challenges in teaching and learning
    to reason: 1) the need for an effective architecture that learns from the
    information in text and keeps it in memory; 2) the difficulty of self-assessing
    what is learned at any given point and what is left to be learned; 3) the
    difficulty of teaching reasoning in a scalable way. To address the first
    challenge, we present the Scaffolding Network, an attention-based neural
    network agent that can reason over a dynamic memory. It learns a policy using
    reinforcement learning to incrementally register new information about concepts
    and their relations. For the second challenge, we describe a question simulator
    as part of the scaffolding network that learns to continuously question the
    agent about the information processed so far. Through questioning, the agent
    learns to correctly answer as many questions as possible. For the last
    challenge, we explore training with reduced annotated data. We evaluate on
    synthetic and real datasets, demonstrating that our model competes well with
    the state-of-the-art methods, especially when less supervision is used.

    Improving Machine Learning Ability with Fine-Tuning

    John P. Lalor, Hao Wu, Hong Yu
    Comments: 10 pages, 2 tables, 2 figures
    Subjects: Computation and Language (cs.CL)

    Item Response Theory (IRT) allows for measuring the ability of Machine Learning
    models as compared to a human population. However, it is difficult to create a
    large dataset to train the ability of deep neural network models (DNNs). We
    propose fine-tuning as a new training process, where a model pre-trained on a
    large dataset is fine-tuned with a small supplemental training set. Our results
    show that fine-tuning can improve the ability of a state-of-the-art DNN model
    for Recognizing Textual Entailment tasks.
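
    Below is a minimal sketch of the fine-tuning recipe described above, assuming
    a PyTorch-style classifier; the function and parameter names are
    illustrative, not the authors' implementation.

        import torch

        def fine_tune(model, supplemental_loader, lr=1e-5, epochs=3):
            # Start from a model pre-trained on a large dataset and continue
            # training on a small supplemental set with a low learning rate,
            # so the pre-trained weights are only gently adjusted.
            optimizer = torch.optim.Adam(model.parameters(), lr=lr)
            loss_fn = torch.nn.CrossEntropyLoss()
            model.train()
            for _ in range(epochs):
                for inputs, labels in supplemental_loader:
                    optimizer.zero_grad()
                    loss = loss_fn(model(inputs), labels)
                    loss.backward()
                    optimizer.step()
            return model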

    Approches d'analyse distributionnelle pour améliorer la désambiguïsation sémantique

    Mokhtar Billami (LIF), Núria Gala (LIF)
    Comments: in French, Journées internationales d’Analyse statistique des Données Textuelles (JADT), Jun 2016, Nice, France
    Subjects: Computation and Language (cs.CL)

    Word sense disambiguation (WSD) improves many Natural Language Processing
    (NLP) applications such as Information Retrieval, Machine Translation or
    Lexical Simplification. WSD is the ability of determining a word sense among
    different ones within a polysemic lexical unit taking into account the context.
    The most straightforward approach uses a semantic proximity measure between the
    word sense candidates of the target word and those of its context. Such a
    method very easily entails a combinatorial explosion. In this paper, we propose
    two methods based on distributional analysis that make it possible to reduce
    the exponential complexity without losing coherence. We present a comparison
    between selecting distributional neighbors and selecting the linearly nearest
    neighbors. The results obtained show that selecting distributional neighbors
    leads to better results.

    A Knowledge-Based Approach to Word Sense Disambiguation by distributional selection and semantic features

    Mokhtar Billami (LIF)
    Comments: in French
    Journal-ref: 22ème Conférence sur le Traitement Automatique des Langues
    Naturelles et 17ème Rencontre des Étudiants Chercheurs en Informatique pour
    le Traitement Automatique des Langues, Jun 2015, Caen, France, pp. 13-24, 2015
    Subjects: Computation and Language (cs.CL)

    Word sense disambiguation improves many Natural Language Processing (NLP)
    applications such as Information Retrieval, Information Extraction, Machine
    Translation, or Lexical Simplification. Roughly speaking, the aim is to choose
    for each word in a text its best sense. One of the most popular methods
    estimates the local semantic relatedness between two word senses and
    then extends it to all words in the text. The most direct method computes a
    rough score for every pair of word senses and chooses the lexical chain with
    the best score (one can imagine the exponential complexity that such an
    exhaustive approach entails). In this paper, we propose to use a combinatorial
    optimization metaheuristic for choosing among the nearest neighbors obtained
    by distributional selection around the word to disambiguate. We test and
    evaluate our method on a French corpus using the semantic network BabelNet.
    The obtained accuracy is 78% on all nouns and verbs chosen for the
    evaluation.


    Distributed, Parallel, and Cluster Computing

    Privacy-enhancing Aggregation of Internet of Things Data via Sensors Grouping

    Stefano Bennati, Evangelos Pournaras
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Big data collection practices using Internet of Things (IoT) pervasive
    technologies are often privacy-intrusive and result in surveillance, profiling,
    and discriminatory actions over citizens. On the other hand, real-time data
    analytics and aggregate information open up tremendous opportunities for
    managing and regulating infrastructures of smart grids and smart cities in a
    more efficient and sustainable way. The privacy-enhancing aggregation of
    distributed sensor data is the research focus and challenge tackled in this
    paper. A baseline scenario is considered in which IoT sensor data are sent
    directly to an untrustworthy central aggregator. Users are enabled to choose
    their privacy level by reducing the quality of the shared data. There is a
    trade-off between the quality of data and the accuracy level of data analytics.
    A grouping mechanism is introduced that improves privacy over the baseline
    scenario by sharing aggregated group data with the aggregator. The group-level
    aggregation step obfuscates the individual sensor data, in a fashion similar
    to differential privacy techniques and homomorphic encryption schemes, so that
    inference of privacy-sensitive information from single sensors becomes
    computationally harder compared to the baseline scenario. The mechanism
    improves individual privacy over the baseline, without an impact on accuracy.
    Furthermore, if groups are large enough, the mechanism improves the privacy
    independently of individual choices. Inter-group effects such as the influence
    of individual choices on the privacy of other group members are investigated.
    Finally, several grouping strategies are evaluated and compared, and the
    implications for the design of an incentive mechanism are discussed.
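
    A minimal sketch of the group-level aggregation step, assuming numeric
    sensor readings; the grouping itself is given and the names are illustrative.

        from typing import Dict, List

        def group_aggregate(readings: Dict[str, float],
                            groups: List[List[str]]) -> List[float]:
            # Each group sums its members' readings locally and forwards only
            # the group total, so the untrusted central aggregator never sees
            # an individual sensor's value.
            return [sum(readings[sensor_id] for sensor_id in group)
                    for group in groups]

        # The aggregator can still recover the exact global statistic:
        # global_total = sum(group_aggregate(readings, groups))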


    Learning

    Lipschitz Optimisation for Lipschitz Interpolation

    Jan-Peter Calliess
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Techniques known as Nonlinear Set Membership prediction, Kinky Inference or
    Lipschitz Interpolation are fast and numerically robust approaches to
    nonparametric machine learning that have been proposed to be utilised in the
    context of system identification and learning-based control. They utilise
    presupposed Lipschitz properties in order to compute inferences over unobserved
    function values. Unfortunately, most of these approaches rely on exact
    knowledge about the input space metric as well as about the Lipschitz constant.
    Furthermore, existing techniques to estimate the Lipschitz constants from the
    data are not robust to noise or seem to be ad-hoc and typically are decoupled
    from the ultimate learning and prediction task. To overcome these limitations,
    we propose an approach for optimising parameters of the presupposed metrics by
    minimising validation set prediction errors. To avoid poor performance due to
    local minima, we propose to utilise Lipschitz properties of the optimisation
    objective to ensure global optimisation success. The resulting approach is a
    new flexible method for nonparametric black-box learning. We provide
    experimental evidence of the competitiveness of our approach on artificial as
    well as on real data.

    Low-rank Label Propagation for Semi-supervised Learning with 100 Millions Samples

    Raphael Petegrosso, Wei Zhang, Zhuliu Li, Yousef Saad, Rui Kuang
    Subjects: Learning (cs.LG)

    The success of semi-supervised learning crucially relies on the scalability
    to a huge amount of unlabelled data that are needed to capture the underlying
    manifold structure for better classification. Since computing the pairwise
    similarity between training data points is prohibitively expensive for most
    kinds of input data, there is currently no general, ready-to-use
    semi-supervised learning method or tool available for learning with tens of
    millions or more data points. In this paper, we adopted the ideas of two
    low-rank label propagation algorithms, GLNP (Global Linear Neighborhood
    Propagation) and the Kernel Nyström Approximation, and implemented a
    parallelized version of the two algorithms accelerated with Nesterov’s
    accelerated projected gradient descent for Big-data Label Propagation (BigLP).

    The parallel algorithms are tested on five real datasets ranging from 7000 to
    10,000,000 in size and a simulation dataset of 100,000,000 samples. In the
    experiments, the implementation can scale up to datasets with 100,000,000
    samples and hundreds of features and the algorithms also significantly improved
    the prediction accuracy when only a very small percentage of the data is
    labeled. The results demonstrate that the BigLP implementation is highly
    scalable to big data and effective in utilizing the unlabeled data for
    semi-supervised learning.
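
    For orientation, the classic label propagation update that low-rank methods
    of this kind accelerate looks as follows; this is a dense sketch, not the
    authors' parallel low-rank implementation.

        import numpy as np

        def label_propagation(S, Y, alpha=0.99, n_iter=50):
            # S: (n, n) normalized similarity matrix; Y: (n, c) one-hot label
            # matrix with all-zero rows for unlabeled points. Iterate
            # F <- alpha * S @ F + (1 - alpha) * Y and read off predictions.
            F = Y.astype(float).copy()
            for _ in range(n_iter):
                F = alpha * (S @ F) + (1 - alpha) * Y
            return F.argmax(axis=1)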

    Deep Semi-Random Features for Nonlinear Function Approximation

    Kenji Kawaguchi, Bo Xie, Le Song
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    We propose semi-random features for nonlinear function approximation.
    Semi-random features are defined as the product of a random nonlinear switching
    unit and a linear adjustable unit. The flexibility of semi-random feature lies
    between the fully adjustable units in deep learning and the random features
    used in kernel methods. We show that semi-random features possess a collection
    of nice theoretical properties despite the non-convex nature of its learning
    problem. In experiments, we show that semi-random features can match the
    performance of neural networks by using slightly more units, and it outperforms
    random features by using significantly fewer units. Semi-random features
    provide an interesting data point in between kernel methods and neural networks
    to advance our understanding of the challenge of nonlinear function
    approximation, and it opens up new avenues to tackle the challenge further.
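
    A small sketch of a semi-random feature map as defined above: a frozen
    random switching unit multiplied by a trainable linear unit. The shapes and
    the step-function switch are illustrative.

        import numpy as np

        rng = np.random.default_rng(0)

        def semi_random_features(X, W_adjustable, R_random):
            # Each unit multiplies a random, untrained switch 1[r . x > 0]
            # with a trainable linear response w . x.
            switch = (X @ R_random > 0).astype(X.dtype)  # frozen at random
            linear = X @ W_adjustable                    # learned
            return switch * linear

        X = rng.normal(size=(32, 10))             # batch of 32 inputs
        R = rng.normal(size=(10, 64))             # random directions (fixed)
        W = 0.1 * rng.normal(size=(10, 64))       # adjustable weights
        features = semi_random_features(X, W, R)  # shape (32, 64)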

    Efficient Learning for Crowdsourced Regression

    Jungseul Ok, Sewoong Oh, Jinwoo Shin, Yunhun Jang, Yung Yi
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Crowdsourcing platforms have emerged as popular venues for purchasing human
    intelligence at low cost for large volumes of tasks. As many low-paid workers
    are prone to give noisy answers, one of the fundamental questions is how to
    identify more reliable workers and exploit this heterogeneity to infer the true
    answers. Despite significant research efforts for classification tasks with
    discrete answers, little attention has been paid to regression tasks where the
    answers take continuous values. We consider the task of recovering the position
    of target objects, and introduce a new probabilistic model capturing the
    heterogeneity of the workers. We propose the belief propagation (BP) algorithm
    for inferring the positions and prove that it achieves optimal mean squared
    error by comparing its performance to that of an oracle estimator. Our
    experimental results on synthetic datasets confirm our theoretical predictions.
    We further emulate a crowdsourcing system using PASCAL visual object classes
    datasets and show that de-noising the crowdsourced data using BP can
    significantly improve the performance for the downstream vision task.

    Deep Forest: Towards An Alternative to Deep Neural Networks

    Zhi-Hua Zhou, Ji Feng
    Comments: 7 pages, 5 figures
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    In this paper, we propose gcForest, a decision tree ensemble approach with
    performance highly competitive to deep neural networks. In contrast to deep
    neural networks which require great effort in hyper-parameter tuning, gcForest
    is much easier to train. In fact, even when gcForest is applied to different
    data from different domains, excellent performance can be achieved with almost
    the same hyper-parameter settings. The training process of gcForest is
    efficient and scalable. In our experiments its training time on a PC is
    comparable to that of deep neural networks running with GPU facilities, and the
    efficiency advantage may be more apparent because gcForest is naturally
    amenable to parallel implementation. Furthermore, in contrast to deep neural
    networks, which require large-scale training data, gcForest can work well even
    with only small-scale training data. Moreover, as a tree-based approach, gcForest
    should be easier for theoretical analysis than deep neural networks.

    Learning Deep Nearest Neighbor Representations Using Differentiable Boundary Trees

    Daniel Zoran, Balaji Lakshminarayanan, Charles Blundell
    Subjects: Learning (cs.LG)

    Nearest neighbor (kNN) methods have been gaining popularity in recent years
    in light of advances in hardware and efficiency of algorithms. There is a
    plethora of methods to choose from today, each with their own advantages and
    disadvantages. One requirement shared between all kNN based methods is the need
    for a good representation and distance measure between samples.

    We introduce a new method called differentiable boundary tree which allows
    for learning deep kNN representations. We build on the recently proposed
    boundary tree algorithm which allows for efficient nearest neighbor
    classification, regression and retrieval. By modelling traversals in the tree
    as stochastic events, we are able to form a differentiable cost function which
    is associated with the tree’s predictions. Using a deep neural network to
    transform the data and back-propagating through the tree allows us to learn
    good representations for kNN methods.

    We demonstrate that our method is able to learn suitable representations
    allowing for very efficient trees with a clearly interpretable structure.

    Robust Budget Allocation via Continuous Submodular Functions

    Matthew Staib, Stefanie Jegelka
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Social and Information Networks (cs.SI); Optimization and Control (math.OC)

    The optimal allocation of resources for maximizing influence, spread of
    information or coverage, has gained attention in recent years, in particular
    in machine learning and data mining. In applications, however, the parameters
    of the problem are rarely known exactly, and using wrong parameters can lead to
    undesirable outcomes. We hence revisit a continuous version of the Budget
    Allocation or Bipartite Influence Maximization problem introduced by Alon et
    al. (2012) from a robust optimization perspective, where an adversary may
    choose the least favorable parameters within a confidence set. The resulting
    problem is a nonconvex-concave saddle point problem (or game). We show that
    this nonconvex problem can be solved exactly by leveraging connections to
    continuous submodular functions, and by solving a constrained submodular
    minimization problem. Although constrained submodular minimization is hard in
    general, here, we establish conditions under which such a problem can be solved
    to arbitrary precision (epsilon).

    Learning rates for classification with Gaussian kernels

    Shao-Bo Lin, Jinshan Zeng, Xiangyu Chang
    Comments: This paper has been accepted by Neural Computation
    Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

    This paper aims at refined error analysis for binary classification using
    support vector machine (SVM) with Gaussian kernel and convex loss. Our first
    result shows that for some loss functions, such as the logistic loss and the
    exponential loss, SVM with Gaussian kernel can reach an almost optimal
    learning rate, provided the regression function is smooth. Our second result
    shows that, for a large number of loss functions, under some Tsybakov noise
    assumption, if the regression function is infinitely smooth, then SVM with
    Gaussian kernel can achieve the learning rate of order (m^{-1}), where (m) is
    the number of samples.

    On architectural choices in deep learning: From network structure to gradient convergence and parameter estimation

    Vamsi K Ithapu, Sathya N Ravi, Vikas Singh
    Comments: 87 Pages; 14 figures; Under review
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)

    We study mechanisms to characterize how the asymptotic convergence of
    backpropagation in deep architectures, in general, is related to the network
    structure, and how it may be influenced by other design choices including
    activation type, denoising and dropout rate. We seek to analyze whether network
    architecture and input data statistics may guide the choices of learning
    parameters and vice versa. Given the broad applicability of deep architectures,
    this issue is interesting from both a theoretical and a practical standpoint.
    Using properties of general nonconvex objectives (with first-order
    information), we first build the association between structural, distributional
    and learnability aspects of the network vis-à-vis their interaction with
    parameter convergence rates. We identify a nice relationship between feature
    denoising and dropout, and construct families of networks that achieve the same
    level of convergence. We then derive a workflow that provides systematic
    guidance regarding the choice of network sizes and learning parameters, often
    mediated by input statistics. Our technical results are corroborated by an
    extensive set of evaluations, presented in this paper as well as independent
    empirical observations reported by other groups. We also perform experiments
    showing the practical implications of our framework for choosing the best
    fully-connected design for a given problem.

    Towards Deeper Understanding of Variational Autoencoding Models

    Shengjia Zhao, Jiaming Song, Stefano Ermon
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We propose a new family of optimization criteria for variational
    auto-encoding models, generalizing the standard evidence lower bound. We
    provide conditions under which they recover the data distribution and learn
    latent features, and formally show that common issues such as blurry samples
    and uninformative latent features arise when these conditions are not met.
    Based on these new insights, we propose a new sequential VAE model that can
    generate sharp samples on the LSUN image dataset based on pixel-wise
    reconstruction loss, and propose an optimization criterion that encourages
    unsupervised learning of informative latent features.
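
    For reference, the standard evidence lower bound (ELBO) that these criteria
    generalize is

        \[
        \log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
        \;-\; \mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right),
        \]

    where \(q_\phi(z \mid x)\) is the inference network and \(p_\theta(x \mid z)\)
    the generative network.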

    Deep Clustering using Auto-Clustering Output Layer

    Ozsel Kilinc, Ismail Uysal
    Comments: Submitted to ICML 2017
    Subjects: Learning (cs.LG)

    In this paper, we propose a novel method to enrich the representation
    provided to the output layer of feedforward neural networks in the form of an
    auto-clustering output layer (ACOL) which enables the network to naturally
    create sub-clusters under the provided main class labels. In addition, a
    novel regularization term is introduced which allows ACOL to encourage the
    neural network to reveal its own explicit clustering objective. While the
    underlying process of finding the subclasses is completely unsupervised,
    semi-supervised learning is also possible based on the provided classification
    objective. The results show that ACOL can achieve a 99.2% clustering accuracy
    for the semi-supervised case when partial class labels are presented and a 96%
    accuracy for the unsupervised clustering case. These findings represent a
    paradigm shift especially when it comes to harnessing the power of deep
    networks for primary and secondary clustering applications in large datasets.
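
    A minimal sketch of the pooling idea behind an auto-clustering output layer,
    assuming k sub-cluster neurons per main class; the pooling choice and layer
    names are illustrative, not the authors' exact architecture.

        import torch

        class AutoClusteringOutput(torch.nn.Module):
            # Produce n_classes * k sub-cluster logits, then pool each class's
            # sub-cluster scores into a single class score, leaving the network
            # free to carve every class into k latent sub-clusters.
            def __init__(self, in_features, n_classes, k):
                super().__init__()
                self.n_classes, self.k = n_classes, k
                self.fc = torch.nn.Linear(in_features, n_classes * k)

            def forward(self, h):
                sub_logits = self.fc(h).view(-1, self.n_classes, self.k)
                return sub_logits.max(dim=2).values  # pooled per-class logits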

    Learning What Data to Learn

    Yang Fan, Fei Tian, Tao Qin, Jiang Bian, Tie-Yan Liu
    Comments: A preliminary version will appear in ICLR 2017, workshop track. this https URL&noteId=SyJNmVqgg
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Machine learning is essentially the science of playing with data. An
    adaptive data selection strategy, which dynamically chooses different data
    at various training stages, can reach a more effective model in a more
    efficient way. In this paper, we propose a deep reinforcement learning
    framework, which we call Neural Data Filter (NDF), to explore automatic and
    adaptive data selection in the
    training process. In particular, NDF takes advantage of a deep neural network
    to adaptively select and filter important data instances from a sequential
    stream of training data, such that the future accumulative reward (e.g., the
    convergence speed) is maximized. In contrast to previous studies in data
    selection, which are mainly based on heuristic strategies, NDF is quite generic
    and thus widely applicable to many machine learning tasks. Taking neural
    network training with stochastic gradient descent (SGD) as an example,
    comprehensive experiments with respect to various neural network modeling
    (e.g., multi-layer perceptron networks, convolutional neural networks and
    recurrent neural networks) and several applications (e.g., image classification
    and text understanding) demonstrate that NDF-powered SGD can achieve accuracy
    comparable to the standard SGD process while using less data and fewer iterations.
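
    A minimal sketch of the filtering loop NDF implies, where `policy_keep_prob`
    stands in for the learned data-selection network and is an assumption, not
    the authors' interface.

        import numpy as np

        def filtered_sgd_stream(batches, policy_keep_prob, update_model):
            # For each instance arriving in the training stream, the learned
            # policy decides whether to keep it for the SGD update; filtered
            # instances are simply discarded.
            for X, y in batches:
                keep = np.array([np.random.rand() < policy_keep_prob(x, yi)
                                 for x, yi in zip(X, y)])
                if keep.any():
                    update_model(X[keep], y[keep])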

    Analysis of Agent Expertise in Ms. Pac-Man using Value-of-Information-based Policies

    Isaac J. Sledge, Jose C. Principe
    Comments: Submitted to the IEEE Transactions on Computational Intelligence and Artificial Intelligence in Games
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT)

    Conventional reinforcement learning methods for Markov decision processes
    rely on weakly-guided, stochastic searches to drive the learning process. It
    can therefore be difficult to predict what agent behaviors might emerge. In
    this paper, we consider an information-theoretic approach for performing
    constrained stochastic searches that promote the formation of risk-averse to
    risk-favoring behaviors. Our approach is based on the value of information, a
    criterion that provides an optimal trade-off between the expected return of a
    policy and the policy’s complexity. As the policy complexity is reduced, there
    is a high chance that the agents will eschew risky actions that increase the
    long-term rewards. The agents instead focus on simply completing their main
    objective in an expeditious fashion. As the policy complexity increases, the
    agents will take actions, regardless of the risk, that seek to decrease the
    long-term costs. A minimal-cost policy is sought in either case; the obtainable
    cost depends on a single, tunable parameter that regulates the degree of policy
    complexity.

    We evaluate the performance of value-of-information-based policies on a
    stochastic version of Ms. Pac-Man. A major component of this paper is
    demonstrating that ranges of policy complexity values yield different game-play
    styles and analyzing why this occurs. We show that low-complexity policies aim
    to only clear the environment of pellets while avoiding invulnerable ghosts.
    Higher-complexity policies implement multi-modal strategies that compel the
    agent to seek power-ups and chase after vulnerable ghosts, both of which reduce
    the long-term costs.
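
    Schematically, the trade-off described above weighs expected return against
    policy complexity measured as the mutual information between states and
    actions; one schematic rendering, which may differ in details from the
    paper's formulation, is

        \[
        \max_{\pi} \; \mathbb{E}_{\pi}\!\left[Q(s, a)\right] \;-\; \frac{1}{\beta}\, I(S; A),
        \]

    where larger values of the single tunable parameter \(\beta\) permit more
    complex, risk-taking policies.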

    Process Progress Estimation and Phase Detection

    Xinyu Li, Yanyi Zhang, Jianyu Zhang, Yueyang Chen, Shuhong Chen, Yue Gu, Moliang Zhou, Richard A. Farneth, Ivan Marsic, Randall S. Burd
    Comments: 16 pages, 11 figures
    Subjects: Learning (cs.LG); Human-Computer Interaction (cs.HC)

    Process modeling and understanding is fundamental for advanced human-computer
    interfaces and automation systems. Recent research focused on activity
    recognition, but little work has focused on process progress detection from
    sensor data. We introduce a real-time, sensor-based system for modeling,
    recognizing and estimating the completeness of a process. We implemented a
    multimodal CNN-LSTM structure to extract the spatio-temporal features from
    different sensory datatypes. We used a novel deep regression structure for
    overall completeness estimation. By combining process completeness estimation
    with a Gaussian mixture model, our system can predict the process phase using
    the estimated completeness. We also introduce the rectified hyperbolic tangent
    (rtanh) activation function and conditional loss to help the training process.
    Using the completeness estimation result and performance speed calculations, we
    also implemented an online estimator of remaining time. We tested this system
    using data obtained from a medical process (trauma resuscitation) and sport
    events (swim competitions). Our system outperformed existing implementations
    for phase prediction during trauma resuscitation, achieving over 80% process
    phase detection accuracy with less than 9% completeness estimation error and
    a remaining-time estimation error of less than 18% of the duration on both datasets.
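
    The rectified hyperbolic tangent is not defined in the abstract; a natural
    reading, by analogy with ReLU, is the sketch below, though the paper's exact
    definition may differ.

        import numpy as np

        def rtanh(x):
            # Rectified hyperbolic tangent (assumed form): clamp negative tanh
            # activations to zero, bounding outputs to [0, 1), which suits
            # completeness values that grow from 0 toward 1.
            return np.maximum(0.0, np.tanh(x))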

    Depth Creates No Bad Local Minima

    Haihao Lu, Kenji Kawaguchi
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC); Machine Learning (stat.ML)

    In deep learning, depth, as well as nonlinearity, creates non-convex loss
    surfaces. Then, does depth alone create bad local minima? In
    this paper, we prove that without nonlinearity, depth alone does not create bad
    local minima, although it induces a non-convex loss surface. Using this insight,
    we greatly simplify a recently proposed proof to show that all of the local
    minima of feedforward deep linear neural networks are global minima. Our
    theoretical result generalizes previous results with fewer assumptions.
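
    The objects of study are deep linear networks, whose squared loss

        \[
        L(W_1, \dots, W_H) \;=\; \tfrac{1}{2}\,\big\| W_H W_{H-1} \cdots W_1 X - Y \big\|_F^2
        \]

    is non-convex in the weight matrices \(W_1, \dots, W_H\) whenever \(H \ge 2\),
    even though the end-to-end map is linear in the input \(X\).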

    Learning Latent Networks in Vector Auto Regressive Models

    Saber Salehkaleybar, Jalal Etesami, Negar Kiyavash
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We study the problem of learning the dependency graph between random
    processes in a vector auto regressive (VAR) model from samples when a subset of
    the variables are latent. We show that the dependencies among the observed
    processes can be identified successfully under some conditions on the VAR
    model. Moreover, we can recover the lengths of all directed paths between any
    two observed processes which pass through the latent part. By utilizing this
    information, we reconstruct the latent subgraph with the minimum number of
    nodes uniquely if its topology is a directed tree. Furthermore, we propose an
    algorithm that finds all possible minimal latent networks if there exists at
    most one directed path of each length between any two observed nodes through
    the latent part. Experimental results on various synthetic and real-world
    datasets validate our theoretical results.

    Optimal Experiment Design for Causal Discovery from Fixed Number of Experiments

    AmirEmad Ghassami, Saber Salehkaleybar, Negar Kiyavash
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    We study the problem of causal structure learning over a set of random
    variables when the experimenter is allowed to perform at most (M) experiments
    in a non-adaptive manner. We consider the optimal learning strategy in terms of
    minimizing the portions of the structure that remain unknown given the limited
    number of experiments, in both the Bayesian and minimax settings. We characterize the
    theoretical optimal solution and propose an algorithm, which designs the
    experiments efficiently in terms of time complexity. We show that for bounded
    degree graphs, in the minimax case and in the Bayesian case with uniform
    priors, our proposed algorithm is a (rho)-approximation algorithm, where
    (rho) is independent of the order of the underlying graph. Simulations on both
    synthetic and real data show that the performance of our algorithm is very
    close to the optimal solution.

    Diameter-Based Active Learning

    Christopher Tosh, Sanjoy Dasgupta
    Comments: 18 pages, 2 figures
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    To date, the tightest upper and lower bounds for the active learning of
    general concept classes have been in terms of a parameter of the learning
    problem called the splitting index. We provide, for the first time, an
    efficient algorithm that is able to realize this upper bound, and we
    empirically demonstrate its good performance.

    Semi-parametric Network Structure Discovery Models

    Amir Dezfouli, Edwin V. Bonilla, Richard Nock
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We propose a network structure discovery model for continuous observations
    that generalizes linear causal models by incorporating a Gaussian process (GP)
    prior on a network-independent component, and random sparsity and weight
    matrices as the network-dependent parameters. This approach provides flexible
    modeling of network-independent trends in the observations as well as
    uncertainty quantification around the discovered network structure. We
    establish a connection between our model and multi-task GPs and develop an
    efficient stochastic variational inference algorithm for it. Furthermore, we
    formally show that our approach is numerically stable and in fact numerically
    easy to carry out almost everywhere on the support of the random variables
    involved. Finally, we evaluate our model on three applications, showing that it
    outperforms previous approaches. We provide a qualitative and quantitative
    analysis of the structures discovered for domains such as the study of the full
    genome regulation of the yeast Saccharomyces cerevisiae.

    SGD Learns the Conjugate Kernel Class of the Network

    Amit Daniely
    Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)

    We show that the standard stochastic gradient descent (SGD) algorithm is
    guaranteed to learn, in polynomial time, a function that is competitive with
    the best function in the conjugate kernel space, as defined in Daniely, Frostig
    and Singer (2016). The result holds for log-depth networks from a rich family
    of architectures. To the best of our knowledge, it is the first polynomial-time
    guarantee for the standard neural network learning algorithm for networks of
    depth (ge 3).

    Depth Separation for Neural Networks

    Amit Daniely
    Subjects: Learning (cs.LG); Computational Complexity (cs.CC); Machine Learning (stat.ML)

    Let \(f:\mathbb{S}^{d-1} \times \mathbb{S}^{d-1} \to \mathbb{R}\) be a function
    of the form \(f(\mathbf{x},\mathbf{x}') = g(\langle \mathbf{x},\mathbf{x}' \rangle)\)
    for \(g:[-1,1] \to \mathbb{R}\). We give a simple proof that shows that
    poly-size depth-two neural networks with (exponentially) bounded weights cannot
    approximate \(f\) whenever \(g\) cannot be approximated by a low-degree
    polynomial. Moreover, for many \(g\)'s, such as \(g(x) = \sin(\pi d^3 x)\), the
    number of neurons must be \(2^{\Omega(d \log d)}\). Furthermore, the result
    holds w.r.t. the uniform distribution on \(\mathbb{S}^{d-1} \times \mathbb{S}^{d-1}\).
    As many functions of the above form can be well approximated by poly-size
    depth-three networks with poly-bounded weights, this establishes a separation
    between depth-two and depth-three networks w.r.t. the uniform distribution on
    \(\mathbb{S}^{d-1} \times \mathbb{S}^{d-1}\).

    Boosted Generative Models

    Aditya Grover, Stefano Ermon
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    We propose a new approach for using unsupervised boosting to create an
    ensemble of generative models, where models are trained in sequence to correct
    earlier mistakes. Our meta-algorithmic framework can leverage any existing base
    learner that permits likelihood evaluation, including recent latent variable
    models. Further, our approach allows the ensemble to include discriminative
    models trained to distinguish real data from model-generated data. We show
    theoretical conditions under which incorporating a new model in the ensemble
    will improve the fit and empirically demonstrate the effectiveness of boosting
    on density estimation and sample generation on synthetic and benchmark real
    datasets.

    Deep and Hierarchical Implicit Models

    Dustin Tran, Rajesh Ranganath, David M. Blei
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Computation (stat.CO); Methodology (stat.ME)

    Implicit probabilistic models are a very flexible class for modeling data.
    They define a process to simulate observations, and unlike traditional models,
    they do not require a tractable likelihood function. In this paper, we develop
    two families of models: hierarchical implicit models and deep implicit models.
    They combine the idea of implicit densities with hierarchical Bayesian modeling
    and deep neural networks. The use of implicit models with Bayesian analysis has
    in general been limited by our ability to perform accurate and scalable
    inference. We develop a variational inference algorithm for implicit models.
    Key to our method is specifying a variational family that is also implicit.
    This matches the model’s flexibility and allows for accurate approximation of
    the posterior. Our method scales up implicit models to sizes previously not
    possible and opens the door to new modeling designs. We demonstrate diverse
    applications: a large-scale physical simulator for predator-prey populations in
    ecology; a Bayesian generative adversarial network for discrete data; and a
    deep implicit model for text generation.

    Bridging the Gap Between Value and Policy Based Reinforcement Learning

    Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    We formulate a new notion of softmax temporal consistency that generalizes
    the standard hard-max Bellman consistency usually considered in value based
    reinforcement learning (RL). In particular, we show how softmax consistent
    action values correspond to optimal policies that maximize entropy regularized
    expected reward. More importantly, we establish that softmax consistent action
    values and the optimal policy must satisfy a mutual compatibility property that
    holds across any state-action subsequence. Based on this observation, we
    develop a new RL algorithm, Path Consistency Learning (PCL), that minimizes the
    total inconsistency measured along multi-step subsequences extracted from
    both on- and off-policy traces. An experimental evaluation demonstrates that PCL
    significantly outperforms strong actor-critic and Q-learning baselines across
    several benchmark tasks.
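
    Up to notation, the one-step form of this consistency property relates the
    optimal value function and policy under entropy regularization as

        \[
        V^{*}(s) - \gamma V^{*}(s') \;=\; r(s, a) - \tau \log \pi^{*}(a \mid s)
        \quad \text{for every action } a,
        \]

    where \(\tau\) is the entropy-regularization weight; PCL minimizes the squared
    violation of the multi-step analogue along sampled sub-trajectories.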

    Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning

    Jakob Foerster, Nantas Nardelli, Gregory Farquhar, Philip. H. S. Torr, Pushmeet Kohli, Shimon Whiteson
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Multiagent Systems (cs.MA)

    Many real-world problems, such as network packet routing and urban traffic
    control, are naturally modeled as multi-agent reinforcement learning (RL)
    problems. However, existing multi-agent RL methods typically scale poorly in
    the problem size. Therefore, a key challenge is to translate the success of
    deep learning on single-agent RL to the multi-agent setting. A key stumbling
    block is that independent Q-learning, the most popular multi-agent RL method,
    introduces nonstationarity that makes it incompatible with the experience
    replay memory on which deep RL relies. This paper proposes two methods that
    address this problem: 1) conditioning each agent’s value function on a
    fingerprint that disambiguates the age of the data sampled from the replay memory
    and 2) using a multi-agent variant of importance sampling to naturally decay
    obsolete data. Results on a challenging decentralised variant of StarCraft unit
    micromanagement confirm that these methods enable the successful combination of
    experience replay with multi-agent RL.
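
    A minimal sketch of the first idea, assuming observations are flat lists of
    floats; the choice of fingerprint entries follows the description above but
    is otherwise illustrative.

        def fingerprinted_obs(obs, train_iteration, epsilon):
            # Append a low-dimensional fingerprint (training iteration and
            # exploration rate) to each observation before it is stored, so a
            # sampled transition reveals how old the other agents' policies
            # were when it was generated.
            return list(obs) + [float(train_iteration), float(epsilon)]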

    Central Moment Discrepancy (CMD) for Domain-Invariant Representation Learning

    Werner Zellinger, Thomas Grubinger, Edwin Lughofer, Thomas Natschläger, Susanne Saminger-Platz
    Comments: Published in ICLR 2017
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    The learning of domain-invariant representations in the context of domain
    adaptation with neural networks is considered. We propose a new regularization
    method that minimizes the discrepancy between the domain-specific latent feature
    representations directly in the hidden activation space. Although some standard distribution
    matching approaches exist that can be interpreted as the matching of weighted
    sums of moments, e.g. Maximum Mean Discrepancy (MMD), an explicit order-wise
    matching of higher order moments has not been considered before. We propose to
    match the higher order central moments of probability distributions by means of
    order-wise moment differences. Our model does not require computationally
    expensive distance and kernel matrix computations. We utilize the equivalent
    representation of probability distributions by moment sequences to define a new
    distance function, called Central Moment Discrepancy (CMD). We prove that CMD
    is a metric on the set of probability distributions on a compact interval. We
    further prove that convergence of probability distributions on compact
    intervals w.r.t. the new metric implies convergence in distribution of the
    respective random variables. We test our approach on two different benchmark
    data sets for object recognition (Office) and sentiment analysis of product
    reviews (Amazon reviews). CMD achieves a new state-of-the-art performance on
    most domain adaptation tasks of Office and outperforms networks trained with
    MMD, Variational Fair Autoencoders and Domain Adversarial Neural Networks on
    Amazon reviews. In addition, a post-hoc parameter sensitivity analysis shows
    that the new approach is stable w.r.t. parameter changes in a certain interval.
    The source code of the experiments is publicly available.
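
    A sketch of the empirical CMD estimate between two samples, assuming features
    bounded in [a, b]; this follows the order-wise moment matching described
    above, though the paper's exact estimator may differ in details.

        import numpy as np

        def cmd(X, Y, K=5, a=0.0, b=1.0):
            # Central Moment Discrepancy between samples X and Y (rows are
            # examples): match the means, then the central moments of each
            # order k = 2..K, with order-dependent normalization.
            mx, my = X.mean(axis=0), Y.mean(axis=0)
            d = np.linalg.norm(mx - my) / abs(b - a)
            for k in range(2, K + 1):
                ck_x = ((X - mx) ** k).mean(axis=0)
                ck_y = ((Y - my) ** k).mean(axis=0)
                d += np.linalg.norm(ck_x - ck_y) / abs(b - a) ** k
            return d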

    ShaResNet: reducing residual network parameter number by sharing weights

    Alexandre Boulch
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Deep Residual Networks have reached the state of the art in many image
    processing tasks such as image classification. However, the cost of a gain in
    accuracy in terms of depth and memory is prohibitive as it requires a higher
    number of residual blocks, up to double the initial value. To tackle this
    problem, we propose in this paper a way to reduce the redundant information of
    the networks. We share the weights of convolutional layers between residual
    blocks operating at the same spatial scale. The signal flows multiple times in
    the same convolutional layer. The resulting architecture, called ShaResNet,
    contains block specific layers and shared layers. These ShaResNet are trained
    exactly in the same fashion as the commonly used residual networks. We show, on
    the one hand, that they are almost as efficient as their sequential
    counterparts while involving less parameters, and on the other hand that they
    are more efficient than a residual network with the same number of parameters.
    For example, a 152-layer-deep residual network can be reduced to 106
    convolutional layers, i.e. a parameter reduction of 39%, while losing less than
    0.2% accuracy on ImageNet.

    Learning Discrete Representations via Information Maximizing Self Augmented Training

    Weihua Hu, Takeru Miyato, Seiya Tokui, Eiichi Matsumoto, Masashi Sugiyama
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Learning discrete representations of data is a central machine learning task
    because of the compactness of the representations and ease of interpretation.
    The task includes clustering and hash learning as special cases. Deep neural
    networks are promising to be used because they can model the non-linearity of
    data and scale to large datasets. However, their model complexity is huge, and
    therefore, we need to carefully regularize the networks in order to learn
    useful representations that exhibit intended invariance for applications of
    interest. To this end, we propose a method called Information Maximizing Self
    Augmented Training (IMSAT). In IMSAT, we use data augmentation to impose the
    invariance on discrete representations. More specifically, we encourage the
    predicted representations of augmented data points to be close to those of the
    original data points in an end-to-end fashion. At the same time, we maximize
    the information-theoretic dependency between data points and their mapped
    representations. Extensive experiments on benchmark datasets show that
    IMSAT produces state-of-the-art results for both clustering and unsupervised
    hash learning.

    Algorithmic stability and hypothesis complexity

    Tongliang Liu, Gábor Lugosi, Gergely Neu, Dacheng Tao
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We introduce a notion of algorithmic stability of learning algorithms—that
    we term hypothesis stability—that captures stability of the hypothesis output
    by the learning algorithm in the normed space of functions from which
    hypotheses are selected. The main result of the paper bounds the generalization
    error of any learning algorithm in terms of its hypothesis stability. The
    bounds are based on martingale inequalities in the Banach space to which the
    hypotheses belong. We apply the general bounds to bound the performance of some
    learning algorithms based on empirical risk minimization and stochastic
    gradient descent.

    Significant Pattern Mining on Continuous Variables

    Mahito Sugiyama, Karsten M. Borgwardt
    Comments: 19 pages, 4 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Methodology (stat.ME)

    Significant pattern mining, the search for sets of binary features that are
    statistically significantly enriched in a class of objects, is of fundamental
    importance in a wide range of applications from economics to statistical
    genetics. Still, all existing approaches make the restrictive assumption that
    the features are binary and require a binarization of continuous data during
    preprocessing, which often leads to a loss of information. Here, we solve the
    open problem of significant pattern mining on continuous variables. Our
    approach detects all patterns that are statistically significantly associated
    with a class of interest, while rigorously correcting for multiple testing. Key
    to this approach is the use of Spearman’s rank correlation coefficient to
    represent the frequency of a pattern. Our experiments demonstrate that our
    novel approach detects true patterns with higher precision and recall than
    competing methods that require a prior binarization of the data.

    Speeding Up Latent Variable Gaussian Graphical Model Estimation via Nonconvex Optimizations

    Pan Xu, Jian Ma, Quanquan Gu
    Comments: 29 pages, 5 figures, 3 tables
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We study the estimation of the latent variable Gaussian graphical model
    (LVGGM), where the precision matrix is the superposition of a sparse matrix and
    a low-rank matrix. In order to speed up the estimation of the sparse plus
    low-rank components, we propose a sparsity constrained maximum likelihood
    estimator based on matrix factorization, and an efficient alternating gradient
    descent algorithm with hard thresholding to solve it. Our algorithm is orders
    of magnitude faster than the convex relaxation based methods for LVGGM. In
    addition, we prove that our algorithm is guaranteed to linearly converge to the
    unknown sparse and low-rank components up to the optimal statistical precision.
    Experiments on both synthetic and genomic data demonstrate the superiority of
    our algorithm over the state-of-the-art algorithms and corroborate our theory.

    A Roadmap for a Rigorous Science of Interpretability

    Finale Doshi-Velez, Been Kim
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    As machine learning systems become ubiquitous, there has been a surge of
    interest in interpretable machine learning: systems that provide explanation
    for their outputs. These explanations are often used to qualitatively assess
    other criteria such as safety or non-discrimination. However, despite the
    interest in interpretability, there is very little consensus on what
    interpretable machine learning is and how it should be measured. In this
    position paper, we first define interpretability and describe when
    interpretability is needed (and when it is not). Next, we suggest a taxonomy
    for rigorous evaluation and expose open questions towards a more rigorous
    science of interpretable machine learning.

    The Shattered Gradients Problem: If resnets are the answer, then what is the question?

    David Balduzzi, Marcus Frean, Lennox Leary, JP Lewis, Kurt Wan-Duo Ma, Brian McWilliams
    Comments: 14 pages, 6 figures
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)

    A long-standing obstacle to progress in deep learning is the problem of
    vanishing and exploding gradients. The problem has largely been overcome
    through the introduction of carefully constructed initializations and batch
    normalization. Nevertheless, architectures incorporating skip-connections such
    as resnets perform much better than standard feedforward architectures despite
    well-chosen initialization and batch normalization. In this paper, we identify
    the shattered gradients problem. Specifically, we show that the correlation
    between gradients in standard feedforward networks decays exponentially with
    depth, resulting in gradients that resemble white noise. In contrast, the
    gradients in architectures with skip-connections are far more resistant to
    shattering, decaying only sublinearly. Detailed empirical evidence is presented
    in support of the analysis, on both fully-connected networks and convnets.
    Finally, we present a new “looks linear” (LL) initialization that prevents
    shattering. Preliminary experiments show the new initialization allows training
    very deep networks without the addition of skip-connections.
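
    A sketch of the mirrored-block idea behind a looks-linear initialization; the
    orthogonal basis and the assumption of an even number of output units are
    illustrative. Downstream layers must use a matching mirrored structure for
    the network to compute an exactly linear map at initialization.

        import numpy as np

        def looks_linear_init(n_out, n_in, rng=np.random.default_rng(0)):
            # Pair each unit with a mirrored twin whose weights are negated.
            # Since relu(z) - relu(-z) = z elementwise, a suitably mirrored
            # next layer recovers the linear map W0 @ x at initialization,
            # so gradients do not shatter early in training.
            assert n_out % 2 == 0, "needs an even number of units"
            q, _ = np.linalg.qr(rng.normal(size=(n_in, n_in)))
            W0 = q[: n_out // 2]              # half the units, orthonormal rows
            return np.concatenate([W0, -W0])  # mirrored block, shape (n_out, n_in)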

    Can Boltzmann Machines Discover Cluster Updates ?

    Lei Wang
    Comments: 4 pages, 4 figures, and half page appendix
    Subjects: Computational Physics (physics.comp-ph); Statistical Mechanics (cond-mat.stat-mech); Learning (cs.LG); Machine Learning (stat.ML)

    Boltzmann machines are physics informed generative models with wide
    applications in machine learning. They can learn the probability distribution
    from an input dataset and generate new samples accordingly. Applying them back
    to physics, the Boltzmann machines are ideal recommender systems to accelerate
    Monte Carlo simulation of physical systems due to their flexibility and
    effectiveness. More intriguingly, we show that the generative sampling of the
    Boltzmann Machines can even discover unknown cluster Monte Carlo algorithms.
    The creative power comes from the latent representation of the Boltzmann
    machines, which learn to mediate complex interactions and identify clusters of
    the physical system. We demonstrate these findings with concrete examples of
    the classical Ising model with and without four-spin plaquette interactions.
    Our results endorse a fresh research paradigm where intelligent machines are
    designed to create or inspire human discovery of innovative algorithms.

    eXpose: A Character-Level Convolutional Neural Network with Embeddings For Detecting Malicious URLs, File Paths and Registry Keys

    Joshua Saxe, Konstantin Berlin
    Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)

    For years security machine learning research has promised to obviate the need
    for signature based detection by automatically learning to detect indicators of
    attack. Unfortunately, this vision hasn’t come to fruition: in fact, developing
    and maintaining today’s security machine learning systems can require
    engineering resources that are comparable to those of signature-based detection
    systems, due in part to the need to develop and continuously tune the
    “features” these machine learning systems look at as attacks evolve. Deep
    learning, a subfield of machine learning, promises to change this by operating
    on raw input signals and automating the process of feature design and
    extraction. In this paper we propose the eXpose neural network, which uses a
    deep learning approach we have developed to take generic, raw short character
    strings as input (a common case for security inputs, which include artifacts
    like potentially malicious URLs, file paths, named pipes, named mutexes, and
    registry keys), and learns to simultaneously extract features and classify
    using character-level embeddings and a convolutional neural network. In addition
    to completely automating the feature design and extraction process, eXpose
    outperforms manual feature extraction based baselines on all of the intrusion
    detection problems we tested it on, yielding a 5%-10% detection rate gain at
    0.1% false positive rate compared to these baselines.
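
    A compact character-level CNN in the spirit of the description above; the
    layer sizes are illustrative, not the paper's hyper-parameters.

        import torch
        import torch.nn as nn

        class CharConvClassifier(nn.Module):
            # Embed raw character ids, convolve over the string, pool globally,
            # and emit a maliciousness score.
            def __init__(self, vocab_size=256, embed_dim=32,
                         n_filters=64, kernel_size=5):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, embed_dim)
                self.conv = nn.Conv1d(embed_dim, n_filters, kernel_size)
                self.fc = nn.Linear(n_filters, 1)

            def forward(self, char_ids):                  # (batch, length) int ids
                x = self.embed(char_ids).transpose(1, 2)  # (batch, embed, length)
                x = torch.relu(self.conv(x)).max(dim=2).values  # global max-pool
                return torch.sigmoid(self.fc(x)).squeeze(1)     # score in (0, 1)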

    Active Learning Using Uncertainty Information

    Yazhou Yang, Marco Loog
    Comments: 6 pages, 1 figure, International Conference on Pattern Recognition (ICPR) 2016, Cancun, Mexico
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Many active learning methods belong to the retraining-based approaches, which
    select one unlabeled instance, add it to the training set with its possible
    labels, retrain the classification model, and evaluate the criteria that we
    base our selection on. However, since the true label of the selected instance
    is unknown, these methods resort to calculating the average-case or worst-case
    performance with respect to the unknown label. In this paper, we propose a
    different method to solve this problem. In particular, our method aims to make
    use of the uncertainty information to enhance the performance of
    retraining-based models. We apply our method to two state-of-the-art algorithms
    and carry out extensive experiments on a wide variety of real-world datasets.
    The results clearly demonstrate the effectiveness of the proposed method and
    indicate it can reduce human labeling efforts in many real-life applications.

    Fast Threshold Tests for Detecting Discrimination

    Emma Pierson, Sam Corbett-Davies, Sharad Goel
    Comments: 10 pages, 6 figures, 2 tables
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Threshold tests have recently been proposed as a robust method for detecting
    bias in lending, hiring, and policing decisions. For example, in the case of
    credit extensions, these tests aim to estimate the bar for granting loans to
    white and minority applicants, with a higher inferred threshold for minorities
    indicative of discrimination. This technique, however, requires fitting a
    Bayesian latent variable model for which inference is often computationally
    challenging. Here we develop a method for fitting threshold tests that is more
    than 75 times faster than the existing approach, reducing computation from
    hours to minutes. We demonstrate this technique by analyzing 2.7 million police
    stops of pedestrians in New York City between 2008 and 2012. To achieve these
    performance gains, we introduce and analyze a flexible family of probability
    distributions on the interval [0, 1] — which we call discriminant
    distributions — that is computationally efficient to work with. These
    discriminant distributions may aid inference in a variety of applications
    beyond threshold tests.
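
    To make the threshold mechanism concrete, here is a toy sketch with a Beta
    stand-in for the risk distribution (the paper's discriminant distributions
    are a different, computationally cheaper family): lowering a group's
    threshold raises its search rate and lowers its hit rate, which is the
    pattern the test infers discrimination from.

        from scipy import stats

        def search_and_hit_rate(a, b, t):
            """Beta(a, b) risk scores and search threshold t: return the
            probability of being searched and the conditional hit rate."""
            search_rate = 1.0 - stats.beta.cdf(t, a, b)
            hit_rate = stats.beta.expect(lambda p: p, args=(a, b),
                                         lb=t, ub=1.0, conditional=True)
            return search_rate, hit_rate

        print(search_and_hit_rate(2, 5, 0.3))  # lower bar: more searches, lower hit rate
        print(search_and_hit_rate(2, 5, 0.5))  # higher bar: fewer searches, higher hit rate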

    Competing Bandits: Learning under Competition

    Yishay Mansour, Aleksandrs Slivkins, Zhiwei Steven Wu
    Subjects: Computer Science and Game Theory (cs.GT); Learning (cs.LG)

    Most modern systems strive to learn from interactions with users, and many
    engage in exploration: making potentially suboptimal choices for the
    sake of acquiring new information. We initiate a study of the interplay between
    exploration and competition — how such systems balance the exploration
    for learning and the competition for users. Here the users play three distinct
    roles: they are customers that generate revenue, they are sources of data for
    learning, and they are self-interested agents which choose among the competing
    systems.

    As a model, we consider competition between two multi-armed bandit algorithms
    faced with the same bandit instance. Users arrive one by one and choose among
    the two algorithms, so that each algorithm makes progress if and only if it is
    chosen. We ask whether and to what extent competition incentivizes
    innovation: the adoption of better algorithms. We investigate this issue for
    several models of user response, as we vary the degree of rationality and
    competitiveness in the model. Effectively, we map out the “competition vs.
    innovation” relationship, a well-studied theme in economics.
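
    A toy simulation of the setup as we read it (our sketch; the paper's
    user-response models are richer): each arriving user picks the algorithm
    with the better observed track record, and only the chosen algorithm learns.

        import random

        def bernoulli(mu):
            return 1.0 if random.random() < mu else 0.0

        def simulate(T=10_000, mus=(0.4, 0.6), explore=(0.0, 0.1)):
            """explore[i] is algorithm i's exploration rate; 0.0 makes it a
            purely greedy incumbent, 0.1 an epsilon-greedy innovator."""
            arm_stats = [[[0, 0.0] for _ in mus] for _ in explore]  # [pulls, reward]
            totals = [[0, 0.0] for _ in explore]                    # [users, reward]
            for _ in range(T):
                avg = [s / n if n else 0.5 for n, s in totals]
                i = 0 if avg[0] >= avg[1] else 1    # user chooses an algorithm
                if random.random() < explore[i]:
                    a = random.randrange(len(mus))  # explore
                else:
                    means = [s / n if n else 0.5 for n, s in arm_stats[i]]
                    a = means.index(max(means))     # exploit
                r = bernoulli(mus[a])
                arm_stats[i][a][0] += 1; arm_stats[i][a][1] += r
                totals[i][0] += 1; totals[i][1] += r
            return totals                           # users and revenue per algorithm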

    Learning Deep Visual Object Models From Noisy Web Data: How to Make it Work

    Nizar Massouh, Francesca Babiloni, Tatiana Tommasi, Jay Young, Nick Hawes, Barbara Caputo
    Comments: 8 pages, 7 figures, 3 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB); Learning (cs.LG); Robotics (cs.RO)

    Deep networks thrive when trained on large scale data collections. This has
    given ImageNet a central role in the development of deep architectures for
    visual object classification. However, ImageNet was created during a specific
    period in time, and as such it is prone to aging, as well as dataset bias
    issues. Moving beyond fixed training datasets will lead to more robust visual
    systems, especially when deployed on robots in new environments, where they
    must train on the objects they encounter. To make this possible, it is
    important to break free from the need for manual annotators. Recent work has
    begun to investigate how to use the massive amount of images available on the
    Web in place of manual image annotations. We contribute to this research thread
    with two findings: (1) a study correlating a given level of label noise with
    the expected drop in accuracy, for two deep architectures and two different
    types of noise, which clearly identifies GoogLeNet as a suitable architecture
    for learning from Web data; (2) a recipe for the creation of Web datasets with
    minimal noise and maximum visual variability, based on a visual and natural
    language processing concept expansion strategy. By combining these two results,
    we obtain a method for learning powerful deep object models automatically from
    the Web. We confirm the effectiveness of our approach through object
    categorization experiments using our Web-derived version of ImageNet on a
    popular robot vision benchmark database, and on a lifelong object discovery
    task on a mobile robot.


    Information Theory

    Jamming-Resistant Receivers for the Massive MIMO Uplink

    Tan Tai Do, Emil Björnson, Erik G. Larsson, S. Mohammad Razavizadeh
    Comments: submitted to IEEE Trans. Inf. Forensics and Security
    Subjects: Information Theory (cs.IT)

    We design a jamming-resistant receiver scheme to enhance the robustness of a
    massive MIMO uplink system against jamming. We assume that a jammer attacks the
    system both in the pilot and data transmission phases. The key feature of the
    proposed scheme is that, in the pilot phase, we estimate not only the
    legitimate channel, but also the jamming channel by exploiting a purposely
    unused pilot sequence. The jamming channel estimate is used to construct
    linear receive filters that reject the impact of the jamming signal. The
    performance of the proposed scheme is analytically evaluated using asymptotic
    properties of massive MIMO. The optimal regularized zero-forcing receiver and
    the optimal power allocation are also studied. Numerical results are provided
    to verify our analysis and show that the proposed scheme greatly improves the
    achievable rates, as compared to conventional receivers. Interestingly, the
    proposed scheme works particularly well under strong jamming attacks, since the
    improved estimate of the jamming channel outweighs the extra jamming power.
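
    A numpy sketch of the central mechanism as we read it (an
    orthogonal-projection receiver standing in for the paper's regularized
    zero-forcing design; names and shapes are our assumptions):

        import numpy as np

        def jamming_rejecting_filter(Y_pilot, pilots, user_idx, unused_idx):
            """Y_pilot: M x tau received pilot block; pilots: K x tau orthogonal
            pilot sequences, one of which (unused_idx) is deliberately unassigned."""
            H_hat = Y_pilot @ np.linalg.pinv(pilots)   # least-squares estimates
            h_hat = H_hat[:, user_idx]                 # legitimate channel
            g_hat = H_hat[:, unused_idx]               # jammer leaks onto the unused pilot
            # project onto the orthogonal complement of the jamming direction
            P = (np.eye(len(g_hat))
                 - np.outer(g_hat, g_hat.conj()) / (g_hat.conj() @ g_hat))
            return P @ h_hat                           # receive filter w; apply w^H y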

    NOMA Meets Finite Resolution Analog Beamforming in Massive MIMO and Millimeter-Wave Networks

    Zhiguo Ding, Linglong Dai, Robert Schober, H. Vincent Poor
    Subjects: Information Theory (cs.IT)

    Finite resolution analog beamforming (FRAB) has been recognized as an
    effective approach to reduce hardware costs in massive multiple-input
    multiple-output (MIMO) and millimeter-wave networks. However, the use of FRAB
    means that the beamformers are not perfectly aligned with the users’ channels
    and multiple users may be assigned similar or even identical beamformers.
    This letter shows how non-orthogonal multiple access (NOMA) can be used to
    exploit this feature of FRAB, where a single FRAB-based beamformer is shared by
    multiple users. Both analytical and simulation results are provided to
    demonstrate the excellent performance achieved by this new NOMA transmission
    scheme.
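
    As a toy illustration of why sharing one beam works (our sketch with
    assumed names, not the letter's analysis): NOMA superposes the two users'
    signals on the shared beam and relies on successive interference
    cancellation (SIC).

        import numpy as np

        def noma_rates_shared_beam(h1, h2, f, alpha, snr):
            """alpha is the power fraction for the stronger user (SIC order)."""
            g1 = abs(h1.conj() @ f) ** 2    # stronger user's effective gain
            g2 = abs(h2.conj() @ f) ** 2    # weaker user's effective gain
            # weaker user treats the stronger user's signal as noise
            r2 = np.log2(1 + (1 - alpha) * snr * g2 / (alpha * snr * g2 + 1))
            # stronger user cancels the weaker user's signal first (SIC)
            r1 = np.log2(1 + alpha * snr * g1)
            return r1, r2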

    Widely-Linear Precoding for Large-Scale MIMO with IQI: Algorithms and Performance Analysis

    Wence Zhang, Rodrigo C. de Lamare, Cunhua Pan, Ming Chen, Jianxin Dai, Bingyang Wu, Xu Bao
    Comments: Accepted in IEEE TWC
    Subjects: Information Theory (cs.IT)

    In this paper we study widely-linear precoding techniques to mitigate
    in-phase/quadrature-phase (IQ) imbalance (IQI) in the downlink of large-scale
    multiple-input multiple-output (MIMO) systems. We adopt a real-valued signal
    model which takes into account the IQI at the transmitter and then develop
    widely-linear zero-forcing (WL-ZF), widely-linear matched filter (WL-MF),
    widely-linear minimum mean-squared error (WL-MMSE) and widely-linear
    block-diagonalization (WL-BD) type precoding algorithms for both single- and
    multiple-antenna users. We also present a performance analysis of
    WL-ZF and WL-BD. It is proved that without IQI, WL-ZF has exactly the same
    multiplexing gain and power offset as ZF, while when IQI exists, WL-ZF achieves
    the same multiplexing gain as ZF with ideal IQ branches, but with a minor power
    loss which is related to the system scale and the IQ parameters. We also
    compare the performance of WL-BD with BD. The analysis shows that with ideal IQ
    branches, WL-BD has the same data rate as BD, while when IQI exists, WL-BD
    achieves the same multiplexing gain as BD without IQ imbalance. Numerical
    results verify the analysis and show that the proposed widely-linear type
    precoding methods significantly outperform their conventional counterparts with
    IQI and approach those with ideal IQ branches.
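
    A compact sketch of the widely-linear modeling step under our assumptions
    (the published algorithms additionally fold the IQI parameters into the
    model): stacking real and imaginary parts turns widely-linear precoding
    into ordinary linear precoding of a real-valued channel.

        import numpy as np

        def real_zf_precoder(H):
            """H: K x M complex downlink channel (M antennas, K users)."""
            A = np.block([[H.real, -H.imag],
                          [H.imag,  H.real]])    # 2K x 2M real-valued channel
            # right inverse, so that A @ P = I (zero-forcing)
            return A.T @ np.linalg.inv(A @ A.T)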

    Millimeter Wave Beam-Selection Using Out-of-Band Spatial Information

    Anum Ali, Nuria González-Prelcic, Robert W. Heath Jr
    Comments: 30 pages, 11 figures
    Subjects: Information Theory (cs.IT)

    Millimeter wave (mmWave) communication is one feasible solution for high
    data-rate applications like vehicle-to-everything communication and next
    generation cellular communication. Configuring mmWave links, which can be done
    through channel estimation or beam-selection, however, is a source of
    significant overhead. In this paper, we propose to use spatial information
    extracted at sub-6 GHz to help establish the mmWave link. First, we review the
    prior work on frequency dependent channel behavior and outline a simulation
    strategy to generate multi-band frequency dependent channels. Second, assuming:
    (i) narrowband channels and a fully digital architecture at sub-6 GHz; and (ii)
    wideband frequency selective channels, OFDM signaling, and an analog
    architecture at mmWave, we outline strategies to incorporate sub-6 GHz spatial
    information in mmWave compressed beam selection. We formulate compressed
    beam-selection as a weighted sparse signal recovery problem, and obtain the
    weighting information from sub-6 GHz channels. In addition, we outline a
    structured precoder/combiner design to tailor the training to out-of-band
    information. We also extend the proposed out-of-band aided compressed
    beam-selection approach to leverage information from all active OFDM
    subcarriers. The simulation results for achievable rate show that out-of-band
    aided beam-selection can reduce the training overhead of in-band-only
    beam-selection by 4x.
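
    As an illustration of the weighted sparse recovery formulation (a generic
    sketch of ours; the paper's precoder/combiner design is more structured),
    an out-of-band prior can bias a greedy matching pursuit toward likely beams:

        import numpy as np

        def weighted_omp(A, y, weights, k):
            """A: M x N beam dictionary; y: measurements; weights: per-beam
            prior from sub-6 GHz information; k: number of atoms to pick."""
            residual, support, x = y.astype(complex), [], None
            for _ in range(k):
                corr = np.abs(A.conj().T @ residual) * weights  # weighted match
                corr[support] = 0.0                             # no reselection
                support.append(int(np.argmax(corr)))
                x, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
                residual = y - A[:, support] @ x
            return support, x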

    Strong Chain Rules for Min-Entropy under Few Bits Spoiled

    Maciej Skorski
    Subjects: Information Theory (cs.IT)

    It is well established that the notion of min-entropy fails to satisfy the
    chain rule of the form $H(X,Y) = H(X|Y) + H(Y)$, known for Shannon
    entropy. Such a property would help to analyze how min-entropy is split among
    smaller blocks. Problems of this kind arise for example when constructing
    extractors and dispersers.

    We show that any sequence of variables exhibits a very strong block-source
    structure (conditional distributions of blocks are nearly flat) when we spoil
    a few correlated bits. This implies, conditioned on the spoiled bits, that
    splitting-recombination properties hold. In particular, we have many nice
    properties that min-entropy doesn’t obey in general, for example strong chain
    rules, “information can’t hurt” inequalities, equivalences of average- and
    worst-case conditional entropy definitions, and others. Quantitatively, for
    any sequence $X_1,\ldots,X_t$ of random variables over an alphabet
    $\mathcal{X}$ we prove that, when conditioned on $m = t \cdot O(\log\log|\mathcal{X}|
    + \log\log(1/\epsilon) + \log t)$ bits of auxiliary information, all
    conditional distributions of the form $X_i|X_{<i}$ are $\epsilon$-close to
    being nearly flat (only a constant factor away). The argument is
    combinatorial (based on simplex coverings).

    This result may be used as a generic tool for exhibiting block-source
    structures. We demonstrate this by reproving the fundamental converter due to
    Nisan and Zuckerman (J. Computer and System Sciences, 1996), which shows
    that sampling blocks from a min-entropy source roughly preserves the entropy
    rate. Our bound implies, only by straightforward chain rules, an additive
    loss of $o(1)$ (for sufficiently many samples), which qualitatively matches
    the first tighter analysis of this problem due to Vadhan (CRYPTO’03),
    obtained by large deviation techniques.
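
    Schematically, and purely as a restatement of the abstract's claim (not the
    paper's precise constants), conditioning on the spoiled bits $S$ buys back
    an approximate chain rule:

        % each block is nearly flat given its prefix and S, so min-entropy
        % splits additively, as Shannon entropy does:
        \[
          H_\infty(X_1,\ldots,X_t \mid S) \approx \sum_{i=1}^{t} H_\infty(X_i \mid X_{<i}, S),
          \quad\text{with } m = t \cdot O(\log\log|\mathcal{X}| + \log\log(1/\epsilon) + \log t).
        \]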

    Analysis of Agent Expertise in Ms. Pac-Man using Value-of-Information-based Policies

    Isaac J. Sledge, Jose C. Principe
    Comments: Submitted to the IEEE Transactions on Computational Intelligence and Artificial Intelligence in Games
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT)

    Conventional reinforcement learning methods for Markov decision processes
    rely on weakly-guided, stochastic searches to drive the learning process. It
    can therefore be difficult to predict what agent behaviors might emerge. In
    this paper, we consider an information-theoretic approach for performing
    constrained stochastic searches that promote the formation of risk-averse to
    risk-favoring behaviors. Our approach is based on the value of information, a
    criterion that provides an optimal trade-off between the expected return of a
    policy and the policy’s complexity. As the policy complexity is reduced, there
    is a high chance that the agents will eschew risky actions that increase the
    long-term rewards. The agents instead focus on simply completing their main
    objective in an expeditious fashion. As the policy complexity increases, the
    agents will take actions, regardless of the risk, that seek to decrease the
    long-term costs. A minimal-cost policy is sought in either case; the obtainable
    cost depends on a single, tunable parameter that regulates the degree of policy
    complexity.

    We evaluate the performance of value-of-information-based policies on a
    stochastic version of Ms. Pac-Man. A major component of this paper is
    demonstrating that ranges of policy complexity values yield different game-play
    styles and analyzing why this occurs. We show that low-complexity policies aim
    to only clear the environment of pellets while avoiding invulnerable ghosts.
    Higher-complexity policies implement multi-modal strategies that compel the
    agent to seek power-ups and chase after vulnerable ghosts, both of which reduce
    the long-term costs.
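
    One way to see the single tunable parameter (our schematic of a
    value-of-information style solution in the familiar softmax form; the
    paper's formulation is more general): the policy maximizes expected return
    subject to a complexity budget, measured as divergence from a fixed action
    prior.

        import numpy as np

        def voi_policy(Q, prior, beta):
            """Q: S x A action values; prior: length-A action marginal.
            beta -> 0 copies the prior (simple play); beta -> inf is greedy."""
            logits = np.log(prior)[None, :] + beta * Q
            logits -= logits.max(axis=1, keepdims=True)   # numerical stability
            p = np.exp(logits)
            return p / p.sum(axis=1, keepdims=True)       # one policy row per state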

    Nearly Maximally Predictive Features and Their Dimensions

    Sarah E. Marzen, James P. Crutchfield
    Comments: 6 pages, 2 figures; Supplementary materials, 5 pages, 1 figure; this http URL
    Subjects: Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT); Chaotic Dynamics (nlin.CD); Machine Learning (stat.ML)

    Scientific explanation often requires inferring maximally predictive features
    from a given data set. Unfortunately, the collection of minimal maximally
    predictive features for most stochastic processes is uncountably infinite. In
    such cases, one compromises and instead seeks nearly maximally predictive
    features. Here, we derive upper-bounds on the rates at which the number and the
    coding cost of nearly maximally predictive features scales with desired
    predictive power. The rates are determined by the fractal dimensions of a
    process’ mixed-state distribution. These results, in turn, show how widely-used
    finite-order Markov models can fail as predictors and that mixed-state
    predictive features offer a substantial improvement.

    Practical issues in decoy-state quantum key distribution based on the central limit theorem

    A.S. Trushechkin, E.O. Kiktenko, A.K. Fedorov
    Comments: 7 pages, 2 figures
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    Decoy-state quantum key distribution is a standard tool for long-distance
    quantum communications. An important issue in this field is processing the
    decoy-state statistics while taking into account statistical fluctuations (or
    “finite-key effects”). In this work, we propose and analyse an option for decoy
    statistics processing, which is based on the central limit theorem. We discuss
    such practical issues as including the failure probability of the decoy-state
    statistical estimates in the total failure probability of a QKD protocol, and
    accounting for the deviations of the binomially distributed random variables
    used in the estimations from the Gaussian distribution. The
    results of numerical simulations show that the obtained estimations are quite
    tight. The proposed technique can be used as a part of post-processing
    procedures for industrial quantum key distribution systems.
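
    The basic CLT estimate underlying such an analysis looks as follows (a
    textbook sketch; the paper additionally budgets these failure probabilities
    into the protocol's total and corrects for the binomial-to-Gaussian gap):

        from statistics import NormalDist

        def clt_interval(k, n, eps):
            """Two-sided CLT confidence interval, at failure probability eps,
            for a binomial rate observed as k counts out of n decoy pulses."""
            p = k / n
            z = NormalDist().inv_cdf(1 - eps / 2)    # Gaussian quantile
            half = z * (p * (1 - p) / n) ** 0.5
            return max(p - half, 0.0), min(p + half, 1.0)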

    Free Information Flow Benefits Truth Seeking

    Wei Su, Yongguang Yu
    Subjects: Optimization and Control (math.OC); Information Theory (cs.IT)

    How can we approach the truth in a society? It may depend on various factors.
    In this paper, using a well-established truth seeking model, we show that the
    persistent free information flow will bring us to the truth. Here the free
    information flow is modeled as the environmental random noise that could alter
    one’s cognition. Without the random noise, the model predicts that the truth
    can only be captured by the truth seekers, who possess an active ability to
    perceive the truth, and by their believers, while the other individuals may
    stick to falsehood. But under the influence of the random noise, we rigorously
    prove that even if there is only one truth seeker in the group, all
    individuals will finally approach the truth.
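
    A toy simulation in the spirit of the model (our sketch; the update rule
    and parameters are illustrative, not the paper's equations):

        import random

        def noisy_step(x, truth, seeker, eps=0.2, alpha=0.3, noise=0.01):
            """One bounded-confidence update with a single truth seeker and
            environmental noise (the 'free information flow') on everyone."""
            new = []
            for i, xi in enumerate(x):
                nbrs = [xj for xj in x if abs(xj - xi) <= eps]
                xi_new = sum(nbrs) / len(nbrs)           # average nearby opinions
                if i == seeker:
                    xi_new += alpha * (truth - xi_new)   # active perception
                xi_new += random.uniform(-noise, noise)  # free information flow
                new.append(min(max(xi_new, 0.0), 1.0))
            return new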

    Finite-Time Elimination of Disagreement of Opinion Dynamics via Covert Noise

    Wei Su, Ge Chen, Yongguang Yu
    Subjects: Optimization and Control (math.OC); Information Theory (cs.IT)

    Eliminating disagreement in a group is usually beneficial to social
    stability. In this paper, using the well-known Hegselmann-Krause (HK) model,
    we design a quite simple strategy that resolves the opinion differences of
    the system in finite time and drives the opinions to synchronize to a
    targeted value. Specifically, in a divisive group we intentionally introduce
    weak random noise that perturbs the opinion of only one agent during the
    evolution, together with a leader agent holding a fixed opinion, and then
    rigorously prove that the disagreement is finally eliminated and the opinions
    synchronize to the leader’s opinion in finite time. Moreover, we calculate
    the finite stopping time at which the opinions synchronously reach the
    objective value, which can provide further guidance for designing more
    efficient noise intervention strategies.
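
    A companion sketch of the intervention (again illustrative only; names and
    parameters are ours): the leader with a fixed opinion joins the averaging,
    and weak noise is injected into a single designated agent.

        import random

        def hk_step_with_leader(x, target, noisy_agent, eps=0.2, noise=0.01):
            """One HK-style update with a fixed-opinion leader at `target`
            and weak random noise on one designated agent."""
            everyone = x + [target]                      # leader participates
            new = []
            for i, xi in enumerate(x):
                nbrs = [xj for xj in everyone if abs(xj - xi) <= eps]
                xi_new = sum(nbrs) / len(nbrs)
                if i == noisy_agent:
                    xi_new += random.uniform(-noise, noise)
                new.append(min(max(xi_new, 0.0), 1.0))
            return new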



