
    arXiv Paper Daily: Thu, 2 Mar 2017

    Published by 我爱机器学习 (52ml.net) on 2017-03-02 00:00:00

    Neural and Evolutionary Computing

    Learning A Physical Long-term Predictor

    Sebastien Ehrhardt, Aron Monszpart, Niloy J. Mitra, Andrea Vedaldi
    Subjects: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    Evolution has resulted in highly developed abilities in many natural
    intelligences to quickly and accurately predict mechanical phenomena. Humans
    have successfully developed laws of physics to abstract and model such
    mechanical phenomena. In the context of artificial intelligence, a recent line
    of work has focused on estimating physical parameters based on sensory data and
    using them in physical simulators to make long-term predictions. In contrast, we
    investigate the effectiveness of a single neural network for end-to-end
    long-term prediction of mechanical phenomena. Based on extensive evaluation, we
    demonstrate that such networks can outperform alternative approaches that even
    have access to ground-truth physical simulators, especially when some physical
    parameters are unobserved or not known a priori. Further, our network outputs a
    distribution of outcomes to capture the inherent uncertainty in the data. Our
    approach demonstrates for the first time the possibility of making actionable
    long-term predictions from sensor data without requiring an explicit model of
    the underlying physical laws.

    Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling

    Hairong Liu, Zhenyao Zhu, Xiangang Li, Sanjeev Satheesh
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Most existing sequence labelling models rely on a fixed decomposition of a
    target sequence into a sequence of basic units. These methods suffer from two
    major drawbacks: 1) the set of basic units is fixed, such as the set of words,
    characters or phonemes in speech recognition, and 2) the decomposition of
    target sequences is fixed. These drawbacks usually result in sub-optimal
    performance of modeling sequences. In this paper, we extend the popular CTC
    loss criterion to alleviate these limitations, and propose a new loss function
    called Gram-CTC. While preserving the advantages of CTC, Gram-CTC automatically
    learns the best set of basic units (grams), as well as the most suitable
    decomposition of target sequences. Unlike CTC, Gram-CTC allows the model to
    output a variable number of characters at each time step, which enables the model
    to capture longer term dependency and improves the computational efficiency. We
    demonstrate that the proposed Gram-CTC improves CTC in terms of both
    performance and efficiency on the large vocabulary speech recognition task at
    multiple scales of data, and that with Gram-CTC we can outperform the
    state-of-the-art on a standard speech benchmark.
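
    The combinatorial ingredient Gram-CTC adds on top of CTC is that a target
    sequence may be decomposed into variable-length units (grams) from a learned
    set, with the loss marginalizing over all such decompositions. A minimal
    Python sketch of just that decomposition step, using a small hand-picked gram
    set as a stand-in for the learned one:

    # Hypothetical gram set; Gram-CTC learns which grams are useful.
    GRAMS = {"t", "h", "e", "th", "he", "the"}
    MAX_GRAM_LEN = max(len(g) for g in GRAMS)

    def decompositions(word):
        """Enumerate every way of splitting `word` into grams from GRAMS.
        Gram-CTC marginalizes over these decompositions inside its loss;
        this sketch only lists them."""
        if word == "":
            return [[]]
        results = []
        for k in range(1, min(MAX_GRAM_LEN, len(word)) + 1):
            prefix = word[:k]
            if prefix in GRAMS:
                for rest in decompositions(word[k:]):
                    results.append([prefix] + rest)
        return results

    for d in decompositions("the"):
        print(d)   # ['t', 'h', 'e'], ['t', 'he'], ['th', 'e'], ['the']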

    Borrowing Treasures from the Wealthy: Deep Transfer Learning through Selective Joint Fine-tuning

    Weifeng Ge, Yizhou Yu
    Comments: To appear in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017)
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Deep neural networks require a large amount of labeled training data during
    supervised learning. However, collecting and labeling so much data might be
    infeasible in many cases. In this paper, we introduce a source-target selective
    joint fine-tuning scheme for improving the performance of deep learning tasks
    with insufficient training data. In this scheme, a target learning task with
    insufficient training data is carried out simultaneously with another source
    learning task with abundant training data. However, the source learning task
    does not use all existing training data. Our core idea is to identify and use a
    subset of training images from the original source learning task whose
    low-level characteristics are similar to those from the target learning task,
    and jointly fine-tune shared convolutional layers for both tasks. Specifically,
    we compute descriptors from linear or nonlinear filter bank responses on
    training images from both tasks, and use such descriptors to search for a
    desired subset of training samples for the source learning task.

    Experiments demonstrate that our selective joint fine-tuning scheme achieves
    state-of-the-art performance on multiple visual classification tasks with
    insufficient training data for deep learning. Such tasks include Caltech 256,
    MIT Indoor 67, Oxford Flowers 102 and Stanford Dogs 120. In comparison to
    fine-tuning without a source domain, the proposed method can improve the
    classification accuracy by 2% – 10% using a single model.
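
    The selection step described above (matching source images to target images
    by low-level filter-bank descriptors) can be sketched as plain
    nearest-neighbour retrieval. The gradient-magnitude histogram below is only
    a stand-in for the paper's actual descriptors, and k_per_target is an
    assumed parameter:

    import numpy as np

    def lowlevel_descriptor(image, bins=32):
        """Histogram of gradient magnitudes as a stand-in for the paper's
        filter-bank descriptors (the actual filters are an assumption here)."""
        gy, gx = np.gradient(image.astype(float))
        mag = np.hypot(gx, gy)
        hist, _ = np.histogram(mag, bins=bins, range=(0, mag.max() + 1e-8))
        return hist / (hist.sum() + 1e-8)

    def select_source_subset(source_images, target_images, k_per_target=10):
        """For each target image, keep its k nearest source images in
        descriptor space; the union forms the auxiliary training subset."""
        src_desc = np.stack([lowlevel_descriptor(im) for im in source_images])
        tgt_desc = np.stack([lowlevel_descriptor(im) for im in target_images])
        selected = set()
        for d in tgt_desc:
            dists = np.linalg.norm(src_desc - d, axis=1)
            selected.update(np.argsort(dists)[:k_per_target].tolist())
        return sorted(selected)

    # Toy usage with random "images".
    rng = np.random.default_rng(0)
    source = [rng.random((64, 64)) for _ in range(100)]
    target = [rng.random((64, 64)) for _ in range(5)]
    print(select_source_subset(source, target, k_per_target=3))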


    Computer Vision and Pattern Recognition

    Graph-based Isometry Invariant Representation Learning

    Renata Khasanova, Pascal Frossard
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Learning transformation invariant representations of visual data is an
    important problem in computer vision. Deep convolutional networks have
    demonstrated remarkable results for image and video classification tasks.
    However, they have achieved only limited success in the classification of
    images that undergo geometric transformations. In this work we present a novel
    Transformation Invariant Graph-based Network (TIGraNet), which learns
    graph-based features that are inherently invariant to isometric transformations
    such as rotation and translation of input images. In particular, images are
    represented as signals on graphs, which permits replacing classical
    convolution and pooling layers in deep networks with graph spectral convolution
    and dynamic graph pooling layers that together contribute to invariance to
    isometric transformations. Our experiments show high performance on rotated and
    translated images from the test set compared to classical architectures that
    are very sensitive to transformations in the data. The inherent invariance
    properties of our framework provide key advantages, such as increased
    resiliency to data variability and sustained performance with limited training
    sets.
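
    A minimal sketch of the graph spectral convolution that replaces classical
    convolution in this setting: filter a graph signal by applying a polynomial
    of the normalized Laplacian's eigenvalues. The toy graph and filter
    coefficients below are arbitrary, not the paper's learned ones:

    import numpy as np

    def spectral_conv(adjacency, signal, filter_coeffs):
        """Filter a graph signal in the spectrum of the normalized Laplacian.
        filter_coeffs defines a polynomial h(lambda) = sum_k c_k * lambda^k
        applied to the eigenvalues, a common parameterization of graph
        spectral convolutions."""
        deg = adjacency.sum(axis=1)
        d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
        lap = np.eye(len(deg)) - d_inv_sqrt @ adjacency @ d_inv_sqrt
        eigvals, eigvecs = np.linalg.eigh(lap)          # graph spectrum
        h = sum(c * eigvals**k for k, c in enumerate(filter_coeffs))
        return eigvecs @ (h * (eigvecs.T @ signal))     # filter spectrally

    # Toy 4-node cycle graph and an impulse signal.
    A = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]], dtype=float)
    x = np.array([1.0, 0.0, 0.0, 0.0])
    print(spectral_conv(A, x, filter_coeffs=[1.0, -0.5]))  # h(L) x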

    Perturb-and-MPM: Quantifying Segmentation Uncertainty in Dense Multi-Label CRFs

    Raphael Meier, Urspeter Knecht, Alain Jungo, Roland Wiest, Mauricio Reyes
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper proposes a novel approach for uncertainty quantification in dense
    Conditional Random Fields (CRFs). The presented approach, called
    Perturb-and-MPM, enables efficient, approximate sampling from dense multi-label
    CRFs via random perturbations. An analytic error analysis was performed which
    identified the main cause of approximation error as well as showed that the
    error is bounded. Spatial uncertainty maps can be derived from the
    Perturb-and-MPM model, which can be used to visualize uncertainty in image
    segmentation results. The method is validated on synthetic and clinical
    Magnetic Resonance Imaging data. The effectiveness of the approach is
    demonstrated on the challenging problem of segmenting the tumor core in
    glioblastoma. We found that areas of high uncertainty correspond well to
    wrongly segmented image regions. Furthermore, we demonstrate the potential use
    of uncertainty maps to refine imaging biomarkers in the case of extent of
    resection and residual tumor volume in brain tumor patients.

    Multi-stage Neural Networks with Single-sided Classifiers for False Positive Reduction and its Evaluation using Lung X-ray CT Images

    Masaharu Sakamoto, Hiroki Nakano, Kun Zhao, Taroh Sekiyama
    Comments: arXiv admin note: text overlap with arXiv:1611.07136
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Lung nodule classification is a class imbalanced problem because nodules are
    found with much lower frequency than non-nodules. In the class imbalanced
    problem, conventional classifiers tend to be overwhelmed by the majority class
    and ignore the minority class. We therefore propose cascaded convolutional
    neural networks to cope with the class imbalanced problem. In the proposed
    approach, multi-stage convolutional neural networks that perform as
    single-sided classifiers filter out obvious non-nodules. Successively, a
    convolutional neural network trained with a balanced data set calculates nodule
    probabilities. The proposed method achieved sensitivities of 92.4% and 94.5%
    at 4 and 8 false positives per scan in Free Receiver Operating Characteristics
    (FROC) curve analysis, respectively.
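
    The multi-stage idea can be sketched as a generic cascade: each single-sided
    stage discards candidates scoring below its threshold, and only the
    survivors are scored by the final, balanced classifier. The score functions
    and thresholds below are placeholders for the trained CNNs:

    import numpy as np

    def cascade_predict(candidates, stage_scorers, stage_thresholds, final_scorer):
        """Filter obvious negatives stage by stage, then score the survivors."""
        surviving = np.arange(len(candidates))
        for scorer, thr in zip(stage_scorers, stage_thresholds):
            scores = np.array([scorer(candidates[i]) for i in surviving])
            surviving = surviving[scores >= thr]        # keep likely nodules only
            if len(surviving) == 0:
                return {}
        return {int(i): float(final_scorer(candidates[i])) for i in surviving}

    # Toy usage with random "patches" and placeholder score functions.
    rng = np.random.default_rng(0)
    patches = [rng.random((16, 16)) for _ in range(10)]
    stages = [lambda p: p.mean(), lambda p: p.std()]
    print(cascade_predict(patches, stages, stage_thresholds=[0.45, 0.25],
                          final_scorer=lambda p: p.max()))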

    Group Sparsity Residual Constraint for Image Denoising

    Zhiyuan Zha, Xinggan Zhang, Qiong Wang, Yechao Bai, Lan Tang
    Comments: arXiv admin note: text overlap with arXiv:1609.03302
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Group sparsity or nonlocal image representation has shown great potential in
    image denoising. However, most existing methods only consider the nonlocal
    self-similarity (NSS) prior of noisy input image, that is, the similar patches
    collected only from degraded input, which makes the quality of image denoising
    largely depend on the input itself. In this paper we propose a new prior model
    for image denoising, called group sparsity residual constraint (GSRC).
    Different from the most existing NSS prior-based denoising methods, two kinds
    of NSS prior (i.e., NSS priors of noisy input image and pre-filtered image) are
    simultaneously used for image denoising. In particular, to boost the
    performance of group sparse-based image denoising, the group sparsity residual
    is proposed, and thus the problem of image denoising is transformed into one
    that reduces the group sparsity residual. To reduce the residual, we first
    obtain a good estimation of the group sparse coefficients of the original image
    by pre-filtering and then the group sparse coefficients of noisy input image
    are used to approximate the estimation. To improve the accuracy of the nonlocal
    similar patches selection, an adaptive patch search scheme is proposed.
    Moreover, to fuse these two NSS priors better, an effective iterative shrinkage
    algorithm is developed to solve the proposed GSRC model. Experimental results
    have demonstrated that the proposed GSRC modeling outperforms many
    state-of-the-art denoising methods in terms of the objective and the perceptual
    qualities.
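
    At its core, the group sparsity residual constraint shrinks the coefficients
    of the noisy image toward those of the pre-filtered estimate. A toy
    one-dimensional sketch of a single shrinkage step; the group structure and
    the full iterative algorithm of the paper are omitted:

    import numpy as np

    def soft_threshold(x, tau):
        """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
        return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

    def shrink_towards_reference(noisy_coeffs, reference_coeffs, tau):
        """One GSRC-style step: soft-threshold the residual between the noisy
        coefficients and the reference (pre-filtered) coefficients, then add the
        reference back."""
        residual = noisy_coeffs - reference_coeffs
        return reference_coeffs + soft_threshold(residual, tau)

    rng = np.random.default_rng(1)
    clean = rng.normal(size=8) * (rng.random(8) > 0.5)   # sparse "ground truth"
    noisy = clean + 0.3 * rng.normal(size=8)
    prefiltered = clean + 0.1 * rng.normal(size=8)       # better initial estimate
    print(shrink_towards_reference(noisy, prefiltered, tau=0.2))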

    Human Eye Visual Hyperacuity: A New Paradigm for Sensing?

    Adur Lagunas, Oier Dominguez, Susana Martinez-Conde, Stephen L. Macknik, Carlos del-Rio
    Comments: 8 pages,9 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The human eye appears to use a small number of sensors for image
    capture. Furthermore, considering the physical dimensions of cones (the
    photoreceptors responsible for sharp central vision), we may note that these
    sensors are of relatively small size and area. Nonetheless, the eye is
    capable of obtaining high-resolution images due to visual hyperacuity and
    presents an impressive sensitivity and dynamic range when set against
    conventional digital cameras of similar characteristics. This article is based
    on the hypothesis that the human eye may be benefiting from diffraction to
    improve both image resolution and the acquisition process. The developed method
    intends to explain and simulate visual hyperacuity using MATLAB software:
    the introduction of a controlled diffraction pattern at an initial stage
    enables the use of a reduced number of sensors for capturing the image and
    makes subsequent processing possible to improve the final image resolution.
    The results have been compared with the outcome of an equivalent system in the
    absence of diffraction, achieving promising results. The main conclusion of
    this work is that diffraction, far from being a resolution-limiting factor,
    could be helpful for capturing images or signals when only a small number of
    sensors is available.

    Improving Object Detection with Region Similarity Learning

    Feng Gao, Yihang Lou, Yan Bai, Shiqi Wang, Tiejun Huang, Ling-Yu Duan
    Comments: 6 pages, 5 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Object detection aims to identify instances of semantic objects of a certain
    class in images or videos. The success of state-of-the-art approaches is
    attributed to the significant progress of object proposal and convolutional
    neural networks (CNNs). Most promising detectors involve multi-task learning
    with an optimization objective of softmax loss and regression loss. The first
    is for multi-class categorization, while the latter is for improving
    localization accuracy. However, few of them attempt to further investigate the
    hardness of distinguishing different sorts of distracting background regions
    (i.e., negatives) from true object regions (i.e., positives). To improve the
    performance of classifying positive object regions vs. a variety of negative
    background regions, we propose to incorporate triplet embedding into the learning
    objective. The triplet units are formed by assigning each negative region to a
    meaningful object class and establishing class-specific negatives, followed by
    triplet construction. Over the benchmark PASCAL VOC 2007, the proposed triplet
    embedding has improved the performance of the well-known FastRCNN model with a
    mAP gain of 2.1%. In particular, the state-of-the-art approach OHEM can benefit
    from the triplet embedding and has achieved a mAP improvement of 1.2%.
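
    The added objective is a standard hinge-based triplet loss over region
    embeddings; a minimal numpy sketch, where the margin value and embedding
    dimension are assumptions:

    import numpy as np

    def triplet_loss(anchor, positive, negative, margin=0.2):
        """Hinge triplet loss: pull the positive closer to the anchor than the
        negative by at least `margin` (all inputs are embedding vectors)."""
        d_pos = np.sum((anchor - positive) ** 2)
        d_neg = np.sum((anchor - negative) ** 2)
        return max(0.0, d_pos - d_neg + margin)

    rng = np.random.default_rng(0)
    anchor = rng.normal(size=128)                      # positive object region
    positive = anchor + 0.05 * rng.normal(size=128)    # same-class region
    negative = rng.normal(size=128)                    # class-specific background
    print(triplet_loss(anchor, positive, negative))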

    Incorporating Intra-Class Variance to Fine-Grained Visual Recognition

    Yan Bai, Feng Gao, Yihang Lou, Shiqi Wang, Tiejun Huang, Ling-Yu Duan
    Comments: 6 pages, 5 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Fine-grained visual recognition aims to capture discriminative
    characteristics amongst visually similar categories. The state-of-the-art
    research work has significantly improved the fine-grained recognition
    performance by deep metric learning using triplet network. However, the impact
    of intra-category variance on the performance of recognition and robust feature
    representation has not been well studied. In this paper, we propose to leverage
    intra-class variance in metric learning of triplet network to improve the
    performance of fine-grained recognition. Through partitioning training images
    within each category into a few groups, we form the triplet samples across
    different categories as well as different groups, which is called Group
    Sensitive TRiplet Sampling (GS-TRS). Accordingly, the triplet loss function is
    strengthened by incorporating intra-class variance with GS-TRS, which may
    contribute to the optimization objective of triplet network. Extensive
    experiments over benchmark datasets CompCar and VehicleID show that the
    proposed GS-TRS has significantly outperformed state-of-the-art approaches in
    both classification and retrieval tasks.

    Optical Flow-based 3D Human Motion Estimation from Monocular Video

    Thiemo Alldieck, Marc Kassubeck, Marcus Magnor
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a generative method to estimate 3D human motion and body shape
    from monocular video. Under the assumption that starting from an initial pose
    optical flow constrains subsequent human motion, we exploit flow to find
    temporally coherent human poses of a motion sequence. We estimate human motion
    by minimizing the difference between computed flow fields and the output of an
    artificial flow renderer. A single initialization step is required to estimate
    motion over multiple frames. Several regularization functions enhance
    robustness over time. Our test scenarios demonstrate that optical flow
    effectively regularizes the under-constrained problem of human shape and motion
    estimation from monocular video.

    Saliency Fusion in Eigenvector Space with Multi-Channel Pulse Coupled Neural Network

    Nevrez Imamoglu, Zhixuan Wei, Huangjun Shi, Yuki Yoshida, Myagmarbayar Nergui, Jose Gonzalez, Dongyun Gu, Weidong Chen, Kenzo Nonami, Wenwei Yu
    Comments: 8 pages, 9 figures, 1 table. This submission includes detailed explanation of partial section (saliency detection) of the work “An Improved Saliency for RGB-D Visual Tracking and Control Strategies for a Bio-monitoring Mobile Robot”, Evaluating AAL Systems Through Competitive Benchmarking, Communications in Computer and Information Science, vol. 386, pp.1-12, 2013
    Journal-ref: Evaluating AAL Systems Through Competitive Benchmarking,
    Communications in Computer and Information Science, 2013
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Saliency computation has become a popular research field for many
    applications due to the useful information provided by saliency maps. For a
    saliency map, local relations around the salient regions in multi-channel
    perspective should be taken into consideration, aiming for uniformity over the
    region of interest as an internal criterion. In addition, irrelevant salient regions
    have to be avoided as much as possible. Most existing works achieve these
    criteria with external processing modules; however, these can be accomplished
    during the conspicuity map fusion process. Therefore, in this paper, a new
    model is proposed for saliency/conspicuity map fusion with two concepts: a)
    input image transformation relying on the principal component analysis (PCA),
    and b) saliency/conspicuity map fusion with a multi-channel pulse coupled neural
    network (m-PCNN). Experimental results, which are evaluated by precision,
    recall, F-measure, and area under curve (AUC), support the reliability of the
    proposed method by enhancing the saliency computation.

    Inertial Odometry on Handheld Smartphones

    Arno Solin, Santiago Cortes, Esa Rahtu, Juho Kannala
    Comments: 8 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Applications (stat.AP)

    Building a complete inertial navigation system using the limited quality data
    provided by current smartphones has been regarded as challenging, if not
    impossible. We present a probabilistic approach for orientation- and use-case-free
    inertial odometry, which is based on double-integrating rotated accelerations.
    Our approach fuses the noisy sensor data and learns the model parameters online
    in a probabilistic manner. It is able to track the
    phone position, velocity, and pose in real-time and in a computationally
    lightweight fashion. The information fusion is completed with altitude
    correction from barometric pressure readings (if available), zero-velocity
    updates (if the phone remains stationary), and pseudo-updates limiting the
    momentary speed. We demonstrate our approach using a standard iPad and iPhone
    in several indoor dead-reckoning applications and in a measurement tool setup.
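
    The underlying dead-reckoning step, double-integrating accelerations after
    rotating them into the world frame and removing gravity, can be sketched in
    a few lines of numpy; the probabilistic fusion, zero-velocity updates, and
    barometric correction described above are not modelled here:

    import numpy as np

    GRAVITY = np.array([0.0, 0.0, 9.81])

    def dead_reckon(accel_body, rotations, dt):
        """Double-integrate rotated accelerations.
        accel_body: (T, 3) accelerometer samples in the device frame.
        rotations:  (T, 3, 3) device-to-world rotations (assumed known, e.g.
                    from the phone's orientation estimate).
        Returns positions (T, 3); no sensor-noise modelling."""
        vel = np.zeros(3)
        pos = np.zeros(3)
        positions = []
        for a_b, R in zip(accel_body, rotations):
            a_world = R @ a_b - GRAVITY   # rotate to world frame, remove gravity
            vel = vel + a_world * dt      # first integration: velocity
            pos = pos + vel * dt          # second integration: position
            positions.append(pos.copy())
        return np.array(positions)

    # Toy usage: a stationary, level phone should (ideally) stay at the origin.
    T = 100
    accel = np.tile(np.array([0.0, 0.0, 9.81]), (T, 1))
    rots = np.tile(np.eye(3), (T, 1, 1))
    print(dead_reckon(accel, rots, dt=0.01)[-1])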

    Saliency Detection by Forward and Backward Cues in Deep-CNNs

    Nevrez Imamoglu, Chi Zhang, Wataru Shimoda, Yuming Fang, Boxin Shi
    Comments: 5 pages,4 figures,and 1 table. the content of this work is submitted and under review for ICIP
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    As prior knowledge of objects or object features helps us make relations for
    similar objects on attentional tasks, pre-trained deep convolutional neural
    networks (CNNs) can be used to detect salient objects in images regardless of
    whether the object class is in the network’s knowledge or not. In this paper, we
    propose a top-down saliency model using a weakly supervised CNN trained for a
    1000-class object labelling task on RGB images. The model detects attentive regions
    based on their objectness scores predicted by selected features from CNNs. To
    estimate the salient objects effectively, we combine both forward and backward
    features, while demonstrating that partially-guided backpropagation will
    provide sufficient information for selecting the features from forward run of
    CNN model. Finally, these top-down cues are enhanced with a state-of-the-art
    bottom-up model to complement the overall saliency. As the proposed model is
    an effective integration of forward and backward cues through objectness,
    without any supervision or regression to ground-truth data, it gives promising
    results compared to state-of-the-art models on two different datasets.

    RGB-D Salient Object Detection Based on Discriminative Cross-modal Transfer Learning

    Hao Chen, Y.F. Li, Dan Su
    Comments: 10 pages, currently under review for CVPR2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this work, we propose to utilize Convolutional Neural Networks to boost
    the performance of depth-induced salient object detection by capturing the
    high-level representative features for depth modality. We formulate the
    depth-induced saliency detection as a CNN-based cross-modal transfer problem to
    bridge the gap between the “data-hungry” nature of CNNs and the unavailability
    of sufficient labeled training data in depth modality. In the proposed
    approach, we leverage the auxiliary data from the source modality effectively
    by training the RGB saliency detection network to obtain the task-specific
    pre-understanding layers for the target modality. Meanwhile, we exploit the
    depth-specific information by pre-training a modality classification network
    that encourages modal-specific representations during the course of optimization.
    Thus, it could make the feature representations of the RGB and depth modalities
    as discriminative as possible. These two modules are pre-trained independently
    and then stitched to initialize and optimize the eventual depth-induced
    saliency detection model. Experiments demonstrate the effectiveness of the
    proposed novel pre-training strategy as well as the significant and consistent
    improvements of the proposed approach over other state-of-the-art methods.

    Remote Sensing Image Scene Classification: Benchmark and State of the Art

    Gong Cheng, Junwei Han, Xiaoqiang Lu
    Comments: This manuscript is the accepted version for Proceedings of the IEEE
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Remote sensing image scene classification plays an important role in a wide
    range of applications and hence has been receiving remarkable attention. During
    the past years, significant efforts have been made to develop various datasets
    or present a variety of approaches for scene classification from remote sensing
    images. However, a systematic review of the literature concerning datasets and
    methods for scene classification is still lacking. In addition, almost all
    existing datasets have a number of limitations, including the small scale of
    scene classes and the image numbers, the lack of image variations and
    diversity, and the saturation of accuracy. These limitations severely limit the
    development of new approaches especially deep learning-based methods. This
    paper first provides a comprehensive review of the recent progress. Then, we
    propose a large-scale dataset, termed “NWPU-RESISC45”, which is a publicly
    available benchmark for REmote Sensing Image Scene Classification (RESISC),
    created by Northwestern Polytechnical University (NWPU). This dataset contains
    31,500 images, covering 45 scene classes with 700 images in each class. The
    proposed NWPU-RESISC45 (i) is large-scale on the scene classes and the total
    image number, (ii) holds big variations in translation, spatial resolution,
    viewpoint, object pose, illumination, background, and occlusion, and (iii) has
    high within-class diversity and between-class similarity. The creation of this
    dataset will enable the community to develop and evaluate various data-driven
    algorithms. Finally, several representative methods are evaluated using the
    proposed dataset and the results are reported as a useful baseline for future
    research.

    Segmentation of Lesions in Dermoscopy Images Using Saliency Map And Contour Propagation

    Mostafa Jahanifar, Neda Zamani Tajeddin, Babak Mohammadzadeh Asl
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Melanoma is one of the most dangerous types of skin cancer and causes
    thousands of deaths worldwide each year. Recently dermoscopic imaging systems
    have been widely used as a diagnostic tool for melanoma detection. The first
    step in the automatic analysis of dermoscopy images is the lesion segmentation.
    In this article, a novel method for skin lesion segmentation that could be
    applied to a variety of images with different properties and deficiencies is
    proposed. After a multi-step preprocessing phase (hair removal and illumination
    correction), a supervised saliency map construction method is used to obtain an
    initial guess of lesion location. The construction of the saliency map is based
    on a random forest regressor that takes a vector of regional image features and
    returns a saliency score based on them. This regressor is trained in a
    multi-level manner on the 2000 training images provided in the ISIC2017 melanoma
    recognition challenge. In addition to obtaining an initial contour of the lesion,
    the output saliency map can be used as a speed function, alongside the image
    gradient, to drive the initial contour toward the lesion boundary using a
    propagation model. The proposed algorithm has been tested on the ISIC2017
    training, validation and test datasets, and gained high values for evaluation
    metrics.

    Discrete Wavelet Transform Based Algorithm for Recognition of QRS Complexes

    Rachid Haddadi, Elhassane Abdelmounim, Mustapha El Hanine, Abdelaziz Belaguid
    Journal-ref: World of Computer Science and Information Technology Journal
    (WCSIT) ; ISSN: 2221-0741; Vol. 4, No. 9, 127-132, 2014
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper proposes the application of Discrete Wavelet Transform (DWT) to
    detect the QRS (ECG is characterized by a recurrent wave sequence of P, QRS and
    T-wave) of an electrocardiogram (ECG) signal. Wavelet Transform provides
    localization in both time and frequency. In preprocessing stage, DWT is used to
    remove the baseline wander in the ECG signal. The performance of the algorithm
    of QRS detection is evaluated against the standard MIT BIH (Massachusetts
    Institute of Technology, Beth Israel Hospital) Arrhythmia database. An average
    QRS complex detection rate of 98.1% is achieved.
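
    A hedged sketch of the pipeline with PyWavelets: remove baseline wander by
    zeroing the lowest-frequency approximation coefficients of a DWT, then pick
    peaks on the reconstructed signal. The wavelet, decomposition level, and
    detection thresholds below are illustrative assumptions, not the paper's
    exact settings:

    import numpy as np
    import pywt
    from scipy.signal import find_peaks

    def detect_qrs(ecg, fs, wavelet="db4", level=6):
        """Baseline removal with a DWT, then simple peak picking."""
        coeffs = pywt.wavedec(ecg, wavelet, level=level)
        coeffs[0] = np.zeros_like(coeffs[0])   # drop approximation: baseline wander
        detrended = pywt.waverec(coeffs, wavelet)[: len(ecg)]
        threshold = 0.6 * np.max(np.abs(detrended))
        peaks, _ = find_peaks(np.abs(detrended), height=threshold,
                              distance=int(0.25 * fs))   # >= 250 ms between beats
        return peaks

    # Toy usage with a synthetic "ECG": baseline drift plus periodic spikes.
    fs = 360
    t = np.arange(10 * fs) / fs
    ecg = 0.5 * np.sin(2 * np.pi * 0.3 * t)   # baseline wander
    ecg[::fs] += 1.5                          # one crude "QRS" per second
    print(detect_qrs(ecg, fs))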

    Deep Image Harmonization

    Yi-Hsuan Tsai, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Xin Lu, Ming-Hsuan Yang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Compositing is one of the most common operations in photo editing. To
    generate realistic composites, the appearances of foreground and background
    need to be adjusted to make them compatible. Previous approaches to harmonize
    composites have focused on learning statistical relationships between
    hand-crafted appearance features of the foreground and background, which is
    unreliable especially when the contents in the two layers are vastly different.
    In this work, we propose an end-to-end deep convolutional neural network for
    image harmonization, which can capture both the context and semantic
    information of the composite images during harmonization. We also introduce an
    efficient way to collect large-scale and high-quality training data that can
    facilitate the training process. Experiments on the synthesized dataset and
    real composite images show that the proposed network outperforms previous
    state-of-the-art methods.

    Context-Sensitive Super-Resolution for Fast Fetal Magnetic Resonance Imaging

    Steven McDonagh, Benjamin Hou, Konstantinos Kamnitsas, Ozan Oktay, Amir Alansary, Bernhard Kainz
    Comments: 8 pages, 4 figures, currently under review for MICCAI 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    3D Magnetic Resonance Imaging (MRI) is often a trade-off between fast but
    low-resolution image acquisition and highly detailed but slow image
    acquisition. Fast imaging is required for targets that move to avoid motion
    artefacts. This is in particular difficult for fetal MRI. Spatially independent
    upsampling techniques, which are the state-of-the-art to address this problem,
    are error prone and disregard contextual information. In this paper we propose
    a context-sensitive upsampling method based on a residual convolutional neural
    network model that learns organ-specific appearance and adapts semantically to
    input data allowing for the generation of high resolution images with sharp
    edges and fine scale detail. By making contextual decisions about appearance
    and shape, present in different parts of an image, we gain a maximum of
    structural detail at a similar contrast as provided by high-resolution data. We
    experiment on 145 fetal scans and show that our approach yields an increased
    PSNR of 1.25 dB when applied to under-sampled fetal data cf. baseline
    upsampling. Furthermore, our method yields an increased PSNR of 1.73 dB
    when utilizing under-sampled fetal data to perform brain volume reconstruction
    on motion-corrupted captured data.
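
    Since the reported gains are in PSNR, here is a short reminder of how that
    metric is computed between a reference volume and an upsampled
    reconstruction (the toy data is random):

    import numpy as np

    def psnr(reference, reconstruction, data_range=None):
        """Peak signal-to-noise ratio in dB between two images/volumes."""
        reference = np.asarray(reference, dtype=float)
        reconstruction = np.asarray(reconstruction, dtype=float)
        if data_range is None:
            data_range = reference.max() - reference.min()
        mse = np.mean((reference - reconstruction) ** 2)
        return 10.0 * np.log10(data_range ** 2 / mse)

    rng = np.random.default_rng(0)
    hi_res = rng.random((32, 32, 32))
    upsampled = hi_res + 0.05 * rng.normal(size=hi_res.shape)
    print(f"{psnr(hi_res, upsampled):.2f} dB")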

    Lossy Image Compression with Compressive Autoencoders

    Lucas Theis, Wenzhe Shi, Andrew Cunningham, Ferenc Huszár
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV)

    We propose a new approach to the problem of optimizing autoencoders for lossy
    image compression. New media formats, changing hardware technology, as well as
    diverse requirements and content types create a need for compression algorithms
    which are more flexible than existing codecs. Autoencoders have the potential
    to address this need, but are difficult to optimize directly due to the
    inherent non-differentiability of the compression loss. We here show that
    minimal changes to the loss are sufficient to train deep autoencoders
    competitive with JPEG 2000 and outperforming recently proposed approaches based
    on RNNs. Our network is furthermore computationally efficient thanks to a
    sub-pixel architecture, which makes it suitable for high-resolution images.
    This is in contrast to previous work on autoencoders for compression using
    coarser approximations, shallower architectures, computationally expensive
    methods, or focusing on small images.

    Theoretical Properties for Neural Networks with Weight Matrices of Low Displacement Rank

    Liang Zhao, Siyu Liao, Yanzhi Wang, Jian Tang, Bo Yuan
    Comments: 13 pages, 1 figure
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Recently low displacement rank (LDR) matrices, or so-called structured
    matrices, have been proposed to compress large-scale neural networks. Empirical
    results have shown that neural networks whose weight matrices are LDR matrices,
    referred to as LDR neural networks, can achieve significant reductions in space and
    computational complexity while retaining high accuracy. We formally study LDR
    matrices in deep learning. First, we prove the universal approximation property
    of LDR neural networks under a mild condition on the displacement operators. We
    then show that the error bounds of LDR neural networks are as efficient as those of
    general neural networks, for both single-layer and multiple-layer structures.
    Finally, we propose a back-propagation based training algorithm for general LDR
    neural networks.
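
    Circulant matrices are a textbook example of low displacement rank structure:
    a circulant weight matrix needs only n parameters and multiplies a vector in
    O(n log n) via the FFT. A minimal numpy sketch of that saving (this is not
    the paper's training algorithm, just the structural trick):

    import numpy as np

    def circulant_matvec(first_column, x):
        """Multiply a circulant matrix (defined by its first column) with x
        in O(n log n) using the FFT, without forming the n x n matrix."""
        return np.real(np.fft.ifft(np.fft.fft(first_column) * np.fft.fft(x)))

    # Check against the dense circulant matrix.
    n = 8
    rng = np.random.default_rng(0)
    c = rng.normal(size=n)
    x = rng.normal(size=n)
    dense = np.array([np.roll(c, j) for j in range(n)]).T   # dense circulant
    print(np.allclose(dense @ x, circulant_matvec(c, x)))   # True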


    Artificial Intelligence

    HolStep: A Machine Learning Dataset for Higher-order Logic Theorem Proving

    Cezary Kaliszyk, François Chollet, Christian Szegedy
    Subjects: Artificial Intelligence (cs.AI)

    Large computer-understandable proofs consist of millions of intermediate
    logical steps. The vast majority of such steps originate from manually selected
    and manually guided heuristics applied to intermediate goals. So far, machine
    learning has generally not been used to filter or generate these steps. In this
    paper, we introduce a new dataset based on Higher-Order Logic (HOL) proofs, for
    the purpose of developing new machine learning-based theorem-proving
    strategies. We make this dataset publicly available under the BSD license. We
    propose various machine learning tasks that can be performed on this dataset,
    and discuss their significance for theorem proving. We also benchmark a set of
    simple baseline machine learning models suited for the tasks (including
    logistic regression, convolutional neural networks and recurrent neural
    networks). The results of our baseline models show the promise of applying
    machine learning to HOL theorem proving.

    A Hypercat-enabled Semantic Internet of Things Data Hub: Technical Report

    Ilias Tachmazidis, Sotiris Batsakis, John Davies, Alistair Duke, Mauro Vallati, Grigoris Antoniou, Sandra Stincic Clarke
    Comments: Technical report of an accepted ESWC-2017 paper
    Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB)

    An increasing amount of information is generated from the rapidly increasing
    number of sensor networks and smart devices. A wide variety of sources generate
    and publish information in different formats, thus highlighting
    interoperability as one of the key prerequisites for the success of Internet of
    Things (IoT). The BT Hypercat Data Hub provides a focal point for the sharing
    and consumption of available datasets from a wide range of sources. In this
    work, we propose a semantic enrichment of the BT Hypercat Data Hub, using
    well-accepted Semantic Web standards and tools. We propose an ontology that
    captures the semantics of the imported data and present the BT SPARQL Endpoint
    by means of a mapping between SPARQL and SQL queries. Furthermore, federated
    SPARQL queries allow queries over multiple hub-based and external data sources.
    Finally, we provide two use cases in order to illustrate the advantages
    afforded by our semantic approach.

    Learning A Physical Long-term Predictor

    Sebastien Ehrhardt, Aron Monszpart, Niloy J. Mitra, Andrea Vedaldi
    Subjects: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    Evolution has resulted in highly developed abilities in many natural
    intelligences to quickly and accurately predict mechanical phenomena. Humans
    have successfully developed laws of physics to abstract and model such
    mechanical phenomena. In the context of artificial intelligence, a recent line
    of work has focused on estimating physical parameters based on sensory data and
    using them in physical simulators to make long-term predictions. In contrast, we
    investigate the effectiveness of a single neural network for end-to-end
    long-term prediction of mechanical phenomena. Based on extensive evaluation, we
    demonstrate that such networks can outperform alternative approaches that even
    have access to ground-truth physical simulators, especially when some physical
    parameters are unobserved or not known a priori. Further, our network outputs a
    distribution of outcomes to capture the inherent uncertainty in the data. Our
    approach demonstrates for the first time the possibility of making actionable
    long-term predictions from sensor data without requiring an explicit model of
    the underlying physical laws.

    OptNet: Differentiable Optimization as a Layer in Neural Networks

    Brandon Amos, J. Zico Kolter
    Comments: Submitted to ICML 2017
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)

    This paper presents OptNet, a network architecture that integrates
    optimization problems (here, specifically in the form of quadratic programs) as
    individual layers in larger end-to-end trainable deep networks. These layers
    allow complex dependencies between the hidden states to be captured that
    traditional convolutional and fully-connected layers are not able to capture.
    In this paper, we develop the foundations for such an architecture: we derive
    the equations to perform exact differentiation through these layers and with
    respect to layer parameters; we develop a highly efficient solver for these
    layers that exploits fast GPU-based batch solves within a primal-dual interior
    point method, and which provides backpropagation gradients with virtually no
    additional cost on top of the solve; and we highlight the application of these
    approaches in several problems. In one particularly standout example, we show
    that the method is capable of learning to play Sudoku given just input and
    output games, with no a priori information about the rules of the game; this
    task is virtually impossible for other neural network architectures that we
    have experimented with, and highlights the representation capabilities of our
    approach.
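
    For the equality-constrained case, both the forward solve and the backward
    pass of such a QP layer reduce to one KKT linear system, which is the essence
    of differentiating through the optimization. A minimal numpy sketch under
    that simplification (the paper handles general inequality-constrained QPs
    with a primal-dual interior point method, which is not reproduced here):

    import numpy as np

    def qp_layer_forward(Q, p, A, b):
        """Solve min_z 0.5 z^T Q z + p^T z  s.t.  A z = b via its KKT system."""
        n, m = Q.shape[0], A.shape[0]
        K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
        rhs = np.concatenate([-p, b])
        sol = np.linalg.solve(K, rhs)
        return sol[:n], K      # optimal z* and the KKT matrix (reused backward)

    def qp_layer_backward_p(K, n, grad_z):
        """Gradient of a downstream loss w.r.t. p, given dL/dz*, by implicit
        differentiation of the KKT conditions (other parameters are analogous)."""
        rhs = np.concatenate([grad_z, np.zeros(K.shape[0] - n)])
        d = np.linalg.solve(K, rhs)
        return -d[:n]

    # Toy problem: project p onto the hyperplane sum(z) = 1 under Q = I.
    n = 3
    Q = np.eye(n)
    p = np.array([0.3, -0.2, 0.1])
    A = np.ones((1, n))
    b = np.array([1.0])
    z_star, K = qp_layer_forward(Q, p, A, b)
    print(z_star)
    print(qp_layer_backward_p(K, n, grad_z=np.ones(n)))  # dL/dp for L = sum(z*)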

    Learning to Optimize Neural Nets

    Ke Li, Jitendra Malik
    Comments: 10 pages, 15 figures
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)

    Learning to Optimize is a recently proposed framework for learning
    optimization algorithms using reinforcement learning. In this paper, we explore
    learning an optimization algorithm for training shallow neural nets. Such
    high-dimensional stochastic optimization problems present interesting
    challenges for existing reinforcement learning algorithms. We develop an
    extension that is suited to learning optimization algorithms in this setting
    and demonstrate that the learned optimization algorithm consistently
    outperforms other known optimization algorithms even on unseen tasks and is
    robust to changes in stochasticity of gradients and the neural net
    architecture. More specifically, we show that an optimization algorithm trained
    with the proposed method on the problem of training a neural net on MNIST
    generalizes to the problems of training neural nets on the Toronto Faces
    Dataset, CIFAR-10 and CIFAR-100.

    Fast k-Nearest Neighbour Search via Prioritized DCI

    Ke Li, Jitendra Malik
    Comments: 11 pages, 5 figures
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Information Retrieval (cs.IR); Machine Learning (stat.ML)

    Most exact methods for k-nearest neighbour search suffer from the curse of
    dimensionality; that is, their query times exhibit exponential dependence on
    either the ambient or the intrinsic dimensionality. Dynamic Continuous Indexing
    (DCI) offers a promising way of circumventing the curse by avoiding space
    partitioning and achieves a query time that grows sublinearly in the intrinsic
    dimensionality. In this paper, we develop a variant of DCI, which we call
    Prioritized DCI, and show a further improvement in the dependence on the
    intrinsic dimensionality compared to standard DCI, thereby improving the
    performance of DCI on datasets with high intrinsic dimensionality. We also
    demonstrate empirically that Prioritized DCI compares favourably to standard
    DCI and Locality-Sensitive Hashing (LSH) both in terms of running time and
    space consumption at all levels of approximation quality. In particular,
    relative to LSH, Prioritized DCI reduces the number of distance evaluations by
    a factor of 5 to 30 and the space consumption by a factor of 47 to 55.

    Virtual-to-real Deep Reinforcement Learning: Continuous Control of Mobile Robots for Mapless Navigation

    Lei Tai, Giuseppe Paolo, Ming Liu
    Comments: 8 pages, 9 figures, under review, video: this https URL
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Deep Reinforcement Learning has been successful in various virtual tasks, but
    it is still rarely used in real world applications especially for continuous
    control of mobile robot navigation. In this paper, we present a learning-based
    mapless motion planner by taking the 10-dimensional range findings and the
    target position as input and the continuous steering commands as output.
    Traditional motion planners for mobile ground robots with a laser range sensor
    mostly depend on the map of the navigation environment where both the highly
    precise laser sensor and the map building work of the environment are
    indispensable. We show that, through an asynchronous deep reinforcement
    learning method, a mapless motion planner can be trained end-to-end without any
    manually designed features and prior demonstrations. The trained planner can be
    directly applied in unseen virtual and real environments. We also evaluated
    this learning-based motion planner and compared it with the traditional motion
    planning method, both in virtual and real environments. The experiments show
    that the proposed mapless motion planner can navigate the nonholonomic mobile
    robot to the desired targets without colliding with any obstacles.

    The Statistical Recurrent Unit

    Junier B. Oliva, Barnabas Poczos, Jeff Schneider
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Sophisticated gated recurrent neural network architectures like LSTMs and
    GRUs have been shown to be highly effective in a myriad of applications. We
    develop an un-gated unit, the statistical recurrent unit (SRU), that is able to
    learn long term dependencies in data by only keeping moving averages of
    statistics. The SRU’s architecture is simple, un-gated, and contains a
    comparable number of parameters to LSTMs; yet, SRUs perform favorably to more
    sophisticated LSTM and GRU alternatives, often outperforming one or both in
    various tasks. We show the efficacy of SRUs as compared to LSTMs and GRUs in an
    unbiased manner by optimizing respective architectures’ hyperparameters in a
    Bayesian optimization scheme for both synthetic and real-world tasks.

    Do Reichenbachian Common Cause Systems of Arbitrary Finite Size Exist?

    Claudio Mazzola, Peter Evans
    Subjects: Other Statistics (stat.OT); Artificial Intelligence (cs.AI); History and Philosophy of Physics (physics.hist-ph)

    The principle of common cause asserts that positive correlations between
    causally unrelated events ought to be explained through the action of some
    shared causal factors. Reichenbachian common cause systems are probabilistic
    structures aimed at accounting for cases where correlations of the aforesaid
    sort cannot be explained through the action of a single common cause. The
    existence of Reichenbachian common cause systems of arbitrary finite size for
    each pair of non-causally correlated events was allegedly demonstrated by
    Hofer-Szabó and Rédei in 2006. This paper shows that their proof is
    logically deficient, and we propose an improved proof.

    Investigating the Characteristics of One-Sided Matching Mechanisms Under Various Preferences and Risk Attitudes

    Hadi Hosseini, Kate Larson, Robin Cohen
    Comments: arXiv admin note: text overlap with arXiv:1503.01488
    Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

    One-sided matching mechanisms are fundamental for assigning a set of
    indivisible objects to a set of self-interested agents when monetary transfers
    are not allowed. Two widely-studied randomized mechanisms in multiagent
    settings are the Random Serial Dictatorship (RSD) and the Probabilistic Serial
    Rule (PS). Both mechanisms require only that agents specify ordinal preferences
    and have a number of desirable economic and computational properties. However,
    the induced outcomes of the mechanisms are often incomparable and thus there
    are challenges when it comes to deciding which mechanism to adopt in practice.
    In this paper, we first consider the space of general ordinal preferences and
    provide empirical results on the (in)comparability of RSD and PS. We analyze
    their respective economic properties under general and lexicographic
    preferences. We then instantiate utility functions with the goal of gaining
    insights into the manipulability, efficiency, and envy-freeness of the mechanisms
    under different risk-attitude models. Our results hold under various preference
    distribution models, which further confirm the broad use of RSD in most
    practical applications.
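
    Of the two mechanisms compared, Random Serial Dictatorship is simple enough
    to sketch directly: agents pick in a uniformly random order, each taking
    their most preferred remaining object. A minimal Python sketch of one draw
    (the example preferences are made up):

    import random

    def random_serial_dictatorship(preferences, objects, rng=random):
        """One draw of RSD.
        preferences: dict agent -> list of objects, most preferred first.
        Returns dict agent -> assigned object (None if nothing acceptable left)."""
        remaining = set(objects)
        assignment = {}
        order = list(preferences)
        rng.shuffle(order)                       # uniformly random picking order
        for agent in order:
            choice = next((o for o in preferences[agent] if o in remaining), None)
            assignment[agent] = choice
            if choice is not None:
                remaining.discard(choice)
        return assignment

    prefs = {
        "a1": ["x", "y", "z"],
        "a2": ["x", "z", "y"],
        "a3": ["y", "x", "z"],
    }
    print(random_serial_dictatorship(prefs, objects=["x", "y", "z"]))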

    Tracing Linguistic Relations in Winning and Losing Sides of Explicit Opposing Groups

    Ceyda Sanli, Anupam Mondal, Erik Cambria
    Comments: Full paper, Proceedings of FLAIRS-2017 (30th Florida Artificial Intelligence Research Society), Special Track, Artificial Intelligence for Big Social Data Analysis
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    Linguistic relations in oral conversations present how opinions are
    constructed and developed in a restricted time. The relations bond ideas,
    arguments, thoughts, and feelings, re-shape them during a speech, and finally
    build knowledge out of all information provided in the conversation. Speakers
    share a common interest to discuss. It is expected that each speaker’s reply
    includes duplicated forms of words from previous speakers. However, linguistic
    adaptation is observed and evolves in a more complex path than just
    transferring slightly modified versions of common concepts. A conversation
    aiming at a benefit at the end shows an emergent cooperation inducing the
    adaptation. Not only cooperation, but also competition drives the adaptation or
    an opposite scenario and one can capture the dynamic process by tracking how
    the concepts are linguistically linked. To uncover salient complex dynamic
    events in verbal communications, we attempt to discover self-organized
    linguistic relations hidden in a conversation with explicitly stated winners
    and losers. We examine open access data of the United States Supreme Court. Our
    understanding is crucial in big data research to guide how transition states in
    opinion mining and decision-making should be modeled and how this required
    knowledge to guide the model should be pinpointed, by filtering large amounts of
    data.

    Learning Conversational Systems that Interleave Task and Non-Task Content

    Zhou Yu, Alan W Black, Alexander I. Rudnicky
    Comments: Dialog Systems, Reinforcement Learning
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

    Task-oriented dialog systems have been applied in various tasks, such as
    automated personal assistants, customer service providers and tutors. These
    systems work well when users have clear and explicit intentions that are
    well-aligned to the systems’ capabilities. However, they fail if users’
    intentions are not explicit. To address this shortcoming, we propose a
    framework to interleave non-task content (i.e. everyday social conversation)
    into task conversations. When the task content fails, the system can still keep
    the user engaged with the non-task content. We trained a policy using
    reinforcement learning algorithms to promote long-turn conversation coherence
    and consistency, so that the system can have smooth transitions between task
    and non-task content. To test the effectiveness of the proposed framework, we
    developed a movie promotion dialog system. Experiments with human users
    indicate that a system that interleaves social and task content achieves a
    better task success rate and is also rated as more engaging compared to a pure
    task-oriented system.

    Provable Optimal Algorithms for Generalized Linear Contextual Bandits

    Lihong Li, Yu Lu, Dengyong Zhou
    Comments: 15 pages
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Contextual bandits are widely used in Internet services from news
    recommendation to advertising. Generalized linear models (logistic regression
    in particular) have demonstrated stronger performance than linear models in
    many applications. However, most theoretical analyses on contextual bandits so
    far are on linear bandits. In this work, we propose an upper confidence bound
    based algorithm for generalized linear contextual bandits, which achieves an
    \(\tilde{O}(\sqrt{dT})\) regret over \(T\) rounds with \(d\)-dimensional feature
    vectors. This regret matches the minimax lower bound, up to logarithmic terms,
    and improves on the best previous result by a \(\sqrt{d}\) factor, assuming the
    number of arms is fixed. A key component in our analysis is to establish a new,
    sharp finite-sample confidence bound for maximum-likelihood estimates in
    generalized linear models, which may be of independent interest. We also
    analyze a simpler upper confidence bound algorithm, which is useful in
    practice, and prove that it has optimal regret in certain cases.
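
    A simplified sketch of the kind of upper-confidence-bound rule described
    above for a logistic (generalized linear) bandit: refit a regularized
    maximum-likelihood estimate of the weights and add an ellipsoidal
    exploration bonus. The constants, the Newton-step MLE solver, and the toy
    environment are assumptions, not the paper's exact algorithm:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    class UCBGLMBandit:
        """Simplified UCB for a logistic contextual bandit: keep the history,
        refit theta by a few Newton steps of regularized logistic regression,
        and explore with a bonus alpha * sqrt(x^T V^{-1} x)."""
        def __init__(self, dim, alpha=1.0, reg=1.0):
            self.alpha, self.reg = alpha, reg
            self.V = reg * np.eye(dim)            # sum of x x^T plus reg * I
            self.X, self.y = [], []
            self.theta = np.zeros(dim)

        def select(self, arm_contexts):
            V_inv = np.linalg.inv(self.V)
            scores = [sigmoid(x @ self.theta) + self.alpha * np.sqrt(x @ V_inv @ x)
                      for x in arm_contexts]
            return int(np.argmax(scores))

        def update(self, x, reward, newton_steps=5):
            self.V += np.outer(x, x)
            self.X.append(x)
            self.y.append(reward)
            X, y = np.array(self.X), np.array(self.y)
            for _ in range(newton_steps):         # refit the regularized MLE
                mu = sigmoid(X @ self.theta)
                grad = X.T @ (mu - y) + self.reg * self.theta
                W = mu * (1 - mu)
                H = (X * W[:, None]).T @ X + self.reg * np.eye(len(self.theta))
                self.theta -= np.linalg.solve(H, grad)

    # Toy usage: 3 arms with random contexts, Bernoulli rewards from a hidden theta.
    rng = np.random.default_rng(0)
    true_theta = np.array([1.0, -1.0, 0.5])
    bandit = UCBGLMBandit(dim=3)
    for t in range(200):
        arms = rng.normal(size=(3, 3))
        a = bandit.select(arms)
        reward = float(rng.random() < sigmoid(arms[a] @ true_theta))
        bandit.update(arms[a], reward)
    print(bandit.theta)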

    Analysing Congestion Problems in Multi-agent Reinforcement Learning

    Roxana Rădulescu, Peter Vrancx, Ann Nowé
    Comments: 8 pages, Adaptive Learning Agents (ALA) Workshop at AAMAS 2017 submission
    Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI)

    Congestion problems are omnipresent in today’s complex networks and represent
    a challenge in many research domains. In the context of Multi-agent
    Reinforcement Learning (MARL), approaches like difference rewards and resource
    abstraction have shown promising results in tackling such problems. Resource
    abstraction was shown to be an ideal candidate for solving large-scale resource
    allocation problems in a fully decentralized manner. However, its performance
    and applicability strongly depends on some, until now, undocumented
    assumptions. Two of the main congestion benchmark problems considered in the
    literature are: the Beach Problem Domain and the Traffic Lane Domain. In both
    settings the highest system utility is achieved when overcrowding one resource
    and keeping the rest at optimum capacity. We analyse how abstract grouping can
    promote this behaviour and how feasible it is to apply this approach in a
    real-world domain (i.e., what assumptions need to be satisfied and what
    knowledge is necessary). We introduce a new test problem, the Road Network
    Domain (RND), where the resources are no longer independent, but rather part of
    a network (e.g., road network), thus choosing one path will also impact the
    load on other paths having common road segments. We demonstrate the application
    of state-of-the-art MARL methods for this new congestion model and analyse
    their performance. RND allows us to highlight an important limitation of
    resource abstraction and show that the difference rewards approach manages to
    better capture and inform the agents about the dynamics of the environment.
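
    A toy sketch of the difference-rewards signal for a Beach-Problem-style
    congestion game: each agent is credited with the change in global utility
    caused by removing it from its chosen section. The particular local utility
    function below is a common congestion form chosen only for illustration:

    import math
    from collections import Counter

    def section_utility(attendance, capacity):
        """Congestion-style local utility that peaks around the section's
        capacity (an assumed form, used illustratively)."""
        return attendance * math.exp(-attendance / capacity)

    def global_utility(choices, capacity):
        counts = Counter(choices)
        return sum(section_utility(n, capacity) for n in counts.values())

    def difference_rewards(choices, capacity):
        """D_i = G(z) - G(z without agent i): each agent's marginal contribution."""
        g = global_utility(choices, capacity)
        rewards = []
        for i in range(len(choices)):
            without_i = choices[:i] + choices[i + 1:]
            rewards.append(g - global_utility(without_i, capacity))
        return rewards

    # 6 agents choosing among 3 beach sections of capacity 2.
    choices = ["s1", "s1", "s1", "s1", "s2", "s3"]
    print(global_utility(choices, capacity=2.0))
    print(difference_rewards(choices, capacity=2.0))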

    Borrowing Treasures from the Wealthy: Deep Transfer Learning through Selective Joint Fine-tuning

    Weifeng Ge, Yizhou Yu
    Comments: To appear in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017)
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Deep neural networks require a large amount of labeled training data during
    supervised learning. However, collecting and labeling so much data might be
    infeasible in many cases. In this paper, we introduce a source-target selective
    joint fine-tuning scheme for improving the performance of deep learning tasks
    with insufficient training data. In this scheme, a target learning task with
    insufficient training data is carried out simultaneously with another source
    learning task with abundant training data. However, the source learning task
    does not use all existing training data. Our core idea is to identify and use a
    subset of training images from the original source learning task whose
    low-level characteristics are similar to those from the target learning task,
    and jointly fine-tune shared convolutional layers for both tasks. Specifically,
    we compute descriptors from linear or nonlinear filter bank responses on
    training images from both tasks, and use such descriptors to search for a
    desired subset of training samples for the source learning task.

    Experiments demonstrate that our selective joint fine-tuning scheme achieves
    state-of-the-art performance on multiple visual classification tasks with
    insufficient training data for deep learning. Such tasks include Caltech 256,
    MIT Indoor 67, Oxford Flowers 102 and Stanford Dogs 120. In comparison to
    fine-tuning without a source domain, the proposed method can improve the
    classification accuracy by 2% – 10% using a single model.


    Information Retrieval

    Combating the Cold Start User Problem in Model Based Collaborative Filtering

    Sampoorna Biswas, Laks V.S. Lakshmanan, Senjuti Basu Ray
    Subjects: Information Retrieval (cs.IR)

    For tackling the well known cold-start user problem in model-based
    recommender systems, one approach is to recommend a few items to a cold-start
    user and use the feedback to learn a profile. The learned profile can then be
    used to make good recommendations to the cold user. In the absence of a good
    initial profile, the recommendations are like random probes, but if not chosen
    judiciously, both bad recommendations and too many recommendations may turn off
    a user. We formalize the cold-start user problem by asking what are the \(b\)
    best items we should recommend to a cold-start user, in order to learn her
    profile most accurately, where \(b\), a given budget, is typically a small
    number. We formalize the problem as an optimization problem and present
    multiple non-trivial results, including NP-hardness as well as hardness of
    approximation. We furthermore show that the objective function, i.e., the least
    square error of the learned profile w.r.t. the true user profile, is neither
    submodular nor supermodular, suggesting efficient approximations are unlikely
    to exist. Finally, we discuss several scalable heuristic approaches for
    identifying the \(b\) best items to recommend to the user and experimentally
    evaluate their performance on 4 real datasets. Our experiments show that our
    proposed accelerated algorithms significantly outperform the prior art in
    running time, while achieving similar error in the learned user profile as
    well as in the rating predictions.

    Weighted meta-path generation, Multi-relational recommender system, Heterogeneous information network, Weighted random walk sampling

    Fatemeh Vahedian, Robin Burke, Bamshad Mobasher
    Subjects: Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

    In the information overloaded web, personalized recommender systems are
    essential tools to help users find most relevant information. The most
    heavily-used recommendation frameworks assume user interactions that are
    characterized by a single relation. However, for many tasks, such as
    recommendation in social networks, user-item interactions must be modeled as a
    complex network of multiple relations, not only a single relation. Recently
    research on multi-relational factorization and hybrid recommender models has
    shown that using extended meta-paths to capture additional information about
    both users and items in the network can enhance the accuracy of recommendations
    in such networks. Most of this work is focused on unweighted heterogeneous
    networks, and to apply these techniques, weighted relations must be simplified
    into binary ones. However, information associated with weighted edges, such
    as user ratings, which may be crucial for recommendation, is lost in such
    binarization. In this paper, we explore a random walk sampling method in
    which the frequency of edge sampling is a function of edge weight, and
    apply it to generate extended meta-paths in weighted heterogeneous
    networks. With this
    sampling technique, we demonstrate improved performance on multiple data sets
    both in terms of recommendation accuracy and model generation efficiency.
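
    As a minimal sketch of the weight-dependent sampling idea (with a toy graph
    layout, walk length and data that are assumptions of the sketch, not
    details from the paper):

        import numpy as np

        def weighted_random_walk(adj, start, length, rng=np.random.default_rng(0)):
            # adj: dict mapping node -> list of (neighbor, weight) pairs.
            # At each step the next edge is sampled with probability
            # proportional to its weight, so strong links are traversed (and
            # hence appear in sampled meta-paths) more often than weak ones.
            walk, node = [start], start
            for _ in range(length):
                nbrs = adj.get(node, [])
                if not nbrs:
                    break
                weights = np.array([w for _, w in nbrs], dtype=float)
                node = nbrs[rng.choice(len(nbrs), p=weights / weights.sum())][0]
                walk.append(node)
            return walk

        # toy heterogeneous graph: user -> (item, rating) edges
        adj = {"u1": [("i1", 5.0), ("i2", 1.0)], "i1": [("u2", 4.0)],
               "i2": [("u3", 2.0)]}
        print(weighted_random_walk(adj, "u1", length=3))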

    Fast k-Nearest Neighbour Search via Prioritized DCI

    Ke Li, Jitendra Malik
    Comments: 11 pages, 5 figures
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Information Retrieval (cs.IR); Machine Learning (stat.ML)

    Most exact methods for k-nearest neighbour search suffer from the curse of
    dimensionality; that is, their query times exhibit exponential dependence on
    either the ambient or the intrinsic dimensionality. Dynamic Continuous Indexing
    (DCI) offers a promising way of circumventing the curse by avoiding space
    partitioning and achieves a query time that grows sublinearly in the intrinsic
    dimensionality. In this paper, we develop a variant of DCI, which we call
    Prioritized DCI, and show a further improvement in the dependence on the
    intrinsic dimensionality compared to standard DCI, thereby improving the
    performance of DCI on datasets with high intrinsic dimensionality. We also
    demonstrate empirically that Prioritized DCI compares favourably to standard
    DCI and Locality-Sensitive Hashing (LSH) both in terms of running time and
    space consumption at all levels of approximation quality. In particular,
    relative to LSH, Prioritized DCI reduces the number of distance evaluations by
    a factor of 5 to 30 and the space consumption by a factor of 47 to 55.
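
    The data structure itself is more involved, but the priority idea can be
    caricatured in a few lines: points whose random 1-D projections all lie
    close to the query's projections are examined first, and exact distances
    are computed only for that short candidate list. The sketch below is a
    simplification for intuition, not the actual Prioritized DCI index.

        import numpy as np

        def knn_by_projection_priority(X, q, k=1, m=10, candidates=50,
                                       rng=np.random.default_rng(0)):
            # m random unit directions; the priority of a point is its largest
            # projected gap to the query across the m directions.
            dirs = rng.normal(size=(m, X.shape[1]))
            dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
            gaps = np.abs(X @ dirs.T - q @ dirs.T)       # (n, m)
            priority = gaps.max(axis=1)                  # smallest max-gap first
            cand = np.argsort(priority)[:candidates]
            dists = np.linalg.norm(X[cand] - q, axis=1)
            return cand[np.argsort(dists)[:k]]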

    Second Screen User Profiling and Multi-level Smart Recommendations in the context of Social TVs

    Angelos Valsamis, Alexandros Psychas, Fotis Aisopos, Andreas Menychtas, Theodora Varvarigou
    Comments: In: Wu TT., Gennari R., Huang YM., Xie H., Cao Y. (eds) Emerging Technologies for Education. SETE 2016
    Journal-ref: Lecture Notes in Computer Science, vol 10108. Springer, Cham,
    2017, pp 514-525
    Subjects: Multimedia (cs.MM); Information Retrieval (cs.IR)

    In the context of Social TV, the increasing popularity of first- and
    second-screen users, who interact and post content online, creates new
    business opportunities and related technical challenges for enriching the
    user experience in such environments. The SAM (Socializing Around Media)
    project uses
    Social Media-connected infrastructure to deal with the aforementioned
    challenges, providing intelligent user context management models and mechanisms
    capturing social patterns, to apply collaborative filtering techniques and
    personalized recommendations in this direction. This paper presents the
    Context Management mechanism of SAM, running in a Social TV environment to
    provide smart recommendations for first and second screen content. The work
    presented is evaluated using a real movie-rating dataset found online, to
    validate SAM’s approach in terms of both effectiveness and efficiency.


    Computation and Language

    Tracing Linguistic Relations in Winning and Losing Sides of Explicit Opposing Groups

    Ceyda Sanli, Anupam Mondal, Erik Cambria
    Comments: Full paper, Proceedings of FLAIRS-2017 (30th Florida Artificial Intelligence Research Society), Special Track, Artificial Intelligence for Big Social Data Analysis
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    Linguistic relations in oral conversations present how opinions are
    constructed and developed in a restricted time. The relations bond ideas,
    arguments, thoughts, and feelings, re-shape them during a speech, and finally
    build knowledge out of all the information provided in the conversation.
    Speakers share a common interest in the topic under discussion, so each
    speaker’s reply is expected to include duplicated forms of words from
    previous speakers. However, linguistic adaptation is observed to evolve
    along a more complex path than merely transferring slightly modified
    versions of common concepts. A conversation aiming at a benefit at the end
    shows an emergent cooperation that induces the adaptation. Not only
    cooperation but also competition drives the adaptation, or its opposite,
    and one can capture the dynamic process by tracking how the concepts are
    linguistically linked. To uncover salient complex dynamic
    events in verbal communications, we attempt to discover self-organized
    linguistic relations hidden in a conversation with explicitly stated winners
    and losers. We examine open access data of the United States Supreme Court. Our
    understanding is crucial in big data research for guiding how transition
    states in opinion mining and decision-making should be modeled, and for
    pinpointing, by filtering large amounts of data, the knowledge required to
    guide such models.

    Learning Conversational Systems that Interleave Task and Non-Task Content

    Zhou Yu, Alan W Black, Alexander I. Rudnicky
    Comments: Dialog Systems, Reinforcement Learning
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

    Task-oriented dialog systems have been applied in various tasks, such as
    automated personal assistants, customer service providers and tutors. These
    systems work well when users have clear and explicit intentions that are
    well-aligned to the systems’ capabilities. However, they fail if users’
    intentions are not explicit. To address this shortcoming, we propose a
    framework to interleave non-task content (i.e. everyday social conversation)
    into task conversations. When the task content fails, the system can still keep
    the user engaged with the non-task content. We trained a policy using
    reinforcement learning algorithms to promote long-turn conversation coherence
    and consistency, so that the system can have smooth transitions between task
    and non-task content. To test the effectiveness of the proposed framework, we
    developed a movie promotion dialog system. Experiments with human users
    indicate that a system that interleaves social and task content achieves a
    better task success rate and is also rated as more engaging compared to a pure
    task-oriented system.

    Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling

    Hairong Liu, Zhenyao Zhu, Xiangang Li, Sanjeev Satheesh
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Most existing sequence labelling models rely on a fixed decomposition of a
    target sequence into a sequence of basic units. These methods suffer from two
    major drawbacks: 1) the set of basic units is fixed, such as the set of words,
    characters or phonemes in speech recognition, and 2) the decomposition of
    target sequences is fixed. These drawbacks usually result in sub-optimal
    performance of modeling sequences. In this paper, we extend the popular CTC
    loss criterion to alleviate these limitations, and propose a new loss
    function called Gram-CTC. While preserving the advantages of CTC, Gram-CTC
    automatically learns the best set of basic units (grams), as well as the
    most suitable decomposition of target sequences. Unlike CTC, Gram-CTC
    allows the model to output a variable number of characters at each time
    step, which enables the model to capture longer-term dependencies and
    improves computational efficiency. We
    demonstrate that the proposed Gram-CTC improves CTC in terms of both
    performance and efficiency on the large vocabulary speech recognition task at
    multiple scales of data, and that with Gram-CTC we can outperform the
    state-of-the-art on a standard speech benchmark.
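
    The combinatorial object the loss marginalises over can be made concrete
    with a tiny sketch: given a gram vocabulary, enumerate the valid
    decompositions of a target word (the gram set below is a made-up example,
    not one learned by Gram-CTC).

        def gram_decompositions(word, grams, max_gram_len=3):
            # Enumerate all segmentations of `word` into units from `grams`.
            if not word:
                return [[]]
            out = []
            for n in range(1, min(max_gram_len, len(word)) + 1):
                head = word[:n]
                if head in grams:
                    for tail in gram_decompositions(word[n:], grams, max_gram_len):
                        out.append([head] + tail)
            return out

        grams = {"t", "h", "e", "th", "he", "the"}
        print(gram_decompositions("the", grams))
        # [['t', 'h', 'e'], ['t', 'he'], ['th', 'e'], ['the']]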

    A Joint Identification Approach for Argumentative Writing Revisions

    Fan Zhang, Diane Litman
    Subjects: Computation and Language (cs.CL)

    Prior work on revision identification typically uses a pipeline method:
    revision extraction is first conducted to identify the locations of revisions
    and revision classification is then conducted on the identified revisions. Such
    a setting propagates the errors of the revision extraction step to the revision
    classification step. This paper proposes an approach that identifies the
    revision location and the revision type jointly to solve the issue of error
    propagation. It utilizes a sequence representation of revisions and conducts
    sequence labeling for revision identification. A mutation-based approach is
    utilized to update identification sequences. Results demonstrate that our
    proposed approach yields better performance on both revision location
    extraction and revision type classification compared to a pipeline baseline.

    Frequency patterns of semantic change: Corpus-based evidence of a near-critical dynamics in language change

    Quentin Feltgen, Benjamin Fagard, Jean-Pierre Nadal
    Subjects: Physics and Society (physics.soc-ph); Computation and Language (cs.CL)

    It is generally believed that, when a linguistic item acquires a new meaning,
    its overall frequency of use in the language rises with time with an S-shaped
    growth curve. Yet, this claim has only been supported by a limited number of
    case studies. In this paper, we provide the first corpus-based quantitative
    confirmation of the genericity of the S-curve in language change. Moreover, we
    uncover another generic pattern, a latency phase of variable duration preceding
    the S-growth, during which the frequency of use of the semantically expanding
    word remains low and more or less constant. We also propose a usage-based model
    of language change supported by cognitive considerations, which predicts that
    both phases, the latency and the fast S-growth, take place. The driving
    mechanism is a stochastic dynamics, a random walk in the space of frequency of
    use. The underlying deterministic dynamics highlights the role of a control
    parameter, the strength of the cognitive impetus governing the onset of change,
    which tunes the system at the vicinity of a saddle-node bifurcation. In the
    neighborhood of the critical point, the latency phase corresponds to the
    diffusion time over the critical region, and the S-growth to the fast
    convergence that follows. The duration of the two phases is computed as
    specific first passage times of the random walk process, leading to
    distributions that fit well the ones extracted from our dataset. We argue that
    our results are not specific to the studied corpus, but apply to semantic
    change in general.
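
    The latency-then-S-growth behaviour near a saddle-node bifurcation is easy
    to reproduce in a toy simulation of the normal form dx/dt = r + x^2 with a
    small amount of noise; the parameter values below are illustrative, not
    fitted to the corpus used in the paper.

        import numpy as np

        def simulate(r=1e-4, noise=5e-4, dt=1.0, steps=20000, seed=0):
            # With r just above 0 the trajectory lingers near x = 0 for a long,
            # variable time (latency) before growing rapidly (S-growth).
            rng = np.random.default_rng(seed)
            x, traj = -0.05, []
            for _ in range(steps):
                x += dt * (r + x * x) + np.sqrt(dt) * noise * rng.normal()
                traj.append(x)
                if x > 1.0:          # stop once the fast growth has taken off
                    break
            return np.array(traj)

        traj = simulate()
        print(f"latency ~ {np.argmax(traj > 0.1)} steps, total {len(traj)} steps")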

    SceneSeer: 3D Scene Design with Natural Language

    Angel X. Chang, Mihail Eric, Manolis Savva, Christopher D. Manning
    Subjects: Graphics (cs.GR); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)

    Designing 3D scenes is currently a creative task that requires significant
    expertise and effort in using complex 3D design interfaces. This effortful
    design process stands in stark contrast to the ease with which people can
    use language to describe real and imaginary environments. We present
    SceneSeer:
    an interactive text to 3D scene generation system that allows a user to design
    3D scenes using natural language. A user provides input text from which we
    extract explicit constraints on the objects that should appear in the scene.
    Given these explicit constraints, the system then uses a spatial knowledge base
    learned from an existing database of 3D scenes and 3D object models to infer an
    arrangement of the objects forming a natural scene matching the input
    description. Using textual commands the user can then iteratively refine the
    created scene by adding, removing, replacing, and manipulating objects. We
    evaluate the quality of 3D scenes generated by SceneSeer in a perceptual
    evaluation experiment where we compare against manually designed scenes and
    simpler baselines for 3D scene generation. We demonstrate how the generated
    scenes can be iteratively refined through simple natural language commands.


    Distributed, Parallel, and Cluster Computing

    On Using Micro-Clouds to Deliver the Fog

    Yehia Elkhatib, Barry Porter, Heverson B. Ribeiro, Mohamed Faten Zhani, Junaid Qadir, Etienne Riviere
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Cloud computing has demonstrated itself to be a scalable and cost-efficient
    solution for many real-world applications. However, its modus operandi is not
    ideally suited to resource-constrained environments that are characterized by
    limited network bandwidth and high latencies. With the increasing proliferation
    and sophistication of edge devices, the idea of fog computing proposes to
    offload some of the computation to the edge. To this end, micro-clouds—which
    are modular and portable assemblies of small single-board computers—have
    started to gain attention as infrastructures to support fog computing by
    offering isolated resource provisioning at the edge in a cost-effective way. We
    investigate the feasibility and readiness of micro-clouds for delivering the
    vision of fog computing. Through a number of experiments, we showcase the
    potential of micro-clouds formed by collections of Raspberry Pi computers to
    host a range of fog-related applications, particularly for locations with
    limited network bandwidth and long latencies.

    Resource Management in Cloud Computing: Classification and Taxonomy

    Swapnil M Parikh, Narendra M Patel, Harshadkumar B Prajapati
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Cloud computing is a new era of remote, Internet-based computing in which
    users can easily access their personal resources from any computer through
    the Internet. The cloud delivers computing as a utility, available to
    consumers on demand under a simple pay-per-use, consumer-provider service
    model. Because it involves a large number of shared resources, resource
    management is a major issue in cloud computing, as in any other computing
    paradigm. Given the finite availability of resources, it is very
    challenging for cloud providers to provide all the requested resources;
    from the providers' perspective, cloud resources must be allocated in a
    fair and efficient manner. No research survey is available that treats
    resource management as a process in cloud computing, so this paper
    provides a detailed, sequential view of the steps of resource management
    in cloud computing. It first classifies the various resources in cloud
    computing, then gives a taxonomy of resource management in cloud computing
    through which one can do further research, and lastly presents comparisons
    of various resource management algorithms.

    Performance and Portability of Accelerated Lattice Boltzmann Applications with OpenACC

    E. Calore, A. Gabbana, J. Kraus, S. F. Schifano, R. Tripiccione
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    An increasingly large number of HPC systems rely on heterogeneous
    architectures combining traditional multi-core CPUs with power efficient
    accelerators. Designing efficient applications for these systems has been
    troublesome in the past, as accelerators usually had to be programmed using
    device-specific languages, threatening maintainability, portability and
    correctness. Several new programming environments try to tackle this
    problem.
    Among them, OpenACC offers a high-level approach based on compiler directive
    clauses to mark regions of existing C, C++ or Fortran codes to run on
    accelerators. This approach directly addresses code portability, leaving to
    compilers the support of each different accelerator, but one has to carefully
    assess the relative costs of portable approaches versus computing efficiency.
    In this paper we address precisely this issue, using as a test-bench a
    massively parallel Lattice Boltzmann algorithm. We first describe our
    multi-node implementation and optimization of the algorithm, using OpenACC and
    MPI. We then benchmark the code on a variety of processors, including
    traditional CPUs and GPUs, and make accurate performance comparisons with other
    GPU implementations of the same algorithm using CUDA and OpenCL. We also
    assess the performance impact associated with portable programming, and the
    actual portability and performance-portability of OpenACC-based
    applications across several state-of-the-art architectures.

    Massively parallel lattice-Boltzmann codes on large GPU clusters

    E. Calore, A. Gabbana, J. Kraus, E. Pellegrini, S.F. Schifano, R. Tripiccione
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    This paper describes a massively parallel code for a state-of-the-art
    thermal lattice-Boltzmann method. Our code has been carefully optimized for
    performance on one GPU and to have a good scaling behavior extending to a large
    number of GPUs. Versions of this code have been already used for large-scale
    studies of convective turbulence. GPUs are becoming increasingly popular in HPC
    applications, as they are able to deliver higher performance than traditional
    processors. Writing efficient programs for large clusters is not an easy task
    as codes must adapt to increasingly parallel architectures, and the overheads
    of node-to-node communications must be properly handled. We describe the
    structure of our code, discussing several key design choices that were guided
    by theoretical models of performance and experimental benchmarks. We present an
    extensive set of performance measurements and identify the corresponding
    main bottlenecks; finally, we compare the results of our GPU code with
    those measured on other currently available high-performance processors.
    Our results are a production-grade code able to deliver a sustained
    performance of several tens of Tflops, as well as a design and
    optimization methodology that can be
    used for the development of other high performance applications for
    computational physics.

    Reproducible experiments on dynamic resource allocation in cloud data centers

    Andreas Wolke, Martin Bichler, Fernando Chirigati, Victoria Steeves
    Journal-ref: Information Systems, Volume 59, July 2016, Pages 98-101, ISSN
    0306-4379
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    In Wolke et al. [1] we compare the efficiency of different resource
    allocation strategies experimentally. We focused on dynamic environments where
    virtual machines need to be allocated and deallocated to servers over time. In
    this companion paper, we describe the simulation framework and how to run
    simulations to replicate experiments or run new experiments within the
    framework.

    Preserving Differential Privacy Between Features in Distributed Estimation

    Christina Heinze-Deml, Brian McWilliams, Nicolai Meinshausen
    Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

    Privacy is crucial in many applications of machine learning. Legal, ethical
    and societal issues restrict the sharing of sensitive data making it difficult
    to learn from datasets that are partitioned between many parties. One important
    instance of such a distributed setting arises when information about each
    record in the dataset is held by different data owners (the design matrix is
    “vertically-partitioned”). In this setting few approaches exist for private
    data sharing for the purposes of statistical estimation and the classical setup
    of differential privacy with a “trusted curator” preparing the data does not
    apply. We introduce S-differential privacy which extends single-party
    differential privacy to the distributed, vertically-partitioned case. We then
    propose PriDE, a scalable framework for distributed estimation where each party
    communicates perturbed sketches of their locally held features ensuring
    S-differential privacy is preserved. For L2-penalized supervised learning
    problems PriDE has bounded estimation error compared with the optimal estimates
    obtained without privacy constraints in the non-distributed setting. We confirm
    this empirically on real world and synthetic datasets.
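
    A minimal sketch of the "perturbed sketch" idea, with a placeholder noise
    scale and sketch size (calibrating the noise to a formal privacy level is
    the crux of the actual method and is not reproduced here):

        import numpy as np

        def perturbed_sketch(X_local, sketch_dim, noise_scale, proj_seed=0):
            # X_local: (n, d_local) block of features held by one party.
            # The party shares only a randomly projected, noise-perturbed view
            # of its own columns instead of the raw features.
            proj_rng = np.random.default_rng(proj_seed)
            S = proj_rng.normal(size=(X_local.shape[1], sketch_dim)) / np.sqrt(sketch_dim)
            noise = np.random.default_rng().normal(scale=noise_scale,
                                                   size=(X_local.shape[0], sketch_dim))
            return X_local @ S + noise

        # each party computes its sketch locally, e.g.
        #   sketches = [perturbed_sketch(X_j, 64, 0.1) for X_j in party_blocks]
        # and only the sketches are exchanged for the distributed estimation.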


    Learning

    OptNet: Differentiable Optimization as a Layer in Neural Networks

    Brandon Amos, J. Zico Kolter
    Comments: Submitted to ICML 2017
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)

    This paper presents OptNet, a network architecture that integrates
    optimization problems (here, specifically in the form of quadratic programs) as
    individual layers in larger end-to-end trainable deep networks. These layers
    allow complex dependencies between the hidden states to be captured that
    traditional convolutional and fully-connected layers are not able to capture.
    In this paper, we develop the foundations for such an architecture: we derive
    the equations to perform exact differentiation through these layers and with
    respect to layer parameters; we develop a highly efficient solver for these
    layers that exploits fast GPU-based batch solves within a primal-dual interior
    point method, and which provides backpropagation gradients with virtually no
    additional cost on top of the solve; and we highlight the application of these
    approaches in several problems. In one particularly standout example, we show
    that the method is capable of learning to play Sudoku given just input and
    output games, with no a priori information about the rules of the game; this
    task is virtually impossible for other neural network architectures that we
    have experimented with, and highlights the representation capabilities of our
    approach.
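
    A stripped-down illustration of the idea, assuming only equality
    constraints so the QP can be solved through its KKT linear system and
    PyTorch's autograd differentiates through the solve (the actual OptNet
    layer handles inequality constraints with a batched primal-dual
    interior-point solver and analytic gradients):

        import torch

        def eq_qp_layer(Q, q, A, b):
            # Solve  min_z 0.5 z^T Q z + q^T z   s.t.  A z = b
            # via the KKT system [Q A^T; A 0][z; lam] = [-q; b]; gradients
            # w.r.t. Q, q, A, b flow through torch.linalg.solve automatically.
            n, m = Q.shape[0], A.shape[0]
            K = torch.zeros(n + m, n + m, dtype=Q.dtype)
            K[:n, :n] = Q
            K[:n, n:] = A.t()
            K[n:, :n] = A
            rhs = torch.cat([-q, b])
            return torch.linalg.solve(K, rhs)[:n]

        # toy usage: learn q so that the QP solution matches a target vector
        Q = torch.eye(3)
        A = torch.tensor([[1.0, 1.0, 1.0]])
        b = torch.tensor([1.0])
        q = torch.zeros(3, requires_grad=True)
        z = eq_qp_layer(Q, q, A, b)
        loss = ((z - torch.tensor([0.6, 0.3, 0.1])) ** 2).sum()
        loss.backward()
        print(q.grad)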

    Learning to Optimize Neural Nets

    Ke Li, Jitendra Malik
    Comments: 10 pages, 15 figures
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)

    Learning to Optimize is a recently proposed framework for learning
    optimization algorithms using reinforcement learning. In this paper, we explore
    learning an optimization algorithm for training shallow neural nets. Such
    high-dimensional stochastic optimization problems present interesting
    challenges for existing reinforcement learning algorithms. We develop an
    extension that is suited to learning optimization algorithms in this setting
    and demonstrate that the learned optimization algorithm consistently
    outperforms other known optimization algorithms even on unseen tasks and is
    robust to changes in stochasticity of gradients and the neural net
    architecture. More specifically, we show that an optimization algorithm trained
    with the proposed method on the problem of training a neural net on MNIST
    generalizes to the problems of training neural nets on the Toronto Faces
    Dataset, CIFAR-10 and CIFAR-100.

    Fast k-Nearest Neighbour Search via Prioritized DCI

    Ke Li, Jitendra Malik
    Comments: 11 pages, 5 figures
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Information Retrieval (cs.IR); Machine Learning (stat.ML)

    Most exact methods for k-nearest neighbour search suffer from the curse of
    dimensionality; that is, their query times exhibit exponential dependence on
    either the ambient or the intrinsic dimensionality. Dynamic Continuous Indexing
    (DCI) offers a promising way of circumventing the curse by avoiding space
    partitioning and achieves a query time that grows sublinearly in the intrinsic
    dimensionality. In this paper, we develop a variant of DCI, which we call
    Prioritized DCI, and show a further improvement in the dependence on the
    intrinsic dimensionality compared to standard DCI, thereby improving the
    performance of DCI on datasets with high intrinsic dimensionality. We also
    demonstrate empirically that Prioritized DCI compares favourably to standard
    DCI and Locality-Sensitive Hashing (LSH) both in terms of running time and
    space consumption at all levels of approximation quality. In particular,
    relative to LSH, Prioritized DCI reduces the number of distance evaluations by
    a factor of 5 to 30 and the space consumption by a factor of 47 to 55.

    The Statistical Recurrent Unit

    Junier B. Oliva, Barnabas Poczos, Jeff Schneider
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Sophisticated gated recurrent neural network architectures like LSTMs and
    GRUs have been shown to be highly effective in a myriad of applications. We
    develop an un-gated unit, the statistical recurrent unit (SRU), that is able to
    learn long term dependencies in data by only keeping moving averages of
    statistics. The SRU’s architecture is simple, un-gated, and contains a
    comparable number of parameters to LSTMs; yet, SRUs compare favorably to
    more sophisticated LSTM and GRU alternatives, often outperforming one or
    both in
    various tasks. We show the efficacy of SRUs as compared to LSTMs and GRUs in an
    unbiased manner by optimizing respective architectures’ hyperparameters in a
    Bayesian optimization scheme for both synthetic and real-world tasks.
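
    The core mechanism is easy to sketch: keep exponential moving averages of a
    nonlinear feature map at several decay rates and use their concatenation as
    the recurrent summary. Dimensions, decay rates and the exact wiring below
    are placeholders; the paper's SRU also feeds the kept statistics back into
    the feature map through learned projections.

        import numpy as np

        class MovingAverageUnit:
            def __init__(self, d_in, d_feat, alphas=(0.0, 0.5, 0.9, 0.99), seed=0):
                rng = np.random.default_rng(seed)
                self.W = rng.normal(scale=0.1, size=(d_feat, d_in))
                self.alphas = alphas
                self.u = np.zeros((len(alphas), d_feat))   # one average per scale

            def step(self, x):
                phi = np.maximum(0.0, self.W @ x)          # per-step statistics
                for i, a in enumerate(self.alphas):
                    self.u[i] = a * self.u[i] + (1.0 - a) * phi
                return self.u.ravel()                      # multi-scale summary

        unit = MovingAverageUnit(d_in=8, d_feat=16)
        for x in np.random.default_rng(1).normal(size=(5, 8)):
            h = unit.step(x)
        print(h.shape)    # (64,) = 4 time scales x 16 features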

    Personal Model Training under Privacy Constraints

    Sandra Servia-Rodriguez, Liang Wang, Jianxin R. Zhao, Richard Mortier, Hamed Haddadi
    Comments: Databox Project Technical Report
    Subjects: Learning (cs.LG)

    Many current Internet services rely on inferences from models trained on user
    data. Commonly, both the training and inference tasks are carried out using
    cloud resources fed by personal data collected at scale from users. Holding and
    using such large collections of personal data in the cloud creates privacy
    risks to the data subjects, but is currently required for users to benefit from
    such services. We explore how to provide for model training and inference in a
    system where computation is moved to the data in preference to moving data to
    the cloud, obviating many current privacy risks. Specifically, we take an
    initial model learnt from a small set of users and retrain it locally using
    data from a single user. We evaluate on two tasks: one supervised learning
    task, using a neural network to recognise users’ current activity from
    accelerometer traces; and one unsupervised learning task, identifying topics in
    a large set of documents. In both cases the accuracy is improved. We also
    demonstrate the feasibility of our approach by presenting a performance
    evaluation on a representative resource-constrained device (a Raspberry Pi).

    Gradient Boosting on Stochastic Data Streams

    Hanzhang Hu, Wen Sun, Arun Venkatraman, Martial Hebert, J. Andrew Bagnell
    Comments: To appear in AISTATS 2017
    Subjects: Learning (cs.LG)

    Boosting is a popular ensemble algorithm that generates more powerful
    learners by linearly combining base models from a simpler hypothesis class. In
    this work, we investigate the problem of adapting batch gradient boosting
    for minimizing convex loss functions to the online setting, where the loss
    at each iteration is i.i.d. sampled from an unknown distribution. To
    generalize from batch to online, we first introduce the definition of an
    online weak learning edge, with which, for strongly convex and smooth loss
    functions, we present an algorithm, Streaming Gradient Boosting (SGB),
    with exponential shrinkage guarantees in the number of weak learners. We
    further present an adaptation of SGB to optimize non-smooth loss
    functions, for which we derive an O(ln N/N) convergence rate. We also show
    that our analysis extends to the adversarial online learning setting under
    the stronger assumption that the online weak learning edge holds in the
    adversarial setting. We finally demonstrate
    experimental results showing that in practice our algorithms can achieve
    competitive results as classic gradient boosting while using less computation.
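
    The stagewise "fit the negative functional gradient" structure can be
    sketched in a streaming form with linear weak learners and squared loss;
    this is only an illustration of the setting, not the SGB algorithm itself
    (which relies on an online weak-learning-edge assumption and specific
    step-size and shrinkage choices).

        import numpy as np

        class StreamingBoosting:
            # Additive model F(x) = shrinkage * sum_j (w_j . x) built on a
            # stream: for each incoming (x, y), weak learner j takes one SGD
            # step towards the residual left by learners 1..j-1.
            def __init__(self, d, n_learners=5, lr=0.05, shrinkage=0.5):
                self.W = np.zeros((n_learners, d))
                self.lr, self.shrinkage = lr, shrinkage

            def predict(self, x):
                return self.shrinkage * (self.W @ x).sum()

            def update(self, x, y):
                partial = 0.0
                for j in range(self.W.shape[0]):
                    residual = y - self.shrinkage * partial
                    pred_j = self.W[j] @ x
                    self.W[j] += self.lr * (residual - pred_j) * x
                    partial += self.W[j] @ x

        rng = np.random.default_rng(0)
        model = StreamingBoosting(d=2)
        for _ in range(5000):                       # stream: y = 2*x0 - x1 + noise
            x = rng.normal(size=2)
            model.update(x, 2 * x[0] - x[1] + 0.1 * rng.normal())
        print(model.predict(np.array([1.0, 0.0])))  # roughly 2, up to shrinkage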

    Theoretical Properties for Neural Networks with Weight Matrices of Low Displacement Rank

    Liang Zhao, Siyu Liao, Yanzhi Wang, Jian Tang, Bo Yuan
    Comments: 13 pages, 1 figure
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Recently low displacement rank (LDR) matrices, or so-called structured
    matrices, have been proposed to compress large-scale neural networks. Empirical
    results have shown that neural networks whose weight matrices are LDR
    matrices, referred to as LDR neural networks, can achieve significant
    reductions in space and computational complexity while retaining high
    accuracy. We formally study LDR matrices in deep learning. First, we prove
    the universal approximation property of LDR neural networks under a mild
    condition on the displacement operators. We then show that the error
    bounds of LDR neural networks are as efficient as those of general neural
    networks, for both single-layer and multi-layer structures. Finally, we
    propose a back-propagation-based training algorithm for general LDR neural
    networks.

    Dual Iterative Hard Thresholding: From Non-convex Sparse Minimization to Non-smooth Concave Maximization

    Bo Liu, Xiao-Tong Yuan, Lezi Wang, Qingshan Liu, Dimitris N. Metaxas
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Iterative Hard Thresholding (IHT) is a class of projected gradient descent
    methods for optimizing sparsity-constrained minimization models, with the best
    known efficiency and scalability in practice. As far as we know, the existing
    IHT-style methods are designed for sparse minimization in primal form. It
    remains open to explore duality theory and algorithms in such a non-convex and
    NP-hard setting. In this article, we bridge the gap by establishing a duality
    theory for sparsity-constrained minimization with (ell_2)-regularized
    objective and proposing an IHT-style algorithm for dual maximization. Our
    sparse duality theory provides a set of sufficient and necessary conditions
    under which the original NP-hard/non-convex problem can be equivalently solved
    in a dual space. The proposed dual IHT algorithm is a super-gradient method for
    maximizing the non-smooth dual objective. An interesting finding is that
    the sparse recovery performance of dual IHT is invariant to the Restricted
    Isometry Property (RIP), which is required by all existing primal IHT
    methods without sparsity relaxation. Moreover, a stochastic variant of
    dual IHT is proposed for large-scale stochastic optimization. Numerical
    results demonstrate that dual IHT algorithms can achieve more accurate
    model estimation given a small amount of training data, and have higher
    computational efficiency than state-of-the-art primal IHT-style
    algorithms.
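
    For reference, the classical primal IHT scheme the paper starts from is a
    few lines (a gradient step followed by keeping the k largest-magnitude
    coefficients); the dual variant proposed in the paper maximises a
    non-smooth dual objective instead and is not reproduced in this sketch.

        import numpy as np

        def hard_threshold(v, k):
            # Keep the k largest-magnitude entries of v, zero the rest.
            out = np.zeros_like(v)
            idx = np.argsort(np.abs(v))[-k:]
            out[idx] = v[idx]
            return out

        def iht(A, y, k, iters=200):
            # Primal IHT for  min_w 0.5 * ||A w - y||^2  s.t.  ||w||_0 <= k.
            step = 1.0 / np.linalg.norm(A, 2) ** 2
            w = np.zeros(A.shape[1])
            for _ in range(iters):
                w = hard_threshold(w - step * (A.T @ (A @ w - y)), k)
            return w

        rng = np.random.default_rng(0)
        A = rng.normal(size=(100, 50))
        w_true = np.zeros(50); w_true[[3, 17, 42]] = [1.0, -2.0, 0.5]
        y = A @ w_true + 0.01 * rng.normal(size=100)
        print(np.nonzero(iht(A, y, k=3))[0])   # expected support: [ 3 17 42]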

    On the Power of Learning from (k)-Wise Queries

    Vitaly Feldman, Badih Ghazi
    Comments: 32 pages, Appeared in Innovations in Theoretical Computer Science (ITCS) 2017
    Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS)

    Several well-studied models of access to data samples, including statistical
    queries, local differential privacy and low-communication algorithms rely on
    queries that provide information about a function of a single sample. (For
    example, a statistical query (SQ) gives an estimate of (E_{x sim D}[q(x)])
    for any choice of the query function (q) mapping (X) to the reals, where (D) is
    an unknown data distribution over (X).) Yet some data analysis algorithms rely
    on properties of functions that depend on multiple samples. Such algorithms
    would be naturally implemented using (k)-wise queries each of which is
    specified by a function (q) mapping (X^k) to the reals. Hence it is natural to
    ask whether algorithms using (k)-wise queries can solve learning problems more
    efficiently and by how much.

    Blum, Kalai and Wasserman (2003) showed that for any weak PAC learning
    problem over a fixed distribution, the complexity of learning with (k)-wise SQs
    is smaller than the (unary) SQ complexity by a factor of at most (2^k). We show
    that for more general problems over distributions the picture is substantially
    richer. For every (k), the complexity of distribution-independent PAC learning
    with (k)-wise queries can be exponentially larger than learning with
    ((k+1))-wise queries. We then give two approaches for simulating a (k)-wise
    query using unary queries. The first approach exploits the structure of the
    problem that needs to be solved. It generalizes and strengthens (exponentially)
    the results of Blum et al. It allows us to derive strong lower bounds for
    learning DNF formulas and stochastic constraint satisfaction problems that hold
    against algorithms using (k)-wise queries. The second approach exploits the
    (k)-party communication complexity of the (k)-wise query function.

    Achieving non-discrimination in prediction

    Lu Zhang (1), Yongkai Wu (1), Xintao Wu (1) ((1) University of Arkansas)
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Discrimination-aware classification is receiving increasing attention in
    the data mining and machine learning fields. Data preprocessing methods
    for constructing a discrimination-free classifier remove discrimination
    from the training data and learn the classifier from the cleaned data.
    However, a theoretical guarantee for the performance of these methods is
    lacking. In this paper, we fill this theoretical gap by mathematically
    bounding the probability that the discrimination in predictions lies
    within a given interval, in terms of the given training data and
    classifier. In our analysis, we adopt a causal model for modeling the
    mechanisms of data generation, and formally define discrimination in the
    population, in a dataset, and in the prediction. The
    theoretical results show that the fundamental assumption made by the data
    preprocessing methods is not correct. Finally, we develop a framework for
    constructing a discrimination-free classifier with a theoretical guarantee.

    Provable Optimal Algorithms for Generalized Linear Contextual Bandits

    Lihong Li, Yu Lu, Dengyong Zhou
    Comments: 15 pages
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Contextual bandits are widely used in Internet services from news
    recommendation to advertising. Generalized linear models (logistic
    regression in particular) have demonstrated stronger performance than
    linear models in
    many applications. However, most theoretical analyses on contextual bandits so
    far are on linear bandits. In this work, we propose an upper confidence bound
    based algorithm for generalized linear contextual bandits, which achieves an
    (tilde{O}(sqrt{dT})) regret over (T) rounds with (d)-dimensional feature
    vectors. This regret matches the minimax lower bound, up to logarithmic terms,
    and improves on the best previous result by a (sqrt{d}) factor, assuming the
    number of arms is fixed. A key component in our analysis is to establish a new,
    sharp finite-sample confidence bound for maximum-likelihood estimates in
    generalized linear models, which may be of independent interest. We also
    analyze a simpler upper confidence bound algorithm, which is useful in
    practice, and prove it to have optimal regret in certain cases.
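
    The flavour of such an algorithm can be sketched for a logistic bandit:
    refit the maximum-likelihood estimate by a few Newton steps on the data
    seen so far and pick the arm maximising the mean estimate plus an
    exploration bonus. The width constant below is a tuning knob of the
    sketch, not the theoretically calibrated value from the paper.

        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        class UCBGLM:
            def __init__(self, d, alpha=1.0, reg=1.0):
                self.d, self.alpha, self.reg = d, alpha, reg
                self.X, self.r = [], []
                self.theta = np.zeros(d)

            def choose(self, arm_features):
                # score = x . theta_hat + alpha * sqrt(x^T V^{-1} x)
                V = self.reg * np.eye(self.d) + sum(np.outer(x, x) for x in self.X)
                Vinv = np.linalg.inv(V)
                scores = [x @ self.theta + self.alpha * np.sqrt(x @ Vinv @ x)
                          for x in arm_features]
                return int(np.argmax(scores))

            def update(self, x, reward, newton_steps=5):
                self.X.append(x); self.r.append(reward)
                X, r = np.array(self.X), np.array(self.r)
                for _ in range(newton_steps):       # refit the regularised MLE
                    p = sigmoid(X @ self.theta)
                    g = X.T @ (p - r) + self.reg * self.theta
                    H = (X * (p * (1 - p))[:, None]).T @ X + self.reg * np.eye(self.d)
                    self.theta -= np.linalg.solve(H, g)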

    Doubly Accelerated Stochastic Variance Reduced Dual Averaging Method for Regularized Empirical Risk Minimization

    Tomoya Murata, Taiji Suzuki
    Comments: 25 pages, 9 figures
    Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)

    In this paper, we develop a new accelerated stochastic gradient method for
    efficiently solving the convex regularized empirical risk minimization problem
    in mini-batch settings. The use of mini-batches is becoming the gold
    standard in the machine learning community, because mini-batch settings
    stabilize the gradient estimate and can easily make good use of parallel
    computing. The core of our proposed method is the incorporation of our new
    “double acceleration” technique and a variance reduction technique. We
    theoretically analyze our proposed method and show that it significantly
    improves the mini-batch efficiency of previous accelerated stochastic
    methods, and essentially needs only mini-batches of size (sqrt{n}) to
    achieve the optimal iteration complexities for both non-strongly and
    strongly convex objectives, where (n) is
    the training set size. Further, we show that even in non-mini-batch settings,
    our method surpasses the best known convergence rate for non-strongly convex
    objectives, and it achieves the one for strongly convex objectives.

    Virtual-to-real Deep Reinforcement Learning: Continuous Control of Mobile Robots for Mapless Navigation

    Lei Tai, Giuseppe Paolo, Ming Liu
    Comments: 8 pages, 9 figures, under review, video: this https URL
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Deep Reinforcement Learning has been successful in various virtual tasks,
    but it is still rarely used in real-world applications, especially for the
    continuous control of mobile robot navigation. In this paper, we present a
    learning-based mapless motion planner that takes the 10-dimensional range
    findings and the target position as input and produces continuous steering
    commands as output. Traditional motion planners for mobile ground robots
    with a laser range sensor mostly depend on a map of the navigation
    environment, for which both a highly precise laser sensor and the work of
    building a map of the environment are indispensable. We show that, through
    an asynchronous deep reinforcement
    learning method, a mapless motion planner can be trained end-to-end without any
    manually designed features and prior demonstrations. The trained planner can be
    directly applied in unseen virtual and real environments. We also evaluated
    this learning-based motion planner and compared it with the traditional motion
    planning method, both in virtual and real environments. The experiments show
    that the proposed mapless motion planner can navigate the nonholonomic mobile
    robot to the desired targets without colliding with any obstacles.

    Detecting Adversarial Samples from Artifacts

    Reuben Feinman, Ryan R. Curtin, Saurabh Shintre, Andrew B. Gardner
    Comments: Submitted to ICML 2017
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Deep neural networks (DNNs) are powerful nonlinear architectures that are
    known to be robust to random perturbations of the input. However, these models
    are vulnerable to adversarial perturbations: small input changes crafted
    explicitly to fool the model. In this paper, we ask whether a DNN can
    distinguish adversarial samples from their normal and noisy counterparts. We
    investigate model confidence on adversarial samples by looking at Bayesian
    uncertainty estimates, available in dropout neural networks, and by performing
    density estimation in the subspace of deep features learned by the model. The
    result is a method for implicit adversarial detection that is oblivious to the
    attack algorithm. We evaluate this method on a variety of standard datasets
    including MNIST and CIFAR-10 and show that it generalizes well across different
    architectures and attacks. Our findings report that 85-92% ROC-AUC can be
    achieved on a number of standard classification tasks with a negative class
    that consists of both normal and noisy samples.
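
    The two detection signals can be sketched with placeholders for the model
    and feature extractor (both assumed to exist; the combination into a final
    detector and its calibration are not reproduced here):

        import numpy as np

        def dropout_uncertainty(predict_stochastic, x, n_passes=50):
            # predict_stochastic(x) must run the network with dropout enabled
            # and return a class-probability vector; the variance across
            # passes is the Bayesian-style uncertainty signal.
            probs = np.stack([predict_stochastic(x) for _ in range(n_passes)])
            return probs.var(axis=0).mean()

        def feature_density(feat, train_feats_of_pred_class, bandwidth=1.0):
            # Gaussian kernel density of the sample's deep features against
            # training features of the predicted class; adversarial inputs
            # tend to fall in low-density regions of this space.
            d2 = ((train_feats_of_pred_class - feat) ** 2).sum(axis=1)
            return np.mean(np.exp(-d2 / (2.0 * bandwidth ** 2)))

        # A sample is flagged when uncertainty is high and density is low.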

    Preserving Differential Privacy Between Features in Distributed Estimation

    Christina Heinze-Deml, Brian McWilliams, Nicolai Meinshausen
    Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

    Privacy is crucial in many applications of machine learning. Legal, ethical
    and societal issues restrict the sharing of sensitive data making it difficult
    to learn from datasets that are partitioned between many parties. One important
    instance of such a distributed setting arises when information about each
    record in the dataset is held by different data owners (the design matrix is
    “vertically-partitioned”). In this setting few approaches exist for private
    data sharing for the purposes of statistical estimation and the classical setup
    of differential privacy with a “trusted curator” preparing the data does not
    apply. We introduce S-differential privacy which extends single-party
    differential privacy to the distributed, vertically-partitioned case. We then
    propose PriDE, a scalable framework for distributed estimation where each party
    communicates perturbed sketches of their locally held features ensuring
    S-differential privacy is preserved. For L2-penalized supervised learning
    problems PriDE has bounded estimation error compared with the optimal estimates
    obtained without privacy constraints in the non-distributed setting. We confirm
    this empirically on real world and synthetic datasets.

    Graph-based Isometry Invariant Representation Learning

    Renata Khasanova, Pascal Frossard
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Learning transformation invariant representations of visual data is an
    important problem in computer vision. Deep convolutional networks have
    demonstrated remarkable results for image and video classification tasks.
    However, they have achieved only limited success in the classification of
    images that undergo geometric transformations. In this work we present a novel
    Transformation Invariant Graph-based Network (TIGraNet), which learns
    graph-based features that are inherently invariant to isometric transformations
    such as rotation and translation of input images. In particular, images are
    represented as signals on graphs, which permits replacing classical
    convolution and pooling layers in deep networks with graph spectral
    convolution
    and dynamic graph pooling layers that together contribute to invariance to
    isometric transformations. Our experiments show high performance on rotated and
    translated images from the test set compared to classical architectures that
    are very sensitive to transformations in the data. The inherent invariance
    properties of our framework provide key advantages, such as increased
    resiliency to data variability and sustained performance with limited training
    sets.

    L(^3)-SVMs: Landmarks-based Linear Local Support Vectors Machines

    Valentina Zantedeschi, Rémi Emonet, Marc Sebban
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    For their ability to capture non-linearities in the data and to scale to
    large training sets, local Support Vector Machines (SVMs) have received
    special attention during the past decade. In this paper, we introduce a new
    local SVM method, called L(^3)-SVMs, which clusters the input space, carries
    out dimensionality reduction by projecting the data on landmarks, and jointly
    learns a linear combination of local models. Simple and effective, our
    algorithm is also theoretically well-founded. Using the framework of Uniform
    Stability, we show that our SVM formulation comes with generalization
    guarantees on the true risk. The experiments based on the simplest
    configuration of our model (i.e. landmarks randomly selected, linear
    projection, linear kernel) show that L(^3)-SVMs is very competitive w.r.t. the
    state of the art and opens the door to new exciting lines of research.
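
    The simplest configuration mentioned above (random landmarks, linear
    projection, linear kernel) can be approximated in a few lines of
    scikit-learn; note that this per-cluster fit is only an approximation,
    since the paper learns the local models jointly.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.svm import LinearSVC

        class LocalLandmarkSVM:
            def __init__(self, n_clusters=4, n_landmarks=20, seed=0):
                self.kmeans = KMeans(n_clusters=n_clusters, random_state=seed,
                                     n_init=10)
                self.n_landmarks, self.seed = n_landmarks, seed

            def fit(self, X, y):
                rng = np.random.default_rng(self.seed)
                self.landmarks = X[rng.choice(len(X), self.n_landmarks,
                                              replace=False)]
                clusters = self.kmeans.fit_predict(X)
                Z = X @ self.landmarks.T              # linear landmark projection
                self.models = {c: LinearSVC().fit(Z[clusters == c],
                                                  y[clusters == c])
                               for c in np.unique(clusters)}
                return self

            def predict(self, X):
                clusters = self.kmeans.predict(X)
                Z = X @ self.landmarks.T
                return np.array([self.models[c].predict(z[None, :])[0]
                                 for c, z in zip(clusters, Z)])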

    Modular Representation of Layered Neural Networks

    Chihiro Watanabe, Kaoru Hiramatsu, Kunio Kashino
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Deep neural networks have greatly improved the performance of various
    applications including image processing, speech recognition, natural language
    processing, and bioinformatics. However, it is still difficult to discover or
    interpret knowledge from the inference provided by a deep neural network, since
    its internal representation has many nonlinear and complex parameters embedded
    in hierarchical layers. Therefore, it becomes important to establish a new
    methodology by which deep neural networks can be understood.

    In this paper, we propose a new method for extracting a global and simplified
    structure from a layered neural network. Based on network analysis, the
    proposed method detects communities or clusters of units with similar
    connection patterns. We show its effectiveness by applying it to three use
    cases. (1) Network decomposition: it can decompose a trained neural network
    into multiple small independent networks thus dividing the problem and reducing
    the computation time. (2) Training assessment: the appropriateness of a trained
    result with a given hyperparameter or randomly chosen initial parameters can be
    evaluated by using a modularity index. (3) Data analysis: for practical
    data, it reveals the community structure in the input, hidden, and output
    layers, which serves as a clue for discovering knowledge from a trained
    neural network.

    Easy over Hard: A Case Study on Deep Learning

    Wei Fu, Tim Menzies
    Comments: 12 pages, 6 figures, submitted to FSE2017
    Subjects: Software Engineering (cs.SE); Learning (cs.LG)

    While deep learning is an exciting new technique, the benefits of this method
    need to be assessed w.r.t. its computational cost. This is particularly
    important for deep learning since these learners need hours (to weeks) to train
    the model. Such long CPU times limit the ability of (a) a researcher to test
    the stability of their conclusion via repeated runs with different random
    seeds; and (b) other researchers to repeat, improve, or even refute that
    original work. For example, recently, deep learning was used to find which
    questions in the Stack Overflow programmer discussion forum can be linked
    together. That system took 14 hours to execute. We show here that a very
    simple optimizer called DE (differential evolution) can achieve similar
    (and sometimes better) results. The DE approach terminated in 10 minutes,
    i.e., 84 times faster than deep learning. We offer these results as a
    cautionary tale to the software
    analytics community and suggest that not every new innovation should be applied
    without critical analysis. If researchers deploy some new and expensive
    process, that work should be baselined against some simpler and faster
    alternatives.
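
    For context, a minimal DE (rand/1/bin) loop of the kind used as the
    baseline tuner fits in a couple of dozen lines; the population size and
    control parameters below are common defaults, not necessarily the settings
    used in the paper.

        import numpy as np

        def differential_evolution(objective, bounds, pop_size=20, F=0.8, CR=0.9,
                                   generations=100, seed=0):
            # bounds: list of (low, high) pairs, one per parameter;
            # objective maps a parameter vector to a score to minimise.
            rng = np.random.default_rng(seed)
            lo, hi = np.array(bounds).T
            pop = rng.uniform(lo, hi, size=(pop_size, len(bounds)))
            scores = np.array([objective(p) for p in pop])
            for _ in range(generations):
                for i in range(pop_size):
                    a, b, c = pop[rng.choice(pop_size, 3, replace=False)]
                    mutant = np.clip(a + F * (b - c), lo, hi)
                    cross = rng.random(len(bounds)) < CR
                    cross[rng.integers(len(bounds))] = True
                    trial = np.where(cross, mutant, pop[i])
                    s = objective(trial)
                    if s < scores[i]:
                        pop[i], scores[i] = trial, s
            return pop[scores.argmin()], scores.min()

        # e.g. tuning two hyper-parameters of some learner:
        best, best_score = differential_evolution(
            lambda p: (p[0] - 1) ** 2 + p[1] ** 2, bounds=[(-5, 5), (-5, 5)])
        print(best, best_score)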

    Revisiting Unsupervised Learning for Defect Prediction

    Wei Fu, Tim Menzies
    Comments: 11 pages, 5 figures. Submitted to FSE2017
    Subjects: Software Engineering (cs.SE); Learning (cs.LG)

    Collecting quality data from software projects can be time-consuming and
    expensive. Hence, some researchers explore “unsupervised” approaches to
    quality prediction that do not require labelled data. An alternate
    technique is to use “supervised” approaches that learn models from project
    data labelled with, say, “defective” or “not-defective”. Most researchers
    use these supervised
    models since, it is argued, they can exploit more knowledge of the projects. At
    FSE’16, Yang et al. reported startling results where unsupervised defect
    predictors outperformed supervised predictors for effort-aware just-in-time
    defect prediction. If confirmed, these results would lead to a dramatic
    simplification of a seemingly complex task (data mining) that is widely
    explored in the SE literature. This paper repeats and refutes those results as
    follows. (1) There is much variability in the efficacy of the Yang et al.
    models, so even with their approach, some supervised data is required to
    prune weaker models. (2) Their findings were grouped across (N) projects.
    When we repeat their analysis on a project-by-project basis, supervised
    predictors are seen to work better. Even though this paper rejects the
    specific conclusions of Yang et al., we still endorse their general goal.
    In our experiments, supervised predictors did not perform outstandingly
    better than unsupervised ones. Hence, there may indeed be some combination
    of unsupervised learners that achieves performance comparable to
    supervised ones. We therefore encourage others to work in this promising
    area.

    SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient

    Lam Nguyen, Jie Liu, Katya Scheinberg, Martin Takáč
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Optimization and Control (math.OC)

    In this paper, we propose a StochAstic Recursive grAdient algoritHm
    (SARAH), as well as its practical variant SARAH+, as a novel approach to
    finite-sum minimization problems. Different from vanilla SGD and other
    modern stochastic methods such as SVRG, S2GD, SAG and SAGA, SARAH admits a
    simple recursive framework for updating stochastic gradient estimates;
    compared to SAG/SAGA, SARAH does not require storage of past gradients.
    The linear convergence rate of SARAH is proven under a strong convexity
    assumption. We also prove a linear convergence rate (in the strongly
    convex case) for an inner loop of SARAH, a property that SVRG does not
    possess. Numerical experiments
    demonstrate the efficiency of our algorithm.
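
    The recursive estimate at the core of the method can be sketched on a toy
    finite-sum least-squares problem; the step size and loop lengths below are
    illustrative defaults, not the values prescribed by the theory.

        import numpy as np

        def sarah(grad_i, full_grad, w0, n, step=0.05, inner=None, outer=10,
                  seed=0):
            # Inner loop: v_t = grad_i(w_t) - grad_i(w_{t-1}) + v_{t-1};
            # a full gradient is computed once per outer iteration.
            rng = np.random.default_rng(seed)
            inner = inner or n
            w = w0.copy()
            for _ in range(outer):
                v = full_grad(w)
                w_prev, w = w.copy(), w - step * v
                for _ in range(inner):
                    i = rng.integers(n)
                    v = grad_i(w, i) - grad_i(w_prev, i) + v
                    w_prev, w = w.copy(), w - step * v
            return w

        # least-squares example: f_i(w) = 0.5 * (a_i . w - y_i)^2
        rng = np.random.default_rng(1)
        A = rng.normal(size=(200, 5)); w_true = rng.normal(size=5)
        y = A @ w_true
        gi = lambda w, i: (A[i] @ w - y[i]) * A[i]
        fg = lambda w: A.T @ (A @ w - y) / len(y)
        print(np.linalg.norm(sarah(gi, fg, np.zeros(5), n=200) - w_true))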

    Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling

    Hairong Liu, Zhenyao Zhu, Xiangang Li, Sanjeev Satheesh
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Most existing sequence labelling models rely on a fixed decomposition of a
    target sequence into a sequence of basic units. These methods suffer from two
    major drawbacks: 1) the set of basic units is fixed, such as the set of words,
    characters or phonemes in speech recognition, and 2) the decomposition of
    target sequences is fixed. These drawbacks usually result in sub-optimal
    performance of modeling sequences. In this paper, we extend the popular CTC
    loss criterion to alleviate these limitations, and propose a new loss
    function called Gram-CTC. While preserving the advantages of CTC, Gram-CTC
    automatically learns the best set of basic units (grams), as well as the
    most suitable decomposition of target sequences. Unlike CTC, Gram-CTC
    allows the model to output a variable number of characters at each time
    step, which enables the model to capture longer-term dependencies and
    improves computational efficiency. We
    demonstrate that the proposed Gram-CTC improves CTC in terms of both
    performance and efficiency on the large vocabulary speech recognition task at
    multiple scales of data, and that with Gram-CTC we can outperform the
    state-of-the-art on a standard speech benchmark.

    Multi-Sensor Data Pattern Recognition for Multi-Target Localization: A Machine Learning Approach

    Kasthurirengan Suresh, Samuel Silva, Johnathan Votion, Yongcan Cao
    Comments: submitted for conference publication
    Subjects: Systems and Control (cs.SY); Learning (cs.LG); Machine Learning (stat.ML)

    Data-target pairing is an important step towards multi-target localization
    for the intelligent operation of unmanned systems. Target localization
    plays a crucial role in numerous applications, such as search and rescue
    missions, traffic management, and surveillance. The objective of this
    paper is to present an innovative target location learning approach, where
    several machine learning approaches, including K-means clustering and
    support vector machines (SVM), are used to learn the data pattern across a
    list of spatially
    distributed sensors. To enable the accurate data association from different
    sensors for accurate target localization, appropriate data pre-processing is
    essential, which is then followed by the application of different machine
    learning algorithms to appropriately group data from different sensors for the
    accurate localization of multiple targets. Through simulation examples, the
    performance of these machine learning algorithms is quantified and compared.

    A description length approach to determining the number of k-means clusters

    Hiromitsu Mizutani (1), Ryota Kanai (1) ((1) Araya Inc.)
    Comments: 27 pages, 6 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We present an asymptotic criterion to determine the optimal number of
    clusters in k-means. We consider k-means as data compression, and propose to
    adopt the number of clusters that minimizes the estimated description length
    after compression. Here we report two types of compression ratio based on two
    ways to quantify the description length of data after compression. This
    approach further offers a way to evaluate whether clusters obtained with
    k-means have a hierarchical structure by examining whether multi-stage
    compression can further reduce the description length. We applied our
    criteria for determining the number of clusters to synthetic data and to
    empirical neuroimaging data, in order to observe the behavior of the
    criteria across different types of data set and the suitability of the two
    types of criteria for different datasets. We found that our method can
    offer reasonable clustering results that are useful for dimension
    reduction. While our numerical results revealed a dependency of our
    criteria on various aspects of the dataset, such as its dimensionality,
    the description length approach proposed here provides useful guidance for
    determining the number of clusters in a principled manner when the
    underlying properties of the data are unknown and can only be inferred
    from observations of the data.
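
    One simple way to make "description length after compression" concrete is
    a two-part code: bits for the centroids, bits for each point's cluster
    index, and bits for the Gaussian-coded residuals. The formula below is an
    illustrative instantiation of that idea, not necessarily the exact
    criterion used in the paper.

        import numpy as np
        from sklearn.cluster import KMeans

        def description_length(X, k, bits_per_param=32.0, seed=0):
            n, d = X.shape
            km = KMeans(n_clusters=k, random_state=seed, n_init=10).fit(X)
            sse = ((X - km.cluster_centers_[km.labels_]) ** 2).sum()
            centroid_bits = bits_per_param * k * d
            index_bits = n * np.log2(k) if k > 1 else 0.0
            residual_bits = 0.5 * n * d * np.log2(max(sse / (n * d), 1e-12))
            return centroid_bits + index_bits + residual_bits

        # three well-separated Gaussian blobs -> the criterion should pick k = 3
        rng = np.random.default_rng(0)
        X = np.vstack([rng.normal(loc=4 * s, size=(100, 2)) for s in range(3)])
        print(min(range(1, 8), key=lambda k: description_length(X, k)))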

    Borrowing Treasures from the Wealthy: Deep Transfer Learning through Selective Joint Fine-tuning

    Weifeng Ge, Yizhou Yu
    Comments: To appear in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017)
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Deep neural networks require a large amount of labeled training data during
    supervised learning. However, collecting and labeling so much data might be
    infeasible in many cases. In this paper, we introduce a source-target selective
    joint fine-tuning scheme for improving the performance of deep learning tasks
    with insufficient training data. In this scheme, a target learning task with
    insufficient training data is carried out simultaneously with another source
    learning task with abundant training data. However, the source learning task
    does not use all existing training data. Our core idea is to identify and use a
    subset of training images from the original source learning task whose
    low-level characteristics are similar to those from the target learning task,
    and jointly fine-tune shared convolutional layers for both tasks. Specifically,
    we compute descriptors from linear or nonlinear filter bank responses on
    training images from both tasks, and use such descriptors to search for a
    desired subset of training samples for the source learning task.

    Experiments demonstrate that our selective joint fine-tuning scheme achieves
    state-of-the-art performance on multiple visual classification tasks with
    insufficient training data for deep learning. Such tasks include Caltech 256,
    MIT Indoor 67, Oxford Flowers 102 and Stanford Dogs 120. In comparison to
    fine-tuning without a source domain, the proposed method can improve the
    classification accuracy by 2% – 10% using a single model.
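
    The source-sample selection step described above (low-level descriptors computed
    from filter bank responses, followed by a search for source images similar to
    the target data) could be illustrated roughly as follows. The
    gradient-orientation-histogram descriptor and the nearest-neighbor rule below
    are illustrative stand-ins, not the paper's exact descriptors or search
    procedure.

    # Hypothetical sketch of the source-sample selection step: describe each image by
    # a simple low-level descriptor (here, a gradient-orientation histogram, used as a
    # stand-in for filter bank responses), then keep the source images closest to any
    # target image.
    import numpy as np

    def lowlevel_descriptor(img, bins=16):
        """Gradient-orientation histogram of a grayscale image (H, W) in [0, 1]."""
        gy, gx = np.gradient(img.astype(np.float64))
        mag = np.hypot(gx, gy)
        ang = np.arctan2(gy, gx)                      # orientation in [-pi, pi]
        hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
        return hist / (hist.sum() + 1e-12)

    def select_source_subset(source_imgs, target_imgs, keep_per_target=5):
        """Return indices of source images most similar to the target set."""
        src = np.stack([lowlevel_descriptor(im) for im in source_imgs])
        tgt = np.stack([lowlevel_descriptor(im) for im in target_imgs])
        selected = set()
        for t in tgt:
            dists = np.linalg.norm(src - t, axis=1)
            selected.update(np.argsort(dists)[:keep_per_target].tolist())
        return sorted(selected)

    # Toy usage with random "images"; in practice these would be real training images.
    rng = np.random.default_rng(0)
    source = [rng.random((64, 64)) for _ in range(100)]
    target = [rng.random((64, 64)) for _ in range(10)]
    print(select_source_subset(source, target)[:10])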


    Information Theory

    Design and Analysis of Time-Invariant SC-LDPC codes with Small Constraint Length

    Massimo Battaglioni, Alireza Tasdighi, Giovanni Cancellieri, Franco Chiaraluce, Marco Baldi
    Comments: 30 pages, 6 figures, submitted
    Subjects: Information Theory (cs.IT)

    In this paper, we deal with time-invariant low-density parity-check
    convolutional (LDPCC) codes, which are a subclass of spatially coupled
    low-density parity-check (SC-LDPC) codes. Classic design approaches usually
    start from quasi-cyclic (QC) low-density parity-check (LDPC) block codes and
    exploit suitable unwrapping procedures to obtain LDPCC codes. We show that the
    direct design of the LDPCC code syndrome former matrix or, equivalently, the
    symbolic parity-check matrix, leads to codes with smaller syndrome former
    constraint lengths than the best solutions available in the
    literature. We provide theoretical lower bounds on the syndrome former
    constraint length for the most relevant families of LDPCC codes, under
    constraints on the minimum length of local cycles in their Tanner graphs. We
    also propose new code design techniques that approach or achieve such
    theoretical limits.

    Lower Bounds on Exponential Moments of the Quadratic Error in Parameter Estimation

    Neri Merhav
    Comments: 28 pages; 4 figures; submitted for publication
    Subjects: Information Theory (cs.IT)

    Considering the problem of risk-sensitive parameter estimation, we propose a
    fairly wide family of lower bounds on the exponential moments of the quadratic
    error, both in the Bayesian and the non-Bayesian regime. This family of
    bounds, which is based on a change of measures, offers considerable freedom in
    the choice of the reference measure, and our efforts are devoted to exploring
    this freedom to a certain extent. Our focus is mostly on signal models that are
    relevant to communication problems, namely, models of a parameter-dependent
    signal (modulated signal) corrupted by additive white Gaussian noise, but the
    methodology proposed is also applicable to other types of parametric families,
    such as models of linear systems driven by random input signals (white noise,
    in most cases), and others. In addition to the well-known motivations of the
    risk-sensitive cost function (i.e., the exponential quadratic cost function),
    most notably its robustness to model uncertainty, we also view this cost
    function as a tool for studying fundamental limits concerning the tail
    behavior of the estimation error. Another interesting aspect, which we
    demonstrate in a certain parametric model, is that the risk-sensitive cost
    function may exhibit phase transitions, owing to some analogies with
    statistical mechanics.
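
    For concreteness, the risk-sensitive (exponential quadratic) cost referred to
    above is commonly written in the following standard form (the normalization is
    the usual textbook one and may differ from the paper's):

        \mathbb{E}\left[e^{\rho(\hat{\theta}-\theta)^{2}}\right], \qquad \rho > 0,
        \qquad \text{with} \qquad
        \Pr\left(|\hat{\theta}-\theta| \ge t\right) \le
        e^{-\rho t^{2}}\,\mathbb{E}\left[e^{\rho(\hat{\theta}-\theta)^{2}}\right]
        \quad \text{(Markov's inequality),}

    so bounds on this exponential moment translate directly into statements about
    the tail behavior of the estimation error.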

    5G Mobile Cellular Networks: Enabling Distributed State Estimation for Smart Grid

    Mirsad Cosovic, Achilleas Tsitsimelis, Dejan Vukobratovic, Javier Matamoros, Carles Anton-Haro
    Comments: 8 pages, 6 figures, version of the journal paper submitted for publication
    Subjects: Information Theory (cs.IT)

    With transition towards 5G, mobile cellular networks are evolving into a
    powerful platform for ubiquitous large-scale information acquisition,
    communication, storage and processing. 5G will provide suitable services for
    mission-critical and real-time applications such as the ones envisioned in
    future Smart Grids. In this work, we show how the emerging 5G mobile cellular
    network, with its evolution of Machine-Type Communications and the concept of
    Mobile Edge Computing, provides an adequate environment for distributed
    monitoring and control tasks in Smart Grids. Furthermore, we present in detail
    how Smart Grids could benefit from advanced distributed State Estimation
    methods placed within the 5G environment. We present an overview of emerging
    distributed State Estimation solutions, focusing on those based on distributed
    optimization and probabilistic graphical models, and investigate their
    integration as part of the future 5G Smart Grid services.

    Robust Beamforming for Secrecy Rate in Cooperative Cognitive Radio Multicast Communications

    Van-Dinh Nguyen, Trung Q. Duong, Oh-Soon Shin, Arumugam Nallanathan, George K. Karagiannidis
    Comments: 6 pages, 4 figures, IEEE ICC 2017
    Subjects: Information Theory (cs.IT)

    In this paper, we propose a cooperative approach to improve the security of
    both primary and secondary systems in cognitive radio multicast communications.
    During their access to the frequency spectrum licensed to the primary users,
    the secondary unlicensed users assist the primary system in fortifying security
    by sending a jamming noise to the eavesdroppers, while simultaneously protecting
    themselves from eavesdropping. The main objective of this work is to maximize
    the secrecy rate of the secondary system, while adhering to all individual
    primary users’ secrecy rate constraints. In the case of passive eavesdroppers
    and imperfect channel state information at the transceivers, the
    utility function of interest is nonconcave and the involved constraints are
    nonconvex, and thus the optimal solutions are difficult to obtain. To address this
    problem, we propose an iterative algorithm to arrive at a local optimum of the
    considered problem. The proposed iterative algorithm is guaranteed to achieve a
    Karush-Kuhn-Tucker solution.

    Codebook Design for Channel Feedback in Lens-Based Millimeter-Wave Massive MIMO Systems

    Wenqian Shen, Linglong Dai, Yang Yang, Yue Li, Zhaocheng Wang
    Comments: Submitted to SPL for publication
    Subjects: Information Theory (cs.IT)

    The number of radio frequency (RF) chains can be reduced through beam
    selection in lens-based millimeter-wave (mmWave) massive MIMO systems, where
    the equivalent channel between RF chains and multiple users is required at the
    base station (BS) to achieve the multi-user multiplexing gain. However, to the best of our
    knowledge, there is no dedicated codebook for the equivalent channel feedback
    in such systems. In this paper, we propose the dimension-reduced subspace
    codebook, which achieves a significant reduction of the feedback overhead and
    codebook size. Specifically, we firstly utilize the limited scattering property
    of mmWave channels to generate the high-dimensional vectors in the channel
    subspace. Then, according to the function of lens and beam selector, we propose
    the dimension-reduced subspace codebook to quantize the equivalent channel
    vector. Moreover, the performance analysis of the proposed codebook is also
    provided. Finally, simulation results show the superior performance of the
    proposed dimension-reduced subspace codebook compared with conventional
    codebooks.
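
    As background, codebook-based limited feedback generally quantizes a channel
    vector by reporting the index of the codeword that best matches it. The generic
    sketch below shows only that quantization step with a random placeholder
    codebook; it is not the proposed dimension-reduced subspace codebook, whose
    construction is specific to the lens and beam-selector architecture.

    # Generic limited-feedback quantization: report the index of the codeword that
    # best matches the (equivalent) channel vector. Codebook and channel are random
    # placeholders, not the proposed dimension-reduced subspace codebook.
    import numpy as np

    rng = np.random.default_rng(0)
    dim, codebook_size = 8, 64        # assumed equivalent-channel dimension, 6 feedback bits

    # Unit-norm random codebook (placeholder construction).
    C = rng.normal(size=(codebook_size, dim)) + 1j * rng.normal(size=(codebook_size, dim))
    C /= np.linalg.norm(C, axis=1, keepdims=True)

    # Equivalent channel vector to be fed back (placeholder).
    h = rng.normal(size=dim) + 1j * rng.normal(size=dim)

    # Pick the codeword maximizing |c^H h|; only its index is fed back to the BS.
    idx = int(np.argmax(np.abs(C.conj() @ h)))
    quantized = C[idx] * np.linalg.norm(h)   # BS-side reconstruction (up to a phase)
    print(idx)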

    Joint Beamforming and Antenna Selection for Sum Rate Maximization in Cognitive Radio Networks

    Van-Dinh Nguyen (Student Member, IEEE), Chuyen T. Nguyen, Hieu V. Nguyen, Oh-Soon Shin (Member, IEEE)
    Comments: 4 pages, 3 figures
    Subjects: Information Theory (cs.IT)

    This letter studies joint transmit beamforming and antenna selection at a
    secondary base station (BS) with multiple primary users (PUs) in an underlay
    cognitive radio multiple-input single-output broadcast channel. The objective
    is to maximize the sum rate subject to the secondary BS transmit power, minimum
    required rates for secondary users, and PUs’ interference power constraints.
    The utility function of interest is nonconcave and the involved constraints are
    nonconvex, so this problem is hard to solve. Nevertheless, we propose a new
    iterative algorithm that finds at least locally optimal solutions. We use an inner
    approximation method to construct and solve a simple convex quadratic program
    of moderate dimension at each iteration of the proposed algorithm. Simulation
    results indicate that the proposed algorithm converges quickly and outperforms
    existing approaches.

    Repair Strategies for Storage on Mobile Clouds

    Gokhan Calis, Swetha Shivaramaiah, O. Ozan Koyluoglu, Loukas Lazos
    Comments: 23 pages, 11 figures
    Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

    We study the data reliability problem for a community of devices forming a
    mobile cloud storage system. We consider the application of regenerating codes
    for file maintenance within a geographically-limited area. Such codes require
    lower bandwidth to regenerate lost data fragments compared to file replication
    or reconstruction. We investigate threshold-based repair strategies where data
    repair is initiated after a threshold number of data fragments have been lost
    due to node mobility. We show that in the low departure-to-repair rate regime, a
    lazy repair strategy in which repairs are initiated after several nodes have
    left the system outperforms eager repair in which repairs are initiated after a
    single departure. This optimality is reversed when nodes are highly mobile. We
    further compare distributed and centralized repair strategies and derive the
    optimal repair threshold for minimizing the average repair cost per unit of
    time, as a function of underlying code parameters. In addition, we examine
    cooperative repair strategies and show performance improvements compared to
    non-cooperative codes. We investigate several models for the time needed for
    node repair including a simple fixed time model that allows for the computation
    of closed-form expressions and a more realistic model that takes into account
    the number of repaired nodes. We derive the conditions under which the former
    model approximates the latter. Finally, an extended model where additional
    failures are allowed during the repair process is investigated. Overall, our
    results establish the joint effect of code design and repair algorithms on the
    maintenance cost of distributed storage systems.

    Sequence of purchases in credit card data reveal life styles in urban populations

    Riccardo Di Clemente, Miguel Luengo-Oroz, Matias Travizano, Bapu Vaitla, Marta C. Gonzalez
    Comments: 28 pages, 15 figures
    Subjects: Physics and Society (physics.soc-ph); Information Theory (cs.IT); Social and Information Networks (cs.SI); Applications (stat.AP)

    From our most basic consumption to secondary needs, our spending habits
    reflect our life styles. Yet, in computational social sciences there is an open
    question about the existence of ubiquitous trends in spending habits by various
    groups at urban scale. Limited information collected by expenditure surveys
    has not proven conclusive in this regard. This is because the frequency of
    purchases by type is highly uneven and follows a Zipf-like distribution. In
    this work, we apply text compression techniques to the purchase codes of credit
    card data to detect the significant sequences of transactions of each user.
    Five consumer groups emerge when users are grouped by the similarity of these
    sequences. Remarkably, individuals in each consumer group are also similar in
    age, total expenditure, gender, and the diversity of their social and mobility
    networks extracted from their mobile phone records. By properly deconstructing
    transaction data with Zipf-like distributions, we find that it can yield
    insights into collective behavior.
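
    The abstract does not specify which text compression technique is applied to the
    purchase codes. As a rough, hypothetical illustration of the general idea of
    comparing users through a compression-based similarity of their purchase-code
    sequences and then grouping them, one could use the normalized compression
    distance with a standard compressor; the toy sequences and the clustering choice
    below are invented for the example.

    # Hypothetical sketch: compression-based similarity between users' purchase-code
    # sequences (normalized compression distance with zlib), followed by hierarchical
    # clustering into consumer groups. Generic stand-in, not the paper's method.
    import zlib
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    def clen(s: bytes) -> int:
        return len(zlib.compress(s, 9))

    def ncd(a: str, b: str) -> float:
        """Normalized compression distance between two symbol sequences."""
        ca, cb, cab = clen(a.encode()), clen(b.encode()), clen((a + b).encode())
        return (cab - min(ca, cb)) / max(ca, cb)

    # Toy purchase-code sequences (one string of merchant-category symbols per user).
    users = ["GGFFTGGFFT" * 10, "GFGFGFGFGF" * 10, "RRRHHRRRHH" * 10, "RHRHRHRHRH" * 10]

    # Pairwise distance matrix, then hierarchical clustering into consumer groups.
    n = len(users)
    D = np.array([[ncd(users[i], users[j]) for j in range(n)] for i in range(n)])
    condensed = D[np.triu_indices(n, k=1)]
    labels = fcluster(linkage(condensed, method="average"), t=2, criterion="maxclust")
    print(labels)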



