
    arXiv Paper Daily: Thu, 13 Apr 2017

    Published by 我爱机器学习 (52ml.net) on 2017-04-13 00:00:00

    Neural and Evolutionary Computing

    Approximating Optimization Problems using EAs on Scale-Free Networks

    Ankit Chauhan, Tobias Friedrich, Francesco Quinzan
    Comments: 21 pages, 3 figures, 2 tables; accepted at GECCO’17
    Subjects: Neural and Evolutionary Computing (cs.NE); Data Structures and Algorithms (cs.DS); Social and Information Networks (cs.SI)

    It has been experimentally observed that real-world networks exhibit certain
    topological properties, such as the small-world property and power-law degree
    distributions. To study these networks, many random graph models, such as
    Preferential Attachment, have been proposed.

    In this paper, we consider the deterministic properties which capture
    power-law degree distribution and degeneracy. Networks with these properties
    are known as scale-free networks in the literature. Many interesting problems
    remain NP-hard on scale-free networks. We study the relationship between
    scale-free properties and the approximation ratio of some commonly used
    evolutionary algorithms.

    For Vertex Cover, we observe experimentally that the (1+1) EA consistently
    gives better results than a greedy local search, even when it runs for only
    O(n log n) steps. We give a construction of a scale-free network on which
    a multi-objective algorithm and a greedy algorithm obtain optimal solutions,
    while the (1+1) EA obtains the worst possible solution with constant
    probability.

    We prove that for Dominating Set, Vertex Cover, Connected Dominating Set
    and Independent Set, the (1+1) EA obtains constant-factor approximations in
    expected run time O(n log n) and O(n^4), respectively. GSEMO achieves even
    better approximations than the (1+1) EA in expected run time O(n^3) for
    Dominating Set, Vertex Cover and Connected Dominating Set on such networks.
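
    As a concrete illustration of the algorithm analyzed above, the following is
    a minimal sketch of the (1+1) EA on Vertex Cover, assuming the standard
    bitstring encoding with a penalty for uncovered edges; the example graph,
    step budget and penalty weighting are illustrative choices, not the paper's
    exact setup.

        import random

        def is_cover(bits, edges):
            # A vertex set covers an edge if it contains either endpoint.
            return all(bits[u] or bits[v] for u, v in edges)

        def fitness(bits, edges, n):
            # Penalize uncovered edges heavily, then prefer smaller covers.
            uncovered = sum(1 for u, v in edges if not (bits[u] or bits[v]))
            return uncovered * (n + 1) + sum(bits)  # lower is better

        def one_plus_one_ea(n, edges, steps):
            # (1+1) EA: flip each bit independently with probability 1/n and
            # keep the offspring if it is no worse than the parent.
            parent = [random.randint(0, 1) for _ in range(n)]
            for _ in range(steps):
                child = [b ^ (random.random() < 1.0 / n) for b in parent]
                if fitness(child, edges, n) <= fitness(parent, edges, n):
                    parent = child
            return parent

        # Path graph 0-1-2-3; {1, 2} is an optimal cover.
        edges = [(0, 1), (1, 2), (2, 3)]
        best = one_plus_one_ea(4, edges, 2000)
        print(best, is_cover(best, edges))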

    Improving Fitness Functions in Genetic Programming for Classification on Unbalanced Credit Card Datasets

    Van Loi Cao, Nhien-An Le-Khac, Miguel Nicolau, Michael O'Neill, James McDermott
    Subjects: Neural and Evolutionary Computing (cs.NE)

    Credit card fraud detection based on machine learning has recently attracted
    considerable interest from the research community. One of the most important
    tasks in this area is the ability of classifiers to handle the imbalance in
    credit card data. In this scenario, classifiers tend to yield poor accuracy on
    the fraud class (minority class) despite realizing high overall accuracy. This
    is due to the influence of the majority class on traditional training criteria.
    In this paper, we aim to apply genetic programming to address this issue by
    adapting existing fitness functions. We examine two fitness functions from
    previous studies and develop two new fitness functions to evolve GP
    classifiers with superior accuracy on the minority class and overall. Two UCI credit card
    datasets are used to evaluate the effectiveness of the proposed fitness
    functions. The results demonstrate that the proposed fitness functions augment
    GP classifiers, encouraging fitter solutions on both the minority and the
    majority classes.
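
    One family of fitness functions suited to such imbalance averages the
    per-class accuracies so that the fraud (minority) class counts as much as
    the majority class. The sketch below is a generic example of this idea,
    not necessarily one of the paper's exact functions.

        def balanced_fitness(y_true, y_pred):
            # Average of true-positive and true-negative rates: an evolved
            # classifier cannot score well by ignoring the minority class.
            tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
            tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
            pos = sum(y_true)
            neg = len(y_true) - pos
            tpr = tp / pos if pos else 0.0
            tnr = tn / neg if neg else 0.0
            return 0.5 * (tpr + tnr)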

    A Neural Representation of Sketch Drawings

    David Ha, Douglas Eck
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)

    We present sketch-rnn, a recurrent neural network (RNN) able to construct
    stroke-based drawings of common objects. The model is trained on thousands of
    crude human-drawn images representing hundreds of classes. We outline a
    framework for conditional and unconditional sketch generation, and describe new
    robust training methods for generating coherent sketch drawings in a vector
    format.

    Semantic3D.net: A new Large-scale Point Cloud Classification Benchmark

    Timo Hackel, Nikolay Savinov, Lubor Ladicky, Jan D. Wegner, Konrad Schindler, Marc Pollefeys
    Comments: Accepted to ISPRS Annals. The benchmark website is available at this http URL . The baseline code is available at this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)

    This paper presents a new 3D point cloud classification benchmark data set
    with over four billion manually labelled points, meant as input for data-hungry
    (deep) learning methods. We also discuss first submissions to the benchmark
    that use deep convolutional neural networks (CNNs) as a workhorse, which
    already show remarkable performance improvements over state-of-the-art. CNNs
    have become the de-facto standard for many tasks in computer vision and machine
    learning, like semantic segmentation or object detection in images, but have
    not yet led to a true breakthrough for 3D point cloud labelling tasks due to a lack
    of training data. With the massive data set presented in this paper, we aim at
    closing this data gap to help unleash the full potential of deep learning
    methods for 3D labelling tasks. Our semantic3D.net data set consists of dense
    point clouds acquired with static terrestrial laser scanners. It contains 8
    semantic classes and covers a wide range of urban outdoor scenes: churches,
    streets, railroad tracks, squares, villages, soccer fields and castles. We
    describe our labelling interface and show that our data set provides denser
    and more complete point clouds, with a much higher overall number of labelled
    points, than those already available to the research community. We further
    provide baseline method descriptions and a comparison of methods submitted
    to our online system. We hope semantic3D.net will pave the way for deep
    learning methods in 3D point cloud labelling to learn richer, more general 3D
    representations, and first submissions after only a few months indicate that
    this might indeed be the case.


    Computer Vision and Pattern Recognition

    Semantic3D.net: A new Large-scale Point Cloud Classification Benchmark

    Timo Hackel, Nikolay Savinov, Lubor Ladicky, Jan D. Wegner, Konrad Schindler, Marc Pollefeys
    (Cross-listed; see the abstract under Neural and Evolutionary Computing above.)

    Connecting Look and Feel: Associating the visual and tactile properties of physical materials

    Wenzhen Yuan, Shaoxiong Wang, Siyuan Dong, Edward Adelson
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    For machines to interact with the physical world, they must understand the
    physical properties of objects and materials they encounter. We use fabrics as
    an example of a deformable material with a rich set of mechanical properties. A
    thin flexible fabric, when draped, tends to look different from a heavy stiff
    fabric. It also feels different when touched. Using a collection of 118 fabric
    samples, we captured color and depth images of draped fabrics along with tactile
    data from a high resolution touch sensor. We then sought to associate the
    information from vision and touch by jointly training CNNs across the three
    modalities. Through the CNN, each input, regardless of the modality, generates
    an embedding vector that records the fabric’s physical property. By comparing
    the embeddings, our system is able to look at a fabric image and predict how it
    will feel, and vice versa. We also show that a system jointly trained on vision
    and touch data can outperform a similar system trained only on visual data when
    tested purely with visual inputs.

    Attention-Set based Metric Learning for Video Face Recognition

    Yibo Hu, Xiang Wu, Ran He
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Face recognition has made great progress with the development of deep
    learning. However, video face recognition (VFR) remains challenging due to
    varying illumination, low resolution, pose variations and motion blur. Most
    existing CNN-based VFR methods obtain a feature vector from a single image
    and simply aggregate the features across a video, largely ignoring the
    correlations between face images within one video. In this paper, we propose a novel
    Attention-Set based Metric Learning (ASML) method to measure the statistical
    characteristics of image sets. It is a promising and generalized extension of
    Maximum Mean Discrepancy with memory attention weighting. First, we define an
    effective distance metric on image sets, which explicitly minimizes the
    intra-set distance and maximizes the inter-set distance simultaneously. Second,
    inspired by Neural Turing Machine, a Memory Attention Weighting is proposed to
    adapt set-aware global contents. Then ASML is naturally integrated into CNNs,
    resulting in an end-to-end learning scheme. Our method achieves
    state-of-the-art performance for the task of video face recognition on the
    three widely used benchmarks including YouTubeFace, YouTube Celebrities and
    Celebrity-1000.
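
    The set distance underlying ASML builds on Maximum Mean Discrepancy. A
    minimal sketch of the plain (biased) empirical MMD between two image sets
    is given below, with an RBF kernel; the memory attention weighting that
    ASML adds on top is not shown.

        import numpy as np

        def rbf(x, y, gamma=1.0):
            return np.exp(-gamma * np.sum((x - y) ** 2))

        def mmd2(X, Y, gamma=1.0):
            # Biased empirical squared MMD between two sets of feature
            # vectors (one row per face image).
            kxx = np.mean([rbf(a, b, gamma) for a in X for b in X])
            kyy = np.mean([rbf(a, b, gamma) for a in Y for b in Y])
            kxy = np.mean([rbf(a, b, gamma) for a in X for b in Y])
            return kxx + kyy - 2 * kxy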

    Ensemble classifier approach in breast cancer detection and malignancy grading – A review

    Deepti Ameta
    Comments: 10 pages,1 figure,5 tables
    Journal-ref: International Journal of Managing Public Sector Information and
    Communication Technologies (IJMPICT) Vol. 8, No. 1, March 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The number of diagnosed breast cancer cases is increasing annually and,
    unfortunately, the disease carries a high mortality rate. Cancer is hard to
    detect at the early stages because malignant cells show properties (such as
    density) similar to those of non-malignant cells. The mortality ratio could
    be minimized if breast cancer were detected in its early stages. Current
    systems, however, fall short of a fully automatic system that not only
    detects breast cancer but also determines its stage. Estimating the
    malignancy grade is important both for diagnosing the degree of growth of
    malignant cells and for selecting a proper therapy for the patient.
    Therefore, a complete and efficient clinical decision support system is
    proposed that performs breast cancer malignancy grading very efficiently.
    The system draws on the image processing and machine learning domains. The
    class imbalance problem, a machine learning problem, occurs when instances
    of one class greatly outnumber instances of the other class, resulting in
    inefficient classification of samples and hence a poor decision support
    system. Therefore EUSBoost, an ensemble-based classifier, is proposed; it
    is efficient and able to outperform other classifiers because it combines
    the benefits of boosting with random undersampling techniques. A comparison
    of EUSBoost with other techniques is also presented in the paper.
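
    To make the boosting-plus-undersampling combination concrete, here is a
    simplified RUSBoost-style sketch: each round trains a weak learner on a
    balanced subsample of the data. EUSBoost's distinguishing step, an
    evolutionary search for the best undersampled subset, is replaced here by
    plain random undersampling.

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier

        def rus_boost(X, y, rounds=10, seed=0):
            rng = np.random.default_rng(seed)
            minority = np.flatnonzero(y == 1)
            majority = np.flatnonzero(y == 0)
            learners = []
            for _ in range(rounds):
                # Balance each round by undersampling the majority class.
                picked = rng.choice(majority, size=len(minority), replace=False)
                idx = np.concatenate([minority, picked])
                clf = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
                learners.append(clf)

            def predict(Xq):
                # Majority vote over the ensemble of weak learners.
                votes = np.mean([c.predict(Xq) for c in learners], axis=0)
                return (votes >= 0.5).astype(int)

            return predict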

    Unsupervised part learning for visual recognition

    Ronan Sicre, Yannis Avrithis, Ewa Kijak, Frederic Jurie
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

    Part-based image classification aims at representing categories by small sets
    of learned discriminative parts, upon which an image representation is built.
    Considered as a promising avenue a decade ago, this direction has been
    neglected since the advent of deep neural networks. In this context, this paper
    brings two contributions: first, it shows that despite the recent success of
    end-to-end holistic models, explicit part learning can still boost
    classification performance. Second, this work goes one step further than
    recent part-based models (PBM), focusing on how to learn parts without
    using any labeled data. Instead of learning a set of parts per class, as
    generally done in the PBM literature, the proposed approach first
    partitions a given set of images into visually similar groups, and then
    learns a set of discriminative parts per group in a fully unsupervised
    fashion. This strategy
    opens the door to the use of PBM in new applications for which the notion of
    image categories is irrelevant, such as instance-based image retrieval, for
    example. We experimentally show that our learned parts help build efficient
    image representations, for classification as well as for indexing tasks,
    resulting in performance superior to holistic state-of-the-art Deep
    Convolutional Neural Network (DCNN) encodings.

    Unsupervised Construction of Human Body Models Using Principles of Organic Computing

    Thomas Walther, Rolf P. Würtz
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Unsupervised learning of a generalizable model of the visual appearance of
    humans from video data is of major importance for computing systems interacting
    naturally with their users and others. We propose a step towards automatic
    behavior understanding by integrating principles of Organic Computing into the
    posture estimation cycle, thereby reducing the need for human intervention
    while simultaneously raising the level of system autonomy. The system extracts
    coherent motion from moving upper bodies and autonomously decides about limbs
    and their possible spatial relationships. The models from many videos are
    integrated into meta-models, which show good generalization to different
    individuals, backgrounds, and attire. These models allow robust interpretation
    of single video frames without temporal continuity and posture mimicking by an
    android robot.

    Object proposal generation applying the distance dependent Chinese restaurant process

    Mikko Lauri, Simone Frintrop
    Comments: To appear at Scandinavian Conference on Image Analysis (SCIA) 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In application domains such as robotics, it is useful to represent the
    uncertainty related to the robot’s belief about the state of its environment.
    Algorithms that only yield a single “best guess” as a result are not
    sufficient. In this paper, we propose object proposal generation based on
    non-parametric Bayesian inference that allows quantification of the likelihood
    of the proposals. We apply Markov chain Monte Carlo to draw samples of image
    segmentations via the distance dependent Chinese restaurant process. Our method
    achieves state-of-the-art performance on an indoor object discovery data set,
    while additionally providing a likelihood term for each proposal. We show that
    the likelihood term can effectively be used to rank proposals according to
    their quality.
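
    For intuition, the prior works as follows: each "customer" (e.g. a
    superpixel) links to another with probability decaying in their distance,
    or to itself (opening a new cluster) with probability proportional to a
    concentration parameter; proposals are the connected components of the
    link graph. A minimal sampler for this prior, with an exponential decay
    function as an illustrative choice:

        import numpy as np

        def ddcrp_sample(D, alpha=1.0, decay=1.0, seed=0):
            # D is an n x n matrix of pairwise distances between elements;
            # returns a cluster label per element.
            rng = np.random.default_rng(seed)
            n = D.shape[0]
            links = np.zeros(n, dtype=int)
            for i in range(n):
                w = np.exp(-decay * D[i])  # decay function f(d)
                w[i] = alpha               # self-link opens a new table
                links[i] = rng.choice(n, p=w / w.sum())
            # Clusters are connected components of the link graph.
            labels = -np.ones(n, dtype=int)
            for i in range(n):
                path, j = [], i
                while labels[j] < 0 and j not in path:
                    path.append(j)
                    j = links[j]
                root = labels[j] if labels[j] >= 0 else labels.max() + 1
                for p in path:
                    labels[p] = root
            return labels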

    Dilated Convolutional Neural Networks for Cardiovascular MR Segmentation in Congenital Heart Disease

    Jelmer M. Wolterink, Tim Leiner, Max A. Viergever, Ivana Išgum
    Journal-ref: RAMBO 2016, HVSMR 2016. LNCS 10129. pp. 95-102
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose an automatic method using dilated convolutional neural networks
    (CNNs) for segmentation of the myocardium and blood pool in cardiovascular MR
    (CMR) of patients with congenital heart disease (CHD).

    Ten training and ten test CMR scans cropped to an ROI around the heart were
    provided in the MICCAI 2016 HVSMR challenge. A dilated CNN with a receptive
    field of 131×131 voxels was trained for myocardium and blood pool segmentation
    in axial, sagittal and coronal image slices. Performance was evaluated within
    the HVSMR challenge.

    Automatic segmentation of the test scans resulted in Dice indices of
    0.80±0.06 and 0.93±0.02, average distances to boundaries of
    0.96±0.31 and 0.89±0.24 mm, and Hausdorff distances of 6.13±3.76
    and 7.07±3.01 mm for the myocardium and blood pool, respectively.
    Segmentation took 41.5±14.7 s per scan.

    In conclusion, dilated CNNs trained on a small set of CMR images of CHD
    patients showing large anatomical variability provide accurate myocardium and
    blood pool segmentations.
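
    For reference, the Dice index reported above compares a binary
    segmentation A against a reference B as 2|A∩B| / (|A| + |B|); a minimal
    implementation:

        import numpy as np

        def dice(seg, ref):
            # Dice overlap between two binary masks; 1.0 if both are empty.
            seg, ref = seg.astype(bool), ref.astype(bool)
            denom = seg.sum() + ref.sum()
            return 2.0 * np.logical_and(seg, ref).sum() / denom if denom else 1.0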

    Feature Tracking Cardiac Magnetic Resonance via Deep Learning and Spline Optimization

    Davis M. Vigneault, Weidi Xie, David A. Bluemke, J. Alison Noble
    Comments: Accepted to Functional Imaging and Modeling of the Heart (FIMH) 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Feature tracking Cardiac Magnetic Resonance (CMR) has recently emerged as an
    area of interest for quantification of regional cardiac function from balanced
    steady-state free precession (SSFP) cine sequences. However, currently
    available techniques lack full automation, limiting reproducibility. We propose
    a fully automated technique whereby a CMR image sequence is first segmented
    with a deep, fully convolutional neural network (CNN) architecture, and
    quadratic basis splines are fitted simultaneously across all cardiac frames
    using least squares optimization. Experiments are performed using data from 42
    patients with hypertrophic cardiomyopathy (HCM) and 21 healthy control
    subjects. In terms of segmentation, we compared state-of-the-art CNN
    frameworks, U-Net and dilated convolution architectures, with and without
    temporal context, using cross validation with three folds. Performance relative
    to expert manual segmentation was similar across all networks: pixel accuracy
    was ~97%, intersection-over-union (IoU) across all classes was ~87%, and IoU
    across foreground classes only was ~85%. Endocardial left ventricular
    circumferential strain calculated from the proposed pipeline was significantly
    different in control and disease subjects (-25.3% vs -29.1%, p = 0.006), in
    agreement with the current clinical literature.

    Predictive-Corrective Networks for Action Detection

    Achal Dave, Olga Russakovsky, Deva Ramanan
    Comments: Accepted to CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    While deep feature learning has revolutionized techniques for static-image
    understanding, the same does not quite hold for video processing. Architectures
    and optimization techniques used for video are largely based on those for
    static images, potentially underutilizing rich video information. In this work,
    we rethink both the underlying network architecture and the stochastic learning
    paradigm for temporal data. To do so, we draw inspiration from classic theory
    on linear dynamic systems for modeling time series. By extending such models to
    include nonlinear mappings, we derive a series of novel recurrent neural
    networks that sequentially make top-down predictions about the future and then
    correct those predictions with bottom-up observations. Predictive-corrective
    networks have a number of desirable properties: (1) they can adaptively focus
    computation on “surprising” frames where predictions require large corrections,
    (2) they simplify learning in that only “residual-like” corrective terms need
    to be learned over time and (3) they naturally decorrelate an input data stream
    in a hierarchical fashion, producing a more reliable signal for learning at
    each layer of a network. We provide an extensive analysis of our lightweight
    and interpretable framework, and demonstrate that our model is competitive with
    the two-stream network on three challenging datasets without the need for
    computationally expensive optical flow.

    Automatic Discovery, Association Estimation and Learning of Semantic Attributes for a Thousand Categories

    Ziad Al-Halah, Rainer Stiefelhagen
    Comments: Accepted as a conference paper at CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Attribute-based recognition models, due to their impressive performance and
    their ability to generalize well on novel categories, have been widely adopted
    for many computer vision applications. However, usually both the attribute
    vocabulary and the class-attribute associations have to be provided manually by
    domain experts or a large number of annotators. This is very costly and not
    necessarily optimal regarding recognition performance, and most importantly, it
    limits the applicability of attribute-based models to large scale data sets. To
    tackle this problem, we propose an end-to-end unsupervised attribute learning
    approach. We utilize online text corpora to automatically discover a salient
    and discriminative vocabulary that correlates well with the human concept of
    semantic attributes. Moreover, we propose a deep convolutional model to
    optimize class-attribute associations with a linguistic prior that accounts for
    noise and missing data in text. In a thorough evaluation on ImageNet, we
    demonstrate that our model is able to efficiently discover and learn semantic
    attributes at a large scale. Furthermore, we demonstrate that our model
    outperforms the state-of-the-art in zero-shot learning on three data sets:
    ImageNet, Animals with Attributes and aPascal/aYahoo. Finally, we enable
    attribute-based learning on ImageNet and will share the attributes and
    associations for future research.

    Instance-Level Salient Object Segmentation

    Guanbin Li, Yuan Xie, Liang Lin, Yizhou Yu
    Comments: To appear in CVPR2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Image saliency detection has recently witnessed rapid progress due to deep
    convolutional neural networks. However, none of the existing methods is able to
    identify object instances in the detected salient regions. In this paper, we
    present a salient instance segmentation method that produces a saliency mask
    with distinct object instance labels for an input image. Our method consists of
    three steps: estimating a saliency map, detecting salient object contours, and
    identifying salient object instances. For the first two steps, we propose a
    multiscale saliency refinement network, which generates high-quality salient
    region masks and salient object contours. Once integrated with multiscale
    combinatorial grouping and a MAP-based subset optimization framework, our
    method can generate very promising salient object instance segmentation
    results. To promote further research and evaluation of salient instance
    segmentation, we also construct a new database of 1000 images and their
    pixelwise salient instance annotations. Experimental results demonstrate that
    our proposed method is capable of achieving state-of-the-art performance on all
    public benchmarks for salient region detection as well as on our new dataset
    for salient instance segmentation.

    Deep Contextual Recurrent Residual Networks for Scene Labeling

    T. Hoang Ngan Le, Chi Nhan Duong, Ligong Han, Khoa Luu, Marios Savvides, Dipan Pal
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Designed as extremely deep architectures, deep residual networks which
    provide a rich visual representation and offer robust convergence behaviors
    have recently achieved exceptional performance in numerous computer vision
    problems. When directly applied to a scene labeling problem, however, they are
    limited in capturing long-range contextual dependence, which is a critical
    aspect. To address this issue, we propose a novel approach, Contextual
    Recurrent Residual Networks (CRRN) which is able to simultaneously handle rich
    visual representation learning and long-range context modeling within a fully
    end-to-end deep network. Furthermore, our proposed end-to-end CRRN is
    completely trained from scratch, without using any pre-trained models in
    contrast to most existing methods usually fine-tuned from the state-of-the-art
    pre-trained models, e.g. VGG-16, ResNet, etc. The experiments are conducted on
    four challenging scene labeling datasets, i.e. SiftFlow, CamVid, Stanford
    background and SUN datasets, and compared against various state-of-the-art
    scene labeling methods.

    Reformulating Level Sets as Deep Recurrent Neural Network Approach to Semantic Segmentation

    Ngan Le, Kha Gia Quach, Khoa Luu, Marios Savvides, Chenchen Zhu
    Comments: 10 pages, 6 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Variational Level Set (LS) has been a widely used method in medical
    segmentation. However, it is limited when dealing with multi-instance objects
    in the real world. In addition, its segmentation results are quite sensitive to
    initial settings and highly depend on the number of iterations. To address
    these issues and boost the classic variational LS methods to a new level of the
    learnable deep learning approaches, we propose a novel definition of contour
    evolution named Recurrent Level Set (RLS)} to employ Gated Recurrent Unit under
    the energy minimization of a variational LS functional. The curve deformation
    process in RLS is formed as a hidden state evolution procedure and updated by
    minimizing an energy functional composed of fitting forces and contour length.
    By sharing the convolutional features in a fully end-to-end trainable
    framework, we extend RLS to Contextual RLS (CRLS) to address semantic
    segmentation in the wild. The experimental results have shown that our proposed
    RLS improves both computational time and segmentation accuracy over the
    classic variational LS-based method, whereas the fully end-to-end system CRLS
    achieves competitive performance compared to the state-of-the-art semantic
    segmentation approaches.

    Beyond Planar Symmetry: Modeling human perception of reflection and rotation symmetries in the wild

    Christopher Funk, Yanxi Liu
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)

    Humans take advantage of real world symmetries for various tasks, yet
    capturing their superb symmetry perception mechanism into a computational model
    remains elusive. Encouraged by a new discovery (CVPR 2016) demonstrating
    extremely high inter-person accuracy of human perceived symmetries in the wild,
    we have created the first deep-learning neural network for reflection and
    rotation symmetry detection (Sym-NET), trained on photos from the MS-COCO
    (Common Objects in COntext) dataset with nearly 11K symmetry labels from more than 400
    human observers. We employ novel methods to convert discrete human labels into
    symmetry heatmaps, capture symmetry densely in an image and quantitatively
    evaluate Sym-NET against multiple existing computer vision algorithms. Using
    the symmetry competition testsets from CVPR 2013 and unseen MS-COCO photos,
    Sym-NET comes out as the winner with significantly superior performance over
    all other competitors. Beyond mathematically well-defined symmetries on a
    plane, Sym-NET demonstrates abilities to identify viewpoint-varied 3D
    symmetries, partially occluded symmetrical objects and symmetries at a semantic
    level.
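
    The conversion of discrete labels into dense heatmaps can be pictured as
    Gaussian "splatting" of the annotations. The sketch below renders 2-D
    point annotations for simplicity; the actual labels are symmetry axes, and
    the paper's exact rendering may differ.

        import numpy as np

        def labels_to_heatmap(points, shape, sigma=5.0):
            # Sum one Gaussian bump per human annotation, then normalize.
            h, w = shape
            ys, xs = np.mgrid[0:h, 0:w]
            heat = np.zeros(shape)
            for px, py in points:
                heat += np.exp(-((xs - px) ** 2 + (ys - py) ** 2)
                               / (2 * sigma ** 2))
            return heat / heat.max() if heat.max() > 0 else heat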

    Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification

    Muhammad Zeshan Afzal, Andreas Kölsch, Sheraz Ahmed, Marcus Liwicki
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present an exhaustive investigation of recent Deep Learning architectures,
    algorithms, and strategies for the task of document image classification to
    finally reduce the error by more than half. Existing approaches, such as the
    DeepDocClassifier, apply standard Convolutional Network architectures with
    transfer learning from the object recognition domain. The contribution of the
    paper is threefold: First, it investigates recently introduced very deep neural
    network architectures (GoogLeNet, VGG, ResNet) using transfer learning (from
    real images). Second, it proposes transfer learning from a huge set of document
    images, i.e. 400,000 documents. Third, it analyzes the impact of the amount
    of training data (document images) and other parameters on classification
    performance. We use two datasets, the Tobacco-3482 and the large-scale RVL-CDIP
    dataset. We achieve an accuracy of 91.13% for the Tobacco-3482 dataset while
    earlier approaches reach only 77.6%. Thus, a relative error reduction of more
    than 60% is achieved. For the large dataset RVL-CDIP, an accuracy of 90.97% is
    achieved, corresponding to a relative error reduction of 11.5%.
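
    The relative error reductions follow directly from the accuracies; as a
    quick check:

        old_err = 1 - 0.776    # 22.4% error for earlier approaches
        new_err = 1 - 0.9113   # 8.87% error for the proposed system
        print((old_err - new_err) / old_err)  # ~0.604, i.e. "more than 60%"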

    Attention-based Extraction of Structured Information from Street View Imagery

    Zbigniew Wojna, Alex Gorban, Dar-Shyang Lee, Kevin Murphy, Qian Yu, Yeqing Li, Julian Ibarz
    Comments: In submission to ICDAR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a neural network model – based on CNNs, RNNs and a novel attention
    mechanism – which achieves 84.2% accuracy on the challenging French Street Name
    Signs (FSNS) dataset, significantly outperforming the previous state of the art
    (Smith’16), which achieved 72.46%. Furthermore, our new method is much simpler
    and more general than the previous approach. To demonstrate the generality of
    our model, we show that it also performs well on an even more challenging
    dataset derived from Google Street View, in which the goal is to extract
    business names from store fronts. Finally, we study the speed/accuracy tradeoff
    that results from using CNN feature extractors of different depths.
    Surprisingly, we find that deeper is not always better (in terms of accuracy,
    as well as speed). Our resulting model is simple, accurate and fast, allowing
    it to be used at scale on a variety of challenging real-world text extraction
    problems.

    Learning Detection with Diverse Proposals

    Samaneh Azadi, Jiashi Feng, Trevor Darrell
    Comments: Accepted to CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    To predict a set of diverse and informative proposals with enriched
    representations, this paper introduces a differentiable Determinantal Point
    Process (DPP) layer that is able to augment the object detection architectures.
    Most modern object detection architectures, such as Faster R-CNN, learn to
    localize objects by minimizing deviations from the ground-truth but ignore
    correlation between multiple proposals and object categories. Non-Maximum
    Suppression (NMS) as a widely used proposal pruning scheme ignores label- and
    instance-level relations between object candidates resulting in multi-labeled
    detections. In the multi-class case, NMS selects boxes with the largest
    prediction scores, ignoring the semantic relations between categories of
    potential detections. In contrast, our trainable DPP layer, allowing for Learning
    Detection with Diverse Proposals (LDDP), considers both label-level contextual
    information and spatial layout relationships between proposals without
    increasing the number of parameters of the network, and thus improves location
    and category specifications of final detected bounding boxes substantially
    during both training and inference. Furthermore, we show that LDDP
    retains its superiority over Faster R-CNN even when the number of proposals
    generated by LDDP is only about 30% of the number used by Faster R-CNN.

    UC Merced Submission to the ActivityNet Challenge 2016

    Yi Zhu, Shawn Newsam, Zaikun Xu
    Comments: Notebook paper for ActivityNet 2016 challenge, untrimmed video classification track
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

    This notebook paper describes our system for the untrimmed classification
    task in the ActivityNet challenge 2016. We investigate multiple
    state-of-the-art approaches for action recognition in long, untrimmed videos.
    We exploit hand-crafted motion boundary histogram features as well as feature
    activations from deep networks such as VGG16, GoogLeNet, and C3D. These
    features are separately fed to linear, one-versus-rest support vector machine
    classifiers to produce confidence scores for each action class. These
    predictions are then fused along with the softmax scores of the recent
    ultra-deep ResNet-101 using weighted averaging.

    Creativity: Generating Diverse Questions using Variational Autoencoders

    Unnat Jain, Ziyu Zhang, Alexander Schwing
    Comments: Accepted to CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Generating diverse questions for given images is an important task for
    computational education, entertainment and AI assistants. Different from many
    conventional prediction techniques is the need for algorithms to generate a
    diverse set of plausible questions, which we refer to as “creativity”. In this
    paper we propose a creative algorithm for visual question generation which
    combines the advantages of variational autoencoders with long short-term memory
    networks. We demonstrate that our framework is able to generate a large set of
    varying questions given a single input image.

    CNN-SLAM: Real-time dense monocular SLAM with learned depth prediction

    Keisuke Tateno, Federico Tombari, Iro Laina, Nassir Navab
    Comments: 10 pages, 6 figures, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Hawaii, USA, June, 2017. The first two authors contribute equally to this paper
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Given the recent advances in depth prediction from Convolutional Neural
    Networks (CNNs), this paper investigates how predicted depth maps from a deep
    neural network can be deployed for accurate and dense monocular reconstruction.
    We propose a method where CNN-predicted dense depth maps are naturally fused
    together with depth measurements obtained from direct monocular SLAM. Our
    fusion scheme privileges depth prediction in image locations where monocular
    SLAM approaches tend to fail, e.g. along low-textured regions, and vice-versa.
    We demonstrate the use of depth prediction for estimating the absolute scale of
    the reconstruction, hence overcoming one of the major limitations of monocular
    SLAM. Finally, we propose a framework to efficiently fuse semantic labels,
    obtained from a single frame, with dense SLAM, yielding semantically coherent
    scene reconstruction from a single view. Evaluation results on two benchmark
    datasets show the robustness and accuracy of our approach.

    Learning Proximal Operators: Using Denoising Networks for Regularizing Inverse Imaging Problems

    Tim Meinhardt, Michael Möller, Caner Hazirbas, Daniel Cremers
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    While variational methods have been among the most powerful tools for solving
    linear inverse problems in imaging, deep (convolutional) neural networks have
    recently taken the lead in many challenging benchmarks. A remaining drawback of
    deep learning approaches is that they require an expensive retraining whenever
    the specific problem, the noise level, noise type, or desired measure of
    fidelity changes. In contrast, variational methods have a plug-and-play
    nature, as they usually consist of separate data fidelity and regularization
    terms. In this paper we study the possibility of replacing the proximal
    operator of the regularization used in many convex energy minimization
    algorithms by a denoising neural network. The latter therefore serves as an
    implicit natural image prior, while the data term can still be chosen
    arbitrarily. Using a fixed denoising neural network in exemplary problems of
    image deconvolution with different blur kernels and image demosaicking, we
    obtain state-of-the-art results. Additionally, we discuss novel results on the
    analysis of possible convex optimization algorithms to incorporate the network
    into, as well as the choices of algorithm parameters and their relation to the
    noise level the neural network is trained on.
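
    The core idea can be sketched as a proximal-gradient loop in which the
    proximal step of the regularizer is swapped for a denoiser. This is a
    generic plug-and-play sketch under that assumption, not the paper's exact
    algorithm or parameters: A and At stand for the forward operator (e.g.
    blur) and its adjoint, and `denoise` is any callable, such as a fixed
    denoising network.

        def pnp_proximal_gradient(y, A, At, denoise, step=1.0, iters=50):
            # Gradient step on the data term ||Ax - y||^2, then a denoising
            # step acting as the implicit natural-image prior.
            x = At(y)
            for _ in range(iters):
                x = x - step * At(A(x) - y)  # data-fidelity gradient step
                x = denoise(x)               # replaces the proximal operator
            return x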

    Learning Two-Branch Neural Networks for Image-Text Matching Tasks

    Liwei Wang, Yin Li, Svetlana Lazebnik
    Comments: in submission for TPAMI
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper investigates two-branch neural networks for image-text matching
    tasks. We propose two different network structures that produce different
    output representations. The first one, referred to as an embedding network,
    learns an explicit shared latent embedding space with a maximum-margin ranking
    loss and novel neighborhood constraints. The second one, referred to as a
    similarity network, fuses the two branches via element-wise product and is
    trained with regression loss to directly predict a similarity score. Extensive
    experiments show that our two-branch networks achieve high accuracies for
    phrase localization on the Flickr30K Entities dataset and for bi-directional
    image-sentence retrieval on Flickr30K and MSCOCO datasets.
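
    A minimal sketch of the embedding network's bi-directional max-margin
    ranking objective over a batch is given below (matched image-text pairs on
    the diagonal must beat mismatched pairs by a margin); the paper's
    neighborhood constraints are omitted.

        import torch
        import torch.nn.functional as F

        def bidirectional_ranking_loss(im, txt, margin=0.2):
            # im, txt: [batch, dim] embeddings from the two branches.
            im = F.normalize(im, dim=1)
            txt = F.normalize(txt, dim=1)
            scores = im @ txt.t()              # cosine similarity matrix
            pos = scores.diag().view(-1, 1)    # matched pairs
            cost_im = (margin + scores - pos).clamp(min=0)      # image->text
            cost_tx = (margin + scores - pos.t()).clamp(min=0)  # text->image
            mask = torch.eye(scores.size(0), dtype=torch.bool)
            return (cost_im.masked_fill(mask, 0).sum()
                    + cost_tx.masked_fill(mask, 0).sum())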

    Deep-FExt: Deep Feature Extraction for Vessel Segmentation and Centerline Prediction

    Giles Tetteh, Markus Rempfler, Bjoern H. Menze, Claus Zimmer
    Comments: 9 pages
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Feature extraction is a crucial task in image and pixel (voxel)
    classification and regression in biomedical image modeling. In this work we
    present a machine learning feature extraction scheme based on inception
    models for pixel classification tasks. We extract features under multi-scale
    and multi-layer schemes through convolutional operators. Layers of a Fully
    Convolutional Network are then stacked on these feature extraction layers and
    trained end-to-end for the purpose of classification. We test our model on the
    DRIVE and STARE public data sets for segmentation and centerline detection,
    and it outperforms most existing hand-crafted or deterministic feature schemes
    found in the literature. We achieve an average maximum Dice of 0.85 on the
    DRIVE data set, which outperforms the scores of the data set's second human
    annotator. We also achieve an average maximum Dice of 0.85 and kappa of 0.84
    on the STARE data set. Though these data sets are mainly 2-D, we also propose
    ways of extending this feature extraction scheme to handle 3-D data sets.


    Artificial Intelligence

    Counterexample Guided Inductive Optimization

    Rodrigo F. Araujo, Higo F. Albuquerque, Iury V. de Bessa, Lucas C. Cordeiro, Joao Edgar C. Filho
    Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)

    This paper describes three variants of a counterexample guided inductive
    optimization (CEGIO) approach based on Satisfiability Modulo Theories (SMT)
    solvers. In particular, CEGIO relies on iterative executions to constrain a
    verification procedure, in order to perform inductive generalization, based on
    counterexamples extracted from SMT solvers. CEGIO is able to successfully
    optimize a wide range of functions, including non-linear and non-convex
    optimization problems based on SMT solvers, in which data provided by
    counterexamples are employed to guide the verification engine, thus reducing
    the optimization domain. The presented algorithms are evaluated using a large
    set of benchmarks typically employed for evaluating optimization techniques.
    Experimental results show the efficiency and effectiveness of the proposed
    algorithms, which find the optimal solution in all evaluated benchmarks, while
    traditional techniques are usually trapped by local minima.
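
    The counterexample-guided loop can be sketched with an SMT solver such as
    Z3: repeatedly ask for a point strictly better than the incumbent, and let
    each counterexample tighten the bound. This is an illustrative
    one-variable sketch, not one of the paper's three CEGIO variants.

        from z3 import Real, Solver, And, sat

        def cegio_minimize(f, x, lo, hi, rounds=30):
            best, bound = None, None
            for _ in range(rounds):
                s = Solver()
                s.add(And(x >= lo, x <= hi))
                if bound is not None:
                    s.add(f < bound)   # demand strict improvement
                if s.check() != sat:
                    break              # no better point: bound is optimal
                m = s.model()          # counterexample to "no better point"
                best, bound = m[x], m.eval(f)
            return best, bound

        x = Real('x')
        print(cegio_minimize((x - 1) * (x - 1) + 3, x, -10, 10))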

    Learning from Demonstrations for Real World Reinforcement Learning

    Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    Deep reinforcement learning (RL) has achieved several high profile successes
    in difficult control problems. However, these algorithms typically require a
    huge amount of data before they reach reasonable performance. In fact, their
    performance during learning can be extremely poor. This may be acceptable for a
    simulator, but it severely limits the applicability of deep RL to many
    real-world tasks, where the agent must learn in the real environment. In this
    paper we study a setting where the agent may access data from previous control
    of the system. We present an algorithm, Deep Q-learning from Demonstrations
    (DQfD), that leverages this data to massively accelerate the learning process
    even from relatively small amounts of demonstration data. DQfD works by
    combining temporal difference updates with large-margin classification of the
    demonstrator’s actions. We show that DQfD has better initial performance than
    Deep Q-Networks (DQN) on 40 of 42 Atari games, and it achieves higher average
    rewards than DQN on 27 of 42 Atari games. We also demonstrate that DQfD learns
    faster than DQN even when given poor demonstration data.
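
    The large-margin supervised term can be written down compactly: the
    demonstrator's action must have a Q-value at least a margin above every
    other action's. A sketch in PyTorch (the TD terms it is combined with are
    not shown, and the margin value is illustrative):

        import torch

        def large_margin_loss(q_values, expert_actions, margin=0.8):
            # q_values: [batch, num_actions]; expert_actions: [batch].
            penalties = torch.full_like(q_values, margin)
            penalties.scatter_(1, expert_actions.unsqueeze(1), 0.0)
            augmented_max = (q_values + penalties).max(dim=1).values
            q_expert = q_values.gather(
                1, expert_actions.unsqueeze(1)).squeeze(1)
            return (augmented_max - q_expert).mean()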

    Beliefs in Markov Trees – From Local Computations to Local Valuation

    Mieczysław A. Kłopotek
    Comments: Preliminary versioin of conference paper: M.A. K{l}opotek: Beliefs in Markov Trees – From Local Computations to Local Valuation. [in:] R. Trappl, Ed.: Cybernetics and Systems Research , Proc. 12th European Meeting on Cybernetics and System Research, Vienna 5-8 April 1994, World Scientific Publishers, Vol.1. pp. 351-358
    Subjects: Artificial Intelligence (cs.AI)

    This paper is devoted to the expressiveness of hypergraphs for which
    uncertainty propagation by local computations via the Shenoy/Shafer method
    applies. It is demonstrated that, for this propagation method and a given
    joint belief distribution, no valuation of the hyperedges of a hypergraph may
    provide a simpler hypergraph structure than valuation of hyperedges by
    conditional distributions. This has the vital implication that methods
    recovering belief networks from data have no better alternative for finding
    the simplest hypergraph structure for belief propagation. A method for
    recovering tree-structured belief networks has been developed and specialized
    for Dempster-Shafer belief functions.

    Stigmergy-based modeling to discover urban activity patterns from positioning data

    Antonio L. Alfeo, Mario G. C. A. Cimino, Sara Egidi, Bruno Lepri, Alex Pentland, Gigliola Vaglini
    Comments: It will be presented at the International Conference on Social, Cultural, and Behavioral Modeling & Prediction and Behavior Representation in Modeling and Simulation, SBP-BRiMS 2017
    Subjects: Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)

    Positioning data offer a remarkable source of information for analyzing the
    urban dynamics of crowds. However, discovering urban activity patterns from the emergent
    behavior of crowds involves complex system modeling. An alternative approach is
    to adopt computational techniques belonging to the emergent paradigm, which
    enables self-organization of data and allows adaptive analysis. Specifically,
    our approach is based on stigmergy. By using stigmergy each sample position is
    associated with a digital pheromone deposit, which progressively evaporates and
    aggregates with other deposits according to their spatiotemporal proximity.
    Based on this principle, we exploit positioning data to identify high density
    areas (hotspots) and characterize their activity over time. This
    characterization allows the comparison of dynamics occurring in different days,
    providing a similarity measure exploitable by clustering techniques. Thus, we
    cluster days according to their activity behavior, discovering unexpected urban
    activity patterns. As a case study, we analyze taxi traces in New York City
    during 2015.
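
    The deposit-and-evaporate mechanic is simple to state in code. A toy
    sketch on a grid (deposit amount, evaporation rate and grid size are
    illustrative choices, not the paper's parameters):

        import numpy as np

        def update_pheromone(grid, positions, deposit=1.0, evaporation=0.05):
            # Each observed position drops a deposit on its cell, then the
            # whole map evaporates; hotspots persist where deposits outrun
            # evaporation.
            for x, y in positions:
                grid[y, x] += deposit
            grid *= (1.0 - evaporation)
            return grid

        grid = np.zeros((100, 100))
        for _ in range(50):  # a synthetic stationary crowd
            grid = update_pheromone(grid, [(10, 20), (11, 20), (10, 21)])
        print(grid.max())    # saturates as evaporation balances deposits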

    Finding Modes by Probabilistic Hypergraphs Shifting

    Yang Wang, Lin Wu
    Comments: Fixing some minor issues in PAKDD 2014
    Subjects: Artificial Intelligence (cs.AI)

    In this paper, we develop a novel paradigm, namely hypergraph shift, to find
    robust graph modes by a probabilistic voting strategy; the resulting modes
    are semantically sound besides meeting the self-cohesiveness requirement for
    forming graph modes. Unlike existing techniques that seek graph modes by
    shifting vertices based on pair-wise edges (i.e., edges with 2 ends), our
    paradigm is based on shifting high-order edges (hyperedges) to deliver graph
    modes. Specifically, we convert the problem of seeking graph modes into the
    problem of seeking maximizers of a novel objective function, with the aim of
    generating good graph modes based on shifting edges in hypergraphs. As a
    result, the generated graph modes based on
    dense subhypergraphs may more accurately capture the object semantics besides
    the self-cohesiveness requirement. We also formally prove that our technique is
    always convergent. Extensive empirical studies on synthetic and real world data
    sets are conducted on clustering and graph matching. They demonstrate that our
    techniques significantly outperform the existing techniques.

    CASP Solutions for Planning in Hybrid Domains

    Marcello Balduccini, Daniele Magazzeni, Marco Maratea, Emily LeBlanc
    Comments: Under consideration in Theory and Practice of Logic Programming (TPLP)
    Subjects: Artificial Intelligence (cs.AI)

    CASP is an extension of ASP that allows for numerical constraints to be added
    in the rules. PDDL+ is an extension of the PDDL standard language of automated
    planning for modeling mixed discrete-continuous dynamics.

    In this paper, we present CASP solutions for dealing with PDDL+ problems,
    i.e., encoding from PDDL+ to CASP, and extensions to the algorithm of the EZCSP
    CASP solver in order to solve CASP programs arising from PDDL+ domains. An
    experimental analysis, performed on well-known linear and non-linear variants
    of PDDL+ domains, involving various configurations of the EZCSP solver, other
    CASP solvers, and PDDL+ planners, shows the viability of our solution.

    Parallelized Kendall's Tau Coefficient Computation via SIMD Vectorized Sorting On Many-Integrated-Core Processors

    Yongchao Liu, Tony Pan, Oded Green, Srinivas Aluru
    Comments: 29 pages, 6 figures, 5 tables, submitted to Journal of Parallel and Distributed Computing
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)

    Pairwise association measures are an important operation in data analytics.
    Kendall’s tau coefficient is a widely used correlation coefficient for
    identifying non-linear relationships between ordinal variables. In this paper,
    we investigated a parallel algorithm accelerating all-pairs Kendall’s tau
    coefficient computation via single instruction multiple data (SIMD) vectorized
    sorting on Intel Xeon Phis by taking advantage of many processing cores and
    512-bit SIMD vector instructions. To facilitate workload balancing and overcome
    on-chip memory limitation, we proposed a generic framework for symmetric
    all-pairs computation by building provable bijective functions between job
    identifier and coordinate space. Performance evaluation demonstrated that our
    algorithm on one 5110P Phi achieves two orders-of-magnitude speedups over
    16-threaded MATLAB and three orders-of-magnitude speedups over sequential R,
    both running on high-end CPUs. In addition, our algorithm exhibited good
    distributed-computing scalability with respect to the number of Phis. Source code
    and datasets are publicly available at this http URL
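
    The sorting-based formulation the paper vectorizes rests on a classic
    identity: after ordering the pairs by one variable, Kendall's tau is
    determined by the number of inversions in the other variable, countable in
    O(n log n) with a merge sort. A sequential sketch (tau-a, assuming no
    ties):

        def kendall_tau(x, y):
            order = sorted(range(len(x)), key=lambda i: x[i])
            seq = [y[i] for i in order]  # y re-ordered by ascending x

            def count_inversions(a):
                # Merge sort that also counts inversions (discordant pairs).
                if len(a) <= 1:
                    return a, 0
                mid = len(a) // 2
                left, li = count_inversions(a[:mid])
                right, ri = count_inversions(a[mid:])
                merged, inv, i, j = [], li + ri, 0, 0
                while i < len(left) and j < len(right):
                    if left[i] <= right[j]:
                        merged.append(left[i]); i += 1
                    else:
                        merged.append(right[j]); j += 1
                        inv += len(left) - i  # right[j] precedes rest of left
                merged += left[i:] + right[j:]
                return merged, inv

            _, discordant = count_inversions(seq)
            pairs = len(x) * (len(x) - 1) // 2
            return 1 - 2 * discordant / pairs

        print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))  # 2/3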

    Real-time On-Demand Crowd-powered Entity Extraction

    Ting-Hao (Kenneth) Huang, Yun-Nung Chen, Jeffrey P. Bigham
    Comments: Accepted by the 5th Edition Of The Collective Intelligence Conference (CI 2017) as an oral presentation. Interface code and data are available at: this https URL
    Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Output-agreement mechanisms such as ESP Game have been widely used in human
    computation to obtain reliable human-generated labels. In this paper, we argue
    that a “time-limited” output-agreement mechanism can be used to create a fast
    and robust crowd-powered component in interactive systems, particularly
    dialogue systems, to extract key information from user utterances on the fly.
    Our experiments on Amazon Mechanical Turk using the Airline Travel Information
    System (ATIS) dataset showed that the proposed approach achieves high-quality
    results with an average response time shorter than 9 seconds.

    Unsupervised Event Abstraction using Pattern Abstraction and Local Process Models

    Felix Mannhardt, Niek Tax
    Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Process mining analyzes business processes based on events stored in event
    logs. However, some recorded events may correspond to activities at a very low
    level of abstraction. When events are recorded at too fine a level of
    granularity, process discovery methods tend to generate overgeneralizing
    process models. Grouping low-level events to higher level activities, i.e.,
    event abstraction, can be used to discover better process models. Existing
    event abstraction methods are mainly based on common sub-sequences and
    clustering techniques. In this paper, we propose to first discover local
    process models and then use those models to lift the event log to a higher
    level of abstraction. Our conjecture is that process models discovered on the
    obtained high-level event log return process models of higher quality: their
    fitness and precision scores are more balanced. We show this with preliminary
    results on several real-life event logs.


    Information Retrieval

    Loklak – A Distributed Crawler and Data Harvester for Overcoming Rate Limits

    Sudheesh Singanamalla, Michael Peter Christen
    Comments: 4 pages, 1 figure
    Subjects: Information Retrieval (cs.IR)

    Modern social networks have become sources for vast quantities of data.
    Having access to such big data can be very useful for various researchers and
    data scientists. In this paper we describe Loklak, an open-source
    distributed peer-to-peer crawler and scraper that supports such research on
    platforms like Twitter, Weibo and other social networks. Social networks
    such as Twitter and Weibo impose various limits on the rate at which a user
    can freely collect such data for research. Our crawler enables researchers
    to continuously collect data while overcoming the barriers of
    authentication and rate limits, providing a repository of open data as a
    service.

    Leveraging Term Banks for Answering Complex Questions: A Case for Sparse Vectors

    Peter D. Turney
    Comments: Related datasets can be found at this http URL
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Learning (cs.LG)

    While open-domain question answering (QA) systems have proven effective for
    answering simple questions, they struggle with more complex questions. Our goal
    is to answer more complex questions reliably, without incurring a significant
    cost in knowledge resource construction to support the QA. One readily
    available knowledge resource is a term bank, enumerating the key concepts in a
    domain. We have developed an unsupervised learning approach that leverages a
    term bank to guide a QA system, by representing the terminological knowledge
    with thousands of specialized vector spaces. In experiments with complex
    science questions, we show that this approach significantly outperforms several
    state-of-the-art QA systems, demonstrating that significant leverage can be
    gained from continuous vector representations of domain terminology.

    Unsupervised Spatio-Temporal Embeddings for User and Location Modelling

    Jing Yang, Carsten Eickhoff
    Subjects: Information Retrieval (cs.IR)

    Many social network applications depend on robust representations of
    spatio-temporal data. In this work, we present an embedding model based on
    feed-forward neural networks which transforms social media check-ins into dense
    feature vectors encoding geographic, temporal, and functional aspects for
    modelling places, neighborhoods, and users. On the basis of this model, we
    further propose a Spatio-Temporal Embedding Similarity algorithm (STES) for
    location recommendation.

    In a range of experiments on real-life data collected from Foursquare,
    Twitter, and Gowalla, we demonstrate our model’s effectiveness at
    characterizing places and people, resulting in a significantly increased
    recommendation and classification performance as compared to the state of the
    art. Finally, we select eight major cities around the globe and verify the
    robustness and generality of our model by porting pre-trained models from one
    city to another, and thereby alleviating the need for costly local training.

    Unsupervised part learning for visual recognition

    Ronan Sicre, Yannis Avrithis, Ewa Kijak, Frederic Jurie
    (Cross-listed; see the abstract under Computer Vision and Pattern Recognition above.)


    Computation and Language

    Trainable Referring Expression Generation using Overspecification Preferences

    Thiago Castro Ferreira, Ivandre Paraboni
    Comments: 8 pages
    Subjects: Computation and Language (cs.CL)

    Referring expression generation (REG) models that use speaker-dependent
    information require a considerable amount of training data produced by every
    individual speaker, or may otherwise perform poorly. In this work we present a
    simple REG experiment that allows the use of larger training data sets by
    grouping speakers according to their overspecification preferences. Intrinsic
    evaluation shows that this method generally outperforms the personalised method
    found in previous work.

    Representation Stability as a Regularizer for Improved Text Analytics Transfer Learning

    Matthew Riemer, Elham Khabiri, Richard Goodwin
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Although neural networks are well suited for sequential transfer learning
    tasks, the catastrophic forgetting problem hinders proper integration of prior
    knowledge. In this work, we propose a solution to this problem by using a
    multi-task objective based on the idea of distillation and a mechanism that
    directly penalizes forgetting at the shared representation layer during the
    knowledge integration phase of training. We demonstrate our approach on a
    Twitter domain sentiment analysis task with sequential knowledge transfer from
    four related tasks. We show that our technique outperforms networks fine-tuned
    to the target task. Additionally, we show both through empirical evidence and
    examples that it does not forget useful knowledge from the source task that is
    forgotten during standard fine-tuning. Surprisingly, we find that first
    distilling a human-made rule-based sentiment engine into a recurrent neural
    network and then integrating the knowledge with the target task data leads to a
    substantial gain in generalization performance. Our experiments demonstrate the
    power of multi-source transfer techniques in practical text analytics problems
    when paired with distillation. In particular, for the SemEval 2016 Task 4
    Subtask A (Nakov et al., 2016) dataset we surpass the state of the art
    established during the competition with a comparatively simple model
    architecture that is not even competitive when trained on only the labeled task
    specific data.
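
    The combined objective described above lends itself to a compact sketch: a
    task loss, a distillation term against the source model's soft targets, and a
    penalty that keeps the shared representation close to its pre-transfer
    snapshot. The weights, temperature, and tensor shapes below are illustrative
    assumptions, not the paper's reported settings.

    ```python
    import torch
    import torch.nn.functional as F

    def transfer_loss(student_logits, target_labels, teacher_logits,
                      shared_rep, frozen_rep, T=2.0, alpha=0.5, beta=0.1):
        # (1) ordinary supervised loss on the target task
        task = F.cross_entropy(student_logits, target_labels)
        # (2) distillation toward the source model's softened predictions
        distill = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                           F.softmax(teacher_logits / T, dim=1),
                           reduction='batchmean') * T * T
        # (3) representation-stability penalty at the shared layer
        stability = F.mse_loss(shared_rep, frozen_rep.detach())
        return task + alpha * distill + beta * stability

    # Dummy tensors showing the call (batch of 8, 3 classes, 32-dim shared layer):
    logits = torch.randn(8, 3, requires_grad=True)
    rep = torch.randn(8, 32, requires_grad=True)
    loss = transfer_loss(logits, torch.randint(0, 3, (8,)),
                         torch.randn(8, 3), rep, torch.randn(8, 32))
    loss.backward()
    ```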

    ConceptNet at SemEval-2017 Task 2: Extending Word Embeddings with Multilingual Relational Knowledge

    Robert Speer, Joanna Lowry-Duda
    Comments: 5 pages, accepted to the SemEval workshop at ACL 2017
    Subjects: Computation and Language (cs.CL)

    This paper describes Luminoso’s participation in SemEval 2017 Task 2,
    “Multilingual and Cross-lingual Semantic Word Similarity”, with a system based
    on ConceptNet. ConceptNet is an open, multilingual knowledge graph that focuses
    on general knowledge that relates the meanings of words and phrases. Our
    submission to SemEval was an update of previous work that builds high-quality,
    multilingual word embeddings from a combination of ConceptNet and
    distributional semantics. Our system took first place in both subtasks. It
    ranked first in 4 out of 5 of the separate languages, and also ranked first in
    all 10 of the cross-lingual language pairs.

    What do Neural Machine Translation Models Learn about Morphology?

    Yonatan Belinkov, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, James Glass
    Comments: Accepted to ACL 2017
    Subjects: Computation and Language (cs.CL)

    Neural machine translation (MT) models obtain state-of-the-art performance
    while maintaining a simple, end-to-end architecture. However, little is known
    about what these models learn about source and target languages during the
    training process. In this work, we analyze the representations learned by
    neural MT models at various levels of granularity and empirically evaluate the
    quality of the representations for learning morphology through extrinsic
    part-of-speech and morphological tagging tasks. We conduct a thorough
    investigation along several parameters: word-based vs. character-based
    representations, depth of the encoding layer, the identity of the target
    language, and encoder vs. decoder representations. Our data-driven,
    quantitative evaluation sheds light on important aspects in the neural MT
    system and its ability to capture word structure.

    A Neural Parametric Singing Synthesizer

    Merlijn Blaauw, Jordi Bonada
    Comments: Submitted to Interspeech 2017
    Subjects: Sound (cs.SD); Computation and Language (cs.CL); Learning (cs.LG)

    We present a new model for singing synthesis based on a modified version of
    the WaveNet architecture. Instead of modeling raw waveform, we model features
    produced by a parametric vocoder that separates the influence of pitch and
    timbre. This allows conveniently modifying pitch to match any target melody,
    facilitates training on more modest dataset sizes, and significantly reduces
    training and generation times. Our model makes frame-wise predictions using
    mixture density outputs rather than categorical outputs in order to reduce the
    required parameter count. As we found overfitting to be an issue with the
    relatively small datasets used in our experiments, we propose a method to
    regularize the model and make the autoregressive generation process more robust
    to prediction errors. Using a simple multi-stream architecture, harmonic,
    aperiodic and voiced/unvoiced components can all be predicted in a coherent
    manner. We compare our method to existing parametric statistical and
    state-of-the-art concatenative methods using quantitative metrics and a
    listening test. While naive implementations of the autoregressive generation
    algorithm tend to be inefficient, using a smart algorithm we can greatly speed
    up the process and obtain a system that’s competitive in both speed and
    quality.
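
    To make the mixture-density idea concrete, here is a small sketch of the
    negative log-likelihood such an output layer minimizes for a single scalar
    vocoder parameter; the three-component setup and parameter names are
    illustrative assumptions, not the paper's exact architecture.

    ```python
    import numpy as np
    from scipy.special import logsumexp

    def mdn_nll(logit_pi, mu, log_sigma, y):
        """Negative log-likelihood of y under a 1-D Gaussian mixture."""
        log_pi = logit_pi - logsumexp(logit_pi)          # normalized mixture weights
        log_norm = -0.5 * np.log(2 * np.pi) - log_sigma  # Gaussian normalizers
        log_prob = log_pi + log_norm - 0.5 * ((y - mu) / np.exp(log_sigma)) ** 2
        return -logsumexp(log_prob)                      # sum over components

    # Three mixture components predicting one frame-wise parameter value:
    print(mdn_nll(np.zeros(3), np.array([-1.0, 0.0, 1.0]), np.zeros(3), 0.2))
    ```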

    Real-time On-Demand Crowd-powered Entity Extraction

    Ting-Hao (Kenneth) Huang, Yun-Nung Chen, Jeffrey P. Bigham
    Comments: Accepted by the 5th Edition Of The Collective Intelligence Conference (CI 2017) as an oral presentation. Interface code and data are available at: this https URL
    Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Output-agreement mechanisms such as the ESP Game have been widely used in human
    computation to obtain reliable human-generated labels. In this paper, we argue
    that a “time-limited” output-agreement mechanism can be used to create a fast
    and robust crowd-powered component in interactive systems, particularly
    dialogue systems, to extract key information from user utterances on the fly.
    Our experiments on Amazon Mechanical Turk using the Airline Travel Information
    System (ATIS) dataset showed that the proposed approach achieves high-quality
    results with an average response time shorter than 9 seconds.

    Leveraging Term Banks for Answering Complex Questions: A Case for Sparse Vectors

    Peter D. Turney
    Comments: Related datasets can be found at this http URL
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Learning (cs.LG)

    While open-domain question answering (QA) systems have proven effective for
    answering simple questions, they struggle with more complex questions. Our goal
    is to answer more complex questions reliably, without incurring a significant
    cost in knowledge resource construction to support the QA. One readily
    available knowledge resource is a term bank, enumerating the key concepts in a
    domain. We have developed an unsupervised learning approach that leverages a
    term bank to guide a QA system, by representing the terminological knowledge
    with thousands of specialized vector spaces. In experiments with complex
    science questions, we show that this approach significantly outperforms several
    state-of-the-art QA systems, demonstrating that significant leverage can be
    gained from continuous vector representations of domain terminology.

    Unsupervised Event Abstraction using Pattern Abstraction and Local Process Models

    Felix Mannhardt, Niek Tax
    Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Process mining analyzes business processes based on events stored in event
    logs. However, some recorded events may correspond to activities on a very low
    level of abstraction. When events are recorded at too low a level of
    granularity, process discovery methods tend to generate overgeneralizing
    process models. Grouping low-level events to higher level activities, i.e.,
    event abstraction, can be used to discover better process models. Existing
    event abstraction methods are mainly based on common sub-sequences and
    clustering techniques. In this paper, we propose to first discover local
    process models and then use those models to lift the event log to a higher
    level of abstraction. Our conjecture is that process models discovered on the
    obtained high-level event log return process models of higher quality: their
    fitness and precision scores are more balanced. We show this with preliminary
    results on several real-life event logs.


    Distributed, Parallel, and Cluster Computing

    Parallelized Kendall's Tau Coefficient Computation via SIMD Vectorized Sorting On Many-Integrated-Core Processors

    Yongchao Liu, Tony Pan, Oded Green, Srinivas Aluru
    Comments: 29 pages, 6 figures, 5 tables, submitted to Journal of Parallel and Distributed Computing
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)

    Computing pairwise association measures is an important operation in data
    analytics. Kendall’s tau coefficient is a widely used correlation coefficient
    for identifying non-linear relationships between ordinal variables. In this paper,
    we investigated a parallel algorithm accelerating all-pairs Kendall’s tau
    coefficient computation via single instruction multiple data (SIMD) vectorized
    sorting on Intel Xeon Phis by taking advantage of many processing cores and
    512-bit SIMD vector instructions. To facilitate workload balancing and overcome
    on-chip memory limitation, we proposed a generic framework for symmetric
    all-pairs computation by building provable bijective functions between job
    identifier and coordinate space. Performance evaluation demonstrated that our
    algorithm on one 5110P Phi achieves two orders-of-magnitude speedups over
    16-threaded MATLAB and three orders-of-magnitude speedups over sequential R,
    both running on high-end CPUs. Moreover, our algorithm exhibited good
    distributed computing scalability with respect to the number of Phis. Source code
    and datasets are publicly available at this http URL
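
    The sorting connection the title alludes to can be illustrated in a few
    lines: with no ties, the number of discordant pairs equals the number of
    inversions of y after sorting by x, which merge sort counts in O(n log n).
    This scalar sketch only mirrors the serial logic; the paper's contribution is
    the SIMD-vectorized, many-core version.

    ```python
    def count_inversions(a):
        # Merge sort that also counts inversions.
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, inv_l = count_inversions(a[:mid])
        right, inv_r = count_inversions(a[mid:])
        merged, inv, i, j = [], inv_l + inv_r, 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
                inv += len(left) - i   # right[j] jumps past the remaining left items
        merged += left[i:] + right[j:]
        return merged, inv

    def kendall_tau(x, y):
        # Kendall's tau-a for tie-free data: tau = 1 - 4*D / (n*(n-1)).
        n = len(x)
        y_by_x = [yi for _, yi in sorted(zip(x, y))]
        _, discordant = count_inversions(y_by_x)
        return 1.0 - 4.0 * discordant / (n * (n - 1))

    print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.666...: one discordant pair
    ```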

    NG2C: Pretenuring N-Generational GC for HotSpot Big Data Applications

    Rodrigo Bruno, Luís Oliveira, Paulo Ferreira
    Comments: Accepted at ISMM’17
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Big Data applications suffer from unpredictable and unacceptably high pause
    times due to Garbage Collection (GC). This is the case in latency-sensitive
    applications such as on-line credit-card fraud detection, graph-based computing
    for analysis on social networks, etc. Such pauses compromise latency
    requirements of the whole application stack and result from applications’
    aggressive buffering/caching of data, exposing an ill-suited GC design, which
    assumes that most objects will die young and does not consider that
    applications hold large amounts of middle-lived data in memory.

    To avoid such pauses, we propose NG2C, a new GC algorithm that combines
    pretenuring with an N-Generational heap. By being able to allocate objects into
    different generations, NG2C is able to group objects with similar lifetime
    profiles in the same generation. By allocating objects with similar lifetime
    profiles close to each other, i.e. in the same generation, we avoid object
    promotion (copying between generations) and heap fragmentation (which leads to
    heap compactions) both responsible for most of the duration of HotSpot GC pause
    times.

    NG2C is implemented for the OpenJDK 8 HotSpot Java Virtual Machine, as an
    extension of the Garbage First GC. We evaluate NG2C using Cassandra, Lucene,
    and GraphChi with three different GCs: Garbage First (G1), Concurrent Mark
    Sweep (CMS), and NG2C. Results show that NG2C decreases the worst observable GC
    pause time by up to 94.8% for Cassandra, 85.0% for Lucene and 96.45% for
    GraphChi, when compared to current collectors (G1 and CMS). In addition, NG2C
    has no negative impact on application throughput or memory usage.

    Optimal Repair Layering for Erasure-Coded Data Centers: From Theory to Practice

    Yuchong Hu, Xiaolu Li, Mi Zhang, Patrick P. C. Lee, Xiaoyang Zhang, Pan Zhou, Dan Feng
    Comments: 23 pages
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Repair performance in hierarchical data centers is often bottlenecked by
    cross-rack network transfer. Recent theoretical results show that the
    cross-rack repair bandwidth can be minimized through repair layering, whose
    idea is to partition a repair operation into inner-rack and cross-rack layers.
    However, how repair layering should be implemented and deployed in practice
    remains an open issue. In this paper, we address this issue by proposing a
    practical repair layering framework called DoubleR. We design two families of
    practical double regenerating codes (DRC), which not only minimize the
    cross-rack repair bandwidth, but also have several practical properties that
    improve on state-of-the-art regenerating codes. We implement and deploy DoubleR
    atop Hadoop Distributed File System (HDFS), and show that DoubleR maintains the
    theoretical guarantees of DRC and improves the repair performance of
    regenerating codes in both node recovery and degraded read operations.

    A Component-Based Dual Decomposition Method for the OPF Problem

    Sleiman Mhanna, Gregor Verbic, Archie Chapman
    Comments: 8 pages + 4 pages of examples. Submitted to IEEE Transactions on Power Systems
    Journal-ref: IEEE Transactions on Power Systems, 2017
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)

    This paper proposes a component-based dual decomposition of the optimal power
    flow (OPF) problem, where the modified dual function is solved in a distributed
    fashion. This paper is the first to conduct extensive numerical analysis
    resulting in the identification and tabulation of the algorithmic parameter
    settings that are crucial for the convergence of the method on a vast array of
    test instances. Moreover, this work provides a deeper insight into the geometry
    of the modified Lagrange dual function of the OPF problem and highlights the
    conditions that make this function differentiable. Despite the inherent
    nonconvexity of the AC OPF problem, this numerical proof of convergence coupled
    with the scalability and the privacy preserving nature of the proposed method
    brings component-based distributed OPF several steps closer to reality.

    Toward a Distributed Knowledge Discovery system for Grid systems

    Nhien-An Le-Khac, Lamine Aouad, M-Tahar Kechadi
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Over the last decade or so, we have seen a deluge of data not only from the
    sciences but also from industry and commerce. Although the amount of data
    available to us is constantly increasing, our ability to process it is not
    keeping pace. Efficient discovery of useful knowledge from these datasets is
    therefore becoming a challenge and a massive economic need. This has led to
    the need for large-scale data mining (DM) techniques that can deal with huge
    datasets from both scientific and economic applications. In this chapter, we
    present a new distributed data mining (DDM) system combining dataset-driven
    and architecture-driven strategies. Data-driven strategies consider the size
    and heterogeneity of the data, while architecture-driven strategies focus on
    the distribution of the datasets. The system is based on Grid middleware
    tools that integrate appropriate large-scale data manipulation operations,
    allowing more dynamicity and autonomicity during the mining, integration,
    and processing phases.

    Feature Selection Parallel Technique for Remotely Sensed Imagery Classification

    Nhien-An Le-Khac, M-Tahar Kechadi, Bo Wu, C. Chen
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Remote sensing research focusing on feature selection has long attracted the
    attention of the remote sensing community because feature selection is a
    prerequisite for image processing and various applications. Different feature
    selection methods have been proposed to improve the classification accuracy.
    They vary from basic search techniques to clonal selections, and various
    optimal criteria have been investigated. Recently, methods using
    dependence-based measures have attracted much attention due to their ability to
    deal with very high dimensional datasets. However, these methods are based on
    Cramer's V test, which has performance issues with large datasets. In this
    paper, we propose a parallel approach to improve their performance. As
    preliminary results, we evaluate our approach on hyper-spectral and
    high-spatial-resolution images and compare it against the centralized
    versions of these methods.
    The results are very promising.
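
    For reference, the dependence measure in question can be computed from a
    contingency table in a few lines; the toy table below is made up, and the
    paper's point is parallelizing this computation over many feature pairs.

    ```python
    import numpy as np

    def cramers_v(table):
        # Chi-square statistic of the contingency table, then Cramer's V.
        table = np.asarray(table, dtype=float)
        n = table.sum()
        expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / n
        chi2 = ((table - expected) ** 2 / expected).sum()
        r, c = table.shape
        return np.sqrt(chi2 / (n * (min(r, c) - 1)))

    print(cramers_v([[10, 20], [30, 5]]))  # ~0.54, a fairly strong association
    ```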

    Toward a new approach for massive LiDAR data processing

    V-H Cao, K-X Chu, Nhien-An Le-Khac, M-T Kechadi, Debra F. Laefer, Linh Truong-Hong
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Laser scanning (also known as Light Detection And Ranging, LiDAR) has been
    widely applied in various applications. As part of that, aerial laser
    scanning (ALS) has been used to collect topographic data points over large
    areas, which results in millions of points being acquired. Furthermore, with
    full-waveform (FWF) technology now integrated into ALS data acquisition, all
    return information of each laser pulse is stored. ALS data are thus massive
    and complex, since the FWF of each laser pulse can be stored as up to 256
    samples and the density of ALS data is also increasing significantly.
    Processing LiDAR data demands heavy operations, and traditional approaches
    require significant hardware and running time. On the other hand, researchers
    have recently proposed parallel approaches for analysing LiDAR data. These
    approaches are normally based on the parallel architecture of target systems
    such as multi-core processors, GPUs, etc. However, efficient
    approaches/tools supporting the analysis of LiDAR data are still missing, due
    to the lack of a deep study of both the library tools and the algorithms used
    in processing this data. In this paper, we present a comparative study of
    software libraries and algorithms to optimise the processing of LiDAR data.
    We also propose a new method to improve this process, with experiments on
    large LiDAR data. Finally, we discuss a parallel solution of our approach in
    which we integrate parallel computing into LiDAR data processing.


    Learning

    Determining Song Similarity via Machine Learning Techniques and Tagging Information

    Renato L. F. Cunha, Evandro Caldeira, Luciana Fujii
    Comments: 6 pages, 2 figures
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    The task of determining item similarity is a crucial one in a recommender
    system. This constitutes the base upon which the recommender system will work
    to determine which items are more likely to be enjoyed by a user, resulting in
    more user engagement. In this paper we tackle the problem of determining song
    similarity based solely on song metadata (such as the performer, and song
    title) and on tags contributed by users. We evaluate our approach under a
    series of different machine learning algorithms. We conclude that tf-idf
    achieves better results than Word2Vec to model the dataset to feature vectors.
    We also conclude that k-NN models have better performance than SVMs and Linear
    Regression for this problem.
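
    The winning combination reported above (tf-idf features with a k-NN model)
    is easy to sketch with scikit-learn; the toy "songs" below are invented
    metadata-plus-tag strings, not the paper's dataset.

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neighbors import NearestNeighbors

    songs = ["rock guitar solo 70s", "classic rock guitar", "edm synth dance",
             "dance electronic club", "acoustic folk guitar"]
    X = TfidfVectorizer().fit_transform(songs)   # tf-idf over metadata and tags

    # Cosine nearest neighbours as a simple similarity model.
    knn = NearestNeighbors(n_neighbors=2, metric='cosine').fit(X)
    dist, idx = knn.kneighbors(X[0])
    print(idx)  # the query song itself plus its closest neighbour
    ```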

    MAGAN: Margin Adaptation for Generative Adversarial Networks

    Ruohan Wang, Antoine Cully, Hyung Jin Chang, Yiannis Demiris
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We propose a novel training procedure for Generative Adversarial Networks
    (GANs) to improve stability and performance by using an adaptive hinge loss
    objective function. We estimate the appropriate hinge loss margin with the
    expected energy of the target distribution, and derive both a principled
    criterion for updating the margin and an approximate convergence measure. The
    resulting training procedure is simple yet robust on a diverse set of datasets.
    We evaluate the proposed training procedure on the task of unsupervised image
    generation, noting both qualitative and quantitative performance improvements.
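
    In the spirit of the description above, a margin-adapted hinge objective
    might look like the following sketch, where the discriminator assigns an
    "energy" to samples and the margin tracks the expected energy of real data;
    the update rule and constants here are illustrative guesses, not the paper's
    derived criterion.

    ```python
    import numpy as np

    def discriminator_loss(e_real, e_fake, margin):
        # Hinge loss on energies: push real energy down, fake energy above the margin.
        return e_real.mean() + np.maximum(0.0, margin - e_fake).mean()

    margin = 1.0
    for step in range(3):  # stand-ins for training batches
        rng = np.random.default_rng(step)
        e_real = rng.uniform(0.2, 0.6, size=32)
        e_fake = rng.uniform(0.1, 1.2, size=32)
        loss = discriminator_loss(e_real, e_fake, margin)
        margin = 0.9 * margin + 0.1 * e_real.mean()  # drift toward E[energy of real]
        print(step, round(float(loss), 3), round(float(margin), 3))
    ```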

    Enabling Embedded Inference Engine with ARM Compute Library

    Dawei Sun, Shaoshan Liu, Jean-Luc Gaudiot
    Comments: 2 pages, 4 figures
    Subjects: Learning (cs.LG)

    When you need to enable deep learning on low-cost embedded SoCs, is it better
    to port an existing deep learning framework or should you build one from
    scratch? In this paper, we share our practical experiences of building an
    embedded inference engine using ARM Compute Library (ACL). The results show
    that, contrary to conventional wisdom, for simple models, it takes much
    less development time to build an inference engine from scratch compared to
    porting existing frameworks. In addition, by utilizing ACL, we managed to build
    an inference engine that outperforms TensorFlow by 25%. Our conclusion is that,
    on embedded devices, we most likely will use very simple deep learning models
    for inference, and with well-developed building blocks such as ACL, it may be
    better in both performance and development time to build the engine from
    scratch.

    Deep Extreme Multi-label Learning

    Wenjie Zhang, Liwei Wang, Junchi Yan, Xiangfeng Wang, Hongyuan Zha
    Comments: 7 pages, 7 figures
    Subjects: Learning (cs.LG)

    Extreme multi-label learning or classification has been a practical and
    important problem since the boom of big data. The main challenge lies in the
    exponential label space, which involves (2^L) possible label sets when the
    label dimension (L) is very large, e.g. in the millions for Wikipedia labels.
    This paper is motivated to better explore the label space by building and
    modeling an explicit label graph. Meanwhile, deep learning has been widely
    studied and used in various classification problems, including multi-label
    classification; however, it has not been sufficiently studied in this extreme
    but practical case, where the label space can be as large as millions. In
    this paper, we propose a practical deep embedding method for extreme
    multi-label classification. Our method harvests the ideas of non-linear
    embedding and of modeling the label space with graph priors at the same time.
    Extensive experiments on public datasets for XML show that our method
    performs competitively against the state of the art.

    Sampling-based speech parameter generation using moment-matching networks

    Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari
    Comments: Submitted to INTERSPEECH 2017
    Subjects: Learning (cs.LG)

    This paper presents sampling-based speech parameter generation using
    moment-matching networks for Deep Neural Network (DNN)-based speech synthesis.
    Although people never produce exactly the same speech even if we try to express
    the same linguistic and para-linguistic information, typical statistical speech
    synthesis produces completely the same speech, i.e., there is no
    inter-utterance variation in synthetic speech. To give synthetic speech natural
    inter-utterance variation, this paper builds DNN acoustic models that make it
    possible to randomly sample speech parameters. The DNNs are trained so that
    they make the moments of generated speech parameters close to those of natural
    speech parameters. Since the variation of speech parameters is compressed into
    a low-dimensional simple prior noise vector, our algorithm has lower
    computation cost than direct sampling of speech parameters. As the first step
    towards generating synthetic speech that has natural inter-utterance variation,
    this paper investigates whether or not the proposed sampling-based generation
    deteriorates synthetic speech quality. In evaluation, we compare speech quality
    of conventional maximum likelihood-based generation and proposed sampling-based
    generation. The results demonstrate that the proposed generation causes no
    degradation in speech quality.
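
    A maximum mean discrepancy (MMD) with a Gaussian kernel is one standard
    instance of the moment-matching criterion described above; the bandwidth and
    the toy "parameter" samples below are illustrative, not the paper's training
    setup.

    ```python
    import numpy as np

    def mmd(x, y, bandwidth=1.0):
        # Biased estimate of squared MMD between samples x and y.
        def k(a, b):
            d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2 * bandwidth ** 2))
        return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

    rng = np.random.default_rng(0)
    natural = rng.normal(0.0, 1.0, size=(100, 4))    # "natural" speech parameters
    generated = rng.normal(0.5, 1.0, size=(100, 4))  # "generated" parameters
    print(mmd(natural, generated))  # grows as the two distributions diverge
    ```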

    Active classification with comparison queries

    Daniel M. Kane, Shachar Lovett, Shay Moran, Jiapeng Zhang
    Comments: 23 pages (not including references), 1 figure
    Subjects: Learning (cs.LG); Computational Geometry (cs.CG)

    We study an extension of active learning in which the learning algorithm may
    ask the annotator to compare the distances of two examples from the boundary of
    their label-class. For example, in a recommendation system application (say for
    restaurants), the annotator may be asked whether she liked or disliked a
    specific restaurant (a label query); or which one of two restaurants did she
    like more (a comparison query).

    We focus on the class of half spaces, and show that under natural
    assumptions, such as large margin or bounded bit-description of the input
    examples, it is possible to reveal all the labels of a sample of size (n) using
    approximately (O(log n)) queries. This implies an exponential improvement over
    classical active learning, where only label queries are allowed. We complement
    these results by showing that if any of these assumptions is removed then, in
    the worst case, (Omega(n)) queries are required.

    Our results follow from a new general framework of active learning with
    additional queries. We identify a combinatorial dimension, called the
    emph{inference dimension}, that captures the query complexity when each
    additional query is determined by (O(1)) examples (such as comparison queries,
    each of which is determined by the two compared examples). Our results for half
    spaces follow by bounding the inference dimension in the cases discussed above.
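
    The headline phenomenon is easiest to see for a one-dimensional threshold
    class: comparison queries sort the sample by distance to the boundary, after
    which roughly log2(n) label queries pin down every label. The simulated
    oracles and hidden threshold below are illustrative only.

    ```python
    import functools

    hidden_t = 0.37                       # unknown decision boundary
    def label_query(x):                   # expensive annotation: is x positive?
        return x >= hidden_t
    def comparison_query(x, y):           # which example lies further below the boundary?
        return -1 if x < y else (1 if x > y else 0)

    sample = [0.91, 0.12, 0.55, 0.30, 0.77, 0.41, 0.05, 0.63]
    ordered = sorted(sample, key=functools.cmp_to_key(comparison_query))

    # Binary search for the first positive example: ~log2(n) label queries.
    lo, hi, used = 0, len(ordered), 0
    while lo < hi:
        mid = (lo + hi) // 2
        used += 1
        if label_query(ordered[mid]):
            hi = mid
        else:
            lo = mid + 1

    labels = [i >= lo for i in range(len(ordered))]  # all n labels revealed
    print(used, list(zip(ordered, labels)))          # 3 label queries for 8 points
    ```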

    Robustly Learning a Gaussian: Getting Optimal Error, Efficiently

    Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, Alistair Stewart
    Subjects: Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)

    We study the fundamental problem of learning the parameters of a
    high-dimensional Gaussian in the presence of noise — where an
    (varepsilon)-fraction of our samples were chosen by an adversary. We give
    robust estimators that achieve estimation error (O(varepsilon)) in the total
    variation distance, which is optimal up to a universal constant that is
    independent of the dimension.

    In the case where just the mean is unknown, our robustness guarantee is
    optimal up to a factor of (sqrt{2}) and the running time is polynomial in (d)
    and (1/varepsilon). When both the mean and covariance are unknown, the running
    time is polynomial in (d) and quasipolynomial in (1/varepsilon). Moreover all
    of our algorithms require only a polynomial number of samples. Our work shows
    that the same sorts of error guarantees that were established over fifty years
    ago in the one-dimensional setting can also be achieved by efficient algorithms
    in high-dimensional settings.
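
    The flavour of such robust estimators can be conveyed by a simple filtering
    sketch for the unknown-mean case: while the empirical covariance has a
    suspiciously large direction, discard the points that project furthest along
    it. Thresholds and trimming rates below are illustrative, not the paper's
    certified algorithm.

    ```python
    import numpy as np

    def filtered_mean(x, max_eig=1.5, trim=0.05, iters=10):
        x = np.asarray(x, dtype=float)
        for _ in range(iters):
            mu = x.mean(axis=0)
            w, v = np.linalg.eigh(np.cov(x, rowvar=False))
            if w[-1] <= max_eig:          # covariance consistent with N(mu, I): stop
                break
            proj = (x - mu) @ v[:, -1]    # scores along the suspicious direction
            keep = np.abs(proj) <= np.quantile(np.abs(proj), 1 - trim)
            x = x[keep]
        return x.mean(axis=0)

    rng = np.random.default_rng(1)
    clean = rng.normal(0, 1, size=(950, 5))
    outliers = rng.normal(8, 0.1, size=(50, 5))         # adversarial 5% fraction
    print(filtered_mean(np.vstack([clean, outliers])))  # close to the true mean 0
    ```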

    Semantic3D.net: A new Large-scale Point Cloud Classification Benchmark

    Timo Hackel, Nikolay Savinov, Lubor Ladicky, Jan D. Wegner, Konrad Schindler, Marc Pollefeys
    Comments: Accepted to ISPRS Annals. The benchmark website is available at this http URL . The baseline code is available at this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)

    This paper presents a new 3D point cloud classification benchmark data set
    with over four billion manually labelled points, meant as input for data-hungry
    (deep) learning methods. We also discuss first submissions to the benchmark
    that use deep convolutional neural networks (CNNs) as a work horse, which
    already show remarkable performance improvements over state-of-the-art. CNNs
    have become the de-facto standard for many tasks in computer vision and machine
    learning, like semantic segmentation or object detection in images, but have
    not yet led to a true breakthrough for 3D point cloud labelling tasks due to a lack
    of training data. With the massive data set presented in this paper, we aim at
    closing this data gap to help unleash the full potential of deep learning
    methods for 3D labelling tasks. Our semantic3D.net data set consists of dense
    point clouds acquired with static terrestrial laser scanners. It contains 8
    semantic classes and covers a wide range of urban outdoor scenes: churches,
    streets, railroad tracks, squares, villages, soccer fields and castles. We
    describe our labelling interface and show that our data set provides denser
    and more complete point clouds with a much higher overall number of labelled points
    compared to those already available to the research community. We further
    provide baseline method descriptions and comparison between methods submitted
    to our online system. We hope semantic3D.net will pave the way for deep
    learning methods in 3D point cloud labelling to learn richer, more general 3D
    representations, and first submissions after only a few months indicate that
    this might indeed be the case.

    Deep Neural Network Based Precursor microRNA Prediction on Eleven Species

    Jaya Thomas, Lee Sael
    Comments: 6 pages, 2 figures, extended from BigComp2017 short paper
    Subjects: Quantitative Methods (q-bio.QM); Learning (cs.LG)

    MicroRNA (miRNA) are small non-coding RNAs that regulate gene expression at
    the post-transcriptional level. Determining whether a sequence segment is
    miRNA is experimentally challenging, and experimental results are sensitive
    to the experimental environment. These limitations inspire the development of
    computational methods for predicting miRNAs. We propose a deep learning based
    classification model, called DP-miRNA, for predicting precursor miRNA
    sequences that contain the miRNA sequence. DP-miRNA is based on Restricted
    Boltzmann Machines and uses 58 features that are categorized into four
    groups: sequence features, folding measures, stem-loop features, and
    statistical features. We evaluate the performance of DP-miRNA on eleven data
    sets of varying species, including human. The deep neural network based
    classification outperformed support vector machines, neural networks, naive
    Bayes classifiers, k-nearest neighbors, random forests, and a hybrid system
    combining a support vector machine and a genetic algorithm.

    A Neural Parametric Singing Synthesizer

    Merlijn Blaauw, Jordi Bonada
    Comments: Submitted to Interspeech 2017
    Subjects: Sound (cs.SD); Computation and Language (cs.CL); Learning (cs.LG)

    We present a new model for singing synthesis based on a modified version of
    the WaveNet architecture. Instead of modeling raw waveform, we model features
    produced by a parametric vocoder that separates the influence of pitch and
    timbre. This allows conveniently modifying pitch to match any target melody,
    facilitates training on more modest dataset sizes, and significantly reduces
    training and generation times. Our model makes frame-wise predictions using
    mixture density outputs rather than categorical outputs in order to reduce the
    required parameter count. As we found overfitting to be an issue with the
    relatively small datasets used in our experiments, we propose a method to
    regularize the model and make the autoregressive generation process more robust
    to prediction errors. Using a simple multi-stream architecture, harmonic,
    aperiodic and voiced/unvoiced components can all be predicted in a coherent
    manner. We compare our method to existing parametric statistical and
    state-of-the-art concatenative methods using quantitative metrics and a
    listening test. While naive implementations of the autoregressive generation
    algorithm tend to be inefficient, using a smart algorithm we can greatly speed
    up the process and obtain a system that’s competitive in both speed and
    quality.

    A Proof of Orthogonal Double Machine Learning with (Z)-Estimators

    Vasilis Syrgkanis
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Statistics Theory (math.ST)

    We consider two stage estimation with a non-parametric first stage and a
    generalized method of moments second stage, in a simpler setting than
    (Chernozhukov et al. 2016). We give an alternative proof of the theorem given
    in (Chernozhukov et al. 2016) that orthogonal second stage moments, sample
    splitting and (n^{1/4})-consistency of the first stage, imply
    (sqrt{n})-consistency and asymptotic normality of second stage estimates. Our
    proof is for a variant of their estimator, which is based on the empirical
    version of the moment condition (Z-estimator), rather than a minimization of a
    norm of the empirical vector of moments (M-estimator). This note is meant
    primarily for expository purposes, rather than as a new technical contribution.
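
    The two-stage recipe is concrete enough to sketch for a partially linear
    model Y = theta*D + g(X) + noise: fit the nuisances on one half of the
    sample, then solve the orthogonal (residual-on-residual) moment on the other
    half. The simulated data and theta = 2 are made up for illustration.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    n = 2000
    X = rng.normal(size=(n, 5))
    D = np.sin(X[:, 0]) + rng.normal(scale=0.5, size=n)        # confounded treatment
    Y = 2.0 * D + X[:, 0] ** 2 + rng.normal(scale=0.5, size=n)

    half, theta_hats = n // 2, []
    folds = [(slice(0, half), slice(half, n)), (slice(half, n), slice(0, half))]
    for train, test in folds:
        # First stage (on one fold): non-parametric nuisances E[Y|X] and E[D|X].
        y_hat = RandomForestRegressor(n_estimators=100).fit(X[train], Y[train]).predict(X[test])
        d_hat = RandomForestRegressor(n_estimators=100).fit(X[train], D[train]).predict(X[test])
        # Second stage (on the other fold): Z-estimator for the orthogonal moment.
        y_res, d_res = Y[test] - y_hat, D[test] - d_hat
        theta_hats.append((y_res * d_res).sum() / (d_res * d_res).sum())

    print(np.mean(theta_hats))  # should land close to the true theta = 2
    ```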

    Deep-FExt: Deep Feature Extraction for Vessel Segmentation and Centerline Prediction

    Giles Tetteh, Markus Rempfler, Bjoern H. Menze, Claus Zimmer
    Comments: 9 pages
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Feature extraction is a crucial task in image and pixel (voxel)
    classification and regression in biomedical image modeling. In this work we
    present a machine learning based feature extraction scheme based on inception
    models for pixel classification tasks. We extract features under multi-scale
    and multi-layer schemes through convolutional operators. Layers of Fully
    Convolutional Network are later stacked on this feature extraction layers and
    trained end-to-end for the purpose of classification. We test our model on the
    DRIVE and STARE public data sets for the purpose of segmentation and centerline
    detection, and it outperforms most existing hand-crafted or deterministic
    feature schemes found in the literature. We achieve an average maximum Dice
    of 0.85 on the DRIVE data set, which surpasses the scores of the second human
    annotator of this data set. We also achieve an average maximum Dice of 0.85 and
    kappa of 0.84 on the STARE data set. Though these datasets are mainly 2-D we
    also propose ways of extending this feature extraction scheme to handle 3-D
    datasets.

    Learning from Demonstrations for Real World Reinforcement Learning

    Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    Deep reinforcement learning (RL) has achieved several high profile successes
    in difficult control problems. However, these algorithms typically require a
    huge amount of data before they reach reasonable performance. In fact, their
    performance during learning can be extremely poor. This may be acceptable for a
    simulator, but it severely limits the applicability of deep RL to many
    real-world tasks, where the agent must learn in the real environment. In this
    paper we study a setting where the agent may access data from previous control
    of the system. We present an algorithm, Deep Q-learning from Demonstrations
    (DQfD), that leverages this data to massively accelerate the learning process
    even from relatively small amounts of demonstration data. DQfD works by
    combining temporal difference updates with large-margin classification of the
    demonstrator’s actions. We show that DQfD has better initial performance than
    Deep Q-Networks (DQN) on 40 of 42 Atari games and it receives more average
    rewards than DQN on 27 of 42 Atari games. We also demonstrate that DQfD learns
    faster than DQN even when given poor demonstration data.
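
    The combination of losses can be sketched for a single transition: a
    one-step TD error plus the large-margin term that forces the demonstrator's
    action to dominate all others by a margin. The Q-values, margin, and
    discount below are illustrative numbers, not the paper's hyperparameters.

    ```python
    import numpy as np

    def dqfd_losses(q, q_target_next, a_demo, reward, gamma=0.99, margin=0.8):
        td_target = reward + gamma * q_target_next.max()
        td_loss = (td_target - q[a_demo]) ** 2               # 1-step TD error
        margins = np.where(np.arange(len(q)) == a_demo, 0.0, margin)
        supervised = np.max(q + margins) - q[a_demo]         # large-margin term
        return td_loss, supervised

    q = np.array([1.2, 0.9, 1.1])        # current Q(s, .)
    q_next = np.array([1.0, 1.3, 0.7])   # target-network Q(s', .)
    td, sup = dqfd_losses(q, q_next, a_demo=0, reward=0.5)
    print(td, sup)  # supervised term is 0 only if the demo action wins by the margin
    ```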

    Investigation on the use of Hidden-Markov Models in automatic transcription of music

    D. Cazau, G. Nuel
    Comments: arXiv admin note: text overlap with arXiv:1703.09772
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Sound (cs.SD)

    Hidden Markov Models (HMMs) are a ubiquitous tool to model time series data,
    and have been widely used in two main tasks of Automatic Music Transcription
    (AMT): note segmentation, i.e. identifying the played notes after a multi-pitch
    estimation, and sequential post-processing, i.e. correcting note segmentation
    using training data. In this paper, we employ the multi-pitch estimation method
    called Probabilistic Latent Component Analysis (PLCA), and develop AMT systems
    by integrating different HMM-based modules in this framework. For note
    segmentation, we use two different two-state on/off HMMs, including a
    higher-order one for duration modeling. For sequential post-processing, we
    focused on a musicological modeling of polyphonic harmonic transitions, using
    first- and second-order HMMs whose states are defined through candidate note
    mixtures. These different PLCA plus HMM systems have been evaluated
    comparatively on two different instrument repertoires, namely the piano (using
    the MAPS database) and the marovany zither. Our results show that the use of
    HMMs could bring noticeable improvements to transcription results, depending on
    the instrument repertoire.

    Energy Propagation in Deep Convolutional Neural Networks

    Thomas Wiatowski, Philipp Grohs, Helmut Bölcskei
    Comments: IEEE Transactions on Information Theory, Apr. 2017, submitted
    Subjects: Information Theory (cs.IT); Learning (cs.LG); Functional Analysis (math.FA); Machine Learning (stat.ML)

    Many practical machine learning tasks employ very deep convolutional neural
    networks. Such large depths pose formidable computational challenges in
    training and operating the network. It is therefore important to understand how
    many layers are actually needed to have most of the input signal’s features be
    contained in the feature vector generated by the network. This question can be
    formalized by asking how quickly the energy contained in the feature maps
    decays across layers. In addition, it is desirable that none of the input
    signal’s features be “lost” in the feature extraction network or, more
    formally, we want energy conservation in the sense of the energy contained in
    the feature vector being proportional to that of the corresponding input
    signal. This paper establishes conditions for energy conservation for a wide
    class of deep convolutional neural networks and characterizes corresponding
    feature map energy decay rates. Specifically, we consider general scattering
    networks, and find that under mild analyticity and high-pass conditions on the
    filters (which encompass, inter alia, various constructions of Weyl-Heisenberg
    filters, wavelets, ridgelets, ((alpha))-curvelets, and shearlets) the feature
    map energy decays at least polynomially fast. For broad families of wavelets
    and Weyl-Heisenberg filters, the guaranteed decay rate is shown to be
    exponential. Our results yield handy estimates of the number of layers needed
    to have at least (((1-varepsilon)cdot 100)\%) of the input signal energy be
    contained in the feature vector.

    Representation Stability as a Regularizer for Improved Text Analytics Transfer Learning

    Matthew Riemer, Elham Khabiri, Richard Goodwin
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Although neural networks are well suited for sequential transfer learning
    tasks, the catastrophic forgetting problem hinders proper integration of prior
    knowledge. In this work, we propose a solution to this problem by using a
    multi-task objective based on the idea of distillation and a mechanism that
    directly penalizes forgetting at the shared representation layer during the
    knowledge integration phase of training. We demonstrate our approach on a
    Twitter domain sentiment analysis task with sequential knowledge transfer from
    four related tasks. We show that our technique outperforms networks fine-tuned
    to the target task. Additionally, we show both through empirical evidence and
    examples that it does not forget useful knowledge from the source task that is
    forgotten during standard fine-tuning. Surprisingly, we find that first
    distilling a human-made rule-based sentiment engine into a recurrent neural
    network and then integrating the knowledge with the target task data leads to a
    substantial gain in generalization performance. Our experiments demonstrate the
    power of multi-source transfer techniques in practical text analytics problems
    when paired with distillation. In particular, for the SemEval 2016 Task 4
    Subtask A (Nakov et al., 2016) dataset we surpass the state of the art
    established during the competition with a comparatively simple model
    architecture that is not even competitive when trained on only the labeled task
    specific data.

    Leveraging Term Banks for Answering Complex Questions: A Case for Sparse Vectors

    Peter D. Turney
    Comments: Related datasets can be found at this http URL
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Learning (cs.LG)

    While open-domain question answering (QA) systems have proven effective for
    answering simple questions, they struggle with more complex questions. Our goal
    is to answer more complex questions reliably, without incurring a significant
    cost in knowledge resource construction to support the QA. One readily
    available knowledge resource is a term bank, enumerating the key concepts in a
    domain. We have developed an unsupervised learning approach that leverages a
    term bank to guide a QA system, by representing the terminological knowledge
    with thousands of specialized vector spaces. In experiments with complex
    science questions, we show that this approach significantly outperforms several
    state-of-the-art QA systems, demonstrating that significant leverage can be
    gained from continuous vector representations of domain terminology.

    A Neural Representation of Sketch Drawings

    David Ha, Douglas Eck
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)

    We present sketch-rnn, a recurrent neural network (RNN) able to construct
    stroke-based drawings of common objects. The model is trained on thousands of
    crude human-drawn images representing hundreds of classes. We outline a
    framework for conditional and unconditional sketch generation, and describe new
    robust training methods for generating coherent sketch drawings in a vector
    format.

    Personalized Survival Predictions for Cardiac Transplantation via Trees of Predictors

    J. Yoon, W. R. Zame, A. Banerjee, M. Cadeiras, A. M. Alaa, M. van der Schaar
    Comments: Main manuscript: 20 pages, Supplementary materials: 13 pages, 5 figures, 3 tables. Submitted to Science Translational Medicine
    Subjects: Applications (stat.AP); Learning (cs.LG)

    Given the limited pool of donor organs, accurate predictions of survival on
    the wait list and post transplantation are crucial for cardiac transplantation
    decisions and policy. However, current clinical risk scores do not yield
    accurate predictions. We develop a new methodology (ToPs, Trees of Predictors)
    built on the principle that specific predictors should be used for specific
    clusters within the target population. ToPs discovers these specific clusters
    of patients and the specific predictor that perform best for each cluster. In
    comparison with current clinical risk scoring systems, our method provides
    significant improvements in the prediction of survival time on the wait list
    and post transplantation. For example, in terms of 3-month survival for
    patients who were on the US patient wait list in the period 1985 to 2015, our
    method achieves an AUC of 0.847, while the best commonly used clinical risk
    score (MAGGIC) achieves 0.630. In terms of 3-month survival/mortality
    predictions (in comparison to MAGGIC), holding specificity at 80.0 percent,
    our algorithm correctly predicts survival for 1,228 (26.0 percent) more
    patients out of the 4,723 who actually survived; holding sensitivity at 80.0
    percent, our algorithm correctly predicts mortality for 839 (33.0 percent)
    more patients out of the 2,542 who did not survive. Our method achieves
    similar improvements for other time
    horizons and for predictions post transplantation. Therefore, we offer a more
    accurate, personalized approach to survival analysis that can benefit patients,
    clinicians and policymakers in making clinical decisions and setting clinical
    policy. Because risk prediction is widely used in diagnostic and prognostic
    clinical decision making across diseases and clinical specialties, the
    implications of our methods are far reaching.


    Information Theory

    From ds-bounds for cyclic codes to true distance for abelian codes

    J. J. Bernal, M. Guerreiro, J. J. Simón
    Comments: arXiv admin note: text overlap with arXiv:1604.02949
    Subjects: Information Theory (cs.IT)

    In this paper we develop a technique to extend any bound for the minimum
    distance of cyclic codes constructed from their defining sets (ds-bounds) to
    abelian (or multivariate) codes, through the notion of (mathbb{B})-apparent
    distance. We use this technique to improve the search for new bounds on the
    minimum distance of abelian codes. We also study conditions for an abelian
    code to verify that its (mathbb{B})-apparent distance reaches its (true)
    minimum distance. We then construct some tables of such codes as an
    application.

    Energy Propagation in Deep Convolutional Neural Networks

    Thomas Wiatowski, Philipp Grohs, Helmut Bölcskei
    Comments: IEEE Transactions on Information Theory, Apr. 2017, submitted
    Subjects: Information Theory (cs.IT); Learning (cs.LG); Functional Analysis (math.FA); Machine Learning (stat.ML)

    Many practical machine learning tasks employ very deep convolutional neural
    networks. Such large depths pose formidable computational challenges in
    training and operating the network. It is therefore important to understand how
    many layers are actually needed to have most of the input signal’s features be
    contained in the feature vector generated by the network. This question can be
    formalized by asking how quickly the energy contained in the feature maps
    decays across layers. In addition, it is desirable that none of the input
    signal’s features be “lost” in the feature extraction network or, more
    formally, we want energy conservation in the sense of the energy contained in
    the feature vector being proportional to that of the corresponding input
    signal. This paper establishes conditions for energy conservation for a wide
    class of deep convolutional neural networks and characterizes corresponding
    feature map energy decay rates. Specifically, we consider general scattering
    networks, and find that under mild analyticity and high-pass conditions on the
    filters (which encompass, inter alia, various constructions of Weyl-Heisenberg
    filters, wavelets, ridgelets, ((alpha))-curvelets, and shearlets) the feature
    map energy decays at least polynomially fast. For broad families of wavelets
    and Weyl-Heisenberg filters, the guaranteed decay rate is shown to be
    exponential. Our results yield handy estimates of the number of layers needed
    to have at least (((1-varepsilon)cdot 100)\%) of the input signal energy be
    contained in the feature vector.

    Hybrid Beamforming via the Kronecker Decomposition for the Millimeter-Wave Massive MIMO Systems

    Guangxu Zhu, Kaibin Huang, Vincent K. N. Lau, Bin Xia, Xiaofan Li, Sha Zhang
    Comments: Submitted to IEEE JSAC Special Issue on Millimeter Wave Communications for Future Mobile Networks, minor revision
    Subjects: Information Theory (cs.IT)

    Despite its promising performance gain, the realization of mmWave massive
    MIMO still faces several practical challenges. In particular, implementing
    massive MIMO in the digital domain requires hundreds of RF chains matching the
    number of antennas. Furthermore, designing these components to operate at the
    mmWave frequencies is challenging and costly. This motivated the recent
    development of hybrid beamforming, where MIMO processing is divided for
    separate implementation in the analog and digital domains, called analog and
    digital beamforming, respectively. Analog beamforming using a phased array
    introduces uni-modulus constraints on the beamforming coefficients, rendering
    conventional MIMO techniques unsuitable and calling for new designs. In this
    paper, we present a systematic design framework for hybrid beamforming for
    multi-cell multiuser massive MIMO systems over mmWave channels characterized by
    sparse propagation paths. The framework relies on the decomposition of analog
    beamforming vectors and path observation vectors into Kronecker products of
    factors being uni-modulus vectors. Exploiting properties of Kronecker mixed
    products, different factors of the analog beamformer are designed for either
    nulling interference paths or coherently combining data paths. Furthermore, a
    channel estimation scheme is designed for enabling the proposed hybrid
    beamforming. The scheme estimates the AoA of data and interference paths by
    analog beam scanning and data-path gains by analog beam steering. The
    performance of the channel estimation scheme is analyzed. In particular, the
    AoA spectrum resulting from beam scanning, which displays the magnitude
    distribution of paths over the AoA range, is derived in closed-form. It is
    shown that the inter-cell interference level diminishes inversely with the
    array size, the square root of pilot sequence length and the spatial separation
    between paths.

    Privacy-Aware Guessing Efficiency

    Shahab Asoodeh, Mario Diaz, Fady Alajaji, Tamás Linder
    Comments: ISIT 2017
    Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Statistics Theory (math.ST)

    We investigate the problem of guessing a discrete random variable (Y) under a
    privacy constraint dictated by another correlated discrete random variable (X),
    where both guessing efficiency and privacy are assessed in terms of the
    probability of correct guessing. We define (h(P_{XY}, epsilon)) as the maximum
    probability of correctly guessing (Y) given an auxiliary random variable (Z),
    where the maximization is taken over all (P_{Z|Y}) ensuring that the
    probability of correctly guessing (X) given (Z) does not exceed (epsilon). We
    show that the map (epsilon mapsto h(P_{XY}, epsilon)) is strictly increasing,
    concave, and piecewise linear, which allows us to derive a closed form
    expression for (h(P_{XY}, epsilon)) when (X) and (Y) are connected via a
    binary-input binary-output channel. For ((X^n, Y^n)) being pairs of independent
    and identically distributed binary random vectors, we similarly define
    (underline{h}_n(P_{X^nY^n}, epsilon)) under the assumption that (Z^n) is also
    a binary vector. Then we obtain a closed form expression for
    (underline{h}_n(P_{X^nY^n}, epsilon)) for sufficiently large, but nontrivial
    values of (epsilon).

    NOMA based Calibration for Large-Scale Spaceborne Antenna Arrays

    Yujie Lin, Shuai Wang, Xiangyuan Bu, Chengwen Xing, Jianping An
    Comments: 30 Pages, 8 Figures
    Subjects: Information Theory (cs.IT)

    In the parallel calibration for transmitting phased arrays, the calibration
    receiver must separate the signals belonging to different antenna elements to
    avoid mutual interference. Existing algorithms encode different antenna
    elements’ radiation with orthogonal signature codes, but these algorithms are
    far from ideal for large-scale spaceborne antenna arrays. Considering the
    strictly limited resources on satellites, to improve hardware efficiency of
    large-scale spaceborne antenna arrays, in this paper based on non-orthogonal
    multiple access (NOMA) we design a series of non-orthogonal signature codes for
    different antenna elements by Cyclically Shifting an m-Sequence (CSmS) with
    different offsets named as CSmS-NOMA signaling. This scheme can give an elegant
    balance between performance and complexity and is very suitable for large-scale
    spaceborne antenna arrays. It is shown that no matter how many antenna elements
    are to be calibrated simultaneously, CSmS-NOMA signaling needs only one
    calibrating waveform generator and one matched filter. Hence it is much more
    efficient than the existing fully orthogonal schemes. To evaluate the
    achievable calibration accuracy, a unified theoretical framework is developed
    based on which the relationship between calibration accuracy and signal to
    noise ratio (SNR) has been clearly revealed. Furthermore, a hardware experiment
    platform is also built to assess the theoretical work. For all the considered
    scenarios, it can be concluded that the theoretical, simulated and experimental
    results coincide with each other perfectly.
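
    The cyclic-shift construction is simple to demonstrate: every antenna
    element radiates the same m-sequence at a distinct circular offset, and a
    single matched filter (a circular correlation against the base sequence)
    separates the contributions. The length-7 LFSR and the two gains below are
    toy assumptions for illustration.

    ```python
    import numpy as np

    def m_sequence(length=7):
        # Length-7 m-sequence from a 3-stage LFSR (x^3 + x + 1), mapped to +/-1.
        state, seq = [1, 0, 0], []
        for _ in range(length):
            seq.append(state[-1])
            state = [state[2] ^ state[0]] + state[:-1]
        return np.array([1 - 2 * b for b in seq])

    code = m_sequence()
    # Two elements transmit the same code at offsets 2 and 5 with gains 0.9 and 0.4:
    rx = 0.9 * np.roll(code, 2) + 0.4 * np.roll(code, 5)

    # One matched filter: circular correlation of the received sum with the base code.
    corr = np.array([rx @ np.roll(code, s) for s in range(len(code))]) / len(code)
    print(np.round(corr, 2))  # distinct peaks at shifts 2 and 5
    ```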

    Bayesian Optimal Data Detector for mmWave OFDM System with Low-Resolution ADC

    Hanqing Wang, Chao-Kai Wen, Shi Jin
    Comments: 32 pages, 12 figures; accepted by IEEE JSAC special issue on millimeter wave communications for future mobile networks
    Subjects: Information Theory (cs.IT)

    Orthogonal frequency division multiplexing (OFDM) has been widely used in
    communication systems operating in the millimeter wave (mmWave) band, such as
    IEEE 802.15.3c and IEEE 802.11ad, to combat frequency-selective fading and
    achieve multi-Gbps transmission. For mmWave systems with ultra-high sampling
    rate requirements, the use of low-resolution analog-to-digital converters
    (ADCs) (i.e., 1-3 bits) keeps power consumption and system cost at an
    acceptable level. However, orthogonality among sub-channels in the OFDM
    system cannot be maintained because of the severe non-linearity caused by
    low-resolution ADCs, which makes the design of the data detector challenging.
    In this study, we develop an efficient algorithm for optimal data detection
    in the mmWave OFDM system with low-resolution ADCs. The analytical
    performance of the proposed detector is derived and verified to achieve the
    fundamental limit of the Bayesian optimal design. On the basis of the derived
    analytical expression, we further propose a power allocation (PA) scheme that
    seeks to minimize the average symbol error rate. In addition to the optimal
    data detector, we also develop a feasible channel estimation method that
    provides high-quality channel state information without significant pilot
    overhead. Simulation results confirm the accuracy of our analysis and show
    that the performance of the proposed detector, in conjunction with the
    proposed PA scheme, is close to the optimal performance of the OFDM system
    with infinite-resolution ADCs.
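
    The loss of sub-channel orthogonality is easy to reproduce numerically. The
    following numpy sketch (a toy illustration, not the paper's detector) passes
    one QPSK-modulated OFDM symbol through an ideal 1-bit I/Q ADC and measures
    the resulting in-band distortion; the subcarrier count is arbitrary.

        import numpy as np

        rng = np.random.default_rng(0)
        N = 64                                           # number of subcarriers (hypothetical)
        bits = rng.integers(0, 2, (2, N))
        X = ((2 * bits[0] - 1) + 1j * (2 * bits[1] - 1)) / np.sqrt(2)  # QPSK symbols

        x = np.fft.ifft(X) * np.sqrt(N)                  # time-domain OFDM signal
        x_q = (np.sign(x.real) + 1j * np.sign(x.imag)) / np.sqrt(2)    # 1-bit I/Q ADC

        Y = np.fft.fft(x_q) / np.sqrt(N)                 # back to the frequency domain
        evm = np.linalg.norm(Y - X) / np.linalg.norm(X)
        print(f"relative distortion after 1-bit quantization: {evm:.2f}")

    The nonzero residual spreads across all subcarriers, which is precisely why
    per-subcarrier detection is no longer optimal and a joint Bayesian detector
    is needed.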

    On Codes over (mathbb{F}_{q}+vmathbb{F}_{q}+v^{2}mathbb{F}_{q})

    A. Melakhessou, K. Guenda, T. A. Gulliver, M. Shi, P. Solé
    Comments: 19 pages
    Subjects: Information Theory (cs.IT); Rings and Algebras (math.RA)

    In this paper we investigate linear codes with complementary duals (LCD
    codes) and formally self-dual codes over the ring
    (R=mathbb{F}_{q}+vmathbb{F}_{q}+v^{2}mathbb{F}_{q}), where (v^{3}=v), for (q)
    odd. We give conditions for the existence of LCD codes and present
    constructions of formally self-dual codes over (R). Further, we give bounds
    on the minimum distance of LCD codes over (mathbb{F}_q) and extend these to
    codes over (R).
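
    For background, recall Massey's criterion in the classical setting: a linear
    code over (mathbb{F}_q) with generator matrix (G) is LCD if and only if
    (G G^T) is nonsingular. A small numerical check, using a toy generator
    matrix that is not from the paper:

        import numpy as np

        q = 3
        G = np.array([[1, 0, 1, 0],
                      [0, 1, 0, 1]])   # toy generator matrix over F_3

        gram = (G @ G.T) % q
        # For a matrix this small, rounding the floating-point determinant is safe.
        det = int(round(np.linalg.det(gram))) % q
        print("G G^T mod q =", gram.tolist(), "| det mod q =", det, "| LCD:", det != 0)

    Here (G G^T equiv diag(2, 2) pmod{3}) has determinant 1 mod 3, so the toy
    code is LCD.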

    Robustly Learning a Gaussian: Getting Optimal Error, Efficiently

    Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, Alistair Stewart
    Subjects: Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)

    We study the fundamental problem of learning the parameters of a
    high-dimensional Gaussian in the presence of noise, where an
    (varepsilon)-fraction of our samples were chosen by an adversary. We give
    robust estimators that achieve estimation error (O(varepsilon)) in total
    variation distance, which is optimal up to a universal constant that is
    independent of the dimension.

    In the case where only the mean is unknown, our robustness guarantee is
    optimal up to a factor of (sqrt{2}) and the running time is polynomial in (d)
    and (1/varepsilon). When both the mean and covariance are unknown, the
    running time is polynomial in (d) and quasipolynomial in (1/varepsilon).
    Moreover, all of our algorithms require only a polynomial number of samples.
    Our work shows that the same sorts of error guarantees that were established
    over fifty years ago in the one-dimensional setting can also be achieved by
    efficient algorithms in high-dimensional settings.
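
    For intuition, the sketch below implements a simple spectral-filtering
    heuristic for robust mean estimation: repeatedly remove the sample that
    projects farthest along the top principal direction of the centered data.
    It only conveys the flavor of filtering-based robust estimators; it is not
    the paper's algorithm and carries none of its guarantees.

        import numpy as np

        def filtered_mean(samples, eps):
            # Delete int(eps * n) points, each time removing the point with the
            # largest projection along the top eigenvector of the empirical
            # (unnormalized) covariance, then average what remains.
            pts = samples.copy()
            for _ in range(int(eps * len(samples))):
                centered = pts - pts.mean(axis=0)
                _, vecs = np.linalg.eigh(centered.T @ centered)
                v = vecs[:, -1]                      # top principal direction
                pts = np.delete(pts, np.argmax(np.abs(centered @ v)), axis=0)
            return pts.mean(axis=0)

        rng = np.random.default_rng(1)
        d, n, eps = 20, 2000, 0.05
        clean = rng.normal(0.0, 1.0, (n - int(eps * n), d))   # true mean is 0
        outliers = rng.normal(5.0, 0.1, (int(eps * n), d))    # adversary-like cluster
        data = np.vstack([clean, outliers])
        print(np.linalg.norm(data.mean(axis=0)))         # naive mean is pulled away
        print(np.linalg.norm(filtered_mean(data, eps)))  # filtered mean stays near 0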

    Dynamic Quadratic Cheap Talk and Signaling Games

    Serkan Sarıtaş, Serdar Yüksel, Sinan Gezici
    Comments: 30 pages
    Subjects: Optimization and Control (math.OC); Information Theory (cs.IT)

    Simultaneous (Nash) and sequential (Stackelberg) equilibria of two-player
    dynamic quadratic cheap talk and signaling game problems are investigated under
    a perfect Bayesian formulation. For the dynamic scalar and multi-dimensional
    cheap talk, it is shown that the Nash equilibrium cannot be fully revealing
    whereas the Stackelberg equilibrium is always fully revealing. In addition,
    the final-stage Nash equilibria have to be essentially quantized when the
    source is scalar, and non-revealing in the multi-dimensional case.
    signaling game where the transmission of a Gauss-Markov source over a
    memoryless Gaussian channel is considered, affine policies constitute an
    invariant subspace under best response maps for both scalar and
    multi-dimensional sources under Nash equilibria; however, the Stackelberg
    equilibrium policies are always linear for scalar sources but may be non-linear
    for multi-dimensional sources. Under the Stackelberg setup, the conditions
    under which the equilibrium is non-informative are derived for scalar sources,
    and a dynamic programming solution is presented when the encoders are
    restricted to be linear for multi-dimensional sources.
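
    For context on what "essentially quantized" means, recall the classical
    one-shot quadratic cheap talk benchmark of Crawford and Sobel: with a source
    uniform on ([0,1]) and encoder bias (b > 0), every Nash equilibrium is
    partition-based, with cell boundaries (0 = a_0 < a_1 < dots < a_N = 1)
    satisfying (a_{k+1} - a_k = a_k - a_{k-1} + 4b), so the encoder in effect
    acts as a quantizer. The dynamic, multi-stage setting studied here
    generalizes that picture.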

    Inter-Operator Resource Management for Millimeter Wave, Multi-Hop Backhaul Networks

    Omid Semiari, Walid Saad, Mehdi Bennis, Zaher Dawy
    Subjects: Computer Science and Game Theory (cs.GT); Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    In this paper, a novel framework is proposed for optimizing the operation and
    performance of a large-scale, multi-hop millimeter wave (mmW) backhaul within a
    wireless small cell network (SCN) that encompasses multiple mobile network
    operators (MNOs). The proposed framework enables the small base stations (SBSs)
    to jointly decide on forming the multi-hop, mmW links over backhaul
    infrastructure that belongs to multiple, independent MNOs, while properly
    allocating resources across those links. In this regard, the problem is
    addressed using a novel framework based on matching theory that is composed
    of two highly inter-related stages: a multi-hop network formation stage and a
    resource management stage. One unique feature of this framework is that it
    jointly accounts for both wireless channel characteristics and economic
    factors during network formation and resource management. The multi-hop
    network formation stage is formulated as a one-to-many matching game and
    solved using a novel algorithm that builds on the so-called deferred
    acceptance algorithm and is shown to yield a stable and Pareto optimal
    multi-hop mmW backhaul network. Then, a one-to-many matching game is
    formulated to enable proper resource allocation across the formed multi-hop
    network. This game is shown to exhibit peer effects and, as such, a novel
    algorithm is developed to find a stable and optimal resource management
    solution that can properly cope with these peer effects. Simulation results
    show that the proposed framework yields substantial gains in terms of the
    average sum rate, reaching up to 27% and 54% compared to, respectively, a
    non-cooperative scheme in which inter-operator sharing is not allowed and a
    random allocation approach. The results also show that our framework provides
    insights into how to manage pricing and the cost of the cooperative mmW
    backhaul network for the MNOs.
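
    Since the formation stage builds on deferred acceptance, a minimal Python
    sketch of the classic one-to-many deferred acceptance procedure with quotas
    may be helpful. The preference lists and quotas below are toy data, and the
    paper's actual algorithm extends this baseline to handle channel and
    economic factors.

        def deferred_acceptance(proposer_prefs, acceptor_prefs, quotas):
            # proposer_prefs: proposer -> ranked list of acceptors
            # acceptor_prefs: acceptor -> ranked list of proposers
            # quotas: acceptor -> capacity
            rank = {a: {p: i for i, p in enumerate(prefs)}
                    for a, prefs in acceptor_prefs.items()}
            matched = {a: [] for a in acceptor_prefs}
            free = list(proposer_prefs)
            next_choice = {p: 0 for p in proposer_prefs}
            while free:
                p = free.pop()
                if next_choice[p] >= len(proposer_prefs[p]):
                    continue                        # p has exhausted its list
                a = proposer_prefs[p][next_choice[p]]
                next_choice[p] += 1
                matched[a].append(p)
                if len(matched[a]) > quotas[a]:
                    matched[a].sort(key=lambda x: rank[a][x])
                    free.append(matched[a].pop())   # reject least-preferred proposer
            return matched

        sbs_prefs = {"sbs1": ["mnoA", "mnoB"], "sbs2": ["mnoA"],
                     "sbs3": ["mnoA", "mnoB"]}
        mno_prefs = {"mnoA": ["sbs2", "sbs1", "sbs3"], "mnoB": ["sbs3", "sbs1"]}
        print(deferred_acceptance(sbs_prefs, mno_prefs,
                                  quotas={"mnoA": 1, "mnoB": 2}))

    The output matching is stable: no SBS-MNO pair would jointly prefer to
    deviate, which is the property the paper's network-formation algorithm
    preserves in the mmW backhaul setting.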



