
    arXiv Paper Daily: Tue, 18 Oct 2016

    Published by 我爱机器学习 (52ml.net) on 2016-10-18 00:00:00

    Neural and Evolutionary Computing

    Evolving the Structure of Evolution Strategies

    Sander van Rijn, Hao Wang, Matthijs van Leeuwen, Thomas Bäck
    Comments: 10 pages. Extended (pre-print) version of SSCI 2016 submission
    Subjects: Neural and Evolutionary Computing (cs.NE)

    Various variants of the well-known Covariance Matrix Adaptation Evolution
    Strategy (CMA-ES) have been proposed recently, which improve the empirical
    performance of the original algorithm by structural modifications. However, in
    practice it is often unclear which variation is best suited to the specific
    optimization problem at hand. As one approach to tackle this issue, algorithmic
    mechanisms attached to CMA-ES variants are considered and extracted as
    functional "modules", allowing for combinations of them. This leads to a
    configuration space over ES structures, which enables the exploration of
    algorithm structures and paves the way toward novel algorithm generation.
    Specifically, eleven modules are incorporated in this framework with two or
    three alternative configurations for each module, resulting in $4,608$
    algorithms. A self-adaptive Genetic Algorithm (GA) is used to efficiently
    evolve effective ES-structures for given classes of optimization problems,
    outperforming any classical CMA-ES variant from the literature. The proposed
    approach is evaluated on the noiseless functions of the BBOB suite.
    Furthermore, this observation is confirmed across different function groups
    and dimensionalities, indicating the feasibility of ES configuration on
    real-world problem classes.
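
    As a quick illustration of the size of this configuration space, here is a
    minimal sketch with plausible module names (not necessarily the paper's
    exact list): nine binary modules and two ternary modules give
    2^9 * 3^2 = 4,608 candidate structures, matching the figure quoted above.

        from itertools import product

        # Hypothetical module names; the counts (9 binary + 2 ternary)
        # reproduce the 4,608 figure from the abstract.
        modules = {
            "active_update": [False, True],
            "elitism": [False, True],
            "mirrored_sampling": [False, True],
            "orthogonal_sampling": [False, True],
            "sequential_selection": [False, True],
            "threshold_convergence": [False, True],
            "two_point_step_size": [False, True],
            "pairwise_selection": [False, True],
            "weighted_recombination": [False, True],
            "quasi_random_sampling": ["none", "sobol", "halton"],
            "restart_strategy": ["none", "IPOP", "BIPOP"],
        }

        configurations = list(product(*modules.values()))
        print(len(configurations))  # 4608 candidate ES structures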

    Weekly maintenance scheduling using exact and genetic methods

    Andrew W. Palmer, Robin Vujanic, Andrew J. Hill, Steven J. Scheding
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

    The weekly maintenance schedule specifies when maintenance activities should
    be performed on the equipment, taking into account the availability of workers
    and maintenance bays, and other operational constraints. The current approach
    to generating this schedule is labour intensive and requires coordination
    between the maintenance schedulers and operations staff to minimise its impact
    on the operation of the mine. This paper presents methods for automatically
    generating this schedule from the list of maintenance tasks to be performed,
    the availability roster of the maintenance staff, and time windows in which
    each piece of equipment is available for maintenance. Both Mixed-Integer Linear
    Programming (MILP) and genetic algorithms are evaluated, with the genetic
    algorithm shown to significantly outperform the MILP. Two fitness functions for
    the genetic algorithm are also examined, with a linear fitness function
    outperforming an inverse fitness function by up to 5% for the same calculation
    time. The genetic algorithm approach is computationally fast, allowing the
    schedule to be rapidly recalculated in response to unexpected delays and
    breakdowns.
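
    To make the fitness comparison concrete, here is a minimal sketch of the two
    fitness shapes being compared; the schedule-cost objective and scaling are
    assumptions, not the paper's exact encoding.

        def linear_fitness(cost, worst_cost):
            # Fitness grows linearly as the schedule cost drops below the
            # worst cost seen in the population.
            return worst_cost - cost

        def inverse_fitness(cost, eps=1e-9):
            # Inverse fitness also rewards lower cost, but compresses the
            # differences between good schedules, which is consistent with
            # the weaker results reported for it above.
            return 1.0 / (cost + eps)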

    Multiple Instance Fuzzy Inference Neural Networks

    Amine Ben Khalifa, Hichem Frigui
    Comments: Submitted to IEEE Transactions On Cybernetics for review
    Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Systems and Control (cs.SY)

    Fuzzy logic is a powerful tool to model knowledge uncertainty, measurement
    imprecision, and vagueness. However, there is another type of vagueness that
    arises when data have multiple forms of expression that fuzzy logic does not
    address quite well. This is the case for multiple instance learning problems
    (MIL). In MIL, an object is represented by a collection of instances, called a
    bag. A bag is labeled negative if all of its instances are negative, and
    positive if at least one of its instances is positive. Positive bags encode
    ambiguity since the instances themselves are not labeled. In this paper, we
    introduce fuzzy inference systems and neural networks designed to handle bags
    of instances as input and capable of learning from ambiguously labeled data.
    First, we introduce the Multiple Instance Sugeno style fuzzy inference
    (MI-Sugeno) that extends the standard Sugeno style inference to handle
    reasoning with multiple instances. Second, we use MI-Sugeno to define and
    develop Multiple Instance Adaptive Neuro Fuzzy Inference System (MI-ANFIS). We
    expand the architecture of the standard ANFIS to allow reasoning with bags and
    derive a learning algorithm using backpropagation to identify the premise and
    consequent parameters of the network. The proposed inference system is tested
    and validated using synthetic and benchmark datasets suitable for MIL problems.
    We also apply the proposed MI-ANFIS to fuse the output of multiple
    discrimination algorithms for the purpose of landmine detection using Ground
    Penetrating Radar.
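
    The bag-labeling convention that creates this ambiguity is simple to state;
    a minimal sketch, with instance labels assumed binary:

        def bag_label(instance_labels):
            # A bag is positive if at least one instance is positive,
            # and negative only if every instance is negative.
            return 1 if any(l == 1 for l in instance_labels) else 0

        assert bag_label([0, 0, 1]) == 1  # positive, but we don't know which instance
        assert bag_label([0, 0, 0]) == 0  # unambiguously negative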

    Cached Long Short-Term Memory Neural Networks for Document-Level Sentiment Classification

    Jiacheng Xu, Danlu Chen, Xipeng Qiu, Xuanjing Huang
    Comments: Published as long paper of EMNLP2016
    Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    Recently, neural networks have achieved great success on sentiment
    classification due to their ability to alleviate feature engineering. However,
    one of the remaining challenges is to model long texts in document-level
    sentiment classification under a recurrent architecture because of the
    deficiency of the memory unit. To address this problem, we present Cached
    Long Short-Term Memory neural networks (CLSTM) to capture the overall semantic
    information in long texts. CLSTM introduces a cache mechanism, which divides
    memory into several groups with different forgetting rates and thus enables the
    network to keep sentiment information better within a recurrent unit. The
    proposed CLSTM outperforms the state-of-the-art models on three publicly
    available document-level sentiment analysis datasets.
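
    A heavily simplified sketch of the cache idea follows; the group count,
    rates, and update form are illustrative assumptions, not the paper's CLSTM
    equations.

        import numpy as np

        def cached_memory_update(c_prev, c_candidate, rates):
            # Split the memory into groups, each with its own forgetting rate;
            # low-rate groups turn over quickly, high-rate groups retain
            # long-range (e.g. sentiment) information.
            k = len(rates)
            groups = np.split(c_prev, k)
            cands = np.split(c_candidate, k)
            return np.concatenate(
                [r * g + (1.0 - r) * c for r, g, c in zip(rates, groups, cands)])

        c = cached_memory_update(np.zeros(6), np.ones(6), rates=[0.1, 0.5, 0.9])
        # the first group keeps only 10% of its old state; the last keeps 90%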


    Computer Vision and Pattern Recognition

    Rule Extraction Algorithm for Deep Neural Networks: A Review

    Tameru Hailesilassie
    Comments: 6 pages, 2 figures, IEEE publication format. Keywords: artificial
    neural network; deep neural network; rule extraction; decompositional;
    pedagogical; eclectic
    Journal-ref: (IJCSIS) International Journal of Computer Science and Information
    Security, Vol. 14, No. 7, July 2016, pp. 371-381
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Despite achieving the highest classification accuracy in a wide variety of
    application areas, artificial neural networks have one disadvantage: the way
    such a network reaches a decision is not easily comprehensible. This lack of
    explanation ability reduces the acceptability of neural networks in data
    mining and decision systems. This drawback is why researchers have proposed
    many rule extraction algorithms to solve the problem. Recently, Deep Neural
    Networks (DNNs) have been achieving profound results over standard neural
    networks for classification and recognition problems, a hot machine learning
    area proven both useful and innovative. This paper thoroughly reviews various
    rule extraction algorithms under the classification scheme of
    decompositional, pedagogical, and eclectic approaches. It also evaluates
    these algorithms based on the neural network structure with which each
    algorithm is intended to work. The main contribution of this review is to
    show that the study of rule extraction algorithms for DNNs remains limited.

    Structured Sparse Subspace Clustering: A Joint Affinity Learning and Subspace Clustering Framework

    Chun-Guang Li, Chong You, René Vidal
    Comments: 13 pages, 6 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Subspace clustering refers to the problem of segmenting data drawn from a
    union of subspaces. State-of-the-art approaches for solving this problem
    follow a two-stage process. In the first step, an affinity matrix is learned
    from the
    data using sparse or low-rank minimization techniques. In the second step, the
    segmentation is found by applying spectral clustering to this affinity. While
    this approach has led to state-of-the-art results in many applications, it is
    sub-optimal because it does not exploit the fact that the affinity and the
    segmentation depend on each other. In this paper, we propose a joint
    optimization framework — Structured Sparse Subspace Clustering (S$^3$C) —
    for learning both the affinity and the segmentation. The proposed S$^3$C
    framework is based on expressing each data point as a structured sparse linear
    combination of all other data points, where the structure is induced by a norm
    that depends on the unknown segmentation. Moreover, we extend the proposed
    S$^3$C framework into Constrained Structured Sparse Subspace Clustering
    (CS$^3$C) in which available partial side-information is incorporated into the
    stage of learning the affinity. We show that both the structured sparse
    representation and the segmentation can be found via a combination of an
    alternating direction method of multipliers with spectral clustering.
    Experiments on a synthetic data set, the Extended Yale B data set, the Hopkins
    155 motion segmentation database, and three cancer data sets demonstrate the
    effectiveness of our approach.
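
    A heavily simplified sketch of the alternation underlying S$^3$C follows;
    the structured norm that couples the two steps is omitted, so this is closer
    to an iterated two-stage pipeline than to the paper's joint optimization.

        import numpy as np
        from sklearn.linear_model import Lasso
        from sklearn.cluster import SpectralClustering

        def s3c_like(X, n_clusters, n_iters=3, alpha=0.01):
            # X: one data point per row. Alternate between re-estimating a
            # sparse self-expressive affinity and re-segmenting it.
            n = X.shape[0]
            labels = np.zeros(n, dtype=int)
            for _ in range(n_iters):
                C = np.zeros((n, n))
                for i in range(n):
                    others = np.delete(np.arange(n), i)
                    # express point i as a sparse combination of all other points
                    coef = Lasso(alpha=alpha, max_iter=2000).fit(X[others].T, X[i]).coef_
                    C[i, others] = coef
                W = np.abs(C) + np.abs(C).T  # symmetric affinity matrix
                labels = SpectralClustering(n_clusters=n_clusters,
                                            affinity="precomputed").fit_predict(W)
            return labels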

    Spatio-temporal Co-Occurrence Characterizations for Human Action Classification

    Aznul Qalid Md Sabri, Jacques Boonaert, Erma Rahayu Mohd Faizal Abdullah, Ali Mohammed Mansoor
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The human action classification task is a widely researched topic and is
    still an open problem. Many state-of-the-art approaches involve the use of
    bag-of-video-words with spatio-temporal local features to construct
    characterizations for human actions. In order to improve beyond this standard
    approach, we investigate the usage of co-occurrences between local features. We
    propose the usage of co-occurrences information to characterize human actions.
    A trade-off factor is used to define an optimal trade-off between vocabulary
    size and classification rate. Next, a spatio-temporal co-occurrence technique
    is applied to extract co-occurrence information between labeled local features.
    Novel characterizations for human actions are then constructed. These include a
    vector quantized correlogram-elements vector, a highly discriminative PCA
    (Principal Components Analysis) co-occurrence vector and a Haralick texture
    vector. Multi-channel kernel SVM (support vector machine) is utilized for
    classification. For evaluation, the well known KTH as well as the challenging
    UCF-Sports action datasets are used. We obtained state-of-the-art
    classification performance. We also demonstrated that we are able to fully
    utilize co-occurrence information and improve on the standard
    bag-of-video-words approach.

    Deep Learning Prototype Domains for Person Re-Identification

    Arne Schumann, Shaogang Gong, Tobias Schuchert
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Person re-identification (re-id) is the task of matching multiple occurrences
    of the same person from different cameras, poses, lighting conditions, and a
    multitude of other factors which alter the visual appearance. Typically, this
    is achieved by learning either optimal features or matching metrics which are
    adapted to specific pairs of camera views dictated by the pairwise labelled
    training datasets. In this work, we formulate a novel deep learning based
    approach to automatic prototype-domain discovery for domain-perceptive
    (adaptive) person re-id (rather than camera-pair-specific learning),
    applicable to any camera view and scalable to new unseen scenes without
    training data. We learn a
    separate re-id model for each of the discovered prototype-domains and during
    model deployment, use the person probe image to select automatically the model
    of the closest prototype domain. Our approach requires neither supervised nor
    unsupervised domain adaptation learning, i.e., no data from the target
    domains is needed. We extensively evaluate our model under realistic re-id
    conditions
    using automatically detected bounding boxes with low-resolution and partial
    occlusion. We show that our approach outperforms most of the state-of-the-art
    supervised and unsupervised methods on the latest CUHK-SYSU and PRW benchmarks.
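
    Deployment-time model selection then reduces to a nearest-prototype lookup;
    a minimal sketch, where the embedding function and prototype centroids are
    placeholders for whatever feature space the domains were discovered in:

        import numpy as np

        def select_reid_model(probe_image, prototypes, models, embed):
            # Route the probe to the re-id model of the closest prototype domain.
            z = embed(probe_image)
            idx = int(np.argmin([np.linalg.norm(z - p) for p in prototypes]))
            return models[idx]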

    Encoding the Local Connectivity Patterns of fMRI for Cognitive State Classification

    Itir Onal Ertugrul, Mete Ozay, Fatos T. Yarman Vural
    Comments: 8 pages, 5 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    In this work, we propose a novel framework to encode the local connectivity
    patterns of the brain, using Fisher Vectors (FV), Vector of Locally Aggregated
    Descriptors (VLAD) and Bag-of-Words (BoW) methods. We first obtain local
    descriptors, called Mesh Arc Descriptors (MADs) from fMRI data, by forming
    local meshes around anatomical regions, and estimating their relationship
    within a neighborhood. Then, we extract a dictionary of relationships, called
    a "brain connectivity dictionary", by fitting a generative Gaussian mixture
    model (GMM) to a set of MADs, and selecting the codewords at the mean of each
    component of the mixture. Codewords represent the connectivity patterns among
    anatomical regions. We also encode MADs by VLAD and BoW methods using the
    k-Means clustering.

    We classify the cognitive states of Human Connectome Project (HCP) task fMRI
    dataset, where we train support vector machines (SVM) by the encoded MADs.
    Results demonstrate that FV encoding of MADs can be successfully employed for
    classification of cognitive tasks, and outperforms the VLAD and BoW
    representations. Moreover, we identify the significant Gaussians in mixture
    models by computing energy of their corresponding FV parts, and analyze their
    effect on classification accuracy. Finally, we suggest a new method to
    visualize the codewords of brain connectivity dictionary.
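
    As a point of reference for the encodings compared above, here is a minimal
    BoW sketch over local descriptors (the MADs stand behind the descriptor
    matrices here; FV and VLAD differ in aggregating residuals rather than
    counts):

        import numpy as np
        from sklearn.cluster import KMeans

        def bow_encode(train_descriptors, sample_descriptors, k=64):
            # Learn a k-word dictionary from training descriptors, then
            # represent a sample as a normalized codeword histogram.
            km = KMeans(n_clusters=k, n_init=10).fit(train_descriptors)
            words = km.predict(sample_descriptors)
            hist = np.bincount(words, minlength=k).astype(float)
            return hist / max(hist.sum(), 1.0)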

    Spatio-Temporal Attention Models for Grounded Video Captioning

    Mihai Zanfir, Elisabeta Marinoiu, Cristian Sminchisescu
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Automatic video captioning is challenging due to the complex interactions in
    dynamic real scenes. A comprehensive system would ultimately localize and track
    the objects, actions and interactions present in a video and generate a
    description that relies on temporal localization in order to ground the visual
    concepts. However, most existing automatic video captioning systems map from
    raw video data to high level textual description, bypassing localization and
    recognition, thus discarding potentially valuable information for content
    localization and generalization. In this work we present an automatic video
    captioning model that combines spatio-temporal attention and image
    classification by means of deep neural network structures based on long
    short-term memory. The resulting system is demonstrated to produce
    state-of-the-art results in the standard YouTube captioning benchmark while
    also offering the advantage of localizing the visual concepts (subjects, verbs,
    objects), with no grounding supervision, over space and time.

    What is the Best Way for Extracting Meaningful Attributes from Pictures?

    Liangchen Liu, Arnold Wiliem, Shaokang Chen, Brian C. Lovell
    Comments: Submission to Pattern Recognition
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Automatic attribute discovery methods have gained in popularity to extract
    sets of visual attributes from images or videos for various tasks. Despite
    their good performance in some classification tasks, it is difficult to
    evaluate whether the attributes discovered by these methods are meaningful and
    which methods are the most appropriate to discover attributes for visual
    descriptions. In its simplest form, such an evaluation can be performed by
    manually verifying whether there is any consistent identifiable visual concept
    distinguishing between positive and negative exemplars labelled by an
    attribute. This manual checking is tedious, expensive and labour intensive. In
    addition, comparisons between different methods could also be problematic as it
    is not clear how one could quantitatively decide which attribute is more
    meaningful than the others. In this paper, we propose a novel attribute
    meaningfulness metric to address this challenging problem. With this metric,
    automatic quantitative evaluation can be performed on the attribute sets; thus,
    reducing the enormous effort to perform manual evaluation. The proposed metric
    is applied to some recent automatic attribute discovery and hashing methods on
    four attribute-labelled datasets. To further validate the efficacy of the
    proposed method, we conducted a user study. In addition, we also compared our
    metric with a semi-supervised attribute discovery method using the mixture of
    probabilistic PCA. In our evaluation, we gleaned several insights that could be
    beneficial in developing new automatic attribute discovery methods.

    Real-time Joint Tracking of a Hand Manipulating an Object from RGB-D Input

    Srinath Sridhar, Franziska Mueller, Michael Zollhöfer, Dan Casas, Antti Oulasvirta, Christian Theobalt
    Comments: Proceedings of ECCV 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Real-time simultaneous tracking of hands manipulating and interacting with
    external objects has many potential applications in augmented reality, tangible
    computing, and wearable computing. However, due to difficult occlusions, fast
    motions, and uniform hand appearance, jointly tracking hand and object pose is
    more challenging than tracking either of the two separately. Many previous
    approaches resort to complex multi-camera setups to remedy the occlusion
    problem and often employ expensive segmentation and optimization steps, which
    make real-time tracking impossible. In this paper, we propose a real-time
    solution that uses a single commodity RGB-D camera. The core of our approach is
    a 3D articulated Gaussian mixture alignment strategy tailored to hand-object
    tracking that allows fast pose optimization. The alignment energy uses novel
    regularizers to address occlusions and hand-object contacts. For added
    robustness, we guide the optimization with discriminative part classification
    of the hand and segmentation of the object. We conducted extensive experiments
    on several existing datasets and introduce a new annotated hand-object dataset.
    Quantitative and qualitative results show the key advantages of our method:
    speed, accuracy, and robustness.

    Digital Makeup from Internet Images

    Asad Khan, Yudong Guo, Ligang Liu
    Comments: 10 pages, 11 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

    We present a novel approach to color transfer between images by exploring
    their high-level semantic information. First, we set up a database consisting
    of images downloaded from the internet, which are segmented automatically
    using matting techniques. We then extract image foregrounds from both the
    source and multiple target images. Using image matting algorithms, the system
    extracts semantic information such as faces, lips, teeth, eyes, and eyebrows
    from the extracted foreground of the source image, and the color is then
    transferred between corresponding parts with the same semantic information.
    Next, we obtain the color-transferred result by seamlessly compositing the
    different parts together using alpha blending. In the final step, we present
    an efficient method of color consistency to optimize the colors of a
    collection of images showing a common scene. The main advantage of our method
    over existing techniques is that it does not need face matching, as it can
    use more than one target image. It is not restricted to head-shot images, as
    we can also change the color style in the wild. Moreover, our algorithm does
    not require the source and target images to share the same color style, pose,
    or image size. It is not restricted to one-to-one image color transfer and
    can make use of multiple target images to transfer color to different parts
    of the source image. Compared with other approaches, our algorithm blends
    colors in the input data much better.
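
    The final compositing step is standard alpha blending; a minimal sketch,
    with mattes assumed to be per-part H x W arrays in [0, 1]:

        import numpy as np

        def composite(source, transferred_parts, mattes):
            # Blend each color-transferred part back over the source image
            # using its matte.
            out = source.astype(float)
            for part, alpha in zip(transferred_parts, mattes):
                a = alpha[..., None]  # H x W x 1
                out = a * part.astype(float) + (1.0 - a) * out
            return out.clip(0, 255).astype(np.uint8)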

    Location Sensitive Deep Convolutional Neural Networks for Segmentation of White Matter Hyperintensities

    Mohsen Ghafoorian, Nico Karssemeijer, Tom Heskes, Inge van Uden, Clara Sanchez, Geert Litjens, Frank-Erik. de Leeuw, Bram van Ginneken, Elena Marchiori, Bram Platel
    Comments: 13 pages, 8 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The anatomical location of imaging features is of crucial importance for
    accurate diagnosis in many medical tasks. Convolutional neural networks (CNN)
    have had huge successes in computer vision, but they lack the natural ability
    to incorporate the anatomical location in their decision making process,
    hindering success in some medical image analysis tasks.

    In this paper, to integrate the anatomical location information into the
    network, we propose several deep CNN architectures that consider multi-scale
    patches or take explicit location features while training. We apply and compare
    the proposed architectures for segmentation of white matter hyperintensities in
    brain MR images on a large dataset. As a result, we observe that the CNNs that
    incorporate location information substantially outperform a conventional
    segmentation method with hand-crafted features as well as CNNs that do not
    integrate location information. On a test set of 46 scans, the best
    configuration of our networks obtained a Dice score of 0.791, compared to 0.797
    for an independent human observer. Performance levels of the machine and the
    independent human observer were not statistically significantly different
    (p-value=0.17).
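
    For reference, the Dice score quoted above is computed from the overlap of
    binary segmentation masks:

        import numpy as np

        def dice(pred, truth, eps=1e-8):
            # Dice = 2|A n B| / (|A| + |B|) for binary masks.
            pred, truth = pred.astype(bool), truth.astype(bool)
            return 2.0 * np.logical_and(pred, truth).sum() / (
                pred.sum() + truth.sum() + eps)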

    To Frontalize or Not To Frontalize: Do We Really Need Elaborate Pre-Processing to Improve Face Recognition Performance?

    Sandipan Banerjee, Joel Brogan, Janez Krizaj, Aparna Bharati, Brandon RichardWebster, Vitomir Struc, Patrick Flynn, Walter Scheirer
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Automatic face recognition performance has improved remarkably in the last
    decade. Much of this success can be attributed to the development of deep
    learning techniques like convolutional neural networks (CNNs). But the training
    process for CNNs requires a large amount of clean and well-labelled training
    data. If a CNN is intended to work with non-frontal face images, should this
    training data be diverse in terms of facial poses, or should face images be
    frontalized as a pre-processing step? We address this question in this paper.
    We evaluate a set of popular facial landmarking and pose frontalization
    algorithms to understand their effect on facial recognition performance. We
    also introduce a new automatic frontalization scheme that operates over a
    single image without the need for a subject-specific 3D model, and perform a
    comparative analysis between the new scheme and other methods in the
    literature. A CNN trained on face images frontalized using different
    pre-processing methods is used to extract features from the Point and Shoot
    Challenge (PaSC) video dataset. The verification and identification performance
    of the CNN serves to quantify the effectiveness of each landmarking and
    frontalization scheme. We find that frontalization, although an intuitive
    pre-processing strategy, does not significantly improve face recognition
    performance when compared with a simple 2D face alignment.

    Beyond Spatial Auto-Regressive Models: Predicting Housing Prices with Satellite Imagery

    Archith J. Bency, Swati Rallapalli, Raghu K. Ganti, Mudhakar Srivatsa, B. S. Manjunath
    Comments: 10 pages, 5 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    When modeling geo-spatial data, it is critical to capture spatial
    correlations for achieving high accuracy. Spatial Auto-Regression (SAR) is a
    common tool used to model such data, where the spatial contiguity matrix (W)
    encodes the spatial correlations. However, the efficacy of SAR is limited by
    two factors. First, it depends on the choice of contiguity matrix, which is
    typically not learnt from data but is instead assumed to be known a priori.
    Second, it assumes that the observations can be explained by linear models. In
    this paper, we propose a Convolutional Neural Network (CNN) framework to model
    geo-spatial data (specifically housing prices), to learn the spatial
    correlations automatically. We show that neighborhood information embedded in
    satellite imagery can be leveraged to achieve the desired spatial smoothing. An
    additional upside of our framework is the relaxation of linear assumption on
    the data. Specific challenges we tackle while implementing our framework
    include: (i) how much of the neighborhood is relevant while estimating housing
    prices? (ii) what is the right approach to capture multiple resolutions of
    satellite imagery? and (iii) what other data sources can help improve the
    estimation of spatial correlations? We demonstrate a marked improvement of 57%
    on top of the SAR baseline through the use of features from deep neural
    networks for the cities of London, Birmingham and Liverpool.

    Recovering the Missing Link: Predicting Class-Attribute Associations for Unsupervised Zero-Shot Learning

    Ziad Al-Halah, Makarand Tapaswi, Rainer Stiefelhagen
    Comments: Published as a conference paper at CVPR 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Collecting training images for all visual categories is not only expensive
    but also impractical. Zero-shot learning (ZSL), especially using attributes,
    offers a pragmatic solution to this problem. However, at test time most
    attribute-based methods require a full description of attribute associations
    for each unseen class. Providing these associations is time consuming and often
    requires domain specific knowledge. In this work, we aim to carry out
    attribute-based zero-shot classification in an unsupervised manner. We propose
    an approach to learn relations that couples class embeddings with their
    corresponding attributes. Given only the name of an unseen class, the learned
    relationship model is used to automatically predict the class-attribute
    associations. Furthermore, our model facilitates transferring attributes across
    data sets without additional effort. Integrating knowledge from multiple
    sources results in a significant additional improvement in performance. We
    evaluate on two public data sets: Animals with Attributes and aPascal/aYahoo.
    Our approach outperforms state-of-the-art methods in both predicting
    class-attribute associations and unsupervised ZSL by a large margin.

    Incremental One-Class Models for Data Classification

    Takoua Kefi, Riadh Ksantini, M.Becha Kaaniche, Adel Bouhoula
    Comments: 4 pages, accepted in PhD Forum Session of the ECML-PKDD 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper we outline a PhD research plan. This research contributes to
    the field of one-class incremental learning and classification in case of
    non-stationary environments. The goal of this PhD is to define a new
    classification framework able to deal with a very small learning dataset at
    the beginning of the process and able to adjust itself to the variability of
    the incoming data, which accumulates into large-scale datasets. As
    preliminary work, an incremental Covariance-guided One-Class Support Vector
    Machine is proposed to deal with sequentially obtained data. It is inspired
    by COSVM, which puts more emphasis on the low-variance directions while
    keeping the basic formulation of the incremental One-Class Support Vector
    Machine
    untouched. The incremental procedure is introduced by controlling the possible
    changes of support vectors after the addition of new data points, thanks to the
    Karush-Kuhn-Tucker conditions, that have to be maintained on all previously
    acquired data. Comparative experimental results with contemporary incremental
    and non-incremental one-class classifiers on numerous artificial and real data
    sets show that our method results in significantly better classification
    performance.

    Road Curb Extraction from Mobile LiDAR Point Clouds

    Sheng Xu, Ruisheng Wang, Han Zheng
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Automatic extraction of road curbs from uneven, unorganized, noisy and
    massive 3D point clouds is a challenging task. Existing methods often project
    3D point clouds onto 2D planes to extract curbs. However, the projection causes
    loss of 3D information which degrades the performance of the detection. This
    paper presents a robust, accurate and efficient method to extract road curbs
    from 3D mobile LiDAR point clouds. Our method consists of two steps: 1)
    extracting the candidate points of curbs based on the proposed novel energy
    function and 2) refining the candidate points using the proposed least cost
    path model. We evaluated our method on mobile LiDAR point clouds of a
    large-scale residential area (16.7 GB, 300 million points) and an urban area
    (1.07 GB, 20 million points). Results indicate that the proposed method is
    superior to the state-of-the-art methods in terms of robustness, accuracy and
    efficiency. The proposed curb extraction method achieved a completeness of
    78.62% and a correctness of 83.29%. These experiments demonstrate that the
    proposed method is a promising solution to extract road curbs from mobile LiDAR
    point clouds.

    Deep Learning Ensembles for Melanoma Recognition in Dermoscopy Images

    Noel Codella, Quoc-Bao Nguyen, Sharath Pankanti, David Gutman, Brian Helba, Allan Halpern, John R. Smith
    Comments: Permission from journal obtained for preprint on arXiv. IBM Journal is an IEEE Journal (this http URL), which approves arXiv as a third party server (this http URL). Document is a PDF from Word, which is required format for IBM Journal
    Journal-ref: IBM Journal of Research and Development, vol. 61, no. 4/5, 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Melanoma is the deadliest form of skin cancer. While curable with early
    detection, only highly trained specialists are capable of accurately
    recognizing the disease. As expertise is in limited supply, automated systems
    capable of identifying disease could save lives, reduce unnecessary biopsies,
    and reduce costs. Toward this goal, we propose a system that combines recent
    developments in deep learning with established machine learning approaches,
    creating ensembles of methods that are capable of segmenting skin lesions, as
    well as analyzing the detected area and surrounding tissue for melanoma
    detection. The system is evaluated using the largest publicly available
    benchmark dataset of dermoscopic images, containing 900 training and 379
    testing images. New state-of-the-art performance levels are demonstrated: an
    improvement of 7.5% in the area under the receiver operating characteristic
    curve (0.843 vs. 0.783), of 4% in average precision (0.649 vs. 0.624), and a
    2.9-fold increase in specificity measured at the clinically relevant 95%
    sensitivity operating point (36.8% specificity compared to 12.5% for the
    previous state-of-the-art). Compared to the average of 8 expert
    dermatologists on a subset of 100 test images, the proposed system produces a
    higher accuracy (76% vs. 70.5%), and specificity (62% vs. 59%) evaluated at an
    equivalent sensitivity (82%).
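
    The operating-point metric used above can be computed with a simple
    threshold sweep; a sketch, assuming the 95% sensitivity target is reachable
    on the given scores:

        import numpy as np

        def specificity_at_sensitivity(scores, labels, target_sens=0.95):
            # Sweep thresholds from high to low scores and report specificity
            # at the first point where sensitivity reaches the target.
            order = np.argsort(-np.asarray(scores))
            labels = np.asarray(labels)[order]
            tp = np.cumsum(labels == 1)
            fp = np.cumsum(labels == 0)
            sens = tp / max((labels == 1).sum(), 1)
            k = int(np.searchsorted(sens, target_sens))
            return 1.0 - fp[k] / max((labels == 0).sum(), 1)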

    A Harmonic Mean Linear Discriminant Analysis for Robust Image Classification

    Shuai Zheng, Feiping Nie, Chris Ding, Heng Huang
    Comments: IEEE 28th International Conference on Tools with Artificial Intelligence, ICTAI 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Linear Discriminant Analysis (LDA) is a widely-used supervised dimensionality
    reduction method in computer vision and pattern recognition. In null space
    based LDA (NLDA), a well-known LDA extension, between-class distance is
    maximized in the null space of the within-class scatter matrix. However, there
    are some limitations in NLDA. Firstly, for many data sets the null space of
    the within-class scatter matrix does not exist, so NLDA is not applicable to
    those datasets. Secondly, NLDA uses the arithmetic mean of between-class
    distances and gives equal consideration to all of them, which lets larger
    between-class distances dominate the result and thus limits the performance
    of NLDA. In this paper, we propose a harmonic mean based Linear Discriminant
    Analysis, Multi-Class Discriminant Analysis (MCDA), for image classification,
    which minimizes the reciprocal of the weighted harmonic mean of pairwise
    between-class distances. More importantly, MCDA gives higher priority to
    maximizing small between-class distances. MCDA can be extended to multi-label
    dimension reduction. Results on 7 single-label data sets and 4 multi-label data
    sets show that MCDA has consistently better performance than 10 other
    single-label approaches and 4 other multi-label approaches in terms of
    classification accuracy, macro and micro average F1 score.
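
    A plausible form of the MCDA criterion described above, written out (the
    class weights $p_i$ and the projected distance $d_{ij}$ are assumptions
    consistent with the abstract, not the paper's exact notation): minimizing
    the reciprocal of a weighted harmonic mean of pairwise between-class
    distances amounts to

        \min_{W} \; \sum_{i<j} \frac{p_i \, p_j}{d_{ij}(W)},
        \qquad d_{ij}(W) = \lVert W^{\top}(\mu_i - \mu_j) \rVert_2^2,

    so classes whose projected means $\mu_i$, $\mu_j$ remain close contribute
    the largest terms, which is why small between-class distances receive the
    highest priority.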

    Multiple Instance Fuzzy Inference Neural Networks

    Amine Ben Khalifa, Hichem Frigui
    Comments: Submitted to IEEE Transactions On Cybernetics for review
    Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Systems and Control (cs.SY)

    (Cross-listed; abstract appears above under Neural and Evolutionary Computing.)

    Partial Procedural Geometric Model Fitting for Point Clouds

    Zongliang Zhang, Jonathan Li, Yulan Guo, Yangbin Lin, Ming Cheng, Cheng Wang
    Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)

    Geometric model fitting is a fundamental task in computer graphics and
    computer vision. However, most geometric model fitting methods are unable to
    fit an arbitrary geometric model (e.g. a surface with holes) to incomplete
    data, because the similarity metrics used in these methods cannot measure the
    rigid partial similarity between arbitrary models. This paper therefore
    proposes a novel rigid geometric similarity metric, which is able to measure
    both the full similarity and the partial similarity between arbitrary geometric
    models. The proposed metric enables us to perform partial procedural geometric
    model fitting (PPGMF). The task of PPGMF is to search a procedural geometric
    model space for the model rigidly similar to a query of non-complete point set.
    Models in the procedural model space are generated according to a set of
    parametric modeling rules. A typical query is a point cloud. PPGMF is very
    useful as it can be used to fit arbitrary geometric models to non-complete
    (incomplete, over-complete or hybrid-complete) point cloud data. For example,
    most laser scanning data is non-complete due to occlusion. Our PPGMF method
    uses Markov chain Monte Carlo technique to optimize the proposed similarity
    metric over the model space. To accelerate the optimization process, the method
    also employs a novel coarse-to-fine model dividing strategy to reject
    dissimilar models in advance. Our method has been demonstrated on a variety of
    geometric models and non-complete data. Experimental results show that the
    PPGMF method based on the proposed metric is able to fit non-complete data,
    while methods based on other metrics are not. It is also shown that our
    method can be accelerated several-fold via early rejection.
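
    A schematic of the MCMC search over the procedural model space, using
    Metropolis-style acceptance; `propose` and `similarity` stand in for the
    paper's modeling rules and rigid partial-similarity metric, and the
    coarse-to-fine early rejection is omitted:

        import math
        import random

        def mcmc_fit(init_params, propose, similarity, n_steps=1000, temp=0.1):
            cur, cur_score = init_params, similarity(init_params)
            best, best_score = cur, cur_score
            for _ in range(n_steps):
                cand = propose(cur)
                score = similarity(cand)
                # always accept improvements; accept worse models with
                # Boltzmann probability to escape local optima
                if score >= cur_score or random.random() < math.exp(
                        (score - cur_score) / temp):
                    cur, cur_score = cand, score
                    if score > best_score:
                        best, best_score = cand, score
            return best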


    Artificial Intelligence

    Internet of Things Applications: Animal Monitoring with Unmanned Aerial Vehicle

    Jun Xu, Gurkan Solmaz, Rouhollah Rahmatizadeh, Damla Turgut, Ladislau Boloni
    Comments: 12 pages
    Subjects: Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)

    In animal monitoring applications, both animal detection and their movement
    prediction are major tasks. While a variety of animal monitoring strategies
    exist, most of them rely on mounted devices. However, in the real world it is
    difficult to find these animals and install such devices. In this paper, we
    propose an animal monitoring application by utilizing wireless sensor networks
    (WSNs) and unmanned aerial vehicle (UAV). The objective of the application is
    to detect locations of endangered species in large-scale wildlife areas and
    monitor movement of animals without any attached devices. In this application,
    sensors deployed throughout the observation area are responsible for gathering
    animal information. The UAV flies above the observation area and collects the
    information from sensors. To collect the information efficiently, we propose a
    path planning approach for the UAV based on a Markov decision process (MDP)
    model. The UAV receives a certain amount of reward from an area if some animals
    are detected at that location. We solve the MDP using Q-learning such that the
    UAV prefers going to areas where animals have been detected before. Meanwhile,
    the UAV explores other areas as well to cover the entire network and detects
    changes in the animal positions. We first define the mathematical model
    underlying the animal monitoring problem in terms of the value of information
    (VoI) and rewards. We propose a network model including clusters of sensor
    nodes and a single UAV that acts as a mobile sink and visits the clusters.
    Then, one MDP-based path planning approach is designed to maximize the VoI
    while reducing message delays. The effectiveness of the proposed approach is
    evaluated using two real-world movement datasets of zebras and leopards.
    Simulation results show that our approach outperforms greedy and random
    heuristics as well as path planning based on the traveling salesman problem.
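
    The core of the MDP solution is a standard Q-learning update; a minimal
    sketch, where states are sensor clusters, actions are UAV moves, and the
    reward stands in for the value of information from detections:

        def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
            # Q: dict mapping state -> dict of action -> value.
            best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
            Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])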

    Improvements in Sub-optimal Solving of the $(N^2-1)$-Puzzle via Joint Relocation of Pebbles and its Applications to Rule-based Cooperative Path-Finding

    Pavel Surynek, Petr Michalík
    Subjects: Artificial Intelligence (cs.AI)

    The problem of solving the $(n^2-1)$-puzzle and cooperative path-finding
    (CPF) sub-optimally by rule-based algorithms is addressed in this manuscript.
    The task in the puzzle is to rearrange $n^2-1$ pebbles on a square grid of
    size $n \times n$, using one vacant position, into a desired goal
    configuration. An improvement to the existing polynomial-time algorithm is
    proposed and experimentally analyzed. The improved algorithm tries to move
    pebbles more efficiently than the original algorithm by grouping them into
    so-called snakes and moving them jointly within the snake. An experimental
    evaluation
    showed that the algorithm using snakes produces solutions that are 8% to 9%
    shorter than solutions generated by the original algorithm.

    The snake-based relocation has also been integrated into rule-based
    algorithms for solving the CPF problem sub-optimally, which is a closely
    related task. The task in CPF is to relocate a group of abstract robots that
    move over an undirected graph to given goal vertices. Robots can move to
    unoccupied neighboring vertices and at most one robot can be placed in each
    vertex. The $(n^2-1)$-puzzle is a special case of CPF where the underlying
    graph is represented by a 4-connected grid and there is only one vacant vertex.
    Two major rule-based algorithms for CPF were included in our study – BIBOX and
    PUSH-and-SWAP (PUSH-and-ROTATE). Improvements gained by using snakes in the
    BIBOX algorithm were stable around 30% in $(n^2-1)$-puzzle solving and up to
    50% in CPFs over bi-connected graphs with various ear decompositions and
    multiple vacant vertices. In the case of the PUSH-and-SWAP algorithm the
    improvement achieved by snakes was around 5% to 8%. However, the improvement
    was unstable and hardly predictable in the case of PUSH-and-SWAP.

    Fault Detection Engine in Intelligent Predictive Analytics Platform for DCIM

    Bodhisattwa Prasad Majumder, Ayan Sengupta, Sajal Jain, Parikshit Bhaduri
    Comments: Accepted in 4th International Conference on Business Analytics and Intelligence (ICBAI 2016)
    Subjects: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

    With the advancement of huge data generation and data handling capability,
    machine learning and probabilistic modelling open an immense opportunity to
    deploy predictive analytics platforms in security-critical industries, namely
    data centers, electricity grids, utilities, airports, etc., where downtime
    minimization is one of the primary objectives. This paper proposes a novel,
    complete architecture of an intelligent predictive analytics platform, Fault
    Engine, for huge device networks connected by electrical/information flow.
    The three modules proposed here seamlessly integrate with the available
    technology stack of data handling and connect with middleware to produce
    online intelligent predictions in critical failure scenarios. The Markov
    Failure module predicts the severity of a failure along with the survival
    probability of a device at any given instance. The Root Cause Analysis module
    indicates probable devices as potential root causes, employing Bayesian
    probability assignment and topological sort. Finally, a community detection
    algorithm produces clusters of devices correlated in terms of failure
    probability, which further narrows down the search space for finding the
    root cause. The whole Engine has been tested on networks of different sizes
    with simulated failure environments and shows its potential to scale in a
    real-time implementation.

    Learning and Transfer of Modulated Locomotor Controllers

    Nicolas Heess, Greg Wayne, Yuval Tassa, Timothy Lillicrap, Martin Riedmiller, David Silver
    Comments: Supplemental video available at this https URL
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

    We study a novel architecture and training procedure for locomotion tasks. A
    high-frequency, low-level “spinal” network with access to proprioceptive
    sensors learns sensorimotor primitives by training on simple tasks. This
    pre-trained module is fixed and connected to a low-frequency, high-level
    “cortical” network, with access to all sensors, which drives behavior by
    modulating the inputs to the spinal network. Where a monolithic end-to-end
    architecture fails completely, learning with a pre-trained spinal module
    succeeds at multiple high-level tasks, and enables the effective exploration
    required to learn from sparse rewards. We test our proposed architecture on
    three simulated bodies: a 16-dimensional swimming snake, a 20-dimensional
    quadruped, and a 54-dimensional humanoid. Our results are illustrated in the
    accompanying video at this https URL

    Weekly maintenance scheduling using exact and genetic methods

    Andrew W. Palmer, Robin Vujanic, Andrew J. Hill, Steven J. Scheding
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

    (Cross-listed; abstract appears above under Neural and Evolutionary Computing.)

    Wind ramp event prediction with parallelized Gradient Boosted Regression Trees

    Saurav Gupta, Nitin Anand Shrivastava, Abbas Khosravi, Bijaya Ketan Panigrahi
    Comments: IJCNN 2016
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Accurate prediction of wind ramp events is critical for ensuring the
    reliability and stability of the power systems with high penetration of wind
    energy. This paper proposes a classification based approach for estimating the
    future class of wind ramp event based on certain thresholds. A parallelized
    gradient boosted regression tree based technique has been proposed to
    accurately classify the normal as well as rare extreme wind power ramp events.
    The model has been validated using wind power data obtained from the National
    Renewable Energy Laboratory database. Performance comparison with several
    benchmark techniques indicates the superiority of the proposed approach in
    terms of classification accuracy.
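
    A hedged baseline in the same spirit (scikit-learn's serial gradient
    boosting rather than the paper's parallelized implementation; the features
    and ramp-class thresholds are placeholders):

        from sklearn.ensemble import GradientBoostingClassifier

        # X_train: lagged wind-power features; y_train: ramp classes derived
        # from the chosen thresholds (both placeholders here).
        clf = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05,
                                         max_depth=3)
        # clf.fit(X_train, y_train)
        # y_pred = clf.predict(X_test)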

    Efficient Rectangular Maximal-Volume Algorithm for Rating Elicitation in Collaborative Filtering

    Alexander Fonarev, Alexander Mikhalev, Pavel Serdyukov, Gleb Gusev, Ivan Oseledets
    Comments: IEEE International Conference on Data Mining (ICDM) 2016
    Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

    The cold start problem in Collaborative Filtering can be solved by asking new
    users to rate a small seed set of representative items or by asking
    representative users to rate a new item. The question is how to build a seed
    set that can give enough preference information for making good
    recommendations. One of the most successful approaches, called Representative
    Based Matrix Factorization, is based on the Maxvol algorithm. Unfortunately, this
    approach has one important limitation — a seed set of a particular size
    requires a rating matrix factorization of fixed rank that should coincide with
    that size. This is not necessarily optimal in the general case. In the current
    paper, we introduce a fast algorithm for an analytical generalization of this
    approach that we call Rectangular Maxvol. It allows the rank of factorization
    to be lower than the required size of the seed set. Moreover, the paper
    includes the theoretical analysis of the method’s error, the complexity
    analysis of the existing methods and the comparison to the state-of-the-art
    approaches.

    A Harmonic Mean Linear Discriminant Analysis for Robust Image Classification

    Shuai Zheng, Feiping Nie, Chris Ding, Heng Huang
    Comments: IEEE 28th International Conference on Tools with Artificial Intelligence, ICTAI 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    (Cross-listed; abstract appears above under Computer Vision and Pattern Recognition.)


    Information Retrieval

    Efficient Rectangular Maximal-Volume Algorithm for Rating Elicitation in Collaborative Filtering

    Alexander Fonarev, Alexander Mikhalev, Pavel Serdyukov, Gleb Gusev, Ivan Oseledets
    Comments: IEEE International Conference on Data Mining (ICDM) 2016
    Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

    (Cross-listed; abstract appears above under Artificial Intelligence.)

    Term-Class-Max-Support (TCMS): A Simple Text Document Categorization Approach Using Term-Class Relevance Measure

    D S Guru, Mahamad Suhil
    Comments: 4 Pages, 4 Figures; 2016 Intl. Conference on Advances in Computing, Communications and Informatics (ICACCI), Sept. 21-24, 2016, Jaipur, India
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    In this paper, a simple text categorization method using term-class relevance
    measures is proposed. Initially, text documents are processed to extract
    significant terms present in them. For every term extracted from a document, we
    compute its importance in preserving the content of a class through a novel
    term-weighting scheme known as Term_Class Relevance (TCR) measure proposed by
    Guru and Suhil (2015) [1]. In this way, for every term, its relevance for all
    the classes present in the corpus is computed and stored in the knowledgebase.
    During testing, the terms present in the test document are extracted and the
    term-class relevance of each term is obtained from the stored knowledgebase. To
    allow quick lookup of term weights, a B-tree indexing data structure has been
    adopted. Finally, the class that receives the maximum support in terms of
    term-class relevance is taken as the class of the given test document. The
    proposed method has logarithmic testing-time complexity and is simple to
    implement compared to other text categorization techniques available in the
    literature. Experiments conducted on various benchmark datasets have shown
    that the performance of the proposed method is satisfactory and encouraging.
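
    The max-support decision rule itself is a few lines; a minimal sketch, where
    `tcr` is the stored term-to-class weight table, whose computation follows
    Guru and Suhil (2015) and is not reproduced here:

        def classify(document_terms, tcr, classes):
            # Sum each class's term-class relevance over the document's terms
            # and return the class with maximum support.
            support = {c: 0.0 for c in classes}
            for term in document_terms:
                for c, w in tcr.get(term, {}).items():
                    support[c] += w
            return max(support, key=support.get)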


    Computation and Language

    Achieving Human Parity in Conversational Speech Recognition

    W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu, G. Zweig
    Subjects: Computation and Language (cs.CL)

    Conversational speech recognition has served as a flagship speech recognition
    task since the release of the DARPA Switchboard corpus in the 1990s. In this
    paper, we measure the human error rate on the widely used NIST 2000 test set,
    and find that our latest automated system has reached human parity. The error
    rate of professional transcriptionists is 5.9% for the Switchboard portion of
    the data, in which newly acquainted pairs of people discuss an assigned topic,
    and 11.3% for the CallHome portion where friends and family members have
    open-ended conversations. In both cases, our automated system establishes a new
    state-of-the-art, and edges past the human benchmark. This marks the first time
    that human parity has been reported for conversational speech. The key to our
    system’s performance is the systematic use of convolutional and LSTM neural
    networks, combined with a novel spatial smoothing method and lattice-free MMI
    acoustic training.

    Pre-Translation for Neural Machine Translation

    Jan Niehues, Eunah Cho, Thanh-Le Ha, Alex Waibel
    Comments: 9 pages. To appear in COLING 2016
    Subjects: Computation and Language (cs.CL)

    Recently, the development of neural machine translation (NMT) has
    significantly improved the translation quality of automatic machine
    translation. While most sentences are more accurate and fluent than
    translations by statistical machine translation (SMT)-based systems, in some
    cases, the NMT system produces translations that have a completely different
    meaning. This is especially the case when rare words occur.

    When using statistical machine translation, it has already been shown that
    significant gains can be achieved by simplifying the input in a preprocessing
    step. A commonly used example is the pre-reordering approach.

    In this work, we used phrase-based machine translation to pre-translate the
    input into the target language. Then a neural machine translation system
    generates the final hypothesis using the pre-translation. Thereby, we use
    either only the output of the phrase-based machine translation (PBMT) system or
    a combination of the PBMT output and the source sentence.

    We evaluate the technique on the English to German translation task. Using
    this approach we are able to outperform the PBMT system as well as the baseline
    neural MT system by up to 2 BLEU points. We also analyze the influence of the
    quality of the initial system on the final result.
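
    As a rough illustration of the pipeline, the sketch below builds the NMT input
    from a PBMT pre-translation, either alone or concatenated with the source
    sentence; the pbmt_translate stub, the separator token, and the mode names are
    assumptions for illustration, not the paper’s actual interface.

        def pbmt_translate(src):
            # Stand-in for a phrase-based MT system (hypothetical output).
            return "ein haus"

        def build_nmt_input(src, mode="mixed", sep="<SEP>"):
            """Compose the input fed to the NMT system."""
            pre = pbmt_translate(src)
            return pre if mode == "pbmt_only" else pre + " " + sep + " " + src

        print(build_nmt_input("a house"))  # 'ein haus <SEP> a house'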

    Neural Machine Translation Advised by Statistical Machine Translation

    Xing Wang, Zhengdong Lu, Zhaopeng Tu, Hang Li, Deyi Xiong, Min Zhang
    Comments: submitted to AAAI 2017
    Subjects: Computation and Language (cs.CL)

    Neural Machine Translation (NMT) is a new approach to machine translation
    that has made great progress in recent years. However, recent studies show that
    NMT generally produces fluent but inadequate translations (Tu et al. 2016; He
    et al. 2016). This is in contrast to conventional Statistical Machine
    Translation (SMT), which usually yields adequate but non-fluent translations.
    It is natural, therefore, to leverage the advantages of both models for better
    translations, and in this work we propose to incorporate SMT model into NMT
    framework. More specifically, at each decoding step, SMT offers additional
    recommendations of generated words based on the decoding information from NMT
    (e.g., the generated partial translation and attention history). Then we employ
    an auxiliary classifier to score the SMT recommendations and a gating function
    to combine the SMT recommendations with NMT generations, both of which are
    jointly trained within the NMT architecture in an end-to-end manner.
    Experimental results on Chinese-English translation show that the proposed
    approach achieves significant and consistent improvements over state-of-the-art
    NMT and SMT systems on multiple NIST test sets.
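
    The following toy sketch shows the flavor of the gated combination at a single
    decoding step: a gate mixes the NMT word distribution with scored SMT
    recommendations. The vocabulary, distributions, and fixed gate value are
    illustrative; in the paper the gate and the SMT scores come from jointly
    trained components.

        import numpy as np

        vocab = ["the", "cat", "sat"]
        p_nmt = np.array([0.5, 0.3, 0.2])   # NMT output distribution
        p_smt = np.array([0.1, 0.7, 0.2])   # scored SMT recommendations
        gate = 0.6                          # a trained gating function in the paper

        # Convex combination of the two word distributions.
        p_combined = gate * p_nmt + (1.0 - gate) * p_smt
        print(vocab[int(np.argmax(p_combined))])  # -> 'cat'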

    Interactive Attention for Neural Machine Translation

    Fandong Meng, Zhengdong Lu, Hang Li, Qun Liu
    Comments: Accepted at COLING 2016
    Subjects: Computation and Language (cs.CL)

    Conventional attention-based Neural Machine Translation (NMT) conducts
    dynamic alignment in generating the target sentence. By repeatedly reading the
    representation of source sentence, which keeps fixed after generated by the
    encoder (Bahdanau et al., 2015), the attention mechanism has greatly enhanced
    state-of-the-art NMT. In this paper, we propose a new attention mechanism,
    called INTERACTIVE ATTENTION, which models the interaction between the decoder
    and the representation of source sentence during translation by both reading
    and writing operations. INTERACTIVE ATTENTION can keep track of the interaction
    history and therefore improve the translation performance. Experiments on NIST
    Chinese-English translation task show that INTERACTIVE ATTENTION can achieve
    significant improvements over both the previous attention-based NMT baseline
    and some state-of-the-art variants of attention-based NMT (i.e., coverage
    models (Tu et al., 2016)). Moreover, a neural machine translator with our
    INTERACTIVE ATTENTION outperforms the open-source attention-based NMT system
    Groundhog by 4.22 BLEU points and the open-source phrase-based system Moses by
    3.94 BLEU points on average across multiple test sets.
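
    A toy numpy sketch of one read-then-write step in the spirit of INTERACTIVE
    ATTENTION follows; the shapes, the content written back, and the update rule
    are illustrative assumptions, not the trained model from the paper.

        import numpy as np

        def softmax(z):
            e = np.exp(z - z.max())
            return e / e.sum()

        L, d = 5, 4                        # source length, hidden size
        memory = np.random.randn(L, d)     # source representation (now writable)
        query = np.random.randn(d)         # current decoder state

        weights = softmax(memory @ query)  # attention weights over source positions
        context = weights @ memory         # read: weighted summary of the source
        update = np.tanh(query)            # toy content to write back
        memory += np.outer(weights, update)  # write: same weights modify the memory
        print(context.shape, memory.shape)   # read vector and updated memory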

    Cached Long Short-Term Memory Neural Networks for Document-Level Sentiment Classification

    Jiacheng Xu, Danlu Chen, Xipeng Qiu, Xuangjing Huang
    Comments: Published as long paper of EMNLP2016
    Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    Recently, neural networks have achieved great success on sentiment
    classification due to their ability to alleviate feature engineering. However,
    one of the remaining challenges is to model long texts in document-level
    sentiment classification under a recurrent architecture because of the
    deficiency of the memory unit. To address this problem, we present Cached
    Long Short-Term Memory neural networks (CLSTM) to capture the overall semantic
    information in long texts. CLSTM introduces a cache mechanism, which divides
    memory into several groups with different forgetting rates and thus enables the
    network to keep sentiment information better within a recurrent unit. The
    proposed CLSTM outperforms the state-of-the-art models on three publicly
    available document-level sentiment analysis datasets.
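
    A toy sketch of the cache idea follows: the memory is split into groups, each
    retaining a different fraction of its old state, so slowly forgetting groups
    preserve long-range sentiment information. The fixed retention rates here are
    illustrative; the actual CLSTM learns gated versions of this behaviour.

        import numpy as np

        groups, d = 3, 6
        keep_rates = [0.2, 0.6, 0.95]   # fraction of old memory each group keeps
        c = np.zeros((groups, d))       # grouped cell states

        for _ in range(10):                  # toy input sequence
            x = np.tanh(np.random.randn(d))  # stand-in for the candidate input
            for g in range(groups):
                # slow groups (high keep rate) change little per step
                c[g] = keep_rates[g] * c[g] + (1.0 - keep_rates[g]) * x
        print(c.round(2))  # fast groups track recent input, slow groups lag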

    Translation Quality Estimation using Recurrent Neural Network

    Raj Nath Patel, Sasikumar M
    Comments: 7 pages, published at First Conference on Machine Translation
    Journal-ref: cs.CL.NLP.WMT2016.QE. 2 (2016) 819-824
    Subjects: Computation and Language (cs.CL)

    This paper describes our submission to the shared task on word/phrase level
    Quality Estimation (QE) in the First Conference on Statistical Machine
    Translation (WMT16). The objective of the shared task was to predict if the
    given word/phrase is a correct/incorrect (OK/BAD) translation in the given
    sentence. In this paper, we propose a novel approach for word level Quality
    Estimation using Recurrent Neural Network Language Model (RNN-LM) architecture.
    RNN-LMs have been found very effective in different Natural Language Processing
    (NLP) applications. RNN-LM is mainly used for vector space language modeling
    for different NLP problems. For this task, we modify the architecture of
    RNN-LM. The modified system predicts a label (OK/BAD) in the slot rather than
    predicting the word. The input to the system is a word sequence, similar to the
    standard RNN-LM. The approach is language independent and requires only the
    translated text for QE. To estimate the phrase level quality, we use the output
    of the word level QE system.

    Term-Class-Max-Support (TCMS): A Simple Text Document Categorization Approach Using Term-Class Relevance Measure

    D S Guru, Mahamad Suhil
    Comments: 4 Pages, 4 Figures; 2016 Intl. Conference on Advances in Computing, Communications and Informatics (ICACCI), Sept. 21-24, 2016, Jaipur, India
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    In this paper, a simple text categorization method using term-class relevance
    measures is proposed. Initially, text documents are processed to extract
    significant terms present in them. For every term extracted from a document, we
    compute its importance in preserving the content of a class through a novel
    term-weighting scheme known as Term_Class Relevance (TCR) measure proposed by
    Guru and Suhil (2015) [1]. In this way, for every term, its relevance for all
    the classes present in the corpus is computed and stored in the knowledgebase.
    During testing, the terms present in the test document are extracted and the
    term-class relevance of each term is obtained from the stored knowledgebase. To
    enable quick lookup of term weights, a B-tree indexing data structure is
    adopted. Finally, the class that receives the maximum support in terms of
    term-class relevance is declared the class of the given test document. The
    proposed method runs in logarithmic time during testing and is simple to
    implement compared with other text categorization techniques available in the
    literature. Experiments conducted on various benchmark datasets reveal that
    the performance of the proposed method is satisfactory and encouraging.

    Generalization of metric classification algorithms for sequences classification and labelling

    Roman Samarev, Andrey Vasnetsov, Elizaveta Smelkova
    Subjects: Learning (cs.LG); Computation and Language (cs.CL)

    The article deals with the modification of metric classification algorithms.
    In particular, it studies the k-Nearest Neighbours algorithm and its
    application to sequential data. A method for generalizing metric
    classification algorithms is proposed; as part of it, an algorithm for
    classifying and labelling sequential data has been developed. The advantages
    of the developed classification algorithm over the existing one are also
    discussed, and its effectiveness is compared with that of CRF on the chunking
    task of the open CoNLL2000 dataset.

    Simultaneous Learning of Trees and Representations for Extreme Classification, with Application to Language Modeling

    Yacine Jernite, Anna Choromanska, David Sontag, Yann LeCun
    Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)

    This paper addresses the problem of multi-class classification with an
    extremely large number of classes, where the class predictor is learned jointly
    with the data representation, as is the case in language modeling problems. The
    predictor admits a hierarchical structure, which allows for efficient handling
    of settings that deal with a very large number of labels. The predictive power
    of the model however can heavily depend on the structure of the tree. We
    address this problem with an algorithm for tree construction and training that
    is based on a new objective function which favors balanced and easily-separable
    node partitions. We describe theoretical properties of this objective function
    and show that it gives rise to a boosting algorithm for which we provide a
    bound on classification error, i.e. we show that if the objective is weakly
    optimized in the internal nodes of the tree, then our algorithm will amplify
    this weak advantage to build a tree achieving any desired level of accuracy. We
    apply the algorithm to the task of language modeling by re-framing conditional
    density estimation as a variant of the hierarchical classification problem. We
    empirically demonstrate on text data that the proposed approach leads to
    high-quality trees in terms of perplexity and computational running time
    compared to its non-hierarchical counterpart.


    Distributed, Parallel, and Cluster Computing

    Parallel Stream Processing Against Workload Skewness and Variance

    Junhua Fang, Rong Zhang, Tom Z.J.Fu, Zhenjie Zhang, Aoying Zhou, Junhua Zhu
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB)

    Key-based workload partitioning is a common strategy used in parallel stream
    processing engines, enabling effective key-value tuple distribution over worker
    threads in a logical operator. While randomized hashing on the keys is capable
    of balancing the workload for key-based partitioning when the keys generally
    follow a static distribution, it is likely to generate poor balancing
    performance when workload variance occurs on the incoming data stream. This
    paper presents a new key-based workload partitioning framework, with practical
    algorithms to support dynamic workload assignment for stateful operators. The
    framework combines hash-based and explicit key-based routing strategies for
    workload distribution, which specifies the destination worker threads for a
    handful of keys and assigns the other keys with the hashing function. When
    short-term distribution fluctuations occur to the incoming data stream, the
    system adaptively updates the routing table containing the chosen keys, in
    order to rebalance the workload with minimal migration overhead within the
    stateful operator. We formulate the rebalance operation as an optimization
    problem, with multiple objectives on minimizing state migration costs,
    controlling the size of the routing table and breaking workload imbalance among
    worker threads. Despite the NP-hardness of the optimization formulation, we
    carefully investigate and justify the heuristics behind key (re)routing and
    state migration, facilitating fast response to workload variance with
    negligible cost to normal processing in the distributed system. Empirical
    studies on synthetic data and real-world stream applications validate the
    usefulness of our proposals and demonstrate a substantial advantage of our
    approaches over state-of-the-art solutions in the literature.
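
    A minimal sketch of the mixed routing strategy described above: a small
    routing table pins a handful of hot keys to explicitly chosen workers, and
    every other key falls back to hashing. The table entries are hypothetical;
    rebalancing in the paper corresponds to updating this table at runtime.

        NUM_WORKERS = 4
        routing_table = {"hot_key_1": 2, "hot_key_2": 0}  # hypothetical entries

        def route(key):
            """Explicit routing for chosen keys, hash routing for the rest."""
            if key in routing_table:
                return routing_table[key]
            return hash(key) % NUM_WORKERS

        print(route("hot_key_1"), route("some_other_key"))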

    Analysis of the modernization prospects of the WLCG monitoring framework's messaging subsystem

    V. Airiian
    Comments: 5 pages, XIX International Scientific Conference of Young Scientists and Specialists (AYSS-2015), JINR, Dubna, 16-20 February 2015
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    The purpose of the project is an analysis of the modernization prospects of
    the WLCG monitoring framework’s messaging subsystem based on Nagios monitoring
    software and Apache ActiveMQ technologies. The modernization process demands
    thorough examination of the existing subsystem to determine the vital upgrade
    requirements. Thus the work focuses on a study of the main underlying
    technologies and the existing subsystem’s structure, and on a revision of its
    design and the software used.

    Fault Tolerant Frequent Pattern Mining

    Sameh Shohdy, Abhinav Vishnu, Gagan Agrawal
    Comments: 10 Pages, High Performance Computing Conference (HIPC 2016)
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    The FP-Growth algorithm is a Frequent Pattern Mining (FPM) algorithm that has
    been extensively used to study correlations and patterns in large scale
    datasets. While several researchers have designed distributed memory FP-Growth
    algorithms, it is pivotal to consider fault tolerant FP-Growth, which can
    address the increasing fault rates in large scale systems. In this work, we
    propose a novel parallel, algorithm-level fault-tolerant FP-Growth algorithm.
    We leverage algorithmic properties and MPI advanced features to guarantee an
    O(1) space complexity, achieved by using the dataset memory space itself for
    checkpointing. We also propose a recovery algorithm that can use in-memory and
    disk-based checkpointing, though in many cases the recovery can be completed
    without any disk access, and incurring no memory overhead for checkpointing. We
    evaluate our FT algorithm on a large scale InfiniBand cluster with several
    large datasets using up to 2K cores. Our evaluation demonstrates excellent
    efficiency for checkpointing and recovery in comparison to the disk-based
    approach. We have also observed a 20x average speed-up in comparison to Spark,
    establishing that a well-designed algorithm can easily outperform a solution
    based on a general fault-tolerant programming model.

    Remarks to the paper "Control system for reducing energy consumption in backbone computer network" from "Concurrency and Computation: Practice and Experience" journal

    Andrzej Karbowski
    Comments: 5 pages. arXiv admin note: substantial text overlap with arXiv:1610.02551
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)

    This paper indicates two errors in the formulation of the main optimization
    model in the article “Control system for reducing energy consumption in
    backbone computer network” by Niewiadomska-Szynkiewicz et al. and shows how to
    fix them.

    A New Perspective on Randomized Gossip Algorithms

    Nicolas Loizou, Peter Richtárik
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT); Optimization and Control (math.OC)

    In this short note we propose a new approach for the design and analysis of
    randomized gossip algorithms which can be used to solve the average consensus
    problem. We show that the Randomized Block Kaczmarz (RBK) method – a method for
    solving linear systems – works as a gossip algorithm when applied to a special
    system encoding the underlying network. The famous pairwise gossip algorithm
    arises as a special case. Subsequently, we reveal a hidden duality of
    randomized gossip algorithms, with the dual iterative process maintaining a set
    of numbers attached to the edges as opposed to nodes of the network. We prove
    that RBK obtains a superlinear speedup in the size of the block, and
    demonstrate this effect through experiments.
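
    For reference, a sketch of the classic pairwise gossip step that the paper
    recovers as a special case: two adjacent nodes repeatedly average their
    values, driving every node toward the network average. The ring network and
    values are illustrative.

        import random

        values = [3.0, 7.0, 1.0, 9.0]
        edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # ring network

        for _ in range(1000):
            i, j = random.choice(edges)
            avg = (values[i] + values[j]) / 2.0   # the pairwise gossip update
            values[i] = values[j] = avg

        print(values)  # all entries close to the mean, 5.0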

    Analysis and Modeling of Social Influence in High Performance Computing Workloads

    Shuai Zheng, Zon-Yin Shae, Xiangliang Zhang, Hani Jamjoom, Liana Fong
    Comments: The 17th International European Conference on Parallel and Distributed Computing, Euro-Par 2011
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Social and Information Networks (cs.SI)

    Social influence among users (e.g., collaboration on a project) creates
    bursty behavior in the underlying high performance computing (HPC) workloads.
    Using representative HPC and cluster workload logs, this paper identifies,
    analyzes, and quantifies the level of social influence across HPC users. We
    show the existence of a social graph that is characterized by a pattern of
    dominant users and followers. This pattern also follows a power-law
    distribution, which is consistent with those observed in mainstream social
    networks. Given its potential impact on HPC workloads prediction and
    scheduling, we propose a fast-converging, computationally-efficient online
    learning algorithm for identifying social groups. Extensive evaluation shows
    that our online algorithm can (1) quickly identify the social relationships by
    using a small portion of incoming jobs and (2) efficiently track group
    evolution over time.

    A Distributed Parallel Algorithm for Minimum Spanning Tree Problem

    Artem Mazeev, Alexander Semenov, Alexey Simonov
    Comments: 11 pages, 5 figures, 2 tables
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    In this paper we present and evaluate a parallel algorithm for solving a
    minimum spanning tree (MST) problem for supercomputers with distributed memory.
    The algorithm relies on the relaxation of the message processing order
    requirement for one specific message type compared to the original GHS
    (Gallager, Humblet, Spira) algorithm. Our algorithm adopts hashing and message
    compression optimization techniques as well. To the best of our knowledge, this
    is the first parallel implementation of the GHS algorithm that scales linearly
    to more than 32 nodes (256 cores) of an InfiniBand cluster.

    Improving Server Utilization in a Distributed Computing Set-up with Independent Clients

    Anindya S. Chakrabarti, Diptesh Ghosh
    Comments: 16 pages, 8 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    We consider a set-up in which there are multiple servers and multiple clients
    in a large distributed computing system. Clients request servers to process
    jobs. Servers can only process one job in unit time. There is no coordinating
    agent to route client requests to servers, and clients choose servers
    independently and simultaneously, and only have access to the outcomes of their
    own past requests. If more than one client chooses the same server, then only
    one randomly chosen client’s request will be fulfilled. If some servers do not
    receive any request, they remain idle. In this paper, we show that a large
    category of strategies are not effective in terms of server utilization. We
    devise strategies for clients that improve server utilization of such systems
    over those of strategies known in the current literature.

    Decentralized Collaborative Learning of Personalized Models over Networks

    Paul Vanhaesebrouck, Aurélien Bellet, Marc Tommasi
    Subjects: Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (cs.SY); Machine Learning (stat.ML)

    We consider a set of learning agents in a collaborative peer-to-peer network,
    where each agent learns a personalized model according to its own learning
    objective. The question addressed in this paper is: how can agents improve upon
    their locally trained model by communicating with other agents that have
    similar objectives? We introduce and analyze two asynchronous gossip algorithms
    running in a fully decentralized manner. Our first approach, inspired by
    label propagation, aims to smooth pre-trained local models over the network
    while accounting for the confidence that each agent has in its initial model.
    In our second approach, agents jointly learn and propagate their model by
    making iterative updates based on both their local dataset and the behavior of
    their neighbors. Our algorithm to optimize this challenging objective in a
    decentralized way is based on ADMM.
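
    A toy sketch of the first (label-propagation-style) approach: each agent’s
    model is pulled toward its neighbours’ models, while its own pre-trained model
    is weighted by a confidence score. The graph, confidences, and synchronous
    update below are illustrative simplifications of the asynchronous gossip
    algorithm.

        import numpy as np

        n, d = 3, 2
        theta0 = np.random.randn(n, d)       # pre-trained local models
        theta = theta0.copy()
        W = np.array([[0, 1, 1],
                      [1, 0, 1],
                      [1, 1, 0]], float)     # network adjacency (toy)
        conf = np.array([1.0, 0.5, 2.0])     # confidence in each initial model

        for _ in range(100):
            for i in range(n):
                nbr = W[i] @ theta           # sum of neighbours' current models
                theta[i] = (conf[i] * theta0[i] + nbr) / (conf[i] + W[i].sum())
        print(theta)                         # smoothed personalized models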

    Efficient Random Sampling – Parallel, Vectorized, Cache-Efficient, and Online

    Peter Sanders, Sebastian Lamm, Lorenz Hübschle-Schneider, Emanuel Schrade, Carsten Dachsbacher
    Subjects: Mathematical Software (cs.MS); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

    We consider the problem of sampling $n$ numbers from the range
    $\{1,\ldots,N\}$ without replacement on modern architectures. The main result
    is a simple divide-and-conquer scheme that makes sequential algorithms more
    cache efficient and leads to a parallel algorithm running in expected time
    $\mathcal{O}(n/p + \log p)$ on $p$ processors. The amount of
    communication between the processors is very small and independent of the
    sample size. We also discuss modifications needed for load balancing, reservoir
    sampling, online sampling, sampling with replacement, Bernoulli sampling, and
    vectorization on SIMD units or GPUs.
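
    A minimal sequential sketch of the divide-and-conquer idea, under the
    assumption that the split counts follow a hypergeometric law: the range is
    halved, the number of samples falling in the left half is drawn, and both
    halves are handled recursively; small subproblems use a plain base-case
    sampler.

        import random
        import numpy as np

        def sample(lo, hi, n, base=64):
            """Sample n integers from [lo, hi) without replacement."""
            n = int(n)
            if n == 0:
                return []
            if hi - lo <= base:
                return random.sample(range(lo, hi), n)
            mid = (lo + hi) // 2
            # how many of the n samples land in the left half
            n_left = np.random.hypergeometric(mid - lo, hi - mid, n)
            return sample(lo, mid, n_left, base) + sample(mid, hi, n - n_left, base)

        print(sorted(sample(1, 1_000_000, 10)))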

    Fault Detection Engine in Intelligent Predictive Analytics Platform for DCIM

    Bodhisattwa Prasad Majumder, Ayan Sengupta, Sajal Jain, Parikshit Bhaduri
    Comments: Accepted in 4th International Conference on Business Analytics and Intelligence (ICBAI 2016)
    Subjects: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

    With the advancement of large-scale data generation and data handling
    capability, machine learning and probabilistic modelling enable an immense
    opportunity to employ predictive analytics platforms in security-critical
    industries, namely data centers, electricity grids, utilities, airports, etc.,
    where downtime minimization is one of the primary objectives. This paper
    proposes a novel, complete architecture of an intelligent predictive analytics
    platform, Fault Engine, for a large device network connected by
    electrical/information flow. Three unique modules proposed here seamlessly
    integrate with the available data handling technology stack and connect with
    middleware to produce online intelligent predictions in critical failure
    scenarios. The Markov Failure module predicts the severity of a failure along
    with the survival probability of a device at any given instant. The Root Cause
    Analysis model indicates probable devices as potential root causes, employing
    Bayesian probability assignment and topological sorting. Finally, a community
    detection algorithm produces correlated clusters of devices in terms of
    failure probability, which further narrows down the search space for finding
    the root cause. The whole engine has been tested on networks of different
    sizes with simulated failure environments and shows its potential to scale in
    real-time implementations.


    Learning

    Decentralized Collaborative Learning of Personalized Models over Networks

    Paul Vanhaesebrouck, Aurélien Bellet, Marc Tommasi
    Subjects: Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (cs.SY); Machine Learning (stat.ML)

    We consider a set of learning agents in a collaborative peer-to-peer network,
    where each agent learns a personalized model according to its own learning
    objective. The question addressed in this paper is: how can agents improve upon
    their locally trained model by communicating with other agents that have
    similar objectives? We introduce and analyze two asynchronous gossip algorithms
    running in a fully decentralized manner. Our first approach, inspired by
    label propagation, aims to smooth pre-trained local models over the network
    while accounting for the confidence that each agent has in its initial model.
    In our second approach, agents jointly learn and propagate their model by
    making iterative updates based on both their local dataset and the behavior of
    their neighbors. Our algorithm to optimize this challenging objective in a
    decentralized way is based on ADMM.

    Risk-Aware Algorithms for Adversarial Contextual Bandits

    Wen Sun, Debadeepta Dey, Ashish Kapoor
    Comments: 28 pages
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    In this work we consider adversarial contextual bandits with risk
    constraints. At each round, nature prepares a context, a cost for each arm, and
    additionally a risk for each arm. The learner leverages the context to pull an
    arm and then receives the corresponding cost and risk associated with the
    pulled arm. In addition to minimizing the cumulative cost, the learner also
    needs to satisfy long-term risk constraints — the average of the cumulative
    risk from all pulled arms should not be larger than a pre-defined threshold. To
    address this problem, we first study the full information setting where in each
    round the learner receives an adversarial convex loss and a convex constraint.
    We develop a meta algorithm leveraging online mirror descent for the full
    information setting and extend it to contextual bandit with risk constraints
    setting using expert advice. Our algorithms can achieve near-optimal regret in
    terms of minimizing the total cost, while successfully maintaining a sublinear
    growth of cumulative risk constraint violation.

    Efficient Metric Learning for the Analysis of Motion Data

    Babak Hosseini, Barbara Hammer
    Comments: Pre-Print of DSAA2015 conference paper, Presented at the Data Science and Advanced Analytics (DSAA), Paris, France (2015)
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We investigate metric learning in the context of dynamic time warping (DTW),
    the by far most popular dissimilarity measure used for the comparison and
    analysis of motion capture data. While metric learning enables a
    problem-adapted representation of data, the majority of methods has been
    proposed for vectorial data only. In this contribution, we extend the popular
    principle offered by the large margin nearest neighbours learner (LMNN) to DTW
    by treating the resulting component-wise dissimilarity values as features. We
    demonstrate that this principle greatly enhances the classification accuracy
    in several benchmarks. Further, we show that recent auxiliary concepts such as
    metric regularisation can be transferred from the vectorial case to
    component-wise DTW in a similar way. We illustrate that metric regularisation
    constitutes a crucial prerequisite for the interpretation of the resulting
    relevance profiles.

    Wind ramp event prediction with parallelized Gradient Boosted Regression Trees

    Saurav Gupta, Nitin Anand Shrivastava, Abbas Khosravi, Bijaya Ketan Panigrahi
    Comments: IJCNN 2016
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Accurate prediction of wind ramp events is critical for ensuring the
    reliability and stability of the power systems with high penetration of wind
    energy. This paper proposes a classification based approach for estimating the
    future class of wind ramp event based on certain thresholds. A parallelized
    gradient boosted regression tree based technique has been proposed to
    accurately classify the normal as well as rare extreme wind power ramp events.
    The model has been validated using wind power data obtained from the National
    Renewable Energy Laboratory database. Performance comparison with several
    benchmark techniques indicates the superiority of the proposed technique in
    terms of superior classification accuracy.
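
    As a rough single-machine stand-in for the proposed technique, the sketch
    below trains an off-the-shelf gradient boosted tree classifier on synthetic
    features; the feature construction, class thresholds, and data are
    illustrative assumptions (the paper uses NREL wind power data and a
    parallelized implementation).

        import numpy as np
        from sklearn.ensemble import GradientBoostingClassifier

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 8))        # e.g. lagged wind power features
        y = rng.integers(0, 3, size=500)     # ramp classes from thresholds (toy)

        clf = GradientBoostingClassifier(n_estimators=100)
        clf.fit(X[:400], y[:400])
        print("held-out accuracy:", clf.score(X[400:], y[400:]))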

    Convergence rate of stochastic k-means

    Cheng Tang, Claire Monteleoni
    Subjects: Learning (cs.LG)

    We analyze online and mini-batch k-means variants. Both scale up the widely
    used Lloyd’s algorithm via stochastic approximation, and have become popular
    for large-scale clustering and unsupervised feature learning. We show, for the
    first time, that they have global convergence towards local optima at an
    $O(\frac{1}{t})$ rate under general conditions. In addition, we show that if
    the dataset is clusterable, then with suitable initialization, mini-batch
    k-means converges to an optimal k-means solution at an $O(\frac{1}{t})$
    convergence rate with high probability. The k-means objective is non-convex
    and non-differentiable: we exploit ideas from non-convex gradient-based
    optimization by providing a novel characterization of the trajectory of the
    k-means algorithm on its solution space, and circumvent its
    non-differentiability via geometric insights about the k-means update.
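
    A compact sketch of the mini-batch k-means update being analyzed: each centre
    moves toward its assigned points with a step size that decays with the number
    of points the centre has seen, giving the $O(\frac{1}{t})$-style behaviour.
    Data and batch size are illustrative.

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 2))
        k = 3
        centers = X[rng.choice(1000, k, replace=False)].copy()
        counts = np.zeros(k)

        for t in range(200):
            batch = X[rng.choice(1000, 32, replace=False)]
            # assign each batch point to its nearest centre
            assign = np.argmin(((batch[:, None] - centers) ** 2).sum(-1), axis=1)
            for i, x in zip(assign, batch):
                counts[i] += 1
                centers[i] += (x - centers[i]) / counts[i]  # decaying step size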

    Towards K-means-friendly Spaces: Simultaneous Deep Learning and Clustering

    Bo Yang, Xiao Fu, Nicholas D. Sidiropoulos, Mingyi Hong
    Comments: Main paper: 9 pages; Supplementary material: 3 pages
    Subjects: Learning (cs.LG)

    Most learning approaches treat dimensionality reduction (DR) and clustering
    separately (i.e., sequentially), but recent research has shown that optimizing
    the two tasks jointly can substantially improve the performance of both. The
    premise behind the latter genre is that the data samples are obtained via
    linear transformation of latent representations that are easy to cluster; but
    in practice, the transformation from the latent space to the data can be more
    complicated. In this work, we assume that this transformation is an unknown and
    possibly nonlinear function. To recover the 'clustering-friendly' latent
    representations and to better cluster the data, we propose a joint DR and
    K-means clustering approach in which DR is accomplished via learning a deep
    neural network (DNN). The motivation is to keep the advantages of jointly
    optimizing the two tasks, while exploiting the deep neural network’s ability to
    approximate any nonlinear function. This way, the proposed approach can work
    well for a broad class of generative models. Towards this end, we carefully
    design the DNN structure and the associated joint optimization criterion, and
    propose an effective and scalable algorithm to handle the formulated
    optimization problem. Experiments using five different real datasets are
    employed to showcase the effectiveness of the proposed approach.

    Similarity Learning for Time Series Classification

    Maria-Irina Nicolae, Éric Gaussier, Amaury Habrard, Marc Sebban
    Comments: Techreport
    Subjects: Learning (cs.LG)

    Multivariate time series naturally exist in many fields, like energy,
    bioinformatics, signal processing, and finance. Most of these applications need
    to be able to compare these structured data. In this context, dynamic time
    warping (DTW) is probably the most common comparison measure. However, not much
    research effort has been put into improving it by learning. In this paper, we
    propose a novel method for learning similarities based on DTW, in order to
    improve time series classification. Making use of the uniform stability
    framework, we provide the first theoretical guarantees in the form of a
    generalization bound for linear classification. The experimental study shows
    that the proposed approach is efficient, while yielding sparse classifiers.

    Generalization of metric classification algorithms for sequences classification and labelling

    Roman Samarev, Andrey Vasnetsov, Elizaveta Smelkova
    Subjects: Learning (cs.LG); Computation and Language (cs.CL)

    The article deals with the modification of metric classification algorithms.
    In particular, it studies the k-Nearest Neighbours algorithm and its
    application to sequential data. A method for generalizing metric
    classification algorithms is proposed; as part of it, an algorithm for
    classifying and labelling sequential data has been developed. The advantages
    of the developed classification algorithm over the existing one are also
    discussed, and its effectiveness is compared with that of CRF on the chunking
    task of the open CoNLL2000 dataset.

    A Closed Form Solution to Multi-View Low-Rank Regression

    Shuai Zheng, Xiao Cai, Chris Ding, Feiping Nie, Heng Huang
    Comments: Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI, 2015
    Subjects: Learning (cs.LG)

    Real-life data often include information from different channels. For
    example, in computer vision, we can describe an image using different image
    features, such as pixel intensity, color, HOG, GIST, SIFT, etc. These
    different aspects of the same objects are often called multi-view (or
    multi-modal) data. The low-rank regression model has been proven to be an
    effective learning mechanism by exploring the low-rank structure of real-life
    data, but previous low-rank regression models only work on single-view data.
    In this paper, we propose a multi-view low-rank regression model by imposing
    low-rank constraints on the multi-view regression model. Most importantly, we
    provide a closed-form solution to the multi-view low-rank regression model.
    Extensive experiments on 4 multi-view datasets show that the multi-view
    low-rank regression model outperforms single-view regression models,
    revealing that the multi-view low-rank structure is very helpful.

    A probabilistic model for the numerical solution of initial value problems

    Michael Schober, Simo Särkkä, Philipp Hennig
    Comments: 18 pages, 5 figures
    Subjects: Numerical Analysis (math.NA); Learning (cs.LG); Numerical Analysis (cs.NA); Machine Learning (stat.ML)

    Like many numerical methods, solvers for initial value problems (IVPs) on
    ordinary differential equations estimate an analytically intractable quantity,
    using the results of tractable computations as inputs. This structure is
    closely connected to the notion of inference on latent variables in statistics.
    We describe a class of algorithms that formulate the solution to an IVP as
    inference on a latent path that is a draw from a Gaussian process probability
    measure (or equivalently, the solution of a linear stochastic differential
    equation). We then show that certain members of this class are identified
    exactly with existing generalized linear methods for ODEs, in particular a
    number of Runge–Kutta methods and Nordsieck methods. This probabilistic
    formulation of classic methods is valuable in two ways: analytically, it
    highlights implicit prior assumptions favoring certain approximate solutions to
    the IVP over others, and gives a precise meaning to the old observation that
    these methods act like filters. Practically, it endows the classic solvers with
    'docking points' for notions of uncertainty and prior information about the
    initial value, the value of the ODE itself, and the solution of the problem.

    BET on Independence

    Kai Zhang
    Subjects: Statistics Theory (math.ST); Learning (cs.LG); Computation (stat.CO); Methodology (stat.ME); Machine Learning (stat.ML)

    We study the problem of model-free dependence detection. This problem can be
    difficult even when the marginal distributions are known. We explain this
    difficulty by showing the impossibility to uniformly consistently distinguish
    degeneracy from independence with any single test. To make model-free
    dependence detection a tractable problem, we introduce the concept of binary
    expansion statistics (BEStat) and propose the binary expansion testing (BET)
    framework. Through simple mathematics, we convert the dependence detection
    problem to a multiple testing problem. Besides being model-free, the BET also
    enjoys many other advantages which include (1) invariance to monotone marginal
    transformations, (2) clear interpretability of local relationships upon
    rejection, and (3) close connections to computing for efficient algorithms. We
    illustrate the BET by studying the distribution of the brightest stars in the
    night sky.

    The Peaking Phenomenon in Semi-supervised Learning

    Jesse H. Krijthe, Marco Loog
    Comments: 11 pages, 5 figures. S+SSPR 2016, Mérida, Mexico
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    For the supervised least squares classifier, when the number of training
    objects is smaller than the dimensionality of the data, adding more data to the
    training set may first increase the error rate before decreasing it. This,
    possibly counterintuitive, phenomenon is known as peaking. In this work, we
    observe that a similar but more pronounced version of this phenomenon also
    occurs in the semi-supervised setting, where instead of labeled objects,
    unlabeled objects are added to the training set. We explain why the learning
    curve has a steeper incline and a more gradual decline in this setting
    through simulation studies and by applying an approximation of the learning
    curve based on the work by Raudys & Duin.

    Lazifying Conditional Gradient Algorithms

    Gábor Braun, Sebastian Pokutta, Daniel Zink
    Comments: 18 pages and 16 pages of computational results
    Subjects: Data Structures and Algorithms (cs.DS); Learning (cs.LG)

    Conditional gradient algorithms (also often called Frank-Wolfe algorithms)
    are popular due to their simplicity of only requiring a linear optimization
    oracle and more recently they also gained significant traction for online
    learning. While simple in principle, in many cases the actual implementation of
    the linear optimization oracle is costly. We show a general method to lazify
    various conditional gradient algorithms, which in actual computations leads to
    several orders of magnitude of speedup in wall-clock time. This is achieved by
    using a faster separation oracle instead of a linear optimization oracle,
    relying only on few linear optimization oracle calls.
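
    A toy sketch of the lazification idea: before calling the expensive linear
    optimization oracle, previously returned vertices are re-checked and reused
    whenever one already makes enough progress against the current gradient. The
    progress test below is a simplification of the paper’s threshold-based
    separation oracle; the feasible set and objective are illustrative.

        import numpy as np

        def lp_oracle(grad, vertices):
            # The expensive call we are trying to avoid.
            return vertices[int(np.argmin(vertices @ grad))]

        vertices = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # triangle
        x = np.array([0.3, 0.3])    # minimize ||x - (0.2, 0.7)||^2 over the hull
        cache = []

        for t in range(1, 50):
            grad = 2 * (x - np.array([0.2, 0.7]))
            target = grad @ x       # simplified progress threshold
            v = next((c for c in cache if grad @ c < target), None)
            if v is None:           # cache failed: call the true oracle
                v = lp_oracle(grad, vertices)
                cache.append(v)
            x = x + (2.0 / (t + 2)) * (v - x)   # standard Frank-Wolfe step

        print(x)   # close to (0.2, 0.7)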

    Encoding the Local Connectivity Patterns of fMRI for Cognitive State Classification

    Itir Onal Ertugrul, Mete Ozay, Fatos T. Yarman Vural
    Comments: 8 pages, 5 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    In this work, we propose a novel framework to encode the local connectivity
    patterns of brain, using Fisher Vectors (FV), Vector of Locally Aggregated
    Descriptors (VLAD) and Bag-of-Words (BoW) methods. We first obtain local
    descriptors, called Mesh Arc Descriptors (MADs) from fMRI data, by forming
    local meshes around anatomical regions, and estimating their relationship
    within a neighborhood. Then, we extract a dictionary of relationships, called
    a brain connectivity dictionary, by fitting a generative Gaussian mixture
    model (GMM) to a set of MADs, and selecting the codewords at the mean of each
    component of the mixture. Codewords represent the connectivity patterns among
    anatomical regions. We also encode MADs by VLAD and BoW methods using
    k-means clustering.

    We classify the cognitive states of Human Connectome Project (HCP) task fMRI
    dataset, where we train support vector machines (SVM) by the encoded MADs.
    Results demonstrate that, FV encoding of MADs can be successfully employed for
    classification of cognitive tasks, and outperform the VLAD and BoW
    representations. Moreover, we identify the significant Gaussians in mixture
    models by computing energy of their corresponding FV parts, and analyze their
    effect on classification accuracy. Finally, we suggest a new method to
    visualize the codewords of brain connectivity dictionary.

    Probabilistic Dimensionality Reduction via Structure Learning

    Li Wang
    Comments: 32 pages, 6 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We propose a novel probabilistic dimensionality reduction framework that can
    naturally integrate the generative model and the locality information of data.
    Based on this framework, we present a new model, which is able to learn a
    smooth skeleton of embedding points in a low-dimensional space from
    high-dimensional noisy data. The formulation of the new model can be
    equivalently interpreted as two coupled learning problems, i.e., structure
    learning and the learning of projection matrix. This interpretation motivates
    the learning of the embedding points that can directly form an explicit graph
    structure. We develop a new method to learn the embedding points that form a
    spanning tree, which is further extended to obtain a discriminative and compact
    feature representation for clustering problems. Unlike traditional clustering
    methods, we assume that centers of clusters should be close to each other if
    they are connected in a learned graph, and other cluster centers should be
    distant. This can greatly facilitate data visualization and scientific
    discovery in downstream analysis. Extensive experiments are performed that
    demonstrate that the proposed framework is able to obtain discriminative
    feature representations, and correctly recover the intrinsic structures of
    various real-world datasets.

    Dynamic Stacked Generalization for Node Classification on Networks

    Zhen Han, Alyson Wilson
    Comments: 9 pages, 6 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Social and Information Networks (cs.SI); Applications (stat.AP)

    We propose a novel stacked generalization (stacking) method as a dynamic
    ensemble technique using a pool of heterogeneous classifiers for node label
    classification on networks. The proposed method assigns component models a set
    of functional coefficients, which can vary smoothly with certain topological
    features of a node. Compared to the traditional stacking model, the proposed
    method can dynamically adjust the weights of individual models as we move
    across the graph and provide a more versatile and significantly more accurate
    stacking model for label prediction on a network. We demonstrate the benefits
    of the proposed model using both a simulation study and real data analysis.

    Sample Efficient Optimization for Learning Controllers for Bipedal Locomotion

    Rika Antonova, Akshara Rai, Christopher G. Atkeson
    Comments: To appear in International Conference on Humanoid Robots (Humanoids ‘2016), IEEE-RAS. (Rika Antonova and Akshara Rai contributed equally)
    Subjects: Robotics (cs.RO); Learning (cs.LG)

    Learning policies for bipedal locomotion can be difficult, as experiments are
    expensive and simulation does not usually transfer well to hardware. To counter
    this, we need algorithms that are sample efficient and inherently safe.
    Bayesian Optimization is a powerful sample-efficient tool for optimizing
    non-convex black-box functions. However, its performance can degrade in higher
    dimensions. We develop a distance metric for bipedal locomotion that enhances
    the sample-efficiency of Bayesian Optimization and use it to train a 16
    dimensional neuromuscular model for planar walking. This distance metric
    reflects some basic gait features of healthy walking and helps us quickly
    eliminate a majority of unstable controllers. With our approach we can learn
    policies for walking in less than 100 trials for a range of challenging
    settings. In simulation, we show results on two different costs and on various
    terrains including rough ground and ramps, sloping upwards and downwards. We
    also perturb our models with unknown inertial disturbances analogous to
    differences between simulation and hardware. These results are promising, as
    they indicate that this method can potentially be used to learn control
    policies on hardware.

    An Adaptive Test of Independence with Analytic Kernel Embeddings

    Wittawat Jitkrittum, Zoltan Szabo, Arthur Gretton
    Comments: 8 pages of main text
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    A new computationally efficient dependence measure, and an adaptive
    statistical test of independence, are proposed. The dependence measure is the
    difference between analytic embeddings of the joint distribution and the
    product of the marginals, evaluated at a finite set of locations (features).
    These features are chosen so as to maximize a lower bound on the test power,
    resulting in a test that is data-efficient, and that runs in linear time (with
    respect to the sample size n). The optimized features can be interpreted as
    evidence to reject the null hypothesis, indicating regions in the joint domain
    where the joint distribution and the product of the marginals differ most.
    Consistency of the independence test is established, for an appropriate choice
    of features. In real-world benchmarks, independence tests using the optimized
    features perform comparably to the state-of-the-art quadratic-time HSIC test,
    and outperform competing O(n) and O(n log n) tests.

    Simultaneous Learning of Trees and Representations for Extreme Classification, with Application to Language Modeling

    Yacine Jernite, Anna Choromanska, David Sontag, Yann LeCun
    Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)

    This paper addresses the problem of multi-class classification with an
    extremely large number of classes, where the class predictor is learned jointly
    with the data representation, as is the case in language modeling problems. The
    predictor admits a hierarchical structure, which allows for efficient handling
    of settings that deal with a very large number of labels. The predictive power
    of the model however can heavily depend on the structure of the tree. We
    address this problem with an algorithm for tree construction and training that
    is based on a new objective function which favors balanced and easily-separable
    node partitions. We describe theoretical properties of this objective function
    and show that it gives rise to a boosting algorithm for which we provide a
    bound on classification error, i.e. we show that if the objective is weakly
    optimized in the internal nodes of the tree, then our algorithm will amplify
    this weak advantage to build a tree achieving any desired level of accuracy. We
    apply the algorithm to the task of language modeling by re-framing conditional
    density estimation as a variant of the hierarchical classification problem. We
    empirically demonstrate on text data that the proposed approach leads to
    high-quality trees in terms of perplexity and computational running time
    compared to its non-hierarchical counterpart.

    Playing FPS Games with Deep Reinforcement Learning

    Guillaume Lample, Devendra Singh Chaplot
    Comments: The authors contributed equally to this work
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    Advances in deep reinforcement learning have allowed autonomous agents to
    perform well on Atari games, often outperforming humans, using only raw pixels
    to make their decisions. However, most of these games take place in 2D
    environments that are fully observable to the agent. In this paper, we present
    the first architecture to tackle 3D environments in first-person shooter games,
    that involve partially observable states. Typically, deep reinforcement
    learning methods only utilize visual input for training. We present a method to
    augment these models to exploit game feature information such as the presence
    of enemies or items, during the training phase. Our model is trained to
    simultaneously learn these features along with minimizing a Q-learning
    objective, which is shown to dramatically improve the training speed and
    performance of our agent. Our architecture is also modularized to allow
    different models to be independently trained for different phases of the game.
    We show that the proposed architecture substantially outperforms built-in AI
    agents of the game as well as humans in deathmatch scenarios.


    Information Theory

    Achieving Perfect Location Privacy in Wireless Devices Using Anonymization

    Zarrin Montazeri, Amir Houmansadr, Hossein Pishro-Nik
    Comments: 12 pages, 3 figures
    Subjects: Information Theory (cs.IT)

    The popularity of mobile devices and location-based services (LBS) has
    created great concern regarding the location privacy of their users.
    Anonymization is a common technique that is often used to protect the location
    privacy of LBS users. Here, we present an information-theoretic approach to
    define the notion of perfect location privacy. We show how LBSs should use the
    anonymization method to ensure that their users can achieve perfect location
    privacy. First, we assume that a user’s current location is independent of
    her past locations. Using this i.i.d. model, we show that if the pseudonym of
    the user is changed before O(n^(2/(r-1))) observations are made by the
    adversary for that user, then the user has perfect location privacy. Here, n
    is the number of users in the network and r is the number of all possible
    locations that users can go to. Next, we model users’ movements using Markov
    chains to better capture real-world movement patterns. We show that perfect
    location privacy is achievable for a user if the user’s pseudonym is changed
    before O(n^(2/(|E|-r))) observations are collected by the adversary for that
    user, where |E| is the number of edges in the user’s Markov chain model.

    Estimation of Received Signal Strength Distribution for Smart Meters with Biased Measurement Data Set

    Mathias Rønholt Kielgast, Anders Charly Rasmussen, Mathias Hjorth Laursen, Jimmy Jessen Nielsen, Petar Popovski, Rasmus Krigslund
    Comments: Accepted for publication in IEEE Wireless Communications Letters
    Subjects: Information Theory (cs.IT)

    This letter presents an experimental study and a novel modelling approach of
    the wireless channel of smart utility meters placed in basements or sculleries.
    The experimental data consist of signal strength measurements of consumption
    report packets. Since such packets are only registered if they can be decoded
    by the receiver, the part of the signal strength distribution that falls below
    the receiver sensitivity threshold is not observable. We combine a Rician
    fading model with a bias function that captures the cut-off in the observed
    signal strength measurements. Two sets of experimental data are analysed. It is
    shown that the proposed method offers an approximation of the distribution of
    the signal strength measurements that is better than a naïve Rician fitting.
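
    The sketch below illustrates one way to account for the cut-off, assuming a
    truncated-likelihood formulation: samples below the sensitivity threshold are
    never observed, so each observed sample’s likelihood is renormalized by the
    probability mass above the threshold. This is a simplified stand-in for the
    paper’s bias-function approach, using synthetic amplitudes and scipy’s Rician
    distribution.

        import numpy as np
        from scipy.stats import rice
        from scipy.optimize import minimize

        thr = 0.5                                    # receiver sensitivity (toy)
        data = rice.rvs(b=2.0, scale=1.0, size=2000, random_state=0)
        data = data[data > thr]                      # only decodable packets seen

        def nll(params):
            """Negative log-likelihood of the truncated Rician model."""
            b, scale = params
            if b <= 0 or scale <= 0:
                return np.inf
            logpdf = rice.logpdf(data, b, scale=scale)
            tail = rice.sf(thr, b, scale=scale)      # mass above the threshold
            return -(logpdf - np.log(tail)).sum()

        fit = minimize(nll, x0=[1.0, 1.0], method="Nelder-Mead")
        print(fit.x)   # estimates (b, scale) despite the censored sample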

    Improved bounds for sparse recovery from subsampled random convolutions

    Shahar Mendelson, Holger Rauhut, Rachel Ward
    Comments: 39 pages
    Subjects: Information Theory (cs.IT); Probability (math.PR)

    We study the recovery of sparse vectors from subsampled random convolutions
    via $\ell_1$-minimization. We consider the setup in which both the subsampling
    locations as well as the generating vector are chosen at random. For a
    subgaussian generator with independent entries, we improve previously known
    estimates: if the sparsity $s$ is small enough, i.e. $s \lesssim
    \sqrt{n/\log(n)}$, we show that $m \gtrsim s \log(en/s)$ measurements are
    sufficient to recover $s$-sparse vectors in dimension $n$ with high
    probability, matching the well-known condition for recovery from standard
    Gaussian measurements. If $s$ is larger, then essentially $m \geq s \log^2(s)
    \log(\log(s)) \log(n)$ measurements are sufficient, again improving over
    previous estimates. Moreover, we also provide robustness estimates for
    measurement errors that are bounded in $\ell_q$ for $q > 2$ — in particular,
    allowing the case $q = \infty$, which is important for quantized compressive
    sensing. All these results are shown via $\ell_q$-robust versions of the null
    space property and for $q > 2$ they represent the first non-trivial bounds for
    structured random matrices. As a crucial ingredient, our approach requires
    lower bounding expressions of the form $\inf_{v \in V_r} \| \Gamma_v \xi \|_q$,
    where $\Gamma_v$ is a set of matrices indexed by unit norm $r$-sparse vectors
    and $\xi$ is a subgaussian random vector. This involves the combination of
    small ball estimates with chaining techniques.

    Joint Relay-User Beamforming Design in Full-Duplex Two-Way Relay Channel

    Zhigang Wen, Shuai Wang, Xiaoqing Liu, Junwei Zou
    Comments: To appear in IEEE Transactions on Vehicular Technology
    Subjects: Information Theory (cs.IT)

    A full-duplex two-way relay channel with multiple antennas is considered. For
    this three-node network, the beamforming design needs to suppress
    self-interference. While a traditional way is to apply zero-forcing for
    self-interference mitigation, it may harm the desired signals. In this paper, a
    design which reserves a fraction of self-interference is proposed by solving a
    quality-of-service constrained beamforming design problem. Since the problem is
    challenging due to the loop self-interference, a convergence-guaranteed
    alternating optimization algorithm is proposed to jointly design the relay-user
    beamformers. Numerical results show that the proposed scheme outperforms
    zero-forcing method, and achieves a transmit power close to the ideal case.

    An interleaver design for polar codes over slow fading channels

    Saurabha R Tavildar
    Comments: 3 pages, 4 figures
    Subjects: Information Theory (cs.IT)

    We consider the problem of using polar codes over slow fading wireless
    channels. For design, we focus on a parallel slow fading channel with 2 blocks,
    and polar codes with rate <= 1/2. Motivated by Arikan’s systematic polar code
    construction, we propose an interleaver design for a general polar code. The
    interleaver consists of applying a bit-reversal permutation to the order of the
    polarized bit channels. This interleaver is called a diversity interleaver. In
    addition to
    the diversity interleaver, a diversity polar code is proposed to further
    increase the diversity gain.

    The proposed designs are evaluated via link simulations for AWGN and fading
    channels. The simulation results show a performance close to the outage
    probability (within 2 dB) and significant gains over using a random
    interleaver.
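
    For concreteness, a small sketch of a bit-reversal permutation of block
    indices, the building block behind the proposed diversity interleaver; the
    exact mapping used in the paper may differ.

        def bit_reverse(i, nbits):
            """Reverse the nbits-bit binary representation of i."""
            r = 0
            for _ in range(nbits):
                r = (r << 1) | (i & 1)
                i >>= 1
            return r

        n = 8  # block length, a power of two
        perm = [bit_reverse(i, 3) for i in range(n)]
        print(perm)  # [0, 4, 2, 6, 1, 5, 3, 7]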

    Joint Multiuser Downlink Beamforming and Admission Control for Green Cloud-RANs with Limited Fronthaul Based on Mixed Integer Semi-definite Program

    Zhi Yu, Ke Wang, Lei Chen, Hong Ji
    Comments: 13 pages, 8 figures. This article has been submitted to IEEE Transactions on Mobile Computing
    Subjects: Information Theory (cs.IT)

    With the dense deployment of remote radio heads (RRHs), the huge network
    power consumption has become a great challenge for green cloud radio access
    networks (Cloud-RANs), and multiuser downlink beamforming has been proposed as
    a promising solution. Moreover, the increasing number of mobile users (MUs)
    makes admission control essential for Cloud-RANs with limited fronthaul
    capacity and a predefined power budget. In this paper, we consider the problem
    of joint multiuser downlink beamforming and admission control (JBAC) to
    increase the number of admitted MUs in the network and reduce the network
    power consumption, while taking into account the quality-of-service
    requirements of the MUs, the power budget constraints, and the fronthaul
    limitation. It is shown that the JBAC problem is a mixed-integer nonlinear
    problem that remains non-convex even after continuous relaxation. Therefore,
    we first transform the JBAC problem into a Mixed-Integer Semidefinite Program.
    Then, we propose a bound-improving Branch and Bound algorithm to yield a
    near-optimal solution. For practical application, a polynomial-time heuristic
    algorithm is proposed to derive a sub-optimal solution. Extensive simulations
    are conducted with different system configurations to show the effectiveness
    of the two proposed schemes.

    Joint Transmit Power and Relay Two-Way Beamforming Optimization for Energy-Harvesting Full-Duplex Communications

    Alexander A. Okandeji, Muhammad R. A. Khandaker, Kai-Kit Wong, Zhongbin Zheng
    Comments: Accepted for Publication. Globecom 2016
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    This paper studies the joint optimization problem of two-way relay
    beamforming, the receiver power splitting (PS) ratio as well as the transmit
    power at the sources to maximize the achievable sum-rate of a simultaneous
    wireless information and power transfer (SWIPT) system with a full-duplex (FD)
    multiple-input multiple-output (MIMO) amplify and forward (AF) relay, assuming
    perfect channel state information (CSI). In particular, our contribution is
    an iterative algorithm based on difference-of-convex (DC) programming and a
    one-dimensional search to obtain the joint solution. Simulation results are
    provided to demonstrate the effectiveness of the proposed algorithm.
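
    As a rough illustration of the outer one-dimensional search, the sketch
    below grid-searches the receiver power-splitting ratio while a placeholder
    stands in for the inner DC-programming step; the surrogate rate function
    and all numbers are invented for illustration.

    ```python
    import numpy as np

    def sum_rate(ps_ratio: float) -> float:
        """Placeholder for the inner solver that would return the achievable
        sum-rate for a fixed power-splitting (PS) ratio. A toy concave-like
        surrogate is used here so the search has a well-defined maximum."""
        return np.log2(1 + 10 * ps_ratio) + np.log2(1 + 5 * (1 - ps_ratio))

    # Outer one-dimensional grid search over the PS ratio in (0, 1).
    grid = np.linspace(0.01, 0.99, 99)
    rates = [sum_rate(r) for r in grid]
    best = grid[int(np.argmax(rates))]
    print(f"best PS ratio ~ {best:.2f}, sum-rate ~ {max(rates):.3f} bit/s/Hz")
    ```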

    Joint CoMP-Cell Selection and Resource Allocation with Fronthaul-Constrained C-RAN

    Lei You, Di Yuan
    Comments: 6 pages
    Subjects: Information Theory (cs.IT)

    Cloud-based Radio Access Network (C-RAN) is a promising architecture for
    future cellular networks, in which Baseband Units (BBUs) are placed at a
    centralized location, with capacity-constrained fronthaul links connecting
    them to multiple distributed Remote Radio Heads (RRHs) located far from the
    BBUs. The centralization of signal processing enables the flexibility of
    coordinated multi-point transmission (CoMP) to meet the high traffic demand
    of users. We investigate how to jointly optimize CoMP-cell selection and base
    station resource allocation so as to enhance the quality of service (QoS),
    subject to the fronthaul capacity constraint, in orthogonal frequency-division
    multiple access (OFDMA) based C-RAN. The problem is proved to be NP-hard in
    this paper. To deal with the computational complexity, we derive a partial
    optimality condition as the foundation for designing a cell-selection
    algorithm. In addition, we provide a method for solving the time-frequency
    resource allocation problem to optimality without loss of fairness in the QoS
    enhancement across users. The simulations show good performance of the
    proposed algorithms for jointly optimizing cell selection and resource
    allocation in a C-RAN with respect to QoS.

    MIMO: State of the Art and the Future in Focus

    Mboli Sechang Julius
    Comments: A simple review of the state of the art of MIMO and its future. It
    addresses what Massive MIMO, the form of MIMO expected to serve 5G networks,
    will look like; another generation of wireless networks is expected by 2020.
    It is written for educational purposes rather than commercial use, and is
    intended to help those building knowledge in this area
    Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

    The antennas of transmitters and receivers have long been manipulated to
    increase the capacity of signal transmission and reception. Using many
    antenna elements to shape beams and steer nulls toward particular points for
    optimum signal transmission and reception has, over the decades, had a
    tremendous positive influence on received power and signal-to-noise ratio
    (SNR). Since this manipulation of antenna elements can be done both at the
    base station and at the device terminal, it gives rise to an important method
    of using several antennas to send signals to, and obtain signals from, space
    with increased capacity. This principle is termed Multiple-Input
    Multiple-Output (MIMO). This paper discusses the application of MIMO in the
    state of the art and in the next generation of wireless systems (5G). It also
    discusses four antenna configurations, SISO, SIMO, MISO, and MIMO,
    considering three methods of combining the signals from multipath
    propagation: selection combining (SC), equal gain combining (EGC), and
    maximum ratio combining (MRC). Spatial diversity and spatial multiplexing are
    also discussed as forms of MIMO. Finally, Massive (or Hyper) MIMO, a new
    method of increasing transmission capacity at very large scale for the fifth
    generation of wireless systems, is discussed along with its challenges and
    opportunities. Key terms: diversity combining techniques, spatial
    multiplexing, channel state information (CSI), Massive MIMO
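
    The three combining rules named above have standard output-SNR expressions,
    which the following sketch evaluates for one random channel draw (unit
    noise power per branch is our simplifying assumption): SC keeps the
    strongest branch, EGC co-phases and sums with unit weights, and MRC sums
    the per-branch SNRs.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    L = 4                        # number of diversity branches
    h = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2)
    snr_branch = np.abs(h) ** 2  # per-branch SNR with unit noise power

    snr_sc = snr_branch.max()             # SC: strongest branch only
    snr_egc = np.sum(np.abs(h)) ** 2 / L  # EGC: co-phase, unit weights
    snr_mrc = snr_branch.sum()            # MRC: conjugate-gain weights

    print(f"SC {snr_sc:.2f}  EGC {snr_egc:.2f}  MRC {snr_mrc:.2f}")
    ```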

    Quantum enhancement of randomness distribution

    Raul Garcia-Patron, William Matthews, Andreas Winter
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    The capability of a given channel to communicate information is, a priori,
    distinct from its capability to distribute shared randomness. In this article
    we define randomness distribution capacities of quantum channels assisted by
    forward, back, or two-way classical communication and compare these to the
    corresponding communication capacities. With forward assistance or no
    assistance, we find that they are equal. We establish the mutual information of
    the channel as an upper bound on the two-way assisted randomness distribution
    capacity. This implies that all of the capacities are equal for
    classical-quantum channels. On the other hand, we show that the back-assisted
    randomness distribution capacity of a quantum-classical channel is equal to
    its mutual information. This is often strictly greater than the back-assisted
    communication capacity. We give an explicit example of such a separation where
    the randomness distribution protocol is noiseless.

    New Algorithms for Maximizing Cellular Wireless Network Energy Efficiency

    Kemal Davaslioglu, Cemil Can Coskun, Ender Ayanoglu
    Comments: Information Theory and Applications (ITA) Workshop 2016
    Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

    In this paper, we aim to maximize the energy efficiency of cellular wireless
    networks. Specifically, we address the power allocation problem in multi-cell
    multi-carrier systems. Considering realistic base station power consumption
    models, we formulate a network-wide energy efficiency maximization problem.
    Using tools from fractional programming, we cast this problem in the framework
    of bi-criterion optimization where rate maximization and power minimization are
    weighted accordingly. An interference pricing mechanism is applied to reduce
    intercell interference and achieve higher network performance. We decompose
    the main problem into subproblems via dual decomposition. These subproblems
    are solved independently per sector using limited information exchange
    between base stations. We first derive our expressions and present
    algorithms for single-tier networks. Then, we extend our analysis to
    two-tier networks where picocell base stations are deployed to improve the
    network performance and reduce the link distances. Lastly, we extend our
    framework to include quality-of-service constraints. We obtain closed-form
    expressions for the power level updates, which are determined by the
    multi-level water-filling algorithm, or, as it is sometimes called, the
    modified water-filling algorithm. Based on our simulation results, we
    demonstrate that the proposed algorithms can outperform the benchmark
    approaches in terms of energy efficiency by a factor of 2.7.
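
    For reference, the classic single-level water-filling update takes the
    form $p_k = \max(0, \mu - 1/g_k)$, with the water level $\mu$ set by the
    power budget; the sketch below implements that generic rule only, not the
    paper's multi-level variant with interference prices and QoS constraints.

    ```python
    import numpy as np

    def waterfilling(gains: np.ndarray, p_total: float) -> np.ndarray:
        """Water-filling power allocation: p_k = max(0, mu - 1/g_k), with
        the water level mu chosen so that the powers sum to p_total."""
        inv = np.sort(1.0 / gains)
        for k in range(len(gains), 0, -1):      # try k active subcarriers
            mu = (p_total + inv[:k].sum()) / k  # candidate water level
            if mu > inv[k - 1]:                 # all k powers positive
                break
        return np.maximum(0.0, mu - 1.0 / gains)

    gains = np.array([2.0, 1.0, 0.5, 0.1])  # per-subcarrier channel gains
    p = waterfilling(gains, p_total=4.0)
    print(p, p.sum())  # [2.0, 1.5, 0.5, 0.0], sums to the budget 4.0
    ```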

    Uplink-Based Framework for Control Plane Applications in 5G mmWave Cellular Networks

    Marco Giordani, Marco Mezzavilla, Sundeep Rangan, Michele Zorzi
    Comments: Submitted for publication in IEEE Transactions on Wireless Communications (TWC)
    Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

    The millimeter wave (mmWave) frequencies offer the potential of orders of
    magnitude increases in capacity for next-generation cellular systems. However,
    links in mmWave networks are susceptible to blockage and may suffer from rapid
    variations in quality. Connectivity to multiple cells – in the mmWave and in
    the traditional frequencies – is considered essential for robust connectivity.
    One of the challenges in supporting multi-connectivity in mmWaves is the
    requirement for the network to track the direction of each link in addition to
    its power and timing. To address this challenge, this paper proposes a novel
    uplink measurement system based on (i) the UE transmitting sounding signals in
    directions that sweep the angular space, (ii) the mmWave cells measuring the
    instantaneous received signal strength along with its variance to capture the
    dynamics and the reliability of a channel/direction and, finally, (iii) a
    centralized controller making scheduling decisions based on the mmWave cell
    reports and transmitting the decisions either via a mmWave cell or a
    conventional LTE cell (when the mmWave paths are not available). We argue
    that the proposed scheme enables fair and robust cell selection, in addition
    to efficient and periodic tracking of the user, in the presence of the
    channel variability expected at mmWave frequencies.
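
    A toy version of the controller's decision step might look as follows:
    given per-cell, per-direction reports of mean received signal strength and
    its variance, pick the link that best trades strength against variability.
    The report table, the penalty weight kappa, and the selection rule are all
    our illustrative assumptions, not the paper's scheduler.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    # Hypothetical report table gathered during the UE's angular sweep:
    # mean received signal strength and its variance per (cell, direction).
    n_cells, n_dirs = 3, 8
    mean_rss = rng.uniform(-110, -70, size=(n_cells, n_dirs))  # dBm
    var_rss = rng.uniform(0, 25, size=(n_cells, n_dirs))       # dB^2

    # Prefer strong links, penalize unreliable (high-variance) ones.
    kappa = 0.5  # illustrative reliability penalty, not from the paper
    score = mean_rss - kappa * var_rss
    cell, direction = np.unravel_index(np.argmax(score), score.shape)
    print(f"schedule UE on cell {cell}, beam direction {direction}")
    ```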

    A New Perspective on Randomized Gossip Algorithms

    Nicolas Loizou, Peter Richtárik
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT); Optimization and Control (math.OC)

    In this short note we propose a new approach for the design and analysis of
    randomized gossip algorithms which can be used to solve the average consensus
    problem. We show that the Randomized Block Kaczmarz (RBK) method – a method
    for solving linear systems – works as a gossip algorithm when applied to a
    special system encoding the underlying network. The famous pairwise gossip
    algorithm
    arises as a special case. Subsequently, we reveal a hidden duality of
    randomized gossip algorithms, with the dual iterative process maintaining a set
    of numbers attached to the edges as opposed to nodes of the network. We prove
    that RBK obtains a superlinear speedup in the size of the block, and
    demonstrate this effect through experiments.
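
    The pairwise special case is easy to state in code: repeatedly pick a
    random edge and replace the two endpoint values by their average, which
    preserves the network mean and drives all values to consensus. The ring
    topology and iteration count below are arbitrary choices for illustration.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    # A ring network on 6 nodes, each holding a private initial value.
    edges = [(i, (i + 1) % 6) for i in range(6)]
    x = rng.uniform(0, 10, size=6)
    target = x.mean()  # pairwise averaging preserves this mean

    # Pairwise gossip: average the endpoints of a uniformly random edge.
    for _ in range(500):
        i, j = edges[rng.integers(len(edges))]
        x[i] = x[j] = (x[i] + x[j]) / 2

    print(f"consensus ~ {x.mean():.4f}, target {target:.4f}, "
          f"spread {np.ptp(x):.2e}")
    ```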



