
    arXiv Paper Daily: Fri, 24 Feb 2017

    Published by 我爱机器学习 (52ml.net) on 2017-02-24 00:00:00

    Neural and Evolutionary Computing

    Bidirectional Backpropagation: Towards Biologically Plausible Error Signal Transmission in Neural Networks

    Hongyin Luo, Jie Fu, James Glass
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    The back-propagation (BP) algorithm has been considered the de facto method
    for training deep neural networks. It back-propagates errors from the output
    layer to the hidden layers in an exact manner using feedforward weights. In
    this work, we propose a more biologically plausible neural architecture informed
    by biological findings. Specifically, we propose two bidirectional learning
    algorithms with two sets of trainable weights. Preliminary results show that,
    among asymmetric error-signal-passing methods, our models perform best on the
    MNIST and CIFAR10 datasets, and their performance is closer to that of BP.
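
    The abstract does not spell out the paper's update rules, so the following is only a
    generic numpy sketch of the underlying idea of asymmetric error transport: errors are
    sent to the hidden layer through a second, separately trainable weight matrix B2
    rather than through the transpose of the forward weights. Layer sizes, the tanh
    nonlinearity, and the rule used to update B2 are all assumptions made for
    illustration, not the authors' method.

        import numpy as np

        rng = np.random.default_rng(0)
        n_in, n_hid, n_out, lr = 784, 128, 10, 0.1

        W1 = rng.normal(0, 0.01, (n_in, n_hid))    # forward weights, layer 1
        W2 = rng.normal(0, 0.01, (n_hid, n_out))   # forward weights, layer 2
        B2 = rng.normal(0, 0.01, (n_out, n_hid))   # backward weights, trained separately

        def train_step(x, y_onehot):
            """One update on a single example x (n_in,) with one-hot target (n_out,)."""
            global W1, W2, B2
            h = np.tanh(x @ W1)                    # forward pass
            y = h @ W2
            e_out = y - y_onehot                   # output-layer error
            e_hid = (e_out @ B2) * (1 - h ** 2)    # error transported through B2, not W2.T
            W2 -= lr * np.outer(h, e_out)          # delta rule at the top layer
            W1 -= lr * np.outer(x, e_hid)          # hidden update driven by transported error
            B2 -= lr * np.outer(e_out, e_hid)      # one plausible rule for B2 (assumption)
            return float((e_out ** 2).sum())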


    Computer Vision and Pattern Recognition

    k-Means Clustering and Ensemble of Regressions: An Algorithm for the ISIC 2017 Skin Lesion Segmentation Challenge

    David Alvarez, Monica Iglesias
    Comments: Abstract submitted to arXiv as prerequisite to participate in the ISIC 2017 Skin Lesion Segmentation Challenge
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This abstract briefly describes a segmentation algorithm developed for the
    ISIC 2017 Skin Lesion Detection Competition hosted at [ref]. The objective of
    the competition is to perform a segmentation (in the form of a binary mask
    image) of skin lesions in dermoscopic images as close as possible to a
    segmentation performed by trained clinicians, which is taken as ground truth.
    This project only takes part in the segmentation phase of the challenge. The
    other phases of the competition (feature extraction and lesion identification)
    are not considered.

    The proposed algorithm consists of 4 steps: (1) lesion image preprocessing,
    (2) image segmentation using k-means clustering of pixel colors, (3)
    calculation of a set of features describing the properties of each segmented
    region, and (4) calculation of a final score for each region, representing the
    likelihood of corresponding to a suitable lesion segmentation. The scores in
    step (4) are obtained by averaging the results of 2 different regression models
    that take the features of each region as input. Before using the algorithm, these
    regression models must be trained using the training set of images and ground
    truth masks provided by the Competition. Steps 2 to 4 are repeated with an
    increasing number of clusters (and therefore the image is segmented into more
    regions) until there is no further improvement of the calculated scores.
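
    As a minimal illustration of step (2), the sketch below clusters pixel colors with
    k-means and returns one candidate binary mask per cluster; the use of scikit-learn,
    the raw RGB feature space, and the commented outer loop with a hypothetical
    score_region() function are illustrative assumptions, not the authors' exact pipeline.

        import numpy as np
        from sklearn.cluster import KMeans

        def kmeans_segment(image, n_clusters=2):
            """Cluster pixel colors of an H x W x 3 image; return one mask per cluster."""
            h, w, c = image.shape
            labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0) \
                .fit_predict(image.reshape(-1, c)).reshape(h, w)
            return [labels == k for k in range(n_clusters)]

        # Outer loop of steps (2)-(4): increase the number of clusters until the best
        # region score stops improving (score_region is a hypothetical scorer).
        # best = -np.inf
        # for k in range(2, 10):
        #     score = max(score_region(m) for m in kmeans_segment(img, k))
        #     if score <= best:
        #         break
        #     best = score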

    ViP-CNN: A Visual Phrase Reasoning Convolutional Neural Network for Visual Relationship Detection

    Yikang Li, Wanli Ouyang, Xiaogang Wang
    Comments: 10 pages, 6 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    As the intermediate level task connecting image captioning and object
    detection, visual relationship detection started to catch researchers’
    attention because of its descriptive power and clear structure. It localizes
    the objects and captures their interactions with a subject-predicate-object
    triplet, e.g. person-ride-horse. In this paper, the visual relationship is
    considered as a phrase with three components. So we formulate the visual
    relationship detection as three inter-connected recognition problems and
    propose a Visual Phrase reasoning Convolutional Neural Network (ViP-CNN) to
    address them simultaneously. In ViP-CNN, we present a Visual Phrase Reasoning
    Structure (VPRS) to set up the connection among the relationship components and
    help the model consider the three problems jointly. Corresponding non-maximum
    suppression method and model training strategy are also proposed. Experimental
    results show that our ViP-CNN outperforms the state-of-the-art method in both speed
    and accuracy. We further pretrain our model on our cleansed Visual Genome
    Relationship dataset, which is found to perform better than pretraining on
    ImageNet for this task.

    Analyzing Learned Convnet Features with Dirichlet Process Gaussian Mixture Models

    David Malmgren-Hansen, Allan Aasbjerg Nielsen, Rasmus Engholm
    Comments: Presented at NIPS 2016 Workshop: Practical Bayesian Nonparametrics
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Convolutional Neural Networks (Convnets) have achieved good results in a
    range of computer vision tasks in recent years. Though given a lot of
    attention, visualizing the learned representations to interpret Convnets still
    remains a challenging task. The high dimensionality of internal representations
    and the high abstraction of deep layers are the main challenges when
    visualizing Convnet functionality. We present in this paper a technique based
    on clustering internal Convnet representations with a Dirichlet Process
    Gaussian Mixture Model, for visualization of learned representations in
    Convnets. Our method copes with the high dimensionality of a Convnet by
    clustering representations across all nodes of each layer. We will discuss how
    this application is useful when considering transfer learning, i.e.
    transferring a model trained on one dataset to solve a task on a different one.
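
    A minimal sketch of the core step, using scikit-learn's truncated Dirichlet Process
    Gaussian Mixture (BayesianGaussianMixture with a Dirichlet-process prior) to cluster
    activations collected from one layer; the matrix shape, the diagonal covariance, and
    the use of scikit-learn are assumptions for illustration, not the authors'
    implementation.

        import numpy as np
        from sklearn.mixture import BayesianGaussianMixture

        # activations: (n_examples, n_nodes) responses from one Convnet layer,
        # e.g. spatially pooled feature maps; random data stands in here.
        activations = np.random.rand(500, 64)

        dpgmm = BayesianGaussianMixture(
            n_components=20,                                    # truncation level
            weight_concentration_prior_type="dirichlet_process",
            covariance_type="diag",
            max_iter=500,
            random_state=0,
        )
        cluster_ids = dpgmm.fit_predict(activations)
        # The effective number of clusters is inferred from the data: components with
        # negligible weight are simply left unused.
        print(np.unique(cluster_ids).size, "clusters in use")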

    Robust and fully automated segmentation of mandible from CT scans

    Neslisah Torosdagli, Denise K. Liberton, Payal Verma, Murat Sincan, Janice Lee, Sumanta Pattanaik, Ulas Bagci
    Comments: 4 pages, 5 figures, IEEE International Symposium on Biomedical Imaging (ISBI) 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Mandible bone segmentation from computed tomography (CT) scans is challenging
    due to mandible’s structural irregularities, complex shape patterns, and lack
    of contrast in joints. Furthermore, connections of teeth to mandible and
    mandible to remaining parts of the skull make it extremely difficult to
    identify mandible boundary automatically. This study addresses these challenges
    by proposing a novel framework where we define the segmentation as two
    complementary tasks: recognition and delineation. For recognition, we use
    random forest regression to localize mandible in 3D. For delineation, we
    propose to use 3D gradient-based fuzzy connectedness (FC) image segmentation
    algorithm, operating on the recognized mandible sub-volume. Despite heavy CT
    artifacts and dental fillings, which constitute half of the CT image data in our
    experiments, we have achieved highly accurate detection and delineation
    results. Specifically, we found a detection accuracy of more than 96% (measured
    by union of intersection (UoI)), a delineation accuracy of 91% (measured by Dice
    similarity coefficient), and less than 1 mm of shape mismatch (Hausdorff
    distance).

    Learning Chained Deep Features and Classifiers for Cascade in Object Detection

    Wanli Ouyang, Ku Wang, Xin Zhu, Xiaogang Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Cascade is a widely used approach that rejects obvious negative samples at
    early stages for learning better classifiers and faster inference. This paper
    presents chained cascade network (CC-Net). In this CC-Net, the cascaded
    classifier at a stage is aided by the classification scores in previous stages.
    Feature chaining is further proposed so that the feature learning for the
    current cascade stage uses the features in previous stages as the prior
    information. The chained ConvNet features and classifiers of multiple stages
    are jointly learned in an end-to-end network. In this way, features and
    classifiers at latter stages handle more difficult samples with the help of
    features and classifiers in previous stages. It yields consistent boosts in
    detection performance on benchmarks like PASCAL VOC 2007 and ImageNet. Combined
    with better region proposal, CC-Net leads to state-of-the-art result of 81.1%
    mAP on PASCAL VOC 2007.

    Increasing Deep Learning Melanoma Classification by Classical And Expert Knowledge Based Image Transforms

    Cristina Nader Vasconcelos, Bárbara Nader Vasconcelos
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Skin cancer is a major public health problem, as it is the most common type of
    cancer and represents more than half of cancer diagnoses worldwide. Early
    detection influences the outcome of the disease and motivates our work. We
    obtain state-of-the-art results for the ISBI 2016 Melanoma Classification
    Challenge (named Skin Lesion Analysis towards Melanoma Detection), facing the
    peculiarities of dealing with such a small, unbalanced, biological database.
    For that, we explore committees of Convolutional Neural Networks trained over
    the ISBI challenge training dataset, artificially augmented by both classical
    image processing transforms and image warping guided by specialist knowledge
    about the lesion axis, improving the final classifier's invariance to common
    melanoma variations.
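
    A minimal sketch of the "classical image processing transforms" half of that
    augmentation (random flips and small rotations); the specific transform set is an
    assumption, and the expert-guided warping along the lesion axis is not reproduced
    here because it needs the specialist annotations the authors describe.

        import numpy as np
        from scipy import ndimage

        def classical_augment(image, rng):
            """Return one randomly transformed copy of an H x W x 3 lesion image."""
            out = image
            if rng.random() < 0.5:
                out = np.fliplr(out)                 # horizontal flip
            if rng.random() < 0.5:
                out = np.flipud(out)                 # vertical flip
            angle = rng.uniform(-30, 30)             # small in-plane rotation
            return ndimage.rotate(out, angle, axes=(0, 1), reshape=False, mode="reflect")

        rng = np.random.default_rng(0)
        augmented = [classical_augment(np.zeros((224, 224, 3)), rng) for _ in range(8)]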

    CT Image Denoising with Perceptive Deep Neural Networks

    Qingsong Yang, Pingkun Yan, Mannudeep K. Kalra, Ge Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Increasing use of CT in modern medical practice has raised concerns over
    associated radiation dose. Reduction of radiation dose associated with CT can
    increase noise and artifacts, which can adversely affect diagnostic confidence.
    Denoising of low-dose CT images on the other hand can help improve diagnostic
    confidence, which however is a challenging problem due to its ill-posed nature,
    since one noisy image patch may correspond to many different output patches. In
    the past decade, machine learning based approaches have made quite impressive
    progress in this direction. However, most of those methods, including the
    recently popularized deep learning techniques, aim for minimizing
    mean-squared-error (MSE) between a denoised CT image and the ground truth,
    which results in losing important structural details due to over-smoothing,
    although the PSNR based performance measure looks great. In this work, we
    introduce a new perceptual similarity measure as the objective function for a
    deep convolutional neural network to facilitate CT image denoising. Instead of
    directly computing MSE for pixel-to-pixel intensity loss, we compare the
    perceptual features of a denoised output against those of the ground truth in a
    feature space. Therefore, our proposed method is capable of not only reducing
    the image noise levels, but also keeping the critical structural information at
    the same time. Promising results have been obtained in our experiments with a
    large number of CT images.
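
    A minimal PyTorch sketch of a perceptual loss of this kind: the denoised output and
    the ground truth are compared in the feature space of a fixed pretrained network
    instead of pixel space. VGG16 is used here purely as a stand-in feature extractor
    (the abstract does not name the feature network), and input normalization is omitted.

        import torch
        import torch.nn as nn
        from torchvision.models import vgg16

        class PerceptualLoss(nn.Module):
            """MSE between feature maps of a frozen feature extractor."""
            def __init__(self):
                super().__init__()
                self.features = vgg16(pretrained=True).features[:16].eval()
                for p in self.features.parameters():
                    p.requires_grad = False
                self.mse = nn.MSELoss()

            def forward(self, denoised, target):
                # CT slices are single-channel; repeat to 3 channels for the extractor.
                d = denoised.repeat(1, 3, 1, 1)
                t = target.repeat(1, 3, 1, 1)
                return self.mse(self.features(d), self.features(t))

        # loss = PerceptualLoss()(denoiser(low_dose), normal_dose)  # hypothetical tensors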

    Synthesising Dynamic Textures using Convolutional Neural Networks

    Christina M. Funke, Leon A. Gatys, Alexander S. Ecker, Matthias Bethge
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Here we present a parametric model for dynamic textures. The model is based
    on spatiotemporal summary statistics computed from the feature representations
    of a Convolutional Neural Network (CNN) trained on object recognition. We
    demonstrate how the model can be used to synthesise new samples of dynamic
    textures and to predict motion in simple movies.


    Artificial Intelligence

    A Probabilistic Framework for Location Inference from Social Media

    Yujie Qian, Jie Tang, Zhilin Yang, Binxuan Huang, Wei Wei, Kathleen M. Carley
    Subjects: Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)

    We study the extent to which we can infer users’ geographical locations from
    social media. Location inference from social media can benefit many
    applications, such as disaster management, targeted advertising, and news
    content tailoring. In recent years, a number of algorithms have been proposed
    for identifying user locations on social media platforms such as Twitter and
    Facebook from message contents, friend networks, and interactions between
    users. In this paper, we propose a novel probabilistic model based on factor
    graphs for location inference that offers several unique advantages for this
    task. First, the model generalizes previous methods by incorporating content,
    network, and deep features learned from social context. The model is also
    flexible enough to support both supervised learning and semi-supervised
    learning. Second, we explore several learning algorithms for the proposed
    model, and present a Two-chain Metropolis-Hastings (MH+) algorithm, which
    improves the inference accuracy. Third, we validate the proposed model on three
    different genres of data – Twitter, Weibo, and Facebook – and demonstrate that
    the proposed model can substantially improve the inference accuracy (+3.3-18.5%
    by F1-score) over that of several state-of-the-art methods.

    Ontologies in System Engineering: a Field Report

    Marco Menapace, Armando Tacchella
    Subjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

    In recent years, ontologies have enjoyed growing popularity outside specialized
    AI communities. System engineering is no exception to this trend, with
    ontologies being proposed as a basis for several tasks in complex industrial
    implements, including system design, monitoring and diagnosis. In this paper,
    we consider four different contributions to system engineering wherein
    ontologies are instrumental to provide enhancements over traditional ad-hoc
    techniques. For each application, we briefly report the methodologies, the
    tools and the results obtained, with the goal of providing an assessment of the
    merits and limits of ontologies in such domains.

    A DIKW Paradigm to Cognitive Engineering

    Amit Kumar Mishra
    Subjects: Artificial Intelligence (cs.AI)

    Though the word cognitive has a wide range of meanings, we define cognitive
    engineering as learning from the brain to bolster engineering solutions. However,
    giving an achievable framework to the process towards this has been a difficult
    task. In this work we take the classic data-information-knowledge-wisdom (DIKW)
    framework to set some achievable goals and sub-goals towards cognitive
    engineering. A layered framework like DIKW aligns nicely with the layered
    structure of the pre-frontal cortex. And breaking the task into sub-tasks based on
    the layers also makes it easier to start developmental endeavours towards
    achieving the final goal of a brain-inspired system.

    Theoretical and Experimental Analysis of the Canadian Traveler Problem

    Doron Zarchy
    Subjects: Artificial Intelligence (cs.AI)

    Devising an optimal strategy for navigation in a partially observable
    environment is one of the key objectives in AI. One of the problems in this
    context is the Canadian Traveler Problem (CTP). CTP is a navigation problem
    where an agent is tasked to travel from a source to a target in a partially
    observable weighted graph, whose edges might be blocked with a certain
    probability, and such a blockage is observed only upon reaching one of the
    edge's endpoints. The goal is to find a strategy that minimizes the
    expected travel cost. The problem is known to be #P-hard. In this work we
    study the CTP theoretically and empirically. First, we study Dep-CTP, a CTP
    variant we introduce which assumes dependencies between the edges' status. We
    show that Dep-CTP is intractable, and we further analyze two of its subclasses
    on disjoint-path graphs. Second, we develop a general algorithm, Gen-PAO, that
    optimally solves the CTP. Gen-PAO is also capable of solving two other types of
    CTP called Sensing-CTP and Expensive-Edges CTP. Since the CTP is intractable,
    Gen-PAO uses pruning methods to reduce the search space for the optimal
    solution. We also define some variants of Gen-PAO, compare their performance
    and show some benefits of Gen-PAO over existing work.

    A Realistic Dataset for the Smart Home Device Scheduling Problem for DCOPs

    William Kluegel, Muhammad Aamir Iqbal, Ferdinando Fioretto, William Yeoh, Enrico Pontelli
    Comments: 15 pages, OPTMAS17
    Subjects: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

    The field of Distributed Constraint Optimization has gained momentum in
    recent years thanks to its ability to address various applications related to
    multi-agent cooperation. While techniques to solve Distributed Constraint
    Optimization Problems (DCOPs) are abundant and have matured substantially since
    the field's inception, the number of realistic DCOP applications and benchmarks
    used to assess the performance of DCOP algorithms is lagging behind. To address
    this gap we (i) introduce the Smart Home Device Scheduling (SHDS) problem,
    which describes the problem of coordinating smart device schedules across
    multiple homes as a multi-agent system, (ii) detail the physical models adopted
    to simulate smart sensors, smart actuators, and home environments, and (iii)
    introduce a realistic DCOP benchmark for SHDS problems.

    Proactive Resource Management in LTE-U Systems: A Deep Learning Perspective

    Ursula Challita, Li Dong, Walid Saad
    Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)

    LTE in unlicensed spectrum (LTE-U) is a promising approach to overcome the
    wireless spectrum scarcity. However, to reap the benefits of LTE-U, a fair
    coexistence mechanism with other incumbent WiFi deployments is required. In
    this paper, a novel deep learning approach is proposed for modeling the
    resource allocation problem of LTE-U small base stations (SBSs). The proposed
    approach enables multiple SBSs to proactively perform dynamic channel
    selection, carrier aggregation, and fractional spectrum access while
    guaranteeing fairness with existing WiFi networks and other LTE-U operators.
    Adopting a proactive coexistence mechanism enables future delay-intolerant
    LTE-U data demands to be served within a given prediction window ahead of their
    actual arrival time thus avoiding the underutilization of the unlicensed
    spectrum during off-peak hours while maximizing the total served LTE-U traffic
    load. To this end, a noncooperative game model is formulated in which SBSs are
    modeled as Homo Egualis agents that aim at predicting a sequence of future
    actions and thus achieving long-term equal weighted fairness with WLAN and
    other LTE-U operators over a given time horizon. The proposed deep learning
    algorithm is then shown to reach a mixed-strategy Nash equilibrium (NE), when
    it converges. Simulation results using real data traces show that the proposed
    scheme can yield up to 28% and 11% gains over a conventional reactive approach
    and a proportional fair coexistence mechanism, respectively. The results also
    show that the proposed framework prevents WiFi performance degradation for a
    densely deployed LTE-U network.

    Transferring Face Verification Nets To Pain and Expression Regression

    Feng Wang, Xiang Xiang, Chang Liu, Trac D. Tran, Austin Reiter, Gregory D. Hager, Harry Quon, Jian Cheng, Alan L. Yuille
    Comments: 5 pages, 3 figure; submitted to IEEE ICIP 2017 which does not perform blind reviews
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Learning (cs.LG); Multimedia (cs.MM)

    Limited annotated data is available for the research of estimating facial
    expression intensities, which makes the training of deep networks for automated
    expression assessment very challenging. Fortunately, fine-tuning from a
    data-extensive pre-trained domain such as face verification can alleviate the
    problem. In this paper, we propose a transferred network that fine-tunes a
    state-of-the-art face verification network using expression-intensity labeled
    data with a regression layer. In this way, the expression regression task can
    benefit from the rich feature representations trained on a huge amount of data
    for face verification. The proposed transferred deep regressor is applied to
    estimating the intensity of facial action units (2017 EmotionNet Challenge) and
    in particular to pain intensity estimation (UNBS-McMaster Shoulder-Pain dataset).
    It won second place in the challenge and achieves state-of-the-art
    performance on the Shoulder-Pain dataset. Particularly for Shoulder-Pain, with
    its imbalance across different pain levels, a new weighted evaluation metric is
    proposed.
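
    A minimal PyTorch sketch of the transfer recipe described above: take a pretrained
    backbone, replace its final classification layer with a single-output regression
    head, and fine-tune with an L2 loss on intensity labels. The ResNet-18 backbone is a
    generic stand-in, not the authors' face verification network.

        import torch
        import torch.nn as nn
        from torchvision.models import resnet18

        backbone = resnet18(pretrained=True)                   # stand-in pretrained net
        backbone.fc = nn.Linear(backbone.fc.in_features, 1)    # regression head

        optimizer = torch.optim.SGD(backbone.parameters(), lr=1e-3, momentum=0.9)
        criterion = nn.MSELoss()

        def train_step(images, intensities):
            """images: (B, 3, H, W); intensities: (B,) expression/pain intensity labels."""
            optimizer.zero_grad()
            pred = backbone(images).squeeze(1)
            loss = criterion(pred, intensities)
            loss.backward()
            optimizer.step()
            return loss.item()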


    Information Retrieval

    Time-Series Adaptive Estimation of Vaccination Uptake Using Web Search Queries

    Niels Dalum Hansen, Kåre Mølbak, Ingemar J. Cox, Christina Lioma
    Subjects: Information Retrieval (cs.IR); Quantitative Methods (q-bio.QM); Applications (stat.AP)

    Estimating vaccination uptake is an integral part of ensuring public health.
    It was recently shown that vaccination uptake can be estimated automatically
    from web data, instead of slowly collected clinical records or population
    surveys. All prior work in this area assumes that features of vaccination
    uptake collected from the web are temporally regular. We present the first ever
    method to remove this assumption from vaccination uptake estimation: our method
    dynamically adapts to temporal fluctuations in time series web data used to
    estimate vaccination uptake. We show that our method outperforms the state of
    the art, including competitive baselines that use not only web data but also
    curated clinical data. This performance improvement is more pronounced for
    vaccines whose uptake has been irregular due to negative media attention (HPV-1
    and HPV-2) or problems in vaccine supply (DiTeKiPol), and for vaccines targeted
    at children around 12 years old (whose vaccination is more irregular compared to
    that of younger children).

    Stability of Topic Modeling via Matrix Factorization

    Mark Belford, Brian Mac Namee, Derek Greene
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    Topic models can provide us with an insight into the underlying latent
    structure of a large corpus of documents. A range of methods have been proposed
    in the literature, including probabilistic topic models and techniques based on
    matrix factorization. However, in both cases, standard implementations rely on
    stochastic elements in their initialization phase, which can potentially lead
    to different results being generated on the same corpus when using the same
    parameter values. This corresponds to the concept of “instability” which has
    previously been studied in the context of (k)-means clustering. In many
    applications of topic modeling, this problem of instability is not considered
    and topic models are treated as being definitive, even though the results may
    change considerably if the initialization process is altered. In this paper we
    demonstrate the inherent instability of popular topic modeling approaches,
    using a number of new measures to assess stability. To address this issue in
    the context of matrix factorization for topic modeling, we propose the use of
    ensemble learning strategies. Based on experiments performed on annotated text
    corpora, we show that a K-Fold ensemble strategy, combining both ensembles and
    structured initialization, can significantly reduce instability, while
    simultaneously yielding more accurate topic models.
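
    A minimal sketch of the instability being described, using scikit-learn's NMF: the
    same corpus and parameters but different random initializations can produce different
    topic descriptors (on a toy corpus the effect may be small; the authors' stability
    measures and K-Fold ensemble are not reproduced here).

        import numpy as np
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.decomposition import NMF

        docs = ["the cat sat on the mat", "dogs and cats are pets",
                "stock markets fell sharply", "investors sold shares today"]
        vec = TfidfVectorizer()
        X = vec.fit_transform(docs)
        terms = np.array(vec.get_feature_names_out())

        def top_terms(seed, k=2, n_top=3):
            model = NMF(n_components=k, init="random", random_state=seed, max_iter=500)
            H = model.fit(X).components_                    # topic-term matrix
            return [terms[np.argsort(row)[::-1][:n_top]].tolist() for row in H]

        for seed in (0, 1, 2):                              # vary only the initialization
            print(seed, top_terms(seed))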

    A Neural Attention Model for Categorizing Patient Safety Events

    Arman Cohan, Allan Fong, Nazli Goharian, Raj Ratwani
    Comments: ECIR 2017
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Medical errors are leading causes of death in the US and as such, prevention
    of these errors is paramount to promoting health care. Patient Safety Event
    reports are narratives describing potential adverse events to the patients and
    are important in identifying and preventing medical errors. We present a neural
    network architecture for identifying the type of safety event, which is the
    first step in understanding these narratives. Our proposed model is based on a
    soft neural attention model to improve the effectiveness of encoding long
    sequences. Empirical results on two large-scale real-world datasets of patient
    safety reports demonstrate the effectiveness of our method with significant
    improvements over existing methods.

    Scalable Inference for Nested Chinese Restaurant Process Topic Models

    Jianfei Chen, Jun Zhu, Jie Lu, Shixia Liu
    Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR); Learning (cs.LG)

    Nested Chinese Restaurant Process (nCRP) topic models are powerful
    nonparametric Bayesian methods to extract a topic hierarchy from a given text
    corpus, where the hierarchical structure is automatically determined by the
    data. Hierarchical Latent Dirichlet Allocation (hLDA) is a popular instance of
    nCRP topic models. However, hLDA has only been evaluated at small scale,
    because the existing collapsed Gibbs sampling and instantiated weight
    variational inference algorithms either are not scalable or sacrifice inference
    quality with mean-field assumptions. Moreover, an efficient distributed
    implementation of the data structures, such as dynamically growing count
    matrices and trees, is challenging.

    In this paper, we propose a novel partially collapsed Gibbs sampling (PCGS)
    algorithm, which combines the advantages of collapsed and instantiated weight
    algorithms to achieve good scalability as well as high model quality. An
    initialization strategy is presented to further improve the model quality.
    Finally, we propose an efficient distributed implementation of PCGS through
    vectorization, pre-processing, and a careful design of the concurrent data
    structures and communication strategy.

    Empirical studies show that our algorithm is 111 times more efficient than
    the previous open-source implementation for hLDA, with comparable or even
    better model quality. Our distributed implementation can extract 1,722 topics
    from a 131-million-document corpus with 28 billion tokens, which is 4-5 orders
    of magnitude larger than the previous largest corpus, with 50 machines in 7
    hours.


    Computation and Language

    Inherent Biases of Recurrent Neural Networks for Phonological Assimilation and Dissimilation

    Amanda Doucette
    Subjects: Computation and Language (cs.CL)

    A recurrent neural network model of phonological pattern learning is
    proposed. The model is a relatively simple neural network with one recurrent
    layer, and displays biases in learning that mimic observed biases in human
    learning. Single-feature patterns are learned faster than two-feature patterns,
    and vowel or consonant-only patterns are learned faster than patterns involving
    vowels and consonants, mimicking the results of laboratory learning
    experiments. In non-recurrent models, capturing these biases requires the use
    of alpha features or some other representation of repeated features, but with a
    recurrent neural network, these elaborations are not necessary.

    Are Emojis Predictable?

    Francesco Barbieri, Miguel Ballesteros, Horacio Saggion
    Comments: To appear at EACL 2017
    Subjects: Computation and Language (cs.CL)

    Emojis are ideograms which are naturally combined with plain text to visually
    complement or condense the meaning of a message. Despite being widely used in
    social media, their underlying semantics have received little attention from a
    Natural Language Processing standpoint. In this paper, we investigate the
    relation between words and emojis, studying the novel task of predicting which
    emojis are evoked by text-based tweet messages. We train several models based
    on Long Short-Term Memory networks (LSTMs) in this task. Our experimental
    results show that our neural model outperforms two baselines as well as humans
    solving the same task, suggesting that computational models are able to better
    capture the underlying semantics of emojis.
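
    A minimal PyTorch sketch of an LSTM tweet classifier of the kind the abstract
    describes; the vocabulary size, embedding dimension, and number of candidate emojis
    are placeholders, not the paper's settings.

        import torch
        import torch.nn as nn

        class EmojiLSTM(nn.Module):
            def __init__(self, vocab_size=20000, emb_dim=100, hidden=128, n_emojis=20):
                super().__init__()
                self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
                self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
                self.out = nn.Linear(hidden, n_emojis)

            def forward(self, token_ids):
                # token_ids: (batch, seq_len) integer-encoded tweet tokens
                _, (h_n, _) = self.lstm(self.emb(token_ids))
                return self.out(h_n[-1])                    # logits over candidate emojis

        model = EmojiLSTM()
        logits = model(torch.randint(1, 20000, (4, 30)))    # 4 dummy tweets of 30 tokens
        predicted_emoji = logits.argmax(dim=1)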

    Utilizing Lexical Similarity for pivot translation involving resource-poor, related languages

    Anoop Kunchukuttan, Maulik Shah, Pradyot Prakash, Pushpak Bhattacharyya
    Comments: Submitted to ACL 2017, 10 pages, 9 tables
    Subjects: Computation and Language (cs.CL)

    We investigate the use of pivot languages for phrase-based statistical
    machine translation (PB-SMT) between related languages with limited parallel
    corpora. We show that subword-level pivot translation via a related pivot
    language is: (i) highly competitive with the best direct translation model and
    (ii) better than a pivot model which uses an unrelated pivot language, but has
    at its disposal large parallel corpora to build the source-pivot (S-P) and
    pivot-target (P-T) translation models. In contrast, pivot models trained at
    word and morpheme level are far inferior to their direct counterparts. We also
    show that using multiple related pivot languages can outperform a direct
    translation model. Thus, the use of subwords as translation units coupled with
    the use of multiple related pivot languages can compensate for the lack of a
    direct parallel corpus. Subword units make pivot models competitive by (i)
    utilizing lexical similarity to improve the underlying S-P and P-T translation
    models, and (ii) reducing loss of translation candidates during pivoting.

    LTSG: Latent Topical Skip-Gram for Mutually Learning Topic Model and Vector Representations

    Jarvan Law, Hankz Hankui Zhuo, Junhua He, Erhu Rong (Dept. of Computer Science, Sun Yat-Sen University, GuangZhou, China.)
    Subjects: Computation and Language (cs.CL)

    Topic models have been widely used in discovering latent topics which are
    shared across documents in text mining. Vector representations, word embeddings
    and topic embeddings, map words and topics into a low-dimensional and dense
    real-value vector space, which have obtained high performance in NLP tasks.
    However, most of the existing models assume that the results trained by one of
    them are perfectly correct and use them as prior knowledge for improving the
    other model. Some other models use information trained from an external large
    corpus to help improve a smaller corpus. In this paper, we aim to build an algorithm
    framework that makes topic models and vector representations mutually improve
    each other within the same corpus. An EM-style algorithm framework is employed
    to iteratively optimize both topic model and vector representations.
    Experimental results show that our model outperforms state-of-the-art methods on
    various NLP tasks.

    A Neural Attention Model for Categorizing Patient Safety Events

    Arman Cohan, Allan Fong, Nazli Goharian, Raj Ratwani
    Comments: ECIR 2017
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Medical errors are leading causes of death in the US and as such, prevention
    of these errors is paramount to promoting health care. Patient Safety Event
    reports are narratives describing potential adverse events to the patients and
    are important in identifying and preventing medical errors. We present a neural
    network architecture for identifying the type of safety event, which is the
    first step in understanding these narratives. Our proposed model is based on a
    soft neural attention model to improve the effectiveness of encoding long
    sequences. Empirical results on two large-scale real-world datasets of patient
    safety reports demonstrate the effectiveness of our method with significant
    improvements over existing methods.

    Pronunciation recognition of English phonemes /ə/, /æ/, /ɑː/ and /ʌ/ using Formants and Mel Frequency Cepstral Coefficients

    Keith Y. Patarroyo, Vladimir Vargas-Calderón
    Comments: 11 pages, pre-print version
    Subjects: Computation and Language (cs.CL); Sound (cs.SD)

    The Vocal Joystick Vowel Corpus, by Washington University, was used to study
    monophthongs pronounced by native English speakers. The objective of this study
    was to quantitatively measure the extent to which speech recognition methods
    can distinguish between similar-sounding vowels. In particular, the phonemes
    /ə/, /æ/, /ɑː/ and /ʌ/ were analysed. 748
    sound files from the corpus were used and subjected to Linear Predictive Coding
    (LPC) to compute their formants, and to the Mel Frequency Cepstral Coefficients
    (MFCC) algorithm to compute the cepstral coefficients. A Decision Tree
    Classifier was used to build a predictive model that learnt the patterns of the
    first two formants measured in the data set, as well as the patterns of the 13
    cepstral coefficients. An accuracy of 70% was achieved using formants for the
    mentioned phonemes. For the MFCC analysis, an accuracy of 52% was achieved and
    an accuracy of 71% when /ə/ was ignored. The results obtained show
    that the studied algorithms are far from mimicking the ability of human hearing
    to distinguish subtle differences in sounds.
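
    A minimal sketch of the MFCC half of such a pipeline: 13 cepstral coefficients per
    recording (averaged over frames, which is a simplifying assumption) fed to a decision
    tree; librosa is used here for illustration and the formant/LPC branch is not shown.

        import numpy as np
        import librosa
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.model_selection import train_test_split

        def mfcc_features(path, n_mfcc=13):
            y, sr = librosa.load(path, sr=None)
            mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (13, n_frames)
            return mfcc.mean(axis=1)                                 # one vector per file

        # paths, labels: lists of wav files and their vowel labels (hypothetical data).
        # X = np.stack([mfcc_features(p) for p in paths])
        # X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)
        # clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
        # print("accuracy:", clf.score(X_te, y_te))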

    Feature Generation for Robust Semantic Role Labeling

    Travis Wolfe, Mark Dredze, Benjamin Van Durme
    Subjects: Computation and Language (cs.CL)

    Hand-engineered feature sets are a well understood method for creating robust
    NLP models, but they require a lot of expertise and effort to create. In this
    work we describe how to automatically generate rich feature sets from simple
    units called featlets, requiring less engineering. Using information gain to
    guide the generation process, we train models which rival the state of the art
    on two standard Semantic Role Labeling datasets with almost no task or
    linguistic insight.

    Unsupervised Learning of Morphological Forests

    Jiaming Luo, Karthik Narasimhan, Regina Barzilay
    Comments: 12 pages, 5 figures, accepted by TACL 2017
    Subjects: Computation and Language (cs.CL)

    This paper focuses on unsupervised modeling of morphological families,
    collectively comprising a forest over the language vocabulary. This formulation
    enables us to capture edgewise properties reflecting single-step morphological
    derivations, along with global distributional properties of the entire forest.
    These global properties constrain the size of the affix set and encourage
    formation of tight morphological families. The resulting objective is solved
    using Integer Linear Programming (ILP) paired with contrastive estimation. We
    train the model by alternating between optimizing the local log-linear model
    and the global ILP objective. We evaluate our system on three tasks: root
    detection, clustering of morphological families and segmentation. Our
    experiments demonstrate that our model yields consistent gains in all three
    tasks compared with the best published results.

    Stability of Topic Modeling via Matrix Factorization

    Mark Belford, Brian Mac Namee, Derek Greene
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    Topic models can provide us with an insight into the underlying latent
    structure of a large corpus of documents. A range of methods have been proposed
    in the literature, including probabilistic topic models and techniques based on
    matrix factorization. However, in both cases, standard implementations rely on
    stochastic elements in their initialization phase, which can potentially lead
    to different results being generated on the same corpus when using the same
    parameter values. This corresponds to the concept of “instability” which has
    previously been studied in the context of (k)-means clustering. In many
    applications of topic modeling, this problem of instability is not considered
    and topic models are treated as being definitive, even though the results may
    change considerably if the initialization process is altered. In this paper we
    demonstrate the inherent instability of popular topic modeling approaches,
    using a number of new measures to assess stability. To address this issue in
    the context of matrix factorization for topic modeling, we propose the use of
    ensemble learning strategies. Based on experiments performed on annotated text
    corpora, we show that a K-Fold ensemble strategy, combining both ensembles and
    structured initialization, can significantly reduce instability, while
    simultaneously yielding more accurate topic models.


    Distributed, Parallel, and Cluster Computing

    Synchronizability of Communicating Finite State Machines is not Decidable

    Alain Finkel (LSV), Etienne Lozes (LSV)
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Formal Languages and Automata Theory (cs.FL)

    A system of communicating finite state machines is synchronizable if its send
    trace semantics, i.e. the set of sequences of sendings it can perform, is the
    same when its communications are FIFO asynchronous and when they are just
    rendezvous synchronizations. This property was claimed to be decidable in
    several conference and journal papers. In this paper, we show that
    synchronizability is actually undecidable. We show that synchronizability is
    decidable if the topology of communications is an oriented ring. We also show
    that, in this case, synchronizability implies the absence of unspecified
    receptions and orphan messages, and the channel-recognizability of the
    reachability set.

    First Experiences Optimizing Smith-Waterman on Intel's Knights Landing Processor

    Enzo Rucci, Carlos Garcia, Guillermo Botella, Armando De Giusti, Marcelo Naiouf, Manuel Prieto-Matias
    Comments: Submitted to Euro-Par 2017 Conference
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)

    The well-known Smith-Waterman (SW) algorithm is the most commonly used method
    for local sequence alignments. However, SW is very computationally demanding
    for large protein databases. There exist several implementations that take
    advantage of computing parallelization on many-cores, FPGAs or GPUs, in order
    to increase the alignment throughput. In this paper, we have explored SW
    acceleration on the Intel KNL processor. The novelty of this architecture
    requires the revision of previous programming and optimization techniques for
    many-core architectures. To the best of the authors' knowledge, this is the
    first KNL architecture assessment for the SW algorithm. Our evaluation, using
    the renowned Environmental NR database as benchmark, has shown that
    multi-threading and SIMD exploitation yields competitive performance (351 GCUPS) in comparison with
    other implementations.
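
    For reference, a plain (unoptimized) Smith-Waterman local-alignment score in Python;
    the KNL work is about multi-threading and vectorizing exactly this dynamic
    programming recurrence, none of which is reflected here, and the match/mismatch/gap
    scores are illustrative rather than the ones used in the paper.

        import numpy as np

        def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
            """Best local alignment score between sequences a and b, O(len(a)*len(b))."""
            H = np.zeros((len(a) + 1, len(b) + 1), dtype=np.int64)
            best = 0
            for i in range(1, len(a) + 1):
                for j in range(1, len(b) + 1):
                    diag = H[i - 1, j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
                    H[i, j] = max(0, diag, H[i - 1, j] + gap, H[i, j - 1] + gap)
                    best = max(best, H[i, j])
            return best

        print(smith_waterman_score("GGTTGACTA", "TGTTACGG"))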

    DyAdHyTM: A Low Overhead Dynamically Adaptive Hybrid Transactional Memory on Big Data Graphs

    Mohammad Qayum, Abdul-Hameed Badawy, Jeanine Cook
    Comments: 13 pages
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Big data is a buzzword used to describe massive volumes of data that provide
    opportunities for exploring new insights through data analytics. However, big
    data is mostly structured but can be semi-structured or unstructured. It is
    normally so large that it is not only difficult but also slow to process using
    traditional computing systems. One of the solutions is to format the data as
    graph data structures and process them on shared memory architecture to use
    fast and novel policies such as transactional memory. In most graph
    applications in big data type problems such as bioinformatics, social networks,
    and cyber security, graphs are sparse in nature. Due to this sparsity, we have
    the opportunity to use Transactional Memory (TM) as the synchronization policy
    for critical sections to speed up applications. At low conflict probability TM
    performs better than most synchronization policies due to its inherent
    non-blocking characteristics. TM can be implemented in Software, Hardware or a
    combination of both. However, hardware TM implementations are fast but limited
    by scarce hardware resources while software implementations have high overheads
    which can degrade performance. In this paper, we develop a low overhead, yet
    simple, dynamically adaptive (i.e. at runtime) hybrid (i.e. combines hardware
    and software) TM (DyAdHyTM) scheme that combines the best features of both
    Hardware TM (HTM) and Software TM (STM) while adapting to application
    requirements. It performs better than a coarse-grain lock by up to 8.12x, a low
    overhead STM by up to 2.68x, a couple of implementations of HTMs (by up to
    2.59x), and other HyTMs (by up to 1.55x) for SSCA2 graph benchmark running on a
    multicore machine with a large shared memory.

    ERA: A Framework for Economic Resource Allocation for the Cloud

    Moshe Babaioff, Yishay Mansour, Noam Nisan, Gali Noti, Carlo Curino, Nar Ganapathy, Ishai Menache, Omer Reingold, Moshe Tennenholtz, Erez Timnat
    Subjects: Computer Science and Game Theory (cs.GT); Distributed, Parallel, and Cluster Computing (cs.DC)

    Cloud computing has reached significant maturity from a systems perspective,
    but currently deployed solutions rely on rather basic economic mechanisms that
    yield suboptimal allocation of the costly hardware resources. In this paper we
    present Economic Resource Allocation (ERA), a complete framework for scheduling
    and pricing cloud resources, aimed at increasing the efficiency of cloud
    resource usage by allocating resources according to economic principles. The
    ERA architecture carefully abstracts the underlying cloud infrastructure,
    enabling the development of scheduling and pricing algorithms independently of
    the concrete lower-level cloud infrastructure and independently of its
    concerns. Specifically, ERA is designed as a flexible layer that can sit on top
    of any cloud system and interfaces with both the cloud resource manager and
    with the users who reserve resources to run their jobs. The jobs are scheduled
    based on prices that are dynamically calculated according to the predicted
    demand. Additionally, ERA provides a key internal API to pluggable algorithmic
    modules that include scheduling, pricing and demand prediction. We provide a
    proof-of-concept software and demonstrate the effectiveness of the architecture
    by testing ERA over both public and private cloud systems — Azure Batch of
    Microsoft and Hadoop/YARN. A broader intent of our work is to foster
    collaborations between economics and system communities. To that end, we have
    developed a simulation platform via which economics and system experts can test
    their algorithmic implementations.

    How to Optimally Allocate Resources for Coded Distributed Computing?

    Qian Yu, Songze Li, Mohammad Ali Maddah-Ali, A. Salman Avestimehr
    Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC)

    Today’s data centers have an abundance of computing resources, hosting server
    clusters consisting of as many as tens or hundreds of thousands of machines. To
    execute a complex computing task over a data center, it is natural to
    distribute computations across many nodes to take advantage of parallel
    processing. However, as we allocate more and more computing resources to a
    computation task and further distribute the computations, large amounts of
    (partially) computed data must be moved between consecutive stages of
    computation tasks among the nodes, hence the communication load can become the
    bottleneck. In this paper, we study the optimal allocation of computing
    resources in distributed computing, in order to minimize the total execution
    time, accounting for both the duration of the computation
    and communication phases. In particular, we consider a general MapReduce-type
    distributed computing framework, in which the computation is decomposed into
    three stages: Map, Shuffle, and Reduce. We focus on a
    recently proposed Coded Distributed Computing approach for MapReduce and
    study the optimal allocation of computing resources in this framework. For all
    values of problem parameters, we characterize the optimal number of servers
    that should be used for distributed processing, provide the optimal placements
    of the Map and Reduce tasks, and propose an optimal coded data shuffling
    scheme, in order to minimize the total execution time. To prove the optimality
    of the proposed scheme, we first derive a matching information-theoretic
    converse on the execution time, then we prove that among all possible resource
    allocation schemes that achieve the minimum execution time, our proposed scheme
    uses exactly the minimum possible number of servers.
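
    For context, the computation-communication trade-off from the Coded Distributed
    Computing line of work this paper builds on can be written, schematically, as

        L_{\mathrm{coded}}(r) \;=\; \frac{1}{r}\Bigl(1 - \frac{r}{K}\Bigr)
        \qquad \text{vs.} \qquad
        L_{\mathrm{uncoded}}(r) \;=\; 1 - \frac{r}{K},

    for K servers and computation load r (each Map task computed on r servers). The
    total execution time being minimized is then, roughly, a Map term that grows with r,
    a Shuffle term proportional to the communication load, and a Reduce term; the exact
    model and the optimal choice of the number of servers and task placements are given
    in the paper, not here.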

    Scalable Inference for Nested Chinese Restaurant Process Topic Models

    Jianfei Chen, Jun Zhu, Jie Lu, Shixia Liu
    Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR); Learning (cs.LG)

    Nested Chinese Restaurant Process (nCRP) topic models are powerful
    nonparametric Bayesian methods to extract a topic hierarchy from a given text
    corpus, where the hierarchical structure is automatically determined by the
    data. Hierarchical Latent Dirichlet Allocation (hLDA) is a popular instance of
    nCRP topic models. However, hLDA has only been evaluated at small scale,
    because the existing collapsed Gibbs sampling and instantiated weight
    variational inference algorithms either are not scalable or sacrifice inference
    quality with mean-field assumptions. Moreover, an efficient distributed
    implementation of the data structures, such as dynamically growing count
    matrices and trees, is challenging.

    In this paper, we propose a novel partially collapsed Gibbs sampling (PCGS)
    algorithm, which combines the advantages of collapsed and instantiated weight
    algorithms to achieve good scalability as well as high model quality. An
    initialization strategy is presented to further improve the model quality.
    Finally, we propose an efficient distributed implementation of PCGS through
    vectorization, pre-processing, and a careful design of the concurrent data
    structures and communication strategy.

    Empirical studies show that our algorithm is 111 times more efficient than
    the previous open-source implementation for hLDA, with comparable or even
    better model quality. Our distributed implementation can extract 1,722 topics
    from a 131-million-document corpus with 28 billion tokens, which is 4-5 orders
    of magnitude larger than the previous largest corpus, with 50 machines in 7
    hours.

    Large-Scale Stochastic Learning using GPUs

    Thomas Parnell, Celestine Dünner, Kubilay Atasu, Manolis Sifalakis, Haris Pozidis
    Comments: Accepted for publication in ParLearning 2017: The 6th International Workshop on Parallel and Distributed Computing for Large Scale Machine Learning and Big Data Analytics, Orlando, Florida, May 2017
    Subjects: Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

    In this work we propose an accelerated stochastic learning system for very
    large-scale applications. Acceleration is achieved by mapping the training
    algorithm onto massively parallel processors: we demonstrate a parallel,
    asynchronous GPU implementation of the widely used stochastic coordinate
    descent/ascent algorithm that can provide up to 35x speed-up over a sequential
    CPU implementation. In order to train on very large datasets that do not fit
    inside the memory of a single GPU, we then consider techniques for distributed
    stochastic learning. We propose a novel method for optimally aggregating model
    updates from worker nodes when the training data is distributed either by
    example or by feature. Using this technique, we demonstrate that one can scale
    out stochastic learning across up to 8 worker nodes without any significant
    loss of training time. Finally, we combine GPU acceleration with the optimized
    distributed method to train on a dataset consisting of 200 million training
    examples and 75 million features. We show that by scaling out across 4 GPUs, one can
    attain a high degree of training accuracy in around 4 seconds: a 20x speed-up
    in training time compared to a multi-threaded, distributed implementation
    across 4 CPUs.
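
    A minimal (sequential, CPU) sketch of the stochastic coordinate descent update the
    abstract refers to, written for a ridge-regularized least-squares objective; the GPU
    asynchrony and the distributed model-averaging scheme are the paper's contribution
    and are not reflected in this sketch.

        import numpy as np

        def stochastic_coordinate_descent(A, b, lam=0.1, epochs=10, seed=0):
            """Minimize 0.5*||A x - b||^2 + 0.5*lam*||x||^2 one coordinate at a time."""
            rng = np.random.default_rng(seed)
            n, d = A.shape
            x = np.zeros(d)
            residual = A @ x - b                  # kept up to date incrementally
            col_sq = (A ** 2).sum(axis=0)         # precomputed ||A_j||^2
            for _ in range(epochs):
                for j in rng.permutation(d):
                    # Exact minimization over coordinate j with all others fixed.
                    grad_j = A[:, j] @ residual + lam * x[j]
                    delta = -grad_j / (col_sq[j] + lam)
                    residual += delta * A[:, j]
                    x[j] += delta
            return x

        A, b = np.random.rand(200, 50), np.random.rand(200)
        x_hat = stochastic_coordinate_descent(A, b)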

    A Realistic Dataset for the Smart Home Device Scheduling Problem for DCOPs

    William Kluegel, Muhammad Aamir Iqbal, Ferdinando Fioretto, William Yeoh, Enrico Pontelli
    Comments: 15 pages, OPTMAS17
    Subjects: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

    The field of Distributed Constraint Optimization has gained momentum in
    recent years thanks to its ability to address various applications related to
    multi-agent cooperation. While techniques to solve Distributed Constraint
    Optimization Problems (DCOPs) are abundant and have matured substantially since
    the field's inception, the number of realistic DCOP applications and benchmarks
    used to assess the performance of DCOP algorithms is lagging behind. To address
    this gap we (i) introduce the Smart Home Device Scheduling (SHDS) problem,
    which describes the problem of coordinating smart device schedules across
    multiple homes as a multi-agent system, (ii) detail the physical models adopted
    to simulate smart sensors, smart actuators, and home environments, and (iii)
    introduce a realistic DCOP benchmark for SHDS problems.


    Learning

    On the ability of neural nets to express distributions

    Holden Lee, Rong Ge, Andrej Risteski, Tengyu Ma, Sanjeev Arora
    Comments: Under review for COLT 2017
    Subjects: Learning (cs.LG)

    Deep neural nets have caused a revolution in many classification tasks. A
    related ongoing revolution—also theoretically not understood—concerns their
    ability to serve as generative models for complicated types of data such as
    images and texts. These models are trained using ideas like variational
    autoencoders and Generative Adversarial Networks.

    We take a first cut at explaining the expressivity of multilayer nets by
    giving a sufficient criterion for a function to be approximable by a neural
    network with (n) hidden layers. A key ingredient is Barron’s Theorem
    (Barron, 1993), which gives a Fourier criterion for approximability of a
    function by a neural network with 1 hidden layer. We show that a composition of
    (n) functions which satisfy certain Fourier conditions (“Barron functions”) can
    be approximated by an (n+1)-layer neural network.

    For probability distributions, this translates into a criterion for a
    probability distribution to be approximable in Wasserstein distance—a natural
    metric on probability distributions—by a neural network applied to a fixed
    base distribution (e.g., multivariate gaussian).

    Building on recent lower bound work, we also give an example function that
    shows that composition of Barron functions is more expressive than Barron
    functions alone.
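
    For reference, the one-hidden-layer criterion this argument builds on (Barron's
    Theorem) can be stated schematically as follows; the exact constants and the class
    of sigmoidal units are suppressed, so this is a sketch of the statement rather than
    the precise form used in the paper:

        C_f \;=\; \int_{\mathbb{R}^d} \|\omega\| \, \bigl|\hat{f}(\omega)\bigr| \, d\omega \;<\; \infty
        \quad \Longrightarrow \quad
        \inf_{f_n} \int_{\|x\| \le r} \bigl(f(x) - f_n(x)\bigr)^2 \, \mu(dx) \;\le\; \frac{(2 r C_f)^2}{n},

    where the infimum ranges over one-hidden-layer networks f_n with n sigmoidal units
    and \mu is a probability measure on the ball of radius r.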

    Learning Hawkes Processes from Short Doubly-Censored Event Sequences

    Hongteng Xu, Dixin Luo, Hongyuan Zha
    Subjects: Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)

    Many real-world applications require robust algorithms to learn point process
    models based on a type of incomplete data — the so-called short
    doubly-censored (SDC) event sequences. In this paper, we study this critical
    problem of quantitative asynchronous event sequence analysis under the
    framework of Hawkes processes by leveraging the general idea of data synthesis.
    In particular, given SDC event sequences observed in a variety of time
    intervals, we propose a sampling-stitching data synthesis method — sampling
    predecessor and successor for each SDC event sequence from potential candidates
    and stitching them together to synthesize long training sequences. The
    rationality and the feasibility of our method are discussed in terms of
    arguments based on likelihood. Experiments on both synthetic and real-world
    data demonstrate that the proposed data synthesis method indeed improves
    learning results for both time-invariant and time-varying Hawkes processes.
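
    For readers unfamiliar with the model class, the (multivariate) Hawkes intensity such
    methods fit is, schematically,

        \lambda_u(t) \;=\; \mu_u \;+\; \sum_{i:\, t_i < t} \phi_{u u_i}(t - t_i),
        \qquad \text{e.g. } \phi_{uv}(s) \;=\; a_{uv}\, e^{-w s},

    where \mu_u is the base rate of event type u and \phi_{uv} captures how past type-v
    events excite type-u events; SDC sequences only reveal events inside short
    observation windows, which is the gap the sampling-stitching synthesis is designed
    to fill.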

    Large-Scale Stochastic Learning using GPUs

    Thomas Parnell, Celestine Dünner, Kubilay Atasu, Manolis Sifalakis, Haris Pozidis
    Comments: Accepted for publication in ParLearning 2017: The 6th International Workshop on Parallel and Distributed Computing for Large Scale Machine Learning and Big Data Analytics, Orlando, Florida, May 2017
    Subjects: Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

    In this work we propose an accelerated stochastic learning system for very
    large-scale applications. Acceleration is achieved by mapping the training
    algorithm onto massively parallel processors: we demonstrate a parallel,
    asynchronous GPU implementation of the widely used stochastic coordinate
    descent/ascent algorithm that can provide up to 35x speed-up over a sequential
    CPU implementation. In order to train on very large datasets that do not fit
    inside the memory of a single GPU, we then consider techniques for distributed
    stochastic learning. We propose a novel method for optimally aggregating model
    updates from worker nodes when the training data is distributed either by
    example or by feature. Using this technique, we demonstrate that one can scale
    out stochastic learning across up to 8 worker nodes without any significant
    loss of training time. Finally, we combine GPU acceleration with the optimized
    distributed method to train on a dataset consisting of 200 million training
    examples and 75 million features. We show that by scaling out across 4 GPUs, one can
    attain a high degree of training accuracy in around 4 seconds: a 20x speed-up
    in training time compared to a multi-threaded, distributed implementation
    across 4 CPUs.

    Heavy-Tailed Analogues of the Covariance Matrix for ICA

    Joseph Anderson, Navin Goyal, Anupama Nandi, Luis Rademacher
    Comments: 16 Pages, 9 Figures, AAAI 2017
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Independent Component Analysis (ICA) is the problem of learning a square
    matrix (A), given samples of (X=AS), where (S) is a random vector with
    independent coordinates. Most existing algorithms are provably efficient only
    when each (S_i) has a finite and moderately valued fourth moment. However, there
    are practical applications where this assumption need not be true, such as
    speech and finance. Algorithms have been proposed for heavy-tailed ICA, but
    they are not practical, using random walks and the full power of the ellipsoid
    algorithm multiple times. The main contributions of this paper are:

    (1) A practical algorithm for heavy-tailed ICA that we call HTICA. We provide
    theoretical guarantees and show that it outperforms other algorithms in some
    heavy-tailed regimes, both on real and synthetic data. Like the current
    state-of-the-art, the new algorithm is based on the centroid body (a first
    moment analogue of the covariance matrix). Unlike the state-of-the-art, our
    algorithm is practically efficient. To achieve this, we use explicit analytic
    representations of the centroid body, which bypasses the use of the ellipsoid
    method and random walks.

    (2) We study how heavy tails affect different ICA algorithms, including
    HTICA. Somewhat surprisingly, we show that some algorithms that use the
    covariance matrix or higher moments can successfully solve a range of ICA
    instances with infinite second moment. We study this theoretically and
    experimentally, with both synthetic and real-world heavy-tailed data.

    A Converse to Banach's Fixed Point Theorem and its CLS Completeness

    Constantinos Daskalakis, Christos Tzamos, Manolis Zampetakis
    Subjects: Computational Complexity (cs.CC); Learning (cs.LG); General Topology (math.GN); Machine Learning (stat.ML)

    Banach’s fixed point theorem for contraction maps has been widely used to
    analyze the convergence of iterative methods in non-convex problems. It is a
    common experience, however, that iterative maps fail to be globally contracting
    under the natural metric in their domain, making the applicability of Banach’s
    theorem limited. We explore how generally we can apply Banach’s fixed point
    theorem to establish the convergence of iterative methods when pairing it with
    carefully designed metrics.

    Our first result is a strong converse of Banach’s theorem, showing that it is
    a universal analysis tool for establishing uniqueness of fixed points and for
    bounding the convergence rate of iterative maps to a unique fixed point. In
    other words, we show that, whenever an iterative map globally converges to a
    unique fixed point, there exists a metric under which the iterative map is
    contracting and which can be used to bound the number of iterations until
    convergence. We illustrate our approach in the widely used power method,
    providing a new way of bounding its convergence rate through contraction
    arguments.

    We next consider the computational complexity of Banach’s fixed point
    theorem. Making the proof of our converse theorem constructive, we show that
    computing a fixed point whose existence is guaranteed by Banach’s fixed point
    theorem is CLS-complete. We thus provide the first natural complete problem for
    the class CLS, which was defined in [Daskalakis-Papadimitriou 2011] to capture
    the complexity of problems such as P-matrix LCP, computing KKT-points, and
    finding mixed Nash equilibria in congestion and network coordination games.
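
    As a concrete instance of such an iterative map, a minimal power-method sketch
    (standard power iteration on a random symmetric positive semidefinite matrix;
    the matrix, iteration count and tolerance are illustrative, and the
    contraction-metric construction from the paper is not reproduced):

        import numpy as np

        rng = np.random.default_rng(0)
        B = rng.standard_normal((50, 50))
        A = B @ B.T                                # symmetric PSD matrix

        v = rng.standard_normal(50)
        v /= np.linalg.norm(v)
        for _ in range(500):
            v_next = A @ v
            v_next /= np.linalg.norm(v_next)
            if np.linalg.norm(v_next - v) < 1e-12:
                break
            v = v_next

        # Rayleigh quotient vs. the true largest eigenvalue
        print(v @ A @ v, np.linalg.eigvalsh(A)[-1])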

    Learning to Draw Dynamic Agent Goals with Generative Adversarial Networks

    Shariq Iqbal, John Pearson
    Comments: Submitted to ICML
    Subjects: Neurons and Cognition (q-bio.NC); Learning (cs.LG); Machine Learning (stat.ML)

    We address the problem of designing artificial agents capable of reproducing
    human behavior in a competitive game involving dynamic control. Given data
    consisting of multiple realizations of inputs generated by pairs of interacting
    players, we model each agent’s actions as governed by a time-varying latent
    goal state coupled to a control model. These goals, in turn, are described as
    stochastic processes evolving according to player-specific value functions
    depending on the current state of the game. We model these value functions
    using generative adversarial networks (GANs) and show that our GAN-based
    approach succeeds in producing sample gameplay that captures the rich dynamics
    of human agents. The latent goal dynamics inferred and generated by our model
    have applications to fields like neuroscience and animal behavior, where the
    underlying value functions themselves are of theoretical interest.

    Causal Discovery Using Proxy Variables

    Mateo Rojas-Carulla, Marco Baroni, David Lopez-Paz
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Discovering causal relations is fundamental to reasoning and intelligence. In
    particular, observational causal discovery algorithms estimate the cause-effect
    relation between two random entities (X) and (Y), given (n) samples from
    (P(X,Y)).

    In this paper, we develop a framework to estimate the cause-effect relation
    between two static entities (x) and (y): for instance, an art masterpiece (x)
    and its fraudulent copy (y). To this end, we introduce the notion of proxy
    variables, which allow the construction of a pair of random entities ((A,B))
    from the pair of static entities ((x,y)). Then, estimating the cause-effect
    relation between (A) and (B) using an observational causal discovery algorithm
    leads to an estimation of the cause-effect relation between (x) and (y). For
    example, our framework detects the causal relation between unprocessed
    photographs and their modifications, and orders in time a set of shuffled
    frames from a video.

    As our main case study, we introduce a human-elicited dataset of 10,000
    causally-linked word pairs from natural language. Our methods discover
    75% of these causal relations. Finally, we discuss the role of proxy variables
    in machine learning, as a general tool to incorporate static knowledge into
    prediction tasks.

    Online Multiclass Boosting

    Young Hun Jung, Ambuj Tewari
    Comments: 17 pages, 2 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Recent work has extended the theoretical analysis of boosting algorithms to
    multiclass problems and online settings. However, the multiclass extension is
    in the batch setting and the online extensions only consider binary
    classification. To the best of our knowledge, there exists no framework to
    analyze online boosting algorithms for multiclass classification. We fill this
    gap in the literature by defining, and justifying, a weak learning condition
    for online multiclass boosting. We also provide an algorithm called online
    multiclass boost-by-majority to optimally combine weak learners in our setting.

    Rotting Bandits

    Nir Levine, Koby Crammer, Shie Mannor
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    The Multi-Armed Bandits (MAB) framework highlights the tension between
    acquiring new knowledge (Exploration) and leveraging available knowledge
    (Exploitation). In the classical MAB problem, a decision maker must choose an
    arm at each time step, upon which she receives a reward. The decision maker’s
    objective is to maximize her cumulative expected reward over the time horizon.
    The MAB problem has been studied extensively, specifically under the assumption
    of the arms’ rewards distributions being stationary, or quasi-stationary, over
    time. We consider a variant of the MAB framework, which we term Rotting
    Bandits, where each arm's expected reward decays as a function of the number of
    times it has been pulled. We are motivated by many real-world
    scenarios such as online advertising, content recommendation, crowdsourcing,
    and more. We present algorithms, accompanied by simulations, and derive
    theoretical guarantees.
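
    A rough simulation sketch of the rotting setting (the decay model, the noise
    level and the UCB1 baseline are assumptions for illustration; UCB1 assumes
    stationary rewards and is not one of the algorithms proposed in the paper):

        import numpy as np

        rng = np.random.default_rng(0)
        K, T = 3, 5000
        base = np.array([1.0, 0.8, 0.6])
        decay = np.array([0.01, 0.002, 0.0005])

        def expected_reward(arm, n_pulls):
            # each arm's mean reward decays with the number of times it has been pulled
            return base[arm] * np.exp(-decay[arm] * n_pulls)

        pulls = np.zeros(K, dtype=int)
        means = np.zeros(K)                # running average of observed rewards per arm
        total = 0.0
        for t in range(T):
            if t < K:
                a = t                      # pull each arm once to initialize
            else:
                a = int(np.argmax(means + np.sqrt(2.0 * np.log(t) / pulls)))
            r = expected_reward(a, pulls[a]) + 0.1 * rng.standard_normal()
            pulls[a] += 1
            means[a] += (r - means[a]) / pulls[a]
            total += r

        print(total, pulls)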

    A minimax and asymptotically optimal algorithm for stochastic bandits

    Pierre Ménard (IMT), Aurélien Garivier (IMT)
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Statistics Theory (math.ST)

    We propose the kl-UCB++ algorithm for regret minimization in stochastic
    bandit models with exponential families of distributions. We prove that it is
    simultaneously asymptotically optimal (in the sense of Lai and Robbins’ lower
    bound) and minimax optimal. This is the first algorithm proved to enjoy these
    two properties at the same time. This work thus merges two different lines of
    research, with simple proofs involving no complexity overhead.
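
    For reference, the standard kl-UCB index for Bernoulli rewards, computed by
    bisection (a sketch of the underlying index computation only; kl-UCB++ uses a
    refined exploration function that is not reproduced here):

        import numpy as np

        def kl_bernoulli(p, q, eps=1e-12):
            # KL divergence between Bernoulli(p) and Bernoulli(q)
            p = min(max(p, eps), 1 - eps)
            q = min(max(q, eps), 1 - eps)
            return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

        def kl_ucb_index(mean, pulls, t, tol=1e-6):
            # largest q >= mean such that pulls * kl(mean, q) <= log(t)
            level = np.log(max(t, 1)) / pulls
            lo, hi = mean, 1.0
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if kl_bernoulli(mean, mid) <= level:
                    lo = mid
                else:
                    hi = mid
            return lo

        print(kl_ucb_index(mean=0.4, pulls=25, t=1000))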

    Stability of Topic Modeling via Matrix Factorization

    Mark Belford, Brian Mac Namee, Derek Greene
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    Topic models can provide us with an insight into the underlying latent
    structure of a large corpus of documents. A range of methods have been proposed
    in the literature, including probabilistic topic models and techniques based on
    matrix factorization. However, in both cases, standard implementations rely on
    stochastic elements in their initialization phase, which can potentially lead
    to different results being generated on the same corpus when using the same
    parameter values. This corresponds to the concept of “instability” which has
    previously been studied in the context of (k)-means clustering. In many
    applications of topic modeling, this problem of instability is not considered
    and topic models are treated as being definitive, even though the results may
    change considerably if the initialization process is altered. In this paper we
    demonstrate the inherent instability of popular topic modeling approaches,
    using a number of new measures to assess stability. To address this issue in
    the context of matrix factorization for topic modeling, we propose the use of
    ensemble learning strategies. Based on experiments performed on annotated text
    corpora, we show that a K-Fold ensemble strategy, combining both ensembles and
    structured initialization, can significantly reduce instability, while
    simultaneously yielding more accurate topic models.
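
    A small sketch of the instability in question, assuming a recent scikit-learn
    (the toy corpus, the two seeds and the Jaccard overlap of top terms are
    illustrative; the stability measures and K-Fold ensemble strategy of the paper
    are not reproduced):

        import numpy as np
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.decomposition import NMF

        docs = [
            "stock market trading shares prices",
            "football match goal league players",
            "market investors shares economy bank",
            "league season players coach football",
            "bank economy inflation prices rates",
            "match season goal coach team",
        ]
        vec = TfidfVectorizer()
        X = vec.fit_transform(docs)
        terms = np.array(vec.get_feature_names_out())

        def top_terms(seed, k=2, n_top=5):
            # random initialization makes the factorization depend on the seed
            model = NMF(n_components=k, init="random", random_state=seed, max_iter=500).fit(X)
            return [frozenset(terms[np.argsort(row)[-n_top:]]) for row in model.components_]

        run_a, run_b = top_terms(seed=0), top_terms(seed=1)
        # Jaccard overlap of each topic in run_a with its best match in run_b
        overlaps = [max(len(a & b) / len(a | b) for b in run_b) for a in run_a]
        print(overlaps)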

    Automatic Representation for Lifetime Value Recommender Systems

    Assaf Hallak, Yishay Mansour, Elad Yom-Tov
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Many modern commercial sites employ recommender systems to propose relevant
    content to users. While most systems are focused on maximizing the immediate
    gain (clicks, purchases or ratings), a better notion of success would be the
    lifetime value (LTV) of the user-system interaction. The LTV approach considers
    the future implications of the item recommendation, and seeks to maximize the
    cumulative gain over time. The Reinforcement Learning (RL) framework is the
    standard formulation for optimizing cumulative successes over time. However, RL
    is rarely used in practice due to its associated representation, optimization
    and validation techniques which can be complex. In this paper we propose a new
    architecture for combining RL with recommendation systems which obviates the
    need for hand-tuned features, thus automating the state-space representation
    construction process. We analyze the practical difficulties in this formulation
    and test our solutions on batch off-line real-world recommendation data.

    Consistent On-Line Off-Policy Evaluation

    Assaf Hallak, Shie Mannor
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    The problem of on-line off-policy evaluation (OPE) has been actively studied
    in the last decade due to its importance both as a stand-alone problem and as a
    module in a policy improvement scheme. However, most Temporal Difference (TD)
    based solutions ignore the discrepancy between the stationary distribution of
    the behavior and target policies and its effect on the convergence limit when
    function approximation is applied. In this paper we propose the Consistent
    Off-Policy Temporal Difference (COP-TD(λ, η)) algorithm that addresses this
    issue and reduces this bias at some computational expense. We show that
    COP-TD(λ, η) can be designed to converge to the same value that would have been
    obtained by using on-policy TD(λ) with the target policy. Subsequently, the
    proposed scheme leads to a related and promising heuristic we call
    log-COP-TD(λ, η). Both algorithms show favorable empirical results compared to
    the current state-of-the-art on-line OPE algorithms. Finally, our formulation
    sheds some new light on the recently proposed Emphatic TD learning.
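
    For orientation, a minimal off-policy TD(0) evaluation sketch with per-decision
    importance-sampling ratios on a two-state MDP invented here; this is the plain
    TD baseline whose stationary-distribution mismatch COP-TD is designed to
    correct, not the COP-TD algorithm itself:

        import numpy as np

        rng = np.random.default_rng(0)
        gamma, alpha = 0.9, 0.05
        # Two states, two actions: transition probabilities P[s, a] and rewards R[s, a]
        P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                      [[0.7, 0.3], [0.1, 0.9]]])
        R = np.array([[1.0, 0.0],
                      [0.5, 2.0]])
        behavior = np.array([[0.5, 0.5], [0.5, 0.5]])   # mu(a|s), generates the data
        target = np.array([[0.9, 0.1], [0.2, 0.8]])     # pi(a|s), policy being evaluated

        V = np.zeros(2)
        s = 0
        for t in range(100_000):
            a = rng.choice(2, p=behavior[s])
            s_next = rng.choice(2, p=P[s, a])
            rho = target[s, a] / behavior[s, a]         # per-decision importance ratio
            V[s] += alpha * rho * (R[s, a] + gamma * V[s_next] - V[s])
            s = s_next

        print(V)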

    Discriminating Traces with Time

    Saeid Tizpaz-Niari, Pavol Cerny, Bor-Yuh Evan Chang, Sriram Sankaranarayanan, Ashutosh Trivedi
    Comments: Published in TACAS 2017
    Subjects: Programming Languages (cs.PL); Cryptography and Security (cs.CR); Formal Languages and Automata Theory (cs.FL); Learning (cs.LG); Software Engineering (cs.SE)

    What properties about the internals of a program explain the possible
    differences in its overall running time for different inputs? In this paper, we
    propose a formal framework, which we dub trace-set discrimination, for
    considering this question. We show that even though the algorithmic problem of computing
    maximum likelihood discriminants is NP-hard, approaches based on integer linear
    programming (ILP) and decision tree learning can be useful in zeroing-in on the
    program internals. On a set of Java benchmarks, we find that
    compactly-represented decision trees scalably discriminate with high
    accuracy—more scalably than maximum likelihood discriminants and with
    comparable accuracy. We demonstrate on three larger case studies how
    decision-tree discriminants produced by our tool are useful for debugging
    timing side-channel vulnerabilities (i.e., where a malicious observer infers
    secrets simply from passively watching execution times) and availability
    vulnerabilities.
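
    A toy sketch of the decision-tree side of the approach (the trace features,
    labels and sizes are synthetic assumptions; the paper's discriminants are
    learned from instrumented Java traces, and the ILP-based maximum likelihood
    discriminants are not shown):

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier, export_text

        rng = np.random.default_rng(0)
        n = 400
        # Hypothetical per-execution trace features: call counts of three internal methods
        calls = rng.integers(0, 20, size=(n, 3))
        # Synthetic ground truth: executions are slow when method_2 is called often
        running_time = 5 * calls[:, 2] + calls[:, 0] + rng.normal(0, 2, n)
        label = (running_time > np.median(running_time)).astype(int)   # 0 = fast, 1 = slow

        tree = DecisionTreeClassifier(max_depth=2).fit(calls, label)
        print(export_text(tree, feature_names=["method_0_calls", "method_1_calls",
                                               "method_2_calls"]))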

    Bidirectional Backpropagation: Towards Biologically Plausible Error Signal Transmission in Neural Networks

    Hongyin Luo, Jie Fu, James Glass
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    The back-propagation (BP) algorithm has been considered the de facto method
    for training deep neural networks. It back-propagates errors from the output
    layer to the hidden layers in an exact manner using feedforward weights. In
    this work, we propose a more biologically plausible paradigm of neural
    architecture according to biological findings. Specifically, we propose two
    bidirectional learning algorithms with two sets of trainable weights.
    Preliminary results show that our models perform best on the MNIST and the
    CIFAR10 datasets among the asymmetric error signal passing methods, and their
    performance is closer to that of BP.

    Scalable Inference for Nested Chinese Restaurant Process Topic Models

    Jianfei Chen, Jun Zhu, Jie Lu, Shixia Liu
    Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR); Learning (cs.LG)

    Nested Chinese Restaurant Process (nCRP) topic models are powerful
    nonparametric Bayesian methods to extract a topic hierarchy from a given text
    corpus, where the hierarchical structure is automatically determined by the
    data. Hierarchical Latent Dirichlet Allocation (hLDA) is a popular instance of
    nCRP topic models. However, hLDA has only been evaluated at small scale,
    because the existing collapsed Gibbs sampling and instantiated weight
    variational inference algorithms either are not scalable or sacrifice inference
    quality with mean-field assumptions. Moreover, an efficient distributed
    implementation of the data structures, such as dynamically growing count
    matrices and trees, is challenging.

    In this paper, we propose a novel partially collapsed Gibbs sampling (PCGS)
    algorithm, which combines the advantages of collapsed and instantiated weight
    algorithms to achieve good scalability as well as high model quality. An
    initialization strategy is presented to further improve the model quality.
    Finally, we propose an efficient distributed implementation of PCGS through
    vectorization, pre-processing, and a careful design of the concurrent data
    structures and communication strategy.

    Empirical studies show that our algorithm is 111 times more efficient than
    the previous open-source implementation for hLDA, with comparable or even
    better model quality. Our distributed implementation can extract 1,722 topics
    from a 131-million-document corpus with 28 billion tokens, which is 4-5 orders
    of magnitude larger than the previous largest corpus, with 50 machines in 7
    hours.

    Learning Model Predictive Control for Iterative Tasks: A Computationally Efficient Approach for Linear System

    Ugo Rosolia, Francesco Borrelli
    Comments: arXiv admin note: substantial text overlap with arXiv:1609.01387
    Subjects: Optimization and Control (math.OC); Learning (cs.LG)

    A Learning Model Predictive Controller (LMPC) for linear systems is presented.
    The proposed controller is an extension of the LMPC [1] and it aims to decrease
    the computational burden. The control scheme is reference-free and is able to
    improve its performance by learning from previous iterations. A convex safe set
    and a terminal cost function are used in order to guarantee recursive
    feasibility and non-increasing performance at each iteration. The paper
    presents the control design approach, and shows how to recursively construct
    the convex terminal set and the terminal cost from state and input trajectories
    of previous iterations. Simulation results show the effectiveness of the
    proposed control logic.

    One Size Fits Many: Column Bundle for Multi-X Learning

    Trang Pham, Truyen Tran, Svetha Venkatesh
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Much recent machine learning research has been directed towards leveraging
    shared statistics among labels, instances and data views, commonly referred to
    as multi-label, multi-instance and multi-view learning. The underlying premises
    are that there exist correlations among input parts and among output targets,
    and the predictive performance would increase when the correlations are
    incorporated. In this paper, we propose Column Bundle (CLB), a novel deep
    neural network for capturing the shared statistics in data. CLB is generic in
    that the same architecture can be applied to various types of shared statistics by
    changing only input and output handling. CLB is capable of scaling to thousands
    of input parts and output labels by avoiding explicit modeling of pairwise
    relations. We evaluate CLB on different types of data: (a) multi-label, (b)
    multi-view, (c) multi-view/multi-label and (d) multi-instance. CLB demonstrates
    competitive performance on all datasets against state-of-the-art methods
    designed specifically for each type.

    On Polynomial Time Methods for Exact Low Rank Tensor Completion

    Dong Xia, Ming Yuan
    Comments: 56 pages, 4 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    In this paper, we investigate the sample size requirement for exact recovery
    of a high order tensor of low rank from a subset of its entries. We show that a
    gradient descent algorithm with initial value obtained from a spectral method
    can, in particular, reconstruct a (d × d × d) tensor of multilinear ranks
    (r, r, r) with high probability from as few as
    O(r^{7/2} d^{3/2} log^{7/2} d + r^7 d log^6 d) entries. In the case when the ranks
    (r=O(1)), our sample size requirement matches those for nuclear norm
    minimization (Yuan and Zhang, 2016a), or alternating least squares assuming
    orthogonal decomposability (Jain and Oh, 2014). Unlike these earlier
    approaches, however, our method is efficient to compute, easy to implement, and
    does not impose extra structures on the tensor. Numerical results are presented
    to further demonstrate the merits of the proposed approach.
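
    As a lower-dimensional analogue of the procedure (spectral initialization
    followed by gradient descent on the observed entries), a hedged sketch for
    matrix rather than tensor completion; the step size, sampling rate and
    dimensions are arbitrary choices and this is not the paper's tensor algorithm:

        import numpy as np

        rng = np.random.default_rng(0)
        d, r = 60, 3
        M = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))   # rank-r target
        mask = rng.random((d, d)) < 0.3                                 # observe ~30% of entries
        M_obs = M * mask

        # Spectral initialization from the rescaled zero-filled observations
        u, s, vt = np.linalg.svd(M_obs / mask.mean(), full_matrices=False)
        U = u[:, :r] * np.sqrt(s[:r])
        V = vt[:r].T * np.sqrt(s[:r])

        # Gradient descent on the squared error over observed entries only
        lr = 0.002
        for it in range(3000):
            E = mask * (U @ V.T - M)
            U, V = U - lr * (E @ V), V - lr * (E.T @ U)

        print(np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))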

    Approximations of the Restless Bandit Problem

    Steffen Grunewalder, Azadeh Khaleghi
    Subjects: Statistics Theory (math.ST); Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)

    The multi-armed restless bandit problem is studied in the case where the
    pay-offs are not necessarily independent over time nor across the arms. Even
    though this version of the problem provides a more realistic model for most
    real-world applications, it cannot be optimally solved in practice since it is
    known to be PSPACE-hard. The objective of this paper is to characterize special
    sub-classes of the problem where good approximate solutions can be found using
    tractable approaches. Specifically, it is shown that in the case where the
    joint distribution over the arms is φ-mixing, and under some conditions
    on the φ-mixing coefficients, a modified version of UCB can prove
    optimal. On the other hand, it is shown that when the pay-off distributions are
    strongly dependent, simple switching strategies may be devised which leverage
    the strong inter-dependencies. To this end, an example is provided using
    Gaussian Processes. The techniques developed in this paper apply, more
    generally, to the problem of online sampling under dependence.

    Transferring Face Verification Nets To Pain and Expression Regression

    Feng Wang, Xiang Xiang, Chang Liu, Trac D. Tran, Austin Reiter, Gregory D. Hager, Harry Quon, Jian Cheng, Alan L. Yuille
    Comments: 5 pages, 3 figure; submitted to IEEE ICIP 2017 which does not perform blind reviews
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Learning (cs.LG); Multimedia (cs.MM)

    Limited annotated data is available for research on estimating facial
    expression intensities, which makes the training of deep networks for automated
    expression assessment very challenging. Fortunately, fine-tuning from a
    data-extensive pre-trained domain such as face verification can alleviate the
    problem. In this paper, we propose a transferred network that fine-tunes a
    state-of-the-art face verification network using expression-intensity labeled
    data with a regression layer. In this way, the expression regression task can
    benefit from the rich feature representations trained on a huge amount of data
    for face verification. The proposed transferred deep regressor is applied in
    estimating the intensity of facial action units (2017 EmotionNet Challenge) and
    in particular pain intensity estimation (UNBC-McMaster Shoulder-Pain dataset).
    It won second place in the challenge and achieves state-of-the-art performance
    on the Shoulder-Pain dataset. In particular, for Shoulder-Pain, which suffers
    from an imbalance across pain levels, a new weighted evaluation metric is
    proposed.
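
    A schematic fine-tuning sketch in PyTorch, assuming a recent torchvision (an
    ImageNet-pretrained ResNet-18 stands in for the face verification network, and
    the batch and labels are dummies; the actual backbone, data and training
    schedule in the paper differ):

        import torch
        import torch.nn as nn
        from torchvision import models

        # Stand-in backbone: an ImageNet-pretrained ResNet-18 plays the role of the
        # pre-trained network; the paper fine-tunes a face verification network instead.
        net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        net.fc = nn.Linear(net.fc.in_features, 1)        # replace classifier with a regression head

        optimizer = torch.optim.Adam([
            {"params": [p for name, p in net.named_parameters() if not name.startswith("fc.")],
             "lr": 1e-5},                                # small LR for pre-trained layers
            {"params": net.fc.parameters(), "lr": 1e-3}, # larger LR for the new head
        ])
        loss_fn = nn.MSELoss()

        images = torch.randn(8, 3, 224, 224)             # dummy batch of face crops
        intensity = torch.rand(8, 1) * 5                 # dummy intensity labels

        net.train()
        optimizer.zero_grad()
        loss = loss_fn(net(images), intensity)
        loss.backward()
        optimizer.step()
        print(loss.item())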

    Predicting non-linear dynamics: a stable local learning scheme for recurrent spiking neural networks

    Aditya Gilra, Wulfram Gerstner
    Subjects: Neurons and Cognition (q-bio.NC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Systems and Control (cs.SY)

    Brains need to predict how our muscles and body react to motor commands. How
    networks of spiking neurons can learn to reproduce these non-linear dynamics,
    using local, online and stable learning rules, is an important, open question.
    Here, we present a supervised learning scheme for the feedforward and recurrent
    connections in a network of heterogeneous spiking neurons. The error in the
    output is fed back through fixed random connections with a negative gain,
    causing the network to follow the desired dynamics, while an online and local
    rule changes the weights; hence we call the scheme FOLLOW (Feedback-based
    Online Local Learning Of Weights). The rule is local in the sense that weight
    changes depend on the presynaptic activity and the error signal projected onto
    the post-synaptic neuron. We provide examples of learning linear, non-linear
    and chaotic dynamics, as well as the dynamics of a two-link arm. Using the
    Lyapunov method, and under reasonable assumptions and approximations, we show
    that FOLLOW learning is uniformly stable, with the error going to zero
    asymptotically.


    Information Theory

    Two-Moment Inequalities for Rényi Entropy and Mutual Information

    Galen Reeves
    Subjects: Information Theory (cs.IT)

    This paper explores some applications of a two-moment inequality for the
    integral of the (r)-th power of a function, where (0 < r < 1). The first
    contribution is an upper bound on the Rényi entropy of a random vector in
    terms of the two different moments. When one of the moments is the zeroth
    moment, these bounds recover previous results based on maximum entropy
    distributions under a single moment constraint. More generally, evaluation of
    the bound with two carefully chosen nonzero moments can lead to significant
    improvements with a modest increase in complexity. The second contribution is a
    method for upper bounding mutual information in terms of certain integrals with
    respect to the variance of the conditional density. The bounds have a number of
    useful properties arising from the connection with variance decompositions.
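
    For reference, the differential Rényi entropy of order r of a random vector
    with density f, in its standard textbook form (stated here as background, since
    the abstract does not spell out the definition):

        h_r(X) = \frac{1}{1-r} \log \int f(x)^r \, dx, \qquad 0 < r < 1.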

    How to Optimally Allocate Resources for Coded Distributed Computing?

    Qian Yu, Songze Li, Mohammad Ali Maddah-Ali, A. Salman Avestimehr
    Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC)

    Today’s data centers have an abundance of computing resources, hosting server
    clusters consisting of as many as tens or hundreds of thousands of machines. To
    execute a complex computing task over a data center, it is natural to
    distribute computations across many nodes to take advantage of parallel
    processing. However, as we allocate more and more computing resources to a
    computation task and further distribute the computations, large amounts of
    (partially) computed data must be moved between consecutive stages of
    computation tasks among the nodes, hence the communication load can become the
    bottleneck. In this paper, we study the optimal allocation of computing
    resources in distributed computing, in order to minimize the total execution
    time, accounting for the duration of both the computation
    and communication phases. In particular, we consider a general MapReduce-type
    distributed computing framework, in which the computation is decomposed into
    three stages: Map, Shuffle, and Reduce. We focus on a
    recently proposed Coded Distributed Computing approach for MapReduce and
    study the optimal allocation of computing resources in this framework. For all
    values of problem parameters, we characterize the optimal number of servers
    that should be used for distributed processing, provide the optimal placements
    of the Map and Reduce tasks, and propose an optimal coded data shuffling
    scheme, in order to minimize the total execution time. To prove the optimality
    of the proposed scheme, we first derive a matching information-theoretic
    converse on the execution time, then we prove that among all possible resource
    allocation schemes that achieve the minimum execution time, our proposed scheme
    uses exactly the minimum possible number of servers.

    A Novel Index Coding Scheme and its Application to Coded Caching

    Kai Wan, Daniela Tuninetti, Pablo Piantanida
    Comments: ITA 2017
    Subjects: Information Theory (cs.IT)

    This paper proposes a novel achievable scheme for the index coding problem and
    applies it to the caching problem. Index coding and caching are noiseless
    broadcast channel problems where receivers have message side information. In the
    index coding problem the side information sets are fixed, while in the caching
    problem the side information sets correspond to the cache contents, which are
    under the control of the system designer. The proposed index coding scheme,
    based on distributed source coding and non-unique decoding, is shown to strictly
    enlarge the rate region achievable by composite coding. The novel index coding
    scheme applied to the caching problem is then shown to match an outer bound
    (previously proposed by the authors and also based on known results for the
    index coding problem) under the assumption of uncoded cache
    placement/prefetching.

    Massive MIMO Pilot Decontamination and Channel Interpolation via Wideband Sparse Channel Estimation

    Saeid Haghighatshoar, Giuseppe Caire
    Comments: A short version of this paper was presented in 50th Annual Asilomar Conference on Signals, Systems, and Computers (Asilomar 2016)
    Subjects: Information Theory (cs.IT)

    We consider a massive MIMO system, based on Time Division Duplexing (TDD) and
    channel reciprocity, where the base stations learn the channel vectors of their
    users via the pilots transmitted by the users in the uplink time-frequency
    slots. It has been observed that, in the limit of a very large number of
    antennas, the only factor limiting the achievable throughput of such a system
    is pilot contamination, which arises since the same set of orthogonal pilots is
    eventually reused in multiple cells. In the regime of a moderately large number
    of antennas, another source of degradation is channel interpolation since, due
    to limited pilot dimension, the pilot signal of each user probes only a limited
    number of OFDM subcarriers and the channel must be interpolated over the remaining
    subcarriers on which no pilot symbol is transmitted.

    In this paper, we design a low-complexity algorithm that uses the received
    wideband pilot snapshots of a user inside an observation window containing
    several coherence blocks to obtain an estimate of the angle-delay Power Spread
    Function (PSF) of its contaminated channel vectors. We propose supervised and
    unsupervised clustering algorithms to decompose the estimated PSF into two
    parts: 1) the signal part corresponding to the desired user and 2) the
    interference part corresponding to the copilot users. We use this decomposition
    to obtain an estimate of the covariance matrix of the user channel vector,
    which we exploit to decontaminate the received pilot signal in the next
    coherence blocks. We also propose an effective low-complexity decontamination
    scheme which also serves as an efficient approximation of the Minimum Mean
    Squared Error (MMSE) smoothing filter, i.e., the optimal channel interpolator
    in the MMSE sense. We use numerical simulations to assess the performance of
    our proposed PSF estimation and pilot-decontamination/channel interpolation
    algorithm.

    Massive MIMO 5G Cellular Networks: mm-wave vs. μ-wave Frequencies

    Stefano Buzzi, Carmen D'Andrea
    Comments: Invited paper; to appear on the ZTE Communications special issue on 5G New Radio
    Subjects: Information Theory (cs.IT)

    Enhanced mobile broadband (eMBB) is one of the key use-cases for the
    development of the new standard 5G New Radio for the next generation of mobile
    wireless networks. Large-scale antenna arrays, a.k.a. Massive MIMO, the usage
    of carrier frequencies in the range 10-100 GHz, the so-called millimeter wave
    (mm-wave) band, and the network densification with the introduction of
    small-sized cells are the three technologies that will permit implementing eMBB
    services and realizing the Gbit/s mobile wireless experience. This paper is
    focused on the massive MIMO technology; initially conceived for conventional
    cellular frequencies in the sub-6 GHz range (μ-wave), the massive MIMO
    concept has since been progressively extended to the case in which mm-wave
    frequencies are used. However, due to different propagation mechanisms in urban
    scenarios, the resulting MIMO channel models at μ-wave and mm-wave are
    radically different. Six key basic differences are pinpointed in this paper,
    along with the implications that they have on the architecture and algorithms
    of the communication transceivers and on the attainable performance in terms of
    reliability and multiplexing capabilities.

    Space-Time Channel Modulation

    Ertugrul Basar, Ibrahim Altunbas
    Comments: Accepted for publication in IEEE Transactions on Vehicular Technology
    Subjects: Information Theory (cs.IT)

    In this paper, we introduce the concept of space-time channel modulation
    (STCM), which extends the classical space-time block codes into a new third
    dimension: the channel state (transmission media) dimension. Three novel STCM
    schemes, which provide interesting trade-offs among decoding complexity, error
    performance and data rate, are proposed. It is shown via computer simulations
    that the proposed STCM schemes achieve considerably better error performance
    than the existing media-based modulation (MBM) and classical systems.

    Design of a Cognitive VLC Network with Illumination and Handover Requirements

    Marwan Hammouda, Jürgen Peissig, Anna Maria Vegni
    Subjects: Information Theory (cs.IT)

    In this paper, we consider a cognitive indoor visible light communications
    (VLC) system, comprised of multiple access points serving primary and secondary
    users through the orthogonal frequency division multiple access method. A
    cognitive lighting cell is divided into two non-overlapping regions that
    distinguish the primary and secondary users based on the region they are
    located in. Under the assumption of equal-power allocation among subcarriers,
    each region is defined in terms of its physical area and the number of
    allocated subcarriers within that region. In this paper, we provide the
    lighting cell design with cognitive constraints that guarantee fulfilling
    certain illumination, user mobility, and handover requirements in each cell. We
    further argue that, under some conditions, a careful assignment of the
    subcarriers in each region can mitigate the co-channel interference in the
    overlapping areas of adjacent cells. Numerical results depict the influence of
    different system parameters, such as user density, on defining both regions.
    Finally, a realistic example is implemented to assess the performance of the
    proposed scheme via Monte Carlo simulations.

    Multiuser Millimeter Wave Beamforming Strategies with Quantized and Statistical CSIT

    Mingbo Dai, Bruno Clerckx
    Comments: submitted to TWC
    Subjects: Information Theory (cs.IT)

    To alleviate the high cost of hardware in mmWave systems, hybrid
    analog/digital precoding is typically employed. In the conventional two-stage
    feedback scheme, the analog beamformer is determined by beam search and
    feedback to maximize the desired signal power of each user. The digital
    precoder is designed based on quantization and feedback of effective channel to
    mitigate multiuser interference. Alternatively, we propose a one-stage feedback
    scheme which effectively reduces the complexity of the signalling and feedback
    procedure. Specifically, the second-order channel statistics are leveraged to
    design the digital precoder for interference mitigation while all feedback overhead
    is reserved for precise analog beamforming. Under a fixed total feedback
    constraint, we investigate the conditions under which the one-stage feedback
    scheme outperforms the conventional two-stage counterpart. Moreover, a rate
    splitting (RS) transmission strategy is introduced to further tackle the
    multiuser interference and enhance the rate performance. Consider (1) RS
    precoded using the one-stage feedback scheme and (2) a conventional transmission
    strategy precoded using the two-stage scheme with the same first-stage feedback as
    (1) plus a certain amount of extra second-stage feedback. We show that (1)
    can achieve a sum rate comparable to that of (2). Hence, RS enables remarkable
    savings in the second-stage training and feedback overhead.

    Proactive Resource Management in LTE-U Systems: A Deep Learning Perspective

    Ursula Challita, Li Dong, Walid Saad
    Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)

    LTE in unlicensed spectrum (LTE-U) is a promising approach to overcome the
    wireless spectrum scarcity. However, to reap the benefits of LTE-U, a fair
    coexistence mechanism with other incumbent WiFi deployments is required. In
    this paper, a novel deep learning approach is proposed for modeling the
    resource allocation problem of LTE-U small base stations (SBSs). The proposed
    approach enables multiple SBSs to proactively perform dynamic channel
    selection, carrier aggregation, and fractional spectrum access while
    guaranteeing fairness with existing WiFi networks and other LTE-U operators.
    Adopting a proactive coexistence mechanism enables future delay-intolerant
    LTE-U data demands to be served within a given prediction window ahead of their
    actual arrival time thus avoiding the underutilization of the unlicensed
    spectrum during off-peak hours while maximizing the total served LTE-U traffic
    load. To this end, a noncooperative game model is formulated in which SBSs are
    modeled as Homo Egualis agents that aim at predicting a sequence of future
    actions and thus achieving long-term equal weighted fairness with WLAN and
    other LTE-U operators over a given time horizon. The proposed deep learning
    algorithm is then shown to reach a mixed-strategy Nash equilibrium (NE), when
    it converges. Simulation results using real data traces show that the proposed
    scheme can yield up to 28% and 11% gains over a conventional reactive approach
    and a proportional fair coexistence mechanism, respectively. The results also
    show that the proposed framework prevents WiFi performance degradation for a
    densely deployed LTE-U network.

    Maximally Correlated Principal Component Analysis

    Soheil Feizi, David Tse
    Comments: 35 pages, 5 figures
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    In the era of big data, reducing data dimensionality is critical in many
    areas of science. Widely used Principal Component Analysis (PCA) addresses this
    problem by computing a low dimensional data embedding that maximally explains
    the variance of the data. However, PCA has two major weaknesses. Firstly, it only
    considers linear correlations among variables (features), and secondly it is
    not suitable for categorical data. We resolve these issues by proposing
    Maximally Correlated Principal Component Analysis (MCPCA). MCPCA computes
    transformations of variables whose covariance matrix has the largest Ky Fan
    norm. Variable transformations are unknown, can be nonlinear and are computed
    in an optimization. MCPCA can also be viewed as a multivariate extension of
    Maximal Correlation. For jointly Gaussian variables we show that the covariance
    matrix corresponding to the identity (or the negative of the identity)
    transformations majorizes covariance matrices of non-identity functions. Using
    this result we characterize global MCPCA optimizers for nonlinear functions of
    jointly Gaussian variables for every rank constraint. For categorical variables
    we characterize global MCPCA optimizers for the rank one constraint based on
    the leading eigenvector of a matrix computed using pairwise joint
    distributions. For a general rank constraint we propose a block coordinate
    descent algorithm and show its convergence to stationary points of the MCPCA
    optimization. We compare MCPCA with PCA and other state-of-the-art
    dimensionality reduction methods including Isomap, LLE, multilayer autoencoders
    (neural networks), kernel PCA, probabilistic PCA and diffusion maps on several
    synthetic and real datasets. We show that MCPCA consistently provides improved
    performance compared to other methods.
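
    A small sketch of the objective being maximized: the Ky Fan k-norm of the
    covariance matrix of element-wise transformed, standardized variables,
    evaluated for two candidate transformation sets (the data and transformations
    are invented, and the paper's optimization algorithm is not reproduced):

        import numpy as np

        rng = np.random.default_rng(0)
        n, k = 2000, 2
        z, w = rng.standard_normal(n), rng.standard_normal(n)
        # Non-linear dependence between columns that linear PCA cannot exploit
        X = np.column_stack([z, z ** 2, w, w ** 2])

        def mcpca_objective(X, transforms, k):
            """Ky Fan k-norm (sum of the k largest eigenvalues) of the covariance
            matrix of element-wise transformed, standardized variables."""
            T = np.column_stack([f(X[:, j]) for j, f in enumerate(transforms)])
            T = (T - T.mean(axis=0)) / T.std(axis=0)
            eigvals = np.linalg.eigvalsh(np.cov(T, rowvar=False))
            return eigvals[-k:].sum()

        identity = [lambda v: v] * 4
        nonlinear = [lambda v: v ** 2, lambda v: v, lambda v: v ** 2, lambda v: v]
        print(mcpca_objective(X, identity, k), mcpca_objective(X, nonlinear, k))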



