
    arXiv Paper Daily: Wed, 16 Nov 2016

    Published by 我爱机器学习 (52ml.net) on 2016-11-16 00:00:00

    Neural and Evolutionary Computing

    Prediction of Seasonal Temperature Using Soft Computing Techniques: Application in Benevento (Southern Italy) Area

    Salvatore Rampone, Alessio Valente
    Journal-ref: Journal of Ambient Intelligence and Humanized Computing (2016)
    Subjects: Neural and Evolutionary Computing (cs.NE); Atmospheric and Oceanic Physics (physics.ao-ph)

    In this work two soft computing methods, Artificial Neural Networks and
    Genetic Programming, are proposed to forecast the mean temperature of
    future seasons. The soft computing techniques were applied to the
    surroundings of the town of Benevento, in the south of Italy, at geographic
    coordinates lat. 41°07′50″N, long. 14°47′13″E. This area is affected
    neither by maritime influences nor by winds coming from the west. The
    methods are fed with data recorded at the meteorological stations of
    Benevento and Castelvenere, located at 144 m a.s.l. in the hilly area that
    characterizes the territory surrounding the city. Both methods show low
    error rates, while Genetic Programming additionally offers an explicit rule
    representation (a formula) explaining the prediction.

    Keywords Seasonal Temperature Forecasting; Soft Computing; Artificial Neural
    Networks; Genetic Programming; Southern Italy.

    Differentiable Genetic Programming

    Dario Izzo, Francesco Biscani, Alessio Mereta
    Subjects: Neural and Evolutionary Computing (cs.NE)

    We introduce the use of high order automatic differentiation, implemented via
    the algebra of truncated Taylor polynomials, in genetic programming. Using the
    Cartesian Genetic Programming encoding we obtain a high-order Taylor
    representation of the program output that is then used to back-propagate errors
    during learning. The resulting machine learning framework is called
    differentiable Cartesian Genetic Programming (dCGP). In the context of symbolic
    regression, dCGP offers a new approach to the long-unsolved problem of constant
    representation in GP expressions. On several problems of increasing complexity
    we find that dCGP is able to recover the exact form of the symbolic expression
    as well as the values of the constants. We also demonstrate the use of dCGP to solve a
    large class of differential equations and to find prime integrals of dynamical
    systems, presenting, in both cases, results that confirm the efficacy of our
    approach.
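    The algebra of truncated Taylor polynomials that underlies dCGP can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: each number carries its Taylor coefficients, multiplication is a truncated Cauchy product, and high-order derivatives fall out of the coefficients.

    ```python
    # Minimal sketch of truncated Taylor polynomial arithmetic for high-order
    # automatic differentiation (illustrative; not the paper's dCGP library).
    import math

    class Taylor:
        """Truncated Taylor polynomial: c[k] = f^(k)(a) / k!."""
        def __init__(self, coeffs, order):
            self.order = order
            self.c = (list(coeffs) + [0.0] * (order + 1))[:order + 1]

        @classmethod
        def variable(cls, value, order):
            # The identity function x expanded around `value`.
            return cls([value, 1.0], order)

        def __add__(self, other):
            other = other if isinstance(other, Taylor) else Taylor([other], self.order)
            return Taylor([a + b for a, b in zip(self.c, other.c)], self.order)

        def __mul__(self, other):
            other = other if isinstance(other, Taylor) else Taylor([other], self.order)
            out = [0.0] * (self.order + 1)
            for i, a in enumerate(self.c):          # truncated Cauchy product
                for j, b in enumerate(other.c):
                    if i + j <= self.order:
                        out[i + j] += a * b
            return Taylor(out, self.order)

        __radd__ = __add__
        __rmul__ = __mul__

        def derivative(self, k):
            return self.c[k] * math.factorial(k)

    x = Taylor.variable(2.0, order=3)
    f = x * x + 3 * x   # f(x) = x^2 + 3x evaluated as a Taylor polynomial at x = 2
    print(f.derivative(0), f.derivative(1), f.derivative(2))  # 10.0 7.0 2.0
    ```

    Propagating such objects through a program graph yields the exact derivatives needed to back-propagate errors, including derivatives with respect to embedded constants.
    
    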

    Intrinsic Geometric Information Transfer Learning on Multiple Graph-Structured Datasets

    Jaekoo Lee, Hyunjae Kim, Jongsun Lee, Sungroh Yoon
    Comments: AAAI 2017 Conference
    Subjects: Neural and Evolutionary Computing (cs.NE)

    Graphs provide a powerful means for representing complex interactions between
    entities. Recently, deep learning approaches are emerging for representing and
    modeling graph-structured data, although the conventional deep learning methods
    (such as convolutional neural networks and recurrent neural networks) have
    mainly focused on grid-structured inputs (image and audio). Leveraging the
    capability of representation learning, deep learning based techniques report
    promising results for graph applications by detecting structural
    characteristics of graphs in an automated fashion. In this paper, we attempt to
    advance deep learning for graph-structured data by incorporating another
    component, transfer learning. By transferring the intrinsic geometric
    information learned in the source domain, our approach can help us to construct
    a model for a new but related task in the target domain without collecting new
    data and without training a new model from scratch. We thoroughly test our
    approach with large-scale real corpora and confirm the effectiveness of the
    proposed transfer learning framework for deep learning on graphs. According to
    our experiments, transfer learning is most effective when the source and target
    domains bear a high level of structural similarity in their graph
    representations.

    Comparison of Brain Networks with Unknown Correspondences

    Sofia Ira Ktena, Sarah Parisot, Jonathan Passerat-Palmbach, Daniel Rueckert
    Comments: Presented at The MICCAI-BACON 16 Workshop (this https URL)
    Subjects: Neurons and Cognition (q-bio.NC); Neural and Evolutionary Computing (cs.NE)

    Graph theory has drawn a lot of attention in the field of Neuroscience during
    the last decade, mainly due to the abundance of tools that it provides to
    explore the interactions of elements in a complex network like the brain. The
    local and global organization of a brain network can shed light on mechanisms
    of complex cognitive functions, while disruptions within the network can be
    linked to neurodevelopmental disorders. In this effort, the construction of a
    representative brain network for each individual is critical for further
    analysis. Additionally, graph comparison is an essential step for inference and
    classification analyses on brain graphs. In this work we explore a method based
    on graph edit distance for evaluating graph similarity, when correspondences
    between network elements are unknown due to different underlying subdivisions
    of the brain. We test this method on 30 unrelated subjects as well as 40 twin
    pairs and show that this method can accurately reflect the higher similarity
    between two related networks compared to unrelated ones, while identifying node
    correspondences.
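    For very small graphs, graph edit distance with unknown correspondences can be computed exactly by brute force. The sketch below (a toy illustration, not the authors' method) searches over all node bijections and counts the edge insertions and deletions each matching implies; the minimizing permutation is the recovered node correspondence.

    ```python
    # Toy exact graph edit distance over edge edits for small, equal-size graphs.
    from itertools import permutations

    def graph_edit_distance(edges_a, edges_b, n):
        """Minimum edge insertions/deletions over all node bijections {0..n-1}."""
        set_a = {frozenset(e) for e in edges_a}
        set_b = {frozenset(e) for e in edges_b}
        best_cost, best_map = None, None
        for perm in permutations(range(n)):
            mapped = set()
            for e in set_a:
                u, v = tuple(e)
                mapped.add(frozenset((perm[u], perm[v])))
            cost = len(mapped ^ set_b)   # symmetric difference counts the edits
            if best_cost is None or cost < best_cost:
                best_cost, best_map = cost, perm
        return best_cost, best_map

    path = [(0, 1), (1, 2), (2, 3)]
    relabeled = [(2, 0), (0, 3), (3, 1)]   # the same path with nodes renamed
    star = [(0, 1), (0, 2), (0, 3)]
    print(graph_edit_distance(path, relabeled, 4)[0])  # 0: identical up to relabeling
    print(graph_edit_distance(path, star, 4)[0])       # 2: one edge out, one edge in
    ```

    Brain networks are far too large for this factorial search; the point of the paper is precisely to evaluate tractable approximations of this quantity.
    
    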

    Cortical Geometry Network and Topology Markers for Parkinson's Disease

    Amanmeet Garg, Donghuan Lu, Karteek Popuri, Mirza Faisal Beg
    Comments: Presented at The MICCAI-BACON 16 Workshop (arXiv:1611.03363) Report number: BACON/2016/02
    Subjects: Applications (stat.AP); Neural and Evolutionary Computing (cs.NE)

    Neurodegeneration affects cortical gray matter leading to loss of cortical
    mantle volume. As a result of such volume loss, the geometrical arrangement of
    the regions on the cortical surface is expected to be altered in comparison to
    healthy brains. Here we present a novel method to study the alterations in
    brain cortical surface geometry in Parkinson's disease (PD) subjects with a
    Geometry Networks (GN) framework. The local geometrical arrangement of
    the cortical surface is captured as the 3D coordinates of the centroids of
    anatomically defined parcels on the surface. The inter-regional distance
    between cortical patches is the signal of interest and is captured as a
    geometry network. We study its topology by computing the dimensionality of
    simplicial complexes induced on a filtration of binary undirected networks for
    each geometry network. In a permutation statistics test, a statistically
    significant (p < 0.05) difference was observed in the homology features between
    the PD and healthy control groups, highlighting the potential of these features
    to differentiate between the groups and their utility in disease diagnosis.
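    The 0-dimensional part of such a topological analysis is easy to sketch: threshold the inter-regional distance matrix at increasing values (a filtration of binary graphs) and track the number of connected components (Betti-0) with a union-find. This is a simplified stand-in for the paper's simplicial-complex machinery.

    ```python
    # Betti-0 curve of a distance-matrix filtration via union-find (sketch).
    def betti0_curve(dist, thresholds):
        n = len(dist)
        curve = []
        for t in thresholds:
            parent = list(range(n))
            def find(x):
                while parent[x] != x:
                    parent[x] = parent[parent[x]]  # path halving
                    x = parent[x]
                return x
            for i in range(n):
                for j in range(i + 1, n):
                    if dist[i][j] <= t:            # edge enters the filtration
                        parent[find(i)] = find(j)
            curve.append(len({find(i) for i in range(n)}))
        return curve

    # Three nearby region centroids and one far outlier.
    d = [[0, 1, 1, 9],
         [1, 0, 1, 9],
         [1, 1, 0, 9],
         [9, 9, 9, 0]]
    print(betti0_curve(d, [0.5, 1.0, 9.0]))  # [4, 2, 1]
    ```

    How quickly components merge as the threshold grows is exactly the kind of homological signature compared between groups.
    
    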


    Computer Vision and Pattern Recognition

    OctNet: Learning Deep 3D Representations at High Resolutions

    Gernot Riegler, Ali Osman Ulusoys, Andreas Geiger
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present OctNet, a representation for deep learning with sparse 3D data. In
    contrast to existing models, our representation enables 3D convolutional
    networks which are both deep and high resolution. Towards this goal, we exploit
    the sparsity in the input data to hierarchically partition the space using a
    set of unbalanced octrees where each leaf node stores a pooled feature
    representation. This allows memory allocation and computation to be focused on
    the relevant dense regions and enables deeper networks without compromising
    resolution. We demonstrate the utility of our OctNet representation by
    analyzing the impact of resolution on several 3D tasks including 3D object
    classification, orientation estimation and point cloud labeling.
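    The core data structure can be sketched independently of any deep-learning framework. The toy octree below (a hypothetical structure, not the paper's implementation) subdivides space only where sparse points exist and pools a mean feature at each leaf, which is what keeps memory proportional to the occupied regions.

    ```python
    # Unbalanced octree over sparse 3D points with pooled leaf features (sketch).
    def build_octree(points, feats, center, half, depth, max_depth):
        if not points:
            return None                       # empty octant: no node allocated
        if depth == max_depth or len(points) == 1:
            # Leaf: store a pooled (mean) feature over the points in this cell.
            return {"leaf": True, "feature": sum(feats) / len(feats),
                    "count": len(points)}
        buckets = [([], []) for _ in range(8)]
        for p, f in zip(points, feats):
            idx = (p[0] > center[0]) | ((p[1] > center[1]) << 1) | ((p[2] > center[2]) << 2)
            buckets[idx][0].append(p)
            buckets[idx][1].append(f)
        q = half / 2
        children = []
        for idx, (pts, fs) in enumerate(buckets):
            off = [q if (idx >> k) & 1 else -q for k in range(3)]
            child_center = tuple(c + o for c, o in zip(center, off))
            children.append(build_octree(pts, fs, child_center, q, depth + 1, max_depth))
        return {"leaf": False, "children": children}

    def count_leaves(node):
        if node is None:
            return 0
        if node["leaf"]:
            return 1
        return sum(count_leaves(c) for c in node["children"])

    pts = [(0.1, 0.1, 0.1), (0.2, 0.1, 0.1), (0.9, 0.9, 0.9)]
    tree = build_octree(pts, [1.0, 3.0, 5.0], center=(0.5, 0.5, 0.5), half=0.5,
                        depth=0, max_depth=3)
    print(count_leaves(tree))  # → 3
    ```

    In OctNet proper, convolutions and pooling are then defined directly on this hierarchical grid rather than on a dense voxel volume.
    
    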

    Hybrid Light Field Imaging for Improved Spatial Resolution and Depth Range

    M. Zeshan Alam, Bahadir K. Gunturk
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Light field imaging involves capturing both angular and spatial distribution
    of light; it enables new capabilities, such as post-capture digital refocusing,
    camera aperture adjustment, perspective shift, and depth estimation. Micro-lens
    array (MLA) based light field cameras provide a cost-effective approach to
    light field imaging. There are two main limitations of MLA-based light field
    cameras: low spatial resolution and narrow baseline. While low spatial
    resolution limits the general purpose use and applicability of light field
    cameras, narrow baseline limits the depth estimation range and accuracy. In
    this paper, we present a hybrid stereo imaging system that includes a light
    field camera and a regular camera. The hybrid system addresses both spatial
    resolution and narrow baseline issues of the MLA-based light field cameras
    while preserving light field imaging capabilities.

    Light Field Stitching for Extended Synthetic Aperture

    M. Umair Mukati, Bahadir K. Gunturk
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Through capturing spatial and angular radiance distribution, light field
    cameras introduce new capabilities that are not possible with conventional
    cameras. So far in the light field imaging literature, the focus has been on
    the theory and applications of single light field capture. By combining
    multiple light fields, it is possible to obtain new capabilities and
    enhancements, and even exceed physical limitations, such as spatial resolution
    and aperture size of the imaging device. In this paper, we present an algorithm
    to register and stitch multiple light fields. We utilize the regularity of the
    spatial and angular sampling in light field data, and extend some techniques
    developed for stereo vision systems to light field data. Such an extension is
    not straightforward for a micro-lens array (MLA) based light field camera due
    to extremely small baseline and low spatial resolution. By merging multiple
    light fields captured by an MLA based camera, we obtain larger synthetic
    aperture, which results in improvements in light field capabilities, such as
    increased depth estimation range/accuracy and wider perspective shift range.

    One-to-Many Network for Visually Pleasing Compression Artifacts Reduction

    Jun Guo, Hongyang Chao
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We consider the compression artifacts reduction problem, where a compressed
    image is transformed into an artifact-free image. Recent approaches for this
    problem typically train a one-to-one mapping using a per-pixel (L_2) loss
    between the outputs and the ground-truths. We point out that these approaches
    tend to produce overly smooth results, and that PSNR does not reflect their real
    performance. In this paper, we propose a one-to-many network, which measures
    output quality using a perceptual loss, a naturalness loss, and a JPEG loss. We
    also avoid grid-like artifacts during deconvolution using a “shift-and-average”
    strategy. Extensive experimental results demonstrate the dramatic visual
    improvement of our approach over the state of the art.

    CIFAR-10: KNN-based Ensemble of Classifiers

    Yehya Abouelnaga, Ola S. Ali, Hager Rady, Mohamed Moustafa
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we study the performance of different classifiers on the
    CIFAR-10 dataset, and build an ensemble of classifiers to reach a better
    performance. We show that, on CIFAR-10, the errors of K-Nearest Neighbors (KNN)
    and Convolutional Neural Network (CNN) classifiers are, on some classes,
    mutually exclusive, and thus the two yield higher accuracy when combined. We reduce KNN overfitting using
    Principal Component Analysis (PCA), and ensemble it with a CNN to increase its
    accuracy. Our approach improves our best CNN model from 93.33% to 94.03%.
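    The ensembling step itself is simple to sketch. Below is a hedged illustration (not the paper's exact pipeline, and with made-up numbers): average the class-probability vectors of the two classifiers so that a confident, correct model can override an uncertain, wrong one.

    ```python
    # Weighted soft-voting ensemble of two classifiers' probability vectors.
    def ensemble(p_knn, p_cnn, w_knn=0.4, w_cnn=0.6):
        return [w_knn * a + w_cnn * b for a, b in zip(p_knn, p_cnn)]

    def argmax(xs):
        return max(range(len(xs)), key=xs.__getitem__)

    # Hypothetical 3-class case: KNN is confidently right, CNN is mildly wrong.
    p_knn = [0.7, 0.2, 0.1]
    p_cnn = [0.3, 0.4, 0.3]
    p = ensemble(p_knn, p_cnn)
    print(argmax(p_knn), argmax(p_cnn), argmax(p))  # 0 1 0
    ```

    Because the two models err on different classes, the combined prediction can be correct even when one member is wrong, which is the mechanism behind the reported accuracy gain.
    
    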

    Diversity encouraged learning of unsupervised LSTM ensemble for neural activity video prediction

    Yilin Song, Jonathan Viventi, Yao Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Being able to predict the neural signal in the near future from the current
    and previous observations has the potential to enable real-time responsive
    brain stimulation to suppress seizures. We have investigated how to use an
    auto-encoder model consisting of LSTM cells for such prediction. Recognizing
    that there exist multiple activity pattern clusters, we have further explored
    training an ensemble of LSTM models so that each model can specialize in
    modeling certain neural activities, without explicitly clustering the training
    data. We train the ensemble using an ensemble-awareness loss, which jointly
    solves the model assignment problem and the error minimization problem. During
    training, for each training sequence, only the model that has the lowest
    reconstruction and prediction error is updated. Intrinsically, such a loss
    function enables each LSTM model to adapt to a subset of the training sequences
    that share similar dynamic behavior. We demonstrate that the ensemble can be
    trained in an end-to-end manner and achieves significant accuracy in neural
    activity prediction.
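    The winner-take-all training rule is easy to demonstrate on a toy problem. In the sketch below (linear one-step predictors stand in for the paper's LSTM models), each sequence updates only the ensemble member with the lowest prediction error, so the members specialize on different dynamics.

    ```python
    # Toy ensemble-awareness (winner-take-all) training: per sequence, update
    # only the lowest-error member. Each member predicts x[t+1] = w * x[t].
    def train_wta(sequences, lr=0.1, epochs=200):
        weights = [0.5, 1.5]                    # two deliberately distinct members
        for _ in range(epochs):
            for seq in sequences:
                errs = [sum((w * a - b) ** 2 for a, b in zip(seq, seq[1:]))
                        for w in weights]
                k = min(range(len(weights)), key=errs.__getitem__)
                # Gradient step on the winning member only.
                grad = sum(2 * (weights[k] * a - b) * a for a, b in zip(seq, seq[1:]))
                weights[k] -= lr * grad / len(seq)
        return weights

    # Two regimes: a decaying sequence (x *= 0.8) and a growing one (x *= 1.2).
    decaying = [0.8 ** t for t in range(5)]
    growing = [1.2 ** t for t in range(5)]
    w = train_wta([decaying, growing])
    print([round(v, 2) for v in w])  # [0.8, 1.2]: each member owns one regime
    ```

    No explicit clustering is performed; the assignment emerges from the min-over-members loss, which is the point of the ensemble-awareness objective.
    
    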

    Constrained Low-Rank Learning Using Least Squares-Based Regularization

    Ping Li, Jun Yu, Meng Wang, Luming Zhang, Deng Cai, Xuelong Li
    Comments: 14 pages, 7 figures, accepted to appear in IEEE Transactions on Cybernetics
    Journal-ref: IEEE Transactions on Cybernetics, 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Low-rank learning has attracted much attention recently due to its efficacy
    in a rich variety of real-world tasks, e.g., subspace segmentation and image
    categorization. Most low-rank methods are incapable of capturing
    low-dimensional subspace for supervised learning tasks, e.g., classification
    and regression. This paper aims to learn both the discriminant low-rank
    representation (LRR) and the robust projecting subspace in a supervised manner.
    To achieve this goal, we cast the problem into a constrained rank minimization
    framework by adopting the least squares regularization. Naturally, the data
    label structure tends to resemble that of the corresponding low-dimensional
    representation, which is derived from the robust subspace projection of clean
    data by low-rank learning. Moreover, the low-dimensional representation of
    original data can be paired with some informative structure by imposing an
    appropriate constraint, e.g., Laplacian regularizer. Therefore, we propose a
    novel constrained LRR method. The objective function is formulated as a
    constrained nuclear norm minimization problem, which can be solved by the
    inexact augmented Lagrange multiplier algorithm. Extensive experiments on image
    classification, human pose estimation, and robust face recovery have confirmed
    the superiority of our method.

    Scale-constrained Unsupervised Evaluation Method for Multi-scale Image Segmentation

    Yuhang Lu, Youchuan Wan, Gang Li
    Comments: 5 pages, 2016 IEEE International Conference on Image Processing
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Unsupervised evaluation of segmentation quality is a crucial step in image
    segmentation applications. Previous unsupervised evaluation methods usually
    lacked the adaptability to multi-scale segmentation. A scale-constrained
    evaluation method that evaluates segmentation quality according to the
    specified target scale is proposed in this paper. First, regional saliency and
    merging cost are employed to describe intra-region homogeneity and inter-region
    heterogeneity, respectively. Subsequently, both of them are standardized into
    equivalent spectral distances of a predefined region. Finally, by analyzing the
    relationship between image characteristics and segmentation quality, we
    establish the evaluation model. Experimental results show that the proposed
    method outperforms four commonly used unsupervised methods in multi-scale
    evaluation tasks.

    Deeply supervised salient object detection with short connections

    Qibin Hou, Ming-Ming Cheng, Xiao-Wei Hu, Ali Borji, Zhuowen Tu, Philip Torr
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recent progress on saliency detection is substantial, benefiting mostly from
    the explosive development of Convolutional Neural Networks (CNNs). Semantic
    segmentation and saliency detection algorithms developed lately have been
    mostly based on Fully Convolutional Neural Networks (FCNs). There is still
    large room for improvement over generic FCN models that do not explicitly
    deal with the scale-space problem. The Holistically-Nested Edge Detector (HED)
    provides a skip-layer structure with deep supervision for edge and boundary
    detection, but the performance gain of HED on saliency detection is not
    obvious. In this paper, we propose a new method for saliency detection by
    introducing short connections to the skip-layer structures within the HED
    architecture. Our framework provides rich multi-scale feature maps at each
    layer, a property that is critically needed to perform segment detection. Our
    method produces state-of-the-art results on 5 widely tested salient object
    detection benchmarks, with advantages in terms of efficiency (0.15 seconds per
    image), effectiveness, and simplicity over the existing algorithms.

    Multilinear Low-Rank Tensors on Graphs & Applications

    Nauman Shahid, Francesco Grassi, Pierre Vandergheynst
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    We propose a new framework for the analysis of low-rank tensors which lies at
    the intersection of spectral graph theory and signal processing. As a first
    step, we present a new graph based low-rank decomposition which approximates
    the classical low-rank SVD for matrices and multi-linear SVD for tensors. Then,
    building on this novel decomposition we construct a general class of convex
    optimization problems for approximately solving low-rank tensor inverse
    problems, such as tensor Robust PCA. We name the whole framework
    ‘Multilinear Low-rank tensors on Graphs (MLRTG)’. Our theoretical analysis
    shows: 1) MLRTG stands on the notion of approximate stationarity of
    multi-dimensional signals on graphs and 2) the approximation error depends on
    the eigen gaps of the graphs. We demonstrate applications for a wide variety of
    4 artificial and 12 real tensor datasets, such as EEG, FMRI, BCI, surveillance
    videos and hyperspectral images. Generalization of the tensor concepts to the
    non-Euclidean domain, orders-of-magnitude speed-up, low memory requirements and
    significantly enhanced performance at low SNR are the key aspects of our
    framework.

    Feature Extraction and Soft Computing Methods for Aerospace Structure Defect Classification

    Gianni D'Angelo, Salvatore Rampone
    Journal-ref: Measurement Volume 85, May 2016, Pages 192-209
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This study concerns the effectiveness of several techniques and methods of
    signal processing and data interpretation for the diagnosis of aerospace
    structure defects. This is done by applying different known feature extraction
    methods, in addition to a new CBIR-based one, together with some soft computing
    techniques, including a recent HPC parallel implementation of the U-BRAIN
    learning algorithm, to Non-Destructive Testing data. The performance of the
    resulting detection systems is measured in terms of Accuracy, Sensitivity,
    Specificity, and Precision. Their effectiveness is evaluated by the Matthews
    correlation, the Area Under Curve (AUC), and the F-Measure. Several experiments
    are performed on a standard dataset of eddy current signal samples for aircraft
    structures. Our experimental results show that the key to a successful
    defect classifier is the feature extraction method (the novel
    CBIR-based one outperforms all the competitors), and they illustrate the
    greater effectiveness of the U-BRAIN algorithm and the MLP neural network among
    the soft computing methods in this kind of application.

    Keywords- Non-destructive testing (NDT); Soft Computing; Feature Extraction;
    Classification Algorithms; Content-Based Image Retrieval (CBIR); Eddy Currents
    (EC).

    Motion Estimated-Compensated Reconstruction with Preserved-Features in Free-Breathing Cardiac MRI

    Aurelien Bustin, Anne Menini, Martin A. Janich, Darius Burschka, Jacques Felblinger, Anja C.S. Brau, Freddy Odille
    Comments: 12 pages, 6 figures, accepted at MICCAI 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)

    To develop an efficient motion-compensated reconstruction technique for
    free-breathing cardiac magnetic resonance imaging (MRI) that allows
    high-quality images to be reconstructed from multiple undersampled single-shot
    acquisitions. The proposed method is a joint image reconstruction and motion
    correction method consisting of several steps, including a non-rigid motion
    extraction and a motion-compensated reconstruction. The reconstruction includes
    a denoising with the Beltrami regularization, which offers an ideal compromise
    between feature preservation and staircasing reduction. Results were assessed
    in simulation, phantom and volunteer experiments. The proposed joint image
    reconstruction and motion correction method exhibits visible quality
    improvement over previous methods while reconstructing sharper edges. Moreover,
    when the acceleration factor increases, standard methods show blurry results
    while the proposed method preserves image quality. The method was applied to
    free-breathing single-shot cardiac MRI, successfully achieving high image
    quality and higher spatial resolution than conventional segmented methods, with
    the potential to offer high-quality delayed enhancement scans in challenging
    patients.

    Audio Event and Scene Recognition: A Unified Approach using Strongly and Weakly Labeled Data

    Anurag Kumar, Bhiksha Raj
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)

    In this paper we propose a novel learning framework called Supervised and
    Weakly Supervised Learning where the goal is to learn simultaneously from
    weakly and strongly labeled data. Strongly labeled data can be simply
    understood as fully supervised data where all labeled instances are available.
    In weakly supervised learning only weak labels are available. Our proposed
    framework is motivated by the fact that a small amount of strongly labeled data
    can give considerable improvement over only weakly supervised learning. The
    primary problem domain focus of this paper is acoustic event and scene
    detection in audio recordings. We first propose a naive formulation for
    leveraging labeled data in both forms. We then propose a more general framework
    for Supervised and Weakly Supervised Learning (SWSL). Based on this general
    framework, we propose a graph-based approach for SWSL. Our main method is based
    on manifold regularization on graphs, in which we show that the unified learning
    problem can be formulated as a constrained optimization problem solvable by the
    iterative concave-convex procedure (CCCP). Our experiments show that our
    proposed framework can address several concerns of audio content analysis using
    weakly labeled data.
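    The graph intuition can be sketched with plain label propagation, a much simpler relative of manifold regularization (this is a stand-in, not the paper's CCCP-based SWSL solver): labels from strongly labeled instances spread along graph edges to unlabeled or weakly labeled instances.

    ```python
    # Label propagation on a graph: f <- (1-alpha)*y + alpha * D^-1 A f (sketch).
    def label_propagation(adj, labels, alpha=0.8, iters=100):
        n = len(adj)
        deg = [sum(row) or 1 for row in adj]
        y = [l if l is not None else 0.0 for l in labels]
        f = y[:]
        for _ in range(iters):
            f = [(1 - alpha) * y[i]
                 + alpha * sum(adj[i][j] * f[j] for j in range(n)) / deg[i]
                 for i in range(n)]
        return f

    # Chain graph 0-1-2-3-4; node 0 labeled +1, node 4 labeled -1.
    adj = [[0, 1, 0, 0, 0],
           [1, 0, 1, 0, 0],
           [0, 1, 0, 1, 0],
           [0, 0, 1, 0, 1],
           [0, 0, 0, 1, 0]]
    scores = label_propagation(adj, [1.0, None, None, None, -1.0])
    print([round(s, 2) for s in scores])
    ```

    Nodes near the positive seed end up with positive scores and nodes near the negative seed with negative ones, illustrating how a small amount of strong supervision can constrain predictions over the whole graph.
    
    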

    When Saliency Meets Sentiment: Understanding How Image Content Invokes Emotion and Sentiment

    Honglin Zheng, Tianlang Chen, Jiebo Luo
    Comments: 7 pages, 5 figures, submitted to AAAI-17
    Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    Sentiment analysis is crucial for extracting social signals from social media
    content. Due to the prevalence of images in social media, image sentiment
    analysis is receiving increasing attention in recent years. However, most
    existing systems are black-boxes that do not provide insight on how image
    content invokes sentiment and emotion in the viewers. Psychological studies
    have confirmed that salient objects in an image often invoke emotions. In this
    work, we investigate more fine-grained and more comprehensive interaction
    between visual saliency and visual sentiment. In particular, we partition
    images along several primary scene-type dimensions, including open-closed,
    natural-manmade, indoor-outdoor, and face-noface. Using state-of-the-art
    saliency detection and sentiment classification algorithms, we examine
    how the sentiment of the salient region(s) in an image relates to the overall
    sentiment of the image. Experiments on a representative image emotion
    dataset show interesting correlations between saliency and sentiment in
    different scene types and in turn shed light on the mechanism of visual
    sentiment evocation.


    Artificial Intelligence

    An integrated Graphical User Interface for Debugging Answer Set Programs

    Philip Gasteiger, Carmine Dodaro, Benjamin Musitsch, Kristian Reale, Francesco Ricca, Konstantin Schekotihin
    Comments: Paper presented at the 1st Workshop on Trends and Applications of Answer Set Programming (TAASP 2016), Klagenfurt, Austria, 26 September 2016, 15 pages, LaTeX, 5 figures
    Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)

    Answer Set Programming (ASP) is an expressive knowledge representation and
    reasoning framework. Due to its rather simple syntax paired with
    high-performance solvers, ASP is interesting for industrial applications.
    However, to err is human and thus debugging is an important activity during the
    development process. Therefore, tools for debugging non-ground answer set
    programs are needed. In this paper, we present a new graphical debugging
    interface for non-ground answer set programs. The tool is based on the
    recently-introduced DWASP approach for debugging and it simplifies the
    interaction with the debugger. Furthermore, the debugging interface is
    integrated in ASPIDE, a rich IDE for answer set programs. With our extension
    ASPIDE turns into a full-fledged IDE by offering debugging support.

    An Evaluation of Information Sharing Parking Guidance Policies Using a Bayesian Approach

    Xinyi Wu, Kartik Balkumar, Qi Luo, Robert Hampshire, Romesh Saigal
    Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Robotics (cs.RO)

    Real-time parking occupancy information is critical for a parking management
    system to facilitate drivers to park more efficiently. Recent advances in
    connected and automated vehicle technologies enable sensor-equipped cars (probe
    cars) to detect and broadcast available parking spaces when driving through
    parking lots. In this paper, we evaluate the impact of market penetration of
    probe cars on the system performance, and investigate different parking
    guidance policies to improve the data acquisition process. We adopt a
    simulation-based approach, imposing four policies on an off-street parking lot
    that influence the behavior of probe cars to park in assigned parking spaces.
    This in turn affects the scanning route and the parking space occupancy
    estimations. The last policy we propose is a near-optimal guidance strategy
    that maximizes the information gain of posteriors. The results suggest that an
    efficient information gathering policy can compensate for a low penetration of
    connected and automated vehicles. We also highlight the policy trade-offs that occur while
    attempting to maximize information gain through explorations and improve
    assignment accuracy through exploitations. Our results can assist urban policy
    makers in designing and managing smart parking systems.
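    The "maximize information gain of posteriors" idea can be made concrete in a small, self-contained way. The sketch below (illustrative only; the paper's model is richer) keeps a discretized posterior over a lot's occupancy rate and computes the expected entropy reduction from having a probe car observe one more space.

    ```python
    # Expected information gain of one Bernoulli observation under a
    # discretized posterior over the occupancy rate (sketch).
    import math

    def entropy(dist):
        return -sum(p * math.log(p) for p in dist if p > 0)

    def expected_info_gain(posterior, grid):
        # Predictive probability that the next observed space is occupied.
        p_occ = sum(w * g for w, g in zip(posterior, grid))
        expected_post_entropy = 0.0
        for obs, p_obs in ((1, p_occ), (0, 1 - p_occ)):
            like = [g if obs else 1 - g for g in grid]   # Bernoulli likelihood
            post = [w * l for w, l in zip(posterior, like)]
            z = sum(post)
            post = [p / z for p in post]                  # Bayes update
            expected_post_entropy += p_obs * entropy(post)
        return entropy(posterior) - expected_post_entropy

    grid = [0.1, 0.3, 0.5, 0.7, 0.9]           # candidate occupancy rates
    flat = [0.2] * 5                            # uninformative prior
    peaked = [0.01, 0.02, 0.03, 0.04, 0.9]     # already near-certain posterior
    print(expected_info_gain(flat, grid) > expected_info_gain(peaked, grid))  # True
    ```

    A guidance policy built on this quantity sends probe cars where the posterior is most uncertain, which is why it can compensate for a low probe-car penetration rate.
    
    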

    #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning

    Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel
    Comments: 16 pages. Under review as a conference paper at ICLR 2017
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    Count-based exploration algorithms are known to perform near-optimally when
    used in conjunction with tabular reinforcement learning (RL) methods for
    solving small discrete Markov decision processes (MDPs). It is generally
    thought that count-based methods cannot be applied in high-dimensional state
    spaces, since most states will only occur once. Recent deep RL exploration
    strategies are able to deal with high-dimensional continuous state spaces
    through complex heuristics, often relying on optimism in the face of
    uncertainty or intrinsic motivation. In this work, we describe a surprising
    finding: a simple generalization of the classic count-based approach can reach
    near state-of-the-art performance on various high-dimensional and/or continuous
    deep RL benchmarks. States are mapped to hash codes, which allows their
    occurrences to be counted with a hash table. These counts are then used to
    compute a reward bonus according to classic count-based exploration theory. We find
    that simple hash functions can achieve surprisingly good results on many
    challenging tasks. Furthermore, we show that a domain-dependent learned hash
    code may further improve these results. Detailed analysis reveals important
    aspects of a good hash function: 1) having appropriate granularity and 2)
    encoding information relevant to solving the MDP. This exploration strategy
    achieves near state-of-the-art performance on both continuous control tasks and
    Atari 2600 games, hence providing a simple yet powerful baseline for solving
    MDPs that require considerable exploration.
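    The recipe above fits in a few lines. In this minimal sketch, a simple rounding hash stands in for the paper's SimHash or learned codes, and the exploration bonus is beta / sqrt(n(s)) as in classic count-based theory.

    ```python
    # Count-based exploration bonus over hashed states (sketch).
    import math
    from collections import defaultdict

    class CountBonus:
        def __init__(self, beta=1.0, cell=0.5):
            self.beta = beta
            self.cell = cell
            self.counts = defaultdict(int)

        def hash_state(self, state):
            # Round each coordinate to a grid cell; any stable hash code works
            # here (the paper uses SimHash or a learned, domain-dependent code).
            return tuple(round(x / self.cell) for x in state)

        def bonus(self, state):
            code = self.hash_state(state)
            self.counts[code] += 1
            return self.beta / math.sqrt(self.counts[code])

    b = CountBonus()
    print(b.bonus([0.1, 0.2]))    # first visit to this cell -> 1.0
    print(b.bonus([0.12, 0.19]))  # same cell again -> 1/sqrt(2)
    print(b.bonus([3.0, 3.0]))    # novel cell -> 1.0
    ```

    Adding this bonus to the environment reward steers the agent toward rarely visited regions of the state space; the choice of hash granularity corresponds to the paper's finding about appropriate code granularity.
    
    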

    Causal Inference in Observational Data

    Pranjul Yadav, Lisiane Prunelli, Alexander Hoff, Michael Steinbach, Bonnie Westra, Vipin Kumar, Gyorgy Simon
    Subjects: Artificial Intelligence (cs.AI); Applications (stat.AP)

    Our aging population increasingly suffers from multiple chronic diseases
    simultaneously, necessitating the comprehensive treatment of these conditions.
    Finding the optimal set of drugs for a combinatorial set of diseases is a
    combinatorial pattern exploration problem. Association rule mining is a popular
    tool for such problems, but the requirement of health care for finding causal,
    rather than associative, patterns renders association rule mining unsuitable.
    To address this issue, we propose a novel framework based on the Rubin-Neyman
    causal model for extracting causal rules from observational data, correcting
    for a number of common biases. Specifically, given a set of interventions and a
    set of items that define subpopulations (e.g., diseases), we wish to find all
    subpopulations in which effective intervention combinations exist and in each
    such subpopulation, we wish to find all intervention combinations such that
    dropping any intervention from this combination will reduce the efficacy of the
    treatment. A key aspect of our framework is the concept of closed intervention
    sets which extend the concept of quantifying the effect of a single
    intervention to a set of concurrent interventions. We also evaluated our causal
    rule mining framework on the Electronic Health Records (EHR) data of a large
    cohort of patients from Mayo Clinic and showed that the patterns we extracted
    are sufficiently rich to explain the controversial findings in the medical
    literature regarding the effect of a class of cholesterol drugs on Type-II
    Diabetes Mellitus (T2DM).

    Implicit ReasoNet: Modeling Large-Scale Structured Relationships with Shared Memory

    Yelong Shen, Po-Sen Huang, Ming-Wei Chang, Jianfeng Gao
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

    Recent studies on knowledge base completion, the task of recovering missing
    relationships based on recorded relations, demonstrate the importance of
    learning embeddings from multi-step relations. However, due to the size of
    knowledge bases, learning multi-step relations directly on top of observed
    instances could be costly. In this paper, we propose Implicit ReasoNets (IRNs),
    which are designed to perform large-scale inference implicitly through a search
    controller and shared memory. Unlike previous work, IRNs use training data to
    learn to perform multi-step inference through the shared memory, which is also
    jointly updated during training. While the inference procedure is not operating
    on top of observed instances for IRNs, our proposed model outperforms all
    previous approaches on the popular FB15k benchmark by more than 5.7%.

    When Saliency Meets Sentiment: Understanding How Image Content Invokes Emotion and Sentiment

    Honglin Zheng, Tianlang Chen, Jiebo Luo
    Comments: 7 pages, 5 figures, submitted to AAAI-17
    Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    Sentiment analysis is crucial for extracting social signals from social media
    content. Due to the prevalence of images in social media, image sentiment
    analysis is receiving increasing attention in recent years. However, most
    existing systems are black-boxes that do not provide insight on how image
    content invokes sentiment and emotion in the viewers. Psychological studies
    have confirmed that salient objects in an image often invoke emotions. In this
    work, we investigate more fine-grained and more comprehensive interaction
    between visual saliency and visual sentiment. In particular, we partition
    images along several primary scene-type dimensions, including open-closed,
    natural-manmade, indoor-outdoor, and face-noface. Using a state-of-the-art
    saliency detection algorithm and a sentiment classification algorithm, we examine
    how the sentiment of the salient region(s) in an image relates to the overall
    sentiment of the image. Experiments on a representative image emotion
    dataset show interesting correlations between saliency and sentiment in
    different scene types and in turn shed light on the mechanism of visual
    sentiment evocation.

    The NOESIS Network-Oriented Exploration, Simulation, and Induction System

    Víctor Martínez, Fernando Berzal, Juan-Carlos Cubero
    Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)

    Network data mining has become an important area of study due to the large
    number of problems it can be applied to. This paper presents NOESIS, an open
    source framework for network data mining that provides a large collection of
    network analysis techniques, including the analysis of network structural
    properties, community detection methods, link scoring, and link prediction, as
    well as network visualization algorithms. It also features a complete
    stand-alone graphical user interface that facilitates the use of all these
    techniques. The NOESIS framework has been designed using solid object-oriented
    design principles and structured parallel programming. As a lightweight library
    with minimal external dependencies and a permissive software license, NOESIS
    can be incorporated into other software projects. Released under a BSD license,
    it is available from this http URL

    Recoverability of Joint Distribution from Missing Data

    Jin Tian
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI)

    A probabilistic query may not be estimable from observed data corrupted by
    missing values if the data are not missing at random (MAR). It is therefore of
    theoretical interest and practical importance to determine in principle whether
    a probabilistic query is estimable from missing data or not when the data are
    not MAR. We present an algorithm that systematically determines whether the
    joint probability is estimable from observed data with missing values, assuming
    that the data-generation model is represented as a Bayesian network containing
    unobserved latent variables that not only encodes the dependencies among the
    variables but also explicitly portrays the mechanisms responsible for the
    missingness process. The result significantly advances the existing work.


    Information Retrieval

    A Generic Coordinate Descent Framework for Learning from Implicit Feedback

    Immanuel Bayer, Xiangnan He, Bhargav Kanagal, Steffen Rendle
    Subjects: Information Retrieval (cs.IR); Learning (cs.LG)

    In recent years, interest in recommender research has shifted from explicit
    feedback towards implicit feedback data. A diversity of complex models has been
    proposed for a wide variety of applications. Despite this, learning from
    implicit feedback is still computationally challenging. So far, most work
    relies on stochastic gradient descent (SGD) solvers which are easy to derive,
    but in practice challenging to apply, especially for tasks with many items. For
    the simple matrix factorization model, an efficient coordinate descent (CD)
    solver has been previously proposed. However, efficient CD approaches have not
    been derived for more complex models.

    In this paper, we provide a new framework for deriving efficient CD
    algorithms for complex recommender models. We identify and introduce the
    property of k-separable models. We show that k-separability is a sufficient
    property to allow efficient optimization of implicit recommender problems with
    CD. We illustrate this framework on a variety of state-of-the-art models
    including factorization machines and Tucker decomposition. To summarize, our
    work provides the theory and building blocks to derive efficient implicit CD
    algorithms for complex recommender models.
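As a rough illustration of what an element-wise CD solver looks like in the simple matrix factorization case, here is a toy dense version (the whole-data treatment of unobserved cells as zeros, the variable names, and the per-column update order are our assumptions, not the paper's k-separable formulation):

```python
import numpy as np

def cd_implicit_mf(R, k=2, reg=0.1, iters=100, seed=0):
    """Toy coordinate descent for matrix factorization on an implicit-feedback
    matrix R (unobserved cells treated as zeros). Efficient solvers avoid
    materializing the dense residual; this sketch keeps it dense for clarity."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = rng.normal(scale=0.1, size=(m, k))
    V = rng.normal(scale=0.1, size=(n, k))
    for _ in range(iters):
        E = R - U @ V.T                       # full residual
        for f in range(k):
            E += np.outer(U[:, f], V[:, f])   # residual excluding factor f
            # closed-form ridge updates for the f-th latent column
            U[:, f] = (E @ V[:, f]) / (V[:, f] @ V[:, f] + reg)
            V[:, f] = (E.T @ U[:, f]) / (U[:, f] @ U[:, f] + reg)
            E -= np.outer(U[:, f], V[:, f])   # add updated factor back
    return U, V
```

Each inner step minimizes the regularized squared error over one latent column in closed form, so the objective decreases monotonically across sweeps.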

    Anchor-Free Correlated Topic Modeling: Identifiability and Algorithm

    Kejun Huang, Xiao Fu, Nicholas D. Sidiropoulos
    Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

    In topic modeling, many algorithms that guarantee identifiability of the
    topics have been developed under the premise that there exist anchor words —
    i.e., words that only appear (with positive probability) in one topic.
    Follow-up work has resorted to three or higher-order statistics of the data
    corpus to relax the anchor word assumption. Reliable estimates of higher-order
    statistics are hard to obtain, however, and the identification of topics under
    those models hinges on uncorrelatedness of the topics, which can be
    unrealistic. This paper revisits topic modeling based on second-order moments,
    and proposes an anchor-free topic mining framework. The proposed approach
    guarantees the identification of the topics under a much milder condition
    compared to the anchor-word assumption, thereby exhibiting much better
    robustness in practice. The associated algorithm only involves one
    eigen-decomposition and a few small linear programs. This makes it easy to
    implement and scale up to very large problem instances. Experiments using the
    TDT2 and Reuters-21578 corpora demonstrate that the proposed anchor-free
    approach exhibits very favorable performance (measured using coherence,
    similarity count, and clustering accuracy metrics) compared to the prior art.


    Computation and Language

    Recurrent Neural Network based Part-of-Speech Tagger for Code-Mixed Social Media Text

    Raj Nath Patel, Prakash B. Pimpale, M Sasikumat
    Comments: 7 pages, Published at the Tool Contest on POS tagging for Indian Social Media Text, ICON 2016
    Journal-ref: In the Proceedings of Tool Contest on POS tagging for Indian
    Social Media Text, ICON 2016
    Subjects: Computation and Language (cs.CL)

    This paper describes Centre for Development of Advanced Computing’s (CDACM)
    submission to the shared task-‘Tool Contest on POS tagging for Code-Mixed
    Indian Social Media (Facebook, Twitter, and Whatsapp) Text’, collocated with
    ICON-2016. The shared task was to predict Part of Speech (POS) tag at word
    level for a given text. The code-mixed text is generated mostly on social media
    by multilingual users. The presence of multilingual words,
    transliterations, and spelling variations makes such content linguistically
    complex. In this paper, we propose an approach to POS tag code-mixed social
    media text using Recurrent Neural Network Language Model (RNN-LM) architecture.
    We submitted the results for Hindi-English (hi-en), Bengali-English (bn-en),
    and Telugu-English (te-en) code-mixed data.

    End-to-End Neural Sentence Ordering Using Pointer Network

    Jingjing Gong, Xinchi Chen, Xipeng Qiu, Xuanjing Huang
    Subjects: Computation and Language (cs.CL)

    Sentence ordering is one of the important tasks in NLP. Previous works mainly
    focused on improving its performance with pair-wise strategies. However, it
    is nontrivial for pair-wise models to incorporate contextual sentence
    information. In addition, error propagation can be introduced by the
    pipeline strategy in pair-wise models. In this paper, we propose an end-to-end
    neural approach to address the sentence ordering problem, which uses the
    pointer network (Ptr-Net) to alleviate the error propagation problem and
    utilize the whole contextual information. Experimental results show the
    effectiveness of the proposed model. Source codes and dataset of this paper are
    available.

    Neural Machine Translation with Pivot Languages

    Yong Cheng, Yang Liu, Qian Yang, Maosong Sun, Wei Xu
    Subjects: Computation and Language (cs.CL)

    Neural machine translation systems typically rely on the size of parallel
    corpora. Nevertheless, high-quality parallel corpora are scarce resources for
    specific language pairs and domains. For a source-to-target language pair with
    a small parallel corpus, we introduce the pivot language to “bridge” source
    language and target language under the existence of large source-to-pivot and
    pivot-to-target parallel corpora. We propose three kinds of connection terms to
    jointly train source-to-pivot and pivot-to-target translation models in order
    to enhance the interaction between two sets of model parameters. Experiments on
    German-French and Spanish-French translation tasks with English as the pivot
    language show that our joint training approach improves translation quality
    significantly over independent training in the source-to-pivot, pivot-to-target,
    and source-to-target directions.

    Interpreting the Syntactic and Social Elements of the Tweet Representations via Elementary Property Prediction Tasks

    J Ganesh, Manish Gupta, Vasudeva Varma
    Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
    Subjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI)

    Research in social media analysis is experiencing a recent surge with a large
    number of works applying representation learning models to solve high-level
    syntactico-semantic tasks such as sentiment analysis, semantic textual
    similarity computation, hashtag prediction, and so on. Although the performance
    of representation learning models is better than that of traditional baselines
    on these tasks, little is known about the core properties of a tweet encoded
    within the representations. Understanding these core properties would empower
    us in making generalizable conclusions about the quality of representations.
    Our work presented here constitutes the first step in opening the black-box of
    vector embedding for social media posts, with emphasis on tweets in particular.

    In order to understand the core properties encoded in a tweet representation,
    we evaluate the representations to estimate the extent to which they can model
    properties such as tweet length, presence of words, hashtags,
    mentions, capitalization, and so on. This is done with the help of multiple
    classifiers which take the representation as input. Essentially, each
    classifier evaluates one of the syntactic or social properties which are
    arguably salient for a tweet. This is also the first holistic study on
    extensively analysing the ability to encode these properties for a wide variety
    of tweet representation models including the traditional unsupervised methods
    (BOW, LDA), unsupervised representation learning methods (Siamese CBOW,
    Tweet2Vec) as well as supervised methods (CNN, BLSTM).

    Quantitative Entropy Study of Language Complexity

    R.R. Xie, W.B. Deng, D.J. Wang, L.P. Csernai
    Subjects: Computation and Language (cs.CL); Physics and Society (physics.soc-ph)

    We study the entropy of Chinese and English texts, based on characters in
    case of Chinese texts and based on words for both languages. Significant
    differences are found between the languages and between different personal
    styles of debating partners. The entropy analysis points in the direction of
    lower entropy, that is, of higher complexity. Such a text analysis could be
    applied to individuals of different styles, a single individual at different
    ages, as well as different groups of the population.
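The word-level entropy that such studies rely on can be computed directly; this is a generic Shannon entropy over word frequencies (whitespace tokenization is a simplification, and Chinese character-level analysis would segment differently):

```python
import math
from collections import Counter

def word_entropy(text):
    """Shannon entropy (in bits) of the word distribution of a text."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A text that repeats one word has entropy 0 bits, while a uniform mix of two words gives exactly 1 bit, so richer vocabularies and more even usage drive the measure up.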

    Lost in Space: Geolocation in Event Data

    Sophie J. Lee, Howard Liu, Michael D. Ward
    Subjects: Computation and Language (cs.CL)

    Extracting the “correct” location information from text data, i.e.,
    determining the place of an event, has long been a goal for automated text
    processing. To approximate human-like coding schema, we introduce a supervised
    machine learning algorithm that classifies each location word to be either
    correct or incorrect. We use news articles collected from around the world
    (Integrated Crisis Early Warning System [ICEWS] data and Open Event Data
    Alliance [OEDA] data) to test our algorithm that consists of two stages. In the
    feature selection stage, we extract contextual information from texts, namely,
    the N-gram patterns for location words, the frequency of mention, and the
    context of the sentences containing location words. In the classification
    stage, we use three classifiers to estimate the model parameters in the
    training set and then to predict whether a location word in the test set news
    articles is the place of the event. The validation results show that our
    algorithm improves the accuracy of current dictionary-based geolocation
    methods by as much as 25%.

    SimDoc: Topic Sequence Alignment based Document Similarity Framework

    Gaurav Maheshwari, Priyansh Trivedi, Harshita Sahijwani, Kunal Jha, Sourish Dasgupta, Jens Lehmann
    Subjects: Computation and Language (cs.CL)

    Document similarity is the problem of formally representing textual documents
    and then proposing a similarity measure that can be used to compute the
    linguistic similarity between two documents. Accurate document similarity
    computation improves many enterprise relevant tasks such as document
    clustering, text mining, and question-answering. Most contemporary techniques
    employ bag-of-words (BoW) based document representation models. In this paper,
    we show that a document’s thematic flow, which is often disregarded by
    bag-of-word techniques, is pivotal in estimating their semantic similarity. In
    this direction, we propose a novel semantic document similarity framework,
    called SimDoc. We model documents as topic-sequences, where topics represent
    latent generative clusters of related words. We then use a sequence alignment
    algorithm, that has been adapted from the Smith-Waterman gene-sequencing
    algorithm, to estimate their semantic similarity. For similarity computation at
    a finer granularity, we tune the alignment algorithm by integrating it with a
    word embedding matrix based topic-to-topic similarity measure. A document level
    similarity score is then computed by again using the sequence alignment
    algorithm over all sentence pairs. In our experiments, we see that SimDoc
    outperforms many contemporary bag-of-words techniques in accurately computing
    document similarity, and on practical applications such as document clustering.
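The alignment step can be illustrated with a Smith-Waterman-style local alignment over topic sequences (the gap penalty and the pluggable `topic_sim` function are illustrative stand-ins for the paper's embedding-based topic-to-topic similarity):

```python
def align_topic_sequences(seq_a, seq_b, topic_sim, gap=-0.5):
    """Smith-Waterman-style local alignment score between two topic sequences.
    topic_sim(t1, t2) should return a similarity score, positive for related
    topics and negative otherwise."""
    n, m = len(seq_a), len(seq_b)
    # H[i][j] = best local alignment score ending at seq_a[i-1], seq_b[j-1]
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = H[i - 1][j - 1] + topic_sim(seq_a[i - 1], seq_b[j - 1])
            H[i][j] = max(0.0, match, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

Because the recurrence floors every cell at zero, only contiguous runs of thematically similar topics contribute, which is what lets the flow of a document matter rather than its bag of words.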

    Toward Multilingual Neural Machine Translation with Universal Encoder and Decoder

    Thanh-Le Ha, Jan Niehues, Alexander Waibel
    Subjects: Computation and Language (cs.CL)

    In this paper, we present our first attempts in building a multilingual
    Neural Machine Translation framework under a unified approach. We are then able
    to employ attention-based NMT for many-to-many multilingual translation tasks.
    Our approach does not require any special treatment of the network architecture
    and allows us to learn a minimal number of free parameters in a standard way
    of training. Our approach has shown its effectiveness in an under-resourced
    translation scenario with considerable improvements of up to 2.6 BLEU points. In
    addition, the approach has achieved interesting and promising results when
    applied to translation tasks where there is no direct parallel corpus between
    source and target languages.

    A Neural Architecture Mimicking Humans End-to-End for Natural Language Inference

    Biswajit Paria, K. M. Annervaz, Ambedkar Dukkipati, Ankush Chatterjee, Sanjay Podder
    Comments: 8 pages, 2 figures
    Subjects: Computation and Language (cs.CL)

    In this work we use the recent advances in representation learning to propose
    a neural architecture for the problem of natural language inference. Our
    approach is aligned to mimic how a human does the natural language inference
    process given two statements. The model uses variants of Long Short Term Memory
    (LSTM), attention mechanism and composable neural networks, to carry out the
    task. Each part of our model can be mapped to a clear function that humans
    perform in carrying out the overall task of natural language inference. The
    model is end-to-end differentiable, enabling training by stochastic gradient
    descent. On the Stanford Natural Language Inference (SNLI) dataset, the
    proposed model achieves better accuracy than all previously published models.

    Knowledge Enhanced Hybrid Neural Network for Text Matching

    Yu Wu, Wei Wu, Zhoujun Li, Ming Zhou
    Subjects: Computation and Language (cs.CL)

    Long text poses a big challenge to semantic matching due to its
    complicated semantic and syntactic structures. To tackle the challenge, we
    consider using prior knowledge to help identify useful information and filter
    out noise for matching in long text. To this end, we propose a knowledge
    enhanced hybrid neural network (KEHNN). The model fuses prior knowledge into
    word representations by knowledge gates and establishes three matching channels
    with words, sequential structures of sentences given by Gated Recurrent Units
    (GRU), and knowledge enhanced representations. The three channels are processed
    by a convolutional neural network to generate high level features for matching,
    and the features are synthesized as a matching score by a multilayer
    perceptron. The model extends the existing methods by conducting matching on
    words, local structures of sentences, and global context of sentences.
    Evaluation results from extensive experiments on public data sets for question
    answering and conversation show that KEHNN can significantly outperform
    state-of-the-art matching models and particularly improve the performance
    on pairs with long text.

    Anchor-Free Correlated Topic Modeling: Identifiability and Algorithm

    Kejun Huang, Xiao Fu, Nicholas D. Sidiropoulos
    Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

    In topic modeling, many algorithms that guarantee identifiability of the
    topics have been developed under the premise that there exist anchor words —
    i.e., words that only appear (with positive probability) in one topic.
    Follow-up work has resorted to three or higher-order statistics of the data
    corpus to relax the anchor word assumption. Reliable estimates of higher-order
    statistics are hard to obtain, however, and the identification of topics under
    those models hinges on uncorrelatedness of the topics, which can be
    unrealistic. This paper revisits topic modeling based on second-order moments,
    and proposes an anchor-free topic mining framework. The proposed approach
    guarantees the identification of the topics under a much milder condition
    compared to the anchor-word assumption, thereby exhibiting much better
    robustness in practice. The associated algorithm only involves one
    eigen-decomposition and a few small linear programs. This makes it easy to
    implement and scale up to very large problem instances. Experiments using the
    TDT2 and Reuters-21578 corpora demonstrate that the proposed anchor-free
    approach exhibits very favorable performance (measured using coherence,
    similarity count, and clustering accuracy metrics) compared to the prior art.

    The Role of Word Length in Semantic Topology

    Francesco Fumarola
    Comments: 17 pages, 10 figures
    Subjects: Neurons and Cognition (q-bio.NC); Computation and Language (cs.CL)

    A topological argument is presented concerning the structure of semantic
    space, based on the negative correlation between polysemy and word length. The
    resulting graph structure is applied to the modeling of free-recall
    experiments, resulting in predictions on the comparative values of recall
    probabilities. Associative recall is found to favor longer words whereas
    sequential recall is found to favor shorter words. Data from the PEERS
    experiments of Lohnas et al. (2015) and Healey and Kahana (2016) confirm both
    predictions, with correlation coefficients (r_{seq}= -0.17) and (r_{ass}=
    +0.17). The argument is then applied to predicting global properties of list
    recall, which leads to a novel explanation for the word-length effect based on
    the optimization of retrieval strategies.

    Implicit ReasoNet: Modeling Large-Scale Structured Relationships with Shared Memory

    Yelong Shen, Po-Sen Huang, Ming-Wei Chang, Jianfeng Gao
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

    Recent studies on knowledge base completion, the task of recovering missing
    relationships based on recorded relations, demonstrate the importance of
    learning embeddings from multi-step relations. However, due to the size of
    knowledge bases, learning multi-step relations directly on top of observed
    instances could be costly. In this paper, we propose Implicit ReasoNets (IRNs),
    which are designed to perform large-scale inference implicitly through a search
    controller and shared memory. Unlike previous work, IRNs use training data to
    learn to perform multi-step inference through the shared memory, which is also
    jointly updated during training. While the inference procedure is not operating
    on top of observed instances for IRNs, our proposed model outperforms all
    previous approaches on the popular FB15k benchmark by more than 5.7%.


    Distributed, Parallel, and Cluster Computing

    HPAT: High Performance Analytics with Scripting Ease-of-Use

    Ehsan Totoni, Todd A. Anderson, Tatiana Shpeisman
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Big data analytics requires high programmer productivity and high performance
    simultaneously on large-scale clusters. However, current big data analytics
    frameworks (e.g. Apache Spark) have high runtime overheads since they are
    library-based. Given the characteristics of the data analytics domain, we
    introduce the High Performance Analytics Toolkit (HPAT), which is a big data
    analytics framework that performs static compilation of high-level scripting
    programs into high performance parallel code using novel domain-specific
    compilation techniques. HPAT provides scripting abstractions in the Julia
    language for analytics tasks, automatically parallelizes them, generates
    efficient MPI/C++ code, and provides resiliency. Since HPAT is compiler-based,
    it avoids overheads of library-based systems such as dynamic task scheduling
    and master-executor coordination. In addition, it provides automatic
    optimizations for scripting programs, such as fusion of array operations.
    Therefore, HPAT is 14x to 400x faster than Spark on the Cori supercomputer at
    LBL/NERSC. Furthermore, HPAT is much more flexible in distributed data
    structures, which enables the use of existing libraries such as HDF5,
    ScaLAPACK, and Intel DAAL.

    Massively-Parallel Similarity Join, Edge-Isoperimetry, and Distance Correlations on the Hypercube

    Paul Beame, Cyrus Rashtchian
    Comments: 23 pages, plus references and appendix. To appear in SODA 2017
    Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Distributed, Parallel, and Cluster Computing (cs.DC)

    We study distributed protocols for finding all pairs of similar vectors in a
    large dataset. Our results pertain to a variety of discrete metrics, and we
    give concrete instantiations for Hamming distance. In particular, we give
    improved upper bounds on the overhead required for similarity defined by
    Hamming distance (r>1) and prove a lower bound showing qualitative optimality
    of the overhead required for similarity over any Hamming distance (r). Our main
    conceptual contribution is a connection between similarity search algorithms
    and certain graph-theoretic quantities. For our upper bounds, we exhibit a
    general method for designing one-round protocols using edge-isoperimetric
    shapes in similarity graphs. For our lower bounds, we define a new
    combinatorial optimization problem, which can be stated in purely
    graph-theoretic terms yet also captures the core of the analysis in previous
    theoretical work on distributed similarity joins. As one of our main technical
    results, we prove new bounds on distance correlations in subsets of the Hamming
    cube.
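For concreteness, here is the single-machine brute-force baseline that distributed similarity-join protocols aim to beat: finding all pairs of binary vectors within Hamming distance r (representing vectors as Python ints so XOR plus a popcount gives the distance):

```python
def hamming_pairs(vectors, r):
    """Return all index pairs (i, j), i < j, whose vectors (given as ints
    encoding bit strings) are within Hamming distance r of each other."""
    out = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            # XOR leaves 1-bits exactly where the two vectors differ
            if bin(vectors[i] ^ vectors[j]).count("1") <= r:
                out.append((i, j))
    return out
```

The quadratic pair loop is what the one-round protocols built from edge-isoperimetric shapes avoid: they replicate each vector only to the machines that could hold one of its near neighbors.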


    Learning

    PixelVAE: A Latent Variable Model for Natural Images

    Ishaan Gulrajani, Kundan Kumar, Faruk Ahmed, Adrien Ali Taiga, Francesco Visin, David Vazquez, Aaron Courville
    Subjects: Learning (cs.LG)

    Natural image modeling is a landmark challenge of unsupervised learning.
    Variational Autoencoders (VAEs) learn a useful latent representation and model
    global structure well but have difficulty capturing small details. PixelCNN
    models details very well, but lacks a latent code and is difficult to scale for
    capturing large structures. We present PixelVAE, a VAE model with an
    autoregressive decoder based on PixelCNN. Our model requires very few expensive
    autoregressive layers compared to PixelCNN and learns latent codes that are
    more compressed than a standard VAE while still capturing most non-trivial
    structure. Finally, we extend our model to a hierarchy of latent variables at
    different scales. Our model achieves state-of-the-art performance on binarized
    MNIST, competitive performance on 64×64 ImageNet, and high-quality samples on
    the LSUN bedrooms dataset.

    Iterative Orthogonal Feature Projection for Diagnosing Bias in Black-Box Models

    Julius Adebayo, Lalana Kagal
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Predictive models are increasingly deployed for the purpose of determining
    access to services such as credit, insurance, and employment. Despite potential
    gains in productivity and efficiency, several problems have yet to be
    addressed, particularly the potential for unintentional discrimination. We
    present an iterative procedure, based on orthogonal projection of input
    attributes, for enabling interpretability of black-box predictive models.
    Through our iterative procedure, one can quantify the relative dependence of a
    black-box model on its input attributes. The relative significance of the inputs
    to a predictive model can then be used to assess the fairness (or
    discriminatory extent) of such a model.
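A minimal sketch of the core operation: orthogonally project every feature column onto the complement of one attribute, then measure how much a black-box model's outputs change (the function names and the simple relative-change score are our assumptions; the paper's procedure is iterative and more careful):

```python
import numpy as np

def project_out(X, col):
    """Remove the linear component of attribute `col` from every feature
    column of X (orthogonal projection onto the attribute's complement)."""
    a = X[:, col].astype(float)
    denom = a @ a
    if denom == 0:
        return X.astype(float).copy()
    coeffs = (X.T @ a) / denom          # least-squares fit of each column on a
    return X - np.outer(a, coeffs)

def dependence_score(model_fn, X, col):
    """Relative change in predictions after projecting out attribute `col`;
    larger values indicate stronger dependence of the black box on it."""
    before = model_fn(X)
    after = model_fn(project_out(X, col))
    return np.linalg.norm(before - after) / (np.linalg.norm(before) + 1e-12)
```

A model that ignores a protected attribute (and its linear correlates) will score near zero for it, while one that leans on it heavily will score near one.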

    Robust Semi-Supervised Graph Classifier Learning with Negative Edge Weights

    Gene Cheung, Weng-Tai Su, Yu Mao, Chia-Wen Lin
    Comments: 13 pages, submitted to IEEE Journal of Selected Topics in Signal Processing (Special Issue on Graph Signal processing)
    Subjects: Learning (cs.LG)

    In a semi-supervised learning scenario, (possibly noisy) partially observed
    labels are used as input to train a classifier, in order to assign labels to
    unclassified samples. In this paper, we study this classifier learning problem
    from a graph signal processing (GSP) perspective. Specifically, by viewing a
    binary classifier as a piecewise constant graph-signal in a high-dimensional
    feature space, we cast classifier learning as a signal restoration problem via
    a classical maximum a posteriori (MAP) formulation. Unlike previous
    graph-signal restoration works, we consider in addition edges with negative
    weights that signify anti-correlation between samples. One unfortunate
    consequence is that the graph Laplacian matrix (mathbf{L}) can be indefinite,
    and previously proposed graph-signal smoothness prior (mathbf{x}^T mathbf{L}
    mathbf{x}) for candidate signal (mathbf{x}) can lead to pathological
    solutions. In response, we derive an optimal perturbation matrix
    (boldsymbol{Delta}), based on a fast lower-bound computation of the minimum
    eigenvalue of (mathbf{L}) via a novel application of the Haynsworth inertia
    additivity formula, so that (mathbf{L} + boldsymbol{Delta}) is positive
    semi-definite, resulting in a stable signal prior. Further, instead of forcing
    a hard binary decision for each sample, we define the notion of generalized
    smoothness on graph that promotes ambiguity in the classifier signal. Finally,
    we propose an algorithm based on iterative reweighted least squares (IRLS) that
    solves the posed MAP problem efficiently. Extensive simulation results show
    that our proposed algorithm outperforms both SVM variants and graph-based
    classifiers using positive-edge graphs noticeably.
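The indefiniteness issue and its fix can be seen on a tiny signed graph. This sketch shifts by the exact minimum eigenvalue rather than the paper's fast Haynsworth-based lower bound, which is an intentional simplification:

```python
import numpy as np

def psd_shifted_laplacian(W):
    """Build the graph Laplacian L = D - W for a signed weight matrix W
    (negative entries encode anti-correlation between samples) and shift it
    so the smoothness prior x^T L x is well behaved.
    Returns (L + delta*I, delta)."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    lam_min = np.linalg.eigvalsh(L).min()
    delta = max(0.0, -lam_min)           # zero if L is already PSD
    return L + delta * np.eye(len(W)), delta
```

With a single negative edge, W = [[0, -1], [-1, 0]], the Laplacian has eigenvalues 0 and -2, so x^T L x can be made arbitrarily negative; shifting by delta = 2 restores positive semi-definiteness and a usable prior.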

    Audio Event and Scene Recognition: A Unified Approach using Strongly and Weakly Labeled Data

    Anurag Kumar, Bhiksha Raj
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)

    In this paper we propose a novel learning framework called Supervised and
    Weakly Supervised Learning where the goal is to learn simultaneously from
    weakly and strongly labeled data. Strongly labeled data can be simply
    understood as fully supervised data where all labeled instances are available.
    In weakly supervised learning only weak labels are available. Our proposed
    framework is motivated by the fact that a small amount of strongly labeled data
    can give considerable improvement over only weakly supervised learning. The
    primary problem domain focus of this paper is acoustic event and scene
    detection in audio recordings. We first propose a naive formulation for
    leveraging labeled data in both forms. We then propose a more general framework
    for Supervised and Weakly Supervised Learning (SWSL). Based on this general
    framework, we propose a graph based approach for SWSL. Our main method is based
    on manifold regularization on graphs in which we show that the unified learning
    can be formulated as a constraint optimization problem which can be solved by
    iterative concave-convex procedure (CCCP). Our experiments show that our
    proposed framework can address several concerns of audio content analysis using
    weakly labeled data.

    Subgraph Detection with cues using Belief Propagation

    Arun Kadavankandy (MAESTRO), Konstantin Avrachenkov (MAESTRO), Laura Cottatellucci, Sundaresan Rajesh (ECE)
    Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS)

    We consider an Erdos-Renyi graph with (n) nodes and edge probability (q) that
is embedded with a random subgraph of size (K) with edge probability (p) such
    that (p>q). We address the problem of detecting the subgraph nodes when only
    the graph edges are observed, along with some extra knowledge of a small
    fraction of subgraph nodes, called cued vertices or cues. We employ a local and
    distributed algorithm called belief propagation (BP). Recent works on subgraph
    detection without cues have shown that global maximum likelihood (ML) detection
    strictly outperforms BP in terms of asymptotic error rate, namely, there is a
    threshold condition that the subgraph parameters should satisfy below which BP
    fails in achieving asymptotically zero error, but ML succeeds. In contrast, we
    show that when the fraction of cues is strictly bounded away from zero, i.e.,
    when there exists non-trivial side-information, BP achieves zero asymptotic
    error even below this threshold, thus approaching the performance of ML
    detection.
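The planted-subgraph model described above can be generated in a few lines. The sketch below uses illustrative sizes and probabilities and does not implement BP detection itself; it only verifies that the planted block is denser than the background:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, q, p = 500, 50, 0.05, 0.3      # illustrative sizes and edge probabilities

# Background Erdos-Renyi graph with edge probability q ...
A = np.triu((rng.random((n, n)) < q).astype(int), 1)
A = A + A.T
# ... with a denser subgraph (edge probability p > q) planted on the first K nodes.
B = np.triu((rng.random((K, K)) < p).astype(int), 1)
A[:K, :K] = B + B.T

inside = A[:K, :K].sum() / (K * (K - 1))             # empirical density inside
outside = A[K:, K:].sum() / ((n - K) * (n - K - 1))  # ... and outside
```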

    The Power of Normalization: Faster Evasion of Saddle Points

    Kfir Y. Levy
    Subjects: Learning (cs.LG)

    A commonly used heuristic in non-convex optimization is Normalized Gradient
    Descent (NGD) – a variant of gradient descent in which only the direction of
    the gradient is taken into account and its magnitude ignored. We analyze this
    heuristic and show that with carefully chosen parameters and noise injection,
    this method can provably evade saddle points. We establish the convergence of
NGD to a local minimum, and demonstrate rates which improve upon the fastest
known first-order algorithm due to Ge et al. (2015).

    The effectiveness of our method is demonstrated via an application to the
    problem of online tensor decomposition; a task for which saddle point evasion
    is known to result in convergence to global minima.
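A minimal sketch of the heuristic on a toy saddle function follows; the step size, noise level, and test function are illustrative choices, not the parameters analyzed in the paper:

```python
import numpy as np

def ngd(grad, x0, eta=0.05, sigma=0.01, steps=2000, seed=0):
    """Normalized gradient descent with injected noise (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        g = grad(x) + sigma * rng.standard_normal(x.shape)  # noisy gradient
        x -= eta * g / (np.linalg.norm(g) + 1e-12)          # use direction only
    return x

# f(x, y) = x^2 - y^2 + y^4 has a strict saddle at the origin; started exactly
# there, plain gradient descent is stuck, but the noise lets NGD escape toward
# one of the two minima at (0, +/- 1/sqrt(2)).
grad = lambda v: np.array([2 * v[0], -2 * v[1] + 4 * v[1] ** 3])
x = ngd(grad, [0.0, 0.0])
```

Because only the direction is kept, the iterate ends up oscillating in a small ball around a minimum whose radius is set by the fixed step size.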

    Robust Matrix Regression

    Hang Zhang, Fengyuan Zhu, Shixin Li
    Comments: 8 pages, 4 tables
    Subjects: Learning (cs.LG)

Modern technologies are producing datasets with complex intrinsic structures,
which can be naturally represented as matrices instead of vectors. To
    preserve the latent data structures during processing, modern regression
approaches incorporate the low-rank property into the model and achieve
    satisfactory performance for certain applications. These approaches all assume
    that both predictors and labels for each pair of data within the training set
are accurate. However, in real-world applications, it is common for the
training data to be contaminated by noise, which can affect the robustness of these
    matrix regression methods. In this paper, we address this issue by introducing
    a novel robust matrix regression method. We also derive efficient proximal
algorithms for model training. To evaluate its performance, we apply our method
to real-world applications with comparative studies. It achieves
state-of-the-art performance, which demonstrates its effectiveness and
practical value.
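The paper's objective and proximal algorithms are not reproduced here, but a building block that low-rank proximal methods of this kind typically rely on is singular value thresholding, the proximal operator of the nuclear norm:

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: the proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))
Y = svt(X, tau=1.0)   # singular values uniformly shrunk toward zero
```

Each proximal iteration of a nuclear-norm-regularized regression alternates a step like this with a data-fitting step.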

    Oracle Complexity of Second-Order Methods for Finite-Sum Problems

    Yossi Arjevani, Ohad Shamir
    Comments: 23 pages
    Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)

    Finite-sum optimization problems are ubiquitous in machine learning, and are
    commonly solved using first-order methods which rely on gradient computations.
Recently, there has been growing interest in \emph{second-order} methods, which
rely on both gradients and Hessians. In principle, second-order methods can
require far fewer iterations than first-order methods, and hold the promise
    for more efficient algorithms. Although computing and manipulating Hessians is
    prohibitive for high-dimensional problems in general, the Hessians of
    individual functions in finite-sum problems can often be efficiently computed,
    e.g. because they possess a low-rank structure. Can second-order information
    indeed be used to solve such problems more efficiently? In this paper, we
    provide evidence that the answer — perhaps surprisingly — is negative, at
    least in terms of worst-case guarantees. However, we also discuss what
    additional assumptions and algorithmic approaches might potentially circumvent
    this negative result.
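The observation that per-sample Hessians are often cheap can be made concrete on least squares, where each per-sample Hessian is rank one. The sketch below is illustrative (not from the paper): it accumulates the rank-one Hessians and takes a single exact Newton step:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Finite-sum least squares: f(w) = (1/n) sum_i (x_i^T w - y_i)^2 / 2.
# Each per-sample Hessian is the rank-one matrix x_i x_i^T, so the full
# Hessian is cheap to accumulate, and one Newton step solves the quadratic.
w = np.zeros(d)
g = X.T @ (X @ w - y) / n      # full gradient at w
H = X.T @ X / n                # sum of rank-one per-sample Hessians
w = w - np.linalg.solve(H, g)  # exact Newton step
```

For a quadratic objective the gradient vanishes after this single step, which is the appeal the paper's lower bounds temper for general finite sums.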

    Unsupervised Learning with Truncated Gaussian Graphical Models

    Qinliang Su, Xuejun Liao, Chunyuan Li, Zhe Gan, Lawrence Carin
    Comments: To appear in AAAI 2017
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Gaussian graphical models (GGMs) are widely used for statistical modeling,
    because of ease of inference and the ubiquitous use of the normal distribution
    in practical approximations. However, they are also known for their limited
    modeling abilities, due to the Gaussian assumption. In this paper, we introduce
    a novel variant of GGMs, which relaxes the Gaussian restriction and yet admits
    efficient inference. Specifically, we impose a bipartite structure on the GGM
    and govern the hidden variables by truncated normal distributions. The
    nonlinearity of the model is revealed by its connection to rectified linear
    unit (ReLU) neural networks. Meanwhile, thanks to the bipartite structure and
    appealing properties of truncated normals, we are able to train the models
    efficiently using contrastive divergence. We consider three output constructs,
    accounting for real-valued, binary and count data. We further extend the model
    to deep structures and show that deep models can be used for unsupervised
    pre-training of rectifier neural networks. Extensive experimental results are
    provided to validate the proposed models and demonstrate their superiority over
    competing models.

    Constrained Low-Rank Learning Using Least Squares-Based Regularization

    Ping Li, Jun Yu, Meng Wang, Luming Zhang, Deng Cai, Xuelong Li
    Comments: 14 pages, 7 figures, accepted to appear in IEEE Transactions on Cybernetics
    Journal-ref: IEEE Transactions on Cybernetics, 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Low-rank learning has attracted much attention recently due to its efficacy
    in a rich variety of real-world tasks, e.g., subspace segmentation and image
categorization. Most low-rank methods are incapable of capturing the
low-dimensional subspace for supervised learning tasks, e.g., classification
    and regression. This paper aims to learn both the discriminant low-rank
    representation (LRR) and the robust projecting subspace in a supervised manner.
    To achieve this goal, we cast the problem into a constrained rank minimization
    framework by adopting the least squares regularization. Naturally, the data
    label structure tends to resemble that of the corresponding low-dimensional
    representation, which is derived from the robust subspace projection of clean
    data by low-rank learning. Moreover, the low-dimensional representation of
    original data can be paired with some informative structure by imposing an
    appropriate constraint, e.g., Laplacian regularizer. Therefore, we propose a
    novel constrained LRR method. The objective function is formulated as a
    constrained nuclear norm minimization problem, which can be solved by the
    inexact augmented Lagrange multiplier algorithm. Extensive experiments on image
    classification, human pose estimation, and robust face recovery have confirmed
    the superiority of our method.

    Multilinear Low-Rank Tensors on Graphs & Applications

    Nauman Shahid, Francesco Grassi, Pierre Vandergheynst
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    We propose a new framework for the analysis of low-rank tensors which lies at
    the intersection of spectral graph theory and signal processing. As a first
    step, we present a new graph based low-rank decomposition which approximates
    the classical low-rank SVD for matrices and multi-linear SVD for tensors. Then,
    building on this novel decomposition we construct a general class of convex
    optimization problems for approximately solving low-rank tensor inverse
problems, such as tensor Robust PCA. The whole framework is named
‘Multilinear Low-rank tensors on Graphs (MLRTG)’. Our theoretical analysis
    shows: 1) MLRTG stands on the notion of approximate stationarity of
    multi-dimensional signals on graphs and 2) the approximation error depends on
    the eigen gaps of the graphs. We demonstrate applications for a wide variety of
    4 artificial and 12 real tensor datasets, such as EEG, FMRI, BCI, surveillance
videos and hyperspectral images. Generalization of the tensor concepts to the
non-Euclidean domain, orders-of-magnitude speed-up, low memory requirements and
    significantly enhanced performance at low SNR are the key aspects of our
    framework.

    AdversariaLib: An Open-source Library for the Security Evaluation of Machine Learning Algorithms Under Attack

    Igino Corona, Battista Biggio, Davide Maiorca
    Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)

We present AdversariaLib, an open-source Python library for the security
evaluation of machine learning (ML) algorithms against carefully-targeted attacks. It
    supports the implementation of several attacks proposed thus far in the
    literature of adversarial learning, allows for the evaluation of a wide range
    of ML algorithms, runs on multiple platforms, and has multi-processing enabled.
    The library has a modular architecture that makes it easy to use and to extend
    by implementing novel attacks and countermeasures. It relies on other
    widely-used open-source ML libraries, including scikit-learn and FANN.
    Classification algorithms are implemented and optimized in C/C++, allowing for
    a fast evaluation of the simulated attacks. The package is distributed under
    the GNU General Public License v3, and it is available for download at
    this http URL

    #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning

    Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel
    Comments: 16 pages. Under review as a conference paper at ICLR 2017
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    Count-based exploration algorithms are known to perform near-optimally when
    used in conjunction with tabular reinforcement learning (RL) methods for
    solving small discrete Markov decision processes (MDPs). It is generally
    thought that count-based methods cannot be applied in high-dimensional state
    spaces, since most states will only occur once. Recent deep RL exploration
    strategies are able to deal with high-dimensional continuous state spaces
    through complex heuristics, often relying on optimism in the face of
    uncertainty or intrinsic motivation. In this work, we describe a surprising
    finding: a simple generalization of the classic count-based approach can reach
    near state-of-the-art performance on various high-dimensional and/or continuous
deep RL benchmarks. States are mapped to hash codes, which allows their
occurrences to be counted with a hash table. These counts are then used to compute a
    reward bonus according to the classic count-based exploration theory. We find
    that simple hash functions can achieve surprisingly good results on many
    challenging tasks. Furthermore, we show that a domain-dependent learned hash
    code may further improve these results. Detailed analysis reveals important
    aspects of a good hash function: 1) having appropriate granularity and 2)
    encoding information relevant to solving the MDP. This exploration strategy
    achieves near state-of-the-art performance on both continuous control tasks and
    Atari 2600 games, hence providing a simple yet powerful baseline for solving
    MDPs that require considerable exploration.
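A hedged sketch of the counting scheme follows. The SimHash-style random-projection code and the bonus form beta/sqrt(n) are illustrative choices consistent with classic count-based exploration theory, not the paper's exact implementation:

```python
import numpy as np
from collections import defaultdict

class HashCountBonus:
    """Exploration bonus beta / sqrt(n(phi(s))) from counts of hashed states."""

    def __init__(self, dim, k=16, beta=0.01, seed=0):
        # SimHash-style discretization: signs of k random projections.
        self.A = np.random.default_rng(seed).standard_normal((k, dim))
        self.beta = beta
        self.counts = defaultdict(int)

    def bonus(self, state):
        code = tuple((self.A @ state > 0).astype(int))  # k-bit hash code
        self.counts[code] += 1
        return self.beta / np.sqrt(self.counts[code])

hc = HashCountBonus(dim=3)
s = np.array([0.2, -1.0, 0.5])
b1 = hc.bonus(s)   # first visit: full bonus beta
b2 = hc.bonus(s)   # repeated visit: bonus shrinks as 1/sqrt(count)
```

The granularity knob mentioned in the abstract corresponds to the number of projection bits k: more bits distinguish more states, fewer bits generalize counts across similar states.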

    A Generic Coordinate Descent Framework for Learning from Implicit Feedback

    Immanuel Bayer, Xiangnan He, Bhargav Kanagal, Steffen Rendle
    Subjects: Information Retrieval (cs.IR); Learning (cs.LG)

    In recent years, interest in recommender research has shifted from explicit
    feedback towards implicit feedback data. A diversity of complex models has been
    proposed for a wide variety of applications. Despite this, learning from
    implicit feedback is still computationally challenging. So far, most work
    relies on stochastic gradient descent (SGD) solvers which are easy to derive,
    but in practice challenging to apply, especially for tasks with many items. For
    the simple matrix factorization model, an efficient coordinate descent (CD)
    solver has been previously proposed. However, efficient CD approaches have not
    been derived for more complex models.

    In this paper, we provide a new framework for deriving efficient CD
    algorithms for complex recommender models. We identify and introduce the
    property of k-separable models. We show that k-separability is a sufficient
    property to allow efficient optimization of implicit recommender problems with
    CD. We illustrate this framework on a variety of state-of-the-art models
    including factorization machines and Tucker decomposition. To summarize, our
    work provides the theory and building blocks to derive efficient implicit CD
    algorithms for complex recommender models.

    Implicit ReasoNet: Modeling Large-Scale Structured Relationships with Shared Memory

    Yelong Shen, Po-Sen Huang, Ming-Wei Chang, Jianfeng Gao
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

    Recent studies on knowledge base completion, the task of recovering missing
    relationships based on recorded relations, demonstrate the importance of
    learning embeddings from multi-step relations. However, due to the size of
    knowledge bases, learning multi-step relations directly on top of observed
instances could be costly. In this paper, we propose Implicit ReasoNets (IRNs),
which are designed to perform large-scale inference implicitly through a search
    controller and shared memory. Unlike previous work, IRNs use training data to
    learn to perform multi-step inference through the shared memory, which is also
jointly updated during training. Although the inference procedure does not
operate directly on observed instances, IRNs outperform all previous
approaches on the popular FB15k benchmark by more than 5.7%.


    Information Theory

    Linear Precoder Design for a MIMO Gaussian Wiretap Channel with Full-Duplex Source and Destination Nodes

    Lingxiang Li, Zhi Chen, A. P. Petropulu, Jun Fang
    Comments: 14 pages, 9 figures
    Subjects: Information Theory (cs.IT)

    We consider linear precoder design for a multiple-input multiple-output
    (MIMO) Gaussian wiretap channel, which comprises two legitimate nodes, i.e.,
    Alice and Bob, operating in Full-Duplex (FD) mode and exchanging confidential
messages in the presence of a passive eavesdropper. Using the sum secrecy
degrees of freedom (sum S.D.o.F.) as the reliability measure, we formulate an
optimization problem with respect to the precoding matrices. In order to solve
    this problem, we first propose a cooperative secrecy transmission scheme, and
    prove that its feasible set is sufficient to achieve the maximum sum S.D.o.F..
    Based on that feasible set, we then determine the maximum achievable sum
    S.D.o.F. in closed form, and provide a method for constructing the precoding
matrix pair which achieves the maximum sum S.D.o.F.. Results show that the FD-based
network provides an attractive secrecy transmission rate performance.

    Performance Analysis of Partial Interference Cancellation in Multi-Antenna UDNs

    Italo Atzeni, Marios Kountouris
    Comments: Paper presented at Asilomar Conference on Signals, Systems and Computers, 2016
    Subjects: Information Theory (cs.IT)

    The employment of partial zero-forcing (PZF) receivers at the base stations
    represents an efficient and low-complexity technique for uplink interference
    management in cellular networks. In this paper, we focus on the performance
    analysis of ultra-dense networks (UDNs) in which the multi-antenna receivers
    adopt PZF. We provide both integral expressions and tight closed-form
    approximations for the probability of successful transmission, which can be
    used to accurately evaluate the optimal tradeoff between interference
    cancellation and array gain. Numerical results show that no more than half of
    the available degrees of freedom should be used for interference cancellation.
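A PZF receiver of the kind discussed can be sketched as a projection of the desired user's channel onto the orthogonal complement of the canceled interferers' channels (a minimal illustration; the antenna count and channels are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
M = 8                                                     # receive antennas
h = rng.standard_normal(M) + 1j * rng.standard_normal(M)  # desired channel
Hi = rng.standard_normal((M, 3)) + 1j * rng.standard_normal((M, 3))  # 3 interferers

# Partial zero-forcing: project the desired channel onto the orthogonal
# complement of the canceled interferers' channels; the remaining M - 3
# dimensions provide the array gain.
P = np.eye(M) - Hi @ np.linalg.pinv(Hi)  # projector onto the interference nullspace
w = P @ h                                # PZF combining vector
```

Each canceled interferer costs one spatial degree of freedom, which is exactly the cancellation-versus-array-gain tradeoff the analysis quantifies.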

    On the Coded Caching Delivery Design over Wireless Networks

    Lei Zheng, Qifa Yan, Qingchun Chen, Xiaohu Tang
    Comments: 18 pages, 7 figures
    Subjects: Information Theory (cs.IT)

Coded caching is a promising technique to mitigate the network burden
in peak hours, and it attains more prominent gains than uncoded caching. The
    coded caching scheme can be classified into two types, namely, the centralized
    and the decentralized scheme, according to whether the placement procedures are
    carefully designed or operated at random. However, most of the previous
    analysis assumes that the connected links between server and users are
    error-free. In this paper, we explore the coded caching based delivery design
    in wireless networks, where all the connected wireless links are different. For
both centralized and decentralized cases, we propose two delivery schemes,
namely, the orthogonal delivery scheme and the concurrent delivery scheme. We
focus on the transmission time slots spent on satisfying the system requests,
and prove that in both cases the concurrent delivery scheme always outperforms
the orthogonal one. Furthermore,
    for the orthogonal delivery scheme, we derive the gap in terms of transmission
    time between the decentralized and centralized case, which is essentially no
    more than 1.5.

    Information Theoretic Performance of Periodogram-based CFO Estimation in Massive MU-MIMO Systems

    Sudarshan Mukherjee, Saif Khan Mohammed
    Comments: Submitted to IEEE ICC 2017. arXiv admin note: text overlap with arXiv:1605.06275
    Subjects: Information Theory (cs.IT)

    In this paper, we study the information theoretic performance of the modified
    time-reversal maximum ratio combining (TR-MRC) receiver (presented in [9]) with
    the spatially averaged periodogram-based carrier frequency offset (CFO)
    estimator (proposed in [7]) in multi-user massive MIMO systems. Our analysis
shows that an (\mathcal{O}(\sqrt{M})) array gain is achieved with this
periodogram-based CFO estimator, which is the same as the array gain achieved in
the ideal/zero-CFO scenario ((M) is the number of base station antennas).
    Information theoretic performance comparison with the correlation-based CFO
    estimator for massive MIMO systems (proposed in [6]) reveals that this
    periodogram-based CFO estimator is more energy efficient in slowly time-varying
    channels.

    Stall Pattern Avoidance in Polynomial Product Codes

    Carlo Condo, Francois Leduc-Primeau, Gabi Sarkis, Pascal Giard, Warren Gross
    Comments: 4 pages, 2 figures, GlobalSiP 2016
    Subjects: Information Theory (cs.IT)

Product codes are a concatenated error-correction scheme that has often been
considered for applications requiring very low bit-error rates, which demand
    that the error floor be decreased as much as possible. In this work, we
    consider product codes constructed from polynomial algebraic codes, and propose
    a novel low-complexity post-processing technique that is able to improve the
error-correction performance by orders of magnitude. We provide lower bounds
for the error rate achievable under post-processing, and present simulation
    results indicating that these bounds are tight.

    SIR Asymptotics in General Network Models

    Anjin Guo, Martin Haenggi, Radha Krishna Ganti
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Applications (stat.AP)

In the performance analyses of wireless networks, asymptotic quantities and
properties often provide useful results and insights. The asymptotic analyses
    become especially important when complete analytical expressions of the
    performance metrics of interest are not available, which is often the case if
    one departs from very specific modeling assumptions. In this paper, we consider
    the asymptotics of the SIR distribution in general wireless network models,
    including ad hoc and cellular networks, simple and non-simple point processes,
    and singular and bounded path loss models, for which, in most cases, finding
    analytical expressions of the complete SIR distribution seems hopeless. We show
    that the lower tails of the SIR distributions decay polynomially with the order
    solely determined by the path loss exponent or the fading parameter, while the
    upper tails decay exponentially, with the exception of cellular networks with
    singular path loss. In addition, we analyze the impact of the nearest
    interferer on the asymptotic properties of the SIR distributions, and we
    formulate three crisp conjectures that -if true- determine the asymptotic
    behavior in many cases based on the large-scale path loss properties of the
    desired signal and/or nearest interferer only.

    An Overview on Resource Allocation Techniques for Multi-User MIMO Systems

    Eduardo Castañeda, Adão Silva, Atílio Gameiro, Marios Kountouris
    Comments: 45 pages, 12 figures, accepted in IEEE Communications Surveys and Tutorials
    Journal-ref: IEEE Communications Surveys and Tutorials, 2016
    Subjects: Information Theory (cs.IT)

Remarkable research activities and major advances have occurred over the
past decade in multiuser multiple-input multiple-output (MU-MIMO) systems.
Several transmission technologies and precoding techniques have been developed
in order to exploit the spatial dimension so that simultaneous transmissions of
independent data streams reuse the same radio resources. The achievable
    performance of such techniques heavily depends on the channel characteristics
    of the selected users, the amount of channel knowledge, and how efficiently
    interference is mitigated. In systems where the total number of receivers is
    larger than the number of total transmit antennas, user selection becomes a key
    approach to benefit from multiuser diversity and achieve full multiplexing
    gain. The overall performance of MU-MIMO systems is a complex joint
    multi-objective optimization problem since many variables and parameters have
    to be optimized, including the number of users, the number of antennas, spatial
    signaling, rate and power allocation, and transmission technique. The objective
    of this literature survey is to provide a comprehensive overview of the various
    methodologies used to approach the aforementioned joint optimization task in
    the downlink of MU-MIMO communication systems.

    Gaussian mixtures: entropy and geometric inequalities

    Alexandros Eskenazis, Piotr Nayar, Tomasz Tkocz
    Subjects: Probability (math.PR); Information Theory (cs.IT); Functional Analysis (math.FA); Metric Geometry (math.MG)

    A symmetric random variable is called a Gaussian mixture if it has the same
    distribution as the product of two independent random variables, one being
    positive and the other a standard Gaussian random variable. Examples of
    Gaussian mixtures include random variables with densities proportional to
(e^{-|t|^p}) and symmetric (p)-stable random variables, where (p \in (0,2]). We
    obtain various sharp moment and entropy comparison estimates for weighted sums
    of independent Gaussian mixtures and investigate extensions of the B-inequality
    and the Gaussian correlation inequality in the context of Gaussian mixtures. We
    also obtain a correlation inequality for symmetric geodesically convex sets in
    the unit sphere equipped with the normalized surface area measure. We then
    apply these results to derive sharp constants in Khintchine inequalities for
    vectors uniformly distributed on the unit balls with respect to (p)-norms and
    provide short proofs to new and old comparison estimates for geometric
    parameters of sections and projections of such balls.
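A classical instance of the definition above: the standard Laplace law, with density proportional to (e^{-|t|}), is a Gaussian mixture with an exponential mixing variance. The numerical sketch below checks its first two moments:

```python
import numpy as np

# The standard Laplace law (density proportional to e^{-|t|}) as a Gaussian
# mixture: X = sqrt(V) * G with V ~ Exponential(mean 2), G ~ N(0, 1).
rng = np.random.default_rng(0)
n = 200_000
V = rng.exponential(scale=2.0, size=n)  # positive mixing variable
G = rng.standard_normal(n)              # standard Gaussian
X = np.sqrt(V) * G                      # symmetric Gaussian mixture
```

Conditioning on V makes X Gaussian, which is the structural property the moment and entropy comparisons in the paper exploit.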

    Topological classifier for detecting the emergence of epileptic seizures

    Marco Piangerelli, Matteo Rucco, Emanuela Merelli
    Comments: Open data: Physionet data-set
    Subjects: Neurons and Cognition (q-bio.NC); Information Theory (cs.IT)

In this work we study how to apply topological data analysis to create a
method suitable for classifying the EEGs of patients affected by epilepsy. The
topological space constructed from the collection of EEG signals is analyzed
by persistent entropy, acting as a global topological feature for discriminating
between healthy and epileptic signals. The Physionet data-set has been used for
    testing the classifier.
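Persistent entropy has a short standard definition (the Shannon entropy of normalized barcode interval lengths), which the sketch below implements; the intervals shown are toy values, not Physionet data:

```python
import math

def persistent_entropy(intervals):
    """Shannon entropy of normalized barcode lengths (persistent entropy)."""
    lengths = [death - birth for birth, death in intervals]
    total = sum(lengths)
    return -sum((l / total) * math.log(l / total) for l in lengths if l > 0)

# Equal-length intervals maximize the entropy (log of the interval count);
# one dominant interval drives it toward zero.
h_uniform = persistent_entropy([(0, 1), (0, 1), (0, 1), (0, 1)])
h_skewed = persistent_entropy([(0, 10), (0, 0.1), (0, 0.1), (0, 0.1)])
```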

    Joint Load Balancing and Interference Mitigation in 5G Heterogeneous Networks

    Trung Kien Vu, Mehdi Bennis, Sumudu Samarakoon, Mérouane Debbah, Matti Latva-aho
    Comments: Submitted to the IEEE Transactions on Wireless Communications, full paper with a comprehensive performance analysis and other proofs, 16 pages
    Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

    We study the problem of joint load balancing and interference mitigation in
heterogeneous networks (HetNets), in which a massive multiple-input
multiple-output (MIMO) macro cell base station (BS) equipped with a large
number of antennas, overlaid with wireless self-backhauled small cells (SCs),
is assumed. Self-backhauled SC BSs with full-duplex communication employing
    regular antenna arrays serve both macro users and SC users by using the
    wireless backhaul from macro BS in the same frequency band. We formulate the
    joint load balancing and interference mitigation problem as a network utility
    maximization subject to wireless backhaul constraints. Subsequently, leveraging
    the framework of stochastic optimization, the problem is decoupled into dynamic
    scheduling of macro cell users, backhaul provisioning of SCs, and offloading
    macro cell users to SCs as a function of interference and backhaul links. Via
    numerical results, we show the performance gains of our proposed framework
    under the impact of small cells density, number of base station antennas, and
    transmit power levels at low and high frequency bands. We further provide
    insights into the performance analysis and convergence of the proposed
    framework. The numerical results show that the proposed user association
algorithm outperforms other baselines. Interestingly, we find that even though
at the lower frequency band the performance of open-access small cells is close
to that of closed access at some operating points, open-access full-duplex
small cells still yield higher gains than closed access at higher frequency
bands. With increasing small cell density or wireless backhaul quality,
open-access full-duplex small cells outperform and achieve a 5.6x gain in terms
of cell-edge performance compared to closed-access ones in ultra-dense networks
with 350 small cell base stations per km2.

    Improved Handover Through Dual Connectivity in 5G mmWave Mobile Networks

    Michele Polese, Marco Giordani, Marco Mezzavilla, Sundeep Rangan, Michele Zorzi
    Comments: 33 pages, 14 figures, submitted to the IEEE JSAC Special Issue on Millimeter Wave Communications for Future Mobile Networks
    Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

    The millimeter wave (mmWave) bands offer the possibility of orders of
    magnitude greater throughput for fifth generation (5G) cellular systems.
    However, since mmWave signals are very susceptible to blockage, channel quality
    on any one mmWave link can be extremely intermittent. This paper implements a
    novel dual connectivity protocol that enables mobile user equipment (UE)
    devices to maintain physical layer connections to 4G and 5G cells
    simultaneously. A novel uplink control signaling system combined with a local
    coordinator enables rapid path switching in the event of failures on any one
    link. Based on this approach, this paper provides the first comprehensive
    evaluation of handover mechanisms in mmWave cellular systems. The simulation
    framework includes detailed measurement-based channel models to realistically
    capture spatial dynamics of blocking events, as well as the full details of
    MAC, RLC and end-to-end transport protocols. The study reveals significant
    benefits of the proposed method under several metrics, compared to conventional
    handover mechanisms.

    Real-time kinematic positioning of LEO satellites using a single-frequency GPS receiver

    Pei Chen, Jian Zhang, Xiucong Sun
    Comments: 27 pages, 8 figures, ready for publication in GPS Solutions
    Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Information Theory (cs.IT)

    Due to their low cost and low power consumption, single-frequency GPS
    receivers are considered suitable for low-cost space applications such as small
    satellite missions. Recently, requirements have emerged for real-time accurate
    orbit determination at sub-meter level in order to carry out onboard geocoding
    of high-resolution imagery, open-loop operation of altimeters and radio
    occultation. This study proposes an improved real-time kinematic positioning
    method for LEO satellites using single-frequency receivers. The C/A code and L1
    phase are combined to eliminate ionospheric effects. The epoch-differenced
    carrier phase measurements are utilized to acquire receiver position changes
    which are further used to smooth the absolute positions. A kinematic Kalman
    filter is developed to implement kinematic orbit determination. Actual flight
    data from the Chinese small satellite SJ-9A are used to test the navigation
    performance. Results show that the proposed method outperforms the traditional
    kinematic positioning method in terms of accuracy. 3D position accuracies of
    0.72 m and 0.79 m are achieved using the predicted portion of IGS ultra-rapid
    products and broadcast ephemerides, respectively.
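    The smoothing step described above can be sketched in one dimension: precise epoch-differenced carrier-phase changes propagate the state between epochs, while noisy absolute (code-based) fixes update it. The function name, noise levels, and scalar filter form below are illustrative assumptions, not the paper's actual filter.

    ```python
    def smooth_positions(abs_pos, deltas, sigma_abs=1.0, sigma_delta=0.01):
        """1-D Kalman-filter sketch of epoch-differenced smoothing.

        abs_pos : absolute positions per epoch (noisy, code-based; ~1 m)
        deltas  : position change between consecutive epochs
                  (precise, from epoch-differenced carrier phase; ~1 cm)
        """
        x = abs_pos[0]            # initial state from the first absolute fix
        P = sigma_abs ** 2        # initial state variance
        R = sigma_abs ** 2        # absolute-measurement noise variance
        Q = sigma_delta ** 2      # process noise carried by the phase delta
        out = [x]
        for z, d in zip(abs_pos[1:], deltas):
            # predict: carry the state forward with the phase-derived change
            x_pred = x + d
            P_pred = P + Q
            # update: blend in the absolute fix, weighted by its accuracy
            K = P_pred / (P_pred + R)
            x = x_pred + K * (z - x_pred)
            P = (1 - K) * P_pred
            out.append(x)
        return out
    ```

    Because the phase deltas are two orders of magnitude more precise than the code fixes, the gain K shrinks over time and the smoothed trajectory tracks the integrated deltas, anchored by the absolute positions.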

    Fisher Information Framework for Time Series Modeling

    R. C. Venkatesan, A. Plastino
    Comments: 30 pages, 8 figures
    Subjects: Methodology (stat.ME); Information Theory (cs.IT); Chaotic Dynamics (nlin.CD); Data Analysis, Statistics and Probability (physics.data-an)

    A robust prediction model invoking the Takens embedding theorem, whose
    working hypothesis is obtained via an inference procedure based on the
    minimum Fisher information principle, is presented. The coefficients of the
    ansatz, central to the working hypothesis, satisfy a time-independent
    Schrödinger-like equation in a vector setting. The inference of (i) the
    probability density function of the coefficients of the working hypothesis
    and (ii) the constraint-driven pseudo-inverse condition for the modeling
    phase of the prediction scheme is made, for the case of normal
    distributions, with the aid of the quantum mechanical virial theorem. The
    well-known reciprocity relations and the associated Legendre transform
    structure for the Fisher information measure (FIM, hereafter)-based model in
    a vector setting (with least-squares constraints) are self-consistently
    derived. These relations are demonstrated to yield an intriguing form of the
    FIM for the modeling phase, which defines the working hypothesis solely in
    terms of the observed data. Prediction is demonstrated on time series
    obtained from (i) the Mackey-Glass delay-differential equation, (ii) an ECG
    sample from the MIT-Beth Israel Deaconess Hospital (MIT-BIH) cardiac
    arrhythmia database, and (iii) an ECG from the Creighton University
    ventricular tachyarrhythmia database; the ECG samples were obtained from the
    Physionet online repository. These examples demonstrate the efficiency of
    the prediction model. Numerical examples for exemplary cases are provided.
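    For readers who want to reproduce the first test case, the Mackey-Glass delay-differential equation dx/dt = β·x(t−τ) / (1 + x(t−τ)^p) − γ·x(t) can be integrated with a simple Euler scheme. The parameter values below are the commonly used chaotic setting (τ = 17, β = 0.2, γ = 0.1, p = 10), not necessarily those used in the paper, and the coarse step size is adequate only for a demonstration.

    ```python
    from collections import deque

    def mackey_glass(n=2000, tau=17.0, beta=0.2, gamma=0.1, p=10, dt=0.1, x0=1.2):
        """Euler integration of the Mackey-Glass delay-differential equation:
            dx/dt = beta * x(t - tau) / (1 + x(t - tau)**p) - gamma * x(t)
        A fixed-length deque holds the delayed history; buf[0] is x(t - tau)."""
        lag = int(round(tau / dt))
        buf = deque([x0] * lag, maxlen=lag)  # constant initial history
        x = x0
        out = []
        for _ in range(n):
            x_tau = buf[0]                   # delayed value x(t - tau)
            x += dt * (beta * x_tau / (1.0 + x_tau ** p) - gamma * x)
            buf.append(x)                    # maxlen drops the oldest entry
            out.append(x)
        return out
    ```

    The resulting series is the standard benchmark signal on which delay-embedding predictors such as the one above are typically evaluated.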



