
    arXiv Paper Daily: Tue, 15 Nov 2016

    Published by 我爱机器学习 (52ml.net) on 2016-11-15 00:00:00

    Neural and Evolutionary Computing

    Deep Learning with Sets and Point Clouds

    Siamak Ravanbakhsh, Jeff Schneider, Barnabas Poczos
    Comments: under review at ICLR
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We study a simple notion of structural invariance that readily suggests a
    parameter-sharing scheme in deep neural networks. In particular, we define
    structure as a collection of relations, and derive graph convolution and
    recurrent neural networks as special cases. We study composition of basic
    structures in defining models that are invariant to more complex “product”
    structures such as graphs of graphs, sets of images or sequences of sets. For
    demonstration, our experimental results are focused on the setting where the
    discrete structure of interest is a set. We present results on several novel
    and non-trivial problems on sets, including semi-supervised learning using
    clustering information, point-cloud classification and set outlier detection.
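
    As a concrete illustration of the parameter-sharing idea for set inputs, the
    following is a minimal numpy sketch of a permutation-equivariant layer: a
    per-element weight is tied with a weight applied to a pooled, set-level
    summary. The exact layer form, sizes, and names are illustrative assumptions,
    not the paper's precise parameterization.

        import numpy as np

        def perm_equivariant_layer(X, W_elem, W_pool, b):
            """One permutation-equivariant layer over a set.

            X: (n, d_in) array, one row per set element (row order is arbitrary).
            W_elem, W_pool: (d_in, d_out) weights shared across all elements.
            b: (d_out,) bias.
            Permuting the rows of X permutes the output rows the same way.
            """
            pooled = X.max(axis=0, keepdims=True)  # order-invariant set summary
            H = X @ W_elem + pooled @ W_pool + b   # pooled term broadcast to every element
            return np.maximum(H, 0.0)              # ReLU

        # usage: equivariance check on a random set of 5 elements
        rng = np.random.default_rng(0)
        X = rng.normal(size=(5, 3))
        W1, W2, b = rng.normal(size=(3, 4)), rng.normal(size=(3, 4)), np.zeros(4)
        out = perm_equivariant_layer(X, W1, W2, b)
        perm = rng.permutation(5)
        assert np.allclose(perm_equivariant_layer(X[perm], W1, W2, b), out[perm])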

    Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy

    Dougal J. Sutherland, Hsiao-Yu Tung, Heiko Strathmann, Soumyajit De, Aaditya Ramdas, Alex Smola, Arthur Gretton
    Comments: Submitted to ICLR 2017; public comments at this http URL
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Methodology (stat.ME)

    We propose a method to optimize the representation and distinguishability of
    samples from two probability distributions, by maximizing the estimated power
    of a statistical test based on the maximum mean discrepancy (MMD). This
    optimized MMD is applied to the setting of unsupervised learning by generative
    adversarial networks (GAN), in which a model attempts to generate realistic
    samples, and a discriminator attempts to tell these apart from data samples. In
    this context, the MMD may be used in two roles: first, as a discriminator,
    either directly on the samples, or on features of the samples. Second, the MMD
    can be used to evaluate the performance of a generative model, by testing the
    model’s samples against a reference data set. In the latter role, the optimized
    MMD is particularly helpful, as it gives an interpretable indication of how the
    model and data distributions differ, even in cases where individual model
    samples are not easily distinguished either by eye or by classifier.
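
    For readers unfamiliar with the statistic itself, here is a minimal numpy
    sketch of an unbiased, quadratic-time estimator of MMD^2 with a Gaussian
    kernel. The kernel choice, bandwidth, and names are illustrative assumptions;
    the test-power optimization described in the abstract is not shown.

        import numpy as np

        def gaussian_kernel(A, B, sigma=1.0):
            """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
            sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
            return np.exp(-sq_dists / (2.0 * sigma ** 2))

        def mmd2_unbiased(X, Y, sigma=1.0):
            """Unbiased estimate of squared MMD between samples X ~ P and Y ~ Q."""
            m, n = len(X), len(Y)
            Kxx = gaussian_kernel(X, X, sigma)
            Kyy = gaussian_kernel(Y, Y, sigma)
            Kxy = gaussian_kernel(X, Y, sigma)
            # drop diagonal terms so the within-sample averages are unbiased
            xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
            yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
            return xx + yy - 2.0 * Kxy.mean()

        # usage: the estimate is near zero for matched distributions
        rng = np.random.default_rng(0)
        X = rng.normal(0.0, 1.0, size=(200, 2))
        X2 = rng.normal(0.0, 1.0, size=(200, 2))  # second draw from the same distribution
        Y = rng.normal(0.5, 1.0, size=(200, 2))   # shifted distribution
        print(mmd2_unbiased(X, X2))               # close to zero
        print(mmd2_unbiased(X, Y))                # clearly positive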

    Advancing Memristive Analog Neuromorphic Networks: Increasing Complexity, and Coping with Imperfect Hardware Components

    F. Merrikh Bayat, M. Prezioso, B. Chakrabarti, I. Kataeva, D. B. Strukov
    Comments: 4 pages, 13 figures
    Subjects: Emerging Technologies (cs.ET); Neural and Evolutionary Computing (cs.NE)

    We experimentally demonstrate classification of 4×4 binary images into 4
    classes, using a 3-layer mixed-signal neuromorphic network (a multilayer perceptron),
    based on two passive 20×20 memristive crossbar arrays, board-integrated with
    discrete CMOS components. The network features 10 hidden-layer and 4
    output-layer analog CMOS neurons and 428 metal-oxide memristors, i.e., it is
    almost an order of magnitude more complex than any previously reported functional
    memristor circuit. Moreover, the inference operation of this classifier is
    performed entirely in the integrated hardware. To deal with larger crossbar
    arrays, we have developed a semi-automatic approach to their forming and
    testing, and compared several memristor training schemes for coping with
    imperfect behavior of these devices, as well as with variability of analog CMOS
    neurons. The effectiveness of the proposed schemes for defect and variation
    tolerance was verified experimentally using the implemented network and,
    additionally, by modeling the operation of a larger network, with 300
    hidden-layer neurons, on the MNIST benchmark. Finally, we propose a simple
    modification of the implemented memristor-based vector-by-matrix multiplier to
    allow its operation in a wider temperature range.

    Attending to Characters in Neural Sequence Labeling Models

    Marek Rei, Gamal K.O. Crichton, Sampo Pyysalo
    Comments: Proceedings of COLING 2016
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Sequence labeling architectures use word embeddings for capturing similarity,
    but suffer when handling previously unseen or rare words. We investigate
    character-level extensions to such models and propose a novel architecture for
    combining alternative word representations. By using an attention mechanism,
    the model is able to dynamically decide how much information to use from a
    word- or character-level component. We evaluated different architectures on a
    range of sequence labeling datasets, and character-level extensions were found
    to improve performance on every benchmark. In addition, the proposed
    attention-based architecture delivered the best results even with a smaller
    number of trainable parameters.
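
    A hedged sketch of the word/character combination idea: a learned gate
    decides, per dimension, how much of the final token representation comes
    from the word embedding versus the character-composed vector. The gate
    parameterization below is a simplification for clarity, not the exact
    formulation used in the paper.

        import numpy as np

        def sigmoid(v):
            return 1.0 / (1.0 + np.exp(-v))

        def gate_word_char(word_vec, char_vec, W_gate):
            """Blend a word embedding with a character-level representation.

            word_vec, char_vec: (d,) vectors for the same token.
            W_gate: (d, 2d) learned matrix producing a per-dimension gate in (0, 1).
            """
            z = sigmoid(W_gate @ np.concatenate([word_vec, char_vec]))
            return z * word_vec + (1.0 - z) * char_vec  # z -> 1 trusts the word embedding

        # usage with random stand-in vectors
        rng = np.random.default_rng(0)
        d = 8
        w, c = rng.normal(size=d), rng.normal(size=d)
        combined = gate_word_char(w, c, rng.normal(size=(d, 2 * d)))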

    Identity Matters in Deep Learning

    Moritz Hardt, Tengyu Ma
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    An emerging design principle in deep learning is that each layer of a deep
    artificial neural network should be able to easily express the identity
    transformation. This idea not only motivated various normalization techniques,
    such as batch normalization, but was also key to the immense success of
    residual networks.

    In this work, we put the principle of identity parameterization on a
    more solid theoretical footing alongside further empirical progress. We first
    give a strikingly simple proof that arbitrarily deep linear residual networks
    have no spurious local optima. The same result for linear feed-forward networks
    in their standard parameterization is substantially more delicate. Second, we
    show that residual networks with ReLU activations have universal finite-sample
    expressivity in the sense that the network can represent any function of its
    sample provided that the model has more parameters than the sample size.

    Directly inspired by our theory, we experiment with a radically simple
    residual architecture consisting of only residual convolutional layers and ReLU
    activations, but no batch normalization, dropout, or max pooling. Our model
    improves significantly on previous all-convolutional networks on the CIFAR10,
    CIFAR100, and ImageNet classification benchmarks.
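
    A minimal PyTorch sketch of a residual block in the spirit of the
    architecture described above: only convolutions and ReLU, with no batch
    normalization, dropout, or pooling. Channel counts and kernel sizes are
    illustrative assumptions, not the authors' exact configuration.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class PlainResidualBlock(nn.Module):
            """Residual block using only 3x3 convolutions and ReLU."""

            def __init__(self, channels):
                super().__init__()
                self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
                self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

            def forward(self, x):
                # x + residual(x): the block can express the identity mapping
                # simply by driving the convolution weights toward zero.
                out = F.relu(self.conv1(x))
                out = self.conv2(out)
                return F.relu(x + out)

        # usage: spatial shape and channel count are preserved
        block = PlainResidualBlock(16)
        y = block(torch.randn(1, 16, 32, 32))  # -> (1, 16, 32, 32)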


    Computer Vision and Pattern Recognition

    3-D Convolutional Neural Networks for Glioblastoma Segmentation

    Darvin Yi, Mu Zhou, Zhao Chen, Olivier Gevaert
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Convolutional Neural Networks (CNN) have emerged as powerful tools for
    learning discriminative image features. In this paper, we propose a framework
    of 3-D fully CNN models for Glioblastoma segmentation from multi-modality MRI
    data. By generalizing CNN models to true 3-D convolutions in learning 3-D tumor
    MRI data, the proposed approach utilizes a unique network architecture to
    decouple image pixels. Specifically, we design a convolutional layer with
    pre-defined Difference-of-Gaussian (DoG) filters to perform true 3-D
    convolution incorporating local neighborhood information at each pixel. We then
    use three trained convolutional layers that act to decouple voxels from the
    initial 3-D convolution. The proposed framework allows identification of
    high-level tumor structures on MRI. We evaluate segmentation performance on the
    BRATS segmentation dataset with 274 tumor samples. Extensive experimental
    results demonstrate encouraging performance of the proposed approach comparing
    to the state-of-the-art methods. Our data-driven approach achieves a median
    Dice score accuracy of 89% in whole tumor glioblastoma segmentation, revealing
    a generalized low-bias possibility to learn from medium-size MRI datasets.
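
    To make the pre-defined Difference-of-Gaussian filtering concrete, here is a
    small scipy sketch: smoothing a volume at two scales and subtracting gives a
    3-D band-pass response that encodes local neighbourhood information at each
    voxel. The sigma values and array shapes are illustrative assumptions; the
    paper's actual filter bank and network wiring are not reproduced.

        import numpy as np
        from scipy.ndimage import gaussian_filter

        def dog_3d(volume, sigma_small=1.0, sigma_large=2.0):
            """Fixed (non-learned) 3-D Difference-of-Gaussian filtering of a volume."""
            return gaussian_filter(volume, sigma_small) - gaussian_filter(volume, sigma_large)

        # usage: a random stand-in for one MRI modality volume
        volume = np.random.rand(64, 64, 64).astype(np.float32)
        response = dog_3d(volume)  # same shape as the input volume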

    Fast Task-Specific Target Detection via Graph Based Constraints Representation and Checking

    Went Luan, Yezhou Yang, Cornelia Fermuller, John S. Baras
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this work, we present a fast target detection framework for real-world
    robotics applications. Considering that an intelligent agent attends to a
    task-specific object target during execution, our goal is to detect the object
    efficiently. We propose the concept of early recognition, which influences the
    candidate proposal process to achieve fast and reliable detection performance.
    To check the target constraints efficiently, we put forward a novel policy to
    generate a sub-optimal checking order, and prove that it has bounded time cost
    compared to the optimal checking sequence, which is not achievable in
    polynomial time. Experiments on two different scenarios: 1) rigid object and 2)
    non-rigid body part detection validate our pipeline. To show that our method is
    widely applicable, we further present a human-robot interaction system based on
    our non-rigid body part detection.

    Can fully convolutional networks perform well for general image restoration problems?

    Subhajit Chaudhury, Hiya Roy
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a fully convolutional network (FCN) based approach for color image
    restoration. Fully convolutional networks have recently shown remarkable
    performance for high level vision problems like semantic segmentation. In this
    paper, we investigate if fully convolutional networks can show promising
    performance for low level problems like image restoration as well. We propose
    an FCN model that learns a direct end-to-end mapping between the corrupted images
    as input and the desired clean images as output. Experimental results show that
    our FCN model outperforms traditional sparse coding based methods and
    demonstrates competitive performance compared to the state-of-the-art methods
    for image denoising. We further show that our proposed model can solve the
    difficult problem of blind image inpainting and can produce reconstructed
    images of impressive visual quality.

    Automatic discovery of discriminative parts as a quadratic assignment problem

    Ronan Sicre, Julien Rabin, Yannis Avrithis, Teddy Furon, Frederic Jurie
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Part-based image classification consists in representing categories by small
    sets of discriminative parts upon which a representation of the images is
    built. This paper addresses the question of how to automatically learn such
    parts from a set of labeled training images. The training of parts is cast as a
    quadratic assignment problem in which optimal correspondences between image
    regions and parts are automatically learned. The paper analyses different
    assignment strategies and thoroughly evaluates them on two public datasets:
    Willow actions and MIT 67 scenes. State-of-the-art results are obtained on
    these datasets.

    Joint Graph Decomposition and Node Labeling by Local Search

    Evgeny Levinkov, Siyu Tang, Eldar Insafutdinov, Bjoern Andres
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Discrete Mathematics (cs.DM)

    We state a combinatorial optimization problem whose feasible solutions define
    both a decomposition and a node labeling of a given graph. This problem offers
    a common mathematical abstraction of seemingly unrelated computer vision tasks,
    including instance-separating semantic segmentation, articulated human body
    pose estimation and multiple object tracking. Conceptually, the problem we
    propose generalizes the unconstrained integer quadratic program and the minimum
    cost lifted multicut problem, both of which are NP-hard. In order to find
    feasible solutions efficiently, we define a local search algorithm that
    converges monotonically to a local optimum, offering a feasible solution at any
    time. To demonstrate the effectiveness of this algorithm in solving computer
    vision tasks, we report running times and competitive solutions for two
    above-mentioned applications.

    Selfie Detection by Synergy-Constraint Based Convolutional Neural Network

    Yashas Annadani, Vijayakrishna Naganoor, Akshay Kumar Jagadish, Krishnan Chemmangat
    Comments: 8 Pages, Accepted for Publication at IEEE SITIS 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Categorisation of the huge amount of data on multimedia platforms is a crucial
    task. In this work, we propose a novel approach to address the subtle problem
    of selfie detection for image database segregation on the web, given the rapid
    rise in the number of selfies clicked. A Convolutional Neural Network (CNN) is modeled
    to learn a synergy feature in the common subspace of head and shoulder
    orientation, derived from Local Binary Pattern (LBP) and Histogram of Oriented
    Gradients (HOG) features respectively. This synergy was captured by projecting
    the aforementioned features using Canonical Correlation Analysis (CCA). We show
    that the resulting network’s convolutional activations in the neighbourhood of
    spatial keypoints captured by SIFT are discriminative for selfie-detection. In
    general, the proposed approach aids in capturing intricacies present in the image
    data and has the potential for usage in other subtle image analysis scenarios
    apart from just selfie detection. We investigate and analyse the performance of
    popular CNN architectures (GoogleNet, AlexNet), used for other image
    classification tasks, when subjected to the task of detecting the selfies on
    the multimedia platform. The results of the proposed approach are compared with
    these popular architectures on a dataset of ninety thousand images comprising
    roughly equal numbers of selfies and non-selfies. Experimental results on
    this dataset show the effectiveness of the proposed approach.
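
    A hedged sketch of the CCA projection step: descriptors for shoulder and
    head orientation (HOG and LBP in the paper) are mapped into a common
    subspace whose paired components are maximally correlated, and the
    projections are concatenated as the "synergy" representation. Feature
    dimensions and the number of components are illustrative assumptions.

        import numpy as np
        from sklearn.cross_decomposition import CCA

        # stand-ins for per-image descriptors (HOG- and LBP-like features)
        rng = np.random.default_rng(0)
        hog_feats = rng.normal(size=(500, 64))
        lbp_feats = rng.normal(size=(500, 32))

        cca = CCA(n_components=8)
        cca.fit(hog_feats, lbp_feats)
        hog_c, lbp_c = cca.transform(hog_feats, lbp_feats)
        synergy = np.hstack([hog_c, lbp_c])  # joint feature passed on to the classifier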

    Herding Generalizes Diverse M-Best Solutions

    Ece Ozkan, Gemma Roig, Orcun Goksel, Xavier Boix
    Comments: 10 pages, 2 algorithms, 2 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We show that the algorithm to extract diverse M-solutions from a Conditional
    Random Field (called divMbest [1]) takes exactly the form of a Herding
    procedure [2], i.e. a deterministic dynamical system that produces a sequence
    of hypotheses that respect a set of observed moment constraints. This
    generalization enables us to invoke properties of Herding that show that
    divMbest enforces implausible constraints which may yield wrong assumptions for
    some problem settings. Our experiments in semantic segmentation demonstrate
    that seeing divMbest as an instance of Herding leads to better alternatives for
    the implausible constraints of divMbest.

    A DNN Framework For Text Image Rectification From Planar Transformations

    Chengzhe Yan, Jie Hu, Changshui Zhang
    Comments: 9 pages, 10 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, a novel neural network architecture is proposed to rectify
    text images under mild assumptions. A new dataset of text images is
    collected to verify our model and is made publicly available. We explored the
    capability of deep neural networks in learning geometric transformations and
    found the model could segment the text image without explicit supervised
    segmentation information. Experiments show the proposed architecture can
    restore planar transformations with strong robustness and effectiveness.

    Baseline CNN structure analysis for facial expression recognition

    Minchul Shin, Munsang Kim, Dong-Soo Kwon
    Comments: 6 pages, RO-MAN2016 Conference
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a baseline convolutional neural network (CNN) structure and image
    preprocessing methodology to improve facial expression recognition algorithms
    using CNNs. To analyze the most efficient network structure, we investigated
    four network structures that are known to show good performance in facial
    expression recognition. Moreover, we also investigated the effect of input
    image preprocessing methods. Five types of data input (raw, histogram
    equalization, isotropic smoothing, diffusion-based normalization, difference of
    Gaussian) were tested, and the accuracy was compared. We trained 20 different
    CNN models (4 networks x 5 data input types) and verified the performance of
    each network with test images from five different databases. The experiment
    result showed that a three-layer structure consisting of a simple convolutional
    and a max pooling layer with histogram equalization image input was the most
    efficient. We describe the detailed training procedure and analyze the result
    of the test accuracy based on considerable observation.
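
    As a concrete illustration of the winning input pipeline (histogram
    equalization on grayscale face crops), here is a small OpenCV sketch. The
    48x48 input size and [0, 1] normalization are assumptions for illustration,
    not necessarily the exact settings used in the paper.

        import cv2
        import numpy as np

        def preprocess_face(gray_face, size=(48, 48)):
            """Resize and histogram-equalize a grayscale face crop for the CNN."""
            face = cv2.resize(gray_face, size)
            return cv2.equalizeHist(face).astype(np.float32) / 255.0

        # usage with a random stand-in face crop (8-bit, single channel)
        dummy = (np.random.rand(120, 100) * 255).astype(np.uint8)
        x = preprocess_face(dummy)  # shape (48, 48), values in [0, 1]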

    Multi-Shot Mining Semantic Part Concepts in CNNs

    Quanshi Zhang, Ruiming Cao, Ying Nian Wu, Song-Chun Zhu
    Comments: accepted for presentation at the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper proposes a new learning strategy that incrementally embeds new
    object-part concepts into a pre-trained convolutional neural network (CNN), in
    order to 1) explore explicit semantics for the CNN units and 2) gradually
    transfer the pre-trained CNN into a “white-box” model for hierarchical object
    understanding. Given part annotations on a very small number (e.g. 3–12) of
    objects, our method mines certain units from the pre-trained CNN and associates
    them with different part concepts. We use a four-layer And-Or graph to organize
    the CNN units, which clarifies their internal semantic hierarchy. Our method is
    guided by a small number of part annotations, and it achieves superior
    part-localization performance (about 28%–107% improvement in part center
    prediction).

    Convolutional Regression for Visual Tracking

    Kai Chen, Wenbing Tao
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recently, discriminatively learned correlation filters (DCF) have drawn much
    attention in the visual object tracking community. The success of DCF can
    potentially be attributed to the fact that a large number of samples are utilized
    to train the ridge regression model and predict the location of the object. To
    solve the regression problem in an efficient way, these samples are all
    generated by circularly shifting from a search patch. However, these synthetic
    samples also induce some negative effects which decrease the robustness of DCF
    based trackers.

    In this paper, we propose a Convolutional Regression framework for visual
    tracking (CRT). Instead of learning the linear regression model in a closed
    form, we try to solve the regression problem by optimizing a one-channel-output
    convolution layer with Gradient Descent (GD). In particular, the receptive
    field size of the convolution layer is set to the size of the object. Contrary to
    DCF, it is possible to incorporate all “real” samples clipped from the whole
    image. A critical issue of the GD approach is that most of the convolutional
    samples are negative and the contribution of positive samples will be
    submerged. To address this problem, we propose a novel Automatic Hard Negative
    Mining method to eliminate easy negatives and enhance positives. Extensive
    experiments are conducted on a widely-used benchmark with 100 sequences. The
    results show that the proposed algorithm achieves outstanding performance and
    outperforms almost all the existing DCF based algorithms.
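
    A minimal PyTorch sketch of the core idea: a single-output convolution
    layer, whose receptive field roughly matches the object size, is fitted by
    gradient descent to regress a response map over the search region. Shapes,
    the label construction, and hyperparameters are illustrative assumptions,
    and the hard negative mining step is not shown.

        import torch
        import torch.nn as nn

        feat = torch.randn(1, 32, 60, 60)        # features extracted from a search patch
        label = torch.zeros(1, 1, 50, 50)        # desired response map
        label[0, 0, 25, 25] = 1.0                # peak at the object centre

        conv = nn.Conv2d(32, 1, kernel_size=11)  # 11x11 receptive field -> 50x50 output
        opt = torch.optim.SGD(conv.parameters(), lr=1e-2)
        for _ in range(100):
            opt.zero_grad()
            loss = ((conv(feat) - label) ** 2).mean()  # L2 regression objective
            loss.backward()
            opt.step()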

    Semi-Dense 3D Semantic Mapping from Monocular SLAM

    Xuanpeng Li, Rachid Belaroussi
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The bundle of geometry and appearance in computer vision has proven to be a
    promising solution for robots across a wide variety of applications. Stereo
    cameras and RGB-D sensors are widely used to realise fast 3D reconstruction and
    trajectory tracking in a dense way. However, they lack the flexibility to
    switch seamlessly between environments of different scales, i.e., indoor and
    outdoor scenes. In addition, semantic information is still hard to acquire in
    a 3D map. We address this challenge by combining a state-of-the-art deep
    learning method and semi-dense Simultaneous Localisation and Mapping (SLAM)
    based on the video stream from a monocular camera. In our approach, 2D
    semantic information is transferred to the 3D map via correspondences between
    connected keyframes with spatial consistency. There is no need to obtain a
    semantic segmentation for each frame in a sequence, so a reasonable
    computation time can be achieved. We evaluate our method on indoor/outdoor
    datasets and obtain an improvement in 2D semantic labelling over baseline
    single-frame predictions.

    Hand Gesture Recognition for Contactless Device Control in Operating Rooms

    Ebrahim Nasr-Esfahani, Nader Karimi, S.M. Reza Soroushmehr, M. Hossein Jafari, M. Amin Khorsandi, Shadrokh Samavi, Kayvan Najarian
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Hand gesture is one of the most important means of touchless communication
    between humans and machines. There is great interest in commanding electronic
    equipment in operating rooms by hand gestures, to reduce surgery time and
    the potential for infection. There are challenges in implementing a hand
    gesture recognition system: it has to fulfill requirements such as high
    accuracy and fast response. In this paper we introduce a system of hand gesture
    recognition based on a deep learning approach. Deep learning is known as an
    accurate detection model, but its high complexity prevents it from being
    fabricated as an embedded system. To cope with this problem, we applied some
    changes in the structure of our work to achieve low complexity. As a result,
    the proposed method could be implemented on a naive embedded system. Our
    experiments show that the proposed system results in higher accuracy while
    having less complexity in comparison with the existing comparable methods.

    Automated Inference on Criminality using Face Images

    Xiaolin Wu, Xi Zhang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We study, for the first time, automated inference on criminality based solely
    on still face images. Via supervised machine learning, we build four
    classifiers (logistic regression, KNN, SVM, CNN) using facial images of 1856
    real persons controlled for race, gender, age and facial expressions, nearly
    half of whom were convicted criminals, for discriminating between criminals and
    non-criminals. All four classifiers perform consistently well and produce
    evidence for the validity of automated face-induced inference on criminality,
    despite the historical controversy surrounding the topic. Also, we find some
    discriminating structural features for predicting criminality, such as lip
    curvature, eye inner corner distance, and the so-called nose-mouth angle. Above
    all, the most important discovery of this research is that criminal and
    non-criminal face images populate two quite distinctive manifolds. The
    variation among criminal faces is significantly greater than that of the
    non-criminal faces. The two manifolds consisting of criminal and non-criminal
    faces appear to be concentric, with the non-criminal manifold lying in the
    kernel with a smaller span, exhibiting a law of normality for faces of
    non-criminals. In other words, the faces of the general law-abiding public have a
    greater degree of resemblance compared with the faces of criminals, or
    criminals have a higher degree of dissimilarity in facial appearance than
    normal people.

    Multi-class Generative Adversarial Networks with the L2 Loss Function

    Xudong Mao, Qing Li, Haoran Xie, Raymond Y.K. Lau, Zhen Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Generative adversarial networks (GANs) have achieved huge success in
    unsupervised learning. Most GANs treat the discriminator as a classifier
    with the binary sigmoid cross entropy loss function. However, we find that the
    sigmoid cross entropy loss function will sometimes lead to the saturation
    problem in GANs learning. In this work, we propose to adopt the L2 loss
    function for the discriminator. The properties of the L2 loss function can
    improve the stabilization of GANs learning. With the usage of the L2 loss
    function, we propose the multi-class generative adversarial networks for the
    purpose of image generation with multiple classes. We evaluate the multi-class
    GANs on a handwritten Chinese characters dataset with 3740 classes. The
    experiments demonstrate that the multi-class GANs can generate elegant images
    on datasets with a large number of classes. Comparison experiments between the
    L2 loss function and the sigmoid cross entropy loss function are also conducted
    and the results demonstrate the stabilization of the L2 loss function.
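
    For clarity, a hedged PyTorch-style sketch of the discriminator and
    generator objectives when the sigmoid cross entropy is replaced by an L2
    (least-squares) loss. The 0/1 target coding is one common choice and an
    assumption here, not necessarily the exact coding used by the authors.

        import torch

        def d_loss_l2(d_real, d_fake):
            """L2 discriminator loss: real outputs pushed toward 1, fake toward 0."""
            return 0.5 * ((d_real - 1.0) ** 2).mean() + 0.5 * (d_fake ** 2).mean()

        def g_loss_l2(d_fake):
            """L2 generator loss: discriminator outputs on fakes pushed toward 1;
            the gradient grows with the error instead of saturating."""
            return 0.5 * ((d_fake - 1.0) ** 2).mean()

        # usage with dummy discriminator outputs
        d_real, d_fake = torch.rand(16, 1), torch.rand(16, 1)
        print(d_loss_l2(d_real, d_fake), g_loss_l2(d_fake))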

    Leveraging Video Descriptions to Learn Video Question Answering

    Kuo-Hao Zeng, Tseng-Hung Chen, Ching-Yao Chuang, Yuan-Hong Liao, Juan Carlos Niebles, Min Sun
    Comments: 7 pages, 5 figures. Accepted to AAAI 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)

    We propose a scalable approach to learn video-based question answering (QA):
    answer a “free-form natural language question” about a video content. Our
    approach automatically harvests a large number of videos and descriptions
    freely available online. Then, a large number of candidate QA pairs are
    automatically generated from descriptions rather than manually annotated. Next,
    we use these candidate QA pairs to train a number of video-based QA methods
    extended from MN (Sukhbaatar et al. 2015), VQA (Antol et al. 2015), SA (Yao et
    al. 2015), SS (Venugopalan et al. 2015). In order to handle non-perfect
    candidate QA pairs, we propose a self-paced learning procedure to iteratively
    identify them and mitigate their effects in training. Finally, we evaluate
    performance on manually generated video-based QA pairs. The results show that
    our self-paced learning procedure is effective, and the extended SS model
    outperforms various baselines.

    Optimized clothes segmentation to boost gender classification in unconstrained scenarios

    D. Freire-Obregón, M. Castrillón-Santana, J. Lorenzo-Navarro
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Several applications require demographic information of ordinary people in
    unconstrained scenarios. This is not a trivial task due to significant human
    appearance variations. In this work, we introduce trixels for clustering image
    regions, enumerating their advantages compared to superpixels. The classical
    GrabCut algorithm is later modified to segment trixels instead of pixels in an
    unsupervised context. Combined with face detection, this leads to a clothes
    segmentation approach that runs close to real time. The study uses the challenging Pascal
    VOC dataset for segmentation evaluation experiments. A final experiment
    analyzes the fusion of clothes features with state-of-the-art gender
    classifiers in ClothesDB, revealing a significant performance improvement in
    gender classification.

    Autonomous Learning Framework Based on Online Hybrid Classifier for Multi-view Object Detection in Video

    Dapeng Luo, Zhipeng Zeng, Longsheng Wei, Yongwen Liu, Chen Luo, Jun Chen, Nong Sang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, our proposed framework takes a remarkably different direction
    to resolve the multi-view detection problem in a bottom-up fashion. Firstly, a
    state-of-the-art scene-specific object detector is obtained from a fully
    autonomous learning process triggered by marking several bounding boxes around
    the object in the first video frame via a mouse; no human-labeled training
    data or generic detector is needed. Secondly, this learning process is
    conveniently replicated many times in different surveillance scenes and
    results in a particular detector under various camera viewpoints. Thus, the
    proposed framework can be employed in multi-view object detection applications
    in an unsupervised manner. Obviously, the initial scene-specific detector,
    initialized by several bounding boxes, exhibits poor detection performance and
    is difficult to improve with traditional online learning algorithms.
    Consequently, in our self-learning process, we propose to partition the
    detection response space by online hybrid classifiers and to assign each
    partition an individual classifier that progressively achieves high
    classification accuracy. A novel online gradual learning algorithm is proposed
    to train the hybrid classifiers automatically and to focus online learning on
    the hard samples, the most informative samples lying around the decision
    boundary. The output is a hybrid-classifier-based scene-specific detector
    which achieves decent performance under different viewing angles. Experimental
    results on several video datasets show that our approach achieves comparable
    performance to robust supervised methods and outperforms state-of-the-art
    online learning methods under varying imaging conditions.

    When Fashion Meets Big Data: Discriminative Mining of Best Selling Clothing Features

    Kuan-Ting Chen, Jiebo Luo
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    With the prevalence of e-commerce websites and the ease of online shopping,
    consumers are embracing huge amounts of various options in products.
    Undeniably, shopping is one of the most essential activities in our society,
    and studying consumers' shopping behavior is important for the industry as
    well as for sociology and psychology. Not surprisingly, one of the most
    popular e-commerce categories is the clothing business. This gives rise to the
    need for analysis of popular and attractive clothing features, which could
    further boost many emerging applications, such as clothing recommendation and
    advertising. In this work, we design a novel system that consists of three
    major components: 1) exploring and organizing a large-scale clothing dataset
    from an online shopping website, 2)
    pruning and extracting images of best-selling products in clothing item data
    and user transaction history, and 3) utilizing a machine learning based
    approach to discovering clothing attributes as the representative and
    discriminative characteristics of popular clothing style elements. Through the
    experiments over a large-scale online clothing dataset, we demonstrate the
    effectiveness of our proposed system, and obtain useful insights on clothing
    consumption trends and profitable clothing features.

    Effective sparse representation of X-Ray medical images

    Laura Rebollo-Neira
    Comments: Routines for implementing the approach are available on this http URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Effective sparse representation of X-Ray medical images within the context of
    data reduction is considered. The proposed framework is shown to render an
    enormous reduction in the cardinality of the data set required to represent
    this class of images at very good quality. The particularity of the approach is
    that it can be implemented at very competitive processing time and with low
    memory requirements.

    Zero-resource Machine Translation by Multimodal Encoder-decoder Network with Multimedia Pivot

    Hideki Nakayama, Noriki Nishida
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

    We propose an approach to build a neural machine translation system with no
    supervised resources (i.e., no parallel corpora) using multimodal embedded
    representation over texts and images. Based on the assumption that text
    documents are often likely to be described with other multimedia information
    (e.g., images) somewhat related to the content, we try to indirectly estimate
    the relevance between two languages. Using multimedia as the “pivot”, we
    project all modalities into one common hidden space where samples belonging to
    similar semantic concepts should come close to each other, whatever the
    observed space of each sample is. This modality-agnostic representation is the
    key to bridging the gap between different modalities. Putting a decoder on top
    of it, our network can flexibly draw the outputs from any input modality.
    Notably, in the testing phase, we need only source language texts as the input
    for translation. In experiments, we tested our method on two benchmarks to show
    that it can achieve reasonable translation performance. We compared and
    investigated several possible implementations and found that an end-to-end
    model that simultaneously optimized both rank loss in multimodal encoders and
    cross-entropy loss in decoders performed the best.

    (CAD)^2RL: Real Single-Image Flight without a Single Real Image

    Fereshteh Sadeghi, Sergey Levine
    Comments: 11 pages
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

    We propose (CAD)^2RL, a flight controller for Collision Avoidance via Deep
    Reinforcement Learning that can be used to perform collision-free flight in the
    real world although it is trained entirely in a 3D CAD model simulator. Our
    method uses only single RGB images from a monocular camera mounted on the robot
    as the input and is specialized for indoor hallway following and obstacle
    avoidance. In contrast to most indoor navigation techniques that aim to
    directly reconstruct the 3D geometry of the environment, our approach directly
    predicts the probability of collision given the current monocular image and a
    candidate action. To obtain accurate predictions, we develop a deep
    reinforcement learning algorithm for learning indoor navigation, which uses the
    actual performance of the current policy to construct accurate supervision. The
    collision prediction model is represented by a deep convolutional neural
    network that directly processes raw image inputs. Our collision avoidance
    system is entirely trained in simulation and thus addresses the high sample
    complexity of deep reinforcement learning and avoids the dangers of
    trial-and-error learning in the real world. By highly randomizing the rendering
    settings for our simulated training set, we show that we can train a collision
    predictor that generalizes to new environments with substantially different
    appearance from the training scenarios. Finally, we evaluate our method in the
    real world by controlling a real quadrotor flying through real hallways. We
    demonstrate that our method can perform real-world collision avoidance and
    hallway following after being trained exclusively on synthetic images, without
    ever having seen a single real image at the training time. For supplementary
    video see: this https URL


    Artificial Intelligence

    Feature Engineering and Ensemble Modeling for Paper Acceptance Rank Prediction

    Yujie Qian, Yinpeng Dong, Ye Ma, Hailong Jin, Juanzi Li
    Comments: 2nd place winner report of KDD Cup 2016. More details can be found at this https URL
    Subjects: Artificial Intelligence (cs.AI)

    Measuring research impact and ranking academic achievement are important and
    challenging problems. Having an objective picture of research institutions is
    particularly valuable for students, parents and funding agencies, and also
    attracts attention from government and industry. KDD Cup 2016 proposes the
    paper acceptance rank prediction task, in which the participants are asked to
    rank the importance of institutions based on predicting how many of their
    papers will be accepted at the 8 top conferences in computer science. In our
    work, we adopt a three-step feature engineering method, including basic
    features definition, finding similar conferences to enhance the feature set,
    and dimension reduction using PCA. We propose three ranking models and the
    ensemble methods for combining such models. Our experiment verifies the
    effectiveness of our approach. In KDD Cup 2016, we achieved 2nd place overall.

    Learning for Expertise Matching with Declination Prediction

    Yujie Qian, Jie Tang
    Subjects: Artificial Intelligence (cs.AI)

    We study the problem of finding appropriate experts who are able to complete
    timely reviews and would not say “no” to the invitation. The problem is a
    central issue in many question-and-answer systems, but has received little
    research attention. Different from most existing studies that focus on
    expertise matching, we want to further predict the expert’s response: given a
    question, how can we find the expert who is able to provide a quality review
    and will agree to do it. We formalize the problem as a ranking problem. We
    first present an embedding-based question-to-expert distance metric for
    expertise matching and propose a ranking factor graph (RankFG) model to predict
    expert response. For online evaluation, we developed a Chrome Extension for
    reviewer recommendation and deployed it in the Google Chrome Web Store, and
    then collected the reviewers’ feedback. We also used the review bidding of a CS
    conference for evaluation. In the experiments, the proposed method demonstrates
    its superiority (+6.6-21.2% by MAP) over several state-of-the-art algorithms.

    Combining Context and Commonsense Knowledge Through Neural Networks for Solving Winograd Schema Problems

    Quan Liu, Hui Jiang, Zhen-Hua Ling, Xiaodan Zhu, Si Wei, Yu Hu
    Comments: 7 pages
    Subjects: Artificial Intelligence (cs.AI)

    This paper proposes a general framework to combine context and commonsense
    knowledge for solving the Winograd Schema (WS) and Pronoun Disambiguation
    Problems (PDP). In the proposed framework, commonsense knowledge bases (e.g.
    cause-effect word pairs) are quantized as knowledge constraints. The
    constraints guide us to learn knowledge-enhanced embeddings (KEE) from a large
    text corpus. Based on the pre-trained KEE models, this paper proposes two
    methods to solve the WS and PDP problems. The first method is an unsupervised
    method, which represents all the pronouns and candidate mentions in continuous
    vector spaces based on their contexts and calculates the semantic similarities
    between all the possible word pairs. The pronoun disambiguation procedure could
    then be implemented by comparing the semantic similarities between the pronoun
    (to be resolved) and all the candidate mentions. The second method is a
    supervised method, which extracts features for all the pronouns and candidate
    mentions and solves the WS problems by training a typical mention pair
    classification model. Similar to the first method, the features used in the
    second method are also extracted based on the KEE models. Experiments conducted
    on the available PDP and WS test sets show that these two methods both achieve
    consistent improvements over the baseline systems. The best performance reaches
    62% accuracy on the PDP test set of the first Winograd Schema Challenge.
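
    A hedged sketch of the unsupervised decision rule described above: given a
    context vector for the pronoun and embeddings for the candidate mentions
    (both built from the KEE models in the paper), pick the candidate with the
    highest cosine similarity. The vector construction itself is not shown, and
    all names here are illustrative.

        import numpy as np

        def cosine(u, v):
            return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

        def resolve_pronoun(pronoun_ctx_vec, candidate_vecs):
            """Return the index of the most similar candidate mention and all scores."""
            scores = [cosine(pronoun_ctx_vec, c) for c in candidate_vecs]
            return int(np.argmax(scores)), scores

        # usage with stand-in vectors for two candidate antecedents
        rng = np.random.default_rng(0)
        ctx = rng.normal(size=50)
        cands = [rng.normal(size=50), ctx + 0.1 * rng.normal(size=50)]
        best, scores = resolve_pronoun(ctx, cands)  # best == 1 (the closer candidate)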

    Entropic Causal Inference

    Murat Kocaoglu, Alexandros G. Dimakis, Sriram Vishwanath, Babak Hassibi
    Comments: To appear in AAAI 2017
    Subjects: Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)

    We consider the problem of identifying the causal direction between two
    discrete random variables using observational data. Unlike previous work, we
    keep the most general functional model but make an assumption on the unobserved
    exogenous variable: Inspired by Occam’s razor, we assume that the exogenous
    variable is simple in the true causal direction. We quantify simplicity using
    Rényi entropy. Our main result is that, under natural assumptions, if the
    exogenous variable has low H_0 entropy (cardinality) in the true direction,
    it must have high H_0 entropy in the wrong direction. We establish several
    algorithmic hardness results about estimating the minimum-entropy exogenous
    variable. We show that the problem of finding the exogenous variable with
    minimum entropy is equivalent to the problem of finding the minimum joint
    entropy given n marginal distributions, also known as the minimum entropy
    coupling problem. We propose an efficient greedy algorithm for the minimum
    entropy coupling problem that, for n = 2, provably finds a local optimum. This
    gives a greedy algorithm for finding the exogenous variable with minimum H_1
    (Shannon entropy). Our greedy entropy-based causal inference algorithm has
    similar performance to state-of-the-art additive noise models on real
    datasets. One
    advantage of our approach is that we make no use of the values of random
    variables but only their distributions. Our method can therefore be used for
    causal inference for both ordinal and also categorical data, unlike additive
    noise models.
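
    To make the greedy coupling step concrete, here is a small numpy sketch for
    the two-marginal case: repeatedly pair the largest remaining probability
    masses and assign their minimum to the joint distribution. This is a hedged
    illustration of the idea; variable names and the stopping tolerance are
    assumptions.

        import numpy as np

        def greedy_coupling(p, q):
            """Greedily build a joint distribution with marginals p and q."""
            p, q = np.array(p, dtype=float), np.array(q, dtype=float)
            joint = np.zeros((len(p), len(q)))
            while p.max() > 1e-12 and q.max() > 1e-12:
                i, j = int(p.argmax()), int(q.argmax())
                m = min(p[i], q[j])          # largest mass that can be matched here
                joint[i, j] += m
                p[i] -= m
                q[j] -= m
            return joint

        def entropy_bits(dist):
            d = dist[dist > 0]
            return float(-(d * np.log2(d)).sum())

        # usage: the coupling reproduces both marginals with few joint atoms
        J = greedy_coupling([0.5, 0.3, 0.2], [0.6, 0.4])
        print(J.sum(axis=1), J.sum(axis=0))  # [0.5 0.3 0.2] and [0.6 0.4]
        print(entropy_bits(J.ravel()))       # joint entropy of the greedy coupling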

    A Review on Algorithms for Constraint-based Causal Discovery

    Kui Yu, Jiuyong Li, Lin Liu
    Subjects: Artificial Intelligence (cs.AI)

    Causal discovery studies the problem of mining causal relationships between
    variables from data, which is of primary interest in science. During the past
    decades, a significant amount of progress has been made toward this
    fundamental data mining paradigm. In recent years, with the availability of
    abundant large and complex observational data, constraint-based approaches
    have gradually attracted a lot of interest and have been widely applied to
    many diverse real-world problems, due to their fast running speed and easy
    generalization to the problem of causal insufficiency. In this paper, we aim
    to review the constraint-based causal discovery algorithms. Firstly, we
    discuss the learning paradigm of the constraint-based approaches. Secondly
    and primarily, the state-of-the-art constraint-based causal inference
    algorithms are surveyed with detailed analysis. Thirdly, several related
    open-source software packages and benchmark data repositories are briefly
    summarized. As a conclusion, some open problems in constraint-based causal
    discovery are outlined for future research.

    Multi-lingual Knowledge Graph Embeddings for Cross-lingual Knowledge Alignment

    Muhao Chen, Yingtao Tian, Mohan Yang, Carlo Zaniolo
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Many recent works have demonstrated the benefits of knowledge graph
    embeddings in completing monolingual knowledge graphs. Inasmuch as related
    knowledge bases are built in several different languages, achieving
    cross-lingual knowledge alignment will help people in constructing a coherent
    knowledge base, and assist machines in dealing with different expressions of
    entity relationships across diverse human languages. Unfortunately, achieving
    this highly desirable cross-lingual alignment by human labor is very costly and
    error-prone. Thus, we propose MTransE, a translation-based model for
    multilingual knowledge graph embeddings, to provide a simple and automated
    solution. By encoding entities and relations of each language in a separate
    embedding space, MTransE provides transitions for each embedding vector to its
    cross-lingual counterparts in other spaces, while preserving the
    functionalities of monolingual embeddings. We deploy three different techniques
    to represent cross-lingual transitions, namely axis calibration, translation
    vectors, and linear transformations, and derive five variants for MTransE using
    different loss functions. Our models can be trained on partially aligned
    graphs, where just a small portion of triples are aligned with their
    cross-lingual counterparts. The experiments on cross-lingual entity matching
    and triple-wise alignment verification show promising results, with some
    variants consistently outperforming others on different tasks. We also explore
    how MTransE preserves the key properties of its monolingual counterpart TransE.

    Reinforcement Learning of Contextual MDPs using Spectral Methods

    Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    We propose a new reinforcement learning (RL) algorithm for contextual Markov
    decision processes (CMDP) using spectral methods. CMDPs are structured MDPs
    where the dynamics and rewards depend on a smaller number of hidden states or
    contexts. If the mapping between the hidden and observed states is known a
    priori, then standard RL algorithms such as UCRL are guaranteed to attain low
    regret. Is it possible to achieve regret of the same order even when the
    mapping is unknown? We provide an affirmative answer in this paper. We exploit
    spectral methods to learn the mapping from hidden to observed states with
    guaranteed confidence bounds, and incorporate it into the UCRL-based framework
    to obtain order-optimal regret.

    Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

    Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, Jeffrey Dean
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    We propose a simple, elegant solution to use a single Neural Machine
    Translation (NMT) model to translate between multiple languages. Our solution
    requires no change in the model architecture from our base system but instead
    introduces an artificial token at the beginning of the input sentence to
    specify the required target language. The rest of the model, which includes
    encoder, decoder and attention, remains unchanged and is shared across all
    languages. Using a shared wordpiece vocabulary, our approach enables
    Multilingual NMT using a single model without any increase in parameters, which
    is significantly simpler than previous proposals for Multilingual NMT. Our
    method often improves the translation quality of all involved language pairs,
    even while keeping the total number of model parameters constant. On the WMT’14
    benchmarks, a single multilingual model achieves comparable performance for
    English→French and surpasses state-of-the-art results for English→German.
    Similarly, a single multilingual model surpasses state-of-the-art results for
    French→English and German→English on the WMT’14 and WMT’15 benchmarks
    respectively. On
    production corpora, multilingual models of up to twelve language pairs allow
    for better translation of many individual pairs. In addition to improving the
    translation quality of language pairs that the model was trained with, our
    models can also learn to perform implicit bridging between language pairs never
    seen explicitly during training, showing that transfer learning and zero-shot
    translation is possible for neural translation. Finally, we show analyses that
    hint at a universal interlingua representation in our models and show some
    interesting examples when mixing languages.
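
    A tiny sketch of the preprocessing trick this abstract describes: the model
    architecture is untouched, and the desired target language is signalled by
    an artificial token prepended to the source sentence. The "<2xx>" token
    spelling below is a common convention and an assumption here, not
    necessarily the exact token format used in the paper.

        def add_target_token(source_sentence: str, target_lang: str) -> str:
            """Prepend an artificial token naming the desired target language."""
            return f"<2{target_lang}> {source_sentence}"

        # usage: the same English sentence routed to two different target languages
        print(add_target_token("How are you?", "es"))  # "<2es> How are you?"
        print(add_target_token("How are you?", "de"))  # "<2de> How are you?"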

    Learning the best algorithm for max-cut, clustering, and other partitioning problems

    Maria-Florina Balcan, Vaishnavh Nagarajan, Ellen Vitercik, Colin White
    Subjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Many data analysis problems are NP-hard, a reality that has motivated
    researchers to develop a wealth of approximation algorithms and heuristics over
    the past few decades. Max-cut, clustering, and many other widely applicable
    partitioning problems fall into this camp. We focus on two common classes of
    algorithms that are used for clustering and partitioning problems, including
    SDP rounding algorithms and agglomerative clustering algorithms with dynamic
    programming. The best algorithm to use typically depends on the specific
    application or distribution over instances. A worst-case analysis is often used
    to compare algorithms, but worst-case instances may not be present in the
    particular problem domain, and thus may be misleading when determining which
    algorithm to apply. Therefore, it is necessary to develop optimization methods
    which return the algorithms and parameters best suited for typical inputs from
    the application at hand. We address this problem for integer quadratic
    programming and clustering within a learning-theoretic framework where the
    learning algorithm is given samples from an application-specific distribution
    over problem instances and learns an algorithm which performs well over the
    distribution. We provide sample complexity guarantees and computationally
    efficient algorithms which optimize an algorithm family’s parameters to best
    suit typical inputs from a particular application. We analyze these algorithms
    using the learning-theoretic notion of pseudo-dimension. Along with upper
    bounds, we provide the first pseudo-dimension lower bounds in this line of
    work, which require an involved analysis of each algorithm family’s performance
    on carefully constructed problem instances. Our bounds are tight and therefore
    nail down the intrinsic complexity of the algorithm classes we analyze, which
    are significantly more complex than classes commonly used in learning theory.

    Zero-resource Machine Translation by Multimodal Encoder-decoder Network with Multimedia Pivot

    Hideki Nakayama, Noriki Nishida
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

    We propose an approach to build a neural machine translation system with no
    supervised resources (i.e., no parallel corpora) using multimodal embedded
    representation over texts and images. Based on the assumption that text
    documents are often likely to be described with other multimedia information
    (e.g., images) somewhat related to the content, we try to indirectly estimate
    the relevance between two languages. Using multimedia as the “pivot”, we
    project all modalities into one common hidden space where samples belonging to
    similar semantic concepts should come close to each other, whatever the
    observed space of each sample is. This modality-agnostic representation is the
    key to bridging the gap between different modalities. Putting a decoder on top
    of it, our network can flexibly draw the outputs from any input modality.
    Notably, in the testing phase, we need only source language texts as the input
    for translation. In experiments, we tested our method on two benchmarks to show
    that it can achieve reasonable translation performance. We compared and
    investigated several possible implementations and found that an end-to-end
    model that simultaneously optimized both rank loss in multimodal encoders and
    cross-entropy loss in decoders performed the best.

    Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy

    Dougal J. Sutherland, Hsiao-Yu Tung, Heiko Strathmann, Soumyajit De, Aaditya Ramdas, Alex Smola, Arthur Gretton
    Comments: Submitted to ICLR 2017; public comments at this http URL
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Methodology (stat.ME)

    We propose a method to optimize the representation and distinguishability of
    samples from two probability distributions, by maximizing the estimated power
    of a statistical test based on the maximum mean discrepancy (MMD). This
    optimized MMD is applied to the setting of unsupervised learning by generative
    adversarial networks (GAN), in which a model attempts to generate realistic
    samples, and a discriminator attempts to tell these apart from data samples. In
    this context, the MMD may be used in two roles: first, as a discriminator,
    either directly on the samples, or on features of the samples. Second, the MMD
    can be used to evaluate the performance of a generative model, by testing the
    model’s samples against a reference data set. In the latter role, the optimized
    MMD is particularly helpful, as it gives an interpretable indication of how the
    model and data distributions differ, even in cases where individual model
    samples are not easily distinguished either by eye or by classifier.

    Learning to Gather Information via Imitation

    Sanjiban Choudhury, Ashish Kapoor, Gireeja Ranade, Debadeepta Dey
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

    The budgeted information gathering problem – where a robot with a fixed fuel
    budget is required to maximize the amount of information gathered from the
    world – appears in practice across a wide range of applications in autonomous
    exploration and inspection with mobile robots. Although there is an extensive
    amount of prior work investigating effective approximations of the problem,
    these methods do not address the fact that their performance is heavily
    dependent on the distribution of objects in the world. In this paper, we attempt to
    address this issue by proposing a novel data-driven imitation learning
    framework.

    We present an efficient algorithm, EXPLORE, that trains a policy on the
    target distribution to imitate a clairvoyant oracle – an oracle that has full
    information about the world and computes non-myopic solutions to maximize
    information gathered. We validate the approach through a spectrum of results
    on a number of 2D and 3D exploration problems that demonstrate the ability of
    EXPLORE to adapt to different object distributions. Additionally, our analysis
    provides theoretical insight into the behavior of EXPLORE. Our approach paves
    the way forward for efficiently applying data-driven methods to the domain of
    information gathering.

    Recognizing and Eliciting Weakly Single Crossing Profiles on Trees

    Palash Dey
    Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS)

    The domain of single crossing preference profiles is a widely studied domain
    in social choice theory. It has been generalized to the domain of single
    crossing preference profiles with respect to trees which inherits many
    desirable properties from the single crossing domain, for example, transitivity
    of majority relation, existence of polynomial time algorithms for finding
    winners of Kemeny voting rule, etc. In this paper, we consider a further
    generalization of the domain of single crossing profiles on trees to the domain
    consisting of all preference profiles which can be extended to single crossing
    preference profiles with respect to some tree by adding more preferences to it.
    We call this domain the weakly single crossing domain on trees. We present a
    polynomial time algorithm for recognizing weakly single crossing profiles on
    trees. We then move on to develop a polynomial time algorithm with low query
    complexity for eliciting weakly single crossing profiles on trees even when we
    do not know any tree with respect to which the closure of the input profile is
    single crossing and the preferences can be queried only sequentially; moreover,
    the sequential order is also unknown. We complement the performance of our
    preference elicitation algorithm by proving that our algorithm makes an optimal
    number of queries up to constant factors when the number of preferences is
    large compared to the number of candidates, even if the input profile is known
    to be single crossing with respect to some given tree and the preferences can
    be accessed randomly.

    Leveraging Video Descriptions to Learn Video Question Answering

    Kuo-Hao Zeng, Tseng-Hung Chen, Ching-Yao Chuang, Yuan-Hong Liao, Juan Carlos Niebles, Min Sun
    Comments: 7 pages, 5 figures. Accepted to AAAI 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)

    We propose a scalable approach to learn video-based question answering (QA):
    answer a “free-form natural language question” about the content of a video. Our
    approach automatically harvests a large number of videos and descriptions
    freely available online. Then, a large number of candidate QA pairs are
    automatically generated from descriptions rather than manually annotated. Next,
    we use these candidate QA pairs to train a number of video-based QA methods
    extended from MN (Sukhbaatar et al. 2015), VQA (Antol et al. 2015), SA (Yao et
    al. 2015), SS (Venugopalan et al. 2015). In order to handle non-perfect
    candidate QA pairs, we propose a self-paced learning procedure to iteratively
    identify them and mitigate their effects in training. Finally, we evaluate
    performance on manually generated video-based QA pairs. The results show that
    our self-paced learning procedure is effective, and the extended SS model
    outperforms various baselines.


    Information Retrieval

    1.5 billion words Arabic Corpus

    Ibrahim Abu El-khair
    Subjects: Computation and Language (cs.CL); Digital Libraries (cs.DL); Information Retrieval (cs.IR)

    This study is an attempt to build a contemporary linguistic corpus for the
    Arabic language. The corpus produced is a text corpus that includes more than
    five million newspaper articles. It contains over a billion and a half words
    in total, of which about three million are unique words. The data were
    collected from newspaper articles in ten major news sources from eight Arabic
    countries, over a period of fourteen years. The corpus was encoded in two
    encodings, UTF-8 and Windows CP-1256, and marked up in two mark-up languages,
    SGML and XML.


    Computation and Language

    Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

    Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, Jeffrey Dean
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    We propose a simple, elegant solution to use a single Neural Machine
    Translation (NMT) model to translate between multiple languages. Our solution
    requires no change in the model architecture from our base system but instead
    introduces an artificial token at the beginning of the input sentence to
    specify the required target language. The rest of the model, which includes
    encoder, decoder and attention, remains unchanged and is shared across all
    languages. Using a shared wordpiece vocabulary, our approach enables
    Multilingual NMT using a single model without any increase in parameters, which
    is significantly simpler than previous proposals for Multilingual NMT. Our
    method often improves the translation quality of all involved language pairs,
    even while keeping the total number of model parameters constant. On the WMT’14
    benchmarks, a single multilingual model achieves comparable performance for
    English→French and surpasses state-of-the-art results for English→German.
    Similarly, a single multilingual model surpasses state-of-the-art results for
    French→English and German→English on the WMT’14 and WMT’15 benchmarks,
    respectively. On
    production corpora, multilingual models of up to twelve language pairs allow
    for better translation of many individual pairs. In addition to improving the
    translation quality of language pairs that the model was trained with, our
    models can also learn to perform implicit bridging between language pairs never
    seen explicitly during training, showing that transfer learning and zero-shot
    translation are possible for neural translation. Finally, we show analyses that
    hint at a universal interlingua representation in our models and show some
    interesting examples when mixing languages.
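
    The key mechanism described above, an artificial token prepended to the source
    sentence to select the target language, is easy to illustrate. The sketch below
    is a rough illustration only (the token format and helper name are made up, not
    taken from the paper or any particular NMT toolkit):

        def add_target_token(source_sentence, target_lang):
            # Prepend an artificial token telling the single shared model which
            # language to translate into; everything downstream stays unchanged.
            return "<2{}> {}".format(target_lang, source_sentence)

        # Hypothetical usage: the same model, different target languages.
        print(add_target_token("How are you?", "es"))  # "<2es> How are you?"
        print(add_target_token("How are you?", "de"))  # "<2de> How are you?"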

    Zero-resource Machine Translation by Multimodal Encoder-decoder Network with Multimedia Pivot

    Hideki Nakayama, Noriki Nishida
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

    We propose an approach to build a neural machine translation system with no
    supervised resources (i.e., no parallel corpora) using multimodal embedded
    representation over texts and images. Based on the assumption that text
    documents are often likely to be described with other multimedia information
    (e.g., images) somewhat related to the content, we try to indirectly estimate
    the relevance between two languages. Using multimedia as the “pivot”, we
    project all modalities into one common hidden space where samples belonging to
    similar semantic concepts should come close to each other, whatever the
    observed space of each sample is. This modality-agnostic representation is the
    key to bridging the gap between different modalities. Putting a decoder on top
    of it, our network can flexibly draw the outputs from any input modality.
    Notably, in the testing phase, we need only source language texts as the input
    for translation. In experiments, we tested our method on two benchmarks to show
    that it can achieve reasonable translation performance. We compared and
    investigated several possible implementations and found that an end-to-end
    model that simultaneously optimized both rank loss in multimodal encoders and
    cross-entropy loss in decoders performed the best.

    Multi-view Recurrent Neural Acoustic Word Embeddings

    Wanjia He, Weiran Wang, Karen Livescu
    Comments: In submission to ICLR 2017
    Subjects: Computation and Language (cs.CL)

    Recent work has begun exploring neural acoustic word
    embeddings–fixed-dimensional vector representations of arbitrary-length speech
    segments corresponding to words. Such embeddings are applicable to speech
    retrieval and recognition tasks, where reasoning about whole words may make it
    possible to avoid ambiguous sub-word representations. The main idea is to map
    acoustic sequences to fixed-dimensional vectors such that examples of the same
    word are mapped to similar vectors, while different-word examples are mapped to
    very different vectors. In this work we take a multi-view approach to learning
    acoustic word embeddings, in which we jointly learn to embed acoustic sequences
    and their corresponding character sequences. We use deep bidirectional LSTM
    embedding models and multi-view contrastive losses. We study the effect of
    different loss variants, including fixed-margin and cost-sensitive losses. Our
    acoustic word embeddings improve over previous approaches for the task of word
    discrimination. We also present results on other tasks that are enabled by the
    multi-view approach, including cross-view word discrimination and word
    similarity.
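
    As a rough illustration of the fixed-margin objective mentioned above (a
    simplified, single-pair version, not the authors' bidirectional-LSTM training
    setup; all names are placeholders):

        import numpy as np

        def fixed_margin_loss(acoustic, character, negative, margin=0.5):
            # acoustic / character: embeddings of the same word from the two views;
            # negative: a mismatched word's embedding from the character view.
            # Pull matched cross-view pairs together, push mismatched pairs apart.
            def dist(u, v):
                return np.linalg.norm(u - v)
            return max(0.0, margin + dist(acoustic, character)
                       - dist(acoustic, negative))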

    Ranking medical jargon in electronic health record notes by adapted distant supervision

    Jinying Chen, Abhyuday N. Jagannatha, Samah J. Jarad, Hong Yu
    Subjects: Computation and Language (cs.CL)

    Objective: Allowing patients to access their own electronic health record
    (EHR) notes through online patient portals has the potential to improve
    patient-centered care. However, medical jargon, which abounds in EHR notes, has
    been shown to be a barrier for patient EHR comprehension. Existing knowledge
    bases that link medical jargon to lay terms or definitions play an important
    role in alleviating this problem but have low coverage of medical jargon in
    EHRs. We developed a data-driven approach that mines EHRs to identify and rank
    medical jargon based on its importance to patients, to support the building of
    EHR-centric lay language resources.

    Methods: We developed an innovative adapted distant supervision (ADS) model
    based on support vector machines to rank medical jargon from EHRs. For distant
    supervision, we utilized the open-access, collaborative consumer health
    vocabulary, a large, publicly available resource that links lay terms to
    medical jargon. We explored both knowledge-based features from the Unified
    Medical Language System and distributed word representations learned from
    unlabeled large corpora. We evaluated the ADS model using physician-identified
    important medical terms.

    Results: Our ADS model significantly surpassed two state-of-the-art automatic
    term recognition methods, TF*IDF and C-Value, yielding 0.810 ROC-AUC versus
    0.710 and 0.667, respectively. Our model identified 10K important medical
    jargon terms after ranking over 100K candidate terms mined from over 7,500 EHR
    narratives.

    Conclusion: Our work is an important step towards enriching lexical resources
    that link medical jargon to lay terms/definitions to support patient EHR
    comprehension. The identified medical jargon terms and their rankings are
    available upon request.

    Attending to Characters in Neural Sequence Labeling Models

    Marek Rei, Gamal K.O. Crichton, Sampo Pyysalo
    Comments: Proceedings of COLING 2016
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Sequence labeling architectures use word embeddings for capturing similarity,
    but suffer when handling previously unseen or rare words. We investigate
    character-level extensions to such models and propose a novel architecture for
    combining alternative word representations. By using an attention mechanism,
    the model is able to dynamically decide how much information to use from a
    word- or character-level component. We evaluated different architectures on a
    range of sequence labeling datasets, and character-level extensions were found
    to improve performance on every benchmark. In addition, the proposed
    attention-based architecture delivered the best results even with a smaller
    number of trainable parameters.
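
    A per-dimension gate of the kind described can be sketched in a few lines. The
    weights, dimensions and helper names below are placeholders, not the paper's
    exact parameterization:

        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def combine_word_char(word_vec, char_vec, W):
            # The attention gate decides, per dimension, how much to trust the
            # word-level versus the character-level representation.
            z = sigmoid(W @ np.concatenate([word_vec, char_vec]))
            return z * word_vec + (1.0 - z) * char_vec

        # Hypothetical shapes: 100-d embeddings, gate computed from their concatenation.
        rng = np.random.default_rng(0)
        w, c = rng.normal(size=100), rng.normal(size=100)
        W = rng.normal(size=(100, 200)) * 0.01
        combined = combine_word_char(w, c, W)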

    Character-level Convolutional Network for Text Classification Applied to Chinese Corpus

    Weijie Huang
    Subjects: Computation and Language (cs.CL)

    This article provides an exploration of a character-level convolutional neural
    network applied to the Chinese text classification problem. We constructed a
    large-scale Chinese language dataset, and the results show that a
    character-level convolutional neural network works better on the Chinese
    corpus than on its corresponding pinyin-format dataset. This is the first time
    that a character-level convolutional neural network has been applied to this
    text classification problem.

    `Who would have thought of that!': A Hierarchical Topic Model for Extraction of Sarcasm-prevalent Topics and Sarcasm Detection

    Aditya Joshi, Prayas Jain, Pushpak Bhattacharyya, Mark Carman
    Comments: This paper will be presented at ExPROM workshop at COLING 2016
    Subjects: Computation and Language (cs.CL)

    Topic Models have been reported to be beneficial for aspect-based sentiment
    analysis. This paper reports a simple topic model for sarcasm detection, a
    first, to the best of our knowledge. Designed on the basis of the intuition
    that sarcastic tweets are likely to have a mixture of words of both sentiments
    as against tweets with literal sentiment (either positive or negative), our
    hierarchical topic model discovers sarcasm-prevalent topics and topic-level
    sentiment. Using a dataset of tweets labeled using hashtags, the model
    estimates topic-level, and sentiment-level distributions. Our evaluation shows
    that topics such as `work’, `gun laws’, `weather’ are sarcasm-prevalent topics.
    Our model is also able to discover the mixture of sentiment-bearing words that
    exist in a text of a given sentiment-related label. Finally, we apply our model
    to predict sarcasm in tweets. We outperform two prior works based on statistical
    classifiers with specific features by around 25%.

    Classify or Select: Neural Architectures for Extractive Document Summarization

    Ramesh Nallapati, Bowen Zhou, Mingbo Ma
    Subjects: Computation and Language (cs.CL)

    We present two novel and contrasting Recurrent Neural Network (RNN) based
    architectures for extractive summarization of documents. The Classifier based
    architecture sequentially accepts or rejects each sentence in the original
    document order for its membership in the final summary. The Selector
    architecture, on the other hand, is free to pick one sentence at a time in any
    arbitrary order to piece together the summary.

    Our models under both architectures jointly capture the notions of salience
    and redundancy of sentences. In addition, these models have the advantage of
    being very interpretable, since they allow visualization of their predictions
    broken up by abstract features such as information content, salience and
    redundancy.

    We show that our models reach or outperform state-of-the-art supervised
    models on two different corpora. We also recommend the conditions under which
    one architecture is superior to the other based on experimental evidence.

    F-Score Driven Max Margin Neural Network for Named Entity Recognition in Chinese Social Media

    Hangfeng He, Xu Sun
    Subjects: Computation and Language (cs.CL)

    We focus on named entity recognition (NER) for Chinese social media. With
    massive unlabeled text and a quite limited labelled corpus, we propose a
    semi-supervised learning model based on a B-LSTM neural network. To take
    advantage of traditional NER methods such as CRF, we combine transition
    probabilities with deep learning in our model. To bridge the gap between label
    accuracy and the F-score of NER, we construct a model which can be directly
    trained on F-score. Considering the instability of the F-score driven method
    and the meaningful information provided by label accuracy, we propose an
    integrated method to train on both F-score and label accuracy. Our integrated
    model yields a 7.44% improvement over the previous state-of-the-art result.

    A New Recurrent Neural CRF for Learning Non-linear Edge Features

    Shuming Ma, Xu Sun
    Subjects: Computation and Language (cs.CL)

    Conditional Random Field (CRF) and recurrent neural models have achieved
    success in structured prediction. More recently, there has been a marriage of
    CRF and recurrent neural models, so that we can gain from both non-linear dense
    features and the globally normalized CRF objective. These recurrent neural CRF
    models mainly focus on encoding node features in CRF undirected graphs.
    However, edge features prove important to CRF in structured prediction. In this
    work, we introduce a new recurrent neural CRF model which learns non-linear
    edge features, so that non-linear features are encoded completely. We compare
    our model with different neural models on well-known structured prediction
    tasks. Experiments show that our model outperforms state-of-the-art methods in
    NP chunking, shallow parsing, Chinese word segmentation and POS tagging.

    SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents

    Ramesh Nallapati, Feifei Zhai, Bowen Zhou
    Comments: Published at AAAI 2017, The Thirty-First AAAI Conference on Artificial Intelligence (AAAI-2017)
    Subjects: Computation and Language (cs.CL)

    We present SummaRuNNer, a Recurrent Neural Network (RNN) based sequence model
    for extractive summarization of documents and show that it achieves performance
    better than or comparable to state-of-the-art. Our model has the additional
    advantage of being very interpretable, since it allows visualization of its
    predictions broken up by abstract features such as information content,
    salience and novelty. Another novel contribution of our work is abstractive
    training of our extractive model that can train on human generated reference
    summaries alone, eliminating the need for sentence-level extractive labels.

    Joint Representation Learning of Text and Knowledge for Knowledge Graph Completion

    Xu Han, Zhiyuan Liu, Maosong Sun
    Subjects: Computation and Language (cs.CL)

    Joint representation learning of text and knowledge within a unified semantic
    space enables us to perform knowledge graph completion more accurately. In this
    work, we propose a novel framework to embed words, entities and relations into
    the same continuous vector space. In this model, both entity and relation
    embeddings are learned by taking knowledge graph and plain text into
    consideration. In experiments, we evaluate the joint learning model on three
    tasks including entity prediction, relation prediction and relation
    classification from text. The experimental results show that our model can
    significantly and consistently improve the performance on the three tasks as
    compared with other baselines.

    Cross-lingual Dataless Classification for Languages with Small Wikipedia Presence

    Yangqiu Song, Stephen Mayhew, Dan Roth
    Subjects: Computation and Language (cs.CL)

    This paper presents an approach to classify documents in any language into an
    English topical label space, without any text categorization training data. The
    approach, Cross-Lingual Dataless Document Classification (CLDDC) relies on
    mapping the English labels or short category description into a Wikipedia-based
    semantic representation, and on the use of the target language Wikipedia.
    Consequently, performance could suffer when the Wikipedia in the target
    language is small. In this paper, we focus on languages with small Wikipedias
    (Small-Wikipedia languages, SWLs). We use a word-level dictionary to convert
    documents in an SWL to a large-Wikipedia language (LWL), and then perform CLDDC
    based on the LWL’s Wikipedia. This approach can be applied to thousands of
    languages, which can be contrasted with machine translation, which is a
    supervision heavy approach and can be done for about 100 languages. We also
    develop a ranking algorithm that makes use of language similarity metrics to
    automatically select a good LWL, and show that this significantly improves
    classification of SWLs’ documents, performing comparably to the best bridge
    possible.

    Semi-automatic Simultaneous Interpreting Quality Evaluation

    Xiaojun Zhang
    Journal-ref: International Journal on Natural Language Computing (IJNLC) Vol.
    5, No.5, October 2016
    Subjects: Computation and Language (cs.CL)

    The increasing demand for interpreting calls for a more objective and automatic
    measurement. We hold the basic idea that ‘translating means translating
    meaning’, so we can assess interpretation quality by comparing the meaning of
    the interpreting output with that of the source input. That is, a translation
    unit, or ‘chunk’, called a Frame, which comes from frame semantics, and its
    components, called Frame Elements (FEs), which come from FrameNet, are proposed
    to explore their matching rate between target and source texts. A case study in
    this paper verifies the usability of semi-automatic graded semantic-scoring
    measurement for human simultaneous interpreting and shows how to use frame and
    FE matches to score. Experimental results show that the semantic-scoring
    metrics have a significant correlation coefficient with human judgment.

    1.5 billion words Arabic Corpus

    Ibrahim Abu El-khair
    Subjects: Computation and Language (cs.CL); Digital Libraries (cs.DL); Information Retrieval (cs.IR)

    This study is an attempt to build a contemporary linguistic corpus for the
    Arabic language. The corpus produced is a text corpus that includes more than
    five million newspaper articles. It contains over a billion and a half words
    in total, of which about three million are unique words. The data were
    collected from newspaper articles in ten major news sources from eight Arabic
    countries, over a period of fourteen years. The corpus was encoded in two
    encodings, UTF-8 and Windows CP-1256, and marked up in two mark-up languages,
    SGML and XML.

    Multi-Language Identification Using Convolutional Recurrent Neural Network

    Vrishabh Ajay Lakhani, Rohan Mahadev
    Comments: 10 pages, 6 figures
    Subjects: Computation and Language (cs.CL)

    Language Identification, being an important aspect of Automatic Speaker
    Recognition, has seen many changes and new approaches to improve performance
    over the last decade. We compare the performance of using the audio spectrum on
    a log scale and using polyphonic sound sequences from raw audio samples to
    train a neural network to classify speech as either English or Spanish. To
    achieve this, we take the novel approach of using a Convolutional Recurrent
    Neural Network with Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU)
    cells for forward propagation of the neural network. Our hypothesis is that
    using polyphonic sound sequences as features with either LSTM or GRU gating
    mechanisms outperforms the traditional MFCC features used with a unidirectional
    Deep Neural Network.

    Linguistically Regularized LSTMs for Sentiment Classification

    Qiao Qian, Minlie Huang, Xiaoyan Zhu
    Subjects: Computation and Language (cs.CL)

    Sentiment understanding has been a long-term goal of AI over the past decades.
    This paper deals with sentence-level sentiment classification. Although a
    variety of neural network models have been proposed recently, previous models
    either depend on expensive phrase-level annotation, and their performance drops
    substantially when trained with only sentence-level annotation, or do not
    fully employ linguistic resources (e.g., sentiment lexicons, negation words,
    intensity words) and thus are not able to produce linguistically coherent
    representations. In this paper, we propose simple models trained with
    sentence-level annotation that also attempt to generate linguistically
    coherent representations by employing regularizers that model the linguistic
    role of sentiment lexicons, negation words, and intensity words. Results show
    that our models effectively capture the sentiment-shifting effect of sentiment,
    negation, and intensity words, while still obtaining competitive results
    without sacrificing the models’ simplicity.

    Training IBM Watson using Automatically Generated Question-Answer Pairs

    Jangho Lee, Gyuwan Kim, Jaeyoon Yoo, Changwoo Jung, Minseok Kim, Sungroh Yoon
    Subjects: Computation and Language (cs.CL)

    IBM Watson is a cognitive computing system capable of question answering in
    natural languages. It is believed that IBM Watson can understand large corpora
    and answer relevant questions more effectively than any other
    question-answering system currently available. To unleash the full power of
    Watson, however, we need to train its instance with a large number of
    well-prepared question-answer pairs. Obviously, manually generating such pairs
    in a large quantity is prohibitively time consuming and significantly limits
    the efficiency of Watson’s training. Recently, a large-scale dataset of over 30
    million question-answer pairs was reported. Under the assumption that using
    such an automatically generated dataset could relieve the burden of manual
    question-answer generation, we tried to use this dataset to train an instance
    of Watson and checked the training efficiency and accuracy. According to our
    experiments, using this auto-generated dataset was effective for training
    Watson, complementing manually crafted question-answer pairs. To the best of
    the authors’ knowledge, this work is the first attempt to use a large-scale
    dataset of automatically generated question-answer pairs for training IBM
    Watson. We anticipate that the insights and lessons obtained from our
    experiments will be useful for researchers who want to expedite Watson training
    leveraged by automatically generated question-answer pairs.

    Multi-lingual Knowledge Graph Embeddings for Cross-lingual Knowledge Alignment

    Muhao Chen, Yingtao Tian, Mohan Yang, Carlo Zaniolo
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Many recent works have demonstrated the benefits of knowledge graph
    embeddings in completing monolingual knowledge graphs. Inasmuch as related
    knowledge bases are built in several different languages, achieving
    cross-lingual knowledge alignment will help people in constructing a coherent
    knowledge base, and assist machines in dealing with different expressions of
    entity relationships across diverse human languages. Unfortunately, achieving
    this highly desirable cross-lingual alignment by human labor is very costly and
    error-prone. Thus, we propose MTransE, a translation-based model for
    multilingual knowledge graph embeddings, to provide a simple and automated
    solution. By encoding the entities and relations of each language in a separate
    embedding space, MTransE provides transitions for each embedding vector to its
    cross-lingual counterparts in other spaces, while preserving the
    functionalities of monolingual embeddings. We deploy three different techniques
    to represent cross-lingual transitions, namely axis calibration, translation
    vectors, and linear transformations, and derive five variants for MTransE using
    different loss functions. Our models can be trained on partially aligned
    graphs, where just a small portion of triples are aligned with their
    cross-lingual counterparts. The experiments on cross-lingual entity matching
    and triple-wise alignment verification show promising results, with some
    variants consistently outperforming others on different tasks. We also explore
    how MTransE preserves the key properties of its monolingual counterpart TransE.


    Distributed, Parallel, and Cluster Computing

    Byzantine Processors and Cuckoo Birds: Confining Maliciousness to the Outset

    Danny Dolev, Eli Gafni
    Comments: arXiv admin note: text overlap with arXiv:1607.01210
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Are there Byzantine Animals? A Fooling Behavior is exhibited by the Cuckoo
    bird. It sneakily replaces some of the eggs of other species with its own. Lest
    the Cuckoo drive itself extinct by destroying its host, it self-limits its
    power: It does not replace too large a fraction of the eggs. Here, we show that
    for any Byzantine Behavior that does not destroy the system it attacks, i.e.
    that allows the system to solve an easy task like epsilon-agreement, its
    maliciousness can be confined to an exact replica of the Cuckoo bird behavior:
    Undetectably replace an input of a processor and let the processor behave
    correctly thereafter with respect to the new input. In doing so we reduce the
    study of Byzantine behavior to fail-stop (benign) behavior with the Cuckoo
    caveat of a fraction of the inputs being replaced. We establish a complete
    correspondence between the Byzantine and the Benign, modulo different
    thresholds and replaced inputs.
    This work is yet another step in a line of work unifying seemingly distinct
    distributed system models, dispelling the Myth that Distributed Computing is a
    plethora of distinct isolated models, each requiring its specialized tools and
    ideas in order to determine solvability of tasks. Thus, hereafter, Byzantine
    Computability questions can be reduced to questions in the benign failure
    setting. We also show that the known results about correlated faults in the
    asynchronous benign setting can be imported verbatim to the asynchronous
    Byzantine setting. Finally, as in the benign case in which we have the property
    that a processor can output once its faulty behavior stops for long enough, we
    show this can be done in a similar manner in the Byzantine case. This
    necessitated the generalization of Reliable Broadcast to what we term
    Recoverable Reliable Broadcast.

    Efficient Communications in Training Large Scale Neural Networks

    Linnan Wang, Wei Wu, George Bosilca, Richard Vuduc, Zenglin Xu
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    We consider the problem of how to reduce the cost of communication that is
    required for the parallel training of a neural network. The state-of-the-art
    method, Bulk Synchronous Parallel Stochastic Gradient Descent (BSP-SGD),
    requires many collective communication operations, like broadcasts of
    parameters or reductions for sub-gradient aggregations, which for large
    messages quickly dominates overall execution time and limits parallel
    scalability. To address this problem, we develop a new technique for collective
    operations, referred to as Linear Pipelining (LP). It is tuned to the message
    sizes that arise in BSP-SGD, and works effectively on multi-GPU systems.
    Theoretically, the cost of LP is invariant to (P), where (P) is the number of
    GPUs, while the cost of more conventional Minimum Spanning Tree (MST) scales
    like (O(log P)). LP also demonstrates up to 2x higher bandwidth than
    Bidirectional Exchange (BE) techniques that are widely adopted by current MPI
    implementations. We apply these collectives to BSP-SGD, showing that the
    proposed implementations reduce communication bottlenecks in practice while
    preserving the attractive convergence properties of BSP-SGD.
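
    The pipelining idea, splitting a large message into chunks that are forwarded
    hop by hop so total time stays roughly independent of the number of devices,
    can be illustrated with a toy simulation. This is only a sketch of the general
    pattern, not the authors' GPU implementation:

        def pipelined_broadcast(message, num_devices, num_chunks=8):
            # Split the message into chunks and forward them chunk-by-chunk along a
            # chain, so device i can send chunk c onward while receiving chunk c+1.
            step = max(1, len(message) // num_chunks)
            chunks = [message[i:i + step] for i in range(0, len(message), step)]
            received = [list(chunks)] + [[] for _ in range(num_devices - 1)]
            steps = 0
            while len(received[-1]) < len(chunks):
                # Each time step, every device forwards its oldest chunk that the
                # next device has not yet seen (process the tail of the chain first).
                for d in range(num_devices - 2, -1, -1):
                    if len(received[d + 1]) < len(received[d]):
                        received[d + 1].append(received[d][len(received[d + 1])])
                steps += 1
            return steps  # roughly num_chunks + num_devices - 2 steps

        print(pipelined_broadcast(bytes(1 << 20), num_devices=8))  # 14, not 8 * 7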

    A parallel workload has extreme variability

    R. Henwood, N. W. Watkins, S. C. Chapman, R. McLay
    Comments: 7 pages, 2 figures, 1 code listing
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Analysis, Statistics and Probability (physics.data-an); Computation (stat.CO)

    In both high-performance computing (HPC) environments and the public cloud,
    the duration of time to retrieve or save your results is simultaneously
    unpredictable and important to your overall resource budget. It is generally
    accepted (“Google: Taming the Long Latency Tail – When More Machines Equals
    Worse Results”, Todd Hoff, highscalability.com 2012), but without a robust
    explanation, that identical parallel tasks take different durations to
    complete, a phenomenon known as variability. This paper advances understanding
    of this topic. We carefully choose a model from which system-level complexity
    emerges and can be studied directly. We find that a generalized extreme value
    (GEV) model for variability naturally emerges. Using the public cloud, we find
    that real-world observations are in excellent agreement with our model. Since
    the GEV distribution is a limit distribution, this suggests a universal
    property of parallel systems gated by the slowest communication element of some
    sort.
    Hence, this model is applicable to a variety of processing and IO tasks in
    parallel environments. These findings have important implications, ranging from
    characterizing ideal performance for parallel codes to detecting degraded
    behaviour at extreme scales.
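
    Fitting a GEV distribution to measured task durations is straightforward with
    standard tooling; the sketch below uses synthetic data as a stand-in for real
    measurements and is not tied to the paper's code:

        import numpy as np
        from scipy.stats import genextreme

        # Synthetic stand-in for measured per-task completion times (seconds).
        rng = np.random.default_rng(0)
        durations = rng.gumbel(loc=10.0, scale=2.0, size=1000)  # Gumbel is a GEV special case

        # Fit the three GEV parameters (shape, location, scale) by maximum likelihood.
        shape, loc, scale = genextreme.fit(durations)
        print("shape=%.3f loc=%.3f scale=%.3f" % (shape, loc, scale))

        # For example, the 99th-percentile duration implied by the fit:
        print("p99 duration:", genextreme.ppf(0.99, shape, loc=loc, scale=scale))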

    Timestamps for Partial Replication

    Zhuolun Xiang, Nitin H. Vaidya
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Maintaining causal consistency in distributed shared memory systems using
    vector timestamps has received a lot of attention from both theoretical and
    practical perspectives. However, most of the previous literature focuses on
    full replication, where each data item is stored in all replicas, which may not
    be scalable due to the increasing amount of data. In this report, we
    investigate how to achieve causal consistency in partially replicated systems,
    where each replica may store a different set of data. We propose an algorithm
    that tracks causal dependencies via vector timestamps in the client-server
    model for partial replication. The cost of our algorithm in terms of timestamp
    size varies as a function of the manner in which the replicas share data and
    the set of replicas accessed by each client. We also establish a connection
    between our algorithm and previous work on full replication.
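
    For reference, the basic vector-timestamp machinery under full replication
    (which the report's algorithm refines for partial replication) can be sketched
    as follows; this is a generic illustration, not the proposed algorithm:

        def merge(ts_a, ts_b):
            # Component-wise maximum: the combined causal history of two timestamps.
            return [max(a, b) for a, b in zip(ts_a, ts_b)]

        def happened_before(ts_a, ts_b):
            # ts_a causally precedes ts_b if it is component-wise <= and differs
            # in at least one component.
            return all(a <= b for a, b in zip(ts_a, ts_b)) and ts_a != ts_b

        # Hypothetical usage with three replicas.
        write_x = [1, 0, 0]                   # an update accepted at replica 0
        write_y = merge(write_x, [1, 1, 0])   # a later update that observed write_x
        print(happened_before(write_x, write_y))  # True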

    Maintaining Acyclicity of Concurrent Graphs

    Sathya Peri, Muktikanta Sa, Nandini Singhal
    Comments: 23 pages 7 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    In this paper, we consider the problem of preserving acyclicity in a directed
    graph (for shared memory architecture) that is concurrently being updated by
    threads adding/deleting vertices and edges. To the best of our knowledge, no
    previous paper has presented a concurrent graph data structure. We implement
    the concurrent directed graph data-structure as a concurrent adjacency list
    representation. We extend the lazy list implementation of concurrent linked
    lists for maintaining concurrent adjacency lists. There exist a number of
    graph applications which require the acyclic invariant in a directed graph. One
    such example is the Serialization Graph Testing Algorithm used in databases and
    transactional memory. We present two concurrent algorithms for maintaining
    acyclicity in a concurrent graph: (i) one based on obstruction-free snapshots,
    and (ii) one using wait-free reachability. We compare the performance of these
    algorithms against the coarse-grained locking strategy, a commonly used
    technique for allowing concurrent updates. We present the speedup obtained by
    these algorithms over sequential execution. As a future direction, we plan to
    extend this data structure to other progress conditions.

    On Location Hiding in Distributed Systems

    Karol Gotfryd, Marek Klonowski, Dominik Pająk
    Comments: Submitted to 10th International Conference on Algorithms and Complexity CIAC 2017
    Subjects: Multiagent Systems (cs.MA); Distributed, Parallel, and Cluster Computing (cs.DC)

    We consider the following problem: a group of mobile agents performs some
    task on a terrain modeled as a graph. At a given moment of time an adversary
    gets access to the graph and the positions of the agents. Shortly before the
    adversary’s observation the mobile agents have a chance to relocate themselves
    in order to hide their initial configuration. We assume that the initial
    configuration may possibly reveal to the adversary some information about the
    task they performed. Clearly the agents have to change their locations in as
    short a time as possible using minimal energy. In our paper we introduce a
    definition of a well hiding algorithm, in which the starting and final
    configurations of the agents have small mutual information. Then we discuss
    the influence of various features of the model on the running time of the
    optimal well-hiding algorithm. We show that if the topology of the graph is
    known to the agents, then a number of steps proportional to the diameter of
    the graph is both sufficient and necessary. In the unknown topology scenario
    we only consider the single agent case. We first show that the task is
    impossible in the deterministic case if the agent has no memory. Then we
    present a polynomial randomized algorithm. Finally, in the model with memory,
    we show that a number of steps proportional to the number of edges of the
    graph is both sufficient and necessary. In some sense we investigate how
    complex the problem of “losing” information about location (both physical and
    logical) is in different settings.


    Learning

    How to scale distributed deep learning?

    Peter H. Jin, Qiaochu Yuan, Forrest Iandola, Kurt Keutzer
    Comments: Extended version of paper accepted at ML Sys 2016 (at NIPS 2016)
    Subjects: Learning (cs.LG)

    Training time on large datasets for deep neural networks is the principal
    workflow bottleneck in a number of important applications of deep learning,
    such as object classification and detection in automatic driver assistance
    systems (ADAS). To minimize training time, the training of a deep neural
    network must be scaled beyond a single machine to as many machines as possible
    by distributing the optimization method used for training. While a number of
    approaches have been proposed for distributed stochastic gradient descent
    (SGD), at the current time synchronous approaches to distributed SGD appear to
    be showing the greatest performance at large scale. Synchronous scaling of SGD
    suffers from the need to synchronize all processors on each gradient step and
    is not resilient in the face of failing or lagging processors. In asynchronous
    approaches using parameter servers, training is slowed by contention to the
    parameter server. In this paper we compare the convergence of synchronous and
    asynchronous SGD for training a modern ResNet network architecture on the
    ImageNet classification problem. We also propose an asynchronous method,
    gossiping SGD, that aims to retain the positive features of both systems by
    replacing the all-reduce collective operation of synchronous training with a
    gossip aggregation algorithm. We find, perhaps counterintuitively, that
    asynchronous SGD, including both elastic averaging and gossiping, converges
    faster at fewer nodes (up to about 32 nodes), whereas synchronous SGD scales
    better to more nodes (up to about 100 nodes).
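
    The gossip aggregation pattern, replacing a global all-reduce with pairwise
    averaging between randomly chosen workers, can be sketched as below. This is
    an illustration of the general idea only, not the paper's exact update rule:

        import numpy as np

        def gossip_step(params, rng):
            # params: list of per-worker parameter vectors. Each worker pairs with
            # a random peer and the two average their parameters, instead of all
            # workers synchronizing through a single all-reduce.
            n = len(params)
            for i in rng.permutation(n):
                j = rng.integers(n)
                if i != j:
                    avg = 0.5 * (params[i] + params[j])
                    params[i], params[j] = avg, avg.copy()
            return params

        rng = np.random.default_rng(0)
        workers = [rng.normal(size=4) for _ in range(8)]
        workers = gossip_step(workers, rng)  # interleave with local SGD steps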

    Earliness-Aware Deep Convolutional Networks for Early Time Series Classification

    Wenlin Wang, Changyou Chen, Wenqi Wang, Piyush Rai, Lawrence Carin
    Subjects: Learning (cs.LG)

    We present Earliness-Aware Deep Convolutional Networks (EA-ConvNets), an
    end-to-end deep learning framework, for early classification of time series
    data. Unlike most existing methods for early classification of time series
    data, which are designed to solve this problem under the assumption of the
    availability of a good set of pre-defined (often hand-crafted) features, our
    framework can jointly perform feature learning (by learning a deep hierarchy of
    shapelets capturing the salient characteristics in each time series),
    along with a dynamic truncation model to help our deep feature learning
    architecture focus on the early parts of each time series. Consequently, our
    framework is able to make highly reliable early predictions, outperforming
    various state-of-the-art methods for early time series classification, while
    also being competitive when compared to the state-of-the-art time series
    classification algorithms that work with fully observed time series
    data. To the best of our knowledge, the proposed framework is the first to
    perform data-driven (deep) feature learning in the context of early
    classification of time series data. We perform a comprehensive set of
    experiments, on several benchmark data sets, which demonstrate that our method
    yields significantly better predictions than various state-of-the-art methods
    designed for early time series classification. In addition to obtaining high
    accuracies, our experiments also show that the learned deep shapelets based
    features are also highly interpretable and can help gain better understanding
    of the underlying characteristics of time series data.

    Normalizing the Normalizers: Comparing and Extending Network Normalization Schemes

    Mengye Ren, Renjie Liao, Raquel Urtasun, Fabian H. Sinz, Richard S. Zemel
    Subjects: Learning (cs.LG)

    Normalization techniques have only recently begun to be exploited in
    supervised learning tasks. Batch normalization exploits mini-batch statistics
    to normalize the activations. This was shown to speed up training and result in
    better models. However, its success has been very limited when dealing with
    recurrent neural networks. On the other hand, layer normalization normalizes
    the activations across all activities within a layer. This was shown to work
    well in the recurrent setting. In this paper we propose a unified view of
    normalization techniques, as forms of divisive normalization, which includes
    layer and batch normalization as special cases. Our second contribution is the
    finding that a small modification to these normalization schemes, in
    conjunction with a sparse regularizer on the activations, leads to significant
    benefits over standard normalization techniques. We demonstrate the
    effectiveness of our unified divisive normalization framework in the context of
    convolutional neural nets and recurrent neural networks, showing improvements
    over baselines in image classification, language modeling as well as
    super-resolution.
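
    As a rough illustration of the unifying view (not the paper's exact
    formulation), divisive normalization over a chosen summation window reduces to
    layer-normalization-like behaviour when the window spans the features of one
    example and to batch-normalization-like behaviour when it spans the batch:

        import numpy as np

        def divisive_norm(x, axis, eps=1e-5):
            # Normalize activations by the mean and spread computed over the
            # chosen summation window (selected via `axis`).
            mean = x.mean(axis=axis, keepdims=True)
            var = x.var(axis=axis, keepdims=True)
            return (x - mean) / np.sqrt(var + eps)

        x = np.random.randn(32, 100)           # (batch, features)
        layer_like = divisive_norm(x, axis=1)  # per example, across features
        batch_like = divisive_norm(x, axis=0)  # per feature, across the batch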

    On the Quantitative Analysis of Decoder-Based Generative Models

    Yuhuai Wu, Yuri Burda, Ruslan Salakhutdinov, Roger Grosse
    Subjects: Learning (cs.LG)

    The past several years have seen remarkable progress in generative models
    which produce convincing samples of images and other modalities. A shared
    component of many powerful generative models is a decoder network, a parametric
    deep neural net that defines a generative distribution. Examples include
    variational autoencoders, generative adversarial networks, and generative
    moment matching networks. Unfortunately, it can be difficult to quantify the
    performance of these models because of the intractability of log-likelihood
    estimation, and inspecting samples can be misleading. We propose to use
    Annealed Importance Sampling for evaluating log-likelihoods for decoder-based
    models and validate its accuracy using bidirectional Monte Carlo. Using this
    technique, we analyze the performance of decoder-based models, the
    effectiveness of existing log-likelihood estimators, the degree of overfitting,
    and the degree to which these models miss important modes of the data
    distribution.

    Identity Matters in Deep Learning

    Moritz Hardt, Tengyu Ma
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    An emerging design principle in deep learning is that each layer of a deep
    artificial neural network should be able to easily express the identity
    transformation. This idea not only motivated various normalization techniques,
    such as batch normalization, but was also key to the immense success of
    residual networks.

    In this work, we put the principle of identity parameterization on a
    more solid theoretical footing alongside further empirical progress. We first
    give a strikingly simple proof that arbitrarily deep linear residual networks
    have no spurious local optima. The same result for linear feed-forward networks
    in their standard parameterization is substantially more delicate. Second, we
    show that residual networks with ReLU activations have universal finite-sample
    expressivity in the sense that the network can represent any function of its
    sample provided that the model has more parameters than the sample size.

    Directly inspired by our theory, we experiment with a radically simple
    residual architecture consisting of only residual convolutional layers and ReLU
    activations, but no batch normalization, dropout, or max pooling. Our model
    improves significantly on previous all-convolutional networks on the CIFAR10,
    CIFAR100, and ImageNet classification benchmarks.
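
    A residual block stripped down to convolutions and ReLU, with no batch
    normalization, dropout, or pooling, is easy to write down. The sketch below is
    a generic PyTorch illustration with placeholder channel counts, not the
    paper's exact architecture:

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class PlainResidualBlock(nn.Module):
            # Two 3x3 convolutions plus the identity shortcut; nothing else.
            def __init__(self, channels):
                super().__init__()
                self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
                self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

            def forward(self, x):
                out = self.conv2(F.relu(self.conv1(x)))
                return F.relu(x + out)

        block = PlainResidualBlock(16)
        y = block(torch.randn(1, 16, 32, 32))  # same shape in, same shape out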

    Learning Sparse, Distributed Representations using the Hebbian Principle

    Aseem Wadhwa, Upamanyu Madhow
    Subjects: Learning (cs.LG)

    The “fire together, wire together” Hebbian model is a central principle for
    learning in neuroscience, but surprisingly, it has found limited applicability
    in modern machine learning. In this paper, we take a first step towards
    bridging this gap, by developing flavors of competitive Hebbian learning which
    produce sparse, distributed neural codes using online adaptation with minimal
    tuning. We propose an unsupervised algorithm, termed Adaptive Hebbian Learning
    (AHL). We illustrate the distributed nature of the learned representations via
    output entropy computations for synthetic data, and demonstrate superior
    performance, compared to standard alternatives such as autoencoders, in
    training a deep convolutional net on standard image datasets.
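
    A bare-bones competitive Hebbian update ("fire together, wire together" with a
    winner-take-all nonlinearity) looks roughly like the sketch below; this is a
    generic illustration, not the AHL algorithm itself, and all constants are
    placeholders:

        import numpy as np

        def competitive_hebbian_step(W, x, lr=0.01, k=5):
            # W: (num_neurons, input_dim) weights; x: a single input vector.
            # Only the k most active neurons fire, and only their weights move
            # toward the input -- units that fire together wire together.
            activations = W @ x
            winners = np.argsort(activations)[-k:]
            W[winners] += lr * (x - W[winners])
            W[winners] /= np.linalg.norm(W[winners], axis=1, keepdims=True)  # keep weights bounded
            return W

        rng = np.random.default_rng(0)
        W = competitive_hebbian_step(rng.normal(size=(64, 256)), rng.normal(size=256))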

    (CAD)^2RL: Real Single-Image Flight without a Single Real Image

    Fereshteh Sadeghi, Sergey Levine
    Comments: 11 pages
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

    We propose (CAD)^2RL, a flight controller for Collision Avoidance via Deep
    Reinforcement Learning that can be used to perform collision-free flight in the
    real world although it is trained entirely in a 3D CAD model simulator. Our
    method uses only single RGB images from a monocular camera mounted on the robot
    as the input and is specialized for indoor hallway following and obstacle
    avoidance. In contrast to most indoor navigation techniques that aim to
    directly reconstruct the 3D geometry of the environment, our approach directly
    predicts the probability of collision given the current monocular image and a
    candidate action. To obtain accurate predictions, we develop a deep
    reinforcement learning algorithm for learning indoor navigation, which uses the
    actual performance of the current policy to construct accurate supervision. The
    collision prediction model is represented by a deep convolutional neural
    network that directly processes raw image inputs. Our collision avoidance
    system is entirely trained in simulation and thus addresses the high sample
    complexity of deep reinforcement learning and avoids the dangers of
    trial-and-error learning in the real world. By highly randomizing the rendering
    settings for our simulated training set, we show that we can train a collision
    predictor that generalizes to new environments with substantially different
    appearance from the training scenarios. Finally, we evaluate our method in the
    real world by controlling a real quadrotor flying through real hallways. We
    demonstrate that our method can perform real-world collision avoidance and
    hallway following after being trained exclusively on synthetic images, without
    ever having seen a single real image at training time. For supplementary
    video see: this https URL

    Realistic risk-mitigating recommendations via inverse classification

    Michael T. Lash, W. Nick Street
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Inverse classification, the process of making meaningful perturbations to a
    test point such that it is more likely to have a desired classification, has
    previously been addressed using data from a single static point in time. Such
    an approach yields inflated probability estimates, stemming from an implicitly
    made assumption that recommendations are implemented instantaneously. We
    propose using longitudinal data to alleviate such issues in two ways. First, we
    use past outcome probabilities as features in the present. Use of such past
    probabilities ties historical behavior to the present, allowing for more
    information to be taken into account when making initial probability estimates
    and subsequently performing inverse classification. Secondly, following inverse
    classification application, optimized instances’ unchangeable features
    (e.g.,~age) are updated using values from the next longitudinal time period.
    Optimized test instance probabilities are then reassessed. Updating the
    unchangeable features in this manner reflects the notion that improvements in
    outcome likelihood, which result from following the inverse classification
    recommendations, do not materialize instantaneously. As our experiments
    demonstrate, more realistic estimates of probability can be obtained by
    factoring in such considerations.

    Batched Gaussian Process Bandit Optimization via Determinantal Point Processes

    Tarun Kathuria, Amit Deshpande, Pushmeet Kohli
    Comments: To appear at NIPS 2016
    Subjects: Learning (cs.LG)

    Gaussian Process bandit optimization has emerged as a powerful tool for
    optimizing noisy black box functions. One example in machine learning is
    hyper-parameter optimization where each evaluation of the target function
    requires training a model which may involve days or even weeks of computation.
    Most methods for this so-called “Bayesian optimization” only allow sequential
    exploration of the parameter space. However, it is often desirable to propose
    batches or sets of parameter values to explore simultaneously, especially when
    there are large parallel processing facilities at our disposal. Batch methods
    require modeling the interaction between the different evaluations in the
    batch, which can be expensive in complex scenarios. In this paper, we propose a
    new approach for parallelizing Bayesian optimization by modeling the diversity
    of a batch via Determinantal point processes (DPPs) whose kernels are learned
    automatically. This allows us to generalize a previous result as well as prove
    better regret bounds based on DPP sampling. Our experiments on a variety of
    synthetic and real-world robotics and hyper-parameter optimization tasks
    indicate that our DPP-based methods, especially those based on DPP sampling,
    outperform state-of-the-art methods.
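
    The role DPPs play here, trading off the quality of batch points against their
    diversity through a kernel determinant, can be approximated naively by greedy
    log-determinant maximization. The sketch below is only meant to convey that
    intuition and is not the authors' DPP sampling procedure:

        import numpy as np

        def greedy_diverse_batch(K, batch_size):
            # K: (n, n) positive-definite kernel over candidate points. Greedily
            # add the candidate that most increases log det of the selected
            # submatrix, which favours spread-out (diverse) batches.
            selected = []
            for _ in range(batch_size):
                def logdet_if_added(i):
                    idx = selected + [i]
                    return np.linalg.slogdet(K[np.ix_(idx, idx)])[1]
                best = max((i for i in range(len(K)) if i not in selected),
                           key=logdet_if_added)
                selected.append(best)
            return selected

        rng = np.random.default_rng(0)
        X = rng.normal(size=(20, 3))
        K = np.exp(-np.square(X[:, None] - X[None, :]).sum(-1)) + 1e-6 * np.eye(20)
        print(greedy_diverse_batch(K, 4))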

    Prognostics of Surgical Site Infections using Dynamic Health Data

    Chuyang Ke, Yan Jin, Heather Evans, Bill Lober, Xiaoning Qian, Ji Liu, Shuai Huang
    Comments: 23 pages, 8 figures
    Subjects: Learning (cs.LG)

    Surgical Site Infection (SSI) is a national priority in healthcare research.
    Much research attention has been attracted to develop better SSI risk
    prediction models. However, most of the existing SSI risk prediction models are
    built on static risk factors such as comorbidities and operative factors. In
    this paper, we investigate the use of the dynamic wound data for SSI risk
    prediction. There have been emerging mobile health (mHealth) tools that can
    closely monitor the patients and generate continuous measurements of many
    wound-related variables and other evolving clinical variables. Since existing
    prediction models of SSI have quite limited capacity to utilize the evolving
    clinical data, we develop the corresponding solution to equip these mHealth
    tools with decision-making capabilities for SSI prediction with a seamless
    assembly of several machine learning models to tackle the analytic challenges
    arising from the spatial-temporal data. The basic idea is to exploit the
    low-rank property of the spatial-temporal data via the bilinear formulation,
    and further enhance it with automatic missing data imputation by the matrix
    completion technique. We derive efficient optimization algorithms to implement
    these models and demonstrate the superior performances of our new predictive
    model on a real-world dataset of SSI, compared to a range of state-of-the-art
    methods.

    Dual Teaching: A Practical Semi-supervised Wrapper Method

    Fuqaing Liu, Chenwei Deng, Fukun Bi, Yiding Yang
    Comments: 7 pages, 4 figures, 1 table
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Semi-supervised wrapper methods are concerned with building effective
    supervised classifiers from partially labeled data. Though previous works have
    succeeded in some fields, it is still difficult to apply semi-supervised
    wrapper methods to practice because the assumptions those methods rely on tend
    to be unrealistic in practice. For practical use, this paper proposes a novel
    semi-supervised wrapper method, Dual Teaching, whose assumptions are easy to
    set up. Dual Teaching adopts two external classifiers to estimate the false
    positives and false negatives of the base learner. Provided that the recall of
    every external classifier is greater than zero and the sum of their precisions
    is greater than one, Dual Teaching will train a base learner from partially
    labeled data as effectively as a classifier trained on fully labeled data. The
    effectiveness of Dual Teaching is proved in both theory and practice.

    Anomaly Detection in Bitcoin Network Using Unsupervised Learning Methods

    Thai Pham, Steven Lee
    Subjects: Learning (cs.LG); Cryptography and Security (cs.CR)

    The problem of anomaly detection has been studied for a long time. In short,
    anomalies are abnormal or unlikely things. In financial networks, thieves and
    illegal activities are often anomalous in nature. Members of a network want to
    detect anomalies as soon as possible to prevent them from harming the network’s
    community and integrity. Many Machine Learning techniques have been proposed to
    deal with this problem; some results appear to be quite promising but there is
    no obvious superior method. In this paper, we consider anomaly detection
    particular to the Bitcoin transaction network. Our goal is to detect which
    users and transactions are the most suspicious; in this case, anomalous
    behavior is a proxy for suspicious behavior. To this end, we use three
    unsupervised learning methods including k-means clustering, Mahalanobis
    distance, and Unsupervised Support Vector Machine (SVM) on two graphs generated
    by the Bitcoin transaction network: one graph has users as nodes, and the other
    has transactions as nodes.
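
    The three unsupervised detectors named above are all standard; a minimal
    sketch on made-up node feature vectors (the features and data below are purely
    illustrative, not the Bitcoin graphs used in the paper) might look like:

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.svm import OneClassSVM

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 4))  # e.g. per-user features: degree, volume, ...

        # 1) k-means: anomaly score = distance to the nearest cluster centroid.
        km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)
        kmeans_score = km.transform(X).min(axis=1)

        # 2) Mahalanobis distance from the bulk of the data.
        mu, cov_inv = X.mean(axis=0), np.linalg.inv(np.cov(X, rowvar=False))
        diff = X - mu
        maha_score = np.sqrt(np.einsum('ij,jk,ik->i', diff, cov_inv, diff))

        # 3) One-class SVM: lower decision values mean more anomalous.
        ocsvm = OneClassSVM(nu=0.05, kernel='rbf').fit(X)
        svm_score = -ocsvm.decision_function(X)

        # Rank by each score; the top of each ranking is the most suspicious.
        print(np.argsort(-kmeans_score)[:5])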

    Personalized Donor-Recipient Matching for Organ Transplantation

    Jinsung Yoon, Ahmed M. Alaa, Martin Cadeiras, Mihaela van der Schaar
    Subjects: Learning (cs.LG)

    Organ transplants can improve the life expectancy and quality of life for the
    recipient but carry the risk of serious post-operative complications, such as
    septic shock and organ rejection. The probability of a successful transplant
    depends in a very subtle fashion on compatibility between the donor and the
    recipient but current medical practice is short of domain knowledge regarding
    the complex nature of recipient-donor compatibility. Hence a data-driven
    approach for learning compatibility has the potential for significant
    improvements in match quality. This paper proposes a novel system
    (ConfidentMatch) that is trained using data from electronic health records.
    ConfidentMatch predicts the success of an organ transplant (in terms of the 3
    year survival rates) on the basis of clinical and demographic traits of the
    donor and recipient. ConfidentMatch captures the heterogeneity of the donor and
    recipient traits by optimally dividing the feature space into clusters and
    constructing a different optimal predictive model for each cluster. The system
    controls the complexity of the learned predictive model in a way that allows
    for assuring more granular and confident predictions for a larger number of
    potential recipient-donor pairs, thereby ensuring that predictions are
    “personalized” and tailored to individual characteristics to the finest
    possible granularity. Experiments conducted on the UNOS heart transplant
    dataset show the superiority of the prognostic value of ConfidentMatch to other
    competing benchmarks; ConfidentMatch can provide predictions of success with
    95% confidence for 5,489 patients of a total population of 9,620 patients,
    which corresponds to 410 more patients than the most competitive benchmark
    algorithm (DeepBoost).

    Low Latency Anomaly Detection and Bayesian Network Prediction of Anomaly Likelihood

    Derek Farren, Thai Pham, Marco Alban-Hidalgo
    Subjects: Learning (cs.LG)

    We develop a supervised machine learning model that detects anomalies in
    systems in real time. Our model processes unbounded streams of data into time
    series which then form the basis of a low-latency anomaly detection model.
    Moreover, we extend our preliminary goal of just anomaly detection to
    simultaneous anomaly prediction. We approach this very challenging problem by
    developing a Bayesian Network framework that captures the information about the
    parameters of the lagged regressors calibrated in the first part of our
    approach and use this structure to learn local conditional probability
    distributions.
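
    The abstract does not spell out the model, so the following is only a generic
    sketch of the first stage it describes: turn a stream into a time series, fit a
    lagged (autoregressive) regressor, and flag points whose residual is unusually
    large. The lag count and threshold are arbitrary choices for illustration.

        import numpy as np
        from sklearn.linear_model import LinearRegression

        def lagged_matrix(series, n_lags):
            """Stack rows [x_{t-1}, ..., x_{t-n_lags}] and the targets x_t."""
            X = np.column_stack([series[n_lags - k - 1 : len(series) - k - 1]
                                 for k in range(n_lags)])
            return X, series[n_lags:]

        rng = np.random.default_rng(1)
        series = np.sin(np.linspace(0, 60, 600)) + 0.1 * rng.normal(size=600)
        series[400] += 3.0                      # inject one anomaly

        X, y = lagged_matrix(series, n_lags=5)
        model = LinearRegression().fit(X, y)
        residuals = y - model.predict(X)

        # flag points whose residual exceeds 4 robust standard deviations
        sigma = 1.4826 * np.median(np.abs(residuals - np.median(residuals)))
        anomalies = np.where(np.abs(residuals) > 4 * sigma)[0] + 5
        print(anomalies)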

    Unsupervised Learning For Effective User Engagement on Social Media

    Thai Pham, Camelia Simoiu
    Subjects: Learning (cs.LG)

    In this paper, we investigate the effectiveness of unsupervised feature
    learning techniques in predicting user engagement on social media.
    Specifically, we compare two methods to predict the number of feedbacks (i.e.,
    comments) that a blog post is likely to receive. We compare Principal Component
    Analysis (PCA) and sparse Autoencoder to a baseline method where the data are
    only centered and scaled, on each of two models: Linear Regression and
    Regression Tree. We find that unsupervised learning techniques significantly
    improve the prediction accuracy on both models. For the Linear Regression
    model, sparse Autoencoder achieves the best result, with an improvement in the
    root mean squared error (RMSE) on the test set of 42% over the baseline method.
    For the Regression Tree model, PCA achieves the best result, with an
    improvement in RMSE of 15% over the baseline.
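
    A minimal sketch of the comparison described above, using scikit-learn with
    synthetic data in place of the blog-feedback dataset; only the PCA-plus-linear-
    regression arm is shown, and the sparse autoencoder arm and the exact
    preprocessing of the paper are not reproduced.

        import numpy as np
        from sklearn.datasets import make_regression
        from sklearn.decomposition import PCA
        from sklearn.linear_model import LinearRegression
        from sklearn.metrics import mean_squared_error
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        # synthetic stand-in for blog-post features and feedback counts
        X, y = make_regression(n_samples=2000, n_features=60, n_informative=10,
                               noise=20.0, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        baseline = make_pipeline(StandardScaler(), LinearRegression())
        pca_model = make_pipeline(StandardScaler(), PCA(n_components=15),
                                  LinearRegression())

        for name, model in [("centered/scaled baseline", baseline),
                            ("PCA features", pca_model)]:
            model.fit(X_tr, y_tr)
            rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
            print(f"{name}: RMSE = {rmse:.2f}")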

    Learning the best algorithm for max-cut, clustering, and other partitioning problems

    Maria-Florina Balcan, Vaishnavh Nagarajan, Ellen Vitercik, Colin White
    Subjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Many data analysis problems are NP-hard, a reality that has motivated
    researchers to develop a wealth of approximation algorithms and heuristics over
    the past few decades. Max-cut, clustering, and many other widely applicable
    partitioning problems fall into this camp. We focus on two common classes of
    algorithms that are used for clustering and partitioning problems, including
    SDP rounding algorithms and agglomerative clustering algorithms with dynamic
    programming. The best algorithm to use typically depends on the specific
    application or distribution over instances. A worst-case analysis is often used
    to compare algorithms, but worst-case instances may not be present in the
    particular problem domain, and thus may be misleading when determining which
    algorithm to apply. Therefore, it is necessary to develop optimization methods
    which return the algorithms and parameters best suited for typical inputs from
    the application at hand. We address this problem for integer quadratic
    programming and clustering within a learning-theoretic framework where the
    learning algorithm is given samples from an application-specific distribution
    over problem instances and learns an algorithm which performs well over the
    distribution. We provide sample complexity guarantees and computationally
    efficient algorithms which optimize an algorithm family’s parameters to best
    suit typical inputs from a particular application. We analyze these algorithms
    using the learning-theoretic notion of pseudo-dimension. Along with upper
    bounds, we provide the first pseudo-dimension lower bounds in this line of
    work, which require an involved analysis of each algorithm family’s performance
    on carefully constructed problem instances. Our bounds are tight and therefore
    nail down the intrinsic complexity of the algorithm classes we analyze, which
    are significantly more complex than classes commonly used in learning theory.

    Benchmarking Quantum Hardware for Training of Fully Visible Boltzmann Machines

    Dmytro Korenkevych, Yanbo Xue, Zhengbing Bian, Fabian Chudak, William G. Macready, Jason Rolfe, Evgeny Andriyash
    Comments: 22 pages, 13 figures, D-Wave quantum system for sampling Boltzmann machines
    Subjects: Quantum Physics (quant-ph); Learning (cs.LG); Machine Learning (stat.ML)

    Quantum annealing (QA) is a hardware-based heuristic optimization and
    sampling method applicable to discrete undirected graphical models. While
    similar to simulated annealing, QA relies on quantum, rather than thermal,
    effects to explore complex search spaces. For many classes of problems, QA is
    known to offer computational advantages over simulated annealing. Here we
    report on the ability of recent QA hardware to accelerate training of fully
    visible Boltzmann machines. We characterize the sampling distribution of QA
    hardware, and show that in many cases, the quantum distributions differ
    significantly from classical Boltzmann distributions. In spite of this
    difference, training (which seeks to match data and model statistics) using
    standard classical gradient updates is still effective. We investigate the use
    of QA for seeding Markov chains as an alternative to contrastive divergence
    (CD) and persistent contrastive divergence (PCD). Using (k=50) Gibbs steps, we
    show that for problems with high-energy barriers between modes, QA-based seeds
    can improve upon chains with CD and PCD initializations. For these hard
    problems, QA gradient estimates are more accurate, and allow for faster
    learning. Furthermore, and interestingly, even the case of raw QA samples (that
    is, (k=0)) achieved similar improvements. We argue that this relates to the
    fact that we are training a quantum rather than classical Boltzmann
    distribution in this case. The learned parameters give rise to hardware QA
    distributions closely approximating classical Boltzmann distributions that are
    hard to train with CD/PCD.
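
    The comparison in the abstract hinges on where the Gibbs chains used for the
    negative phase are initialized: the data batch for CD, persistent chains for
    PCD, or hardware samples for QA seeding. The numpy sketch below isolates that
    single moving part for a small fully visible Boltzmann machine over binary
    units; it is a generic illustration under assumed conventions (0/1 units,
    symmetric weights), not the authors' training code, and the QA hardware
    sampling itself is not reproduced, only the seeding interface.

        import numpy as np

        rng = np.random.default_rng(0)
        sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

        def gibbs_sweeps(v, W, b, k, rng):
            """k sweeps of single-site Gibbs sampling for a fully visible BM with
            energy E(v) = -0.5 v^T W v - b^T v (W symmetric, zero diagonal)."""
            v = v.copy()
            for _ in range(k):
                for i in range(v.shape[1]):
                    p = sigmoid(v @ W[:, i] + b[i])
                    v[:, i] = (rng.random(v.shape[0]) < p).astype(float)
            return v

        def gradient_step(W, b, data, seeds, k, lr, rng):
            """One update: positive statistics from data, negative statistics from
            Gibbs chains started at `seeds` (data batch for CD, persistent chains
            for PCD, or externally provided samples for QA seeding)."""
            neg = gibbs_sweeps(seeds, W, b, k, rng)
            dW = data.T @ data / len(data) - neg.T @ neg / len(neg)
            np.fill_diagonal(dW, 0.0)
            db = data.mean(axis=0) - neg.mean(axis=0)
            return W + lr * dW, b + lr * db, neg   # `neg` becomes the next PCD state

        # toy usage: 8 binary units, random "data", CD-style seeding from the data
        n = 8
        W, b = np.zeros((n, n)), np.zeros(n)
        data = (rng.random((64, n)) < 0.5).astype(float)
        for _ in range(10):
            W, b, _ = gradient_step(W, b, data, seeds=data, k=5, lr=0.05, rng=rng)
        print(np.round(W[:3, :3], 3))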

    Deep Learning with Sets and Point Clouds

    Siamak Ravanbakhsh, Jeff Schneider, Barnabas Poczos
    Comments: under review at ICLR
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We study a simple notion of structural invariance that readily suggests a
    parameter-sharing scheme in deep neural networks. In particular, we define
    structure as a collection of relations, and derive graph convolution and
    recurrent neural networks as special cases. We study composition of basic
    structures in defining models that are invariant to more complex “product”
    structures such as graph of graphs, sets of images or sequence of sets. For
    demonstration, our experimental results are focused on the setting where the
    discrete structure of interest is a set. We present results on several novel
    and non-trivial problems on sets, including semi-supervised learning using
    clustering information, point-cloud classification and set outlier detection.

    Generate Models and Model Criticism via Optimized Maximum Mean Discrepancy

    Dougal J. Sutherland, Hsiao-Yu Tung, Heiko Strathmann, Soumyajit De, Aaditya Ramdas, Alex Smola, Arthur Gretton
    Comments: Submitted to ICLR 2017; public comments at this http URL
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Methodology (stat.ME)

    We propose a method to optimize the representation and distinguishability of
    samples from two probability distributions, by maximizing the estimated power
    of a statistical test based on the maximum mean discrepancy (MMD). This
    optimized MMD is applied to the setting of unsupervised learning by generative
    adversarial networks (GAN), in which a model attempts to generate realistic
    samples, and a discriminator attempts to tell these apart from data samples. In
    this context, the MMD may be used in two roles: first, as a discriminator,
    either directly on the samples, or on features of the samples. Second, the MMD
    can be used to evaluate the performance of a generative model, by testing the
    model’s samples against a reference data set. In the latter role, the optimized
    MMD is particularly helpful, as it gives an interpretable indication of how the
    model and data distributions differ, even in cases where individual model
    samples are not easily distinguished either by eye or by classifier.

    Attending to Characters in Neural Sequence Labeling Models

    Marek Rei, Gamal K.O. Crichton, Sampo Pyysalo
    Comments: Proceedings of COLING 2016
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Sequence labeling architectures use word embeddings for capturing similarity,
    but suffer when handling previously unseen or rare words. We investigate
    character-level extensions to such models and propose a novel architecture for
    combining alternative word representations. By using an attention mechanism,
    the model is able to dynamically decide how much information to use from a
    word- or character-level component. We evaluated different architectures on a
    range of sequence labeling datasets, and character-level extensions were found
    to improve performance on every benchmark. In addition, the proposed
    attention-based architecture delivered the best results even with a smaller
    number of trainable parameters.
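
    The exact parameterization of the attention mechanism is not spelled out in the
    abstract; the sketch below shows one standard gating form, in which a learned
    per-dimension gate blends the word-level and character-level vectors. The
    weight shapes and the particular gate equation are assumptions made for the
    illustration.

        import numpy as np

        rng = np.random.default_rng(0)
        d = 8                                   # toy embedding size

        def attention_combine(w, c, W1, W2, W3):
            """Blend a word-level vector w and a character-level vector c with a
            per-dimension gate z in (0,1); z is predicted from both vectors, so the
            model can lean on characters for rare or unseen words."""
            z = 1.0 / (1.0 + np.exp(-(W3 @ np.tanh(W1 @ w + W2 @ c))))
            return z * w + (1.0 - z) * c

        W1, W2, W3 = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))
        word_vec = rng.normal(size=d)           # lookup-table embedding
        char_vec = rng.normal(size=d)           # stand-in for a character-level BiLSTM output
        print(attention_combine(word_vec, char_vec, W1, W2, W3))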

    Riemannian Tensor Completion with Side Information

    Tengfei Zhou, Hui Qian, Zebang Shen, Congfu Xu
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Numerical Analysis (cs.NA)

    Riemannian optimization methods have been shown to be both fast and accurate in
    recovering a large-scale tensor from its incomplete observation. However, in
    almost all recent Riemannian tensor completion methods, only the low-rank
    constraint is considered. Another important factor, side information or features,
    remains largely unexploited within the Riemannian optimization framework. In
    this paper, we explicitly incorporate the side information into a Riemannian
    minimization model. Specifically, a feature-embedded objective is designed to
    substantially reduce the sample complexity. For such a Riemannian optimization,
    a particular metric can be constructed based on the curvature of the objective,
    which leads to a novel Riemannian conjugate gradient descent solver. Numerical
    experiments suggest that our solver is more efficient than the state-of-the-art
    when a highly accurate solution is required.

    Reinforcement Learning of Contextual MDPs using Spectral Methods

    Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    We propose a new reinforcement learning (RL) algorithm for contextual Markov
    decision processes (CMDP) using spectral methods. CMDPs are structured MDPs
    where the dynamics and rewards depend on a smaller number of hidden states or
    contexts. If the mapping between the hidden and observed states is known a
    priori, then standard RL algorithms such as UCRL are guaranteed to attain low
    regret. Is it possible to achieve regret of the same order even when the
    mapping is unknown? We provide an affirmative answer in this paper. We exploit
    spectral methods to learn the mapping from hidden to observed states with
    guaranteed confidence bounds, and incorporate it into the UCRL-based framework
    to obtain order-optimal regret.

    Annealing Gaussian into ReLU: a New Sampling Strategy for Leaky-ReLU RBM

    Chun-Liang Li, Siamak Ravanbakhsh, Barnabas Poczos
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Restricted Boltzmann Machine (RBM) is a bipartite graphical model that is
    used as the building block in energy-based deep generative models. Due to
    numerical stability and quantifiability of the likelihood, RBM is commonly used
    with Bernoulli units. Here, we consider an alternative member of exponential
    family RBM with leaky rectified linear units — called leaky RBM. We first
    study the joint and marginal distributions of leaky RBM under different
    leakiness, which provides us important insights by connecting the leaky RBM
    model and truncated Gaussian distributions. The connection leads us to a simple
    yet efficient method for sampling from this model, where the basic idea is to
    anneal the leakiness rather than the energy; i.e., start from a fully
    Gaussian/linear unit and gradually decrease the leakiness over iterations. This
    serves as an alternative to annealing the temperature parameter and enables
    numerical estimation of the likelihood that is more efficient and more accurate
    than the commonly used annealed importance sampling (AIS). We further
    demonstrate that the proposed sampling algorithm enjoys a faster mixing rate
    than the contrastive divergence algorithm, which benefits training without any
    additional computational cost.


    Information Theory

    Detection Performance with Many Antennas Available for Bandwidth-Efficient Uplink Transmission in MU-MIMO Systems

    Paulo Torres, Antonio Gusmao
    Comments: 8 pages, 7 figures. arXiv admin note: text overlap with arXiv:1502.05597
    Subjects: Information Theory (cs.IT)

    This paper is concerned with SC/FDE for bandwidth-efficient uplink block
    transmission, with QAM schemes, in an MU-MIMO system. The number of BS receiver
    antennas is assumed to be large, but not necessarily much larger than the
    overall number of transmitter antennas jointly using the same time/frequency
    resource at the mobile terminals (MTs).

    In this context, we consider several detection techniques and evaluate, in
    detail, the corresponding detection performances (discussed with the help of
    selected performance bounds), for a range of values regarding the number of
    available BS receiver antennas. From our performance results, we conclude that
    simple linear detection techniques, designed to avoid the need for complex
    matrix inversions, can lead to unacceptably high error floor levels. However,
    by combining the use of such simple linear detectors with an appropriate
    interference cancellation procedure, within an iterative decision-feedback (DF)
    technique, a close approximation to the SIMO MFB performance can be achieved
    after a few iterations, even for 64-QAM schemes, when the number of BS antennas
    is at least five times the number of antennas jointly used at the user
    terminals.

    Matrix Characterization for GFDM Systems: Low-Complexity MMSE Receivers and Optimal Prototype Filters

    Po-Chih Chen, Borching Su, Yenming Huang
    Comments: 16 pages, 10 figures
    Subjects: Information Theory (cs.IT)

    Generalized frequency division multiplexing (GFDM) is considered to be a
    promising waveform candidate for 5G new radio. It features good properties such
    as low out-of-band (OOB) radiation. One major drawback of GFDM known in the
    literature is that a zero-forcing receiver suffers from the noise enhancement
    effect since the GFDM (transmitter) matrix in general has a greater-than-unity
    condition number. In this paper, we propose a new matrix-based characterization
    of GFDM matrices, as opposed to traditional vector-based characterization with
    prototype filters. The characterization is helpful in deriving properties of
    GFDM matrices, including conditions on GFDM matrices being nonsingular and
    unitary, respectively. Further, the condition on the existence of a form of
    low-complexity MMSE receivers is also derived. It is found that such a receiver
    under frequency-selective channels exists if and only if the GFDM transmitter
    matrix is chosen to be unitary. For non-unitary transmitter matrices, a
    low-complexity suboptimal MMSE receiver is proposed with a performance very
    close to that of an MMSE receiver. Besides, optimal prototype filters in terms
    of minimizing receiver mean square error (MSE) are derived and found to
    correspond to the use of unitary GFDM matrices under many scenarios. These
    optimal filters can be applied in GFDM systems without causing the problem of
    noise enhancement, thereby having the same MSE performance as OFDM.
    Furthermore, we find that GFDM matrices whose size is a power of two do exist in
    the class of unitary GFDM matrices. Finally, while the OOB radiation
    performance of systems using a unitary GFDM matrix is not optimal in general,
    it is shown that the OOB radiation can be satisfactorily low if the parameters
    are carefully chosen.
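
    The objects discussed above can be made concrete with a few lines of numpy: the
    sketch below builds a GFDM modulation matrix from circular time shifts and
    subcarrier modulations of a prototype filter, then reports its condition number
    and whether it is unitary. The column ordering and the particular filter are
    conventions chosen for the example, not necessarily those of the paper.

        import numpy as np

        def gfdm_matrix(g, K, M):
            """Build an N x N GFDM modulation matrix (N = K*M) whose columns are
            circular shifts by multiples of K and subcarrier modulations of g."""
            N = K * M
            assert len(g) == N
            n = np.arange(N)
            cols = []
            for k in range(K):                       # subcarrier index
                for m in range(M):                   # subsymbol index
                    cols.append(np.roll(g, m * K) * np.exp(2j * np.pi * k * n / K))
            return np.column_stack(cols)

        K, M = 4, 8
        N = K * M
        # example prototype filter (any length-N filter could be used here)
        g = np.roll(np.hanning(N), N // 2)
        g = g / np.linalg.norm(g)

        A = gfdm_matrix(g, K, M)
        print("condition number:", np.linalg.cond(A))   # > 1 means ZF enhances noise
        print("unitary?", np.allclose(A.conj().T @ A, np.eye(N), atol=1e-8))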

    Leech Constellations of Construction-A Lattices

    Nicola di Pietro, Joseph J. Boutros
    Comments: 30 pages, 5 figures
    Subjects: Information Theory (cs.IT)

    The problem of communicating over the AWGN channel with lattice codes is
    addressed in this paper. Theoretically, Voronoi constellations have proven to
    yield very powerful lattice codes when the fine lattice is AWGN-good and the
    coarse lattice has an optimal shaping gain. However, achieving Shannon capacity
    with these premises and practically implementable encoding algorithms is in
    general not an easy task. In this work, constructions called Leech
    constellations are proposed. They involve Construction-A lattices as fine
    lattices and a direct sum of Leech lattices as the shaping lattice. The combination
    of these two ingredients allows the design of finite lattice constellations of
    dual-diagonal LDA lattices whose numerically measured waterfall region is
    situated at less than 0.8 dB from Shannon capacity. A detailed explanation of
    how to perform encoding and demapping is given. A very appreciable feature of
    Leech constellations of dual-diagonal LDA lattices is that encoding, decoding,
    and demapping all have linear complexity in the block length.

    Interference Cancellation at Receivers in Cache-Enabled Wireless Networks

    Chenchen Yang, Yao Yao, Bin Xia, Kaibin Huang, Weiliang Xie, Yong Zhao
    Subjects: Information Theory (cs.IT)

    In this paper, we propose to exploit the limited cache packets as side
    information to cancel incoming interference at the receiver side. We consider a
    stochastic network where the random locations of base stations and users are
    modeled using Poisson point processes. Caching schemes to reap both the local
    caching gain and the interference cancellation gain for the users are developed
    based on two factors: the density of different user subsets and the packets
    cached in the corresponding subsets. The packet loss rate (PLR) is analyzed,
    which depends on both the cached packets and the channel state information
    (CSI) available at the receiver. Theoretical results reveal the tradeoff
    between the caching resource and the wireless resource. The performance of
    different caching schemes is analyzed, and the minimum achievable PLR for
    distributed caching is derived.

    Counting generalized Reed-Solomon codes

    Peter Beelen, David Glynn, Tom Høholdt, Krishna Kaipa
    Subjects: Information Theory (cs.IT); Discrete Mathematics (cs.DM)

    In this article we count the number of generalized Reed-Solomon (GRS) codes
    of dimension k and length n, including the codes coming from a non-degenerate
    conic plus nucleus. We compare our results with known formulae for the number
    of 3-dimensional MDS codes of length n=6,7,8,9.

    Towards Information Privacy for the Internet of Things

    Meng Sun, Wee Peng Tay, Xin He
    Comments: 13 pages, 4 figures, submitted to IEEE transaction on signal processing
    Subjects: Information Theory (cs.IT)

    In an Internet of Things network, multiple sensors send information to a
    fusion center for it to infer a public hypothesis of interest. However, the
    same sensor information may be used by the fusion center to make inferences of
    a private nature that the sensors wish to protect. To model this, we adopt a
    decentralized hypothesis testing framework with binary public and private
    hypotheses. Each sensor makes a private observation and utilizes a local sensor
    decision rule or privacy mapping to summarize that observation before sending
    it to the fusion center. Without assuming knowledge of the joint distribution
    of the sensor observations and hypotheses, we adopt a nonparametric learning
    approach to design local privacy mappings. We introduce the concept of an
    empirical normalized risk, which provides a theoretical guarantee for the
    network to achieve information privacy for the private hypothesis with high
    probability when the number of training samples is large. We develop iterative
    optimization algorithms to determine an appropriate privacy threshold and the
    best sensor privacy mappings, and show that they converge. Numerical results on
    both synthetic and real data sets suggest that our proposed approach yields low
    error rates for inferring the public hypothesis, but high error rates for
    detecting the private hypothesis.

    An algebraic framework for end-to-end physical-layer network coding

    Elisa Gorla, Alberto Ravagnani
    Subjects: Information Theory (cs.IT); Commutative Algebra (math.AC)

    We propose an algebraic setup for end-to-end physical-layer network coding
    based on submodule transmission. We introduce a distance function between
    modules, describe how it relates to information loss and errors, and show how
    to compute it. Then we propose a definition of submodule error-correcting code,
    and investigate bounds and constructions for such codes.

    BDMA for Millimeter-Wave/Terahertz Massive MIMO Transmission with Per-Beam Synchronization

    Li You, Xiqi Gao, Geoffrey Ye Li, Xiang-Gen Xia, Ni Ma
    Comments: 29 pages, 4 figures
    Subjects: Information Theory (cs.IT)

    We propose beam division multiple access (BDMA) with per-beam synchronization
    (PBS) in time and frequency for wideband massive multiple-input multiple-output
    (MIMO) transmission over millimeter-wave (mmW)/Terahertz (THz) bands. We first
    introduce a physically motivated beam domain channel model for massive MIMO and
    demonstrate that the envelopes of the beam domain channel elements tend to be
    independent of time and frequency when both the numbers of antennas at base
    station and user terminals (UTs) tend to infinity. Motivated by the derived
    beam domain channel properties, we then propose PBS for mmW/THz massive MIMO.
    We show that both the effective delay and Doppler frequency spreads of wideband
    massive MIMO channels with PBS are reduced by a factor of the number of UT
    antennas compared with the conventional synchronization approaches.
    Subsequently, we apply PBS to BDMA, investigate beam scheduling to maximize the
    achievable ergodic rates for both uplink and downlink BDMA, and develop a
    greedy beam scheduling algorithm. Simulation results verify the effectiveness
    of BDMA with PBS for mmW/THz wideband massive MIMO systems in typical mobility
    scenarios.

    Optimal Placement Delivery Arrays

    Minquan Cheng, Jing Jiang, Qifa Yan, Xiaohu Tang, Haitao Cao
    Comments: 11pages, coded caching, placement delivery array
    Subjects: Information Theory (cs.IT)

    In wireless networks, coded caching is an effective technique to reduce
    network congestion during peak traffic times. A ((K,F,Z,S)) placement delivery
    array (((K,F,Z,S)) PDA for short) can be used to design a coded caching scheme
    with delivery rate (S/F) during the peak traffic times. Clearly, in order to
    minimize the delivery rate, we only need to consider a ((K,F,Z,S)) PDA with
    minimum (S). For fixed parameters (K), (F) and (Z), a ((K,F,Z,S)) PDA with the
    minimum (S) is called optimal, so it is meaningful to study optimal PDAs.

    There are some known PDAs constructed from combinatorial design theory,
    hypergraphs and so on. However, there is only one known class of optimal PDAs
    (IEEE Trans. Inf. Theory 60:2856-2867, 2014), constructed by Maddah-Ali and
    Niesen. We focus on constructing optimal PDAs. In this paper, two classes of
    lower bounds on the value of (S) in a ((K,F,Z,S)) PDA are derived first. Then
    two classes of recursive constructions are proposed. Consequently, (i) optimal
    PDAs with (Z=1) and (Z=F-1) for any positive integers (K) and (F) are obtained;
    (ii) several infinite classes of optimal PDAs for some (K), (F) and (Z) are
    constructed. Finally, further existence results for optimal PDAs with (Z=F-2)
    are given.
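
    For readers unfamiliar with placement delivery arrays, the sketch below encodes
    the PDA conditions as they are commonly stated in the coded-caching literature
    (every column has exactly Z stars; two cells holding the same integer lie in
    different rows and columns and the two crossing cells are stars) and verifies a
    small array corresponding to the Maddah-Ali and Niesen scheme for three users.
    The definition is paraphrased here and may differ in detail from the paper.

        import numpy as np
        from itertools import combinations

        STAR = "*"

        def is_pda(P, Z):
            """Check the placement delivery array conditions on an F x K array."""
            P = np.asarray(P, dtype=object)
            if any(sum(1 for x in col if x == STAR) != Z for col in P.T):
                return False
            cells = {}
            F, K = P.shape
            for j in range(F):
                for k in range(K):
                    if P[j, k] != STAR:
                        cells.setdefault(P[j, k], []).append((j, k))
            for positions in cells.values():
                for (j1, k1), (j2, k2) in combinations(positions, 2):
                    if j1 == j2 or k1 == k2:
                        return False
                    if P[j1, k2] != STAR or P[j2, k1] != STAR:
                        return False
            return True

        # a (K,F,Z,S) = (3,3,1,3) array from the Maddah-Ali--Niesen scheme
        P = [[STAR, 1, 2],
             [1, STAR, 3],
             [2, 3, STAR]]
        print(is_pda(P, Z=1))          # True; its delivery rate is S/F = 1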

    Millimeter Wave Beam Alignment: Large Deviations Analysis and Design Insights

    Chunshan Liu, Min Li, Stephen V. Hanly, Iain B. Collings, Philip Whiting
    Comments: 27 pages, 7 figures, submitted
    Subjects: Information Theory (cs.IT)

    In millimeter wave cellular communication, fast and reliable beam alignment
    via beam training is crucial to harvest sufficient beamforming gain for the
    subsequent data transmission. In this paper, we establish fundamental limits in
    beam-alignment performance under both the exhaustive search and the
    hierarchical search that adopts multi-resolution beamforming codebooks,
    accounting for time-domain training overhead. Specifically, we derive lower and
    upper bounds on the probability of misalignment for an arbitrary level in the
    hierarchical search, based on a single-path channel model. Using the method of
    large deviations, we characterize the decay rate functions of both bounds and
    show that the bounds coincide as the training sequence length grows large. With
    these results, we characterize the asymptotic misalignment probability of both
    the hierarchical and exhaustive search, and show that the latter asymptotically
    outperforms the former, subject to the same training overhead and codebook
    resolution. We show via numerical results that this relative performance
    behavior holds in the non-asymptotic regime. Moreover, the exhaustive search is
    shown to achieve significantly higher worst-case spectrum efficiency than the
    hierarchical search, when the pre-beamforming signal-to-noise ratio (SNR) is
    relatively low. This study hence implies that the exhaustive search is more
    effective for users situated not so close to base stations, as they tend to
    have low SNR.

    Fast Polarization and Finite-Length Scaling for Non-Stationary Channels

    Hessam Mahdavifar
    Subjects: Information Theory (cs.IT)

    We consider the problem of polar coding for transmission over a
    non-stationary sequence of independent binary-input memoryless symmetric (BMS)
    channels ({W_i}_{i=1}^{infty}), where the (i)-th encoded bit is transmitted
    over (W_i). We show, for the first time, a polar coding scheme that achieves
    the effective average symmetric capacity
    (overline{I}({W_i}_{i=1}^{infty}) := lim_{N rightarrow infty} frac{1}{N} sum_{i=1}^{N} I(W_i)),
    assuming that the limit exists. The polar coding scheme is constructed using
    Arıkan's channel polarization transformation in combination with certain
    permutations at each polarization level and certain skipped operations. This
    guarantees a fast polarization process that results in polar coding schemes
    with block lengths upper bounded by a polynomial of (1/epsilon), where
    (epsilon) is the gap to the average capacity. More specifically, given an
    arbitrary sequence of BMS channels ({W_i}_{i=1}^{N}) and (P_e), where
    (0 < P_e < 1), we construct a polar code of length (N) and rate (R)
    guaranteeing a block error probability of at most (P_e) for transmission over
    ({W_i}_{i=1}^{N}) such that (N leq frac{kappa}{(overline{I}_N - R)^{mu}}),
    where (mu) is a constant, (kappa) is a constant depending on (P_e) and (mu),
    and (overline{I}_N) is the average of the symmetric capacities (I(W_i)), for
    (i=1,2,dots,N). We further show a numerical upper bound on (mu):
    (mu leq 10.78). The encoding and decoding complexities of the constructed
    polar code preserve the (O(N log N)) complexity of Arıkan's polar codes.

    Self-Calibration via Linear Least Squares

    Shuyang Ling, Thomas Strohmer
    Subjects: Information Theory (cs.IT)

    Whenever we use devices to take measurements, calibration is indispensable.
    While the purpose of calibration is to reduce bias and uncertainty in the
    measurements, it can be quite difficult, expensive and sometimes even
    impossible to implement. We study a challenging problem called
    self-calibration, i.e., the task of designing an algorithm for devices so that
    the algorithm is able to perform calibration automatically. More precisely, we
    consider the setup (boldsymbol{y} = mathcal{A}(boldsymbol{d}) boldsymbol{x}
    + boldsymbol{epsilon}), where only partial information about the sensing
    matrix (mathcal{A}(boldsymbol{d})) is known and where
    (mathcal{A}(boldsymbol{d})) depends linearly on (boldsymbol{d}). The goal is
    to estimate the calibration parameter (boldsymbol{d}) (resolving the uncertainty
    in the sensing process) and the signal/object of interest (boldsymbol{x})
    simultaneously. For three different models of practical relevance we show how
    such a bilinear inverse problem, including blind deconvolution as an important
    example, can be solved via a simple linear least squares approach. As a
    consequence, the proposed algorithms are numerically extremely efficient, thus
    allowing for real-time deployment. Explicit theoretical guarantees and
    stability theory are derived, and the sampling complexity is nearly optimal
    (up to a poly-log factor). Applications in imaging sciences and signal
    processing are discussed and numerical simulations are presented to demonstrate
    the effectiveness and efficiency of our approach.

    Energy-Based Adaptive Multiple Access in LoRaWAN IoT Systems with Energy Harvesting

    Nicolo Michelusi, Marco Levorato
    Comments: Submitted to IEEE ICASSP 2017
    Subjects: Information Theory (cs.IT)

    This paper considers a network of energy harvesting nodes which perform data
    communication to a sink node over a multiple access channel. In order to reduce
    the complexity of network control resulting from adaptation to the energy
    storage level at each node, an optimization framework is proposed where energy
    storage dynamics are replaced by dynamic average power constraints induced by
    the time correlated energy supply, thus enabling lightweight and flexible
    network control. The structure of the throughput-optimal genie-aided
    transmission policy is derived under constraints on the average energy
    consumption. The policy takes the form of the packet transmission probability
    defined on the energy harvesting state and number of active wireless nodes.
    Inspired by the genie-aided policy, a Bayesian estimation approach is proposed
    to address the case where the base station estimates the number of active
    wireless nodes based on the observed network transmission pattern, and
    broadcasts low-overhead control information to the whole network.

    Resource Allocation in Wireless Powered Relay Networks: A Bargaining Game Approach

    Zijie Zheng, Lingyang Song, Dusit Niyato, Zhu Han
    Comments: 14 pages, 7 figures, journal paper
    Subjects: Information Theory (cs.IT); Computer Science and Game Theory (cs.GT)

    Simultaneous information and power transfer in mobile relay networks has
    recently emerged, where the relay can harvest radio frequency (RF) energy
    and then use this energy for data forwarding and system operation. Most
    previous works do not consider that the relay may have its own objectives,
    such as using the harvested energy for its own transmission instead of
    maximizing the transmission of the network. Therefore, in this paper, we propose a Nash
    bargaining approach to balance the information transmission efficiency of
    source-destination pairs and the harvested energy of the relay in a wireless
    powered relay network with multiple source-destination pairs and one relay. We
    analyze and prove that the Nash bargaining problem has several desirable
    properties such as the discreteness and quasi-concavity, when it is decomposed
    into three sub-problems: the energy transmission power optimization, the power
    control for data transmission and the time division between energy transmission
    and data transmission. Based on the theoretical analysis, we propose an
    alternating power control and time division algorithm to find a suboptimal
    solution. Simulation results clearly demonstrate the properties of the
    problem and the convergence of our algorithm.

    WINDOW: Wideband Demodulator for Optical Waveforms

    Omri Lev, Tal Wiener, Deborah Cohen, Yonina C. Eldar
    Subjects: Information Theory (cs.IT)

    Optical communication systems, which operate at very high rates, are often
    limited by the sampling rate bottleneck. The optical wideband regime may exceed
    the front-end bandwidth of analog-to-digital converters (ADCs). Multi-channel sampling
    approaches, such as multicoset or interleaved ADCs, have been proposed to
    sample the wideband signal using several channels. Each channel samples below
    the Nyquist rate such that the overall sampling rate is preserved. However,
    this scheme suffers from two practical limitations that make its implementation
    difficult. First, the inherent anti-aliasing filter of the samplers distorts
    the wideband signal. Second, it requires accurate time shifts on the order of
    the signal’s Nyquist rate, which are challenging to maintain. In this work, we
    propose an alternative multi-channel sampling scheme, the wideband demodulator
    for optical waveforms (WINDOW), based on analog RF demodulation, where each
    channel aliases the spectrum using a periodic mixing function before
    integration and sampling. We show that intentionally using the inherent ADC
    filter to perform integration increases the signal to noise ratio (SNR). We
    demonstrate both theoretically and through numerical experiments that our
    system outperforms multicoset in terms of signal recovery and symbol estimation
    in the presence of both thermal and quantization noise but is slightly less
    robust to timing jitter.

    The Entropy Region is not Closed Under Duality

    Tarik Kaced
    Comments: submitted
    Subjects: Information Theory (cs.IT)

    We import a duality notion from polymatroids to define duality for
    information inequalities. We show that the entropy region for (n ge 5) is not
    closed under duality. Our result answers an open question of Matúš (1992).

    An Elementary Proof of Convex Phase Retrieval in the Natural Parameter Space via the Linear Program PhaseMax

    Paul Hand, Vladislav Voroninski
    Subjects: Information Theory (cs.IT); Optimization and Control (math.OC)

    The phase retrieval problem has garnered significant attention since the
    development of the PhaseLift algorithm, which is a convex program that operates
    in a lifted space of matrices. Because of the substantial computational cost
    due to lifting, many approaches to phase retrieval have been developed,
    including non-convex optimization algorithms which operate in the natural
    parameter space, such as Wirtinger Flow. Very recently, a convex formulation
    called PhaseMax has been discovered, and it has been proven to achieve phase
    retrieval via linear programming in the natural parameter space under optimal
    sample complexity. The current proofs of PhaseMax rely on statistical learning
    theory or geometric probability theory. Here, we present a short and elementary
    proof that PhaseMax exactly recovers real-valued vectors from random
    measurements under optimal sample complexity. Our proof only relies on standard
    probabilistic concentration and covering arguments, yielding a simpler and more
    direct proof than those that require statistical learning theory, geometric
    probability or the highly technical arguments for Wirtinger Flow-like
    approaches.
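
    For real-valued signals, the PhaseMax program described above is a linear
    program: maximize the correlation with an anchor vector subject to the
    magnitude constraints. The sketch below solves it with scipy for synthetic
    Gaussian measurements; the anchor here is simply a noisy copy of the true
    signal (in practice it would come from, e.g., a spectral initializer), and the
    problem sizes are arbitrary.

        import numpy as np
        from scipy.optimize import linprog

        rng = np.random.default_rng(0)
        n, m = 20, 120                              # signal dimension, measurements

        x0 = rng.normal(size=n)                     # ground-truth real signal
        A = rng.normal(size=(m, n))                 # random Gaussian measurement vectors
        b = np.abs(A @ x0)                          # phaseless (magnitude-only) data

        # anchor vector: any estimate positively correlated with x0
        anchor = x0 + rng.normal(scale=0.3, size=n)
        anchor /= np.linalg.norm(anchor)

        # PhaseMax (real case): maximize <anchor, x> subject to |A x| <= b,
        # written as a linear program with constraints A x <= b and -A x <= b.
        res = linprog(c=-anchor,
                      A_ub=np.vstack([A, -A]),
                      b_ub=np.concatenate([b, b]),
                      bounds=[(None, None)] * n,
                      method="highs")
        x_hat = res.x
        err = min(np.linalg.norm(x_hat - x0),
                  np.linalg.norm(x_hat + x0)) / np.linalg.norm(x0)
        print(f"relative error (up to global sign): {err:.2e}")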

    Some conjectures on codes

    Clelia De Felice
    Subjects: Formal Languages and Automata Theory (cs.FL); Information Theory (cs.IT)

    Variable-length codes are the bases of the free submonoids of a free monoid.
    There are some important longstanding open questions about the structure of
    finite maximal codes. In this paper we discuss these conjectures and their
    relations with factorizations of cyclic groups.

    Good Integers and Applications in Coding Theory

    Somphong Jitman
    Comments: 21 pages
    Subjects: Rings and Algebras (math.RA); Information Theory (cs.IT); Number Theory (math.NT)

    A class of good integers was introduced by P. Moree in 1997, together
    with a characterization of the good odd integers. Such integers have been shown
    to have nice number-theoretic properties and wide applications. In this paper, a
    complete characterization of all good integers is given.

    Two subclasses of good integers are introduced, namely, oddly-good and
    evenly-good integers. The characterization and properties of good integers in
    the two subclasses are determined.

    As applications, good integers and oddly-good integers are applied in the
    study of the hulls of abelian codes. The average dimension of the hulls of
    abelian codes is given together with some upper and lower bounds.

    Nuclei and automorphism groups of generalized twisted Gabidulin codes

    Rocco Trombetti, Yue Zhou
    Comments: 20 pages
    Subjects: Combinatorics (math.CO); Information Theory (cs.IT)

    Generalized twisted Gabidulin codes are one of the few known families of
    maximum rank matrix codes over finite fields. As a subset of m by n matrices,
    when m=n, the automorphism group of any generalized twisted Gabidulin code has
    been completely determined by the authors recently. In this paper, we consider
    the same problem for m<n. Under certain conditions on their parameters, we
    determine their middle nuclei and right nuclei, which are important invariants
    with respect to the equivalence for rank metric codes. Furthermore, we also use
    them to derive necessary conditions on the automorphisms of generalized twisted
    Gabidulin codes.

    Bounds and Constructions for (overline{3})-Strongly Separable Codes with Length (3)

    Xuli Zhang, Jing Jiang, Minquan Cheng
    Comments: 17 pages, separable codes, strongly separable codes
    Subjects: Discrete Mathematics (cs.DM); Information Theory (cs.IT)

    As separable code (SC, IEEE Trans Inf Theory 57:4843-4851, 2011) and
    frameproof code (FPC, IEEE Trans Inf Theory 44:1897-1905, 1998) do in
    multimedia fingerprinting, strongly separable code (SSC, Des. Codes and
    Cryptogr.79:303-318, 2016) can be also used to construct anti-collusion codes.
    Furthermore, SSC is better than FPC and SC in the applications for multimedia
    fingerprinting since SSC has lower tracing complexity than that of SC (the same
    complexity as FPC) and weaker structure than that of FPC. In this paper, we
    first derive several upper bounds on the number of codewords of
    (overline{t})-SSC. Then we focus on (overline{3})-SSC with codeword length
    (3), and obtain the following two main results: (1) an equivalence between an
    SSC and an SC, which consequently yields a tighter upper bound ((3q^2/4)) and
    lower bound ((q^{3/2})) on the number of codewords; (2) an improved
    lower bound (Omega(q^{5/3}+q^{4/3}-q)) on the size of a (q)-ary SSC when
    (q=q_1^6) for any prime power (q_1 equiv 1 pmod 6), better than the previously
    known bound (lfloor sqrt{q} rfloor^{3}), which is obtained by means of a
    difference matrix and the known result on the subset of (mathbb{F}^{n}_q)
    containing no three points on a line.

    A linear-time benchmarking tool for generalized surface codes

    Nicolas Delfosse, Pavithran Iyer, David Poulin
    Comments: Software available online: this http URL
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    Quantum information processors need to be protected against errors and
    faults. One of the most widely considered fault-tolerant architectures is based
    on surface codes. While the general principles of these codes are well
    understood and basic code properties such as minimum distance and rate are easy
    to characterize, a code’s average performance depends on the detailed geometric
    layout of the qubits. To date, optimizing a surface code architecture and
    comparing different geometric layouts relies on costly numerical simulations.
    Here, we propose a benchmarking algorithm for simulating the performance of
    surface codes, and generalizations thereof, that runs in linear time. We
    implemented this algorithm in software that generates performance reports and
    makes it possible to quickly compare different architectures.

    Sub-channel and Power Allocation for Non-orthogonal Multiple Access Relay Networks with Amplify-and-Forward Protocol

    Shuhang Zhang, Boya Di, Lingyang Song, Yonghui Li
    Comments: 30 pages, 6 figures, submit to Tran. Wireless Commun
    Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

    In this paper, we study the resource allocation problem for a single-cell
    non-orthogonal multiple access (NOMA) relay network where an OFDM
    amplify-and-forward (AF) relay allocates the spectrum and power resources to
    the source-destination (SD) pairs. We aim to optimize the resource allocation
    to maximize the average sum-rate. The optimal approach requires an exhaustive
    search, leading to an NP-hard problem. To solve this problem, we propose two
    efficient many-to-many two-sided SD pair-subchannel matching algorithms in
    which the SD pairs and sub-channels are considered as two sets of players
    chasing their own interests. The proposed algorithms can provide a sub-optimal
    solution to this resource allocation problem in affordable time. Both the
    static matching algorithm and dynamic matching algorithm converge to a
    pair-wise stable matching after a limited number of iterations. Simulation
    results show that the capacity of both proposed algorithms in the NOMA scheme
    significantly outperforms the conventional orthogonal multiple access scheme.
    The proposed matching algorithms in the NOMA scheme also achieve better
    user fairness than the conventional orthogonal multiple access scheme.

    Vulnerability analysis of coherent quantum cryptography protocols towards to an active beam-splitting attack

    D.A. Kronberg, E.O. Kiktenko, A.K. Fedorov, Y.V. Kurochkin
    Comments: 5 pages, 2 figures; comments are welcome
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    A novel attack on the coherent quantum key distribution (coherent one-way,
    COW) protocol is considered. The key idea of the proposed attack is to perform
    individual measurements on a fraction of the intercepted states and to transfer
    or block the remaining part depending on the measurement result.

    Entropic Causal Inference

    Murat Kocaoglu, Alexandros G. Dimakis, Sriram Vishwanath, Babak Hassibi
    Comments: To appear in AAAI 2017
    Subjects: Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)

    We consider the problem of identifying the causal direction between two
    discrete random variables using observational data. Unlike previous work, we
    keep the most general functional model but make an assumption on the unobserved
    exogenous variable: Inspired by Occam’s razor, we assume that the exogenous
    variable is simple in the true causal direction. We quantify simplicity using
    R’enyi entropy. Our main result is that, under natural assumptions, if the
    exogenous variable has low (H_0) entropy (cardinality) in the true direction,
    it must have high (H_0) entropy in the wrong direction. We establish several
    algorithmic hardness results about estimating the minimum entropy exogenous
    variable. We show that the problem of finding the exogenous variable with
    minimum entropy is equivalent to the problem of finding minimum joint entropy
    given (n) marginal distributions, also known as minimum entropy coupling
    problem. We propose an efficient greedy algorithm for the minimum entropy
    coupling problem, that for (n=2) provably finds a local optimum. This gives a
    greedy algorithm for finding the exogenous variable with minimum (H_1) (Shannon
    Entropy). Our greedy entropy-based causal inference algorithm has similar
    performance to the state of the art additive noise models in real datasets. One
    advantage of our approach is that we make no use of the values of random
    variables but only their distributions. Our method can therefore be used for
    causal inference for both ordinal and also categorical data, unlike additive
    noise models.
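
    The greedy step for the minimum entropy coupling problem can be sketched as
    follows: repeatedly pair the largest remaining probability mass in each
    marginal and allocate their minimum to the corresponding joint cell. This is a
    paraphrase of the algorithm described in the abstract and the details may
    differ from the paper; the marginals below are made up.

        import numpy as np

        def greedy_coupling(p, q, tol=1e-12):
            """Greedily build a joint distribution with marginals p and q by
            repeatedly pairing the largest remaining masses."""
            p, q = np.array(p, dtype=float), np.array(q, dtype=float)
            joint = np.zeros((len(p), len(q)))
            while p.max() > tol and q.max() > tol:
                i, j = int(np.argmax(p)), int(np.argmax(q))
                m = min(p[i], q[j])
                joint[i, j] += m
                p[i] -= m
                q[j] -= m
            return joint

        def entropy_bits(joint):
            probs = joint[joint > 0]
            return float(-(probs * np.log2(probs)).sum())

        p = [0.5, 0.3, 0.2]
        q = [0.6, 0.25, 0.15]
        J = greedy_coupling(p, q)
        print(J)                                   # row/column sums match p and q
        print("joint entropy (bits):", round(entropy_bits(J), 3))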



