
    arXiv Paper Daily: Wed, 31 May 2017

    Published by 我爱机器学习 (52ml.net) on 2017-05-31 00:00:00

    Neural and Evolutionary Computing

    Deep Learning is Robust to Massive Label Noise

    David Rolnick, Andreas Veit, Serge Belongie, Nir Shavit
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Deep neural networks trained on large supervised datasets have led to
    impressive results in recent years. However, since well-annotated datasets can
    be prohibitively expensive and time-consuming to collect, recent work has
    explored the use of larger but noisy datasets that can be more easily obtained.
    In this paper, we investigate the behavior of deep neural networks on training
    sets with massively noisy labels. We show that successful learning is possible
    even with an essentially arbitrary amount of noise. For example, on MNIST we
    find that an accuracy above 90 percent is still attainable even when the
    dataset has been diluted with 100 noisy examples for each clean example. Such
    behavior holds across multiple patterns of label noise, even when noisy labels
    are biased towards confusing classes. Further, we show how the required dataset
    size for successful training increases with higher label noise. Finally, we
    present simple actionable techniques for improving learning in the regime of
    high label noise.
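
    As an illustration of the dilution setup above, here is a minimal sketch (our
    hypothetical reading of the experiment, not the authors' code) that pads each
    clean example with alpha copies carrying uniformly random labels:

    ```python
    import numpy as np

    def dilute_with_label_noise(x, y, alpha=100, num_classes=10, seed=0):
        """Add `alpha` noisy-labelled copies per clean example (uniform noise)."""
        rng = np.random.default_rng(seed)
        noisy_x = np.repeat(x, alpha, axis=0)                      # reuse clean inputs
        noisy_y = rng.integers(0, num_classes, size=len(noisy_x))  # random labels
        x_all = np.concatenate([x, noisy_x])
        y_all = np.concatenate([y, noisy_y])
        perm = rng.permutation(len(x_all))                         # shuffle the mixture
        return x_all[perm], y_all[perm]
    ```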

    DNN-based uncertainty estimation for weighted DNN-HMM ASR

    José Novoa, Josué Fredes, Néstor Becerra Yoma
    Subjects: Sound (cs.SD); Neural and Evolutionary Computing (cs.NE)

    In this paper, the uncertainty is defined as the mean square error between a
    given enhanced noisy observation vector and the corresponding clean one. Then,
    a DNN is trained by using enhanced noisy observation vectors as input and the
    uncertainty as output with a training database. In testing, the DNN receives an
    enhanced noisy observation vector and delivers the estimated uncertainty. This
    uncertainty is employed in combination with a weighted DNN-HMM based speech
    recognition system and compared with an existing estimate of the
    noise-cancelling uncertainty variance based on an additive noise model.
    Experiments were carried out on the Aurora-4 task. Results with clean,
    multi-noise and multi-condition training are presented.
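
    A minimal sketch of this training setup (generic tensors stand in for the
    enhanced and clean feature vectors; the small regressor architecture is an
    illustrative assumption, not the paper's):

    ```python
    import torch
    import torch.nn as nn

    # Stand-ins for enhanced noisy observation vectors and their clean versions.
    enhanced = torch.randn(1000, 40)
    clean = enhanced + 0.1 * torch.randn_like(enhanced)
    # Target: per-vector mean square error between enhanced and clean features.
    target = ((enhanced - clean) ** 2).mean(dim=1, keepdim=True)

    net = nn.Sequential(nn.Linear(40, 128), nn.ReLU(),
                        nn.Linear(128, 1))          # predicted uncertainty
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(200):
        opt.zero_grad()
        nn.functional.mse_loss(net(enhanced), target).backward()
        opt.step()
    # In testing, net(enhanced_vector) delivers the estimated uncertainty.
    ```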


    Computer Vision and Pattern Recognition

    Reflection Invariant and Symmetry Detection

    Erbo Li, Hua Li
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Symmetry detection and discrimination are of fundamental importance in science,
    technology, and engineering. This paper introduces reflection invariants and
    defines the directional moment to detect symmetry for shape analysis and object
    recognition. It demonstrates that detection of reflection symmetry can be done
    in a simple way by solving a trigonometric system derived from the directional
    moment, and that discrimination of reflection symmetry can be achieved by
    applying the reflection invariants in 2D and 3D. Rotation symmetry can also be
    determined on this basis. The experiments in 2D and 3D, including the regular
    triangle, the square, and the five Platonic solids, show that all the
    reflection lines or planes can be deterministically found using directional
    moments up to order six. This result can be used to simplify the effort of
    symmetry detection in research areas such as protein structure analysis, model
    retrieval, reverse engineering, and machine vision.

    A Kernel Redundancy Removing Policy for Convolutional Neural Network

    Chih-Ting Liu, Yi-Heng Wu, Yu-Sheng Lin, Shao-Yi Chien
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep Convolutional Neural Networks (CNN) have recently won a significant place
    in computer vision by repeatedly convolving an image to extract the knowledge
    behind it. However, as convolutional layers have grown deeper and deeper in
    recent years, the computational complexity has also increased significantly,
    which makes such networks difficult to deploy on embedded systems with limited
    hardware resources.

    In this paper we propose a method to reduce the redundant convolution kernels
    during the computation of CNN and apply it to a network for super resolution
    (SR). Using PSNR drop compared to the original network as performance
    criterion, our method can get the optimal PSNR under a certain computation
    budget constraint. On the other hand, our method is also capable of minimizing
    the computation required under a given PSNR drop.

    Deep manifold-to-manifold transforming network for action recognition

    Tong Zhang (1 and 2), Wenming Zheng (2), Zhen Cui (2), Yuan Zong (2), Yang Li (1 and 2) ((1) the Department of Information Science and Engineering, Southeast University, Nanjing, China (2) the Key Laboratory of Child Development and Learning Science of Ministry of Education, Research Center for Learning Science, Southeast University, Nanjing, China)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, a novel deep manifold-to-manifold transforming network
    (DMT-Net) is proposed for action recognition, in which symmetric positive
    definite (SPD) matrices are adopted to describe the spatial-temporal
    information of action feature vectors. Since each SPD matrix is a point on a
    Riemannian manifold, the proposed DMT-Net aims to learn more discriminative features
    by hierarchically transforming the data points from one Riemannian manifold to
    another more discriminative one. To this end, several novel layers are proposed
    in DMT-Net, including SPD convolutional layer, channel convolution layer,
    diagonalizing layer and kernel regression layer. Specifically, SPD
    convolutional layer enables multi-channel convolution to be well applied to
    Riemannian manifold, and the kernel regression layer enables the distance
    metric computation between two SPD points in Riemannian manifold to be done in
    the Euclidean space, in which a novel reference set dynamically generated
    during the network training is also introduced to relieve the computational
    burden of the kernel method. To evaluate the effectiveness of the proposed
    method, three action recognition databases are used to evaluate our approach,
    and the experimental results show that our algorithm outperforms
    state-of-the-art methods.

    Addressing Ambiguity in Multi-target Tracking by Hierarchical Strategy

    Ali Taalimi, Liu Liu, Hairong Qi
    Comments: 5 pages, Accepted at the International Conference on Image Processing, 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents a novel hierarchical approach for the simultaneous
    tracking of multiple targets in a video. We use a network flow approach to link
    detections at the low level and tracklets at the high level. At each step of
    the hierarchy, the confidence of a candidate is measured using a new scoring
    system, ConfRank, that considers the quality and the quantity of its
    neighborhood. The output of the first stage is a collection of safe tracklets
    and unlinked high-confidence detections. For each individual detection, we
    determine whether it belongs to an existing tracklet or starts a new one. We
    show the ability of our framework to recover missed detections and reduce
    identity switches. The proposed tracker, referred to as TVOD, performs
    multi-target tracking using a visual tracker and a generic object detector. We
    achieve competitive results with fewer identity switches on several datasets
    compared to the state of the art.

    Multi-View Task-Driven Recognition in Visual Sensor Networks

    Ali Taalimi, Alireza Rahimpour, Liu Liu, Hairong Qi
    Comments: 5 pages, Accepted at the International Conference on Image Processing, 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Nowadays, distributed smart cameras are deployed for a wide set of tasks in
    several application scenarios, including object recognition, image retrieval,
    and forensic applications. Due to limited bandwidth in distributed systems,
    efficient coding of local visual features has been an active topic of research.
    In this paper, we propose a novel approach to obtaining a compact
    representation of high-dimensional visual data using sensor fusion techniques.
    We convert the problem of visual analysis in resource-limited scenarios to a
    multi-view representation learning problem, and we show that the key to
    finding a properly compressed representation is to exploit the position of
    cameras with respect to each other as a norm-based regularization in the
    particular signal representation of sparse coding. Learning the representation
    of each camera is viewed as an individual task and a multi-task learning with
    joint sparsity for all nodes is employed. The proposed representation learning
    scheme is referred to as the multi-view task-driven learning for visual sensor
    network (MT-VSN). We demonstrate that MT-VSN outperforms state-of-the-art in
    various surveillance recognition tasks.
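
    The joint-sparsity ingredient can be illustrated with an l2,1-regularized
    coding step. A toy sketch (simplified to a single dictionary shared across
    camera views, whereas the paper learns per-view dictionaries with a norm-based
    camera-position regularizer):

    ```python
    import numpy as np
    from sklearn.linear_model import MultiTaskLasso

    rng = np.random.RandomState(0)
    D = rng.randn(64, 256)     # shared dictionary: 64-dim features, 256 atoms
    Y = rng.randn(64, 4)       # one observation per camera view (4 views)

    # MultiTaskLasso applies an l2,1 penalty, so all views share one sparsity
    # pattern over dictionary atoms -- the "joint sparsity for all nodes" idea.
    coder = MultiTaskLasso(alpha=0.1, max_iter=5000).fit(D, Y)
    codes = coder.coef_.T      # shape (256, 4): one sparse code per view
    print("atoms active across views:", np.sum(np.any(codes != 0, axis=1)))
    ```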

    ResnetCrowd: A Residual Deep Learning Architecture for Crowd Counting, Violent Behaviour Detection and Crowd Density Level Classification

    Mark Marsden, Kevin McGuinness, Suzanne Little, Noel E. O'Connor
    Comments: 7 Pages, AVSS 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper we propose ResnetCrowd, a deep residual architecture for
    simultaneous crowd counting, violent behaviour detection and crowd density
    level classification. To train and evaluate the proposed multi-objective
    technique, a new 100 image dataset referred to as Multi Task Crowd is
    constructed. This new dataset is the first computer vision dataset fully
    annotated for crowd counting, violent behaviour detection and density level
    classification. Our experiments show that a multi-task approach boosts
    individual task performance for all tasks, most notably for violent
    behaviour detection, which receives a 9% boost in ROC AUC (area under the
    curve). The trained ResnetCrowd model is also evaluated on several additional
    benchmarks highlighting the superior generalisation of crowd analysis models
    trained for multiple objectives.

    Discovering Visual Concept Structure with Sparse and Incomplete Tags

    Jingya Wang, Xiatian Zhu, Shaogang Gong
    Comments: Artificial Intelligence journal 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Automatically discovering the semantic structure of tagged visual data (e.g.
    web videos and images) is important for visual data analysis and
    interpretation, enabling machine intelligence to effectively process the
    fast-growing amount of multimedia data. However, this is non-trivial due
    to the need for jointly learning underlying correlations between heterogeneous
    visual and tag data. The task is made more challenging by inherently sparse and
    incomplete tags. In this work, we develop a method for modelling the inherent
    visual data concept structures based on a novel Hierarchical-Multi-Label Random
    Forest model capable of correlating structured visual and tag information so as
    to more accurately interpret the visual semantics, e.g. disclosing meaningful
    visual groups with similar high-level concepts, and recovering missing tags for
    individual visual data samples. Specifically, our model exploits hierarchically
    structured tags of different semantic abstractness and multiple tag statistical
    correlations in addition to modelling visual and tag interactions. As a result,
    our model is able to discover more accurate semantic correlations between
    textual tags and visual features, and ultimately provides a favourable
    interpretation of visual semantics even with highly sparse and incomplete tags. We
    demonstrate the advantages of our proposed approach in two fundamental
    applications, visual data clustering and missing tag completion, on
    benchmarking video (i.e. TRECVID MED 2011) and image (i.e. NUS-WIDE) datasets.

    Nighttime sky/cloud image segmentation

    Soumyabrata Dev, Florian M. Savoy, Yee Hui Lee, Stefan Winkler
    Comments: Accepted in Proc. IEEE International Conference on Image Processing (ICIP), 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Imaging the atmosphere using ground-based sky cameras is a popular approach
    to study various atmospheric phenomena. However, it usually focuses on the
    daytime. Nighttime sky/cloud images are darker and noisier, and thus harder to
    analyze. An accurate segmentation of sky/cloud images is already challenging
    because of the clouds’ non-rigid structure and size, and the lower and less
    stable illumination of the night sky increases the difficulty. Nonetheless,
    nighttime cloud imaging is essential in certain applications, such as
    continuous weather analysis and satellite communication.

    In this paper, we propose a superpixel-based method to segment nighttime
    sky/cloud images. We also release the first nighttime sky/cloud image
    segmentation database to the research community. The experimental results show
    the efficacy of our proposed algorithm for nighttime images.
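
    A superpixel-based sky/cloud segmentation can be sketched roughly as follows
    (a simplified stand-in for the proposed algorithm: SLIC superpixels followed
    by a per-superpixel intensity rule; the threshold criterion and file name are
    our assumptions, not the paper's):

    ```python
    import numpy as np
    from skimage import io, color
    from skimage.segmentation import slic

    image = io.imread('night_sky.png')          # hypothetical input image
    superpixels = slic(image, n_segments=300, compactness=10)

    gray = color.rgb2gray(image)
    mask = np.zeros_like(gray, dtype=bool)
    for label in np.unique(superpixels):
        region = superpixels == label
        # Label bright superpixels as cloud (simplistic proxy criterion).
        mask[region] = gray[region].mean() > 0.25
    ```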

    Multi-Focus Image Fusion Via Coupled Sparse Representation and Dictionary Learning

    Rui Gao, Sergiy A. Vorobyov
    Comments: 27 pages, 8 figures, 1 table
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We address the multi-focus image fusion problem, where multiple images
    captured with different focal settings are to be fused into an all-in-focus
    image of higher quality. Algorithms for this problem must account for the
    source image characteristics along with focused and blurred features. However,
    most sparsity-based approaches use a single dictionary in focused feature space
    to describe multi-focus images, and ignore the representations in blurred
    feature space. Here, we propose a multi-focus image fusion approach based on
    coupled sparse representation. The approach exploits the facts that (i) the
    patches in given training set can be sparsely represented by a couple of
    overcomplete dictionaries related to the focused and blurred categories of
    images; and (ii) merging such representations leads to a more flexible and
    therefore better fusion strategy than the one based on just selecting the
    sparsest representation in the original image estimate. By jointly learning the
    coupled dictionary, we enforce the similarity of sparse representations in the
    focused and blurred feature spaces, and then introduce a fusion approach to
    combine these representations for generating an all-in-focus image. We also
    discuss the advantages of the fusion approach based on coupled sparse
    representation and present an efficient algorithm for learning the coupled
    dictionary. Extensive experimental comparisons with state-of-the-art
    multi-focus image fusion algorithms validate the effectiveness of the proposed
    approach.

    End-to-end Active Object Tracking via Reinforcement Learning

    Wenhan Luo, Peng Sun, Yadong Mu, Wei Liu
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper we propose an active object tracking approach, which provides a
    tracking solution simultaneously addressing tracking and camera control.
    Crucially, these two tasks are tackled in an end-to-end manner via
    reinforcement learning. Specifically, a ConvNet-LSTM function approximator is
    adopted, which takes as input only visual observations (i.e., frame sequences)
    and directly outputs camera motions (e.g., move forward, turn left, etc.). The
    tracker, regarded as an agent, is trained with the A3C algorithm, where we
    harness an environment augmentation technique and a customized rewarding
    function to encourage robust object tracking. We carry out experiments on the
    AI research platform ViZDoom. The yielded tracker can automatically pay
    attention to the most likely object in the initial frame and perform tracking
    subsequently, not requiring a manual bounding box as initialization. Moreover,
    our approach shows good generalization ability when performing tracking with
    unseen object moving paths, object appearances, backgrounds, and distracting
    objects. The tracker can even recover tracking after it occasionally loses the
    target.

    Interpreting and Extending The Guided Filter Via Cyclic Coordinate Descent

    Longquan Dai
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we will disclose that the Guided Filter (GF) can be
    interpreted as the Cyclic Coordinate Descent (CCD) solver of a Least Square
    (LS) objective function. This discovery implies a possible way to extend GF
    because we can alter the objective function of GF and define new filters as the
    first pass iteration of the CCD solver of modified objective functions.
    Moreover, referring to the iterative minimizing procedure of CCD, we can derive
    new rolling filtering schemes. Hence, guided by this discovery, we not only
    propose new GF-like filters adapted to the specific requirements of
    applications but also offer thorough explanations of two rolling filtering
    schemes of GF as well as the way to extend them. Experiments show that our new
    filters and extensions produce state-of-the-art results.
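
    For reference, minimizing the standard local least-squares objective
    E(a_k, b_k) = sum_{i in w_k} ((a_k I_i + b_k - p_i)^2 + eps a_k^2) in closed
    form, window by window, gives the familiar guided filter; under the paper's
    view this corresponds to the first pass of the CCD solver. A compact sketch:

    ```python
    import numpy as np
    from scipy.ndimage import uniform_filter

    def guided_filter(I, p, radius=8, eps=1e-3):
        """Classic guided filter: the closed-form minimizer of the local
        least-squares objective over each window."""
        size = 2 * radius + 1
        mean = lambda x: uniform_filter(x, size)
        mean_I, mean_p = mean(I), mean(p)
        cov_Ip = mean(I * p) - mean_I * mean_p
        var_I = mean(I * I) - mean_I ** 2
        a = cov_Ip / (var_I + eps)      # per-window slope
        b = mean_p - a * mean_I         # per-window offset
        return mean(a) * I + mean(b)    # average overlapping window estimates
    ```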

    Saliency Revisited: Analysis of Mouse Movements versus Fixations

    Hamed R. Tavakoli, Fawad Ahmed, Ali Borji, Jorma Laaksonen
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper revisits visual saliency prediction by evaluating the recent
    advancements in this field such as crowd-sourced mouse tracking-based databases
    and contextual annotations. We pursue a critical and quantitative approach
    towards some of the new challenges including the quality of mouse tracking
    versus eye tracking for model training and evaluation. We extend quantitative
    evaluation of models in order to incorporate contextual information by
    proposing an evaluation methodology that allows accounting for contextual
    factors such as text, faces, and object attributes. The proposed contextual
    evaluation scheme facilitates detailed analysis of models and helps identify
    their pros and cons. Through several experiments, we find that (1) mouse
    tracking data has lower inter-participant visual congruency and higher
    dispersion compared to eye tracking data, (2) mouse tracking data does not
    fully agree with eye tracking, both in general and for specific contextual
    regions, (3) mouse tracking data leads to acceptable results for training
    existing models, and (4) mouse tracking data is less reliable for model
    selection and evaluation. The contextual evaluation also reveals
    that, among the studied models, there is no single model that performs best on
    all the tested annotations.

    Parcellation of Visual Cortex on high-resolution histological Brain Sections using Convolutional Neural Networks

    Hannah Spitzer, Katrin Amunts, Stefan Harmeling, Timo Dickscheid
    Comments: Accepted for oral presentation at International Symposium of Biomedical Imaging (ISBI) 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Microscopic analysis of histological sections is considered the “gold
    standard” to verify structural parcellations in the human brain. Its high
    resolution allows the study of laminar and columnar patterns of cell
    distributions, which build an important basis for the simulation of cortical
    areas and networks. However, such cytoarchitectonic mapping is a semiautomatic,
    time-consuming process that does not scale to high-throughput imaging. We
    present an automatic approach for parcellating histological sections at 2μm
    resolution. It is based on a convolutional neural network that combines
    topological information from probabilistic atlases with the texture features
    learned from high-resolution cell-body stained images. The model is applied to
    visual areas and trained on a sparse set of partial annotations. We show how
    predictions are transferable to new brains and spatially consistent across
    sections.

    Decorrelation of Neutral Vector Variables: Theory and Applications

    Zhanyu Ma, Jing-Hao Xue, Arne Leijon, Zheng-Hua Tan, Zhen Yang, Jun Guo
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    In this paper, we propose novel strategies for neutral vector variable
    decorrelation. Two fundamental invertible transformations, namely serial
    nonlinear transformation and parallel nonlinear transformation, are proposed to
    carry out the decorrelation. For a neutral vector variable, which is not
    multivariate Gaussian distributed, the conventional principal component
    analysis (PCA) cannot yield mutually independent scalar variables. With the two
    proposed transformations, a highly negatively correlated neutral vector can be
    transformed to a set of mutually independent scalar variables with the same
    degrees of freedom. We also evaluate the decorrelation performance for the
    vectors generated from a single Dirichlet distribution and a mixture of
    Dirichlet distributions. The mutual independence is verified with the distance
    correlation measurement. The advantages of the proposed decorrelation
    strategies are intensively studied and demonstrated with synthesized data and
    practical application evaluations.
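
    For Dirichlet-distributed (hence neutral) vectors, a serial nonlinear
    transformation of this kind is the classical stick-breaking map
    u_k = x_k / (1 - x_1 - ... - x_{k-1}), which yields mutually independent Beta
    variables; a quick numerical check (our illustration, not the paper's code):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.dirichlet(alpha=[2.0, 3.0, 4.0, 5.0], size=100_000)

    # Serial (stick-breaking) transformation of the first K-1 coordinates.
    remaining = 1.0 - np.cumsum(X[:, :-1], axis=1)
    U = X[:, :-1].copy()
    U[:, 1:] = X[:, 1:-1] / remaining[:, :-1]

    print(np.corrcoef(X[:, :-1], rowvar=False).round(2))  # strongly negative
    print(np.corrcoef(U, rowvar=False).round(2))          # approximately identity
    ```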

    RSI-CB: A Large Scale Remote Sensing Image Classification Benchmark via Crowdsource Data

    Haifeng Li, Chao Tao, Zhixiang Wu, Jie Chen, Jianya Gong, Min Deng
    Comments: 39 pages, 21 figures, 7 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Remote sensing image classification is a fundamental task in remote sensing
    image processing. Compared to ImageNet and Places2, however, the remote sensing
    field still lacks a large-scale benchmark. We propose a remote sensing image
    classification benchmark (RSI-CB) based on crowdsourced data, which is massive,
    scalable, and diverse. Using crowdsourced data, we can efficiently annotate
    ground objects in remote sensing images by points of interest, vector data from
    OSM, or other crowdsourced data. Based on this method, we construct a worldwide
    large-scale benchmark for remote sensing image classification. The benchmark
    has two sub-datasets with 256 x 256 and 128 x 128 image sizes respectively,
    since different convolutional neural networks require different image sizes.
    The former sub-dataset contains 6 categories with 35 subclasses and more than
    24,000 images in total; the latter contains 6 categories with 45 subclasses and
    more than 36,000 images in total. The six categories are agricultural land,
    construction land and facilities, transportation and facilities, water and
    water conservancy facilities, woodland, and other land, and each category has
    several subclasses. This classification system is defined according to the
    national standard of land use classification in China, and is inspired by the
    hierarchy mechanism of ImageNet. Finally, we have conducted a large number of
    experiments to compare RSI-CB with the SAT-4 and UC-Merced datasets on
    handcrafted features, such as SIFT, and on classical CNN models, such as
    AlexNet, VGG, GoogLeNet, and ResNet. We also show that CNN models trained on
    RSI-CB have good performance when transferred to other datasets, e.g.
    UC-Merced, and good generalization ability. The experiments show that RSI-CB is
    a more suitable benchmark for remote sensing image classification than existing
    ones in the big data era, and can potentially be used in practical
    applications.

    Robust Tracking Using Region Proposal Networks

    Jimmy Ren, Zhiyang Yu, Jianbo Liu, Rui Zhang, Wenxiu Sun, Jiahao Pang, Xiaohao Chen, Qiong Yan
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recent advances in visual tracking showed that deep Convolutional Neural
    Networks (CNN) trained for image classification can be strong feature
    extractors for discriminative trackers. However, due to the drastic difference
    between image classification and tracking, extra treatments such as model
    ensemble and feature engineering must be carried out to bridge the two domains.
    Such procedures are either time consuming or hard to generalize well across
    datasets. In this paper we discover that the internal structure of the Region
    Proposal Network (RPN)’s top-layer feature can be utilized for robust visual
    tracking. We show that this property has to be unleashed by a novel loss
    function which simultaneously considers classification accuracy and bounding
    box quality. Without ensembles or any extra treatment of feature maps, our
    proposed method achieves state-of-the-art results on several large-scale
    benchmarks including OTB50, OTB100 and VOT2016. We will make our code publicly
    available.

    Unsupervised Person Re-identification: Clustering and Fine-tuning

    Hehe Fan, Liang Zheng, Yi Yang
    Comments: This version is a draft. We will update it with more results, parameter analysis and comparisons
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The superiority of deeply learned pedestrian representations has been
    reported in very recent literature of person re-identification (re-ID). In this
    paper, we consider the more pragmatic issue of learning a deep feature with
    only a few or no labels. We propose a progressive unsupervised learning (PUL)
    method to transfer pretrained deep representations to unseen domains. Our
    method is easy to implement and can be viewed as an effective baseline for
    unsupervised feature learning. Specifically, PUL iterates between 1) pedestrian
    clustering and 2) fine-tuning of the convolutional neural network (CNN) to
    improve the original model trained on the irrelevant labeled dataset. At the
    beginning, when the model is weak, the CNN is fine-tuned on a small number of
    reliable examples located near cluster centroids in the feature space.
    As the model becomes stronger in subsequent iterations, more images are
    selected as CNN training samples. Progressively, pedestrian clustering and the
    CNN model are improved simultaneously until algorithm convergence. We then
    point out promising directions that may lead to further improvement.
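
    A schematic of the PUL loop (an illustrative sketch; `extract_features` and
    `finetune` are hypothetical stand-ins for the CNN operations, and the
    cosine-similarity reliability rule is our simplification):

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    def pul(model, images, n_ids=100, n_iters=10, threshold=0.85):
        """Alternate clustering and fine-tuning, keeping only samples that
        lie close to their cluster centroid (cosine-similarity proxy)."""
        for _ in range(n_iters):
            feats = extract_features(model, images)       # hypothetical helper
            feats /= np.linalg.norm(feats, axis=1, keepdims=True)
            km = KMeans(n_clusters=n_ids, n_init=10).fit(feats)
            sim = (feats * km.cluster_centers_[km.labels_]).sum(axis=1)
            reliable = sim > threshold                    # near-centroid samples
            model = finetune(model, images[reliable], km.labels_[reliable])
        return model
    ```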

    Discriminatively Learned Hierarchical Rank Pooling Networks

    Basura Fernando, Stephen Gould
    Comments: International Journal of Computer Vision
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this work, we present novel temporal encoding methods for action and
    activity classification by extending the unsupervised rank pooling temporal
    encoding method in two ways. First, we present “discriminative rank pooling” in
    which the shared weights of our video representation and the parameters of the
    action classifiers are estimated jointly for a given training dataset of
    labelled vector sequences using a bilevel optimization formulation of the
    learning problem. When the frame-level feature vectors are obtained from a
    convolutional neural network (CNN), we rank pool the network activations and
    jointly estimate all parameters of the model, including CNN filters and
    fully-connected weights, in an end-to-end manner, which we coin the “end-to-end
    trainable rank pooled CNN”. Importantly, this model can make use of any
    existing convolutional neural network architecture (e.g., AlexNet or VGG)
    without modification or introduction of additional parameters. Then, we extend
    rank pooling to a high capacity video representation, called “hierarchical rank
    pooling”. Hierarchical rank pooling consists of a network of rank pooling
    functions, which encode temporal semantics over arbitrarily long video clips
    based on rich frame level features. By stacking non-linear feature functions
    and temporal sub-sequence encoders one on top of the other, we build a high
    capacity encoding network of the dynamic behaviour of the video. The resulting
    video representation is a fixed-length feature vector describing the entire
    video clip that can be used as input to standard machine learning classifiers.
    We demonstrate our approach on the task of action and activity recognition.
    Obtained results are comparable to state-of-the-art methods on three important
    activity recognition benchmarks with classification performance of 76.7% mAP on
    Hollywood2, 69.4% on HMDB51, and 93.6% on UCF101.
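
    The basic unsupervised rank pooling step can be approximated by fitting a
    linear model whose scores increase with frame order and using its parameter
    vector as the clip descriptor. A rough sketch under that approximation (ridge
    regression in place of the ranking machinery used in the paper):

    ```python
    import numpy as np
    from sklearn.linear_model import Ridge

    def rank_pool(frames):
        """frames: (T, D) array of frame-level feature vectors.
        Returns a D-dim vector u such that u . v_t roughly increases with t."""
        T = len(frames)
        # Running-mean smoothing, commonly applied before rank pooling.
        smoothed = np.cumsum(frames, axis=0) / np.arange(1, T + 1)[:, None]
        u = Ridge(alpha=1.0).fit(smoothed, np.arange(T)).coef_
        return u  # fixed-length representation of the whole clip
    ```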

    Learning to Generate Chairs with Generative Adversarial Nets

    Evgeny Zamyatin, Andrey Filchenkov
    Comments: Submitted to NIPS 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Generative adversarial networks (GANs) have gained tremendous popularity
    lately due to their ability to reinforce the quality of the predictive model
    with generated objects and the quality of the generative model with supervised
    feedback. GANs make it possible to synthesize images with a high degree of
    realism. However, the learning process of such models is a very complicated
    optimization problem, and certain limitations of such models have been found.
    These affect the choice of layers and nonlinearities when designing
    architectures. In particular, they do not allow training convolutional GAN
    models with fully-connected hidden layers. In our work, we propose a modification of the
    previously described set of rules, as well as new approaches to designing
    architectures that will allow us to train more powerful GAN models. We show the
    effectiveness of our methods on the problem of synthesizing projections of 3D
    objects with the possibility of interpolation by class and viewpoint.

    Optimal Multi-Object Segmentation with Novel Gradient Vector Flow Based Shape Priors

    Junjie Bai, Abhay Shah, Xiaodong Wu
    Comments: Paper in review
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Shape priors have been widely utilized in medical image segmentation to
    improve segmentation accuracy and robustness. A major way to encode such a
    prior shape model is to use a mesh representation, which is prone to causing
    self-intersection or mesh folding. Those problems require complex and expensive
    algorithms to mitigate. In this paper, we propose a novel shape prior directly
    embedded in the voxel grid space, based on gradient vector flows of a
    pre-segmentation. The flexible and powerful prior shape representation is ready
    to be extended to simultaneously segmenting multiple interacting objects with
    minimum separation distance constraint. The problem is formulated as a Markov
    random field problem whose exact solution can be efficiently computed with a
    single minimum s-t cut in an appropriately constructed graph. The proposed
    algorithm is validated on two multi-object segmentation applications: the brain
    tissue segmentation in MRI images, and the bladder/prostate segmentation in CT
    images. Both sets of experiments show superior or competitive performance of
    the proposed method to other state-of-the-art methods.

    Efficient Decentralized Visual Place Recognition From Full-Image Descriptors

    Titus Cieslewski, Davide Scaramuzza
    Comments: 3 pages, 4 figures. This is a self-published paper that accompanies our original work [1] as well as the ICRA 2017 Workshop on Multi-robot Perception-Driven Control and Planning [2]
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we discuss the adaptation of our decentralized place
    recognition method described in [1] to full-image descriptors. As we have shown,
    the key to making decentralized visual place recognition scalable lies in
    exploiting deterministic key assignment in a distributed key-value map. Through
    this, it is possible to reduce bandwidth by up to a factor of n, the robot
    count, by casting visual place recognition as a key-value lookup problem. In
    [1], we exploited this for the bag-of-words method [3], [4]. Our method of
    casting bag-of-words, however, results in a complex decentralized system, which
    has inherently worse recall than its centralized counterpart. In this paper, we
    instead start from the recent full-image description method NetVLAD [5]. As we
    show, casting this to a key-value lookup problem can be achieved with k-means
    clustering, and results in a much simpler system than [1]. The resulting system
    still has some flaws, albeit of a completely different nature: it suffers when
    the environment seen during deployment lies in a different distribution in
    feature space than the environment seen during training.
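
    The k-means-based key assignment can be sketched as follows (a hypothetical
    simplification: each descriptor is routed to the robot that owns its nearest
    cluster, so a query touches exactly one peer; the descriptor dimensionality is
    a stand-in, not NetVLAD's actual size):

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    n_robots = 8
    descriptors = np.random.randn(5000, 128)   # stand-in for NetVLAD vectors

    # Train the key space once, offline; every robot stores the same centroids.
    km = KMeans(n_clusters=n_robots, n_init=10).fit(descriptors)

    def responsible_robot(descriptor):
        """Deterministic key assignment: cluster id doubles as robot id."""
        return int(km.predict(descriptor[None, :])[0])

    # Each robot keeps only the map entries whose descriptors fall in its
    # cluster, and a query is sent to responsible_robot(query) alone -- an
    # n-fold bandwidth reduction versus broadcasting to all n robots.
    ```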


    Jeffrey's prior sampling of deep sigmoidal networks

    Lorien X. Hayden, Alexander A. Alemi, Paul H. Ginsparg, James P. Sethna
    Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Computer Vision and Pattern Recognition (cs.CV)

    Neural networks have been shown to have a remarkable ability to uncover
    low-dimensional structure in data: the space of possible reconstructed images
    forms a reduced model manifold in image space. We explore this idea directly by
    analyzing the manifold learned by Deep Belief Networks and Stacked Denoising
    Autoencoders using Monte Carlo sampling. The model manifold forms an only
    slightly elongated hyperball with actual reconstructed data appearing
    predominantly on the boundaries of the manifold. In connection with the results
    we present, we discuss problems of sampling high-dimensional manifolds as well
    as recent work [M. Transtrum, G. Hart, and P. Qiu, Submitted (2014)] discussing
    the relation between high dimensional geometry and model reduction.

    Emergent Language in a Multi-Modal, Multi-Step Referential Game

    Katrina Evtimova, Andrew Drozdov, Douwe Kiela, Kyunghyun Cho
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Multiagent Systems (cs.MA)

    Inspired by previous work on emergent language in referential games, we
    propose a novel multi-modal, multi-step referential game, where the sender and
    receiver have access to distinct modalities of an object, and their information
    exchange is bidirectional and of arbitrary duration. The multi-modal multi-step
    setting allows agents to develop an internal language significantly closer to
    natural language, in that they share a single set of messages, and that the
    length of the conversation may vary according to the difficulty of the task. We
    examine these properties empirically using a dataset consisting of images and
    textual descriptions of mammals, where the agents are tasked with identifying
    the correct object. Our experiments indicate that a robust and efficient
    communication protocol emerges, where gradual information exchange informs
    better predictions and higher communication bandwidth improves generalization.


    Artificial Intelligence

    The Cramer Distance as a Solution to Biased Wasserstein Gradients

    Marc G. Bellemare, Ivo Danihelka, Will Dabney, Shakir Mohamed, Balaji Lakshminarayanan, Stephan Hoyer, Rémi Munos
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    The Wasserstein probability metric has received much attention from the
    machine learning community. Unlike the Kullback-Leibler divergence, which
    strictly measures change in probability, the Wasserstein metric reflects the
    underlying geometry between outcomes. The value of being sensitive to this
    geometry has been demonstrated, among others, in ordinal regression and
    generative modelling. In this paper we describe three natural properties of
    probability divergences that reflect requirements from machine learning: sum
    invariance, scale sensitivity, and unbiased sample gradients. The Wasserstein
    metric possesses the first two properties but, unlike the Kullback-Leibler
    divergence, does not possess the third. We provide empirical evidence
    suggesting that this is a serious issue in practice. Leveraging insights from
    probabilistic forecasting we propose an alternative to the Wasserstein metric,
    the Cramér distance. We show that the Cramér distance possesses all three
    desired properties, combining the best of the Wasserstein and Kullback-Leibler
    divergences. To illustrate the relevance of the Cramér distance in practice
    we design a new algorithm, the Cramér Generative Adversarial Network (GAN),
    and show that it performs significantly better than the related Wasserstein
    GAN.
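
    For one-dimensional distributions, the squared Cramér distance is the
    integrated squared difference of the CDFs, l2^2(P, Q) = integral (F_P - F_Q)^2 dx,
    and it admits a sample form via the energy-distance identity
    2 l2^2 = 2 E|X-Y| - E|X-X'| - E|Y-Y'|. A plug-in estimator sketch (our
    illustration, not the paper's code):

    ```python
    import numpy as np

    def cramer_distance_sq(x, y):
        """Plug-in estimate of the squared Cramer distance between two
        1-D samples, via 2*l2^2 = 2E|X-Y| - E|X-X'| - E|Y-Y'|."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        xy = np.abs(x[:, None] - y[None, :]).mean()
        xx = np.abs(x[:, None] - x[None, :]).mean()
        yy = np.abs(y[:, None] - y[None, :]).mean()
        return xy - 0.5 * xx - 0.5 * yy

    rng = np.random.default_rng(0)
    print(cramer_distance_sq(rng.normal(0, 1, 2000), rng.normal(0.5, 1, 2000)))
    ```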

    Generating Steganographic Text with LSTMs

    Tina Fang, Martin Jaggi, Katerina Argyraki
    Comments: ACL 2017 Student Research Workshop
    Subjects: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Multimedia (cs.MM)

    Motivated by concerns for user privacy, we design a steganographic system
    (“stegosystem”) that enables two users to exchange encrypted messages without
    an adversary detecting that such an exchange is taking place. We propose a new
    linguistic stegosystem based on a Long Short-Term Memory (LSTM) neural network.
    We demonstrate our approach on the Twitter and Enron email datasets and show
    that it yields high-quality steganographic text while significantly improving
    capacity (encrypted bits per word) relative to the state-of-the-art.

    Strength Factors: An Uncertainty System for a Quantified Modal Logic

    Naveen Sundar Govindarajulu, Selmer Bringsjord
    Subjects: Artificial Intelligence (cs.AI)

    We present a new system S for handling uncertainty in a quantified modal
    logic (first-order modal logic). The system is based on both probability theory
    and proof theory. The system is derived from Chisholm’s epistemology. We
    concretize Chisholm’s system by grounding his undefined and primitive (i.e.
    foundational) concept of reasonableness in probability and proof theory. S can
    be useful in systems that have to interact with humans and provide
    justifications for their uncertainty. As a demonstration of the system, we
    apply the system to provide a solution to the lottery paradox. Another
    advantage of the system is that it can be used to provide uncertainty values
    for counterfactual statements. Counterfactuals are statements that an agent
    knows for sure are false. Among other cases, counterfactuals are useful when
    systems have to explain their actions to users. Uncertainties for
    counterfactuals fall out naturally from our system.

    Efficient reasoning in even simple first-order logic is a hard problem.
    Resolution-based first-order reasoning systems have made significant progress
    over the last several decades in building systems that have solved non-trivial
    tasks (even unsolved conjectures in mathematics). We present a sketch of a
    novel algorithm for reasoning that extends first-order resolution.

    Finally, while there have been many systems of uncertainty for propositional
    logics, first-order logics and propositional modal logics, there has been very
    little work in building systems of uncertainty for first-order modal logics.
    The work described below is in progress; once finished, it will address this
    gap.

    Low Impact Artificial Intelligences

    Stuart Armstrong, Benjamin Levinstein
    Subjects: Artificial Intelligence (cs.AI)

    There are many goals for an AI that could become dangerous if the AI becomes
    superintelligent or otherwise powerful. Much work on the AI control problem has
    been focused on constructing AI goals that are safe even for such AIs. This
    paper looks at an alternative approach: defining a general concept of “low
    impact”. The aim is to ensure that a powerful AI which implements low impact
    will not modify the world extensively, even if it is given a simple or
    dangerous goal. The paper proposes various ways of defining and grounding low
    impact, and discusses methods for ensuring that the AI can still be allowed to
    have a (desired) impact despite the restriction. The end of the paper addresses
    known issues with this approach and avenues for future research.

    Multi-Labelled Value Networks for Computer Go

    Ti-Rong Wu, I-Chen Wu, Guan-Wun Chen, Ting-han Wei, Tung-Yi Lai, Hung-Chun Wu, Li-Cheng Lan
    Comments: This version was also submitted to IEEE TCIAIG on May 30, 2017
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    This paper proposes a novel value network architecture for the game of Go,
    called a multi-labelled (ML) value network. In the ML value
    network, different values (win rates) are trained simultaneously for different
    settings of komi, a compensation given to balance the initiative of playing
    first. The ML value network has three advantages: (a) it outputs values for
    different komi, (b) it supports dynamic komi, and (c) it lowers the mean
    squared error (MSE). This paper also proposes a new dynamic komi method to
    improve game-playing strength. This paper also performs experiments to
    demonstrate the merits of the architecture. First, the MSE of the ML value
    network is generally lower than the value network alone. Second, the program
    based on the ML value network wins by a rate of 67.6% against the program based
    on the value network alone. Third, the program with the proposed dynamic komi
    method significantly improves the playing strength over the baseline that does
    not use dynamic komi, especially for handicap games. To our knowledge, no
    handicap games have been played openly to date by programs using value
    networks. This paper provides these programs with a useful approach to playing
    handicap games.
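
    In network terms, the ML value net amounts to a shared trunk with one win-rate
    output per komi setting. A minimal sketch (the 17 input planes, trunk depth,
    and komi count are illustrative AlphaGo-style assumptions, not the paper's
    exact configuration):

    ```python
    import torch
    import torch.nn as nn

    class MLValueNet(nn.Module):
        """Shared trunk with one win-rate head entry per komi setting."""
        def __init__(self, n_komi=15, planes=17, channels=64):
            super().__init__()
            # 17 input planes is an assumed AlphaGo-style board encoding.
            self.trunk = nn.Sequential(
                nn.Conv2d(planes, channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.head = nn.Linear(channels, n_komi)

        def forward(self, board):              # board: (N, planes, 19, 19)
            return torch.sigmoid(self.head(self.trunk(board)))

    net = MLValueNet()
    values = net(torch.zeros(1, 17, 19, 19))   # one win rate per komi setting
    ```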

    Universal Reinforcement Learning Algorithms: Survey and Experiments

    John Aslanides, Jan Leike, Marcus Hutter
    Comments: 8 pages, 6 figures, Twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI-17)
    Subjects: Artificial Intelligence (cs.AI)

    Many state-of-the-art reinforcement learning (RL) algorithms typically assume
    that the environment is an ergodic Markov Decision Process (MDP). In contrast,
    the field of universal reinforcement learning (URL) is concerned with
    algorithms that make as few assumptions as possible about the environment. The
    universal Bayesian agent AIXI and a family of related URL algorithms have been
    developed in this setting. While numerous theoretical optimality results have
    been proven for these agents, there has been no empirical investigation of
    their behavior to date. We present a short and accessible survey of these URL
    algorithms under a unified notation and framework, along with results of some
    experiments that qualitatively illustrate some properties of the resulting
    policies, and their relative performance on partially-observable gridworld
    environments. We also present an open-source reference implementation of the
    algorithms which we hope will facilitate further understanding of, and
    experimentation with, these ideas.

    MOBA: a New Arena for Game AI

    Victor do Nascimento Silva, Luiz Chaimowicz
    Subjects: Artificial Intelligence (cs.AI)

    Games have always been popular testbeds for Artificial Intelligence (AI). In
    the last decade, we have seen the rise of Multiplayer Online Battle Arena
    (MOBA) games, which are among the most played games nowadays. In spite of this, there
    are few works that explore MOBA as a testbed for AI Research. In this paper we
    present and discuss the main features and opportunities offered by MOBA games
    to Game AI Research. We describe the various challenges faced along the game
    and also propose a discrete model that can be used to better understand and
    explore the game. With this, we aim to encourage the use of MOBA as a novel
    research platform for Game AI.

    Fine-grained acceleration control for autonomous intersection management using deep reinforcement learning

    Hamid Mirzaei, Tony Givargis
    Comments: Accepted in IEEE Smart World Congress 2017
    Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO); Systems and Control (cs.SY)

    Recent advances in combining deep learning and Reinforcement Learning have
    shown a promising path for designing new control agents that can learn optimal
    policies for challenging control tasks. These new methods address the main
    limitations of conventional Reinforcement Learning methods such as customized
    feature engineering and small action/state space dimension requirements. In
    this paper, we leverage one of the state-of-the-art Reinforcement Learning
    methods, known as Trust Region Policy Optimization, to tackle intersection
    management for autonomous vehicles. We show that using this method, we can
    perform fine-grained acceleration control of autonomous vehicles in a grid
    street plan to achieve a global design objective.

    Deep Learning for Ontology Reasoning

    Patrick Hohenecker, Thomas Lukasiewicz
    Comments: 9 pages
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    In this work, we present a novel approach to ontology reasoning that is based
    on deep learning rather than logic-based formal reasoning. To this end, we
    introduce a new model for statistical relational learning that is built upon
    deep recursive neural networks, and give experimental evidence that it can
    easily compete with, or even outperform, existing logic-based reasoners on the
    task of ontology reasoning. More precisely, we compared our implemented system
    with one of the best logic-based ontology reasoners at present, RDFox, on a
    number of large standard benchmark datasets, and found that our system attained
    high reasoning quality, while being up to two orders of magnitude faster.

    Knowledge Base Completion: Baselines Strike Back

    Rudolf Kadlec, Ondrej Bajgar, Jan Kleindienst
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Many papers have been published on the knowledge base completion task in the
    past few years. Most of these introduce novel architectures for relation
    learning that are evaluated on standard datasets such as FB15k and WN18. This
    paper shows that the accuracy of almost all models published on FB15k can
    be outperformed by an appropriately tuned baseline – our reimplementation of
    the DistMult model. Our findings cast doubt on the claim that the performance
    improvements of recent models are due to architectural changes as opposed to
    hyper-parameter tuning or different training objectives. This should prompt
    future research to re-consider how the performance of models is evaluated and
    reported.
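
    For reference, the DistMult baseline scores a triple (h, r, t) with a
    trilinear product over embeddings, score(h, r, t) = sum_i e_h[i] w_r[i] e_t[i];
    a minimal sketch with random embeddings standing in for trained ones:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n_entities, n_relations, dim = 1000, 50, 100
    E = rng.normal(scale=0.1, size=(n_entities, dim))   # entity embeddings
    W = rng.normal(scale=0.1, size=(n_relations, dim))  # diagonal relation vectors

    def distmult_score(h, r, t):
        """Trilinear DistMult score; higher means a more plausible triple."""
        return np.sum(E[h] * W[r] * E[t])

    # Rank all candidate tails for a (head, relation) query at once:
    scores = E @ (E[42] * W[7])
    top10 = np.argsort(-scores)[:10]
    ```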


    Preliminary results on Ontology-based Open Data Publishing

    Gianluca Cima
    Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)

    Despite the current interest in Open Data publishing, a formal and
    comprehensive methodology supporting an organization in deciding which data to
    publish and in carrying out precise procedures for publishing high-quality data
    is still missing. In this paper we argue that the Ontology-based Data
    Management paradigm can provide a formal basis for a principled approach to
    publish high quality, semantically annotated Open Data. We describe two main
    approaches to using an ontology for this endeavor, and then we present some
    technical results on one of the approaches, called bottom-up, where the
    specification of the data to be published is given in terms of the sources, and
    specific techniques allow deriving suitable annotations for interpreting the
    published data in the light of the ontology.

    Learning End-to-end Multimodal Sensor Policies for Autonomous Navigation

    Guan-Horng Liu, Avinash Siravuru, Sai Prabhakar, Manuela Veloso, George Kantor
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Sensor fusion is indispensable to improve accuracy and robustness in an
    autonomous navigation setting. However, in the space of end-to-end sensorimotor
    control, this multimodal outlook has received limited attention. In this work,
    we propose a novel stochastic regularization technique, called Sensor Dropout,
    to robustify multimodal sensor policy learning outcomes. We also introduce an
    auxiliary loss on the policy network, alongside the standard DRL loss, that
    helps reduce the action variations of the multimodal sensor policy. Through
    empirical testing we demonstrate that our proposed policy can 1) operate with
    minimal performance drop in noisy environments, and 2) remain functional even
    in the face of a sensor subset failure. Finally, through the visualization of
    gradients, we show that the learned policies are conditioned on the same
    latent input distribution despite having multiple sensory observation spaces
    – a hallmark of true sensor fusion. The efficacy of the multimodal policy is
    shown through simulations on TORCS, a popular open-source racing car game. A
    demo video can
    be seen here: this https URL
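
    The Sensor Dropout idea can be sketched as randomly masking entire modality
    blocks of the observation during training (our illustrative reading; the
    masking granularity and rescaling choices are assumptions, not the paper's
    exact scheme):

    ```python
    import torch

    def sensor_dropout(obs_by_modality, drop_prob=0.5):
        """obs_by_modality: list of tensors, one per sensor modality.
        Randomly zero whole modalities, keeping at least one active."""
        keep = torch.rand(len(obs_by_modality)) > drop_prob
        if not keep.any():
            keep[torch.randint(len(obs_by_modality), (1,))] = True
        scale = len(keep) / keep.sum()      # compensate for dropped inputs
        return torch.cat([o * scale if k else torch.zeros_like(o)
                          for o, k in zip(obs_by_modality, keep)], dim=-1)
    ```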


    Information Retrieval

    Auditing Search Engines for Differential Satisfaction Across Demographics

    Rishabh Mehrotra, Ashton Anderson, Fernando Diaz, Amit Sharma, Hanna Wallach, Emine Yilmaz
    Comments: 8 pages Accepted at WWW 2017
    Subjects: Information Retrieval (cs.IR)

    Many online services, such as search engines, social media platforms, and
    digital marketplaces, are advertised as being available to any user, regardless
    of their age, gender, or other demographic factors. However, there are growing
    concerns that these services may systematically underserve some groups of
    users. In this paper, we present a framework for internally auditing such
    services for differences in user satisfaction across demographic groups, using
    search engines as a case study. We first explain the pitfalls of naïvely
    comparing the behavioral metrics that are commonly used to evaluate search
    engines. We then propose three methods for measuring latent differences in user
    satisfaction from observed differences in evaluation metrics. To develop these
    methods, we drew on ideas from the causal inference literature and the
    multilevel modeling literature. Our framework is broadly applicable to other
    online services, and provides general insight into interpreting their
    evaluation metrics.

    IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models

    Jun Wang, Lantao Yu, Weinan Zhang, Yu Gong, Yinghui Xu, Benyou Wang, Peng Zhang, Dell Zhang
    Comments: 10 pages
    Subjects: Information Retrieval (cs.IR); Learning (cs.LG)

    This paper provides a unified account of two schools of thinking in
    information retrieval modelling: the generative retrieval focusing on
    predicting relevant documents given a query, and the discriminative retrieval
    focusing on predicting relevancy given a query-document pair. We propose a game
    theoretical minimax game to iteratively optimise both models. On one hand, the
    discriminative model, aiming to mine signals from labelled and unlabelled data,
    provides guidance to train the generative model towards fitting the underlying
    relevance distribution over documents given the query. On the other hand, the
    generative model, acting as an attacker to the current discriminative model,
    generates difficult examples for the discriminative model in an adversarial way
    by minimising its discrimination objective. With the competition between these
    two models, we show that the unified framework takes advantage of both schools
    of thinking: (i) the generative model learns to fit the relevance distribution
    over documents via the signals from the discriminative model, and (ii) the
    discriminative model is able to exploit the unlabelled data selected by the
    generative model to achieve a better estimation for document ranking. Our
    experimental results have demonstrated significant performance gains as much as
    23.96% on Precision@5 and 15.50% on MAP over strong baselines in a variety of
    applications including web search, item recommendation, and question answering.
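
    A highly simplified single-query sketch of the two objectives (our paraphrase
    of the minimax framework; the toy score vectors, sample size, and softplus
    reward shaping are illustrative choices, not the paper's exact formulation):

    ```python
    import torch
    import torch.nn.functional as F

    # Toy stand-ins: scores over 100 candidate documents for a single query.
    s_D = torch.randn(100, requires_grad=True)   # discriminator scores
    s_G = torch.randn(100, requires_grad=True)   # generator scores
    true_docs = torch.tensor([3, 17])            # labelled relevant documents

    # The generator samples documents it deems relevant.
    p_G = F.softmax(s_G, dim=0)
    sampled = torch.multinomial(p_G, 5, replacement=True)

    # Discriminator: accept true documents, reject generated ones.
    d_loss = F.softplus(-s_D[true_docs]).mean() + F.softplus(s_D[sampled]).mean()

    # Generator (REINFORCE): raise the probability of documents that the
    # current discriminator rewards, reward = log(1 + exp(score)).
    reward = F.softplus(s_D[sampled]).detach()
    g_loss = -(torch.log(p_G[sampled]) * reward).mean()

    d_loss.backward()
    g_loss.backward()  # then step the two optimizers on their respective models
    ```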

    Twitter Hashtag Recommendation using Matrix Factorization

    Hamidreza Alvari
    Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR)

    Twitter, one of the biggest and most popular microblogging websites, has
    evolved into a powerful communication platform which allows millions of active
    users to generate huge volume of microposts and queries on a daily basis. To
    accommodate effective categorization and easy search, users are allowed to make
    use of hashtags, keywords or phrases prefixed by the hash character, to
    categorize and summarize their posts. However, valid hashtags are not
    restricted and thus are created in a free and heterogeneous style, increasing the difficulty of the
    task of tweet categorization. In this paper, we propose a low-rank weighted
    matrix factorization based method to recommend hashtags to the users solely
    based on their hashtag usage history and independent from their tweets’
    contents. We confirm using a two-sample t-test that users are more likely to
    adopt new hashtags similar to the ones they have previously adopted. In
    particular, we formulate the problem of hashtag recommendation into an
    optimization problem and incorporate hashtag correlation weight matrix into it
    to account for the similarity between different hashtags. We finally leverage
    widely used matrix factorization from recommender systems to solve the
    optimization problem by capturing the latent factors of users and hashtags.
    Empirical experiments demonstrate that our method is capable of properly
    recommending hashtags.
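
    A hedged sketch of the weighted low-rank factorization at the core of such a
    recommender; the toy adoption matrix R, the weight scheme W, and all
    hyper-parameters below are illustrative (the paper additionally derives the
    weights from hashtag-hashtag correlations):

        import numpy as np

        rng = np.random.default_rng(0)
        n_users, n_tags, k = 50, 30, 5
        R = (rng.random((n_users, n_tags)) < 0.1).astype(float)  # adoption history
        W = 1.0 + 4.0 * R                # observed adoptions weighted more heavily
        U = 0.1 * rng.standard_normal((n_users, k))
        V = 0.1 * rng.standard_normal((n_tags, k))
        lr, lam = 0.01, 0.1

        # minimize sum W * (R - U V^T)^2 + lam * (|U|^2 + |V|^2) by gradient steps
        for _ in range(500):
            E = W * (R - U @ V.T)
            U += lr * (E @ V - lam * U)
            V += lr * (E.T @ U - lam * V)

        scores = U @ V.T                 # rank unseen hashtags per user
        print(np.argsort(-scores[0])[:5])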

    Local Search Methods for Fast Near Neighbor Search

    Eric S. Tellez, Guillermo Ruiz, Edgar Chavez, Mario Graff
    Comments: extended version of SISAP’15 “Guillermo Ruiz, Edgar Chavez, Mario Graff, Eric S. Tellez: Finding Near Neighbors Through Local Search. SISAP 2015: 103-109”
    Subjects: Data Structures and Algorithms (cs.DS); Information Retrieval (cs.IR)

    Near neighbor search (NNS) has been traditionally addressed from an
    algorithmic perspective, that is, given a dataset and a distance function, the
    goal is to build a data structure along with discarding rules that quickly find
    the near neighbor of a query under the given distance function. In this
    manuscript, we take another approach. We restate NNS as a combinatorial
    optimization problem in which the goal is to minimize the distance from the
    query to the data set. This cost function is well defined, and its minimum is
    attained precisely at the nearest object to the query. Adopting this new view allows
    the use of a rich collection of optimization algorithms with a long tradition.
    We present three local search algorithms that solve the NNS problem from the
    combinatorial optimization point of view. Two of the algorithms can be
    described as the hyper-parameter optimization of two known indexing methods,
    simplifying their adoption for final users. The third method achieves excellent
    search tradeoffs among speed, memory, and quality of the approximation. For
    instance, for moderately large dimensions, our indexes can reach above 0.99
    recall while reviewing less than 1% of the database. We conducted extensive
    experimentation with five real-world standard benchmarks and five synthetic
    datasets to support our claims.
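
    For intuition, a minimal sketch of one such local-search solver: greedy
    descent over a brute-force k-NN graph. The graph construction, k, and the
    toy data are illustrative placeholders, not the paper's indexes:

        import numpy as np
        from scipy.spatial.distance import cdist

        rng = np.random.default_rng(0)
        X = rng.standard_normal((2000, 16))                  # toy database
        k = 10
        graph = np.argsort(cdist(X, X), axis=1)[:, 1:k + 1]  # brute-force k-NN graph

        def greedy_search(q, start):
            cur, cur_d = start, np.linalg.norm(X[start] - q)
            while True:                           # descend the cost d(q, .)
                cand = graph[cur]
                d = np.linalg.norm(X[cand] - q, axis=1)
                j = int(np.argmin(d))
                if d[j] >= cur_d:                 # local minimum of the cost
                    return cur, cur_d
                cur, cur_d = int(cand[j]), d[j]

        q = rng.standard_normal(16)
        approx, _ = greedy_search(q, start=int(rng.integers(len(X))))
        exact = int(np.argmin(np.linalg.norm(X - q, axis=1)))
        print(approx, exact)                      # often, but not always, equal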


    Computation and Language

    A Low Dimensionality Representation for Language Variety Identification

    Francisco Rangel, Marc Franco-Salvador, Paolo Rosso
    Journal-ref: CICLing – Computational Linguistics and Intelligent Text
    Processing, 2016
    Subjects: Computation and Language (cs.CL)

    Language variety identification aims at labelling texts in a native language
    (e.g. Spanish, Portuguese, English) with its specific variation (e.g.
    Argentina, Chile, Mexico, Peru, Spain; Brazil, Portugal; UK, US). In this work
    we propose a low dimensionality representation (LDR) to address this task with
    five different varieties of Spanish: Argentina, Chile, Mexico, Peru and Spain.
    We compare our LDR method with common state-of-the-art representations and show
    an increase in accuracy of ~35%. Furthermore, we compare LDR with two reference
    distributed representation models. Experimental results show competitive
    performance while dramatically reducing the dimensionality to only 6 features
    per variety, which also increases suitability for big data. Additionally, we analyse
    the behaviour of the employed machine learning algorithms and the most
    discriminating features. Finally, we employ an alternative dataset to test the
    robustness of our low dimensionality representation with another set of similar
    languages.

    The Importance of Automatic Syntactic Features in Vietnamese Named Entity Recognition

    Thai-Hoang Pham, Phuong Le-Hong
    Comments: 7 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:1705.04044
    Subjects: Computation and Language (cs.CL)

    This paper presents a state-of-the-art system for Vietnamese Named Entity
    Recognition (NER). By incorporating automatic syntactic features with word
    embeddings as input for bidirectional Long Short-Term Memory (Bi-LSTM), our
    system, although simpler than some deep learning architectures, achieves a much
    better result for Vietnamese NER. The proposed method achieves an overall F1
    score of 92.05% on the test set of an evaluation campaign, organized in late
    2016 by the Vietnamese Language and Speech Processing (VLSP) community. Our
    named entity recognition system outperforms the best previous systems for
    Vietnamese NER by a large margin.

    Character-Based Text Classification using Top Down Semantic Model for Sentence Representation

    Zhenzhou Wu, Xin Zheng, Daniel Dahlmeier
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Despite the success of deep learning on many fronts, especially images and
    speech, its application to text classification is often still inferior to a
    simple linear SVM on n-gram TF-IDF representations, especially for smaller
    datasets. Deep learning tends to emphasize sentence-level semantics when
    learning a representation with models like recurrent or recursive neural
    networks; however, the success of the TF-IDF representation suggests that a
    bag-of-words type of representation has its own strengths. Taking advantage
    of both representations, we present a model known as TDSM (Top Down Semantic
    Model) for extracting a sentence representation that considers both the
    word-level semantics, by linearly combining the words with attention weights,
    and the sentence-level semantics, with a BiLSTM, and uses it for text
    classification. We apply the model to characters, and our results show that
    our model is better than all the other character-based and word-based
    convolutional neural network models of Zhang et al. (2015) across seven
    different datasets, with only 1% of their parameters. We also demonstrate that
    this model beats traditional linear models on TF-IDF vectors on small,
    polished datasets such as news articles, where deep learning models typically
    fall short.

    On the "Calligraphy" of Books

    Vanessa Q. Marinho, Henrique F. de Arruda, Thales S. Lima, Luciano F. Costa, Diego R. Amancio
    Comments: TextGraphs ACL 2017 (to appear)
    Subjects: Computation and Language (cs.CL)

    Authorship attribution is a natural language processing task that has been
    widely studied, often by considering small order statistics. In this paper, we
    explore a complex network approach to assign the authorship of texts based on
    their mesoscopic representation, in an attempt to capture the flow of the
    narrative. Indeed, as reported in this work, such an approach allowed the
    identification of the dominant narrative structure of the studied authors. This
    has been achieved thanks to the ability of the mesoscopic approach to take into
    account relationships between different, not necessarily adjacent, parts of the
    text, thereby capturing the story flow. The potential of the proposed
    approach has been illustrated through principal component analysis, a
    comparison with the chance baseline method, and network visualization. Such
    visualizations reveal individual characteristics of the authors, which can be
    understood as a kind of calligraphy.

    Emergent Language in a Multi-Modal, Multi-Step Referential Game

    Katrina Evtimova, Andrew Drozdov, Douwe Kiela, Kyunghyun Cho
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Multiagent Systems (cs.MA)

    Inspired by previous work on emergent language in referential games, we
    propose a novel multi-modal, multi-step referential game, where the sender and
    receiver have access to distinct modalities of an object, and their information
    exchange is bidirectional and of arbitrary duration. The multi-modal multi-step
    setting allows agents to develop an internal language significantly closer to
    natural language, in that they share a single set of messages, and that the
    length of the conversation may vary according to the difficulty of the task. We
    examine these properties empirically using a dataset consisting of images and
    textual descriptions of mammals, where the agents are tasked with identifying
    the correct object. Our experiments indicate that a robust and efficient
    communication protocol emerges, where gradual information exchange informs
    better predictions and higher communication bandwidth improves generalization.


    Distributed, Parallel, and Cluster Computing

    Distributed Matrix Factorization using Asynchronous Communication

    Tom Vander Aa, Imen Chakroun, Tom Haber
    Comments: arXiv admin note: substantial text overlap with arXiv:1705.04159
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Matrix factorization is a very common machine learning technique, used mainly
    in areas like recommender systems. Despite its high prediction accuracy and its
    ability to avoid over-fitting of the data, the Bayesian Probabilistic Matrix
    Factorization algorithm (BPMF) has not been widely used on large scale data
    because of its prohibitive cost. In this paper, we propose a distributed
    high-performance parallel implementation of BPMF using Gibbs sampling on shared
    and distributed architectures. We show that by using efficient load balancing
    with work stealing on a single node, and asynchronous communication in the
    distributed version, we beat state-of-the-art implementations.

    Optimizing Memory Efficiency for Convolution Kernels on Kepler GPUs

    Xiaoming Chen, Jianxu Chen, Danny Z. Chen, Xiaobo Sharon Hu
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

    Convolution is a fundamental operation in many applications, such as computer
    vision, natural language processing, image processing, etc. Recent successes of
    convolutional neural networks in various deep learning applications put even
    higher demand on fast convolution. The high computation throughput and memory
    bandwidth of graphics processing units (GPUs) make GPUs a natural choice for
    accelerating convolution operations. However, maximally exploiting the
    available memory bandwidth of GPUs for convolution is a challenging task. This
    paper introduces a general model to address the mismatch between the memory
    bank width of GPUs and computation data width of threads. Based on this model,
    we develop two convolution kernels, one for the general case and the other for
    a special case with one input channel. By carefully optimizing memory access
    patterns and computation patterns, we design a communication-optimized kernel
    for the special case and a communication-reduced kernel for the general case.
    Experimental data based on implementations on Kepler GPUs show that our kernels
    achieve 5.16X and 35.5% average performance improvement over the latest cuDNN
    library, for the special case and the general case, respectively.

    Polynomial Codes: an Optimal Design for High-Dimensional Coded Matrix Multiplication

    Qian Yu, Mohammad Ali Maddah-Ali, A. Salman Avestimehr
    Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC)

    We consider a large-scale matrix multiplication problem where the computation
    is carried out using a distributed system with a master node and multiple
    worker nodes, where each worker can store parts of the input matrices. We
    propose a computation strategy that leverages ideas from coding theory to
    design intermediate computations at the worker nodes, in order to efficiently
    deal with straggling workers. The proposed strategy, named polynomial codes,
    achieves the optimum recovery threshold, defined as the minimum number
    of workers that the master needs to wait for in order to compute the output.
    Furthermore, by leveraging the algebraic structure of polynomial codes, we can
    map the reconstruction problem of the final output to a polynomial
    interpolation problem, which can be solved efficiently. Polynomial codes
    provide order-wise improvement over the state of the art in terms of recovery
    threshold, and are also optimal in terms of several other metrics. Furthermore,
    we extend this code to distributed convolution and show its order-wise
    optimality.
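
    A toy end-to-end sketch of the encoding/interpolation idea; the block
    counts, evaluation points, and simulated stragglers are illustrative, and a
    real system would use fast polynomial interpolation rather than the dense
    solve below:

        import numpy as np

        rng = np.random.default_rng(0)
        m, n = 2, 2                          # row blocks of A, column blocks of B
        A, B = rng.standard_normal((4, 3)), rng.standard_normal((3, 4))
        A_blocks = np.split(A, m, axis=0)
        B_blocks = np.split(B, n, axis=1)

        workers = 6                          # any m*n = 4 results will suffice
        xs = np.arange(1.0, workers + 1.0)

        results = {}
        for i in range(workers):             # master encodes, worker i multiplies
            tA = sum(A_blocks[j] * xs[i] ** j for j in range(m))
            tB = sum(B_blocks[k] * xs[i] ** (k * m) for k in range(n))
            results[i] = tA @ tB             # degree-(m*n-1) polynomial at xs[i]

        alive = [0, 1, 3, 4]                 # pretend workers 2 and 5 straggle
        V = np.vander(xs[alive], m * n, increasing=True)
        evals = np.stack([results[i].ravel() for i in alive])
        coeffs = np.linalg.solve(V, evals)   # recover all blocks A_j @ B_k

        shape = results[0].shape
        C = np.block([[coeffs[j + k * m].reshape(shape) for k in range(n)]
                      for j in range(m)])
        assert np.allclose(C, A @ B)         # decoded product matches A @ B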

    Good Things Come in LogLog(n)-Sized Packages: Robustness with Small Quorums

    Mercy O. Jaiyeola, Kyle Patron, Jared Saia, Maxwell Young, Qian M. Zhou
    Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

    A popular technique for tolerating Byzantine faults in open distributed
    systems is to group machines into sets called quorums, each of which has an
    honest majority. These quorums are then used as basic building blocks to design
    systems that are robust to adversarial faults.

    Despite over a decade of active research, all current algorithms require
    quorum sizes of \(\Omega(\log n)\), where \(n\) is the number of machines in the
    network. This size is important since communication cost scales polynomially in
    the size of the quorum. Given the stubbornness of this \(\Omega(\log n)\)
    barrier, a natural question is whether better bounds are possible.

    In this paper, we demonstrate that it is possible to reduce quorum sizes to
    \(O(\log\log n)\), despite an adversary that controls a constant fraction of the
    computational resources in the network. In particular, we show that even with
    such small quorums, we can ensure that all but an \(o(1)\)-fraction of the
    machines can communicate with all but an \(o(1)\)-fraction of the machines in the
    network.


    Learning

    Forward-Backward Selection with Early Dropping

    Giorgos Borboudakis, Ioannis Tsamardinos
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Forward-backward selection is one of the most basic and commonly-used feature
    selection algorithms available. It is also general and conceptually applicable
    to many different types of data. In this paper, we propose a heuristic that
    significantly improves its running time, while preserving predictive accuracy.
    The idea is to temporarily discard the variables that are conditionally
    independent of the outcome given the selected variable set. Depending on how
    those variables are reconsidered and reintroduced, this heuristic gives rise to
    a family of algorithms with increasingly stronger theoretical guarantees. In
    distributions that can be faithfully represented by Bayesian networks or
    maximal ancestral graphs, members of this algorithmic family are able to
    correctly identify the Markov blanket in the sample limit. In experiments we
    show that the proposed heuristic increases computational efficiency by about
    two orders of magnitude in high-dimensional problems, while selecting fewer
    variables and retaining predictive performance. Furthermore, we show that the
    proposed algorithm and feature selection with LASSO perform similarly when
    restricted to select the same number of variables, making the proposed
    algorithm an attractive alternative for problems where no (efficient) algorithm
    for LASSO exists.
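
    A hedged sketch of one forward pass with Early Dropping, using a correlation
    test against the residual as a stand-in for the conditional-independence
    test; the test, threshold, and stopping rule are simplified relative to the
    paper:

        import numpy as np
        from scipy import stats

        def forward_selection_early_dropping(X, y, alpha=0.05):
            # After each forward step, permanently discard candidates that look
            # conditionally independent of y given the current selection (here
            # tested via correlation with the residual of y on the selection).
            n, p = X.shape
            selected, remaining = [], list(range(p))
            while remaining:
                Xs = X[:, selected] if selected else np.ones((n, 1))
                beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
                resid = y - Xs @ beta
                pvals = {j: stats.pearsonr(X[:, j], resid)[1] for j in remaining}
                # Early Dropping: failed candidates leave the pool for good.
                remaining = [j for j in remaining if pvals[j] <= alpha]
                if not remaining:
                    break
                best = min(remaining, key=lambda j: pvals[j])
                selected.append(best)
                remaining.remove(best)
            return selected

        rng = np.random.default_rng(0)
        X = rng.standard_normal((500, 200))
        y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_normal(500)
        print(forward_selection_early_dropping(X, y))  # should contain 0 and 1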

    Generative Models of Visually Grounded Imagination

    Ramakrishna Vedantam, Ian Fischer, Jonathan Huang, Kevin Murphy
    Comments: 25 pages, 14 figures, Submitted to NIPS 2017
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Consider how easy it is for people to imagine what a “purple hippo” would
    look like, even though no such animal exists. If we instead said “purple hippo with
    wings”, they could just as easily create a different internal mental
    representation, to represent this more specific concept. To assess whether the
    person has correctly understood the concept, we can ask them to draw a few
    sketches, to illustrate their thoughts. We call the ability to map text
    descriptions of concepts to latent representations and then to images (or vice
    versa) visually grounded semantic imagination. We propose a latent variable
    model for images and attributes, based on variational auto-encoders, which can
    perform this task. Our method uses a novel training objective, and a novel
    product-of-experts inference network, which can handle partially specified
    (abstract) concepts in a principled and efficient way. We also propose a set of
    easy-to-compute evaluation metrics that capture our intuitive notions of what
    it means to have good imagination, namely correctness, coverage, and
    compositionality (the 3 C’s). Finally, we perform a detailed comparison (in
    terms of the 3 C’s) of our method with two existing joint image-attribute VAE
    methods (the JMVAE method of Suzuki et al. (2017) and the bi-VCCA method of
    Wang et al. (2016)) by applying them to two simple datasets based on MNIST,
    where it is easy to objectively evaluate performance in a controlled way.
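
    A minimal sketch of a product-of-experts posterior for diagonal Gaussians,
    the mechanism that lets partially specified concepts simply omit missing
    modalities; the function name and the unit-Gaussian prior expert are
    illustrative assumptions:

        import numpy as np

        def poe_posterior(mus, logvars):
            # Combine Gaussian experts by summing precisions; the prior N(0, I)
            # is one expert, so an empty list of experts returns the prior.
            prec = np.ones_like(mus[0]) if mus else np.ones(1)
            num = np.zeros_like(prec)
            for m, lv in zip(mus, logvars):
                p = np.exp(-lv)
                prec = prec + p
                num = num + m * p
            return num / prec, 1.0 / prec    # posterior mean and variance

        # e.g. image expert present, attribute expert missing:
        mu, var = poe_posterior([np.array([0.5, -1.0])], [np.array([0.0, 0.0])])
        print(mu, var)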

    Recurrent Estimation of Distributions

    Junier B. Oliva, Kumar Avinava Dubey, Barnabas Poczos, Eric Xing, Jeff Schneider
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    This paper presents the recurrent estimation of distributions (RED) for
    modeling real-valued data in a semiparametric fashion. RED models make two
    novel uses of recurrent neural networks (RNNs) for density estimation of
    general real-valued data. First, RNNs are used to transform input covariates
    into a latent space to better capture conditional dependencies in inputs.
    Then, an RNN is used to compute the conditional distributions of the latent
    covariates. The resulting model is efficient to train, compute, and sample
    from, whilst producing normalized pdfs. The effectiveness of RED is shown via
    several real-world data experiments. Our results show that RED models achieve a
    lower held-out negative log-likelihood than other neural network approaches
    across multiple dataset sizes and dimensionalities. Further context of the
    efficacy of RED is provided by considering anomaly detection tasks, where we
    also observe better performance over alternative models.

    Knowledge Base Completion: Baselines Strike Back

    Rudolf Kadlec, Ondrej Bajgar, Jan Kleindienst
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Many papers have been published on the knowledge base completion task in the
    past few years. Most of these introduce novel architectures for relation
    learning that are evaluated on standard datasets such as FB15k and WN18. This
    paper shows that the accuracy of almost all models published on the FB15k can
    be outperformed by an appropriately tuned baseline – our reimplementation of
    the DistMult model. Our findings cast doubt on the claim that the performance
    improvements of recent models are due to architectural changes as opposed to
    hyper-parameter tuning or different training objectives. This should prompt
    future research to re-consider how the performance of models is evaluated and
    reported.
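
    For reference, a minimal sketch of the DistMult scoring function that the
    tuned baseline reimplements: a triple (subject s, relation r, object o) is
    scored by a trilinear product of embeddings. The embedding sizes and random
    matrices are illustrative:

        import numpy as np

        def distmult_score(E, R, s, r, o):
            # trilinear dot product of subject, relation, and object embeddings
            return np.sum(E[s] * R[r] * E[o])

        rng = np.random.default_rng(0)
        E = rng.normal(size=(100, 50))   # 100 entities, dimension 50
        R = rng.normal(size=(20, 50))    # 20 relations
        print(distmult_score(E, R, s=3, r=7, o=42))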

    Deep Learning is Robust to Massive Label Noise

    David Rolnick, Andreas Veit, Serge Belongie, Nir Shavit
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Deep neural networks trained on large supervised datasets have led to
    impressive results in recent years. However, since well-annotated datasets can
    be prohibitively expensive and time-consuming to collect, recent work has
    explored the use of larger but noisy datasets that can be more easily obtained.
    In this paper, we investigate the behavior of deep neural networks on training
    sets with massively noisy labels. We show that successful learning is possible
    even with an essentially arbitrary amount of noise. For example, on MNIST we
    find that accuracy of above 90 percent is still attainable even when the
    dataset has been diluted with 100 noisy examples for each clean example. Such
    behavior holds across multiple patterns of label noise, even when noisy labels
    are biased towards confusing classes. Further, we show how the required dataset
    size for successful training increases with higher label noise. Finally, we
    present simple actionable techniques for improving learning in the regime of
    high label noise.
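
    A minimal sketch of the dilution protocol described above; the function name
    and array shapes are illustrative:

        import numpy as np

        def dilute(X, y, alpha, num_classes=10, seed=0):
            # append alpha noisy copies per clean example, labels drawn
            # uniformly at random over the classes
            rng = np.random.default_rng(seed)
            X_noisy = np.repeat(X, alpha, axis=0)
            y_noisy = rng.integers(0, num_classes, size=len(X_noisy))
            return np.concatenate([X, X_noisy]), np.concatenate([y, y_noisy])

        X, y = np.zeros((100, 784)), np.zeros(100, dtype=int)
        Xd, yd = dilute(X, y, alpha=100)
        print(Xd.shape)    # (10100, 784): 100 clean + 10000 noisy examples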

    Domain Adaptation with Randomized Multilinear Adversarial Networks

    Mingsheng Long, Zhangjie Cao, Jianmin Wang, Michael I. Jordan
    Comments: arXiv admin note: text overlap with arXiv:1605.06636
    Subjects: Learning (cs.LG)

    Adversarial learning has been successfully embedded into deep networks to
    learn transferable features for domain adaptation, which reduce distribution
    discrepancy between the source and target domains and improve generalization
    performance. Prior domain adversarial adaptation methods could not align
    complex multimode distributions since the discriminative structures and
    inter-layer interactions across multiple domain-specific layers have not been
    exploited for distribution alignment. In this paper, we present randomized
    multilinear adversarial networks (RMAN), which exploit multiple feature layers
    and the classifier layer based on a randomized multilinear adversary to enable
    both deep and discriminative adversarial adaptation. The learning can be
    performed by stochastic gradient descent with the gradients computed by
    back-propagation in linear time. Experiments demonstrate that our models exceed
    the state-of-the-art results on standard domain adaptation datasets.

    Constrained Policy Optimization

    Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel
    Comments: Accepted to ICML 2017
    Subjects: Learning (cs.LG)

    For many applications of reinforcement learning it can be more convenient to
    specify both a reward function and constraints, rather than trying to design
    behavior through the reward function. For example, systems that physically
    interact with or around humans should satisfy safety constraints. Recent
    advances in policy search algorithms (Mnih et al., 2016, Schulman et al., 2015,
    Lillicrap et al., 2016, Levine et al., 2016) have enabled new capabilities in
    high-dimensional control, but do not consider the constrained setting.

    We propose Constrained Policy Optimization (CPO), the first general-purpose
    policy search algorithm for constrained reinforcement learning with guarantees
    for near-constraint satisfaction at each iteration. Our method allows us to
    train neural network policies for high-dimensional control while making
    guarantees about policy behavior all throughout training. Our guarantees are
    based on a new theoretical result, which is of independent interest: we prove a
    bound relating the expected returns of two policies to an average divergence
    between them. We demonstrate the effectiveness of our approach on simulated
    robot locomotion tasks where the agent must satisfy constraints motivated by
    safety.

    Quantum Low Entropy based Associative Reasoning or QLEAR Learning

    Marko V. Jankovic
    Subjects: Learning (cs.LG); Information Theory (cs.IT)

    In this paper, we propose a classification method based on a learning
    paradigm we call Quantum Low Entropy based Associative Reasoning, or QLEAR
    learning. The approach is based on the idea that classification can be
    understood as supervised clustering, where quantum entropy, in the context of
    a quantum probabilistic model, is used as a measure (external index) of the
    “natural structure” of the data. By using quantum entropy we do not make any
    assumption about the linear separability of the data to be classified. The
    basic idea is to find close neighbors of a query sample and then use the
    relative change in the quantum entropy as a measure of similarity between the
    newly arrived sample and the representatives of interest. In other words, the
    method is based on calculating the quantum entropy of the referent system and
    its relative change upon the addition of the newly arrived sample. The
    referent system consists of vectors that represent individual classes and
    that are the most similar, in the Euclidean distance sense, to the vector
    being analyzed. Here, we analyze the classification problem in the context of
    measuring similarities to prototype examples of categories. While nearest
    neighbor classifiers are natural in this setting, they suffer from the problem
    of high variance (in bias-variance decomposition) in the case of limited
    sampling. Alternatively, one could use machine learning techniques (like
    support vector machines) but they involve time-consuming optimization. Here we
    propose a hybrid of nearest neighbor and machine learning technique which deals
    naturally with the multi-class setting, has reasonable computational complexity
    both in training and at run time, and yields excellent results in practice.
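
    One hedged reading of the decision rule, as a sketch: build a
    density-matrix-like operator from the normalized referent vectors of each
    class and compare how much its von Neumann entropy changes when the query is
    added. The operator construction and the smallest-change rule below are
    illustrative assumptions, not the paper's exact procedure:

        import numpy as np

        def von_neumann_entropy(V):
            rho = (V.T @ V) / V.shape[0]      # density-matrix-like operator
            w = np.linalg.eigvalsh(rho)
            w = w[w > 1e-12]
            return float(-(w * np.log(w)).sum())

        def classify(q, class_sets):
            q = q / np.linalg.norm(q)
            delta = {c: abs(von_neumann_entropy(np.vstack([V, q]))
                            - von_neumann_entropy(V))
                     for c, V in class_sets.items()}
            return min(delta, key=delta.get)  # smallest entropy change wins

        def unit(V):
            return V / np.linalg.norm(V, axis=1, keepdims=True)

        rng = np.random.default_rng(0)
        class_sets = {0: unit(rng.standard_normal((20, 8))),
                      1: unit(rng.standard_normal((20, 8)) + 3.0)}
        print(classify(rng.standard_normal(8) + 3.0, class_sets))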

    Exploiting Restricted Boltzmann Machines and Deep Belief Networks in Compressed Sensing

    Luisa F. Polania, Kenneth E. Barner
    Comments: Accepted for publication at IEEE Transactions on Signal Processing
    Subjects: Learning (cs.LG)

    This paper proposes a compressed sensing (CS) scheme that exploits the representational power of
    restricted Boltzmann machines and deep learning architectures to model the
    prior distribution of the sparsity pattern of signals belonging to the same
    class. The determined probability distribution is then used in a maximum a
    posteriori (MAP) approach for the reconstruction. The parameters of the prior
    distribution are learned from training data. The motivation behind this
    approach is to model the higher-order statistical dependencies between the
    coefficients of the sparse representation, with the final goal of improving the
    reconstruction. The performance of the proposed method is validated on the
    Berkeley Segmentation Dataset and the MNIST Database of handwritten digits.

    Online to Offline Conversions and Adaptive Minibatch Sizes

    Kfir Y. Levy
    Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

    We present an approach towards convex optimization that relies on a novel
    scheme which converts online adaptive algorithms into offline methods. In the
    offline optimization setting, our derived methods are shown to obtain
    favourable adaptive guarantees which depend on the harmonic sum of the queried
    gradients. We further show that our methods implicitly adapt to the objective’s
    structure: in the smooth case fast convergence rates are ensured without any
    prior knowledge of the smoothness parameter, while still maintaining guarantees
    in the non-smooth setting. Our approach has a natural extension to the
    stochastic setting, resulting in a lazy version of SGD (stochastic gradient
    descent), where minibatches are chosen adaptively depending on the magnitude
    of the gradients, thus providing a principled approach towards choosing
    minibatch sizes.
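
    A hedged sketch of the adaptive-minibatch idea: enlarge the batch only when
    the averaged gradient is small relative to its expected noise level. The
    doubling rule and the noise reference g_ref are illustrative choices, not
    the paper's scheme:

        import numpy as np

        def lazy_sgd(grad_sample, x0, lr=0.1, steps=200, b0=1, g_ref=1.0,
                     rng=np.random.default_rng(0)):
            x = x0.copy()
            for _ in range(steps):
                b = b0
                g = np.mean([grad_sample(x, rng) for _ in range(b)], axis=0)
                # small gradient relative to noise -> take a larger batch
                while np.linalg.norm(g) < g_ref / np.sqrt(b) and b < 1024:
                    b *= 2
                    g = np.mean([grad_sample(x, rng) for _ in range(b)], axis=0)
                x -= lr * g
            return x

        # toy: minimize E[(x - z)^2] / 2 with z ~ N(1, 1); a gradient sample is x - z
        x = lazy_sgd(lambda x, rng: x - rng.normal(1.0, 1.0, size=x.shape),
                     x0=np.zeros(1))
        print(x)    # close to the optimum 1.0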

    Federated Multi-Task Learning

    Virginia Smith, Chao-Kai Chiang, Maziar Sanjabi, Ameet Talwalkar
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Federated learning poses new statistical and systems challenges in training
    machine learning models over distributed networks of devices. In this work, we
    show that multi-task learning is naturally suited to handle the statistical
    challenges of this setting, and propose a novel systems-aware optimization
    method, MOCHA, that is robust to practical systems issues. Our method and
    theory for the first time consider issues of high communication cost,
    stragglers, and fault tolerance for distributed multi-task learning. The
    resulting method achieves significant speedups compared to alternatives in the
    federated setting, as we demonstrate through simulations on real-world
    federated datasets.

    The Numerics of GANs

    Lars Mescheder, Sebastian Nowozin, Andreas Geiger
    Subjects: Learning (cs.LG)

    In this paper, we analyze the numerics of common algorithms for training
    Generative Adversarial Networks (GANs). Using the formalism of smooth
    two-player games we analyze the associated gradient vector field of GAN
    training objectives. Our findings suggest that the convergence of current
    algorithms suffers due to two factors: i) the presence of eigenvalues of the
    Jacobian of the gradient vector field with zero real part, and ii) eigenvalues
    with a large imaginary part. Using these findings, we design a new algorithm that
    overcomes some of these limitations and has better convergence properties.
    Experimentally, we demonstrate its superiority on training common GAN
    architectures and show convergence on GAN architectures that are known to be
    notoriously hard to train.
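
    The diagnosis can be reproduced on the classic bilinear game min_x max_y xy,
    a standard toy example (not from the paper's experiments): the
    simultaneous-gradient vector field is v(x, y) = (-y, x), whose Jacobian has
    purely imaginary eigenvalues, so plain simultaneous gradient steps rotate
    around the equilibrium instead of converging:

        import numpy as np

        J = np.array([[0.0, -1.0],
                      [1.0,  0.0]])        # Jacobian of v(x, y) = (-y, x)
        print(np.linalg.eigvals(J))        # [0.+1.j, 0.-1.j]: zero real part

        # Simultaneous gradient steps on f(x, y) = x * y spiral outward:
        z = np.array([1.0, 1.0])
        for _ in range(100):
            z = z + 0.1 * np.array([-z[1], z[0]])
        print(np.linalg.norm(z))           # grows, confirming non-convergence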

    Emergent Language in a Multi-Modal, Multi-Step Referential Game

    Katrina Evtimova, Andrew Drozdov, Douwe Kiela, Kyunghyun Cho
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Multiagent Systems (cs.MA)

    Inspired by previous work on emergent language in referential games, we
    propose a novel multi-modal, multi-step referential game, where the sender and
    receiver have access to distinct modalities of an object, and their information
    exchange is bidirectional and of arbitrary duration. The multi-modal multi-step
    setting allows agents to develop an internal language significantly closer to
    natural language, in that they share a single set of messages, and that the
    length of the conversation may vary according to the difficulty of the task. We
    examine these properties empirically using a dataset consisting of images and
    textual descriptions of mammals, where the agents are tasked with identifying
    the correct object. Our experiments indicate that a robust and efficient
    communication protocol emerges, where gradual information exchange informs
    better predictions and higher communication bandwidth improves generalization.

    Classification of Major Depressive Disorder via Multi-Site Weighted LASSO Model

    Dajiang Zhu, Brandalyn C. Riedel, Neda Jahanshad, Nynke A. Groenewold, Dan J. Stein, Ian H. Gotlib, Danai Dima, James H. Cole, Cynthia H.Y. Fu, Henrik Walter, Ilya M. Veer, Thomas Frodl, Lianne Schmaal, Dick J. Veltman, Paul M. Thompson
    Comments: Accepted by MICCAI 2017
    Subjects: Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Applications (stat.AP)

    Large-scale collaborative analysis of brain imaging data, in psychiatry and
    neurology, offers a new source of statistical power to discover features that
    boost accuracy in disease classification, differential diagnosis, and outcome
    prediction. However, due to data privacy regulations or limited accessibility
    to large datasets across the world, it is challenging to efficiently integrate
    distributed information. Here we propose a novel classification framework
    through multi-site weighted LASSO: each site performs an iterative weighted
    LASSO for feature selection separately. Within each iteration, the
    classification result and the selected features are collected to update the
    weighting parameters for each feature. This new weight is used to guide the
    LASSO process at the next iteration. Only the features that help to improve
    the classification accuracy are preserved. In tests on data from five sites
    (299 patients with major depressive disorder (MDD) and 258 normal controls),
    our method boosted classification accuracy for MDD by 4.9% on average. This
    result shows the potential of the proposed new strategy as an effective and
    practical collaborative platform for machine learning on large scale
    distributed imaging and biobank data.

    The Cramer Distance as a Solution to Biased Wasserstein Gradients

    Marc G. Bellemare, Ivo Danihelka, Will Dabney, Shakir Mohamed, Balaji Lakshminarayanan, Stephan Hoyer, Rémi Munos
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    The Wasserstein probability metric has received much attention from the
    machine learning community. Unlike the Kullback-Leibler divergence, which
    strictly measures change in probability, the Wasserstein metric reflects the
    underlying geometry between outcomes. The value of being sensitive to this
    geometry has been demonstrated, among others, in ordinal regression and
    generative modelling. In this paper we describe three natural properties of
    probability divergences that reflect requirements from machine learning: sum
    invariance, scale sensitivity, and unbiased sample gradients. The Wasserstein
    metric possesses the first two properties but, unlike the Kullback-Leibler
    divergence, does not possess the third. We provide empirical evidence
    suggesting that this is a serious issue in practice. Leveraging insights from
    probabilistic forecasting we propose an alternative to the Wasserstein metric,
    the Cramér distance. We show that the Cramér distance possesses all three
    desired properties, combining the best of the Wasserstein and Kullback-Leibler
    divergences. To illustrate the relevance of the Cramér distance in practice
    we design a new algorithm, the Cramér Generative Adversarial Network (GAN),
    and show that it performs significantly better than the related Wasserstein
    GAN.
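
    In one dimension the squared Cramér distance has an energy-distance form
    that is easy to estimate from samples; a minimal sketch (note that the
    plug-in V-statistic below is itself biased, which is exactly the kind of
    issue the paper analyzes for sample gradients):

        import numpy as np

        def cramer_distance_sq(x, y):
            # l_2^2(P, Q) = E|X-Y| - 0.5*E|X-X'| - 0.5*E|Y-Y'|  (1-D identity)
            xy = np.abs(x[:, None] - y[None, :]).mean()
            xx = np.abs(x[:, None] - x[None, :]).mean()
            yy = np.abs(y[:, None] - y[None, :]).mean()
            return xy - 0.5 * xx - 0.5 * yy

        rng = np.random.default_rng(0)
        print(cramer_distance_sq(rng.normal(0, 1, 1000), rng.normal(1, 1, 1000)))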

    Fast Regression with an \(\ell_\infty\) Guarantee

    Eric Price, Zhao Song, David P. Woodruff
    Comments: ICALP 2017
    Subjects: Data Structures and Algorithms (cs.DS); Learning (cs.LG)

    Sketching has emerged as a powerful technique for speeding up problems in
    numerical linear algebra, such as regression. In the overconstrained regression
    problem, one is given an \(n \times d\) matrix \(A\), with \(n \gg d\), as well as an
    \(n \times 1\) vector \(b\), and one wants to find a vector \(\hat{x}\) so as to
    minimize the residual error \(\|Ax-b\|_2\). Using the sketch and solve paradigm,
    one first computes \(S \cdot A\) and \(S \cdot b\) for a randomly chosen matrix
    \(S\), then outputs \(x' = (SA)^{\dagger} Sb\) so as to minimize \(\|SAx' - Sb\|_2\).

    The sketch-and-solve paradigm gives a bound on \(\|x'-x^*\|_2\) when \(A\) is
    well-conditioned. Our main result is that, when \(S\) is the subsampled
    randomized Fourier/Hadamard transform, the error \(x' - x^*\) behaves as if it
    lies in a “random” direction within this bound: for any fixed direction
    \(a \in \mathbb{R}^d\), we have with \(1 - d^{-c}\) probability that

    \[
    \langle a, x'-x^* \rangle \lesssim \frac{\|a\|_2 \|x'-x^*\|_2}{d^{\frac{1}{2}-\gamma}}, \quad (1)
    \]

    where \(c, \gamma > 0\) are arbitrary constants.

    This implies \(\|x'-x^*\|_{\infty}\) is a factor \(d^{\frac{1}{2}-\gamma}\)
    smaller than \(\|x'-x^*\|_2\). It also gives a better bound on the generalization
    of \(x'\) to new examples: if rows of \(A\) correspond to examples and columns to
    features, then our result gives a better bound for the error introduced by
    sketch-and-solve when classifying fresh examples. We show that not all
    oblivious subspace embeddings \(S\) satisfy these properties. In particular, we
    give counterexamples showing that matrices based on Count-Sketch or leverage
    score sampling do not satisfy these properties.

    We also provide lower bounds, both on how small \(\|x'-x^*\|_2\) can be, and
    for our new guarantee (1), showing that the subsampled randomized
    Fourier/Hadamard transform is nearly optimal.
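
    A hedged demo of sketch-and-solve with a subsampled randomized Hadamard
    transform; sizes are toy, n must be a power of two here, and the dense
    Hadamard matrix is formed only for clarity (the point of the transform is
    its fast O(n log n) application):

        import numpy as np
        from scipy.linalg import hadamard

        rng = np.random.default_rng(0)
        n, d, m = 1024, 20, 200                   # m = sketch size
        A, b = rng.normal(size=(n, d)), rng.normal(size=n)

        D = rng.choice([-1.0, 1.0], size=n)       # random signs
        H = hadamard(n) / np.sqrt(n)              # orthonormal Hadamard matrix
        rows = rng.choice(n, size=m, replace=False)
        SA = np.sqrt(n / m) * (H * D)[rows] @ A   # (H @ diag(D)) subsampled
        Sb = np.sqrt(n / m) * (H * D)[rows] @ b

        x_sketch, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
        x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
        # the infinity-norm error should be much smaller than the 2-norm error
        print(np.linalg.norm(x_sketch - x_star, np.inf),
              np.linalg.norm(x_sketch - x_star))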

    Multi-Labelled Value Networks for Computer Go

    Ti-Rong Wu, I-Chen Wu, Guan-Wun Chen, Ting-han Wei, Tung-Yi Lai, Hung-Chun Wu, Li-Cheng Lan
    Comments: This version was also submitted to IEEE TCIAIG on May 30, 2017
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    This paper proposes a novel value network architecture for the game of Go,
    called a multi-labelled (ML) value network. In the ML value
    network, different values (win rates) are trained simultaneously for different
    settings of komi, a compensation given to balance the initiative of playing
    first. The ML value network has three advantages, (a) it outputs values for
    different komi, (b) it supports dynamic komi, and (c) it lowers the mean
    squared error (MSE). This paper also proposes a new dynamic komi method to
    improve game-playing strength. This paper also performs experiments to
    demonstrate the merits of the architecture. First, the MSE of the ML value
    network is generally lower than the value network alone. Second, the program
    based on the ML value network wins by a rate of 67.6% against the program based
    on the value network alone. Third, the program with the proposed dynamic komi
    method significantly improves the playing strength over the baseline that does
    not use dynamic komi, especially for handicap games. To our knowledge, to
    date, no handicap games have been played openly by programs using value
    networks. This paper provides these programs with a useful approach to playing
    handicap games.

    Feature Squeezing Mitigates and Detects Carlini/Wagner Adversarial Examples

    Weilin Xu, David Evans, Yanjun Qi
    Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)

    Feature squeezing is a recently-introduced framework for mitigating and
    detecting adversarial examples. In previous work, we showed that it is
    effective against several earlier methods for generating adversarial examples.
    In this short note, we report on recent results showing that simple feature
    squeezing techniques also make deep learning models significantly more robust
    against the Carlini/Wagner attacks, which are the best known adversarial
    methods discovered to date.
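
    A minimal sketch of the detection side of the framework: compare the model's
    prediction on an input with its predictions on squeezed versions, and flag
    large disagreement. The squeezers follow the framework's bit-depth reduction
    and median smoothing; the threshold and L1 comparison are illustrative:

        import numpy as np
        from scipy.ndimage import median_filter

        def squeeze_bit_depth(x, bits=4):
            levels = 2 ** bits - 1
            return np.round(x * levels) / levels      # x assumed in [0, 1]

        def looks_adversarial(predict, x, threshold=1.0):
            p0 = predict(x)
            d1 = np.abs(p0 - predict(squeeze_bit_depth(x))).sum()
            d2 = np.abs(p0 - predict(median_filter(x, size=2))).sum()
            return max(d1, d2) > threshold            # large prediction shift

        # usage with any model returning a probability vector, e.g.:
        # looks_adversarial(model.predict_proba, image)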

    Grammatical Inference as a Satisfiability Modulo Theories Problem

    Rick Smetsers
    Comments: Submitted and selected for oral presentation at the LearnAut workshop at LICS 2017
    Subjects: Formal Languages and Automata Theory (cs.FL); Learning (cs.LG); Logic in Computer Science (cs.LO)

    The problem of learning a minimal consistent model from a set of labeled
    sequences of symbols is addressed from a satisfiability modulo theories
    perspective. We present two encodings for deterministic finite automata and
    extend one of these for Moore and Mealy machines. Our experimental results show
    that these encodings improve upon the state-of-the-art, and are useful in
    practice for learning small models.

    Approximation learning methods of Harmonic Mappings in relation to Hardy Spaces

    Zhulin Liu, C. L. Philip Chen
    Comments: 2016 3rd International Conference on Informative and Cybernetics for Computational Social Systems (ICCSS)
    Subjects: Numerical Analysis (math.NA); Learning (cs.LG)

    A new Hardy space approach to the Dirichlet type problem, based on Tikhonov
    regularization and reproducing kernel Hilbert spaces, is discussed in this
    paper; it turns out to be a typical extremal problem located on the upper-half
    complex plane. When considered in the Hardy space, the optimization operator of
    this problem is highly simplified and an efficient algorithm becomes possible.
    This is realized mainly with the help of the reproducing properties of the
    functions in the Hardy space of the upper-half complex plane, and a detailed
    algorithm is proposed. Moreover, harmonic mappings, which are significant
    geometric transformations, are commonly used in many applications such as
    image processing, since they describe energy-minimizing mappings between
    individual manifolds. In particular, when we focus on planar mappings between
    two Euclidean planar regions, the harmonic mappings exist and are unique,
    which is guaranteed solidly by the existence of harmonic functions. This
    property is attractive, and simulation results are shown in this paper to
    demonstrate its capability in applications such as planar shape distortion
    and surface registration.

    Optimizing Memory Efficiency for Convolution Kernels on Kepler GPUs

    Xiaoming Chen, Jianxu Chen, Danny Z. Chen, Xiaobo Sharon Hu
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

    Convolution is a fundamental operation in many applications, such as computer
    vision, natural language processing, image processing, etc. Recent successes of
    convolutional neural networks in various deep learning applications put even
    higher demand on fast convolution. The high computation throughput and memory
    bandwidth of graphics processing units (GPUs) make GPUs a natural choice for
    accelerating convolution operations. However, maximally exploiting the
    available memory bandwidth of GPUs for convolution is a challenging task. This
    paper introduces a general model to address the mismatch between the memory
    bank width of GPUs and computation data width of threads. Based on this model,
    we develop two convolution kernels, one for the general case and the other for
    a special case with one input channel. By carefully optimizing memory access
    patterns and computation patterns, we design a communication-optimized kernel
    for the special case and a communication-reduced kernel for the general case.
    Experimental data based on implementations on Kepler GPUs show that our kernels
    achieve 5.16X and 35.5% average performance improvement over the latest cuDNN
    library, for the special case and the general case, respectively.

    Character-Based Text Classification using Top Down Semantic Model for Sentence Representation

    Zhenzhou Wu, Xin Zheng, Daniel Dahlmeier
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Despite the success of deep learning on many fronts, especially images and
    speech, its application to text classification is often still inferior to a
    simple linear SVM on n-gram TF-IDF representations, especially for smaller
    datasets. Deep learning tends to emphasize sentence-level semantics when
    learning a representation with models like recurrent or recursive neural
    networks; however, the success of the TF-IDF representation suggests that a
    bag-of-words type of representation has its own strengths. Taking advantage
    of both representations, we present a model known as TDSM (Top Down Semantic
    Model) for extracting a sentence representation that considers both the
    word-level semantics, by linearly combining the words with attention weights,
    and the sentence-level semantics, with a BiLSTM, and uses it for text
    classification. We apply the model to characters, and our results show that
    our model is better than all the other character-based and word-based
    convolutional neural network models of Zhang et al. (2015) across seven
    different datasets, with only 1% of their parameters. We also demonstrate that
    this model beats traditional linear models on TF-IDF vectors on small,
    polished datasets such as news articles, where deep learning models typically
    fall short.

    Multi-Focus Image Fusion Via Coupled Sparse Representation and Dictionary Learning

    Rui Gao, Sergiy A. Vorobyov
    Comments: 27 pages, 8 figures, 1 table
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We address the multi-focus image fusion problem, where multiple images
    captured with different focal settings are to be fused into an all-in-focus
    image of higher quality. Algorithms for this problem must account for the
    source image characteristics along with focused and blurred features. However,
    most sparsity-based approaches use a single dictionary in focused feature space
    to describe multi-focus images, and ignore the representations in blurred
    feature space. Here, we propose a multi-focus image fusion approach based on
    coupled sparse representation. The approach exploits the facts that (i) the
    patches in a given training set can be sparsely represented by a couple of
    overcomplete dictionaries related to the focused and blurred categories of
    images; and (ii) merging such representations leads to a more flexible and
    therefore better fusion strategy than the one based on just selecting the
    sparsest representation in the original image estimate. By jointly learning the
    coupled dictionary, we enforce the similarity of sparse representations in the
    focused and blurred feature spaces, and then introduce a fusion approach to
    combine these representations for generating an all-in-focus image. We also
    discuss the advantages of the fusion approach based on coupled sparse
    representation and present an efficient algorithm for learning the coupled
    dictionary. Extensive experimental comparisons with state-of-the-art
    multi-focus image fusion algorithms validate the effectiveness of the proposed
    approach.

    IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models

    Jun Wang, Lantao Yu, Weinan Zhang, Yu Gong, Yinghui Xu, Benyou Wang, Peng Zhang, Dell Zhang
    Comments: 10 pages
    Subjects: Information Retrieval (cs.IR); Learning (cs.LG)

    This paper provides a unified account of two schools of thinking in
    information retrieval modelling: the generative retrieval focusing on
    predicting relevant documents given a query, and the discriminative retrieval
    focusing on predicting relevancy given a query-document pair. We propose a
    game-theoretic minimax framework to iteratively optimise both models. On one hand, the
    discriminative model, aiming to mine signals from labelled and unlabelled data,
    provides guidance to train the generative model towards fitting the underlying
    relevance distribution over documents given the query. On the other hand, the
    generative model, acting as an attacker to the current discriminative model,
    generates difficult examples for the discriminative model in an adversarial way
    by minimising its discrimination objective. With the competition between these
    two models, we show that the unified framework takes advantage of both schools
    of thinking: (i) the generative model learns to fit the relevance distribution
    over documents via the signals from the discriminative model, and (ii) the
    discriminative model is able to exploit the unlabelled data selected by the
    generative model to achieve a better estimation for document ranking. Our
    experimental results demonstrate significant performance gains of as much as
    23.96% on Precision@5 and 15.50% on MAP over strong baselines in a variety of
    applications including web search, item recommendation, and question answering.

    Implications of Decentralized Q-learning Resource Allocation in Wireless Networks

    Francesc Wilhelmi, Boris Bellalta, Cristina Cano, Anders Jonsson
    Comments: Conference
    Subjects: Networking and Internet Architecture (cs.NI); Learning (cs.LG)

    Reinforcement Learning is gaining attention by the wireless networking
    community due to its potential to learn good-performing configurations only
    from the observed results. In this work we propose a stateless variation of
    Q-learning, which we apply to exploit spatial reuse in a wireless network. In
    particular, we allow networks to modify both their transmission power and the
    channel used solely based on the experienced throughput. We concentrate on a
    completely decentralized scenario in which no information about neighbouring
    nodes is available to the learners. Our results show that although the
    algorithm is able to find the best-performing actions to enhance aggregate
    throughput, there is high variability in the throughput experienced by the
    individual networks. We identify the cause of this variability as the
    adversarial setting of our setup, in which the most played actions provide
    intermittent good/poor performance depending on the neighbouring decisions. We
    also evaluate the effect of the intrinsic learning parameters of the algorithm
    on this variability.
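
    A hedged sketch of the stateless Q-learning variant: one Q-value per
    (channel, power) action, epsilon-greedy selection, and observed throughput
    as the reward. The action set, toy environment, and parameters are
    illustrative, not the paper's setup:

        import numpy as np

        rng = np.random.default_rng(0)
        actions = [(ch, pw) for ch in range(4) for pw in (10, 20)]  # 8 actions
        Q = np.zeros(len(actions))
        alpha, gamma, eps = 0.1, 0.95, 0.1

        def run(observe_throughput, episodes=2000):
            for _ in range(episodes):
                a = rng.integers(len(Q)) if rng.random() < eps else int(np.argmax(Q))
                r = observe_throughput(actions[a])              # environment feedback
                Q[a] += alpha * (r + gamma * np.max(Q) - Q[a])  # stateless update

        def throughput(action):          # toy: channel 2 at power 20 is best
            ch, pw = action
            return pw * (1.5 if ch == 2 else 1.0) + rng.normal(0.0, 1.0)

        run(throughput)
        print(actions[int(np.argmax(Q))])  # likely (2, 20)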

    Zonotope hit-and-run for efficient sampling from projection DPPs

    Guillaume Gautier, Rémi Bardenet, Michal Valko
    Comments: 19 pages, accepted to ICML 2017
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Computation (stat.CO)

    Determinantal point processes (DPPs) are distributions over sets of items
    that model diversity using kernels. Their applications in machine learning
    include summary extraction and recommendation systems. Yet, the cost of
    sampling from a DPP is prohibitive in large-scale applications, which has
    triggered an effort towards efficient approximate samplers. We build a novel
    MCMC sampler that combines ideas from combinatorial geometry, linear
    programming, and Monte Carlo methods to sample from DPPs with a fixed sample
    cardinality, also called projection DPPs. Our sampler leverages the ability of
    the hit-and-run MCMC kernel to efficiently move across convex bodies. Previous
    theoretical results yield a fast mixing time of our chain when targeting a
    distribution that is close to a projection DPP, but not a DPP in general. Our
    empirical results demonstrate that this extends to sampling projection DPPs,
    i.e., our sampler is more sample-efficient than previous approaches.

    Joint auto-encoders: a flexible multi-task learning framework

    Baruch Epstein, Ron Meir, Tomer Michaeli
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    The incorporation of prior knowledge into learning is essential in achieving
    good performance based on small noisy samples. Such knowledge is often
    incorporated through the availability of related data arising from domains and
    tasks similar to the one of current interest. Ideally one would like to allow
    both the data for the current task and for previous related tasks to
    self-organize the learning system in such a way that commonalities and
    differences between the tasks are learned in a data-driven fashion. We develop
    a framework for learning multiple tasks simultaneously, based on sharing
    features that are common to all tasks, achieved through the use of a modular
    deep feedforward neural network consisting of shared branches, dealing with the
    common features of all tasks, and private branches, learning the specific
    unique aspects of each task. Once an appropriate weight sharing architecture
    has been established, learning takes place through standard algorithms for
    feedforward networks, e.g., stochastic gradient descent and its variations. The
    method deals with domain adaptation and multi-task learning in a unified
    fashion, and can easily deal with data arising from different types of sources.
    Numerical experiments demonstrate the effectiveness of learning in domain
    adaptation and transfer learning setups, and provide evidence for the flexible
    and task-oriented representations arising in the network.

    Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets

    Karol Hausman, Yevgen Chebotar, Stefan Schaal, Gaurav Sukhatme, Joseph Lim
    Subjects: Robotics (cs.RO); Learning (cs.LG)

    Imitation learning has traditionally been applied to learn a single task from
    demonstrations thereof. The requirement of structured and isolated
    demonstrations limits the scalability of imitation learning approaches as they
    are difficult to apply to real-world scenarios, where robots have to be able to
    execute a multitude of tasks. In this paper, we propose a multi-modal imitation
    learning framework that is able to segment and imitate skills from unlabelled
    and unstructured demonstrations by learning skill segmentation and imitation
    learning jointly. The extensive simulation results indicate that our method can
    efficiently separate the demonstrations into individual skills and learn to
    imitate them using a single multi-modal policy. The video of our experiments is
    available at this http URL

    Iterative Machine Teaching

    Weiyang Liu, Bo Dai, James M. Rehg, Le Song
    Comments: To appear in ICML 2017
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    In this paper, we consider the problem of machine teaching, the inverse
    problem of machine learning. Different from traditional machine teaching which
    views the learners as batch algorithms, we study a new paradigm where the
    learner uses an iterative algorithm and a teacher can feed examples
    sequentially and intelligently based on the current performance of the learner.
    We show that the teaching complexity in the iterative case is very different
    from that in the batch case. Instead of constructing a minimal training set for
    learners, our iterative machine teaching focuses on achieving fast convergence
    in the learner model. Depending on the level of information the teacher has
    from the learner model, we design teaching algorithms which can provably reduce
    the number of teaching examples and achieve faster convergence than learning
    without teachers. We also validate our theoretical findings with extensive
    experiments on different data distributions and real image datasets.
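
    A hedged sketch of an omniscient teacher for a linear least-squares learner:
    at every step the teacher feeds the pool example whose single SGD step
    brings the learner's weights closest to the target w_star. The pool, loss,
    and greedy criterion are illustrative of the paradigm, not the paper's exact
    algorithms:

        import numpy as np

        def teach(w, w_star, X, y, lr=0.1, steps=100):
            for _ in range(steps):
                grads = (X @ w - y)[:, None] * X       # squared-loss gradients
                cand = w[None, :] - lr * grads         # learner state per example
                best = int(np.argmin(np.linalg.norm(cand - w_star[None, :], axis=1)))
                w = cand[best]                         # teacher picks that example
            return w

        rng = np.random.default_rng(0)
        X = rng.standard_normal((200, 5))
        w_star = rng.standard_normal(5)
        y = X @ w_star
        w = teach(np.zeros(5), w_star, X, y)
        print(np.linalg.norm(w - w_star))              # shrinks quickly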

    Learning End-to-end Multimodal Sensor Policies for Autonomous Navigation

    Guan-Horng Liu, Avinash Siravuru, Sai Prabhakar, Manuela Veloso, George Kantor
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Sensor fusion is indispensable to improve accuracy and robustness in an
    autonomous navigation setting. However, in the space of end-to-end sensorimotor
    control, this multimodal outlook has received limited attention. In this work,
    we propose a novel stochastic regularization technique, called Sensor Dropout,
    to robustify multimodal sensor policy learning outcomes. We also introduce an
    auxiliary loss on the policy network, alongside the standard DRL loss, that
    helps reduce the action variations of the multimodal sensor policy. Through empirical
    testing we demonstrate that our proposed policy can 1) operate with minimal
    performance drop in noisy environments, 2) remain functional even in the face
    of a sensor subset failure. Finally, through the visualization of gradients, we
    show that the learned policies are conditioned on the same latent input
    distribution despite having multiple sensory observation spaces, a hallmark
    of true sensor-fusion. This efficacy of a multimodal policy is shown through
    simulations on TORCS, a popular open-source racing car game. A demo video can
    be seen here: this https URL

    Solving the Conjugacy Decision Problem via Machine Learning

    Jonathan Gryak, Robert M. Haralick, Delaram Kahrobaei
    Subjects: Group Theory (math.GR); Learning (cs.LG)

    Machine learning and pattern recognition techniques have been successfully
    applied to algorithmic problems in free groups. In this paper, we seek to
    extend these techniques to finitely presented non-free groups, with a
    particular emphasis on polycyclic and metabelian groups that are of interest to
    non-commutative cryptography.

    As a prototypical example, we utilize supervised learning methods to
    construct classifiers that can solve the conjugacy decision problem, i.e.,
    determine whether or not a pair of elements from a specified group are
    conjugate. The accuracies of classifiers created using decision trees, random
    forests, and N-tuple neural network models are evaluated for several non-free
    groups. The very high accuracy of these classifiers suggests an underlying
    mathematical relationship with respect to conjugacy in the tested groups.
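
    The abstract does not specify the feature extraction; the following hedged
    sketch only shows the supervised shape of the task, with random placeholder
    vectors standing in for features computed from normal forms of group
    element pairs (generating real polycyclic group data requires a CAS):

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Placeholder setup: features would encode a pair (u, v) of group
    # elements; label is 1 if u and v are conjugate, else 0. The random
    # stand-ins below only demonstrate the pipeline, not the result.

    rng = np.random.default_rng(1)
    features = rng.normal(size=(1000, 32))    # placeholder pair features
    labels = rng.integers(0, 2, size=1000)    # placeholder conjugacy labels

    X_tr, X_te, y_tr, y_te = train_test_split(features, labels, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))  # ~0.5 on random data
    ```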

    Gradient Descent Can Take Exponential Time to Escape Saddle Points

    Simon S. Du, Chi Jin, Jason D. Lee, Michael I. Jordan, Barnabas Poczos, Aarti Singh
    Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)

    Although gradient descent (GD) almost always escapes saddle points
    asymptotically [Lee et al., 2016], this paper shows that even with fairly
    natural random initialization schemes and non-pathological functions, GD can be
    significantly slowed down by saddle points, taking exponential time to escape.
    On the other hand, gradient descent with perturbations [Ge et al., 2015, Jin et
    al., 2017] is not slowed down by saddle points – it can find an approximate
    local minimizer in polynomial time. This result implies that GD is inherently
    slower than perturbed GD, and justifies the importance of adding perturbations
    for efficient non-convex optimization. While our focus is theoretical, we also
    present experiments that illustrate our theoretical findings.
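
    A toy illustration of the phenomenon (not the paper's exponential-time
    construction): on f(x, y) = x^2/2 + y^4/4 - y^2/2, which has a saddle at
    the origin and minima at (0, +/-1), plain GD started almost exactly on the
    saddle's stable manifold lingers for a long time, while an occasional small
    perturbation lets it escape far sooner.

    ```python
    import numpy as np

    def grad(p):
        x, y = p
        return np.array([x, y**3 - y])

    def run(perturb, steps=4000, eta=0.05, seed=0):
        rng = np.random.default_rng(seed)
        p = np.array([1.0, 1e-30])        # nearly on the stable manifold
        for t in range(steps):
            g = grad(p)
            if perturb and np.linalg.norm(g) < 1e-3:
                g = g + rng.normal(scale=1e-2, size=2)  # small perturbation
            p = p - eta * g
            if abs(abs(p[1]) - 1.0) < 1e-3:             # reached a minimum
                return t
        return steps

    print("plain GD steps to escape:    ", run(perturb=False))
    print("perturbed GD steps to escape:", run(perturb=True))
    ```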

    Distributed SAGA: Maintaining linear convergence rate with limited communication

    Clément Calauzènes, Nicolas Le Roux
    Subjects: Optimization and Control (math.OC); Learning (cs.LG)

    In recent years, variance-reducing stochastic methods have shown great
    practical performance, exhibiting linear convergence rate when other stochastic
    methods offered a sub-linear rate. However, as datasets grow ever bigger and
    clusters become widespread, the need for fast distribution methods is pressing.
    We propose here a distribution scheme for SAGA which maintains a linear
    convergence rate, even when communication between nodes is limited.
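
    For context, here is a minimal single-machine SAGA sketch for least
    squares; the paper's contribution, the communication-limited distribution
    scheme itself, is not reproduced here.

    ```python
    import numpy as np

    # SAGA keeps a table of the last gradient seen for each example and
    # corrects each stochastic step with it, which yields linear convergence.

    rng = np.random.default_rng(0)
    n, d, eta = 500, 10, 0.01
    A = rng.normal(size=(n, d))
    b = A @ rng.normal(size=d)

    w = np.zeros(d)
    table = np.zeros((n, d))                # stored per-example gradients
    table_mean = table.mean(axis=0)

    for t in range(50 * n):
        i = rng.integers(n)
        g_i = (A[i] @ w - b[i]) * A[i]      # fresh gradient of example i
        w = w - eta * (g_i - table[i] + table_mean)      # SAGA update
        table_mean = table_mean + (g_i - table[i]) / n   # keep mean in sync
        table[i] = g_i

    print("residual:", np.linalg.norm(A @ w - b))
    ```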

    Collaborative Deep Learning for Speech Enhancement: A Run-Time Model Selection Method Using Autoencoders

    Minje Kim
    Journal-ref: Proc. of the IEEE International Conference on Acoustics, Speech
    and Signal Processing (ICASSP), pp 76-80, March 2017
    Subjects: Sound (cs.SD); Learning (cs.LG)

    We show that a Modular Neural Network (MNN) can combine various speech
    enhancement modules, each of which is a Deep Neural Network (DNN) specialized
    on a particular enhancement job. Unlike an ordinary ensemble technique
    that averages variations in models, the proposed MNN selects the best
    module for the unseen test signal to produce a greedy ensemble. We see
    this as Collaborative Deep Learning (CDL), because it can reuse various
    already-trained DNN models without any further refinement. In the proposed
    MNN, selecting the best module at run time is challenging. To this end, we
    employ a speech AutoEncoder (AE) as an arbitrator, whose input and output
    are trained to be as similar as possible when its input is clean speech.
    The AE can therefore gauge the quality of each module-specific denoised
    result via its AE reconstruction error: a low error means that the module
    output is close to clean speech. We propose an MNN structure with various
    modules that are specialized in dealing with a specific noise type,
    gender, and input Signal-to-Noise Ratio (SNR) value, and empirically show
    that it almost always works better than an arbitrarily chosen DNN module
    and is sometimes as good as an
    oracle result.
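
    A hedged sketch of the run-time selection rule as we read it, with toy
    stand-ins for the specialized modules and the arbitrating autoencoder:

    ```python
    import numpy as np

    # Every module denoises the input; an arbitrating autoencoder (trained to
    # reconstruct clean speech well) scores each candidate by reconstruction
    # error, and the lowest error wins. The "modules" and "autoencoder" below
    # are toy stand-ins, not the paper's networks.

    def autoencoder(x):
        # Stand-in for a trained speech AE: near-identity on clean-like input.
        return np.clip(x, -1.0, 1.0)

    def select_module(noisy, modules):
        candidates = [m(noisy) for m in modules]
        errors = [np.mean((autoencoder(c) - c) ** 2) for c in candidates]
        best = int(np.argmin(errors))
        return best, candidates[best]

    modules = [lambda x: x * 0.5,        # e.g., tuned to one noise type
               lambda x: np.tanh(x)]     # e.g., tuned to another
    noisy = np.random.default_rng(0).normal(scale=2.0, size=100)
    idx, enhanced = select_module(noisy, modules)
    print("selected module:", idx)
    ```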

    Neural Embeddings of Graphs in Hyperbolic Space

    Benjamin Paul Chamberlain, James Clough, Marc Peter Deisenroth
    Comments: 7 pages, 5 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Neural embeddings have been used with great success in Natural Language
    Processing (NLP). They provide compact representations that encapsulate word
    similarity and attain state-of-the-art performance in a range of linguistic
    tasks. The success of neural embeddings has prompted significant amounts of
    research into applications in domains other than language. One such domain is
    graph-structured data, where embeddings of vertices can be learned that
    encapsulate vertex similarity and improve performance on tasks including edge
    prediction and vertex labelling. For both NLP and graph based tasks, embeddings
    have been learned in high-dimensional Euclidean spaces. However, recent work
    has shown that the appropriate isometric space for embedding complex networks
    is not the flat Euclidean space, but negatively curved, hyperbolic space. We
    present a new concept that exploits these recent insights and propose learning
    neural embeddings of graphs in hyperbolic space. We provide experimental
    evidence that embedding graphs in their natural geometry significantly improves
    performance on downstream tasks for several real-world public datasets.
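
    A minimal ingredient of such embeddings is the distance function of the
    Poincaré ball model (a standard choice; the paper's own parametrization
    may differ), under which distances blow up near the boundary, giving
    tree-like complex networks the room they need:

    ```python
    import numpy as np

    # Poincare ball distance:
    # d(u, v) = arcosh(1 + 2||u-v||^2 / ((1-||u||^2)(1-||v||^2)))

    def poincare_distance(u, v, eps=1e-9):
        uu = np.dot(u, u)
        vv = np.dot(v, v)
        duv = np.dot(u - v, u - v)
        arg = 1.0 + 2.0 * duv / max((1.0 - uu) * (1.0 - vv), eps)
        return np.arccosh(arg)

    center = np.zeros(2)
    near_boundary = np.array([0.999, 0.0])
    print(poincare_distance(center, near_boundary))  # large despite ||u-v|| < 1
    ```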

    Deep Learning for Ontology Reasoning

    Patrick Hohenecker, Thomas Lukasiewicz
    Comments: 9 pages
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    In this work, we present a novel approach to ontology reasoning that is based
    on deep learning rather than logic-based formal reasoning. To this end, we
    introduce a new model for statistical relational learning that is built upon
    deep recursive neural networks, and give experimental evidence that it can
    easily compete with, or even outperform, existing logic-based reasoners on the
    task of ontology reasoning. More precisely, we compared our implemented system
    with one of the best logic-based ontology reasoners at present, RDFox, on a
    number of large standard benchmark datasets, and found that our system attained
    high reasoning quality, while being up to two orders of magnitude faster.

    Fast learning rate of deep learning via a kernel perspective

    Taiji Suzuki
    Comments: 36 pages
    Subjects: Statistics Theory (math.ST); Learning (cs.LG); Machine Learning (stat.ML)

    We develop a new theoretical framework to analyze the generalization error
    of deep learning, and derive a new fast learning rate for two
    representative algorithms: empirical risk minimization and Bayesian deep
    learning. A series of theoretical analyses of deep learning has revealed
    its high expressive power and universal approximation capability. Although
    these analyses are highly nonparametric, existing generalization error
    analyses have been developed mainly for fixed-dimensional parametric
    models. To bridge this gap, we develop an infinite-dimensional model based
    on an integral form, as used in analyses of the universal approximation
    capability. This allows us to define a reproducing kernel Hilbert space
    corresponding to each layer. Our point of view is to treat the ordinary
    finite-dimensional deep neural network as a finite approximation of the
    infinite-dimensional one. The approximation error is evaluated by the
    degree of freedom of the reproducing kernel Hilbert space in each layer.
    To estimate a good finite-dimensional model, we consider both empirical
    risk minimization and Bayesian deep learning. We derive their
    generalization error bounds and show that a bias-variance trade-off
    appears in terms of the number of parameters of the finite-dimensional
    approximation. We show that the optimal width of the internal layers can
    be determined through the degree of freedom, and that the convergence rate
    can be faster than the (O(1/\sqrt{n})) rate shown in existing studies.


    Information Theory

    A method for constructing parity-check matrices of non-binary quasi-cyclic LDPC codes

    Stanislav Kruglik, Valeriya Potapova, Alexey Frolov
    Comments: submitted to WCC 2017
    Subjects: Information Theory (cs.IT)

    An algorithm for constructing parity-check matrices of non-binary
    quasi-cyclic low-density parity-check (NB QC-LDPC) codes is proposed. The
    algorithm finds short cycles in the base matrix and tries to eliminate them
    by selecting the circulants and the elements of GF(q). The algorithm first
    tries to eliminate the cycles with the smallest number of edges going
    outside the cycle. The efficiency of the algorithm is demonstrated by means
    of simulations. In particular, we show that NB QC-LDPC codes constructed
    with our algorithm lose less than 0.1 dB in comparison to the best NB LDPC
    codes.
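
    One ingredient of such constructions can be sketched concretely: checking
    whether a choice of circulant shifts creates length-4 cycles, using
    Fossorier's well-known condition. The paper's full algorithm additionally
    selects GF(q) elements to break surviving cycles, which the sketch below
    does not model.

    ```python
    import numpy as np
    from itertools import combinations

    # For circulant size Z, a 4-cycle through base-matrix positions
    # (i1,j1),(i1,j2),(i2,j2),(i2,j1) exists iff
    #   p[i1,j1] - p[i1,j2] + p[i2,j2] - p[i2,j1] == 0 (mod Z).

    def has_4_cycle(shifts, Z):
        rows, cols = shifts.shape
        for i1, i2 in combinations(range(rows), 2):
            for j1, j2 in combinations(range(cols), 2):
                if any(shifts[i, j] < 0 for i in (i1, i2) for j in (j1, j2)):
                    continue             # -1 marks an empty base position
                delta = (shifts[i1, j1] - shifts[i1, j2]
                         + shifts[i2, j2] - shifts[i2, j1])
                if delta % Z == 0:
                    return True
        return False

    shifts = np.array([[0, 1],
                       [2, 3]])
    print(has_4_cycle(shifts, Z=8))      # True: 0 - 1 + 3 - 2 = 0
    ```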

    Optical Communication in Space: Challenges and Mitigation Techniques

    Hemani Kaushal, Georges Kaddoum
    Comments: 41 pages, 13 Figures and 8 Tables. arXiv admin note: substantial text overlap with arXiv:1506.04836
    Journal-ref: IEEE Communications Surveys & Tutorials ( Volume: 19, Issue: 1,
    Firstquarter 2017 ), pp. 57-96, 2016
    Subjects: Information Theory (cs.IT)

    In recent years, free space optical (FSO) communication has gained
    significant importance owing to its unique features: large bandwidth,
    license-free spectrum, high data rate, easy and quick deployability, and
    low power and mass requirements. FSO communication uses the optical
    carrier in the near infrared band to establish either terrestrial links
    within the Earth's atmosphere, inter-satellite or deep space links, or
    ground-to-satellite and satellite-to-ground links. However, despite the
    great potential of FSO communication, its performance is limited by the
    adverse effects of the atmospheric channel, viz., absorption, scattering,
    and turbulence. This paper
    presents a comprehensive survey on various challenges faced by FSO
    communication system for ground-to-satellite or satellite-to-ground and
    inter-satellite links. It also provides details of various performance
    mitigation techniques in order to have high link availability and reliability.
    The first part of the paper will focus on various types of impairments that
    pose a serious challenge to the performance of optical communication system for
    ground-to-satellite or satellite-to-ground and inter-satellite links. The
    latter part of the paper will provide the reader with an exhaustive review of
    various techniques both at physical layer as well as at the other layers i.e.,
    link, network or transport layer to combat the adverse effects of the
    atmosphere. It also uniquely presents a recently developed technique using
    orbital angular momentum for utilizing the high capacity advantage of the
    optical carrier in case of space-based and near-Earth optical communication
    links. This survey provides the reader with comprehensive details on the use of
    space-based optical backhaul links in order to provide high-capacity and
    low-cost backhaul solutions.

    Near-Optimal Vector Linear Index Codes For Single Unicast Index Coding Problems with Symmetric Neighboring Interference

    Mahesh Babu Vaddi, B. Sundar Rajan
    Comments: 14 pages, 8 figures and 3 tables. arXiv admin note: substantial text overlap with arXiv:1705.05060, arXiv:1705.03192
    Subjects: Information Theory (cs.IT)

    A single unicast index coding problem (SUICP) with symmetric neighboring
    interference (SNI) has an equal number (K) of messages and receivers, the
    (k)th receiver (R_k) wanting the (k)th message (x_k) and having the
    side-information (\mathcal{K}_k = (\mathcal{I}_k \cup x_k)^c), where
    (\mathcal{I}_k = \{x_{k-U}, \dots, x_{k-2}, x_{k-1}\} \cup \{x_{k+1},
    x_{k+2}, \dots, x_{k+D}\}) is the interference, with (D) messages after
    and (U) messages before its desired message. Maleki, Cadambe and Jafar
    obtained the capacity of this single unicast index coding problem with
    symmetric neighboring interference (SUICP-SNI) for (K) tending to
    infinity, and Blasiak, Kleinberg and Lubetzky for the special case of
    (D = U = 1) with (K) finite. In our previous work, we proved the capacity
    of SUICP-SNI for arbitrary (K) and (D) with (U = \gcd(K, D+1) - 1). This
    paper deals with near-optimal linear code construction for SUICP-SNI with
    arbitrary (K), (U) and (D). For such settings, we define a set of 2-tuples
    such that for every ((a, b)) in that set the rate (D + 1 + \frac{a}{b}) is
    achieved by using vector linear index codes over every field. We prove
    that this set (\mathcal{S}) consists of the tuples ((a, b)) for which the
    rate of the constructed vector linear index codes is at most
    (\frac{K \bmod (D+1)}{\lfloor K/(D+1) \rfloor}) away from a known lower
    bound on the broadcast rate of SUICP-SNI. The three known results on the
    exact capacity of the SUICP-SNI are recovered as special cases of our
    results. Also, we give a low complexity decoding procedure for the
    proposed vector linear index codes for the SUICP-SNI.
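
    The side-information structure can be made concrete directly from the
    definitions above; a small helper (our own, for illustration):

    ```python
    # For a SUICP-SNI instance with K messages, D interfering messages after
    # and U before the desired one, build receiver R_k's interference set I_k
    # and side-information set K_k (message indices taken mod K).

    def interference(k, K, U, D):
        before = {(k - u) % K for u in range(1, U + 1)}
        after = {(k + d) % K for d in range(1, D + 1)}
        return before | after

    def side_information(k, K, U, D):
        return set(range(K)) - interference(k, K, U, D) - {k}

    K, U, D = 7, 1, 2
    for k in range(K):
        print(k, sorted(side_information(k, K, U, D)))
    ```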

    Universal secure rank-metric coding schemes with optimal communication overheads

    Umberto Martínez-Peñas
    Comments: 21 pages, LaTeX; parts of this paper have been accepted for presentation at the IEEE International Symposium on Information Theory, Aachen, Germany, June 2017
    Subjects: Information Theory (cs.IT)

    We study the problem of reducing the communication overhead from a noisy
    wire-tap channel or storage system where data is encoded as a matrix, when more
    columns (or their linear combinations) are available. We present its
    applications to reducing communication overheads in universal secure linear
    network coding and secure distributed storage with crisscross errors and
    erasures and in the presence of a wire-tapper. Our main contribution is a
    method to transform coding schemes based on linear rank-metric codes, with
    certain properties, to schemes with lower communication overheads. By applying
    this method to pairs of Gabidulin codes, we obtain coding schemes with optimal
    information rate with respect to their security and rank error correction
    capability, and with universally optimal communication overheads, when
    ( n \leq m ), where ( n ) and ( m ) are the number of columns and rows,
    respectively. Moreover, our method can be applied to other families of
    maximum rank distance codes when ( n > m ). The downside of the method is
    that it generally expands the packet length, but some practical instances
    come at no cost.

    Secret sharing on large girth graphs

    Laszlo Csirmaz, Peter Ligeti
    Subjects: Information Theory (cs.IT); Combinatorics (math.CO)

    We investigate graph based secret sharing schemes and their information
    ratio, also called complexity, measuring the maximal amount of information
    the vertices have to store. It was conjectured that in large girth graphs,
    where the interaction between far away nodes is restricted to a single
    path, this ratio is bounded. This conjecture was supported by several
    results, most notably by a result of Csirmaz and Ligeti saying that the
    complexity of graphs with girth at least six and no neighboring high
    degree vertices is strictly below 2. In this paper we refute the above
    conjecture. First, a family of (d)-regular graphs is defined iteratively
    such that the complexity of these graphs is the largest possible,
    ((d+1)/2), allowed by Stinson's bound. This part extends earlier results
    of van Dijk and Blundo et al., and uses the so-called entropy method.
    Second, using combinatorial arguments, we show that this family contains
    graphs with arbitrarily large girth. In particular, we obtain the
    following purely combinatorial result, which might be interesting on its
    own: there are (d)-regular graphs with arbitrarily large girth such that
    any fractional edge-cover by stars (or by complete multipartite graphs)
    must cover some vertex ((d+1)/2) times.

    Diversity Combining for RF Energy Harvesting

    Dogay Altinel, Gunes Karabulut Kurt
    Subjects: Information Theory (cs.IT)

    RF energy harvesting (RFEH) is a promising technology for energy requirements
    of wireless communication nodes. However, providing sufficient amount of energy
    to ensure self-sufficient devices based on RFEH may be challenging. In this
    paper, the use of diversity combining in RFEH systems is proposed to increase
    the amount of harvested energy. The power consumption of diversity combining
    process is also taken into account to analyze the net benefit of diversity
    combining. Performances of RFEH systems are investigated for selection
    combining (SC), equal gain combining (EGC), and maximal ratio combining (MRC)
    techniques. Simulations are conducted to compare the numerical results of SC,
    EGC, and MRC; the results show that although diversity combining
    techniques can improve the energy harvesting performance, the power
    consumption parameters are of critical importance in determining the most
    suitable technique.
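
    A Monte Carlo sketch of this comparison, using the textbook combiner
    output powers over Rayleigh fading branches; the conversion efficiency and
    per-combiner consumption figures are made-up illustration values, not the
    paper's parameters:

    ```python
    import numpy as np

    # SC takes the strongest branch, MRC sums branch powers, and EGC
    # coherently adds amplitudes; net benefit subtracts the combiner's own
    # (assumed) power consumption.

    rng = np.random.default_rng(0)
    N, trials = 4, 100_000
    P_in, efficiency = 1.0, 0.5                       # per-branch RF power
    consumption = {"SC": 0.05, "EGC": 0.10, "MRC": 0.20}   # assumed costs

    h = rng.rayleigh(scale=np.sqrt(0.5), size=(trials, N))  # unit-mean gains
    harvested = {
        "SC":  efficiency * P_in * (h ** 2).max(axis=1),
        "EGC": efficiency * P_in * h.sum(axis=1) ** 2 / N,
        "MRC": efficiency * P_in * (h ** 2).sum(axis=1),
    }
    for name, e in harvested.items():
        print(f"{name}: net harvested power = {e.mean() - consumption[name]:.3f}")
    ```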

    On the Fundamental Limits of Random Non-orthogonal Multiple Access in Cellular Massive IoT

    Mahyar Shirvanimoghaddam, Massimo Condoluci, Mischa Dohler, Sarah Johnson
    Comments: To appear in IEEE JSAC Special Issue on Non-Orthogonal Multiple Access for 5G Systems
    Subjects: Information Theory (cs.IT)

    Machine-to-machine (M2M) communication constitutes the paradigm at the
    basis of the Internet of Things (IoT) vision. M2M solutions allow billions
    of multi-role devices to communicate with each other or with the
    underlying data transport infrastructure without, or with minimal, human
    intervention. Current solutions for wireless transmission, originally
    designed for human-based applications, thus require a substantial shift to
    cope with the capacity issues of managing a huge number of M2M devices. In
    this paper, we consider multiple access techniques as promising solutions
    to support a large number of devices in cellular systems with limited
    radio resources. We focus on non-orthogonal multiple access (NOMA) where,
    with the aim of increasing channel efficiency, the devices share the same
    radio resources for their data transmission. This has been shown to
    provide optimal throughput from an information theoretic point of view. We
    consider a realistic system model and characterise the system performance
    in terms of throughput and energy efficiency in a NOMA scenario with a
    random packet arrival model, where we also derive the stability condition
    for the system to guarantee the performance.

    Polynomial Codes: an Optimal Design for High-Dimensional Coded Matrix Multiplication

    Qian Yu, Mohammad Ali Maddah-Ali, A. Salman Avestimehr
    Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC)

    We consider a large-scale matrix multiplication problem where the computation
    is carried out using a distributed system with a master node and multiple
    worker nodes, where each worker can store parts of the input matrices. We
    propose a computation strategy that leverages ideas from coding theory to
    design intermediate computations at the worker nodes, in order to efficiently
    deal with straggling workers. The proposed strategy, named polynomial
    codes, achieves the optimum recovery threshold, defined as the minimum number
    of workers that the master needs to wait for in order to compute the output.
    Furthermore, by leveraging the algebraic structure of polynomial codes, we can
    map the reconstruction problem of the final output to a polynomial
    interpolation problem, which can be solved efficiently. Polynomial codes
    provide order-wise improvement over the state of the art in terms of recovery
    threshold, and are also optimal in terms of several other metrics. Furthermore,
    we extend this code to distributed convolution and show its order-wise
    optimality.
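
    The encoding/decoding pattern can be sketched compactly. The version below
    runs over the reals for readability, whereas the actual construction works
    over a finite field, which avoids the conditioning issues of a real
    Vandermonde solve:

    ```python
    import numpy as np

    # Goal: C = A^T B, with A split column-wise into m blocks and B into n
    # blocks. Worker w returns tilde_A(x_w)^T tilde_B(x_w), an evaluation of a
    # matrix polynomial of degree m*n - 1 whose coefficient of x^(i + j*m) is
    # exactly A_i^T B_j; any m*n responses suffice to interpolate it.

    m, n = 2, 2
    rng = np.random.default_rng(0)
    A = rng.normal(size=(6, 4))           # 4 columns -> m = 2 blocks of 2
    B = rng.normal(size=(6, 6))           # 6 columns -> n = 2 blocks of 3
    A_blk = np.split(A, m, axis=1)
    B_blk = np.split(B, n, axis=1)

    def worker(x):
        At = sum(Ai * x**i for i, Ai in enumerate(A_blk))
        Bt = sum(Bj * x**(j * m) for j, Bj in enumerate(B_blk))
        return At.T @ Bt                  # evaluation of C(x)

    points = [1.0, 2.0, 3.0, 4.0, 5.0]    # 5 workers; 1 straggler tolerated
    fastest = points[:m * n]              # master uses the first m*n replies
    evals = np.stack([worker(x) for x in fastest])

    V = np.vander(np.array(fastest), m * n, increasing=True)
    coeffs = np.linalg.solve(V, evals.reshape(m * n, -1)).reshape(m * n, 2, 3)
    C_hat = np.block([[coeffs[i + j * m] for j in range(n)] for i in range(m)])
    print("correct:", np.allclose(C_hat, A.T @ B))
    ```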

    Deep-LMS for gigabit transmission over unshielded twisted pair cables

    Avi Zanko, Itsik Bergel, Amir Leshem
    Subjects: Information Theory (cs.IT)

    In this paper we propose a rapidly converging LMS algorithm for crosstalk
    cancellation. The architecture is similar to deep neural networks, where
    multiple layers are adapted sequentially. The application motivating this
    approach is gigabit rate transmission over unshielded twisted pairs using a
    vectored system. The crosstalk cancellation algorithm uses an adaptive
    non-diagonal preprocessing matrix prior to a conventional LMS crosstalk
    canceler. The update of the preprocessing matrix is inspired by deep neural
    networks. However, since most of the operations in the Deep-LMS algorithm are
    linear, we are capable of providing an exact convergence speed analysis. The
    role of the preprocessing matrix is to speed up the convergence of the
    conventional LMS crosstalk canceler and hence the convergence of the overall
    system. The Deep-LMS is important for crosstalk cancellation in the novel
    G.fast standard, where traditional LMS converges very slowly due to the
    ill-conditioned covariance matrix of the received signal at the extended
    bandwidth. Simulation results support our analysis and show significant
    reduction in convergence time compared to existing LMS variants.
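
    A simplified schematic of the cascaded adaptation we read in the abstract;
    the update rules below are plain two-layer LMS on a toy channel, a
    stand-in for the paper's Deep-LMS updates, not a reproduction of them:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    L, T = 4, 20000                          # twisted pairs, training samples
    H = np.eye(L) + 0.1 * rng.normal(size=(L, L))   # channel with crosstalk
    mu_w, mu_p = 0.01, 0.001                 # inner (LMS) and outer step sizes

    W = np.eye(L)                            # conventional LMS canceler
    P = np.eye(L)                            # adaptive preprocessing matrix
    for t in range(T):
        x = rng.choice([-1.0, 1.0], size=L)  # transmitted symbols
        y = H @ x                            # received signal with crosstalk
        z = P @ y                            # preprocessed signal
        e = x - W @ z                        # cancellation error
        W += mu_w * np.outer(e, z)           # inner-layer LMS update
        P += mu_p * np.outer(W.T @ e, y)     # outer layer, updated through W

    print("final mean |error|:", np.abs(x - W @ (P @ (H @ x))).mean())
    ```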

    (Quantum) Min-Entropy Resources

    Christopher Portmann
    Comments: 39+18 pages, 11 figures
    Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR); Information Theory (cs.IT)

    We model (interactive) resources that provide Alice with a string (X) and a
    guarantee that any Eve interacting with her interface of the resource obtains a
    (quantum) system (E) such that the conditional (smooth) min-entropy of (X)
    given (E) is lower bounded by some (k). This (abstract) resource specification
    encompasses any setting that results in the honest players holding such a
    string (or aborting). For example, it could be constructed from noisy
    channels, quantum key distribution (QKD), or a violation of Bell
    inequalities, all of which may be used to derive bounds on the min-entropy
    of (X).

    As a first application, we use this min-entropy resource to modularize key
    distribution (KD) schemes by dividing them in two parts, which may be analyzed
    separately. In the first part, a KD protocol constructs a min-entropy resource
    given the (physical) resources available in the specific setting considered. In
    the second, it distills secret key from the min-entropy resource—i.e., it
    constructs a secret key resource. We prove security for a generic key
    distillation protocol that may use any min-entropy resource. Since the notion
    of resource construction is composable—security of a composed protocol
    follows from the security of its parts— this reduces proving security of a KD
    protocol (e.g., QKD) to proving that it constructs a min-entropy resource.

    As a second application, we provide a composable security proof for the
    recent Fehr-Salvail protocol [EUROCRYPT 2017] that authenticates classical
    messages with a quantum message authentication code (Q-MAC), and recycles all
    the key upon successfully verifying the authenticity of the message. This
    protocol uses (and recycles) a non-uniform key, which we model as consuming and
    constructing a min-entropy resource.

    Quantum Low Entropy based Associative Reasoning or QLEAR Learning

    Marko V. Jankovic
    Subjects: Learning (cs.LG); Information Theory (cs.IT)

    In this paper, we propose a classification method based on a learning
    paradigm we call Quantum Low Entropy based Associative Reasoning, or QLEAR
    learning. The approach is based on the idea that classification can be
    understood as supervised clustering, where a quantum entropy, in the
    context of the quantum probabilistic model, is used as a “capturer”
    (measure, or external index) of the “natural structure” of the data. By
    using quantum entropy we make no assumption about the linear separability
    of the data to be classified. The basic idea is to find close neighbors of
    a query sample and then use the relative change in the quantum entropy as
    a measure of similarity between the newly arrived sample and the
    representatives of interest. In other words, the method is based on
    calculating the quantum entropy of the referent system and its relative
    change when the newly arrived sample is added. The referent system
    consists of vectors that represent individual classes and that are the
    most similar, in the Euclidean distance sense, to the vector being
    analyzed. Here, we analyze the classification problem in the context of
    measuring similarities to prototype examples of categories. While nearest
    neighbor classifiers are natural in this setting, they suffer from the problem
    of high variance (in bias-variance decomposition) in the case of limited
    sampling. Alternatively, one could use machine learning techniques (like
    support vector machines) but they involve time-consuming optimization. Here we
    propose a hybrid of nearest neighbor and machine learning technique which deals
    naturally with the multi-class setting, has reasonable computational complexity
    both in training and at run time, and yields excellent results in practice.
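
    A hedged sketch of this decision rule, where the density matrix is taken
    to be the normalized Gram matrix of the referent set (one plausible
    reading of the quantum probabilistic model; the paper's exact construction
    may differ):

    ```python
    import numpy as np

    # Per class: pick k nearest neighbors of the query as the referent set,
    # compute its von Neumann entropy, and assign the query to the class whose
    # entropy changes least (relatively) when the query is appended.

    def von_neumann_entropy(vectors):
        G = vectors @ vectors.T              # Gram matrix of the referent set
        rho = G / np.trace(G)                # density-matrix normalization
        ev = np.linalg.eigvalsh(rho)
        ev = ev[ev > 1e-12]
        return float(-(ev * np.log(ev)).sum())

    def qlear_predict(query, class_sets, k=5):
        scores = {}
        for label, data in class_sets.items():
            idx = np.argsort(np.linalg.norm(data - query, axis=1))[:k]
            ref = data[idx]
            h0 = von_neumann_entropy(ref)
            h1 = von_neumann_entropy(np.vstack([ref, query]))
            scores[label] = abs(h1 - h0) / max(h0, 1e-12)   # relative change
        return min(scores, key=scores.get)

    rng = np.random.default_rng(0)
    class_sets = {0: rng.normal(0.0, 1.0, size=(50, 3)),
                  1: rng.normal(3.0, 1.0, size=(50, 3))}
    print(qlear_predict(np.array([2.8, 3.1, 2.9]), class_sets))
    ```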

    Solving Almost all Systems of Random Quadratic Equations

    Gang Wang, Georgios B. Giannakis, Yousef Saad, Jie Chen
    Comments: 27 pages, 8 figures
    Subjects: Optimization and Control (math.OC); Information Theory (cs.IT); Machine Learning (stat.ML)

    This paper deals with finding an (n)-dimensional solution (x) to a system
    of quadratic equations of the form (y_i = |\langle a_i, x \rangle|^2) for
    (1 \le i \le m), which is also known as phase retrieval and is NP-hard in
    general. We put forth a novel procedure for minimizing the amplitude-based
    least-squares empirical loss, that starts with a weighted maximal
    correlation initialization obtainable with a few power or Lanczos
    iterations, followed by successive refinements based upon a sequence of
    iteratively reweighted (generalized) gradient iterations. The two stages
    (both the initialization and the gradient flow) distinguish themselves
    from prior contributions by the inclusion of a fresh (re)weighting
    regularization technique. The overall algorithm is conceptually simple,
    numerically scalable, and easy to implement. For certain random
    measurement models, the novel procedure is shown capable of finding the
    true solution (x) in time proportional to reading the data
    (\{(a_i; y_i)\}_{1 \le i \le m}). This holds with high probability and
    without extra assumptions on the signal (x) to be recovered, provided that
    the number (m) of equations is some constant (c > 0) times the number (n)
    of unknowns in the signal vector, namely, (m > cn). Empirically, the
    upshots of this contribution are: i) (almost) 100% perfect signal recovery
    in the high-dimensional (say, (n \ge 2{,}000)) regime given only an
    information-theoretic limit number of noiseless equations, namely
    (m = 2n - 1) in the real-valued Gaussian case; and ii) (nearly) optimal
    statistical accuracy in the presence of additive noise of bounded support.
    Finally, substantial numerical tests using both synthetic data and real
    images corroborate the markedly improved signal recovery performance and
    computational efficiency of our novel procedure relative to
    state-of-the-art approaches.
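
    A simplified sketch of the two-stage recipe, using plain amplitude-flow
    weights in place of the paper's (re)weighting scheme:

    ```python
    import numpy as np

    # Stage 1: spectral initialization via power iterations on a correlation
    # matrix built from the largest-amplitude measurements. Stage 2: gradient
    # steps on the amplitude loss (1/2m) * sum_i (|a_i' z| - sqrt(y_i))^2.

    rng = np.random.default_rng(0)
    n, m = 100, 500
    a = rng.normal(size=(m, n))
    x = rng.normal(size=n)
    psi = np.abs(a @ x)                     # amplitudes sqrt(y_i)

    top = np.argsort(psi)[-m // 6:]         # strongest measurements
    Y = (a[top] * psi[top, None] ** 2).T @ a[top] / m
    z = rng.normal(size=n)
    for _ in range(100):                    # power iterations
        z = Y @ z
        z /= np.linalg.norm(z)
    z *= np.sqrt(np.mean(psi ** 2))         # scale to the signal's norm

    mu = 0.6
    for _ in range(300):                    # amplitude-loss gradient steps
        az = a @ z
        grad = a.T @ (az - psi * np.sign(az)) / m
        z = z - mu * grad

    err = min(np.linalg.norm(z - x), np.linalg.norm(z + x)) / np.linalg.norm(x)
    print("relative error (up to sign):", err)
    ```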

    Emergent Language in a Multi-Modal, Multi-Step Referential Game

    Katrina Evtimova, Andrew Drozdov, Douwe Kiela, Kyunghyun Cho
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Multiagent Systems (cs.MA)

    Inspired by previous work on emergent language in referential games, we
    propose a novel multi-modal, multi-step referential game, where the sender and
    receiver have access to distinct modalities of an object, and their information
    exchange is bidirectional and of arbitrary duration. The multi-modal multi-step
    setting allows agents to develop an internal language significantly closer to
    natural language, in that they share a single set of messages, and that the
    length of the conversation may vary according to the difficulty of the task. We
    examine these properties empirically using a dataset consisting of images and
    textual descriptions of mammals, where the agents are tasked with identifying
    the correct object. Our experiments indicate that a robust and efficient
    communication protocol emerges, where gradual information exchange informs
    better predictions and higher communication bandwidth improves generalization.



