IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    arXiv Paper Daily: Fri, 9 Dec 2016

    我爱机器学习(52ml.net)发表于 2016-12-09 00:00:00
    love 0

    Neural and Evolutionary Computing

    Towards better decoding and language model integration in sequence to sequence models

    Jan Chorowski, Navdeep Jaitly
    Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    The recently proposed Sequence-to-Sequence (seq2seq) framework advocates
    replacing complex data processing pipelines, such as an entire automatic speech
    recognition system, with a single neural network trained in an end-to-end
    fashion. In this contribution, we analyse an attention-based seq2seq speech
    recognition system that directly transcribes recordings into characters. We
    observe two shortcomings: overconfidence in its predictions and a tendency to
    produce incomplete transcriptions when language models are used. We propose
    practical solutions to both problems achieving competitive speaker independent
    word error rates on the Wall Street Journal dataset: without separate language
    models we reach 10.6% WER, while together with a trigram language model, we
    reach 6.7% WER.

    Geometric Decomposition of Feed Forward Neural Networks

    Sven Cattell
    Comments: 11 pages, 2 figures
    Subjects: Neural and Evolutionary Computing (cs.NE); Combinatorics (math.CO)

    There have been several attempts to mathematically understand neural networks
    and many more from biological and computational perspectives. The field has
    exploded in the last decade, yet neural networks are still treated much like a
    black box. In this work we describe a structure that is inherent to a feed
    forward neural network. This will provide a framework for future work on neural
    networks to improve training algorithms, compute the homology of the network,
    and other applications. Our approach takes a more geometric point of view and
    is unlike other attempts to mathematically understand neural networks that rely
    on a functional perspective.

    Coupling Distributed and Symbolic Execution for Natural Language Queries

    Lili Mou, Zhengdong Lu, Hang Li, Zhi Jin
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    Building neural networks to query a knowledge base (a table) with natural
    language is an emerging research topic in NLP. The neural enquirer typically
    necessitates multiple steps of execution because of the compositionality of
    queries. In previous studies, researchers have developed either distributed
    enquirers or symbolic ones for table querying. The distributed enquirer is
    end-to-end learnable, but is weak in terms of execution efficiency and explicit
    interpretability. The symbolic enqurier, on the contrary, is efficient during
    execution; but it is very difficult to train especially at initial stages. In
    this paper, we propose to couple distributed and symbolic execution for natural
    language queries. The observation is that a fully distributed executor also
    exhibits meaningful, albeit imperfect, interpretation. We can thus pretrain the
    symbolic executor with the distributed one’s intermediate execution results in
    a step-by-step fashion. Experiments show that our approach significantly
    outperforms either the distributed or symbolic executor; moreover, we have
    recovered more than 80% execution sequences with only groundtruth denotations
    during training. In summary, the coupled neural enquirer takes advantages of
    both distributed and symbolic executors, and has high performance, high
    learning efficiency, high execution efficiency, and high interpretability.

    Learning in the Machine: Random Backpropagation and the Learning Channel

    Pierre Baldi, Peter Sadowski, Zhiqin Lu
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    Random backpropagation (RBP) is a variant of the backpropagation algorithm
    for training neural networks, where the transpose of the forward matrices are
    replaced by fixed random matrices in the calculation of the weight updates. It
    is remarkable both because of its effectiveness, in spite of using random
    matrices to communicate error information, and because it completely removes
    the taxing requirement of maintaining symmetric weights in a physical neural
    system. To better understand random backpropagation, we first connect it to the
    notions of local learning and the learning channel. Through this connection, we
    derive several alternatives to RBP, including skipped RBP (SRPB), adaptive RBP
    (ARBP), sparse RBP, and their combinations (e.g. ASRBP) and analyze their
    computational complexity. We then study their behavior through simulations
    using the MNIST and CIFAR-10 bechnmark datasets. These simulations show that
    most of these variants work robustly, almost as well as backpropagation, and
    that multiplication by the derivatives of the activation functions is
    important. As a follow-up, we study also the low-end of the number of bits
    required to communicate error information over the learning channel. We then
    provide partial intuitive explanations for some of the remarkable properties of
    RBP and its variations. Finally, we prove several mathematical results,
    including the convergence to fixed points of linear chains of arbitrary length,
    the convergence to fixed points of linear autoencoders with decorrelated data,
    the long-term existence of solutions for linear systems with a single hidden
    layer, and the convergence to fixed points of non-linear chains, when the
    derivative of the activation functions is included.

    Improving the Performance of Neural Machine Translation Involving Morphologically Rich Languages

    Krupakar Hans, R S Milton
    Comments: 21 pages, 11 figures, 2 tables, In Journal of Computational Linguistics
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    The advent of the attention mechanism in neural machine translation models
    has improved the performance of machine translation systems by enabling
    selective lookup into the source sentence. In this paper, the efficiencies of
    translation using bidirectional encoder attention decoder models were studied
    with respect to translation involving morphologically rich languages. The
    English – Tamil language pair was selected for this analysis. First, the use of
    Word2Vec embedding for both the English and Tamil words improved the
    translation results by 0.73 BLEU points over the baseline RNNSearch model with
    4.84 BLEU score. The use of morphological segmentation before word
    vectorization to split the morphologically rich Tamil words into their
    respective morphemes before the translation, caused a reduction in the target
    vocabulary size by a factor of 8. Also, this model (RNNMorph) improved the
    performance of neural machine translation by 7.05 BLEU points over the
    RNNSearch model used over the same corpus. Since the BLEU evaluation of the
    RNNMorph model might be unreliable due to an increase in the number of matching
    tokens per sentence, the performances of the translations were also compared by
    means of human evaluation metrics of adequacy, fluency and relative ranking.
    Further, the use of morphological segmentation also improved the efficacy of
    the attention mechanism.


    Computer Vision and Pattern Recognition

    3D Shape Segmentation with Projective Convolutional Networks

    Evangelos Kalogerakis, Melinos Averkiou, Subhransu Maji, Siddhartha Chaudhuri
    Comments: 11 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

    This paper introduces a deep architecture for segmenting 3D objects into
    their labeled semantic parts. Our architecture combines image-based Fully
    Convolutional Networks (FCNs) and surface-based Conditional Random Fields
    (CRFs) to yield coherent segmentations of 3D shapes. The image-based FCNs are
    used for efficient view-based reasoning about 3D object parts. Through a
    special projection layer, FCN outputs are effectively aggregated across
    multiple views and scales, then are projected onto the 3D object surfaces.
    Finally, a surface-based CRF combines the projected outputs with geometric
    consistency cues to yield coherent segmentations. The whole architecture
    (multi-view FCNs and CRF) is trained end-to-end. Our approach significantly
    outperforms the existing state-of-the-art methods in the currently largest
    segmentation benchmark (ShapeNet). Finally, we demonstrate promising
    segmentation results on noisy 3D shapes acquired from consumer-grade depth
    cameras.

    Feedback Neural Network for Weakly Supervised Geo-Semantic Segmentation

    Xianming Liu, Amy Zhang, Tobias Tiecke, Andreas Gros, Thomas S. Huang
    Comments: 9 pages, 4 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Learning from weakly-supervised data is one of the main challenges in machine
    learning and computer vision, especially for tasks such as image semantic
    segmentation where labeling is extremely expensive and subjective. In this
    paper, we propose a novel neural network architecture to perform
    weakly-supervised learning by suppressing irrelevant neuron activations. It
    localizes objects of interest by learning from image-level categorical labels
    in an end-to-end manner. We apply this algorithm to a practical challenge of
    transforming satellite images into a map of settlements and individual
    buildings. Experimental results show that the proposed algorithm achieves
    superior performance and efficiency when compared with various baseline models.

    A Maximum A Posteriori Estimation Framework for Robust High Dynamic Range Video Synthesis

    Yuelong Li, Chul Lee, Vishal Monga
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    High dynamic range (HDR) image synthesis from multiple low dynamic range
    (LDR) exposures continues to be actively researched. The extension to HDR video
    synthesis is a topic of significant current interest due to potential cost
    benefits. For HDR video, a stiff practical challenge presents itself in the
    form of accurate correspondence estimation of objects between video frames. In
    particular, loss of data resulting from poor exposures and varying intensity
    make conventional optical flow methods highly inaccurate. We avoid exact
    correspondence estimation by proposing a statistical approach via maximum a
    posterior (MAP) estimation, and under appropriate statistical assumptions and
    choice of priors and models, we reduce it to an optimization problem of solving
    for the foreground and background of the target frame. We obtain the background
    through rank minimization and estimate the foreground via a novel multiscale
    adaptive kernel regression technique, which implicitly captures local structure
    and temporal motion by solving an unconstrained optimization problem. Extensive
    experimental results on both real and synthetic datasets demonstrate that our
    algorithm is more capable of delivering high-quality HDR videos than current
    state-of-the-art methods, under both subjective and objective assessments.
    Furthermore, a thorough complexity analysis reveals that our algorithm achieves
    better complexity-performance trade-off than conventional methods.

    Joint Hand Detection and Rotation Estimation by Using CNN

    Xiaoming Deng, Ye Yuan, Yinda Zhang, Ping Tan, Liang Chang, Shuo Yang, Hongan Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Hand detection is essential for many hand related tasks, e.g. parsing hand
    pose, understanding gesture, which are extremely useful for robotics and
    human-computer interaction. However, hand detection in uncontrolled
    environments is challenging due to the flexibility of wrist joint and cluttered
    background. We propose a deep learning based approach which detects hands and
    calibrates in-plane rotation under supervision at the same time. To guarantee
    the recall, we propose a context aware proposal generation algorithm which
    significantly outperforms the selective search. We then design a convolutional
    neural network(CNN) which handles object rotation explicitly to jointly solve
    the object detection and rotation estimation tasks. Experiments show that our
    method achieves better results than state-of-the-art detection models on
    widely-used benchmarks such as Oxford and Egohands database. We further show
    that rotation estimation and classification can mutually benefit each other.

    Predicting Ground-Level Scene Layout from Aerial Imagery

    Menghua Zhai, Zachary Bessinger, Scott Workman, Nathan Jacobs
    Comments: 13 pages including appendix
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We introduce a novel strategy for learning to extract semantically meaningful
    features from aerial imagery. Instead of manually labeling the aerial imagery,
    we propose to predict (noisy) semantic features automatically extracted from
    co-located ground imagery. Our network architecture takes an aerial image as
    input, extracts features using a convolutional neural network, and then applies
    an adaptive transformation to map these features into the ground-level
    perspective. We use an end-to-end learning approach to minimize the difference
    between the semantic segmentation extracted directly from the ground image and
    the semantic segmentation predicted solely based on the aerial image. We show
    that a model learned using this strategy, with no additional training, is
    already capable of rough semantic labeling of aerial imagery. Furthermore, we
    demonstrate that by finetuning this model we can achieve more accurate semantic
    segmentation than two baseline initialization strategies. We use our network to
    address the task of estimating the geolocation and geoorientation of a ground
    image. Finally, we show how features extracted from an aerial image can be used
    to hallucinate a plausible ground-level panorama.

    Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing

    Chi Li, M. Zeeshan Zia, Quoc-Huy Tran, Xiang Yu, Gregory D. Hager, Manmohan Chandraker
    Comments: In review
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Monocular 3D object parsing is highly desirable in various scenarios
    including occlusion reasoning and holistic scene interpretation. We present a
    deep convolutional neural network (CNN) architecture to localize object
    semantic parts in 2D image and 3D space while inferring their visibility
    states, given a single RGB image. Our key insight is to exploit domain
    knowledge to regularize the network by deeply supervising its hidden layers, in
    order to sequentially infer a causal sequence of intermediate concepts
    associated with the final task. To acquire training data in desired quantities
    with ground truth 3D shape and intermediate concepts, we render 3D object CAD
    models to generate large-scale synthetic data and simulate challenging
    occlusion configurations between objects. We train the network only on
    synthetic data and demonstrate state-of-the-art performances on real image
    benchmarks including an extended version of KITTI, PASCAL VOC, PASCAL3D+ and
    IKEA for 2D and 3D keypoint localization and instance segmentation. The
    empirical results substantiate the utility of deep supervision scheme by
    demonstrating effective transfer of knowledge from synthetic data to real
    images, resulting in less overfitting compared to standard end-to-end training.

    Domain knowledge assisted cyst segmentation in OCT retinal images

    Karthik Gopinath, Jayanthi Sivaswamy
    Comments: The paper was accepted as an oral presentation in MICCAI-2015 OPTIMA Cyst Segmentation Challenge
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    3D imaging modalities are becoming increasingly popular and relevant in
    retinal imaging owing to their effectiveness in highlighting structures in
    sub-retinal layers. OCT is one such modality which has great importance in the
    context of analysis of cystoid structures in subretinal layers. Signal to noise
    ratio(SNR) of the images obtained from OCT is less and hence automated and
    accurate determination of cystoid structures from OCT is a challenging task. We
    propose an automated method for detecting/segmenting cysts in 3D OCT volumes.
    The proposed method is biologically inspired and fast aided by the domain
    knowledge about the cystoid structures. An ensemble learning methodRandom
    forests is learnt for classification of detected region into cyst region. The
    method achieves detection and segmentation in a unified setting. We believe the
    proposed approach with further improvements can be a promising starting point
    for more robust approach. This method is validated against the training set
    achieves a mean dice coefficient of 0.3893 with a standard deviation of 0.2987

    FCNs in the Wild: Pixel-level Adversarial and Constraint-based Adaptation

    Judy Hoffman, Dequan Wang, Fisher Yu, Trevor Darrell
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Fully convolutional models for dense prediction have proven successful for a
    wide range of visual tasks. Such models perform well in a supervised setting,
    but performance can be surprisingly poor under domain shifts that appear mild
    to a human observer. For example, training on one city and testing on another
    in a different geographic region and/or weather condition may result in
    significantly degraded performance due to pixel-level distribution shift. In
    this paper, we introduce the first domain adaptive semantic segmentation
    method, proposing an unsupervised adversarial approach to pixel prediction
    problems. Our method consists of both global and category specific adaptation
    techniques. Global domain alignment is performed using a novel semantic
    segmentation network with fully convolutional domain adversarial learning. This
    initially adapted space then enables category specific adaptation through a
    generalization of constrained weak learning, with explicit transfer of the
    spatial layout from the source to the target domains. Our approach outperforms
    baselines across different settings on multiple large-scale datasets, including
    adapting across various real city environments, different synthetic
    sub-domains, from simulated to real environments, and on a novel large-scale
    dash-cam dataset.

    Learning Video Object Segmentation from Static Images

    Anna Khoreva, Federico Perazzi, Rodrigo Benenson, Bernt Schiele, Alexander Sorkine-Hornung
    Comments: Submitted to CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Inspired by recent advances of deep learning in instance segmentation and
    object tracking, we introduce video object segmentation problem as a concept of
    guided instance segmentation. Our model proceeds on a per-frame basis, guided
    by the output of the previous frame towards the object of interest in the next
    frame. We demonstrate that highly accurate object segmentation in videos can be
    enabled by using a convnet trained with static images only. The key ingredient
    of our approach is a combination of offline and online learning strategies,
    where the former serves to produce a refined mask from the previous frame
    estimate and the latter allows to capture the appearance of the specific object
    instance. Our method can handle different types of input annotations: bounding
    boxes and segments, as well as incorporate multiple annotated frames, making
    the system suitable for diverse applications. We obtain competitive results on
    three different datasets, independently from the type of input annotation.

    Progressive Tree-like Curvilinear Structure Reconstruction with Structured Ranking Learning and Graph Algorithm

    Seong-Gyun Jeong, Yuliya Tarabalka, Nicolas Nisse, Josiane Zerubia
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a novel tree-like curvilinear structure reconstruction algorithm
    based on supervised learning and graph theory. In this work we analyze image
    patches to obtain the local major orientations and the rankings that correspond
    to the curvilinear structure. To extract local curvilinear features, we compute
    oriented gradient information using steerable filters. We then employ
    Structured Support Vector Machine for ordinal regression of the input image
    patches, where the ordering is determined by shape similarity to latent
    curvilinear structure. Finally, we progressively reconstruct the curvilinear
    structure by looking for geodesic paths connecting remote vertices in the graph
    built on the structured output rankings. Experimental results show that the
    proposed algorithm faithfully provides topological features of the curvilinear
    structures using minimal pixels for various datasets.

    Scene Flow Estimation: A Survey

    Zike Yan, Xuezhi Xiang
    Comments: 51 pages, 12 figures, 10 tables, 108 references
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper is the first to review the scene flow estimation field to the best
    of our knowledge, which analyzes and compares methods, technical challenges,
    evaluation methodologies and performance of scene flow estimation. Existing
    algorithms are categorized in terms of scene representation, data source, and
    calculation scheme, and the pros and cons in each category are compared
    briefly. The datasets and evaluation protocols are enumerated, and the
    performance of the most representative methods is presented. A future vision is
    illustrated with few questions arisen for discussion. This survey presents a
    general introduction and analysis of scene flow estimation.

    Multi-source Transfer Learning with Convolutional Neural Networks for Lung Pattern Analysis

    Stergios Christodoulidis, Marios Anthimopoulos, Lukas Ebner, Andreas Christe, Stavroula Mougiakakou
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Early diagnosis of interstitial lung diseases is crucial for their treatment,
    but even experienced physicians find it difficult, as their clinical
    manifestations are similar. In order to assist with the diagnosis,
    computer-aided diagnosis (CAD) systems have been developed. These commonly rely
    on a fixed scale classifier that scans CT images, recognizes textural lung
    patterns and generates a map of pathologies. In a previous study, we proposed a
    method for classifying lung tissue patterns using a deep convolutional neural
    network (CNN), with an architecture designed for the specific problem. In this
    study, we present an improved method for training the proposed network by
    transferring knowledge from the similar domain of general texture
    classification. Six publicly available texture databases are used to pretrain
    networks with the proposed architecture, which are then fine-tuned on the lung
    tissue data. The resulting CNNs are combined in an ensemble and their fused
    knowledge is compressed back to a network with the original architecture. The
    proposed approach resulted in an absolute increase of about 2% in the
    performance of the proposed CNN. The results demonstrate the potential of
    transfer learning in the field of medical image analysis, indicate the textural
    nature of the problem and show that the method used for training a network can
    be as important as designing its architecture.

    From Motion Blur to Motion Flow: a Deep Learning Solution for Removing Heterogeneous Motion Blur

    Dong Gong, Jie Yang, Lingqiao Liu, Yanning Zhang, Ian Reid, Chunhua Shen, Anton van den Hengel, Qinfeng Shi
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Removing pixel-wise heterogeneous motion blur is challenging due to the
    ill-posed nature of the problem. The predominant solution is to estimate the
    blur kernel by adding a prior, but the extensive literature on the subject
    indicates the difficulty in identifying a prior which is suitably informative,
    and general. Rather than imposing a prior based on theory, we propose instead
    to learn one from the data. Learning a prior over the latent image would
    require modeling all possible image content. The critical observation
    underpinning our approach is thus that learning the motion flow instead allows
    the model to focus on the cause of the blur, irrespective of the image content.
    This is a much easier learning task, but it also avoids the iterative process
    through which latent image priors are typically applied. Our approach directly
    estimates the motion flow from the blurred image through a fully-convolutional
    deep neural network (FCN) and recovers the unblurred image from the estimated
    motion flow. Our FCN is the first universal end-to-end mapping from the blurred
    image to the dense motion flow. To train the FCN, we simulate motion flows to
    generate synthetic blurred-image-motion-flow pairs thus avoiding the need for
    human labeling. Extensive experiments on challenging realistic blurred images
    demonstrate that the proposed method outperforms the state-of-the-art.

    Filter sharing: Efficient learning of parameters for volumetric convolutions

    Rahul Venkataramani, Sheshadri Thiruvenkadam, Prasad Sudhakar, Hariharan Ravishankar, Vivek Vaidya
    Comments: 6 pages, 2 figures. Published in NIPS 2016 workshop on Machine Learning for Health, December 2016, Barcelona
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Typical convolutional neural networks (CNNs) have several millions of
    parameters and require a large amount of annotated data to train them. In
    medical applications where training data is hard to come by, these
    sophisticated machine learning models are difficult to train. In this paper, we
    propose a method to reduce the inherent complexity of CNNs during training by
    exploiting the significant redundancy that is noticed in the learnt CNN
    filters. Our method relies on finding a small set of filters and mixing
    coefficients to derive every filter in each convolutional layer at the time of
    training itself, thereby reducing the number of parameters to be trained. We
    consider the problem of 3D lung nodule segmentation in CT images and
    demonstrate the effectiveness of our method in achieving good results with only
    few training examples.

    Classification of Neurological Gait Disorders Using Multi-task Feature Learning

    Ioannis Papavasileiou, Wenlong Zhang, Song Han, Xin Wang, Jinbo Bi, Nancy Byl, Masayoshi Tomizuka
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    As our population ages, neurological impairments and degeneration of the
    musculoskeletal system yield gait abnormalities, which can significantly reduce
    quality of life. Gait rehabilitation therapy has been widely adopted to help
    these patients retrain central nervous system physiology to maximize community
    participation and independence. To further improve the rehabilitative therapy
    provided to the patients, more objective methods need to be used and rely less
    in the subjective judgment of the therapist and patient. In this paper, an
    algorithmic framework is proposed to provide classification of gait affected by
    two common neurological diseases, stroke and Parkinson’s Disease (PD), from
    ground contact force (GCF) data. An advanced machine learning method,
    multi-task learning (MTL), is used to jointly train classification models of
    subject’s gait in three classes, stroke, Parkinson’s and healthy gait. Gait
    parameters that can capture gait patterns related to mobility, balance,
    strength and gait phases are used as features for the classification. Out of
    all the parameters used, the MTL models capture the important ones per disease
    and help provide better objective assessment and therapy progress tracking. To
    evaluate the proposed methodology we use data from a human participant study,
    including five PD patients, three post-stroke patients, and three healthy
    subjects. Despite the diversity of the abnormalities caused from each disease,
    the evaluation shows that the proposed approach can successfully distinguish
    patient’s gait from these diseases and healthy gait. Moreover, the proposed
    machine learning methods select the best gait parameters from each category.
    This work can open new research directions in effective gait assessment,
    targeted treatment and therapy progress tracking.

    AGA: Attribute Guided Augmentation

    Mandar Dixit, Roland Kwitt, Marc Niethammer, Nuno Vasconcelos
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We consider the problem of data augmentation, i.e., generating artificial
    samples to extend a given corpus of training data. Specifically, we propose
    attributed-guided augmentation (AGA) which learns a mapping that allows to
    synthesize data such that an attribute of a synthesized sample is at a desired
    value or strength. This is particularly interesting in situations where little
    data with no attribute annotation is available for learning, but we have access
    to a large external corpus of heavily annotated samples. While prior works
    primarily augment in the space of images, we propose to perform augmentation in
    feature space instead. We implement our approach as a deep encoder-decoder
    architecture that learns the synthesis function in an end-to-end manner. We
    demonstrate the utility of our approach on the problems of (1) one-shot object
    recognition in a transfer-learning setting where we have no prior knowledge of
    the new classes, as well as (2) object-based one-shot scene recognition. As
    external data, we leverage 3D depth and pose information from the SUN RGB-D
    dataset. Our experiments show that attribute-guided augmentation of high-level
    CNN features considerably improves one-shot recognition performance on both
    problems.

    Query-adaptive Image Retrieval by Deep Weighted Hashing

    Jian Zhang, Yuxin Peng, Junchao Zhang
    Comments: 10 pages. arXiv admin note: text overlap with arXiv:1607.08477
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The hashing methods have attracted much attention for large scale image
    retrieval. Some deep hashing methods have achieved promising results by taking
    advantage of the better representation power of deep networks recently.
    However, existing deep hashing methods treat all hash bits equally. On one
    hand, a large number of images share the same distance to a query image because
    of the discrete Hamming distance, which cannot provide fine-grained retrieval
    since the ranking of these images is ambiguous. On the other hand, different
    hash bits actually contribute to the image retrieval differently, treating them
    equally greatly affects the image retrieval accuracy. To address the two
    problems, we propose the query-adaptive deep weighted hashing (QaDWH) approach,
    which can perform fine-grained image retrieval for different queries by
    weighted Hamming distance. First, a novel deep hashing network is designed to
    learn the hash codes and corresponding class-wise hash bit weights jointly, so
    that the learned weights can reflect the importance of different hash bits for
    different image class. Second, a query-adaptive image retrieval method is
    proposed, which rapidly generate query image’s hash bit weights by the fusion
    of the semantic probability of the query and the learned class-wise weights.
    Fine-grained image retrieval is then performed by the weighted Hamming
    distance, which can provide more accurate ranking than the original Hamming
    distance. Extensive experiments on 3 widely used datasets show that the
    proposed approach outperforms state-of-the-art hashing methods.

    Contextual Visual Similarity

    Xiaofang Wang, Kris M. Kitani, Martial Hebert
    Comments: Submitted to CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Measuring visual similarity is critical for image understanding. But what
    makes two images similar? Most existing work on visual similarity assumes that
    images are similar because they contain the same object instance or category.
    However, the reason why images are similar is much more complex. For example,
    from the perspective of category, a black dog image is similar to a white dog
    image. However, in terms of color, a black dog image is more similar to a black
    horse image than the white dog image. This example serves to illustrate that
    visual similarity is ambiguous but can be made precise when given an explicit
    contextual perspective. Based on this observation, we propose the concept of
    contextual visual similarity. To be concrete, we examine the concept of
    contextual visual similarity in the application domain of image search. Instead
    of providing only a single image for image similarity search (eg, Google image
    search), we require three images. Given a query image, a second positive image
    and a third negative image, dissimilar to the first two images, we define a
    contextualized similarity search criteria. In particular, we learn feature
    weights over all the feature dimensions of each image such that the distance
    between the query image and the positive image is small and their distances to
    the negative image are large after reweighting their features. The learned
    feature weights encode the contextualized visual similarity specified by the
    user and can be used for attribute specific image search. We also show the
    usefulness of our contextualized similarity weighting scheme for different
    tasks, such as answering visual analogy questions and unsupervised attribute
    discovery.

    An Efficient Algorithm for the Piecewise-Smooth Model with Approximately Explicit Solutions

    Huihui Song, Yuhui Zheng, Kaihua Zhang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents an efficient approach to image segmentation that
    approximates the piecewise-smooth (PS) functional in [12] with explicit
    solutions. By rendering some rational constraints on the initial conditions and
    the final solutions of the PS functional, we propose two novel formulations
    which can be approximated to be the explicit solutions of the evolution partial
    differential equations (PDEs) of the PS model, in which only one PDE needs to
    be solved efficiently. Furthermore, an energy term that regularizes the level
    set function to be a signed distance function is incorporated into our
    evolution formulation, and the time-consuming re-initialization is avoided.
    Experiments on synthetic and real images show that our method is more efficient
    than both the PS model and the local binary fitting (LBF) model [4], while
    having similar segmentation accuracy as the LBF model.

    Complex Matrix Factorization for Face Recognition

    Viet-Hang Duong, Yuan-Shan Lee, Bach-Tung Pham, Seksan Mathulaprangsan, Pham The Bao, Jia-Ching Wang
    Comments: 4 pages,3 figures,4 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This work developed novel complex matrix factorization methods for face
    recognition; the methods were complex matrix factorization (CMF), sparse
    complex matrix factorization (SpaCMF), and graph complex matrix factorization
    (GraCMF). After real-valued data are transformed into a complex field, the
    complex-valued matrix will be decomposed into two matrices of bases and
    coefficients, which are derived from solutions to an optimization problem in a
    complex domain. The generated objective function is the real-valued function of
    the reconstruction error, which produces a parametric description. Factorizing
    the matrix of complex entries directly transformed the constrained optimization
    problem into an unconstrained optimization problem. Additionally, a complex
    vector space with N dimensions can be regarded as a 2N-dimensional real vector
    space. Accordingly, all real analytic properties can be exploited in the
    complex field. The ability to exploit these important characteristics motivated
    the development herein of a simpler framework that can provide better
    recognition results. The effectiveness of this framework will be clearly
    elucidated in later sections in this paper.

    Discrete Schroedinger Transform For Texture Recognition

    João B. Florindo, Odemir M. Bruno
    Comments: 15 pages, 7 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Data Analysis, Statistics and Probability (physics.data-an)

    This work presents a new procedure to extract features of grey-level texture
    images based on the discrete Schroedinger transform. This is a non-linear
    transform where the image is mapped as the initial probability distribution of
    a wave function and such distribution evolves in time following the
    Schroedinger equation from Quantum Mechanics. The features are provided by
    statistical moments of the distribution measured at different times. The
    proposed method is applied to the classification of three databases of textures
    used for benchmark and compared to other well-known texture descriptors in the
    literature, such as textons, local binary patterns, multifractals, among
    others. All of them are outperformed by the proposed method in terms of
    percentage of images correctly classified. The proposal is also applied to the
    identification of plant species using scanned images of leaves and again it
    outperforms other texture methods. A test with images affected by Gaussian and
    “salt & pepper” noise is also carried out, also with the best performance
    achieved by the Schroedinger descriptors.

    Research on the Multiple Feature Fusion Image Retrieval Algorithm based on Texture Feature and Rough Set Theory

    Xiaojie Shi, Yijun Shao
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recently, we have witnessed the explosive growth of images with complex
    information and content. In order to effectively and precisely retrieve desired
    images from a large-scale image database with low time-consuming, we propose
    the multiple feature fusion image retrieval algorithm based on the texture
    feature and rough set theory in this paper. In contrast to the conventional
    approaches that only use the single feature or standard, we fuse the different
    features with operation of normalization. The rough set theory will assist us
    to enhance the robustness of retrieval system when facing with incomplete data
    warehouse. To enhance the texture extraction paradigm, we use the wavelet Gabor
    function that holds better robustness. In addition, from the perspectives of
    the internal and external normalization, we re-organize extracted feature with
    the better combination. The numerical experiment has verified general
    feasibility of our methodology. We enhance the overall accuracy compared with
    the other state-of-the-art algorithms.

    Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker

    James H Cole, Rudra PK Poudel, Dimosthenis Tsagkrasoulis, Matthan WA Caan, Claire Steves, Tim D Spector, Giovanni Montana
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neurons and Cognition (q-bio.NC)

    Machine learning analysis of neuroimaging data can accurately predict
    chronological age in healthy people and deviations from healthy brain ageing
    have been associated with cognitive impairment and disease. Here we sought to
    further establish the credentials of “brain-predicted age” as a biomarker of
    individual differences in the brain ageing process, using a predictive
    modelling approach based on deep learning, and specifically convolutional
    neural networks (CNN), and applied to both pre-processed and raw T1-weighted
    MRI data. Firstly, we aimed to demonstrate the accuracy of CNN brain-predicted
    age using a large dataset of healthy adults (N = 2001). Next, we sought to
    establish the heritability of brain-predicted age using a sample of monozygotic
    and dizygotic female twins (N = 62). Thirdly, we examined the test-retest and
    multi-centre reliability of brain-predicted age using two samples
    (within-scanner N = 20; between-scanner N = 11). CNN brain-predicted ages were
    generated and compared to a Gaussian Process Regression (GPR) approach, on all
    datasets. Input data were grey matter (GM) or white matter (WM) volumetric maps
    generated by Statistical Parametric Mapping (SPM) or raw data. Brain-predicted
    age represents an accurate, highly reliable and genetically-valid phenotype,
    that has potential to be used as a biomarker of brain ageing. Moreover, age
    predictions can be accurately generated on raw T1-MRI data, substantially
    reducing computation time for novel data, bringing the process closer to giving
    real-time information on brain health in clinical settings.


    Artificial Intelligence

    Hierarchy through Composition with Linearly Solvable Markov Decision Processes

    Andrew M. Saxe, Adam Earle, Benjamin Rosman
    Comments: 9 pages, 3 figures
    Subjects: Artificial Intelligence (cs.AI)

    Hierarchical architectures are critical to the scalability of reinforcement
    learning methods. Current hierarchical frameworks execute actions serially,
    with macro-actions comprising sequences of primitive actions. We propose a
    novel alternative to these control hierarchies based on concurrent execution of
    many actions in parallel. Our scheme uses the concurrent compositionality
    provided by the linearly solvable Markov decision process (LMDP) framework,
    which naturally enables a learning agent to draw on several macro-actions
    simultaneously to solve new tasks. We introduce the Multitask LMDP module,
    which maintains a parallel distributed representation of tasks and may be
    stacked to form deep hierarchies abstracted in space and time.

    Decision Theory in an Algebraic Setting

    Maurizio Negri
    Comments: 22 pages, 1 figure
    Subjects: Artificial Intelligence (cs.AI); Probability (math.PR)

    In decision theory an act is a function from a set of conditions to the set
    of real numbers. The set of conditions is a partition in some algebra of
    events. The expected value of an act can be calculated when a probability
    measure is given. We adopt an algebraic point of view by substituting the
    algebra of events with a finite distributive lattice and the probability
    measure with a lattice valuation. We introduce a partial order on acts that
    generalizes the dominance relation and show that the set of acts is a lattice
    with respect to this order. Finally we analyze some different kinds of
    comparison between acts, without supposing a common set of conditions for the
    acts to be compared.

    Inverses, Conditionals and Compositional Operators in Separative Valuation Algebra

    Juerg Kohlas
    Subjects: Artificial Intelligence (cs.AI)

    Compositional models were introduce by Jirousek and Shenoy in the general
    framework of valuation-based systems. They based their theory on an axiomatic
    system of valuations involving not only the operations of combination and
    marginalisation, but also of removal. They claimed that this systems covers
    besides the classical case of discrete probability distributions, also the
    cases of Gaussian densities and belief functions, and many other systems.

    Whereas their results on the compositional operator are correct, the
    axiomatic basis is not sufficient to cover the examples claimed above. We
    propose here a different axiomatic system of valuation algebras, which permits
    a rigorous mathematical theory of compositional operators in valuation-based
    systems and covers all the examples mentioned above. It extends the classical
    theory of inverses in semigroup theory and places thereby the present theory
    into its proper mathematical frame. Also this theory sheds light on the
    different structures of valuation-based systems, like regular algebras
    (represented by probability potentials), canncellative algebras (Gaussian
    potentials) and general separative algebras (density functions).

    Interactive Elicitation of Knowledge on Feature Relevance Improves Predictions in Small Data Sets

    Luana Micallef, Iiris Sundin, Pekka Marttinen, Muhammad Ammad-ud-din, Tomi Peltola, Marta Soare, Giulio Jacucci, Samuel Kaski
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Providing accurate predictions is challenging for machine learning algorithms
    when the number of features is larger than the number of samples in the data.
    Prior knowledge can improve machine learning models by indicating relevant
    variables and parameter values. Yet, this prior knowledge is often tacit and
    only available from domain experts. We present a novel approach that uses
    interactive visualization to elicit the tacit prior knowledge and uses it to
    improve the accuracy of prediction models. The main component of our approach
    is a user model that models the domain expert’s knowledge of the relevance of
    different features for a prediction task. In particular, based on the expert’s
    earlier input, the user model guides the selection of the features on which to
    elicit user’s knowledge next. The results of a controlled user study show that
    the user model significantly improves prior knowledge elicitation and
    prediction accuracy, when predicting the relative citation counts of scientific
    documents in a specific domain.

    Task-Guided and Path-Augmented Heterogeneous Network Embedding for Author Identification

    Ting Chen, Yizhou Sun
    Comments: Accepted by WSDM 2017. This is an extended version with appendix
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (stat.ML)

    In this paper, we study the problem of author identification under
    double-blind review setting, which is to identify potential authors given
    information of an anonymized paper. Different from existing approaches that
    rely heavily on feature engineering, we propose to use network embedding
    approach to address the problem, which can automatically represent nodes into
    lower dimensional feature vectors. However, there are two major limitations in
    recent studies on network embedding: (1) they are usually general-purpose
    embedding methods, which are independent of the specific tasks; and (2) most of
    these approaches can only deal with homogeneous networks, where the
    heterogeneity of the network is ignored. Hence, challenges faced here are two
    folds: (1) how to embed the network under the guidance of the author
    identification task, and (2) how to select the best type of information due to
    the heterogeneity of the network.

    To address the challenges, we propose a task-guided and path-augmented
    heterogeneous network embedding model. In our model, nodes are first embedded
    as vectors in latent feature space. Embeddings are then shared and jointly
    trained according to task-specific and network-general objectives. We extend
    the existing unsupervised network embedding to incorporate meta paths in
    heterogeneous networks, and select paths according to the specific task. The
    guidance from author identification task for network embedding is provided both
    explicitly in joint training and implicitly during meta path selection. Our
    experiments demonstrate that by using path-augmented network embedding with
    task guidance, our model can obtain significantly better accuracy at
    identifying the true authors comparing to existing methods.

    Safety Verification and Control for Collision Avoidance at Road Intersections

    Heejin Ahn, Domitilla Del Vecchio
    Comments: 12 pages
    Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Systems and Control (cs.SY)

    This paper presents the design of a supervisory algorithm that monitors
    safety at road intersections and overrides drivers with a safe input when
    necessary. The design of the supervisor consists of two parts: safety
    verification and control design. Safety verification is the problem to
    determine if vehicles will be able to cross the intersection without colliding
    with current drivers’ inputs. We translate this safety verification problem
    into a jobshop scheduling problem, which minimizes the maximum lateness and
    evaluates if the optimal cost is zero. The zero optimal cost corresponds to the
    case in which all vehicles can cross each conflict area without collisions.
    Computing the optimal cost requires solving a Mixed Integer Nonlinear
    Programming (MINLP) problem due to the nonlinear second-order dynamics of the
    vehicles. We therefore estimate this optimal cost by formulating two related
    Mixed Integer Linear Programming (MILP) problems that assume simpler vehicle
    dynamics. We prove that these two MILP problems yield lower and upper bounds of
    the optimal cost. We also quantify the worst case approximation errors of these
    MILP problems. We design the supervisor to override the vehicles with a safe
    control input if the MILP problem that computes the upper bound yields a
    positive optimal cost. We theoretically demonstrate that the supervisor keeps
    the intersection safe and is non-blocking. Computer simulations further
    validate that the algorithms can run in real time for problems of realistic
    size.

    Coupling Distributed and Symbolic Execution for Natural Language Queries

    Lili Mou, Zhengdong Lu, Hang Li, Zhi Jin
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    Building neural networks to query a knowledge base (a table) with natural
    language is an emerging research topic in NLP. The neural enquirer typically
    necessitates multiple steps of execution because of the compositionality of
    queries. In previous studies, researchers have developed either distributed
    enquirers or symbolic ones for table querying. The distributed enquirer is
    end-to-end learnable, but is weak in terms of execution efficiency and explicit
    interpretability. The symbolic enqurier, on the contrary, is efficient during
    execution; but it is very difficult to train especially at initial stages. In
    this paper, we propose to couple distributed and symbolic execution for natural
    language queries. The observation is that a fully distributed executor also
    exhibits meaningful, albeit imperfect, interpretation. We can thus pretrain the
    symbolic executor with the distributed one’s intermediate execution results in
    a step-by-step fashion. Experiments show that our approach significantly
    outperforms either the distributed or symbolic executor; moreover, we have
    recovered more than 80% execution sequences with only groundtruth denotations
    during training. In summary, the coupled neural enquirer takes advantages of
    both distributed and symbolic executors, and has high performance, high
    learning efficiency, high execution efficiency, and high interpretability.

    Controlling Robot Morphology from Incomplete Measurements

    Martin Pecka, Karel Zimmermann, Michal Reinštein, Tomáš Svoboda
    Comments: Accepted into IEEE Transactions to Industrial Electronics, Special Section on Motion Control for Novel Emerging Robotic Devices and Systems
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)

    Mobile robots with complex morphology are essential for traversing rough
    terrains in Urban Search & Rescue missions (USAR). Since teleoperation of the
    complex morphology causes high cognitive load of the operator, the morphology
    is controlled autonomously. The autonomous control measures the robot state and
    surrounding terrain which is usually only partially observable, and thus the
    data are often incomplete. We marginalize the control over the missing
    measurements and evaluate an explicit safety condition. If the safety condition
    is violated, tactile terrain exploration by the body-mounted robotic arm
    gathers the missing data.

    Learning in the Machine: Random Backpropagation and the Learning Channel

    Pierre Baldi, Peter Sadowski, Zhiqin Lu
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    Random backpropagation (RBP) is a variant of the backpropagation algorithm
    for training neural networks, where the transpose of the forward matrices are
    replaced by fixed random matrices in the calculation of the weight updates. It
    is remarkable both because of its effectiveness, in spite of using random
    matrices to communicate error information, and because it completely removes
    the taxing requirement of maintaining symmetric weights in a physical neural
    system. To better understand random backpropagation, we first connect it to the
    notions of local learning and the learning channel. Through this connection, we
    derive several alternatives to RBP, including skipped RBP (SRPB), adaptive RBP
    (ARBP), sparse RBP, and their combinations (e.g. ASRBP) and analyze their
    computational complexity. We then study their behavior through simulations
    using the MNIST and CIFAR-10 bechnmark datasets. These simulations show that
    most of these variants work robustly, almost as well as backpropagation, and
    that multiplication by the derivatives of the activation functions is
    important. As a follow-up, we study also the low-end of the number of bits
    required to communicate error information over the learning channel. We then
    provide partial intuitive explanations for some of the remarkable properties of
    RBP and its variations. Finally, we prove several mathematical results,
    including the convergence to fixed points of linear chains of arbitrary length,
    the convergence to fixed points of linear autoencoders with decorrelated data,
    the long-term existence of solutions for linear systems with a single hidden
    layer, and the convergence to fixed points of non-linear chains, when the
    derivative of the activation functions is included.

    Fixpoint Approximation of Strategic Abilities under Imperfect Information

    Wojciech Jamroga, Michał Knapik, Damian Kurpiewski
    Comments: 9 pages
    Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)

    Model checking of strategic ability under imperfect information is known to
    be hard. In this paper, we propose translations of ATLir formulae that provide
    lower and upper bounds for their truth values, and are cheaper to verify than
    the original specifications. Most interestingly, the lower approximation is
    provided by a fixpoint expression that uses a nonstandard variant of the
    next-step ability operator. We show the correctness of the translations,
    establish their computational complexity, and validate the approach by
    experiments with a scalable scenario of Bridge play.

    Prediction with a Short Memory

    Sham Kakade, Percy Liang, Vatsal Sharan, Gregory Valiant
    Comments: 26 pages, 1 figure
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC); Machine Learning (stat.ML)

    We consider the problem of predicting the next observation given a sequence
    of past observations. We show that for any distribution over observations, if
    the mutual information between past observations and future observations is
    upper bounded by (I), then a simple Markov model over the most recent
    (I/epsilon) observations can obtain KL error (epsilon) with respect to the
    optimal predictor with access to the entire past. For a Hidden Markov Model
    with (n) states, (I) is bounded by (log n), a quantity that does not depend on
    the mixing time. We also demonstrate that the simple Markov model cannot really
    be improved upon: First, a window length of (I/epsilon) ((I/epsilon^2)) is
    information-theoretically necessary for KL error ((ell_1) error). Second, the
    (d^{Theta(I/epsilon)}) samples required to accurately estimate the Markov
    model when observations are drawn from an alphabet of size (d) is in fact
    necessary for any computationally tractable algorithm, assuming the hardness of
    strongly refuting a certain class of CSPs.

    Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning

    Yichen Chen, Mengdi Wang
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)

    We study the online estimation of the optimal policy of a Markov decision
    process (MDP). We propose a class of Stochastic Primal-Dual (SPD) methods which
    exploit the inherent minimax duality of Bellman equations. The SPD methods
    update a few coordinates of the value and policy estimates as a new state
    transition is observed. These methods use small storage and has low
    computational complexity per iteration. The SPD methods find an
    absolute-(epsilon)-optimal policy, with high probability, using
    (mathcal{O}left(frac{|mathcal{S}|^4 |mathcal{A}|^2sigma^2
    }{(1-gamma)^6epsilon^2}
    ight)) iterations/samples for the infinite-horizon
    discounted-reward MDP and (mathcal{O}left(frac{|mathcal{S}|^4
    |mathcal{A}|^2H^6sigma^2 }{epsilon^2}
    ight)) for the finite-horizon MDP.


    Information Retrieval

    Task-Guided and Path-Augmented Heterogeneous Network Embedding for Author Identification

    Ting Chen, Yizhou Sun
    Comments: Accepted by WSDM 2017. This is an extended version with appendix
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (stat.ML)

    In this paper, we study the problem of author identification under
    double-blind review setting, which is to identify potential authors given
    information of an anonymized paper. Different from existing approaches that
    rely heavily on feature engineering, we propose to use network embedding
    approach to address the problem, which can automatically represent nodes into
    lower dimensional feature vectors. However, there are two major limitations in
    recent studies on network embedding: (1) they are usually general-purpose
    embedding methods, which are independent of the specific tasks; and (2) most of
    these approaches can only deal with homogeneous networks, where the
    heterogeneity of the network is ignored. Hence, challenges faced here are two
    folds: (1) how to embed the network under the guidance of the author
    identification task, and (2) how to select the best type of information due to
    the heterogeneity of the network.

    To address the challenges, we propose a task-guided and path-augmented
    heterogeneous network embedding model. In our model, nodes are first embedded
    as vectors in latent feature space. Embeddings are then shared and jointly
    trained according to task-specific and network-general objectives. We extend
    the existing unsupervised network embedding to incorporate meta paths in
    heterogeneous networks, and select paths according to the specific task. The
    guidance from author identification task for network embedding is provided both
    explicitly in joint training and implicitly during meta path selection. Our
    experiments demonstrate that by using path-augmented network embedding with
    task guidance, our model can obtain significantly better accuracy at
    identifying the true authors comparing to existing methods.

    A note on the triangle inequality for the Jaccard distance

    Sven Kosub
    Subjects: Discrete Mathematics (cs.DM); Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)

    Two simple proofs of the triangle inequality for the Jaccard distance in
    terms of nonnegative, monotone, submodular functions are given and discussed.


    Computation and Language

    Discovering Conversational Dependencies between Messages in Dialogs

    Wenchao Du, Pascal Poupart, Wei Xu
    Comments: AAAI2017 student abstract camera-ready version
    Subjects: Computation and Language (cs.CL)

    We investigate the task of inferring conversational dependencies between
    messages in one-on-one online chat, which has become one of the most popular
    forms of customer service. We propose a novel probabilistic classifier that
    leverages conversational, lexical and semantic information. The approach is
    evaluated empirically on a set of customer service chat logs from a Chinese
    e-commerce website. It outperforms heuristic baselines.

    Mention2Vec: Entity Identification as Multitasking

    Karl Stratos
    Subjects: Computation and Language (cs.CL)

    Standard approaches in entity identification hard-code boundary detection and
    type prediction into labels (e.g., John/B-PER Smith/I-PER) and then perform
    Viterbi. This has two disadvantages: 1. the runtime complexity grows
    quadratically in the number of types, and 2. there is no natural segment-level
    representation. In this paper, we propose a novel neural architecture that
    addresses these disadvantages. We frame the problem as multitasking, separating
    boundary detection and type prediction but optimizing them jointly. Despite its
    simplicity, this architecture performs competitively with fully structured
    models such as BiLSTM-CRFs while scaling linearly in the number of types.
    Furthermore, by construction, the model induces type-disambiguating embeddings
    of predicted mentions.

    Embedding Words and Senses Together via Joint Knowledge-Enhanced Training

    Massimiliano Mancini, Jose Camacho-Collados, Ignacio Iacobacci, Roberto Navigli
    Comments: 10 pages, 1 figure
    Subjects: Computation and Language (cs.CL)

    Word embeddings based on neural networks are widely used in Natural Language
    Processing. However, despite their success in capturing semantic information
    from massive corpora, word embeddings still conflate different meanings of a
    word into a single vectorial representation and do not benefit from information
    available in lexical resources. We address this issue by proposing a new model
    that jointly learns word and sense embeddings and represents them in a unified
    vector space by exploiting large corpora and knowledge obtained from semantic
    networks. We evaluate the main features of our approach qualitatively and
    quantitatively in various tasks, highlighting the advantages of the proposed
    method with respect to state-of-the-art word- and sense-based models.

    Improving the Performance of Neural Machine Translation Involving Morphologically Rich Languages

    Krupakar Hans, R S Milton
    Comments: 21 pages, 11 figures, 2 tables, In Journal of Computational Linguistics
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    The advent of the attention mechanism in neural machine translation models
    has improved the performance of machine translation systems by enabling
    selective lookup into the source sentence. In this paper, the efficiencies of
    translation using bidirectional encoder attention decoder models were studied
    with respect to translation involving morphologically rich languages. The
    English – Tamil language pair was selected for this analysis. First, the use of
    Word2Vec embedding for both the English and Tamil words improved the
    translation results by 0.73 BLEU points over the baseline RNNSearch model with
    4.84 BLEU score. The use of morphological segmentation before word
    vectorization to split the morphologically rich Tamil words into their
    respective morphemes before the translation, caused a reduction in the target
    vocabulary size by a factor of 8. Also, this model (RNNMorph) improved the
    performance of neural machine translation by 7.05 BLEU points over the
    RNNSearch model used over the same corpus. Since the BLEU evaluation of the
    RNNMorph model might be unreliable due to an increase in the number of matching
    tokens per sentence, the performances of the translations were also compared by
    means of human evaluation metrics of adequacy, fluency and relative ranking.
    Further, the use of morphological segmentation also improved the efficacy of
    the attention mechanism.

    Coupling Distributed and Symbolic Execution for Natural Language Queries

    Lili Mou, Zhengdong Lu, Hang Li, Zhi Jin
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    Building neural networks to query a knowledge base (a table) with natural
    language is an emerging research topic in NLP. The neural enquirer typically
    necessitates multiple steps of execution because of the compositionality of
    queries. In previous studies, researchers have developed either distributed
    enquirers or symbolic ones for table querying. The distributed enquirer is
    end-to-end learnable, but is weak in terms of execution efficiency and explicit
    interpretability. The symbolic enqurier, on the contrary, is efficient during
    execution; but it is very difficult to train especially at initial stages. In
    this paper, we propose to couple distributed and symbolic execution for natural
    language queries. The observation is that a fully distributed executor also
    exhibits meaningful, albeit imperfect, interpretation. We can thus pretrain the
    symbolic executor with the distributed one’s intermediate execution results in
    a step-by-step fashion. Experiments show that our approach significantly
    outperforms either the distributed or symbolic executor; moreover, we have
    recovered more than 80% execution sequences with only groundtruth denotations
    during training. In summary, the coupled neural enquirer takes advantages of
    both distributed and symbolic executors, and has high performance, high
    learning efficiency, high execution efficiency, and high interpretability.

    Towards better decoding and language model integration in sequence to sequence models

    Jan Chorowski, Navdeep Jaitly
    Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    The recently proposed Sequence-to-Sequence (seq2seq) framework advocates
    replacing complex data processing pipelines, such as an entire automatic speech
    recognition system, with a single neural network trained in an end-to-end
    fashion. In this contribution, we analyse an attention-based seq2seq speech
    recognition system that directly transcribes recordings into characters. We
    observe two shortcomings: overconfidence in its predictions and a tendency to
    produce incomplete transcriptions when language models are used. We propose
    practical solutions to both problems achieving competitive speaker independent
    word error rates on the Wall Street Journal dataset: without separate language
    models we reach 10.6% WER, while together with a trigram language model, we
    reach 6.7% WER.


    Distributed, Parallel, and Cluster Computing

    Realtime Predictive Maintenance with Lambda Architecture

    Yoji Yamato, Hiroki Kumazaki, Yoshifumi Fukumoto
    Comments: 4 pages, in Japanese, 3 figures, IEICE Technical Report, SC2016-28, Nov. 2016
    Journal-ref: IEICE Technical Report, SC2016-28, Nov. 2016
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Recently, IoT technologies have been progressed and applications of
    maintenance area are expected. However, IoT maintenance applications are not
    spread in Japan yet because of insufficient analysis of real time situation,
    high cost to collect sensing data and to configure failure detection rules. In
    this paper, using lambda architecture concept, we propose a maintenance
    platform in which edge nodes analyze sensing data, detect anomaly, extract a
    new detection rule in real time and a cloud orders maintenance automatically,
    also analyzes whole data collected by batch process in detail, updates learning
    model of edge nodes to improve analysis accuracy.

    Multiscale Computing in the Exascale Era

    Saad Alowayyed, Derek Groen, Peter V. Coveney, Alfons G. Hoekstra
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    We expect that multiscale simulations will be one of the main high
    performance computing workloads in the exascale era. We propose multiscale
    computing patterns as a generic vehicle to realise load balanced, fault
    tolerant and energy aware high performance multiscale computing. Multiscale
    computing patterns should lead to a separation of concerns, whereby application
    developers can compose multiscale models and execute multiscale simulations,
    while pattern software realises optimized, fault tolerant and energy aware
    multiscale computing. We introduce three multiscale computing patterns, present
    an example of the extreme scaling pattern, and discuss our vision of how this
    may shape multiscale computing in the exascale era.

    An initial investigation of the performance of GPU-based swept time-space decomposition

    Kyle E Niemeyer, Daniel Magee
    Comments: 14 pages; submitted to 2017 AIAA SciTech Forum
    Subjects: Computational Physics (physics.comp-ph); Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS)

    Simulations of physical phenomena are essential to the expedient design of
    precision components in aerospace and other high-tech industries. These
    phenomena are often described by mathematical models involving partial
    differential equations (PDEs) without exact solutions. Modern design problems
    require simulations with a level of resolution that is difficult to achieve in
    a reasonable amount of time even in effectively parallelized solvers. Though
    the scale of the problem relative to available computing power is the greatest
    impediment to accelerating these applications, significant performance gains
    can be achieved through careful attention to the details of memory accesses.
    Parallelized PDE solvers are subject to a trade-off in memory management: store
    the solution for each timestep in abundant, global memory with high access
    costs or in a limited, private memory with low access costs that must be passed
    between nodes. The GPU implementation of swept time-space decomposition
    presented here mitigates this dilemma by using private (shared) memory,
    avoiding internode communication, and overwriting unnecessary values. It shows
    significant improvement in the execution time of the PDE solvers in one
    dimension achieving speedups of 6-2x for large and small problem sizes
    respectively compared to naive GPU versions and 7-300x compared to parallel CPU
    versions.

    Spontaneous Proximity Clouds: Making Mobile Devices to Collaborate for Resource and Data Sharing

    Roya Golchay (CITI), Frédéric Le Mouël (CITI), Julien Ponge (CITI), Nicolas Stouls (CITI)
    Comments: in Proceedings of the 12th EAI International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom’2016), Nov 2016, Beijing, China
    Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)

    The base motivation of Mobile Cloud Computing was empowering mobile devices
    by application offloading onto powerful cloud resources. However, this goal
    can’t entirely be reached because of the high offloading cost imposed by the
    long physical distance between the mobile device and the cloud. To address this
    issue, we propose an application offloading onto a nearby mobile cloud composed
    of the mobile devices in the vicinity-a Spontaneous Proximity Cloud. We
    introduce our proposed dynamic, ant-inspired, bi-objective offloading
    middleware-ACOMMA, and explain its extension to perform a close mobile
    application offloading. With the learning-based offloading decision-making
    process of ACOMMA, combined to the collaborative resource sharing, the mobile
    devices can cooperate for decision cache sharing. We evaluate the performance
    of ACOMMA in collaborative mode with real benchmarks Face Recognition and
    Monte-Carlo algorithms-and achieve 50% execution time gain.


    Learning

    Task-Guided and Path-Augmented Heterogeneous Network Embedding for Author Identification

    Ting Chen, Yizhou Sun
    Comments: Accepted by WSDM 2017. This is an extended version with appendix
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (stat.ML)

    In this paper, we study the problem of author identification under
    double-blind review setting, which is to identify potential authors given
    information of an anonymized paper. Different from existing approaches that
    rely heavily on feature engineering, we propose to use network embedding
    approach to address the problem, which can automatically represent nodes into
    lower dimensional feature vectors. However, there are two major limitations in
    recent studies on network embedding: (1) they are usually general-purpose
    embedding methods, which are independent of the specific tasks; and (2) most of
    these approaches can only deal with homogeneous networks, where the
    heterogeneity of the network is ignored. Hence, challenges faced here are two
    folds: (1) how to embed the network under the guidance of the author
    identification task, and (2) how to select the best type of information due to
    the heterogeneity of the network.

    To address the challenges, we propose a task-guided and path-augmented
    heterogeneous network embedding model. In our model, nodes are first embedded
    as vectors in latent feature space. Embeddings are then shared and jointly
    trained according to task-specific and network-general objectives. We extend
    the existing unsupervised network embedding to incorporate meta paths in
    heterogeneous networks, and select paths according to the specific task. The
    guidance from author identification task for network embedding is provided both
    explicitly in joint training and implicitly during meta path selection. Our
    experiments demonstrate that by using path-augmented network embedding with
    task guidance, our model can obtain significantly better accuracy at
    identifying the true authors comparing to existing methods.

    The Physical Systems Behind Optimization Algorithms

    Lin F. Yang, R. Arora, V. Braverman, Tuo Zhao
    Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

    We provide some new insights for analyzing dynamics of optimization
    algorithms, which are popular in machine learning, based on differential
    equation approaches. Our analysis reveals a natural connection from
    optimization algorithm to physical systems, and is applicable to more general
    algorithms and optimization problems beyond general convexity and strong
    convexity.

    Interactive Prior Elicitation of Features Similarities for Small Sample Size Prediction

    Homayun Afrabandpey, Tomi Peltola, Samuel Kaski
    Subjects: Learning (cs.LG); Human-Computer Interaction (cs.HC)

    Regression under the “small (n), large (p)” conditions, of small sample size
    (n) and large number of features (p) in the learning data set, is a recurring
    setting in which learning from data is difficult. With prior knowledge about
    relationships of the features, (p) can effectively be reduced, but explicating
    such prior knowledge is difficult for experts. In this paper we introduce a new
    method for eliciting expert prior knowledge about the similarity of the roles
    of features in the prediction task. The key idea is to use an interactive
    multidimensional-scaling-type scatterplot display of the features to elicit the
    similarity relationships, and then use the elicited relationships in the prior
    distribution of prediction parameters. Specifically, for learning to predict a
    target variable with Bayesian linear regression, the feature relationships are
    used as the prior for the correlations of the regression coefficients. Results
    on simulated data and a user study on text data confirm that prior elicitation
    of feature similarities improves prediction accuracy. Furthermore, elicitation
    with an interactive scatterplot display outperforms straightforward elicitation
    where the users choose feature pairs from a feature list.

    Improved generator objectives for GANs

    Ben Poole, Alexander A. Alemi, Jascha Sohl-Dickstein, Anelia Angelova
    Comments: NIPS 2016 Workshop on Adversarial Training
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We present a framework to understand GAN training as alternating density
    ratio estimation and approximate divergence minimization. This provides an
    interpretation for the mismatched GAN generator and discriminator objectives
    often used in practice, and explains the problem of poor sample diversity. We
    also derive a family of generator objectives that target arbitrary
    (f)-divergences without minimizing a lower bound, and use them to train
    generative image models that target either improved sample quality or greater
    sample diversity.

    Coupling Distributed and Symbolic Execution for Natural Language Queries

    Lili Mou, Zhengdong Lu, Hang Li, Zhi Jin
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    Building neural networks to query a knowledge base (a table) with natural
    language is an emerging research topic in NLP. The neural enquirer typically
    necessitates multiple steps of execution because of the compositionality of
    queries. In previous studies, researchers have developed either distributed
    enquirers or symbolic ones for table querying. The distributed enquirer is
    end-to-end learnable, but is weak in terms of execution efficiency and explicit
    interpretability. The symbolic enqurier, on the contrary, is efficient during
    execution; but it is very difficult to train especially at initial stages. In
    this paper, we propose to couple distributed and symbolic execution for natural
    language queries. The observation is that a fully distributed executor also
    exhibits meaningful, albeit imperfect, interpretation. We can thus pretrain the
    symbolic executor with the distributed one’s intermediate execution results in
    a step-by-step fashion. Experiments show that our approach significantly
    outperforms either the distributed or symbolic executor; moreover, we have
    recovered more than 80% execution sequences with only groundtruth denotations
    during training. In summary, the coupled neural enquirer takes advantages of
    both distributed and symbolic executors, and has high performance, high
    learning efficiency, high execution efficiency, and high interpretability.

    Learning in the Machine: Random Backpropagation and the Learning Channel

    Pierre Baldi, Peter Sadowski, Zhiqin Lu
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    Random backpropagation (RBP) is a variant of the backpropagation algorithm
    for training neural networks, where the transpose of the forward matrices are
    replaced by fixed random matrices in the calculation of the weight updates. It
    is remarkable both because of its effectiveness, in spite of using random
    matrices to communicate error information, and because it completely removes
    the taxing requirement of maintaining symmetric weights in a physical neural
    system. To better understand random backpropagation, we first connect it to the
    notions of local learning and the learning channel. Through this connection, we
    derive several alternatives to RBP, including skipped RBP (SRPB), adaptive RBP
    (ARBP), sparse RBP, and their combinations (e.g. ASRBP) and analyze their
    computational complexity. We then study their behavior through simulations
    using the MNIST and CIFAR-10 bechnmark datasets. These simulations show that
    most of these variants work robustly, almost as well as backpropagation, and
    that multiplication by the derivatives of the activation functions is
    important. As a follow-up, we study also the low-end of the number of bits
    required to communicate error information over the learning channel. We then
    provide partial intuitive explanations for some of the remarkable properties of
    RBP and its variations. Finally, we prove several mathematical results,
    including the convergence to fixed points of linear chains of arbitrary length,
    the convergence to fixed points of linear autoencoders with decorrelated data,
    the long-term existence of solutions for linear systems with a single hidden
    layer, and the convergence to fixed points of non-linear chains, when the
    derivative of the activation functions is included.

    Human powered multiple imputation

    Lovedeep Gondara
    Subjects: Learning (cs.LG); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)

    Missing data is universal and methods to deal with it far ranging from simply
    ignoring it to using complex modelling strategies such as multiple imputation
    and maximum likelihood estimation.Missing data has only been effectively
    imputed by machines via statistical/machine learning models. In this paper we
    set to answer an important question “Can humans perform reasonably well to fill
    in missing data, given information about the dataset?”. We do so in a
    crowdsourcing framework, where we first translate our missing data problem to a
    survey question, which then can be easily completed by crowdworkers. We address
    challenges that are inherent to crowdsourcing in our context and present the
    evaluation on a real dataset. We compare human powered multiple imputation
    outcomes with state-of-the-art model based imputation.

    Towards Information-Seeking Agents

    Philip Bachman, Alessandro Sordoni, Adam Trischler
    Comments: Under review for ICLR 2017
    Subjects: Learning (cs.LG)

    We develop a general problem setting for training and testing the ability of
    agents to gather information efficiently. Specifically, we present a collection
    of tasks in which success requires searching through a partially-observed
    environment, for fragments of information which can be pieced together to
    accomplish various goals. We combine deep architectures with techniques from
    reinforcement learning to develop agents that solve our tasks. We shape the
    behavior of these agents by combining extrinsic and intrinsic rewards. We
    empirically demonstrate that these agents learn to search actively and
    intelligently for new information to reduce their uncertainty, and to exploit
    information they have already acquired.

    Prediction with a Short Memory

    Sham Kakade, Percy Liang, Vatsal Sharan, Gregory Valiant
    Comments: 26 pages, 1 figure
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC); Machine Learning (stat.ML)

    We consider the problem of predicting the next observation given a sequence
    of past observations. We show that for any distribution over observations, if
    the mutual information between past observations and future observations is
    upper bounded by (I), then a simple Markov model over the most recent
    (I/epsilon) observations can obtain KL error (epsilon) with respect to the
    optimal predictor with access to the entire past. For a Hidden Markov Model
    with (n) states, (I) is bounded by (log n), a quantity that does not depend on
    the mixing time. We also demonstrate that the simple Markov model cannot really
    be improved upon: First, a window length of (I/epsilon) ((I/epsilon^2)) is
    information-theoretically necessary for KL error ((ell_1) error). Second, the
    (d^{Theta(I/epsilon)}) samples required to accurately estimate the Markov
    model when observations are drawn from an alphabet of size (d) is in fact
    necessary for any computationally tractable algorithm, assuming the hardness of
    strongly refuting a certain class of CSPs.

    Bridging Medical Data Inference to Achilles Tendon Rupture Rehabilitation

    An Qu, Cheng Zhang, Paul Ackermann, Hedvig Kjellström
    Comments: Workshop on Machine Learning for Healthcare, NIPS 2016, Barcelona, Spain
    Subjects: Learning (cs.LG); Applications (stat.AP)

    Imputing incomplete medical tests and predicting patient outcomes are crucial
    for guiding the decision making for therapy, such as after an Achilles Tendon
    Rupture (ATR). We formulate the problem of data imputation and prediction for
    ATR relevant medical measurements into a recommender system framework. By
    applying MatchBox, which is a collaborative filtering approach, on a real
    dataset collected from 374 ATR patients, we aim at offering personalized
    medical data imputation and prediction. In this work, we show the feasibility
    of this approach and discuss potential research directions by conducting
    initial qualitative evaluations.

    Protein-Ligand Scoring with Convolutional Neural Networks

    Matthew Ragoza (1), Joshua Hochuli (1), Elisa Idrobo (2), Jocelyn Sunseri (1), David Ryan Koes (1) ((1) University of Pittsburgh, (2) The College of New Jersey)
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Biomolecules (q-bio.BM)

    Computational approaches to drug discovery can reduce the time and cost
    associated with experimental assays and enable the screening of novel
    chemotypes. Structure-based drug design methods rely on scoring functions to
    rank and predict binding affinities and poses. The ever-expanding amount of
    protein-ligand binding and structural data enables the use of deep machine
    learning techniques for protein-ligand scoring.

    We describe convolutional neural network (CNN) scoring functions that take as
    input a comprehensive 3D representation of a protein-ligand interaction. A CNN
    scoring function automatically learns the key features of protein-ligand
    interactions that correlate with binding. We train and optimize our CNN scoring
    functions to discriminate between correct and incorrect binding poses and known
    binders and non-binders. We find that our CNN scoring function outperforms the
    AutoDock Vina scoring function when ranking poses both for pose prediction and
    virtual screening.

    Controlling Robot Morphology from Incomplete Measurements

    Martin Pecka, Karel Zimmermann, Michal Reinštein, Tomáš Svoboda
    Comments: Accepted into IEEE Transactions to Industrial Electronics, Special Section on Motion Control for Novel Emerging Robotic Devices and Systems
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)

    Mobile robots with complex morphology are essential for traversing rough
    terrains in Urban Search & Rescue missions (USAR). Since teleoperation of the
    complex morphology causes high cognitive load of the operator, the morphology
    is controlled autonomously. The autonomous control measures the robot state and
    surrounding terrain which is usually only partially observable, and thus the
    data are often incomplete. We marginalize the control over the missing
    measurements and evaluate an explicit safety condition. If the safety condition
    is violated, tactile terrain exploration by the body-mounted robotic arm
    gathers the missing data.

    Scalable Influence Maximization for Multiple Products in Continuous-Time Diffusion Networks

    Nan Du, Yingyu Liang, Maria-Florina Balcan, Manuel Gomez-Rodriguez, Hongyuan Zha, Le Song
    Comments: 45 pages. arXiv admin note: substantial text overlap with arXiv:1312.2164, arXiv:1311.3669
    Subjects: Social and Information Networks (cs.SI); Data Structures and Algorithms (cs.DS); Learning (cs.LG); Machine Learning (stat.ML)

    A typical viral marketing model identifies influential users in a social
    network to maximize a single product adoption assuming unlimited user
    attention, campaign budgets, and time. In reality, multiple products need
    campaigns, users have limited attention, convincing users incurs costs, and
    advertisers have limited budgets and expect the adoptions to be maximized soon.
    Facing these user, monetary, and timing constraints, we formulate the problem
    as a submodular maximization task in a continuous-time diffusion model under
    the intersection of a matroid and multiple knapsack constraints. We propose a
    randomized algorithm estimating the user influence in a network
    ((|mathcal{V}|) nodes, (|mathcal{E}|) edges) to an accuracy of (epsilon)
    with (n=mathcal{O}(1/epsilon^2)) randomizations and
    ( ilde{mathcal{O}}(n|mathcal{E}|+n|mathcal{V}|)) computations. By
    exploiting the influence estimation algorithm as a subroutine, we develop an
    adaptive threshold greedy algorithm achieving an approximation factor (k_a/(2+2
    k)) of the optimal when (k_a) out of the (k) knapsack constraints are active.
    Extensive experiments on networks of millions of nodes demonstrate that the
    proposed algorithms achieve the state-of-the-art in terms of effectiveness and
    scalability.

    A note on the triangle inequality for the Jaccard distance

    Sven Kosub
    Subjects: Discrete Mathematics (cs.DM); Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)

    Two simple proofs of the triangle inequality for the Jaccard distance in
    terms of nonnegative, monotone, submodular functions are given and discussed.

    Towards better decoding and language model integration in sequence to sequence models

    Jan Chorowski, Navdeep Jaitly
    Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    The recently proposed Sequence-to-Sequence (seq2seq) framework advocates
    replacing complex data processing pipelines, such as an entire automatic speech
    recognition system, with a single neural network trained in an end-to-end
    fashion. In this contribution, we analyse an attention-based seq2seq speech
    recognition system that directly transcribes recordings into characters. We
    observe two shortcomings: overconfidence in its predictions and a tendency to
    produce incomplete transcriptions when language models are used. We propose
    practical solutions to both problems achieving competitive speaker independent
    word error rates on the Wall Street Journal dataset: without separate language
    models we reach 10.6% WER, while together with a trigram language model, we
    reach 6.7% WER.

    Evaluating the Performance of ANN Prediction System at Shanghai Stock Market in the Period 21-Sep-2016 to 11-Oct-2016

    Barack Wamkaya Wanjawa
    Comments: 13 pages, 4 figures, 8 tables
    Subjects: Statistical Finance (q-fin.ST); Learning (cs.LG); Machine Learning (stat.ML)

    This research evaluates the performance of an Artificial Neural Network based
    prediction system that was employed on the Shanghai Stock Exchange for the
    period 21-Sep-2016 to 11-Oct-2016. It is a follow-up to a previous paper in
    which the prices were predicted and published before September 21. Stock market
    price prediction remains an important quest for investors and researchers. This
    research used an Artificial Intelligence system, being an Artificial Neural
    Network that is feedforward multi-layer perceptron with error backpropagation
    for prediction, unlike other methods such as technical, fundamental or time
    series analysis. While these alternative methods tend to guide on trends and
    not the exact likely prices, neural networks on the other hand have the ability
    to predict the real value prices, as was done on this research. Nonetheless,
    determination of suitable network parameters remains a challenge in neural
    network design, with this research settling on a configuration of 5:21:21:1
    with 80% training data or 4-year of training data as a good enough model for
    stock prediction, as already determined in a previous research by the author.
    The comparative results indicate that neural network can predict typical stock
    market prices with mean absolute percentage errors that are as low as 1.95%
    over the ten prediction instances that was studied in this research.

    Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker

    James H Cole, Rudra PK Poudel, Dimosthenis Tsagkrasoulis, Matthan WA Caan, Claire Steves, Tim D Spector, Giovanni Montana
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neurons and Cognition (q-bio.NC)

    Machine learning analysis of neuroimaging data can accurately predict
    chronological age in healthy people and deviations from healthy brain ageing
    have been associated with cognitive impairment and disease. Here we sought to
    further establish the credentials of “brain-predicted age” as a biomarker of
    individual differences in the brain ageing process, using a predictive
    modelling approach based on deep learning, and specifically convolutional
    neural networks (CNN), and applied to both pre-processed and raw T1-weighted
    MRI data. Firstly, we aimed to demonstrate the accuracy of CNN brain-predicted
    age using a large dataset of healthy adults (N = 2001). Next, we sought to
    establish the heritability of brain-predicted age using a sample of monozygotic
    and dizygotic female twins (N = 62). Thirdly, we examined the test-retest and
    multi-centre reliability of brain-predicted age using two samples
    (within-scanner N = 20; between-scanner N = 11). CNN brain-predicted ages were
    generated and compared to a Gaussian Process Regression (GPR) approach, on all
    datasets. Input data were grey matter (GM) or white matter (WM) volumetric maps
    generated by Statistical Parametric Mapping (SPM) or raw data. Brain-predicted
    age represents an accurate, highly reliable and genetically-valid phenotype,
    that has potential to be used as a biomarker of brain ageing. Moreover, age
    predictions can be accurately generated on raw T1-MRI data, substantially
    reducing computation time for novel data, bringing the process closer to giving
    real-time information on brain health in clinical settings.

    Interactive Elicitation of Knowledge on Feature Relevance Improves Predictions in Small Data Sets

    Luana Micallef, Iiris Sundin, Pekka Marttinen, Muhammad Ammad-ud-din, Tomi Peltola, Marta Soare, Giulio Jacucci, Samuel Kaski
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Providing accurate predictions is challenging for machine learning algorithms
    when the number of features is larger than the number of samples in the data.
    Prior knowledge can improve machine learning models by indicating relevant
    variables and parameter values. Yet, this prior knowledge is often tacit and
    only available from domain experts. We present a novel approach that uses
    interactive visualization to elicit the tacit prior knowledge and uses it to
    improve the accuracy of prediction models. The main component of our approach
    is a user model that models the domain expert’s knowledge of the relevance of
    different features for a prediction task. In particular, based on the expert’s
    earlier input, the user model guides the selection of the features on which to
    elicit user’s knowledge next. The results of a controlled user study show that
    the user model significantly improves prior knowledge elicitation and
    prediction accuracy, when predicting the relative citation counts of scientific
    documents in a specific domain.

    Improving the Performance of Neural Machine Translation Involving Morphologically Rich Languages

    Krupakar Hans, R S Milton
    Comments: 21 pages, 11 figures, 2 tables, In Journal of Computational Linguistics
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    The advent of the attention mechanism in neural machine translation models
    has improved the performance of machine translation systems by enabling
    selective lookup into the source sentence. In this paper, the efficiencies of
    translation using bidirectional encoder attention decoder models were studied
    with respect to translation involving morphologically rich languages. The
    English – Tamil language pair was selected for this analysis. First, the use of
    Word2Vec embedding for both the English and Tamil words improved the
    translation results by 0.73 BLEU points over the baseline RNNSearch model with
    4.84 BLEU score. The use of morphological segmentation before word
    vectorization to split the morphologically rich Tamil words into their
    respective morphemes before the translation, caused a reduction in the target
    vocabulary size by a factor of 8. Also, this model (RNNMorph) improved the
    performance of neural machine translation by 7.05 BLEU points over the
    RNNSearch model used over the same corpus. Since the BLEU evaluation of the
    RNNMorph model might be unreliable due to an increase in the number of matching
    tokens per sentence, the performances of the translations were also compared by
    means of human evaluation metrics of adequacy, fluency and relative ranking.
    Further, the use of morphological segmentation also improved the efficacy of
    the attention mechanism.


    Information Theory

    In the transmission of information, the great potential of model-based coding with the SP theory of intelligence

    J Gerard Wolff
    Subjects: Information Theory (cs.IT)

    Model-based coding, described by John Pierce in 1961, has great potential to
    reduce the volume of information that needs to be transmitted in moving big
    data, without loss of information, from one place to another, or in lossless
    communications via the internet. Compared with ordinary compression methods,
    this potential advantage of model-based coding in the transmission of data
    arises from the fact that both the transmitter (“Alice”) and the receiver
    (“Bob”) are equipped with a grammar for the kind of data that is to be
    transmitted, which means that, to achieve lossless transmission of a body of
    data from Alice and Bob, a relatively small amount of information needs to be
    sent. Preliminary trials indicate that, with model-based coding, the volume of
    information to be sent from Alice to Bob to achieve lossless transmission of a
    given body of data may be less than (6\%) of the volume of information that
    needs to be sent when ordinary compression methods are used.

    Until recently, it has not been feasible to convert John Pierce’s vision into
    something that may be applied in practice. Now, with the development of the “SP
    theory of intelligence” and its realisation in the “SP computer model”, there
    is clear potential to realise the three main functions that will be needed:
    unsupervised learning of a grammar for the kind of data that is to be
    transmitted using a relatively powerful computer that is independent of Alice
    and Bob; the encoding by Alice of any one example of such data in terms of the
    grammar; and, with the grammar, decoding of the encoding by Bob to retrieve the
    given example. It appears now to be feasible, within reasonable timescales, to
    bring these capabilities to a level where they may be applied to the
    transmission of realistically large bodies of data.

    Nonlinear 1-Bit Precoding for Massive MU-MIMO with Higher-Order Modulation

    Sven Jacobsson, Giuseppe Durisi, Mikael Coldrey, Tom Goldstein, Christoph Studer
    Subjects: Information Theory (cs.IT)

    Massive multi-user (MU) multiple-input multiple- output (MIMO) is widely
    believed to be a core technology for the upcoming fifth-generation (5G)
    wireless communication standards. The use of low-precision digital-to-analog
    converters (DACs) in MU-MIMO base stations is of interest because it reduces
    the power consumption, system costs, and raw baseband data rates. In this
    paper, we develop novel algorithms for downlink precoding in massive MU-MIMO
    systems with 1-bit DACs that support higher-order modulation schemes such as
    8-PSK or 16-QAM. Specifically, we present low-complexity nonlinear precoding
    algorithms that achieve low error rates when combined with blind or
    training-based channel-estimation algorithms at the user equipment. These
    results are in stark contrast to linear-quantized precoding algorithms, which
    suffer from a high error floor if used with high-order modulation schemes and
    1-bit DACs.

    Optimal Pilot and Payload Power Control in Single-Cell Massive MIMO Systems

    Hei Victor Cheng, Emil Björnson, Erik G. Larsson
    Comments: 16 pages, 6 figures, to appear in IEEE Transactions on Signal Processing
    Subjects: Information Theory (cs.IT); Optimization and Control (math.OC)

    This paper considers the jointly optimal pilot and data power allocation in
    single-cell uplink massive multiple-input-multiple-output (MIMO) systems. Using
    the spectral efficiency (SE) as performance metric and setting a total energy
    budget per coherence interval, the power control is formulated as optimization
    problems for two different objective functions: the weighted minimum SE among
    the users and the weighted sum SE. A closed form solution for the optimal
    length of the pilot sequence is derived. The optimal power control policy for
    the former problem is found by solving a simple equation with a single
    variable. Utilizing the special structure arising from imperfect channel
    estimation, a convex reformulation is found to solve the latter problem to
    global optimality in polynomial time. The gain of the optimal joint power
    control is theoretically justified, and is proved to be large in the low SNR
    regime. Simulation results also show the advantage of optimizing the power
    control over both pilot and data power, as compared to the cases of using full
    power and of only optimizing the data powers as done in previous work.

    An Encoding Scheme with Constituent Codes Optimization for Polar Code-Aim to Reduce the Decoding Latency

    Tiben Che, Gwan Choi
    Comments: Submitted to IEEE Communication Letters
    Subjects: Information Theory (cs.IT)

    This paper proposes a polar code encoding scheme that reduces
    constituent-code supplemented decoding latency. Constituent codes are those
    subset of the codeword with specific patterns. They are used to accelerate the
    successive cancellation decoding process of polar code without any performance
    degradation. We modify the traditional encoding approach to yield increased
    number of desired constituent codes that speeds the decoding process. For (n,
    k) polar code, instead of directly setting the k best and (n-k) worst bits to
    the information bits and frozen bits, respectively, we thoughtfully swap the
    locations of some information and frozen bits according to the quality of their
    equivalent channels. The algorithm of constituent codes division optimization
    is presented. We conducted the simulation of 1k and 2k length polar codes with
    multiple rates and analyzed the decoding latency for various length codes. The
    numerical results shows that the proposed encoding scheme generally is able to
    achieve at least around 20% latency deduction with an negligible gain loss with
    carefully selected optimization threshold. Discussions are also presented.

    Minimum Rates of Approximate Sufficient Statistics

    Masahito Hayashi, Vincent Y. F. Tan
    Comments: 17 pages, 1 figure, To be submitted to ISIT 2017
    Subjects: Information Theory (cs.IT); Statistics Theory (math.ST)

    Given a sufficient statistic for a parametric family of distributions, one
    can estimate the parameter without access to the data itself. However, the
    memory or code size for storing the sufficient statistic may nonetheless still
    be prohibitive. Indeed, for (n) independent data samples drawn from a
    (k)-nomial distribution with (d=k-1) degrees of freedom, the length of the code
    scales as (dlog n+O(1)). In many applications though, we may not have a useful
    notion of sufficient statistics (e.g., when the parametric family is not an
    exponential family) and also may not need to reconstruct the generating
    distribution exactly. By adopting a Shannon-theoretic approach in which we
    consider allow a small error in estimating the generating distribution, we
    construct various notions of {em approximate sufficient statistics} and show
    that the code length can be reduced to (frac{d}{2}log n+O(1)). We also note
    that the locality assumption that is used to describe the notion of local
    approximate sufficient statistics when the parametric family is not an
    exponential family can be dispensed of. We consider errors measured according
    to the relative entropy and variational distance criteria. For the code
    construction parts, we leverage Rissanen’s minimum description length
    principle, which yields a non-vanishing error measured using the relative
    entropy. For the converse parts, we use Clarke and Barron’s asymptotic
    expansion for the relative entropy of a parametrized distribution and the
    corresponding mixture distribution. The limitation of this method is that only
    a weak converse for the variational distance can be shown. We develop new
    techniques to achieve vanishing errors and we also prove strong converses for
    all our statements. The latter means that even if the code is allowed to have a
    non-vanishing error, its length must still be at least (frac{d}{2}log n).

    Joint remote state preparation (JRSP) of two-qubit equatorial state in quantum noisy channels

    Adenike Grace Adepoju, Babatunde James Falaye, Guo-Hua Sun, Oscar Camacho-Nieto, Shi-Hai Dong
    Comments: arXiv admin note: text overlap with arXiv:1609.01538
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    This letter reports the influence of noisy channels on JRSP of two-qubit
    equatorial state. We present a scheme for JRSP of two-qubit equatorial state.
    We employ two tripartite Greenberger-Horne-Zeilinger (GHZ) entangled states as
    the quantum channel linking the parties. We find the success probability to be
    (1/4). However, this probability can be ameliorated to (3/4) if the state
    preparers assist by transmitting individual partial information through
    classical channel to the receiver non-contemporaneously. Afterward, we
    investigate the effects of five quantum noises: the bit-flip noise, bit-phase
    flip noise, amplitude-damping noise, phase-damping noise and depolarizing noise
    on the JRSP process. We obtain the analytical derivation of the fidelities
    corresponding to each quantum noisy channel, which is a measure of information
    loss as the qubits are being distributed in these quantum channels. We find
    that the system loses some of its properties as a consequence of unwanted
    interactions with environment. For instance, within the domain
    (0<lambda<0.65), the information lost via transmission of qubits in amplitude
    channel is most minimal, while for (0.65<lambdaleq1), the information lost in
    phase flip channel becomes the most minimal. Also, for any given (lambda), the
    information transmitted through depolarizing channel has the least chance of
    success.

    What do Shannon information inequalities, submodular width, and disjunctive datalog have to do with one another?

    Mahmoud Abo Khamis, Hung Q. Ngo, Dan Suciu
    Subjects: Databases (cs.DB); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT)

    Recent works on bounding the output size of a conjunctive query with
    functional dependencies and degree bounds have shown a deep connection between
    fundamental questions in information theory and database theory. We prove
    analogous output bounds for disjunctive datalog queries, and answer several
    open questions regarding the tightness and looseness of these bounds along the
    way. The bounds are intimately related to Shannon-type information
    inequalities. We devise the notion of a “proof sequence” of a specific class of
    Shannon-type information inequalities called “Shannon flow inequalities”. We
    then show how the proof sequence can be used as symbolic instructions to guide
    an algorithm called “PANDA”, which answers disjunctive datalog queries within
    the size bound predicted. We show that PANDA can be used as a black-box to
    devise algorithms matching precisely the fractional hypertree width and the
    submodular width runtimes for queries with functional dependencies and degree
    bounds.

    Our results improve upon known results in three ways. First, our bounds and
    algorithms are for the much more general class of disjunctive datalog queries,
    of which conjunctive queries are a special case. Second, the runtime of PANDA
    matches precisely the submodular width bound, while the previous algorithm by
    Marx has a runtime that is polynomial in this bound. Third, our bounds and
    algorithms work for queries with input cardinality bounds, functional
    dependencies, and degree bounds.

    Overall, our results showed a deep connection between three seemingly
    unrelated lines of research; and, our results on proof sequences for Shannon
    flow inequalities might be of independent interest.




沪ICP备19023445号-2号
友情链接