
    arXiv Paper Daily: Tue, 11 Oct 2016

    Published by 我爱机器学习 (52ml.net) on 2016-10-11 00:00:00

    Neural and Evolutionary Computing

    Investigating the effects Diversity Mechanisms have on Evolutionary Algorithms in Dynamic Environments

    Matthew Hughes
    Subjects: Neural and Evolutionary Computing (cs.NE)

    Evolutionary algorithms have been successfully applied to a variety of
    optimisation problems in stationary environments. However, many real world
    optimisation problems are set in dynamic environments where the success
    criteria shift regularly. Population diversity affects algorithmic
    performance, particularly on multiobjective and dynamic problems. Diversity
    mechanisms are methods of altering evolutionary algorithms in a way that
    promotes the maintenance of population diversity. This project intends to
    measure and compare the performance effect a variety of diversity mechanisms
    have on an evolutionary algorithm when facing an assortment of dynamic
    problems.

    Deep Convolutional Networks as Models of Generalization and Blending Within Visual Creativity

    Graeme McCaig, Steve DiPaola, Liane Gabora
    Comments: 8 pages, In Proceedings of the 7th International Conference on Computational Creativity. Palo Alto: Association for the Advancement of Artificial Intelligence (AAAI) Press (2016)
    Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)

    We examine two recent artificial intelligence (AI) based deep learning
    algorithms for visual blending in convolutional neural networks (Mordvintsev et
    al. 2015, Gatys et al. 2015). To investigate the potential value of these
    algorithms as tools for computational creativity research, we explain and
    schematize the essential aspects of the algorithms’ operation and give visual
    examples of their output. We discuss the relationship of the two algorithms to
    human cognitive science theories of creativity such as conceptual blending
    theory and honing theory, and characterize the algorithms with respect to
    generation of novelty and aesthetic quality.

    Learning What and Where to Draw

    Scott Reed, Zeynep Akata, Santosh Mohan, Samuel Tenka, Bernt Schiele, Honglak Lee
    Comments: In NIPS 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Generative Adversarial Networks (GANs) have recently demonstrated the
    capability to synthesize compelling real-world images, such as room interiors,
    album covers, manga, faces, birds, and flowers. While existing models can
    synthesize images based on global constraints such as a class label or caption,
    they do not provide control over pose or object location. We propose a new
    model, the Generative Adversarial What-Where Network (GAWWN), that synthesizes
    images given instructions describing what content to draw in which location. We
    show high-quality 128 x 128 image synthesis on the Caltech-UCSD Birds dataset,
    conditioned on both informal text descriptions and also object location. Our
    system exposes control over both the bounding box around the bird and its
    constituent parts. By modeling the conditional distributions over part
    locations, our system also enables conditioning on arbitrary subsets of parts
    (e.g. only the beak and tail), yielding an efficient interface for picking part
    locations. We also show preliminary results on the more challenging domain of
    text- and location-controllable synthesis of images of human actions on the
    MPII Human Pose dataset.
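
    As a rough illustration of the location conditioning (a sketch, not the
    authors' code), a caption embedding can be replicated spatially inside the
    given bounding box so that the generator receives the description only
    where the content should be drawn; all names and tensor sizes below are
    illustrative.

      import torch

      def spatial_text_condition(text_emb, bbox, size=16):
          """Replicate a caption embedding inside a bounding box, zero elsewhere.

          text_emb: (batch, dim) caption embedding; bbox: (batch, 4) normalized
          (x0, y0, x1, y1) coordinates. Both shapes are hypothetical.
          """
          b, d = text_emb.shape
          grid = torch.zeros(b, d, size, size)
          for i in range(b):
              x0, y0, x1, y1 = (bbox[i] * size).long().tolist()
              grid[i, :, y0:y1, x0:x1] = text_emb[i].view(d, 1, 1)
          return grid

      emb = torch.randn(2, 128)
      boxes = torch.tensor([[0.25, 0.25, 0.75, 0.75], [0.0, 0.0, 0.5, 0.5]])
      print(spatial_text_condition(emb, boxes).shape)  # torch.Size([2, 128, 16, 16])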


    Computer Vision and Pattern Recognition

    Deep feature representations for high-resolution remote-sensing imagery retrieval

    Weixun Zhou, Congmin Li
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this letter, we investigate how to extract deep feature representations
    based on convolutional neural networks (CNN) for high-resolution remote-sensing
    imagery retrieval (HRRSIR). Two effective schemes are proposed to generate the
    final feature representations for similarity measure. In the first scheme, the
    deep features are extracted from the fully-connected and convolutional layers
    of the pre-trained CNN models, respectively; in the second scheme, we fine-tune
    the pre-trained CNN model using the target remote sensing dataset to learn
    dataset-specific features. The deep feature representations generated by the
    two schemes are evaluated on two public and challenging datasets. The
    experimental results indicate that the proposed schemes are able to achieve
    state-of-the-art performance due to the good transferability of the CNN models.
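
    The first scheme can be sketched as plain feature extraction from a
    pre-trained network; VGG16 stands in for the unspecified pre-trained
    models, and the chosen layers are an assumption rather than the authors'
    setup.

      import torch
      import torchvision.models as models

      vgg = models.vgg16().eval()          # load pre-trained weights in practice
      img = torch.randn(1, 3, 224, 224)    # stand-in for a remote-sensing image
      with torch.no_grad():
          conv = vgg.features(img).flatten(1)    # convolutional-layer features
          fc = vgg.classifier[:4](conv)          # fully-connected-layer features
      print(conv.shape, fc.shape)          # descriptors for the similarity measure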

    Person Re-identification: Past, Present and Future

    Liang Zheng, Yi Yang, Alexander G. Hauptmann
    Comments: 20 pages, 5 tables, 10 images
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Person re-identification (re-ID) has become increasingly popular in the
    community due to its application and research significance. It aims at spotting
    a person of interest in other cameras. In the early days, hand-crafted
    algorithms and small-scale evaluation were predominantly reported. Recent years
    have witnessed the emergence of large-scale datasets and deep learning systems
    which make use of large data volumes. Considering different tasks, we classify
    most current re-ID methods into two classes, i.e., image-based and video-based;
    in both tasks, hand-crafted and deep learning systems will be reviewed.
    Moreover, two new re-ID tasks which are much closer to real-world applications
    are described and discussed, i.e., end-to-end re-ID and fast re-ID in very
    large galleries. This paper: 1) introduces the history of person re-ID and its
    relationship with image classification and instance retrieval; 2) surveys a
    broad selection of the hand-crafted systems and the large-scale methods in both
    image- and video-based re-ID; 3) describes critical future directions in
    end-to-end re-ID and fast retrieval in large galleries; and 4) finally briefs
    some important yet under-developed issues.

    Video Captioning and Retrieval Models with Semantic Attention

    Youngjae Yu, Hyungjin Ko, Jongwook Choi, Gunhee Kim
    Comments: 11 pages, 7 figures; Winner of three (fill-in-the-blank, multiple-choice test, and movie retrieval) out of four tasks of the LSMDC 2016 Challenge (Workshop in ECCV 2016)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present the approaches for the four video-to-language tasks of LSMDC 2016,
    including movie description, fill-in-the-blank, multiple-choice test, and movie
    retrieval. Our key idea is to adopt the semantic attention mechanism; we first
    build a set of attribute words that are consistently discovered on video
    frames, and then selectively fuse them with input words for more semantic
    representation and with output words for more accurate prediction. We show that
    our implementation of semantic attention indeed improves the performance of
    multiple video-to-language tasks. Specifically, the presented approaches
    participated in all four tasks of the LSMDC 2016, and have won three of
    them, including fill-in-the-blank, multiple-choice test, and movie retrieval.
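
    A minimal sketch of the fusion step as we read it: attribute-word
    embeddings are weighted by their relevance to the current input word and
    the weighted sum is added back in. The dot-product scoring and all
    dimensions are assumptions, not the authors' exact formulation.

      import torch

      attrs = torch.randn(10, 300)  # embeddings of attribute words found on frames
      word = torch.randn(1, 300)    # embedding of the current input word
      weights = torch.softmax(word @ attrs.T, dim=1)  # semantic attention weights
      fused = word + weights @ attrs   # input enriched with attended attributes
      print(fused.shape)               # torch.Size([1, 300])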

    EM-Based Mixture Models Applied to Video Event Detection

    Alessandra Martins Coelho, Vania V. Estrela
    Comments: 25 pages, 8 figures, Available from: this http URL, Chapter from book “Principal Component Analysis – Engineering Applications”, Dr. Parinya Sanguansat (Ed.), InTech, 2012. arXiv admin note: text overlap with arXiv:1404.1100 by other authors
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Surveillance system (SS) development requires hi-tech support to prevail over
    the shortcomings related to the massive quantity of visual information from
    SSs. Unreduced human monitoring has become impossible owing to its physical
    and economic implications, and a move towards automated surveillance is the
    only way out. When it comes to a computer vision
    system, automatic video event comprehension is a challenging task due to motion
    clutter, event understanding under complex scenes, multilevel semantic event
    inference, contextualization of events and views obtained from multiple
    cameras, unevenness of motion scales, shape changes, occlusions and object
    interactions among lots of other impairments. In recent years, state-of-the-art
    models for video event classification and recognition include modeling events
    to discern context, detecting incidents with only one camera, low-level feature
    extraction and description, high-level semantic event classification, and
    recognition. Even so, it is still very burdensome to retrieve or label a
    specific video part relying solely on its content. Principal component analysis
    (PCA) has been widely known and used, but when combined with other techniques
    such as the expectation-maximization (EM) algorithm its computation becomes
    more efficient. This chapter introduces advances associated with
    Probabilistic PCA (PPCA) analysis of video events, and it also looks closely
    at ways and metrics to evaluate these less computationally intensive EM
    implementations of PCA and KPCA.
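
    For reference, the probabilistic PCA model the chapter builds on is the
    latent linear-Gaussian model of Tipping and Bishop (1999); EM then
    estimates $\mathbf{W}$ and $\sigma^2$ without an explicit
    eigendecomposition of the sample covariance:

      \mathbf{x} = \mathbf{W}\mathbf{z} + \boldsymbol{\mu} + \boldsymbol{\epsilon},
      \qquad \mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_q),
      \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_d).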

    Deep Pyramidal Residual Networks

    Dongyoon Han, Jiwhan Kim, Junmo Kim
    Comments: 9 pages, 6 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep convolutional neural networks (DCNNs) have shown remarkable performance
    in image classification tasks in recent years. Generally, deep neural network
    architectures are stacks consisting of a large number of convolution layers,
    and they perform downsampling along the spatial dimension via pooling to reduce
    memory usage. At the same time, the feature map dimension (i.e., the number of
    channels) is sharply increased at downsampling locations, which is essential to
    ensure effective performance because it increases the capability of high-level
    attributes. Moreover, this also applies to residual networks and is very
    closely related to their performance. In this research, instead of using
    downsampling to achieve a sharp increase at each residual unit, we gradually
    increase the feature map dimension at all the units to involve as many
    locations as possible. This is discussed in depth together with our new
    insights as it has proven to be an effective design to improve the
    generalization ability. Furthermore, we propose a novel residual unit capable
    of further improving the classification accuracy with our new network
    architecture. Experiments on benchmark CIFAR datasets have shown that our
    network architecture has a superior generalization ability compared to the
    original residual networks. Code is available at
    this https URL
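
    The gradual widening admits a one-line description: under the additive rule
    as we read the paper, unit $k$ of $N$ uses roughly $D_0 + \alpha k / N$
    channels for a total widening factor $\alpha$; the values below are
    illustrative, not the paper's configuration.

      def pyramidal_widths(d0=16, alpha=48, n_units=9):
          """Channel count per residual unit under an additive widening rule."""
          return [int(d0 + alpha * k / n_units) for k in range(n_units + 1)]

      print(pyramidal_widths())
      # [16, 21, 26, 32, 37, 42, 48, 53, 58, 64] -- widths grow gradually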

    Content Based Image Retrieval (CBIR) in Remote Clinical Diagnosis and Healthcare

    Albany E. Herrmann, Vania Vieira Estrela
    Comments: 28 pages, 6 figures, Book Chapter from “Encyclopedia of E-Health and Telemedicine”
    Journal-ref: Encyclopedia of E-Health and Telemedicine. IGI Global, 2016.
    495-520. Web. 10 Oct. 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Content-Based Image Retrieval (CBIR) locates, retrieves and displays images
    similar to one given as a query, using a set of features. It demands
    accessible data in medical archives and from medical equipment, to infer
    meaning after some processing. A case similar in some sense to the target
    image can aid clinicians. CBIR complements text-based retrieval and improves
    evidence-based diagnosis, administration, teaching, and research in
    healthcare. It facilitates visual/automatic diagnosis and decision-making in
    real-time remote consultation/screening, store-and-forward tests, home care
    assistance and overall patient surveillance. Metrics help compare visual
    data and improve diagnosis. Specially designed architectures can benefit
    from the application scenario. CBIR use calls for file storage
    standardization, querying procedures, efficient image transmission,
    realistic databases, global availability, access simplicity, and
    Internet-based structures. This chapter discusses the important and complex
    aspects required to handle visual content in healthcare.

    Impatient DNNs – Deep Neural Networks with Dynamic Time Budgets

    Manuel Amthor, Erik Rodner, Joachim Denzler
    Comments: British Machine Vision Conference (BMVC) 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose Impatient Deep Neural Networks (DNNs) which deal with dynamic time
    budgets during application. They allow for individual budgets given a priori
    for each test example and for anytime prediction, i.e., a possible interruption
    at multiple stages during inference while still providing output estimates. Our
    approach can therefore tackle the computational costs and energy demands of
    DNNs in an adaptive manner, a property essential for real-time applications.
    Our Impatient DNNs are based on a new general framework of learning dynamic
    budget predictors using risk minimization, which can be applied to current DNN
    architectures by adding early prediction and additional loss layers. A key
    aspect of our method is that all of the intermediate predictors are learned
    jointly. In experiments, we evaluate our approach for different budget
    distributions, architectures, and datasets. Our results show a significant gain
    in expected accuracy compared to common baselines.
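
    A minimal sketch of the general recipe (early prediction heads trained
    jointly with the final one); the layer sizes and the unweighted loss sum
    are placeholders, not the authors' configuration.

      import torch
      import torch.nn as nn

      class EarlyExitNet(nn.Module):
          """Backbone with an intermediate classifier trained jointly."""
          def __init__(self, num_classes=10):
              super().__init__()
              self.stage1 = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
              self.stage2 = nn.Sequential(nn.Linear(256, 128), nn.ReLU())
              self.exit1 = nn.Linear(256, num_classes)  # cheap early prediction
              self.exit2 = nn.Linear(128, num_classes)  # full-budget prediction

          def forward(self, x):
              h1 = self.stage1(x)
              return self.exit1(h1), self.exit2(self.stage2(h1))

      model = EarlyExitNet()
      x, y = torch.randn(8, 784), torch.randint(0, 10, (8,))
      out1, out2 = model(x)
      loss = nn.functional.cross_entropy(out1, y) + nn.functional.cross_entropy(out2, y)
      loss.backward()  # all intermediate predictors are learned jointly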

    Matching of Images with Rotation Transformation Based on the Virtual Electromagnetic Interaction

    Xiaodong Zhuang, N. E. Mastorakis
    Comments: 19 pages, 26 figures
    Journal-ref: WSEAS Transactions On Computers, pp. 679-697, Volume 14, 2015
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A novel approach of image matching for rotating transformation is presented
    and studied. The approach is inspired by electromagnetic interaction force
    between physical currents. The virtual current in images is proposed based on
    the significant edge lines extracted as the fundamental structural feature of
    images. The virtual electromagnetic force and the corresponding moment is
    studied between two images after the extraction of the virtual currents in the
    images. Then image matching for rotating transformation is implemented by
    exploiting the interaction between the virtual currents in the two images to be
    matched. The experimental results prove the effectiveness of the novel idea,
    which indicates the promising application of the proposed method in image
    registration.

    Image Segmentation Based on the Self-Balancing Mechanism in Virtual 3D Elastic Mesh

    Xiaodong Zhuang, N. E. Mastorakis, Jieru Chi, Hanping Wang
    Comments: 14 pages, 21 figures
    Journal-ref: WSEAS Transactions on Computers, pp. 805-818, Volume 14, 2015
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, a novel model of 3D elastic mesh is presented for image
    segmentation. The model is inspired by stress and strain in physical elastic
    objects, while the repulsive force and elastic force in the model are defined
    slightly different from the physical force to suit the segmentation problem
    well. The self-balancing mechanism in the model guarantees the stability of the
    method in segmentation. The shape of the elastic mesh at balance state is used
    for region segmentation, in which the sign distribution of the points' z
    coordinate values is taken as the basis for segmentation. The effectiveness of
    the proposed method is proved by analysis and experimental results for both
    test images and real world images.

    Egocentric Height Estimation

    Jessica Finocchiaro, Aisha Urooj Khan, Ali Borji
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Egocentric, or first-person vision, which became popular in recent years
    with the emergence of wearable technology, differs from exocentric
    (third-person) vision in some distinguishable ways, one being that the camera wearer
    is generally not visible in the video frames. Recent work has been done on
    action and object recognition in egocentric videos, as well as work on
    biometric extraction from first-person videos. Height estimation can be a
    useful feature for both soft-biometrics and object tracking. Here, we propose a
    method of estimating the height of an egocentric camera without any calibration
    or reference points. We used both traditional computer vision approaches and
    deep learning to determine the visual cues that result in the best height
    estimation. Here, we introduce a framework inspired by two-stream networks,
    comprising two Convolutional Neural Networks, one based on spatial
    information, and one based on information given by optical flow in a frame.
    Given an egocentric video as an input to the framework, our model yields a
    height estimate as an output. We also incorporate late fusion to learn a
    combination of temporal and spatial cues. Comparing our model with other
    methods we used as baselines, we achieve height estimates for videos with a
    Mean Average Error of 14.04 cm over a range of 103 cm of data, and
    classification accuracy for relative height (tall, medium or short) up to
    93.75% where chance level is 33%.
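
    A compact sketch of the described two-stream design with late fusion;
    channel counts, pooling and the regression head are illustrative guesses,
    not the authors' network.

      import torch
      import torch.nn as nn

      class TwoStreamHeight(nn.Module):
          """Spatial and optical-flow streams fused late into one height estimate."""
          def __init__(self):
              super().__init__()
              def stream(in_ch):
                  return nn.Sequential(
                      nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
              self.spatial = stream(3)      # RGB frame
              self.motion = stream(2)       # optical-flow (u, v) field
              self.fuse = nn.Linear(32, 1)  # late fusion -> height estimate

          def forward(self, rgb, flow):
              return self.fuse(torch.cat([self.spatial(rgb), self.motion(flow)], 1))

      model = TwoStreamHeight()
      h = model(torch.randn(4, 3, 64, 64), torch.randn(4, 2, 64, 64))
      print(h.shape)  # torch.Size([4, 1])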

    Zero Shot Hashing

    Shubham Pachori, Shanmuganathan Raman
    Comments: 10 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper provides a framework to hash images containing instances of
    unknown object classes. In many object recognition problems, we might have
    access to huge amounts of data, yet even this data may not cover the objects
    belonging to classes that we see in our day-to-day life. Zero-shot learning
    exploits auxiliary information (also called signatures) in order to predict
    the labels corresponding to unknown classes. In
    this work, we attempt to generate the hash codes for images belonging to unseen
    classes, information of which is available only through the textual corpus. We
    formulate this as an unsupervised hashing formulation as the exact labels are
    not available for the instances of unseen classes. We show that the proposed
    solution is able to generate hash codes which can predict labels corresponding
    to unseen classes with appreciably good precision.

    Learning Spatial-Semantic Context with Fully Convolutional Recurrent Network for Online Handwritten Chinese Text Recognition

    Zecheng Xie, Zenghui Sun, Lianwen Jin, Hao Ni, Terry Lyons
    Comments: 14 pages, 9 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Online handwritten Chinese text recognition (OHCTR) is a challenging problem
    as it involves a large-scale character set, ambiguous segmentation, and
    variable-length input sequences. In this paper, we exploit the outstanding
    capability of path signature to translate online pen-tip trajectories into
    informative signature feature maps using a sliding window-based method,
    successfully capturing the analytic and geometric properties of pen strokes
    with strong local invariance and robustness. A multi-spatial-context fully
    convolutional recurrent network (MCFCRN) is proposed to exploit the multiple
    spatial contexts from the signature feature maps and generate a prediction
    sequence while completely avoiding the difficult segmentation problem.
    Furthermore, an implicit language model is developed to make predictions based
    on semantic context within a predicting feature sequence, providing a new
    perspective for incorporating lexicon constraints and prior knowledge about a
    certain language in the recognition procedure. Experiments on two standard
    benchmarks, Dataset-CASIA and Dataset-ICDAR, yielded outstanding results, with
    correct rates of 97.10% and 97.15%, respectively, which are significantly
    better than the best result reported thus far in the literature.
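
    The first two levels of a path signature can be computed discretely as
    below; this is a toy illustration of the feature family the paper uses,
    not its sliding-window extraction code.

      import numpy as np

      def signature_level2(path):
          """Levels 1 and 2 of the signature of a 2-D pen trajectory.

          path: (T, 2) array of pen-tip coordinates. Level 1 is the total
          increment; level 2 is the matrix of discrete iterated integrals.
          """
          inc = np.diff(path, axis=0)                        # stepwise increments
          level1 = inc.sum(axis=0)
          cum = np.vstack([np.zeros(2), np.cumsum(inc, axis=0)])[:-1]
          level2 = cum.T @ inc                               # 2x2 iterated integrals
          return level1, level2

      pts = np.array([[0, 0], [1, 0], [1, 1], [2, 1]], float)
      l1, l2 = signature_level2(pts)
      print(l1, l2.ravel())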

    Crafting GBD-Net for Object Detection

    Xingyu Zeng, Wanli Ouyang, Junjie Yan, Hongsheng Li, Tong Xiao, Kun Wang, Yu Liu, Yucong Zhou, Bin Yang, Zhe Wang, Hui Zhou, Xiaogang Wang
    Comments: This paper shows the details of our approach in winning the ImageNet object detection challenge of 2016, with source code provided on this https URL. The preliminary version of this paper is presented at the ECCV. Xingyu Zeng and Wanli Ouyang contributed equally
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The visual cues from multiple support regions of different sizes and
    resolutions are complementary in classifying a candidate box in object
    detection. Effective integration of local and contextual visual cues from these
    regions has become a fundamental problem in object detection.

    In this paper, we propose a gated bi-directional CNN (GBD-Net) to pass
    messages among features from different support regions during both feature
    learning and feature extraction. Such message passing can be implemented
    through convolution between neighboring support regions in two directions and
    can be conducted in various layers. Therefore, local and contextual visual
    patterns can validate the existence of each other by learning their nonlinear
    relationships and their close interactions are modeled in a more complex way.
    It is also shown that message passing is not always helpful but dependent on
    individual samples. Gated functions are therefore needed to control message
    transmission, whose on-or-off states are determined by extra visual evidence from the
    input sample. The effectiveness of GBD-Net is shown through experiments on
    three object detection datasets, ImageNet, Pascal VOC2007 and Microsoft COCO.
    This paper also shows the details of our approach in winning the ImageNet
    object detection challenge of 2016, with source code provided on
    this https URL.
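
    One direction of the gated message passing can be sketched as follows: a
    convolutional message computed from one support region's features, switched
    on or off by a learned gate, is added to a neighbouring region's features.
    Shapes are illustrative; the real model passes messages in both directions
    across several resolutions.

      import torch
      import torch.nn as nn

      class GatedMessage(nn.Module):
          """Gated convolutional message from one support region to a neighbour."""
          def __init__(self, ch=64):
              super().__init__()
              self.msg = nn.Conv2d(ch, ch, 3, padding=1)
              self.gate = nn.Conv2d(ch, ch, 3, padding=1)

          def forward(self, h_from, h_to):
              g = torch.sigmoid(self.gate(h_from))  # per-location on/off gate
              return torch.relu(h_to + g * self.msg(h_from))

      m = GatedMessage()
      out = m(torch.randn(1, 64, 7, 7), torch.randn(1, 64, 7, 7))
      print(out.shape)  # torch.Size([1, 64, 7, 7])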

    Content-Based Image Retrieval Using Multiresolution Analysis Of Shape-Based Classified Images

    I. M. El-Henawy, Kareem Ahmed
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Content-Based Image Retrieval (CBIR) systems have been widely used for a wide
    range of applications such as Art collections, Crime prevention and
    Intellectual property. In this paper, a novel CBIR system, which utilizes
    visual contents (color, texture and shape) of an image to retrieve images, is
    proposed. The proposed system builds three feature vectors and stores them into
    MySQL database. The first feature vector uses descriptive statistics to
    describe the distribution of data in each of the RGB channels of the image.
    The second feature vector describes the texture using eigenvalues of the 39
    sub-bands generated by applying a four-level 2D DWT to each channel (red,
    green and blue) of the image. These wavelet sub-bands perfectly
    describe the horizontal, vertical and diagonal edges that exist in the
    multi-resolution analysis of the image. The third feature vector describes the
    basic shapes that exist in the skeletonization version of the black and white
    representation of the image. Experimental results on a private MySQL database
    that consists of 10000 images, using color, texture, shape and stored relevance
    feedbacks, showed 96.4% average correct retrieval rate in an efficient recovery
    time.
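
    The first feature vector can be sketched with plain per-channel statistics;
    the abstract does not list which descriptive statistics are used, so the
    choice below (mean, standard deviation, skewness) is an assumption.

      import numpy as np

      def color_stats(image):
          """Descriptive statistics per RGB channel as a retrieval feature vector.

          image: (H, W, 3) array. The particular statistics are illustrative.
          """
          feats = []
          for c in range(3):
              ch = image[..., c].astype(np.float64).ravel()
              mu, sigma = ch.mean(), ch.std()
              skew = ((ch - mu) ** 3).mean() / (sigma ** 3 + 1e-12)
              feats.extend([mu, sigma, skew])
          return np.array(feats)

      print(color_stats(np.random.randint(0, 256, (32, 32, 3))).shape)  # (9,)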

    Learning What and Where to Draw

    Scott Reed, Zeynep Akata, Santosh Mohan, Samuel Tenka, Bernt Schiele, Honglak Lee
    Comments: In NIPS 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Generative Adversarial Networks (GANs) have recently demonstrated the
    capability to synthesize compelling real-world images, such as room interiors,
    album covers, manga, faces, birds, and flowers. While existing models can
    synthesize images based on global constraints such as a class label or caption,
    they do not provide control over pose or object location. We propose a new
    model, the Generative Adversarial What-Where Network (GAWWN), that synthesizes
    images given instructions describing what content to draw in which location. We
    show high-quality 128 x 128 image synthesis on the Caltech-UCSD Birds dataset,
    conditioned on both informal text descriptions and also object location. Our
    system exposes control over both the bounding box around the bird and its
    constituent parts. By modeling the conditional distributions over part
    locations, our system also enables conditioning on arbitrary subsets of parts
    (e.g. only the beak and tail), yielding an efficient interface for picking part
    locations. We also show preliminary results on the more challenging domain of
    text- and location-controllable synthesis of images of human actions on the
    MPII Human Pose dataset.

    ResearchDoom and CocoDoom: Learning Computer Vision with Games

    A. Mahendran, H. Bilen, J. F. Henriques, A. Vedaldi
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this short note we introduce ResearchDoom, an implementation of the Doom
    first-person shooter that can extract detailed metadata from the game. We also
    introduce the CocoDoom dataset, a collection of pre-recorded data extracted
    from Doom gaming sessions along with annotations in the MS Coco format.
    ResearchDoom and CocoDoom can be used to train and evaluate a variety of
    computer vision methods such as object recognition, detection and segmentation
    at the level of instances and categories, tracking, ego-motion estimation,
    monocular depth estimation and scene segmentation. The code and data are
    available at this http URL

    Indoor Space Recognition using Deep Convolutional Neural Network: A Case Study at MIT Campus

    Fan Zhang, Fabio Duarte, Ruixian Ma, Dimitrios Milioris, Hui Lin, Carlo Ratti
    Comments: 22 pages; 14 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we propose a robust and parsimonious approach using Deep
    Convolutional Neural Network (DCNN) to recognize and interpret interior space.
    DCNN has achieved incredible success in object and scene recognition. In this
    study we design and train a DCNN to classify pre-zoned indoor spaces and to
    recognize the learned space features from a single phone photo, with no need
    for additional assistive technology. We collect more than 600,000 images
    inside MIT campus buildings to train our DCNN model, and achieve 97.9%
    accuracy on the validation dataset and 81.7% accuracy on the test dataset
    with a spatial-scale-fixed model. Furthermore, the recognition accuracy and
    spatial resolution can potentially be improved through a multiscale
    classification model. We identify the discriminative image regions through
    the Class Activation Mapping (CAM) technique, to observe how the model
    recognizes space and interprets it in an abstract way. By evaluating the
    results with a misclassification matrix, we investigate the visual spatial
    features of interior
    space by looking into its visual similarity and visual distinctiveness, giving
    insights into interior design and human indoor perception and wayfinding
    research. The contribution of this paper is threefold. First, we propose a
    robust and parsimonious approach for indoor navigation using DCNN. Second, we
    demonstrate that DCNN also has a potential capability in space feature learning
    and recognition, even under severe appearance changes. Third, we introduce a
    DCNN based approach to look into the visual similarity and visual
    distinctiveness of interior space.
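
    The CAM step itself is a one-liner: a class's localization map is the
    classifier-weighted sum of the final convolutional feature maps (Zhou et
    al. 2016). Shapes below are illustrative.

      import torch

      feat = torch.randn(1, 512, 14, 14)  # final convolutional feature maps
      fc_w = torch.randn(10, 512)         # weights of the final linear classifier
      cam = torch.einsum("k,bkhw->bhw", fc_w[3], feat)  # map for class index 3
      print(cam.shape)  # torch.Size([1, 14, 14]); upsample to image size to inspect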

    Open-Ended Visual Question-Answering

    Issey Masuda, Santiago Pascual de la Puente, Xavier Giro-i-Nieto
    Comments: Bachelor thesis report graded with A with honours at ETSETB Telecom BCN school, Universitat Politècnica de Catalunya (UPC). June 2016. Source code and models are publicly available at this http URL
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

    This thesis report studies methods to solve Visual Question-Answering (VQA)
    tasks with a Deep Learning framework. As a preliminary step, we explore Long
    Short-Term Memory (LSTM) networks used in Natural Language Processing (NLP) to
    tackle Question-Answering (text based). We then modify the previous model to
    accept an image as an input in addition to the question. For this purpose, we
    explore the VGG-16 and K-CNN convolutional neural networks to extract visual
    features from the image. These are merged with the word embedding or with a
    sentence embedding of the question to predict the answer. This work was
    successfully submitted to the Visual Question Answering Challenge 2016, where
    it achieved 53.62% accuracy on the test dataset. The developed software
    has followed the best programming practices and Python code style, providing a
    consistent baseline in Keras for different configurations.

    Visual Closed-Loop Control for Pouring Liquids

    Connor Schenck, Dieter Fox
    Comments: Submitted to ICRA 2017
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

    Pouring a specific amount of liquid is a challenging task. In this paper we
    develop methods for robots to use visual feedback to perform closed-loop
    control for pouring liquids. We propose both a model-based and a model-free
    method utilizing deep learning for estimating the volume of liquid in a
    container. Our results show that the model-free method is better able to
    estimate the volume. We combine this with a simple PID controller to pour
    specific amounts of liquid, and show that the robot is able to achieve an
    average 38ml deviation from the target amount. To our knowledge, this is the
    first use of raw visual feedback to pour liquids in robotics.
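
    The control half is a textbook PID loop on the volume error; the gains and
    the error signal below are illustrative, not the paper's tuning.

      class PID:
          """Classic PID controller; gains here are illustrative."""
          def __init__(self, kp, ki, kd):
              self.kp, self.ki, self.kd = kp, ki, kd
              self.acc, self.prev = 0.0, 0.0

          def step(self, error, dt):
              self.acc += error * dt                  # integral term
              deriv = (error - self.prev) / dt        # derivative term
              self.prev = error
              return self.kp * error + self.ki * self.acc + self.kd * deriv

      pid = PID(0.8, 0.1, 0.05)
      # target 100 ml vs estimated 60 ml -> control signal for the pour
      print(round(pid.step(error=100.0 - 60.0, dt=0.1), 2))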

    Boost K-Means

    Wan-Lei Zhao, Cheng-Hao Deng, Chong-Wah Ngo
    Comments: 11 pages, 10 figures
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB)

    Due to its simplicity and versatility, k-means remains popular since it was
    proposed three decades ago. Since then, continuous efforts have been made to
    enhance its performance. Unfortunately, a good trade-off between quality and
    efficiency is hardly reached. In this paper, a novel k-means variant is
    presented. Different from most of k-means variants, the clustering procedure is
    explicitly driven by an objective function, which is feasible for the whole
    l2-space. The classic egg-chicken loop in k-means has been simplified to a pure
    stochastic optimization procedure. K-means therefore becomes simpler, faster
    and better. The effectiveness of this new variant has been studied extensively
    in different contexts, such as document clustering, nearest neighbor search and
    image clustering. Superior performance is observed across different scenarios.
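
    Our paraphrase of the "pure stochastic optimization" view, as a sketch: one
    sample at a time is reassigned and the winning centroid is updated online,
    with no global assign-then-recompute loop. This is sequential k-means in
    the spirit of the abstract, not the authors' exact procedure.

      import numpy as np

      def stochastic_kmeans(X, k, epochs=10, seed=0):
          """Per-sample online k-means updates instead of the egg-chicken loop."""
          rng = np.random.default_rng(seed)
          centers = X[rng.choice(len(X), k, replace=False)].copy()
          counts = np.ones(k)
          for _ in range(epochs):
              for i in rng.permutation(len(X)):
                  j = np.argmin(((centers - X[i]) ** 2).sum(axis=1))
                  counts[j] += 1
                  centers[j] += (X[i] - centers[j]) / counts[j]  # online mean
          return centers

      X = np.vstack([np.random.randn(100, 2) + m for m in (0, 5)])
      print(stochastic_kmeans(X, 2).round(1))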

    4D Crop Monitoring: Spatio-Temporal Reconstruction for Agriculture

    Jing Dong, John Gary Burnham, Byron Boots, Glen C. Rains, Frank Dellaert
    Comments: Submitted to IEEE International Conference on Robotics and Automation (ICRA) 2017
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

    Autonomous crop monitoring at high spatial and temporal resolution is a
    critical problem in precision agriculture. While Structure from Motion and
    Multi-View Stereo algorithms can finely reconstruct the 3D structure of a field
    with low-cost image sensors, these algorithms fail to capture the dynamic
    nature of continuously growing crops. In this paper we propose a 4D
    reconstruction approach to crop monitoring, which employs a spatio-temporal
    model of dynamic scenes that is useful for precision agriculture applications.
    Additionally, we provide a robust data association algorithm to address the
    problem of large appearance changes due to scenes being viewed from different
    angles at different points in time, which is critical to achieving 4D
    reconstruction. Finally, we collected a high quality dataset with ground truth
    statistics to evaluate the performance of our method. We demonstrate that our
    4D reconstruction approach provides models that are qualitatively correct with
    respect to visual appearance and quantitatively accurate when measured against
    the ground truth geometric properties of the monitored crops.

    Deep Convolutional Networks as Models of Generalization and Blending Within Visual Creativity

    Graeme McCaig, Steve DiPaola, Liane Gabora
    Comments: 8 pages, In Proceedings of the 7th International Conference on Computational Creativity. Palo Alto: Association for the Advancement of Artificial Intelligence (AAAI) Press (2016)
    Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)

    We examine two recent artificial intelligence (AI) based deep learning
    algorithms for visual blending in convolutional neural networks (Mordvintsev et
    al. 2015, Gatys et al. 2015). To investigate the potential value of these
    algorithms as tools for computational creativity research, we explain and
    schematize the essential aspects of the algorithms’ operation and give visual
    examples of their output. We discuss the relationship of the two algorithms to
    human cognitive science theories of creativity such as conceptual blending
    theory and honing theory, and characterize the algorithms with respect to
    generation of novelty and aesthetic quality.

    Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models

    Ashwin K Vijayakumar, Michael Cogswell, Ramprasath R. Selvaraju, Qing Sun, Stefan Lee, David Crandall, Dhruv Batra
    Comments: 14 pages
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

    Neural sequence models are widely used to model time-series data in many
    fields. Equally ubiquitous is the usage of beam search (BS) as an approximate
    inference algorithm to decode output sequences from these models. BS explores
    the search space in a greedy left-right fashion retaining only the top-$B$
    candidates — resulting in sequences that differ only slightly from each other.
    Producing lists of nearly identical sequences is not only computationally
    wasteful but also typically fails to capture the inherent ambiguity of complex
    AI tasks. To overcome this problem, we propose Diverse Beam Search (DBS),
    an alternative to BS that decodes a list of diverse outputs by
    optimizing for a diversity-augmented objective. We observe that our method
    finds better top-1 solutions by controlling for the exploration and
    exploitation of the search space — implying that DBS is a better search
    algorithm. Moreover, these gains are achieved with minimal computational or
    memory overhead as compared to beam search. To demonstrate the broad
    applicability of our method, we present results on image captioning, machine
    translation and visual question generation using both standard quantitative
    metrics and qualitative human studies. Our method consistently outperforms BS
    and previously proposed techniques for diverse decoding from neural sequence
    models.
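
    A toy sketch of the idea: beam groups are decoded in turn, and a Hamming-
    style penalty (lam) discourages a group from emitting tokens that earlier
    groups already chose at the same time step. The scoring model and all
    parameters here are illustrative.

      import math
      from collections import Counter

      def diverse_beam_search(logprob, vocab, steps, groups=2, width=2, lam=0.5):
          """Token-level Diverse Beam Search with a Hamming diversity penalty."""
          beams = [[("", 0.0)] for _ in range(groups)]  # one empty prefix per group
          for _ in range(steps):
              used = Counter()                # tokens already picked at this step
              for g in range(groups):
                  cand = [(p + t, s + logprob(p, t) - lam * used[t])
                          for p, s in beams[g] for t in vocab]
                  beams[g] = sorted(cand, key=lambda c: -c[1])[:width]
                  for seq, _ in beams[g]:
                      used[seq[-1]] += 1      # penalize later groups
          return [b for grp in beams for b in grp]

      lp = lambda prefix, tok: math.log(0.6 if tok == "a" else 0.4)  # toy model
      for seq, score in diverse_beam_search(lp, "ab", steps=3):
          print(seq, round(score, 2))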


    Artificial Intelligence

    ABA+

    Kristijonas Čyras, Francesca Toni
    Comments: Manuscript under review
    Subjects: Artificial Intelligence (cs.AI)

    We present ABA+, a new approach to handling preferences in a well-known
    structured argumentation formalism, Assumption-Based Argumentation (ABA). In
    ABA+, preference information given over assumptions is incorporated directly
    into the attack relation, thus resulting in attack reversal. ABA+
    conservatively extends ABA and exhibits various desirable features regarding
    relationship among argumentation semantics as well as preference handling. We
    also introduce Weak Contraposition, a principle concerning reasoning with rules
    and preferences that relaxes the standard principle of contraposition, while
    guaranteeing additional desirable features for ABA+.

    Personalizing a Dialogue System with Transfer Learning

    Kaixiang Mo, Shuangyin Li, Yu Zhang, Jiajun Li, Qiang Yang
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

    It is difficult to train a personalized task-oriented dialogue system because
    the data collected from each individual is often insufficient. Personalized
    dialogue systems trained on a small dataset can overfit and make it difficult
    to adapt to different user needs. One way to solve this problem is to consider
    a collection of multiple users’ data as a source domain and an individual
    user’s data as a target domain, and to perform a transfer learning from the
    source to the target domain. By following this idea, we propose
    “PETAL” (PErsonalized Task diALogue), a transfer-learning framework based on
    POMDP to learn a personalized dialogue system. The system first learns common
    dialogue knowledge from the source domain and then adapts this knowledge to the
    target user. This framework can avoid the negative transfer problem by
    considering differences between source and target users. The policy in the
    personalized POMDP can learn to choose different actions appropriately for
    different users. Experimental results on real-world coffee-shopping data and
    simulation data show that our personalized dialogue system can choose different
    optimal actions for different users, and thus effectively improve the dialogue
    quality under the personalized setting.

    Situational Awareness by Risk-Conscious Skills

    Daniel J. Mankowitz, Aviv Tamar, Shie Mannor
    Subjects: Artificial Intelligence (cs.AI)

    Hierarchical Reinforcement Learning has been previously shown to speed up the
    convergence rate of RL planning algorithms as well as mitigate feature-based
    model misspecification (Mankowitz et al. 2016a,b, Bacon 2015). To do so, it
    utilizes hierarchical abstractions, also known as skills — a type of
    temporally extended action (Sutton et al. 1999) to plan at a higher level,
    abstracting away from the lower-level details. We incorporate risk sensitivity,
    also referred to as Situational Awareness (SA), into hierarchical RL for the
    first time by defining and learning risk aware skills in a Probabilistic Goal
    Semi-Markov Decision Process (PG-SMDP). This is achieved using our novel
    Situational Awareness by Risk-Conscious Skills (SARiCoS) algorithm which comes
    with a theoretical convergence guarantee. We show in a RoboCup soccer domain
    that the learned risk aware skills exhibit complex human behaviors such as
    ‘time-wasting’ in a soccer game. In addition, the learned risk aware skills are
    able to mitigate reward-based model misspecification.

    Ranking academic institutions on potential paper acceptance in upcoming conferences

    Jobin Wilson, Ram Mohan, Muhammad Arif, Santanu Chaudhury, Brejesh Lall
    Comments: KDD 2016, KDD Cup 2016, Appeared in the KDD Cup Workshop 2016, this https URL
    Subjects: Artificial Intelligence (cs.AI); Digital Libraries (cs.DL); Learning (cs.LG)

    The crux of the problem in KDD Cup 2016 involves developing data mining
    techniques to rank research institutions based on publications. The rank
    importance of research institutions is derived from predictions on the
    number of full
    research papers that would potentially get accepted in upcoming top-tier
    conferences, utilizing public information on the web. This paper describes our
    solution to KDD Cup 2016. We used a two step approach in which we first
    identify full research papers corresponding to each conference of interest and
    then train two variants of exponential smoothing models to make predictions.
    Our solution achieves an overall score of 0.7508, while the winning submission
    scored 0.7656 in the overall results.
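
    The forecasting core is ordinary exponential smoothing; the sketch below
    shows the simplest variant on made-up counts (the feature engineering,
    paper matching and the two model variants are not reproduced here).

      def exp_smooth(series, alpha=0.5):
          """Simple exponential smoothing; returns a one-step-ahead forecast."""
          level = series[0]
          for x in series[1:]:
              level = alpha * x + (1 - alpha) * level
          return level

      papers_per_year = [12, 15, 11, 18, 20]  # illustrative counts, not Cup data
      print(exp_smooth(papers_per_year))      # forecast for the next conference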

    Multi-Objective Deep Reinforcement Learning

    Hossam Mossalam, Yannis M. Assael, Diederik M. Roijers, Shimon Whiteson
    Subjects: Artificial Intelligence (cs.AI)

    We propose Deep Optimistic Linear Support Learning (DOL) to solve
    high-dimensional multi-objective decision problems where the relative
    importances of the objectives are not known a priori. Using features from the
    high-dimensional inputs, DOL computes the convex coverage set containing all
    potential optimal solutions of the convex combinations of the objectives. To
    our knowledge, this is the first time that deep reinforcement learning has
    succeeded in learning multi-objective policies. In addition, we provide a
    testbed with two experiments to be used as a benchmark for deep multi-objective
    reinforcement learning.

    Solving Marginal MAP Problems with NP Oracles and Parity Constraints

    Yexiang Xue, Zhiyuan Li, Stefano Ermon, Carla P. Gomes, Bart Selman
    Subjects: Artificial Intelligence (cs.AI)

    Arising from many applications at the intersection of decision making and
    machine learning, Marginal Maximum A Posteriori (Marginal MAP) Problems unify
    the two main classes of inference, namely maximization (optimization) and
    marginal inference (counting), and are believed to have higher complexity than
    both of them. We propose XOR_MMAP, a novel approach to solve the Marginal MAP
    Problem, which represents the intractable counting subproblem with queries to
    NP oracles, subject to additional parity constraints. XOR_MMAP provides a
    constant-factor approximation to the Marginal MAP Problem, by encoding it as
    a single optimization problem whose size is polynomial in that of the
    original problem. We evaluate our
    approach in several machine learning and decision making applications, and show
    that our approach outperforms several state-of-the-art Marginal MAP solvers.
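
    For context, Marginal MAP maximizes over one subset of the variables while
    marginalizing (counting) over the rest; XOR_MMAP approximates the inner sum
    with NP-oracle queries under random parity (XOR) constraints:

      \mathbf{x}^{*} = \arg\max_{\mathbf{x}_{\max}} \sum_{\mathbf{x}_{\mathrm{sum}}} \Pr(\mathbf{x}_{\max}, \mathbf{x}_{\mathrm{sum}}).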

    Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models

    Ashwin K Vijayakumar, Michael Cogswell, Ramprasath R. Selvaraju, Qing Sun, Stefan Lee, David Crandall, Dhruv Batra
    Comments: 14 pages
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

    Neural sequence models are widely used to model time-series data in many
    fields. Equally ubiquitous is the usage of beam search (BS) as an approximate
    inference algorithm to decode output sequences from these models. BS explores
    the search space in a greedy left-right fashion retaining only the top-$B$
    candidates — resulting in sequences that differ only slightly from each other.
    Producing lists of nearly identical sequences is not only computationally
    wasteful but also typically fails to capture the inherent ambiguity of complex
    AI tasks. To overcome this problem, we propose Diverse Beam Search (DBS),
    an alternative to BS that decodes a list of diverse outputs by
    optimizing for a diversity-augmented objective. We observe that our method
    finds better top-1 solutions by controlling for the exploration and
    exploitation of the search space — implying that DBS is a better search
    algorithm. Moreover, these gains are achieved with minimal computational or
    memory overhead as compared to beam search. To demonstrate the broad
    applicability of our method, we present results on image captioning, machine
    translation and visual question generation using both standard quantitative
    metrics and qualitative human studies. Our method consistently outperforms BS
    and previously proposed techniques for diverse decoding from neural sequence
    models.

    Extrapolation and learning equations

    Georg Martius, Christoph H. Lampert
    Comments: 13 pages, 8 figures, 4 tables
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    In classical machine learning, regression is treated as a black box process
    of identifying a suitable function from a hypothesis set without attempting to
    gain insight into the mechanism connecting inputs and outputs. In the natural
    sciences, however, finding an interpretable function for a phenomenon is the
    prime goal, as it allows one to understand and generalize results. This paper
    proposes a novel type of function learning network, called equation learner
    (EQL), that can learn analytical expressions and is able to extrapolate to
    unseen domains. It is implemented as an end-to-end differentiable feed-forward
    network and allows for efficient gradient-based training. Due to sparsity
    regularization, concise interpretable expressions can be obtained. Often the
    true underlying source expression is identified.
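
    One EQL layer can be sketched as a linear map followed by analytic unit
    functions; the unit mix (identity, sine, cosine, pairwise products) follows
    our reading of the paper, and all sizes are illustrative.

      import torch
      import torch.nn as nn

      class EQLLayer(nn.Module):
          """Linear map followed by identity, sin, cos and product units."""
          def __init__(self, in_dim, units=4):
              super().__init__()
              self.units = units
              self.lin = nn.Linear(in_dim, 5 * units)  # 3 unary + 2 product inputs

          def forward(self, x):
              z = self.lin(x)
              u = self.units
              ident, s, c = z[:, :u], z[:, u:2*u], z[:, 2*u:3*u]
              prod = z[:, 3*u:4*u] * z[:, 4*u:]        # binary multiplication units
              return torch.cat([ident, torch.sin(s), torch.cos(c), prod], dim=1)

      layer = EQLLayer(in_dim=2)
      print(layer(torch.randn(5, 2)).shape)  # torch.Size([5, 16])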

    Towards an Ontology-Driven Blockchain Design for Supply Chain Provenance

    Henry M. Kim, Marek Laskowski
    Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

    An interesting research problem in our age of Big Data is that of determining
    provenance. Granular evaluation of provenance of physical goods–e.g. tracking
    ingredients of a pharmaceutical or demonstrating authenticity of luxury
    goods–has often not been possible with today’s items that are produced and
    transported in complex, inter-organizational, often internationally-spanning
    supply chains. Recent adoption of Internet of Things and Blockchain
    technologies give promise at better supply chain provenance. We are
    particularly interested in the blockchain as many favoured use cases of
    blockchain are for provenance tracking. We are also interested in applying
    ontologies as there has been some work done on knowledge provenance,
    traceability, and food provenance using ontologies. In this paper, we make a
    case for why ontologies can contribute to blockchain design. To support this
    case, we analyze a traceability ontology and translate some of its
    representations to smart contracts that execute a provenance trace and enforce
    traceability constraints on the Ethereum blockchain platform.

    Heuristic Approaches for Generating Local Process Models through Log Projections

    Niek Tax, Natalia Sidorova, Wil M. P. van der Aalst, Reinder Haakma
    Comments: paper accepted and to appear in the proceedings of the IEEE Symposium on Computational Intelligence and Data Mining (CIDM), special session on Process Mining, part of the Symposium Series on Computational Intelligence (SSCI)
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB)

    Local Process Model (LPM) discovery is focused on the mining of a set of
    process models where each model describes the behavior represented in the event
    log only partially, i.e. subsets of possible events are taken into account to
    create so-called local process models. Often such smaller models provide
    valuable insights into the behavior of the process, especially when no adequate
    and comprehensible single overall process model exists that is able to describe
    the traces of the process from start to end. The practical application of LPM
    discovery is however hindered by computational issues in the case of logs with
    many activities (problems may already occur when there are more than 17 unique
    activities). In this paper, we explore three heuristics to discover subsets of
    activities that lead to useful log projections with the goal of speeding up LPM
    discovery considerably while still finding high-quality LPMs. We found that a
    Markov clustering approach to create projection sets results in the largest
    improvement of execution time, with discovered LPMs still being better than
    with the use of randomly generated activity sets of the same size. Another
    heuristic, based on log entropy, yields a more moderate speedup, but enables
    the discovery of higher quality LPMs. The third heuristic, based on the
    relative information gain, shows unstable performance: for some data sets the
    speedup and LPM quality are higher than with the log entropy based method,
    while for other data sets there is no speedup at all.

    A New Theoretical and Technological System of Imprecise-Information Processing

    Shiyou Lian
    Comments: 14 pages, 8 figures
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    Imprecise-information processing will play an indispensable role in
    intelligent systems, especially in the anthropomorphic intelligent systems (as
    intelligent robots). A new theoretical and technological system of
    imprecise-information processing has been founded in Principles of
    Imprecise-Information Processing: A New Theoretical and Technological System[1]
    which is different from fuzzy technology. The system has a clear hierarchy
    and rigorous structure, which results from the formation principle of
    imprecise information and rests on solid mathematical and logical bases, and
    which has many advantages beyond fuzzy technology. The system provides a
    technological
    platform for relevant applications and lays a theoretical foundation for
    further research.

    Interpreting Neural Networks to Improve Politeness Comprehension

    Malika Aubakirova, Mohit Bansal
    Comments: To appear at EMNLP 2016
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    We present an interpretable neural network approach to predicting and
    understanding politeness in natural language requests. Our models are based on
    simple convolutional neural networks directly on raw text, avoiding any manual
    identification of complex sentiment or syntactic features, while performing
    better than such feature-based models from previous work. More importantly, we
    use the challenging task of politeness prediction as a testbed to next present
    a much-needed understanding of what these successful networks are actually
    learning. For this, we present several network visualizations based on
    activation clusters, first derivative saliency, and embedding space
    transformations, helping us automatically identify several subtle linguistic
    markers of politeness theories. Further, this analysis reveals multiple novel,
    high-scoring politeness strategies which, when added back as new features,
    reduce the accuracy gap between the original featurized system and the neural
    model, thus providing a clear quantitative interpretation of the success of
    these neural networks.

    A new selection strategy for selective cluster ensemble based on Diversity and Independency

    Muhammad Yousefnezhad, Ali Reihanian, Daoqiang Zhang, Behrouz Minaei-Bidgoli
    Comments: Accepted in Engineering Applications of Artificial Intelligence (EAAI) Journal
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    This research introduces a new strategy in cluster ensemble selection by
    using Independency and Diversity metrics. In recent years, Diversity and
    Quality, which are two metrics in evaluation procedure, have been used for
    selecting basic clustering results in the cluster ensemble selection. Although
    quality can improve the final results in cluster ensemble, it cannot control
    the procedures of generating basic results, which causes a gap in prediction of
    the generated basic results’ accuracy. Instead of quality, this paper
    introduces Independency as a supplementary method to be used in conjunction
    with Diversity. Therefore, this paper uses a heuristic metric, which is based
    on the procedure of converting code to graph in Software Testing, in order to
    calculate the Independency of two basic clustering algorithms. Moreover, a new
    modeling language, which we call “Clustering Algorithms Independency
    Language” (CAIL), is introduced in order to generate graphs which depict
    Independency of algorithms. Also, Uniformity, which is a new similarity metric,
    has been introduced for evaluating the diversity of basic results. As a
    credential, our experimental results on various standard data sets
    show that the proposed framework improves the accuracy of final results
    dramatically in comparison with other cluster ensemble methods.

    On Deductive Systems of AC Semantics for Rough Sets

    A. Mani
    Comments: 12 pages
    Subjects: Logic (math.LO); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)

    Antichain based semantics for general rough sets were introduced recently by
    the present author. In her paper two different semantics, one for general rough
    sets and another for general approximation spaces over quasi-equivalence
    relations, were developed. These semantics are improved and studied further
    from a lateral algebraic logic perspective in this research. The main results
    concern the structure of the algebras and deductive systems in the context.


    Information Retrieval

    Dynamic Trade-Off Prediction in Multi-Stage Retrieval Systems

    J. Shane Culpepper, Charles L. A. Clarke, Jimmy Lin
    Subjects: Information Retrieval (cs.IR)

    Modern multi-stage retrieval systems are comprised of a candidate generation
    stage followed by one or more reranking stages. In such an architecture, the
    quality of the final ranked list may not be sensitive to the quality of the
    initial candidate pool, especially in terms of early precision. This provides several
    opportunities to increase retrieval efficiency without significantly
    sacrificing effectiveness. In this paper, we explore a new approach to
    dynamically predicting two different parameters in the candidate generation
    stage which can directly affect the overall efficiency and effectiveness of the
    entire system. Previous work exploring this tradeoff has focused on global
    parameter settings that apply to all queries, even though optimal settings vary
    across queries. In contrast, we propose a technique which makes a parameter
    prediction that maximizes efficiency within an effectiveness envelope on a
    per-query basis, using only static pre-retrieval features. The query-specific
    tradeoff point between effectiveness and efficiency is decided using a
    classifier cascade that weighs possible efficiency gains against effectiveness
    losses over a range of possible parameter cutoffs to make the prediction. The
    interesting twist in our new approach is to train classifiers without requiring
    explicit relevance judgments. We show that our framework is generalizable by
    applying it to two different retrieval parameters – selecting k in common top-k
    query retrieval algorithms, and setting a quality threshold, $\rho$, for
    score-at-a-time approximate query evaluation algorithms. Experimental results
    show that substantial efficiency gains are achievable depending on the dynamic
    parameter choice. In addition, our framework provides a versatile tool that can
    be used to estimate the effectiveness-efficiency tradeoffs that are possible
    before selecting and tuning algorithms to make machine learned predictions.

    Optimal Download Cost of Private Information Retrieval for Arbitrary Message Length

    Hua Sun, Syed A. Jafar
    Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Information Retrieval (cs.IR)

    A private information retrieval scheme is a mechanism that allows a user to
    retrieve any one out of $K$ messages from $N$ non-communicating replicated
    databases, each of which stores all $K$ messages, without revealing anything
    about the identity of the desired message index to any individual database. If
    the size of each message is $L$ bits and the total download required by a PIR
    scheme from all $N$ databases is $D$ bits, then $D$ is called the download cost
    and the ratio $L/D$ is called an achievable rate. For fixed $K,N\in\mathbb{N}$,
    the capacity of PIR, denoted by $C$, is the supremum of achievable rates over
    all PIR schemes and over all message sizes, and was recently shown to be
    $C=(1+1/N+1/N^2+\cdots+1/N^{K-1})^{-1}$. In this work, for arbitrary $K, N$, we
    explore the minimum download cost $D_L$ across all PIR schemes (not restricted
    to linear schemes) for arbitrary message lengths $L$ under arbitrary choices of
    alphabet (not restricted to finite fields) for the message and download
    symbols. If the same $M$-ary alphabet is used for the message and download
    symbols, then we show that the optimal download cost in $M$-ary symbols is
    $D_L=\left\lceil\frac{L}{C}\right\rceil$. If the message symbols are in an
    $M$-ary alphabet and the downloaded symbols are in an $M'$-ary alphabet, then
    we show that the optimal download cost in $M'$-ary symbols satisfies
    $D_L\in\left\{\left\lceil\frac{L'}{C}\right\rceil,\left\lceil\frac{L'}{C}\right\rceil-1,\left\lceil\frac{L'}{C}\right\rceil-2\right\}$,
    where $L'=\lceil L\log_{M'}M\rceil$.
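
    As a worked instance of the stated formulas: with $N=2$ databases and $K=3$
    messages, the capacity is $C=(1+1/2+1/4)^{-1}=4/7$, so in the matched-alphabet
    case a message of length $L=5$ has optimal download cost
    $D_5=\lceil 5/(4/7)\rceil=\lceil 8.75\rceil=9$ symbols.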

    SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs

    Kaiwei Li, Jianfei Chen, Wenguang Chen, Jun Zhu
    Comments: 13 pages, 12 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)

    Latent Dirichlet Allocation (LDA) is a popular tool for analyzing discrete
    count data such as text and images. Applications require LDA to handle both
    large datasets and a large number of topics. Though distributed CPU systems
    have been used, GPU-based systems have emerged as a promising alternative
    because of the high computational power and memory bandwidth of GPUs. However,
    existing GPU-based LDA systems cannot support a large number of topics because
    they use algorithms on dense data structures whose time and space complexity is
    linear in the number of topics. In this paper, we propose SaberLDA, a GPU-based
    LDA system that implements a sparsity-aware algorithm to achieve sublinear time
    complexity and scales well to learn a large number of topics. To address the
    challenges introduced by sparsity, we propose a novel data layout, a new
    warp-based sampling kernel, and an efficient sparse count matrix updating
    algorithm that improves locality, makes efficient utilization of GPU warps, and
    reduces memory consumption. Experiments show that SaberLDA can learn from
    billion-token-scale data with up to 10,000 topics, which is almost two orders
    of magnitude larger than that of previous GPU-based systems. With a single
    GPU card, SaberLDA is able to learn 10,000 topics from a dataset of billions of
    tokens in a few hours, which was previously only achievable with clusters of
    tens of machines.
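
    The warp-based GPU kernel is the paper's contribution; the CPU sketch below
    only illustrates the general sparsity-aware decomposition (in the style of
    SparseLDA-type samplers) that makes the per-token sampling cost scale with
    the nonzero counts rather than with the number of topics. All names and the
    bucket bookkeeping are illustrative, not SaberLDA's code.

    import numpy as np

    def sample_topic(nd, nw, nk, alpha, beta, V, s_cache, rng):
        # nd: sparse dict topic->count for this document; nw: sparse dict
        # topic->count for this word; nk: dense topic totals (numpy array);
        # s_cache: precomputed smoothing mass sum_k alpha*beta/(nk[k]+V*beta),
        # which a real sampler would update incrementally.
        denom = nk + V * beta
        r_topics = list(nd.keys())                     # doc-sparse bucket
        r_mass = [nd[k] * beta / denom[k] for k in r_topics]
        q_topics = list(nw.keys())                     # word-sparse bucket
        q_mass = [nw[k] * (alpha + nd.get(k, 0.0)) / denom[k] for k in q_topics]
        u = rng.random() * (s_cache + sum(r_mass) + sum(q_mass))
        for k, m in zip(q_topics, q_mass):             # usually hit here
            u -= m
            if u <= 0:
                return k
        for k, m in zip(r_topics, r_mass):
            u -= m
            if u <= 0:
                return k
        probs = alpha * beta / denom                   # rare smoothing case
        return int(rng.choice(len(nk), p=probs / probs.sum()))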


    Computation and Language

    Very Deep Convolutional Networks for End-to-End Speech Recognition

    Yu Zhang, William Chan, Navdeep Jaitly
    Subjects: Computation and Language (cs.CL)

    Sequence-to-sequence models have shown success in end-to-end speech
    recognition. However, these models have only used shallow acoustic encoder
    networks. In our work, we successively train very deep convolutional networks
    to add more expressive power and better generalization for end-to-end ASR
    models. We apply network-in-network principles, batch normalization, residual
    connections and convolutional LSTMs to build very deep recurrent and
    convolutional structures. Our models exploit the spectral structure in the
    feature space and add computational depth without overfitting issues. We
    experiment with the WSJ ASR task and achieve 10.5% word error rate without any
    dictionary or language model, using a 15-layer deep network.

    Fully Character-Level Neural Machine Translation without Explicit Segmentation

    Jason Lee, Kyunghyun Cho, Thomas Hofmann
    Comments: 14 pages, 2 figures
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Most existing machine translation systems operate at the level of words,
    relying on explicit segmentation to extract tokens. We introduce a neural
    machine translation (NMT) model that maps a source character sequence to a
    target character sequence without any segmentation. We employ a character-level
    convolutional network with max-pooling at the encoder to reduce the length of
    source representation, allowing the model to be trained at a speed comparable
    to subword-level models while capturing local regularities. Our
    character-to-character model outperforms a recently proposed baseline with a
    subword-level encoder on WMT’15 DE-EN and CS-EN, and gives comparable
    performance on FI-EN and RU-EN. We then demonstrate that it is possible to
    share a single character-level encoder across multiple languages by training a
    model on a many-to-one translation task. In this multilingual setting, the
    character-level encoder significantly outperforms the subword-level encoder on
    all the language pairs. We also observe that the quality of the multilingual
    character-level translation even surpasses the models trained and tuned on one
    language pair, namely on CS-EN, FI-EN and RU-EN.
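
    A toy numpy version of the length-reducing encoder front end may help: it
    embeds characters, convolves, and max-pools with a stride, shrinking a
    length-T character sequence to roughly T/pool segment vectors. Shapes and
    widths here are placeholders, not the paper's architecture.

    import numpy as np

    def char_encoder(char_ids, emb, conv_w, pool=5):
        # emb: (vocab, d_emb); conv_w: (width, d_emb, d_out)
        x = emb[char_ids]                                  # (T, d_emb)
        T = x.shape[0]
        k = conv_w.shape[0]
        conv = np.stack([np.einsum('wd,wdo->o', x[t:t + k], conv_w)
                         for t in range(T - k + 1)])       # (T-k+1, d_out)
        conv = np.maximum(conv, 0)                         # ReLU
        n = conv.shape[0] // pool
        return conv[:n * pool].reshape(n, pool, -1).max(axis=1)

    rng = np.random.default_rng(0)
    ids = rng.integers(0, 100, size=40)                    # 40 characters
    out = char_encoder(ids, rng.normal(size=(100, 16)),
                       rng.normal(size=(3, 16, 32)))
    print(out.shape)                                       # (7, 32)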

    Modelling Sentence Pairs with Tree-structured Attentive Encoder

    Yao Zhou, Cong Liu, Yan Pan
    Comments: 10 pages, 3 figures, COLING2016
    Subjects: Computation and Language (cs.CL)

    We describe an attentive encoder that combines tree-structured recursive
    neural networks and sequential recurrent neural networks for modelling sentence
    pairs. Since existing attentive models exert attention on the sequential
    structure, we propose a way to incorporate attention into the tree topology.
    Specifically, given a pair of sentences, our attentive encoder uses the
    representation of one sentence, generated via an RNN, to guide the
    structural encoding of the other sentence on the dependency parse tree. We
    evaluate the proposed attentive encoder on three tasks: semantic similarity,
    paraphrase identification and true-false question selection. Experimental
    results show that our encoder outperforms all baselines and achieves
    state-of-the-art results on two tasks.

    A New Theoretical and Technological System of Imprecise-Information Processing

    Shiyou Lian
    Comments: 14 pages, 8 figures
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    Imprecise-information processing will play an indispensable role in
    intelligent systems, especially in the anthropomorphic intelligent systems (as
    intelligent robots). A new theoretical and technological system of
    imprecise-information processing, different from fuzzy technology, has been
    founded in Principles of Imprecise-Information Processing: A New Theoretical
    and Technological System [1]. The system has a clear hierarchy and rigorous
    structure, derives from the formation principle of imprecise information,
    rests on solid mathematical and logical bases, and has many advantages over
    fuzzy technology. The system provides a technological
    platform for relevant applications and lays a theoretical foundation for
    further research.

    A Dynamic Window Neural Network for CCG Supertagging

    Huijia Wu, Jiajun Zhang, Chengqing Zong
    Comments: 8 pages, 3 figures
    Subjects: Computation and Language (cs.CL)

    Combinatory Categorial Grammar (CCG) supertagging is the task of assigning
    lexical categories to each word in a sentence. Almost all previous methods use
    fixed context window sizes as input features. However, different tags usually
    rely on different context window sizes. This motivates us to build a
    supertagger with a dynamic window approach, which can be treated as an
    attention mechanism over the local contexts. Applying dropout to the dynamic
    filters can be seen as dropping words directly, which is superior to regular
    dropout on word embeddings. We use this approach to achieve state-of-the-art
    CCG supertagging performance on the standard test set.

    Open-Ended Visual Question-Answering

    Issey Masuda, Santiago Pascual de la Puente, Xavier Giro-i-Nieto
    Comments: Bachelor thesis report graded with A with honours at ETSETB Telecom BCN school, Universitat Politècnica de Catalunya (UPC). June 2016. Source code and models are publicly available at this http URL
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

    This thesis report studies methods to solve Visual Question-Answering (VQA)
    tasks with a Deep Learning framework. As a preliminary step, we explore Long
    Short-Term Memory (LSTM) networks used in Natural Language Processing (NLP) to
    tackle Question-Answering (text based). We then modify the previous model to
    accept an image as an input in addition to the question. For this purpose, we
    explore the VGG-16 and K-CNN convolutional neural networks to extract visual
    features from the image. These are merged with the word embedding or with a
    sentence embedding of the question to predict the answer. This work was
    successfully submitted to the Visual Question Answering Challenge 2016, where
    it achieved 53.62% accuracy on the test dataset. The developed software
    has followed the best programming practices and Python code style, providing a
    consistent baseline in Keras for different configurations.

    Interpreting Neural Networks to Improve Politeness Comprehension

    Malika Aubakirova, Mohit Bansal
    Comments: To appear at EMNLP 2016
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    We present an interpretable neural network approach to predicting and
    understanding politeness in natural language requests. Our models are based on
    simple convolutional neural networks directly on raw text, avoiding any manual
    identification of complex sentiment or syntactic features, while performing
    better than such feature-based models from previous work. More importantly, we
    use the challenging task of politeness prediction as a testbed to next present
    a much-needed understanding of what these successful networks are actually
    learning. For this, we present several network visualizations based on
    activation clusters, first derivative saliency, and embedding space
    transformations, helping us automatically identify several subtle linguistic
    markers of politeness theories. Further, this analysis reveals multiple novel,
    high-scoring politeness strategies which, when added back as new features,
    reduce the accuracy gap between the original featurized system and the neural
    model, thus providing a clear quantitative interpretation of the success of
    these neural networks.

    Enabling Medical Translation for Low-Resource Languages

    Ahmad Musleh, Nadir Durrani, Irina Temnikova, Preslav Nakov, Stephan Vogel, Osama Alsaad
    Comments: CICLING-2016: 17th International Conference on Intelligent Text Processing and Computational Linguistics, Keywords: Machine Translation, medical translation, doctor-patient communication, resource-poor languages, Hindi
    Subjects: Computation and Language (cs.CL)

    We present research towards bridging the language gap between migrant workers
    in Qatar and medical staff. In particular, we present the first steps towards
    the development of a real-world Hindi-English machine translation system for
    doctor-patient communication. As this is a low-resource language pair,
    especially for speech and for the medical domain, our initial focus has been on
    gathering suitable training data from various sources. We applied a variety of
    methods ranging from fully automatic extraction from the Web to manual
    annotation of test data. Moreover, we developed a method for automatically
    augmenting the training data with synthetically generated variants, which
    yielded a very sizable improvement of more than 3 BLEU points absolute.

    Mining the Web for Pharmacovigilance: the Case Study of Duloxetine and Venlafaxine

    Abbas Chokor, Abeed Sarker, Graciela Gonzalez
    Comments: Masters project report
    Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)

    Adverse reactions caused by drugs following their release into the market are
    among the leading causes of death in many countries. The rapid growth of
    electronically available health related information, and the ability to process
    large volumes of them automatically, using natural language processing (NLP)
    and machine learning algorithms, have opened new opportunities for
    pharmacovigilance. Surveys have found that more than 70% of US Internet users
    consult the Internet when they require medical information. In recent years,
    research in this area has addressed Adverse Drug Reaction (ADR)
    pharmacovigilance using social media, mainly Twitter and medical forums and
    websites. This paper shows the information that can be collected from a variety
    of Internet data sources and search engines, mainly Google Trends and Google
    Correlate. Considering the case study of two popular Major Depressive Disorder
    (MDD) drugs, Duloxetine and Venlafaxine, we provide a comparative analysis of
    their reactions using publicly-available alternative data sources.

    Computational linking theory

    Aaron Steven White, Drew Reisinger, Rachel Rudinger, Kyle Rawlins, Benjamin Van Durme
    Subjects: Computation and Language (cs.CL)

    A linking theory explains how verbs’ semantic arguments are mapped to their
    syntactic arguments—the inverse of the Semantic Role Labeling task from the
    shallow semantic parsing literature. In this paper, we develop the
    Computational Linking Theory framework as a method for implementing and testing
    linking theories proposed in the theoretical literature. We deploy this
    framework to assess two cross-cutting types of linking theory: local v. global
    models and categorical v. featural models. To further investigate the behavior
    of these models, we develop a measurement model in the spirit of previous work
    in semantic role induction: the Semantic Proto-Role Linking Model. We use this
    model, which implements a generalization of Dowty’s seminal Proto-Role Theory,
    to induce semantic proto-roles, which we compare to those Dowty proposes.

    A Semantic Analyzer for the Comprehension of the Spontaneous Arabic Speech

    Mourad Mars, Mounir Zrigui, Mohamed Belgacem, Anis Zouaghi
    Comments: Advances in Computer Science and Engineering. 12 pages 6 figures
    Journal-ref: Journal: Research in Computing Science Vol34, 2008, pp. 129-140,
    ISSN: 1870-4069
    Subjects: Computation and Language (cs.CL)

    This work is part of a large research project entitled “Oréodule” aimed at
    developing tools for automatic speech recognition, translation, and synthesis
    for Arabic language. Our attention has mainly been focused on an attempt to
    improve the probabilistic model on which our semantic decoder is based. To
    achieve this goal, we decided to test the influence of using pertinent
    context, and of integrating contextual data of different types, on the
    effectiveness of the semantic decoder. The findings are quite satisfactory.

    Latent Sequence Decompositions

    William Chan, Yu Zhang, Quoc Le, Navdeep Jaitly
    Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)

    We present the Latent Sequence Decompositions (LSD) framework. LSD decomposes
    sequences with variable-length output units as a function of both the input
    sequence and the output sequence. We present a training algorithm which samples
    valid extensions and an approximate decoding algorithm. We experiment with the
    Wall Street Journal speech recognition task. Our LSD model achieves 12.9% WER
    compared to a character baseline of 14.8% WER. When combined with a
    convolutional network on the encoder, we achieve 9.2% WER.

    Investigation of Synthetic Speech Detection Using Frame- and Segment-Specific Importance Weighting

    Ali Khodabakhsh, Cenk Demiroglu
    Subjects: Sound (cs.SD); Computation and Language (cs.CL)

    Speaker verification systems are vulnerable to spoofing attacks, which present
    a major problem for their real-life deployment. To date, most of the
    proposed synthetic speech detectors (SSDs) have weighted the importance of
    different segments of speech equally. However, different attack methods have
    different strengths and weaknesses and the traces that they leave may be short
    or long term acoustic artifacts. Moreover, those may occur for only particular
    phonemes or sounds. Here, we propose three algorithms that weigh
    likelihood-ratio scores of individual frames, phonemes, and sound-classes
    depending on their importance for the SSD. Significant improvement over the
    baseline system has been obtained for known attack methods that were used in
    training the SSDs. However, improvement with unknown attack types was not
    substantial. Thus, the types of distortion caused by the unknown systems were
    different and could not be captured better by the proposed SSDs than by the
    baseline SSD.
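
    The weighting idea itself is simple to state in code. Below is a minimal
    sketch: per-frame likelihood-ratio scores are fused with weights attached to
    each frame's phoneme or sound class; the unit inventory and the weights
    (learned on known attacks) are placeholders, not the paper's models.

    import numpy as np

    def weighted_llr_score(frame_scores, frame_units, unit_weights):
        # frame_scores: (T,) likelihood-ratio score per frame;
        # frame_units: (T,) phoneme/sound-class id per frame;
        # unit_weights: dict unit -> importance weight for the SSD.
        w = np.array([unit_weights.get(u, 1.0) for u in frame_units])
        return float(np.sum(w * frame_scores) / np.sum(w))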

    Personalizing a Dialogue System with Transfer Learning

    Kaixiang Mo, Shuangyin Li, Yu Zhang, Jiajun Li, Qiang Yang
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

    It is difficult to train a personalized task-oriented dialogue system because
    the data collected from each individual is often insufficient. Personalized
    dialogue systems trained on a small dataset can overfit and make it difficult
    to adapt to different user needs. One way to solve this problem is to consider
    a collection of multiple users’ data as a source domain and an individual
    user’s data as a target domain, and to perform a transfer learning from the
    source to the target domain. By following this idea, we propose
    “PETAL”(PErsonalized Task diALogue), a transfer-learning framework based on
    POMDP to learn a personalized dialogue system. The system first learns common
    dialogue knowledge from the source domain and then adapts this knowledge to the
    target user. This framework can avoid the negative transfer problem by
    considering differences between source and target users. The policy in the
    personalized POMDP can learn to choose different actions appropriately for
    different users. Experimental results on real-world coffee-shopping data and
    simulation data show that our personalized dialogue system can choose different
    optimal actions for different users, and thus effectively improve the dialogue
    quality under the personalized setting.

    Emergence of linguistic laws in human voice

    Ivan Gonzalez Torre, Bartolo Luque, Lucas Lacasa, Jordi Luque, Antoni Hernandez-Fernandez
    Comments: Submitted for publication
    Subjects: Physics and Society (physics.soc-ph); Computation and Language (cs.CL)

    Linguistic laws constitute one of the quantitative cornerstones of modern
    cognitive sciences and have been routinely investigated in written corpora, or
    in the equivalent transcription of oral corpora. This means that inferences of
    statistical patterns of language in acoustics are biased by the arbitrary,
    language-dependent segmentation of the signal, which virtually precludes
    comparative studies between human voice and other animal
    communication systems. Here we bridge this gap by proposing a method that
    allows such patterns to be measured in acoustic signals of arbitrary origin,
    without needing access to the underlying language corpus. The method has
    been applied to six different human languages, recovering successfully some
    well-known laws of human communication at timescales even below the phoneme and
    finding yet another link between complexity and criticality in a biological
    system. These methods further pave the way for new comparative studies in
    animal communication or the analysis of signals of unknown code.

    Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models

    Ashwin K Vijayakumar, Michael Cogswell, Ramprasath R. Selvaraju, Qing Sun, Stefan Lee, David Crandall, Dhruv Batra
    Comments: 14 pages
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

    Neural sequence models are widely used to model time-series data in many
    fields. Equally ubiquitous is the usage of beam search (BS) as an approximate
    inference algorithm to decode output sequences from these models. BS explores
    the search space in a greedy left-to-right fashion retaining only the top-$B$
    candidates, resulting in sequences that differ only slightly from each other.
    Producing lists of nearly identical sequences is not only computationally
    wasteful but also typically fails to capture the inherent ambiguity of complex
    AI tasks. To overcome this problem, we propose Diverse Beam Search
    (DBS), an alternative to BS that decodes a list of diverse outputs by
    optimizing for a diversity-augmented objective. We observe that our method
    finds better top-1 solutions by controlling for the exploration and
    exploitation of the search space, implying that DBS is a better search
    algorithm. Moreover, these gains are achieved with minimal computational or
    memory overhead as compared to beam search. To demonstrate the broad
    applicability of our method, we present results on image captioning, machine
    translation and visual question generation using both standard quantitative
    metrics and qualitative human studies. Our method consistently outperforms BS
    and previously proposed techniques for diverse decoding from neural sequence
    models.
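
    A compact sketch of one DBS decoding step may clarify the mechanism: beams
    are split into groups, and each group's scores are penalized by the tokens
    earlier groups already chose at this step (Hamming diversity). The
    simplifications (no end-of-sequence handling, a stand-in model function) are
    ours, not the paper's.

    import numpy as np

    def diverse_beam_step(next_logp, groups, vocab_size, lam=0.5):
        # groups: list of beam groups, each a list of (sequence, logprob);
        # next_logp(seq): the model's next-token log-probs over the vocabulary.
        counts = np.zeros(vocab_size)        # tokens chosen by earlier groups
        out = []
        for group in groups:
            cands = []
            for seq, lp in group:
                logp = next_logp(seq)
                aug = logp - lam * counts    # diversity-augmented score
                for tok in np.argsort(aug)[-len(group):]:
                    cands.append((seq + [int(tok)], lp + float(logp[tok]),
                                  lp + float(aug[tok])))
            cands.sort(key=lambda c: c[2], reverse=True)
            kept = cands[:len(group)]
            for seq, _, _ in kept:
                counts[seq[-1]] += 1         # penalize later groups
            out.append([(seq, lp) for seq, lp, _ in kept])
        return out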


    Distributed, Parallel, and Cluster Computing

    Multi-Message Broadcast in Dynamic Radio Networks

    Mohamad Ahmadi, Fabian Kuhn
    Comments: appears in ALGOSENSORS 2016
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    We continue the recent line of research studying information dissemination
    problems in adversarial dynamic radio networks. We give two generic algorithms
    which allow transforming generalized versions of single-message broadcast
    algorithms into multi-message broadcast algorithms. Based on these generic
    algorithms, we obtain multi-message broadcast algorithms for dynamic radio
    networks for a number of different dynamic network settings. For one of the
    modeling assumptions, our algorithms are complemented by a lower bound which
    shows that the upper bound is close to optimal.

    Hardening Cassandra Against Byzantine Failures

    Roy Friedman, Roni Licher
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Cassandra is one of the most widely used distributed data stores these days.
    Cassandra supports flexible consistency guarantees over a wide-column data
    access model and provides almost linear scale-out performance. This enables
    application developers to tailor the performance and availability of Cassandra
    to their exact application’s needs and required semantics. Yet, Cassandra is
    designed to withstand benign failures, and cannot cope with most forms of
    Byzantine attacks.

    In this work, we present an analysis of Cassandra’s vulnerabilities and
    propose protocols for hardening Cassandra against Byzantine failures. We
    examine several alternative design choices and compare between them both
    qualitatively and empirically by using the Yahoo! Cloud Serving Benchmark
    (YCSB) performance benchmark. We include incremental performance analysis for
    our algorithmic and cryptographic adjustments, supporting our design choices.

    Portage: Bringing Hackers' Wisdom to Science

    Guilherme Amadio, Benda Xu
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Providing users of HPC systems with a wide variety of up-to-date software
    packages is a challenging task. Large software stacks built from source are
    difficult to manage, requiring powerful package management tools. The Portage
    package manager from Gentoo is a highly flexible tool that offers a mature
    solution to this otherwise daunting task. The Gentoo Prefix project develops
    and maintains a way of installing Gentoo systems in non-standard locations,
    bringing the virtues of Gentoo to other operating systems. Here we demonstrate
    how a Gentoo Prefix installation can be used to cross compile software packages
    for the Intel Xeon Phi known as Knights Corner, as well as to manage large
    software stacks in HPC environments.

    Correction to the article "Dynamic power management in energy-aware computer networks and data intensive computing systems" published in "Future Generation Computer Systems" journal

    Andrzej Karbowski
    Comments: 7 pages
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI); Systems and Control (cs.SY)

    This paper indicates two errors in the formulation of the main optimization
    model in the article “Dynamic power management in energy-aware computer
    networks and data intensive computing systems” by Niewiadomska-Szynkiewicz et
    al. [FGCS, vol.37 (2014), pp.284-296] and shows how to fix them.

    SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs

    Kaiwei Li, Jianfei Chen, Wenguang Chen, Jun Zhu
    Comments: 13 pages, 12 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)

    Latent Dirichlet Allocation (LDA) is a popular tool for analyzing discrete
    count data such as text and images. Applications require LDA to handle both
    large datasets and a large number of topics. Though distributed CPU systems
    have been used, GPU-based systems have emerged as a promising alternative
    because of the high computational power and memory bandwidth of GPUs. However,
    existing GPU-based LDA systems cannot support a large number of topics because
    they use algorithms on dense data structures whose time and space complexity is
    linear in the number of topics. In this paper, we propose SaberLDA, a GPU-based
    LDA system that implements a sparsity-aware algorithm to achieve sublinear time
    complexity and scales well to learn a large number of topics. To address the
    challenges introduced by sparsity, we propose a novel data layout, a new
    warp-based sampling kernel, and an efficient sparse count matrix updating
    algorithm that improves locality, makes efficient utilization of GPU warps, and
    reduces memory consumption. Experiments show that SaberLDA can learn from
    billion-token-scale data with up to 10,000 topics, which is almost two orders
    of magnitude larger than that of previous GPU-based systems. With a single
    GPU card, SaberLDA is able to learn 10,000 topics from a dataset of billions of
    tokens in a few hours, which was previously only achievable with clusters of
    tens of machines.

    Verification of the Tree-Based Hierarchical Read-Copy Update in the Linux Kernel

    Lihao Liang, Paul E. McKenney, Daniel Kroening, Tom Melham
    Comments: 14 pages, 11 figures
    Subjects: Logic in Computer Science (cs.LO); Distributed, Parallel, and Cluster Computing (cs.DC); Operating Systems (cs.OS); Software Engineering (cs.SE)

    Read-Copy Update (RCU) is a scalable, high-performance Linux-kernel
    synchronization mechanism that runs low-overhead readers concurrently with
    updaters. Production-quality RCU implementations for multi-core systems are
    decidedly non-trivial. Given the ubiquity of Linux, a rare “million-year” bug
    can occur several times per day across the installed base. Stringent validation
    of RCU’s complex behaviors is thus critically important. Exhaustive testing is
    infeasible due to the exponential number of possible executions, which suggests
    use of formal verification.

    Previous verification efforts on RCU either focus on simple implementations
    or use modeling languages, the latter requiring error-prone manual translation
    that must be repeated frequently due to regular changes in the Linux kernel’s
    RCU implementation. In this paper, we first describe the implementation of Tree
    RCU in the Linux kernel. We then discuss how to construct a model directly from
    Tree RCU’s source code in C, and use the CBMC model checker to verify its
    safety and liveness properties. To the best of our knowledge, this is the first
    verification of a significant part of RCU’s source code, and is an important
    step towards integration of formal verification into the Linux kernel’s
    regression test suite.

    Scalable Construction of Text Indexes

    Timo Bingmann, Simon Gog, Florian Kurpicz
    Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

    The suffix array is the key to efficient solutions for a myriad of string
    processing problems in different application domains, like data compression,
    data mining, or bioinformatics. With the rapid growth of available data, suffix
    array construction algorithms had to be adapted to advanced computational
    models such as external memory and distributed computing. In this article, we
    present five suffix array construction algorithms utilizing the new algorithmic
    big data batch processing framework Thrill, which allows us to process input
    sizes in orders of magnitude that have not been considered before.
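
    For reference, the object being built can be defined in two lines of Python;
    the article's contribution is constructing it at scales where this naive
    approach (quadratic in the text length, before even counting comparisons) is
    hopeless.

    def suffix_array(s):
        # Sort all suffix start positions by the suffix text.
        return sorted(range(len(s)), key=lambda i: s[i:])

    print(suffix_array("banana"))   # [5, 3, 1, 0, 4, 2]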


    Learning

    Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data

    Jialei Wang, Jason D. Lee, Mehrdad Mahdavi, Mladen Kolar, Nathan Srebro
    Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

    Sketching techniques have become popular for scaling up machine learning
    algorithms by reducing the sample size or dimensionality of massive data sets,
    while still maintaining the statistical power of big data. In this paper, we
    study sketching from an optimization point of view: we first show that the
    iterative Hessian sketch is an optimization process with preconditioning, and
    develop an accelerated iterative Hessian sketch via searching the conjugate
    direction; we then establish primal-dual connections between the Hessian sketch
    and dual random projection, and apply the preconditioned conjugate gradient
    approach on the dual problem, which leads to the accelerated iterative dual
    random projection methods. Finally to tackle the challenges from both large
    sample size and high-dimensionality, we propose the primal-dual sketch, which
    iteratively sketches the primal and dual formulations. We show that using a
    logarithmic number of calls to solvers of small scale problem, primal-dual
    sketch is able to recover the optimum of the original problem up to arbitrary
    precision. The proposed algorithms are validated via extensive experiments on
    synthetic and real data sets, which complement our theoretical results.
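
    For the least-squares case, the iterative Hessian sketch admits a very short
    numpy rendering that illustrates the preconditioning view (a plain Gaussian
    sketch is used here for simplicity; this is a sketch of the idea, not the
    paper's accelerated variants).

    import numpy as np

    def iterative_hessian_sketch(A, b, m, iters=10, seed=0):
        # Solve min_x ||Ax - b||^2 using a sketched Hessian each round.
        rng = np.random.default_rng(seed)
        n, d = A.shape
        x = np.zeros(d)
        for _ in range(iters):
            S = rng.normal(size=(m, n)) / np.sqrt(m)   # fresh sketch
            SA = S @ A
            H = SA.T @ SA                              # sketched Hessian
            g = A.T @ (b - A @ x)                      # exact gradient
            x = x + np.linalg.solve(H, g)              # preconditioned step
        return x

    rng = np.random.default_rng(1)
    A = rng.normal(size=(2000, 50))
    b = A @ rng.normal(size=50) + 0.01 * rng.normal(size=2000)
    x = iterative_hessian_sketch(A, b, m=400)
    print(np.linalg.norm(A.T @ (A @ x - b)))           # near-zero gradient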

    Extrapolation and learning equations

    Georg Martius, Christoph H. Lampert
    Comments: 13 pages, 8 figures, 4 tables
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    In classical machine learning, regression is treated as a black box process
    of identifying a suitable function from a hypothesis set without attempting to
    gain insight into the mechanism connecting inputs and outputs. In the natural
    sciences, however, finding an interpretable function for a phenomenon is the
    prime goal, as it allows one to understand and generalize results. This paper
    proposes a novel type of function learning network, called equation learner
    (EQL), that can learn analytical expressions and is able to extrapolate to
    unseen domains. It is implemented as an end-to-end differentiable feed-forward
    network and allows for efficient gradient-based training. Due to sparsity
    regularization, concise interpretable expressions can be obtained. Often the
    true underlying source expression is identified.

    Heuristic Approaches for Generating Local Process Models through Log Projections

    Niek Tax, Natalia Sidorova, Wil M. P. van der Aalst, Reinder Haakma
    Comments: paper accepted and to appear in the proceedings of the IEEE Symposium on Computational Intelligence and Data Mining (CIDM), special session on Process Mining, part of the Symposium Series on Computational Intelligence (SSCI)
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB)

    Local Process Model (LPM) discovery is focused on the mining of a set of
    process models where each model describes the behavior represented in the event
    log only partially, i.e. subsets of possible events are taken into account to
    create so-called local process models. Often such smaller models provide
    valuable insights into the behavior of the process, especially when no adequate
    and comprehensible single overall process model exists that is able to describe
    the traces of the process from start to end. The practical application of LPM
    discovery is however hindered by computational issues in the case of logs with
    many activities (problems may already occur when there are more than 17 unique
    activities). In this paper, we explore three heuristics to discover subsets of
    activities that lead to useful log projections with the goal of speeding up LPM
    discovery considerably while still finding high-quality LPMs. We found that a
    Markov clustering approach to create projection sets results in the largest
    improvement of execution time, with discovered LPMs still being better than
    with the use of randomly generated activity sets of the same size. Another
    heuristic, based on log entropy, yields a more moderate speedup, but enables
    the discovery of higher quality LPMs. The third heuristic, based on the
    relative information gain, shows unstable performance: for some data sets the
    speedup and LPM quality are higher than with the log entropy based method,
    while for other data sets there is no speedup at all.

    A Gentle Tutorial of Recurrent Neural Network with Error Backpropagation

    Gang Chen
    Comments: 9 pages
    Subjects: Learning (cs.LG)

    We describe recurrent neural networks (RNNs), which have attracted great
    attention for sequential tasks, such as handwriting recognition, speech
    recognition and image-to-text conversion. However, compared to general feedforward neural
    networks, RNNs have feedback loops, which makes it a little hard to understand
    the backpropagation step. Thus, we focus on basics, especially the error
    backpropagation to compute gradients with respect to model parameters. Further,
    we go into detail on how error backpropagation algorithm is applied on long
    short-term memory (LSTM) by unfolding the memory unit.
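
    In the spirit of the tutorial (though not its exact notation), the core of
    backpropagation through time for a vanilla RNN with a per-step squared loss
    fits in a few lines of numpy:

    import numpy as np

    def rnn_bptt(x, y, Wx, Wh, Wy, lr=0.1):
        T = len(x)
        h = {-1: np.zeros(Wh.shape[0])}
        yhat = {}
        for t in range(T):                        # forward, unrolled in time
            h[t] = np.tanh(Wx @ x[t] + Wh @ h[t - 1])
            yhat[t] = Wy @ h[t]
        dWx, dWh, dWy = map(np.zeros_like, (Wx, Wh, Wy))
        dh_next = np.zeros_like(h[0])
        for t in reversed(range(T)):              # backward through time
            dy = yhat[t] - y[t]                   # d(squared loss)/d(output)
            dWy += np.outer(dy, h[t])
            dh = Wy.T @ dy + dh_next              # gradient from the future
            dz = (1 - h[t] ** 2) * dh             # through tanh
            dWx += np.outer(dz, x[t])
            dWh += np.outer(dz, h[t - 1])
            dh_next = Wh.T @ dz                   # the feedback loop's share
        for W, dW in ((Wx, dWx), (Wh, dWh), (Wy, dWy)):
            W -= lr * dW                          # in-place gradient step

    rng = np.random.default_rng(0)
    Wx, Wh, Wy = (rng.normal(scale=0.1, size=s)
                  for s in ((8, 4), (8, 8), (2, 8)))
    seq_x = [rng.normal(size=4) for _ in range(6)]
    seq_y = [rng.normal(size=2) for _ in range(6)]
    rnn_bptt(seq_x, seq_y, Wx, Wh, Wy)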

    Federated Optimization: Distributed Machine Learning for On-Device Intelligence

    Jakub Konečný, H. Brendan McMahan, Daniel Ramage, Peter Richtárik
    Comments: 38 pages
    Subjects: Learning (cs.LG)

    We introduce a new and increasingly relevant setting for distributed
    optimization in machine learning, where the data defining the optimization are
    unevenly distributed over an extremely large number of nodes. The goal is to
    train a high-quality centralized model. We refer to this setting as Federated
    Optimization. In this setting, communication efficiency is of the utmost
    importance and minimizing the number of rounds of communication is the
    principal goal.

    A motivating example arises when we keep the training data locally on users’
    mobile devices instead of logging it to a data center for training. In
    federated optimization, the devices are used as compute nodes performing
    computation on their local data in order to update a global model. We suppose
    that we have an extremely large number of devices in the network — as many as
    the number of users of a given service, each of which has only a tiny fraction
    of the total data available. In particular, we expect the number of data points
    available locally to be much smaller than the number of devices. Additionally,
    since different users generate data with different patterns, it is reasonable
    to assume that no device has a representative sample of the overall
    distribution.

    We show that existing algorithms are not suitable for this setting, and
    propose a new algorithm which shows encouraging experimental results for sparse
    convex problems. This work also sets a path for future research needed in the
    context of federated optimization.
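
    The setting (not the paper's specific algorithm) is easy to sketch: each
    round, devices refine the global model locally and the server aggregates the
    updates weighted by local data size. The linear-regression clients below are
    purely illustrative.

    import numpy as np

    def federated_round(w, clients, local_steps=5, lr=0.1):
        updates, sizes = [], []
        for X, y in clients:                     # each device's local data
            w_local = w.copy()
            for _ in range(local_steps):         # local gradient steps
                grad = X.T @ (X @ w_local - y) / len(y)
                w_local -= lr * grad
            updates.append(w_local)
            sizes.append(len(y))
        return np.average(updates, axis=0, weights=np.array(sizes, float))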

    Boost K-Means

    Wan-Lei Zhao, Cheng-Hao Deng, Chong-Wah Ngo
    Comments: 11 pages, 10 figures
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB)

    Due to its simplicity and versatility, k-means remains popular since it was
    proposed three decades ago. Since then, continuous efforts have been made to
    enhance its performance. Unfortunately, a good trade-off between quality and
    efficiency is hardly reached. In this paper, a novel k-means variant is
    presented. Different from most of k-means variants, the clustering procedure is
    explicitly driven by an objective function, which is feasible for the whole
    l2-space. The classic egg-chicken loop in k-means has been simplified to a pure
    stochastic optimization procedure. K-means therefore becomes simpler, faster
    and better. The effectiveness of this new variant has been studied extensively
    in different contexts, such as document clustering, nearest neighbor search and
    image clustering. Superior performance is observed across different scenarios.

    Automatic chemical design using a data-driven continuous representation of molecules

    Rafael Gómez-Bombarelli, David Duvenaud, José Miguel Hernández-Lobato, Jorge Aguilera-Iparraguirre, Timothy D. Hirzel, Ryan P. Adams, Alán Aspuru-Guzik
    Comments: 23 pages, 15 figures
    Subjects: Learning (cs.LG); Chemical Physics (physics.chem-ph)

    We report a method to convert discrete representations of molecules to and
    from a multidimensional continuous representation. This generative model allows
    efficient search and optimization through open-ended spaces of chemical
    compounds. We train deep neural networks on hundreds of thousands of existing
    chemical structures to construct two coupled functions: an encoder and a
    decoder. The encoder converts the discrete representation of a molecule into a
    real-valued continuous vector, and the decoder converts these continuous
    vectors back to the discrete representation from this latent space. Continuous
    representations allow us to automatically generate novel chemical structures by
    performing simple operations in the latent space, such as decoding random
    vectors, perturbing known chemical structures, or interpolating between
    molecules. Continuous representations also allow the use of powerful
    gradient-based optimization to efficiently guide the search for optimized
    functional compounds. We demonstrate our method in the design of drug-like
    molecules as well as organic light-emitting diodes.

    Equality of Opportunity in Supervised Learning

    Moritz Hardt, Eric Price, Nathan Srebro
    Subjects: Learning (cs.LG)

    We propose a criterion for discrimination against a specified sensitive
    attribute in supervised learning, where the goal is to predict some target
    based on available features. Assuming data about the predictor, target, and
    membership in the protected group are available, we show how to optimally
    adjust any learned predictor so as to remove discrimination according to our
    definition. Our framework also improves incentives by shifting the cost of poor
    classification from disadvantaged groups to the decision maker, who can respond
    by improving the classification accuracy.

    In line with other studies, our notion is oblivious: it depends only on the
    joint statistics of the predictor, the target and the protected attribute, but
    not on interpretation of individual features. We study the inherent limits of
    defining and identifying biases based on such oblivious measures, outlining
    what can and cannot be inferred from different oblivious tests.

    We illustrate our notion using a case study of FICO credit scores.
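
    One simple instance of such a post-hoc adjustment, sketched below, picks a
    per-group threshold on an existing predictor's scores so that every protected
    group attains the same true positive rate; the paper's method covers the
    general optimal adjustment, so this toy version only conveys the flavor.

    import numpy as np

    def equal_tpr_thresholds(scores, labels, groups, target_tpr=0.8):
        # For each group, threshold the scores of its true positives so that
        # a target_tpr fraction of them is accepted.
        thresholds = {}
        for g in np.unique(groups):
            pos = np.sort(scores[(groups == g) & (labels == 1)])
            thresholds[g] = pos[int((1 - target_tpr) * len(pos))]
        return thresholds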

    Latent Sequence Decompositions

    William Chan, Yu Zhang, Quoc Le, Navdeep Jaitly
    Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)

    We present the Latent Sequence Decompositions (LSD) framework. LSD decomposes
    sequences with variable-length output units as a function of both the input
    sequence and the output sequence. We present a training algorithm which samples
    valid extensions and an approximate decoding algorithm. We experiment with the
    Wall Street Journal speech recognition task. Our LSD model achieves 12.9% WER
    compared to a character baseline of 14.8% WER. When combined with a
    convolutional network on the encoder, we achieve 9.2% WER.

    Fully Character-Level Neural Machine Translation without Explicit Segmentation

    Jason Lee, Kyunghyun Cho, Thomas Hofmann
    Comments: 14 pages, 2 figures
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Most existing machine translation systems operate at the level of words,
    relying on explicit segmentation to extract tokens. We introduce a neural
    machine translation (NMT) model that maps a source character sequence to a
    target character sequence without any segmentation. We employ a character-level
    convolutional network with max-pooling at the encoder to reduce the length of
    source representation, allowing the model to be trained at a speed comparable
    to subword-level models while capturing local regularities. Our
    character-to-character model outperforms a recently proposed baseline with a
    subword-level encoder on WMT’15 DE-EN and CS-EN, and gives comparable
    performance on FI-EN and RU-EN. We then demonstrate that it is possible to
    share a single character-level encoder across multiple languages by training a
    model on a many-to-one translation task. In this multilingual setting, the
    character-level encoder significantly outperforms the subword-level encoder on
    all the language pairs. We also observe that the quality of the multilingual
    character-level translation even surpasses the models trained and tuned on one
    language pair, namely on CS-EN, FI-EN and RU-EN.

    Distributed Convex Optimization with Many Convex Constraints

    Joachim Giesen, Sören Laue
    Subjects: Optimization and Control (math.OC); Learning (cs.LG); Numerical Analysis (cs.NA); Machine Learning (stat.ML)

    We address the problem of solving convex optimization problems with many
    convex constraints in a distributed setting. Our approach is based on an
    extension of the alternating direction method of multipliers (ADMM) that
    recently gained a lot of attention in the Big Data context. Although it was
    invented decades ago, ADMM has so far been applicable only to unconstrained
    problems and problems with linear equality or inequality constraints. Our
    extension can handle arbitrary inequality constraints directly. It combines the
    ability of ADMM to solve convex optimization problems in a distributed setting
    with the ability of the Augmented Lagrangian method to solve constrained
    optimization problems, and as we show, it inherits the convergence guarantees
    of ADMM and the Augmented Lagrangian method.

    A General Framework for Content-enhanced Network Representation Learning

    Xiaofei Sun, Jiang Guo, Xiao Ding, Ting Liu
    Subjects: Social and Information Networks (cs.SI); Learning (cs.LG)

    This paper investigates the problem of network embedding, which aims at
    learning low-dimensional vector representation of nodes in networks. Most
    existing network embedding methods rely solely on the network structure, i.e.,
    the linkage relationships between nodes, but ignore the rich content
    information associated with them, which is common in real-world networks and
    beneficial to describing the characteristics of a node. In this paper, we
    propose content-enhanced network embedding (CENE), which is capable of jointly
    leveraging the network structure and the content information. Our approach
    integrates text modeling and structure modeling in a general framework by
    treating the content information as a special kind of node. Experiments on
    several real-world networks with application to node classification show that
    our models outperform all existing network embedding methods, demonstrating the
    merits of content information and joint learning.

    Personalizing a Dialogue System with Transfer Learning

    Kaixiang Mo, Shuangyin Li, Yu Zhang, Jiajun Li, Qiang Yang
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

    It is difficult to train a personalized task-oriented dialogue system because
    the data collected from each individual is often insufficient. Personalized
    dialogue systems trained on a small dataset can overfit and make it difficult
    to adapt to different user needs. One way to solve this problem is to consider
    a collection of multiple users’ data as a source domain and an individual
    user’s data as a target domain, and to perform a transfer learning from the
    source to the target domain. By following this idea, we propose
    “PETAL”(PErsonalized Task diALogue), a transfer-learning framework based on
    POMDP to learn a personalized dialogue system. The system first learns common
    dialogue knowledge from the source domain and then adapts this knowledge to the
    target user. This framework can avoid the negative transfer problem by
    considering differences between source and target users. The policy in the
    personalized POMDP can learn to choose different actions appropriately for
    different users. Experimental results on real-world coffee-shopping data and
    simulation data show that our personalized dialogue system can choose different
    optimal actions for different users, and thus effectively improve the dialogue
    quality under the personalized setting.

    Ranking academic institutions on potential paper acceptance in upcoming conferences

    Jobin Wilson, Ram Mohan, Muhammad Arif, Santanu Chaudhury, Brejesh Lall
    Comments: KDD 2016, KDD Cup 2016, Appeared in the KDD Cup Workshop 2016,this https URL
    Subjects: Artificial Intelligence (cs.AI); Digital Libraries (cs.DL); Learning (cs.LG)

    The crux of the problem in KDD Cup 2016 involves developing data mining
    techniques to rank research institutions based on publications. The rank
    importance of research institutions is derived from predictions of the number
    of full research papers that would potentially get accepted at upcoming
    top-tier conferences, utilizing public information on the web. This paper describes our
    solution to KDD Cup 2016. We used a two step approach in which we first
    identify full research papers corresponding to each conference of interest and
    then train two variants of exponential smoothing models to make predictions.
    Our solution achieves an overall score of 0.7508, while the winning submission
    scored 0.7656 in the overall results.

    Robust Bayesian Compressed Sensing

    Qian Wan, Huiping Duan, Jun Fang, Hongbin Li
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We consider the problem of robust compressed sensing whose objective is to
    recover a high-dimensional sparse signal from compressed measurements corrupted
    by outliers. A new sparse Bayesian learning method is developed for robust
    compressed sensing. The basic idea of the proposed method is to identify and
    remove the outliers from sparse signal recovery. To automatically identify the
    outliers, we employ a set of binary indicator hyperparameters to indicate which
    observations are outliers. These indicator hyperparameters are treated as
    random variables and assigned a beta process prior such that their values are
    confined to be binary. In addition, a Gaussian-inverse Gamma prior is imposed
    on the sparse signal to promote sparsity. Based on this hierarchical prior
    model, we develop a variational Bayesian method to estimate the indicator
    hyperparameters as well as the sparse signal. Simulation results show that the
    proposed method achieves a substantial performance improvement over existing
    robust compressed sensing techniques.

    Dataiku's Solution to SPHERE's Activity Recognition Challenge

    Maxime Voisin, Leo Dreyfus-Schmidt, Pierre Gutierrez, Samuel Ronsin, Marc Beillevaire
    Comments: 5 pages
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Our team won the second prize of the Safe Aging with SPHERE Challenge
    organized by SPHERE, in conjunction with ECML-PKDD and Driven Data. The goal of
    the competition was to recognize activities performed by humans, using sensor
    data. This paper presents our solution. It is based on rich pre-processing
    and state-of-the-art machine learning methods. From the raw training data, we
    generate a synthetic train set with the same statistical characteristics as the
    test set. We then perform feature engineering. The machine learning modeling
    part is based on stacking weak learners through a grid searched XGBoost
    algorithm. Finally, we use post-processing to smooth our predictions over time.

    A new selection strategy for selective cluster ensemble based on Diversity and Independency

    Muhammad Yousefnezhad, Ali Reihanian, Daoqiang Zhang, Behrouz Minaei-Bidgoli
    Comments: Accepted in Engineering Applications of Artificial Intelligence (EAAI) Journal
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    This research introduces a new strategy in cluster ensemble selection by
    using Independency and Diversity metrics. In recent years, Diversity and
    Quality, which are two metrics in evaluation procedure, have been used for
    selecting basic clustering results in the cluster ensemble selection. Although
    quality can improve the final results in cluster ensemble, it cannot control
    the procedures of generating basic results, which causes a gap in prediction of
    the generated basic results’ accuracy. Instead of quality, this paper
    introduces Independency as a supplementary method to be used in conjunction
    with Diversity. Therefore, this paper uses a heuristic metric, which is based
    on the procedure of converting code to graph in Software Testing, in order to
    calculate the Independency of two basic clustering algorithms. Moreover, a new
    modeling language, which we call “Clustering Algorithms Independency
    Language” (CAIL), is introduced in order to generate graphs which depict the
    Independency of algorithms. Also, Uniformity, a new similarity metric,
    has been introduced for evaluating the diversity of basic results. Our
    experimental results on various standard data sets
    show that the proposed framework improves the accuracy of final results
    dramatically in comparison with other cluster ensemble methods.

    Revisiting Multiple Instance Neural Networks

    Xinggang Wang, Yongluan Yan, Peng Tang, Xiang Bai, Wenyu Liu
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Recently neural networks and multiple instance learning are both attractive
    topics in Artificial Intelligence related research fields. Deep neural networks
    have achieved great success in supervised learning problems, and multiple
    instance learning as a typical weakly-supervised learning method is effective
    for many applications in computer vision, biometrics, natural language
    processing, etc. In this paper, we revisit the problem of solving multiple
    instance learning problems using neural networks, which are appealing for this
    task. Multiple instance neural networks perform multiple instance learning in
    an end-to-end way: they take a bag with a variable number of instances as
    input and directly output the bag label. All of the parameters in a multiple
    instance network can be optimized via back-propagation. We propose a new
    multiple instance neural network to
    learn bag representations, which is different from the existing multiple
    instance neural networks that focus on estimating instance labels. In addition,
    recent tricks developed in deep learning have been studied in multiple instance
    networks; we find that deep supervision is effective for boosting bag
    classification accuracy. In the experiments, the proposed multiple instance
    networks achieve state-of-the-art or competitive performance on several MIL
    benchmarks. Moreover, they are extremely fast for both testing and training,
    e.g., taking only 0.0003 seconds to predict a bag and a few seconds to train
    on a MIL dataset on a moderate CPU.
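
    The bag-embedding idea reduces to a permutation-invariant pooling over
    instance features; a toy forward pass is sketched below (the paper's networks
    are deeper and trained end-to-end, so this only shows the structure).

    import numpy as np

    def mil_bag_forward(bag, W1, w2):
        # bag: (n_instances, d_in); embed instances, max-pool across the bag
        # to obtain a bag representation, then score it.
        h = np.maximum(bag @ W1, 0)                   # (n_instances, d_hidden)
        bag_repr = h.max(axis=0)                      # permutation-invariant
        return 1.0 / (1.0 + np.exp(-bag_repr @ w2))   # bag-level probability

    rng = np.random.default_rng(0)
    bag = rng.normal(size=(7, 10))                    # a bag of 7 instances
    print(mil_bag_forward(bag, rng.normal(size=(10, 16)),
                          rng.normal(size=16)))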

    SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs

    Kaiwei Li, Jianfei Chen, Wenguang Chen, Jun Zhu
    Comments: 13 pages, 12 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)

    Latent Dirichlet Allocation (LDA) is a popular tool for analyzing discrete
    count data such as text and images. Applications require LDA to handle both
    large datasets and a large number of topics. Though distributed CPU systems
    have been used, GPU-based systems have emerged as a promising alternative
    because of the high computational power and memory bandwidth of GPUs. However,
    existing GPU-based LDA systems cannot support a large number of topics because
    they use algorithms on dense data structures whose time and space complexity is
    linear in the number of topics. In this paper, we propose SaberLDA, a GPU-based
    LDA system that implements a sparsity-aware algorithm to achieve sublinear time
    complexity and scales well to learn a large number of topics. To address the
    challenges introduced by sparsity, we propose a novel data layout, a new
    warp-based sampling kernel, and an efficient sparse count matrix updating
    algorithm that improves locality, makes efficient utilization of GPU warps, and
    reduces memory consumption. Experiments show that SaberLDA can learn from
    billion-token-scale data with up to 10,000 topics, which is almost two orders
    of magnitude larger than that of previous GPU-based systems. With a single
    GPU card, SaberLDA is able to learn 10,000 topics from a dataset of billions of
    tokens in a few hours, which was previously only achievable with clusters of
    tens of machines.


    Information Theory

    Optimal Download Cost of Private Information Retrieval for Arbitrary Message Length

    Hua Sun, Syed A. Jafar
    Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Information Retrieval (cs.IR)

    A private information retrieval scheme is a mechanism that allows a user to
    retrieve any one out of $K$ messages from $N$ non-communicating replicated
    databases, each of which stores all $K$ messages, without revealing anything
    about the identity of the desired message index to any individual database. If
    the size of each message is $L$ bits and the total download required by a PIR
    scheme from all $N$ databases is $D$ bits, then $D$ is called the download cost
    and the ratio $L/D$ is called an achievable rate. For fixed
    $K, N \in \mathbb{N}$, the capacity of PIR, denoted by $C$, is the supremum
    of achievable rates over all PIR schemes and over all message sizes, and was
    recently shown to be $C = (1 + 1/N + 1/N^2 + \cdots + 1/N^{K-1})^{-1}$. In
    this work, for arbitrary $K, N$, we explore the minimum download cost $D_L$
    across all PIR schemes (not restricted to linear schemes) for arbitrary
    message lengths $L$ under arbitrary choices of alphabet (not restricted to
    finite fields) for the message and download symbols. If the same $M$-ary
    alphabet is used for the message and download symbols, then we show that the
    optimal download cost in $M$-ary symbols is $D_L = \lceil L/C \rceil$. If
    the message symbols are in an $M$-ary alphabet and the downloaded symbols
    are in an $M'$-ary alphabet, then we show that the optimal download cost in
    $M'$-ary symbols is
    $D_L \in \{\lceil L'/C \rceil, \lceil L'/C \rceil - 1, \lceil L'/C \rceil - 2\}$,
    where $L' = \lceil L \log_{M'} M \rceil$.
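
    Both the capacity expression and the ceiling formula are easy to evaluate
    numerically. The short sketch below (illustrative, covering only the
    same-alphabet case) computes $C$ and the resulting optimal download cost:

        from math import ceil

        def pir_capacity(N, K):
            """C = (1 + 1/N + ... + 1/N^(K-1))^(-1)."""
            return 1.0 / sum(N ** -i for i in range(K))

        def download_cost(L, N, K):
            """Optimal download cost in symbols when message and download
            symbols share the same alphabet: D_L = ceil(L / C)."""
            return ceil(L / pir_capacity(N, K))

        print(pir_capacity(2, 2))       # 2/3 for N=2 databases, K=2 messages
        print(download_cost(1, 2, 2))   # even a single symbol costs 2 downloads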

    Analysis of Downlink and Uplink Decoupling in Dense Cellular Networks

    Alexis I. Aravanis, Olga Munoz, Antonio Pascual-Iserte, Josep Vidal
    Comments: 6 pages, 4 figures, accepted for publication: “21st IEEE International Workshop on Computer Aided Modelling and Design of Communication Links and Networks, Toronto, Canada, 2016”
    Subjects: Information Theory (cs.IT)

    Decoupling uplink (UL) and downlink (DL) is a new architectural paradigm
    where DL and UL are not constrained to be associated to the same base station
    (BS). Building upon this paradigm, the goal of the present paper is to
    provide lower, albeit tight, bounds for the ergodic UL capacity of a
    decoupled cellular network. The analysis is performed for a scenario
    consisting of a macro BS and a set of small cells (SCs) whose positions are
    selected randomly according to a Poisson point process of a given spatial
    density. Based on this analysis, simple closed-form bounds are derived. The
    devised bounds are employed
    to compare the performance of the decoupled case versus a set of benchmark
    cases, namely the coupled case, and the situations of having either a single
    macro BS or only SCs. This comparison provides valuable insights regarding
    the behavior and performance of such networks, yielding simpler expressions
    for the ergodic UL capacity as a function of the distances to the macro BS
    and the density of SCs. These expressions constitute a simple guide to the
    minimum degree of densification that guarantees the Quality of Service
    (QoS) objectives of the network, and thus a tool of significant practical
    and commercial value to the network operator.

    Achievable Rate and Energy Efficiency of Hybrid and Digital Beamforming Receivers with Low Resolution ADC

    Kilian Roth, Josef A. Nossek
    Subjects: Information Theory (cs.IT)

    For 5G it will be important to leverage the available millimeter wave
    spectrum. To achieve approximately omnidirectional coverage with an
    effective antenna aperture similar to state-of-the-art cellular systems, an
    antenna array is required at both the mobile and the base station. Due to
    the large bandwidth and the inefficient amplifiers available in CMOS for
    mmWave, the analog front-end of a receiver with a large number of antennas
    becomes especially power hungry. Two main solutions exist to reduce the
    power consumption: Hybrid BeamForming (HBF) and Digital BeamForming (DBF)
    with low resolution ADCs; hybrid beamforming can also be combined with low
    resolution ADCs. This paper compares the spectral and energy efficiency of
    the chosen RF-frontend configurations over a channel with multipath
    propagation. In contrast to previous publications, we take the spatial
    correlation of the quantization noise into account. We show that low
    resolution ADCs are robust to small Automatic Gain Control (AGC)
    imperfections, and that in the low SNR regime DBF, even with 1-2 bit
    resolution, outperforms HBF. Considering spectral and energy efficiency
    jointly, DBF with 3-5 bit resolution achieves the best ratio of spectral
    efficiency to RF receiver frontend power consumption over a very wide SNR
    region. The power consumption model is based on components reported in the
    literature.
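
    For a first-order feel of such comparisons, a common tool is the scalar
    additive quantization noise model, evaluated in the sketch below. Note that
    this toy model ignores the spatial correlation of the quantization noise
    that the paper explicitly accounts for; the distortion factors are standard
    tabulated values for a Lloyd-Max quantizer with Gaussian input, not numbers
    from the paper.

        import numpy as np

        # Distortion factors of a scalar minimum-MSE (Lloyd-Max) quantizer for
        # a Gaussian input; standard tabulated values, not taken from this paper.
        ETA = {1: 0.3634, 2: 0.1175, 3: 0.03454, 4: 0.009497, 5: 0.002499}

        def aqnm_rate(snr_db, bits):
            """Achievable rate under the scalar additive quantization noise
            model: output = rho * input + q, with rho = 1 - eta and q treated
            as uncorrelated noise of variance rho * (1 - rho) * E[input^2].
            The model is known to be optimistic at high SNR for coarse ADCs."""
            snr = 10.0 ** (snr_db / 10.0)
            rho = 1.0 - ETA[bits]
            sinr = rho * snr / (rho + (1.0 - rho) * (snr + 1.0))
            return np.log2(1.0 + sinr)

        for b in sorted(ETA):
            print(b, round(aqnm_rate(0.0, b), 3), round(aqnm_rate(20.0, b), 3))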

    A Greedy Blind Calibration Method for Compressed Sensing with Unknown Sensor Gains

    Valerio Cambareri, Laurent Jacques
    Comments: 6 pages, 2 figures. A 5-page version of this draft was submitted to IEEE ICASSP 2017
    Subjects: Information Theory (cs.IT); Optimization and Control (math.OC)

    The realisation of sensing modalities based on the principles of compressed
    sensing is often hindered by discrepancies between the mathematical model of
    its sensing operator, which is necessary during signal recovery, and its actual
    physical implementation, whose values may differ significantly from the assumed
    model. In this paper we tackle the bilinear inverse problem of recovering a
    sparse input signal and some unknown, unstructured multiplicative factors
    affecting the sensors that capture each compressive measurement. Our
    methodology relies on collecting a few snapshots under new draws of the sensing
    operator, and applying a greedy algorithm based on projected gradient descent
    and the principles of iterative hard thresholding. We empirically explore
    the sample complexity requirements of this algorithm by testing its phase
    transition, and show in a practically relevant instance of compressive
    imaging that the exact solution can be obtained with only a few snapshots.
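
    A much-simplified version of such a greedy scheme fits in a few lines:
    alternate projected gradient steps on the sparse signal (hard thresholding)
    and on the gains. The sketch below is only schematic; the positive,
    mean-one projection of the gains is our illustrative way to fix the scale
    ambiguity, and the step size and iteration count are untuned.

        import numpy as np

        def hard_threshold(x, s):
            """Keep only the s largest-magnitude entries of x."""
            out = np.zeros_like(x)
            keep = np.argsort(np.abs(x))[-s:]
            out[keep] = x[keep]
            return out

        def blind_iht(Y, A, s, iters=1000, step=0.02):
            """Alternating projected gradient for y_l = diag(g) @ A[l] @ x.
            A: (L, m, n) sensing snapshots, Y: (L, m) measurements."""
            L, m, n = A.shape
            x, g = np.zeros(n), np.ones(m)
            for _ in range(iters):
                # gradient step on x with the current gains, then threshold
                R = g[None, :] * np.einsum('lmn,n->lm', A, x) - Y
                grad_x = np.einsum('lm,lmn->n', R * g[None, :], A)
                x = hard_threshold(x - step * grad_x, s)
                # gradient step on g with the updated x, then project
                Ax = np.einsum('lmn,n->lm', A, x)
                R = g[None, :] * Ax - Y
                g = np.maximum(g - step * np.sum(R * Ax, axis=0), 1e-6)
                g /= g.mean()    # fix the scale ambiguity (our choice)
            return x, g

        rng = np.random.default_rng(2)
        L, m, n, s = 8, 32, 64, 4
        A = rng.normal(size=(L, m, n)) / np.sqrt(m)
        x_true = np.zeros(n)
        x_true[rng.choice(n, s, replace=False)] = rng.normal(size=s)
        g_true = 1.0 + 0.2 * rng.random(m)
        g_true /= g_true.mean()
        Y = g_true[None, :] * np.einsum('lmn,n->lm', A, x_true)
        x_hat, g_hat = blind_iht(Y, A, s)
        print(np.linalg.norm(x_hat - x_true), np.linalg.norm(g_hat - g_true))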

    Uplink Transmission Design with Massive Machine Type Devices in Tactile Internet

    Changyang She, Chenyang Yang, Tony Q. S. Quek
    Comments: Accepted by IEEE Globecom 2016
    Subjects: Information Theory (cs.IT)

    In this work, we study how to design uplink transmission with massive machine
    type devices in tactile internet, where ultra-short delay and ultra-high
    reliability are required. To characterize the transmission reliability
    constraint, we employ a two-state transmission model based on the achievable
    rate with finite blocklength channel codes. If the channel gain exceeds a
    threshold, a short packet can be transmitted with a small error probability;
    otherwise there is a packet loss. To exploit frequency diversity, we assign
    multiple subchannels to each active device, from which the device selects a
    subchannel with channel gain exceeding the threshold for transmission. To
    determine the total bandwidth required to ensure the reliability, we
    jointly optimize the number of subchannels, the bandwidth of each
    subchannel, and the threshold for each device so as to minimize the total
    bandwidth of the system for a given number of antennas at the base station.
    Numerical results show that with 1000 devices in one cell, the required
    bandwidth of the optimized policy is acceptable even for prevalent cellular
    systems. Furthermore, we show that by increasing the number of antennas at
    the BS, frequency diversity becomes unnecessary and the required bandwidth
    is reduced.
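
    Under the two-state model, a packet is lost only if none of the assigned
    subchannels clears the gain threshold. Assuming i.i.d. Rayleigh fading
    across subchannels (a simplification made here for illustration), the loss
    probability decays geometrically with the number of subchannels:

        import numpy as np

        def loss_probability(threshold, mean_gain, n_subchannels):
            """Probability that no assigned subchannel clears the gain
            threshold, assuming i.i.d. Rayleigh fading (exponential gains)."""
            p_below = 1.0 - np.exp(-threshold / mean_gain)
            return p_below ** n_subchannels

        for M in (1, 2, 4, 8):
            print(M, loss_probability(threshold=2.0, mean_gain=1.0, n_subchannels=M))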

    Energy Efficient Design for Tactile Internet

    Changyang She, Chenyang Yang
    Comments: Accepted by IEEE/CIC ICCC 2016 (invited paper/talk)
    Subjects: Information Theory (cs.IT)

    Ensuring the ultra-low end-to-end latency and ultra-high reliability
    required by the tactile internet is challenging. This is especially true
    when the stringent Quality-of-Service (QoS) requirement must be satisfied
    without significantly reducing spectral efficiency and energy efficiency
    (EE). In this paper, we study how to maximize the EE for the tactile
    internet under the stringent QoS constraint, where both queueing delay and
    transmission delay are taken into account. We first validate that the upper
    bound on the queueing delay violation probability derived from the
    effective bandwidth can be used to characterize the queueing delay
    violation probability in the short delay regime for Poisson arrival
    processes. However, the upper bound is not tight for short delays, which
    leads to conservative designs and hence wastes energy. To avoid this, we
    optimize resource allocation that depends on the queue state information
    and channel state information. Analytical results show that with a large
    number of transmit antennas the EE achieved by the proposed policy
    approaches the EE limit achieved for an infinite delay bound, which implies
    that the policy does not incur any EE loss. Simulation and numerical
    results show that even for a not-so-large number of antennas, the EE
    achieved by the proposed policy is still close to the EE limit.

    Cross-layer Transmission Design for Tactile Internet

    Changyang She, Chenyang Yang, Tony Q. S. Quek
    Comments: Accepted by IEEE Globecom 2016
    Subjects: Information Theory (cs.IT)

    To ensure the low end-to-end (E2E) delay for tactile internet, short frame
    structures will be used in 5G systems. As such, transmission errors with finite
    blocklength channel codes should be considered to guarantee the high
    reliability requirement. In this paper, we study cross-layer transmission
    optimization for tactile internet, where both queueing delay and transmission
    delay are accounted for in the E2E delay, and different packet loss/error
    probabilities are considered to characterize the reliability. We show that the
    required transmit power becomes unbounded when the allowed maximal queueing
    delay is shorter than the channel coherence time. To satisfy the
    quality-of-service requirement with finite transmit power, we introduce a
    proactive packet dropping mechanism, and optimize a transmission policy
    that depends on the queue state information and channel state information.
    Since the transmission resources and policy and the packet dropping policy
    are all coupled with the packet error probability, the queueing delay
    violation probability, and the packet dropping probability, we optimize
    these three probabilities and derive the policies associated with them. We
    start from the single-user scenario and then extend our framework to the
    multi-user scenario. Simulation results show that the three optimized
    probabilities are of the same order of magnitude; therefore, all these
    factors must be taken into account when designing systems for tactile
    internet applications.

    Vehicle-to-Vehicle Communications with Urban Intersection Path Loss Models

    Mouhamed Abdulla, Erik Steinmetz, Henk Wymeersch
    Comments: Proc. of IEEE Global Communications Conference (GLOBECOM’16)
    Subjects: Information Theory (cs.IT)

    Vehicle-to-vehicle (V2V) communication can improve road safety and traffic
    efficiency, particularly around critical areas such as intersections. We
    analytically derive the V2V success probability near an urban intersection, based
    on empirically supported line-of-sight (LOS), weak-line-of-sight (WLOS), and
    non-line-of-sight (NLOS) channel models. The analysis can serve as a
    preliminary design tool for performance assessment over different system
    parameters and target performance requirements.

    Channel Estimation in Broadband Millimeter Wave MIMO Systems with Few-Bit ADCs

    Jianhua Mo, Philip Schniter, Robert W. Heath Jr
    Comments: 13 pages, 16 figs, submitted to IEEE Transactions on Signal Processing
    Subjects: Information Theory (cs.IT)

    We develop a broadband channel estimation algorithm for millimeter wave
    (mmWave) multiple input multiple output (MIMO) systems with few-bit
    analog-to-digital converters (ADCs). The mmWave MIMO channel is
    approximately sparse in the joint angle-delay domain since there are
    relatively few paths in the mmWave channel. We formulate the estimation
    problem as a noisy quantized
    compressed sensing problem. Then the Expectation-Maximization Generalized
    Approximate Message Passing (EM-GAMP) algorithm is used to estimate the
    channel. The angle-delay domain channel coefficients are modeled by a
    Bernoulli-Gaussian-Mixture distribution with unknown parameters, in which case
    the EM-GAMP algorithm can adaptively estimate the parameters. Furthermore,
    training sequences are designed to accelerate the algorithm and minimize the
    estimation error. Our simulation results show that with one-bit ADCs, the
    proposed approach yields relatively low MSE in the important low and medium
    SNR regions. Furthermore, with 3- or 4-bit ADCs, it yields estimation error
    and achievable rate only slightly worse than with infinite-bit ADCs at low
    and medium SNR.
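
    The measurement model behind this formulation is easy to reproduce: a
    sparse angle-delay vector is observed through a linear training map and
    then passed through a few-bit quantizer. The sketch below generates such
    quantized measurements; the mid-rise uniform quantizer and Gaussian map are
    illustrative choices, and the EM-GAMP recovery itself is not sketched.

        import numpy as np

        def uniform_quantize(z, bits, clip=3.0):
            """Mid-rise uniform quantizer with 2**bits levels on [-clip, clip]."""
            levels = 2 ** bits
            step = 2.0 * clip / levels
            idx = np.clip(np.floor((z + clip) / step), 0, levels - 1)
            return -clip + (idx + 0.5) * step

        rng = np.random.default_rng(3)
        n, m, s = 128, 64, 6                        # sparse angle-delay channel
        x = np.zeros(n)
        x[rng.choice(n, s, replace=False)] = rng.normal(size=s)
        A = rng.normal(size=(m, n)) / np.sqrt(m)    # training-induced measurement map
        y = uniform_quantize(A @ x + 0.05 * rng.normal(size=m), bits=1)
        print(np.unique(y))                          # one bit -> two output levels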

    Transmission Strategies for Remote Estimation with an Energy Harvesting Sensor

    Ayca Ozcelikkale, Tomas McKelvey, Mats Viberg
    Subjects: Information Theory (cs.IT)

    We consider the remote estimation of a time-correlated signal using an energy
    harvesting (EH) sensor. The sensor observes the unknown signal and communicates
    its observations to a remote fusion center using an amplify-and-forward
    strategy. We consider the design of optimal power allocation strategies in
    order to minimize the mean-square error at the fusion center. Contrary to the
    traditional approaches, the degree of correlation between the signal values
    constitutes an important aspect of our formulation. We provide the optimal
    power allocation strategies for a number of illustrative scenarios. We show
    that the most majorized power allocation strategy, i.e., the power
    allocation that is as balanced as possible, is optimal for the cases of
    circularly wide-sense stationary (c.w.s.s.) signals with a static
    correlation coefficient, and sampled low-pass c.w.s.s. signals for a static
    channel. We show that the optimal strategy can be characterized as a
    water-filling type solution for sampled low-pass c.w.s.s. signals for a
    fading channel. Motivated by the high complexity of the numerical solution
    of the optimization problem, we propose low-complexity policies for the
    general scenario. Numerical evaluations
    illustrate the close performance of these low-complexity policies to that of
    the optimal policies, and demonstrate the effect of the EH constraints and the
    degree of freedom of the signal.
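
    For intuition about the water-filling structure of the fading-channel
    solution, the classic unconstrained water-filling allocation can be
    computed by bisection on the water level, as sketched below; the paper's
    actual policies additionally respect the EH constraints, which this toy
    version omits.

        import numpy as np

        def water_fill(gains, total_power, iters=100):
            """Classic water-filling: p_i = max(0, mu - 1/g_i), with the water
            level mu found by bisection so the powers meet the budget."""
            lo, hi = 0.0, total_power + np.max(1.0 / gains)
            for _ in range(iters):
                mu = 0.5 * (lo + hi)
                if np.maximum(0.0, mu - 1.0 / gains).sum() > total_power:
                    hi = mu
                else:
                    lo = mu
            return np.maximum(0.0, 0.5 * (lo + hi) - 1.0 / gains)

        gains = np.array([2.0, 1.0, 0.5, 0.1])
        p = water_fill(gains, total_power=4.0)
        print(p, p.sum())    # weak channels get little or no power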

    Reconstruction of signals from their autocorrelation and cross-correlation vectors, with applications to phase retrieval and blind channel estimation

    Kishore Jaganathan, Babak Hassibi
    Subjects: Information Theory (cs.IT); Optimization and Control (math.OC)

    We consider the problem of reconstructing two signals from the
    autocorrelation and cross-correlation measurements. This inverse problem is a
    fundamental one in signal processing, and arises in many applications,
    including phase retrieval and blind channel estimation. In a typical phase
    retrieval setup, only the autocorrelation measurements are obtainable. We show
    that, when the measurements are obtained using three simple “masks”, phase
    retrieval reduces to the aforementioned reconstruction problem.

    The classic solution to this problem is based on finding common factors
    between the $z$-transforms of the autocorrelation and cross-correlation
    vectors. This solution has enjoyed limited practical success, mainly due to the
    fact that it is not sufficiently stable in the noisy setting. In this work,
    inspired by the success of convex programming in provably and stably
    solving various quadratically constrained problems, we develop a
    semidefinite
    programming-based algorithm and provide theoretical guarantees. In particular,
    we show that almost all signals can be uniquely recovered by this algorithm (up
    to a global phase). Comparative numerical studies demonstrate that the proposed
    method significantly outperforms the classic method in the noisy setting.
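
    The data the reconstruction problem starts from is simple to generate:
    given two signals, the autocorrelation and cross-correlation vectors
    computed below are exactly the measurements, and the classic route then
    hunts for common factors between the corresponding $z$-transform
    polynomials (the semidefinite relaxation itself is not sketched here).

        import numpy as np

        def autocorr(x):
            """Autocorrelation vector a[k] = sum_n x[n] * x[n+k], all lags."""
            return np.correlate(x, x, mode='full')

        def crosscorr(x, y):
            return np.correlate(x, y, mode='full')

        rng = np.random.default_rng(4)
        x, y = rng.normal(size=5), rng.normal(size=5)
        a_xx, a_yy, a_xy = autocorr(x), autocorr(y), crosscorr(x, y)
        # Classic route: viewed as polynomials in z, the autocorrelation
        # factors as X(z) * X(1/z) for real x; common factors with the
        # cross-correlation polynomial identify X and Y up to a global phase.
        print(a_xx.shape, a_yy.shape, a_xy.shape)   # (9,) each for length-5 inputs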

    Defect tolerance: fundamental limits and examples

    Jennifer Tang, Da Wang, Yury Polyanskiy, Gregory Wornell
    Subjects: Information Theory (cs.IT)

    This paper addresses the problem of adding redundancy to a collection of
    physical objects so that the overall system is more robust to failures.
    Physical redundancy can (generally) be achieved by employing copy/substitute
    procedures. This is fundamentally different from information redundancy, where
    a single parity check simultaneously protects a large number of data bits
    against a single erasure. We propose a bipartite graph model for designing
    defect-tolerant systems in which defective objects are repaired by
    reconnecting
    them to strategically placed redundant objects. The fundamental limits of this
    model are characterized under various asymptotic settings and both asymptotic
    and finite-size optimal systems are constructed.

    Mathematically, we say that a $k$ by $m$ bipartite graph corrects $t$ defects
    over an alphabet of size $q$ if for every $q$-coloring of $k$ left vertices
    there exists a $q$-coloring of $m$ right vertices such that every left vertex
    is connected to at least $t$ same-colored right vertices. We study the
    trade-off between redundancy $m / k$ and the total number of edges in the graph
    divided by $k$. The question is trivial when $q \ge k$: the optimal
    solution is a simple $t$-fold replication. However, when $q < k$,
    non-trivial savings are possible by leveraging the inherent repetition of
    colors.
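
    The combinatorial criterion can be verified directly by brute force for
    small instances. The sketch below does so and confirms the trivial
    $t$-fold replication construction; it enumerates all colorings, so it is
    for illustration only.

        from itertools import product

        def corrects(adj, q, t):
            """Check the criterion by brute force: for every q-coloring of the
            k left vertices there must exist a q-coloring of the right vertices
            giving each left vertex >= t same-colored right neighbors.
            adj[i] is the set of right-vertex neighbors of left vertex i."""
            m = 1 + max(max(nb) for nb in adj)
            for left in product(range(q), repeat=len(adj)):
                if not any(all(sum(right[j] == left[i] for j in adj[i]) >= t
                               for i in range(len(adj)))
                           for right in product(range(q), repeat=m)):
                    return False
            return True

        # t-fold replication: each of 2 left vertices gets its own 2 right vertices.
        print(corrects([{0, 1}, {2, 3}], q=3, t=2))   # True, as expected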

    Performance Bounds for Remote Estimation under Energy Harvesting Constraints

    Ayca Ozcelikkale, Tomas McKelvey, Mats Viberg
    Subjects: Information Theory (cs.IT)

    Remote estimation with an energy harvesting sensor with a limited data and
    energy buffer is considered. The sensor node observes an unknown Gaussian field
    and communicates its observations to a remote fusion center using the energy it
    harvested. The fusion center employs minimum mean-square error (MMSE)
    estimation to reconstruct the unknown field. We consider the distortion
    minimization problem under the online scheme, where the sensor has access
    only to statistical information about the future energy packets. We provide
    performance bounds on the achievable distortion under a slotted block
    transmission scheme, where at each transmission time slot, the data and the
    energy buffer are completely emptied. Our bounds provide insights into the
    trade-offs between the buffer sizes, the statistical properties of the
    energy harvesting process, and the achievable distortion. In particular,
    these trade-offs illustrate the insensitivity of the performance to the
    buffer sizes for signals with a low degree of freedom, and suggest
    performance improvements with increasing buffer size for signals with a
    relatively higher degree of freedom. Depending only on the mean, variance
    and finite support of the energy
    arrival process, these results provide practical insights for the battery and
    buffer sizes for deployment in future energy harvesting wireless sensing
    systems.

    Location-Aided Pilot Decontamination for Massive MIMO Systems

    L. Srikar Muppirisetty, Themistoklis Charalambous, Johnny Karout, Gabor Fodor, Henk Wymeersch
    Comments: 10 pages, 14 figures
    Subjects: Information Theory (cs.IT)

    One of the key limitations of massive MIMO systems is pilot contamination,
    which is defined as the interference during uplink channel estimation
    caused by re-use of the same pilots in surrounding cells. In this paper, we
    propose a
    location-based approach to the pilot contamination problem for uplink MIMO
    systems. Our approach makes use of the approximate locations of mobile devices
    to provide good estimates of the channel statistics between the mobile devices
    and their corresponding base stations (BSs). We aim at minimizing the pilot
    contamination even when the number of BS antennas is not very large, and when
    multiple users from different cells, or even the same cell, are assigned the
    same pilot sequence. First, we characterize a desired angular region of the
    target user at the target BS in which interference is very low or zero, based
    on the number of BS antennas and the location of the target user. Second, based
    on this observation, we propose various pilot coordination methods for
    multi-user multi-cell scenarios to eliminate pilot contamination.

    Cost-Effective Millimeter Wave Communications with Lens Antenna Array

    Yong Zeng, Rui Zhang
    Comments: 14 pages, 5 figures, 1 table, submitted for possible magazine publication
    Subjects: Information Theory (cs.IT)

    Millimeter wave (mmWave) communication is a promising technology for the
    fifth-generation (5G) wireless system. However, the large number of antennas
    used and the wide signal bandwidth in mmWave systems render the conventional
    multi-antenna techniques increasingly costly in terms of signal processing
    complexity, hardware implementation, and power consumption. In this article, we
    investigate cost-effective mmWave communications by first providing an overview
    of the main existing techniques that offer different trade-offs between
    performance and cost, and then focusing our discussion on a promising new
    technique based on the advanced lens antenna array. It is revealed that by
    exploiting the angle-dependent energy focusing property of lens arrays,
    together with the angular sparsity of the mmWave channels, the mmWave
    lens-antenna system is able to achieve capacity-optimal performance with
    very few radio-frequency (RF) chains and low-complexity single-carrier
    transmission, even for wide-band frequency-selective channels. Numerical
    results show that the lens-based system significantly outperforms the
    state-of-the-art designs for mmWave systems in both spectrum efficiency and
    energy efficiency.

    Multiuser Diversity Gain from Superposition of Infinite Users over Block-Fading MAC

    Yushu Zhang, Kewu Peng, Jian Song, Shuang Chen
    Comments: 5 pages, 5 figures
    Subjects: Information Theory (cs.IT)

    In the uplink block-fading multiple access channel (BF-MAC), multiuser
    diversity (MUD) can be exploited to achieve higher system throughput. In
    this letter, assuming rate constraints for all users and channel state
    information available only at the receiver, we demonstrate that
    non-orthogonal multiple access (NOMA) outperforms orthogonal multiple
    access (OMA) by exploiting the MUD from the superposition of multiple
    users. The MUD gain achieved by NOMA compared with OMA is quantitatively
    analyzed for finite users, and the closed-form upper bound of MUD gain from the
    superposition of infinite users is also derived. Numerical results show
    that the potential MUD gain from the superposition of infinite users can be
    closely approached with a limited number of superposed users, which
    indicates that multiple access schemes with a limited number of superposed
    users can provide a good tradeoff between system performance and decoding
    complexity.

    Frequency Estimation of Multiple Sinusoids with Three Sub-Nyquist Channels

    Shan Huang, Haijian Zhang, Hong Sun, Lei Yu, Liwen Chen
    Comments: 7 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:1601.00270
    Subjects: Information Theory (cs.IT)

    Frequency estimation of multiple sinusoids is significant in both theory and
    application. In some application scenarios, only sub-Nyquist samples are
    available to estimate the frequencies. A conventional approach is to sample the
    signals at several lower rates. In this paper, we propose a novel method based
    on subspace techniques using three-channel undersampled data. We analyze
    the impact of undersampling and demonstrate that three sub-Nyquist channels
    are in general sufficient to estimate the frequencies provided the
    undersampling ratios are coprime. The ambiguous frequencies obtained from
    one channel are identified, and the correct frequencies are screened out by
    using the three-channel samples jointly. Numerical experiments verify the
    correctness of our analysis and conclusions, and simulations show that the
    proposed method is valid and highly accurate.
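
    The disambiguation step can be illustrated with a toy search: each
    channel's folded alias constrains the true frequency to a sparse candidate
    set, and intersecting the sets leaves the true value when the ratios are
    coprime. The snippet below is a schematic single-tone version with
    hypothetical sampling rates; the paper's subspace method handles multiple
    sinusoids jointly.

        import numpy as np

        def disambiguate(aliased, rates, f_max):
            """Intersect the candidate sets implied by each channel's folded
            alias; with coprime undersampling ratios the true frequency is
            the unique survivor below f_max."""
            candidates = None
            for f_alias, fs in zip(aliased, rates):
                c = {f for k in range(int(f_max // fs) + 1)
                     for f in (k * fs + f_alias, (k + 1) * fs - f_alias)
                     if f <= f_max}
                candidates = c if candidates is None else candidates & c
            return sorted(candidates)

        f_true = 173.0
        rates = [25.0, 35.0, 45.0]   # hypothetical rates, ratios 5:7:9 coprime
        # each channel observes the alias folded into [0, fs/2]
        aliased = [min(f_true % fs, fs - f_true % fs) for fs in rates]
        print(disambiguate(aliased, rates, f_max=300.0))   # [173.0]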

    Phase transitions and optimal algorithms in high-dimensional Gaussian mixture clustering

    Thibault Lesieur, Caterina De Bacco, Jess Banks, Florent Krzakala, Cris Moore, Lenka Zdeborová
    Comments: 8 pages, 3 figures, conference
    Subjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Information Theory (cs.IT)

    We consider the problem of Gaussian mixture clustering in the
    high-dimensional limit where the data consists of $m$ points in $n$
    dimensions, $n, m \rightarrow \infty$ and $\alpha = m/n$ stays finite.
    Using exact but non-rigorous methods from statistical physics, we determine
    the critical value of $\alpha$ and the distance between the clusters at
    which it becomes information-theoretically possible to reconstruct the
    membership into clusters better than chance. We also determine the accuracy
    achievable by the Bayes-optimal estimation algorithm. In particular, we
    find that when the number of clusters is sufficiently large,
    $r > 4 + 2\sqrt{\alpha}$, there is a gap between the threshold for
    information-theoretically optimal performance and the threshold at which
    known algorithms succeed.

    Iterative Secret Key Rate Adapting with Error Minimization for Continuous-Variable Quantum Key Distribution

    Laszlo Gyongyosi
    Comments: 32 pages, 8 figures. arXiv admin note: text overlap with arXiv:1603.09247, arXiv:1504.05574
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    We define an iterative error-minimizing secret key adapting method for
    multicarrier CVQKD. A multicarrier CVQKD protocol uses Gaussian subcarrier
    quantum continuous variables (CVs) for the transmission. The proposed
    method allows the parties to reach a given target secret key rate with a
    minimized error rate through the Gaussian sub-channels by a sub-channel
    adaptation procedure. The adaptation algorithm iteratively determines the
    optimal transmit
    conditions to achieve the target secret key rate and the minimal error rate
    over the sub-channels. The solution requires no complex calculations or
    computational tools, allowing for easy implementation for experimental CVQKD
    scenarios.

    Diversity Extraction for Multicarrier Continuous-Variable Quantum Key Distribution

    Laszlo Gyongyosi
    Comments: 5 pages, 2 figures, IEEE Signal Processing Conference Proceedings (shortened version of arXiv:1405.6948)
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    We introduce a diversity extraction for multicarrier continuous-variable (CV)
    quantum key distribution (QKD). The diversity extraction utilizes the resources
    that are injected into the transmission by the additional degrees of freedom of
    the multicarrier modulation. The multicarrier scheme granulates the information
    into Gaussian subcarrier CVs and divides the physical link into several
    Gaussian sub-channels for the transmission. We prove that the exploitable extra
    degree of freedom in a multicarrier CVQKD scenario significantly extends the
    possibilities of single-carrier CVQKD. The diversity extraction allows the
    parties to reach decreased error probabilities by utilizing those extra
    resources of a multicarrier transmission that are not available in a
    single-carrier CVQKD setting. The additional resources of multicarrier CVQKD
    allow the achievement of significant performance improvements that are
    particularly crucial in an experimental scenario.

    Minimax Optimality of Shiryaev-Roberts Procedure for Quickest Drift Change Detection of a Brownian motion

    Taposh Banerjee, George V. Moustakides
    Comments: Submitted
    Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Optimization and Control (math.OC)

    The problem of detecting a change in the drift of a Brownian motion is
    considered. The change point is assumed to have a modified exponential prior
    distribution with unknown parameters. A worst-case analysis with respect to
    these parameters is adopted leading to a min-max problem formulation.
    Analytical and numerical justifications are provided towards establishing that
    the Shiryaev-Roberts procedure with a specially designed starting point is
    exactly optimal for the proposed mathematical setup.

    Universal Clustering via Crowdsourcing

    Ravi Kiran Raman, Lav Varshney
    Subjects: Human-Computer Interaction (cs.HC); Information Theory (cs.IT); Machine Learning (stat.ML)

    Consider unsupervised clustering of objects drawn from a discrete set,
    through the use of human intelligence available in crowdsourcing platforms.
    This paper defines and studies the problem of universal clustering using
    responses of crowd workers, without knowledge of worker reliability or task
    difficulty. We model stochastic worker response distributions by incorporating
    traits of memory for similar objects and traits of distance among differing
    objects. We are particularly interested in two limiting worker
    types—temporary workers who retain no memory of responses and long-term
    workers with memory. We first define clustering algorithms for these limiting
    cases and then integrate them into an algorithm for the unified worker model.
    We prove asymptotic consistency of the algorithms and establish sufficient
    conditions on their sample complexity. Converse arguments establish
    necessary conditions on the sample complexity, proving that the defined
    algorithms are asymptotically order-optimal in cost.



