    arXiv Paper Daily: Mon, 5 Jun 2017

    Posted by 我爱机器学习 (52ml.net) on 2017-06-05 00:00:00

    Neural and Evolutionary Computing

    Dataflow Matrix Machines as a Model of Computations with Linear Streams

    Michael Bukatin, Jon Anthony
    Comments: 6 pages, accepted for presentation at LearnAut 2017: Learning and Automata workshop at LICS (Logic in Computer Science) 2017 conference. Preprint original version: April 9, 2017; minor correction: May 1, 2017
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Programming Languages (cs.PL)

    We overview dataflow matrix machines as a Turing complete generalization of
    recurrent neural networks and as a programming platform. We describe vector
    space of finite prefix trees with numerical leaves which allows us to combine
    expressive power of dataflow matrix machines with simplicity of traditional
    recurrent neural networks.

    Dynamic Stripes: Exploiting the Dynamic Precision Requirements of Activation Values in Neural Networks

    Alberto Delmas, Patrick Judd, Sayeh Sharify, Andreas Moshovos
    Comments: 3 pages, 3 figures
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Stripes is a Deep Neural Network (DNN) accelerator that uses bit-serial
    computation to offer performance that is proportional to the fixed-point
    precision of the activation values. The fixed-point precisions are determined a
    priori using profiling and are selected at a per layer granularity. This paper
    presents Dynamic Stripes, an extension to Stripes that detects precision
    variance at runtime and at a finer granularity. This extra level of precision
    reduction increases performance by 41% over Stripes.
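
As a rough software illustration of the idea (not the accelerator's hardware logic), the sketch below assumes non-negative fixed-point activations stored as integers, a hypothetical group size of 4, and a bit-serial cost proportional to the number of bits processed. It compares one profiled precision for a whole layer with a precision detected per group at runtime. All function names and values are illustrative assumptions.

```python
def bits_needed(values):
    """Smallest bit width that represents every non-negative integer in values."""
    return max((v.bit_length() for v in values), default=0)


def static_cost(activations, layer_precision):
    """Bit-serial cost when every activation uses one per-layer precision."""
    return len(activations) * layer_precision


def dynamic_cost(activations, group_size=4):
    """Bit-serial cost when each group uses only the bits its largest value needs."""
    total = 0
    for start in range(0, len(activations), group_size):
        group = activations[start:start + group_size]
        total += len(group) * bits_needed(group)
    return total


# Hypothetical activations: one large value sets the precision for the whole layer.
activations = [3, 1, 0, 2, 200, 17, 5, 9, 0, 0, 1, 0]
layer_precision = bits_needed(activations)
static = static_cost(activations, layer_precision)
dynamic = dynamic_cost(activations)
print(layer_precision, static, dynamic)
```

In this toy input only the group containing the large value pays the full per-layer precision; the other groups are processed with fewer bits.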

    CATERPILLAR: Coarse Grain Reconfigurable Architecture for Accelerating the Training of Deep Neural Networks

    Yuanfang Li, Ardavan Pedram
    Comments: 10 pages, 10 figures, ASAP 2017: The 28th Annual IEEE International Conference on Application-specific Systems, Architectures and Processors
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Accelerating the inference of a trained DNN is a well studied subject. In
    this paper we switch the focus to the training of DNNs. The training phase is
    compute intensive, demands complicated data communication, and contains
    multiple levels of data dependencies and parallelism. This paper presents an
    algorithm/architecture space exploration of efficient accelerators to achieve
    better network convergence rates and higher energy efficiency for training
    DNNs. We further demonstrate that an architecture with hierarchical support for
    collective communication semantics provides flexibility in training various
    networks performing both stochastic and batched gradient descent based
    techniques. Our results suggest that smaller networks favor non-batched
    techniques while performance for larger networks is higher using batched
    operations. At 45nm technology, CATERPILLAR achieves performance efficiencies
    of 177 GFLOPS/W at over 80% utilization for SGD training on small networks and
    211 GFLOPS/W at over 90% utilization for pipelined SGD/CP training on larger
    networks using a total area of 103.2 mm\(^2\) and 178.9 mm\(^2\) respectively.


    Computer Vision and Pattern Recognition

    Automating Carotid Intima-Media Thickness Video Interpretation with Convolutional Neural Networks

    Jae Y. Shin, Nima Tajbakhsh, R. Todd Hurst, Christopher B. Kendall, Jianming Liang
    Comments: J. Y. Shin, N. Tajbakhsh, R. T. Hurst, C. B. Kendall, and J. Liang. Automating carotid intima-media thickness video interpretation with convolutional neural networks. CVPR 2016, pp 2526-2535; N. Tajbakhsh, J. Y. Shin, R. T. Hurst, C. B. Kendall, and J. Liang. Automatic interpretation of CIMT videos using convolutional neural networks. Deep Learning for Medical Image Analysis, Academic Press, 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Cardiovascular disease (CVD) is the leading cause of mortality yet largely
    preventable, but the key to prevention is to identify at-risk individuals
    before adverse events. For predicting individual CVD risk, carotid intima-media
    thickness (CIMT), a noninvasive ultrasound method, has proven to be valuable,
    offering several advantages over CT coronary artery calcium score. However,
    each CIMT examination includes several ultrasound videos, and interpreting each
    of these CIMT videos involves three operations: (1) select three end-diastolic
    ultrasound frames (EUF) in the video, (2) localize a region of interest (ROI)
    in each selected frame, and (3) trace the lumen-intima interface and the
    media-adventitia interface in each ROI to measure CIMT. These operations are
    tedious, laborious, and time consuming, a serious limitation that hinders the
    widespread utilization of CIMT in clinical practice. To overcome this
    limitation, this paper presents a new system to automate CIMT video
    interpretation. Our extensive experiments demonstrate that the suggested system
    significantly outperforms the state-of-the-art methods. The superior
    performance is attributable to our unified framework based on convolutional
    neural networks (CNNs) coupled with our informative image representation and
    effective post-processing of the CNN outputs, which are uniquely designed for
    each of the above three operations.

    Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning?

    Nima Tajbakhsh, Jae Y. Shin, Suryakanth R. Gurudu, R. Todd Hurst, Christopher B. Kendall, Michael B. Gotway, Jianming Liang
    Journal-ref: IEEE Transactions on Medical Imaging. 35(5):1299-1312 (2016)
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Training a deep convolutional neural network (CNN) from scratch is difficult
    because it requires a large amount of labeled training data and a great deal of
    expertise to ensure proper convergence. A promising alternative is to fine-tune
    a CNN that has been pre-trained using, for instance, a large set of labeled
    natural images. However, the substantial differences between natural and
    medical images may advise against such knowledge transfer. In this paper, we
    seek to answer the following central question in the context of medical image
    analysis: Can the use of pre-trained deep CNNs with sufficient
    fine-tuning eliminate the need for training a deep CNN from scratch? To
    address this question, we considered 4 distinct medical imaging applications in
    3 specialties (radiology, cardiology, and gastroenterology) involving
    classification, detection, and segmentation from 3 different imaging
    modalities, and investigated how the performance of deep CNNs trained from
    scratch compared with the pre-trained CNNs fine-tuned in a layer-wise manner.
    Our experiments consistently demonstrated that (1) the use of a pre-trained CNN
    with adequate fine-tuning outperformed or, in the worst case, performed as well
    as a CNN trained from scratch; (2) fine-tuned CNNs were more robust to the size
    of training sets than CNNs trained from scratch; (3) neither shallow tuning nor
    deep tuning was the optimal choice for a particular application; and (4) our
    layer-wise fine-tuning scheme could offer a practical way to reach the best
    performance for the application at hand based on the amount of available data.

    Temporal Action Labeling using Action Sets

    Alexander Richard, Hilde Kuehne, Juergen Gall
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Action detection and temporal segmentation of actions in videos are topics of
    increasing interest. While fully supervised systems have gained much attention
    lately, full annotation of each action within the video is costly and
    impractical for large amounts of video data. Thus, weakly supervised action
    detection and temporal segmentation methods are of great importance. While most
    works in this area assume an ordered sequence of occurring actions to be given,
    our approach only uses a set of actions. Such action sets provide much less
    supervision since neither action ordering nor the number of action occurrences
    are known. In exchange, they can be easily obtained, for instance, from
    meta-tags, while ordered sequences still require human annotation. We introduce
    a system that automatically learns to temporally segment and label actions in a
    video, where the only supervision that is used are action sets. We evaluate our
    method on three datasets and show that it performs close to or on par with
    recent weakly supervised methods that require ordering constraints.

    Development of a N-type GM-PHD Filter for Multiple Target, Multiple Type Visual Tracking

    Nathanael L. Baisa, Andrew Wallace
    Comments: arXiv admin note: text overlap with arXiv:1705.0475
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a new framework that extends the standard Probability Hypothesis
    Density (PHD) filter for multiple targets having \(N\) different types where
    \(N \geq 2\) based on Random Finite Set (RFS) theory, taking into account not only
    background false positives (clutter), but also confusions among detections of
    different target types, which are in general different in character from
    background clutter. Under the assumptions of Gaussianity and linearity, our
    framework extends the existing Gaussian mixture (GM) implementation of the
    standard PHD filter to create a N-type GM-PHD filter. The methodology is
    applied to real video sequences by integrating object detectors’ information
    into this filter for two scenarios. In the first scenario, a tri-GM-PHD filter
    (\(N=3\)) is applied to real video sequences containing three types of multiple
    targets in the same scene, two football teams and a referee, using separate but
    confused detections. In the second scenario, we use a dual GM-PHD filter
    (\(N=2\)) for tracking pedestrians and vehicles in the same scene handling their
    detectors’ confusions. For both cases, Munkres’s variant of the Hungarian
    assignment algorithm is used to associate tracked target identities between
    frames. This approach is evaluated and compared to both raw detection and
    independent GM-PHD filters using the Optimal Sub-pattern Assignment (OSPA)
    metric and the discrimination rate. This shows the improved performance of our
    strategy on real video sequences.

    Dual-reference Face Retrieval: What Does He/She Look Like at Age `X'?

    BingZhang Hu, Feng Zheng, Ling Shao
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Face retrieval has received much attention over the past few decades, and
    many efforts have been made in retrieving face images against pose,
    illumination, and expression variations. However, the conventional works fail
    to meet the requirements of a potential and novel task — retrieving a
    person’s face image at a given age, i.e., ‘what does a person look like at age
    \(X\)?’ The reason that previous works struggle is that text-based approaches
    generally suffer from insufficient age labels and content-based methods
    typically take a single input image as query, which can only indicate either
    the identity or the age. To tackle this problem, we propose a dual reference
    face retrieval framework in this paper, where the identity and the age are
    reflected by two reference images respectively. In our framework, the raw
    images are first projected on a joint manifold, which preserves both the age
    and identity locality. Then two similarity metrics of age and identity are
    exploited and optimized by utilizing our proposed quartet-based model. The
    quartet-based model is novel as it simultaneously describes the similarity in
    two aspects: identity and age. The experiment shows a promising result,
    outperforming hierarchical methods. It is also shown that the learned joint
    manifold is a powerful representation of the human face.

    Facies classification from well logs using an inception convolutional network

    Valentin Tschannen, Matthias Delescluse, Mathieu Rodriguez, Janis Keuper
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The idea to use automated algorithms to determine geological facies from well
    logs is not new (see e.g. Busch et al. (1987); Rabaute (1998)) but the recent
    and dramatic increase in research in the field of machine learning makes it a
    good time to revisit the topic. Following an exercise proposed by Dubois et al.
    (2007) and Hall (2016) we employ a modern type of deep convolutional network,
    called an inception network (Szegedy et al., 2015), to tackle the
    supervised classification task and we discuss the methodological limits of such
    a problem as well as further research opportunities.

    Dynamic Steerable Blocks in Deep Residual Networks

    Jörn-Henrik Jacobsen, Bert de Brabandere, Arnold W.M. Smeulders
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Filters in convolutional networks are typically parameterized in a pixel
    basis, that does not take prior knowledge about the visual world into account.
    We investigate the generalized notion of frames, that can be designed with
    image properties in mind, as alternatives to this parametrization. We show that
    frame-based ResNets and Densenets can improve performance on Cifar-10+
    consistently, while having additional pleasant properties like steerability. By
    exploiting these transformation properties explicitly, we arrive at dynamic
    steerable blocks. They are an extension of residual blocks, that are able to
    seamlessly transform filters under pre-defined transformations, conditioned on
    the input at training and inference time. Dynamic steerable blocks learn the
    degree of invariance from data and locally adapt filters, allowing them to
    apply a different geometrical variant of the same filter to each location of
    the feature map. When evaluated on the Berkeley Segmentation contour detection
    dataset, our approach outperforms all competing approaches that do not utilize
    pre-training, highlighting the benefits of image-based regularization to deep
    networks.

    Image Restoration from Patch-based Compressed Sensing Measurement

    Guangtao Nie, Ying Fu, Yinqiang Zheng, Hua Huang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A series of methods have been proposed to reconstruct an image from
    compressively sensed random measurement, but most of them have high time
    complexity and are inappropriate for patch-based compressed sensing capture,
    because of their serious blocky artifacts in the restoration results. In this
    paper, we present a non-iterative image reconstruction method from patch-based
    compressively sensed random measurement. Our method features two cascaded
    networks based on residual convolution neural network to learn the end-to-end
    full image restoration, which is capable of reconstructing image patches and
    removing the blocky effect with low time cost. Experimental results on
    synthetic and real data show that our method outperforms state-of-the-art
    compressive sensing (CS) reconstruction methods with patch-based CS
    measurement. To demonstrate the effectiveness of our method in more general
    setting, we apply the de-block process in our method to JPEG compression
    artifacts removal and achieve outstanding performance as well.

    Recursive Cross-Domain Face/Sketch Generation from Limited Facial Parts

    Yang Song, Zhifei Zhang, Hairong Qi
    Comments: Submitted to ICCV 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We start by asking an interesting yet challenging question, “If an eyewitness
    can only recall the eye features of the suspect, such that the forensic artist
    can only produce a sketch of the eyes (e.g., the top-left sketch shown in Fig.
    1), can advanced computer vision techniques help generate the whole face
    image?” A more generalized question is that if a large proportion (e.g., more
    than 50%) of the face/sketch is missing, can a realistic whole face
    sketch/image still be estimated. Existing face completion and generation
    methods either do not conduct domain transfer learning or cannot handle large
    missing areas. For example, the inpainting approach tends to blur the generated
    region when the missing area is large (i.e., more than 50%). In this paper, we
    exploit the potential of deep learning networks in filling large missing region
    (e.g., as high as 95% missing) and generating realistic faces with
    high-fidelity in cross domains. We propose the recursive generation by
    bidirectional transformation networks (r-BTN) that recursively generates a
    whole face/sketch from a small sketch/face patch. The large missing area and
    the cross domain challenge make it difficult to generate satisfactory results
    using a unidirectional cross-domain learning structure. On the other hand, a
    forward and backward bidirectional learning between the face and sketch domains
    would enable recursive estimation of the missing region in an incremental
    manner (Fig. 1) and yield appealing results. r-BTN also adopts an adversarial
    constraint to encourage the generation of realistic faces/sketches. Extensive
    experiments have been conducted to demonstrate the superior performance from
    r-BTN as compared to existing potential solutions.

    Rank Persistence: Assessing the Temporal Performance of Real-World Person Re-Identification

    Srikrishna Karanam, Eric Lam, Richard J. Radke
    Comments: 8 pages, 7 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Designing useful person re-identification systems for real-world applications
    requires attention to operational aspects not typically considered in academic
    research. Here, we focus on the temporal aspect of re-identification; that is,
    instead of finding a match to a probe person of interest in a fixed candidate
    gallery, we consider the more realistic scenario in which the gallery is
    continuously populated by new candidates over a long time period. A key
    question of interest for an operator of such a system is: how long is a correct
    match to a probe likely to remain in a rank-k shortlist of possible candidates?
    We propose to distill this information into a Rank Persistence Curve (RPC),
    which allows different algorithms’ temporal performance characteristics to be
    directly compared. We present examples to illustrate the RPC using a new
    long-term dataset with multiple candidate reappearances, and discuss
    considerations for future re-identification research that explicitly involves
    temporal aspects.
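
The abstract does not define the RPC formally, so the following is only a minimal sketch under stated assumptions: each gallery candidate is a hypothetical `(arrival_time, distance_to_probe)` pair, a smaller distance means a better rank, and the function `rank_persistence` (a name introduced here) measures how long the true match stays inside the rank-k shortlist after it enters the gallery.

```python
def rank_persistence(match_time, match_distance, arrivals, k):
    """Return how long the true match stays within the rank-k shortlist.

    arrivals: list of (arrival_time, distance_to_probe) for the other
    gallery candidates. Returns None if the match never leaves the shortlist
    and 0 if it never enters it.
    """
    better = 0    # better-ranked candidates already present at match_time
    pending = []  # arrival times of better-ranked candidates that come later
    for time, distance in sorted(arrivals):
        if distance >= match_distance:
            continue  # ranked behind the match, so it cannot push it down
        if time <= match_time:
            better += 1
        else:
            pending.append(time)
    if better + 1 > k:
        return 0
    # The match leaves the shortlist when this many better candidates arrive.
    needed = k - better
    if len(pending) < needed:
        return None
    return pending[needed - 1] - match_time


# Hypothetical gallery: the true match arrives at t=10 with distance 0.5.
arrivals = [(5, 0.3), (12, 0.9), (15, 0.2), (20, 0.4), (30, 0.1)]
print(rank_persistence(10, 0.5, arrivals, k=3))
```

Collecting this duration over many probes and plotting the fraction of matches still in the shortlist against elapsed time would give one possible curve of the kind the abstract describes.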

    SAR Image Despeckling Using a Convolutional

    Puyang Wang, He Zhang, Vishal M. Patel
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Synthetic Aperture Radar (SAR) images are often contaminated by a
    multiplicative noise known as speckle. Speckle makes the processing and
    interpretation of SAR images difficult. We propose a deep learning-based
    approach, called Image Despeckling Convolutional Neural Network (ID-CNN), for
    automatically removing speckle from the input noisy images. In particular,
    ID-CNN uses a set of convolutional layers along with batch normalization and
    rectified linear unit (ReLU) activation function and a component-wise division
    residual layer to estimate speckle and it is trained in an end-to-end fashion
    using a combination of Euclidean loss and Total Variation (TV) loss. Extensive
    experiments on synthetic and real SAR images show that the proposed method
    achieves significant improvements over the state-of-the-art speckle reduction
    methods.
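
To make the two ingredients named above concrete, here is a small pure-Python sketch of a component-wise division residual and a combined Euclidean plus TV loss. It is not the authors' implementation: the network that estimates speckle is omitted, the TV term uses an anisotropic (absolute-difference) form chosen for simplicity, and `tv_weight` and `eps` are hypothetical parameters.

```python
def divide_residual(noisy, speckle, eps=1e-8):
    """Component-wise division: despeckled = noisy / estimated speckle."""
    return [[n / (s + eps) for n, s in zip(n_row, s_row)]
            for n_row, s_row in zip(noisy, speckle)]


def euclidean_loss(pred, target):
    """Mean squared difference between the prediction and the clean target."""
    count = len(pred) * len(pred[0])
    return sum((p - t) ** 2
               for p_row, t_row in zip(pred, target)
               for p, t in zip(p_row, t_row)) / count


def tv_loss(image):
    """Anisotropic total variation: sum of absolute neighbor differences."""
    rows, cols = len(image), len(image[0])
    vertical = sum(abs(image[r + 1][c] - image[r][c])
                   for r in range(rows - 1) for c in range(cols))
    horizontal = sum(abs(image[r][c + 1] - image[r][c])
                     for r in range(rows) for c in range(cols - 1))
    return vertical + horizontal


def total_loss(pred, target, tv_weight=0.002):
    """Euclidean loss plus a weighted TV penalty on the prediction."""
    return euclidean_loss(pred, target) + tv_weight * tv_loss(pred)


# Toy example: multiplicative speckle applied to a 2x2 clean image.
clean = [[1.0, 2.0], [3.0, 4.0]]
speckle = [[2.0, 0.5], [1.0, 1.0]]
noisy = [[c * s for c, s in zip(c_row, s_row)]
         for c_row, s_row in zip(clean, speckle)]
pred = divide_residual(noisy, speckle)
print(total_loss(pred, clean))
```

Because speckle is multiplicative, dividing the noisy image by a perfect speckle estimate recovers the clean image, which is why the residual is a division and not a subtraction.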

    Integrated Deep and Shallow Networks for Salient Object Detection

    Jing Zhang, Bo Li, Yuchao Dai, Fatih Porikli, Mingyi He
    Comments: Accepted by IEEE International Conference on Image Processing (ICIP) 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep convolutional neural network (CNN) based salient object detection
    methods have achieved state-of-the-art performance and outperform
    unsupervised methods by a wide margin. In this paper, we propose to integrate
    deep and unsupervised saliency for salient object detection under a unified
    framework. Specifically, our method takes results of unsupervised saliency
    (Robust Background Detection, RBD) and normalized color images as inputs, and
    directly learns an end-to-end mapping between inputs and the corresponding
    saliency maps. The color images are fed into a Fully Convolutional Neural
    Networks (FCNN) adapted from semantic segmentation to exploit high-level
    semantic cues for salient object detection. Then the results from deep FCNN and
    RBD are concatenated to feed into a shallow network to map the concatenated
    feature maps to saliency maps. Finally, to obtain a spatially consistent
    saliency map with sharp object boundaries, we fuse superpixel level saliency
    map at multi-scale. Extensive experimental results on 8 benchmark datasets
    demonstrate that the proposed method outperforms the state-of-the-art
    approaches with a margin.

    Data Augmentation of Wearable Sensor Data for Parkinson's Disease Monitoring using Convolutional Neural Networks

    Terry Taewoong Um, Franz Michael Josef Pfister, Daniel Pichler, Satoshi Endo, Muriel Lang, Sandra Hirche, Urban Fietzek, Dana Kulić
    Comments: submitted to ICMI2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    While convolutional neural networks (CNNs) have been successfully applied to
    many challenging classification applications, they typically require large
    datasets for training. When the availability of labeled data is limited, data
    augmentation is a critical preprocessing step for CNNs. However, data
    augmentation for wearable sensor data has not been deeply investigated yet.

    In this paper, various data augmentation methods for wearable sensor data are
    proposed. The proposed methods and CNNs are applied to the problem of
    classifying the motor state of Parkinson’s Disease (PD) patients, which is
    challenging due to small dataset size, noisy labels, and large within-class
    variability. Appropriate augmentation improves the classification performance
    from 76.7% to 92.0%.
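
The abstract does not name the individual augmentation methods. As an illustration only, the sketch below uses two common time-series augmentations, jittering (additive Gaussian noise) and scaling (one random gain per window), which are assumptions made here and not necessarily the methods evaluated in the paper. The function names, noise levels, and seed are hypothetical.

```python
import random


def jitter(signal, sigma=0.05, rng=random):
    """Add independent Gaussian noise to every sample."""
    return [x + rng.gauss(0.0, sigma) for x in signal]


def scale(signal, sigma=0.1, rng=random):
    """Multiply the whole window by one random factor near 1."""
    factor = rng.gauss(1.0, sigma)
    return [x * factor for x in signal]


def augment(windows, labels, copies=2, seed=0):
    """Return the original windows plus `copies` perturbed versions of each."""
    rng = random.Random(seed)
    out_windows, out_labels = list(windows), list(labels)
    for window, label in zip(windows, labels):
        for _ in range(copies):
            out_windows.append(scale(jitter(window, rng=rng), rng=rng))
            out_labels.append(label)  # label-preserving transformation
    return out_windows, out_labels


# Two hypothetical sensor windows with their class labels.
windows = [[0.0, 0.1, 0.3, 0.2], [1.0, 0.8, 0.9, 1.1]]
labels = ["off", "on"]
aug_windows, aug_labels = augment(windows, labels)
print(len(aug_windows), aug_labels)
```

The key design constraint is that each transformation should leave the class label valid, so the perturbations are kept small relative to the signal.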

    A Vision System for Multi-View Face Recognition

    M. Y. Shams, A. S. Tolba, S.H. Sarhan
    Comments: 7 pages, 4 figures, 4 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Multimodal biometric identification has attracted great attention in the
    security field. In the real world there exist modern systems that are able
    to detect, recognize, and classify human identities with reliable and fast
    recognition rates. Unfortunately, most of these systems rely on one
    modality, and the reliability for two or more modalities is further
    decreased. Variations of face images with respect to different poses are
    considered one of the important challenges in face recognition systems. In
    this paper, we propose a multimodal biometric system that is able to detect
    human face images that are not only single-view but also multi-view. Each
    subject entered into the system positions their face in front of three
    cameras, and the features of the face images are then extracted using the
    Speeded Up Robust Features (SURF) algorithm. We utilize a Multi-Layer
    Perceptron (MLP) and combined classifiers based on both Learning Vector
    Quantization (LVQ) and Radial Basis Function (RBF) for classification. The
    proposed system has been tested using the SDUMLA-HMT and CASIA datasets.
    Furthermore, we collected a database of multi-view face images in which we
    take additive white Gaussian noise into consideration. The results
    indicate the reliability and robustness of the proposed system under
    different poses and variations, including noisy images.

    Personalized Pancreatic Tumor Growth Prediction via Group Learning

    Ling Zhang, Le Lu, Ronald M. Summers, Electron Kebebew, Jianhua Yao
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Tumor growth prediction, a highly challenging task, has long been viewed as a
    mathematical modeling problem, where the tumor growth pattern is personalized
    based on imaging and clinical data of a target patient. Though mathematical
    models yield promising results, their prediction accuracy may be limited by the
    absence of population trend data and personalized clinical characteristics. In
    this paper, we propose a statistical group learning approach to predict the
    tumor growth pattern that incorporates both the population trend and
    personalized data, in order to discover high-level features from multimodal
    imaging data. A deep convolutional neural network approach is developed to
    model the voxel-wise spatio-temporal tumor progression. The deep features are
    combined with the time intervals and the clinical factors to feed a process of
    feature selection. Our predictive model is pretrained on a group data set and
    personalized on the target patient data to estimate the future spatio-temporal
    progression of the patient’s tumor. Multimodal imaging data at multiple time
    points are used in the learning, personalization and inference stages. Our
    method achieves a Dice coefficient of 86.8% ± 3.6% and RVD of 7.9% ± 5.4% on
    a pancreatic tumor data set, outperforming the DSC of 84.4% ± 4.0% and RVD
    13.9% ± 9.8% obtained by a previous state-of-the-art model-based method.
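
For readers unfamiliar with the two reported metrics, the sketch below computes them on voxel sets. It assumes the usual definitions, Dice = 2|A ∩ B| / (|A| + |B|) and RVD as the absolute volume difference relative to the ground-truth volume; the abstract does not spell out the exact RVD variant used, and the voxel coordinates are made up for illustration.

```python
def dice_coefficient(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two sets of voxel coordinates."""
    if not pred and not truth:
        return 1.0  # two empty segmentations agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def relative_volume_difference(pred, truth):
    """RVD = |V_pred - V_truth| / V_truth, with volume counted in voxels."""
    return abs(len(pred) - len(truth)) / len(truth)


# Hypothetical ground-truth and predicted tumor voxels.
truth = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
pred = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 0)}
dice = dice_coefficient(pred, truth)
rvd = relative_volume_difference(pred, truth)
print(f"Dice = {dice:.3f}, RVD = {rvd:.3f}")
```

Dice rewards spatial overlap, while RVD only compares total volumes, so a prediction can have a low RVD and still overlap poorly with the ground truth.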

    Provenance Filtering for Multimedia Phylogeny

    Allan Pinto, Daniel Moreira, Aparna Bharati, Joel Brogan, Kevin Bowyer, Patrick Flynn, Walter Scheirer, Anderson Rocha
    Comments: 5 pages, Accepted in IEEE International Conference on Image Processing (ICIP), 2017
    Subjects: Information Retrieval (cs.IR); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

    Departing from traditional digital forensics modeling, which seeks to analyze
    single objects in isolation, multimedia phylogeny analyzes the evolutionary
    processes that influence digital objects and collections over time. One of its
    integral pieces is provenance filtering, which consists of searching a
    potentially large pool of objects for the most related ones with respect to a
    given query, in terms of possible ancestors (donors or contributors) and
    descendants. In this paper, we propose a two-tiered provenance filtering
    approach to find all the potential images that might have contributed to the
    creation process of a given query (q). In our solution, the first (coarse) tier
    aims to find the most likely “host” images — the major donor or background
    — contributing to a composite/doctored image. The search is then refined in
    the second tier, in which we search for more specific (potentially small) parts
    of the query that might have been extracted from other images and spliced into
    the query image. Experimental results with a dataset containing more than a
    million images show that the two-tiered solution underpinned by the context of
    the query is highly useful for solving this difficult task.


    Artificial Intelligence

    ICABiDAS: Intuition Centred Architecture for Big Data Analysis and Synthesis

    Amit Kumar Mishra
    Comments: This paper is presented in the Biologically Inspired Cognitive Architecture Conference 2017 and published by their proceedings
    Subjects: Artificial Intelligence (cs.AI)

    Humans are experts at dealing with the amount of sensory data they face each
    moment. The human brain not only analyses these data but also starts synthesizing
    new information from the existing data. Current Big-data systems are needed
    not just to analyze data but also to come up with new interpretations. We believe
    that the pivotal ability in human brain which enables us to do this is what is
    known as “intuition”. Here, we present an intuition based architecture for big
    data analysis and synthesis.

    Joint Matrix-Tensor Factorization for Knowledge Base Inference

    Prachi Jain, Shikhar Murty, Mausam, Soumen Chakrabarti
    Subjects: Artificial Intelligence (cs.AI)

    While several matrix factorization (MF) and tensor factorization (TF) models
    have been proposed for knowledge base (KB) inference, they have rarely been
    compared across various datasets. Is there a single model that performs well
    across datasets? If not, what characteristics of a dataset determine the
    performance of MF and TF models? Is there a joint TF+MF model that performs
    robustly on all datasets? We perform an extensive evaluation to compare popular
    KB inference models across popular datasets in the literature. In addition to
    answering the questions above, we remove a limitation in the standard
    evaluation protocol for MF models, propose an extension to MF models so that
    they can better handle out-of-vocabulary (OOV) entity pairs, and develop a
    novel combination of TF and MF models. We also analyze and explain the results
    based on models and dataset characteristics. Our best model is robust, and
    obtains strong results across all datasets.

    Exception-Based Knowledge Updates

    Martin Slota, Joao Leite
    Subjects: Artificial Intelligence (cs.AI)

    Existing methods for dealing with knowledge updates differ greatly depending
    on the underlying knowledge representation formalism. When Classical Logic is
    used, updates are typically performed by manipulating the knowledge base on the
    model-theoretic level. On the opposite side of the spectrum stand the semantics
    for updating Answer-Set Programs that need to rely on rule syntax. Yet, a
    unifying perspective that could embrace both these branches of research is of
    great importance as it enables a deeper understanding of all involved methods
    and principles and creates room for their cross-fertilisation, ripening and
    further development.

    This paper bridges the seemingly irreconcilable approaches to updates. It
    introduces a novel monotonic characterisation of rules, dubbed RE-models, and
    shows it to be a more suitable semantic foundation for rule updates than
    SE-models. Then it proposes a generic scheme for specifying semantic rule
    update operators, based on the idea of viewing a program as the set of sets of
    RE-models of its rules; updates are performed by introducing additional
    interpretations – exceptions – to the sets of RE-models of rules in the
    original program. The introduced scheme is used to define rule update operators
    that are closely related to both classical update principles and traditional
    approaches to rules updates, and serve as a basis for a solution to the
    long-standing problem of state condensing, showing how they can be equivalently
    defined as binary operators on some class of logic programs.

    Finally, the essence of these ideas is extracted to define an abstract
    framework for exception-based update operators, viewing a knowledge base as the
    set of sets of models of its elements, which can capture a wide range of both
    model- and formula-based classical update operators, and thus serves as the
    first firm formal ground connecting classical and rule updates.

    Latent Attention Networks

    Christopher Grimm, Dilip Arumugam, Siddharth Karamcheti, David Abel, Lawson L.S. Wong, Michael L. Littman
    Subjects: Artificial Intelligence (cs.AI)

    Deep neural networks are able to solve tasks across a variety of domains and
    modalities of data. Despite many empirical successes, we lack the ability to
    clearly understand and interpret the learned internal mechanisms that
    contribute to such effective behaviors or, more critically, failure modes. In
    this work, we present a general method for visualizing an arbitrary neural
    network’s inner mechanisms and their power and limitations. Our dataset-centric
    method produces visualizations of how a trained network attends to components
    of its inputs. The computed “attention masks” support improved interpretability
    by highlighting which input attributes are critical in determining output. We
    demonstrate the effectiveness of our framework on a variety of deep neural
    network architectures in domains from computer vision, natural language
    processing, and reinforcement learning. The primary contribution of our
    approach is an interpretable visualization of attention that provides unique
    insights into the network’s underlying decision-making process irrespective of
    the data modality.

    Knowledge Representation in Bicategories of Relations

    Evan Patterson
    Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Category Theory (math.CT)

    We introduce the relational ontology log, or relational olog, a knowledge
    representation system based on the category of sets and relations. It is
    inspired by Spivak and Kent’s olog, a recent categorical framework for
    knowledge representation. Relational ologs interpolate between ologs and
    description logic, the dominant formalism for knowledge representation today.
    In this paper, we investigate relational ologs both for their own sake and to
    gain insight into the relationship between the algebraic and logical approaches
    to knowledge representation. On a practical level, we show by example that
    relational ologs have a friendly and intuitive–yet fully precise–graphical
    syntax, derived from the string diagrams of monoidal categories. We explain
    several other useful features of relational ologs not possessed by most
    description logics, such as a type system and a rich, flexible notion of
    instance data. In a more theoretical vein, we draw on categorical logic to show
    how relational ologs can be translated to and from logical theories in a
    fragment of first-order logic. Although we make extensive use of categorical
    language, this paper is designed to be self-contained and has considerable
    expository content. The only prerequisites are knowledge of first-order logic
    and the rudiments of category theory.

    Automating Carotid Intima-Media Thickness Video Interpretation with Convolutional Neural Networks

    Jae Y. Shin, Nima Tajbakhsh, R. Todd Hurst, Christopher B. Kendall, Jianming Liang
    Comments: J. Y. Shin, N. Tajbakhsh, R. T. Hurst, C. B. Kendall, and J. Liang. Automating carotid intima-media thickness video interpretation with convolutional neural networks. CVPR 2016, pp 2526-2535; N. Tajbakhsh, J. Y. Shin, R. T. Hurst, C. B. Kendall, and J. Liang. Automatic interpretation of CIMT videos using convolutional neural networks. Deep Learning for Medical Image Analysis, Academic Press, 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Cardiovascular disease (CVD) is the leading cause of mortality yet largely
    preventable, but the key to prevention is to identify at-risk individuals
    before adverse events. For predicting individual CVD risk, carotid intima-media
    thickness (CIMT), a noninvasive ultrasound method, has proven to be valuable,
    offering several advantages over CT coronary artery calcium score. However,
    each CIMT examination includes several ultrasound videos, and interpreting each
    of these CIMT videos involves three operations: (1) select three end-diastolic
    ultrasound frames (EUF) in the video, (2) localize a region of interest (ROI)
    in each selected frame, and (3) trace the lumen-intima interface and the
    media-adventitia interface in each ROI to measure CIMT. These operations are
    tedious, laborious, and time consuming, a serious limitation that hinders the
    widespread utilization of CIMT in clinical practice. To overcome this
    limitation, this paper presents a new system to automate CIMT video
    interpretation. Our extensive experiments demonstrate that the suggested system
    significantly outperforms the state-of-the-art methods. The superior
    performance is attributable to our unified framework based on convolutional
    neural networks (CNNs) coupled with our informative image representation and
    effective post-processing of the CNN outputs, which are uniquely designed for
    each of the above three operations.

    Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning?

    Nima Tajbakhsh, Jae Y. Shin, Suryakanth R. Gurudu, R. Todd Hurst, Christopher B. Kendall, Michael B. Gotway, Jianming Liang
    Journal-ref: IEEE Transactions on Medical Imaging. 35(5):1299-1312 (2016)
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Training a deep convolutional neural network (CNN) from scratch is difficult
    because it requires a large amount of labeled training data and a great deal of
    expertise to ensure proper convergence. A promising alternative is to fine-tune
    a CNN that has been pre-trained using, for instance, a large set of labeled
    natural images. However, the substantial differences between natural and
    medical images may advise against such knowledge transfer. In this paper, we
    seek to answer the following central question in the context of medical image
    analysis: Can the use of pre-trained deep CNNs with sufficient
    fine-tuning eliminate the need for training a deep CNN from scratch? To
    address this question, we considered 4 distinct medical imaging applications in
    3 specialties (radiology, cardiology, and gastroenterology) involving
    classification, detection, and segmentation from 3 different imaging
    modalities, and investigated how the performance of deep CNNs trained from
    scratch compared with the pre-trained CNNs fine-tuned in a layer-wise manner.
    Our experiments consistently demonstrated that (1) the use of a pre-trained CNN
    with adequate fine-tuning outperformed or, in the worst case, performed as well
    as a CNN trained from scratch; (2) fine-tuned CNNs were more robust to the size
    of training sets than CNNs trained from scratch; (3) neither shallow tuning nor
    deep tuning was the optimal choice for a particular application; and (4) our
    layer-wise fine-tuning scheme could offer a practical way to reach the best
    performance for the application at hand based on the amount of available data.


    Information Retrieval

    Hashtag-centric Immersive Search on Social Media

    Yuqi Gao, Jitao Sang, Tongwei Ren, Changsheng Xu
    Subjects: Information Retrieval (cs.IR)

    Social media information is distributed across different Online Social Networks
    (OSNs). This paper addresses the problem of integrating the cross-OSN information
    to facilitate an immersive social media search experience. We exploit the hashtag,
    which is widely used to annotate and organize multi-modal items in different
    OSNs, as the bridge for information aggregation and organization. A three-stage
    solution framework is proposed for hashtag representation, clustering and
    demonstration. Given an event query, the related items from three OSNs,
    Twitter, Flickr and YouTube, are organized in cluster-hashtag-item hierarchy
    for display. The effectiveness of the proposed solution is validated by
    qualitative and quantitative experiments on hundreds of trending event queries.

    Authorship Verification based on Compression-Models

    Oren Halvani, Christian Winter, Lukas Graner
    Subjects: Information Retrieval (cs.IR)

    Compression models represent an interesting approach for different
    classification tasks and have been used widely across many research fields. We
    adapt compression models to the field of authorship verification (AV), a branch
    of digital text forensics. The task in AV is to verify if a questioned document
    and a reference document of a known author are written by the same person. We
    propose an intrinsic AV method, which yields competitive results compared to a
    number of current state-of-the-art approaches, based on support vector machines
    or neural networks. However, in contrast to these approaches our method does
    not make use of machine learning algorithms, natural language processing
    techniques, feature engineering, hyperparameter optimization or external
    documents (a common strategy to transform AV from a one-class to a multi-class
    classification problem). Instead, the only three key components of our method
    are a compressing algorithm, a dissimilarity measure and a threshold, needed to
    accept or reject the authorship of the questioned document. Due to its
    compactness, our method performs very fast and can be reimplemented with
    minimal effort. In addition, the method can handle complicated AV cases where
    the questioned and the reference documents are not related to each other
    in terms of topic or genre. We evaluated our approach against publicly
    available datasets, which were used in three international AV competitions.
    Furthermore, we constructed our own corpora, where we evaluated our method
    against state-of-the-art approaches and achieved, in both cases, promising
    results.
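
    The three components named in the abstract above (a compressor, a
    dissimilarity measure and a threshold) can be illustrated with a minimal
    sketch. The choices here are assumptions, not taken from the paper: zlib
    stands in for the compressor, the normalized compression distance (NCD)
    for the dissimilarity measure, and the threshold value 0.9 is a
    hypothetical placeholder that would have to be tuned on real data.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 otherwise."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def same_author(questioned: str, reference: str, threshold: float = 0.9) -> bool:
    """Accept authorship when the dissimilarity falls below the threshold.

    The default threshold is a hypothetical value for illustration only.
    """
    score = ncd(questioned.encode("utf-8"), reference.encode("utf-8"))
    return score < threshold
```

    A general-purpose compressor such as zlib only sees a limited window of
    context, so this sketch is meant to show the decision rule rather than
    to reproduce the reported accuracy.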

    Provenance Filtering for Multimedia Phylogeny

    Allan Pinto, Daniel Moreira, Aparna Bharati, Joel Brogan, Kevin Bowyer, Patrick Flynn, Walter Scheirer, Anderson Rocha
    Comments: 5 pages, Accepted in IEEE International Conference on Image Processing (ICIP), 2017
    Subjects: Information Retrieval (cs.IR); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

    Departing from traditional digital forensics modeling, which seeks to analyze
    single objects in isolation, multimedia phylogeny analyzes the evolutionary
    processes that influence digital objects and collections over time. One of its
    integral pieces is provenance filtering, which consists of searching a
    potentially large pool of objects for the most related ones with respect to a
    given query, in terms of possible ancestors (donors or contributors) and
    descendants. In this paper, we propose a two-tiered provenance filtering
    approach to find all the potential images that might have contributed to the
    creation process of a given query (q). In our solution, the first (coarse) tier
    aims to find the most likely “host” images — the major donor or background
    — contributing to a composite/doctored image. The search is then refined in
    the second tier, in which we search for more specific (potentially small) parts
    of the query that might have been extracted from other images and spliced into
    the query image. Experimental results with a dataset containing more than a
    million images show that the two-tiered solution underpinned by the context of
    the query is highly useful for solving this difficult task.


    Computation and Language

    Prosodic Event Recognition using Convolutional Neural Networks with Context Information

    Sabrina Stehwien, Ngoc Thang Vu
    Comments: Interspeech 2017 4 pages, 1 figure
    Subjects: Computation and Language (cs.CL)

    This paper demonstrates the potential of convolutional neural networks (CNN)
    for detecting and classifying prosodic events on words, specifically pitch
    accents and phrase boundary tones, from frame-based acoustic features. Typical
    approaches use not only feature representations of the word in question but
    also its surrounding context. We show that adding position features indicating
    the current word benefits the CNN. In addition, this paper discusses the
    generalization from a speaker-dependent modelling approach to a
    speaker-independent setup. The proposed method is simple and efficient and
    yields strong results not only in speaker-dependent but also
    speaker-independent cases.

    Attentive Convolutional Neural Network based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech

    Michael Neumann, Ngoc Thang Vu
    Comments: to appear in the proceedings of Interspeech 2017
    Subjects: Computation and Language (cs.CL)

    Speech emotion recognition is an important and challenging task in the realm
    of human-computer interaction. Prior work proposed a variety of models and
    feature sets for training a system. In this work, we conduct extensive
    experiments using an attentive convolutional neural network with multi-view
    learning objective function. We compare system performance using different
    lengths of the input signal, different types of acoustic features and different
    types of emotion speech (improvised/scripted). Our experimental results on the
    Interactive Emotional Motion Capture (IEMOCAP) database reveal that the
    recognition performance strongly depends on the type of speech data independent
    of the choice of input features. Furthermore, we achieved state-of-the-art
    results on the improvised speech data of IEMOCAP.

    Joint Modeling of Topics, Citations, and Topical Authority in Academic Corpora

    Jooyeon Kim, Dongwoo Kim, Alice Oh
    Comments: Accepted by Transactions of the Association for Computational Linguistics (TACL); to appear
    Subjects: Computation and Language (cs.CL); Digital Libraries (cs.DL); Social and Information Networks (cs.SI)

    Much of scientific progress stems from previously published findings, but
    searching through the vast sea of scientific publications is difficult. We
    often rely on metrics of scholarly authority to find the prominent authors but
    these authority indices do not differentiate authority based on research
    topics. We present Latent Topical-Authority Indexing (LTAI) for jointly
    modeling the topics, citations, and topical authority in a corpus of academic
    papers. Compared to previous models, LTAI differs in two main aspects. First,
    it explicitly models the generative process of the citations, rather than
    treating the citations as given. Second, it models each author’s influence on
    citations of a paper based on the topics of the cited papers, as well as the
    citing papers. We fit LTAI to four academic corpora: CORA, Arxiv Physics, PNAS,
    and Citeseer. We compare the performance of LTAI against various baselines,
    starting with the latent Dirichlet allocation, to the more advanced models
    including author-link topic model and dynamic author citation topic model. The
    results show that LTAI achieves improved accuracy over other similar models
    when predicting words, citations and authors of publications.

    Morphological Embeddings for Named Entity Recognition in Morphologically Rich Languages

    Onur Gungor, Eray Yildiz, Suzan Uskudarli, Tunga Gungor
    Comments: Working draft
    Subjects: Computation and Language (cs.CL)

    In this work, we present new state-of-the-art results of 93.59% and 79.59%
    for Turkish and Czech named entity recognition based on the model of (Lample et
    al., 2016). We contribute by proposing several schemes for representing the
    morphological analysis of a word in the context of named entity recognition. We
    show that a concatenation of this representation with the word and character
    embeddings improves the performance. The effect of these representation schemes
    on the tagging performance is also investigated.

    Function Assistant: A Tool for NL Querying of APIs

    Kyle Richardson, Jonas Kuhn
    Comments: In submission for EMNLP-2017 (demo track)
    Subjects: Computation and Language (cs.CL)

    In this paper, we describe Function Assistant, a lightweight Python-based
    toolkit for querying and exploring source code repositories using natural
    language. The toolkit is designed to help end-users of a target API quickly
    find information about functions through high-level natural language queries
    and descriptions. For a given text query and background API, the tool finds
    candidate functions by performing a translation from the text to known
    representations in the API using the semantic parsing approach of Richardson
    and Kuhn (2017). Translations are automatically learned from example text-code
    pairs in example APIs. The toolkit includes features for building translation
    pipelines and query engines for arbitrary source code projects. To explore this
    last feature, we perform new experiments on 27 well-known Python projects
    hosted on Github.

    Machine Assisted Analysis of Vowel Length Contrasts in Wolof

    Elodie Gauthier, Laurent Besacier, Sylvie Voisin
    Comments: Accepted to Interspeech 2017
    Subjects: Computation and Language (cs.CL)

    Growing digital archives and improving algorithms for automatic analysis of
    text and speech create new research opportunities for fundamental research in
    phonetics. Such empirical approaches allow statistical evaluation of a much
    larger set of hypotheses about phonetic variation and its conditioning factors
    (among them geographical / dialectal variants). This paper illustrates this
    vision and proposes to challenge automatic methods for the analysis of a not
    easily observable phenomenon: vowel length contrast. We focus on Wolof, an
    under-resourced language from Sub-Saharan Africa. In particular, we propose
    multiple features to make a fine evaluation of the degree of length contrast
    under different factors such as read vs. semi-spontaneous speech and standard vs.
    dialectal Wolof. Our measures made fully automatically on more than 20k vowel
    tokens show that our proposed features can highlight different degrees of
    contrast for each vowel considered. We notably show that contrast is weaker in
    semi-spontaneous speech and in a non-standard semi-spontaneous dialect.

    NMTPY: A Flexible Toolkit for Advanced Neural Machine Translation Systems

    Ozan Caglayan, Mercedes García-Martínez, Adrien Bardet, Walid Aransa, Fethi Bougares, Loïc Barrault
    Comments: 10 pages, 3 figures
    Subjects: Computation and Language (cs.CL)

    In this paper, we present nmtpy, a flexible Python toolkit based on Theano
    for training Neural Machine Translation and other neural sequence-to-sequence
    architectures. nmtpy decouples the specification of a network from the training
    and inference utilities to simplify the addition of a new architecture and
    reduce the amount of boilerplate code to be written. nmtpy has been used for
    LIUM’s top-ranked submissions to WMT Multimodal Machine Translation and News
    Translation tasks in 2016 and 2017.


    Distributed, Parallel, and Cluster Computing

    CATERPILLAR: Coarse Grain Reconfigurable Architecture for Accelerating the Training of Deep Neural Networks

    Yuanfang Li, Ardavan Pedram
    Comments: 10 pages, 10 figures, ASAP 2017: The 28th Annual IEEE International Conference on Application-specific Systems, Architectures and Processors
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Accelerating the inference of a trained DNN is a well studied subject. In
    this paper we switch the focus to the training of DNNs. The training phase is
    compute intensive, demands complicated data communication, and contains
    multiple levels of data dependencies and parallelism. This paper presents an
    algorithm/architecture space exploration of efficient accelerators to achieve
    better network convergence rates and higher energy efficiency for training
    DNNs. We further demonstrate that an architecture with hierarchical support for
    collective communication semantics provides flexibility in training various
    networks performing both stochastic and batched gradient descent based
    techniques. Our results suggest that smaller networks favor non-batched
    techniques while performance for larger networks is higher using batched
    operations. At 45nm technology, CATERPILLAR achieves performance efficiencies
    of 177 GFLOPS/W at over 80% utilization for SGD training on small networks and
    211 GFLOPS/W at over 90% utilization for pipelined SGD/CP training on larger
    networks using a total area of 103.2 mm² and 178.9 mm² respectively.


    Learning

    Hyperparameter Optimization: A Spectral Approach

    Elad Hazan, Adam Klivans, Yang Yuan
    Subjects: Learning (cs.LG); Optimization and Control (math.OC)

    We give a simple, fast algorithm for hyperparameter optimization inspired by
    techniques from the analysis of Boolean functions. We focus on the
    high-dimensional regime where the canonical example is training a neural
    network with a large number of hyperparameters. The algorithm – an iterative
    application of compressed sensing techniques for orthogonal polynomials –
    requires only uniform sampling of the hyperparameters and is thus easily
    parallelizable. Experiments for training deep nets on Cifar-10 show that
    compared to state-of-the-art tools (e.g., Hyperband and Spearmint), our
    algorithm finds significantly improved solutions, in some cases matching what
    is attainable by hand-tuning. In terms of overall running time (i.e., time
    required to sample various settings of hyperparameters plus additional
    computation time), we are at least an order of magnitude faster than Hyperband
    and even more so compared to Bayesian Optimization. We also outperform Random
    Search 5X. Additionally, our method comes with provable guarantees and yields
    the first quasi-polynomial time algorithm for learning decision trees under the
    uniform distribution with polynomial sample complexity, the first improvement
    in over two decades.

    Learning Bayes networks using interventional path queries in polynomial time and sample complexity

    Kevin Bello, Jean Honorio
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Causal discovery from empirical data is a fundamental problem in many
    scientific domains. Observational data allows for identifiability only up to
    the Markov equivalence class. In this paper, we propose a polynomial time algorithm
    for learning the exact structure of Bayesian networks with high probability, by
    using interventional path queries. Each path query takes as input an origin
    node and a target node, and answers whether there is a directed path from the
    origin to the target. This is done by intervening on the origin node and observing
    samples from the target node. We theoretically show the logarithmic sample
    complexity for the size of interventional data per path query. Finally, we
    experimentally validate the correctness of our algorithm in synthetic and
    real-world networks.
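
    To make the notion of a path query concrete, here is a minimal sketch
    under strong assumptions. The oracle is simulated from a known graph and
    is noise-free, whereas the paper answers queries from interventional
    samples. The sketch also recovers only the transitive reduction of the
    DAG (edges not implied by a longer path). It is not the paper's
    algorithm for the exact structure, and all function names are
    hypothetical.

```python
from itertools import permutations


def make_oracle(nodes, edges):
    """Simulate a noise-free path-query oracle from a known DAG."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)

    def reachable(src):
        # Depth-first search collecting every node reachable from src.
        seen, stack = set(), [src]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    reach = {n: reachable(n) for n in nodes}
    return lambda origin, target: target in reach[origin]


def transitive_reduction_from_queries(nodes, path_query):
    """Ask one path query per ordered pair, then drop pairs explained by a detour."""
    reach = {(i, j) for i, j in permutations(nodes, 2) if path_query(i, j)}
    edges = set()
    for i, j in reach:
        has_detour = any(
            (i, k) in reach and (k, j) in reach
            for k in nodes if k not in (i, j)
        )
        if not has_detour:
            edges.add((i, j))
    return edges
```

    With edges A->B, B->C, A->C and C->D, the sketch returns A->B, B->C and
    C->D. The edge A->C is indistinguishable from the path A->B->C using
    plain path queries, which is why recovering the exact structure needs
    more than this.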

    Weight Sharing is Crucial to Succesful Optimization

    Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah
    Subjects: Learning (cs.LG)

    Exploiting the great expressive power of Deep Neural Network architectures
    relies on the ability to train them. While current theoretical work provides,
    mostly, results showing the hardness of this task, empirical evidence usually
    differs from this line, with success stories in abundance. A strong position
    among empirically successful architectures is captured by networks where
    extensive weight sharing is used, either by Convolutional or Recurrent layers.
    Additionally, characterizing specific aspects of different tasks, making them
    “harder” or “easier”, is an interesting direction explored both theoretically
    and empirically. We consider a family of ConvNet architectures, and prove that
    weight sharing can be crucial, from an optimization point of view. We explore
    different notions of the frequency of the target function, proving necessity
    of the target function having some low frequency components. This necessity is
    not sufficient – only with weight sharing can it be exploited, thus
    theoretically separating architectures using it, from others which do not. Our
    theoretical results are aligned with empirical experiments in an even more
    general setting, suggesting viability of examination of the role played by
    interleaving those aspects in broader families of tasks.

    Robust Deep Learning via Reverse Cross-Entropy Training and Thresholding Test

    Tianyu Pang, Chao Du, Jun Zhu
    Subjects: Learning (cs.LG)

    Though the recent progress is substantial, deep learning methods can be
    vulnerable to the elaborately crafted adversarial samples. In this paper, we
    attempt to improve the robustness by presenting a new training procedure and a
    thresholding test strategy. In training, we propose to minimize the reverse
    cross-entropy, which encourages a deep network to learn latent representations
    that better distinguish adversarial samples from normal ones. In testing, we
    propose to use a thresholding strategy based on a new metric to filter out
    adversarial samples for reliable predictions. Our method is simple to implement
    using standard algorithms, with little extra training cost compared to the
    common cross-entropy minimization. We apply our method to various
    state-of-the-art networks (e.g., residual networks) and we achieve significant
    improvements on robust predictions in the adversarial setting.

    Learning-based Surgical Workflow Detection from Intra-Operative Signals

    Ralf Stauder, Ergün Kayis, Nassir Navab
    Comments: 7 pages, 4 figures
    Subjects: Learning (cs.LG)

    A modern operating room (OR) provides a plethora of advanced medical devices.
    In order to better facilitate the information offered by them, they need to
    automatically react to the intra-operative context. To this end, the progress
    of the surgical workflow must be detected and interpreted, so that the current
    status can be given in machine-readable form. In this work, Random Forests (RF)
    and Hidden Markov Models (HMM) are compared and combined to detect the surgical
    workflow phase of a laparoscopic cholecystectomy. Various combinations of data
    were tested, from using only raw sensor data to filtered and augmented
    datasets. Achieved accuracies ranged from 64% to 72% for the RF approach, and
    from 80% to 82% for the combination of RF and HMM.

    On Unifying Deep Generative Models

    Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Eric P. Xing
    Comments: 12 pages
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Deep generative models have achieved impressive success in recent years.
    Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), as
    powerful frameworks for deep generative model learning, have largely been
    considered as two distinct paradigms and received extensive independent study
    respectively. This paper establishes formal connections between deep generative
    modeling approaches through a new formulation of GANs and VAEs. We show that
    GANs and VAEs are essentially minimizing KL divergences with opposite
    directions and reversed latent/visible treatments, extending the two learning
    phases of classic wake-sleep algorithm, respectively. The unified view provides
    a powerful tool to analyze a diverse set of existing model variants, and
    enables the exchange of ideas across research lines in a principled way. For
    example, we transfer the importance weighting method from the VAE literature for
    improved GAN learning, and enhance VAEs with an adversarial mechanism.
    Quantitative experiments show generality and effectiveness of the imported
    extensions.
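
    The phrase "KL divergences with opposite directions" rests on the fact
    that the KL divergence is asymmetric. The sketch below only illustrates
    that asymmetry on two small discrete distributions. It does not
    reproduce the paper's formulation of GANs or VAEs, and the example
    distributions are arbitrary.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.9, 0.1]  # a peaked distribution
q = [0.5, 0.5]  # a uniform distribution

forward = kl(p, q)  # penalizes q for missing mass where p is large
reverse = kl(q, p)  # penalizes p for missing mass where q is large
```

    The two values differ, so minimizing one direction or the other leads to
    different trade-offs when fitting a model distribution.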

    PixelGAN Autoencoders

    Alireza Makhzani, Brendan Frey
    Subjects: Learning (cs.LG)

    In this paper, we describe the “PixelGAN autoencoder”, a generative
    autoencoder in which the generative path is a convolutional autoregressive
    neural network on pixels (PixelCNN) that is conditioned on a latent code, and
    the recognition path uses a generative adversarial network (GAN) to impose a
    prior distribution on the latent code. We show that different priors result in
    different decompositions of information between the latent code and the
    autoregressive decoder. For example, by imposing a Gaussian distribution as the
    prior, we can achieve a global vs. local decomposition, or by imposing a
    categorical distribution as the prior, we can disentangle the style and content
    information of images in an unsupervised fashion. We further show how the
    PixelGAN autoencoder with a categorical prior can be directly used in
    semi-supervised settings and achieve competitive semi-supervised classification
    results on the MNIST, SVHN and NORB datasets.

    Discriminative conditional restricted Boltzmann machine for discrete choice and latent variable modelling

    Melvin Wong, Bilal Farooq, Guillaume-Alexandre Bilodeau
    Subjects: Learning (cs.LG)

    Conventional methods of estimating latent behaviour generally use attitudinal
    questions which are subjective and these survey questions may not always be
    available. We hypothesize that an alternative approach can be used for latent
    variable estimation through undirected graphical models, for instance
    non-parametric artificial neural networks. In this study, we explore the use of
    generative non-parametric modelling methods to estimate latent variables from
    prior choice distribution without the conventional use of measurement
    indicators. A restricted Boltzmann machine is used to represent latent
    behaviour factors by analyzing the relationship information between the
    observed choices and explanatory variables. The algorithm is adapted for latent
    behaviour analysis in a discrete choice scenario and we use a graphical approach
    to evaluate and understand the semantic meaning from estimated parameter vector
    values. We illustrate our methodology on a financial instrument choice dataset
    and perform statistical analysis on parameter sensitivity and stability. Our
    findings show that through non-parametric statistical tests, we can extract
    useful latent information on the behaviour of latent constructs through machine
    learning methods and present strong and significant influence on the choice
    process. Furthermore, our modelling framework shows robustness in input
    variability through sampling and validation.

    Tensor Contraction Layers for Parsimonious Deep Nets

    Jean Kossaifi, Aran Khanna, Zachary C. Lipton, Tommaso Furlanello, Anima Anandkumar
    Subjects: Learning (cs.LG)

    Tensors offer a natural representation for many kinds of data frequently
    encountered in machine learning. Images, for example, are naturally represented
    as third order tensors, where the modes correspond to height, width, and
    channels. Tensor methods are noted for their ability to discover
    multi-dimensional dependencies, and tensor decompositions, in particular, have
    been used to produce compact low-rank approximations of data. In this paper, we
    explore the use of tensor contractions as neural network layers and investigate
    several ways to apply them to activation tensors. Specifically, we propose the
    Tensor Contraction Layer (TCL), the first attempt to incorporate tensor
    contractions as end-to-end trainable neural network layers. Applied to existing
    networks, TCLs reduce the dimensionality of the activation tensors and thus the
    number of model parameters. We evaluate the TCL on the task of image
    recognition, augmenting two popular networks (AlexNet, VGG). The resulting
    models are trainable end-to-end. Applying the TCL to the task of image
    recognition, using the CIFAR100 and ImageNet datasets, we evaluate the effect
    of parameter reduction via tensor contraction on performance. We demonstrate
    significant model compression without significant impact on the accuracy and,
    in some cases, improved performance.
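
    As a rough illustration of what contracting an activation tensor along its
    non-batch modes looks like, here is a minimal NumPy sketch. The function name
    `tensor_contraction`, the factor shapes, and the random inputs are assumptions
    made for this example; it shows the contraction only, not the authors'
    trainable layer.

```python
import numpy as np


def tensor_contraction(x, factors):
    """Contract each non-batch mode of x with the matching factor matrix.

    x       : array of shape (batch, d_1, ..., d_N)
    factors : list of N matrices, factors[k] has shape (r_k, d_{k+1})
    returns : array of shape (batch, r_1, ..., r_N)
    """
    out = x
    for mode, w in enumerate(factors, start=1):
        # Contract axis `mode` of the activations with the input axis of w.
        # tensordot appends the new axis at the end, so move it back in place.
        out = np.tensordot(out, w, axes=([mode], [1]))
        out = np.moveaxis(out, -1, mode)
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 8, 16))   # (batch, height, width, channels)
factors = [
    rng.standard_normal((4, 8)),         # height   8 -> 4
    rng.standard_normal((4, 8)),         # width    8 -> 4
    rng.standard_normal((6, 16)),        # channels 16 -> 6
]
y = tensor_contraction(x, factors)
print(y.shape)  # (2, 4, 4, 6)
```

    In this toy setting each sample shrinks from 8*8*16 = 1024 activations to
    4*4*6 = 96, so any fully connected layer that follows needs correspondingly
    fewer weights.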

    Parameter identification in Markov chain choice models

    Arushi Gupta, Daniel Hsu
    Subjects: Statistics Theory (math.ST); Learning (cs.LG); Machine Learning (stat.ML)

    This work studies the parameter identification problem for the Markov chain
    choice model of Blanchet, Gallego, and Goyal used in assortment planning. In
    this model, the product selected by a customer is determined by a Markov chain
    over the products, where the products in the offered assortment are absorbing
    states. The underlying parameters of the model were previously shown to be
    identifiable from the choice probabilities for the all-products assortment,
    together with choice probabilities for assortments of all-but-one products.
    Obtaining and estimating choice probabilities for such large assortments is not
    desirable in many settings. The main result of this work is that the parameters
    may be identified from assortments of sizes two and three, regardless of the
    total number of products. The result is obtained via a simple and efficient
    parameter recovery algorithm.
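
    To make the model concrete, the following sketch computes choice
    probabilities for a given assortment by treating offered products as absorbing
    states. The arrival vector `lam`, the transition matrix `rho`, the convention
    that index 0 is a no-purchase option, and all numbers are assumptions chosen
    for illustration. The code does not implement the paper's parameter recovery
    algorithm.

```python
import numpy as np


def choice_probabilities(lam, rho, assortment):
    """Absorption probabilities of a Markov chain choice model.

    lam        : arrival probabilities over states (index 0 = no purchase)
    rho        : row-stochastic transition matrix between states
    assortment : iterable of offered product indices (absorbing states)
    """
    lam = np.asarray(lam, dtype=float)
    rho = np.asarray(rho, dtype=float)
    n = len(lam)
    absorbing = sorted(set(assortment) | {0})   # no-purchase always absorbs
    transient = [i for i in range(n) if i not in absorbing]

    probs = np.zeros(n)
    probs[absorbing] = lam[absorbing]           # customers who arrive and stop
    if transient:
        q = rho[np.ix_(transient, transient)]   # transient -> transient
        r = rho[np.ix_(transient, absorbing)]   # transient -> absorbing
        # Expected visits to each transient state: lam_T^T (I - Q)^{-1}
        visits = np.linalg.solve((np.eye(len(transient)) - q).T, lam[transient])
        probs[absorbing] += visits @ r
    return probs


lam = np.array([0.1, 0.4, 0.3, 0.2])
rho = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.2, 0.0, 0.5, 0.3],
    [0.1, 0.6, 0.0, 0.3],
    [0.3, 0.4, 0.3, 0.0],
])
print(choice_probabilities(lam, rho, [1, 2, 3]))  # all products offered
print(choice_probabilities(lam, rho, [1, 2]))     # assortment of size two
```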

    Dataflow Matrix Machines as a Model of Computations with Linear Streams

    Michael Bukatin, Jon Anthony
    Comments: 6 pages, accepted for presentation at LearnAut 2017: Learning and Automata workshop at LICS (Logic in Computer Science) 2017 conference. Preprint original version: April 9, 2017; minor correction: May 1, 2017
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Programming Languages (cs.PL)

    We overview dataflow matrix machines as a Turing complete generalization of
    recurrent neural networks and as a programming platform. We describe vector
    space of finite prefix trees with numerical leaves which allows us to combine
    expressive power of dataflow matrix machines with simplicity of traditional
    recurrent neural networks.

    Bias-Variance Tradeoff of Graph Laplacian Regularizer

    Pin-Yu Chen, Sijia Liu
    Comments: accepted by IEEE Signal Processing Letters
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Social and Information Networks (cs.SI)

    This paper presents a bias-variance tradeoff of the graph Laplacian regularizer,
    which is widely used in graph signal processing and semi-supervised learning
    tasks. The scaling law of the optimal regularization parameter is specified in
    terms of the spectral graph properties and a novel signal-to-noise ratio
    parameter, which suggests selecting a mediocre regularization parameter is
    often suboptimal. The analysis is applied to three applications, including
    random, band-limited, and multiple-sampled graph signals. Experiments on
    synthetic and real-world graphs demonstrate near-optimal performance of the
    established analysis.
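
    The role of the regularization parameter can be seen in a small denoising
    sketch. It assumes the common formulation of minimizing
    ||x - y||^2 + lam * x^T L x, whose closed-form solution is
    (I + lam * L)^{-1} y. The path graph, the noise level, and the function names
    are hypothetical choices for this example and are not taken from the paper.

```python
import numpy as np


def path_laplacian(n):
    """Combinatorial Laplacian L = D - A of a path graph on n nodes."""
    adj = np.zeros((n, n))
    idx = np.arange(n - 1)
    adj[idx, idx + 1] = 1.0
    adj[idx + 1, idx] = 1.0
    return np.diag(adj.sum(axis=1)) - adj


def laplacian_denoise(y, lap, lam):
    """Closed-form minimizer of ||x - y||^2 + lam * x^T L x."""
    return np.linalg.solve(np.eye(len(y)) + lam * lap, y)


rng = np.random.default_rng(0)
n = 50
signal = np.sin(np.linspace(0.0, np.pi, n))      # smooth ground-truth signal
noisy = signal + 0.3 * rng.standard_normal(n)    # noisy observation
lap = path_laplacian(n)

# Small lam keeps the noise (high variance); large lam over-smooths (high bias).
for lam in (0.0, 1.0, 10.0, 1000.0):
    est = laplacian_denoise(noisy, lap, lam)
    print(lam, float(np.mean((est - signal) ** 2)))
```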

    CATERPILLAR: Coarse Grain Reconfigurable Architecture for Accelerating the Training of Deep Neural Networks

    Yuanfang Li, Ardavan Pedram
    Comments: 10 pages, 10 figures, ASAP 2017: The 28th Annual IEEE International Conference on Application-specific Systems, Architectures and Processors
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Accelerating the inference of a trained DNN is a well studied subject. In
    this paper we switch the focus to the training of DNNs. The training phase is
    compute intensive, demands complicated data communication, and contains
    multiple levels of data dependencies and parallelism. This paper presents an
    algorithm/architecture space exploration of efficient accelerators to achieve
    better network convergence rates and higher energy efficiency for training
    DNNs. We further demonstrate that an architecture with hierarchical support for
    collective communication semantics provides flexibility in training various
    networks performing both stochastic and batched gradient descent based
    techniques. Our results suggest that smaller networks favor non-batched
    techniques while performance for larger networks is higher using batched
    operations. At 45nm technology, CATERPILLAR achieves performance efficiencies
    of 177 GFLOPS/W at over 80% utilization for SGD training on small networks and
    211 GFLOPS/W at over 90% utilization for pipelined SGD/CP training on larger
    networks using a total area of 103.2 mm^2 and 178.9 mm^2 respectively.

    Dynamic Stripes: Exploiting the Dynamic Precision Requirements of Activation Values in Neural Networks

    Alberto Delmas, Patrick Judd, Sayeh Sharify, Andreas Moshovos
    Comments: 3 pages, 3 figures
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Stripes is a Deep Neural Network (DNN) accelerator that uses bit-serial
    computation to offer performance that is proportional to the fixed-point
    precision of the activation values. The fixed-point precisions are determined a
    priori using profiling and are selected at a per layer granularity. This paper
    presents Dynamic Stripes, an extension to Stripes that detects precision
    variance at runtime and at a finer granularity. This extra level of precision
    reduction increases performance by 41% over Stripes.
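
    The intuition can be sketched in software. The toy model below assumes
    unsigned integer activations, a cost of one cycle per bit of precision per
    group, and a hypothetical group size of four; it is an illustration of
    per-group precision detection, not a description of the accelerator's
    hardware.

```python
def bits_needed(values):
    """Smallest bit width that represents every unsigned value in `values`."""
    return max(max(values).bit_length(), 1)


def serial_cycles(activations, group_size, static_bits=None):
    """Bit-serial cycles summed over groups of activations.

    With `static_bits` set, every group pays the same profiled precision.
    Otherwise each group pays only for the bits its own values need.
    """
    total = 0
    for start in range(0, len(activations), group_size):
        group = activations[start:start + group_size]
        total += static_bits if static_bits is not None else bits_needed(group)
    return total


activations = [3, 0, 1, 2, 200, 17, 5, 0]
layer_bits = bits_needed(activations)   # per-layer precision from profiling
static = serial_cycles(activations, group_size=4, static_bits=layer_bits)
dynamic = serial_cycles(activations, group_size=4)
print(layer_bits, static, dynamic)  # 8 16 10
```

    The first group only holds values up to 3, so it needs 2 bits instead of the
    layer-wide 8, which is where the saving in this example comes from.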

    Personalized Pancreatic Tumor Growth Prediction via Group Learning

    Ling Zhang, Le Lu, Ronald M. Summers, Electron Kebebew, Jianhua Yao
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Tumor growth prediction, a highly challenging task, has long been viewed as a
    mathematical modeling problem, where the tumor growth pattern is personalized
    based on imaging and clinical data of a target patient. Though mathematical
    models yield promising results, their prediction accuracy may be limited by the
    absence of population trend data and personalized clinical characteristics. In
    this paper, we propose a statistical group learning approach to predict the
    tumor growth pattern that incorporates both the population trend and
    personalized data, in order to discover high-level features from multimodal
    imaging data. A deep convolutional neural network approach is developed to
    model the voxel-wise spatio-temporal tumor progression. The deep features are
    combined with the time intervals and the clinical factors to feed a process of
    feature selection. Our predictive model is pretrained on a group data set and
    personalized on the target patient data to estimate the future spatio-temporal
    progression of the patient’s tumor. Multimodal imaging data at multiple time
    points are used in the learning, personalization and inference stages. Our
    method achieves a Dice coefficient of 86.8% ± 3.6% and RVD of 7.9% ± 5.4% on
    a pancreatic tumor data set, outperforming the DSC of 84.4% ± 4.0% and RVD of
    13.9% ± 9.8% obtained by a previous state-of-the-art model-based method.
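
    For readers unfamiliar with the two reported metrics, here is a minimal
    sketch on binary masks. It assumes the usual definitions: Dice as twice the
    overlap divided by the summed volumes, and RVD (relative volume difference) as
    the absolute volume difference divided by the ground-truth volume. The tiny
    masks are made up for illustration.

```python
import numpy as np


def dice_coefficient(pred, truth):
    """Dice = 2 * |pred AND truth| / (|pred| + |truth|) for binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    overlap = np.logical_and(pred, truth).sum()
    return 2.0 * overlap / (pred.sum() + truth.sum())


def relative_volume_difference(pred, truth):
    """RVD = | |pred| - |truth| | / |truth| for binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    return abs(int(pred.sum()) - int(truth.sum())) / int(truth.sum())


truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True          # 4 tumor voxels
pred = truth.copy()
pred[2, 2] = False              # prediction misses one voxel

print(dice_coefficient(pred, truth))            # 6/7, about 0.857
print(relative_volume_difference(pred, truth))  # 0.25
```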

    The Mixing method: coordinate descent for low-rank semidefinite programming

    Po-Wei Wang, Wei-Cheng Chang, J. Zico Kolter
    Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)

    In this paper, we propose a coordinate descent approach to low-rank
    structured semidefinite programming. The approach, which we call the Mixing
    method, is extremely simple to implement, has no free parameters, and typically
    attains an order of magnitude or better improvement in optimization performance
    over the current state of the art. We show that for certain problems, the
    method is strictly decreasing and guaranteed to converge to a critical point.
    We then apply the algorithm to three separate domains: solving the maximum cut
    semidefinite relaxation, solving a (novel) maximum satisfiability relaxation,
    and solving the GloVe word embedding optimization problem. In all settings, we
    demonstrate improvement over the existing state of the art along various
    dimensions. In total, this work substantially expands the scope and scale of
    problems that can be solved using semidefinite programming methods.
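
    A coordinate descent step of this kind can be sketched for the maximum cut
    relaxation, written here as minimizing sum_ij c_ij <v_i, v_j> over unit
    vectors v_i. Each step replaces one v_i by the normalized negative of
    sum_j c_ij v_j. The 5-cycle graph, the rank k = 4, the sweep count, and the
    function name are assumptions for this example, and the code is a simplified
    reading of the idea rather than the authors' implementation.

```python
import numpy as np


def mixing_maxcut(c, k, sweeps=50, seed=0):
    """Coordinate descent on unit vectors for min sum_ij c_ij <v_i, v_j>.

    c : symmetric weight matrix with zero diagonal (assumed)
    k : dimension of each vector v_i (rank of the factorization)
    """
    rng = np.random.default_rng(seed)
    n = c.shape[0]
    v = rng.standard_normal((k, n))
    v /= np.linalg.norm(v, axis=0)           # start from random unit columns
    history = []
    for _ in range(sweeps):
        for i in range(n):
            g = v @ c[:, i]                  # sum_j c_ij v_j (c_ii is zero)
            norm = np.linalg.norm(g)
            if norm > 0:
                v[:, i] = -g / norm          # exact minimizer over the sphere
        history.append(float(np.sum(c * (v.T @ v))))
    return v, history


# Adjacency matrix of a 5-cycle.
n = 5
c = np.zeros((n, n))
for i in range(n):
    c[i, (i + 1) % n] = c[(i + 1) % n, i] = 1.0

v, history = mixing_maxcut(c, k=4)
print(history[0], history[-1])
```

    Because each update solves its one-vector subproblem exactly, the objective
    recorded in `history` never increases from one sweep to the next.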

    Deep Learning: A Bayesian Perspective

    Nicholas Polson, Vadim Sokolov
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Methodology (stat.ME)

    Deep learning is a form of machine learning for nonlinear high dimensional
    data reduction and prediction. A Bayesian probabilistic perspective provides a
    number of advantages, specifically statistical interpretation and properties,
    more efficient algorithms for optimisation and hyper-parameter tuning, and an
    explanation of predictive performance. Traditional high-dimensional statistical
    techniques, such as principal component analysis (PCA), partial least squares (PLS),
    reduced rank regression (RRR), and projection pursuit regression (PPR), are shown to
    be shallow learners. Their deep learning counterparts exploit multiple layers
    of data reduction, which leads to performance gains. Stochastic gradient
    descent (SGD) training and optimisation and Dropout (DO) provide model and
    variable selection. Bayesian regularization is central to finding networks and
    provides a framework for optimal bias-variance trade-off to achieve good
    out-of-sample performance. Constructing good Bayesian predictors in high dimensions is
    discussed. To illustrate our methodology, we provide an analysis of first time
    international bookings on Airbnb. Finally, we conclude with directions for
    future research.


    Information Theory

    Double-Edge Factor Graphs: Definition, Properties, and Examples

    Michael X. Cao, Pascal O. Vontobel
    Comments: Submitted
    Subjects: Information Theory (cs.IT); Quantum Physics (quant-ph)

    Some of the most interesting quantities associated with a factor graph are
    its marginals and its partition sum. For factor graphs without cycles
    and moderate message update complexities, the sum-product algorithm (SPA) can
    be used to efficiently compute these quantities exactly. Moreover, for various
    classes of factor graphs with cycles, the SPA has been successfully
    applied to efficiently compute good approximations to these quantities. Note
    that in the case of factor graphs with cycles, the local functions are usually
    non-negative real-valued functions. In this paper we introduce a class of
    factor graphs, called double-edge factor graphs (DE-FGs), which allow local
    functions to be complex-valued and only require them, in some suitable sense,
    to be positive semi-definite. We discuss various properties of the SPA when
    running it on DE-FGs and we show promising numerical results for various
    example DE-FGs, some of which have connections to quantum information
    processing.

    Millimeter Wave LOS Coverage Enhancements with Coordinated High-Rise Access Points

    Yinan Qi, Mythri Hunukumbure, Yue Wang
    Comments: 6 pages, 9 figures, conference
    Subjects: Information Theory (cs.IT)

    Millimetre wave (mm-wave) communication is considered one of the most
    important enablers for the fifth generation communication (5G) system to
    support data rates of Gbps and above. In some scenarios, it is crucial to
    maintain a line of sight (LOS) link for users enjoying 5G immersive experiences
    and thus requiring very high data rates. In this paper, we investigate the LOS
    probability in mm-wave systems. In particular, we study the impact of access
    point (AP) and blockage height on the LOS probability and propose a solution to
    effectively enhance the LOS coverage by using high-rise APs on top of low-rise
    APs normally installed on street furniture, e.g., lamp poles. Two deployment
    options are explored: 1) irregular deployment and 2) regular deployment, where
    LOS probability is derived for both cases. Simulation results show that the
    impact of AP height on LOS probability is significant and using coordinated
    high-rise APs jointly deployed with low-rise APs will substantially improve the
    LOS probability.

    The role of asymptotic functions in network optimization and feasibility studies

    R. L. G. Cavalcante, S. Stanczak
    Comments: Submitted to GlobalSIP 2017
    Subjects: Information Theory (cs.IT)

    Solutions to network optimization problems, whether distributed or
    centralized, have greatly benefited from developments in nonlinear analysis,
    and, in particular, from developments in convex optimization. A key concept
    that has made convex and nonconvex analysis an important tool in science and
    engineering is the notion of asymptotic function, which is often hidden in many
    influential studies on nonlinear analysis and related fields. Therefore, we can
    also expect that asymptotic functions are deeply connected to many results in
    the wireless domain, even though they are rarely mentioned in the wireless
    literature. In this study, we show connections of this type. By doing so, we
    explain many properties of centralized and distributed solutions to wireless
    resource allocation problems within a unified framework, and we also generalize
    and unify existing approaches to feasibility analysis of network designs.

    Exploiting Multiple-Antenna Techniques for Non-Orthogonal Multiple Access

    Xiaoming Chen, Zhaoyang Zhang, Caijun Zhong, Derrick Wing Kwan Ng
    Subjects: Information Theory (cs.IT)

    This paper aims to provide a comprehensive solution for the design, analysis,
    and optimization of a multiple-antenna non-orthogonal multiple access (NOMA)
    system for multiuser downlink communication with both time division duplex
    (TDD) and frequency division duplex (FDD) modes. First, we design a new
    framework for multiple-antenna NOMA, including user clustering, channel state
    information (CSI) acquisition, superposition coding, transmit beamforming, and
    successive interference cancellation (SIC). Then, we analyze the performance of
    the considered system, and derive exact closed-form expressions for average
    transmission rates in terms of transmit power, CSI accuracy, transmission mode,
    and channel conditions. To further enhance the system performance, we
    optimize three key parameters, i.e., transmit power, feedback bits, and
    transmission mode. In particular, we propose a low-complexity joint optimization
    scheme, so as to fully exploit the potential of multiple-antenna techniques in
    NOMA. Moreover, through asymptotic analysis, we reveal the impact of system
    parameters on average transmission rates, and hence present some guidelines on
    the design of multiple-antenna NOMA. Finally, simulation results validate our
    theoretical analysis, and show that a substantial performance gain can be
    obtained over traditional orthogonal multiple access (OMA) technology under
    practical conditions.

    Generic Secure Repair for Distributed Storage

    Wentao Huang, Jehoshua Bruck
    Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)

    This paper studies the problem of repairing secret sharing schemes, i.e.,
    schemes that encode a message into n shares, assigned to n nodes, so that
    any n-r nodes can decode the message but any z colluding nodes cannot infer
    any information about the message. In the event of node failures so that shares
    held by the failed nodes are lost, the system needs to be repaired by
    reconstructing and reassigning the lost shares to the failed (or replacement)
    nodes. This can be achieved trivially by a trustworthy third-party that
    receives the shares of the available nodes, then recomputes and reassigns the lost
    shares. The interesting question, studied in the paper, is how to repair
    without a trustworthy third-party. The main issue that arises is repair
    security: how to maintain the requirement that any z colluding nodes,
    including the failed nodes, cannot learn any information about the message,
    during and after the repair process? We solve this secure repair problem from
    the perspective of secure multi-party computation. Specifically, we design
    generic repair schemes that can securely repair any (scalar or vector) linear
    secret sharing schemes. We prove a lower bound on the repair bandwidth of
    secure repair schemes and show that the proposed secure repair schemes achieve
    the optimal repair bandwidth up to a small constant factor when n dominates
    z, or when the secret sharing scheme being repaired has optimal rate. We
    adopt a formal information-theoretic approach in our analysis and bounds. A
    main idea in our schemes is to allow a more flexible repair model than the
    straightforward one-round repair model implicitly assumed by existing secure
    regenerating codes. Particularly, the proposed secure repair schemes are simple
    and efficient two-round protocols.
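
    The trivial trusted-third-party repair mentioned in the abstract can be
    sketched with Shamir's scheme, which is chosen here only as a familiar
    example of a linear secret sharing scheme. The prime, the share count, and
    the threshold are hypothetical. In Shamir's scheme the decoding threshold
    plays the role of n-r and z equals the threshold minus one. The paper's
    secure two-round protocols are not shown.

```python
import random

P = 2**61 - 1  # a Mersenne prime; all arithmetic is done modulo P


def make_shares(secret, n, threshold, rng):
    """Split `secret` into n shares; any `threshold` of them recover it."""
    coeffs = [secret] + [rng.randrange(P) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n + 1):
        y = sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
        shares.append((x, y))
    return shares


def interpolate_at(shares, x):
    """Evaluate the polynomial through `shares` at x (Lagrange, mod P)."""
    total = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


rng = random.Random(0)
secret = 123456789
shares = make_shares(secret, n=5, threshold=3, rng=rng)

# Decoding: any 3 shares give back the secret (the polynomial at x = 0).
recovered = interpolate_at(shares[:3], 0)

# Naive repair: a trusted party collects 3 surviving shares and recomputes
# the share of the failed node by evaluating the polynomial at its x value.
lost_x, lost_y = shares[4]
repaired_y = interpolate_at(shares[1:4], lost_x)
print(recovered == secret, repaired_y == lost_y)  # True True
```

    The party doing this repair holds enough shares to decode the secret, which
    is exactly the trust assumption the paper sets out to remove.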

    Streaming Bayesian inference: theoretical limits and mini-batch approximate message-passing

    Andre Manoel, Florent Krzakala, Eric W. Tramel, Lenka Zdeborová
    Comments: 19 pages, 4 figures
    Subjects: Machine Learning (stat.ML); Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT)

    In statistical learning for real-world large-scale data problems, one must
    often resort to “streaming” algorithms which operate sequentially on small
    batches of data. In this work, we present an analysis of the
    information-theoretic limits of mini-batch inference in the context of
    generalized linear models and low-rank matrix factorization. In a controlled
    Bayes-optimal setting, we characterize the optimal performance and phase
    transitions as a function of mini-batch size. We base part of our results on a
    detailed analysis of a mini-batch version of the approximate message-passing
    algorithm (Mini-AMP), which we introduce. Additionally, we show that this
    theoretical optimality carries over into real-data problems by illustrating
    that Mini-AMP is competitive with standard streaming algorithms for clustering.

    Testing Gaussian Process with Applications to Super-Resolution

    Jean-Marc Azaïs, Yohann De Castro, Stéphane Mourareau
    Comments: 29 pages, 4 figures
    Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Probability (math.PR)

    This article introduces new testing procedures on the mean of a stationary
    Gaussian process. Our test statistics are exact and derived from the outcomes
    of total variation minimization on the space of complex valued measures. Two
    testing procedures are presented: the first one is based on thin grids (we show
    that this testing procedure is unbiased) and the second one is based on maxima
    of the Gaussian process. We show that both procedures can be performed even if
    the variance is unknown. These procedures can be used for the problem of
    deconvolution over the space of complex valued measures, and applications in
    the framework of Super-Resolution theory are presented.

    Quantum key distribution protocol with pseudorandom bases

    A.S. Trushechkin, P.A. Tregubov, E.O. Kiktenko, Y.V. Kurochkin, A.K. Fedorov
    Comments: 16 pages, 4 figures; comments are welcome
    Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR); Information Theory (cs.IT)

    Quantum key distribution (QKD) offers a way for establishing
    information-theoretically secure communications. An important part of QKD
    technology is a high-quality random number generator (RNG) for quantum states
    preparation and for post-processing procedures. In the present work, we
    consider a novel class of prepare-and-measure QKD protocols, utilizing
    additional pseudorandomness in the preparation of quantum states. We study one
    such protocol and analyze its security against the intercept-resend attack.
    We demonstrate that, for single-photon sources, the considered protocol gives
    better secret key rates than the BB84 and the asymmetric BB84 protocols.
    However, the protocol strongly requires single-photon sources.

    The Entropy Power Inequality with quantum memory

    Giacomo De Palma, Dario Trevisan
    Subjects: Mathematical Physics (math-ph); Information Theory (cs.IT); Probability (math.PR); Quantum Physics (quant-ph)

    We prove the Entropy Power Inequality for Gaussian quantum systems in the
    presence of quantum memory. This fundamental inequality determines the minimum
    quantum conditional von Neumann entropy of the output of the beam-splitter or
    of the squeezing among all the input states where the two inputs are
    conditionally independent given the memory and have given quantum conditional
    entropies. We also prove that, for any pair of values of the quantum
    conditional entropies of the two inputs, the minimum of the quantum conditional
    entropy of the output given by the quantum conditional Entropy Power Inequality
    is asymptotically achieved by a suitable sequence of quantum Gaussian input
    states. Our proof of the quantum conditional Entropy Power Inequality is based
    on a new Stam inequality for the quantum conditional Fisher information and on
    the determination of the universal asymptotic behaviour of the quantum
    conditional entropy under the heat semigroup evolution. The beam-splitter and
    the squeezing are the central elements of quantum optics, and can model the
    attenuation, the amplification and the noise of electromagnetic signals. This
    quantum conditional Entropy Power Inequality will have a strong impact on
    quantum information and quantum cryptography, and we exploit it to prove an
    upper bound on the entanglement-assisted classical capacity of a non-Gaussian
    quantum channel.



