
    arXiv Paper Daily: Thu, 6 Apr 2017

    Posted by 我爱机器学习 (52ml.net) on 2017-04-06 00:00:00

    Neural and Evolutionary Computing

    DyVEDeep: Dynamic Variable Effort Deep Neural Networks

    Sanjay Ganapathy, Swagath Venkataramani, Balaraman Ravindran, Anand Raghunathan
    Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Deep Neural Networks (DNNs) have advanced the state-of-the-art in a variety
    of machine learning tasks and are deployed in increasing numbers of products
    and services. However, the computational requirements of training and
    evaluating large-scale DNNs are growing at a much faster pace than the
    capabilities of the underlying hardware platforms that they are executed upon.
    In this work, we propose Dynamic Variable Effort Deep Neural Networks
    (DyVEDeep) to reduce the computational requirements of DNNs during inference.
    Previous efforts propose specialized hardware implementations for DNNs,
    statically prune the network, or compress the weights. Complementary to these
    approaches, DyVEDeep is a dynamic approach that exploits the heterogeneity in
    the inputs to DNNs to improve their compute efficiency with comparable
    classification accuracy. DyVEDeep equips DNNs with dynamic effort mechanisms
    that, in the course of processing an input, identify how critical a group of
    computations is to classifying the input. DyVEDeep dynamically focuses its
    compute effort only on the critical computations, while skipping or
    approximating the rest. We propose 3 effort knobs that operate at different
    levels of granularity, viz. the neuron, feature and layer levels. We build DyVEDeep
    versions for 5 popular image recognition benchmarks – one for CIFAR-10 and four
    for ImageNet (AlexNet, OverFeat, VGG-16 and weight-compressed AlexNet). Across
    all benchmarks, DyVEDeep achieves 2.1x-2.6x reduction in the number of scalar
    operations, which translates to 1.8x-2.3x performance improvement over a
    Caffe-based implementation, with < 0.5% loss in accuracy.

    Best Practices for Applying Deep Learning to Novel Applications

    Leslie N. Smith
    Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    This report is targeted to groups who are subject matter experts in their
    application but deep learning novices. It contains practical advice for those
    interested in testing the use of deep neural networks on applications that are
    novel for deep learning. We suggest making your project more manageable by
    dividing it into phases. For each phase this report contains numerous
    recommendations and insights to assist novice practitioners.

    Deep Learning and Quantum Physics : A Fundamental Bridge

    Yoav Levine, David Yakira, Nadav Cohen, Amnon Shashua
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Quantum Physics (quant-ph)

    Deep convolutional networks have witnessed unprecedented success in various
    machine learning applications. Formal understanding of what makes these
    networks so successful is gradually unfolding, but for the most part there are
    still significant mysteries to unravel. The inductive bias, which reflects
    prior knowledge embedded in the network architecture, is one of them. In this
    work, we establish a fundamental connection between the fields of quantum
    physics and deep learning. We use this connection for asserting novel
    theoretical observations regarding the role that the number of channels in each
    layer of the convolutional network fulfills in the overall inductive bias.
    Specifically, we show an equivalence between the function realized by a deep
    convolutional arithmetic circuit (ConvAC) and a quantum many-body wave
    function, which relies on their common underlying tensorial structure. This
    facilitates the use of quantum entanglement measures as well-defined
    quantifiers of a deep network’s expressive ability to model intricate
    correlation structures of its inputs. Most importantly, the construction of a
    deep ConvAC in terms of a Tensor Network is made available. This description
    enables us to carry out a graph-theoretic analysis of a convolutional network, with
    which we demonstrate a direct control over the inductive bias of the deep
    network via its channel numbers, which are related to the min-cut in the underlying
    graph. This result is relevant to any practitioner designing a convolutional
    network for a specific task. We theoretically analyze ConvACs, and empirically
    validate our findings on more common ConvNets which involve ReLU activations
    and max pooling. Beyond the results described above, the description of a deep
    convolutional network in well-defined graph-theoretic tools and the formal
    connection to quantum entanglement, are two interdisciplinary bridges that are
    brought forth by this work.

    MIT at SemEval-2017 Task 10: Relation Extraction with Convolutional Neural Networks

    Ji Young Lee, Franck Dernoncourt, Peter Szolovits
    Comments: Accepted at SemEval 2017. The first two authors contributed equally to this work
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Over 50 million scholarly articles have been published: they constitute a
    unique repository of knowledge. In particular, one may infer from them
    relations between scientific concepts, such as synonyms and hyponyms.
    Artificial neural networks have been recently explored for relation extraction.
    In this work, we continue this line of work and present a system based on a
    convolutional neural network to extract relations. Our model ranked first in
    the SemEval-2017 task 10 (ScienceIE) for relation extraction in scientific
    articles (subtask C).

    Learning to Generate Reviews and Discovering Sentiment

    Alec Radford, Rafal Jozefowicz, Ilya Sutskever
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    We explore the properties of byte-level recurrent language models. When given
    sufficient amounts of capacity, training data, and compute time, the
    representations learned by these models include disentangled features
    corresponding to high-level concepts. Specifically, we find a single unit which
    performs sentiment analysis. These representations, learned in an unsupervised
    manner, achieve state of the art on the binary subset of the Stanford Sentiment
    Treebank. They are also very data efficient. When using only a handful of
    labeled examples, our approach matches the performance of strong baselines
    trained on full datasets. We also demonstrate the sentiment unit has a direct
    influence on the generative process of the model. Simply fixing its value to be
    positive or negative generates samples with the corresponding positive or
    negative sentiment.


    Computer Vision and Pattern Recognition

    Generating Descriptions with Grounded and Co-Referenced People

    Anna Rohrbach, Marcus Rohrbach, Siyu Tang, Seong Joon Oh, Bernt Schiele
    Comments: Accepted to CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Learning how to generate descriptions of images or videos has received major
    interest both in the Computer Vision and Natural Language Processing
    communities. While a few works have proposed to learn a grounding during the
    generation process in an unsupervised way (via an attention mechanism), it
    remains unclear how good the quality of the grounding is and whether it
    benefits the description quality. In this work we propose a movie description
    model which learns to generate descriptions and jointly ground (localize) the
    mentioned characters, as well as perform visual co-reference resolution between pairs
    of consecutive sentences/clips. We also propose to use weak localization
    supervision through character mentions provided in movie descriptions to learn
    the character grounding. At training time, we first learn how to localize
    characters by relating their visual appearance to mentions in the descriptions
    via a semi-supervised approach. We then feed this (noisy) supervision into
    our description model, which greatly improves its performance. Our proposed
    description model improves over prior work w.r.t. generated description quality
    and additionally provides grounding and local co-reference resolution. We
    evaluate it on the MPII Movie Description dataset using automatic and human
    evaluation measures and using our newly collected grounding and co-reference
    data for characters.

    Isotropic reconstruction of 3D fluorescence microscopy images using convolutional neural networks

    Martin Weigert, Loic Royer, Florian Jug, Gene Myers
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Fluorescence microscopy images usually show severe anisotropy in axial versus
    lateral resolution. This hampers downstream processing, i.e. the automatic
    extraction of quantitative biological data. While deconvolution methods and
    other techniques to address this problem exist, they are either time consuming
    to apply or limited in their ability to remove anisotropy. We propose a method
    to recover isotropic resolution from readily acquired anisotropic data. We
    achieve this using a convolutional neural network that is trained end-to-end
    from the same anisotropic body of data we later apply the network to. The
    network effectively learns to restore the full isotropic resolution by
    restoring the image under a trained, sample-specific image prior. We apply our
    method to 3 synthetic and 3 real datasets and show that our results improve
    on results from deconvolution and state-of-the-art super-resolution techniques.
    Finally, we demonstrate that a standard 3D segmentation pipeline performs on
    the output of our network with accuracy comparable to that on the full isotropic
    data.

    Weakly Supervised Dense Video Captioning

    Zhiqiang Shen, Jianguo Li, Zhou Su, Minjun Li, Yurong Chen, Yu-Gang Jiang, Xiangyang Xue
    Comments: To appear in CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper focuses on a novel and challenging vision task, dense video
    captioning, which aims to automatically describe a video clip with multiple
    informative and diverse caption sentences. The proposed method is trained
    without explicit annotation of fine-grained sentence to video region-sequence
    correspondence, but is only based on weak video-level sentence annotations. It
    differs from existing video captioning systems in three technical aspects.
    First, we propose lexical fully convolutional neural networks (Lexical-FCN)
    with weakly supervised multi-instance multi-label learning to weakly link video
    regions with lexical labels. Second, we introduce a novel submodular
    maximization scheme to generate multiple informative and diverse
    region-sequences based on the Lexical-FCN outputs. A winner-takes-all scheme is
    adopted to weakly associate sentences to region-sequences in the training
    phase. Third, a sequence-to-sequence learning based language model is trained
    with the weakly supervised information obtained through the association
    process. We show that the proposed method can not only produce informative and
    diverse dense captions, but also outperform state-of-the-art single video
    captioning methods by a large margin.

    Convolutional Neural Networks for Page Segmentation of Historical Document Images

    Kai Chen, Mathias Seuret
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    This paper presents a Convolutional Neural Network (CNN) based page
    segmentation method for handwritten historical document images. We consider
    page segmentation as a pixel labeling problem, i.e., each pixel is classified
    as one of the predefined classes. Traditional methods in this area rely on
    carefully hand-crafted features or large amounts of prior knowledge. In
    contrast, we propose to learn features from raw image pixels using a CNN. While
    many researchers focus on developing deep CNN architectures to solve different
    problems, we train a simple CNN with only one convolution layer. We show that
    the simple architecture achieves competitive results against other deep
    architectures on different public datasets. Experiments also demonstrate the
    effectiveness and superiority of the proposed method compared to previous
    methods.

    Automatic Breast Ultrasound Image Segmentation: A Survey

    Min Xian, Yingtao Zhang, H.D. Cheng, Fei Xu, Boyu Zhang, Jianrui Ding
    Comments: 71 pages, 6 tables, 166 references
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Breast cancer is one of the leading causes of cancer death among women
    worldwide. In clinical routine, automatic breast ultrasound (BUS) image
    segmentation is very challenging and essential for cancer diagnosis and
    treatment planning. Many BUS segmentation approaches have been studied in the
    last two decades, and have been proven effective on private datasets.
    Currently, the advancement of BUS image segmentation seems to have reached a
    bottleneck. Improving performance is increasingly challenging, and
    only a few new approaches have been published in the last several years. It is
    time to look at the field by reviewing previous approaches comprehensively and
    to investigate future directions. In this paper, we study the basic ideas,
    theories, pros and cons of the approaches, group them into categories, and
    extensively review each category in depth by discussing the principles,
    application issues, and advantages/disadvantages.

    A Unified Multi-Faceted Video Summarization System

    Anurag Sahoo, Vishal Kaushal, Khoshrav Doctor, Suyash Shetty, Rishabh Iyer, Ganesh Ramakrishnan
    Comments: 18 pages, 11 Figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Discrete Mathematics (cs.DM)

    This paper addresses automatic summarization and search in visual data
    comprising videos, live streams and image collections in a unified manner.
    In particular, we propose a framework for multi-faceted summarization which
    extracts key-frames (image summaries), skims (video summaries) and entity
    summaries (summarization at the level of entities like objects, scenes, humans
    and faces in the video). The user can either view these as extractive
    summarization, or query focused summarization. Our approach first pre-processes
    the video or image collection once, to extract all important visual features,
    following which we provide an interactive mechanism to the user to summarize
    the video based on their choice. We investigate several diversity, coverage and
    representation models for all these problems, and argue the utility of these
    different models depending on the application. While most of the prior work
    on submodular summarization approaches has focused on combining several models
    and learning weighted mixtures, we focus on the explainability of the different
    diversity, coverage and representation models and their scalability. Most
    importantly, we also show that we can summarize hours of video data in a few
    seconds, and our system allows the user to generate summaries of various
    lengths and types interactively on the fly.
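
    The coverage models mentioned above are typically optimized greedily. The following is a minimal sketch of greedy selection for a set-cover style coverage function; it is not the authors' system. The names `greedy_max_coverage`, `frames`, and the concept labels are hypothetical, and the abstract does not specify which objective or optimizer the paper uses.

```python
def greedy_max_coverage(candidates, budget):
    """Greedily pick up to `budget` items that cover the most concepts.

    `candidates` maps an item id (e.g. a key-frame) to the set of concepts
    it contains. Coverage (size of the union) is monotone submodular, so the
    greedy rule is a standard choice under a cardinality budget.
    """
    selected = []
    covered = set()
    for _ in range(budget):
        best_key, best_gain = None, 0
        # Iterate in sorted order so ties are broken deterministically.
        for key in sorted(candidates):
            if key in selected:
                continue
            gain = len(candidates[key] - covered)  # marginal gain of adding key
            if gain > best_gain:
                best_key, best_gain = key, gain
        if best_key is None:  # nothing left adds new concepts
            break
        selected.append(best_key)
        covered |= candidates[best_key]
    return selected, covered


# Hypothetical key-frames annotated with the concepts they show.
frames = {
    "f1": {"car", "road"},
    "f2": {"car", "road", "tree"},
    "f3": {"person", "dog"},
    "f4": {"tree"},
}
summary, concepts = greedy_max_coverage(frames, budget=2)
print(summary, sorted(concepts))
```

    Changing `budget` yields summaries of different lengths from the same pre-computed annotations, which mirrors the interactive use described in the abstract.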

    Effect of Super Resolution on High Dimensional Features for Unsupervised Face Recognition in the Wild

    Ahmed ElSayed, Ausif Mahmood, Tarek Sobh
    Comments: Submitted to ICIP 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The majority of face recognition algorithms use query faces captured in
    uncontrolled, in-the-wild environments. Often because of the cameras' limited
    capabilities, these captured facial images are commonly blurred or
    low resolution. Super resolution algorithms are therefore crucial in improving
    the resolution of such images, especially when the image is small and requires
    enlargement. This paper aims to demonstrate the effect of one of the
    state-of-the-art algorithms in the field of image super resolution. To
    demonstrate the functionality of the algorithm, various before and after 3D
    face alignment cases are provided using images from the Labeled Faces in
    the Wild (LFW) dataset. The resulting images are tested on a closed-set face
    recognition protocol using unsupervised algorithms with high-dimensional
    extracted features. Including the super resolution algorithm resulted in a
    significantly improved recognition rate over recently reported results obtained
    from unsupervised algorithms.

    Non-Convex Weighted Lp Minimization based Group Sparse Representation Framework for Image Denoising

    Qiong Wang, Xinggan Zhang, Yu Wu, Yechao Bai, Lan Tang, Zhiyuan Zha
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Nonlocal image representation or group sparsity has attracted considerable
    interest in various low-level vision tasks and has led to several
    state-of-the-art image denoising techniques, such as BM3D and LSSC. In the past,
    convex optimization with sparsity-promoting convex regularization was usually
    regarded as a standard scheme for estimating sparse signals in noise. However,
    using convex regularization still cannot obtain the correct sparse solution
    in some practical problems, including image inverse problems. In this paper
    we propose a non-convex weighted \(\ell_p\) minimization based group sparse
    representation (GSR) framework for image denoising. To make the proposed scheme
    tractable and robust, the generalized soft-thresholding (GST) algorithm is
    adopted to solve the non-convex \(\ell_p\) minimization problem. In addition, to
    improve the accuracy of nonlocal similar patch selection, an adaptive
    patch search (APS) scheme is proposed. Experimental results have demonstrated
    that the proposed approach not only outperforms many state-of-the-art denoising
    methods such as BM3D and WNNM, but also achieves competitive speed.
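
    To make the GST step concrete, here is a minimal scalar sketch of generalized soft-thresholding for the problem min_x 0.5 (y - x)^2 + lam |x|^p with 0 < p <= 1, using the commonly cited threshold and fixed-point iteration. The abstract gives no implementation details, so the function names, the iteration count, and the way a weight would enter (folded into `lam` per coefficient) are assumptions.

```python
import math


def gst_threshold(lam, p):
    """Threshold below which the minimizer of 0.5*(y-x)^2 + lam*|x|^p is zero."""
    base = 2.0 * lam * (1.0 - p)
    return base ** (1.0 / (2.0 - p)) + lam * p * base ** ((p - 1.0) / (2.0 - p))


def generalized_soft_threshold(y, lam, p, iterations=10):
    """Scalar generalized soft-thresholding for 0 < p <= 1 (sketch).

    For p = 1 this reduces to ordinary soft-thresholding. A per-coefficient
    weight can be applied by passing weight * lam as `lam` (an assumption
    about how the weighted variant would be used).
    """
    if lam <= 0.0:
        return y  # no regularization, nothing to shrink
    if abs(y) <= gst_threshold(lam, p):
        return 0.0
    x = abs(y)
    for _ in range(iterations):
        # Fixed-point update derived from the stationarity condition
        # x - |y| + lam * p * x**(p - 1) = 0 for x > 0.
        x = abs(y) - lam * p * x ** (p - 1.0)
    return math.copysign(x, y)


print(generalized_soft_threshold(3.0, 1.0, 1.0))   # soft-thresholding case
print(generalized_soft_threshold(0.5, 1.0, 0.5))   # below threshold, set to zero
print(generalized_soft_threshold(-4.0, 1.0, 0.5))  # shrunk towards zero, sign kept
```

    Compared with soft-thresholding (p = 1), smaller p shrinks large coefficients less while still zeroing small ones, which is the usual motivation for non-convex penalties.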

    The UMCD Dataset

    Danilo Avola, Gian Luca Foresti, Niki Martinel, Daniele Pannone, Claudio Piciarelli
    Comments: 3 pages, 5 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In recent years, the technological improvements of low-cost small-scale
    Unmanned Aerial Vehicles (UAVs) have promoted their ever-increasing use in
    different tasks. In particular, small-scale UAVs are useful in all
    those low-altitude tasks in which common UAVs cannot be adopted, such as
    recurrent comprehensive viewing of wide environments, frequent monitoring of
    military areas, and real-time classification of static and moving entities (e.g.,
    people, cars, etc.). These tasks can be supported by mosaicking and change
    detection algorithms applied at low altitude. Currently, public datasets for
    testing these algorithms are not available. This paper presents the UMCD
    dataset, the first collection of geo-referenced video sequences acquired at
    low-altitude for mosaicking and change detection purposes. Five reference
    scenarios are also reported.

    On the Relation between Color Image Denoising and Classification

    Jiqing Wu, Radu Timofte, Zhiwu Huang, Luc Van Gool
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A large amount of the image denoising literature focuses on single channel images
    and often experimentally validates the proposed methods on tens of images at
    most. In this paper, we investigate the interaction between denoising and
    classification on a large-scale dataset. Inspired by classification models, we
    propose a novel deep learning architecture for color (multichannel) image
    denoising and report on thousands of images from the ImageNet dataset as well as
    commonly used imagery. We study the importance of (sufficient) training data and
    how semantic class information can be traded for improved denoising results. As
    a result, our method greatly improves PSNR performance by 0.34 – 0.51 dB on
    average over state-of-the-art methods on a large-scale dataset. We conclude that
    it is beneficial to incorporate in classification models. On the other hand, we
    also study how noise affects classification performance. In the end, we come to
    a number of interesting conclusions, some being counter-intuitive.
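
    For readers unfamiliar with the metric, the sketch below computes PSNR from its standard definition, 10 log10(MAX^2 / MSE), and converts a PSNR gain in dB into the corresponding mean-squared-error ratio. The function names and the 8-bit peak value of 255 are assumptions for illustration; the abstract does not state the evaluation setup.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """MSE of the better method divided by MSE of the baseline for a PSNR gain."""
    return 10.0 ** (-gain_db / 10.0)


print(psnr([10, 20], [11, 21]))
print(mse_ratio_for_gain(0.34), mse_ratio_for_gain(0.51))
```

    Under this definition, the reported gains of 0.34 to 0.51 dB correspond to a mean squared error that is roughly 7 to 11 percent lower than the baseline's.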

    Incremental Tube Construction for Human Action Detection

    Harkirat S. Behl, Michael Sapienza, Gurkirt Singh, Suman Saha, Fabio Cuzzolin, Philip H. S. Torr
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Current state-of-the-art action detection systems are tailored for offline
    batch-processing applications. However, for online applications like
    human-robot interaction, current systems fall short, either because they only
    detect one action per video, or because they assume that the entire video is
    available ahead of time. In this work, we introduce a real-time and online
    joint-labelling and association algorithm for action detection that can
    incrementally construct space-time action tubes on the most challenging action
    videos in which different action categories occur concurrently. In contrast to
    previous methods, we solve the detection-window association and action
    labelling problems jointly in a single pass. We demonstrate superior online
    association accuracy and speed (2.2 ms per frame) as compared to the current
    state-of-the-art offline systems. We further demonstrate that the entire action
    detection pipeline can easily be made to work effectively in real-time using
    our action tube construction algorithm.

    Not All Pixels Are Equal: Difficulty-aware Semantic Segmentation via Deep Layer Cascade

    Xiaoxiao Li, Ziwei Liu, Ping Luo, Chen Change Loy, Xiaoou Tang
    Comments: To appear in CVPR 2017 as a spotlight paper
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We propose a novel deep layer cascade (LC) method to improve the accuracy and
    speed of semantic segmentation. Unlike the conventional model cascade (MC) that
    is composed of multiple independent models, LC treats a single deep model as a
    cascade of several sub-models. Earlier sub-models are trained to handle easy
    and confident regions, and they progressively feed forward harder regions to
    the next sub-model for processing. Convolutions are only calculated on these
    regions to reduce computations. The proposed method possesses several
    advantages. First, LC classifies most of the easy regions in the shallow stage
    and makes the deeper stages focus on a few hard regions. Such adaptive and
    ‘difficulty-aware’ learning improves segmentation performance. Second, LC
    accelerates both training and testing of the deep network thanks to early decisions
    in the shallow stage. Third, in comparison to MC, LC is an end-to-end trainable
    framework, allowing joint learning of all sub-models. We evaluate our method on
    PASCAL VOC and Cityscapes datasets, achieving state-of-the-art performance and
    fast speed.

    Automated Diagnosis of Epilepsy Employing Multifractal Detrended Fluctuation Analysis Based Features

    S Pratiher, S Chatterjee, R Bose
    Comments: 20 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Adaptation and Self-Organizing Systems (nlin.AO); Chaotic Dynamics (nlin.CD); Quantitative Methods (q-bio.QM); Other Statistics (stat.OT)

    This contribution reports an application of a novel feature extraction
    technique based on Multifractal Detrended Fluctuation Analysis (MFDFA) for
    automated detection of epilepsy. In fractal geometry, MFDFA is a popular
    technique to examine the self-similarity of a nonlinear, chaotic and noisy
    time series. In the present research work, EEG signals representing healthy,
    interictal (seizure-free) and ictal (seizure) activities are acquired from an
    existing database. The acquired EEG signals of different states are first
    analyzed using MFDFA. To quantify the time series singularity at local and
    global scales, a novel set of fourteen different features is extracted.
    Suitable feature ranking employing Student's t-test is done to select the most
    statistically significant features, which are then used as inputs to a support
    vector machine (SVM) classifier for the classification of different EEG
    signals. Eight different classification problems are presented in this
    paper, and it is observed that the overall classification accuracy using
    MFDFA-based features is reasonably satisfactory for all classification
    problems. The performance of the proposed method is also found to be
    comparable, and in some cases even better, when compared with the results
    published in the existing literature on a similar data set.

    Smart Mining for Deep Metric Learning

    Vijay B G Kumar, Ben Harwood, Gustavo Carneiro, Ian Reid, Tom Drummond
    Comments: 9 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    To solve deep metric learning problems and produce feature embeddings,
    current methodologies commonly use a triplet model to minimise the
    relative distance between samples from the same class and maximise the relative
    distance between samples from different classes. Though successful, the
    training convergence of this triplet model can be compromised by the fact that
    the vast majority of the training samples will produce gradients with
    magnitudes that are close to zero. This issue has motivated the development of
    methods that explore the global structure of the embedding and other methods
    that explore hard negative/positive mining. The effectiveness of such mining
    methods is often associated with intractable computational requirements. In
    this paper, we propose a novel deep metric learning method that combines the
    triplet model and the global structure of the embedding space. We rely on a
    smart mining procedure that produces effective training samples for a low
    computational cost. In addition, we propose an adaptive controller that
    automatically adjusts the smart mining hyper-parameters and speeds up the
    convergence of the training process. We show empirically that our proposed
    method allows for fast and more accurate training of triplet ConvNets than
    other competing mining methods. Additionally, we show that our method achieves
    new state-of-the-art embedding results for CUB-200-2011 and Cars196 datasets.
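
    The vanishing-gradient issue described above follows directly from the form of the usual triplet loss. The sketch below uses the common hinge formulation with squared Euclidean distances; the margin value and function names are assumptions, and the paper's smart mining procedure and adaptive controller are not reproduced here.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(
        0.0,
        squared_distance(anchor, positive)
        - squared_distance(anchor, negative)
        + margin,
    )


anchor = [0.0, 0.0]
positive = [1.0, 0.0]

# Easy triplet: the negative is already far away, so the hinge is inactive
# and the loss (and hence its gradient) is exactly zero.
easy = triplet_loss(anchor, positive, [5.0, 0.0])

# Hard triplet: the negative is closer to the anchor than the positive,
# so the loss is positive and the triplet contributes to training.
hard = triplet_loss(anchor, positive, [0.5, 0.0])

print(easy, hard)
```

    Because most randomly sampled triplets resemble the easy case, mining strategies aim to find triplets like the hard one without an exhaustive search over the training set.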

    Supporting Navigation of Outdoor Shopping Complexes for Visually-impaired Users through Multi-modal Data Fusion

    Archana Paladugu, Parag S. Chandakkar, Peng Zhang, Baoxin Li
    Comments: ICME 2013
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

    Outdoor shopping complexes (OSC) are extremely difficult for people with
    visual impairment to navigate. Existing GPS devices are mostly designed for
    roadside navigation and seldom transition well into an OSC-like setting. We
    report our study on the challenges faced by a blind person in navigating OSC
    through developing a new mobile application named iExplore. We first report an
    exploratory study aiming at deriving specific design principles for building
    this system by learning the unique challenges of the problem. Then we present a
    methodology that can be used to derive the necessary information for the
    development of iExplore, followed by experimental validation of the technology
    by a group of visually impaired users in a local outdoor shopping center. User
    feedback and other experiments suggest that iExplore, while at its very initial
    phase, has the potential of filling a practical gap in existing assistive
    technologies for the visually impaired.

    Classification of Diabetic Retinopathy Images Using Multi-Class Multiple-Instance Learning Based on Color Correlogram Features

    Ragav Venkatesan, Parag S. Chandakkar, Baoxin Li
    Comments: EMBS 2012
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    All people with diabetes have the risk of developing diabetic retinopathy
    (DR), a vision-threatening complication. Early detection and timely treatment
    can reduce the occurrence of blindness due to DR. Computer-aided diagnosis has
    the potential benefit of improving the accuracy and speed in DR detection. This
    study is concerned with automatic classification of images with microaneurysm
    (MA) and neovascularization (NV), two important DR clinical findings. Together
    with normal images, this presents a 3-class classification problem. We propose
    a modified color auto-correlogram feature (AutoCC) with low dimensionality that
    is spectrally tuned towards DR images. Recognizing the fact that the images
    with or without MA or NV are generally different only in small, localized
    regions, we propose to employ a multi-class, multiple-instance learning
    framework for performing the classification task using the proposed feature.
    Extensive experiments including comparison with a few state-of-art image
    classification approaches have been performed and the results suggest that the
    proposed approach is promising as it outperforms other methods by a large
    margin.

    Investigating Human Factors in Image Forgery Detection

    Parag S. Chandakkar, Baoxin Li
    Comments: ACM MM 2014
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In today’s age of the internet and social media, one can find an enormous volume
    of forged images on-line. These images have been used in the past to convey
    falsified information and achieve harmful intentions. The spread and the effect
    of social media only make this problem more severe. While creating forged
    images has become easier due to software advancements, there is no automated
    algorithm which can reliably detect forgery.

    Image forgery detection can be seen as a subset of the image understanding
    problem. Human performance is still the gold standard for these types of
    problems when compared to existing state-of-art automated algorithms. We
    conduct a subjective evaluation test with the aid of an eye tracker to investigate
    the human factors associated with this problem. We compare the performance of
    an automated algorithm and humans on the forgery detection problem. We also
    develop an algorithm which uses the data from the evaluation test to predict
    the difficulty-level of an image (the difficulty-level of an image here denotes
    how difficult it is for humans to detect forgery in an image. Terms such as
    “Easy/difficult image” will be used in the same context). The experimental
    results presented in this paper should facilitate development of better
    algorithms in the future.

    Improving Vision-based Self-positioning in Intelligent Transportation Systems via Integrated Lane and Vehicle Detection

    Parag S. Chandakkar, Yilin Wang, Baoxin Li
    Comments: WACV 2015
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Traffic congestion is a widespread problem. Dynamic traffic routing systems
    and congestion pricing are gaining importance in recent research. Lane
    prediction and vehicle density estimation are important components of such
    systems. We introduce a novel problem of vehicle self-positioning which
    involves predicting the number of lanes on the road and vehicle’s position in
    those lanes using videos captured by a dashboard camera. We propose an
    integrated closed-loop approach where we use the presence of vehicles to aid
    the task of self-positioning and vice-versa. To incorporate multiple factors
    and high-level semantic knowledge into the solution, we formulate this problem
    as a Bayesian framework. In the framework, the number of lanes, the vehicle’s
    position in those lanes and the presence of other vehicles are considered as
    parameters. We also propose a bounding box selection scheme to reduce the
    number of false detections and increase the computational efficiency. We show
    that the number of box proposals decreases by a factor of 6 using the selection
    approach. It also results in a large reduction in the number of false detections.
    The entire approach is tested on real-world videos and is found to give
    acceptable results.

    Relative Learning from Web Images for Content-adaptive Enhancement

    Parag S. Chandakkar, Qiongjie Tian, Baoxin Li
    Comments: ICME 2015
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Personalized and content-adaptive image enhancement can find many
    applications in the age of social media and mobile computing. This paper
    presents a relative-learning-based approach, which, unlike previous methods,
    does not require matching original and enhanced images for training. This
    allows the use of massive online photo collections to train a ranking model for
    improved enhancement. We first propose a multi-level ranking model, which is
    learned from only relatively-labeled inputs that are automatically crawled.
    Then we design a novel parameter sampling scheme under this model to generate
    the desired enhancement parameters for a new image. For evaluation, we first
    verify the effectiveness and the generalization abilities of our approach,
    using images that have been enhanced/labeled by experts. Then we carry out
    subjective tests, which show that users prefer images enhanced by our approach
    over other existing methods.

    A Structured Approach to Predicting Image Enhancement Parameters

    Parag S. Chandakkar, Baoxin Li
    Comments: WACV 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Social networking on mobile devices has become commonplace in everyday
    life. In addition, the photo capturing process has become trivial due to the
    advances in mobile imaging. Hence people capture a lot of photos every day and
    they want them to be visually attractive. This has given rise to automated,
    one-touch enhancement tools. However, the inability of those tools to provide
    personalized and content-adaptive enhancement has paved the way for
    machine-learned methods to do the same. Typical existing machine-learned
    methods heuristically (e.g. by kNN search) predict the enhancement parameters
    for a new image by relating the image to a set of similar training images.
    These heuristic methods need constant interaction with the training images,
    which makes the parameter prediction sub-optimal and, undesirably,
    computationally expensive at test time. This paper presents a novel approach to
    predicting the enhancement parameters given a new image using only its
    features, without using any training images. We propose to model the
    interaction between the image features and its corresponding enhancement
    parameters using the matrix factorization (MF) principles. We also propose a
    way to integrate the image features in the MF formulation. We show that our
    approach outperforms heuristic approaches as well as recent approaches in MF
    and structured prediction on synthetic as well as real-world data of image
    enhancement.

    A Computational Approach to Relative Aesthetics

    Parag S. Chandakkar, Vijetha Gattupalli, Baoxin Li
    Comments: ICPR 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Computational visual aesthetics has recently become an active research area.
    Existing state-of-art methods formulate this as a binary classification task
    where a given image is predicted to be beautiful or not. In many applications
    such as image retrieval and enhancement, it is more important to rank images
    based on their aesthetic quality instead of binary-categorizing them.
    Furthermore, in such applications, it may be possible that all images belong to
    the same category. Hence determining the aesthetic ranking of the images is
    more appropriate. To this end, we formulate a novel problem of ranking images
    with respect to their aesthetic quality. We construct a new dataset of image
    pairs with relative labels by carefully selecting images from the popular AVA
    dataset. Unlike in aesthetics classification, there is no single threshold
    which would determine the ranking order of the images across our entire
    dataset. We propose a deep neural network based approach that is trained on
    image pairs by incorporating principles from relative learning. Results show
    that such a relative training procedure allows our network to rank the images
    with a higher accuracy than a state-of-the-art network trained on the same set
    of images using binary labels.

    Estimation of Tissue Microstructure Using a Deep Network Inspired by a Sparse Reconstruction Framework

    Chuyang Ye
    Comments: 12 pages, 5 figures. Accepted by IPMI 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Diffusion magnetic resonance imaging (dMRI) provides a unique tool for
    noninvasively probing the microstructure of the neuronal tissue. The NODDI
    model has been a popular approach to the estimation of tissue microstructure in
    many neuroscience studies. It represents the diffusion signals with three types
    of diffusion in tissue: intra-cellular, extra-cellular, and cerebrospinal fluid
    compartments. However, the original NODDI method uses a computationally
    expensive procedure to fit the model and could require a large number of
    diffusion gradients for accurate microstructure estimation, which may be
    impractical for clinical use. Therefore, efforts have been devoted to efficient
    and accurate NODDI microstructure estimation with a reduced number of diffusion
    gradients. In this work, we propose a deep network based approach to the NODDI
    microstructure estimation, which is named Microstructure Estimation using a
    Deep Network (MEDN). Motivated by the AMICO algorithm which accelerates the
    computation of NODDI parameters, we formulate the microstructure estimation
    problem in a dictionary-based framework. The proposed network comprises two
    cascaded stages. The first stage resembles the solution to a dictionary-based
    sparse reconstruction problem and the second stage computes the final
    microstructure using the output of the first stage. The weights in the two
    stages are jointly learned from training data, which is obtained from training
    dMRI scans with diffusion gradients that densely sample the q-space. The
    proposed method was applied to brain dMRI scans, where two shells each with 30
    gradient directions (60 diffusion gradients in total) were used. Estimation
    accuracy with respect to the gold standard was measured and the results
    demonstrate that MEDN outperforms the competing algorithms.

    Joint Regression and Ranking for Image Enhancement

    Parag S. Chandakkar, Baoxin Li
    Comments: WACV 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Research on automated image enhancement has gained momentum in recent years,
    partially due to the need for easy-to-use tools for enhancing pictures captured
    by ubiquitous cameras on mobile devices. Many of the existing leading methods
    employ machine-learning-based techniques, by which some enhancement parameters
    for a given image are found by relating the image to the training images with
    known enhancement parameters. While knowing the structure of the parameter
    space can facilitate search for the optimal solution, none of the existing
    methods has explicitly modeled and learned that structure. This paper presents
    an end-to-end, novel joint regression and ranking approach to model the
    interaction between desired enhancement parameters and images to be processed,
    employing a Gaussian process (GP). GP allows searching for ideal parameters
    using only the image features. The model naturally leads to a ranking technique
    for comparing images in the induced feature space. Comparative evaluation using
    the ground-truth based on the MIT-Adobe FiveK dataset plus subjective tests on
    an additional data-set were used to demonstrate the effectiveness of the
    proposed approach.

    Escape from Cells: Deep Kd-Networks for The Recognition of 3D Point Cloud Models

    Roman Klokov, Victor Lempitsky
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a new deep learning architecture (called Kd-network) that is
    designed for 3D model recognition tasks and works with unstructured point
    clouds. The new architecture performs multiplicative transformations and shares
    parameters of these transformations according to the subdivisions of the point
    clouds imposed onto them by Kd-trees. Unlike the currently dominant
    convolutional architectures that usually require rasterization on uniform
    two-dimensional or three-dimensional grids, Kd-networks do not rely on such
    grids in any way and therefore avoid poor scaling behaviour. In a series of
    experiments with popular shape recognition benchmarks, Kd-networks demonstrate
    competitive performance in a number of shape recognition tasks such as shape
    classification, shape retrieval and shape part segmentation.
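
    The Kd-tree subdivision mentioned in the abstract can be illustrated with a
    minimal sketch. The splitting rule used here (median split along the axis
    with the largest coordinate range) and the dictionary layout are assumptions
    made for illustration, not details taken from the abstract; the function name
    build_kd_tree is hypothetical.

```python
def build_kd_tree(points):
    """Recursively split a list of points into a binary Kd-tree.

    Assumption: each node splits at the median along the axis with the
    largest coordinate range. Leaves hold a single point.
    """
    if len(points) == 1:
        return {"point": points[0]}
    dims = len(points[0])
    ranges = [
        max(p[d] for p in points) - min(p[d] for p in points)
        for d in range(dims)
    ]
    axis = ranges.index(max(ranges))           # axis with the largest spread
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return {
        "axis": axis,                          # split direction at this node
        "left": build_kd_tree(ordered[:mid]),
        "right": build_kd_tree(ordered[mid:]),
    }


cloud = [(0, 0, 0), (1, 0, 0), (0, 5, 0), (1, 5, 0)]
tree = build_kd_tree(cloud)
```

    In a network built on such a tree, the split axis stored at each node could
    select which set of shared parameters is applied there; that wiring is not
    shown in this sketch.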

    Two Stream LSTM: A Deep Fusion Framework for Human Action Recognition

    Harshala Gammulle, Simon Denman, Sridha Sridharan, Clinton Fookes
    Comments: Published as a conference paper at WACV 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper we address the problem of human action recognition from video
    sequences. Inspired by the exemplary results obtained via automatic feature
    learning and deep learning approaches in computer vision, we focus our
    attention towards learning salient spatial features via a convolutional neural
    network (CNN) and then map their temporal relationship with the aid of
    Long-Short-Term-Memory (LSTM) networks. Our contribution in this paper is a
    deep fusion framework that more effectively exploits spatial features from CNNs
    with temporal features from LSTM models. We also extensively evaluate their
    strengths and weaknesses. We find that by combining both the sets of features,
    the fully connected features effectively act as an attention mechanism to
    direct the LSTM to interesting parts of the convolutional feature sequence. The
    significance of our fusion method is its simplicity and effectiveness compared
    to other state-of-the-art methods. The evaluation results demonstrate that this
    hierarchical multi stream fusion method has higher performance compared to
    single stream mapping methods allowing it to achieve high accuracy
    outperforming current state-of-the-art methods in three widely used databases:
    UCF11, UCFSports, jHMDB.

    Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks

    Weilin Xu, David Evans, Yanjun Qi
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Learning (cs.LG)

    Although deep neural networks (DNNs) have achieved great success in many
    computer vision tasks, recent studies have shown they are vulnerable to
    adversarial examples. Such examples, typically generated by adding small but
    purposeful distortions, can frequently fool DNN models. Previous studies to
    defend against adversarial examples mostly focused on refining the DNN models.
    They have either shown limited success or suffered from expensive
    computation. We propose a new strategy, feature squeezing, that can be
    used to harden DNN models by detecting adversarial examples. Feature squeezing
    reduces the search space available to an adversary by coalescing samples that
    correspond to many different feature vectors in the original space into a
    single sample. By comparing a DNN model’s prediction on the original input with
    that on the squeezed input, feature squeezing detects adversarial examples with
    high accuracy and few false positives. This paper explores two instances of
    feature squeezing: reducing the color bit depth of each pixel and smoothing
    using a spatial filter. These strategies are straightforward, inexpensive, and
    complementary to defensive methods that operate on the underlying model, such
    as adversarial training.
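
    The detection idea (compare the prediction on the original input with the
    prediction on a squeezed input) can be sketched in a few lines. This is a toy
    illustration only: toy_model is a hypothetical stand-in for a DNN, the L1
    distance and the threshold of 0.5 are assumptions, and only the bit-depth
    squeezer is shown, not the spatial smoothing filter.

```python
def reduce_bit_depth(pixels, bits):
    """Quantize pixel values in [0, 1] to 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    return [round(p * levels) / levels for p in pixels]


def l1_distance(p, q):
    """Sum of absolute differences between two prediction vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def looks_adversarial(model, pixels, bits=1, threshold=0.5):
    """Flag an input whose prediction changes a lot after squeezing."""
    original = model(pixels)
    squeezed = model(reduce_bit_depth(pixels, bits))
    return l1_distance(original, squeezed) > threshold


def toy_model(pixels):
    """Hypothetical classifier: [p_dark, p_bright] from mean brightness."""
    mean = sum(pixels) / len(pixels)
    p_bright = 1.0 if mean > 0.5 else 0.0
    return [1.0 - p_bright, p_bright]


clean = [0.9, 0.95, 1.0, 0.9]        # clearly bright, stable under squeezing
suspect = [0.9, 0.45, 0.45, 0.45]    # barely bright, flips after squeezing
```

    With these inputs, looks_adversarial(toy_model, clean) is False and
    looks_adversarial(toy_model, suspect) is True, because squeezing leaves the
    first prediction unchanged and flips the second.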

    Pose2Instance: Harnessing Keypoints for Person Instance Segmentation

    Subarna Tripathi, Maxwell Collins, Matthew Brown, Serge Belongie
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Human keypoints are a well-studied representation of people. We explore how to
    use keypoint models to improve instance-level person segmentation. The main
    idea is to harness the notion of a distance transform of oracle-provided
    keypoints or estimated keypoint heatmaps as a prior for the person instance
    segmentation task within a deep neural network. For training and evaluation, we
    consider all those images from COCO where both instance segmentation and human
    keypoint annotations are available. We first show how oracle keypoints can
    boost the performance of an existing human segmentation model during inference
    without any training. Next, we propose a framework to directly learn a deep
    instance segmentation model conditioned on human pose. Experimental results
    show that at various Intersection Over Union (IOU) thresholds, in a constrained
    environment with oracle keypoints, the instance segmentation accuracy achieves
    10% to 12% relative improvements over a strong baseline of oracle bounding
    boxes. In a more realistic environment, without the oracle keypoints, the
    proposed deep person instance segmentation model conditioned on human pose
    achieves 3.8% to 10.5% relative improvements compared with its strongest
    baseline of a deep network trained only for segmentation.

    DyVEDeep: Dynamic Variable Effort Deep Neural Networks

    Sanjay Ganapathy, Swagath Venkataramani, Balaraman Ravindran, Anand Raghunathan
    Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Deep Neural Networks (DNNs) have advanced the state-of-the-art in a variety
    of machine learning tasks and are deployed in increasing numbers of products
    and services. However, the computational requirements of training and
    evaluating large-scale DNNs are growing at a much faster pace than the
    capabilities of the underlying hardware platforms that they are executed upon.
    In this work, we propose Dynamic Variable Effort Deep Neural Networks
    (DyVEDeep) to reduce the computational requirements of DNNs during inference.
    Previous efforts propose specialized hardware implementations for DNNs,
    statically prune the network, or compress the weights. Complementary to these
    approaches, DyVEDeep is a dynamic approach that exploits the heterogeneity in
    the inputs to DNNs to improve their compute efficiency with comparable
    classification accuracy. DyVEDeep equips DNNs with dynamic effort mechanisms
    that, in the course of processing an input, identify how critical a group of
    computations are to classify the input. DyVEDeep dynamically focuses its
    compute effort only on the critical computations, while skipping or
    approximating the rest. We propose 3 effort knobs that operate at different
    levels of granularity viz. neuron, feature and layer levels. We build DyVEDeep
    versions for 5 popular image recognition benchmarks – one for CIFAR-10 and four
    for ImageNet (AlexNet, OverFeat and VGG-16, weight-compressed AlexNet). Across
    all benchmarks, DyVEDeep achieves 2.1x-2.6x reduction in the number of scalar
    operations, which translates to 1.8x-2.3x performance improvement over a
    Caffe-based implementation, with < 0.5% loss in accuracy.

    Satellite Image-based Localization via Learned Embeddings

    Dong-Ki Kim, Matthew R. Walter
    Comments: To be published in IEEE International Conference on Robotics and Automation (ICRA), 2017
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We propose a vision-based method that localizes a ground vehicle using
    publicly available satellite imagery as the only prior knowledge of the
    environment. Our approach takes as input a sequence of ground-level images
    acquired by the vehicle as it navigates, and outputs an estimate of the
    vehicle’s pose relative to a georeferenced satellite image. We overcome the
    significant viewpoint and appearance variations between the images through a
    neural multi-view model that learns location-discriminative embeddings in which
    ground-level images are matched with their corresponding satellite view of the
    scene. We use this learned function as an observation model in a filtering
    framework to maintain a distribution over the vehicle’s pose. We evaluate our
    method on different benchmark datasets and demonstrate its ability to localize
    ground-level images in environments novel relative to training, despite the
    challenges of significant viewpoint and appearance variations.


    Artificial Intelligence

    Embodied Artificial Intelligence through Distributed Adaptive Control: An Integrated Framework

    Clément Moulin-Frier, Jordi-Ysard Puigbò, Xerxes D. Arsiwalla, Martì Sanchez-Fibla, Paul F. M. J. Verschure
    Comments: Paper submitted to the ICDL-Epirob 2017 conference
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Multiagent Systems (cs.MA)

    In this paper, we argue that the future of Artificial Intelligence research
    resides in two keywords: integration and embodiment. We support this claim by
    analyzing the recent advances of the field. Regarding integration, we note that
    the most impactful recent contributions have been made possible through the
    integration of recent Machine Learning methods (based in particular on Deep
    Learning and Recurrent Neural Networks) with more traditional ones (e.g.
    Monte-Carlo tree search, goal babbling exploration or addressable memory
    systems). Regarding embodiment, we note that the traditional benchmark tasks
    (e.g. visual classification or board games) are becoming obsolete as
    state-of-the-art learning algorithms approach or even surpass human performance
    in most of them, having recently encouraged the development of first-person 3D
    game platforms embedding realistic physics. Building upon this analysis, we
    first propose an embodied cognitive architecture integrating heterogeneous
    sub-fields of Artificial Intelligence into a unified framework. We demonstrate
    the utility of our approach by showing how major contributions of the field can
    be expressed within the proposed framework. We then claim that benchmarking
    environments need to reproduce ecologically-valid conditions for bootstrapping
    the acquisition of increasingly complex cognitive skills through the concept of
    a cognitive arms race between embodied agents.

    Geracao Automatica de Paineis de Controle para Analise de Mobilidade Urbana Utilizando Redes Complexas

    Victor Dantas, Henrique Santos, Carlos Caminha, Vasco Furtado
    Comments: 12 pages, in Portuguese, 8 figures
    Subjects: Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)

    In this paper we describe an automatic generator that helps data scientists
    construct, in a user-friendly way, dashboards from data represented as
    networks. The generator, called SBINet (Semantic for Business Intelligence
    from Networks), has a semantic layer that, through ontologies, describes the
    data that represents a network as well as the possible metrics to be
    calculated in the network. Thus, with SBINet, the stages of the dashboard
    construction process that use complex network metrics are facilitated and can
    be done by users who do not necessarily know about complex networks.

    Finite Sample Analysis for TD(0) with Linear Function Approximation

    Gal Dalal, Balázs Szörényi, Gugan Thoppe, Shie Mannor
    Subjects: Artificial Intelligence (cs.AI)

    TD(0) is one of the most commonly used algorithms in reinforcement learning.
    Despite this, there is no existing finite sample analysis for TD(0) with
    function approximation, even for the linear case. Our work is the first to
    provide such a result. Works that managed to obtain concentration bounds for
    online Temporal Difference (TD) methods analyzed modified versions of them,
    carefully crafted for the analyses to hold. These modifications include
    projections and step-sizes dependent on unknown problem parameters. Our
    analysis obviates these artificial alterations by exploiting strong properties
    of TD(0) and tailor-made stochastic approximation tools.
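
    For readers unfamiliar with the algorithm being analyzed, a minimal sketch of
    plain TD(0) with linear function approximation follows. It uses a constant
    step size and no projection step; the two-state cycle, the one-hot features
    and all names (td0_linear, one_hot) are assumptions chosen for illustration
    and are not taken from the paper.

```python
def td0_linear(transitions, features, n_features, alpha=0.1, gamma=0.9):
    """Run TD(0) updates on theta, where V(s) is approximated by theta . phi(s)."""
    theta = [0.0] * n_features
    for state, reward, next_state in transitions:
        phi = features(state)
        phi_next = features(next_state)
        value = sum(t * f for t, f in zip(theta, phi))
        next_value = sum(t * f for t, f in zip(theta, phi_next))
        # Temporal-difference error for this transition.
        delta = reward + gamma * next_value - value
        # Semi-gradient step: move theta along phi(state).
        theta = [t + alpha * delta * f for t, f in zip(theta, phi)]
    return theta


def one_hot(state):
    """Hypothetical feature map for a two-state problem."""
    return [1.0, 0.0] if state == 0 else [0.0, 1.0]


# Deterministic cycle: 0 -> 1 with reward 1, then 1 -> 0 with reward 0.
cycle = [(0, 1.0, 1), (1, 0.0, 0)] * 2000
theta = td0_linear(cycle, one_hot, n_features=2)
```

    On this toy cycle the exact values are V(0) = 1 / (1 - 0.81) and
    V(1) = 0.9 * V(0), and theta approaches them as the number of updates grows.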

    Best Practices for Applying Deep Learning to Novel Applications

    Leslie N. Smith
    Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    This report is targeted to groups who are subject matter experts in their
    application but deep learning novices. It contains practical advice for those
    interested in testing the use of deep neural networks on applications that are
    novel for deep learning. We suggest making your project more manageable by
    dividing it into phases. For each phase this report contains numerous
    recommendations and insights to assist novice practitioners.

    MIT at SemEval-2017 Task 10: Relation Extraction with Convolutional Neural Networks

    Ji Young Lee, Franck Dernoncourt, Peter Szolovits
    Comments: Accepted at SemEval 2017. The first two authors contributed equally to this work
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Over 50 million scholarly articles have been published: they constitute a
    unique repository of knowledge. In particular, one may infer from them
    relations between scientific concepts, such as synonyms and hyponyms.
    Artificial neural networks have been recently explored for relation extraction.
    In this work, we continue this line of work and present a system based on a
    convolutional neural network to extract relations. Our model ranked first in
    the SemEval-2017 task 10 (ScienceIE) for relation extraction in scientific
    articles (subtask C).

    Multi-Label Learning with Global and Local Label Correlation

    Yue Zhu, James T. Kwok, Zhi-Hua Zhou
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    It is well-known that exploiting label correlations is important to
    multi-label learning. Existing approaches either assume that the label
    correlations are global and shared by all instances; or that the label
    correlations are local and shared only by a data subset. In fact, in
    real-world applications, both cases may occur: some label correlations are
    globally applicable and some are shared only in a local group of instances.
    Moreover, it is also a usual case that only partial labels are observed, which
    makes the exploitation of the label correlations much more difficult. That is,
    it is hard to estimate the label correlations when many labels are absent. In
    this paper, we propose a new multi-label approach GLOCAL dealing with both the
    full-label and the missing-label cases, exploiting global and local label
    correlations simultaneously, through learning a latent label representation and
    optimizing label manifolds. The extensive experimental studies validate the
    effectiveness of our approach on both full-label and missing-label data.

    Finite-Time Stabilization of Longitudinal Control for Autonomous Vehicles via a Model-Free Approach

    Philip Polack, Brigitte d'Andréa-Novel, Michel Fliess, Arnaud de la Fortelle, Lghani Menhour
    Comments: IFAC 2017 World Congress, Toulouse
    Subjects: Systems and Control (cs.SY); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)

    This communication presents a longitudinal model-free control approach for
    computing the wheel torque command to be applied on a vehicle. This setting
    enables us to overcome the problem of unknown vehicle parameters for generating
    a suitable control law. An important parameter in this control setting is made
    time-varying for ensuring finite-time stability. Several computer
    simulations are displayed and discussed. Overshoots therefore become smaller,
    the driving comfort is increased and the robustness to time-delays is improved.

    Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders

    Jesse Engel, Cinjon Resnick, Adam Roberts, Sander Dieleman, Douglas Eck, Karen Simonyan, Mohammad Norouzi
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD)

    Generative models in vision have seen rapid progress due to algorithmic
    improvements and the availability of high-quality image datasets. In this
    paper, we offer contributions in both these areas to enable similar progress in
    audio modeling. First, we detail a powerful new WaveNet-style autoencoder model
    that conditions an autoregressive decoder on temporal codes learned from the
    raw audio waveform. Second, we introduce NSynth, a large-scale and high-quality
    dataset of musical notes that is an order of magnitude larger than comparable
    public datasets. Using NSynth, we demonstrate improved qualitative and
    quantitative performance of the WaveNet autoencoder over a well-tuned spectral
    autoencoder baseline. Finally, we show that the model learns a manifold of
    embeddings that allows for morphing between instruments, meaningfully
    interpolating in timbre to create new types of sounds that are realistic and
    expressive.


    Information Retrieval

    Ranking with social cues: Integrating online review scores and popularity information

    Pantelis P. Analytis, Alexia Delfino, Juliane Kämmer, Mehdi Moussaïd, Thorsten Joachims
    Comments: 4 pages, 3 figures, ICWSM
    Subjects: Information Retrieval (cs.IR)

    Online marketplaces, search engines, and databases employ aggregated social
    information to rank their content for users. Two ranking heuristics commonly
    implemented to order the available options are the average review score and
    item popularity, that is, the number of users who have experienced an item.
    These rules, although easy to implement, only partly reflect actual user
    preferences, as people may assign values to both average scores and popularity
    and trade off between the two. How do people integrate these two pieces of
    social information when making choices? We present two experiments in which we
    asked participants to choose 200 times among options drawn directly from two
    widely used online venues: Amazon and IMDb. The only information presented to
    participants was the average score and the number of reviews, which served as a
    proxy for popularity. We found that most people are willing to settle for items
    with somewhat lower average scores if they are more popular. Yet, our study
    uncovered substantial diversity of preferences among participants, which
    indicates a sizable potential for personalizing ranking schemes that rely on
    social information.


    Computation and Language

    MIT at SemEval-2017 Task 10: Relation Extraction with Convolutional Neural Networks

    Ji Young Lee, Franck Dernoncourt, Peter Szolovits
    Comments: Accepted at SemEval 2017. The first two authors contributed equally to this work
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Over 50 million scholarly articles have been published: they constitute a
    unique repository of knowledge. In particular, one may infer from them
    relations between scientific concepts, such as synonyms and hyponyms.
    Artificial neural networks have been recently explored for relation extraction.
    In this work, we continue this line of work and present a system based on a
    convolutional neural network to extract relations. Our model ranked first in
    the SemEval-2017 task 10 (ScienceIE) for relation extraction in scientific
    articles (subtask C).

    Linear Ensembles of Word Embedding Models

    Avo Muromägi, Kairit Sirts, Sven Laur
    Comments: Nodalida 2017
    Subjects: Computation and Language (cs.CL)

    This paper explores linear methods for combining several word embedding
    models into an ensemble. We construct the combined models using an iterative
    method based on either ordinary least squares regression or the solution to the
    orthogonal Procrustes problem.

    We evaluate the proposed approaches on Estonian—a morphologically complex
    language, for which the available corpora for training word embeddings are
    relatively small. We compare both combined models with each other and with the
    input word embedding models using synonym and analogy tests. The results show
    that while using the ordinary least squares regression performs poorly in our
    experiments, using orthogonal Procrustes to combine several word embedding
    models into an ensemble model leads to 7-10% relative improvements over the
    mean result of the initial models in synonym tests and 19-47% in analogy tests.
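
    The orthogonal Procrustes step named in the abstract has a closed-form
    solution via the singular value decomposition. The sketch below shows only
    that single alignment step on synthetic data; it does not reproduce the
    paper's iterative ensemble procedure, and the function name align is
    hypothetical.

```python
import numpy as np


def align(source, target):
    """Return the orthogonal W minimizing ||source @ W - target||_F.

    Standard solution: if source.T @ target = U S V^T, then W = U V^T.
    """
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


rng = np.random.default_rng(0)
embeddings_a = rng.normal(size=(50, 3))              # synthetic "model A" vectors
rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # a random orthogonal map
embeddings_b = embeddings_a @ rotation               # "model B" = rotated A
w = align(embeddings_a, embeddings_b)
```

    Because embeddings_b is an exact orthogonal transform of embeddings_a here,
    w recovers that transform; with real embedding models the fit is only
    approximate.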

    CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection Methods for Semantic Textual Similarity

    Jeremy Ferrero, Frederic Agnes, Laurent Besacier, Didier Schwab
    Subjects: Computation and Language (cs.CL)

    We present our submitted systems for Semantic Textual Similarity (STS) Track
    4 at SemEval-2017. Given a pair of Spanish-English sentences, each system must
    estimate their semantic similarity by a score between 0 and 5. In our
    submission, we use syntax-based, dictionary-based, context-based, and MT-based
    methods. We also combine these methods in unsupervised and supervised ways. Our
    best run ranked 1st on track 4a with a correlation of 83.02% with human
    annotations.

    Character-based Joint Segmentation and POS Tagging for Chinese using Bidirectional RNN-CRF

    Yan Shao, Christian Hardmeier, Jörg Tiedemann, Joakim Nivre
    Comments: 8 pages plus 2 pages references and 1 page appendix, 3 figures, submitted to EMNLP 2017
    Subjects: Computation and Language (cs.CL)

    We present a character-based model for joint segmentation and POS tagging for
    Chinese. The bidirectional RNN-CRF architecture for general sequence tagging is
    adapted and applied with novel vector representations of Chinese characters
    that capture rich contextual information and lower-than-character level
    features. The proposed model is extensively evaluated and compared with a
    state-of-the-art tagger on CTB5, CTB9, and UD Chinese. The
    experimental results indicate that our model is accurate and robust across
    datasets of different sizes, genres, and annotation schemes. We obtain
    state-of-the-art performance on CTB5, achieving a 94.38 F1-score for joint
    segmentation and POS tagging.

    Learning to Generate Reviews and Discovering Sentiment

    Alec Radford, Rafal Jozefowicz, Ilya Sutskever
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    We explore the properties of byte-level recurrent language models. When given
    sufficient amounts of capacity, training data, and compute time, the
    representations learned by these models include disentangled features
    corresponding to high-level concepts. Specifically, we find a single unit which
    performs sentiment analysis. These representations, learned in an unsupervised
    manner, achieve state of the art on the binary subset of the Stanford Sentiment
    Treebank. They are also very data efficient. When using only a handful of
    labeled examples, our approach matches the performance of strong baselines
    trained on full datasets. We also demonstrate the sentiment unit has a direct
    influence on the generative process of the model. Simply fixing its value to be
    positive or negative generates samples with the corresponding positive or
    negative sentiment.


    Distributed, Parallel, and Cluster Computing

    Design and Analysis of a Task-based Parallelization over a Runtime System of an Explicit Finite-Volume CFD Code with Adaptive Time Stepping

    Jean Marie Couteyen Carpaye, Jean Roman, Pierre Brenner
    Comments: Preprint of an accepted paper in Journal of Computational Science
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    FLUSEPA (Registered trademark in France No. 134009261) is an advanced
    simulation tool which performs a large panel of aerodynamic studies. It is the
    unstructured finite-volume solver developed by Airbus Safran Launchers company
    to calculate compressible, multidimensional, unsteady, viscous and reactive
    flows around bodies in relative motion. The time integration in FLUSEPA is done
    using an explicit temporal adaptive method. The current production version of
    the code is based on MPI and OpenMP. This implementation leads to important
    synchronizations that must be reduced. To tackle this problem, we present the
    study of a task-based parallelization of the aerodynamic solver of FLUSEPA
    using the runtime system StarPU and combining up to three levels of
    parallelism. We validate our solution by the simulation (using a finite-volume
    mesh with 80 million cells) of take-off blast wave propagation for the Ariane 5
    launcher.

    0.5 Petabyte Simulation of a 45-Qubit Quantum Circuit

    Thomas Häner, Damian S. Steiger
    Subjects: Quantum Physics (quant-ph); Distributed, Parallel, and Cluster Computing (cs.DC); Emerging Technologies (cs.ET)

    Near-term quantum computers will soon reach sizes that are challenging to
    directly simulate, even when employing the most powerful supercomputers. Yet,
    the ability to simulate these early devices using classical computers is
    crucial for calibration, validation, and benchmarking. In order to make use of
    the full potential of systems featuring multi- and many-core processors, we use
    automatic code generation and optimization of compute kernels, which also
    enables performance portability. We apply a scheduling algorithm to quantum
    supremacy circuits in order to reduce the required communication and simulate a
    45-qubit circuit on the Cori II supercomputer using 8,192 nodes and 0.5
    petabytes of memory. To our knowledge, this constitutes the largest quantum
    circuit simulation to this date. Our highly-tuned kernels in combination with
    the reduced communication requirements allow an improvement in time-to-solution
    over state-of-the-art simulators by more than an order of magnitude at every
    scale.


    Learning

    Deep Learning and Quantum Physics: A Fundamental Bridge

    Yoav Levine, David Yakira, Nadav Cohen, Amnon Shashua
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Quantum Physics (quant-ph)

    Deep convolutional networks have witnessed unprecedented success in various
    machine learning applications. Formal understanding of what makes these
    networks so successful is gradually unfolding, but for the most part there are
    still significant mysteries to unravel. The inductive bias, which reflects
    prior knowledge embedded in the network architecture, is one of them. In this
    work, we establish a fundamental connection between the fields of quantum
    physics and deep learning. We use this connection for asserting novel
    theoretical observations regarding the role that the number of channels in each
    layer of the convolutional network fulfills in the overall inductive bias.
    Specifically, we show an equivalence between the function realized by a deep
    convolutional arithmetic circuit (ConvAC) and a quantum many-body wave
    function, which relies on their common underlying tensorial structure. This
    facilitates the use of quantum entanglement measures as well-defined
    quantifiers of a deep network’s expressive ability to model intricate
    correlation structures of its inputs. Most importantly, the construction of a
    deep ConvAC in terms of a Tensor Network is made available. This description
    enables us to carry out a graph-theoretic analysis of a convolutional network, with
    which we demonstrate a direct control over the inductive bias of the deep
    network via its channel numbers, which are related to the min-cut in the underlying
    graph. This result is relevant to any practitioner designing a convolutional
    network for a specific task. We theoretically analyze ConvACs, and empirically
    validate our findings on more common ConvNets which involve ReLU activations
    and max pooling. Beyond the results described above, the description of a deep
    convolutional network in well-defined graph-theoretic tools and the formal
    connection to quantum entanglement, are two interdisciplinary bridges that are
    brought forth by this work.

    Learning to Generate Reviews and Discovering Sentiment

    Alec Radford, Rafal Jozefowicz, Ilya Sutskever
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    We explore the properties of byte-level recurrent language models. When given
    sufficient amounts of capacity, training data, and compute time, the
    representations learned by these models include disentangled features
    corresponding to high-level concepts. Specifically, we find a single unit which
    performs sentiment analysis. These representations, learned in an unsupervised
    manner, achieve state of the art on the binary subset of the Stanford Sentiment
    Treebank. They are also very data efficient. When using only a handful of
    labeled examples, our approach matches the performance of strong baselines
    trained on full datasets. We also demonstrate the sentiment unit has a direct
    influence on the generative process of the model. Simply fixing its value to be
    positive or negative generates samples with the corresponding positive or
    negative sentiment.

    AMIDST: a Java Toolbox for Scalable Probabilistic Machine Learning

    Andrés R. Masegosa, Ana M. Martínez, Darío Ramos-López, Rafael Cabañas, Antonio Salmerón, Thomas D. Nielsen, Helge Langseth, Anders L. Madsen
    Subjects: Learning (cs.LG)

    The AMIDST Toolbox is software for scalable probabilistic machine learning
    with a special focus on (massive) streaming data. The toolbox supports a
    flexible modeling language based on probabilistic graphical models with latent
    variables and temporal dependencies. The specified models can be learnt from
    large data sets using parallel or distributed implementations of Bayesian
    learning algorithms for either streaming or batch data. These algorithms are
    based on a flexible variational message passing scheme, which supports discrete
    and continuous variables from a wide range of probability distributions.
    AMIDST also leverages existing functionality and algorithms by interfacing to
    software tools such as Flink, Spark, MOA, Weka, R and HUGIN. AMIDST is an open
    source toolbox written in Java and available at this http URL
    under the Apache Software License version 2.0.

    Multi-Label Learning with Global and Local Label Correlation

    Yue Zhu, James T. Kwok, Zhi-Hua Zhou
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    It is well-known that exploiting label correlations is important to
    multi-label learning. Existing approaches either assume that the label
    correlations are global and shared by all instances; or that the label
    correlations are local and shared only by a data subset. In
    real-world applications, however, both cases may occur: some label correlations are
    globally applicable, while others are shared only within a local group of instances.
    Moreover, it is common that only partial labels are observed, which
    makes the exploitation of the label correlations much more difficult. That is,
    it is hard to estimate the label correlations when many labels are absent. In
    this paper, we propose a new multi-label approach GLOCAL dealing with both the
    full-label and the missing-label cases, exploiting global and local label
    correlations simultaneously, through learning a latent label representation and
    optimizing label manifolds. The extensive experimental studies validate the
    effectiveness of our approach on both full-label and missing-label data.

    Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders

    Jesse Engel, Cinjon Resnick, Adam Roberts, Sander Dieleman, Douglas Eck, Karen Simonyan, Mohammad Norouzi
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD)

    Generative models in vision have seen rapid progress due to algorithmic
    improvements and the availability of high-quality image datasets. In this
    paper, we offer contributions in both these areas to enable similar progress in
    audio modeling. First, we detail a powerful new WaveNet-style autoencoder model
    that conditions an autoregressive decoder on temporal codes learned from the
    raw audio waveform. Second, we introduce NSynth, a large-scale and high-quality
    dataset of musical notes that is an order of magnitude larger than comparable
    public datasets. Using NSynth, we demonstrate improved qualitative and
    quantitative performance of the WaveNet autoencoder over a well-tuned spectral
    autoencoder baseline. Finally, we show that the model learns a manifold of
    embeddings that allows for morphing between instruments, meaningfully
    interpolating in timbre to create new types of sounds that are realistic and
    expressive.

    Linear Additive Markov Processes

    Ravi Kumar, Maithra Raghu, Tamas Sarlos, Andrew Tomkins
    Comments: Accepted to WWW 2017
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We introduce LAMP: the Linear Additive Markov Process. Transitions in LAMP
    may be influenced by states visited in the distant history of the process, but
    unlike higher-order Markov processes, LAMP retains an efficient
    parametrization. LAMP also allows the specific dependence on history to be
    learned efficiently from data. We characterize some theoretical properties of
    LAMP, including its steady-state and mixing time. We then give an algorithm
    based on alternating minimization to learn LAMP models from data. Finally, we
    perform a series of real-world experiments to show that LAMP is more powerful
    than first-order Markov processes, and even holds its own against deep
    sequential models (LSTMs) with a negligible increase in parameter complexity.
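
    One way to picture the efficient parametrization is sketched below. It assumes (this is not stated in the abstract above) that the model consists of a single first-order transition matrix `P` plus a weight vector `w` over lags: the next state is sampled by first picking a lag from `w` and then applying `P` to the state visited that many steps ago. The names `lamp_step`, `P` and `w` are hypothetical.

```python
import random

def lamp_step(history, P, w, rng):
    """Sample the next state from a past state chosen according to the lag weights."""
    # Early in the sequence only len(history) lags exist; use the available ones.
    k = min(len(history), len(w))
    lag = rng.choices(range(1, k + 1), weights=w[:k])[0]
    source = history[-lag]                       # state visited `lag` steps ago
    return rng.choices(range(len(P)), weights=P[source])[0]

# Toy example (assumed values): 3 states, dependence on the last 3 steps.
P = [[0.1, 0.8, 0.1],
     [0.1, 0.1, 0.8],
     [0.8, 0.1, 0.1]]
w = [0.6, 0.3, 0.1]

rng = random.Random(0)
history = [0]
for _ in range(10):
    history.append(lamp_step(history, P, w, rng))
print(history)
```

    Under this assumption the parameter count is n*n + k for n states and k lags, whereas a full order-k Markov chain needs on the order of n**(k+1) transition probabilities.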

    Neural Message Passing for Quantum Chemistry

    Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, George E. Dahl
    Comments: 13 pages
    Subjects: Learning (cs.LG)

    Supervised learning on molecules has incredible potential to be useful in
    chemistry, drug discovery, and materials science. Luckily, several promising
    and closely related neural network models invariant to molecular symmetries
    have already been described in the literature. These models learn a message
    passing algorithm and aggregation function to compute a function of their
    entire input graph. At this point, the next step is to find a particularly
    effective variant of this general approach and apply it to chemical prediction
    benchmarks until we either solve them or reach the limits of the approach. In
    this paper, we reformulate existing models into a single common framework we
    call Message Passing Neural Networks (MPNNs) and explore additional novel
    variations within this framework. Using MPNNs we demonstrate state of the art
    results on an important molecular property prediction benchmark, results we
    believe are strong enough to justify retiring this benchmark.
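
    The general pattern described above (pass messages along edges, update node states, then aggregate over the whole graph) can be sketched as follows. The linear message function, the tanh update and the sum readout are illustrative assumptions, not the variants evaluated in the paper, and all names are hypothetical.

```python
import numpy as np

def message_passing_step(h, adj, W_msg, W_upd):
    """One round: every node sums messages from its neighbors, then updates."""
    messages = adj @ (h @ W_msg)            # sum of linear messages over neighbors
    return np.tanh(h @ W_upd + messages)    # simple update function

def embed(h, adj, W_msg, W_upd, steps=2):
    """Run a few rounds, then aggregate node states with an order-independent sum."""
    for _ in range(steps):
        h = message_passing_step(h, adj, W_msg, W_upd)
    return h.sum(axis=0)                    # readout over the entire graph

# Toy "molecule" (assumed): a 3-atom chain with 4 features per atom.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
h0 = rng.normal(size=(3, 4))
W_msg = rng.normal(size=(4, 4))
W_upd = rng.normal(size=(4, 4))

print(embed(h0, adj, W_msg, W_upd))
```

    Summing over neighbors and over nodes makes the output independent of how the atoms are numbered, which is the kind of symmetry invariance the abstract refers to.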

    On the Unreported-Profile-is-Negative Assumption for Predictive Cheminformatics

    Chao Lan, Sai Nivedita Chandrasekaran, Jun Huan
    Comments: 11 pages, 4 figures
    Subjects: Learning (cs.LG); Chemical Physics (physics.chem-ph); Machine Learning (stat.ML)

    The study of compound-target binding profiles has been a central theme in
    cheminformatics. For data repositories that only provide positive binding
    profiles, a popular assumption is that all unreported profiles are negative. In
    this paper, we caution readers not to take such assumptions for granted. Under
    a problem setting where binding profiles are used as features to train
    predictive models, we present empirical evidence that (1) predictive
    performance degrades when the assumption fails and (2) explicit recovery of
    unreported profiles improves predictive performance. In particular, we propose
    a joint framework of profile recovery and supervised learning, which shows
    further performance improvement. Our study not only calls for more careful
    treatment of unreported profiles in cheminformatics, but also initiates a new
    machine learning problem, which we call Learning with Positive and Unknown
    Features.

    Comment on "Biologically inspired protection of deep networks from adversarial attacks"

    Wieland Brendel, Matthias Bethge
    Comments: 4 pages, 3 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neurons and Cognition (q-bio.NC)

    A recent paper suggests that Deep Neural Networks can be protected from
    gradient-based adversarial perturbations by driving the network activations
    into a highly saturated regime. Here we analyse such saturated networks and
    show that the attacks fail due to numerical limitations in the gradient
    computations. A simple stabilisation of the gradient estimates enables
    successful and efficient attacks. Thus, it has yet to be shown that the
    robustness observed in highly saturated networks is not simply due to numerical
    limitations.

    Convolutional Neural Networks for Page Segmentation of Historical Document Images

    Kai Chen, Mathias Seuret
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    This paper presents a Convolutional Neural Network (CNN) based page
    segmentation method for handwritten historical document images. We consider
    page segmentation as a pixel labeling problem, i.e., each pixel is classified
    as one of the predefined classes. Traditional methods in this area rely on
    carefully hand-crafted features or large amounts of prior knowledge. In
    contrast, we propose to learn features from raw image pixels using a CNN. While
    many researchers focus on developing deep CNN architectures to solve different
    problems, we train a simple CNN with only one convolution layer. We show that
    the simple architecture achieves competitive results against other deep
    architectures on different public datasets. Experiments also demonstrate the
    effectiveness and superiority of the proposed method compared to previous
    methods.

    Automatic Breast Ultrasound Image Segmentation: A Survey

    Min Xian, Yingtao Zhang, H.D. Cheng, Fei Xu, Boyu Zhang, Jianrui Ding
    Comments: 71 pages, 6 tables, 166 references
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Breast cancer is one of the leading causes of cancer death among women
    worldwide. In clinical routine, automatic breast ultrasound (BUS) image
    segmentation is very challenging and essential for cancer diagnosis and
    treatment planning. Many BUS segmentation approaches have been studied in the
    last two decades, and have been proved to be effective on private datasets.
    Currently, the advancement of BUS image segmentation seems to have hit a
    bottleneck. Improving performance is increasingly challenging, and
    only a few new approaches have been published in the last several years. It is
    time to look at the field by reviewing previous approaches comprehensively and
    to investigate the future directions. In this paper, we study the basic ideas,
    theories, pros and cons of the approaches, group them into categories, and
    extensively review each category in depth by discussing the principles,
    application issues, and advantages/disadvantages.

    Comparison Based Nearest Neighbor Search

    Siavash Haghiri, Debarghya Ghoshdastidar, Ulrike von Luxburg
    Comments: 16 Pages, 3 Figures
    Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Learning (cs.LG)

    We consider machine learning in a comparison-based setting where we are given
    a set of points in a metric space, but we have no access to the actual
    distances between the points. Instead, we can only ask an oracle whether the
    distance between two points \(i\) and \(j\) is smaller than the distance between
    the points \(i\) and \(k\). We are concerned with data structures and algorithms to
    find nearest neighbors based on such comparisons. We focus on a simple yet
    effective algorithm that recursively splits the space by first selecting two
    random pivot points and then assigning all other points to the closer of the
    two (comparison tree). We prove that if the metric space satisfies certain
    expansion conditions, then with high probability the height of the comparison
    tree is logarithmic in the number of points, leading to efficient search
    performance. We also provide an upper bound for the failure probability to
    return the true nearest neighbor. Experiments show that the comparison tree is
    competitive with algorithms that have access to the actual distance values, and
    needs fewer triplet comparisons than other competitors.
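
    The recursive splitting described above can be sketched as follows. The oracle here is simulated from hidden coordinates purely for the demo; the leaf size, the dictionary-based tree layout and all function names are assumptions, and the search returns a candidate from one leaf, which is not guaranteed to be the true nearest neighbor.

```python
import random

def make_oracle(points):
    """Hypothetical triplet oracle: answers only 'is d(i, j) < d(i, k)?'."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    def closer(i, j, k):
        return dist2(points[i], points[j]) < dist2(points[i], points[k])
    return closer

def build_tree(indices, closer, rng, leaf_size=4):
    """Split by two random pivots; each point goes to the closer pivot."""
    if len(indices) <= leaf_size:
        return {"leaf": indices}
    p, q = rng.sample(indices, 2)
    left, right = [p], [q]
    for i in indices:
        if i != p and i != q:
            (left if closer(i, p, q) else right).append(i)
    return {"pivots": (p, q),
            "left": build_tree(left, closer, rng, leaf_size),
            "right": build_tree(right, closer, rng, leaf_size)}

def search(tree, query, closer):
    """Descend toward the closer pivot, then scan the leaf with comparisons only."""
    node = tree
    while "leaf" not in node:
        p, q = node["pivots"]
        node = node["left"] if closer(query, p, q) else node["right"]
    best = node["leaf"][0]
    for c in node["leaf"][1:]:
        if closer(query, c, best):
            best = c
    return best

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(201)]  # index 200 is the query
closer = make_oracle(points)
tree = build_tree(list(range(200)), closer, rng)
print(search(tree, 200, closer))
```

    Each level of the descent costs a single oracle query, so the number of comparisons per search is the tree height plus the leaf size.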

    OEC: Open-Ended Classification for Future-Proof Link-Fraud Detection

    Neil Shah, Hemank Lamba, Alex Beutel, Christos Faloutsos
    Subjects: Social and Information Networks (cs.SI); Learning (cs.LG)

    When tasked to find fraudulent social network users, what is a practitioner
    to do? Traditional classification can lead to poor generalization and high
    misclassification given few and possibly biased labels. We tackle this problem
    by analyzing fraudulent behavioral patterns, featurizing users to yield strong
    discriminative performance, and building algorithms to handle new and
    multimodal fraud types. First, we set up honeypots, or “dummy” social network
    accounts on which we solicit fake followers (after careful IRB approval). We
    report the signs of such behaviors, including oddities in local network
    connectivity, account attributes, and similarities and differences across fraud
    providers. We discover several types of fraud behaviors, with the possibility
    of even more. We discuss how to leverage these insights in practice, build
    strongly performing entropy-based features, and propose OEC (Open-ended
    Classification), an approach for “future-proofing” existing algorithms to
    account for the complexities of link fraud. Our contributions are (a)
    observations: we analyze our honeypot fraudster ecosystem and give insights
    regarding various fraud behaviors, (b) features: we engineer features which
    give exceptionally strong (>0.95 precision/recall) discriminative power on
    ground-truth data, and (c) algorithm: we motivate and discuss OEC, which
    reduces misclassification rate by >18% over baselines and routes practitioner
    attention to samples at high risk of misclassification.

    Embodied Artificial Intelligence through Distributed Adaptive Control: An Integrated Framework

    Clément Moulin-Frier, Jordi-Ysard Puigbò, Xerxes D. Arsiwalla, Martì Sanchez-Fibla, Paul F. M. J. Verschure
    Comments: Paper submitted to the ICDL-Epirob 2017 conference
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Multiagent Systems (cs.MA)

    In this paper, we argue that the future of Artificial Intelligence research
    resides in two keywords: integration and embodiment. We support this claim by
    analyzing the recent advances of the field. Regarding integration, we note that
    the most impactful recent contributions have been made possible through the
    integration of recent Machine Learning methods (based in particular on Deep
    Learning and Recurrent Neural Networks) with more traditional ones (e.g.
    Monte-Carlo tree search, goal babbling exploration or addressable memory
    systems). Regarding embodiment, we note that the traditional benchmark tasks
    (e.g. visual classification or board games) are becoming obsolete as
    state-of-the-art learning algorithms approach or even surpass human performance
    in most of them, having recently encouraged the development of first-person 3D
    game platforms embedding realistic physics. Building upon this analysis, we
    first propose an embodied cognitive architecture integrating heterogeneous
    sub-fields of Artificial Intelligence into a unified framework. We demonstrate
    the utility of our approach by showing how major contributions of the field can
    be expressed within the proposed framework. We then claim that benchmarking
    environments need to reproduce ecologically-valid conditions for bootstrapping
    the acquisition of increasingly complex cognitive skills through the concept of
    a cognitive arms race between embodied agents.

    Not All Pixels Are Equal: Difficulty-aware Semantic Segmentation via Deep Layer Cascade

    Xiaoxiao Li, Ziwei Liu, Ping Luo, Chen Change Loy, Xiaoou Tang
    Comments: To appear in CVPR 2017 as a spotlight paper
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We propose a novel deep layer cascade (LC) method to improve the accuracy and
    speed of semantic segmentation. Unlike the conventional model cascade (MC) that
    is composed of multiple independent models, LC treats a single deep model as a
    cascade of several sub-models. Earlier sub-models are trained to handle easy
    and confident regions, and they progressively feed-forward harder regions to
    the next sub-model for processing. Convolutions are only calculated on these
    regions to reduce computations. The proposed method possesses several
    advantages. First, LC classifies most of the easy regions in the shallow stage
    and lets the deeper stages focus on a few hard regions. Such adaptive and
    ‘difficulty-aware’ learning improves segmentation performance. Second, LC
    accelerates both training and testing of the deep network thanks to early decisions
    in the shallow stage. Third, in comparison to MC, LC is an end-to-end trainable
    framework, allowing joint learning of all sub-models. We evaluate our method on
    PASCAL VOC and Cityscapes datasets, achieving state-of-the-art performance and
    fast speed.

    On Generalization and Regularization in Deep Learning

    Pirmin Lemberger
    Comments: 11 pages, 3 figures pedagogical paper
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Statistics Theory (math.ST)

    Why do large neural networks generalize so well on complex tasks such as image
    classification or speech recognition? What exactly is the role of regularization
    for them? These are arguably among the most important open questions in machine
    learning today. In a recent and thought-provoking paper [C. Zhang et al.],
    several authors performed a number of numerical experiments that hint at the
    need for novel theoretical concepts to account for this phenomenon. The paper
    stirred quite a lot of excitement among the machine learning community, but at
    the same time it created some confusion, as discussions on OpenReview.net
    testify. The aim of this pedagogical paper is to make this debate accessible
    to a wider audience of data scientists without advanced theoretical knowledge
    in statistical learning. The focus here is on explicit mathematical definitions
    and on a discussion of relevant concepts, not on proofs for which we provide
    references.

    Revisiting the problem of audio-based hit song prediction using convolutional neural networks

    Li-Chia Yang, Szu-Yu Chou, Jen-Yu Liu, Yi-Hsuan Yang, Yi-An Chen
    Comments: To appear in the proceedings of 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    Subjects: Sound (cs.SD); Learning (cs.LG); Machine Learning (stat.ML)

    Being able to predict whether a song can be a hit has important
    applications in the music industry. Although it is true that the popularity of
    a song can be greatly affected by external factors such as social and
    commercial influences, the degree to which audio features computed from musical
    signals (which we regard as internal factors) can predict song popularity is an
    interesting research question on its own. Motivated by the recent success of
    deep learning techniques, we attempt to extend previous work on hit song
    prediction by jointly learning the audio features and prediction models using
    deep learning. Specifically, we experiment with a convolutional neural network
    model that takes the primitive mel-spectrogram as the input for feature
    learning, a more advanced JYnet model that uses an external song dataset for
    supervised pre-training and auto-tagging, and the combination of these two
    models. We also consider the inception model to characterize audio information
    at different scales. Our experiments suggest that deep structures are
    indeed more accurate than shallow structures in predicting the popularity of
    either Chinese or Western Pop songs in Taiwan. We also use the tags predicted
    by JYnet to gain insights into the result of different models.

    Geometry of Factored Nuclear Norm Regularization

    Qiuwei Li, Zhihui Zhu, Gongguo Tang
    Subjects: Numerical Analysis (cs.NA); Learning (cs.LG)

    This work investigates the geometry of a nonconvex reformulation of
    minimizing a general convex loss function \(f(X)\) regularized by the matrix
    nuclear norm \(\|X\|_*\). Nuclear-norm regularized matrix inverse problems are at
    the heart of many applications in machine learning, signal processing, and
    control. The statistical performance of nuclear norm regularization has been
    studied extensively in the literature using convex analysis techniques. Despite its
    optimal performance, the resulting optimization has high computational
    complexity when solved using standard or even tailored fast convex solvers. To
    develop faster and more scalable algorithms, we follow the proposal of
    Burer-Monteiro to factor the matrix variable \(X\) into the product of two
    smaller rectangular matrices \(X=UV^T\) and also replace the nuclear norm
    \(\|X\|_*\) with \((\|U\|_F^2+\|V\|_F^2)/2\). In spite of the nonconvexity of the
    factored formulation, we prove that when the convex loss function \(f(X)\) is
    \((2r,4r)\)-restricted well-conditioned, each critical point of the factored
    problem either corresponds to the optimal solution \(X^\star\) of the original
    convex optimization or is a strict saddle point where the Hessian matrix has a
    strictly negative eigenvalue. Such a geometric structure of the factored
    formulation allows many local search algorithms to converge to the global
    optimum with random initializations.
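
    The replacement of the nuclear norm by the factored term rests on a standard identity: the nuclear norm equals the minimum of half the sum of squared Frobenius norms over all factorizations of the matrix. The short numerical check below illustrates this on a random low-rank matrix; the sizes and variable names are assumptions for the demo, and it does not touch the paper's critical-point analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3)) @ rng.normal(size=(3, 5))   # a rank-3 matrix

# Nuclear norm = sum of singular values.
L, s, Rt = np.linalg.svd(X, full_matrices=False)
nuclear = s.sum()

# Balanced factorization U = L sqrt(S), V = R sqrt(S), so that X = U V^T.
U = L * np.sqrt(s)
V = Rt.T * np.sqrt(s)
balanced = 0.5 * (np.linalg.norm(U, "fro") ** 2 + np.linalg.norm(V, "fro") ** 2)

# Rescaling the factors keeps X = U V^T but makes the factored term larger.
U2, V2 = 2.0 * U, 0.5 * V
unbalanced = 0.5 * (np.linalg.norm(U2, "fro") ** 2 + np.linalg.norm(V2, "fro") ** 2)

print(nuclear, balanced, unbalanced)
```

    The balanced factorization attains the nuclear norm, while the rescaled one only bounds it from above, which is why the factored term can stand in for the regularizer during minimization.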

    Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks

    Weilin Xu, David Evans, Yanjun Qi
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Learning (cs.LG)

    Although deep neural networks (DNNs) have achieved great success in many
    computer vision tasks, recent studies have shown they are vulnerable to
    adversarial examples. Such examples, typically generated by adding small but
    purposeful distortions, can frequently fool DNN models. Previous studies to
    defend against adversarial examples mostly focused on refining the DNN models.
    They have either shown limited success or suffer from expensive
    computation. We propose a new strategy, feature squeezing, that can be
    used to harden DNN models by detecting adversarial examples. Feature squeezing
    reduces the search space available to an adversary by coalescing samples that
    correspond to many different feature vectors in the original space into a
    single sample. By comparing a DNN model’s prediction on the original input with
    that on the squeezed input, feature squeezing detects adversarial examples with
    high accuracy and few false positives. This paper explores two instances of
    feature squeezing: reducing the color bit depth of each pixel and smoothing
    using a spatial filter. These strategies are straightforward, inexpensive, and
    complementary to defensive methods that operate on the underlying model, such
    as adversarial training.
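
    The detection idea (compare the model's prediction on the original input with its prediction on squeezed inputs) can be sketched as follows. The median filter as the spatial smoother, the L1 distance between prediction vectors, and the stand-in `toy_predict` model are all assumptions made for illustration; a real detector would use a trained DNN and a threshold chosen on validation data.

```python
import numpy as np

def reduce_bit_depth(x, bits):
    """Quantize pixel values in [0, 1] to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def median_smooth(x, size=3):
    """Median filter over a size-by-size window on a 2-D image (edge-padded)."""
    pad = size // 2
    padded = np.pad(x, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1))

def squeeze_score(predict, x, squeezers):
    """Largest L1 gap between the prediction on x and on each squeezed x."""
    p = predict(x)
    return max(float(np.abs(p - predict(s(x))).sum()) for s in squeezers)

def toy_predict(x):
    """Hypothetical stand-in for a DNN: two-class softmax on mean brightness."""
    z = 10.0 * np.array([x.mean(), 1.0 - x.mean()])
    e = np.exp(z - z.max())
    return e / e.sum()

# A flat image with one isolated bright pixel, removed by the median filter.
img = np.zeros((5, 5))
img[2, 2] = 1.0
squeezers = [lambda x: reduce_bit_depth(x, 1), median_smooth]
score = squeeze_score(toy_predict, img, squeezers)
print(score)   # an input would be flagged when this exceeds a chosen threshold
```

    Legitimate inputs are expected to yield similar predictions before and after squeezing, so a large score suggests that the prediction depends on fine-grained detail an adversary may have introduced.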

    DyVEDeep: Dynamic Variable Effort Deep Neural Networks

    Sanjay Ganapathy, Swagath Venkataramani, Balaraman Ravindran, Anand Raghunathan
    Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Deep Neural Networks (DNNs) have advanced the state-of-the-art in a variety
    of machine learning tasks and are deployed in increasing numbers of products
    and services. However, the computational requirements of training and
    evaluating large-scale DNNs are growing at a much faster pace than the
    capabilities of the underlying hardware platforms that they are executed upon.
    In this work, we propose Dynamic Variable Effort Deep Neural Networks
    (DyVEDeep) to reduce the computational requirements of DNNs during inference.
    Previous efforts propose specialized hardware implementations for DNNs,
    statically prune the network, or compress the weights. Complementary to these
    approaches, DyVEDeep is a dynamic approach that exploits the heterogeneity in
    the inputs to DNNs to improve their compute efficiency with comparable
    classification accuracy. DyVEDeep equips DNNs with dynamic effort mechanisms
    that, in the course of processing an input, identify how critical a group of
    computations are to classify the input. DyVEDeep dynamically focuses its
    compute effort only on the critical computations, while skipping or
    approximating the rest. We propose 3 effort knobs that operate at different
    levels of granularity viz. neuron, feature and layer levels. We build DyVEDeep
    versions for 5 popular image recognition benchmarks – one for CIFAR-10 and four
    for ImageNet (AlexNet, OverFeat and VGG-16, weight-compressed AlexNet). Across
    all benchmarks, DyVEDeep achieves 2.1x-2.6x reduction in the number of scalar
    operations, which translates to 1.8x-2.3x performance improvement over a
    Caffe-based implementation, with < 0.5% loss in accuracy.

    Satellite Image-based Localization via Learned Embeddings

    Dong-Ki Kim, Matthew R. Walter
    Comments: To be published in IEEE International Conference on Robotics and Automation (ICRA), 2017
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We propose a vision-based method that localizes a ground vehicle using
    publicly available satellite imagery as the only prior knowledge of the
    environment. Our approach takes as input a sequence of ground-level images
    acquired by the vehicle as it navigates, and outputs an estimate of the
    vehicle’s pose relative to a georeferenced satellite image. We overcome the
    significant viewpoint and appearance variations between the images through a
    neural multi-view model that learns location-discriminative embeddings in which
    ground-level images are matched with their corresponding satellite view of the
    scene. We use this learned function as an observation model in a filtering
    framework to maintain a distribution over the vehicle’s pose. We evaluate our
    method on different benchmark datasets and demonstrate its ability to localize
    ground-level images in environments novel relative to training, despite the
    challenges of significant viewpoint and appearance variations.


    Information Theory

    Distributed Hypothesis Testing Over Noisy Channels

    Sreejith Sreekumar, Deniz Gündüz
    Subjects: Information Theory (cs.IT)

    A distributed binary hypothesis testing problem, in which multiple observers
    transmit their observations to a detector over noisy channels, is studied.
    Given its own side information, the goal of the detector is to decide between
    two hypotheses for the joint distribution of the data. Single-letter upper and
    lower bounds on the optimal type 2 error exponent (T2-EE), when the type 1
    error probability vanishes with the block-length, are obtained. These bounds
    coincide and characterize the optimal T2-EE when only a single helper is
    involved. Our result shows that the optimal T2-EE depends on the marginal
    distributions of the data and the channels rather than their joint
    distribution. However, an operational separation between hypothesis testing (HT) and channel coding
    does not hold, and the optimal T2-EE is achieved by generating channel inputs
    correlated with observed data.

    On application of OMP and CoSaMP algorithms for DOA estimation problem

    Abhishek Aich, P. Palanisamy
    Comments: 5 pages, 4 figures
    Subjects: Information Theory (cs.IT)

    Remarkable properties of compressed sensing (CS) have led researchers to
    utilize it in various other fields where a solution to an underdetermined
    system of linear equations is needed. One such application is in the area of
    array signal processing e.g. in signal denoising and Direction of Arrival (DOA)
    estimation. From the two prominent categories of CS recovery algorithms, namely
    convex optimization algorithms and greedy sparse approximation algorithms, we
    investigate the application of greedy sparse approximation algorithms to
    estimate DOA in the uniform linear array (ULA) environment. We conduct an
    empirical investigation into the behavior of the two state-of-the-art greedy
    algorithms: OMP and CoSaMP. This investigation takes into account the various
    scenarios such as varying degrees of noise level and coherency between the
    sources. We perform simulations to demonstrate the performances of these
    algorithms and give a brief analysis of the results.

    Quasi-cyclic self-dual codes of length 70

    Alexandre Zhdanov
    Subjects: Information Theory (cs.IT)

    In this paper we obtain a number of [70,35,12] singly even self-dual codes as
    quasi-cyclic codes with m=2 (tail-biting convolutional codes). One of them is
    the first known code with parameters Beta=140, Gamma=0. None of the codes is pure
    double circulant, i.e., none can be represented in systematic form.

    Alternating Optimization for Capacity Region of Gaussian MIMO Broadcast Channels with Per-antenna Power Constraint

    Thuy M. Pham, Ronan Farrell, Le-Nam Tran
    Comments: Accepted for publication in VTC2017-Spring conference
    Subjects: Information Theory (cs.IT)

    This paper characterizes the capacity region of Gaussian MIMO broadcast
    channels (BCs) with per-antenna power constraint (PAPC). While the capacity
    region of MIMO BCs with a sum power constraint (SPC) was extensively studied,
    that under PAPC has received less attention. A reason is that efficient
    solutions for this problem are hard to find. The goal of this paper is to
    devise an efficient algorithm for determining the capacity region of Gaussian
    MIMO BCs subject to PAPC, which is scalable to the problem size. To this end,
    we first transform the weighted sum capacity maximization problem, which is
    inherently nonconvex with the input covariance matrices, into a convex
    formulation in the dual multiple access channel by minimax duality. Then we
    derive a computationally efficient algorithm combining the concept of
    alternating optimization and successive convex approximation. The proposed
    algorithm achieves much lower complexity compared to an existing interior-point
    method. Moreover, numerical results demonstrate that the proposed algorithm
    converges very fast under various scenarios.

    Low-complexity Approaches for MIMO Capacity with Per-antenna Power Constraint

    Thuy M. Pham, Ronan Farrell, Le-Nam Tran
    Comments: Accepted for publication in VTCSpring-2017 Conference
    Subjects: Information Theory (cs.IT)

    This paper proposes two low-complexity iterative algorithms to compute the
    capacity of a single-user multiple-input multiple-output channel with
    per-antenna power constraint. The first method results from manipulating the
    optimality conditions of the considered problem and applying fixed-point
    iteration. In the second approach, we transform the considered problem into a
    minimax optimization program using the well-known MAC-BC duality, and then
    solve it by a novel alternating optimization method. In both proposed iterative
    methods, each iteration involves an optimization problem which can be
    efficiently solved by the water-filling algorithm. The proposed iterative
    methods are provably convergent. Complexity analysis and extensive numerical
    experiments are carried out to demonstrate the superior performance of the
    proposed algorithms over an existing approach known as the mode-dropping
    algorithm.
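    For readers unfamiliar with the water-filling step mentioned in this abstract, here is a minimal sketch of the classical version under a single sum power constraint. It is not the authors' per-antenna algorithm; the function name and the example gains are assumptions for illustration. It maximizes the sum of log(1 + g_i p_i) subject to the powers p_i being nonnegative and summing to the budget.

```python
def water_filling(gains, total_power):
    """Classical water-filling over parallel channels with a sum power budget.

    Returns (powers, level), where powers[i] = max(level - 1/gains[i], 0).
    Assumes every gain is positive and total_power > 0.
    """
    inv = [1.0 / g for g in gains]          # "floor heights" 1/g_i
    inv_sorted = sorted(inv)
    level = 0.0
    # Try using the k best channels, from all of them down to one.
    for k in range(len(inv_sorted), 0, -1):
        level = (total_power + sum(inv_sorted[:k])) / k
        if level > inv_sorted[k - 1]:       # all k channels sit below the water level
            break
    powers = [max(level - n, 0.0) for n in inv]
    return powers, level
```

    For example, `water_filling([1.0, 0.5, 0.1], 1.0)` puts the whole budget on the strongest channel, because the water level only just reaches the floor of the second one.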

    Using Convolutional Codes for Key Extraction in Physical Unclonable Functions

    Sven Müelich, Martin Bossert
    Comments: Submitted to “The Tenth International Workshop on Coding and Cryptography 2017”
    Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)

    Physical Unclonable Functions (PUFs) exploit variations in the manufacturing
    process to derive bit sequences from integrated circuits which can be used as
    secure cryptographic keys. Instead of storing the keys, they can be reproduced
    when needed. Since the reproduced sequences are not stable, error correction
    must be applied. Recently, convolutional codes were used for key reproduction.
    This work shows that using soft information at the input of the decoder and
    list decoding reduces the key error probability compared to results from the
    literature.

    Packet Throughput Analysis of Static and Dynamic TDD in Small Cell Networks

    Howard H. Yang, Giovanni Geraci, Yi Zhong, Tony Q. S. Quek
    Subjects: Information Theory (cs.IT)

    We develop an analytical framework for the performance comparison of small
    cell networks operating under static time division duplexing (S-TDD) and
    dynamic TDD (D-TDD). While in S-TDD downlink/uplink (DL/UL) cell transmissions
    are synchronized, in D-TDD each cell dynamically allocates resources to the
    most demanding direction. By leveraging stochastic geometry and queuing
    theory, we derive closed-form expressions for the UL and DL packet throughput,
    also capturing the impact of random traffic arrivals and packet
    retransmissions. Through our analysis, which is validated via simulations, we
    confirm that D-TDD outperforms S-TDD in DL, and vice versa in
    UL, since asymmetric transmissions reduce DL interference at the expense of an
    increased UL interference. We also find that in asymmetric scenarios, where
    most of the traffic is in DL, D-TDD provides a DL packet throughput gain by
    better controlling the queuing delay, and that such gain vanishes in the
    light-traffic regime.

    Proof of a conjecture of Kløve on permutation codes under the Chebychev distance

    Victor J. W. Guo, Yiting Yang
    Comments: 6 pages
    Journal-ref: Des. Codes Cryptogr. 83 (2017), 685-690
    Subjects: Information Theory (cs.IT); Combinatorics (math.CO)

    Let \(d\) be a positive integer and \(x\) a real number. Let \(A_{d,x}\) be a
    \(d\times 2d\) matrix with its entries
    \[ a_{i,j}=\left\{\begin{array}{ll} x & \mbox{for } 1\leqslant j\leqslant d+1-i,\\ 1 & \mbox{for } d+2-i\leqslant j\leqslant d+i,\\ 0 & \mbox{for } d+1+i\leqslant j\leqslant 2d. \end{array}\right. \]
    Further, let \(R_d\) be a set of sequences of integers as follows:
    \[ R_d=\{(\rho_1,\rho_2,\ldots,\rho_d) \mid 1\leqslant \rho_i\leqslant d+i,\ 1\leqslant i\leqslant d,\ \mbox{and } \rho_r\neq\rho_s \mbox{ for } r\neq s\}, \]
    and define
    \[ \Omega_d(x)=\sum_{\rho\in R_d}a_{1,\rho_1}a_{2,\rho_2}\ldots a_{d,\rho_d}. \]
    In order to give a better bound on the size of
    spheres of permutation codes under the Chebychev distance, Kløve introduced
    the above function and conjectured that
    \[ \Omega_d(x)=\sum_{m=0}^d{d\choose m}(m+1)^d(x-1)^{d-m}. \]
    In this paper, we settle this conjecture in the affirmative.
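    The identity stated in this abstract can be checked numerically for small d. The sketch below is only a brute-force sanity check, not the proof from the paper; the function names are assumptions for illustration. It enumerates the sequences in R_d directly from the definition and compares the resulting sum with the closed form at integer values of x.

```python
from itertools import product
from math import comb


def entry(d, x, i, j):
    """Entry a_{i,j} of the d x 2d matrix A_{d,x}, with 1-based i and j."""
    if j <= d + 1 - i:
        return x
    if j <= d + i:
        return 1
    return 0


def omega_brute(d, x):
    """Omega_d(x) from the definition: sum over sequences rho in R_d."""
    ranges = [range(1, d + i + 1) for i in range(1, d + 1)]  # 1 <= rho_i <= d + i
    total = 0
    for rho in product(*ranges):
        if len(set(rho)) < d:  # the rho_i must be pairwise distinct
            continue
        term = 1
        for i, j in enumerate(rho, start=1):
            term *= entry(d, x, i, j)
        total += term
    return total


def omega_closed(d, x):
    """Closed form conjectured by Klove: sum_m C(d, m) (m+1)^d (x-1)^(d-m)."""
    return sum(comb(d, m) * (m + 1) ** d * (x - 1) ** (d - m) for m in range(d + 1))
```

    For d = 2 both functions reduce to the polynomial x^2 + 6x + 2, which can also be verified by hand from the two rows (x, x, 1, 0) and (x, 1, 1, 1).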

    Dynamic Base Station Repositioning to Improve Spectral Efficiency of Drone Small Cells

    Azade Fotouhi, Ming Ding, Mahbub Hassan
    Comments: Accepted at IEEE WoWMoM 2017 – 9 pages, 2 tables, 4 figures
    Subjects: Information Theory (cs.IT)

    With recent advancements in drone technology, researchers are now considering
    the possibility of deploying small cells served by base stations mounted on
    flying drones. A major advantage of such drone small cells is that the
    operators can quickly provide cellular services in areas of urgent demand
    without having to pre-install any infrastructure. Since the base station is
    attached to the drone, technically it is feasible for the base station to
    dynamically reposition itself in response to the changing locations of users for
    reducing the communication distance, decreasing the probability of signal
    blocking, and ultimately increasing the spectral efficiency. In this paper, we
    first propose distributed algorithms for autonomous control of drone movements,
    and then model and analyse the spectral efficiency performance of a drone small
    cell to shed new light on the fundamental benefits of dynamic repositioning. We
    show that, with dynamic repositioning, the spectral efficiency of drone small
    cells can be increased by nearly 100% for realistic drone speed, height, and
    user traffic model and without incurring any major increase in drone energy
    consumption.

    Finite State Multiple-Access Wiretap Channel with Delayed Feedback

    Bin Dai, Zheng Ma, Yuan Luo, Xiaohu Tang
    Comments: Submitted to IEEE Transactions on Information Forensics and Security
    Subjects: Information Theory (cs.IT)

    Recently, the finite state Markov channel (FSMC) with an additional
    eavesdropper and delayed feedback from the legitimate receiver to the
    transmitter has been shown to be a useful model for the physical layer security
    of the practical mobile wireless communication systems. In this paper, we
    extend this model to a multiple-access situation (up-link of the wireless
    communication systems), which we call the finite state multiple-access wiretap
    channel (FS-MAC-WT) with delayed feedback. To be specific, the FS-MAC-WT is a
    channel with two inputs (transmitters) and two outputs (a legitimate receiver
    and an eavesdropper). The channel depends on a state which undergoes a Markov
    process, and the state is entirely known by the legitimate receiver and the
    eavesdropper. The legitimate receiver intends to send his channel output and
    the perfectly known state back to the transmitters through noiseless feedback
    channels after some time delay. The main contribution of this paper is to
    provide inner and outer bounds on the secrecy capacity regions of the FS-MAC-WT
    with delayed state feedback, and with or without delayed legitimate receiver’s
    channel output feedback. The capacity results are further explained via a
    degraded Gaussian fading example, and from this example we see that sending the
    legitimate receiver’s channel output back to the transmitters helps to enhance
    the achievable secrecy rate region of the FS-MAC-WT with only delayed state
    feedback.

    Greedy Sampling of Graph Signals

    Luiz F. O. Chamon, Alejandro Ribeiro
    Comments: 13 pages, 12 figures
    Subjects: Information Theory (cs.IT); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

    Sampling is a fundamental topic in graph signal processing, having found
    applications in estimation, clustering, and video compression. In contrast to
    traditional signal processing, the irregularity of the signal domain makes
    selecting a sampling set non-trivial and hard to analyze. Indeed, though
    conditions for graph signal interpolation from noiseless samples exist, they do
    not lead to a unique sampling set. Thus, the presence of noise makes sampling
    set selection a hard combinatorial problem. Although greedy sampling schemes
    have become ubiquitous in practice, they have no performance guarantee. This
    work takes a twofold approach to address this issue. First, universal
    performance bounds are derived for the interpolation of stochastic graph
    signals from noisy samples. In contrast to currently available bounds, they are
    not restricted to specific sampling schemes and hold for any sampling sets.
    Second, this paper provides near-optimal guarantees for greedy sampling by
    introducing the concept of approximate submodularity and updating the classical
    greedy bound. It then provides explicit bounds on the approximate
    supermodularity of the interpolation mean-square error showing that it can be
    optimized with worst-case guarantees using greedy search even though it is not
    supermodular. Simulations illustrate the derived bound for different graph
    models and show an application of graph signal sampling to reduce the
    complexity of kernel principal component analysis.

    The adaptive zero-error capacity for a class of channels with noisy feedback

    Meysam Asadi, Natasha Devroye
    Subjects: Information Theory (cs.IT)

    The adaptive zero-error capacity of discrete memoryless channels (DMC) with
    noiseless feedback has been shown to be positive whenever there exists at least
    one channel output “disprover”, i.e. a channel output that cannot be reached
    from at least one of the inputs. Furthermore, whenever there exists a
    disprover, the adaptive zero-error capacity attains the Shannon (small-error)
    capacity. Here, we study the zero-error capacity of a DMC when the channel
    feedback is noisy rather than perfect. We show that the adaptive zero-error
    capacity with noisy feedback is lower bounded by the forward channel’s
    zero-undetected error capacity, and show that under certain conditions this is
    tight.

    All binary linear codes that are invariant under \(\mathrm{PSL}_2(n)\)

    Cunsheng Ding, Hao Liu, Vladimir D. Tonchev
    Subjects: Information Theory (cs.IT)

    The projective special linear group \(\mathrm{PSL}_2(n)\) is \(2\)-transitive for all
    primes \(n\) and \(3\)-homogeneous for \(n \equiv 3 \pmod{4}\) on the set
    \(\{0,1,\cdots,n-1,\infty\}\). It is known that the extended odd-like quadratic
    residue codes are invariant under \(\mathrm{PSL}_2(n)\). Hence, the extended quadratic
    residue codes hold an infinite family of \(2\)-designs for primes \(n \equiv 1
    \pmod{4}\), an infinite family of \(3\)-designs for primes \(n \equiv 3 \pmod{4}\).
    To construct more \(t\)-designs with \(t \in \{2, 3\}\), one would search for other
    extended cyclic codes over finite fields that are invariant under the action of
    \(\mathrm{PSL}_2(n)\). The objective of this paper is to prove that the extended
    quadratic residue binary codes are the only nontrivial extended binary cyclic
    codes that are invariant under \(\mathrm{PSL}_2(n)\).

    On the Glitch Phenomenon

    Leslie Lamport, Richard Palais
    Comments: In November 1976, this paper was rejected by the IEEE Transactions on Computers because the engineers who reviewed it could not understand the mathematics. Six years later, the journal apparently acquired more mathematically sophisticated reviewers, and it published a less general result with a more complicated proof
    Subjects: Information Theory (cs.IT); Dynamical Systems (math.DS); Optimization and Control (math.OC)

    The Principle of the Glitch states that for any device which makes a discrete
    decision based upon a continuous range of possible inputs, there are inputs for
    which it will take arbitrarily long to reach a decision. The appropriate
    mathematical setting for studying this principle is described. This involves
    defining the concept of continuity for mappings on sets of functions. It can
    then be shown that the glitch principle follows from the continuous behavior of
    the device.

    Massive MIMO Unlicensed: A New Approach to Dynamic Spectrum Access

    Adrian Garcia-Rodriguez, Giovanni Geraci, Lorenzo Galati Giordano, Andrea Bonfante, Ming Ding, David Lopez-Perez
    Comments: 6 pages, 6 figures
    Subjects: Information Theory (cs.IT)

    Nowadays, the demand for wireless mobile services is copious, and will
    continue increasing in the near future. Mobile cellular operators are therefore
    looking at the unlicensed spectrum as an economical supplement to augment the
    capacity of their soon-to-be overloaded networks. The same unlicensed bands are
    luring internet service providers, venue owners, and authorities into
    autonomously setting up and managing their high-performance private networks.
    In light of this exciting future, ensuring coexistence between multiple
    unlicensed technologies becomes a pivotal issue. So far this issue has been
    merely addressed via inefficient sharing schemes based on intermittent
    transmission. In this article, we present the fundamentals and the main
    challenges behind massive MIMO unlicensed, a brand-new approach for technology
    coexistence in the unlicensed bands, which is envisioned to boost spectrum
    reuse for a plethora of use cases.

    Applications of position-based coding to classical communication over quantum channels

    Haoyu Qi, Qingle Wang, Mark M. Wilde
    Comments: 36 pages
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    Recently, a coding technique called position-based coding has been used to
    establish achievability statements for various kinds of classical communication
    protocols that use quantum channels. In the present paper, not only do we apply
    this technique in the entanglement-assisted setting in order to establish lower
    bounds for error exponents, lower bounds on the second-order coding rate, and
    one-shot lower bounds, but we also demonstrate that position-based coding can
    be a powerful tool for analyzing other communication settings. In particular,
    we reduce the quantum simultaneous decoding conjecture for
    entanglement-assisted or unassisted communication over a quantum multiple
    access channel to open conjectures in multiple quantum hypothesis testing. We
    then determine an achievable rate region for entanglement-assisted or
    unassisted classical communication over a quantum multiple-access channel, when
    using a particular quantum simultaneous decoder. The achievable rate regions
    given in this latter case are generally suboptimal, involving differences of
    Rényi-2 entropies and conditional quantum entropies.



