IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    arXiv Paper Daily: Wed, 26 Oct 2016

    我爱机器学习(52ml.net)发表于 2016-10-26 00:00:00
    love 0

    Computer Vision and Pattern Recognition

    Spatial Relationship Based Features for Indian Sign Language Recognition

    B. M. Chethana Kumara, H. S. Nagendraswamy, R. Lekha Chinmayi
    Comments: 7 Pages, 9 Figures, 4 Tables
    Journal-ref: Int’l Journal of Computing, Communications & Instrumentation Engg.
    (IJCCIE) Vol. 3, Issue 2 (2016) ISSN 2349-1469 EISSN 2349-1477
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, the task of recognizing signs made by hearing impaired people
    at sentence level has been addressed. A novel method of extracting spatial
    features to capture hand movements of a signer has been proposed. Frames of a
    given video of a sign are preprocessed to extract face and hand components of a
    signer. The local centroids of the extracted components along with the global
    centroid are exploited to extract spatial features. The concept of interval
    valued type symbolic data has been explored to capture variations in the same
    sign made by the different signers at different instances of time. A suitable
    symbolic similarity measure is studied to establish matching between test and
    reference signs and a simple nearest neighbour classifier is used to recognize
    an unknown sign as one among the known signs by specifying a desired level of
    threshold. An extensive experimentation is conducted on a considerably large
    database of signs created by us during the course of research work in order to
    evaluate the performance of the proposed system

    End-to-end Learning of Deep Visual Representations for Image Retrieval

    Albert Gordo, Jon Almazan, Jerome Revaud, Diane Larlus
    Comments: Extended version of our ECCV2016 paper “Deep Image Retrieval: Learning global representations for image search”
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    While deep learning has become a key ingredient in the top performing methods
    for many computer vision tasks, it has failed so far to bring similar
    improvements to instance-level image retrieval. In this article, we argue that
    reasons for the underwhelming results of deep methods on image retrieval are
    threefold: i) noisy training data, ii) inappropriate deep architecture, and
    iii) suboptimal training procedure. We address all three issues.

    First, we leverage a large-scale but noisy landmark dataset and develop an
    automatic cleaning method that produces a suitable training set for deep
    retrieval. Second, we build on the recent R-MAC descriptor, show that it can be
    interpreted as a deep and differentiable architecture, and present improvements
    to enhance it. Last, we train this network with a siamese architecture that
    combines three streams with a triplet loss. At the end of the training process,
    the proposed architecture produces a global image representation in a single
    forward pass that is well suited for image retrieval. Extensive experiments
    show that our approach significantly outperforms previous retrieval approaches,
    including state-of-the-art methods based on costly local descriptor indexing
    and spatial verification. On Oxford 5k, Paris 6k and Holidays, we respectively
    report 94.7, 96.6, and 94.8 mean average precision. Our representations can
    also be heavily compressed using product quantization with little loss in
    accuracy. For additional material, please see
    www.xrce.xerox.com/Deep-Image-Retrieval.

    PATH: Person Authentication using Trace Histories

    Upal Mahbub, Rama Chellappa
    Comments: 8 pages, 9 figures. Best Paper award at IEEE UEMCON 2016
    Journal-ref: The 7th IEEE Annual Ubiquitous Computing, Electronics & Mobile
    Communication Conference (IEEE UEMCON), New York, USA, 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, a solution to the problem of Active Authentication using trace
    histories is addressed. Specifically, the task is to perform user verification
    on mobile devices using historical location traces of the user as a function of
    time. Considering the movement of a human as a Markovian motion, a modified
    Hidden Markov Model (HMM)-based solution is proposed. The proposed method,
    namely the Marginally Smoothed HMM (MSHMM), utilizes the marginal probabilities
    of location and timing information of the observations to smooth-out the
    emission probabilities while training. Hence, it can efficiently handle
    unforeseen observations during the test phase. The verification performance of
    this method is compared to a sequence matching (SM) method , a Markov
    Chain-based method (MC) and an HMM with basic Laplace Smoothing (HMM-lap).
    Experimental results using the location information of the UMD Active
    Authentication Dataset-02 (UMDAA02) and the GeoLife dataset are presented. The
    proposed MSHMM method outperforms the compared methods in terms of equal error
    rate (EER). Additionally, the effects of different parameters on the proposed
    method are discussed.

    Anatomically Constrained Video-CT Registration via the V-IMLOP Algorithm

    Seth D. Billings, Ayushi Sinha, Austin Reiter, Simon Leonard, Masaru Ishii, Gregory D. Hager, Russell H. Taylor
    Comments: 8 pages, 4 figures, MICCAI
    Journal-ref: Medical Image Computing and Computer-Assisted Intervention —
    MICCAI 2016: 19th International Conference, Athens, Greece, October 17-21,
    2016, Proceedings, Part III. Vol. 9902, pp. 133-141
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Functional endoscopic sinus surgery (FESS) is a surgical procedure used to
    treat acute cases of sinusitis and other sinus diseases. FESS is fast becoming
    the preferred choice of treatment due to its minimally invasive nature.
    However, due to the limited field of view of the endoscope, surgeons rely on
    navigation systems to guide them within the nasal cavity. State of the art
    navigation systems report registration accuracy of over 1mm, which is large
    compared to the size of the nasal airways. We present an anatomically
    constrained video-CT registration algorithm that incorporates multiple video
    features. Our algorithm is robust in the presence of outliers. We also test our
    algorithm on simulated and in-vivo data, and test its accuracy against
    degrading initializations.

    Active User Authentication for Smartphones: A Challenge Data Set and Benchmark Results

    Upal Mahbub, Sayantan Sarkar, Vishal M. Patel, Rama Chellappa
    Comments: 8 pages, 12 figures, 6 tables. Best poster award at BTAS 2016
    Journal-ref: 8th IEEE International Conference on Biometrics: Theory,
    Applications, and Systems (BTAS), 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB)

    In this paper, automated user verification techniques for smartphones are
    investigated. A unique non-commercial dataset, the University of Maryland
    Active Authentication Dataset 02 (UMDAA-02) for multi-modal user authentication
    research is introduced. This paper focuses on three sensors – front camera,
    touch sensor and location service while providing a general description for
    other modalities. Benchmark results for face detection, face verification,
    touch-based user identification and location-based next-place prediction are
    presented, which indicate that more robust methods fine-tuned to the mobile
    platform are needed to achieve satisfactory verification accuracy. The dataset
    will be made available to the research community for promoting additional
    research.

    Maxmin convolutional neural networks for image classification

    Michael Blot, Matthieu Cord, Nicolas Thome
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Convolutional neural networks (CNN) are widely used in computer vision,
    especially in image classification. However, the way in which information and
    invariance properties are encoded through in deep CNN architectures is still an
    open question. In this paper, we propose to modify the standard convo- lutional
    block of CNN in order to transfer more information layer after layer while
    keeping some invariance within the net- work. Our main idea is to exploit both
    positive and negative high scores obtained in the convolution maps. This behav-
    ior is obtained by modifying the traditional activation func- tion step before
    pooling. We are doubling the maps with spe- cific activations functions, called
    MaxMin strategy, in order to achieve our pipeline. Extensive experiments on two
    classical datasets, MNIST and CIFAR-10, show that our deep MaxMin convolutional
    net outperforms standard CNN.

    mdBrief – A Fast Online Adaptable, Distorted Binary Descriptor for Real-Time Applications Using Calibrated Wide-Angle Or Fisheye Cameras

    Steffen Urban, Stefan Hinz
    Comments: 18 pages, 3 tables, 14 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Fast binary descriptors build the core for many vision based applications
    with real-time demands like object detection, Visual Odometry or SLAM. Commonly
    it is assumed, that the acquired images and thus the patches extracted around
    keypoints originate from a perspective projection ignoring image distortion or
    completely different types of projections such as omnidirectional or fisheye.
    Usually the deviations from a perfect perspective projection are corrected by
    undistortion. Latter, however, introduces severe artifacts if the cameras
    field-of-view gets larger. In this paper, we propose a distorted and masked
    version of the BRIEF descriptor for calibrated cameras. Instead of correcting
    the distortion holistically, we distort the binary tests and thus adapt the
    descriptor to different image regions.

    Camera Fingerprint: A New Perspective for Identifying User's Identity

    Xiang Jiang, Shikui Wei, Ruizhen Zhao, Yao Zhao, Xindong Wu
    Comments: 12 pages, 7 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Identifying user’s identity is a key problem in many data mining
    applications, such as product recommendation, customized content delivery and
    criminal identification. Given a set of accounts from the same or different
    social network platforms, user identification attempts to identify all accounts
    belonging to the same person. A commonly used solution is to build the
    relationship among different accounts by exploring their collective patterns,
    e.g., user profile, writing style, similar comments. However, this kind of
    method doesn’t work well in many practical scenarios, since the information
    posted explicitly by users may be false due to various reasons. In this paper,
    we re-inspect the user identification problem from a novel perspective, i.e.,
    identifying user’s identity by matching his/her cameras. The underlying
    assumption is that multiple accounts belonging to the same person contain the
    same or similar camera fingerprint information. The proposed framework, called
    User Camera Identification (UCI), is based on camera fingerprints, which takes
    fully into account the problems of multiple cameras and reposting behaviors.

    A Learned Representation For Artistic Style

    Vincent Dumoulin, Jonathon Shlens, Manjunath Kudlur
    Comments: 9 pages. 15 pages of Appendix
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    The diversity of painting styles represents a rich visual vocabulary for the
    construction of an image. The degree to which one may learn and parsimoniously
    capture this visual vocabulary measures our understanding of the higher level
    features of paintings, if not images in general. In this work we investigate
    the construction of a single, scalable deep network that can parsimoniously
    capture the artistic style of a diversity of paintings. We demonstrate that
    such a network generalizes across a diversity of artistic styles by reducing a
    painting to a point in an embedding space. Importantly, this model permits a
    user to explore new painting styles by arbitrarily combining the styles learned
    from individual paintings. We hope that this work provides a useful step
    towards building rich models of paintings and offers a window on to the
    structure of the learned representation of artistic style.

    Savu: A Python-based, MPI Framework for Simultaneous Processing of Multiple, N-dimensional, Large Tomography Datasets

    Nicola Wadeson, Mark Basham
    Comments: 10 pages, 10 figures, 1 table
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB)

    Diamond Light Source (DLS), the UK synchrotron facility, attracts scientists
    from across the world to perform ground-breaking x-ray experiments. With over
    3000 scientific users per year, vast amounts of data are collected across the
    experimental beamlines, with the highest volume of data collected during
    tomographic imaging experiments. A growing interest in tomography as an imaging
    technique, has led to an expansion in the range of experiments performed, in
    addition to a growth in the size of the data per experiment.

    Savu is a portable, flexible, scientific processing pipeline capable of
    processing multiple, n-dimensional datasets in serial on a PC, or in parallel
    across a cluster. Developed at DLS, and successfully deployed across the
    beamlines, it uses a modular plugin format to enable experiment-specific
    processing and utilises parallel HDF5 to remove RAM restrictions. The Savu
    design, described throughout this paper, focuses on easy integration of
    existing and new functionality, flexibility and ease of use for users and
    developers alike.

    Image Clustering without Ground Truth

    Abhisek Dash, Sujoy Chatterjee, Tripti Prasad, Malay Bhattacharyya
    Comments: GroupSight Workshop, Fourth AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2016), Austin, USA
    Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV)

    Cluster analysis has become one of the most exercised research areas over the
    past few decades in computer science. As a consequence, numerous clustering
    algorithms have already been developed to find appropriate partitions of a set
    of objects. Given multiple such clustering solutions, it is a challenging task
    to obtain an ensemble of these solutions. This becomes more challenging when
    the ground truth about the number of clusters is unavailable. In this paper, we
    introduce a crowd-powered model to collect solutions of image clustering from
    the general crowd and pose it as a clustering ensemble problem with variable
    number of clusters. The varying number of clusters basically reflects the crowd
    workers’ perspective toward a particular set of objects. We allow a set of
    crowd workers to independently cluster the images as per their perceptions. We
    address the problem by finding out centroid of the clusters using an
    appropriate distance measure and prioritize the likelihood of similarity of the
    individual cluster sets. The effectiveness of the proposed method is
    demonstrated by applying it on multiple artificial datasets obtained from
    crowd.

    A Novel Boundary Matching Algorithm for Video Temporal Error Concealment

    Seyed Mojtaba Marvasti-Zadeh, Hossein Ghanei-Yakhdan, Shohreh Kasaei
    Comments: arXiv admin note: text overlap with arXiv:1610.07386
    Journal-ref: International Journal of Image, Graphics, and Signal Processing
    (IJIGSP), vol. 6, no. 6, pp. 1-10, May. 2014
    Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)

    With the fast growth of communication networks, the video data transmission
    from these networks is extremely vulnerable. Error concealment is a technique
    to estimate the damaged data by employing the correctly received data at the
    decoder. In this paper, an efficient boundary matching algorithm for estimating
    damaged motion vectors (MVs) is proposed. The proposed algorithm performs error
    concealment for each damaged macro block (MB) according to the list of
    identified priority of each frame. It then uses a classic boundary matching
    criterion or the proposed boundary matching criterion adaptively to identify
    matching distortion in each boundary of candidate MB. Finally, the candidate MV
    with minimum distortion is selected as an MV of damaged MB and the list of
    priorities is updated. Experimental results show that the proposed algorithm
    improves both objective and subjective qualities of reconstructed frames
    without any significant increase in computational cost. The PSNR for test
    sequences in some frames is increased about 4.7, 4.5, and 4.4 dB compared to
    the classic boundary matching, directional boundary matching, and directional
    temporal boundary matching algorithm, respectively.


    Artificial Intelligence

    Artificial Intelligence Safety and Cybersecurity: a Timeline of AI Failures

    Roman V. Yampolskiy, M. S. Spellchecker
    Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

    In this work, we present and analyze reported failures of artificially
    intelligent systems and extrapolate our analysis to future AIs. We suggest that
    both the frequency and the seriousness of future AI failures will steadily
    increase. AI Safety can be improved based on ideas developed by cybersecurity
    experts. For narrow AIs safety failures are at the same, moderate, level of
    criticality as in cybersecurity, however for general AI, failures have a
    fundamentally different impact. A single failure of a superintelligent system
    may cause a catastrophic event without a chance for recovery. The goal of
    cybersecurity is to reduce the number of successful attacks on the system; the
    goal of AI Safety is to make sure zero attacks succeed in bypassing the safety
    mechanisms. Unfortunately, such a level of performance is unachievable. Every
    security system will eventually fail; there is no such thing as a 100% secure
    system.

    Intelligence in Artificial Intelligence

    Shoumen Palit Austin Datta
    Subjects: Artificial Intelligence (cs.AI)

    The elusive quest for intelligence in artificial intelligence prompts us to
    consider that instituting human-level intelligence in systems may be (still) in
    the realm of utopia. In about a quarter century, we have witnessed the winter
    of AI (1990) being transformed and transported to the zenith of tabloid fodder
    about AI (2015). The discussion at hand is about the elements that constitute
    the canonical idea of intelligence. The delivery of intelligence as a
    pay-per-use-service, popping out of an app or from a shrink-wrapped software
    defined point solution, is in contrast to the bio-inspired view of intelligence
    as an outcome, perhaps formed from a tapestry of events, cross-pollinated by
    instances, each with its own microcosm of experiences and learning, which may
    not be discrete all-or-none functions but continuous, over space and time. The
    enterprise world may not require, aspire or desire such an engaged solution to
    improve its services for enabling digital transformation through the deployment
    of digital twins, for example. One might ask whether the “work-flow on
    steroids” version of decision support may suffice for intelligence? Are we
    harking back to the era of rule based expert systems? The image conjured by the
    publicity machines offers deep solutions with human-level AI and preposterous
    claims about capturing the “brain in a box” by 2020. Even emulating insects may
    be difficult in terms of real progress. Perhaps we can try to focus on worms
    (Caenorhabditis elegans) which may be better suited for what business needs to
    quench its thirst for so-called intelligence in AI.

    Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples

    Amit Sheth, Sujan Perera, Sanjaya Wijeratne
    Comments: 15 pages, 5 figures
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Machine Learning has been a big success story during the AI resurgence. One
    particular stand out success relates to unsupervised learning from a massive
    amount of data, albeit much of it relates to one modality/type of data at a
    time. In spite of early assertions of the unreasonable effectiveness of data,
    there is increasing recognition of utilizing knowledge whenever it is available
    or can be created purposefully. In this paper, we focus on discussing the
    indispensable role of knowledge for deeper understanding of complex text and
    multimodal data in situations where (i) large amounts of training data
    (labeled/unlabeled) are not available or labor intensive to create, (ii) the
    objects (particularly text) to be recognized are complex (i.e., beyond simple
    entity-person/location/organization names), such as implicit entities and
    highly subjective content, and (iii) applications need to use complementary or
    related data in multiple modalities/media. What brings us to the cusp of rapid
    progress is our ability to (a) create knowledge, varying from comprehensive or
    cross domain to domain or application specific, and (b) carefully exploit the
    knowledge to further empower or extend the applications of ML/NLP techniques.
    Using the early results in several diverse situations – both in data types and
    applications – we seek to foretell unprecedented progress in our ability for
    deeper understanding and exploitation of multimodal data.

    Surprisal-Driven Zoneout

    Kamil Rocki, Tomasz Kornuta
    Comments: Submitted to Continual Learning and Deep Networks Workshop NIPS 2016
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    We propose a novel method of regularization for recurrent neural networks
    called suprisal-driven zoneout. In this method, states extit{zoneout}
    (maintain their previous value rather than updating), when the
    extit{suprisal} (discrepancy between the last state’s prediction and target)
    is small. Thus regularization is adaptive and input-driven on a per-neuron
    basis. We demonstrate the effectiveness of this idea by achieving
    state-of-the-art bits per character of 1.32 on the Hutter Prize Wikipedia
    dataset, significantly reducing the gap to the best known highly-engineered
    compression methods.


    Information Retrieval

    Scalable Dynamic Topic Modeling with Clustered Latent Dirichlet Allocation (CLDA)

    Chris Gropp, Alexander Herzog, Ilya Safro, Paul W. Wilson, Amy W. Apon
    Subjects: Information Retrieval (cs.IR); Machine Learning (stat.ML)

    Topic modeling is an increasingly important component of Big Data analytics,
    enabling the sense-making of highly dynamic and diverse streams of text data.
    Traditional methods such as Dynamic Topic Modeling (DTM), while mathematically
    elegant, do not lend themselves well to direct parallelization because of
    dependencies from one time step to another. Data decomposition approaches that
    partition data across time segments and then combine results in a global view
    of the dynamic change of topics enable execution of topic models on much larger
    datasets than is possibly without data decomposition. However, these methods
    are difficult to analyze mathematically and are relatively untested for quality
    of topics and performance on parallel systems. In this paper, we introduce and
    empirically analyze Clustered Latent Dirichlet Allocation (CLDA), a method for
    extracting dynamic latent topics from a collection of documents. CLDA uses a
    data decomposition strategy to partition data. CLDA takes advantage of
    parallelism, enabling fast execution for even very large datasets and a large
    number of topics. A large corpus is split into local segments to extract
    textual information from different time steps. Latent Dirichlet Allocation
    (LDA) is applied to infer topics at local segments. The results are merged, and
    clustering is used to combine topics from different segments into global
    topics. Results show that the perplexity is comparable and that topics
    generated by this algorithm are similar to those generated by DTM. In addition,
    CLDA is two orders of magnitude faster than existing approaches and allows for
    more freedom of experiment design. In this paper CLDA is applied successfully
    to seventeen years of NIPS conference papers, seventeen years of computer
    science journal abstracts, and to forty years of the PubMed corpus.


    Computation and Language

    Statistical Machine Translation for Indian Languages: Mission Hindi 2

    Raj Nath Patel, Prakash B. Pimpale
    Comments: 4 pages, Published in the Proceedings of NLP Tools Contest: Statistical Machine Translation in Indian Languages
    Journal-ref: In the Proceedings of the 12th International Conference on Natural
    Language Processing (ICON 2015)
    Subjects: Computation and Language (cs.CL)

    This paper presents Centre for Development of Advanced Computing Mumbai’s
    (CDACM) submission to NLP Tools Contest on Statistical Machine Translation in
    Indian Languages (ILSMT) 2015 (collocated with ICON 2015). The aim of the
    contest was to collectively explore the effectiveness of Statistical Machine
    Translation (SMT) while translating within Indian languages and between English
    and Indian languages. In this paper, we report our work on all five language
    pairs, namely Bengali-Hindi (nhi), Marathi-Hindi (mrhi), Tamil-Hindi
    ( ahi), Telugu-Hindi ( ehi), and English-Hindi (enhi) for Health, Tourism,
    and General domains. We have used suffix separation, compound splitting and
    preordering prior to SMT training and testing.

    Sequence Segmentation Using Joint RNN and Structured Prediction Models

    Yossi Adi, Joseph Keshet, Emily Cibelli, Matthew Goldrick
    Comments: under review
    Subjects: Computation and Language (cs.CL)

    We describe and analyze a simple and effective algorithm for sequence
    segmentation applied to speech processing tasks. We propose a neural
    architecture that is composed of two modules trained jointly: a recurrent
    neural network (RNN) module and a structured prediction model. The RNN outputs
    are considered as feature functions to the structured model. The overall model
    is trained with a structured loss function which can be designed to the given
    segmentation task. We demonstrate the effectiveness of our method by applying
    it to two simple tasks commonly used in phonetic studies: word segmentation and
    voice onset time segmentation. Results sug- gest the proposed model is superior
    to previous methods, ob- taining state-of-the-art results on the tested
    datasets.

    Improving historical spelling normalization with bi-directional LSTMs and multi-task learning

    Marcel Bollmann, Anders Søgaard
    Comments: Accepted to COLING 2016
    Subjects: Computation and Language (cs.CL)

    Natural-language processing of historical documents is complicated by the
    abundance of variant spellings and lack of annotated data. A common approach is
    to normalize the spelling of historical words to modern forms. We explore the
    suitability of a deep neural network architecture for this task, particularly a
    deep bi-LSTM network applied on a character level. Our model compares well to
    previously established normalization algorithms when evaluated on a diverse set
    of texts from Early New High German. We show that multi-task learning with
    additional normalization data can improve our model’s performance further.

    How Document Pre-processing affects Keyphrase Extraction Performance

    Florian Boudin, Hugo Mougard, Damien Cram
    Comments: Accepted at the COLING 2016 Workshop on Noisy User-generated Text (WNUT)
    Subjects: Computation and Language (cs.CL)

    The SemEval-2010 benchmark dataset has brought renewed attention to the task
    of automatic keyphrase extraction. This dataset is made up of scientific
    articles that were automatically converted from PDF format to plain text and
    thus require careful preprocessing so that irrevelant spans of text do not
    negatively affect keyphrase extraction performance. In previous work, a wide
    range of document preprocessing techniques were described but their impact on
    the overall performance of keyphrase extraction models is still unexplored.
    Here, we re-assess the performance of several keyphrase extraction models and
    measure their robustness against increasingly sophisticated levels of document
    preprocessing.

    Still not there? Comparing Traditional Sequence-to-Sequence Models to Encoder-Decoder Neural Networks on Monotone String Translation Tasks

    Carsten Schnober, Steffen Eger, Erik-Lan do Dinh, Iryna Gurevych
    Comments: Accepted for publication at COLING 2016. See also: this https URL&tx_bibtex_pi1%5Bpub_id%5D=TUD-CS-2016-1450
    Subjects: Computation and Language (cs.CL)

    We analyze the performance of encoder-decoder neural models and compare them
    with well-known established methods. The latter represent different classes of
    traditional approaches that are applied to the monotone sequence-to-sequence
    tasks OCR post-correction, spelling correction, grapheme-to-phoneme conversion,
    and lemmatization. Such tasks are of practical relevance for various
    higher-level research fields including digital humanities, automatic text
    correction, and speech recognition. We investigate how well generic
    deep-learning approaches adapt to these tasks, and how they perform in
    comparison with established and more specialized methods, including our own
    adaptation of pruned CRFs.

    EmojiNet: Building a Machine Readable Sense Inventory for Emoji

    Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran
    Comments: 15 pages, 4 figures, 3 tables, Accepted to publish at the 8th International Conference on Social Informatics (SocInfo 2016) as a full research track paper
    Subjects: Computation and Language (cs.CL)

    Emoji are a contemporary and extremely popular way to enhance electronic
    communication. Without rigid semantics attached to them, emoji symbols take on
    different meanings based on the context of a message. Thus, like the word sense
    disambiguation task in natural language processing, machines also need to
    disambiguate the meaning or sense of an emoji. In a first step toward achieving
    this goal, this paper presents EmojiNet, the first machine readable sense
    inventory for emoji. EmojiNet is a resource enabling systems to link emoji with
    their context-specific meaning. It is automatically constructed by integrating
    multiple emoji resources with BabelNet, which is the most comprehensive
    multilingual sense inventory available to date. The paper discusses its
    construction, evaluates the automatic resource creation process, and presents a
    use case where EmojiNet disambiguates emoji usage in tweets. EmojiNet is
    available online for use at this http URL

    UTD-CRSS Systems for 2016 NIST Speaker Recognition Evaluation

    Chunlei Zhang, Fahimeh Bahmaninezhad, Shivesh Ranjan, Chengzhu Yu, Navid Shokouhi, John H.L. Hansen
    Comments: 5 pages
    Subjects: Computation and Language (cs.CL)

    This document briefly describes the systems submitted by the Center for
    Robust Speech Systems (CRSS) from The University of Texas at Dallas (UTD) to
    the 2016 National Institute of Standards and Technology (NIST) Speaker
    Recognition Evaluation (SRE). We developed several UBM and DNN i-Vector based
    speaker recognition systems with different data sets and feature
    representations. Given that the emphasis of the NIST SRE 2016 is on language
    mismatch between training and enrollment/test data, so-called domain mismatch,
    in our system development we focused on: (1) using unlabeled in-domain data for
    centralizing data to alleviate the domain mismatch problem, (2) finding the
    best data set for training LDA/PLDA, (3) using newly proposed dimension
    reduction technique incorporating unlabeled in-domain data before PLDA
    training, (4) unsupervised speaker clustering of unlabeled data and using them
    alone or with previous SREs for PLDA training, (5) score calibration using only
    unlabeled data and combination of unlabeled and development (Dev) data as
    separate experiments.

    Learning to Reason With Adaptive Computation

    Mark Neumann, Pontus Stenetorp, Sebastian Riedel
    Subjects: Computation and Language (cs.CL)

    Multi-hop inference is necessary for machine learning systems to successfully
    solve tasks such as Recognising Textual Entailment and Machine Reading. In this
    work, we demonstrate the effectiveness of adaptive computation for learning the
    number of inference steps required for examples of different complexity and
    that learning the correct number of inference steps is difficult. We introduce
    the first model involving Adaptive Computation Time which provides a small
    performance benefit on top of a similar model without an adaptive component as
    well as enabling considerable insight into the reasoning process of the model.

    Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples

    Amit Sheth, Sujan Perera, Sanjaya Wijeratne
    Comments: 15 pages, 5 figures
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Machine Learning has been a big success story during the AI resurgence. One
    particular stand out success relates to unsupervised learning from a massive
    amount of data, albeit much of it relates to one modality/type of data at a
    time. In spite of early assertions of the unreasonable effectiveness of data,
    there is increasing recognition of utilizing knowledge whenever it is available
    or can be created purposefully. In this paper, we focus on discussing the
    indispensable role of knowledge for deeper understanding of complex text and
    multimodal data in situations where (i) large amounts of training data
    (labeled/unlabeled) are not available or labor intensive to create, (ii) the
    objects (particularly text) to be recognized are complex (i.e., beyond simple
    entity-person/location/organization names), such as implicit entities and
    highly subjective content, and (iii) applications need to use complementary or
    related data in multiple modalities/media. What brings us to the cusp of rapid
    progress is our ability to (a) create knowledge, varying from comprehensive or
    cross domain to domain or application specific, and (b) carefully exploit the
    knowledge to further empower or extend the applications of ML/NLP techniques.
    Using the early results in several diverse situations – both in data types and
    applications – we seek to foretell unprecedented progress in our ability for
    deeper understanding and exploitation of multimodal data.


    Distributed, Parallel, and Cluster Computing

    Savu: A Python-based, MPI Framework for Simultaneous Processing of Multiple, N-dimensional, Large Tomography Datasets

    Nicola Wadeson, Mark Basham
    Comments: 10 pages, 10 figures, 1 table
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB)

    Diamond Light Source (DLS), the UK synchrotron facility, attracts scientists
    from across the world to perform ground-breaking x-ray experiments. With over
    3000 scientific users per year, vast amounts of data are collected across the
    experimental beamlines, with the highest volume of data collected during
    tomographic imaging experiments. A growing interest in tomography as an imaging
    technique, has led to an expansion in the range of experiments performed, in
    addition to a growth in the size of the data per experiment.

    Savu is a portable, flexible, scientific processing pipeline capable of
    processing multiple, n-dimensional datasets in serial on a PC, or in parallel
    across a cluster. Developed at DLS, and successfully deployed across the
    beamlines, it uses a modular plugin format to enable experiment-specific
    processing and utilises parallel HDF5 to remove RAM restrictions. The Savu
    design, described throughout this paper, focuses on easy integration of
    existing and new functionality, flexibility and ease of use for users and
    developers alike.

    A work-efficient parallel sparse matrix-sparse vector multiplication algorithm

    Ariful Azad, Aydin Buluc
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    We design and develop a work-efficient multithreaded algorithm for sparse
    matrix-sparse vector multiplication (SpMSpV) where the matrix, the input
    vector, and the output vector are all sparse. SpMSpV is an important primitive
    in the emerging GraphBLAS standard and is the workhorse of many graph
    algorithms including breadth-first search, bipartite graph matching, and
    maximal independent set. As thread counts increase, existing multithreaded
    SpMSpV algorithms can spend more time accessing the sparse matrix data
    structure than doing arithmetic. Our shared-memory parallel SpMSpV algorithm is
    work efficient in the sense its total work is proportional to the number of
    arithmetic operations required. The key insight is to avoid each thread
    individually scan the list of matrix columns.

    Our algorithm is simple to implement and operates on existing column-based
    sparse matrix formats. It performs well on diverse matrices and vectors with
    heterogeneous sparsity patterns. A high-performance implementation of the
    algorithm attains up to 15x speedup on a 24-core Intel Ivy Bridge processor and
    up to 49x speedup on a 64-core Intel KNL manycore processor. In contrast to
    implementations of existing algorithms, the performance of our algorithm is
    sustained on a variety of different input types include matrices representing
    scale-free and high-diameter graphs.

    A parallel framework for reverse search using mts

    David Avis, Charles Jordan
    Comments: 16 pages
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    We describe mts, which is a generic framework for parallelizing certain types
    of tree search programs, that (a) provides a single common wrapper containing
    all of the parallelization, and (b) minimizes the changes needed to the
    existing single processor legacy code. The mts code was derived from ideas used
    to develop mplrs, a parallelization of the reverse search vertex enumeration
    code lrs. The tree search properties required for the use of mts are satisfied
    by any reverse search algorithm as well as other tree search methods such as
    backtracking and branch and bound. mts is programmed in C, uses the MPI
    parallel environment, and can be run on a network of computers.

    As examples we parallelize two simple existing reverse search codes:
    generating topological orderings and generating spanning trees of a graph. We
    give computational results comparing the parallel codes with state of the art
    sequential codes for the same problems.

    Proceedings of the First International Workshop on Formal Methods for and on the Cloud

    Razieh Behjati (Simula Research Laboratory), Ahmed Elmokashfi (Simula Research Laboratory)
    Journal-ref: EPTCS 228, 2016
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Logic in Computer Science (cs.LO); Networking and Internet Architecture (cs.NI)

    Cloud solutions are increasingly used for a plethora of purposes, including
    solving memory-intensive and computation-intensive problems. Ensuring the
    reliability, availability, scalability, and security of cloud solutions, as
    networked distributed systems with properties such as dynamic reallocation of
    resources, is a challenging problem that requires rigorous modeling, analysis,
    and verification tools. Such tools can be devised using the techniques provided
    by the formal methods community. On the other hand, many formal analysis and
    verification tools are memory-intensive and computation-intensive solutions,
    which can benefit from the cloud technology.

    The goal of the iFMCloud workshop is to identify and better understand
    challenges of using formal and semi-formal methods for modeling and
    verification of Cloud-based systems and computer and communication networks, as
    well as challenges and opportunities in providing formal analysis and
    verification as services on the Cloud. We aim to reach these goals by bringing
    together researchers and practitioners from these, and other related fields.

    A Unified Coding Framework for Distributed Computing with Straggling Servers

    Songze Li, Mohammad Ali Maddah-Ali, A. Salman Avestimehr
    Comments: a shorter version to appear in NetCod 2016
    Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC)

    We propose a unified coded framework for distributed computing with
    straggling servers, by introducing a tradeoff between “latency of computation”
    and “load of communication” for some linear computation tasks. We show that the
    coded scheme of [1]-[3] that repeats the intermediate computations to create
    coded multicasting opportunities to reduce communication load, and the coded
    scheme of [4], [5] that generates redundant intermediate computations to combat
    against straggling servers can be viewed as special instances of the proposed
    framework, by considering two extremes of this tradeoff: minimizing either the
    load of communication or the latency of computation individually. Furthermore,
    the latency-load tradeoff achieved by the proposed coded framework allows to
    systematically operate at any point on that tradeoff to perform distributed
    computing tasks. We also prove an information-theoretic lower bound on the
    latency-load tradeoff, which is shown to be within a constant multiplicative
    gap from the achieved tradeoff at the two end points.

    A Scalable Framework for Wireless Distributed Computing

    Songze Li, Qian Yu, Mohammad Ali Maddah-Ali, A. Salman Avestimehr
    Comments: A shorter version of this paper will be presented in IEEE GLOBECOM, 2016
    Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC)

    In this paper, a scalable framework for wireless distributed computing is
    proposed, in which the data shuffling load does not increase with the number of
    users in the network. The key idea is to utilize a particular repetitive
    structure of computation assignments at the users, in order to provide coding
    opportunities at both the users and the access point, which reduce the
    shuffling load by a factor that grows linearly with the number of users. It is
    demonstrated that the proposed computation assignments and coded shuffling
    schemes are optimal (i.e., achieve the minimum shuffling load) for both a
    centralized setting and a decentralized setting, by developing tight
    information-theoretic lower bounds.

    A Fundamental Tradeoff between Computation and Communication in Distributed Computing

    Songze Li, Mohammad Ali Maddah-Ali, Qian Yu, A. Salman Avestimehr
    Comments: Submitted for publication, a shorter version to appear in proceedings of ISIT 2016
    Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC)

    How can we optimally trade extra computing power to reduce the communication
    load in distributed computing? We answer this question by characterizing a
    fundamental tradeoff relationship between computation and communication in
    distributed computing, i.e., the two are inverse-linearly proportional to each
    other.

    More specifically, a general distributed computing framework, motivated by
    commonly used structures like MapReduce, is considered, where the goal is to
    compute (Q) arbitrary output functions from (N) input files, by decomposing the
    overall computation into computing a set of “Map” and “Reduce” functions
    distributedly across (K) computing nodes. A coded scheme, named “Coded
    Distributed Computing” (CDC), is proposed to demonstrate that increasing the
    computation load of the Map phase by a factor of (r) (i.e., evaluating each Map
    function at (r) carefully chosen nodes) can create novel coding opportunities
    in the data shuffling phase that reduce the communication load by the same
    factor.

    An information-theoretic lower bound on the communication load is also
    provided, which matches the communication load achieved by the CDC scheme. As a
    result, the optimal computation-communication tradeoff in distributed computing
    is exactly characterized.


    Learning

    Generalization Bounds for Weighted Automata

    Borja Balle, Mehryar Mohri
    Subjects: Learning (cs.LG); Formal Languages and Automata Theory (cs.FL)

    This paper studies the problem of learning weighted automata from a finite
    labeled training sample. We consider several general families of weighted
    automata defined in terms of three different measures: the norm of an
    automaton’s weights, the norm of the function computed by an automaton, or the
    norm of the corresponding Hankel matrix. We present new data-dependent
    generalization guarantees for learning weighted automata expressed in terms of
    the Rademacher complexity of these families. We further present upper bounds on
    these Rademacher complexities, which reveal key new data-dependent terms
    related to the complexity of learning weighted automata.

    Hybrid clustering-classification neural network in the medical diagnostics of reactive arthritis

    Yevgeniy Bodyanskiy, Olena Vynokurova, Volodymyr Savvo, Tatiana Tverdokhlib, Pavlo Mulesa
    Journal-ref: International Journal of Intelligent Systems and Applications,
    2016, Vol. 8, No. 8, pp.1-9
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    The hybrid clustering-classification neural network is proposed. This network
    allows increasing a quality of information processing under the condition of
    overlapping classes due to the rational choice of a learning rate parameter and
    introducing a special procedure of fuzzy reasoning in the clustering process,
    which occurs both with an external learning signal (supervised) and without the
    one (unsupervised). As similarity measure neighborhood function or membership
    one, cosine structures are used, which allow to provide a high flexibility due
    to self-learning-learning process and to provide some new useful properties.
    Many realized experiments have confirmed the efficiency of proposed hybrid
    clustering-classification neural network; also, this network was used for
    solving diagnostics task of reactive arthritis.

    Sparse Hierarchical Tucker Factorization and its Application to Healthcare

    Ioakeim Perros, Robert Chen, Richard Vuduc, Jimeng Sun
    Comments: This is an extended version of a paper presented at the 15th IEEE International Conference on Data Mining (ICDM 2015)
    Subjects: Learning (cs.LG); Numerical Analysis (cs.NA)

    We propose a new tensor factorization method, called the Sparse
    Hierarchical-Tucker (Sparse H-Tucker), for sparse and high-order data tensors.
    Sparse H-Tucker is inspired by its namesake, the classical Hierarchical Tucker
    method, which aims to compute a tree-structured factorization of an input data
    set that may be readily interpreted by a domain expert. However, Sparse
    H-Tucker uses a nested sampling technique to overcome a key scalability problem
    in Hierarchical Tucker, which is the creation of an unwieldy intermediate dense
    core tensor; the result of our approach is a faster, more space-efficient, and
    more accurate method. We extensively test our method on a real healthcare
    dataset, which is collected from 30K patients and results in an 18th order
    sparse data tensor. Unlike competing methods, Sparse H-Tucker can analyze the
    full data set on a single multi-threaded machine. It can also do so more
    accurately and in less time than the state-of-the-art: on a 12th order subset
    of the input data, Sparse H-Tucker is 18x more accurate and 7.5x faster than a
    previously state-of-the-art method. Even for analyzing low order tensors (e.g.,
    4-order), our method requires close to an order of magnitude less time and over
    two orders of magnitude less memory, as compared to traditional tensor
    factorization methods such as CP and Tucker. Moreover, we observe that Sparse
    H-Tucker scales nearly linearly in the number of non-zero tensor elements. The
    resulting model also provides an interpretable disease hierarchy, which is
    confirmed by a clinical expert.

    Distributed and parallel time series feature extraction for industrial big data applications

    Maximilian Christ, Andreas W. Kempa-Liehr, Michael Feindt
    Comments: ACML 2016 Workshop on Learning on Big Data (4)
    Subjects: Learning (cs.LG)

    The all-relevant problem of feature selection is the identification of all
    strongly and weakly relevant attributes. This problem is especially hard to
    solve for time series classification and regression in industrial applications
    such as predictive maintenance or production line optimization, for which each
    label or regression target is associated with several time series and
    meta-information simultaneously. Here, we are proposing an efficient, scalable
    feature extraction algorithm, which filters the available features in an early
    stage of the machine learning pipeline with respect to their significance for
    the classification or regression task, while controlling the expected
    percentage of selected but irrelevant features.

    The proposed algorithm combines established feature extraction methods with a
    feature importance filter. It has a low computational complexity, allows to
    start on a problem with only limited domain knowledge available, can be
    trivially parallelized, is highly scalable and based on well studied
    non-parametric hypothesis tests. We benchmark our proposed algorithm on all
    binary classification problems of the UCR time series classification archive as
    well as time series from a production line optimization project and simulated
    stochastic processes with underlying qualitative change of dynamics.

    Co-Occuring Directions Sketching for Approximate Matrix Multiply

    Youssef Mroueh, Etienne Marcheret, Vaibhava Goel
    Subjects: Learning (cs.LG)

    We introduce co-occurring directions sketching, a deterministic algorithm for
    approximate matrix product (AMM), in the streaming model. We show that
    co-occuring directions achieves a better error bound for AMM than other
    randomized and deterministic approaches for AMM. Co-occurring directions gives
    a (1 + epsilon) -approximation of the optimal low rank approximation of a
    matrix product. Empirically our algorithm outperforms competing methods for
    AMM, for a small sketch size. We validate empirically our theoretical findings
    and algorithms

    Surprisal-Driven Zoneout

    Kamil Rocki, Tomasz Kornuta
    Comments: Submitted to Continual Learning and Deep Networks Workshop NIPS 2016
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    We propose a novel method of regularization for recurrent neural networks
    called suprisal-driven zoneout. In this method, states extit{zoneout}
    (maintain their previous value rather than updating), when the
    extit{suprisal} (discrepancy between the last state’s prediction and target)
    is small. Thus regularization is adaptive and input-driven on a per-neuron
    basis. We demonstrate the effectiveness of this idea by achieving
    state-of-the-art bits per character of 1.32 on the Hutter Prize Wikipedia
    dataset, significantly reducing the gap to the best known highly-engineered
    compression methods.

    Predicting Counterfactuals from Large Historical Data and Small Randomized Trials

    Nir Rosenfeld, Yishay Mansour, Elad Yom-Tov
    Subjects: Learning (cs.LG)

    When a new treatment is considered for use, whether a pharmaceutical drug or
    a search engine ranking algorithm, a typical question that arises is, will its
    performance exceed that of the current treatment? The conventional way to
    answer this counterfactual question is to estimate the effect of the new
    treatment in comparison to that of the conventional treatment by running a
    controlled, randomized experiment. While this approach theoretically ensures an
    unbiased estimator, it suffers from several drawbacks, including the difficulty
    in finding representative experimental populations as well as the cost of
    running such trials. Moreover, such trials neglect the huge quantities of
    available control-condition data which are often completely ignored.

    In this paper we propose a discriminative framework for estimating the
    performance of a new treatment given a large dataset of the control condition
    and data from a small (and possibly unrepresentative) randomized trial
    comparing new and old treatments. Our objective, which requires minimal
    assumptions on the treatments, models the relation between the outcomes of the
    different conditions. This allows us to not only estimate mean effects but also
    to generate individual predictions for examples outside the randomized sample.

    We demonstrate the utility of our approach through experiments in two areas:
    Search engine ranking and market value estimation. Our results demonstrate that
    our approach can reduce the number and size of the currently performed
    randomized controlled experiments, thus saving significant time, money and
    effort on the part of practitioners.

    Frank-Wolfe Algorithms for Saddle Point Problems

    Gauthier Gidel, Tony Jebara, Simon Lacoste-Julien
    Comments: 39 pages
    Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)

    We extend the Frank-Wolfe (FW) optimization algorithm to solve constrained
    smooth convex-concave saddle point (SP) problems. Remarkably, the method only
    requires access to linear minimization oracles. Leveraging recent advances in
    FW optimization, we provide the first proof of convergence of a FW-type saddle
    point solver over polytopes, thereby partially answering a 30 year-old
    conjecture. We also survey other convergence results and highlight gaps in the
    theoretical underpinnings of FW-style algorithms. Motivating applications
    without known efficient alternatives are explored through structured prediction
    with combinatorial penalties as well as games over matching polytopes involving
    an exponential number of constraints.

    Big Models for Big Data using Multi objective averaged one dependence estimators

    Mrutyunjaya Panda
    Comments: 21 pages, 2 Figures, 10 tables
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Even though, many researchers tried to explore the various possibilities on
    multi objective feature selection, still it is yet to be explored with best of
    its capabilities in data mining applications rather than going for developing
    new ones. In this paper, multi-objective evolutionary algorithm ENORA is used
    to select the features in a multi-class classification problem. The fusion of
    AnDE (averaged n-dependence estimators) with n=1, a variant of naive Bayes with
    efficient feature selection by ENORA is performed in order to obtain a fast
    hybrid classifier which can effectively learn from big data. This method aims
    at solving the problem of finding optimal feature subset from full data which
    at present still remains to be a difficult problem. The efficacy of the
    obtained classifier is extensively evaluated with a range of most popular 21
    real world dataset, ranging from small to big. The results obtained are
    encouraging in terms of time, Root mean square error, zero-one loss and
    classification accuracy.

    Approximate cross-validation formula for Bayesian linear regression

    Yoshiyuki Kabashima, Tomoyuki Obuchi, Makoto Uemura
    Comments: 5 pages, 2 figures, invited paper for Allerton2016 conference
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Cross-validation (CV) is a technique for evaluating the ability of
    statistical models/learning systems based on a given data set. Despite its wide
    applicability, the rather heavy computational cost can prevent its use as the
    system size grows. To resolve this difficulty in the case of Bayesian linear
    regression, we develop a formula for evaluating the leave-one-out CV error
    approximately without actually performing CV. The usefulness of the developed
    formula is tested by statistical mechanical analysis for a synthetic model.
    This is confirmed by application to a real-world supernova data set as well.

    A Bayesian Ensemble for Unsupervised Anomaly Detection

    Edward Yu, Parth Parekh
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Methods for unsupervised anomaly detection suffer from the fact that the data
    is unlabeled, making it difficult to assess the optimality of detection
    algorithms. Ensemble learning has shown exceptional results in classification
    and clustering problems, but has not seen as much research in the context of
    outlier detection. Existing methods focus on combining output scores of
    individual detectors, but this leads to outputs that are not easily
    interpretable. In this paper, we introduce a theoretical foundation for
    combining individual detectors with Bayesian classifier combination. Not only
    are posterior distributions easily interpreted as the probability distribution
    of anomalies, but bias, variance, and individual error rates of detectors are
    all easily obtained. Performance on real-world datasets shows high accuracy
    across varied types of time series data.

    A Theoretical Analysis of Noisy Sparse Subspace Clustering on Dimensionality-Reduced Data

    Yining Wang, Yu-Xiang Wang, Aarti Singh
    Comments: 40 pages, 2 figures. A shorter version of this paper titled “A Deterministic Analysis of Noisy Sparse Subspace Clustering on Dimensionality-Reduced Data” with partial results appeared at Proceedings of the 32nd International Conference on Machine Learning (ICML) held at Lille, France in 2015
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Subspace clustering is the problem of partitioning unlabeled data points into
    a number of clusters so that data points within one cluster lie approximately
    on a low-dimensional linear subspace. In many practical scenarios, the
    dimensionality of data points to be clustered are compressed due to constraints
    of measurement, computation or privacy. In this paper, we study the theoretical
    properties of a popular subspace clustering algorithm named sparse subspace
    clustering (SSC) and establish formal success conditions of SSC on
    dimensionality-reduced data. Our analysis applies to the most general fully
    deterministic model where both underlying subspaces and data points within each
    subspace are deterministically positioned, and also a wide range of
    dimensionality reduction techniques (e.g., Gaussian random projection, uniform
    subsampling, sketching) that fall into a subspace embedding framework (Meng &
    Mahoney, 2013; Avron et al., 2014). Finally, we apply our analysis to a
    differentially private SSC algorithm and established both privacy and utility
    guarantees of the proposed method.

    A Learned Representation For Artistic Style

    Vincent Dumoulin, Jonathon Shlens, Manjunath Kudlur
    Comments: 9 pages. 15 pages of Appendix
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    The diversity of painting styles represents a rich visual vocabulary for the
    construction of an image. The degree to which one may learn and parsimoniously
    capture this visual vocabulary measures our understanding of the higher level
    features of paintings, if not images in general. In this work we investigate
    the construction of a single, scalable deep network that can parsimoniously
    capture the artistic style of a diversity of paintings. We demonstrate that
    such a network generalizes across a diversity of artistic styles by reducing a
    painting to a point in an embedding space. Importantly, this model permits a
    user to explore new painting styles by arbitrarily combining the styles learned
    from individual paintings. We hope that this work provides a useful step
    towards building rich models of paintings and offers a window on to the
    structure of the learned representation of artistic style.


    Information Theory

    Efficient Polar Code Construction for Higher-Order Modulation

    Georg Böcherer, Tobias Prinz, Peihong Yuan, Fabian Steiner
    Comments: 11 figures
    Subjects: Information Theory (cs.IT)

    An efficient algorithm for the construction of polar codes for higher-order
    modulation is presented based on information-theoretic principles. The bit
    reliabilities after successive demapping are estimated using the LM-rate, an
    achievable rate for mismatched decoding. The successive demapper bit channels
    are then replaced by binary input Additive White Gaussian Noise (biAWGN)
    surrogate channels and polar codes are constructed using the Gaussian
    approximation (GA). This LM-rate Demapper GA (LM-DGA) construction is used to
    construct polar codes for several demapping strategies proposed in literature.
    For all considered demappers, the LM-DGA constructed polar codes have the same
    performance as polar codes constructed by Monte Carlo (MC) simulation. The
    proposed LM-DGA construction is much faster than the MC construction. For
    64-QAM, spectral efficiency 3 bits/s/Hz, and block length 1536 bits, simulation
    results show that LM-DGA constructed polar codes with cyclic redundancy check
    and successive cancellation list decoding are 1 dB more power efficient than
    state-of-the-art AR4JA low-density parity-check codes.

    Improved Upper Bounds on Systematic-Length for Linear Minimum Storage Regenerating Codes

    Kun Huang, Udaya Parampalli, Ming Xian
    Subjects: Information Theory (cs.IT)

    In this paper, we revisit the problem of finding the longest
    systematic-length (k) in a linear minimum storage regenerating (MSR) code, for
    a given storage capacity of each node (l) and an arbitrary number of parity
    nodes (r). We study it by following the geometric analysis of linear subspaces
    and operators. First, a simple quadratic bound is given, which implies that
    (k=r+2) is the largest number of systematic nodes in the emph{scalar} linear
    MSR scenario. Second, an (r)-based-log bound is derived, which is superior to
    the upper bound on log-base (2) in the prior work. Finally, an explicit bound
    depending on the value of (frac{r^2}{l}) is introduced, which further extends
    the corresponding result in the literature.

    On Fractional Linear Network Coding Solution of Multiple-Unicast Networks

    Niladri Das, Brijesh Kumar Rai
    Subjects: Information Theory (cs.IT)

    It is known that there exists a multiple-unicast network which has a rate (1)
    linear network coding solution if and only if the characteristic of the finite
    field belongs to a given finite or co-finite set of primes. In this paper, we
    show that for any non-zero positive rational number (frac{k}{n}), there exists
    a multiple-unicast network which has a rate (frac{k}{n}) fractional linear
    network coding solution if and only if the characteristic of the finite field
    belongs to a given finite or co-finite set of primes.

    Wasserstein Stability of the Entropy Power Inequality for Log-Concave Densities

    Thomas A. Courtade, Max Fathi, Ashwin Pananjady
    Comments: 19 pages
    Subjects: Information Theory (cs.IT); Functional Analysis (math.FA); Probability (math.PR)

    We establish quantitative stability results for the entropy power inequality
    (EPI). Specifically, we show that if uniformly log-concave densities nearly
    saturate the EPI, then they must be close to Gaussian densities in the
    quadratic Wasserstein distance. Further, if one of the densities is log-concave
    and the other is Gaussian, then the deficit in the EPI can be controlled in
    terms of the (L^1)-Wasserstein distance. As a counterpoint, an example shows
    that the EPI can be unstable with respect to the quadratic Wasserstein distance
    when densities are uniformly log-concave on sets of measure arbitrarily close
    to one. Our stability results can be extended to non-log-concave densities,
    provided certain regularity conditions are met. The proofs are based on optimal
    transportation.

    On the distance of stabilizer quantum codes given by (J)-affine variety codes

    Carlos Galindo, Olav Geil, Fernando Hernando, Diego Ruano
    Subjects: Information Theory (cs.IT)

    Self-orthogonal (J)-affine variety codes have been successfully used to
    obtain quantum stabilizer codes with excellent parameters. In a previous paper
    we gave formulae for the dimension of this family of quantum codes, but no
    bound for the minimum distance was given. In this work, we show how to derive
    quantum stabilizer codes with designed minimum distance from (J)-affine variety
    codes and their subfield-subcodes. Moreover, this allows us to obtain new
    quantum codes, some of them either, with better parameters, or with larger
    distances than the previously known codes.

    MIMO Systems With Low-Resolution ADCs: Linear Coding Approach

    Song-Nam Hong, Yo-Seb Jeon, Namyoon Lee
    Comments: Submitted to IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    This paper considers a multiple-input multiple-output (MIMO) system with
    low-resolution analog-to-digital converters (ADCs). In this system, the paper
    presents a new MIMO detection approach using coding theory. The principal idea
    of the proposed approach is to transform a non-linear MIMO channel to a linear
    MIMO channel by leveraging both a (p)-level quantizer and a lattice code where
    (pgeq 2). After transforming to the linear MIMO channel with the sets of
    finite input and output elements, efficient MIMO detection methods are proposed
    to attain both diversity and multiplexing gains by using algebraic coding
    theory. In particular, using the proposed methods, the analytical
    characterizations of achievable rates are derived for different MIMO
    configurations. One major observation is that the proposed approach is
    particularly useful for a large MIMO system with the ADCs that use a few bits.

    Omnidirectional Space-Time Block Coding for Common Information Broadcasting in Massive MIMO Systems

    Xin Meng, Xiang-Gen Xia, Xiqi Gao
    Subjects: Information Theory (cs.IT)

    In this paper, we design space-time block codes (STBCs) to broadcast the
    common information omnidirectionally in a massive MIMO downlink. To reduce the
    burden of the downlink channel estimation and achieve partial spatial diversity
    from base station (BS) transmit antennas, we propose channel-independently
    precoded low-dimensional STBC. The precoding matrix and the signal
    constellation in the low-dimensional STBC are jointly designed to guarantee
    omnidirectional coverage at any instant time and sufficiently utilize the power
    amplifier capacities of BS transmit antennas, and at the same time, achieve the
    full diversity of the low-dimensional STBC. Under this framework, several
    designs are presented. To provide transmit diversity order of two, a precoded
    Alamouti code is proposed, which has a fast symbol-wise maximum-likelihood (ML)
    decoding. To provide transmit diversity order of four, three types of STBCs are
    proposed, being referred to as precoded orthogonal STBC (OSTBC), precoded
    quasi-orthogonal STBC (QOSTBC), and precoded coordinate interleaved orthogonal
    design (CIOD), respectively. The last two codes have the same complexity for
    pair-wise ML decoding, while precoded QOSTBC has a higher coding gain when the
    bit rate is lower than or equal to 4 bps/Hz, and precoded CIOD has a higher
    coding gain when the bit rate is higher than 4 bps/Hz. Precoded OSTBC has a
    higher decoding complexity and a lower coding gain than the other two codes,
    since in the precoded OSTBC the information symbols need to be jointly designed
    and decoded. Moreover, a precoded no-zero-entry Toeplitz code and a precoded
    no-zero-entry overlapped Alamouti code are also proposed. These two codes can
    achieve a higher diversity order with linear receivers.

    MIMO Multiway Distributed-Relay Channel with Full Data Exchange: An Achievable Rate Perspective

    Xiang Zhao, Jianwen Zhang, Yingjun Zhang, Xiaojun Yuan
    Subjects: Information Theory (cs.IT)

    We consider efficient communications over the multiple-input multiple-output
    (MIMO) multiway distributed relay channel (MDRC) with full data exchange, where
    each user, equipped with multiple antennas, broadcasts its message to all the
    other users via the help of a number of distributive relays. We propose a
    physical-layer network coding (PNC) based scheme involving linear precoding for
    channel alignment nested lattice coding for PNC, and lattice-based precoding
    for interference mitigation, We show that, with the proposed scheme,
    distributed relaying achieves the same sum-rate as cooperative relaying in the
    high SNR regime. We also show that the proposed scheme achieve the asymptotic
    sum capacity of the MIMO MDRC within a constant gap at high SNR. Numerical
    results demonstrate that the proposed scheme considerably outperforms the
    existing schemes including decode-and-forward and amplify-and-forward.

    Construction of MDS self-dual codes from orthogonal matrices

    Minjia Shi, Lin Sok, Patrick Solé
    Comments: presented in part in the 3rd Sino-Korea International Conference on Coding Theory and Related Topics; submitted to IEEE trans. on Information Theory
    Subjects: Information Theory (cs.IT)

    In this paper, we give algorithms and methods of construction of self-dual
    codes over finite fields using orthogonal matrices. Randomization in the
    orthogonal group, and code extension are the main tools. Some optimal, almost
    MDS, and MDS self-dual codes over both small and large prime fields are
    constructed.

    Matroidal Structure of Skew Polynomial Rings with Application to Network Coding

    Siyu Liu, Felice Manganiello, Frank R. Kschischang
    Subjects: Information Theory (cs.IT)

    Over a finite field (mathbb{F}_{q^m}), the evaluation of skew polynomials is
    intimately related to the evaluation of linearized polynomials. This connection
    allows one to relate the concept of polynomial independence defined for skew
    polynomials to the familiar concept of linear independence for vector spaces.
    This relation allows for the definition of a representable matroid called the
    (mathbb{F}_{q^m}[x;sigma])-matroid, with rank function that makes it a metric
    space. Specific submatroids of this matroid are individually bijectively
    isometric to the projective geometry of (mathbb{F}_{q^m}) equipped with the
    subspace metric. This isometry allows one to use the
    (mathbb{F}_{q^m}[x;sigma])-matroid in a matroidal network coding application.

    New results for traitor tracing schemes

    Chong Shangguan, Jingxue Ma, Gennian Ge
    Comments: 9 pages, submitted
    Subjects: Information Theory (cs.IT); Combinatorics (math.CO)

    In the last two decades, several classes of codes are introduced to protect
    the copyrighted digital data. They have important applications in the scenarios
    like digital fingerprinting and broadcast encryption schemes. In this paper we
    will discuss three important classes of such codes, namely, frameproof codes,
    parent-identifying codes and traceability codes.

    Firstly, suppose (N(t)) is the minimal integer such that there exists a
    binary (t)-frameproof code of length (N) with cardinality larger than (N), we
    prove that (N(t)gefrac{15+sqrt{33}}{24} (t-2)^2), which is a great
    improvement of the previously known bound (N(t)geinom{t+1}{2}). Moreover, we
    find that the determination of (N(t)) is closely related to a conjecture of
    ErdH{o}s, Frankl and F”uredi posed in the 1980’s, which implies the
    conjectured value (N(t)=t^2+o(t^2)). Secondly, we derive a new upper bound for
    parent-identifying codes, which is superior than all previously known bounds.
    Thirdly, we present an upper bound for 3-traceability codes, which shows that a
    (q)-ary 3-traceability code of length (N) can have at most (cq^{lceil
    N/9
    ceil}) codewords, where (c) is a constant only related to the code length
    (N). It is the first meaningful upper bound for 3-traceability codes and our
    result supports a conjecture of Blackburn et al. posed in 2010.

    Blind Detection for MIMO Systems With Low-Resolution ADCs Using Supervised Learning

    Yo-Seb Jeon, Song-Nam Hong, Namyoon Lee
    Comments: Submitted to IEEE Transactions on Signal Processing
    Subjects: Information Theory (cs.IT)

    This paper considers a multiple-input-multiple-output (MIMO) system with
    low-resolution analog-to-digital converters (ADCs). In this system, we present
    a novel blind detection framework that performs data symbol detection without
    explicitly knowing channel state information at a receiver. The underlying idea
    of the proposed framework is to exploit supervised learning. Specifically,
    during channel training, the proposed approach sends a sequence of data symbols
    as pilots so that the receiver learns a nonlinear function that is determined
    by both a channel matrix and a quantization function of the ADCs. During data
    transmission, the receiver uses the estimated nonlinear function from labeled
    training data to detect which data symbols were transmitted. We propose three
    blind detection methods, which are connected to a (K)-nearest neighbors
    classification and a nearest-centroid classification. We also provide an
    analytical expression for the symbol-vector-error probability of the MIMO
    systems with one-bit ADCs when employing the proposed framework. One major
    observation is that the symbol-vector-error probability decreases exponentially
    with the inverse of the number of transmit antennas, the operating
    signal-to-noise ratio, and the minimum distance that can increase with the
    number of receive antennas. Simulations demonstrate the performance improvement
    of the proposed framework compared to existing detection techniques.

    Wirelessly Powered Communication Networks with Short Packets

    Talha Ahmed Khan, Robert W. Heath Jr., Petar Popovski
    Comments: submitted to IEEE Transactions on Communications
    Subjects: Information Theory (cs.IT)

    Wirelessly powered communications will entail short packets due to naturally
    small payloads, low-latency requirements and/or insufficient energy resources
    to support longer transmissions. In this paper, a wirelessly powered
    communication system is investigated where an energy harvesting transmitter,
    charged by one or more power beacons via wireless energy transfer, attempts to
    communicate with a receiver over a noisy channel. Under a save-then-transmit
    protocol, the system performance is characterized using metrics such as the
    energy supply probability at the transmitter, and the achievable rate at the
    receiver for the case of short packets. Leveraging the framework of
    finite-length information theory, tractable analytical expressions are derived
    for the considered metrics in terms of system parameters such as the harvest
    blocklength, the transmit blocklength, the harvested power and the transmit
    power. The analysis provides several useful design guidelines. Though using a
    small transmit power or a small transmit blocklength helps avoid energy
    outages, the consequently smaller signal-to-noise ratio or the fewer coding
    opportunities may cause an information outage. Scaling laws are derived to
    capture this inherent trade-off between the harvest and transmit blocklengths.
    Moreover, the asymptotically optimal transmit power is derived in closed-form.
    Numerical results reveal that power control is essential for improving the
    achievable rate of the system in the finite blocklength regime. The
    asymptotically optimal transmit power yields nearly optimal performance in the
    finite blocklength regime.

    A Mathematical Model for Fingerprinting-based Localization Algorithms

    Arash Behboodi, Filip Lemic, Adam Wolisz
    Subjects: Information Theory (cs.IT); Emerging Technologies (cs.ET)

    Despite popularity of Fingerprinting Localization Algorithms (FPS), general
    theoretical frameworks for their performance studies have rarely been discussed
    in the literature. In this work, after setting up an abstract model for the
    FPS, it is shown that fingerprinting-based localization problem can be cast as
    a Hypothesis Testing (HT) problem and therefore various results in the HT
    literature can be used to provide insights for the general FPS. This includes
    scaling limits of localization reliability in terms of measurement numbers and
    the precise characterization of a geometric error. The main quantity that
    encapsulates this information is shown to be the Kullback-Leibler (KL)
    divergence between probability distributions of a selected feature for
    fingerprinting at different locations. The KL divergence can be used as a
    central performance metric, indicating how well a localization algorithm can
    distinguish two points. The framework is instantiated for Received Signal
    Strength (RSS)-based algorithms, where the effect of various parameters on the
    performance of fingerprinting algorithms is discussed, including path loss and
    fading characteristics, number of measurements, number of anchors and their
    locations and placement of training points. Simulations and experimental
    results characterize numerically the findings of the theoretical framework and
    demonstrate its consistency with realistic localization scenarios.

    The Markov Memory for Generating Rare Events

    C. Aghamohammadi, J. P. Crutchfield
    Comments: 10 pages, 5 figures; this http URL
    Subjects: Statistical Mechanics (cond-mat.stat-mech); Computational Complexity (cs.CC); Information Theory (cs.IT); Chaotic Dynamics (nlin.CD)

    We classify the rare events of structured, memoryful stochastic processes and
    use this to analyze sequential and parallel generators for these events. Given
    a stochastic process, we introduce a method to construct a new process whose
    typical realizations are a given process’ rare events. This leads to an
    expression for the minimum memory required to generate rare events. We then
    show that the recently discovered classical-quantum ambiguity of simplicity
    also occurs when comparing the structure of process fluctuations.

    Discrimination power of a quantum detector

    Christoph Hirche, Masahito Hayashi, Emilio Bagan, John Calsamiglia
    Comments: 4+8 pages, 2 figures
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Mathematical Physics (math-ph)

    We investigate the ability of a quantum measurement device to discriminate
    two states or, generically, two hypothesis. In full generality, the measurement
    can be performed a number (n) of times, and arbitrary pre-processing of the
    states and post-processing of the obtained data is allowed. Even if the two
    hypothesis correspond to orthogonal states, perfect discrimination is not
    always possible. There is thus an intrinsic error associated to the measurement
    device, which we aim to quantify, that limits its discrimination power. We
    minimize various error probabilities (averaged or constrained) over all pairs
    of (n)-partite input states. These probabilities, or their exponential rates of
    decrease in the case of large (n), give measures of the discrimination power of
    the device. For the asymptotic rate of the averaged error probability, we
    obtain a Chernoff-type bound, dual to the standard Chernoff bound for which the
    state pair is fixed and the optimization is over all measurements. The key
    point in the derivation is that i.i.d. states become optimal in asymptotic
    settings. Minimum asymptotic rates are also obtained for constrained error
    probabilities, dual to Stein’s Lemma and Hoeffding’s bound. We further show
    that adaptive protocols where the state preparer gets feedback from the
    measurer do not improve the asymptotic rates. These rates thus quantify the
    ultimate discrimination power of a measurement device.




沪ICP备19023445号-2号
友情链接