
    arXiv Paper Daily: Tue, 6 Jun 2017

    我爱机器学习 (52ml.net), published 2017-06-06 00:00:00

    Neural and Evolutionary Computing

    Neuro-RAM Unit with Applications to Similarity Testing and Compression in Spiking Neural Networks

    Nancy Lynch, Cameron Musco, Merav Parter
    Subjects: Neural and Evolutionary Computing (cs.NE); Distributed, Parallel, and Cluster Computing (cs.DC); Neurons and Cognition (q-bio.NC)

    We study distributed algorithms implemented in a simplified but biologically
    plausible model for stochastic spiking neural networks. We focus on tradeoffs
    between computation time and network complexity, along with the role of
    randomness in efficient neural computation.

    It is widely accepted that neural computation is inherently stochastic. In
    recent work, we explored how this stochasticity could be leveraged to solve the
    'winner-take-all' leader election task. Here, we focus on using randomness in
    neural algorithms for similarity testing and compression. In the most basic
    setting, given two \(n\)-length patterns of firing neurons, we wish to
    distinguish if the patterns are equal or \(\epsilon\)-far from equal.

    Randomization allows us to solve this task with a very compact network, using
    \(O\left(\frac{\sqrt{n}\log n}{\epsilon}\right)\) auxiliary neurons, which is
    sublinear in the input size. At the heart of our solution is the design of a
    \(t\)-round neural random access memory, or indexing network, which we call a
    neuro-RAM. This module can be implemented with \(O(n/t)\) auxiliary neurons and
    is useful in many applications beyond similarity testing.

    Using a combination of Yao’s minimax principle and a VC dimension-based
    argument, we show that the tradeoff between runtime and network size in our
    neuro-RAM is nearly optimal. Our result has several implications — since our
    neuro-RAM construction can be implemented with deterministic threshold gates,
    it demonstrates that, in contrast to similarity testing, randomness does not
    provide significant computational advantages for this problem. It also
    establishes a separation between our networks, which spike with a sigmoidal
    probability function, and well-studied but less biologically plausible
    deterministic sigmoidal networks, whose gates output real number values, and
    which can implement a neuro-RAM much more efficiently.
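    As an aside for readers new to property testing, the randomized idea underlying the similarity-testing task can be illustrated outside the spiking-network model: sampling a handful of random positions already distinguishes equal patterns from \(\epsilon\)-far ones. The sketch below is a plain-Python illustration of that sampling argument, not the paper's neural construction; all names and constants are ours.

    ```python
    import numpy as np

    def random_similarity_test(x, y, epsilon, confidence=0.99, rng=None):
        """Distinguish x == y from 'epsilon-far' (differing in >= epsilon * n positions).

        Illustrative sketch only: samples O(log(1/delta)/epsilon) random indices and
        reports 'equal' iff all sampled positions agree. If the patterns are equal it
        never errs; if they are epsilon-far it errs with probability <= 1 - confidence.
        """
        rng = rng or np.random.default_rng()
        n = len(x)
        delta = 1.0 - confidence
        k = int(np.ceil(np.log(1.0 / delta) / epsilon))   # number of sampled positions
        idx = rng.integers(0, n, size=k)
        return bool(np.all(x[idx] == y[idx]))

    # Example: two binary firing patterns that differ in 10% of positions.
    rng = np.random.default_rng(0)
    n = 10_000
    x = rng.integers(0, 2, size=n)
    y = x.copy()
    flip = rng.choice(n, size=n // 10, replace=False)
    y[flip] ^= 1
    print(random_similarity_test(x, x, epsilon=0.1))  # True (equal patterns)
    print(random_similarity_test(x, y, epsilon=0.1))  # almost surely False
    ```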

    Neuroevolution on the Edge of Chaos

    Filip Matzner
    Comments: To appear in Proceedings of the Genetic and Evolutionary Computation Conference 2017 (GECCO ’17)
    Subjects: Neural and Evolutionary Computing (cs.NE)

    Echo state networks represent a special type of recurrent neural networks.
    Recent papers stated that the echo state networks maximize their computational
    performance on the transition between order and chaos, the so-called edge of
    chaos. This work confirms this statement in a comprehensive set of experiments.
    Furthermore, the echo state networks are compared to networks evolved via
    neuroevolution. The evolved networks outperform the echo state networks;
    however, the evolution consumes significant computational resources. It is
    demonstrated that echo state networks with local connections combine the best
    of both worlds: the simplicity of random echo state networks and the
    performance of evolved networks. Finally, it is shown that evolution tends to
    stay close to the ordered side of the edge of chaos.

    Submanifold Sparse Convolutional Networks

    Benjamin Graham, Laurens van der Maaten
    Comments: 10 pages
    Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)

    Convolutional networks are the de facto standard for analysing spatio-temporal
    data such as images, videos, 3D shapes, etc. Whilst some of this data is
    naturally dense (for instance, photos), many other data sources are inherently
    sparse. Examples include pen-strokes forming on a piece of paper, or (colored)
    3D point clouds that were obtained using a LiDAR scanner or RGB-D camera.
    Standard “dense” implementations of convolutional networks are very inefficient
    when applied on such sparse data. We introduce a sparse convolutional operation
    tailored to processing sparse data that differs from prior work on sparse
    convolutional networks in that it operates strictly on submanifolds, rather
    than “dilating” the observation with every layer in the network. Our empirical
    analysis of the resulting submanifold sparse convolutional networks shows that
    they perform on par with state-of-the-art methods whilst requiring
    substantially less computation.

    A Joint Model for Question Answering and Question Generation

    Tong Wang, Xingdi Yuan, Adam Trischler
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We propose a generative machine comprehension model that learns jointly to
    ask and answer questions based on documents. The proposed model uses a
    sequence-to-sequence framework that encodes the document and generates a
    question (answer) given an answer (question). Significant improvement in model
    performance is observed empirically on the SQuAD corpus, confirming our
    hypothesis that the model benefits from jointly learning to perform both tasks.
    We believe the joint model’s novelty offers a new perspective on machine
    comprehension beyond architectural engineering, and serves as a first step
    towards autonomous information seeking.

    NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps

    Alessandro Aimar, Hesham Mostafa, Enrico Calabrese, Antonio Rios-Navarro, Ricardo Tapiador-Morales, Iulia-Alexandra Lungu, Moritz B. Milde, Federico Corradi, Alejandro Linares-Barranco, Shih-Chii Liu, Tobi Delbruck
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Convolutional neural networks (CNNs) have become the dominant neural network
    architecture for solving many state-of-the-art (SOA) visual processing tasks.
    Even though Graphical Processing Units (GPUs) are most often used in training
    and deploying CNNs, their power consumption becomes a problem for real time
    mobile applications. We propose a flexible and efficient CNN accelerator
    architecture which can support the implementation of SOA CNNs in low-power and
    low-latency application scenarios. This architecture exploits the sparsity of
    neuron activations in CNNs to accelerate the computation and reduce memory
    requirements. The flexible architecture allows high utilization of available
    computing resources across a wide range of convolutional network kernel sizes;
    and numbers of input and output feature maps. We implemented the proposed
    architecture on an FPGA platform and present results showing how our
    implementation reduces external memory transfers and compute time in five
    different CNNs ranging from small ones up to the widely known large VGG16 and
    VGG19 CNNs. We show how, in RTL simulations in a 28 nm process with a clock
    frequency of 500 MHz, the NullHop core is able to reach over 450 GOp/s and an
    efficiency of 368%, maintaining over 98% utilization of the MAC units and
    achieving a power efficiency of over 3 TOp/s/W in a core area of 5.8 mm².
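    The saving that the accelerator exploits, skipping multiply-accumulates for zero activations, can be illustrated in software; the sketch below compares the MAC count of a zero-skipping dot product with the dense count. This is only an analogy of the hardware idea under an assumed ~70% activation sparsity, not a model of the NullHop core.

    ```python
    import numpy as np

    def zero_skipping_dot(activations, weights):
        """Accumulate only over non-zero activations, as a sparsity-aware
        accelerator would; returns (result, number_of_macs_performed)."""
        nz = np.flatnonzero(activations)
        result = activations[nz] @ weights[nz]
        return result, len(nz)

    rng = np.random.default_rng(1)
    a = rng.standard_normal(1024)
    a[rng.random(1024) < 0.7] = 0.0          # ~70% sparse, typical after ReLU
    w = rng.standard_normal(1024)

    res, macs = zero_skipping_dot(a, w)
    print(np.allclose(res, a @ w))              # True: same numerical result
    print(f"MACs: {macs} instead of {len(a)}")  # roughly a 70% reduction
    ```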

    Event Representations for Automated Story Generation with Deep Neural Nets

    Lara J. Martin, Prithviraj Ammanabrolu, William Hancock, Shruti Singh, Brent Harrison, Mark O. Riedl
    Comments: 8 pages, 1 figure
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Automated story generation is the problem of automatically selecting a
    sequence of events, actions, or words that can be told as a story. We seek to
    develop a system that can generate stories by learning everything it needs to
    know from textual story corpora. To date, recurrent neural networks that learn
    language models at character, word, or sentence levels have had little success
    generating coherent stories. We explore the question of event representations
    that provide a mid-level of abstraction between words and sentences in order to
    retain the semantic information of the original data while minimizing event
    sparsity. We present a technique for preprocessing textual story data into
    event sequences. We then present a technique for automated story generation
    whereby we decompose the problem into the generation of successive events
    (event2event) and the generation of natural language sentences from events
    (event2sentence). We give empirical results comparing different event
    representations and their effects on event successor generation and the
    translation of events to natural language.

    Compressing Deep Neural Network Structures for Sensing Systems with a Compressor-Critic Framework

    Shuochao Yao, Yiran Zhao, Aston Zhang, Lu Su, Tarek Abdelzaher
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Networking and Internet Architecture (cs.NI)

    Recent advances in deep learning motivate the use of deep neural networks in
    sensing applications, but their excessive resource needs on constrained
    embedded devices remain an important impediment. A recently explored solution
    space lies in compressing (approximating or simplifying) deep neural networks
    in some manner before use on the device. We propose a new compression solution,
    called DeepIoT, that makes two key contributions in that space. First, unlike
    current solutions geared for compressing specific types of neural networks,
    DeepIoT presents a unified approach that compresses all commonly used deep
    learning structures for sensing applications, including fully-connected,
    convolutional, and recurrent neural networks, as well as their combinations.
    Second, unlike solutions that either sparsify weight matrices or assume linear
    structure within weight matrices, DeepIoT compresses neural network structures
    into smaller dense matrices by finding the minimum number of non-redundant
    hidden elements, such as filters and dimensions required by each layer, while
    keeping the performance of sensing applications the same. Importantly, it does
    so using an approach that obtains a global view of parameter redundancies,
    which is shown to produce superior compression. We conduct experiments with
    five different sensing-related tasks on Intel Edison devices. DeepIoT
    outperforms all compared baseline algorithms with respect to execution time and
    energy consumption by a significant margin. It reduces the size of deep neural
    networks by 90% to 98.9%. It is thus able to shorten execution time by 71.4% to
    94.5%, and decrease energy consumption by 72.2% to 95.7%. These improvements
    are achieved without loss of accuracy. The results underscore the potential of
    DeepIoT for advancing the exploitation of deep neural networks on
    resource-constrained embedded devices.
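    To illustrate what compressing a network "into smaller dense matrices" means, the sketch below drops the least important hidden units of a fully connected layer and slices the adjacent weight matrices accordingly. The column-norm importance score is a stand-in of our own; DeepIoT instead learns which elements to keep with its compressor-critic framework.

    ```python
    import numpy as np

    def drop_hidden_units(W1, b1, W2, keep_ratio=0.3):
        """Keep only the most 'important' hidden units of h = relu(x @ W1 + b1),
        y = h @ W2, and slice both weight matrices to smaller *dense* matrices."""
        importance = np.linalg.norm(W1, axis=0) * np.linalg.norm(W2, axis=1)
        k = max(1, int(keep_ratio * W1.shape[1]))
        keep = np.argsort(importance)[-k:]              # indices of units to keep
        return W1[:, keep], b1[keep], W2[keep, :]

    rng = np.random.default_rng(2)
    W1 = rng.standard_normal((64, 256))
    b1 = rng.standard_normal(256)
    W2 = rng.standard_normal((256, 10))
    W1s, b1s, W2s = drop_hidden_units(W1, b1, W2, keep_ratio=0.3)
    print(W1.size + W2.size, "->", W1s.size + W2s.size)   # parameter count shrinks ~70%
    ```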

    MobiRNN: Efficient Recurrent Neural Network Execution on Mobile GPU

    Qingqing Cao, Niranjan Balasubramanian, Aruna Balasubramanian
    Comments: Published at 1st International Workshop on Embedded and Mobile Deep Learning colocated with MobiSys 2017
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    In this paper, we explore optimizations to run Recurrent Neural Network (RNN)
    models locally on mobile devices. RNN models are widely used for Natural
    Language Processing, Machine Translation, and other tasks. However, existing
    mobile applications that use RNN models do so on the cloud. To address privacy
    and efficiency concerns, we show how RNN models can be run locally on mobile
    devices. Existing work on porting deep learning models to mobile devices focuses
    on Convolutional Neural Networks (CNNs) and cannot be applied directly to RNN
    models. In response, we present MobiRNN, a mobile-specific optimization
    framework that implements GPU offloading specifically for mobile GPUs.
    Evaluations using an RNN model for activity recognition show that MobiRNN
    significantly decreases the latency of running RNN models on phones.


    Computer Vision and Pattern Recognition

    Visual Interaction Networks

    Nicholas Watters, Andrea Tacchetti, Theophane Weber, Razvan Pascanu, Peter Battaglia, Daniel Zoran
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    From just a glance, humans can make rich predictions about the future state
    of a wide range of physical systems. On the other hand, modern approaches from
    engineering, robotics, and graphics are often restricted to narrow domains and
    require direct measurements of the underlying states. We introduce the Visual
    Interaction Network, a general-purpose model for learning the dynamics of a
    physical system from raw visual observations. Our model consists of a
    perceptual front-end based on convolutional neural networks and a dynamics
    predictor based on interaction networks. Through joint training, the perceptual
    front-end learns to parse a dynamic visual scene into a set of factored latent
    object representations. The dynamics predictor learns to roll these states
    forward in time by computing their interactions and dynamics, producing a
    predicted physical trajectory of arbitrary length. We found that from just six
    input video frames the Visual Interaction Network can generate accurate future
    trajectories of hundreds of time steps on a wide range of physical systems. Our
    model can also be applied to scenes with invisible objects, inferring their
    future states from their effects on the visible objects, and can implicitly
    infer the unknown mass of objects. Our results demonstrate that the perceptual
    module and the object-based dynamics predictor module can induce factored
    latent representations that support accurate dynamical predictions. This work
    opens new opportunities for model-based decision-making and planning from raw
    sensory observations in complex physical environments.

    NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps

    Alessandro Aimar, Hesham Mostafa, Enrico Calabrese, Antonio Rios-Navarro, Ricardo Tapiador-Morales, Iulia-Alexandra Lungu, Moritz B. Milde, Federico Corradi, Alejandro Linares-Barranco, Shih-Chii Liu, Tobi Delbruck
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Convolutional neural networks (CNNs) have become the dominant neural network
    architecture for solving many state-of-the-art (SOA) visual processing tasks.
    Even though Graphical Processing Units (GPUs) are most often used in training
    and deploying CNNs, their power consumption becomes a problem for real time
    mobile applications. We propose a flexible and efficient CNN accelerator
    architecture which can support the implementation of SOA CNNs in low-power and
    low-latency application scenarios. This architecture exploits the sparsity of
    neuron activations in CNNs to accelerate the computation and reduce memory
    requirements. The flexible architecture allows high utilization of available
    computing resources across a wide range of convolutional network kernel sizes;
    and numbers of input and output feature maps. We implemented the proposed
    architecture on an FPGA platform and present results showing how our
    implementation reduces external memory transfers and compute time in five
    different CNNs ranging from small ones up to the widely known large VGG16 and
    VGG19 CNNs. We show how, in RTL simulations in a 28 nm process with a clock
    frequency of 500 MHz, the NullHop core is able to reach over 450 GOp/s and an
    efficiency of 368%, maintaining over 98% utilization of the MAC units and
    achieving a power efficiency of over 3 TOp/s/W in a core area of 5.8 mm².

    ToPs: Ensemble Learning with Trees of Predictors

    Jinsung Yoon, William R. Zame, Mihaela van der Schaar
    Comments: 11 pages, 2 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a new approach to ensemble learning. Our approach constructs a
    tree of subsets of the feature space and associates a predictor (predictive
    model) – determined by training one of a given family of base learners on an
    endogenously determined training set – to each node of the tree; we call the
    resulting object a tree of predictors. The (locally) optimal tree of predictors
    is derived recursively; each step involves jointly optimizing the split of the
    terminal nodes of the previous tree and the choice of learner and training set
    (hence predictor) for each set in the split. The feature vector of a new
    instance determines a unique path through the optimal tree of predictors; the
    final prediction aggregates the predictions of the predictors along this path.
    We derive loss bounds for the final predictor in terms of the Rademacher
    complexity of the base learners. We report the results of a number of
    experiments on a variety of datasets, showing that our approach provides
    statistically significant improvements over state-of-the-art machine learning
    algorithms, including various ensemble learning methods. Our approach works
    because it allows us to endogenously create more complex learners – when needed
    – and endogenously match both the learner and the training set to the
    characteristics of the dataset while still avoiding over-fitting.

    Learning Structured Semantic Embeddings for Visual Recognition

    Dong Li, Hsin-Ying Lee, Jia-Bin Huang, Shengjin Wang, Ming-Hsuan Yang
    Comments: 9 pages, 6 figures, 5 tables, conference
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Numerous embedding models have been recently explored to incorporate semantic
    knowledge into visual recognition. Existing methods typically focus on
    minimizing the distance between the corresponding images and texts in the
    embedding space but do not explicitly optimize the underlying structure. Our
    key observation is that modeling the pairwise image-image relationship improves
    the discrimination ability of the embedding model. In this paper, we propose
    the structured discriminative and difference constraints to learn
    visual-semantic embeddings. First, we exploit the discriminative constraints to
    capture the intra- and inter-class relationships of image embeddings. The
    discriminative constraints encourage separability for image instances of
    different classes. Second, we align the difference vector between a pair of
    image embeddings with that of the corresponding word embeddings. The difference
    constraints help regularize image embeddings to preserve the semantic
    relationships among word embeddings. Extensive evaluations demonstrate the
    effectiveness of the proposed structured embeddings for single-label
    classification, multi-label classification, and zero-shot recognition.

    Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning

    Jingkuan Song, Zhao Guo, Lianli Gao, Wu Liu, Dongxiang Zhang, Heng Tao Shen
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recent progress has been made in using attention based encoder-decoder
    framework for video captioning. However, most existing decoders apply the
    attention mechanism to every generated word, including both visual words (e.g.,
    “gun” and “shooting”) and non-visual words (e.g., “the”, “a”). However, these
    non-visual words can be easily predicted using a natural language model without
    considering visual signals or attention. Imposing an attention mechanism on
    non-visual words could mislead and decrease the overall performance of video
    captioning. To address this issue, we propose a hierarchical LSTM with adjusted
    temporal attention (hLSTMat) approach for video captioning. Specifically, the
    proposed framework utilizes the temporal attention for selecting specific
    frames to predict the related words, while the adjusted temporal attention is
    for deciding whether to depend on the visual information or the language
    context information. Also, a hierarchical LSTM is designed to simultaneously
    consider both low-level visual information and high-level language context
    information to support the video caption generation. To demonstrate the
    effectiveness of our proposed framework, we test our method on two prevalent
    datasets: MSVD and MSR-VTT, and experimental results show that our approach
    outperforms the state-of-the-art methods on both datasets.
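    The "adjusted" attention described above can be read as a learned gate that decides, per generated word, how much to rely on attended visual features versus the language context. The numpy sketch below shows only that blending step; all shapes and parameter names are illustrative assumptions, not the paper's exact formulation.

    ```python
    import numpy as np

    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def adjusted_attention_step(frame_feats, h_lang, W_att, w_gate):
        """Blend attended visual context with the language hidden state.

        frame_feats : (T, d) per-frame features
        h_lang      : (d,)   current language-LSTM hidden state
        """
        scores = frame_feats @ W_att @ h_lang            # (T,) temporal attention scores
        alpha = softmax(scores)
        visual_ctx = alpha @ frame_feats                 # (d,) attended visual context
        beta = 1.0 / (1.0 + np.exp(-(w_gate @ h_lang)))  # gate in [0, 1]
        return beta * visual_ctx + (1.0 - beta) * h_lang # context used to predict the word

    T, d = 20, 8
    ctx = adjusted_attention_step(np.random.randn(T, d), np.random.randn(d),
                                  np.random.randn(d, d), np.random.randn(d))
    print(ctx.shape)   # (8,)
    ```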

    A Kind of Affine Weighted Moment Invariants

    Hanlin Mo, You Hao, Shirui Li, Hua Li
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A new kind of geometric invariant, called the affine weighted moment invariant
    (AWMI), is proposed in this paper. By combining local affine differential
    invariants with a global integral framework, AWMIs can extract image features
    more effectively, increase the number of low-order invariants, and reduce the
    computational cost. The experimental results show that AWMIs have good
    stability and distinguishability and achieve better results in image retrieval
    than traditional affine geometric moment invariants (AGMIs). An extension to 3D
    is straightforward.

    Binary Patterns Encoded Convolutional Neural Networks for Texture Recognition and Remote Sensing Scene Classification

    Rao Muhammad Anwer, Fahad Shahbaz Khan, Joost van de Weijer, Matthieu Molinier, Jorma Laaksonen
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Designing discriminative powerful texture features robust to realistic
    imaging conditions is a challenging computer vision problem with many
    applications, including material recognition and analysis of satellite or
    aerial imagery. In the past, most texture description approaches were based on
    dense orderless statistical distribution of local features. However, most
    recent approaches to texture recognition and remote sensing scene
    classification are based on Convolutional Neural Networks (CNNs). The de facto
    practice when learning these CNN models is to use RGB patches as input with
    training performed on large amounts of labeled data (ImageNet). In this paper,
    we show that Binary Patterns encoded CNN models, codenamed TEX-Nets, trained
    using mapped coded images with explicit texture information provide
    complementary information to the standard RGB deep models. Additionally, two
    deep architectures, namely early and late fusion, are investigated to combine
    the texture and color information. To the best of our knowledge, we are the
    first to investigate Binary Patterns encoded CNNs and different deep network
    fusion architectures for texture recognition and remote sensing scene
    classification. We perform comprehensive experiments on four texture
    recognition datasets and four remote sensing scene classification benchmarks:
    UC-Merced with 21 scene categories, WHU-RS19 with 19 scene classes, RSSCN7 with
    7 categories and the recently introduced large scale aerial image dataset (AID)
    with 30 aerial scene types. We demonstrate that TEX-Nets provide complementary
    information to standard RGB deep model of the same network architecture. Our
    late fusion TEX-Net architecture always improves the overall performance
    compared to the standard RGB network on both recognition problems. Our final
    combination outperforms the state-of-the-art without employing fine-tuning or
    ensemble of RGB network architectures.

    Deep Frame Interpolation

    Vladislav Samsonov
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This work presents a supervised learning based approach to the computer
    vision problem of frame interpolation. The presented technique could also be
    used in cartoon animation, since drawing each individual frame consumes a
    noticeable amount of time. Most existing solutions to this problem use
    unsupervised methods and focus only on real-life videos that already have a high
    frame rate. However, experiments show that such methods do not work as well when
    the frame rate becomes low and object displacements between frames become
    large. This is because interpolating large-displacement motion requires
    knowledge of the motion structure, so simple techniques such as frame averaging
    start to fail. In this work a deep convolutional neural network is used to solve
    the frame interpolation problem. In addition, it is shown that incorporating
    prior information such as optical flow improves the interpolation quality
    significantly.

    Segmentation of Intracranial Arterial Calcification with Deeply Supervised Residual Dropout Networks

    Gerda Bortsova, Gijs van Tulder, Florian Dubost, Tingying Peng, Nassir Navab, Aad van der Lugt, Daniel Bos, Marleen de Bruijne
    Comments: Accepted for MICCAI 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Intracranial carotid artery calcification (ICAC) is a major risk factor for
    stroke, and might contribute to dementia and cognitive decline. Reliance on
    time-consuming manual annotation of ICAC hampers much demanded further research
    into the relationship between ICAC and neurological diseases. Automation of
    ICAC segmentation is therefore highly desirable, but difficult due to the
    proximity of the lesions to bony structures with a similar attenuation
    coefficient. In this paper, we propose a method for automatic segmentation of
    ICAC; the first to our knowledge. Our method is based on a 3D fully
    convolutional neural network that we extend with two regularization techniques.
    Firstly, we use deep supervision (hidden layers supervision) to encourage
    discriminative features in the hidden layers. Secondly, we augment the network
    with skip connections, as in the recently developed ResNet, and dropout layers,
    inserted in a way that skip connections circumvent them. We investigate the
    effect of skip connections and dropout. In addition, we propose a simple
    problem-specific modification of the network objective function that restricts
    the focus to the most important image regions and simplifies the optimization.
    We train and validate our model using 882 CT scans and test on 1,000. Our
    regularization techniques and objective improve the average Dice score by 7.1%,
    yielding an average Dice of 76.2% and 97.7% correlation between predicted ICAC
    volumes and manual annotations.

    A Random-Fern based Feature Approach for Image Matching

    Yong Khoo, Seo-hyeon Keun
    Comments: Computer Imaging, 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Image or object recognition is an important task in computer vision. With the
    high-speed processing power of modern platforms and the ubiquity of mobile
    phones, millions of photos are uploaded to the internet every minute, so it is
    critical to establish a generic framework for fast and accurate image
    processing for automatic recognition and information retrieval. In this paper,
    we propose an efficient image recognition and matching method that is derived
    from the Naive Bayesian classification method to construct a probabilistic
    model. Our method supports real-time performance and has a very high ability
    to distinguish similar images with fine details. Experiments are conducted
    with intensive comparison against state-of-the-art image matching methods,
    such as Ferns recognition and SIFT recognition. The results demonstrate
    satisfactory performance.

    Face R-CNN

    Hao Wang, Zhifeng Li, Xing Ji, Yitong Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Faster R-CNN is one of the most representative and successful methods for
    object detection, and has become increasingly popular in various object
    detection applications. In this report, we propose a robust deep face
    detection approach based on Faster R-CNN. In our approach, we exploit several
    new techniques, including a new multi-task loss function design, online hard
    example mining, and a multi-scale training strategy, to improve Faster R-CNN in
    multiple aspects. The proposed approach is well suited for face detection, so
    we call it Face R-CNN. Extensive experiments are conducted on the two most
    popular and challenging face detection benchmarks, FDDB and WIDER FACE, to
    demonstrate the superiority of the proposed approach over the state of the art.

    Brain Intelligence: Go Beyond Artificial Intelligence

    Huimin Lu, Yujie Li, Min Chen, Hyoungseop Kim, Seiichi Serikawa
    Comments: 15 pages, Mobile Networks and Applications, 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Artificial intelligence (AI) is an important technology that supports daily
    social life and economic activities. It contributes greatly to the sustainable
    growth of Japan’s economy and solves various social problems. In recent years,
    AI has attracted attention as a key driver of growth in developed regions such as
    Europe and the United States and in developing countries such as China and India.
    The attention has been focused mainly on developing new artificial intelligence
    information communication technology (ICT) and robot technology (RT). Although
    recently developed AI technology certainly excels in extracting certain
    patterns, there are many limitations. Most ICT models are overly dependent on
    big data, lack a self-idea function, and are complicated. In this paper, rather
    than merely developing next-generation artificial intelligence technology, we
    aim to develop a new concept of general-purpose intelligence cognition
    technology called Beyond AI. Specifically, we plan to develop an intelligent
    learning model called Brain Intelligence (BI) that generates new ideas about
    events without having experienced them by using artificial life with an imagine
    function. We will also conduct demonstrations of the developed BI intelligence
    learning model on automatic driving, precision medical care, and industrial
    robots.

    Personalized Age Progression with Bi-level Aging Dictionary Learning

    Xiangbo Shu, Jinhui Tang, Zechao Li, Hanjiang Lai, Liyan Zhang, Shuicheng Yan
    Comments: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Age progression is defined as aesthetically re-rendering the aging face at
    any future age for an individual face. In this work, we aim to automatically
    render aging faces in a personalized way. Basically, for each age group, we
    learn an aging dictionary to reveal its aging characteristics (e.g., wrinkles),
    where the dictionary bases corresponding to the same index yet from two
    neighboring aging dictionaries form a particular aging pattern across these two
    age groups, and a linear combination of all these patterns expresses a
    particular personalized aging process. Moreover, two factors are taken into
    consideration in the dictionary learning process. First, beyond the aging
    dictionaries, each person may have extra personalized facial characteristics,
    e.g., moles, which are invariant in the aging process. Second, it is challenging
    or even impossible to collect faces of all age groups for a particular person,
    yet much easier and more practical to get face pairs from neighboring age
    groups. To this end, we propose a novel Bi-level Dictionary Learning based
    Personalized Age Progression (BDL-PAP) method. Here, bi-level dictionary
    learning is formulated to learn the aging dictionaries based on face pairs from
    neighboring age groups. Extensive experiments well demonstrate the advantages
    of the proposed BDL-PAP over other state-of-the-art methods in terms of personalized
    age progression, as well as the performance gain for cross-age face
    verification by synthesizing aging faces.

    Image Compression Based on Compressive Sensing: End-to-End Comparison with JPEG

    Xin Yuan, Raziel Haimi-Cohen
    Comments: 13 pages, 10 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present an end-to-end image compression system based on compressive
    sensing. The presented system integrates the conventional scheme of compressive
    sampling and reconstruction with quantization and entropy coding. The
    compression performance, in terms of decoded image quality versus data rate, is
    shown to be comparable with JPEG and significantly better at the low rate
    range. We study the parameters that influence the system performance, including
    (i) the choice of sensing matrix, (ii) the trade-off between quantization and
    compression ratio, and (iii) the reconstruction algorithms. We propose an
    effective method to jointly control the quantization step and compression ratio
    in order to achieve near optimal quality at any given bit rate. Furthermore,
    our proposed image compression system can be directly used in the compressive
    sensing camera, e.g. the single pixel camera, to construct a hardware
    compressive sampling system.
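    A toy pipeline in the spirit of the system described above: random Gaussian measurements, uniform quantization of the measurements (the part that would be entropy-coded), and a deliberately simple minimum-norm decoder standing in for a proper compressive-sensing reconstruction. All matrix sizes and the quantization step are illustrative assumptions.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    n, m = 256, 64                       # signal length, number of measurements (4:1)
    x = np.zeros(n)
    x[rng.choice(n, 8, replace=False)] = rng.standard_normal(8)   # sparse test signal

    Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # sensing matrix
    y = Phi @ x                                      # compressive measurements

    step = 0.05                                      # uniform quantization step
    y_q = np.round(y / step).astype(np.int32)        # what would be entropy-coded
    y_hat = y_q * step                               # dequantized at the decoder

    # Simplistic decoder: minimum-norm solution; a real system would use a
    # sparse-recovery algorithm (e.g. basis pursuit) or a learned reconstruction.
    x_hat = Phi.T @ np.linalg.solve(Phi @ Phi.T, y_hat)
    snr = 10 * np.log10(np.sum(x**2) / np.sum((x - x_hat)**2))
    print(f"measurement rate {m/n:.2f}, reconstruction SNR {snr:.1f} dB")
    ```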

    Order embeddings and character-level convolutions for multimodal alignment

    Jônatas Wehrmann, Anderson Mattjie, Rodrigo C. Barros
    Comments: 7 pages, 5 figures, submitted to Pattern Recognition Letters
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    With the rapid advances in the area of deep neural networks, several
    challenging image-based tasks have recently been approached by researchers in
    pattern recognition and computer vision. In this paper, we address one of these
    tasks, which is to match image content with natural language descriptions,
    sometimes referred to as multimodal content retrieval. Such a task is particularly
    challenging considering that we must find a semantic correspondence between
    captions and the respective image, a challenge for both computer vision and
    natural language processing. To this end, we propose a novel multimodal
    approach based solely on convolutional neural networks for aligning images with
    their captions by directly convolving raw characters. Our proposed
    character-based textual embeddings allow the replacement of both
    word-embeddings and recurrent neural networks for text understanding, saving
    processing time and requiring fewer learnable parameters. Our method is based
    on the idea of projecting both visual and textual information into a common
    embedding space. For training such embeddings we optimize a contrastive loss
    function that is computed to minimize order-violations between images and their
    respective descriptions. We achieve state-of-the-art performance in the largest
    and most well-known image-text alignment dataset, namely Microsoft COCO, with a
    method that is conceptually much simpler and that possesses considerably fewer
    parameters than current approaches.
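    The order-violation idea can be written compactly: one embedding should dominate its counterpart coordinate-wise (which side dominates is a modeling convention), violations on matching pairs are penalized, and mismatched pairs are pushed above a margin. The numpy sketch below shows such a loss; shapes, the margin, and the direction of the ordering are our assumptions, not the paper's exact objective.

    ```python
    import numpy as np

    def order_violation(image_emb, caption_emb):
        """E(image, caption) = || max(0, caption - image) ||^2 per pair; zero when
        the caption embedding is coordinate-wise dominated by the image embedding."""
        return np.sum(np.maximum(0.0, caption_emb - image_emb) ** 2, axis=-1)

    def contrastive_order_loss(img, cap, img_neg, cap_neg, margin=0.05):
        """Push matching pairs to low violation, mismatched pairs above a margin."""
        pos = order_violation(img, cap)
        neg_c = np.maximum(0.0, margin - order_violation(img, cap_neg))   # wrong caption
        neg_i = np.maximum(0.0, margin - order_violation(img_neg, cap))   # wrong image
        return np.mean(pos + neg_c + neg_i)

    rng = np.random.default_rng(4)
    B, d = 32, 16
    emb = lambda: np.abs(rng.standard_normal((B, d)))     # non-negative embeddings
    print(contrastive_order_loss(emb(), emb(), emb(), emb()))
    ```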

    Graph-Cut RANSAC

    Daniel Barath, Jiri Matas
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A novel method, called Graph Cut RANSAC, GC-RANSAC in short, is presented. To
    separate inliers and outliers, it runs the graph cut algorithm in the local
    optimization (LO) step which is applied after a so-far-the-best model
    is found. The proposed LO step is conceptually simple, easy to implement,
    globally optimal and efficient.

    Experiments show that GC-RANSAC outperforms LO-RANSAC and its
    state-of-the-art variants in terms of both accuracy and the required number of
    iterations for line, homography and fundamental matrix estimation on standard
    public datasets. GC-RANSAC is very efficient: its processing time for hundreds
    of input points is approximately 1–10 milliseconds, depending on the
    inlier-outlier ratio.

    See, Hear, and Read: Deep Aligned Representations

    Yusuf Aytar, Carl Vondrick, Antonio Torralba
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We capitalize on large amounts of readily-available, synchronous data to
    learn a deep discriminative representation shared across three major natural
    modalities: vision, sound and language. By leveraging over a year of sound from
    video and millions of sentences paired with images, we jointly train a deep
    convolutional network for aligned representation learning. Our experiments
    suggest that this representation is useful for several tasks, such as
    cross-modal retrieval or transferring classifiers between modalities. Moreover,
    although our network is only trained with image+text and image+sound pairs, it
    can transfer between text and sound as well, a transfer the network never
    observed during training. Visualizations of our representation reveal many
    hidden units which automatically emerge to detect concepts, independent of the
    modality.

    Concurrence-Aware Long Short-Term Sub-Memories for Person-Person Action Recognition

    Xiangbo Shu, Jinhui Tang, Guo-Jun Qi, Yan Song, Zechao Li, Liyan Zhang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recently, Long Short-Term Memory (LSTM) has become a popular choice to model
    individual dynamics for single-person action recognition due to its ability to
    model temporal information over various ranges of dynamic context.
    However, existing RNN models only focus on capturing the temporal dynamics of
    the person-person interactions by naively combining the activity dynamics of
    individuals or modeling them as a whole. This neglects the inter-related
    dynamics of how person-person interactions change over time. To this end, we
    propose a novel Concurrence-Aware Long Short-Term Sub-Memories (Co-LSTSM) to
    model the long-term inter-related dynamics between two interacting people on
    the bounding boxes covering people. Specifically, for each frame, two
    sub-memory units store individual motion information, while a concurrent LSTM
    unit selectively integrates and stores inter-related motion information between
    interacting people from these two sub-memory units via a new co-memory cell.
    Experimental results on the BIT and UT datasets show the superiority of
    Co-LSTSM compared with the state-of-the-art methods.

    Deep-Learning Convolutional Neural Networks for scattered shrub detection with Google Earth Imagery

    Emilio Guirado, Siham Tabik, Domingo Alcaraz-Segura, Javier Cabello, Francisco Herrera
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    There is a growing demand for accurate high-resolution land cover maps in
    many fields, e.g., in land-use planning and biodiversity conservation.
    Developing such maps has been performed using Object-Based Image Analysis
    (OBIA) methods, which usually reach good accuracies but require substantial human
    supervision, and the best configuration for one image can hardly be extrapolated
    to a different image. Recently, deep learning Convolutional Neural Networks
    (CNNs) have shown outstanding results in object recognition in the field of
    computer vision. However, they have not been fully explored yet in land cover
    mapping for detecting species of high biodiversity conservation interest. This
    paper analyzes the potential of CNNs-based methods for plant species detection
    using free high-resolution Google Earth™ images and provides an objective
    comparison with state-of-the-art OBIA methods. We consider as a case study
    the detection of Ziziphus lotus shrubs, which are protected as a priority
    habitat under the European Union Habitats Directive. According to our results,
    compared to OBIA-based methods, the proposed CNN-based detection model, in
    combination with data-augmentation, transfer learning and pre-processing,
    achieves higher performance with less human intervention and the knowledge it
    acquires in the first image can be transferred to other images, which makes the
    detection process very fast. The provided methodology can be systematically
    reproduced for other species detection.

    Learning by Association – A versatile semi-supervised training method for neural networks

    Philip Häusser, Alexander Mordvintsev, Daniel Cremers
    Comments: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    In many real-world scenarios, labeled data for a specific machine learning
    task is costly to obtain. Semi-supervised training methods make use of
    abundantly available unlabeled data and a smaller number of labeled examples.
    We propose a new framework for semi-supervised training of deep neural networks
    inspired by learning in humans. “Associations” are made from embeddings of
    labeled samples to those of unlabeled ones and back. The optimization schedule
    encourages correct association cycles that end up at the same class from which
    the association was started and penalizes wrong associations ending at a
    different class. The implementation is easy to use and can be added to any
    existing end-to-end training setup. We demonstrate the capabilities of learning
    by association on several data sets and show that it can improve performance on
    classification tasks tremendously by making use of additionally available
    unlabeled data. In particular, in cases with little labeled data, our training
    scheme outperforms the current state of the art on SVHN.
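    The association cycle reduces to a few matrix operations: a soft walk from labeled embeddings to unlabeled ones and back defines round-trip probabilities, and the loss rewards cycles that return to an example of the starting class. The sketch below is a simplified illustration with our own shapes and names; the published method includes additional regularization not shown here.

    ```python
    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def association_cycle_loss(emb_l, labels, emb_u):
        """Round-trip labeled -> unlabeled -> labeled; penalize cycles that end in a
        different class than they started from (cross-entropy to the uniform
        distribution over same-class labeled examples)."""
        sim = emb_l @ emb_u.T                  # (L, U) similarities
        p_lu = softmax(sim, axis=1)            # labeled -> unlabeled transition
        p_ul = softmax(sim.T, axis=1)          # unlabeled -> labeled transition
        p_cycle = p_lu @ p_ul                  # (L, L) round-trip probabilities
        same = (labels[:, None] == labels[None, :]).astype(float)
        target = same / same.sum(axis=1, keepdims=True)
        return -np.mean(np.sum(target * np.log(p_cycle + 1e-8), axis=1))

    rng = np.random.default_rng(5)
    emb_l, emb_u = rng.standard_normal((10, 32)), rng.standard_normal((40, 32))
    labels = rng.integers(0, 3, size=10)
    print(association_cycle_loss(emb_l, labels, emb_u))
    ```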

    Heterogeneous Face Attribute Estimation: A Deep Multi-Task Learning Approach

    Hu Han, Anil K. Jain, Shiguang Shan, Xilin Chen
    Comments: Technical report V1 on Mar 1, 2017, 14 pages, 10 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Face attribute estimation has many potential applications in video
    surveillance, face retrieval, and social media. While a number of methods have
    been proposed for face attribute estimation, most of them did not explicitly
    consider the attribute correlation and heterogeneity (e.g., ordinal vs. nominal
    attributes) during feature representation learning. In this paper, we present a
    Deep Multi-Task Learning (DMTL) approach to jointly estimate multiple
    heterogeneous attributes from a single face image. In DMTL, we tackle attribute
    correlation and heterogeneity with convolutional neural networks (CNNs)
    consisting of shared feature learning for all the attributes, and
    category-specific feature learning for heterogeneous attributes. We also
    introduce an unconstrained face database (LFW+), an extension of public-domain
    LFW, with heterogeneous demographic attributes (age, gender, and race) obtained
    via crowdsourcing. Experimental results on benchmarks with multiple
    heterogeneous attributes (LFW+ and MORPH II) show that the proposed approach
    has superior performance compared to the state of the art. Finally, evaluations on
    public-domain face databases with multiple binary attributes (CelebA and LFWA)
    and a single attribute (LAP) show that the proposed approach has excellent
    generalization ability.

    Learning Person Trajectory Representations for Team Activity Analysis

    Nazanin Mehrasa, Yatao Zhong, Frederick Tung, Luke Bornn, Greg Mori
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Activity analysis in which multiple people interact across a large space is
    challenging due to the interplay of individual actions and collective group
    dynamics. We propose an end-to-end approach for learning person trajectory
    representations for group activity analysis. The learned representations encode
    rich spatio-temporal dependencies and capture useful motion patterns for
    recognizing individual events, as well as characteristic group dynamics that
    can be used to identify groups from their trajectories alone. We develop our
    deep learning approach in the context of team sports, which provide
    well-defined sets of events (e.g. pass, shot) and groups of people (teams).
    Analysis of events and team formations using NHL hockey and NBA basketball
    datasets demonstrates the generality of our approach.

    IDK Cascades: Fast Deep Learning by Learning not to Overthink

    Xin Wang, Yujia Luo, Daniel Crankshaw, Alexey Tumanov, Joseph E. Gonzalez
    Comments: 11 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Advances in deep learning have led to substantial increases in prediction
    accuracy as well as the cost of rendering predictions. We conjecture that for a
    majority of real-world inputs, the recent advances in deep learning have
    created models that effectively “over-think” on simple inputs. In this paper we
    revisit the classic idea of prediction cascades to reduce prediction costs. We
    introduce the “I Don’t Know” (IDK) prediction cascades framework, a general
    framework for constructing prediction cascades for arbitrary multi-class
    prediction tasks. We propose two baseline methods for constructing cascades as
    well as a new objective within this framework, and evaluate these techniques on
    a range of benchmark and real-world datasets to demonstrate that the prediction
    cascades can achieve 1.7-10.5x speedups in image classification tasks while
    maintaining accuracy comparable to state-of-the-art models. When combined with
    human experts, prediction cascades can achieve nearly perfect accuracy (within
    5%) while requiring human intervention on less than 30% of the queries.
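    The cascade itself reads directly as code: a cheap model answers when its confidence clears a threshold, otherwise it effectively says "I don't know" and defers to the next, more expensive stage (or, at the end, to a human). The confidence-threshold gate below is just one possible policy; the stand-in models and thresholds are illustrative.

    ```python
    import numpy as np

    def idk_cascade(x, stages, thresholds):
        """stages: list of callables returning class-probability vectors, cheapest first.
        thresholds: confidence required to stop at each stage (same length as stages)."""
        for model, tau in zip(stages, thresholds):
            probs = model(x)
            if probs.max() >= tau:                 # confident enough: answer here
                return int(np.argmax(probs)), model.__name__
        return -1, "IDK"                           # defer (e.g., to a human expert)

    def small_model(x):   # stand-in for a cheap classifier
        return np.array([0.55, 0.45])

    def large_model(x):   # stand-in for an expensive classifier
        return np.array([0.05, 0.95])

    pred, stage = idk_cascade(None, [small_model, large_model], thresholds=[0.9, 0.7])
    print(pred, stage)    # 1 large_model: the cheap stage was not confident enough
    ```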

    Neural Network-Based Automatic Liver Tumor Segmentation With Random Forest-Based Candidate Filtering

    Grzegorz Chlebus, Hans Meine, Jan Hendrik Moltz, Andrea Schenk
    Comments: Submitted to the ISBI 2017 LiTS Challenge
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a fully automatic method employing convolutional neural networks
    based on the 2D U-net architecture and random forest classifier to solve the
    automatic liver lesion segmentation problem of the ISBI 2017 Liver Tumor
    Segmentation Challenge (LiTS). In order to constrain the ROI in which the
    tumors could be located, a liver segmentation is performed first. For the organ
    segmentation, an ensemble of convolutional networks is trained to segment a
    liver using a set of 179 liver CT datasets from liver surgery planning. Inside
    of the liver ROI a neural network, trained using 127 challenge training
    datasets, identifies tumor candidates, which are subsequently filtered with a
    random forest classifier yielding the final tumor segmentation. The evaluation
    on the 70 challenge test cases resulted in a mean Dice coefficient of 0.65,
    ranking our method in second place.

    Multi-Class Model Fitting by Energy Minimization and Mode-Seeking

    Daniel Barath, Jiri Matas
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a novel method, called Multi-X, for general multi-class
    multi-instance model fitting – the problem of interpreting input data as a
    mixture of noisy observations originating from multiple instances of multiple
    types. The proposed approach combines global energy optimization and
    mode-seeking in the parameter domain. Robustness is achieved by using an
    outlier class. Key optimization parameters like the outlier threshold are set
    automatically within the algorithm. Considering that a group of outliers may
    form spatially coherent structures in the data, we propose a
    cross-validation-based technique removing statistically insignificant
    instances.

    Multi-X significantly outperforms the state of the art on the standard
    AdelaideRMF (multiple plane segmentation, multiple rigid motion detection) and
    Hopkins datasets (motion segmentation) and in experiments on 3D LIDAR data
    (simultaneous plane and cylinder fitting) and on 2D edge interpretation (circle
    and line fitting). Multi-X runs in time approximately linear in the number of
    data points at around 0.1 second per 100 points, an order of magnitude faster
    than available implementations of commonly used methods.

    One-Sided Unsupervised Domain Mapping

    Sagie Benaim, Lior Wolf
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In unsupervised domain mapping, the learner is given two unmatched datasets
    \(A\) and \(B\). The goal is to learn a mapping \(G_{AB}\) that translates a sample
    in \(A\) to the analogous sample in \(B\). Recent approaches have shown that when
    simultaneously learning both \(G_{AB}\) and the inverse mapping \(G_{BA}\),
    convincing mappings are obtained. In this work, we present a method of learning
    \(G_{AB}\) without learning \(G_{BA}\). This is done by learning a mapping that
    maintains the distance between a pair of samples. Moreover, good mappings are
    obtained even by maintaining the distance between different parts of the same
    sample before and after mapping. We present experimental results showing that the
    new method not only allows for one-sided mapping learning, but also leads to
    better numerical results than the existing circularity-based constraint.
    Our entire code is made publicly available at
    this https URL .
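    The distance-preserving constraint that replaces the cycle constraint can be sketched as: pairwise distances within a batch from A should match the pairwise distances of its mapped images in B. The sketch below omits the per-domain normalization and the adversarial term a full system would use; the mapping used in the example is a stand-in.

    ```python
    import numpy as np

    def pairwise_dists(X):
        """Euclidean distance matrix for a batch of flattened samples."""
        sq = np.sum(X ** 2, axis=1)
        return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0))

    def distance_preservation_loss(batch_a, mapped_b):
        """L1 mismatch between the distance structure of a batch in A and of its
        mapping into B; minimizing this encourages the mapping to preserve distances."""
        d_a, d_b = pairwise_dists(batch_a), pairwise_dists(mapped_b)
        return np.mean(np.abs(d_a - d_b))

    rng = np.random.default_rng(6)
    a = rng.standard_normal((8, 64))            # batch of samples from domain A
    fake_G_AB = lambda x: 0.5 * x + 0.1         # stand-in for the learned mapping
    print(distance_preservation_loss(a, fake_G_AB(a)))
    ```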

    A watershed-based algorithm to segment and classify cells in fluorescence microscopy images

    Lena R. Bartell, Lawrence J. Bonassar, Itai Cohen
    Comments: 8 pages, 5 figures, 2 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Imaging assays of cellular function, especially those using fluorescent
    stains, are ubiquitous in the biological and medical sciences. Despite advances
    in computer vision, such images are often analyzed using only manual or
    rudimentary automated processes. Watershed-based segmentation is an effective
    technique for identifying objects in images; it outperforms commonly used image
    analysis methods, but requires familiarity with computer-vision techniques to
    be applied successfully. In this report, we present and implement a
    watershed-based image analysis and classification algorithm in a GUI, enabling
    a broad set of users to easily understand the algorithm and adjust the
    parameters to their specific needs. As an example, we implement this algorithm
    to find and classify cells in a complex imaging assay for mitochondrial
    function. In a second example, we demonstrate a workflow using manual
    comparisons and receiver operating characteristics to optimize the algorithm
    parameters for finding live and dead cells in a standard viability assay.
    Overall, this watershed-based algorithm is more advanced than traditional
    thresholding and can produce optimized, automated results. By incorporating
    associated pre-processing steps in the GUI, the algorithm is also easily
    adjusted, rendering it user-friendly.
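    For readers who want to try the core segmentation step outside the GUI, a standard distance-transform-plus-watershed recipe with scikit-image and scipy looks roughly like the sketch below, with synthetic touching disks standing in for stained cells. The pre-processing, classification, and ROC-based parameter tuning described above are not included.

    ```python
    import numpy as np
    from scipy import ndimage as ndi
    from skimage.segmentation import watershed
    from skimage.feature import peak_local_max

    # Synthetic "cells": two overlapping bright disks on a dark background.
    yy, xx = np.mgrid[0:120, 0:120]
    image = ((yy - 50) ** 2 + (xx - 45) ** 2 < 20 ** 2) | \
            ((yy - 60) ** 2 + (xx - 75) ** 2 < 22 ** 2)

    # 1) binary mask (already binary here), 2) distance transform,
    # 3) markers at distance peaks, 4) watershed to split touching objects.
    distance = ndi.distance_transform_edt(image)
    peaks = peak_local_max(distance, min_distance=10, labels=image)
    markers = np.zeros_like(distance, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
    labels = watershed(-distance, markers, mask=image)

    print("objects found:", labels.max())   # typically 2: the touching disks are split
    ```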

    Automatic Response Assessment in Regions of Language Cortex in Epilepsy Patients Using ECoG-based Functional Mapping and Machine Learning

    Harish RaviPrakash, Milena Korostenskaja, Eduardo Castillo, Ki Lee, James Baumgartner, Ulas Bagci
    Comments: This paper will appear in the Proceedings of IEEE International Conference on Systems, Man and Cybernetics (SMC) 2017
    Subjects: Neurons and Cognition (q-bio.NC); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Accurate localization of brain regions responsible for language and cognitive
    functions in Epilepsy patients should be carefully determined prior to surgery.
    Electrocorticography (ECoG)-based Real Time Functional Mapping (RTFM) has been
    shown to be a safer alternative to electrical cortical stimulation mapping
    (ESM), which is currently the clinical/gold standard. Conventional methods for
    analyzing RTFM signals are based on statistical comparison of signal power at
    certain frequency bands with limited response assessment accuracies. This
    inherently leads to low accuracies of functional mapping results when compared
    with gold standard.

    In this study, we address the limitation of the current RTFM signal
    estimation methods by analyzing the full frequency spectrum of the signal and
    applying machine learning algorithms, specifically random forest (RF). We train
    RF with power spectral density of the time-series RTFM signal in supervised
    learning framework where ground truth labels are obtained from the ESM.
    Experimental results obtained from RTFM of six adult patients in a strictly
    controlled experimental setup reveal a state-of-the-art detection accuracy of
    approximately 78% for the language comprehension task, an improvement of 23%
    over the conventional RTFM estimation method. To the best of our knowledge,
    this is the first study exploring the use of machine learning approaches for
    determining RTFM signal characteristics, and using the whole-frequency band for
    better region localization. Our results demonstrate the feasibility of machine
    learning based RTFM signal analysis method over the full spectrum to be a
    clinical routine in the near future.
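    The described pipeline (full-spectrum power spectral density features fed to a random forest, with ESM-derived labels as ground truth) maps onto standard tooling. The sketch below uses synthetic trials and scipy/scikit-learn; the sampling rate, the injected high-gamma component, and all other specifics are illustrative assumptions, not the study's data.

    ```python
    import numpy as np
    from scipy.signal import welch
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(7)
    fs, n_trials, n_samples = 1000, 120, 2000

    # Synthetic RTFM-like trials: "responsive" channels get extra high-gamma power.
    labels = rng.integers(0, 2, size=n_trials)                 # 1 = eloquent cortex (per ESM)
    signals = rng.standard_normal((n_trials, n_samples))
    t = np.arange(n_samples) / fs
    signals[labels == 1] += 0.4 * np.sin(2 * np.pi * 80 * t)   # 80 Hz component

    # Feature: full-spectrum PSD per trial (Welch), not just a single band.
    freqs, psd = welch(signals, fs=fs, nperseg=256, axis=-1)

    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    print(cross_val_score(clf, psd, labels, cv=5).mean())      # accuracy on the toy data
    ```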

    Yeah, Right, Uh-Huh: A Deep Learning Backchannel Predictor

    Robin Ruede, Markus Müller, Sebastian Stüker, Alex Waibel
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Learning (cs.LG); Sound (cs.SD)

    Using supporting backchannel (BC) cues can make human-computer interaction
    more social. BCs provide feedback from the listener to the speaker, indicating
    that the speaker is still being listened to. BCs can be expressed in different
    ways, depending on the modality of the interaction, for example as gestures or
    acoustic cues. In this work, we only considered acoustic cues. We propose
    an approach to detecting BC opportunities based on acoustic input features
    like power and pitch. While other works in the field rely on the use of a
    hand-written rule set or specialized features, we made use of artificial neural
    networks. They are capable of deriving higher order features from input
    features themselves. In our setup, we first used a fully connected feed-forward
    network to establish an updated baseline in comparison to our previously
    proposed setup. We also extended this setup by the use of Long Short-Term
    Memory (LSTM) networks, which have been shown to outperform feed-forward based setups
    on various tasks. Our best system achieved an F1-Score of 0.37 using power and
    pitch features. Adding linguistic information using word2vec, the score
    increased to 0.39.

    Deep learning evaluation using deep linguistic processing

    Alexander Kuhnle, Ann Copestake
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We discuss problems with the standard approaches to evaluation for tasks like
    visual question answering, and argue that artificial data can be used to
    address these as a complement to current practice. We demonstrate that with the
    help of existing ‘deep’ linguistic processing technology we are able to create
    challenging abstract datasets, which enable us to investigate the language
    understanding abilities of multimodal deep learning models in detail.

    Submanifold Sparse Convolutional Networks

    Benjamin Graham, Laurens van der Maaten
    Comments: 10 pages
    Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)

    Convolutional networks are the de facto standard for analysing spatio-temporal
    data such as images, videos, 3D shapes, etc. Whilst some of this data is
    naturally dense (for instance, photos), many other data sources are inherently
    sparse. Examples include pen-strokes forming on a piece of paper, or (colored)
    3D point clouds that were obtained using a LiDAR scanner or RGB-D camera.
    Standard “dense” implementations of convolutional networks are very inefficient
    when applied on such sparse data. We introduce a sparse convolutional operation
    tailored to processing sparse data that differs from prior work on sparse
    convolutional networks in that it operates strictly on submanifolds, rather
    than “dilating” the observation with every layer in the network. Our empirical
    analysis of the resulting submanifold sparse convolutional networks shows that
    they perform on par with state-of-the-art methods whilst requiring
    substantially less computation.
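
    The core idea can be illustrated with a toy reimplementation: outputs are
    computed only at sites that are active in the input, and inactive neighbours
    contribute zero, so the set of active sites does not dilate from layer to
    layer. The sketch below is an assumption-laden illustration in plain NumPy,
    not the authors' library.

        # Toy "submanifold" 3x3 convolution over a sparse set of active 2-D sites.
        import numpy as np

        def submanifold_conv2d(active, weights):
            """active: dict {(y, x): feature vector}; weights: (3, 3, c_in, c_out)."""
            out = {}
            for (y, x) in active:
                acc = np.zeros(weights.shape[-1])
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nb = (y + dy, x + dx)
                        if nb in active:             # inactive neighbours contribute zero
                            acc += active[nb] @ weights[dy + 1, dx + 1]
                out[(y, x)] = acc                    # outputs only at originally active sites
            return out

        sites = {(0, 0): np.ones(4), (0, 1): np.ones(4), (5, 5): np.ones(4)}
        w = np.random.randn(3, 3, 4, 8) * 0.1
        print(len(submanifold_conv2d(sites, w)))     # still 3 active sites, no dilation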

    Where and Who? Automatic Semantic-Aware Person Composition

    Fuwen Tan, Crispin Bernier, Benjamin Cohen, Vicente Ordonez, Connelly Barnes
    Comments: 11 pages, 11 figures
    Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)

    Image compositing is a popular and successful method used to generate
    realistic yet fake imagery. Much previous work in compositing has focused on
    improving the appearance compatibility between a given object segment and a
    background image. However, most previous work does not investigate the topic of
    automatically selecting semantically compatible segments and predicting their
    locations and sizes given a background image. In this work, we attempt to fill
    this gap by developing a fully automatic compositing system that learns this
    information. To simplify the task, we restrict our problem by focusing on human
    instance composition, because human segments exhibit strong correlations with
    the background scene and are easy to collect. The first problem we investigate
    is determining where a person segment should be placed given a background
    image and what its size should be in that image. We tackle this by
    developing a novel Convolutional Neural Network (CNN) model that jointly
    predicts the potential location and size of the person segment. The second
    problem we investigate is, given the background image, which person segments
    (who) can be composited with the previously predicted locations and sizes,
    while retaining compatibility with both the local context and the global scene
    semantics? To achieve this, we propose an efficient context-based segment
    retrieval method that incorporates pre-trained deep feature representations.

    To demonstrate the effectiveness of the proposed compositing system, we
    conduct quantitative and qualitative experiments including a user study.
    Experimental results show our system can generate composite images that look
    semantically and visually convincing. We also develop a proof-of-concept user
    interface to demonstrate the potential application of our method.


    Artificial Intelligence

    Types of Cognition and its Implications for future High-Level Cognitive Machines

    Camilo Miguel Signorelli
    Comments: 2017 AAAI Spring Symposium Series, Science of Intelligence: Computational Principles of Natural and Artificial Intelligence
    Subjects: Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)

    This work summarizes part of the current knowledge on high-level cognitive
    processes and their relation to biological hardware. On this basis, it is
    possible to identify some paradoxes which could impact the development of
    future technologies and artificial intelligence: we may build a High-level
    Cognitive Machine at the cost of sacrificing the principal attribute of a
    machine, its accuracy.

    A method for the online construction of the set of states of a Markov Decision Process using Answer Set Programming

    Leonardo A. Ferreira, Reinaldo A. C. Bianchi, Paulo E. Santos, Ramon Lopez de Mantaras
    Comments: Submitted to IJCAI 17
    Subjects: Artificial Intelligence (cs.AI)

    Non-stationary domains, which change in unpredicted ways, are a challenge for
    agents searching for optimal policies in sequential decision-making problems.
    This paper presents a combination of Markov Decision Processes (MDP) with
    Answer Set Programming (ASP), named Online ASP for MDP (oASP(MDP)), which
    is a method capable of constructing the set of domain states while the agent
    interacts with a changing environment. oASP(MDP) updates previously obtained
    policies, learnt by means of Reinforcement Learning (RL), using rules that
    represent the domain changes observed by the agent. These rules represent a set
    of domain constraints that are processed as ASP programs reducing the search
    space. Results show that oASP(MDP) is capable of finding solutions for problems
    in non-stationary domains without interfering with the action-value function
    approximation process.
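
    A schematic way to picture the interplay between learned values and rule-based
    constraints is tabular Q-learning in which states are added only as they are
    encountered and actions forbidden by the currently known rules are pruned from
    the search. The snippet below is a loose illustration of that idea, not the
    oASP(MDP) implementation; the environment, rule format, and parameters are
    assumptions.

        # Online state construction with rule-based action pruning (illustrative only).
        import random
        from collections import defaultdict

        Q = defaultdict(float)                 # Q[(state, action)], states added on the fly
        ACTIONS = ["up", "down", "left", "right"]
        rules = set()                          # observed constraints, e.g. ("blocked", state, action)

        def allowed(state):
            return [a for a in ACTIONS if ("blocked", state, a) not in rules]

        def choose(state, eps=0.1):
            acts = allowed(state)
            if random.random() < eps:
                return random.choice(acts)
            return max(acts, key=lambda a: Q[(state, a)])

        def update(s, a, r, s_next, alpha=0.1, gamma=0.95):
            best_next = max((Q[(s_next, a2)] for a2 in allowed(s_next)), default=0.0)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

        # When the domain changes, new rules shrink the action space while the
        # previously learned values are kept and merely updated:
        rules.add(("blocked", (2, 3), "up"))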

    3D Pathfinding and Collision Avoidance Using Uneven Search-space Quantization and Visual Cone Search

    Diptangshu Pandit
    Subjects: Artificial Intelligence (cs.AI)

    Pathfinding is a very popular area in computer game development. While
    two-dimensional (2D) pathfinding is widely applied in most of the popular game
    engines, little implementation of real three-dimensional (3D) pathfinding can
    be found. This research presents a dynamic search space optimization algorithm
    which can be applied to tessellate 3D search space unevenly, significantly
    reducing the total number of resulting nodes. The algorithm can be used with
    popular pathfinding algorithms in 3D game engines. Furthermore, a simplified
    standalone 3D pathfinding algorithm is proposed in this paper. The proposed
    algorithm relies on ray-casting or line vision to generate a feasible path
    during runtime without requiring division of the search space into a 3D grid.
    Both of the proposed algorithms are simulated in Unreal Engine to show their
    inner workings and to compare the resulting paths with A*. The advantages and
    shortcomings of the proposed algorithms are also discussed along with future
    directions.

    The Singularity May Be Near

    Roman V. Yampolskiy
    Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

    Toby Walsh in ‘The Singularity May Never Be Near’ gives six arguments to
    support his view that a technological singularity may happen but is unlikely.
    In this paper, we provide an analysis of each of his arguments and arrive at
    similar conclusions, but with more weight given to the ‘likely to happen’
    probability.

    Actor-Critic for Linearly-Solvable Continuous MDP with Partially Known Dynamics

    Tomoki Nishi, Prashant Doshi, Michael R. James, Danil Prokhorov
    Comments: 10 pages, 7 figures
    Subjects: Artificial Intelligence (cs.AI)

    In many robotic applications, some aspects of the system dynamics can be
    modeled accurately while others are difficult to obtain or model. We present a
    novel reinforcement learning (RL) method for continuous state and action spaces
    that learns with partial knowledge of the system and without active
    exploration. It solves linearly-solvable Markov decision processes (L-MDPs),
    which are well suited for continuous state and action spaces, based on an
    actor-critic architecture. Compared to previous RL methods for L-MDPs and path
    integral methods which are model based, the actor-critic learning does not need
    a model of the uncontrolled dynamics and, importantly, transition noise levels;
    however, it requires knowing the control dynamics for the problem. We evaluate
    our method on two synthetic test problems, and one real-world problem in
    simulation and using real traffic data. Our experiments demonstrate improved
    learning and policy performance.

    A Joint Model for Question Answering and Question Generation

    Tong Wang, Xingdi Yuan, Adam Trischler
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We propose a generative machine comprehension model that learns jointly to
    ask and answer questions based on documents. The proposed model uses a
    sequence-to-sequence framework that encodes the document and generates a
    question (answer) given an answer (question). Significant improvement in model
    performance is observed empirically on the SQuAD corpus, confirming our
    hypothesis that the model benefits from jointly learning to perform both tasks.
    We believe the joint model’s novelty offers a new perspective on machine
    comprehension beyond architectural engineering, and serves as a first step
    towards autonomous information seeking.

    On the Emergence of Invariance and Disentangling in Deep Representations

    Alessandro Achille, Stefano Soatto
    Comments: Keywords: Deep learning; neural network; representation; flat minima; information bottleneck; overfitting; generalization; sufficiency; minimality; sensitivity; information complexity; stochastic gradient descent; regularization; total correlation
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Using classical notions of statistical decision and information theory, we
    show that invariance in a deep neural network is equivalent to minimality of
    the representation it computes, and can be achieved by stacking layers and
    injecting noise in the computation, under realistic and empirically validated
    assumptions. We use an Information Decomposition of the empirical loss to show
    that overfitting can be reduced by limiting the information content stored in
    the weights. We then present a sharp inequality that relates the information
    content in the weights — which are a representation of the training set and
    inferred by generic optimization agnostic of invariance and disentanglement —
    and the minimality and total correlation of the activation functions, which are
    a representation of the test datum. This allows us to tackle recent puzzles
    concerning the generalization properties of deep networks and their relation to
    the geometry of the optimization residual.

    Event Representations for Automated Story Generation with Deep Neural Nets

    Lara J. Martin, Prithviraj Ammanabrolu, William Hancock, Shruti Singh, Brent Harrison, Mark O. Riedl
    Comments: 8 pages, 1 figure
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Automated story generation is the problem of automatically selecting a
    sequence of events, actions, or words that can be told as a story. We seek to
    develop a system that can generate stories by learning everything it needs to
    know from textual story corpora. To date, recurrent neural networks that learn
    language models at character, word, or sentence levels have had little success
    generating coherent stories. We explore the question of event representations
    that provide a mid-level of abstraction between words and sentences in order to
    retain the semantic information of the original data while minimizing event
    sparsity. We present a technique for preprocessing textual story data into
    event sequences. We then present a technique for automated story generation
    whereby we decompose the problem into the generation of successive events
    (event2event) and the generation of natural language sentences from events
    (event2sentence). We give empirical results comparing different event
    representations and their effects on event successor generation and the
    translation of events to natural language.
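
    As an illustration of what a mid-level event abstraction could look like, the
    sketch below maps a sentence to a coarse (subject, verb, object, modifier)
    tuple using a dependency parse. It uses spaCy purely for convenience; the
    paper's exact event schema and preprocessing pipeline may differ, and the
    placeholder token "<empty>" is an assumption.

        # Rough sentence-to-event extraction via a dependency parse (illustrative).
        import spacy

        nlp = spacy.load("en_core_web_sm")

        def sentence_to_event(sent):
            doc = nlp(sent)
            verb = next((t for t in doc if t.dep_ == "ROOT"), None)
            if verb is None:
                return None
            subj = next((t.lemma_ for t in verb.children if t.dep_ in ("nsubj", "nsubjpass")), "<empty>")
            obj = next((t.lemma_ for t in verb.children if t.dep_ in ("dobj", "attr")), "<empty>")
            mod = next((t.lemma_ for t in verb.children if t.dep_ == "prep"), "<empty>")
            return (subj, verb.lemma_, obj, mod)

        print(sentence_to_event("The knight rode his horse to the castle."))
        # e.g. ('knight', 'ride', 'horse', 'to')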

    Deep learning evaluation using deep linguistic processing

    Alexander Kuhnle, Ann Copestake
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We discuss problems with the standard approaches to evaluation for tasks like
    visual question answering, and argue that artificial data can be used to
    address these as a complement to current practice. We demonstrate that with the
    help of existing ‘deep’ linguistic processing technology we are able to create
    challenging abstract datasets, which enable us to investigate the language
    understanding abilities of multimodal deep learning models in detail.

    Learning Neural Programs To Parse Programs

    Xinyun Chen, Chang Liu, Dawn Song
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)

    In this work, we study an important problem: learning programs from
    input-output examples. We propose a novel method to learn a neural program
    operating a domain-specific non-differentiable machine, and demonstrate that
    this method can be applied to learn programs that are significantly more
    complex than the ones synthesized before: programming language parsers from
    input-output pairs without knowing the underlying grammar. The main challenge
    is to train the neural program without supervision on execution traces. To
    tackle it, we propose: (1) LL machines and neural programs operating them to
    effectively regularize the space of the learned programs; and (2) a two-phase
    reinforcement learning-based search technique to train the model. Our
    evaluation demonstrates that our approach can successfully learn to parse
    programs in both an imperative language and a functional language, and achieve
    100% test accuracy, while existing approaches’ accuracies are almost 0%. This
    is the first successful demonstration of applying reinforcement learning to
    train a neural program operating a non-differentiable machine that can fully
    generalize to test sets on a non-trivial task.

    Visuospatial Skill Learning for Robots

    S. Reza Ahmadzadeh, Fulvio Mastrogiovanni, Petar Kormushev
    Comments: 24 pages, 36 figures
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

    A novel skill learning approach is proposed that allows a robot to acquire
    human-like visuospatial skills for object manipulation tasks. Visuospatial
    skills are attained by observing spatial relationships among objects through
    demonstrations. The proposed Visuospatial Skill Learning (VSL) is a goal-based
    approach that focuses on achieving a desired goal configuration of objects
    relative to one another while maintaining the sequence of operations. VSL is
    capable of learning and generalizing multi-operation skills from a single
    demonstration, while requiring minimum prior knowledge about the objects and
    the environment. In contrast to many existing approaches, VSL offers
    simplicity, efficiency and user-friendly human-robot interaction. We also show
    that VSL can be easily extended towards 3D object manipulation tasks, simply by
    employing point cloud processing techniques. In addition, a robot learning
    framework, VSL-SP, is proposed by integrating VSL, Imitation Learning, and a
    conventional planning method. In VSL-SP, the sequence of performed actions is
    learned using VSL, while the sensorimotor skills are learned using a
    conventional trajectory-based learning approach. Such integration easily
    extends robot capabilities to novel situations, even for users without
    programming ability. In VSL-SP the internal planner of VSL is integrated with
    an existing action-level symbolic planner. Using the underlying constraints of
    the task and the symbolic predicates extracted by VSL, the symbolic
    representation of the task is updated. Therefore the planner maintains a
    generalized representation of each skill as a reusable action, which can be
    used in planning and performed independently during the learning phase. The
    proposed approach is validated through several real-world experiments.

    Active learning machine learns to create new quantum experiments

    Alexey A. Melnikov, Hendrik Poulsen Nautrup, Mario Krenn, Vedran Dunjko, Markus Tiersch, Anton Zeilinger, Hans J. Briegel
    Comments: 11 pages, 6 figures, 1 table
    Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    How useful can machine learning be in a quantum laboratory? Here we raise the
    question of the potential of intelligent machines in the context of scientific
    research. We investigate this question by using the projective simulation
    model, a physics-oriented approach to artificial intelligence. In our approach,
    the projective simulation system is challenged to design complex photonic
    quantum experiments that produce high-dimensional entangled multiphoton states,
    which are of high interest in modern quantum experiments. The artificial
    intelligence system learns to create a variety of entangled states, in number
    surpassing the best previously studied automated approaches, and improves the
    efficiency of their realization. In the process, the system autonomously
    (re)discovers experimental techniques which are only becoming standard in
    modern quantum optical experiments – a trait which was not explicitly demanded
    from the system but emerged through the process of learning. Such features
    highlight the possibility that machines could have a significantly more
    creative role in future research.

    Hyperparameter Optimization: A Spectral Approach

    Elad Hazan, Adam Klivans, Yang Yuan
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)

    We give a simple, fast algorithm for hyperparameter optimization inspired by
    techniques from the analysis of Boolean functions. We focus on the
    high-dimensional regime where the canonical example is training a neural
    network with a large number of hyperparameters. The algorithm – an iterative
    application of compressed sensing techniques for orthogonal polynomials –
    requires only uniform sampling of the hyperparameters and is thus easily
    parallelizable. Experiments for training deep nets on Cifar-10 show that
    compared to state-of-the-art tools (e.g., Hyperband and Spearmint), our
    algorithm finds significantly improved solutions, in some cases matching what
    is attainable by hand-tuning. In terms of overall running time (i.e., time
    required to sample various settings of hyperparameters plus additional
    computation time), we are at least an order of magnitude faster than Hyperband
    and even more so compared to Bayesian Optimization. We also outperform Random
    Search 5X. Additionally, our method comes with provable guarantees and yields
    the first quasi-polynomial time algorithm for learning decision trees under the
    uniform distribution with polynomial sample complexity, the first improvement
    in over two decades.
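
    The flavour of the spectral approach can be conveyed with a small sketch:
    treat binary hyperparameters as +/-1 variables, sample configurations
    uniformly, and fit a sparse low-degree polynomial surrogate of the loss in the
    parity (Fourier) basis with Lasso, then minimize the cheap surrogate. The
    objective, degree, and sample sizes below are placeholder assumptions and the
    snippet is not the authors' algorithm.

        # Sparse recovery of a low-degree surrogate over +/-1 hyperparameters.
        import itertools
        import numpy as np
        from sklearn.linear_model import Lasso

        d = 8                                          # number of binary hyperparameters
        rng = np.random.default_rng(0)

        def loss(x):                                   # stand-in for an expensive training run
            return 1.0 * x[0] * x[1] - 0.5 * x[2] + 0.1 * rng.standard_normal()

        monomials = [s for k in range(3) for s in itertools.combinations(range(d), k)]

        def parity_features(X):                        # chi_S(x) = prod_{i in S} x_i
            return np.array([[np.prod(x[list(S)]) if S else 1.0 for S in monomials] for x in X])

        X = rng.choice([-1.0, 1.0], size=(60, d))      # uniform sampling, trivially parallel
        y = np.array([loss(x) for x in X])
        surrogate = Lasso(alpha=0.05).fit(parity_features(X), y)

        # Minimize the cheap surrogate over all configurations instead of the true loss.
        grid = np.array(list(itertools.product([-1.0, 1.0], repeat=d)))
        best = grid[np.argmin(surrogate.predict(parity_features(grid)))]
        print(best)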

    An Improved System for Sentence-level Novelty Detection in Textual Streams

    Xinyu Fu, Eugene Ch'ng, Uwe Aickelin, Lanyun Zhang
    Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Novelty detection in news events has long been a difficult problem. A number
    of models performed well on specific data streams but certain issues are far
    from being solved, particularly in large data streams from the WWW where
    unpredictability of new terms requires adaptation in the vector space model. We
    present a novel event detection system based on the Incremental Term
    Frequency-Inverse Document Frequency (TF-IDF) weighting incorporated with
    Locality Sensitive Hashing (LSH). Our system can efficiently and effectively
    adapt to changes in the data streams caused by new terms, with continual
    updates to the vector space model. Regarding miss probability, our proposed
    novelty detection framework outperforms a recognised baseline system by
    approximately 16% when evaluating a benchmark dataset from Google News.
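
    A stripped-down version of the incremental idea is shown below: document
    frequencies are updated as the stream arrives, each incoming sentence is
    scored against previously seen TF-IDF vectors, and anything whose maximum
    similarity stays below a threshold is flagged as novel. The LSH component is
    replaced by brute-force cosine search for brevity; the tokenisation,
    smoothing, and threshold are assumptions.

        # Incremental TF-IDF novelty scoring with brute-force cosine (illustrative).
        import math
        from collections import Counter, defaultdict

        df = defaultdict(int)          # document frequencies, updated as the stream arrives
        seen = []                      # previously seen TF-IDF vectors (dicts)
        n_docs = 0

        def tfidf(tokens):
            tf = Counter(tokens)
            return {t: (1 + math.log(c)) * (math.log((1 + n_docs) / (1 + df[t])) + 1)
                    for t, c in tf.items()}

        def cosine(a, b):
            dot = sum(a[t] * b[t] for t in a if t in b)
            na = math.sqrt(sum(v * v for v in a.values()))
            nb = math.sqrt(sum(v * v for v in b.values()))
            return dot / (na * nb) if na and nb else 0.0

        def process(sentence, threshold=0.2):
            global n_docs
            tokens = sentence.lower().split()
            vec = tfidf(tokens)
            novel = all(cosine(vec, old) < threshold for old in seen)
            for t in set(tokens):      # new terms enter the vocabulary here
                df[t] += 1
            n_docs += 1
            seen.append(vec)
            return novel

        print(process("earthquake strikes the coast"))        # True: first event is novel
        print(process("earthquake strikes coastal towns"))    # False: largely redundant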


    Information Retrieval

    SimDex: Exploiting Model Similarity in Exact Matrix Factorization Recommendations

    Firas Abuzaid, Geet Sethi, Peter Bailis, Matei Zaharia
    Comments: 13 pages, 8 figures
    Subjects: Information Retrieval (cs.IR); Databases (cs.DB); Data Structures and Algorithms (cs.DS); Performance (cs.PF)

    We present SimDex, a new technique for serving exact top-K recommendations on
    matrix factorization models that measures and optimizes for the similarity
    between users in the model. Previous serving techniques presume a high degree
    of similarity (e.g., L2 or cosine distance) among users and/or items in MF
    models; however, as we demonstrate, the most accurate models are not guaranteed
    to exhibit high similarity. As a result, brute-force matrix multiply
    outperforms recent proposals for top-K serving on several collaborative
    filtering tasks. Based on this observation, we develop SimDex, a new technique
    for serving matrix factorization models that automatically optimizes serving
    based on the degree of similarity between users, and outperforms existing
    methods in both the high-similarity and low-similarity regimes. SimDex first
    measures the degree of similarity among users via clustering and uses a
    cost-based optimizer to either construct an index on the model or defer to
    blocked matrix multiply. It leverages highly efficient linear algebra
    primitives in both cases to deliver predictions either from its index or from
    brute-force multiply. Overall, SimDex runs an average of 2x and up to 6x faster
    than highly optimized baselines for the most accurate models on several popular
    collaborative filtering datasets.

    Joint Text Embedding for Personalized Content-based Recommendation

    Ting Chen, Liangjie Hong, Yue Shi, Yizhou Sun
    Subjects: Information Retrieval (cs.IR); Learning (cs.LG)

    Learning a good representation of text is key to many recommendation
    applications. Examples include news recommendation, where texts to be
    recommended are constantly published every day. However, most existing
    recommendation techniques, such as matrix factorization based methods, mainly
    rely on interaction histories to learn representations of items. While latent
    factors of items can be learned effectively from user interaction data, in many
    cases, such data is not available, especially for newly emerged items.

    In this work, we aim to address the problem of personalized recommendation
    for completely new items with text information available. We cast the problem
    as a personalized text ranking problem and propose a general framework that
    combines text embedding with personalized recommendation. Users and textual
    content are embedded into latent feature space. The text embedding function can
    be learned end-to-end by predicting user interactions with items. To alleviate
    sparsity in interaction data, and leverage large amount of text data with
    little or no user interactions, we further propose a joint text embedding model
    that incorporates unsupervised text embedding with a combination module.
    Experimental results show that our model can significantly improve the
    effectiveness of recommendation systems on real-world datasets.

    Improving Legal Information Retrieval by Distributional Composition with Term Order Probabilities

    Danilo S. Carvalho, Duc-Vu Tran, Van-Khanh Tran, Le-Nguyen Minh
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    Legal professionals worldwide are currently trying to keep pace with the
    explosive growth in legal document availability through digital means. This
    drives a need for highly efficient Legal Information Retrieval (IR) and Question
    Answering (QA) methods. The IR task in particular has a set of unique
    challenges that invite the use of semantically motivated NLP techniques. In this
    work, a two-stage method for Legal Information Retrieval is proposed, combining
    lexical statistics and distributional sentence representations in the context
    of Competition on Legal Information Extraction/Entailment (COLIEE). The
    combination is done with the use of disambiguation rules, applied over the
    rankings obtained through n-gram statistics. After the ranking is done, its
    results are evaluated for ambiguity, and disambiguation is done if a result is
    decided to be unreliable for a given query. Competition and experimental
    results indicate small gains in overall retrieval performance using the
    proposed approach. Additionally, an analysis of error and improvement cases is
    presented for a better understanding of the contributions.

    More Accurate Entity Ranking Using Knowledge Graph and Web Corpus

    Uma Sawant, Soumen Chakrabarti, Ganesh Ramakrishnan
    Subjects: Information Retrieval (cs.IR)

    Recent years have witnessed some convergence in the architecture of entity
    search systems driven by a knowledge graph (KG) and a corpus with annotated
    entity mentions. However, each specific system has some limitations. We present
    AQQUCN, an entity search system that combines the best design principles into a
    public reference implementation. AQQUCN does not depend on well formed question
    syntax, but works equally well with syntax poor keyword queries. It uses
    several convolutional networks (convnets) to extract subtle, overlapping roles
    of query words. Instead of ranking structured query interpretations, which are
    then executed on the KG to return unranked sets, AQQUCN directly ranks response
    entities, by closely integrating coarse grained predicates from the KG with
    fine grained scoring from the corpus, into a single ranking model. Over and
    above competitive F1 score, AQQUCN gets the best entity ranking accuracy on two
    syntax rich and two syntax poor public query workloads amounting to over 8,000
    queries, with 16 to 18 percent absolute improvement in mean average precision
    (MAP), compared to recent systems.

    Semantic Vector Encoding and Similarity Search Using Fulltext Search Engines

    Jan Rygl, Jan Pomikálek, Radim Řehůřek, Michal Růžička, Vít Novotný, Petr Sojka
    Comments: Preprint of the paper accepted to the ACL 2017 (this http URL) workshop RepL4NLP 2017 (this https URL)
    Subjects: Information Retrieval (cs.IR)

    Vector representations and vector space modeling (VSM) play a central role in
    modern machine learning. We propose a novel approach to `vector similarity
    searching’ over dense semantic representations of words and documents that can
    be deployed on top of traditional inverted-index-based fulltext engines, taking
    advantage of their robustness, stability, scalability and ubiquity.

    We show that this approach allows the indexing and querying of dense vectors
    in text domains. This opens up exciting avenues for major efficiency gains,
    along with simpler deployment, scaling and monitoring.

    The end result is a fast and scalable vector database with a tunable
    trade-off between vector search performance and quality, backed by a standard
    fulltext engine such as Elasticsearch.

    We empirically demonstrate its querying performance and quality by applying
    this solution to the task of semantic searching over a dense vector
    representation of the entire English Wikipedia.
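
    The general idea of making dense vectors consumable by an inverted index can
    be illustrated with a toy encoding: quantise each dimension into string
    tokens and retrieve candidates by token overlap. A plain dict stands in for
    Elasticsearch here, and the concrete encoding used by the authors may differ.

        # Quantise vector dimensions into "terms" for an inverted index (illustrative).
        import numpy as np
        from collections import defaultdict

        def to_tokens(vec, bins=4):
            """Map each dimension to a token like 'd3_b2' (dimension 3, quantisation bin 2)."""
            q = np.digitize(vec, np.linspace(-1, 1, bins - 1))
            return {f"d{i}_b{b}" for i, b in enumerate(q)}

        index = defaultdict(set)                       # token -> set of document ids
        docs = {0: np.array([0.9, -0.7, 0.1]), 1: np.array([-0.8, 0.6, 0.2])}
        for doc_id, vec in docs.items():
            for tok in to_tokens(vec):
                index[tok].add(doc_id)

        def search(query_vec, top_k=1):
            scores = defaultdict(int)
            for tok in to_tokens(query_vec):           # token overlap approximates similarity
                for doc_id in index[tok]:
                    scores[doc_id] += 1
            return sorted(scores, key=scores.get, reverse=True)[:top_k]

        print(search(np.array([0.8, -0.6, 0.0])))      # -> [0]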

    Simultaneous Inference of User Representations and Trust

    Shashank Gupta, Pulkit Parikh, Manish Gupta, Vasudeva Varma
    Comments: To appear in the proceedings of ASONAM’17. Please cite that version
    Subjects: Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

    Inferring trust relations between social media users is critical for a number
    of applications wherein users seek credible information. The fact that
    available trust relations are scarce and skewed makes trust prediction a
    challenging task. To the best of our knowledge, this is the first work on
    exploring representation learning for trust prediction. We propose an approach
    that uses only a small amount of binary user-user trust relations to
    simultaneously learn user embeddings and a model to predict trust between user
    pairs. We empirically demonstrate that for trust prediction, our approach
    outperforms classifier-based approaches which use state-of-the-art
    representation learning methods like DeepWalk and LINE as features. We also
    conduct experiments which use embeddings pre-trained with DeepWalk and LINE
    each as an input to our model, resulting in further performance improvement.
    Experiments with a dataset of approximately 356K user pairs show that the
    proposed method can obtain a high F-score of 92.65%.

    The Capacity of Private Information Retrieval from Byzantine and Colluding Databases

    Karim Banawan, Sennur Ulukus
    Comments: Submitted to IEEE Transactions on Information Theory, June 2017
    Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Information Retrieval (cs.IR)

    We consider the problem of single-round private information retrieval (PIR)
    from (N) replicated databases. We consider the case when (B) databases are
    outdated (unsynchronized), or even worse, adversarial (Byzantine), and
    therefore, can return incorrect answers. In the PIR problem with Byzantine
    databases (BPIR), a user wishes to retrieve a specific message from a set of
    (M) messages with zero-error, irrespective of the actions performed by the
    Byzantine databases. We consider the (T)-privacy constraint in this paper,
    where any (T) databases can collude, and exchange the queries submitted by the
    user. We derive the information-theoretic capacity of this problem, which is
    the maximum number of correct symbols that can be retrieved privately
    (under the (T)-privacy constraint) for every symbol of the downloaded data. We
    determine the exact BPIR capacity to be
    (C=\frac{N-2B}{N}\cdot\frac{1-\frac{T}{N-2B}}{1-(\frac{T}{N-2B})^M}), if
    (2B+T < N). This capacity expression shows that the effect of Byzantine
    databases on the retrieval rate is equivalent to removing (2B) databases from
    the system, with a penalty factor of (\frac{N-2B}{N}), which signifies that
    even though the number of databases needed for PIR is effectively (N-2B), the
    user still needs to access the entire (N) databases. The result shows that for
    the unsynchronized PIR problem, if the user does not have any knowledge about
    the fraction of the messages that are mis-synchronized, the single-round
    capacity is the same as the BPIR capacity. Our achievable scheme extends the
    optimal achievable scheme for the robust PIR (RPIR) problem to correct the
    errors introduced by the Byzantine databases as opposed to erasures in the
    RPIR problem. Our converse proof uses the idea of the cut-set bound in the
    network coding problem against adversarial nodes.
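
    For a quick sense of the scale of the expression, the following snippet
    evaluates the stated capacity for one example parameter setting (the values
    are chosen only for illustration and satisfy 2B + T < N).

        # Numeric check of C = ((N-2B)/N) * (1 - T/(N-2B)) / (1 - (T/(N-2B))^M).
        def bpir_capacity(N, B, T, M):
            r = T / (N - 2 * B)
            return (N - 2 * B) / N * (1 - r) / (1 - r ** M)

        print(bpir_capacity(N=6, B=1, T=2, M=3))   # approx. 0.381 correct symbols per download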

    Task-specific Word Identification from Short Texts Using a Convolutional Neural Network

    Shuhan Yuan, Xintao Wu, Yang Xiang
    Comments: accepted by Intelligent Data Analysis, an International Journal
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Learning (cs.LG)

    Task-specific word identification aims to choose the task-related words that
    best describe a short text. Existing approaches require well-defined seed words
    or lexical dictionaries (e.g., WordNet), which are often unavailable for many
    applications such as social discrimination detection and fake review detection.
    However, we often have a set of labeled short texts where each short text has a
    task-related class label, e.g., discriminatory or non-discriminatory, specified
    by users or learned by classification algorithms. In this paper, we focus on
    identifying task-specific words and phrases from short texts by exploiting
    their class labels rather than using seed words or lexical dictionaries. We
    consider the task-specific word and phrase identification as feature learning.
    We train a convolutional neural network over a set of labeled texts and use
    score vectors to localize the task-specific words and phrases. Experimental
    results on sentiment word identification show that our approach significantly
    outperforms existing methods. We further conduct two case studies to show the
    effectiveness of our approach. One case study on a crawled tweets dataset
    demonstrates that our approach can successfully capture the
    discrimination-related words/phrases. The other case study on fake review
    detection shows that our approach can identify the fake-review words/phrases.

    A Complete Year of User Retrieval Sessions in a Social Sciences Academic Search Engine

    Philipp Mayr, Ameni Kacem
    Comments: 6 pages, 2 figures, accepted short paper at the 21st International Conference on Theory and Practice of Digital Libraries (TPDL 2017)
    Subjects: Digital Libraries (cs.DL); Information Retrieval (cs.IR)

    In this paper, we present an open data set extracted from the transaction log
    of the social sciences academic search engine sowiport. The data set includes a
    filtered set of 484,449 retrieval sessions which have been carried out by
    sowiport users in the period from April 2014 to April 2015. We propose a
    description of the data set features that can be used as ground truth for
    different applications such as result ranking improvement, user modeling, query
    reformulation analysis, and search pattern recognition.


    Computation and Language

    A Joint Model for Question Answering and Question Generation

    Tong Wang, Xingdi Yuan, Adam Trischler
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We propose a generative machine comprehension model that learns jointly to
    ask and answer questions based on documents. The proposed model uses a
    sequence-to-sequence framework that encodes the document and generates a
    question (answer) given an answer (question). Significant improvement in model
    performance is observed empirically on the SQuAD corpus, confirming our
    hypothesis that the model benefits from jointly learning to perform both tasks.
    We believe the joint model’s novelty offers a new perspective on machine
    comprehension beyond architectural engineering, and serves as a first step
    towards autonomous information seeking.

    A simple neural network module for relational reasoning

    Adam Santoro, David Raposo, David G.T. Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, Timothy Lillicrap
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Relational reasoning is a central component of generally intelligent
    behavior, but has proven difficult for neural networks to learn. In this paper
    we describe how to use Relation Networks (RNs) as a simple plug-and-play module
    to solve problems that fundamentally hinge on relational reasoning. We tested
    RN-augmented networks on three tasks: visual question answering using a
    challenging dataset called CLEVR, on which we achieve state-of-the-art,
    super-human performance; text-based question answering using the bAbI suite of
    tasks; and complex reasoning about dynamic physical systems. Then, using a
    curated dataset called Sort-of-CLEVR we show that powerful convolutional
    networks do not have a general capacity to solve relational questions, but can
    gain this capacity when augmented with RNs. Our work shows how a deep learning
    architecture equipped with an RN module can implicitly discover and learn to
    reason about entities and their relations.
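
    The plug-and-play nature of the module is easy to see in code: a small MLP g
    is applied to every ordered pair of object vectors (concatenated with a
    question embedding), the pair outputs are summed, and a second MLP f maps the
    aggregate to answer logits. The PyTorch sketch below uses illustrative
    dimensions and a naive pair loop rather than the paper's exact configuration.

        # A compact Relation Network head (illustrative dimensions).
        import itertools
        import torch
        import torch.nn as nn

        class RelationNetwork(nn.Module):
            def __init__(self, obj_dim=32, q_dim=16, hidden=64, n_answers=10):
                super().__init__()
                self.g = nn.Sequential(nn.Linear(2 * obj_dim + q_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, hidden), nn.ReLU())
                self.f = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                       nn.Linear(hidden, n_answers))

            def forward(self, objects, question):    # objects: (B, N, obj_dim), question: (B, q_dim)
                n_obj = objects.shape[1]
                pair_sum = 0
                for i, j in itertools.product(range(n_obj), repeat=2):
                    pair = torch.cat([objects[:, i], objects[:, j], question], dim=-1)
                    pair_sum = pair_sum + self.g(pair)
                return self.f(pair_sum)              # relations are aggregated before f

        logits = RelationNetwork()(torch.randn(4, 6, 32), torch.randn(4, 16))
        print(logits.shape)                          # torch.Size([4, 10])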

    Language Generation with Recurrent Generative Adversarial Networks without Pre-training

    Ofir Press, Amir Bar, Ben Bogin, Jonathan Berant, Lior Wolf
    Subjects: Computation and Language (cs.CL)

    Generative Adversarial Networks (GANs) have shown great promise recently in
    image generation. Training GANs for text generation has proven to be more
    difficult, because of the non-differentiable nature of generating text with
    recurrent neural networks. Consequently, past work has either resorted to
    pre-training with maximum-likelihood or used convolutional networks for
    generation. In this work, we show that recurrent neural networks can be trained
    to generate text with GANs from scratch by employing curriculum learning,
    slowly increasing the length of the generated text, and by training the RNN
    simultaneously to generate sequences of different lengths. We show that this
    approach vastly improves the quality of generated sequences compared to the
    convolutional baseline.

    Yeah, Right, Uh-Huh: A Deep Learning Backchannel Predictor

    Robin Ruede, Markus Müller, Sebastian Stüker, Alex Waibel
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Learning (cs.LG); Sound (cs.SD)

    Using supporting backchannel (BC) cues can make human-computer interaction
    more social. BCs provide feedback from the listener to the speaker, indicating
    that the speaker is still being listened to. BCs can be expressed in different
    ways, depending on the modality of the interaction, for example as gestures or
    acoustic cues. In this work, we only considered acoustic cues. We propose an
    approach for detecting BC opportunities based on acoustic input features such
    as power and pitch. While other works in the field rely on the use of a
    hand-written rule set or specialized features, we made use of artificial neural
    networks. They are capable of deriving higher-order features from the input
    features themselves. In our setup, we first used a fully connected feed-forward
    network to establish an updated baseline in comparison to our previously
    proposed setup. We also extended this setup with Long Short-Term Memory (LSTM)
    networks, which have been shown to outperform feed-forward setups on various
    tasks. Our best system achieved an F1-score of 0.37 using power and
    pitch features. Adding linguistic information using word2vec, the score
    increased to 0.39.

    Event Representations for Automated Story Generation with Deep Neural Nets

    Lara J. Martin, Prithviraj Ammanabrolu, William Hancock, Shruti Singh, Brent Harrison, Mark O. Riedl
    Comments: 8 pages, 1 figure
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Automated story generation is the problem of automatically selecting a
    sequence of events, actions, or words that can be told as a story. We seek to
    develop a system that can generate stories by learning everything it needs to
    know from textual story corpora. To date, recurrent neural networks that learn
    language models at character, word, or sentence levels have had little success
    generating coherent stories. We explore the question of event representations
    that provide a mid-level of abstraction between words and sentences in order to
    retain the semantic information of the original data while minimizing event
    sparsity. We present a technique for preprocessing textual story data into
    event sequences. We then present a technique for automated story generation
    whereby we decompose the problem into the generation of successive events
    (event2event) and the generation of natural language sentences from events
    (event2sentence). We give empirical results comparing different event
    representations and their effects on event successor generation and the
    translation of events to natural language.

    Deep learning evaluation using deep linguistic processing

    Alexander Kuhnle, Ann Copestake
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We discuss problems with the standard approaches to evaluation for tasks like
    visual question answering, and argue that artificial data can be used to
    address these as a complement to current practice. We demonstrate that with the
    help of existing ‘deep’ linguistic processing technology we are able to create
    challenging abstract datasets, which enable us to investigate the language
    understanding abilities of multimodal deep learning models in detail.

    One-step and Two-step Classification for Abusive Language Detection on Twitter

    Ji Ho Park, Pascale Fung
    Comments: ALW1: 1st Workshop on Abusive Language Online to be held at the annual meeting of the Association of Computational Linguistics (ACL) 2017 (Vancouver, Canada), August 4th, 2017
    Subjects: Computation and Language (cs.CL)

    Automatic abusive language detection is a difficult but important task for
    online social media. Our research explores a two-step approach of first
    classifying text as abusive language and then classifying it into specific
    types, and compares it with a one-step approach of a single multi-class
    classification for detecting sexist and racist language. With a public English
    Twitter corpus of 20 thousand tweets annotated for sexism and racism, our
    approach shows a promising performance of 0.827 F-measure using HybridCNN in
    one step and 0.824 F-measure using logistic regression in two steps.
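
    The two-step pipeline is simple to express with off-the-shelf components: a
    first classifier separates abusive from clean tweets, and a second classifier
    assigns the specific type only to tweets flagged by the first. The sketch
    below uses tiny placeholder data and logistic regression throughout; it is
    not the paper's HybridCNN model.

        # Two-step abusive language classification (illustrative data and models).
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        texts = ["you are great", "terrible sexist remark", "racist slur example", "nice day"]
        abusive = [0, 1, 1, 0]                  # step-1 labels: abusive or not
        abuse_type = ["sexism", "racism"]       # step-2 labels for the abusive subset only

        step1 = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(texts, abusive)
        abusive_texts = [t for t, y in zip(texts, abusive) if y == 1]
        step2 = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(abusive_texts, abuse_type)

        def classify(text):
            if step1.predict([text])[0] == 0:
                return "none"
            return step2.predict([text])[0]     # only abusive tweets reach the type classifier

        print(classify("another sexist remark"))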

    CRNN: A Joint Neural Network for Redundancy Detection

    Xinyu Fu, Eugene Ch'ng, Uwe Aickelin, Simon See
    Comments: Conference paper accepted at IEEE SMARTCOMP 2017, Hong Kong
    Subjects: Computation and Language (cs.CL)

    This paper proposes a novel framework for detecting redundancy in supervised
    sentence categorisation. Unlike a traditional singleton neural network, our
    model incorporates a character-aware convolutional neural network (Char-CNN)
    with a character-aware recurrent neural network (Char-RNN) to form a
    convolutional recurrent neural network (CRNN). Our model benefits from
    Char-CNN in that only salient features are selected and fed into the
    integrated Char-RNN. Char-RNN effectively learns long-sequence semantics via a
    sophisticated update mechanism. We compare our framework against
    state-of-the-art text classification algorithms on four popular benchmark
    corpora. For instance, our model achieves a competitive precision rate, recall
    ratio, and F1 score on the Google-news dataset. For the twenty-newsgroups data
    stream, our algorithm obtains the optimum precision rate, recall ratio, and F1
    score. For the Brown Corpus,
    our framework obtains the best F1 score and almost equivalent precision rate
    and recall ratio over the top competitor. For the question classification
    collection, CRNN produces the optimal recall rate and F1 score and comparable
    precision rate. We also analyse three different RNN hidden recurrent cells’
    impact on performance and their runtime efficiency. We observe that MGU
    achieves the optimal runtime and comparable performance against GRU and LSTM.
    For TFIDF based algorithms, we experiment with word2vec, GloVe, and sent2vec
    embeddings and report their performance differences.
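
    The Char-CNN-into-Char-RNN composition described above can be sketched in a
    few lines of PyTorch: a character-level convolution with pooling keeps salient
    local features, a recurrent layer (a GRU here, standing in for the Char-RNN)
    models the remaining sequence, and a linear layer produces class logits.
    Vocabulary size, dimensions, and pooling are illustrative assumptions, not the
    paper's exact architecture.

        # Schematic CRNN: Char-CNN features fed into a recurrent layer (illustrative).
        import torch
        import torch.nn as nn

        class CRNN(nn.Module):
            def __init__(self, n_chars=128, emb=16, conv_ch=32, hidden=64, n_classes=4):
                super().__init__()
                self.embed = nn.Embedding(n_chars, emb)
                self.conv = nn.Conv1d(emb, conv_ch, kernel_size=5, padding=2)
                self.pool = nn.MaxPool1d(2)                 # keep only salient local features
                self.rnn = nn.GRU(conv_ch, hidden, batch_first=True)
                self.fc = nn.Linear(hidden, n_classes)

            def forward(self, char_ids):                    # char_ids: (batch, seq_len)
                x = self.embed(char_ids).transpose(1, 2)    # (batch, emb, seq_len) for Conv1d
                x = self.pool(torch.relu(self.conv(x)))     # (batch, conv_ch, seq_len // 2)
                x = x.transpose(1, 2)                       # back to (batch, time, features)
                _, h = self.rnn(x)
                return self.fc(h[-1])                       # sentence-level class logits

        print(CRNN()(torch.randint(0, 128, (8, 200))).shape)   # torch.Size([8, 4])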

    Concept Transfer Learning for Adaptive Language Understanding

    Su Zhu, Kai Yu
    Comments: 10 pages, 4 figures
    Subjects: Computation and Language (cs.CL)

    Semantic transfer is an important problem in language understanding (LU),
    concerning how the recognition pattern of one semantic concept benefits other
    associated concepts. In this paper, we propose a new semantic representation
    based on combinatory concepts. A semantic slot is represented as a composition
    of different atomic concepts in different semantic dimensions. Specifically,
    we propose concept transfer learning methods for extending combinatory
    concepts in LU. Concept transfer learning makes use of the common ground of
    combinatory concepts shown in the literal description. Our methods are applied
    to two adaptive LU problems: semantic slot refinement and domain adaptation,
    and evaluated respectively on two benchmark LU datasets: ATIS and DSTC 2&3. The
    experimental results show that concept transfer learning is very efficient for
    semantic slot refinement and domain adaptation in LU.

    Task-specific Word Identification from Short Texts Using a Convolutional Neural Network

    Shuhan Yuan, Xintao Wu, Yang Xiang
    Comments: accepted by Intelligent Data Analysis, an International Journal
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Learning (cs.LG)

    Task-specific word identification aims to choose the task-related words that
    best describe a short text. Existing approaches require well-defined seed words
    or lexical dictionaries (e.g., WordNet), which are often unavailable for many
    applications such as social discrimination detection and fake review detection.
    However, we often have a set of labeled short texts where each short text has a
    task-related class label, e.g., discriminatory or non-discriminatory, specified
    by users or learned by classification algorithms. In this paper, we focus on
    identifying task-specific words and phrases from short texts by exploiting
    their class labels rather than using seed words or lexical dictionaries. We
    consider the task-specific word and phrase identification as feature learning.
    We train a convolutional neural network over a set of labeled texts and use
    score vectors to localize the task-specific words and phrases. Experimental
    results on sentiment word identification show that our approach significantly
    outperforms existing methods. We further conduct two case studies to show the
    effectiveness of our approach. One case study on a crawled tweets dataset
    demonstrates that our approach can successfully capture the
    discrimination-related words/phrases. The other case study on fake review
    detection shows that our approach can identify the fake-review words/phrases.

    Improving Legal Information Retrieval by Distributional Composition with Term Order Probabilities

    Danilo S. Carvalho, Duc-Vu Tran, Van-Khanh Tran, Le-Nguyen Minh
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    Legal professionals worldwide are currently trying to keep pace with the
    explosive growth in legal document availability through digital means. This
    drives a need for highly efficient Legal Information Retrieval (IR) and Question
    Answering (QA) methods. The IR task in particular has a set of unique
    challenges that invite the use of semantically motivated NLP techniques. In this
    work, a two-stage method for Legal Information Retrieval is proposed, combining
    lexical statistics and distributional sentence representations in the context
    of Competition on Legal Information Extraction/Entailment (COLIEE). The
    combination is done with the use of disambiguation rules, applied over the
    rankings obtained through n-gram statistics. After the ranking is done, its
    results are evaluated for ambiguity, and disambiguation is done if a result is
    decided to be unreliable for a given query. Competition and experimental
    results indicate small gains in overall retrieval performance using the
    proposed approach. Additionally, an analysis of error and improvement cases is
    presented for a better understanding of the contributions.

    Wikipedia Vandal Early Detection: from User Behavior to User Embedding

    Shuhan Yuan, Panpan Zheng, Xintao Wu, Yang Xiang
    Comments: 14 pages, 3 figures
    Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Computers and Society (cs.CY)

    Wikipedia is the largest online encyclopedia that allows anyone to edit
    articles. In this paper, we propose the use of deep learning to detect vandals
    based on their edit history. In particular, we develop a multi-source
    long short-term memory network (M-LSTM) to model user behaviors by using a
    variety of user edit aspects as inputs, including the history of edit reversion
    information, edit page titles and categories. With M-LSTM, we can encode each
    user into a low dimensional real vector, called user embedding. Meanwhile, as a
    sequential model, M-LSTM updates the user embedding each time after the user
    commits a new edit. Thus, we can predict whether a user is benign or vandal
    dynamically based on the up-to-date user embedding. Furthermore, those user
    embeddings are crucial to discover collaborative vandals.


    Distributed, Parallel, and Cluster Computing

    Distributed Contingency Analysis over Wide Area Network among Dispatch Centers

    Zhengwei Ren, Ying Chen, Shaowei Huang, Shuang Sheng, Huiping Zheng, Xinyuan Liu
    Comments: 5 pages, 6 figures, 2017 IEEE PES General Meeting
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Traditionally, a regional dispatch center uses the equivalent method to deal
    with external grids, which fails to reflect the interactions among regions.
    This paper proposes a distributed N-1 contingency analysis (DCA) solution,
    where dispatch centers join a coordinated computation using their private data
    and computing resources. A distributed screening method is presented to
    determine the Critical Contingency Set (DCCS) in DCA. Then, the distributed
    power flow is formulated as a set of boundary equations, which is solved by a
    Jacobi-Free Newton-GMRES (JFNG) method. While solving the distributed power
    flow, only boundary conditions are exchanged. Acceleration techniques are also
    introduced, including reusing preconditioners and optimal resource scheduling
    during parallel processing of multiple contingencies. The proposed method is
    implemented on a real EMS platform, where tests using the Southwest Regional
    Grid of China are carried out to validate its feasibility.

    MobiRNN: Efficient Recurrent Neural Network Execution on Mobile GPU

    Qingqing Cao, Niranjan Balasubramanian, Aruna Balasubramanian
    Comments: Published at 1st International Workshop on Embedded and Mobile Deep Learning colocated with MobiSys 2017
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    In this paper, we explore optimizations to run Recurrent Neural Network (RNN)
    models locally on mobile devices. RNN models are widely used for Natural
    Language Processing, Machine Translation, and other tasks. However, existing
    mobile applications that use RNN models do so on the cloud. To address privacy
    and efficiency concerns, we show how RNN models can be run locally on mobile
    devices. Existing work on porting deep learning models to mobile devices focuses
    on Convolutional Neural Networks (CNNs) and cannot be applied directly to RNN
    models. In response, we present MobiRNN, a mobile-specific optimization
    framework that implements GPU offloading specifically for mobile GPUs.
    Evaluations using an RNN model for activity recognition show that MobiRNN
    significantly decreases the latency of running RNN models on phones.

    Neuro-RAM Unit with Applications to Similarity Testing and Compression in Spiking Neural Networks

    Nancy Lynch, Cameron Musco, Merav Parter
    Subjects: Neural and Evolutionary Computing (cs.NE); Distributed, Parallel, and Cluster Computing (cs.DC); Neurons and Cognition (q-bio.NC)

    We study distributed algorithms implemented in a simplified but biologically
    plausible model for stochastic spiking neural networks. We focus on tradeoffs
    between computation time and network complexity, along with the role of
    randomness in efficient neural computation.

    It is widely accepted that neural computation is inherently stochastic. In
    recent work, we explored how this stochasticity could be leveraged to solve the
    `winner-take-all’ leader election task. Here, we focus on using randomness in
    neural algorithms for similarity testing and compression. In the most basic
    setting, given two (n)-length patterns of firing neurons, we wish to
    distinguish if the patterns are equal or (epsilon)-far from equal.

    Randomization allows us to solve this task with a very compact network, using
    (O\left(\frac{\sqrt{n}\log n}{\epsilon}\right)) auxiliary neurons, which is
    sublinear in the input size. At the heart of our solution is the design of a
    (t)-round neural random access memory, or indexing network, which we call a
    neuro-RAM. This module can be implemented with (O(n/t)) auxiliary neurons and
    is useful in many applications beyond similarity testing.

    Using a combination of Yao’s minimax principle and a VC dimension-based
    argument, we show that the tradeoff between runtime and network size in our
    neuro-RAM is nearly optimal. Our result has several implications — since our
    neuro-RAM construction can be implemented with deterministic threshold gates,
    it demonstrates that, in contrast to similarity testing, randomness does not
    provide significant computational advantages for this problem. It also
    establishes a separation between our networks, which spike with a sigmoidal
    probability function, and well-studied but less biologically plausible
    deterministic sigmoidal networks, whose gates output real number values, and
    which can implement a neuro-RAM much more efficiently.

    Design and Execution of make-like, distributed Analyses based on Spotify's Pipelining Package Luigi

    Marcel Rieger, Martin Erdmann, Benjamin Fischer, Robert Fischer
    Subjects: Data Analysis, Statistics and Probability (physics.data-an); Distributed, Parallel, and Cluster Computing (cs.DC)

    In high-energy particle physics, workflow management systems are primarily
    used as tailored solutions in dedicated areas such as Monte Carlo production.
    However, physicists performing data analyses are usually required to steer
    their individual workflows manually which is time-consuming and often leads to
    undocumented relations between particular workloads. We present a generic
    analysis design pattern that copes with the sophisticated demands of end-to-end
    HEP analyses and provides a make-like execution system. It is based on the
    open-source pipelining package Luigi which was developed at Spotify and enables
    the definition of arbitrary workloads, so-called Tasks, and the dependencies
    between them in a lightweight and scalable structure. Further features are
    multi-user support, automated dependency resolution and error handling, central
    scheduling, and status visualization in the web. In addition to already
    built-in features for remote jobs and file systems like Hadoop and HDFS, we
    added support for WLCG infrastructure such as LSF and CREAM job submission, as
    well as remote file access through the Grid File Access Library. Furthermore,
    we implemented automated resubmission functionality, software sandboxing, and a
    command line interface with auto-completion for a convenient working
    environment. For the implementation of a (t\bar{t}H) cross section measurement,
    we created a generic Python interface that provides programmatic access to all
    external information such as datasets, physics processes, statistical models,
    and additional files and values. In summary, the setup enables the execution of
    the entire analysis in a parallelized and distributed fashion with a single
    command.
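
    For readers unfamiliar with Luigi, the pattern underlying such an analysis
    chain looks roughly as follows: every workload is a Task whose dependencies
    are declared in requires(), whose products are declared in output(), and whose
    work happens in run(). The example below only illustrates this core Luigi
    pattern with placeholder file names; the authors' HEP-specific extensions
    (WLCG submission, sandboxing, and so on) are not shown.

        # Minimal Luigi pipeline: a dependency is resolved automatically before the
        # downstream task runs (file names are placeholders).
        import luigi

        class SelectEvents(luigi.Task):
            dataset = luigi.Parameter()

            def output(self):
                return luigi.LocalTarget(f"selected_{self.dataset}.txt")

            def run(self):
                with self.output().open("w") as f:
                    f.write(f"selected events for {self.dataset}\n")

        class MeasureCrossSection(luigi.Task):
            dataset = luigi.Parameter(default="ttH")

            def requires(self):                  # make-like dependency declaration
                return SelectEvents(dataset=self.dataset)

            def output(self):
                return luigi.LocalTarget(f"xsec_{self.dataset}.txt")

            def run(self):
                with self.input().open() as fin, self.output().open("w") as fout:
                    fout.write("measurement based on: " + fin.read())

        if __name__ == "__main__":
            luigi.build([MeasureCrossSection()], local_scheduler=True)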


    Learning

    Multi-Observation Elicitation

    Sebastian Casalaina-Martin, Rafael Frongillo, Tom Morgan, Bo Waggoner
    Subjects: Learning (cs.LG)

    We study loss functions that measure the accuracy of a prediction based on
    multiple data points simultaneously. To our knowledge, such loss functions have
    not been studied before in the area of property elicitation or in machine
    learning more broadly. As compared to traditional loss functions that take only
    a single data point, these multi-observation loss functions can in some cases
    drastically reduce the dimensionality of the hypothesis required. In
    elicitation, this corresponds to requiring many fewer reports; in empirical
    risk minimization, it corresponds to algorithms on a hypothesis space of much
    smaller dimension. We explore some examples of the tradeoff between
    dimensionality and number of observations, give some geometric
    characterizations and intuition for relating loss functions and the properties
    that they elicit, and discuss some implications for both elicitation and
    machine-learning contexts.

    Sparse Stochastic Bandits

    Joon Kwon, Vianney Perchet, Claire Vernade
    Subjects: Learning (cs.LG)

    In the classical multi-armed bandit problem, d arms are available to the
    decision maker who pulls them sequentially in order to maximize his cumulative
    reward. Guarantees can be obtained on a relative quantity called regret, which
    scales linearly with d (or with sqrt(d) in the minimax sense). We here consider
    the sparse case of this classical problem in the sense that only a small number
    of arms, namely s < d, have a positive expected reward. We are able to leverage
    this additional assumption to provide an algorithm whose regret scales with s
    instead of d. Moreover, we prove that this algorithm is optimal by providing a
    matching lower bound – at least for a wide and pertinent range of parameters
    that we determine – and by evaluating its performance on simulated data.

    On the Emergence of Invariance and Disentangling in Deep Representations

    Alessandro Achille, Stefano Soatto
    Comments: Keywords: Deep learning; neural network; representation; flat minima; information bottleneck; overfitting; generalization; sufficiency; minimality; sensitivity; information complexity; stochastic gradient descent; regularization; total correlation
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Using classical notions of statistical decision and information theory, we
    show that invariance in a deep neural network is equivalent to minimality of
    the representation it computes, and can be achieved by stacking layers and
    injecting noise in the computation, under realistic and empirically validated
    assumptions. We use an Information Decomposition of the empirical loss to show
    that overfitting can be reduced by limiting the information content stored in
    the weights. We then present a sharp inequality that relates the information
    content in the weights — which are a representation of the training set and
    inferred by generic optimization agnostic of invariance and disentanglement —
    and the minimality and total correlation of the activation functions, which are
    a representation of the test datum. This allows us to tackle recent puzzles
    concerning the generalization properties of deep networks and their relation to
    the geometry of the optimization residual.

    Learning Neural Programs To Parse Programs

    Xinyun Chen, Chang Liu, Dawn Song
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)

    In this work, we study an important problem: learning programs from
    input-output examples. We propose a novel method to learn a neural program
    operating a domain-specific non-differentiable machine, and demonstrate that
    this method can be applied to learn programs that are significantly more
    complex than the ones synthesized before: programming language parsers from
    input-output pairs without knowing the underlying grammar. The main challenge
    is to train the neural program without supervision on execution traces. To
    tackle it, we propose: (1) LL machines and neural programs operating them to
    effectively regularize the space of the learned programs; and (2) a two-phase
    reinforcement learning-based search technique to train the model. Our
    evaluation demonstrates that our approach can successfully learn to parse
    programs in both an imperative language and a functional language, and achieve
    100% test accuracy, while existing approaches’ accuracies are almost 0%. This
    is the first successful demonstration of applying reinforcement learning to
    train a neural program operating a non-differentiable machine that can fully
    generalize to test sets on a non-trivial task.

    Compressing Deep Neural Network Structures for Sensing Systems with a Compressor-Critic Framework

    Shuochao Yao, Yiran Zhao, Aston Zhang, Lu Su, Tarek Abdelzaher
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Networking and Internet Architecture (cs.NI)

    Recent advances in deep learning motivate the use of deep neural networks in
    sensing applications, but their excessive resource needs on constrained
    embedded devices remain an important impediment. A recently explored solution
    space lies in compressing (approximating or simplifying) deep neural networks
    in some manner before use on the device. We propose a new compression solution,
    called DeepIoT, that makes two key contributions in that space. First, unlike
    current solutions geared for compressing specific types of neural networks,
    DeepIoT presents a unified approach that compresses all commonly used deep
    learning structures for sensing applications, including fully-connected,
    convolutional, and recurrent neural networks, as well as their combinations.
    Second, unlike solutions that either sparsify weight matrices or assume linear
    structure within weight matrices, DeepIoT compresses neural network structures
    into smaller dense matrices by finding the minimum number of non-redundant
    hidden elements, such as filters and dimensions required by each layer, while
    keeping the performance of sensing applications the same. Importantly, it does
    so using an approach that obtains a global view of parameter redundancies,
    which is shown to produce superior compression. We conduct experiments with
    five different sensing-related tasks on Intel Edison devices. DeepIoT
    outperforms all compared baseline algorithms with respect to execution time and
    energy consumption by a significant margin. It reduces the size of deep neural
    networks by 90% to 98.9%. It is thus able to shorten execution time by 71.4% to
    94.5%, and decrease energy consumption by 72.2% to 95.7%. These improvements
    are achieved without loss of accuracy. The results underscore the potential of
    DeepIoT for advancing the exploitation of deep neural networks on
    resource-constrained embedded devices.
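
    To make the idea of shrinking layers into smaller dense matrices concrete, here is a minimal sketch that drops hidden units of a fully-connected layer by a simple magnitude criterion; the paper instead learns which elements to drop with its compressor-critic framework, so this is only a structural illustration.

    import numpy as np

    def drop_hidden_units(W_in, W_out, keep_ratio=0.5):
        """Keep only the most 'important' hidden units of a fully-connected layer.

        W_in  : (n_in, n_hidden) weights into the layer
        W_out : (n_hidden, n_out) weights out of the layer
        The importance score used here (L2 norm of incoming weights) is a simple
        stand-in for DeepIoT's learned compressor.
        """
        importance = np.linalg.norm(W_in, axis=0)
        k = max(1, int(keep_ratio * W_in.shape[1]))
        keep = np.argsort(importance)[-k:]
        # Both matrices stay dense but become smaller.
        return W_in[:, keep], W_out[keep, :]

    rng = np.random.default_rng(0)
    W1, W2 = rng.normal(size=(64, 128)), rng.normal(size=(128, 10))
    W1_small, W2_small = drop_hidden_units(W1, W2, keep_ratio=0.25)
    print(W1_small.shape, W2_small.shape)  # (64, 32) (32, 10)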

    Inconsistent Node Flattening for Improving Top-down Hierarchical Classification

    Azad Naik, Huzefa Rangwala
    Comments: IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2016
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Large-scale classification of data where classes are structurally organized
    in a hierarchy is an important area of research. Top-down approaches that
    exploit the hierarchy during the learning and prediction phase are efficient
    for large scale hierarchical classification. However, the accuracy of top-down
    approaches is poor due to error propagation, i.e., prediction errors made at
    higher levels in the hierarchy cannot be corrected at lower levels. One of the
    main reasons behind errors at the higher levels is the presence of inconsistent
    nodes that are introduced by the arbitrary process of creating these
    hierarchies by domain experts. In this paper, we propose two different
    data-driven approaches (local and global) for hierarchical structure
    modification that identify and flatten inconsistent nodes present within the
    hierarchy. Our extensive empirical evaluation of the proposed approaches on
    several image and text datasets with varying distribution of features, classes
    and training instances per class shows improved classification performance over
    competing hierarchical modification approaches. Specifically, we see an
    improvement of up to 7% in Macro-F1 score with our approach over the best
    top-down (TD) baseline.
    SOURCE CODE: this http URL

    Evolving imputation strategies for missing data in classification problems with TPOT

    Unai Garciarena, Roberto Santana, Alexander Mendiburu
    Comments: 15 pages, 4 figures
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Missing data has a ubiquitous presence in real-life applications of machine
    learning techniques. Imputation methods are algorithms conceived for restoring
    missing values in the data, based on other entries in the database. The choice
    of the imputation method has an influence on the performance of the machine
    learning technique, e.g., it influences the accuracy of the classification
    algorithm applied to the data. Therefore, selecting and applying the right
    imputation method is important and usually requires a substantial amount of
    human intervention. In this paper we propose the use of genetic programming
    techniques to search for the right combination of imputation and classification
    algorithms. We build our work on the recently introduced Python-based TPOT
    library, and incorporate a heterogeneous set of imputation algorithms as part
    of the machine learning pipeline search. We show that genetic programming can
    automatically find increasingly better pipelines that include the most
    effective combinations of imputation methods, feature pre-processing, and
    classifiers for a variety of classification problems with missing data.
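
    A minimal sketch of combining an imputer with TPOT's pipeline search is shown below; note that the paper integrates imputation operators into TPOT's evolutionary search itself, whereas this example merely applies a scikit-learn imputer before a standard TPOT run, and the dataset and settings are illustrative.

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.impute import SimpleImputer
    from sklearn.model_selection import train_test_split
    from tpot import TPOTClassifier

    X, y = load_breast_cancer(return_X_y=True)
    # Introduce some missing values to mimic the setting of the paper.
    rng = np.random.default_rng(0)
    X[rng.random(X.shape) < 0.1] = np.nan

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # External imputation step (the paper instead makes imputation part of the
    # evolutionary pipeline search itself).
    imputer = SimpleImputer(strategy="median")
    X_train, X_test = imputer.fit_transform(X_train), imputer.transform(X_test)

    # Evolutionary search over preprocessing + classifier pipelines.
    tpot = TPOTClassifier(generations=2, population_size=20, random_state=0, verbosity=2)
    tpot.fit(X_train, y_train)
    print(tpot.score(X_test, y_test))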

    Nearly Optimal Sampling Algorithms for Combinatorial Pure Exploration

    Lijie Chen, Anupam Gupta, Jian Li, Mingda Qiao, Ruosong Wang
    Comments: Accepted to COLT 2017
    Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)

    We study the combinatorial pure exploration problem Best-Set in stochastic
    multi-armed bandits. In a Best-Set instance, we are given (n) arms with unknown
    reward distributions, as well as a family (mathcal{F}) of feasible subsets
    over the arms. Our goal is to identify the feasible subset in (mathcal{F})
    with the maximum total mean using as few samples as possible. The problem
    generalizes the classical best arm identification problem and the top-(k) arm
    identification problem, both of which have attracted significant attention in
    recent years. We provide a novel instance-wise lower bound for the sample
    complexity of the problem, as well as a nontrivial sampling algorithm, matching
    the lower bound up to a factor of (ln|mathcal{F}|). For an important class of
    combinatorial families, we also provide a polynomial-time implementation of the
    sampling algorithm, using the equivalence of separation and optimization for
    convex programs, and approximate Pareto curves in multi-objective optimization.
    We also show that the (ln|mathcal{F}|) factor is inevitable in general
    through a nontrivial lower bound construction. Our results significantly
    improve upon previous results for several important combinatorial
    constraints, and provide a tighter understanding of the general Best-Set
    problem.

    We further introduce an even more general problem, formulated in geometric
    terms. We are given (n) Gaussian arms with unknown means and unit variance.
    Consider the (n)-dimensional Euclidean space (mathbb{R}^n), and a collection
    (mathcal{O}) of disjoint subsets. Our goal is to determine the subset in
    (mathcal{O}) that contains the (n)-dimensional vector of the means. The
    problem generalizes most pure exploration bandit problems studied in the
    literature. We provide the first nearly optimal sample complexity upper and
    lower bounds for the problem.

    Adaptive Multiple-Arm Identification

    Jiecao Chen, Xi Chen, Qin Zhang, Yuan Zhou
    Comments: 30 pages, 5 figures, preliminary version to appear in ICML 2017
    Subjects: Learning (cs.LG)

    We study the problem of selecting (K) arms with the highest expected rewards
    in a stochastic (n)-armed bandit game. This problem has a wide range of
    applications, e.g., A/B testing, crowdsourcing, simulation optimization. Our
    goal is to develop a PAC algorithm, which, with probability at least
    (1-delta), identifies a set of (K) arms with the aggregate regret at most
    (epsilon). The notion of aggregate regret for multiple-arm identification,
    first introduced in cite{Zhou:14}, is defined as the difference between the
    average expected rewards of the selected set of arms and of the best (K)
    arms. In contrast to cite{Zhou:14}, which only provides instance-independent
    sample complexity, we introduce a new hardness parameter for characterizing the
    difficulty of any given instance. We further develop two algorithms and
    establish the corresponding sample complexity in terms of this hardness
    parameter. The derived sample complexity can be significantly smaller than
    state-of-the-art results for a large class of instances and matches the
    instance-independent lower bound up to a (log(epsilon^{-1})) factor in the
    worst case. We also prove a lower bound result showing that the extra
    (log(epsilon^{-1})) is necessary for instance-dependent algorithms using the
    introduced hardness parameter.

    Non-convex Penalties with Analytical Solutions for One-bit Compressive Sensing

    Xiaolin Huang, Ming Yan
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    One-bit measurements widely exist in the real world, and they can be used to
    recover sparse signals. This task is known as the problem of learning
    halfspaces in learning theory and one-bit compressive sensing (1bit-CS) in
    signal processing. In this paper, we propose novel algorithms based on both
    convex and nonconvex sparsity-inducing penalties for robust 1bit-CS. We provide
    a sufficient condition to verify whether a solution is globally optimal or not.
    Then we show that the globally optimal solution for positive homogeneous
    penalties can be obtained in two steps: a proximal operator and a normalization
    step. For several nonconvex penalties, including minimax concave penalty (MCP),
    (ell_0) norm, and sorted (ell_1) penalty, we provide fast algorithms for
    finding the analytical solutions by solving the dual problem. Specifically, our
    algorithm is more than (200) times faster than the existing algorithm for MCP.
    Its efficiency is comparable to the algorithm for the (ell_1) penalty in time,
    while its performance is much better. Among these penalties, the sorted
    (ell_1) penalty is most robust to noise in different settings.
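
    The two-step structure mentioned above (a proximal step followed by normalization) can be illustrated for the (ell_1) penalty as follows; this is a hedged sketch of the general recipe, not the paper's fast dual algorithms for the nonconvex penalties.

    import numpy as np

    def soft_threshold(v, lam):
        # Proximal operator of the l1 norm.
        return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

    def one_bit_recover(A, y, lam=0.1):
        """Illustrative two-step recovery: proximal step, then normalization.

        A : (m, n) measurement matrix, y : (m,) vector of +/-1 one-bit measurements.
        The correlation vector A^T y is used as the input to the proximal step;
        this mirrors the structure described in the abstract, not the exact
        objective or the dual algorithms of the paper.
        """
        v = A.T @ y
        x = soft_threshold(v, lam * np.max(np.abs(v)))
        norm = np.linalg.norm(x)
        return x / norm if norm > 0 else x

    rng = np.random.default_rng(0)
    n, m, s = 200, 500, 5
    x_true = np.zeros(n); x_true[:s] = rng.normal(size=s)
    x_true /= np.linalg.norm(x_true)
    A = rng.normal(size=(m, n))
    y = np.sign(A @ x_true)          # one-bit measurements
    x_hat = one_bit_recover(A, y)
    print("correlation with truth:", float(x_hat @ x_true))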

    DeepSF: deep convolutional neural network for mapping protein sequences to folds

    Jie Hou, Badri Adhikari, Jianlin Cheng
    Comments: 28 pages, 13 figures
    Subjects: Learning (cs.LG); Biomolecules (q-bio.BM)

    Motivation

    Protein fold recognition is an important problem in structural
    bioinformatics. Almost all traditional fold recognition methods use sequence
    (homology) comparison to indirectly predict the fold of a target protein based
    on the fold of a template protein with known structure, which cannot explain
    the relationship between sequence and fold. Due to methodological limitations,
    only a few methods have been developed to classify protein sequences into a
    small number of folds, and these are not generally useful in practice.

    Results

    We develop a deep 1D-convolutional neural network (DeepSF) to directly classify
    any protein sequence into one of 1195 known folds, which is useful for both
    fold recognition and the study of the sequence-structure relationship. Different
    from traditional sequence alignment (comparison) based methods, our method
    automatically extracts fold-related features from a protein sequence of any
    length and maps them to the fold space. We train and test our method on the
    datasets curated from SCOP1.75, yielding a classification accuracy of 80.4%. On
    the independent testing dataset curated from SCOP2.06, the classification
    accuracy is 77.0%. We compare our method with a top profile-profile alignment
    method, HHSearch, on hard template-based and template-free modeling targets of
    CASP9-12 in terms of fold recognition accuracy. The accuracy of our method is
    14.5%-29.1% higher than HHSearch on template-free modeling targets and
    4.5%-16.7% higher on hard template-based modeling targets for top 1, 5, and 10
    predicted folds. The hidden features extracted from the sequence by our method
    are robust against sequence mutation, insertion, deletion and truncation, and can
    be used for other protein pattern recognition problems such as protein
    clustering, comparison and ranking.

    Swarm Intelligence in Semi-supervised Classification

    Shahira Shaaban Azab, Hesham Ahmed Hefny
    Journal-ref: Data Mining And Knowledge Engineering, 9(5), 99-103 (2017)
    Subjects: Learning (cs.LG)

    This paper presents a literature review of swarm intelligence (SI) algorithms
    in the area of semi-supervised classification. There are many research papers
    on applying swarm intelligence algorithms in the area of machine learning. Some
    SI algorithms are applied in ML either on their own or hybridized with
    other ML algorithms. SI algorithms are also used for tuning the parameters of
    ML algorithms, or as a backbone for ML algorithms. This paper gives a brief
    literature review of applying swarm intelligence algorithms in the field of
    semi-supervised learning.

    Center of Gravity PSO for Partitioning Clustering

    Shahira Shaaban Azab, Hesham Ahmed Hefny
    Journal-ref: International Journal of Advanced Research in Computer Science and
    Software Engineering,Volume 7, Issue 5, May 2017 ISSN: 2277 128X
    Subjects: Learning (cs.LG)

    This paper presents the local best model of PSO for partition-based
    clustering. The proposed model avoids the drawbacks of gbest PSO for
    clustering. The model uses a pre-specified number of clusters K. The LPOSC has
    K neighborhoods, each representing one of the clusters. The goal of the
    particles in each neighborhood is to optimize the position of the cluster's
    centroid. The performance of the proposed algorithm is measured using the
    adjusted Rand index. The results are compared with k-means and the global best
    model of PSO.
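
    For context, a compact sketch of the standard global-best PSO over centroid positions is given below; the paper's LPOSC instead uses a local-best scheme with one neighborhood per cluster, so this code only illustrates the baseline being improved upon.

    import numpy as np

    def pso_cluster(X, k, n_particles=20, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
        """Global-best PSO over centroid positions (illustrative only). Each
        particle encodes k centroids; fitness is the within-cluster sum of
        squared distances to the nearest centroid."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        pos = X[rng.integers(0, n, size=(n_particles, k))].copy()   # (P, k, d)
        vel = np.zeros_like(pos)

        def fitness(centroids):
            dists = np.linalg.norm(X[:, None, :] - centroids[None], axis=2)
            return dists.min(axis=1).sum()

        pbest = pos.copy()
        pbest_fit = np.array([fitness(p) for p in pos])
        gbest = pbest[pbest_fit.argmin()].copy()
        for _ in range(iters):
            r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
            vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
            pos += vel
            fits = np.array([fitness(p) for p in pos])
            improved = fits < pbest_fit
            pbest[improved], pbest_fit[improved] = pos[improved], fits[improved]
            gbest = pbest[pbest_fit.argmin()].copy()
        return gbest

    X = np.vstack([np.random.default_rng(1).normal(m, 0.3, size=(50, 2)) for m in (0, 3, 6)])
    print(pso_cluster(X, k=3).round(2))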

    Semi-supervised Classification: Cluster and Label Approach using Particle Swarm Optimization

    Shahira Shaaban Azab, Mohamed Farouk Abdel Hady, Hesham Ahmed Hefny
    Journal-ref: International Journal of Computer Applications (0975-8887), Volume
    160, No 3, February 2017
    Subjects: Learning (cs.LG)

    Classification predicts the classes of objects using knowledge learned during
    the training phase. This process requires learning from labeled samples.
    However, labeled samples are usually limited. The annotation process is
    annoying, tedious, expensive, and requires human experts. Meanwhile, unlabeled
    data is available and almost free. Semi-supervised learning approaches make use
    of both labeled and unlabeled data. This paper introduces a cluster-and-label
    approach using PSO for semi-supervised classification. PSO is competitive with
    traditional clustering algorithms. A new local best PSO is presented to cluster
    the unlabeled data, with the available labeled data guiding the learning
    process. The experiments are conducted using four state-of-the-art datasets
    from different domains. The results are compared with Label Propagation, a
    popular semi-supervised classifier, and two state-of-the-art supervised
    classification models, namely k-nearest neighbors and decision trees. The
    experiments show the efficiency of the proposed model.

    Thompson Sampling for the MNL-Bandit

    Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, Assaf Zeevi
    Comments: Accepted for presentation at Conference on Learning Theory (COLT) 2017
    Subjects: Learning (cs.LG)

    We consider a sequential subset selection problem under parameter
    uncertainty, where at each time step, the decision maker selects a subset of
    cardinality (K) from (N) possible items (arms), and observes a (bandit)
    feedback in the form of the index of one of the items in said subset, or none.
    Each item in the index set is ascribed a certain value (reward), and the
    feedback is governed by a Multinomial Logit (MNL) choice model whose parameters
    are a priori unknown. The objective of the decision maker is to maximize the
    expected cumulative rewards over a finite horizon (T), or alternatively,
    minimize the regret relative to an oracle that knows the MNL parameters. We
    refer to this as the MNL-Bandit problem. This problem is representative of a
    larger family of exploration-exploitation problems that involve a combinatorial
    objective, and arise in several important application domains. We present an
    approach to adapt Thompson Sampling to this problem and show that it achieves
    near-optimal regret as well as attractive numerical performance.
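
    A schematic simulation of the MNL feedback model with a crude Thompson-sampling-style loop is sketched below; the posterior used here is a rough Gaussian stand-in and the item rewards are taken to be unit, so this does not reproduce the paper's algorithm or analysis.

    import numpy as np

    rng = np.random.default_rng(0)
    N, K, T = 10, 3, 2000
    v_true = rng.uniform(0.1, 1.0, size=N)       # unknown MNL attraction parameters

    # Crude Gaussian "posterior" over each item's attractiveness, for illustration
    # only; the paper derives the appropriate posterior updates for the MNL-Bandit.
    mean, count = np.ones(N), np.ones(N)

    def mnl_choice(S, v):
        # Customer picks item i in S with prob v_i / (1 + sum_j v_j), else no purchase.
        p = np.append(v[S], 1.0)
        p /= p.sum()
        idx = rng.choice(len(p), p=p)
        return S[idx] if idx < len(S) else None

    for t in range(T):
        v_sample = rng.normal(mean, 1.0 / np.sqrt(count)).clip(1e-3)
        S = np.argsort(v_sample)[-K:]             # with unit rewards, top-K is optimal
        chosen = mnl_choice(S, v_true)
        for i in S:                               # naive frequency-style update
            count[i] += 1
            mean[i] += ((1.0 if chosen == i else 0.0) - mean[i]) / count[i]

    print("estimated top items:", np.argsort(mean)[-K:], "true:", np.argsort(v_true)[-K:])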

    Financial Series Prediction: Comparison Between Precision of Time Series Models and Machine Learning Methods

    Xin-Yao Qian, Shan Gao
    Subjects: Learning (cs.LG); Statistical Finance (q-fin.ST)

    Precise financial series prediction has long been a difficult problem because
    of the instability of and noise within the series. Although traditional time
    series models like ARIMA and GARCH have been studied and proved to be effective
    for prediction, their performance is still far from satisfying. Machine
    learning, as an emerging research field in recent years, has brought about many
    incredible improvements in tasks such as regression and classification, and it
    is also promising to exploit this methodology in financial time series
    prediction. In this paper, the predictive precision of traditional time series
    models and mainstream machine learning models, including some state-of-the-art
    deep learning models, is compared through experiments on historical stock index
    data. The results show that machine learning, as a modern method, far surpasses
    traditional models in precision.

    Multiple Kernel Learning and Automatic Subspace Relevance Determination for High-dimensional Neuroimaging Data

    Murat Seckin Ayhan, Vijay Raghavan, Alzheimer's disease Neuroimaging Initiative
    Comments: The material presented here is to promote the dissemination of scholarly and technical work in a timely fashion. Data in this article are from ADNI (adni.loni.usc.edu). As such, ADNI provided data but did not participate in writing of this report
    Subjects: Learning (cs.LG); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)

    Alzheimer’s disease is a major cause of dementia. Its diagnosis requires
    accurate biomarkers that are sensitive to disease stages. In this respect, we
    regard probabilistic classification as a method of designing a probabilistic
    biomarker for disease staging. Probabilistic biomarkers naturally support the
    interpretation of decisions and evaluation of uncertainty associated with them.
    In this paper, we obtain probabilistic biomarkers via Gaussian Processes.
    Gaussian Processes enable probabilistic kernel machines that offer flexible
    means to accomplish Multiple Kernel Learning. Exploiting this flexibility, we
    propose a new variation of Automatic Relevance Determination and tackle the
    challenges of high dimensionality through multiple kernels. Our research
    results demonstrate that the Gaussian Process models are competitive with or
    better than the well-known Support Vector Machine in terms of classification
    performance even in the cases of single kernel learning. Extending the basic
    scheme towards the Multiple Kernel Learning, we improve the efficacy of the
    Gaussian Process models and their interpretability in terms of the known
    anatomical correlates of the disease. For instance, the disease pathology
    starts in and around the hippocampus and entorhinal cortex. Through the use of
    Gaussian Processes and Multiple Kernel Learning, we have automatically and
    efficiently determined those portions of neuroimaging data. In addition to
    their interpretability, our Gaussian Process models are competitive with recent
    deep learning solutions under similar settings.

    Online Dynamic Programming

    Holakou Rahmanian, S.V.N. Vishwanathan, Manfred K. Warmuth
    Subjects: Learning (cs.LG)

    We consider the problem of repeatedly solving a variant of the same dynamic
    programming problem in successive trials. An instance of the type of problems
    we consider is to find the optimal binary search tree. At the beginning of each
    trial, the learner probabilistically chooses a tree with the n keys at the
    internal nodes and the n + 1 gaps between keys at the leaves. It is then told
    the frequencies of the keys and gaps and is charged by the average search cost
    for the chosen tree. The problem is online because the frequencies can change
    between trials. The goal is to develop algorithms with the property that their
    total average search cost (loss) in all trials is close to the total loss of
    the best tree chosen in hindsight for all trials. The challenge, of course, is
    that the algorithm has to deal with an exponential number of trees. We develop a
    methodology for tackling such problems for a wide class of dynamic programming
    algorithms. Our framework allows us to extend online learning algorithms like
    Hedge and Component Hedge to a significantly wider class of combinatorial
    objects than was possible before.
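
    For reference, the standard Hedge update over a small, explicit set of experts looks as follows; the paper's contribution is making such updates tractable when the experts are exponentially many combinatorial objects, which this sketch does not attempt.

    import numpy as np

    def hedge(losses, eta=0.5):
        """Run Hedge on a (T, K) matrix of per-expert losses in [0, 1].

        Returns the final weight vector and the learner's cumulative expected loss.
        """
        T, K = losses.shape
        w = np.ones(K) / K
        total = 0.0
        for t in range(T):
            total += float(w @ losses[t])          # expected loss of the random expert
            w *= np.exp(-eta * losses[t])          # multiplicative-weights update
            w /= w.sum()
        return w, total

    rng = np.random.default_rng(0)
    L = rng.random((1000, 5)); L[:, 2] *= 0.3      # expert 2 is best in hindsight
    w, total = hedge(L)
    print("final weights:", np.round(w, 3), "cumulative loss:", round(total, 1))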

    Information, Privacy and Stability in Adaptive Data Analysis

    Adam Smith
    Comments: 15 pages, first drafted February 2017. A version of this survey appears in the Information Theory Society Newsletter
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Traditional statistical theory assumes that the analysis to be performed on a
    given data set is selected independently of the data themselves. This
    assumption breaks down when data are re-used across analyses and the analysis
    to be performed at a given stage depends on the results of earlier stages. Such
    dependency can arise when the same data are used by several scientific studies,
    or when a single analysis consists of multiple stages.

    How can we draw statistically valid conclusions when data are re-used? This
    is the focus of a recent and active line of work. At a high level, these
    results show that limiting the information revealed by earlier stages of
    analysis controls the bias introduced in later stages by adaptivity.

    Here we review some known results in this area and highlight the role of
    information-theoretic concepts, notably several one-shot notions of mutual
    information.

    A Joint Model for Question Answering and Question Generation

    Tong Wang, Xingdi Yuan, Adam Trischler
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We propose a generative machine comprehension model that learns jointly to
    ask and answer questions based on documents. The proposed model uses a
    sequence-to-sequence framework that encodes the document and generates a
    question (answer) given an answer (question). Significant improvement in model
    performance is observed empirically on the SQuAD corpus, confirming our
    hypothesis that the model benefits from jointly learning to perform both tasks.
    We believe the joint model’s novelty offers a new perspective on machine
    comprehension beyond architectural engineering, and serves as a first step
    towards autonomous information seeking.

    A simple neural network module for relational reasoning

    Adam Santoro, David Raposo, David G.T. Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, Timothy Lillicrap
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Relational reasoning is a central component of generally intelligent
    behavior, but has proven difficult for neural networks to learn. In this paper
    we describe how to use Relation Networks (RNs) as a simple plug-and-play module
    to solve problems that fundamentally hinge on relational reasoning. We tested
    RN-augmented networks on three tasks: visual question answering using a
    challenging dataset called CLEVR, on which we achieve state-of-the-art,
    super-human performance; text-based question answering using the bAbI suite of
    tasks; and complex reasoning about dynamic physical systems. Then, using a
    curated dataset called Sort-of-CLEVR we show that powerful convolutional
    networks do not have a general capacity to solve relational questions, but can
    gain this capacity when augmented with RNs. Our work shows how a deep learning
    architecture equipped with an RN module can implicitly discover and learn to
    reason about entities and their relations.

    Learning Whenever Learning is Possible: Universal Learning under General Stochastic Processes

    Steve Hanneke
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST)

    This work initiates a general study of learning and generalization without
    the i.i.d. assumption, starting from first principles. While the standard
    approach to statistical learning theory is based on assumptions chosen largely
    for their convenience (e.g., i.i.d. or stationary ergodic), in this work we are
    interested in developing a theory of learning based only on the most
    fundamental and natural assumptions implicit in the requirements of the
    learning problem itself. We specifically study universally consistent function
    learning, where the objective is to obtain low long-run average loss for any
    target function, when the data follow a given stochastic process. We are then
    interested in the question of whether there exist learning rules guaranteed to
    be universally consistent given only the assumption that universally consistent
    learning is possible for the given data process. The reasoning that motivates
    this criterion emanates from a kind of optimist’s decision theory, and so we
    refer to such learning rules as being optimistically universal. We study this
    question in three natural learning settings: inductive, self-adaptive, and
    online. Remarkably, as our strongest positive result, we find that
    optimistically universal learning rules do indeed exist in the self-adaptive
    learning setting. Establishing this fact requires us to develop new approaches
    to the design of learning algorithms. Along the way, we also identify concise
    characterizations of the family of processes under which universally consistent
    learning is possible in the inductive and self-adaptive settings. We
    additionally pose a number of enticing open problems, particularly for the
    online learning setting.

    Automatic Response Assessment in Regions of Language Cortex in Epilepsy Patients Using ECoG-based Functional Mapping and Machine Learning

    Harish RaviPrakash, Milena Korostenskaja, Eduardo Castillo, Ki Lee, James Baumgartner, Ulas Bagci
    Comments: This paper will appear in the Proceedings of IEEE International Conference on Systems, Man and Cybernetics (SMC) 2017
    Subjects: Neurons and Cognition (q-bio.NC); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Brain regions responsible for language and cognitive functions in epilepsy
    patients should be accurately localized prior to surgery.
    Electrocorticography (ECoG)-based Real Time Functional Mapping (RTFM) has been
    shown to be a safer alternative to electrical cortical stimulation mapping
    (ESM), which is currently the clinical/gold standard. Conventional methods for
    analyzing RTFM signals are based on statistical comparison of signal power at
    certain frequency bands with limited response assessment accuracies. This
    inherently leads to low accuracies of functional mapping results when compared
    with gold standard.

    In this study, we address the limitation of the current RTFM signal
    estimation methods by analyzing the full frequency spectrum of the signal and
    applying machine learning algorithms, specifically random forest (RF). We train
    RF with power spectral density of the time-series RTFM signal in supervised
    learning framework where ground truth labels are obtained from the ESM.
    Experimental results obtained from RTFM of six adult patients in a strictly
    controlled experimental setup reveal a state-of-the-art detection accuracy of
    (approx 78\%) for the language comprehension task, an improvement of (23\%)
    over the conventional RTFM estimation method. To the best of our knowledge,
    this is the first study exploring the use of machine learning approaches for
    determining RTFM signal characteristics, and using the whole-frequency band for
    better region localization. Our results demonstrate the feasibility of machine
    learning based RTFM signal analysis over the full spectrum becoming a clinical
    routine in the near future.

    Yeah, Right, Uh-Huh: A Deep Learning Backchannel Predictor

    Robin Ruede, Markus Müller, Sebastian Stüker, Alex Waibel
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Learning (cs.LG); Sound (cs.SD)

    Using supporting backchannel (BC) cues can make human-computer interaction
    more social. BCs provide feedback from the listener to the speaker, indicating
    that the speaker is still being listened to. BCs can be expressed in different
    ways, depending on the modality of the interaction, for example as gestures or
    acoustic cues. In this work, we only considered acoustic cues. We are proposing
    an approach towards detecting BC opportunities based on acoustic input features
    like power and pitch. While other works in the field rely on the use of a
    hand-written rule set or specialized features, we made use of artificial neural
    networks. They are capable of deriving higher order features from input
    features themselves. In our setup, we first used a fully connected feed-forward
    network to establish an updated baseline in comparison to our previously
    proposed setup. We also extended this setup with Long Short-Term
    Memory (LSTM) networks, which have been shown to outperform feed-forward based
    setups on various tasks. Our best system achieved an F1-Score of 0.37 using power and
    pitch features. Adding linguistic information using word2vec, the score
    increased to 0.39.

    Event Representations for Automated Story Generation with Deep Neural Nets

    Lara J. Martin, Prithviraj Ammanabrolu, William Hancock, Shruti Singh, Brent Harrison, Mark O. Riedl
    Comments: 8 pages, 1 figure
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Automated story generation is the problem of automatically selecting a
    sequence of events, actions, or words that can be told as a story. We seek to
    develop a system that can generate stories by learning everything it needs to
    know from textual story corpora. To date, recurrent neural networks that learn
    language models at character, word, or sentence levels have had little success
    generating coherent stories. We explore the question of event representations
    that provide a mid-level of abstraction between words and sentences in order to
    retain the semantic information of the original data while minimizing event
    sparsity. We present a technique for preprocessing textual story data into
    event sequences. We then present a technique for automated story generation
    whereby we decompose the problem into the generation of successive events
    (event2event) and the generation of natural language sentences from events
    (event2sentence). We give empirical results comparing different event
    representations and their effects on event successor generation and the
    translation of events to natural language.

    Deep learning evaluation using deep linguistic processing

    Alexander Kuhnle, Ann Copestake
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We discuss problems with the standard approaches to evaluation for tasks like
    visual question answering, and argue that artificial data can be used to
    address these as a complement to current practice. We demonstrate that with the
    help of existing ‘deep’ linguistic processing technology we are able to create
    challenging abstract datasets, which enable us to investigate the language
    understanding abilities of multimodal deep learning models in detail.

    Bayesian LSTMs in medicine

    Jos van der Westhuizen, Joan Lasenby
    Comments: 11 pages, 8 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Applications (stat.AP)

    The medical field stands to see significant benefits from the recent advances
    in deep learning. Knowing the uncertainty in the decision made by any machine
    learning algorithm is of utmost importance for medical practitioners. This
    study demonstrates the utility of using Bayesian LSTMs for classification of
    medical time series. Four medical time series datasets are used to show the
    accuracy improvement Bayesian LSTMs provide over standard LSTMs. Moreover, we
    show cherry-picked examples of confident and uncertain classifications of the
    medical time series. With simple modifications of the common practice for deep
    learning, significant improvements can be made for the medical practitioner and
    patient.
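
    One common way to obtain an approximately Bayesian LSTM is Monte Carlo dropout, sketched below in PyTorch; the paper's exact variational treatment and datasets may differ, and the architecture here is purely illustrative.

    import torch
    import torch.nn as nn

    class DropoutLSTM(nn.Module):
        # Minimal LSTM classifier with dropout that is kept active at test time
        # to obtain Monte Carlo uncertainty estimates (one common approximation
        # to a Bayesian LSTM; the paper's exact scheme may differ).
        def __init__(self, n_features, n_hidden=32, n_classes=2, p=0.3):
            super().__init__()
            self.lstm = nn.LSTM(n_features, n_hidden, batch_first=True)
            self.drop = nn.Dropout(p)
            self.head = nn.Linear(n_hidden, n_classes)

        def forward(self, x):
            out, _ = self.lstm(x)
            return self.head(self.drop(out[:, -1]))   # use the last time step

    model = DropoutLSTM(n_features=8)
    model.train()                                      # keep dropout stochastic
    x = torch.randn(4, 50, 8)                          # batch of 4 series, 50 steps
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(-1) for _ in range(30)])
    print("predictive mean:", probs.mean(0))
    print("predictive std :", probs.std(0))            # per-class uncertainty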

    PReP: Path-Based Relevance from a Probabilistic Perspective in Heterogeneous Information Networks

    Yu Shi, Po-Wei Chan, Honglei Zhuang, Huan Gui, Jiawei Han
    Comments: 10 pages. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, Nova Scotia, Canada, ACM, 2017
    Subjects: Social and Information Networks (cs.SI); Learning (cs.LG)

    As a powerful representation paradigm for networked and multi-typed data, the
    heterogeneous information network (HIN) is ubiquitous. Meanwhile, defining
    proper relevance measures has always been a fundamental problem and of great
    pragmatic importance for network mining tasks. Inspired by our probabilistic
    interpretation of existing path-based relevance measures, we propose to study
    HIN relevance from a probabilistic perspective. We also identify, from
    real-world data, and propose to model cross-meta-path synergy, which is a
    characteristic important for defining path-based HIN relevance and has not been
    modeled by existing methods. A generative model is established to derive a
    novel path-based relevance measure, which is data-driven and tailored for each
    HIN. We develop an inference algorithm to find the maximum a posteriori (MAP)
    estimate of the model parameters, which entails non-trivial tricks. Experiments
    on two real-world datasets demonstrate the effectiveness of the proposed model
    and relevance measure.

    Deep MIMO Detection

    Neev Samuel, Tzvi Diskin, Ami Wiesel
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    In this paper, we consider the use of deep neural networks in the context of
    Multiple-Input-Multiple-Output (MIMO) detection. We give a brief introduction
    to deep learning and propose a modern neural network architecture suitable for
    this detection task. First, we consider the case in which the MIMO channel is
    constant, and we learn a detector for a specific system. Next, we consider the
    harder case in which the parameters are known yet changing and a single
    detector must be learned for all multiple varying channels. We demonstrate the
    performance of our deep MIMO detector using numerical simulations in comparison
    to competing methods including approximate message passing and semidefinite
    relaxation. The results show that deep networks can achieve state of the art
    accuracy with significantly lower complexity while providing robustness against
    ill conditioned channels and mis-specified noise variance.

    InfiniteBoost: building infinite ensembles with gradient descent

    Alex Rogozhnikov, Tatiana Likhomanenko
    Comments: 7 pages, 5 figures, 3 tables
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    In machine learning, ensemble methods have demonstrated high accuracy on a
    variety of problems in different areas. The best-known algorithms intensively
    used in practice are random forests and gradient boosting. In this paper we
    present InfiniteBoost – a novel algorithm which combines the best properties
    of these two approaches. The algorithm constructs an ensemble of trees for
    which two properties hold: trees in the ensemble incorporate the mistakes made
    by the others, and at the same time the ensemble can contain an infinite number
    of trees without overfitting. The proposed algorithm is evaluated on
    the regression, classification, and ranking tasks using large scale, publicly
    available datasets.

    Stochastic Reformulations of Linear Systems: Algorithms and Convergence Theory

    Peter Richtárik, Martin Takáč
    Comments: 39 pages, 4 reformulations, 3 algorithms
    Subjects: Numerical Analysis (math.NA); Learning (cs.LG); Machine Learning (stat.ML)

    We develop a family of reformulations of an arbitrary consistent linear
    system into a stochastic problem. The reformulations are governed by two
    user-defined parameters: a positive definite matrix defining a norm, and an
    arbitrary discrete or continuous distribution over random matrices. Our
    reformulation has several equivalent interpretations, allowing for researchers
    from various communities to leverage their domain specific insights. In
    particular, our reformulation can be equivalently seen as a stochastic
    optimization problem, stochastic linear system, stochastic fixed point problem
    and a probabilistic intersection problem. We prove sufficient, and necessary
    and sufficient conditions for the reformulation to be exact.

    Further, we propose and analyze three stochastic algorithms for solving the
    reformulated problem—basic, parallel and accelerated methods—with global
    linear convergence rates. The rates can be interpreted as condition numbers of
    a matrix which depends on the system matrix and on the reformulation
    parameters. This gives rise to a new phenomenon which we call stochastic
    preconditioning, and which refers to the problem of finding parameters (matrix
    and distribution) leading to a sufficiently small condition number. Our basic
    method can be equivalently interpreted as stochastic gradient descent,
    stochastic Newton method, stochastic proximal point method, stochastic fixed
    point method, and stochastic projection method, with fixed stepsize (relaxation
    parameter), applied to the reformulations.
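
    Since the abstract notes that the basic method specializes to familiar algorithms such as stochastic projection, a hedged sketch of one such special case, randomized Kaczmarz, is given below; the general reformulation with arbitrary norms and distributions over random matrices is not reproduced.

    import numpy as np

    def randomized_kaczmarz(A, b, iters=5000, seed=0):
        """Solve a consistent system Ax = b by random row projections.

        Each step projects the iterate onto the solution set of one randomly
        chosen equation; this is one special case of the 'basic method' family
        described in the abstract (stochastic projection / fixed-stepsize SGD).
        """
        rng = np.random.default_rng(seed)
        m, n = A.shape
        row_norms = (A ** 2).sum(axis=1)
        p = row_norms / row_norms.sum()                 # sample rows by squared norm
        x = np.zeros(n)
        for _ in range(iters):
            i = rng.choice(m, p=p)
            x += (b[i] - A[i] @ x) / row_norms[i] * A[i]
        return x

    rng = np.random.default_rng(1)
    A = rng.normal(size=(300, 50))
    x_true = rng.normal(size=50)
    x_hat = randomized_kaczmarz(A, A @ x_true)
    print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))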

    Joint Text Embedding for Personalized Content-based Recommendation

    Ting Chen, Liangjie Hong, Yue Shi, Yizhou Sun
    Subjects: Information Retrieval (cs.IR); Learning (cs.LG)

    Learning a good representation of text is key to many recommendation
    applications. Examples include news recommendation, where texts to be
    recommended are constantly published every day. However, most existing
    recommendation techniques, such as matrix factorization based methods, mainly
    rely on interaction histories to learn representations of items. While latent
    factors of items can be learned effectively from user interaction data, in many
    cases, such data is not available, especially for newly emerged items.

    In this work, we aim to address the problem of personalized recommendation
    for completely new items with text information available. We cast the problem
    as a personalized text ranking problem and propose a general framework that
    combines text embedding with personalized recommendation. Users and textual
    content are embedded into latent feature space. The text embedding function can
    be learned end-to-end by predicting user interactions with items. To alleviate
    sparsity in interaction data, and leverage large amount of text data with
    little or no user interactions, we further propose a joint text embedding model
    that incorporates unsupervised text embedding with a combination module.
    Experimental results show that our model can significantly improve the
    effectiveness of recommendation systems on real-world datasets.

    Context-aware, Adaptive and Scalable Android Malware Detection through Online Learning (extended version)

    Annamalai Narayanan, Mahinthan Chandramohan, Lihui Chen, Yang Liu
    Subjects: Cryptography and Security (cs.CR); Learning (cs.LG); Software Engineering (cs.SE)

    It is well-known that Android malware constantly evolves so as to evade
    detection. This causes the entire malware population to be non-stationary.
    Contrary to this fact, most of the prior works on Machine Learning based
    Android malware detection have assumed that the distribution of the observed
    malware characteristics (i.e., features) does not change over time. In this
    work, we address the problem of malware population drift and propose a novel
    online learning based framework to detect malware, named CASANDRA
    (Context-aware, Adaptive and Scalable ANDRoid mAlware detector). In order to
    perform accurate detection, a novel graph kernel that facilitates capturing
    apps’ security-sensitive behaviors along with their context information from
    dependency graphs is proposed. Besides being accurate and scalable, CASANDRA
    has specific advantages: i) being adaptive to the evolution in malware features
    over time ii) explaining the significant features that led to an app’s
    classification as being malicious or benign. In a large-scale comparative
    analysis, CASANDRA outperforms two state-of-the-art techniques on a benchmark
    dataset achieving 99.23% F-measure. When evaluated with more than 87,000 apps
    collected in-the-wild, CASANDRA achieves 89.92% accuracy, outperforming
    existing techniques by more than 25% in their typical batch learning setting
    and more than 7% when they are continuously retrained, while maintaining
    comparable efficiency. Besides this, CASANDRA demonstrates an excellent ability to
    adapt to the drift in real-world malware over time and great potential as a
    reliable framework to perform market-scale analysis.

    Learning by Association – A versatile semi-supervised training method for neural networks

    Philip Häusser, Alexander Mordvintsev, Daniel Cremers
    Comments: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    In many real-world scenarios, labeled data for a specific machine learning
    task is costly to obtain. Semi-supervised training methods make use of
    abundantly available unlabeled data and a smaller number of labeled examples.
    We propose a new framework for semi-supervised training of deep neural networks
    inspired by learning in humans. “Associations” are made from embeddings of
    labeled samples to those of unlabeled ones and back. The optimization schedule
    encourages correct association cycles that end up at the same class from which
    the association was started and penalizes wrong associations ending at a
    different class. The implementation is easy to use and can be added to any
    existing end-to-end training setup. We demonstrate the capabilities of learning
    by association on several data sets and show that it can improve performance on
    classification tasks tremendously by making use of additionally available
    unlabeled data. In particular, for cases with few labeled data, our training
    scheme outperforms the current state of the art on SVHN.

    Spectrum-based deep neural networks for fraud detection

    Shuhan Yuan, Xintao Wu, Jun Li, Aidong Lu
    Subjects: Cryptography and Security (cs.CR); Learning (cs.LG); Social and Information Networks (cs.SI)

    In this paper, we focus on fraud detection on a signed graph with only a
    small set of labeled training data. We propose a novel framework that combines
    deep neural networks and spectral graph analysis. In particular, we use the
    node projection (called the spectral coordinate) in the low-dimensional spectral
    space of the graph’s adjacency matrix as the input to deep neural networks.
    Spectral coordinates in the spectral space capture the most useful topology
    information of the network. Due to the small dimension of spectral coordinates
    (compared with the dimension of the adjacency matrix derived from a graph),
    training deep neural networks becomes feasible. We develop and evaluate two
    neural networks, deep autoencoder and convolutional neural network, in our
    fraud detection framework. Experimental results on a real signed graph show
    that our spectrum based deep neural networks are effective in fraud detection.
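
    The input construction can be illustrated as follows: compute a few leading eigenvectors of the adjacency matrix and use each node's entries as its spectral coordinates, then feed them to a classifier. The sketch below uses a toy random graph, stand-in labels, and a small MLP instead of the paper's autoencoder/CNN architectures.

    import numpy as np
    from scipy.sparse.linalg import eigsh
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    n, k = 200, 8

    # Toy symmetric (signed) adjacency matrix; real data would come from the graph.
    A = rng.choice([-1, 0, 0, 0, 1], size=(n, n)).astype(float)
    A = np.triu(A, 1); A = A + A.T

    # Spectral coordinates: each node's entries in the k leading eigenvectors.
    vals, vecs = eigsh(A, k=k, which="LA")
    spectral_coords = vecs                      # shape (n, k), small input dimension

    labels = rng.integers(0, 2, size=n)         # stand-in fraud / benign labels
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
    clf.fit(spectral_coords[:150], labels[:150])
    print("toy accuracy:", clf.score(spectral_coords[150:], labels[150:]))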

    IDK Cascades: Fast Deep Learning by Learning not to Overthink

    Xin Wang, Yujia Luo, Daniel Crankshaw, Alexey Tumanov, Joseph E. Gonzalez
    Comments: 11 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Advances in deep learning have led to substantial increases in prediction
    accuracy as well as the cost of rendering predictions. We conjecture that for a
    majority of real-world inputs, the recent advances in deep learning have
    created models that effectively “over-think” on simple inputs. In this paper we
    revisit the classic idea of prediction cascades to reduce prediction costs. We
    introduce the “I Don’t Know” (IDK) prediction cascades framework, a general
    framework for constructing prediction cascades for arbitrary multi-class
    prediction tasks. We propose two baseline methods for constructing cascades as
    well as a new objective within this framework, and evaluate these techniques on
    a range of benchmark and real-world datasets to demonstrate that prediction
    cascades can achieve 1.7-10.5x speedups in image classification tasks while
    maintaining accuracy comparable to state-of-the-art models. When combined with
    human experts, prediction cascades can achieve nearly perfect accuracy (within
    5%) while requiring human intervention on less than 30% of the queries.
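
    The cascade control flow can be sketched in a few lines: a fast model answers when its confidence clears a threshold, and otherwise the input is deferred to a slower model (or a human). The models below are random placeholders, not the paper's networks.

    import numpy as np

    def idk_cascade(x, fast_model, slow_model, threshold=0.9):
        """Return a prediction from the fast model when it is confident enough,
        otherwise defer to the slow (more accurate, more expensive) model.
        fast_model / slow_model are callables returning class-probability vectors.
        """
        probs = fast_model(x)
        if probs.max() >= threshold:
            return int(probs.argmax()), "fast"
        return int(slow_model(x).argmax()), "slow"

    # Placeholder models for illustration; in practice these would be a small
    # and a large neural network sharing the same label space.
    rng = np.random.default_rng(0)
    fast = lambda x: rng.dirichlet(np.ones(10) * 0.3)
    slow = lambda x: rng.dirichlet(np.ones(10) * 0.1)

    routed = [idk_cascade(None, fast, slow)[1] for _ in range(1000)]
    print("fraction handled by the fast model:", routed.count("fast") / len(routed))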

    Task-specific Word Identification from Short Texts Using a Convolutional Neural Network

    Shuhan Yuan, Xintao Wu, Yang Xiang
    Comments: accepted by Intelligent Data Analysis, an International Journal
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Learning (cs.LG)

    Task-specific word identification aims to choose the task-related words that
    best describe a short text. Existing approaches require well-defined seed words
    or lexical dictionaries (e.g., WordNet), which are often unavailable for many
    applications such as social discrimination detection and fake review detection.
    However, we often have a set of labeled short texts where each short text has a
    task-related class label, e.g., discriminatory or non-discriminatory, specified
    by users or learned by classification algorithms. In this paper, we focus on
    identifying task-specific words and phrases from short texts by exploiting
    their class labels rather than using seed words or lexical dictionaries. We
    consider the task-specific word and phrase identification as feature learning.
    We train a convolutional neural network over a set of labeled texts and use
    score vectors to localize the task-specific words and phrases. Experimental
    results on sentiment word identification show that our approach significantly
    outperforms existing methods. We further conduct two case studies to show the
    effectiveness of our approach. One case study on a crawled tweets dataset
    demonstrates that our approach can successfully capture the
    discrimination-related words/phrases. The other case study on fake review
    detection shows that our approach can identify the fake-review words/phrases.

    MobiRNN: Efficient Recurrent Neural Network Execution on Mobile GPU

    Qingqing Cao, Niranjan Balasubramanian, Aruna Balasubramanian
    Comments: Published at 1st International Workshop on Embedded and Mobile Deep Learning colocated with MobiSys 2017
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    In this paper, we explore optimizations to run Recurrent Neural Network (RNN)
    models locally on mobile devices. RNN models are widely used for Natural
    Language Processing, Machine Translation, and other tasks. However, existing
    mobile applications that use RNN models do so on the cloud. To address privacy
    and efficiency concerns, we show how RNN models can be run locally on mobile
    devices. Existing work on porting deep learning models to mobile devices focus
    on Convolution Neural Networks (CNNs) and cannot be applied directly to RNN
    models. In response, we present MobiRNN, a mobile-specific optimization
    framework that implements GPU offloading specifically for mobile GPUs.
    Evaluations using an RNN model for activity recognition show that MobiRNN
    significantly decreases the latency of running RNN models on phones.

    Adversarial Ranking for Language Generation

    Kevin Lin, Dianqi Li, Xiaodong He, Zhengyou Zhang, Ming-Ting Sun
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Generative adversarial networks (GANs) have had great success in synthesizing
    data. However, existing GANs restrict the discriminator to be a binary
    classifier, and thus limit their learning capacity for tasks that need to
    synthesize output with rich structures such as natural language descriptions.
    In this paper, we propose a novel generative adversarial network, RankGAN, for
    generating high-quality language descriptions. Rather than training the
    discriminator to assign an absolute binary label to each individual data
    sample, the proposed RankGAN analyzes and ranks a collection of
    human-written and machine-written sentences with respect to a reference group.
    By viewing a set of data samples collectively and evaluating their quality
    through relative ranking scores, the discriminator is able to make a better
    assessment, which in turn helps to learn a better generator. The proposed RankGAN is
    optimized through the policy gradient technique. Experimental results on
    multiple public datasets clearly demonstrate the effectiveness of the proposed
    approach.

    Unfolding Hidden Barriers by Active Enhanced Sampling

    Jing Zhang, Ming Chen
    Comments: 19 pages, 5 figures
    Subjects: Chemical Physics (physics.chem-ph); Statistical Mechanics (cond-mat.stat-mech); Learning (cs.LG)

    Collective variable (CV) or order parameter based enhanced sampling
    algorithms have achieved great success due to their ability to efficiently
    explore the rough potential energy landscapes of complex systems. However, the
    degeneracy of microscopic configurations, originating from the orthogonal space
    perpendicular to the CVs, is likely to shadow “hidden barriers” and greatly
    reduce the efficiency of CV-based sampling. Here we demonstrate that
    systematically machine-learning CVs, through enhanced sampling, can iteratively
    lift such degeneracies on the fly. We introduce an active learning scheme that
    consists of a parametric CV learner based on a deep neural network and a
    CV-based enhanced sampler. Our active enhanced sampling (AES) algorithm is capable of identifying
    the least informative regions based on a historical sample, forming a positive
    feedback loop between the CV learner and the sampler. This approach is able to
    globally preserve kinetic characteristics by incrementally enhancing both
    sample completeness and CV quality.
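
    The feedback loop described above can be sketched as follows in Python; the
    three callables (CV learner, biased sampler, and selector of least
    informative regions) are placeholders standing in for the components named
    in the abstract, not the authors' code.

        import numpy as np

        def active_enhanced_sampling(init_configs, train_cv, run_biased_md,
                                     least_informative, n_iters=10):
            # Alternate between (1) fitting a parametric CV on all samples so
            # far, (2) running CV-based enhanced sampling with the current CV,
            # and (3) seeding the next round from the least informative regions.
            samples = np.asarray(init_configs)
            cv = None
            for _ in range(n_iters):
                cv = train_cv(samples)                    # deep-network CV learner
                new = run_biased_md(cv, samples)          # CV-based enhanced sampler
                seeds = least_informative(cv, samples)    # regions to refine next
                samples = np.vstack([samples, new, seeds])
            return cv, samples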

    NESTT: A Nonconvex Primal-Dual Splitting Method for Distributed and Stochastic Optimization

    Davood Hajinezhad, Mingyi Hong, Tuo Zhao, Zhaoran Wang
    Comments: 35 pages, 2 figures
    Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)

    We study a stochastic and distributed algorithm for nonconvex problems whose
    objective consists of a sum of (N) nonconvex (L_i/N)-smooth functions, plus a
    nonsmooth regularizer. The proposed NonconvEx primal-dual SpliTTing (NESTT)
    algorithm splits the problem into (N) subproblems, and utilizes an augmented
    Lagrangian based primal-dual scheme to solve it in a distributed and stochastic
    manner. With a special non-uniform sampling, a version of NESTT achieves
    (epsilon)-stationary solution using
    (mathcal{O}((sum_{i=1}^Nsqrt{L_i/N})^2/epsilon)) gradient evaluations,
    which can be up to (mathcal{O}(N)) times better than the (proximal) gradient
    descent methods. It also achieves Q-linear convergence rate for nonconvex
    (ell_1) penalized quadratic problems with polyhedral constraints. Further, we
    reveal a fundamental connection between primal-dual based methods and a few
    primal only methods such as IAG/SAG/SAGA.
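
    A short worked comparison (our illustration, not taken from the paper) makes
    the (\mathcal{O}(N)) claim concrete. By Cauchy-Schwarz,

        \Big(\sum_{i=1}^{N}\sqrt{L_i/N}\Big)^{2}
          = \frac{1}{N}\Big(\sum_{i=1}^{N}\sqrt{L_i}\Big)^{2}
          \le \sum_{i=1}^{N} L_i ,

    with equality iff all (L_i) are equal, so the NESTT bound never exceeds the
    (\mathcal{O}(\sum_i L_i/\epsilon)) gradient-evaluation cost of (proximal)
    gradient descent. For a highly non-uniform instance such as (L_1 = N^2) and
    (L_2 = \dots = L_N = 1), the left-hand side is roughly (4N) while
    (\sum_i L_i \approx N^2), an (\mathcal{O}(N)) saving.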

    Pathwise Coordinate Optimization for Sparse Learning: Algorithm and Theory

    Tuo Zhao, Han Liu, Tong Zhang
    Comments: Accepted by the Annals of Statistics, 2016+
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Optimization and Control (math.OC)

    The pathwise coordinate optimization is one of the most important
    computational frameworks for high dimensional convex and nonconvex sparse
    learning problems. It differs from the classical coordinate optimization
    algorithms in three salient features: warm start initialization, active set
    updating, and strong rule for coordinate preselection. Such a
    complex algorithmic structure grants superior empirical performance, but also
    poses a significant challenge to theoretical analysis. To tackle this
    long-standing problem, we develop a new theory showing that these three features play
    pivotal roles in guaranteeing the outstanding statistical and computational
    performance of the pathwise coordinate optimization framework. Particularly, we
    analyze the existing pathwise coordinate optimization algorithms and provide
    new theoretical insights into them. The obtained insights further motivate the
    development of several modifications to improve the pathwise coordinate
    optimization framework, which guarantees linear convergence to a unique sparse
    local optimum with optimal statistical properties in parameter estimation and
    support recovery. This is the first result on the computational and statistical
    guarantees of the pathwise coordinate optimization framework in high
    dimensions. Thorough numerical experiments are provided to support our theory.
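
    As a concrete illustration of the three features, the sketch below runs
    pathwise coordinate descent for the lasso with warm starts, the sequential
    strong rule, and active-set sweeps. It is our simplified rendering for
    standardized designs (KKT re-checks after convergence are omitted), not the
    paper's exact algorithm.

        import numpy as np

        def soft_threshold(z, t):
            return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

        def lasso_path(X, y, lambdas, tol=1e-6, max_sweeps=200):
            n, p = X.shape
            col_sq = (X ** 2).sum(axis=0)
            beta = np.zeros(p)
            lam_prev = np.max(np.abs(X.T @ y)) / n        # lambda_max (beta = 0)
            path = []
            for lam in sorted(lambdas, reverse=True):     # warm start down the path
                grad = np.abs(X.T @ (y - X @ beta)) / n
                strong = grad >= 2 * lam - lam_prev       # strong-rule preselection
                active = strong | (beta != 0)             # active-set updating
                for _ in range(max_sweeps):
                    max_delta = 0.0
                    for j in np.where(active)[0]:
                        r_j = y - X @ beta + X[:, j] * beta[j]
                        new = soft_threshold(X[:, j] @ r_j / n, lam) / (col_sq[j] / n)
                        max_delta = max(max_delta, abs(new - beta[j]))
                        beta[j] = new
                    if max_delta < tol:
                        break
                path.append(beta.copy())
                lam_prev = lam
            return path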

    Calibrated Multivariate Regression with Application to Neural Semantic Basis Discovery

    Han Liu, Lie Wang, Tuo Zhao
    Comments: Journal of Machine Learning Research, 2015
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We propose a calibrated multivariate regression method named CMR for fitting
    high dimensional multivariate regression models. Compared with existing
    methods, CMR calibrates regularization for each regression task with respect to
    its noise level so that it simultaneously attains improved finite-sample
    performance and tuning insensitiveness. Theoretically, we provide sufficient
    conditions under which CMR achieves the optimal rate of convergence in
    parameter estimation. Computationally, we propose an efficient smoothed
    proximal gradient algorithm with a worst-case numerical rate of convergence
    (\mathcal{O}(1/\epsilon)), where (\epsilon) is a pre-specified accuracy of the
    objective function value. We conduct thorough numerical simulations to
    illustrate that CMR consistently outperforms other high dimensional
    multivariate regression methods. We also apply CMR to solve a brain activity
    prediction problem and find that it is as competitive as a handcrafted model
    created by human experts. The R package camel implementing the
    proposed method is available on the Comprehensive R Archive Network
    (this http URL).


    Information Theory

    The Capacity of Private Information Retrieval from Byzantine and Colluding Databases

    Karim Banawan, Sennur Ulukus
    Comments: Submitted to IEEE Transactions on Information Theory, June 2017
    Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Information Retrieval (cs.IR)

    We consider the problem of single-round private information retrieval (PIR)
    from (N) replicated databases. We consider the case when (B) databases are
    outdated (unsynchronized), or even worse, adversarial (Byzantine), and
    therefore, can return incorrect answers. In the PIR problem with Byzantine
    databases (BPIR), a user wishes to retrieve a specific message from a set of
    (M) messages with zero-error, irrespective of the actions performed by the
    Byzantine databases. We consider the (T)-privacy constraint in this paper,
    where any (T) databases can collude, and exchange the queries submitted by the
    user. We derive the information-theoretic capacity of this problem, which is
    the maximum number of correct symbols that can be retrieved privately
    (under the (T)-privacy constraint) for every symbol of the downloaded data. We
    determine the exact BPIR capacity to be
    (C=\frac{N-2B}{N}\cdot\frac{1-\frac{T}{N-2B}}{1-(\frac{T}{N-2B})^M}), if (2B+T
    < N). This capacity expression shows that the effect of Byzantine databases on
    the retrieval rate is equivalent to removing (2B) databases from the system,
    with a penalty factor of (\frac{N-2B}{N}), which signifies that even though the
    number of databases needed for PIR is effectively (N-2B), the user still needs
    to access the entire set of (N) databases. The result shows that for the
    unsynchronized PIR problem, if the user does not have any knowledge about the
    fraction of the messages that are mis-synchronized, the single-round capacity
    is the same as the BPIR capacity. Our achievable scheme extends the optimal
    achievable scheme for the robust PIR (RPIR) problem to correct the
    errors introduced by the Byzantine databases as opposed to
    erasures in the RPIR problem. Our converse proof uses the idea of the
    cut-set bound in the network coding problem against adversarial nodes.
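
    As a numerical sanity check of the capacity expression (ours, not from the
    paper): with (N = 5) databases, (B = 1) Byzantine, (T = 1) colluding, and
    (M = 2) messages, the condition (2B + T = 3 < N = 5) holds and

        C = \frac{3}{5}\cdot\frac{1-\frac{1}{3}}{1-(\frac{1}{3})^{2}}
          = \frac{3}{5}\cdot\frac{2/3}{8/9}
          = \frac{3}{5}\cdot\frac{3}{4}
          = \frac{9}{20} = 0.45,

    i.e. 0.45 correct symbols can be retrieved privately per downloaded symbol.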

    Optimal Design Method of MIMO Antenna Directivities and Corresponding Current Distributions by Using Spherical Mode Expansion

    Maki Arai, Masashi Iwabuchi, Kei Sakaguchi, Kiyomichi Araki
    Comments: 23 pages, 19 figures, accepted, paper
    Journal-ref: IEICE Transactions on Communications, Vol.E100-B, No.10, Oct. 2017
    Subjects: Information Theory (cs.IT)

    This paper proposes a new methodology to design optimal antennas for MIMO
    (Multi-Input Multi-Output) communication systems by using spherical mode
    expansion. Given spatial channel properties of a MIMO channel, such as the
    angular profile at both sides, the optimal MIMO antennas should provide the
    largest channel capacity with a constraint of the limited implementation space
    (volume). In designing a conventional MIMO antenna, first the antenna structure
    (current distribution) is determined, second antenna directivity is calculated
    based on the current distribution, and thirdly MIMO channel capacity is
    calculated by using given angular profiles and obtained antenna directivity.
    This process is repeated by adjusting the antenna structure until the
    performance satisfies a predefined threshold. In contrast, this paper
    solves the optimization problem analytically and directly gives a near-optimal
    antenna structure (current distribution) without any greedy search. In the
    proposed process, first the optimal directivity of MIMO antennas is derived by
    applying spherical mode expansion to the angular profiles, and second a
    far-near field conversion is applied on the derived optimal directivity to
    achieve near optimal current distributions on a limited surface. The
    effectiveness of the proposed design methodology is validated via numerical
    calculation of MIMO channel capacity, as in the conventional design method,
    while it yields a near-optimal current distribution under the constraint of the
    antenna structure derived from the proposed methodology.

    Hybrid Beamforming with Reduced Number of Phase Shifters for Massive MIMO Systems

    Sohail Payami, Mir Ghoraishi, Mehrdad Dianati
    Comments: Submitted to Transactions on Vehicular Technology Correspondence
    Subjects: Information Theory (cs.IT)

    In this correspondence, two novel hybrid beamforming methods are proposed to
    reduce the cost and power consumption of hybrid beamformers with subconnected
    phase shifter network structure in massive multiple-input multiple-output
    (MIMO) systems. This is achieved by replacing some of the phase shifters with
    switches which, in general, are cheaper and have lower power consumption
    compared to phase shifters. The proposed methods and the closed-form
    expressions of their performance are derived according to the properties of the
    elements of the singular vectors of the channel matrix. In the first approach,
    it is shown that by combining the subconnected phase shifter network with a
    fully-connected switch architecture, the number of phase shifters can be
    reduced by up to 50% while the spectral efficiency is preserved. Then, in order
    to simplify the structure of the switch network, the fully-connected structure
    is replaced by a subconnected switch network, e.g., binary switches. The
    analytical and simulation results indicate that the number of phase shifters
    can be reduced to 25% of the original while 90% of the spectral efficiency is
    still achieved.

    Maximum Likelihood Signal Amplitude Estimation Based on Permuted Blocks of Differently Binary Quantized Observations of a Signal in Noise

    Guanyu Wang, Jiang Zhu, Rick S. Blum, Paolo Braca, Zhiwei Xu
    Subjects: Information Theory (cs.IT)

    Parameter estimation based on binary quantized observations is considered
    given that the estimation system does not know which of a set of quantizers was
    used, without replacement, for each block of observations. Thus the estimation
    system receives permuted blocks of quantized samples of a signal in noise
    with unknown signal amplitude. Maximum likelihood (ML) estimators are utilized
    to estimate both the permutation matrix and unknown signal amplitude under
    arbitrary, but known, signal shape and quantizer thresholds. Sufficient
    conditions are provided under which an ML estimator can be found in polynomial
    time. In addition, model identifiability is also studied, and an alternating
    maximization algorithm is proposed to solve the general problem via good
    initial estimates. Finally, numerical simulations are performed to evaluate the
    performance of the ML estimators.
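
    The alternating maximization idea can be sketched in Python as below. We
    assume as many quantizers as blocks, known unit-variance Gaussian noise, and
    a grid search over the amplitude; all function and variable names are our
    own, and this is not the authors' exact algorithm.

        import numpy as np
        from scipy.optimize import linear_sum_assignment
        from scipy.stats import norm

        def alternating_ml(blocks, signal_blocks, thresholds, amp_grid, n_iters=20):
            # blocks[b]: binary observations of block b; signal_blocks[b]: known
            # signal shape of block b; thresholds[q]: threshold of quantizer q.
            def block_loglik(b, q, a):
                p = norm.sf(thresholds[q] - a * signal_blocks[b])   # P(y_i = 1)
                y = blocks[b]
                return np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

            B = len(blocks)
            a_hat = amp_grid[len(amp_grid) // 2]
            assign = np.arange(B)
            for _ in range(n_iters):
                # Step 1: best one-to-one block-to-quantizer matching for fixed amplitude.
                cost = np.array([[-block_loglik(b, q, a_hat) for q in range(B)]
                                 for b in range(B)])
                _, assign = linear_sum_assignment(cost)
                # Step 2: maximize the likelihood over the amplitude for fixed matching.
                totals = [sum(block_loglik(b, assign[b], a) for b in range(B))
                          for a in amp_grid]
                a_hat = amp_grid[int(np.argmax(totals))]
            return a_hat, assign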

    Linear Capacity of Networks over Ring Alphabets

    Joseph Connelly, Kenneth Zeger
    Comments: Submitted to IEEE Transactions on Information Theory June 4th 2017
    Subjects: Information Theory (cs.IT)

    The rate of a network code is the ratio of the block sizes of the network’s
    messages and its edge codewords. The linear capacity of a network over a finite
    ring alphabet is the supremum of achievable rates using linear codes over the
    ring. We prove the following for directed acyclic networks:

    (i) For every finite field F and every finite ring R, there is some network
    whose linear capacity over R is greater than over F if and only if the sizes of
    F and R are relatively prime.

    (ii) Every network’s linear capacity over a finite ring is less than or equal
    to its linear capacity over every finite field whose characteristic divides the
    ring’s size. As a consequence, every network’s linear capacity over a finite
    field is greater than or equal to its linear capacity over every ring of the
    same size. Additionally, every network’s linear capacity over a module is at
    most its linear capacity over some finite field.

    (iii) The linear capacity of any network over a finite field depends only on
    the characteristic of the field.

    (iv) Every network that is asymptotically linearly solvable over some finite
    ring (or even some module) is also asymptotically linearly solvable over some
    finite field.

    These results establish the sufficiency of finite field alphabets for linear
    network coding. Namely, linear network coding capacity cannot be improved by
    looking beyond finite field alphabets to more general ring alphabets. However,
    certain rings can yield higher linear capacities for certain networks than can
    a given field.

    Optimal Envelope Approximation in Fourier Basis with Applications in TV White Space

    Animesh Kumar
    Comments: 5 pages, 4 figures, submitted to Global SIP 2017 conference
    Subjects: Information Theory (cs.IT); Computational Engineering, Finance, and Science (cs.CE)

    Lowpass envelope approximations of smooth continuous-variable signals are
    introduced in this work. Envelope approximations are necessary when a given
    signal always has to be approximated from above (such as in TV white
    space protection regions). In this work, a near-optimal approximate algorithm
    for finding a signal’s envelope, while minimizing a mean-squared cost function,
    is detailed. The sparse (lowpass) signal approximation is obtained in the
    linear Fourier series basis. This approximate algorithm works by discretizing
    the envelope property from an infinite number of points to a large (but finite)
    number of points. It is shown that this approximate algorithm is near-optimal
    and can be solved by using efficient convex optimization programs available in
    the literature. Simulation results are provided towards the end to gain more
    insights into the analytical results presented.
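
    The discretized convex program described above can be written compactly with
    cvxpy; the harmonic count, the grid, and the synthetic signal below are our
    assumptions for illustration, not the paper's setup.

        import numpy as np
        import cvxpy as cp

        def fourier_envelope(t, f, n_harmonics=8):
            # Lowpass Fourier envelope: stay above the signal on a finite grid
            # while minimizing the mean-squared gap (the envelope property is
            # enforced only at the grid points, as in the discretization above).
            period = t[-1] - t[0]
            cols = [np.ones_like(t)]
            for k in range(1, n_harmonics + 1):
                cols += [np.cos(2 * np.pi * k * t / period),
                         np.sin(2 * np.pi * k * t / period)]
            A = np.column_stack(cols)                     # truncated Fourier basis
            c = cp.Variable(A.shape[1])
            prob = cp.Problem(cp.Minimize(cp.sum_squares(A @ c - f)),
                              [A @ c >= f])               # envelope constraint
            prob.solve()
            return A @ c.value

        # Example: envelope of a noisy field-strength-like profile.
        t = np.linspace(0.0, 1.0, 200)
        f = np.sin(2 * np.pi * t) + 0.3 * np.random.randn(t.size)
        env = fourier_envelope(t, f)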

    A Construction for Balancing Non-Binary Sequences Based on Gray Code Prefixes

    Elie Ngomseu Mambou, Theo G. Swart
    Comments: 9 pages, 7 figures
    Subjects: Information Theory (cs.IT)

    We introduce a new construction for the balancing of non-binary sequences
    that makes use of Gray codes for prefix coding. Our construction provides full
    encoding and decoding of sequences, including the prefix. This construction is
    based on a generalization of Knuth’s parallel balancing approach, which can
    handle very long information sequences. However, the overall sequence, composed
    of the information sequence together with the prefix, must be balanced. This is
    reminiscent of Knuth’s serial algorithm. The encoding of our construction does
    not make use of lookup tables, while the decoding process is simple and can be
    done in parallel.
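
    For background, the binary version of Knuth's balancing idea that the
    construction generalizes can be stated in a few lines of Python: for any
    even-length binary word there is a prefix length k such that inverting the
    first k bits balances the word, and k is what the prefix must encode.

        def knuth_balance(word):
            # word: list of 0/1 of even length n.  Inverting the first k bits
            # changes the weight by +-1 per step, so the weight must pass
            # through n/2 for some k in {0, ..., n}.
            n = len(word)
            assert n % 2 == 0
            for k in range(n + 1):
                candidate = [1 - b for b in word[:k]] + list(word[k:])
                if sum(candidate) == n // 2:
                    return candidate, k
            raise AssertionError("unreachable for even-length binary input")

        balanced, k = knuth_balance([1, 1, 1, 0, 1, 1])   # inverts a prefix of length 2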

    Construction of q-ary Constant Weight Sequences using a Knuth-like Approach

    Elie Ngomseu Mambou, Theo G. Swart
    Comments: 5 pages, 4 figures
    Subjects: Information Theory (cs.IT)

    We present an encoding and decoding scheme for constant weight sequences,
    that is, given an information sequence, the construction results in a sequence
    of specific weight within a certain range. The scheme uses a prefix design that
    is based on Gray codes. Furthermore, by adding redundant symbols we extend the
    range of weight values for output sequences, which is useful for some
    applications.

    Two-Point Codes for the Generalized GK curve

    Elise Barelli, Peter Beelen, Mrinmoy Datta, Vincent Neiger, Johan Rosenkilde
    Comments: 13 pages
    Subjects: Information Theory (cs.IT)

    In this article we investigate AG codes constructed using the generalized
    Giulietti-Korchmaros curve. More precisely we consider two-point codes coming
    from this curve. A special case was considered in [5], where two-point codes
    coming from the Giulietti-Korchmaros curve were studied. However, even in this
    special case, our results improve upon what is currently known. Our method
    builds on the order bound for AG codes, which is why we study several
    Weierstrass semigroups before stating our results on two-point codes. We find
    several further improvements upon the MinT tables.

    DORE: An Experimental Framework to Enable Outband D2D Relay in Cellular Networks

    Arash Asadi, Vincenzo Mancuso, Rohit Gupta
    Comments: arXiv admin note: substantial text overlap with arXiv:1610.08293
    Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

    Device-to-Device communications represent a paradigm shift in cellular
    networks. In particular, analytical results on D2D performance for offloading
    and relay are very promising, but no experimental evidence validates these
    results to date. This paper is the first to provide an experimental analysis of
    outband D2D relay schemes. Moreover, we design DORE, a complete framework for
    handling channel opportunities offered by outband D2D relay nodes. DORE
    consists of resource allocation optimization tools and protocols suitable to
    integrate QoS-aware opportunistic D2D communications within the architecture of
    3GPP Proximity-based Services. We implement DORE using an SDR framework to
    profile cellular network dynamics in the presence of opportunistic outband D2D
    communication schemes. Our experiments reveal that outband D2D communications
    are suitable for relaying in a large variety of delay-sensitive cellular
    applications, and that DORE enables notable gains even with a few active D2D
    relay nodes.

    The Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square

    Pragya Sur, Yuxin Chen, Emmanuel J. Candès
    Comments: 58 pages, 7 figures
    Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Probability (math.PR); Machine Learning (stat.ML)

    Logistic regression is used thousands of times a day to fit data, predict
    future outcomes, and assess the statistical significance of explanatory
    variables. When used for the purpose of statistical inference, logistic models
    produce p-values for the regression coefficients by using an approximation to
    the distribution of the likelihood-ratio test. Indeed, Wilks’ theorem asserts
    that whenever we have a fixed number (p) of variables, twice the log-likelihood
    ratio (LLR) (2\Lambda) is distributed as a (\chi^2_k) variable in the limit of
    large sample sizes (n); here, (k) is the number of variables being tested. In
    this paper, we prove that when (p) is not negligible compared to (n), Wilks’
    theorem does not hold and that the chi-square approximation is grossly
    incorrect; in fact, this approximation produces p-values that are far too small
    (under the null hypothesis). Assume that (n) and (p) grow large in such a way
    that (p/n \rightarrow \kappa) for some constant (\kappa < 1/2). We prove that for
    a class of logistic models, the LLR converges to a rescaled chi-square, namely,
    (2\Lambda \stackrel{\mathrm{d}}{\rightarrow} \alpha(\kappa)\chi_k^2), where the
    scaling factor (\alpha(\kappa)) is greater than one as soon as the
    dimensionality ratio (\kappa) is positive. Hence, the LLR is larger than
    classically assumed. For instance, when (\kappa=0.3),
    (\alpha(\kappa)\approx 1.5). In general, we show how to compute the scaling
    factor by solving a nonlinear system of two equations with two unknowns. Our
    mathematical arguments are involved and use techniques from approximate message
    passing theory, non-asymptotic random matrix theory and convex geometry. We
    also complement our mathematical study by showing that the new limiting
    distribution is accurate for finite sample sizes. Finally, all the results from
    this paper extend to some other regression models such as the probit regression
    model.
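
    In practice the correction changes only the reference distribution used for
    p-values; the snippet below (ours) contrasts the classical Wilks p-value
    with the rescaled one, using the (\alpha(\kappa)\approx 1.5) value quoted
    for (\kappa = 0.3). The scaling factor must in general be obtained by
    solving the paper's two-equation system.

        from scipy.stats import chi2

        def pvalues(two_lambda, k, alpha_kappa):
            # Classical Wilks p-value vs. p-value under the rescaled limit
            # alpha(kappa) * chi^2_k for the same observed LLR statistic.
            p_classical = chi2.sf(two_lambda, df=k)
            p_rescaled = chi2.sf(two_lambda / alpha_kappa, df=k)
            return p_classical, p_rescaled

        print(pvalues(two_lambda=6.0, k=1, alpha_kappa=1.5))
        # the rescaled p-value is larger, i.e. the classical one is too small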

    Deep MIMO Detection

    Neev Samuel, Tzvi Diskin, Ami Wiesel
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    In this paper, we consider the use of deep neural networks in the context of
    Multiple-Input-Multiple-Output (MIMO) detection. We give a brief introduction
    to deep learning and propose a modern neural network architecture suitable for
    this detection task. First, we consider the case in which the MIMO channel is
    constant, and we learn a detector for a specific system. Next, we consider the
    harder case in which the parameters are known yet changing and a single
    detector must be learned for all multiple varying channels. We demonstrate the
    performance of our deep MIMO detector using numerical simulations in comparison
    to competing methods including approximate message passing and semidefinite
    relaxation. The results show that deep networks can achieve state-of-the-art
    accuracy with significantly lower complexity while providing robustness against
    ill-conditioned channels and mis-specified noise variance.

    X-TCP: A Cross Layer Approach for TCP Uplink Flows in mmWave Networks

    Tommy Azzino, Matteo Drago, Michele Polese, Andrea Zanella, Michele Zorzi
    Comments: 6 pages, 5 figures, accepted for presentation at the 2017 16th Annual Mediterranean Ad Hoc Networking Workshop (MED-HOC-NET)
    Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

    Millimeter wave frequencies will likely be part of the fifth generation of
    mobile networks and of the 3GPP New Radio (NR) standard. MmWave communication
    indeed provides a very large bandwidth, thus an increased cell throughput, but
    how to exploit these resources at the higher layers is still an open research
    question. A very relevant issue is the high variability of the channel, caused
    by the blockage from obstacles and the human body. This affects the design of
    congestion control mechanisms at the transport layer, and state-of-the-art TCP
    schemes such as TCP CUBIC present suboptimal performance. In this paper, we
    present a cross layer approach for uplink flows that adjusts the congestion
    window of TCP at the mobile equipment side using an estimation of the available
    data rate at the mmWave physical layer, based on the actual resource allocation
    and on the Signal to Interference plus Noise Ratio. We show that this approach
    reduces the latency by avoiding filling the buffers in the cellular stack, and
    has a quicker recovery time after RTO events than several other TCP congestion
    control algorithms.
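
    The cross-layer adjustment can be illustrated by sizing the congestion
    window to the bandwidth-delay product implied by the PHY-layer rate
    estimate; the formula and parameter names below are our simplification, not
    the exact rule used in the paper.

        def cwnd_from_phy_estimate(rate_bps, rtt_s, mss_bytes=1460):
            # Congestion window (in segments) matching the estimated
            # bandwidth-delay product, so the sender does not overfill the
            # buffers in the cellular stack.
            bdp_bytes = rate_bps * rtt_s / 8.0
            return max(1, int(bdp_bytes / mss_bytes))

        # Example: a 400 Mbit/s mmWave rate estimate with a 20 ms RTT.
        print(cwnd_from_phy_estimate(rate_bps=400e6, rtt_s=0.020))   # ~684 segments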

    Spectral Simplicity of Apparent Complexity, Part II: Exact Complexities and Complexity Spectra

    Paul M. Riechers, James P. Crutchfield
    Comments: 27 pages, 12 figures, 2 tables; most recent version at this http URL
    Subjects: Chaotic Dynamics (nlin.CD); Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT); Dynamical Systems (math.DS); Functional Analysis (math.FA)

    The meromorphic functional calculus developed in Part I overcomes the
    nondiagonalizability of linear operators that arises often in the temporal
    evolution of complex systems and is generic to the metadynamics of predicting
    their behavior. Using the resulting spectral decomposition, we derive
    closed-form expressions for correlation functions, finite-length Shannon
    entropy-rate approximates, asymptotic entropy rate, excess entropy, transient
    information, transient and asymptotic state uncertainty, and synchronization
    information of stochastic processes generated by finite-state hidden Markov
    models. This introduces analytical tractability to investigating information
    processing in discrete-event stochastic processes, symbolic dynamics, and
    chaotic dynamical systems. Comparisons reveal mathematical similarities between
    complexity measures originally thought to capture distinct informational and
    computational properties. We also introduce a new kind of spectral analysis via
    coronal spectrograms and the frequency-dependent spectra of past-future mutual
    information. We analyze a number of examples to illustrate the methods,
    emphasizing processes with multivariate dependencies beyond pairwise
    correlation. An appendix presents spectral decomposition calculations for one
    example in full detail.

    Efficient Textual Representation of Structure

    Brenton Chapin
    Comments: submitted to 10th ACM SIGPLAN International Conference on Software Language Engineering (SLE), 2017
    Subjects: Programming Languages (cs.PL); Information Theory (cs.IT)

    This paper attempts a more formal approach to the legibility of text-based
    programming languages, presenting, with proof, minimum possible ways of
    representing structure in text interleaved with information. This presumes that
    a minimalist approach is best for purposes of human readability, data storage
    and transmission, and machine evaluation.

    Several proposals are given for improving the expression of interleaved
    hierarchical structure. For instance, a single colon can replace a pair of
    brackets, and bracket types do not need to be repeated in both opening and
    closing symbols or words. Historic and customary uses of punctuation symbols
    guided the chosen form and nature of the improvements.

    A spectral characterisation of t-designs

    Cunsheng Ding
    Subjects: Combinatorics (math.CO); Information Theory (cs.IT)

    There are two standard approaches to the construction of (t)-designs. The
    first one is based on permutation group actions on certain base blocks. The
    second one is based on coding theory. These approaches are not effective as no
    infinite family of (t)-designs with (t \geq 5) has been constructed using them. The
    objective of this paper is to give a spectral characterisation of all
    (t)-designs by introducing a characteristic Boolean function of a (t)-design.
    We will determine the spectra of the characteristic function of ((n-2)/2)-((n,
    n/2, 1)) Steiner systems and prove properties of such designs.



