    arXiv Paper Daily: Thu, 22 Dec 2016

    Posted by 我爱机器学习 (52ml.net) on 2016-12-22 00:00:00

    Neural and Evolutionary Computing

    Scale-invariance of ruggedness measures in fractal fitness landscapes

    Hendrik Richter
    Subjects: Chaotic Dynamics (nlin.CD); Neural and Evolutionary Computing (cs.NE); Populations and Evolution (q-bio.PE)

    The paper deals with using chaos to direct trajectories to targets and
    analyzes ruggedness and fractality of the resulting fitness landscapes. The
    targeting problem is formulated as a dynamic fitness landscape and four
    different chaotic maps generating such a landscape are studied. By using a
    computational approach, we analyze properties of the landscapes and quantify
    their fractal and rugged characteristics. In particular, it is shown that
    ruggedness measures such as correlation length and information content are
    scale-invariant and self-similar.
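
    A minimal sketch of how one such ruggedness measure can be estimated, assuming the common random-walk definition of correlation length, tau = -1 / ln|rho(1)|, where rho(1) is the lag-1 autocorrelation of fitness values sampled along a random walk. The landscape functions, step size, and helper names are illustrative assumptions, not taken from the paper, whose estimators may differ.

```python
import math
import random


def autocorrelation(series, lag=1):
    """Sample autocorrelation of a fitness series at the given lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    mean = sum(series) / n
    centered = [v - mean for v in series]
    energy = sum(c * c for c in centered)
    if energy == 0.0:
        raise ValueError("constant series: autocorrelation is undefined")
    cross = sum(centered[t] * centered[t + lag] for t in range(n - lag))
    return cross / energy


def correlation_length(series):
    """Correlation length tau = -1 / ln|rho(1)|; larger means smoother."""
    rho1 = autocorrelation(series, lag=1)
    if not 0.0 < abs(rho1) < 1.0:
        raise ValueError("|rho(1)| must lie strictly between 0 and 1")
    return -1.0 / math.log(abs(rho1))


def random_walk_fitness(fitness, start, step, n_steps, rng):
    """Record fitness values along a 1-D random walk with uniform steps."""
    x = start
    values = []
    for _ in range(n_steps):
        values.append(fitness(x))
        x += rng.uniform(-step, step)
    return values


if __name__ == "__main__":
    # Two hypothetical landscapes: a smooth one and one with a fast ripple.
    def smooth(x):
        return math.sin(x)

    def rugged(x):
        return math.sin(x) + 0.5 * math.sin(40.0 * x)

    for name, f in (("smooth", smooth), ("rugged", rugged)):
        walk = random_walk_fitness(f, 0.0, 0.1, 5000, random.Random(0))
        print(name, correlation_length(walk))
```

    A shorter correlation length indicates a more rugged landscape; repeating the estimate at several step sizes is one simple way to probe how the measure behaves across scales.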

    Stochastic Runtime Analysis of a Cross Entropy Algorithm for Traveling Salesman Problems

    Zijun Wu, Rolf Moehring, Jianhui Lai
    Comments: 38 pages, 7 figures
    Subjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    This article analyzes the stochastic runtime of a Cross-Entropy Algorithm on
    two classes of traveling salesman problems. The algorithm shares main features
    of the famous Max-Min Ant System with iteration-best reinforcement.

    For simple instances that have a \(\{1,n\}\)-valued distance function and a
    unique optimal solution, we prove a stochastic runtime of \(O(n^{6+\epsilon})\)
    with the vertex-based random solution generation, and a stochastic runtime of
    \(O(n^{3+\epsilon}\ln n)\) with the edge-based random solution generation for an
    arbitrary \(\epsilon\in (0,1)\). These runtimes are very close to the known
    expected runtime for variants of Max-Min Ant System with best-so-far
    reinforcement. They are obtained for the stronger notion of stochastic runtime,
    which means that an optimal solution is obtained in that time with an
    overwhelming probability, i.e., a probability tending exponentially fast to one
    with growing problem size.

    We also inspect more complex instances with \(n\) vertices positioned on an
    \(m\times m\) grid. When the \(n\) vertices span a convex polygon, we obtain a
    stochastic runtime of \(O(n^{3}m^{5+\epsilon})\) with the vertex-based random
    solution generation, and a stochastic runtime of \(O(n^{2}m^{5+\epsilon})\) for
    the edge-based random solution generation. When there are \(k = O(1)\) many
    vertices inside a convex polygon spanned by the other \(n-k\) vertices, we obtain
    a stochastic runtime of \(O(n^{4}m^{5+\epsilon}+n^{6k-1}m^{\epsilon})\) with the
    vertex-based random solution generation, and a stochastic runtime of
    \(O(n^{3}m^{5+\epsilon}+n^{3k}m^{\epsilon})\) with the edge-based random solution
    generation. These runtimes are better than the expected runtime for the
    so-called \((\mu+\lambda)\) EA reported in a recent article, and again
    obtained for the stronger notion of stochastic runtime.
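
    To make the setting concrete, here is a rough sketch of a cross-entropy style search for TSP with iteration-best reinforcement and clamped probabilities. It is one plausible reading of "edge-based random solution generation" (the next vertex is drawn from learned edge weights); the smoothing rate rho, the bounds lo and hi, and all function names are assumptions made for illustration and are not the algorithm or parameters analyzed in the paper.

```python
import random


def tour_length(tour, dist):
    """Length of the closed tour that returns to its first vertex."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def sample_tour(prob, rng):
    """Grow a tour from vertex 0, picking each next vertex by edge weight."""
    n = len(prob)
    tour = [0]
    unvisited = set(range(1, n))
    while unvisited:
        cur = tour[-1]
        candidates = sorted(unvisited)
        weights = [prob[cur][v] for v in candidates]
        # random.choices normalizes the weights over the remaining vertices.
        nxt = rng.choices(candidates, weights=weights, k=1)[0]
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def cross_entropy_tsp(dist, iterations=200, samples=30, rho=0.2, seed=0):
    """Return the best tour found; rho, lo and hi are illustrative choices."""
    rng = random.Random(seed)
    n = len(dist)
    lo, hi = 1.0 / (n * n), 1.0 - 1.0 / n  # hypothetical min/max bounds
    prob = [[0.0 if i == j else 1.0 / (n - 1) for j in range(n)]
            for i in range(n)]
    best = None
    for _ in range(iterations):
        tours = [sample_tour(prob, rng) for _ in range(samples)]
        it_best = min(tours, key=lambda t: tour_length(t, dist))
        if best is None or tour_length(it_best, dist) < tour_length(best, dist):
            best = it_best
        # Reinforce only the edges of the iteration-best tour (both directions).
        edges = set()
        for i in range(n):
            a, b = it_best[i], it_best[(i + 1) % n]
            edges.add((a, b))
            edges.add((b, a))
        for i in range(n):
            for j in range(n):
                if i != j:
                    target = 1.0 if (i, j) in edges else 0.0
                    p = (1.0 - rho) * prob[i][j] + rho * target
                    prob[i][j] = min(hi, max(lo, p))
    return best
```

    Clamping keeps every edge selectable, so no tour ever gets probability zero; this mirrors the role of pheromone bounds in Max-Min style methods.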


    Computer Vision and Pattern Recognition

    Beyond Holistic Object Recognition: Enriching Image Understanding with Part States

    Cewu Lu, Hao Su, Yongyi Lu, Li Yi, Chikeung Tang, Leonidas Guibas
    Comments: 9 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Important high-level vision tasks such as human-object interaction, image
    captioning and robotic manipulation require rich semantic descriptions of
    objects at part level. Based upon previous work on part localization, in this
    paper, we address the problem of inferring rich semantics imparted by an object
    part in still images. We propose to tokenize the semantic space as a discrete
    set of part states. Our modeling of part state is spatially localized,
    therefore, we formulate the part state inference problem as a pixel-wise
    annotation problem. An iterative part-state inference neural network is
    specifically designed for this task, which is efficient in time and accurate in
    performance. Extensive experiments demonstrate that the proposed method can
    effectively predict the semantic states of parts and simultaneously correct
    localization errors, thus benefiting a few visual understanding applications.
    The other contribution of this paper is our part state dataset which contains
    rich part-level semantic annotations.

    Learning Motion Patterns in Videos

    Pavel Tokmakov, Karteek Alahari, Cordelia Schmid
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The problem of determining whether an object is in motion, irrespective of
    the camera motion, is far from being solved. We address this challenging task
    by learning motion patterns in videos. The core of our approach is a fully
    convolutional network, which is learnt entirely from synthetic video sequences,
    and their ground-truth optical flow and motion segmentation. This
    encoder-decoder style architecture first learns a coarse representation of the
    optical flow field features, and then refines it iteratively to produce motion
    labels at the original high-resolution. The output label of each pixel denotes
    whether it has undergone independent motion, i.e., irrespective of the camera
    motion. We demonstrate the benefits of this learning framework on the moving
    object segmentation task, where the goal is to segment all the objects in
    motion. To this end we integrate an objectness measure into the framework. Our
    approach outperforms the top method on the recently released DAVIS benchmark
    dataset, comprising real-world sequences, by 5.6%. We also evaluate on the
    Berkeley motion segmentation database, achieving state-of-the-art results.

    A Unified Framework for Tumor Proliferation Score Prediction in Breast Histopathology

    Kyunghyun Paeng, Sangheum Hwang, Sunggyun Park, Minsoo Kim, Seokhwi Kim
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The tumor proliferation score is an important biomarker indicative of
    breast cancer patients’ prognosis. In this paper, we present a unified
    framework to predict tumor proliferation scores from whole slide images in
    breast histopathology. The proposed system offers a fully automated solution
    for predicting both a molecular-data-based and a mitosis-counting-based tumor
    proliferation score. The framework integrates three modules, each fine-tuned to
    maximize the overall performance: an image processing component for handling
    whole slide images, a deep learning based mitosis detection network, and a
    proliferation scores prediction module. We have achieved 0.567 quadratic
    weighted Cohen’s kappa in mitosis counting based score prediction and 0.652
    F1-score in mitosis detection. On Spearman’s correlation coefficient, which
    evaluates prediction on the molecular data based score, the system obtained
    0.6171. Our system won first place in all of the three tasks in Tumor
    Proliferation Assessment Challenge at MICCAI 2016, outperforming all other
    approaches.
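
    For readers unfamiliar with the first metric, the sketch below computes quadratic weighted Cohen's kappa from two label sequences using the standard definition: one minus the ratio of weighted observed disagreement to weighted chance disagreement, with weights (i - j)^2 / (K - 1)^2. Labels are assumed to be integers in 0..K-1; the function name is illustrative and the challenge's official scoring code may differ in details.

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_classes):
    """Quadratic weighted Cohen's kappa for integer labels 0..num_classes-1."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label sequences must be non-empty and equal length")
    if num_classes < 2:
        raise ValueError("need at least two classes")
    k = num_classes
    n = len(rater_a)

    # Observed confusion matrix: observed[i][j] counts pairs (a=i, b=j).
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    # Marginal label histograms for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    numerator = 0.0    # weighted observed disagreement
    denominator = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            numerator += weight * observed[i][j]
            denominator += weight * hist_a[i] * hist_b[j] / n
    if denominator == 0.0:
        raise ValueError("kappa is undefined when both raters use one class")
    return 1.0 - numerator / denominator
```

    A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement.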

    Trilaminar Multiway Reconstruction Tree for Efficient Large Scale Structure from Motion

    Kun Sun, Wenbing Tao
    Comments: This manuscript has been submitted to CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Accuracy and efficiency are two key problems in large scale incremental
    Structure from Motion (SfM). In this paper, we propose a unified framework to
    divide the image set into clusters suitable for reconstruction as well as find
    multiple reliable and stable starting points. Image partitioning is performed in
    two steps. First, some small image groups are selected at places with high
    image density, and then all the images are clustered according to their optimal
    reconstruction paths to these image groups. This ensures that the scene is
    always reconstructed from dense places to sparse areas, which can reduce error
    accumulation when images have weak overlap. To enable faster speed, images
    outside the selected group in each cluster are further divided to achieve a
    greater degree of parallelism. Experiments show that our method achieves
    significant speedup, higher accuracy and better completeness.

    Imaging around corners with single-pixel detector by computational ghost imaging

    Bin Bai, Jianbin Liu, Yu Zhou, Songlin Zhang, Yuchen He, Zhuo Xu
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Optics (physics.optics)

    We have designed a single-pixel camera with imaging around corners based on
    computational ghost imaging. It can obtain the image of an object when the
    camera cannot look at the object directly. Our imaging system explores the fact
    that a bucket detector in a ghost imaging setup has no spatial resolution
    capability. A series of experiments have been designed to confirm our
    predictions. This camera has potential applications for imaging around corners
    or other similar environments where the object cannot be observed directly.

    FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

    Yaman Umuroglu, Nicholas J. Fraser, Giulio Gambardella, Michaela Blott, Philip Leong, Magnus Jahre, Kees Vissers
    Comments: To appear in the 25th International Symposium on Field-Programmable Gate Arrays, February 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Hardware Architecture (cs.AR); Learning (cs.LG)

    Research has shown that convolutional neural networks contain significant
    redundancy, and high classification accuracy can be obtained even when weights
    and activations are reduced from floating point to binary values. In this
    paper, we present FINN, a framework for building fast and flexible FPGA
    accelerators using a flexible heterogeneous streaming architecture. By
    utilizing a novel set of optimizations that enable efficient mapping of
    binarized neural networks to hardware, we implement fully connected,
    convolutional and pooling layers, with per-layer compute resources being
    tailored to user-provided throughput requirements. On a ZC706 embedded FPGA
    platform drawing less than 25 W total system power, we demonstrate up to 12.3
    million image classifications per second with 0.31 μs latency on the MNIST
    dataset with 95.8% accuracy, and 21906 image classifications per second with
    283 μs latency on the CIFAR-10 and SVHN datasets with 80.1% and 94.9%
    accuracy, respectively. To the best of our knowledge, ours are the fastest
    classification rates reported to date on these benchmarks.
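
    The abstract does not spell out the arithmetic, but a widely used reason binarized networks map well to hardware is that a dot product of two {-1, +1} vectors reduces to XNOR followed by a population count. The sketch below illustrates that identity in software, assuming the encoding bit 1 for +1 and bit 0 for -1; the function names are hypothetical and this is not FINN's implementation.

```python
def pack_bits(values):
    """Pack a sequence of -1/+1 values into an int (bit i set means +1)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
        elif v != -1:
            raise ValueError("values must be -1 or +1")
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1,+1} vectors of length n."""
    mask = (1 << n) - 1
    # XNOR marks positions where the signs agree (product is +1).
    agree = ~(a_bits ^ b_bits) & mask
    matches = bin(agree).count("1")  # population count
    # matches contribute +1 each, the remaining n - matches contribute -1.
    return 2 * matches - n


if __name__ == "__main__":
    a = [1, -1, 1, 1]
    b = [1, 1, -1, 1]
    print(binary_dot(pack_bits(a), pack_bits(b), len(a)))
```

    In hardware the XNOR and popcount operate on wide words in parallel, which is what removes the need for multipliers in this setting.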

    Recurrent Highway Networks with Language CNN for Image Captioning

    Jiuxiang Gu, Gang Wang, Tsuhan Chen
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    In this paper, we propose a Recurrent Highway Network with Language CNN for
    image caption generation. Our network consists of three sub-networks: the deep
    Convolutional Neural Network for image representation, the Convolutional Neural
    Network for language modeling, and the Multimodal Recurrent Highway Network for
    sequence prediction. Our proposed model can naturally exploit the hierarchical
    and temporal structure of history words, which is critical for image caption
    generation. The effectiveness of our model is validated on two datasets, MS COCO
    and Flickr30K. Our extensive experimental results show that our method is
    competitive with the state-of-the-art methods.

    Image biomarker standardisation initiative – feature definitions

    Alex Zwanenburg, Stefan Leger, Martin Vallières, Steffen Löck, for the Image Biomarker Standardisation Initiative
    Comments: 59 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    While analysis of medical images has practically taken place since the first
    image was recorded, high throughput analysis of medical images is a more recent
    phenomenon. The aim of such a radiomics process is to provide decision support
    based on medical imaging. Part of the radiomics process is the conversion of
    image data into numerical features which capture different medical image
    aspects, and can be subsequently correlated as biomarkers to e.g. expected
    oncological treatment outcome.

    With the growth of the radiomics field, it has become clear that results are
    often difficult to reproduce, that standards for image processing and feature
    extraction are missing, and that reporting guidelines are absent. The image
    biomarker standardisation initiative (IBSI) seeks to address these issues. The
    current document provides definitions for a large number of image features.

    Temporal Tessellation for Video Annotation and Summarization

    Dotan Kaufman, Gil Levi, Tal Hassner, Lior Wolf
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a general approach to video understanding, inspired by semantic
    transfer techniques successfully used for 2D image understanding. Our method
    considers a video to be a 1D sequence of clips, each one associated with its
    own semantics. The nature of these semantics — natural language captions or
    other labels — depends on the task at hand. A test video is processed by
    forming correspondences between its clips and the clips of reference videos
    with known semantics, following which, reference semantics can be transferred
    to the test video. We describe two matching methods, both designed to ensure
    that (a) reference clips appear similar to test clips and (b), taken together,
    the semantics of selected reference clips is consistent and maintains temporal
    coherence. We use our method for video captioning on the LSMDC’16 benchmark and
    video summarization on the SumMe benchmark. In both cases, our method not only
    surpasses state of the art results, but importantly, it is the only method we
    know of that was successfully applied to both video understanding tasks.

    Unsupervised Place Discovery for Visual Place Classification

    Fei Xiaoxiao, Tanaka Kanji, Inamoto Kouya
    Comments: Technical Report, 5 pages, 4 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this study, we explore the use of deep convolutional neural networks
    (DCNNs) in visual place classification for robotic mapping and localization. An
    open question is how to partition the robot’s workspace into places to maximize
    the performance (e.g., accuracy, precision, recall) of potential DCNN
    classifiers. This is a chicken-and-egg problem: if we had a well-trained DCNN
    classifier, it would be rather easy to partition the robot’s workspace into places,
    but the training of a DCNN classifier requires a set of pre-defined place
    classes. In this study, we address this problem and present several strategies
    for unsupervised discovery of place classes (“time cue,” “location cue,”
    “time-appearance cue,” and “location-appearance cue”). We also evaluate the
    efficacy of the proposed methods using the publicly available University of
    Michigan North Campus Long-Term (NCLT) Dataset.

    A Statistical Approach to Continuous Self-Calibrating Eye Gaze Tracking for Head-Mounted Virtual Reality Systems

    Subarna Tripathi, Brian Guenter
    Comments: Accepted for publication in WACV 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a novel, automatic eye gaze tracking scheme inspired by smooth
    pursuit eye motion while playing mobile games or watching virtual reality
    contents. Our algorithm continuously calibrates an eye tracking system for a
    head mounted display. This eliminates the need for an explicit calibration step
    and automatically compensates for small movements of the headset with respect
    to the head. The algorithm finds correspondences between corneal motion and
    screen space motion, and uses these to generate Gaussian Process Regression
    models. A combination of those models provides a continuous mapping from
    corneal position to screen space position. Accuracy is nearly as good as
    that achieved with an explicit calibration step.

    CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

    Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Learning (cs.LG)

    When building artificial intelligence systems that can reason and answer
    questions about visual data, we need diagnostic tests to analyze our progress
    and discover shortcomings. Existing benchmarks for visual question answering
    can help, but have strong biases that models can exploit to correctly answer
    questions without reasoning. They also conflate multiple sources of error,
    making it hard to pinpoint model weaknesses. We present a diagnostic dataset
    that tests a range of visual reasoning abilities. It contains minimal biases
    and has detailed annotations describing the kind of reasoning each question
    requires. We use this dataset to analyze a variety of modern visual reasoning
    systems, providing novel insights into their abilities and limitations.

    Multi-Agent Cooperation and the Emergence of (Natural) Language

    Angeliki Lazaridou, Alexander Peysakhovich, Marco Baroni
    Comments: Under submission at ICLR 2017
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Computer Science and Game Theory (cs.GT); Learning (cs.LG); Multiagent Systems (cs.MA)

    The current mainstream approach to training natural language systems is to
    expose them to large amounts of text. This passive learning is problematic if
    we are interested in developing interactive machines, such as conversational
    agents. We propose a framework for language learning that relies on multi-agent
    communication. We study this learning in the context of referential games. In
    these games, a sender and a receiver see a pair of images. The sender is told
    one of them is the target and is allowed to send a message from a fixed,
    arbitrary vocabulary to the receiver. The receiver must rely on this message to
    identify the target. Thus, the agents develop their own language interactively
    out of the need to communicate. We show that two networks with simple
    configurations are able to learn to coordinate in the referential game. We
    further explore how to make changes to the game environment to cause the “word
    meanings” induced in the game to better reflect intuitive semantic properties
    of the images. In addition, we present a simple strategy for grounding the
    agents’ code into natural language. Both of these are necessary steps towards
    developing machines that are able to communicate with humans productively.

    Stochastic Multidimensional Scaling

    Ketan Rajawat, Sandeep Kumar
    Subjects: Optimization and Control (math.OC); Computer Vision and Pattern Recognition (cs.CV)

    Multidimensional scaling (MDS) is a popular dimensionality reduction
    technique that has been widely used for network visualization and cooperative
    localization. However, the traditional stress minimization formulation of MDS
    necessitates the use of batch optimization algorithms that are not scalable to
    large-sized problems. This paper considers an alternative stochastic stress
    minimization framework that is amenable to incremental and distributed
    solutions. A novel linear-complexity stochastic optimization algorithm is
    proposed that is provably convergent and simple to implement. The applicability
    of the proposed algorithm to localization and visualization tasks is also
    expounded. Extensive tests on synthetic and real datasets demonstrate the
    efficacy of the proposed algorithm.
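
    As a rough illustration of why a stochastic formulation scales, the sketch below minimizes the raw stress, the sum over pairs of (||x_i - x_j|| - delta_ij)^2, by sampling one pair per step and taking a gradient step on that single term. This is plain stochastic gradient descent with a fixed step size, used here as a stand-in; it is not the algorithm proposed in the paper, and the function names, step size, and iteration count are assumptions.

```python
import math
import random


def stress(points, target):
    """Raw stress: sum over pairs of (embedded distance - target)^2."""
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            total += (math.dist(points[i], points[j]) - target[i][j]) ** 2
    return total


def sgd_mds(target, init, steps=20000, lr=0.05, seed=0):
    """Refine an initial embedding by per-pair stochastic gradient steps."""
    rng = random.Random(seed)
    pts = [list(p) for p in init]  # copy so the caller's list is untouched
    n = len(pts)
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        diff = [a - b for a, b in zip(pts[i], pts[j])]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm < 1e-12:
            continue  # direction undefined for coincident points
        # Gradient of (norm - delta)^2 w.r.t. x_i is 2 (norm - delta) diff / norm,
        # and the gradient w.r.t. x_j is its negative.
        g = 2.0 * (norm - target[i][j]) / norm
        for k in range(len(diff)):
            pts[i][k] -= lr * g * diff[k]
            pts[j][k] += lr * g * diff[k]
    return pts


if __name__ == "__main__":
    # Target distances of a 3-4-5 triangle, embedded in the plane.
    target = [[0.0, 3.0, 4.0], [3.0, 0.0, 5.0], [4.0, 5.0, 0.0]]
    rng = random.Random(1)
    init = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(3)]
    print(stress(init, target), stress(sgd_mds(target, init), target))
```

    Each step touches only two points, so the per-iteration cost does not depend on the number of points, which is the property that makes incremental and distributed variants attractive.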


    Artificial Intelligence

    Understanding Error Correction and its Role as Part of the Communication Channel in Environments composed of Self-Integrating Systems

    Aleksander Lodwich
    Comments: 60 pages, 55 figures, gray literature
    Subjects: Artificial Intelligence (cs.AI)

    The rise in complexity of technical systems also raises the knowledge required
    to set them up and to maintain them. The cost to evolve such systems can be
    prohibitive. In the field of Autonomic Computing, technical systems should
    therefore have various self-healing capabilities allowing system owners to
    provide only partial, potentially inconsistent updates of the system. The
    self-healing or self-integrating system shall find out the remaining changes to
    communications and functionalities in order to accommodate change and yet still
    restore function. This issue becomes even more interesting in the context of
    the Internet of Things and the Industrial Internet, where previously unexpected device
    combinations can be assembled in order to provide a surprising new function. In
    order to pursue higher levels of self-integration capabilities I propose to
    think of self-integration as sophisticated error correcting communications.
    Therefore, this paper discusses an extended scope of error correction with the
    purpose to emphasize error correction’s role as an integrated element of
    bi-directional communication channels in self-integrating, autonomic
    communication scenarios.

    AIVAT: A New Variance Reduction Technique for Agent Evaluation in Imperfect Information Games

    Neil Burch, Martin Schmid, Matej Moravčík, Michael Bowling
    Subjects: Artificial Intelligence (cs.AI)

    Evaluating agent performance when outcomes are stochastic and agents use
    randomized strategies can be challenging when there is limited data available.
    The variance of sampled outcomes may make the simple approach of Monte Carlo
    sampling inadequate. This is the case for agents playing heads-up no-limit
    Texas hold’em poker, where man-machine competitions have involved multiple days
    of consistent play and still not resulted in statistically significant
    conclusions even when the winner’s margin is substantial. In this paper, we
    introduce AIVAT, a low variance, provably unbiased value assessment tool that
    uses an arbitrary heuristic estimate of state value, as well as the explicit
    strategy of a subset of the agents. Unlike existing techniques which reduce the
    variance from chance events, or only consider game ending actions, AIVAT
    reduces the variance both from choices by nature and by players with a known
    strategy. The resulting estimator in no-limit poker can reduce the number of
    hands needed to draw statistical conclusions by more than a factor of 10.

    Deep-learning in Mobile Robotics – from Perception to Control Systems: A Survey on Why and Why not

    Lei Tai, Ming Liu
    Comments: 16 pages, 4 figures, submitted to journal
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)

    Deep-learning has dramatically changed the world overnight. It has greatly
    boosted the development of visual perception, object detection, speech
    recognition, and related areas. This is attributed to the multiple convolutional processing
    layers that abstract learned representations from massive data. The
    advantages of deep convolutional structures in data processing have motivated the
    application of artificial intelligence methods to robotic problems, especially
    perception and control systems, two typical and challenging problems in
    robotics. This paper presents a survey of the deep-learning research landscape
    in mobile robotics. We start by introducing the definition and development of
    deep-learning in related fields, especially the essential distinctions between
    image processing and robotic tasks. We describe and discuss several typical
    applications and related work in this domain, followed by the benefits of
    deep-learning and related existing frameworks. In addition, operation in
    complex dynamic environments is regarded as a critical bottleneck for mobile
    robots, for example in autonomous driving. We thus further emphasize
    recent achievements in how deep-learning contributes to navigation and control
    systems for mobile robots. Finally, we discuss the open challenges and
    research frontiers.

    Disjunctive Boolean Kernels for Collaborative Filtering in Top-N Recommendation

    Mirko Polato, Fabio Aiolli
    Comments: 21 pages, 25 figures, 2 tables
    Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

    In many personalized recommendation problems available data consists only of
    positive interactions (implicit feedback) between users and items. This problem
    is also known as One-Class Collaborative Filtering (OC-CF). Linear models
    usually achieve state-of-the-art performance on OC-CF problems, and many
    efforts have been devoted to building more expressive and complex representations
    able to improve the recommendations, but without much success. Recent analysis
    shows that collaborative filtering (CF) datasets have peculiar characteristics
    such as high sparsity and a long-tailed distribution of the ratings. In this
    paper we propose a Boolean kernel, called Disjunctive Kernel, which is less
    expressive than the linear one but is able to alleviate the sparsity issue
    in CF contexts. The embedding of this kernel is composed of all the
    combinations of a certain degree (d) of the input variables, and these combined
    features are semantically interpreted as disjunctions of the input variables.
    Experiments on several CF datasets show the effectiveness and the efficiency of
    the proposed kernel.
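
    One way to read the embedding described above: there is one feature per set of d distinct input variables, and the feature is the logical OR of those variables. Under that reading (an assumption, since the abstract gives no formula), the kernel between two binary vectors counts the d-subsets that hit both vectors, and inclusion-exclusion gives a closed form. The sketch derives that form and checks it against brute-force enumeration; the function names are illustrative and the paper's definition or normalization may differ.

```python
from itertools import combinations
from math import comb


def disjunctive_kernel(x, z, d):
    """Count d-subsets of variables whose OR is true in both x and z."""
    n = len(x)
    nx = sum(x)                                # active variables in x
    nz = sum(z)                                # active variables in z
    shared = sum(a & b for a, b in zip(x, z))  # active in both
    union = nx + nz - shared                   # active in x or z
    # all subsets - (miss x) - (miss z) + (miss both), by inclusion-exclusion.
    return (comb(n, d) - comb(n - nx, d) - comb(n - nz, d)
            + comb(n - union, d))


def disjunctive_kernel_bruteforce(x, z, d):
    """Same quantity via explicit enumeration (exponential; for checks only)."""
    count = 0
    for subset in combinations(range(len(x)), d):
        if any(x[i] for i in subset) and any(z[i] for i in subset):
            count += 1
    return count


if __name__ == "__main__":
    x = [1, 0, 1, 0, 0]
    z = [0, 0, 1, 1, 0]
    print(disjunctive_kernel(x, z, 2), disjunctive_kernel_bruteforce(x, z, 2))
```

    The closed form needs only the counts of active variables and their overlap, so the kernel can be evaluated without ever materializing the combinatorial feature space.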

    Stochastic Runtime Analysis of a Cross Entropy Algorithm for Traveling Salesman Problems

    Zijun Wu, Rolf Moehring, Jianhui Lai
    Comments: 38 pages, 7 figures
    Subjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    This article analyzes the stochastic runtime of a Cross-Entropy Algorithm on
    two classes of traveling salesman problems. The algorithm shares main features
    of the famous Max-Min Ant System with iteration-best reinforcement.

    For simple instances that have a \(\{1,n\}\)-valued distance function and a
    unique optimal solution, we prove a stochastic runtime of \(O(n^{6+\epsilon})\)
    with the vertex-based random solution generation, and a stochastic runtime of
    \(O(n^{3+\epsilon}\ln n)\) with the edge-based random solution generation for an
    arbitrary \(\epsilon\in (0,1)\). These runtimes are very close to the known
    expected runtime for variants of Max-Min Ant System with best-so-far
    reinforcement. They are obtained for the stronger notion of stochastic runtime,
    which means that an optimal solution is obtained in that time with an
    overwhelming probability, i.e., a probability tending exponentially fast to one
    with growing problem size.

    We also inspect more complex instances with \(n\) vertices positioned on an
    \(m\times m\) grid. When the \(n\) vertices span a convex polygon, we obtain a
    stochastic runtime of \(O(n^{3}m^{5+\epsilon})\) with the vertex-based random
    solution generation, and a stochastic runtime of \(O(n^{2}m^{5+\epsilon})\) for
    the edge-based random solution generation. When there are \(k = O(1)\) many
    vertices inside a convex polygon spanned by the other \(n-k\) vertices, we obtain
    a stochastic runtime of \(O(n^{4}m^{5+\epsilon}+n^{6k-1}m^{\epsilon})\) with the
    vertex-based random solution generation, and a stochastic runtime of
    \(O(n^{3}m^{5+\epsilon}+n^{3k}m^{\epsilon})\) with the edge-based random solution
    generation. These runtimes are better than the expected runtime for the
    so-called \((\mu+\lambda)\) EA reported in a recent article, and again
    obtained for the stronger notion of stochastic runtime.


    Information Retrieval

    Classification and Learning-to-rank Approaches for Cross-Device Matching at CIKM Cup 2016

    Nam Khanh Tran
    Comments: CIKM Cup 2016
    Subjects: Information Retrieval (cs.IR); Learning (cs.LG)

    In this paper, we propose two methods for tackling the problem of
    cross-device matching for online advertising at CIKM Cup 2016. The first method
    treats the matching problem as a binary classification task and solves it by
    utilizing ensemble learning techniques. The second method defines the matching
    problem as a ranking task and effectively solves it using learning-to-rank
    algorithms. The results show that both proposed methods are promising
    and that the ranking-based method outperforms the classification-based
    method on this task.

    A deep learning approach for predicting the quality of online health expert question-answering services

    Ze Hu, Zhan Zhang, Qing Chen, Haiqin Yang, Decheng Zuo
    Comments: Submitted to Journal of Biomedical Informatics on Dec 10, 2016
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    Currently, a growing number of health consumers are asking health-related
    questions online, at any time and from anywhere, which effectively lowers the
    cost of health care. The most common approach is using online health expert
    question-answering (HQA) services, as health consumers are more willing to
    trust answers from professional physicians. However, these answers can be of
    varying quality depending on circumstance. In addition, as the available HQA
    services grow, how to predict the answer quality of HQA services via machine
    learning becomes increasingly important and challenging. In an HQA service,
    answers are normally short texts, which are severely affected by the data
    sparsity problem. Furthermore, HQA services lack community features such as
    best answer and user votes. Therefore, the wisdom of the crowd is not available
    to rate answer quality. To address these problems, in this paper, the
    prediction of HQA answer quality is defined as a classification task. First,
    based on the characteristics of HQA services and feedback from medical experts,
    a standard for HQA service answer quality evaluation is defined. Next, based on
    the characteristics of HQA services, several novel non-textual features are
    proposed, including surface linguistic features and social features. Finally, a
    deep belief network (DBN)-based HQA answer quality prediction framework is
    proposed to predict the quality of answers by learning the high-level hidden
    semantic representation from the physicians’ answers. Our results prove that
    the proposed framework overcomes the problem of overly sparse textual features
    in short text answers and effectively identifies high-quality answers.

    Disjunctive Boolean Kernels for Collaborative Filtering in Top-N Recommendation

    Mirko Polato, Fabio Aiolli
    Comments: 21 pages, 25 figures, 2 tables
    Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

    In many personalized recommendation problems available data consists only of
    positive interactions (implicit feedback) between users and items. This problem
    is also known as One-Class Collaborative Filtering (OC-CF). Linear models
    usually achieve state-of-the-art performance on OC-CF problems, and many
    efforts have been devoted to building more expressive and complex representations
    able to improve the recommendations, but without much success. Recent analysis
    shows that collaborative filtering (CF) datasets have peculiar characteristics
    such as high sparsity and a long-tailed distribution of the ratings. In this
    paper we propose a Boolean kernel, called Disjunctive Kernel, which is less
    expressive than the linear one but is able to alleviate the sparsity issue
    in CF contexts. The embedding of this kernel is composed of all the
    combinations of a certain degree (d) of the input variables, and these combined
    features are semantically interpreted as disjunctions of the input variables.
    Experiments on several CF datasets show the effectiveness and the efficiency of
    the proposed kernel.
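
    The embedding described in the abstract can be illustrated with a small
    sketch. It assumes binary input vectors and builds one feature per
    combination of d variables, set to 1 when at least one variable in the
    combination is active (a disjunction). The function names are hypothetical,
    and the closed form is an inclusion-exclusion count derived here for
    illustration, not a formula quoted from the paper.

```python
from itertools import combinations
from math import comb


def disjunctive_embedding(x, d):
    """One feature per d-subset of variables: 1 if any variable in it is active."""
    return [int(any(x[i] for i in subset))
            for subset in combinations(range(len(x)), d)]


def disjunctive_kernel_explicit(x, z, d):
    """Kernel value as the dot product of the two explicit embeddings."""
    phi_x = disjunctive_embedding(x, d)
    phi_z = disjunctive_embedding(z, d)
    return sum(a * b for a, b in zip(phi_x, phi_z))


def disjunctive_kernel_counted(x, z, d):
    """Same value by inclusion-exclusion, without building the embedding.

    Count all d-subsets, subtract those with no active variable in x,
    subtract those with no active variable in z, and add back those with
    no active variable in either vector.
    """
    n = len(x)
    nx, nz = sum(x), sum(z)
    shared = sum(a * b for a, b in zip(x, z))
    inactive_in_both = n - nx - nz + shared
    return comb(n, d) - comb(n - nx, d) - comb(n - nz, d) + comb(inactive_in_both, d)


# Two users over five items (illustrative data, not from the paper).
x = [1, 0, 1, 0, 0]
z = [0, 0, 1, 1, 0]
k_explicit = disjunctive_kernel_explicit(x, z, 2)
k_counted = disjunctive_kernel_counted(x, z, 2)
```

    With d = 1 the embedding is the input itself, so the kernel reduces to the
    ordinary dot product; larger d yields denser features, which is one way to
    read the abstract's claim about alleviating sparsity.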

    Exploiting Rich Contents for Personalized Video Recommendation

    Xingzhong Du, Hongzhi Yin, Ling Chen, Yang Wang, Yi Yang, Xiaofang Zhou
    Subjects: Information Retrieval (cs.IR); Learning (cs.LG)

    Video recommendation has become an essential way of helping people explore
    the video world and discover the ones that may be of interest to them. However,
    mainstream collaborative filtering techniques usually suffer from limited
    performance due to the sparsity of user-video interactions, and hence are
    ineffective for new video recommendation. Although some recent recommender
    models, such as CTR and CDL, have integrated text information to boost
    performance, user-generated videos typically include scarce or low-quality text
    information, which seriously degrades performance. In this paper, we
    investigate how to leverage the non-textual content contained in videos to
    improve the quality of recommendations. We propose to first extract and encode
    the diverse audio, visual and action information that rich video content
    provides, then effectively incorporate these features with collaborative
    filtering using a collaborative embedding regression model (CER). We also study
    how to fuse multiple types of content features to further improve video
    recommendation using a novel fusion method that unifies both non-textual and
    textual features. We conducted extensive experiments on a large video dataset
    collected from multiple sources. The experimental results reveal that our
    proposed recommender model and feature fusion method outperform the
    state-of-the-art methods.


    Computation and Language

    Inverted Bilingual Topic Models for Lexicon Extraction from Non-parallel Data

    Tengfei Ma
    Subjects: Computation and Language (cs.CL)

    A good lexicon is an important resource for various cross-lingual tasks such
    as information retrieval and text mining. In this paper, we focus on extracting
    translation pairs from non-parallel cross-lingual corpora. Previous lexicon
    extraction algorithms for non-parallel data generally rely on an accurate seed
    dictionary and extract translation pairs by using context similarity. However,
    there are two problems. One, a lot of semantic information is lost if we just
    use seed dictionary words to construct context vectors and obtain the context
    similarity. Two, in practice, we may not have a clean seed dictionary. For
    example, if we use a generic dictionary as a seed dictionary in a special
    domain, it might be very noisy. To solve these two problems, we propose two new
    bilingual topic models to better capture the semantic information of each word
    while discriminating the multiple translations in a noisy seed dictionary. We
    then use an effective measure to evaluate the similarity of words in different
    languages and select the optimal translation pairs. Results of experiments
    using real Japanese-English data demonstrate the effectiveness of our models.

    Multi-Agent Cooperation and the Emergence of (Natural) Language

    Angeliki Lazaridou, Alexander Peysakhovich, Marco Baroni
    Comments: Under submission at ICLR 2017
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Computer Science and Game Theory (cs.GT); Learning (cs.LG); Multiagent Systems (cs.MA)

    The current mainstream approach to training natural language systems is to
    expose them to large amounts of text. This passive learning is problematic if
    we are interested in developing interactive machines, such as conversational
    agents. We propose a framework for language learning that relies on multi-agent
    communication. We study this learning in the context of referential games. In
    these games, a sender and a receiver see a pair of images. The sender is told
    one of them is the target and is allowed to send a message from a fixed,
    arbitrary vocabulary to the receiver. The receiver must rely on this message to
    identify the target. Thus, the agents develop their own language interactively
    out of the need to communicate. We show that two networks with simple
    configurations are able to learn to coordinate in the referential game. We
    further explore how to make changes to the game environment to cause the “word
    meanings” induced in the game to better reflect intuitive semantic properties
    of the images. In addition, we present a simple strategy for grounding the
    agents’ code into natural language. Both of these are necessary steps towards
    developing machines that are able to communicate with humans productively.

    Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling

    Gábor Berend
    Subjects: Computation and Language (cs.CL)

    In this paper we propose and carefully evaluate a sequence labeling framework
    which solely utilizes sparse indicator features derived from dense distributed
    word representations. The proposed model obtains (near) state-of-the-art
    performance for both part-of-speech tagging and named entity recognition for a
    variety of languages. Our model relies only on a few thousand sparse
    coding-derived features, without applying any modification of the word
    representations employed for the different tasks. The proposed model has
    favorable generalization properties as it retains over 89.8% of its average POS
    tagging accuracy when trained on 1.2% of the total available training data,
    i.e., 150 sentences per language.

    Fast Domain Adaptation for Neural Machine Translation

    Markus Freitag, Yaser Al-Onaizan
    Subjects: Computation and Language (cs.CL)

    Neural Machine Translation (NMT) is a new approach for automatic translation
    of text from one human language into another. The basic concept in NMT is to
    train a large Neural Network that maximizes the translation performance on a
    given parallel corpus. NMT is gaining popularity in the research community
    because it has outperformed traditional SMT approaches in several translation
    tasks at WMT and in other evaluation tasks/benchmarks, at least for some
    language pairs. However, many of the enhancements made to SMT over the years
    have not been incorporated into the NMT framework. In this paper, we focus on
    one such enhancement, namely domain adaptation. We propose an approach for
    adapting an NMT system to a new domain. The main idea behind domain adaptation
    is to exploit the availability of large out-of-domain training data together
    with a small amount of in-domain training data. We report significant gains
    with our proposed method in both automatic metrics and a human subjective
    evaluation metric on two language pairs. With our adaptation method, we show
    large improvements on the new domain while the performance on the general
    domain degrades only slightly. In addition, our approach is fast enough to
    adapt an already trained system to a new domain within a few hours, without
    the need to retrain the NMT model on the combined data, which usually takes
    several days or weeks depending on the volume of the data.

    A deep learning approach for predicting the quality of online health expert question-answering services

    Ze Hu, Zhan Zhang, Qing Chen, Haiqin Yang, Decheng Zuo
    Comments: Submitted to the Journal of Biomedical Informatics on Dec 10, 2016
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    Currently, a growing number of health consumers are asking health-related
    questions online, at any time and from anywhere, which effectively lowers the
    cost of health care. The most common approach is using online health expert
    question-answering (HQA) services, as health consumers are more willing to
    trust answers from professional physicians. However, these answers can be of
    varying quality depending on circumstance. In addition, as the available HQA
    services grow, how to predict the answer quality of HQA services via machine
    learning becomes increasingly important and challenging. In an HQA service,
    answers are normally short texts, which are severely affected by the data
    sparsity problem. Furthermore, HQA services lack community features such as
    best answer and user votes. Therefore, the wisdom of the crowd is not available
    to rate answer quality. To address these problems, in this paper, the
    prediction of HQA answer quality is defined as a classification task. First,
    based on the characteristics of HQA services and feedback from medical experts,
    a standard for HQA service answer quality evaluation is defined. Next, based on
    the characteristics of HQA services, several novel non-textual features are
    proposed, including surface linguistic features and social features. Finally, a
    deep belief network (DBN)-based HQA answer quality prediction framework is
    proposed to predict the quality of answers by learning the high-level hidden
    semantic representation from the physicians’ answers. Our results prove that
    the proposed framework overcomes the problem of overly sparse textual features
    in short text answers and effectively identifies high-quality answers.

    CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

    Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Learning (cs.LG)

    When building artificial intelligence systems that can reason and answer
    questions about visual data, we need diagnostic tests to analyze our progress
    and discover shortcomings. Existing benchmarks for visual question answering
    can help, but have strong biases that models can exploit to correctly answer
    questions without reasoning. They also conflate multiple sources of error,
    making it hard to pinpoint model weaknesses. We present a diagnostic dataset
    that tests a range of visual reasoning abilities. It contains minimal biases
    and has detailed annotations describing the kind of reasoning each question
    requires. We use this dataset to analyze a variety of modern visual reasoning
    systems, providing novel insights into their abilities and limitations.


    Distributed, Parallel, and Cluster Computing

    Study of Raspberry Pi 2 Quad-core Cortex A7 CPU Cluster as a Mini Supercomputer

    Abdurrachman Mappuji, Nazrul Effendy, Muhamad Mustaghfirin, Fandy Sondok, Rara Priska Yuniar, Sheptiani Putri Pangesti
    Comments: Pre-print of conference paper on International Conference on Information Technology and Electrical Engineering
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    High performance computing (HPC) devices are no longer exclusive to academic,
    R&D, or military purposes. The use of HPC devices such as supercomputers is now
    growing rapidly as new areas arise, such as big data and computer simulation.
    This makes the use of supercomputers more inclusive. Today's supercomputers
    have huge computing power, but require an enormous amount of energy to operate.
    In contrast, a single board computer (SBC) such as the Raspberry Pi has minimal
    computing power, but requires only a small amount of energy to operate, and as
    a bonus it is small and cheap. This paper covers the results of utilizing many
    Raspberry Pi 2 SBCs, each with a quad-core Cortex A7 at 900 MHz, as a cluster
    to compensate for their limited computing power. The High Performance Linpack
    (HPL) benchmark is used to measure the computing power, and a power meter with
    a resolution of 10 mV / 10 mA is used to measure the power consumption. The
    experiment shows that increasing the number of cores in every SBC member of a
    cluster does not give a significant increase in computing power. The experiment
    gives a recommendation that 4 nodes is the maximum number of nodes for an SBC
    cluster, based on the characteristics of computing performance and power
    consumption.


    Learning

    Loss is its own Reward: Self-Supervision for Reinforcement Learning

    Evan Shelhamer, Parsa Mahmoudieh, Max Argus, Trevor Darrell
    Subjects: Learning (cs.LG)

    Reinforcement learning, driven by reward, addresses tasks by optimizing
    policies for expected return. Need the supervision be so narrow? Reward is
    delayed and sparse for many tasks, so we argue that reward alone is a difficult
    and impoverished signal for end-to-end optimization. To augment reward, we
    consider a range of self-supervised tasks that incorporate states, actions, and
    successors to provide auxiliary losses. These losses offer ubiquitous and
    instantaneous supervision for representation learning even in the absence of
    reward. While current results show that learning from reward alone is feasible,
    pure reinforcement learning methods are constrained by computational and data
    efficiency issues that can be remedied by auxiliary losses. Self-supervised
    pre-training improves the data efficiency and policy returns of end-to-end
    reinforcement learning.

    Collaborative Filtering with User-Item Co-Autoregressive Models

    Chao Du, Chongxuan Li, Yin Zheng, Jun Zhu, Cailiang Liu, Hanning Zhou, Bo Zhang
    Subjects: Learning (cs.LG)

    Besides their success in object recognition, machine translation and system
    control in games, (deep) neural networks have recently achieved
    state-of-the-art results in collaborative filtering (CF). Previous neural
    approaches for CF are either user-based or item-based, and so cannot leverage
    all relevant information explicitly. We propose CF-UIcA, a neural
    co-autoregressive model for CF tasks, which exploits the structural
    autoregressiveness in the domains of both users
    and items. Furthermore, we separate the inherent dependence in this structure
    under a natural assumption and develop an efficient stochastic learning
    algorithm to handle large scale datasets. We evaluate CF-UIcA on two popular
    benchmarks: MovieLens 1M and Netflix, and achieve state-of-the-art predictive
    performance, which demonstrates the effectiveness of CF-UIcA.

    Robust Classification of Graph-Based Data

    Carlos M. Alaíz, Michaël Fanuel, Johan A. K. Suykens
    Subjects: Learning (cs.LG)

    A graph-based classification method is proposed both for semi-supervised
    learning in the case of Euclidean data and for classification in the case of
    graph data. Our manifold learning technique is based on a convex optimization
    problem involving a convex regularization term and a concave loss function with
    a trade-off parameter carefully chosen so that the objective function remains
    convex. As shown experimentally, the advantage of considering a concave loss
    function is that the learning problem becomes more robust in the presence of
    noisy labels. Furthermore, the loss function considered is then more similar to
    a classification loss while several other methods treat graph-based
    classification problems as regression problems.

    Temporal Feature Selection on Networked Time Series

    Haishuai Wang, Jia Wu, Peng Zhang, Chengqi Zhang
    Subjects: Learning (cs.LG)

    This paper formulates the problem of learning discriminative features
    (i.e., segments) from networked time series data considering the
    linked information among time series. For example, social network users are
    considered to be social sensors that continuously generate social signals
    (tweets) represented as a time series. The discriminative segments are often
    referred to as shapelets in a time series. Extracting shapelets for time
    series classification has been widely studied. However, existing works on
    shapelet selection assume that the time series are independent and identically
    distributed (i.i.d.). This assumption restricts their applications to social
    networked time series analysis, since a user’s actions can be correlated to
    his/her social affiliations. In this paper we propose a new Network Regularized
    Least Squares (NetRLS) feature selection model that combines typical time
    series data and user network data for analysis. Experiments on real-world
    networked time series Twitter and DBLP data demonstrate the performance of the
    proposed method. NetRLS performs better than LTS, the state-of-the-art time
    series feature selection approach, on real-world data.

    Bayesian Decision Process for Cost-Efficient Dynamic Ranking via Crowdsourcing

    Xi Chen, Kevin Jiao, Qihang Lin
    Journal-ref: Journal of Machine Learning Research 17 (2016) 1-40
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Methodology (stat.ME)

    Rank aggregation based on pairwise comparisons over a set of items has a wide
    range of applications. Although considerable research has been devoted to the
    development of rank aggregation algorithms, one basic question is how to
    efficiently collect a large amount of high-quality pairwise comparisons for the
    ranking purpose. Because of the advent of many crowdsourcing services, a crowd
    of workers are often hired to conduct pairwise comparisons with a small
    monetary reward for each pair they compare. Since different workers have
    different levels of reliability and different pairs have different levels of
    ambiguity, it is desirable to wisely allocate the limited budget for
    comparisons among the pairs of items and workers so that the global ranking can
    be accurately inferred from the comparison results. To this end, we model the
    active sampling problem in crowdsourced ranking as a Bayesian Markov decision
    process, which dynamically selects item pairs and workers to improve the
    ranking accuracy under a budget constraint. We further develop a
    computationally efficient sampling policy based on knowledge gradient as well
    as a moment matching technique for posterior approximation. Experimental
    evaluations on both synthetic and real data show that the proposed policy
    achieves high ranking accuracy with a lower labeling cost.

    Multi-Agent Cooperation and the Emergence of (Natural) Language

    Angeliki Lazaridou, Alexander Peysakhovich, Marco Baroni
    Comments: Under submission at ICLR 2017
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Computer Science and Game Theory (cs.GT); Learning (cs.LG); Multiagent Systems (cs.MA)

    The current mainstream approach to training natural language systems is to
    expose them to large amounts of text. This passive learning is problematic if
    we are interested in developing interactive machines, such as conversational
    agents. We propose a framework for language learning that relies on multi-agent
    communication. We study this learning in the context of referential games. In
    these games, a sender and a receiver see a pair of images. The sender is told
    one of them is the target and is allowed to send a message from a fixed,
    arbitrary vocabulary to the receiver. The receiver must rely on this message to
    identify the target. Thus, the agents develop their own language interactively
    out of the need to communicate. We show that two networks with simple
    configurations are able to learn to coordinate in the referential game. We
    further explore how to make changes to the game environment to cause the “word
    meanings” induced in the game to better reflect intuitive semantic properties
    of the images. In addition, we present a simple strategy for grounding the
    agents’ code into natural language. Both of these are necessary steps towards
    developing machines that are able to communicate with humans productively.

    Deep-learning in Mobile Robotics – from Perception to Control Systems: A Survey on Why and Why not

    Lei Tai, Ming Liu
    Comments: 16 pages, 4 figures, submit to journal
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)

    Deep-learning has dramatically changed the world overnight. It greatly
    boosted the development of visual perception, object detection, and speech
    recognition, etc. That was attributed to the multiple convolutional processing
    layers for abstraction of learning representations from massive data. The
    advantages of deep convolutional structures in data processing motivated the
    applications of artificial intelligence methods in robotic problems, especially
    perception and control systems, the two typical and challenging problems in
    robotics. This paper presents a survey of the deep-learning research landscape
    in mobile robotics. We start by introducing the definition and development of
    deep-learning in related fields, especially the essential distinctions between
    image processing and robotic tasks. We describe and discuss several typical
    applications and related works in this domain, followed by the benefits of
    deep-learning and related existing frameworks. In addition, operation in
    complex dynamic environments is regarded as a critical bottleneck for mobile
    robots, for example in autonomous driving. We therefore further emphasize
    recent achievements in how deep-learning contributes to navigation and control
    systems for mobile robots. Finally, we discuss the open challenges and
    research frontiers.

    FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

    Yaman Umuroglu, Nicholas J. Fraser, Giulio Gambardella, Michaela Blott, Philip Leong, Magnus Jahre, Kees Vissers
    Comments: To appear in the 25th International Symposium on Field-Programmable Gate Arrays, February 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Hardware Architecture (cs.AR); Learning (cs.LG)

    Research has shown that convolutional neural networks contain significant
    redundancy, and high classification accuracy can be obtained even when weights
    and activations are reduced from floating point to binary values. In this
    paper, we present FINN, a framework for building fast and flexible FPGA
    accelerators using a flexible heterogeneous streaming architecture. By
    utilizing a novel set of optimizations that enable efficient mapping of
    binarized neural networks to hardware, we implement fully connected,
    convolutional and pooling layers, with per-layer compute resources being
    tailored to user-provided throughput requirements. On a ZC706 embedded FPGA
    platform drawing less than 25 W total system power, we demonstrate up to 12.3
    million image classifications per second with 0.31 µs latency on the MNIST
    dataset with 95.8% accuracy, and 21906 image classifications per second with
    283 µs latency on the CIFAR-10 and SVHN datasets with respectively 80.1%
    and 94.9% accuracy. To the best of our knowledge, ours are the fastest
    classification rates reported to date on these benchmarks.
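
    To make concrete why binary weights and activations map well onto hardware,
    the sketch below shows one common way to compute a dot product between two
    {-1, +1} vectors using only XNOR and a bit count. The abstract does not
    describe how FINN performs this arithmetic, so this formulation, the helper
    names, and the bit-packing convention are assumptions for illustration only.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int; bit i is set when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, w_bits, n):
    """Dot product of two {-1, +1} vectors of length n from their packed bits.

    XNOR marks positions where the signs agree. Each agreement contributes +1
    and each disagreement -1, so the result is 2 * agreements - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ w_bits) & mask      # XNOR restricted to n bits
    matches = bin(agree).count("1")        # population count
    return 2 * matches - n


# Illustrative activations and weights (not from the paper).
activations = [1, -1, 1, 1]
weights = [1, 1, -1, 1]
result = binary_dot(pack_bits(activations), pack_bits(weights), len(weights))
reference = sum(a * w for a, w in zip(activations, weights))
```

    The multiply-accumulate is replaced by a bitwise operation and a count,
    both of which are cheap in FPGA logic.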

    Classification and Learning-to-rank Approaches for Cross-Device Matching at CIKM Cup 2016

    Nam Khanh Tran
    Comments: CIKM Cup 2016
    Subjects: Information Retrieval (cs.IR); Learning (cs.LG)

    In this paper, we propose two methods for tackling the problem of
    cross-device matching for online advertising at CIKM Cup 2016. The first method
    considers the matching problem as a binary classification task and solves it by
    utilizing ensemble learning techniques. The second method defines the matching
    problem as a ranking task and effectively solves it using learning-to-rank
    algorithms. The results show that the proposed methods obtain promising
    results, in which the ranking-based method outperforms the classification-based
    method for the task.

    Recurrent Highway Networks with Language CNN for Image Captioning

    Jiuxiang Gu, Gang Wang, Tsuhan Chen
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    In this paper, we propose a Recurrent Highway Network with Language CNN for
    image caption generation. Our network consists of three sub-networks: the deep
    Convolutional Neural Network for image representation, the Convolutional Neural
    Network for language modeling, and the Multimodal Recurrent Highway Network for
    sequence prediction. Our proposed model can naturally exploit the hierarchical
    and temporal structure of history words, which is critical for image caption
    generation. The effectiveness of our model is validated on two datasets, MS
    COCO and Flickr30K. Our extensive experimental results show that our method is
    competitive with the state-of-the-art methods.

    Robust Learning with Kernel Mean p-Power Error Loss

    Badong Chen, Lei Xing, Xin Wang, Jing Qin, Nanning Zheng
    Comments: 11 pages, 7 figures, 10 tables
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Correntropy is a second-order statistical measure in kernel space, which has
    been successfully applied in robust learning and signal processing. In this
    paper, we define a non-second-order statistical measure in kernel space, called
    the kernel mean p-power error (KMPE), including the correntropic loss (CLoss)
    as a special case. Some basic properties of KMPE are presented. In particular,
    we apply the KMPE to extreme learning machine (ELM) and principal component
    analysis (PCA), and develop two robust learning algorithms, namely ELM-KMPE and
    PCA-KMPE. Experimental results on synthetic and benchmark data show that the
    developed algorithms can achieve consistently better performance when compared
    with some existing methods.
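
    The abstract does not state the KMPE formula. The sketch below therefore
    assumes one plausible form: the empirical mean of (1 - k(e))^(p/2), where k
    is a Gaussian kernel of the prediction error e, with CLoss taken as
    1 - mean(k(e)). The function names, the kernel choice, and the normalization
    are all assumptions. Under them, p = 2 recovers CLoss, matching the
    "special case" relationship the abstract describes.

```python
import math


def gaussian_kernel(e, sigma):
    """Gaussian kernel of a scalar error, equal to 1 at e = 0."""
    return math.exp(-(e * e) / (2.0 * sigma * sigma))


def kmpe_loss(errors, sigma=1.0, p=2.0):
    """Assumed empirical KMPE: mean of (1 - kernel(e)) raised to p/2."""
    terms = [(1.0 - gaussian_kernel(e, sigma)) ** (p / 2.0) for e in errors]
    return sum(terms) / len(terms)


def closs(errors, sigma=1.0):
    """Assumed correntropic loss: 1 minus the mean kernel value."""
    return 1.0 - sum(gaussian_kernel(e, sigma) for e in errors) / len(errors)


# Illustrative residuals with one gross outlier (not data from the paper).
errors = [0.1, -0.2, 0.05, 25.0]
loss_p2 = kmpe_loss(errors, sigma=1.0, p=2.0)
loss_p4 = kmpe_loss(errors, sigma=1.0, p=4.0)
mse = sum(e * e for e in errors) / len(errors)
```

    Each term is bounded by 1, so a single large residual cannot dominate the
    loss the way it dominates the mean squared error. That boundedness is the
    usual intuition for why kernel-based losses are robust.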

    Exploiting Rich Contents for Personalized Video Recommendation

    Xingzhong Du, Hongzhi Yin, Ling Chen, Yang Wang, Yi Yang, Xiaofang Zhou
    Subjects: Information Retrieval (cs.IR); Learning (cs.LG)

    Video recommendation has become an essential way of helping people explore
    the video world and discover the ones that may be of interest to them. However,
    mainstream collaborative filtering techniques usually suffer from limited
    performance due to the sparsity of user-video interactions, and hence are
    ineffective for new video recommendation. Although some recent recommender
    models, such as CTR and CDL, have integrated text information to boost
    performance, user-generated videos typically include scarce or low-quality text
    information, which seriously degrades performance. In this paper, we
    investigate how to leverage the non-textual content contained in videos to
    improve the quality of recommendations. We propose to first extract and encode
    the diverse audio, visual and action information that rich video content
    provides, then effectively incorporate these features with collaborative
    filtering using a collaborative embedding regression model (CER). We also study
    how to fuse multiple types of content features to further improve video
    recommendation using a novel fusion method that unifies both non-textual and
    textual features. We conducted extensive experiments on a large video dataset
    collected from multiple sources. The experimental results reveal that our
    proposed recommender model and feature fusion method outperform the
    state-of-the-art methods.

    CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

    Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Learning (cs.LG)

    When building artificial intelligence systems that can reason and answer
    questions about visual data, we need diagnostic tests to analyze our progress
    and discover shortcomings. Existing benchmarks for visual question answering
    can help, but have strong biases that models can exploit to correctly answer
    questions without reasoning. They also conflate multiple sources of error,
    making it hard to pinpoint model weaknesses. We present a diagnostic dataset
    that tests a range of visual reasoning abilities. It contains minimal biases
    and has detailed annotations describing the kind of reasoning each question
    requires. We use this dataset to analyze a variety of modern visual reasoning
    systems, providing novel insights into their abilities and limitations.

    Robust mixture of experts modeling using the skew t distribution

    Faicel Chamroukhi
    Comments: arXiv admin note: substantial text overlap with arXiv:1506.06707
    Subjects: Methodology (stat.ME); Learning (cs.LG); Machine Learning (stat.ML)

    Mixture of Experts (MoE) is a popular framework in the fields of statistics
    and machine learning for modeling heterogeneity in data for regression,
    classification and clustering. MoE for continuous data are usually based on the
    normal distribution. However, it is known that for data with asymmetric
    behavior, heavy tails and atypical observations, the use of the normal
    distribution is unsuitable. We introduce a new robust non-normal mixture of
    experts modeling using the skew t distribution. The proposed skew t mixture
    of experts, named STMoE, handles these issues of the normal mixture of experts
    regarding possibly skewed, heavy-tailed and noisy data. We develop a dedicated
    expectation conditional maximization (ECM) algorithm to estimate the model
    parameters by monotonically maximizing the observed data log-likelihood. We
    describe how the presented model can be used in prediction and in model-based
    clustering of regression data. Numerical experiments carried out on simulated
    data show the effectiveness and the robustness of the proposed model in fitting
    non-linear regression functions as well as in model-based clustering. Then, the
    proposed model is applied to real-world tone perception data for musical data
    analysis, and to temperature anomaly data for the analysis of climate
    change. The obtained results confirm the usefulness of the model for
    practical data analysis applications.


    Information Theory

    Full-Duplex MIMO Small-Cell Networks with Interference Cancellation

    Italo Atzeni, Marios Kountouris
    Comments: Submitted for possible publication
    Subjects: Information Theory (cs.IT)

    Full-duplex (FD) technology is envisaged as a key component for future mobile
    broadband networks due to its ability to boost the spectral efficiency. FD
    systems can transmit and receive simultaneously on the same frequency at the
    expense of residual self-interference and additional interference to the
    network compared with half-duplex (HD) transmission. This paper analyzes the
    performance of wireless networks with FD multi-antenna base stations (BSs) and
    HD user equipments (UEs) using stochastic geometry. Our analytical results
    quantify the success probability and the achievable spectral efficiency and
    indicate the amount of self-interference cancellation needed for beneficial FD
    operation. The advantages of multi-antenna BSs/UEs are investigated and the
    performance gains achieved by optimally balancing desired signal power increase
    and interference cancellation are derived. The proposed framework provides
    crisp insights on the system-level gains of FD mode with respect to HD mode in
    terms of network throughput, as well as useful design guidelines for the
    practical implementation of FD technology in large small-cell networks.

    Poisson Cluster Process Based Analysis of HetNets with Correlated User and Base Station Locations

    Mehrnaz Afshang, Harpreet S. Dhillon
    Subjects: Information Theory (cs.IT)

    This paper develops a comprehensive new approach to the modeling and analysis
    of HetNets that accurately incorporates correlation in the locations of users
    and base stations, which exists due to the deployment of small cell base
    stations (SBSs) at the places of high user density (termed user hotspots in
    this paper). Modeling the locations of the geographical centers of user
    hotspots as a homogeneous Poisson Point Process (PPP), we assume that the users
    and SBSs are clustered around each user hotspot center independently with two
    different distributions. The macrocell base station (BS) locations are modeled
    by an independent PPP. This model naturally captures correlation that exists
    between the locations of users and their serving SBSs. Using this model, we
    study the performance of a typical user in terms of coverage probability and
    throughput for two association policies: i) power-based association, where a
    typical user is served by the open-access BS that provides maximum averaged
    received power, and ii) distance-based association, where a typical user is
    served by its nearest open-access SBS if it is located closer than a certain
    distance threshold, and by the macro tier otherwise. After deriving all the results in
    terms of general distributions describing the locations of users and SBSs
    around the geographical center of user hotspots, we specialize the setup to the
    Thomas cluster process. A key intermediate step in this analysis is the
    derivation of distance distributions from a typical user to the open-access and
    closed-access interfering SBSs. Consistent with intuition, our analysis
    demonstrates that as the number of SBSs reusing the same resource block
    increases (higher frequency reuse), coverage probability decreases whereas
    throughput increases. Thus the same resource block can be aggressively reused
    by more SBSs as long as the coverage probability remains acceptable.
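
    The spatial model and the distance-based association rule described above can
    be sketched in a few lines. This is only an illustrative simulation, not the
    paper's analysis: the function names, the square window, the Poisson cluster
    sizes, the parameter values and the choice to treat every SBS as open-access
    are all assumptions made for the example.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw a Poisson variate with Knuth's multiplication method (small means only)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def scatter(center, sigma, mean_count, rng):
    """Place a Poisson number of points with Gaussian offsets around `center`."""
    cx, cy = center
    return [(rng.gauss(cx, sigma), rng.gauss(cy, sigma))
            for _ in range(sample_poisson(mean_count, rng))]


def simulate_hotspots(side, hotspot_density, user_sigma, sbs_sigma,
                      mean_users, mean_sbss, rng):
    """Hotspot centers form a homogeneous PPP on a side x side square; users and
    SBSs are scattered independently around each center with different spreads
    (a Thomas-cluster-style construction)."""
    n_centers = sample_poisson(hotspot_density * side * side, rng)
    centers = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n_centers)]
    users = [p for c in centers for p in scatter(c, user_sigma, mean_users, rng)]
    sbss = [p for c in centers for p in scatter(c, sbs_sigma, mean_sbss, rng)]
    return centers, users, sbss


def associate_by_distance(user, sbss, threshold):
    """Distance-based policy: nearest SBS if closer than `threshold`, else macro tier."""
    if sbss:
        nearest = min(sbss, key=lambda s: math.dist(user, s))
        if math.dist(user, nearest) < threshold:
            return ("sbs", nearest)
    return ("macro", None)


# Hypothetical parameter values, chosen only to keep the example small.
rng = random.Random(0)
centers, users, sbss = simulate_hotspots(
    side=10.0, hotspot_density=0.2, user_sigma=0.5, sbs_sigma=0.3,
    mean_users=5, mean_sbss=2, rng=rng)
tiers = [associate_by_distance(u, sbss, threshold=0.5)[0] for u in users]
print(len(centers), len(users), len(sbss), tiers.count("sbs"), tiers.count("macro"))
```

    Because users and SBSs share the same hotspot centers, their locations are
    correlated by construction, which is the effect the model is meant to capture.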

    Anti-Jamming Strategy for Distributed Microgrid Control based on Power Talk Communication

    Pietro Danzi, Marko Angjelichinoski, Čedomir Stefanović, Petar Popovski
    Subjects: Information Theory (cs.IT)

    In standard implementations of distributed secondary control for DC
    MicroGrids (MGs), the exchange of local measurements among neighboring control
    agents is enabled via off-the-shelf wireless solutions, such as IEEE 802.11.
    However, Denial of Service (DoS) attacks on the wireless interface through
    jamming prevent the secondary control system from performing its main tasks,
    which might compromise the stability of the MG. In this paper, we propose a
    novel, robust and secure secondary control reconfiguration strategy, tailored
    to counteract DoS attacks. Specifically, upon detecting the impairment of the
    wireless interface, the jammed secondary control agent notifies its peers via a
    secure, low-rate powerline channel based on Power Talk communication. This
    triggers reconfiguration of the wireless communication graph through primary
    control mode switching, where the jammed agents leave the secondary control by
    switching to current source mode, and are replaced by nonjammed current sources
    that switch to voltage source mode and join the secondary control. The strategy
    fits within the software-defined networking framework, where the network
    control is split from the data plane using a reliable and secure side power talk
    communication channel, created via software modification of the MG primary
    control loops. The simulation results illustrate the feasibility of the
    solution and show that the MG resilience and performance can indeed be
    improved via software-defined networking approaches.

    Secure and Robust Authentication for DC MicroGrids based on Power Talk Communication

    Marko Angjelichinoski, Pietro Danzi, Čedomir Stefanović, Petar Popovski
    Subjects: Information Theory (cs.IT)

    We propose a novel framework for secure and reliable authentication of
    Distributed Energy Resources to the centralized secondary/tertiary control
    system of a DC MicroGrid (MG), networked using the IEEE 802.11 wireless
    interface. The key idea is to perform the authentication using power talk,
    which is a powerline communication technique executed by the primary control
    loops of the power electronic converters, without the use of dedicated
    modem hardware. In addition, the scheme promotes direct and active
    participation of the control system in the authentication process, a feature
    not commonly encountered in current networked control systems for MicroGrids.
    PLECS-based simulations verify the viability of our scheme.

    Network coding and spherical buildings

    Dirk Liebhold, Gabriele Nebe, Angeles Vazquez-Castro
    Subjects: Information Theory (cs.IT)

    We develop a network coding technique based on flags of subspaces and a
    corresponding network channel model. To define error correcting codes we
    introduce a new distance on the flag variety, the Grassmann distance on flags,
    and compare it to the commonly used gallery distance for full flags.

    Rate-storage regions for Massive Random Access

    Elsa Dupraz, Thomas Maugey, Aline Roumy, Michel Kieffer
    Comments: Submitted to IEEE Transactions on Information Theory
    Subjects: Information Theory (cs.IT)

    This paper introduces a new source coding paradigm called Massive Random
    Access (MRA). In MRA, a set of correlated sources is jointly encoded and stored
    on a server, and clients want to access only a subset of the sources. Since
    the number of simultaneous clients can be huge, the server is only authorized
    to extract a bitstream from the stored data: no re-encoding can be performed
    before the transmission of the specific client’s request. In this paper, we
    formally define the MRA framework and we introduce the notion of rate-storage
    region to characterize the performance of MRA. From an information theoretic
    analysis, we derive achievable rate-storage bounds for lossless source coding
    of i.i.d. and non-i.i.d. sources, and rate-storage distortion regions for
    Gaussian sources. We also show two practical implementations of MRA systems
    based on rate-compatible LDPC codes. Both the theoretical and the experimental
    results demonstrate that MRA systems can reach the same transmission rates as
    in traditional point-to-point source coding schemes, while having a reasonable
    storage cost overhead. These results constitute a breakthrough for many recent
    data transmission applications in which only a part of the data is requested by
    the clients.

    New Convolutional Codes Derived from Algebraic Geometry Codes

    Francisco Revson F. Pereira, Giuliano G. La Guardia, Francisco M. de Assis
    Comments: 14 pages, 2 tables
    Subjects: Information Theory (cs.IT); Algebraic Geometry (math.AG)

    In this paper, we construct new families of convolutional codes. Such codes
    are obtained by means of algebraic geometry codes. Additionally, more families
    of convolutional codes are constructed by means of puncturing, extending,
    expanding and by the direct product code construction applied to algebraic
    geometry codes. The parameters of the new convolutional codes are better than
    or comparable to the ones available in the literature. In particular, a family of
    almost near MDS codes is presented.

    Performance of group testing algorithms with constant tests-per-item

    Oliver Johnson, Matthew Aldridge, Jonathan Scarlett
    Subjects: Information Theory (cs.IT); Probability (math.PR)

    We consider the nonadaptive group testing problem in the case that each item
    appears in a constant number of tests, chosen uniformly at random with
    replacement, so that the testing matrix has (almost) constant column weights.
    We analyse the performance of simple and practical algorithms in a range of
    sparsity regimes, showing that the performance is consistently improved in
    comparison with more standard Bernoulli designs. In particular, using a
    constant column weight design, the DD algorithm is shown to outperform all
    possible algorithms for Bernoulli designs in a broad range of sparsity regimes,
    and to beat the best-known theoretical guarantees of existing practical
    algorithms in all sparsity regimes.
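
    As a rough illustration of the setup, the sketch below builds a design in
    which each item joins a fixed number of tests drawn uniformly at random with
    replacement, then decodes with a definite defectives (DD) style procedure.
    The abstract does not spell out the DD steps, so the decoding rule used here
    (the commonly described noiseless version) is an assumption, as are all
    function names and parameter values.

```python
import random


def constant_tests_per_item_design(n_items, n_tests, tests_per_item, rng):
    """For each item, draw `tests_per_item` tests uniformly with replacement.
    Repeated draws collapse, so column weights are only almost constant."""
    return [{rng.randrange(n_tests) for _ in range(tests_per_item)}
            for _ in range(n_items)]


def run_tests(design, defectives, n_tests):
    """Noiseless outcomes: a test is positive iff it contains a defective item."""
    outcomes = [False] * n_tests
    for item in defectives:
        for t in design[item]:
            outcomes[t] = True
    return outcomes


def dd_decode(design, outcomes):
    """DD-style decoding (assumed form).
    Step 1: any item appearing in a negative test is definitely non-defective.
    Step 2: a positive test containing exactly one remaining item pins it down.
    Step 3: everything else is declared non-defective."""
    possible = {i for i, tests in enumerate(design)
                if all(outcomes[t] for t in tests)}
    members = [[] for _ in outcomes]
    for i in possible:
        for t in design[i]:
            members[t].append(i)
    return {m[0] for t, m in enumerate(members) if outcomes[t] and len(m) == 1}


# Hypothetical problem size, chosen only for illustration.
rng = random.Random(0)
n_items, n_tests, tests_per_item = 200, 60, 5
defectives = set(rng.sample(range(n_items), 5))
design = constant_tests_per_item_design(n_items, n_tests, tests_per_item, rng)
outcomes = run_tests(design, defectives, n_tests)
estimate = dd_decode(design, outcomes)
print(sorted(defectives), sorted(estimate))
```

    In the noiseless setting this decoder never produces a false positive: every
    item it declares defective is the only candidate left in some positive test.
    It can, however, miss defective items when too few tests are used.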

    Conceptual Proposal: Frequency Offset Modulation for High-Efficiency Communications

    Xihua Zou, Wei Pan, Ge Yu, Bin Luo, Lianshan Yan
    Comments: 5 pages, 6 figures
    Subjects: Information Theory (cs.IT)

    Frequency offset modulation (FOM) is proposed as a new concept to provide
    both high energy efficiency and high spectral efficiency for communications. In
    the FOM system, an array of transmitters (TXs) is deployed and only one TX is
    activated for data transmission at any signaling time instant. The TX index,
    distinguished by a very slight frequency offset within the entire occupied
    bandwidth, is exploited to implicitly convey a bit unit without any power or
    signal radiation, saving power and spectral resources. Moreover, FOM
    removes the stringent requirements on distinguishable spatial
    channels and perfect a priori channel knowledge, while retaining the advantages
    of no inter-channel interference and no need for inter-antenna synchronization.
    In addition, a hybrid solution integrating the FOM and the spatial modulation
    is discussed to further improve the energy efficiency and spectral efficiency.
    Consequently, the FOM will be an enabling and green solution to support
    ever-increasing high-capacity data traffic in a variety of interdisciplinary
    fields.

    Private Information Retrieval in Distributed Storage Systems Using an Arbitrary Linear Code

    Siddhartha Kumar, Eirik Rosnes, Alexandre Graell i Amat
    Comments: Submitted to the 2017 IEEE International Symposium on Information Theory
    Subjects: Information Theory (cs.IT)

    We propose an information-theoretic private information retrieval (PIR)
    scheme for distributed storage systems where data is stored using a linear
    systematic code of rate \(R > 1/2\). The proposed scheme generalizes the PIR
    scheme for data stored using maximum distance separable codes recently proposed
    by Tajeddine and El Rouayheb for the scenario of a single spy node. We further
    propose an algorithm to optimize the communication price of privacy (cPoP)
    using the structure of the underlying linear code. As an example, we apply the
    proposed algorithm to several distributed storage codes, showing that the cPoP
    can be significantly reduced by exploiting the structure of the distributed
    storage code.

    A class of three-weight linear codes

    Gaopeng Jian, Rongquan Feng
    Subjects: Information Theory (cs.IT)

    Since Ding et al. proposed a general method for constructing linear codes
    from defining sets, researchers have obtained a large number of linear
    codes with few weights by choosing appropriate defining sets. Let
    \(\mathbb{F}_q\) be a finite field with \(q=p^m\) elements, where \(p\) is an odd
    prime and \(m\) is a positive integer. Let \(\text{Tr}\) denote the trace function
    from \(\mathbb{F}_q\) to \(\mathbb{F}_p\) and
    \(D=\{(x,y) \in \mathbb{F}_q^2 \backslash \{(0,0)\}: \text{Tr}(x+y^{p^k+1})=0\}\),
    where \(k\) is a positive integer. We define a \(p\)-ary linear code \(C_D\) by
    \[ C_D=\{c(a,b)=(\text{Tr}(ax+by))_{(x,y) \in D}: a,b \in \mathbb{F}_q\}. \]
    In this paper, we use Weil sums to investigate the weight distribution of \(C_D\).
    We show that the code has three nonzero weights and that it can be used to
    construct secret sharing schemes.

    Joint Transceiver and Offset Design for Visible Light Communications with Input-dependent Shot Noise

    Qian Gao, Chen Gong, Zhengyuan Xu
    Comments: This work was submitted to the Transactions on Wireless Communications on Feb. 16, 2016 and is currently under review. An abridged version of this manuscript was accepted by IEEE Globecom 2016
    Subjects: Information Theory (cs.IT)

    In this paper, we investigate the problem of the joint transceiver and offset
    design (JTOD) for point-to-point multiple-input-multiple-output (MIMO) and
    multi-user multiple-input-single-output (MU-MISO) visible light
    communication (VLC) systems. Both uplink and downlink multi-user scenarios are
    considered. The shot noise induced by the incoming signals is considered,
    leading to a more realistic MIMO VLC channel model. Under key lighting
    constraints, we formulate non-convex optimization problems aiming at minimizing
    the sum mean squared error. To optimize the transceiver and the offset jointly,
    we resort to a gradient-projection-based procedure. When only imperfect
    channel state information is available, a semidefinite programming (SDP) based
    scheme is proposed to obtain a robust transceiver and offset. The proposed method
    is shown to non-trivially outperform the conventional scaled zero forcing (ZF)
    and singular value decomposition (SVD) based equalization methods. The robust
    scheme works particularly well when the signal is much stronger than the noise.

    On the Design of Secure Non-Orthogonal Multiple Access Systems

    Biao He, An Liu, Nan Yang, Vincent K. N. Lau
    Subjects: Information Theory (cs.IT)

    This paper proposes a new design of non-orthogonal multiple access (NOMA)
    under secrecy considerations. We focus on a NOMA system where a transmitter
    sends confidential messages to multiple users in the presence of an external
    eavesdropper. The optimal designs of decoding order, transmission rates, and
    power allocated to each user are investigated. Considering the practical
    passive eavesdropping scenario where the instantaneous channel state of the
    eavesdropper is unknown, we adopt the secrecy outage probability as the secrecy
    metric. We first consider the problem of minimizing the transmit power subject
    to the secrecy outage and quality of service constraints, and derive the
    closed-form solution to this problem. We then explore the problem of maximizing
    the minimum confidential information rate among users subject to the secrecy
    outage and transmit power constraints, and provide an iterative algorithm to
    solve this problem. We find that the secrecy outage constraint in the studied
    problems does not change the optimal decoding order for NOMA, and one should
    increase the power allocated to the user whose channel is relatively bad when
    the secrecy constraint becomes more stringent. Finally, we show the advantage
    of NOMA over orthogonal multiple access in the studied problems both
    analytically and numerically.

    Feedback Does Not Increase the Capacity of Compound Channels with Additive Noise

    Sergey Loyka, Charalambos D. Charalambous
    Comments: submitted to IEEE Info. Theory Transactions
    Subjects: Information Theory (cs.IT)

    A discrete compound channel with memory is considered, where no stationarity,
    ergodicity or information stability is required, and where the uncertainty set
    can be arbitrary. When the discrete noise is additive but otherwise arbitrary
    and there is no cost constraint on the input, it is shown that causal
    feedback does not increase the capacity. This extends the earlier result
    obtained for general single-state channels with full transmitter (Tx) channel
    state information (CSI) to the compound setting. It is further shown that, for
    this compound setting and under a mild technical condition on the additive
    noise, the addition of the full Tx CSI does not increase the capacity either,
    so that the worst-case and compound channel capacities are the same. This can
    also be expressed as a saddle-point in the information-theoretic game between
    the transmitter (who selects the input distribution) and nature (who
    selects the channel state), even though the objective function (the
    inf-information rate) is not convex/concave in the right way. Cases where the
    Tx CSI does increase the capacity are identified.

    Conditions under which the strong converse holds for this channel are
    studied. The ergodic behaviour of the worst-case noise in an otherwise
    information-unstable channel is shown to be both sufficient and necessary for
    the strong converse to hold, including feedback and no feedback cases.

    Good and asymptotically good quantum codes derived from algebraic geometry codes

    Giuliano Gadioli La Guardia (corresponding author), Francisco Revson F. Pereira
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    In this paper we construct several new families of quantum codes with good
    and asymptotically good parameters. These new quantum codes are derived from
    (classical) algebraic geometry (AG) codes by applying the
    Calderbank-Shor-Steane (CSS) construction. Many of these codes have large
    minimum distances compared with their code lengths and relatively
    small Singleton defects. For example, we construct a family
    \([[46; 2(t_2 - t_1); d]]_{25}\) of quantum codes, where \(t_1, t_2\) are positive
    integers such that \(1 < t_1 < t_2 < 23\) and \(d \geq \min\{46 - 2t_2, 2t_1 - 2\}\),
    of length \(n = 46\), with minimum distance in the range \(2 \leq d \leq 20\),
    having Singleton defect four. Additionally,
    by utilizing \(t\)-point AG codes, with \(t \geq 2\), we show how to obtain sequences of
    asymptotically good quantum codes.
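
    The parameters quoted for the length-46 family can be checked with a short
    enumeration. The sketch below assumes the usual quantum Singleton defect,
    n - k - 2d + 2, and plugs in the stated lower bound on d, so the value it
    reports is an upper bound on the defect rather than the exact figure. The
    function names are hypothetical.

```python
def singleton_defect(n, k, d):
    """Quantum Singleton defect n - k - 2d + 2 of an [[n, k, d]] code (assumed definition)."""
    return n - k - 2 * d + 2


def family_parameters(n=46, upper=23):
    """List (t1, t2, k, d_lower, defect_upper) for all integers 1 < t1 < t2 < upper."""
    rows = []
    for t1 in range(2, upper):
        for t2 in range(t1 + 1, upper):
            k = 2 * (t2 - t1)                      # dimension 2(t_2 - t_1)
            d_lower = min(n - 2 * t2, 2 * t1 - 2)  # stated lower bound on d
            rows.append((t1, t2, k, d_lower, singleton_defect(n, k, d_lower)))
    return rows


rows = family_parameters()
# Pairs where both terms of the minimum coincide (t1 + t2 = 24).
balanced = [r for r in rows if r[0] + r[1] == 24]
for t1, t2, k, d_lower, defect in balanced:
    print(f"t1={t1:2d} t2={t2:2d} k={k:2d} d>={d_lower:2d} defect<={defect}")
```

    Under these assumptions the pairs with t_1 + t_2 = 24 give a defect bound of
    four with the distance bound running from 2 to 20, which agrees with the
    figures quoted in the abstract.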

    The Global Dynamical Complexity of the Human Brain Network

    Xerxes D. Arsiwalla, Paul Verschure
    Comments: 16 pages, 6 figures
    Subjects: Neurons and Cognition (q-bio.NC); Information Theory (cs.IT); Dynamical Systems (math.DS); Biological Physics (physics.bio-ph)

    How much information do large brain networks integrate as a whole over the
    sum of their parts? Can the dynamical complexity of such networks be globally
    quantified in an information-theoretic way and be meaningfully coupled to brain
    function? Recently, measures of dynamical complexity such as integrated
    information have been proposed. However, problems related to normalization
    and to the Bell number of partitions associated with these measures make these
    approaches computationally infeasible for large-scale brain networks. Our goal
    in this work is to address this problem. Our formulation of network integrated
    information is based on the Kullback-Leibler divergence between the
    multivariate distribution on the set of network states versus the corresponding
    factorized distribution over its parts. We find that implementing the maximum
    information partition optimizes computations. These methods are well-suited for
    large networks with linear stochastic dynamics. We compute the integrated
    information for both the system’s attractor states and the non-stationary
    dynamical states of the network. We then apply this formalism to brain networks
    to compute the integrated information for the human brain’s connectome.
    Compared to a randomly re-wired network, we find that the specific topology of
    the brain generates greater information complexity.
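
    The core quantity, a Kullback-Leibler divergence between a joint distribution
    over network states and the corresponding factorized distribution over parts,
    can be illustrated on a toy discrete example. This is not the paper's
    measure: it ignores dynamics and the choice of partition, and simply
    factorizes a static joint distribution into single nodes. All names and the
    example distributions are assumptions.

```python
import math
from itertools import product


def marginals(joint, n_parts):
    """Marginal distribution of each part from a joint dict {state_tuple: prob}."""
    margs = [dict() for _ in range(n_parts)]
    for state, p in joint.items():
        for i, s in enumerate(state):
            margs[i][s] = margs[i].get(s, 0.0) + p
    return margs


def kl_joint_vs_factorized(joint):
    """KL divergence (in bits) between `joint` and the product of its marginals."""
    n_parts = len(next(iter(joint)))
    margs = marginals(joint, n_parts)
    total = 0.0
    for state, p in joint.items():
        if p > 0.0:
            q = math.prod(margs[i][s] for i, s in enumerate(state))
            total += p * math.log2(p / q)
    return total


# Two binary nodes that always agree: the whole carries more than its parts.
coupled = {(0, 0): 0.5, (1, 1): 0.5}
# Two independent fair bits: the joint already equals the factorized form.
independent = {state: 0.25 for state in product((0, 1), repeat=2)}

print(kl_joint_vs_factorized(coupled))
print(kl_joint_vs_factorized(independent))
```

    The divergence is zero exactly when the joint distribution factorizes, and
    grows as the parts become more statistically dependent.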



