
    arXiv Paper Daily: Wed, 17 May 2017

    Published by 我爱机器学习 (52ml.net) on 2017-05-17 00:00:00

    Neural and Evolutionary Computing

    Ensemble of heterogeneous flexible neural trees using multiobjective genetic programming

    Varun Kumar Ojha, Ajith Abraham, Václav Snášel
    Journal-ref: Applied Soft Computing, 2017, Volume 52 Pages 909 to 924
    Subjects: Neural and Evolutionary Computing (cs.NE)

    Machine learning algorithms are inherently multiobjective in nature, where
    approximation error minimization and model’s complexity simplification are two
    conflicting objectives. We proposed a multiobjective genetic programming (MOGP)
    approach for creating a heterogeneous flexible neural tree (HFNT), a tree-like
    flexible feedforward neural network model. Functional heterogeneity in the
    neural tree nodes was introduced to capture better insight into the data during
    learning, because each input in a dataset possesses different features. MOGP guided an
    initial HFNT population towards Pareto-optimal solutions, where the final
    population was used for making an ensemble system. A diversity index measure
    along with approximation error and complexity was introduced to maintain
    diversity among the candidates in the population. Hence, the ensemble was
    created by using accurate, structurally simple, and diverse candidates from
    MOGP final population. A differential evolution algorithm was applied to
    fine-tune the underlying parameters of the selected candidates. A comprehensive
    test over classification, regression, and time-series datasets proved the
    efficiency of the proposed algorithm over other available prediction methods.
    Moreover, the heterogeneous construction of HFNT proved to be efficient in
    building an ensemble system from the final population.

    Metaheuristic Design of Feedforward Neural Networks: A Review of Two Decades of Research

    Varun Kumar Ojha, Ajith Abraham, Václav Snášel
    Journal-ref: Engineering Applications of Artificial Intelligence Volume 60,
    April 2017, Pages 97 to 116
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Over the past two decades, feedforward neural network (FNN) optimization
    has been a key interest among researchers and practitioners of multiple
    disciplines. FNN optimization is often viewed from various perspectives: the
    optimization of weights, network architecture, activation nodes, learning
    parameters, learning environment, etc. Researchers adopted such different
    viewpoints mainly to improve the FNN's generalization ability. Gradient-descent
    algorithms such as backpropagation have been widely applied to optimize FNNs,
    and their success is evident from the FNN's application to numerous real-world
    problems. However, due to the limitations of gradient-based optimization
    methods, metaheuristic algorithms, including evolutionary algorithms, swarm
    intelligence, etc., are still being widely explored by researchers aiming to
    obtain a well-generalized FNN for a given problem. This article attempts to
    summarize a broad spectrum of FNN optimization methodologies, including
    conventional and metaheuristic approaches. It also tries to connect the various
    research directions that have emerged out of FNN optimization practice, such as
    evolving neural networks (NN), cooperative coevolution NN, complex-valued NN,
    deep learning, extreme learning machines, quantum NN, etc. Additionally, it
    provides interesting research challenges for future research to cope with the
    present information processing era.

    The power of deeper networks for expressing natural functions

    David Rolnick (MIT), Max Tegmark (MIT)
    Comments: 9 pages, 2 figs
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    It is well-known that neural networks are universal approximators, but that
    deeper networks tend to be much more efficient than shallow ones. We shed light
    on this by proving that the total number of neurons m required to approximate
    natural classes of multivariate polynomials of n variables grows only
    linearly with n for deep neural networks, but grows exponentially when merely
    a single hidden layer is allowed. We also provide evidence that when the number
    of hidden layers is increased from 1 to k, the neuron requirement grows
    exponentially not with n but with n^{1/k}, suggesting that the minimum
    number of layers required for computational tractability grows only
    logarithmically with n.
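
    Summarizing the scaling claims above in our own notation (not the paper's): for
    the stated natural classes of multivariate polynomials in n variables,

        m_{\text{1 hidden layer}}(n) = e^{\Theta(n)}, \qquad
        m_{\text{deep}}(n) = \Theta(n), \qquad
        m_{k\ \text{hidden layers}}(n) = e^{\Theta(n^{1/k})},

    so keeping the neuron count polynomial in n suggests a depth growing only like
    \log n.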

    NeuroNER: an easy-to-use program for named-entity recognition based on neural networks

    Franck Dernoncourt, Ji Young Lee, Peter Szolovits
    Comments: The first two authors contributed equally to this work
    Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Named-entity recognition (NER) aims at identifying entities of interest in a
    text. Artificial neural networks (ANNs) have recently been shown to outperform
    existing NER systems. However, ANNs remain challenging to use for non-expert
    users. In this paper, we present NeuroNER, an easy-to-use named-entity
    recognition tool based on ANNs. Users can annotate entities using a graphical
    web-based user interface (BRAT): the annotations are then used to train an ANN,
    which in turn predicts entities' locations and categories in new texts. NeuroNER
    makes this annotation-training-prediction flow smooth and accessible to anyone.

    Sparse Coding by Spiking Neural Networks: Convergence Theory and Computational Results

    Ping Tak Peter Tang, Tsung-Han Lin, Mike Davies
    Comments: 13 pages, 3 figures
    Subjects: Learning (cs.LG); Numerical Analysis (cs.NA); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)

    In a spiking neural network (SNN), individual neurons operate autonomously
    and only communicate with other neurons sparingly and asynchronously via spike
    signals. These characteristics render a massively parallel hardware
    implementation of SNNs a potentially powerful computer, albeit a non-von-Neumann
    one. But can one guarantee that an SNN computer solves some important problems
    reliably? In this paper, we formulate a mathematical model of one SNN that can
    be configured for a sparse coding problem for feature extraction. With a
    moderate but well-defined assumption, we prove that the SNN indeed solves
    sparse coding. To the best of our knowledge, this is the first rigorous result
    of this kind.
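
    For reference, the sparse coding problem that such a network can be configured
    to solve typically has the LASSO form (notation ours; the dictionary D, input x,
    and sparsity weight \lambda are not spelled out in the abstract):

        \min_{a} \; \tfrac{1}{2}\,\lVert x - D a \rVert_2^2 + \lambda \lVert a \rVert_1,

    with the spiking dynamics intended to converge to the minimizing sparse code a.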


    Computer Vision and Pattern Recognition

    The Incremental Multiresolution Matrix Factorization Algorithm

    Vamsi K. Ithapu, Risi Kondor, Sterling C. Johnson, Vikas Singh
    Comments: Computer Vision and Pattern Recognition (CVPR) 2017, 10 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (cs.NA); Machine Learning (stat.ML)

    Multiresolution analysis and matrix factorization are foundational tools in
    computer vision. In this work, we study the interface between these two
    distinct topics and obtain techniques to uncover hierarchical block structure
    in symmetric matrices — an important aspect in the success of many vision
    problems. Our new algorithm, the incremental multiresolution matrix
    factorization, uncovers such structure one feature at a time, and hence scales
    well to large matrices. We describe how this multiscale analysis goes much
    farther than what a direct global factorization of the data can identify. We
    evaluate the efficacy of the resulting factorizations for relative leveraging
    within regression tasks using medical imaging data. We also use the
    factorization on representations learned by popular deep networks, providing
    evidence of their ability to infer semantic relationships even when they are
    not explicitly trained to do so. We show that this algorithm can be used as an
    exploratory tool to improve the network architecture, and within numerous other
    settings in vision.

    Learning Features for Offline Handwritten Signature Verification using Deep Convolutional Neural Networks

    Luiz G. Hafemann, Robert Sabourin, Luiz S. Oliveira
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Verifying the identity of a person using handwritten signatures is
    challenging in the presence of skilled forgeries, where a forger has access to
    a person's signature and deliberately attempts to imitate it. In offline
    (static) signature verification, the dynamic information of the signature
    writing process is lost, and it is difficult to design good feature extractors
    that can distinguish genuine signatures and skilled forgeries. This is reflected
    in relatively poor performance, with verification errors around 7% in the best
    systems in the literature. To address both the difficulty of obtaining good
    features and the need to improve system performance, we propose learning the
    representations from signature images, in a Writer-Independent format, using
    Convolutional Neural Networks. In particular, we propose a novel formulation of
    the problem that includes knowledge of skilled forgeries from a subset of users
    in the feature learning process, which aims to capture visual cues that
    distinguish genuine signatures and forgeries regardless of the user. Extensive
    experiments were conducted on four datasets: GPDS, MCYT, CEDAR, and the Brazilian
    PUC-PR dataset. On GPDS-160, we obtained a large improvement in
    state-of-the-art performance, achieving 1.72% Equal Error Rate, compared to
    6.97% in the literature. We also verified that the features generalize beyond
    the GPDS dataset, surpassing the state-of-the-art performance in the other
    datasets, without requiring the representation to be fine-tuned to each
    particular dataset.

    Volumetric Super-Resolution of Multispectral Data

    Vildan Atalay Aydin, Hassan Foroosh
    Comments: arXiv admin note: text overlap with arXiv:1705.01258
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Most multispectral remote sensors (e.g. QuickBird, IKONOS, and Landsat 7
    ETM+) provide low-spatial high-spectral resolution multispectral (MS) or
    high-spatial low-spectral resolution panchromatic (PAN) images, separately. In
    order to reconstruct a high-spatial/high-spectral resolution multispectral
    image volume, either the information in MS and PAN images is fused (i.e.
    pansharpening) or super-resolution reconstruction (SRR) is used with only MS
    images captured on different dates. Existing methods do not utilize temporal
    information of MS and high spatial resolution of PAN images together to improve
    the resolution. In this paper, we propose a multiframe SRR algorithm using
    pansharpened MS images, taking advantage of both temporal and spatial
    information available in multispectral imagery, in order to exceed spatial
    resolution of given PAN images. We first apply pansharpening to a set of
    multispectral images and their corresponding PAN images captured on different
    dates. Then, we use the pansharpened multispectral images as input to the
    proposed wavelet-based multiframe SRR method to yield full volumetric SRR. The
    proposed SRR method is obtained by deriving the subband relations between
    multitemporal MS volumes. We demonstrate the results on Landsat 7 ETM+ images
    comparing our method to conventional techniques.

    Motion-Compensated Temporal Filtering for Critically-Sampled Wavelet-Encoded Images

    Vildan Atalay Aydin, Hassan Foroosh
    Comments: arXiv admin note: substantial text overlap with arXiv:1705.04433, arXiv:1705.04641
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a novel motion estimation/compensation (ME/MC) method for
    wavelet-based (in-band) motion compensated temporal filtering (MCTF), with
    application to low-bitrate video coding. Unlike the conventional in-band MCTF
    algorithms, which require redundancy to overcome the shift-variance problem of
    critically sampled (complete) discrete wavelet transforms (DWT), we perform
    ME/MC steps directly on DWT coefficients by avoiding the need of
    shift-invariance. We omit upsampling, the inverse-DWT (IDWT), and the
    calculation of redundant DWT coefficients, while achieving arbitrary subpixel
    accuracy without interpolation, and high video quality even at very
    low-bitrates, by deriving the exact relationships between DWT subbands of input
    image sequences. Experimental results demonstrate the accuracy of the proposed
    method, confirming that our model for ME/MC effectively improves video coding
    quality.

    Active Control of Camera Parameters for Object Detection Algorithms

    Yulong Wu, John Tsotsos
    Comments: 7 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    For a vision-guided robot, camera parameters not only play an important role in
    determining the visual quality of perceived images, but also affect the
    performance of vision algorithms. By quantitatively evaluating four object
    detection algorithms with respect to varying ambient illumination, shutter
    speed, and voltage gain, we observe that the performance of the algorithms
    is highly dependent on these variables. From this observation, a novel method
    for active control of camera parameters is proposed to make robot vision more
    robust under different light conditions. Experimental results demonstrate the
    effectiveness of our proposed approach, which improves the performance of
    object detection algorithms, compared with the conventional auto-exposure
    algorithm.

    Learning Image Relations with Contrast Association Networks

    Yao Lu, Zhirong Yang, Juho Kannala, Samuel Kaski
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Inferring the relations between two images is an important class of tasks in
    computer vision. Examples of such tasks include computing optical flow and
    stereo disparity. We treat the relation inference tasks as a machine learning
    problem and tackle it with neural networks. A key to the problem is learning a
    representation of relations. We propose a new neural network module, contrast
    association unit (CAU), which explicitly models the relations between two sets
    of input variables. Due to the non-negativity of the weights in CAU, we adopt a
    multiplicative update algorithm for learning these weights. Experiments show
    that neural networks with CAUs are more effective in learning five fundamental
    image transformations than conventional neural networks.
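
    The abstract does not give the update rule itself; multiplicative updates for
    non-negative parameters generally follow the NMF pattern of scaling each weight
    by the ratio of the negative to the positive part of its gradient, which keeps
    the weight non-negative. A minimal NumPy sketch of that generic pattern (an
    illustration of the technique, not the CAU learning rule):

        import numpy as np

        def multiplicative_update(W, grad_pos, grad_neg, eps=1e-12):
            """Generic multiplicative update for non-negative weights.

            W        : non-negative weight matrix
            grad_pos : positive part of the gradient of the loss w.r.t. W
            grad_neg : magnitude of the negative part of the same gradient
            The update W <- W * grad_neg / grad_pos keeps W non-negative and
            leaves W unchanged wherever the two parts balance (zero gradient).
            """
            return W * (grad_neg / (grad_pos + eps))

        # Example: non-negative least squares  min_W ||X - W @ H||^2  (NMF-style),
        # whose gradient W @ H @ H.T - X @ H.T splits into the two parts used above.
        rng = np.random.default_rng(0)
        X = np.abs(rng.normal(size=(20, 30)))
        H = np.abs(rng.normal(size=(5, 30)))
        W = np.abs(rng.normal(size=(20, 5)))
        for _ in range(100):
            W = multiplicative_update(W, W @ H @ H.T, X @ H.T)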

    WebVision Challenge: Visual Learning and Understanding With Web Data

    Wen Li, Limin Wang, Wei Li, Eirikur Agustsson, Jesse Berent, Abhinav Gupta, Rahul Sukthankar, Luc Van Gool
    Comments: project page: this http URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present the 2017 WebVision Challenge, a public image recognition challenge
    designed for deep learning based on web images without instance-level human
    annotation. Following the spirit of previous vision challenges, such as ILSVRC,
    Places2 and PASCAL VOC, which have played critical roles in the development of
    computer vision by contributing to the community with large scale annotated
    data for model designing and standardized benchmarking, we contribute with this
    challenge a large scale web images dataset, and a public competition with a
    workshop co-located with CVPR 2017. The WebVision dataset contains more than
    2.4 million web images crawled from the Internet by using queries generated
    from the 1,000 semantic concepts of the benchmark ILSVRC 2012 dataset. Meta
    information is also included. A validation set and test set containing human
    annotated images are also provided to facilitate algorithmic development. The
    2017 WebVision challenge consists of two tracks, the image classification task
    on WebVision test set, and the transfer learning task on PASCAL VOC 2012
    dataset. In this paper, we describe the details of data collection and
    annotation, highlight the characteristics of the dataset, and introduce the
    evaluation metrics.

    Picasso: A Neural Network Visualizer

    Ryan Henderson, Rasmus Rothe
    Comments: 9 pages; submission to the Journal of Open Research Software; github.com/merantix/picasso
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

    Picasso is a free open-source (Eclipse Public License) web application
    written in Python for rendering standard visualizations useful for training
    convolutional neural networks. Picasso ships with occlusion maps and saliency
    maps, two visualizations which help reveal issues that evaluation metrics like
    loss and accuracy might hide: for example, learning a proxy classification
    task. Picasso works with the Keras and Tensorflow deep learning frameworks.
    Picasso can be used with minimal configuration by deep learning researchers and
    engineers alike across various neural network architectures. Adding new
    visualizations is simple: the user can specify their visualization code and
    HTML template separately from the application code.

    Research on Bi-mode Biometrics Based on Deep Learning

    Hao Jiang
    Comments: in Chinese
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Since biological characteristics are excellent, inherently distinctive
    identifiers, biometric identification technology is involved in almost every
    area where people must be distinguished. Fingerprints, iris, face, voice-print,
    and other biological features have been widely used in fields such as criminal
    detection by public security departments, mobile device unlocking, and target
    tracking. As electronic devices are used ever more widely and frequently, only
    biometric identification technology with an excellent recognition rate can
    guarantee the long-term development of these fields.

    IAN: The Individual Aggregation Network for Person Search

    Jimin Xiao, Yanchun Xie, Tammam Tillo, Kaizhu Huang, Yunchao Wei, Jiashi Feng
    Comments: 10 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Person search in real-world scenarios is a new and challenging computer vision
    task with many meaningful applications. The challenge of this task mainly comes
    from two sources: (1) bounding boxes for pedestrians are unavailable, so the model
    needs to search for the person over whole gallery images; (2) huge variance in
    visual appearance of a particular person owing to varying poses, lighting
    conditions, and occlusions. To address these two critical issues in modern
    person search applications, we propose a novel Individual Aggregation Network
    (IAN) that can accurately localize persons by learning to minimize intra-person
    feature variations. IAN is built upon the state-of-the-art object detection
    framework, i.e., faster R-CNN, so that high-quality region proposals for
    pedestrians can be produced in an online manner. In addition, to relieve the
    negative effect caused by varying visual appearances of the same individual,
    IAN introduces a novel center loss that can increase the intra-class
    compactness of feature representations. The engaged center loss encourages
    persons with the same identity to have similar feature characteristics.
    Extensive experimental results on two benchmarks, i.e., CUHK-SYSU and PRW, well
    demonstrate the superiority of the proposed model. In particular, IAN achieves
    77.23% mAP and 80.45% top-1 accuracy on CUHK-SYSU, which outperform the
    state-of-the-art by 1.7% and 1.85%, respectively.
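
    For context, the standard form of a center loss, which pulls the feature x_i of
    each sample toward the running center c_{y_i} of its identity, is (generic
    formulation; the exact variant used by IAN may differ):

        L_C = \tfrac{1}{2} \sum_{i=1}^{m} \lVert x_i - c_{y_i} \rVert_2^2,

    typically added to the detection/classification objective with a small weight.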

    Intel RealSense Stereoscopic Depth Cameras

    Leonid Keselman, John Iselin Woodfill, Anders Grunnet-Jepsen, Achintya Bhowmik
    Comments: Accepted to CCD 2017, a CVPR 2017 Workshop
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a comprehensive overview of the stereoscopic Intel RealSense RGBD
    imaging systems. We discuss these systems’ mode-of-operation, functional
    behavior and include models of their expected performance, shortcomings, and
    limitations. We provide information about the systems’ optical characteristics,
    their correlation algorithms, and how these properties can affect different
    applications, including 3D reconstruction and gesture recognition. Our
    discussion covers the Intel RealSense R200 and RS400.

    Cooperative Learning with Visual Attributes

    Tanmay Batra, Devi Parikh
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Learning paradigms involving varying levels of supervision have received a
    lot of interest within the computer vision and machine learning communities.
    The supervisory information is typically considered to come from a human
    supervisor — a “teacher” figure. In this paper, we consider an alternate
    source of supervision — a “peer” — i.e. a different machine. We introduce
    cooperative learning, where two agents trying to learn the same visual
    concepts, but in potentially different environments using different sources of
    data (sensors), communicate their current knowledge of these concepts to each
    other. Given the distinct sources of data in both agents, the mode of
    communication between the two agents is not obvious. We propose the use of
    visual attributes — semantic mid-level visual properties such as furry,
    wooden, etc.– as the mode of communication between the agents. Our experiments
    in three domains — objects, scenes, and animals — demonstrate that our
    proposed cooperative learning approach improves the performance of both agents
    as compared to their performance if they were to learn in isolation. Our
    approach is particularly applicable in scenarios where privacy, security and/or
    bandwidth constraints restrict the amount and type of information the two
    agents can exchange.

    Joint Geometrical and Statistical Alignment for Visual Domain Adaptation

    Jing Zhang, Wanqing Li, Philip Ogunbona
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents a novel unsupervised domain adaptation method for
    cross-domain visual recognition. We propose a unified framework that reduces
    the shift between domains both statistically and geometrically, referred to as
    Joint Geometrical and Statistical Alignment (JGSA). Specifically, we learn two
    coupled projections that project the source domain and target domain data into
    low dimensional subspaces where the geometrical shift and distribution shift
    are reduced simultaneously. The objective function can be solved efficiently in
    a closed form. Extensive experiments have verified that the proposed method
    significantly outperforms several state-of-the-art domain adaptation methods on
    a synthetic dataset and three different real world cross-domain visual
    recognition tasks.

    WordFence: Text Detection in Natural Images with Border Awareness

    Andrei Polzounov, Artsiom Ablavatski, Sergio Escalera, Shijian Lu, Jianfei Cai
    Comments: 5 pages, 2 figures, ICIP 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In recent years, text recognition has achieved remarkable success in
    recognizing scanned document text. However, word recognition in natural images
    is still an open problem and generally requires time-consuming
    post-processing steps. We present a novel architecture for individual word
    detection in scene images based on semantic segmentation. Our contributions are
    twofold: the concept of WordFence, which detects border areas surrounding each
    individual word, and a novel pixelwise weighted softmax loss function which
    penalizes background and emphasizes small text regions. WordFence ensures that
    each word is detected individually, and the new loss function provides a strong
    training signal to both text and word border localization. The proposed
    technique avoids intensive post-processing, producing an end-to-end word
    detection system. We achieve superior localization recall on common benchmark
    datasets – 92% recall on ICDAR11 and ICDAR13 and 63% recall on SVT.
    Furthermore, our end-to-end word recognition system achieves state-of-the-art
    86% F-Score on ICDAR13.
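
    As a concrete reading of the loss described above, a pixelwise weighted softmax
    (cross-entropy) can be sketched as follows; the tensor shapes and the idea of
    down-weighting background while up-weighting small text regions are assumptions
    drawn from the abstract, not the authors' exact formulation:

        import numpy as np

        def weighted_softmax_loss(logits, labels, weights):
            """Pixelwise weighted cross-entropy.

            logits  : (H, W, C) raw class scores per pixel
            labels  : (H, W) integer class index per pixel
            weights : (H, W) per-pixel weight, e.g. small for background pixels
                      and large for pixels inside small text regions
            """
            # numerically stable log-softmax over the class dimension
            z = logits - logits.max(axis=-1, keepdims=True)
            log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
            h, w = labels.shape
            nll = -log_probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
            return (weights * nll).sum() / weights.sum()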

    Handwritten Urdu Character Recognition using 1-Dimensional BLSTM Classifier

    Saad Bin Ahmed, Saeeda Naz, Salahuddin Swati, Muhammad Imran Razzak
    Comments: 10 pages, Accepted in NCA for publication
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The recognition of cursive script is regarded as a subtle task in optical
    character recognition due to its varied representation. Every cursive script has
    its own nature and associated challenges. Since Urdu is a cursive language
    derived from the Arabic script, it shares nearly the same challenges, and is
    even harder to recognize. Urdu and Arabic can be categorized on the basis of the
    script they use: Urdu is mostly written in the Nastaliq style, whereas Arabic
    follows the Naskh style of writing. This paper presents a new and comprehensive
    offline Urdu handwriting database named the Urdu-Nastaliq Handwritten Dataset
    (UNHD). Currently, there is no standard and comprehensive Urdu handwritten
    dataset publicly available to researchers. The acquired dataset covers commonly
    used ligatures written by 500 writers in their natural handwriting on A4-size
    paper. We performed experiments using recurrent neural networks and report
    significant accuracy for handwritten Urdu character recognition.

    A Non-Rigid Map Fusion-Based RGB-Depth SLAM Method for Endoscopic Capsule Robots

    Mehmet Turan, Yasin Almalioglu, Helder Araujo, Ender Konukoglu, Metin Sitti
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In the gastrointestinal (GI) tract endoscopy field, ingestible wireless
    capsule endoscopy is considered a novel, minimally invasive diagnostic
    technology to inspect the entire GI tract and to diagnose various diseases and
    pathologies. Since the development of this technology, medical device companies
    and many groups have made significant progress to turn such passive capsule
    endoscopes into robotic active capsule endoscopes to achieve almost all
    functions of current active flexible endoscopes. However, the use of robotic
    capsule endoscopy still has some challenges. One such challenge is the precise
    localization of such active devices in the 3D world, which is essential for a
    precise three-dimensional (3D) mapping of the inner organ. A reliable 3D map of
    the explored inner organ could assist doctors in making more intuitive and
    correct diagnoses. In this paper, we propose, to our knowledge for the first
    time in the literature, a visual simultaneous localization and mapping (SLAM) method
    specifically developed for endoscopic capsule robots. The proposed RGB-Depth
    SLAM method is capable of capturing comprehensive dense globally consistent
    surfel-based maps of the inner organs explored by an endoscopic capsule robot
    in real time. This is achieved by using dense frame-to-model camera tracking
    and windowed surfel-based fusion coupled with frequent model refinement through
    non-rigid surface deformations.

    A Deep Learning Based 6 Degree-of-Freedom Localization Method for Endoscopic Capsule Robots

    Mehmet Turan, Yasin Almalioglu, Ender Konukoglu, Metin Sitti
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a robust deep learning based 6 degrees-of-freedom (DoF)
    localization system for endoscopic capsule robots. Our system mainly focuses on
    localization of endoscopic capsule robots inside the GI tract using only visual
    information captured by a mono camera integrated into the robot. The proposed
    system is a 23-layer deep convolutional neural network (CNN) that is capable of
    estimating the pose of the robot in real time using a standard CPU. The dataset
    for the evaluation of the system was recorded inside a surgical human stomach
    model with realistic surface texture, softness, and surface liquid properties
    so that the pre-trained CNN architecture can be transferred confidently into a
    real endoscopic scenario. An average error of 7.1% and 3.4% for translation and
    rotation has been obtained, respectively. The results obtained from the
    experiments demonstrate that a CNN pre-trained with raw 2D endoscopic images
    performs accurately inside the GI tract and is robust to various challenges
    posed by reflection distortions, lens imperfections, vignetting, noise, motion
    blur, low resolution, and lack of unique landmarks to track.

    Real-Time Adaptive Image Compression

    Oren Rippel, Lubomir Bourdev
    Comments: Published at ICML 2017
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We present a machine learning-based approach to lossy image compression which
    outperforms all existing codecs, while running in real-time.

    Our algorithm typically produces files 2.5 times smaller than JPEG and JPEG
    2000, 2 times smaller than WebP, and 1.7 times smaller than BPG on datasets of
    generic images across all quality levels. At the same time, our codec is
    designed to be lightweight and deployable: for example, it can encode or decode
    the Kodak dataset in around 10ms per image on GPU.

    Our architecture is an autoencoder featuring pyramidal analysis, an adaptive
    coding module, and regularization of the expected codelength. We also
    supplement our approach with adversarial training specialized towards use in a
    compression setting: this enables us to produce visually pleasing
    reconstructions for very low bitrates.
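
    Learned codecs of this kind are commonly trained on a rate-distortion trade-off
    of the general form (a generic formulation, not the paper's exact objective):

        L = \mathbb{E}\big[\, d(x, \hat{x}) \,\big] + \beta\, \mathbb{E}\big[\, \ell(\hat{x}) \,\big],

    where d measures reconstruction distortion, \ell the expected codelength of the
    compressed representation, and \beta sets the trade-off; the adversarial training
    mentioned above is added on top to keep reconstructions visually pleasing at very
    low bitrates.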

    Automated Body Structure Extraction from Arbitrary 3D Mesh

    Yong Khoo, Sang Chung
    Journal-ref: Imaging and Graphics, 2017
    Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)

    This paper presents an automated method for 3D character skeleton extraction
    that can be applied to generic 3D shapes. Our work is motivated by prior
    skeleton-based work on automatic rigging; it focuses on skeleton extraction and
    automatically aligns the extracted structure to fit the 3D shape of the given
    mesh. The body mesh can subsequently be skinned based on the extracted skeleton,
    which enables the rigging process. In the experiments, we apply the method to a
    public dataset to derive the estimated skeleton from different body shapes, as
    well as to real data obtained from 3D scanning systems. Satisfactory results are
    obtained compared to the existing approaches.


    Artificial Intelligence

    Demystifying Relational Latent Representations

    Sebastijan Dumančić, Hendrik Blockeel
    Comments: 12 pages, 8 figures
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Latent features learned by deep learning approaches have proven to be a
    powerful tool for machine learning. They serve as a data abstraction that makes
    learning easier by capturing regularities in data explicitly. Their benefits
    motivated their adaptation to the relational learning context. In our previous
    work, we introduced an approach that learns relational latent features by means
    of clustering instances and their relations. The major drawback of latent
    representations is that they are often black-box and difficult to interpret.
    This work addresses these issues and shows that (1) latent features created by
    clustering are interpretable and capture interesting properties of data; (2)
    they identify local regions of instances that match well with the label, which
    partially explains their benefit; and (3) although the number of latent
    features generated by this approach is large, often many of them are highly
    redundant and can be removed without hurting performance much.

    Multiobjective Programming for Type-2 Hierarchical Fuzzy Inference Trees

    Varun Kumar Ojha, Vaclav Snasel, Ajith Abraham
    Journal-ref: IEEE Transactions on Fuzzy Systems 2017
    Subjects: Artificial Intelligence (cs.AI)

    This paper proposes a design of hierarchical fuzzy inference tree (HFIT). An
    HFIT produces an optimum treelike structure, i.e., a natural hierarchical
    structure that accommodates simplicity by combining several low-dimensional
    fuzzy inference systems (FISs). Such a natural hierarchical structure provides
    a high degree of approximation accuracy. The construction of HFIT takes place
    in two phases. Firstly, a nondominated sorting based multiobjective genetic
    programming (MOGP) is applied to obtain a simple tree structure (a low
    complexity model) with a high accuracy. Secondly, the differential evolution
    algorithm is applied to optimize the obtained tree’s parameters. In the derived
    tree, each node acquires a different combination of inputs, where the
    evolutionary process governs this combination. Hence, HFIT nodes are
    heterogeneous in nature, which leads to a high diversity among the rules
    generated by the HFIT. Additionally, the HFIT provides an automatic feature
    selection because it uses MOGP for the tree’s structural optimization that
    accepts inputs only relevant to the knowledge contained in data. The HFIT was
    studied in the context of both type-1 and type-2 FISs, and its performance was
    evaluated through six application problems. Moreover, the proposed
    multiobjective HFIT was compared both theoretically and empirically with
    recently proposed FISs methods from the literature, such as McIT2FIS,
    TSCIT2FNN, SIT2FNN, RIT2FNS-WB, eT2FIS, MRIT2NFS, IT2FNN-SVR, etc. From the
    obtained results, it was found that the HFIT provided less complex and highly
    accurate models compared to the models produced by most of the other methods.
    Hence, the proposed HFIT is an efficient and competitive alternative to the
    other FISs for function approximation and feature selection.

    Online Article Ranking as a Constrained, Dynamic, Multi-Objective Optimization Problem

    Jeya Balaji Balasubramanian, Akshay Soni, Yashar Mehdad, Nikolay Laptev
    Comments: 7 pages
    Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

    The content ranking problem in a social news website is typically treated as a
    function that maximizes a scalar metric of interest like dwell-time. However, as in
    most real-world applications, we are interested in more than one metric—for
    instance, simultaneously maximizing click-through rate, monetization metrics, and
    dwell-time—while also satisfying the traffic requirements promised to different
    publishers. All this needs to be done on online data and under settings
    where the objective function and the constraints can dynamically change; this
    could happen if, for instance, new publishers are added, some contracts are
    adjusted, or some contracts expire.

    In this paper, we formulate this problem as a constrained, dynamic,
    multi-objective optimization problem. We propose a novel framework that extends
    a successful genetic optimization algorithm, NSGA-II, to solve this online,
    data-driven problem. We design the modules of NSGA-II to suit our problem. We
    evaluate optimization performance using Hypervolume and introduce a confidence
    interval metric for assessing the practicality of a solution. We demonstrate
    the application of this framework on a real-world Article Ranking problem. We
    observe that we make considerable improvements in both time and performance
    over a brute-force baseline technique that is currently in production.
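
    The core NSGA-II ingredient being extended here is non-dominated sorting of
    candidates under multiple objectives. A minimal sketch of extracting the first
    Pareto front (a generic illustration, not the authors' production framework):

        def pareto_front(points):
            """Return indices of non-dominated points (all objectives minimized).

            points : list of tuples, one objective vector per candidate. A point p
            dominates q if p <= q in every objective and p < q in at least one;
            NSGA-II repeatedly peels off such fronts to rank a population.
            """
            def dominates(p, q):
                return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

            return [i for i, p in enumerate(points)
                    if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]

        # Example with two objectives (say, negative dwell-time and traffic shortfall):
        candidates = [(0.2, 0.9), (0.5, 0.5), (0.3, 0.8), (0.6, 0.6)]
        print(pareto_front(candidates))  # [0, 1, 2]; (0.6, 0.6) is dominated by (0.5, 0.5)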

    All-relevant feature selection using multidimensional filters with exhaustive search

    Krzysztof Mnich, Witold R. Rudnicki
    Comments: 27 pages, 11 figures, 3 tables
    Subjects: Artificial Intelligence (cs.AI)

    This paper describes a method for identification of the informative variables
    in the information system with discrete decision variables. It is targeted
    specifically towards discovery of the variables that are non-informative when
    considered alone, but are informative when the synergistic interactions between
    multiple variables are considered. To this end, the mutual entropy of all
    possible k-tuples of variables with the decision variable is computed. Then, for
    each variable the maximal information gain due to interactions with other
    variables is obtained. For non-informative variables this quantity conforms to
    the well known statistical distributions. This allows for discerning truly
    informative variables from non-informative ones. For demonstration of the
    approach, the method is applied to several synthetic datasets that involve
    complex multidimensional interactions between variables. It is capable of
    identifying the most important informative variables, even in the case when the
    dimensionality of the analysis is smaller than the true dimensionality of the
    problem. What is more, the high sensitivity of the algorithm allows for
    detection of the influence of nuisance variables on the response variable.
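
    To illustrate the kind of effect being targeted, variables that are
    uninformative alone but informative jointly, here is a small mutual-information
    computation on a synthetic XOR example (the helper functions and variable names
    are ours, not the paper's):

        from collections import Counter
        from math import log2

        def entropy(column):
            n = len(column)
            return -sum(c / n * log2(c / n) for c in Counter(column).values())

        def mutual_information(xs, y):
            """I(X; Y) between a tuple of discrete variables xs and decision y."""
            joint_x = list(zip(*xs))
            return entropy(joint_x) + entropy(y) - entropy(list(zip(joint_x, y)))

        # Synthetic synergy: y = x1 XOR x2, so each variable alone carries no
        # information about y, but the pair determines y completely.
        x1 = [0, 0, 1, 1] * 25
        x2 = [0, 1, 0, 1] * 25
        y = [a ^ b for a, b in zip(x1, x2)]
        print(mutual_information([x1], y))      # 0.0 bits
        print(mutual_information([x1, x2], y))  # 1.0 bit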

    Know-Evolve: Deep Reasoning in Temporal Knowledge Graphs

    Rakshit Trivedi, Mehrdad Farajtabar, Yichen Wang, Hanjun Dai, Hongyuan Zha, Le Song
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

    Knowledge Graphs are important tools to model multi-relational data that
    serves as an information pool for various applications. Traditionally, these
    graphs are considered to be static in nature. However, recent availability of
    large scale event-based interaction data has given rise to dynamically evolving
    knowledge graphs that contain temporal information for each edge. Reasoning
    over time in such graphs is not yet well understood.

    In this paper, we present a novel deep evolutionary knowledge network
    architecture to learn entity embeddings that can dynamically and non-linearly
    evolve over time. We further propose a multivariate point process framework to
    model the occurrence of a fact (edge) in continuous time. To facilitate
    temporal reasoning, the learned embeddings are used to compute a relationship
    score that further parametrizes the intensity function of the point process. We
    demonstrate improved performance over various existing relational learning
    models on two large-scale real-world datasets. Further, our method effectively
    predicts the occurrence or recurrence time of a fact, which is novel compared to
    prior reasoning approaches in the multi-relational setting.

    Optimal Warping Paths are unique for almost every pair of Time Series

    Brijnesh J. Jain, David Schultz
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    An optimal warping path between two time series is generally not unique. The
    size and form of the set of pairs of time series with non-unique optimal
    warping path are unknown. This article shows that optimal warping paths are
    unique for almost every pair of time series in a measure-theoretic sense. All
    pairs of time series with non-unique optimal warping path form a negligible set
    and are geometrically the union of zero sets of quadratic forms. The result is
    useful for analyzing and understanding adaptive learning methods in dynamic
    time warping spaces.
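
    For readers unfamiliar with the object under study: the optimal warping path is
    the minimizer in the standard dynamic time warping recursion below (textbook
    DTW, included only to fix terminology; the path itself is recovered by
    backtracking through D):

        import numpy as np

        def dtw_cost(x, y):
            """Dynamic time warping cost between 1-D sequences x and y."""
            n, m = len(x), len(y)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    d = (x[i - 1] - y[j - 1]) ** 2
                    # a warping path may extend diagonally, vertically, or horizontally
                    D[i, j] = d + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
            return D[n, m]

        print(dtw_cost([0, 1, 2, 3], [0, 1, 1, 2, 3]))  # 0.0: the repeated value is absorbed by warping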

    Text-based Adventures of the Golovin AI Agent

    Bartosz Kostka, Jaroslaw Kwiecien, Jakub Kowalski, Pawel Rychlikowski
    Subjects: Artificial Intelligence (cs.AI)

    The domain of text-based adventure games has been recently established as a
    new challenge: creating an agent that is able both to understand natural
    language and to act intelligently in text-described environments.

    In this paper, we present our approach to tackle the problem. Our agent,
    named Golovin, takes advantage of the limited game domain. We use genre-related
    corpora (including fantasy books and decompiled games) to create language
    models suitable to this domain. Moreover, we embed mechanisms that allow us to
    specify, and separately handle, important tasks as fighting opponents, managing
    inventory, and navigating on the game map.

    We validated the usefulness of these mechanisms by measuring the agent's
    performance on a set of 50 interactive fiction games. Finally, we show that our
    agent plays on a level comparable to the winner of last year's Text-Based Adventure AI
    Competition.

    New Reinforcement Learning Using a Chaotic Neural Network for Emergence of "Thinking" – "Exploration" Grows into "Thinking" through Learning –

    Katsunari Shibata, Yuki Goto
    Comments: The Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM) 2017, 5 pages, 6 figures
    Subjects: Artificial Intelligence (cs.AI)

    Expectations for the emergence of higher functions are growing in the
    framework of end-to-end reinforcement learning using a recurrent neural
    network. However, the emergence of "thinking", a typical higher function,
    is difficult to realize because "thinking" needs non-fixed-point, flow-type
    attractors with both convergence and transition dynamics. Furthermore, in order
    to introduce "inspiration" or "discovery" into "thinking", transitions that are
    not completely random but unexpected should also be required.

    By analogy to “chaotic itinerancy”, we have hypothesized that “exploration”
    grows into “thinking” through learning by forming flow-type attractors on
    chaotic random-like dynamics. It is expected that if rational dynamics are
    learned in a chaotic neural network (ChNN), coexistence of rational state
    transition, inspiration-like state transition and also random-like exploration
    for unknown situation can be realized.

    Based on the above idea, we have proposed new reinforcement learning using a
    ChNN as an actor. The positioning of exploration is completely different from
    the conventional one. The chaotic dynamics inside the ChNN produces exploration
    factors by itself. Since external random numbers for stochastic action
    selection are not used, exploration factors cannot be isolated from the output.
    Therefore, the learning method is also completely different from the
    conventional one.

    At each non-feedback connection, one variable named causality trace takes in
    and maintains the input through the connection according to the change in its
    output. Using the trace and TD error, the weight is updated.

    In this paper, as the result of a recent simple task to see whether the new
    learning works or not, it is shown that a robot with two wheels and two visual
    sensors reaches a target while avoiding an obstacle after learning, though there
    is still much room for improvement.
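
    Read schematically, in our notation rather than the authors' equations, the
    update described above resembles an eligibility-trace rule: each non-feedback
    connection (i, j) maintains a causality trace c_{ij}, driven by its input and the
    change in its output, and the weight is adjusted using the TD error as

        \Delta w_{ij} = \eta \, \delta_{\mathrm{TD}} \, c_{ij},

    with \eta a learning rate.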

    Learning Hard Alignments with Variational Inference

    Dieterich Lawson, George Tucker, Chung-Cheng Chiu, Colin Raffel, Kevin Swersky, Navdeep Jaitly
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    There has recently been significant interest in hard attention models for
    tasks such as object recognition, visual captioning and speech recognition.
    Hard attention can offer benefits over soft attention such as decreased
    computational cost, but training hard attention models can be difficult because
    of the discrete latent variables they introduce. Previous work has used
    REINFORCE and Q-learning to approach these issues, but those methods can
    provide high-variance gradient estimates and be slow to train. In this paper,
    we tackle the problem of learning hard attention for a 1-d temporal task using
    variational inference methods, specifically the recently introduced VIMCO and
    NVIL. Furthermore, we propose novel baselines that adapt VIMCO to this setting.
    We demonstrate our method on a phoneme recognition task in clean and noisy
    environments and show that our method outperforms REINFORCE with the difference
    being greater for a more complicated task.

    A Method for Determining Weights of Criterias and Alternative of Fuzzy Group Decision Making Problem

    Jon JaeGyong, Mun JongHui, Ryang GyongIl
    Comments: 12 pages, 3 tables
    Subjects: Artificial Intelligence (cs.AI)

    In this paper, we constructed a model to determine the weights of criteria and
    presented a solution for determining the optimal alternative by using the
    constructed model, together with a relationship analysis between criteria, in a
    fuzzy group decision-making problem where decision makers express their
    preference information on the criteria in different forms.

    Repeated Inverse Reinforcement Learning for AI Safety

    Kareem Amin, Nan Jiang, Satinder Singh
    Comments: The first two authors contributed equally to this work
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    How detailed should we make the goals we prescribe to AI agents acting on our
    behalf in complex environments? Detailed and low-level specification of goals
    can be tedious and expensive to create, and abstract and high-level goals could
    lead to negative surprises as the agent may find behaviors that we would not
    want it to do, i.e., lead to unsafe AI. One approach to addressing this dilemma
    is for the agent to infer human goals by observing human behavior. This is the
    Inverse Reinforcement Learning (IRL) problem. However, IRL is generally
    ill-posed for there are typically many reward functions for which the observed
    behavior is optimal. While the use of heuristics to select from among the set
    of feasible reward functions has led to successful applications of IRL to
    learning from demonstration, such heuristics do not address AI safety. In this
    paper we introduce a novel repeated IRL problem that captures an aspect of AI
    safety as follows. The agent has to act on behalf of a human in a sequence of
    tasks and wishes to minimize the number of tasks that it surprises the human.
    Each time the human is surprised the agent is provided a demonstration of the
    desired behavior by the human. We formalize this problem, including how the
    sequence of tasks is chosen, in a few different ways and provide some
    foundational results.

    Comparison-Based Choices

    Jon Kleinberg, Sendhil Mullainathan, Johan Ugander
    Comments: 20 pages, 3 figures
    Subjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI)

    A broad range of on-line behaviors are mediated by interfaces in which people
    make choices among sets of options. A rich and growing line of work in the
    behavioral sciences indicates that human choices follow not only from the
    utility of alternatives, but also from the choice set in which alternatives are
    presented. In this work we study comparison-based choice functions, a simple
    but surprisingly rich class of functions capable of exhibiting so-called
    choice-set effects. Motivated by the challenge of predicting complex choices,
    we study the query complexity of these functions in a variety of settings. We
    consider settings that allow for active queries or passive observation of a
    stream of queries, and give analyses both at the granularity of individuals or
    populations that might exhibit heterogeneous choice behavior. Our main result
    is that any comparison-based choice function in one dimension can be inferred
    as efficiently as a basic maximum or minimum choice function across many query
    contexts, suggesting that choice-set effects need not entail any fundamental
    algorithmic barriers to inference. We also introduce a class of choice
    functions we call distance-comparison-based functions, and briefly discuss the
    analysis of such functions. The framework we outline provides intriguing
    connections between human choice behavior and a range of questions in the
    theory of sorting.

    Subjective Knowledge Acquisition and Enrichment Powered By Crowdsourcing

    Rui Meng, Hao Xin, Lei Chen, Yangqiu Song
    Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

    Knowledge bases (KBs) have attracted increasing attention due to their great
    success in various areas, such as Web and mobile search. Existing KBs are
    restricted to objective factual knowledge, such as city population or fruit
    shape, whereas subjective knowledge, such as "big city", which is commonly
    mentioned in Web and mobile queries, has been neglected. Subjective knowledge
    differs from objective knowledge in that it has no documented or observed
    ground truth. Instead, the truth relies on people's dominant opinion. Thus, we
    can use crowdsourcing techniques to get opinions from the crowd. In our work,
    we propose a system, called crowdsourced subjective knowledge acquisition
    (CoSKA), for subjective knowledge acquisition powered by crowdsourcing and
    existing KBs. The acquired knowledge can be used to enrich existing KBs in the
    subjective dimension, which bridges the gap between existing objective knowledge
    and subjective queries. The main challenge of CoSKA is the conflict between
    large-scale knowledge facts and limited crowdsourcing resources. To address this
    challenge, in this work, we define knowledge inference rules and then select
    the seed knowledge judiciously for crowdsourcing to maximize the inference
    power under the resource constraint. Our experimental results on a real knowledge
    base and crowdsourcing platform verify the effectiveness of the CoSKA system.

    Picasso: A Neural Network Visualizer

    Ryan Henderson, Rasmus Rothe
    Comments: 9 pages; submission to the Journal of Open Research Software; github.com/merantix/picasso
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

    Picasso is a free open-source (Eclipse Public License) web application
    written in Python for rendering standard visualizations useful for training
    convolutional neural networks. Picasso ships with occlusion maps and saliency
    maps, two visualizations which help reveal issues that evaluation metrics like
    loss and accuracy might hide: for example, learning a proxy classification
    task. Picasso works with the Keras and Tensorflow deep learning frameworks.
    Picasso can be used with minimal configuration by deep learning researchers and
    engineers alike across various neural network architectures. Adding new
    visualizations is simple: the user can specify their visualization code and
    HTML template separately from the application code.

    Learning Probabilistic Programs Using Backpropagation

    Avi Pfeffer
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Probabilistic modeling enables combining domain knowledge with learning from
    data, thereby supporting learning from fewer training instances than purely
    data-driven methods. However, learning probabilistic models is difficult and
    has not achieved the level of performance of methods such as deep neural
    networks on many tasks. In this paper, we attempt to address this issue by
    presenting a method for learning the parameters of a probabilistic program
    using backpropagation. Our approach opens the possibility to building deep
    probabilistic programming models that are trained in a similar way to neural
    networks.

    Probabilistically Safe Policy Transfer

    David Held, Zoe McCarthy, Michael Zhang, Fred Shentu, Pieter Abbeel
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Although learning-based methods have great potential for robotics, one
    concern is that a robot that updates its parameters might cause large amounts
    of damage before it learns the optimal policy. We formalize the idea of safe
    learning in a probabilistic sense by defining an optimization problem: we
    desire to maximize the expected return while keeping the expected damage below
    a given safety limit. We study this optimization for the case of a robot
    manipulator with safety-based torque limits. We would like to ensure that the
    damage constraint is maintained at every step of the optimization and not just
    at convergence. To achieve this aim, we introduce a novel method which predicts
    how modifying the torque limit, as well as how updating the policy parameters,
    might affect the robot’s safety. We show through a number of experiments that
    our approach allows the robot to improve its performance while ensuring that
    the expected damage constraint is not violated during the learning process.
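
    Written out, the optimization described above has the general form (notation
    ours):

        \max_{\theta} \; \mathbb{E}\big[\, R(\pi_\theta) \,\big]
        \quad \text{subject to} \quad
        \mathbb{E}\big[\, D(\pi_\theta) \,\big] \le d_{\max},

    where R is the return and D the damage incurred by policy \pi_\theta, with the
    constraint required to hold at every step of the optimization and not only at
    convergence.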

    Predicting Blood Pressure with Deep Bidirectional LSTM Network

    Peng Su, Xiaorong Ding, Yuanting Zhang, Ye Li, Ni Zhao
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Dynamical Systems (math.DS); Machine Learning (stat.ML)

    Blood pressure (BP) has been a difficult vascular risk factor to measure
    continuously and precisely with a small cuffless electronic gadget. In the
    meantime, it is the key biomarker for control of cardiovascular diseases (CVD),
    the leading cause of death worldwide. In this work, we addressed the current
    limitation of BP prediction models by formulating BP extraction as a temporal
    sequence prediction problem in which both the input and target are sequential
    data. By incorporating both a bidirectional layer structure and a deep
    architecture in a standard long short-term memory (LSTM) network, we established
    a deep bidirectional LSTM (DB-LSTM) network that can adaptively discover the
    latent structures of different timescales in BP sequences and automatically learn
    such multiscale dependencies. We evaluated our proposed model on a one-day and a
    four-day continuous BP dataset, and the results show that the DB-LSTM network can
    effectively learn different timescale dependencies in the BP sequences and
    advances the state-of-the-art by achieving superior accuracy compared with
    other leading methods on both datasets. To the best of our knowledge, this is
    the first study to validate the ability of recurrent neural networks to learn
    the different timescale dependencies of long-term continuous BP sequences.
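
    A stacked bidirectional LSTM of the general shape described, with a
    per-time-step regression output, can be sketched in Keras as follows; the layer
    sizes, input dimensions, and loss are illustrative assumptions rather than the
    configuration used in the paper:

        import tensorflow as tf
        from tensorflow.keras import layers

        # Sequence-to-sequence regression: one BP estimate per input time step.
        timesteps, n_features = 125, 2   # illustrative sizes, not taken from the paper

        model = tf.keras.Sequential([
            tf.keras.Input(shape=(timesteps, n_features)),
            layers.Bidirectional(layers.LSTM(64, return_sequences=True)),  # bidirectional layer
            layers.Bidirectional(layers.LSTM(64, return_sequences=True)),  # stacked, hence "deep"
            layers.TimeDistributed(layers.Dense(1)),                       # per-step BP output
        ])
        model.compile(optimizer="adam", loss="mse")
        model.summary()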


    Information Retrieval

    Online Article Ranking as a Constrained, Dynamic, Multi-Objective Optimization Problem

    Jeya Balaji Balasubramanian, Akshay Soni, Yashar Mehdad, Nikolay Laptev
    Comments: 7 pages
    Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

    The content ranking problem in a social news website is typically treated as a
    function that maximizes a scalar metric of interest like dwell-time. However, as in
    most real-world applications, we are interested in more than one metric—for
    instance, simultaneously maximizing click-through rate, monetization metrics, and
    dwell-time—while also satisfying the traffic requirements promised to different
    publishers. All this needs to be done on online data and under settings
    where the objective function and the constraints can dynamically change; this
    could happen if, for instance, new publishers are added, some contracts are
    adjusted, or some contracts expire.

    In this paper, we formulate this problem as a constrained, dynamic,
    multi-objective optimization problem. We propose a novel framework that extends
    a successful genetic optimization algorithm, NSGA-II, to solve this online,
    data-driven problem. We design the modules of NSGA-II to suit our problem. We
    evaluate optimization performance using Hypervolume and introduce a confidence
    interval metric for assessing the practicality of a solution. We demonstrate
    the application of this framework on a real-world Article Ranking problem. We
    observe that we make considerable improvements in both time and performance
    over a brute-force baseline technique that is currently in production.
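
    As a rough illustration of the Pareto machinery underlying NSGA-II, the sketch
    below extracts the first non-dominated front from a set of candidate ranking
    configurations scored on two objectives (dwell-time and click-through rate are
    assumed objective names); it is a generic illustration, not the authors'
    production framework.

        import numpy as np

        def dominates(a, b):
            """a dominates b if it is >= on every objective and > on at least one (maximization)."""
            return np.all(a >= b) and np.any(a > b)

        def first_front(scores):
            """Return indices of candidates not dominated by any other candidate."""
            front = []
            for i, s in enumerate(scores):
                if not any(dominates(t, s) for j, t in enumerate(scores) if j != i):
                    front.append(i)
            return front

        # each row: [expected dwell-time, expected click-through rate] for one candidate ranking
        scores = np.array([[3.1, 0.12], [2.8, 0.15], [3.0, 0.10], [2.5, 0.20]])
        print(first_front(scores))   # candidates kept for the next NSGA-II generation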


    Computation and Language

    Social Media-based Substance Use Prediction

    Tao Ding, Warren K. Bickel, Shimei Pan
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Social and Information Networks (cs.SI)

    In this paper, we demonstrate how the state-of-the-art machine learning and
    text mining techniques can be used to build effective social media-based
    substance use detection systems. Since a substance use ground truth is
    difficult to obtain on a large scale, to maximize system performance, we
    explore different feature learning methods to take advantage of a large amount
    of unsupervised social media data. We also demonstrate the benefit of using
    multi-view unsupervised feature learning to combine heterogeneous user
    information such as Facebook “likes” and “status updates” to enhance system
    performance. Based on our evaluation, our best models achieved 86% AUC for
    predicting tobacco use, 81% for alcohol use and 84% for drug use, all of which
    significantly outperformed existing methods. Our investigation has also
    uncovered interesting relations between a user’s social media behavior (e.g.,
    word usage) and substance use.

    NeuroNER: an easy-to-use program for named-entity recognition based on neural networks

    Franck Dernoncourt, Ji Young Lee, Peter Szolovits
    Comments: The first two authors contributed equally to this work
    Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Named-entity recognition (NER) aims at identifying entities of interest in a
    text. Artificial neural networks (ANNs) have recently been shown to outperform
    existing NER systems. However, ANNs remain challenging to use for non-expert
    users. In this paper, we present NeuroNER, an easy-to-use named-entity
    recognition tool based on ANNs. Users can annotate entities using a graphical
    web-based user interface (BRAT): the annotations are then used to train an ANN,
    which in turn predicts entity locations and categories in new texts. NeuroNER
    makes this annotation-training-prediction flow smooth and accessible to anyone.

    A Biomedical Information Extraction Primer for NLP Researchers

    Surag Nair
    Subjects: Computation and Language (cs.CL)

    Biomedical Information Extraction is an exciting field at the crossroads of
    Natural Language Processing, Biology and Medicine. It encompasses a variety of
    different tasks that require application of state-of-the-art NLP techniques,
    such as NER and Relation Extraction. This paper provides an overview of the
    problems in the field and discusses some of the techniques used for solving
    them.

    Key-Value Retrieval Networks for Task-Oriented Dialogue

    Mihail Eric, Christopher D. Manning
    Subjects: Computation and Language (cs.CL)

    Neural task-oriented dialogue systems often struggle to smoothly interface
    with a knowledge base. In this work, we seek to address this problem by
    proposing a new neural dialogue agent that is able to effectively sustain
    grounded, multi-domain discourse through a novel key-value retrieval mechanism.
    The model is end-to-end differentiable and does not need to explicitly model
    dialogue state or belief trackers. We also release a new dataset of 3,031
    dialogues that are grounded through underlying knowledge bases and span three
    distinct tasks in the in-car personal assistant space: calendar scheduling,
    weather information retrieval, and point-of-interest navigation. Our
    architecture is simultaneously trained on data from all domains and
    significantly outperforms a competitive rule-based system and other existing
    neural dialogue architectures on the provided domains according to both
    automatic and human evaluation metrics.
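
    A minimal sketch of the key-value attention idea: the decoder state attends over
    KB keys and the resulting weights distribute probability mass onto the
    corresponding KB value tokens; the shapes and the softmax-based scoring are
    illustrative assumptions rather than the paper's exact architecture.

        import numpy as np

        def softmax(x):
            e = np.exp(x - x.max())
            return e / e.sum()

        def kv_attention(decoder_state, kb_keys, kb_value_ids, vocab_size):
            """Attend over KB keys and push the attention weights onto the value tokens."""
            scores = kb_keys @ decoder_state            # (num_entries,)
            weights = softmax(scores)
            kb_dist = np.zeros(vocab_size)
            for w, vid in zip(weights, kb_value_ids):
                kb_dist[vid] += w                        # probability mass on each value's token id
            return kb_dist

        # toy KB with 3 entries embedded in a 4-d key space; values point at vocabulary ids
        keys = np.random.randn(3, 4)
        value_ids = [17, 42, 99]
        dist = kv_attention(np.random.randn(4), keys, value_ids, vocab_size=128)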

    Agent-based model for the origins of scaling in human language

    Javier Vera, Felipe Urbina
    Subjects: Adaptation and Self-Organizing Systems (nlin.AO); Computation and Language (cs.CL); Physics and Society (physics.soc-ph)

    Background/Introduction: Zipf’s law establishes that if the words of a
    (large) text are ordered by decreasing frequency, the frequency versus the rank
    decreases as a power law with exponent close to -1. Previous work has stressed
    that this pattern arises from a conflict of interests of the participants of
    communication: speakers and hearers. Methods: The challenge here is to define a
    computational language game on a population of agents, playing games mainly
    based on a parameter that measures the participants’ relative interests.
    Results: Numerical simulations suggest that at critical values of the parameter
    a human-like vocabulary, exhibiting scaling properties, seems to appear.
    Conclusions: The appearance of an intermediate distribution of frequencies at
    some critical values of the parameter suggests that on a population of
    artificial agents the emergence of scaling partly arises as a self-organized
    process only from local interactions between agents.

    Know-Evolve: Deep Reasoning in Temporal Knowledge Graphs

    Rakshit Trivedi, Mehrdad Farajtabar, Yichen Wang, Hanjun Dai, Hongyuan Zha, Le Song
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

    Knowledge Graphs are important tools to model multi-relational data that
    serves as an information pool for various applications. Traditionally, these
    graphs are considered to be static in nature. However, recent availability of
    large scale event-based interaction data has given rise to dynamically evolving
    knowledge graphs that contain temporal information for each edge. Reasoning
    over time in such graphs is not yet well understood.

    In this paper, we present a novel deep evolutionary knowledge network
    architecture to learn entity embeddings that can dynamically and non-linearly
    evolve over time. We further propose a multivariate point process framework to
    model the occurrence of a fact (edge) in continuous time. To facilitate
    temporal reasoning, the learned embeddings are used to compute a relationship
    score that further parametrizes the intensity function of the point process.
    We demonstrate improved performance over various existing relational learning
    models on two large scale real-world datasets. Further, our method
    effectively predicts the occurrence or recurrence time of a fact, which is
    novel compared to prior reasoning approaches in the multi-relational setting.
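
    To sketch how learned embeddings can parametrize a point-process intensity, the
    toy code below scores a (subject, relation, object) triple with a bilinear form
    and maps the score to a positive intensity via softplus; the bilinear scoring
    and softplus link are common modeling choices used here only for illustration,
    not necessarily the exact parametrization of Know-Evolve.

        import numpy as np

        def softplus(x):
            return np.log1p(np.exp(x))

        def intensity(subj_emb, rel_mat, obj_emb):
            """Positive intensity of the event (subject, relation, object) at the current time."""
            score = subj_emb @ rel_mat @ obj_emb   # bilinear relationship score
            return softplus(score)                 # softplus keeps the intensity non-negative

        d = 8
        lam = intensity(np.random.randn(d), np.random.randn(d, d), np.random.randn(d))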


    Distributed, Parallel, and Cluster Computing

    Minimizing Communication Overhead in Window-Based Parallel Complex Event Processing

    Ruben Mayer, Muhammad Adnan Tariq, Kurt Rothermel
    Comments: In Proceedings of ACM International Conference on Distributed and Event-Based Systems, Barcelona, Spain, June 19 – 23, 2017 (DEBS ’17), 12 pages. DOI: this http URL
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Distributed Complex Event Processing has emerged as a well-established
    paradigm to detect situations of interest from basic sensor streams, building
    an operator graph between sensors and applications. In order to detect event
    patterns that correspond to situations of interest, each operator correlates
    events on its incoming streams according to a sliding window mechanism. To
    increase the throughput of an operator, different windows can be assigned to
    different operator instances—i.e., identical operator copies—which process
    them in parallel. This implies that events that are part of multiple
    overlapping windows are replicated to different operator instances. The
    communication overhead of replicating the events can be reduced by assigning
    overlapping windows to the same operator instance. However, this imposes a
    higher processing load on the single operator instance, possibly overloading
    it. In this paper, we address the trade-off between processing load and
    communication overhead when assigning overlapping windows to a single operator
    instance. Controlling the trade-off is challenging and cannot be solved with
    traditional reactive methods. To this end, we propose a model-based batch
    scheduling controller building on prediction. Evaluations show that our
    approach is able to significantly save bandwidth, while keeping a user-defined
    latency bound in the operator instances.

    Parallel Search with no Coordination

    Amos Korman (IRIF, GANG), Yoav Rodeh
    Journal-ref: 24th International Colloquium on Structural Information and
    Communication Complexity (SIROCCO), Jun 2017, Porquerolles, France
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    We consider a parallel version of a classical Bayesian search problem. \(k\)
    agents are looking for a treasure that is placed in one of the boxes indexed by
    \(\mathbb{N}^+\) according to a known distribution \(p\). The aim is to minimize
    the expected time until the first agent finds it. Searchers run in parallel
    where at each time step each searcher can “peek” into a box. A basic family of
    algorithms which are inherently robust is that of \emph{non-coordinating}
    algorithms. Such algorithms act independently at each searcher, differing only
    by their probabilistic choices. We are interested in the price incurred by
    employing such algorithms when compared with the case of full coordination. We
    first show that there exists a non-coordinating algorithm that, knowing only the
    relative likelihood of boxes according to \(p\), has expected running time of at
    most \(10+4(1+\frac{1}{k})^2 T\), where \(T\) is the expected running time of
    the best fully coordinated algorithm. This result is obtained by applying a
    refined version of the main algorithm suggested by Fraigniaud, Korman and Rodeh
    in STOC’16, which was designed for the context of linear parallel search. We
    then describe an optimal non-coordinating algorithm for the case where the
    distribution \(p\) is known. The running time of this algorithm is difficult to
    analyse in general, but we calculate it for several examples. In the case where
    \(p\) is uniform over a finite set of boxes, the algorithm simply checks boxes
    uniformly at random among all non-checked boxes and is essentially \(2\) times
    worse than the coordinating algorithm. We also show simple algorithms for Pareto
    distributions over \(M\) boxes. That is, in the case where \(p(x) \sim 1/x^b\)
    for \(0 < b < 1\), we suggest the following algorithm: at step \(t\), choose
    uniformly from the boxes unchecked in \(\{1, \ldots, \min(M, \lfloor t/\sigma
    \rfloor)\}\), where \(\sigma = b/(b + k - 1)\). It turns out this algorithm is
    asymptotically optimal, and runs about \(2+b\) times worse than the case of full
    coordination.
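
    The Pareto-case strategy is simple enough to state directly in code: each
    searcher independently draws an unchecked box uniformly from the prefix
    {1, ..., min(M, floor(t/sigma))} with sigma = b/(b + k - 1). The sketch below is
    a literal transcription of that rule for a single searcher; the treasure
    location is drawn at random only to make the snippet runnable.

        import math
        import random

        def pareto_search(M, b, k, treasure):
            """One non-coordinating searcher following the prefix rule; returns the step that finds the treasure."""
            sigma = b / (b + k - 1)
            checked = set()
            t = 1
            while True:
                prefix = min(M, math.floor(t / sigma))
                candidates = [x for x in range(1, prefix + 1) if x not in checked]
                if candidates:
                    box = random.choice(candidates)
                    checked.add(box)
                    if box == treasure:
                        return t
                t += 1

        steps = pareto_search(M=100, b=0.5, k=4, treasure=random.randint(1, 100))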

    Cloudroid: A Cloud Framework for Transparent and QoS-aware Robotic Computation Outsourcing

    Ben Hu, Huaimin Wang, Pengfei Zhang, Bo Ding, Huimin Che
    Comments: Accepted by 10th IEEE International Conference on Cloud Computing in 2017
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Robotics (cs.RO)

    Many robotic tasks require heavy computation, which can easily exceed the
    robot’s onboard computer capability. A promising solution to address this
    challenge is outsourcing the computation to the cloud. However, exploiting the
    potential of cloud resources in robotic software is difficult, because it
    involves complex code modification and extensive (re)configuration procedures.
    Moreover, quality of service (QoS) properties such as timeliness, which are
    critical to the robot’s behavior, have to be considered. In this paper, we
    propose a
    transparent and QoS-aware software framework called Cloudroid for cloud robotic
    applications. This framework supports direct deployment of existing robotic
    software packages to the cloud, transparently transforming them into
    Internet-accessible cloud services. And with the automatically generated
    service stubs, robotic applications can outsource their computation to the
    cloud without any code modification. Furthermore, the robot and the cloud can
    cooperate to maintain the specific QoS property such as request response time,
    even in a highly dynamic and resource-competitive environment. We evaluated
    Cloudroid based on a group of typical robotic scenarios and a set of software
    packages widely adopted in real-world robot practices. Results show that
    robot’s capability can be enhanced significantly without code modification and
    specific QoS objectives can be guaranteed. In certain tasks, the “cloud +
    robot” setup improves performance by orders of magnitude compared with the
    robot-only setup.

    A lightweight MapReduce framework for secure processing with SGX

    Rafael Pires, Daniel Gavril, Pascal Felber, Emanuel Onica, Marcelo Pasin
    Comments: 8 pages WACC@CCGRID International Workshop on Assured Cloud Computing and QoS aware Big Data
    Journal-ref: 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and
    Grid Computing
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)

    MapReduce is a programming model used extensively for parallel data
    processing in distributed environments. A wide range of algorithms were
    implemented using MapReduce, from simple tasks like sorting and searching up to
    complex clustering and machine learning operations. Many of these
    implementations are part of services externalized to cloud infrastructures.
    Over the past years, however, many concerns have been raised regarding the
    security guarantees offered in such environments. Some solutions relying on
    cryptography were proposed for countering threats but these typically imply a
    high computational overhead. Intel, the largest manufacturer of commodity CPUs,
    recently introduced SGX (software guard extensions), a set of hardware
    instructions that support execution of code in an isolated secure environment.
    In this paper, we explore the use of Intel SGX for providing privacy guarantees
    for MapReduce operations, and based on our evaluation we conclude that it
    represents a viable alternative to a cryptographic mechanism. We present
    results based on the widely used k-means clustering algorithm, but our
    implementation can be generalized to other applications that can be expressed
    using the MapReduce model.
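
    As a reminder of how k-means fits the MapReduce model referenced above, the
    sketch below expresses one Lloyd iteration as a map step (assign each point to
    its nearest centroid) followed by a reduce step (average the points per
    centroid); it illustrates the programming model only and says nothing about the
    SGX enclave machinery of the paper.

        import numpy as np
        from collections import defaultdict

        def map_step(points, centroids):
            """Emit (centroid_index, point) pairs, one per input point."""
            for p in points:
                yield int(np.argmin(np.linalg.norm(centroids - p, axis=1))), p

        def reduce_step(pairs, k, dim):
            """Average the points assigned to each centroid to get the new centroids."""
            groups = defaultdict(list)
            for idx, p in pairs:
                groups[idx].append(p)
            return np.array([np.mean(groups[i], axis=0) if groups[i] else np.zeros(dim)
                             for i in range(k)])

        points = np.random.rand(200, 2)
        centroids = points[np.random.choice(len(points), 3, replace=False)]
        for _ in range(10):
            centroids = reduce_step(map_step(points, centroids), k=3, dim=2)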

    Quadratic and Near-Quadratic Lower Bounds for the CONGEST Model

    Keren Censor-Hillel, Seri Khoury, Ami Paz
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

    We present the first super-linear lower bounds for natural graph problems in
    the CONGEST model, answering a long-standing open question.

    Specifically, we show that any exact computation of a minimum vertex cover or
    a maximum independent set requires \(\Omega(n^2/\log^2 n)\) rounds in the worst
    case in the CONGEST model, as does any algorithm for \(\chi\)-coloring a
    graph, where \(\chi\) is the chromatic number of the graph. We further show that
    such strong lower bounds are not limited to NP-hard problems, by showing two
    simple graph problems in P which require a quadratic and near-quadratic number
    of rounds.

    Finally, we address the problem of computing an exact solution to weighted
    all-pairs-shortest-paths (APSP), which arguably may be considered as a
    candidate for having a super-linear lower bound. We show a simple \(\Omega(n)\)
    lower bound for this problem, which implies a separation between the weighted
    and unweighted cases, since the latter is known to have a complexity of
    \(\Theta(n/\log n)\). We also formally prove that the standard Alice-Bob
    framework is incapable of providing a super-linear lower bound for exact
    weighted APSP, whose complexity remains an intriguing open question.

    GraphH: High Performance Big Graph Analytics in Small Clusters

    Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong, Xiaokui Xiao
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    It is common for real-world applications to analyze big graphs using
    distributed graph processing systems. Popular in-memory systems require an
    enormous amount of resources to handle big graphs. While several out-of-core
    systems have been proposed recently for processing big graphs using secondary
    storage, the high disk I/O overhead could significantly reduce performance. In
    this paper, we propose GraphH to enable high-performance big graph analytics
    in small clusters. Specifically, we design a two-stage graph partition scheme
    to evenly divide the input graph into partitions, and propose a GAB
    (Gather-Apply-Broadcast) computation model to make each worker process a
    partition in memory at a time. We use an edge cache mechanism to reduce the
    disk I/O overhead, and design a hybrid strategy to improve the communication
    performance. GraphH can efficiently process big graphs in small clusters or
    even a single commodity server. Extensive evaluations have shown that GraphH
    could be up to 7.8x faster compared to popular in-memory systems, such as
    Pregel+ and PowerGraph when processing generic graphs, and more than 100x
    faster than recently proposed out-of-core systems, such as GraphD and Chaos
    when processing big graphs.

    Tight Analysis for the 3-Majority Consensus Dynamics

    Mohsen Ghaffari, Johannes Lengler
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Discrete Mathematics (cs.DM)

    We present a tight analysis for the well-studied randomized 3-majority
    dynamics of stabilizing consensus, hence answering the main open question of
    Becchetti et al. [SODA’16].

    Consider a distributed system of n nodes, each initially holding an opinion
    in \(\{1, 2, \ldots, k\}\). The system should converge to a setting where all
    (non-corrupted) nodes hold the same opinion. This consensus opinion should be
    \emph{valid}, meaning that it should be among the initially supported
    opinions, and the (fast) convergence should happen even in the presence of a
    malicious adversary who can corrupt a bounded number of nodes per round and
    in particular modify their opinions. A well-studied distributed algorithm for
    this problem is the 3-majority dynamics, which works as follows: per round,
    each node gathers three opinions — say, its own and those of two other nodes
    sampled at random — and then it sets its opinion equal to the majority of
    this set; ties are broken arbitrarily, e.g., towards the node’s own opinion.

    Becchetti et al. [SODA’16] showed that the 3-majority dynamics converges to
    consensus in \(O((k^2\sqrt{\log n} + k\log n)(k+\log n))\) rounds, even in the
    presence of a limited adversary. We prove that, even with a stronger
    adversary, the convergence happens within \(O(k\log n)\) rounds. This bound is
    known to be optimal.
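
    The 3-majority rule itself is a few lines of code; the simulation below runs
    the dynamics without an adversary to show how a random initial opinion vector
    collapses to consensus (the adversary and the corruption budget of the analysis
    are omitted for brevity).

        import random
        from collections import Counter

        def three_majority_round(opinions):
            """Each node looks at its own opinion plus two random samples and keeps the majority."""
            n = len(opinions)
            new = []
            for i in range(n):
                sample = [opinions[i], opinions[random.randrange(n)], opinions[random.randrange(n)]]
                top_value, top_count = Counter(sample).most_common(1)[0]
                # ties (all three distinct) are broken towards the node's own opinion
                new.append(top_value if top_count > 1 else opinions[i])
            return new

        opinions = [random.randrange(5) for _ in range(1000)]   # n = 1000 nodes, k = 5 opinions
        rounds = 0
        while len(set(opinions)) > 1:
            opinions = three_majority_round(opinions)
            rounds += 1
        print(rounds, "rounds to consensus")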

    Algorithm-Directed Crash Consistence in Non-Volatile Memory for HPC

    Shuo Yang, Kai Wu, Yifan Qiao, Dong Li, Jidong Zhai
    Comments: 12 pages
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Fault tolerance is one of the major design goals for HPC. The emergence of
    non-volatile memories (NVM) provides a solution to build fault tolerant HPC.
    Data in NVM-based main memory are not lost when the system crashes because of
    the non-volatility nature of NVM. However, because of volatile caches, data
    must be logged and explicitly flushed from caches into NVM to ensure
    consistence and correctness before crashes, which can cause large runtime
    overhead.

    In this paper, we introduce an algorithm-based method to establish crash
    consistence in NVM for HPC applications. We slightly extend application data
    structures or sparsely flush cache blocks, which introduces negligible runtime
    overhead. Such extension or cache flushing allows us to use algorithm
    knowledge to reason about data consistence or to correct inconsistent data when the
    application crashes. We demonstrate the effectiveness of our method for three
    algorithms, including an iterative solver, dense matrix multiplication, and
    Monte-Carlo simulation. Based on comprehensive performance evaluation on a
    variety of test environments, we demonstrate that our approach has very small
    runtime overhead (at most 8.2% and less than 3% in most cases), much smaller
    than that of traditional checkpoint, while having the same or less
    recomputation cost.

    Distributed Statistical Machine Learning in Adversarial Settings: Byzantine Gradient Descent

    Yudong Chen, Lili Su, Jiaming Xu
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); Learning (cs.LG); Machine Learning (stat.ML)

    We consider the problem of distributed statistical machine learning in
    adversarial settings, where some unknown and time-varying subset of working
    machines may be compromised and behave arbitrarily to prevent an accurate model
    from being learned. This setting captures the potential adversarial attacks
    faced by Federated Learning — a modern machine learning paradigm that is
    proposed by Google researchers and has been intensively studied for ensuring
    user privacy. Formally, we focus on a distributed system consisting of a
    parameter server and \(m\) working machines. Each working machine keeps \(N/m\)
    data samples, where \(N\) is the total number of samples. The goal is to
    collectively learn the underlying true model parameter of dimension \(d\).

    In classical batch gradient descent methods, the gradients reported to the
    server by the working machines are aggregated via simple averaging, which is
    vulnerable to a single Byzantine failure. In this paper, we propose a Byzantine
    gradient descent method based on the geometric median of means of the
    gradients. We show that our method can tolerate \(q \le (m-1)/2\) Byzantine
    failures, and the parameter estimate converges in \(O(\log N)\) rounds with an
    estimation error of \(\sqrt{d(2q+1)/N}\), hence approaching the optimal error
    rate \(\sqrt{d/N}\) in the centralized and failure-free setting. The total
    computational complexity of our algorithm is \(O((Nd/m) \log N)\) at each
    working machine and \(O(md + kd \log^3 N)\) at the central server, and the
    total communication cost is \(O(md \log N)\). We further provide an
    application of our general results to the linear regression problem.

    A key challenge in the above problem is that Byzantine failures create
    arbitrary and unspecified dependency among the iterations and the aggregated
    gradients. We prove that the aggregated gradient converges uniformly to the
    true gradient function.
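
    A minimal sketch of the aggregation step: worker gradients are grouped, group
    means are taken, and the geometric median of those means (computed here with a
    few Weiszfeld iterations) replaces plain averaging; the group count and
    iteration count are illustrative choices.

        import numpy as np

        def geometric_median(points, iters=50, eps=1e-8):
            """Weiszfeld iterations for the geometric median of a set of vectors."""
            y = points.mean(axis=0)
            for _ in range(iters):
                d = np.linalg.norm(points - y, axis=1)
                w = 1.0 / np.maximum(d, eps)
                y = (w[:, None] * points).sum(axis=0) / w.sum()
            return y

        def robust_aggregate(worker_grads, num_groups=5):
            """Geometric median of means of the gradients reported by the workers."""
            groups = np.array_split(worker_grads, num_groups)
            means = np.stack([g.mean(axis=0) for g in groups])
            return geometric_median(means)

        grads = np.random.randn(20, 10)       # 20 workers, 10-dimensional gradients
        grads[:3] += 100.0                    # a few Byzantine workers report wild values
        agg = robust_aggregate(grads)         # stays close to the honest gradients' mean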


    Learning

    Hierarchical Temporal Representation in Linear Reservoir Computing

    Claudio Gallicchio, Alessio Micheli, Luca Pedrelli
    Comments: This is a pre-print of the paper submitted to the 27th Italian Workshop on Neural Networks, WIRN 2017
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Recently, studies on deep Reservoir Computing (RC) highlighted the role of
    layering in deep recurrent neural networks (RNNs). In this paper, the use of
    linear recurrent units allows us to bring more evidence on the intrinsic
    hierarchical temporal representation in deep RNNs through frequency analysis
    applied to the state signals. The potential of our approach is assessed on
    the class of Multiple Superimposed Oscillator tasks. Furthermore, our
    investigation provides useful insights to open a discussion on the main aspects
    that characterize the deep learning framework in the temporal domain.

    Learning Edge Representations via Low-Rank Asymmetric Projections

    Sami Abu-El-Haija, Bryan Perozzi, Rami Al-Rfou
    Subjects: Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

    We propose a method for learning continuous-space vector representation of
    graphs, which preserves directed edge information. Previous work on learning
    structure-preserving graph embeddings learns one embedding vector per node. In
    addition to learning node embeddings, we also model a directed edge as a
    learnable function of node embeddings, which enables us to learn more concise
    representations that better preserve the graph structure. We perform both
    intrinsic and extrinsic evaluations of our method, presenting results on a
    variety of graphs from social networks, protein interactions, and e-commerce.
    Our results show that the joint representations learned through our method
    significantly improve the state of the art on link prediction tasks, with
    error reductions of up to 69% and 36% on directed and undirected graphs,
    respectively, while using representations with 16 times fewer dimensions per
    node.
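
    The following numpy sketch illustrates the general idea of scoring a directed
    edge as a learnable low-rank asymmetric function of the two node embeddings;
    the particular form score(u, v) = u^T A B^T v and the logistic link are
    assumptions chosen for illustration rather than the paper's exact objective.

        import numpy as np

        rng = np.random.default_rng(0)
        d, r = 32, 4                              # embedding size and projection rank (r << d)

        node_emb = rng.normal(size=(1000, d))     # one embedding per node
        A = rng.normal(size=(d, r))               # low-rank, asymmetric edge function:
        B = rng.normal(size=(d, r))               # M = A @ B.T, so score(u, v) != score(v, u)

        def edge_score(u, v):
            """Probability-like score for a directed edge u -> v."""
            s = node_emb[u] @ A @ B.T @ node_emb[v]
            return 1.0 / (1.0 + np.exp(-s))       # logistic link

        p_uv = edge_score(3, 7)
        p_vu = edge_score(7, 3)                   # generally different: direction is preserved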

    Learning Convex Regularizers for Optimal Bayesian Denoising

    Ha Q. Nguyen, Emrah Bostan, Michael Unser
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We propose a data-driven algorithm for the maximum a posteriori (MAP)
    estimation of stochastic processes from noisy observations. The primary
    statistical properties of the sought signal are specified by the penalty
    function (i.e., the negative logarithm of the prior probability density function).
    Our alternating direction method of multipliers (ADMM)-based approach
    translates the estimation task into successive applications of the proximal
    mapping of the penalty function. Capitalizing on this direct link, we define
    the proximal operator as a parametric spline curve and optimize the spline
    coefficients by minimizing the average reconstruction error for a given
    training set. The key aspects of our learning method are that the associated
    penalty function is constrained to be convex and the convergence of the ADMM
    iterations is proven. As a result of these theoretical guarantees, adaptation
    of the proposed framework to different levels of measurement noise is extremely
    simple and does not require any retraining. We apply our method to estimation
    of both sparse and non-sparse models of Lévy processes for which the
    minimum mean square error (MMSE) estimators are available. We carry out a
    single training session and perform comparisons at various signal-to-noise
    ratio (SNR) values. Simulations illustrate that the performance of our
    algorithm is practically identical to the one of the MMSE estimator
    irrespective of the noise power.
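
    To make the ADMM structure explicit, the sketch below runs a denoising loop in
    which the proximal mapping of the penalty is an interchangeable function; the
    soft-threshold used here is only a stand-in for the learned spline prox of the
    paper, and applying the penalty directly to the signal (no regularization
    operator) is an assumed simplification.

        import numpy as np

        def soft_threshold(x, t):
            """Placeholder prox; the paper instead learns this mapping as a parametric spline."""
            return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

        def admm_denoise(y, prox, rho=1.0, iters=100):
            """min_x 0.5*||y - x||^2 + phi(x), solved by ADMM with prox_phi supplied."""
            x = y.copy()
            z = np.zeros_like(y)
            u = np.zeros_like(y)
            for _ in range(iters):
                x = (y + rho * (z - u)) / (1.0 + rho)   # quadratic data-fit step (closed form)
                z = prox(x + u, 1.0 / rho)              # proximal step for the penalty
                u = u + x - z                           # dual update
            return x

        y = np.sin(np.linspace(0, 6, 200)) + 0.3 * np.random.randn(200)
        x_hat = admm_denoise(y, soft_threshold)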

    The power of deeper networks for expressing natural functions

    David Rolnick (MIT), Max Tegmark (MIT)
    Comments: 9 pages, 2 figs
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    It is well-known that neural networks are universal approximators, but that
    deeper networks tend to be much more efficient than shallow ones. We shed light
    on this by proving that the total number of neurons \(m\) required to
    approximate natural classes of multivariate polynomials of \(n\) variables
    grows only linearly with \(n\) for deep neural networks, but grows
    exponentially when merely a single hidden layer is allowed. We also provide
    evidence that when the number of hidden layers is increased from \(1\) to
    \(k\), the neuron requirement grows exponentially not with \(n\) but with
    \(n^{1/k}\), suggesting that the minimum number of layers required for
    computational tractability grows only logarithmically with \(n\).

    Sparse Coding by Spiking Neural Networks: Convergence Theory and Computational Results

    Ping Tak Peter Tang, Tsung-Han Lin, Mike Davies
    Comments: 13 pages, 3 figures
    Subjects: Learning (cs.LG); Numerical Analysis (cs.NA); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)

    In a spiking neural network (SNN), individual neurons operate autonomously
    and only communicate with other neurons sparingly and asynchronously via spike
    signals. These characteristics render a massively parallel hardware
    implementation of SNN a potentially powerful computer, albeit a non von Neumann
    one. But can one guarantee that an SNN computer solves some important problems
    reliably? In this paper, we formulate a mathematical model of one SNN that can
    be configured for a sparse coding problem for feature extraction. With a
    moderate but well-defined assumption, we prove that the SNN indeed solves
    sparse coding. To the best of our knowledge, this is the first rigorous result
    of this kind.

    Learning Probabilistic Programs Using Backpropagation

    Avi Pfeffer
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Probabilistic modeling enables combining domain knowledge with learning from
    data, thereby supporting learning from fewer training instances than purely
    data-driven methods. However, learning probabilistic models is difficult and
    has not achieved the level of performance of methods such as deep neural
    networks on many tasks. In this paper, we attempt to address this issue by
    presenting a method for learning the parameters of a probabilistic program
    using backpropagation. Our approach opens the possibility to building deep
    probabilistic programming models that are trained in a similar way to neural
    networks.

    Real-Time Adaptive Image Compression

    Oren Rippel, Lubomir Bourdev
    Comments: Published at ICML 2017
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We present a machine learning-based approach to lossy image compression which
    outperforms all existing codecs, while running in real-time.

    Our algorithm typically produces files 2.5 times smaller than JPEG and JPEG
    2000, 2 times smaller than WebP, and 1.7 times smaller than BPG on datasets of
    generic images across all quality levels. At the same time, our codec is
    designed to be lightweight and deployable: for example, it can encode or decode
    the Kodak dataset in around 10ms per image on GPU.

    Our architecture is an autoencoder featuring pyramidal analysis, an adaptive
    coding module, and regularization of the expected codelength. We also
    supplement our approach with adversarial training specialized towards use in a
    compression setting: this enables us to produce visually pleasing
    reconstructions for very low bitrates.

    Demystifying Relational Latent Representations

    Sebastijan Dumančić, Hendrik Blockeel
    Comments: 12 pages, 8 figures
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Latent features learned by deep learning approaches have proven to be a
    powerful tool for machine learning. They serve as a data abstraction that makes
    learning easier by capturing regularities in data explicitly. Their benefits
    motivated their adaptation to the relational learning context. In our previous
    work, we introduced an approach that learns relational latent features by means
    of clustering instances and their relations. The major drawback of latent
    representations is that they are often black-box and difficult to interpret.
    This work addresses these issues and shows that (1) latent features created by
    clustering are interpretable and capture interesting properties of data; (2)
    they identify local regions of instances that match well with the label, which
    partially explains their benefit; and (3) although the number of latent
    features generated by this approach is large, often many of them are highly
    redundant and can be removed without hurting performance much.

    Know-Evolve: Deep Reasoning in Temporal Knowledge Graphs

    Rakshit Trivedi, Mehrdad Farajtabar, Yichen Wang, Hanjun Dai, Hongyuan Zha, Le Song
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

    Knowledge Graphs are important tools to model multi-relational data that
    serves as an information pool for various applications. Traditionally, these
    graphs are considered to be static in nature. However, recent availability of
    large scale event-based interaction data has given rise to dynamically evolving
    knowledge graphs that contain temporal information for each edge. Reasoning
    over time in such graphs is not yet well understood.

    In this paper, we present a novel deep evolutionary knowledge network
    architecture to learn entity embeddings that can dynamically and non-linearly
    evolve over time. We further propose a multivariate point process framework to
    model the occurrence of a fact (edge) in continuous time. To facilitate
    temporal reasoning, the learned embeddings are used to compute a relationship
    score that further parametrizes the intensity function of the point process.
    We demonstrate improved performance over various existing relational learning
    models on two large scale real-world datasets. Further, our method
    effectively predicts the occurrence or recurrence time of a fact, which is
    novel compared to prior reasoning approaches in the multi-relational setting.

    A Long Short-Term Memory Recurrent Neural Network Framework for Network Traffic Matrix Prediction

    Abdelhadi Azzouni, Guy Pujolle
    Comments: Submitted for peer review
    Subjects: Networking and Internet Architecture (cs.NI); Learning (cs.LG)

    Network Traffic Matrix (TM) prediction is defined as the problem of
    estimating future network traffic from the previous and achieved network
    traffic data. It is widely used in network planning, resource management and
    network security. Long Short-Term Memory (LSTM) is a specific recurrent neural
    network (RNN) architecture that is well-suited to learn from experience to
    classify, process and predict time series with time lags of unknown size. LSTMs
    have been shown to model temporal sequences and their long-range dependencies
    more accurately than conventional RNNs. In this paper, we propose a LSTM RNN
    framework for predicting short and long term Traffic Matrix (TM) in large
    networks. By validating our framework on real-world data from the GÉANT
    network, we show that our LSTM models converge quickly and give
    state-of-the-art TM prediction performance for relatively small-sized models.

    Optimal Warping Paths are unique for almost every pair of Time Series

    Brijnesh J. Jain, David Schultz
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    An optimal warping path between two time series is generally not unique. The
    size and form of the set of pairs of time series with non-unique optimal
    warping path is unknown. This article shows that optimal warping paths are
    unique for almost every pair of time series in a measure-theoretic sense. All
    pairs of time series with non-unique optimal warping path form a negligible set
    and are geometrically the union of zero sets of quadratic forms. The result is
    useful for analyzing and understanding adaptive learning methods in dynamic
    time warping spaces.

    Learning Image Relations with Contrast Association Networks

    Yao Lu, Zhirong Yang, Juho Kannala, Samuel Kaski
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Inferring the relations between two images is an important class of tasks in
    computer vision. Examples of such tasks include computing optical flow and
    stereo disparity. We treat the relation inference tasks as a machine learning
    problem and tackle it with neural networks. A key to the problem is learning a
    representation of relations. We propose a new neural network module, contrast
    association unit (CAU), which explicitly models the relations between two sets
    of input variables. Due to the non-negativity of the weights in CAU, we adopt a
    multiplicative update algorithm for learning these weights. Experiments show
    that neural networks with CAUs are more effective in learning five fundamental
    image transformations than conventional neural networks.

    To tune or not to tune the number of trees in random forest?

    Philipp Probst, Anne-Laure Boulesteix
    Comments: 20 pages, 4 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    The number of trees T in the random forest (RF) algorithm for supervised
    learning has to be set by the user. It is controversial whether T should simply
    be set to the largest computationally manageable value or whether a smaller T
    may in some cases be better. While the principle underlying bagging is that
    “more trees are better”, in practice the classification error rate sometimes
    reaches a minimum before increasing again as the number of trees grows. The
    goal of this paper is four-fold: (i) providing theoretical results showing that
    the expected error rate may be a non-monotonous function of the number of trees
    and explaining under which circumstances this happens; (ii) providing
    theoretical results showing that such non-monotonous patterns cannot be
    observed for other performance measures such as the Brier score and the
    logarithmic loss (for classification) and the mean squared error (for
    regression); (iii) illustrating the extent of the problem through an
    application to a large number (n = 306) of datasets from the public database
    OpenML; (iv) finally arguing in favor of setting T to a computationally
    feasible large number, depending on the convergence properties of the desired
    performance measure.
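
    A quick way to inspect the convergence behaviour discussed above is to grow a
    forest incrementally with scikit-learn's warm_start option and record the
    out-of-bag error after each batch of trees; this is a generic diagnostic
    sketch, not the authors' benchmark code.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier

        X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

        # warm_start lets us add trees to the same forest instead of refitting from scratch
        rf = RandomForestClassifier(n_estimators=10, warm_start=True,
                                    oob_score=True, random_state=0)
        oob_error = {}
        for n_trees in range(10, 301, 10):
            rf.set_params(n_estimators=n_trees)
            rf.fit(X, y)
            oob_error[n_trees] = 1.0 - rf.oob_score_   # error rate as a function of T

        # inspect whether the curve is monotone or dips before rising again
        print(sorted(oob_error.items())[:5])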

    Social Media-based Substance Use Prediction

    Tao Ding, Warren K. Bickel, Shimei Pan
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Social and Information Networks (cs.SI)

    In this paper, we demonstrate how the state-of-the-art machine learning and
    text mining techniques can be used to build effective social media-based
    substance use detection systems. Since a substance use ground truth is
    difficult to obtain on a large scale, to maximize system performance, we
    explore different feature learning methods to take advantage of a large amount
    of unsupervised social media data. We also demonstrate the benefit of using
    multi-view unsupervised feature learning to combine heterogeneous user
    information such as Facebook “likes” and “status updates” to enhance system
    performance. Based on our evaluation, our best models achieved 86% AUC for
    predicting tobacco use, 81% for alcohol use and 84% for drug use, all of which
    significantly outperformed existing methods. Our investigation has also
    uncovered interesting relations between a user’s social media behavior (e.g.,
    word usage) and substance use.

    PatternNet and PatternLRP — Improving the interpretability of neural networks

    Pieter-Jan Kindermans, Kristof T. Schütt, Maximilian Alber, Klaus-Robert Müller, Sven Dähne
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Deep learning has significantly advanced the state of the art in machine
    learning. However, neural networks are often considered black boxes. There is
    significant effort to develop techniques that explain a classifier’s decisions.
    Although some of these approaches have resulted in compelling visualisations,
    there is a lack of theory of what is actually explained. Here we present an
    analysis of these methods and formulate a quality criterion for explanation
    methods. On this ground, we propose an improved method that may serve as an
    extension for existing back-projection and decomposition techniques.

    Metaheuristic Design of Feedforward Neural Networks: A Review of Two Decades of Research

    Varun Kumar Ojha, Ajith Abraham, Václav Snášel
    Journal-ref: Engineering Applications of Artificial Intelligence Volume 60,
    April 2017, Pages 97 to 116
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Over the past two decades, the feedforward neural network (FNN) optimization
    has been a key interest among the researchers and practitioners of multiple
    disciplines. The FNN optimization is often viewed from the various
    perspectives: the optimization of weights, network architecture, activation
    nodes, learning parameters, learning environment, etc. Researchers adopted such
    different viewpoints mainly to improve the FNN’s generalization ability. The
    gradient-descent algorithm such as backpropagation has been widely applied to
    optimize the FNNs. Its success is evident from the FNN’s application to
    numerous real-world problems. However, due to the limitations of the
    gradient-based optimization methods, the metaheuristic algorithms including the
    evolutionary algorithms, swarm intelligence, etc., are still being widely
    explored by the researchers aiming to obtain generalized FNN for a given
    problem. This article attempts to summarize a broad spectrum of FNN
    optimization methodologies including conventional and metaheuristic approaches.
    This article also tries to connect various research directions emerged out of
    the FNN optimization practices, such as evolving neural network (NN),
    cooperative coevolution NN, complex-valued NN, deep learning, extreme learning
    machine, quantum NN, etc. Additionally, it provides interesting research
    challenges for future research to cope-up with the present information
    processing era.

    Learning Hard Alignments with Variational Inference

    Dieterich Lawson, George Tucker, Chung-Cheng Chiu, Colin Raffel, Kevin Swersky, Navdeep Jaitly
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    There has recently been significant interest in hard attention models for
    tasks such as object recognition, visual captioning and speech recognition.
    Hard attention can offer benefits over soft attention such as decreased
    computational cost, but training hard attention models can be difficult because
    of the discrete latent variables they introduce. Previous work has used
    REINFORCE and Q-learning to approach these issues, but those methods can
    provide high-variance gradient estimates and be slow to train. In this paper,
    we tackle the problem of learning hard attention for a 1-d temporal task using
    variational inference methods, specifically the recently introduced VIMCO and
    NVIL. Furthermore, we propose novel baselines that adapt VIMCO to this setting.
    We demonstrate our method on a phoneme recognition task in clean and noisy
    environments and show that our method outperforms REINFORCE with the difference
    being greater for a more complicated task.

    Data clustering with edge domination in complex networks

    Paulo Roberto Urio, Zhao Liang
    Comments: 13 pages, 6 figures
    Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Physics and Society (physics.soc-ph)

    This paper presents a model for a dynamical system where particles dominate
    edges in a complex network. The proposed dynamical system is then extended to
    an application on the problem of community detection and data clustering. In
    the case of the data clustering problem, 6 different techniques were simulated
    on 10 different datasets in order to compare with the proposed technique. The
    results show that the proposed algorithm performs well when the number of
    clusters is known to the algorithm in advance.

    Distributed Statistical Machine Learning in Adversarial Settings: Byzantine Gradient Descent

    Yudong Chen, Lili Su, Jiaming Xu
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); Learning (cs.LG); Machine Learning (stat.ML)

    We consider the problem of distributed statistical machine learning in
    adversarial settings, where some unknown and time-varying subset of working
    machines may be compromised and behave arbitrarily to prevent an accurate model
    from being learned. This setting captures the potential adversarial attacks
    faced by Federated Learning — a modern machine learning paradigm that is
    proposed by Google researchers and has been intensively studied for ensuring
    user privacy. Formally, we focus on a distributed system consisting of a
    parameter server and \(m\) working machines. Each working machine keeps \(N/m\)
    data samples, where \(N\) is the total number of samples. The goal is to
    collectively learn the underlying true model parameter of dimension \(d\).

    In classical batch gradient descent methods, the gradients reported to the
    server by the working machines are aggregated via simple averaging, which is
    vulnerable to a single Byzantine failure. In this paper, we propose a Byzantine
    gradient descent method based on the geometric median of means of the
    gradients. We show that our method can tolerate \(q \le (m-1)/2\) Byzantine
    failures, and the parameter estimate converges in \(O(\log N)\) rounds with an
    estimation error of \(\sqrt{d(2q+1)/N}\), hence approaching the optimal error
    rate \(\sqrt{d/N}\) in the centralized and failure-free setting. The total
    computational complexity of our algorithm is \(O((Nd/m) \log N)\) at each
    working machine and \(O(md + kd \log^3 N)\) at the central server, and the
    total communication cost is \(O(md \log N)\). We further provide an
    application of our general results to the linear regression problem.

    A key challenge in the above problem is that Byzantine failures create
    arbitrary and unspecified dependency among the iterations and the aggregated
    gradients. We prove that the aggregated gradient converges uniformly to the
    true gradient function.

    Repeated Inverse Reinforcement Learning for AI Safety

    Kareem Amin, Nan Jiang, Satinder Singh
    Comments: The first two authors contributed equally to this work
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    How detailed should we make the goals we prescribe to AI agents acting on our
    behalf in complex environments? Detailed and low-level specification of goals
    can be tedious and expensive to create, and abstract and high-level goals could
    lead to negative surprises as the agent may find behaviors that we would not
    want it to do, i.e., lead to unsafe AI. One approach to addressing this dilemma
    is for the agent to infer human goals by observing human behavior. This is the
    Inverse Reinforcement Learning (IRL) problem. However, IRL is generally
    ill-posed for there are typically many reward functions for which the observed
    behavior is optimal. While the use of heuristics to select from among the set
    of feasible reward functions has led to successful applications of IRL to
    learning from demonstration, such heuristics do not address AI safety. In this
    paper we introduce a novel repeated IRL problem that captures an aspect of AI
    safety as follows. The agent has to act on behalf of a human in a sequence of
    tasks and wishes to minimize the number of tasks in which it surprises the human.
    Each time the human is surprised the agent is provided a demonstration of the
    desired behavior by the human. We formalize this problem, including how the
    sequence of tasks is chosen, in a few different ways and provide some
    foundational results.

    Probabilistically Safe Policy Transfer

    David Held, Zoe McCarthy, Michael Zhang, Fred Shentu, Pieter Abbeel
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Although learning-based methods have great potential for robotics, one
    concern is that a robot that updates its parameters might cause large amounts
    of damage before it learns the optimal policy. We formalize the idea of safe
    learning in a probabilistic sense by defining an optimization problem: we
    desire to maximize the expected return while keeping the expected damage below
    a given safety limit. We study this optimization for the case of a robot
    manipulator with safety-based torque limits. We would like to ensure that the
    damage constraint is maintained at every step of the optimization and not just
    at convergence. To achieve this aim, we introduce a novel method which predicts
    how modifying the torque limit, as well as how updating the policy parameters,
    might affect the robot’s safety. We show through a number of experiments that
    our approach allows the robot to improve its performance while ensuring that
    the expected damage constraint is not violated during the learning process.


    Information Theory

    New self-dual codes of length 72

    Alexandre Zhdanov
    Subjects: Information Theory (cs.IT)

    In this paper we obtain at least 61 new singly even (Type I) binary
    [72,36,12] self-dual codes as quasi-cyclic codes with m=2 (tailbiting
    convolutional codes), and at least 13 new doubly even (Type II) binary
    [72,36,12] self-dual codes by replacing the first row in each circulant in a
    double circulant code by the “all ones” and “all zeros” vectors, respectively.

    Efficient Bit-Channel Reliability Computation for Multi-Mode Polar Code Encoders and Decoders

    Carlo Condo, Seyyed Ali Hashemi, Warren J. Gross
    Subjects: Information Theory (cs.IT)

    Polar codes are a family of capacity-achieving error-correcting codes, and
    they have been selected as part of the next generation wireless communication
    standard. Each polar code bit-channel is assigned a reliability value, used to
    determine which bits transmit information and which carry parity. Relative
    reliabilities need to be known by both encoders and decoders: in case of
    multi-mode systems, where multiple code lengths and code rates are supported,
    the storage of relative reliabilities can lead to high implementation
    complexity. In this work, we observe patterns among code reliabilities. We
    propose an approximate computation technique to easily represent the
    reliabilities of multiple codes, through a limited set of variables and update
    rules. The proposed method allows tuning the trade-off between reliability
    accuracy and
    implementation complexity. An approximate computation architecture for encoders
    and decoders is designed and implemented, showing 50.7% less area occupation
    than storage-based solutions, with less than 0.05 dB error correction
    performance degradation.
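
    For readers unfamiliar with how bit-channel reliabilities are obtained in the
    first place, the sketch below uses the classical Bhattacharyya-parameter
    recursion over a binary erasure channel to rank bit-channels and pick an
    information set; this standard approximation is shown only as background and is
    not the approximate multi-mode technique proposed in the paper.

        import numpy as np

        def bhattacharyya_reliabilities(n, design_erasure=0.5):
            """Bhattacharyya parameters of the 2^n polarized bit-channels on a BEC."""
            z = np.array([design_erasure])
            for _ in range(n):
                nxt = np.empty(2 * len(z))
                nxt[0::2] = 2 * z - z**2    # "minus" (degraded) sub-channels
                nxt[1::2] = z**2            # "plus" (upgraded) sub-channels
                z = nxt
            return z                        # smaller z = more reliable bit-channel

        def information_set(n, k):
            """Indices of the k most reliable bit-channels for a length-2^n polar code."""
            z = bhattacharyya_reliabilities(n)
            return np.sort(np.argsort(z)[:k])

        print(information_set(n=4, k=8))    # rate-1/2 code of length 16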

    Transmitter Beam Selection in Millimeter-wave MIMO with In-Band Position-Aiding

    Gabriel E. Garcia, Gonzalo Seco-Granados, Eleftherios Karipidis, Henk Wymeersch
    Comments: 10 pages, 5 figures, journal. Submitted to IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    Emerging wireless communication systems will be characterized by a tight
    coupling between communication and positioning. This is particularly apparent
    in millimeter-wave (mm-wave) communications, where devices use a large number
    of antennas and the propagation is well described by geometric channel models.
    For mm-wave communications, initial access, consisting in the beam selection
    and alignment of two devices, is challenging and time-consuming in the absence
    of location information. Conversely, accurate positioning relies on
    high-quality communication links with proper beam alignment. This paper studies
    this interaction and proposes a new position-aided beam selection protocol,
    which considers the problem of joint communication and positioning in scenarios
    with direct line-of-sight and scattering. Simulation results show significant
    reductions in latency with respect to a standard protocol.

    Super-resolution channel estimation for mmWave massive MIMO with hybrid precoding

    Chen Hu, Linglong Dai, Zhen Gao, Jun Fang
    Comments: 5 pages, 2 figures
    Subjects: Information Theory (cs.IT)

    Channel estimation is challenging for millimeter-wave (mmWave) massive MIMO
    with hybrid precoding, since the number of radio frequency (RF) chains is much
    smaller than that of antennas. Conventional compressive sensing based channel
    estimation schemes suffer from severe resolution loss due to the channel angle
    quantization. To improve the channel estimation accuracy, we propose an
    iterative reweight (IR)-based super-resolution channel estimation scheme in
    this paper. By optimizing an objective function through the gradient descent
    method, the proposed scheme can iteratively move the estimated angles of
    arrival/departure (AoAs/AoDs) towards the optimal solutions, and finally
    realize the super-resolution channel estimation. In the optimization, a weight
    parameter is used to control the tradeoff between the sparsity and the data
    fitting error. In addition, a singular value decomposition (SVD)-based
    preconditioning is developed to reduce the computational complexity of the
    proposed scheme. Simulation results verify the better performance of the
    proposed scheme than conventional solutions.

    Construction of Parallel RIO Codes using Coset Coding with Hamming Codes

    Akira Yamawaki, Hiroshi Kamabe, Shan Lu
    Comments: 16 pages
    Subjects: Information Theory (cs.IT)

    Random input/output (RIO) code is a coding scheme that enables reading of one
    logical page using a single read threshold in multilevel flash memory. The
    construction of RIO codes is equivalent to the construction of WOM codes.
    Parallel RIO (P-RIO) code is an RIO code that encodes all pages in parallel. In
    this paper, we utilize coset coding with Hamming codes in order to construct
    P-RIO codes. Coset coding is a technique that constructs WOM codes using linear
    binary codes. We leverage the information on the data of all pages to encode
    each page. Our constructed codes store more pages than RIO codes constructed
    via coset coding.

    Edge-Caching Wireless Networks: Energy-Efficient Design and Optimization

    Thang X. Vu, Symeon Chatzinotas, Bjorn Ottersten
    Subjects: Information Theory (cs.IT)

    Edge-caching has received much attention as an efficient technique to reduce
    delivery latency and network congestion during peak-traffic times by bringing
    data closer to end users. Existing works usually design caching algorithms
    separately from physical layer design. In this paper, we analyse edge-caching
    wireless networks by taking into account the caching capability when designing
    the signal transmission. Particularly, we investigate multi-layer caching where
    both base station (BS) and users are capable of storing content data in their
    local cache and analyse the performance of edge-caching wireless networks under
    two notable uncoded and coded caching strategies. Firstly, we propose a coded
    caching strategy that is applied to arbitrary values of cache size. The
    required backhaul and access rates are derived as a function of the BS and user
    cache size. Secondly, closed-form expressions for the system energy efficiency
    (EE) corresponding to the two caching methods are derived. Based on the derived
    formulas, the system EE is maximized via precoding vectors design and
    optimization while satisfying a predefined user request rate. Thirdly, two
    optimization problems are proposed to minimize the content delivery time for
    the two caching strategies. Finally, numerical results are presented to verify
    the effectiveness of the two caching methods.

    Strong Coordination of Signals and Actions over Noisy Channels

    Giulia Cervia (ETIS), Laura Luzzi (ETIS), Maël Le Treust (ETIS), Matthieu Bloch (ECE GeorgiaTech)
    Journal-ref: IEEE International Symposium on Information Theory, Jun 2017,
    Aachen, Germany
    Subjects: Information Theory (cs.IT)

    We develop a random binning scheme for strong coordination in a network of
    two nodes separated by a noisy channel, in which the input and output signals
    have to be coordinated with the source and its reconstruction. In the case of
    non-causal encoding and decoding, we propose a joint source-channel coding
    scheme and develop inner and outer bounds for the strong coordination region.
    While the set of achievable target distributions is the same as for empirical
    coordination, we characterize the rate of common randomness required for strong
    coordination.

    Short Codes with Mismatched Channel State Information: A Case Study

    Gianluigi Liva, Giuseppe Durisi, Marco Chiani, Shakeel Salamat Ullah, Soung Chang Liew
    Comments: 5 pages, 5 figures, to appear in Proceedings of the IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC 2017)
    Subjects: Information Theory (cs.IT)

    The rising interest in applications requiring the transmission of small
    amounts of data has recently led to the development of accurate performance
    bounds and of powerful channel codes for the transmission of short-data packets
    over the AWGN channel. Much less is known about the interaction between error
    control coding and channel estimation at short blocks when transmitting over
    channels with states (e.g., fading channels, phase-noise channels, etc.) for
    the setup where no a priori channel state information (CSI) is available at the
    transmitter and the receiver. In this paper, we use the mismatched-decoding
    framework to characterize the fundamental tradeoff occurring in the
    transmission of short data packets over an AWGN channel with unknown gain that
    stays constant over the packet. Our analysis for this simplified setup aims at
    showing the potential of mismatched decoding as a tool to design and analyze
    transmission strategies for short blocks. We focus on a pragmatic approach
    where the transmission frame contains a codeword as well as a preamble that is
    used to estimate the channel (the codeword symbols are not used for channel
    estimation). Achievability and converse bounds on the block error probability
    achievable by this approach are provided and compared with simulation results
    for schemes employing short low-density parity-check codes. Our bounds turn out
    to accurately predict the optimal trade-off between the preamble length and the
    redundancy introduced by the channel code.
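
    The preamble-versus-redundancy tension can be illustrated with a toy Monte Carlo
    simulation (unrelated to the paper's achievability and converse bounds): a packet
    of fixed length is split between pilots and repetition-coded BPSK data, the
    receiver estimates an unknown channel phase from the pilots, and detection uses a
    mismatched scaled nearest-neighbor metric. All parameter values below are
    assumptions made for the sketch.

    # Toy pilot + repetition-coded BPSK scheme over a channel with unknown phase.
    import numpy as np

    def per(n=128, k=16, n_p=8, snr_db=0.0, trials=5000, rng=None):
        """Packet error rate for a fixed total blocklength n split into pilots and data."""
        rng = rng or np.random.default_rng(0)
        r = (n - n_p) // k                      # repetitions per information bit
        sigma = 10 ** (-snr_db / 20)
        errors = 0
        for _ in range(trials):
            g = np.exp(1j * rng.uniform(0, 2 * np.pi))    # unknown unit-modulus gain
            bits = rng.integers(0, 2, k)
            x = np.repeat(1.0 - 2.0 * bits, r)            # repetition-coded BPSK
            noise = sigma * (rng.standard_normal(k * r) + 1j * rng.standard_normal(k * r)) / np.sqrt(2)
            y_p = g * np.ones(n_p) + sigma * (rng.standard_normal(n_p) + 1j * rng.standard_normal(n_p)) / np.sqrt(2)
            y_d = g * x + noise
            g_hat = y_p.mean()                            # channel estimate from the preamble
            z = np.real(np.conj(g_hat) * y_d)             # mismatched (scaled NN) metric
            bits_hat = (z.reshape(k, r).sum(axis=1) < 0).astype(int)
            errors += np.any(bits_hat != bits)
        return errors / trials

    if __name__ == "__main__":
        # Longer preamble: better channel estimate, but a weaker code for the data part.
        for n_p in [2, 4, 8, 16, 32, 48]:
            print(f"preamble {n_p:>2}: PER {per(n_p=n_p):.3f}")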

    New quaternary sequences of even length with optimal auto-correlation

    W Su, Y Yang, Z Zhou, X Tang
    Comments: This paper was submitted to Science China: Information Sciences on Oct 16, 2016, and accepted for publication on Apr 27, 2017
    Subjects: Information Theory (cs.IT)

    Sequences with low auto-correlation have been applied in code-division
    multiple access communication systems, radar and cryptography. Using the
    inverse Gray mapping, a quaternary sequence of even length (N) can be obtained
    from two binary sequences of the same length, which are called component
    sequences. In this paper, using the interleaving method, we present several
    classes of component sequences constructed from twin-prime sequence pairs or
    GMW sequence pairs given by Tang and Ding in 2010, and from two, three or four
    binary sequences defined by cyclotomic classes of order (4). Hence we obtain
    new classes of quaternary sequences, which differ from known ones, since known
    component sequences are constructed from a pair of binary sequences with
    optimal auto-correlation or from Sidel’nikov sequences.
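
    The following sketch illustrates only the inverse Gray mapping step and a
    periodic auto-correlation computation for the resulting quaternary sequence; the
    component sequences used here are arbitrary toy inputs, not the twin-prime, GMW
    or cyclotomic constructions of the paper.

    # Inverse Gray mapping of two binary component sequences and the periodic
    # auto-correlation of the resulting quaternary sequence (illustrative only).
    import numpy as np

    # Gray map 0->00, 1->01, 2->11, 3->10; its inverse takes a bit pair to a symbol.
    INV_GRAY = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}

    def inverse_gray(a, b):
        """Combine two binary sequences of equal length into a quaternary sequence."""
        return np.array([INV_GRAY[(x, y)] for x, y in zip(a, b)])

    def periodic_autocorrelation(s):
        """R(tau) = sum_t i^(s[t+tau] - s[t]) for a quaternary sequence s."""
        n = len(s)
        omega = 1j
        return np.array([np.sum(omega ** ((np.roll(s, -tau) - s) % 4)) for tau in range(n)])

    if __name__ == "__main__":
        a = [0, 0, 1, 1, 0, 1, 0, 1]
        b = [0, 1, 0, 1, 1, 1, 0, 0]
        s = inverse_gray(a, b)
        print("quaternary sequence:", s)
        print("autocorrelation magnitudes:", np.round(np.abs(periodic_autocorrelation(s)), 3))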

    Maximum Signal Minus Interference to Noise Ratio Multiuser Receive Beamforming

    Majid Bavand, Steven D. Blostein
    Comments: Part of this work was presented at 27th Biennial Symposium on Communications, ON, June 2014, and was the runner-up for the best student paper award
    Subjects: Information Theory (cs.IT)

    Motivated by the massive deployment of low data rate Internet of Things (IoT)
    and e-health devices that require highly reliable communications, this paper
    proposes receive beamforming techniques for the uplink of a single-input
    multiple-output (SIMO) multiple access channel (MAC), based on a per-user
    probability of error metric and one-dimensional signalling. Although
    beamforming by directly minimizing probability of error (MPE) has potential
    advantages over classical beamforming methods such as zero-forcing and minimum
    mean square error beamforming, MPE beamforming results in a non-convex and
    highly nonlinear optimization problem. In this paper, by adding a set of
    modulation-based constraints, the MPE beamforming problem is transformed into a
    convex programming problem. Then, a simplified version of the MPE beamforming
    is proposed which reduces the exponential number of constraints in the MPE
    beamforming problem. The simplified problem is also shown to be a convex
    programming problem. The complexity of the simplified problem is further
    reduced by minimizing a convex function which serves as an upper bound on the
    error probability. Minimization of this upper bound results in the introduction
    of a new metric, which is termed signal minus interference to noise ratio
    (SMINR). It is shown that maximizing SMINR leads to a closed-form expression
    for beamforming vectors as well as improved performance over existing
    beamforming methods.
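
    As a purely illustrative aside (the paper's SMINR definition and its closed-form
    maximizer are not reproduced here), the sketch below scores standard
    zero-forcing and MMSE receive filters on a random SIMO multiple access channel
    with a "signal amplitude minus interference amplitudes over noise" style metric;
    the metric and all parameter values are assumptions made for the sketch.

    # Scoring classical receive beamformers with an SMINR-like heuristic metric.
    import numpy as np

    def sminr_like(w, H, k, sigma):
        """(desired signal amplitude - sum of interference amplitudes) / noise std."""
        desired = np.abs(w.conj() @ H[:, k])
        interference = sum(np.abs(w.conj() @ H[:, j]) for j in range(H.shape[1]) if j != k)
        return (desired - interference) / (sigma * np.linalg.norm(w))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        n_rx, n_users, sigma = 8, 3, 0.3
        H = (rng.standard_normal((n_rx, n_users)) + 1j * rng.standard_normal((n_rx, n_users))) / np.sqrt(2)
        W_zf = H @ np.linalg.inv(H.conj().T @ H)                             # zero-forcing receive filters
        W_mmse = np.linalg.inv(H @ H.conj().T + sigma**2 * np.eye(n_rx)) @ H  # per-user MMSE filters
        for k in range(n_users):
            print(f"user {k}: ZF score {sminr_like(W_zf[:, k], H, k, sigma):.2f}, "
                  f"MMSE score {sminr_like(W_mmse[:, k], H, k, sigma):.2f}")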

    Partitioned List Decoding of Polar Codes: Analysis and Improvement of Finite Length Performance

    Seyyed Ali Hashemi, Marco Mondelli, S. Hamed Hassani, Rudiger Urbanke, Warren J. Gross
    Comments: Submitted to 2017 IEEE Global Communications Conference (GLOBECOM)
    Subjects: Information Theory (cs.IT)

    Polar codes represent one of the major recent breakthroughs in coding theory
    and, because of their attractive features, they have been selected for the
    upcoming 5G standard. As such, a lot of attention has been devoted to the
    development of decoding algorithms with good error performance and efficient
    hardware implementation. One of the leading candidates in this regard is
    represented by successive-cancellation list (SCL) decoding. However, its
    hardware implementation requires a large amount of memory. Recently, a
    partitioned SCL (PSCL) decoder has been proposed to significantly reduce the
    memory consumption. In this paper, we examine the paradigm of PSCL decoding
    from both theoretical and practical standpoints: (i) by changing the
    construction of the code, we are able to improve the performance at no
    additional computational, latency or memory cost, (ii) we present an optimal
    scheme to allocate cyclic redundancy checks (CRCs), and (iii) we provide an
    upper bound on the list size that allows MAP performance.

    Attack Detection in Sensor Network Target Localization Systems with Quantized Data

    Jiangfan Zhang, Xiaodong Wang, Rick S. Blum, Lance M. Kaplan
    Subjects: Information Theory (cs.IT)

    We consider a sensor network focused on target localization, where sensors
    measure the signal strength emitted from the target. Each measurement is
    quantized to one bit and sent to the fusion center. A general attack is
    considered at some sensors that attempts to cause the fusion center to produce
    an inaccurate estimation of the target location with a large mean-square-error.
    The attack is a combination of man-in-the-middle, hacking, and spoofing attacks
    that can effectively change both signals going into and coming out of the
    sensor nodes in a realistic manner. We show that the essential effect of
    attacks is to alter the estimated distance between the target and each attacked
    sensor to a different extent, giving rise to a geometric inconsistency among
    the attacked and unattacked sensors. Hence, with the help of two secure
    sensors, a class of detectors are proposed to detect the attacked sensors by
    scrutinizing the existence of the geometric inconsistency. We show that the
    false alarm and miss probabilities of the proposed detectors decrease
    exponentially as the number of measurement samples increases, which implies
    that for a sufficiently large number of samples, the proposed detectors can
    identify the attacked and unattacked sensors with any required accuracy.
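
    A toy version of the geometric-inconsistency idea (much simpler than the paper's
    detectors, which operate on one-bit quantized data) is sketched below: distance
    estimates from the two secure sensors pin the target down to at most two
    candidate positions, and a sensor whose reported distance fits neither candidate
    is flagged. The scenario, noise-free distances and threshold are assumptions.

    # Flagging an attacked sensor via geometric inconsistency (illustrative only).
    import numpy as np

    def circle_intersections(c1, r1, c2, r2):
        """Intersection points of two circles (centers c1, c2, radii r1, r2)."""
        v = c2 - c1
        d = np.linalg.norm(v)
        if d > r1 + r2 or d < abs(r1 - r2) or d == 0:
            return []
        a = (r1**2 - r2**2 + d**2) / (2 * d)
        h = np.sqrt(max(r1**2 - a**2, 0.0))
        mid = c1 + a * v / d
        perp = np.array([-v[1], v[0]]) / d
        return [mid + h * perp, mid - h * perp]

    def flag_attacked(secure_pos, secure_dist, sensor_pos, sensor_dist, tol=0.5):
        """True if the sensor's reported distance fits neither candidate location."""
        candidates = circle_intersections(secure_pos[0], secure_dist[0],
                                          secure_pos[1], secure_dist[1])
        return all(abs(np.linalg.norm(sensor_pos - p) - sensor_dist) > tol
                   for p in candidates)

    if __name__ == "__main__":
        target = np.array([3.0, 4.0])
        secure_pos = [np.array([0.0, 0.0]), np.array([10.0, 0.0])]
        secure_dist = [np.linalg.norm(target - p) for p in secure_pos]
        honest = np.array([5.0, 8.0])
        attacked = np.array([8.0, 6.0])
        print(flag_attacked(secure_pos, secure_dist, honest, np.linalg.norm(target - honest)))          # False
        print(flag_attacked(secure_pos, secure_dist, attacked, np.linalg.norm(target - attacked) + 3))  # True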

    On Channel State Feedback Model and Overhead in Theoretical and Practical Views

    Chunbo Luo
    Comments: 13 pages, 14 figures, IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    Channel state feedback plays an important role in improving link performance
    in current wireless communication systems, and even more so in the next
    generation. The feedback information, however, consumes the uplink
    bandwidth and thus generates overhead. In this paper, we investigate the impact
    of channel state feedback and propose an improved scheme to reduce the overhead
    in practical communication systems. Compared with existing schemes, we
    introduce a more accurate channel model to describe practical wireless channels
    and obtain theoretical lower bounds on the overhead of periodic and aperiodic
    feedback schemes. These theoretical results provide guidance for optimising the
    design of feedback systems, such as the number of bits used for quantizing
    channel states. We thus propose a practical feedback scheme
    that achieves low overhead and improved performance over currently widely used
    schemes such as zero holding. Simulation experiments confirm its advantages and
    suggest its potential for wide application in the next generation of wireless
    systems.
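
    The kind of quantization-bits tradeoff referred to above can be illustrated with
    a toy computation (not the paper's channel model or bounds): a fading power gain
    is quantized with B feedback bits, the transmitter adapts its rate to the lower
    edge of the reported quantization cell, and the average downlink rate is compared
    against the B-bit uplink overhead. The transmit SNR and the quantile-based
    quantizer are assumptions made for the sketch.

    # Average rate with B-bit quantized CSI versus perfect CSI (illustrative only).
    import numpy as np

    def avg_rates(bits, snr=10.0, samples=200_000, rng=None):
        rng = rng or np.random.default_rng(0)
        g = rng.exponential(1.0, samples)                 # Rayleigh-fading power gain
        perfect = np.mean(np.log2(1 + snr * g))
        levels = 2 ** bits
        # Quantile-based cell edges; using the lower edge keeps the chosen rate
        # supported by the true channel realization.
        edges = np.quantile(g, np.linspace(0, 1, levels + 1))[:-1]
        idx = np.searchsorted(edges, g, side="right") - 1
        quantized = np.mean(np.log2(1 + snr * edges[idx]))
        return perfect, quantized

    if __name__ == "__main__":
        for b in [1, 2, 3, 4, 6, 8]:
            perfect, quantized = avg_rates(b)
            print(f"B={b}: perfect {perfect:.3f} b/s/Hz, quantized {quantized:.3f} b/s/Hz, "
                  f"feedback overhead {b} bits per report")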

    Invariance: a Theoretical Approach for Coding Sets of Words Modulo Literal (Anti)Morphisms

    Jean Néraud (LITIS), Carla Selmi (LITIS)
    Comments: Submitted to WORDS 2017
    Subjects: Discrete Mathematics (cs.DM); Information Theory (cs.IT); Combinatorics (math.CO)

    Let (A) be a finite or countable alphabet and let (θ) be a literal
    (anti)morphism onto (A^*) (by definition, such a correspondence is determined
    by a permutation of the alphabet). This paper deals with sets which are
    invariant under (θ) ((θ)-invariant for short). We establish an extension of
    the famous defect theorem. Moreover, we prove that for the so-called thin
    (θ)-invariant codes, maximality and completeness are two equivalent notions.
    We prove that a similar property holds for some special families of
    (θ)-invariant codes such as prefix (bifix) codes, codes with a finite
    deciphering delay, uniformly synchronous codes and circular codes. For a
    special class of involutive antimorphisms, we prove that any regular
    (θ)-invariant code may be embedded into a complete one.
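
    A minimal sketch of the invariance notion itself (the alphabet, permutation and
    example sets are toy choices, not taken from the paper): a literal
    (anti)morphism is applied letter by letter via a permutation of the alphabet,
    with word reversal in the antimorphism case, and a set is θ-invariant when its
    image equals itself.

    # Checking theta-invariance of a finite set of words (illustrative only).

    def literal_map(word, perm, antimorphism=False):
        """Apply the letter permutation; reverse the word for an antimorphism."""
        image = "".join(perm[c] for c in word)
        return image[::-1] if antimorphism else image

    def is_invariant(words, perm, antimorphism=False):
        """X is theta-invariant when theta(X) == X as sets."""
        words = set(words)
        return {literal_map(w, perm, antimorphism) for w in words} == words

    if __name__ == "__main__":
        swap = {"a": "b", "b": "a"}                    # the permutation defining theta
        X = {"ab", "ba", "aabb", "bbaa"}               # invariant under the antimorphism
        Y = {"ab", "aab"}                              # not invariant
        print(is_invariant(X, swap, antimorphism=True))   # True
        print(is_invariant(Y, swap, antimorphism=True))   # False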



