IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    arXiv Paper Daily: Tue, 24 Jan 2017

    我爱机器学习(52ml.net)发表于 2017-01-24 00:00:00
    love 0

    Neural and Evolutionary Computing

    Regularizing Neural Networks by Penalizing Confident Output Distributions

    Gabriel Pereyra, George Tucker, Jan Chorowski, Łukasz Kaiser, Geoffrey Hinton
    Comments: Submitted to ICLR 2017
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    We systematically explore regularizing neural networks by penalizing low
    entropy output distributions. We show that penalizing low entropy output
    distributions, which has been shown to improve exploration in reinforcement
    learning, acts as a strong regularizer in supervised learning. Furthermore, we
    connect a maximum entropy based confidence penalty to label smoothing through
    the direction of the KL divergence. We exhaustively evaluate the proposed
    confidence penalty and label smoothing on 6 common benchmarks: image
    classification (MNIST and Cifar-10), language modeling (Penn Treebank), machine
    translation (WMT’14 English-to-German), and speech recognition (TIMIT and WSJ).
    We find that both label smoothing and the confidence penalty improve
    state-of-the-art models across benchmarks without modifying existing
    hyperparameters, suggesting the wide applicability of these regularizers.

    Integration of Preferences in Decomposition Multi-Objective Optimization

    Ke Li, Kalyanmoy Deb, Xin Yao
    Subjects: Neural and Evolutionary Computing (cs.NE)

    Most existing studies on evolutionary multi-objective optimization focus on
    approximating the whole Pareto-optimal front. Nevertheless, rather than the
    whole front, which demands for too many points (especially in a
    high-dimensional space), the decision maker might only interest in a partial
    region, called the region of interest. In this case, solutions outside this
    region can be noisy to the decision making procedure. Even worse, there is no
    guarantee that we can find the preferred solutions when tackling problems with
    complicated properties or a large number of objectives. In this paper, we
    develop a systematic way to incorporate the decision maker’s preference
    information into the decomposition-based evolutionary multi-objective
    optimization methods. Generally speaking, our basic idea is a non-uniform
    mapping scheme by which the originally uniformly distributed reference points
    on a canonical simplex can be mapped to the new positions close to the
    aspiration level vector specified by the decision maker. By these means, we are
    able to steer the search process towards the region of interest either directly
    or in an interactive manner and also handle a large number of objectives. In
    the meanwhile, the boundary solutions can be approximated given the decision
    maker’s requirements. Furthermore, the extent of the region of the interest is
    intuitively understandable and controllable in a closed form. Extensive
    experiments, both proof-of-principle and on a variety of problems with 3 to 10
    objectives, fully demonstrate the effectiveness of our proposed method for
    approximating the preferred solutions in the region of interest.

    Gate-Variants of Gated Recurrent Unit (GRU) Neural Networks

    Rahul Dey, Fathi M. Salem
    Comments: 5 pages, 8 Figures, 4 Tables
    Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    The paper evaluates three variants of the Gated Recurrent Unit (GRU) in
    recurrent neural networks (RNN) by reducing parameters in the update and reset
    gates. We evaluate the three variant GRU models on MNIST and IMDB datasets and
    show that these GRU-RNN variant models perform as well as the original GRU RNN
    model while reducing the computational expense.

    Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

    Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, Jeff Dean
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    The capacity of a neural network to absorb information is limited by its
    number of parameters. Conditional computation, where parts of the network are
    active on a per-example basis, has been proposed in theory as a way of
    dramatically increasing model capacity without a proportional increase in
    computation. In practice, however, there are significant algorithmic and
    performance challenges. In this work, we address these challenges and finally
    realize the promise of conditional computation, achieving greater than 1000x
    improvements in model capacity with only minor losses in computational
    efficiency on modern GPU clusters. We introduce a Sparsely-Gated
    Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward
    sub-networks. A trainable gating network determines a sparse combination of
    these experts to use for each example. We apply the MoE to the tasks of
    language modeling and machine translation, where model capacity is critical for
    absorbing the vast quantities of knowledge available in the training corpora.
    We present model architectures in which a MoE with up to 137 billion parameters
    is applied convolutionally between stacked LSTM layers. On large language
    modeling and machine translation benchmarks, these models achieve significantly
    better results than state-of-the-art at lower computational cost.

    Optimization on Product Submanifolds of Convolution Kernels

    Mete Ozay, Takayuki Okatani
    Comments: 3 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We address a problem of optimization on product of embedded submanifolds of
    convolution kernels (PEMs) in convolutional neural networks (CNNs). First, we
    explore metric and curvature properties of PEMs in terms of component
    manifolds. Next, we propose a SGD method, called C-SGD, by generalizing the SGD
    methods employed on kernel submanifolds for optimization on product of
    different collections of kernel submanifolds. Then, we analyze convergence
    properties of the C-SGD considering sectional curvature properties of PEMs. In
    the theoretical results, we expound the constraints that can be used to compute
    adaptive step sizes of the C-SGD in order to assure the asymptotic convergence.


    Computer Vision and Pattern Recognition

    Dirty Pixels: Optimizing Image Classification Architectures for Raw Sensor Data

    Steven Diamond, Vincent Sitzmann, Stephen Boyd, Gordon Wetzstein, Felix Heide
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Real-world sensors suffer from noise, blur, and other imperfections that make
    high-level computer vision tasks like scene segmentation, tracking, and scene
    understanding difficult. Making high-level computer vision networks robust is
    imperative for real-world applications like autonomous driving, robotics, and
    surveillance. We propose a novel end-to-end differentiable architecture for
    joint denoising, deblurring, and classification that makes classification
    robust to realistic noise and blur. The proposed architecture dramatically
    improves the accuracy of a classification network in low light and other
    challenging conditions, outperforming alternative approaches such as retraining
    the network on noisy and blurry images and preprocessing raw sensor inputs with
    conventional denoising and deblurring algorithms. The architecture learns
    denoising and deblurring pipelines optimized for classification whose outputs
    differ markedly from those of state-of-the-art denoising and deblurring
    methods, preserving fine detail at the cost of more noise and artifacts. Our
    results suggest that the best low-level image processing for computer vision is
    different from existing algorithms designed to produce visually pleasing
    images. The principles used to design the proposed architecture easily extend
    to other high-level computer vision tasks and image formation models, providing
    a general framework for integrating low-level and high-level image processing.

    Using Convolutional Neural Networks to Count Palm Trees in Satellite Images

    Eu Koon Cheang, Teik Koon Cheang, Yong Haur Tay
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper we propose a supervised learning system for counting and
    localizing palm trees in high-resolution, panchromatic satellite imagery
    (40cm/pixel to 1.5m/pixel). A convolutional neural network classifier trained
    on a set of palm and no-palm images is applied across a satellite image scene
    in a sliding window fashion. The resultant confidence map is smoothed with a
    uniform filter. A non-maximal suppression is applied onto the smoothed
    confidence map to obtain peaks. Trained with a small dataset of 500 images of
    size 40×40 cropped from satellite images, the system manages to achieve a tree
    count accuracy of over 99%.

    Segmentation-free Vehicle License Plate Recognition using ConvNet-RNN

    Teik Koon Cheang, Yong Shean Chong, Yong Haur Tay
    Comments: 5 pages, 3 figures, International Workshop on Advanced Image Technology, January, 8-10, 2017. Penang, Malaysia. Proceeding IWAIT2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    While vehicle license plate recognition (VLPR) is usually done with a sliding
    window approach, it can have limited performance on datasets with characters
    that are of variable width. This can be solved by hand-crafting algorithms to
    prescale the characters. While this approach can work fairly well, the
    recognizer is only aware of the pixels within each detector window, and fails
    to account for other contextual information that might be present in other
    parts of the image. A sliding window approach also requires training data in
    the form of presegmented characters, which can be more difficult to obtain. In
    this paper, we propose a unified ConvNet-RNN model to recognize real-world
    captured license plate photographs. By using a Convolutional Neural Network
    (ConvNet) to perform feature extraction and using a Recurrent Neural Network
    (RNN) for sequencing, we address the problem of sliding window approaches being
    unable to access the context of the entire image by feeding the entire image as
    input to the ConvNet. This has the added benefit of being able to perform
    end-to-end training of the entire model on labelled, full license plate images.
    Experimental results comparing the ConvNet-RNN architecture to a sliding
    window-based approach shows that the ConvNet-RNN architecture performs
    significantly better.

    Nonsmooth Analysis and Subgradient Methods for Averaging in Dynamic Time Warping Spaces

    David Schultz, Brijnesh Jain
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Time series averaging in dynamic time warping (DTW) spaces has been
    successfully applied to improve pattern recognition systems. This article
    proposes and analyzes subgradient methods for the problem of finding a sample
    mean in DTW spaces. The class of subgradient methods generalizes existing
    sample mean algorithms such as DTW Barycenter Averaging (DBA). We show that DBA
    is a majorize-minimize algorithm that converges to necessary conditions of
    optimality after finitely many iterations. Empirical results show that for
    increasing sample sizes the proposed stochastic subgradient (SSG) algorithm is
    more stable and finds better solutions in shorter time than the DBA algorithm
    on average. Therefore, SSG is useful in online settings and for non-small
    sample sizes. The theoretical and empirical results open new paths for devising
    sample mean algorithms: nonsmooth optimization methods and modified variants of
    pairwise averaging methods.

    Person Re-Identification via Recurrent Feature Aggregation

    Yichao Yan, Bingbing Ni, Zhichao Song, Chao Ma, Yan Yan, Xiaokang Yang
    Comments: 14 pages, 4 figures, in ECCV 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We address the person re-identification problem by effectively exploiting a
    globally discriminative feature representation from a sequence of tracked human
    regions/patches. This is in contrast to previous person re-id works, which rely
    on either single frame based person to person patch matching, or graph based
    sequence to sequence matching. We show that a progressive/sequential fusion
    framework based on long short term memory (LSTM) network aggregates the
    frame-wise human region representation at each time stamp and yields a sequence
    level human feature representation. Since LSTM nodes can remember and propagate
    previously accumulated good features and forget newly input inferior ones, even
    with simple hand-crafted features, the proposed recurrent feature aggregation
    network (RFA-Net) is effective in generating highly discriminative sequence
    level human representations. Extensive experimental results on two person
    re-identification benchmarks demonstrate that the proposed method performs
    favorably against state-of-the-art person re-identification methods.

    Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities

    Guo-Jun Qi
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents a novel loss-sensitive generative adversarial net
    (LS-GAN). Compared with the classic GAN that uses a dyadic classification of
    real and generated samples to train the discriminator, we learn a loss function
    that can generate samples with the constraint that a real example should have a
    smaller loss than a generated sample. This results in a novel paradigm of
    loss-sensitive GAN (LS-GAN), as well as a conditional derivative that can
    generate samples satisfying specified conditions by properly defining a
    suitable loss function. The theoretical analysis shows that the LS-GAN can
    generate samples following the true data density we wish to estimate. In
    particular, we focus on a large family of Lipschitz densities for the
    underlying data distribution, allowing us to use a class of Lipschitz losses
    and generators to model the LS-GAN. This relaxes the assumption on the classic
    GANs that the model should have infinite modeling capacity to obtain the
    similar theoretical guarantee. This provides a principled way to regularize a
    family of deep generative models with the proposed LS-GAN criterion, preventing
    them from being overfitted to duplicate few training examples. Furthermore, we
    derive a non-parametric solution that characterizes the upper and lower bounds
    of the losses learned by the LS-GAN. We conduct experiments to evaluate the
    proposed LS-GAN on classification and generation tasks, and demonstrate the
    competitive performances as compared with the other state-of-the-art models.

    A New Convolutional Network-in-Network Structure and Its Applications in Skin Detection, Semantic Segmentation, and Artifact Reduction

    Yoonsik Kim, Insung Hwang, Nam Ik Cho
    Comments: 10 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The inception network has been shown to provide good performance on image
    classification problems, but there are not much evidences that it is also
    effective for the image restoration or pixel-wise labeling problems. For image
    restoration problems, the pooling is generally not used because the decimated
    features are not helpful for the reconstruction of an image as the output.
    Moreover, most deep learning architectures for the restoration problems do not
    use dense prediction that need lots of training parameters. From these
    observations, for enjoying the performance of inception-like structure on the
    image based problems we propose a new convolutional network-in-network
    structure. The proposed network can be considered a modification of inception
    structure where pool projection and pooling layer are removed for maintaining
    the entire feature map size, and a larger kernel filter is added instead.
    Proposed network greatly reduces the number of parameters on account of removed
    dense prediction and pooling, which is an advantage, but may also reduce the
    receptive field in each layer. Hence, we add a larger kernel than the original
    inception structure for not increasing the depth of layers. The proposed
    structure is applied to typical image-to-image learning problems, i.e., the
    problems where the size of input and output are same such as skin detection,
    semantic segmentation, and compression artifacts reduction. Extensive
    experiments show that the proposed network brings comparable or better results
    than the state-of-the-art convolutional neural networks for these problems.

    Image Compression with SVD : A New Quality Metric Based On Energy Ratio

    Henri Bruno Razafindradina, Paul Auguste Randriamitantsoa, Nicolas Raft Razafindrakoto
    Comments: 6 pages, International Journal of Computer Science and Network, Volume 5, Issue 6, December 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Digital image compression is a technique that allows to reduce the size of an
    image in order to increase the capacity storage devices and to optimize the use
    of network bandwidth. The quality of compressed images with the techniques
    based on the discrete cosine transform or the wavelet transform is generally
    measured with PSNR or SSIM. Theses metrics are not suitable to images
    compressed with the singular values decomposition. This paper presents a new
    metric based on the energy ratio to measure the quality of the images coded
    with the SVD. A series of tests on 512 * 512 pixels images show that, for a
    rank k = 40 corresponding to a SSIM = 0,94 or PSNR = 35 dB, 99,9% of the energy
    are restored. Three areas of image quality assessments were identified. This
    new metric is also very accurate and could overcome the weaknesses of PSNR and
    SSIM.

    Greedy Compositional Clustering for Unsupervised Learning of Hierarchical Compositional Models

    Adam Kortylewski, Clemens Blumer, Thomas Vetter
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper proposes to integrate a feature pursuit learning process into a
    greedy bottom-up learning scheme. The algorithm combines the benefits of
    bottom-up and top-down approaches for learning hierarchical models: It allows
    to induce the hierarchical structure of objects in an unsupervised manner,
    while avoiding a hard decision on the activation of parts. We follow the
    principle of compositionality by assembling higher-order parts from elements of
    lower layers in the hierarchy. The parts are learned greedily with an EM-type
    process that iterates between image encoding and part re-learning. The process
    stops when a candidate part is not able to find a free niche in the image. The
    algorithm proceeds layer by layer in a bottom-up manner until no further
    compositions are found. A subsequent top-down process composes the learned
    hierarchical shape vocabulary into a holistic object model. Experimental
    evaluation of the approach shows state-of-the-art performance on a domain
    adaptation task. Moreover, we demonstrate the capability of learning complex,
    semantically meaningful hierarchical compositional models without supervision.

    Perception-based energy functions in seam-cutting

    Nan Li, Tianli Liao, Chao Wang
    Comments: 5 pages, 6 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Image stitching is challenging in consumer-level photography, due to
    alignment difficulties in unconstrained shooting environment. Recent studies
    show that seam-cutting approaches can effectively relieve artifacts generated
    by local misalignment. Normally, seam-cutting is described in terms of energy
    minimization, however, few of existing methods consider human perception in
    their energy functions, which sometimes causes that a seam with minimum energy
    is not most invisible in the overlapping region. In this paper, we propose a
    novel perception-based energy function in the seam-cutting framework, which
    considers the nonlinearity and the nonuniformity of human perception in energy
    minimization. Our perception-based approach adopts a sigmoid metric to
    characterize the perception of color discrimination, and a saliency weight to
    simulate that human eyes incline to pay more attention to salient objects. In
    addition, our seam-cutting composition can be easily implemented into other
    stitching pipelines. Experiments show that our method outperforms the
    seam-cutting method of the normal energy function, and a user study
    demonstrates that our composed results are more consistent with human
    perception.

    Optimization on Product Submanifolds of Convolution Kernels

    Mete Ozay, Takayuki Okatani
    Comments: 3 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We address a problem of optimization on product of embedded submanifolds of
    convolution kernels (PEMs) in convolutional neural networks (CNNs). First, we
    explore metric and curvature properties of PEMs in terms of component
    manifolds. Next, we propose a SGD method, called C-SGD, by generalizing the SGD
    methods employed on kernel submanifolds for optimization on product of
    different collections of kernel submanifolds. Then, we analyze convergence
    properties of the C-SGD considering sectional curvature properties of PEMs. In
    the theoretical results, we expound the constraints that can be used to compute
    adaptive step sizes of the C-SGD in order to assure the asymptotic convergence.

    Multimodal Fusion via a Series of Transfers for Noise Removal

    Chang-Hwan Son, Xiao-Ping Zhang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Near-infrared imaging has been considered as a solution to provide high
    quality photographs in dim lighting conditions. This imaging system captures
    two types of multimodal images: one is near-infrared gray image (NGI) and the
    other is the visible color image (VCI). NGI is noise-free but it is grayscale,
    whereas the VCI has colors but it contains noise. Moreover, there exist serious
    edge and brightness discrepancies between NGI and VCI. To deal with this
    problem, a new transfer-based fusion method is proposed for noise removal.
    Different from conventional fusion approaches, the proposed method conducts a
    series of transfers: contrast, detail, and color transfers. First, the proposed
    contrast and detail transfers aim at solving the serious discrepancy problem,
    thereby creating a new noise-free and detail-preserving NGI. Second, the
    proposed color transfer models the unknown colors from the denoised VCI via a
    linear transform, and then transfers natural-looking colors into the newly
    generated NGI. Experimental results show that the proposed transfer-based
    fusion method is highly successful in solving the discrepancy problem, thereby
    describing edges and textures clearly as well as removing noise completely on
    the fused images. Most of all, the proposed method is superior to conventional
    fusion methods and guided filtering, and even the state-of-the-art fusion
    methods based on scale map and layer decomposition.

    DeadNet: Identifying Phototoxicity from Label-free Microscopy Images of Cells using Deep ConvNets

    David Richmond, Anna Payne-Tobin Jost, Talley Lambert, Jennifer Waters, Hunter Elliott
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)

    Exposure to intense illumination light is an unavoidable consequence of
    fluorescence microscopy, and poses a risk to the health of the sample in every
    live-cell fluorescence microscopy experiment. Furthermore, the possible
    side-effects of phototoxicity on the scientific conclusions that are drawn from
    an imaging experiment are often unaccounted for. Previously, controlling for
    phototoxicity in imaging experiments required additional labels and
    experiments, limiting its widespread application. Here we provide a
    proof-of-principle demonstration that the phototoxic effects of an imaging
    experiment can be identified directly from a single phase-contrast image using
    deep convolutional neural networks (ConvNets). This lays the groundwork for an
    automated tool for assessing cell health in a wide range of imaging
    experiments. Interpretability of such a method is crucial for its adoption. We
    take steps towards interpreting the classification mechanism of the trained
    ConvNet by visualizing salient features of images that contribute to accurate
    classification.

    Image De-raining Using a Conditional Generative Adversarial Network

    He Zhang, Vishwanath Sindagi, Vishal M. Patel
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Severe weather conditions such as rain and snow adversely affect the visual
    quality of images captured under such conditions thus rendering them useless
    for further usage and sharing. In addition, such degraded images drastically
    affect performance of vision systems. Hence, it is important to solve the
    problem of single image de-raining/de-snowing. However, this is a difficult
    problem to solve due to its inherent ill-posed nature. Existing approaches
    attempt to introduce prior information to convert it into a well-posed problem.
    In this paper, we investigate a new point of view in addressing the single
    image de-raining problem. Instead of focusing only on deciding what is a good
    prior or a good framework to achieve good quantitative and qualitative
    performance, we also ensure that the de-rained image itself does not degrade
    the performance of a given computer vision algorithm such as detection and
    classification. In other words, the de-rained result should be
    indistinguishable from its corresponding clear image to a given discriminator.
    This criterion can be directly incorporated into the optimization framework by
    using the recently introduced conditional generative adversarial networks
    (GANs). To minimize artifacts introduced by GANs and ensure better visual
    quality, a new refined loss function is introduced. Based on this, we propose a
    novel single image de-raining method called Image De-raining Conditional
    General Adversarial Network (ID-CGAN), which considers quantitative, visual and
    also discriminative performance into the objective function. Experiments
    evaluated on synthetic images and real images show that the proposed method
    outperforms many recent state-of-the-art single image de-raining methods in
    terms of quantitative and visual performance.

    Learning what to look in chest X-rays with a recurrent visual attention model

    Petros-Pavlos Ypsilantis, Giovanni Montana
    Comments: NIPS 2016 Workshop on Machine Learning for Health
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    X-rays are commonly performed imaging tests that use small amounts of
    radiation to produce pictures of the organs, tissues, and bones of the body.
    X-rays of the chest are used to detect abnormalities or diseases of the
    airways, blood vessels, bones, heart, and lungs. In this work we present a
    stochastic attention-based model that is capable of learning what regions
    within a chest X-ray scan should be visually explored in order to conclude that
    the scan contains a specific radiological abnormality. The proposed model is a
    recurrent neural network (RNN) that learns to sequentially sample the entire
    X-ray and focus only on informative areas that are likely to contain the
    relevant information. We report on experiments carried out with more than
    (100,000) X-rays containing enlarged hearts or medical devices. The model has
    been trained using reinforcement learning methods to learn task-specific
    policies.

    Normative Theory of Visual Receptive Fields

    Tony Lindeberg
    Comments: 6 pages, 8 figures
    Subjects: Neurons and Cognition (q-bio.NC); Computer Vision and Pattern Recognition (cs.CV)

    This article gives an overview of a normative computational theory of visual
    receptive fields, by which idealized shapes of early spatial, spatio-chromatic
    and spatio-temporal receptive fields can be derived in an axiomatic way based
    on structural properties of the environment in combination with assumptions
    about the internal structure of a vision system to guarantee consistent
    handling of image representations over multiple spatial and temporal scales.
    Interestingly, this theory leads to predictions about visual receptive field
    shapes with qualitatively very good similarity to biological receptive fields
    measured in the retina, the LGN and the primary visual cortex (V1) of mammals.


    Artificial Intelligence

    Constraint programming for planning test campaigns of communications satellites

    Emmanuel Hébrard (LAAS-ROC), Marie-José Huguet (LAAS-ROC), Daniel Veysseire (LAAS-ROC), Ludivine Sauvan (LAAS-ROC), Bertrand Cabon
    Journal-ref: Constraints, Springer Verlag, 2017, 22, pp.73 – 89
    Subjects: Artificial Intelligence (cs.AI)

    The payload of communications satellites must go through a series of tests to
    assert their ability to survive in space. Each test involves some equipment of
    the payload to be active, which has an impact on the temperature of the
    payload. Sequencing these tests in a way that ensures the thermal stability of
    the payload and minimizes the overall duration of the test campaign is a very
    important objective for satellite manufacturers. The problem can be decomposed
    in two sub-problems corresponding to two objectives: First, the number of
    distinct configurations necessary to run the tests must be minimized. This can
    be modeled as packing the tests into configurations, and we introduce a set of
    implied constraints to improve the lower bound of the model. Second, tests must
    be sequenced so that the number of times an equipment unit has to be switched
    on or off is minimized. We model this aspect using the constraint Switch, where
    a buffer with limited capacity represents the currently active equipment units,
    and we introduce an improvement of the propagation algorithm for this
    constraint. We then introduce a search strategy in which we sequentially solve
    the sub-problems (packing and sequencing). Experiments conducted on real and
    random instances show the respective interest of our contributions.

    Binary Matrix Guessing Problem

    Çağrı Latifoğlu
    Comments: 9 pages, 4 tables
    Subjects: Artificial Intelligence (cs.AI)

    We introduce the Binary Matrix Guessing Problem and provide two algorithms to
    solve this problem. The first algorithm we introduce is Elementwise Probing
    Algorithm (EPA) which is very fast under a score which utilizes Frobenius
    Distance. The second algorithm is Additive Reinforcement Learning Algorithm
    which combines ideas from perceptron algorithm and reinforcement learning
    algorithm. This algorithm is significantly slower compared to first one, but
    less restrictive and generalizes better. We compare computational performance
    of both algorithms and provide numerical results.

    Interactive Learning from Policy-Dependent Human Feedback

    James MacGlashan, Mark K Ho, Robert Loftin, Bei Peng, David Roberts, Matthew E. Taylor, Michael L. Littman
    Comments: 7 pages, 2 figures
    Subjects: Artificial Intelligence (cs.AI)

    For agents and robots to become more useful, they must be able to quickly
    learn from non-technical users. This paper investigates the problem of
    interactively learning behaviors communicated by a human teacher using positive
    and negative feedback. Much previous work on this problem has made the
    assumption that people provide feedback for decisions that is dependent on the
    behavior they are teaching and is independent from the learner’s current
    policy. We present empirical results that show this assumption to be
    false—whether human trainers give a positive or negative feedback for a
    decision is influenced by the learner’s current policy. We argue that
    policy-dependent feedback, in addition to being commonplace, enables useful
    training strategies from which agents should benefit. Based on this insight, we
    introduce Convergent Actor-Critic by Humans (COACH), an algorithm for learning
    from policy-dependent feedback that converges to a local optimum. Finally, we
    demonstrate that COACH can successfully learn multiple behaviors on a physical
    robot, even with noisy image features.

    Identification of Unmodeled Objects from Symbolic Descriptions

    Andrea Baisero, Stefan Otte, Peter Englert, Marc Toussaint
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI)

    Successful human-robot cooperation hinges on each agent’s ability to process
    and exchange information about the shared environment and the task at hand.
    Human communication is primarily based on symbolic abstractions of object
    properties, rather than precise quantitative measures. A comprehensive robotic
    framework thus requires an integrated communication module which is able to
    establish a link and convert between perceptual and abstract information.

    The ability to interpret composite symbolic descriptions enables an
    autonomous agent to a) operate in unstructured and cluttered environments, in
    tasks which involve unmodeled or never seen before objects; and b) exploit the
    aggregation of multiple symbolic properties as an instance of ensemble
    learning, to improve identification performance even when the individual
    predicates encode generic information or are imprecisely grounded.

    We propose a discriminative probabilistic model which interprets symbolic
    descriptions to identify the referent object contextually w.r.t. the structure
    of the environment and other objects. The model is trained using a collected
    dataset of identifications, and its performance is evaluated by quantitative
    measures and a live demo developed on the PR2 robot platform, which integrates
    elements of perception, object extraction, object identification and grasping.

    A Multichannel Convolutional Neural Network For Cross-language Dialog State Tracking

    Hongjie Shi, Takashi Ushio, Mitsuru Endo, Katsuyoshi Yamagami, Noriaki Horii
    Comments: Copyright 2016 IEEE. Published in the 2016 IEEE Workshop on Spoken Language Technology (SLT 2016)
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    The fifth Dialog State Tracking Challenge (DSTC5) introduces a new
    cross-language dialog state tracking scenario, where the participants are asked
    to build their trackers based on the English training corpus, while evaluating
    them with the unlabeled Chinese corpus. Although the computer-generated
    translations for both English and Chinese corpus are provided in the dataset,
    these translations contain errors and careless use of them can easily hurt the
    performance of the built trackers. To address this problem, we propose a
    multichannel Convolutional Neural Networks (CNN) architecture, in which we
    treat English and Chinese language as different input channels of one single
    CNN model. In the evaluation of DSTC5, we found that such multichannel
    architecture can effectively improve the robustness against translation errors.
    Additionally, our method for DSTC5 is purely machine learning based and
    requires no prior knowledge about the target language. We consider this a
    desirable property for building a tracker in the cross-language context, as not
    every developer will be familiar with both languages.

    Label Propagation on K-partite Graphs with Heterophily

    Dingxiong Deng, Fan Bai, Yiqi Tang, Shuigeng Zhou, Cyrus Shahabi, Linhong Zhu
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)

    In this paper, for the first time, we study label propagation in
    heterogeneous graphs under heterophily assumption. Homophily label propagation
    (i.e., two connected nodes share similar labels) in homogeneous graph (with
    same types of vertices and relations) has been extensively studied before.
    Unfortunately, real-life networks are heterogeneous, they contain different
    types of vertices (e.g., users, images, texts) and relations (e.g.,
    friendships, co-tagging) and allow for each node to propagate both the same and
    opposite copy of labels to its neighbors. We propose a (mathcal{K})-partite
    label propagation model to handle the mystifying combination of heterogeneous
    nodes/relations and heterophily propagation. With this model, we develop a
    novel label inference algorithm framework with update rules in near-linear time
    complexity. Since real networks change over time, we devise an incremental
    approach, which supports fast updates for both new data and evidence (e.g.,
    ground truth labels) with guaranteed efficiency. We further provide a utility
    function to automatically determine whether an incremental or a re-modeling
    approach is favored. Extensive experiments on real datasets have verified the
    effectiveness and efficiency of our approach, and its superiority over the
    state-of-the-art label propagation methods.


    Computation and Language

    Learning to Decode for Future Success

    Jiwei Li, Will Monroe, Dan Jurafsky
    Subjects: Computation and Language (cs.CL)

    We introduce a general strategy for improving neural sequence generation by
    incorporating knowledge about the future. Our decoder combines a standard
    sequence decoder with a `soothsayer’ prediction function Q that estimates the
    outcome in the future of generating a word in the present. Our model draws on
    the same intuitions as reinforcement learning, but is both simpler and higher
    performing, avoiding known problems with the use of reinforcement learning in
    tasks with enormous search spaces like sequence generation. We demonstrate our
    model by incorporating Q functions that incrementally predict what the future
    BLEU or ROUGE score of the completed sequence will be, its future length, and
    the backwards probability of the source given the future target sequence.
    Experimental results show that future rediction yields improved performance in
    abstractive summarization and conversational response generation and the
    state-of-the-art in machine translation, while also enabling the decoder to
    generate outputs that have specific properties.

    Adversarial Learning for Neural Dialogue Generation

    Jiwei Li, Will Monroe, Tianlin Shi, Alan Ritter, Dan Jurafsky
    Subjects: Computation and Language (cs.CL)

    In this paper, drawing intuition from the Turing test, we propose using
    adversarial training for open-domain dialogue generation: the system is trained
    to produce sequences that are indistinguishable from human-generated dialogue
    utterances. We cast the task as a reinforcement learning (RL) problem where we
    jointly train two systems, a generative model to produce response sequences,
    and a discriminator—analagous to the human evaluator in the Turing test— to
    distinguish between the human-generated dialogues and the machine-generated
    ones. The outputs from the discriminator are then used as rewards for the
    generative model, pushing the system to generate dialogues that mostly resemble
    human dialogues.

    In addition to adversarial training we describe a model for adversarial {em
    evaluation} that uses success in fooling an adversary as a dialogue evaluation
    metric, while avoiding a number of potential pitfalls. Experimental results on
    several metrics, including adversarial evaluation, demonstrate that the
    adversarially-trained system generates higher-quality responses than previous
    baselines.

    Incorporating Global Visual Features into Attention-Based Neural Machine Translation

    Iacer Calixto, Qun Liu, Nick Campbell
    Comments: 8 pages (11 including references), 5 figures
    Subjects: Computation and Language (cs.CL)

    We introduce multi-modal, attention-based neural machine translation (NMT)
    models which incorporate visual features into different parts of both the
    encoder and the decoder. We utilise global image features extracted using a
    pre-trained convolutional neural network and incorporate them (i) as words in
    the source sentence, (ii) to initialise the encoder hidden state, and (iii) as
    additional data to initialise the decoder hidden state. In our experiments, we
    evaluate how these different strategies to incorporate global image features
    compare and which ones perform best. We also study the impact that adding
    synthetic multi-modal, multilingual data brings and find that the additional
    data have a positive impact on multi-modal models. We report new
    state-of-the-art results and our best models also significantly improve on a
    comparable phrase-based Statistical MT (PBSMT) model trained on the Multi30k
    data set according to all metrics evaluated. To the best of our knowledge, it
    is the first time a purely neural model significantly improves over a PBSMT
    model on all metrics evaluated on this data set.

    A Multichannel Convolutional Neural Network For Cross-language Dialog State Tracking

    Hongjie Shi, Takashi Ushio, Mitsuru Endo, Katsuyoshi Yamagami, Noriaki Horii
    Comments: Copyright 2016 IEEE. Published in the 2016 IEEE Workshop on Spoken Language Technology (SLT 2016)
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    The fifth Dialog State Tracking Challenge (DSTC5) introduces a new
    cross-language dialog state tracking scenario, where the participants are asked
    to build their trackers based on the English training corpus, while evaluating
    them with the unlabeled Chinese corpus. Although the computer-generated
    translations for both English and Chinese corpus are provided in the dataset,
    these translations contain errors and careless use of them can easily hurt the
    performance of the built trackers. To address this problem, we propose a
    multichannel Convolutional Neural Networks (CNN) architecture, in which we
    treat English and Chinese language as different input channels of one single
    CNN model. In the evaluation of DSTC5, we found that such multichannel
    architecture can effectively improve the robustness against translation errors.
    Additionally, our method for DSTC5 is purely machine learning based and
    requires no prior knowledge about the target language. We consider this a
    desirable property for building a tracker in the cross-language context, as not
    every developer will be familiar with both languages.

    Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

    Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, Jeff Dean
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    The capacity of a neural network to absorb information is limited by its
    number of parameters. Conditional computation, where parts of the network are
    active on a per-example basis, has been proposed in theory as a way of
    dramatically increasing model capacity without a proportional increase in
    computation. In practice, however, there are significant algorithmic and
    performance challenges. In this work, we address these challenges and finally
    realize the promise of conditional computation, achieving greater than 1000x
    improvements in model capacity with only minor losses in computational
    efficiency on modern GPU clusters. We introduce a Sparsely-Gated
    Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward
    sub-networks. A trainable gating network determines a sparse combination of
    these experts to use for each example. We apply the MoE to the tasks of
    language modeling and machine translation, where model capacity is critical for
    absorbing the vast quantities of knowledge available in the training corpora.
    We present model architectures in which a MoE with up to 137 billion parameters
    is applied convolutionally between stacked LSTM layers. On large language
    modeling and machine translation benchmarks, these models achieve significantly
    better results than state-of-the-art at lower computational cost.

    dna2vec: Consistent vector representations of variable-length k-mers

    Patrick Ng
    Comments: 10 pages, 3 figures, 2 tables
    Subjects: Quantitative Methods (q-bio.QM); Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    One of the ubiquitous representation of long DNA sequence is dividing it into
    shorter k-mer components. Unfortunately, the straightforward vector encoding of
    k-mer as a one-hot vector is vulnerable to the curse of dimensionality. Worse
    yet, the distance between any pair of one-hot vectors is equidistant. This is
    particularly problematic when applying the latest machine learning algorithms
    to solve problems in biological sequence analysis. In this paper, we propose a
    novel method to train distributed representations of variable-length k-mers.
    Our method is based on the popular word embedding model word2vec, which is
    trained on a shallow two-layer neural network. Our experiments provide evidence
    that the summing of dna2vec vectors is akin to nucleotides concatenation. We
    also demonstrate that there is correlation between Needleman-Wunsch similarity
    score and cosine similarity of dna2vec vectors.


    Distributed, Parallel, and Cluster Computing

    ARM Wrestling with Big Data: A Study of ARM64 and x64 Servers for Data Intensive Workloads

    Jayanth Kalyanasundaram, Yogesh Simmhan
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    ARM processors have dominated the mobile device market in the last decade due
    to their favorable computing to energy ratio. In this age of Cloud data centers
    and Big Data analytics, the focus is increasingly on power efficient
    processing, rather than just high throughput computing. ARM’s first commodity
    server-grade processor is the recent AMD A1100-series processor, based on a
    64-bit ARM Cortex A57 architecture. In this paper, we study the performance and
    energy efficiency of a server based on this ARM64 CPU, relative to a comparable
    server running an AMD Opteron 3300-series x64 CPU, for Big Data workloads.
    Specifically, we study these for Intel’s HiBench suite of web, query and
    machine learning benchmarks on Apache Hadoop v2.7 in a pseudo-distributed
    setup, for data sizes up to (20GB) files, (5M) web pages and (500M) tuples. Our
    results show that the ARM64 server’s runtime performance is comparable to the
    x64 server for integer-based workloads like Sort and Hive queries, and only
    lags behind for floating-point intensive benchmarks like PageRank, when they do
    not exploit data parallelism adequately. We also see that the ARM64 server
    takes 1/3rd the energy, and has an Energy Delay Product (EDP) that is (50-71\%)
    lower than the x64 server. These results hold significant promise for data
    centers hosting ARM64 servers to reduce their operational costs, while offering
    a competitive performance for Big Data workloads.

    Distributed Random-Fixed Projected Algorithm for Constrained Optimization Over Digraphs

    Pei Xie, Keyou You, Shiji Song, Cheng Wu
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)

    This paper is concerned with a constrained optimization problem over a
    directed graph (digraph) of nodes, in which the cost function is a sum of local
    objectives, and each node only knows its local objective and constraints. To
    collaboratively solve the optimization, most of the existing works require the
    interaction graph to be balanced or “doubly-stochastic”, which is quite
    restrictive and not necessary as shown in this paper. We focus on an epigraph
    form of the original optimization to resolve the “unbalanced” problem, and
    design a novel two-step recursive algorithm with a simple structure. Under
    strongly connected digraphs, we prove that each node asymptotically converges
    to some common optimal solution. Finally, simulations are performed to
    illustrate the effectiveness of the proposed algorithms.

    Coded Computation over Heterogeneous Clusters

    Amirhossein Reisizadehmobarakeh, Saurav Prakash, Ramtin Pedarsani, Salman Avestimehr
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    In large-scale distributed computing clusters, such as Amazon EC2, there are
    several types of “system noise” that can result in major degradation of
    performance: system failures, bottlenecks due to limited communication
    bandwidth, latency due to straggler nodes, etc. On the other hand, these
    systems enjoy abundance of redundancy — a vast number of computing nodes and
    large storage capacity. There have been recent results that demonstrate the
    impact of coding for efficient utilization of computation and storage
    redundancy to alleviate the effect of stragglers and communication bottlenecks
    in homogeneous clusters. In this paper, we focus on general heterogeneous
    distributed computing clusters consisting of a variety of computing machines
    with different capabilities. We propose a coding framework for speeding up
    distributed computing in heterogeneous clusters with straggling servers by
    trading redundancy for reducing the latency of computation. In particular, we
    propose Heterogeneous Coded Matrix Multiplication (HCMM) algorithm for
    performing distributed matrix multiplication over heterogeneous clusters that
    is provably asymptotically optimal. Moreover, if the number of worker nodes in
    the cluster is (n), we show that HCMM is (Theta(log n)) times faster than any
    uncoded scheme. We further provide numerical results demonstrating significant
    speedups of up to (49\%) and (34\%) for HCMM in comparison to the “uncoded” and
    “homogeneous coded” schemes, respectively.

    Exploiting the Cloud Control Plane for Fun and Profit

    Josef Spillner
    Comments: 14 pages, 10 figures, unreviewed
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Cloud providers typically charge for their services. There are diverse
    pricing models which often follow a pay-per-use paradigm. The consumers’
    payments are expected to cover all cost which incurs to the provider for
    processing, storage, bandwidth, data centre operation and engineering efforts,
    among others. In contrast, the consumer management interfaces are free of
    charge as they are expected to cause only a minority of the load compared to
    the actual computing services. With new service models and more complex and
    powerful management abilities, it is time to rethink this decision. The paper
    shows how to exploit the control plane of AWS Lambda to implement stateful
    services practically for free and under some circumstances even guaranteed for
    free which if widely deployed would cause a monetary loss for the provider. It
    also elaborates on the consistency model for AWS Lambda.

    Let's HPC: A web-based interactive platform to aid High Performance Computing education

    Akshar Varma (1), Yashwant Keswani (1), Yashodhan Bhatnagar (1), Bhaskar Chaudhury (1) ((1) Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, India)
    Comments: 8 pages, 4 figures. Submitted to EduPar-17. This paper is regarding the Let’s HPC platform which can be found here: this http URL
    Subjects: Computers and Society (cs.CY); Distributed, Parallel, and Cluster Computing (cs.DC)

    Let’s HPC (www.letshpc.org) is an open-access online platform to supplement
    conventional classroom oriented High Performance Computing (HPC) and Parallel &
    Distributed Computing (PDC) education. The web based platform provides online
    plotting and analysis tools which allow users to learn, evaluate, teach and see
    the performance of parallel algorithms from a system’s viewpoint. The user can
    quantitatively compare and understand the importance of numerous deterministic
    as well as non-deterministic factors of both the software and the hardware that
    impact the performance of parallel programs. At the heart of this platform is a
    database archiving the performance and execution environment related data of
    standard parallel algorithms executed on different computing architectures
    using different programming environments, this data is contributed by various
    stakeholders in the HPC community. The plotting and analysis tools of our
    platform can be combined seamlessly with the database to aid self-learning,
    teaching, evaluation and discussion of different HPC related topics.
    Instructors of HPC/PDC related courses can use the platform’s tools to
    illustrate the importance of proper analysis in understanding factors impacting
    performance, to encourage peer learning among students, as well as to allow
    students to prepare a standard lab/project report aiding the instructor in
    uniform evaluation. The platform’s modular design enables easy inclusion of
    performance related data from contributors as well as addition of new features
    in the future.

    Observations on Factors Affecting Performance of MapReduce based Apriori on Hadoop Cluster

    Sudhakar Singh, Rakhi Garg, P. K. Mishra
    Comments: 8 pages, 8 figures, International Conference on Computing, Communication and Automation (ICCCA2016)
    Journal-ref: 2016 International Conference on Computing, Communication and
    Automation (ICCCA), Greater Noida, India, 2016, pp. 87-94
    Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

    Designing fast and scalable algorithm for mining frequent itemsets is always
    being a most eminent and promising problem of data mining. Apriori is one of
    the most broadly used and popular algorithm of frequent itemset mining.
    Designing efficient algorithms on MapReduce framework to process and analyze
    big datasets is contemporary research nowadays. In this paper, we have focused
    on the performance of MapReduce based Apriori on homogeneous as well as on
    heterogeneous Hadoop cluster. We have investigated a number of factors that
    significantly affects the execution time of MapReduce based Apriori running on
    homogeneous and heterogeneous Hadoop Cluster. Factors are specific to both
    algorithmic and non-algorithmic improvements. Considered factors specific to
    algorithmic improvements are filtered transactions and data structures.
    Experimental results show that how an appropriate data structure and filtered
    transactions technique drastically reduce the execution time. The
    non-algorithmic factors include speculative execution, nodes with poor
    performance, data locality & distribution of data blocks, and parallelism
    control with input split size. We have applied strategies against these factors
    and fine tuned the relevant parameters in our particular application.
    Experimental results show that if cluster specific parameters are taken care of
    then there is a significant reduction in execution time. Also we have discussed
    the issues regarding MapReduce implementation of Apriori which may
    significantly influence the performance.


    Learning

    On the Parametric Study of Lubricating Oil Production using an Artificial Neural Network (ANN) Approach

    Masood Tehrani, Mary Ahmadi
    Comments: 7 pages, 2 figures
    Subjects: Learning (cs.LG)

    In this study, an Artificial Neural Network (ANN) approach is utilized to
    perform a parametric study on the process of extraction of lubricants from
    heavy petroleum cuts. To train the model, we used field data collected from an
    industrial plant. Operational conditions of feed and solvent flow rate,
    Temperature of streams and mixing rate were considered as the input to the
    model, whereas the flow rate of the main product was considered as the output
    of the ANN model. A feed-forward Multi-Layer Perceptron Neural Network was
    successfully applied to capture the relationship between inputs and output
    parameters.

    Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

    Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, Jeff Dean
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    The capacity of a neural network to absorb information is limited by its
    number of parameters. Conditional computation, where parts of the network are
    active on a per-example basis, has been proposed in theory as a way of
    dramatically increasing model capacity without a proportional increase in
    computation. In practice, however, there are significant algorithmic and
    performance challenges. In this work, we address these challenges and finally
    realize the promise of conditional computation, achieving greater than 1000x
    improvements in model capacity with only minor losses in computational
    efficiency on modern GPU clusters. We introduce a Sparsely-Gated
    Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward
    sub-networks. A trainable gating network determines a sparse combination of
    these experts to use for each example. We apply the MoE to the tasks of
    language modeling and machine translation, where model capacity is critical for
    absorbing the vast quantities of knowledge available in the training corpora.
    We present model architectures in which a MoE with up to 137 billion parameters
    is applied convolutionally between stacked LSTM layers. On large language
    modeling and machine translation benchmarks, these models achieve significantly
    better results than state-of-the-art at lower computational cost.

    Predicting Demographics of High-Resolution Geographies with Geotagged Tweets

    Omar Montasser, Daniel Kifer
    Comments: 6 pages, AAAI-17 preprint
    Subjects: Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

    In this paper, we consider the problem of predicting demographics of
    geographic units given geotagged Tweets that are composed within these units.
    Traditional survey methods that offer demographics estimates are usually
    limited in terms of geographic resolution, geographic boundaries, and time
    intervals. Thus, it would be highly useful to develop computational methods
    that can complement traditional survey methods by offering demographics
    estimates at finer geographic resolutions, with flexible geographic boundaries
    (i.e. not confined to administrative boundaries), and at different time
    intervals. While prior work has focused on predicting demographics and health
    statistics at relatively coarse geographic resolutions such as the county-level
    or state-level, we introduce an approach to predict demographics at finer
    geographic resolutions such as the blockgroup-level. For the task of predicting
    gender and race/ethnicity counts at the blockgroup-level, an approach adapted
    from prior work to our problem achieves an average correlation of 0.389
    (gender) and 0.569 (race) on a held-out test dataset. Our approach outperforms
    this prior approach with an average correlation of 0.671 (gender) and 0.692
    (race).

    Effective and Extensible Feature Extraction Method Using Genetic Algorithm-Based Frequency-Domain Feature Search for Epileptic EEG Multi-classification

    Tingxi Wen, Zhongnan Zhang
    Comments: 17 pages, 9 figures
    Subjects: Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)

    In this paper, a genetic algorithm-based frequency-domain feature search
    (GAFDS) method is proposed for the electroencephalogram (EEG) analysis of
    epilepsy. In this method, frequency-domain features are first searched and then
    combined with nonlinear features. Subsequently, these features are selected and
    optimized to classify EEG signals. The extracted features are analyzed
    experimentally. The features extracted by GAFDS show remarkable independence,
    and they are superior to the nonlinear features in terms of the ratio of
    inter-class distance and intra-class distance. Moreover, the proposed feature
    search method can additionally search for features of instantaneous frequency
    in a signal after Hilbert transformation. The classification results achieved
    using these features are reasonable, thus, GAFDS exhibits good extensibility.
    Multiple classic classifiers (i.e., (k)-nearest neighbor, linear discriminant
    analysis, decision tree, AdaBoost, multilayer perceptron, and Na”ive Bayes)
    achieve good results by using the features generated by GAFDS method and the
    optimized selection. Specifically, the accuracies for the two-classification
    and three-classification problems may reach up to 99% and 97%, respectively.
    Results of several cross-validation experiments illustrate that GAFDS is
    effective in feature extraction for EEG classification. Therefore, the proposed
    feature selection and optimization model can improve classification accuracy.

    Neurogenesis-Inspired Dictionary Learning: Online Model Adaption in a Changing World

    Sahil Garg, Irina Rish, Guillermo Cecchi, Aurelie Lozano
    Comments: Under review as a conference paper at ICLR 2017
    Subjects: Learning (cs.LG)

    In this paper, we focus on online representation learning in non-stationary
    environments which may require continuous adaptation of model architecture. We
    propose a novel online dictionary-learning (sparse-coding) framework which
    incorporates the addition and deletion of hidden units (dictionary elements),
    and is inspired by the adult neurogenesis phenomenon in the dentate gyrus of
    the hippocampus, known to be associated with improved cognitive function and
    adaptation to new environments. In the online learning setting, where new input
    instances arrive sequentially in batches, the neuronal-birth is implemented by
    adding new units with random initial weights (random dictionary elements); the
    number of new units is determined by the current performance (representation
    error) of the dictionary, higher error causing an increase in the birth rate.
    Neuronal-death is implemented by imposing l1/l2-regularization (group sparsity)
    on the dictionary within the block-coordinate descent optimization at each
    iteration of our online alternating minimization scheme, which iterates between
    the code and dictionary updates. Finally, hidden unit connectivity adaptation
    is facilitated by introducing sparsity in dictionary elements. Our empirical
    evaluation on several real-life datasets (images and language) as well as on
    synthetic data demonstrates that the proposed approach can considerably
    outperform the state-of-art fixed-size (nonadaptive) online sparse coding of
    Mairal et al. (2009) in the presence of nonstationary data. Moreover, we
    identify certain properties of the data (e.g., sparse inputs with nearly
    non-overlapping supports) and of the model (e.g., dictionary sparsity)
    associated with such improvements.

    Label Propagation on K-partite Graphs with Heterophily

    Dingxiong Deng, Fan Bai, Yiqi Tang, Shuigeng Zhou, Cyrus Shahabi, Linhong Zhu
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)

    In this paper, for the first time, we study label propagation in
    heterogeneous graphs under heterophily assumption. Homophily label propagation
    (i.e., two connected nodes share similar labels) in homogeneous graph (with
    same types of vertices and relations) has been extensively studied before.
    Unfortunately, real-life networks are heterogeneous, they contain different
    types of vertices (e.g., users, images, texts) and relations (e.g.,
    friendships, co-tagging) and allow for each node to propagate both the same and
    opposite copy of labels to its neighbors. We propose a (mathcal{K})-partite
    label propagation model to handle the mystifying combination of heterogeneous
    nodes/relations and heterophily propagation. With this model, we develop a
    novel label inference algorithm framework with update rules in near-linear time
    complexity. Since real networks change over time, we devise an incremental
    approach, which supports fast updates for both new data and evidence (e.g.,
    ground truth labels) with guaranteed efficiency. We further provide a utility
    function to automatically determine whether an incremental or a re-modeling
    approach is favored. Extensive experiments on real datasets have verified the
    effectiveness and efficiency of our approach, and its superiority over the
    state-of-the-art label propagation methods.

    Regularizing Neural Networks by Penalizing Confident Output Distributions

    Gabriel Pereyra, George Tucker, Jan Chorowski, Łukasz Kaiser, Geoffrey Hinton
    Comments: Submitted to ICLR 2017
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    We systematically explore regularizing neural networks by penalizing low
    entropy output distributions. We show that penalizing low entropy output
    distributions, which has been shown to improve exploration in reinforcement
    learning, acts as a strong regularizer in supervised learning. Furthermore, we
    connect a maximum entropy based confidence penalty to label smoothing through
    the direction of the KL divergence. We exhaustively evaluate the proposed
    confidence penalty and label smoothing on 6 common benchmarks: image
    classification (MNIST and Cifar-10), language modeling (Penn Treebank), machine
    translation (WMT’14 English-to-German), and speech recognition (TIMIT and WSJ).
    We find that both label smoothing and the confidence penalty improve
    state-of-the-art models across benchmarks without modifying existing
    hyperparameters, suggesting the wide applicability of these regularizers.

    Aggressive Sampling for Multi-class to Binary Reduction with Applications to Text Classification

    Bikash Joshi, Massih-Reza Amini, Ioannis Partalas, Franck Iutzeler, Yury Maximov
    Comments: 18 pages, 4 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We address the problem of multiclass classification in the case where the
    number of classes is very large. We propose a multiclass to binary reduction
    strategy, in which we transform the original problem into a binary
    classification one over pairs of examples. We derive generalization bounds for
    the error of the classifier of pairs using local Rademacher complexity, and a
    double sampling strategy (in the terms of examples and classes) that speeds up
    the training phase while maintaining a very low memory usage. Experiments are
    carried for text classification on DMOZ and Wikipedia collections with up to
    20,000 classes in order to show the efficiency of the proposed method.

    Learning what to look in chest X-rays with a recurrent visual attention model

    Petros-Pavlos Ypsilantis, Giovanni Montana
    Comments: NIPS 2016 Workshop on Machine Learning for Health
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    X-rays are commonly performed imaging tests that use small amounts of
    radiation to produce pictures of the organs, tissues, and bones of the body.
    X-rays of the chest are used to detect abnormalities or diseases of the
    airways, blood vessels, bones, heart, and lungs. In this work we present a
    stochastic attention-based model that is capable of learning what regions
    within a chest X-ray scan should be visually explored in order to conclude that
    the scan contains a specific radiological abnormality. The proposed model is a
    recurrent neural network (RNN) that learns to sequentially sample the entire
    X-ray and focus only on informative areas that are likely to contain the
    relevant information. We report on experiments carried out with more than
    (100,000) X-rays containing enlarged hearts or medical devices. The model has
    been trained using reinforcement learning methods to learn task-specific
    policies.

    Comparative study on supervised learning methods for identifying phytoplankton species

    Thi-Thu-Hong Phan (LISIC), Emilie Poisson Caillault (LISIC), André Bigand (LISIC)
    Journal-ref: 2016 IEEE Sixth International Conference on Communications and
    Electronics (ICCE), Jul 2016, Ha Long, Vietnam. pp.283 – 288, 2016
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Phytoplankton plays an important role in marine ecosystem. It is defined as a
    biological factor to assess marine quality. The identification of phytoplankton
    species has a high potential for monitoring environmental, climate changes and
    for evaluating water quality. However, phytoplankton species identification is
    not an easy task owing to their variability and ambiguity due to thousands of
    micro and pico-plankton species. Therefore, the aim of this paper is to build a
    framework for identifying phytoplankton species and to perform a comparison on
    different features types and classifiers. We propose a new features type
    extracted from raw signals of phytoplankton species. We then analyze the
    performance of various classifiers on the proposed features type as well as two
    other features types for finding the robust one. Through experiments, it is
    found that Random Forest using the proposed features gives the best
    classification results with average accuracy up to 98.24%.

    dna2vec: Consistent vector representations of variable-length k-mers

    Patrick Ng
    Comments: 10 pages, 3 figures, 2 tables
    Subjects: Quantitative Methods (q-bio.QM); Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    One of the ubiquitous representation of long DNA sequence is dividing it into
    shorter k-mer components. Unfortunately, the straightforward vector encoding of
    k-mer as a one-hot vector is vulnerable to the curse of dimensionality. Worse
    yet, the distance between any pair of one-hot vectors is equidistant. This is
    particularly problematic when applying the latest machine learning algorithms
    to solve problems in biological sequence analysis. In this paper, we propose a
    novel method to train distributed representations of variable-length k-mers.
    Our method is based on the popular word embedding model word2vec, which is
    trained on a shallow two-layer neural network. Our experiments provide evidence
    that the summing of dna2vec vectors is akin to nucleotides concatenation. We
    also demonstrate that there is correlation between Needleman-Wunsch similarity
    score and cosine similarity of dna2vec vectors.

    A Multichannel Convolutional Neural Network For Cross-language Dialog State Tracking

    Hongjie Shi, Takashi Ushio, Mitsuru Endo, Katsuyoshi Yamagami, Noriaki Horii
    Comments: Copyright 2016 IEEE. Published in the 2016 IEEE Workshop on Spoken Language Technology (SLT 2016)
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    The fifth Dialog State Tracking Challenge (DSTC5) introduces a new
    cross-language dialog state tracking scenario, where the participants are asked
    to build their trackers based on the English training corpus, while evaluating
    them with the unlabeled Chinese corpus. Although the computer-generated
    translations for both English and Chinese corpus are provided in the dataset,
    these translations contain errors and careless use of them can easily hurt the
    performance of the built trackers. To address this problem, we propose a
    multichannel Convolutional Neural Networks (CNN) architecture, in which we
    treat English and Chinese language as different input channels of one single
    CNN model. In the evaluation of DSTC5, we found that such multichannel
    architecture can effectively improve the robustness against translation errors.
    Additionally, our method for DSTC5 is purely machine learning based and
    requires no prior knowledge about the target language. We consider this a
    desirable property for building a tracker in the cross-language context, as not
    every developer will be familiar with both languages.

    Optimization on Product Submanifolds of Convolution Kernels

    Mete Ozay, Takayuki Okatani
    Comments: 3 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We address a problem of optimization on product of embedded submanifolds of
    convolution kernels (PEMs) in convolutional neural networks (CNNs). First, we
    explore metric and curvature properties of PEMs in terms of component
    manifolds. Next, we propose a SGD method, called C-SGD, by generalizing the SGD
    methods employed on kernel submanifolds for optimization on product of
    different collections of kernel submanifolds. Then, we analyze convergence
    properties of the C-SGD considering sectional curvature properties of PEMs. In
    the theoretical results, we expound the constraints that can be used to compute
    adaptive step sizes of the C-SGD in order to assure the asymptotic convergence.

    Lyrics-to-Audio Alignment by Unsupervised Discovery of Repetitive Patterns in Vowel Acoustics

    Sungkyun Chang, Kyogu Lee
    Comments: 13 pages, Under review as a journal paper at IEEE/ACM Transactions on Audio, Speech, and Language Processing
    Subjects: Sound (cs.SD); Learning (cs.LG)

    Most of the previous approaches to lyrics-to-audio alignment used a
    pre-developed automatic speech recognition (ASR) system that innately suffered
    from several difficulties to adapt the speech model to individual singers. A
    significant aspect missing in previous works is the self-learnability of
    repetitive vowel patterns in the singing voice, where the vowel part used is
    more consistent than the consonant part. Based on this, our system first learns
    a discriminative subspace of vowel sequences, based on weighted symmetric
    non-negative matrix factorization (WS-NMF), by taking the self-similarity of a
    standard acoustic feature as an input. Then, we make use of canonical time
    warping (CTW), derived from a recent computer vision technique, to find an
    optimal spatiotemporal transformation between the text and the acoustic
    sequences. Experiments with Korean and English data sets showed that deploying
    this method after a pre-developed, unsupervised, singing source separation
    achieved more promising results than other state-of-the-art unsupervised
    approaches and an existing ASR-based system.

    Learning Policies for Markov Decision Processes from Data

    Manjesh K. Hanawal, Hao Liu, Henghui Zhu, Ioannis Ch. Paschalidis
    Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)

    We consider the problem of learning a policy for a Markov decision process
    consistent with data captured on the state-actions pairs followed by the
    policy. We assume that the policy belongs to a class of parameterized policies
    which are defined using features associated with the state-action pairs. The
    features are known a priori, however, only an unknown subset of them could be
    relevant. The policy parameters that correspond to an observed target policy
    are recovered using (ell_1)-regularized logistic regression that best fits the
    observed state-action samples. We establish bounds on the difference between
    the average reward of the estimated and the original policy (regret) in terms
    of the generalization error and the ergodic coefficient of the underlying
    Markov chain. To that end, we combine sample complexity theory and sensitivity
    analysis of the stationary distribution of Markov chains. Our analysis suggests
    that to achieve regret within order (O(sqrt{epsilon})), it suffices to use
    training sample size on the order of (Omega(log n cdot poly(1/epsilon))),
    where (n) is the number of the features. We demonstrate the effectiveness of
    our method on a synthetic robot navigation example.


    Information Theory

    Decoding of Interleaved Reed-Solomon Codes Using Improved Power Decoding

    Sven Puchinger, Johan Rosenkilde né Nielsen
    Comments: 5 pages, submitted to IEEE International Symposium on Information Theory 2017
    Subjects: Information Theory (cs.IT)

    We propose a new partial decoding algorithm for (m)-interleaved Reed–Solomon
    (IRS) codes that can decode, with high probability, a random error of relative
    weight (1-R^{frac{m}{m+1}}) at all code rates (R), in time polynomial in the
    code length (n). This is an asymptotic improvement over the previous
    state-of-the-art for all rates, and the first improvement for (R > 1/3) in the
    last 20 years. The method combines collaborative decoding of IRS codes with
    power decoding up to the Johnson radius.

    Exponent Function for Stationary Memoryless Channels with Input Cost at Rates above the Capacity

    Yasutada Oohama
    Comments: 15pages
    Subjects: Information Theory (cs.IT)

    We consider the stationaly memoryless channels with input cost. We prove that
    for transmission rates above the capacity the correct probability of decoding
    tends to zero exponentially as the block length (n) of codes tends to infinity.
    In the case where both of channel input and output sets are finite, we
    determine the optimal exponent function on the above exponential decay of the
    correct probability. To derive this result we use a new technique called the
    recuresive method, which is based on the information spectrum approach. The
    recursive method utilize a certain recursive structure on the information
    spectrum quantities.

    Storage Allocation for Multi-Class Distributed Data Storage Systems

    Koosha Pourtahmasi Roshandeh, Moslem Noori, Masoud Ardakani, Chintha Tellambura
    Subjects: Information Theory (cs.IT)

    Distributed storage systems (DSSs) provide a scalable solution for reliably
    storing massive amounts of data coming from various sources. Heterogeneity of
    these data sources often means different data classes (types) exist in a DSS,
    each needing a different level of quality of service (QoS). As a result,
    efficient data storage and retrieval processes that satisfy various QoS
    requirements are needed. This paper studies storage allocation, meaning how
    data of different classes must be spread over the set of storage nodes of a
    DSS. More specifically, assuming a probabilistic access to the storage nodes,
    we aim at maximizing the weighted sum of the probability of successful data
    recovery of data classes, when for each class a minimum QoS (probability of
    successful recovery) is guaranteed. Solving this optimization problem for a
    general setup is intractable. Thus, we find the optimal storage allocation when
    the data of each class is spread minimally over the storage nodes, i.e. minimal
    spreading allocation (MSA). Using upper bounds on the performance of the
    optimal storage allocation, we show that the optimal MSA allocation approaches
    the optimal performance in many practical cases. Computer simulations are also
    presented to better illustrate the results.

    On Spectral Coexistence of CP-OFDM and FB-MC Waveforms in 5G Networks

    Quentin Bodinier, Faouzi Bader, Jacques Palicot
    Comments: Manuscript submitted for review to IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    Future 5G networks will serve a variety of applications that will coexist on
    the same spectral band and geographical area, in an uncoordinated and
    asynchronous manner. It is widely accepted that using CP-OFDM, the waveform
    used by most current communication systems, will make it difficult to achieve
    this paradigm. Especially, CP-OFDM is not adapted for spectral coexistence
    because of its poor spectral localization. Therefore, it has been widely
    suggested to use filter bank based multi carrier (FB-MC) waveforms with
    enhanced spectral localization to replace CP-OFDM. Especially, FB-MC waveforms
    are expected to facilitate coexistence with legacy CP-OFDM based systems.
    However, this idea is based on the observation of the PSD of FB-MC waveforms
    only. In this paper, we demonstrate that this approach is flawed and show what
    metric should be used to rate interference between FB-MC and CP-OFDM systems.
    Finally, our results show that using FB-MC waveforms does not facilitate
    coexistence with CP-OFDM based systems to a high extent.

    Performance Limits of Solutions to Network Utility Maximization Problems

    R. L. G. Cavalcante, S. Stańczak
    Subjects: Information Theory (cs.IT)

    We study performance limits of solutions to utility maximization problems
    (e.g., max-min problems) in wireless networks as a function of the power budget
    (ar{p}) available to transmitters. Special focus is devoted to the utility
    and the transmit energy efficiency (i.e., utility over transmit power) of the
    solution. Briefly, we show tight bounds for the general class of network
    utility optimization problems that can be solved by computing conditional
    eigenvalues of standard interference mappings. The proposed bounds, which are
    based on the concept of asymptotic functions, are simple to compute, provide us
    with good estimates of the performance of networks for any value of (ar{p})
    in many real-world applications, and enable us to determine points in which
    networks move from a noise limited regime to an interference limited regime.
    Furthermore, they also show that the utility and the transmit energy efficiency
    scales as (Theta(1)) and (Theta(1/ar{p})), respectively, as
    (ar{p} oinfty).

    On the Robustness of Coordinated Beamforming to Uncoordinated Interference and CSI Uncertainty

    George C. Alexandropoulos, Paul Ferrand, Constantinos B. Papadias
    Comments: 6 pages, 4 figures, IEEE WCNC 2017
    Subjects: Information Theory (cs.IT)

    As network deployments become denser, interference arises as a dominant
    performance degradation factor. To confront with this trend, Long Term
    Evolution (LTE) incorporated features aiming at enabling cooperation among
    different base stations, a technique termed as Coordinated Multi Point (CoMP).
    Recent field trial results and theoretical studies of the performance of CoMP
    schemes revealed, however, that their gains are not as high as initially
    expected, despite their large coordination overhead. In this paper, we review
    recent advanced Coordinated Beamforming (CB) schemes, a special family of CoMP
    that reduces the coordination overhead through a joint choice of transmit and
    receive linear filters. We focus on assessing their resilience to uncoordinated
    interference and Channel State Information (CSI) imperfections, which both
    severely limit the performance gains of all CoMP schemes. We present a simple
    yet encompassing system model that aims at incorporating different parameters
    of interest in the relative interference power and CSI errors, and then utilize
    it for the performance evaluation of the state-of-the-art in CB schemes. It is
    shown that blindly applying CB in all system scenarios can indeed be
    counter-productive.

    Correcting Two Deletions and Insertions in Racetrack Memory

    Alireza Vahid, Georgios Mappouras, Daniel J. Sorin, Robert Calderbank
    Comments: submitted to ISIT 2017
    Subjects: Information Theory (cs.IT)

    Racetrack memory is a non-volatile memory engineered to provide both high
    density and low latency, that is subject to synchronization or shift errors.
    This paper describes a fast coding solution, in which delimiter bits assist in
    identifying the type of shift error, and easily implementable graph-based codes
    are used to correct the error, once identified. A code that is able to detect
    and correct double shift errors is described in detail.

    Low-Complexity Puncturing and Shortening of Polar Codes

    Valerio Bioglio, Frederic Gabry, Ingmar Land
    Comments: to appear in WCNC 2017 – “Polar Coding in Wireless Communications: Theory and Implementation” Workshop
    Subjects: Information Theory (cs.IT)

    In this work, we address the low-complexity construction of shortened and
    punctured polar codes from a unified view. While several independent puncturing
    and shortening designs were attempted in the literature, our goal is a unique,
    low-complexity construction encompassing both techniques in order to achieve
    any code length and rate. We observe that our solution significantly reduces
    the construction complexity as compared to state-of-the-art solutions while
    providing a block error rate performance comparable to constructions that are
    highly optimized for specific lengths and rates. This makes the constructed
    polar codes highly suitable for practical application in future communication
    systems requiring a large set of polar codes with different lengths and rates.

    Almost Optimal Phaseless Compressed Sensing with Sublinear Decoding Time

    Vasileios Nakos
    Subjects: Information Theory (cs.IT)

    In the problem of compressive phase retrieval, one wants to recover an
    approximately (k)-sparse signal (x in mathbb{C}^n), given the magnitudes of
    the entries of (Phi x), where (Phi in mathbb{C}^{m imes n}). This problem
    has received a fair amount of attention, with sublinear time algorithms
    appearing in cite{cai2014super,pedarsani2014phasecode,yin2015fast}. In this
    paper we further investigate the direction of sublinear decoding for real
    signals by giving a recovery scheme under the (ell_2 / ell_2) guarantee, with
    almost optimal, (Oh(k log n )), number of measurements. Our result
    outperforms all previous sublinear time algorithms in the number of
    measurements. Moreover, we give a very simple deterministic scheme that
    recovers all (k)-sparse vectors in (Oh(k^3)) time, using (4k-1) measurements.

    Second-order Cyclostationarity-based Detection of LTE SC-FDMA Signals for Cognitive Radio Systems

    Walid A. Jerjawi, Yahia A. Eldemerdash, Octavia A. Dobre
    Subjects: Information Theory (cs.IT)

    In this paper, we investigate the detection of long term evolution (LTE)
    single carrier-frequency division multiple access (SC-FDMA) signals, with
    application to cognitive radio systems. We explore the second-order
    cyclostationarity of the LTE SC-FDMA signals, and apply results obtained for
    the cyclic autocorrelation function to signal detection. The proposed detection
    algorithm provides a very good performance under various channel conditions,
    with a short observation time and at low signal-to-noise ratios, with reduced
    complexity. The validity of the proposed algorithm is verified using signals
    generated and acquired by laboratory instrumentation, and the experimental
    results show a good match with computer simulation results.

    On the Many-Help-One Problem with Independently Degraded Helpers

    Albrecht Wolf, Diana Cristina González, Meik Dörpinghaus, José Cândido Silveira Santos Filho, Gerhard Fettweis
    Subjects: Information Theory (cs.IT)

    This work provides new results for the separated encoding of correlated
    discrete memoryless sources. We address the scenario where an arbitrary number
    of auxiliary sources (a.k.a. helpers) are supposed to help decode a single
    primary source errorless. The rate-distortion analysis of such a system is the
    so-called many-help-one problem. We focus on the case in which the auxiliary
    sources are conditionally independent, given the primary source. We provide an
    inner and an outer bound of the rate-distortion region for what we call the
    strong many-help-one problem, where the auxiliary sources must be also
    recovered under certain distortion constraints. Furthermore, based on known
    results from the CEO problem we provide the admissible rate region for what we
    call the weak many-help-one problem, where the auxiliary sources do not need to
    be recovered serving merely as side information to help recover the primary
    source. Both scenarios find important application to emerging cooperative
    techniques in which a direct-link communication is assisted by multiple lossy
    relaying-link transmissions. Motivated by this application, we specialize the
    referred scenarios to binary sources and the Hamming distortion measure.
    Remarkably, we show that for the weak many-help-one problem the independent
    decoding of the auxiliary sources can achieve optimal performance.

    Minimax Optimal Estimators for Additive Scalar Functionals of Discrete Distributions

    Kazuto Fukuchi, Jun Sakuma
    Comments: 37 pages, 1 figure
    Subjects: Information Theory (cs.IT); Statistics Theory (math.ST)

    In this paper, we consider estimators for an additive functional of (phi),
    which is defined as ( heta(P;phi)=sum_{i=1}^kphi(p_i)), from (n) i.i.d.
    random samples drawn from a discrete distribution (P=(p_1,…,p_k)) with
    alphabet size (k). We propose a minimax optimal estimator for the estimation
    problem of the additive functional. We reveal that the minimax optimal rate is
    substantially characterized by the divergence speed of the fourth derivative of
    (phi). As a result, we show that there is no consistent estimator if the
    divergence speed of the fourth derivative of (phi) is larger than (p^{-4}).
    Furthermore, if the divergence speed of the fourth derivative of (phi) is
    (p^{4-alpha}) for (alpha in (0,1)), the minimax optimal rate is obtained
    within a universal multiplicative constant as (frac{k^2}{(nln n)^{2alpha}} +
    frac{k^{2-2alpha}}{n}).

    Zero-Delay Rate Distortion via Filtering for Unstable Vector Gaussian Sources

    Photios A. Stavrou, Jan Ostergaard, Charalambos D. Charalambous, Milan Derpich
    Comments: 7 pages, 5 figures, an extended version of the paper submitted to ISIT 2017
    Subjects: Information Theory (cs.IT)

    In this work, we deal with zero-delay source coding of an unstable vector
    Gaussian Auto-Regressive (AR) source under average mean square error fidelity
    criterion. To begin with, we consider zero-delay coding of a vector Gaussian
    AR(1) source modeled in state space form. It turns out to be convenient to use
    an asymptotically stationary time-domain scheme with feedback, which was
    recently proposed in [1] in the context of filtering theory applications via
    the Information Nonanticipative Rate Distortion Function (INRDF). In this
    scheme, the feedback path comprises a Kalman filter, which produces an estimate
    of the source. Our idea is then to encode the vector innovations due to Kalman
    filtering via lattice quantization with subtractive dither and memoryless
    entropy coding. We show that the resulting system is stable and furthermore
    provide new bounds to the zero-delay RDF. We then generalize our results to
    vector Gaussian AR sources of any order. With this, we are able to establish a
    time-domain approach with feedback, which by use of a single filter gets very
    close to the zero-delay RDF for vector Gaussian AR sources of any order.
    Interestingly, for infinite dimensional vector sources, the INRDF coincides
    with the Optimal Performance Theoretically Attainable (OPTA) by zero-delay
    codes.

    The Reliability Function for the Additive White Gaussian Noise Channel at Rates above the Capacity

    Yasutada Oohama
    Comments: 11pages
    Subjects: Information Theory (cs.IT)

    We consider the additive white Gaussian noise channels. We prove that the
    error probability of decoding tends to one exponentially for rates above the
    capacity and derive the optimal exponent function. We shall demonstrate that
    the information spectrum approach is quite useful for investigating this
    problem.

    User Activity Detection via Group Testing and Coded Computation

    Matthias Frey, Igor Bjelaković, Sławomir Stańczak
    Subjects: Information Theory (cs.IT)

    Inspired by group testing algorithms and the coded computation paradigm, we
    propose and analyze a novel multiple access scheme for detecting active users
    in large-scale networks. The scheme consists of a simple randomized detection
    algorithm that uses computation coding as intermediate steps for computing
    logical disjunction functions over the multiple access channel (MAC). First we
    show that given an efficient MAC code for disjunction computation the algorithm
    requires (O(k log (frac{N}{k }))) decision steps for detecting (k) active
    users out of (N+k) users. Subsequently we present a simple suboptimal code for
    a class of MACs with arbitrarily varying sub-gaussian noise that uniformly
    requires (O (k log (N) max { log k , log log N } )) channel uses for
    solving the activity detection problem. This shows that even in the presence of
    noise an efficient detection of active users is possible. Our approach reveals
    that the true crux of the matter lies in constructing efficient codes for
    computing disjunctions over a MAC.

    On The Equivalence of Projections In Relative (α)-Entropy and Rényi Divergence

    P. N. Karthik, Rajesh Sundaresan
    Comments: 13 pages
    Subjects: Information Theory (cs.IT)

    The aim of this work is to establish that two recently published projection
    theorems, one dealing with a parametric generalization of relative entropy and
    another dealing with R’enyi divergence, are equivalent under a correspondence
    on the space of probability measures. Further, we demonstrate that the
    associated “Pythagorean” theorems are equivalent under this correspondence.
    Finally, we apply Eguchi’s method of obtaining Riemannian metrics from general
    divergence functions to show that the geometry arising from the above
    divergences are equivalent under the aforementioned correspondence.

    Bayesian definition of random sequences with respect to conditional probabilities

    Hayato Takahashi
    Comments: submitted to ISIT2017
    Subjects: Information Theory (cs.IT)

    We review the recent progress on the definition of randomness with respect to
    conditional probabilities and a generalization of van Lambalgen theorem
    (Takahashi 2006, 2008, 2011). In addition we show a new result on the random
    sequences when the conditional probabilities are mutually singular, which is a
    generalization of Kjos Hanssen’s theorem (2010). Finally we propose a
    definition of random sequences with respect to conditional probability and
    argue the validity of the definition from the Bayesian statistical point of
    view.

    Codes for Channels With Segmented Edits

    Mahed Abroshan, Ramji Venkataramanan, Albert Guillen i Fabregas
    Comments: Shorter version submitted to the 2017 IEEE International Symposium on Information Theory
    Subjects: Information Theory (cs.IT)

    We consider insertion and deletion channels with the additional assumption
    that the channel input sequence is implicitly divided into segments such that
    at most one edit can occur within a segment. We further assume that there are
    no segment markers in the received sequence. We propose code constructions for
    the segmented deletion, segmented insertion, and segmented insertion-deletion
    channels based on subsets of VT codes chosen with pre-determined prefixes
    and/or suffixes. The proposed codes are zero-error, can be decoded segment-by-
    segment, and their rate scaling as the segment length increases is the same as
    that of the maximal code.

    SCW Codes for Optimal CSI-Free Detection in Diffusive Molecular Communications

    Vahid Jamali, Arman Ahmadzadeh, Nariman Farsad, Robert Schober, Andrea Goldsmith
    Comments: This is an extended version of a paper submitted to IEEE International Symposium on Information Theory (ISIT) 2017
    Subjects: Information Theory (cs.IT)

    Instantaneous or statistical channel state information (CSI) is needed for
    most detection schemes developed in the molecular communication (MC)
    literature. Since the MC channel changes, e.g., due to variations in the
    velocity of flow, the temperature, or the distance between transmitter and
    receiver, CSI acquisition has to be conducted repeatedly to keep track of CSI
    variations. Frequent CSI acquisition may entail a large overhead whereas
    infrequent CSI acquisition may result in a low CSI estimation quality. To cope
    with these issues, we design codes which facilitate maximum likelihood sequence
    detection at the receiver without instantaneous or statistical CSI. In
    particular, assuming concentration shift keying modulation, we show that a
    class of codes, referred to as strongly constant-weight (SCW) codes, enables
    optimal CSI-free sequence detection at the cost of decreasing the data rate.
    For the proposed SCW codes, we analyze the code rate and the error rate.
    Simulation results verify our analytical derivations and reveal that the
    proposed CSI-free detector for SCW codes outperforms the baseline coherent and
    non-coherent detectors for uncoded transmission.

    A New Combination of Message Passing Techniques for Receiver Design in MIMO-OFDM Systems

    Chuanzong Zhang, Zhengdao Yuan, Zhongyong Wang, Qinghua Guo
    Subjects: Information Theory (cs.IT)

    In this paper, we propose a new combined message passing algorithm which
    allows belief propagation (BP) and mean filed (MF) applied on a same factor
    node, so that MF can be applied to hard constraint factors. Based on the
    proposed message passing algorithm, a iterative receiver is designed for
    MIMO-OFDM systems. Both BP and MF are exploited to deal with the hard
    constraint factor nodes involving the multiplication of channel coefficients
    and data symbols to reduce the complexity of the only BP used. The numerical
    results show that the BER performance of the proposed low complexity receiver
    closely approach that of the state-of-the-art receiver, where only BP is used
    to handled the hard constraint factors, in the high SNRs.

    Delivery Latency Regions in Fog-RANs with Edge Caching and Cloud Processing

    Jasper Goseling, Osvaldo Simeone, Petar Popovski
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    A Fog Radio Access Network (F-RAN) is a cellular wireless system that enables
    content delivery via edge caching, i.e., storage of popular content at the edge
    nodes (ENs), and cloud processing. The existing information-theoretic analyses
    of F-RAN systems, and special cases thereof, make the assumption that all files
    in the content library are equally relevant, and hence all users’ requests
    should be guaranteed the same delivery latency (i.e., transmission time). In
    practice, it is generally desirable to provide different latency levels as a
    function of the users’ requests, e.g., based on the current popularity ranking
    of the contents. This paper studies the problem of characterizing the region of
    delivery latencies for all possible users’ requests under a per-EN cache
    capacity constraint. For the case with two ENs and two users, the latency
    region is characterized in the high-SNR regime in terms of the Normalized
    Delivery Time (NDT) metric, introduced in prior work.

    A Practical Approach for Successive Omniscience

    Ni Ding, Rodney A. Kennedy, Parastoo Sadeghi
    Subjects: Information Theory (cs.IT)

    The system that we study in this paper contains a set of users that observe a
    discrete memoryless multiple source and communicate via noise-free channels
    with the aim of attaining omniscience, the state that all users recover the
    entire multiple source. We adopt the concept of successive omniscience (SO),
    i.e., letting the local omniscience in some user subset be attained before the
    global omniscience in the entire system, and consider the problem of how to
    efficiently attain omniscience in a successive manner. Based on the existing
    results on SO, we propose a CompSetSO algorithm for determining a complimentary
    set, a user subset in which the local omniscience can be attained first without
    increasing the sum-rate, the total number of communications, for the global
    omniscience. We also derive a sufficient condition for a user subset to be
    complimentary so that running the CompSetSO algorithm only requires a lower
    bound, instead of the exact value, of the minimum sum-rate for attaining global
    omniscience. The CompSetSO algorithm returns a complimentary user subset in
    polynomial time. We show by example how to recursively apply the CompSetSO
    algorithm so that the global omniscience can be attained by multi-stages of SO.

    Optimizing MDS Coded Caching in Wireless Networks with Device-to-Device Communication

    Jesper Pedersen, Alexandre Graell i Amat, Iryna Andriyanova, Fredrik Brännström
    Subjects: Information Theory (cs.IT)

    We consider the caching of content in the mobile devices in a wireless
    network using maximum distance separable (MDS) codes. We focus on a small cell,
    served by a base station (BS), where mobile devices arrive and depart according
    to a Poisson birth-death process. Users requesting a particular file download
    coded symbols from caching devices using device-to-device communication. If
    additional symbols are required to decode the file, these are downloaded from
    the BS. We optimize the MDS codes to minimize the network load under a global
    average cache size constraint. We show that, for most practical scenarios,
    using optimized MDS codes results in significantly lower network load compared
    to when caching the most popular files. Furthermore, our results show that
    caching coded symbols of each file on all mobile devices, i.e., maximal
    spreading, is optimal.

    Distributed Decoding of Convolutional Network Error Correction Codes

    Hengjie Yang, Wangmei Guo
    Comments: the extended version for ISIT 2017
    Subjects: Information Theory (cs.IT)

    A Viterbi-like decoding algorithm is proposed in this paper for generalized
    convolutional network error correction coding. Different from classical Viterbi
    algorithm, our decoding algorithm is based on minimum error weight rather than
    the shortest Hamming distance between received and sent sequences. Network
    errors may disperse or neutralize due to network transmission and convolutional
    network coding. Therefore, classical decoding algorithm cannot be employed any
    more. Source decoding was proposed by multiplying the inverse of network
    transmission matrix, where the inverse is hard to compute. Starting from the
    extit{Maximum A Posteriori (MAP)} decoding criterion, we find that it is
    equivalent to the minimum error weight under our model. Inspired by Viterbi
    algorithm, we propose a Viterbi-like decoding algorithm based on minimum error
    weight of combined error vectors, which can be carried out directly at sink
    nodes and can correct any network errors within the capability of convolutional
    network error correction codes (CNECC). Under certain situations, the proposed
    algorithm can realize the distributed decoding of CNECC.

    Uniprior Index Coding

    Vijaya Kumar Mareedu, Prasad Krishnan
    Comments: 7 pages, Shorter version submitted to International Symposium on Information Theory (ISIT), 2017
    Subjects: Information Theory (cs.IT)

    The index coding problem is a problem of efficient broadcasting with
    side-information. We look at the uniprior index coding problem, in which the
    receivers have disjoint side-information symbols and arbitrary demand sets.
    Previous work has addressed single uniprior index coding, in which each
    receiver has a single unique side-information symbol. Modeling the uniprior
    index coding problem as a extit{supergraph}, we focus on a class of uniprior
    problems defined on extit{generalized cycle} supergraphs. For such problems,
    we prove upper and lower bounds on the optimal broadcast rate. Using a
    connection with Eulerian directed graphs, we also show that the upper and lower
    bounds are equal for a subclass of uniprior problems. We show the NP-hardness
    of finding the lower bound for uniprior problems on generalized cycles.
    Finally, we look at a simple extension of the generalized cycle uniprior class
    for which we give bounds on the optimal rate and show an explicit scheme which
    achieves the upper bound.

    Hybrid RF-mmWave Communications to Achieve Low Latency and High Energy Efficiency in 5G Cellular Systems

    Morteza Hashemi, C. Emre Koksal, Ness B. Shroff
    Comments: 19 pages
    Subjects: Information Theory (cs.IT)

    We propose a hybrid RF/millimeter wave (mmWave) architecture for 5G cellular
    systems. Communication in the mmWave band faces significant challenges due to
    variable channels, intermittent connectivity, and high energy usage. Moreover,
    speeds for electronic processing of data is of the same order as typical rates
    for mmWave interface. Therefore, the use of complex algorithms for tracking
    channel variations and adjusting resources accordingly is practically
    out-of-reach. To alleviate the challenges associated with mmWave
    communications, our proposed architecture integrates the RF and mmWave
    interfaces for beamforming and data transfer, and exploits the spatio-temporal
    correlations between the interfaces. Based on extensive experimentation in
    indoor and outdoor settings, we demonstrate that an integrated RF/mmWave
    signaling and channel estimation scheme can remedy the problem of high energy
    usage and delay associated with digital and analog beamforming, respectively.
    In addition, cooperation between two interfaces at the higher layers
    effectively addresses the high delays caused by highly intermittent
    connectivity in mmWave channels. Subsequently, we formulate an optimal
    scheduling problem over the RF and mmWave interfaces where the goal is to
    maximize the delay-constrained throughput of the mmWave interface. We prove
    using subadditivity analysis that the optimal scheduling policy is based on a
    single threshold that can be easily adopted despite high link variations. We
    design an optimal scheduler that opportunistically schedules the packets over
    the mmWave interface, while the RF link acts as a fallback mechanism to prevent
    high delay.

    On Advisability of Designing Short Length QC-LDPC Codes Using Perfect Difference Families

    A.Tasdighi, M. R. Sadeghi
    Subjects: Information Theory (cs.IT)

    A simple and general definition of quasi cyclic low density parity check (QC
    LDPC) codes which are constructed based on circulant permutation matrices (CPM)
    is proposed. As an special case of this definition, we first represent one type
    of so called combinatorially designed multiple edge protograph codes. The code
    construction is mainly based on perfect difference families (PDF) and is called
    Construction 1. Secondly, using the proposed Construction 1 along with a
    technique named as column dispersion technique (CDT), we design several types
    of multiple-edge CPM QC LDPC codes (codes with Construction 2) in a wide range
    of rates, lengths, girths and minimum distances. Parameters of some of these
    codes are reported in tables. Also included in this paper are the
    multiplicities of short simple cycles of length up to 10 in Tanner graph of our
    constructed codes. Our experimental results for short to moderate length codes
    show that although minimum distances of codes play an important role in
    waterfall region, the higher the number of short simple cycles is, the better
    (sharper) the waterfall is. The performances of many codes such as WiMAX, PEG,
    array, MacKay, algebraic and combinatorial, and also, symmetrical codes have
    compared with our constructed codes. Based on our numerical results and
    regardless of how a code is constructed, those with higher number of short
    simple cycles and higher minimum distances, have better waterfalls.

    On Optimal Spectrum Access of Cognitive Relay With Finite Packet Buffer

    Kedar Kulkarni, Adrish Banerjee
    Comments: Accepted for publication in IEEE Transactions on Vehicular Technology
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    We investigate a cognitive radio system where secondary user (SU) relays
    primary user (PU) packets using two-phase relaying. SU transmits its own
    packets with some access probability in relaying phase using time sharing. PU
    and SU have queues of finite capacity which results in packet loss when the
    queues are full. Utilizing knowledge of relay queue state, SU aims to maximize
    its packet throughput while keeping packet loss probability of PU below a
    threshold. By exploiting structure of the problem, we formulate it as a linear
    program and find optimal access policy of SU. We also propose low complexity
    sub-optimal access policies, namely constant probability transmission and step
    transmission. Numerical results are presented to compare performance of
    proposed methods and study effect of queue sizes on packet throughput.

    Hypothesis Test for Upper Bound on the Size of Random Defective Set

    A.G. D'yachkov, I.V. Vorobyev, N.A. Polyanskii, V.Yu. Shchukin
    Subjects: Information Theory (cs.IT)

    Let (1 le s < t), (N ge 1) be fixed integers and a complex electronic
    circuit of size (t) is said to be an (s)-active, (; s ll t), and can work as
    a system block if not more than (s) elements of the circuit are defective.
    Otherwise, the circuit is said to be an (s)-defective and should be replaced by
    a similar (s)-active circuit. Suppose that there exists a possibility to run
    (N) non-adaptive group tests to check the (s)-activity of the circuit. As
    usual, we say that a (disjunctive) group test yields the positive response if
    the group contains at least one defective element. In this paper, we will
    interpret the unknown set of defective elements as a random set and discuss
    upper bounds on the error probability of the hypothesis test for the null
    hypothesis (left{ H_0 ,:, ext{the circuit is )s(-active}
    ight}) verse
    the alternative hypothesis (left{ H_1 ,:, ext{the circuit is
    )s(-defective}
    ight}). Along with the conventional decoding algorithm based
    on the known random set of positive responses and disjunctive (s)-codes, we
    consider a (T)-weight decision rule which is based on the simple comparison of
    a fixed threshold (T), (1 le T le N – 1), with the known random number of
    positive responses (p), (0 le p le N).

    Online Edge Caching in Fog-Aided Wireless Network

    Seyyed Mohammadreza Azimi, Osvaldo Simeone, Avik Sengupta, Ravi Tandon
    Comments: 10 pages, 5 figures, A shorter version was submitted to ISIT 2017
    Subjects: Information Theory (cs.IT)

    In a Fog Radio Access Network (F-RAN) architecture, edge nodes (ENs), such as
    base stations, are equipped with limited-capacity caches, as well as with
    fronthaul links that can support given transmission rates from a cloud
    processor. Existing information-theoretic analyses of content delivery in
    F-RANs have focused on offline caching with separate content placement and
    delivery phases. In contrast, this work considers an online caching set-up, in
    which the set of popular files is time-varying and both cache replenishment and
    content delivery can take place in each time slot. The analysis is centered on
    the characterization of the long-term Normalized Delivery Time (NDT), which
    captures the temporal dependence of the coding latencies accrued across
    multiple time slots in the high signal-to- noise ratio regime. Online caching
    and delivery schemes based on reactive and proactive caching are investigated,
    and their performance is compared to optimal offline caching schemes both
    analytically and via numerical results.

    Joint secrecy over the K-Transmitter Multiple Access Channel

    Yanling Chen, O. Ozan Koyluoglu, A. J. Han Vinck
    Comments: 6 pages, 1 figure
    Subjects: Information Theory (cs.IT)

    This paper studies the problem of secure communication over a K-transmitter
    multiple access channel in the presence of an external eavesdropper, subject to
    a joint secrecy constraint (i.e., information leakage rate from the collection
    of K messages to an eavesdropper is made vanishing). As a result, we establish
    the joint secrecy achievable rate region. To this end, our results build upon
    two techniques in addition to the standard information-theoretic methods. The
    first is a generalization of Chia-El Gamal’s lemma on entropy bound for a set
    of codewords given partial information. The second is to utilize a compact
    representation of a list of sets that, together with properties of mutual
    information, leads to an efficient Fourier-Motzkin elimination. These two
    approaches could also be of independent interests in other contexts.

    The Optimality of Partial Clique Covering for Index Coding

    Xinping Yi, Giuseppe Caire
    Subjects: Information Theory (cs.IT)

    Partial clique covering is one of the most basic coding schemes for index
    coding problems, generalizing clique and cycle covering on the side information
    digraph and further reducing the achievable broadcast rate. In this paper, we
    start with partition multicast, a special case of partial clique covering with
    cover number 1, and show that partition multicast achieves the optimal
    broadcast rate of the multiple-unicast index coding if and only if the side
    information digraph is partially acyclic. A digraph is said to be partially
    acyclic if its sub-digraph induced by the vertex with maximum in-degree and its
    incoming neighbors in the complementary digraph is acyclic. We further extend
    to the general partial clique covering, offering sufficient conditions of its
    optimality and sub-optimality with the aid of strong connectivity
    decomposition. In addition, for some digraph classes, we also prove that the
    optimal broadcast rate can be approximated by partial clique covering (as well
    as by other basic schemes) within either a constant factor, or a multiplicative
    factor of (O(frac{n}{log n})), or (O(n^epsilon)) for some (epsilon in
    (0,1)).

    Feedback capacity and coding for the BIBO channel with a no-repeated-ones input constraint

    Oron Sabag, Haim H. Permuter, Navin Kashyap
    Subjects: Information Theory (cs.IT)

    In this paper, a general binary-input binary-output (BIBO) channel is
    investigated in the presence of feedback and input constraints. The feedback
    capacity and the optimal input distribution of this setting are calculated for
    the case of an ((1,infty))-RLL input constraint, that is, the input sequence
    contains no consecutive ones. These results are obtained via explicit solution
    of a corresponding dynamic programming optimization problem. A simple coding
    scheme is designed based on the principle of posterior matching, which was
    introduced by Shayevitz and Feder for memoryless channels. The posterior
    matching scheme for our input-constrained setting is shown to achieve capacity
    using two new ideas: extit{message history}, which captures the memory
    embedded in the setting, and extit{message splitting}, which eases the
    analysis of the scheme. Additionally, in the special case of an S-channel, we
    give a very simple zero-error coding scheme that is based on variation of a
    repetition, and is shown to achieve capacity. For the input-constrained BSC, we
    show using our capacity formula that feedback increases capacity when the
    cross-over probability is small.

    The exponential family of Markov chains and its information geometry

    Hiroshi Nagaoka
    Comments: This work was presented in the 28th Symposium on Information Theory and Its Applications (SITA2005) in 2005
    Journal-ref: Proc. of the 28th Symposium on Information Theory and Its
    Applications, 2005 pp. 601-604,
    Subjects: Information Theory (cs.IT)

    We introduce a new definition of exponential family of Markov chains, and
    show that many characteristic properties of the usual exponential family of
    probability distributions are properly extended to Markov chains. The method of
    information geometry is effectively applied to our framework, which enables us
    to characterize the divergence rate of Markov chain from a differential
    geometric viewpoint.

    Polar Coding for Block Fading Channels

    Mengfan Zheng, Meixia Tao, Wen Chen, Cong Ling
    Comments: 5 pages, 2 figures, submitted to ISIT 2017
    Subjects: Information Theory (cs.IT)

    In this paper, we consider the problem of polar coding for block fading
    channels, with emphasis on those with instantaneous channel state information
    (CSI) at neither the transmitter nor the receiver. Our approach is to decompose
    a block fading channel of (T_c) symbols per coherent interval into (T_c)
    binary-input sub-channels in a capacity-preserving way, and design a polar code
    for each of them. For the case when instantaneous CSI is available at the
    receiver, a random interleaver can be used to enable joint encoding and
    decoding of all sub-channels so as to enhance the finite length performance. It
    is shown that our proposed schemes achieve the ergodic capacity of binary-input
    block fading channels under various CSI assumptions.

    Multi-Erasure Locally Recoverable Codes Over Small Fields For Flash Memory Array

    Pengfei Huang, Eitan Yaakobi, Paul H. Siegel
    Comments: Submitted to ISIT 2017
    Subjects: Information Theory (cs.IT)

    Erasure codes play an important role in storage systems to prevent data loss.
    In this work, we study a class of erasure codes called Multi-Erasure Locally
    Recoverable Codes (ME-LRCs) for flash memory array. Compared to previous
    related works, we focus on the construction of ME-LRCs over small fields. We
    first develop upper and lower bounds on the minimum distance of ME-LRCs. These
    bounds explicitly take the field size into account. Our main contribution is to
    propose a general construction of ME-LRCs based on generalized tensor product
    codes, and study their erasure-correcting property. A decoding algorithm
    tailored for erasure recovery is given. We then prove that our construction
    yields optimal ME-LRCs with a wide range of code parameters. Finally, we
    present several families of ME-LRCs over different fields.

    Bounds for Signature Codes

    A.G. Dyachkov, N. Polyanskii, V. Shchukin, I. Vorobyev
    Comments: 5 pages, two columns
    Subjects: Information Theory (cs.IT)

    We discuss upper and lower bounds of zero error capacity for signature coding
    models based on the symmetric noiseless multiple access channel.

    Polynomial-Complexity, Exponentially Consistent Tests for Outlying Sequence Detection

    Yuheng Bu, Shaofeng Zou, Venugopal V. Veeravalli
    Comments: 5 pages submitted to ISIT
    Subjects: Information Theory (cs.IT); Machine Learning (stat.ML)

    We study an outlying sequence detection problem, in which there are (M)
    sequences of samples out of which a small subset of outliers needs to be
    detected. A sequence is considered an outlier if the observations therein are
    generated by a distribution different from those generating the observations in
    the majority of sequences. In the universal setting, the goal is to identify
    all the outliers without any knowledge about the underlying generating
    distributions. In prior work, this problem was studied as a universal
    hypothesis testing problem, and a generalized likelihood (GL) test with high
    computational complexity was constructed and its asymptotic performance
    characterized. we novelly propose a test based on distribution clustering. Such
    a test is shown to be exponentially consistent and the time complexity is
    linear in the total number of sequences. Furthermore, our tests based on
    clustering are applicable to more general scenarios, e.g., when both the
    typical and outlier distributions forming clusters, our new tests is
    exponentially consistent, but the GL test is not even applicable.

    Opportunities for Analog Coding in Emerging Memory Systems

    Jesse H. Engel, S. Burc Eryilmaz, SangBum Kim, Matthew BrightSky, Chung Lam, Hsiang-Lan Lung, Bruno A. Olshausen, H.-S. Philip Wong
    Subjects: Information Theory (cs.IT)

    The exponential growth in data generation and large-scale data analysis
    creates an unprecedented need for inexpensive, low-latency, and high-density
    information storage. This need has motivated significant research into
    multi-level memory systems that can store multiple bits of information per
    device. Although both the memory state of these devices and much of the data
    they store are intrinsically analog-valued, both are quantized for use with
    digital systems and discrete error correcting codes. Using phase change memory
    as a prototypical multi-level storage technology, we herein demonstrate that
    analog-valued devices can achieve higher capacities when paired with analog
    codes. Further, we find that storing analog signals directly through
    joint-coding can achieve low distortion with reduced coding complexity. By
    jointly optimizing for signal statistics, device statistics, and a distortion
    metric, finite-length analog encodings can perform comparable to digital
    systems with asymptotically infinite large encodings. These results show that
    end-to-end analog memory systems have not only the potential to reach higher
    storage capacities than discrete systems, but also to significantly lower
    coding complexity, leading to faster and more energy efficient storage.

    On the Capacity for Distributed Index Coding

    Yucheng Liu, Parastoo Sadeghi, Fatemeh Arbabjolfaei, Young-Han Kim
    Subjects: Information Theory (cs.IT)

    The distributed index coding problem is studied, whereby multiple messages
    are stored at different servers to be broadcast to receivers with side
    information. First, the existing composite coding scheme is enhanced for the
    centralized (single-server) index coding problem, which is then merged with
    fractional partitioning of servers to yield a new coding scheme for distributed
    index coding. New outer bounds on the capacity region are also established. For
    213 out of 218 non-isomorphic distributed index coding problems with four
    messages the achievable sum-rate of the proposed distributed composite coding
    scheme matches the outer bound, thus establishing the sum-capacity for these
    problems.

    Bounds and Constructions for Linear Locally Repairable Codes over Binary Fields

    Anyu Wang, Zhifang Zhang, Dongdai Lin
    Comments: 7 pages, 2 figures
    Subjects: Information Theory (cs.IT)

    We present two new upper bounds on the dimension (k) for binary linear
    locally repairable codes (LRCs). The first one is an explicit bound for binary
    linear LRCs with disjoint repair groups, which can be regarded as an extension
    of known explicit bounds for such LRCs to more general code parameters. The
    second one is based on an optimization problem, and can be applied to any
    binary linear LRC. By simplifying the second bound for (dge5), we obtain an
    explicit bound which can outperform the Cadambe-Mazumdar bound for (5 le d le
    8) and large (n). Lastly, we give a new construction of binary linear LRCs with
    (d ge 6), and prove their optimality with respect to the simplified bound.

    Asynchronous Physical-layer Network Coding: Symbol Misalignment Estimation and Its Effect on Decoding

    Yulin Shao, Soung Chang Liew, Lu Lu
    Comments: 39 pages, full length version
    Subjects: Information Theory (cs.IT)

    In asynchronous physical-layer network coding (APNC) systems, the symbols
    from multiple transmitters to a common receiver may be misaligned. The
    knowledge of the amount of symbol misalignment, hence its estimation, is
    important to PNC decoding. This paper addresses the problem of
    symbol-misalignment estimation and the problem of optimal PNC decoding given
    the misalignment estimate, assuming the APNC system uses the root-raised-cosine
    pulse to carry signals (RRC-APNC). First, we put forth an optimal
    symbol-misalignment estimator that makes use of double baud-rate samples. Then,
    we devise optimal decoders for RRC-APNC in the presence of inaccurate
    symbol-misalignment estimates. In particular, we present a new whitening
    transformation to whiten the noise of the double baud-rate samples. Finally, we
    investigate the decoding performance of various estimation-and-decoding schemes
    for RRC-APNC. Extensive simulations show that: (i) Our double baud-rate
    estimator yields substantially more accurate symbol-misalignment estimates than
    the baud-rate estimator does. The mean-square-error (MSE) gains are up to 8 dB.
    (ii) An overall estimation-and-decoding scheme in which both estimation and
    decoding are based on double baud-rate samples yields much better performance
    than other schemes. Compared with a scheme in which both estimation and
    decoding are based on baud-rate samples), the double baud-rate sampling scheme
    yields 4.5 dB gains on symbol error rate (SER) performance in an AWGN channel,
    and 2 dB gains on packet error rate (PER) performance in a Rayleigh fading
    channel.

    An infinite family of Steiner systems (S(2, 4, 2^m)) from cyclic codes

    Cunsheng Ding
    Comments: arXiv admin note: text overlap with arXiv:1605.03796
    Subjects: Information Theory (cs.IT)

    Steiner systems are a fascinating topic of combinatorics. The most studied
    Steiner systems are (S(2, 3, v)) (Steiner triple systems), (S(3, 4, v))
    (Steiner quadruple systems), and (S(2, 4, v)). There are a few infinite
    families of Steiner systems (S(2, 4, v)) in the literature. The objective of
    this paper is to present an infinite family of Steiner systems (S(2, 4, 2^m))
    for all (m equiv 2 pmod{4} geq 6) from cyclic codes. This may be the first
    coding-theoretic construction of an infinite family of Steiner systems (S(2, 4,
    v)). As a by-product, many infinite families of (2)-designs are also reported
    in this paper.

    Polar Coding for Achieving the Capacity of Marginal Channels in Nonbinary-Input Setting

    Amirsina Torfi, Sobhan Soleymani, Seyed Mehdi Iranmanesh, Hadi Kazemi, Rouzbeh Asghari Shirvani, Vahid Tabataba Vakili
    Comments: Accepted to be published in “51th Conference on Information Sciences and Systems”, Baltimore, Maryland
    Subjects: Information Theory (cs.IT); Computational Complexity (cs.CC); Cryptography and Security (cs.CR)

    Achieving information-theoretic security using explicit coding scheme in
    which unlimited computational power for eavesdropper is assumed, is one of the
    main topics is security consideration. It is shown that polar codes are
    capacity achieving codes and have a low complexity in encoding and decoding. It
    has been proven that polar codes reach to secrecy capacity in the binary-input
    wiretap channels in symmetric settings for which the wiretapper’s channel is
    degraded with respect to the main channel. The first task of this paper is to
    propose a coding scheme to achieve secrecy capacity in asymmetric
    nonbinary-input channels while keeping reliability and security conditions
    satisfied. Our assumption is that the wiretap channel is stochastically
    degraded with respect to the main channel and message distribution is
    unspecified. The main idea is to send information set over good channels for
    Bob and bad channels for Eve and send random symbols for channels that are good
    for both. In this scheme the frozen vector is defined over all possible choices
    using polar codes ensemble concept. We proved that there exists a frozen vector
    for which the coding scheme satisfies reliability and security conditions. It
    is further shown that uniform distribution of the message is the necessary
    condition for achieving secrecy capacity.

    Estimation of RFID Tag Population Size by Gaussian Estimator

    Md Mahmudul Hasan, Shuangqing Wei, Ramachandran Vaidyanathan
    Comments: 7 pages, 6 figures, this paper is going to be submitted to IEEE International Symposium on Information Theory (ISIT) 2017
    Subjects: Information Theory (cs.IT)

    In this paper we propose a novel approach to estimating RFID tag population
    size. Unlike all previous {0,1} estimation schemes, we analysed our scheme
    under {0,1,e} and presented results under both {0,1} and {0,1,e} channel
    models. Under both the channel models we used a well justified Gaussian
    estimator for estimation. We have named our algorithm “Gaussian Estimation of
    RFID Tags,” namely, GERT. The most prominent feature of GERT is the quality
    with which it estimates a tag population size. We supported all the required
    approximations with detailed analytical work and counted for all the
    approximation errors when we considered the overall quality of the estimation.
    GERT is shown to be more cost effective than the previous methods suggested so
    far, under the same performance constraints.

    Representations of the Multicast Network Problem

    Sarah E. Anderson, Wael Halbawi, Nathan Kaplan, Hiram H. López, Felice Manganiello, Emina Soljanin, Judy Walker
    Comments: 24 pages, 19 figures
    Subjects: Information Theory (cs.IT)

    We approach the problem of linear network coding for multicast networks from
    different perspectives. We introduce the notion of the coding points of a
    network, which are edges of the network where messages combine and coding
    occurs. We give an integer linear program that leads to choices of paths
    through the network that minimize the number of coding points. We introduce the
    code graph of a network, a simplified directed graph that maintains the
    information essential to understanding the coding properties of the network.
    One of the main problems in network coding is to understand when the capacity
    of a multicast network is achieved with linear network coding over a finite
    field of size q. We explain how this problem can be interpreted in terms of
    rational points on certain algebraic varieties.

    Neural Offset Min-Sum Decoding

    Loren Lugosch, Warren J. Gross
    Subjects: Information Theory (cs.IT)

    Recently, it was shown that if multiplicative weights are assigned to the
    edges of a Tanner graph used in belief propagation decoding, it is possible to
    use deep learning techniques to find values for the weights which improve the
    error-correction performance of the decoder. Unfortunately, this approach
    requires many multiplications, which are generally expensive operations. In
    this paper, we suggest a more hardware-friendly approach in which offset
    min-sum decoding is augmented with learnable offset parameters. Our method uses
    no multiplications and has a parameter count less than half that of the
    multiplicative algorithm. This both speeds up training and provides a feasible
    path to hardware architectures. After describing our method, we compare the
    performance of the two neural decoding algorithms and show that our method
    achieves error-correction performance within 0.1 dB of the multiplicative
    approach and as much as 1 dB better than traditional belief propagation for the
    codes under consideration.

    Co-design of jump estimators and transmission policies for wireless multi-hop networks with fading channels [Extended]

    D. Dolz, D. E. Quevedo, I. Peñarrocha, A. S. Leong, R. Sanchis
    Subjects: Optimization and Control (math.OC); Information Theory (cs.IT)

    We study transmission power budget minimization of battery-powered nodes in a
    remote state estimation problem over multi-hop wireless networks. Communication
    links between nodes are subject to fading, thereby generating random dropouts.
    Relay nodes help to transmit measurements from distributed sensors to an
    estimator node. Hopping through each relay node introduces a unit delay.
    Motivated by the need for estimators with low computational and implementation
    cost, we propose a jump estimator whose modes depend on a Markovian parameter
    that describes measurement transmission outcomes over a finite interval. It is
    well known that transmission power helps to increase the reliability of
    measurement transmissions, at the expense of reducing the life-time of the
    nodes’ battery. Motivated by this, we derive a tractable iterative procedure,
    based on semi-definite programming, to design a finite set of filter gains, and
    associated power control laws to minimize the energy budget while guaranteeing
    an estimation performance level. This procedure allows us to tradeoff the
    complexity of the filter implementation with performance and energy use.

    Recent progress on conditional randomness

    Hayato Takahashi
    Comments: Presented at Probability Symposium RIMS (2016 Dec.), and Ergodic theory and related area (Tsukuba Univ. 2016 Nov.)
    Subjects: Logic (math.LO); Information Theory (cs.IT)

    In this article, recent progress on ML-randomness with respect to conditional
    probabilities is reviewed. In particular a new result of conditional randomness
    with respect to mutually singular probabilities are shown, which is a
    generalization of Hanssen’s result (2010) for Bernoulli processes.

    Distributed Clustering for Multiuser Networks through Coalition Formation

    Rami Mochaourab, Eduard Jorswieck, Mats Bengtsson
    Comments: 10 pages, 3 figures, submitted to IEEE Signal Processing Magazine [Lecture Notes]
    Subjects: Computer Science and Game Theory (cs.GT); Information Theory (cs.IT)

    Clustering mechanisms are essential in certain multiuser networks for
    achieving efficient resource utilization. This lecture note presents the theory
    of coalition formation as a useful tool for distributed clustering problems. We
    reveal the generality of the theory, and study complexity aspects which must be
    considered in multiuser networks.

    Effective and Extensible Feature Extraction Method Using Genetic Algorithm-Based Frequency-Domain Feature Search for Epileptic EEG Multi-classification

    Tingxi Wen, Zhongnan Zhang
    Comments: 17 pages, 9 figures
    Subjects: Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)

    In this paper, a genetic algorithm-based frequency-domain feature search
    (GAFDS) method is proposed for the electroencephalogram (EEG) analysis of
    epilepsy. In this method, frequency-domain features are first searched and then
    combined with nonlinear features. Subsequently, these features are selected and
    optimized to classify EEG signals. The extracted features are analyzed
    experimentally. The features extracted by GAFDS show remarkable independence,
    and they are superior to the nonlinear features in terms of the ratio of
    inter-class distance and intra-class distance. Moreover, the proposed feature
    search method can additionally search for features of instantaneous frequency
    in a signal after Hilbert transformation. The classification results achieved
    using these features are reasonable, thus, GAFDS exhibits good extensibility.
    Multiple classic classifiers (i.e., (k)-nearest neighbor, linear discriminant
    analysis, decision tree, AdaBoost, multilayer perceptron, and Na”ive Bayes)
    achieve good results by using the features generated by GAFDS method and the
    optimized selection. Specifically, the accuracies for the two-classification
    and three-classification problems may reach up to 99% and 97%, respectively.
    Results of several cross-validation experiments illustrate that GAFDS is
    effective in feature extraction for EEG classification. Therefore, the proposed
    feature selection and optimization model can improve classification accuracy.

    Structure of optimal strategies for remote estimation over Gilbert-Elliott channel with feedback

    Jhelum Chakravorty, Aditya Mahajan
    Comments: 9 pages
    Subjects: Optimization and Control (math.OC); Information Theory (cs.IT); Probability (math.PR)

    We investigate remote estimation over a Gilbert-Elliot channel with feedback.
    We assume that the channel state is observed by the receiver and fed back to
    the transmitter with one unit delay. In addition, the transmitter gets ACK/NACK
    feedback for successful/unsuccessful transmission. Using ideas from team
    theory, we establish the structure of optimal transmission and estimation
    strategies and identify a dynamic program to determine optimal strategies with
    that structure. We then consider first-order autoregressive sources where the
    noise process has unimodal and symmetric distribution. Using ideas from
    majorization theory, we show that the optimal transmission strategy has a
    threshold structure and the optimal estimation strategy is Kalman-like.




沪ICP备19023445号-2号
友情链接