
    arXiv Paper Daily: Mon, 3 Oct 2016

    Published by 我爱机器学习 (52ml.net) on 2016-10-03 00:00:00

    Computer Vision and Pattern Recognition

    Latent fingerprint minutia extraction using fully convolutional network

    Yao Tang, Fei Gao, Jufu Feng
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Minutiae have played an important role in fingerprint identification.
    Extracting effective minutiae from latent fingerprints, which are usually of
    poor quality, is difficult. Instead of relying on conventional hand-crafted
    features, a fully convolutional network (FCN) is learned end-to-end to extract
    minutiae from raw fingerprints at the pixel level. The FCN maps a raw
    fingerprint to a correspondingly-sized minutia-score map with a fixed stride,
    so a large number of candidate minutiae can be extracted with a given
    threshold. Small regions centered at these minutia points are then fed into a
    convolutional neural network (CNN) to reclassify the candidates and compute
    their orientations. The CNN shares its convolutional layers with the fully
    convolutional network to speed up computation. For the VGG model
    \cite{simonyan2014very}, detection takes 0.45 seconds per fingerprint on
    average on a GPU. On the NIST SD27 database we achieve a 53% recall rate and a
    53% precision rate, which beats many other algorithms. Visualizations of the
    trained model show that it successfully extracts features preserving the ridge
    information of a latent fingerprint.
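
    The two-stage pipeline described above can be pictured with a short sketch:
    threshold the FCN's minutia-score map into candidate locations, then crop
    fixed-size patches around each candidate for the second-stage CNN. This is only
    an illustrative NumPy mock-up with assumed values for the stride, threshold and
    patch size; it is not the authors' implementation.

```python
import numpy as np

def candidate_minutiae(score_map, stride=8, thresh=0.5):
    """Return (row, col) pixel coordinates of score-map cells above thresh."""
    rows, cols = np.where(score_map > thresh)
    return np.stack([rows * stride, cols * stride], axis=1)

def crop_patches(image, points, size=32):
    """Crop size x size patches centred on each candidate for reclassification."""
    half = size // 2
    padded = np.pad(image, half, mode="edge")
    return np.stack([padded[r:r + size, c:c + size] for r, c in points])

# toy usage
img = np.random.rand(512, 512)
scores = np.random.rand(64, 64)      # stand-in for the FCN output (stride 8)
pts = candidate_minutiae(scores)
patches = crop_patches(img, pts)     # these would be fed to the patch CNN
print(patches.shape)
```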

    A deep representation for depth images from synthetic data

    Fabio Maria Carlucci, Paolo Russo, Barbara Caputo
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Convolutional Neural Networks (CNNs) trained on large scale RGB databases
    have become the secret sauce in the majority of recent approaches for object
    categorization from RGB-D data. Thanks to colorization techniques, these
    methods exploit the filters learned from 2D images to extract meaningful
    representations in 2.5D. Still, the perceptual signature of these two kind of
    images is very different, with the first usually strongly characterized by
    textures, and the second mostly by silhouettes of objects. Ideally, one would
    like to have two CNNs, one for RGB and one for depth, each trained on a
    suitable data collection, able to capture the perceptual properties of each
    channel for the task at hand. This has not been possible so far, due to the
    lack of a suitable depth database. This paper addresses this issue, proposing
    to opt for synthetically generated images rather than collecting by hand a 2.5D
    large scale database. While being clearly a proxy for real data, synthetic
    images allow one to trade quality for quantity, making it possible to generate a
    virtually infinite amount of data. We show that a network trained on such a
    data collection, using the very same architecture typically used on visual
    data, learns very different filters, resulting in depth features (a) able to
    better characterize the different facets of depth images, and (b) complementary
    with respect to those derived from CNNs pre-trained on 2D datasets. Experiments
    on two publicly available databases show the power of our approach.

    Training a Feedback Loop for Hand Pose Estimation

    Markus Oberweger, Paul Wohlhart, Vincent Lepetit
    Comments: Presented at ICCV 2015 (oral)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose an entirely data-driven approach to estimating the 3D pose of a
    hand given a depth image. We show that we can correct the mistakes made by a
    Convolutional Neural Network trained to predict an estimate of the 3D pose by
    using a feedback loop. The components of this feedback loop are also Deep
    Networks, optimized using training data. They remove the need for fitting a 3D
    model to the input data, which requires both a carefully designed fitting
    function and algorithm. We show that our approach outperforms state-of-the-art
    methods, and is efficient as our implementation runs at over 400 fps on a
    single GPU.

    Caffeinated FPGAs: FPGA Framework For Convolutional Neural Networks

    Roberto DiCecco, Griffin Lacey, Jasmina Vasiljevic, Paul Chow, Graham Taylor, Shawki Areibi
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)

    Convolutional Neural Networks (CNNs) have gained significant traction in the
    field of machine learning, particularly due to their high accuracy in visual
    recognition. Recent works have pushed the performance of GPU implementations of
    CNNs to significantly improve their classification and training times. With
    these improvements, many frameworks have become available for implementing CNNs
    on both CPUs and GPUs, with no support for FPGA implementations. In this work
    we present a modified version of the popular CNN framework Caffe, with FPGA
    support. This allows for classification using CNN models and specialized FPGA
    implementations with the flexibility of reprogramming the device when
    necessary, seamless memory transactions between host and device, simple-to-use
    test benches, and the ability to create pipelined layer implementations. To
    validate the framework, we use the Xilinx SDAccel environment to implement an
    FPGA-based Winograd convolution engine and show that the FPGA layer can be used
    alongside other layers running on a host processor to run several popular CNNs
    (AlexNet, GoogLeNet, VGG A, OverFeat). The results show that our framework
    achieves 50 GFLOPS across 3×3 convolutions in the benchmarks. This is achieved
    within a practical framework, which will aid in future development of
    FPGA-based CNNs.
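
    As a side note on what a Winograd convolution engine computes, the toy sketch
    below reproduces the standard 1-D Winograd F(2,3) transform (Lavin-Gray style
    matrices): two outputs of a 3-tap filter from four inputs using four
    multiplications. This is generic textbook material, not the authors' FPGA
    kernel.

```python
import numpy as np

BT = np.array([[1, 0, -1, 0],
               [0, 1,  1, 0],
               [0, -1, 1, 0],
               [0, 1,  0, -1]], dtype=float)
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
AT = np.array([[1, 1, 1, 0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 outputs of 'valid' correlation."""
    return AT @ ((G @ g) * (BT @ d))

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, 1.0, -1.0])
direct = np.array([np.dot(d[i:i + 3], g) for i in range(2)])
assert np.allclose(winograd_f23(d, g), direct)
print(direct)
```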

    A CNN Cascade for Landmark Guided Semantic Part Segmentation

    Aaron Jackson, Michel Valstar, Georgios Tzimiropoulos
    Comments: accepted to Geometry Meets Deep Learning ECCV 2016 Workshop
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper proposes a CNN cascade for semantic part segmentation guided by
    pose-specific information encoded in terms of a set of landmarks (or
    keypoints). There is a large amount of prior work on each of these tasks
    separately, yet, to the best of our knowledge, this is the first time in the
    literature that the interplay between pose estimation and semantic part
    segmentation is investigated. To address this limitation of prior work, in this
    paper, we propose a CNN cascade of tasks that firstly performs landmark
    localisation and then uses this information as input for guiding semantic part
    segmentation. We applied our architecture to the problem of facial part
    segmentation and report large performance improvement over the standard
    unguided network on the most challenging face datasets. Testing code and models
    will be published online at this http URL
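
    One common way to feed landmark information into a segmentation network,
    consistent with the cascade described above, is to render one Gaussian heatmap
    per landmark and stack the heatmaps with the RGB channels. The sketch below is
    a generic illustration with assumed image size and landmark count, not the
    authors' architecture.

```python
import numpy as np

def landmark_heatmaps(landmarks, height, width, sigma=3.0):
    """Render one Gaussian heatmap per (x, y) landmark."""
    ys, xs = np.mgrid[0:height, 0:width]
    maps = [np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
            for x, y in landmarks]
    return np.stack(maps, axis=0)                 # (num_landmarks, H, W)

rgb = np.random.rand(3, 128, 128)                 # stand-in for the input image
pts = [(40, 60), (88, 60), (64, 96)]              # hypothetical landmark estimates
guided_input = np.concatenate([rgb, landmark_heatmaps(pts, 128, 128)], axis=0)
print(guided_input.shape)                         # (3 + 3, 128, 128) channels for the part-segmentation net
```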

    Two-stage Convolutional Part Heatmap Regression for the 1st 3D Face Alignment in the Wild (3DFAW) Challenge

    Adrian Bulat, Georgios Tzimiropoulos
    Comments: Winner of 3D Face Alignment in the Wild (3DFAW) Challenge, ECCV 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper describes our submission to the 1st 3D Face Alignment in the Wild
    (3DFAW) Challenge. Our method builds upon the idea of convolutional part
    heatmap regression [1], extending it for 3D face alignment. Our method
    decomposes the problem into two parts: (a) X,Y (2D) estimation and (b) Z
    (depth) estimation. At the first stage, our method estimates the X,Y
    coordinates of the facial landmarks by producing a set of 2D heatmaps, one for
    each landmark, using convolutional part heatmap regression. Then, these
    heatmaps, alongside the input RGB image, are used as input to a very deep
    subnetwork trained via residual learning for regressing the Z coordinate. Our
    method ranked 1st in the 3DFAW Challenge, surpassing the second best result by
    more than 22%.

    Digitizing Municipal Street Inspections Using Computer Vision

    Varun Adibhatla (ARGO Labs), Shi Fan (NYU Center for Data Science), Krystof Litomisky (ARGO Labs), Patrick Atwater (ARGO Labs)
    Comments: Presented at the Data For Good Exchange 2016
    Subjects: Computers and Society (cs.CY); Computer Vision and Pattern Recognition (cs.CV)

    “People want an authority to tell them how to value things. But they chose
    this authority not based on facts or results. They chose it because it seems
    authoritative and familiar.” – The Big Short

    The pavement condition index is one such familiar measure used by many US
    cities to measure street quality and justify billions of dollars spent every
    year on street repair. These billion-dollar decisions are based on evaluation
    criteria that are subjective and not representative. In this paper, we build
    upon our initial submission to D4GX 2015 that approaches this problem of
    information asymmetry in municipal decision-making.

    We describe a process to identify street-defects using computer vision
    techniques on data collected using the Street Quality Identification Device
    (SQUID). A User Interface to host a large quantity of image data towards
    digitizing the street inspection process and enabling actionable intelligence
    for a core public service is also described. This approach of combining device,
    data and decision-making around street repair enables cities to make targeted
    decisions about street repair and could lead to an anticipatory response which
    can result in significant cost savings. Lastly, we share lessons learnt from
    the deployment of SQUID in the city of Syracuse, NY.

    Multi-dimensional signal approximation with sparse structured priors using split Bregman iterations

    Yoann Isaac, Quentin Barthélemy, Cédric Gouy-Pailler, Michèle Sebag, Jamal Atif
    Subjects: Data Structures and Algorithms (cs.DS); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    This paper addresses the structurally-constrained sparse decomposition of
    multi-dimensional signals onto overcomplete families of vectors, called
    dictionaries. The contribution of the paper is threefold. Firstly, a generic
    spatio-temporal regularization term is designed and used together with the
    standard $\ell_1$ regularization term to enforce a sparse decomposition
    preserving the spatio-temporal structure of the signal. Secondly, an
    optimization algorithm based on the split Bregman approach is proposed to
    handle the associated optimization problem, and its convergence is analyzed.
    Our well-founded approach yields the same accuracy as other state-of-the-art
    algorithms, with significant gains in terms of convergence speed.
    Thirdly, the empirical validation of the approach on artificial and real-world
    problems demonstrates the generality and effectiveness of the method. On
    artificial problems, the proposed regularization subsumes the Total Variation
    minimization and recovers the expected decomposition. On the real-world problem
    of electro-encephalography brainwave decomposition, the approach outperforms
    similar approaches in terms of P300 evoked potentials detection, using
    structured spatial priors to guide the decomposition.
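
    For readers unfamiliar with split Bregman iterations, the sketch below applies
    them to the simplest related problem, an $\ell_1$-regularized least-squares
    decomposition onto a dictionary; the paper's spatio-temporal regularizer is
    omitted, and the dictionary size, lambda and mu are arbitrary illustrative
    choices.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def split_bregman_l1(D, y, lam=0.1, mu=1.0, iters=200):
    """Solve min_x 0.5*||y - D x||^2 + lam*||x||_1 using the splitting d = x."""
    n = D.shape[1]
    x, d, b = np.zeros(n), np.zeros(n), np.zeros(n)
    A = D.T @ D + mu * np.eye(n)          # fixed system matrix for the x-update
    for _ in range(iters):
        x = np.linalg.solve(A, D.T @ y + mu * (d - b))   # quadratic subproblem
        d = soft_threshold(x + b, lam / mu)              # shrinkage step
        b = b + x - d                                    # Bregman variable update
    return x

rng = np.random.default_rng(0)
D = rng.standard_normal((40, 100))        # overcomplete dictionary
x_true = np.zeros(100); x_true[[3, 17, 58]] = [1.0, -2.0, 1.5]
y = D @ x_true + 0.01 * rng.standard_normal(40)
print(np.round(split_bregman_l1(D, y)[[3, 17, 58]], 2))
```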


    Artificial Intelligence

    Technical Report: Graph-Structured Sparse Optimization for Connected Subgraph Detection

    Baojian Zhou, Feng Chen
    Comments: 11 pages, in the 2016 IEEE International Conference on Data Mining
    Subjects: Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS)

    Structured sparse optimization is an important and challenging problem for
    analyzing high-dimensional data in a variety of applications such as
    bioinformatics, medical imaging, social networks, and astronomy. Although a
    number of structured sparsity models have been explored, such as trees, groups,
    clusters, and paths, connected subgraphs have been rarely explored in the
    current literature. One of the main technical challenges is that there is no
    structured sparsity-inducing norm that can directly model the space of
    connected subgraphs, and there is no exact implementation of a projection
    oracle for connected subgraphs due to its NP-hardness. In this paper, we
    explore efficient approximate projection oracles for connected subgraphs, and
    propose two new efficient algorithms, namely, Graph-IHT and Graph-GHTP, to
    optimize a generic nonlinear objective function subject to a connectivity
    constraint on the support of the variables. Our proposed algorithms enjoy
    strong guarantees analogous to several current methods for sparsity-constrained
    optimization, such as Projected Gradient Descent (PGD), Approximate Model
    Iterative Hard Thresholding (AM-IHT), and Gradient Hard Thresholding Pursuit
    (GHTP) with respect to convergence rate and approximation accuracy. We apply
    our proposed algorithms to optimize several well-known graph scan statistics in
    several applications of connected subgraph detection as a case study, and the
    experimental results demonstrate that our proposed algorithms outperform
    state-of-the-art methods.
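
    The generic shape of the iterative-hard-thresholding-style methods mentioned
    above can be sketched as a projected gradient loop with a pluggable projection
    oracle. Below, a plain top-k (sparsity) projection stands in for the
    approximate connected-subgraph oracle of Graph-IHT, which is the paper's actual
    contribution; the step size and objective are toy choices.

```python
import numpy as np

def top_k_projection(x, k):
    """Keep the k largest-magnitude entries; a stand-in for the paper's
    approximate projection onto supports forming a connected subgraph."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def iht(grad_f, x0, k, step=0.1, iters=100, project=top_k_projection):
    """Minimize f subject to a structured-support constraint via IHT."""
    x = project(x0, k)
    for _ in range(iters):
        x = project(x - step * grad_f(x), k)
    return x

# toy quadratic objective f(x) = 0.5 * ||x - target||^2
target = np.array([0.0, 3.0, 0.0, -2.0, 0.5])
x_hat = iht(lambda x: x - target, np.zeros(5), k=2)
print(np.round(x_hat, 3))   # keeps the two largest entries of target
```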

    Characterization of experts in crowdsourcing platforms

    Amal Ben Rjab (LARODEC, DRUID), Mouloud Kharoune (DRUID), Zoltan Miklos (DRUID), Arnaud Martin (DRUID)
    Comments: in The 4th International Conference on Belief Functions, Sep 2016, Prague, Czech Republic
    Subjects: Artificial Intelligence (cs.AI)

    Crowdsourcing platforms make it possible to propose simple human intelligence
    tasks to a large number of participants, who then carry out these tasks. The
    workers often receive a small amount of money, or the platforms include some
    other incentive mechanism, for example increasing a worker's reputation score
    when the tasks are completed correctly. We address the problem of identifying
    experts among participants, that is, workers who tend to answer the questions
    correctly. Knowing who the reliable workers are could improve the quality of
    knowledge one can extract from responses. As opposed to other works in the
    literature, we assume that participants can give partial or incomplete
    responses, in case they are not sure that their answers are correct. We model
    such partial or incomplete responses with the help of belief functions, and we
    derive a measure that characterizes the expertise level of each participant.
    This measure is based on precision and exactitude degrees that represent two
    parts of the expertise level. The precision degree reflects the reliability
    level of the participants and the exactitude degree reflects the knowledge
    level of the participants. We also analyze our model through simulation and
    demonstrate that our richer model can lead to more reliable identification of
    experts.

    Structured Inference Networks for Nonlinear State Space Models

    Rahul G. Krishnan, Uri Shalit, David Sontag
    Comments: Main paper: 13 pages, 12 Figures
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Gaussian state space models have been used for decades as generative models
    of sequential data. They admit an intuitive probabilistic interpretation, have
    a simple functional form, and enjoy widespread adoption. We introduce a unified
    algorithm to efficiently learn a broad class of linear and non-linear state
    space models, including variants where the emission and transition
    distributions are modeled by deep neural networks. Our learning algorithm
    simultaneously learns a compiled inference network and the generative model,
    leveraging a structured variational approximation parameterized by recurrent
    neural networks to mimic the posterior distribution. We apply the learning
    algorithm to both synthetic and real-world datasets, demonstrating its
    scalability and versatility. We find that using the structured approximation to
    the posterior results in models with significantly higher held-out likelihood.


    Computation and Language

    Referential Uncertainty and Word Learning in High-dimensional, Continuous Meaning Spaces

    Michael Spranger, Katrien Beuls
    Comments: Published as Spranger, M. and Beuls, K. (2016). Referential uncertainty and word learning in high-dimensional, continuous meaning spaces. In Hafner, V. and Pitti, A., editors, Development and Learning and Epigenetic Robotics (ICDL-Epirob), 2016 Joint IEEE International Conferences on, 2016. IEEE
    Subjects: Computation and Language (cs.CL)

    This paper discusses lexicon word learning in high-dimensional meaning spaces
    from the viewpoint of referential uncertainty. We investigate various
    state-of-the-art Machine Learning algorithms and discuss the impact of scaling,
    representation and meaning space structure. We demonstrate that current Machine
    Learning techniques successfully deal with high-dimensional meaning spaces. In
    particular, we show that exponentially increasing dimensions linearly impact
    learner performance and that referential uncertainty from word sensitivity has
    no impact.

    Controlling Output Length in Neural Encoder-Decoders

    Yuta Kikuchi, Graham Neubig, Ryohei Sasano, Hiroya Takamura, Manabu Okumura
    Comments: 11 pages. To appear in EMNLP 2016
    Subjects: Computation and Language (cs.CL)

    Neural encoder-decoder models have shown great success in many sequence
    generation tasks. However, previous work has not investigated situations in
    which we would like to control the length of encoder-decoder outputs. This
    capability is crucial for applications such as text summarization, in which we
    have to generate concise summaries with a desired length. In this paper, we
    propose methods for controlling the output sequence length for neural
    encoder-decoder models: two decoding-based methods and two learning-based
    methods. Results show that our learning-based methods have the capability to
    control length without degrading summary quality in a summarization task.
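
    One of the simplest decoding-based ways to control output length, in the spirit
    of the methods above, is to mask the end-of-sequence token until a minimum
    length is reached and force it at a maximum length. The sketch below does this
    around a dummy next-token distribution; the model, vocabulary and token ids are
    placeholders, not the paper's system.

```python
import numpy as np

EOS = 0  # hypothetical end-of-sequence token id

def greedy_decode_with_length_control(next_token_logits, min_len, max_len):
    """next_token_logits(prefix) -> logits over the vocabulary (stand-in for a
    decoder); EOS is suppressed before min_len and forced at max_len."""
    prefix = []
    while True:
        if len(prefix) >= max_len:
            prefix.append(EOS)            # force stopping at the length budget
            break
        logits = next_token_logits(prefix).copy()
        if len(prefix) < min_len:
            logits[EOS] = -np.inf         # forbid stopping too early
        token = int(np.argmax(logits))
        prefix.append(token)
        if token == EOS:
            break
    return prefix

rng = np.random.default_rng(1)
dummy_model = lambda prefix: rng.standard_normal(10)   # 10-word toy vocabulary
print(greedy_decode_with_length_control(dummy_model, min_len=5, max_len=8))
```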


    Distributed, Parallel, and Cluster Computing

    Sprout: A functional caching approach to minimize service latency in erasure-coded storage

    Vaneet Aggarwal, Yih-Farn R. Chen, Tian Lan, Yu Xiang
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)

    Modern distributed storage systems often use erasure codes to protect against
    disk and node failures to increase reliability, while trying to meet the
    latency requirements of the applications and clients. Storage systems may have
    caches at the proxy or client ends in order to reduce the latency. In this
    paper, we consider a novel caching framework with erasure codes, called {\em
    functional caching}. Functional caching involves using erasure-coded chunks in
    the cache such that the code formed by the chunks in the storage nodes and the
    cache combined is a maximum-distance-separable (MDS) erasure code. Based on the
    arrival rates of different files, the placement of file chunks on the servers,
    and the service time distribution of the storage servers, an optimal functional
    caching placement and the access probabilities of file requests from different
    disks are considered. The proposed algorithm gives significant latency improvement in
    both simulations and a prototyped solution in an open-source, cloud storage
    deployment.
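
    To make the MDS property behind functional caching concrete, the toy sketch
    below encodes a k-symbol file into n coded chunks with a real-valued
    Vandermonde matrix, so any k chunks suffice to reconstruct the file and a chunk
    held in the cache simply reduces the number of chunks fetched from the storage
    nodes. A deployed system would use a finite-field code such as Reed-Solomon;
    this is purely illustrative and not the paper's scheme.

```python
import numpy as np

def mds_encode(data, n):
    """Encode k data symbols into n coded chunks via a Vandermonde matrix."""
    k = len(data)
    nodes = np.arange(1, n + 1, dtype=float)
    V = np.vander(nodes, k, increasing=True)      # any k rows are invertible
    return V @ data, V

def mds_decode(chunk_ids, chunks, V):
    """Recover the k data symbols from any k of the n coded chunks."""
    return np.linalg.solve(V[chunk_ids, :], chunks)

data = np.array([3.0, 1.0, 4.0, 1.0])             # k = 4 file symbols
coded, V = mds_encode(data, n=7)                   # 7 chunks spread over nodes

cached = [6]                                       # one chunk already in the cache
fetched = [0, 2, 5]                                # only k - 1 = 3 more from disks
ids = cached + fetched
print(np.round(mds_decode(ids, coded[ids], V), 6)) # recovers the original file
```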

    ERA Revisited: Theoretical and Experimental Evaluation

    Matevž Jekovec, Andrej Brodnik
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

    Efficient construction of the suffix tree given an input text is an active
    area of research from the time it was first introduced. Both theoretical
    computer scientists and engineers tackled the problem. In this paper we focus
    on the fastest practical suffix tree construction algorithm to date, ERA. We
    first provide a theoretical analysis of the algorithm assuming uniformly
    random text as input and using the PEM model of computation with respect to
    the lower bounds. Secondly, we empirically confirm the theoretical results in
    different test scenarios exposing the critical terms. Thirdly, we discuss the
    fundamental characteristics of the input text where the fastest suffix tree
    construction algorithms in practice fail. This paper serves as a foundation for
    further research in the parallel text indexing area.

    On the Worst-case Communication Overhead for Distributed Data Shuffling

    Mohamed Attia, Ravi Tandon
    Comments: To appear in Allerton 2016
    Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

    Distributed learning platforms for processing large scale data-sets are
    becoming increasingly prevalent. In typical distributed implementations, a
    centralized master node breaks the data-set into smaller batches for parallel
    processing across distributed workers to achieve speed-up and efficiency.
    Several computational tasks are of sequential nature, and involve multiple
    passes over the data. At each iteration over the data, it is common practice to
    randomly re-shuffle the data at the master node, assigning different batches
    for each worker to process. This random re-shuffling operation comes at the
    cost of extra communication overhead, since at each shuffle, new data points
    need to be delivered to the distributed workers.

    In this paper, we focus on characterizing the information theoretically
    optimal communication overhead for the distributed data shuffling problem. We
    propose a novel coded data delivery scheme for the case of no excess storage,
    where every worker can only store the assigned data batches under processing.
    Our scheme exploits a new type of coding opportunity and is applicable to any
    arbitrary shuffle, and for any number of workers. We also present an
    information theoretic lower bound on the minimum communication overhead for
    data shuffling, and show that the proposed scheme matches this lower bound for
    the worst-case communication overhead.
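
    The kind of coding opportunity exploited in coded data shuffling can be seen in
    the smallest possible example: after a shuffle, each of two workers needs a
    data point that the other currently stores, so the master broadcasts a single
    XOR instead of two separate transmissions. The sketch below is only this
    textbook two-worker illustration, not the paper's general scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 256, 8, dtype=np.uint8)   # data point currently held by worker 1
b = rng.integers(0, 256, 8, dtype=np.uint8)   # data point currently held by worker 2

# After re-shuffling, worker 1 is assigned b and worker 2 is assigned a.
# Uncoded delivery would cost two transmissions; coded delivery broadcasts one.
coded = a ^ b                                  # single broadcast from the master

b_at_worker1 = coded ^ a                       # worker 1 cancels what it stores
a_at_worker2 = coded ^ b                       # worker 2 cancels what it stores

assert np.array_equal(b_at_worker1, b) and np.array_equal(a_at_worker2, a)
print("one broadcast served both workers")
```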

    Caffeinated FPGAs: FPGA Framework For Convolutional Neural Networks

    Roberto DiCecco, Griffin Lacey, Jasmina Vasiljevic, Paul Chow, Graham Taylor, Shawki Areibi
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)

    Convolutional Neural Networks (CNNs) have gained significant traction in the
    field of machine learning, particularly due to their high accuracy in visual
    recognition. Recent works have pushed the performance of GPU implementations of
    CNNs to significantly improve their classification and training times. With
    these improvements, many frameworks have become available for implementing CNNs
    on both CPUs and GPUs, with no support for FPGA implementations. In this work
    we present a modified version of the popular CNN framework Caffe, with FPGA
    support. This allows for classification using CNN models and specialized FPGA
    implementations with the flexibility of reprogramming the device when
    necessary, seamless memory transactions between host and device, simple-to-use
    test benches, and the ability to create pipelined layer implementations. To
    validate the framework, we use the Xilinx SDAccel environment to implement an
    FPGA-based Winograd convolution engine and show that the FPGA layer can be used
    alongside other layers running on a host processor to run several popular CNNs
    (AlexNet, GoogLeNet, VGG A, OverFeat). The results show that our framework
    achieves 50 GFLOPS across 3×3 convolutions in the benchmarks. This is achieved
    within a practical framework, which will aid in future development of
    FPGA-based CNNs.

    Asynchronous Multi-Task Learning

    Inci M. Baytas, Ming Yan, Anil K. Jain, Jiayu Zhou
    Comments: IEEE International Conference on Data Mining (ICDM) 2016
    Subjects: Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

    Many real-world machine learning applications involve several learning tasks
    which are inter-related. For example, in the healthcare domain, we need to learn a
    predictive model of a certain disease for many hospitals. The models for each
    hospital may be different because of the inherent differences in the
    distributions of the patient populations. However, the models are also closely
    related because of the nature of the learning tasks modeling the same disease.
    By simultaneously learning all the tasks, the multi-task learning (MTL) paradigm
    performs inductive knowledge transfer among tasks to improve the generalization
    performance. When datasets for the learning tasks are stored at different
    locations, it may not always be feasible to transfer the data to provide a
    data-centralized computing environment due to various practical issues such as
    high data volume and privacy. In this paper, we propose a principled MTL
    framework for distributed and asynchronous optimization to address the
    aforementioned challenges. In our framework, the gradient update does not wait
    for gradient information to be collected from all the tasks. Therefore, the proposed
    method is very efficient when the communication delay is too high for some task
    nodes. We show that many regularized MTL formulations can benefit from this
    framework, including the low-rank MTL for shared subspace learning. Empirical
    studies on both synthetic and real-world datasets demonstrate the efficiency
    and effectiveness of the proposed framework.


    Learning

    Predicting the consequence of action in digital control state spaces

    Emmanuel Daucé
    Subjects: Learning (cs.LG); Systems and Control (cs.SY)

    The objective of this dissertation is to shed light on some fundamental
    impediments in learning control laws in continuous state spaces. In particular,
    if one wants to build artificial devices capable of learning motor tasks the same
    way they learn to classify signals and images, one needs to establish control
    rules that do not necessitate comparisons between quantities of the surrounding
    space. We propose, in that context, to take inspiration from the “end effector
    control” principle, as suggested by neuroscience studies, as opposed to the
    “displacement control” principle used in the classical control theory.

    Asynchronous Multi-Task Learning

    Inci M. Baytas, Ming Yan, Anil K. Jain, Jiayu Zhou
    Comments: IEEE International Conference on Data Mining (ICDM) 2016
    Subjects: Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

    Many real-world machine learning applications involve several learning tasks
    which are inter-related. For example, in the healthcare domain, we need to learn a
    predictive model of a certain disease for many hospitals. The models for each
    hospital may be different because of the inherent differences in the
    distributions of the patient populations. However, the models are also closely
    related because of the nature of the learning tasks modeling the same disease.
    By simultaneously learning all the tasks, the multi-task learning (MTL) paradigm
    performs inductive knowledge transfer among tasks to improve the generalization
    performance. When datasets for the learning tasks are stored at different
    locations, it may not always be feasible to transfer the data to provide a
    data-centralized computing environment due to various practical issues such as
    high data volume and privacy. In this paper, we propose a principled MTL
    framework for distributed and asynchronous optimization to address the
    aforementioned challenges. In our framework, the gradient update does not wait
    for gradient information to be collected from all the tasks. Therefore, the proposed
    method is very efficient when the communication delay is too high for some task
    nodes. We show that many regularized MTL formulations can benefit from this
    framework, including the low-rank MTL for shared subspace learning. Empirical
    studies on both synthetic and real-world datasets demonstrate the efficiency
    and effectiveness of the proposed framework.

    Algorithms for item categorization based on ordinal ranking data

    Josh Girson, Shuchin Aeron
    Comments: To appear in IEEE Allerton conference on computing, communications and control, 2016
    Subjects: Learning (cs.LG); Social and Information Networks (cs.SI)

    We present a new method for identifying the latent categorization of items
    based on their rankings. Complementing a recent work that uses a Dirichlet
    prior on preference vectors and variational inference, we show that this
    problem can be effectively dealt with using existing community detection
    algorithms, with the communities corresponding to item categories. In
    particular we convert the bipartite ranking data to a unipartite graph of item
    affinities, and apply community detection algorithms. In this context we modify
    an existing algorithm – namely the label propagation algorithm to a variant
    that uses the distance between the nodes for weighting the label propagation –
    to identify the categories. We propose and analyze a synthetic ordinal ranking
    model and show its relation to the recently much studied stochastic block
    model. We test our algorithms on synthetic data and compare performance with
    several popular community detection algorithms. We also test the method on real
    data sets of movie categorization from the MovieLens database. In all of the
    cases our algorithm is able to identify the categories for a suitable choice of
    tuning parameter.
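
    A minimal version of the pipeline described above (rankings to an item-affinity
    graph, then community detection by label propagation) might look like the
    sketch below. The affinity weight, which decays with the distance between two
    items' ranks, the edge-sparsification step, and the update rule are
    illustrative simplifications, not the exact variant proposed in the paper.

```python
import numpy as np

def affinity_from_rankings(rankings, n_items):
    """rankings: list of rankings, each an ordered list of item ids.
    Items ranked close together by many users get a large edge weight."""
    W = np.zeros((n_items, n_items))
    for order in rankings:
        pos = {item: r for r, item in enumerate(order)}
        for i in order:
            for j in order:
                if i != j:
                    W[i, j] += 1.0 / (1.0 + abs(pos[i] - pos[j]))
    W[W < np.percentile(W[W > 0], 60)] = 0.0   # keep only the strongest edges
    return W

def label_propagation(W, iters=20):
    """Each node repeatedly adopts the label carrying the largest total edge
    weight among its neighbours; groups converge to a common label."""
    labels = np.arange(W.shape[0])
    for _ in range(iters):
        for i in range(W.shape[0]):
            scores = np.bincount(labels, weights=W[i], minlength=len(labels))
            labels[i] = int(np.argmax(scores))
    return labels

# two hypothetical categories: items 0-2 and items 3-5
rankings = [[0, 1, 2, 3, 4, 5], [1, 0, 2, 4, 3, 5], [2, 1, 0, 5, 4, 3]]
W = affinity_from_rankings(rankings, 6)
print(label_propagation(W))   # items 0-2 share one label, items 3-5 another
```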

    Charged Point Normalization: An Efficient Solution to the Saddle Point Problem

    Armen Aghajanyan
    Subjects: Learning (cs.LG)

    Recently, the problem of local minima in very high dimensional non-convex
    optimization has been challenged and the problem of saddle points has been
    introduced. This paper introduces a dynamic type of normalization that forces
    the system to escape saddle points. Unlike other saddle point escaping
    algorithms, second order information is not utilized, and the system can be
    trained with an arbitrary gradient descent learner. The system drastically
    improves learning in a range of deep neural networks on various data-sets in
    comparison to non-CPN neural networks.

    Structured Inference Networks for Nonlinear State Space Models

    Rahul G. Krishnan, Uri Shalit, David Sontag
    Comments: Main paper: 13 pages, 12 Figures
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Gaussian state space models have been used for decades as generative models
    of sequential data. They admit an intuitive probabilistic interpretation, have
    a simple functional form, and enjoy widespread adoption. We introduce a unified
    algorithm to efficiently learn a broad class of linear and non-linear state
    space models, including variants where the emission and transition
    distributions are modeled by deep neural networks. Our learning algorithm
    simultaneously learns a compiled inference network and the generative model,
    leveraging a structured variational approximation parameterized by recurrent
    neural networks to mimic the posterior distribution. We apply the learning
    algorithm to both synthetic and real-world datasets, demonstrating its
    scalability and versatility. We find that using the structured approximation to
    the posterior results in models with significantly higher held-out likelihood.

    On the Worst-case Communication Overhead for Distributed Data Shuffling

    Mohamed Attia, Ravi Tandon
    Comments: To appear in Allerton 2016
    Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

    Distributed learning platforms for processing large scale data-sets are
    becoming increasingly prevalent. In typical distributed implementations, a
    centralized master node breaks the data-set into smaller batches for parallel
    processing across distributed workers to achieve speed-up and efficiency.
    Several computational tasks are of sequential nature, and involve multiple
    passes over the data. At each iteration over the data, it is common practice to
    randomly re-shuffle the data at the master node, assigning different batches
    for each worker to process. This random re-shuffling operation comes at the
    cost of extra communication overhead, since at each shuffle, new data points
    need to be delivered to the distributed workers.

    In this paper, we focus on characterizing the information theoretically
    optimal communication overhead for the distributed data shuffling problem. We
    propose a novel coded data delivery scheme for the case of no excess storage,
    where every worker can only store the assigned data batches under processing.
    Our scheme exploits a new type of coding opportunity and is applicable to any
    arbitrary shuffle, and for any number of workers. We also present an
    information theoretic lower bound on the minimum communication overhead for
    data shuffling, and show that the proposed scheme matches this lower bound for
    the worst-case communication overhead.

    Optimal spectral transportation with application to music transcription

    Rémi Flamary, Cédric Févotte, Nicolas Courty, Valentin Emiya
    Comments: NIPS 2016
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Sound (cs.SD)

    Many spectral unmixing methods rely on the non-negative decomposition of
    spectral data onto a dictionary of spectral templates. In particular,
    state-of-the-art music transcription systems decompose the spectrogram of the
    input signal onto a dictionary of representative note spectra. The typical
    measures of fit used to quantify the adequacy of the decomposition compare the
    data and template entries frequency-wise. As such, small displacements of
    energy from one frequency bin to another, as well as variations of timbre, can
    disproportionately harm the fit. We address these issues by means of optimal
    transportation and propose a new measure of fit that treats the frequency
    distributions of energy holistically as opposed to frequency-wise. Building on
    the harmonic nature of sound, the new measure is invariant to shifts of energy
    to harmonically-related frequencies, as well as to small and local
    displacements of energy. Equipped with this new measure of fit, the dictionary
    of note templates can be considerably simplified to a set of Dirac vectors
    located at the target fundamental frequencies (musical pitch values). This in
    turn gives rise to a very fast and simple decomposition algorithm that
    achieves state-of-the-art performance on real musical data.

    On Identification of Sparse Multivariable ARX Model: A Sparse Bayesian Learning Approach

    J. Jin, Y. Yuan, W. Pan, D. L. T. Pham, C. J. Tomlin, A. Webb, J. Goncalves
    Subjects: Systems and Control (cs.SY); Learning (cs.LG); Machine Learning (stat.ML)

    This paper begins with considering the identification of sparse linear
    time-invariant networks described by multivariable ARX models. Such models
    possess a relatively simple structure and are thus used as a benchmark to promote further
    research. With identifiability of the network guaranteed, this paper presents
    an identification method that infers both the Boolean structure of the network
    and the internal dynamics between nodes. Identification is performed directly
    from data without any prior knowledge of the system, including its order. The
    proposed method solves the identification problem using maximum a posteriori
    (MAP) estimation, but with inseparable penalties for complexity, both in terms
    of element (order of nonzero connections) and group sparsity (network
    topology). Such an approach is widely applied in Compressive Sensing (CS) and
    known as Sparse Bayesian Learning (SBL). We then propose a novel scheme that
    combines sparse Bayesian and group sparse Bayesian to efficiently solve the
    problem. The resulting algorithm has a form similar to the standard Sparse Group
    Lasso (SGL), while with known noise variance it simplifies to an exact re-weighted
    SGL. The method and the developed toolbox can be applied to infer networks from
    a wide range of fields, including systems biology applications such as
    signaling and genetic regulatory networks.

    Big Data analytics. Three use cases with R, Python and Spark

    Philippe Besse (IMT), Brendan Guillouet (IMT), Jean-Michel Loubes (IMT)
    Comments: in French, Apprentissage Statistique et Données Massives, Technip, 2017, Journées d'Etudes en Statistique
    Subjects: Applications (stat.AP); Learning (cs.LG)

    Management and analysis of big data are systematically associated with a data
    distributed architecture in the Hadoop and now Spark frameworks. This article
    offers an introduction for statisticians to these technologies by comparing the
    performance obtained by the direct use of three reference environments: R,
    Python Scikit-learn, Spark MLlib on three public use cases: character
    recognition, recommending films, categorizing products. As the main result, it
    appears that, while Spark is very efficient for data munging and for
    recommendation by collaborative filtering (non-negative factorization), the
    current implementations of conventional learning methods (logistic regression,
    random forests) in MLlib or SparkML compete poorly, or not at all, with the
    habitual use of these methods (R, Python Scikit-learn) in an integrated,
    non-distributed architecture.

    Social Computing for Mobile Big Data in Wireless Networks

    Xing Zhang, Zhenglei Yi, Zhi Yan, Geyong Min, Wenbo Wang, Sabita Maharjan, Yan Zhang
    Comments: 8 pages, 3 figures, 1 table
    Journal-ref: Computer, vol.49, no. 9, pp. 86-90, Sept. 2016
    Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Machine Learning (stat.ML)

    Mobile big data contains vast statistical features in various dimensions,
    including spatial, temporal, and the underlying social domain. Understanding
    and exploiting the features of mobile data from a social network perspective
    will be extremely beneficial to wireless networks, from planning, operation,
    and maintenance to optimization and marketing. In this paper, we categorize and
    analyze the big data collected from real wireless cellular networks. Then, we
    study the social characteristics of mobile big data and highlight several
    research directions for mobile big data in the social computing areas.

    Multi-dimensional signal approximation with sparse structured priors using split Bregman iterations

    Yoann Isaac, Quentin Barthélemy, Cédric Gouy-Pailler, Michèle Sebag, Jamal Atif
    Subjects: Data Structures and Algorithms (cs.DS); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    This paper addresses the structurally-constrained sparse decomposition of
    multi-dimensional signals onto overcomplete families of vectors, called
    dictionaries. The contribution of the paper is threefold. Firstly, a generic
    spatio-temporal regularization term is designed and used together with the
    standard $\ell_1$ regularization term to enforce a sparse decomposition
    preserving the spatio-temporal structure of the signal. Secondly, an
    optimization algorithm based on the split Bregman approach is proposed to
    handle the associated optimization problem, and its convergence is analyzed.
    Our well-founded approach yields the same accuracy as other state-of-the-art
    algorithms, with significant gains in terms of convergence speed.
    Thirdly, the empirical validation of the approach on artificial and real-world
    problems demonstrates the generality and effectiveness of the method. On
    artificial problems, the proposed regularization subsumes the Total Variation
    minimization and recovers the expected decomposition. On the real-world problem
    of electro-encephalography brainwave decomposition, the approach outperforms
    similar approaches in terms of P300 evoked potentials detection, using
    structured spatial priors to guide the decomposition.

    Max-plus statistical leverage scores

    James Hook
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    The statistical leverage scores of a complex matrix $A \in \mathbb{C}^{n\times
    d}$ record the degree of alignment between col$(A)$ and the coordinate axes in
    $\mathbb{C}^n$. These scores are used in random sampling algorithms for solving
    certain numerical linear algebra problems. In this paper we present a max-plus
    algebraic analogue of statistical leverage scores. We show that max-plus
    statistical leverage scores can be used to calculate the exact asymptotic
    behavior of the conventional statistical leverage scores of a generic matrix of
    Puiseux series, and also provide a novel way to approximate the conventional
    statistical leverage scores of a fixed complex matrix. The advantage of
    approximating a complex matrix's scores with max-plus scores is that the
    max-plus scores can be computed very quickly. This approximation is typically
    accurate to within an order of magnitude and should be useful in practical
    problems where the true scores are known to vary widely.
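
    For reference, the conventional statistical leverage scores mentioned above are
    the squared row norms of an orthonormal basis of col$(A)$; the sketch below
    computes them with a thin SVD. The max-plus analogue and its fast approximation
    are the paper's contribution and are not reproduced here.

```python
import numpy as np

def leverage_scores(A):
    """Conventional statistical leverage scores: l_i = ||U[i, :]||^2, where the
    columns of U form an orthonormal basis of col(A)."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return np.sum(np.abs(U) ** 2, axis=1)

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3)) + 1j * rng.standard_normal((8, 3))
scores = leverage_scores(A)
print(np.round(scores, 3), round(float(scores.sum()), 1))  # scores sum to d = 3
```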

    Tensor Completion by Alternating Minimization under the Tensor Train (TT) Model

    Wenqi Wang, Vaneet Aggarwal, Shuchin Aeron
    Subjects: Numerical Analysis (cs.NA); Information Theory (cs.IT); Learning (cs.LG)

    Using the matrix product state (MPS) representation of tensor train
    decompositions, in this paper we propose a tensor completion algorithm which
    alternates over the matrices (tensors) in the MPS representation. This
    development is motivated in part by the success of matrix completion algorithms
    which alternate over the (low-rank) factors. We comment on the computational
    complexity of the proposed algorithm and numerically compare it with existing
    methods employing low rank tensor train approximation for data completion as
    well as several other recently proposed methods. We show that our method is
    superior to existing ones for a variety of real settings.


    Information Theory

    On the Worst-case Communication Overhead for Distributed Data Shuffling

    Mohamed Attia, Ravi Tandon
    Comments: To appear in Allerton 2016
    Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

    Distributed learning platforms for processing large scale data-sets are
    becoming increasingly prevalent. In typical distributed implementations, a
    centralized master node breaks the data-set into smaller batches for parallel
    processing across distributed workers to achieve speed-up and efficiency.
    Several computational tasks are of sequential nature, and involve multiple
    passes over the data. At each iteration over the data, it is common practice to
    randomly re-shuffle the data at the master node, assigning different batches
    for each worker to process. This random re-shuffling operation comes at the
    cost of extra communication overhead, since at each shuffle, new data points
    need to be delivered to the distributed workers.

    In this paper, we focus on characterizing the information theoretically
    optimal communication overhead for the distributed data shuffling problem. We
    propose a novel coded data delivery scheme for the case of no excess storage,
    where every worker can only store the assigned data batches under processing.
    Our scheme exploits a new type of coding opportunity and is applicable to any
    arbitrary shuffle, and for any number of workers. We also present an
    information theoretic lower bound on the minimum communication overhead for
    data shuffling, and show that the proposed scheme matches this lower bound for
    the worst-case communication overhead.

    Bit-permuted coded modulation for polar codes

    Saurabha R Tavildar
    Comments: 6 pages; 14 figures
    Subjects: Information Theory (cs.IT)

    We consider the problem of using polar codes with higher order modulation
    over AWGN channels. Unlike prior work, we focus on using modulation independent
    polar codes. That is, the polar codes are not re-designed based on the
    modulation used. Instead, we propose bit-permuted coded modulation (BPCM): a
    technique for using the multilevel coding (MLC) approach for an arbitrary polar
    code. The BPCM technique exploits a natural connection between MLC and polar
    codes. It involves applying bit permutations prior to mapping the polar code to
    a higher order modulation. The bit permutations are designed, via density
    evolution, to match the rates provided by various bit levels of the higher
    order modulation to that of the polar code.

    We demonstrate performance of the BPCM technique using link simulations and
    density evolution for the AWGN channel. We compare the BPCM technique with the
    bit-interleaved coded modulation (BICM) technique. When using polar codes
    designed for BPSK modulation, we show gains for BPCM over BICM with random
    interleaver of up to 0.2 dB, 0.7 dB and 1.4 dB for 4-ASK, 8-ASK, and 16-ASK
    respectively.
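
    The basic mechanics of mapping a permuted codeword onto a higher-order
    constellation can be sketched as follows: permute the code bits, then map
    consecutive m-bit groups to 2^m-ASK amplitudes. The random permutation and the
    natural bit labelling below are placeholders; in the paper the permutation is
    designed via density evolution, and this sketch is not the authors' BPCM
    construction.

```python
import numpy as np

def bits_to_ask(bit_groups):
    """Natural-labelling map from m-bit groups to 2^m-ASK amplitudes
    {-(M-1), ..., -1, +1, ..., M-1}."""
    m = bit_groups.shape[1]
    levels = bit_groups @ (1 << np.arange(m - 1, -1, -1))
    return 2 * levels - (2 ** m - 1)

def bpcm_modulate(codeword, permutation, m):
    """Permute the code bits, then map consecutive m-bit groups to ASK symbols."""
    permuted = codeword[permutation]
    return bits_to_ask(permuted.reshape(-1, m))

rng = np.random.default_rng(0)
codeword = rng.integers(0, 2, 16)          # stand-in for a length-16 polar codeword
perm = rng.permutation(16)                 # illustrative permutation only
print(bpcm_modulate(codeword, perm, m=2))  # eight 4-ASK symbols
```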

    An Overview of Sustainable Green 5G Networks

    Qingqing Wu, Geoffrey Ye Li, Wen Chen, Derrick Wing Kwan Ng, Robert Schober
    Comments: Submitted for possible publication
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    The stringent requirements of a 1,000 times increase in data traffic and one
    millisecond round trip latency have made limiting the potentially tremendous
    ensuing energy consumption one of the most challenging problems for the design
    of the upcoming fifth-generation (5G) networks. To enable sustainable 5G
    networks, new technologies have been proposed to improve the system energy
    efficiency and alternative energy sources are introduced to reduce our
    dependence on traditional fossil fuels. In particular, various 5G techniques
    target the reduction of the energy consumption without sacrificing the
    quality-of-service. Meanwhile, energy harvesting technologies, which enable
    communication transceivers to harvest energy from various renewable resources
    and ambient radio frequency signals for communication, have drawn significant
    interest from both academia and industry. In this article, we provide an
    overview of the latest research on both green 5G techniques and energy
    harvesting for communication. In addition, some technical challenges and
    potential research topics for realizing sustainable green 5G networks are also
    identified.

    Modeling of $K$-tier Downlink Heterogeneous Cellular Networks over $\kappa$-$\mu$ Shadowed Fading Channels

    Young Jin Chun, Simon L. Cotton, Harpreet S. Dhillon, F. Javier Lopez-Martinez, José F. Paris, Seong Ki Yoo
    Subjects: Information Theory (cs.IT)

    Emerging cellular technologies such as those proposed for use in 5G
    communications will accommodate a wide range of usage scenarios with diverse
    link requirements. This will include the necessity to operate over a versatile
    set of wireless channels ranging from indoor to outdoor, from line-of-sight
    (LOS) to non-LOS, and from circularly symmetric scattering to environments
    which promote the clustering of scattered multipath waves. Unfortunately, many
    of the conventional fading models adopted in the literature to develop network
    models lack the flexibility to account for such disparate signal propagation
    mechanisms. To bridge the gap between theory and practical channels, we
    consider $\kappa$-$\mu$ shadowed fading, which contains, as special cases, the
    majority of the linear fading models proposed in the open literature, including
    Rayleigh, Rician, Nakagami-m, Nakagami-q, One-sided Gaussian, $\kappa$-$\mu$,
    $\eta$-$\mu$, and Rician shadowed, to name but a few. In particular, we apply an
    orthogonal expansion to represent the $\kappa$-$\mu$ shadowed fading
    distribution as a simplified series expression. Then, using the series
    expressions with stochastic geometry, we propose an analytic framework to
    evaluate the average of an arbitrary function of the SINR over $\kappa$-$\mu$
    shadowed fading channels. Using the proposed method, we evaluate the spectral
    efficiency, moments of the SINR, bit error probability and outage probability
    of a $K$-tier HetNet with $K$ classes of BSs, differing in terms of the
    transmit power, BS density, shadowing characteristics and small-scale fading.
    Building upon these results, we provide important new insights into the network
    performance of these emerging wireless applications while considering a diverse
    range of fading conditions and link qualities.

    Relative two-weight $\mathbb{Z}_2\mathbb{Z}_4$-additive Codes

    N. Annamalai, C. Durairajan
    Subjects: Information Theory (cs.IT)

    In this paper, we study relative two-weight $\mathbb{Z}_2
    \mathbb{Z}_4$-additive codes. It is shown that the Gray image of a two-distance
    $\mathbb{Z}_2 \mathbb{Z}_4$-additive code is a binary two-distance code and
    that the Gray image of a relative two-weight $\mathbb{Z}_2
    \mathbb{Z}_4$-additive code, with nontrivial binary part, is a linear binary
    relative two-weight code. The structure of relative two-weight $\mathbb{Z}_2
    \mathbb{Z}_4$-additive codes is described. Finally, we discuss the permutation
    automorphism group of $\mathbb{Z}_2 \mathbb{Z}_4$-additive codes.
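
    For context, the Gray map used with $\mathbb{Z}_2\mathbb{Z}_4$-additive codes
    sends each $\mathbb{Z}_4$ coordinate to two bits (0 -> 00, 1 -> 01, 2 -> 11,
    3 -> 10), leaves binary coordinates unchanged, and is an isometry from the Lee
    weight to the Hamming weight. The sketch below illustrates this standard map;
    it is general background, not anything specific to the paper.

```python
GRAY = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}   # Z4 -> F2^2
LEE_WEIGHT = {0: 0, 1: 1, 2: 2, 3: 1}

def gray_image(binary_part, quaternary_part):
    """Gray image of a Z2Z4-additive codeword: binary coordinates are kept,
    each Z4 coordinate is replaced by its two-bit Gray image."""
    image = list(binary_part)
    for q in quaternary_part:
        image.extend(GRAY[q])
    return tuple(image)

# the map preserves weight: Hamming weight of the image equals the Hamming
# weight of the binary part plus the Lee weight of the quaternary part
b, q = (1, 0, 1), (2, 3, 0, 1)
img = gray_image(b, q)
assert sum(img) == sum(b) + sum(LEE_WEIGHT[x] for x in q)
print(img)
```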

    Joint Channel Estimation / Data Detection in MIMO-FBMC/OQAM Systems – A Tensor-Based Approach

    Eleftherios Kofidis, Christos Chatzichristos, Andre L. F. de Almeida
    Subjects: Information Theory (cs.IT)

    Filter bank-based multicarrier (FBMC) systems are currently being considered
    as a prevalent candidate for replacing the long established cyclic prefix
    (CP)-based orthogonal frequency division multiplexing (CP-OFDM) in the physical
    layer of next generation communications systems. In particular, FBMC/OQAM has
    received increasing attention due to, among other features, its potential for
    maximum spectral efficiency. It suffers, however, from an intrinsic
    self-interference effect, which complicates signal processing tasks at the
    receiver, including synchronization, channel estimation and equalization. In a
    multiple-input multiple-output (MIMO) configuration, the multi-antenna
    interference has also to be taken into account. (Semi-)blind FBMC/OQAM
    receivers have been little studied so far and mainly for single-antenna
    systems. The problem of joint channel estimation and data detection in a
    MIMO-FBMC/OQAM system, given limited or no training information, is studied in
    this paper through a tensor-based approach in the light of the success of such
    techniques in OFDM applications. Simulation-based comparisons with CP-OFDM are
    included, for realistic transmission models.

    Sparse Methods for Direction-of-Arrival Estimation

    Zai Yang, Jian Li, Petre Stoica, Lihua Xie
    Comments: 64 pages, overview article
    Subjects: Information Theory (cs.IT)

    Direction of arrival (DOA) estimation refers to the process of retrieving the
    direction information of several electromagnetic waves/sources from the outputs
    of a number of receiving antennas that form a sensor array. DOA estimation is a
    major problem in array signal processing and has wide applications in radar,
    sonar, wireless communications, etc. The purpose of this article is to provide
    an overview of the recent work on sparse DOA estimation methods.

    Numerical Approach to Maximize SNR for CDMA Systems

    Hirofumi Tsuda, Ken Umeno
    Comments: 6 pages, 5 figures
    Subjects: Information Theory (cs.IT)

    Signal to Noise Ratio (SNR) is an important index for wireless
    communications. There are many methods for increasing SNR. In CDMA systems,
    spreading sequences are used. To increase SNR, we have to improve spreading
    sequences. In classical approaches, the expression of SNR is not differentiable
    in terms of the parameter of the spreading sequences even in no fading
    situations. Thus, we express it as the differentiable form and construct the
    non-linear programing for maximizing SNR. In particular, we solve the problem
    of maximizing SNR numerically by obtaining spreading sequences whose SNR is
    guaranteed to be high.

    On Weighted MSE Model for MIMO Transceiver Optimization

    Chengwen Xing, Yindi Jing, Yiqing Zhou
    Comments: 34 Pages, 4 figures
    Subjects: Information Theory (cs.IT)

    Mean-squared-error (MSE) is one of the most widely used performance metrics
    for the design and analysis of multiple-input multiple-output (MIMO)
    communications. Weighted MSE minimization, a more general formulation of MSE
    minimization, plays an important role in MIMO transceiver optimization. While
    this topic has a long history and has been extensively studied, existing
    treatments on the methods in solving the weighted MSE optimization are more or
    less sporadic and non-systematic. In this paper, we firstly review the two
    major methodologies, Lagrange multiplier method and majorization theory based
    method, and their common procedures in solving the weighted MSE minimization.
    Then some problems and limitations of the methods that were usually neglected
    or glossed over in the existing literature are pointed out. These problems are
    fundamental and of critical importance for the corresponding MIMO transceiver
    optimizations. In addition, a new extended matrix-field weighted MSE model is
    proposed. Its solutions and applications are discussed in detail. Compared
    with existing models, this new model has wider applications, e.g., nonlinear
    MIMO transceiver designs and capacity-maximization transceiver designs for
    general MIMO networks.

    The Generalized Reed-Muller codes in a modular group algebra

    Harinaivo Andriatahiny, Vololona Harinoro Rakotomalala
    Subjects: Information Theory (cs.IT)

    First we study some properties of the modular group algebra
    $\mathbb{F}_{p^r}[G]$ where $G$ is the additive group of a Galois ring of
    characteristic $p^r$ and $\mathbb{F}_{p^r}$ is the field of $p^r$ elements.
    Secondly, a description of the Generalized Reed-Muller codes over
    $\mathbb{F}_{p^r}$ in $\mathbb{F}_{p^r}[G]$ is presented.

    Sprout: A functional caching approach to minimize service latency in erasure-coded storage

    Vaneet Aggarwal, Yih-Farn R. Chen, Tian Lan, Yu Xiang
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)

    Modern distributed storage systems often use erasure codes to protect against
    disk and node failures to increase reliability, while trying to meet the
    latency requirements of the applications and clients. Storage systems may have
    caches at the proxy or client ends in order to reduce the latency. In this
    paper, we consider a novel caching framework with erasure codes, called {\em
    functional caching}. Functional caching involves using erasure-coded chunks in
    the cache such that the code formed by the chunks in the storage nodes and the
    cache combined is a maximum-distance-separable (MDS) erasure code. Based on the
    arrival rates of different files, the placement of file chunks on the servers,
    and the service time distribution of the storage servers, an optimal functional
    caching placement and the access probabilities of file requests from different
    disks are considered. The proposed algorithm gives significant latency improvement in
    both simulations and a prototyped solution in an open-source, cloud storage
    deployment.

    Contextuality in multipartite pseudo-telepathy graph games

    Peter Hoyer, Mehdi Mhalla, Simon Perdrix
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    Analyzing pseudo-telepathy graph games, we propose a way to build
    contextuality scenarios exhibiting quantum supremacy using graph states. We
    consider the combinatorial structures generating equivalent scenarios. We
    investigate which scenarios are more multipartite and show that there exist
    graphs generating scenarios with a linear multipartiteness width.

    The value of timing information in event-triggered control: The scalar case

    Mohammad Javad Khojasteh, Pavankumar Tallapragada, Jorge Cortés, Massimo Franceschetti
    Subjects: Optimization and Control (math.OC); Information Theory (cs.IT); Systems and Control (cs.SY)

    The problem of event-triggered control with rate limited communication is
    considered. For continuous-time scalar systems without disturbances, a phase
    transition behavior of the transmission rate required for stabilization as a
    function of the communication delay is revealed. It is shown that for low
    values of the delay the timing information carried by the triggering events is
    large and the system can be stabilized with any positive rate. On the other
    hand, when the delay exceeds a certain threshold that depends on the given
    triggering strategy, the timing information alone is not enough to achieve
    stabilization and the rate must begin to grow, eventually becoming larger than
    what is required by the classic data-rate theorem. The critical point where the
    transmission rate equals the one imposed by the data-rate theorem occurs when
    the delay equals the inverse of the entropy rate of the plant, representing the
    intrinsic rate at which the system generates information. At this critical
    point, the timing information supplied by event triggering is completely
    balanced by the information loss due to the communication delay. Exponential
    convergence guarantees are also discussed, and an explicit construction
    providing a sufficient condition for stabilization is given.

    One-Lee weight and two-Lee weight $\mathbb{Z}_2\mathbb{Z}_2[u]$-additive codes

    Zhenliang Lu, Shixin Zhu, Liqi Wang, Xiaoshan Kai
    Subjects: Rings and Algebras (math.RA); Information Theory (cs.IT)

    In this paper, we study one-Lee weight and two-Lee weight codes over
    $\mathbb{Z}_{2}\mathbb{Z}_{2}[u]$, where $u^{2}=0$. Some properties of one-Lee
    weight $\mathbb{Z}_{2}\mathbb{Z}_{2}[u]$-additive codes are given, and a
    complete classification of one-Lee weight
    $\mathbb{Z}_2\mathbb{Z}_2[u]$-additive formally self-dual codes is obtained.
    The structure of two-Lee weight projective $\mathbb{Z}_2\mathbb{Z}_2[u]$ codes
    is determined. Some optimal binary linear codes are obtained directly from
    one-Lee weight and two-Lee weight $\mathbb{Z}_{2}\mathbb{Z}_{2}[u]$-additive
    codes via the extended Gray map.

    Reversibility Problem of Multidimensional Finite Cellular Automata

    Chih-Hung Chang, Hasan Akın
    Subjects: Dynamical Systems (math.DS); Information Theory (cs.IT)

    While the reversibility of multidimensional cellular automata is undecidable
    and there exists a criterion for determining if a multidimensional linear
    cellular automaton is reversible, there are only a few results about the
    reversibility problem of multidimensional linear cellular automata under
    boundary conditions. This work proposes a criterion for testing the
    reversibility of a multidimensional linear cellular automaton under null
    boundary condition and an algorithm for the computation of its reverse, if it
    exists. The investigation of the dynamical behavior of a multidimensional
    linear cellular automaton under null boundary condition is equivalent to
    elucidating the properties of a block Toeplitz matrix. The proposed criterion
    significantly reduces the computational cost whenever the number of cells or the
    dimension is large; the discussion also applies to cellular automata under
    periodic boundary condition with a minor modification.
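
    To illustrate the kind of object involved, the sketch below builds the banded,
    Toeplitz-like transition matrix of a one-dimensional linear cellular automaton
    with n cells under a null boundary condition and tests its invertibility over
    $\mathbb{Z}_p$ by Gaussian elimination. This is a brute-force check on a toy
    rule, not the paper's criterion, and the rule weights are illustrative.

```python
import numpy as np

def transition_matrix(weights, n, p):
    """Banded transition matrix of a 1-D linear CA x'_i = w_l*x_{i-1} + w_m*x_i +
    w_r*x_{i+1} (mod p) with null boundaries (cells outside are fixed to 0)."""
    w_left, w_mid, w_right = weights
    T = np.zeros((n, n), dtype=int)
    for i in range(n):
        T[i, i] = w_mid
        if i > 0:
            T[i, i - 1] = w_left
        if i < n - 1:
            T[i, i + 1] = w_right
    return T % p

def invertible_mod_p(M, p):
    """Gaussian elimination over Z_p (p prime): True iff M is invertible."""
    M = M.copy() % p
    n = M.shape[0]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r, col]), None)
        if pivot is None:
            return False
        M[[col, pivot]] = M[[pivot, col]]
        inv = pow(int(M[col, col]), p - 2, p)       # Fermat inverse of the pivot
        M[col] = (M[col] * inv) % p
        for r in range(n):
            if r != col and M[r, col]:
                M[r] = (M[r] - M[r, col] * M[col]) % p
    return True

T = transition_matrix((1, 1, 1), n=4, p=2)   # rule: sum of the neighbourhood mod 2
print(invertible_mod_p(T, 2))                # reversible under null boundary for n = 4
```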

    Fast $L_1$-$L_2$ Minimization via a Proximal Operator

    Yifei Lou, Ming Yan
    Subjects: Optimization and Control (math.OC); Information Theory (cs.IT); Numerical Analysis (math.NA)

    This paper aims to develop new and fast algorithms for recovering a sparse
    vector from a small number of measurements, which is a fundamental problem in
    the field of compressive sensing (CS). Currently, CS favors incoherent systems,
    in which any two measurements are as little correlated as possible. In reality,
    however, many problems are coherent, and conventional methods such as $L_1$
    minimization do not work well. Recently, the difference of the $L_1$ and $L_2$
    norms, denoted as $L_1$-$L_2$, is shown to have superior performance over the
    classic $L_1$ method, but it is computationally expensive. We derive an
    analytical solution for the proximal operator of the $L_1$-$L_2$ metric, and it
    makes some fast $L_1$ solvers such as forward-backward splitting (FBS) and
    alternating direction method of multipliers (ADMM) applicable for $L_1$-$L_2$.
    We describe in detail how to incorporate the proximal operator into FBS and
    ADMM and show that the resulting algorithms are convergent under mild
    conditions. Both algorithms are shown to be much more efficient than the
    original implementation of $L_1$-$L_2$ based on a difference-of-convex approach
    in the numerical experiments.
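
    The way such a proximal operator slots into forward-backward splitting can be
    sketched generically: a gradient step on the smooth data-fit term followed by a
    proximal step on the regularizer. Below, the ordinary soft-thresholding
    ($L_1$) prox is used as a placeholder where the paper's closed-form
    $L_1$-$L_2$ prox would be substituted; this is a generic FBS loop, not the
    authors' algorithm.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 -- the placeholder where the paper's
    analytical prox of t*(||.||_1 - ||.||_2) would plug in."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def forward_backward(A, y, lam, prox=soft_threshold, iters=300):
    """FBS for min_x 0.5*||Ax - y||^2 + lam*R(x) with a pluggable prox of R."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - y)                  # forward (gradient) step
        x = prox(x - step * grad, lam * step)     # backward (proximal) step
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 80))
x_true = np.zeros(80); x_true[[5, 40, 41]] = [2.0, -1.5, 1.0]
y = A @ x_true
print(np.round(forward_backward(A, y, lam=0.05)[[5, 40, 41]], 2))
```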



