
    arXiv Paper Daily: Tue, 14 Feb 2017

    Published by 我爱机器学习 (52ml.net) on 2017-02-14 00:00:00

    Neural and Evolutionary Computing

    Feature Space Modeling Through Surrogate Illumination

    Adam Gaier, Alexander Asteroth, Jean-Baptiste Mouret
    Subjects: Neural and Evolutionary Computing (cs.NE); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (stat.ML)

    The MAP-Elites algorithm produces a set of high-performing solutions that
    vary according to features defined by the user. This technique has the
    potential to be a powerful tool for design space exploration, but is limited by
    the need for numerous evaluations. The Surrogate-Assisted Illumination
    algorithm (SAIL), introduced here, integrates approximative models and
    intelligent sampling of the objective function to minimize the number of
    evaluations required by MAP-Elites.

    The ability of SAIL to efficiently produce both accurate models and diverse
    high performing solutions is illustrated on a 2D airfoil design problem. The
    search space is divided into bins, each holding a design with a different
    combination of features. In each bin SAIL produces a better performing solution
    than MAP-Elites, and requires several orders of magnitude fewer evaluations.
    The CMA-ES algorithm was used to produce an optimal design in each bin: with
    the same number of evaluations required by CMA-ES to find a near-optimal
    solution in a single bin, SAIL finds solutions of similar quality in every bin.
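
    As a rough illustration of the idea, the following is a minimal sketch of
    surrogate-assisted illumination, assuming a toy 2-D search space, a 1-D
    feature descriptor and a scikit-learn Gaussian process as the approximative
    model; the acquisition strategy and the airfoil domain of the paper are not
    reproduced here.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor

        def feature(x):                    # hypothetical 1-D feature descriptor
            return x[0]

        def fitness(x):                    # expensive objective (toy stand-in)
            return -np.sum((x - 0.5) ** 2)

        rng = np.random.default_rng(0)
        X = rng.random((20, 2))            # initial designs, truly evaluated
        y = np.array([fitness(x) for x in X])
        gp = GaussianProcessRegressor().fit(X, y)

        bins = np.linspace(0.0, 1.0, 11)   # discretised feature space
        archive = {}                       # bin -> (design, optimistic value)
        for _ in range(2000):              # illuminate on the cheap surrogate
            x = rng.random(2)
            mu, sigma = gp.predict(x.reshape(1, -1), return_std=True)
            ucb = mu[0] + sigma[0]         # optimistic acquisition value
            b = int(np.digitize(feature(x), bins))
            if b not in archive or ucb > archive[b][1]:
                archive[b] = (x, ucb)      # one elite kept per feature bin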

    Group Scissor: Scaling Neuromorphic Computing Design to Big Neural Networks

    Yandan Wang, Wei Wen, Beiye Liu, Donald Chiarulli, Hai Li
    Comments: Accepted in DAC 2017
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

    The synapse crossbar is an elementary structure in Neuromorphic Computing
    Systems (NCS). However, the limited size of crossbars and heavy routing
    congestion impede the NCS implementations of big neural networks. In this
    paper, we propose a two-step framework (namely, group scissor) to scale NCS
    designs to big neural networks. The first step is rank clipping, which
    integrates low-rank approximation into the training to reduce total crossbar
    area. The second step is group connection deletion, which structurally
    prunes connections to reduce routing congestion between crossbars. Tested on
    convolutional neural networks of LeNet on the MNIST database and ConvNet on
    the CIFAR-10 database, our experiments show significant reduction of
    crossbar area and routing area in NCS designs. Without accuracy loss, rank
    clipping reduces total crossbar area to 13.62% and 51.81% in the NCS designs
    of LeNet and ConvNet, respectively. Following rank clipping, group
    connection deletion further reduces the routing area of LeNet and ConvNet to
    8.1% and 52.06%, respectively.
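
    A hedged sketch of the low-rank idea behind rank clipping: a weight matrix
    is replaced by a truncated SVD factorization, so two small crossbars replace
    one large one. The paper integrates this into training; here it is applied
    post hoc to a random matrix purely for illustration.

        import numpy as np

        def rank_clip(W, rank):
            """Approximate W by A @ B with A: (m, r) and B: (r, n)."""
            U, s, Vt = np.linalg.svd(W, full_matrices=False)
            A = U[:, :rank] * s[:rank]     # first small crossbar
            B = Vt[:rank, :]               # second small crossbar
            return A, B

        W = np.random.randn(512, 512)
        A, B = rank_clip(W, 64)
        # crossbar cells drop from 512*512 to 512*64 + 64*512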

    Whale swarm algorithm for function optimization

    Bing Zeng, Liang Gao, Xinyu Li
    Comments: 8 pages, 5 figures
    Subjects: Neural and Evolutionary Computing (cs.NE)

    A growing number of nature-inspired metaheuristic algorithms are being
    applied to real-world optimization problems, as they have some advantages
    over classical numerical optimization methods. This paper proposes a new
    nature-inspired metaheuristic called the Whale Swarm Algorithm for function
    optimization, inspired by whales’ behavior of communicating with each other
    via ultrasound for hunting. The proposed Whale Swarm Algorithm has been
    compared with several popular metaheuristic algorithms on comprehensive
    performance metrics. According to the experimental results, the Whale Swarm
    Algorithm shows quite competitive performance when compared with the other
    algorithms.


    Computer Vision and Pattern Recognition

    Cognitive Mapping and Planning for Visual Navigation

    Saurabh Gupta, James Davidson, Sergey Levine, Rahul Sukthankar, Jitendra Malik
    Comments: Under review for CVPR 2017. Project webpage: this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO)

    We introduce a neural architecture for navigation in novel environments. Our
    proposed architecture learns to map from first-person viewpoints and plans a
    sequence of actions towards goals in the environment. The Cognitive Mapper and
    Planner (CMP) is based on two key ideas: a) a unified joint architecture for
    mapping and planning, such that the mapping is driven by the needs of the
    planner, and b) a spatial memory with the ability to plan given an incomplete
    set of observations about the world. CMP constructs a top-down belief map of
    the world and applies a differentiable neural net planner to produce the next
    action at each time step. The accumulated belief of the world enables the agent
    to track visited regions of the environment. Our experiments demonstrate that
    CMP outperforms both reactive strategies and standard memory-based
    architectures and performs well in novel environments. Furthermore, we show
    that CMP can also achieve semantically specified goals, such as ‘go to a
    chair’.
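
    As a loose illustration of the planning half only, the sketch below runs
    value iteration on a small top-down grid; expressed as convolutions, the
    same sweep becomes a differentiable neural net planner. This numpy toy
    (with wrap-around boundaries via np.roll, for brevity) is an assumption,
    not the paper's architecture.

        import numpy as np

        def value_iteration(reward, passable, iters=50, gamma=0.95):
            V = np.zeros_like(reward)
            for _ in range(iters):
                # best neighbour value under 4-connected moves
                shifted = [np.roll(V, s, axis=a) for a in (0, 1) for s in (1, -1)]
                V = reward + gamma * passable * np.max(shifted, axis=0)
            return V

        passable = np.ones((8, 8)); passable[3, 1:7] = 0   # a wall
        reward = np.zeros((8, 8)); reward[7, 7] = 1.0      # the goal cell
        V = value_iteration(reward, passable)
        # the next action at a cell moves towards its highest-value neighbour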

    Estimation of the volume of the left ventricle from MRI images using deep neural networks

    Fangzhou Liao, Xi Chen, Xiaolin Hu, Sen Song
    Comments: 10 pages, 10 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Segmenting human left ventricle (LV) in magnetic resonance imaging (MRI)
    images and calculating its volume are important for diagnosing cardiac
    diseases. In 2016, Kaggle organized a competition to estimate the volume of LV
    from MRI images. The dataset consisted of a large number of cases, but only
    provided systole and diastole volumes as labels. We designed a system based on
    neural networks to solve this problem. It began with a detector combined with a
    neural network classifier for detecting regions of interest (ROIs) containing
    LV chambers. Then a deep neural network named hypercolumns fully convolutional
    network was used to segment LV in ROIs. The 2D segmentation results were
    integrated across different images to estimate the volume. With ground-truth
    volume labels, this model was trained end-to-end. To improve the result, an
    additional dataset with only segmentation label was used. The model was trained
    alternately on these two datasets with different types of teaching signals. We
    also proposed a variance estimation method for the final prediction. Our
    algorithm ranked 4th on the test set in this competition.
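
    A minimal sketch of the final integration step, assuming short-axis slices
    with known pixel spacing and slice thickness; the ROI detector and the
    hypercolumns fully convolutional network are not reproduced here.

        import numpy as np

        def lv_volume_ml(masks, pixel_spacing_mm, slice_thickness_mm):
            """masks: list of binary 2-D segmentations, one per short-axis slice."""
            pixel_area_mm2 = pixel_spacing_mm[0] * pixel_spacing_mm[1]
            areas_mm2 = [m.sum() * pixel_area_mm2 for m in masks]
            return sum(areas_mm2) * slice_thickness_mm / 1000.0  # mm^3 -> mL

        toy_masks = [np.ones((8, 8), dtype=bool)] * 10           # stand-in data
        print(lv_volume_ml(toy_masks, (1.5, 1.5), 8.0))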

    Online People Tracking and Identification with RFID and Kinect

    Xinyu Li, Yanyi Zhang, Ivan Marsic, Randall S. Burd
    Comments: 8 Pages, 8 Figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We introduce a novel, accurate and practical system for real-time people
    tracking and identification. We used a Kinect V2 sensor for tracking that
    generates a body skeleton for up to six people in the view. We perform
    identification using both Kinect and passive RFID, by first measuring the
    velocity vector of each person’s skeleton and of their RFID tag, using the
    positions of the RFID reader antennas as reference points, and then finding
    the best match between skeletons and tags. We introduce a method for
    synchronizing Kinect
    data, which is captured regularly, with irregular or missing RFID data
    readouts. Our experiments show centimeter-level people tracking resolution with
    80% average identification accuracy for up to six people in indoor
    environments, which meets the needs of many applications. Our system can
    preserve user privacy and work with different lighting.
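
    A hedged sketch of the matching step: given velocity estimates for each
    Kinect skeleton and each RFID tag, find the one-to-one assignment that
    minimizes velocity disagreement. Estimating the tag velocities from the
    antenna geometry is the harder part of the paper and is not reproduced.

        import numpy as np
        from scipy.optimize import linear_sum_assignment

        skeleton_vel = np.random.randn(4, 3)   # 4 tracked skeletons, 3-D velocity
        tag_vel = np.random.randn(4, 3)        # 4 RFID tags

        cost = np.linalg.norm(skeleton_vel[:, None, :] - tag_vel[None, :, :], axis=2)
        rows, cols = linear_sum_assignment(cost)
        for s, t in zip(rows, cols):
            print(f"skeleton {s} identified as tag {t}")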

    An Efficient Decomposition Framework for Discriminative Segmentation with Supermodular Losses

    Jiaqian Yu, Matthew B. Blaschko
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Several supermodular losses have been shown to improve the perceptual quality
    of image segmentation in a discriminative framework such as a structured output
    support vector machine (SVM). These loss functions do not necessarily have the
    same structure as the one used by the segmentation inference algorithm, and in
    general, we may have to resort to generic submodular minimization algorithms
    for loss augmented inference. Although these come with polynomial time
    guarantees, they are not practical to apply to image-scale data. Many
    supermodular losses come with strong optimization guarantees, but are not
    readily incorporated in a loss augmented graph cuts procedure. This motivates
    our strategy of employing the alternating direction method of multipliers
    (ADMM) decomposition for loss augmented inference. In doing so, we create a new
    API for the structured SVM that separates the maximum a posteriori (MAP)
    inference of the model from the loss augmentation during training. In this way,
    we gain computational efficiency, making new choices of loss functions
    practical for the first time, while simultaneously making the inference
    algorithm employed during training closer to the test time procedure. We show
    improvement both in accuracy and computational performance on the Microsoft
    Research Grabcut database and a brain structure segmentation task, empirically
    validating the use of several supermodular loss functions during training, and
    the improved computational properties of the proposed ADMM approach over the
    Fujishige-Wolfe minimum norm point algorithm.

    Unsupervised temporal context learning using convolutional neural networks for laparoscopic workflow analysis

    Sebastian Bodenstedt (1), Martin Wagner (2), Darko Katić (1), Patrick Mietkowski (2), Benjamin Mayer (2), Hannes Kenngott (2), Beat Müller-Stich (2), Rüdiger Dillmann (1), Stefanie Speidel (1) ((1) Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Karlsruhe, (2) Department of General, Visceral and Transplant Surgery, University of Heidelberg, Heidelberg)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Computer-assisted surgery (CAS) aims to provide the surgeon with the right
    type of assistance at the right moment. Such assistance systems are especially
    relevant in laparoscopic surgery, where CAS can alleviate some of the drawbacks
    that surgeons incur. For many assistance functions, e.g. displaying the
    location of a tumor at the appropriate time or suggesting what instruments to
    prepare next, analyzing the surgical workflow is a prerequisite. Since
    laparoscopic interventions are performed via endoscope, the video signal is an
    obvious sensor modality to rely on for workflow analysis.

    Image-based workflow analysis tasks in laparoscopy, such as phase
    recognition, skill assessment, video indexing or automatic annotation, require
    a temporal distinction between video frames. Generally, computer-vision-based
    methods that generalize from previously seen data are used. For training such
    methods, large amounts of annotated data are necessary. Annotating surgical
    data requires expert knowledge, therefore collecting a sufficient amount of
    data is difficult, time-consuming and not always feasible.

    In this paper, we address this problem by presenting an unsupervised method
    for training a convolutional neural network (CNN) to differentiate between
    laparoscopic video frames on a temporal basis. We extract video frames at
    regular intervals from 324 unlabeled laparoscopic interventions, resulting in a
    dataset of approximately 2.2 million images. From this dataset, we extract
    image pairs from the same video and train a CNN to determine their temporal
    order. To solve this problem, the CNN has to extract features that are relevant
    for comprehending laparoscopic workflow.

    Furthermore, we demonstrate that such a CNN can be adapted for surgical
    workflow segmentation. We performed image-based workflow segmentation on a
    publicly available dataset of 7 cholecystectomies and 9 colorectal
    interventions.
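
    A minimal sketch of the self-supervised labelling the method relies on:
    sample two frames from the same video and label the pair by temporal order,
    giving the CNN a training signal without any manual annotation. The frame
    paths below are hypothetical.

        import random

        def make_pair(frame_paths):
            i, j = random.sample(range(len(frame_paths)), 2)
            label = 1 if i < j else 0      # 1: first input comes earlier in time
            return (frame_paths[i], frame_paths[j]), label

        frames = [f"video007/frame_{k:05d}.png" for k in range(1000)]
        (first, second), label = make_pair(frames)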

    Underwater Optical Image Processing: A Comprehensive Review

    Huimin Lu, Yujie Li, Yudong Zhang, Min Chen, Seiichi Serikawa, Hyoungseop Kim
    Comments: 14 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Underwater cameras are widely used to observe the sea floor. They are usually
    included in autonomous underwater vehicles, unmanned underwater vehicles, and
    in situ ocean sensor networks. Despite being an important sensor for monitoring
    underwater scenes, recent underwater camera sensors face many issues.
    Because of the light transmission characteristics of water and the
    biological activity at the sea floor, the acquired underwater images often
    suffer from scattering and large amounts of noise. Over the last five years,
    many
    methods have been proposed to overcome traditional underwater imaging problems.
    This paper aims to review the state-of-the-art techniques in underwater image
    processing by highlighting the contributions and challenges presented in over
    40 papers. We present an overview of various underwater image processing
    approaches, such as underwater image descattering, underwater image color
    restoration, and underwater image quality assessments. Finally, we summarize
    the future trends and challenges in designing and processing underwater imaging
    sensors.

    Sparse Representation based Multi-sensor Image Fusion: A Review

    Qiang Zhang, Yi Liu, Rick S. Blum, Jungong Han, Dacheng Tao
    Comments: 19 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    As a result of several successful applications in computer vision and image
    processing, sparse representation (SR) has attracted significant attention in
    multi-sensor image fusion. Unlike the traditional multiscale transforms (MSTs)
    that presume the basis functions, SR learns an over-complete dictionary from a
    set of training images for image fusion, and it achieves more stable and
    meaningful representations of the source images. By doing so, the SR-based
    fusion methods generally outperform the traditional MST-based image fusion
    methods in both subjective and objective tests. In addition, they are less
    susceptible to mis-registration among the source images, thus facilitating the
    practical applications. This survey paper proposes a systematic review of the
    SR-based multi-sensor image fusion literature, highlighting the pros and cons
    of each category of approaches. Specifically, we start by performing a
    theoretical investigation of the entire system from three key algorithmic
    aspects: (1) sparse representation models; (2) dictionary learning methods; and
    (3) activity levels and fusion rules. Subsequently, we show how the existing
    works address these scientific problems and design the appropriate fusion rules
    for each application, such as multi-focus image fusion and multi-modality
    (e.g., infrared and visible) image fusion. Finally, we carry out some
    experiments to evaluate the impact of these three algorithmic components on the
    fusion performance when dealing with different applications. This article is
    expected to serve as a tutorial and source of reference for researchers
    preparing to enter the field or who desire to employ the sparse representation
    theory in other fields.
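
    As a hedged sketch of a typical SR-based fusion rule from the surveyed
    literature: sparse-code corresponding patches from two source images over a
    shared dictionary and keep, per patch, the code with the larger l1 activity
    level. Dictionary learning and patch extraction/reassembly are omitted, and
    the toy dictionary below is an assumption.

        import numpy as np
        from sklearn.linear_model import orthogonal_mp

        def fuse_patches(p1, p2, D, k=5):
            """p1, p2: flattened patches; D: (dim, n_atoms) dictionary."""
            a1 = orthogonal_mp(D, p1, n_nonzero_coefs=k)
            a2 = orthogonal_mp(D, p2, n_nonzero_coefs=k)
            a = a1 if np.abs(a1).sum() >= np.abs(a2).sum() else a2  # max-l1 rule
            return D @ a                                            # fused patch

        D = np.linalg.qr(np.random.randn(64, 64))[0]  # toy orthonormal dictionary
        fused = fuse_patches(np.random.randn(64), np.random.randn(64), D)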

    A Novel Weight-Shared Multi-Stage Network Architecture of CNNs for Scale Invariance

    Ryo Takahashi, Takashi Matsubara, Kuniaki Uehara
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Convolutional neural networks (CNNs) have demonstrated remarkable results in
    image classification tasks for benchmark and practical uses. CNNs with
    deeper architectures have recently achieved higher performance thanks to
    their numerous parameters and the resulting high expression ability.
    However, CNNs
    have a problem of limited robustness to geometric transformation of objects in
    images such as scaling and rotation. This problem is considered to limit
    performance improvement of the deep CNNs but there is no established solution.
    This study focuses on scale transformation and proposes a novel network
    architecture called weight-shared multi-stage network (WSMS-Net), which enables
    the existing deep CNNs, such as ResNet and DenseNet, to acquire robustness to
    scaling of objects. The WSMS-Net architecture consists of multiple stages of
    CNNs and is easily combined with existing deep CNNs. This study demonstrates
    that existing deep CNNs combined with the proposed WSMS-Net achieve higher
    accuracy on image classification tasks with only a small increase in the
    number of parameters.
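
    A hedged PyTorch sketch of the weight-sharing idea: the same convolutional
    stage is applied to the input at several scales and the per-scale features
    are fused. Stage depth, the fusion scheme and the ResNet/DenseNet backbones
    follow the paper only loosely here.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class WSMSBlock(nn.Module):
            def __init__(self, channels=16, scales=(1.0, 0.5)):
                super().__init__()
                self.scales = scales
                self.stage = nn.Sequential(        # weights shared across scales
                    nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())

            def forward(self, x):
                feats = []
                for s in self.scales:
                    xs = F.interpolate(x, scale_factor=s) if s != 1.0 else x
                    f = self.stage(xs)             # same parameters at every scale
                    feats.append(F.adaptive_avg_pool2d(f, 1).flatten(1))
                return torch.cat(feats, dim=1)

        out = WSMSBlock()(torch.randn(2, 3, 32, 32))   # -> shape (2, 32)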

    Crossing Nets: Dual Generative Models with a Shared Latent Space for Hand Pose Estimation

    Chengde Wan, Thomas Probst, Luc Van Gool, Angela Yao
    Comments: 10 pages, 5 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    State-of-the-art methods for 3D hand pose estimation from depth images
    require large amounts of annotated training data. We propose to model the
    statistical relationships of 3D hand poses and corresponding depth images using
    two deep generative models with a shared latent space. By design, our
    architecture allows for learning from unlabeled image data in a semi-supervised
    manner. Assuming a one-to-one mapping between a pose and a depth map, any given
    point in the shared latent space can be projected into both a hand pose and a
    corresponding depth map. Regressing the hand pose can then be done by learning
    a discriminator to estimate the posterior of the latent pose given some depth
    map. To improve generalization and to better exploit unlabeled depth maps, we
    jointly train a generator and a discriminator. At each iteration, the generator
    is updated with the back-propagated gradient from the discriminator to
    synthesize realistic depth maps of the articulated hand, while the
    discriminator benefits from an augmented training set of synthesized and
    unlabeled samples. The proposed discriminator network architecture is highly
    efficient and runs at 90 FPS on the CPU, with accuracies comparable to or
    better than the state of the art on 3 publicly available benchmarks.

    ArtGAN: Artwork Synthesis with Conditional Categorical GANs

    Wei Ren Tan, Chee Seng Chan, Hernan Aguirre, Kiyoshi Tanaka
    Comments: 10 pages, 10 figures, submitted to ICIP2017 (extension version)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper proposes an extension to Generative Adversarial Networks (GANs),
    named ArtGAN, to synthetically generate more challenging and complex images,
    such as artwork that has abstract characteristics. This is in contrast to
    most of the current solutions, which focus on generating natural images such
    as room interiors, birds, flowers and faces. The key innovation of our work
    is to allow back-propagation of the loss function w.r.t. the labels
    (randomly assigned to each generated image) to the generator from the
    discriminator. With the feedback from the label information, the generator
    is able to learn faster and achieve better generated image quality.
    Empirically, we show that the proposed ArtGAN is capable of creating
    realistic artwork, as well as generating compelling real-world images that
    globally look natural with clear shapes on CIFAR-10.

    Reverse Classification Accuracy: Predicting Segmentation Performance in the Absence of Ground Truth

    Vanya V. Valindria, Ioannis Lavdas, Wenjia Bai, Konstantinos Kamnitsas, Eric O. Aboagye, Andrea G. Rockall, Daniel Rueckert, Ben Glocker
    Comments: Accepted article to appear in IEEE Transactions on Medical Imaging 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    When integrating computational tools such as automatic segmentation into
    clinical practice, it is of utmost importance to be able to assess the level of
    accuracy on new data, and in particular, to detect when an automatic method
    fails. However, this is difficult to achieve due to the absence of ground truth.
    Segmentation accuracy on clinical data might be different from what is found
    through cross-validation because validation data is often used during
    incremental method development, which can lead to overfitting and unrealistic
    performance expectations. Before deployment, performance is quantified using
    different metrics, for which the predicted segmentation is compared to a
    reference segmentation, often obtained manually by an expert. But little is
    known about the real performance after deployment when a reference is
    unavailable. In this paper, we introduce the concept of reverse classification
    accuracy (RCA) as a framework for predicting the performance of a segmentation
    method on new data. In RCA we take the predicted segmentation from a new image
    to train a reverse classifier which is evaluated on a set of reference images
    with available ground truth. The hypothesis is that if the predicted
    segmentation is of good quality, then the reverse classifier will perform well
    on at least some of the reference images. We validate our approach on
    multi-organ segmentation with different classifiers and segmentation methods.
    Our results indicate that it is indeed possible to predict the quality of
    individual segmentations, in the absence of ground truth. Thus, RCA is ideal
    for integration into automatic processing pipelines in clinical routine and as
    part of large-scale image analysis studies.
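
    A minimal sketch of the RCA loop: treat the predicted segmentation of the
    new image as pseudo ground truth, fit a "reverse" classifier on it, and
    score that classifier on reference images that do have ground truth. The
    logistic regression and Dice score below are stand-ins for the classifier
    and metric choices studied in the paper, and both classes are assumed
    present in the prediction.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def dice(a, b):
            return 2 * np.logical_and(a, b).sum() / (a.sum() + b.sum() + 1e-9)

        def rca_score(new_feats, new_pred, references):
            """new_feats: (H, W, F) features; new_pred: (H, W) predicted mask;
            references: list of (features, ground_truth_mask) pairs."""
            clf = LogisticRegression(max_iter=200).fit(
                new_feats.reshape(-1, new_feats.shape[-1]), new_pred.ravel())
            scores = []
            for feats, gt in references:
                pred = clf.predict(
                    feats.reshape(-1, feats.shape[-1])).reshape(gt.shape)
                scores.append(dice(pred, gt))
            return max(scores)  # a high best-case Dice predicts a good segmentation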

    Enhanced Local Binary Patterns for Automatic Face Recognition

    Pavel Král, Antonín Vrba
    Comments: Submitted for ICIP 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents a novel automatic face recognition approach based on
    local binary patterns (LBP). The LBP descriptor considers a local
    neighbourhood of a pixel to compute its features. This method is not very
    robust to image noise, variance and differing illumination conditions. In
    this paper, we
    address these issues and extend the original LBP operator by considering more
    pixels and different neighbourhoods to compute the feature vector. The proposed
    method is evaluated on two benchmark corpora, namely UFI and FERET face
    datasets. We experimentally show that our approach significantly outperforms
    several other state-of-the-art methods and is particularly effective in real
    conditions where the above-mentioned issues are pronounced.
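
    A sketch of the basic 3x3 LBP code that the paper extends: each of the
    eight neighbours contributes one bit depending on whether it is at least as
    bright as the centre pixel, giving one 8-bit code per interior pixel.

        import numpy as np

        def lbp_image(img):
            img = img.astype(np.int32)
            c = img[1:-1, 1:-1]                     # centre pixels
            offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                       (1, 1), (1, 0), (1, -1), (0, -1)]
            code = np.zeros_like(c)
            for bit, (dy, dx) in enumerate(offsets):
                nb = img[1 + dy:img.shape[0] - 1 + dy,
                         1 + dx:img.shape[1] - 1 + dx]
                code |= (nb >= c).astype(np.int32) << bit
            return code                             # values in [0, 255]

        codes = lbp_image(np.random.randint(0, 256, (64, 64)))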

    Multi-Resolution Dual-Tree Wavelet Scattering Network for Signal Classification

    Amarjot Singh, Nick Kingsbury
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper introduces a Deep Scattering network that utilizes Dual-Tree
    complex wavelets to extract translation invariant representations from an input
    signal. The computationally efficient Dual-Tree wavelets decompose the input
    signal into densely spaced representations over scales. Translation invariance
    is introduced in the representations by applying a non-linearity over a region
    followed by averaging. The discriminatory information in the densely spaced,
    locally smooth, signal representations aids the learning of the classifier. The
    proposed network is shown to outperform Mallat’s ScatterNet on four datasets
    with different modalities on classification accuracy.

    Distributed Mapping with Privacy and Communication Constraints: Lightweight Algorithms and Object-based Models

    Siddharth Choudhary, Luca Carlone, Carlos Nieto, John Rogers, Henrik I. Christensen, Frank Dellaert
    Comments: preprint for IJRR submission
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

    We consider the following problem: a team of robots is deployed in an unknown
    environment and it has to collaboratively build a map of the area without a
    reliable infrastructure for communication. The backbone for modern mapping
    techniques is pose graph optimization, which estimates the trajectory of the
    robots, from which the map can be easily built. The first contribution of this
    paper is a set of distributed algorithms for pose graph optimization: rather
    than sending all sensor data to a remote sensor fusion server, the robots
    exchange very partial and noisy information to reach an agreement on the pose
    graph configuration. Our approach can be considered as a distributed
    implementation of the two-stage approach of Carlone et al., where we use the
    Successive Over-Relaxation (SOR) and the Jacobi Over-Relaxation (JOR) as
    workhorses to split the computation among the robots. As a second contribution,
    we extend the proposed distributed algorithms to work with object-based map
    models. The use of object-based models
    avoids the exchange of raw sensor measurements (e.g., point clouds) further
    reducing the communication burden. Our third contribution is an extensive
    experimental evaluation of the proposed techniques, including tests in
    realistic Gazebo simulations and field experiments in a military test facility.
    Abundant experimental evidence suggests that one of the proposed algorithms
    (the Distributed Gauss-Seidel method or DGS) has excellent performance. The DGS
    requires minimal information exchange, has an anytime flavor, scales well to
    large teams, is robust to noise, and is easy to implement. Our field tests show
    that the combined use of our distributed algorithms and object-based models
    reduces the communication requirements by several orders of magnitude and
    enables distributed mapping with large teams of robots in real-world problems.
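
    A hedged sketch of the Jacobi-style iteration underlying the distributed
    solvers, on a generic linear system Ax = b: each robot owns a block of
    unknowns and repeatedly updates it using only its neighbours' latest
    estimates. Pose-graph specifics (rotations, the two-stage pipeline) are
    omitted.

        import numpy as np

        def jacobi(A, b, iters=100):
            D = np.diag(A)
            R = A - np.diagflat(D)
            x = np.zeros_like(b)
            for _ in range(iters):
                x = (b - R @ x) / D    # every block can update in parallel
            return x

        A = np.array([[4., 1., 0.], [1., 4., 1.], [0., 1., 4.]])
        b = np.array([1., 2., 3.])
        print(jacobi(A, b), np.linalg.solve(A, b))   # the two should agree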


    Artificial Intelligence

    Bilateral Multi-Perspective Matching for Natural Language Sentences

    Zhiguo Wang, Wael Hamza, Radu Florian
    Comments: 7
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Natural language sentence matching is a fundamental technology for a variety
    of tasks. Previous approaches either match sentences from a single direction or
    only apply single granular (word-by-word or sentence-by-sentence) matching. In
    this work, we propose a bilateral multi-perspective matching (BiMPM) model
    under the “matching-aggregation” framework. Given two sentences (P) and (Q),
    our model first encodes them with a BiLSTM encoder. Next, we match the two
    encoded sentences in two directions, (P → Q) and (P ← Q). In each matching
    direction, each time step of one sentence is matched against all time steps
    of the other sentence from multiple perspectives. Then, another BiLSTM layer
    is utilized to aggregate the matching results into a fixed-length
    matching vector. Finally, based on the matching vector, the decision is made
    through a fully connected layer. We evaluate our model on three tasks:
    paraphrase identification, natural language inference and answer sentence
    selection. Experimental results on standard benchmark datasets show that our
    model achieves the state-of-the-art performance on all tasks.
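
    A sketch of the multi-perspective cosine matching function at the heart of
    the model: each of l perspectives reweights the two vectors with a learned
    row of W before taking a cosine similarity. The random inputs below stand
    in for BiLSTM time-step outputs.

        import numpy as np

        def multi_perspective_match(v1, v2, W):
            """v1, v2: (d,) vectors; W: (l, d) learned perspective weights."""
            a, b = W * v1, W * v2              # elementwise reweighting, (l, d)
            num = (a * b).sum(axis=1)
            den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-9
            return num / den                   # (l,) matching vector

        m = multi_perspective_match(np.random.randn(8), np.random.randn(8),
                                    np.random.randn(4, 8))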

    Traditional PageRank versus Network Capacity Bound

    Mieczysław A. Kłopotek, Sławomir T. Wierzchoń, Robert A. Kłopotek, Elżbieta A. Kłopotek
    Subjects: Artificial Intelligence (cs.AI)

    In a former paper we simplified the proof of a theorem on personalized
    random walks that is fundamental to graph node clustering and generalized it
    to bipartite graphs for a specific case where the probability of a random
    jump was proportional to the number of links of “personally preferred”
    nodes. In this paper we turn to the more complex issue of graphs in which
    the random jump follows a uniform distribution.
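
    For reference, a minimal power-iteration sketch of personalized PageRank:
    with probability 1 - alpha the walker follows a link and with probability
    alpha it jumps back to the preference distribution p, the uniform-jump case
    being the one this paper analyses.

        import numpy as np

        def personalized_pagerank(P, p, alpha=0.15, iters=100):
            """P: column-stochastic transition matrix; p: jump distribution."""
            r = p.copy()
            for _ in range(iters):
                r = (1 - alpha) * (P @ r) + alpha * p
            return r

        P = np.array([[0., 0.5, 1.], [0.5, 0., 0.], [0.5, 0.5, 0.]])
        p = np.ones(3) / 3                 # uniform random jump
        print(personalized_pagerank(P, p))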

    On Seeking Consensus Between Document Similarity Measures

    Mieczysław Kłopotek
    Subjects: Artificial Intelligence (cs.AI)

    This paper investigates the application of consensus clustering and
    meta-clustering to the set of all possible partitions of a data set. We show
    that when using a “complement” of Rand Index as a measure of cluster
    similarity, the total-separation partition, putting each element in a separate
    set, is chosen.
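
    For concreteness, a sketch of the Rand index between two partitions; the
    “complement” the paper uses as a dissimilarity is then 1 minus this value.

        from itertools import combinations

        def rand_index(labels_a, labels_b):
            agree = total = 0
            for i, j in combinations(range(len(labels_a)), 2):
                same_a = labels_a[i] == labels_a[j]
                same_b = labels_b[i] == labels_b[j]
                agree += same_a == same_b    # pair treated alike in both
                total += 1
            return agree / total

        print(rand_index([0, 0, 1, 1], [1, 1, 0, 2]))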

    Genetic and Memetic Algorithm with Diversity Equilibrium based on Greedy Diversification

    Andrés Herrera-Poyatos, Francisco Herrera
    Comments: 27 pages, 5 figures, 11 tables
    Subjects: Artificial Intelligence (cs.AI)

    The lack of diversity in a genetic algorithm’s population may lead to poor
    performance of the genetic operators, since there is no equilibrium between
    exploration and exploitation. In those cases, genetic algorithms exhibit
    fast, premature convergence.

    In this paper we develop a novel hybrid genetic algorithm which attempts to
    obtain a balance between exploration and exploitation. It confronts the
    diversity problem using the proposed greedy diversification operator.
    Furthermore, the proposed algorithm applies a competition between parents
    and children so as to exploit the high-quality visited solutions. These
    operators are complemented
    by a simple selection mechanism designed to preserve and take advantage of the
    population diversity.

    Additionally, we extend our proposal to the field of memetic algorithms,
    obtaining an improved model with outstanding results in practice.

    The experimental study shows the validity of the approach, as well as how
    important it is to take the concepts of exploration and exploitation into
    account when designing an evolutionary algorithm.

    Graph Neural Networks and Boolean Satisfiability

    Benedikt Bünz, Matthew Lamm
    Subjects: Artificial Intelligence (cs.AI)

    In this paper we explore whether or not deep neural architectures can learn
    to classify Boolean satisfiability (SAT). We devote considerable time to
    discussing the theoretical properties of SAT. Then, we define a graph
    representation for Boolean formulas in conjunctive normal form, and train
    neural classifiers over general graph structures called Graph Neural Networks,
    or GNNs, to recognize features of satisfiability. To the best of our knowledge
    this has never been tried before. Our preliminary findings are potentially
    profound. In a weakly-supervised setting, that is, without problem specific
    feature engineering, Graph Neural Networks can learn features of
    satisfiability.
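
    A hedged sketch of one natural graph encoding of a CNF formula for a GNN:
    one node per variable and per clause, with an edge between a clause and
    each variable it mentions, signed by the literal's polarity. The paper's
    exact encoding may differ.

        def cnf_to_graph(clauses):
            """clauses: list of lists of non-zero ints, e.g. [[1, -2], [2, 3]]."""
            edges = []
            for c, clause in enumerate(clauses):
                for lit in clause:
                    sign = 1 if lit > 0 else -1
                    edges.append((f"c{c}", f"x{abs(lit)}", sign))
            return edges

        print(cnf_to_graph([[1, -2], [2, 3, -1]]))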

    Similarity Preserving Representation Learning for Time Series Analysis

    Qi Lei, Jinfeng Yi, Roman Vaculin, Lingfei Wu, Inderjit S. Dhillon
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    A considerable amount of machine learning algorithms take matrices as their
    inputs. As such, they cannot directly analyze time series data due to its
    temporal nature, usually unequal lengths, and complex properties. This is a
    great pity since many of these algorithms are effective, robust, efficient, and
    easy to use. In this paper, we bridge this gap by proposing an efficient
    representation learning framework that is able to convert a set of time series
    with equal or unequal lengths to a matrix format. In particular, we guarantee
    that the pairwise similarities between time series are well preserved after the
    transformation. Therefore, the learned feature representation is particularly
    suitable to the class of learning problems that are sensitive to data
    similarities. Given a set of (n) time series, we first construct an (n × n)
    partially observed similarity matrix by randomly sampling (O(n log n)) pairs
    of time series and computing their pairwise similarities. We then propose an
    extremely efficient algorithm that solves a highly non-convex and NP-hard
    problem to learn new features based on the partially observed similarity
    matrix. We use the learned features to conduct experiments on both data
    classification and clustering tasks. Our extensive experimental results
    demonstrate that the proposed framework is both effective and efficient.

    Octopus: A Framework for Cost-Quality-Time Optimization in Crowdsourcing

    Karan Goel, Shreya Rajpal, Mausam
    Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Multiagent Systems (cs.MA)

    Managing micro-tasks on crowdsourcing marketplaces involves balancing
    conflicting objectives — the quality of work, total cost incurred and time to
    completion. Previous agents have focused on cost-quality, or cost-time
    tradeoffs, limiting their real-world applicability. As a step towards this goal
    we present Octopus, the first AI agent that jointly manages all three
    objectives in tandem. Octopus is based on a computationally tractable,
    multi-agent formulation consisting of three components: one that sets the
    price per ballot to adjust the rate of completion of tasks, another that
    optimizes each task for quality, and a third that performs task selection.
    We demonstrate that Octopus outperforms existing state-of-the-art approaches
    in simulation and in experiments with real data. We also
    deploy Octopus on Amazon Mechanical Turk to establish its ability to manage
    tasks in a real-world, dynamic setting.

    A Minimax Algorithm Better Than Alpha-beta?: No and Yes

    Aske Plaat, Jonathan Schaeffer, Wim Pijls, Arie de Bruin
    Comments: Report version of AI Journal article Best-first fixed-depth minimax algorithms 1996
    Subjects: Artificial Intelligence (cs.AI)

    This paper has three main contributions to our understanding of fixed-depth
    minimax search: (A) A new formulation for Stockman’s SSS* algorithm, based on
    Alpha-Beta, is presented. It solves all the perceived drawbacks of SSS*,
    finally transforming it into a practical algorithm. In effect, we show that
    SSS* = alpha-beta + transposition tables. The crucial step is the realization
    that transposition tables contain so-called solution trees, structures that are
    used in best-first search algorithms like SSS*. Having created a practical
    version, we present performance measurements with tournament game-playing
    programs for three different minimax games, yielding results that contradict a
    number of publications. (B) Based on the insights gained in our attempts at
    understanding SSS*, we present a framework that facilitates the construction of
    several best-first fixed-depth game-tree search algorithms, known and new. The
    framework is based on depth-first null-window Alpha-Beta search, enhanced with
    storage to allow for the refining of previous search results. It focuses
    attention on the essential differences between algorithms. (C) We present a new
    instance of the framework, MTD(f). It is well-suited for use with iterative
    deepening, and performs better than algorithms that are currently used in most
    state-of-the-art game-playing programs. We provide experimental evidence to
    explain why MTD(f) performs better than the other fixed-depth minimax
    algorithms.
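
    The MTD(f) driver itself is short enough to sketch from the paper's
    description: repeated null-window calls converge on the minimax value.
    Here, alphabeta_with_memory is assumed to be a standard fail-soft
    alpha-beta enhanced with a transposition table (not shown).

        INF = float("inf")

        def mtdf(root, first_guess, depth, alphabeta_with_memory):
            g, lower, upper = first_guess, -INF, +INF
            while lower < upper:
                beta = g + 1 if g == lower else g
                g = alphabeta_with_memory(root, beta - 1, beta, depth)
                if g < beta:
                    upper = g          # null-window search failed low
                else:
                    lower = g          # null-window search failed high
            return g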

    Cognitive Mapping and Planning for Visual Navigation

    Saurabh Gupta, James Davidson, Sergey Levine, Rahul Sukthankar, Jitendra Malik
    Comments: Under review for CVPR 2017. Project webpage: this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO)

    We introduce a neural architecture for navigation in novel environments. Our
    proposed architecture learns to map from first-person viewpoints and plans a
    sequence of actions towards goals in the environment. The Cognitive Mapper and
    Planner (CMP) is based on two key ideas: a) a unified joint architecture for
    mapping and planning, such that the mapping is driven by the needs of the
    planner, and b) a spatial memory with the ability to plan given an incomplete
    set of observations about the world. CMP constructs a top-down belief map of
    the world and applies a differentiable neural net planner to produce the next
    action at each time step. The accumulated belief of the world enables the agent
    to track visited regions of the environment. Our experiments demonstrate that
    CMP outperforms both reactive strategies and standard memory-based
    architectures and performs well in novel environments. Furthermore, we show
    that CMP can also achieve semantically specified goals, such as ‘go to a
    chair’.

    Offline bilingual word vectors, orthogonal transformations and the inverted softmax

    Samuel L. Smith, David H. P. Turban, Steven Hamblin, Nils Y. Hammerla
    Comments: Accepted to conference track at ICLR 2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

    Usually bilingual word vectors are trained “online”. Mikolov et al. showed
    they can also be found “offline”, whereby two pre-trained embeddings are
    aligned with a linear transformation, using dictionaries compiled from expert
    knowledge. In this work, we prove that the linear transformation between two
    spaces should be orthogonal. This transformation can be obtained using the
    singular value decomposition. We introduce a novel “inverted softmax” for
    identifying translation pairs, with which we improve the precision @1 of
    Mikolov’s original mapping from 34% to 43%, when translating a test set
    composed of both common and rare English words into Italian. Orthogonal
    transformations are more robust to noise, enabling us to learn the
    transformation without expert bilingual signal by constructing a
    “pseudo-dictionary” from the identical character strings which appear in both
    languages, achieving 40% precision on the same test set. Finally, we extend our
    method to retrieve the true translations of English sentences from a corpus of
    200k Italian sentences with a precision @1 of 68%.
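
    A sketch of the two ingredients, assuming embedding rows are aligned by a
    seed dictionary: the orthogonal map comes from an SVD (the orthogonal
    Procrustes solution), and the inverted softmax normalizes translation
    scores over source words rather than over target words.

        import numpy as np

        def orthogonal_map(X, Y):
            """Row i of X (source) pairs with row i of Y (target)."""
            U, _, Vt = np.linalg.svd(Y.T @ X)
            return U @ Vt                  # W with W.T @ W = I and W @ x ~ y

        def inverted_softmax(WX, Y, beta=10.0):
            """WX: mapped source vectors (n_src, d); Y: targets (n_tgt, d).
            Returns P[t, s] ~ P(target t | source s), normalized over sources."""
            S = np.exp(beta * (Y @ WX.T))  # (n_tgt, n_src) similarity scores
            return S / S.sum(axis=1, keepdims=True)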

    Reservoir Computing Using Non-Uniform Binary Cellular Automata

    Stefano Nichele, Magnus S. Gundersen
    Subjects: Emerging Technologies (cs.ET); Artificial Intelligence (cs.AI)

    The Reservoir Computing (RC) paradigm utilizes a dynamical system, i.e., a
    reservoir, and a linear classifier, i.e., a read-out layer, to process data
    from sequential classification tasks. In this paper the usage of Cellular
    Automata (CA) as a reservoir is investigated. The use of CA in RC has shown
    promising results. In this paper, selected state-of-the-art experiments
    are reproduced. It is shown that some CA-rules perform better than others, and
    the reservoir performance is improved by increasing the size of the CA
    reservoir itself. In addition, the usage of parallel loosely coupled
    CA-reservoirs, where each reservoir has a different CA-rule, is investigated.
    The experiments performed on a quasi-uniform CA reservoir provide valuable
    insights into CA reservoir design. The results herein show that some rules do not
    work well together, while other combinations work remarkably well. This
    suggests that non-uniform CA could represent a powerful tool for novel CA
    reservoir implementations.
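
    A minimal sketch of an elementary-CA reservoir: the state is evolved for a
    few steps under a chosen rule and the concatenated trajectory feeds a
    linear read-out (not shown). Rule 90 and the sizes here are arbitrary
    choices.

        import numpy as np

        def ca_step(state, rule=90):
            """One synchronous update of an elementary CA, periodic boundary."""
            left, right = np.roll(state, 1), np.roll(state, -1)
            idx = 4 * left + 2 * state + right
            table = (rule >> np.arange(8)) & 1   # rule number -> lookup table
            return table[idx]

        rng = np.random.default_rng(0)
        state = rng.integers(0, 2, 64)
        trajectory = []
        for _ in range(4):                       # iterate and record the states
            state = ca_step(state)
            trajectory.append(state.copy())
        reservoir_features = np.concatenate(trajectory)  # input to the read-out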

    Is Big Data Sufficient for a Reliable Detection of Non-Technical Losses?

    Patrick Glauner, Angelo Migliosi, Jorge Meira, Eric Aislan Antonelo, Petko Valtchev, Radu State, Franck Bettinger
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Non-technical losses (NTL) occur during the distribution of electricity in
    power grids and include, but are not limited to, electricity theft and faulty
    meters. In emerging countries, they may range up to 40% of the total
    electricity distributed. In order to detect NTLs, machine learning methods are
    used that learn irregular consumption patterns from customer data and
    inspection results. The Big Data paradigm followed in modern machine learning
    reflects the desire of deriving better conclusions from simply analyzing more
    data, without the necessity of looking at theory and models. However, the
    sample of inspected customers may be biased, i.e. it does not represent the
    population of all customers. As a consequence, machine learning models trained
    on these inspection results are biased as well and therefore lead to unreliable
    predictions of whether customers cause NTL or not. In machine learning, this
    issue is called covariate shift and has not been addressed in the literature on
    NTL detection yet. In this work, we present a novel framework for quantifying
    and visualizing covariate shift. We apply it to a commercial data set from
    Brazil that consists of 3.6M customers and 820K inspection results. We show
    that some features have a stronger covariate shift than others, making
    predictions less reliable. In particular, previous inspections were focused
    on certain neighborhoods or customer classes and were not sufficiently
    spread among the population of customers. This framework is about to be
    deployed in a commercial product for NTL detection.
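
    A hedged sketch of one standard way to quantify covariate shift (the
    paper's exact framework may differ): train a classifier to separate the
    inspected sample from the overall population; an AUC near 0.5 indicates
    little shift in a feature, higher values indicate stronger shift. The data
    below is synthetic.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(0)
        feat_all = rng.normal(0.0, 1.0, (5000, 1))        # all customers
        feat_inspected = rng.normal(0.7, 1.0, (800, 1))   # biased inspections

        X = np.vstack([feat_all, feat_inspected])
        y = np.r_[np.zeros(len(feat_all)), np.ones(len(feat_inspected))]
        proba = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]
        print(f"shift AUC: {roc_auc_score(y, proba):.2f}")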

    Group Scissor: Scaling Neuromorphic Computing Design to Big Neural Networks

    Yandan Wang, Wei Wen, Beiye Liu, Donald Chiarulli, Hai Li
    Comments: Accepted in DAC 2017
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

    The synapse crossbar is an elementary structure in Neuromorphic Computing
    Systems (NCS). However, the limited size of crossbars and heavy routing
    congestion impede the NCS implementations of big neural networks. In this
    paper, we propose a two-step framework (namely, group scissor) to scale NCS
    designs to big neural networks. The first step is rank clipping, which
    integrates low-rank approximation into the training to reduce total crossbar
    area. The second step is group connection deletion, which structurally
    prunes connections to reduce routing congestion between crossbars. Tested on
    convolutional neural networks of LeNet on the MNIST database and ConvNet on
    the CIFAR-10 database, our experiments show significant reduction of
    crossbar area and routing area in NCS designs. Without accuracy loss, rank
    clipping reduces total crossbar area to 13.62% and 51.81% in the NCS designs
    of LeNet and ConvNet, respectively. Following rank clipping, group
    connection deletion further reduces the routing area of LeNet and ConvNet to
    8.1% and 52.06%, respectively.


    Information Retrieval

    Offline bilingual word vectors, orthogonal transformations and the inverted softmax

    Samuel L. Smith, David H. P. Turban, Steven Hamblin, Nils Y. Hammerla
    Comments: Accepted to conference track at ICLR 2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

    Usually bilingual word vectors are trained “online”. Mikolov et al. showed
    they can also be found “offline”, whereby two pre-trained embeddings are
    aligned with a linear transformation, using dictionaries compiled from expert
    knowledge. In this work, we prove that the linear transformation between two
    spaces should be orthogonal. This transformation can be obtained using the
    singular value decomposition. We introduce a novel “inverted softmax” for
    identifying translation pairs, with which we improve the precision @1 of
    Mikolov’s original mapping from 34% to 43%, when translating a test set
    composed of both common and rare English words into Italian. Orthogonal
    transformations are more robust to noise, enabling us to learn the
    transformation without expert bilingual signal by constructing a
    “pseudo-dictionary” from the identical character strings which appear in both
    languages, achieving 40% precision on the same test set. Finally, we extend our
    method to retrieve the true translations of English sentences from a corpus of
    200k Italian sentences with a precision @1 of 68%.


    Computation and Language

    Offline bilingual word vectors, orthogonal transformations and the inverted softmax

    Samuel L. Smith, David H. P. Turban, Steven Hamblin, Nils Y. Hammerla
    Comments: Accepted to conference track at ICLR 2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

    Usually bilingual word vectors are trained “online”. Mikolov et al. showed
    they can also be found “offline”, whereby two pre-trained embeddings are
    aligned with a linear transformation, using dictionaries compiled from expert
    knowledge. In this work, we prove that the linear transformation between two
    spaces should be orthogonal. This transformation can be obtained using the
    singular value decomposition. We introduce a novel “inverted softmax” for
    identifying translation pairs, with which we improve the precision @1 of
    Mikolov’s original mapping from 34% to 43%, when translating a test set
    composed of both common and rare English words into Italian. Orthogonal
    transformations are more robust to noise, enabling us to learn the
    transformation without expert bilingual signal by constructing a
    “pseudo-dictionary” from the identical character strings which appear in both
    languages, achieving 40% precision on the same test set. Finally, we extend our
    method to retrieve the true translations of English sentences from a corpus of
    200k Italian sentences with a precision @1 of 68%.

    Towards speech-to-text translation without speech recognition

    Sameer Bansal, Herman Kamper, Adam Lopez, Sharon Goldwater
    Comments: To appear in EACL 2017 (short papers)
    Subjects: Computation and Language (cs.CL)

    We explore the problem of translating speech to text in low-resource
    scenarios where neither automatic speech recognition (ASR) nor machine
    translation (MT) are available, but we have training data in the form of audio
    paired with text translations. We present the first system for this problem
    applied to a realistic multi-speaker dataset, the CALLHOME Spanish-English
    speech translation corpus. Our approach uses unsupervised term discovery (UTD)
    to cluster repeated patterns in the audio, creating a pseudotext, which we pair
    with translations to create a parallel text and train a simple bag-of-words MT
    model. We identify the challenges faced by the system, finding that the
    difficulty of cross-speaker UTD results in low recall, but that our system is
    still able to correctly translate some content words in test data.

    Multitask Learning with Deep Neural Networks for Community Question Answering

    Daniele Bonadiman, Antonio Uva, Alessandro Moschitti
    Subjects: Computation and Language (cs.CL)

    In this paper, we developed a deep neural network (DNN) that learns to solve
    simultaneously the three tasks of the cQA challenge proposed in SemEval-2016
    Task 3, i.e., question-comment similarity, question-question
    similarity and new question-comment similarity. The latter is the main task,
    which can exploit the previous two for achieving better results. Our DNN is
    trained jointly on all the three cQA tasks and learns to encode questions and
    comments into a single vector representation shared across the multiple tasks.
    The results on the official challenge test set show that our approach produces
    higher accuracy and faster convergence rates than the individual neural
    networks. Additionally, our method, which does not use any manual feature
    engineering, approaches the state of the art established with methods that make
    heavy use of it.

    A Morphology-aware Network for Morphological Disambiguation

    Eray Yildiz, Caglar Tirkaz, H. Bahadir Sahin, Mustafa Tolga Eren, Ozan Sonmez
    Comments: 6 pages, 1 figure, Thirtieth AAAI Conference on Artificial Intelligence. 2016
    Subjects: Computation and Language (cs.CL)

    Agglutinative languages such as Turkish, Finnish and Hungarian require
    morphological disambiguation before further processing due to the complex
    morphology of words. A morphological disambiguator is used to select the
    correct morphological analysis of a word. Morphological disambiguation is
    important because it generally is one of the first steps of natural language
    processing and its performance affects subsequent analyses. In this paper, we
    propose a system that uses deep learning techniques for morphological
    disambiguation. Many of the state-of-the-art results in computer vision, speech
    recognition and natural language processing have been obtained through deep
    learning models. However, applying deep learning techniques to morphologically
    rich languages is not well studied. In this work, while we focus on Turkish
    morphological disambiguation we also present results for French and German in
    order to show that the proposed architecture achieves high accuracy with no
    language-specific feature engineering or additional resources. In the
    experiments, we achieve 84.12, 88.35 and 93.78 morphological disambiguation
    accuracy among the ambiguous words for Turkish, German and French respectively.

    Learning to Parse and Translate Improves Neural Machine Translation

    Akiko Eriguchi, Yoshimasa Tsuruoka, Kyunghyun Cho
    Subjects: Computation and Language (cs.CL)

    There has been relatively little attention paid to incorporating linguistic
    priors into neural machine translation. Much of the previous work was
    further constrained to considering linguistic priors on the source side. In
    this paper, we propose a hybrid model, called NMT+RNNG, that learns to parse
    and translate by combining a recurrent neural network grammar with
    attention-based neural machine translation. Our approach encourages the
    neural machine translation model to incorporate linguistic priors during
    training, and lets it translate on its own afterward. Extensive experiments
    with four language pairs show the effectiveness of the proposed NMT+RNNG.

    Vector Embedding of Wikipedia Concepts and Entities

    Ehsan Sherkat, Evangelos Milios
    Subjects: Computation and Language (cs.CL)

    Using deep learning for different machine learning tasks such as image
    classification and word embedding has recently gained much attention. Its
    appealing performance reported across specific Natural Language Processing
    (NLP) tasks in comparison with other approaches is the reason for its
    popularity. Word embedding is the task of mapping words or phrases to a
    low-dimensional numerical vector. In this paper, we use deep learning to
    embed Wikipedia Concepts and Entities. The English version of Wikipedia
    contains more than five million pages, which suggests its capability to
    cover many English Entities, Phrases, and Concepts. Each Wikipedia page is
    considered as a concept. Some concepts correspond to entities, such as a
    person’s name, an organization or a place. Contrary to word embedding,
    Wikipedia Concept Embedding is not ambiguous, so there are different vectors
    for concepts with similar surface forms but different mentions. We propose
    several approaches and evaluate their performance based on Concept Analogy
    and Concept Similarity tasks. The results show that the proposed approaches
    have performance comparable to, and in some cases even higher than, the
    state-of-the-art methods.

    Parallel Long Short-Term Memory for Multi-stream Classification

    Mohamed Bouaziz, Mohamed Morchid, Richard Dufour, Georges Linarès, Renato De Mori
    Comments: 2016 IEEE Workshop on Spoken Language Technology
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Recently, machine learning methods have provided a broad spectrum of original
    and efficient algorithms based on Deep Neural Networks (DNN) to automatically
    predict an outcome with respect to a sequence of inputs. Recurrent hidden
    cells allow DNN-based models such as Recurrent Neural Networks (RNN) and
    Long Short-Term Memory (LSTM) to manage long-term dependencies.
    Nevertheless, these RNNs process a single input stream in one (LSTM) or two
    (Bidirectional LSTM) directions. But most of the information available
    nowadays comes from multiple streams or multimedia documents, and requires
    RNNs to process this information synchronously during training. This paper
    presents an original LSTM-based
    architecture, named Parallel LSTM (PLSTM), that carries out multiple parallel
    synchronized input sequences in order to predict a common output. The proposed
    PLSTM method could be used for parallel sequence classification purposes. The
    PLSTM approach is evaluated on an automatic telecast genre sequences
    classification task and compared with different state-of-the-art architectures.
    Results show that the proposed PLSTM method outperforms the baseline n-gram
    models as well as the state-of-the-art LSTM approach.

    Learning Concept Embeddings for Efficient Bag-of-Concepts Densification

    Walid Shalaby, Wlodek Zadrozny
    Subjects: Computation and Language (cs.CL)

    Explicit concept space models have proven their efficacy for text
    representation in many natural language and text mining applications. The
    idea is to embed textual structures into a semantic space of concepts which
    captures the main topics of these structures. That so-called bag-of-concepts
    representation suffers from data sparsity, causing low similarity scores
    between similar texts
    due to low concept overlap. In this paper we propose two neural embedding
    models in order to learn continuous concept vectors. Once learned, we propose
    an efficient vector aggregation method to generate fully dense bag-of-concepts
    representations. Empirical results on a benchmark dataset for measuring entity
    semantic relatedness show superior performance over other concept embedding
    models. In addition, by utilizing our efficient aggregation method, we
    demonstrate the effectiveness of the densified vector representation over the
    typical sparse representations for dataless classification, where we can
    achieve at least the same or better accuracy with far fewer dimensions.

    Universal Dependencies to Logical Forms with Negation Scope

    Federico Fancellu, Siva Reddy, Adam Lopez, Bonnie Webber
    Comments: This is a draft version of the paper. We welcome any comments you may have regarding the content and presentation
    Subjects: Computation and Language (cs.CL)

    Many language technology applications would benefit from the ability to
    represent negation and its scope on top of widely-used linguistic resources. In
    this paper, we investigate the possibility of obtaining a first-order logic
    representation with negation scope marked using Universal Dependencies. To do
    so, we enhance UDepLambda, a framework that converts dependency graphs to
    logical forms. The resulting UDepLambda(¬) is able to handle phenomena
    related to scope by means of a higher-order type theory, relevant not only to
    negation but also to universal quantification and other complex semantic
    phenomena. The initial conversion we did for English is promising, in that one
    can represent the scope of negation also in the presence of more complex
    phenomena such as universal quantifiers.

    Bilateral Multi-Perspective Matching for Natural Language Sentences

    Zhiguo Wang, Wael Hamza, Radu Florian
    Comments: 7
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Natural language sentence matching is a fundamental technology for a variety
    of tasks. Previous approaches either match sentences from a single direction or
    only apply single granular (word-by-word or sentence-by-sentence) matching. In
    this work, we propose a bilateral multi-perspective matching (BiMPM) model
    under the “matching-aggregation” framework. Given two sentences (P) and (Q),
    our model first encodes them with a BiLSTM encoder. Next, we match the two
    encoded sentences in two directions, (P \rightarrow Q) and (P \leftarrow Q). In
    each matching direction, each time step of one sentence is matched against all
    time-steps of the other sentence from multiple perspectives. Then, another
    BiLSTM layer is utilized to aggregate the matching results into a fixed-length
    matching vector. Finally, based on the matching vector, the decision is made
    through a fully connected layer. We evaluate our model on three tasks:
    paraphrase identification, natural language inference and answer sentence
    selection. Experimental results on standard benchmark datasets show that our
    model achieves the state-of-the-art performance on all tasks.
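
    The core operation the abstract describes, matching one time step against
    another from multiple perspectives, can be sketched as cosine similarity
    under learned perspective weights. This is our illustrative reading of the
    abstract, with made-up dimensions:

        # Multi-perspective cosine matching: one score per perspective, each
        # computed after element-wise reweighting by a learned weight row.
        import numpy as np

        def multi_perspective_match(v1, v2, W):
            """v1, v2: hidden vectors of shape (d,); W: (l, d) weights."""
            a, b = W * v1, W * v2                         # shape (l, d)
            num = (a * b).sum(axis=1)
            den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-8
            return num / den                              # shape (l,)

        rng = np.random.default_rng(0)
        scores = multi_perspective_match(rng.normal(size=8),
                                         rng.normal(size=8),
                                         rng.normal(size=(4, 8)))
        print(scores.shape)  # (4,): one matching score per perspective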


    Distributed, Parallel, and Cluster Computing

    Unit Commitment on the Cloud

    Mushfiqur R. Sarker, Jianhui Wang
    Comments: 2 pages, 1 figure, 1 table, IEEE Letter
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    The advent of High Performance Computing (HPC) has provided the computational
    capacity required for power system operators (SO) to quickly obtain
    solutions to highly complex applications such as Unit Commitment (UC). The UC
    problem, which attempts to schedule the least-cost combination of generating
    units to meet the load, is increasing in complexity and problem size due to
    deployments of renewable resources and smart grid technologies. The current
    approach to solving the UC problem consists of in-house HPC infrastructures,
    which experience issues at scale and demand high maintenance and capital
    expenditures. On the other hand, cloud computing is an ideal substitute due to
    its powerful computational capacity, rapid scalability, and high
    cost-effectiveness. In this work, the benefits and challenges of outsourcing
    the UC application to the cloud are explored. A quantitative analysis of the
    computational performance gain is explored for a large-scale UC problem solved
    on the cloud and compared to traditional in-house HPC infrastructure. The
    results show substantial reduction in solve time when outsourced to the cloud.

    Random walk based in-network computation of arbitrary functions

    Iqra Altaf Gillani, Pooja Vyavahare, Amitabha Bagchi
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)

    We study the in-network computation of arbitrary functions whose computation
    schema is a complete binary tree, i.e., we assume that the network wants to
    compute a function of (K) operands, each of which is available at a distinct
    node in the network, and rather than simply collecting the (K) operands at a
    single sink node and computing the function, we want to compute the function
    during the process of moving the data towards the sink. Such settings have been
    studied in the literature but largely only for symmetric functions, e.g.
    average, parity etc., which have the specific property that the output is
    invariant to permutations of the operands. To the best of our knowledge, we
    present the first decentralised algorithms for arbitrary functions. We propose
    two algorithms, Fixed Metropolis-Compute and Flexible Metropolis-Compute, for
    this problem, both of which use random walks on the network as their basic
    primitive. Assuming that time is slotted, we provide upper bounds on the time taken
    to compute the function, characterising this time in terms of the fundamental
    parameters of the random walk on a graph: the hitting time in the case of Fixed
    Metropolis-Compute, and the mixing time in the case of Flexible
    Metropolis-Compute. Assuming a stochastic model for the generation of streams
    of data at each source, we also provide lower and upper bounds on the rate at
    which Fixed Metropolis-Compute is able to compute the stream of associated
    function values.
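
    The Metropolis random-walk primitive underlying both algorithms can be
    sketched as follows; the acceptance rule makes the walk's stationary
    distribution uniform over nodes regardless of their degrees (a standard
    construction, shown here only for illustration).

        import random

        def metropolis_step(node, adj):
            """adj: dict node -> list of neighbours; one Metropolis move."""
            nbrs = adj[node]
            cand = random.choice(nbrs)
            # Accept with probability min(1, deg(node)/deg(cand)); else stay.
            if random.random() < min(1.0, len(nbrs) / len(adj[cand])):
                return cand
            return node

        adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
        node, visits = 0, {v: 0 for v in adj}
        for _ in range(100000):
            node = metropolis_step(node, adj)
            visits[node] += 1
        print(visits)  # roughly equal counts despite unequal degrees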

    System Modeling in the COSMA Environment

    Wiktor B. Daszczuk, Waldemar Grabski, Jerzy Mieścicki, Jacek Wytrębowicz
    Comments: 6 pages, 3 figures, 1 table
    Journal-ref: Proc. Euromicro Symposium on Digital Systems Design –
    Architectures, Methods and Tools, September 4-6 2001, Warsaw, Poland, pp.
    152-157
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    The aim of this paper is to demonstrate how the COSMA environment can be used
    for system modeling. This environment is a set of tools based on Concurrent
    State Machines paradigm and is developed in the Institute of Computer Science
    at the Warsaw University of Technology. Our demonstration example is a
    distributed brake control system dedicated to railway transport. The paper
    briefly introduces COSMA and then shows how the example model can be validated
    by our temporal logic analyzer.

    Leader Election in Trees with Customized Advice

    Barun Gorain, Andrzej Pelc
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Leader election is a basic symmetry breaking problem in distributed
    computing. All nodes of a network have to agree on a single node, called the
    leader. If the nodes of the network have distinct labels, then agreeing on a
    single node means that all nodes have to output the label of the elected
    leader.

    If the nodes are anonymous, the task of leader election is formulated as
    follows: every node of the network must output a simple path starting at it,
    which is coded as a sequence of port numbers, such that all these paths end at
    a common node, the leader. In this paper, we study deterministic leader
    election in anonymous trees.

    Our goal is to establish tradeoffs between the allocated time (\tau) and the
    amount of information that has to be given a priori to the nodes of a
    network to enable leader election in time (\tau). Following the framework of
    algorithms with advice, this information is provided to all nodes at the
    start by an oracle knowing the entire tree, in the form of binary strings assigned
    to all nodes. There are two possible variants of formulating this advice
    assignment. Either the strings provided to all nodes are identical, or strings
    assigned to different nodes may be potentially different, i.e., advice can be
    customized. As opposed to previous papers on leader election with advice,
    in this paper we consider the latter option.

    The maximum length of all assigned binary strings is called the size of
    advice.

    For a given time (\tau) allocated to leader election, we give upper and lower
    bounds on the minimum size of advice sufficient to perform leader election in
    time (\tau). All our bounds except one pair are tight up to multiplicative
    constants, and in this one exceptional case, the gap between the upper and the
    lower bound is very small.

    Gathering Anonymous, Oblivious Robots on a Grid

    Matthias Fischer, Daniel Jung, Friedhelm Meyer auf der Heide
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA); Robotics (cs.RO)

    We consider a swarm of (n) autonomous mobile robots, distributed on a
    2-dimensional grid. A basic task for such a swarm is the gathering process: all
    robots have to gather at one (not predefined) place. The work in this paper is
    motivated by the following insight: On one side, for swarms of robots
    distributed in the 2-dimensional Euclidean space, several gathering algorithms
    are known for extremely simple robots that are oblivious, have bounded viewing
    radius, no compass, and no “flags” to communicate a status to others. On the
    other side, in the case of the 2-dimensional grid, the only known gathering
    algorithms for robots with bounded viewing radius and no compass need to
    memorize a constant number of rounds and need flags.

    In this paper we contribute the, to the best of our knowledge, first
    gathering algorithm on the grid, which works for anonymous, oblivious robots
    with bounded viewing range, without any further means of communication and
    without any memory. We prove its correctness and an (O(n^2)) time bound. This
    time bound matches those of the best known algorithms for the Euclidean plane
    mentioned above.

    Efficient Resource Allocation in Mass Customization based on Service Oriented Architecture

    Ali Vatankhah Barenji, Reza Vatankhah Barenji
    Comments: 6
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Mass customization describes the phenomenon of providing uniquely designed
    products and services to every customer through high process integration and
    flexibility. It has been used as a competitive approach by many companies.
    Adequate resource allocation in mass customization, particularly in terms of
    resource usage, is therefore important for meeting customers' requirements
    for responsiveness and deadlines, while at the same time offering high
    scalability. We illustrate an architecture for solving the resource
    allocation problem in a mass-customized flexible manufacturing system,
    built on advance reservation mechanisms and scheduling algorithms for
    effective resource usage.

    Training Deep Neural Networks via Optimization Over Graphs

    Guoqiang Zhang, W. Bastiaan Kleijn
    Comments: 5 pages
    Subjects: Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

    In this work, we propose to train a deep neural network by distributed
    optimization over a graph. Two nonlinear functions are considered: the
    rectified linear unit (ReLU) and a linear unit with both lower and upper
    cutoffs (DCutLU). The problem reformulation over a graph is realized by
    explicitly representing ReLU or DCutLU using a set of slack variables. We then
    apply the alternating direction method of multipliers (ADMM) to update the
    weights of the network layerwise by solving subproblems of the reformulated
    problem. Empirical results suggest that by proper parameter selection, the
    ADMM-based method converges considerably faster than the gradient descent method.
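
    One way to see the slack-variable reformulation is that ReLU is exactly a
    Euclidean projection, so it can be replaced by a constrained quadratic
    subproblem that ADMM handles in closed form. A minimal sketch under that
    reading (ours, not the authors' code):

        import numpy as np

        # ReLU(a) = argmin_{z >= 0} ||z - a||^2: the projection of a onto the
        # nonnegative orthant, solvable in closed form per coordinate.
        def relu_via_projection(a):
            return np.maximum(a, 0.0)

        a = np.array([-1.5, 0.0, 2.3])
        print(relu_via_projection(a))  # [0.  0.  2.3]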


    Learning

    Next-Step Conditioned Deep Convolutional Neural Networks Improve Protein Secondary Structure Prediction

    Akosua Busia, Navdeep Jaitly
    Comments: 11 pages, 3 figures, 4 tables, submitted to ISMB/ECCB 2017
    Subjects: Learning (cs.LG); Biomolecules (q-bio.BM)

    Recently developed deep learning techniques have significantly improved the
    accuracy of various speech and image recognition systems. In this paper we show
    how to adapt some of these techniques to create a novel chained convolutional
    architecture with next-step conditioning for improving performance on protein
    sequence prediction problems. We explore its value by demonstrating its ability
    to improve performance on eight-class secondary structure prediction. We first
    establish a state-of-the-art baseline by adapting recent advances in
    convolutional neural networks which were developed for vision tasks. This model
    achieves 70.0% per amino acid accuracy on the CB513 benchmark dataset without
    use of standard performance-boosting techniques such as ensembling or multitask
    learning. We then improve upon this state-of-the-art result using a novel
    chained prediction approach which frames the secondary structure prediction as
    a next-step prediction problem. This sequential model achieves 70.3% Q8
    accuracy on CB513 with a single model; an ensemble of these models produces
    71.4% Q8 accuracy on the same test set, improving upon the previous overall
    state of the art for the eight-class secondary structure problem. Our models
    are implemented using TensorFlow, an open-source machine learning software
    library available at TensorFlow.org; we aim to release the code for these
    experiments as part of the TensorFlow repository.

    Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis

    Maxim Raginsky, Alexander Rakhlin, Matus Telgarsky
    Comments: 29 pages
    Subjects: Learning (cs.LG); Optimization and Control (math.OC); Probability (math.PR); Machine Learning (stat.ML)

    Stochastic Gradient Langevin Dynamics (SGLD) is a popular variant of
    Stochastic Gradient Descent, where properly scaled isotropic Gaussian noise is
    added to an unbiased estimate of the gradient at each iteration. This modest
    change allows SGLD to escape local minima and suffices to guarantee asymptotic
    convergence to global minimizers for sufficiently regular non-convex objectives
    (Gelfand and Mitter, 1991).

    The present work provides a nonasymptotic analysis in the context of
    non-convex learning problems: SGLD requires (\tilde{O}(\varepsilon^{-4}))
    iterations to sample (\tilde{O}(\varepsilon))-approximate minimizers of both
    empirical and population risk, where (\tilde{O}(\cdot)) hides polynomial
    dependence on a temperature parameter, the model dimension, and a certain
    spectral gap parameter.

    As in the asymptotic setting, our analysis relates the discrete-time SGLD
    Markov chain to a continuous-time diffusion process. A new tool that drives the
    results is the use of weighted transportation cost inequalities to quantify the
    rate of convergence of SGLD to a stationary distribution in the Euclidean
    (2)-Wasserstein distance.
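
    For reference, the SGLD iteration the abstract refers to can be sketched in
    a few lines; the step size and inverse temperature below are arbitrary
    illustrative choices, not values from the paper.

        import numpy as np

        def sgld(grad, theta0, eta=1e-3, beta=10.0, steps=20000, seed=0):
            """Gradient step plus properly scaled isotropic Gaussian noise."""
            rng = np.random.default_rng(seed)
            theta = np.array(theta0, dtype=float)
            for _ in range(steps):
                noise = rng.normal(size=theta.shape)
                theta += -eta * grad(theta) + np.sqrt(2 * eta / beta) * noise
            return theta

        # Non-convex objective f(t) = (t^2 - 2)^2 with global minima at
        # +/- sqrt(2); the injected noise lets SGLD leave the flat point t = 0.
        grad_f = lambda t: 4 * t**3 - 8 * t
        print(sgld(grad_f, [0.0]))  # near +/- sqrt(2)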

    Is Big Data Sufficient for a Reliable Detection of Non-Technical Losses?

    Patrick Glauner, Angelo Migliosi, Jorge Meira, Eric Aislan Antonelo, Petko Valtchev, Radu State, Franck Bettinger
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Non-technical losses (NTL) occur during the distribution of electricity in
    power grids and include, but are not limited to, electricity theft and faulty
    meters. In emerging countries, they may range up to 40% of the total
    electricity distributed. In order to detect NTLs, machine learning methods are
    used that learn irregular consumption patterns from customer data and
    inspection results. The Big Data paradigm followed in modern machine learning
    reflects the desire of deriving better conclusions from simply analyzing more
    data, without the necessity of looking at theory and models. However, the
    sample of inspected customers may be biased, i.e. it does not represent the
    population of all customers. As a consequence, machine learning models trained
    on these inspection results are biased as well and therefore lead to unreliable
    predictions of whether customers cause NTL or not. In machine learning, this
    issue is called covariate shift and has not been addressed in the literature on
    NTL detection yet. In this work, we present a novel framework for quantifying
    and visualizing covariate shift. We apply it to a commercial data set from
    Brazil that consists of 3.6M customers and 820K inspection results. We show
    that some features have a stronger covariate shift than others, making
    predictions less reliable. In particular, previous inspections were focused on
    certain neighborhoods or customer classes and were not sufficiently spread
    among the population of customers. This framework is about to be
    deployed in a commercial product for NTL detection.
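
    The abstract does not spell the framework out, so as a generic illustration
    only: covariate shift in a single feature can be quantified with a
    two-sample statistic between the inspected customers and the whole
    population, e.g. a Kolmogorov-Smirnov distance (0 means identical
    distributions).

        import numpy as np

        def ks_statistic(x, y):
            """Maximum gap between the two empirical CDFs."""
            grid = np.sort(np.concatenate([x, y]))
            cdf_x = np.searchsorted(np.sort(x), grid, side="right") / len(x)
            cdf_y = np.searchsorted(np.sort(y), grid, side="right") / len(y)
            return float(np.max(np.abs(cdf_x - cdf_y)))

        rng = np.random.default_rng(1)
        population = rng.normal(0.0, 1.0, 5000)   # all customers
        inspected = rng.normal(0.8, 1.0, 500)     # biased inspection sample
        print(ks_statistic(inspected, population))  # clearly above 0: shift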

    Coresets for Kernel Regression

    Yan Zheng, Jeff M. Phillips
    Comments: 11 pages, 20 figures
    Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS)

    Kernel regression is an essential and ubiquitous tool for non-parametric data
    analysis, particularly popular among time series and spatial data. However, the
    central operation which is performed many times, evaluating a kernel on the
    data set, takes linear time. This is impractical for modern large data sets.

    In this paper we describe coresets for kernel regression: compressed data
    sets which can be used as proxy for the original data and have provably bounded
    worst case error. The size of the coresets is independent of the raw number of
    data points, rather they only depend on the error guarantee, and in some cases
    the size of domain and amount of smoothing. We evaluate our methods on very
    large time series and spatial data, and demonstrate that they incur negligible
    error, can be constructed extremely efficiently, and allow for great
    computational gains.
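
    The linear-time primitive in question is the kernel-weighted average below
    (Nadaraya-Watson); a coreset would replace (X, y) with a much smaller
    weighted point set carrying a provable worst-case error bound. Sketch for
    illustration, with an arbitrary bandwidth:

        import numpy as np

        def kernel_regress(x0, X, y, h=0.3):
            w = np.exp(-((X - x0) ** 2) / (2 * h * h))  # Gaussian kernel
            return float(np.sum(w * y) / np.sum(w))     # O(n) per query

        rng = np.random.default_rng(2)
        X = rng.uniform(0, 2 * np.pi, 2000)
        y = np.sin(X) + rng.normal(0, 0.1, X.size)
        print(kernel_regress(np.pi / 2, X, y))  # close to sin(pi/2) = 1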

    A Multi-model Combination Approach for Probabilistic Wind Power Forecasting

    You Lin, Ming Yang, Can Wan, Jianhui Wang, Yonghua Song
    Subjects: Learning (cs.LG); Applications (stat.AP)

    Short-term probabilistic wind power forecasting can provide critical
    quantified uncertainty information of wind generation for power system
    operation and control. Owing to the complicated characteristics of wind power
    prediction error, it is difficult to develop a universal forecasting model
    that dominates all alternative models. Therefore, a novel multi-model
    combination (MMC) approach for short-term probabilistic wind generation
    forecasting is proposed in this paper to exploit the advantages of different
    forecasting models. The proposed approach can combine different forecasting
    models that provide different kinds of probability density functions to
    improve the probabilistic forecast accuracy. Three probabilistic forecasting
    models based on the sparse Bayesian learning, kernel density estimation and
    beta distribution fitting are used to form the combined model. The parameters
    of the MMC model are solved based on Bayesian framework. Numerical tests
    illustrate the effectiveness of the proposed MMC approach.

    Nearly Instance Optimal Sample Complexity Bounds for Top-k Arm Selection

    Lijie Chen, Jian Li, Mingda Qiao
    Comments: Accepted by AISTATS 2017
    Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS)

    In the Best-(k)-Arm problem, we are given (n) stochastic bandit arms, each
    associated with an unknown reward distribution. We are required to identify the
    (k) arms with the largest means by taking as few samples as possible. In this
    paper, we make progress towards a complete characterization of the
    instance-wise sample complexity bounds for the Best-(k)-Arm problem. On the
    lower bound side, we obtain a novel complexity term to measure the sample
    complexity that every Best-(k)-Arm instance requires. This is derived by an
    interesting and nontrivial reduction from the Best-(1)-Arm problem. We also
    provide an elimination-based algorithm that matches the instance-wise lower
    bound within doubly-logarithmic factors. The sample complexity of our algorithm
    strictly dominates the state-of-the-art for Best-(k)-Arm (modulo constant
    factors).
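
    To make the elimination idea concrete, here is a hedged sketch of a
    round-based top-k procedure in the same spirit (not the authors' exact
    algorithm): pull all surviving arms, accept an arm once its lower
    confidence bound clears the competition, and discard an arm once its upper
    confidence bound falls out of the top k.

        import numpy as np

        def top_k_eliminate(means, k, delta=0.05, batch=200, seed=3):
            """means: true arm means, used here only to simulate noisy pulls."""
            rng = np.random.default_rng(seed)
            n = len(means)
            alive, accepted = list(range(n)), []
            counts, sums = np.zeros(n), np.zeros(n)
            while len(accepted) < k:
                need = k - len(accepted)
                if need == len(alive):     # everything left must be in top k
                    accepted += alive
                    break
                for a in alive:            # uniform sampling of surviving arms
                    sums[a] += rng.normal(means[a], 1.0, batch).sum()
                    counts[a] += batch
                mu = {a: sums[a] / counts[a] for a in alive}
                rad = {a: np.sqrt(np.log(4 * n * counts[a] / delta)
                                  / (2 * counts[a])) for a in alive}
                order = sorted(alive, key=lambda a: -mu[a])
                best, cut, worst = order[0], order[need], order[-1]
                if mu[best] - rad[best] > mu[cut] + rad[cut]:
                    accepted.append(best); alive.remove(best)  # surely in top k
                elif (mu[worst] + rad[worst]
                      < mu[order[need - 1]] - rad[order[need - 1]]):
                    alive.remove(worst)                        # surely out
            return sorted(accepted)

        print(top_k_eliminate([0.9, 0.8, 0.5, 0.3, 0.1], k=2))  # [0, 1]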

    Concept Drift Adaptation by Exploiting Historical Knowledge

    Yu Sun, Ke Tang, Zexuan Zhu, Xin Yao
    Comments: First version
    Subjects: Learning (cs.LG)

    Incremental learning with concept drift has often been tackled by ensemble
    methods, where models built in the past can be re-trained to attain new models
    for the current data. Two design questions need to be addressed in developing
    ensemble methods for incremental learning with concept drift, i.e., which
    historical (i.e., previously trained) models should be preserved and how to
    utilize them. A novel ensemble learning method, namely Diversity and Transfer
    based Ensemble Learning (DTEL), is proposed in this paper. Given newly arrived
    data, DTEL uses each preserved historical model as an initial model and further
    trains it with the new data via transfer learning. Furthermore, DTEL preserves
    a diverse set of historical models, rather than a set of historical models that
    are merely accurate in terms of classification accuracy. Empirical studies on
    15 synthetic data streams and 4 real-world data streams (all with concept
    drifts) demonstrate that DTEL can handle concept drift more effectively than 4
    other state-of-the-art methods.

    Training Deep Neural Networks via Optimization Over Graphs

    Guoqiang Zhang, W. Bastiaan Kleijn
    Comments: 5 pages
    Subjects: Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

    In this work, we propose to train a deep neural network by distributed
    optimization over a graph. Two nonlinear functions are considered: the
    rectified linear unit (ReLU) and a linear unit with both lower and upper
    cutoffs (DCutLU). The problem reformulation over a graph is realized by
    explicitly representing ReLU or DCutLU using a set of slack variables. We then
    apply the alternating direction method of multipliers (ADMM) to update the
    weights of the network layerwise by solving subproblems of the reformulated
    problem. Empirical results suggest that by proper parameter selection, the
    ADMM-based method converges considerably faster than the gradient descent method.

    Generative Mixture of Networks

    Ershad Banijamali, Ali Ghodsi, Pascal Poupart
    Comments: 9 pages
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    A generative model based on training deep architectures is proposed. The
    model consists of K networks that are trained together to learn the underlying
    distribution of a given data set. The process starts with dividing the input
    data into K clusters and feeding each of them into a separate network. After
    a few iterations of training the networks separately, we use an EM-like algorithm to
    train the networks together and update the clusters of the data. We call this
    model Mixture of Networks. The provided model is a platform that can be used
    for any deep structure and be trained by any conventional objective function
    for distribution modeling. As the components of the model are neural networks,
    it has high capability in characterizing complicated data distributions as well
    as clustering data. We apply the algorithm on MNIST hand-written digits and
    Yale face datasets. We also demonstrate the clustering ability of the model
    using some real-world and toy examples.

    Cognitive Mapping and Planning for Visual Navigation

    Saurabh Gupta, James Davidson, Sergey Levine, Rahul Sukthankar, Jitendra Malik
    Comments: Under review for CVPR 2017. Project webpage: this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO)

    We introduce a neural architecture for navigation in novel environments. Our
    proposed architecture learns to map from first-person viewpoints and plans a
    sequence of actions towards goals in the environment. The Cognitive Mapper and
    Planner (CMP) is based on two key ideas: a) a unified joint architecture for
    mapping and planning, such that the mapping is driven by the needs of the
    planner, and b) a spatial memory with the ability to plan given an incomplete
    set of observations about the world. CMP constructs a top-down belief map of
    the world and applies a differentiable neural net planner to produce the next
    action at each time step. The accumulated belief of the world enables the agent
    to track visited regions of the environment. Our experiments demonstrate that
    CMP outperforms both reactive strategies and standard memory-based
    architectures and performs well in novel environments. Furthermore, we show
    that CMP can also achieve semantically specified goals, such as ‘go to a
    chair’.

    DNN Filter Bank Cepstral Coefficients for Spoofing Detection

    Hong Yu, Zheng-Hua Tan, Zhanyu Ma, Jun Guo
    Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Learning (cs.LG)

    With the development of speech synthesis techniques, automatic speaker
    verification systems face the serious challenge of spoofing attacks. In order to
    improve the reliability of speaker verification systems, we develop a new
    filter bank based cepstral feature, deep neural network filter bank cepstral
    coefficients (DNN-FBCC), to distinguish between natural and spoofed speech. The
    deep neural network filter bank is automatically generated by training a filter
    bank neural network (FBNN) using natural and synthetic speech. By adding
    restrictions on the training rules, the learned weight matrix of FBNN is
    band-limited and sorted by frequency, similar to the normal filter bank. Unlike
    the manually designed filter bank, the learned filter bank has different filter
    shapes in different channels, which can capture the differences between natural
    and synthetic speech more effectively. The experimental results on the ASVspoof
    2015 database show that the Gaussian mixture model maximum-likelihood
    (GMM-ML) classifier trained by the new feature performs better than the
    state-of-the-art linear frequency cepstral coefficients (LFCC) based
    classifier, especially on detecting unknown attacks.

    Similarity Preserving Representation Learning for Time Series Analysis

    Qi Lei, Jinfeng Yi, Roman Vaculin, Lingfei Wu, Inderjit S. Dhillon
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    A considerable number of machine learning algorithms take matrices as their
    inputs. As such, they cannot directly analyze time series data due to its
    temporal nature, usually unequal lengths, and complex properties. This is a
    great pity since many of these algorithms are effective, robust, efficient, and
    easy to use. In this paper, we bridge this gap by proposing an efficient
    representation learning framework that is able to convert a set of time series
    with equal or unequal lengths to a matrix format. In particular, we guarantee
    that the pairwise similarities between time series are well preserved after the
    transformation. Therefore, the learned feature representation is particularly
    suitable to the class of learning problems that are sensitive to data
    similarities. Given a set of (n) time series, we first construct an (n \times n)
    partially observed similarity matrix by randomly sampling (O(n log n)) pairs
    of time series and computing their pairwise similarities. We then propose an
    extremely efficient algorithm that solves a highly non-convex and NP-hard
    problem to learn new features based on the partially observed similarity
    matrix. We use the learned features to conduct experiments on both data
    classification and clustering tasks. Our extensive experimental results
    demonstrate that the proposed framework is both effective and efficient.
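
    Read plainly, the pipeline is: sample O(n log n) similarity entries, then
    fit low-dimensional features whose inner products reproduce them. The
    sketch below uses plain gradient descent in place of the authors'
    specialized solver, purely for illustration.

        import numpy as np

        rng = np.random.default_rng(4)
        n, d = 60, 3
        U_true = rng.normal(size=(n, d))
        S = U_true @ U_true.T                   # ground-truth similarities

        # Observe only ~ n log n randomly sampled (symmetric) entries of S.
        m = int(n * np.log(n))
        pairs = rng.integers(0, n, size=(m, 2))
        mask = np.zeros((n, n), dtype=bool)
        mask[pairs[:, 0], pairs[:, 1]] = True
        mask |= mask.T

        def mse(U):
            R = np.where(mask, U @ U.T - S, 0.0)
            return float(np.mean(R[mask] ** 2))

        U = rng.normal(scale=0.1, size=(n, d))  # learned features
        before = mse(U)
        for _ in range(2000):
            R = np.where(mask, U @ U.T - S, 0.0)  # observed residuals
            U -= 0.001 * (R + R.T) @ U            # gradient descent step
        print(before, "->", mse(U))               # residual drops sharply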

    Enabling Robots to Communicate their Objectives

    Sandy H. Huang, David Held, Pieter Abbeel, Anca D. Dragan
    Subjects: Robotics (cs.RO); Learning (cs.LG)

    Our ultimate goal is to efficiently enable end-users to correctly anticipate
    a robot’s behavior in novel situations. This behavior is often a direct result
    of the robot’s underlying objective function. Our insight is that end-users
    need to have an accurate mental model of this objective function in order to
    understand and predict what the robot will do. While people naturally develop
    such a mental model over time through observing the robot act, this
    familiarization process may be lengthy. Our approach reduces this time by
    having the robot model how people infer objectives from observed behavior, and
    then selecting those behaviors that are maximally informative. The problem of
    computing a posterior over objectives from observed behavior is known as
    Inverse Reinforcement Learning (IRL), and has been applied to robots learning
    human objectives. We consider the problem where the roles of human and robot
    are swapped. Our main contribution is to recognize that unlike robots, humans
    will not be exact in their IRL inference. We thus introduce two factors
    to define candidate approximate-inference models for human learning in this
    setting, and analyze them in a user study in the autonomous driving domain. We
    show that certain approximate-inference models lead to the robot generating
    example behaviors that better enable users to anticipate what the robot will do
    in test situations. Our results also suggest, however, that additional research
    is needed in modeling how humans extrapolate from examples of robot behavior.

    A Collective, Probabilistic Approach to Schema Mapping: Appendix

    Angelika Kimmig, Alex Memory, Renee J. Miller, Lise Getoor
    Comments: This is the appendix to the paper “A Collective, Probabilistic Approach to Schema Mapping” accepted to ICDE 2017
    Subjects: Databases (cs.DB); Learning (cs.LG)

    In this appendix we provide additional supplementary material to “A
    Collective, Probabilistic Approach to Schema Mapping.” We include an additional
    extended example, supplementary experiment details, and proof for the
    complexity result stated in the main paper.

    Parallel Long Short-Term Memory for Multi-stream Classification

    Mohamed Bouaziz, Mohamed Morchid, Richard Dufour, Georges Linarès, Renato De Mori
    Comments: 2016 IEEE Workshop on Spoken Language Technology
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Recently, machine learning methods have provided a broad spectrum of original
    and efficient algorithms based on Deep Neural Networks (DNN) to automatically
    predict an outcome with respect to a sequence of inputs. Recurrent hidden cells
    allow these DNN-based models to manage long-term dependencies such as Recurrent
    Neural Networks (RNN) and Long Short-Term Memory (LSTM). Nevertheless, these
    RNNs process a single input stream in one (LSTM) or two (Bidirectional LSTM)
    directions. But most of the information available nowadays comes from
    multistream or multimedia documents, which require RNNs to process this
    information synchronously during training. This paper presents an original
    LSTM-based
    architecture, named Parallel LSTM (PLSTM), that carries out multiple parallel
    synchronized input sequences in order to predict a common output. The proposed
    PLSTM method could be used for parallel sequence classification purposes. The
    PLSTM approach is evaluated on an automatic telecast genre sequences
    classification task and compared with different state-of-the-art architectures.
    Results show that the proposed PLSTM method outperforms the baseline n-gram
    models as well as the state-of-the-art LSTM approach.

    Batch Policy Gradient Methods for Improving Neural Conversation Models

    Kirthevasan Kandasamy, Yoram Bachrach, Ryota Tomioka, Daniel Tarlow, David Carter
    Comments: International Conference on Learning Representations (ICLR) 2017
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We study reinforcement learning of chatbots with recurrent neural network
    architectures when the rewards are noisy and expensive to obtain. For instance,
    a chatbot used in automated customer service support can be scored by quality
    assurance agents, but this process can be expensive, time consuming and noisy.
    Previous reinforcement learning work for natural language processing uses
    on-policy updates and/or is designed for on-line learning settings. We
    demonstrate empirically that such strategies are not appropriate for this
    setting and develop an off-policy batch policy gradient method (BPG). We
    demonstrate the efficacy of our method via a series of synthetic experiments
    and an Amazon Mechanical Turk experiment on a restaurant recommendations
    dataset.


    Information Theory

    On the Transport Capability of LAN Cables

    Syed Hassan Raza Naqvi, Andrea Matera, Lorenzo Combi, Umberto Spagnolini
    Subjects: Information Theory (cs.IT)

    Centralized Radio Access Network (C-RAN) architecture is the only viable
    solution to handle the complex interference scenario generated by massive
    antennas and small cells deployment as required by next generation (5G) mobile
    networks. In conventional C-RAN, the fronthaul links used to exchange the
    signal between Base Band Units (BBUs) and Remote Antenna Units (RAUs) are based
    on digital baseband (BB) signals over optical fibers due to the huge bandwidth
    required. In this paper we evaluate the transport capability of copper-based
    all-analog fronthaul architecture called Radio over Copper (RoC) that leverages
    the pre-existing LAN cables already deployed in buildings and
    enterprises. In particular, the main contribution of the paper is to evaluate
    the number of independent BB signals for a multiple-antenna system that can be
    transported over multi-pair Cat-5/6/7 cables under a predefined fronthauling
    transparency condition in terms of maximum BB signal degradation. The MIMO-RoC
    proves to be a complementary solution to optical fiber for the last 200m toward
    the RAUs, mostly to reuse the existing LAN cables and to power-supply the RAUs
    over the same cable.

    A Study on the Impact of Locality in the Decoding of Binary Cyclic Codes

    M. Nikhil Krishnan, Bhagyashree Puranik, P. Vijay Kumar, Itzhak Tamo, Alexander Barg
    Comments: Extended version of a paper submitted to ISIT 2017
    Subjects: Information Theory (cs.IT)

    In this paper, we study the impact of locality on the decoding of binary
    cyclic codes under two approaches, namely ordered statistics decoding (OSD) and
    trellis decoding. Given a binary cyclic code having locality or availability,
    we suitably modify the OSD to obtain gains in terms of the Signal-To-Noise
    ratio, for a given reliability and essentially the same level of decoder
    complexity. With regard to trellis decoding, we show that careful introduction
    of locality results in the creation of cyclic subcodes having lower maximum
    state complexity. We also present a simple upper-bounding technique on the
    state complexity profile, based on the zeros of the code. Finally, it is shown
    how the decoding speed can be significantly increased in the presence of
    locality, in the moderate-to-high SNR regime, by making use of a quick-look
    decoder that often returns the ML codeword.

    Stealthy Secret Key Generation

    Pin-Hsun Lin, Carsten Rudolf Janda, Eduard Axel Jorswieck
    Comments: 22 pages, 1 figure, submitted to ISIT 2017
    Subjects: Information Theory (cs.IT)

    In this work, we consider a complete covert communication system, which
    includes the source-model of a stealthy secret key generation (SSKG) as the
    first phase. The generated key will be used for the covert communication in the
    second phase of the current round and also in the first phase of the next
    round. We investigate the stealthy SK rate performance of the first phase. The
    derived results show that the SK capacity lower and upper bounds of the
    source-model SKG are not affected by the additional stealth constraint. This
    result implies that we can attain the SSKG capacity for free when the sequences
    observed by the three terminals Alice ((X^n)), Bob ((Y^n)) and Willie ((Z^n))
    follow a Markov chain relationship, i.e., (X^n-Y^n-Z^n). We then prove that the
    sufficient condition to attain both the SK capacity and the SSK capacity can
    be relaxed from physical to stochastic degradedness. In order to
    underline the practical relevance, we also derive a sufficient condition to
    attain the degradedness by the usual stochastic order for Maurer’s fast fading
    Gaussian (satellite) model for the source of common randomness.

    On Muting Mobile Terminals for Uplink Interference Mitigation in HetNets — System-Level Analysis via Stochastic Geometry

    F. J. Martin-Vega, M. C. Aguayo-Torres, G. Gomez, M. Di Renzo
    Comments: 16 pages, 12 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
    Subjects: Information Theory (cs.IT)

    We investigate the performance of a scheduling algorithm where the Mobile
    Terminals (MTs) may be turned off if they cause a level of interference greater
    than a given threshold. This approach, which is referred to as Interference
    Aware Muting (IAM), may be regarded as an interference-aware scheme aimed at
    reducing the level of interference. We analyze its performance with the
    aid of stochastic geometry and compare it against other interference-unaware
    and interference-aware schemes, where the level of interference is kept under
    control in the power control scheme itself rather than in the scheduling
    process. IAM is studied in terms of average transmit power, mean and variance
    of the interference, coverage probability, Spectral Efficiency (SE), and Binary
    Rate (BR), which accounts for the amount of resources allocated to the typical
    MT. Simplified expressions of SE and BR for adaptive modulation and coding
    schemes are proposed, which better characterize practical communication
    systems. Our system-level analysis unveils that IAM increases the BR and
    reduces the mean and variance of the interference. It is proved that an
    operating regime exists, where the performance of IAM is independent of the
    cell association criterion, which simplifies the joint design of uplink and
    downlink transmissions.

    On the Energy/Distortion Tradeoff in the IoT

    Alessandro Biason, Chiara Pielli, Andrea Zanella, Michele Zorzi
    Comments: 14 pages, 8 figures, submitted to IEEE Transactions on Wireless Communications, Nov. 2016
    Subjects: Information Theory (cs.IT)

    The Internet of Things paradigm envisages the presence of many
    battery-powered sensors and this entails the design of energy-aware protocols.
    Source coding techniques make it possible to save some energy by compressing the packets
    sent over the network, but at the cost of a poorer accuracy in the
    representation of the data. This paper addresses the problem of designing
    efficient policies to jointly perform processing and transmission tasks. In
    particular, we aim at defining an optimal scheduling strategy with the twofold
    ultimate goal of extending the network lifetime and guaranteeing a low overall
    distortion of the transmitted data. We propose a Time Division Multiple Access
    (TDMA)-based access scheme that optimally allocates resources to heterogeneous
    nodes. We use realistic rate-distortion curves to quantify the impact of
    compression on the data quality and propose a complete energy model that
    includes the energy spent for processing and transmitting the data, as well as
    the circuitry costs. Both full knowledge and statistical knowledge of the
    wireless channels are considered, and optimal policies are derived for both
    cases. The overall problem is structured in a modular fashion and solved
    through convex and alternate programming techniques. Finally, we thoroughly
    evaluate the proposed algorithms and the influence of the design variables on
    the system performance adopting parameters of real sensors.

    Unified Analysis of SWIPT Relay Networks with Noncoherent Modulation

    Lina Mohjazi, Sami Muhaidat, Mehrdad Dianati, Mahmoud Al-Qutayri
    Subjects: Information Theory (cs.IT)

    Simultaneous wireless information and power transfer (SWIPT) relay networks
    represent a paradigm shift in the development of wireless networks, enabling
    simultaneous radio frequency (RF) energy harvesting (EH) and information
    processing. Different from conventional SWIPT relaying schemes, which typically
    assume the availability of perfect channel state information (CSI), here we
    consider the application of noncoherent modulation in order to avoid the need
    of instantaneous CSI estimation/tracking and minimise the energy consumption.
    We propose a unified and comprehensive analytical framework for the analysis of
    time switching (TS) and power splitting (PS) receiver architectures with the
    amplify-and-forward (AF) protocol. In particular, we adopt a moments-based
    approach to derive novel expressions for the outage probability, achievable
    throughput, and average symbol error rate (ASER) of the considered SWIPT
    system. We quantify the impact of several system parameters, involving relay
    location, energy conversion efficiency, and TS and PS ratio assumptions,
    imposed on the EH relay terminal. Our results reveal that the throughput
    performance of the TS protocol is superior to that of the PS protocol at lower
    receive signal-to-noise (SNR) values, which is in contrast to the
    point-to-point SWIPT systems. An extensive Monte Carlo simulation study is
    presented to corroborate the proposed analysis.

    Information and estimation in Fokker-Planck channels

    Andre Wibisono, Varun Jog, Po-Ling Loh
    Subjects: Information Theory (cs.IT); Statistics Theory (math.ST)

    We study the relationship between information- and estimation-theoretic
    quantities in time-evolving systems. We focus on the Fokker-Planck channel
    defined by a general stochastic differential equation, and show that the time
    derivatives of entropy, KL divergence, and mutual information are characterized
    by estimation-theoretic quantities involving an appropriate generalization of
    the Fisher information. Our results vastly extend De Bruijn’s identity and the
    classical I-MMSE relation.
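
    For orientation, the two classical identities being generalized read, in
    standard form (h the differential entropy, J the Fisher information, Z a
    standard Gaussian independent of X):

        \[
          \frac{d}{dt}\, h\!\left(X + \sqrt{t}\,Z\right)
            = \frac{1}{2}\, J\!\left(X + \sqrt{t}\,Z\right)
          \qquad \text{(De Bruijn)},
        \]
        \[
          \frac{d}{d\,\mathrm{snr}}\, I\!\left(X;\ \sqrt{\mathrm{snr}}\,X + Z\right)
            = \frac{1}{2}\,\mathrm{mmse}(\mathrm{snr})
          \qquad \text{(I-MMSE)}.
        \]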

    On the Capacity of Signal Dependent Noise Channels

    Hamid Ghourchian, Gholamali Aminian, Amin Gohari, Mahtab Mirmohseni, Masoumeh Nasiri-Kenari
    Comments: 32 pages, 2 figures
    Subjects: Information Theory (cs.IT)

    In some applications, the variance of measurement noise depends on the signal
    that we aim to measure. For instance, additive Gaussian signal-dependent noise
    (AGSDN) channel models are used in molecular and optical communication. Herein
    we provide lower and upper bounds on the capacity of additive signal dependent
    noise (ASDN) channels. We also provide sufficient conditions under which the
    capacity becomes infinity.

    Subspace-Aware Index Codes

    Bhavya Kailkhura, Lakshmi Narasimhan Theagarajan, Pramod K. Varshney
    Subjects: Information Theory (cs.IT)

    In this letter, we extend the well-known index coding problem to exploit the
    structure in the source-data to improve system throughput. In many
    applications, the data to be transmitted may lie (or can be well approximated)
    in a low-dimensional subspace. We exploit this low-dimensional structure of the
    data using an algebraic framework to solve the index coding problem (referred
    to as subspace-aware index coding) as opposed to the traditional index coding
    problem which is subspace-unaware. Also, we propose an efficient algorithm
    based on the alternating minimization approach to obtain near optimal index
    codes for both subspace-aware and -unaware cases. Our simulations indicate that
    under certain conditions, a significant throughput gain (about 90%) can be
    achieved by subspace-aware index codes over conventional subspace-unaware index
    codes.

    On the capacity of bandlimited optical intensity channels with Gaussian noise

    Jing Zhou, Wenyi Zhang
    Comments: 13 pages, 9 figures, 4 tables
    Subjects: Information Theory (cs.IT)

    We determine lower and upper bounds on the capacity of bandlimited optical
    intensity channels (BLOIC) with white Gaussian noise. Three types of input
    power constraints are considered: 1) only an average power constraint, 2) only
    a peak power constraint, and 3) an average and a peak power constraint.
    Capacity lower bounds are derived by a two-step process including 1) for each
    type of constraint, designing admissible pulse amplitude modulated input
    waveform ensembles, and 2) lower bounding the maximum achievable information
    rates of the designed input ensembles. Capacity upper bounds are derived by
    exercising constraint relaxations and utilizing known results on discrete-time
    optical intensity channels. We obtain degrees-of-freedom-optimal (DOF-optimal)
    lower bounds which have the same pre-log factor as the upper bounds, thereby
    characterizing the high SNR capacity of BLOIC to within a finite gap. We
    further derive intersymbol-interference-free (ISI-free) signaling based lower
    bounds, which perform well for all practical SNR values. In particular, the
    ISI-free signaling based lower bounds outperform the DOF-optimal lower bound
    when the SNR is below 10 dB.

    Sense-and-Predict: Opportunistic MAC Based on Spatial Interference Correlation for Cognitive Radio Networks

    Jeemin Kim, Seung-Woo Ko, Han Cha, Seong-Lyun Kim
    Subjects: Information Theory (cs.IT)

    Opportunity detection at secondary transmitters (TXs) is a key technique
    enabling cognitive radio (CR) networks. Such detection however cannot guarantee
    reliable communication at secondary receivers (RXs), especially when their
    association distance is long. To cope with the issue, this paper proposes a
    novel MAC called sense-and-predict (SaP), where each secondary TX decides
    whether to access or not based on the prediction of the interference level at
    RX. Firstly, we provide the spatial interference correlation in a probabilistic
    form using stochastic geometry, and utilize it to maximize the area spectral
    efficiency (ASE) for secondary networks while guaranteeing the service quality
    of primary networks. Through simulations and testbed experiments using USRP,
    SaP is shown to always achieve ASE improvement compared with the conventional
    TX based sensing.

    Classical-Quantum Arbitrarily Varying Wiretap Channel: Secret Message Transmission under Jamming Attacks

    Holger Boche, Minglai Cai, Christian Deppe, Janis Nötzel
    Subjects: Information Theory (cs.IT)

    We analyze arbitrarily varying classical-quantum wiretap channels. These
    channels are subject to two attacks at the same time: one passive
    (eavesdropping), and one active (jamming). We progress on previous works by
    introducing a reduced class of allowed codes that fulfills a more stringent
    secrecy requirement than earlier definitions. In addition, we prove that
    non-symmetrizability of the legal link is sufficient for equality of the
    deterministic and the common randomness assisted secrecy capacities. At last,
    we focus on analytic properties of both secrecy capacities: We completely
    characterize their discontinuity points, and their super-activation properties.

    1-bit Massive MU-MIMO Precoding in VLSI

    Oscar Castañeda, Sven Jacobsson, Giuseppe Durisi, Mikael Coldrey, Tom Goldstein, Christoph Studer
    Comments: 13 pages, 5 figures, 2 tables; submitted to a journal
    Subjects: Information Theory (cs.IT); Hardware Architecture (cs.AR)

    Massive multiuser (MU) multiple-input multiple-output (MIMO) will be a core
    technology in fifth-generation (5G) wireless systems as it offers significant
    improvements in spectral efficiency compared to existing multi-antenna
    technologies. The presence of hundreds of antenna elements at the base station
    (BS), however, results in excessively high hardware costs and power
    consumption, and requires high interconnect throughput between the
    baseband-processing unit and the radio unit. Massive MU-MIMO that uses
    low-resolution analog-to-digital and digital-to-analog converters (DACs) has
    the potential to address all these issues. In this paper, we focus on downlink
    precoding for massive MU-MIMO systems with 1-bit DACs at the BS. The objective
    is to design precoders that simultaneously mitigate multi-user interference
    (MUI) and quantization artifacts. We propose two nonlinear 1-bit precoding
    algorithms and corresponding very-large scale integration (VLSI) designs. Our
    algorithms rely on biconvex relaxation, which enables the design of efficient
    1-bit precoding algorithms that achieve superior error-rate performance
    compared to that of linear precoding algorithms followed by quantization. To
    showcase the efficacy of our algorithms, we design VLSI architectures that
    enable efficient 1-bit precoding for massive MU-MIMO systems in which hundreds
    of antennas serve tens of user equipments. We present corresponding
    field-programmable gate array (FPGA) implementations to demonstrate that 1-bit
    precoding enables reliable and high-rate downlink data transmission in
    practical systems.

    On the Global-Local Dichotomy in Sparsity Modeling

    Dmitry Batenkov, Yaniv Romano, Michael Elad
    Subjects: Information Theory (cs.IT); Machine Learning (stat.ML)

    The traditional sparse modeling approach, when applied to inverse problems
    with large data such as images, essentially assumes a sparse model for small
    overlapping data patches. While producing state-of-the-art results, this
    methodology is suboptimal, as it does not attempt to model the entire global
    signal in any meaningful way – a nontrivial task by itself. In this paper we
    propose a way to bridge this theoretical gap by constructing a global model
    from the bottom up. Given local sparsity assumptions in a dictionary, we show
    that the global signal representation must satisfy a constrained
    underdetermined system of linear equations, which can be solved efficiently by
    modern optimization methods such as Alternating Direction Method of Multipliers
    (ADMM). We investigate conditions for unique and stable recovery, and provide
    numerical evidence corroborating the theory.
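
    As a self-contained example of the kind of ADMM solver alluded to (ours,
    not the paper's), the classic l1-regularized least-squares splitting
    recovers a sparse vector satisfying an underdetermined linear system:

        import numpy as np

        def admm_lasso(A, b, lam=0.05, rho=1.0, iters=200):
            n = A.shape[1]
            x, z, u = (np.zeros(n) for _ in range(3))
            L = np.linalg.inv(A.T @ A + rho * np.eye(n))   # cached solve
            Atb = A.T @ b
            for _ in range(iters):
                x = L @ (Atb + rho * (z - u))              # quadratic step
                v = x + u                                  # soft-thresholding
                z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
                u += x - z                                 # dual update
            return z

        rng = np.random.default_rng(5)
        A = rng.normal(size=(30, 80))                 # underdetermined system
        x_true = np.zeros(80); x_true[[3, 17, 42]] = [1.0, -2.0, 1.5]
        b = A @ x_true
        print(np.flatnonzero(np.abs(admm_lasso(A, b)) > 0.5))  # [ 3 17 42]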

    The Connectivity of Millimeter-Wave Networks in Urban Environments Modeled Using Random Lattices

    Kaifeng Han, Kaibin Huang, Ying Cui, Yueping Wu
    Comments: 28 pages, 10 figures, submitted to IEEE Trans. on Wireless Communications
    Subjects: Information Theory (cs.IT)

    Millimeter-wave (mmWave) communication opens up tens of giga-hertz (GHz)
    spectrum in the mmWave band for use by next-generation wireless systems,
    thereby solving the problem of spectrum scarcity. Maintaining connectivity
    stands out to be a key design challenge for mmWave networks deployed in urban
    regions due to the blockage effect characterizing mmWave propagation.
    Specifically, mmWave signals can be blocked by buildings and other large urban
    objects. In this paper, we set out to investigate the blockage effect on the
    connectivity of mmWave networks in a Manhattan-type urban region modeled using
    a random regular lattice while base stations (BSs) are Poisson distributed in
    the plane. In particular, we analyze the connectivity probability that a
    typical user is within the transmission range of a BS and connected by a
    line-of-sight. Using random lattice and stochastic geometry theories, different
    lower bounds on the connectivity probability are derived as functions of the
    buildings’ size and probability of a lattice cell being occupied by a building
    as well as BS density and transmission range. The asymptotic connectivity
    probability is also derived for cases of dense buildings. Last, the results are
    extended to heterogeneous networks. Our study yields closed-form relations
    between the parameters of the building process and the BS process, providing
    useful guidelines for practical mmWave network deployment.

    Joint Precoding and RRH selection for User-centric Green MIMO C-RAN

    Cunhua Pan, Huiling Zhu, Nathan J. Gomes, Jiangzhou Wang
    Comments: This work has been accepted in IEEE TWC
    Subjects: Information Theory (cs.IT)

    This paper jointly optimizes the precoding matrices and the set of active
    remote radio heads (RRHs) to minimize the network power consumption (NPC) for a
    user-centric cloud radio access network (C-RAN), where both the RRHs and users
    have multiple antennas and each user is served by its nearby RRHs. Both users’
    rate requirements and per-RRH power constraints are considered. Due to these
    conflicting constraints, this optimization problem may be infeasible. In this
    paper, we propose to solve this problem in two stages. In Stage I, a
    low-complexity user selection algorithm is proposed to find the largest subset
    of feasible users. In Stage II, a low-complexity algorithm is proposed to solve
    the optimization problem with the users selected from Stage I. Specifically,
    the re-weighted (l_1)-norm minimization method is used to transform the
    original problem with non-smooth objective function into a series of weighted
    power minimization (WPM) problems, each of which can be solved by the weighted
    minimum mean square error (WMMSE) method. The solution obtained by the WMMSE
    method is proved to satisfy the Karush-Kuhn-Tucker (KKT) conditions of the WPM
    problem. Moreover, a low-complexity algorithm based on Newton’s method and the
    gradient descent method is developed to update the precoder matrices in each
    iteration of the WMMSE method. Simulation results demonstrate the rapid
    convergence of the proposed algorithms and the benefits of equipping multiple
    antennas at the user side. Moreover, the proposed algorithm is shown to achieve
    near-optimal performance in terms of NPC.

    Globally convergent Jacobi-type algorithms for simultaneous orthogonal symmetric tensor diagonalization

    Jianze Li, Konstantin Usevich, Pierre Comon
    Comments: 19 pages, 5 figures
    Subjects: Numerical Analysis (math.NA); Information Theory (cs.IT); Optimization and Control (math.OC)

    In this paper, we consider a family of Jacobi-type algorithms for
    the simultaneous orthogonal diagonalization problem of symmetric tensors. For the
    Jacobi-based algorithm of [SIAM J. Matrix Anal. Appl., 2(34):651–672, 2013],
    we prove its global convergence for simultaneous orthogonal diagonalization of
    symmetric matrices and 3rd-order tensors. We also propose a new Jacobi-based
    algorithm in the general setting and prove its global convergence for a wide
    range of tensor problems (including Tucker approximation).
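
    The matrix case of this family is the classical cyclic Jacobi iteration,
    sketched below for a single symmetric matrix; the paper's setting
    generalizes such plane rotations to several matrices and to higher-order
    tensors.

        import numpy as np

        def jacobi_diagonalize(A, sweeps=10):
            """Cyclic Jacobi: plane rotations zeroing off-diagonal entries."""
            A = A.copy(); n = A.shape[0]; Q = np.eye(n)
            for _ in range(sweeps):
                for p in range(n - 1):
                    for q in range(p + 1, n):
                        if abs(A[p, q]) < 1e-12:
                            continue
                        # Angle chosen so the rotated (p, q) entry vanishes.
                        th = 0.5 * np.arctan2(2 * A[p, q], A[q, q] - A[p, p])
                        c, s = np.cos(th), np.sin(th)
                        G = np.eye(n)
                        G[p, p] = G[q, q] = c; G[p, q] = s; G[q, p] = -s
                        A = G.T @ A @ G
                        Q = Q @ G
            return Q, A

        M = np.array([[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 1.0]])
        Q, D = jacobi_diagonalize(M)
        print(np.round(D, 8))                # numerically diagonal
        print(np.allclose(Q @ D @ Q.T, M))   # True: M = Q D Q^T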



