IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    arXiv Paper Daily: Wed, 5 Apr 2017

    我爱机器学习(52ml.net)发表于 2017-04-05 00:00:00
    love 0

    Neural and Evolutionary Computing

    A Probabilistic Linear Genetic Programming with Stochastic Context-Free Grammar for solving Symbolic Regression problems

    Léo Françoso Dal Piccol Sotto, Vinícius Veloso de Melo
    Comments: Genetic and Evolutionary Computation Conference (GECCO) 2017, Berlin, Germany
    Subjects: Neural and Evolutionary Computing (cs.NE); Probability (math.PR); Machine Learning (stat.ML)

    Traditional Linear Genetic Programming (LGP) algorithms are based only on the
    selection mechanism to guide the search. Genetic operators combine or mutate
    random portions of the individuals, without knowing if the result will lead to
    a fitter individual. Probabilistic Model Building Genetic Programming (PMB-GP)
    methods were proposed to overcome this issue through a probability model that
    captures the structure of the fit individuals and use it to sample new
    individuals. This work proposes the use of LGP with a Stochastic Context-Free
    Grammar (SCFG), that has a probability distribution that is updated according
    to selected individuals. We proposed a method for adapting the grammar into the
    linear representation of LGP. Tests performed with the proposed probabilistic
    method, and with two hybrid approaches, on several symbolic regression
    benchmark problems show that the results are statistically better than the
    obtained by the traditional LGP.

    A Genetic Programming Approach to Designing Convolutional Neural Network Architectures

    Masanori Suganuma, Shinichi Shirakawa, Tomoharu Nagao
    Comments: This paper has been accepted to GECCO 2017. This is the submitted version on February 6, 2017
    Subjects: Neural and Evolutionary Computing (cs.NE)

    The convolutional neural network (CNN), which is one of the deep learning
    models, has seen much success in a variety of computer vision tasks. However,
    designing CNN architectures still requires expert knowledge and a lot of trial
    and error. In this paper, we attempt to automatically construct CNN
    architectures for an image classification task based on Cartesian genetic
    programming (CGP). In our method, we adopt highly functional modules, such as
    convolutional blocks and tensor concatenation, as the node functions in CGP.
    The CNN structure and connectivity represented by the CGP encoding method are
    optimized to maximize the validation accuracy. To evaluate the proposed method,
    we constructed a CNN architecture for the image classification task with the
    CIFAR-10 dataset. The experimental result shows that the proposed method can be
    used to automatically find the competitive CNN architecture compared with
    state-of-the-art models.

    Using Echo State Networks for Cryptography

    Rajkumar Ramamurthy, Christian Bauckhage, Krisztian Buza, Stefan Wrobel
    Comments: 8 pages, ICANN 2017
    Subjects: Cryptography and Security (cs.CR); Neural and Evolutionary Computing (cs.NE)

    Echo state networks are simple recurrent neural networks that are easy to
    implement and train. Despite their simplicity, they show a form of memory and
    can predict or regenerate sequences of data. We make use of this property to
    realize a novel neural cryptography scheme. The key idea is to assume that
    Alice and Bob share a copy of an echo state network. If Alice trains her copy
    to memorize a message, she can communicate the trained part of the network to
    Bob who plugs it into his copy to regenerate the message. Considering a
    byte-level representation of in- and output, the technique applies to arbitrary
    types of data (texts, images, audio files, etc.) and practical experiments
    reveal it to satisfy the fundamental cryptographic properties of diffusion and
    confusion.


    Computer Vision and Pattern Recognition

    Deep Depth From Focus

    Caner Hazirbas, Laura Leal-Taixé, Daniel Cremers
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Depth from Focus (DFF) is one of the classical ill-posed inverse problems in
    computer vision. Most approaches recover the depth at each pixel based on the
    focal setting which exhibits maximal sharpness. Yet, it is not obvious how to
    reliably estimate the sharpness level, particularly in low-textured areas. In
    this paper, we propose ‘Deep Depth From Focus (DDFF)’ as the first end-to-end
    learning approach to this problem. Towards this goal, we create a novel
    real-scene indoor benchmark composed of 4D light-field images obtained from a
    plenoptic camera and ground truth depth obtained from a registered RGB-D
    sensor. Compared to existing benchmarks our dataset is 30 times larger,
    enabling the use of machine learning for this inverse problem. We compare our
    results with state-of-the-art DFF methods and we also analyze the effect of
    several key deep architectural components. These experiments show that DDFFNet
    achieves state-of-the-art performance in all scenes, reducing depth error by
    more than 70% wrt classic DFF methods.

    ME R-CNN: Multi-Expert Region-based CNN for Object Detection

    Hyungtae Lee, Sungmin Eum, Heesung Kwon
    Comments: Submit to ICCV
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recent CNN-based object detection methods have drastically improved their
    performances but still use a single classifier as opposed to “multiple experts”
    in categorizing objects. The main motivation of introducing multi-experts is
    twofold: i) to allow different experts to specialize in different fundamental
    object shape priors and ii) to better capture the appearance variations caused
    by different poses and viewing angles. The proposed approach, referred to as
    multi-expert Region-based CNN (ME R-CNN), consists of three experts each
    responsible for objects with particular shapes: horizontally elongated,
    square-like, and vertically elongated. Each expert is a network with multiple
    fully connected layers and all the experts are preceded by a shared network
    which consists of multiple convolutional layers.

    On top of using selective search which provides a compact, yet effective set
    of region of interests (RoIs) for object detection, we augmented the set by
    also employing the exhaustive search for training. Incorporating the exhaustive
    search can provide complementary advantages: i) it captures the multitude of
    neighboring RoIs missed by the selective search, and thus ii) provide
    significantly larger amount of training examples to achieve the enhanced
    accuracy.

    OctNetFusion: Learning Depth Fusion from Data

    Gernot Riegler, Ali Osman Ulusoy, Horst Bischof, Andreas Geiger
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we present a learning based approach to depth fusion, i.e.,
    dense 3D reconstruction from multiple depth images. The most common approach to
    depth fusion is based on averaging truncated signed distance functions, which
    was originally proposed by Curless and Levoy in 1996. While this method
    achieves great results, it can not reconstruct surfaces occluded in the input
    views and requires a large number frames to filter out sensor noise and
    outliers. Motivated by large 3D model databases and recent advances in deep
    learning, we present a novel 3D convolutional network architecture that learns
    to predict an implicit surface representation from the input depth maps. Our
    learning based fusion approach significantly outperforms the traditional
    volumetric fusion approach in terms of noise reduction and outlier suppression.
    By learning the structure of real world 3D objects and scenes, our approach is
    further able to reconstruct occluded regions and to fill gaps in the
    reconstruction. We evaluate our approach extensively on both synthetic and
    real-world datasets for volumetric fusion. Further, we apply our approach to
    the problem of 3D shape completion from a single view where our approach
    achieves state-of-the-art results.

    Optic Disc and Cup Segmentation Methods for Glaucoma Detection with Modification of U-Net Convolutional Neural Network

    Artem Sevastopolsky
    Comments: accepted for publication in “Pattern Recognition and Image Analysis: Advances in Mathematical Theory and Applications” journal, ISSN 1054-6618
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Glaucoma is the second leading cause of blindness all over the world, with
    approximately 60 million cases reported worldwide in 2010. If undiagnosed in
    time, glaucoma causes irreversible damage to the optic nerve leading to
    blindness. The optic nerve head examination, which involves measurement of
    cup-to-disc ratio, is considered one of the most valuable methods of structural
    diagnosis of the disease. Estimation of cup-to-disc ratio requires segmentation
    of optic disc and optic cup on eye fundus images and can be performed by modern
    computer vision algorithms. This work presents universal approach for automatic
    optic disc and cup segmentation, which is based on deep learning, namely,
    modification of U-Net convolutional neural network. Our experiments include
    comparison with the best known methods on publicly available databases
    DRIONS-DB, RIM-ONE v.3, DRISHTI-GS. For both optic disc and cup segmentation,
    our method achieves quality comparable to current state-of-the-art methods,
    outperforming them in terms of the prediction time.

    Simultaneous Feature Aggregating and Hashing for Large-scale Image Search

    Thanh-Toan Do, Dang-Khoa Le Tan, Trung T. Pham, Ngai-Man Cheung
    Comments: Accepted to CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In most state-of-the-art hashing-based visual search systems, local image
    descriptors of an image are first aggregated as a single feature vector. This
    feature vector is then subjected to a hashing function that produces a binary
    hash code. In previous work, the aggregating and the hashing processes are
    designed independently. In this paper, we propose a novel framework where
    feature aggregating and hashing are designed simultaneously and optimized
    jointly. Specifically, our joint optimization produces aggregated
    representations that can be better reconstructed by some binary codes. This
    leads to more discriminative binary hash codes and improved retrieval accuracy.
    In addition, we also propose a fast version of the recently-proposed Binary
    Autoencoder to be used in our proposed framework. We perform extensive
    retrieval experiments on several benchmark datasets with both SIFT and
    convolutional features. Our results suggest that the proposed framework
    achieves significant improvements over the state of the art.

    Guided Proofreading of Automatic Segmentations for Connectomics

    Daniel Haehn, Verena Kaynig, James Tompkin, Jeff W. Lichtman, Hanspeter Pfister
    Comments: Supplemental material available at this http URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Automatic cell image segmentation methods in connectomics produce merge and
    split errors, which require correction through proofreading. Previous research
    has identified the visual search for these errors as the bottleneck in
    interactive proofreading. To aid error correction, we develop two classifiers
    that automatically recommend candidate merges and splits to the user. These
    classifiers use a convolutional neural network (CNN) that has been trained with
    errors in automatic segmentations against expert-labeled ground truth. Our
    classifiers detect potentially-erroneous regions by considering a large context
    region around a segmentation boundary. Corrections can then be performed by a
    user with yes/no decisions, which reduces variation of information 7.5x faster
    than previous proofreading methods. We also present a fully-automatic mode that
    uses a probability threshold to make merge/split decisions. Extensive
    experiments using the automatic approach and comparing performance of novice
    and expert users demonstrate that our method performs favorably against
    state-of-the-art proofreading methods on different connectomics datasets.

    Cascaded Segmentation-Detection Networks for Word-Level Text Spotting

    Siyang Qin, Roberto Manduchi
    Comments: 7 pages, 8 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We introduce an algorithm for word-level text spotting that is able to
    accurately and reliably determine the bounding regions of individual words of
    text “in the wild”. Our system is formed by the cascade of two convolutional
    neural networks. The first network is fully convolutional and is in charge of
    detecting areas containing text. This results in a very reliable but possibly
    inaccurate segmentation of the input image. The second network (inspired by the
    popular YOLO architecture) analyzes each segment produced in the first stage,
    and predicts oriented rectangular regions containing individual words. No
    post-processing (e.g. text line grouping) is necessary. With execution time of
    450 ms for a 1000-by-560 image on a Titan X GPU, our system achieves the
    highest score to date among published algorithms on the ICDAR 2015 Incidental
    Scene Text dataset benchmark.

    AMC: Attention guided Multi-modal Correlation Learning for Image Search

    Kan Chen, Trung Bui, Fang Chen, Zhaowen Wang, Ram Nevatia
    Comments: CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Given a user’s query, traditional image search systems rank images according
    to its relevance to a single modality (e.g., image content or surrounding
    text). Nowadays, an increasing number of images on the Internet are available
    with associated meta data in rich modalities (e.g., titles, keywords, tags,
    etc.), which can be exploited for better similarity measure with queries. In
    this paper, we leverage visual and textual modalities for image search by
    learning their correlation with input query. According to the intent of query,
    attention mechanism can be introduced to adaptively balance the importance of
    different modalities. We propose a novel Attention guided Multi-modal
    Correlation (AMC) learning method which consists of a jointly learned hierarchy
    of intra and inter-attention networks. Conditioned on query’s intent,
    intra-attention networks (i.e., visual intra-attention network and language
    intra-attention network) attend on informative parts within each modality; a
    multi-modal inter-attention network promotes the importance of the most
    query-relevant modalities. In experiments, we evaluate AMC models on the search
    logs from two real world image search engines and show a significant boost on
    the ranking of user-clicked images in search results. Additionally, we extend
    AMC models to caption ranking task on COCO dataset and achieve competitive
    results compared with recent state-of-the-arts.

    Unsupervised Action Proposal Ranking through Proposal Recombination

    Waqas Sultani, Dong Zhang, Mubarak Shah
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recently, action proposal methods have played an important role in action
    recognition tasks, as they reduce the search space dramatically. Most
    unsupervised action proposal methods tend to generate hundreds of action
    proposals which include many noisy, inconsistent, and unranked action
    proposals, while supervised action proposal methods take advantage of
    predefined object detectors (e.g., human detector) to refine and score the
    action proposals, but they require thousands of manual annotations to train.

    Given the action proposals in a video, the goal of the proposed work is to
    generate a few better action proposals that are ranked properly. In our
    approach, we first divide action proposal into sub-proposal and then use
    Dynamic Programming based graph optimization scheme to select the optimal
    combinations of sub-proposals from different proposals and assign each new
    proposal a score. We propose a new unsupervised image-based actioness detector
    that leverages web images and employs it as one of the node scores in our graph
    formulation. Moreover, we capture motion information by estimating the number
    of motion contours within each action proposal patch. The proposed method is an
    unsupervised method that neither needs bounding box annotations nor video level
    labels, which is desirable with the current explosion of large-scale action
    datasets. Our approach is generic and does not depend on a specific action
    proposal method. We evaluate our approach on several publicly available trimmed
    and un-trimmed datasets and obtain better performance compared to several
    proposal ranking methods. In addition, we demonstrate that properly ranked
    proposals produce significantly better action detection as compared to
    state-of-the-art proposal based methods.

    sWSI: A Low-cost and Commercial-quality Whole Slide Imaging System on Android and iOS Smartphones

    Shuoxin Ma, Tan Wang
    Subjects: Biological Physics (physics.bio-ph); Computer Vision and Pattern Recognition (cs.CV)

    In this paper, scalable Whole Slide Imaging (sWSI), a novel high-throughput,
    cost-effective and robust whole slide imaging system on both Android and iOS
    platforms is introduced and analyzed. With sWSI, most mainstream smartphone
    connected to a optical eyepiece of any manually controlled microscope can be
    automatically controlled to capture sequences of mega-pixel fields of views
    that are synthesized into giga-pixel virtual slides. Remote servers carry out
    the majority of computation asynchronously to support clients running at
    satisfying frame rates without sacrificing image quality nor robustness. A
    typical 15x15mm sample can be digitized in 30 seconds with 4X or in 3 minutes
    with 10X object magnification, costing under (1. The virtual slide quality is
    considered comparable to existing high-end scanners thus satisfying for
    clinical usage by surveyed pathologies. The scan procedure with features such
    as supporting magnification up to 100x, recoding z-stacks,
    specimen-type-neutral and giving real-time feedback, is deemed
    work-flow-friendly and reliable.

    A Branch-and-Bound Algorithm for Checkerboard Extraction in Camera-Laser Calibration

    Alireza Khosravian, Tat-Jun Chin, Ian Reid
    Comments: To appear in IEEE Conference on Robotics and Automation 2017
    Journal-ref: Proceedings of IEEE Conference on Robotics and Automation, 2017
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

    We address the problem of camera-to-laser-scanner calibration using a
    checkerboard and multiple image-laser scan pairs. Distinguishing which laser
    points measure the checkerboard and which lie on the background is essential to
    any such system. We formulate the checkerboard extraction as a combinatorial
    optimization problem with a clear cut objective function. We propose a
    branch-and-bound technique that deterministically and globally optimizes the
    objective. Unlike what is available in the literature, the proposed method is
    not heuristic and does not require assumptions such as constraints on the
    background or relying on discontinuity of the range measurements to partition
    the data into line segments. The proposed approach is generic and can be
    applied to both 3D or 2D laser scanners as well as the cases where multiple
    checkerboards are present. We demonstrate the effectiveness of the proposed
    approach by providing numerical simulations as well as experimental results.

    Online deforestation detection

    Emiliano Diaz
    Subjects: Applications (stat.AP); Computer Vision and Pattern Recognition (cs.CV)

    Deforestation detection using satellite images can make an important
    contribution to forest management. Current approaches can be broadly divided
    into those that compare two images taken at similar periods of the year and
    those that monitor changes by using multiple images taken during the growing
    season. The CMFDA algorithm described in Zhu et al. (2012) is an algorithm that
    builds on the latter category by implementing a year-long, continuous,
    time-series based approach to monitoring images. This algorithm was developed
    for 30m resolution, 16-day frequency reflectance data from the Landsat
    satellite. In this work we adapt the algorithm to 1km, 16-day frequency
    reflectance data from the modis sensor aboard the Terra satellite. The CMFDA
    algorithm is composed of two submodels which are fitted on a pixel-by-pixel
    basis. The first estimates the amount of surface reflectance as a function of
    the day of the year. The second estimates the occurrence of a deforestation
    event by comparing the last few predicted and real reflectance values. For this
    comparison, the reflectance observations for six different bands are first
    combined into a forest index. Real and predicted values of the forest index are
    then compared and high absolute differences for consecutive observation dates
    are flagged as deforestation events. Our adapted algorithm also uses the two
    model framework. However, since the modis 13A2 dataset used, includes
    reflectance data for different spectral bands than those included in the
    Landsat dataset, we cannot construct the forest index. Instead we propose two
    contrasting approaches: a multivariate and an index approach similar to that of
    CMFDA.


    Artificial Intelligence

    Probabilistic Search for Structured Data via Probabilistic Programming and Nonparametric Bayes

    Feras Saad, Leonardo Casarsa, Vikash Mansinghka
    Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB); Learning (cs.LG); Machine Learning (stat.ML)

    Databases are widespread, yet extracting relevant data can be difficult.
    Without substantial domain knowledge, multivariate search queries often return
    sparse or uninformative results. This paper introduces an approach for
    searching structured data based on probabilistic programming and nonparametric
    Bayes. Users specify queries in a probabilistic language that combines standard
    SQL database search operators with an information theoretic ranking function
    called predictive relevance. Predictive relevance can be calculated by a fast
    sparse matrix algorithm based on posterior samples from CrossCat, a
    nonparametric Bayesian model for high-dimensional, heterogeneously-typed data
    tables. The result is a flexible search technique that applies to a broad class
    of information retrieval problems, which we integrate into BayesDB, a
    probabilistic programming platform for probabilistic data analysis. This paper
    demonstrates applications to databases of US colleges, global macroeconomic
    indicators of public health, and classic cars. We found that human evaluators
    often prefer the results from probabilistic search to results from a standard
    baseline.

    A simulated annealing approach to optimal storing in a multi-level warehouse

    Alexander Eckrot, Carina Geldhauser, Jan Jurczyk
    Subjects: Artificial Intelligence (cs.AI)

    We propose a simulated annealing algorithm specifically tailored to optimise
    total retrieval times in a multi-level warehouse under complex pre-batched
    picking constraints. Experiments on real data from a picker-to-parts order
    picking process in the warehouse of a European manufacturer show that optimal
    storage assignments do not necessarily display features presumed in heuristics,
    such as clustering of positively correlated items or ordering of items by
    picking frequency.

    In an experiment run on more than 4000 batched orders with 1 to 150 items per
    batch, the storage assignment suggested by the algorithm produces a 21\%
    reduction in the total retrieval time with respect to a frequency-based storage
    assignment.

    An Ontological Architecture for Orbital Debris Data

    Robert J. Rovetto
    Journal-ref: Earth Science Informatics (Aug 2015) 9:67, pp.67-82, Springer
    Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB)

    The orbital debris problem presents an opportunity for inter-agency and
    international cooperation toward the mutually beneficial goals of debris
    prevention, mitigation, remediation, and improved space situational awareness
    (SSA). Achieving these goals requires sharing orbital debris and other SSA
    data. Toward this, I present an ontological architecture for the orbital debris
    domain, taking steps in the creation of an orbital debris ontology (ODO). The
    purpose of this ontological system is to (I) represent general orbital debris
    and SSA domain knowledge, (II) structure, and standardize where needed, orbital
    data and terminology, and (III) foster semantic interoperability and
    data-sharing. In doing so I hope to (IV) contribute to solving the orbital
    debris problem, improving peaceful global SSA, and ensuring safe space travel
    for future generations.

    Ontology based Scene Creation for the Development of Automated Vehicles

    Gerrit Bagschik, Till Menzel, Markus Maurer
    Comments: Submitted for review to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017), 7 pages, 4 figures
    Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)

    The introduction of automated vehicles without permanent human supervision
    demands a functional system description including functional system boundaries
    and a comprehensive safety analysis. These inputs to the technical development
    can be identified and analyzed by a scenario-based approach. Experts are doing
    well to identify scenarios that are difficult to handle or unlikely to happen.
    To establish an economical test and release process, a large number of
    scenarios must be identified to enable an execution of test-cases in simulation
    environments. Expert knowledge modeled for computer aided processing may help
    for the purpose of providing a wide range of scenarios. This contribution
    reviews the use of ontologies as knowledge-based systems in the field of
    automated vehicles, and proposes a modeling concept for traffic scenes.
    Afterwards, a process to create scenes from the gathered knowledge is
    introduced and the utilization of ontologies in the given problem statement is
    evaluated using criteria from literature.

    Adaptive Motion Gaming AI for Health Promotion

    Pujana Paliyawan, Takahiro Kusano, Yuto Nakagawa, Tomohiro Harada, Ruck Thawonmas
    Comments: A revised version of our paper for 2017 AAAI Spring Symposium Series (Well-Being AI: From Machine Learning to Subjective Oriented Computing), San Francisco,USA, Mar. 27-29, 2017. Revised contents, due to our correction of (8), are highlighted in red. Many apologies, but the effectiveness of the proposed method/approach in the paper still holds
    Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

    This paper presents a design of a non-player character (AI) for promoting
    balancedness in use of body segments when engaging in full-body motion gaming.
    In our experiment, we settle a battle between the proposed AI and a player by
    using FightingICE, a fighting game platform for AI development. A middleware
    called UKI is used to allow the player to control the game by using body motion
    instead of the keyboard and mouse. During gameplay, the proposed AI analyze
    health states of the player; it determines its next action by predicting how
    each candidate action, recommended by a Monte-Carlo tree search algorithm, will
    induce the player to move, and how the player’s health tends to be affected.
    Our result demonstrates successful improvement in balancedness in use of body
    segments on 4 out of 5 subjects.

    A History of Metaheuristics

    Kenneth Sorensen, Marc Sevaux, Fred Glover
    Comments: 27 pages, to appear in: R. Marti, P. Pardalos, and M. Resende, eds., Handbook of Heuristics, Springer
    Subjects: Artificial Intelligence (cs.AI)

    This chapter describes the history of metaheuristics in five distinct
    periods, starting long before the first use of the term and ending a long time
    in the future.

    On the idea of a new artificial intelligence based optimization algorithm inspired from the nature of vortex

    Utku Kose, Ahmet Arslan
    Comments: 7 pages, 1 figure, 2 tables
    Journal-ref: Broad Research in Artificial Intelligence and Neuroscience,
    5(1-4), 2014, 60-66
    Subjects: Artificial Intelligence (cs.AI); Optimization and Control (math.OC)

    In this paper, the idea of a new artificial intelligence based optimization
    algorithm, which is inspired from the nature of vortex, has been provided
    briefly. As also a bio-inspired computation algorithm, the idea is generally
    focused on a typical vortex flow / behavior in nature and inspires from some
    dynamics that are occurred in the sense of vortex nature. Briefly, the
    algorithm is also a swarm-oriented evolutional problem solution approach;
    because it includes many methods related to elimination of weak swarm members
    and trying to improve the solution process by supporting the solution space via
    new swarm members. In order have better idea about success of the algorithm; it
    has been tested via some benchmark functions. At this point, the obtained
    results show that the algorithm can be an alternative to the literature in
    terms of single-objective optimization solution ways. Vortex Optimization
    Algorithm (VOA) is the name suggestion by the authors; for this new idea of
    intelligent optimization approach.

    Design and development of a software system for swarm intelligence based research studies

    Utku Kose
    Comments: 6 pages, 2 figures, 1 table
    Journal-ref: Broad Research in Artificial Intelligence and Neuroscience, 3(3),
    2012, 12-17
    Subjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

    This paper introduce a software system including widely-used Swarm
    Intelligence algorithms or approaches to be used for the related scientific
    research studies associated with the subject area. The programmatic
    infrastructure of the system allows working on a fast, easy-to-use, interactive
    platform to perform Swarm Intelligence based studies in a more effective,
    efficient and accurate way. In this sense, the system employs all of the
    necessary controls for the algorithms and it ensures an interactive platform on
    which computer users can perform studies on a wide spectrum of solution
    approaches associated with simple and also more advanced problems.

    Brief Notes on Hard Takeoff, Value Alignment, and Coherent Extrapolated Volition

    Gopal P. Sarma
    Comments: 3 pages
    Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Learning (cs.LG)

    I make some basic observations about hard takeoff, value alignment, and
    coherent extrapolated volition, concepts which have been central in analyses of
    superintelligent AI systems.

    Deriving Probability Density Functions from Probabilistic Functional Programs

    Sooraj Bhat, Johannes Borgström, Andrew D. Gordon, Claudio Russo
    Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI)

    The probability density function of a probability distribution is a
    fundamental concept in probability theory and a key ingredient in various
    widely used machine learning methods. However, the necessary framework for
    compiling probabilistic functional programs to density functions has only
    recently been developed. In this work, we present a density compiler for a
    probabilistic language with failure and both discrete and continuous
    distributions, and provide a proof of its soundness. The compiler greatly
    reduces the development effort of domain experts, which we demonstrate by
    solving inference problems from various scientific applications, such as
    modelling the global carbon cycle, using a standard Markov chain Monte Carlo
    framework.

    Multi-Advisor Reinforcement Learning

    Romain Laroche, Mehdi Fatemi, Joshua Romoff, Harm van Seijen
    Comments: Submitted at UAI2017
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    This article deals with a novel branch of Separation of Concerns, called
    Multi-Advisor Reinforcement Learning (MAd-RL), where a single-agent RL problem
    is distributed to )n( learners, called advisors. Each advisor tries to solve
    the problem with a different focus. Their advice is then communicated to an
    aggregator, which is in control of the system. For the local training, three
    off-policy bootstrapping methods are proposed and analysed: local-max
    bootstraps with the local greedy action, rand-policy bootstraps with respect to
    the random policy, and agg-policy bootstraps with respect to the aggregator’s
    greedy policy. MAd-RL is positioned as a generalisation of Reinforcement
    Learning with Ensemble methods. An experiment is held on a simplified version
    of the Ms. Pac-Man Atari game. The results confirm the theoretical relative
    strengths and weaknesses of each method.

    Reprogramming Matter, Life, and Purpose

    Hector Zenil
    Comments: Invited contribution to `Computing in the year 2065′, A. Adamatzky (Ed.), Springer Verlag
    Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

    Reprogramming matter may sound far-fetched, but we have been doing it with
    increasing power and staggering efficiency for at least 60 years, and for
    centuries we have been paving the way toward the ultimate reprogrammed fate of
    the universe, the vessel of all programs. How will we be doing it in 60 years’
    time and how will it impact life and the purpose both of machines and of
    humans?

    Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders

    Tiancheng Zhao, Ran Zhao, Maxine Eskenazi
    Comments: Accepted as a long paper in ACL 2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    While recent neural encoder-decoder models have shown great promise in
    modeling open-domain conversations, they often generate dull and generic
    responses. Unlike past work that has focused on diversifying the output of the
    decoder at word-level to alleviate this problem, we present a novel framework
    based on conditional variational autoencoders that captures the discourse-level
    diversity in the encoder. Our model uses latent variables to learn a
    distribution over potential conversational intents and generates diverse
    responses using only greedy decoders. We have further developed a novel variant
    that is integrated with linguistic prior knowledge for better performance.
    Finally, the training procedure is improved by introducing a bag-of-word loss.
    Our proposed models have been validated to generate significantly more diverse
    responses than baseline approaches and exhibit competence in discourse-level
    decision-making.


    Computation and Language

    Emotional Chatting Machine: Emotional Conversation Generation with Internal and External Memory

    Hao Zhou, Minlie Huang, Tianyang Zhang, Xiaoyan Zhu, Bing Liu
    Subjects: Computation and Language (cs.CL)

    Emotional intelligence is one of the key factors to the success of dialogue
    systems or conversational agents. In this paper, we propose Emotional Chatting
    Machine (ECM) which generates responses that are appropriate not only at the
    content level (relevant and grammatical) but also at the emotion level
    (consistent emotional expression). To the best of our knowledge, this is the
    first work that addresses the emotion factor in large-scale conversation
    generation. ECM addresses the factor in three ways: modeling high-level
    abstraction of emotion expression by embedding emotion categories, changing of
    implicit internal emotion states, and using explicit emotion expressions with
    an external emotion vocabulary. Experiments show that our model can generate
    responses appropriate not only in content but also in emotion.

    Fortia-FBK at SemEval-2017 Task 5: Bullish or Bearish? Inferring Sentiment towards Brands from Financial News Headlines

    Youness Mansar, Lorenzo Gatti, Sira Ferradans, Marco Guerini, Jacopo Staiano
    Comments: 6 pages, 1 figure; accepted for publication at the International Workshop on Semantic Evaluation (SemEval-2017) to be held in conjunction with ACL 2017
    Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)

    In this paper, we describe a methodology to infer Bullish or Bearish
    sentiment towards companies/brands. More specifically, our approach leverages
    affective lexica and word embeddings in combination with convolutional neural
    networks to infer the sentiment of financial news headlines towards a target
    company. Such architecture was used and evaluated in the context of the SemEval
    2017 challenge (task 5, subtask 2), in which it obtained the best performance.

    Japanese Sentiment Classification using a Tree-Structured Long Short-Term Memory with Attention

    Ryosuke Miyazaki, Mamoru Komachi
    Comments: 6 pages
    Subjects: Computation and Language (cs.CL)

    Previous approaches to training syntax-based sentiment classification models
    required phrase-level annotated corpora, which are not readily available in
    many languages other than English. Thus, we propose the use of tree-structured
    Long Short-Term Memory with an attention mechanism that pays attention to each
    subtree of the parse tree. Experimental results indicate that our model
    achieves the state-of-the-art performance in a Japanese sentiment
    classification task.

    Interpretation of Semantic Tweet Representations

    J Ganesh, Manish Gupta, Vasudeva Varma
    Comments: Under review at ASONAM 2017; Initial version presented at NIPS 2016 workshop can be found at arXiv:1611.04887
    Subjects: Computation and Language (cs.CL)

    Research in analysis of microblogging platforms is experiencing a renewed
    surge with a large number of works applying representation learning models for
    applications like sentiment analysis, semantic textual similarity computation,
    hashtag prediction, etc. Although the performance of the representation
    learning models has been better than the traditional baselines for such tasks,
    little is known about the elementary properties of a tweet encoded within these
    representations, or why particular representations work better for certain
    tasks. Our work presented here constitutes the first step in opening the
    black-box of vector embeddings for tweets. Traditional feature engineering
    methods for high-level applications have exploited various elementary
    properties of tweets. We believe that a tweet representation is effective for
    an application because it meticulously encodes the application-specific
    elementary properties of tweets. To understand the elementary properties
    encoded in a tweet representation, we evaluate the representations on the
    accuracy to which they can model each of those properties such as tweet length,
    presence of particular words, hashtags, mentions, capitalization, etc. Our
    systematic extensive study of nine supervised and four unsupervised tweet
    representations against most popular eight textual and five social elementary
    properties reveal that Bi-directional LSTMs (BLSTMs) and Skip-Thought Vectors
    (STV) best encode the textual and social properties of tweets respectively.
    FastText is the best model for low resource settings, providing very little
    degradation with reduction in embedding size. Finally, we draw interesting
    insights by correlating the model performance obtained for elementary property
    prediction tasks with the high-level downstream applications.

    Voice Conversion from Unaligned Corpora using Variational Autoencoding Wasserstein Generative Adversarial Networks

    Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang
    Comments: Submitted to INTERSPEECH 2017
    Subjects: Computation and Language (cs.CL)

    Building a voice conversion (VC) system from non-parallel speech corpora is
    challenging but highly valuable in real application scenarios. In most
    situations, the source and the target speakers do not repeat the same texts or
    they may even speak different languages. In this case, one possible, although
    indirect, solution is to build a generative model for speech. Generative models
    focus on explaining the observations with latent variables instead of learning
    a pairwise transformation function, thereby bypassing the requirement of speech
    frame alignment. In this paper, we propose a non-parallel VC framework with a
    Wasserstein generative adversarial network (W-GAN) that explicitly takes a
    VC-related objective into account. Experimental results corroborate the
    capability of our framework for building a VC system from unaligned data, and
    demonstrate improved conversion quality.

    Restricted Recurrent Neural Tensor Networks

    Alexandre Salle, Aline Villavicencio
    Subjects: Computation and Language (cs.CL)

    Increasing the capacity of recurrent neural networks (RNN) usually involves
    augmenting the size of the hidden layer, resulting in a significant increase of
    computational cost. An alternative is the recurrent neural tensor network
    (RNTN), which increases capacity by employing distinct hidden layer weights for
    each vocabulary word. The disadvantage of RNTNs is that memory usage scales
    linearly with vocabulary size, which can reach millions for word-level language
    models. In this paper, we introduce restricted recurrent neural tensor networks
    (r-RNTN) which reserve distinct hidden layer weights for frequent vocabulary
    words while sharing a single set of weights for infrequent words. Perplexity
    evaluations using the Penn Treebank corpus show that r-RNTNs improve language
    model performance over standard RNNs using only a small fraction of the
    parameters of unrestricted RNTNs.

    Online and Linear-Time Attention by Enforcing Monotonic Alignments

    Colin Raffel, Thang Luong, Peter J. Liu, Ron J. Weiss, Douglas Eck
    Comments: 19 pages, 5 figures (three full-page), including 4 pages of supplementary material
    Subjects: Learning (cs.LG); Computation and Language (cs.CL)

    Recurrent neural network models with an attention mechanism have proven to be
    extremely effective on a wide variety of sequence-to-sequence problems.
    However, the fact that soft attention mechanisms perform a pass over the entire
    input sequence when producing each element in the output sequence precludes
    their use in online settings and results in a quadratic time complexity. Based
    on the insight that the alignment between input and output sequence elements is
    monotonic in many problems of interest, we propose an end-to-end differentiable
    method for learning monotonic alignments which, at test time, enables computing
    attention online and in linear time. We validate our approach on sentence
    summarization, machine translation, and online speech recognition problems and
    achieve results competitive with existing sequence-to-sequence models.


    Distributed, Parallel, and Cluster Computing

    Converging High-Throughput and High-Performance Computing: A Case Study

    Alessio Angius, Danila Oleynik, Sergey Panitkin, Matteo Turilli, Kaushik De, Alexei Klimentov, Sarp H. Oral, Jack C. Wells, Shantenu Jha
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE)

    The computing systems used by LHC experiments has historically consisted of
    the federation of hundreds to thousands of distributed resources, ranging from
    small to mid-size resource. In spite of the impressive scale of the existing
    distributed computing solutions, the federation of small to mid-size resources
    will be insufficient to meet projected future demands. This paper is a case
    study of how the ATLAS experiment has embraced Titan — a DOE leadership
    facility in conjunction with traditional distributed high-throughput computing
    to reach sustained production scales of approximately 51M core-hours a years.
    The three main contributions of this paper are: (i) a critical evaluation of
    design and operational considerations to support the sustained, scalable and
    production usage of Titan; (ii) a preliminary characterization of a next
    generation executor for PanDA to support new workloads and advanced execution
    modes; and (iii) early lessons for how current and future experimental and
    observational systems can be integrated with production supercomputers and
    other platforms in a general and extensible manner.

    HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing

    Shixiang Wan, Quan Zou
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Multiple sequence alignment (MSA) plays a key role in biological sequence
    analyses, especially in phylogenetic tree construction. Extreme increase in
    next-generation sequencing results in shortage of efficient ultra-large
    biological sequence alignment approaches for coping with different sequence
    types. Distributed and parallel computing represents a crucial technique for
    accelerating ultra-large sequence analyses. Based on HAlign and Spark
    distributed computing system, we implement a highly cost-efficient and
    time-efficient HAlign-II tool to address ultra-large multiple biological
    sequence alignment and phylogenetic tree construction. After comparing with
    most available state-of-the-art methods, our experimental results indicate the
    following: 1) HAlign-II can efficiently carry out MSA and construct
    phylogenetic trees with ultra-large biological sequences; 2) HAlign-II shows
    extremely high memory efficiency and scales well with increases in computing
    resource; 3) HAlign-II provides a user-friendly web server based on our
    distributed computing infrastructure. HAlign-II with open-source codes and
    datasets was established at this http URL

    The Cloudlet Bazaar Dynamic Markets for the Small Cloud

    Ranjan Pal, Sung-Han Lin, Leana Golubchik
    Comments: Submitted to IEEE Transactions on Cloud Computing
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computer Science and Game Theory (cs.GT)

    Due to increasing concerns about privacy and data control as one of the
    factors, many small clouds (SCs) established by different providers are
    emerging in an attempt to meet demand locally. The flip-side of this approach
    is the problem of resource inelasticity faced by the SCs due to their
    relatively scarce resources (e.g., virtual machines), thereby leading to
    potential degradation of customer QoS and loss of revenue. A proposed solution
    to this problem recommends the sharing of resources between SCs to alleviate
    the resource inelasticity issues that might arise. However, effective borrowing
    of resources by an SC from its peers involves mutually satisfying the interests
    of the stakeholders in question, i.e., the SC customers, the SCs, and a
    regulatory agency (e.g., the federal government) overseeing the functioning of
    the SCs. In this paper, we model the ‘stakeholder satisfaction problem’ in SCs
    as a socially efficient dynamic market/ecosystem design task, where the market
    elements comprise the stakeholders, and the term ‘social efficiency’ implying
    the market reaching a utilitarian maximum social welfare state when in
    equilibrium. Our market design approach is based on Arrow and Hurwicz’s
    disequilibrium process and uses the gradient play technique in game theory to
    iteratively converge upon efficient and stable market equilibria. We illustrate
    the stability and sustainability of SC markets via both theory and experiments.

    Locally Self-Adjusting Skip Graphs

    Sikder Huq, Sukumar Ghosh
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

    We present a distributed self-adjusting algorithm for skip graphs that
    minimizes the average routing costs between arbitrary communication pairs by
    performing topological adaptation to the communication pattern. Our algorithm
    is fully decentralized, conforms to the )mathcal{CONGEST}( model (i.e. uses
    )O(log n)( bit messages), and requires )O(log n)( bits of memory for each
    node, where )n( is the total number of nodes. Upon each communication request,
    our algorithm first establishes communication by using the standard skip graph
    routing, and then locally and partially reconstructs the skip graph topology to
    perform topological adaptation. We propose a computational model for such
    algorithms, as well as a yardstick (working set property) to evaluate them. Our
    working set property can also be used to evaluate self-adjusting algorithms for
    other graph classes where multiple tree-like subgraphs overlap (e.g. hypercube
    networks). We derive a lower bound of the amortized routing cost for any
    algorithm that follows our model and serves an unknown sequence of
    communication requests. We show that the routing cost of our algorithm is at
    most a constant factor more than the amortized routing cost of any algorithm
    conforming to our computational model. We also show that the expected
    transformation cost for our algorithm is at most a logarithmic factor more than
    the amortized routing cost of any algorithm conforming to our computational
    model.

    The string of diamonds is tight for rumor spreading

    Omer Angel, Abbas Mehrabian, Yuval Peres
    Comments: 9 pages
    Subjects: Probability (math.PR); Distributed, Parallel, and Cluster Computing (cs.DC); Social and Information Networks (cs.SI)

    For a rumor spreading protocol, the spread time is defined as the first time
    that everyone learns the rumor. We compare the synchronous push&pull rumor
    spreading protocol with its asynchronous variant, and show that for any
    )n(-vertex graph and any starting vertex, the ratio between their expected
    spread times is bounded by )O left({n}{log n}
    ight)^{1/3}(. This improves
    the )O(sqrt n)( upper bound of Giakkoupis, Nazari, and Woelfel (in Proceedings
    of ACM Symposium on Principles of Distributed Computing, 2016). Our bound is
    tight up to a factor of )O(log^{2/3}n)(, as illustrated by the string of
    diamonds graph.


    Learning

    Homotopy Parametric Simplex Method for Sparse Learning

    Haotian Pang, Tuo Zhao, Robert Vanderbei, Han Liu
    Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

    High dimensional sparse learning has imposed a great computational challenge
    to large scale data analysis. In this paper, we are interested in a broad class
    of sparse learning approaches formulated as linear programs parametrized by a
    {em regularization factor}, and solve them by the parametric simplex method
    (PSM). Our parametric simplex method offers significant advantages over other
    competing methods: (1) PSM naturally obtains the complete solution path for all
    values of the regularization parameter; (2) PSM provides a high precision dual
    certificate stopping criterion; (3) PSM yields sparse solutions through very
    few iterations, and the solution sparsity significantly reduces the
    computational cost per iteration. Particularly, we demonstrate the superiority
    of PSM over various sparse learning approaches, including Dantzig selector for
    sparse linear regression, LAD-Lasso for sparse robust linear regression, CLIME
    for sparse precision matrix estimation, sparse differential network estimation,
    and sparse Linear Programming Discriminant (LPD) analysis. We then provide
    sufficient conditions under which PSM always outputs sparse solutions such that
    its computational performance can be significantly boosted. Thorough numerical
    experiments are provided to demonstrate the outstanding performance of the PSM
    method.

    Polynomial Time and Sample Complexity for Non-Gaussian Component Analysis: Spectral Methods

    Yan Shuo Tan, Roman Vershynin
    Subjects: Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)

    The problem of Non-Gaussian Component Analysis (NGCA) is about finding a
    maximal low-dimensional subspace )E( in )mathbb{R}^n( so that data points
    projected onto )E( follow a non-gaussian distribution. Although this is an
    appropriate model for some real world data analysis problems, there has been
    little progress on this problem over the last decade.

    In this paper, we attempt to address this state of affairs in two ways.
    First, we give a new characterization of standard gaussian distributions in
    high-dimensions, which lead to effective tests for non-gaussianness. Second, we
    propose a simple algorithm, emph{Reweighted PCA}, as a method for solving the
    NGCA problem. We prove that for a general unknown non-gaussian distribution,
    this algorithm recovers at least one direction in )E(, with sample and time
    complexity depending polynomially on the dimension of the ambient space. We
    conjecture that the algorithm actually recovers the entire )E(.

    Online and Linear-Time Attention by Enforcing Monotonic Alignments

    Colin Raffel, Thang Luong, Peter J. Liu, Ron J. Weiss, Douglas Eck
    Comments: 19 pages, 5 figures (three full-page), including 4 pages of supplementary material
    Subjects: Learning (cs.LG); Computation and Language (cs.CL)

    Recurrent neural network models with an attention mechanism have proven to be
    extremely effective on a wide variety of sequence-to-sequence problems.
    However, the fact that soft attention mechanisms perform a pass over the entire
    input sequence when producing each element in the output sequence precludes
    their use in online settings and results in a quadratic time complexity. Based
    on the insight that the alignment between input and output sequence elements is
    monotonic in many problems of interest, we propose an end-to-end differentiable
    method for learning monotonic alignments which, at test time, enables computing
    attention online and in linear time. We validate our approach on sentence
    summarization, machine translation, and online speech recognition problems and
    achieve results competitive with existing sequence-to-sequence models.

    Multi-Advisor Reinforcement Learning

    Romain Laroche, Mehdi Fatemi, Joshua Romoff, Harm van Seijen
    Comments: Submitted at UAI2017
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    This article deals with a novel branch of Separation of Concerns, called
    Multi-Advisor Reinforcement Learning (MAd-RL), where a single-agent RL problem
    is distributed to )n( learners, called advisors. Each advisor tries to solve
    the problem with a different focus. Their advice is then communicated to an
    aggregator, which is in control of the system. For the local training, three
    off-policy bootstrapping methods are proposed and analysed: local-max
    bootstraps with the local greedy action, rand-policy bootstraps with respect to
    the random policy, and agg-policy bootstraps with respect to the aggregator’s
    greedy policy. MAd-RL is positioned as a generalisation of Reinforcement
    Learning with Ensemble methods. An experiment is held on a simplified version
    of the Ms. Pac-Man Atari game. The results confirm the theoretical relative
    strengths and weaknesses of each method.

    Probabilistic Search for Structured Data via Probabilistic Programming and Nonparametric Bayes

    Feras Saad, Leonardo Casarsa, Vikash Mansinghka
    Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB); Learning (cs.LG); Machine Learning (stat.ML)

    Databases are widespread, yet extracting relevant data can be difficult.
    Without substantial domain knowledge, multivariate search queries often return
    sparse or uninformative results. This paper introduces an approach for
    searching structured data based on probabilistic programming and nonparametric
    Bayes. Users specify queries in a probabilistic language that combines standard
    SQL database search operators with an information theoretic ranking function
    called predictive relevance. Predictive relevance can be calculated by a fast
    sparse matrix algorithm based on posterior samples from CrossCat, a
    nonparametric Bayesian model for high-dimensional, heterogeneously-typed data
    tables. The result is a flexible search technique that applies to a broad class
    of information retrieval problems, which we integrate into BayesDB, a
    probabilistic programming platform for probabilistic data analysis. This paper
    demonstrates applications to databases of US colleges, global macroeconomic
    indicators of public health, and classic cars. We found that human evaluators
    often prefer the results from probabilistic search to results from a standard
    baseline.

    Time Series Cluster Kernel for Learning Similarities between Multivariate Time Series with Missing Data

    Karl Øyvind Mikalsen, Filippo Maria Bianchi, Cristina Soguero-Ruiz, Robert Jenssen
    Comments: 22 pages, 6 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Similarity-based approaches represent a promising direction for time series
    analysis. However, many such methods rely on parameter tuning and have
    shortcomings if the time series are multivariate (MTS) and contain missing
    data. In this paper, we address these challenges within the powerful context of
    kernel methods by proposing the robust emph{time series cluster kernel} (TCK).
    The approach taken is to leverage the missing data handling properties of
    Gaussian mixture models (GMM) augmented with informative prior distributions.
    An ensemble learning approach is exploited to ensure robustness to parameters
    by combining the clustering results of many GMM to form the final kernel.

    We evaluate the TCK on synthetic and real data and compare to other
    state-of-the-art techniques. The experimental results demonstrate that the TCK
    is robust to parameter choices, provides competitive results for MTS without
    missing data and outstanding results for missing data.

    Brief Notes on Hard Takeoff, Value Alignment, and Coherent Extrapolated Volition

    Gopal P. Sarma
    Comments: 3 pages
    Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Learning (cs.LG)

    I make some basic observations about hard takeoff, value alignment, and
    coherent extrapolated volition, concepts which have been central in analyses of
    superintelligent AI systems.

    A comparative study of counterfactual estimators

    Thomas Nedelec, Nicolas Le Roux, Vianney Perchet
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We provide a comparative study of several widely used off-policy estimators
    (Empirical Average, Basic Importance Sampling and Normalized Importance
    Sampling), detailing the different regimes where they are individually
    suboptimal. We then exhibit properties optimal estimators should possess. In
    the case where examples have been gathered using multiple policies, we show
    that fused estimators dominate basic ones but can still be improved.

    Geometric Insights into Support Vector Machine Behavior using the KKT Conditions

    Iain Carmichael, J.S. Marron
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    The Support Vector Machine (SVM) is a powerful and widely used classification
    algorithm. Its performance is well known to be impacted by a tuning parameter
    which is frequently selected by cross-validation. This paper uses the
    Karush-Kuhn-Tucker conditions to provide rigorous mathematical proof for new
    insights into the behavior of SVM in the large and small tuning parameter
    regimes. These insights provide perhaps unexpected relationships between SVM
    and naive Bayes and maximal data piling directions. We explore how
    characteristics of the training data affect the behavior of SVM in many cases
    including: balanced vs. unbalanced classes, low vs. high dimension, separable
    vs. non-separable data. These results present a simple explanation of SVM’s
    behavior as a function of the tuning parameter. We also elaborate on the
    geometry of complete data piling directions in high dimensional space. The
    results proved in this paper suggest important implications for tuning SVM with
    cross-validation.


    Information Theory

    Uplink Performance Analysis in D2D-Enabled mmWave Cellular Networks

    Esma Turgut, M. Cenk Gursoy
    Comments: arXiv admin note: text overlap with arXiv:1510.05961, arXiv:1608.01790
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    In this paper, we provide an analytical framework to analyze the uplink
    performance of device-to-device (D2D)-enabled millimeter wave (mmWave) cellular
    networks. Signal-to- interference-plus-noise ratio (SINR) outage probabilities
    are derived for both cellular and D2D links using tools from stochastic
    geometry. The distinguishing features of mmWave communications such as
    directional beamforming and having different path loss laws for line-of-sight
    (LOS) and non-line-of-sight (NLOS) links are incorporated into the outage
    analysis by employing a flexible mode selection scheme and Nakagami fading.
    Also, the effect of beamforming alignment errors on the outage probability is
    investigated to get insight on the performance in practical scenarios.

    A New Transmitted Reference Pulse Cluster Based Ultra-Wideband Transmitter Design

    Yiming Huo, Xiaodai Dong, Ping Lu
    Comments: 4 page, 8 figures
    Subjects: Information Theory (cs.IT)

    An energy efficient ultra-wideband (UWB) transmitter based on the novel
    transmitted reference pulse cluster (TRPC) modulation scheme is presented. The
    TRPC-UWB transmitter integrates, namely, wideband active baluns, wideband I-Q
    modulator based up-conversion mixer, and differential to single-ended
    converter. The integrated circuits of TRPC-UWB front end is designed and
    implemented in the 130-nm CMOS process technology. the measured worst-case
    carrier leakage suppression is 22.4 dBc, while the single sideband suppression
    is higher than 31.6 dBc, operating at the frequency from 3.1 GHz to 8.2 GHz.
    With adjustable data rate of 10 to 300 Mbps, the transmitter achieves a high
    energy efficiency of 38.4 pJ/pulse.

    Blind Signal Detection in Massive MIMO: Exploiting the Channel Sparsity

    Jianwen Zhang, Xiaojun Yuan, Ying Jun (Angela)
    Zhang
    Comments: 30 pages, 7 figures, submitted to IEEE Trans. Commun
    Subjects: Information Theory (cs.IT)

    This paper considers a massive MIMO system, where K single-antenna transmit
    terminals communicate with an N-antenna receiver. By massive MIMO, we assume
    N>> K >> 1. We propose a novel blind detection scheme by exploiting the channel
    sparsity inherent to the angular domain of the receive antenna array. We show
    that the overhead for channel acquisition can be largely compensated by the
    potential gain due to the channel sparsity. To this end, we propose a novel
    blind detection scheme that simultaneously estimates the channel and data by
    factorizing the received signal matrix. We show that by exploiting the channel
    sparsity, our proposed scheme can achieve a DoF arbitrarily close to K(1-1/T)
    with T being the channel coherence time, provided that N is sufficiently large
    and the channel is sufficiently sparse. This achievable DoF has a fractional
    gap of only 1/T from the ideal DoF of K, which is a remarkable advance for
    understanding the performance limit of the massive MIMO system. We further show
    that the performance advantage of our proposed scheme in the asymptotic SNR
    regime carries over to the practical SNR regime. In specific, we present an
    efficient message-passing algorithm to jointly estimate the channel and detect
    the data via matrix factorization. Numerical results demonstrate that our
    proposed scheme significantly outperforms its counterpart schemes in the
    practical SNR regime under various system configurations.

    MIMO Underwater Visible Light Communications: Comprehensive Channel Study, Performance Analysis, and Multiple-Symbol Detection

    Mohammad Vahid Jamali, Pooya Nabavi, Jawad A. Salehi
    Subjects: Information Theory (cs.IT)

    In this paper, we analytically study the bit error rate (BER) performance of
    underwater visible light communication (UVLC) systems with binary pulse
    position modulation (BPPM). We simulate the channel fading-free impulse
    response (FFIR) based on Monte Carlo numerical method to take into account the
    absorption and scattering effects. Additionally, to characterize turbulence
    effects, we multiply the aforementioned FFIR by a fading coefficient which for
    weak oceanic turbulence can be modeled as a lognormal random variable (RV).
    Moreover, to mitigate turbulence effects, we employ multiple transmitters
    and/or receivers, i.e., spatial diversity technique over UVLC links.
    Closed-form expressions for the system BER are provided, when equal gain
    combiner (EGC) is employed at the receiver side, thanks to Gauss-Hermite
    quadrature formula and approximation to the sum of lognormal RVs. We further
    apply saddle-point approximation, an accurate photon-counting-based method, to
    evaluate the system BER in the presence of shot noise. Both laser-based
    collimated and light emitting diode (LED)-based diffusive links are
    investigated. Since multiple-scattering effect of UVLC channels on the
    propagating photons causes considerable inter-symbol interference (ISI),
    especially for diffusive channels, we also obtain the optimum multiple-symbol
    detection (MSD) algorithm to significantly alleviate ISI effects and improve
    the system performance. Our numerical analysis indicates good matches between
    the analytical and photon-counting results implying the negligibility of
    signal-dependent shot noise, and also between analytical results and numerical
    simulations confirming the accuracy of our derived closed-form expressions for
    the system BER. Besides, our results show that spatial diversity significantly
    mitigates fading impairments while MSD considerably alleviates ISI
    deteriorations.

    Principal Inertia Components and Applications

    Flavio P. Calmon, Ali Makhdoumi, Muriel Médard, Mayank Varia, Mark Christiansen, Ken R. Duffy
    Comments: Overlaps with arXiv:1405.1472 and arXiv:1310.1512
    Subjects: Information Theory (cs.IT)

    We explore properties and applications of the Principal Inertia Components
    (PICs) between two discrete random variables )X( and )Y(. The PICs lie in the
    intersection of information and estimation theory, and provide a fine-grained
    decomposition of the dependence between )X( and )Y(. Moreover, the PICs
    describe which functions of )X( can or cannot be reliably inferred (in terms of
    MMSE) given an observation of )Y(. We demonstrate that the PICs play an
    important role in information theory, and they can be used to characterize
    information-theoretic limits of certain estimation problems. In privacy
    settings, we prove that the PICs are related to fundamental limits of perfect
    privacy.

    Synchronization Strings: Codes for Insertions and Deletions Approaching the Singleton Bound

    Bernhard Haeupler, Amirbehshad Shahrasbi
    Subjects: Information Theory (cs.IT); Data Structures and Algorithms (cs.DS)

    We introduce synchronization strings as a novel way of efficiently dealing
    with synchronization errors, i.e., insertions and deletions. Synchronization
    errors are strictly more general and much harder to deal with than commonly
    considered half-errors, i.e., symbol corruptions and erasures. For every
    )epsilon >0(, synchronization strings allow to index a sequence with an
    )epsilon^{-O(1)}( size alphabet such that one can efficiently transform )k(
    synchronization errors into )(1+epsilon)k( half-errors. This powerful new
    technique has many applications. In this paper, we focus on designing insdel
    codes, i.e., error correcting block codes (ECCs) for insertion deletion
    channels.

    While ECCs for both half-errors and synchronization errors have been
    intensely studied, the later has largely resisted progress. Indeed, it took
    until 1999 for the first insdel codes with constant rate, constant distance,
    and constant alphabet size to be constructed by Schulman and Zuckerman. Insdel
    codes for asymptotically large or small noise rates were given in 2016 by
    Guruswami et al. but these codes are still polynomially far from the optimal
    rate-distance tradeoff. This makes the understanding of insdel codes up to this
    work equivalent to what was known for regular ECCs after Forney introduced
    concatenated codes in his doctoral thesis 50 years ago.

    A direct application of our synchronization strings based indexing method
    gives a simple black-box construction which transforms any ECC into an equally
    efficient insdel code with a slightly larger alphabet size. This instantly
    transfers much of the highly developed understanding for regular ECCs over
    large constant alphabets into the realm of insdel codes. Most notably, we
    obtain efficient insdel codes which get arbitrarily close to the optimal
    rate-distance tradeoff given by the Singleton bound for the complete noise
    spectrum.

    Sequential Active Detection of Anomalies in Heterogeneous Processes

    Boshuang Huang, Kobi Cohen, Qing Zhao
    Comments: This work has been submitted to the 56th IEEE Conference on Decision and Control. (CDC 2017). arXiv admin note: text overlap with arXiv:1403.1023
    Subjects: Information Theory (cs.IT)

    An active inference problem of detecting an anomalous process among )M(
    heterogeneous processes is considered. At each time, a subset of processes can
    be probed. The objective is a sequential probing strategy that dynamically
    determines which processes to observe at each time and when to terminate the
    search so that the expected detection time is minimized under a constraint on
    the probability of misclassifying any process. This problem falls into the
    general setting of sequential design of experiments pioneered by Chernoff in
    1959, in which a randomized strategy, referred to as the Chernoff test, was
    proposed and shown to be asymptotically optimal. For the problem considered in
    this paper, a low-complexity deterministic test is shown to enjoy the same
    asymptotic optimality while offering significantly better performance in the
    finite regime, especially when the number of processes is large. Extensions to
    detecting multiple anomalous processes are also discussed.

    Network-ensemble comparisons with stochastic rewiring and von Neumann entropy

    Zichao Li, Peter J. Mucha, Dane Taylor
    Comments: 22 pages, 5 figures
    Subjects: Physics and Society (physics.soc-ph); Disordered Systems and Neural Networks (cond-mat.dis-nn); Information Theory (cs.IT); Probability (math.PR); Methodology (stat.ME)

    Assessing whether a given network is typical or atypical for a random-network
    ensemble (i.e., network-ensemble comparison) has widespread applications
    ranging from null-model selection and hypothesis testing to clustering and
    classifying networks. We develop a framework for network-ensemble comparison by
    subjecting the network to stochastic rewiring. We study two rewiring processes,
    uniform and degree-preserved rewiring, which yield random-network ensembles
    that converge to the Erdos-Renyi and configuration-model ensembles,
    respectively. We study convergence through von Neumann entropy (VNE), a network
    summary statistic measuring information content based on the spectra of a
    Laplacian matrix, and develop a perturbation analysis for the expected effect
    of rewiring on VNE. Our analysis yields an estimate for how many rewires are
    required for a given network to resemble a typical network from an ensemble,
    offering a computationally efficient quantity for network-ensemble comparison
    that does not require simulation of the corresponding rewiring process.




沪ICP备19023445号-2号
友情链接