
    arXiv Paper Daily: Fri, 23 Dec 2016

    Published by 我爱机器学习 (52ml.net) on 2016-12-23 00:00:00

    Neural and Evolutionary Computing

    Highway and Residual Networks learn Unrolled Iterative Estimation

    Klaus Greff, Rupesh K. Srivastava, Jürgen Schmidhuber
    Comments: 10 + 3 pages, under review for ICLR 2017
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)

    The past year saw the introduction of new architectures such as Highway
    networks and Residual networks which, for the first time, enabled the training
    of feedforward networks with dozens to hundreds of layers using simple gradient
    descent. While depth of representation has been posited as a primary reason for
    their success, there are indications that these architectures defy a popular
    view of deep learning as a hierarchical computation of increasingly abstract
    features at each layer.

    In this report, we argue that this view is incomplete and does not adequately
    explain several recent findings. We propose an alternative viewpoint based on
    unrolled iterative estimation—a group of successive layers iteratively refine
    their estimates of the same features instead of computing an entirely new
    representation. We demonstrate that this viewpoint directly leads to the
    construction of Highway and Residual networks. Finally we provide preliminary
    experiments to discuss the similarities and differences between the two
    architectures.
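
    As a concrete reading of this viewpoint, consider the following minimal numpy
    sketch (toy dimensions, weights and nonlinearity are our own illustrative
    choices, not the paper's): a stack of residual layers keeps adding small
    corrections to one shared feature estimate instead of computing a new
    representation at each layer.

        import numpy as np

        rng = np.random.RandomState(0)
        W = 0.1 * rng.randn(8, 8)          # small weights keep corrections incremental

        def residual_branch(x):
            # one layer's estimate of the correction to the current features
            return 0.1 * np.tanh(W @ x)

        x = rng.randn(8)                   # initial estimate of the features
        for _ in range(10):                # a group of successive layers...
            x = x + residual_branch(x)     # ...iteratively refines the same estimate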

    Difficulty Adjustable and Scalable Constrained Multi-objective Test Problem Toolkit

    Zhun Fan, Wenji Li, Xinye Cai, Hui Li, Kaiwen Hu, Qingfu Zhang, Kalyanmoy Deb, Erik D. Goodman
    Comments: 15 pages, 8 figures, 7 tables
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

    In order to better understand the advantages and disadvantages of a
    constrained multi-objective evolutionary algorithm (CMOEA), it is important to
    understand the nature of difficulty of a constrained multi-objective
    optimization problem (CMOP) that the CMOEA is going to deal with. In this
    paper, we first propose three primary types of difficulty to characterize the
    constraints in CMOPs, including feasibility-hardness, convergence-hardness and
    diversity-hardness. We then develop a general toolkit to construct difficulty
    adjustable CMOPs with three types of parameterized constraint functions
    according to the proposed three primary types of difficulty. In fact,
    combining the three primary constraint functions with different parameters
    can yield a large variety of CMOPs and CMaOPs, whose difficulty is uniquely
    defined by a triplet, with each of its parameters specifying the level of one
    primary difficulty type. Based on this toolkit, we suggest fifteen difficulty
    adjustable CMOPs named DAC-MOP1-15 with different types and levels of
    difficulty. To study the effectiveness of DAC-MOP1-15, two popular CMOEAs,
    MOEA/D-CDP and NSGA-II-CDP, are adopted to test their performance on them.
    Furthermore, this toolkit also has the ability to scale the number of
    objectives. Nine difficulty adjustable constrained many-objective
    optimization problems (DAC-MaOPs) named DAC-MaOP1-9, scalable in the number
    of objectives, are also proposed using this toolkit. Two constrained
    many-objective evolutionary algorithms (CMaOEAs), CNSGA-III and CMOEA/DD, are
    applied to test their performance on three-, five-, seven- and ten-objective
    DAC-MaOP1-9 with different difficulty levels and types.


    Computer Vision and Pattern Recognition

    Online Semantic Activity Forecasting with DARKO

    Nicholas Rhinehart, Kris M. Kitani
    Comments: this http URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We address the problem of continuously observing and forecasting long-term
    semantic activities of a first-person camera wearer: what the person will do,
    where they will go, and what goal they are seeking. In contrast to prior work
    in trajectory forecasting and short-term activity forecasting, our algorithm,
    DARKO, reasons about the future position, future semantic state, and future
    high-level goals of the camera-wearer at arbitrary spatial and temporal
    horizons defined only by the wearer’s behaviors. DARKO learns and forecasts
    online from first-person observations of the user’s daily behaviors. We derive
    novel mathematical results that enable efficient forecasting of different
    semantic quantities of interest. We apply our method to a dataset of 5
    large-scale environments with 3 different environment types, collected from 3
    different users, and experimentally validate DARKO on forecasting tasks.

    Adversarial Examples Detection in Deep Networks with Convolutional Filter Statistics

    Xin Li, Fuxin Li
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep learning has greatly improved visual recognition in recent years.
    However, recent research has shown that there exist many adversarial examples
    that can negatively impact the performance of such an architecture. This paper
    focuses on detecting those adversarial examples by analyzing whether they come
    from the same distribution as the normal examples. Instead of directly training
    a deep neural network to detect adversarials, a much simpler approach is
    proposed based on statistics on outputs from convolutional layers. A cascade
    classifier is designed to efficiently detect adversarials. Furthermore, trained
    from one particular adversarial generating mechanism, the resulting classifier
    can successfully detect adversarials from a completely different mechanism as
    well. After detecting adversarial examples, we show that many of them can be
    recovered by simply performing a small average filter on the image. Those
    findings should prompt us to think more about the classification mechanisms in
    deep convolutional neural networks.
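
    The recovery step mentioned at the end is simple to sketch. The snippet below
    assumes a grayscale image scaled to [0, 1] and a 3x3 window; the abstract does
    not fix the filter size, so that choice is ours.

        import numpy as np
        from scipy.ndimage import uniform_filter

        # Stand-in for an adversarially perturbed image; in practice this would
        # come from an attack applied to a real input.
        rng = np.random.RandomState(0)
        adversarial = rng.rand(224, 224).astype(np.float32)

        # A small average (mean) filter often washes out the perturbation,
        # after which the network can classify the image correctly again.
        recovered = uniform_filter(adversarial, size=3)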

    Internet-Based Image Retrieval Using End-to-End Trained Deep Distributions

    A. Vakhitov, A. Kuzmin, V. Lempitsky
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Internet image search engines have long been considered as a promising tool
    for handling open-vocabulary textual user queries to unannotated image
    datasets. However, systems that use this tool have to deal with multi-modal and
    noisy image sets returned by search engines, especially for polysemous queries.
    Generally, for many queries, only a small part of the returned sets can be
    relevant to the user intent.

    In this work, we suggest an approach that explicitly accounts for the complex
    and noisy structure of the image sets returned by Internet image search
    engines. Similarly to a considerable number of previous image retrieval works,
    we train a deep convolutional network that maps images to high-dimensional
    descriptors. To model image sets obtained from the Internet, our approach then
    fits a simple probabilistic model that accounts for multi-modality and noise
    (e.g. a Gaussian mixture model) to the deep descriptors of the images in this
    set. Finally, the resulting distribution model can be used to search in the
    unannotated image dataset by evaluating likelihoods of individual images.

    As our main contribution, we develop an end-to-end training procedure that
    tunes the parameters of a deep network using an annotated training set, while
    accounting for the distribution fitting and the subsequent matching. In the
    experiments, we show that such an end-to-end approach boosts the accuracy of
    the Internet-based image retrieval for hold-out concepts, as compared to
    retrieval systems that fit similar distribution models to pre-trained features
    and to simpler end-to-end trained baselines.
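
    A minimal sketch of the distribution-fitting and scoring steps, with
    scikit-learn's GaussianMixture standing in for the paper's end-to-end trained
    model and random arrays standing in for real deep descriptors:

        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.RandomState(0)
        search_descriptors = rng.randn(200, 64)    # images returned by the engine
        dataset_descriptors = rng.randn(5000, 64)  # the unannotated target dataset

        # A small mixture models the multi-modal, noisy search results.
        gmm = GaussianMixture(n_components=3, covariance_type='diag',
                              random_state=0).fit(search_descriptors)

        # Rank the unannotated images by likelihood under the fitted model.
        scores = gmm.score_samples(dataset_descriptors)
        ranking = np.argsort(-scores)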

    MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving

    Marvin Teichmann, Michael Weber, Marius Zoellner, Roberto Cipolla, Raquel Urtasun
    Comments: 9 pages, 7 tables and 9 figures; first place on Kitti Road Segmentation; Code on GitHub (coming soon)
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

    While most approaches to semantic reasoning have focused on improving
    performance, in this paper we argue that computational time is very important
    for enabling real-time applications such as autonomous driving. Towards this
    goal, we present an approach to joint classification, detection and semantic
    segmentation via a unified architecture where the encoder is shared amongst
    the three tasks. Our approach is very simple, can be trained end-to-end and
    performs extremely well on the challenging KITTI dataset, outperforming the
    state-of-the-art in the road segmentation task. Our approach is also very
    efficient, taking less than 100 ms to perform all tasks.
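
    The shared-encoder structure is easy to sketch. The PyTorch module below is a
    toy analogue only; layer sizes and heads are illustrative and not the MultiNet
    architecture.

        import torch
        import torch.nn as nn

        class SharedEncoderNet(nn.Module):
            """One encoder feeding three task heads, as in the unified design."""
            def __init__(self):
                super().__init__()
                self.encoder = nn.Sequential(          # shared amongst all tasks
                    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
                self.classify = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                              nn.Linear(32, 10))  # image class
                self.detect = nn.Conv2d(32, 6, 1)      # coarse per-cell boxes
                self.segment = nn.Conv2d(32, 2, 1)     # per-pixel road / not road

            def forward(self, x):
                feats = self.encoder(x)                # computed once, used thrice
                return self.classify(feats), self.detect(feats), self.segment(feats)

        logits, boxes, seg = SharedEncoderNet()(torch.randn(1, 3, 64, 64))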

    Hardware for Machine Learning: Challenges and Opportunities

    Vivienne Sze, Yu-Hsin Chen, Joel Emer, Amr Suleiman, Zhengdong Zhang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Machine learning plays a critical role in extracting meaningful information
    out of the zettabytes of sensor data collected every day. For some applications,
    the goal is to analyze and understand the data to identify trends (e.g.,
    surveillance, portable/wearable electronics); in other applications, the goal
    is to take immediate action based on the data (e.g., robotics/drones, self-driving
    cars, smart Internet of Things). For many of these applications, local embedded
    processing near the sensor is preferred over the cloud due to privacy or
    latency concerns, or limitations in the communication bandwidth. However, at
    the sensor there are often stringent constraints on energy consumption and cost
    in addition to throughput and accuracy requirements. Furthermore, flexibility
    is often required such that the processing can be adapted for different
    applications or environments (e.g., update the weights and model in the
    classifier). In many applications, machine learning often involves transforming
    the input data into a higher dimensional space, which, along with programmable
    weights, increases data movement and consequently energy consumption. In this
    paper, we will discuss how these challenges can be addressed at various levels
    of hardware design ranging from architecture, hardware-friendly algorithms,
    mixed-signal circuits, and advanced technologies (including memories and
    sensors).

    A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search

    Deng Cai
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Approximate Nearest Neighbor (ANN) search is a fundamental problem in many
    areas of machine learning and data mining. During the past decade, numerous
    hashing algorithms have been proposed to solve this problem, and every
    proposed algorithm claims to outperform other state-of-the-art methods.
    However, there are serious drawbacks in the evaluation of existing hashing
    papers, and most of the claims in these papers should be re-examined. 1) Most
    of the existing papers failed to correctly measure the search time, which is
    essential for the ANN search problem. 2) As a result, most of the papers
    report that performance increases as the code length increases, which is
    wrong if the search time is measured correctly. 3) The performance of some
    hashing algorithms (e.g., LSH) can easily be boosted by using multiple hash
    tables, an important factor that should be considered in the evaluation, yet
    most of the papers failed to do so. In this paper, we carefully revisit many
    popular hashing algorithms and
    suggest one possible promising direction. For the sake of reproducibility, all
    the codes used in the paper are released on Github, which can be used as a
    testing platform to fairly compare various hashing algorithms.
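
    Point 3) is easy to illustrate: with classic random-hyperplane LSH, probing
    several independent hash tables unions their candidate buckets, which boosts
    recall at modest extra cost, and the search time should be measured end to
    end. A self-contained sketch (parameters are illustrative):

        import time
        import numpy as np

        rng = np.random.RandomState(0)
        data = rng.randn(10000, 32).astype(np.float32)
        query = rng.randn(32).astype(np.float32)

        # Build several independent random-hyperplane hash tables.
        n_tables, n_bits = 4, 12
        tables, planes = [], []
        for _ in range(n_tables):
            P = rng.randn(n_bits, 32)
            keys = (data @ P.T > 0) @ (1 << np.arange(n_bits))  # pack sign bits
            table = {}
            for i, k in enumerate(keys):
                table.setdefault(int(k), []).append(i)
            planes.append(P)
            tables.append(table)

        # Measure the whole search, not just code comparison.
        start = time.perf_counter()
        candidates = set()
        for P, table in zip(planes, tables):   # union buckets across tables
            key = int((query @ P.T > 0) @ (1 << np.arange(n_bits)))
            candidates.update(table.get(key, []))
        best = min(candidates, default=None,
                   key=lambda i: np.sum((data[i] - query) ** 2))
        elapsed = time.perf_counter() - start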

    Cohort of LSTM and lexicon verification for handwriting recognition with gigantic lexicon

    Bruno Stuner, Clément Chatelain, Thierry Paquet
    Comments: 28 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    State-of-the-art handwriting recognition methods are based on Long Short-Term
    Memory (LSTM) recurrent neural networks (RNN) coupled with the use of
    linguistic knowledge. LSTM RNNs deliver high raw performance and have
    interesting training properties that allow us to break with the standard
    state-of-the-art method. We present a simple and efficient way to extract,
    from a single training run, a large number of complementary LSTM RNNs, called
    a cohort, combined in a cascade architecture with a lexical verification
    step. This process does not require fine-tuning, making it easy to use. Our
    verification step allows us to deal quickly and efficiently with gigantic
    lexicons (over 3 million words). We achieve state-of-the-art results for
    isolated word recognition with a very large lexicon and present novel results
    for an unprecedented gigantic lexicon.

    Deep Blind Compressed Sensing

    Shikha Singh, Vanika Singhal, Angshul Majumdar
    Comments: DCC 2017 Poster
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This work addresses the problem of extracting deeply learned features
    directly from compressive measurements, an area in which there has been no
    prior work. Existing deep learning tools give good results only when applied
    to the full signal, and usually only after preprocessing; these techniques
    require the signal to be reconstructed first. In this work we show that
    considerably better results can be obtained by learning directly in the
    compressed domain. This work extends the recently proposed framework of deep
    matrix factorization in combination with blind compressed sensing; hence the
    term deep blind compressed sensing. Simulation experiments have been carried
    out on imaging via a single-pixel camera, on under-sampled biomedical signals
    arising in wireless body area networks, and on compressive hyperspectral
    imaging. In all cases, our proposed deep blind compressed sensing shows
    clearly superior results.

    Physically-Based Rendering for Indoor Scene Understanding Using Convolutional Neural Networks

    Yinda Zhang, Shuran Song, Ersin Yumer, Manolis Savva, Joon-Young Lee, Hailin Jin, Thomas Funkhouser
    Comments: CVPR submission, main paper and supplementary material
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Indoor scene understanding is central to applications such as robot
    navigation and human companion assistance. Over the last few years, data-driven
    deep neural networks have outperformed many traditional approaches thanks to
    their representation learning capabilities. One of the bottlenecks in training
    for better representations is the amount of available per-pixel ground truth
    data that is required for core scene understanding tasks such as semantic
    segmentation, normal prediction, and object edge detection. To address this
    problem, a number of works proposed using synthetic data. However, a systematic
    study of how such synthetic data is generated is missing. In this work, we
    introduce a large-scale synthetic dataset with 400K physically-based rendered
    images from 45K realistic 3D indoor scenes. We study the effects of rendering
    methods and scene lighting on training for three computer vision tasks: surface
    normal prediction, semantic segmentation, and object boundary detection. This
    study provides insights into the best practices for training with synthetic
    data (more realistic rendering is worth it) and shows that pretraining with our
    new synthetic dataset can improve results beyond the current state of the art
    on all three tasks.

    Efficient Action Detection in Untrimmed Videos via Multi-Task Learning

    Yi Zhu, Shawn Newsam
    Comments: Accepted at WACV 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

    This paper studies the joint learning of action recognition and temporal
    localization in long, untrimmed videos. We employ a multi-task learning
    framework that performs the three highly related steps of action proposal,
    action recognition, and action localization refinement in parallel instead of
    the standard sequential pipeline that performs the steps in order. We develop a
    novel temporal actionness regression module that estimates what proportion of a
    clip contains action. We use it for temporal localization but it could have
    other applications like video retrieval, surveillance, summarization, etc. We
    also introduce random shear augmentation during training to simulate viewpoint
    change. We evaluate our framework on three popular video benchmarks. Results
    demonstrate that our joint model is efficient in terms of storage and
    computation in that we do not need to compute and cache dense trajectory
    features, and that it is several times faster than its sequential ConvNets
    counterpart. Yet, despite being more efficient, it outperforms state-of-the-art
    methods with respect to accuracy.

    Automatic Identification of Scenedesmus Polymorphic Microalgae from Microscopic Images

    Jhony-Heriberto Giraldo-Zuluaga, Geman Diez, Alexander Gomez, Tatiana Martinez, Mariana Peñuela Vasquez, Jesus Francisco Vargas Bonilla, Augusto Salazar
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Microalgae counting is used to measure biomass quantity. Usually, it is
    performed in a manual way using a Neubauer chamber and expert criterion, with
    the risk of a high error rate. This paper addresses the methodology for
    automatic identification of Scenedesmus microalgae (used in the methane
    production and food industry) and applies it to images captured by a digital
    microscope. The use of contrast-adaptive histogram equalization for
    pre-processing and of active contours for segmentation is presented. The
    calculation of statistical features (Histogram of Oriented Gradients, Hu and
    Zernike moments) together with texture features (Haralick and Local Binary
    Patterns descriptors) is proposed for algae characterization. Scenedesmus
    algae can build coenobia consisting of 1, 2, 4 and 8 cells. The amount of
    algae of each coenobium helps to determine the amount of lipids, proteins,
    and other substances in a given sample of an algae crop. Knowledge of the
    quantity of those elements improves the quality of bioprocess applications.
    Classification
    of coenobia achieves accuracies of 98.63% and 97.32% with Support Vector
    Machine (SVM) and Artificial Neural Network (ANN), respectively. According to
    the results it is possible to consider the proposed methodology as an
    alternative to the traditional technique for algae counting. The database used
    in this paper is publicly available for download.
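
    A condensed sketch of one such pipeline with scikit-image and scikit-learn,
    with HOG standing in for the full feature set and random arrays standing in
    for real segmented coenobium crops:

        import numpy as np
        from skimage.exposure import equalize_adapthist
        from skimage.feature import hog
        from sklearn.svm import SVC

        def describe(image):
            # CLAHE pre-processing followed by a HOG descriptor; the Hu, Zernike,
            # Haralick and LBP features from the abstract are omitted here.
            enhanced = equalize_adapthist(image)
            return hog(enhanced, pixels_per_cell=(16, 16), cells_per_block=(2, 2))

        rng = np.random.RandomState(0)
        images = rng.rand(40, 128, 128)            # placeholder grayscale crops
        labels = np.repeat([1, 2, 4, 8], 10)       # cells per coenobium

        features = np.array([describe(im) for im in images])
        clf = SVC(kernel='rbf').fit(features, labels)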

    Top-down Visual Saliency Guided by Captions

    Vasili Ramanishka, Abir Das, Jianming Zhang, Kate Saenko
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Top-down saliency methods based on deep neural networks have recently been
    proposed for task-driven visual search. However, existing methods focus on
    object or scene classification tasks and cannot be used to compute saliency
    heatmaps using a natural language sentence as the top-down input. Current
    state-of-the-art image and video captioning models can generate accurate
    sentence captions but are difficult to understand, as they do not expose the
    internal process by which spatial and temporal regions are mapped to predicted
    words. In this paper, we expose this mapping and demonstrate that
    spatio-temporal saliency is learned implicitly by the combination of CNN and
    LSTM parts of modern encoder-decoder networks. Our approach, which we call
    Caption-Guided Visual Saliency, can produce spatial or spatio-temporal heatmaps
    both for given input sentences and for sentences predicted by our model. Unlike
    recent efforts that introduce explicit “attention” layers to selectively attend
    to certain inputs while generating each word, our approach recovers saliency
    without the overhead of explicit attention layers, and can be used to analyze a
    variety of existing model architectures and improve their design. We evaluate
    our approach on large scale video and image datasets and make several
    interesting discoveries about the inner workings of captioning models. The
    source code is publicly available at
    github.com/VisionLearningGroup/caption-guided-saliency.

    Re-evaluating Automatic Metrics for Image Captioning

    Mert Kilickaya, Aykut Erdem, Nazli Ikizler-Cinbis, Erkut Erdem
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

    The task of generating natural language descriptions from images has received
    a lot of attention in recent years. Consequently, it is becoming increasingly
    important to evaluate such image captioning approaches in an automatic manner.
    In this paper, we provide an in-depth evaluation of the existing image
    captioning metrics through a series of carefully designed experiments.
    Moreover, we explore the utilization of the recently proposed Word Mover’s
    Distance (WMD) document metric for the purpose of image captioning. Our
    findings outline the differences and/or similarities between metrics and their
    relative robustness by means of extensive correlation, accuracy and distraction
    based evaluations. Our results also demonstrate that WMD provides strong
    advantages over other metrics.
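
    For readers who want to try WMD as a caption metric, gensim exposes it
    directly; the vector file path below is hypothetical, and gensim's WMD
    implementation additionally requires the pyemd package.

        from gensim.models import KeyedVectors

        vectors = KeyedVectors.load_word2vec_format('vectors.bin', binary=True)

        candidate = "a man rides a brown horse".split()
        reference = "a person is riding a horse".split()

        # Lower WMD means the candidate's words need less "travel" in embedding
        # space to match the reference, giving a similarity-style caption score.
        distance = vectors.wmdistance(candidate, reference)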


    Artificial Intelligence

    Jointly Extracting Relations with Class Ties via Effective Deep Ranking

    Hai Ye, Wenhan Chao, Zhunchen Luo
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    In distantly supervised relation extraction, the connection between relations
    of one entity tuple, which we call class ties, is common. Exploiting this
    connection may be promising for relation extraction. However, this property is
    seldom considered by previous work. In this work, to leverage class ties, we
    propose to perform joint relation extraction with a unified model that
    integrates a convolutional neural network with a general pairwise ranking
    framework, in which two novel ranking loss functions are introduced. In
    addition, an effective method is proposed to relieve the impact of the NR
    (not relation) relation on model training and testing. Experimental results
    on a widely used dataset show that: (1) our model is clearly superior to the
    baselines, achieving state-of-the-art performance; (2) leveraging class ties,
    joint extraction is indeed better than separated extraction; (3) relieving
    the impact of NR significantly boosts our model's performance; and (4) our
    model can effectively deal with the wrong-labeling problem.

    Solving Set Optimization Problems by Cardinality Optimization via Weak Constraints with an Application to Argumentation

    Wolfgang Faber, Mauro Vallati, Federico Cerutti, Massimiliano Giacomin
    Comments: Informal proceedings of the 1st Workshop on Trends and Applications of Answer Set Programming (TAASP 2016), Klagenfurt, Austria, 26 September 2016
    Subjects: Artificial Intelligence (cs.AI)

    Optimization – minimization or maximization – in the lattice of subsets is a
    frequent operation in Artificial Intelligence tasks. Examples are
    subset-minimal model-based diagnosis, nonmonotonic reasoning by means of
    circumscription, or preferred extensions in abstract argumentation. Finding the
    optimum among many admissible solutions is often harder than finding admissible
    solutions with respect to both computational complexity and methodology. This
    paper addresses the former issue by means of an effective method for finding
    subset-optimal solutions. It is based on the relationship between
    cardinality-optimal and subset-optimal solutions, and the fact that many
    logic-based declarative programming systems provide constructs for finding
    cardinality-optimal solutions, for example maximum satisfiability (MaxSAT) or
    weak constraints in Answer Set Programming (ASP). Clearly each
    cardinality-optimal solution is also a subset-optimal one, and if the language
    also allows for the addition of particular restricting constructs (both MaxSAT
    and ASP do) then all subset-optimal solutions can be found by an iterative
    computation of cardinality-optimal solutions. As a showcase, the computation of
    preferred extensions of abstract argumentation frameworks using the proposed
    method is studied.
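
    The iterative computation fits in a few lines. In the sketch below,
    cardinality_optimal and restrict are hypothetical stand-ins for, e.g., a
    MaxSAT call (or ASP weak constraints) and the restricting construct that
    excludes a found solution together with its supersets.

        def all_subset_minimal(cardinality_optimal, restrict):
            """Enumerate subset-minimal solutions by repeated cardinality
            optimization: every minimum-cardinality solution of the restricted
            problem is also subset-minimal."""
            constraints, found = [], []
            while True:
                sol = cardinality_optimal(constraints)
                if sol is None:            # no admissible solution remains
                    return found
                found.append(sol)
                constraints.append(restrict(sol))  # rule out sol and supersets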

    The SP Theory of Intelligence as a Foundation for the Development of a General, Human-Level Thinking Machine

    J Gerard Wolff
    Subjects: Artificial Intelligence (cs.AI)

    This paper summarises how the “SP theory of intelligence” and its realisation
    in the “SP computer model” simplifies and integrates concepts across artificial
    intelligence and related areas, and thus provides a promising foundation for
    the development of a general, human-level thinking machine, in accordance with
    the main goal of research in artificial general intelligence.

    The key to this simplification and integration is the powerful concept of
    “multiple alignment”, borrowed and adapted from bioinformatics. This concept
    has the potential to be the “double helix” of intelligence, with as much
    significance for human-level intelligence as has DNA for biological sciences.

    Strengths of the SP system include: versatility in the representation of
    diverse kinds of knowledge; versatility in aspects of intelligence (including:
    strengths in unsupervised learning; the processing of natural language; pattern
    recognition at multiple levels of abstraction that is robust in the face of
    errors in data; several kinds of reasoning (including: one-step `deductive’
    reasoning; chains of reasoning; abductive reasoning; reasoning with
    probabilistic networks and trees; reasoning with ‘rules’; nonmonotonic
    reasoning and reasoning with default values; Bayesian reasoning with
    ‘explaining away’; and more); planning; problem solving; and more); seamless
    integration of diverse kinds of knowledge and diverse aspects of intelligence
    in any combination; and potential for application in several areas (including:
    helping to solve nine problems with big data; helping to develop human-level
    intelligence in autonomous robots; serving as a database with intelligence and
    with versatility in the representation and integration of several forms of
    knowledge; serving as a vehicle for medical knowledge and as an aid to medical
    diagnosis; and several more).

    Non-Deterministic Policy Improvement Stabilizes Approximated Reinforcement Learning

    Wendelin Böhmer, Rong Guo, Klaus Obermayer
    Comments: This paper has been presented at the 13th European Workshop on Reinforcement Learning (EWRL 2016) on the 3rd and 4th of December 2016 in Barcelona, Spain
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    This paper investigates a type of instability that is linked to the greedy
    policy improvement in approximated reinforcement learning. We show empirically
    that non-deterministic policy improvement can stabilize methods like LSPI by
    controlling the improvements’ stochasticity. Additionally we show that a
    suitable representation of the value function also stabilizes the solution to
    some degree. The presented approach is simple and should also be easily
    transferable to more sophisticated algorithms like deep reinforcement learning.
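
    As a sketch of what non-deterministic improvement can look like (the paper's
    exact scheme may differ), a softmax over Q-values replaces the greedy argmax,
    with a temperature controlling the improvement's stochasticity:

        import numpy as np

        def soft_policy_improvement(q_values, temperature=1.0):
            # Lower temperature -> closer to greedy; higher -> more stochastic.
            z = (q_values - q_values.max()) / temperature  # stable softmax
            p = np.exp(z)
            return p / p.sum()              # action probabilities, not an argmax

        policy = soft_policy_improvement(np.array([1.0, 1.2, 0.8]), temperature=0.5)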

    Highway and Residual Networks learn Unrolled Iterative Estimation

    Klaus Greff, Rupesh K. Srivastava, Jürgen Schmidhuber
    Comments: 10 + 3 pages, under review for ICLR 2017
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)

    The past year saw the introduction of new architectures such as Highway
    networks and Residual networks which, for the first time, enabled the training
    of feedforward networks with dozens to hundreds of layers using simple gradient
    descent. While depth of representation has been posited as a primary reason for
    their success, there are indications that these architectures defy a popular
    view of deep learning as a hierarchical computation of increasingly abstract
    features at each layer.

    In this report, we argue that this view is incomplete and does not adequately
    explain several recent findings. We propose an alternative viewpoint based on
    unrolled iterative estimation—a group of successive layers iteratively refine
    their estimates of the same features instead of computing an entirely new
    representation. We demonstrate that this viewpoint directly leads to the
    construction of Highway and Residual networks. Finally we provide preliminary
    experiments to discuss the similarities and differences between the two
    architectures.

    Difficulty Adjustable and Scalable Constrained Multi-objective Test Problem Toolkit

    Zhun Fan, Wenji Li, Xinye Cai, Hui Li, Kaiwen Hu, Qingfu Zhang, Kalyanmoy Deb, Erik D. Goodman
    Comments: 15 pages, 8 figures, 7 tables
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

    In order to better understand the advantages and disadvantages of a
    constrained multi-objective evolutionary algorithm (CMOEA), it is important to
    understand the nature of difficulty of a constrained multi-objective
    optimization problem (CMOP) that the CMOEA is going to deal with. In this
    paper, we first propose three primary types of difficulty to characterize the
    constraints in CMOPs, including feasibility-hardness, convergence-hardness and
    diversity-hardness. We then develop a general toolkit to construct difficulty
    adjustable CMOPs with three types of parameterized constraint functions
    according to the proposed three primary types of difficulty. In fact,
    combining the three primary constraint functions with different parameters
    can yield a large variety of CMOPs and CMaOPs, whose difficulty is uniquely
    defined by a triplet, with each of its parameters specifying the level of one
    primary difficulty type. Based on this toolkit, we suggest fifteen difficulty
    adjustable CMOPs named DAC-MOP1-15 with different types and levels of
    difficulty. To study the effectiveness of DAC-MOP1-15, two popular CMOEAs,
    MOEA/D-CDP and NSGA-II-CDP, are adopted to test their performance on them.
    Furthermore, this toolkit also has the ability to scale the number of
    objectives. Nine difficulty adjustable constrained many-objective
    optimization problems (DAC-MaOPs) named DAC-MaOP1-9, scalable in the number
    of objectives, are also proposed using this toolkit. Two constrained
    many-objective evolutionary algorithms (CMaOEAs), CNSGA-III and CMOEA/DD, are
    applied to test their performance on three-, five-, seven- and ten-objective
    DAC-MaOP1-9 with different difficulty levels and types.

    Counting Answer Sets via Dynamic Programming

    Johannes Fichte, Markus Hecher, Michael Morak, Stefan Woltran
    Comments: Informal proceedings of the 1st Workshop on Trends and Applications of Answer Set Programming (TAASP 2016), Klagenfurt, Austria, 26 September 2016
    Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC)

    While the solution counting problem for propositional satisfiability (#SAT)
    has received renewed attention in recent years, this research trend has not
    affected other AI solving paradigms like answer set programming (ASP). Although
    ASP solvers are designed to enumerate all solutions, and counting can therefore
    be easily done, the involved materialization of all solutions is a clear
    bottleneck for the counting problem of ASP (#ASP). In this paper we propose
    dynamic programming-based #ASP algorithms that exploit the structure of the
    underlying (ground) ASP program. Experimental results for a prototype
    implementation show promise when compared to existing solvers.

    Causal Effect Identification in Acyclic Directed Mixed Graphs and Gated Models

    Jose M. Peña, Marcus Bendtsen
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI)

    We introduce a new family of graphical models that consists of graphs with
    possibly directed, undirected and bidirected edges but without directed cycles.
    We show that these models are suitable for representing causal models with
    additive error terms. We provide a set of sufficient graphical criteria for the
    identification of arbitrary causal effects when the new models contain directed
    and undirected edges but no bidirected edge. We also provide a necessary and
    sufficient graphical criterion for the identification of the causal effect of a
    single variable on the rest of the variables. Moreover, we develop an exact
    algorithm for learning the new models from observational and interventional
    data via answer set programming. Finally, we introduce gated models for causal
    effect identification, a new family of graphical models that exploits context
    specific independences to identify additional causal effects.


    Information Retrieval

    Development of UMLS Based Health Care Web Services for Android Platform

    Nareena Soomro, Safeeulah Soomro, Zainab Alansari, Suhni Abbasi, Mohammad Riyaz Belgaum, Abdul Baqi Khakwani
    Journal-ref: Sindh Univ. Res. Jour. (Sci. Ser.) Vol. 48 (4D) 05-08 (2016)
    Subjects: Computers and Society (cs.CY); Information Retrieval (cs.IR); Software Engineering (cs.SE)

    In this fast-developing world of information, the amount of medical knowledge
    is rising at an exponential rate. The UMLS (Unified Medical Language System)
    is a rich knowledge base consisting of files and software that provide many
    health and biomedical vocabularies and standards. A web service is a web
    solution to facilitate machine-to-machine interaction over a network. Few
    UMLS web services are currently available for portable devices, and most of
    them lack efficiency and performance. We propose to develop Android-based web
    services for healthcare systems on top of the rich knowledge source of UMLS.
    An experimental evaluation was made to analyse the effect on efficiency and
    performance with and without the designed prototype. Understandability of,
    and interaction with, the prototype were greater than for the alternate
    sources users consulted to obtain answers to their questions. The overall
    performance indicates that the system is convenient and easy to use. The
    results of the evaluation clearly show that the designed system retrieves
    pertinent information better than syntactic searches.


    Computation and Language

    Re-evaluating Automatic Metrics for Image Captioning

    Mert Kilickaya, Aykut Erdem, Nazli Ikizler-Cinbis, Erkut Erdem
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

    The task of generating natural language descriptions from images has received
    a lot of attention in recent years. Consequently, it is becoming increasingly
    important to evaluate such image captioning approaches in an automatic manner.
    In this paper, we provide an in-depth evaluation of the existing image
    captioning metrics through a series of carefully designed experiments.
    Moreover, we explore the utilization of the recently proposed Word Mover’s
    Distance (WMD) document metric for the purpose of image captioning. Our
    findings outline the differences and/or similarities between metrics and their
    relative robustness by means of extensive correlation, accuracy and distraction
    based evaluations. Our results also demonstrate that WMD provides strong
    advantages over other metrics.

    Noise Mitigation for Neural Entity Typing and Relation Extraction

    Yadollah Yaghoobzadeh, Heike Adel, Hinrich Schütze
    Comments: EACL 2017; the first two authors contributed equally to this work
    Subjects: Computation and Language (cs.CL)

    In this paper, we address two different types of noise in information
    extraction models: noise from distant supervision and noise from pipeline input
    features. Our target tasks are entity typing and relation extraction. For the
    first noise type, we introduce multi-instance multi-label learning algorithms
    using neural network models, and apply them to fine-grained entity typing for
    the first time. This gives our models comparable performance with the
    state-of-the-art supervised approach which uses global embeddings of entities.
    For the second noise type, we propose ways to improve the integration of noisy
    entity type predictions into relation extraction. Our experiments show that
    probabilistic predictions are more robust than discrete predictions and that
    joint training of the two tasks performs best.

    Continuous multilinguality with language vectors

    Robert Östling, Jörg Tiedemann
    Subjects: Computation and Language (cs.CL)

    Most existing models for multilingual natural language processing (NLP) treat
    language as a discrete category, and make predictions for either one language
    or the other. In contrast, we propose using continuous vector representations
    of language. We show that these can be learned efficiently with a
    character-based neural language model, and used to improve inference about
    language varieties not seen during training. In experiments with 1303 Bible
    translations into 990 different languages, we empirically explore the capacity
    of multilingual language models, and also show that the language vectors
    capture genetic relationships between languages.

    A Context-aware Attention Network for Interactive Question Answering

    Huayu Li, Martin Renqiang Min, Yong Ge, Asim Kadav
    Comments: 13 pages, 2 figures
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    We develop a new model for Interactive Question Answering (IQA), using
    Gated-Recurrent-Unit recurrent networks (GRUs) as encoders for statements and
    questions, and another GRU as a decoder for outputs. Distinct from previous
    work, our approach employs context-dependent word-level attention for more
    accurate statement representations and question-guided sentence-level attention
    for better context modeling. Employing these mechanisms, our model accurately
    understands when it can output an answer or when it requires generating a
    supplementary question for additional input. When available, the user's feedback is
    encoded and directly applied to update sentence-level attention to infer the
    answer. Extensive experiments on QA and IQA datasets demonstrate quantitatively
    the effectiveness of our model with significant improvement over conventional
    QA models.

    Jointly Extracting Relations with Class Ties via Effective Deep Ranking

    Hai Ye, Wenhan Chao, Zhunchen Luo
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    In distantly supervised relation extraction, the connection between relations
    of one entity tuple, which we call class ties, is common. Exploiting this
    connection may be promising for relation extraction. However, this property is
    seldom considered by previous work. In this work, to leverage class ties, we
    propose to perform joint relation extraction with a unified model that
    integrates a convolutional neural network with a general pairwise ranking
    framework, in which two novel ranking loss functions are introduced. In
    addition, an effective method is proposed to relieve the impact of the NR
    (not relation) relation on model training and testing. Experimental results
    on a widely used dataset show that: (1) our model is clearly superior to the
    baselines, achieving state-of-the-art performance; (2) leveraging class ties,
    joint extraction is indeed better than separated extraction; (3) relieving
    the impact of NR significantly boosts our model's performance; and (4) our
    model can effectively deal with the wrong-labeling problem.


    Distributed, Parallel, and Cluster Computing

    Pot: Deterministic transactional execution

    Tiago M. Vale, João A. Silva, Ricardo J. Dias, João M. Lourenço
    Comments: Published in ACM Transactions on Architecture and Code Optimization (TACO) 13, 4
    Journal-ref: ACM Trans. Archit. Code Optim. 13, 4, Article 52 (December 2016)
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    This paper presents Pot, a system that leverages the concept of preordered
    transactions to achieve deterministic multithreaded execution of programs that
    use Transactional Memory. Preordered transactions eliminate the root cause of
    nondeterminism in transactional execution: they provide the illusion of
    executing in a deterministic serial order, unlike traditional transactions
    which appear to execute in a nondeterministic order that can change from
    execution to execution. Pot uses a new concurrency control protocol that
    exploits the serialization order to distinguish between fast and speculative
    transaction execution modes in order to mitigate the overhead of imposing a
    deterministic order. We build two Pot prototypes: one using STM and another
    using off-the-shelf HTM. To the best of our knowledge, Pot enables
    deterministic execution of programs using off-the-shelf HTM for the first time.
    An experimental evaluation shows that Pot achieves deterministic execution of
    TM programs with low overhead, sometimes even outperforming nondeterministic
    executions, and clearly outperforming the state of the art.

    An efficient hybrid tridiagonal divide-and-conquer algorithm on distributed memory architectures

    Shengguo Li, Francois-Henry Rouet, Jie Liu, Chun Huang, Xingyu Gao, Xuebin Chi
    Comments: 20 pages, 7 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS); Numerical Analysis (cs.NA)

    In this paper, an efficient divide-and-conquer (DC) algorithm is proposed for
    symmetric tridiagonal matrices, based on ScaLAPACK and hierarchically
    semiseparable (HSS) matrices, an important type of rank-structured matrix.
    Most of the DC algorithm's time is spent computing the eigenvectors via
    matrix-matrix multiplications (MMM). In our parallel hybrid DC (PHDC)
    algorithm, MMM is accelerated with HSS matrix techniques when the
    intermediate matrix is large. All the HSS algorithms are performed via the
    STRUMPACK package. PHDC has been tested on many different matrices. Compared
    with the DC implementation in MKL, PHDC can be faster for some matrices with
    few deflations when using hundreds of processes. However, the gains decrease
    as the number of processes increases. Comparisons of PHDC with ELPA (the
    Eigenvalue soLvers for Petascale Applications library) are similar. PHDC is
    usually slower than MKL and ELPA when using 300 or more processes on the
    Tianhe-2 supercomputer.

    Comparative Analysis of SpatialHadoop and GeoSpark for Geospatial Big Data Analytics

    Rakesh K. Lenka, Rabindra K. Barik, Noopur Gupta, Syed Mohd Ali, Amiya Rath, Harishchandra Dubey
    Comments: 6 pages, 7 figures, 1 table
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computers and Society (cs.CY)

    In this digitalised world where all information is stored, data are growing
    exponentially; it is estimated that data double every two years. Geospatial
    data are one of the prime contributors to the big data scenario. There are
    numerous big data analytics tools, but not all of them are capable of
    handling geospatial big data. The present paper discusses two recent, popular
    open-source geospatial big data analytics tools, SpatialHadoop and GeoSpark,
    which can be used to analyse and process geospatial big data in an efficient
    manner. It compares the architectural views of SpatialHadoop and GeoSpark
    and, through this architectural comparison, summarises the merits and
    demerits of each tool according to execution times and the volume of data
    used.

    Vertex-Centric Graph Processing: The Good, the Bad, and the Ugly

    Arijit Khan
    Comments: 6 pages, 5 figures
    Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

    We study distributed graph algorithms that adopt an iterative vertex-centric
    framework for graph processing, popularized by Google's Pregel system. Since
    then, there have been several attempts to implement many graph algorithms in
    a vertex-centric framework, as well as efforts to design optimization
    techniques for improving efficiency. However, to the best of our knowledge,
    there has not been any systematic study comparing these vertex-centric
    implementations with their sequential counterparts. Our paper addresses this
    gap in two ways. (1) We analyze the computational complexity of such
    implementations with the notion of the time-processor product, and benchmark
    several vertex-centric graph algorithms to determine whether they perform
    more work with respect to their best-known sequential solutions.
    (2) Employing the concept of balanced practical Pregel algorithms, we study
    whether these implementations suffer from imbalanced workloads and a large
    number of iterations. Our findings illustrate that, with the exception of the
    Euler tour tree algorithm, all other algorithms either perform more work than
    their best-known sequential approach, or suffer from imbalanced workloads or
    a large number of iterations, or both. We also highlight graph algorithms
    that are fundamentally difficult to express in vertex-centric frameworks, and
    conclude by discussing the road ahead for distributed graph processing.
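
    For readers unfamiliar with the framework, here is a self-contained toy of
    the vertex-centric model (single-source shortest paths in plain Python):
    every vertex runs the same update on its incoming messages, one while-loop
    iteration corresponds to one superstep, and the number of supersteps is
    exactly the kind of quantity the paper benchmarks.

        # Adjacency list: vertex -> [(neighbor, edge weight), ...]
        graph = {0: [(1, 4), (2, 1)], 1: [(3, 1)], 2: [(1, 2), (3, 5)], 3: []}
        dist = {v: float('inf') for v in graph}
        messages = {0: [0]}                     # the source receives distance 0

        while messages:                         # one iteration = one superstep
            next_messages = {}
            for v, incoming in messages.items():
                best = min(incoming)
                if best < dist[v]:              # stay active only on improvement
                    dist[v] = best
                    for u, w in graph[v]:
                        next_messages.setdefault(u, []).append(best + w)
            messages = next_messages            # halts when no vertex is active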


    Learning

    Deep Learning and Its Applications to Machine Health Monitoring: A Survey

    Rui Zhao, Ruqiang Yan, Zhenghua Chen, Kezhi Mao, Peng Wang, Robert X. Gao
    Comments: 14 pages, 9 figures, submitted to IEEE Transactions on Neural Networks and Learning Systems
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Since 2006, deep learning (DL) has become a rapidly growing research
    direction, redefining state-of-the-art performances in a wide range of areas
    such as object recognition, image segmentation, speech recognition and machine
    translation. In modern manufacturing systems, data-driven machine health
    monitoring is gaining in popularity due to the widespread deployment of
    low-cost sensors and their connection to the Internet. Meanwhile, deep learning
    provides useful tools for processing and analyzing these big machinery data.
    The main purpose of this paper is to review and summarize the emerging research
    work of deep learning on machine health monitoring. After the brief
    introduction of deep learning techniques, the applications of deep learning in
    machine health monitoring systems are reviewed mainly from the following
    aspects: Auto-encoder (AE) and its variants, Restricted Boltzmann Machines and
    its variants including Deep Belief Network (DBN) and Deep Boltzmann Machines
    (DBM), Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN).
    Finally, some new trends of DL-based machine health monitoring methods are
    discussed.

    A note on the function approximation error bound for risk-sensitive reinforcement learning

    Prasenjit Karmakar, Shalabh Bhatnagar
    Comments: 4 pages technical short note
    Subjects: Learning (cs.LG)

    In this paper we improve the existing function approximation error bound for
    the policy evaluation algorithm when the aim is to find the risk-sensitive cost
    represented using exponential utility.

    On Coreset Constructions for the Fuzzy K-Means Problem

    Johannes Blömer, Sascha Brauer, Kathrin Bujna
    Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS)

    In this paper, we present coreset constructions for the fuzzy K-means
    problem. First, we show that one can construct a weak coreset for fuzzy
    K-means. Second, we show that there are coresets for fuzzy K-means with
    respect to balanced fuzzy K-means solutions. Third, we use these coresets to
    develop a randomized approximation algorithm whose runtime is polynomial in
    the number of given points and their dimension.

    How to Train Your Deep Neural Network with Dictionary Learning

    Vanika Singhal, Shikha Singh, Angshul Majumdar
    Comments: DCC 2017 poster
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Currently there are two predominant ways to train deep neural networks. The
    first uses the restricted Boltzmann machine (RBM) and the second uses
    autoencoders. RBMs are stacked in layers to form a deep belief network (DBN);
    the final representation layer is attached to the target to complete the deep
    neural network. Autoencoders are nested one inside the other to form stacked
    autoencoders; once the stacked autoencoder is learnt, the decoder portion is
    detached and the target is attached to the deepest layer of the encoder to
    form the deep neural network. This work proposes a new approach to training
    deep neural networks that uses dictionary learning as the basic building
    block; the idea is to use the features from the shallower layer as inputs for
    training the next deeper layer. One can use any type of dictionary learning
    (unsupervised, supervised, discriminative, etc.) as the basic unit up to the
    pre-final layer. In the final layer one needs to use the label-consistent
    dictionary learning formulation for classification. We compare our proposed
    framework with existing state-of-the-art deep learning techniques on
    benchmark problems; we are always within the top 10 results. On the practical
    problems of age and gender classification, we are better than the best known
    techniques.
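
    A greedy layer-wise sketch of the idea using scikit-learn's dictionary
    learning. Note the final layer is a stand-in: scikit-learn has no
    label-consistent dictionary learning, so a plain classifier is trained on the
    deepest codes instead.

        import numpy as np
        from sklearn.decomposition import MiniBatchDictionaryLearning
        from sklearn.linear_model import LogisticRegression

        rng = np.random.RandomState(0)
        X = rng.rand(500, 64)                  # placeholder training data
        y = rng.randint(0, 10, 500)

        # Codes from the shallower dictionary become the inputs for training
        # the next, deeper dictionary.
        features = X
        for n_atoms in (48, 32):               # two unsupervised "layers"
            dico = MiniBatchDictionaryLearning(n_components=n_atoms, random_state=0)
            features = dico.fit(features).transform(features)

        clf = LogisticRegression(max_iter=1000).fit(features, y)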

    Microstructure Representation and Reconstruction of Heterogeneous Materials via Deep Belief Network for Computational Material Design

    Ruijin Cang, Yaopengxiao Xu, Shaohua Chen, Yongming Liu, Yang Jiao, Max Yi Ren
    Comments: 27 pages, 17 figures
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Integrated Computational Materials Engineering (ICME) aims to accelerate
    optimal design of complex material systems by integrating material science and
    design automation. For tractable ICME, it is required that (1) a structural
    feature space be identified to allow reconstruction of new designs, and (2) the
    reconstruction process be property-preserving. The majority of existing
    structural representation schemes rely on the designer’s understanding of
    specific material systems to identify geometric and statistical features, which
    could be biased and insufficient for reconstructing physically meaningful
    microstructures of complex material systems. In this paper, we develop a
    feature learning mechanism based on convolutional deep belief network to
    automate a two-way conversion between microstructures and their
    lower-dimensional feature representations, and to achieve a 1000-fold
    dimension reduction from the microstructure space. The proposed model is
    applied to a wide spectrum of heterogeneous material systems with distinct
    microstructural features including Ti-6Al-4V alloy, Pb63-Sn37 alloy,
    Fontainebleau sandstone, and spherical colloids, to produce material
    reconstructions that are close to the original samples with respect to 2-point
    correlation functions and mean critical fracture strength. This capability is
    not achieved by existing synthesis methods that rely on the Markovian
    assumption of material microstructures.

    Detecting Unusual Input-Output Associations in Multivariate Conditional Data

    Charmgil Hong, Milos Hauskrecht
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Despite tremendous progress in outlier detection research in recent years,
    the majority of existing methods are designed only to detect unconditional
    outliers that correspond to unusual data patterns expressed in the joint space
    of all data attributes. Such methods are not applicable when we seek to detect
    conditional outliers that reflect unusual responses associated with a given
    context or condition. This work focuses on multivariate conditional outlier
    detection, a special type of the conditional outlier detection problem, where
    data instances consist of multi-dimensional input (context) and output
    (responses) pairs. We present a novel outlier detection framework that
    identifies abnormal input-output associations in data with the help of a
    decomposable conditional probabilistic model that is learned from all data
    instances. Since components of this model can vary in their quality, we combine
    them with the help of weights reflecting their reliability in assessment of
    outliers. We study two ways of calculating the component weights: global that
    relies on all data, and local that relies only on instances similar to the
    target instance. Experimental results on data from various domains demonstrate
    the ability of our framework to successfully identify multivariate conditional
    outliers.

    Highway and Residual Networks learn Unrolled Iterative Estimation

    Klaus Greff, Rupesh K. Srivastava, Jürgen Schmidhuber
    Comments: 10 + 3 pages, under review for ICLR 2017
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)

    The past year saw the introduction of new architectures such as Highway
    networks and Residual networks which, for the first time, enabled the training
    of feedforward networks with dozens to hundreds of layers using simple gradient
    descent. While depth of representation has been posited as a primary reason for
    their success, there are indications that these architectures defy a popular
    view of deep learning as a hierarchical computation of increasingly abstract
    features at each layer.

    In this report, we argue that this view is incomplete and does not adequately
    explain several recent findings. We propose an alternative viewpoint based on
    unrolled iterative estimation—a group of successive layers iteratively refine
    their estimates of the same features instead of computing an entirely new
    representation. We demonstrate that this viewpoint directly leads to the
    construction of Highway and Residual networks. Finally we provide preliminary
    experiments to discuss the similarities and differences between the two
    architectures.

    Stacking machine learning classifiers to identify Higgs bosons at the LHC

    Alexandre Alves
    Comments: 7 pages, 3 figures, 3 tables
    Subjects: High Energy Physics – Phenomenology (hep-ph); Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)

    Machine learning (ML) algorithms have been employed in the problem of
    classifying signal and background events with high accuracy in particle
physics. In this paper, we use a widespread ML technique, namely stacked
generalization, for the task of discovering a new neutral Higgs boson in gluon
fusion. We find that, while demanding far less computational effort, stacking
ML algorithms performs almost as well as deep neural networks (DNN) trained
exclusively on kinematic distributions for the same task, by building either a
highly discriminating linear model or a shallower neural network on the stacked
ML outputs. We also show that it is possible to outperform the DNN in this
channel by partially exploiting correlations among the classifiers' outputs in
a multivariate statistical analysis.
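
As a self-contained illustration of stacked generalization in general (not the paper's specific classifiers or Higgs features), scikit-learn's StackingClassifier fits base learners and a linear meta-learner on their out-of-fold predictions:

```python
# Illustrative stacked generalization on synthetic data; the dataset,
# base learners, and hyperparameters are assumptions, not the paper's setup.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(),  # a discriminating linear model
    cv=5,  # out-of-fold predictions avoid training-data leakage
)
stack.fit(X_tr, y_tr)
print("stacked accuracy:", stack.score(X_te, y_te))
```

The final logistic layer plays the role of the "highly discriminating linear model" built on the stacked outputs.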

    Structured Sequence Modeling with Graph Convolutional Recurrent Networks

    Youngjoo Seo, Michaël Defferrard, Pierre Vandergheynst, Xavier Bresson
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    This paper introduces Graph Convolutional Recurrent Network (GCRN), a deep
learning model able to predict structured sequences of data. More precisely, GCRN is
    a generalization of classical recurrent neural networks (RNN) to data
    structured by an arbitrary graph. Such structured sequences can represent
    series of frames in videos, spatio-temporal measurements on a network of
    sensors, or random walks on a vocabulary graph for natural language modeling.
    The proposed model combines convolutional neural networks (CNN) on graphs to
    identify spatial structures and RNN to find dynamic patterns. We study two
    possible architectures of GCRN, and apply the models to two practical problems:
    predicting moving MNIST data, and modeling natural language with the Penn
Treebank dataset. Experiments show that simultaneously exploiting the spatial
(graph) and dynamic (temporal) structure of the data can improve both precision
and learning speed.
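
A rough sketch of the core idea, using the simplest possible graph convolution (one-hop normalized neighborhood aggregation) inside a GRU-style update; the paper's model uses richer spectral graph convolutions, so treat this as an assumption-laden illustration:

```python
# Minimal graph-convolutional recurrent cell in the spirit of GCRN
# (illustrative only; parameter names and shapes are assumptions).
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt

def gconv(A_norm, X, W):
    """One-hop graph convolution: aggregate neighbors, then project."""
    return A_norm @ X @ W

def gcrn_cell(A_norm, x_t, h_prev, p):
    """GRU update with every affine map replaced by a graph convolution.
    x_t: (nodes, in_feats), h_prev: (nodes, hidden)."""
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    z = sig(gconv(A_norm, x_t, p["Wz"]) + gconv(A_norm, h_prev, p["Uz"]))
    r = sig(gconv(A_norm, x_t, p["Wr"]) + gconv(A_norm, h_prev, p["Ur"]))
    h_tilde = np.tanh(gconv(A_norm, x_t, p["Wh"]) +
                      gconv(A_norm, r * h_prev, p["Uh"]))
    return (1 - z) * h_prev + z * h_tilde  # new hidden state per node
```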

    Finding Statistically Significant Attribute Interactions

    Andreas Henelius, Antti Ukkonen, Kai Puolamäki
    Comments: 9 pages, 4 tables, 1 figure
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    In many data exploration tasks it is meaningful to identify groups of
    attribute interactions that are specific to a variable of interest. These
    interactions are also useful in several practical applications, for example, to
    gain insight into the structure of the data, in feature selection, and in data
    anonymisation. We present a novel method, based on statistical significance
    testing, that can be used to verify if the data set has been created by a given
    factorized class-conditional joint distribution, where the distribution is
parametrized by a partition of its attributes. Furthermore, we provide a
method, named ASTRID, for automatically finding a partition of the attributes
that describes the distribution that generated the data. State-of-the-art
classifiers are used to capture the interactions present in the data by
systematically breaking attribute interactions and observing the effect of
this breaking on classifier performance. We empirically demonstrate the utility of the proposed
    method with real and synthetic data as well as with usage examples.
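
The "breaking" step can be pictured as a within-class permutation that preserves each attribute group's class-conditional distribution while destroying cross-group interactions. The following sketch is an illustrative reconstruction of that idea, not ASTRID's exact procedure:

```python
# Hypothetical sketch of interaction breaking: shuffle each attribute group
# independently within every class, preserving per-group class-conditional
# marginals but destroying cross-group dependence.
import numpy as np

def break_interactions(X, y, partition, rng):
    """partition: list of attribute-index groups, e.g. [[0, 2], [1], [3, 4]]."""
    X_broken = X.copy()
    for cls in np.unique(y):
        idx = np.where(y == cls)[0]
        for group in partition:
            perm = rng.permutation(idx)  # independent shuffle per group
            X_broken[np.ix_(idx, group)] = X[np.ix_(perm, group)]
    return X_broken

# If a classifier's accuracy drops significantly on the broken data relative
# to the original data, the partition fails to capture some interaction.
```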

    Non-Deterministic Policy Improvement Stabilizes Approximated Reinforcement Learning

    Wendelin Böhmer, Rong Guo, Klaus Obermayer
    Comments: This paper has been presented at the 13th European Workshop on Reinforcement Learning (EWRL 2016) on the 3rd and 4th of December 2016 in Barcelona, Spain
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    This paper investigates a type of instability that is linked to the greedy
    policy improvement in approximated reinforcement learning. We show empirically
    that non-deterministic policy improvement can stabilize methods like LSPI by
    controlling the improvements’ stochasticity. Additionally we show that a
    suitable representation of the value function also stabilizes the solution to
    some degree. The presented approach is simple and should also be easily
    transferable to more sophisticated algorithms like deep reinforcement learning.
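
One common way to obtain a non-deterministic improvement step, shown here purely as an illustration of the general idea rather than the paper's exact scheme, is to replace the greedy argmax over approximated Q-values with a softmax (Boltzmann) policy whose temperature controls the stochasticity:

```python
# Softmax (Boltzmann) policy improvement as an assumed stand-in for the
# greedy step in LSPI-style approximated reinforcement learning.
import numpy as np

def softmax_policy(Q, temperature):
    """Q: (n_states, n_actions) array of approximated action values."""
    z = Q / max(temperature, 1e-8)
    z -= z.max(axis=1, keepdims=True)         # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)   # pi(a | s)

# temperature -> 0 recovers the (potentially unstable) greedy improvement;
# larger temperatures smooth the policy update.
```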

    Robustness of Voice Conversion Techniques Under Mismatched Conditions

    Monisankha Pal, Dipjyoti Paul, Md Sahidullah, Goutam Saha
    Comments: 5 pages, 2 figures
    Subjects: Sound (cs.SD); Learning (cs.LG); Machine Learning (stat.ML)

Most existing studies on voice conversion (VC) are conducted in acoustically
matched conditions between the source and target signals. However, the
robustness of VC methods in the presence of mismatch remains unknown. In this
paper, we report a comparative analysis of different VC techniques under
mismatched conditions. Extensive experiments with five different VC techniques
on the CMU ARCTIC corpus suggest that the performance of VC methods degrades
substantially in noisy conditions. We find that bilinear
    frequency warping with amplitude scaling (BLFWAS) outperforms other methods in
    most of the noisy conditions. We further explore the suitability of different
    speech enhancement techniques for robust conversion. The objective evaluation
    results indicate that spectral subtraction and log minimum mean square error
    (logMMSE) based speech enhancement techniques can be used to improve the
    performance in specific noisy conditions.
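
For reference, a bare-bones version of spectral subtraction, one of the enhancement methods the abstract mentions; the frame count for noise estimation, the FFT size, and the spectral floor are assumptions:

```python
# Minimal spectral-subtraction sketch, assuming the first few STFT frames
# contain only noise (illustrative parameters, not the paper's setup).
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(x, fs, noise_frames=10, floor=0.02):
    f, t, X = stft(x, fs=fs, nperseg=512)
    mag, phase = np.abs(X), np.angle(X)
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)
    clean_mag = np.maximum(mag - noise_mag, floor * mag)  # spectral floor
    _, x_hat = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=512)
    return x_hat
```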

    A Context-aware Attention Network for Interactive Question Answering

    Huayu Li, Martin Renqiang Min, Yong Ge, Asim Kadav
    Comments: 13 pages, 2 figures
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    We develop a new model for Interactive Question Answering (IQA), using
    Gated-Recurrent-Unit recurrent networks (GRUs) as encoders for statements and
    questions, and another GRU as a decoder for outputs. Distinct from previous
    work, our approach employs context-dependent word-level attention for more
    accurate statement representations and question-guided sentence-level attention
for better context modeling. Employing these mechanisms, our model accurately
determines when it can output an answer and when it must generate a
supplementary question to request additional input. When available, user
feedback is encoded and directly applied to update the sentence-level attention
used to infer the answer. Extensive experiments on QA and IQA datasets quantitatively demonstrate
    the effectiveness of our model with significant improvement over conventional
    QA models.
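
The word-level attention component can be sketched generically as additive attention conditioned on a context vector; the shapes and parameter names below are illustrative assumptions, not the authors' exact parameterization:

```python
# Generic additive attention over GRU word states, conditioned on a context
# vector (illustrative stand-in for context-dependent word-level attention).
import numpy as np

def additive_attention(H, c, Wh, Wc, v):
    """H: (T, d) word states; c: (d_c,) context vector.
    Wh: (d, k), Wc: (d_c, k), v: (k,). Returns a (d,) weighted summary."""
    scores = np.tanh(H @ Wh + c @ Wc) @ v           # (T,) alignment scores
    scores -= scores.max()                          # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()   # attention weights
    return alpha @ H                                # attention-weighted sum
```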

    Distributed Dictionary Learning

    Amir Daneshmand, Gesualdo Scutari, Francisco Facchinei
    Subjects: Optimization and Control (math.OC); Learning (cs.LG)

    The paper studies distributed Dictionary Learning (DL) problems where the
    learning task is distributed over a multi-agent network with time-varying
    (nonsymmetric) connectivity. This formulation is relevant, for instance, in
    big-data scenarios where massive amounts of data are collected/stored in
    different spatial locations and it is unfeasible to aggregate and/or process
    all the data in a fusion center, due to resource limitations, communication
    overhead or privacy considerations. We develop a general distributed
    algorithmic framework for the (nonconvex) DL problem and establish its
    asymptotic convergence. The new method hinges on Successive Convex
    Approximation (SCA) techniques coupled with i) a gradient tracking mechanism
    instrumental to locally estimate the missing global information; and ii) a
    consensus step, as a mechanism to distribute the computations among the agents.
    To the best of our knowledge, this is the first distributed algorithm with
provable convergence for the DL problem and, more generally, for bi-convex
    optimization problems over (time-varying) directed graphs.
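
The gradient-tracking-plus-consensus pattern the framework builds on can be sketched for a toy smooth problem (min over x of the sum of per-agent losses f_i); the actual algorithm couples this with successive convex approximations of the nonconvex DL objective, and the mixing matrix and step size here are assumptions:

```python
# Gradient tracking with consensus over a network, on a toy smooth problem.
# Illustrative sketch only; not the paper's SCA-based dictionary-learning scheme.
import numpy as np

def gradient_tracking(grads, W, x0, steps=200, lr=0.05):
    """grads: list of per-agent gradient functions; W: doubly stochastic
    mixing matrix consistent with the (possibly time-varying) network."""
    n = len(grads)
    x = np.array([x0.copy() for _ in range(n)])          # local iterates
    y = np.array([g(x[i]) for i, g in enumerate(grads)]) # tracks avg gradient
    for _ in range(steps):
        x_new = W @ x - lr * y                            # consensus + descent
        y = W @ y + np.array([grads[i](x_new[i]) - grads[i](x[i])
                              for i in range(n)])         # gradient tracking
        x = x_new
    return x.mean(axis=0)
```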


    Information Theory

    Sum-networks from undirected graphs: construction and capacity analysis

    Ardhendu Tripathy, Aditya Ramamoorthy
    Comments: This has an extra Appendix section giving more information about Remark 2 of the original paper published in the 52nd Annual Allerton Conference on Communication, Control and Computing, 2014
    Subjects: Information Theory (cs.IT)

    We consider a directed acyclic network with multiple sources and multiple
    terminals where each terminal is interested in decoding the sum of independent
    sources generated at the source nodes. We describe a procedure whereby a simple
    undirected graph can be used to construct such a sum-network and demonstrate an
    upper bound on its computation rate. Furthermore, we show sufficient conditions
    for the construction of a linear network code that achieves this upper bound.
Our procedure allows us to construct sum-networks with an arbitrary
computation rate (p/q) (where (p, q) are non-negative integers). Our work
significantly generalizes a previous approach for constructing sum-networks
with arbitrary capacities. Specifically, we answer an open question posed in
prior work by demonstrating sum-networks with significantly fewer sources and
terminals.

    Compressed Sensing Algorithms for OFDM Channel Estimation

    Jonathan Ling, Dmitry Chizhik, A. Tulino, Inaki Esnaola
    Comments: Patent no: US8929390 Methods And Apparatuses For Channel Estimation In Wireless Networks
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    Radio channels are typically sparse in the delay domain, and ideal for
    compressed sensing. A new compressed sensing algorithm called eX-OMP is
    developed that yields performance similar to that of the optimal MMSE
estimator. The new algorithm relies on a small amount of additional data. Both
eX-OMP and the MMSE estimator adaptively balance channel tracking and noise
reduction. They perform better than simple estimators such as the linear
interpolator, which fixes this trade-off a priori. Some wideband measurements
are examined, and the channels are found to be well represented by a few
delays.
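
For orientation, here is standard OMP applied to sparse delay-domain channel estimation with a residual-energy stopping rule; eX-OMP itself is the paper's extension and is not reproduced here:

```python
# Standard orthogonal matching pursuit (OMP) for a sparse channel, as a
# baseline sketch; matrix shapes and the tolerance are assumptions.
import numpy as np

def omp(A, y, max_delays, tol=1e-6):
    """Recover a sparse channel h from pilot observations y = A h + noise.
    A: (m, n) measurement matrix (e.g., pilot-subcarrier DFT submatrix)."""
    residual, support = y.copy(), []
    h_s = np.zeros(0, dtype=complex)
    for _ in range(max_delays):
        corr = np.abs(A.conj().T @ residual)   # match residual to columns
        support.append(int(np.argmax(corr)))
        As = A[:, support]
        h_s, *_ = np.linalg.lstsq(As, y, rcond=None)  # LS fit on support
        residual = y - As @ h_s
        if np.linalg.norm(residual) ** 2 < tol:       # residual-energy stop
            break
    h = np.zeros(A.shape[1], dtype=complex)
    h[support] = h_s
    return h
```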

    Cache-induced Hierarchical Cooperation in Wireless Device-to-Device Caching Networks

    An Liu, Vincent Lau, Giuseppe Caire
    Comments: 45 pages, 10 figures, submitted to IEEE Transactions on Information Theory
    Subjects: Information Theory (cs.IT)

    We consider a wireless device-to-device (D2D) caching network where n nodes
    are placed on a regular grid of area A(n). Each node caches L_C*F (coded) bits
    from a library of size L*F bits, where L is the number of files and F is the
    size of each file. Each node requests a file from the library independently
    according to a popularity distribution. Under a commonly used “physical model”
    and Zipf popularity distribution, we characterize the optimal per-node capacity
scaling law for extended networks (i.e., networks whose area A(n) scales linearly with the number of nodes n). Moreover, we propose a
    cache-induced hierarchical cooperation scheme and associated cache content
    placement optimization algorithm to achieve the optimal per-node capacity
    scaling law. When the path loss exponent alpha<3, the optimal per-node
    capacity scaling law achieved by the cache-induced hierarchical cooperation can
    be significantly better than that achieved by the existing state-of-the-art
    schemes. To the best of our knowledge, this is the first work that completely
    characterizes the per-node capacity scaling law for wireless caching networks
under the physical model and a Zipf distribution with an arbitrary skewness
parameter tau. While scaling law analysis yields clean results, it may not
    accurately reflect the throughput performance of a large network with a finite
    number of nodes. Therefore, we also analyze the throughput of the proposed
    cache-induced hierarchical cooperation for networks of practical size. The
    analysis and simulations verify that cache-induced hierarchical cooperation can
    also achieve a large throughput gain over the cache-assisted multihop scheme
    for networks of practical size.

    Stopping Condition for Greedy Block Sparse Signal Recovery

    Yu Luo, Ronggui Xie, Huarui Yin, Weidong Wang
    Comments: 5 pages, 5 figures
    Subjects: Information Theory (cs.IT)

For greedy block sparse recovery where the sparsity level is unknown, we
derive a condition for stopping the iterative process. Focusing on the block
orthogonal matching pursuit (BOMP) algorithm, we model the energy of the
residual signal at each iteration from a probabilistic perspective. At the
iteration in which the last supporting block is detected, the energy of the
residual signal is expected to drop markedly. Based on this, we stop the
iterative process when the energy of the residual signal falls below a given
threshold. Compared with other approaches, our derived condition works well
for BOMP recovery. Moreover, we extend our approach to the
interference-cancellation-based BOMP (ICBOMP) recovery of [1]. Simulation
results show that our derived condition saves many unnecessary iterations
while guaranteeing favorable recovery accuracy, for both the BOMP and ICBOMP
recoveries.
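
A compact sketch of BOMP with the residual-energy stopping rule described above; the threshold is assumed to be given (e.g., derived from the noise statistics, as the abstract suggests), and the block selection is the textbook correlation rule:

```python
# Block OMP with a residual-energy stopping condition (illustrative sketch;
# the threshold derivation is the paper's contribution and is assumed here).
import numpy as np

def bomp(A, y, blocks, energy_threshold, max_iters=None):
    """blocks: list of column-index lists partitioning A into blocks."""
    residual, support = y.copy(), []
    x = np.zeros(A.shape[1], dtype=complex)
    for _ in range(max_iters or len(blocks)):
        if np.linalg.norm(residual) ** 2 <= energy_threshold:
            break  # residual energy has dropped: stop iterating
        # pick the block whose columns best match the current residual
        scores = [np.linalg.norm(A[:, b].conj().T @ residual) for b in blocks]
        best = int(np.argmax(scores))
        support = sorted(set(support) | set(blocks[best]))
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ x_s
        x[:] = 0
        x[support] = x_s
    return x
```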

    Delay Optimal Joint Scheduling-and-Power-Control for Cognitive Radio Uplinks

    Ahmed Ewaisha, Cihan Tepedelenlioglu
    Comments: Globecom 2016. arXiv admin note: text overlap with arXiv:1602.08010, arXiv:1601.00608
    Subjects: Information Theory (cs.IT)

    An uplink cognitive radio system with a single primary user (PU) and multiple
    secondary users (SUs) is considered. The SUs have an individual average delay
    constraint and an aggregate average interference constraint to the PU. If the
    interference channels between the SUs and the PU are statistically different
    due to the different physical locations of the SUs, the SUs will experience
    different delay performances. This is because SUs located closer to the PU
transmit with lower power levels. A dynamic scheduling-and-power-allocation
policy based on dynamic programming is proposed. The proposed policy can
provide the required average delay guarantees to all SUs irrespective of their
locations, as well as protect the PU from harmful interference. The policy is
shown to be asymptotically delay optimal in the light-traffic regime.
Motivated by the high complexity of the dynamic programming algorithm in the
optimal policy, we exploit the structure of the problem's solution to present
an alternative suboptimal policy. Through simulations we show that, in both
the light- and heavy-traffic regimes, the delay performance of the suboptimal
policy is within 0.3% of that of the optimal policy, with both outperforming
existing methods.

    Statistical limits of spiked tensor models

    Amelia Perry, Alexander S. Wein, Afonso S. Bandeira
    Comments: 38 pages, 5 figures
    Subjects: Probability (math.PR); Information Theory (cs.IT); Statistics Theory (math.ST); Machine Learning (stat.ML)

    We study the statistical limits of both detecting and estimating a rank-one
    deformation of a symmetric random Gaussian tensor. We establish upper and lower
    bounds on the critical signal-to-noise ratio, under a variety of priors for the
    planted vector: (i) a uniformly sampled unit vector, (ii) i.i.d. (pm 1)
entries, and (iii) a sparse vector where a constant fraction (rho) of entries
are i.i.d. (pm 1) and the rest are zero. For each of these cases, our upper
and lower bounds match up to a (1+o(1)) factor as the order (d) of the tensor
becomes large. For sparse signals (iii), our bounds are also asymptotically
tight in the sparse limit (rho to 0) for any fixed (d) (including the (d=2)
case of sparse PCA). Our upper bounds for (i) demonstrate a phenomenon
reminiscent of the work of Baik, Ben Arous and Péché: an `eigenvalue' of a
    perturbed tensor emerges from the bulk at a strictly lower signal-to-noise
    ratio than when the perturbation itself exceeds the bulk; we quantify the size
    of this effect. We also provide some general results for larger classes of
    priors. In particular, the large (d) asymptotics of the threshold location
    differs between problems with discrete priors versus continuous priors.
    Finally, for priors (i) and (ii) we carry out the replica prediction from
    statistical physics, which is conjectured to give the exact
    information-theoretic threshold for any fixed (d).

    Of independent interest, we introduce a new improvement to the second moment
    method for contiguity, on which our lower bounds are based. Our technique
    conditions away from rare `bad’ events that depend on interactions between the
    signal and noise. This enables us to close (sqrt{2})-factor gaps present in
    several previous works.
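
For concreteness, the rank-one spiked symmetric Gaussian tensor model can be written as follows; the normalization shown is one common convention and may differ from the paper's:

```latex
% One common formulation of the order-d spiked tensor model (illustrative;
% normalization conventions vary across papers).
\[
  T \;=\; \lambda\, v^{\otimes d} + W,
\]
% where v is the planted vector drawn from one of the priors (i)-(iii),
% \lambda \ge 0 is the signal-to-noise ratio, and W is a symmetric tensor
% with Gaussian entries.
```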

    Partial (ell_1) optimization in random linear systems — finite dimensions

    Mihailo Stojnic
    Comments: arXiv admin note: text overlap with arXiv:1612.06344
    Subjects: Optimization and Control (math.OC); Information Theory (cs.IT); Probability (math.PR)

    In this paper we provide a complementary set of results to those we present
    in our companion work cite{Stojnicl1HidParasymldp} regarding the behavior of
    the so-called partial (ell_1) (a variant of the standard (ell_1) heuristic
    often employed for solving under-determined systems of linear equations). As is
    well known through our earlier works
    cite{StojnicICASSP10knownsupp,StojnicTowBettCompSens13}, the partial (ell_1)
    also exhibits the phase-transition (PT) phenomenon, discovered and well
    understood in the context of the standard (ell_1) through Donoho’s and our own
    works cite{DonohoPol,DonohoUnsigned,StojnicCSetam09,StojnicUpper10}.
    cite{Stojnicl1HidParasymldp} goes much further though and, in addition to the
    determination of the partial (ell_1)’s phase-transition curves (PT curves)
    (which had already been done in
    cite{StojnicICASSP10knownsupp,StojnicTowBettCompSens13}), provides a
    substantially deeper understanding of the PT phenomena through a study of the
    underlying large deviations principles (LDPs). As the PT and LDP phenomena are
    by their definitions related to large dimensional settings, both sets of our
    works, cite{StojnicICASSP10knownsupp,StojnicTowBettCompSens13} and
    cite{Stojnicl1HidParasymldp}, consider what is typically called the asymptotic
    regime. In this paper we move things in a different direction and consider
    finite dimensional scenarios. Basically, we provide explicit performance
characterizations for any given collection of system/parameter dimensions. We
    do so for two different variants of the partial (ell_1), one that we call
    exactly the partial (ell_1) and another one, possibly a bit more practical,
    that we call the hidden partial (ell_1).

    Partial (ell_1) optimization in random linear systems — phase transitions and large deviations

    Mihailo Stojnic
    Comments: arXiv admin note: text overlap with arXiv:1612.06361, arXiv:1612.06835
    Subjects: Optimization and Control (math.OC); Information Theory (cs.IT); Probability (math.PR)

(ell_1) optimization is a well-known heuristic often employed for solving
various forms of sparse linear problems. In this paper we look at a variant of
it, which we refer to as the partial (ell_1), and discuss its mathematical
properties when used for solving under-determined linear systems of equations.
    We will focus on large random systems and discuss the phase transition (PT)
    phenomena and how they connect to the large deviation principles (LDP). Using a
    variety of probabilistic and geometric techniques that we have developed in
    recent years we will first present general guidelines that conceptually fully
characterize both the PTs and the LDPs. After that we will put an emphasis on
    providing a collection of explicit analytical solutions to all of the
    underlying mathematical problems. As a nice bonus to the developed concepts,
    the forms of the analytical solutions will, in our view, turn out to be fairly
    elegant as well.



