IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    arXiv Paper Daily: Fri, 14 Oct 2016

    我爱机器学习(52ml.net)发表于 2016-10-14 00:00:00
    love 0

    Neural and Evolutionary Computing

    Tensorial Mixture Models

    Or Sharir, Ronen Tamari, Nadav Cohen, Amnon Shashua
    Comments: 19 pages including figures and not including appendices, 9 figures. A git repository for reproducing our experiments is available at: this https URL
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    We introduce a generative model, we call Tensorial Mixture Models (TMMs)
    based on mixtures of basic component distributions over local structures (e.g.
    patches in an image) where the dependencies between the local-structures are
    represented by a “priors tensor” holding the prior probabilities of assigning a
    component distribution to each local-structure.

    In their general form, TMMs are intractable as the prior tensor is typically
    of exponential size. However, when the priors tensor is decomposed it gives
    rise to an arithmetic circuit which in turn transforms the TMM into a
    Convolutional Arithmetic Circuit (ConvAC). A ConvAC corresponds to a shallow
    (single hidden layer) network when the priors tensor is decomposed by a CP (sum
    of rank-1) approach and corresponds to a deep network when the decomposition
    follows the Hierarchical Tucker (HT) model.

    The ConvAC representation of a TMM possesses several attractive properties.
    First, the inference is tractable and is implemented by a forward pass through
    a deep network. Second, the architectural design of the model follows the deep
    networks community design, i.e., the structure of TMMs is determined by just
    two easily understood factors: size of pooling windows and number of channels.
    Finally, we demonstrate the effectiveness of our model when tackling the
    problem of classification with missing data, leveraging TMMs unique ability of
    tractable marginalization which leads to optimal classifiers regardless of the
    missingness distribution.

    Why Deep Neural Networks?

    Shiyu Liang, R. Srikant
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Recently there has been much interest in understanding why deep neural
    networks are preferred to shallow networks. In this paper, we show that, for a
    large class of piecewise smooth functions, the number of neurons needed by a
    shallow network to approximate a function is exponentially larger than the
    corresponding number of neurons needed by a deep network for a given degree of
    function approximation. First, we consider univariate functions on a bounded
    interval and require a neural network to achieve an approximation error of
    $varepsilon$ uniformly over the interval. We show that shallow networks (i.e.,
    networks whose depth does not depend on $varepsilon$) require
    $Omega( ext{poly}(1/varepsilon))$ neurons while deep networks (i.e.,
    networks whose depth grows with $1/varepsilon$) require
    $mathcal{O}( ext{polylog}(1/varepsilon))$ neurons. We then extend these
    results to certain classes of important multivariate functions. Our results are
    derived for neural networks which use a combination of rectifier linear units
    (ReLUs) and binary step units, two of the most popular type of activation
    functions. Our analysis builds on this simple observation that the binary
    approximation of a real number in the interval $[0,1]$ can be represented by a
    deep neural network which uses a “small” number of neurons.

    Exploiting Sentence and Context Representations in Deep Neural Models for Spoken Language Understanding

    Lina M. Rojas Barahona, Milica Gasic, Nikola Mrkšić, Pei-Hao Su, Stefan Ultes, Tsung-Hsien Wen, Steve Young
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    This paper presents a deep learning architecture for the semantic decoder
    component of a Statistical Spoken Dialogue System. In a slot-filling dialogue,
    the semantic decoder predicts the dialogue act and a set of slot-value pairs
    from a set of n-best hypotheses returned by the Automatic Speech Recognition.
    Most current models for spoken language understanding assume (i) word-aligned
    semantic annotations as in sequence taggers and (ii) delexicalisation, or a
    mapping of input words to domain-specific concepts using heuristics that try to
    capture morphological variation but that do not scale to other domains nor to
    language variation (e.g., morphology, synonyms, paraphrasing ). In this work
    the semantic decoder is trained using unaligned semantic annotations and it
    uses distributed semantic representation learning to overcome the limitations
    of explicit delexicalisation. The proposed architecture uses a convolutional
    neural network for the sentence representation and a long-short term memory
    network for the context representation. Results are presented for the publicly
    available DSTC2 corpus and an In-car corpus which is similar to DSTC2 but has a
    significantly higher word error rate (WER).

    A Survey of Voice Translation Methodologies – Acoustic Dialect Decoder

    Hans Krupakar, Keerthika Rajvel, Bharathi B, Angel Deborah S, Vallidevi Krishnamurthy
    Comments: (8 pages, 7 figures, IEEE Digital Xplore paper)
    Journal-ref: 2016 International Conference on Information Communication and
    Embedded Systems (ICICES), Chennai, 2016, pp. 1-9
    Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Machine Learning (stat.ML)

    Speech Translation has always been about giving source text or audio input
    and waiting for system to give translated output in desired form. In this
    paper, we present the Acoustic Dialect Decoder (ADD) – a voice to voice
    ear-piece translation device. We introduce and survey the recent advances made
    in the field of Speech Engineering, to employ in the ADD, particularly focusing
    on the three major processing steps of Recognition, Translation and Synthesis.
    We tackle the problem of machine understanding of natural language by designing
    a recognition unit for source audio to text, a translation unit for source
    language text to target language text, and a synthesis unit for target language
    text to target language speech. Speech from the surroundings will be recorded
    by the recognition unit present on the ear-piece and translation will start as
    soon as one sentence is successfully read. This way, we hope to give translated
    output as and when input is being read. The recognition unit will use Hidden
    Markov Models (HMMs) Based Tool-Kit (HTK), hybrid RNN systems with gated memory
    cells, and the synthesis unit, HMM based speech synthesis system HTS. This
    system will initially be built as an English to Tamil translation device.


    Computer Vision and Pattern Recognition

    GPU-accelerated real-time stixel computation

    Daniel Hernandez-Juarez, Antonio Espinosa, David Vázquez, Antonio Manuel López, Juan Carlos Moure
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The Stixel World is a medium-level, compact representation of road scenes
    that abstracts millions of disparity pixels into hundreds or thousands of
    stixels. The goal of this work is to implement and evaluate a complete
    multi-stixel estimation pipeline on an embedded, energy-efficient,
    GPU-accelerated device. This work presents a full GPU-accelerated
    implementation of stixel estimation that produces reliable results at 26 frames
    per second (real-time) on the Tegra X1 for disparity images of 1024×440 pixels
    and stixel widths of 5 pixels, and achieves more than 400 frames per second on
    a high-end Titan X GPU card.

    Embedded real-time stereo estimation via Semi-Global Matching on the GPU

    Daniel Hernandez-Juarez, Alejandro Chacón, Antonio Espinosa, David Vázquez, Juan Carlos Moure, Antonio Manuel López
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Dense, robust and real-time computation of depth information from
    stereo-camera systems is a computationally demanding requirement for robotics,
    advanced driver assistance systems (ADAS) and autonomous vehicles. Semi-Global
    Matching (SGM) is a widely used algorithm that propagates consistency
    constraints along several paths across the image. This work presents a
    real-time system producing reliable disparity estimation results on the new
    embedded energy-efficient GPU devices. Our design runs on a Tegra X1 at 42
    frames per second (fps) for an image size of 640×480, 128 disparity levels, and
    using 4 path directions for the SGM method.

    Automatic View-Point Selection for Inter-Operative Endoscopic Surveillance

    Anant S. Vemuri, Stephane A. Nicolau, Jacques Marescaux, Luc Soler, Nicholas Ayache
    Comments: Medical Content-based Retrieval for Clinical Decision Support and Treatment Planning, MICCAI Conference
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Esophageal adenocarcinoma arises from Barrett’s esophagus, which is the most
    serious complication of gastroesophageal reflux disease. Strategies for
    screening involve periodic surveillance and tissue biopsies. A major challenge
    in such regular examinations is to record and track the disease evolution and
    re-localization of biopsied sites to provide targeted treatments. In this
    paper, we extend our original inter-operative relocalization framework to
    provide a constrained image based search for obtaining the best view-point
    match to the live view. Within this context we investigate the effect of: the
    choice of feature descriptors and color-space; filtering of uninformative
    frames and endoscopic modality, for view-point localization. Our experiments
    indicate an improvement in the best view-point retrieval rate to [92%,87%] from
    [73%,76%] (in our previous approach) for NBI and WL.

    Towards end-to-end optimisation of functional image analysis pipelines

    Albert Vilamala, Kristoffer Hougaard Madsen, Lars Kai Hansen
    Comments: 7 pages, 2 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)

    The study of neurocognitive tasks requiring accurate localisation of activity
    often rely on functional Magnetic Resonance Imaging, a widely adopted technique
    that makes use of a pipeline of data processing modules, each involving a
    variety of parameters. These parameters are frequently set according to the
    local goal of each specific module, not accounting for the rest of the
    pipeline. Given recent success of neural network research in many different
    domains, we propose to convert the whole data pipeline into a deep neural
    network, where the parameters involved are jointly optimised by the network to
    best serve a common global goal. As a proof of concept, we develop a module
    able to adaptively apply the most suitable spatial smoothing to every brain
    volume for each specific neuroimaging task, and we validate its results in a
    standard brain decoding experiment.

    Video Fill in the Blank with Merging LSTMs

    Amir Mazaheri, Dong Zhang, Mubarak Shah
    Comments: for Large Scale Movie Description and Understanding Challenge (LSMDC) 2016, “Movie fill-in-the-blank” Challenge, UCF_CRCV
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Given a video and its incomplete textural description with missing words, the
    Video-Fill-in-the-Blank (ViFitB) task is to automatically find the missing
    word. The contextual information of the sentences are important to infer the
    missing words; the visual cues are even more crucial to get a more accurate
    inference. In this paper, we presents a new method which intuitively takes
    advantage of the structure of the sentences and employs merging LSTMs (to merge
    two LSTMs) to tackle the problem with embedded textural and visual cues. In the
    experiments, we have demonstrated the superior performance of the proposed
    method on the challenging “Movie Fill-in-the-Blank” dataset.

    Stroke Sequence-Dependent Deep Convolutional Neural Network for Online Handwritten Chinese Character Recognition

    Baotian Hu, Xin Liu, Xiangping Wu, Qingcai Chen
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we propose a novel model, named Stroke Sequence-dependent Deep
    Convolutional Neural Network (SSDCNN), using the stroke sequence information
    and eight-directional features for Online Handwritten Chinese Character
    Recognition (OLHCCR). On one hand, SSDCNN can learn the representation of
    Online Handwritten Chinese Character (OLHCC) by incorporating the natural
    sequence information of the strokes. On the other hand, SSDCNN can incorporate
    eight-directional features in a natural way. In order to train SSDCNN, we
    divide the process of training into two stages: 1) The training data is used to
    pre-train the whole architecture until the performance tends to converge. 2)
    Fully-connected neural network which is used to combine the stroke
    sequence-dependent representation with eight-directional features and softmax
    layer are further trained. Experiments were conducted on the OLHCCR competition
    tasks of ICDAR 2013. Results show that, SSDCNN can reduce the recognition error
    by 50\% (5.13\% vs 2.56\%) compared to the model which only use
    eight-directional features. The proposed SSDCNN achieves 97.44\% accuracy which
    reduces the recognition error by about 1.9\% compared with the best submitted
    system on ICDAR2013 competition. These results indicate that SSDCNN can exploit
    the stroke sequence information to learn high-quality representation of OLHCC.
    It also shows that the learnt representation and the classical
    eight-directional features complement each other within the SSDCNN
    architecture.

    Predicting the dynamics of 2d objects with a deep residual network

    François Fleuret
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We investigate how a residual network can learn to predict the dynamics of
    interacting shapes purely as an image-to-image regression problem.

    With a simple 2d physics simulator, we generate short sequences composed of
    rectangles put in motion by applying a pulling force at a point picked at
    random. The network is trained with a quadratic loss to predict the image of
    the resulting configuration, given the image of the starting configuration and
    an image indicating the point of grasping.

    Experiments show that the network learns to predict accurately the resulting
    image, which implies in particular that (1) it segments rectangles as distinct
    components, (2) it infers which one contains the grasping point, (3) it models
    properly the dynamic of a single rectangle, including the torque, (4) it
    detects and handles collisions to some extent, and (5) it re-synthesizes
    properly the entire scene with displaced rectangles.

    Semi-Coupled Two-Stream Fusion ConvNets for Action Recognition at Extremely Low Resolutions

    Jiawei Chen, Jonathan Wu, Janusz Konrad, Prakash Ishwar
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep convolutional neural networks (ConvNets) have been recently shown to
    attain state-of-the-art performance for action recognition on
    standard-resolution videos. However, less attention has been paid to
    recognition performance at extremely low resolutions (eLR) (e.g., 16 x 12
    pixels). Reliable action recognition using eLR cameras would address privacy
    concerns in various application environments such as private homes, hospitals,
    nursing/rehabilitation facilities, etc. In this paper, we propose a
    semi-coupled filter-sharing network that leverages high resolution (HR) videos
    during training in order to assist an eLR ConvNet. We also study methods for
    fusing spatial and temporal ConvNets customized for eLR videos in order to take
    advantage of appearance and motion information. Our method outperforms
    state-of-the-art methods at extremely low resolutions on IXMAS (93.7%) and HMDB
    (29.2%) datasets.

    Theory and computer simulation of the moiré patterns in single-layer cylindrical particles

    Vladimir Saveljev, Irina Palchikova
    Comments: 8 pages, 14 figures, 45 equations; first written on July 3, 2016, last modified on October 12, 2016
    Subjects: Mesoscale and Nanoscale Physics (cond-mat.mes-hall); Computer Vision and Pattern Recognition (cs.CV)

    Basing on the theory for arbitrary oriented surfaces, we developed the theory
    of the moir’e effect for cylindrical single-layer objects in the paraxial
    approximation. With using the dual grids, the moir’e effect in the plane
    gratings is simulated, as well as the near-axis moir’e effect in cylinders
    including the chiral layouts. The results can be applied to the graphene
    layers, to single-walled nanotubes, and to cylinders in general.


    Artificial Intelligence

    An Information Theoretic Feature Selection Framework for Big Data under Apache Spark

    Sergio Ramírez-Gallego
    Subjects: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

    With the advent of extremely high dimensional datasets, dimensionality
    reduction techniques are becoming mandatory. Among many techniques, feature
    selection has been growing in interest as an important tool to identify
    relevant features on huge datasets –both in number of instances and
    features–. The purpose of this work is to demonstrate that standard feature
    selection methods can be parallelized in Big Data platforms like Apache Spark,
    boosting both performance and accuracy. We thus propose a distributed
    implementation of a generic feature selection framework which includes a wide
    group of well-known Information Theoretic methods. Experimental results on a
    wide set of real-world datasets show that our distributed framework is capable
    of dealing with ultra-high dimensional datasets as well as those with a huge
    number of samples in a short period of time, outperforming the sequential
    version in all the cases studied.

    Exploiting Sentence and Context Representations in Deep Neural Models for Spoken Language Understanding

    Lina M. Rojas Barahona, Milica Gasic, Nikola Mrkšić, Pei-Hao Su, Stefan Ultes, Tsung-Hsien Wen, Steve Young
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    This paper presents a deep learning architecture for the semantic decoder
    component of a Statistical Spoken Dialogue System. In a slot-filling dialogue,
    the semantic decoder predicts the dialogue act and a set of slot-value pairs
    from a set of n-best hypotheses returned by the Automatic Speech Recognition.
    Most current models for spoken language understanding assume (i) word-aligned
    semantic annotations as in sequence taggers and (ii) delexicalisation, or a
    mapping of input words to domain-specific concepts using heuristics that try to
    capture morphological variation but that do not scale to other domains nor to
    language variation (e.g., morphology, synonyms, paraphrasing ). In this work
    the semantic decoder is trained using unaligned semantic annotations and it
    uses distributed semantic representation learning to overcome the limitations
    of explicit delexicalisation. The proposed architecture uses a convolutional
    neural network for the sentence representation and a long-short term memory
    network for the context representation. Results are presented for the publicly
    available DSTC2 corpus and an In-car corpus which is similar to DSTC2 but has a
    significantly higher word error rate (WER).

    Improved Knowledge Base Completion by Path-Augmented TransR Model

    Wenhao Huang, Ge Li, Zhi Jin
    Subjects: Artificial Intelligence (cs.AI)

    Knowledge base completion aims to infer new relations from existing
    information. In this paper, we propose path-augmented TransR (PTransR) model to
    improve the accuracy of link prediction. In our approach, we base PTransR model
    on TransR, which is the best one-hop model at present. Then we regularize
    TransR with information of relation paths. In our experiment, we evaluate
    PTransR on the task of entity prediction. Experimental results show that
    PTransR outperforms previous models.

    A fuzzy expert system for earthquake prediction, case study: the Zagros range

    Arash Andalib, Mehdi Zare, Farid Atry
    Comments: 4 pages, 4 figures in proceedings of the third International Conference on Modeling, Simulation and Applied Optimization, 2009
    Subjects: Artificial Intelligence (cs.AI)

    A methodology for the development of a fuzzy expert system (FES) with
    application to earthquake prediction is presented. The idea is to reproduce the
    performance of a human expert in earthquake prediction. To do this, at the
    first step, rules provided by the human expert are used to generate a fuzzy
    rule base. These rules are then fed into an inference engine to produce a fuzzy
    inference system (FIS) and to infer the results. In this paper, we have used a
    Sugeno type fuzzy inference system to build the FES. At the next step, the
    adaptive network-based fuzzy inference system (ANFIS) is used to refine the FES
    parameters and improve its performance. The proposed framework is then employed
    to attain the performance of a human expert used to predict earthquakes in the
    Zagros area based on the idea of coupled earthquakes. While the prediction
    results are promising in parts of the testing set, the general performance
    indicates that prediction methodology based on coupled earthquakes needs more
    investigation and more complicated reasoning procedure to yield satisfactory
    predictions.

    Stream Reasoning-Based Control of Caching Strategies in CCN Routers

    Harald Beck, Bruno Bierbaumer, Minh Dao-Tran, Thomas Eiter, Hermann Hellwagner, Konstantin Schekotihin
    Comments: 21 pages, 8 figures
    Subjects: Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)

    Content-Centric Networking (CCN) research addresses the mismatch between the
    modern usage of the Internet and its outdated architecture. Importantly, CCN
    routers may locally cache frequently requested content in order to speed up
    delivery to end users. Thus, the issue of caching strategies arises, i.e.,
    which content shall be stored and when it should be replaced. In this work, we
    employ novel techniques towards intelligent administration of CCN routers that
    autonomously switch between existing strategies in response to changing content
    request patterns. In particular, we present a router architecture for CCN
    networks that is controlled by rule-based stream reasoning, following the
    recent formal framework LARS which extends Answer Set Programming for streams.
    The obtained possibility for flexible router configuration at runtime allows
    for faster experimentation and may thus help to advance the further development
    of CCN. Moreover, the empirical evaluation of our feasibility study shows that
    the resulting caching agent may give significant performance gains.

    A Fuzzy Logic System to Analyze a Student's Lifestyle

    Sourish Ghosh, Aaditya Sanjay Boob, Nishant Nikhil, Nayan Raju Vysyaraju, Ankit Kumar
    Comments: Submitting in ICACI 2017
    Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

    A college student’s life can be primarily categorized into domains such as
    education, health, social and other activities which may include daily chores
    and travelling time. Time management is crucial for every student. A self
    realisation of one’s daily time expenditure in various domains is therefore
    essential to maximize one’s effective output. This paper presents how a mobile
    application using Fuzzy Logic and Global Positioning System (GPS) analyzes a
    student’s lifestyle and provides recommendations and suggestions based on the
    results.

    Reset-free Trial-and-Error Learning for Data-Efficient Robot Damage Recovery

    Konstantinos Chatzilygeroudis, Vassilis Vassiliades, Jean-Baptiste Mouret
    Comments: 8 pages, 6 figures, 3 pseudocodes/algorithms
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

    The high probability of hardware failures prevents many advanced robots (e.g.
    legged robots) to be confidently deployed in real-world situations (e.g
    post-disaster rescue). Instead of attempting to diagnose the failure(s), robots
    could adapt by trial-and-error in order to be able to complete their tasks.
    However, the best trial-and-error algorithms for robotics are all episodic:
    between each trial, the robot needs to be put back in the same state, that is,
    the robot is not learning autonomously. In this paper, we introduce a novel
    learning algorithm called “Reset-free Trial-and-Error” (RTE) that allows robots
    to recover from damage while completing their tasks. We evaluate it on a
    hexapod robot that is damaged in several ways (e.g. a missing leg, a shortened
    leg, etc.) and whose objective is to reach a sequence of targets in an arena.
    Our experiments show that the robot can recover most of its locomotion
    abilities in a few minutes, in an environment with obstacles, and without any
    human intervention. Overall, this new algorithm makes it possible to
    contemplate sending robots to places that are truly too dangerous for humans
    and in which robots cannot be rescued.

    Truthful Mechanisms for Matching and Clustering in an Ordinal World

    Elliot Anshelevich, Shreyas Sekar
    Comments: To appear in the Proceedings of WINE 2016. The introduction of this paper has minor overlap with arXiv:1512.05504 but the results are mutually exclusive
    Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS)

    We study truthful mechanisms for matching and related problems in a partial
    information setting, where the agents’ true utilities are hidden, and the
    algorithm only has access to ordinal preference information. Our model is
    motivated by the fact that in many settings, agents cannot express the
    numerical values of their utility for different outcomes, but are still able to
    rank the outcomes in their order of preference. Specifically, we study problems
    where the ground truth exists in the form of a weighted graph of agent
    utilities, but the algorithm can only elicit the agents’ private information in
    the form of a preference ordering for each agent induced by the underlying
    weights. Against this backdrop, we design truthful algorithms to approximate
    the true optimum solution with respect to the hidden weights. Our techniques
    yield universally truthful algorithms for a number of graph problems: a
    1.76-approximation algorithm for Max-Weight Matching, 2-approximation algorithm
    for Max k-matching, a 6-approximation algorithm for Densest k-subgraph, and a
    2-approximation algorithm for Max Traveling Salesman as long as the hidden
    weights constitute a metric. We also provide improved approximation algorithms
    for such problems when the agents are not able to lie about their preferences.
    Our results are the first non-trivial truthful approximation algorithms for
    these problems, and indicate that in many situations, we can design robust
    algorithms even when the agents may lie and only provide ordinal information
    instead of precise utilities.

    Bank Card Usage Prediction Exploiting Geolocation Information

    Martin Wistuba, Nghia Duong-Trung, Nicolas Schilling, Lars Schmidt-Thieme
    Comments: Describes the winning solution for the ECML-PKDD 2016 Discovery Challenge on Bank Card Usage Analysis. Final results on the private leaderboard are available here: this https URL
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    We describe the solution of team ISMLL for the ECML-PKDD 2016 Discovery
    Challenge on Bank Card Usage for both tasks. Our solution is based on three
    pillars. Gradient boosted decision trees as a strong regression and
    classification model, an intensive search for good hyperparameter
    configurations and strong features that exploit geolocation information. This
    approach achieved the best performance on the public leaderboard for the first
    task and a decent fourth position for the second task.


    Information Retrieval

    Unorganized Malicious Attacks Detection

    Ming Pang, Wei Gao, Min Tao, Zhi-Hua Zhou
    Subjects: Information Retrieval (cs.IR); Learning (cs.LG)

    Recommender system has attracted much attention during the past decade, and
    many attack detection algorithms have been developed for better recommendation.
    Most previous approaches focus on the shilling attacks, where the attack
    organizer fakes a large number of user profiles by the same strategy to promote
    or demote an item. In this paper, we study a different attack style:
    unorganized malicious attacks, where attackers respectively use a small number
    of user profiles to attack their own target items without any organizer. This
    attack style occurs in many real applications, yet relevant study remains open.
    In this paper, we formulate the unorganized malicious attacks detection as a
    variant of matrix completion problem, and prove that attackers can be detected
    theoretically. We propose the Unorganized Malicious Attacks detection (UMA)
    algorithm, which can be viewed as a proximal alternating splitting augmented
    Lagrangian method. We verify, both theoretically and empirically, the
    effectiveness of our proposed algorithm.

    Emergency Identification and Analysis with EAIMS

    Richard McCreadie, Craig Macdonald, Iadh Ounis
    Subjects: Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

    Social media platforms are now a key source of information for a large
    segment of the public. As such, these platforms have a great potential as a
    means to provide real-time information to emergency management agencies.
    Moreover, during an emergency, these agencies are very interested in social
    media as a means to find public-driven response efforts, as well as to track
    how their handling of that emergency is being perceived. However, there is
    currently a lack advanced tools designed for monitoring social media during
    emergencies. The Emergency Analysis Identification and Management System
    (EAIMS) is a prototype service that aims to fill this technology gap by
    providing richer analytic and exploration tools than current solutions. In
    particular, EAIMS provides real-time detection of emergency events, related
    information finding, information access and credibility analysis tools for use
    over social media during emergencies.


    Computation and Language

    Gated End-to-End Memory Networks

    Julien Perez, Fei Liu
    Comments: 9 pages, 3 figures, 3 tables
    Subjects: Computation and Language (cs.CL)

    Machine reading using differentiable reasoning models has recently shown
    remarkable progress. In this context, End-to-End trainable Memory Networks,
    MemN2N, have demonstrated promising performance on simple natural language
    based reasoning tasks such as factual reasoning and basic deduction. However,
    other tasks, namely multi-fact question-answering, positional reasoning or
    dialog related tasks, remain challenging particularly due to the necessity of
    more complex interactions between the memory and controller modules composing
    this family of models. In this paper, we introduce a novel end-to-end memory
    access regulation mechanism inspired by the current progress on the connection
    short-cutting principle in the field of computer vision. Concretely, we develop
    a Gated End-to-End trainable Memory Network architecture, GMemN2N. From the
    machine learning perspective, this new capability is learned in an end-to-end
    fashion without the use of any additional supervision signal which is, as far
    as our knowledge goes, the first of its kind. Our experiments show significant
    improvements on the most challenging tasks in the 20 bAbI dataset, without the
    use of any domain knowledge. Then, we show improvements on the dialog bAbI
    tasks including the real human-bot conversion-based Dialog State Tracking
    Challenge (DSTC-2) dataset. On these two datasets, our model sets the new state
    of the art.

    Dialogue Session Segmentation by Embedding-Enhanced TextTiling

    Yiping Song, Lili Mou, Rui Yan, Li Yi, Zinan Zhu, Xiaohua Hu, Ming Zhang
    Comments: INTERSPEECH-16, pages 2706–2710
    Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)

    In human-computer conversation systems, the context of a user-issued
    utterance is particularly important because it provides useful background
    information of the conversation. However, it is unwise to track all previous
    utterances in the current session as not all of them are equally important. In
    this paper, we address the problem of session segmentation. We propose an
    embedding-enhanced TextTiling approach, inspired by the observation that
    conversation utterances are highly noisy, and that word embeddings provide a
    robust way of capturing semantics. Experimental results show that our approach
    achieves better performance than the TextTiling, MMD approaches.

    Compressing Neural Language Models by Sparse Word Representations

    Yunchuan Chen, Lili Mou, Yan Xu, Ge Li, Zhi Jin
    Comments: ACL-16, pages 226–235
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Neural networks are among the state-of-the-art techniques for language
    modeling. Existing neural language models typically map discrete words to
    distributed, dense vector representations. After information processing of the
    preceding context words by hidden layers, an output layer estimates the
    probability of the next word. Such approaches are time- and memory-intensive
    because of the large numbers of parameters for word embeddings and the output
    layer. In this paper, we propose to compress neural language models by sparse
    word representations. In the experiments, the number of parameters in our model
    increases very slowly with the growth of the vocabulary size, which is almost
    imperceptible. Moreover, our approach not only reduces the parameter space to a
    large extent, but also improves the performance in terms of the perplexity
    measure.

    A Neural Network for Coordination Boundary Prediction

    Jessica Ficler, Yoav Goldberg
    Comments: EMNLP 2016
    Subjects: Computation and Language (cs.CL)

    We propose a neural-network based model for coordination boundary prediction.
    The network is designed to incorporate two signals: the similarity between
    conjuncts and the observation that replacing the whole coordination phrase with
    a conjunct tends to produce a coherent sentences. The modeling makes use of
    several LSTM networks. The model is trained solely on conjunction annotations
    in a Treebank, without using external resources. We show improvements on
    predicting coordination boundaries on the PTB compared to two state-of-the-art
    parsers; as well as improvement over previous coordination boundary prediction
    systems on the Genia corpus.

    A Survey of Voice Translation Methodologies – Acoustic Dialect Decoder

    Hans Krupakar, Keerthika Rajvel, Bharathi B, Angel Deborah S, Vallidevi Krishnamurthy
    Comments: (8 pages, 7 figures, IEEE Digital Xplore paper)
    Journal-ref: 2016 International Conference on Information Communication and
    Embedded Systems (ICICES), Chennai, 2016, pp. 1-9
    Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Machine Learning (stat.ML)

    Speech Translation has always been about giving source text or audio input
    and waiting for system to give translated output in desired form. In this
    paper, we present the Acoustic Dialect Decoder (ADD) – a voice to voice
    ear-piece translation device. We introduce and survey the recent advances made
    in the field of Speech Engineering, to employ in the ADD, particularly focusing
    on the three major processing steps of Recognition, Translation and Synthesis.
    We tackle the problem of machine understanding of natural language by designing
    a recognition unit for source audio to text, a translation unit for source
    language text to target language text, and a synthesis unit for target language
    text to target language speech. Speech from the surroundings will be recorded
    by the recognition unit present on the ear-piece and translation will start as
    soon as one sentence is successfully read. This way, we hope to give translated
    output as and when input is being read. The recognition unit will use Hidden
    Markov Models (HMMs) Based Tool-Kit (HTK), hybrid RNN systems with gated memory
    cells, and the synthesis unit, HMM based speech synthesis system HTS. This
    system will initially be built as an English to Tamil translation device.

    Exploiting Sentence and Context Representations in Deep Neural Models for Spoken Language Understanding

    Lina M. Rojas Barahona, Milica Gasic, Nikola Mrkšić, Pei-Hao Su, Stefan Ultes, Tsung-Hsien Wen, Steve Young
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    This paper presents a deep learning architecture for the semantic decoder
    component of a Statistical Spoken Dialogue System. In a slot-filling dialogue,
    the semantic decoder predicts the dialogue act and a set of slot-value pairs
    from a set of n-best hypotheses returned by the Automatic Speech Recognition.
    Most current models for spoken language understanding assume (i) word-aligned
    semantic annotations as in sequence taggers and (ii) delexicalisation, or a
    mapping of input words to domain-specific concepts using heuristics that try to
    capture morphological variation but that do not scale to other domains nor to
    language variation (e.g., morphology, synonyms, paraphrasing ). In this work
    the semantic decoder is trained using unaligned semantic annotations and it
    uses distributed semantic representation learning to overcome the limitations
    of explicit delexicalisation. The proposed architecture uses a convolutional
    neural network for the sentence representation and a long-short term memory
    network for the context representation. Results are presented for the publicly
    available DSTC2 corpus and an In-car corpus which is similar to DSTC2 but has a
    significantly higher word error rate (WER).

    Mapping Between Natural Movie fMRI Responses and Word-Sequence Representations

    Kiran Vodrahalli, Po-Hsuan Chen, Yingyu Liang, Janice Chen, Esther Yong, Christopher Honey, Peter Ramadge, Ken Norman, Sanjeev Arora
    Comments: 7 pages, 3 figures, accepted to NIPS 2016 Workshop on Representation Learning in Artificial and Biological Neural Networks
    Subjects: Neurons and Cognition (q-bio.NC); Computation and Language (cs.CL); Learning (cs.LG)

    This work provides support for the notion that distributional methods of
    representing word meaning from computational linguistics are useful for
    capturing neural correlates of real life multi-sensory stimuli, where the
    stimuli —in this case, a movie being watched by the human subjects— have
    been given text annotations. We present an approach to combining sequences of
    word vectors into a single vector. We also identify a semantically-relevant
    low-dimensional shared representation of fMRI response in an unsupervised
    fashion by using views of multiple subjects watching the same natural movie
    stimulus. Learned orthogonal linear maps between the fMRI and semantic
    representations allow us to successfully transform fMRI data generated by a
    natural movie stimulus into semantic vectors representing textual descriptions
    of the movie. We succeed at a scene classification task with 76% accuracy, over
    a 20% chance rate. When we selected five brain regions-of-interest (ROIs) and
    learned distinct maps from these ROIs to the text representations, the Default
    Mode Network (DMN) supported the highest level of decoding performance.


    Distributed, Parallel, and Cluster Computing

    Super-fast MST Algorithms in the Congested Clique using $o(m)$ Messages

    Sriram V. Pemmaraju, Vivek B. Sardeshmukh
    Comments: Full version of FST-TCS 2016 paper with the same title
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    In a sequence of recent results (PODC 2015 and PODC 2016), the running time
    of the fastest algorithm for the emph{minimum spanning tree (MST)} problem in
    the emph{Congested Clique} model was first improved to $O(log log log n)$
    from $O(log log n)$ (Hegeman et al., PODC 2015) and then to $O(log^* n)$
    (Ghaffari and Parter, PODC 2016). All of these algorithms use $Theta(n^2)$
    messages independent of the number of edges in the input graph.

    This paper positively answers a question raised in Hegeman et al., and
    presents the first “super-fast” MST algorithm with $o(m)$ message complexity
    for input graphs with $m$ edges. Specifically, we present an algorithm running
    in $O(log^* n)$ rounds, with message complexity $ ilde{O}(sqrt{m cdot n})$
    and then build on this algorithm to derive a family of algorithms, containing
    for any $varepsilon$, $0 < varepsilon le 1$, an algorithm running in
    $O(log^* n/varepsilon)$ rounds, using $ ilde{O}(n^{1 +
    varepsilon}/varepsilon)$ messages. Setting $varepsilon = loglog n/log n$
    leads to the first sub-logarithmic round Congested Clique MST algorithm that
    uses only $ ilde{O}(n)$ messages.

    Our primary tools in achieving these results are (i) a component-wise bound
    on the number of candidates for MST edges, extending the sampling lemma of
    Karger, Klein, and Tarjan (Karger, Klein, and Tarjan, JACM 1995) and (ii)
    $Theta(log n)$-wise-independent linear graph sketches (Cormode and Firmani,
    Dist.~Par.~Databases, 2014) for generating MST candidate edges.

    An Information Theoretic Feature Selection Framework for Big Data under Apache Spark

    Sergio Ramírez-Gallego
    Subjects: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

    With the advent of extremely high dimensional datasets, dimensionality
    reduction techniques are becoming mandatory. Among many techniques, feature
    selection has been growing in interest as an important tool to identify
    relevant features on huge datasets –both in number of instances and
    features–. The purpose of this work is to demonstrate that standard feature
    selection methods can be parallelized in Big Data platforms like Apache Spark,
    boosting both performance and accuracy. We thus propose a distributed
    implementation of a generic feature selection framework which includes a wide
    group of well-known Information Theoretic methods. Experimental results on a
    wide set of real-world datasets show that our distributed framework is capable
    of dealing with ultra-high dimensional datasets as well as those with a huge
    number of samples in a short period of time, outperforming the sequential
    version in all the cases studied.

    Optimizing Communication and Computation for Multi-UAV Information Gathering Applications

    Mason Thammawichai, Sujit P. Baliyarasimhuni, Eric C. Kerrigan, João B. Sousa
    Subjects: Systems and Control (cs.SY); Distributed, Parallel, and Cluster Computing (cs.DC); Robotics (cs.RO); Optimization and Control (math.OC)

    Mobile agent networks, such as multi-UAV systems, are constrained by limited
    resources. In particular, limited energy affects system performance directly,
    such as system lifetime. It has been demonstrated in the wireless sensor
    network literature that the communication energy consumption dominates the
    computational and the sensing energy consumption. Hence, the lifetime of the
    multi-UAV systems can be extended significantly by optimizing the amount of
    communication data, at the expense of increasing computational cost. In this
    work, we aim at attaining an optimal trade-off between the communication and
    the computational energy. Specifically, we propose a mixed-integer optimization
    formulation for a multi-hop hierarchical clustering-based self-organizing UAV
    network incorporating data aggregation, to obtain an energy-efficient
    information routing scheme. The proposed framework is tested on two
    applications, namely target tracking and area mapping. Based on simulation
    results, our method can significantly save energy compared to a baseline
    strategy, where there is no data aggregation and clustering scheme.


    Learning

    Tensorial Mixture Models

    Or Sharir, Ronen Tamari, Nadav Cohen, Amnon Shashua
    Comments: 19 pages including figures and not including appendices, 9 figures. A git repository for reproducing our experiments is available at: this https URL
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    We introduce a generative model, we call Tensorial Mixture Models (TMMs)
    based on mixtures of basic component distributions over local structures (e.g.
    patches in an image) where the dependencies between the local-structures are
    represented by a “priors tensor” holding the prior probabilities of assigning a
    component distribution to each local-structure.

    In their general form, TMMs are intractable as the prior tensor is typically
    of exponential size. However, when the priors tensor is decomposed it gives
    rise to an arithmetic circuit which in turn transforms the TMM into a
    Convolutional Arithmetic Circuit (ConvAC). A ConvAC corresponds to a shallow
    (single hidden layer) network when the priors tensor is decomposed by a CP (sum
    of rank-1) approach and corresponds to a deep network when the decomposition
    follows the Hierarchical Tucker (HT) model.

    The ConvAC representation of a TMM possesses several attractive properties.
    First, the inference is tractable and is implemented by a forward pass through
    a deep network. Second, the architectural design of the model follows the deep
    networks community design, i.e., the structure of TMMs is determined by just
    two easily understood factors: size of pooling windows and number of channels.
    Finally, we demonstrate the effectiveness of our model when tackling the
    problem of classification with missing data, leveraging TMMs unique ability of
    tractable marginalization which leads to optimal classifiers regardless of the
    missingness distribution.

    Why Deep Neural Networks?

    Shiyu Liang, R. Srikant
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Recently there has been much interest in understanding why deep neural
    networks are preferred to shallow networks. In this paper, we show that, for a
    large class of piecewise smooth functions, the number of neurons needed by a
    shallow network to approximate a function is exponentially larger than the
    corresponding number of neurons needed by a deep network for a given degree of
    function approximation. First, we consider univariate functions on a bounded
    interval and require a neural network to achieve an approximation error of
    $varepsilon$ uniformly over the interval. We show that shallow networks (i.e.,
    networks whose depth does not depend on $varepsilon$) require
    $Omega( ext{poly}(1/varepsilon))$ neurons while deep networks (i.e.,
    networks whose depth grows with $1/varepsilon$) require
    $mathcal{O}( ext{polylog}(1/varepsilon))$ neurons. We then extend these
    results to certain classes of important multivariate functions. Our results are
    derived for neural networks which use a combination of rectifier linear units
    (ReLUs) and binary step units, two of the most popular type of activation
    functions. Our analysis builds on this simple observation that the binary
    approximation of a real number in the interval $[0,1]$ can be represented by a
    deep neural network which uses a “small” number of neurons.

    Bank Card Usage Prediction Exploiting Geolocation Information

    Martin Wistuba, Nghia Duong-Trung, Nicolas Schilling, Lars Schmidt-Thieme
    Comments: Describes the winning solution for the ECML-PKDD 2016 Discovery Challenge on Bank Card Usage Analysis. Final results on the private leaderboard are available here: this https URL
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    We describe the solution of team ISMLL for the ECML-PKDD 2016 Discovery
    Challenge on Bank Card Usage for both tasks. Our solution is based on three
    pillars. Gradient boosted decision trees as a strong regression and
    classification model, an intensive search for good hyperparameter
    configurations and strong features that exploit geolocation information. This
    approach achieved the best performance on the public leaderboard for the first
    task and a decent fourth position for the second task.

    Phase Retrieval Meets Statistical Learning Theory: A Flexible Convex Relaxation

    Sohail Bahmani, Justin Romberg
    Subjects: Information Theory (cs.IT); Learning (cs.LG); Functional Analysis (math.FA); Optimization and Control (math.OC); Machine Learning (stat.ML)

    We propose a flexible convex relaxation for the phase retrieval problem that
    operates in the natural domain of the signal. Therefore, we avoid the
    prohibitive computational cost associated with “lifting” and semidefinite
    programming (SDP) in methods such as PhaseLift and compete with recently
    developed non-convex techniques for phase retrieval. We relax the quadratic
    equations for phaseless measurements to inequality constraints each of which
    representing a symmetric “slab”. Through a simple convex program, our proposed
    estimator finds an extreme point of the intersection of these slabs that is
    best aligned with a given anchor vector. We characterize geometric conditions
    that certify success of the proposed estimator. Furthermore, using classic
    results in statistical learning theory, we show that for random measurements
    the geometric certificates hold with high probability at an optimal sample
    complexity. Phase transition of our estimator is evaluated through simulations.
    Our numerical experiments also suggest that the proposed method can solve phase
    retrieval problems with coded diffraction measurements as well.

    An Information Theoretic Feature Selection Framework for Big Data under Apache Spark

    Sergio Ramírez-Gallego
    Subjects: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

    With the advent of extremely high dimensional datasets, dimensionality
    reduction techniques are becoming mandatory. Among many techniques, feature
    selection has been growing in interest as an important tool to identify
    relevant features on huge datasets –both in number of instances and
    features–. The purpose of this work is to demonstrate that standard feature
    selection methods can be parallelized in Big Data platforms like Apache Spark,
    boosting both performance and accuracy. We thus propose a distributed
    implementation of a generic feature selection framework which includes a wide
    group of well-known Information Theoretic methods. Experimental results on a
    wide set of real-world datasets show that our distributed framework is capable
    of dealing with ultra-high dimensional datasets as well as those with a huge
    number of samples in a short period of time, outperforming the sequential
    version in all the cases studied.

    Unorganized Malicious Attacks Detection

    Ming Pang, Wei Gao, Min Tao, Zhi-Hua Zhou
    Subjects: Information Retrieval (cs.IR); Learning (cs.LG)

    Recommender system has attracted much attention during the past decade, and
    many attack detection algorithms have been developed for better recommendation.
    Most previous approaches focus on the shilling attacks, where the attack
    organizer fakes a large number of user profiles by the same strategy to promote
    or demote an item. In this paper, we study a different attack style:
    unorganized malicious attacks, where attackers respectively use a small number
    of user profiles to attack their own target items without any organizer. This
    attack style occurs in many real applications, yet relevant study remains open.
    In this paper, we formulate the unorganized malicious attacks detection as a
    variant of matrix completion problem, and prove that attackers can be detected
    theoretically. We propose the Unorganized Malicious Attacks detection (UMA)
    algorithm, which can be viewed as a proximal alternating splitting augmented
    Lagrangian method. We verify, both theoretically and empirically, the
    effectiveness of our proposed algorithm.

    Generalized Online Transfer Learning for Climate Control in Residential Buildings

    Thomas Grubinger, Georgios Chasparis, Thomas Natschlaeger
    Subjects: Systems and Control (cs.SY); Learning (cs.LG)

    This paper presents an online transfer learning framework for improving
    temperature predictions in residential buildings. In transfer learning,
    prediction models trained under a set of available data from a target domain
    (e.g., house with limited data) can be improved through the use of data
    generated from similar source domains (e.g., houses with rich data). Given also
    the need for prediction models that can be trained online (e.g., as part of a
    model-predictive-control implementation), this paper introduces the generalized
    online transfer learning algorithm (GOTL). It employs a weighted combination of
    the available predictors (i.e., the target and source predictors) and
    guarantees convergence to the best weighted predictor. Furthermore, the use of
    Transfer Component Analysis (TCA) allows for using more than a single source
    domains, since it may facilitate the fit of a single model on more than one
    source domains (houses). This allows GOTL to transfer knowledge from more than
    one source domains. We further validate our results through experiments in
    climate control for residential buildings and show that GOTL may lead to
    non-negligible energy savings for given comfort levels.

    Predicting the dynamics of 2d objects with a deep residual network

    François Fleuret
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We investigate how a residual network can learn to predict the dynamics of
    interacting shapes purely as an image-to-image regression problem.

    With a simple 2d physics simulator, we generate short sequences composed of
    rectangles put in motion by applying a pulling force at a point picked at
    random. The network is trained with a quadratic loss to predict the image of
    the resulting configuration, given the image of the starting configuration and
    an image indicating the point of grasping.

    Experiments show that the network learns to predict accurately the resulting
    image, which implies in particular that (1) it segments rectangles as distinct
    components, (2) it infers which one contains the grasping point, (3) it models
    properly the dynamic of a single rectangle, including the torque, (4) it
    detects and handles collisions to some extent, and (5) it re-synthesizes
    properly the entire scene with displaced rectangles.

    Voice Conversion from Non-parallel Corpora Using Variational Auto-encoder

    Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Sound (cs.SD)

    We propose a flexible framework for spectral conversion (SC) that facilitates
    training with unaligned corpora. Many SC frameworks require parallel corpora,
    phonetic alignments, or explicit frame-wise correspondence for learning
    conversion functions or for synthesizing a target spectrum with the aid of
    alignments. However, these requirements gravely limit the scope of practical
    applications of SC due to scarcity or even unavailability of parallel corpora.
    We propose an SC framework based on variational auto-encoder which enables us
    to exploit non-parallel corpora. The framework comprises an encoder that learns
    speaker-independent phonetic representations and a decoder that learns to
    reconstruct the designated speaker. It removes the requirement of parallel
    corpora or phonetic alignments to train a spectral conversion system. We report
    objective and subjective evaluations to validate our proposed method and
    compare it to SC methods that have access to aligned corpora.

    Semi-Supervised Active Learning for Support Vector Machines: A Novel Approach that Exploits Structure Information in Data

    Tobias Reitmaier, Adrian Calma, Bernhard Sick
    Comments: 35 pages
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    In our today’s information society more and more data emerges, e.g.~in social
    networks, technical applications, or business applications. Companies try to
    commercialize these data using data mining or machine learning methods. For
    this purpose, the data are categorized or classified, but often at high
    (monetary or temporal) costs. An effective approach to reduce these costs is to
    apply any kind of active learning (AL) methods, as AL controls the training
    process of a classifier by specific querying individual data points (samples),
    which are then labeled (e.g., provided with class memberships) by a domain
    expert. However, an analysis of current AL research shows that AL still has
    some shortcomings. In particular, the structure information given by the
    spatial pattern of the (un)labeled data in the input space of a classification
    model (e.g.,~cluster information), is used in an insufficient way. In addition,
    many existing AL techniques pay too little attention to their practical
    applicability. To meet these challenges, this article presents several
    techniques that together build a new approach for combining AL and
    semi-supervised learning (SSL) for support vector machines (SVM) in
    classification tasks. Structure information is captured by means of
    probabilistic models that are iteratively improved at runtime when label
    information becomes available. The probabilistic models are considered in a
    selection strategy based on distance, density, diversity, and distribution (4DS
    strategy) information for AL and in a kernel function (Responsibility Weighted
    Mahalanobis kernel) for SVM. The approach fuses generative and discriminative
    modeling techniques. With 20 benchmark data sets and with the MNIST data set it
    is shown that our new solution yields significantly better results than
    state-of-the-art methods.

    Dictionary Update for NMF-based Voice Conversion Using an Encoder-Decoder Network

    Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Sound (cs.SD)

    In this paper, we propose a dictionary update method for Nonnegative Matrix
    Factorization (NMF) with high dimensional data in a spectral conversion (SC)
    task. Voice conversion has been widely studied due to its potential
    applications such as personalized speech synthesis and speech enhancement.
    Exemplar-based NMF (ENMF) emerges as an effective and probably the simplest
    choice among all techniques for SC, as long as a source-target parallel speech
    corpus is given. ENMF-based SC systems usually need a large amount of bases
    (exemplars) to ensure the quality of the converted speech. However, a small and
    effective dictionary is desirable but hard to obtain via dictionary update, in
    particular when high-dimensional features such as STRAIGHT spectra are used.
    Therefore, we propose a dictionary update framework for NMF by means of an
    encoder-decoder reformulation. Regarding NMF as an encoder-decoder network
    makes it possible to exploit the whole parallel corpus more effectively and
    efficiently when applied to SC. Our experiments demonstrate significant gains
    of the proposed system with small dictionaries over conventional ENMF-based
    systems with dictionaries of same or much larger size.

    Compressing Neural Language Models by Sparse Word Representations

    Yunchuan Chen, Lili Mou, Yan Xu, Ge Li, Zhi Jin
    Comments: ACL-16, pages 226–235
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Neural networks are among the state-of-the-art techniques for language
    modeling. Existing neural language models typically map discrete words to
    distributed, dense vector representations. After information processing of the
    preceding context words by hidden layers, an output layer estimates the
    probability of the next word. Such approaches are time- and memory-intensive
    because of the large numbers of parameters for word embeddings and the output
    layer. In this paper, we propose to compress neural language models by sparse
    word representations. In the experiments, the number of parameters in our model
    increases very slowly with the growth of the vocabulary size, which is almost
    imperceptible. Moreover, our approach not only reduces the parameter space to a
    large extent, but also improves the performance in terms of the perplexity
    measure.

    Mapping Between Natural Movie fMRI Responses and Word-Sequence Representations

    Kiran Vodrahalli, Po-Hsuan Chen, Yingyu Liang, Janice Chen, Esther Yong, Christopher Honey, Peter Ramadge, Ken Norman, Sanjeev Arora
    Comments: 7 pages, 3 figures, accepted to NIPS 2016 Workshop on Representation Learning in Artificial and Biological Neural Networks
    Subjects: Neurons and Cognition (q-bio.NC); Computation and Language (cs.CL); Learning (cs.LG)

    This work provides support for the notion that distributional methods of
    representing word meaning from computational linguistics are useful for
    capturing neural correlates of real life multi-sensory stimuli, where the
    stimuli —in this case, a movie being watched by the human subjects— have
    been given text annotations. We present an approach to combining sequences of
    word vectors into a single vector. We also identify a semantically-relevant
    low-dimensional shared representation of fMRI response in an unsupervised
    fashion by using views of multiple subjects watching the same natural movie
    stimulus. Learned orthogonal linear maps between the fMRI and semantic
    representations allow us to successfully transform fMRI data generated by a
    natural movie stimulus into semantic vectors representing textual descriptions
    of the movie. We succeed at a scene classification task with 76% accuracy, over
    a 20% chance rate. When we selected five brain regions-of-interest (ROIs) and
    learned distinct maps from these ROIs to the text representations, the Default
    Mode Network (DMN) supported the highest level of decoding performance.

    Generalization bound for kernel similarity learning

    Michael Rabadi
    Comments: 9 pages
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Similarity learning has received a large amount of interest and is an
    important tool for many scientific and industrial applications. In this
    framework, we wish to infer the distance (similarity) between points with
    respect to an arbitrary distance function $d$. Here, we formulate the problem
    as a regression from a feature space $mathcal{X}$ to an arbitrary vector space
    $mathcal{Y}$, where the Euclidean distance is proportional to $d$. We then
    give Rademacher complexity bounds on the generalization error. We find that
    with high probability, the complexity is bounded by the maximum of the radius
    of $mathcal{X}$ and the radius of $mathcal{Y}$.


    Information Theory

    Super-Resolution Delay-Doppler Estimation for OFDM Passive Radar

    Le Zheng, Xiaodong Wang
    Subjects: Information Theory (cs.IT)

    In this paper, we consider the problem of joint delay-Doppler estimation of
    moving targets in a passive radar that makes use of orthogonal
    frequency-division multiplexing (OFDM) communication signals. A compressed
    sensing algorithm is proposed to achieve supper-resolution and better accuracy,
    using both the atomic norm and the $ell_1$-norm. The atomic norm is used to
    manifest the signal sparsity in the continuous domain. Unlike previous works
    which assume the demodulation to be error free, we explicitly introduce the
    demodulation error signal whose sparsity is imposed by the $ell_1$-norm. On
    this basis, the delays and Doppler frequencies are estimated by solving a
    semidefinite program (SDP) which is convex. We also develop an iterative method
    for solving this SDP via the alternating direction method of multipliers (ADMM)
    where each iteration involves closed-form computation. Simulation results are
    presented to illustrate the high performance of the proposed algorithm.

    Partial Strong Converse for the Non-Degraded Wiretap Channel

    Yi-Peng Wei, Sennur Ulukus
    Comments: Presented at the Allerton Conference, September 2016
    Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)

    We prove the partial strong converse property for the discrete memoryless
    non-degraded wiretap channel, for which we require the leakage to the
    eavesdropper to vanish but allow an asymptotic error probability $epsilon in
    [0,1)$. We use a recently developed technique based on information spectrum and
    a recursive bounding method to evaluate the probability of correct decoding. We
    show that when the transmission rate is above the secrecy capacity, the
    probability of correct decoding decays to zero exponentially. Therefore, the
    maximum transmission rate is the same for $epsilon in [0,1)$, and the partial
    strong converse property holds.

    Phase Retrieval Meets Statistical Learning Theory: A Flexible Convex Relaxation

    Sohail Bahmani, Justin Romberg
    Subjects: Information Theory (cs.IT); Learning (cs.LG); Functional Analysis (math.FA); Optimization and Control (math.OC); Machine Learning (stat.ML)

    We propose a flexible convex relaxation for the phase retrieval problem that
    operates in the natural domain of the signal. Therefore, we avoid the
    prohibitive computational cost associated with “lifting” and semidefinite
    programming (SDP) in methods such as PhaseLift and compete with recently
    developed non-convex techniques for phase retrieval. We relax the quadratic
    equations for phaseless measurements to inequality constraints each of which
    representing a symmetric “slab”. Through a simple convex program, our proposed
    estimator finds an extreme point of the intersection of these slabs that is
    best aligned with a given anchor vector. We characterize geometric conditions
    that certify success of the proposed estimator. Furthermore, using classic
    results in statistical learning theory, we show that for random measurements
    the geometric certificates hold with high probability at an optimal sample
    complexity. Phase transition of our estimator is evaluated through simulations.
    Our numerical experiments also suggest that the proposed method can solve phase
    retrieval problems with coded diffraction measurements as well.

    Multi-Layer Precoding: A Potential Solution for Full-Dimensional Massive MIMO Systems

    Ahmed Alkhateeb, Geert Leus, Robert W. Heath Jr
    Comments: submitted to IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    Massive multiple-input multiple-output (MIMO) systems achieve high sum
    spectral efficiency by offering an order of magnitude increase in multiplexing
    gains. In time division duplexing systems, however, the reuse of uplink
    training pilots among cells results in additional channel estimation error,
    which causes downlink inter-cell interference, even when large numbers of
    antennas are employed. Handling this interference with conventional network
    MIMO techniques is challenging due to the large channel dimensionality.
    Further, the implementation of large antenna precoding/combining matrices is
    associated with high hardware complexity and power consumption. In this paper,
    we propose multi-layer precoding to enable efficient and low complexity
    operation in full-dimensional massive MIMO, where a large number of antennas is
    used in two dimensions. In multi-layer precoding, the precoding matrix of each
    base station is written as a product of a number of precoding matrices, each
    one called a layer. Multi-layer precoding (i) leverages the directional
    characteristics of large-scale MIMO channels to manage inter-cell interference
    with low channel knowledge requirements, and (ii) allows for an efficient
    implementation using low-complexity hybrid analog/digital architectures. We
    present a specific multi-layer precoding design for full-dimensional massive
    MIMO systems. The performance of this precoding design is analyzed and the
    per-user achievable rate is characterized for general channel models. The
    asymptotic optimality of the proposed multi-layer precoding design is then
    proved for some special yet important channels. Numerical simulations verify
    the analytical results and illustrate the potential gains of multi-layer
    precoding compared to traditional pilot-contaminated massive MIMO solutions.

    Monotonicity of Entropy and Fisher Information: A Quick Proof via Maximal Correlation

    Thomas A. Courtade
    Comments: To appear, Communications on Information and Systems
    Subjects: Information Theory (cs.IT); Probability (math.PR)

    A simple proof is given for the monotonicity of entropy and Fisher
    information associated to sums of i.i.d. random variables. The proof relies on
    a characterization of maximal correlation for partial sums due to Dembo, Kagan
    and Shepp.

    Distributed Hybrid Scheduling in Multi-Cloud Networks using Conflict Graphs

    Ahmed Douik, Hayssam Dahrouj, Tareq Y. Al-Naffouri, Mohamed-Slim Alouini
    Subjects: Information Theory (cs.IT)

    Recent studies on cloud-radio access networks assume either signal-level or
    scheduling-level coordination. This paper considers a hybrid coordinated scheme
    as a means to benefit from both policies. Consider the downlink of a
    multi-cloud radio access network, where each cloud is connected to several
    base-stations (BSs) via high capacity links, and, therefore, allows for joint
    signal processing within the cloud transmission. Across the multiple clouds,
    however, only scheduling-level coordination is permitted, as low levels of
    backhaul communication are feasible. The frame structure of every BS is
    composed of various time/frequency blocks, called power-zones (PZs), which are
    maintained at a fixed power level. The paper addresses the problem of
    maximizing a network-wide utility by associating users to clouds and scheduling
    them to the PZs, under the practical constraints that each user is scheduled to
    a single cloud at most, but possibly to many BSs within the cloud, and can be
    served by one or more distinct PZs within the BSs’ frame. The paper solves the
    problem using graph theory techniques by constructing the conflict graph. The
    considered scheduling problem is, then, shown to be equivalent to a
    maximum-weight independent set problem in the constructed graph, which can be
    solved using efficient techniques. The paper then proposes solving the problem
    using both optimal and heuristic algorithms that can be implemented in a
    distributed fashion across the network. The proposed distributed algorithms
    rely on the well-chosen structure of the constructed conflict graph utilized to
    solve the maximum-weight independent set problem. Simulation results suggest
    that the proposed optimal and heuristic hybrid scheduling strategies provide
    appreciable gain as compared to the scheduling-level coordinated networks, with
    a negligible degradation to signal-level coordination.

    Estimation of linear operators from scattered impulse responses

    Jérémie Bigot (IMB), Paul Escande (DISC, ITAV), Pierre Weiss (IMT, ITAV)
    Subjects: Information Theory (cs.IT); Numerical Analysis (math.NA); Statistics Theory (math.ST)

    We provide a new estimator of integral operators with smooth kernels,
    obtained from a set of scattered and noisy impulse responses. The proposed
    approach relies on the formalism of smoothing in reproducing kernel Hilbert
    spaces and on the choice of an appropriate regularization term that takes the
    smoothness of the operator into account. It is numerically tractable in very
    large dimensions. We study the estimator’s robustness to noise and analyze its
    approximation properties with respect to the size and the geometry of the
    dataset. In addition, we show minimax optimality of the proposed estimator.

    Compressive Cyclostationary Spectrum Sensing with a Constant False Alarm Rate

    Andreas Bollig, Martijn Arts, Anastasia Lavrenko, Rudolf Mathar
    Comments: 19 pages, 5 figures, submitted to EURASIP Journal on Wireless Communications and Networking
    Subjects: Information Theory (cs.IT)

    Spectrum sensing is a crucial component of opportunistic spectrum access
    schemes, which aim at improving spectrum utilization by allowing for the reuse
    of idle licensed spectrum. Sensing a spectral band before using it makes sure
    the legitimate users are not disturbed. Since information about these users’
    signals is not necessarily available, the sensor should be able to conduct
    so-called blind spectrum sensing. Historically, this has not been a feature of
    cyclostationarity-based algorithms. Indeed, in many application scenarios the
    information required for traditional cyclostationarity detection might not be
    available, hindering its practical applicability. In this work we propose two
    new cyclostationary spectrum sensing algorithms that make use of the inherent
    sparsity of the cyclic autocorrelation to make blind operation possible. Along
    with utilizing sparse recovery methods for estimating the cyclic
    autocorrelation, we take further advantage of its structure by introducing
    joint sparsity as well as general structure dictionaries into the recovery
    process. Furthermore, we extend a statistical test for cyclostationarity to
    accommodate sparse cyclic spectra. Our numerical results demonstrate that the
    new methods achieve a near constant false alarm rate behavior in contrast to
    earlier approaches from the literature.

    Fourier-Motzkin Elimination Software for Information Theoretic Inequalities

    Ido B. Gattegno, Ziv Goldfeld, Haim H. Permuter
    Subjects: Information Theory (cs.IT)

    We provide open-source software implemented in MATLAB, that performs
    Fourier-Motzkin elimination (FME) and removes constraints that are redundant
    due to Shannon-type inequalities (STIs). The FME is often used in information
    theoretic contexts to simplify rate regions, e.g., by eliminating auxiliary
    rates. Occasionally, however, the procedure becomes cumbersome, which makes an
    error-free hand-written derivation an elusive task. Some computer software have
    circumvented this difficulty by exploiting an automated FME process. However,
    the outputs of such software often include constraints that are inactive due to
    information theoretic properties. By incorporating the notion of STIs (a class
    of information inequalities provable via a computer program), our algorithm
    removes such redundant constraints based on non-negativity properties,
    chain-rules and probability mass function factorization. This newsletter first
    illustrates the program’s abilities, and then reviews the contribution of STIs
    to the identification of redundant constraints.

    SNR-Walls in Eigenvalue-based Spectrum Sensing

    Andreas Bollig, Constantin Disch, Martijn Arts, Rudolf Mathar
    Comments: 17 pages, 3 figures, submitted to EURASIP Journal on Wireless Communications and Networking
    Subjects: Information Theory (cs.IT)

    Various spectrum sensing approaches have been shown to suffer from a
    so-called SNR-wall, an SNR value below which a detector cannot perform robustly
    no matter how many observations are used. Up to now, the eigenvalue-based
    maximum-minimum-eigenvalue (MME) detector has been a notable exception. For
    instance, the model uncertainty of imperfect knowledge of the receiver noise
    power, which is known to be responsible for the energy detector’s fundamental
    limits, does not adversely affect the MME detector’s performance. While
    additive white Gaussian noise (AWGN) is a standard assumption in wireless
    communications, it is not a reasonable one for the MME detector. In fact, in
    this work we prove that uncertainty in the amount of noise coloring does lead
    to an SNR-wall for the MME detector. We derive a lower bound on this SNR-wall
    and evaluate it for example scenarios. The findings are supported by numerical
    simulations.

    Quantum information inequalities via tracial positive linear maps

    A. Dadkhah, M. S. Moslehian
    Comments: 18 pages, to appear in J. Math. Anal. Appl. (JMAA)
    Subjects: Functional Analysis (math.FA); Information Theory (cs.IT); Mathematical Physics (math-ph); Operator Algebras (math.OA)

    We present some generalizations of quantum information inequalities involving
    tracial positive linear maps between $C^*$-algebras. Among several results, we
    establish a noncommutative Heisenberg uncertainty relation. More precisely, we
    show that if $Phi: mathcal{A} o mathcal{B}$ is a tracial positive linear
    map between $C^*$-algebras , $
    ho in mathcal{A}$ is a $Phi$-density element
    and $A,B$ are self-adjoint operators of $mathcal{A}$ such that $ {
    m
    sp}(mbox{-i}
    ho^frac{1}{2}[A,B]
    ho^frac{1}{2}) subseteq [m,M] $ for some
    scalers $0<m<M$, then under some conditions egin{eqnarray}label{inemain1}
    V_{
    ho,Phi}(A)sharp V_{
    ho,Phi}(B)geq
    frac{1}{2sqrt{K_{m,M}(
    ho[A,B])}} left|Phi(
    ho [A,B])
    ight|,
    end{eqnarray} where $K_{m,M}(
    ho[A,B])$ is the Kantorovich constant of the
    operator $mbox{-i}
    ho^frac{1}{2}[A,B]
    ho^frac{1}{2}$ and
    $V_{
    ho,Phi}(X)$ is the generalized variance of $X$.\ In addition, we use
    some arguments differing from the scalar theory to present some inequalities
    related to the generalized correlation and the generalized
    Wigner–Yanase–Dyson skew information.

    Single Controller Stochastic Games for Optimized Moving Target Defense

    AbdelRahman Eldosouky, Walid Saad, Dusit Niyato
    Comments: Accepted in IEEE ICC 2016
    Subjects: Computer Science and Game Theory (cs.GT); Information Theory (cs.IT)

    Moving target defense (MTD) techniques that enable a system to randomize its
    configuration to thwart prospective attacks are an effective security solution
    for tomorrow’s wireless networks. However, there is a lack of analytical
    techniques that enable one to quantify the benefits and tradeoffs of MTDs. In
    this paper, a novel approach for implementing MTD techniques that can be used
    to randomize cryptographic techniques and keys in wireless networks is
    proposed. In particular, the problem is formulated as a stochastic game in
    which a base station (BS), acting as a defender seeks to strategically change
    its cryptographic techniques and keys in an effort to deter an attacker that is
    trying to eavesdrop on the data. The game is shown to exhibit a
    single-controller property in which only one player, the defender, controls the
    state of the game. For this game, the existence and properties of the Nash
    equilibrium are studied, in the presence of a defense cost for using MTD. Then,
    a practical algorithm for deriving the equilibrium MTD strategies is derived.
    Simulation results show that the proposed game-theoretic MTD framework can
    significantly improve the overall utility of the defender, while enabling
    effective randomization over cryptographic techniques.

    A new method for obtaining Fibonacci identities and efficient computation of any-order linear recurrences

    Dmitry I. Khomovsky
    Comments: 4 algorithms
    Subjects: Number Theory (math.NT); Information Theory (cs.IT); Numerical Analysis (math.NA); Rings and Algebras (math.RA)

    For the Lucas sequence ${U_{k}(P,Q)}$ we discuss the identities like the
    well-known Fibonacci identities. For example, the generalizations of
    $F_{2k}=F_{k+1}^2-F_{k-1}^2$ and $F_{2k+1}=F_{k+1}^2+F_{k}^2$ are $P
    U_{2k}=U_{k+1}^2- Q^2 U_{k-1}^2$ and $U_{2k+1}=U_{k+1}^2-Q U_{k}^2$,
    respectively. We propose a new simple method for obtaining identities involving
    any recurrences and use it to obtain new identities involving the Fibonacci
    numbers such as
    $F_{k+3}^5-F_{k-3}^5+60F_{k}^5=8(F_{k+2}^5+F_{k-2}^5)+40(F_{k+1}^5-F_{k-1}^5)$.
    We also present an efficient algorithm for computing any-order linear
    recurrences.




沪ICP备19023445号-2号
友情链接