
    arXiv Paper Daily: Fri, 4 Nov 2016

    Published by 我爱机器学习 (52ml.net) on 2016-11-04 00:00:00

    Neural and Evolutionary Computing

    Recurrent Neural Networks for Spatiotemporal Dynamics of Intrinsic Networks from fMRI Data

    R Devon Hjelm, Sergey M. Plis, Vince Calhoun
    Comments: Accepted to “Brain and Bits” workshop for NIPS 2016
    Subjects: Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)

    Functional magnetic resonance imaging (fMRI) of temporally-coherent blood
    oxygenization level-dependent (BOLD) signal provides an effective means of
    analyzing functionally coherent patterns in the brain. Intrinsic networks
    (INs) and functional connectivity are important outcomes of fMRI studies and
    are central to understanding brain function and making diagnoses. The most
    popular method for separating INs, independent component analysis (ICA),
    begins with the assumption that the data is a mixture of maximally
    independent sources. ICA is trainable
    through one of many relatively simple optimization routines that maximize
    non-Gaussianity or minimize mutual information. Although fMRI data is a time
    series, ICA, as with other popular linear methods for separating INs, is
    order-agnostic in time: each multivariate signal at each time step is treated
    as i.i.d. In its common use in the field, ICA employs the same parameterization
    across subjects, which allows for either temporal or spatial variability, but
    not both. To overcome the shortcomings of temporal ICA, namely its lack of
    dynamics and of subject-wise/temporal variability in spatial maps, without
    abandoning the fundamental strengths of ICA, we combine recurrent neural
    networks (RNNs)
    with an ICA objective. The resulting model naturally represents temporal and
    spatial dynamics—having subject-wise and temporally variable spatial
    maps—and is easily trainable using gradient descent and back-propagation.
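
    To make the time-order-agnostic behavior of the ICA baseline concrete, here
    is a minimal sketch using scikit-learn's FastICA on synthetic mixtures; the
    synthetic sources, the mixing matrix, and the use of FastICA as a stand-in
    for the field's ICA tooling are all assumptions for illustration.

    ```python
    # A minimal sketch, assuming synthetic "BOLD-like" mixtures and sklearn's
    # FastICA as a stand-in for the field's ICA tooling.
    import numpy as np
    from sklearn.decomposition import FastICA

    rng = np.random.default_rng(0)
    t = np.linspace(0, 100, 2000)
    sources = np.c_[np.sin(0.3 * t), np.sign(np.sin(0.11 * t))]  # latent signals
    A = rng.normal(size=(20, 2))                 # spatial mixing (20 "voxels")
    X = sources @ A.T + 0.05 * rng.normal(size=(2000, 20))

    ica = FastICA(n_components=2, random_state=0)
    recovered = ica.fit_transform(X)             # each time step treated as i.i.d.
    # Shuffling the rows of X recovers the same components (up to order and
    # sign), which is exactly the time-order insensitivity criticized above.
    ```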

    Finding Local Minima for Nonconvex Optimization in Linear Time

    Naman Agarwal, Zeyuan Allen-Zhu, Brian Bullins, Elad Hazan, Tengyu Ma
    Subjects: Optimization and Control (math.OC); Data Structures and Algorithms (cs.DS); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    We design a non-convex second-order optimization algorithm that is guaranteed
    to return an approximate local minimum in time which is linear in the input
    representation. The previously fastest methods run in time proportional to
    matrix inversion or worse.

    The time complexity of our algorithm to find a local minimum is even faster
    than that of gradient descent to find a critical point (which can be a saddle
    point). Our algorithm applies to a very general class of optimization problems
    including training a neural network and many other non-convex objectives
    arising in machine learning.
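
    For intuition about the critical-point/local-minimum distinction above, the
    following toy numpy sketch shows plain gradient descent stalling at a saddle
    while a small random perturbation escapes it; this illustrates the problem
    setting only, not the paper's second-order algorithm.

    ```python
    # A toy sketch, not the paper's algorithm: plain gradient descent on
    # f(x, y) = x^2 - y^2 stalls at the saddle (0, 0); a tiny perturbation
    # finds the descent direction along y.
    import numpy as np

    def grad(w):                        # gradient of f(x, y) = x^2 - y^2
        return np.array([2 * w[0], -2 * w[1]])

    w_plain = np.array([1e-3, 0.0])     # starts on the saddle's stable manifold
    w_noisy = w_plain.copy()
    rng = np.random.default_rng(0)

    for _ in range(1000):
        w_plain = w_plain - 0.01 * grad(w_plain)
        w_noisy = w_noisy - 0.01 * grad(w_noisy) + 1e-4 * rng.normal(size=2)

    print(w_plain)  # pinned near the origin: a critical point, not a minimum
    print(w_noisy)  # |y| has grown: the perturbation escaped the saddle
    ```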

    Learning to Pivot with Adversarial Networks

    Gilles Louppe, Michael Kagan, Kyle Cranmer
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Data Analysis, Statistics and Probability (physics.data-an); Methodology (stat.ME)

    Many inference problems involve data generation processes that are not
    uniquely specified or are uncertain in some way. In a scientific context, the
    presence of several plausible data generation processes is often associated
    with the presence of systematic uncertainties. Robust inference is possible
    if it is based on a pivot: a quantity whose distribution is invariant to the
    unknown value of the (categorical or continuous) nuisance parameters that
    parametrize
    this family of generation processes. In this work, we introduce a flexible
    training procedure based on adversarial networks for enforcing the pivotal
    property on a predictive model. We derive theoretical results showing that the
    proposed algorithm tends towards a minimax solution corresponding to a
    predictive model that is both optimal and independent of the nuisance
    parameters (if such a model exists), or for which one can tune the trade-off
    between power and robustness. Finally, we demonstrate the effectiveness of this
    approach with a toy example and an example from particle physics.
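
    A minimal PyTorch sketch of the adversarial setup described above: a
    classifier is trained jointly with an adversary that tries to recover the
    nuisance parameter from the classifier's output. The toy data generator,
    network sizes, and trade-off weight `lam` are illustrative assumptions, not
    the paper's configuration.

    ```python
    # A minimal sketch, assuming toy data with a scalar nuisance z, tiny MLPs,
    # and an illustrative trade-off weight lam.
    import torch
    import torch.nn as nn

    clf = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
    adv = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
    opt_c = torch.optim.Adam(clf.parameters(), lr=1e-3)
    opt_a = torch.optim.Adam(adv.parameters(), lr=1e-3)
    bce, mse, lam = nn.BCEWithLogitsLoss(), nn.MSELoss(), 1.0

    for step in range(1000):
        y = torch.randint(0, 2, (64, 1)).float()   # class labels
        z = torch.randn(64, 1)                     # nuisance parameter
        # Features: class separation plus a nuisance-dependent shift.
        x = torch.cat([y + 0.5 * z + 0.3 * torch.randn(64, 1),
                       torch.randn(64, 1)], dim=1)
        f = clf(x)

        # Adversary tries to recover z from the classifier's output.
        loss_a = mse(adv(torch.sigmoid(f).detach()), z)
        opt_a.zero_grad(); loss_a.backward(); opt_a.step()

        # Classifier: be accurate AND make the adversary fail (pivotal output).
        loss_c = bce(f, y) - lam * mse(adv(torch.sigmoid(f)), z)
        opt_c.zero_grad(); loss_c.backward(); opt_c.step()
    ```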

    Maximizing Investment Value of Small-Scale PV in a Smart Grid Environment

    Jeremy Every, Li Li, Youguang G. Guo, David G. Dorrell
    Comments: To appear in the proceedings of the 5th International Conference for Renewable Energy Research and Applications (ICRERA2016), Birmingham, United Kingdom. 6 pages. 3 figures
    Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    Determining the optimal size and orientation of small-scale residential
    PV arrays will become increasingly complex in the future smart grid environment
    with the introduction of smart meters and dynamic tariffs. However, consumers
    can leverage the availability of smart meter data to conduct a more detailed
    exploration of PV investment options for their particular circumstances. In
    this paper, an optimization method for PV orientation and sizing is proposed
    whereby maximizing the PV investment value is set as the defining objective.
    Solar insolation and PV array models are described to form the basis of the PV
    array optimization strategy. A constrained particle swarm optimization
    algorithm is selected due to its strong performance in non-linear applications.
    The optimization algorithm is applied to real-world metered data to quantify
    the possible investment value of a PV installation under different energy
    retailers and tariff structures. The arrangement with the highest value is
    determined to enable prospective small-scale PV investors to select the most
    cost-effective system.
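
    As a rough illustration of the constrained particle swarm step, the numpy
    sketch below maximizes a toy "investment value" surface over tilt and
    azimuth with box constraints; the objective, bounds, and PSO coefficients
    are stand-ins, not the paper's models or tariff data.

    ```python
    # A minimal constrained-PSO sketch under the toy assumptions named above.
    import numpy as np

    rng = np.random.default_rng(1)
    lo, hi = np.array([0.0, 0.0]), np.array([90.0, 360.0])  # tilt, azimuth

    def value(x):              # toy objective, peak at tilt 30, azimuth 180
        return -((x[:, 0] - 30) ** 2) - 0.1 * (x[:, 1] - 180) ** 2

    n = 30
    pos = rng.uniform(lo, hi, size=(n, 2))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), value(pos)

    for _ in range(200):
        gbest = pbest[np.argmax(pbest_val)]
        r1, r2 = rng.random((2, n, 1))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)          # enforce the box constraints
        v = value(pos)
        improved = v > pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], v[improved]

    print(pbest[np.argmax(pbest_val)])            # converges near [30, 180]
    ```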

    Deep Convolutional Neural Network Design Patterns

    Leslie N. Smith, Nicholay Topin
    Comments: Submitted as a conference paper at ICLR 2017
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Recent research in the deep learning field has produced a plethora of new
    architectures. At the same time, a growing number of groups are applying deep
    learning to new applications and problems. Many of these groups might be
    composed of inexperienced deep learning practitioners who are baffled by the
    dizzying array of architecture choices and therefore use an older architecture,
    such as AlexNet. Here, we attempt to bridge this gap by mining the
    collective knowledge contained in recent deep learning research to discover
    underlying principles for designing neural network architectures. In addition,
    we describe several architectural innovations, including Fractal of FractalNet,
    Stagewise Boosting Networks, and Taylor Series Networks (our Caffe code and
    prototxt files will be made publicly available). We hope others are inspired to
    build on this preliminary work.


    Computer Vision and Pattern Recognition

    Adaptive mixed norm optical flow estimation

    Vania V. Estrela, Matthias O. Franz, Ricardo T. Lopes, G. P. De Araujo
    Comments: 8 pages, 4 figures. arXiv admin note: text overlap with arXiv:1403.7365
    Journal-ref: Proc. SPIE 5960, Visual Communications and Image Processing 2005,
    59603W, July 31, 2006, Beijing, China
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The pel-recursive computation of 2-D optical flow has been extensively
    studied in computer vision to estimate motion from image sequences, but it
    still raises a wealth of issues, such as the treatment of outliers, motion
    discontinuities and occlusion. It relies on spatio-temporal brightness
    variations due to motion. Our proposed adaptive regularized approach deals with
    these issues within a common framework. It relies on the use of a data-driven
    technique called Mixed Norm (MN) to estimate the best motion vector for a given
    pixel. In our model, various types of noise can be handled, representing
    different sources of error. The motion vector estimation takes into
    consideration local image properties and it results from the minimization of a
    mixed norm functional with a regularization parameter depending on the
    kurtosis. This parameter determines the relative importance of the fourth norm
    and makes the functional convex. The main advantage of the developed procedure
    is that no knowledge of the noise distribution is necessary. Experiments
    indicate that this approach provides robust estimates of the optical flow.

    Recent Advances in Transient Imaging: A Computer Graphics and Vision Perspective

    Adrian Jarabo, Belen Masia, Julio Marco, Diego Gutierrez
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

    Transient imaging has recently made a huge impact in the computer graphics
    and computer vision fields. By capturing, reconstructing, or simulating light
    transport at extreme temporal resolutions, researchers have proposed novel
    techniques to show movies of light in motion, see around corners, detect
    objects in highly-scattering media, or infer material properties from a
    distance, to name a few. The key idea is to leverage the wealth of information
    in the temporal domain at picosecond or nanosecond resolution, information
    usually lost during the capture-time temporal integration. This paper presents
    recent advances in this field of transient imaging from a graphics and vision
    perspective, including capture techniques, analysis, applications and
    simulation.

    Rough Set Based Color Channel Selection

    Soumyabrata Dev, Florian M. Savoy, Yee Hui Lee, Stefan Winkler
    Comments: Accepted in IEEE Geoscience and Remote Sensing Letters, 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Color channel selection is essential for accurate segmentation of sky and
    clouds in images obtained from ground-based sky cameras. Most prior works in
    cloud segmentation use threshold based methods on color channels selected in an
    ad-hoc manner. In this letter, we propose the use of rough sets for color
    channel selection in visible-light images. Our proposed approach assesses color
    channels with respect to their contribution for segmentation, and identifies
    the most effective ones.

    An All-In-One Convolutional Neural Network for Face Analysis

    Rajeev Ranjan, Swami Sankaranarayanan, Carlos D. Castillo, Rama Chellappa
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a multi-purpose algorithm for simultaneous face detection, face
    alignment, pose estimation, gender recognition, smile detection, age estimation
    and face recognition using a single deep convolutional neural network (CNN).
    The proposed method employs a multi-task learning framework that regularizes
    the shared parameters of CNN and builds a synergy among different domains and
    tasks. Extensive experiments show that the network gains a better
    understanding of faces and achieves state-of-the-art results for most of
    these tasks.
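
    The parameter-sharing idea can be sketched in PyTorch as a shared trunk
    with per-task heads; the layer sizes and the three stand-in heads below are
    toy assumptions, not the paper's architecture or losses.

    ```python
    # A minimal sketch of multi-task parameter sharing with toy layer sizes.
    import torch
    import torch.nn as nn

    class MultiTaskFaceNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.trunk = nn.Sequential(           # shared convolutional trunk
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            self.gender = nn.Linear(16, 2)        # task-specific heads
            self.age = nn.Linear(16, 1)
            self.pose = nn.Linear(16, 3)

        def forward(self, x):
            h = self.trunk(x)                     # one forward pass, many tasks
            return self.gender(h), self.age(h), self.pose(h)

    net = MultiTaskFaceNet()
    g, a, p = net(torch.randn(2, 3, 64, 64))
    # Training minimizes a weighted sum of per-task losses, so every task
    # regularizes the shared trunk.
    ```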

    Optical Flow Estimation using a Spatial Pyramid Network

    Anurag Ranjan, Michael J. Black
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We learn to compute optical flow by combining a classical coarse-to-fine flow
    approach with deep learning. Specifically we adopt a spatial-pyramid
    formulation to deal with large motions. According to standard practice, we warp
    one image of a pair at each pyramid level by the current flow estimate and
    compute an update to the flow field. Instead of the standard minimization of an
    objective function at each pyramid level, we train a deep neural network at
    each level to compute the flow update. Unlike the recent FlowNet approach, the
    networks do not need to deal with large motions; these are dealt with by the
    pyramid. This has several advantages. First the network is much simpler and
    much smaller; our Spatial Pyramid Network (SPyNet) is 96% smaller than FlowNet
    in terms of model parameters. Because it is so small, the method could be made
    to run on a cell phone or other embedded system. Second, since the flow is
    assumed to be small at each level (less than 1 pixel), a convolutional approach
    applied to pairs of warped images is appropriate. Third, unlike FlowNet, the
    filters that we learn appear similar to classical spatio-temporal filters,
    possibly giving insight into how to improve the method further. Our results are
    more accurate than FlowNet in most cases and suggest a new direction of
    combining classical flow methods with deep learning.
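
    A minimal numpy/scipy sketch of the coarse-to-fine warp-and-update loop
    described above; `predict_update` is a hypothetical zero stand-in for the
    trained per-level network, and the pyramid depth and bilinear interpolation
    are assumptions.

    ```python
    # A structural sketch of the pyramid loop; the network is a stand-in.
    import numpy as np
    from scipy.ndimage import map_coordinates, zoom

    def warp(img, flow):
        # Backward-warp img by flow (flow[..., 0] = dx, flow[..., 1] = dy).
        h, w = img.shape
        ys, xs = np.mgrid[0:h, 0:w].astype(float)
        coords = np.array([ys + flow[..., 1], xs + flow[..., 0]])
        return map_coordinates(img, coords, order=1, mode='nearest')

    def predict_update(img1, warped_img2, flow):
        return np.zeros_like(flow)      # stand-in for the per-level network

    def pyramid_flow(img1, img2, levels=3):
        pyr1 = [zoom(img1, 0.5 ** k) for k in range(levels)][::-1]  # coarse->fine
        pyr2 = [zoom(img2, 0.5 ** k) for k in range(levels)][::-1]
        flow = np.zeros(pyr1[0].shape + (2,))
        for p1, p2 in zip(pyr1, pyr2):
            if flow.shape[:2] != p1.shape:        # upsample and rescale the flow
                fy = p1.shape[0] / flow.shape[0]
                fx = p1.shape[1] / flow.shape[1]
                flow = np.stack([zoom(flow[..., c], (fy, fx)) for c in (0, 1)], -1)
                flow[..., 0] *= fx
                flow[..., 1] *= fy
            warped = warp(p2, flow)               # warp by the current estimate
            flow = flow + predict_update(p1, warped, flow)  # small residual only
        return flow

    print(pyramid_flow(np.random.rand(32, 32), np.random.rand(32, 32)).shape)
    ```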

    Learning Deep Embeddings with Histogram Loss

    Evgeniya Ustinova, Victor Lempitsky
    Comments: NIPS 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We suggest a loss for learning deep embeddings. The new loss does not
    introduce parameters that need to be tuned and results in very good embeddings
    across a range of datasets and problems. The loss is computed by estimating
    two distributions of similarities, for positive (matching) and negative
    (non-matching) sample pairs, and then computing the probability that a
    positive pair has a lower similarity score than a negative pair, based on the
    estimated similarity distributions. We show that such operations can be
    performed in a simple and piecewise-differentiable manner using 1D histograms
    with soft assignment operations. This makes the proposed loss suitable for
    learning deep embeddings using stochastic optimization. In the experiments, the
    new loss performs favourably compared to recently proposed alternatives.
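
    A short PyTorch sketch of the loss as described: soft-assigned 1D
    histograms of positive and negative similarities, then the probability that
    a positive pair scores below a negative pair. The bin count and the linear
    (triangular) soft assignment are assumptions consistent with the abstract,
    written from its description rather than the authors' code.

    ```python
    # A minimal sketch, assuming similarities in [-1, 1] and 100 bins.
    import torch

    def histogram_loss(sims_pos, sims_neg, bins=100):
        nodes = torch.linspace(-1, 1, bins)       # histogram node positions
        delta = 2.0 / (bins - 1)

        def soft_hist(sims):
            # Each similarity contributes linearly to its two nearest nodes.
            w = torch.clamp(1 - (sims[None, :] - nodes[:, None]).abs() / delta,
                            min=0)
            h = w.sum(dim=1)
            return h / h.sum()                    # normalized soft histogram

        cdf_pos = torch.cumsum(soft_hist(sims_pos), dim=0)
        # Probability that a random positive pair scores below a random
        # negative pair, estimated from the two histograms.
        return torch.sum(cdf_pos * soft_hist(sims_neg))

    loss = histogram_loss(0.5 + 0.5 * torch.rand(200), torch.rand(200) - 0.5)
    print(loss)  # differentiable, so usable with stochastic optimization
    ```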

    Deep Convolutional Neural Network Design Patterns

    Leslie N. Smith, Nicholay Topin
    Comments: Submitted as a conference paper at ICLR 2017
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Recent research in the deep learning field has produced a plethora of new
    architectures. At the same time, a growing number of groups are applying deep
    learning to new applications and problems. Many of these groups might be
    composed of inexperienced deep learning practitioners who are baffled by the
    dizzying array of architecture choices and therefore use an older architecture,
    such as AlexNet. Here, we attempt to bridge this gap by mining the
    collective knowledge contained in recent deep learning research to discover
    underlying principles for designing neural network architectures. In addition,
    we describe several architectural innovations, including Fractal of FractalNet,
    Stagewise Boosting Networks, and Taylor Series Networks (our Caffe code and
    prototxt files will be made publicly available). We hope others are inspired to
    build on this preliminary work.

    Initialization and Coordinate Optimization for Multi-way Matching

    Da Tang, Tony Jebara
    Comments: 18 pages (including the supplementary material), 8 figures
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We consider the problem of consistently matching multiple sets of elements to
    each other, which is a common task in fields such as computer vision. To solve
    the underlying NP-hard objective, existing methods often relax or approximate
    it, but end up with unsatisfying empirical performances due to their inexact
    objectives. We propose a coordinate update algorithm that directly solves the
    exact objective. By using the pairwise alignment information to build an
    undirected graph and initializing the permutation matrices along the edges of
    its Maximum Spanning Tree, our algorithm successfully avoids bad local optima.
    Theoretically, with high probability our algorithm is guaranteed to solve
    this problem optimally on data with reasonable noise. Empirically, our
    algorithm consistently and significantly outperforms existing methods on
    several benchmark tasks on real datasets.
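
    The MST-based initialization idea can be sketched with scipy: score each
    pair of sets by alignment confidence, then take a maximum spanning tree by
    negating the weights. The confidence matrix below is a toy assumption.

    ```python
    # A minimal sketch of the initialization step on a toy confidence matrix.
    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree

    confidence = np.array([[0.0, 0.9, 0.2],
                           [0.9, 0.0, 0.8],
                           [0.2, 0.8, 0.0]])     # alignment quality per set pair

    mst = minimum_spanning_tree(-confidence)     # max spanning tree via negation
    edges = np.transpose(mst.nonzero())
    print(edges)  # [[0 1], [1 2]]: initialize permutations along these edges
    ```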

    Temporal Matrix Completion with Locally Linear Latent Factors for Medical Applications

    Frodo Kin Sun Chan, Andy J Ma, Pong C Yuen, Terry Cheuk-Fung Yip, Yee-Kit Tse, Vincent Wai-Sun Wong, Grace Lai-Hung Wong
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Regular medical records are useful for medical practitioners to analyze
    and monitor patient health status, especially for those with chronic
    disease, but such records are usually incomplete due to unpunctuality and
    absence of patients. To resolve the missing-data problem over time, recent
    papers have suggested tensor-based models for imputation, since this
    approach exploits a low-rank tensor assumption for highly correlated data.
    However, when the time intervals between records are long, the data
    correlation along the temporal direction is not high and the assumption is
    no longer valid. To address this problem, we propose to decompose a matrix
    with missing data into its latent factors, and then impose a locally linear
    constraint on these factors for matrix completion. Experimental results on
    a publicly available dataset and two medical datasets collected from
    hospitals show that the proposed algorithm outperforms the existing
    methods.
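
    As background for the latent-factor decomposition, here is a minimal numpy
    sketch of completing a partially observed matrix by plain regularized
    factorization on the observed entries only; it omits the locally linear
    constraint that is the paper's contribution.

    ```python
    # A minimal masked matrix-factorization sketch (toy data, rank 5).
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 30))                # ground-truth data
    mask = rng.random(X.shape) < 0.3             # only 30% of entries observed
    U = 0.1 * rng.normal(size=(50, 5))
    V = 0.1 * rng.normal(size=(30, 5))

    lr, lam = 0.01, 0.1
    for _ in range(2000):
        R = mask * (U @ V.T - X)                 # residual on observed entries
        U -= lr * (R @ V + lam * U)              # gradient steps with L2 penalty
        V -= lr * (R.T @ U + lam * V)

    X_hat = U @ V.T                              # imputed (completed) matrix
    ```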


    Artificial Intelligence

    Probabilistic Modeling of Progressive Filtering

    Giuliano Armano
    Comments: The article entitled Modeling Progressive Filtering, published on Fundamenta Informaticae (Vol. 138, Issue 3, pp. 285-320, July 2015), has been derived from this extended report
    Subjects: Artificial Intelligence (cs.AI)

    Progressive filtering is a simple way to perform hierarchical classification,
    inspired by the behavior that most humans put into practice while attempting to
    categorize an item according to an underlying taxonomy. Each node of the
    taxonomy being associated with a different category, one may visualize the
    categorization process by looking at the item going downwards through all the
    nodes that accept it as belonging to the corresponding category. This paper is
    aimed at modeling the progressive filtering technique from a probabilistic
    perspective, in a hierarchical text categorization setting. As a result, the
    designer of a system based on progressive filtering should be facilitated in
    the task of devising, training, and testing it.
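
    A minimal sketch of progressive filtering over a toy taxonomy; keyword
    predicates stand in for the per-node binary classifiers the technique
    assumes.

    ```python
    # A toy taxonomy walk: an item descends through every node that accepts it.
    def accepts_science(doc): return "experiment" in doc or "physics" in doc
    def accepts_physics(doc): return "physics" in doc
    def accepts_biology(doc): return "cell" in doc

    taxonomy = {
        "science": (accepts_science, {
            "physics": (accepts_physics, {}),
            "biology": (accepts_biology, {}),
        }),
    }

    def progressive_filter(doc, nodes, path=()):
        # Pass the item downwards through every node whose classifier accepts it.
        results = []
        for name, (clf, children) in nodes.items():
            if clf(doc):
                results.append(path + (name,))
                results += progressive_filter(doc, children, path + (name,))
        return results

    print(progressive_filter("a physics experiment", taxonomy))
    # [('science',), ('science', 'physics')]
    ```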

    Extracting Actionability from Machine Learning Models by Sub-optimal Deterministic Planning

    Qiang Lyu, Yixin Chen, Zhaorong Li, Zhicheng Cui, Ling Chen, Xing Zhang, Haihua Shen
    Comments: 16 pages, 4 figures
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    A main focus of machine learning research has been improving the
    generalization accuracy and efficiency of prediction models. Many models such
    as SVM, random forest, and deep neural nets have been proposed and achieved
    great success. However, what emerges as missing in many applications is
    actionability, i.e., the ability to turn prediction results into actions. For
    example, in applications such as customer relationship management, clinical
    prediction, and advertisement, the users need not only accurate prediction, but
    also actionable instructions which can transfer an input to a desirable goal
    (e.g., higher repayment profits, lower morbidity rates, higher ad hit
    rates). Existing efforts in deriving such actionable knowledge are few, and
    are limited to simple action models that change only one attribute per
    action. The dilemma is that in many real applications the action models are
    often more complex, which makes extracting an optimal solution harder.

    In this paper, we propose a novel approach that achieves actionability by
    combining learning with planning, two core areas of AI. In particular, we
    propose a framework to extract actionable knowledge from random forest, one of
    the most widely used and best off-the-shelf classifiers. We formulate the
    actionability problem as a sub-optimal action planning (SOAP) problem, which is
    to find a plan to alter certain features of a given input so that the random
    forest would yield a desirable output, while minimizing the total costs of
    actions. Technically, the SOAP problem is formulated in the SAS+ planning
    formalism, and solved using a Max-SAT based approach. Our experimental results
    demonstrate the effectiveness and efficiency of the proposed approach on a
    personal credit dataset and other benchmarks. Our work represents a new
    application of automated planning on an emerging and challenging machine
    learning paradigm.

    Maximizing Investment Value of Small-Scale PV in a Smart Grid Environment

    Jeremy Every, Li Li, Youguang G. Guo, David G. Dorrell
    Comments: To appear in the proceedings of the 5th International Conference for Renewable Energy Research and Applications (ICRERA2016), Birmingham, United Kingdom. 6 pages. 3 figures
    Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    Determining the optimal size and orientation of small-scale residential
    PV arrays will become increasingly complex in the future smart grid environment
    with the introduction of smart meters and dynamic tariffs. However, consumers
    can leverage the availability of smart meter data to conduct a more detailed
    exploration of PV investment options for their particular circumstances. In
    this paper, an optimization method for PV orientation and sizing is proposed
    whereby maximizing the PV investment value is set as the defining objective.
    Solar insolation and PV array models are described to form the basis of the PV
    array optimization strategy. A constrained particle swarm optimization
    algorithm is selected due to its strong performance in non-linear applications.
    The optimization algorithm is applied to real-world metered data to quantify
    the possible investment value of a PV installation under different energy
    retailers and tariff structures. The arrangement with the highest value is
    determined to enable prospective small-scale PV investors to select the most
    cost-effective system.

    Quantile Reinforcement Learning

    Hugo Gilbert, Paul Weng
    Comments: AWRL 2016
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    In reinforcement learning, the standard criterion to evaluate policies in a
    state is the expectation of the (discounted) sum of rewards. However, this
    criterion may not always be suitable; we therefore consider an alternative
    criterion based on the notion of quantiles. In the case of episodic
    reinforcement learning problems, we propose an algorithm based on stochastic
    approximation with two timescales. We evaluate our proposition on a simple
    model of the TV show "Who Wants to Be a Millionaire?".
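
    The quantile-tracking core of such a stochastic-approximation scheme can be
    sketched in a few lines; this shows only a single-timescale quantile update
    under an assumed fixed return distribution, not the paper's full
    two-timescale algorithm.

    ```python
    # A minimal quantile-tracking sketch on an assumed return distribution.
    import random

    tau, q = 0.5, 0.0                   # target quantile level and its estimate
    for t in range(1, 100001):
        G = random.gauss(1.0, 2.0)      # sampled episodic return
        alpha = 1.0 / t ** 0.6          # slowly decreasing step size
        q += alpha * (tau - (G < q))    # nudge q toward the tau-quantile
    print(q)                            # approaches the median (1.0) of returns
    ```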

    Predicting Domain Generation Algorithms with Long Short-Term Memory Networks

    Jonathan Woodbridge, Hyrum S. Anderson, Anjum Ahuja, Daniel Grant
    Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

    Various families of malware use domain generation algorithms (DGAs) to
    generate a large number of pseudo-random domain names to connect to a command
    and control (C&C) server. In order to block DGA C&C traffic, security
    organizations must first discover the algorithm by reverse engineering malware
    samples, then generate a list of domains for a given seed. The domains are
    then either preregistered or published in a DNS blacklist. This process is not
    only tedious, but can be readily circumvented by malware authors using a large
    number of seeds in algorithms with multivariate recurrence properties (e.g.,
    banjori) or by using a dynamic list of seeds (e.g., bedep). Another technique
    to stop malware from using DGAs is to intercept DNS queries on a network and
    predict whether domains are DGA generated. Such a technique will alert network
    administrators to the presence of malware on their networks. In addition, if
    the predictor can also accurately predict the family of DGAs, then network
    administrators can also be alerted to the type of malware that is on their
    networks. This paper presents a DGA classifier that leverages long short-term
    memory (LSTM) networks to predict DGAs and their respective families without
    the need for a priori feature extraction. Results are significantly better than
    state-of-the-art techniques, providing 0.9993 area under the receiver operating
    characteristic curve for binary classification and a micro-averaged F1 score of
    0.9906. In other terms, the LSTM technique can provide a 90% detection rate
    with a 1:10,000 false positive (FP) rate, a twenty-fold FP improvement over
    comparable methods. Experiments in this paper are run on open datasets and code
    snippets are provided to reproduce the results.
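
    A minimal Keras sketch in the spirit of the classifier above (character
    embedding, LSTM, sigmoid output); the toy domain list, maximum sequence
    length, and hyperparameters are assumptions.

    ```python
    # A minimal character-level LSTM domain classifier on toy data.
    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding, LSTM, Dense
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    domains = ["google", "qwxzpvm", "facebook", "kqzjhwy"]  # benign vs DGA-like
    labels = np.array([0, 1, 0, 1])

    chars = sorted({c for d in domains for c in d})
    idx = {c: i + 1 for i, c in enumerate(chars)}           # 0 reserved for pad
    X = pad_sequences([[idx[c] for c in d] for d in domains], maxlen=63)

    model = Sequential([
        Embedding(input_dim=len(idx) + 1, output_dim=32),   # raw characters in,
        LSTM(64),                                           # no hand features
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.fit(X, labels, epochs=3, verbose=0)
    print(model.predict(X, verbose=0).ravel())
    ```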

    DeepDGA: Adversarially-Tuned Domain Generation and Detection

    Hyrum S. Anderson, Jonathan Woodbridge, Bobby Filar
    Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

    Many malware families utilize domain generation algorithms (DGAs) to
    establish command and control (C&C) connections. While there are many methods
    to pseudorandomly generate domains, we focus in this paper on detecting (and
    generating) domains on a per-domain basis which provides a simple and flexible
    means to detect known DGA families. Recent machine learning approaches to DGA
    detection have been successful on fairly simplistic DGAs, many of which produce
    names of fixed length. However, models trained on limited datasets are somewhat
    blind to new DGA variants.

    In this paper, we leverage the concept of generative adversarial networks to
    construct a deep learning based DGA that is designed to intentionally bypass a
    deep learning based detector. In a series of adversarial rounds, the generator
    learns to generate domain names that are increasingly more difficult to detect.
    In turn, a detector model updates its parameters to compensate for the
    adversarially generated domains. We test whether adversarially generated
    domains can be used to augment training sets and thereby harden other
    machine learning models against yet-to-be-observed DGAs. We
    detail solutions to several challenges in training this character-based
    generative adversarial network (GAN). In particular, our deep learning
    architecture begins as a domain name auto-encoder (encoder + decoder) trained
    on domains in the Alexa one million. Then the encoder and decoder are
    reassembled competitively in a generative adversarial network (detector +
    generator), with novel neural architectures and training strategies to improve
    convergence.


    Information Retrieval

    Leveraging tagging and rating for recommendation: RMF meets weighted diffusion on tripartite graphs

    Jianguo Li, Yong Tang, Jiemin Chen
    Subjects: Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

    Recommender systems (RSs) have been a widely exploited approach to solving
    the information overload problem. However, the performance is still limited due
    to the extreme sparsity of the rating data. With the popularity of Web 2.0, the
    social tagging system provides more external information to improve
    recommendation accuracy. Although some existing approaches combine matrix
    factorization models with the co-occurrence properties and context of tags,
    they neglect the issue of tag sparsity (the lack of commonly associated
    tags), which also results in inaccurate recommendations. Consequently, in
    this paper, we propose a novel hybrid collaborative filtering model named
    WUDiff_RMF, which improves the Regularized Matrix Factorization (RMF) model
    by integrating a Weighted User-Diffusion-based CF algorithm (WUDiff) that
    obtains the information of similar users from the weighted tripartite
    user-item-tag graph.
    This model aims to capture the degree correlation of the user-item-tag
    tripartite network to enhance the performance of recommendation. Experiments
    conducted on four real-world datasets demonstrate that our approach
    performs significantly better than widely used methods in recommendation
    accuracy. Moreover, results show that WUDiff_RMF can alleviate the
    data sparsity, especially in the circumstance that users have made few ratings
    and few tags.

    A Decision Support System for Inbound Marketers: An Empirical Use of Latent Dirichlet Allocation Topic Model to Guide Infographic Designers

    Meisam Hejazi Nia
    Subjects: Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)

    Infographics are a type of information presentation that inbound marketers
    use. I suggest a method that allows infographic designers to benchmark their
    design against previous viral infographics, to measure whether a given
    design decision helps or hurts the probability of the design going viral.
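
    A minimal scikit-learn sketch of the LDA component: fit a topic model on
    infographic text and use per-document topic mixtures as features; the toy
    corpus and topic count are assumptions.

    ```python
    # A minimal LDA topic-model fit on a stand-in corpus.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    texts = ["social media marketing tips", "data privacy and security",
             "marketing funnel conversion data", "security breach checklist"]
    X = CountVectorizer().fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
    doc_topics = lda.transform(X)   # per-design topic mixtures, comparable
    print(doc_topics)               # against those of known viral infographics
    ```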


    Computation and Language

    Binary Paragraph Vectors

    Karol Grzegorczyk, Marcin Kurdziel
    Comments: Under review as a conference paper at ICLR 2017
    Subjects: Computation and Language (cs.CL)

    Recently Le & Mikolov described two log-linear models, called Paragraph
    Vector, that can be used to learn state-of-the-art distributed representations
    of documents. Inspired by this work we present Binary Paragraph Vectors, simple
    neural networks that learn short binary codes for fast information retrieval.
    We show that binary paragraph vectors outperform autoencoder-based binary
    codes, despite using fewer bits. We also evaluate their precision in transfer
    learning settings, where binary codes are inferred for documents unrelated to
    the training corpus. Results from these experiments indicate that Binary
    Paragraph Vectors can capture semantics relevant for various domain-specific
    documents. Finally, we present a model that simultaneously learns short binary
    codes and longer, real-valued representations. This model can be used to
    rapidly retrieve a short list of highly relevant documents from a large
    document collection.
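
    For intuition about retrieval with short binary codes, the numpy sketch
    below sign-binarizes stand-in document embeddings and ranks by Hamming
    distance; unlike the paper's model, the codes here are not learned.

    ```python
    # A minimal binary-code retrieval sketch on stand-in embeddings.
    import numpy as np

    rng = np.random.default_rng(0)
    docs = rng.normal(size=(1000, 128))          # stand-in document embeddings
    codes = docs > 0                             # 128-bit binary codes

    query = docs[42] + 0.1 * rng.normal(size=128)
    qcode = query > 0

    # Hamming distance to every document; small distance = candidate match.
    dists = np.count_nonzero(codes != qcode, axis=1)
    shortlist = np.argsort(dists)[:10]           # rerank with real-valued vectors
    print(shortlist)                             # document 42 ranks near the top
    ```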

    CogALex-V Shared Task: ROOT18

    Emmanuele Chersoni, Giulia Rambelli, Enrico Santus
    Subjects: Computation and Language (cs.CL)

    In this paper, we describe ROOT18, a classifier using the scores of several
    unsupervised distributional measures as features to discriminate between
    semantically related and unrelated words, and then to classify the related
    pairs according to their semantic relation (i.e. synonymy, antonymy, hypernymy,
    part-whole meronymy). Our classifier participated in the CogALex-V Shared Task,
    showing a solid performance on the first subtask, but a poor performance on the
    second subtask. The low scores reported on the second subtask suggest that
    distributional measures are not sufficient to discriminate between multiple
    semantic relations at once.

    A Hybrid Approach to Word Sense Disambiguation Combining Supervised and Unsupervised Learning

    Alok Ranjan Pal, Anirban Kundu, Abhay Singh, Raj Shekhar, Kunal Sinha
    Comments: 13 pages in International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 4, No. 4, July 2013
    Subjects: Computation and Language (cs.CL)

    In this paper, we find the meaning of words in distinct situations. Word
    Sense Disambiguation is used to find the meaning of words in live contexts
    using supervised and unsupervised approaches. Unsupervised approaches use an
    online dictionary for learning, while supervised approaches use manually
    tagged learning sets. Hand-tagged data may be neither effective nor
    sufficient for the learning procedure; this limitation of information is the
    main flaw of the supervised approach. Our proposed approach overcomes this
    limitation by using a learning set that is enriched dynamically with new
    data, and a simple filtering method is utilized to obtain appropriate
    training data. We introduce a mixed methodology combining a Modified Lesk
    approach with a Bag-of-Words approach whose bags are enriched using learning
    methods. Experiments establish the superiority of our approach over the
    individual Modified Lesk and Bag-of-Words approaches.
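
    The dictionary-overlap building block (classic Lesk, which the Modified
    Lesk approach extends) is available in NLTK; this sketch assumes the
    WordNet corpus has been downloaded via nltk.download('wordnet').

    ```python
    # A minimal classic-Lesk sketch with NLTK.
    from nltk.wsd import lesk

    sentence = "I went to the bank to deposit money".split()
    sense = lesk(sentence, "bank")   # synset whose gloss overlaps context most
    print(sense, "-", sense.definition() if sense else "no sense found")
    ```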

    An empirical study for Vietnamese dependency parsing

    Dat Quoc Nguyen, Mark Dras, Mark Johnson
    Comments: To appear in Proceedings of the 14th Annual Workshop of the Australasian Language Technology Association
    Subjects: Computation and Language (cs.CL)

    This paper presents an empirical comparison of different dependency parsers
    for Vietnamese, which has some unusual characteristics such as copula drop and
    verb serialization. Experimental results show that the neural network-based
    parsers perform significantly better than the traditional parsers. We report
    the highest parsing scores published to date for Vietnamese with the labeled
    attachment score (LAS) at 73.53% and the unlabeled attachment score (UAS) at
    80.66%.

    A FOFE-based Local Detection Approach for Named Entity Recognition and Mention Detection

    Mingbin Xu, Hui Jiang
    Subjects: Computation and Language (cs.CL)

    In this paper, we study a novel approach for named entity recognition (NER)
    and mention detection in natural language processing. Instead of treating NER
    as a sequence labelling problem, we propose a new local detection approach,
    which relies on the recent fixed-size ordinally forgetting encoding (FOFE)
    method to fully encode each sentence fragment and its left/right contexts
    into a fixed-size representation. Afterwards, a simple feedforward neural
    network is used to reject each individual fragment or to predict its entity
    label. The
    proposed method has been evaluated in several popular NER and mention detection
    tasks, including the CoNLL 2003 NER task and TAC-KBP2015 and TAC-KBP2016
    Tri-lingual Entity Discovery and Linking (EDL) tasks. Our methods have yielded
    strong performance in all of the examined tasks. This local detection
    approach has shown many advantages over the traditional sequence labelling
    methods.
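
    FOFE itself is easy to sketch: a fixed-size vector updated as
    z_t = alpha * z_{t-1} + e(w_t), so earlier words decay geometrically. The
    toy vocabulary and alpha below are assumptions.

    ```python
    # A minimal FOFE encoding sketch on a toy vocabulary.
    import numpy as np

    vocab = {"the": 0, "cat": 1, "sat": 2}
    alpha = 0.7                           # forgetting factor in (0, 1)

    def fofe(words):
        z = np.zeros(len(vocab))
        for w in words:                   # z_t = alpha * z_{t-1} + e(w_t)
            e = np.zeros(len(vocab))
            e[vocab[w]] = 1.0
            z = alpha * z + e
        return z                          # fixed size for any fragment length

    print(fofe(["the", "cat", "sat"]))    # [0.49 0.7  1.  ]: older words decay
    ```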


    Distributed, Parallel, and Cluster Computing

    Architecting Time-Critical Big-Data Systems

    Pablo Basanta-Val, Neil Audsley, Andy Wellings (CS-YORK), Ian Gray (CS-YORK), Norberto Fernandez
    Comments: in IEEE Transactions on Big Data, 2016
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Current infrastructures for developing big-data applications are able to
    process, via big-data analytics, huge amounts of data using clusters of
    machines that collaborate to perform parallel computations. However, current
    infrastructures were not designed to work with the requirements of
    time-critical applications; they are more focused on general-purpose
    applications rather than time-critical ones. Addressing this issue from the
    perspective of the real-time systems community, this paper considers
    time-critical big-data. It deals with the definition of a time-critical
    big-data system from the point of view of requirements, analyzing the specific
    characteristics of some popular big-data applications. This analysis is
    complemented by the challenges stemming from the infrastructures that
    support the applications; the paper proposes an architecture and offers
    initial performance patterns that connect application costs with
    infrastructure performance.

    A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs

    Elias Stehle, Hans-Arno Jacobsen
    Comments: Submitted to SIGMOD 2017 for publication
    Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

    Sorting is at the core of many database operations, such as index creation,
    sort-merge joins and user-requested output sorting. As GPUs are emerging as a
    promising platform to accelerate various operations, sorting on GPUs becomes a
    viable endeavour. Over the past few years, several improvements have been
    proposed for sorting on GPUs, leading to the first radix sort implementations
    that achieve a sorting rate of over one billion 32-bit keys per second. Yet,
    state-of-the-art approaches are heavily memory bandwidth-bound, as they require
    substantially more memory transfers than their CPU-based counterparts.

    Our work proposes a novel approach that almost halves the amount of memory
    transfers and, therefore, considerably lifts the memory bandwidth limitation.
    Being able to sort two gigabytes of eight-byte records in as little as 50
    milliseconds, our approach achieves a 2.32-fold improvement over the
    state-of-the-art GPU-based radix sort for uniform distributions, and
    sustains a minimum speed-up factor of 1.66 for skewed distributions.

    To address inputs that either do not reside on the GPU or exceed the
    available device memory, we build on our efficient GPU sorting approach with a
    pipelined heterogeneous sorting algorithm that mitigates the overhead
    associated with PCIe data transfers. Comparing the end-to-end sorting
    performance to the state-of-the-art CPU-based radix sort running 16 threads,
    our heterogeneous approach achieves a 2.06-fold and a 1.53-fold improvement for
    sorting 64 GB key-value pairs with a skewed and a uniform distribution,
    respectively.
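
    For reference, here is a minimal CPU-side LSD radix sort in numpy, one
    stable pass per 8-bit digit; it illustrates the multi-pass structure whose
    memory traffic the GPU work above reduces, and models none of the paper's
    optimizations.

    ```python
    # A minimal LSD radix sort sketch for 32-bit keys.
    import numpy as np

    def radix_sort_u32(keys):
        keys = keys.astype(np.uint32)
        for shift in range(0, 32, 8):         # four 8-bit digit passes
            digits = (keys >> shift) & 0xFF
            keys = keys[np.argsort(digits, kind="stable")]  # stable per digit
        return keys

    a = np.random.randint(0, 2**32, size=10, dtype=np.int64).astype(np.uint32)
    print(radix_sort_u32(a))
    ```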


    Learning

    Using a Deep Reinforcement Learning Agent for Traffic Signal Control

    Wade Genders, Saiedeh Razavi
    Subjects: Learning (cs.LG); Systems and Control (cs.SY)

    Ensuring transportation systems are efficient is a priority for modern
    society. Technological advances have made it possible for transportation
    systems to collect large volumes of varied data on an unprecedented scale. We
    propose a traffic signal control system which takes advantage of this new, high
    quality data, with minimal abstraction compared to other proposed systems. We
    apply modern deep reinforcement learning methods to build a truly adaptive
    traffic signal control agent in the traffic microsimulator SUMO. We propose a
    new state space, the discrete traffic state encoding, which is information
    dense. The discrete traffic state encoding is used as input to a deep
    convolutional neural network, trained using Q-learning with experience replay.
    Our agent was compared against a one hidden layer neural network traffic signal
    control agent and reduces average cumulative delay by 82%, average queue length
    by 66% and average travel time by 20%.
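
    A minimal PyTorch sketch of Q-learning with experience replay as named
    above; the stand-in environment, state size, and network are toy
    assumptions, not the discrete traffic state encoding or the SUMO setup.

    ```python
    # A minimal Q-learning-with-replay sketch on a stand-in environment.
    import random
    from collections import deque

    import torch
    import torch.nn as nn

    buffer = deque(maxlen=10000)                  # experience replay memory
    q = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
    opt = torch.optim.Adam(q.parameters(), lr=1e-3)
    gamma = 0.99

    def step_env(s, a):                           # stand-in for the simulator
        return torch.randn(4), -abs(s[0].item())  # (next state, reward)

    s = torch.randn(4)
    for t in range(500):
        a = q(s).argmax().item() if random.random() > 0.1 else random.randrange(2)
        s2, r = step_env(s, a)
        buffer.append((s, a, r, s2))              # store the transition
        s = s2
        if len(buffer) >= 32:                     # learn from a random minibatch
            batch = random.sample(buffer, 32)
            ss = torch.stack([b[0] for b in batch])
            aa = torch.tensor([b[1] for b in batch])
            rr = torch.tensor([b[2] for b in batch])
            s2s = torch.stack([b[3] for b in batch])
            target = rr + gamma * q(s2s).max(dim=1).values.detach()
            pred = q(ss).gather(1, aa.unsqueeze(1)).squeeze(1)
            loss = nn.functional.mse_loss(pred, target)
            opt.zero_grad(); loss.backward(); opt.step()
    ```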

    A-Ward_pβ: Effective hierarchical clustering using the Minkowski metric and a fast k-means initialisation

    Renato Cordeiro de Amorim, Vladimir Makarenkov, Boris Mirkin
    Journal-ref: Information Sciences, 370, 343-354 (2016)
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    In this paper we make two novel contributions to hierarchical clustering.
    First, we introduce an anomalous pattern initialisation method for hierarchical
    clustering algorithms, called A-Ward, capable of substantially reducing the
    time they take to converge. This method generates an initial partition with a
    sufficiently large number of clusters. This allows the cluster merging process
    to start from this partition rather than from a trivial partition composed
    solely of singletons. Our second contribution is an extension of the Ward
    and Ward_p algorithms to the situation where the feature weight exponent can
    differ from the exponent of the Minkowski distance. This new method, called
    A-Ward_pβ, is able to generate a much wider variety of clustering solutions. We
    also demonstrate that its parameters can be estimated reasonably well by using
    a cluster validity index. We perform numerous experiments using data sets with
    two types of noise, insertion of noise features and blurring within-cluster
    values of some features. These experiments allow us to conclude: (i) our
    anomalous pattern initialisation method does indeed reduce the time a
    hierarchical clustering algorithm takes to complete, without negatively
    impacting its cluster recovery ability; (ii) A-Ward_pβ provides better
    cluster recovery than both Ward and Ward_p.

    Learning Locomotion Skills Using DeepRL: Does the Choice of Action Space Matter?

    Xue Bin Peng, Michiel van de Panne
    Subjects: Learning (cs.LG); Graphics (cs.GR); Robotics (cs.RO)

    The use of deep reinforcement learning allows for high-dimensional state
    descriptors, but little is known about how the choice of action representation
    impacts the learning difficulty and the resulting performance. We compare the
    impact of four different action parameterizations (torques, muscle-activations,
    target joint angles, and target joint-angle velocities) in terms of learning
    time, policy robustness, motion quality, and policy query rates. Our results
    are evaluated on a gait-cycle imitation task for multiple planar articulated
    figures and multiple gaits. We demonstrate that the local feedback provided by
    higher-level action parameterizations can significantly impact the learning,
    robustness, and quality of the resulting policies.

    Quantile Reinforcement Learning

    Hugo Gilbert, Paul Weng
    Comments: AWRL 2016
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    In reinforcement learning, the standard criterion to evaluate policies in a
    state is the expectation of the (discounted) sum of rewards. However, this
    criterion may not always be suitable; we therefore consider an alternative
    criterion based on the notion of quantiles. In the case of episodic
    reinforcement learning problems, we propose an algorithm based on stochastic
    approximation with two timescales. We evaluate our proposition on a simple
    model of the TV show "Who Wants to Be a Millionaire?".

    Deep Convolutional Neural Network Design Patterns

    Leslie N. Smith, Nicholay Topin
    Comments: Submitted as a conference paper at ICLR 2017
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Recent research in the deep learning field has produced a plethora of new
    architectures. At the same time, a growing number of groups are applying deep
    learning to new applications and problems. Many of these groups might be
    composed of inexperienced deep learning practitioners who are baffled by the
    dizzying array of architecture choices and therefore use an older architecture,
    such as AlexNet. Here, we attempt to bridge this gap by mining the
    collective knowledge contained in recent deep learning research to discover
    underlying principles for designing neural network architectures. In addition,
    we describe several architectural innovations, including Fractal of FractalNet,
    Stagewise Boosting Networks, and Taylor Series Networks (our Caffe code and
    prototxt files will be made publicly available). We hope others are inspired to
    build on this preliminary work.

    Temporal Matrix Completion with Locally Linear Latent Factors for Medical Applications

    Frodo Kin Sun Chan, Andy J Ma, Pong C Yuen, Terry Cheuk-Fung Yip, Yee-Kit Tse, Vincent Wai-Sun Wong, Grace Lai-Hung Wong
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Regular medical records are useful for medical practitioners to analyze
    and monitor patient health status, especially for those with chronic
    disease, but such records are usually incomplete due to unpunctuality and
    absence of patients. To resolve the missing-data problem over time, recent
    papers have suggested tensor-based models for imputation, since this
    approach exploits a low-rank tensor assumption for highly correlated data.
    However, when the time intervals between records are long, the data
    correlation along the temporal direction is not high and the assumption is
    no longer valid. To address this problem, we propose to decompose a matrix
    with missing data into its latent factors, and then impose a locally linear
    constraint on these factors for matrix completion. Experimental results on
    a publicly available dataset and two medical datasets collected from
    hospitals show that the proposed algorithm outperforms the existing
    methods.

    Categorical Reparameterization with Gumbel-Softmax

    Eric Jang, Shixiang Gu, Ben Poole
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Categorical variables are a natural choice for representing discrete
    structure in the world. However, stochastic neural networks rarely use
    categorical latent variables due to the inability to backpropagate through
    samples. In this work, we present an efficient gradient estimator that replaces
    the non-differentiable sample from a categorical distribution with a
    differentiable sample from a novel Gumbel-Softmax distribution. This
    distribution has the essential property that it can be smoothly annealed into a
    categorical distribution. We show that our Gumbel-Softmax estimator outperforms
    state-of-the-art gradient estimators on structured output prediction and
    unsupervised generative modeling tasks with categorical latent variables, and
    enables large speedups on semi-supervised classification.
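
    The estimator's sampling step is compact enough to sketch in PyTorch: add
    Gumbel noise to the logits and take a temperature-scaled softmax; the
    temperature below is an arbitrary choice.

    ```python
    # A minimal Gumbel-Softmax sampling sketch.
    import torch
    import torch.nn.functional as F

    def gumbel_softmax_sample(logits, tau=0.5):
        # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1).
        u = torch.rand_like(logits)
        g = -torch.log(-torch.log(u + 1e-20) + 1e-20)
        # Relaxed one-hot sample; as tau -> 0 it approaches a hard one-hot.
        return F.softmax((logits + g) / tau, dim=-1)

    logits = torch.tensor([1.0, 2.0, 0.5], requires_grad=True)
    sample = gumbel_softmax_sample(logits)
    loss = (sample * torch.tensor([0.0, 1.0, 2.0])).sum()
    loss.backward()                    # gradients flow through the sample
    print(sample, logits.grad)
    ```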

    Cross: Efficient Low-rank Tensor Completion

    Anru Zhang
    Subjects: Methodology (stat.ME); Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)

    The completion of tensors, or high-order arrays, attracts significant
    attention in recent research. Current literature on tensor completion primarily
    focuses on recovery from a set of uniformly randomly measured entries, and the
    required number of measurements to achieve recovery is not guaranteed to be
    optimal. In addition, the implementation of some previous methods are NP-hard.
    In this article, we propose a framework for low-rank tensor completion via a
    novel tensor measurement scheme we name Cross. The proposed procedure is
    efficient and easy to implement. In particular, we show that a third-order
    tensor of Tucker rank $(r_1, r_2, r_3)$ in $p_1$-by-$p_2$-by-$p_3$
    dimensional space can be recovered from as few as
    $r_1 r_2 r_3 + r_1(p_1-r_1) + r_2(p_2-r_2) + r_3(p_3-r_3)$ noiseless
    measurements, which matches the sample complexity lower bound. In the case
    of noisy measurements, we also develop a theoretical
    upper bound and the matching minimax lower bound for recovery error over
    certain classes of low-rank tensors for the proposed procedure. The results can
    be further extended to fourth or higher-order tensors. Simulation studies show
    that the method performs well under a variety of settings. Finally, the
    procedure is illustrated through a real dataset in neuroimaging.

    Learning to Pivot with Adversarial Networks

    Gilles Louppe, Michael Kagan, Kyle Cranmer
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Data Analysis, Statistics and Probability (physics.data-an); Methodology (stat.ME)

    Many inference problems involve data generation processes that are not
    uniquely specified or are uncertain in some way. In a scientific context, the
    presence of several plausible data generation processes is often associated
    with the presence of systematic uncertainties. Robust inference is possible
    if it is based on a pivot: a quantity whose distribution is invariant to the
    unknown value of the (categorical or continuous) nuisance parameters that
    parametrize
    this family of generation processes. In this work, we introduce a flexible
    training procedure based on adversarial networks for enforcing the pivotal
    property on a predictive model. We derive theoretical results showing that the
    proposed algorithm tends towards a minimax solution corresponding to a
    predictive model that is both optimal and independent of the nuisance
    parameters (if such a model exists), or for which one can tune the trade-off
    between power and robustness. Finally, we demonstrate the effectiveness of this
    approach with a toy example and an example from particle physics.

    Multitask Protein Function Prediction Through Task Dissimilarity

    Marco Frasca, Nicolò Cesa Bianchi
    Comments: 12 pages, 5 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Quantitative Methods (q-bio.QM)

    Automated protein function prediction is a challenging problem with
    distinctive features, such as the hierarchical organization of protein
    functions and the scarcity of annotated proteins for most biological functions.
    We propose a multitask learning algorithm addressing both issues. Unlike
    standard multitask algorithms, which use task (protein functions) similarity
    information as a bias to speed up learning, we show that dissimilarity
    information enforces separation of rare class labels from frequent class
    labels, and for this reason is better suited for solving unbalanced protein
    function prediction problems. We support our claim by showing that a multitask
    extension of the label propagation algorithm empirically works best when the
    task relatedness information is represented using a dissimilarity matrix as
    opposed to a similarity matrix. Moreover, the experimental comparison carried
    out on three model organisms shows that our method has a more stable performance
    in both “protein-centric” and “function-centric” evaluation settings.

    Fast Eigenspace Approximation using Random Signals

    Johan Paratte, Lionel Martin
    Subjects: Data Structures and Algorithms (cs.DS); Learning (cs.LG); Machine Learning (stat.ML)

    We focus in this work on the estimation of the first $k$ eigenvectors of
    any graph Laplacian using filtering of Gaussian random signals. We prove
    that we only need $k$ such signals to be able to exactly recover as many of
    the smallest eigenvectors, regardless of the number of nodes in the graph.
    In addition, we address key issues in implementing the theoretical concepts
    in practice using accurate approximated methods. We also propose fast
    algorithms both for eigenspace approximation and for the determination of
    the $k$th smallest eigenvalue $\lambda_k$. The latter proves to be extremely
    efficient under the assumption of a locally uniform distribution of the
    eigenvalues over the spectrum. Finally, we present experiments which show
    the validity of our method in practice and compare it to state-of-the-art
    methods for clustering and visualization, both on synthetic small-scale
    datasets and on larger real-world problems with millions of nodes. We show
    that our method scales better with the number of nodes than all previous
    methods while achieving an almost perfect reconstruction of the eigenspace
    formed by the first $k$ eigenvectors.

    Low Rank Approximation with Entrywise $\ell_1$-Norm Error

    Zhao Song, David P. Woodruff, Peilin Zhong
    Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Learning (cs.LG)

    We study the $\ell_1$-low rank approximation problem, where for a given
    $n \times d$ matrix $A$ and approximation factor $\alpha \geq 1$, the goal
    is to output a rank-$k$ matrix $\widehat{A}$ for which

    $$\|A - \widehat{A}\|_1 \leq \alpha \cdot \min_{\text{rank-}k\text{ matrices } A'} \|A - A'\|_1,$$

    where for an $n \times d$ matrix $C$, we let
    $\|C\|_1 = \sum_{i=1}^n \sum_{j=1}^d |C_{i,j}|$. This error measure is
    known to be more robust than the Frobenius norm in the presence of outliers
    and is indicated in models where Gaussian assumptions on the noise may not
    apply. The problem was shown to be NP-hard by Gillis and Vavasis and a
    number of heuristics have been proposed. It was asked in multiple places if
    there are any approximation algorithms.

    We give the first provable approximation algorithms for $\ell_1$-low rank
    approximation, showing that it is possible to achieve approximation factor
    $\alpha = (\log d) \cdot \mathrm{poly}(k)$ in
    $\mathrm{nnz}(A) + (n+d)\,\mathrm{poly}(k)$ time, where $\mathrm{nnz}(A)$
    denotes the number of non-zero entries of $A$. If $k$ is constant, we
    further improve the approximation ratio to $O(1)$ with a
    $\mathrm{poly}(nd)$-time algorithm. Under the Exponential Time Hypothesis,
    we show there is no $\mathrm{poly}(nd)$-time algorithm achieving a
    $(1 + \frac{1}{\log^{1+\gamma}(nd)})$-approximation, for $\gamma > 0$ an
    arbitrarily small constant, even when $k = 1$.

    We give a number of additional results for $\ell_1$-low rank
    approximation: nearly tight upper and lower bounds for column subset
    selection, CUR decompositions, extensions to low rank approximation with
    respect to $\ell_p$-norms for $1 \leq p < 2$ and earthmover distance,
    low-communication distributed protocols and low-memory streaming
    algorithms, algorithms with limited randomness, and bicriteria algorithms.
    We also give a preliminary empirical evaluation.

    Extracting Actionability from Machine Learning Models by Sub-optimal Deterministic Planning

    Qiang Lyu, Yixin Chen, Zhaorong Li, Zhicheng Cui, Ling Chen, Xing Zhang, Haihua Shen
    Comments: 16 pages, 4 figures
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    A main focus of machine learning research has been improving the
    generalization accuracy and efficiency of prediction models. Many models such
    as SVM, random forest, and deep neural nets have been proposed and achieved
    great success. However, what emerges as missing in many applications is
    actionability, i.e., the ability to turn prediction results into actions. For
    example, in applications such as customer relationship management, clinical
    prediction, and advertisement, the users need not only accurate prediction, but
    also actionable instructions which can transfer an input to a desirable goal
    (e.g., higher repayment profits, lower morbidity rates, higher ad hit
    rates). Existing efforts in deriving such actionable knowledge are few, and
    are limited to simple action models that change only one attribute per
    action. The dilemma is that in many real applications the action models are
    often more complex, which makes extracting an optimal solution harder.

    In this paper, we propose a novel approach that achieves actionability by
    combining learning with planning, two core areas of AI. In particular, we
    propose a framework to extract actionable knowledge from random forest, one of
    the most widely used and best off-the-shelf classifiers. We formulate the
    actionability problem as a sub-optimal action planning (SOAP) problem, which is
    to find a plan to alter certain features of a given input so that the random
    forest would yield a desirable output, while minimizing the total costs of
    actions. Technically, the SOAP problem is formulated in the SAS+ planning
    formalism, and solved using a Max-SAT based approach. Our experimental results
    demonstrate the effectiveness and efficiency of the proposed approach on a
    personal credit dataset and other benchmarks. Our work represents a new
    application of automated planning on an emerging and challenging machine
    learning paradigm.

    Initialization and Coordinate Optimization for Multi-way Matching

    Da Tang, Tony Jebara
    Comments: 18 pages (including the supplementary material), 8 figures
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We consider the problem of consistently matching multiple sets of elements
    to each other, a common task in fields such as computer vision. To solve
    the underlying NP-hard objective, existing methods often relax or
    approximate it, but end up with unsatisfactory empirical performance due to
    their inexact objectives. We propose a coordinate update algorithm that
    directly optimizes the exact objective. By using the pairwise alignment
    information to build an undirected graph and initializing the permutation
    matrices along the edges of its Maximum Spanning Tree, our algorithm
    successfully avoids bad local optima. Theoretically, we show that with high
    probability our algorithm solves this problem optimally on data with a
    reasonable noise level. Empirically, our algorithm consistently and
    significantly outperforms existing methods on several benchmark tasks on
    real datasets.
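
    A minimal sketch of the initialization idea, under the assumption of point
    sets related by noisy permutations: compute pairwise Hungarian matchings
    (scipy's linear_sum_assignment), weight a complete graph by matching
    scores, take the maximum spanning tree, and compose matchings along tree
    edges to obtain globally consistent permutations. The coordinate update
    step that follows in the paper is omitted here.

    ```python
    import numpy as np
    from scipy.optimize import linear_sum_assignment
    from scipy.sparse.csgraph import breadth_first_order, minimum_spanning_tree

    rng = np.random.default_rng(0)
    m, n, dim = 5, 10, 3
    base = rng.normal(size=(n, dim))
    perms = [rng.permutation(n) for _ in range(m)]
    # Each "view" is a randomly permuted, slightly noisy copy of the base points.
    sets = [base[p] + 0.05 * rng.normal(size=(n, dim)) for p in perms]

    def pairwise_match(a, b):
        """Hungarian matching from a to b; returns (assignment, total score)."""
        sim = -((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        rows, cols = linear_sum_assignment(-sim)
        return cols, sim[rows, cols].sum()

    P, W = {}, np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            if i != j:
                P[i, j], W[i, j] = pairwise_match(sets[i], sets[j])

    # Maximum spanning tree on matching scores (negate for scipy's minimum MST),
    # then compose pairwise matchings along tree edges, rooted at set 0.
    tree = (minimum_spanning_tree(-W).toarray() != 0).astype(float)
    tree = tree + tree.T
    order, pred = breadth_first_order(tree, 0, directed=False)
    to_root = {0: np.arange(n)}
    for v in order[1:]:
        to_root[v] = to_root[pred[v]][P[v, pred[v]]]

    # Check the recovered maps against the ground-truth permutations.
    inv0 = np.argsort(perms[0])
    print(all(np.array_equal(to_root[v], inv0[perms[v]]) for v in range(m)))
    ```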

    Multidimensional Binary Search for Contextual Decision-Making

    Ilan Lobel, Renato Paes Leme, Adrian Vladu
    Subjects: Data Structures and Algorithms (cs.DS); Learning (cs.LG)

    We consider a multidimensional search problem that is motivated by
    questions in contextual decision-making, such as dynamic pricing and
    personalized medicine. Nature selects a state from a \(d\)-dimensional unit
    ball and then generates a sequence of \(d\)-dimensional directions. We are
    given access to the directions, but not access to the state. After
    receiving a direction, we have to guess the value of the dot product
    between the state and the direction. Our goal is to minimize the number of
    times when our guess is more than \(\epsilon\) away from the true answer.
    We construct a polynomial time algorithm that we call Projected Volume
    achieving regret \(O(d \log(d/\epsilon))\), which is optimal up to a
    \(\log d\) factor. The algorithm combines a volume cutting strategy with a
    new geometric technique that we call cylindrification.

    Quantum Laplacian Eigenmap

    Yiming Huang, Xiaoyu Li
    Subjects: Quantum Physics (quant-ph); Learning (cs.LG)

    The Laplacian eigenmap algorithm is a typical nonlinear model for
    dimensionality reduction in classical machine learning. We propose an
    efficient quantum Laplacian eigenmap algorithm that exponentially speeds up
    its classical counterpart. In our work, we demonstrate that the Hermitian
    chain product proposed in quantum linear discriminant analysis
    (arXiv:1510.00113, 2015) can be applied to implement the quantum Laplacian
    eigenmap algorithm. While the classical Laplacian eigenmap algorithm
    requires polynomial time to solve the eigenvector problem, our algorithm
    achieves an exponential speedup for nonlinear dimensionality reduction.
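
    For reference, here is a compact sketch of the classical Laplacian eigenmap
    baseline (the quantum algorithm itself does not reduce to a short snippet):
    build heat-kernel affinities, form the graph Laplacian, and embed with the
    bottom nontrivial generalized eigenvectors. The kernel width and the use of
    a dense graph are illustrative choices, not prescribed by the paper.

    ```python
    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.default_rng(0)
    # Toy data: a noisy circle embedded in 5 ambient dimensions.
    t = rng.uniform(0.0, 2.0 * np.pi, 200)
    X = np.zeros((200, 5))
    X[:, 0], X[:, 1] = np.cos(t), np.sin(t)
    X += 0.01 * rng.normal(size=X.shape)

    # Heat-kernel affinities on a dense graph (k-NN graphs are more common).
    D2 = ((X[:, None] - X[None]) ** 2).sum(-1)
    W = np.exp(-D2 / 0.5)
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))
    L = D - W                                 # unnormalized graph Laplacian

    # Solve the generalized problem L y = lambda D y; the smallest eigenvector
    # is constant, so the next two give the 2-D embedding.
    vals, vecs = eigh(L, D)
    Y = vecs[:, 1:3]
    print(Y.shape)                            # (200, 2) embedding coordinates
    ```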


    Information Theory

    Extension Theorems for Various Weight Functions over Frobenius Bimodules

    Heide Gluesing-Luerssen, Tefjol Pllaha
    Subjects: Information Theory (cs.IT); Rings and Algebras (math.RA)

    In this paper we study codes whose alphabet is a finite Frobenius bimodule
    over a finite ring. We discuss the extension property for various weight
    functions. Employing an entirely character-theoretic approach and a duality
    theory for partitions on Frobenius bimodules, we derive alternative proofs
    for the facts that the Hamming weight and the homogeneous weight satisfy
    the extension property. We also use the same techniques to derive the
    extension property for other weights, such as the Rosenbloom-Tsfasman
    weight.
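
    For concreteness, the two weights being compared can be illustrated on the
    Frobenius ring \(\mathbb{Z}_4\), where the standard homogeneous weight
    takes the values \(w(0)=0\), \(w(1)=w(3)=1\), \(w(2)=2\). The snippet is a
    toy example, not part of the paper:

    ```python
    # Hamming weight vs the homogeneous weight on Z_4:
    # w(0)=0, w(1)=w(3)=1, w(2)=2.
    HOM = {0: 0, 1: 1, 2: 2, 3: 1}

    def hamming_weight(word):
        return sum(c % 4 != 0 for c in word)

    def homogeneous_weight(word):
        return sum(HOM[c % 4] for c in word)

    c = (1, 2, 0, 3)
    print(hamming_weight(c), homogeneous_weight(c))  # 3 4
    ```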

    Pilot Distribution Optimization and Power Control in Multi-Cellular Large Scale MIMO Systems

    Jose Carlos Marinello, Taufik Abrao
    Comments: 21 pages, 5 figures; submitted paper for a journal
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    Massive MIMO communication systems have been identified as one of the most
    prominent technologies for next-generation wireless standards, such as 5G,
    due to the large gains in energy and spectral efficiency that they can
    achieve. In the asymptotic regime of an infinite number of antennas at the
    base station (BS), the performance bottleneck of these systems is the pilot
    contamination effect, i.e., the directional interference arising from users
    in adjacent cells that reuse the same set of orthogonal training sequences;
    the interference seen by each user is thus determined by the pilot sequence
    assigned to that user. We show in this paper that system performance can be
    improved by appropriately assigning pilot sequences to users, in a
    so-called pilot allocation scheme. Depending on the optimization metric
    adopted, it is more advantageous for a user with a certain long-term fading
    coefficient to be assigned a specific pilot sequence, whose interference
    can be completely estimated in advance by the BS knowing only the long-term
    fading coefficients of users in adjacent cells. Moreover, if the objective
    is to maximize the number of users with a target quality of service, we
    show that pilot allocation schemes can be combined with power control
    algorithms, yielding much larger improvements for the system. For a unity
    frequency reuse factor, we find that power control increases the data
    throughput provided to 95% of the users from 134 kbps to 1.461 Mbps when no
    pilot allocation is used, and from 793 kbps to 6.743 Mbps when pilot
    allocation is employed. If the reuse factor increases to 3, a 95%-likely
    data throughput of 17.310 Mbps can be assured when pilot allocation and
    power control are suitably combined.
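
    A toy version of the allocation idea, assuming the standard asymptotic
    pilot-contamination SIR (the desired long-term coefficient squared over the
    squared coefficients of pilot-sharing users in other cells): brute-force
    the pilot permutations of two cells to maximize the minimum SIR. The
    coefficients and the exhaustive search are illustrative only; the paper's
    allocation schemes and metrics are more refined.

    ```python
    import numpy as np
    from itertools import permutations

    rng = np.random.default_rng(0)
    L_cells, K = 3, 4                   # cells and pilots (= users per cell)
    # Long-term fading beta[l, j, k]: from BS l to user k of cell j (toy values).
    beta = rng.uniform(0.05, 0.4, size=(L_cells, L_cells, K))
    beta[np.arange(L_cells), np.arange(L_cells)] = rng.uniform(
        0.5, 1.0, (L_cells, K))

    def min_sir(assign):
        """Worst asymptotic SIR under pilot contamination (antennas -> inf):
        only users sharing a pilot in other cells interfere."""
        sirs = []
        for j in range(L_cells):
            for k in range(K):
                p = assign[j][k]        # pilot of user (j, k)
                num = beta[j, j, k] ** 2
                den = sum(beta[j, l, assign[l].index(p)] ** 2
                          for l in range(L_cells) if l != j)
                sirs.append(num / den)
        return min(sirs)

    # Brute-force the pilot permutations of cells 1 and 2 (cell 0 fixed).
    base = tuple(range(K))
    best = max(((p1, p2) for p1 in permutations(base)
                for p2 in permutations(base)),
               key=lambda pp: min_sir([base, pp[0], pp[1]]))
    print("min SIR, identity allocation :", min_sir([base, base, base]))
    print("min SIR, optimized allocation:", min_sir([base, best[0], best[1]]))
    ```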

    Multi-Way Massive MIMO with Maximum-Ratio Processing and Imperfect CSI

    Chung Duc Ho, Hien Quoc Ngo, Michail Matthaiou, Trung Q. Duong
    Subjects: Information Theory (cs.IT)

    This paper considers a multi-way massive multiple-input multiple-output
    relaying system, where single-antenna users exchange their information-bearing
    signals with the help of one relay station equipped with unconventionally many
    antennas. The relay first estimates the channels to all users through the pilot
    signals transmitted from them. Then, the relay uses maximum-ratio processing
    (i.e. maximum-ratio combining in the multiple-access phase and maximum-ratio
    transmission in the broadcast phase) to process the signals. A rigorous
    closed-form expression for the spectral efficiency is derived. The effects of
    the channel estimation error, the channel estimation overhead, the length of
    the training duration, and the randomness of the user locations are analyzed.
    We show that by deploying massive antenna arrays at the relay and simple
    maximum-ratio processing, we can serve many users in the same time-frequency
    resource, while maintaining a given quality-of-service for each user.
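
    The role of the large array can be seen in a few lines of simulation. The
    hedged sketch below generates Rayleigh channels, forms LMMSE channel
    estimates from unit-power orthogonal pilots, and applies maximum-ratio
    combining; the inter-user leakage after combining shrinks as the number of
    relay antennas \(M\) grows, which is what lets simple MR processing serve
    many users at once. Parameters are illustrative, not the paper's setup.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    K, snr_p = 4, 10.0                    # users and pilot SNR (linear scale)

    for M in (16, 64, 256, 1024):
        # i.i.d. Rayleigh channels from the K users to the M relay antennas.
        G = (rng.normal(size=(M, K)) + 1j * rng.normal(size=(M, K))) / np.sqrt(2)
        N = (rng.normal(size=(M, K)) + 1j * rng.normal(size=(M, K))) / np.sqrt(2)
        # Orthogonal pilots: per-user observation, then LMMSE channel estimate.
        Y = np.sqrt(snr_p) * G + N
        G_hat = np.sqrt(snr_p) / (snr_p + 1.0) * Y
        # Maximum-ratio combining in the multiple-access phase.
        A = G_hat.conj().T @ G / M
        leakage = np.abs(A - np.diag(np.diag(A))).max()
        print(f"M={M:5d}  max inter-user leakage after MRC: {leakage:.3f}")
    ```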

    Sparse Support Recovery with Non-smooth Loss Functions

    Kévin Degraux, Gabriel Peyré, Jalal M. Fadili, Laurent Jacques
    Comments: in Proc. NIPS 2016
    Subjects: Information Theory (cs.IT)

    In this paper, we study the support recovery guarantees of underdetermined
    sparse regression using the \(\ell_1\)-norm as a regularizer and a
    non-smooth loss function for data fidelity. More precisely, we focus in
    detail on the cases of \(\ell_1\) and \(\ell_\infty\) losses, and contrast
    them with the usual \(\ell_2\) loss. While these losses are routinely used
    to account for either sparse (\(\ell_1\) loss) or uniform (\(\ell_\infty\)
    loss) noise models, a theoretical analysis of their performance is still
    lacking. In this article, we extend the existing theory from the smooth
    \(\ell_2\) case to these non-smooth cases. We derive a sharp condition
    which ensures that the support of the vector to recover is stable to small
    additive noise in the observations, as long as the loss constraint size is
    tuned proportionally to the noise level. A distinctive feature of our
    theory is that it also explains what happens when the support is unstable.
    While the support is not stable anymore, we identify an “extended support”
    and show that this extended support is stable to small additive noise. To
    exemplify the usefulness of our theory, we give a detailed numerical
    analysis of the support stability/instability of compressed sensing
    recovery with these different losses. This highlights different parameter
    regimes, ranging from total support stability to progressively increasing
    support instability.
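
    For intuition, the constrained programs being compared can be prototyped
    directly, e.g. with cvxpy (an assumption of this sketch, not a tool used in
    the paper): minimize \(\|x\|_1\) subject to \(\|Ax - y\|_q \leq \tau\) with
    \(q \in \{2, \infty\}\), then inspect the recovered supports under uniform
    noise.

    ```python
    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, s = 30, 60, 4
    A = rng.normal(size=(n, d)) / np.sqrt(n)
    x0 = np.zeros(d)
    x0[rng.choice(d, s, replace=False)] = rng.normal(size=s)
    noise = rng.uniform(-0.05, 0.05, n)   # uniform noise: the l_inf model
    y = A @ x0 + noise

    def recover(p, tau):
        """l1-regularized recovery with an l_p data-fidelity constraint."""
        x = cp.Variable(d)
        cp.Problem(cp.Minimize(cp.norm1(x)),
                   [cp.norm(A @ x - y, p) <= tau]).solve()
        return np.sort(np.where(np.abs(x.value) > 1e-4)[0])

    print("true support:", np.sort(np.where(x0 != 0)[0]))
    print("l_inf loss  :", recover("inf", 0.05))
    print("l_2 loss    :", recover(2, 0.05 * np.sqrt(n)))
    ```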

    Phase Shift Keying on the Hypersphere: Power-Efficient MIMO Communications

    Christoph Rachinger, Ralf R. Müller, Johannes B. Huber
    Subjects: Information Theory (cs.IT); Multimedia (cs.MM)

    We present Phase Shift Keying on the Hypersphere (PSKH), a generalization
    of conventional Phase Shift Keying (PSK) for Multiple-Input
    Multiple-Output (MIMO) systems, where data constellations are distributed
    over a multidimensional hypersphere. The use of such constellations, which
    have a Peak-to-Average-Sum-Power Ratio (PASPR) of 1, allows the use of
    load-modulated transmitters that require a much smaller backoff, which in
    turn results in reduced power loss. In this paper we discuss several
    methods for generating PSKH constellations and compare their performance.
    The bandwidth and power efficiency of PSKH systems are strongly influenced
    by the choice of pulse shape, which affects both the spectrum and the PASPR
    of the transmission system. To further reduce the PASPR of the transmitted
    signal, we use spherical interpolation to generate a smooth signal over the
    hypersphere, present the corresponding receiver, and show how
    complexity-reduction techniques influence system performance. Finally, we
    discuss the trade-offs of the presented methods with respect to PASPR,
    spectrum, power efficiency, and receiver complexity.
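
    A hedged sketch of one simple way to generate a PSKH-like constellation:
    draw points on the unit hypersphere and spread them with a few
    pairwise-repulsion steps (a stand-in for the design methods compared in the
    paper). Every symbol then has identical instantaneous sum power, so the
    PASPR is exactly 1 by construction.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    dim, n_sym = 4, 16   # e.g. 2 antennas (I/Q per antenna), 16 symbols

    # Random points on the hypersphere, refined by pairwise repulsion.
    C = rng.normal(size=(n_sym, dim))
    C /= np.linalg.norm(C, axis=1, keepdims=True)
    for _ in range(200):
        diff = C[:, None, :] - C[None, :, :]
        dist = np.linalg.norm(diff, axis=-1) + np.eye(n_sym)
        force = (diff / dist[..., None] ** 3).sum(axis=1)
        C += 0.01 * force
        C /= np.linalg.norm(C, axis=1, keepdims=True)  # project back

    power = (C ** 2).sum(axis=1)
    print("PASPR:", power.max() / power.mean())        # exactly 1
    print("min pairwise distance:", np.min(
        np.linalg.norm(C[:, None] - C[None], axis=-1) + 2 * np.eye(n_sym)))
    ```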

    Informational and Causal Architecture of Continuous-time Renewal and Hidden Semi-Markov Processes

    Sarah E. Marzen, James P. Crutchfield
    Comments: 16 pages, 7 figures; this http URL
    Subjects: Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT); Statistics Theory (math.ST); Chaotic Dynamics (nlin.CD)

    We introduce the minimal maximally predictive models
    (\(\epsilon\)-machines) of processes generated by certain hidden
    semi-Markov models. Their causal states are either hybrid
    discrete-continuous or continuous random variables and causal-state
    transitions are described by partial differential equations. Closed-form
    expressions are given for statistical complexities, excess entropies, and
    differential information anatomy rates. We present a complete analysis of
    the \(\epsilon\)-machines of continuous-time renewal processes and then
    extend this to processes generated by unifilar hidden semi-Markov models
    and semi-Markov models. Our information-theoretic analysis leads to new
    expressions for the entropy rate and the rates of related information
    measures for these very general continuous-time process classes.
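
    Most of the paper's quantities are analytic, but the underlying object is
    easy to simulate. As a sanity check only, the snippet below draws
    gamma-distributed interevent times for a continuous-time renewal process
    and verifies the elementary renewal identity that the long-run event rate
    equals the reciprocal of the mean interevent time.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    # Gamma-distributed interevent (dwell) times for a renewal process.
    shape, scale, n = 2.0, 0.5, 200_000
    dwell = rng.gamma(shape, scale, size=n)

    print("empirical event rate:", n / dwell.sum())
    print("1 / mean dwell time :", 1.0 / (shape * scale))
    ```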

    QoE-based MAC Layer Optimization for Video Teleconferencing over WiFi

    Tianyi Xu, Liangping Ma, Gregory Sternberg
    Subjects: Multimedia (cs.MM); Information Theory (cs.IT)

    In IEEE 802.11, the retry limit is set to the same value for all packets.
    In this paper, we dynamically classify video teleconferencing packets
    based on the type of video frame that a packet carries and on the packet
    loss events that have occurred in the network, and we assign the packets
    different retry limits. We consider the IPPP video encoding structure with
    instantaneous decoder refresh (IDR) frame insertion based on packet loss
    feedback. The loss of a single frame causes error propagation for a period
    of time equal to the packet loss feedback delay. To optimize the video
    quality, we propose a method that concentrates the packet losses into small
    segments of the entire video sequence, and we study its performance with an
    analytic model. Our proposed method is implemented only on the stations
    interested in enhanced video quality, and is compatible with unmodified
    IEEE 802.11 stations and access points in terms of performance. Simulation
    results show that the performance gain can be significant compared to the
    IEEE 802.11 standard, without negatively affecting cross traffic.
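
    A toy rendering of the packet-classification idea (thresholds and frame
    labels are illustrative assumptions, not the paper's values): protect IDR
    recovery frames with a high retry limit, give up quickly on P-frame packets
    that fall inside an already-corrupted segment, and use a moderate limit
    otherwise, thereby concentrating losses into short segments.

    ```python
    # Toy per-packet retry-limit policy in the spirit of the paper. The
    # constants below are hypothetical, chosen only for illustration.
    FEEDBACK_DELAY = 10          # frames until an IDR refresh takes effect

    def retry_limit(frame_type, frames_since_loss):
        if frame_type == "IDR":
            return 7             # recovery frames get maximum protection
        if frames_since_loss is not None and frames_since_loss < FEEDBACK_DELAY:
            # Already inside a corrupted segment: don't waste airtime,
            # concentrate the losses there instead.
            return 1
        return 4                 # clean P-frame: moderate protection

    print(retry_limit("IDR", None))   # 7
    print(retry_limit("P", 3))        # 1 (inside a loss burst)
    print(retry_limit("P", None))     # 4
    ```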



