    arXiv Paper Daily: Thu, 16 Feb 2017

    我爱机器学习 (52ml.net), published 2017-02-16 00:00:00

    Neural and Evolutionary Computing

    Building Robust Stochastic Configuration Networks with Kernel Density Estimation

    Dianhui Wang, Ming Li
    Comments: 16 pages
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)

    This paper aims at developing robust data modelling techniques using
    stochastic configuration networks (SCNs), where a weighted least squares method
    with the well-known kernel density estimation (KDE) is used in the design of
    SCNs. The alternating optimization (AO) technique is applied to iteratively
    build a robust SCN model that reduces the negative impact of corrupted data or
    outliers on the learning process. Simulation studies are carried out on a
    function approximation task and four benchmark datasets, and a case study on an
    industrial application is also reported. Comparisons against other robust
    modelling techniques, including the probabilistic robust learning algorithm for
    neural networks with random weights (PRNNRW) and an Improved RVFL, demonstrate
    that our proposed robust stochastic configuration algorithm with KDE (RSC-KDE)
    performs favourably.
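
    To make the core idea concrete: one alternating-optimization step can be
    sketched as a weighted least-squares solve whose per-sample weights come from a
    KDE over the current residuals. The following is a minimal illustration under
    that assumption (names and the exact weighting rule are illustrative, not the
    authors' reference implementation):

    ```python
    # Hypothetical sketch of one alternating-optimization step for robust
    # least squares with KDE-derived sample weights.
    import numpy as np
    from scipy.stats import gaussian_kde

    def robust_wls_step(H, y, beta):
        """H: hidden-layer outputs (n x L), y: targets (n,), beta: output weights (L,)."""
        residuals = y - H @ beta
        # KDE over residuals: samples whose residuals fall in a high-density
        # region (typical errors) get large weights; outliers get small ones.
        kde = gaussian_kde(residuals)
        w = kde(residuals)
        w = w / w.max()
        # Weighted least squares via the square-root-weight trick.
        sw = np.sqrt(w)[:, None]
        beta_new, *_ = np.linalg.lstsq(sw * H, sw.ravel() * y, rcond=None)
        return beta_new, w

    # Toy usage: y = H @ [1, 2] with a few corrupted targets.
    rng = np.random.default_rng(0)
    H = rng.normal(size=(200, 2))
    y = H @ np.array([1.0, 2.0]) + 0.05 * rng.normal(size=200)
    y[:5] += 10.0                      # outliers
    beta = np.zeros(2)
    for _ in range(5):                 # alternating-optimization loop
        beta, w = robust_wls_step(H, y, beta)
    print(beta)                        # close to [1, 2] despite corruption
    ```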

    Generative Temporal Models with Memory

    Mevlana Gemici, Chia-Chun Hung, Adam Santoro, Greg Wayne, Shakir Mohamed, Danilo J. Rezende, David Amos, Timothy Lillicrap
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We consider the general problem of modeling temporal data with long-range
    dependencies, wherein new observations are fully or partially predictable based
    on temporally-distant, past observations. A sufficiently powerful temporal
    model should separate predictable elements of the sequence from unpredictable
    elements, express uncertainty about those unpredictable elements, and rapidly
    identify novel elements that may help to predict the future. To create such
    models, we introduce Generative Temporal Models augmented with external memory
    systems. They are developed within the variational inference framework, which
    provides both a practical training methodology and methods to gain insight into
    the models’ operation. We show, on a range of problems with sparse, long-term
    temporal dependencies, that these models store information from early in a
    sequence, and reuse this stored information efficiently. This allows them to
    perform substantially better than existing models based on well-known recurrent
    neural networks, like LSTMs.

    Frustratingly Short Attention Spans in Neural Language Modeling

    Michał Daniluk, Tim Rocktäschel, Johannes Welbl, Sebastian Riedel
    Comments: Published as a conference paper at ICLR 2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Neural language models predict the next token using a latent representation
    of the immediate token history. Recently, various methods for augmenting neural
    language models with an attention mechanism over a differentiable memory have
    been proposed. For predicting the next token, these models query information
    from a memory of the recent history which can facilitate learning mid- and
    long-range dependencies. However, conventional attention mechanisms used in
    memory-augmented neural language models produce a single output vector per time
    step. This vector is used both for predicting the next token and as the
    key and value of a differentiable memory of a token history. In this paper, we
    propose a neural language model with a key-value attention mechanism that
    outputs separate representations for the key and value of a differentiable
    memory, as well as for encoding the next-word distribution. This model
    outperforms existing memory-augmented neural language models on two corpora.
    Yet, we found that our method mainly utilizes a memory of the five most recent
    output representations. This led to the unexpected main finding that a much
    simpler model based only on the concatenation of recent output representations
    from previous time steps is on par with more sophisticated memory-augmented
    neural language models.
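
    To make the key-value separation concrete, here is a minimal numpy sketch in
    which the output representation is projected into distinct key, value, and
    prediction parts; the linear projections W_k, W_v, W_p and the scaling are
    illustrative assumptions, not the paper's exact parameterization:

    ```python
    # Minimal numpy sketch of key-value(-predict) attention: the output h_t
    # is projected into separate representations instead of serving all
    # roles at once.
    import numpy as np

    def kv_attention(H, h_t, W_k, W_v, W_p):
        """H: past outputs (T x d); h_t: current output (d,)."""
        keys = H @ W_k          # (T, d) keys of the memory
        values = H @ W_v        # (T, d) values of the memory
        query = h_t @ W_k       # query lives in key space (an assumption)
        scores = keys @ query / np.sqrt(keys.shape[1])
        alpha = np.exp(scores - scores.max())
        alpha /= alpha.sum()    # attention weights over the memory
        context = alpha @ values
        # Separate representation for encoding the next-word distribution.
        return np.tanh(h_t @ W_p + context)

    d, T = 8, 5
    rng = np.random.default_rng(1)
    H, h_t = rng.normal(size=(T, d)), rng.normal(size=d)
    W_k, W_v, W_p = (rng.normal(size=(d, d)) for _ in range(3))
    print(kv_attention(H, h_t, W_k, W_v, W_p).shape)  # (8,)
    ```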


    Computer Vision and Pattern Recognition

    Multi-Task Convolutional Neural Network for Face Recognition

    Xi Yin, Xiaoming Liu
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper explores multi-task learning (MTL) for face recognition. We answer
    the questions of how and why MTL can improve face recognition performance.
    First, we propose a multi-task Convolutional Neural Network (CNN) for face
    recognition where identity recognition is the main task and pose, illumination,
    and expression estimations are the side tasks. Second, we develop a
    dynamic-weighting scheme to automatically assign the loss weight to each side
    task. Third, we propose a pose-directed multi-task CNN by grouping different
    poses to learn pose-specific identity features, simultaneously across all
    poses. We observe that the side tasks serve as regularizations to disentangle
    the variations from the learnt identity features. Extensive experiments on the
    entire Multi-PIE dataset demonstrate the effectiveness of the proposed
    approach. To the best of our knowledge, this is the first work using all data
    in Multi-PIE for face recognition. Our approach is also applicable to
    in-the-wild datasets for pose-invariant face recognition and we achieve
    comparable or better performance than the state of the art on LFW, CFP, and IJB-A.
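
    The dynamic-weighting scheme can be pictured as a combined loss in which
    side-task weights adapt during training. A minimal sketch, assuming
    softmax-normalized learnable logits (an illustrative choice; the paper's exact
    update rule may differ):

    ```python
    # Hedged sketch of dynamic side-task loss weighting: the weights are a
    # softmax over learnable logits, so loss mass can shift between the
    # pose/illumination/expression side tasks during training.
    import numpy as np

    def total_loss(loss_main, side_losses, logits):
        w = np.exp(logits - logits.max())
        w = w / w.sum()                # dynamic weights, sum to 1
        return loss_main + float(w @ np.asarray(side_losses)), w

    loss, w = total_loss(1.2, [0.7, 0.4, 0.9], np.array([0.0, 0.0, 0.0]))
    print(loss, w)   # equal side-task weights before any adaptation
    ```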

    Visual Discovery at Pinterest

    Andrew Zhai, Dmitry Kislyuk, Yushi Jing, Michael Feng, Eric Tzeng, Jeff Donahue, Yue Li Du, Trevor Darrell
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Over the past three years Pinterest has experimented with several visual
    search and recommendation services, including Related Pins (2014), Similar
    Looks (2015), Flashlight (2016) and Lens (2017). This paper presents an
    overview of our visual discovery engine powering these services, and shares the
    rationales behind our technical and product decisions such as the use of object
    detection and interactive user interfaces. We conclude that this visual
    discovery engine significantly improves engagement in both search and
    recommendation tasks.

    Handwritten Arabic Numeral Recognition using Deep Learning Neural Networks

    Akm Ashiquzzaman, Abdul Kawsar Tushar
    Comments: 2017 IEEE International Conference on Imaging, Vision & Pattern Recognition (icIVPR17); 4 pages, 5 figures, 1 table
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Handwritten character recognition is an active area of research with
    applications in numerous fields. Past and recent works in this field have
    concentrated on various languages. Arabic is one language where the scope of
    research is still broad, as it is one of the most popular languages in the
    world and is syntactically different from other major languages. Das et
    al. \cite{DBLP:journals/corr/abs-1003-1891} pioneered the research on
    handwritten digit recognition in Arabic. In this paper, we propose a novel
    algorithm based on deep learning neural networks using an appropriate
    activation function and a regularization layer, which shows significantly
    improved accuracy compared to existing Arabic numeral recognition methods. The
    proposed model gives 97.4 percent accuracy, the highest accuracy recorded on
    the dataset used in the experiment. We also propose a modification of the
    method described in \cite{DBLP:journals/corr/abs-1003-1891}; our modified
    method scores the same accuracy as the original, 93.8 percent.

    Computational Model for Predicting Visual Fixations from Childhood to Adulthood

    Olivier Le Meur, Antoine Coutrot, Zhi Liu, Adrien Le Roch, Andrea Helo, Pia Rama
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    How people look at visual information reveals fundamental information about
    themselves, their interests and their state of mind. While previous visual
    attention models output static 2-dimensional saliency maps, saccadic models aim
    to predict not only where observers look but also how they move their eyes
    to explore the scene. Here we demonstrate that saccadic models are a flexible
    framework that can be tailored to emulate observers’ viewing tendencies. More
    specifically, we use the eye data from 101 observers split into 5 age groups
    (adults, 8-10 y.o., 6-8 y.o., 4-6 y.o. and 2 y.o.) to train our saccadic model
    for different stages of the development of the human visual system. We show
    that the joint distribution of saccade amplitude and orientation is a visual
    signature specific to each age group, and can be used to generate age-dependent
    scanpaths. Our age-dependent saccadic model not only outputs human-like,
    age-specific visual scanpaths, but also significantly outperforms other
    state-of-the-art saliency models. In short, we demonstrate that the
    computational modelling of visual attention, through the use of saccadic
    models, can be efficiently adapted to emulate the gaze behavior of a specific
    group of observers.
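
    The "visual signature" above is a joint distribution over saccade amplitude
    and orientation, which can be estimated per age group as a 2D histogram and
    then sampled to generate scanpaths. A minimal sketch (bin counts are arbitrary
    choices):

    ```python
    # Sketch: an age group's "visual signature" as a joint histogram of
    # saccade amplitude and orientation, computed from successive fixations.
    import numpy as np

    def saccade_signature(fixations, amp_bins=10, ori_bins=12):
        """fixations: (N, 2) array of successive (x, y) gaze positions."""
        d = np.diff(fixations, axis=0)
        amplitude = np.hypot(d[:, 0], d[:, 1])
        orientation = np.arctan2(d[:, 1], d[:, 0])      # radians in [-pi, pi]
        hist, _, _ = np.histogram2d(
            amplitude, orientation, bins=[amp_bins, ori_bins],
            range=[[0, amplitude.max()], [-np.pi, np.pi]])
        return hist / hist.sum()                         # joint distribution

    rng = np.random.default_rng(2)
    fix = rng.uniform(0, 800, size=(500, 2))             # toy gaze data
    print(saccade_signature(fix).shape)                  # (10, 12)
    ```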

    Visualizing Deep Neural Network Decisions: Prediction Difference Analysis

    Luisa M Zintgraf, Taco S Cohen, Tameem Adel, Max Welling
    Comments: ICLR2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    This article presents the prediction difference analysis method for
    visualizing the response of a deep neural network to a specific input. When
    classifying images, the method highlights areas in a given input image that
    provide evidence for or against a certain class. It overcomes several
    shortcomings of previous methods and provides considerable additional insight into the
    decision making process of classifiers. Making neural network decisions
    interpretable through visualization is important both to improve models and to
    accelerate the adoption of black-box classifiers in application areas such as
    medicine. We illustrate the method in experiments on natural images (ImageNet
    data), as well as medical images (MRI brain scans).
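
    The method's core step can be sketched as follows: to score a region of the
    input, marginalize it out with substitute samples and compare the class
    probability with and without it. The sketch below uses simple unconditional
    substitution; the paper's conditional, multivariate sampling is omitted:

    ```python
    # Minimal sketch of prediction difference analysis.
    import numpy as np

    def prediction_difference(model, x, region, samples, eps=1e-8):
        """model: maps x to class probabilities; region: boolean mask over x."""
        p = model(x)
        p_marg = np.zeros_like(p)
        for s in samples:                  # substitutes drawn from other inputs
            x_mod = x.copy()
            x_mod[region] = s[region]
            p_marg += model(x_mod)
        p_marg /= len(samples)
        # Positive entries: the region provides evidence FOR that class.
        return np.log2(p + eps) - np.log2(p_marg + eps)

    rng = np.random.default_rng(3)
    w = rng.normal(size=16)

    def model(x):                          # toy 2-class softmax "network"
        z = np.array([x @ w, -(x @ w)])
        e = np.exp(z - z.max())
        return e / e.sum()

    x = rng.normal(size=16)
    region = np.zeros(16, dtype=bool)
    region[:4] = True                      # analyze the first four features
    samples = rng.normal(size=(20, 16))
    print(prediction_difference(model, x, region, samples))
    ```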

    Deep Multi-camera People Detection

    Tatjana Chavdarova, François Fleuret
    Comments: Under review at ICIP 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep architectures are currently the top performing methods for monocular
    pedestrian detection. Surprisingly, they have not been applied in the
    multi-camera set-up. This is probably in large part due to the lack of
    large-scale labeled multi-camera data-sets with overlapping fields of view. Our
    main contribution is a strategy in which we re-use a pre-trained object
    detection network, fine-tune it on a large-scale monocular pedestrian data-set,
    and train an architecture which combines multiple instances of it on a small
    multi-camera data-set.

    We evaluate performance on both a new HD multi-view data-set and the
    standard PETS 2009 data-set, on which we outperform state-of-the-art methods.

    Normalized Total Gradient: A New Measure for Multispectral Image Registration

    Shu-Jie Chen, Hui-Liang Shen
    Comments: 12 pages, 11 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Image registration is a fundamental issue in multispectral image processing.
    In filter wheel based multispectral imaging systems, the non-coplanar placement
    of the filters always causes the misalignment of multiple channel images. The
    selective characteristic of spectral response in multispectral imaging raises
    two challenges to image registration. First, the intensity levels of a local
    region may be different in individual channel images. Second, the local
    intensity may vary rapidly in some channel images while remaining stationary in
    others. Conventional multimodal measures, such as mutual information,
    correlation coefficient, and correlation ratio, can register images with
    different regional intensity levels, but will fail in the circumstance of
    severe local intensity variation. In this paper, a new measure, namely
    normalized total gradient (NTG), is proposed for multispectral image
    registration. The NTG is applied on the difference between two channel images.
    This measure is based on the key assumption (observation) that the gradient of
    difference image between two aligned channel images is sparser than that
    between two misaligned ones. A registration framework, which incorporates image
    pyramid and global/local optimization, is further introduced for rigid
    transform. Experimental results validate that the proposed method is effective
    for multispectral image registration and performs better than conventional
    methods.
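
    A plausible reading of the measure, following the stated key assumption, is
    the gradient energy of the difference image normalized by the gradient
    energies of the two channels; the exact normalization used in the paper may
    differ. A minimal sketch under that assumption:

    ```python
    # Hedged sketch of a normalized total gradient between two channel
    # images. The key assumption: the difference of two ALIGNED channels
    # has a sparse gradient, so smaller values indicate better alignment.
    import numpy as np

    def ntg(i1, i2, eps=1e-12):
        gy, gx = np.gradient(i1 - i2)
        g1y, g1x = np.gradient(i1)
        g2y, g2x = np.gradient(i2)
        num = np.abs(gx).sum() + np.abs(gy).sum()
        den = (np.abs(g1x).sum() + np.abs(g1y).sum()
               + np.abs(g2x).sum() + np.abs(g2y).sum())
        return num / (den + eps)   # smaller = better aligned

    rng = np.random.default_rng(4)
    img = rng.random((64, 64))
    print(ntg(img, img))                         # 0.0 for identical channels
    print(ntg(img, np.roll(img, 3, axis=1)))     # larger when misaligned
    ```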

    A deep learning model integrating FCNNs and CRFs for brain tumor segmentation

    Xiaomei Zhao, Yihong Wu, Guidong Song, Zhenye Li, Yazhuo Zhang, Yong Fan
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Accurate and reliable brain tumor segmentation is a critical component in
    cancer diagnosis, treatment planning, and treatment outcome evaluation. Deep
    learning techniques are appealing for their capability of learning high level
    task-adaptive image features and have been adopted in brain tumor segmentation
    studies. However, most of the existing deep learning based brain tumor
    segmentation methods are not equipped to ensure appearance and spatial
    consistency of the segmentation results. To improve tumor segmentation
    performance, we propose a novel brain tumor segmentation method by integrating
    fully convolutional neural networks (FCNNs) and Conditional Random Fields
    (CRFs) in a unified framework, rather than adopting the CRFs as a
    post-processing step of the tumor segmentation. Our segmentation model was
    trained using image patches and image slices in the following steps: 1)
    training FCNNs using image patches; 2) training CRF-RNN using image slices of
    axial view with parameters of FCNNs fixed; and 3) fine-tuning the whole network
    using image slices. Our method can segment brain images slice by slice, much
    faster than patch-based tumor segmentation methods, and it requires only 3
    imaging modalities (Flair, T1c, T2) rather than 4 (Flair, T1, T1c, T2). We
    evaluated our method on imaging data provided by the Multimodal Brain Tumor
    Image Segmentation Challenge (BRATS) 2013 and the BRATS 2016. Our method ranked
    1st on the 2013 Leaderboard dataset, 2nd on the 2013 Challenge dataset, and 1st
    in the multi-temporal evaluation of BRATS 2016.

    Application of Multi-channel 3D-cube Successive Convolution Network for Convective Storm Nowcasting

    Wei Zhang, Lei Han, Juanzhen Sun, Hanyang Guo, Jie Dai
    Comments: 9 pages, 9 figures, 3 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Convective storm nowcasting has attracted substantial attention in various
    fields. Existing methods under a deep learning framework rely primarily on
    radar data. Although they perform well at nowcasting storm advection, it is still
    challenging to nowcast storm initiation and growth, due to the limitations of
    the radar observations. This paper describes the first attempt to nowcast storm
    initiation, growth, and advection simultaneously under a deep learning
    framework using multi-source meteorological data. To this end, we present a
    multi-channel 3D-cube successive convolution network (3D-SCN). As real-time
    re-analysis meteorological data can now provide valuable atmospheric boundary
    layer thermal dynamic information, which is essential to predict storm
    initiation and growth, both raw 3D radar and re-analysis data are used directly
    without any handcrafted feature engineering. These data are formulated as
    multi-channel 3D cubes that are fed into our network and convolved by
    cross-channel 3D convolutions. By stacking successive convolutional layers
    without pooling, we build an end-to-end trainable model for nowcasting.
    Experimental results show that deep learning methods achieve better performance
    than traditional extrapolation methods. The qualitative analyses of 3D-SCN show
    encouraging results of nowcasting of storm initiation, growth, and advection.
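
    The architecture reduces to stacked cross-channel 3D convolutions with no
    pooling, ending in a prediction head. An illustrative PyTorch sketch (channel
    counts, kernel sizes, and depth are placeholders, not the paper's
    configuration):

    ```python
    # Illustrative sketch of a multi-channel 3D-cube successive convolution
    # stack: cross-channel 3D convolutions stacked without pooling.
    import torch
    import torch.nn as nn

    class SuccessiveConv3D(nn.Module):
        def __init__(self, in_channels=4, width=16, n_layers=4):
            super().__init__()
            layers, c = [], in_channels
            for _ in range(n_layers):          # no pooling between layers
                layers += [nn.Conv3d(c, width, kernel_size=3, padding=1),
                           nn.ReLU(inplace=True)]
                c = width
            self.features = nn.Sequential(*layers)
            self.head = nn.Conv3d(width, 1, kernel_size=1)  # nowcast map

        def forward(self, x):                  # x: (B, C, D, H, W) data cubes
            return self.head(self.features(x))

    # 4 input channels, e.g. radar plus re-analysis variables on a 3D grid.
    x = torch.randn(2, 4, 8, 32, 32)
    print(SuccessiveConv3D()(x).shape)         # torch.Size([2, 1, 8, 32, 32])
    ```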

    Recognizing Dynamic Scenes with Deep Dual Descriptor based on Key Frames and Key Segments

    Sungeun Hong, Jongbin Ryu, Woobin Im, Hyun S. Yang
    Comments: 26 pages, 7 figures, 8 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Dynamic scene recognition is the challenging problem of characterizing a
    collection of static appearances and dynamic patterns in moving scenes. While
    existing methods focus on reliable capturing of static and dynamic information,
    few works have explored frame selection from a dynamic scene sequence. In this
    paper, we propose dynamic scene recognition using a deep dual descriptor based
    on ‘key frames’ and ‘key segments.’ A small number of key frames that reflect
    the feature distribution of the sequence are used to capture salient static
    appearances. Key segments, captured from the area around each key frame,
    provide additional discriminative power through dynamic patterns within short
    time intervals. To this end, two types of transferred
    convolutional neural network features are used in our approach. A fully
    connected layer is used to select the key frames and key segments, while the
    convolutional layer is used to describe them. We conducted experiments using
    public datasets. Owing to a lack of benchmark datasets, we also constructed a
    dataset comprising 23 dynamic scene classes with 10 videos per class. The
    evaluation results demonstrated the state-of-the-art performance of the
    proposed method.

    Deep Heterogeneous Feature Fusion for Template-Based Face Recognition

    Navaneeth Bodla, Jingxiao Zheng, Hongyu Xu, Jun-Cheng Chen, Carlos Castillo, Rama Chellappa
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Although deep learning has yielded impressive performance for face
    recognition, many studies have shown that different networks learn different
    feature maps: while some networks are more receptive to pose and illumination,
    others appear to capture more local information. Thus, in this work, we propose
    a deep heterogeneous feature fusion network to exploit the complementary
    information present in features generated by different deep convolutional
    neural networks (DCNNs) for template-based face recognition, where a template
    refers to a set of still face images or video frames from different sources
    which introduces more blur, pose, illumination and other variations than
    traditional face datasets. The proposed approach efficiently fuses the
    discriminative information of different deep features by 1) jointly learning
    the non-linear high-dimensional projection of the deep features and 2)
    generating a more discriminative template representation which preserves the
    inherent geometry of the deep features in the feature space. Experimental
    results on the IARPA Janus Challenge Set 3 (Janus CS3) dataset demonstrate that
    the proposed method can effectively improve the recognition performance. In
    addition, we also present a series of covariate experiments on the face
    verification task for in-depth qualitative evaluations for the proposed
    approach.

    Analyzing the Weighted Nuclear Norm Minimization and Nuclear Norm Minimization based on Group Sparse Representation

    Zhiyuan Zha, Xinggan Zhang, Qiong Wang, Yechao Bai, Lan Tang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Nuclear norm minimization (NNM) tends to over-shrink the rank components
    and treats the different rank components equally, which limits its capability
    and flexibility in practical applications. Recent advances suggest that
    weighted nuclear norm minimization (WNNM) is more appropriate than NNM, but a
    plausible mathematical explanation of why is still lacking. In this paper, we
    analyze WNNM and NNM from the viewpoint of group sparse representation (GSR).
    First, an adaptive dictionary for each group is designed. Then we show
    mathematically that WNNM is more appropriate than NNM. We apply the proposed
    scheme to two typical low-level vision tasks: image deblurring and image
    compressive sensing (CS) recovery. Experimental results demonstrate that the
    proposed scheme outperforms many state-of-the-art methods.
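
    The NNM/WNNM contrast is easiest to see in their proximal steps: NNM
    soft-thresholds every singular value by the same constant, while WNNM shrinks
    each singular value by its own weight. A worked numpy sketch (the weight
    choice shown is a common illustrative one, not necessarily the paper's):

    ```python
    # NNM vs. WNNM proximal steps on a matrix X: uniform vs. per-component
    # shrinkage of the singular values.
    import numpy as np

    def nnm_prox(X, lam):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return U @ np.diag(np.maximum(s - lam, 0)) @ Vt

    def wnnm_prox(X, weights):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return U @ np.diag(np.maximum(s - weights, 0)) @ Vt

    rng = np.random.default_rng(5)
    X = rng.normal(size=(8, 8))
    _, s, _ = np.linalg.svd(X)
    # One common weight choice: inversely proportional to the singular
    # value, so large (informative) components are shrunk less.
    w = 1.0 / (s + 1e-6)
    print(np.linalg.matrix_rank(nnm_prox(X, 1.0)),
          np.linalg.matrix_rank(wnnm_prox(X, w)))
    ```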

    Learning from Ambiguously Labeled Face Images

    Ching-Hui Chen, Vishal M. Patel, Rama Chellappa
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Learning a classifier from ambiguously labeled face images is challenging
    since training images are not always explicitly labeled. For instance, face
    images of two persons in a news photo are not explicitly labeled by their names
    in the caption. We propose a Matrix Completion for Ambiguity Resolution (MCar)
    method for predicting the actual labels from ambiguously labeled images. This
    step is followed by learning a standard supervised classifier from the
    disambiguated labels to classify new images. To prevent the majority labels
    from dominating the result of MCar, we generalize MCar to a weighted MCar
    (WMCar) that handles label imbalance. Since WMCar outputs a soft labeling
    vector of reduced ambiguity for each instance, we can iteratively refine it by
    feeding it as the input to WMCar. Nevertheless, such an iterative
    implementation can be affected by the noisy soft labeling vectors, and thus the
    performance may degrade. Our proposed Iterative Candidate Elimination (ICE)
    procedure makes the iterative ambiguity resolution possible by gradually
    eliminating a portion of the least likely candidates in ambiguously labeled faces.
    We further extend MCar to incorporate the labeling constraints between
    instances when such prior knowledge is available. Compared to existing methods,
    our approach demonstrates improvement on several ambiguously labeled datasets.

    ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes

    Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Nießner
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A key requirement for leveraging supervised deep learning methods is the
    availability of large, labeled datasets. Unfortunately, in the context of RGB-D
    scene understanding, very little data is available — current datasets cover a
    small range of scene views and have limited semantic annotations. To address
    this issue, we introduce ScanNet, an RGB-D video dataset containing 2.5M views
    in 1513 scenes annotated with 3D camera poses, surface reconstructions, and
    semantic segmentations. To collect this data, we designed an easy-to-use and
    scalable RGB-D capture system that includes automated surface reconstruction
    and crowdsourced semantic annotation. We show that using this data helps
    achieve state-of-the-art performance on several 3D scene understanding tasks,
    including 3D object classification, semantic voxel labeling, and CAD model
    retrieval. The dataset is freely available at this http URL

    Enhanced Facial Recognition Framework based on Skin Tone and False Alarm Rejection

    Ali Sharifara, Mohd Shafry Mohd Rahim, Farhad Navabifar, Dylan Ebert, Amir Ghaderi, Michalis Papakostas
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Face detection is one of the challenging tasks in computer vision. Human face
    detection plays an essential role in the first stage of face processing
    applications such as face recognition, face tracking, image database
    management, etc. In these applications, face objects often come from an
    inconsequential part of images that contain variations, namely different
    illumination, poses, and occlusion. These variations can noticeably decrease
    the face detection rate. Most existing face detection approaches are not
    accurate, as they have not been able to resolve unstructured images due to
    large appearance variations and can only detect human faces under one
    particular variation. Existing frameworks of face detection need enhancements
    to detect human faces under the stated variations to improve detection rate and
    reduce detection time. In this study, an enhanced face detection framework is
    proposed to improve detection rate based on skin color and provide a validation
    process. A preliminary segmentation of the input images based on skin color can
    significantly reduce search space and accelerate the process of human face
    detection. The primary detection is based on Haar-like features and the
    Adaboost algorithm. A validation process is introduced to reject non-face
    objects, which might occur during the face detection process. The validation
    process is based on two-stage Extended Local Binary Patterns. The experimental
    results on the CMU-MIT and Caltech 10000 datasets, covering a wide range of
    facial variations in color, position, scale, and lighting conditions, indicate
    a high face detection rate.
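
    The pipeline shape described above (skin-color pre-segmentation to shrink the
    search space, then Haar/AdaBoost detection, then validation) can be sketched
    with OpenCV as follows; the skin bounds are commonly used approximations, and
    the extended-LBP validation stage is only indicated by a comment:

    ```python
    # Hedged OpenCV sketch: skin-color pre-segmentation in YCrCb followed
    # by a Haar-cascade detector on the skin regions.
    import cv2
    import numpy as np

    def detect_faces(bgr):
        ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
        # Approximate skin bounds in Cr/Cb; tune per dataset.
        lo = np.array((0, 133, 77), np.uint8)
        hi = np.array((255, 173, 127), np.uint8)
        skin = cv2.inRange(ycrcb, lo, hi)
        masked = cv2.bitwise_and(bgr, bgr, mask=skin)
        gray = cv2.cvtColor(masked, cv2.COLOR_BGR2GRAY)
        cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        # Primary Haar/AdaBoost detection; a two-stage extended-LBP
        # validation of each candidate would follow here.
        return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    img = cv2.imread("group_photo.jpg")     # hypothetical input image
    if img is not None:
        print(detect_faces(img))
    ```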

    Filling missing data in point clouds by merging structured and unstructured point clouds

    Franziska Lippoldt, Hartmut Schwandt
    Comments: 6 pages, 1 figure, in preparation
    Subjects: Computational Geometry (cs.CG); Computer Vision and Pattern Recognition (cs.CV); Discrete Mathematics (cs.DM)

    Point clouds arising from structured data, mainly as a result of CT scans,
    provide special properties regarding the distribution of points and the
    distances between them. Yet often, the amount of data provided cannot compare
    to unstructured point clouds, i.e. data that arises from 3D light scans or
    laser scans. This article proposes an approach to extend structured data and
    enhance its quality by inserting selected points from an unstructured point
    cloud. The resulting point cloud still has a partial structure that is called
    “half-structure”. In this way, missing data that cannot be optimally recovered
    through other surface reconstruction methods can be completed.
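
    The merging idea can be sketched as copying points from the unstructured
    cloud only where the structured cloud has no nearby point, which yields the
    “half-structured” result; the distance threshold below is an assumption:

    ```python
    # Hedged sketch of merging a structured and an unstructured point cloud.
    import numpy as np
    from scipy.spatial import cKDTree

    def merge_clouds(structured, unstructured, gap_radius=0.05):
        tree = cKDTree(structured)
        d, _ = tree.query(unstructured, k=1)     # distance to nearest structured point
        fill = unstructured[d > gap_radius]      # points covering the gaps
        return np.vstack([structured, fill])

    rng = np.random.default_rng(6)
    s = rng.random((1000, 3))
    s = s[s[:, 0] > 0.3]                          # structured cloud with a hole
    u = rng.random((5000, 3))                     # dense unstructured scan
    print(merge_clouds(s, u).shape)
    ```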


    Artificial Intelligence

    A Spacetime Approach to Generalized Cognitive Reasoning in Multi-scale Learning

    Mark Burgess
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    In modern machine learning, pattern recognition replaces realtime semantic
    reasoning. The mapping from input to output is learned with fixed semantics by
    training outcomes deliberately. This is an expensive and static approach which
    depends heavily on the availability of a very particular kind of prior training
    data to make inferences in a single step. Conventional semantic network
    approaches, on the other hand, base multi-step reasoning on modal logics and
    handcrafted ontologies, which are ad hoc, expensive to construct, and
    fragile to inconsistency. Both approaches may be enhanced by a hybrid approach,
    which completely separates reasoning from pattern recognition. In this report,
    a quasi-linguistic approach to knowledge representation is discussed, motivated
    by spacetime structure. Tokenized patterns from diverse sources are integrated
    to build a lightly constrained and approximately scale-free network. This is
    then parsed with very simple recursive algorithms to generate
    ‘brainstorming’ sets of reasoned knowledge.

    Local Search for Minimum Weight Dominating Set with Two-Level Configuration Checking and Frequency Based Scoring Function

    Yiyuan Wang, Shaowei Cai, Minghao Yin
    Comments: 29 pages, 1 figure
    Journal-ref: JAIR 58 (2017) 267-295
    Subjects: Artificial Intelligence (cs.AI)

    The Minimum Weight Dominating Set (MWDS) problem is an important
    generalization of the Minimum Dominating Set (MDS) problem with extensive
    applications. This paper proposes a new local search algorithm for the MWDS
    problem, which is based on two new ideas. The first idea is a heuristic called
    two-level configuration checking (CC2), which is a new variant of a recent
    powerful configuration checking strategy (CC) for effectively avoiding the
    recent search paths. The second idea is a novel scoring function based on the
    frequency with which vertices become uncovered. Our algorithm is called CC2FS,
    after the names of the two ideas. The experimental results show that
    CC2FS performs much better than some state-of-the-art algorithms in terms of
    solution quality on a broad range of MWDS benchmarks.
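
    For intuition, plain (one-level) configuration checking forbids re-adding a
    vertex until the state of some neighbor has changed; CC2 extends the rule to
    two-level neighborhoods. A minimal sketch of the one-level rule only:

    ```python
    # Hedged sketch of (one-level) configuration checking in a local search:
    # a removed vertex may only re-enter the candidate solution after the
    # state of some neighbor has changed.
    def cc_allows(v, conf_changed):
        return conf_changed[v]

    def flip(v, in_solution, adj, conf_changed):
        in_solution[v] = not in_solution[v]
        conf_changed[v] = False            # v's own configuration resets
        for u in adj[v]:                   # neighbors' configurations changed
            conf_changed[u] = True

    # Tiny triangle graph: flipping vertex 0 unblocks its neighbors.
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    in_sol = {v: True for v in adj}
    changed = {v: True for v in adj}
    flip(0, in_sol, adj, changed)
    print([v for v in adj if cc_allows(v, changed)])   # [1, 2]
    ```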

    Developing an ontology for the access to the contents of an archival fonds: the case of the Catasto Gregoriano

    Lina Antonietta Coppola
    Comments: in Italian
    Subjects: Artificial Intelligence (cs.AI); Digital Libraries (cs.DL)

    The research was proposed to exploit and extend the relational and contextual
    nature of the information assets of the Catasto Gregoriano, kept at the
    Archivio di Stato in Rome. Developed within the MODEUS project (Making Open
    Data Effectively Usable), this study originates from the following key ideas of
    MODEUS: to require Open Data to be expressed in terms of an ontology, and to
    include such an ontology as a documentation of the data themselves. Thus, Open
    Data are naturally linked by means of the ontology, which meets the
    requirements of the Linked Open Data vision.

    Entropy Non-increasing Games for the Improvement of Dataflow Programming

    Norbert Bátfai, Renátó Besenczi, Gergő Bogacsovics, Fanny Monori
    Comments: 15 pages, 7 figures
    Subjects: Artificial Intelligence (cs.AI)

    In this article, we introduce a new concept for a family of esport games,
    called Samu Entropy, aimed at improving dataflow program graphs like the ones
    used by Google’s TensorFlow. Currently, the Samu Entropy project
    specifies only requirements for new esport games to be developed with
    particular attention to the investigation of the relationship between esport
    and artificial intelligence. It is quite obvious that there is a very close and
    natural relationship between esport games and artificial intelligence.
    Furthermore, the project Samu Entropy focuses not only on using artificial
    intelligence, but on creating AI in a new way. We present a reference game
    called Face Battle that implements the Samu Entropy requirements.

    On the Discrepancy Between Kleinberg's Clustering Axioms and k-Means Clustering Algorithm Behavior

    Robert Kłopotek, Mieczysław Kłopotek
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    This paper investigates the validity of Kleinberg’s axioms for clustering
    functions with respect to the quite popular clustering algorithm k-means.
    While Kleinberg’s axioms have been discussed heavily in the past, we
    concentrate here on the case predominantly relevant for the k-means algorithm,
    that is, behavior embedded in Euclidean space. We point out some contradictions
    and counterintuitive aspects of this axiomatic set within R^m that evidently
    have not been discussed so far. Our results suggest that without clearly
    defining what kind of clusters we expect, we will not be able to construct a
    valid axiomatic system. In particular, we look at the shape of and the gaps
    between clusters. Finally, we demonstrate that there exist several ways to
    reconcile the formulation of the axioms with their intended meaning, and that
    under this reformulation the axioms cease to be contradictory and the
    real-world k-means algorithm conforms to this axiomatic system.

    Efficient Multi-task Feature and Relationship Learning

    Han Zhao, Otilia Stretcu, Renato Negrinho, Alex Smola, Geoff Gordon
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    In this paper we propose a multi-convex framework for multi-task learning
    that improves predictions by learning relationships both between tasks and
    between features. Our framework is a generalization of related methods in
    multi-task learning, that either learn task relationships, or feature
    relationships, but not both. We start with a hierarchical Bayesian model, and
    use the empirical Bayes method to transform the underlying inference problem
    into a multi-convex optimization problem. We propose a coordinate-wise
    minimization algorithm that has a closed form solution for each block
    subproblem. Naively these solutions would be expensive to compute, but by using
    the theory of doubly stochastic matrices, we are able to reduce the underlying
    matrix optimization subproblem into a minimum weight perfect matching problem
    on a complete bipartite graph, and solve it analytically and efficiently. To
    solve the weight learning subproblem, we propose three different strategies,
    including a gradient descent method with linear convergence guarantee when the
    instances are not shared by multiple tasks, and a numerical solution based on
    the Sylvester equation when instances are shared. We demonstrate the efficiency of
    our method on both synthetic datasets and real-world datasets. Experiments show
    that the proposed optimization method is orders of magnitude faster than an
    off-the-shelf projected gradient method, and our model is able to exploit the
    correlation structures among multiple tasks and features.
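
    The Sylvester-equation strategy mentioned above can be made concrete: when a
    block subproblem's first-order condition has the form AW + WB = Q, it can be
    solved directly rather than by iterative descent. A sketch with placeholder
    matrices (not the model's actual blocks):

    ```python
    # Solving a Sylvester equation A W + W B = Q directly.
    import numpy as np
    from scipy.linalg import solve_sylvester

    rng = np.random.default_rng(7)
    A = rng.normal(size=(4, 4))
    A = A @ A.T + 4 * np.eye(4)           # make the problem well-posed
    B = rng.normal(size=(3, 3))
    B = B @ B.T + 4 * np.eye(3)
    Q = rng.normal(size=(4, 3))

    W = solve_sylvester(A, B, Q)          # solves A W + W B = Q
    print(np.allclose(A @ W + W @ B, Q))  # True
    ```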


    Computation and Language

    Automated Identification of Drug-Drug Interactions in Pediatric Congestive Heart Failure Patients

    Daniel Miller
    Comments: Final project report for CS 221: Artificial Intelligence, Fall 2016 at Stanford University
    Subjects: Computation and Language (cs.CL)

    Congestive Heart Failure, or CHF, is a serious medical condition that can
    result in fluid buildup in the body as a result of a weak heart. When the heart
    can’t pump enough blood to efficiently deliver nutrients and oxygen to the
    body, kidney function may be impaired, resulting in fluid retention. CHF
    patients require a broad drug regimen to maintain the delicate system balance,
    particularly between their heart and kidneys. These drugs include ACE
    inhibitors and Beta Blockers to control blood pressure, anticoagulants to
    prevent blood clots, and diuretics to reduce fluid overload. Many of these
    drugs may interact, and potential effects of these interactions must be weighed
    against their benefits. For this project, we consider a set of 44 drugs
    identified as specifically relevant for treating CHF by pediatric cardiologists
    at Lucile Packard Children’s Hospital. This list was generated as part of our
    current work at the LPCH Heart Center. The goal of this project is to identify
    and evaluate potentially harmful drug-drug interactions (DDIs) within pediatric
    patients with Congestive Heart Failure. The identification is done
    autonomously, so that the system can continuously update itself by evaluating
    newly published literature.

    A Dependency-Based Neural Reordering Model for Statistical Machine Translation

    Christian Hadiwinoto, Hwee Tou Ng
    Comments: 7 pages, 3 figures, Proceedings of AAAI-17
    Journal-ref: Proceedings of AAAI-17 (2017)
    Subjects: Computation and Language (cs.CL)

    In machine translation (MT) that involves translating between two languages
    with significant differences in word order, determining the correct word order
    of translated words is a major challenge. The dependency parse tree of a source
    sentence can help to determine the correct word order of the translated words.
    In this paper, we present a novel reordering approach utilizing a neural
    network and dependency-based embeddings to predict whether the translations of
    two source words linked by a dependency relation should remain in the same
    order or should be swapped in the translated sentence. Experiments on
    Chinese-to-English translation show that our approach yields a statistically
    significant improvement of 0.57 BLEU point on benchmark NIST test sets,
    compared to our prior state-of-the-art statistical MT system that uses sparse
    dependency-based reordering features.

    Transfer Learning for Low-Resource Chinese Word Segmentation with a Novel Neural Network

    Jingjing Xu, Xu Sun
    Subjects: Computation and Language (cs.CL)

    Recent work has shown neural networks to be effective for Chinese word
    segmentation. However, these models rely on large-scale data and are less
    effective on low-resource datasets because of insufficient training data.
    We therefore propose a transfer learning method to improve low-resource word
    segmentation by leveraging high-resource corpora. First, we train a teacher
    model on high-resource corpora and then use the learned knowledge to initialize
    a student model. Second, a weighted data similarity method is proposed to train
    the student model on low-resource data with the help of high-resource corpora.
    Finally, given that insufficient data places higher demands on feature
    extraction, we propose a novel neural network that improves feature learning.
    Experimental results show that our work significantly improves performance on
    low-resource datasets, by 2.3% and 1.5% F-score on the PKU and CTB datasets,
    respectively. Furthermore, it achieves state-of-the-art results: 96.1% and
    96.2% F-score on the PKU and CTB datasets. We also explore an asynchronous
    parallel method for neural word segmentation to speed up training; it
    accelerates training substantially and is almost five times faster than the
    serial mode.

    Automated Phrase Mining from Massive Text Corpora

    Jingbo Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R Voss, Jiawei Han
    Subjects: Computation and Language (cs.CL)

    As one of the fundamental tasks in text analysis, phrase mining aims at
    extracting quality phrases from a text corpus. Phrase mining is important in
    various tasks including automatic term recognition, document indexing,
    keyphrase extraction, and topic modeling. Most existing methods rely on
    complex, trained linguistic analyzers, and thus are likely to perform
    unsatisfactorily on text corpora from new domains and genres without extra,
    expensive adaptation. Recently, a few data-driven methods have been developed
    successfully for extraction of phrases from massive domain-specific text.
    However, none of the state-of-the-art models is fully automated because they
    require human experts for designing rules or labeling phrases.

    In this paper, we propose a novel framework for automated phrase mining,
    AutoPhrase, which can achieve high performance with minimal human effort. Two
    new techniques have been developed: (1) by leveraging knowledge bases, a robust
    positive-only distant training method can avoid extra human labeling effort;
    and (2) when the part-of-speech (POS) tagger is available, a POS-guided phrasal
    segmentation model can better understand the syntactic information for the
    particular language and further enhance the performance by considering the
    context. Note that AutoPhrase can support any language as long as a general
    knowledge base (e.g., Wikipedia) in that language is available, while
    benefiting from, but not requiring, a POS tagger. Compared to
    state-of-the-art methods, the new method shows significant improvements in
    effectiveness on five real-world datasets across different domains and languages.

    A case study on using speech-to-translation alignments for language documentation

    Antonios Anastasopoulos, David Chiang
    Comments: to be presented at ComputEL-2
    Subjects: Computation and Language (cs.CL)

    For many low-resource or endangered languages, spoken language resources are
    more likely to be annotated with translations than with transcriptions. Recent
    work exploits such annotations to produce speech-to-translation alignments,
    without access to any text transcriptions. We investigate whether providing
    such information can aid in producing better (mismatched) crowdsourced
    transcriptions, which in turn could be valuable for training speech recognition
    systems, and show that they can indeed be beneficial through a small-scale case
    study as a proof-of-concept. We also present a simple phonetically aware string
    averaging technique that produces transcriptions of higher quality.


    Distributed, Parallel, and Cluster Computing

    Leveraging cloud based big data analytics in knowledge management for enhanced decision making in organizations

    Mohammad Shorfuzzaman
    Journal-ref: International Journal of Distributed and Parallel Systems (IJDPS)
    Vol.8, No.1, January 2017
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computers and Society (cs.CY)

    In the recent past, big data opportunities have gained much momentum for
    enhancing knowledge management in organizations. However, owing to properties
    such as high volume, variety, and velocity, big data can no longer be
    effectively stored and analyzed with traditional data management techniques to
    generate value for knowledge development. Hence, new technologies and
    architectures are required to store and analyze this big data through advanced
    data analytics and in turn generate vital real-time knowledge for effective
    decision making by organizations. More specifically, it is necessary to have a
    single infrastructure which provides the common functionality of knowledge
    management and is flexible enough to handle different types of big data and
    big data analysis tasks. Cloud computing infrastructures capable of storing
    and processing large volumes of data can be used for efficient big data
    processing because they minimize the initial cost of the large-scale computing
    infrastructure demanded by big data analytics. This paper aims to explore the impact of big
    data analytics on knowledge management and proposes a cloud-based conceptual
    framework that can analyze big data in real time to facilitate enhanced
    decision making intended for competitive advantage. Thus, this framework will
    pave the way for organizations to explore the relationship between big data
    analytics and knowledge management which are mostly deemed as two distinct
    entities.

    Adding Concurrency to Smart Contracts

    Thomas Dickerson, Paul Gazzillo, Maurice Herlihy, Eric Koskinen
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Modern cryptocurrency systems, such as Ethereum, permit complex financial
    transactions through scripts called smart contracts. These smart contracts are
    executed many, many times, always without real concurrency. First, all smart
    contracts are serially executed by miners before appending them to the
    blockchain. Later, those contracts are serially re-executed by validators to
    verify that the smart contracts were executed correctly by miners.

    Serial execution limits system throughput and fails to exploit today’s
    concurrent multicore and cluster architectures. Nevertheless, serial execution
    appears to be required: contracts share state, and contract programming
    languages have a serial semantics.

    This paper presents a novel way to permit miners and validators to execute
    smart contracts in parallel, based on techniques adapted from software
    transactional memory. Miners execute smart contracts speculatively in parallel,
    allowing non-conflicting contracts to proceed concurrently, and “discovering” a
    serializable concurrent schedule for a block’s transactions. This schedule is
    captured and encoded as a deterministic fork-join program used by validators to
    re-execute the miner’s parallel schedule deterministically but concurrently.

    Smart contract benchmarks run on a JVM with ScalaSTM show that a speedup
    of 1.33x can be obtained for miners and 1.69x for validators with just three
    concurrent threads.

    A Concurrency-Optimal Binary Search Tree

    Vitaly Aksenov, Vincent Gramoli, Petr Kuznetsov, Anna Malova, Srivatsan Ravi
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    The paper presents the first concurrency-optimal implementation of a
    binary search tree (BST). The implementation, based on a standard sequential
    implementation of an internal tree, ensures that every schedule, i.e., every
    interleaving of steps of the sequential code, is accepted unless
    linearizability is violated. To ensure this property, we use a novel read-write
    locking scheme that protects tree edges in addition to nodes.

    Our implementation outperforms state-of-the-art BSTs on most basic
    workloads, which suggests that optimizing the set of accepted schedules of the
    sequential code can be an adequate design principle for efficient concurrent
    data structures.

    A parallel implementation of the Synchronised Louvain method

    Benjamin Chiêm, Andine Havelange, Paul Van Dooren
    Comments: 20 pages
    Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC); Social and Information Networks (cs.SI)

    Community detection in networks is a very active and important field of
    research with applications in many areas. But as the amount of processed data
    keeps increasing, existing algorithms need to be adapted for very large
    graphs. The objective of this project was to parallelise the Synchronised
    Louvain Method, a community detection algorithm developed by Arnaud Browet, in
    order to improve its performance in terms of computation time and thus be able
    to detect communities in very large graphs faster. To reach this goal, we used
    the OpenMP API to parallelise the algorithm and then carried out performance
    tests. We studied the computation time and speedup of the parallelised
    algorithm and were able to bring out some qualitative trends. We obtained a
    good speedup compared with the theoretical prediction of Amdahl's law. To
    conclude, using the parallel implementation of Browet's algorithm on large
    graphs seems to give good results, both in terms of computation time and
    speedup. Further tests should be carried out in order to obtain more
    quantitative results.


    Learning

    Distributed deep learning on edge-devices: feasibility via adaptive compression

    Corentin Hardy, Erwan Le Merrer, Bruno Sericola
    Subjects: Learning (cs.LG)

    The state-of-the-art results provided by deep learning come at the price of
    an intensive use of computing resources. The leading frameworks (e.g.,
    TensorFlow) are executed on GPUs or on high-end servers in datacenters. On the
    other end, there is a proliferation of personal devices with possibly free CPU
    cycles. In this paper, we ask the following question: Is distributed deep
    learning computation on WAN connected devices feasible, in spite of the traffic
    caused by learning tasks? We show that such a setup raises some important
    challenges, most notably the ingress traffic that the servers hosting the
    up-to-date model have to sustain. In order to reduce this stress, we propose
    AdaComp, a new algorithm for compressing worker updates to the model on the
    server. Applicable to stochastic gradient descent based approaches, it combines
    efficient gradient selection and learning rate modulation. We then experiment
    and measure the impact of compression and device reliability on the accuracy of
    learned models. To do so, we leverage an emulator platform we developed, that
    embeds the TensorFlow code into Linux containers. We report a reduction of the
    total amount of data sent by workers to the server by two orders of magnitude
    (e.g., a 191-fold reduction for a convolutional network on the MNIST dataset),
    compared to the standard algorithm based on asynchronous stochastic
    gradient descent, while maintaining model accuracy.
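
    The compression idea can be approximated by transmitting only the
    largest-magnitude components of each worker update; AdaComp's actual selection
    and learning-rate modulation are more adaptive than this fixed top-k stand-in:

    ```python
    # Simplified sketch of compressing worker updates before sending them
    # to the parameter server: keep only the top-k entries by magnitude.
    import numpy as np

    def compress_update(grad, keep_ratio=0.01):
        k = max(1, int(keep_ratio * grad.size))
        idx = np.argpartition(np.abs(grad), -k)[-k:]   # indices of top-k entries
        return idx, grad[idx]                          # sparse update to transmit

    def apply_update(model, idx, values, lr=0.1):
        model[idx] -= lr * values                      # server-side sparse apply
        return model

    rng = np.random.default_rng(8)
    model = rng.normal(size=10_000)
    grad = rng.normal(size=10_000)
    idx, vals = compress_update(grad)
    print(len(idx), "of", grad.size, "entries sent")   # 100 of 10000
    model = apply_update(model, idx, vals)
    ```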

    Generative Temporal Models with Memory

    Mevlana Gemici, Chia-Chun Hung, Adam Santoro, Greg Wayne, Shakir Mohamed, Danilo J. Rezende, David Amos, Timothy Lillicrap
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We consider the general problem of modeling temporal data with long-range
    dependencies, wherein new observations are fully or partially predictable based
    on temporally-distant, past observations. A sufficiently powerful temporal
    model should separate predictable elements of the sequence from unpredictable
    elements, express uncertainty about those unpredictable elements, and rapidly
    identify novel elements that may help to predict the future. To create such
    models, we introduce Generative Temporal Models augmented with external memory
    systems. They are developed within the variational inference framework, which
    provides both a practical training methodology and methods to gain insight into
    the models’ operation. We show, on a range of problems with sparse, long-term
    temporal dependencies, that these models store information from early in a
    sequence, and reuse this stored information efficiently. This allows them to
    perform substantially better than existing models based on well-known recurrent
    neural networks, like LSTMs.

    On the Discrepancy Between Kleinberg's Clustering Axioms and (k)-Means Clustering Algorithm Behavior

    Robert Kłopotek, Mieczysław Kłopotek
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    This paper investigates the validity of Kleinberg’s axioms for clustering
    functions with respect to the quite popular clustering algorithm called
    (k)-means. While Kleinberg’s axioms have been discussed heavily in the past, we
    concentrate here on the case predominantly relevant for (k)-means algorithm,
    that is behavior embedded in Euclidean space. We point at some contradictions
    and counter intuitiveness aspects of this axiomatic set within (mathbb{R}^m)
    that were evidently not discussed so far. Our results suggest that apparently
    without defining clearly what kind of clusters we expect we will not be able to
    construct a valid axiomatic system. In particular we look at the shape and the
    gaps between the clusters. Finally we demonstrate that there exist several ways
    to reconcile the formulation of the axioms with their intended meaning and that
    under this reformulation the axioms stop to be contradictory and the real-world
    (k)-means algorithm conforms to this axiomatic system.

    Efficient Multi-task Feature and Relationship Learning

    Han Zhao, Otilia Stretcu, Renato Negrinho, Alex Smola, Geoff Gordon
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    In this paper we propose a multi-convex framework for multi-task learning
    that improves predictions by learning relationships both between tasks and
    between features. Our framework is a generalization of related methods in
    multi-task learning, that either learn task relationships, or feature
    relationships, but not both. We start with a hierarchical Bayesian model, and
    use the empirical Bayes method to transform the underlying inference problem
    into a multi-convex optimization problem. We propose a coordinate-wise
    minimization algorithm that has a closed-form solution for each block
    subproblem. Naively, these solutions would be expensive to compute, but by using
    the theory of doubly stochastic matrices, we are able to reduce the underlying
    matrix optimization subproblem into a minimum weight perfect matching problem
    on a complete bipartite graph, and solve it analytically and efficiently. To
    solve the weight learning subproblem, we propose three different strategies,
    including a gradient descent method with linear convergence guarantee when the
    instances are not shared by multiple tasks, and a numerical solution based on
    Sylvester equation when instances are shared. We demonstrate the efficiency of
    our method on both synthetic datasets and real-world datasets. Experiments show
    that the proposed optimization method is orders of magnitude faster than an
    off-the-shelf projected gradient method, and our model is able to exploit the
    correlation structures among multiple tasks and features.
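
    To make the two subproblems concrete, here is a hedged sketch of the
    off-the-shelf reductions mentioned above (the cost matrix and all dimensions
    are placeholders, not the paper's quantities):

    ```python
    # Hypothetical stand-ins for the two subproblems, using SciPy solvers.
    import numpy as np
    from scipy.optimize import linear_sum_assignment
    from scipy.linalg import solve_sylvester

    # Matrix subproblem reduced to minimum-weight perfect matching on a
    # complete bipartite graph (Hungarian algorithm):
    cost = np.random.rand(6, 6)                  # placeholder pairwise costs
    rows, cols = linear_sum_assignment(cost)     # optimal perfect matching
    print(cost[rows, cols].sum())

    # Shared-instance weight subproblem as a Sylvester equation A X + X B = Q:
    A, B, Q = np.eye(3), 2 * np.eye(4), np.ones((3, 4))
    X = solve_sylvester(A, B, Q)                 # closed-form solve; X = Q / 3
    ```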

    Small Boxes Big Data: A Deep Learning Approach to Optimize Variable Sized Bin Packing

    Feng Mao, Edgar Blanco, Mingang Fu, Rohit Jain, Anurag Gupta, Sebastien Mancel, Rong Yuan, Stephen Guo, Sai Kumar, Yayang Tian
    Comments: The Third IEEE International Conference on Big Data Computing Service and Applications, 2017
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Bin packing problems have been widely studied because of their broad
    applications in different domains. Known as a set of NP-hard problems, they
    have different variations, and many heuristics have been proposed for
    obtaining approximate solutions. Specifically, for the 1D variable sized bin
    packing problem, the two key sets of optimization heuristics are bin
    assignment and bin allocation. Usually the performance of a single static
    optimization heuristic cannot beat that of a dynamic one tailored to each bin
    packing instance. Building such an adaptive system requires modeling the
    relationship between bin features and packing performance profiles. The
    primary drawbacks of traditional machine learning approaches to this task are
    the natural limitations of feature engineering, such as the curse of
    dimensionality and feature selection quality. We introduce a deep learning
    approach that overcomes these drawbacks through a large training data set,
    automatic feature selection and fast, accurate labeling. We show in this paper
    how to build such a system by both theoretical formulation and engineering
    practice. Our prediction system achieves up to 89% training accuracy and 72%
    validation accuracy in selecting the best heuristic, which can generate a
    better-quality bin packing solution.
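
    For context, one static heuristic such a learned selector might choose among
    is first-fit decreasing, sketched below for variable-sized bins (an
    illustration with invented item and bin sizes, not the paper's system):

    ```python
    # First-fit decreasing for variable-sized bins: one static heuristic a
    # learned selector could choose among.
    def first_fit_decreasing(items, bin_sizes):
        """Greedily pack items (largest first), opening the smallest bin
        type that fits whenever no open bin has room."""
        bins = []                                 # [capacity, remaining] pairs
        for item in sorted(items, reverse=True):
            for b in bins:
                if b[1] >= item:
                    b[1] -= item
                    break
            else:
                cap = min(s for s in bin_sizes if s >= item)
                bins.append([cap, cap - item])
        return bins

    print(first_fit_decreasing([4, 8, 1, 4, 2, 1], bin_sizes=[5, 10]))
    ```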

    Support Vector Machines and generalisation in HEP

    Adrian Bevan, Rodrigo Gamboa Goñi, Jon Hays, Tom Stevenson
    Comments: 8 pages, submitted to the proceedings of the CHEP 2016 conference
    Subjects: Data Analysis, Statistics and Probability (physics.data-an); Learning (cs.LG); High Energy Physics – Experiment (hep-ex)

    We review the concept of Support Vector Machines (SVMs) and discuss examples
    of their use in a number of scenarios. Several SVM implementations have been
    used in HEP and we exemplify this algorithm using the Toolkit for Multivariate
    Analysis (TMVA) implementation. We discuss examples relevant to HEP, including
    background suppression for (H to tau^+ tau^-) at the LHC with several different
    kernel functions. Performance benchmarking leads to the issue of generalisation
    of hyper-parameter selection. The avoidance of fine tuning (overtraining or
    overfitting) in MVA hyper-parameter optimisation, i.e. the ability to ensure
    generalised performance of an MVA that is independent of the training,
    validation and test samples, is of utmost importance. We discuss this issue and
    compare and contrast performance of hold-out and k-fold cross-validation. We
    have extended the SVM functionality and introduced tools to facilitate cross
    validation in TMVA and present results based on these improvements.
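
    The extended implementation lives in C++ TMVA; purely to illustrate the k-fold
    hyper-parameter selection methodology in runnable form, here is a scikit-learn
    sketch on synthetic data:

    ```python
    # k-fold cross-validated hyper-parameter search for an RBF SVM, shown
    # with scikit-learn on synthetic data (the paper's tooling is the C++
    # TMVA framework; this only illustrates the methodology).
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
    search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5)   # 5-fold CV
    search.fit(X, y)
    print(search.best_params_, search.best_score_)
    ```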

    Nearest Labelset Using Double Distances for Multi-label Classification

    Hyukjun Gweon, Matthias Schonlau, Stefan Steiner
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Multi-label classification is a type of supervised learning where an instance
    may belong to multiple labels simultaneously. Predicting each label
    independently has been criticized for not exploiting any correlation between
    labels. In this paper we propose a novel approach, Nearest Labelset using
    Double Distances (NLDD), that predicts the labelset observed in the training
    data that minimizes a weighted sum of the distances in both the feature space
    and the label space to the new instance. The weights specify the relative
    tradeoff between the two distances. The weights are estimated from a binomial
    regression of the number of misclassified labels as a function of the two
    distances. Model parameters are estimated by maximum likelihood. NLDD only
    considers labelsets observed in the training data, thus implicitly taking into
    account label dependencies. Experiments on benchmark multi-label data sets show
    that the proposed method on average outperforms other well-known approaches in
    terms of Hamming loss, 0/1 loss, and multi-label accuracy and ranks second
    after ECC on the F-measure.
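
    A simplified sketch of the NLDD prediction rule follows (the weights are fixed
    by hand here, whereas the paper fits them by binomial regression; all data and
    function names are synthetic):

    ```python
    # Simplified NLDD prediction rule; the weights w are fixed by hand here,
    # whereas the paper fits them via binomial regression.
    import numpy as np

    def nldd_predict(X_train, Y_train, p_hat, x_new, w=(1.0, 1.0)):
        """p_hat: per-label probability estimates for x_new."""
        best, best_score = None, np.inf
        for labelset in np.unique(Y_train, axis=0):  # observed labelsets only
            mask = (Y_train == labelset).all(axis=1)
            d_feat = np.linalg.norm(X_train[mask] - x_new, axis=1).min()
            d_lab = np.linalg.norm(labelset - p_hat)
            score = w[0] * d_feat + w[1] * d_lab
            if score < best_score:
                best, best_score = labelset, score
        return best

    rng = np.random.default_rng(0)
    X_train, Y_train = rng.random((50, 4)), rng.integers(0, 2, (50, 3))
    print(nldd_predict(X_train, Y_train, np.array([0.9, 0.1, 0.6]),
                       rng.random(4)))
    ```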

    A Spacetime Approach to Generalized Cognitive Reasoning in Multi-scale Learning

    Mark Burgess
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    In modern machine learning, pattern recognition replaces realtime semantic
    reasoning. The mapping from input to output is learned with fixed semantics by
    deliberately training on outcomes. This is an expensive and static approach
    which depends heavily on the availability of a very particular kind of prior
    training data to make inferences in a single step. Conventional semantic
    network approaches, on the other hand, base multi-step reasoning on modal
    logics and handcrafted ontologies, which are ad hoc, expensive to construct,
    and fragile to inconsistency. Both approaches may be enhanced by a hybrid
    approach which completely separates reasoning from pattern recognition. In
    this report, a quasi-linguistic approach to knowledge representation is
    discussed, motivated by spacetime structure. Tokenized patterns from diverse
    sources are integrated to build a lightly constrained and approximately
    scale-free network. This network is then parsed with very simple recursive
    algorithms to generate 'brainstorming' sets of reasoned knowledge.

    Frustratingly Short Attention Spans in Neural Language Modeling

    Michał Daniluk, Tim Rocktäschel, Johannes Welbl, Sebastian Riedel
    Comments: Published as a conference paper at ICLR 2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Neural language models predict the next token using a latent representation
    of the immediate token history. Recently, various methods for augmenting neural
    language models with an attention mechanism over a differentiable memory have
    been proposed. For predicting the next token, these models query information
    from a memory of the recent history which can facilitate learning mid- and
    long-range dependencies. However, conventional attention mechanisms used in
    memory-augmented neural language models produce a single output vector per time
    step. This vector is used both for predicting the next token as well as for the
    key and value of a differentiable memory of a token history. In this paper, we
    propose a neural language model with a key-value attention mechanism that
    outputs separate representations for the key and value of a differentiable
    memory, as well as for encoding the next-word distribution. This model
    outperforms existing memory-augmented neural language models on two corpora.
    Yet, we found that our method mainly utilizes a memory of the five most recent
    output representations. This led to the unexpected main finding that a much
    simpler model based only on the concatenation of recent output representations
    from previous time steps is on par with more sophisticated memory-augmented
    neural language models.
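
    To make the key-value separation concrete, here is a generic numpy sketch (the
    weight matrices W_k, W_v, W_q and the scaled dot-product form are our own
    placeholder choices, not the authors' exact parameterization):

    ```python
    # Generic scaled dot-product attention with separate key/value maps.
    import numpy as np

    def kv_attention(H, W_k, W_v, W_q):
        """H: (T, d) hidden states; the last state queries the history."""
        keys, values = H[:-1] @ W_k, H[:-1] @ W_v   # memory of past steps
        query = H[-1] @ W_q
        scores = keys @ query / np.sqrt(query.size)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                    # softmax over history
        return weights @ values                     # context for prediction

    d, T = 16, 8
    rng = np.random.default_rng(1)
    H = rng.normal(size=(T, d))
    ctx = kv_attention(H, *(rng.normal(size=(d, d)) for _ in range(3)))
    print(ctx.shape)   # (16,)
    ```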

    Building Robust Stochastic Configuration Networks with Kernel Density Estimation

    Dianhui Wang, Ming Li
    Comments: 16 pages
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)

    This paper aims at developing robust data modelling techniques using
    stochastic configuration networks (SCNs), where a weighted least squares method
    with the well-known kernel density estimation (KDE) is used in the design of
    SCNs. The alternating optimization (AO) technique is applied for iteratively
    building a robust SCN model that can reduce the negative impact of corrupted
    data or outliers on the learning process. Simulation studies are carried out
    on a function approximation problem and four benchmark datasets; a case study
    on an industrial application is also reported. Comparisons against other
    robust modelling techniques, including the probabilistic robust learning
    algorithm for neural networks with random weights (PRNNRW) and an improved
    RVFL, demonstrate that our proposed robust stochastic configuration algorithm
    with KDE (RSC-KDE) performs favourably.
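
    To make the abstract's weighting idea concrete, here is a minimal sketch (Phi
    stands in for the network's hidden-layer output matrix; the bandwidth h = 0.5
    and the demo data are invented, and the SCN node-construction step is omitted
    entirely):

    ```python
    # Sketch of KDE-weighted least squares with alternating refits.
    import numpy as np

    def kde_weights(residuals, h=0.5):
        """Gaussian KDE of the residuals, evaluated at each residual;
        isolated (outlier) residuals receive low density, hence low weight."""
        diff = residuals[:, None] - residuals[None, :]
        return np.exp(-0.5 * (diff / h) ** 2).mean(axis=1)

    def robust_wls(Phi, y, iters=5):
        beta = np.linalg.lstsq(Phi, y, rcond=None)[0]
        for _ in range(iters):                    # alternating optimization
            w = kde_weights(y - Phi @ beta)
            beta = np.linalg.solve(Phi.T @ (w[:, None] * Phi),
                                   Phi.T @ (w * y))
        return beta

    rng = np.random.default_rng(0)
    Phi = rng.normal(size=(100, 10))
    beta_true = rng.normal(size=10)
    y = Phi @ beta_true
    y[:5] += 20.0                                 # a few gross outliers
    print(np.linalg.norm(robust_wls(Phi, y) - beta_true))
    ```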


    Information Theory

    Quantized Compressed Sensing for Partial Random Circulant Matrices

    Joe-Mei Feng, Felix Krahmer, Rayan Saab
    Comments: 15 pages
    Subjects: Information Theory (cs.IT); Numerical Analysis (math.NA)

    We provide the first analysis of a non-trivial quantization scheme for
    compressed sensing measurements arising from structured sensing matrices.
    Specifically, our analysis studies compressed sensing matrices consisting of
    rows selected at random, without replacement, from a circulant matrix generated
    by a random subgaussian vector. We quantize the measurements using stable,
    possibly one-bit, Sigma-Delta schemes, and use a reconstruction method based on
    convex optimization. We show that the part of the reconstruction error due to
    quantization decays polynomially in the number of measurements. This is in line
    with analogous results on Sigma-Delta quantization associated with random
    Gaussian or subgaussian matrices, and significantly better than results
    associated with the widely assumed memoryless scalar quantization. Moreover, we
    prove that our approach is stable and robust; i.e., the reconstruction error
    degrades gracefully in the presence of non-quantization noise and when the
    underlying signal is not strictly sparse. The analysis relies on results
    concerning subgaussian chaos processes as well as a variation of McDiarmid’s
    inequality.
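
    For readers unfamiliar with Sigma-Delta schemes, the first-order one-bit
    quantizer is a short recursion (a textbook sketch; the paper's convex
    reconstruction step is not shown):

    ```python
    # Minimal first-order one-bit Sigma-Delta quantizer; assumes |y_i| <= 1
    # so that the internal state u stays bounded (stability).
    import numpy as np

    def sigma_delta_1bit(y):
        q = np.empty_like(y)
        u = 0.0                        # internal state
        for i, yi in enumerate(y):
            q[i] = 1.0 if u + yi >= 0 else -1.0
            u = u + yi - q[i]          # noise-shaping state update
        return q

    y = np.random.uniform(-0.9, 0.9, size=64)   # toy "measurements"
    q = sigma_delta_1bit(y)                     # entries in {-1, +1}
    print(np.abs(np.cumsum(y - q)).max())       # running sums stay bounded
    ```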

    Comparison of Polar Decoders with Existing Low-Density Parity-Check and Turbo Decoders

    Alexios Balatsoukas-Stimming, Pascal Giard, Andreas Burg
    Comments: To appear at WCNC’17 Polar Coding in Wireless Communications: Theory and Implementation Workshop
    Subjects: Information Theory (cs.IT)

    Polar codes are a recently proposed family of provably capacity-achieving
    error-correction codes that have received a lot of attention. While their
    theoretical properties render them interesting, their practicality compared to
    other types of codes has not been thoroughly studied. Towards this end, in this
    paper, we perform a comparison of polar decoders against LDPC and Turbo
    decoders that are used in existing communications standards. More specifically,
    we compare both the error-correction performance and the hardware efficiency of
    the corresponding hardware implementations. This comparison enables us to
    identify applications where polar codes are superior to existing
    error-correction coding solutions as well as to determine the most promising
    research direction in terms of the hardware implementation of polar decoders.

    Jamming Resistant Receivers for Massive MIMO

    Tan Tai Do, Emil Björnson, Erik G. Larsson
    Comments: Accepted in the 42nd IEEE Int. Conf. Acoust., Speech, and Signal Process. (ICASSP2017)
    Subjects: Information Theory (cs.IT)

    We design jamming resistant receivers to enhance the robustness of a massive
    MIMO uplink channel against jamming. In the pilot phase, we estimate not only
    the desired channel, but also the jamming channel by exploiting purposely
    unused pilot sequences. The jamming channel estimate is used to construct a
    linear receive filter that reduces the impact the jamming has on the
    achievable rates. The performance of the proposed scheme is evaluated
    analytically and numerically. These results show that the proposed scheme
    greatly improves the achievable rates compared to conventional receivers, and
    that it continues to work well even at high jamming powers.
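
    A toy sketch of the jammer-nulling filter idea (perfect channel estimates, no
    noise, invented dimensions; the paper's estimator and rate analysis are not
    reproduced):

    ```python
    # Toy jammer-nulling receive filter, purely to illustrate the projection.
    import numpy as np

    rng = np.random.default_rng(0)
    M = 64                                                   # BS antennas
    h = (rng.normal(size=M) + 1j * rng.normal(size=M)) / np.sqrt(2)  # user
    g = (rng.normal(size=M) + 1j * rng.normal(size=M)) / np.sqrt(2)  # jammer

    # Project the user's channel onto the orthogonal complement of the
    # (estimated) jamming channel; use the result as the receive filter.
    P = np.eye(M) - np.outer(g, g.conj()) / np.vdot(g, g)
    w = P @ h
    print(abs(np.vdot(w, h)), abs(np.vdot(w, g)))  # large user gain, ~0 jammer
    ```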

    A note on complete classification of ((delta+alpha u^2))-constacyclic codes of length (p^k) over (mathbb{F}_{p^m}+umathbb{F}_{p^m}+u^2mathbb{F}_{p^m})

    Reza Sobhani, Zhonghua Sun, Liqi Wang, Shixin Zhu
    Subjects: Information Theory (cs.IT)

    For units (delta) and (alpha) in (F_{p^m}), the structure of
    ((delta+alpha u^2))-constacyclic codes of length (p^k) over
    (mathbb{F}_{p^m}+umathbb{F}_{p^m}+u^2mathbb{F}_{p^m}) is studied, and
    self-dual ((delta+alpha u^2))-constacyclic codes are analyzed.

    The Rare Eclipse Problem on Tiles: Quantised Embeddings of Disjoint Convex Sets

    Valerio Cambareri, Chunlei Xu, Laurent Jacques
    Comments: 5 pages, 1 figure. A 5-page version of this draft was submitted to SampTA 2017
    Subjects: Information Theory (cs.IT)

    Quantised random embeddings are an efficient dimensionality reduction
    technique which preserves the distances of low-complexity signals up to some
    controllable additive and multiplicative distortions. In this work, we instead
    focus on verifying when this technique preserves the separability of two
    disjoint closed convex sets, i.e., a quantised view of the “rare eclipse
    problem” introduced by Bandeira et al. in 2014. This separability would ensure
    exact classification of signals in such sets from the signatures output by this
    non-linear dimensionality reduction. We here present a result relating the
    embedding’s dimension, its quantiser resolution and the sets’ separation, as
    well as some numerically testable conditions to illustrate it. Experimental
    evidence is then provided in the special case of two (ell_2)-balls, tracing
    the phase transition curves that ensure these sets’ separability in the
    embedded domain.
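
    A small numerical experiment in the spirit of this setup (all parameters
    invented; a dithered uniform quantiser stands in for the paper's map):

    ```python
    # Empirical check: dithered uniform quantisation of a Gaussian random
    # embedding keeps two well-separated point clouds disjoint.
    import numpy as np
    from scipy.spatial.distance import cdist

    rng = np.random.default_rng(0)
    n, m, delta = 50, 25, 0.5                 # ambient dim, embed dim, step
    A = rng.normal(size=(m, n)) / np.sqrt(m)
    dither = rng.uniform(0, delta, size=m)

    def signature(x):
        return delta * np.floor((A @ x + dither) / delta)

    centers = np.zeros(n), np.r_[4.0, np.zeros(n - 1)]   # two distant balls
    clouds = [c + 0.3 * rng.normal(size=(100, n)) / np.sqrt(n)
              for c in centers]
    s1, s2 = (np.array([signature(p) for p in c]) for c in clouds)
    print(cdist(s1, s2).min())   # stays bounded away from zero
    ```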

    A Tractable Framework for Performance Analysis of Dense Multi-Antenna Networks

    Xianghao Yu, Chang Li, Jun Zhang, Khaled B. Letaief
    Comments: 7 pages, 2 figures, accepted to IEEE Int. Conf. Commun. (ICC), Paris, France, May 2017
    Subjects: Information Theory (cs.IT)

    Densifying the network and deploying more antennas at each access point are
    two principal ways to boost the capacity of wireless networks. However, due to
    the complicated distributions of random signal and interference channel gains,
    largely induced by various space-time processing techniques, it is highly
    challenging to quantitatively characterize the performance of dense
    multi-antenna networks. In this paper, using tools from stochastic geometry, a
    tractable framework is proposed for the analytical evaluation of such networks.
    The major result is an innovative representation of the coverage probability
    as an induced (ell_1)-norm of a Toeplitz matrix. This compact representation
    incorporates many existing analytical results on single- and multi-antenna
    networks as special cases, and its evaluation is almost as simple as the
    single-antenna case with Rayleigh fading. To illustrate its effectiveness, we
    apply the proposed framework to investigate two kinds of prevalent dense
    wireless networks, i.e., physical layer security aware networks and
    millimeter-wave networks. In both examples, in addition to tractable analytical
    results of relevant performance metrics, insightful design guidelines are also
    analytically obtained.
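
    One reason this representation is convenient: the induced (ell_1)-norm of a
    matrix is simply its maximum absolute column sum, so the expression is cheap
    to evaluate, as the toy computation below shows (placeholder entries, not the
    paper's coefficients):

    ```python
    # The induced l1-norm of a matrix equals its maximum absolute column sum.
    import numpy as np
    from scipy.linalg import toeplitz

    c = np.array([1.0, 0.5, 0.25, 0.125])        # hypothetical first column
    T = toeplitz(c, np.r_[c[0], np.zeros(3)])    # lower-triangular Toeplitz
    induced_l1 = np.abs(T).sum(axis=0).max()     # ||T||_{1->1}
    print(induced_l1)                            # 1.875 for these entries
    ```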

    Characterizing the Rate-Memory Tradeoff in Cache Networks within a Factor of 2

    Qian Yu, Mohammad Ali Maddah-Ali, A. Salman Avestimehr
    Subjects: Information Theory (cs.IT)

    We consider a basic caching system, where a single server with a database of
    (N) files (e.g. movies) is connected to a set of (K) users through a shared
    bottleneck link. Each user has a local cache memory with a size of (M) files.
    The system operates in two phases: a placement phase, where each cache memory
    is populated up to its size from the database, and a following delivery phase,
    where each user requests a file from the database, and the server is
    responsible for delivering the requested contents. The objective is to design
    the two phases to minimize the load (peak or average) of the bottleneck link.
    We characterize the rate-memory tradeoff of the above caching system within a
    factor of (2.00884) for both the peak rate and the average rate (under uniform
    file popularity); the best previously proved characterizations in the
    literature give factors of (4) and (4.7), respectively. Moreover, in the
    practically important case where the number of files ((N)) is large, we exactly
    characterize the tradeoff for systems with no more than (5) users, and
    characterize the tradeoff within a factor of (2) otherwise. We establish these
    results by developing novel information theoretic outer-bounds for the caching
    problem, which improves the state of the art and gives tight characterization
    in various cases.
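
    For orientation, the classical achievable peak rate of Maddah-Ali and Niesen,
    against which such converse bounds are measured (prior work, not this paper's
    new result), is (R(M) = K(1-M/N)/(1+KM/N)) at the memory points (M = tN/K):

    ```python
    # Classical achievable peak rate of coded caching (Maddah-Ali & Niesen),
    # shown only for context; the paper tightens the converse side.
    def coded_caching_rate(N, K, M):
        """Peak delivery rate (in files) at cache size M, N files, K users."""
        return K * (1 - M / N) / (1 + K * M / N)

    for M in [0, 2, 4, 6, 8, 10]:
        print(M, round(coded_caching_rate(N=10, K=5, M=M), 3))
    ```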

    Time-Invariant LDPC Convolutional Codes

    Dimitris Achlioptas, Hamed Hassani, Wei Liu, Rüdiger Urbanke
    Comments: Submitted to 2017 IEEE International Symposium on Information Theory
    Subjects: Information Theory (cs.IT)

    Spatially coupled codes have been shown to universally achieve the capacity
    for a large class of channels. Many variants of such codes have been introduced
    to date. We discuss a further such variant that is particularly simple and is
    determined by a very small number of parameters. More precisely, we consider
    time-invariant low-density convolutional codes with very large constraint
    lengths.

    We show via simulations that, despite their extreme simplicity, such codes
    still show the threshold saturation behavior known from the spatially coupled
    codes discussed in the literature. Further, we show how the size of the typical
    minimum stopping set is related to basic parameters of the code. Due to their
    simplicity and good performance, these codes might be attractive from an
    implementation perspective.

    GDSP: A Graphical Perspective on the Distributed Storage Systems

    Saeid Sahraei, Michael Gastpar
    Subjects: Information Theory (cs.IT)

    The classical distributed storage problem can be modeled by a (k)-uniform
    complete hyper-graph where vertices represent servers and hyper-edges
    represent users. Hence each hyper-edge should be able to recover the full file
    using only the memories of the vertices associated with it. This paper
    considers the generalization of this problem to arbitrary hyper-graphs
    and to the case of multiple files, where each user is only interested in one, a
    problem we will refer to as the graphical distributed storage problem (GDSP).
    Specifically, we make progress in the analysis of minimum-storage codes for two
    main subproblems of the GDSP which extend the classical model in two
    independent directions: the case of an arbitrary graph with multiple files, and
    the case of an arbitrary hyper-graph with a single file.

    Coverage Analysis for Millimeter Wave Networks: The Impact of Directional Antenna Arrays

    Xianghao Yu, Jun Zhang, Martin Haenggi, Khaled B. Letaief
    Comments: 32 pages, 5 figures, submitted to IEEE Journal on Selected Areas in Communications, under revision
    Subjects: Information Theory (cs.IT)

    Millimeter wave (mmWave) communications is considered a promising technology
    for 5G networks. Exploiting beamforming gains with large-scale antenna arrays
    to combat the increased path loss at mmWave bands is one of its defining
    features. However, previous works on mmWave network analysis usually adopted
    oversimplified antenna patterns for tractability, which can lead to significant
    deviation from the performance with actual antenna patterns. In this paper,
    using tools from stochastic geometry, we carry out a comprehensive
    investigation on the impact of directional antenna arrays in mmWave networks.
    We first present a general and tractable framework for coverage analysis with
    arbitrary distributions for interference power and arbitrary antenna patterns.
    It is then applied to mmWave ad hoc and cellular networks, where two
    sophisticated antenna patterns with desirable accuracy and analytical
    tractability are proposed to approximate the actual antenna pattern. Compared
    with previous works, the proposed approximate antenna patterns help to obtain
    more insights on the role of directional antenna arrays in mmWave networks. In
    particular, it is shown that the coverage probabilities of both types of
    networks increase as a non-decreasing concave function of the antenna array
    size. The analytical results are verified to be effective and reliable through
    simulations, and numerical results also show that large-scale antenna arrays
    are required for satisfactory coverage in mmWave networks.
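
    To see concretely what "actual versus simplified patterns" means, here are
    the standard uniform-linear-array factor and a crude two-level flat-top
    approximation (textbook formulas; the paper's approximate patterns are more
    refined):

    ```python
    # Normalized uniform linear array (ULA) power pattern and a crude
    # two-level "flat-top" approximation of it.
    import numpy as np

    def array_factor(theta, n_ant, spacing=0.5):
        """Normalized ULA power pattern; spacing in wavelengths."""
        psi = 2 * np.pi * spacing * np.cos(theta)
        num = np.sin(n_ant * psi / 2)
        den = n_ant * np.sin(psi / 2)
        af = np.ones_like(psi)
        mask = np.abs(den) > 1e-12          # limit equals 1 where den -> 0
        af[mask] = num[mask] / den[mask]
        return af ** 2

    theta = np.linspace(0, np.pi, 1000)
    g = array_factor(theta, n_ant=16)
    flat_top = np.where(g >= 0.5, 1.0, 0.01)   # two-level approximation
    ```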

    Guess & Check Codes for Deletions and Synchronization

    Serge Kas Hanna, Salim El Rouayheb
    Comments: Submitted to ISIT 2017
    Subjects: Information Theory (cs.IT)

    We consider the problem of constructing codes that can correct (delta)
    deletions occurring in an arbitrary binary string of length (n) bits.
    Varshamov-Tenengolts (VT) codes can correct all possible single deletions
    ((delta=1)) with an asymptotically optimal redundancy. Finding similar codes
    for (delta geq 2) deletions is an open problem. We propose a new family of
    codes, which we call Guess & Check (GC) codes, that can correct, with high
    probability, a constant number of deletions (delta) occurring at uniformly
    random positions within an arbitrary string. The GC codes are based on MDS
    codes and have an asymptotically optimal redundancy that is (Theta(delta log
    n)). We provide deterministic polynomial time encoding and decoding schemes for
    these codes. We also describe the applications of GC codes to file
    synchronization.
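
    The GC construction itself is MDS-based, but the guess-and-check flavour is
    easy to see in the classical (delta=1) setting, where a Varshamov-Tenengolts
    syndrome lets the decoder try every reinsertion ("guess") and keep the one
    whose syndrome matches ("check"):

    ```python
    # Varshamov-Tenengolts style guess-and-check for a single deletion
    # (delta = 1); illustrative only, not the paper's MDS-based construction.
    def vt_syndrome(bits):
        return sum(i * b for i, b in enumerate(bits, start=1)) % (len(bits) + 1)

    def correct_single_deletion(received, n, syndrome):
        """received has length n-1; reinsert one bit so syndromes match."""
        for pos in range(n):                       # guess the deletion position
            for bit in (0, 1):                     # and the deleted bit
                cand = received[:pos] + [bit] + received[pos:]
                if vt_syndrome(cand) == syndrome:  # check against the syndrome
                    return cand
        raise ValueError("no candidate matches")

    word = [1, 0, 1, 1, 0, 1, 0]
    s = vt_syndrome(word)                 # fixed by the code, known a priori
    received = word[:3] + word[4:]        # channel deletes the fourth bit
    assert correct_single_deletion(received, len(word), s) == word
    ```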

    Decentralized Baseband Processing for Massive MU-MIMO Systems

    Kaipeng Li, Rishi Sharan, Yujun Chen, Tom Goldstein, Joseph R. Cavallaro, Christoph Studer
    Comments: 14 pages; submitted to a journal
    Subjects: Information Theory (cs.IT)

    Achieving high spectral efficiency in realistic massive multi-user (MU)
    multiple-input multiple-output (MIMO) wireless systems requires
    computationally-complex algorithms for data detection in the uplink (users
    transmit to base station) and beamforming in the downlink (base station
    transmits to users). Most existing algorithms are designed to be executed on
    centralized computing hardware at the base station (BS), which both results in
    prohibitive complexity for systems with hundreds or thousands of antennas and
    generates raw baseband data rates that exceed the limits of current
    interconnect technology and chip I/O interfaces. This paper proposes a novel
    decentralized baseband processing architecture that alleviates these
    bottlenecks by partitioning the BS antenna array into clusters, each associated
    with independent radio-frequency chains, analog and digital modulation
    circuitry, and computing hardware. For this architecture, we develop novel
    decentralized data detection and beamforming algorithms that only access local
    channel-state information and require low communication bandwidth among the
    clusters. We study the associated trade-offs between error-rate performance,
    computational complexity, and interconnect bandwidth, and we demonstrate the
    scalability of our solutions for massive MU-MIMO systems with thousands of BS
    antennas using reference implementations on a graphic processing unit (GPU)
    cluster.
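
    A toy sketch of the decentralization idea (a noiseless matched-filter example
    with invented dimensions; the paper's detection and beamforming algorithms
    are considerably more involved): each cluster touches only its local channel
    block, and only small (U)-dimensional statistics cross the interconnect.

    ```python
    # Toy decentralized uplink detection: clusters exchange U-dim partial
    # sums and small U x U Gram blocks, never the raw antenna samples.
    import numpy as np

    rng = np.random.default_rng(0)
    B, Mc, U = 4, 32, 8                   # clusters, antennas/cluster, users
    H = [rng.normal(size=(Mc, U)) for _ in range(B)]   # local CSI
    x = rng.normal(size=U)                              # transmitted symbols
    y = [Hc @ x for Hc in H]                            # local observations

    partial = [Hc.T @ yc for Hc, yc in zip(H, y)]       # computed locally
    gram = sum(Hc.T @ Hc for Hc in H)                   # U x U exchange
    x_hat = np.linalg.solve(gram, sum(partial))         # = centralized ZF
    print(np.allclose(x_hat, x))                        # True (noiseless)
    ```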

    On the number of inequivalent MRD codes

    Kai-Uwe Schmidt, Yue Zhou
    Comments: 11 pages
    Subjects: Combinatorics (math.CO); Information Theory (cs.IT)

    Maximum rank-distance (MRD) codes are extremal codes in the space of (m times
    n) matrices over a finite field, equipped with the rank metric. Up to
    generalizations, the classical examples of such codes were constructed in the
    1970s and are today known as Gabidulin codes. Motivated by several recent
    approaches to construct MRD codes that are inequivalent to Gabidulin codes, we
    study the equivalence issue for Gabidulin codes themselves. This shows in
    particular that the family of Gabidulin codes already contains a huge subset of
    MRD codes that are pairwise inequivalent, provided that (2 le m le n-2).



