    arXiv Paper Daily: Fri, 24 Mar 2017

    Posted by 我爱机器学习 (52ml.net) on 2017-03-24 00:00:00

    Neural and Evolutionary Computing

    Classification-based RNN machine translation using GRUs

    Ri Wang, Maysum Panju, Mahmood Gohari
    Comments: 7 pages, 1 figure; graduate course research project
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    We report the results of our classification-based machine translation model,
    built upon the framework of a recurrent neural network using gated recurrent
    units. Unlike other RNN models that attempt to maximize the overall conditional
    log probability of sentences against sentences, our model focuses on a
    classification approach of estimating the conditional probability of the next
    word given the input sequence. This simpler approach using GRUs was hoped to be
    comparable with more complicated RNN models, but achievements in this
    implementation were modest and there remains a lot of room for improving this
    classification approach.

    Generative Adversarial Residual Pairwise Networks for One Shot Learning

    Akshay Mehrotra, Ambedkar Dukkipati
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Deep neural networks achieve unprecedented performance levels over many tasks
    and scale well with large quantities of data, but performance in the low-data
    regime and tasks like one shot learning still lags behind. While recent work
    suggests many hypotheses from better optimization to more complicated network
    structures, in this work we hypothesize that having a learnable and more
    expressive similarity objective is an essential missing component. Towards
    overcoming that, we propose a network design inspired by deep residual networks
    that allows the efficient computation of this more expressive pairwise
    similarity objective. Further, we argue that regularization is key in learning
    with small amounts of data, and propose an additional generator network based
    on the Generative Adversarial Networks where the discriminator is our residual
    pairwise network. This provides a strong regularizer by leveraging the
    generated data samples. The proposed model can generate plausible variations of
    exemplars over unseen classes and outperforms strong discriminative baselines
    for few shot classification tasks. Notably, our residual pairwise network
    design outperforms previous state-of-the-art on the challenging mini-Imagenet
    dataset for one shot learning by getting over 55% accuracy for the 5-way
    classification task over unseen classes.

    The role of zero synapses in unsupervised feature learning

    Haiping Huang
    Comments: 5 pages, 4 figures
    Subjects: Neurons and Cognition (q-bio.NC); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Synapses in real neural circuits can take discrete values, including zero
    (silent or potential) synapses. The computational role of zero synapses in
    unsupervised feature learning of unlabeled noisy data is still unclear, yet
    important to understand how the sparseness of synaptic activity is shaped
    during learning and its relationship with receptive field formation. Here, we
    formulate this kind of sparse feature learning using a statistical mechanics
    approach. We find that learning decreases the fraction of zero synapses, and
    when the fraction decreases rapidly around a critical data size, the
    intrinsically structured receptive field starts to develop. Further increasing
    the data size refines the receptive field, while a very small fraction of zero
    synapses remain to act as contour detectors. This feature is discovered not
    only in a handwritten digits dataset, but also in retinal neural activity
    measured in a natural movie stimuli experiment.

    Adversarial synapses: Hebbian/anti-Hebbian learning optimizes min-max objectives

    Cengiz Pehlevan, Anirvan Sengupta, Dmitri B. Chklovskii
    Subjects: Neurons and Cognition (q-bio.NC); Neural and Evolutionary Computing (cs.NE)

    A promising approach towards understanding neural networks is to view them as
    implementations of online algorithms optimizing principled objectives. Existing
    neural algorithms capturing both neural activity dynamics and synaptic weight
    updates implement the same operation, either minimization or maximization of
    the objective, with respect to each variable. Here, we derive neural networks
    from principled min-max objectives: by minimizing with respect to neural
    activity and feedforward synaptic weights, and maximizing with respect to
    lateral synaptic weights. In turn, the min-max objectives are obtained via the
    Hubbard-Stratonovich (HS) transform of similarity matching objectives. The
    resulting networks perform dimensionality reduction of the input data resorting
    only to biologically plausible local learning rules. The min-max nature of the
    objective is reflected in the antagonism between Hebbian feedforward and
    anti-Hebbian lateral learning in derived networks. We prove that the only
    stable fixed points of the network dynamics correspond to the principal
    subspace projection (PSP) or the principal subspace whitening (PSW). Finally,
    from the min-max objectives we derive novel formulations of dimensionality
    reduction using fractional matrix exponents.


    Computer Vision and Pattern Recognition

    Weakly Supervised Action Learning with RNN based Fine-to-coarse Modeling

    Alexander Richard, Hilde Kuehne, Juergen Gall
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present an approach for weakly supervised learning of human actions. Given
    a set of videos and an ordered list of the occurring actions, the goal is to
    infer start and end frames of the related action classes within the video and
    to train the respective action classifiers without any need for hand labeled
    frame boundaries. To address this task, we propose a combination of a
    discriminative representation of subactions, modeled by a recurrent neural
    network, and a coarse probabilistic model to allow for a temporal alignment and
    inference over long sequences. While this system alone already generates good
    results, we show that the performance can be further improved by adapting
    the number of subactions to the characteristics of the different action
    classes. To this end, we adapt the number of subaction classes by iterating
    realignment and reestimation during training. The proposed system is evaluated
    on two benchmark datasets, the Breakfast and the Hollywood extended dataset,
    showing a competitive performance on various weak learning tasks such as
    temporal action segmentation and action alignment.

    Quality Resilient Deep Neural Networks

    Samuel Dodge, Lina Karam
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We study deep neural networks for classification of images with quality
    distortions. We first show that networks fine-tuned on distorted data greatly
    outperform the original networks when tested on distorted data. However,
    fine-tuned networks perform poorly on quality distortions that they have not
    been trained for. We propose a mixture of experts ensemble method that is
    robust to different types of distortions. The “experts” in our model are
    trained on a particular type of distortion. The output of the model is a
    weighted sum of the expert models, where the weights are determined by a
    separate gating network. The gating network is trained to predict optimal
    weights for a particular distortion type and level. During testing, the network
    is blind to the distortion level and type, yet can still assign appropriate
    weights to the expert models. We additionally investigate weight sharing
    methods for the mixture model and show that improved performance can be
    achieved with a large reduction in the number of unique network parameters.
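
    The weighted-sum output described in this abstract can be sketched in a few
    lines. This is a minimal illustration, not the authors' implementation: the
    softmax normalization of the gating scores, the function names, and the toy
    numbers are all assumptions made for the example.

```python
import math

def softmax(scores):
    """Turn raw gating scores into positive weights that sum to 1 (assumed normalization)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_output(expert_outputs, gate_scores):
    """Weighted sum of per-expert class-probability vectors."""
    weights = softmax(gate_scores)
    n_classes = len(expert_outputs[0])
    return [
        sum(w * out[c] for w, out in zip(weights, expert_outputs))
        for c in range(n_classes)
    ]

# Hypothetical example: two experts, each tuned on one distortion type,
# and gating scores that favor the first expert.
expert_outputs = [[0.9, 0.1], [0.2, 0.8]]
gate_scores = [2.0, 0.0]
combined = mixture_output(expert_outputs, gate_scores)
```

    Because the weights sum to 1 and each expert emits a probability vector, the
    combined output is again a probability vector.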

    A Bag-of-Words Equivalent Recurrent Neural Network for Action Recognition

    Alexander Richard (1), Juergen Gall (1) ((1) University of Bonn)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The traditional bag-of-words approach has found a wide range of applications
    in computer vision. The standard pipeline consists of a generation of a visual
    vocabulary, a quantization of the features into histograms of visual words, and
    a classification step for which usually a support vector machine in combination
    with a non-linear kernel is used. Given large amounts of data, however, the
    model suffers from a lack of discriminative power. This applies particularly
    to action recognition, where the vast amount of video features needs to be
    subsampled for unsupervised visual vocabulary generation. Moreover, the kernel
    computation can be very expensive on large datasets. In this work, we propose a
    recurrent neural network that is equivalent to the traditional bag-of-words
    approach but enables the application of discriminative training. The model
    further allows the kernel computation to be incorporated into the neural network
    directly, solving the complexity issue and allowing the complete
    classification system to be represented within a single network. We evaluate our method on four
    recent action recognition benchmarks and show that the conventional model as
    well as sparse coding methods are outperformed.

    Is Second-order Information Helpful for Large-scale Visual Recognition?

    Peihua Li, Jiangtao Xie, Qilong Wang, Wangmeng Zuo
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    By stacking deeper layers of convolutions and nonlinearity, convolutional
    networks (ConvNets) effectively learn from low-level to high-level features and
    discriminative representations. Since the end goal of large-scale recognition
    is to delineate the complex boundaries of thousands of classes in a
    large-dimensional space, adequate exploration of feature distributions is
    important for realizing the full potential of ConvNets. However, state-of-the-art
    works concentrate only on deeper or wider architecture design, while rarely
    exploring feature statistics higher than first-order. We take a step towards
    addressing this problem. Our method consists in covariance pooling, instead of
    the most commonly used first-order pooling, of high-level convolutional
    features. The main challenges involved are robust covariance estimation given a
    small sample of large-dimensional features and usage of the manifold structure
    of covariance matrices. To address these challenges, we present a Matrix Power
    Normalized Covariance (MPN-COV) method. We develop the forward and backward
    propagation formulas regarding the nonlinear matrix functions such that MPN-COV
    can be trained end-to-end. In addition, we analyze both qualitatively and
    quantitatively its advantage over the widely used Log-Euclidean metric. On the
    ImageNet 2012 validation set, by combining MPN-COV we achieve over 4%, 3% and
    2.5% gains for AlexNet, VGG-M and VGG-16, respectively; integration of MPN-COV
    into 50-layer ResNet outperforms ResNet-101 and is comparable to ResNet-152,
    both of which use first-order, global average pooling.
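
    For readers unfamiliar with the terms, covariance pooling and a matrix power
    can be written generically as below. This is a sketch under stated
    assumptions, not the paper's exact formulation: the symbols x_i (the N
    feature vectors of dimension d), the 1/N normalization, and the exponent
    \alpha > 0 are notation chosen here for illustration.

```latex
\begin{align*}
\mu &= \frac{1}{N}\sum_{i=1}^{N} x_i, &
\Sigma &= \frac{1}{N}\sum_{i=1}^{N} (x_i-\mu)(x_i-\mu)^{\top} \\
\Sigma &= U\,\mathrm{diag}(\lambda_1,\dots,\lambda_d)\,U^{\top}, &
\Sigma^{\alpha} &= U\,\mathrm{diag}(\lambda_1^{\alpha},\dots,\lambda_d^{\alpha})\,U^{\top} \\
& & \log\Sigma &= U\,\mathrm{diag}(\log\lambda_1,\dots,\log\lambda_d)\,U^{\top}
\end{align*}
```

    When N < d the sample covariance is rank-deficient, so some eigenvalues are
    zero. The matrix power remains defined in that case, whereas the matrix
    logarithm used by the Log-Euclidean metric requires strictly positive
    eigenvalues.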

    Generative Adversarial Residual Pairwise Networks for One Shot Learning

    Akshay Mehrotra, Ambedkar Dukkipati
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Deep neural networks achieve unprecedented performance levels over many tasks
    and scale well with large quantities of data, but performance in the low-data
    regime and tasks like one shot learning still lags behind. While recent work
    suggests many hypotheses from better optimization to more complicated network
    structures, in this work we hypothesize that having a learnable and more
    expressive similarity objective is an essential missing component. Towards
    overcoming that, we propose a network design inspired by deep residual networks
    that allows the efficient computation of this more expressive pairwise
    similarity objective. Further, we argue that regularization is key in learning
    with small amounts of data, and propose an additional generator network based
    on the Generative Adversarial Networks where the discriminator is our residual
    pairwise network. This provides a strong regularizer by leveraging the
    generated data samples. The proposed model can generate plausible variations of
    exemplars over unseen classes and outperforms strong discriminative baselines
    for few shot classification tasks. Notably, our residual pairwise network
    design outperforms previous state-of-the-art on the challenging mini-Imagenet
    dataset for one shot learning by getting over 55% accuracy for the 5-way
    classification task over unseen classes.

    Saliency-guided video classification via adaptively weighted learning

    Yunzhen Zhao, Yuxin Peng
    Comments: 6 pages, 1 figure, accepted by ICME 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Video classification is useful in many practical applications, and
    recent deep learning has greatly improved its accuracy. However, existing works
    often model video frames indiscriminately, even though, from the view of motion, video
    frames can be decomposed naturally into salient and non-salient areas. Salient
    and non-salient areas should be modeled with different networks, because the former
    present both appearance and motion information, while the latter present static
    background information. To address this problem, in this paper video saliency
    is first predicted from optical flow without supervision. Then two streams of
    3D CNN are trained individually for raw frames and optical flow on salient
    areas, and another 2D CNN is trained for raw frames on non-salient areas.
    Because these three streams play different roles for each class, the
    weights of each stream are adaptively learned for each class. Experimental
    results show that saliency-guided modeling and adaptively weighted learning can
    reinforce each other, and we achieve state-of-the-art results.

    Sparse Inertial Poser: Automatic 3D Human Pose Estimation from Sparse IMUs

    Timo von Marcard, Bodo Rosenhahn, Michael J. Black, Gerard Pons-Moll
    Comments: 12 pages, Accepted at Eurographics 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

    We address the problem of making human motion capture in the wild more
    practical by using a small set of inertial sensors attached to the body. Since
    the problem is heavily under-constrained, previous methods either use a large
    number of sensors, which is intrusive, or they require additional video input.
    We take a different approach and constrain the problem by: (i) making use of a
    realistic statistical body model that includes anthropometric constraints and
    (ii) using a joint optimization framework to fit the model to orientation and
    acceleration measurements over multiple frames. The resulting tracker Sparse
    Inertial Poser (SIP) enables motion capture using only 6 sensors (attached to
    the wrists, lower legs, back and head) and works for arbitrary human motions.
    Experiments on the recently released TNT15 dataset show that, using the same
    number of sensors, SIP achieves higher accuracy than the dataset baseline
    without using any video data. We further demonstrate the effectiveness of SIP
    on newly recorded challenging motions in outdoor scenarios such as climbing or
    jumping over a wall.

    A learning-based approach to text image retrieval: using CNN features and improved similarity metrics

    Mao Tan, Si-Ping Yuan, Yong-Xin Su
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Learning (cs.LG)

    Text content can be presented visually in different ways with roughly similar
    characters. While conventional text image retrieval depends on complex models for
    OCR-based text recognition and text similarity detection, this paper proposes a
    new learning-based approach to text image retrieval with the purpose of finding
    the original or similar text through a query text image. First, features
    of text images are extracted by a CNN to obtain deep visual
    representations. Then, the dimension of the CNN features is reduced by PCA
    to improve the efficiency of similarity detection. Based on that, an improved
    similarity metric with article theme relevance filtering is proposed to
    improve the retrieval accuracy. In the experiments, we collect a group
    of academic papers in both English and Chinese as the text database and
    cut them into text image pieces. A text image with changed text content is
    used as the query image; experimental results show that the proposed approach
    retrieves the original text content well.
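
    The retrieval step can be illustrated with a plain cosine-similarity
    ranking over feature vectors. This is only a sketch: the abstract does not
    specify the improved metric or the theme relevance filter, so plain cosine
    similarity is swapped in here, and the feature vectors and page names are
    made up for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, database):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical low-dimensional features, standing in for PCA-reduced CNN features.
database = {
    "page_a": [1.0, 0.0, 0.2],
    "page_b": [0.1, 1.0, 0.0],
    "page_c": [0.9, 0.1, 0.3],
}
query = [1.0, 0.0, 0.25]
ranking = rank_by_similarity(query, database)
```

    The first entry of ranking is the database item whose features point in the
    direction closest to the query.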

    Nonlinear Spectral Image Fusion

    Martin Benning, Michael Möller, Raz Z. Nossek, Martin Burger, Daniel Cremers, Guy Gilboa, Carola-Bibiane Schönlieb
    Comments: 13 pages, 9 figures, submitted to SSVM conference proceedings 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (math.NA)

    In this paper we demonstrate that the framework of nonlinear spectral
    decompositions based on total variation (TV) regularization is very well suited
    for image fusion as well as more general image manipulation tasks. The
    well-localized and edge-preserving spectral TV decomposition allows selecting
    frequencies of a certain image to transfer particular features, such as
    wrinkles in a face, from one image to another. We illustrate the effectiveness
    of the proposed approach in several numerical experiments, including a
    comparison to the competing techniques of Poisson image editing, linear
    osmosis, wavelet fusion and Laplacian pyramid fusion. We conclude that the
    proposed spectral TV image decomposition framework is a valuable tool for semi-
    and fully-automatic image editing and fusion.

    Weakly Supervised Object Localization Using Things and Stuff Transfer

    Miaojing Shi, Holger Caesar, Vittorio Ferrari
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose to help weakly supervised object localization for classes where
    location annotations are not available, by transferring things and stuff
    knowledge from a source set with available annotations. The source and target
    classes might share similar appearance (e.g. bear fur is similar to cat fur) or
    appear against similar background (e.g. horse and sheep appear against grass).
    To exploit this, we acquire three types of knowledge from the source set: a
    segmentation model trained on both thing and stuff classes; similarity
    relations between target and source classes; and co-occurrence relations
    between thing and stuff classes in the source. The segmentation model is used
    to generate thing and stuff segmentation maps on a target image, while the
    class similarity and co-occurrence knowledge help refine them. We then
    incorporate these maps as new cues into a multiple instance learning framework
    (MIL), propagating the transferred knowledge from the pixel level to the object
    proposal level. In extensive experiments, we conduct our transfer from the
    PASCAL Context dataset (source) to the ILSVRC, COCO and PASCAL VOC 2007
    datasets (targets). We evaluate our transfer across widely different thing
    classes, including some that are not similar in appearance, but appear against
    similar background. The results demonstrate significant improvement over
    standard MIL, and we outperform the state-of-the-art in the transfer setting.

    Discriminatively Boosted Image Clustering with Fully Convolutional Auto-Encoders

    Fengfu Li, Hong Qiao, Bo Zhang, Xuanyang Xi
    Comments: 27 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Traditional image clustering methods take a two-step approach, feature
    learning and clustering, sequentially. However, recent research results
    demonstrated that combining the separated phases in a unified framework and
    training them jointly can achieve a better performance. In this paper, we first
    introduce fully convolutional auto-encoders for image feature learning and then
    propose a unified clustering framework to learn image representations and
    cluster centers jointly based on a fully convolutional auto-encoder and soft
    k-means scores. At the initial stages of the learning procedure, the
    representations extracted from the auto-encoder may not be very discriminative
    for the subsequent clustering. We address this issue by adopting a boosted
    discriminative distribution, where high score assignments are highlighted and
    low score ones are de-emphasized. With the gradually boosted discrimination,
    clustering assignment scores are discriminated and cluster purities are
    enlarged. Experiments on several vision benchmark datasets show that our
    methods can achieve a state-of-the-art performance.
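
    One common way to realize soft cluster scores and a sharpened ("boosted")
    distribution is shown below. The abstract does not give the paper's
    definitions, so everything here is an assumption for illustration: the
    softmax over negative squared distances, the power-and-renormalize
    boosting, the parameter names temperature and alpha, and the toy data.

```python
import math

def soft_assignments(point, centers, temperature=1.0):
    """Softmax over negative squared distances: closer centers get higher scores."""
    logits = [
        -sum((p - c) ** 2 for p, c in zip(point, center)) / temperature
        for center in centers
    ]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def boost(scores, alpha=2.0):
    """Raise scores to a power alpha > 1 and renormalize, which highlights
    high scores and de-emphasizes low ones."""
    powered = [s ** alpha for s in scores]
    total = sum(powered)
    return [p / total for p in powered]

# Hypothetical 2-D example with two cluster centers.
centers = [[0.0, 0.0], [2.0, 0.0]]
point = [0.8, 0.0]
scores = soft_assignments(point, centers)
boosted = boost(scores)
```

    With alpha > 1 the largest score grows and the others shrink, while the
    result still sums to 1.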

    Image-based Localization using Hourglass Networks

    Iaroslav Melekhov, Juha Ylioinas, Juho Kannala, Esa Rahtu
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we propose an encoder-decoder convolutional neural network
    (CNN) architecture for estimating camera pose (orientation and location) from a
    single RGB-image. The architecture has an hourglass shape consisting of a chain
    of convolution and up-convolution layers followed by a regression part. The
    up-convolution layers are introduced to preserve the fine-grained information
    of the input image. Following the common practice, we train our model in
    an end-to-end manner utilizing transfer learning from large scale classification
    data. The experiments demonstrate the performance of the approach on data
    exhibiting different lighting conditions, reflections, and motion blur. The
    results indicate a clear improvement over the previous state-of-the-art even
    when compared to methods that utilize a sequence of test frames instead of a
    single frame.

    Robust SfM with Little Image Overlap

    Yohann Salaun, Renaud Marlet, Pascal Monasse
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Usual Structure-from-Motion (SfM) techniques require at least trifocal
    overlaps to calibrate cameras and reconstruct a scene. We consider here
    scenarios of reduced image sets with little overlap, possibly as low as two
    images at most seeing the same part of the scene. We propose a new method,
    based on line coplanarity hypotheses, for estimating the relative scale of two
    independent bifocal calibrations sharing a camera, without the need of any
    trifocal information or Manhattan-world assumption. We use it to compute SfM in
    a chain of up-to-scale relative motions. For accuracy, we however also make use
    of trifocal information for line and/or point features, when present, relaxing
    usual trifocal constraints. For robustness to wrong assumptions and mismatches,
    we embed all constraints in a parameterless RANSAC-like approach. Experiments
    show that we can calibrate datasets that previously could not be calibrated, and that this
    wider applicability does not come at the cost of inaccuracy.

    Recurrent Multimodal Interaction for Referring Image Segmentation

    Chenxi Liu, Zhe Lin, Xiaohui Shen, Jimei Yang, Xin Lu, Alan Yuille
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper we are interested in the problem of image segmentation given
    natural language descriptions, i.e. referring expressions. Existing works
    tackle this problem by first modeling images and sentences independently and
    then segmenting images by combining these two types of representations. We argue
    that learning word-to-image interaction is more native in the sense of jointly
    modeling two modalities for the image segmentation task, and we propose
    convolutional multimodal LSTM to encode the sequential interactions between
    individual words, visual information, and spatial information. We show that our
    proposed model outperforms the baseline model on benchmark datasets. In
    addition, we analyze the intermediate output of the proposed multimodal LSTM
    approach and empirically explain how this approach enforces a more effective
    word-to-image interaction.

    Planar Object Tracking in the Wild: A Benchmark

    Pengpeng Liang, Yifan Wu, Haibin Ling
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Planar object tracking plays an important role in computer vision and related
    fields. While several benchmarks have been constructed for evaluating
    state-of-the-art algorithms, there is a lack of video sequences captured in the
    wild rather than in a constrained laboratory environment. In this paper, we
    present a carefully designed planar object tracking benchmark containing 210
    videos of 30 planar objects sampled in the natural environment. In particular,
    for each object, we shoot seven videos involving various challenging factors,
    namely scale change, rotation, perspective distortion, motion blur, occlusion,
    out-of-view, and unconstrained. The ground truth is carefully annotated
    semi-manually to ensure the quality. Moreover, eleven state-of-the-art
    algorithms are evaluated on the benchmark using two evaluation metrics, with
    detailed analysis provided for the evaluation results. We expect the proposed
    benchmark to benefit future studies on planar object tracking.

    Guided Perturbations: Self Corrective Behavior in Convolutional Neural Networks

    Swami Sankaranarayanan, Arpit Jain, Ser Nam Lim
    Comments: 15 pages, 16 figures; Under review for conference
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Convolutional Neural Networks have been a subject of great importance over
    the past decade and great strides have been made in their utility for producing
    state of the art performance in many computer vision problems. However, the
    behavior of deep networks is yet to be fully understood and is still an active
    area of research. In this work, we present an intriguing behavior: pre-trained
    CNNs can be made to improve their predictions by structurally perturbing the
    input. We observe that these perturbations – referred to as Guided Perturbations –
    enable a trained network to improve its prediction performance without any
    learning or change in network weights. We perform various ablative experiments
    to understand how these perturbations affect the local context and feature
    representations. Furthermore, we demonstrate that this idea can improve
    performance of several existing approaches on semantic segmentation and scene
    labeling tasks on the PASCAL VOC dataset and supervised classification tasks on
    MNIST and CIFAR10 datasets.

    Changing Fashion Cultures

    Kaori Abe, Teppei Suzuki, Shunya Ueta, Akio Nakamura, Yutaka Satoh, Hirokatsu Kataoka
    Comments: 9 pages, 9 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB); Multimedia (cs.MM)

    The paper presents a novel concept that analyzes and visualizes worldwide
    fashion trends. Our goal is to reveal cutting-edge fashion trends without
    displaying an ordinary fashion style. To achieve the fashion-based analysis, we
    created a new fashion culture database (FCDB), which consists of 76 million
    geo-tagged images in 16 cosmopolitan cities. By grasping a fashion trend of
    mixed fashion styles, the paper also proposes an unsupervised fashion trend
    descriptor (FTD) using a fashion descriptor, a codeword vector, and temporal
    analysis. To unveil fashion trends in the FCDB, the temporal analysis in FTD
    effectively emphasizes consecutive features between two different times. In
    experiments, we clearly show the analysis of fashion trends and fashion-based
    city similarity. As the result of large-scale data collection and an
    unsupervised analyzer, the proposed approach achieves world-level fashion
    visualization in a time series. The code, model, and FCDB will be publicly
    available after the construction of the project page.

    Bidirectional-Convolutional LSTM Based Spectral-Spatial Feature Learning for Hyperspectral Image Classification

    Qingshan Liu, Feng Zhou, Renlong Hang, Xiaotong Yuan
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper proposes a novel deep learning framework named
    bidirectional-convolutional long short term memory (Bi-CLSTM) network to
    automatically learn the spectral-spatial feature from hyperspectral images
    (HSIs). In the network, the issue of spectral feature extraction is considered
    as a sequence learning problem, and a recurrent connection operator across the
    spectral domain is used to address it. Meanwhile, inspired by the widely used
    convolutional neural network (CNN), a convolution operator across the spatial
    domain is incorporated into the network to extract the spatial feature.
    Besides, to sufficiently capture the spectral information, a bidirectional
    recurrent connection is proposed. In the classification phase, the learned
    features are concatenated into a vector and fed to a softmax classifier via a
    fully-connected operator. To validate the effectiveness of the proposed
    Bi-CLSTM framework, we compare it with several state-of-the-art methods,
    including the CNN framework, on three widely used HSIs. The obtained results
    show that Bi-CLSTM can improve the classification performance as compared to
    other methods.

    Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression

    Aaron S. Jackson, Adrian Bulat, Vasileios Argyriou, Georgios Tzimiropoulos
    Comments: 10 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    3D face reconstruction is a fundamental Computer Vision problem of
    extraordinary difficulty. Current systems often assume the availability of
    multiple facial images (sometimes from the same subject) as input, and must
    address a number of methodological challenges such as establishing dense
    correspondences across large facial poses, expressions, and non-uniform
    illumination. In general these methods require complex and inefficient
    pipelines for model building and fitting. In this work, we propose to address
    many of these limitations by training a Convolutional Neural Network (CNN) on
    an appropriate dataset consisting of 2D images and 3D facial models or scans.
    Our CNN works with just a single 2D facial image, does not require accurate
    alignment or establish dense correspondence between images, works for
    arbitrary facial poses and expressions, and can be used to reconstruct the
    whole 3D facial geometry (including the non-visible parts of the face)
    bypassing the construction (during training) and fitting (during testing) of a
    3D Morphable Model. We achieve this via a simple CNN architecture that performs
    direct regression of a volumetric representation of the 3D facial geometry from
    a single 2D image. We also demonstrate how the related task of facial landmark
    localization can be incorporated into the proposed framework and help improve
    reconstruction quality, especially for the cases of large poses and facial
    expressions. Testing code will be made available online, along with pre-trained
    models.

    Cross-View Image Matching for Geo-localization in Urban Environments

    Yicong Tian, Chen Chen, Mubarak Shah
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we address the problem of cross-view image geo-localization.
    Specifically, we aim to estimate the GPS location of a query street view image
    by finding the matching images in a reference database of geo-tagged bird’s eye
    view images, or vice versa. To this end, we present a new framework for
    cross-view image geo-localization by taking advantage of the tremendous success
    of deep convolutional neural networks (CNNs) in image classification and object
    detection. First, we employ the Faster R-CNN to detect buildings in the query
    and reference images. Next, for each building in the query image, we retrieve
    the \(k\) nearest neighbors from the reference buildings using a Siamese network
    trained on both positive matching image pairs and negative pairs. To find the
    correct NN for each query building, we develop an efficient multiple nearest
    neighbors matching method based on dominant sets. We evaluate the proposed
    framework on a new dataset that consists of pairs of street view and bird’s eye
    view images. Experimental results show that the proposed method achieves better
    geo-localization accuracy than other approaches and is able to generalize to
    images at unseen locations.
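
    The retrieval step in this abstract (fetching the k nearest reference
    buildings for each query building) can be illustrated with a minimal sketch.
    It assumes the Siamese network has already mapped every building crop to a
    fixed-length feature vector and that Euclidean distance is used to compare
    them; both are assumptions, as are the function name `k_nearest` and the toy
    vectors. The dominant-sets matching step is not shown.

```python
import math

def k_nearest(query, references, k):
    """Return indices of the k reference vectors closest to `query`.

    Euclidean distance is an assumed choice for this illustration.
    """
    dists = [(math.dist(query, ref), idx) for idx, ref in enumerate(references)]
    dists.sort()
    return [idx for _, idx in dists[:k]]

# Hypothetical 2-D embeddings of reference buildings.
reference_embeddings = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
query_embedding = (1.0, 1.0)

neighbors = k_nearest(query_embedding, reference_embeddings, k=2)
```

    Each query building would get its own candidate list this way; choosing
    the single correct match among the candidates is a separate step.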

    R-C3D: Region Convolutional 3D Network for Temporal Activity Detection

    Huijuan Xu, Abir Das, Kate Saenko
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We address the problem of activity detection in continuous, untrimmed video
    streams. This is a difficult task that requires extracting meaningful
    spatio-temporal features to capture activities, accurately localizing the start
    and end times of each activity, and also dealing with very large data volumes.
    We introduce a new model, Region Convolutional 3D Network (R-C3D), which
    encodes the video streams using a three-dimensional fully convolutional
    network, then generates candidate temporal regions containing activities, and
    finally classifies selected regions into specific activities. Computation is
    saved due to the sharing of convolutional features between the proposal and the
    classification pipelines. The entire model is trained end-to-end with jointly
    optimized localization and classification losses. R-C3D is faster than existing
    methods (569 frames per second on a single Titan X Maxwell GPU) and achieves
    state-of-the-art results on THUMOS’14 (10% absolute improvement). We further
    demonstrate that our model is a general activity detection framework that does
    not rely on assumptions about particular dataset properties by evaluating our
    approach on ActivityNet and Charades.

    Visually grounded learning of keyword prediction from untranscribed speech

    Herman Kamper, Shane Settle, Gregory Shakhnarovich, Karen Livescu
    Comments: 5 pages, 3 figures, 5 tables
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

    During language acquisition, infants have the benefit of visual cues to
    ground spoken language. Robots similarly have access to audio and visual
    sensors. Recent work has shown that images and spoken captions can be mapped
    into a meaningful common space, allowing images to be retrieved using speech
    and vice versa. In this setting of images paired with untranscribed spoken
    captions, we consider whether computer vision systems can be used to obtain
    textual labels for the speech. Concretely, we use an image-to-words multi-label
    visual classifier to tag images with soft textual labels, and then train a
    neural network to map from the speech to these soft targets. We show that the
    resulting speech system is able to predict which words occur in an
    utterance—acting as a spoken bag-of-words classifier—without seeing any
    parallel speech and text. We find that the model often confuses semantically
    related words, e.g. “man” and “person”, making it even more effective as a
    semantic keyword spotter.

    Recurrent and Contextual Models for Visual Question Answering

    Abhijit Sharang, Eric Lau
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

    We propose a series of recurrent and contextual neural network models for
    multiple choice visual question answering on the Visual7W dataset. Motivated by
    divergent trends in model complexities in the literature, we explore the
    balance between model expressiveness and simplicity by studying incrementally
    more complex architectures. We start with LSTM-encoding of input questions and
    answers; build on this with context generation by LSTM-encodings of neural
    image and question representations and attention over images; and evaluate the
    diversity and predictive power of our models and the ensemble thereof. All
    models are evaluated against a simple baseline inspired by the current
    state-of-the-art, consisting of a simple concatenation of bag-of-words
    and CNN representations for the text and images, respectively. Generally, we
    observe marked variation in image-reasoning performance between our models not
    obvious from their overall performance, as well as evidence of dataset bias.
    Our standalone models achieve accuracies up to 64.6%, while the ensemble of
    all models achieves the best accuracy of 66.67%, within 0.5% of the
    current state-of-the-art for Visual7W.

    Robust Kronecker-Decomposable Component Analysis for Low Rank Modeling

    Mehdi Bahri, Yannis Panagakis, Stefanos Zafeiriou
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV)

    Dictionary learning and component analysis are part of one of the most
    well-studied and active research fields, at the intersection of signal and
    image processing, computer vision, and statistical machine learning. In
    dictionary learning, the current methods of choice are arguably K-SVD and its
    variants, which learn a dictionary (i.e., a decomposition) for sparse coding
    via Singular Value Decomposition. In robust component analysis, leading methods
    derive from Principal Component Pursuit (PCP), which recovers a low-rank matrix
    from sparse corruptions of unknown magnitude and support. While K-SVD is
    sensitive to the presence of noise and outliers in the training set, PCP does
    not provide a dictionary that respects the structure of the data (e.g.,
    images), and requires expensive SVD computations when solved by convex
    relaxation. In this paper, we introduce a new robust decomposition of images by
    combining ideas from sparse dictionary learning and PCP. We propose a novel
    Kronecker-decomposable component analysis which is robust to gross corruption,
    can be used for low-rank modeling, and leverages separability to solve
    significantly smaller problems. We design an efficient learning algorithm by
    drawing links with a restricted form of tensor factorization. The effectiveness
    of the proposed approach is demonstrated on real-world applications, namely
    background subtraction and image denoising, by performing a thorough comparison
    with the current state of the art.


    Artificial Intelligence

    Note Value Recognition for Rhythm Transcription Using a Markov Random Field Model for Musical Scores and Performances of Piano Music

    Eita Nakamura, Kazuyoshi Yoshii, Simon Dixon
    Comments: 12 pages, 15 figures, version submitted to IEEE/ACM TASLP
    Subjects: Artificial Intelligence (cs.AI); Sound (cs.SD)

    This paper presents a statistical method for music transcription that can
    estimate score times of note onsets and offsets from polyphonic MIDI
    performance signals. Because performed note durations can deviate largely from
    score-indicated values, previous methods had the problem of not being able to
    accurately estimate offset score times (or note values) and thus could only
    output incomplete musical scores. Based on observations that the pitch context
    and onset score times are influential on the configuration of note values, we
    construct a context-tree model that provides prior distributions of note values
    using these features and combine it with a performance model in the framework
    of Markov random fields. Evaluation results showed that our method reduces the
    average error rate by around 40 percent compared to existing/simple methods. We
    also confirmed that, in our model, the score model plays a more important role
    than the performance model, and it automatically captures the voice structure
    by unsupervised learning.

    Diversification-Based Learning in Computing and Optimization

    Fred Glover, Jin-Kao Hao
    Comments: 17 pages
    Subjects: Artificial Intelligence (cs.AI)

    Diversification-Based Learning (DBL) derives from a collection of principles
    and methods introduced in the field of metaheuristics that have broad
    applications in computing and optimization. We show that the DBL framework goes
    significantly beyond that of the more recent Opposition-based learning (OBL)
    framework introduced in Tizhoosh (2005), which has become the focus of numerous
    research initiatives in machine learning and metaheuristic optimization. We
    unify and extend earlier proposals in metaheuristic search (Glover, 1997,
    Glover and Laguna, 1997) to give a collection of approaches that are more
    flexible and comprehensive than OBL for creating intensification and
    diversification strategies in metaheuristic search. We also describe potential
    applications of DBL to various subfields of machine learning and optimization.

    An overview of embedding models of entities and relationships for knowledge base completion

    Dat Quoc Nguyen
    Comments: 8 pages, 4 tables. arXiv admin note: substantial text overlap with arXiv:1606.06461, arXiv:1606.08140
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

    Knowledge bases of real-world facts about entities and their relationships
    are useful resources for a variety of natural language processing tasks.
    However, because knowledge bases are typically incomplete, it is useful to be
    able to perform knowledge base completion, i.e., predict whether a relationship
    not in the knowledge base is likely to be true. This article presents an
    overview of embedding models of entities and relationships for knowledge base
    completion, with up-to-date experimental results on two standard evaluation
    tasks of link prediction (i.e. entity prediction) and triple classification.
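
    The two evaluation tasks named here can be illustrated with a small sketch.
    The abstract does not name individual models, so the scoring function below
    is an assumption: a translation-style score in the spirit of TransE, where
    a triple (head, relation, tail) is plausible when head + relation lies close
    to tail. The embeddings are hand-picked toy values, not learned ones, and
    the threshold is arbitrary.

```python
import math

# Toy embeddings, hand-picked for illustration (not learned).
entity_vec = {
    "Paris": (1.0, 0.0),
    "France": (1.0, 1.0),
    "Berlin": (3.0, 0.0),
    "Germany": (3.0, 1.0),
}
relation_vec = {"capital_of": (0.0, 1.0)}

def score(head, relation, tail):
    """Translation-style plausibility: values closer to 0 are more plausible."""
    h, r, t = entity_vec[head], relation_vec[relation], entity_vec[tail]
    translated = tuple(hi + ri for hi, ri in zip(h, r))
    return -math.dist(translated, t)

def rank_tails(head, relation):
    """Link (entity) prediction: rank every entity as a candidate tail."""
    return sorted(entity_vec, key=lambda tail: score(head, relation, tail), reverse=True)

def classify(head, relation, tail, threshold=-0.5):
    """Triple classification: accept the triple if its score clears a threshold."""
    return score(head, relation, tail) >= threshold

ranking = rank_tails("Paris", "capital_of")
```

    Link prediction is usually judged by where the correct entity lands in
    `ranking`, while triple classification is a yes/no decision per triple.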

    Resolving the Complexity of Some Fundamental Problems in Computational Social Choice

    Palash Dey
    Comments: Ph.D. Thesis
    Subjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

    This thesis is in the area of computational social choice, which lies at
    the intersection of algorithms and social choice theory.

    Containment for Rule-Based Ontology-Mediated Queries

    Pablo Barcelo, Gerald Berger, Andreas Pieris
    Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)

    Many efforts have been dedicated to identifying restrictions on ontologies
    expressed as tuple-generating dependencies (tgds), a.k.a. existential rules,
    that lead to decidability of the problem of answering ontology-mediated
    queries (OMQs). This has given rise to three families of formalisms: guarded,
    non-recursive, and sticky sets of tgds. In this work, we study the containment
    problem for OMQs expressed in such formalisms, which is a key ingredient for
    solving static analysis tasks associated with them. Our main contribution is
    the development of specially tailored techniques for OMQ containment under the
    classes of tgds stated above. This enables us to obtain sharp complexity bounds
    for the problems at hand. We also apply our techniques to pinpoint the
    complexity of problems associated with two emerging applications of OMQ
    containment: distribution over components and UCQ rewritability of OMQs.

    Fast Stochastic Variance Reduced Gradient Method with Momentum Acceleration for Machine Learning

    Fanhua Shang, Yuanyuan Liu, James Cheng, Jiacheng Zhuo
    Comments: 19 pages
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)

    Recently, research on accelerated stochastic gradient descent methods (e.g.,
    SVRG) has made exciting progress (e.g., linear convergence for strongly convex
    problems). However, the best-known methods (e.g., Katyusha) require at least
    two auxiliary variables and two momentum parameters. In this paper, we propose
    a fast stochastic variance reduction gradient (FSVRG) method, in which we
    design a novel update rule with Nesterov’s momentum and incorporate the
    technique of growing epoch size. FSVRG has only one auxiliary variable and one
    momentum weight, and thus it is much simpler and has much lower per-iteration
    complexity. We prove that FSVRG achieves linear convergence for strongly convex
    problems and the optimal \(\mathcal{O}(1/T^2)\) convergence rate for non-strongly
    convex problems, where \(T\) is the number of outer-iterations. We also extend
    FSVRG to directly solve the problems with non-smooth component functions, such
    as SVM. Finally, we empirically study the performance of FSVRG for solving
    various machine learning problems such as logistic regression, ridge
    regression, Lasso and SVM. Our results show that FSVRG outperforms the
    state-of-the-art stochastic methods, including Katyusha.
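
    For readers unfamiliar with the variance-reduction idea that FSVRG builds
    on, the following is a minimal sketch of plain SVRG, not of FSVRG itself:
    it has no momentum term and uses a fixed epoch size rather than a growing
    one. The problem (scalar least squares), the step size, and all names are
    assumptions chosen to keep the example short.

```python
import random

def svrg_least_squares(xs, ys, step=0.01, epochs=20, inner_steps=200, seed=0):
    """Plain SVRG on f(w) = (1/n) * sum_i 0.5 * (w * x_i - y_i)**2, scalar w."""
    rng = random.Random(seed)
    n = len(xs)

    def grad_i(w, i):
        # Gradient of the i-th component function.
        return (w * xs[i] - ys[i]) * xs[i]

    snapshot = 0.0
    for _ in range(epochs):
        # Full gradient at the snapshot, computed once per outer iteration.
        full_grad = sum(grad_i(snapshot, i) for i in range(n)) / n
        w = snapshot
        for _ in range(inner_steps):
            i = rng.randrange(n)
            # Variance-reduced gradient estimate: unbiased, and its variance
            # shrinks as w and the snapshot approach the optimum.
            g = grad_i(w, i) - grad_i(snapshot, i) + full_grad
            w -= step * g
        snapshot = w
    return snapshot

# Toy data generated by y = 2x, so the minimizer is w = 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w_hat = svrg_least_squares(xs, ys)
```

    The snapshot plays the role of the single auxiliary variable in plain
    SVRG; accelerated variants add momentum on top of this basic loop.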

    Unsupervised Basis Function Adaptation for Reinforcement Learning

    Edward Barker, Charl Ras
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    When using reinforcement learning (RL) algorithms to evaluate a policy it is
    common, given a large state space, to introduce some form of approximation
    architecture for the value function (VF). The exact form of this architecture
    can have a significant effect on the accuracy of the VF estimate, however, and
    determining a suitable approximation architecture can often be a highly complex
    task. Consequently there is a large amount of interest in the potential for
    allowing RL algorithms to adaptively generate (i.e. to learn) approximation
    architectures.

    We investigate a method of adapting approximation architectures which uses
    feedback regarding the frequency with which an agent has visited certain states
    to guide which areas of the state space to approximate with greater detail. We
    introduce an algorithm based upon this idea which adapts a state aggregation
    approximation architecture on-line.

    Assuming \(S\) states, we demonstrate theoretically that – provided the
    following relatively non-restrictive assumptions are satisfied: (a) the number
    of cells \(X\) in the state aggregation architecture is of order
    \(\sqrt{S}\ln{S}\log_2{S}\) or greater, (b) the policy and transition function
    are close to deterministic, and (c) the prior for the transition function is
    uniformly distributed – our algorithm can guarantee, assuming we use an
    appropriate scoring function to measure VF error, error which is arbitrarily
    close to zero as \(S\) becomes large. It is able to do this despite having only
    \(O(X\log_2{S})\) space complexity (and negligible time complexity). We conclude
    by generating a set of empirical results which support the theoretical results.
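
    To get a feel for condition (a), the short sketch below evaluates the
    expression sqrt(S) * ln(S) * log2(S) for a few values of S. It ignores the
    constant factors hidden in "of order", so the numbers are only indicative;
    the function name `min_cells` and the chosen values of S are assumptions.

```python
import math

def min_cells(num_states):
    """Evaluate sqrt(S) * ln(S) * log2(S) for S = num_states (constants ignored)."""
    s = float(num_states)
    return math.sqrt(s) * math.log(s) * math.log2(s)

for s in (10**4, 10**6, 10**8):
    x = min_cells(s)
    print(f"S = {s:>9}: sqrt(S) ln S log2 S ~ {x:,.0f}  (ratio to S: {x / s:.4f})")
```

    For these values the ratio of the expression to S falls as S grows, which
    is why the required number of cells becomes a small fraction of the state
    space for large problems.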

    Guided Perturbations: Self Corrective Behavior in Convolutional Neural Networks

    Swami Sankaranarayanan, Arpit Jain, Ser Nam Lim
    Comments: 15 pages, 16 figures; Under review for conference
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Convolutional Neural Networks have been a subject of great importance over
    the past decade and great strides have been made in their utility for producing
    state of the art performance in many computer vision problems. However, the
    behavior of deep networks is yet to be fully understood and is still an active
    area of research. In this work, we present an intriguing behavior: pre-trained
    CNNs can be made to improve their predictions by structurally perturbing the
    input. We observe that these perturbations – referred to as Guided Perturbations –
    enable a trained network to improve its prediction performance without any
    learning or change in network weights. We perform various ablative experiments
    to understand how these perturbations affect the local context and feature
    representations. Furthermore, we demonstrate that this idea can improve
    performance of several existing approaches on semantic segmentation and scene
    labeling tasks on the PASCAL VOC dataset and supervised classification tasks on
    MNIST and CIFAR10 datasets.

    Information-theoretic Model Identification and Policy Search using Physics Engines with Application to Robotic Manipulation

    Shaojun Zhu, Andrew Kimmel, Abdeslam Boularias
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We consider the problem of a robot learning the mechanical properties of
    objects through physical interaction with the object, and introduce a
    practical, data-efficient approach for identifying the motion models of these
    objects. The proposed method utilizes a physics engine, where the robot seeks
    to identify the inertial and friction parameters of the object by simulating
    its motion under different values of the parameters and identifying those that
    result in a simulation which matches the observed real motions. The problem is
    solved in a Bayesian optimization framework. The same framework is used for
    both identifying the model of an object online and searching for a policy that
    would minimize a given cost function according to the identified model.
    Experimental results both in simulation and using a real robot indicate that
    the proposed method outperforms state-of-the-art model-free reinforcement
    learning approaches.
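
    The identification loop described in this abstract (simulate under
    candidate parameters, keep those that reproduce the observed motion) can be
    sketched on a toy problem. Everything concrete here is an assumption: a
    one-dimensional sliding block stands in for the physics engine, only a
    friction coefficient is identified, and a plain grid search is swapped in
    for the Bayesian optimization used in the paper.

```python
def simulate_slide(friction, initial_speed=2.0, dt=0.001, gravity=9.81):
    """Toy stand-in for a physics engine: distance a block slides before stopping."""
    speed, distance = initial_speed, 0.0
    while speed > 0.0:
        distance += speed * dt
        speed -= friction * gravity * dt
    return distance

def identify_friction(observed_distance, candidates):
    """Pick the candidate whose simulated motion best matches the observation."""
    return min(candidates, key=lambda mu: abs(simulate_slide(mu) - observed_distance))

# Hypothetical "real" observation, generated here with a known coefficient.
true_friction = 0.30
observed = simulate_slide(true_friction)

candidates = [0.05 * i for i in range(1, 13)]  # 0.05 ... 0.60
estimate = identify_friction(observed, candidates)
```

    A grid search evaluates every candidate; a Bayesian optimizer would instead
    choose which parameter values to simulate next, which matters when each
    simulation is expensive.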

    Supervised Typing of Big Graphs using Semantic Embeddings

    Mayank Kejriwal, Pedro Szekely
    Comments: 6 pages, to be published in Semantic Big Data Workshop at ACM, SIGMOD 2017; extended version in preparation for Open Journal of Semantic Web (OJSW)
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    We propose a supervised algorithm for generating type embeddings in the same
    semantic vector space as a given set of entity embeddings. The algorithm is
    agnostic to the derivation of the underlying entity embeddings. It does not
    require any manual feature engineering, generalizes well to hundreds of types
    and achieves near-linear scaling on Big Graphs containing many millions of
    triples and instances by virtue of an incremental execution. We demonstrate the
    utility of the embeddings on a type recommendation task, outperforming a
    non-parametric feature-agnostic baseline while achieving 15x speedup and
    near-constant memory usage on a full partition of DBpedia. Using
    state-of-the-art visualization, we illustrate the agreement of our
    extensionally derived DBpedia type embeddings with the manually curated domain
    ontology. Finally, we use the embeddings to probabilistically cluster about 4
    million DBpedia instances into 415 types in the DBpedia ontology.


    Information Retrieval

    An overview of embedding models of entities and relationships for knowledge base completion

    Dat Quoc Nguyen
    Comments: 8 pages, 4 tables. arXiv admin note: substantial text overlap with arXiv:1606.06461, arXiv:1606.08140
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

    Knowledge bases of real-world facts about entities and their relationships
    are useful resources for a variety of natural language processing tasks.
    However, because knowledge bases are typically incomplete, it is useful to be
    able to perform knowledge base completion, i.e., predict whether a relationship
    not in the knowledge base is likely to be true. This article presents an
    overview of embedding models of entities and relationships for knowledge base
    completion, with up-to-date experimental results on two standard evaluation
    tasks of link prediction (i.e. entity prediction) and triple classification.

    Quantifying and suppressing ranking bias in a large citation network

    Giacomo Vaccario, Matus Medo, Nicolas Wider, Manuel Sebastian Mariani
    Comments: Main text (pp. 1-12) and Appendices (pp. 13-17)
    Subjects: Physics and Society (physics.soc-ph); Digital Libraries (cs.DL); Information Retrieval (cs.IR); Data Analysis, Statistics and Probability (physics.data-an); Applications (stat.AP)

    It is widely recognized that citation counts for papers from different fields
    cannot be directly compared because different scientific fields adopt different
    citation practices. Citation counts are also strongly biased by paper age since
    older papers had more time to attract citations. Various procedures aim at
    suppressing these biases and give rise to new normalized indicators, such as
    the relative citation count. We use a large citation dataset from Microsoft
    Academic Graph and a new statistical framework based on the Mahalanobis
    distance to show that the rankings by well known indicators, including the
    relative citation count and Google’s PageRank score, are significantly biased
    by paper field and age. We propose a general normalization procedure motivated
    by the \(z\)-score which produces much less biased rankings when applied to
    citation count and PageRank score.
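
    A z-score style rescaling of the kind this abstract alludes to might look
    like the sketch below. The abstract only says the procedure is motivated by
    the z-score, so the details are assumptions: papers are grouped by exact
    (field, year), the population standard deviation is used, and the data are
    made up.

```python
from collections import defaultdict
from statistics import mean, pstdev

def zscore_by_group(papers):
    """Rescale each paper's citation count against papers in the same group.

    `papers` is a list of dicts with keys "id", "field", "year", "citations".
    Grouping by (field, year) is an assumption made for this sketch.
    """
    groups = defaultdict(list)
    for p in papers:
        groups[(p["field"], p["year"])].append(p["citations"])

    scores = {}
    for p in papers:
        counts = groups[(p["field"], p["year"])]
        mu, sigma = mean(counts), pstdev(counts)
        # A group with zero spread carries no ranking information.
        scores[p["id"]] = 0.0 if sigma == 0 else (p["citations"] - mu) / sigma
    return scores

# Made-up data: a high-citation field and a low-citation field.
papers = [
    {"id": "a", "field": "bio", "year": 2000, "citations": 100},
    {"id": "b", "field": "bio", "year": 2000, "citations": 300},
    {"id": "c", "field": "math", "year": 2000, "citations": 2},
    {"id": "d", "field": "math", "year": 2000, "citations": 6},
]
scores = zscore_by_group(papers)
```

    After rescaling, the top paper in each field receives the same score even
    though the raw citation counts differ by a factor of fifty.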

    A learning-based approach to text image retrieval: using CNN features and improved similarity metrics

    Mao Tan, Si-Ping Yuan, Yong-Xin Su
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Learning (cs.LG)

    The same text content can be presented visually in different ways while
    keeping roughly similar characters. While conventional text image retrieval
    depends on a complex pipeline of OCR-based text recognition and text similarity
    detection, this paper proposes a new learning-based approach to text image
    retrieval that aims to find the original or similar text from a query text
    image. First, features of text images are extracted by a CNN to obtain deep
    visual representations. Then, the dimension of the CNN features is reduced by
    PCA to improve the efficiency of similarity detection. On that basis, an
    improved similarity metric with article theme relevance filtering is proposed
    to improve retrieval accuracy. In the experiments, we collect a group of
    academic papers in both English and Chinese as the text database and cut them
    into text image pieces. A text image with altered text content is used as the
    query image. Experimental results show that the proposed approach retrieves
    the original text content well.


    Computation and Language

    Visually grounded learning of keyword prediction from untranscribed speech

    Herman Kamper, Shane Settle, Gregory Shakhnarovich, Karen Livescu
    Comments: 5 pages, 3 figures, 5 tables
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

    During language acquisition, infants have the benefit of visual cues to
    ground spoken language. Robots similarly have access to audio and visual
    sensors. Recent work has shown that images and spoken captions can be mapped
    into a meaningful common space, allowing images to be retrieved using speech
    and vice versa. In this setting of images paired with untranscribed spoken
    captions, we consider whether computer vision systems can be used to obtain
    textual labels for the speech. Concretely, we use an image-to-words multi-label
    visual classifier to tag images with soft textual labels, and then train a
    neural network to map from the speech to these soft targets. We show that the
    resulting speech system is able to predict which words occur in an
    utterance—acting as a spoken bag-of-words classifier—without seeing any
    parallel speech and text. We find that the model often confuses semantically
    related words, e.g. “man” and “person”, making it even more effective as a
    semantic keyword spotter.

    An embedded segmental k-means model for unsupervised segmentation and clustering of speech

    Herman Kamper, Karen Livescu, Sharon Goldwater
    Comments: 5 pages, 3 figures, 2 tables
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Unsupervised segmentation and clustering of unlabelled speech are core
    problems in zero-resource speech processing. Most competitive approaches lie at
    methodological extremes: some follow a Bayesian approach, defining
    probabilistic models with convergence guarantees, while others opt for more
    efficient heuristic techniques. Here we introduce an approximation to a
    segmental Bayesian model that falls in between, with a clear objective function
    but using hard clustering and segmentation rather than full Bayesian inference.
    Like its Bayesian counterpart, this embedded segmental k-means model
    (ES-KMeans) represents arbitrary-length word segments as fixed-dimensional
    acoustic word embeddings. On English and Xitsonga data, ES-KMeans outperforms a
    leading heuristic method in word segmentation, giving similar scores to the
    Bayesian model while being five times faster with fewer hyperparameters.
    However, there is a trade-off in cluster purity, with the Bayesian model’s
    purer clusters yielding about 10% better unsupervised word error rates.

    Recurrent and Contextual Models for Visual Question Answering

    Abhijit Sharang, Eric Lau
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

    We propose a series of recurrent and contextual neural network models for
    multiple choice visual question answering on the Visual7W dataset. Motivated by
    divergent trends in model complexities in the literature, we explore the
    balance between model expressiveness and simplicity by studying incrementally
    more complex architectures. We start with LSTM-encoding of input questions and
    answers; build on this with context generation by LSTM-encodings of neural
    image and question representations and attention over images; and evaluate the
    diversity and predictive power of our models and the ensemble thereof. All
    models are evaluated against a simple baseline inspired by the current
    state-of-the-art, consisting of a simple concatenation of bag-of-words
    and CNN representations for the text and images, respectively. Generally, we
    observe marked variation in image-reasoning performance between our models not
    obvious from their overall performance, as well as evidence of dataset bias.
    Our standalone models achieve accuracies up to 64.6%, while the ensemble of
    all models achieves the best accuracy of 66.67%, within 0.5% of the
    current state-of-the-art for Visual7W.

    An overview of embedding models of entities and relationships for knowledge base completion

    Dat Quoc Nguyen
    Comments: 8 pages, 4 tables. arXiv admin note: substantial text overlap with arXiv:1606.06461, arXiv:1606.08140
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

    Knowledge bases of real-world facts about entities and their relationships
    are useful resources for a variety of natural language processing tasks.
    However, because knowledge bases are typically incomplete, it is useful to be
    able to perform knowledge base completion, i.e., predict whether a relationship
    not in the knowledge base is likely to be true. This article presents an
    overview of embedding models of entities and relationships for knowledge base
    completion, with up-to-date experimental results on two standard evaluation
    tasks of link prediction (i.e. entity prediction) and triple classification.

    Rapid-Rate: A Framework for Semi-supervised Real-time Sentiment Trend Detection in Unstructured Big Data

    Vineet John
    Subjects: Computation and Language (cs.CL)

    Commercial establishments like restaurants, service centres and retailers
    have several sources of customer feedback about products and services, most of
    which need not be as structured as rated reviews provided by services like
    Yelp, or Amazon, in terms of sentiment conveyed.

    For instance, Amazon provides a fine-grained score on a numeric scale for
    product reviews.

    Some sources, however, like social media (Twitter, Facebook), mailing lists
    (Google Groups) and forums (Quora) contain text data that is much more
    voluminous, but unstructured and unlabelled.

    It might be in the best interests of a business establishment to assess the
    general sentiment towards their brand on these platforms as well.

    This text could be pipelined into a system with a built-in prediction model,
    with the objective of generating real-time graphs on opinion and sentiment
    trends.

    Although tasks like the one described above have been explored with
    respect to document classification problems in the past, the implementation
    described in this paper, by virtue of learning a continuous function rather
    than a discrete one, offers a lot more depth of insight as compared to document
    classification approaches.

    This study aims to explore the validity of such a continuous function
    predicting model to quantify sentiment about an entity, without the additional
    overhead of manual labelling, and computational preprocessing & feature
    extraction.

    This research project also aims to design and implement a re-usable document
    regression pipeline as a framework, Rapid-Rate, that can be
    used to predict document scores in real-time.

    Multimodal Compact Bilinear Pooling for Multimodal Neural Machine Translation

    Jean-Benoit Delbrouck, Stephane Dupont
    Comments: Submitted to ICLR Workshop 2017
    Subjects: Computation and Language (cs.CL)

    In state-of-the-art Neural Machine Translation, an attention mechanism is
    used during decoding to enhance the translation. At every step, the decoder
    uses this mechanism to focus on different parts of the source sentence to
    gather the most useful information before outputting its target word. Recently,
    the effectiveness of the attention mechanism has also been explored for
    multimodal tasks, where it becomes possible to focus both on sentence parts and
    image regions. Approaches to pool two modalities usually include element-wise
    product, sum or concatenation. In this paper, we evaluate the more advanced
    Multimodal Compact Bilinear pooling method, which takes the outer product of
    two vectors to combine the attention features for the two modalities. This has
    been previously investigated for visual question answering. We try out this
    approach for multimodal image caption translation and show improvements
    compared to basic combination methods.
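
    As a rough illustration of the pooling idea in the abstract above, the
    sketch below follows the usual compact bilinear recipe: each vector is
    projected with a count sketch, and the two sketches are combined by circular
    convolution computed with an FFT, which equals a count sketch of the outer
    product. The feature sizes, the output size d, and all function and variable
    names are assumptions made for this example; this is not the authors' code.

```python
import numpy as np

def count_sketch(v, h, s, d):
    """Project v into d dimensions: out[h[i]] += s[i] * v[i]."""
    out = np.zeros(d)
    np.add.at(out, h, s * v)
    return out

def compact_bilinear(x, y, hx, sx, hy, sy, d):
    """Circular convolution of the two count sketches, computed via FFT."""
    fx = np.fft.rfft(count_sketch(x, hx, sx, d))
    fy = np.fft.rfft(count_sketch(y, hy, sy, d))
    return np.fft.irfft(fx * fy, n=d)

def sketch_of_outer_product(x, y, hx, sx, hy, sy, d):
    """Reference: count sketch applied directly to the flattened outer product."""
    out = np.zeros(d)
    idx = (hx[:, None] + hy[None, :]) % d      # combined hash bucket for pair (i, j)
    sign = sx[:, None] * sy[None, :]           # combined sign for pair (i, j)
    np.add.at(out, idx.ravel(), (sign * np.outer(x, y)).ravel())
    return out

# Hypothetical attention features for the two modalities.
rng = np.random.default_rng(0)
n_text, n_image, d = 6, 5, 16
text_feat = rng.normal(size=n_text)
image_feat = rng.normal(size=n_image)

# Random hash buckets and signs, drawn once and then kept fixed.
hx = rng.integers(0, d, size=n_text)
sx = rng.choice([-1.0, 1.0], size=n_text)
hy = rng.integers(0, d, size=n_image)
sy = rng.choice([-1.0, 1.0], size=n_image)

pooled = compact_bilinear(text_feat, image_feat, hx, sx, hy, sy, d)
reference = sketch_of_outer_product(text_feat, image_feat, hx, sx, hy, sy, d)
```

    The FFT route never materialises the full outer product, which is the
    reason this pooling stays cheap when both feature vectors are large.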

    Sequential Recurrent Neural Networks for Language Modeling

    Youssef Oualil, Clayton Greenberg, Mittul Singh, Dietrich Klakow
    Comments: published (INTERSPEECH 2016), 5 pages, 3 figures, 4 tables
    Subjects: Computation and Language (cs.CL)

    Feedforward Neural Network (FNN)-based language models estimate the
    probability of the next word based on the history of the last N words, whereas
    Recurrent Neural Networks (RNN) perform the same task based only on the last
    word and some context information that cycles in the network. This paper
    presents a novel approach, which bridges the gap between these two categories
    of networks. In particular, we propose an architecture which takes advantage of
    the explicit, sequential enumeration of the word history in FNN structure while
    enhancing each word representation at the projection layer through recurrent
    context information that evolves in the network. The context integration is
    performed using an additional word-dependent weight matrix that is also learned
    during the training. Extensive experiments conducted on the Penn Treebank (PTB)
    and the Large Text Compression Benchmark (LTCB) corpus showed a significant
    reduction of the perplexity when compared to state-of-the-art feedforward as
    well as recurrent neural network architectures.

    A network of deep neural networks for distant speech recognition

    Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Despite the remarkable progress recently made in distant speech recognition,
    state-of-the-art technology still suffers from a lack of robustness, especially
    when adverse acoustic conditions characterized by non-stationary noises and
    reverberation are met. A prominent limitation of current systems lies in the
    lack of matching and communication between the various technologies involved in
    the distant speech recognition process. The speech enhancement and speech
    recognition modules are, for instance, often trained independently. Moreover,
    the speech enhancement normally helps the speech recognizer, but the output of
    the latter is not commonly used, in turn, to improve the speech enhancement. To
    address both concerns, we propose a novel architecture based on a network of
    deep neural networks, where all the components are jointly trained and better
    cooperate with each other thanks to a full communication scheme between them.
    Experiments, conducted using different datasets, tasks and acoustic conditions,
    revealed that the proposed framework can overtake other competitive solutions,
    including recent joint training approaches.

    Supervised Typing of Big Graphs using Semantic Embeddings

    Mayank Kejriwal, Pedro Szekely
    Comments: 6 pages, to be published in Semantic Big Data Workshop at ACM, SIGMOD 2017; extended version in preparation for Open Journal of Semantic Web (OJSW)
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    We propose a supervised algorithm for generating type embeddings in the same
    semantic vector space as a given set of entity embeddings. The algorithm is
    agnostic to the derivation of the underlying entity embeddings. It does not
    require any manual feature engineering, generalizes well to hundreds of types
    and achieves near-linear scaling on Big Graphs containing many millions of
    triples and instances by virtue of an incremental execution. We demonstrate the
    utility of the embeddings on a type recommendation task, outperforming a
    non-parametric feature-agnostic baseline while achieving 15x speedup and
    near-constant memory usage on a full partition of DBpedia. Using
    state-of-the-art visualization, we illustrate the agreement of our
    extensionally derived DBpedia type embeddings with the manually curated domain
    ontology. Finally, we use the embeddings to probabilistically cluster about 4
    million DBpedia instances into 415 types in the DBpedia ontology.

    Dynamic Bernoulli Embeddings for Language Evolution

    Maja Rudolph, David Blei
    Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL)

    Word embeddings are a powerful approach for unsupervised analysis of
    language. Recently, Rudolph et al. (2016) developed exponential family
    embeddings, which cast word embeddings in a probabilistic framework. Here, we
    develop dynamic embeddings, building on exponential family embeddings to
    capture how the meanings of words change over time. We use dynamic embeddings
    to analyze three large collections of historical texts: the U.S. Senate
    speeches from 1858 to 2009, the history of computer science ACM abstracts from
    1951 to 2014, and machine learning papers on the Arxiv from 2007 to 2015. We
    find dynamic embeddings provide better fits than classical embeddings and
    capture interesting patterns about how language changes.


    Distributed, Parallel, and Cluster Computing

    Sparse geometries handling in lattice-Boltzmann method implementation for graphic processors

    Tadeusz Tomczak, Roman G. Szafran
    Comments: 14 pages, 10 figures, sent to IEEE Transactions on Parallel and Distributed Systems
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    We describe a high-performance implementation of the lattice-Boltzmann method
    (LBM) for sparse geometries on graphic processors. In our implementation we
    cover the whole geometry with a uniform mesh of small tiles and carry out
    calculations for each tile independently with a proper data synchronization at
    tile edges. For this method we provide both the theoretical analysis of
    complexity and the results for real implementations for 2D and 3D geometries.
    Based on the theoretical model, we show that tiles offer significantly smaller
    bandwidth overhead than solutions based on indirect addressing. For
    2-dimensional lattice arrangements a reduction of memory usage is also
    possible, though at the cost of diminished performance. We reached the
    performance of 682 MLUPS on GTX Titan (72% of peak theoretical memory
    bandwidth) for D3Q19 lattice arrangement and double precision data.


    Learning

    Online Distributed Learning Over Networks in RKH Spaces Using Random Fourier Features

    Pantelis Bouboulis, Symeon Chouvardas, Sergios Theodoridis
    Comments: submitted paper
    Subjects: Learning (cs.LG)

    We present a novel diffusion scheme for online kernel-based learning over
    networks. So far, a major drawback of any online learning algorithm, operating
    in a reproducing kernel Hilbert space (RKHS), is the need for updating a
    growing number of parameters as time iterations evolve. Besides complexity,
    this leads to an increased need of communication resources, in a distributed
    setting. In contrast, the proposed method approximates the solution as a
    fixed-size vector (of larger dimension than the input space) using Random
    Fourier Features. This paves the way to use standard linear combine-then-adapt
    techniques. To the best of our knowledge, this is the first time that a
    complete protocol for distributed online learning in RKHS is presented.
    Conditions for asymptotic convergence and boundedness of the networkwise regret
    are also provided. The simulated tests illustrate the performance of the
    proposed scheme.
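
    To make the fixed-size representation concrete, here is a minimal sketch of
    random Fourier features for a Gaussian kernel, followed by one
    least-mean-squares update of a linear model on those features. It covers a
    single node only and leaves out the diffusion (combine-then-adapt) step
    between nodes. The kernel width, feature count, step size, and function
    names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def make_rff(input_dim, num_features, sigma, rng):
    """Return a map z with z(x) @ z(y) ~= exp(-||x - y||^2 / (2 * sigma^2))."""
    W = rng.normal(scale=1.0 / sigma, size=(num_features, input_dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)

    def z(x):
        return np.sqrt(2.0 / num_features) * np.cos(W @ x + b)

    return z

def lms_step(theta, zx, target, step_size):
    """One least-mean-squares update of a fixed-size weight vector."""
    error = target - theta @ zx
    return theta + step_size * error * zx

rng = np.random.default_rng(0)
sigma = 1.0
num_features = 5000
z = make_rff(input_dim=3, num_features=num_features, sigma=sigma, rng=rng)

x = np.array([0.2, -0.1, 0.4])
y = np.array([0.0, 0.3, 0.1])
approx = z(x) @ z(y)
exact = np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

# The model is a vector of constant length, however many samples arrive.
theta = np.zeros(num_features)
theta = lms_step(theta, z(x), target=1.0, step_size=0.5)
```

    Because theta never grows, exchanging it between neighbouring nodes costs
    the same at every iteration, which is the property the abstract relies on.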

    Failures of Deep Learning

    Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah
    Subjects: Learning (cs.LG)

    In recent years, Deep Learning has become the go-to solution for a broad
    range of applications, often outperforming state-of-the-art. However, it is
    important, for both theoreticians and practitioners, to gain a deeper
    understanding of the difficulties and limitations associated with common
    approaches and algorithms. We describe four families of problems for which some
    of the commonly used existing algorithms fail or suffer significant difficulty.
    We illustrate the failures through practical experiments, and provide
    theoretical insights explaining their source, and how they might be remedied.

    Fast Stochastic Variance Reduced Gradient Method with Momentum Acceleration for Machine Learning

    Fanhua Shang, Yuanyuan Liu, James Cheng, Jiacheng Zhuo
    Comments: 19 pages
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)

    Recently, research on accelerated stochastic gradient descent methods (e.g.,
    SVRG) has made exciting progress (e.g., linear convergence for strongly convex
    problems). However, the best-known methods (e.g., Katyusha) require at least
    two auxiliary variables and two momentum parameters. In this paper, we propose
    a fast stochastic variance reduction gradient (FSVRG) method, in which we
    design a novel update rule with Nesterov’s momentum and incorporate the
    technique of growing epoch size. FSVRG has only one auxiliary variable and one
    momentum weight, and thus it is much simpler and has much lower per-iteration
    complexity. We prove that FSVRG achieves linear convergence for strongly convex
    problems and the optimal \(\mathcal{O}(1/T^2)\) convergence rate for non-strongly
    convex problems, where \(T\) is the number of outer-iterations. We also extend
    FSVRG to directly solve the problems with non-smooth component functions, such
    as SVM. Finally, we empirically study the performance of FSVRG for solving
    various machine learning problems such as logistic regression, ridge
    regression, Lasso and SVM. Our results show that FSVRG outperforms the
    state-of-the-art stochastic methods, including Katyusha.
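
    For context, the sketch below shows plain SVRG, the baseline that the
    abstract builds on, applied to ridge regression. It is not FSVRG: there is
    no momentum term and no growing epoch size. The problem sizes, step size,
    epoch count, and the function name svrg_ridge are assumptions chosen for
    this example.

```python
import numpy as np

def svrg_ridge(A, b, lam, step_size, num_epochs, inner_steps, rng):
    """Plain SVRG for (1/n) * sum_i 0.5 * (a_i @ w - b_i)^2 + 0.5 * lam * ||w||^2."""
    n, d = A.shape
    w_snapshot = np.zeros(d)
    for _ in range(num_epochs):
        # Full gradient at the snapshot, computed once per outer iteration.
        full_grad = A.T @ (A @ w_snapshot - b) / n + lam * w_snapshot
        w = w_snapshot.copy()
        for _ in range(inner_steps):
            i = rng.integers(n)
            grad_w = A[i] * (A[i] @ w - b[i]) + lam * w
            grad_snap = A[i] * (A[i] @ w_snapshot - b[i]) + lam * w_snapshot
            # Variance-reduced step: stochastic gradient corrected by the snapshot.
            w -= step_size * (grad_w - grad_snap + full_grad)
        w_snapshot = w
    return w_snapshot

rng = np.random.default_rng(0)
n, d, lam = 200, 5, 0.1
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Conservative step size based on the largest per-sample smoothness constant.
smoothness = np.max(np.sum(A ** 2, axis=1)) + lam
w_svrg = svrg_ridge(A, b, lam, step_size=0.1 / smoothness,
                    num_epochs=30, inner_steps=2 * n, rng=rng)

# Closed-form ridge solution for comparison.
w_star = np.linalg.solve(A.T @ A / n + lam * np.eye(d), A.T @ b / n)
```

    The single snapshot variable here is the kind of auxiliary state the
    abstract counts when it compares FSVRG with Katyusha.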

    Unsupervised Basis Function Adaptation for Reinforcement Learning

    Edward Barker, Charl Ras
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    When using reinforcement learning (RL) algorithms to evaluate a policy it is
    common, given a large state space, to introduce some form of approximation
    architecture for the value function (VF). The exact form of this architecture
    can have a significant effect on the accuracy of the VF estimate, however, and
    determining a suitable approximation architecture can often be a highly complex
    task. Consequently there is a large amount of interest in the potential for
    allowing RL algorithms to adaptively generate (i.e. to learn) approximation
    architectures.

    We investigate a method of adapting approximation architectures which uses
    feedback regarding the frequency with which an agent has visited certain states
    to guide which areas of the state space to approximate with greater detail. We
    introduce an algorithm based upon this idea which adapts a state aggregation
    approximation architecture on-line.

    Assuming \(S\) states, we demonstrate theoretically that – provided the
    following relatively non-restrictive assumptions are satisfied: (a) the number
    of cells \(X\) in the state aggregation architecture is of order
    \(\sqrt{S}\ln{S}\log_2{S}\) or greater, (b) the policy and transition function
    are close to deterministic, and (c) the prior for the transition function is
    uniformly distributed – our algorithm can guarantee, assuming we use an
    appropriate scoring function to measure VF error, error which is arbitrarily
    close to zero as \(S\) becomes large. It is able to do this despite having only
    \(O(X\log_2{S})\) space complexity (and negligible time complexity). We conclude
    by generating a set of empirical results which support the theoretical results.

    Random Features for Compositional Kernels

    Amit Daniely, Roy Frostig, Vineet Gupta, Yoram Singer
    Subjects: Learning (cs.LG)

    We describe and analyze a simple random feature scheme (RFS) from prescribed
    compositional kernels. The compositional kernels we use are inspired by the
    structure of convolutional neural networks and kernels. The resulting scheme
    yields sparse and efficiently computable features. Each random feature can be
    represented as an algebraic expression over a small number of (random) paths in
    a composition tree. Thus, compositional random features can be stored
    compactly. The discrete nature of the generation process enables de-duplication
    of repeated features, further compacting the representation and increasing the
    diversity of the embeddings. Our approach complements and can be combined with
    previous random feature schemes.

    Faster Reinforcement Learning Using Active Simulators

    Vikas Jain, Theja Tulabandhula
    Comments: 12 pages and 4 figures
    Subjects: Learning (cs.LG)

    In this work, we propose several online methods to build a learning
    curriculum from a given set of target-task-specific training tasks in order to
    speed up reinforcement learning (RL). These methods can decrease the total
    training time needed by an RL agent compared to training on the target task
    from scratch. Unlike traditional transfer learning, we consider creating a
    sequence from several training tasks in order to provide the most benefit in
    terms of reducing the total time to train.

    Our methods utilize the learning trajectory of the agent on the curriculum
    tasks seen so far to decide which tasks to train on next. An attractive feature
    of our methods is that they are weakly coupled to the choice of the RL
    algorithm as well as the transfer learning method. Further, when there is
    domain information available, our methods can incorporate such knowledge to
    further speed up the learning. We experimentally show that these methods can be
    used to obtain suitable learning curricula that speed up the overall training
    time on two different domains.

    Randomized Kernel Methods for Least-Squares Support Vector Machines

    M. Andrecut
    Comments: 16 pages, 6 figures
    Journal-ref: Int. J. Mod. Phys. C, 28, 1750015 (2017)
    Subjects: Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)

    The least-squares support vector machine is a frequently used kernel method
    for non-linear regression and classification tasks. Here we discuss several
    approximation algorithms for the least-squares support vector machine
    classifier. The proposed methods are based on randomized block kernel matrices,
    and we show that they provide good accuracy and reliable scaling for
    multi-class classification problems with relatively large data sets. Also, we
    present several numerical experiments that illustrate the practical
    applicability of the proposed methods.

    Fake News Mitigation via Point Process Based Intervention

    Mehrdad Farajtabar, Jiachen Yang, Xiaojing Ye, Huan Xu, Rakshit Trivedi, Elias Khalil, Shuang Li, Le Song, Hongyuan Zha
    Comments: Point Process, Hawkes Process, Social Networks, Intervention and Control, Reinforcement Learning
    Subjects: Learning (cs.LG); Social and Information Networks (cs.SI)

    We propose the first multistage intervention framework that tackles fake news
    in social networks by combining reinforcement learning with a point process
    network activity model. The spread of fake news and mitigation events within
    the network is modeled by a multivariate Hawkes process with additional
    exogenous control terms. By choosing a feature representation of states,
    defining mitigation actions and constructing reward functions to measure the
    effectiveness of mitigation activities, we map the problem of fake news
    mitigation into the reinforcement learning framework. We develop a policy
    iteration method unique to the multivariate networked point process, with the
    goal of optimizing the actions for maximal total reward under budget
    constraints. Our method shows promising performance in real-time intervention
    experiments on a Twitter network to mitigate a surrogate fake news campaign,
    and outperforms alternatives on synthetic datasets.

    Learning to Partition using Score Based Compatibilities

    Arun Rajkumar, Koyel Mukherjee, Theja Tulabandhula
    Comments: Appears in the Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2017)
    Subjects: Learning (cs.LG)

    We study the problem of learning to partition users into groups, where one
    must learn the compatibilities between the users to achieve optimal groupings.
    We define four natural objectives that optimize for average and worst case
    compatibilities and propose new algorithms for adaptively learning optimal
    groupings. When we do not impose any structure on the compatibilities, we show
    that the group formation objectives considered are NP-hard to solve and we
    either give approximation guarantees or prove inapproximability results. We
    then introduce an elegant structure, namely that of intrinsic scores,
    that makes many of these problems polynomial time solvable. We explicitly
    characterize the optimal groupings under this structure and show that the
    optimal solutions are related to homophilous and heterophilous
    partitions, well-studied in the psychology literature. For one of the four
    objectives, we show NP-hardness under the score structure and give a
    \(\frac{1}{2}\)-approximation algorithm for which no constant approximation was
    known thus far. Finally, under the score structure, we propose an online low
    sample complexity PAC algorithm for learning the optimal partition. We
    demonstrate the efficacy of the proposed algorithm on synthetic and real world
    datasets.

    An embedded segmental k-means model for unsupervised segmentation and clustering of speech

    Herman Kamper, Karen Livescu, Sharon Goldwater
    Comments: 5 pages, 3 figures, 2 tables
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Unsupervised segmentation and clustering of unlabelled speech are core
    problems in zero-resource speech processing. Most competitive approaches lie at
    methodological extremes: some follow a Bayesian approach, defining
    probabilistic models with convergence guarantees, while others opt for more
    efficient heuristic techniques. Here we introduce an approximation to a
    segmental Bayesian model that falls in between, with a clear objective function
    but using hard clustering and segmentation rather than full Bayesian inference.
    Like its Bayesian counterpart, this embedded segmental k-means model
    (ES-KMeans) represents arbitrary-length word segments as fixed-dimensional
    acoustic word embeddings. On English and Xitsonga data, ES-KMeans outperforms a
    leading heuristic method in word segmentation, giving similar scores to the
    Bayesian model while being five times faster with fewer hyperparameters.
    However, there is a trade-off in cluster purity, with the Bayesian model’s
    purer clusters yielding about 10% better unsupervised word error rates.

    A learning-based approach to text image retrieval: using CNN features and improved similarity metrics

    Mao Tan, Si-Ping Yuan, Yong-Xin Su
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Learning (cs.LG)

    The same text content can be presented in different visual forms while the
    characters stay roughly the same. While conventional text image retrieval
    depends on complex models for OCR-based text recognition and text similarity
    detection, this paper proposes a learning-based approach to text image
    retrieval whose purpose is to find the original or similar text given a query
    text image. First, features of text images are extracted by a CNN to obtain
    deep visual representations. Then, the dimension of the CNN features is
    reduced by PCA to improve the efficiency of similarity detection. Based on
    that, an improved similarity metric with article theme relevance filtering is
    proposed to improve the retrieval accuracy. In the experiments, we collect a
    group of academic papers in both English and Chinese as the text database and
    cut them into pieces of text image. A text image with changed text content is
    used as the query image. Experimental results show that the proposed approach
    retrieves the original text content well.

    A network of deep neural networks for distant speech recognition

    Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Despite the remarkable progress recently made in distant speech recognition,
    state-of-the-art technology still suffers from a lack of robustness, especially
    when adverse acoustic conditions characterized by non-stationary noises and
    reverberation are met. A prominent limitation of current systems lies in the
    lack of matching and communication between the various technologies involved in
    the distant speech recognition process. The speech enhancement and speech
    recognition modules are, for instance, often trained independently. Moreover,
    the speech enhancement normally helps the speech recognizer, but the output of
    the latter is not commonly used, in turn, to improve the speech enhancement. To
    address both concerns, we propose a novel architecture based on a network of
    deep neural networks, where all the components are jointly trained and better
    cooperate with each other thanks to a full communication scheme between them.
    Experiments, conducted using different datasets, tasks and acoustic conditions,
    revealed that the proposed framework can overtake other competitive solutions,
    including recent joint training approaches.

    Discriminatively Boosted Image Clustering with Fully Convolutional Auto-Encoders

    Fengfu Li, Hong Qiao, Bo Zhang, Xuanyang Xi
    Comments: 27 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Traditional image clustering methods take a two-step approach, feature
    learning and clustering, sequentially. However, recent research results
    demonstrated that combining the separated phases in a unified framework and
    training them jointly can achieve a better performance. In this paper, we first
    introduce fully convolutional auto-encoders for image feature learning and then
    propose a unified clustering framework to learn image representations and
    cluster centers jointly based on a fully convolutional auto-encoder and soft
    (k)-means scores. At initial stages of the learning procedure, the
    representations extracted from the auto-encoder may not be very discriminative
    for the subsequent clustering. We address this issue by adopting a boosted
    discriminative distribution, where high score assignments are highlighted and
    low score ones are de-emphasized. With the gradually boosted discrimination,
    clustering assignment scores are discriminated and cluster purities are
    increased. Experiments on several vision benchmark datasets show that our
    methods can achieve a state-of-the-art performance.

    The role of zero synapses in unsupervised feature learning

    Haiping Huang
    Comments: 5 pages, 4 figures
    Subjects: Neurons and Cognition (q-bio.NC); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Synapses in real neural circuits can take discrete values, including zero
    (silent or potential) synapses. The computational role of zero synapses in
    unsupervised feature learning of unlabeled noisy data is still unclear, yet
    important to understand how the sparseness of synaptic activity is shaped
    during learning and its relationship with receptive field formation. Here, we
    formulate this kind of sparse feature learning using a statistical mechanics
    approach. We find that learning decreases the fraction of zero synapses, and
    when the fraction decreases rapidly around a critical data size, the
    intrinsically structured receptive field starts to develop. Further increasing
    the data size refines the receptive field, while a very small fraction of zero
    synapses remain to act as contour detectors. This feature is discovered not
    only in a handwritten digits dataset, but also in retinal neural activity
    measured in a natural movie stimuli experiment.

    Perspective: Energy Landscapes for Machine Learning

    Andrew J. Ballard, Ritankar Das, Stefano Martiniani, Dhagash Mehta, Levent Sagun, Jacob D. Stevenson, David J. Wales
    Comments: 41 pages, 25 figures. Accepted for publication in Physical Chemistry Chemical Physics, 2017
    Subjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Learning (cs.LG)

    Machine learning techniques are being increasingly used as flexible
    non-linear fitting and prediction tools in the physical sciences. Fitting
    functions that exhibit multiple solutions as local minima can be analysed in
    terms of the corresponding machine learning landscape. Methods to explore and
    visualise molecular potential energy landscapes can be applied to these machine
    learning landscapes to gain new insight into the solution space involved in
    training and the nature of the corresponding predictions. In particular, we can
    define quantities analogous to molecular structure, thermodynamics, and
    kinetics, and relate these emergent properties to the structure of the
    underlying landscape. This Perspective aims to describe these analogies with
    examples from recent applications, and suggest avenues for new
    interdisciplinary research.

    Data Driven Exploratory Attacks on Black Box Classifiers in Adversarial Domains

    Tegjyot Singh Sethi, Mehmed Kantardzic
    Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Learning (cs.LG)

    While modern day web applications aim to create impact at the civilization
    level, they have become vulnerable to adversarial activity, where the next
    cyber-attack can take any shape and can originate from anywhere. The increasing
    scale and sophistication of attacks has prompted the need for a data driven
    solution, with machine learning forming the core of many cybersecurity systems.
    Machine learning was not designed with security in mind, and the essential
    assumption of stationarity, requiring that the training and testing data follow
    similar distributions, is violated in an adversarial domain. In this paper, an
    adversary’s viewpoint of a classification based system is presented. Based on
    a formal adversarial model, the Seed-Explore-Exploit framework is presented,
    for simulating the generation of data driven and reverse engineering attacks on
    classifiers. Experimental evaluation, on 10 real world datasets and using the
    Google Cloud Prediction Platform, demonstrates the innate vulnerability of
    classifiers and the ease with which evasion can be carried out, without any
    explicit information about the classifier type, the training data or the
    application domain. The proposed framework, algorithms and empirical
    evaluation serve as a white hat analysis of the vulnerabilities, and aim to
    foster the development of secure machine learning frameworks.

    SIMLR: a tool for large-scale single-cell analysis by multi-kernel learning

    Bo Wang, Daniele Ramazzotti, Luca De Sano, Junjie Zhu, Emma Pierson, Serafim Batzoglou
    Subjects: Genomics (q-bio.GN); Learning (cs.LG); Quantitative Methods (q-bio.QM)

    Motivation: We here present SIMLR (Single-cell Interpretation via
    Multi-kernel LeaRning), an open-source tool that implements a novel framework
    to learn a cell-to-cell similarity measure from single-cell RNA-seq data. SIMLR
    can be effectively used to perform tasks such as dimension reduction,
    clustering, and visualization of heterogeneous populations of cells. SIMLR was
    benchmarked against state-of-the-art methods for these three tasks on several
    public datasets, showing it to be scalable and capable of greatly improving
    clustering performance, as well as providing valuable insights by making the
    data more interpretable via better visualization. Availability and
    Implementation: SIMLR is available on GitHub in both R and MATLAB
    implementations. Furthermore, it is also available as an R package on
    bioconductor.org. Contact: bowang87@stanford.edu or
    daniele.ramazzotti@stanford.edu Supplementary Information: Supplementary data
    are available at Bioinformatics online.

    Classification-based RNN machine translation using GRUs

    Ri Wang, Maysum Panju, Mahmood Gohari
    Comments: 7 pages, 1 figure; graduate course research project
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    We report the results of our classification-based machine translation model,
    built upon the framework of a recurrent neural network using gated recurrent
    units. Unlike other RNN models that attempt to maximize the overall conditional
    log probability of sentences against sentences, our model focuses on a
    classification approach of estimating the conditional probability of the next
    word given the input sequence. This simpler approach using GRUs was hoped to be
    comparable with more complicated RNN models, but achievements in this
    implementation were modest and there remains a lot of room for improving this
    classification approach.

    Information-theoretic Model Identification and Policy Search using Physics Engines with Application to Robotic Manipulation

    Shaojun Zhu, Andrew Kimmel, Abdeslam Boularias
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We consider the problem of a robot learning the mechanical properties of
    objects through physical interaction with the object, and introduce a
    practical, data-efficient approach for identifying the motion models of these
    objects. The proposed method utilizes a physics engine, where the robot seeks
    to identify the inertial and friction parameters of the object by simulating
    its motion under different values of the parameters and identifying those that
    result in a simulation which matches the observed real motions. The problem is
    solved in a Bayesian optimization framework. The same framework is used for
    both identifying the model of an object online and searching for a policy that
    would minimize a given cost function according to the identified model.
    Experimental results both in simulation and using a real robot indicate that
    the proposed method outperforms state-of-the-art model-free reinforcement
    learning approaches.


    Information Theory

    Combinatorial Entropy Encoding

    Abu Bakar Siddique
    Subjects: Information Theory (cs.IT)

    This paper proposes a novel entropy encoding technique for lossless data
    compression. Representing a message string by its lexicographic index in the
    permutations of its symbols results in a compressed version matching the
    Shannon entropy of the message. Commercial data compression standards make use
    of Huffman or arithmetic coding at some stage of the compression process. Like
    arithmetic coding, the proposed method maps the entire string to an integer,
    but it is not based on fractional numbers. Unlike both arithmetic and Huffman
    coding, no prior entropy model of the source is required. A simple, intuitive
    algorithm based on multinomial coefficients is developed for entropy encoding
    that adaptively uses fewer bits for more frequent symbols. Correctness
    of the algorithm is demonstrated by an example.
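
    The idea of indexing a message among the permutations of its own symbols can
    be sketched with the textbook ranking and unranking of multiset permutations.
    The code below is an assumption-laden illustration, not the paper's
    algorithm: the function names are hypothetical, and the symbol counts are
    assumed to be sent alongside the index so the decoder can invert it.

```python
from collections import Counter
from math import factorial


def multinomial(counts):
    """Number of distinct arrangements of a multiset with these symbol counts."""
    result = factorial(sum(counts.values()))
    for c in counts.values():
        result //= factorial(c)
    return result


def lex_rank(message):
    """0-based lexicographic index of `message` among permutations of its symbols."""
    counts = Counter(message)
    rank = 0
    for ch in message:
        # Count arrangements that start with a smaller symbol at this position.
        for smaller in sorted(counts):
            if smaller >= ch:
                break
            if counts[smaller] == 0:
                continue
            counts[smaller] -= 1
            rank += multinomial(counts)
            counts[smaller] += 1
        counts[ch] -= 1
    return rank


def lex_unrank(rank, symbol_counts):
    """Inverse of lex_rank, given the index and the symbol counts."""
    counts = Counter(symbol_counts)
    out = []
    for _ in range(sum(counts.values())):
        for sym in sorted(counts):
            if counts[sym] == 0:
                continue
            counts[sym] -= 1
            block = multinomial(counts)  # arrangements that start with `sym`
            if rank < block:
                out.append(sym)
                break
            rank -= block
            counts[sym] += 1
    return "".join(out)


msg = "mississippi"
index = lex_rank(msg)
assert lex_unrank(index, Counter(msg)) == msg
```

    The index always lies below the multinomial coefficient of the symbol counts,
    so its bit length is about the logarithm of that coefficient, which is how
    the count of permutations ties the code length to the message's entropy.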

    Cooperative Binning for Semi-deterministic Channels with Non-causal State Information

    Ido B. Gattegno, Haim H. Permuter, Shlomo Shamai (Shitz), Ayfer Özgür
    Subjects: Information Theory (cs.IT)

    The capacity of the semi-deterministic relay channel (SD-RC) with non-causal
    channel state information (CSI) only at the encoder and decoder is
    characterized. The capacity is achieved by a scheme based on
    cooperative-bin-forward. This scheme allows cooperation between the transmitter
    and the relay without the need to decode a part of the message by the relay.
    The transmission is divided into blocks and each deterministic output of the
    channel (observed by the relay) is mapped to a bin. The bin index is used by
    the encoder and the relay to choose the cooperation codeword in the next
    transmission block. In causal settings the cooperation is independent of the
    state. In non-causal settings, dependency between the relay’s
    transmission and the state can increase the transmission rates. The encoder
    implicitly conveys partial state information to the relay. In particular, it
    uses the states of the next block to select a cooperation codeword; the relay
    transmission depends on the cooperation codeword and
    therefore also on the states. We also consider the multiple access channel with
    partial cribbing as a semi-deterministic channel. The capacity region of this
    channel with non-causal CSI is achieved by the new scheme. Examining the result
    in several cases, we introduce a new problem of a point-to-point (PTP) channel
    where the state is provided to the transmitter by a state encoder.
    Interestingly, even though the CSI is also available at the receiver, we
    provide an example which shows that the capacity with non-causal CSI at the
    state encoder is strictly larger than the capacity with causal CSI.

    Generalised Twisted Gabidulin Codes

    Sven Puchinger, Johan Rosenkilde né Nielsen, John Sheekey
    Comments: 10 pages, submitted to the International Workshop on Coding and Cryptography (WCC) 2017
    Subjects: Information Theory (cs.IT)

    We present a new family of maximum rank distance (MRD) codes. The new class
    contains codes that are neither equivalent to a generalised Gabidulin nor to a
    twisted Gabidulin code, the only two known general constructions of linear MRD
    codes.

    A Flexible Wideband Millimeter-Wave Channel Sounder with Local Area and NLOS to LOS Transition Measurements

    George R. MacCartney Jr., Hangsong Yan, Shu Sun, Theodore S. Rappaport
    Comments: To be published in 2017 IEEE International Conference on Communications (ICC), Paris, France, May 2017
    Subjects: Information Theory (cs.IT)

    This paper presents a millimeter-wave (mmWave) wideband sliding correlator
    channel sounder with flexibility to operate at various transmission rates. The
    channel sounder can transmit and receive up to 1 GHz of RF null-to-null
    bandwidth while providing a 2 nanosecond multipath time resolution. The system
    architecture takes advantage of field-programmable gate arrays (FPGAs),
    high-speed digital-to-analog converters (DACs), and low phase noise Rubidium
    (Rb) references for synchronization. Using steerable narrowbeam antennas, the
    system can measure up to 185 dB of path loss. The channel sounder is used to
    measure the directional and omnidirectional received power as a receiver
    transitions from line-of-sight to non-line-of-sight conditions down an urban
    canyon. A 25 dB drop in omnidirectional received power was observed as the
    receiver transitioned from line-of-sight (LOS) conditions to deeply shadowed
    non-LOS (NLOS) conditions. The channel sounder was also used to study signal
    variation and spatial consistency for a local set of receiver locations
    arranged in a cluster spanning a 5 m x 10 m local area, where the
    omnidirectional received power in LOS and NLOS environments is found to be
    relatively stable with standard deviations of received power of 2.2 dB and 4.3
    dB, respectively. This work shows that when implementing beamforming at the
    transmitter at mmWave, the omnidirectional received power over a local area has
    little fluctuation among receiver locations separated by a few to several
    meters.
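
    The 1 GHz null-to-null bandwidth and the 2 nanosecond resolution quoted above
    are linked by simple arithmetic, sketched below. The sketch assumes the usual
    relation for a binary pseudorandom sequence (null-to-null RF bandwidth equal
    to twice the chip rate, resolution equal to one chip period); the function
    names are hypothetical and nothing here describes the authors' hardware.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def chip_rate_from_bandwidth(null_to_null_bw_hz):
    """Chip rate of a binary PN sequence whose main lobe spans this bandwidth."""
    return null_to_null_bw_hz / 2.0


def time_resolution(chip_rate_hz):
    """Smallest resolvable multipath delay: one chip period."""
    return 1.0 / chip_rate_hz


def path_length_resolution(resolution_s):
    """Difference in propagation path length that one chip period corresponds to."""
    return SPEED_OF_LIGHT * resolution_s


chip_rate = chip_rate_from_bandwidth(1e9)      # 500 Mcps
resolution = time_resolution(chip_rate)        # about 2 ns
print(chip_rate, resolution, path_length_resolution(resolution))
```

    Under these assumptions two multipath components are separable when their
    path lengths differ by roughly 0.6 m.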

    Indoor Office Wideband Penetration Loss Measurements at 73 GHz

    Jacqueline Ryan, George R. MacCartney Jr., Theodore S. Rappaport
    Comments: To be published in 2017 IEEE International Conference on Communications Workshop (ICCW), Paris, France, May 2017
    Subjects: Information Theory (cs.IT)

    This paper presents millimeter wave (mmWave) penetration loss measurements
    and analysis at 73 GHz using a wideband sliding correlator channel sounder in
    an indoor office environment. Penetration loss was measured using a carefully
    controlled measurement setup for many common indoor building materials such as
    glass doors, glass windows, closet doors, steel doors, and whiteboard writing
    walls. Measurements were conducted using narrowbeam transmitter (TX) and
    receiver (RX) horn antennas that were boresight-aligned with a test material
    between the antennas. Overall, 21 different locations were measured for 6
    different materials such that the same type of material was tested in at least
    two locations in order to characterize the effect of penetration loss for
    materials with similar composition. As shown here, attenuation through common
    materials ranged between 0.8 dB/cm and 9.9 dB/cm for co-polarized antennas,
    while cross-polarized antennas exhibited similar attenuation for most
    materials, but up to 23.4 dB/cm of attenuation for others. The penetration loss
    results presented here are useful for site-specific planning tools that will
    model indoor mmWave networks, without the need for expensive measurement
    campaigns.

    Improved Power Decoding of One-Point Hermitian Codes

    Sven Puchinger, Irene Bouw, Johan Rosenkilde né Nielsen
    Comments: 9 pages, submitted to the International Workshop on Coding and Cryptography (WCC) 2017
    Subjects: Information Theory (cs.IT)

    We propose a new partial decoding algorithm for one-point Hermitian codes
    that can decode up to the same number of errors as the Guruswami–Sudan
    decoder. Simulations suggest that its failure probability is similar to that
    of the latter. The algorithm is based on a recent generalization of the power
    decoding algorithm for Reed–Solomon codes and does not require an expensive
    root-finding step. In addition, it promises improvements for decoding
    interleaved Hermitian codes.

    Security Considerations of a New Helper Data Scheme for Physical Unclonable Functions

    Sven Müelich, Sven Puchinger, Martin Bossert
    Comments: 10 pages, submitted to the International Workshop on Coding and Cryptography (WCC) 2017
    Subjects: Information Theory (cs.IT)

    When Physical Unclonable Functions (PUFs) are used for cryptographic
    purposes, error correction in combination with a helper data scheme is an
    essential component due to the fuzzy nature of a PUF. All known schemes require
    both a code and additional helper data to recover PUF responses. Recently,
    Müelich and Bossert proposed a scheme that only requires a code. In this
    paper, we answer two open questions about the new scheme. First, we show that
    under certain assumptions, the code construction within the scheme is possible
    with high probability. Second, it is proven that after constructing the code,
    the security level of the scheme is known and is sufficiently large with high
    probability.

    Nonconvex Regularization Based Sparse Recovery and Demixing with Application to Color Image Inpainting

    Fei Wen, Lasith Adhikari, Ling Pei, Roummel F. Marcia, Peilin Liu, Robert C. Qiu
    Comments: 13 pages, 9 figures
    Subjects: Information Theory (cs.IT)

    This work addresses the recovery and demixing problem of signals that are
    sparse in some general dictionary. Involved applications include source
    separation, image inpainting, super-resolution, and restoration of signals
    corrupted by clipping, saturation, impulsive noise, or narrowband interference.
    We employ the \(\ell_q\)-norm (\(0 \le q < 1\)) to induce sparsity and propose a
    constrained \(\ell_q\)-minimization formulation for the recovery and demixing
    problem. This nonconvex formulation is approximately solved by two efficient
    first-order algorithms based on proximal coordinate descent and the alternating
    direction method of multipliers (ADMM), respectively. The new algorithms are
    convergent in the nonconvex case under some mild conditions and scale well for
    high-dimensional problems. A convergence condition of the new ADMM algorithm
    has been derived. Furthermore, extension of the two algorithms for
    multi-channels joint recovery has been presented, which can further exploit the
    joint sparsity pattern among multi-channel signals. Various numerical
    experiments showed that the new algorithms can achieve considerable performance
    gain over the \(\ell_1\)-regularized algorithms.
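
    Proximal first-order methods of the kind mentioned above repeatedly apply a
    thresholding step to each coefficient. As a hedged illustration, the sketch
    below compares the two closed-form endpoints: hard thresholding (the q = 0
    penalty) and soft thresholding (the convex q = 1 baseline). It does not
    reproduce the paper's algorithms, and intermediate values of q generally
    need a numerical solve that is omitted here.

```python
import math


def soft_threshold(v, lam):
    """Proximal step for lam * |x| (the l1 penalty): shrink every entry by lam."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in v]


def hard_threshold(v, lam):
    """Proximal step for lam * [x != 0] (the l0 penalty): keep or kill each entry."""
    cutoff = math.sqrt(2.0 * lam)
    return [x if abs(x) > cutoff else 0.0 for x in v]


v = [3.0, -0.5, 1.2]
print(soft_threshold(v, 1.0))  # large entries survive but are shrunk by lam
print(hard_threshold(v, 1.0))  # large entries survive unchanged
```

    The soft rule biases every surviving coefficient toward zero, while the hard
    rule leaves large coefficients untouched, which is one common motivation for
    nonconvex penalties.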

    Efficient and Robust Recovery of Sparse Signal and Image Using Generalized Nonconvex Regularization

    Fei Wen, Yuan Yang, Ling Pei, Wenxian Yu, Peilin Liu
    Comments: 13 pages, 5 figures
    Subjects: Information Theory (cs.IT)

    This work addresses the robust reconstruction problem of a sparse signal from
    compressed measurements. We propose a robust formulation for sparse
    reconstruction which employs the \(\ell_1\)-norm as the loss function for the
    residual error and utilizes a generalized nonconvex penalty for sparsity
    inducing. The \(\ell_1\)-loss is less sensitive to outliers in the measurements
    than the popular \(\ell_2\)-loss, while the nonconvex penalty has the capability
    of ameliorating the bias problem of the popular convex LASSO penalty and thus
    can yield more accurate recovery. To solve this nonconvex and nonsmooth
    minimization formulation efficiently, we propose a first-order algorithm based
    on alternating direction method of multipliers (ADMM). A smoothing strategy on
    the \(\ell_1\)-loss function has been used in deriving the new algorithm to make
    it convergent. Further, a sufficient condition for the convergence of the new
    algorithm has been provided for generalized nonconvex regularization. In
    comparison with several state-of-the-art algorithms, the new algorithm showed
    better performance in numerical experiments in recovering sparse signals and
    compressible images. The new algorithm scales well for large-scale problems, as
    often encountered in image processing.
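
    The robustness claim for the \(\ell_1\)-loss can be seen in the simplest
    possible setting: fitting a single constant to measurements that contain one
    outlier. The minimizer of the squared loss is the mean and the minimizer of
    the absolute loss is the median. The data below are made up for
    illustration and the sketch is unrelated to the paper's ADMM solver.

```python
from statistics import mean, median


def l1_loss(c, data):
    """Sum of absolute residuals for the constant fit c."""
    return sum(abs(x - c) for x in data)


def l2_loss(c, data):
    """Sum of squared residuals for the constant fit c."""
    return sum((x - c) ** 2 for x in data)


# Hypothetical measurements of a quantity near 1.0, with one gross outlier.
measurements = [1.0, 1.1, 0.9, 1.0, 25.0]

l2_fit = mean(measurements)    # minimizes l2_loss, dragged toward the outlier
l1_fit = median(measurements)  # minimizes l1_loss, stays with the bulk of the data
print(l2_fit, l1_fit)
```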

    Toward Traffic Patterns in High-speed Railway Communication Systems: Power Allocation and Antenna Selection

    Jiaxun Lu, Ke Xiong, Xuhong Chen, Pingyi Fan
    Comments: 30 pages, 9 figures, submitted for journal publication
    Subjects: Information Theory (cs.IT)

    In high-speed railway (HSR) communication systems, distributed antennas are
    usually employed to support frequent handover and enhance the signal-to-noise
    ratio at the user equipment. In this case, dynamic time-domain power allocation
    and antenna selection (PAWAS) could be jointly optimized to improve the system
    performance. This paper considers this problem in a simple setting where
    dynamic switching between multiple-input-multiple-output (MIMO) and
    single-input-multiple-output (SIMO) is allowed and exclusively utilized, while
    the channel states and traffic demand are taken into account. The channel
    states include sparse and rich scattering terrains, and the traffic patterns
    include delay-sensitive, delay-insensitive, and hybrid traffic. Some
    important results are obtained in theory. In sparse scattering terrains, for
    delay-sensitive traffic, the PAWAS can be viewed as the generalization of
    channel-inversion associated with transmit antenna selection. On the contrary,
    for delay-insensitive traffic, the power allocation with MIMO can be viewed as
    channel-inversion, but with SIMO, it is traditional water-filling. For the
    hybrid traffic, the PAWAS can be partitioned as delay-sensitive and
    delay-insensitive parts by some specific strategies. In rich scattering
    terrains, the corresponding PAWAS is derived by amending the results for sparse
    scattering terrains, and similar results are then presented.
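
    The two reference policies named above, channel inversion and water-filling,
    can be sketched in their textbook form. This is not the paper's PAWAS
    scheme: the sketch assumes a finite set of equally likely channel states
    with known gains (normalized by noise power), and the function names are
    hypothetical.

```python
def water_filling(gains, total_power):
    """Maximize sum(log(1 + g * p)) subject to sum(p) = total_power, p >= 0.
    Each state gets p = max(0, level - 1/g) for a common water level."""
    inverse_gains = sorted(1.0 / g for g in gains)
    # Drop the weakest states until the water level clears every active floor.
    for active in range(len(inverse_gains), 0, -1):
        level = (total_power + sum(inverse_gains[:active])) / active
        if level > inverse_gains[active - 1]:
            break
    return [max(0.0, level - 1.0 / g) for g in gains]


def channel_inversion(gains, total_power):
    """Allocate p proportional to 1/g so that g * p is the same in every state."""
    constant = total_power / sum(1.0 / g for g in gains)
    return [constant / g for g in gains]


gains = [1.0, 0.5, 0.1]
print(water_filling(gains, 1.0))      # power goes to the strong state only
print(channel_inversion(gains, 1.0))  # most power goes to the weakest state
```

    Channel inversion keeps the received signal level constant, which suits
    delay-sensitive traffic, while water-filling spends power where the channel
    is good, which suits traffic that can wait.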

    A New Decoding for Polynomial Residue Codes with Non-Pairwise Coprime Moduli

    Li Xiao, Xiang-Gen Xia
    Comments: 4 pages
    Subjects: Information Theory (cs.IT)

    This paper revisits polynomial residue codes with non-pairwise coprime moduli
    by presenting a new decoding, called the minimum degree-weighted distance
    decoding. This decoding is based on the degree-weighted distance and differs
    from the traditional minimum Hamming distance decoding. It is shown that for
    the two types of minimum distance decoders, i.e., the minimum degree-weighted
    distance decoding and the minimum Hamming distance decoding, one is not
    absolutely stronger than the other, but they can complement each other from
    different points of view.

    Robust Polynomial Reconstruction via Chinese Remainder Theorem in the Presence of Small Degree Residue Errors

    Li Xiao, Xiang-Gen Xia
    Comments: 5 pages
    Subjects: Information Theory (cs.IT)

    Based on unique decoding of the polynomial residue code with non-pairwise
    coprime moduli, a polynomial with degree less than that of the least common
    multiple (lcm) of all the moduli can be accurately reconstructed when the
    number of residue errors is less than half the minimum distance of the code.
    However, once the number of residue errors is beyond half the minimum distance
    of the code, the unique decoding may fail and lead to a large reconstruction
    error. In this paper, assuming that all the residues are allowed to have errors
    with small degrees, we consider how to reconstruct the polynomial as accurately
    as possible in the sense that a reconstructed polynomial is obtained with only
    the last \(\tau\) coefficients being possibly erroneous, when the
    residues are affected by errors with degrees upper bounded by \(\tau\). In this
    regard, we first propose a multi-level robust Chinese remainder theorem (CRT)
    for polynomials, namely, a trade-off between the dynamic range of the degree of
    the polynomial to be reconstructed and the residue error bound \(\tau\) is
    formulated. Furthermore, a simple closed-form reconstruction algorithm is also
    proposed.
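
    For orientation, the classical error-free polynomial CRT that this work
    builds on can be sketched as below. The sketch makes assumptions the paper
    does not: coefficients live in the hypothetical field GF(7), the moduli are
    pairwise coprime, and the residues carry no errors. Polynomials are lists of
    coefficients, lowest degree first, and `pow(x, -1, P)` needs Python 3.8 or
    later.

```python
P = 7  # hypothetical prime; all coefficient arithmetic is modulo P


def trim(a):
    """Reduce coefficients mod P and drop leading zero terms."""
    a = [c % P for c in a]
    while a and a[-1] == 0:
        a.pop()
    return a


def add(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
                 for i in range(n)])


def mul(a, b):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)


def divmod_poly(a, b):
    """Quotient and remainder of a divided by a nonzero polynomial b."""
    a, b = trim(a), trim(b)
    q = [0] * max(len(a) - len(b) + 1, 0)
    inv_lead = pow(b[-1], -1, P)
    while len(a) >= len(b):
        shift = len(a) - len(b)
        coef = a[-1] * inv_lead % P
        q[shift] = coef
        a = trim([c - coef * (b[i - shift] if 0 <= i - shift < len(b) else 0)
                  for i, c in enumerate(a)])
    return trim(q), a


def inverse_mod(a, m):
    """Polynomial s with a * s = 1 (mod m), assuming a and m are coprime."""
    r0, r1 = trim(m), divmod_poly(a, m)[1]
    s0, s1 = [], [1]
    while r1:  # extended Euclid, tracking only the coefficient of a
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, add(s0, [-c for c in mul(q, s1)])
    c = pow(r0[0], -1, P)  # r0 is the constant gcd
    return divmod_poly([c * x for x in s0], m)[1]


def crt(residues, moduli):
    """Rebuild the unique polynomial of degree below deg(product of moduli)."""
    M = [1]
    for m in moduli:
        M = mul(M, m)
    result = []
    for r, m in zip(residues, moduli):
        Mi = divmod_poly(M, m)[0]
        result = add(result, mul(mul(r, inverse_mod(Mi, m)), Mi))
    return divmod_poly(result, M)[1]


f = [5, 2, 0, 1]                    # x^3 + 2x + 5
moduli = [[1, 0, 1], [3, 1, 1]]     # x^2 + 1 and x^2 + x + 3, coprime over GF(7)
residues = [divmod_poly(f, m)[1] for m in moduli]
assert crt(residues, moduli) == f
```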

    Proof of Correspondence between Keys and Encoding Maps in an Authentication Code

    Juan Carlos Ku-Cauich, Guillermo Morales-Luna, Horacio Tapia-Recillas
    Subjects: Number Theory (math.NT); Information Theory (cs.IT)

    In a former paper the authors introduced two new systematic authentication
    codes based on the Gray map over a Galois ring. In this paper, we prove the
    one-to-one and onto correspondence between keys and encoding maps for the
    second of these authentication codes.

    Extending Differential Privacy for Treating Dependent Records via Information Theory

    Genqiang Wu, Xianyao Xia, Yeping He
    Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT)

    Differential privacy is a strong privacy notion based on the indistinguishability
    of the outputs for two neighboring datasets, which represent the two states in
    which an individual’s information is either inside or outside a dataset.
    However, when records are dependent, this representation loses its foundation.
    Motivated by this observation, we introduce a variant of differential privacy
    based on the influence of outputs to an individual’s inputs. The new notion
    accurately captures how dependent records weaken the privacy guarantee of
    differential privacy.

    Our new privacy notion is compatible with differential privacy. When the
    individuals are independent, the differential privacy model is a special
    case of our model. When the individuals are dependent, the group
    privacy method for achieving differential privacy in the dependent case can be
    used to achieve the new privacy model. This is consistent with known results on
    differential privacy.
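
    The group privacy method referred to above has a simple standard form: if up
    to k records may depend on one individual, the Laplace noise scale is
    multiplied by k. The sketch below shows only that standard calibration; it
    does not implement the new privacy notion, and the function names and
    parameter values are hypothetical.

```python
import random


def laplace_scale(sensitivity, epsilon, group_size=1):
    """Noise scale giving epsilon-level protection to groups of `group_size`
    records, using the standard group privacy argument."""
    return group_size * sensitivity / epsilon


def laplace_mechanism(true_value, sensitivity, epsilon, group_size=1, rng=random):
    """Release true_value plus Laplace noise of the calibrated scale."""
    scale = laplace_scale(sensitivity, epsilon, group_size)
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise


# A counting query (sensitivity 1) with independent records ...
print(laplace_scale(1.0, 0.5))                 # 2.0
# ... and with up to three mutually dependent records per individual.
print(laplace_scale(1.0, 0.5, group_size=3))   # 6.0
print(laplace_mechanism(100.0, 1.0, 0.5, group_size=3))
```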

    Finally, our new privacy model is consistent with information theory. We
    prove that if a mechanism satisfies the new privacy notion, the mutual
    information between one individual and the mechanism’s outputs is upper bounded
    by a small value. This implies that our new model is well founded in
    information theory.



