
    arXiv Paper Daily: Thu, 4 May 2017

    Posted by 我爱机器学习 (52ml.net) on 2017-05-04 00:00:00

    Neural and Evolutionary Computing

    Quantified advantage of discontinuous weight selection in approximations with deep neural networks

    Dmitry Yarotsky
    Comments: 12 pages, submitted to J. Approx. Theory
    Subjects: Neural and Evolutionary Computing (cs.NE)

    We consider approximations of 1D Lipschitz functions by deep ReLU networks of
    a fixed width. We prove that without the assumption of continuous weight
    selection the uniform approximation error is lower than with this assumption at
    least by a factor logarithmic in the size of the network.

    Balanced Excitation and Inhibition are Required for High-Capacity, Noise-Robust Neuronal Selectivity

    Ran Rubin, L.F. Abbott, Haim Sompolinsky
    Comments: Article and supplementary information
    Subjects: Neurons and Cognition (q-bio.NC); Disordered Systems and Neural Networks (cond-mat.dis-nn); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Neurons and networks in the cerebral cortex must operate reliably despite
    multiple sources of noise. To evaluate the impact of both input and output
    noise, we determine the robustness of single-neuron stimulus selective
    responses, as well as the robustness of attractor states of networks of neurons
    performing memory tasks. We find that robustness to output noise requires
    synaptic connections to be in a balanced regime in which excitation and
    inhibition are strong and largely cancel each other. We evaluate the conditions
    required for this regime to exist and determine the properties of networks
    operating within it. A plausible synaptic plasticity rule for learning that
    balances weight configurations is presented. Our theory predicts an optimal
    ratio of the number of excitatory and inhibitory synapses for maximizing the
    encoding capacity of balanced networks for a given statistics of afferent
    activations. Previous work has shown that balanced networks amplify
    spatio-temporal variability and account for observed asynchronous irregular
    states. Here we present a novel type of balanced network that amplifies small
    changes in the impinging signals, and emerges automatically from learning to
    perform neuronal and network functions robustly.

    Ternary Neural Networks with Fine-Grained Quantization

    Naveen Mellempudi, Abhisek Kundu, Dheevatsa Mudigere, Dipankar Das, Bharat Kaul, Pradeep Dubey
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We propose a novel fine-grained quantization method for ternarizing
    pre-trained full precision models, while also constraining activations to
    8-bits. Using this method, we demonstrate minimal loss in classification
    accuracy on state-of-the-art topologies without additional training. This
    enables a full 8-bit inference pipeline, with best reported accuracy using
    ternary weights on ImageNet dataset. Further, we also provide an improved
    theoretical formulation that forms the basis for a higher quality solution with
    this approach. Our method involves ternarizing the original weight tensor in
    groups of (N) weights. Using (N=4), we achieve Top-1 accuracy within (3.7\%)
    and (5.8\%) of the baseline full precision result for Resnet-101 and Resnet-50
    respectively, while eliminating (75\%) of all multiplications. We also study
    the impact of group size on both performance and accuracy. With a group size of
    (N=64), we eliminate (\approx 99\%) of the multiplications; however, this
    introduces a significant drop in accuracy, which necessitates fine tuning the
    parameters (re-training) at lower precision. To address this, we re-train
    Resnet-50 with 8-bit activations and ternary weights, improving the Top-1
    accuracy to within (4\%) of the full precision result with (<30\%) additional
    overhead. Our final quantized model can run on a full 8-bit compute pipeline
    using 2-bit weights and has the potential of up to (16\times) improvement in
    performance compared to baseline full-precision models.
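
    To make the group-wise scheme concrete, here is a minimal NumPy sketch of
    ternarizing a weight vector in groups of N values. The per-group threshold
    and scale rules below are illustrative choices, not the paper's exact
    formulation.

        import numpy as np

        def ternarize_groupwise(w, n=4, t=0.7):
            """Ternarize a weight vector in groups of n values.

            For each group, values below a threshold (a fraction t of the group's
            mean magnitude) map to 0; the rest map to +/- a shared per-group scale.
            Threshold and scale rules are illustrative guesses, not the paper's.
            """
            w = w.reshape(-1, n)                      # one row per group
            thresh = t * np.abs(w).mean(axis=1, keepdims=True)
            mask = np.abs(w) > thresh                 # non-zero positions
            # Per-group scale: mean magnitude of the surviving weights.
            scale = np.where(mask.any(axis=1, keepdims=True),
                             (np.abs(w) * mask).sum(axis=1, keepdims=True)
                             / np.maximum(mask.sum(axis=1, keepdims=True), 1),
                             0.0)
            return (np.sign(w) * mask * scale).reshape(-1)

        w = np.random.randn(64).astype(np.float32)    # stand-in for trained weights
        print(ternarize_groupwise(w, n=4)[:8])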

    Going Wider: Recurrent Neural Network With Parallel Cells

    Danhao Zhu, Si Shen, Xin-Yu Dai, Jiajun Chen
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Recurrent Neural Networks (RNNs) have been widely applied to sequence
    modeling. In an RNN, the hidden states at the current step are fully connected
    to those at the previous step, so the influence of less related features from
    the previous step may reduce the model’s learning ability. We propose a simple
    technique called parallel cells (PCs) to enhance the learning ability of RNNs.
    In each layer, we run multiple small RNN cells rather than one single large
    cell. In this paper, we evaluate PCs on two tasks. On the language modeling
    task on PTB (Penn Tree Bank), our model outperforms state-of-the-art models,
    decreasing perplexity from 78.6 to 75.3. On a Chinese-English translation
    task, our model improves the BLEU score by 0.39 points over the baseline
    model.
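
    A minimal NumPy sketch of the parallel-cells idea: the layer's hidden state is
    split across k small, independent tanh-RNN cells instead of one densely
    connected state, so hidden units in different cells never interact across time
    steps. The cell type and sizes here are illustrative, not taken from the paper.

        import numpy as np

        class ParallelCellsRNN:
            """One recurrent layer made of k independent small tanh-RNN cells."""
            def __init__(self, input_size, hidden_size, k, seed=0):
                assert hidden_size % k == 0
                rng = np.random.default_rng(seed)
                self.k, self.d = k, hidden_size // k
                self.Wx = [rng.normal(0, 0.1, (input_size, self.d)) for _ in range(k)]
                self.Wh = [rng.normal(0, 0.1, (self.d, self.d)) for _ in range(k)]
                self.b = [np.zeros(self.d) for _ in range(k)]

            def step(self, x, h):
                # h is a list of k small hidden states, updated independently.
                return [np.tanh(x @ self.Wx[i] + h[i] @ self.Wh[i] + self.b[i])
                        for i in range(self.k)]

            def forward(self, xs):
                h = [np.zeros(self.d) for _ in range(self.k)]
                for x in xs:
                    h = self.step(x, h)
                return np.concatenate(h)      # full layer output

        layer = ParallelCellsRNN(input_size=8, hidden_size=16, k=4)
        seq = [np.random.randn(8) for _ in range(5)]
        print(layer.forward(seq).shape)       # (16,)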


    Computer Vision and Pattern Recognition

    Gabor Convolutional Networks

    Shangzhen Luan, Baochang Zhang, Chen Chen, Xianbin Cao, Qixiang Ye, Jungong Han, Jianzhuang Liu
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Steerable properties dominate the design of traditional filters, e.g., Gabor
    filters, and endow features with the capability of dealing with spatial
    transformations. However, such excellent properties have not been well explored
    in popular deep convolutional neural networks (DCNNs). In this paper, we
    propose a new deep model, termed Gabor Convolutional Networks (GCNs or Gabor
    CNNs), which incorporates Gabor filters into DCNNs to enhance the resistance of
    deep learned features to orientation and scale changes. By only manipulating
    the basic element of DCNNs, i.e., the convolution operator, based on Gabor
    filters, GCNs can be easily implemented and are compatible with any popular
    deep learning architecture. Experimental results demonstrate the superior
    capability of our algorithm in recognizing objects where scale and rotation
    changes occur frequently. The proposed GCNs have far fewer learnable network
    parameters and are thus easier to train with an end-to-end pipeline. To
    encourage further developments, the source code is released on GitHub.
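
    As an illustration of one plausible reading of "manipulating the convolution
    operator based on Gabor filters", the sketch below builds a small bank of Gabor
    filters and modulates a single learned kernel elementwise with them, producing
    orientation-specific filters. The filter parameters and sizes are assumptions
    for illustration only, not the paper's implementation.

        import numpy as np

        def gabor_kernel(size, theta, sigma=2.0, lam=4.0, gamma=0.5, psi=0.0):
            """Real part of a Gabor filter with orientation theta."""
            half = size // 2
            y, x = np.mgrid[-half:half + 1, -half:half + 1]
            xr = x * np.cos(theta) + y * np.sin(theta)
            yr = -x * np.sin(theta) + y * np.cos(theta)
            return np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2)) \
                * np.cos(2 * np.pi * xr / lam + psi)

        def gabor_modulated_filters(learned, n_orientations=4):
            """Elementwise-modulate one learned kxk kernel with a Gabor bank,
            yielding n_orientations orientation-specific kernels."""
            k = learned.shape[0]
            thetas = np.arange(n_orientations) * np.pi / n_orientations
            return np.stack([learned * gabor_kernel(k, t) for t in thetas])

        learned = np.random.randn(5, 5)            # stand-in for a learned kernel
        bank = gabor_modulated_filters(learned)    # shape (4, 5, 5)
        print(bank.shape)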

    Learning to Estimate 3D Hand Pose from Single RGB Images

    Christian Zimmermann, Thomas Brox
    Comments: this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Low-cost consumer depth cameras and deep learning have enabled reasonable 3D
    hand pose estimation from single depth images. In this paper, we present an
    approach that estimates 3D hand pose from regular RGB images. This task has far
    more ambiguities due to the missing depth information. To this end, we propose
    a deep network that learns a network-implicit 3D articulation prior. Together
    with detected keypoints in the images, this network yields good estimates of
    the 3D pose. We introduce a large scale 3D hand pose dataset based on synthetic
    hand models for training the involved networks. Experiments on a variety of
    test sets, including one on sign language recognition, demonstrate the
    feasibility of 3D hand pose estimation on single color images.

    Weakly-supervised Visual Grounding of Phrases with Linguistic Structures

    Fanyi Xiao, Leonid Sigal, Yong Jae Lee
    Comments: CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a weakly-supervised approach that takes image-sentence pairs as
    input and learns to visually ground (i.e., localize) arbitrary linguistic
    phrases, in the form of spatial attention masks. Specifically, the model is
    trained with images and their associated image-level captions, without any
    explicit region-to-phrase correspondence annotations. To this end, we introduce
    an end-to-end model which learns visual groundings of phrases with two types of
    carefully designed loss functions. In addition to the standard discriminative
    loss, which enforces that attended image regions and phrases are consistently
    encoded, we propose a novel structural loss which makes use of the parse tree
    structures induced by the sentences. In particular, we ensure complementarity
    among the attention masks that correspond to sibling noun phrases, and
    compositionality of attention masks among the children and parent phrases, as
    defined by the sentence parse tree. We validate the effectiveness of our
    approach on the Microsoft COCO and Visual Genome datasets.

    Why Rotation Averaging is Easy

    Anders Eriksson, Carl Olsson, Fredrik Kahl, Olof Enqvist, Tat-Jun Chin
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper we explore the role of duality principles within the problem of
    rotation averaging, a fundamental task in a wide range of computer vision
    applications. In its conventional form, rotation averaging is stated as a
    minimization over multiple rotation constraints. As these constraints are
    non-convex, this problem is generally considered very challenging to solve
    globally.

    In this work we show how to surpass this difficulty through the use of
    Lagrangian duality. While such an approach is well known, it is normally not
    guaranteed to provide a tight relaxation. We analytically prove that unless the
    noise levels are severe, there will be no duality gap. This allows us to obtain
    certifiably global solutions to a class of important non-convex problems in
    polynomial time.

    We also propose an efficient, scalable algorithm that outperforms
    general-purpose numerical solvers and is able to handle the large problem instances
    commonly occurring in structure from motion settings. The potential of this
    proposed method is demonstrated on a number of different problems, consisting
    of both synthetic and real world data, with convincing results.
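
    For reference, rotation averaging in its common chordal least-squares form,
    which is the kind of non-convex program such duality analyses typically start
    from (generic notation, not taken from the paper): given relative rotation
    measurements, one seeks absolute rotations minimizing

        % chordal L2 rotation averaging over the measurement graph \mathcal{E}
        \min_{R_1,\dots,R_n \in SO(3)} \;
          \sum_{(i,j)\in\mathcal{E}} \big\| \tilde{R}_{ij} - R_j R_i^{\top} \big\|_F^2

    Relaxing the SO(3) constraints via Lagrangian duality yields a convex problem
    (a semidefinite program in the standard chordal formulation); the abstract's
    claim is that this relaxation has no duality gap unless the noise is severe.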

    FOIL it! Find One mismatch between Image and Language caption

    Ravi Shekhar, Sandro Pezzelle, Yauhen Klimovich, Aurelie Herbelot, Moin Nabi, Enver Sangineto, Raffaella Bernardi
    Comments: To appear at ACL 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)

    In this paper, we aim to understand whether current language and vision
    (LaVi) models truly grasp the interaction between the two modalities. To this
    end, we propose an extension of the MSCOCO dataset, FOIL-COCO, which associates
    images with both correct and “foil” captions, that is, descriptions of the
    image that are highly similar to the original ones, but contain one single
    mistake (“foil word”). We show that current LaVi models fall into the traps of
    this data and perform badly on three tasks: a) caption classification (correct
    vs. foil); b) foil word detection; c) foil word correction. Humans, in
    contrast, have near-perfect performance on those tasks. We demonstrate that
    merely utilising language cues is not enough to model FOIL-COCO and that it
    challenges the state-of-the-art by requiring a fine-grained understanding of
    the relation between text and image.

    Optical Flow in Mostly Rigid Scenes

    Jonas Wulff, Laura Sevilla-Lara, Michael J. Black
    Comments: 15 pages, 10 figures; accepted for publication at CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The optical flow of natural scenes is a combination of the motion of the
    observer and the independent motion of objects. Existing algorithms typically
    focus on either recovering motion and structure under the assumption of a
    purely static world or optical flow for general unconstrained scenes. We
    combine these approaches in an optical flow algorithm that estimates an
    explicit segmentation of moving objects from appearance and physical
    constraints. In static regions we take advantage of strong constraints to
    jointly estimate the camera motion and the 3D structure of the scene over
    multiple frames. This allows us to also regularize the structure instead of the
    motion. Our formulation uses a Plane+Parallax framework, which works even under
    small baselines, and reduces the motion estimation to a one-dimensional search
    problem, resulting in more accurate estimation. In moving regions the flow is
    treated as unconstrained, and computed with an existing optical flow method.
    The resulting Mostly-Rigid Flow (MR-Flow) method achieves state-of-the-art
    results on both the MPI-Sintel and KITTI-2015 benchmarks.

    Learning Cross-Domain Disentangled Deep Representation with Supervision from A Single Domain

    Tzu-Chien Fu, Yen-Cheng Liu, Wei-Chen Chiu, Sheng-De Wang, Yu-Chiang Frank Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The recent progress and development of deep generative models have led to
    remarkable improvements in research topics in computer vision and machine
    learning. In this paper, we address the task of cross-domain feature
    disentanglement. We advance the idea of unsupervised domain adaptation and
    propose to perform joint feature disentanglement and adaptation. Based on
    generative adversarial networks, we present a novel deep learning architecture
    with disentanglement ability, which observes cross-domain image data and
    derives latent features with the underlying factors (e.g., attributes). As a
    result, our generative model is able to address cross-domain feature
    disentanglement with only the (attribute) supervision from the source-domain
    data (not the target-domain ones). In the experiments, we apply our model for
    generating and classifying images with particular attributes, and show that
    satisfactory results can be produced.

    Amortized Inference and Learning in Latent Conditional Random Fields for Weakly-Supervised Semantic Image Segmentation

    Gaurav Pandey, Ambedkar Dukkipati
    Comments: 8 pages, 3 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Conditional random fields (CRFs) are commonly employed as a post-processing
    tool for image segmentation tasks. The unary potentials of the CRF are often
    learnt independently by a classifier, thereby decoupling the inference in CRF
    from the training of the classifier. Such a scheme works effectively when
    pixel-level labelling is available for all the images. However, in the absence of
    pixel-level labels, the classifier is faced with the uphill task of selectively
    assigning the image-level labels to the pixels of the image. Prior work often
    relied on localization cues, such as saliency maps, objectness priors, bounding
    boxes etc., to address this challenging problem. In contrast, we model the
    labels of the pixels as latent variables of a CRF. The pixels and the
    image-level labels are the observed variables of the latent CRF. We amortize
    the cost of inference in the latent CRF over the entire dataset, by training an
    inference network to approximate the posterior distribution of the latent
    variables given the observed variables. The inference network can be trained in
    an end-to-end fashion, and requires no localization cues for training.
    Moreover, unlike other approaches for weakly-supervised segmentation, the
    proposed model doesn’t require further post-processing. The proposed model
    achieves performance comparable with other approaches that employ saliency
    masks for the task of weakly-supervised semantic image segmentation on the
    challenging VOC 2012 dataset.

    Super-Resolution of Wavelet-Encoded Images

    Vildan Atalay Aydin, Hassan Foroosh
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Multiview super-resolution image reconstruction (SRIR) is often cast as a
    resampling problem by merging non-redundant data from multiple low-resolution
    (LR) images on a finer high-resolution (HR) grid, while inverting the effect of
    the camera point spread function (PSF). One main problem with multiview methods
    is that resampling from nonuniform samples (provided by LR images) and the
    inversion of the PSF are highly nonlinear and ill-posed problems. Non-linearity
    and ill-posedness are typically overcome by linearization and regularization,
    often through an iterative optimization process, which essentially trade off
    the very same information (i.e. high frequency) that we want to recover. We
    propose a novel point of view for multiview SRIR: Unlike existing multiview
    methods that reconstruct the entire spectrum of the HR image from the multiple
    given LR images, we derive explicit expressions that show how the
    high-frequency spectra of the unknown HR image are related to the spectra of
    the LR images. Therefore, by taking any of the LR images as the reference to
    represent the low-frequency spectra of the HR image, one can reconstruct the
    super-resolution image by focusing only on the reconstruction of the
    high-frequency spectra. This is very much like single-image methods, which
    extrapolate the spectrum of one image, except that we rely on information
    provided by all other views, rather than by prior constraints as in
    single-image methods (which may not be an accurate source of information). This
    is made possible by deriving and applying explicit closed-form expressions that
    define how the local high frequency information that we aim to recover for the
    reference high resolution image is related to the local low frequency
    information in the sequence of views. Results and comparisons with recently
    published state-of-the-art methods show the superiority of the proposed
    solution.

    The Forgettable-Watcher Model for Video Question Answering

    Hongyang Xue, Zhou Zhao, Deng Cai
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

    A number of visual question answering approaches have been proposed recently,
    aiming at understanding the visual scenes by answering the natural language
    questions. While the image question answering has drawn significant attention,
    video question answering is largely unexplored.

    Video-QA is different from Image-QA since the information and the events are
    scattered among multiple frames. In order to better utilize the temporal
    structure of the videos and the phrasal structures of the answers, we propose
    two mechanisms: the re-watching and the re-reading mechanisms and combine them
    into the forgettable-watcher model. Then we propose a TGIF-QA dataset for video
    question answering with the help of automatic question generation. Finally, we
    evaluate the models on our dataset. The experimental results show the
    effectiveness of our proposed models.

    Part-based Weighting Aggregation of Deep Convolutional Features for Image Retrieval

    Jian Xu, Cunzhao Shi, Chengzuo Qi, Chunheng Wang, Baihua Xiao
    Comments: 9 pages, 3 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Several recent works have shown that part-based image representation provides
    state-of-the-art performance for fine-grained categorization. Moreover, it has
    also been shown that image global representation generated by aggregating deep
    convolutional features provides excellent performance for image retrieval. In
    this paper we propose a novel aggregation method, which utilizes the
    information of retrieval object parts. The proposed part-based weighting
    aggregation (PWA) method utilizes the normalized feature maps as part detectors
    to weight and aggregate the convolutional features. The part detectors which
    are selected by the unsupervised method highlight the discriminative parts of
    objects and effectively suppress the noise of background. We experiment on five
    public standard datasets for image retrieval. Our unsupervised PWA method
    outperforms the state-of-the-art approaches based on pre-trained networks and
    achieves comparable accuracy with the fine-tuned methods. It is worth noting
    that our unsupervised method is very suitable and effective for the situation
    where the annotated training dataset is difficult to collect.
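
    A minimal NumPy sketch of the part-based weighting idea as described: a few
    channel response maps act as part detectors, and each is normalized into a
    spatial weight map used to pool the full feature tensor. The variance-based
    channel selection and the normalization below are illustrative stand-ins for
    the paper's unsupervised selection rule.

        import numpy as np

        def pwa_descriptor(features, n_parts=4):
            """Aggregate a CxHxW conv feature map with part-detector weighting."""
            c, h, w = features.shape
            flat = features.reshape(c, -1)
            parts = np.argsort(flat.var(axis=1))[-n_parts:]   # detector channels
            descs = []
            for p in parts:
                wmap = np.maximum(features[p], 0)
                wmap = wmap / (wmap.sum() + 1e-8)              # spatial weight map
                descs.append((features * wmap).reshape(c, -1).sum(axis=1))
            desc = np.concatenate(descs)                       # (n_parts * C,)
            return desc / (np.linalg.norm(desc) + 1e-8)

        feats = np.random.rand(64, 7, 7)          # stand-in for conv features
        print(pwa_descriptor(feats).shape)        # (256,)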

    Marine Animal Classification with Correntropy Loss Based Multi-view Learning

    Zheng Cao, Shujian Yu, Bing Ouyang, Fraser Dalgleish, Anni Vuorenkoski, Gabriel Alsenas, Jose Principe
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    To analyze marine animal behavior, seasonal distribution and abundance,
    digital imagery can be acquired by visual or Lidar cameras. Depending on the
    quantity and properties of acquired imagery, the animals are characterized as
    either features (shape, color, texture, etc.), or dissimilarity matrices
    derived from different shape analysis methods (shape context, internal distance
    shape context, etc.). For both cases, multi-view learning is critical in
    integrating more than one set of features or dissimilarity matrices for higher
    classification accuracy. This paper adopts correntropy loss as cost function in
    multi-view learning, which has favorable statistical properties for rejecting
    noise. For the case of features, the correntropy loss-based multi-view learning
    and its entrywise variation are developed based on the multi-view intact space
    learning algorithm. For the case of dissimilarity matrices, the robust
    Euclidean embedding algorithm is extended to its multi-view form with the
    correntropy loss function. Results from simulated data and real-world marine
    animal imagery show that the proposed algorithms can effectively enhance
    classification rate, as well as suppress noise under different noise
    conditions.

    Cascaded Boundary Regression for Temporal Action Detection

    Jiyang Gao, Zhenheng Yang, Ram Nevatia
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Temporal action detection in long videos is an important problem.
    State-of-the-art methods address this problem by applying action classifiers on
    sliding windows. Although sliding windows may contain an identifiable portion
    of the actions, they may not necessarily cover the entire action instance,
    which would lead to inferior performance. We adapt a two-stage temporal action
    detection pipeline with a Cascaded Boundary Regression (CBR) model.
    Class-agnostic proposals and specific actions are detected respectively in the
    first and the second stage. CBR uses temporal coordinate regression to refine
    the temporal boundaries of the sliding windows. The salient aspect of the
    refinement process is that, inside each stage, the temporal boundaries are
    adjusted in a cascaded way by feeding the refined windows back to the system
    for further boundary refinement. We test CBR on THUMOS-14 and TVSeries, and
    achieve state-of-the-art performance on both datasets. The performance gain is
    especially remarkable under high IoU thresholds, e.g., mAP at tIoU=0.5 on
    THUMOS-14 improves from 19.0% to 31.0%.
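
    A toy sketch of the cascaded refinement loop described above: a boundary
    regressor predicts start/end offsets for a candidate window, the refined window
    is fed back in, and this repeats for a few cascade steps. The regressor here is
    a stub, not a trained model.

        def fake_regressor(start, end):
            """Stand-in for a learned model returning (start_offset, end_offset)
            for a candidate temporal window; here it nudges toward [12.0, 20.0]."""
            return 0.5 * (12.0 - start), 0.5 * (20.0 - end)

        def cascaded_boundary_regression(start, end, steps=3):
            """Refine a sliding-window proposal by repeatedly feeding the refined
            boundaries back into the (same) boundary regressor."""
            for _ in range(steps):
                ds, de = fake_regressor(start, end)
                start, end = start + ds, end + de
            return start, end

        print(cascaded_boundary_regression(10.0, 25.0))   # moves toward (12, 20)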

    Shading Annotations in the Wild

    Balazs Kovacs, Sean Bell, Noah Snavely, Kavita Bala
    Comments: CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

    Understanding shading effects in images is critical for a variety of vision
    and graphics problems, including intrinsic image decomposition, shadow removal,
    image relighting, and inverse rendering. As is the case with other vision
    tasks, machine learning is a promising approach to understanding shading – but
    there is little ground truth shading data available for real-world images. We
    introduce Shading Annotations in the Wild (SAW), a new large-scale, public
    dataset of shading annotations in indoor scenes, comprised of multiple forms of
    shading judgments obtained via crowdsourcing, along with shading annotations
    automatically generated from RGB-D imagery. We use this data to train a
    convolutional neural network to predict per-pixel shading information in an
    image. We demonstrate the value of our data and network in an application to
    intrinsic images, where we can reduce decomposition artifacts produced by
    existing algorithms. Our database is available at
    this http URL

    Out-of-focus: Learning Depth from Image Bokeh for Robotic Perception

    Eric Cristofalo, Zijian Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

    In this project, we propose a novel approach for estimating depth from RGB
    images. Traditionally, most work uses a single RGB image to estimate depth,
    which is inherently difficult and generally results in poor performance, even
    with thousands of data examples. In this work, we alternatively use multiple
    RGB images that were captured while changing the focus of the camera’s lens.
    This method leverages the natural depth information correlated to the different
    patterns of clarity/blur in the sequence of focal images, which helps
    distinguish objects at different depths. Since no such data set exists for
    learning this mapping, we collect our own data set using customized hardware.
    We then use a convolutional neural network for learning the depth from the
    stacked focal images. Comparative studies were conducted on both a standard
    RGBD data set and our own data set (learning from both single and multiple
    images), and results verified that stacked focal images yield better depth
    estimation than using just a single RGB image.

    Recovery of structure of looped jointed objects from multiframes

    Mieczysław Kłopotek
    Journal-ref: a preliminary version for Machine Graphics and Vision, Vol. 3 No.
    4, pp. 645-656, 1995
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A method to recover structural parameters of looped jointed objects from
    multiframes is being developed. Each rigid part of the jointed body needs to
    be traced at only two points (that is, at the junctions).

    This method has been linearized for 4-part loops, with recovery from at least
    19 frames.


    Artificial Intelligence

    Answer Set Programming for Non-Stationary Markov Decision Processes

    Leonardo A. Ferreira, Reinaldo A. C. Bianchi, Paulo E. Santos, Ramon Lopez de Mantaras
    Subjects: Artificial Intelligence (cs.AI)

    Non-stationary domains, where unforeseen changes happen, present a challenge
    for agents to find an optimal policy for a sequential decision making problem.
    This work investigates a solution to this problem that combines Markov Decision
    Processes (MDP) and Reinforcement Learning (RL) with Answer Set Programming
    (ASP) in a method we call ASP(RL). In this method, Answer Set Programming is
    used to find the possible trajectories of an MDP, from where Reinforcement
    Learning is applied to learn the optimal policy of the problem. Results show
    that ASP(RL) is capable of efficiently finding the optimal solution of an MDP
    representing non-stationary domains.

    A Rule-Based Computational Model of Cognitive Arithmetic

    Ashis Pati, Kantwon Rogers, Hanqing Zhu
    Subjects: Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)

    Cognitive arithmetic studies the mental processes used in solving math
    problems. This area of research explores the retrieval mechanisms and
    strategies used by people during a common cognitive task. Past research has
    shown that human performance in arithmetic operations is correlated to the
    numerical size of the problem. Past research on cognitive arithmetic has
    pinpointed this trend to either retrieval strength, error checking, or
    strategy-based approaches when solving equations. This paper describes a
    rule-based computational model that performs the four major arithmetic
    operations (addition, subtraction, multiplication and division) on two
    operands. We then evaluated our model to probe its validity in representing the
    prevailing concepts observed in psychology experiments from the related works.
    The experiments specifically explore the problem size effect, an
    activation-based model for fact retrieval, backup strategies when retrieval
    fails, and finally optimization strategies when faced with large operands. From
    our experimental results, we concluded that our model’s response times were
    comparable to results observed when people performed similar tasks during
    psychology experiments. The fit of our model in reproducing these results and
    incorporating accuracy into our model are discussed.

    Navigating Intersections with Autonomous Vehicles using Deep Reinforcement Learning

    David Isele, Akansel Cosgun, Kaushik Subramanian, Kikuo Fujimura
    Comments: Submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017)
    Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)

    Providing an efficient strategy to navigate safely through unsignaled
    intersections is a difficult task that requires determining the intent of other
    drivers. We explore the effectiveness of using Deep Reinforcement Learning to
    handle intersection problems. Combining several recent advances in Deep RL,
    we were able to learn policies that surpass the performance of a commonly used
    heuristic approach on several metrics, including task completion time and goal
    success rate. Our analysis, and the solutions learned by the network, point out
    several shortcomings of current rule-based methods. The fact that Deep RL
    policies resulted in collisions, although rarely, combined with the limited
    ability of the policies to generalize to out-of-sample scenarios, suggests a
    need for further research.

    Imagining Probabilistic Belief Change as Imaging (Technical Report)

    Gavin Rens, Thomas Meyer
    Comments: 21 pages
    Subjects: Artificial Intelligence (cs.AI)

    Imaging is a form of probabilistic belief change which could be employed for
    both revision and update. In this paper, we propose a new framework for
    probabilistic belief change based on imaging, called Expected Distance Imaging
    (EDI). EDI is sufficiently general to define Bayesian conditioning and other
    forms of imaging previously defined in the literature. We argue that, and
    investigate how, EDI can be used for both revision and update. EDI’s definition
    depends crucially on a weight function whose properties are studied and whose
    effect on belief change operations is analysed. Finally, four EDI
    instantiations are proposed, two for revision and two for update, and
    probabilistic rationality postulates are suggested for their analysis.

    Formal Verification of Piece-Wise Linear Feed-Forward Neural Networks

    Ruediger Ehlers
    Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We present an approach for the verification of feed-forward neural networks
    in which all nodes have a piece-wise linear activation function. Such networks
    are often used in deep learning and have been shown to be hard to verify for
    modern satisfiability modulo theory (SMT) and integer linear programming (ILP)
    solvers.

    The starting point of our approach is the addition of a global linear
    approximation of the overall network behavior to the verification problem that
    helps with SMT-like reasoning over the network behavior. We present a
    specialized verification algorithm that employs this approximation in a search
    process in which it infers additional node phases for the non-linear nodes in
    the network from partial node phase assignments, similar to unit propagation in
    classical SAT solving. We also show how to infer additional conflict clauses
    and safe node fixtures from the results of the analysis steps performed during
    the search. The resulting approach is evaluated on collision avoidance and
    handwritten digit recognition case studies.

    On the effectiveness of feature set augmentation using clusters of word embeddings

    Georgios Balikas, Ioannis Partalas, Massih-Reza Amini
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    Word clusters have been empirically shown to offer important performance
    improvements on various tasks. Despite their importance, their incorporation in
    the standard pipeline of feature engineering relies more on a trial-and-error
    procedure where one evaluates several hyper-parameters, like the number of
    clusters to be used. In order to better understand the role of such features,
    we systematically evaluate their effect on four tasks: named entity
    segmentation and classification, as well as five-point sentiment
    classification and quantification. Our results strongly suggest that cluster
    membership features improve the performance.

    A Versatile, Sound Tool for Simplifying Definitions

    Alessandro Coglio (Kestrel Institute), Matt Kaufmann (Department of Computer Science, The University of Texas at Austin), Eric W. Smith (Kestrel Institute)
    Comments: In Proceedings ACL2Workshop 2017, arXiv:1705.00766
    Journal-ref: EPTCS 249, 2017, pp. 61-77
    Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI)

    We present a tool, simplify-defun, that transforms the definition of a given
    function into a simplified definition of a new function, providing a proof
    checked by ACL2 that the old and new functions are equivalent. When appropriate
    it also generates termination and guard proofs for the new function. We explain
    how the tool is engineered so that these proofs will succeed. Examples
    illustrate its utility, in particular for program transformation in synthesis
    and verification.

    Lifelong Metric Learning

    Gan Sun, Yang Cong, Ji Liu, Xiaowei Xu
    Comments: 7 pages, 3 figures
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    State-of-the-art online metric learning approaches are only capable of learning
    the metric for predefined tasks. In this paper, we consider a lifelong learning
    problem that mimics “human learning”, i.e., endowing the learned metric with a
    new capability for a new task from new online samples while incorporating
    previous experience and knowledge. Therefore, we propose a new framework: lifelong
    metric learning (LML), which only utilizes the data of the new task to train
    the metric model while preserving the original capabilities. More specifically,
    the proposed LML maintains a common subspace for all learned metrics, named
    lifelong dictionary, transfers knowledge from the common subspace to each new
    metric task with task-specific idiosyncrasy, and redefines the common subspace
    over time to maximize performance across all metric tasks. We apply online
    Passive Aggressive optimization to solve the proposed LML framework. Finally,
    we evaluate our approach by analyzing several multi-task metric learning
    datasets. Extensive experimental results demonstrate the effectiveness and
    efficiency of the proposed framework.

    Analyzing Knowledge Transfer in Deep Q-Networks for Autonomously Handling Multiple Intersections

    David Isele, Akansel Cosgun, Kikuo Fujimura
    Comments: Submitted to IEEE International Conference on Intelligent Transportation Systems (ITSC 2017)
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    We analyze how the knowledge to autonomously handle one type of intersection,
    represented as a Deep Q-Network, translates to other types of intersections
    (tasks). We view intersection handling as a deep reinforcement learning
    problem, which approximates the state action Q function as a deep neural
    network. Using a traffic simulator, we show that directly copying a network
    trained for one type of intersection to another type of intersection decreases
    the success rate. We also show that when a network pre-trained on Task A is
    then fine-tuned on Task B, the resulting network not only performs better on
    Task B than a network exclusively trained on Task A, but also retains knowledge
    of Task A. Finally, we examine a lifelong learning setting, where we train a
    single network on five different types of intersections sequentially, and show
    that the resulting network exhibits catastrophic forgetting of knowledge on
    previous tasks. This result suggests a
    need for a long-term memory component to preserve knowledge.

    Towards Full Automated Drive in Urban Environments: A Demonstration in GoMentum Station, California

    Akansel Cosgun, Lichao Ma, Jimmy Chiu, Jiawei Huang, Mahmut Demir, Alexandre Miranda Anon, Thang Lian, Hasan Tafish, Samir Al-Stouhi
    Comments: Accepted to Intelligent Vehicles Conference (IV 2017)
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

    Each year, millions of motor vehicle traffic accidents all over the world
    cause a large number of fatalities, injuries and significant material loss.
    Automated Driving (AD) has potential to drastically reduce such accidents. In
    this work, we focus on the technical challenges that arise from AD in urban
    environments. We present the overall architecture of an AD system and describe
    in detail the perception and planning modules. The AD system, built on a
    modified Acura RLX, was demonstrated in a course in GoMentum Station in
    California. We demonstrated autonomous handling of 4 scenarios: traffic lights,
    cross-traffic at intersections, construction zones and pedestrians. The AD
    vehicle displayed safe behavior and performed consistently in repeated
    demonstrations with slight variations in conditions. Overall, we completed 44
    runs, encompassing 110 km of automated driving, with only 3 cases where the
    driver intervened in the control of the vehicle, mostly due to errors in GPS
    positioning. Our demonstration showed that robust and consistent behavior in
    urban scenarios is possible, yet more investigation is necessary before a
    full-scale roll-out on public roads.


    Information Retrieval

    Neural Models for Information Retrieval

    Bhaskar Mitra, Nick Craswell
    Subjects: Information Retrieval (cs.IR)

    Neural ranking models for information retrieval (IR) use shallow or deep
    neural networks to rank search results in response to a query. Traditional
    learning to rank models employ machine learning techniques over hand-crafted IR
    features. By contrast, neural models learn representations of language from raw
    text that can bridge the gap between query and document vocabulary. Unlike
    classical IR models, these new machine learning based approaches are
    data-hungry, requiring large scale training data before they can be deployed.
    This tutorial introduces basic concepts and intuitions behind neural IR models,
    and places them in the context of traditional retrieval models. We begin by
    introducing fundamental concepts of IR and different neural and non-neural
    approaches to learning vector representations of text. We then review shallow
    neural IR methods that employ pre-trained neural term embeddings without
    learning the IR task end-to-end. We introduce deep neural networks next,
    discussing popular deep architectures. Finally, we review the current DNN
    models for information retrieval. We conclude with a discussion on potential
    future directions for neural IR.

    Social Network Analysis of yahoo web-search engine query logs

    Mohamed Aboeleinen, A H M Forhadul Islam
    Comments: 14 pages, 5 figures, project work
    Subjects: Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

    Web is now the undisputed warehouse for information. It can now provide most
    of the answers for modern problems. Search engines do a great job by combining
    and ranking the best results when the users try to search for any particular
    information. However, as we know ‘with great power comes great responsibility’,
    it is not an easy task for data analysts to find the most relevant information
    for the queries. One major challenge is that web search engines face
    difficulties in recognizing a user’s specific search interests given their
    initial query. In this project, we have tried to build query networks from web
    search engine query logs, with the nodes representing queries and the edges
    exhibiting the semantic relatedness between queries.


    Computation and Language

    Chunk-Based Bi-Scale Decoder for Neural Machine Translation

    Hao Zhou, Zhaopeng Tu, Shujian Huang, Xiaohua Liu, Hang Li, Jiajun Chen
    Comments: Accepted as a short paper by ACL 2017
    Subjects: Computation and Language (cs.CL)

    In typical neural machine translation (NMT), the decoder generates a sentence
    word by word, packing all linguistic granularities into the same RNN
    time-scale. In this paper, we propose a new type of decoder for NMT, which splits the
    decode state into two parts and updates them in two different time-scales.
    Specifically, we first predict a chunk time-scale state for phrasal modeling,
    on top of which multiple word time-scale states are generated. In this way, the
    target sentence is translated hierarchically from chunks to words, with
    information in different granularities being leveraged. Experiments show that
    our proposed model significantly improves the translation performance over the
    state-of-the-art NMT model.
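
    A toy NumPy sketch of the two-time-scale state update described above: a
    chunk-level state is updated once per chunk, and several word-level states are
    generated on top of it. Attention, embeddings and output layers are omitted,
    and the cell definitions are illustrative, not the paper's architecture.

        import numpy as np

        def rnn_step(x, h, W, U):
            """Generic tanh RNN cell used for both time-scales (a stub)."""
            return np.tanh(x @ W + h @ U)

        def bi_scale_decode(chunk_inputs, words_per_chunk, dim=8, seed=0):
            """One chunk-state update per chunk; several word-state updates inside
            it, each conditioned on the current chunk state."""
            rng = np.random.default_rng(seed)
            Wc, Uc = rng.normal(0, 0.1, (dim, dim)), rng.normal(0, 0.1, (dim, dim))
            Ww, Uw = rng.normal(0, 0.1, (dim, dim)), rng.normal(0, 0.1, (dim, dim))
            h_chunk, outputs = np.zeros(dim), []
            for c in chunk_inputs:                       # slow, chunk time-scale
                h_chunk = rnn_step(c, h_chunk, Wc, Uc)
                h_word = h_chunk.copy()                  # word state seeded by chunk
                for _ in range(words_per_chunk):         # fast, word time-scale
                    h_word = rnn_step(h_chunk, h_word, Ww, Uw)
                    outputs.append(h_word)
            return np.stack(outputs)

        chunks = [np.random.randn(8) for _ in range(3)]
        print(bi_scale_decode(chunks, words_per_chunk=2).shape)   # (6, 8)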

    Going Wider: Recurrent Neural Network With Parallel Cells

    Danhao Zhu, Si Shen, Xin-Yu Dai, Jiajun Chen
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Recurrent Neural Networks (RNNs) have been widely applied to sequence
    modeling. In an RNN, the hidden states at the current step are fully connected
    to those at the previous step, so the influence of less related features from
    the previous step may reduce the model’s learning ability. We propose a simple
    technique called parallel cells (PCs) to enhance the learning ability of RNNs.
    In each layer, we run multiple small RNN cells rather than one single large
    cell. In this paper, we evaluate PCs on two tasks. On the language modeling
    task on PTB (Penn Tree Bank), our model outperforms state-of-the-art models,
    decreasing perplexity from 78.6 to 75.3. On a Chinese-English translation
    task, our model improves the BLEU score by 0.39 points over the baseline
    model.

    Amobee at SemEval-2017 Task 4: Deep Learning System for Sentiment Detection on Twitter

    Alon Rozental, Daniel Fleischer
    Comments: 6 pages, accepted to the 11th International Workshop on Semantic Evaluation (SemEval-2017)
    Subjects: Computation and Language (cs.CL); Machine Learning (stat.ML)

    This paper describes the Amobee sentiment analysis system, adapted to compete
    in SemEval 2017 task 4. The system consists of two parts: a supervised training
    of RNN models based on a Twitter sentiment treebank, and the use of feedforward
    NN, Naive Bayes and logistic regression classifiers to produce predictions for
    the different sub-tasks. The algorithm reached 3rd place on the 5-label
    classification task (sub-task C).

    On the effectiveness of feature set augmentation using clusters of word embeddings

    Georgios Balikas, Ioannis Partalas, Massih-Reza Amini
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    Word clusters have been empirically shown to offer important performance
    improvements on various tasks. Despite their importance, their incorporation in
    the standard pipeline of feature engineering relies more on a trial-and-error
    procedure where one evaluates several hyper-parameters, like the number of
    clusters to be used. In order to better understand the role of such features,
    we systematically evaluate their effect on four tasks: named entity
    segmentation and classification, as well as five-point sentiment
    classification and quantification. Our results strongly suggest that cluster
    membership features improve the performance.

    A Hybrid Architecture for Multi-Party Conversational Systems

    Maira Gatti de Bayser, Paulo Cavalin, Renan Souza, Alan Braz, Heloisa Candello, Claudio Pinhanez, Jean-Pierre Briot
    Subjects: Computation and Language (cs.CL)

    Multi-party Conversational Systems are systems with natural language
    interaction between one or more people or systems. From the moment that an
    utterance is sent to a group to the moment that it is replied to by a member
    of the group, several activities must be done by the system: utterance
    understanding, information search, reasoning, among others. In this paper we
    present the challenges of designing and building multi-party conversational
    systems, the state of the art, our proposed hybrid architecture using both
    rules and machine learning and some insights after implementing and evaluating
    one on the finance domain.

    FOIL it! Find One mismatch between Image and Language caption

    Ravi Shekhar, Sandro Pezzelle, Yauhen Klimovich, Aurelie Herbelot, Moin Nabi, Enver Sangineto, Raffaella Bernardi
    Comments: To appear at ACL 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)

    In this paper, we aim to understand whether current language and vision
    (LaVi) models truly grasp the interaction between the two modalities. To this
    end, we propose an extension of the MSCOCO dataset, FOIL-COCO, which associates
    images with both correct and “foil” captions, that is, descriptions of the
    image that are highly similar to the original ones, but contain one single
    mistake (“foil word”). We show that current LaVi models fall into the traps of
    this data and perform badly on three tasks: a) caption classification (correct
    vs. foil); b) foil word detection; c) foil word correction. Humans, in
    contrast, have near-perfect performance on those tasks. We demonstrate that
    merely utilising language cues is not enough to model FOIL-COCO and that it
    challenges the state-of-the-art by requiring a fine-grained understanding of
    the relation between text and image.

    The Forgettable-Watcher Model for Video Question Answering

    Hongyang Xue, Zhou Zhao, Deng Cai
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

    A number of visual question answering approaches have been proposed recently,
    aiming at understanding the visual scenes by answering the natural language
    questions. While the image question answering has drawn significant attention,
    video question answering is largely unexplored.

    Video-QA is different from Image-QA since the information and the events are
    scattered among multiple frames. In order to better utilize the temporal
    structure of the videos and the phrasal structures of the answers, we propose
    two mechanisms: the re-watching and the re-reading mechanisms and combine them
    into the forgettable-watcher model. Then we propose a TGIF-QA dataset for video
    question answering with the help of automatic question generation. Finally, we
    evaluate the models on our dataset. The experimental results show the
    effectiveness of our proposed models.


    Distributed, Parallel, and Cluster Computing

    Deterministic Distributed Construction of (T)-Dominating Sets in Time (T)

    Avery Miller, Andrzej Pelc
    Comments: 13 pages
    Journal-ref: Discrete Applied Mathematics 222: 172-178 (2017)
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    A (k)-dominating set is a set (D) of nodes of a graph such that, for each
    node (v), there exists a node (w in D) at distance at most (k) from (v). Our
    aim is the deterministic distributed construction of small (T)-dominating sets
    in time (T) in networks modeled as undirected (n)-node graphs and under the
    (\mathcal{LOCAL}) communication model.

    For any positive integer (T), if (b) is the size of a pairwise disjoint
    collection of balls of radii at least (T) in a graph, then (b) is an obvious
    lower bound on the size of a (T)-dominating set. Our first result shows that,
    even on rings, it is impossible to construct a (T)-dominating set of size (s)
    asymptotically equal to (b) (i.e., such that (s/b \rightarrow 1)) in time (T).

    In the range of time (T in Theta (log^* n)), the size of a (T)-dominating
    set turns out to be very sensitive to multiplicative constants in running time.
    Indeed, it follows from \cite{KP} that for time (T=gamma log^* n) with large
    constant (gamma), it is possible to construct a (T)-dominating set whose size
    is a small fraction of (n). By contrast, we show that, for time (T=alpha
    log^* n ) for small constant (alpha), the size of a (T)-dominating set must
    be a large fraction of (n).

    Finally, when (T in o (log^* n)), the above lower bound implies that, for
    any constant (x<1), it is impossible to construct a (T)-dominating set of size
    smaller than (xn), even on rings. On the positive side, we provide an algorithm
    that constructs a (T)-dominating set of size (n- Theta(T)) on all graphs.
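
    To make the object concrete, here is a short checker for the definition above:
    every node must lie within distance (T) of some node of the set (D). This
    verifies a candidate solution with a multi-source BFS; it does not implement
    the distributed construction studied in the paper.

        from collections import deque

        def is_t_dominating(adj, dom, t):
            """Check that every node is within distance t of the set dom.

            adj: dict mapping node -> iterable of neighbours (undirected graph).
            """
            dist = {v: 0 for v in dom}                 # multi-source BFS from dom
            queue = deque(dom)
            while queue:
                u = queue.popleft()
                if dist[u] == t:
                    continue
                for w in adj[u]:
                    if w not in dist:
                        dist[w] = dist[u] + 1
                        queue.append(w)
            return all(v in dist for v in adj)

        ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}   # 8-node ring
        print(is_t_dominating(ring, {0, 4}, t=2))                  # True
        print(is_t_dominating(ring, {0}, t=2))                     # False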

    How does Docker affect energy consumption? Evaluating workloads in and out of Docker containers

    Eddie Antonio Santos, Carson McLean, Christopher Solinas, Abram Hindle
    Comments: 12 pages (minus references), 10 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)

    Context: Virtual machines provide isolation of services at the cost of
    hypervisors and more resource usage. This spurred the growth of systems like
    Docker that enable single hosts to isolate several applications, similar to
    VMs, within a low-overhead abstraction called containers.

    Motivation: Although containers tout low overhead performance, do they still
    have low energy consumption?

    Methodology: This work statistically compares ((t)-test, Wilcoxon) the energy
    consumption of three application workloads in Docker and on bare-metal Linux.

    Results: In all cases, there was a statistically significant ((t)-test and
    Wilcoxon (p < 0.05)) increase in energy consumption when running tests in
    Docker, mostly due to the performance of I/O system calls.
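
    A minimal sketch of this kind of comparison on hypothetical per-run energy
    totals, using Welch's t-test and the Mann-Whitney U test (the unpaired
    rank-based counterpart of the Wilcoxon test); the numbers below are made up,
    not the paper's data.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)
        # Hypothetical per-run energy totals (joules) for the same workload.
        bare_metal = rng.normal(loc=1000.0, scale=30.0, size=40)
        docker = rng.normal(loc=1060.0, scale=30.0, size=40)

        t_stat, t_p = stats.ttest_ind(docker, bare_metal, equal_var=False)
        u_stat, u_p = stats.mannwhitneyu(docker, bare_metal, alternative="two-sided")

        print(f"Welch t-test:   p = {t_p:.3g}")
        print(f"Mann-Whitney U: p = {u_p:.3g}")
        print("significant at 0.05:", t_p < 0.05 and u_p < 0.05)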

    Population protocols for leader election and exact majority with O(log^2 n) states and O(log^2 n) convergence time

    Andreas Bilke, Colin Cooper, Robert Elsaesser, Tomasz Radzik
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    We consider the model of population protocols, which can be viewed as a
    sequence of random pairwise interactions of (n) agents (nodes). We show
    population protocols for two problems: the leader election and the exact
    majority voting. The leader election starts with all agents in the same initial
    state and the goal is to converge to the (global) state when exactly one agent
    is in a distinct state (L). The exact majority voting starts with each agent in
    one of the two distinct states (A) or (B) and the goal is to make all nodes
    know which of these two states was the initial majority state, even if that
    majority was just by a single vote.

    Alistarh and Gelashvili [ICALP 2015] showed a leader-election protocol which
    converges in (O(log^3 n)) time w.h.p. and in expectation and needs
    (Theta(log^3 n)) states per agent. We present a protocol which elects the
    leader in (O(log^2 n)) time w.h.p. and in expectation and uses (Theta(log^2
    n)) states per agent. For the exact majority voting, we show a population
    protocol with the same asymptotic performance: (O(log^2 n)) time and
    (Theta(log^2 n)) states per agent. The exact-majority protocol proposed by
    Alistarh et al. [PODC 2015] achieves expected (O(log^2 n)) time, but requires
    a relatively high initial imbalance between (A)’s and (B)’s or a large number
    of states per agent. More recently, Alistarh et al. [SODA 2017] showed
    (O(log^2 n))-state protocols for both problems, with the exact majority
    protocol converging in time (O(log^3 n)), and the leader election protocol
    converging in time (O(log^{6.3} n)) w.h.p. and (O(log^{5.3} n)) in
    expectation.

    Our leader election and exact majority protocols are based on the idea of
    agents counting their local interactions and rely on the probabilistic fact
    that the uniform random selection would limit the divergence of the individual
    counts.
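
    For illustration, a toy simulation of the random pairwise-interaction model
    with the textbook pairwise-elimination leader election rule (when two leaders
    meet, one survives). This is not the O(log^2 n)-state protocol of the paper,
    only the model it runs in.

        import random

        def simulate_leader_election(n, seed=0):
            """Random pairwise interactions; when two leaders meet, one becomes a
            follower. Returns the number of interactions until one leader remains.
            (Textbook baseline protocol, expected Theta(n^2) interactions.)"""
            random.seed(seed)
            leader = [True] * n              # every agent starts as a leader
            interactions = 0
            while sum(leader) > 1:
                i, j = random.sample(range(n), 2)   # a uniformly random pair
                interactions += 1
                if leader[i] and leader[j]:
                    leader[j] = False        # responder drops out
            return interactions

        print(simulate_leader_election(100))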

    A Fast Causal Profiler for Task Parallel Programs

    Adarsh Yoga, Santosh Nagarakatte
    Comments: 14 pages
    Subjects: Programming Languages (cs.PL); Distributed, Parallel, and Cluster Computing (cs.DC)

    This paper proposes TaskProf, a profiler that identifies parallelism
    bottlenecks in task parallel programs that manifest when the program is
    executed on a large number of processors. TaskProf computes this profile by
    fine-grained attribution of work to parts of the program and by leveraging the
    structure of a task parallel execution. TaskProf’s profile execution runs in
    parallel using multi-cores. TaskProf’s use of hardware performance counters to
    perform fine-grained measurements minimizes perturbation. TaskProf’s causal
    profile enables users to estimate improvements in parallelism by optimizing a
    region of the program even when concrete optimizations are not known. We have
    used TaskProf to isolate parallelism bottlenecks in twenty-three applications
    that use the Intel Threading Building Blocks library. We have designed
    parallelization techniques in five applications to increase parallelism by an
    order of magnitude using TaskProf. Our user study indicates that developers are
    able to isolate performance bottlenecks with ease using TaskProf.


    Learning

    XES Tensorflow – Process Prediction using the Tensorflow Deep-Learning Framework

    Joerg Evermann, Jana-Rebecca Rehse, Peter Fettke
    Subjects: Learning (cs.LG)

    Predicting the next activity of a running process is an important aspect of
    process management. Recently, artificial neural networks, so called
    deep-learning approaches, have been proposed to address this challenge. This
    demo paper describes a software application that applies the Tensorflow
    deep-learning framework to process prediction. The software application reads
    industry-standard XES files for training and presents the user with an
    easy-to-use graphical user interface for both training and prediction. The
    system provides several improvements over earlier work. This demo paper focuses
    on the software implementation and describes the architecture and user
    interface.

    Efficient Spatio-Temporal Gaussian Regression via Kalman Filtering

    Marco Todescato, Andrea Carron, Ruggero Carli, Gianluigi Pillonetto, Luca Schenato
    Comments: 26 pages, 12 figures. Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    In this work we study the non-parametric reconstruction of spatio-temporal
    dynamical Gaussian processes (GPs) via GP regression from sparse and noisy
    data. GPs have mainly been applied to spatial regression, where they represent
    one of the most powerful estimation approaches, thanks in part to their
    universal representation properties. Their extension to dynamical processes has
    so far remained elusive, since classical implementations lead to unscalable
    algorithms. We propose a novel procedure to address this problem by coupling GP
    regression and Kalman filtering. In particular, assuming space/time
    separability of the covariance (kernel) of the process and a rational time
    spectrum, we build a finite-dimensional discrete-time state-space
    representation of the process that is amenable to Kalman filtering. When
    sampling over a finite set of fixed spatial locations, our major finding is
    that the Kalman filter state at instant (t_k) is a sufficient statistic to
    compute the minimum variance estimate of the process at any (t \geq t_k) over
    the entire spatial domain.
    This result can be interpreted as a novel Kalman representer theorem for
    dynamical GPs. We then extend the study to situations where the set of spatial
    input locations can vary over time. The proposed algorithms are finally tested
    on both synthetic and real field data, also providing comparisons with standard
    GP and truncated GP regression techniques.
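
    A minimal sketch of the construction, under the separability assumption and
    with generic notation (not necessarily the paper's): write the kernel as

        K((s,t),(s',t')) = K_s(s,s') \, k_t(t,t'),

    and note that a temporal kernel (k_t) with rational spectrum (e.g. Matérn)
    admits a linear state-space realization

        \dot{z}(t) = F z(t) + L w(t), \qquad f_M(t) = H z(t),

    where (f_M(t)) collects the process values at the M sampled locations. Once
    the process is sampled at fixed spatial locations, a standard Kalman
    filter/smoother propagates this finite-dimensional state forward in time
    instead of inverting a kernel matrix that grows with the number of time
    samples.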

    Ternary Neural Networks with Fine-Grained Quantization

    Naveen Mellempudi, Abhisek Kundu, Dheevatsa Mudigere, Dipankar Das, Bharat Kaul, Pradeep Dubey
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We propose a novel fine-grained quantization method for ternarizing
    pre-trained full precision models, while also constraining activations to
    8-bits. Using this method, we demonstrate minimal loss in classification
    accuracy on state-of-the-art topologies without additional training. This
    enables a full 8-bit inference pipeline, with best reported accuracy using
    ternary weights on ImageNet dataset. Further, we also provide an improved
    theoretical formulation that forms the basis for a higher quality solution with
    this approach. Our method involves ternarizing the original weight tensor in
    groups of (N) weights. Using (N=4), we achieve Top-1 accuracy within (3.7\%)
    and (5.8\%) of the baseline full precision result for Resnet-101 and Resnet-50
    respectively, while eliminating (75\%) of all multiplications. We also study
    the impact of group size on both performance and accuracy. With a group size of
    (N=64), we eliminate (\approx 99\%) of the multiplications; however, this
    introduces a significant drop in accuracy, which necessitates fine tuning the
    parameters (re-training) at lower precision. To address this, we re-train
    Resnet-50 with 8-bit activations and ternary weights, improving the Top-1
    accuracy to within (4\%) of the full precision result with (<30\%) additional
    overhead. Our final quantized model can run on a full 8-bit compute pipeline
    using 2-bit weights and has the potential of up to (16\times) improvement in
    performance compared to baseline full-precision models.
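
    As a rough illustration of group-wise ternarization, here is a minimal NumPy
    sketch; the per-group threshold and scaling heuristics follow the common
    ternary-weight-network recipe and are not necessarily the exact formulation
    used in this paper.

        import numpy as np

        def ternarize_groups(weights, group_size=4, thresh_factor=0.7):
            """Ternarize a weight tensor in groups of `group_size` values.

            Each group gets its own threshold and scaling factor, so the
            quantization grid adapts to local weight statistics."""
            assert weights.size % group_size == 0
            w = weights.reshape(-1, group_size)
            # Per-group threshold: a fraction of the mean absolute weight.
            t = thresh_factor * np.mean(np.abs(w), axis=1, keepdims=True)
            mask = np.abs(w) > t                      # weights kept non-zero
            # Per-group scale: mean magnitude of the surviving weights.
            denom = np.maximum(mask.sum(axis=1, keepdims=True), 1)
            alpha = (np.abs(w) * mask).sum(axis=1, keepdims=True) / denom
            w_t = alpha * np.sign(w) * mask           # values in {-alpha, 0, +alpha}
            return w_t.reshape(weights.shape)

        w = np.random.randn(8, 16).astype(np.float32)
        print(np.unique(np.sign(ternarize_groups(w))))   # expect [-1.  0.  1.]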

    Lifelong Metric Learning

    Gan Sun, Yang Cong, Ji Liu, Xiaowei Xu
    Comments: 7 pages, 3 figures
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    State-of-the-art online learning approaches are only capable of learning the
    metric for predefined tasks. In this paper, we consider the lifelong learning
    problem to mimic “human learning”, i.e., endowing the learned metric with a new
    capability for a new task from new online samples while incorporating previous
    experience and knowledge. To this end, we propose a new framework: lifelong
    metric learning (LML), which only utilizes the data of the new task to train
    the metric model while preserving the original capabilities. More specifically,
    the proposed LML maintains a common subspace for all learned metrics, named
    lifelong dictionary, transfers knowledge from the common subspace to each new
    metric task with task-specific idiosyncrasy, and redefines the common subspace
    over time to maximize performance across all metric tasks. We apply online
    Passive Aggressive optimization to solve the proposed LML framework. Finally,
    we evaluate our approach by analyzing several multi-task metric learning
    datasets. Extensive experimental results demonstrate effectiveness and
    efficiency of the proposed framework.

    Local Shrunk Discriminant Analysis (LSDA)

    Zan Gao, Guotai Zhang, Feiping Nie, Hua Zhang
    Subjects: Learning (cs.LG)

    Dimensionality reduction is a crucial step for pattern recognition and data
    mining tasks to overcome the curse of dimensionality. Principal component
    analysis (PCA) is a traditional technique for unsupervised dimensionality
    reduction, which is often employed to seek a projection that best represents
    the data in a least-squares sense, but if the original data has a nonlinear
    structure, the performance of PCA drops quickly. A supervised dimensionality
    reduction algorithm called linear discriminant analysis (LDA) seeks an
    embedding transformation, which can work well with Gaussian distribution data
    or single-modal data, but for non-Gaussian distribution data or multimodal
    data, it gives undesired results. What is worse, the dimension of the LDA
    embedding cannot exceed the number of classes. To solve these issues, local
    shrunk discriminant analysis (LSDA) is proposed in this work to handle
    non-Gaussian or multimodal data; it not only incorporates both the linear and
    nonlinear structures of the original data, but also learns a pattern shrinking
    that lets the data fit the manifold structure more flexibly. Further, LSDA has
    stronger generalization performance: its objective function reduces to local
    LDA and to traditional LDA when different extreme parameter settings are used.
    Moreover, a new efficient
    optimization algorithm is introduced to solve the non-convex objective function
    with low computational cost. Compared with other related approaches, such as
    PCA, LDA and local LDA, the proposed method can derive a subspace which is more
    suitable for non-Gaussian distribution and real data. Promising experimental
    results on different kinds of data sets demonstrate the effectiveness of the
    proposed approach.

    Analyzing Knowledge Transfer in Deep Q-Networks for Autonomously Handling Multiple Intersections

    David Isele, Akansel Cosgun, Kikuo Fujimura
    Comments: Submitted to IEEE International Conference on Intelligent Transportation Systems (ITSC 2017)
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    We analyze how the knowledge to autonomously handle one type of intersection,
    represented as a Deep Q-Network, translates to other types of intersections
    (tasks). We view intersection handling as a deep reinforcement learning
    problem, which approximates the state action Q function as a deep neural
    network. Using a traffic simulator, we show that directly copying a network
    trained for one type of intersection to another type of intersection decreases
    the success rate. We also show that when a network pre-trained on Task A is
    then fine-tuned on Task B, the resulting network not only performs better on
    Task B than a network exclusively trained on Task A, but also retains knowledge
    of Task A. Finally, we examine a lifelong learning setting, where we train a
    single network on five different types of intersections sequentially and show
    that the resulting network exhibits catastrophic forgetting of knowledge on
    previous tasks. This result suggests a
    need for a long-term memory component to preserve knowledge.
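
    A minimal PyTorch sketch of the transfer setting described above; the network
    shape, state/action sizes and the synthetic Task-B batch are illustrative
    only, since the paper trains full DQNs in a traffic simulator.

        import copy
        import torch
        import torch.nn as nn

        def make_q_net(n_state=8, n_action=3):
            return nn.Sequential(nn.Linear(n_state, 64), nn.ReLU(),
                                 nn.Linear(64, n_action))

        q_task_a = make_q_net()            # assume this was trained on Task A

        # "Direct copy": evaluate the Task-A network on Task B with no adaptation.
        q_copied = copy.deepcopy(q_task_a)

        # "Fine-tune": start from the Task-A weights, continue TD updates on Task B.
        q_finetuned = copy.deepcopy(q_task_a)
        target_net = copy.deepcopy(q_finetuned)
        opt = torch.optim.Adam(q_finetuned.parameters(), lr=1e-4)
        gamma = 0.99

        # One TD(0) update on a synthetic Task-B batch of transitions (s, a, r, s').
        s, s2 = torch.randn(32, 8), torch.randn(32, 8)
        a = torch.randint(0, 3, (32,))
        r = torch.randn(32)
        with torch.no_grad():
            target = r + gamma * target_net(s2).max(dim=1).values
        pred = q_finetuned(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = nn.functional.mse_loss(pred, target)
        opt.zero_grad()
        loss.backward()
        opt.step()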

    Summarized Network Behavior Prediction

    Shih-Chieh Su
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    This work studies the entity-wise topical behavior from massive network logs.
    Both the temporal and the spatial relationships of the behavior are explored
    with learning architectures combining the recurrent neural network (RNN) and
    the convolutional neural network (CNN). To make the behavioral data appropriate
    for spatial learning in the CNN, several reduction steps are taken to form the
    topical metrics and place them homogeneously, like pixels in an image. The
    experimental results show both temporal and spatial gains when
    compared to a multilayer perceptron (MLP) network. A new learning framework
    called spatially connected convolutional networks (SCCN) is introduced to more
    efficiently predict the behavior.

    Balanced Excitation and Inhibition are Required for High-Capacity, Noise-Robust Neuronal Selectivity

    Ran Rubin, L.F. Abbott, Haim Sompolinsky
    Comments: Article and supplementary information
    Subjects: Neurons and Cognition (q-bio.NC); Disordered Systems and Neural Networks (cond-mat.dis-nn); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Neurons and networks in the cerebral cortex must operate reliably despite
    multiple sources of noise. To evaluate the impact of both input and output
    noise, we determine the robustness of single-neuron stimulus selective
    responses, as well as the robustness of attractor states of networks of neurons
    performing memory tasks. We find that robustness to output noise requires
    synaptic connections to be in a balanced regime in which excitation and
    inhibition are strong and largely cancel each other. We evaluate the conditions
    required for this regime to exist and determine the properties of networks
    operating within it. A plausible synaptic plasticity rule for learning that
    balances weight configurations is presented. Our theory predicts an optimal
    ratio of the number of excitatory and inhibitory synapses for maximizing the
    encoding capacity of balanced networks for a given statistics of afferent
    activations. Previous work has shown that balanced networks amplify
    spatio-temporal variability and account for observed asynchronous irregular
    states. Here we present a novel type of balanced network that amplifies small
    changes in the impinging signals, and emerges automatically from learning to
    perform neuronal and network functions robustly.

    Data-Driven Synthesis of Smoke Flows with CNN-based Feature Descriptors

    Mengyu Chu, Nils Thuerey
    Comments: 14 pages, 17 figures, to appear at SIGGRAPH 2017
    Subjects: Graphics (cs.GR); Learning (cs.LG)

    We present a novel data-driven algorithm to synthesize high resolution flow
    simulations with reusable repositories of space-time flow data. In our work, we
    employ a descriptor learning approach to encode the similarity between fluid
    regions with differences in resolution and numerical viscosity. We use
    convolutional neural networks to generate the descriptors from fluid data such
    as smoke density and flow velocity. At the same time, we present a deformation
    limiting patch advection method which allows us to robustly track deformable
    fluid regions. With the help of this patch advection, we generate stable
    space-time data sets from detailed fluids for our repositories. We can then use
    our learned descriptors to quickly localize a suitable data set when running a
    new simulation. This makes our approach very efficient, and resolution
    independent. We will demonstrate with several examples that our method yields
    volumes with very high effective resolutions, and non-dissipative small scale
    details that naturally integrate into the motions of the underlying flow.

    Going Wider: Recurrent Neural Network With Parallel Cells

    Danhao Zhu, Si Shen, Xin-Yu Dai, Jiajun Chen
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Recurrent Neural Networks (RNNs) have been widely applied for sequence
    modeling. In an RNN, the hidden states at the current step are fully connected
    to those at the previous step, so the influence of less related features from
    the previous step may decrease the model’s learning ability. We propose a
    simple technique called parallel cells (PCs) to enhance the learning ability of
    an RNN. In each layer, we run multiple small RNN cells rather than one single
    large cell. We evaluate PCs on two tasks. On the PTB (Penn Tree Bank) language
    modeling task, our model outperforms state-of-the-art models by decreasing
    perplexity from 78.6 to 75.3. On a Chinese-English translation task, our model
    increases the BLEU score by 0.39 points over the baseline
    model.
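
    A minimal PyTorch sketch of the parallel-cells idea: instead of one cell with
    a large hidden state, each layer runs several small LSTM cells whose hidden
    slices are kept separate and only concatenated at the output. The cell sizes
    and wiring here are illustrative, not the paper's exact setup.

        import torch
        import torch.nn as nn

        class ParallelCellsLayer(nn.Module):
            def __init__(self, input_size, hidden_size, num_cells=4):
                super().__init__()
                assert hidden_size % num_cells == 0
                self.slice = hidden_size // num_cells
                self.cells = nn.ModuleList(
                    [nn.LSTMCell(input_size, self.slice) for _ in range(num_cells)])

            def forward(self, x_seq):
                # x_seq: (seq_len, batch, input_size)
                batch = x_seq.size(1)
                states = [(x_seq.new_zeros(batch, self.slice),
                           x_seq.new_zeros(batch, self.slice)) for _ in self.cells]
                outputs = []
                for x_t in x_seq:
                    hs = []
                    for i, cell in enumerate(self.cells):
                        states[i] = cell(x_t, states[i])  # each cell updates its own slice
                        hs.append(states[i][0])
                    outputs.append(torch.cat(hs, dim=-1))  # concatenated hidden state
                return torch.stack(outputs)                # (seq_len, batch, hidden_size)

        layer = ParallelCellsLayer(input_size=32, hidden_size=64, num_cells=4)
        y = layer(torch.randn(10, 8, 32))
        print(y.shape)   # torch.Size([10, 8, 64])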

    Formal Verification of Piece-Wise Linear Feed-Forward Neural Networks

    Ruediger Ehlers
    Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We present an approach for the verification of feed-forward neural networks
    in which all nodes have a piece-wise linear activation function. Such networks
    are often used in deep learning and have been shown to be hard to verify for
    modern satisfiability modulo theory (SMT) and integer linear programming (ILP)
    solvers.

    The starting point of our approach is the addition of a global linear
    approximation of the overall network behavior to the verification problem that
    helps with SMT-like reasoning over the network behavior. We present a
    specialized verification algorithm that employs this approximation in a search
    process in which it infers additional node phases for the non-linear nodes in
    the network from partial node phase assignments, similar to unit propagation in
    classical SAT solving. We also show how to infer additional conflict clauses
    and safe node fixtures from the results of the analysis steps performed during
    the search. The resulting approach is evaluated on collision avoidance and
    handwritten digit recognition case studies.
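
    As a concrete example of what such a global linear approximation can look
    like (the standard "triangle" relaxation; whether the paper uses exactly this
    form is an implementation detail), a ReLU node (y = \max(0, x)) whose
    pre-activation is known to satisfy (l \le x \le u) with (l < 0 < u) can be
    over-approximated by three linear constraints:

        y \ge 0, \qquad y \ge x, \qquad y \le \frac{u (x - l)}{u - l},

    which hold for both phases of the node and can therefore be added to the
    SMT/ILP problem before any phase has been fixed.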


    Information Theory

    Some New Permutation Polynomials over Finite Fields

    Nouara Zoubir, Kenza Guenda
    Subjects: Information Theory (cs.IT)

    In this paper, we construct a new class of complete permutation monomials and
    several classes of permutation polynomials. Further, by giving another
    characterization of o-polynomials, we obtain a class of permutation polynomials
    of the form (G(x) + \gamma Tr(H(x))), where (G(x)) is neither a permutation nor
    a linearized polynomial. This is an answer to Open Problem 1 of Charpin and
    Kyureghyan in [P. Charpin and G. Kyureghyan, When does (G(x) + \gamma Tr(H(x)))
    permute (\mathbb{F}_{p^n})?, Finite Fields and Their Applications 15 (2009)
    615–632].

    A Characterization of the Shannon Ordering of Communication Channels

    Rajai Nasser
    Comments: 23 pages, presented in part at ISIT’17. arXiv admin note: text overlap with arXiv:1702.00727
    Subjects: Information Theory (cs.IT)

    The ordering of communication channels was first introduced by Shannon. In
    this paper, we aim to find a characterization of the Shannon ordering. We show
    that (W’) contains (W) if and only if (W) is the skew-composition of (W’) with
    a convex-product channel. This fact is used to derive a characterization of the
    Shannon ordering that is similar to the Blackwell-Sherman-Stein theorem. Two
    channels are said to be Shannon-equivalent if each one is contained in the
    other. We investigate the topologies that can be constructed on the space of
    Shannon-equivalent channels. We introduce the strong topology and the BRM
    metric on this space. Finally, we study the continuity of a few channel
    parameters and operations under the strong topology.

    Fast Real-Time DC State Estimation in Electric Power Systems Using Belief Propagation

    Mirsad Cosovic, Dejan Vukobratovic
    Comments: 6 pages; 7 figures; submitted in the IEEE International Conference on Smart Grid Communications (SmartGridComm 2017)
    Subjects: Information Theory (cs.IT)

    We propose a fast real-time state estimator based on the belief propagation
    algorithm for the power system state estimation. The proposed estimator is easy
    to distribute and parallelize, thus alleviating computational limitations and
    allowing measurements to be processed in real time. In a fully distributed
    implementation, local modules compute state estimates at the bus level (e.g.,
    at generators, loads, substations) and, instead of forwarding raw measurements,
    they forward their estimates to the control center. The presented algorithm may
    run as a continuous process, with each new measurement being seamlessly
    processed by the distributed state estimator. In contrast to the matrix-based
    state estimation methods, the belief propagation approach is robust to
    ill-conditioned scenarios caused by significant differences between measurement
    variances, thus resulting in a solution that eliminates observability analysis.
    Using the DC model, we numerically demonstrate the performance of the state
    estimator in a realistic real-time system model with asynchronous measurements.
    We note that the extension to the

    Experimental Comparison of Probabilistic Shaping Methods for Unrepeated Fiber Transmission

    Julian Renner, Tobias Fehenberger, Metodi P. Yankov, Francesco Da Ros, Søren Forchhammer, Georg Böcherer, Norbert Hanik
    Comments: 8 pages, 6 figures, 7 tables
    Subjects: Information Theory (cs.IT)

    This paper studies the impact of probabilistic shaping on effective
    signal-to-noise ratios (SNRs) and achievable information rates (AIRs) in a
    back-to-back configuration and in unrepeated nonlinear fiber transmissions. For
    back-to-back, various shaped quadrature amplitude modulation (QAM)
    distributions are found to have the same implementation penalty as uniform
    input. By demonstrating in transmission experiments that shaped QAM input leads
    to lower effective SNR than uniform input at a fixed average launch power, we
    experimentally confirm that shaping enhances the fiber nonlinearities. However,
    shaping is ultimately found to increase the AIR, which is the most relevant
    figure of merit as it is directly related to spectral efficiency. In a detailed
    study of these shaping gains for the nonlinear fiber channel, four strategies
    for optimizing QAM input distributions are evaluated and experimentally
    compared in wavelength division multiplexing (WDM) systems. The first shaping
    scheme generates a Maxwell-Boltzmann (MB) distribution based on a linear
    additive white Gaussian noise channel. The second strategy uses the
    Blahut-Arimoto algorithm to optimize an unconstrained QAM distribution for a
    split-step Fourier method based channel model. In the third and fourth
    approach, MB-shaped QAM and unconstrained QAM are optimized via the enhanced
    Gaussian noise model. All considered strategies result in similar AIR gains of
    up to 0.27 bits per 4D symbol, indicating that, also for highly nonlinear fiber
    transmission, the shaping gains are insensitive to the exact shape of the input
    distribution. This behavior is observed in 9-channel and fully-loaded WDM
    experiments.
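
    A small Python sketch of the first strategy, Maxwell-Boltzmann shaping of a
    QAM constellation; the bisection over the shaping parameter and the 16-QAM
    example are illustrative and not the experimental setup of the paper.

        import numpy as np

        def maxwell_boltzmann_pmf(constellation, lam):
            """P(x) proportional to exp(-lam * |x|^2) over the constellation."""
            p = np.exp(-lam * np.abs(constellation) ** 2)
            return p / p.sum()

        def shape_for_entropy(constellation, target_bits, tol=1e-6):
            """Bisect lam so the shaped distribution has the requested entropy
            (entropy decreases monotonically as lam grows)."""
            lo, hi = 0.0, 10.0
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                p = maxwell_boltzmann_pmf(constellation, mid)
                h = -np.sum(p * np.log2(p))
                if h > target_bits:
                    lo = mid      # entropy still too high -> shape more strongly
                else:
                    hi = mid
            return maxwell_boltzmann_pmf(constellation, 0.5 * (lo + hi))

        # 16-QAM constellation (unnormalized), shaped to carry ~3.5 bits/symbol.
        levels = np.array([-3.0, -1.0, 1.0, 3.0])
        qam16 = np.array([a + 1j * b for a in levels for b in levels])
        p = shape_for_entropy(qam16, target_bits=3.5)
        print(p.round(3), -(p * np.log2(p)).sum())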

    On-The-Fly Secure Key Generation with Deterministic Models

    Rick Fritschek, Gerhard Wunder
    Comments: To appear in ICC 2017
    Subjects: Information Theory (cs.IT)

    It is well-known that wireless channel reciprocity together with fading can
    be exploited to generate a common secret key between two legitimate
    communication partners. This can be achieved by exchanging known deterministic
    pilot signals between both partners from which the random fading gains can be
    estimated and processed. However, the entropy and thus quality of the generated
    key depends on the channel coherence time. This can result in poor key
    generation rates in a low mobility environment, where the fading gains are
    nearly constant. Therefore, wide-spread deployment of wireless channel-based
    secret key generation is limited. To overcome these issues, we follow up on a
    recent idea which uses unknown random pilots and enables “on-the-fly” key
    generation. In addition, the scheme is able to incorporate local sources of
    randomness but performance bounds are hard to obtain with standard methods. In
    this paper, we analyse such a scheme analytically and derive achievable key
    rates in the Alice-Bob-Eve setting. For this purpose, we develop a novel
    approximation model which is inspired by the linear deterministic and the lower
    triangular deterministic model. Using this model, we can derive key rates for
    specific scenarios. We claim that our novel approach provides an intuitive and
    clear framework to analyse similar key generation problems.

    Non-Orthogonal Random Access (NORA) for 5G Networks

    Yanan Liang, Xu Li, Jiayi Zhang, Zhiguo Ding
    Comments: 15 pages, 11 figures,accepted by IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    Massive numbers of machine-type user equipments (UEs) will be supported in
    future fifth generation (5G) networks. However, the potentially large
    random access (RA) delay calls for a new RA scheme and for a detailed
    assessment of its performance. Motivated by the key idea of non-orthogonal
    multiple access, the non-orthogonal random access (NORA) scheme based on
    successive interference cancellation (SIC) is proposed in this paper to
    alleviate the access congestion problem. Specifically, NORA utilizes the
    difference of time of arrival to identify multiple UEs with the identical
    preamble, and enables power domain multiplexing of collided UEs in the
    following access process, while the base station performs SIC based on the
    channel conditions obtained through preamble detection. Our analysis shows that
    the performance of NORA is superior to the conventional orthogonal random
    access (ORA) scheme in terms of preamble collision probability, access success
    probability and random access throughput. Simulation results verify our
    analysis and further show that our NORA scheme can increase the number of
    supported UEs by more than 30%. Moreover, the number of preamble
    transmissions and the access delay for successfully accessed UEs are also
    reduced significantly by using the proposed random access scheme.

    The 5G Cellular Backhaul Management Dilemma: To Cache or to Serve

    Kenza Hamidouche, Walid Saad, Mérouane Debbah, Ju Bin Song, Choong Seon Hong
    Comments: Accepted for publication at Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    With the introduction of caching capabilities into small cell networks
    (SCNs), new backhaul management mechanisms need to be developed to prevent the
    predicted files that are downloaded and cached at the small base stations
    (SBSs) from jeopardizing the urgent requests that need to be served via the
    backhaul. Moreover, these mechanisms must account for the heterogeneity of the
    backhaul that will be encompassing both wireless backhaul links at various
    frequency bands and a wired backhaul component. In this paper, the
    heterogeneous backhaul management problem is formulated as a minority game in
    which each SBS has to define the number of predicted files to download, without
    affecting the required transmission rate of the current requests. For the
    formulated game, it is shown that a unique fair proper mixed Nash equilibrium
    (PMNE) exists. A self-organizing reinforcement learning algorithm is proposed and
    proved to converge to a unique Boltzmann-Gibbs equilibrium which approximates
    the desired PMNE. Simulation results show that the performance of the proposed
    approach can be close to that of the ideal optimal algorithm while it
    outperforms a centralized greedy approach in terms of the amount of data that
    is cached without jeopardizing the quality-of-service of current requests.

    Resource Allocation for Elastic Optical Networks using Geometric Optimization

    Mohammad Hadi, Mohammad Reza Pakravan
    Comments: 10 pages, 9 figures, 2 tables
    Subjects: Information Theory (cs.IT)

    Resource allocation with quality of service constraints is one of the most
    challenging problems in elastic optical networks which is normally formulated
    as an MINLP optimization program. In this paper, we focus on novel properties
    of geometric optimization and provide a heuristic approach for resource
    allocation that is much faster than its MINLP counterpart. Our heuristic
    consists of two main parts for routing/traffic ordering and power/spectrum
    assignment. It aims at minimization of transmitted optical power and spectrum
    usage constrained to quality of service and physical requirements. We consider
    three routing/traffic ordering procedures and compare them in terms of total
    transmitted optical power, total received noise power and total nonlinear
    interference including self- and cross-channel interferences. We propose a
    posynomial expression for optical signal to noise ratio in which fiber
    nonlinearities and spontaneous emission noise have been addressed. We also
    propose posynomial expressions that relate modulation spectral efficiency to
    its corresponding minimum required optical signal to noise ratio. We then use
    the posynomial expressions to develop six geometric formulations for
    power/spectrum assignment part of the heuristic which are different in run
    time, complexity and accuracy. Simulation results demonstrate that the proposed
    solution has a very good accuracy and much lower computational complexity in
    comparison with the MINLP formulation. As an example, for the European Cost239
    optical network with 46 transmit transponders, the geometric formulations can
    be more than 59 times faster than their MINLP counterpart. Numerical results
    also reveal that in long-haul elastic optical networks, the product of the
    number of common fiber spans and the transmission bit rate is a better goal
    function for routing/traffic ordering sub-problem.
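
    For context, a geometric program in standard form (generic notation, not tied
    to the paper's variables) is

        minimize    f_0(x)
        subject to  f_i(x) \le 1,  i = 1, ..., m,
                    g_j(x) = 1,    j = 1, ..., p,    x > 0,

    where each (f_i) is a posynomial, (f(x) = \sum_k c_k x_1^{a_{1k}} \cdots
    x_n^{a_{nk}}) with (c_k > 0), and each (g_j) is a monomial; the substitution
    (x_i = e^{y_i}) followed by taking logarithms turns the problem into a convex
    one. This is why posynomial models of the optical signal to noise ratio and of
    the modulation spectral-efficiency requirements make the power/spectrum
    assignment sub-problem efficiently solvable.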

    Randomness cost of symmetric twirling

    Holger Boche, Gisbert Janßen, Sajad Saeedinaeeni
    Comments: 8 pages, 2 figures
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    We study random unitary channels which reproduce the action of the twirling
    channel corresponding to the representation of the symmetric group on an n-fold
    tensor product. We derive upper and lower bounds on the randomness cost of
    implementing such a map, which depend exponentially on the number of systems.
    Consequently, symmetric twirling can be regarded as a reasonable Shannon
    theoretic protocol. On the other hand, such protocols are disqualified by their
    resource-inefficiency in situations where randomness is a costly resource.

    Informative and misinformative interactions in a school of fish

    Emanuele Crosato, Li Jiang, Valentin Lecheval, Joseph T. Lizier, X. Rosalind Wang, Pierre Tichit, Guy Theraulaz, Mikhail Prokopenko
    Subjects: Quantitative Methods (q-bio.QM); Information Theory (cs.IT)

    It is generally accepted that, when moving in groups, animals process
    information to coordinate their motion. Recent studies have begun to apply
    rigorous methods based on Information Theory to quantify such distributed
    computation. Following this perspective, we use transfer entropy to quantify
    dynamic information flows locally in space and time across a school of fish
    during directional changes around a circular tank, i.e. U-turns. This analysis
    reveals peaks in information flows during collective U-turns and identifies two
    different flows: an informative flow (positive transfer entropy) based on fish
    that have already turned about fish that are turning, and a misinformative flow
    (negative transfer entropy) based on fish that have not turned yet about fish
    that are turning. We also reveal that the information flows are related to
    relative position and alignment between fish, and identify spatial patterns of
    information and misinformation cascades. This study offers several
    methodological contributions and we expect further application of these
    methodologies to reveal intricacies of self-organisation in other animal groups
    and active matter in general.
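
    A toy Python sketch of (average) transfer entropy for discrete series, just to
    make the directionality concrete; the paper uses local, per-sample transfer
    entropy on continuous kinematic variables, which requires different estimators.

        import numpy as np
        from collections import Counter

        def transfer_entropy(src, tgt, k=1):
            """Plug-in estimate of T(src -> tgt) in bits, with target history
            length k and a single past source sample."""
            src, tgt = np.asarray(src), np.asarray(tgt)
            n = len(tgt)
            c_full, c_hist_src, c_next_hist, c_hist = (Counter() for _ in range(4))
            for t in range(k, n):
                hist, x, y = tuple(tgt[t - k:t]), src[t - 1], tgt[t]
                c_full[(y, hist, x)] += 1
                c_hist_src[(hist, x)] += 1
                c_next_hist[(y, hist)] += 1
                c_hist[hist] += 1
            m = n - k
            te = 0.0
            for (y, hist, x), cnt in c_full.items():
                # p(y | hist, x) / p(y | hist), weighted by the joint probability
                te += (cnt / m) * np.log2((cnt / c_hist_src[(hist, x)])
                                          / (c_next_hist[(y, hist)] / c_hist[hist]))
            return te

        # Toy example: the target copies the source with one step of delay, so
        # information flows from src to tgt but not the other way around.
        rng = np.random.default_rng(0)
        src = rng.integers(0, 2, 5000)
        tgt = np.roll(src, 1)
        print(transfer_entropy(src, tgt), transfer_entropy(tgt, src))  # ~1.0, ~0.0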

    Rational ignorance: simpler models learn more from finite data

    Henry H. Mattingly, Mark K. Transtrum, Michael C. Abbott, Benjamin B. Machta
    Comments: 7 pages, 4 figures
    Subjects: Data Analysis, Statistics and Probability (physics.data-an); Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT)

    We use the language of uninformative Bayesian prior choice to study the
    selection of appropriately simple effective models. We consider the prior which
    maximizes the mutual information between parameters and predictions, learning
    as much as possible from finite data. When many parameters are poorly
    constrained by data, this puts weight only on boundaries of the parameter
    manifold. Thus it selects a lower-dimensional effective theory in a principled
    way, ignoring irrelevant parameter directions. Only in the limit where there is
    sufficient data to tightly constrain all parameters do we recover Jeffreys
    prior.
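
    In generic notation (not the paper's), the prior being selected is the one
    that maximizes the mutual information between parameters and data,

        I(\Theta; X) = \int p(\theta) \int p(x \mid \theta)
                       \log \frac{p(x \mid \theta)}{p(x)} \, dx \, d\theta,
        \qquad p(x) = \int p(\theta') \, p(x \mid \theta') \, d\theta',

    and in the large-data limit this maximizer tends to Jeffreys prior,
    (p(\theta) \propto \sqrt{\det \mathcal{I}(\theta)}) with (\mathcal{I}) the
    Fisher information, whereas with limited data it concentrates its weight on
    the boundaries of the parameter manifold, as described above.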



