
    arXiv Paper Daily: Wed, 30 Nov 2016

    Published by 我爱机器学习 (52ml.net, "I Love Machine Learning") on 2016-11-30 00:00:00

    Neural and Evolutionary Computing

    Emergence of foveal image sampling from learning to attend in visual scenes

    Brian Cheung, Eric Weiss, Bruno Olshausen
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We describe a neural attention model with a learnable retinal sampling
    lattice. The model is trained on a visual search task requiring the
    classification of an object embedded in a visual scene amidst background
    distractors using the smallest number of fixations. We explore the tiling
    properties that emerge in the model’s retinal sampling lattice after training.
    Specifically, we show that this lattice resembles the eccentricity dependent
    sampling lattice of the primate retina, with a high resolution region in the
    fovea surrounded by a low resolution periphery. Furthermore, we find conditions
    where these emergent properties are amplified or eliminated providing clues to
    their function.

    Multi-objective Active Control Policy Design for Commensurate and Incommensurate Fractional Order Chaotic Financial Systems

    Indranil Pan, Saptarshi Das, Shantanu Das
    Comments: 26 pages, 8 figures, 2 tables
    Journal-ref: Applied Mathematical Modelling, Volume 39, Issue 2, 15 January
    2015, Pages 500-514
    Subjects: Optimization and Control (math.OC); Computational Engineering, Finance, and Science (cs.CE); Neural and Evolutionary Computing (cs.NE); Systems and Control (cs.SY); Chaotic Dynamics (nlin.CD)

    In this paper, an active control policy design for a fractional order (FO)
    financial system is attempted, considering multiple conflicting objectives. An
    active control template as a nonlinear state feedback mechanism is developed
    and the controller gains are chosen within a multi-objective optimization (MOO)
    framework to satisfy the conditions of asymptotic stability, derived
    analytically. The MOO gives a set of solutions on the Pareto optimal front for
    the multiple conflicting objectives that are considered. It is shown that there
    is a trade-off between the multiple design objectives and a better performance
    in one objective can only be obtained at the cost of performance deterioration
    in the other objectives. The multi-objective controller design has been
    compared using three different MOO techniques, viz. the Non-dominated Sorting
    Genetic Algorithm-II (NSGA-II), the epsilon-variable Multi-Objective Genetic
    Algorithm (ev-MOGA), and the Multi-Objective Evolutionary Algorithm based on
    Decomposition (MOEA/D). The robustness of the same control policy, designed
    with the nominal system settings, has also been investigated for gradual
    decreases in the commensurate and incommensurate fractional orders of the
    financial system.
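    The Pareto-optimal front mentioned above can be made concrete with a small
    sketch: for a minimization problem, a solution is non-dominated if no other
    solution is at least as good in every objective and strictly better in at
    least one (illustrative Python, not the authors' implementation):

```python
def pareto_front(points):
    """Return the non-dominated subset of a list of objective vectors
    (minimization): a point survives if no other point is at least as
    good in every objective and strictly better in at least one."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            all(qj <= pj for qj, pj in zip(q, p)) and
            any(qj < pj for qj, pj in zip(q, p))
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            front.append(p)
    return front
```

    Every point on the returned front embodies the trade-off the abstract
    describes: improving one objective necessarily worsens another.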

    Fractional Order Load-Frequency Control of Interconnected Power Systems Using Chaotic Multi-objective Optimization

    Indranil Pan, Saptarshi Das
    Comments: 31 pages, 19 figures, 2 tables
    Journal-ref: Applied Soft Computing, Volume 29, April 2015, Pages 328-344
    Subjects: Optimization and Control (math.OC); Computational Engineering, Finance, and Science (cs.CE); Neural and Evolutionary Computing (cs.NE); Systems and Control (cs.SY)

    Fractional order proportional-integral-derivative (FOPID) controllers are
    designed for load frequency control (LFC) of two interconnected power systems.
    Conflicting time domain design objectives are considered in a multi-objective
    optimization (MOO) based design framework to design the gains and the
    fractional differ-integral orders of the FOPID controllers in the two areas.
    Here, we explore the effect of augmenting two different chaotic maps along with
    the uniform random number generator (RNG) in the popular MOO algorithm – the
    Non-dominated Sorting Genetic Algorithm-II (NSGA-II). Different measures of
    quality for MOO e.g. hypervolume indicator, moment of inertia based diversity
    metric, total Pareto spread, spacing metric are adopted to select the best set
    of controller parameters from multiple runs of all the NSGA-II variants (i.e.
    nominal and chaotic versions). The chaotic versions of the NSGA-II algorithm
    are compared with the standard NSGA-II in terms of solution quality and
    computational time. In addition, the Pareto optimal fronts showing the
    trade-off between the two conflicting time domain design objectives are
    compared to show the advantage of using the FOPID controller over that with
    simple PID controller. The nature of fast/slow and high/low noise amplification
    effects of the FOPID structure or the four quadrant operation in the two
    inter-connected areas of the power system is also explored. A fuzzy logic based
    method has been adopted next to select the best compromise solution from the
    best Pareto fronts corresponding to each MOO comparison criterion. The time
    domain system responses are shown for the fuzzy best compromise solutions under
    nominal operating conditions. A comparative analysis of the merits and
    demerits of each controller structure is then reported. A robustness analysis
    is also done for the PID and the FOPID controllers.
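    The chaotic maps used to augment the uniform RNG can be as simple as the
    logistic map; a minimal sketch (the specific map and the parameter r = 4
    here are illustrative assumptions, not necessarily the ones used in the
    paper):

```python
def logistic_map_sequence(x0, n, r=4.0):
    """Generate n values from the logistic map x_{k+1} = r * x_k * (1 - x_k).
    With r = 4 the map is fully chaotic on (0, 1); such a deterministic
    sequence can stand in for (or perturb) a uniform RNG when driving an
    evolutionary algorithm."""
    xs, x = [], x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        xs.append(x)
    return xs
```

    Unlike a pseudo-random stream, the sequence is fully reproducible from its
    seed x0, which is part of the appeal for comparing NSGA-II variants.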

    Fractional Order AGC for Distributed Energy Resources Using Robust Optimization

    Indranil Pan, Saptarshi Das
    Comments: 12 pages, 16 figures, 5 tables
    Journal-ref: IEEE Transactions on Smart Grid, Volume 7, Issue 5, Pages 2175 –
    2186, Sept 2016
    Subjects: Systems and Control (cs.SY); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)

    The applicability of fractional order (FO) automatic generation control (AGC)
    for power system frequency oscillation damping is investigated in this paper,
    employing distributed energy generation. The hybrid power system employs
    various autonomous generation systems like wind turbine, solar photovoltaic,
    diesel engine, fuel-cell and aqua electrolyzer along with other energy storage
    devices like the battery and flywheel. The controller is placed in a remote
    location while receiving and sending signals over an unreliable communication
    network with stochastic delay. The controller parameters are tuned using robust
    optimization techniques employing different variants of Particle Swarm
    Optimization (PSO) and are compared with the corresponding optimal solutions.
    An archive-based strategy is used for reducing the number of function
    evaluations for the robust optimization methods. The solutions obtained through
    the robust optimization are able to handle higher variation in the controller
    gains and orders without significant decrease in the system performance. This
    is desirable from the FO controller implementation point of view, as the
    design is able to accommodate variations in the system parameters that may
    result from the approximation of FO operators using different realization
    methods and orders of accuracy. Also, a comparison is made between the FO
    and the integer
    order (IO) controllers to highlight the merits and demerits of each scheme.

    Intelligible Language Modeling with Input Switched Affine Networks

    Jakob N. Foerster, Justin Gilmer, Jan Chorowski, Jascha Sohl-Dickstein, David Sussillo
    Comments: ICLR 2017 submission: this https URL
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    The computational mechanisms by which nonlinear recurrent neural networks
    (RNNs) achieve their goals remain an open question. There exist many problem
    domains where intelligibility of the network model is crucial for deployment.
    Here we introduce a recurrent architecture composed of input-switched affine
    transformations, in other words an RNN without any nonlinearity and with one
    set of weights per input. We show that this architecture achieves near
    identical performance to traditional architectures on language modeling of
    Wikipedia text, for the same number of model parameters. It can obtain this
    performance with the potential for computational speedup compared to existing
    methods, by precomputing the composed affine transformations corresponding to
    longer input sequences. As our architecture is affine, we are able to
    understand the mechanisms by which it functions using linear methods. For
    example, we show how the network linearly combines contributions from the past
    to make predictions at the current time step. We show how representations for
    words can be combined in order to understand how context is transferred across
    word boundaries. Finally, we demonstrate how the system can be executed and
    analyzed in arbitrary bases to aid understanding.
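    The core idea, an affine transition with one weight set per input symbol,
    and the precomputation of composed affine maps can be sketched as follows
    (a toy illustration; the dimensions and names are assumptions, not the
    authors' code):

```python
import numpy as np

def isan_step(h, sym, W, b):
    """One transition of an input-switched affine network: no nonlinearity,
    one (W, b) pair per input symbol, h' = W[sym] @ h + b[sym]."""
    return W[sym] @ h + b[sym]

def compose_affine(seq, W, b):
    """Collapse the affine maps for a whole input sequence into a single
    pair (A, c), so that running the sequence equals h -> A @ h + c."""
    d = W[seq[0]].shape[0]
    A, c = np.eye(d), np.zeros(d)
    for sym in seq:
        A, c = W[sym] @ A, W[sym] @ c + b[sym]
    return A, c
```

    Because affine maps compose into affine maps, the whole effect of a long
    input sequence collapses into one matrix-vector pair, which is the source
    of both the potential speedup and the linear analyzability the abstract
    highlights.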


    Computer Vision and Pattern Recognition

    Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction

    Richard Zhang, Phillip Isola, Alexei A. Efros
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose split-brain autoencoders, a straightforward modification of the
    traditional autoencoder architecture, for unsupervised representation learning.
    The method adds a split to the network, resulting in two disjoint sub-networks.
    Each sub-network is trained to perform a difficult task — predicting one
    subset of the data channels from another. Together, the sub-networks extract
    features from the entire input signal. By forcing the network to solve
    cross-channel prediction tasks, we induce a representation within the network
    which transfers well to other, unseen tasks. This method achieves
    state-of-the-art performance on several large-scale transfer learning
    benchmarks.

    Monocular 3D Human Pose Estimation Using Transfer Learning and Improved CNN Supervision

    Dushyant Mehta, Helge Rhodin, Dan Casas, Oleksandr Sotnychenko, Weipeng Xu, Christian Theobalt
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a new CNN-based method for regressing 3D human body pose from a
    single image that improves over the state-of-the-art on standard benchmarks by
    more than 25%. Our approach addresses the limited generalizability of models
    trained solely on the starkly limited publicly available 3D body pose data.
    Improved CNN supervision leverages first and second order parent relationships
    along the skeletal kinematic tree, and improved multi-level skip connections to
    learn better representations through implicit modification of the loss
    landscape. Further, transfer learning from 2D human pose prediction
    significantly improves accuracy and generalizability to unseen poses and camera
    views. Additionally, we contribute a new benchmark and training set for human
    body pose estimation from monocular images of real humans, with ground truth
    captured by marker-less motion capture. It complements existing corpora
    with greater diversity in pose, human appearance, clothing, occlusion, and
    viewpoints, and enables increased scope of augmentation. The benchmark covers
    outdoor and indoor scenes.

    3D Ultrasound image segmentation: A Survey

    Mohammad Hamed Mozaffari, WonSook Lee
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Three-dimensional Ultrasound image segmentation methods are surveyed in this
    paper. The focus of this report is to investigate applications of these
    techniques and a review of the original ideas and concepts. Although many
    two-dimensional image segmentation methods in the literature have mistakenly
    been presented as three-dimensional approaches, we review them here as
    three-dimensional techniques. We select the studies that have addressed the
    problem of medical
    three-dimensional Ultrasound image segmentation utilizing their proposed
    techniques. The evaluation methods and comparison between them are presented
    and tabulated in terms of evaluation techniques, interactivity, and robustness.

    InterpoNet, A brain inspired neural network for optical flow dense interpolation

    Shay Zweig, Lior Wolf
    Comments: 16 pages, 11 figures, 7 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Sparse-to-dense interpolation for optical flow is a fundamental phase in the
    pipeline of most of the leading optical flow estimation algorithms. The current
    state-of-the-art method for interpolation, EpicFlow, is a local average method
    based on an edge aware geodesic distance. We propose a new data-driven
    sparse-to-dense interpolation algorithm based on a fully convolutional network.
    We draw inspiration from the filling-in process in the visual cortex and
    introduce lateral dependencies between neurons and multi-layer supervision into
    our learning process. We also show the importance of the image contour to the
    learning process. Our method is robust and outperforms EpicFlow on competitive
    optical flow benchmarks with several underlying matching algorithms. This leads
    to state-of-the-art performance on the Sintel and KITTI 2012 benchmarks.

    Computer Aided Detection of Oral Lesions on CT Images

    Shaikat Galib, Fahima Islam, Muhammad Abir, Hyoung-Koo Lee
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Oral lesions are important findings on computed tomography (CT) images. In
    this study, a fully automatic method to detect oral lesions in mandibular
    region from dental CT images is proposed. Two methods were developed to
    recognize two types of lesions namely (1) Close border (CB) lesions and (2)
    Open border (OB) lesions, which cover most of the lesion types that can be
    found on CT images. For the detection of CB lesions, fifteen features were
    extracted from each initial lesion candidate, and a multi-layer perceptron
    (MLP) neural network was used to classify suspicious regions. Moreover, OB
    lesions were detected using a rule-based image processing method, where no
    feature extraction or classification algorithm was used. The results were
    validated
    using a CT dataset of 52 patients, where 22 patients had abnormalities and 30
    patients were normal. On the non-training dataset, the CB detection algorithm
    yielded 71% sensitivity with 0.31 false positives per patient. Furthermore,
    the OB detection algorithm achieved 100% sensitivity with 0.13 false positives
    per patient. Results suggest that the proposed framework, which consists of
    two methods, has the potential to be used in a clinical context and assist
    radiologists in better diagnosis.

    Gossip training for deep learning

    Michael Blot, David Picard, Matthieu Cord, Nicolas Thome
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    We address the issue of speeding up the training of convolutional networks.
    Here we study a distributed method adapted to stochastic gradient descent
    (SGD). The parallel optimization setup uses several threads, each applying
    individual gradient descents on a local variable. We propose a new way to share
    information between different threads inspired by gossip algorithms and showing
    good consensus convergence properties. Our method, called GoSGD, has the
    advantage of being fully asynchronous and decentralized. We compare our method
    to the recent EASGD [elastic] on CIFAR-10 and show encouraging results.
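    The gossip-style information sharing between threads can be illustrated with
    a pairwise mixing step (a simplified, synchronous sketch of the idea; GoSGD
    itself is asynchronous, and the mixing weight here is an assumption):

```python
def gossip_exchange(params_a, params_b, alpha=0.5):
    """One pairwise gossip step: two workers mix their local parameter
    vectors toward consensus. alpha is the weight each worker keeps on
    its own parameters; repeated exchanges drive all workers toward the
    same values while each continues its own gradient descent."""
    mixed_a = [alpha * a + (1 - alpha) * b for a, b in zip(params_a, params_b)]
    mixed_b = [alpha * b + (1 - alpha) * a for a, b in zip(params_a, params_b)]
    return mixed_a, mixed_b
```

    Note that the exchange preserves the sum of the parameters across the pair,
    which is the standard argument for why gossip averaging converges to
    consensus.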

    Efficient Linear Programming for Dense CRFs

    Thalaiyasingam Ajanthan, Alban Desmaison, Rudy Bunel, Mathieu Salzmann, Philip H.S. Torr, M. Pawan Kumar
    Comments: 24 pages, 10 figures, 4 tables and 51 equations
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The fully connected conditional random field (CRF) with Gaussian pairwise
    potentials has proven popular and effective for multi-class semantic
    segmentation. While the energy of a dense CRF can be minimized accurately using
    a linear programming (LP) relaxation, the state-of-the-art algorithm is too
    slow to be useful in practice. To alleviate this deficiency, we introduce an
    efficient LP minimization algorithm for dense CRFs. To this end, we develop a
    proximal minimization framework, where the dual of each proximal problem is
    optimized via block coordinate descent. We show that each block of variables
    can be efficiently optimized. Specifically, for one block, the problem
    decomposes into significantly smaller subproblems, each of which is defined
    over a single pixel. For the other block, the problem is optimized via
    conditional gradient descent. This has two advantages: 1) the conditional
    gradient can be computed in a time linear in the number of pixels and labels;
    and 2) the optimal step size can be computed analytically. Our experiments on
    standard datasets provide compelling evidence that our approach outperforms all
    existing baselines including the previous LP based approach for dense CRFs.
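    The analytic step size for conditional gradient (Frank-Wolfe) descent on a
    quadratic objective can be sketched on a toy problem over the probability
    simplex (illustrative only; this is not the paper's CRF objective):

```python
import numpy as np

def frank_wolfe_quadratic(Q, b, x, iters=50):
    """Conditional gradient (Frank-Wolfe) on f(x) = 0.5 x^T Q x - b^T x
    over the probability simplex. The linear subproblem is solved at a
    simplex vertex, and for a quadratic f the line-search step size has
    a closed form: gamma* = -g^T d / (d^T Q d), clipped to [0, 1]."""
    for _ in range(iters):
        g = Q @ x - b                      # gradient at current iterate
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0              # best vertex of the simplex
        d = s - x
        denom = d @ Q @ d
        gamma = 1.0 if denom <= 0 else min(1.0, max(0.0, -(g @ d) / denom))
        x = x + gamma * d
    return x
```

    The closed-form step follows from expanding f(x + gamma d) as a quadratic
    in gamma, which is the property the abstract exploits to avoid a line
    search.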

    Surveillance Video Parsing with Single Frame Supervision

    Si Liu, Changhu Wang, Ruihe Qian, Han Yu, Renda Bao
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Surveillance video parsing, which segments the video frames into several
    labels, e.g., face, pants, left-leg, has wide applications.
    However, pixel-wise annotation of all frames is tedious and inefficient. In
    this paper, we develop a Single frame Video Parsing (SVP) method which
    requires only one labeled frame per video in the training stage. To parse one
    particular frame,
    the video segment preceding the frame is jointly considered. SVP (1) roughly
    parses the frames within the video segment, (2) estimates the optical flow
    between frames and (3) fuses the rough parsing results warped by optical flow
    to produce the refined parsing result. The three components of SVP, namely
    frame parsing, optical flow estimation and temporal fusion are integrated in an
    end-to-end manner. Experimental results on two surveillance video datasets
    show the superiority of SVP over state-of-the-art methods.

    A Large-scale Distributed Video Parsing and Evaluation Platform

    Kai Yu, Yang Zhou, Da Li, Zhang Zhang, Kaiqi Huang
    Comments: Accepted by Chinese Conference on Intelligent Visual Surveillance 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Visual surveillance systems have become one of the largest sources of Big
    Visual Data in the real world. However, existing systems for video analysis
    still struggle with scalability, extensibility, and error-proneness, though
    great advances have been achieved in a number of visual recognition tasks and
    surveillance applications, e.g., pedestrian/vehicle detection and
    people/vehicle counting. Moreover, few algorithms explore the
    specific values/characteristics in large-scale surveillance videos. To address
    these problems in large-scale video analysis, we develop a scalable video
    parsing and evaluation platform through combining some advanced techniques for
    Big Data processing, including Spark Streaming, Kafka and Hadoop Distributed
    Filesystem (HDFS). Also, a Web User Interface is designed in the system, to
    collect users’ degrees of satisfaction on the recognition tasks so as to
    evaluate the performance of the whole system. Furthermore, the highly
    extensible platform running on the long-term surveillance videos makes it
    possible to develop more intelligent incremental algorithms to enhance the
    performance of various visual recognition tasks.

    Fast Face-swap Using Convolutional Neural Networks

    Iryna Korshunova, Wenzhe Shi, Joni Dambre, Lucas Theis
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We consider the problem of face swapping in images, where an input identity
    is transformed into a target identity while preserving pose, facial expression,
    and lighting. To perform this mapping, we use convolutional neural networks
    trained to capture the appearance of the target identity from an unstructured
    collection of his/her photographs. This approach is enabled by framing the
    face swapping problem in terms of style transfer, where the goal is to render
    an image in the style of another one. Building on recent advances in this
    area, we devise a new loss function that enables the network to produce
    highly photorealistic results. By combining neural networks with simple pre-
    and post-processing steps, we aim to make face swapping work in real time
    with no input from the user.

    Occlusion-Aware Video Deblurring with a New Layered Blur Model

    Byeongjoo Ahn, Tae Hyun Kim, Wonsik Kim, Kyoung Mu Lee
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a deblurring method for scenes with occluding objects using a
    carefully designed layered blur model. The layered blur model is frequently
    used in motion deblurring to handle locally varying blurs, which are caused by
    object motions or depth variations in a scene. However, conventional models
    have a limitation in representing the layer interactions occurring at occlusion
    boundaries. In this paper, we address this limitation in both theoretical and
    experimental ways, and propose a new layered blur model that reflects the
    actual blur generation process. Based on this model, we develop an
    occlusion-aware
    deblurring method that can estimate not only the clear foreground and
    background, but also the object motion more accurately. We also provide a novel
    analysis on the blur kernel at object boundaries, which shows the distinctive
    characteristics of the blur kernel that cannot be captured by conventional blur
    models. Experimental results on synthetic and real blurred videos demonstrate
    that the proposed method yields superior results, especially at object
    boundaries.

    Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model

    Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, Rita Cucchiara
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Data-driven saliency has recently gained a lot of attention thanks to the use
    of Convolutional Neural Networks. In this paper we go beyond the standard
    approach to saliency prediction, in which gaze maps are computed with a
    feed-forward network, and we present a novel Saliency Attentive Model which can
    predict accurate saliency maps by incorporating attentive mechanisms. Our
    solution is composed of a Convolutional LSTM, that iteratively focuses on the
    most salient regions of the input, and a Residual Architecture designed to
    preserve spatial resolution. Additionally, to tackle the center bias present in
    human eye fixations, our model incorporates prior maps generated by learned
    Gaussian functions. We show, through an extensive evaluation, that the proposed
    architecture overcomes the current state of the art on three public saliency
    prediction datasets: SALICON, MIT300 and CAT2000. We further study the
    contribution of each key component to demonstrate their robustness in
    different scenarios.
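    The learned Gaussian prior maps used to model center bias can be illustrated
    by rendering an axis-aligned 2D Gaussian over the image grid (the parameter
    values here are illustrative defaults; in the model they are learned):

```python
import numpy as np

def gaussian_prior_map(h, w, mu=(0.5, 0.5), sigma=(0.25, 0.25)):
    """Render an h x w axis-aligned 2D Gaussian in normalized [0, 1]
    coordinates. Such a map, with learned mean and spread, encodes the
    tendency of human fixations to cluster near the image center."""
    ys = (np.arange(h) + 0.5) / h
    xs = (np.arange(w) + 0.5) / w
    gy = np.exp(-0.5 * ((ys - mu[1]) / sigma[1]) ** 2)
    gx = np.exp(-0.5 * ((xs - mu[0]) / sigma[0]) ** 2)
    return np.outer(gy, gx)
```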

    Lens Distortion Rectification using Triangulation based Interpolation

    Burak Benligiray, Cihan Topal
    Comments: International Symposium on Visual Computing, 2015
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Nonlinear lens distortion rectification is a common first step in image
    processing applications where the assumption of a linear camera model is
    essential. For rectifying the lens distortion, the forward distortion model
    needs to be known. However, many self-calibration methods estimate the inverse
    distortion model. In the literature, the inverse of the estimated model is
    approximated for image rectification, which introduces additional error to the
    system. We propose a novel distortion rectification method that uses the
    inverse distortion model directly. The method starts by mapping the distorted
    pixels to the rectified image using the inverse distortion model. The resulting
    set of points with subpixel locations are triangulated. The pixel values of the
    rectified image are linearly interpolated based on this triangulation. The
    method is applicable to all camera calibration methods that estimate the
    inverse distortion model and performs well across a large range of parameters.
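    The per-triangle linear interpolation step can be sketched with barycentric
    coordinates (a minimal illustration of the interpolation used after
    triangulating the forward-mapped pixel locations; not the authors' code):

```python
def barycentric_interpolate(tri, values, p):
    """Linear interpolation inside a single triangle via barycentric
    coordinates: the value at point p is the weighted sum of the three
    vertex values, with weights given by the relative sub-triangle areas."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    w2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    w3 = 1.0 - w1 - w2
    return w1 * values[0] + w2 * values[1] + w3 * values[2]
```

    In the full method, a Delaunay triangulation of the subpixel point set
    determines which triangle each rectified pixel falls into before this
    per-triangle interpolation is applied.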

    Is a picture worth a thousand words? A Deep Multi-Modal Fusion Architecture for Product Classification in e-commerce

    Tom Zahavy, Alessandro Magnani, Abhinandan Krishnan, Shie Mannor
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

    Classifying products into categories precisely and efficiently is a major
    challenge in modern e-commerce. The high traffic of new products uploaded daily
    and the dynamic nature of the categories raise the need for machine learning
    models that can reduce the cost and time of human editors. In this paper, we
    propose a decision level fusion approach for multi-modal product classification
    using text and image inputs. We train input specific state-of-the-art deep
    neural networks for each input source, show the potential of forging them
    together into a multi-modal architecture and train a novel policy network that
    learns to choose between them. Finally, we demonstrate that our multi-modal
    network improves the top-1 accuracy over both networks on a real-world
    large-scale product classification dataset that we collected from Walmart.com.
    While we focus on image-text fusion that characterizes e-commerce domains, our
    algorithms can be easily applied to other modalities such as audio, video,
    physical sensors, etc.

    Deep Quantization: Encoding Convolutional Activations with Deep Generative Model

    Zhaofan Qiu, Ting Yao, Tao Mei
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep convolutional neural networks (CNNs) have proven highly effective for
    visual recognition, where learning a universal representation from the
    activations of a convolutional layer is a fundamental problem. In this paper,
    we present Fisher Vector encoding with Variational Auto-Encoder (FV-VAE), a
    novel deep architecture that quantizes the local activations of a
    convolutional layer in a deep generative model by training them in an
    end-to-end manner. To incorporate the FV encoding strategy into deep
    generative models, we introduce the Variational Auto-Encoder model, which
    steers variational inference and learning in a neural network that can be
    straightforwardly optimized using standard stochastic gradient methods.
    Different from the FV characterized by conventional
    generative models (e.g., Gaussian Mixture Model) which parsimoniously fit a
    discrete mixture model to data distribution, the proposed FV-VAE is more
    flexible to represent the natural property of data for better generalization.
    Extensive experiments are conducted on three public datasets, i.e., UCF101,
    ActivityNet, and CUB-200-2011 in the context of video action recognition and
    fine-grained image classification, respectively. Superior results are reported
    when compared to state-of-the-art representations. Most remarkably, our
    proposed FV-VAE achieves the best published accuracy to date of 94.2% on
    UCF101.

    Inertial-Based Scale Estimation for Structure from Motion on Mobile Devices

    Janne Mustaniemi, Juho Kannala, Simo Särkkä, Jiri Matas, Janne Heikkilä
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Structure from motion algorithms have an inherent limitation that the
    reconstruction can only be determined up to the unknown scale factor. Modern
    mobile devices are equipped with an inertial measurement unit (IMU), which can
    be used for estimating the scale of the reconstruction. We propose a method
    that recovers the metric scale given inertial measurements and camera poses. In
    the process, we also perform a temporal and spatial alignment of the camera and
    the IMU. Therefore, our solution can be easily combined with any existing
    visual reconstruction software. The method can cope with noisy camera pose
    estimates, typically caused by motion blur or rolling shutter artifacts, via
    utilizing a Rauch-Tung-Striebel (RTS) smoother. Furthermore, the scale
    estimation is performed in the frequency domain, which provides more robustness
    to inaccurate sensor time stamps and noisy IMU samples than the previously used
    time domain representation. In contrast to previous methods, our approach has
    no parameters that need to be tuned for achieving a good performance. In the
    experiments, we show that the algorithm outperforms the state-of-the-art in
    both accuracy and convergence speed of the scale estimate. The accuracy of the
    scale is around 1% of the ground truth, depending on the recording. We also
    demonstrate that our method can improve the scale accuracy of the Project
    Tango’s built-in motion tracking.
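    A frequency-domain scale fit can be sketched as a least-squares ratio between
    Fourier magnitudes of the camera-derived and IMU accelerations (a simplified
    illustration; the paper's method additionally handles temporal/spatial
    alignment and RTS smoothing, and the band indices here are assumptions):

```python
import numpy as np

def estimate_scale_freq(cam_accel, imu_accel, band=(1, 30)):
    """Least-squares scale s minimizing ||A - s * C|| over the Fourier
    magnitudes of the camera-derived (C) and IMU (A) accelerations,
    restricted to a frequency band of interest. Working on magnitudes in
    a band discards DC drift and high-frequency sensor noise."""
    C = np.abs(np.fft.rfft(cam_accel))[band[0]:band[1]]
    A = np.abs(np.fft.rfft(imu_accel))[band[0]:band[1]]
    return float(np.dot(C, A) / np.dot(C, C))
```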

    Social Behavior Prediction from First Person Videos

    Shan Su, Jung Pyo Hong, Jianbo Shi, Hyun Soo Park
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents a method to predict the future movements (location and
    gaze direction) of basketball players as a whole from their first person
    videos. The predicted behaviors reflect an individual’s physical space that
    affords the next actions, while conforming to social behaviors by engaging in
    joint attention. Our key innovation is to use the 3D reconstruction of
    multiple first person cameras to automatically annotate each other’s visual
    semantics of social configurations.

    We leverage two learning signals uniquely embedded in first person videos.
    Individually, a first person video records the visual semantics of a spatial
    and social layout around a person that allows associating with past similar
    situations. Collectively, first person videos follow joint attention that can
    link the individuals to a group. We learn the egocentric visual semantics of
    group movements using a Siamese neural network to retrieve future trajectories.
    We consolidate the retrieved trajectories from all players by maximizing a
    measure of social compatibility—the gaze alignment towards joint attention
    predicted by their social formation, where the dynamics of joint attention is
    learned by a long-term recurrent convolutional network. This allows us to
    characterize which social configuration is more plausible and predict future
    group trajectories.

    Material Recognition from Local Appearance in Global Context

    Gabriel Schwartz, Ko Nishino
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recognition of materials has proven to be a challenging problem due to the
    wide variation in appearance within and between categories. Many recent
    material recognition methods treat materials as yet another set of labels like
    objects. Materials are, however, fundamentally different from objects as they
    have no inherent shape or defined spatial extent. This makes local material
    recognition particularly hard. Global image context, such as where the material
    is or what object it makes up, can be crucial to recognizing the material.
    Existing methods, however, operate on an implicit fusion of materials and
    context by using large receptive fields as input (i.e., large image patches).
    Such an approach can only take advantage of limited context as it appears
    during training, and will be bounded by the combinations seen in the training
    data. We instead show that recognizing materials purely from their local
    appearance and integrating separately recognized global contextual cues
    including objects and places leads to superior dense, per-pixel, material
    recognition. We achieve this by training a fully-convolutional material
    recognition network end-to-end with only material category supervision. We
    integrate object and place estimates into this network from independent CNNs.
    This approach avoids the necessity of preparing an infeasible amount of
    training data that covers the product space of materials, objects, and scenes,
    while fully leveraging contextual cues for dense material recognition.
    Experimental results validate the effectiveness of our approach and show that
    our method outperforms past methods that build on inseparable material and
    contextual information.

    Generating Holistic 3D Scene Abstractions for Text-based Image Retrieval

    Ang Li, Jin Sun, Joe Yue-Hei Ng, Ruichi Yu, Vlad I. Morariu, Larry S. Davis
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computational Geometry (cs.CG); Information Retrieval (cs.IR)

    Spatial relationships between objects provide important information for
    text-based image retrieval. Since users are more likely to describe a scene
    from a real-world perspective, using 3D spatial relationships rather than 2D
    relationships that assume a particular viewing direction, one of the main
    challenges is to infer the 3D structure that bridges images with users’ text
    descriptions. However, direct inference of 3D structure from images requires
    learning from large scale annotated data. Since interactions between objects
    can be reduced to a limited set of atomic spatial relations in 3D, we study the
    possibility of inferring 3D structure from a text description rather than an
    image, applying physical relation models to synthesize holistic 3D abstract
    object layouts satisfying the spatial constraints present in a textual
    description. We present a generic framework for retrieving images from a
    textual description of a scene by matching images with these generated abstract
    object layouts. Images are ranked by matching object detection outputs
    (bounding boxes) to 2D layout candidates (also represented by bounding boxes)
    which are obtained by projecting the 3D scenes with sampled camera directions.
    We validate our approach using public indoor scene datasets and show that our
    method outperforms both an object-occurrence-based baseline and a learned 2D
    pairwise-relation-based baseline.

    Measuring and modeling the perception of natural and unconstrained gaze in humans and machines

    Daniel Harari, Tao Gao, Nancy Kanwisher, Joshua Tenenbaum, Shimon Ullman
    Comments: Daniel Harari and Tao Gao contributed equally to this work
    Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    Humans are remarkably adept at interpreting the gaze direction of other
    individuals in their surroundings. This skill is at the core of the ability to
    engage in joint visual attention, which is essential for establishing social
    interactions. How accurate are humans in determining the gaze direction of
    others in lifelike scenes, when they can move their heads and eyes freely, and
    what are the sources of information for the underlying perceptual processes?
    These questions pose a challenge from both empirical and computational
    perspectives, due to the complexity of the visual input in real-life
    situations. Here we measure empirically human accuracy in perceiving the gaze
    direction of others in lifelike scenes, and study computationally the sources
    of information and representations underlying this cognitive capacity. We show
    that humans perform better in face-to-face conditions compared with recorded
    conditions, and that this advantage is not due to the availability of input
    dynamics. We further show that humans still perform well when only the
    eyes region is visible, rather than the whole face. We develop a computational
    model that replicates the pattern of human performance, including the finding
    that the eyes region contains, on its own, the information required for
    estimating both head orientation and direction of gaze. Consistent with
    neurophysiological findings on task-specific face regions in the brain, the
    learned computational representations reproduce perceptual effects such as the
    Wollaston illusion, when trained to estimate direction of gaze, but not when
    trained to recognize objects or faces.

    On the Existence of Synchrostates in Multichannel EEG Signals during Face-perception Tasks

    Wasifa Jamal, Saptarshi Das, Koushik Maharatna, Fabio Apicella, Georgia Chronaki, Federico Sicca, David Cohen, Filippo Muratori
    Comments: 30 pages, 22 figures, 2 tables
    Journal-ref: Biomedical Physics & Engineering Express, vol. 1, no. 1, pp.
    015002, 2015
    Subjects: Medical Physics (physics.med-ph); Computer Vision and Pattern Recognition (cs.CV); Applications (stat.AP); Machine Learning (stat.ML)

    Phase synchronisation in multichannel EEG is known as the manifestation of
    functional brain connectivity. Traditional phase synchronisation studies are
    mostly based on time-averaged synchrony measures and hence do not preserve the
    temporal evolution of the phase difference. Here we propose a new method to
    show the existence of a small set of unique phase-synchronised patterns or
    “states” in multi-channel EEG recordings, each “state” remaining stable on the
    order of milliseconds, from typical and pathological subjects during face perception
    tasks. The proposed methodology bridges the concepts of EEG microstates and
    phase synchronisation in time and frequency domain respectively. The analysis
    is reported for four groups of children including typical, Autism Spectrum
    Disorder (ASD), low and high anxiety subjects – a total of 44 subjects. In all
    cases, we observe the consistent existence of these states – termed
    synchrostates – within specific cognition-related frequency bands (beta and
    gamma bands), though the topographies of these synchrostates differ for
    different subject groups with different pathological conditions. The
    inter-synchrostate switching follows a well-defined sequence capturing the
    underlying inter-electrode phase-relation dynamics in a stimulus- and
    person-centric manner. Our study is motivated by the well-known EEG
    microstates, which exhibit stable potential maps over the scalp. However, here we
    report a similar observation of quasi-stable phase synchronised states in
    multichannel EEG. The existence of the synchrostates, coupled with their unique
    switching-sequence characteristics, could open a potentially new direction
    beyond contemporary EEG phase synchronisation studies.
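
    As a rough sketch of the kind of analysis involved, instantaneous phase
    differences can be obtained via the Hilbert transform and the resulting
    per-sample phase-difference matrices clustered into a small set of
    recurring "states". This is a minimal illustration of the general idea,
    not the authors' pipeline; the function names and the plain k-means
    clustering step are assumptions.

```python
import numpy as np
from scipy.signal import hilbert

def phase_difference_frames(eeg):
    # eeg: (channels, samples), band-pass filtered to the band of interest
    phases = np.angle(hilbert(eeg, axis=1))      # instantaneous phase per channel
    d = phases[:, None, :] - phases[None, :, :]  # (channels, channels, samples)
    return np.transpose(d, (2, 0, 1))            # one C x C matrix per sample

def kmeans(frames, k, iters=50, seed=0):
    # plain k-means over the flattened phase-difference matrices; the
    # recurring cluster centroids play the role of the "synchrostates"
    rng = np.random.default_rng(seed)
    X = frames.reshape(len(frames), -1)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels
```

    The label sequence over time would then expose the inter-synchrostate
    switching pattern described in the abstract.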

    Easy-setup eye movement recording system for human-computer interaction

    Manh Duong Phung, Quang Vinh Tran, Kenji Hara, Hirohito Inagaki, Masanobu Abe
    Comments: In IEEE International Conference on Research, Innovation and Vision for the Future (RIVF), 2008
    Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV)

    Tracking the movement of human eyes is expected to yield natural and
    convenient applications based on human-computer interaction (HCI). To implement
    an effective eye-tracking system, eye movements must be recorded without
    restricting the user’s behavior or causing discomfort. This paper
    describes an eye movement recording system with a free-head, simple
    configuration. It does not require the user to wear anything on her head, and
    she can move her head freely. Instead of using a computer, the system uses a
    visual digital signal processor (DSP) camera to detect the positions of the eye
    corner and the pupil center, and then calculates the eye movement. Evaluation
    tests show that the sampling rate of the system can reach 300 Hz and that the
    accuracy is about 1.8 degrees/s.


    Artificial Intelligence

    Dialogue Learning With Human-In-The-Loop

    Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, Jason Weston
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    An important aspect of developing conversational agents is to give a bot the
    ability to improve through communicating with humans and to learn from the
    mistakes that it makes. Most research has focused on learning from fixed
    training sets of labeled data rather than interacting with a dialogue partner
    in an online fashion. In this paper we explore this direction in a
    reinforcement learning setting where the bot improves its question-answering
    ability from feedback a teacher gives following its generated responses. We
    build a simulator that tests various aspects of such learning in a synthetic
    environment, and introduce models that work in this regime. Finally, real
    experiments with Mechanical Turk validate the approach.

    Learning Concept Hierarchies through Probabilistic Topic Modeling

    V. S. Anoop, S. Asharaf, P. Deepak
    Journal-ref: International Journal of Information Processing (IJIP), Volume 10,
    Issue 3, 2016
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)

    With the advent of the semantic web, various tools and techniques have been
    introduced for presenting and organizing knowledge. Concept hierarchies are one
    such technique that has gained significant attention due to its usefulness in
    creating domain ontologies, which are considered an integral part of the
    semantic web. Automated concept hierarchy learning algorithms focus on
    extracting relevant concepts from an unstructured text corpus and connecting
    them together by identifying potential relations that exist between them. In this paper, we
    propose a novel approach that identifies relevant concepts from plain text and
    then learns a hierarchy of concepts by exploiting the subsumption relation
    between them. To start with, we model topics using a probabilistic topic model
    and then apply lightweight linguistic processing to extract semantically rich
    concepts. We then connect concepts by identifying an “is-a” relationship
    between pairs of concepts. The proposed method is completely unsupervised and
    requires no domain-specific training corpus for concept extraction and
    learning. Experiments on large real-world text corpora such as the BBC News
    dataset and the Reuters News corpus show that the proposed method outperforms
    some of the existing methods for concept extraction, and that efficient concept
    hierarchy learning is possible if the overall task is guided by a probabilistic
    topic modeling algorithm.
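
    The subsumption step can be illustrated with a document-co-occurrence
    criterion in the spirit of Sanderson and Croft: concept A subsumes B when A
    appears in (nearly) all documents that contain B but not vice versa. A
    hypothetical sketch; the threshold and the attachment rule are assumptions,
    not the paper's exact method.

```python
def subsumes(doc_sets, a, b, threshold=0.8):
    # a subsumes b if b's documents are (mostly) a subset of a's
    # and a occurs in strictly more documents.
    da, db = doc_sets[a], doc_sets[b]
    if len(da) <= len(db):
        return False
    return len(da & db) / len(db) >= threshold

def build_hierarchy(doc_sets, threshold=0.8):
    # attach each concept to its most specific subsumer (fewest documents),
    # yielding child -> parent ("is-a") edges
    edges = {}
    for b in doc_sets:
        parents = [a for a in doc_sets
                   if a != b and subsumes(doc_sets, a, b, threshold)]
        if parents:
            edges[b] = min(parents, key=lambda a: len(doc_sets[a]))
    return edges
```

    For example, if "animal" occurs in every document where "dog" occurs but not
    conversely, "dog" is attached under "animal".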

    Intelligible Language Modeling with Input Switched Affine Networks

    Jakob N. Foerster, Justin Gilmer, Jan Chorowski, Jascha Sohl-Dickstein, David Sussillo
    Comments: ICLR 2017 submission: this https URL
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    The computational mechanisms by which nonlinear recurrent neural networks
    (RNNs) achieve their goals remain an open question. There exist many problem
    domains where intelligibility of the network model is crucial for deployment.
    Here we introduce a recurrent architecture composed of input-switched affine
    transformations, in other words an RNN without any nonlinearity and with one
    set of weights per input. We show that this architecture achieves near
    identical performance to traditional architectures on language modeling of
    Wikipedia text, for the same number of model parameters. It can obtain this
    performance with the potential for computational speedup compared to existing
    methods, by precomputing the composed affine transformations corresponding to
    longer input sequences. As our architecture is affine, we are able to
    understand the mechanisms by which it functions using linear methods. For
    example, we show how the network linearly combines contributions from the past
    to make predictions at the current time step. We show how representations for
    words can be combined in order to understand how context is transferred across
    word boundaries. Finally, we demonstrate how the system can be executed and
    analyzed in arbitrary bases to aid understanding.
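
    The dynamics described above, one affine map per input symbol and the
    ability to precompute the composition of maps for a longer input sequence,
    can be sketched as follows (dimensions and names are illustrative):

```python
import numpy as np

def isan_step(h, token, W, b):
    # one affine map (W_x, b_x) per input symbol; no nonlinearity
    return W[token] @ h + b[token]

def run(h0, tokens, W, b):
    # step-by-step recurrence over a token sequence
    h = h0
    for t in tokens:
        h = isan_step(h, t, W, b)
    return h

def compose(tokens, W, b, n):
    # precompute the single affine map h -> A h + c equivalent to the
    # whole token sequence, enabling the speedup mentioned in the abstract
    A, c = np.eye(n), np.zeros(n)
    for t in tokens:
        A, c = W[t] @ A, W[t] @ c + b[t]
    return A, c
```

    Because the composed map is itself affine, running the sequence step by
    step and applying the precomputed (A, c) give identical hidden states.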

    Adams Conditioning and Likelihood Ratio Transfer Mediated Inference

    Jan A. Bergstra
    Comments: 43 pages
    Subjects: Artificial Intelligence (cs.AI)

    Forensic science advocates the use of inference mechanisms which may be
    viewed as simple multi-agent protocols. An important protocol of this kind
    involves an agent FE (forensic expert) who communicates to a second agent TOF
    (trier of fact), first, its value of a certain likelihood ratio with respect to
    its own belief state, which is supposed to be captured by a probability function
    on FE’s proposition space. Subsequently FE communicates its recently acquired
    confirmation that a certain evidence proposition is true. The inference part of
    this sort of reasoning, here referred to as likelihood ratio transfer mediated
    reasoning, involves TOF’s revision of its own belief state, and in particular
    an evaluation of the resulting belief in the hypothesis proposition.

    Different realizations of likelihood ratio transfer mediated reasoning are
    distinguished: if the evidence hypothesis is included in the prior proposition
    space of TOF then a comparison is made between understanding the TOF side of a
    belief revision step as a composition of two successive steps of single
    likelihood Adams conditioning followed by a Bayes conditioning step, and as a
    single step of double likelihood Adams conditioning followed by Bayes
    conditioning; if, however, the evidence hypothesis is initially outside the
    proposition space of TOF an application of proposition kinetics for the
    introduction of the evidence proposition precedes Bayesian conditioning, which
    is followed by Jeffrey conditioning on the hypothesis proposition.
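
    At its simplest, the TOF-side update on a hypothesis already in its
    proposition space is Bayes' rule in odds form: posterior odds equal the
    communicated likelihood ratio times the prior odds. A minimal sketch (this
    elementary form does not capture the Adams/Jeffrey conditioning
    distinctions the paper analyzes):

```python
def posterior_from_lr(prior_h, likelihood_ratio):
    # Bayes by odds: posterior odds = LR * prior odds,
    # then convert the odds back to a probability.
    prior_odds = prior_h / (1.0 - prior_h)
    post_odds = likelihood_ratio * prior_odds
    return post_odds / (1.0 + post_odds)
```

    For instance, a prior of 0.5 combined with a likelihood ratio of 3 yields a
    posterior of 0.75.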

    NewsQA: A Machine Comprehension Dataset

    Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro Sordoni, Philip Bachman, Kaheer Suleman
    Comments: Under review for ICLR 2016
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    We present NewsQA, a challenging machine comprehension dataset of over
    100,000 question-answer pairs. Crowdworkers supply questions and answers based
    on a set of over 10,000 news articles from CNN, with answers consisting of
    spans of text from the corresponding articles. We collect this dataset through
    a four-stage process designed to solicit exploratory questions that require
    reasoning. A thorough analysis confirms that NewsQA demands abilities beyond
    simple word matching and recognizing entailment. We measure human performance
    on the dataset and compare it to several strong neural models. The performance
    gap between humans and machines (25.3% F1) indicates that significant progress
    can be made on NewsQA through future research. The dataset is freely available
    at datasets.maluuba.com/NewsQA.
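
    The reported F1 gap refers to token-overlap F1 between predicted and
    reference answer spans. A sketch of the standard (SQuAD-style)
    computation, assuming simple whitespace tokenization:

```python
from collections import Counter

def token_f1(prediction, reference):
    # token-overlap F1 commonly used to score extractive QA spans
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = Counter(pred) & Counter(ref)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

    An exact-match prediction scores 1.0; a partial span such as "red car"
    against "the red car" scores 0.8.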

    Measuring and modeling the perception of natural and unconstrained gaze in humans and machines

    Daniel Harari, Tao Gao, Nancy Kanwisher, Joshua Tenenbaum, Shimon Ullman
    Comments: Daniel Harari and Tao Gao contributed equally to this work
    Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    Humans are remarkably adept at interpreting the gaze direction of other
    individuals in their surroundings. This skill is at the core of the ability to
    engage in joint visual attention, which is essential for establishing social
    interactions. How accurate are humans in determining the gaze direction of
    others in lifelike scenes, when they can move their heads and eyes freely, and
    what are the sources of information for the underlying perceptual processes?
    These questions pose a challenge from both empirical and computational
    perspectives, due to the complexity of the visual input in real-life
    situations. Here we measure empirically human accuracy in perceiving the gaze
    direction of others in lifelike scenes, and study computationally the sources
    of information and representations underlying this cognitive capacity. We show
    that humans perform better in face-to-face conditions compared with recorded
    conditions, and that this advantage is not due to the availability of input
    dynamics. We further show that humans still perform well when only the
    eyes region is visible, rather than the whole face. We develop a computational
    model that replicates the pattern of human performance, including the finding
    that the eyes region contains, on its own, the information required for
    estimating both head orientation and direction of gaze. Consistent with
    neurophysiological findings on task-specific face regions in the brain, the
    learned computational representations reproduce perceptual effects such as the
    Wollaston illusion, when trained to estimate direction of gaze, but not when
    trained to recognize objects or faces.

    Fractional Order Fuzzy Control of Hybrid Power System with Renewable Generation Using Chaotic PSO

    Indranil Pan, Saptarshi Das
    Comments: 21 pages, 12 figures, 4 tables
    Journal-ref: ISA Transactions, Volume 62, May 2016, Pages 19-29
    Subjects: Systems and Control (cs.SY); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Optimization and Control (math.OC); Chaotic Dynamics (nlin.CD)

    This paper investigates the operation of a hybrid power system through a
    novel fuzzy control scheme. The hybrid power system employs various autonomous
    generation systems like wind turbine, solar photovoltaic, diesel engine,
    fuel-cell, aqua electrolyzer etc. Other energy storage devices like the
    battery, flywheel and ultra-capacitor are also present in the network. A novel
    fractional order (FO) fuzzy control scheme is employed and its parameters are
    tuned with a particle swarm optimization (PSO) algorithm augmented with two
    chaotic maps for achieving an improved performance. This FO fuzzy controller
    shows better performance than the classical PID and the integer-order fuzzy
    PID controller in both linear and nonlinear operating regimes. The FO fuzzy
    controller also shows stronger robustness against system parameter
    variation and rate-constraint nonlinearity than the other controller
    structures. Robustness is a highly desirable property in such a scenario
    since many components of the hybrid power system may be switched on/off or may
    run at lower/higher power output, at different time instants.
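
    One common way to augment PSO with chaotic maps is to replace the uniform
    random numbers in the velocity update with iterates of a chaotic map such
    as the logistic map. A hypothetical sketch; the abstract does not identify
    which two maps the paper actually uses.

```python
def logistic_sequence(x0, n, r=4.0):
    # fully chaotic logistic map (r = 4); its iterates in (0, 1) can
    # stand in for the uniform random numbers of standard PSO
    seq, x = [], x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        seq.append(x)
    return seq

def pso_velocity(v, x, pbest, gbest, c1, c2, r1, r2, w=0.7):
    # standard PSO velocity update; r1, r2 are drawn from the chaotic
    # sequence instead of a pseudo-random generator
    return [w * vi + c1 * r1 * (p - xi) + c2 * r2 * (g - xi)
            for vi, xi, p, g in zip(v, x, pbest, gbest)]
```

    The rest of the PSO loop (position update, personal/global best tracking)
    is unchanged; only the source of randomness differs.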

    Fractional Order AGC for Distributed Energy Resources Using Robust Optimization

    Indranil Pan, Saptarshi Das
    Comments: 12 pages, 16 figures, 5 tables
    Journal-ref: IEEE Transactions on Smart Grid, Volume 7, Issue 5, Pages 2175 –
    2186, Sept 2016
    Subjects: Systems and Control (cs.SY); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)

    The applicability of fractional order (FO) automatic generation control (AGC)
    for power system frequency oscillation damping is investigated in this paper,
    employing distributed energy generation. The hybrid power system employs
    various autonomous generation systems like wind turbine, solar photovoltaic,
    diesel engine, fuel-cell and aqua electrolyzer along with other energy storage
    devices like the battery and flywheel. The controller is placed in a remote
    location while receiving and sending signals over an unreliable communication
    network with stochastic delay. The controller parameters are tuned using robust
    optimization techniques employing different variants of Particle Swarm
    Optimization (PSO) and are compared with the corresponding optimal solutions.
    An archival based strategy is used for reducing the number of function
    evaluations for the robust optimization methods. The solutions obtained through
    the robust optimization are able to handle higher variation in the controller
    gains and orders without significant decrease in the system performance. This
    is desirable from the FO controller implementation point of view, as the design
    is able to accommodate variations in the system parameters, which may result
    from the approximation of FO operators using different realization methods and
    orders of accuracy. A comparison is also made between the FO and the integer
    order (IO) controllers to highlight the merits and demerits of each scheme.

    Semantic Parsing of Mathematics by Context-based Learning from Aligned Corpora and Theorem Proving

    Cezary Kaliszyk, Josef Urban, Jiří Vyskočil
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    We study methods for automated parsing of informal mathematical expressions
    into formal ones, a main prerequisite for deep computer understanding of
    informal mathematical texts. We propose a context-based parsing approach that
    combines efficient statistical learning of deep parse trees with their semantic
    pruning by type checking and large-theory automated theorem proving. We show
    that the methods very significantly improve on previous results in parsing
    theorems from the Flyspeck corpus.

    Generic and Efficient Solution Solves the Shortest Paths Problem in Square Runtime

    Yong Tan
    Comments: 26 pages, 11,100 words, 2 pictures
    Subjects: Discrete Mathematics (cs.DM); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)

    We study a group of new methods to solve the shortest paths problem on a
    given fixed-weight instance. Our aim is to meet the qualities of genericity,
    efficiency, and precision that we generally require of a methodology. Besides
    proofs that guarantee our measures work correctly, we pay particular attention
    to the underlying theory of calculation and logic, in favor of extending our
    methods to a wide range of fields including decision making, operations
    research, economics, management, robotics, AI, etc.

    Learning Filter Banks Using Deep Learning For Acoustic Signals

    Shuhui Qu, Juncheng Li, Wei Dai, Samarjit Das
    Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)

    Designing appropriate features for acoustic event recognition tasks is an
    active field of research. Expressive features should both improve task
    performance and be interpretable. Currently, heuristically designed features
    based on domain knowledge require tremendous hand-crafting effort, while
    features extracted through deep networks are difficult for humans to interpret.
    In this work, we explore an experience-guided learning method for designing
    acoustic features. This is a novel hybrid approach combining domain knowledge
    and purely data-driven feature design. Based
    on the procedure of log Mel-filter banks, we design a filter bank learning
    layer. We concatenate this layer with a convolutional neural network (CNN)
    model. After training the network, the weight of the filter bank learning layer
    is extracted to facilitate the design of acoustic features. We smooth the
    trained weights of the learning layer and re-initialize the filter bank
    learning layer with them as an audio feature extractor. For the environmental
    sound recognition task based on the UrbanSound8K dataset, experience-guided
    learning leads to a 2% accuracy improvement compared with the fixed feature
    extractor (the log Mel-filter bank). The shapes of the new filter banks are
    visualized and explained to demonstrate the effectiveness of the feature design
    process.
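
    The fixed baseline the learnable layer starts from, the log Mel-filter
    bank, is built from triangular filters evenly spaced on the mel scale. A
    sketch of that standard construction (applied to a magnitude spectrogram
    before taking logs); parameter names are illustrative:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    # triangular filters with centers evenly spaced on the mel scale;
    # this is the fixed initialization a learnable layer could start from
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                       n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sample_rate).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):        # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):       # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb
```

    Multiplying a power spectrogram by this matrix and taking the log yields
    the usual log mel-filter-bank features.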

    Maximizing Non-Monotone DR-Submodular Functions with Cardinality Constraints

    Ali Khodabakhsh, Evdokia Nikolova
    Comments: 7 pages with 2 figures
    Subjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI)

    We consider the problem of maximizing a non-monotone DR-submodular function
    subject to a cardinality constraint. Diminishing returns (DR) submodularity is
    a generalization of the diminishing returns property for functions defined over
    the integer lattice. This generalization can be used to solve many machine
    learning or combinatorial optimization problems such as optimal budget
    allocation, revenue maximization, etc. In this work we propose the first
    polynomial-time approximation algorithms for non-monotone constrained
    maximization. We implement our algorithms for a revenue maximization problem
    with a real-world dataset to check their efficiency and performance.
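
    For intuition, a plain coordinate-wise greedy over the integer lattice
    illustrates the setting; note that this baseline carries no approximation
    guarantee for non-monotone functions, which is precisely what the proposed
    algorithms address. Names and the stopping rule are illustrative.

```python
def lattice_greedy(f, n, budget, box):
    # maximize f over integer vectors x with sum(x) <= budget and
    # 0 <= x[i] <= box[i], adding one unit at a time to the coordinate
    # with the largest marginal gain (diminishing returns makes these
    # marginals shrink as x grows)
    x = [0] * n
    for _ in range(budget):
        gains = []
        for i in range(n):
            if x[i] < box[i]:
                y = x[:]
                y[i] += 1
                gains.append((f(y) - f(x), i))
        if not gains:
            break
        gain, i = max(gains)
        if gain <= 0:
            break  # no unit step helps (possible in the non-monotone case)
        x[i] += 1
    return x
```

    A separable concave function such as the sum of square roots is
    DR-submodular, so the greedy spreads the budget across coordinates.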

    Emergence of foveal image sampling from learning to attend in visual scenes

    Brian Cheung, Eric Weiss, Bruno Olshausen
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We describe a neural attention model with a learnable retinal sampling
    lattice. The model is trained on a visual search task requiring the
    classification of an object embedded in a visual scene amidst background
    distractors using the smallest number of fixations. We explore the tiling
    properties that emerge in the model’s retinal sampling lattice after training.
    Specifically, we show that this lattice resembles the eccentricity dependent
    sampling lattice of the primate retina, with a high resolution region in the
    fovea surrounded by a low resolution periphery. Furthermore, we find conditions
    where these emergent properties are amplified or eliminated providing clues to
    their function.

    Split-door criterion for causal identification: Automatic search for natural experiments

    Amit Sharma, Jake M. Hofman, Duncan J. Watts
    Subjects: Methodology (stat.ME); Artificial Intelligence (cs.AI); Applications (stat.AP)

    Unobserved or unknown confounders complicate even the simplest attempts to
    estimate the effect of one variable on another using observational data. When
    cause and effect are both affected by unobserved confounders, methods based on
    identifying natural experiments have been proposed to eliminate confounds.
    However, their validity is hard to verify because they depend on assumptions
    about the independence of variables that, by definition, cannot be measured. In
    this paper we investigate a particular scenario in time series data that
    permits causal identification in the presence of unobserved confounders and
    present an algorithm to automatically find such scenarios. Specifically, we
    examine what we call the split-door setting, when the effect variable can be
    split up into two parts: one that is potentially affected by the cause, and
    another that is independent of it. We show that when both of these variables
    are caused by the same (unobserved) confounders, the problem of identification
    reduces to that of testing for independence among observed variables. We
    discuss various situations in which split-door variables are commonly recorded
    in both online and offline settings, and demonstrate the method by estimating
    the causal impact of Amazon’s recommender system, obtaining more than 23,000
    natural experiments that provide similar—but more precise—estimates than
    past studies.
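
    A toy version of the criterion: given a split of the outcome into a part
    potentially affected by the cause and a part that should be independent of
    it, check the independence and, only if it holds, estimate the effect from
    the affected part. This correlation-threshold sketch is an illustration
    only; the paper relies on formal independence tests.

```python
import numpy as np

def correlation(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.corrcoef(x, y)[0, 1])

def split_door_effect(cause, y_affected, y_independent, tol=0.05):
    # If the "independent" part of the outcome is (approximately)
    # uncorrelated with the cause, treat the window as a natural
    # experiment and estimate the effect by a simple regression slope.
    if abs(correlation(cause, y_independent)) > tol:
        return None  # split-door criterion not satisfied in this window
    slope = np.polyfit(cause, y_affected, 1)[0]
    return float(slope)
```

    Windows that fail the independence check are discarded, which is how the
    automatic search filters candidate natural experiments.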


    Information Retrieval

    A Graph-based Push Service Platform

    Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He
    Subjects: Information Retrieval (cs.IR)

    It is well known that learning customers’ preferences and making
    recommendations to them in today’s information-exploded environment is
    critical and non-trivial in an online system. There are two different modes of
    recommendation systems, namely pull-mode and push-mode. The majority of
    recommendation systems are pull-mode, recommending items to users only when
    and after users enter the Application Market, while push-mode works more
    actively to enhance or rebuild the connection between the Application Market
    and users. As Huawei is one of the most successful phone manufacturers, both
    the number of users and the number of apps have increased dramatically in the
    Huawei Application Store (also named Hispace Store), which had approximately
    0.3 billion registered users and 1.2 million apps as of 2016 and whose user
    base continues to grow rapidly. For the needs of
    real scenarios, we establish a Push Service Platform (PSP for short) to
    discover the target user group automatically from web-scale user operation
    log data, together with an additional small set of labelled apps (usually
    around 10), in the Hispace Store. As presented in this work, PSP includes a
    distributed storage layer, an application layer, and an evaluation layer. In
    the application layer, we design a practical graph-based algorithm (named
    A-PARW) for user group discovery, which is an approximate version of
    partially absorbing random walk. Based on mode I of A-PARW, the effectiveness
    of our system is significantly improved compared to the predecessor of the
    presented system, which uses Personalized PageRank in its application layer.
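
    The predecessor's Personalized PageRank component can be sketched as a
    seeded power iteration; A-PARW generalizes this kind of seeded diffusion
    with node-dependent absorption. The matrix form and parameters here are
    illustrative only.

```python
import numpy as np

def personalized_pagerank(A, seeds, alpha=0.15, iters=100):
    # Power iteration for Personalized PageRank: random walk on the
    # adjacency matrix A with teleportation back to the seed (labelled)
    # nodes with probability alpha at every step.
    A = np.asarray(A, float)
    P = A / A.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    s = np.zeros(len(A))
    s[list(seeds)] = 1.0 / len(seeds)      # uniform mass on the seed set
    p = s.copy()
    for _ in range(iters):
        p = alpha * s + (1 - alpha) * P.T @ p
    return p  # higher score = stronger affinity to the seed apps/users
```

    Ranking nodes by this score and thresholding gives a simple target-group
    discovery baseline of the kind the platform improves upon.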

    Learning Concept Hierarchies through Probabilistic Topic Modeling

    V. S. Anoop, S. Asharaf, P. Deepak
    Journal-ref: International Journal of Information Processing (IJIP), Volume 10,
    Issue 3, 2016
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)

    With the advent of the semantic web, various tools and techniques have been
    introduced for presenting and organizing knowledge. Concept hierarchies are one
    such technique that has gained significant attention due to its usefulness in
    creating domain ontologies, which are considered an integral part of the
    semantic web. Automated concept hierarchy learning algorithms focus on
    extracting relevant concepts from an unstructured text corpus and connecting
    them together by identifying potential relations that exist between them. In this paper, we
    propose a novel approach that identifies relevant concepts from plain text and
    then learns a hierarchy of concepts by exploiting the subsumption relation
    between them. To start with, we model topics using a probabilistic topic model
    and then apply lightweight linguistic processing to extract semantically rich
    concepts. We then connect concepts by identifying an “is-a” relationship
    between pairs of concepts. The proposed method is completely unsupervised and
    requires no domain-specific training corpus for concept extraction and
    learning. Experiments on large real-world text corpora such as the BBC News
    dataset and the Reuters News corpus show that the proposed method outperforms
    some of the existing methods for concept extraction, and that efficient concept
    hierarchy learning is possible if the overall task is guided by a probabilistic
    topic modeling algorithm.

    Generating Holistic 3D Scene Abstractions for Text-based Image Retrieval

    Ang Li, Jin Sun, Joe Yue-Hei Ng, Ruichi Yu, Vlad I. Morariu, Larry S. Davis
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computational Geometry (cs.CG); Information Retrieval (cs.IR)

    Spatial relationships between objects provide important information for
    text-based image retrieval. As users are more likely to describe a scene from a
    real world perspective, using 3D spatial relationships rather than 2D
    relationships that assume a particular viewing direction, one of the main
    challenges is to infer the 3D structure that bridges images with users’ text
    descriptions. However, direct inference of 3D structure from images requires
    learning from large scale annotated data. Since interactions between objects
    can be reduced to a limited set of atomic spatial relations in 3D, we study the
    possibility of inferring 3D structure from a text description rather than an
    image, applying physical relation models to synthesize holistic 3D abstract
    object layouts satisfying the spatial constraints present in a textual
    description. We present a generic framework for retrieving images from a
    textual description of a scene by matching images with these generated abstract
    object layouts. Images are ranked by matching object detection outputs
    (bounding boxes) to 2D layout candidates (also represented by bounding boxes)
    which are obtained by projecting the 3D scenes with sampled camera directions.
    We validate our approach using public indoor scene datasets and show that
    our method outperforms both an object-occurrence-based baseline and a
    learned 2D pairwise-relation-based baseline.

    Times series averaging and denoising from a probabilistic perspective on time-elastic kernels

    Pierre-François Marteau (EXPRESSION)
    Subjects: Learning (cs.LG); Information Retrieval (cs.IR)

    In the light of regularized dynamic time warping kernels, this paper
    reconsiders the concept of time elastic centroid for a set of time series.
    We derive a new algorithm based on a probabilistic interpretation of kernel
    alignment matrices. This algorithm expresses the averaging process in terms
    of a stochastic alignment automaton. It uses an iterative agglomerative
    heuristic method for averaging the aligned samples, while also averaging the
    times of occurrence of the aligned samples. By comparing classification
    accuracies for 45 heterogeneous time series datasets obtained by first
    nearest centroid/medoid classifiers, we show that: i) centroid-based
    approaches significantly outperform medoid-based approaches, ii) for the
    considered datasets, our algorithm, which combines averaging in the sample
    space and along the time axes, emerges as the most significantly robust
    model for time-elastic averaging with a promising noise reduction
    capability. We also demonstrate its benefit in an isolated gesture
    recognition experiment and its ability to significantly reduce the size of
    training instance sets. Finally we highlight its denoising capability using
    demonstrative synthetic data: we show that it is possible to retrieve, from
    few noisy instances, a signal whose components are scattered in a wide
    spectral band.
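
    The paper's algorithm operates on kernel alignment matrices and a
    stochastic alignment automaton; as a rough intuition for "averaging aligned
    samples both in value and in time of occurrence", here is a much-simplified
    sketch that uses a plain DTW alignment between two series. All details
    (pairwise averaging, the resampling grid) are illustrative assumptions, not
    the paper's method.

```python
import numpy as np

def dtw_path(a, b):
    """Dynamic-time-warping alignment path between 1-D series a and b."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from (n, m) to (1, 1).
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def pairwise_elastic_average(a, b):
    """Average two series along a DTW alignment: each aligned pair is
    averaged both in value and in time of occurrence."""
    path = dtw_path(a, b)
    times = np.array([(i + j) / 2.0 for i, j in path])
    values = np.array([(a[i] + b[j]) / 2.0 for i, j in path])
    # Resample onto a regular time grid of the average length.
    grid = np.linspace(times.min(), times.max(), (len(a) + len(b)) // 2)
    return np.interp(grid, times, values)

t = np.linspace(0, 2 * np.pi, 40)
a = np.sin(t)
b = np.sin(t + 0.3)  # time-shifted copy of the same signal
avg = pairwise_elastic_average(a, b)
```

    Averaging along the alignment path rather than index-by-index is what keeps
    a time-shifted pair from cancelling each other out.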


    Computation and Language

    NewsQA: A Machine Comprehension Dataset

    Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro Sordoni, Philip Bachman, Kaheer Suleman
    Comments: Under review for ICLR 2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    We present NewsQA, a challenging machine comprehension dataset of over
    100,000 question-answer pairs. Crowdworkers supply questions and answers based
    on a set of over 10,000 news articles from CNN, with answers consisting of
    spans of text from the corresponding articles. We collect this dataset through
    a four-stage process designed to solicit exploratory questions that require
    reasoning. A thorough analysis confirms that NewsQA demands abilities beyond
    simple word matching and recognizing entailment. We measure human performance
    on the dataset and compare it to several strong neural models. The performance
    gap between humans and machines (25.3% F1) indicates that significant progress
    can be made on NewsQA through future research. The dataset is freely available
    at datasets.maluuba.com/NewsQA.

    Geometry of Compositionality

    Hongyu Gong, Suma Bhat, Pramod Viswanath
    Subjects: Computation and Language (cs.CL)

    This paper proposes a simple test for compositionality (i.e., literal usage)
    of a word or phrase in a context-specific way. The test is computationally
    simple, relying on no external resources and using only a set of trained
    word vectors. Experiments show that the proposed method is competitive with state of
    the art and displays high accuracy in context-specific compositionality
    detection of a variety of natural language phenomena (idiomaticity, sarcasm,
    metaphor) for different datasets in multiple languages. The key insight is to
    connect compositionality to a curious geometric property of word embeddings,
    which is of independent interest.
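
    One way to operationalize a vector-space compositionality test is to
    measure how well the phrase vector is explained by the subspace spanned by
    its context word vectors. The sketch below assumes this projection-based
    reading; the paper's exact scoring function may differ.

```python
import numpy as np

def compositionality_score(phrase_vec, context_vecs):
    """Cosine between a phrase vector and its orthogonal projection onto
    the subspace spanned by the context word vectors. A score near 1
    suggests literal (compositional) usage in that context."""
    C = np.asarray(context_vecs).T                  # (dim, n_context)
    coef, *_ = np.linalg.lstsq(C, phrase_vec, rcond=None)
    proj = C @ coef                                 # projection onto span(C)
    return float(np.dot(proj, phrase_vec) /
                 (np.linalg.norm(proj) * np.linalg.norm(phrase_vec) + 1e-12))

rng = np.random.default_rng(0)
context = rng.normal(size=(3, 50))                  # 3 context words, 50-dim
literal = 0.5 * context[0] + 0.5 * context[1]       # lies in the context span
idiomatic = rng.normal(size=50)                     # unrelated direction
s_literal = compositionality_score(literal, context)
s_idiomatic = compositionality_score(idiomatic, context)
```

    A random 50-dimensional vector has only a small projection onto a
    3-dimensional subspace, so the idiomatic score comes out much lower.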

    Semantic Parsing of Mathematics by Context-based Learning from Aligned Corpora and Theorem Proving

    Cezary Kaliszyk, Josef Urban, Jiří Vyskočil
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    We study methods for automated parsing of informal mathematical expressions
    into formal ones, a main prerequisite for deep computer understanding of
    informal mathematical texts. We propose a context-based parsing approach that
    combines efficient statistical learning of deep parse trees with their semantic
    pruning by type checking and large-theory automated theorem proving. We show
    that the methods very significantly improve on previous results in parsing
    theorems from the Flyspeck corpus.

    Sentiment Analysis for Twitter : Going Beyond Tweet Text

    Lahari Poddar, Kishaloy Halder, Xianyan Jia
    Subjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI)

    Analysing the sentiment of tweets is important as it helps to determine
    users’ opinions. Knowing people’s opinions is crucial for several purposes,
    ranging from gathering knowledge about a customer base to e-governance and
    campaigning. In this report, we aim to develop a system to detect the
    sentiment of tweets. We employ several linguistic features along with some
    other external sources of information to detect the sentiment of a tweet.
    We show that augmenting the 140-character-long tweet with information
    harvested from external URLs shared in the tweet as well as social media
    features enhances the sentiment prediction accuracy significantly.

    An End-to-End Architecture for Keyword Spotting and Voice Activity Detection

    Chris Lengerich, Awni Hannun
    Comments: NIPS 2016 End-to-End Learning for Speech and Audio Processing Workshop
    Subjects: Computation and Language (cs.CL)

    We propose a single neural network architecture for two tasks: on-line
    keyword spotting and voice activity detection. We develop novel inference
    algorithms for an end-to-end Recurrent Neural Network trained with the
    Connectionist Temporal Classification loss function which allow our model to
    achieve high accuracy on both keyword spotting and voice activity detection
    without retraining. In contrast to prior voice activity detection models, our
    architecture does not require aligned training data and uses the same
    parameters as the keyword spotting model. This allows us to deploy a high
    quality voice activity detector with no additional memory or maintenance
    requirements.

    Dialogue Learning With Human-In-The-Loop

    Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, Jason Weston
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    An important aspect of developing conversational agents is to give a bot the
    ability to improve through communicating with humans and to learn from the
    mistakes that it makes. Most research has focused on learning from fixed
    training sets of labeled data rather than interacting with a dialogue partner
    in an online fashion. In this paper we explore this direction in a
    reinforcement learning setting where the bot improves its question-answering
    ability from feedback a teacher gives following its generated responses. We
    build a simulator that tests various aspects of such learning in a synthetic
    environment, and introduce models that work in this regime. Finally, real
    experiments with Mechanical Turk validate the approach.

    Learning Concept Hierarchies through Probabilistic Topic Modeling

    V. S. Anoop, S. Asharaf, P. Deepak
    Journal-ref: International Journal of Information Processing (IJIP), Volume 10,
    Issue 3, 2016
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)

    With the advent of the semantic web, various tools and techniques have been
    introduced for presenting and organizing knowledge. Concept hierarchies are
    one such technique, and they have gained significant attention due to their
    usefulness in creating domain ontologies, which are considered an integral
    part of the semantic web. Automated concept hierarchy learning algorithms
    focus on extracting relevant concepts from an unstructured text corpus and
    connecting them by identifying potential relations that exist between them.
    In this paper, we propose a novel approach that identifies relevant concepts
    from plain text and then learns a hierarchy of concepts by exploiting the
    subsumption relation between them. To start with, we model topics using a
    probabilistic topic model and then make use of lightweight linguistic
    processing to extract semantically rich concepts. Then we connect concepts
    by identifying an “is-a” relationship between pairs of concepts. The
    proposed method is completely unsupervised, and there is no need for a
    domain-specific training corpus for concept extraction and learning.
    Experiments on large, real-world text corpora such as the BBC News dataset
    and the Reuters News corpus show that the proposed method outperforms some
    existing methods for concept extraction, and that efficient concept
    hierarchy learning is possible when the overall task is guided by a
    probabilistic topic modeling algorithm.

    Is a picture worth a thousand words? A Deep Multi-Modal Fusion Architecture for Product Classification in e-commerce

    Tom Zahavy, Alessandro Magnani, Abhinandan Krishnan, Shie Mannor
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

    Classifying products into categories precisely and efficiently is a major
    challenge in modern e-commerce. The high traffic of new products uploaded daily
    and the dynamic nature of the categories raise the need for machine learning
    models that can reduce the cost and time of human editors. In this paper, we
    propose a decision level fusion approach for multi-modal product classification
    using text and image inputs. We train input specific state-of-the-art deep
    neural networks for each input source, show the potential of forging them
    together into a multi-modal architecture and train a novel policy network that
    learns to choose between them. Finally, we demonstrate that our multi-modal
    network improves the top-1 accuracy over both networks on a real-world
    large-scale product classification dataset that we collected from Walmart.com.
    While we focus on image-text fusion that characterizes e-commerce domains, our
    algorithms can be easily applied to other modalities such as audio, video,
    physical sensors, etc.

    Intelligible Language Modeling with Input Switched Affine Networks

    Jakob N. Foerster, Justin Gilmer, Jan Chorowski, Jascha Sohl-Dickstein, David Sussillo
    Comments: ICLR 2017 submission: this https URL
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    The computational mechanisms by which nonlinear recurrent neural networks
    (RNNs) achieve their goals remain an open question. There exist many problem
    domains where intelligibility of the network model is crucial for deployment.
    Here we introduce a recurrent architecture composed of input-switched affine
    transformations, in other words an RNN without any nonlinearity and with one
    set of weights per input. We show that this architecture achieves near
    identical performance to traditional architectures on language modeling of
    Wikipedia text, for the same number of model parameters. It can obtain this
    performance with the potential for computational speedup compared to existing
    methods, by precomputing the composed affine transformations corresponding to
    longer input sequences. As our architecture is affine, we are able to
    understand the mechanisms by which it functions using linear methods. For
    example, we show how the network linearly combines contributions from the past
    to make predictions at the current time step. We show how representations for
    words can be combined in order to understand how context is transferred across
    word boundaries. Finally, we demonstrate how the system can be executed and
    analyzed in arbitrary bases to aid understanding.
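
    The input-switched affine update can be written as h_t = W[x_t] h_{t-1} +
    b[x_t]: one affine map per input symbol and no pointwise nonlinearity.
    Because every step is affine, the maps for a frequent substring compose
    into a single affine map that can be precomputed, which is the source of
    the claimed speedup. A minimal sketch (dimensions and random initialization
    are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
vocab, hidden = 5, 8

# One affine map (W, b) per input symbol -- no pointwise nonlinearity.
W = rng.normal(scale=0.3, size=(vocab, hidden, hidden))
b = rng.normal(scale=0.1, size=(vocab, hidden))
W_out = rng.normal(scale=0.3, size=(vocab, hidden))

def isan_logits(tokens):
    """Run the input-switched affine RNN and return logits per step."""
    h = np.zeros(hidden)
    logits = []
    for tok in tokens:
        h = W[tok] @ h + b[tok]   # affine update selected by the input
        logits.append(W_out @ h)
    return np.array(logits)

def compose(tokens):
    """Fold a whole input sequence into one affine map (A, c)."""
    A, c = np.eye(hidden), np.zeros(hidden)
    for tok in tokens:
        A, c = W[tok] @ A, W[tok] @ c + b[tok]
    return A, c

seq = [0, 3, 1, 4, 2]
A, c = compose(seq)
h_direct = A @ np.zeros(hidden) + c   # final state via the composed map
```

    The composed map applied once reproduces the final state of the step-by-step
    recurrence, and the affinity is also what makes the linear analyses in the
    paper possible.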


    Distributed, Parallel, and Cluster Computing

    Serving the Grid: an Experimental Study of Server Clusters as Real-Time Demand Response Resources

    Josiah McClurg, Raghuraman Mudumbai
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)

    Demand response is a crucial technology to allow large-scale penetration of
    intermittent renewable energy sources in the electric grid. This paper is based
    on the thesis that datacenters represent especially attractive candidates for
    providing flexible, real-time demand response services to the grid; they are
    capable of finely-controllable power consumption, fast power ramp-rates, and
    large dynamic range. This paper makes two main contributions: (a) it provides
    detailed experimental evidence justifying this thesis, and (b) it presents a
    comparative investigation of three candidate software interfaces for power
    control within the servers. All of these results are based on a series of
    experiments involving real-time power measurements on a lab-scale server
    cluster. This cluster was specially instrumented for accurate and fast power
    measurements on a time-scale of 100 ms or less. Our results provide
    preliminary evidence for the feasibility of large-scale demand response
    using datacenters, and motivate future work on exploiting this capability.

    Proposal of Optimum Application Deployment Technology for Heterogeneous IaaS Cloud

    Yoji Yamato
    Comments: 4 pages, 1 figure, 2016 6th International Workshop on Computer Science and Engineering (WCSE 2016), June 2016
    Journal-ref: 2016 6th International Workshop on Computer Science and
    Engineering (WCSE 2016), pp.34-37, June 2016
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Recently, cloud systems composed of heterogeneous hardware have become
    increasingly common as a way to exploit advances in hardware performance.
    However, programming applications for heterogeneous hardware to achieve
    high performance requires considerable technical skill and is difficult for
    users. Therefore, to make high performance easily attainable, this paper
    proposes a PaaS that analyzes application logic and automatically offloads
    computations to GPUs and FPGAs when users deploy applications to clouds.

    Server Structure Proposal and Automatic Verification Technology on IaaS Cloud of Plural Type Servers

    Yoji Yamato
    Comments: 13 pages, 9 figures, International Conference on Internet Studies (NETs2015), July 2015
    Journal-ref: International Conference on Internet Studies (NETs2015), July 2015
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    In this paper, we propose a server structure proposal and automatic
    performance verification technology that proposes and verifies an
    appropriate server structure on an Infrastructure as a Service (IaaS) cloud
    with bare-metal servers, container-based virtual servers, and virtual
    machines. Recently, cloud services have progressed, and providers offer not
    only virtual machines but also bare-metal servers and container-based
    virtual servers. However, users need to design an appropriate server
    structure for their requirements based on the quantitative performance of
    these three server types, which demands much technical knowledge to
    optimize system performance. Therefore, we study a technology that
    satisfies users’ performance requirements on these three types of IaaS
    cloud. Firstly, we measure the performance of a bare-metal server, Docker
    containers, and KVM (Kernel-based Virtual Machine) virtual machines on
    OpenStack while varying the number of virtual servers. Secondly, we propose
    a server structure proposal technology based on the measured quantitative
    data: it receives an abstract OpenStack Heat template and
    function/performance requirements, and then creates a concrete template
    with server specification information. Thirdly, we propose an automatic
    performance verification technology that executes the necessary performance
    tests automatically on provisioned user environments according to the
    template.

    Cluster-wide Scheduling of Flexible, Distributed Analytic Applications

    Pace Francesco, Daniele Venzano, Damiano Carra, Pietro Michiardi
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    This work addresses the problem of scheduling user-defined analytic
    applications, which we define as high-level compositions of frameworks, their
    components, and the logic necessary to carry out work. The key idea in our
    application definition is to distinguish classes of components, including
    rigid and elastic types: the former being required for an application to make
    progress, the latter contributing to reduced execution times. We show that the
    problem of scheduling such applications poses new challenges, which existing
    approaches address inefficiently.

    Thus, we present the design and evaluation of a novel, flexible heuristic to
    schedule analytic applications that aims at high system responsiveness, by
    allocating resources efficiently thanks to the flexibility of elastic
    components. Our algorithm is evaluated using a trace-driven simulation
    approach, with large-scale real system traces: our flexible scheduler
    outperforms a baseline approach across a variety of metrics, including
    application turnaround times, and resource allocation efficiency.

    We also present the design and evaluation of a full-fledged system, which we
    call Zoe, that incorporates the ideas presented in this paper, and report
    concrete improvements in terms of efficiency and performance, with respect to
    prior generations of our system.


    Learning

    Improving Variational Auto-Encoders using Householder Flow

    Jakub M. Tomczak, Max Welling
    Comments: Bayesian Deep Learning Workshop (NIPS 2016)
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Variational auto-encoders (VAE) are scalable and powerful generative models.
    However, the choice of the variational posterior determines tractability and
    flexibility of the VAE. Commonly, latent variables are modeled using the normal
    distribution with a diagonal covariance matrix. This results in computational
    efficiency but typically it is not flexible enough to match the true posterior
    distribution. One fashion of enriching the variational posterior distribution
    is application of normalizing flows, i.e., a series of invertible
    transformations to latent variables with a simple posterior. In this paper, we
    follow this line of thinking and propose a volume-preserving flow that uses a
    series of Householder transformations. We show empirically on MNIST dataset and
    histopathology data that the proposed flow allows to obtain more flexible
    variational posterior and highly competitive results comparing to other
    normalizing flows.
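
    A Householder transformation H = I - 2 v v^T / ||v||^2 is an orthogonal
    reflection, so a chain of them is volume-preserving (|det J| = 1) and
    cheap to apply. A minimal numpy sketch of applying such a flow to a latent
    sample (dimensions and random vectors are illustrative; in the VAE the
    vectors v would be produced by the encoder):

```python
import numpy as np

def householder_flow(z, vs):
    """Apply a series of Householder reflections
    H = I - 2 v v^T / ||v||^2 to a latent sample z.
    Each reflection is orthogonal, so the flow preserves volume and
    requires no Jacobian-determinant correction in the ELBO."""
    for v in vs:
        z = z - 2.0 * v * (v @ z) / (v @ v)
    return z

rng = np.random.default_rng(0)
dim = 4
z0 = rng.normal(size=dim)        # sample from the simple diagonal posterior
vs = rng.normal(size=(3, dim))   # Householder vectors (learnable in the VAE)
zK = householder_flow(z0, vs)
```

    Since each reflection is orthogonal, the norm of the sample is preserved
    while its direction (and hence the posterior's covariance structure)
    becomes richer than a diagonal Gaussian.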

    Graph-Based Manifold Frequency Analysis for Denoising

    Shay Deutsch, Antonio Ortega, Gerard Medioni
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We propose a new framework for manifold denoising based on processing in the
    graph Fourier frequency domain, derived from the spectral decomposition of
    the discrete graph Laplacian. Our approach uses the Spectral Graph Wavelet
    transform in order to perform non-iterative denoising directly in the graph
    frequency domain, an approach inspired by conventional wavelet-based signal
    denoising methods. We theoretically justify our approach, based on the fact
    that for smooth manifolds the coordinate information energy is localized in
    the low spectral graph wavelet sub-bands, while the noise affects all
    frequency bands in a similar way. Experimental results show that our
    proposed manifold frequency denoising (MFD) approach significantly
    outperforms state-of-the-art denoising methods, and is robust to a wide
    range of parameter selections, e.g., the choice of k nearest neighbor
    connectivity of the graph.
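
    The paper filters in spectral graph wavelet sub-bands; the simpler
    graph-Fourier low-pass filter below illustrates only the underlying
    principle that smooth signals concentrate in low graph frequencies while
    noise spreads across all bands. The path graph, cutoff k, and noise level
    are illustrative assumptions.

```python
import numpy as np

# Path graph of n nodes sampling a smooth curve, plus additive noise.
n = 100
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 2 * np.pi, n))
noisy = clean + 0.3 * rng.normal(size=n)

# Combinatorial Laplacian L = D - A of the path graph.
A = np.zeros((n, n))
idx = np.arange(n - 1)
A[idx, idx + 1] = A[idx + 1, idx] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Graph Fourier basis: eigenvectors of L ordered by frequency (eigenvalue).
eigvals, U = np.linalg.eigh(L)

# Low-pass filter: keep only the k lowest graph frequencies.
k = 8
coeffs = U.T @ noisy      # graph Fourier transform of the noisy signal
coeffs[k:] = 0.0          # smooth signals live in the low bands
denoised = U @ coeffs     # inverse transform

err_noisy = np.linalg.norm(noisy - clean)
err_denoised = np.linalg.norm(denoised - clean)
```

    Keeping 8 of 100 graph frequencies discards most of the (spectrally flat)
    noise while retaining nearly all of the smooth signal's energy.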

    Cost-Sensitive Random Pair Encoding for Multi-Label Classification

    Yao-Yuan Yang, Chih-Wei Chang, Hsuan-Tien Lin
    Subjects: Learning (cs.LG)

    We propose a novel cost-sensitive multi-label classification algorithm called
    cost-sensitive random pair encoding (CSRPE). CSRPE reduces the cost-sensitive
    multi-label classification problem to many cost-sensitive binary classification
    problems through the label powerset approach followed by the classic
    one-versus-one decomposition. While such a naive reduction results in
    exponentially-many classifiers, we resolve the training challenge of building
    the many classifiers by random sampling, and the prediction challenge of voting
    from the many classifiers by nearest-neighbor decoding through casting the
    one-versus-one decomposition as a special case of error-correcting code.
    Extensive experimental results demonstrate that CSRPE achieves stable
    convergence and reaches better performance than other ensemble-learning and
    error-correcting-coding algorithms for multi-label classification. The
    results also show that CSRPE is competitive with state-of-the-art
    cost-sensitive multi-label classification algorithms.
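
    In CSRPE, each code bit comes from a binary classifier trained on a
    randomly sampled pair of label-powerset elements, and prediction is
    nearest-neighbor decoding over codewords. The sketch below replaces the
    learned per-bit classifiers with exact cost comparisons (here Hamming cost)
    to show only the encode/decode mechanics; the label count, number of pairs,
    and candidate set are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
K, M = 4, 64   # number of labels, number of random pairs (code bits)

# Each code bit corresponds to a random pair of label-vectors from the
# powerset; the bit says which member of the pair is the cheaper match.
pairs = rng.integers(0, 2, size=(M, 2, K))

def hamming_cost(y, z):
    return np.sum(y != z)

def encode(y):
    """Codeword of a label-vector: for each random pair, record which
    member is closer (in Hamming cost) to y."""
    return np.array([1 if hamming_cost(y, a) <= hamming_cost(y, b) else 0
                     for a, b in pairs])

def decode(bits, candidates):
    """Nearest-neighbor decoding over candidate label-vectors."""
    dists = [np.sum(bits != encode(c)) for c in candidates]
    return candidates[int(np.argmin(dists))]

candidates = np.array([[0, 0, 1, 1],
                       [1, 0, 1, 0],
                       [1, 1, 0, 0],
                       [0, 1, 0, 1]])
y = np.array([1, 0, 1, 0])
bits = encode(y)   # in CSRPE these bits would be predicted by classifiers
recovered = decode(bits, candidates)
```

    With enough random pairs, distinct label-vectors receive distinct
    codewords, so nearest-codeword decoding recovers the original labelset.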

    The Emergence of Organizing Structure in Conceptual Representation

    Brenden M. Lake, Neil D. Lawrence, Joshua B. Tenenbaum
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Both scientists and children make important structural discoveries, yet their
    computational underpinnings are not well understood. Structure discovery has
    previously been formalized as probabilistic inference about the right
    structural form — where form could be a tree, ring, chain, grid, etc. [Kemp &
    Tenenbaum (2008). The discovery of structural form. PNAS, 105(3), 10687-10692].
    While this approach can learn intuitive organizations, including a tree for
    animals and a ring for the color circle, it assumes a strong inductive bias
    that considers only these particular forms, and each form is explicitly
    provided as initial knowledge. Here we introduce a new computational model of
    how organizing structure can be discovered, utilizing a broad hypothesis space
    with a preference for sparse connectivity. Given that the inductive bias is
    more general, the model’s initial knowledge shows little qualitative
    resemblance to some of the discoveries it supports. As a consequence, the model
    can also learn complex structures for domains that lack intuitive description,
    as well as predict human property induction judgments without explicit
    structural forms. By allowing form to emerge from sparsity, our approach
    clarifies how both the richness and flexibility of human conceptual
    organization can coexist.

    Learning Features of Music from Scratch

    John Thickstun, Zaid Harchaoui, Sham Kakade
    Comments: 13 pages
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Sound (cs.SD)

    We introduce a new large-scale music dataset, MusicNet, to serve as a source
    of supervision and evaluation of machine learning methods for music research.
    MusicNet consists of hundreds of freely-licensed classical music recordings by
    10 composers, written for 11 instruments, together with instrument/note
    annotations resulting in over 1 million temporal labels on 34 hours of chamber
    music performances under various studio and microphone conditions.

    We define a multi-label classification task to predict notes in musical
    recordings, along with an evaluation protocol. We benchmark several machine
    learning architectures for this task: i) learning from “hand-crafted”
    spectrogram features; ii) end-to-end learning with a neural net; iii)
    end-to-end learning with a convolutional neural net. We show that several
    end-to-end learning proposals outperform approaches based on learning from
    hand-crafted audio features.

    Measuring and modeling the perception of natural and unconstrained gaze in humans and machines

    Daniel Harari, Tao Gao, Nancy Kanwisher, Joshua Tenenbaum, Shimon Ullman
    Comments: Daniel Harari and Tao Gao contributed equally to this work
    Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    Humans are remarkably adept at interpreting the gaze direction of other
    individuals in their surroundings. This skill is at the core of the ability to
    engage in joint visual attention, which is essential for establishing social
    interactions. How accurate are humans in determining the gaze direction of
    others in lifelike scenes, when they can move their heads and eyes freely, and
    what are the sources of information for the underlying perceptual processes?
    These questions pose a challenge from both empirical and computational
    perspectives, due to the complexity of the visual input in real-life
    situations. Here we measure empirically human accuracy in perceiving the gaze
    direction of others in lifelike scenes, and study computationally the sources
    of information and representations underlying this cognitive capacity. We show
    that humans perform better in face-to-face conditions compared with recorded
    conditions, and that this advantage is not due to the availability of input
    dynamics. We further show that humans are still performing well when only the
    eyes-region is visible, rather than the whole face. We develop a computational
    model, which replicates the pattern of human performance, including the
    finding that the eyes-region contains, on its own, the required information
    for estimating both head orientation and direction of gaze. Consistent with
    neurophysiological findings on task-specific face regions in the brain, the
    learned computational representations reproduce perceptual effects such as the
    Wollaston illusion, when trained to estimate direction of gaze, but not when
    trained to recognize objects or faces.

    Co-adaptive learning over a countable space

    Michael Rabadi
    Comments: 6 pages, 1 figure, NIPS 2016 Time Series Workshop
    Journal-ref: In NIPS 2016 Time Series Workshop. Barcelona, Spain
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Co-adaptation is a special form of on-line learning where an algorithm
    (mathcal{A}) must assist an unknown algorithm (mathcal{B}) to perform some
    task. This is a general framework and has applications in recommendation
    systems, search, education, and much more. Today, the most common use of
    co-adaptive algorithms is in brain-computer interfacing (BCI), where algorithms
    help patients gain and maintain control over prosthetic devices. While
    previous studies have shown strong empirical results (Kowalski et al., 2013;
    Orsborn et al., 2014) or have analyzed specific examples (Merel et al.,
    2013, 2015), there is no general analysis of the co-adaptive learning
    problem. Here we will
    study the co-adaptive learning problem in the online, closed-loop setting. We
    will prove that, with high probability, co-adaptive learning is guaranteed to
    outperform learning with a fixed decoder as long as a particular condition is
    met.

    Gossip training for deep learning

    Michael Blot, David Picard, Matthieu Cord, Nicolas Thome
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    We address the issue of speeding up the training of convolutional networks.
    Here we study a distributed method adapted to stochastic gradient descent
    (SGD). The parallel optimization setup uses several threads, each applying
    individual gradient descents on a local variable. We propose a new way to
    share information between different threads, inspired by gossip algorithms,
    that shows good consensus convergence properties. Our method, called GoSGD,
    has the advantage of being fully asynchronous and decentralized. Comparisons
    with the recent EASGD on CIFAR-10 show encouraging results.
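
    The consensus mechanism behind gossip-style training can be sketched as
    repeated pairwise parameter exchanges between randomly chosen workers. The
    sketch below simulates only these consensus dynamics (worker count,
    dimension, and exchange schedule are illustrative; actual GoSGD interleaves
    the exchanges with local SGD steps and uses weighted sharing).

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers, dim = 4, 10

# Each thread keeps its own local copy of the model parameters.
params = [rng.normal(size=dim) for _ in range(n_workers)]
init_mean = np.mean(params, axis=0)

def gossip_exchange(params, i, j):
    """Pairwise gossip step: workers i and j average their parameters.
    Repeated random exchanges drive all local copies towards consensus
    while preserving the mean of the ensemble."""
    avg = 0.5 * (params[i] + params[j])
    params[i], params[j] = avg.copy(), avg.copy()

# Simulate a run of random pairwise exchanges.
for _ in range(200):
    i, j = rng.choice(n_workers, size=2, replace=False)
    gossip_exchange(params, i, j)

spread = max(np.linalg.norm(p - params[0]) for p in params)
```

    The spread between local copies contracts geometrically under the random
    exchanges, which is the "consensus convergence" property the method relies
    on; no central parameter server is involved.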

    Associative Memory using Dictionary Learning and Expander Decoding

    Arya Mazumdar, Ankit Singh Rawat
    Comments: To appear in AAAI 2017
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    An associative memory is a framework of content-addressable memory that
    stores a collection of message vectors (or a dataset) over a neural network
    while enabling a neurally feasible mechanism to recover any message in the
    dataset from its noisy version. Designing an associative memory requires
    addressing two main tasks: 1) learning phase: given a dataset, learn a concise
    representation of the dataset in the form of a graphical model (or a neural
    network), 2) recall phase: given a noisy version of a message vector from the
    dataset, output the correct message vector via a neurally feasible algorithm
    over the network learnt during the learning phase. This paper studies the
    problem of designing a class of neural associative memories which learns a
    network representation for a large dataset that ensures correction against a
    large number of adversarial errors during the recall phase. Specifically, the
    associative memories designed in this paper can store datasets containing
    (exp(n)) (n)-length message vectors over a network with (O(n)) nodes and can
    tolerate (Omega(n/polylog(n))) adversarial errors. This paper
    carries out this memory design by mapping the learning phase and recall phase
    to the tasks of dictionary learning with a square dictionary and iterative
    error correction in an expander code, respectively.

    Fast Wavenet Generation Algorithm

    Tom Le Paine, Pooya Khorrami, Shiyu Chang, Yang Zhang, Prajit Ramachandran, Mark A. Hasegawa-Johnson, Thomas S. Huang
    Comments: Technical Report
    Subjects: Sound (cs.SD); Data Structures and Algorithms (cs.DS); Learning (cs.LG)

    This paper presents an efficient implementation of the Wavenet generation
    process called Fast Wavenet. Compared to a naive implementation that has
    complexity O(2^L) (L denotes the number of layers in the network), our proposed
    approach removes redundant convolution operations by caching previous
    calculations, thereby reducing the complexity to O(L) time. Timing experiments
    show significant advantages of our fast implementation over a naive one. While
    this method is presented for Wavenet, the same scheme can be applied anytime
    one wants to perform autoregressive generation or online prediction using a
    model with dilated convolution layers. The code for our method is publicly
    available.
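    The caching idea can be sketched with per-layer queues: each dilated layer keeps its last `dilation` inputs, so generating one sample touches each layer once (O(L)) instead of re-expanding the whole receptive field. The toy below uses scalar activations and made-up 2-tap weights; the real Wavenet uses gated convolutions over vector channels.

    ```python
    from collections import deque

    # Per-layer 2-tap dilated "convolution": out = w_new*x_t + w_old*x_{t-dilation}.
    # Weights here are arbitrary toy values, not a trained model.

    def make_layers(num_layers):
        # dilation doubles per layer: 1, 2, 4, ...
        return [{"dilation": 2 ** i, "w_old": 0.5, "w_new": 0.5,
                 "queue": deque([0.0] * (2 ** i), maxlen=2 ** i)}
                for i in range(num_layers)]

    def generate_step(layers, x_t):
        """O(L) per sample: each layer pops its cached input from
        `dilation` steps ago instead of recomputing the whole subtree."""
        h = x_t
        for layer in layers:
            old = layer["queue"][0]           # input from dilation steps ago
            new_h = layer["w_old"] * old + layer["w_new"] * h
            layer["queue"].append(h)          # cache current input for later reuse
            h = new_h
        return h

    layers = make_layers(3)
    samples = [1.0]
    for _ in range(7):                        # autoregressive roll-out
        samples.append(generate_step(layers, samples[-1]))
    ```

    The `deque(maxlen=...)` acts as the ring buffer: appending the current input automatically evicts the one that has aged past the layer's dilation.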

    The Upper Bound on Knots in Neural Networks

    Kevin K. Chen
    Comments: 19 pages, 8 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Neural networks with rectified linear unit activations are essentially
    multivariate linear splines. As such, one of many ways to measure the
    “complexity” or “expressivity” of a neural network is to count the number of
    knots in the spline model. We study the number of knots in fully-connected
    feedforward neural networks with rectified linear unit activation functions. We
    intentionally keep the neural networks very simple, so as to make theoretical
    analyses more approachable. An induction on the number of layers (l) reveals a
    tight upper bound on the number of knots in (mathbb{R} to mathbb{R}^p) deep
    neural networks. With (n_i gg 1) neurons in layer (i = 1, dots, l), the upper
    bound is approximately (n_1 dots n_l). We then show that the exact upper bound
    is tight, and we demonstrate the upper bound with an example. The purpose of
    these analyses is to pave a path for understanding the behavior of general
    (mathbb{R}^q to mathbb{R}^p) neural networks.
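    A small numeric illustration (my own sketch, not the paper's analysis): build a random fully-connected ReLU network from R to R, count its knots by scanning for slope changes on a fine grid, and compare against the approximate product bound n_1 ⋯ n_l.

    ```python
    import random

    def relu(x):
        return x if x > 0.0 else 0.0

    def make_net(widths, seed=0):
        """Random fully-connected ReLU net mapping R -> R (toy sizes)."""
        rng = random.Random(seed)
        layers, fan_in = [], 1
        for w in widths:
            W = [[rng.gauss(0, 1) for _ in range(fan_in)] for _ in range(w)]
            b = [rng.gauss(0, 1) for _ in range(w)]
            layers.append((W, b))
            fan_in = w
        w_out = [rng.gauss(0, 1) for _ in range(fan_in)]
        return layers, (w_out, rng.gauss(0, 1))

    def forward(net, x):
        layers, (w_out, b_out) = net
        h = [x]
        for W, b in layers:
            h = [relu(sum(wij * hj for wij, hj in zip(Wi, h)) + bi)
                 for Wi, bi in zip(W, b)]
        return sum(wi * hi for wi, hi in zip(w_out, h)) + b_out

    def count_knots(net, lo=-10.0, hi=10.0, steps=20000):
        """Count slope changes of the piecewise linear output on a fine grid;
        adjacent change flags straddling one grid cell count as one knot."""
        xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
        ys = [forward(net, x) for x in xs]
        slopes = [y1 - y0 for y0, y1 in zip(ys, ys[1:])]
        flags = [abs(s1 - s0) > 1e-6 for s0, s1 in zip(slopes, slopes[1:])]
        knots, prev = 0, False
        for f in flags:
            if f and not prev:
                knots += 1
            prev = f
        return knots

    widths = [3, 3]
    bound = 1
    for w in widths:
        bound *= w                     # approximate bound n_1 * ... * n_l
    print(count_knots(make_net(widths)), "knots; approximate bound", bound)
    ```

    Grid counting can only merge nearby knots, never invent them, so the measured count is a lower bound on the true knot count and stays below the theoretical bound.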

    The empirical size of trained neural networks

    Kevin K. Chen, Anthony Gamst, Alden Walker
    Comments: 6 pages, 5 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    ReLU neural networks define piecewise linear functions of their inputs.
    However, initializing and training a neural network is very different from
    fitting a linear spline. In this paper, we expand empirically upon previous
    theoretical work to demonstrate features of trained neural networks. Standard
    network initialization and training produce networks vastly simpler than a
    naive parameter count would suggest and can impart odd features to the trained
    network. However, we also show the forced simplicity is beneficial and, indeed,
    critical for the wide success of these networks.

    Intelligible Language Modeling with Input Switched Affine Networks

    Jakob N. Foerster, Justin Gilmer, Jan Chorowski, Jascha Sohl-Dickstein, David Sussillo
    Comments: ICLR 2017 submission: this https URL
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    The computational mechanisms by which nonlinear recurrent neural networks
    (RNNs) achieve their goals remain an open question. There exist many problem
    domains where intelligibility of the network model is crucial for deployment.
    Here we introduce a recurrent architecture composed of input-switched affine
    transformations, in other words an RNN without any nonlinearity and with one
    set of weights per input. We show that this architecture achieves near
    identical performance to traditional architectures on language modeling of
    Wikipedia text, for the same number of model parameters. It can obtain this
    performance with the potential for computational speedup compared to existing
    methods, by precomputing the composed affine transformations corresponding to
    longer input sequences. As our architecture is affine, we are able to
    understand the mechanisms by which it functions using linear methods. For
    example, we show how the network linearly combines contributions from the past
    to make predictions at the current time step. We show how representations for
    words can be combined in order to understand how context is transferred across
    word boundaries. Finally, we demonstrate how the system can be executed and
    analyzed in arbitrary bases to aid understanding.
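    The precomputation trick rests on the fact that affine maps compose into affine maps. A minimal numeric sketch (toy sizes and random weights, not the paper's trained model): the hidden state is updated as h_t = W[x_t] h_{t-1} + b[x_t], and the two maps for a frequent bigram can be fused offline into a single affine map.

    ```python
    import random

    D = 3                                     # toy hidden size
    VOCAB = "ab"
    rng = random.Random(0)

    def rand_mat():
        return [[rng.gauss(0, 0.3) for _ in range(D)] for _ in range(D)]

    # One affine map (W, b) per input symbol -- the defining ISAN property.
    W = {c: rand_mat() for c in VOCAB}
    b = {c: [rng.gauss(0, 0.1) for _ in range(D)] for c in VOCAB}

    def apply_affine(Wc, bc, h):
        return [sum(Wc[i][j] * h[j] for j in range(D)) + bc[i] for i in range(D)]

    def run(sequence, h0):
        h = h0
        for c in sequence:
            h = apply_affine(W[c], b[c], h)   # no nonlinearity anywhere
        return h

    def compose(W2, b2, W1, b1):
        """Affine map equivalent to applying (W1, b1) then (W2, b2):
        W2 (W1 h + b1) + b2 = (W2 W1) h + (W2 b1 + b2)."""
        Wc = [[sum(W2[i][k] * W1[k][j] for k in range(D)) for j in range(D)]
              for i in range(D)]
        bc = [sum(W2[i][k] * b1[k] for k in range(D)) + b2[i] for i in range(D)]
        return Wc, bc

    h0 = [0.0] * D
    step_by_step = run("ab", h0)
    W_ab, b_ab = compose(W["b"], b["b"], W["a"], b["a"])  # precomputed bigram map
    fused = apply_affine(W_ab, b_ab, h0)
    assert all(abs(x - y) < 1e-9 for x, y in zip(step_by_step, fused))
    ```

    The same composition applied to longer n-grams is what gives the potential speedup mentioned above, and it is also why linear analysis tools apply directly to the trained network.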

    Emergence of foveal image sampling from learning to attend in visual scenes

    Brian Cheung, Eric Weiss, Bruno Olshausen
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We describe a neural attention model with a learnable retinal sampling
    lattice. The model is trained on a visual search task requiring the
    classification of an object embedded in a visual scene amidst background
    distractors using the smallest number of fixations. We explore the tiling
    properties that emerge in the model’s retinal sampling lattice after training.
    Specifically, we show that this lattice resembles the eccentricity dependent
    sampling lattice of the primate retina, with a high resolution region in the
    fovea surrounded by a low resolution periphery. Furthermore, we find conditions
    where these emergent properties are amplified or eliminated providing clues to
    their function.

    Safety-Aware Robot Damage Recovery Using Constrained Bayesian Optimization and Simulated Priors

    Vaios Papaspyros, Konstantinos Chatzilygeroudis, Vassilis Vassiliades, Jean-Baptiste Mouret
    Comments: Accepted at the BayesOpt 2016 NIPS workshop, 5 pages, 2 figures, 1 algorithm
    Subjects: Robotics (cs.RO); Learning (cs.LG)

    The recently introduced Intelligent Trial-and-Error (IT&E) algorithm showed
    that robots can adapt to damage within a few trials. The success of
    this algorithm relies on two components: prior knowledge acquired through
    simulation with an intact robot, and Bayesian optimization (BO) that operates
    on-line, on the damaged robot. While IT&E leads to fast damage recovery, it
    does not incorporate any safety constraints that prevent the robot from
    attempting harmful behaviors. In this work, we address this limitation by
    replacing the BO component with a constrained BO procedure. We evaluate our
    approach on a simulated damaged humanoid robot that needs to crawl as fast as
    possible, while performing as few unsafe trials as possible. We compare our new
    “safety-aware IT&E” algorithm to IT&E and a multi-objective version of IT&E in
    which the safety constraints are treated as separate objectives. Our results show
    that our algorithm outperforms the other approaches, both in crawling speed
    within the safe regions and in the number of unsafe trials.


    Information Theory

    Perturbation-Based Regularization for Signal Estimation in Linear Discrete Ill-posed Problems

    Mohamed Suliman, Tarig Ballal, Tareq Y. Al-Naffouri
    Comments: 13 pages, Journal
    Subjects: Information Theory (cs.IT)

    Estimating the values of unknown parameters from corrupted measured data
    poses many challenges in ill-posed problems. In such problems, many
    fundamental estimation methods fail to provide a meaningful stabilized
    solution. In this work, we propose a new regularization approach and a new
    regularization parameter selection approach for linear least-squares discrete
    ill-posed problems. The proposed approach is based on enhancing the
    singular-value structure of the ill-posed model matrix to acquire a better
    solution. Unlike many other regularization algorithms that seek to minimize the
    estimated data error, the proposed approach is developed to minimize the
    mean-squared error of the estimator, which is the objective in many typical
    estimation scenarios. The performance of the proposed approach is demonstrated
    by applying it to a large set of real-world discrete ill-posed problems.
    Simulation results demonstrate that the proposed approach outperforms a set of
    benchmark regularization methods in most cases. In addition, the approach also
    enjoys the lowest runtime and offers the highest level of robustness amongst
    all the tested benchmark regularization methods.
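    As a toy reference point (my own sketch, not the paper's perturbation-based method): on a diagonal ill-posed system the Tikhonov-filtered solution is x_i = s_i y_i / (s_i^2 + λ), and because a simulation knows x_true, λ can be chosen to minimize the actual mean-squared error of the estimator rather than the data misfit.

    ```python
    import random

    # Toy diagonal ill-posed problem: A = diag(s) with fast-decaying singular
    # values, y = A x + noise. All sizes and noise levels are made up.
    rng = random.Random(1)
    n = 20
    s = [0.5 ** i for i in range(n)]            # rapidly decaying singular values
    x_true = [rng.gauss(0, 1) for _ in range(n)]
    y = [si * xi + rng.gauss(0, 0.01) for si, xi in zip(s, x_true)]

    def solve(lam):
        """Tikhonov-filtered solution in the (trivial) SVD basis."""
        return [si * yi / (si * si + lam) for si, yi in zip(s, y)]

    def mse(x):
        return sum((a - b) ** 2 for a, b in zip(x, x_true)) / n

    lams = [10 ** (k / 2) for k in range(-16, 1)]   # grid from 1e-8 to 1
    best = min(lams, key=lambda lam: mse(solve(lam)))
    naive = solve(0.0)                              # unregularized inverse
    print(mse(naive), mse(solve(best)))             # naive MSE blows up
    ```

    The unregularized inverse amplifies noise by 1/s_i on the small singular values, while the MSE-minimizing λ trades a little bias for a large variance reduction, which is the trade-off the abstract's selection criterion targets.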

    Information Rates and post-FEC BER Prediction in Optical Fiber Communications

    Alex Alvarado
    Comments: Invited paper, OFC 2017
    Subjects: Information Theory (cs.IT)

    Information-theoretic metrics to analyze optical fiber communications systems
    with binary and nonbinary soft-decision FEC are reviewed. The numerical
    evaluation of these metrics in both simulations and experiments is also
    discussed. Ready-to-use closed-form approximations are presented.

    Transmit design for MIMO wiretap channel with a malicious jammer

    Duo Zhang, Weidong Mei, Lingxiang Li, Zhi Chen
    Comments: 2015 IEEE 81st Vehicular Technology Conference (VTC Spring)
    Subjects: Information Theory (cs.IT)

    In this paper, we consider the transmit design for multi-input multi-output
    (MIMO) wiretap channel including a malicious jammer. We first transform the
    system model into the traditional three-node wiretap channel by whitening the
    interference at the legitimate user. The eavesdropper channel
    state information (ECSI) may be fully known, statistically known, or entirely
    unknown to the transmitter. Hence, we propose strategies for the different
    levels of ECSI available to the transmitter. For the case of
    unknown ECSI, a target rate for the legitimate user is first specified; an
    inverse water-filling algorithm is then put forward to find the optimal
    power allocation for each information symbol, with a stepwise search being used
    to adjust the spatial dimension allocated to artificial noise (AN) such that
    the target rate is achievable. As for the case of statistical ECSI, several
    simulated channels are randomly generated according to the distribution of
    ECSI. We show that the ergodic secrecy capacity can be approximated as the
    average secrecy capacity of these simulated channels. Through maximizing this
    average secrecy capacity, we can obtain a feasible power and spatial dimension
    allocation scheme via a one-dimensional search. Finally, numerical results
    reveal the effectiveness and computational efficiency of our algorithms.
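    For context, the classic (forward) water-filling allocation that such schemes build on can be sketched as follows; this is a generic textbook routine, not the paper's inverse variant, and the channel gains and power budget below are made up.

    ```python
    def water_filling(gains, total_power, tol=1e-10):
        """Maximize sum(log(1 + p_i * g_i)) subject to sum(p_i) = total_power,
        p_i >= 0, by bisecting on the water level mu: p_i = max(0, mu - 1/g_i)."""
        lo, hi = 0.0, total_power + max(1.0 / g for g in gains)
        while hi - lo > tol:
            mu = (lo + hi) / 2
            used = sum(max(0.0, mu - 1.0 / g) for g in gains)
            if used > total_power:
                hi = mu
            else:
                lo = mu
        mu = (lo + hi) / 2
        return [max(0.0, mu - 1.0 / g) for g in gains]

    powers = water_filling([2.0, 1.0, 0.5], total_power=3.0)
    # stronger subchannels receive more power; the powers sum to (about) 3.0
    ```

    The bisection exploits the fact that the power used is monotone in the water level mu, so a one-dimensional search suffices, just as in the spatial-dimension search described above.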

    Generalization of the de Bruijn's identity to general (φ)-entropies and (φ)-Fisher informations

    Irene Valero Toranzo, Steeve Zozor, Jean-Marc Brossier
    Subjects: Information Theory (cs.IT)

    In this paper, we propose generalizations of the de Bruijn’s identities based
    on extensions of the Shannon entropy, Fisher information and their associated
    divergences or relative measures. The foundation of these generalizations is
    the (phi)-entropies and divergences of the Csiszár class (or Salicrú
    class) considered within a multidimensional context, including the
    one-dimensional case, and for several types of noisy channels characterized by a
    more general probability distribution beyond the well-known Gaussian noise. It
    is found that the gradient and/or the Hessian of these entropies or divergences
    with respect to the noise parameters naturally give rise to generalized
    versions of the Fisher information or divergence, which we name the
    (phi)-Fisher information (divergence). The obtained identities can be viewed
    as further extensions of the classical de Bruijn identity. Analogously, it is
    shown that a similar relation holds between the (phi)-divergence and an
    extended mean-square error, named the (phi)-mean square error, for the Gaussian
    channel.
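    For reference, the classical identity being generalized reads, in its standard scalar Gaussian-channel form:

    ```latex
    % Classical de Bruijn identity: for Y_t = X + \sqrt{t}\,Z with
    % Z \sim \mathcal{N}(0,1) independent of X, the differential entropy h
    % and the Fisher information J satisfy
    \frac{\mathrm{d}}{\mathrm{d}t}\, h\!\left(X + \sqrt{t}\,Z\right)
      = \frac{1}{2}\, J\!\left(X + \sqrt{t}\,Z\right).
    ```

    The paper's (phi)-versions recover this identity when the (phi)-entropy reduces to the Shannon entropy and the channel noise is Gaussian.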

    Associative Memory using Dictionary Learning and Expander Decoding

    Arya Mazumdar, Ankit Singh Rawat
    Comments: To appear in AAAI 2017
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    An associative memory is a framework of content-addressable memory that
    stores a collection of message vectors (or a dataset) over a neural network
    while enabling a neurally feasible mechanism to recover any message in the
    dataset from its noisy version. Designing an associative memory requires
    addressing two main tasks: 1) learning phase: given a dataset, learn a concise
    representation of the dataset in the form of a graphical model (or a neural
    network), 2) recall phase: given a noisy version of a message vector from the
    dataset, output the correct message vector via a neurally feasible algorithm
    over the network learnt during the learning phase. This paper studies the
    problem of designing a class of neural associative memories that learn a
    network representation for a large dataset and ensure correction of a
    large number of adversarial errors during the recall phase. Specifically, the
    associative memories designed in this paper can store a dataset containing
    (exp(n)) (n)-length message vectors over a network with (O(n)) nodes and can
    tolerate (Omega(n/mathrm{polylog}(n))) adversarial errors. This paper
    carries out this memory design by mapping the learning phase and recall phase
    to the tasks of dictionary learning with a square dictionary and iterative
    error correction in an expander code, respectively.



