IT博客汇 (IT Blog Hub)

    arXiv Paper Daily: Fri, 16 Dec 2016

    Posted by 我爱机器学习 ("I Love Machine Learning", 52ml.net) on 2016-12-16 00:00:00

    Neural and Evolutionary Computing

    Graphical RNN Models

    Ashish Bora, Sugato Basu, Joydeep Ghosh
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Many time series are generated by a set of entities that interact with one
    another over time. This paper introduces a broad, flexible framework to learn
    from multiple inter-dependent time series generated by such entities. Our
    framework explicitly models the entities and their interactions through time.
    It achieves this by building on the capabilities of Recurrent Neural Networks,
    while also offering several ways to incorporate domain knowledge/constraints
    into the model architecture. The capabilities of our approach are showcased
    through an application to weather prediction, which shows gains over strong
    baselines.

    Improving Neural Network Generalization by Combining Parallel Circuits with Dropout

    Kien Tuong Phan, Tomas Henrique Maul, Tuong Thuy Vu, Lai Weng Kin
    Comments: Pre-print. The final publication is available at Springer via this http URL
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    In an attempt to solve the lengthy training times of neural networks, we
    proposed Parallel Circuits (PCs), a biologically inspired architecture.
    Previous work has shown that this approach fails to maintain generalization
    performance in spite of achieving sharp speed gains. To address this issue, and
    motivated by the way Dropout prevents node co-adaptation, in this paper we
    suggest an improvement by extending Dropout to the PC architecture. The paper
    provides multiple insights into this combination, including a variety of fusion
    approaches. Experiments show promising results in which improved error rates
    are achieved in most cases, whilst maintaining the speed advantage of the PC
    approach.
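To make the combination concrete, here is a minimal sketch under stated assumptions: each parallel circuit is modelled as an independent two-layer sub-network that sees the whole input, circuit outputs are fused by summation (only one of several possible fusion approaches), and standard inverted dropout is applied to each circuit's hidden units during training. The function names and the summation fusion are hypothetical, not the authors' code.

```python
import random


def matvec(weights, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(values):
    return [max(0.0, v) for v in values]


def inverted_dropout(values, p, rng):
    """Zero each unit with probability p (0 <= p < 1) and rescale survivors by 1/(1-p)."""
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def parallel_circuits_forward(x, circuits, p=0.5, train=True, rng=None):
    """Hypothetical PC forward pass.

    circuits: list of (w_hidden, w_out) pairs, one per independent circuit.
    Dropout is applied inside each circuit; outputs are fused by summation.
    """
    rng = rng or random.Random(0)
    fused = None
    for w_hidden, w_out in circuits:
        hidden = relu(matvec(w_hidden, x))
        if train:
            hidden = inverted_dropout(hidden, p, rng)
        out = matvec(w_out, hidden)
        fused = out if fused is None else [a + b for a, b in zip(fused, out)]
    return fused


circuits = [
    ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]),  # circuit 1: 2 hidden units
    ([[2.0, 0.0]], [[1.0]]),                    # circuit 2: 1 hidden unit
]
print(parallel_circuits_forward([1.0, 2.0], circuits, train=False))  # [5.0]
```

At inference time (`train=False`) no units are dropped; because inverted dropout rescales during training, no extra scaling is needed at test time.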

    Learning binary or real-valued time-series via spike-timing dependent plasticity

    Takayuki Osogami
    Comments: This paper was accepted and presented at Computing with Spikes NIPS 2016 Workshop, Barcelona, Spain, December 2016
    Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    A dynamic Boltzmann machine (DyBM) has been proposed as a model of a spiking
    neural network, and its learning rule of maximizing the log-likelihood of given
    time-series has been shown to exhibit key properties of spike-timing dependent
    plasticity (STDP), which had been postulated and experimentally confirmed in
    the field of neuroscience as a learning rule that refines the Hebbian rule.
    Here, we relax some of the constraints in the DyBM in a way that it becomes
    more suitable for computation and learning. We show that learning the DyBM can
    be considered as logistic regression for binary-valued time-series. We also
    show how the DyBM can learn real-valued data in the form of a Gaussian DyBM and
    discuss its relation to the vector autoregressive (VAR) model. The Gaussian
    DyBM extends the VAR by using additional explanatory variables, which
    correspond to the eligibility traces of the DyBM and capture long term
    dependency of the time-series. Numerical experiments show that the Gaussian
    DyBM significantly improves the predictive accuracy over VAR.
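The relation between the Gaussian DyBM and VAR can be illustrated with a deliberately simplified, scalar sketch (an assumption, not the paper's model): a lag-1 autoregression augmented with one exponentially decaying eligibility trace of older values, e_t = λ·e_{t-1} + x_{t-2}, fitted by ordinary least squares. The trace summarises longer history that a low-order VAR would otherwise miss. Function names and the decay λ are hypothetical.

```python
import random


def eligibility_traces(series, decay, delay=2):
    """e_t = decay * e_{t-1} + x_{t-delay}; e_t stays 0 until t reaches delay."""
    traces, e = [], 0.0
    for t in range(len(series)):
        if t >= delay:
            e = decay * e + series[t - delay]
        traces.append(e)
    return traces


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a @ x = b."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_trace_ar(series, decay):
    """Least-squares fit of x_t ~ b + a * x_{t-1} + c * e_t; returns (b, a, c)."""
    traces = eligibility_traces(series, decay)
    rows = [[1.0, series[t - 1], traces[t]] for t in range(2, len(series))]
    targets = series[2:]
    # Normal equations: (R^T R) w = R^T y
    rtr = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    return tuple(solve_linear(rtr, rty))


# Synthetic series generated from the same model plus uniform noise.
rng = random.Random(0)
xs, e = [0.0, 0.0], 0.0
for t in range(2, 5000):
    e = 0.5 * e + xs[t - 2]  # same trace definition as eligibility_traces
    xs.append(0.2 + 0.5 * xs[t - 1] + 0.2 * e + rng.uniform(-1.0, 1.0))
b, a, c = fit_trace_ar(xs, decay=0.5)
print(round(a, 2), round(c, 2))  # close to 0.5 and 0.2
```

Dropping the trace column reduces this to a plain AR(1) fit, which is the sense in which the extra explanatory variable extends the autoregressive model.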

    Neural Networks for Joint Sentence Classification in Medical Paper Abstracts

    Franck Dernoncourt, Ji Young Lee, Peter Szolovits
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Existing models based on artificial neural networks (ANNs) for sentence
    classification often do not incorporate the context in which sentences appear,
    and classify sentences individually. However, traditional sentence
    classification approaches have been shown to greatly benefit from jointly
    classifying subsequent sentences, such as with conditional random fields. In
    this work, we present an ANN architecture that combines the effectiveness of
    typical ANN models to classify sentences in isolation, with the strength of
    structured prediction. Our model achieves state-of-the-art results on two
    different datasets for sequential sentence classification in medical abstracts.
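Jointly classifying a sequence of sentences is commonly done with a linear-chain CRF-style output layer whose best label sequence is found by Viterbi decoding. The sketch below shows only that decoding step, assuming per-sentence label scores (e.g. produced by a neural network) and a label-transition score matrix; the labels and numbers are hypothetical, and this is not presented as the authors' exact architecture.

```python
def viterbi(emissions, transitions):
    """Best label sequence under additive scores.

    emissions[t][k]: score of label k for sentence t.
    transitions[i][j]: score of label i being followed by label j.
    """
    n_labels = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    best = max(range(n_labels), key=lambda k: score[k])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return path[::-1]


labels = ["BACKGROUND", "METHODS", "RESULTS"]
emissions = [[2.0, 0.0, 0.0], [0.0, 1.0, 1.2], [0.0, 0.0, 2.0]]
transitions = [[0.0, 1.0, -2.0], [-2.0, 0.0, 1.0], [-2.0, -2.0, 0.0]]
print([labels[k] for k in viterbi(emissions, transitions)])
# ['BACKGROUND', 'METHODS', 'RESULTS']
```

Classified in isolation, the second sentence would be labelled RESULTS (1.2 > 1.0); the transition scores make METHODS the better joint choice, which is the benefit of structured prediction the abstract refers to.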

    Tunable Efficient Unitary Neural Networks (EUNN) and their application to RNN

    Li Jing, Yichen Shen, Tena Dubček, John Peurifoy, Scott Skirlo, Max Tegmark, Marin Soljačić
    Comments: 9 pages, 4 figures
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    We present a method for implementing an Efficient Unitary Neural Network
    (EUNN) whose computational complexity is merely $\mathcal{O}(1)$ per parameter
    and has full tunability, from spanning part of unitary space to all of it. We
    apply the EUNN in Recurrent Neural Networks, and test its performance on the
    standard copying task and the MNIST digit recognition benchmark, finding that
    it significantly outperforms a non-unitary RNN, an LSTM network, an exclusively
    partial space URNN and a projective URNN with comparable parameter numbers.
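The $\mathcal{O}(1)$-per-parameter cost comes from building the unitary matrix as a product of layers of 2×2 rotations, each touching only two vector entries. A minimal sketch, assuming the parametrised block [[e^{iφ}cos θ, −e^{iφ}sin θ], [sin θ, cos θ]] and alternating even/odd pairings; the number of layers is what trades coverage of unitary space against cost. Names are hypothetical and this is not the authors' implementation.

```python
import cmath
import math
import random


def rotation_layer(vec, thetas, phis, offset):
    """Apply 2x2 unitary blocks to pairs (offset, offset+1), (offset+2, offset+3), ...

    Extra parameters beyond the number of available pairs are ignored (zip truncates).
    """
    out = list(vec)
    for i, theta, phi in zip(range(offset, len(vec) - 1, 2), thetas, phis):
        c, s, phase = math.cos(theta), math.sin(theta), cmath.exp(1j * phi)
        a, b = vec[i], vec[i + 1]
        out[i] = phase * (c * a - s * b)
        out[i + 1] = s * a + c * b
    return out


def eunn_apply(vec, layers):
    """layers: list of (thetas, phis); layer k pairs entries starting at offset k % 2."""
    for k, (thetas, phis) in enumerate(layers):
        vec = rotation_layer(vec, thetas, phis, k % 2)
    return vec


def norm(vec):
    return math.sqrt(sum(abs(v) ** 2 for v in vec))


rng = random.Random(0)
n, n_layers = 6, 4
layers = [
    ([rng.uniform(0, 2 * math.pi) for _ in range(n // 2)],
     [rng.uniform(0, 2 * math.pi) for _ in range(n // 2)])
    for _ in range(n_layers)
]
x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
y = eunn_apply(x, layers)
print(abs(norm(y) - norm(x)) < 1e-9)  # True: unitary layers preserve the norm
```

Each rotation costs a constant number of operations for its two parameters, so applying the whole matrix is linear in the parameter count rather than quadratic in the dimension.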


    Computer Vision and Pattern Recognition

    Visual Compiler: Synthesizing a Scene-Specific Pedestrian Detector and Pose Estimator

    Namhoon Lee, Xinshuo Weng, Vishnu Naresh Boddeti, Yu Zhang, Fares Beainy, Kris Kitani, Takeo Kanade
    Comments: submitted to CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We introduce the concept of a Visual Compiler that generates a scene specific
    pedestrian detector and pose estimator without any pedestrian observations.
    Given a single image and auxiliary scene information in the form of camera
    parameters and geometric layout of the scene, the Visual Compiler first infers
    geometrically and photometrically accurate images of humans in that scene
    through the use of computer graphics rendering. Using these renders we learn a
    scene-and-region specific spatially-varying fully convolutional neural network,
    for simultaneous detection, pose estimation and segmentation of pedestrians. We
    demonstrate that when real human annotated data is scarce or non-existent, our
    data generation strategy can provide an excellent solution for bootstrapping
    human detection and pose estimation. Experimental results show that our
    approach outperforms off-the-shelf state-of-the-art pedestrian detectors and
    pose estimators that are trained on real data.

    CSVideoNet: A Recurrent Convolutional Neural Network for Compressive Sensing Video Reconstruction

    Kai Xu, Fengbo Ren
    Comments: 10 pages, 6 figures, 2 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    In this paper, we develop a deep neural network architecture called
    “CSVideoNet” that can learn visual representations from random measurements for
    compressive sensing (CS) video reconstruction. CSVideoNet is an end-to-end
    trainable and non-iterative model that combines convolutional neural networks
    (CNNs) with a recurrent neural network (RNN) to facilitate video
    reconstruction by leveraging temporal-spatial features. The proposed network
    can accept random measurements with a multi-level compression ratio (CR). The
    lightly and aggressively compressed measurements offer background information
    and object details, respectively. This is similar to the variable bit rate
    techniques widely used in conventional video coding approaches. The RNN
    employed by CSVideoNet can leverage temporal coherence that exists in adjacent
    video frames to extrapolate motion features and merge them with spatial visual
    features extracted by the CNNs to further enhance reconstruction quality,
    especially at high CRs. We test our CSVideoNet on the UCF-101 dataset.
    Experimental results show that CSVideoNet outperforms the existing video CS
    reconstruction approaches. The results demonstrate that our method can preserve
    fine visual details from the original videos even at a 100x CR, which is
    difficult to achieve with the reference approaches. Also, the non-iterative
    nature of CSVideoNet decreases runtime by three orders of magnitude compared
    with iterative reconstruction algorithms. Furthermore,
    CSVideoNet can enhance the CR of CS cameras beyond the limitation of
    conventional approaches, ensuring a reduction in bandwidth for data
    transmission. These benefits are especially favorable to high-frame-rate video
    applications.
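The multi-level compression ratio can be pictured with the standard CS measurement model y = Φx, where a block of n pixels is reduced to m = n / CR random measurements. The sketch below, assuming Gaussian measurement matrices, 32×32 blocks, a key frame at CR 5 and the remaining frames at CR 100 (all illustrative choices, not values from the paper), only shows how the measurement counts differ; reconstruction is what CSVideoNet learns and is not sketched here.

```python
import math
import random


def measurement_matrix(n, cr, rng):
    """Random Gaussian sensing matrix with m = round(n / cr) rows (at least one)."""
    m = max(1, round(n / cr))
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(m)]


def measure(phi, x):
    """Compressive measurements y = phi @ x for a vectorised image block x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


rng = random.Random(0)
n = 32 * 32                                   # one vectorised 32x32 block
frames = [[rng.random() for _ in range(n)] for _ in range(4)]
crs = [5] + [100] * (len(frames) - 1)         # lightly compressed key frame, aggressive rest
measurements = [measure(measurement_matrix(n, cr, rng), f) for f, cr in zip(frames, crs)]
print([len(y) for y in measurements])         # [205, 10, 10, 10]
```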

    SceneNet RGB-D: 5M Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth

    John McCormac, Ankur Handa, Stefan Leutenegger, Andrew J. Davison
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We introduce SceneNet RGB-D, expanding the previous work of SceneNet to
    enable large scale photorealistic rendering of indoor scene trajectories. It
    provides pixel-perfect ground truth for scene understanding problems such as
    semantic segmentation, instance segmentation, and object detection, and also
    for geometric computer vision problems such as optical flow, depth estimation,
    camera pose estimation, and 3D reconstruction. Random sampling permits
    virtually unlimited scene configurations, and here we provide a set of 5M
    rendered RGB-D images from over 15K trajectories in synthetic layouts with
    random but physically simulated object poses. Each layout also has random
    lighting, camera trajectories, and textures. The scale of this dataset is well
    suited for pre-training data-driven computer vision techniques from scratch
    with RGB-D inputs, which previously has been limited by relatively small
    labelled datasets in NYUv2 and SUN RGB-D. It also provides a basis for
    investigating 3D scene labelling tasks by providing perfect camera poses and
    depth data as proxy for a SLAM system. We host the dataset at
    this http URL

    Reflectance Adaptive Filtering Improves Intrinsic Image Estimation

    Thomas Nestmeyer, Peter V. Gehler
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Separation of an input image into its reflectance and shading layers poses a
    challenge for learning approaches because no large corpus of precise and
    realistic ground truth decompositions exists. The Intrinsic Images in the Wild
    dataset (IIW) provides a sparse set of relative human reflectance judgments,
    which serves as a standard benchmark for intrinsic images. This dataset led to
    an increase in methods that learn statistical dependencies between the images
    and their reflectance layer. Although learning plays a role in pushing
    state-of-the-art performance, we show that a standard signal processing
    technique achieves performance on par with recent developments. We propose a
    loss function that enables learning dense reflectance predictions with a CNN.
    Our results show a simple pixel-wise decision, without any context or prior
    knowledge, is sufficient to provide a strong baseline on IIW. This sets a
    competitive bar and we find that only two approaches surpass this result. We
    then develop a joint bilateral filtering method that implements strong prior
    knowledge about reflectance constancy. This filtering operation can be applied
    to any intrinsic image algorithm and we improve several previous results
    achieving a new state-of-the-art on IIW. Our findings suggest that the effect
    of learning-based approaches may be over-estimated and that it is still the use
    of explicit prior knowledge that drives performance on intrinsic image
    decompositions.
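A joint (cross) bilateral filter smooths one signal while taking its edge-stopping weights from a second, guidance signal. The 1-D sketch below, assuming Gaussian spatial and range kernels and using the input intensities as the guide for a reflectance estimate, shows why such filtering pushes reflectance towards piecewise-constant values without blurring across real edges. The kernel choice, parameters, and names are assumptions; the paper's filter operates on 2-D images and may differ in detail.

```python
import math


def joint_bilateral_1d(values, guide, sigma_space, sigma_range, radius):
    """Smooth `values` with weights from spatial distance and differences in `guide`."""
    n = len(values)
    out = []
    for i in range(n):
        num = den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            w_space = math.exp(-((i - j) ** 2) / (2 * sigma_space ** 2))
            w_range = math.exp(-((guide[i] - guide[j]) ** 2) / (2 * sigma_range ** 2))
            num += w_space * w_range * values[j]
            den += w_space * w_range
        out.append(num / den)  # den >= 1 because j == i always contributes weight 1
    return out


reflectance = [1.0, 1.2, 0.8, 1.0, 5.0, 5.2, 4.8, 5.0]   # noisy estimate
image = [0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 10.0, 10.0]       # guide with one real edge
smoothed = joint_bilateral_1d(reflectance, image, sigma_space=2.0, sigma_range=0.5, radius=3)
print([round(v, 2) for v in smoothed])
```

Within each flat region of the guide the noisy values are averaged together, while the large guide difference at the edge drives the cross-edge weights to essentially zero, so the step between the two regions survives.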

    Objective Micro-Facial Movement Detection Using FACS-Based Regions and Baseline Evaluation

    Adrian K. Davison, Cliff Lansley, Choon Ching Ng, Kevin Tan, Moi Hoon Yap
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Micro-facial expressions are regarded as an important human behavioural event
    that can highlight emotional deception. Spotting these movements is difficult
    for humans and machines; however, research into using computer vision to detect
    subtle facial expressions is growing in popularity. This paper proposes an
    individualised baseline micro-movement detection method based on the 3D Histogram
    of Oriented Gradients (3D HOG) temporal difference. We define a face
    template consisting of 26 regions based on the Facial Action Coding System
    (FACS). We extract the temporal features of each region using 3D HOG. Then, we
    use Chi-square distance to find subtle facial motion in the local regions.
    Finally, an automatic peak detector is used to detect micro-movements above the
    newly proposed adaptive baseline threshold. The performance is validated on two
    FACS coded datasets: SAMM and CASME II. This objective method focuses on the
    movement of the 26 face regions. Compared with the ground truth, the best
    results were AUCs of 0.7512 and 0.7261 on SAMM and CASME II, respectively. The
    results show that 3D HOG outperforms two state-of-the-art feature
    representations for micro-movement detection: Local Binary Patterns in Three
    Orthogonal Planes and Histograms of Oriented Optical Flow.
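The detection pipeline compares per-region feature histograms over time with a chi-square distance and then looks for peaks above a threshold. A minimal sketch, assuming the common symmetric chi-square form and, as a stand-in for the paper's adaptive baseline, a threshold of mean plus k standard deviations of the distance signal; the threshold rule, k, and names are hypothetical.

```python
import math


def chi_square_distance(h1, h2, eps=1e-12):
    """Symmetric chi-square distance between two histograms of equal length."""
    return sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))


def detect_peaks(signal, k=2.0):
    """Indices of local maxima exceeding mean + k * std of the signal (stand-in threshold)."""
    n = len(signal)
    mean = sum(signal) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in signal) / n)
    threshold = mean + k * std
    return [t for t in range(1, n - 1)
            if signal[t] > threshold and signal[t] >= signal[t - 1] and signal[t] >= signal[t + 1]]


# Hypothetical per-frame distances for one face region: flat except a brief movement.
distances = [0.1] * 10 + [3.0] + [0.1] * 10
print(detect_peaks(distances))  # [10]
```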

    A Multilinear Tongue Model Derived from Speech Related MRI Data of the Human Vocal Tract

    Alexander Hewer, Stefanie Wuhrer, Ingmar Steiner, Korin Richmond
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a multilinear statistical model of the human tongue that captures
    anatomical and tongue pose related shape variations separately. The model was
    derived from 3D magnetic resonance imaging data of 11 speakers sustaining
    speech-related vocal tract configurations. The extraction was performed
    using a minimally supervised method based on an image segmentation
    approach and a template fitting technique. Furthermore, it uses image denoising
    to deal with possibly corrupt data, palate surface information reconstruction
    to handle palatal tongue contacts, and a bootstrap strategy to refine the
    obtained shapes. Our experiments concluded that limiting the degrees of freedom
    for the anatomical and speech related variations to 5 and 4 respectively
    produces a model that can reliably register unknown data while avoiding
    overfitting effects.

    Development of a Real-time Colorectal Tumor Classification System for Narrow-band Imaging zoom-videoendoscopy

    Tsubasa Hirakawa, Toru Tamaki, Bisser Raytchev, Kazufumi Kaneda, Tetsushi Koide, Shigeto Yoshida, Hiroshi Mieno, Shinji Tanaka
    Comments: 9 pages, 8 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Colorectal endoscopy is important for the early detection and treatment of
    colorectal cancer and is used worldwide. A computer-aided diagnosis (CAD)
    system that provides an objective measure to endoscopists during colorectal
    endoscopic examinations would be of great value. In this study, we describe a
    newly developed CAD system that provides real-time objective measures. Our
    system captures the video stream from an endoscopic system and transfers it to
    a desktop computer. The captured video stream is then classified by a
    pretrained classifier and the results are displayed on a monitor. The
    experimental results show that our developed system works efficiently in actual
    endoscopic examinations and is medically significant.

    Design of Image Matched Non-Separable Wavelet using Convolutional Neural Network

    Naushad Ansari, Anubha Gupta, Rahul Duggal
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Image-matched nonseparable wavelets can find potential use in many
    applications including image classification, segmentation, compressive
    sensing, etc. This paper proposes a novel design methodology that utilizes a
    convolutional neural network (CNN) to design a two-channel non-separable
    wavelet matched to a given image. The design is proposed on the quincunx lattice.
    The loss function of the convolutional neural network is set up as the total
    squared error between the given input image to the CNN and the reconstructed
    image at the output of the CNN, leading to perfect reconstruction at the end of
    training. Simulation results are shown on some standard images.

    Cloud Dictionary: Sparse Coding and Modeling for Point Clouds

    Or Litany, Tal Remez, Alex Bronstein
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

    With the development of range sensors such as LIDAR and time-of-flight
    cameras, 3D point cloud scans have become ubiquitous in computer vision
    applications, the most prominent ones being gesture recognition and autonomous
    driving. Parsimony-based algorithms have shown great success on images and
    videos where data points are sampled on a regular Cartesian grid. We propose an
    adaptation of these techniques to irregularly sampled signals by using
    continuous dictionaries. We present an example application in the form of point
    cloud denoising.

    Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering

    Hao Liu, Yang Yang, Fumin Shen, Lixin Duan, Heng Tao Shen
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

    Along with the prosperity of recurrent neural networks in modelling sequential
    data and the power of attention mechanisms in automatically identifying salient
    information, image captioning, a.k.a. image description, has advanced
    remarkably in recent years. Nonetheless, most existing paradigms may suffer from
    a lack of invariance to images with different scaling, rotation, etc., and
    from the lack of an effective integration of standalone attention into a
    holistic end-to-end system. In this paper, we propose a novel image captioning
    architecture, termed Recurrent Image Captioner (RIC), which allows the visual
    encoder and language decoder to coherently cooperate in a recurrent manner. Specifically,
    we first equip the CNN-based visual encoder with a differentiable layer to enable
    spatially invariant transformation of visual signals. Moreover, we deploy an
    attention filter module (differentiable) between encoder and decoder to
    dynamically determine salient visual parts. We also employ bidirectional LSTM
    to preprocess sentences for generating better textual representations. Besides,
    we propose to exploit variational inference to optimize the whole architecture.
    Extensive experimental results on three benchmark datasets (i.e., Flickr8k,
    Flickr30k and MS COCO) demonstrate the superiority of our proposed architecture
    as compared to most of the state-of-the-art methods.

    Regressing Robust and Discriminative 3D Morphable Models with a very Deep Neural Network

    Anh Tuan Tran, Tal Hassner, Iacopo Masi, Gerard Medioni
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The 3D shapes of faces are well known to be discriminative. Yet despite this,
    they are rarely used for face recognition and always under controlled viewing
    conditions. We claim that this is a symptom of a serious but often overlooked
    problem with existing methods for single view 3D face reconstruction: when
    applied “in the wild”, their 3D estimates are either unstable and change for
    different photos of the same subject or they are over-regularized and generic.
    In response, we describe a robust method for regressing discriminative 3D
    morphable face models (3DMM). We use a convolutional neural network (CNN) to
    regress 3DMM shape and texture parameters directly from an input photo. We
    overcome the shortage of training data required for this purpose by offering a
    method for generating huge numbers of labeled examples. The 3D estimates
    produced by our CNN surpass state of the art accuracy on the MICC data set.
    Coupled with a 3D-3D face matching pipeline, we show the first competitive face
    recognition results on the LFW, YTF and IJB-A benchmarks using 3D face shapes
    as representations, rather than the opaque deep feature vectors used by other
    modern systems.
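A 3DMM represents a face shape as a mean shape plus a linear combination of basis shapes, so the regressed coefficient vector itself can serve as a compact face descriptor. The sketch below, with toy dimensions and hypothetical names, shows the linear reconstruction and a cosine-similarity comparison of coefficient vectors; treating cosine similarity as the matching score is an assumption here, not a description of the authors' full 3D-3D matching pipeline.

```python
import math


def reconstruct_shape(mean_shape, basis, alphas):
    """S = mean + sum_i alpha_i * basis_i, with shapes flattened to [x0, y0, z0, x1, ...]."""
    shape = list(mean_shape)
    for alpha, component in zip(alphas, basis):
        shape = [s + alpha * c for s, c in zip(shape, component)]
    return shape


def cosine_similarity(u, v):
    """Cosine of the angle between two coefficient vectors (both assumed non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


mean_shape = [0.0, 0.0, 0.0]                   # a single toy vertex
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(reconstruct_shape(mean_shape, basis, [2.0, 3.0]))   # [2.0, 3.0, 0.0]
print(cosine_similarity([2.0, 3.0], [4.0, 6.0]))          # ~1.0: same direction in shape space
```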

    Tinkering Under the Hood: Interactive Zero-Shot Learning with Net Surgery

    Vivek Krishnan, Deva Ramanan
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We consider the task of visual net surgery, in which a CNN can be
    reconfigured without extra data to recognize novel concepts that may be omitted
    from the training set. While most prior work makes use of linguistic cues for
    such “zero-shot” learning, we do so by using a pictorial language
    representation of the training set, implicitly learned by a CNN, to generalize
    to new classes. To this end, we introduce a set of visualization techniques
    that better reveal the activation patterns and relations between groups of CNN
    filters. We next demonstrate that knowledge of pictorial languages can be used
    to rewire certain CNN neurons into a part model, which we call a pictorial
    language classifier (PLC). We demonstrate the robustness of simple PLCs by applying
    them in a weakly supervised manner: labeling unlabeled concepts for visual
    classes present in the training data. Specifically we show that a PLC built on
    top of a CNN trained for ImageNet classification can localize humans in Graz-02
    and determine the pose of birds in PASCAL-VOC without extra labeled data or
    additional training. We then apply PLCs in an interactive zero-shot manner,
    demonstrating that pictorial languages are expressive enough to detect a set of
    visual classes in MS-COCO that never appear in the ImageNet training set.

    Scale Coding Bag of Deep Features for Human Attribute and Action Recognition

    Fahad Shahbaz Khan, Joost van de Weijer, Rao Muhammad Anwer, Andrew D. Bagdanov, Michael Felsberg, Jorma Laaksonen
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Most approaches to human attribute and action recognition in still images are
    based on image representation in which multi-scale local features are pooled
    across scale into a single, scale-invariant encoding. Both in bag-of-words and
    the recently popular representations based on convolutional neural networks,
    local features are computed at multiple scales. However, these multi-scale
    convolutional features are pooled into a single scale-invariant representation.
    We argue that entirely scale-invariant image representations are sub-optimal
    and investigate approaches to scale coding within a Bag of Deep Features
    framework.

    Our approach encodes multi-scale information explicitly during the image
    encoding stage. We propose two strategies to encode multi-scale information
    explicitly in the final image representation. We validate our two scale coding
    techniques on five datasets: Willow, PASCAL VOC 2010, PASCAL VOC 2012,
    Stanford-40 and Human Attributes (HAT-27). On all datasets, the proposed scale
    coding approaches outperform both the scale-invariant method and the standard
    deep features of the same network. Further, combining our scale coding
    approaches with standard deep features leads to consistent improvement over the
    state-of-the-art.
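The difference between a scale-invariant encoding and scale coding can be shown with sum pooling: the former pools every local feature into one vector, while the latter first assigns each feature to a scale bin, pools within each bin, and concatenates the results. This is a hypothetical illustration with made-up bin edges and two-dimensional features; the paper's two specific strategies are not reproduced here.

```python
def sum_pool(vectors, dim):
    pooled = [0.0] * dim
    for v in vectors:
        for d in range(dim):
            pooled[d] += v[d]
    return pooled


def scale_invariant_encoding(features, dim):
    """features: list of (scale, vector); scale information is discarded."""
    return sum_pool([v for _, v in features], dim)


def scale_coded_encoding(features, dim, bin_edges):
    """Pool separately per scale bin and concatenate.

    A feature goes to bin i, where i is the number of edges its scale exceeds.
    """
    bins = [[] for _ in range(len(bin_edges) + 1)]
    for scale, v in features:
        bins[sum(scale > edge for edge in bin_edges)].append(v)
    encoding = []
    for b in bins:
        encoding.extend(sum_pool(b, dim))
    return encoding


features = [(0.5, [1.0, 0.0]), (1.0, [0.0, 1.0]), (2.0, [1.0, 1.0])]
print(scale_invariant_encoding(features, 2))             # [2.0, 2.0]
print(scale_coded_encoding(features, 2, [0.75, 1.5]))    # [1.0, 0.0, 0.0, 1.0, 1.0, 1.0]
```

The scale-coded vector is longer (one block per scale bin) but keeps which scale each response came from, which is exactly the information the scale-invariant encoding throws away.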

    Border-Peeling Clustering

    Nadav Bar, Hadar Averbuch-Elor, Daniel Cohen-Or
    Comments: 9 pages, 9 figures, supplementary material added as ancillary file
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we present a novel non-parametric clustering technique, which
    is based on an iterative algorithm that peels off layers of points around the
    clusters. Our technique is based on the notion that each latent cluster is
    composed of layers that surround its core, where the external layers, or
    border points, implicitly separate the clusters. Analyzing the K-nearest
    neighbors of the points makes it possible to identify the border points and
    associate them with points of inner layers. Our clustering algorithm
    iteratively identifies border points, peels them, and separates the latent
    clusters. We show that the peeling process adapts to the local density and
    successfully separates adjacent clusters. A notable quality of the
    Border-Peeling algorithm is that it does not require any parameter tuning in
    order to outperform state-of-the-art finely-tuned non-parametric clustering
    methods, including Mean-Shift and DBSCAN. We further assess our technique on
    high-dimensional datasets that vary in size and characteristics. In particular,
    we analyze the space of deep features that were trained by a convolutional
    neural network.

    A fuzzy approach for segmentation of touching characters

    Giuseppe Airò Farulla, Nadir Murru, Rosaria Rossini
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The problem of correctly segmenting touching characters is a hard task to
    solve and it is of major relevance in pattern recognition. In the recent years,
    many methods and algorithms have been proposed; still, a definitive solution is
    far from being found. In this paper, we propose a novel method based on fuzzy
    logic. The proposed method combines in a novel way three features for
    segmenting touching characters that have been already proposed in other studies
    but have so far been exploited only individually. The proposed strategy is based
    on a 3-input/1-output fuzzy inference system with fuzzy rules specifically
    optimized for segmenting touching characters in the case of Latin printed and
    handwritten characters. The system's performance is illustrated and supported
    by numerical examples showing that our approach can achieve reasonably good
    overall accuracy in segmenting characters even under tricky conditions of touching
    characters. Moreover, numerical results suggest that the method can be applied
    to many different datasets of characters by means of a convenient tuning of the
    fuzzy sets and rules.
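A 3-input/1-output fuzzy inference system can be sketched in a few lines. The version below assumes triangular membership functions, min for rule conjunction, and zero-order Sugeno (weighted-average) defuzzification; the three inputs stand for normalised segmentation features and the rule base is invented for illustration, so none of it should be read as the authors' tuned rules.

```python
def triangle(x, a, b, c):
    """Triangular membership with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


LOW = (-1.0, 0.0, 1.0)    # hypothetical fuzzy sets on inputs normalised to [0, 1]
HIGH = (0.0, 1.0, 2.0)

# Each rule: (membership set per input, crisp output); output ~ confidence of a cut.
RULES = [
    ((HIGH, HIGH, HIGH), 1.0),
    ((LOW, LOW, LOW), 0.0),
    ((HIGH, LOW, HIGH), 0.7),
]


def infer(inputs, rules=RULES):
    """Zero-order Sugeno inference: min for AND, weighted average of rule outputs."""
    num = den = 0.0
    for antecedents, consequent in rules:
        strength = min(triangle(x, *mf) for x, mf in zip(inputs, antecedents))
        num += strength * consequent
        den += strength
    return num / den if den > 0 else 0.0


print(infer([1.0, 1.0, 1.0]))   # 1.0: only the first rule fires
```

Retuning for a different character set, as the abstract suggests, would amount to moving the membership-function breakpoints and editing the rule table, without touching the inference code.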

    Temporal-Needle: A view and appearance invariant video descriptor

    Michal Yarom, Michal Irani
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The ability to detect similar actions across videos can be very useful for
    real-world applications in many fields. However, this task is still challenging
    for existing systems, since videos that present the same action can be taken
    from significantly different viewing directions, performed by different actors
    against different backgrounds, and recorded at various video qualities. Video descriptors play a
    significant role in these systems. In this work we propose the
    “temporal-needle” descriptor which captures the dynamic behavior, while being
    invariant to viewpoint and appearance. The descriptor is computed over multiple
    temporal scales of the video by computing the self-similarity of every patch
    through time at every temporal scale. The descriptor is computed for every
    pixel in the video. However, to find similar actions across videos, we consider
    only a small subset of the descriptors: the statistically significant
    ones. This allows us to find good correspondences across videos more
    efficiently. Using the descriptor, we were able to detect the same behavior
    across videos in a variety of scenarios. We demonstrate the use of the
    descriptor in tasks such as temporal and spatial alignment, action detection
    and even show its potential in unsupervised video clustering into categories.
    In this work we handled only videos taken with stationary cameras, but the
    descriptor can be extended to handle moving cameras as well.

    The More You Know: Using Knowledge Graphs for Image Classification

    Kenneth Marino, Ruslan Salakhutdinov, Abhinav Gupta
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Humans have the remarkable capability to learn a large variety of visual
    concepts, often with very few examples, whereas current state-of-the-art vision
    algorithms require hundreds or thousands of examples per category and struggle
    with ambiguity. One characteristic that sets humans apart is our ability to
    acquire knowledge about the world and reason using this knowledge. This paper
    investigates the use of structured prior knowledge in the form of knowledge
    graphs and shows that using this knowledge improves performance on image
    classification. Specifically, we introduce the Graph Search Neural Network as a
    way of efficiently incorporating large knowledge graphs into a fully end-to-end
    learning system. We show in a number of experiments that our method outperforms
    baselines for multi-label classification, even under low data and few-shot
    settings.

    Coupling Adaptive Batch Sizes with Learning Rates

    Lukas Balles, Javier Romero, Philipp Hennig
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Mini-batch stochastic gradient descent and variants thereof have become
    standard for large-scale empirical risk minimization like the training of
    neural networks. These methods are usually used with a constant batch size
    chosen by simple empirical inspection. The batch size significantly influences
    the behavior of the stochastic optimization algorithm, though, since it
    determines the variance of the gradient estimates. This variance also changes
    over the optimization process; when using a constant batch size, stability and
    convergence are thus often enforced by means of a (manually tuned) decreasing
    learning rate schedule. We propose a practical method for dynamic batch size
    adaptation. It estimates the variance of the stochastic gradients and adapts
    the batch size to decrease the variance proportionally to the value of the
    objective function, removing the need for the aforementioned learning rate
    decrease. In contrast to recent related work, our algorithm couples the batch
    size to the learning rate, directly reflecting the known relationship between
    the two. On three image classification benchmarks, our batch size adaptation
    yields faster optimization convergence, while simultaneously simplifying
    learning rate tuning. A TensorFlow implementation is available.
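The coupling can be illustrated with a rule in the spirit of the abstract: estimate the trace of the per-example gradient covariance from a batch, and set the next batch size proportional to the learning rate times that variance divided by the current objective value. The exact formula, the clipping bounds, and the names below are assumptions for illustration, not the paper's published algorithm or its TensorFlow implementation.

```python
import math


def gradient_variance_trace(per_example_grads):
    """Unbiased estimate of tr(Cov[g]) from a list of per-example gradient vectors."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    means = [sum(g[d] for g in per_example_grads) / n for d in range(dim)]
    return sum(
        sum((g[d] - means[d]) ** 2 for g in per_example_grads) / (n - 1)
        for d in range(dim)
    )


def next_batch_size(per_example_grads, loss, learning_rate, min_size=16, max_size=4096):
    """Batch size ~ learning_rate * tr(Cov[g]) / loss, clipped to [min_size, max_size]."""
    trace = gradient_variance_trace(per_example_grads)
    size = math.ceil(learning_rate * trace / max(loss, 1e-12))
    return min(max(size, min_size), max_size)


grads = [[1.0, 0.0], [3.0, 0.0], [1.0, 2.0], [3.0, 2.0]]   # toy per-example gradients
print(next_batch_size(grads, loss=0.01, learning_rate=0.1))  # 27
print(next_batch_size(grads, loss=10.0, learning_rate=0.1))  # 16 (clipped at min_size)
```

As the objective falls, a rule of this form asks for larger batches, which reduces gradient noise in the role that a manually decayed learning rate would otherwise play.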

    Towards Score Following in Sheet Music Images

    Matthias Dorfer, Andreas Arzt, Gerhard Widmer
    Comments: Published in Proceedings of the 17th International Society for Music Information Retrieval Conference (2016)
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    This paper addresses the matching of short music audio snippets to the
    corresponding pixel location in images of sheet music. A system is presented
    that simultaneously learns to read notes, listens to music and matches the
    currently played music to its corresponding notes in the sheet. It consists of
    an end-to-end multi-modal convolutional neural network that takes as input
    images of sheet music and spectrograms of the respective audio snippets. It
    learns to predict, for a given unseen audio snippet (covering approximately one
    bar of music), the corresponding position in the respective score line. Our
    results suggest that with the use of (deep) neural networks — which have
    proven to be powerful image processing models — working with sheet music
    becomes feasible and a promising future research direction.

    Deep learning is effective for the classification of OCT images of normal versus Age-related Macular Degeneration

    Cecilia S. Lee, Doug M. Baughman, Aaron Y. Lee
    Comments: 4 Figures, 1 Table
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Objective: The advent of Electronic Medical Records (EMR) with large
    electronic imaging databases along with advances in deep neural networks with
    machine learning has provided a unique opportunity to achieve milestones in
    automated image analysis. Optical coherence tomography (OCT) is the most
    commonly obtained imaging modality in ophthalmology and represents a dense and
    rich dataset when combined with labels derived from the EMR. We sought to
    determine if deep learning could be utilized to distinguish normal OCT images
    from images from patients with Age-related Macular Degeneration (AMD). Methods:
    Automated extraction of an OCT imaging database was performed and linked to
    clinical endpoints from the EMR. OCT macula scans were obtained by Heidelberg
    Spectralis, and each OCT scan was linked to EMR clinical endpoints extracted
    from EPIC. The central 11 images were selected from each OCT scan of two
    cohorts of patients: normal and AMD. Cross-validation was performed using a
    random subset of patients. Areas under the receiver operating characteristic
    curves (auROC) were computed at the independent image level, the macular OCT
    level, and the patient level. Results: Of an extraction of 2.6 million OCT images linked to clinical
    datapoints from the EMR, 52,690 normal and 48,312 AMD macular OCT images were
    selected. A deep neural network was trained to categorize images as either
    normal or AMD. At the image level, we achieved an auROC of 92.78% with an
    accuracy of 87.63%. At the macula level, we achieved an auROC of 93.83% with an
    accuracy of 88.98%. At a patient level, we achieved an auROC of 97.45% with an
    accuracy of 93.45%. Peak sensitivity and specificity with optimal cutoffs were
    92.64% and 93.69% respectively. Conclusions: Deep learning techniques are
    effective for classifying OCT images. These findings have important
    implications in utilizing OCT in automated screening and computer aided
    diagnosis tools.
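    The abstract reports auROC at three levels of aggregation. The sketch below shows one way image-level scores could be pooled into patient-level scores before computing auROC. Mean pooling and the helper names are assumptions, because the pooling rule is not stated here. The numbers are toy values.

```python
from collections import defaultdict


def auroc(scores, labels):
    """AUROC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def aggregate_by(group_ids, scores, labels):
    """Mean-pool image scores within each group (e.g. a patient); assumes one label per group."""
    sums, counts, group_label = defaultdict(float), defaultdict(int), {}
    for g, s, y in zip(group_ids, scores, labels):
        sums[g] += s
        counts[g] += 1
        group_label[g] = y
    groups = sorted(sums)
    return [sums[g] / counts[g] for g in groups], [group_label[g] for g in groups]


patients = ["A", "A", "B", "B"]       # hypothetical: patient A has AMD, B is normal
scores = [0.9, 0.45, 0.5, 0.2]        # per-image AMD scores from some classifier
labels = [1, 1, 0, 0]
print(auroc(scores, labels))                                   # 0.75
print(auroc(*aggregate_by(patients, scores, labels)))          # 1.0
```

    Pooling averages out individual hard images, which is one plausible reason the patient-level auROC can exceed the image-level one.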


    Artificial Intelligence

    Ontohub: A semantic repository for heterogeneous ontologies

    Mihai Codescu, Eugen Kuksa, Oliver Kutz, Till Mossakowski, Fabian Neuhaus
    Comments: Preprint, journal special issue
    Subjects: Artificial Intelligence (cs.AI)

    Ontohub is a repository engine for managing distributed heterogeneous
    ontologies. The distributed nature enables communities to share and exchange
    their contributions easily. The heterogeneous nature makes it possible to
    integrate ontologies written in various ontology languages. Ontohub supports a
    wide range of formal logical and ontology languages, as well as various
    structuring and modularity constructs and inter-theory (concept) mappings,
    building on the OMG-standardized DOL language. Ontohub repositories are
    organised as Git repositories, thus inheriting all features of this popular
    version control system. Moreover, Ontohub is the first repository engine
    meeting a substantial number of the requirements formulated in the context of
    the Open Ontology Repository (OOR) initiative, including an API for federation
    as well as support for logical inference and axiom selection.

    Crowdsourced Outcome Determination in Prediction Markets

    Rupert Freeman, Sebastien Lahaie, David M. Pennock
    Subjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)

    A prediction market is a useful means of aggregating information about a
    future event. To function, the market needs a trusted entity who will verify
    the true outcome in the end. Motivated by the recent introduction of
    decentralized prediction markets, we introduce a mechanism that allows for the
    outcome to be determined by the votes of a group of arbiters who may themselves
    hold stakes in the market. Despite the potential conflict of interest, we
    derive conditions under which we can incentivize arbiters to vote truthfully by
    using funds raised from market fees to implement a peer prediction mechanism.
    Finally, we investigate what parameter values could be used in a real-world
    implementation of our mechanism.
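    The following toy example makes "peer prediction funded by market fees" concrete through an output-agreement payment, the simplest peer-prediction-style rule. It is a stand-in, not the authors' mechanism. Plain output agreement does not by itself guarantee truthful voting, especially when arbiters hold stakes in the market. The deterministic peer pairing also simplifies the random pairing such mechanisms usually use.

```python
from collections import Counter


def decide_outcome(votes):
    """Majority vote over binary reports; ties resolve to 0 (an arbitrary choice here)."""
    counts = Counter(votes)
    return 1 if counts[1] > counts[0] else 0


def output_agreement_payments(votes, fee_pool):
    """Pay each arbiter an equal share of fee_pool if its vote matches its peer's vote.

    Peer of arbiter i is arbiter (i + 1) % n, a deterministic stand-in for random pairing.
    """
    n = len(votes)
    share = fee_pool / n
    return [share if votes[i] == votes[(i + 1) % n] else 0.0 for i in range(n)]


votes = [1, 1, 0, 1]
print(decide_outcome(votes))                   # 1
print(output_agreement_payments(votes, 8.0))   # [2.0, 0.0, 0.0, 2.0]
```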

    Collaborative creativity with Monte-Carlo Tree Search and Convolutional Neural Networks

    Memo Akten, Mick Grierson
    Comments: Presented at the Constructive Machine Learning workshop at NIPS 2016 as a poster and spotlight talk. 8 pages including 2 pages of references, a 2-page appendix, and 3 figures. Blog post (including videos) at this https URL
    Subjects: Artificial Intelligence (cs.AI)

    We investigate a human-machine collaborative drawing environment in which an
    autonomous agent sketches images while optionally allowing a user to directly
    influence the agent’s trajectory. We combine Monte Carlo Tree Search with image
    classifiers and test both shallow models (e.g. multinomial logistic regression)
    and deep Convolutional Neural Networks (e.g. LeNet, Inception v3). We found
    that using the shallow model, the agent produces a limited variety of images,
    which are noticeably recognisable by humans. However, using the deeper models,
    the agent produces a more diverse range of images, and while the agent remains
    very confident (99.99%) in having achieved its objective, to humans they mostly
    resemble unrecognisable ‘random’ noise. We relate this to recent research which
    also discovered that ‘deep neural networks are easily fooled’ [Nguyen2015]
    and we discuss possible solutions and future directions for the research.
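    The tree search can be sketched with the standard UCT selection rule (UCB1 applied to trees). This sketch assumes, hypothetically, that a rollout's value is the classifier's confidence in the target class. The abstract does not give the authors' exact selection rule or reward.

```python
import math


def uct_score(child_value_sum, child_visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: mean rollout value plus an exploration bonus; unvisited children first."""
    if child_visits == 0:
        return math.inf
    exploit = child_value_sum / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore


def select_child(children, parent_visits, c=math.sqrt(2)):
    """children: list of (action, value_sum, visits), where value_sum accumulates
    (assumed) classifier confidences from rollouts. Returns the action to descend into."""
    return max(children, key=lambda ch: uct_score(ch[1], ch[2], parent_visits, c))[0]
```

    With a deep classifier as the reward, the search will happily climb toward images the classifier scores near 99.99% even if humans see noise. This is the failure mode the abstract describes.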

    Neural Networks for Joint Sentence Classification in Medical Paper Abstracts

    Franck Dernoncourt, Ji Young Lee, Peter Szolovits
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Existing models based on artificial neural networks (ANNs) for sentence
    classification often do not incorporate the context in which sentences appear,
    and classify sentences individually. However, traditional sentence
    classification approaches have been shown to greatly benefit from jointly
    classifying subsequent sentences, such as with conditional random fields. In
    this work, we present an ANN architecture that combines the effectiveness of
    typical ANN models to classify sentences in isolation, with the strength of
    structured prediction. Our model achieves state-of-the-art results on two
    different datasets for sequential sentence classification in medical abstracts.
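    Joint classification means decoding the best label sequence for an abstract instead of taking each sentence's argmax. The sketch below is a minimal Viterbi decoder over per-sentence label scores (such as an ANN might output) and a label-transition score matrix, as in a linear-chain CRF. The scores are made up for illustration.

```python
def viterbi(emissions, transitions):
    """emissions[t][k]: score of label k for sentence t.
    transitions[j][k]: score of label j being followed by label k.
    Returns the highest-scoring label sequence."""
    n_labels = len(emissions[0])
    best = list(emissions[0])  # best[k]: best score of a path ending in label k
    back = []                  # back[t-1][k]: best previous label for label k at step t
    for t in range(1, len(emissions)):
        new_best, ptr = [], []
        for k in range(n_labels):
            prev = max(range(n_labels), key=lambda j: best[j] + transitions[j][k])
            new_best.append(best[prev] + transitions[prev][k] + emissions[t][k])
            ptr.append(prev)
        best = new_best
        back.append(ptr)
    last = max(range(n_labels), key=lambda k: best[k])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Labels: 0 = METHODS, 1 = RESULTS. RESULTS -> METHODS is penalised.
emissions = [[2, 0], [0, 0.3], [2, 0]]
transitions = [[0.5, 0], [-2, 0.5]]
print(viterbi(emissions, transitions))  # [0, 0, 0], whereas per-sentence argmax gives [0, 1, 0]
```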

    Improving Scalability of Reinforcement Learning by Separation of Concerns

    Harm van Seijen, Mehdi Fatemi, Joshua Romoff
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    In this paper, we propose a framework for solving a single-agent task by
    using multiple agents, each focusing on different aspects of the task. This
    approach has two main advantages: 1) it allows for specialized agents for
    different parts of the task, and 2) it provides a new way to transfer
    knowledge, by transferring trained agents. Our framework generalizes the
    traditional hierarchical decomposition, in which, at any moment in time, a
    single agent has control until it has solved its particular subtask. We
    illustrate our framework using a number of examples.
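    One simple way for several specialized agents to act as one is to sum their action values and act greedily. The sketch assumes each agent exposes Q-values over a shared action set and that the aggregator uses a weighted sum. The paper's framework is more general than this.

```python
def aggregate_action(agent_q_values, weights=None):
    """Pick the action maximising the weighted sum of the agents' Q-values.

    agent_q_values: list of dicts mapping action -> Q-value, one dict per specialised
    agent, all over the same action set. Equal weights by default (an assumption).
    """
    if weights is None:
        weights = [1.0] * len(agent_q_values)
    actions = list(agent_q_values[0])
    totals = {a: sum(w * q[a] for w, q in zip(weights, agent_q_values)) for a in actions}
    return max(actions, key=lambda a: totals[a])


avoid_danger = {"left": 0.0, "right": -5.0, "stay": 0.0}   # hypothetical agent 1
collect_reward = {"left": 1.0, "right": 3.0, "stay": 0.0}  # hypothetical agent 2
print(aggregate_action([avoid_danger, collect_reward]))    # left
```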

    Adversarial Message Passing For Graphical Models

    Theofanis Karaletsos
    Comments: (12 pages, 2 figures) Presented at NIPS Advances In Approximate Inference 2016 (AABI 2016)
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI)

    Bayesian inference on structured models typically relies on the ability to
    infer posterior distributions of underlying hidden variables. However,
    inference in implicit models or complex posterior distributions is hard. A
    popular tool for learning implicit models is the generative adversarial network
    (GAN), which learns the parameters of a generator by fooling a discriminator.
    Typically, GANs are considered to be models themselves and are not understood
    in the context of inference. Current techniques rely on inefficient global
    discrimination of joint distributions to perform learning, or only consider
    discriminating a single output variable. We overcome these limitations by
    treating GANs as a basis for likelihood-free inference in generative models and
    generalize them to Bayesian posterior inference over factor graphs. We propose
    local learning rules based on message passing minimizing a global divergence
    criterion involving cooperating local adversaries used to sidestep explicit
    likelihood evaluations. This allows us to compose models and yields a unified
    inference and learning framework for adversarial learning. Our framework treats
    model specification and inference separately and facilitates richly structured
    models within the family of Directed Acyclic Graphs, including components such
    as intractable likelihoods, non-differentiable models, simulators and generally
    cumbersome models. A key result of our treatment is the insight that Bayesian
    inference on structured models can be performed only with sampling and
    discrimination when using nonparametric variational families, without access to
    explicit distributions. As a side-result, we discuss the link to likelihood
    maximization. These approaches hold promise to be useful in the toolbox of
    probabilistic modelers and enrich the gamut of current probabilistic
    programming applications.

    TeKnowbase: Towards Construction of a Knowledge-base of Technical Concepts

    Prajna Upadhyay, Tanuma Patra, Ashwini Purkar, Maya Ramanath
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    In this paper, we describe the construction of TeKnowbase, a knowledge-base
    of technical concepts in computer science. Our main information sources are
    technical websites such as Webopedia and Techtarget as well as Wikipedia and
    online textbooks. We divide the knowledge-base construction problem into two
    parts — the acquisition of entities and the extraction of relationships among
    these entities. Our knowledge-base consists of approximately 100,000 triples.
    We conducted an evaluation on a sample of triples and report an accuracy of a
    little over 90%. We additionally conducted classification experiments on
    StackOverflow data with features from TeKnowbase and achieved improved
    classification accuracy.
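    Knowledge-base entries of this kind are (subject, relation, object) triples. The following is a minimal in-memory index for them. The relation names and entities are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


class TripleStore:
    """Index (subject, relation, object) triples by subject for simple lookups."""

    def __init__(self):
        self._by_subject = defaultdict(set)
        self.size = 0

    def add(self, subj, rel, obj):
        """Store a triple once; duplicates are ignored."""
        if (rel, obj) not in self._by_subject[subj]:
            self._by_subject[subj].add((rel, obj))
            self.size += 1

    def objects(self, subj, rel):
        """All objects o such that (subj, rel, o) is stored, sorted for stable output."""
        return sorted(o for r, o in self._by_subject.get(subj, ()) if r == rel)


kb = TripleStore()
kb.add("quicksort", "typeOf", "sorting algorithm")
kb.add("quicksort", "typeOf", "divide and conquer algorithm")
kb.add("TCP", "typeOf", "protocol")
print(kb.objects("quicksort", "typeOf"))  # ['divide and conquer algorithm', 'sorting algorithm']
```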

    Learning Through Dialogue Interactions

    Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, Jason Weston
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    A good dialogue agent should have the ability to interact with users. In this
    work, we explore this direction by designing a simulator and a set of synthetic
    tasks in the movie domain that allow the learner to interact with a teacher by
    both asking and answering questions. We investigate how a learner can benefit
    from asking questions in both an offline and online reinforcement learning
    setting. We demonstrate that the learner improves when asking questions. Our
    work represents a first step in developing end-to-end learned interactive
    dialogue agents.

    Dynamical Kinds and their Discovery

    Benjamin C. Jantzen
    Comments: Accepted for the proceedings of the Causation: Foundation to Application Workshop, UAI 2016
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)

    We demonstrate the possibility of classifying causal systems into kinds that
    share a common structure without first constructing an explicit dynamical model
    or using prior knowledge of the system dynamics. The algorithmic ability to
    determine whether arbitrary systems are governed by causal relations of the
    same form offers significant practical applications in the development and
    validation of dynamical models. It is also of theoretical interest as an
    essential stage in the scientific inference of laws from empirical data. The
    algorithm presented is based on the dynamical symmetry approach to dynamical
    kinds. A dynamical symmetry with respect to time is an intervention on one or
    more variables of a system that commutes with the time evolution of the system.
    A dynamical kind is a class of systems sharing a set of dynamical symmetries.
    The algorithm presented classifies deterministic, time-dependent causal systems
    by directly comparing their exhibited symmetries. Using simulated, noisy data
    from a variety of nonlinear systems, we show that this algorithm correctly
    sorts systems into dynamical kinds. It is robust under significant sampling
    error, is immune to violations of normality in sampling error, and fails
    gracefully with increasing dynamical similarity. The algorithm we demonstrate
    is the first to address this aspect of automated scientific discovery.
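    The definition above can be checked numerically. An intervention sigma is a dynamical symmetry if applying it and then evolving for time t gives the same state as evolving first and then applying it. This sketch uses RK4 integration on toy one-dimensional systems. It tests the definition at a single initial state and is not the authors' classification algorithm, which works from noisy data.

```python
def flow(f, x0, t, steps=1000):
    """Integrate dx/dt = f(x) from x0 for time t with classic fourth-order Runge-Kutta."""
    h = t / steps
    x = x0
    for _ in range(steps):
        k1 = f(x)
        k2 = f(x + h * k1 / 2)
        k3 = f(x + h * k2 / 2)
        k4 = f(x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x


def commutes(f, sigma, x0, t, tol=1e-6):
    """True if sigma(phi_t(x0)) equals phi_t(sigma(x0)) within tol."""
    return abs(sigma(flow(f, x0, t)) - flow(f, sigma(x0), t)) < tol


def scale(x):
    return 2 * x  # the candidate intervention: double the variable


print(commutes(lambda x: 0.5 * x, scale, 1.0, 2.0))      # True
print(commutes(lambda x: 1.3 * x, scale, 1.0, 2.0))      # True
print(commutes(lambda x: x * (1 - x), scale, 0.1, 2.0))  # False
```

    Scaling commutes with exponential growth at any rate, so dx/dt = 0.5x and dx/dt = 1.3x share this symmetry, while the logistic system does not.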

    Interpretable Semantic Textual Similarity: Finding and explaining differences between sentences

    I. Lopez-Gazpio, M. Maritxalar, A. Gonzalez-Agirre, G. Rigau, L. Uria, E. Agirre
    Comments: Preprint version, Knowledge-Based Systems (ISSN: 0950-7051). (2016)
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    User acceptance of artificial intelligence agents might depend on their
    ability to explain their reasoning, which requires adding an interpretability
    layer that helps users understand their behavior. This paper focuses
    on adding an interpretable layer on top of Semantic Textual Similarity (STS),
    which measures the degree of semantic equivalence between two sentences. The
    interpretability layer is formalized as the alignment between pairs of segments
    across the two sentences, where the relation between the segments is labeled
    with a relation type and a similarity score. We present a publicly available
    dataset of sentence pairs annotated following the formalization. We then
    develop a system trained on this dataset which, given a sentence pair, explains
    what is similar and different, in the form of graded and typed segment
    alignments. When evaluated on the dataset, the system performs better than an
    informed baseline, showing that the dataset and task are well-defined and
    feasible. Most importantly, two user studies show how the system output can be
    used to automatically produce explanations in natural language. Users performed
    better when having access to the explanations, providing preliminary evidence
    that our dataset and method for automatically producing explanations are useful in
    real applications.


    Information Retrieval

    Using the Context of User Feedback in Recommender Systems

    Ladislav Peska (Charles University in Prague, Faculty of Mathematics and Physics)
    Comments: In Proceedings MEMICS 2016, arXiv:1612.04037
    Journal-ref: EPTCS 233, 2016, pp. 1-12
    Subjects: Information Retrieval (cs.IR); Human-Computer Interaction (cs.HC)

    Our work is generally focused on recommending for small or medium-sized
    e-commerce portals, where explicit feedback is absent and thus the usage of
    implicit feedback is necessary. Nonetheless, for some implicit feedback
    features, the presentation context may be of high importance. In this paper, we
    present a model of relevant contextual features affecting user feedback,
    propose methods leveraging those features, publish a dataset of real e-commerce
    users containing multiple user feedback indicators as well as its context and
    finally present results of purchase prediction and recommendation experiments.
    Off-line experiments with real users of a Czech travel agency website
    corroborated the importance of leveraging presentation context in both purchase
    prediction and recommendation tasks.

    A Graph Summarization: A Survey

    Yike Liu, Abhilash Dighe, Tara Safavi, Danai Koutra
    Subjects: Information Retrieval (cs.IR)

    While advances in computing resources have made processing enormous amounts
    of data possible, human ability to identify patterns in such data has not
    scaled accordingly. Thus, efficient computational methods for condensing and
    simplifying data are becoming vital for extracting actionable insights. In
    particular, while data summarization techniques have been studied extensively,
    only recently has summarizing interconnected data, or graphs, become popular.
    This survey is a structured, comprehensive overview of the state-of-the-art
    methods for summarizing graph data. We first broach the motivation behind and
    the challenges of graph summarization. We then categorize summarization
    approaches by the type of graphs taken as input and further organize each
    category by core methodology. Finally, we discuss applications of summarization
    on real-world graphs and conclude by describing some open problems in the
    field.
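    Grouping-based summarization is one of several families of approaches such a survey covers. As a concrete example, nodes with identical neighbour sets can be merged into supernodes without losing any edge information. This is an illustrative technique, not a method attributed to the survey.

```python
from collections import defaultdict


def supernodes(adj):
    """adj: dict node -> set of neighbours (undirected). Merge nodes with identical
    neighbour sets; returns the groups as sorted lists."""
    groups = defaultdict(list)
    for node, nbrs in adj.items():
        groups[frozenset(nbrs)].append(node)
    return sorted(sorted(g) for g in groups.values())


def superedges(adj, groups):
    """Edges between supernodes, as pairs of group indices."""
    index = {n: i for i, g in enumerate(groups) for n in g}
    return sorted({tuple(sorted((index[u], index[v]))) for u in adj for v in adj[u]})


star = {"c": {"a", "b", "d"}, "a": {"c"}, "b": {"c"}, "d": {"c"}}
groups = supernodes(star)
print(groups)                     # [['a', 'b', 'd'], ['c']]
print(superedges(star, groups))   # [(0, 1)]
```

    The star's four nodes and three edges collapse to two supernodes and one superedge, which still determines the original graph exactly.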

    Towards End-to-End Audio-Sheet-Music Retrieval

    Matthias Dorfer, Andreas Arzt, Gerhard Widmer
    Comments: In NIPS 2016 End-to-end Learning for Speech and Audio Processing Workshop, Barcelona, Spain
    Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Learning (cs.LG)

    This paper demonstrates the feasibility of learning to retrieve short
    snippets of sheet music (images) when given a short query excerpt of music
    (audio), and vice versa, without any symbolic representation of music or
    scores. This would be highly useful in many content-based musical retrieval
    scenarios. Our approach is based on Deep Canonical Correlation Analysis (DCCA)
    and learns correlated latent spaces allowing for cross-modality retrieval in
    both directions. Initial experiments with relatively simple monophonic music
    show promising results.


    Computation and Language

    Neural Networks for Joint Sentence Classification in Medical Paper Abstracts

    Franck Dernoncourt, Ji Young Lee, Peter Szolovits
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Existing models based on artificial neural networks (ANNs) for sentence
    classification often do not incorporate the context in which sentences appear,
    and classify sentences individually. However, traditional sentence
    classification approaches have been shown to greatly benefit from jointly
    classifying subsequent sentences, such as with conditional random fields. In
    this work, we present an ANN architecture that combines the effectiveness of
    typical ANN models to classify sentences in isolation, with the strength of
    structured prediction. Our model achieves state-of-the-art results on two
    different datasets for sequential sentence classification in medical abstracts.

    Building a robust sentiment lexicon with (almost) no resource

    Mickael Rouvier, Benoit Favre
    Subjects: Computation and Language (cs.CL)

    Creating sentiment polarity lexicons is labor intensive. Automatically
    translating them from resourceful languages requires in-domain machine
    translation systems, which rely on large quantities of bi-texts. In this paper,
    we propose to replace machine translation by transferring words from the
    lexicon through word embeddings aligned across languages with a simple linear
    transform. The approach leads to no degradation, compared to machine
    translation, when tested on sentiment polarity classification on tweets from
    four languages.
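    The following sketches the transfer step. It assumes a least-squares linear map W fitted on a seed dictionary of translation pairs, followed by nearest-neighbour lookup in the target space. The toy 2-dimensional vectors and the Spanish words are invented, and the paper's exact fitting and lookup choices may differ.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_linear_map(X, Y):
    """Least-squares W (list of rows) with W x ~ y over seed pairs; row k solves
    (sum x x^T) w_k = sum x y_k (the normal equations)."""
    d = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(d)] for i in range(d)]
    return [solve(xtx, [sum(x[i] * y[k] for x, y in zip(X, Y)) for i in range(d)])
            for k in range(len(Y[0]))]


def apply_map(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def transfer_lexicon(src_lexicon, src_vecs, tgt_vecs, W):
    """Give each source word's polarity to the nearest target word (Euclidean) after mapping."""
    out = {}
    for word, polarity in src_lexicon.items():
        z = apply_map(W, src_vecs[word])
        nearest = min(tgt_vecs, key=lambda t: sum((a - b) ** 2 for a, b in zip(z, tgt_vecs[t])))
        out[nearest] = polarity
    return out


# Toy seed dictionary whose target space is the source space rotated by 90 degrees.
W = fit_linear_map([[1, 0], [0, 1], [1, 1]], [[0, 1], [-1, 0], [-1, 1]])
src = {"good": [1.0, 0.0], "bad": [-1.0, 0.0]}
tgt = {"bueno": [0.0, 1.0], "malo": [0.0, -1.0]}
print(transfer_lexicon({"good": 1, "bad": -1}, src, tgt, W))  # {'bueno': 1, 'malo': -1}
```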

    Transition-based Parsing with Context Enhancement and Future Reward Reranking

    Fugen Zhou, Fuxiang Wu, Zhengchen Zhang, Minghui Dong
    Subjects: Computation and Language (cs.CL)

    This paper presents a novel reranking model, future reward reranking, to
    re-score the actions in a transition-based parser by using a global scorer.
    Unlike conventional reranking parsing, the model searches for the best
    dependency tree among all feasible trees constrained by a sequence of actions to
    obtain the future reward of that sequence. The scorer is based on a first-order
    graph-based parser with bidirectional LSTMs, which captures a different parsing
    view from the transition-based parser. In addition, since context
    enhancement has been shown to substantially improve parsing accuracy in the
    arc-standard transition-based parser, we implement context enhancement on an
    arc-eager transition-based parser with stack LSTMs, supporting the dynamic
    oracle and dropout, and achieve further improvement. With the global scorer and
    context enhancement, the results show that the UAS of the parser increases by as much
    as 1.20% for English and 1.66% for Chinese, and the LAS increases by as much as 1.32%
    for English and 1.63% for Chinese. Moreover, we obtain state-of-the-art LASs,
    achieving 87.58% for Chinese and 93.37% for English.
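    For readers unfamiliar with the arc-eager system, the following is a minimal sketch of its four transitions applied to a given action sequence. The stack-LSTM scorer, dynamic oracle, and future-reward reranker are not modelled, and the example sentence is invented.

```python
def arc_eager(n_words, actions):
    """Apply arc-eager actions to words 1..n_words (0 is ROOT).
    Returns the set of (head, dependent) arcs."""
    stack, buffer = [0], list(range(1, n_words + 1))
    arcs, has_head = set(), set()
    for act in actions:
        if act == "SHIFT":
            stack.append(buffer.pop(0))
        elif act == "LEFT-ARC":
            # front of the buffer becomes head of the stack top, which is popped
            dep = stack[-1]
            if dep == 0 or dep in has_head:
                raise ValueError("LEFT-ARC precondition violated")
            stack.pop()
            arcs.add((buffer[0], dep))
            has_head.add(dep)
        elif act == "RIGHT-ARC":
            # stack top becomes head of the buffer front, which is pushed
            dep = buffer.pop(0)
            arcs.add((stack[-1], dep))
            has_head.add(dep)
            stack.append(dep)
        elif act == "REDUCE":
            # pop a word that already has its head
            if stack[-1] not in has_head:
                raise ValueError("REDUCE precondition violated")
            stack.pop()
        else:
            raise ValueError(f"unknown action {act}")
    return arcs


# "She eats fish": 1 = She, 2 = eats, 3 = fish
print(sorted(arc_eager(3, ["SHIFT", "LEFT-ARC", "RIGHT-ARC", "RIGHT-ARC"])))
# [(0, 2), (2, 1), (2, 3)]
```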

    TeKnowbase: Towards Construction of a Knowledge-base of Technical Concepts

    Prajna Upadhyay, Tanuma Patra, Ashwini Purkar, Maya Ramanath
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    In this paper, we describe the construction of TeKnowbase, a knowledge-base
    of technical concepts in computer science. Our main information sources are
    technical websites such as Webopedia and Techtarget as well as Wikipedia and
    online textbooks. We divide the knowledge-base construction problem into two
    parts — the acquisition of entities and the extraction of relationships among
    these entities. Our knowledge-base consists of approximately 100,000 triples.
    We conducted an evaluation on a sample of triples and report an accuracy of a
    little over 90%. We additionally conducted classification experiments on
    StackOverflow data with features from TeKnowbase and achieved improved
    classification accuracy.

    Learning Through Dialogue Interactions

    Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, Jason Weston
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    A good dialogue agent should have the ability to interact with users. In this
    work, we explore this direction by designing a simulator and a set of synthetic
    tasks in the movie domain that allow the learner to interact with a teacher by
    both asking and answering questions. We investigate how a learner can benefit
    from asking questions in both an offline and online reinforcement learning
    setting. We demonstrate that the learner improves when asking questions. Our
    work represents a first step in developing end-to-end learned interactive
    dialogue agents.

    Interpretable Semantic Textual Similarity: Finding and explaining differences between sentences

    I. Lopez-Gazpio, M. Maritxalar, A. Gonzalez-Agirre, G. Rigau, L. Uria, E. Agirre
    Comments: Preprint version, Knowledge-Based Systems (ISSN: 0950-7051). (2016)
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    User acceptance of artificial intelligence agents might depend on their
    ability to explain their reasoning, which requires adding an interpretability
    layer that helps users understand their behavior. This paper focuses
    on adding an interpretable layer on top of Semantic Textual Similarity (STS),
    which measures the degree of semantic equivalence between two sentences. The
    interpretability layer is formalized as the alignment between pairs of segments
    across the two sentences, where the relation between the segments is labeled
    with a relation type and a similarity score. We present a publicly available
    dataset of sentence pairs annotated following the formalization. We then
    develop a system trained on this dataset which, given a sentence pair, explains
    what is similar and different, in the form of graded and typed segment
    alignments. When evaluated on the dataset, the system performs better than an
    informed baseline, showing that the dataset and task are well-defined and
    feasible. Most importantly, two user studies show how the system output can be
    used to automatically produce explanations in natural language. Users performed
    better when having access to the explanations, providing preliminary evidence
    that our dataset and method for automatically producing explanations are useful in
    real applications.

    Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering

    Hao Liu, Yang Yang, Fumin Shen, Lixin Duan, Heng Tao Shen
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

    Along with the success of recurrent neural networks in modelling sequential
    data and the power of attention mechanisms in automatically identifying salient
    information, image captioning, a.k.a. image description, has advanced
    remarkably in recent years. Nonetheless, most existing paradigms may suffer from
    a lack of invariance to images with different scaling, rotation, etc., and
    from the lack of effective integration of standalone attention into a holistic
    end-to-end system. In this paper, we propose a novel image captioning architecture, termed
    Recurrent Image Captioner (RIC), which allows the visual encoder and
    language decoder to coherently cooperate in a recurrent manner. Specifically,
    we first equip the CNN-based visual encoder with a differentiable layer to enable
    spatially invariant transformation of visual signals. Moreover, we deploy an
    attention filter module (differentiable) between encoder and decoder to
    dynamically determine salient visual parts. We also employ bidirectional LSTM
    to preprocess sentences for generating better textual representations. Besides,
    we propose to exploit variational inference to optimize the whole architecture.
    Extensive experimental results on three benchmark datasets (i.e., Flickr8k,
    Flickr30k and MS COCO) demonstrate the superiority of our proposed architecture
    as compared to most of the state-of-the-art methods.
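    The abstract does not give the attention filter's exact form. The following generic dot-product soft attention over region features is an assumption on that point. It illustrates how weights over visual parts can be computed differentiably.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attend(query, regions):
    """Dot-product soft attention: weight each region feature vector by
    softmax(query . feature) and return (weights, weighted-sum context vector)."""
    weights = softmax([sum(q * f for q, f in zip(query, r)) for r in regions])
    dim = len(regions[0])
    context = [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]
    return weights, context


regions = [[1.0, 0.0], [0.0, 1.0]]  # two hypothetical region features
print(attend([0.0, 0.0], regions))  # ([0.5, 0.5], [0.5, 0.5])
```

    A query aligned with the first region pushes most of the weight there, so the context vector approaches that region's features.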


    Distributed, Parallel, and Cluster Computing

    Private Learning on Networks

    Shripad Gade, Nitin H. Vaidya
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Optimization and Control (math.OC)

    Continual data collection and widespread deployment of machine learning
    algorithms, particularly the distributed variants, have raised new privacy
    challenges. In a distributed machine learning scenario, the dataset is stored
    among several machines and they solve a distributed optimization problem to
    collectively learn the underlying model. We present a secure multi-party
    computation inspired privacy preserving distributed algorithm for optimizing a
    convex function consisting of several possibly non-convex functions. Each
    individual objective function is privately stored with an agent while the
    agents communicate model parameters with neighbor machines connected in a
    network. We show that our algorithm can correctly optimize the overall
    objective function and learn the underlying model accurately. We further prove
    that under a vertex connectivity condition on the topology, our algorithm
    preserves the privacy of individual objective functions. We establish limits on
    what a coalition of adversaries can learn by observing the messages and states
    shared over a network.
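    One idea behind such schemes can be illustrated with perturbations that cancel across the network. Each neighbouring pair shares a random term that one agent adds and the other subtracts. Every agent's local objective is hidden, while the sum, and therefore its minimizer, is unchanged. This is a hedged, centralized toy (gradient descent on the summed objective, for brevity), not the authors' distributed algorithm or its privacy guarantee.

```python
import random


def perturb(centers, edges, rng):
    """Agent i privately holds f_i(x) = (x - centers[i])**2. For each edge (i, j), draw r
    and give agent i the extra term +r*x and agent j the term -r*x.
    Returns each agent's total linear coefficient; these sum to zero."""
    extra = [0.0] * len(centers)
    for i, j in edges:
        r = rng.uniform(-10, 10)
        extra[i] += r
        extra[j] -= r
    return extra


def perturbed_grad(x, centers, extra):
    """Gradient of sum_i [(x - c_i)**2 + extra_i * x]; equals the unperturbed gradient."""
    return sum(2 * (x - c) + e for c, e in zip(centers, extra))


def minimize(centers, extra, lr=0.05, steps=500):
    x = 0.0
    for _ in range(steps):
        x -= lr * perturbed_grad(x, centers, extra)
    return x


centers = [1.0, 2.0, 6.0]
extra = perturb(centers, [(0, 1), (1, 2), (2, 0)], random.Random(0))
print(round(minimize(centers, extra), 6))  # 3.0, the minimizer of the unperturbed sum
```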

    GentleRain+: Making GentleRain Robust on Clock Anomalies

    Mohammad Roohitavaf, Sandeep Kulkarni
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Causal consistency is an intermediate consistency model that can be
    achieved together with high availability and high performance requirements even
    in the presence of network partitions. There are several proposals in the
    literature for causally consistent data stores. Thanks to the use of single
    scalar physical clocks, GentleRain has a higher throughput than other proposals
    such as COPS or Orbe. However, both its correctness and its performance rely on
    monotonic, synchronized physical clocks. Specifically, if physical clocks go
    backward, its correctness is violated. In addition, GentleRain is sensitive to
    clock synchronization, and clock skew may slow write operations in
    GentleRain. In this paper, we address this issue in GentleRain by using a
    Hybrid Logical Clock (HLC) instead of physical clocks. Using HLC, the GentleRain
    protocol is no longer sensitive to clock skew. In addition, even if clocks
    go backward, the correctness of the system is not violated. Furthermore, with
    HLC, we timestamp versions with a clock very close to the physical clocks.
    Thus, we can take a causally consistent snapshot of the system at any given
    physical time. We call the GentleRain protocol with HLCs GentleRain+. We have
    implemented the GentleRain+ protocol and evaluated it experimentally.
    GentleRain+ provides faster write operations than GentleRain, which relies
    solely on physical clocks to achieve causal consistency. We have also shown
    that using HLC instead of physical clocks does not incur any overhead. Thus, it
    makes GentleRain more robust to clock anomalies at no cost.
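    The HLC update rules, from the Hybrid Logical Clock proposal of Kulkarni et al., are short enough to sketch. The example shows that timestamps stay monotonic even when the physical clock reading goes backward. How GentleRain+ uses these timestamps for stabilization is not modelled.

```python
class HybridLogicalClock:
    """l tracks the largest physical time seen; c counts events sharing that l."""

    def __init__(self):
        self.l = 0
        self.c = 0

    def now(self, pt):
        """Local or send event at physical time pt; returns the timestamp (l, c)."""
        old_l = self.l
        self.l = max(old_l, pt)
        self.c = self.c + 1 if self.l == old_l else 0
        return (self.l, self.c)

    def receive(self, pt, msg_l, msg_c):
        """Receive event carrying timestamp (msg_l, msg_c) at physical time pt."""
        old_l = self.l
        self.l = max(old_l, msg_l, pt)
        if self.l == old_l == msg_l:
            self.c = max(self.c, msg_c) + 1
        elif self.l == old_l:
            self.c += 1
        elif self.l == msg_l:
            self.c = msg_c + 1
        else:
            self.c = 0
        return (self.l, self.c)


clock = HybridLogicalClock()
print(clock.now(100))              # (100, 0)
print(clock.now(90))               # (100, 1): physical clock went backward, HLC did not
print(clock.receive(95, 120, 3))   # (120, 4)
```

    Timestamps compare lexicographically as (l, c) tuples, and l never drifts far from physical time. This is what allows snapshots at a given physical time.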

    Scalable Byzantine Consensus via Hardware-assisted Secret Sharing

    Jian Liu, Wenting Li, Ghassan O. Karame, N. Asokan
    Comments: 11 pages, 10 figures
    Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

    The surging interest in blockchain technology has revitalized the search for
    effective Byzantine consensus schemes. In particular, the blockchain community
    has been looking for ways to effectively integrate traditional Byzantine
    fault-tolerant (BFT) protocols into a blockchain consensus layer allowing
    various financial institutions to securely agree on the order of transactions.
    However, existing BFT protocols can only scale to tens of nodes due to their
    O(n^2) message complexity.

    In this paper, we propose FastBFT, the fastest and most scalable BFT protocol
    to date. At the heart of FastBFT is a novel message aggregation technique that
    combines hardware-based trusted execution environments (TEEs) with lightweight
    secret sharing primitives. Combining this technique with several other
    optimizations (i.e., optimistic execution, tree topology and failure
    detection), FastBFT achieves low latency and high throughput even for large
    scale networks. Via systematic analysis and experiments, we demonstrate that
    FastBFT has better scalability and performance than previous BFT protocols.
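    The abstract does not specify FastBFT's secret-sharing scheme. As a generic example of a lightweight primitive, n-of-n XOR secret sharing splits a secret so that all shares are needed to reconstruct it. The TEE side of FastBFT is not modelled.

```python
import secrets


def split_secret(secret: bytes, n: int) -> list[bytes]:
    """n-of-n XOR sharing: n-1 uniformly random shares plus one share chosen so
    that the XOR of all n shares equals the secret."""
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    last = bytes(secret)
    for s in shares:
        last = bytes(a ^ b for a, b in zip(last, s))
    return shares + [last]


def reconstruct(shares: list[bytes]) -> bytes:
    """XOR all shares together; any strict subset reveals nothing about the secret."""
    out = bytes(len(shares[0]))
    for s in shares:
        out = bytes(a ^ b for a, b in zip(out, s))
    return out


shares = split_secret(b"commit", 5)
print(reconstruct(shares))  # b'commit'
```

    XOR and a random-byte generator are far cheaper than public-key signatures. This is the general reason secret sharing can reduce per-message cost in such protocols.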

    Efficient Distributed Semi-Supervised Learning using Stochastic Regularization over Affinity Graphs

    Sunil Thulasidasan, Jeffrey Bilmes, Garrett Kenyon
    Comments: NIPS 2016 Workshop on Machine Learning Systems
    Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

    We describe a computationally efficient, stochastic graph-regularization
    technique that can be utilized for the semi-supervised training of deep neural
    networks in a parallel or distributed setting. We utilize a technique, first
    described in [13], for the construction of mini-batches for stochastic gradient
    descent (SGD) based on synthesized partitions of an affinity graph that are
    consistent with the graph structure, but also preserve enough stochasticity for
    convergence of SGD to good local minima. We show how our technique allows a
    graph-based semi-supervised loss function to be decomposed into a sum over
    objectives, facilitating data parallelism for scalable training of machine
    learning models. Empirical results indicate that our method significantly
    improves classification accuracy compared to the fully-supervised case when the
    fraction of labeled data is low, and in the parallel case, achieves significant
    speed-up in terms of wall-clock time to convergence. We show the results for
    both sequential and distributed-memory semi-supervised DNN training on a speech
    corpus.
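    The decomposition claim can be illustrated with a toy objective. The sketch
    below assumes (our choice, not the paper's) a squared-error loss on labeled
    nodes plus a weighted smoothness penalty over graph edges, and keeps only
    edges whose endpoints lie in the same partition. Under that assumption the
    objective is exactly a sum of per-partition terms, so partitions can be
    handed to different workers, while shuffling whole partitions into
    mini-batches keeps some stochasticity. The partitioning and all function
    names are hypothetical.

```python
import random


def graph_reg_loss(nodes, preds, labels, edges, lam):
    """Squared error on labeled nodes in `nodes` plus lam times a weighted
    smoothness penalty over edges with both endpoints in `nodes`."""
    node_set = set(nodes)
    supervised = sum((preds[i] - labels[i]) ** 2 for i in node_set if i in labels)
    smooth = sum(w * (preds[i] - preds[j]) ** 2
                 for i, j, w in edges if i in node_set and j in node_set)
    return supervised + lam * smooth


def within_partition_edges(edges, partitions):
    """Keep only edges whose endpoints fall in the same partition."""
    part_of = {v: k for k, part in enumerate(partitions) for v in part}
    return [(i, j, w) for i, j, w in edges if part_of[i] == part_of[j]]


def make_minibatches(partitions, parts_per_batch, rng):
    """Shuffle whole partitions and group them into mini-batches, so each
    batch respects the graph structure but batches vary between epochs."""
    order = list(partitions)
    rng.shuffle(order)
    return [sum(order[k:k + parts_per_batch], [])
            for k in range(0, len(order), parts_per_batch)]


# Hypothetical toy graph: two triangles joined by one weak edge (2, 3).
preds = [0.9, 0.8, 0.7, 0.1, 0.2, 0.3]
labels = {0: 1.0, 5: 0.0}
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 0.1)]
partitions = [[0, 1, 2], [3, 4, 5]]

kept = within_partition_edges(edges, partitions)
total = graph_reg_loss(range(6), preds, labels, kept, lam=0.5)
per_partition = [graph_reg_loss(p, preds, labels, kept, lam=0.5) for p in partitions]
batches = make_minibatches(partitions, parts_per_batch=1, rng=random.Random(0))
print(total, sum(per_partition))  # equal: the loss splits by partition
```

    The dropped cross-partition edge (2, 3) is the price of this
    decomposition; a good partitioning keeps such edges few and light.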


    Learning

    Tunable Efficient Unitary Neural Networks (EUNN) and their application to RNN

    Li Jing, Yichen Shen, Tena Dubček, John Peurifoy, Scott Skirlo, Max Tegmark, Marin Soljačić
    Comments: 9 pages, 4 figures
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    We present a method for implementing an Efficient Unitary Neural Network
    (EUNN) whose computational complexity is merely $\mathcal{O}(1)$ per parameter
    and has full tunability, from spanning part of unitary space to all of it. We
    apply the EUNN in Recurrent Neural Networks, and test its performance on the
    standard copying task and the MNIST digit recognition benchmark, finding that
    it significantly outperforms a non-unitary RNN, an LSTM network, an exclusively
    partial space URNN and a projective URNN with comparable parameter numbers.
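    As a rough illustration of why a unitary layer can cost O(1) per
    parameter, the sketch below builds a real orthogonal matrix, the real-valued
    analogue of a unitary one, as layers of 2x2 Givens rotations arranged in a
    brick-wall pattern. Each angle touches only two vector entries, and the
    number of layers is the tunable knob: with n layers the parameter count
    reaches n(n-1)/2, the dimension of the rotation group. This is a simplified
    sketch under those assumptions; EUNN itself works with complex unitary
    matrices and extra phase parameters, which are omitted here, and the
    function names are hypothetical.

```python
import math
import random


def layer_pairs(n, layer):
    """Index pairs rotated in one layer: (0,1), (2,3), ... on even layers and
    (1,2), (3,4), ... on odd layers (a brick-wall pattern)."""
    return [(i, i + 1) for i in range(layer % 2, n - 1, 2)]


def init_angles(n, num_layers, rng):
    """One rotation angle per pair per layer; num_layers tunes the capacity."""
    return [[rng.uniform(-math.pi, math.pi) for _ in layer_pairs(n, layer)]
            for layer in range(num_layers)]


def apply_orthogonal(x, angles):
    """Multiply x by a product of Givens rotations. Each angle updates only
    two entries, so the cost is O(1) per parameter."""
    y = list(x)
    for layer, thetas in enumerate(angles):
        for (i, j), theta in zip(layer_pairs(len(y), layer), thetas):
            c, s = math.cos(theta), math.sin(theta)
            y[i], y[j] = c * y[i] - s * y[j], s * y[i] + c * y[j]
    return y


def norm(v):
    return math.sqrt(sum(t * t for t in v))


rng = random.Random(0)
n = 6
angles = init_angles(n, num_layers=n, rng=rng)  # hypothetical: full parameter count
x = [rng.gauss(0, 1) for _ in range(n)]
y = apply_orthogonal(x, angles)
print(sum(len(a) for a in angles), norm(x), norm(y))  # norms agree
```

    Using fewer layers spans only part of the rotation group at lower cost,
    which mirrors the tunability the abstract describes.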

    Improving Scalability of Reinforcement Learning by Separation of Concerns

    Harm van Seijen, Mehdi Fatemi, Joshua Romoff
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    In this paper, we propose a framework for solving a single-agent task by
    using multiple agents, each focusing on different aspects of the task. This
    approach has two main advantages: 1) it allows for specialized agents for
    different parts of the task, and 2) it provides a new way to transfer
    knowledge, by transferring trained agents. Our framework generalizes the
    traditional hierarchical decomposition, in which, at any moment in time, a
    single agent has control until it has solved its particular subtask. We
    illustrate our framework using a number of examples.

    Coupling Adaptive Batch Sizes with Learning Rates

    Lukas Balles, Javier Romero, Philipp Hennig
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Mini-batch stochastic gradient descent and variants thereof have become
    standard for large-scale empirical risk minimization like the training of
    neural networks. These methods are usually used with a constant batch size
    chosen by simple empirical inspection. The batch size significantly influences
    the behavior of the stochastic optimization algorithm, though, since it
    determines the variance of the gradient estimates. This variance also changes
    over the optimization process; when using a constant batch size, stability and
    convergence are thus often enforced by means of a (manually tuned) decreasing
    learning rate schedule. We propose a practical method for dynamic batch size
    adaptation. It estimates the variance of the stochastic gradients and adapts
    the batch size to decrease the variance proportionally to the value of the
    objective function, removing the need for the aforementioned learning rate
    decrease. In contrast to recent related work, our algorithm couples the batch
    size to the learning rate, directly reflecting the known relationship between
    the two. On three image classification benchmarks, our batch size adaptation
    yields faster optimization convergence, while simultaneously simplifying
    learning rate tuning. A TensorFlow implementation is available.
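    A minimal sketch of a batch-size rule in this spirit, assuming the batch
    size is set to B = lr * tr(Sigma) / L, where lr is the learning rate,
    tr(Sigma) is the total variance of a single-example gradient and L is the
    current objective value. With this choice the mini-batch gradient variance
    tr(Sigma) / B equals L / lr, so it shrinks in proportion to the objective
    without a decaying learning rate. Consult the paper for its exact rule; the
    clipping bounds and function names below are hypothetical.

```python
def gradient_variance_trace(per_example_grads):
    """Sum over coordinates of the sample variance of per-example gradients,
    an estimate of tr(Sigma) for a single-example gradient."""
    m = len(per_example_grads)
    total = 0.0
    for d in range(len(per_example_grads[0])):
        vals = [g[d] for g in per_example_grads]
        mean = sum(vals) / m
        total += sum((v - mean) ** 2 for v in vals) / (m - 1)
    return total


def adapted_batch_size(lr, per_example_grads, loss, b_min=16, b_max=4096):
    """Hypothetical rule B = lr * tr(Sigma) / loss, rounded and clipped."""
    b = lr * gradient_variance_trace(per_example_grads) / max(loss, 1e-12)
    return int(min(max(round(b), b_min), b_max))


grads = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]  # hypothetical per-example gradients
print(adapted_batch_size(0.1, grads, loss=1.0))    # early: large loss
print(adapted_batch_size(0.1, grads, loss=0.001))  # later: small loss
```

    As the loss falls, B grows, taking over the role that a decreasing
    learning-rate schedule would otherwise play.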

    A Fully Convolutional Deep Auditory Model for Musical Chord Recognition

    Filip Korzeniowski, Gerhard Widmer
    Comments: In Proceedings of the 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), Vietri sul Mare, Italy
    Subjects: Learning (cs.LG); Sound (cs.SD)

    Chord recognition systems depend on robust feature extraction pipelines.
    While these pipelines are traditionally hand-crafted, recent advances in
    end-to-end machine learning have begun to inspire researchers to explore
    data-driven methods for such tasks. In this paper, we present a chord
    recognition system that uses a fully convolutional deep auditory model for
    feature extraction. The extracted features are processed by a Conditional
    Random Field that decodes the final chord sequence. Both processing stages are
    trained automatically and do not require expert knowledge for optimising
    parameters. We show that the learned auditory system extracts musically
    interpretable features, and that the proposed chord recognition system achieves
    results on par with or better than state-of-the-art algorithms.
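    The abstract does not give the CRF's parameterization, but decoding the
    best label sequence from a linear-chain CRF is typically done with the
    Viterbi algorithm. The sketch below assumes per-frame chord scores
    (standing in for the features from the convolutional model) and a
    hypothetical transition matrix that rewards staying on the same chord; the
    chord labels and numbers are made up for illustration.

```python
def viterbi(emissions, transitions, labels):
    """Best label sequence for a linear-chain model with additive scores.

    emissions[t][k]: score of label k at frame t; transitions[a][b]: score of
    moving from label a to label b between consecutive frames."""
    n_labels = len(labels)
    score = list(emissions[0])
    backpointers = []
    for frame in emissions[1:]:
        new_score, pointers = [], []
        for b in range(n_labels):
            best_a = max(range(n_labels), key=lambda a: score[a] + transitions[a][b])
            new_score.append(score[best_a] + transitions[best_a][b] + frame[b])
            pointers.append(best_a)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final label.
    k = max(range(n_labels), key=lambda a: score[a])
    path = [k]
    for pointers in reversed(backpointers):
        k = pointers[k]
        path.append(k)
    return [labels[i] for i in reversed(path)]


labels = ["C:maj", "G:maj", "N"]
# Hypothetical per-frame scores: the third frame weakly prefers G:maj.
emissions = [[2.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.5, 1.0, 0.0], [2.0, 0.0, 0.0]]
# Hypothetical transitions: staying on the same chord earns +1.
transitions = [[1.0 if a == b else 0.0 for b in range(3)] for a in range(3)]
print(viterbi(emissions, transitions, labels))
```

    A frame-wise argmax would switch to G:maj in the third frame; the
    transition scores make the decoder keep C:maj throughout, smoothing out the
    one-frame blip.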

    Towards Score Following in Sheet Music Images

    Matthias Dorfer, Andreas Arzt, Gerhard Widmer
    Comments: Published in Proceedings of the 17th International Society for Music Information Retrieval Conference (2016)
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    This paper addresses the matching of short music audio snippets to the
    corresponding pixel location in images of sheet music. A system is presented
    that simultaneously learns to read notes, listens to music and matches the
    currently played music to its corresponding notes in the sheet. It consists of
    an end-to-end multi-modal convolutional neural network that takes as input
    images of sheet music and spectrograms of the respective audio snippets. It
    learns to predict, for a given unseen audio snippet (covering approximately one
    bar of music), the corresponding position in the respective score line. Our
    results suggest that with the use of (deep) neural networks — which have
    proven to be powerful image processing models — working with sheet music
    becomes feasible and a promising future research direction.

    A Data-Driven Compressive Sensing Framework Tailored For Energy-Efficient Wearable Sensing

    Kai Xu, Yixing Li, Fengbo Ren
    Comments: Accepted as an oral presentation at the 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    Subjects: Learning (cs.LG); Information Theory (cs.IT)

    Compressive sensing (CS) is a promising technology for realizing
    energy-efficient wireless sensors for long-term health monitoring. However,
    conventional model-driven CS frameworks suffer from limited compression ratio
    and reconstruction quality when dealing with physiological signals due to
    inaccurate models and the neglect of individual variability. In this paper, we
    propose a data-driven CS framework that can learn signal characteristics and
    personalized features from any individual recording of physiologic signals to
    enhance CS performance with a minimized number of measurements. Such
    improvements are accomplished by a co-training approach that optimizes the
    sensing matrix and the dictionary towards improved restricted isometry property
    and signal sparsity, respectively. Experimental results upon ECG signals show
    that the proposed method, at a compression ratio of 10x, successfully reduces
    the isometry constant of the trained sensing matrices by 86% against random
    matrices and improves the overall reconstructed signal-to-noise ratio by 15dB
    over conventional model-driven approaches.
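    The restricted isometry property asks that ||Phi x||^2 stay within
    (1 +/- delta) ||x||^2 for every sparse x. Computing the exact constant is
    intractable in general, but a Monte Carlo estimate over random sparse
    vectors gives a lower bound, which is one way to compare sensing matrices.
    The sketch below does this for a random Gaussian matrix at a 10x
    compression ratio; the sizes, sparsity level and function names are
    assumptions, and it does not implement the paper's co-training of the
    sensing matrix and dictionary.

```python
import math
import random


def random_sensing_matrix(m, n, rng):
    """Gaussian m x n matrix scaled so that E||Phi x||^2 = ||x||^2."""
    return [[rng.gauss(0, 1 / math.sqrt(m)) for _ in range(n)] for _ in range(m)]


def random_sparse_vector(n, s, rng):
    """Vector of length n with s nonzero Gaussian entries at random positions."""
    x = [0.0] * n
    for i in rng.sample(range(n), s):
        x[i] = rng.gauss(0, 1)
    return x


def empirical_isometry_constant(phi, s, trials, rng):
    """Largest observed | ||Phi x||^2 / ||x||^2 - 1 | over random s-sparse x.
    This is only a lower bound on the true restricted isometry constant."""
    n = len(phi[0])
    worst = 0.0
    for _ in range(trials):
        x = random_sparse_vector(n, s, rng)
        support = [i for i in range(n) if x[i] != 0.0]
        y = [sum(row[i] * x[i] for i in support) for row in phi]
        ratio = sum(v * v for v in y) / sum(v * v for v in x)
        worst = max(worst, abs(ratio - 1.0))
    return worst


rng = random.Random(0)
n, cr = 200, 10  # hypothetical signal length and compression ratio
phi = random_sensing_matrix(n // cr, n, rng)
delta = empirical_isometry_constant(phi, s=5, trials=200, rng=rng)
print(f"{n // cr} measurements, estimated delta_5 >= {delta:.2f}")
```

    A data-driven method would aim to shrink this quantity on the signal class
    of interest rather than on arbitrary sparse vectors.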

    Bayesian Optimization for Machine Learning: A Practical Guidebook

    Ian Dewancker, Michael McCourt, Scott Clark
    Subjects: Learning (cs.LG)

    The engineering of machine learning systems is still a nascent field, relying
    on a seemingly daunting collection of quickly evolving tools and best
    practices. It is our hope that this guidebook will serve as a useful resource
    for machine learning practitioners looking to take advantage of Bayesian
    optimization techniques. We outline four example machine learning problems that
    can be solved using open source machine learning libraries, and highlight the
    benefits of using Bayesian optimization in the context of these common machine
    learning applications.

    Constraint Selection in Metric Learning

    Hoel Le Capitaine
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    A number of machine learning algorithms use a metric, or a distance, in
    order to compare individuals. The Euclidean distance is usually employed, but
    it may be more efficient to learn a parametric distance such as the
    Mahalanobis metric. Learning such a metric has been a hot topic for more than
    ten years, and a number of methods have been proposed to learn it
    efficiently. However, the nature of the problem makes it quite difficult for
    large-scale data, as well as for data in which classes overlap. This paper
    presents a simple way of improving the accuracy and scalability of any
    iterative metric learning algorithm in which constraints are obtained prior
    to running the algorithm. The proposed approach relies on a loss-dependent
    weighted selection of the constraints used for learning the metric. Using the
    corresponding dedicated loss function, the method obtains clearly better
    results than state-of-the-art methods, in terms of both accuracy and time
    complexity. Experimental results on real-world, and potentially large,
    datasets demonstrate the effectiveness of our proposition.
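    A minimal sketch of loss-dependent constraint selection, assuming squared
    Mahalanobis distances d_M(x, y) = (x - y)^T M (x - y) with M = L^T L, hinge
    losses with hypothetical thresholds for similar and dissimilar pairs, and
    sampling with probability proportional to each constraint's current loss.
    The paper's actual weighting scheme and dedicated loss function may differ;
    the selected subset would feed one iteration of whatever metric learner is
    being accelerated. Function names and data are hypothetical.

```python
import random


def mahalanobis_sq(x, y, L):
    """(x - y)^T M (x - y) with M = L^T L, computed as ||L (x - y)||^2."""
    diff = [a - b for a, b in zip(x, y)]
    proj = [sum(r * d for r, d in zip(row, diff)) for row in L]
    return sum(p * p for p in proj)


def constraint_loss(constraint, X, L, upper=1.0, lower=2.0):
    """Hinge loss of a pairwise constraint (i, j, similar): similar pairs
    should be closer than `upper`, dissimilar pairs farther than `lower`
    (both thresholds are hypothetical)."""
    i, j, similar = constraint
    d = mahalanobis_sq(X[i], X[j], L)
    return max(0.0, d - upper) if similar else max(0.0, lower - d)


def select_constraints(constraints, X, L, k, rng, eps=1e-3):
    """Sample k constraints with probability proportional to current loss,
    plus eps so that satisfied constraints keep a small chance."""
    weights = [constraint_loss(c, X, L) + eps for c in constraints]
    return rng.choices(constraints, weights=weights, k=k)


X = [[0.0, 0.0], [0.5, 0.0], [3.0, 0.0], [0.0, 0.2]]
L = [[1.0, 0.0], [0.0, 1.0]]  # start from the Euclidean metric
constraints = [(0, 1, True), (0, 2, True), (1, 3, False), (0, 3, False)]
losses = [constraint_loss(c, X, L) for c in constraints]
chosen = select_constraints(constraints, X, L, k=10, rng=random.Random(0))
print(losses)
print(chosen)
```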

    Private Learning on Networks

    Shripad Gade, Nitin H. Vaidya
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Optimization and Control (math.OC)

    Continual data collection and widespread deployment of machine learning
    algorithms, particularly the distributed variants, have raised new privacy
    challenges. In a distributed machine learning scenario, the dataset is stored
    among several machines and they solve a distributed optimization problem to
    collectively learn the underlying model. We present a secure multi-party
    computation-inspired, privacy-preserving distributed algorithm for optimizing
    a convex function that is the sum of several, possibly non-convex, functions. Each
    individual objective function is privately stored with an agent while the
    agents communicate model parameters with neighbor machines connected in a
    network. We show that our algorithm can correctly optimize the overall
    objective function and learn the underlying model accurately. We further prove
    that under a vertex connectivity condition on the topology, our algorithm
    preserves the privacy of individual objective functions. We establish limits on
    what a coalition of adversaries can learn by observing the messages and states
    shared over a network.
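    The abstract does not give the algorithm's details. The sketch below only
    illustrates, under our own assumptions, why correlated random perturbations
    can hide local objectives without changing the global optimum: for each
    network link, one agent subtracts a random quadratic from its local
    function and its neighbour adds the same quadratic, so the perturbations
    cancel in the sum. Local functions are represented by hypothetical (a, b, c)
    coefficients of a x^2 + b x + c, and agent 1's piece is non-convex while the
    sum is convex. Function names are hypothetical.

```python
import random


def perturb(functions, links, rng, scale=5.0):
    """For each link (i, j), agent i draws a random quadratic r, subtracts it
    from its own objective and sends it to j, who adds it. The sum over all
    agents is unchanged, but each local function is masked."""
    out = [list(f) for f in functions]
    for i, j in links:
        r = [rng.uniform(-scale, scale) for _ in range(3)]
        for k in range(3):
            out[i][k] -= r[k]
            out[j][k] += r[k]
    return out


def minimize_sum(functions):
    """argmin over x of sum_i (a_i x^2 + b_i x + c_i); needs sum a_i > 0."""
    a = sum(f[0] for f in functions)
    b = sum(f[1] for f in functions)
    if a <= 0:
        raise ValueError("aggregate objective must be strictly convex")
    return -b / (2 * a)


# Hypothetical (a, b, c) coefficients; agent 1's piece is non-convex (a < 0).
local = [(2.0, -4.0, 0.0), (-1.0, 2.0, 1.0), (1.0, -2.0, 3.0)]
links = [(0, 1), (1, 2), (2, 0)]
masked = perturb(local, links, random.Random(1))
print(minimize_sum(local), minimize_sum(masked))  # same minimizer
```

    Consistent with the vertex-connectivity condition in the abstract, this
    kind of hiding fails if a coalition controls all neighbours of an agent,
    since it can then undo every perturbation applied to that agent's function.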

    CSVideoNet: A Recurrent Convolutional Neural Network for Compressive Sensing Video Reconstruction

    Kai Xu, Fengbo Ren
    Comments: 10 pages, 6 pages, 2 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    In this paper, we develop a deep neural network architecture called
    “CSVideoNet” that can learn visual representations from random measurements for
    compressive sensing (CS) video reconstruction. CSVideoNet is an end-to-end
    trainable and non-iterative model that combines convolutional neural networks
    (CNNs) with a recurrent neural network (RNN) to facilitate video
    reconstruction by leveraging temporal-spatial features. The proposed network
    can accept random measurements with a multi-level compression ratio (CR). The
    lightly and aggressively compressed measurements offer background information
    and object details, respectively. This is similar to the variable bit rate
    techniques widely used in conventional video coding approaches. The RNN
    employed by CSVideoNet can leverage temporal coherence that exists in adjacent
    video frames to extrapolate motion features and merge them with spatial visual
    features extracted by the CNNs to further enhance reconstruction quality,
    especially at high CRs. We test our CSVideoNet on the UCF-101 dataset.
    Experimental results show that CSVideoNet outperforms the existing video CS
    reconstruction approaches. The results demonstrate that our method can preserve
    fine visual details of the original videos even at a 100x CR,
    which is difficult to realize with the reference approaches. Also, the
    non-iterative nature of CSVideoNet results in a decrease in runtime by three
    orders of magnitude over iterative reconstruction algorithms. Furthermore,
    CSVideoNet can enhance the CR of CS cameras beyond the limitation of
    conventional approaches, ensuring a reduction in bandwidth for data
    transmission. These benefits are especially favorable to high-frame-rate video
    applications.
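    A minimal sketch of multi-level compressive measurement, assuming that the
    first frame of each group is a key frame measured at a low compression
    ratio and the remaining frames at a high one, with a fixed random Gaussian
    matrix per ratio applied to flattened image blocks (y = Phi x). Block size,
    group length, ratios and function names are hypothetical; the
    reconstruction network is not shown.

```python
import random


def measurements_per_block(block_size, cr):
    """Number of random measurements kept for a block at compression ratio cr."""
    return max(1, round(block_size / cr))


def gaussian_matrix(m, n, rng):
    return [[rng.gauss(0, 1) for _ in range(n)] for _ in range(m)]


def measure(phi, block):
    """y = Phi x for one flattened block x."""
    return [sum(p * x for p, x in zip(row, block)) for row in phi]


def measure_group(frames, key_cr, nonkey_cr, rng):
    """The first frame is the key frame (lightly compressed); the remaining
    frames are aggressively compressed. One fixed matrix per ratio."""
    n = len(frames[0])
    phi_key = gaussian_matrix(measurements_per_block(n, key_cr), n, rng)
    phi_nonkey = gaussian_matrix(measurements_per_block(n, nonkey_cr), n, rng)
    return [measure(phi_key if t == 0 else phi_nonkey, f)
            for t, f in enumerate(frames)]


rng = random.Random(0)
block = 32 * 32  # hypothetical 32x32 block, flattened
frames = [[rng.random() for _ in range(block)] for _ in range(10)]
ys = measure_group(frames, key_cr=5, nonkey_cr=100, rng=rng)
print([len(y) for y in ys])
print("overall CR:", 10 * block / sum(len(y) for y in ys))
```

    Here the key frame keeps 205 measurements and each non-key frame keeps 10,
    so the 10-frame group is compressed about 35x overall.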

    On the Potential of Simple Framewise Approaches to Piano Transcription

    Rainer Kelz, Matthias Dorfer, Filip Korzeniowski, Sebastian Böck, Andreas Arzt, Gerhard Widmer
    Comments: Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR 2016), New York, NY
    Subjects: Sound (cs.SD); Learning (cs.LG)

    In an attempt to explore the limitations of simple approaches to the task
    of piano transcription (as usually defined in MIR), we conduct an in-depth
    analysis of neural network-based framewise transcription. We systematically
    compare different popular input representations for transcription systems to
    determine the ones most suitable for use with neural networks. Exploiting
    recent advances in training techniques and new regularizers, and taking into
    account hyper-parameter tuning, we show that it is possible, by simple
    bottom-up frame-wise processing, to obtain a piano transcriber that outperforms
    the current published state of the art on the publicly available MAPS dataset
    — without any complex post-processing steps. Thus, we propose this simple
    approach as a new baseline for this dataset, for future transcription research
    to build on and improve.
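    As an illustration of what simple bottom-up frame-wise processing can
    mean, the sketch below binarizes per-frame, per-pitch note probabilities
    with a fixed threshold, with no temporal post-processing, and scores the
    result with frame-level precision, recall and F-measure. The threshold, the
    toy numbers and the function names are assumptions; the network producing
    the probabilities is not shown.

```python
def piano_roll(probs, threshold=0.5):
    """Binarize each (frame, pitch) probability independently."""
    return [[p >= threshold for p in frame] for frame in probs]


def framewise_prf(pred, ref):
    """Frame-level precision, recall and F-measure over (frame, pitch) cells."""
    tp = fp = fn = 0
    for pred_frame, ref_frame in zip(pred, ref):
        for p, r in zip(pred_frame, ref_frame):
            tp += int(p and r)
            fp += int(p and not r)
            fn += int(r and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Hypothetical network output: 3 frames x 4 pitches.
probs = [[0.9, 0.1, 0.2, 0.6],
         [0.8, 0.4, 0.7, 0.1],
         [0.2, 0.1, 0.9, 0.3]]
reference = [[True, False, False, True],
             [True, False, False, False],
             [False, False, True, False]]
print(framewise_prf(piano_roll(probs), reference))
```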

    Towards End-to-End Audio-Sheet-Music Retrieval

    Matthias Dorfer, Andreas Arzt, Gerhard Widmer
    Comments: In NIPS 2016 End-to-end Learning for Speech and Audio Processing Workshop, Barcelona, Spain
    Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Learning (cs.LG)

    This paper demonstrates the feasibility of learning to retrieve short
    snippets of sheet music (images) when given a short query excerpt of music
    (audio), and vice versa, without any symbolic representation of music or
    scores. This would be highly useful in many content-based musical retrieval
    scenarios. Our approach is based on Deep Canonical Correlation Analysis (DCCA)
    and learns correlated latent spaces allowing for cross-modality retrieval in
    both directions. Initial experiments with relatively simple monophonic music
    show promising results.

    Feature Learning for Chord Recognition: The Deep Chroma Extractor

    Filip Korzeniowski, Gerhard Widmer
    Comments: In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), New York, USA, 2016
    Subjects: Sound (cs.SD); Learning (cs.LG)

    We explore frame-level audio feature learning for chord recognition using
    artificial neural networks. We present the argument that chroma vectors
    potentially hold enough information to model harmonic content of audio for
    chord recognition, but that standard chroma extractors compute features that
    are too noisy. This leads us to propose a learned chroma feature extractor based on
    artificial neural networks. It is trained to compute chroma features that
    encode harmonic information important for chord recognition, while being robust
    to irrelevant interference. We achieve this by feeding the network an audio
    spectrum with context instead of a single frame as input. This way, the network
    can learn to selectively compensate for noise and resolve harmonic ambiguities.

    We compare the resulting features to hand-crafted ones by using a simple
    linear frame-wise classifier for chord recognition on various data sets. The
    results show that the learned feature extractor produces superior chroma
    vectors for chord recognition.
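    A minimal sketch of feeding an audio spectrum with context as network input,
    assuming each frame is stacked with a fixed number of neighbouring frames
    on each side and that edge frames are repeated at the boundaries. The
    radius, padding rule and function name are hypothetical; the network would
    then map each stacked vector to a 12-dimensional chroma vector.

```python
def context_windows(frames, radius):
    """Concatenate each frame with `radius` neighbours on each side,
    repeating the first/last frame at the boundaries (hypothetical padding)."""
    n = len(frames)
    windows = []
    for t in range(n):
        window = []
        for k in range(t - radius, t + radius + 1):
            window.extend(frames[min(max(k, 0), n - 1)])
        windows.append(window)
    return windows


# Three toy spectrogram frames with two frequency bins each.
frames = [[1, 2], [3, 4], [5, 6]]
for w in context_windows(frames, radius=1):
    print(w)
```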

    Graphical RNN Models

    Ashish Bora, Sugato Basu, Joydeep Ghosh
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Many time series are generated by a set of entities that interact with one
    another over time. This paper introduces a broad, flexible framework to learn
    from multiple inter-dependent time series generated by such entities. Our
    framework explicitly models the entities and their interactions through time.
    It achieves this by building on the capabilities of Recurrent Neural Networks,
    while also offering several ways to incorporate domain knowledge/constraints
    into the model architecture. The capabilities of our approach are showcased
    through an application to weather prediction, which shows gains over strong
    baselines.

    Optimal structure and parameter learning of Ising models

    Andrey Y. Lokhov, Marc Vuffray, Sidhant Misra, Michael Chertkov
    Comments: 4 pages, 11 pages of supplementary information
    Subjects: Statistical Mechanics (cond-mat.stat-mech); Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)

    Reconstruction of structure and parameters of a graphical model from binary
    samples is a problem of practical importance in a variety of disciplines,
    ranging from statistical physics and computational biology to image processing
    and machine learning. The focus of the research community has shifted towards
    developing universal reconstruction algorithms that are both computationally
    efficient and require a minimal amount of expensive data. We introduce a new
    method, Interaction Screening, which accurately estimates the model parameters
    using local optimization problems. The algorithm provably achieves perfect
    graph structure recovery with an information-theoretically optimal number of
    samples and outperforms state-of-the-art techniques, especially in the
    low-temperature regime which is known to be the hardest for learning. We assess
    the efficacy of Interaction Screening through extensive numerical tests on
    Ising models of various topologies and with different types of interactions,
    ranging from ferromagnetic to spin-glass.
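    The abstract does not state the objective. Our reading of Interaction
    Screening (an assumption to check against the paper) is that, for a
    zero-field Ising model and a chosen spin u, one minimizes the convex
    objective S_u(theta) = (1/n) sum_k exp(-sum_{j != u} theta_uj sigma_u^k
    sigma_j^k) over the couplings of u, using n binary samples sigma^k. The
    sketch below draws exact samples from a hypothetical 3-spin model by
    enumeration and minimizes S_u by plain gradient descent; the l1 penalty and
    thresholding used for structure recovery are omitted, and all names are
    hypothetical.

```python
import itertools
import math
import random
from collections import Counter


def sample_ising(J, n_samples, rng):
    """Exact samples from a small zero-field Ising model, by enumerating
    P(sigma) proportional to exp(sum_{i<j} J[i][j] sigma_i sigma_j)."""
    p = len(J)
    configs = list(itertools.product([-1, 1], repeat=p))
    weights = [math.exp(sum(J[i][j] * s[i] * s[j]
                            for i in range(p) for j in range(i + 1, p)))
               for s in configs]
    return rng.choices(configs, weights=weights, k=n_samples)


def interaction_screening(samples, u, steps=3000, lr=0.2):
    """Minimize S_u(theta) = mean_k exp(-sum_j theta_j sigma_u sigma_j) over
    the couplings of spin u by gradient descent (no l1 penalty)."""
    others = [j for j in range(len(samples[0])) if j != u]
    theta = {j: 0.0 for j in others}
    counts, n = Counter(samples), len(samples)  # group identical samples
    for _ in range(steps):
        grad = {j: 0.0 for j in others}
        for s, c in counts.items():
            e = math.exp(-sum(theta[j] * s[u] * s[j] for j in others))
            for j in others:
                grad[j] -= c * s[u] * s[j] * e / n
        for j in others:
            theta[j] -= lr * grad[j]
    return theta


J = [[0.0, 0.8, 0.0],   # hypothetical true couplings: 0-1 and 1-2 only
     [0.8, 0.0, 0.5],
     [0.0, 0.5, 0.0]]
samples = sample_ising(J, 20000, random.Random(0))
print(interaction_screening(samples, u=0))  # estimates near 0.8 and 0.0
```

    Although spins 0 and 2 are correlated through spin 1, the estimated direct
    coupling between them stays near zero, which is what makes the local
    estimates usable for structure recovery.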

    Graph-based semi-supervised learning for relational networks

    Leto Peel
    Comments: 11 pages, 8 figures
    Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)

    We address the problem of semi-supervised learning in relational networks,
    networks in which nodes are entities and links are the relationships or
    interactions between them. Typically this problem is confounded with the
    problem of graph-based semi-supervised learning (GSSL), because both problems
    represent the data as a graph and predict the missing class labels of nodes.
    However, not all graphs are created equal. In GSSL a graph is constructed,
    often from independent data, based on similarity. As such, edges tend to
    connect instances with the same class label. Relational networks, however, can
    be more heterogeneous and edges do not always indicate similarity. For
    instance, instead of links being more likely to connect nodes with the same
    class label, they may occur more frequently between nodes with different class
    labels (link-heterogeneity). Or nodes with the same class label do not
    necessarily have the same type of connectivity across the whole network
    (class-heterogeneity), e.g. in a network of sexual interactions we may observe
    links between opposite genders in some parts of the graph and links between the
    same genders in others. Performing classification in networks with different
    types of heterogeneity is a hard problem that is made harder still when we do
    not know a priori the type or level of heterogeneity. Here we present two
    scalable approaches for graph-based semi-supervised learning for the more
    general case of relational networks. We demonstrate these approaches on
    synthetic and real-world networks that display different link patterns within
    and between classes. Compared to state-of-the-art approaches, ours give better
    classification performance without prior knowledge of how classes interact. In
    particular, our two-step label propagation algorithm gives consistently good
    accuracy and runs on networks of over 1.6 million nodes and 30 million edges in
    around 12 seconds.
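    The abstract does not describe the two-step algorithm. One plausible
    reading, which we have not verified against the paper, is propagation over
    two-step paths (A^2 with the diagonal removed) rather than direct links: in
    a link-heterogeneous network where edges mostly join different classes,
    nodes two hops apart tend to share a label. The sketch contrasts both on a
    toy 8-node cycle whose edges always join different classes; the graph,
    seeds and function names are hypothetical.

```python
def label_propagation(adj, seeds, n_classes, iters=200):
    """Repeatedly replace each unlabeled node's class scores by the weighted
    average of its neighbours' scores; seed nodes stay clamped.
    adj maps node -> {neighbour: weight}; nodes are 0..len(adj)-1."""
    n = len(adj)
    scores = [[0.0] * n_classes for _ in range(n)]
    for v, c in seeds.items():
        scores[v][c] = 1.0
    for _ in range(iters):
        new = []
        for v in range(n):
            total = sum(adj[v].values())
            if v in seeds or total == 0:
                new.append(scores[v])
                continue
            new.append([sum(w * scores[u][c] for u, w in adj[v].items()) / total
                        for c in range(n_classes)])
        scores = new
    return [max(range(n_classes), key=lambda c: s[c]) for s in scores]


def two_step(adj):
    """Weights of length-two walks between distinct nodes (A^2, no diagonal)."""
    out = {v: {} for v in adj}
    for v in adj:
        for u, w1 in adj[v].items():
            for x, w2 in adj[u].items():
                if x != v:
                    out[v][x] = out[v].get(x, 0.0) + w1 * w2
    return out


# Hypothetical 8-node cycle in which every edge joins the two classes.
adj = {i: {(i - 1) % 8: 1.0, (i + 1) % 8: 1.0} for i in range(8)}
seeds = {0: 0, 1: 1}
truth = [i % 2 for i in range(8)]
one_hop = label_propagation(adj, seeds, 2)
two_hop = label_propagation(two_step(adj), seeds, 2)
print(one_hop, two_hop, truth)
```

    On this cycle, propagation over direct links gives node 2 the label of its
    seeded neighbour 1, which is wrong, whereas the two-step graph splits into
    an even and an odd cycle, each containing one seed, and recovers every
    label.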

    Improving Neural Network Generalization by Combining Parallel Circuits with Dropout

    Kien Tuong Phan, Tomas Henrique Maul, Tuong Thuy Vu, Lai Weng Kin
    Comments: Pre-print. The final publication is available at Springer via this http URL
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    In an attempt to reduce the lengthy training times of neural networks, we
    previously proposed Parallel Circuits (PCs), a biologically inspired architecture.
    Previous work has shown that this approach fails to maintain generalization
    performance in spite of achieving sharp speed gains. To address this issue, and
    motivated by the way Dropout prevents node co-adaptation, in this paper, we
    suggest an improvement by extending Dropout to the PC architecture. The paper
    provides multiple insights into this combination, including a variety of fusion
    approaches. Experiments show promising results in which improved error rates
    are achieved in most cases, whilst maintaining the speed advantage of the PC
    approach.
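    A minimal sketch of one way Dropout could be extended to Parallel
    Circuits, assuming each circuit is an independent sub-network, the circuit
    outputs are fused by averaging, and whole circuits are dropped during
    training. The abstract mentions several fusion approaches; averaging is only
    one of them, and the tiny linear circuits and function names are
    hypothetical.

```python
import random


def circuit_outputs(x, circuits):
    """Each hypothetical circuit is a tiny linear model (weights, bias); real
    PCs would use small multi-layer sub-networks."""
    return [sum(w * xi for w, xi in zip(weights, x)) + bias
            for weights, bias in circuits]


def fused_output(x, circuits, drop_p=0.0, rng=None):
    """Average the outputs of the circuits that survive circuit-level dropout.
    With drop_p=0 (test time) every circuit contributes."""
    outs = circuit_outputs(x, circuits)
    if drop_p > 0.0:
        kept = [o for o in outs if rng.random() >= drop_p]
        outs = kept or [rng.choice(outs)]  # never drop every circuit
    return sum(outs) / len(outs)


circuits = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([1.0, 1.0], 1.0)]
x = [2.0, 3.0]
rng = random.Random(0)
train_out = fused_output(x, circuits, drop_p=0.5, rng=rng)  # random subset
test_out = fused_output(x, circuits)                        # all circuits
print(train_out, test_out)
```

    Averaging over the surviving circuits plays the role of the usual
    inverted-dropout rescaling, so no extra scaling is needed at test time.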

    Dynamical Kinds and their Discovery

    Benjamin C. Jantzen
    Comments: Accepted for the proceedings of the Causation: Foundation to Application Workshop, UAI 2016
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)

    We demonstrate the possibility of classifying causal systems into kinds that
    share a common structure without first constructing an explicit dynamical model
    or using prior knowledge of the system dynamics. The algorithmic ability to
    determine whether arbitrary systems are governed by causal relations of the
    same form offers significant practical applications in the development and
    validation of dynamical models. It is also of theoretical interest as an
    essential stage in the scientific inference of laws from empirical data. The
    algorithm presented is based on the dynamical symmetry approach to dynamical
    kinds. A dynamical symmetry with respect to time is an intervention on one or
    more variables of a system that commutes with the time evolution of the system.
    A dynamical kind is a class of systems sharing a set of dynamical symmetries.
    The algorithm presented classifies deterministic, time-dependent causal systems
    by directly comparing their exhibited symmetries. Using simulated, noisy data
    from a variety of nonlinear systems, we show that this algorithm correctly
    sorts systems into dynamical kinds. It is robust under significant sampling
    error, is immune to violations of normality in sampling error, and fails
    gracefully with increasing dynamical similarity. The algorithm we demonstrate
    is the first to address this aspect of automated scientific discovery.
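    To make the definition concrete: an intervention sigma is a dynamical
    symmetry if applying it and then evolving the system for time t gives the
    same state as evolving first and then applying sigma. The sketch below
    checks this commutation for the intervention x -> 2x on three
    one-dimensional systems with known closed-form solutions. The systems, the
    intervention and the function names are assumptions for illustration; the
    paper's algorithm works from noisy sampled data rather than known flow maps.

```python
import math


def exp_growth(k):
    """Flow map of dx/dt = k x: x(t) = x0 exp(k t)."""
    def flow(x0, t):
        return x0 * math.exp(k * t)
    return flow


def logistic(r, cap):
    """Flow map of dx/dt = r x (1 - x / cap)."""
    def flow(x0, t):
        return cap / (1 + (cap / x0 - 1) * math.exp(-r * t))
    return flow


def double(x):
    """Hypothetical intervention: double the state variable."""
    return 2.0 * x


def commutation_error(flow, sigma, x0s, t):
    """Largest |sigma(phi_t(x0)) - phi_t(sigma(x0))| over initial states;
    (near) zero means sigma is a dynamical symmetry of the flow."""
    return max(abs(sigma(flow(x0, t)) - flow(sigma(x0), t)) for x0 in x0s)


x0s = [0.1, 0.5, 1.0, 2.0]
errors = {
    "growth, k=0.3": commutation_error(exp_growth(0.3), double, x0s, t=1.5),
    "decay, k=-1.2": commutation_error(exp_growth(-1.2), double, x0s, t=1.5),
    "logistic, r=1, cap=5": commutation_error(logistic(1.0, 5.0), double, x0s, t=1.5),
}
for name, err in errors.items():
    print(f"{name}: {err:.3g}")
```

    Both exponential systems, despite different rates, commute with scaling and
    so share this symmetry; the logistic system does not.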

    Semi-Supervised Phone Classification using Deep Neural Networks and Stochastic Graph-Based Entropic Regularization

    Sunil Thulasidasan, Jeffrey Bilmes
    Comments: InterSpeech Workshop on Machine Learning in Speech and Language Processing, 2016
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We describe a graph-based semi-supervised learning framework in the context
    of deep neural networks that uses a graph-based entropic regularizer to favor
    smooth solutions over a graph induced by the data. The main contribution of
    this work is a computationally efficient, stochastic graph-regularization
    technique that uses mini-batches that are consistent with the graph structure,
    but also provides enough stochasticity (in terms of mini-batch data diversity)
    for convergence of stochastic gradient descent methods to good solutions. For
    this work, we focus on results of frame-level phone classification accuracy on
    the TIMIT speech corpus but our method is general and scalable to much larger
    data sets. Results indicate that our method significantly improves
    classification accuracy compared to the fully-supervised case when the fraction
    of labeled data is low, and it is competitive with other methods in the fully
    labeled case.
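    A minimal sketch of a graph-based entropic regularizer, assuming (our
    choice) a supervised cross-entropy term on labeled nodes plus
    mu * sum_ij w_ij KL(p_i || p_j) over graph edges, where p_i is the
    network's predicted class distribution for node i. The paper's exact
    objective may include further terms, such as an entropy penalty; the names
    and numbers here are hypothetical. In a mini-batch setting the sums would
    run only over the batch's nodes and edges.

```python
import math


def kl(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions, with eps for numerical safety."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def graph_entropic_loss(probs, labels, edges, mu):
    """Cross-entropy on labeled nodes plus mu * sum_ij w_ij KL(p_i || p_j)."""
    supervised = sum(-math.log(probs[i][y] + 1e-12) for i, y in labels.items())
    smoothness = sum(w * kl(probs[i], probs[j]) for i, j, w in edges)
    return supervised + mu * smoothness


labels = {0: 0}                      # hypothetical: only node 0 is labeled
edges = [(0, 1, 1.0), (1, 2, 1.0)]   # a 3-node path graph
smooth = [[0.9, 0.1], [0.85, 0.15], [0.8, 0.2]]
rough = [[0.9, 0.1], [0.2, 0.8], [0.8, 0.2]]
print(graph_entropic_loss(smooth, labels, edges, mu=1.0),
      graph_entropic_loss(rough, labels, edges, mu=1.0))
```

    Because KL is asymmetric, an undirected graph would list each edge in both
    directions or use a symmetrized divergence.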

    Efficient Distributed Semi-Supervised Learning using Stochastic Regularization over Affinity Graphs

    Sunil Thulasidasan, Jeffrey Bilmes, Garrett Kenyon
    Comments: NIPS 2016 Workshop on Machine Learning Systems
    Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

    We describe a computationally efficient, stochastic graph-regularization
    technique that can be utilized for the semi-supervised training of deep neural
    networks in a parallel or distributed setting. We utilize a technique, first
    described in [13], for the construction of mini-batches for stochastic gradient
    descent (SGD) based on synthesized partitions of an affinity graph that are
    consistent with the graph structure, but also preserve enough stochasticity for
    convergence of SGD to good local minima. We show how our technique allows a
    graph-based semi-supervised loss function to be decomposed into a sum over
    objectives, facilitating data parallelism for scalable training of machine
    learning models. Empirical results indicate that our method significantly
    improves classification accuracy compared to the fully-supervised case when the
    fraction of labeled data is low, and in the parallel case, achieves significant
    speed-up in terms of wall-clock time to convergence. We show the results for
    both sequential and distributed-memory semi-supervised DNN training on a speech
    corpus.

    Deep learning is effective for the classification of OCT images of normal versus Age-related Macular Degeneration

    Cecilia S. Lee, Doug M. Baughman, Aaron Y. Lee
    Comments: 4 Figures, 1 Table
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Objective: The advent of Electronic Medical Records (EMR) with large
    electronic imaging databases along with advances in deep neural networks with
    machine learning has provided a unique opportunity to achieve milestones in
    automated image analysis. Optical coherence tomography (OCT) is the most
    commonly obtained imaging modality in ophthalmology and represents a dense and
    rich dataset when combined with labels derived from the EMR. We sought to
    determine if deep learning could be utilized to distinguish normal OCT images
    from images from patients with Age-related Macular Degeneration (AMD). Methods:
    Automated extraction of an OCT imaging database was performed and linked to
    clinical endpoints from the EMR. OCT macula scans were obtained by Heidelberg
    Spectralis, and each OCT scan was linked to EMR clinical endpoints extracted
    from EPIC. The central 11 images were selected from each OCT scan of two
    cohorts of patients: normal and AMD. Cross-validation was performed using a
    random subset of patients. Areas under the receiver operating characteristic
    curves (auROC) were computed at the independent image level, macular OCT level, and patient
    level. Results: Of an extraction of 2.6 million OCT images linked to clinical
    datapoints from the EMR, 52,690 normal and 48,312 AMD macular OCT images were
    selected. A deep neural network was trained to categorize images as either
    normal or AMD. At the image level, we achieved an auROC of 92.78% with an
    accuracy of 87.63%. At the macula level, we achieved an auROC of 93.83% with an
    accuracy of 88.98%. At a patient level, we achieved an auROC of 97.45% with an
    accuracy of 93.45%. Peak sensitivity and specificity with optimal cutoffs were
    92.64% and 93.69% respectively. Conclusions: Deep learning techniques are
    effective for classifying OCT images. These findings have important
    implications in utilizing OCT in automated screening and computer aided
    diagnosis tools.
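    A minimal sketch of computing auROC at different levels, assuming
    image-level scores are pooled into patient-level scores by averaging (the
    abstract does not say how the authors aggregated). auROC is computed as the
    Mann-Whitney statistic: the probability that a random positive outscores a
    random negative, with ties counting one half. Scores, patient IDs and
    function names are hypothetical.

```python
from collections import defaultdict


def auroc(scores, labels):
    """Mann-Whitney estimate of the area under the ROC curve: the chance that
    a random positive outscores a random negative, ties counting 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def pool_by_group(scores, groups, labels):
    """Average scores within each group (e.g. one patient); each group is
    assumed to carry a single label."""
    members, group_label = defaultdict(list), {}
    for s, g, y in zip(scores, groups, labels):
        members[g].append(s)
        group_label[g] = y
    keys = sorted(members)
    return ([sum(members[g]) / len(members[g]) for g in keys],
            [group_label[g] for g in keys])


# Hypothetical image-level AMD probabilities, 3 images per patient.
scores = [0.9, 0.4, 0.8, 0.7, 0.6, 0.3, 0.2, 0.65, 0.1, 0.35, 0.3, 0.2]
patients = ["a"] * 3 + ["b"] * 3 + ["c"] * 3 + ["d"] * 3
labels = [1] * 6 + [0] * 6  # patients a, b have AMD; c, d are normal
image_auc = auroc(scores, labels)
patient_auc = auroc(*pool_by_group(scores, patients, labels))
print(image_auc, patient_auc)
```

    In this toy example pooling raises auROC from 0.875 at the image level to
    1.0 at the patient level, because averaging several images per patient
    washes out individually misleading scores.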

    Interpretable Semantic Textual Similarity: Finding and explaining differences between sentences

    I. Lopez-Gazpio, M. Maritxalar, A. Gonzalez-Agirre, G. Rigau, L. Uria, E. Agirre
    Comments: Preprint version, Knowledge-Based Systems (ISSN: 0950-7051). (2016)
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    User acceptance of artificial intelligence agents might depend on their
    ability to explain their reasoning, which requires adding an interpretability
    layer that helps users understand their behavior. This paper focuses
    on adding an interpretable layer on top of Semantic Textual Similarity (STS),
    which measures the degree of semantic equivalence between two sentences. The
    interpretability layer is formalized as the alignment between pairs of segments
    across the two sentences, where the relation between the segments is labeled
    with a relation type and a similarity score. We present a publicly available
    dataset of sentence pairs annotated following the formalization. We then
    develop a system trained on this dataset which, given a sentence pair, explains
    what is similar and different, in the form of graded and typed segment
    alignments. When evaluated on the dataset, the system performs better than an
    informed baseline, showing that the dataset and task are well-defined and
    feasible. Most importantly, two user studies show how the system output can be
    used to automatically produce explanations in natural language. Users performed
    better when they had access to the explanations, providing preliminary evidence
    that our dataset and method for automatically producing explanations are useful in
    real applications.

    Uncovering the Dynamics of Crowdlearning and the Value of Knowledge

    Utkarsh Upadhyay, Isabel Valera, Manuel Gomez-Rodriguez
    Comments: To appear in Tenth ACM International conference on Web Search and Data Mining (WSDM) in 2017
    Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Physics and Society (physics.soc-ph); Machine Learning (stat.ML)

    Learning from the crowd has become increasingly popular in the Web and social
    media. There is a wide variety of crowdlearning sites in which, on the one
    hand, users learn from the knowledge that other users contribute to the site,
    and, on the other hand, knowledge is reviewed and curated by the same users
    using assessment measures such as upvotes or likes.

    In this paper, we present a probabilistic modeling framework of
    crowdlearning, which uncovers the evolution of a user’s expertise over time by
    leveraging other users’ assessments of her contributions. The model allows for
    both off-site and on-site learning and captures forgetting of knowledge. We
    then develop a scalable estimation method to fit the model parameters from
    millions of recorded learning and contributing events. We show the
    effectiveness of our model by tracing the activity of about 25 thousand users on
    Stack Overflow over a 4.5-year period. We find that answers with high knowledge
    value are rare. Newbies and experts tend to acquire less knowledge than users in
    the middle range. Prolific learners also tend to be proficient contributors who
    post answers with high knowledge value.
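    The abstract says the model "captures forgetting of knowledge". As a toy
    illustration of what forgetting can mean in such a model (this is not the
    authors' probabilistic model; the exponential decay and the hypothetical
    `forget_rate` parameter are assumptions), a user's knowledge can be written as a
    sum of past learning events, each of which fades over time:

```python
import math
from typing import List, Tuple


def knowledge(t: float, events: List[Tuple[float, float]], forget_rate: float) -> float:
    """Toy knowledge level at time t (assumed exponential-forgetting model).

    events: (time, value) pairs, e.g. the knowledge value of answers a user read.
    forget_rate: assumed forgetting rate per unit time (0 means no forgetting).
    Each past event adds its value, decayed by exp(-forget_rate * elapsed time);
    events that happen after t are ignored.
    """
    return sum(v * math.exp(-forget_rate * (t - s)) for s, v in events if s <= t)
```

    With `forget_rate = math.log(2)`, an event loses half of its contribution per
    unit of time; with `forget_rate = 0.0`, knowledge simply accumulates.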


    Information Theory

    Lossy Transmission of Correlated Sources over a Multiple Access Channel: Necessary Conditions and Separation Results

    Basak Guler, Deniz Gunduz, Aylin Yener
    Comments: Submitted to IEEE Transactions on Information Theory on Nov 30, 2016
    Subjects: Information Theory (cs.IT)

    Lossy communication of correlated sources over a multiple access channel is
    studied. First, lossy communication is investigated in the presence of
    correlated decoder side information. An achievable joint source-channel coding
    scheme is presented, and the conditions under which separate source and channel
    coding is optimal are explored. It is shown that separation is optimal when the
    encoders and the decoder have access to a common observation conditioned on
    which the two sources are independent. Separation is shown to be optimal also
    when only the encoders have access to such a common observation whose lossless
    recovery is required at the decoder. Moreover, the optimality of separation is
    shown for sources with a common part, and sources with reconstruction
    constraints. Next, these results obtained for the system in the presence of side
    information are utilized to provide a set of necessary conditions for the
    transmission of correlated sources over a multiple access channel without side
    information. The identified necessary conditions are specialized to the case of
    bivariate Gaussian sources over a Gaussian multiple access channel, and are
    shown to be tighter than known results in the literature in certain cases. Our
    results indicate that side information can have a significant impact on the
    optimality of source-channel separation in lossy transmission, in addition to
    being instrumental in identifying necessary conditions for the transmission of
    correlated sources when no side information is present.

    Privacy-Protecting Energy Management Unit through Model-Distribution Predictive Control

    Jun-Xing Chin, Tomas Tinoco De Rubira, Gabriela Hug
    Comments: Pre-print, submitted for review
    Subjects: Information Theory (cs.IT); Optimization and Control (math.OC)

    The roll-out of smart meters in electricity networks introduces risks for
    consumer privacy due to increased measurement frequency and granularity.
    Through various Non-Intrusive Load Monitoring techniques, consumer behavior may
    be inferred from their metering data. In this paper, we propose an energy
    management method that protects privacy through the minimization of information
    leakage. The method is based on a Model Predictive Controller that utilizes
    energy storage and local generation, and that predicts the effects of its
    actions on the statistics of the actual energy consumption of a consumer and
    that seen by the grid. Computationally, the method requires solving a
    Mixed-Integer Quadratic Program of manageable size whenever new meter readings
    are available. We simulate the controller on generated residential load
    profiles with different privacy costs in a two-tier time-of-use energy pricing
    environment. Results show that information leakage is effectively reduced at
    the expense of increased energy cost. The results also show that, using the
    proposed controller, the consumer load profile seen by the grid resembles a
    mixture between that obtained with Non-Intrusive Load Leveling and Lazy
    Stepping.
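    The abstract compares the controller's output with Non-Intrusive Load Leveling,
    whose core idea is to use a battery to keep the load seen by the grid constant.
    The function below is a deliberately simplified sketch of that idea (lossless
    battery, no power limits, a fixed target); it is neither the published NILL
    algorithm nor the paper's MPC controller, and `load_level` and its parameters
    are hypothetical names.

```python
from typing import List


def load_level(load: List[float], target: float, capacity: float, soc: float) -> List[float]:
    """Toy load leveling: a battery tries to hold grid draw at a constant target.

    load: household consumption per time step (kWh per step).
    target: desired constant grid draw per step.
    capacity: battery capacity (kWh); soc: initial state of charge (kWh).
    Battery losses and charge/discharge power limits are ignored (assumptions).
    Returns the grid draw per step.
    """
    grid = []
    for demand in load:
        delta = demand - target  # > 0: battery must discharge; < 0: battery charges
        if delta >= 0:
            used = min(delta, soc)  # discharge at most what is stored
            soc -= used
            grid.append(demand - used)
        else:
            stored = min(-delta, capacity - soc)  # charge at most the free space
            soc += stored
            grid.append(demand + stored)
    return grid
```

    When the battery is empty (or full), the grid again sees the raw load, which is
    where consumption patterns can leak through.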

    Variations of the McEliece Cryptosystem

    Jessalyn Bolkema, Heide Gluesing-Luerssen, Christine A. Kelley, Kristin Lauter, Beth Malmskog, Joachim Rosenthal
    Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)

    Two variations of the McEliece cryptosystem are presented. The first one is
    based on a relaxation of the column permutation in the classical McEliece
    scrambling process. This is done in such a way that the Hamming weight of the
    error, added in the encryption process, can be controlled so that efficient
    decryption remains possible. The second variation is based on the use of
    spatially coupled moderate-density parity-check codes as secret codes. These
    codes are known for their excellent error-correction performance and allow for
    a relatively low key size in the cryptosystem. For both variants the security
    with respect to known attacks is discussed.
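    For reference, the classical scheme that both variants modify publishes a
    scrambled generator matrix G' = S G P (S invertible, P a column permutation)
    and encrypts a message m as m G' + e, where the error e has a small Hamming
    weight that the secret code can correct. The toy below follows that textbook
    recipe with the [7,4] Hamming code, which corrects a single error. It is
    insecure by design, implements neither of the paper's two variants, and all
    function names are illustrative.

```python
import random

# Systematic generator G = [I4 | P] of the [7,4] Hamming code.
P = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
G = [[int(i == j) for j in range(4)] + P[i] for i in range(4)]
# Parity-check matrix H = [P^T | I3]; column j is the syndrome of an error in bit j.
H = [[P[j][r] for j in range(4)] + [int(r == k) for k in range(3)] for r in range(3)]


def vec_mat(v, M):
    """Row vector times matrix over GF(2)."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) % 2 for j in range(len(M[0]))]


def gf2_inv(A):
    """Invert a square GF(2) matrix by Gauss-Jordan elimination; None if singular."""
    n = len(A)
    M = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col]), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col]:
                M[r] = [a ^ b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def hamming_decode(c):
    """Correct up to one bit error, then return the 4 systematic message bits."""
    s = [sum(H[r][j] * c[j] for j in range(7)) % 2 for r in range(3)]
    c = c[:]
    if any(s):
        j = next(j for j in range(7) if [H[r][j] for r in range(3)] == s)
        c[j] ^= 1
    return c[:4]


def keygen(rng):
    S_inv = None
    while S_inv is None:  # redraw until the scrambler S is invertible
        S = [[rng.randint(0, 1) for _ in range(4)] for _ in range(4)]
        S_inv = gf2_inv(S)
    perm = list(range(7))
    rng.shuffle(perm)
    SG = [vec_mat(row, G) for row in S]
    public = [[row[perm[j]] for j in range(7)] for row in SG]  # G' = S G P
    return public, (S_inv, perm)


def encrypt(m, public, rng):
    c = vec_mat(m, public)
    c[rng.randrange(7)] ^= 1  # add a weight-1 error vector
    return c


def decrypt(c, private):
    S_inv, perm = private
    unperm = [0] * 7
    for j in range(7):
        unperm[perm[j]] = c[j]  # undo the column permutation
    return vec_mat(hamming_decode(unperm), S_inv)  # (m S) S^{-1} = m
```

    Decryption works because undoing P leaves the codeword m S G plus an error of
    the same weight, which the Hamming decoder removes before S is inverted.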

    QoS-Based Linear Transceiver Optimization for Full-Duplex Multi-User Communications

    Tsung-Hui Chang, Ya-Feng Liu, Shih-Chun Lin
    Comments: submitted for publication
    Subjects: Information Theory (cs.IT)

    In this paper, we consider a multi-user wireless system with one full-duplex
    (FD) base station (BS) serving a set of half-duplex (HD) mobile users. To cope
    with the in-band self-interference (SI) and co-channel interference, we
    formulate a quality-of-service (QoS) based linear transceiver design problem.
    The problem jointly optimizes the downlink (DL) and uplink (UL) beamforming
    vectors of the BS and the transmission powers of UL users so as to provide both
    the DL and UL users with guaranteed signal-to-interference-plus-noise ratio
    performance, using a minimum UL and DL transmission sum power. The considered
    system model not only takes into account the noise caused by non-ideal RF
    circuits and by analog/digital SI cancellation, but also constrains the maximum
    signal power at the input of the analog-to-digital converter (ADC) to avoid
    signal distortion due to finite ADC precision. The formulated design problem is
    nonconvex and challenging to solve in general. We first show that for a special
    case where the SI channel estimation errors are independent and identically
    distributed, the QoS-based linear transceiver design problem is globally
    solvable by a polynomial-time bisection algorithm. For the general case, we
    propose a suboptimal algorithm based on alternating optimization (AO). The AO
    algorithm is guaranteed to converge to a Karush-Kuhn-Tucker solution. To reduce
    the complexity of the AO algorithm, we further develop a fixed-point method by
    extending the classical uplink-downlink duality in HD systems to the FD
    system. Simulation results are presented to demonstrate the performance of the
    proposed algorithms and to compare them with HD systems.
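    The fixed-point method extends uplink-downlink duality, whose uplink side rests
    on the classical fixed-point power-control iteration: each user repeatedly sets
    its power to the minimum that meets its SINR target given the others' current
    powers. The sketch below shows only that classical half-duplex, single-antenna
    iteration (in the style of Foschini-Miljanic and Yates), not the paper's FD
    beamforming algorithm; the gain matrix, targets and noise level are made-up
    inputs.

```python
from typing import List


def fixed_point_power(gain: List[List[float]], gamma: List[float], noise: float,
                      iters: int = 500, tol: float = 1e-12) -> List[float]:
    """Fixed-point iteration for SINR-constrained power control.

    gain[k][j]: assumed channel gain from transmitter j at receiver k.
    gamma[k]: SINR target of link k; noise: receiver noise power.
    Converges to the minimum-power solution when the targets are feasible.
    """
    K = len(gamma)
    p = [0.0] * K
    for _ in range(iters):
        # Each link picks the power that exactly meets its target given the others.
        new = [gamma[k] * (sum(gain[k][j] * p[j] for j in range(K) if j != k) + noise)
               / gain[k][k] for k in range(K)]
        if max(abs(a - b) for a, b in zip(new, p)) < tol:
            return new
        p = new
    return p


def sinr(gain: List[List[float]], p: List[float], noise: float, k: int) -> float:
    interference = sum(gain[k][j] * p[j] for j in range(len(p)) if j != k)
    return gain[k][k] * p[k] / (interference + noise)
```

    For this linear model the targets are feasible only when the spectral radius of
    the matrix with entries gamma[k] * gain[k][j] / gain[k][k] (j != k) is below
    one; otherwise the iteration's powers grow without bound.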

    Antenna Selection for MIMO Non-orthogonal Multiple Access Systems

    Yuehua Yu, He Chen, Yonghui Li, Zhiguo Ding, Branka Vucetic
    Comments: Submitted for possible journal publication
    Subjects: Information Theory (cs.IT)

    This paper considers the joint antenna selection (AS) problem for a classical
    two-user MIMO non-orthogonal multiple access (NOMA) system, where both the base
    station (BS) and users (UEs) are equipped with multiple antennas. Specifically,
    several computationally-efficient AS algorithms are developed for two
    commonly-used NOMA scenarios: fixed power allocation NOMA (F-NOMA) and
    cognitive radio-inspired NOMA (CR-NOMA). For the F-NOMA system, two novel AS
    schemes, namely max-max-max AS (A³-AS) and max-min-max AS (AIA-AS), are
    proposed to maximize the system sum-rate, without and with the consideration of
    user fairness, respectively. In the CR-NOMA network, a novel AS algorithm,
    termed maximum-channel-gain-based AS (MCG-AS), is proposed to maximize the
    achievable rate of the secondary user, under the condition that the primary
    user’s quality of service requirement is satisfied. The asymptotic closed-form
    expressions of the average sum-rate for A³-AS and AIA-AS and that of the
    average rate of the secondary user for MCG-AS are derived, respectively.
    Numerical results demonstrate that AIA-AS provides better user fairness,
    while A³-AS achieves a near-optimal sum-rate in F-NOMA systems. For the
    CR-NOMA scenario, MCG-AS achieves near-optimal performance in a wide SNR
    regime. Furthermore, all the proposed AS algorithms yield a significant
    computational complexity reduction, compared to exhaustive search-based
    counterparts.
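    The scheme names suggest nested selections. Under one plausible reading of
    max-min-max (an assumption: the paper's AIA-AS targets sum-rate with fairness,
    not the raw channel gain used here), each user first picks its strongest receive
    antenna for every candidate BS antenna, the weakest user's gain is taken, and
    the BS antenna that maximizes it wins. For this max-min gain objective the
    nested search is exact, yet it needs only N_t(N_1 + N_2) gain comparisons
    instead of the N_t N_1 N_2 combinations of exhaustive search. Function names
    are hypothetical.

```python
import itertools
from typing import List, Tuple

# gains[k][r][t]: |h|^2 from BS antenna t to receive antenna r of user k (assumed layout)
Gains = List[List[List[float]]]


def max_min_max(gains: Gains) -> Tuple[int, List[int]]:
    """Pick one BS antenna and one receive antenna per user by nested max/min/max."""
    n_t = len(gains[0][0])
    best_t, best_val = 0, float("-inf")
    for t in range(n_t):
        # Inner max: each user's best receive antenna for BS antenna t.
        # Min: the weakest user's gain under that choice.
        val = min(max(user[r][t] for r in range(len(user))) for user in gains)
        if val > best_val:
            best_t, best_val = t, val
    rx = [max(range(len(user)), key=lambda r: user[r][best_t]) for user in gains]
    return best_t, rx


def exhaustive_min_gain(gains: Gains) -> float:
    """Brute force over all (t, r_1, ..., r_K) for the largest minimum gain."""
    n_t = len(gains[0][0])
    best = float("-inf")
    for t in range(n_t):
        for rs in itertools.product(*(range(len(user)) for user in gains)):
            best = max(best, min(user[r][t] for user, r in zip(gains, rs)))
    return best
```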

    Optical Adaptive Precoding for Visible Light Communications

    Hanaa Marshoud, Paschalis C. Sofotasios, Sami Muhaidat, Bayan S. Sharif, George K. Karagiannidis
    Subjects: Information Theory (cs.IT)

    Multiple-input multiple-output (MIMO) techniques have recently demonstrated
    significant potential in visible light communications (VLC), as they can
    overcome the modulation bandwidth limitation and provide substantial
    improvement in terms of spectral efficiency and link reliability. However, MIMO
    systems typically suffer from inter-channel interference, which causes severe
    degradation to the system performance. In this context, we propose a novel
    optical adaptive precoding (OAP) scheme for the downlink of MIMO VLC systems,
    which exploits the knowledge of transmitted symbols to enhance the effective
    signal-to-interference-plus-noise ratio. We also derive bit-error-rate
    expressions for the OAP under perfect and outdated channel state information
    (CSI). Our results demonstrate that the proposed scheme is more robust to both
    CSI error and channel correlation, compared to conventional channel inversion
    precoding.
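    The baseline that OAP is compared against, channel inversion (zero-forcing)
    precoding, pre-multiplies the symbols by the inverse channel so that each
    receiver sees only its own symbol. A minimal numpy sketch, assuming a square,
    invertible, real channel matrix and ignoring the non-negativity and DC-bias
    constraints of intensity-modulated VLC; it does not implement OAP, and the
    function name is hypothetical.

```python
from typing import Tuple

import numpy as np


def channel_inversion_precode(H: np.ndarray, s: np.ndarray) -> Tuple[np.ndarray, float]:
    """Zero-forcing precoder x = beta * H^{-1} s, so the receivers see H x = beta * s.

    H: N x N channel matrix (assumed invertible); s: N data symbols.
    beta keeps the transmit vector's norm equal to ||s|| (a simplifying
    power normalization).
    """
    x = np.linalg.solve(H, s)  # solves H x = s without forming H^{-1} explicitly
    beta = float(np.linalg.norm(s) / np.linalg.norm(x))
    return beta * x, beta
```

    When the channel rows become more correlated, H is ill-conditioned, the raw
    H^{-1} s grows and beta shrinks, so less useful signal power reaches each
    receiver; this is one way to see why channel inversion is sensitive to channel
    correlation.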

    State Estimation with Secrecy against Eavesdroppers

    Anastasios Tsiamis, Konstantinos Gatsis, George J. Pappas
    Subjects: Systems and Control (cs.SY); Cryptography and Security (cs.CR); Information Theory (cs.IT)

    We study the problem of remote state estimation in the presence of an
    eavesdropper. An authorized user estimates the state of a linear plant based
    on the data received from a sensor, while the data may also be intercepted by
    the eavesdropper. To maintain confidentiality with respect to the state, we
    introduce a novel control-theoretic definition of perfect secrecy requiring
    that the user’s expected error remains bounded while the eavesdropper’s
    expected error grows unbounded. We propose a secrecy mechanism which guarantees
    perfect secrecy by randomly withholding sensor information, under the condition
    that the user’s packet reception rate is larger than the eavesdropper’s
    interception rate. Given this mechanism, we also explore the tradeoff between
    the user’s utility and confidentiality with respect to the eavesdropper, via an
    optimization problem. Finally, some examples are studied to provide insights
    about this tradeoff.
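    Why random withholding can separate the two estimators can be seen in a toy
    scalar case (an assumption; the paper treats general linear plants): the plant
    is x_{k+1} = a x_k + w_k with Var(w_k) = q and |a| > 1, and every delivered
    packet reveals the state exactly. If packets arrive independently with rate r,
    the expected prediction error obeys E[P_{k+1}] = (1 - r) a^2 E[P_k] + q, which
    stays bounded only when r > 1 - 1/a^2. If the sensor transmits with probability
    p_tx, the user's and eavesdropper's effective rates become p_tx times their
    native rates, so a p_tx placing the user above that threshold and the
    eavesdropper below it exists exactly when the user's rate exceeds the
    eavesdropper's. All numbers below are hypothetical.

```python
def expected_error(a: float, q: float, arrival: float, steps: int) -> float:
    """Expected prediction-error variance after `steps` steps (toy scalar model).

    Uses E[P_{k+1}] = (1 - arrival) * a^2 * E[P_k] + q, which assumes each received
    packet reveals the state exactly and packets arrive independently.
    """
    p = q
    for _ in range(steps):
        p = (1.0 - arrival) * a * a * p + q
    return p


def critical_rate(a: float) -> float:
    """Arrival rate above which the expected error stays bounded (|a| > 1)."""
    return 1.0 - 1.0 / (a * a)


a, q = 1.2, 1.0
user_rate, eve_rate = 0.9, 0.6  # hypothetical reception / interception rates
p_tx = 0.45                     # hypothetical probability that the sensor transmits
print(critical_rate(a))
print(expected_error(a, q, p_tx * user_rate, 2000))  # settles near a finite value
print(expected_error(a, q, p_tx * eve_rate, 2000))   # grows without bound
```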

    A Data-Driven Compressive Sensing Framework Tailored For Energy-Efficient Wearable Sensing

    Kai Xu, Yixing Li, Fengbo Ren
    Comments: Accepted as an oral presentation at the 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    Subjects: Learning (cs.LG); Information Theory (cs.IT)

    Compressive sensing (CS) is a promising technology for realizing
    energy-efficient wireless sensors for long-term health monitoring. However,
    conventional model-driven CS frameworks suffer from limited compression ratio
    and reconstruction quality when dealing with physiological signals due to
    inaccurate models and the neglect of individual variability. In this paper, we
    propose a data-driven CS framework that can learn signal characteristics and
    personalized features from any individual recording of physiologic signals to
    enhance CS performance with a minimized number of measurements. Such
    improvements are accomplished by a co-training approach that optimizes the
    sensing matrix and the dictionary towards improved restricted isometry property
    and signal sparsity, respectively. Experimental results on ECG signals show
    that the proposed method, at a compression ratio of 10x, successfully reduces
    the isometry constant of the trained sensing matrices by 86% against random
    matrices and improves the overall reconstructed signal-to-noise ratio by 15 dB
    over conventional model-driven approaches.
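    The conventional model-driven pipeline that the abstract improves on can be
    sketched as follows: take M random measurements y = Phi x of a length-N signal
    (a 10x compression ratio means M is about N/10) and recover x by assuming it is
    sparse. Here Orthogonal Matching Pursuit stands in as one standard
    sparse-recovery algorithm (the abstract does not say which solver is used), and
    the signal is assumed sparse in the identity basis; the proposed method instead
    learns both the sensing matrix and the dictionary from data. The SNR helper
    uses the usual 20 log10(||x|| / ||x - x_hat||) definition, assumed here.

```python
import numpy as np


def omp(Phi: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Orthogonal Matching Pursuit: greedy k-sparse approximation of y = Phi x."""
    residual, support = y.copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat


def snr_db(x: np.ndarray, x_hat: np.ndarray) -> float:
    """Reconstruction SNR in dB: 20 * log10(||x|| / ||x - x_hat||)."""
    return float(20 * np.log10(np.linalg.norm(x) / np.linalg.norm(x - x_hat)))
```

    With a random Gaussian Phi, recovery succeeds only when the signal is sparse
    enough for the number of measurements; real ECG segments are not sparse in the
    identity basis, which is why a learned dictionary helps.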



