
    arXiv Paper Daily: Mon, 27 Feb 2017

    Published by 我爱机器学习 (52ml.net) on 2017-02-27 00:00:00

    Neural and Evolutionary Computing

    Control of the Correlation of Spontaneous Neuron Activity in Biological and Noise-activated CMOS Artificial Neural Microcircuits

    Ramin M. Hasani, Giorgio Ferrari, Hideaki Yamamoto, Sho Kono, Koji Ishihara, Soya Fujimori, Takashi Tanii, Enrico Prati
    Subjects: Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC); Quantitative Methods (q-bio.QM)

    There are several indications that the brain is organized not on the basis of
    individual unreliable neurons, but on a microcircuit scale that provides Lego
    blocks employed to create complex architectures. At such an intermediate scale,
    the firing activity in the microcircuits is governed by collective effects
    emerging from the background noise soliciting spontaneous firing, the degree of
    mutual connections between the neurons, and the topology of the connections. We
    compare spontaneous firing activity of small populations of neurons adhering to
    an engineered scaffold with simulations of biologically plausible CMOS
    artificial neuron populations whose spontaneous activity is ignited by tailored
    background noise. We provide a full set of flexible and low-power consuming
    silicon blocks including neurons, excitatory and inhibitory synapses, and both
    white and pink noise generators for spontaneous firing activation. We achieve a
    comparable degree of correlation of the firing activity of the biological
    neurons by controlling the kind and the number of connections among the silicon
    neurons. The correlation between groups of neurons, organized as a ring of four
    distinct populations connected by the equivalent of interneurons, is triggered
    more effectively by adding multiple synapses to the connections than increasing
    the number of independent point-to-point connections. The comparison between
    the biological and the artificial systems suggests that a considerable number
    of synapses is also active in biological populations adhering to engineered
    scaffolds.

    RNN Decoding of Linear Block Codes

    Eliya Nachmani, Elad Marciano, David Burshtein, Yair Be'ery
    Subjects: Information Theory (cs.IT); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Designing a practical, low complexity, close to optimal, channel decoder for
    powerful algebraic codes with short to moderate block length is an open
    research problem. Recently it has been shown that a feed-forward neural network
    architecture can improve on standard belief propagation decoding, despite the
    large example space. In this paper we introduce a recurrent neural network
    architecture for decoding linear block codes. Our method achieves bit error
    rate results comparable to those of the feed-forward neural network, with
    significantly fewer parameters. We also demonstrate improved performance over
    belief propagation on sparser Tanner graph representations of the codes.
    Furthermore, we demonstrate that the RNN decoder can be used to improve the
    performance or alternatively reduce the computational complexity of the mRRD
    algorithm for low complexity, close to optimal, decoding of short BCH codes.
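
    The paper's decoder unrolls belief propagation on the code's Tanner graph as
    an RNN with tied weights; that architecture is not reproduced here. As a much
    simplified, hedged sketch of learning to decode a linear block code, the
    following trains a generic GRU on noisy BPSK observations of a toy (7,4) code
    (PyTorch assumed; every detail below is illustrative):

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    # Systematic generator matrix of a toy (7,4) Hamming-style code.
    G = torch.tensor([[1, 0, 0, 0, 1, 1, 0],
                      [0, 1, 0, 0, 1, 0, 1],
                      [0, 0, 1, 0, 0, 1, 1],
                      [0, 0, 0, 1, 1, 1, 1]], dtype=torch.float32)

    def batch(n=256, sigma=0.7):
        msgs = torch.randint(0, 2, (n, 4)).float()
        codewords = (msgs @ G) % 2                           # encode over GF(2)
        x = (1 - 2 * codewords) + sigma * torch.randn(n, 7)  # BPSK + AWGN
        return x.unsqueeze(-1), codewords                    # 7-symbol sequences

    class GRUDecoder(nn.Module):
        def __init__(self, hidden=64):
            super().__init__()
            self.rnn = nn.GRU(1, hidden, batch_first=True)
            self.out = nn.Linear(hidden, 1)
        def forward(self, x):
            h, _ = self.rnn(x)                # (n, 7, hidden)
            return self.out(h).squeeze(-1)    # per-position bit logits

    model = GRUDecoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for step in range(2000):
        x, y = batch()
        loss = nn.functional.binary_cross_entropy_with_logits(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()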


    Computer Vision and Pattern Recognition

    A recommender system to restore images with impulse noise

    Alfredo Nava-Tudela
    Comments: 22 pages, 34 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    We build a collaborative filtering recommender system to restore images with
    impulse noise for which the noisy pixels have been previously identified. We
    define this recommender system in terms of a new color image representation
    using three matrices that depend on the noise-free pixels of the image to
    restore, and two parameters: $k$, the number of features, and $\lambda$, the
    regularization factor. We perform experiments on a well known image database to
    test our algorithm and we provide image quality statistics for the results
    obtained. We discuss the roles of bias and variance in the performance of our
    algorithm as determined by the values of $k$ and $\lambda$, and provide
    guidance on how to choose the values of these parameters. Finally, we discuss
    the possibility of using our collaborative filtering recommender system to
    perform image inpainting and super-resolution.
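
    A minimal sketch of the collaborative-filtering mechanics, assuming a single
    grayscale channel and plain gradient descent (the paper's three-matrix color
    representation is not reproduced; $k$ and $\lambda$ play the roles named in
    the abstract):

    import numpy as np

    rng = np.random.default_rng(0)
    img = rng.random((64, 64))                # stand-in channel in [0, 1]
    mask = rng.random(img.shape) > 0.3        # True where pixels are noise-free
    k, lam, lr = 8, 0.05, 0.01                # features and regularization factor

    U = 0.1 * rng.standard_normal((img.shape[0], k))
    V = 0.1 * rng.standard_normal((img.shape[1], k))
    for _ in range(500):
        R = (U @ V.T - img) * mask            # residual on noise-free pixels only
        U -= lr * (R @ V + lam * U)           # gradient step on regularized loss
        V -= lr * (R.T @ U + lam * V)
    restored = np.where(mask, img, U @ V.T)   # predict the noisy pixels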

    How ConvNets model Non-linear Transformations

    Dipan K. Pal, Marios Savvides
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    In this paper, we theoretically address three fundamental problems involving
    deep convolutional networks regarding invariance, depth and hierarchy. We
    introduce the paradigm of Transformation Networks (TN) which are a direct
    generalization of Convolutional Networks (ConvNets). Theoretically, we show
    that TNs (and thereby ConvNets) can be invariant to non-linear
    transformations of the input despite pooling over mere local translations. Our
    analysis provides clear insights into the increase in invariance with depth in
    these networks. Deeper networks are able to model much richer classes of
    transformations. We also find that a hierarchical architecture allows the
    network to generate invariance much more efficiently than a non-hierarchical
    network. Our results provide useful insight into these three fundamental
    problems in deep learning using ConvNets.

    Fast and robust curve skeletonization for real-world elongated objects

    Amy Tabb, Henry Medeiros
    Comments: 33 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

    We consider the problem of extracting curve skeletons of three-dimensional,
    elongated objects given a noisy surface, which has applications in agricultural
    contexts such as extracting the branching structure of plants. We describe an
    efficient and robust method based on breadth-first search that can determine
    curve skeletons in these contexts. Our approach is capable of automatically
    detecting junction points as well as spurious segments and loops. All of that
    is accomplished with only one user-adjustable parameter. The run time of our
    method ranges from hundreds of milliseconds to less than four seconds on large,
    challenging datasets, which makes it appropriate for situations where real-time
    decision making is needed. Experiments on synthetic models as well as on data
    from real world objects, some of which were collected in challenging field
    conditions, show that our approach compares favorably to classical thinning
    algorithms as well as to recent contributions to the field.
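
    To make the breadth-first-search intuition concrete, the toy sketch below
    extracts a single centerline path from a voxelized elongated object via BFS
    distances and parent backtracking; the paper's method additionally handles
    junction detection, loops, and spurious segments, none of which appears here:

    import numpy as np
    from collections import deque

    occ = np.zeros((40, 8, 8), dtype=bool)
    occ[:, 2:6, 2:6] = True                      # a toy elongated object

    seed = tuple(int(c) for c in np.argwhere(occ)[0])
    parent, dist = {seed: None}, {seed: 0}
    q = deque([seed])
    steps = [(1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)]
    while q:                                     # BFS over 6-connected voxels
        v = q.popleft()
        for s in steps:
            w = (v[0]+s[0], v[1]+s[1], v[2]+s[2])
            if all(0 <= w[i] < occ.shape[i] for i in range(3)) \
                    and occ[w] and w not in dist:
                dist[w], parent[w] = dist[v] + 1, v
                q.append(w)

    tip = max(dist, key=dist.get)                # farthest voxel from the seed
    path = []
    while tip is not None:                       # backtrack parents to the seed
        path.append(tip)
        tip = parent[tip]
    print(len(path), "voxels on the extracted centerline")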

    Automatic segmentation in dynamic outdoor environments

    Amy Tabb, Henry Medeiros
    Comments: 5 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Segmentation in dynamic outdoor environments can be difficult when the
    illumination levels and other aspects of the scene cannot be controlled. In
    this paper, we describe a method that uses superpixels to determine low texture
    regions of the image that correspond to the background material, and then show
    how this information can be integrated with the color distribution of the image
    to compute optimal segmentation parameters for traditional binary segmentation
    as well as to produce silhouette probability maps. We show results of this
    algorithm in the context of an application for tree modeling.

    How hard is it to cross the room? — Training (Recurrent) Neural Networks to steer a UAV

    Klaas Kelchtermans, Tinne Tuytelaars
    Comments: 12 pages, 30 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This work explores the feasibility of steering a drone with a (recurrent)
    neural network, based on input from a forward looking camera, in the context of
    a high-level navigation task. We set up a generic framework for training a
    network to perform navigation tasks based on imitation learning. It can be
    applied to both aerial and land vehicles. As a proof of concept we apply it to
    a UAV (Unmanned Aerial Vehicle) in a simulated environment, learning to cross a
    room containing a number of obstacles. So far only feedforward neural networks
    (FNNs) have been used to train UAV control. To cope with more complex tasks, we
    propose the use of recurrent neural networks (RNN) instead and successfully
    train an LSTM (Long Short-Term Memory) network for controlling UAVs.
    Vision-based control is a sequential prediction problem, known for its highly
    correlated input data. The correlation makes training a network hard,
    especially an RNN. To overcome this issue, we investigate an alternative
    sampling method during training, namely window-wise truncated backpropagation
    through time (WW-TBPTT). Further, end-to-end training requires a lot of data
    which often is not available. Therefore, we compare the performance of
    retraining only the Fully Connected (FC) and LSTM control layers with networks
    which are trained end-to-end. Performing the relatively simple task of crossing
    a room already reveals important guidelines and good practices for training
    neural control networks. Different visualizations help to explain the behavior
    learned.
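
    A hedged sketch of window-wise truncated backpropagation through time as
    described: sample fixed-length windows at random positions and backpropagate
    only within each window (PyTorch assumed; the window length, model sizes, and
    per-window hidden-state reset are illustrative choices):

    import torch
    import torch.nn as nn

    seq_len, window, feat_dim = 500, 20, 128
    frames = torch.randn(seq_len, feat_dim)     # stand-in per-frame CNN features
    controls = torch.randn(seq_len, 4)          # stand-in steering labels

    lstm = nn.LSTM(feat_dim, 64)
    head = nn.Linear(64, 4)
    opt = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()), 1e-3)

    for _ in range(100):
        start = torch.randint(0, seq_len - window, (1,)).item()
        x = frames[start:start + window].unsqueeze(1)   # (window, batch=1, feat)
        y = controls[start:start + window]
        out, _ = lstm(x)                        # hidden state reset per window
        loss = nn.functional.mse_loss(head(out.squeeze(1)), y)
        opt.zero_grad(); loss.backward(); opt.step()    # gradients stay in-window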

    Toward high-performance online HCCR: a CNN approach with DropDistortion, path signature and spatial stochastic max-pooling

    Songxuan Lai, Lianwen Jin, Weixin Yang
    Comments: 10 pages, 7 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents an investigation of several techniques that increase the
    accuracy of online handwritten Chinese character recognition (HCCR). We propose
    a new training strategy named DropDistortion to train a deep convolutional
    neural network (DCNN) with distorted samples. DropDistortion gradually lowers
    the degree of character distortion during training, which allows the DCNN to
    better generalize. Path signature is used to extract effective features for
    online characters. Further improvement is achieved by employing spatial
    stochastic max-pooling as a method of feature map distortion and model
    averaging. Experiments were carried out on three publicly available datasets,
    namely CASIA-OLHWDB 1.0, CASIA-OLHWDB 1.1, and the ICDAR2013 online HCCR
    competition dataset. The proposed techniques yield state-of-the-art recognition
    accuracies of 97.67%, 97.30%, and 97.99%, respectively.
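
    A minimal sketch of the DropDistortion idea, assuming a linear annealing
    schedule (the authors' exact schedule is not specified here, and distort_fn
    is a hypothetical user-supplied distortion):

    import random

    def distortion_level(epoch, total_epochs, start=1.0, end=0.0):
        """Linearly lower the degree of distortion as training progresses."""
        t = epoch / max(1, total_epochs - 1)
        return start + t * (end - start)

    def maybe_distort(sample, epoch, total_epochs, distort_fn):
        level = distortion_level(epoch, total_epochs)
        if random.random() < level:
            return distort_fn(sample, magnitude=level)  # strong early, weak late
        return sample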

    Deep representation learning for human motion prediction and classification

    Judith Bütepage, Michael Black, Danica Kragic, Hedvig Kjellström
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Generative models of 3D human motion are often restricted to a small number
    of activities and can therefore not generalize well to novel movements or
    applications. In this work we propose a deep learning framework for human
    motion capture data that learns a generic representation from a large corpus of
    motion capture data and generalizes well to new, unseen, motions. Using an
    encoding-decoding network that learns to predict future 3D poses from the most
    recent past, we extract a feature representation of human motion. Most work on
    deep learning for sequence prediction focuses on video and speech. Since
    skeletal data has a different structure, we present and evaluate different
    network architectures that make different assumptions about time dependencies
    and limb correlations. To quantify the learned features, we use the output of
    different layers for action classification and visualize the receptive fields
    of the network units. Our method outperforms the recent state of the art in
    skeletal motion prediction even though those methods use action-specific
    training data.
    Our results show that deep feedforward networks, trained from a generic mocap
    database, can successfully be used for feature extraction from human motion
    data and that this representation can be used as a foundation for
    classification and prediction.
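
    As a hedged illustration of the encoding-decoding setup, the toy predictor
    below maps a window of past 3D poses to future poses and exposes a bottleneck
    feature reusable for classification; the paper's actual architectures, which
    model time dependencies and limb correlations, are more structured (PyTorch
    assumed):

    import torch
    import torch.nn as nn

    past, future, joints = 10, 5, 25
    encoder = nn.Sequential(
        nn.Linear(past * joints * 3, 256), nn.ReLU(),
        nn.Linear(256, 128), nn.ReLU(),             # 128-d motion feature
    )
    decoder = nn.Linear(128, future * joints * 3)   # predicts the future poses

    x = torch.randn(32, past * joints * 3)          # stand-in past pose windows
    y = torch.randn(32, future * joints * 3)        # stand-in future poses
    feats = encoder(x)                              # reusable for classification
    loss = nn.functional.mse_loss(decoder(feats), y)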

    Speckle Reduction with Trained Nonlinear Diffusion Filtering

    Wensen Feng, Yunjin Chen
    Comments: to appear in Journal of Mathematical Imaging and Vision. Demo codes are available from this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Speckle reduction is a prerequisite for many image processing tasks in
    synthetic aperture radar (SAR) images, as well as all coherent images. In
    recent years, predominant state-of-the-art approaches for despeckling are
    usually based on nonlocal methods which mainly concentrate on achieving utmost
    image restoration quality, with relatively low computational efficiency.
    Therefore, in this study we aim to propose an efficient despeckling model with
    both high computational efficiency and high recovery quality. To this end, we
    exploit a newly-developed trainable nonlinear reaction diffusion (TNRD)
    framework which has proven to be a simple and effective model for various
    image restoration problems. In the original TNRD applications, the diffusion
    network is usually derived based on the direct gradient descent scheme.
    However, this approach encounters problems for the task of multiplicative
    noise reduction considered in this study. To solve this problem, we employ a
    new architecture derived from the proximal gradient descent method. Taking
    into account the speckle noise statistics, the diffusion process for the
    despeckling task is derived. We then retrain all the model parameters in the
    presence of speckle noise. Finally, optimized nonlinear diffusion filtering
    models are obtained, which are specialized for despeckling at various noise
    levels. Experimental results substantiate that the trained filtering models
    provide comparable or even better results than state-of-the-art nonlocal
    approaches. Meanwhile, our proposed model merely involves the convolution of
    linear filters with an image, which offers a high level of parallelism on
    GPUs. As a consequence, for images of size $512 \times 512$, our GPU
    implementation takes less than 0.1 seconds to produce state-of-the-art
    despeckling performance.

    Simultaneous Feature and Body-Part Learning for Real-Time Robot Awareness of Human Behaviors

    Fei Han, Xue Yang, Christopher Reardon, Yu Zhang, Hao Zhang
    Comments: 8 pages, 6 figures, accepted by ICRA’17
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

    Robot awareness of human actions is an essential research problem in robotics
    with many important real-world applications, including human-robot
    collaboration and teaming. Over the past few years, depth sensors have become a
    standard device widely used by intelligent robots for 3D perception, which can
    also offer human skeletal data in 3D space. Several methods based on skeletal
    data were designed to enable robot awareness of human actions with satisfactory
    accuracy. However, previous methods treated all body parts and features as
    equally important, without the capability to identify discriminative body parts and
    features. In this paper, we propose a novel simultaneous Feature And Body-part
    Learning (FABL) approach that simultaneously identifies discriminative body
    parts and features, and efficiently integrates all available information
    together to enable real-time robot awareness of human behaviors. We formulate
    FABL as a regression-like optimization problem with structured
    sparsity-inducing norms to model interrelationships of body parts and features.
    We also develop an optimization algorithm to solve the formulated problem,
    which possesses a theoretical guarantee to find the optimal solution. To
    evaluate FABL, three experiments were performed using public benchmark
    datasets, including the MSR Action3D and CAD-60 datasets, as well as a Baxter
    robot in practical assistive living applications. Experimental results show
    that our FABL approach obtains a high recognition accuracy with a processing
    speed on the order of $10^4$ Hz, which makes FABL a promising method
    to enable real-time robot awareness of human behaviors in practical robotics
    applications.

    Learning Non-local Image Diffusion for Image Denoising

    Peng Qiao, Yong Dou, Wensen Feng, Yunjin Chen
    Comments: under review in a journal
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Image diffusion plays a fundamental role for the task of image denoising.
    The recently proposed trainable nonlinear reaction diffusion (TNRD) model defines a
    simple but very effective framework for image denoising. However, as the TNRD
    model is a local model, the diffusion behavior of which is purely controlled by
    information of local patches, it is prone to creating artifacts in homogeneous
    regions and to over-smoothing highly textured regions, especially in the case of
    strong noise levels. Meanwhile, it is widely known that the non-local
    self-similarity (NSS) prior stands as an effective image prior for image
    denoising, which has been widely exploited in many non-local methods. In this
    work, we are highly motivated to embed the NSS prior into the TNRD model to
    tackle its weaknesses. In order to preserve the expected property that
    end-to-end training is available, we exploit the NSS prior by a set of
    non-local filters, and derive our proposed trainable non-local reaction
    diffusion (TNLRD) model for image denoising. Together with the local filters
    and influence functions, the non-local filters are learned by employing
    loss-specific training. The experimental results show that the trained TNLRD
    model produces visually plausible recovered images with more textures and less
    artifacts, compared to its local versions. Moreover, the trained TNLRD model
    can achieve strongly competitive performance to recent state-of-the-art image
    denoising methods in terms of peak signal-to-noise ratio (PSNR) and structural
    similarity index (SSIM).

    Viewpoint Adaptation for Rigid Object Detection

    Patrick Wang, Kenneth Morton, Peter Torrione, Leslie Collins
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    An object detector performs suboptimally when applied to image data taken
    from a viewpoint different from the one with which it was trained. In this
    paper, we present a viewpoint adaptation algorithm that allows a trained
    single-view object detector to be adapted to a new, distinct viewpoint. We
    first illustrate how a feature space transformation can be inferred from a
    known homography between the source and target viewpoints. Second, we show that
    a variety of trained classifiers can be modified to behave as if that
    transformation were applied to each testing instance. The proposed algorithm is
    evaluated on a person detection task using images from the PETS 2007 and CAVIAR
    datasets, as well as from a new synthetic multi-view person detection dataset.
    It yields substantial performance improvements when adapting single-view person
    detectors to new viewpoints, and simultaneously reduces computational
    complexity. This work has the potential to improve detection performance for
    cameras viewing objects from arbitrary viewpoints, while simplifying data
    collection and feature extraction.
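
    A background sketch of the homography step only, not the paper's classifier
    modification: given a known homography H between viewpoints, a target-view
    image can be warped toward the source view before feature extraction (OpenCV
    assumed; the scene and H below are illustrative):

    import cv2
    import numpy as np

    img = np.zeros((240, 320, 3), np.uint8)      # stand-in target-view image
    cv2.circle(img, (160, 120), 40, (255, 255, 255), -1)
    H = np.array([[1.0, 0.05, -10.0],            # illustrative 3x3 homography
                  [0.0, 1.10,   5.0],
                  [0.0, 0.00,   1.0]])
    h, w = img.shape[:2]
    # Map target-view pixels back into the source viewpoint.
    warped = cv2.warpPerspective(img, np.linalg.inv(H), (w, h))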

    Multi-Context Attention for Human Pose Estimation

    Xiao Chu, Wei Yang, Wanli Ouyang, Cheng Ma, Alan L. Yuille, Xiaogang Wang
    Comments: The first two authors contribute equally to this work
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we propose to incorporate convolutional neural networks with a
    multi-context attention mechanism into an end-to-end framework for human pose
    estimation. We adopt stacked hourglass networks to generate attention maps from
    features at multiple resolutions with various semantics. The Conditional Random
    Field (CRF) is utilized to model the correlations among neighboring regions in
    the attention map. We further combine the holistic attention model, which
    focuses on the global consistency of the full human body, and the body part
    attention model, which focuses on the detailed description for different body
    parts. Hence our model has the ability to focus on different granularities, from
    local salient regions to globally semantically consistent spaces. Additionally, we
    design novel Hourglass Residual Units (HRUs) to increase the receptive field of
    the network. These units are extensions of residual units with a side branch
    incorporating filters with larger receptive fields, hence features with various
    scales are learned and combined within the HRUs. The effectiveness of the
    proposed multi-context attention mechanism and the hourglass residual units is
    evaluated on two widely used human pose estimation benchmarks. Our approach
    outperforms all existing methods on both benchmarks over all the body parts.

    WaterGAN: Unsupervised Generative Network to Enable Real-time Color Correction of Monocular Underwater Images

    Jie Li, Katherine A. Skinner, Ryan M. Eustice, Matthew Johnson-Roberson
    Comments: 8 pages, 16 figures, submission to RA-letter/IROS 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

    This paper reports on WaterGAN, a generative adversarial network (GAN) for
    generating realistic underwater images from in-air image and depth pairings in
    an unsupervised pipeline used for color correction of monocular underwater
    images. Cameras onboard autonomous and remotely operated vehicles can capture
    high resolution images to map the seafloor; however, underwater image formation
    is subject to the complex process of light propagation through the water
    column. The raw images retrieved are characteristically different from images
    taken in air due to effects such as absorption and scattering, which cause
    attenuation of light at different rates for different wavelengths. While this
    physical process is well described theoretically, the model depends on many
    parameters intrinsic to the water column as well as the objects in the scene.
    These factors make recovery of these parameters difficult without simplifying
    assumptions or field calibration, hence, restoration of underwater images is a
    non-trivial problem. Deep learning has demonstrated great success in modeling
    complex nonlinear systems but requires a large amount of training data, which
    is difficult to compile in deep sea environments. Using WaterGAN, we generate a
    large training dataset of paired imagery, both raw underwater and true color
    in-air, as well as depth data. This data serves as input to a novel end-to-end
    network for color correction of monocular underwater images. Due to the
    depth-dependent water column effects inherent to underwater environments, we
    show that our end-to-end network implicitly learns a coarse depth estimate of
    the underwater scene from monocular underwater images. Our proposed pipeline is
    validated with testing on real data collected from both a pure water tank and
    from underwater surveys in field testing. Source code is made publicly
    available with sample datasets and pretrained models.

    Toward Streaming Synapse Detection with Compositional ConvNets

    Shibani Santurkar, David Budden, Alexander Matveev, Heather Berlin, Hayk Saribekyan, Yaron Meirovitch, Nir Shavit
    Comments: 10 pages, 9 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Connectomics is an emerging field in neuroscience that aims to reconstruct
    the 3-dimensional morphology of neurons from electron microscopy (EM) images.
    Recent studies have successfully demonstrated the use of convolutional neural
    networks (ConvNets) for segmenting cell membranes to individuate neurons.
    However, there has been comparatively little success in high-throughput
    identification of the intercellular synaptic connections required for deriving
    connectivity graphs.

    In this study, we take a compositional approach to segmenting synapses,
    modeling them explicitly as an intercellular cleft co-located with an
    asymmetric vesicle density along a cell membrane. Instead of requiring a deep
    network to learn all natural combinations of this compositionality, we train
    lighter networks to model the simpler marginal distributions of membranes,
    clefts and vesicles from just 100 electron microscopy samples. These feature
    maps are then combined with simple rules-based heuristics derived from prior
    biological knowledge.

    Our approach to synapse detection is both more accurate than previous
    state-of-the-art (7% higher recall and 5% higher F1-score) and yields a 20-fold
    speed-up compared to the previous fastest implementations. We demonstrate by
    reconstructing the first complete, directed connectome from the largest
    available anisotropic microscopy dataset (245 GB) of mouse somatosensory cortex
    (S1) in just 9.7 hours on a single shared-memory CPU system. We believe that
    this work marks an important step toward the goal of a microscope-pace
    streaming connectomics pipeline.
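
    A toy illustration of the compositional combination step (the paper derives
    its rules from biological priors; the maps, rule, and threshold below are
    stand-ins):

    import numpy as np

    rng = np.random.default_rng(1)
    membrane = rng.random((128, 128))   # stand-ins for the three lighter
    cleft = rng.random((128, 128))      # networks' per-pixel probability maps
    vesicle = rng.random((128, 128))

    # Rule: a synapse needs a cleft on a membrane with nearby vesicle density.
    synapse_score = cleft * membrane * vesicle
    synapse_mask = synapse_score > 0.5  # threshold chosen for illustration only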

    Feasibility of Principal Component Analysis in hand gesture recognition system

    Tanu Srivastava, Raj Shree Singh, Sunil Kumar, Pavan Chakraborty
    Comments: conference
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Nowadays, actions are increasingly handled electronically instead of through
    physical interaction. Biometrics has long been used for the authentication of
    a person. It recognizes a person using an associated human trait, such as the
    eyes (by calculating the distance between them), hand gestures, fingerprints,
    or the face. The advantages of using these traits for identification are that
    they uniquely identify a person and cannot be forgotten or lost. These unique
    features of a human being are widely used to make human life simpler. A hand
    gesture recognition system is a powerful tool that supports efficient
    interaction between the user and the computer. The main motivation of hand
    gesture recognition research is to create a system which can recognise
    specific hand gestures and use them to convey useful information for device
    control. This paper presents an experimental study of the feasibility of
    principal component analysis (PCA) in a hand gesture recognition system. PCA
    is a powerful tool for analyzing data, whose primary goal is dimensionality
    reduction. Frames are extracted from the Sheffield Kinect Gesture (SKIG)
    dataset. The implementation is done by creating a training set and then
    training the recognizer. It uses an eigenspace obtained by processing the
    eigenvalues and eigenvectors of the images in the training set. Euclidean
    distance with a threshold value is used as the similarity metric to recognize
    the gestures. The experimental results show that PCA is feasible for use in a
    hand gesture recognition system.
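
    A minimal eigenspace sketch of the described pipeline, with stand-in data and
    an illustrative threshold (dataset loading and all parameter choices are
    assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    train = rng.random((100, 32 * 32))     # stand-in flattened gesture frames
    labels = rng.integers(0, 5, 100)

    mean = train.mean(axis=0)
    X = train - mean
    U, S, Vt = np.linalg.svd(X, full_matrices=False)  # eigenvectors via SVD
    k = 20
    proj = X @ Vt[:k].T                    # training set in the eigenspace

    def recognize(frame, threshold=5.0):
        z = (frame - mean) @ Vt[:k].T
        d = np.linalg.norm(proj - z, axis=1)      # Euclidean distances
        i = int(d.argmin())
        return labels[i] if d[i] < threshold else None  # None: not recognized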

    Improving high-pass fusion method using wavelets

    Hamid Reza Shahdoosti
    Comments: 7 pages, in Persian
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In an appropriate image fusion method, spatial information of the
    panchromatic image is injected into the multispectral images such that the
    spectral information is not distorted. The high-pass modulation method is a
    successful method in image fusion. However, the main drawback of this method is
    that this technique uses the boxcar filter to extract the high frequency
    information of the panchromatic image. Using the boxcar filter introduces the
    ringing effect into the fused image. To cope with this problem, we use the
    wavelet transform instead of boxcar filters. Then, the results of the proposed
    method are compared with those of other methods, such as the Brovey, IHS, and
    PCA methods. Experiments show the superiority of the proposed method in terms of
    correlation coefficient and mutual information.
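
    A hedged sketch of high-pass modulation with a wavelet low-pass in place of
    the boxcar filter (PyWavelets assumed; the wavelet, decomposition level, and
    injection rule are illustrative choices, not the paper's exact recipe):

    import numpy as np
    import pywt

    def wavelet_highpass_fusion(pan, ms, wavelet="db2", level=2):
        """Inject the panchromatic band's wavelet-domain high frequencies into
        one multispectral band via high-pass modulation."""
        coeffs = pywt.wavedec2(pan, wavelet, level=level)
        # Zero the detail coefficients to obtain a low-pass version of pan.
        coeffs_low = [coeffs[0]] + [tuple(np.zeros_like(d) for d in detail)
                                    for detail in coeffs[1:]]
        pan_low = pywt.waverec2(coeffs_low, wavelet)[:pan.shape[0], :pan.shape[1]]
        # High-pass modulation: scale the band by the pan-to-low-pass ratio.
        return ms * (pan / np.maximum(pan_low, 1e-6))

    rng = np.random.default_rng(0)
    fused = wavelet_highpass_fusion(rng.random((128, 128)), rng.random((128, 128)))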

    Inertia-Constrained Pixel-by-Pixel Nonnegative Matrix Factorisation: a Hyperspectral Unmixing Method Dealing with Intra-class Variability

    Charlotte Revel, Yannick Deville, Véronique Achard, Xavier Briottet
    Subjects: Methodology (stat.ME); Computer Vision and Pattern Recognition (cs.CV); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)

    Blind source separation is a common processing tool to analyse the
    constitution of pixels of hyperspectral images. Such methods usually suppose
    that pure pixel spectra (endmembers) are the same in all the image for each
    class of materials. In the framework of remote sensing, such an assumption is
    no longer valid in the presence of intra-class variability due to illumination
    conditions, weathering, slight variations of the pure materials, etc. In this
    paper, we first describe the results of investigations highlighting intra-class
    variability measured in real images. Considering these results, a new
    formulation of the linear mixing model is presented leading to two new methods.
    Unconstrained Pixel-by-pixel NMF (UP-NMF) is a new blind source separation
    method based on the assumption of a linear mixing model, which can deal with
    intra-class variability. To overcome UP-NMF limitations an extended method is
    proposed, named Inertia-constrained Pixel-by-pixel NMF (IP-NMF). For each
    sensed spectrum, these extended versions of NMF extract a corresponding set of
    source spectra. A constraint is set to limit the spreading of each source’s
    estimates in IP-NMF. The methods are tested on a semi-synthetic data set built
    with spectra extracted from a real hyperspectral image and then numerically
    mixed. We thus demonstrate the interest of our methods for realistic source
    variabilities. Finally, IP-NMF is tested on a real data set and it is shown to
    yield better performance than state of the art methods.

    Robot gains Social Intelligence through Multimodal Deep Reinforcement Learning

    Ahmed Hussain Qureshi, Yutaka Nakamura, Yuichiro Yoshikawa, Hiroshi Ishiguro
    Comments: The paper is published in IEEE-RAS International Conference on Humanoid Robots (Humanoids) 2016
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    For robots to coexist with humans in a social world like ours, it is crucial
    that they possess human-like social interaction skills. Programming a robot to
    possess such skills is a challenging task. In this paper, we propose a
    Multimodal Deep Q-Network (MDQN) to enable a robot to learn human-like
    interaction skills through a trial and error method. This paper aims to develop
    a robot that gathers data during its interaction with a human and learns human
    interaction behaviour from the high-dimensional sensory information using
    end-to-end reinforcement learning. This paper demonstrates that the robot was
    able to learn basic interaction skills successfully, after 14 days of
    interacting with people.

    Sequence-based Multimodal Apprenticeship Learning For Robot Perception and Decision Making

    Fei Han, Xue Yang, Yu Zhang, Hao Zhang
    Comments: 8 pages, 6 figures, accepted by ICRA’17
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    Apprenticeship learning has recently attracted wide attention due to its
    capability of allowing robots to learn physical tasks directly from
    demonstrations provided by human experts. Most previous techniques assumed that
    the state space is known a priori or employed simple state representations that
    usually suffer from perceptual aliasing. Different from previous research, we
    propose a novel approach named Sequence-based Multimodal Apprenticeship
    Learning (SMAL), which is capable of simultaneously fusing temporal information
    and multimodal data, and of integrating robot perception with decision making.
    To evaluate the SMAL approach, experiments are performed using both simulations
    and real-world robots in challenging search and rescue scenarios. The
    empirical study has validated that our SMAL approach can effectively learn
    plans for robots to make decisions using sequences of multimodal observations.
    Experimental results have also shown that SMAL outperforms the baseline
    methods using individual images.

    Building Usage Profiles Using Deep Neural Nets

    Domenic Curro, Konstantinos G. Derpanis, Andriy V. Miranskyy
    Subjects: Software Engineering (cs.SE); Computer Vision and Pattern Recognition (cs.CV)

    To improve software quality, one needs to build test scenarios resembling the
    usage of a software product in the field. This task is rendered challenging
    when a product’s customer base is large and diverse. In this scenario, existing
    profiling approaches, such as operational profiling, are difficult to apply. In
    this work, we consider publicly available video tutorials of a product to
    profile usage. Our goal is to construct an automatic approach to extract
    information about user actions from instructional videos. To achieve this goal,
    we use a Deep Convolutional Neural Network (DCNN) to recognize user actions.
    Our pilot study shows that a DCNN trained to recognize user actions in video
    can classify five different actions in a collection of 236 publicly available
    Microsoft Word tutorial videos (published on YouTube). In our empirical
    evaluation we report a mean average precision of 94.42% across all actions.
    This study demonstrates the efficacy of DCNN-based methods for extracting
    software usage information from videos. Moreover, this approach may aid in
    other software engineering activities that require information about customer
    usage of a product.

    Continuous-Time Visual-Inertial Trajectory Estimation with Event Cameras

    Elias Mueggler, Guillermo Gallego, Henri Rebecq, Davide Scaramuzza
    Comments: 14 pages, 12 figures, 4 tables
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

    In contrast to traditional cameras, which output images at a fixed rate,
    event cameras have independent pixels that output asynchronous pixel-level
    brightness changes with microsecond resolution. In this paper, we leverage a
    continuous-time framework to perform trajectory estimation by fusing visual
    data from a moving event camera with inertial data from an IMU. This framework
    allows direct integration of the asynchronous events with microsecond accuracy
    and the inertial measurements at high frequency. The pose trajectory is
    approximated by a smooth curve in the space of rigid-body motions using cubic
    splines. This formulation significantly reduces the number of variables in
    trajectory estimation problems. We evaluate our method on real data from
    several scenes and compare the results against ground truth from a
    motion-capture system. We show superior performance of the proposed technique
    compared to non-batch event-based algorithms. We also show that both the map
    orientation and scale can be recovered accurately by fusing events and inertial
    data. To the best of our knowledge, this is the first work on visual-inertial
    fusion with event cameras using a continuous-time framework.
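
    As a rough, position-only illustration of why a spline parameterization
    reduces the number of variables: a handful of control knots replaces one pose
    per event, and the curve can be queried, with analytic derivatives, at
    arbitrary asynchronous timestamps (SciPy assumed; the paper works in the full
    space of rigid-body motions, which this sketch ignores):

    import numpy as np
    from scipy.interpolate import CubicSpline

    rng = np.random.default_rng(0)
    knot_t = np.linspace(0.0, 1.0, 11)            # 11 control times in total...
    knot_xyz = rng.random((11, 3))
    traj = CubicSpline(knot_t, knot_xyz)          # ...instead of a pose per event

    event_t = np.sort(rng.random(10000))          # asynchronous event timestamps
    positions = traj(event_t)                     # evaluate anywhere in time
    velocities = traj(event_t, 1)                 # derivatives for inertial terms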

    Emergence of Selective Invariance in Hierarchical Feed Forward Networks

    Dipan K. Pal, Vishnu Boddeti, Marios Savvides
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    Many theories have emerged which investigate how invariance is generated in
    hierarchical networks through simple schemes such as max and mean pooling.
    The restriction to max/mean pooling in theoretical and empirical studies has
    diverted attention away from a more general way of generating invariance to
    nuisance transformations. We conjecture that hierarchically building
    selective invariance (i.e. carefully choosing the range of the transformation
    to be invariant to at each layer of a hierarchical network) is important
    for pattern recognition. We utilize a novel pooling layer called adaptive
    pooling to find linear pooling weights within networks. These networks with the
    learnt pooling weights have performances on object categorization tasks that
    are comparable to max/mean pooling networks. Interestingly, adaptive pooling
    can converge to mean pooling (when initialized with random pooling weights),
    find more general linear pooling schemes or even decide not to pool at all. We
    illustrate the general notion of selective invariance through object
    categorization experiments on large-scale datasets such as SVHN and ILSVRC
    2012.
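
    One plausible reading of adaptive pooling as a learnable linear pooling
    layer, initialized to mean pooling (the paper's exact parameterization may
    differ; PyTorch assumed):

    import torch
    import torch.nn as nn

    class AdaptivePool2d(nn.Module):
        def __init__(self, channels, k=2):
            super().__init__()
            self.pool = nn.Conv2d(channels, channels, kernel_size=k, stride=k,
                                  groups=channels, bias=False)
            # Start as mean pooling: every weight equals 1 / k^2.
            nn.init.constant_(self.pool.weight, 1.0 / (k * k))
        def forward(self, x):
            # Training may keep mean pooling, learn a general linear scheme,
            # or effectively stop pooling, as the abstract describes.
            return self.pool(x)

    x = torch.randn(1, 16, 8, 8)
    print(AdaptivePool2d(16)(x).shape)   # torch.Size([1, 16, 4, 4])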


    Artificial Intelligence

    Embedding Knowledge Graphs Based on Transitivity and Antisymmetry of Rules

    Mengya Wang, Hankui Zhuo, Huiling Zhu
    Subjects: Artificial Intelligence (cs.AI)

    Representation learning of knowledge graphs encodes entities and relation
    types into a continuous low-dimensional vector space, learning embeddings of
    entities and relation types. Most existing methods concentrate only on
    knowledge triples, ignoring logic rules which contain rich background
    knowledge. Although there has been some work aiming at leveraging both
    knowledge triples and logic rules, they ignore the transitivity and
    antisymmetry of logic rules. In this paper, we propose a novel approach to
    learn knowledge representations with entities and ordered relations in
    knowledge graphs and logic rules. The key idea is to integrate knowledge triples and
    logic rules, and approximately order the relation types in logic rules to
    utilize the transitivity and antisymmetry of logic rules. All entries of the
    embeddings of relation types are constrained to be non-negative. We translate
    the general constrained optimization problem into an unconstrained optimization
    problem to solve the non-negative matrix factorization. Experimental results
    show that our model significantly outperforms other baselines on the knowledge
    graph completion task. This indicates that our model is capable of capturing the
    transitivity and antisymmetry information, which is significant when learning
    embeddings of knowledge graphs.
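
    A toy TransE-style sketch showing only the non-negativity constraint on
    relation embeddings, enforced by projection after each update (the paper's
    full model, including the approximate ordering of relation types derived
    from rules, is not reproduced; PyTorch assumed):

    import torch

    n_ent, n_rel, dim = 1000, 20, 50
    ent = torch.randn(n_ent, dim, requires_grad=True)
    rel = torch.rand(n_rel, dim, requires_grad=True)     # start non-negative
    opt = torch.optim.SGD([ent, rel], lr=0.01)

    h, r, t = (torch.randint(0, n_ent, (64,)), torch.randint(0, n_rel, (64,)),
               torch.randint(0, n_ent, (64,)))
    t_neg = torch.randint(0, n_ent, (64,))               # corrupted tails
    pos = (ent[h] + rel[r] - ent[t]).norm(dim=-1)        # translation residuals
    neg = (ent[h] + rel[r] - ent[t_neg]).norm(dim=-1)
    loss = torch.relu(1.0 + pos - neg).mean()            # margin ranking loss
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        rel.clamp_(min=0.0)   # project relation entries back to non-negative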

    Robot gains Social Intelligence through Multimodal Deep Reinforcement Learning

    Ahmed Hussain Qureshi, Yutaka Nakamura, Yuichiro Yoshikawa, Hiroshi Ishiguro
    Comments: The paper is published in IEEE-RAS International Conference on Humanoid Robots (Humanoids) 2016
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    For robots to coexist with humans in a social world like ours, it is crucial
    that they possess human-like social interaction skills. Programming a robot to
    possess such skills is a challenging task. In this paper, we propose a
    Multimodal Deep Q-Network (MDQN) to enable a robot to learn human-like
    interaction skills through a trial and error method. This paper aims to develop
    a robot that gathers data during its interaction with a human and learns human
    interaction behaviour from the high-dimensional sensory information using
    end-to-end reinforcement learning. This paper demonstrates that the robot was
    able to learn basic interaction skills successfully, after 14 days of
    interacting with people.

    Sequence-based Multimodal Apprenticeship Learning For Robot Perception and Decision Making

    Fei Han, Xue Yang, Yu Zhang, Hao Zhang
    Comments: 8 pages, 6 figures, accepted by ICRA’17
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    Apprenticeship learning has recently attracted wide attention due to its
    capability of allowing robots to learn physical tasks directly from
    demonstrations provided by human experts. Most previous techniques assumed that
    the state space is known a priori or employed simple state representations that
    usually suffer from perceptual aliasing. Different from previous research, we
    propose a novel approach named Sequence-based Multimodal Apprenticeship
    Learning (SMAL), which is capable of simultaneously fusing temporal information
    and multimodal data, and of integrating robot perception with decision making.
    To evaluate the SMAL approach, experiments are performed using both simulations
    and real-world robots in challenging search and rescue scenarios. The
    empirical study has validated that our SMAL approach can effectively learn
    plans for robots to make decisions using sequences of multimodal observations.
    Experimental results have also shown that SMAL outperforms the baseline
    methods using individual images.

    Strongly-Typed Agents are Guaranteed to Interact Safely

    David Balduzzi
    Comments: 13 pages
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)

    As artificial agents proliferate, it is becoming increasingly important to
    ensure that their interactions with one another are well-behaved. In this
    paper, we formalize a common-sense notion of when algorithms are well-behaved:
    an algorithm is safe if it does no harm. Motivated by recent progress in deep
    learning, we focus on the specific case where agents update their actions
    according to gradient descent. The first result is that gradient descent
    converges to a Nash equilibrium in safe games.

    The paper provides sufficient conditions that guarantee safe interactions.
    The main contribution is to define strongly-typed agents and show they are
    guaranteed to interact safely. A series of examples show that strong-typing
    generalizes certain key features of convexity and is closely related to blind
    source separation. The analysis introduces a new perspective on classical
    multilinear games based on tensor decomposition.

    Unitary-Group Invariant Kernels and Features from Transformed Unlabeled Data

    Dipan K. Pal, Marios Savvides
    Comments: 11 page main paper (including references), 2 page supplementary, for a total of 13 pages. Submitted for review at ICLR 2016
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    The study of representations invariant to common transformations of the data
    is important to learning. Most techniques have focused on local approximate
    invariance implemented within expensive optimization frameworks lacking
    explicit theoretical guarantees. In this paper, we study kernels that are
    invariant to the unitary group while having theoretical guarantees in
    addressing practical issues such as (1) unavailability of transformed versions
    of labelled data and (2) not observing all transformations. We present a
    theoretically motivated alternate approach to the invariant kernel SVM. Unlike
    previous approaches to the invariant SVM, the proposed formulation solves both
    issues mentioned. We also present a kernel extension of a recent technique to
    extract linear unitary-group invariant features addressing both issues and
    extend some guarantees regarding invariance and stability. We present
    experiments on the UCI ML datasets to illustrate and validate our methods.


    Information Retrieval

    Consistent Alignment of Word Embedding Models

    Cem Safak Sahin, Rajmonda S. Caceres, Brandon Oselio, William M. Campbell
    Comments: 4 pages, 2 figures
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (stat.ML)

    Word embedding models offer continuous vector representations that can
    capture rich contextual semantics based on their word co-occurrence patterns.
    While these word vectors can provide very effective features used in many NLP
    tasks such as clustering similar words and inferring learning relationships,
    many challenges and open research questions remain. In this paper, we propose a
    solution that aligns variations of the same model (or different models) in a
    joint low-dimensional latent space leveraging carefully generated synthetic
    data points. This generative process is inspired by the observation that a
    variety of linguistic relationships is captured by simple linear operations in
    embedded space. We demonstrate that our approach can lead to substantial
    improvements in recovering embeddings of local neighborhoods.
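
    For context, a common baseline for aligning two embedding spaces is
    orthogonal Procrustes over a shared vocabulary; the sketch below shows that
    baseline, not the paper's synthetic-data-point method (NumPy assumed):

    import numpy as np

    def procrustes_align(A, B):
        """Rotation W minimizing ||A @ W - B||_F over orthogonal W."""
        U, _, Vt = np.linalg.svd(A.T @ B)
        return U @ Vt

    rng = np.random.default_rng(0)
    B = rng.standard_normal((5000, 100))        # embeddings from one model/run
    R = np.linalg.qr(rng.standard_normal((100, 100)))[0]
    A = B @ R + 0.01 * rng.standard_normal(B.shape)   # rotated, noisy second run
    W = procrustes_align(A, B)
    print(np.linalg.norm(A @ W - B) / np.linalg.norm(B))  # small residual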


    Computation and Language

    Consistent Alignment of Word Embedding Models

    Cem Safak Sahin, Rajmonda S. Caceres, Brandon Oselio, William M. Campbell
    Comments: 4 pages, 2 figures
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (stat.ML)

    Word embedding models offer continuous vector representations that can
    capture rich contextual semantics based on their word co-occurrence patterns.
    While these word vectors can provide very effective features used in many NLP
    tasks such as clustering similar words and inferring learning relationships,
    many challenges and open research questions remain. In this paper, we propose a
    solution that aligns variations of the same model (or different models) in a
    joint low-dimensional latent space leveraging carefully generated synthetic
    data points. This generative process is inspired by the observation that a
    variety of linguistic relationships is captured by simple linear operations in
    embedded space. We demonstrate that our approach can lead to substantial
    improvements in recovering embeddings of local neighborhoods.

    Use Generalized Representations, But Do Not Forget Surface Features

    Nafise Sadat Moosavi, Michael Strube
    Comments: CORBON workshop@EACL 2017
    Subjects: Computation and Language (cs.CL)

    Only a year ago, all state-of-the-art coreference resolvers were using an
    extensive amount of surface features. Recently, there was a paradigm shift
    towards using word embeddings and deep neural networks, where the use of
    surface features is very limited. In this paper, we show that a simple SVM
    model with surface features outperforms more complex neural models for
    detecting anaphoric mentions. Our analysis suggests that generalized
    representations and surface features have different strengths that should
    both be taken into account for improving coreference resolution.

    Dirichlet-vMF Mixture Model

    Shaohua Li
    Comments: 5 pages
    Subjects: Computation and Language (cs.CL)

    This document is about the multi-document von Mises-Fisher mixture model with
    a Dirichlet prior, referred to as VMFMix. VMFMix is analogous to Latent
    Dirichlet Allocation (LDA) in that both can capture the co-occurrence patterns
    across multiple documents. The difference is that in VMFMix, the topic-word
    distribution is defined on a continuous n-dimensional hypersphere. Hence VMFMix
    is used to derive topic embeddings, i.e., representative vectors, from multiple
    sets of embedding vectors. An efficient Variational Expectation-Maximization
    inference algorithm is derived. The performance of VMFMix on two document
    classification tasks is reported, with some preliminary analysis.


    Distributed, Parallel, and Cluster Computing

    DALiuGE: A Graph Execution Framework for Harnessing the Astronomical Data Deluge

    Chen Wu, Rodrigo Tobar, Kevin Vinsen, Andreas Wicenec, Dave Pallot, Baoqiang Lao, Ruonan Wang, Tao An, Mark Boulton, Ian Cooper, Richard Dodson, Markus Dolensky, Ying Mei, Feng Wang
    Comments: 31 pages, 12 figures, currently under review by Astronomy and Computing
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Instrumentation and Detectors (physics.ins-det)

    The Data Activated Liu Graph Engine – DALiuGE – is an execution framework for
    processing large astronomical datasets at a scale required by the Square
    Kilometre Array Phase 1 (SKA1). It includes an interface for expressing complex
    data reduction pipelines consisting of both data sets and algorithmic
    components and an implementation run-time to execute such pipelines on
    distributed resources. By mapping the logical view of a pipeline to its
    physical realisation, DALiuGE separates the concerns of multiple stakeholders,
    allowing them to collectively optimise large-scale data processing solutions in
    a coherent manner. The execution in DALiuGE is data-activated, where each
    individual data item autonomously triggers the processing on itself. Such
    decentralisation also makes the execution framework very scalable and flexible,
    supporting pipeline sizes ranging from less than ten tasks running on a laptop
    to tens of millions of concurrent tasks on the second fastest supercomputer in
    the world. DALiuGE has been used in production for reducing interferometry data
    sets from the Karl E. Jansky Very Large Array and the Mingantu Ultrawide
    Spectral Radioheliograph; and is being developed as the execution framework
    prototype for the Science Data Processor (SDP) consortium of the Square
    Kilometre Array (SKA) telescope. This paper presents a technical overview of
    DALiuGE and discusses case studies from the CHILES and MUSER projects that use
    DALiuGE to execute production pipelines. In a companion paper, we provide
    in-depth analysis of DALiuGE’s scalability to very large numbers of tasks on
    two supercomputing facilities.

    Compact Self-Stabilizing Leader Election for Arbitrary Networks

    Lélia Blin, Sébastien Tixeuil
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    We present a self-stabilizing leader election algorithm for arbitrary
    networks, with space complexity $O(\max\{\log \Delta, \log\log n\})$ bits per
    node in $n$-node networks with maximum degree $\Delta$. This space complexity
    is sub-logarithmic in $n$ as long as $\Delta = n^{o(1)}$. The best
    space complexity known so far for arbitrary networks was $O(\log n)$ bits per
    node, and algorithms with sub-logarithmic space complexities were known for the
    ring only. To our knowledge, our algorithm is the first algorithm for
    self-stabilizing leader election to break the $\Omega(\log n)$ bound for silent
    algorithms in arbitrary networks. Breaking this bound was obtained via the
    design of a (non-silent) self-stabilizing algorithm using sophisticated tools
    such as solving the distance-2 coloring problem in a silent self-stabilizing
    manner, with space complexity $O(\max\{\log \Delta, \log\log n\})$ bits per
    node. Solving this latter coloring problem allows us to implement a
    sub-logarithmic encoding of spanning trees: storing the IDs of the neighbors
    requires $\Omega(\log n)$ bits per node, while we encode spanning trees using
    $O(\max\{\log \Delta, \log\log n\})$ bits per node. Moreover, we show how to
    construct such compactly encoded spanning trees without relying on variables
    encoding distances or numbers of nodes, as these two types of variables would
    also require $\Omega(\log n)$ bits per node.

    Medical Image Retrieval Based On the Parallelization of the Cluster Sampling Algorithm

    Hesham Arafat Ali, Salah Attiya, Ibrahim El-henawy
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    In this paper we develop parallel cluster sampling algorithms and show that a
    multi-chain version is embarrassingly parallel and can be used efficiently for
    medical image retrieval among other applications.
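
    A hedged illustration of the embarrassingly parallel multi-chain claim, with
    a generic Metropolis sampler standing in for the authors' cluster sampling
    algorithm: the chains never communicate, so they map directly onto separate
    processes:

    import numpy as np
    from multiprocessing import Pool

    def run_chain(seed, n=10000):
        """One independent Metropolis chain targeting a standard normal."""
        rng = np.random.default_rng(seed)
        x, samples = 0.0, []
        for _ in range(n):
            prop = x + rng.normal(0.0, 1.0)
            if rng.random() < np.exp(0.5 * (x * x - prop * prop)):
                x = prop
            samples.append(x)
        return samples

    if __name__ == "__main__":
        with Pool(4) as pool:
            chains = pool.map(run_chain, range(4))  # chains run independently
        pooled = np.concatenate([c[1000:] for c in chains])  # drop burn-in
        print(pooled.mean())                        # approximately 0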

    Streaming supercomputing needs workflow-enabled programming-in-the-large

    Justin M Wozniak, Jonathan Ozik, Daniel S. Katz, Michael Wilde
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    This is a position paper, submitted to the Future Online Analysis Platform
    Workshop (this https URL), which argues that simple
    data analysis applications are common today, but future online supercomputing
    workloads will need to couple multiple advanced technologies (streams, caches,
    analysis, and simulations) to rapidly deliver scientific results. Each of these
    technologies is an active research area when integrated with high-performance
    computing. These components will interact in complex ways; therefore, coupling
    them needs to be programmed. Programming in the large, on top of existing
    applications, enables us to build much more capable applications and to
    productively manage this complexity.

    A Debt-Aware Learning Approach for Resource Adaptations in Cloud Elasticity Management

    Carlos Mera-Gómez, Francisco Ramírez, Rami Bahsoon, Rajkumar Buyya
    Subjects: Software Engineering (cs.SE); Distributed, Parallel, and Cluster Computing (cs.DC)

    Elasticity is a cloud property that enables applications and their execution
    systems to dynamically acquire and release shared computational resources on
    demand. Moreover, it unfolds the advantage of economies of scale in the cloud
    through a drop in the average costs of these shared resources. However, it is
    still an open challenge to achieve a perfect match between resource demand and
    provision in autonomous elasticity management. Resource adaptation decisions
    essentially involve a trade-off between economics and performance, which
    produces a gap between the ideal and actual resource provisioning. This gap, if
    not properly managed, can negatively impact the aggregate utility of a cloud
    customer in the long run. To address this limitation, we propose a technical
    debt-aware learning approach for autonomous elasticity management based on a
    reinforcement learning of elasticity debts in resource provisioning; the
    adaptation pursues strategic decisions that trade off economics against
    performance. We extend CloudSim and Burlap to evaluate our approach. The
    evaluation shows that a reinforcement learning of technical debts in elasticity
    obtains a higher utility for a cloud customer, while conforming to expected
    levels of performance.
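
    A generic tabular Q-learning sketch for elasticity decisions (the paper's
    elasticity-debt reward and its CloudSim/Burlap setup are not reproduced; the
    states, actions, reward weights, and workload dynamics below are illustrative
    assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    n_levels = 10                              # discretized workload levels
    Q = np.zeros((n_levels, n_levels + 1, 3))  # state: (load, VMs); 3 actions
    alpha, gamma, eps = 0.1, 0.95, 0.1

    def reward(load, vms):
        # Trade economics (VM cost) against performance (shortfall penalty).
        return -(0.5 * vms + 5.0 * max(0, load - vms))

    load, vms = 0, 5
    for _ in range(50000):
        a = int(rng.integers(3)) if rng.random() < eps else int(Q[load, vms].argmax())
        new_vms = int(np.clip(vms + (a - 1), 1, n_levels))  # down / hold / up
        next_load = int(rng.integers(0, n_levels))          # toy workload change
        r = reward(next_load, new_vms)
        Q[load, vms, a] += alpha * (r + gamma * Q[next_load, new_vms].max()
                                    - Q[load, vms, a])
        load, vms = next_load, new_vms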

    Making Asynchronous Distributed Computations Robust to Noise

    Keren Censor-Hillel, Ran Gelles, Bernhard Haeupler
    Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

    We consider the problem of making distributed computations robust to noise,
    in particular to worst-case (adversarial) corruptions of messages. We give a
    general distributed interactive coding scheme which simulates any asynchronous
    distributed protocol while tolerating an optimal corruption of a $\Theta(1/n)$
    fraction of all messages, while incurring a moderate blowup of $O(n \log^2 n)$
    in the communication complexity.

    Our result is the first fully distributed interactive coding scheme in which
    the topology of the communication network is not known in advance. Prior work
    required either a coordinating node to be connected to all other nodes in the
    network or assumed a synchronous network in which all nodes already know the
    complete topology of the network.


    Learning

    Tight Bounds for Bandit Combinatorial Optimization

    Alon Cohen, Tamir Hazan, Tomer Koren
    Subjects: Learning (cs.LG)

    We revisit the study of optimal regret rates in bandit combinatorial
    optimization—a fundamental framework for sequential decision making under
    uncertainty that abstracts numerous combinatorial prediction problems. We prove
    that the attainable regret in this setting grows as
    $\widetilde{\Theta}(k^{3/2}\sqrt{dT})$, where $d$ is the dimension of the
    problem and $k$ is a bound on the maximal instantaneous loss, disproving a
    conjecture of Audibert, Bubeck, and Lugosi (2013), who argued that the optimal
    rate should be of the form $\widetilde{\Theta}(k\sqrt{dT})$. Our bounds apply
    to several important instances of the framework, and in particular, imply a
    tight bound for the well-studied bandit shortest path problem. By that, we also
    resolve an open problem posed by Cesa-Bianchi and Lugosi (2012).

    Online Meta-learning by Parallel Algorithm Competition

    Stefan Elfwing, Eiji Uchibe, Kenji Doya
    Comments: 15 pages, 10 figures. arXiv admin note: text overlap with arXiv:1702.03118
    Subjects: Learning (cs.LG)

    The efficiency of reinforcement learning algorithms depends critically on a
    few meta-parameters that modulate the learning updates and the trade-off
    between exploration and exploitation. The adaptation of the meta-parameters is
    an open question in reinforcement learning, which arguably has become more of
    an issue recently with the success of deep reinforcement learning in
    high-dimensional state spaces. The long learning times in domains such as Atari
    2600 video games make it infeasible to perform comprehensive searches of
    appropriate meta-parameter values. We propose the Online Meta-learning by
    Parallel Algorithm Competition (OMPAC) method. In the OMPAC method, several
    instances of a reinforcement learning algorithm are run in parallel with small
    differences in the initial values of the meta-parameters. After a fixed number
    of episodes, the instances are selected based on their performance in the task
    at hand. Before continuing the learning, Gaussian noise is added to the
    meta-parameters with a predefined probability. We validate the OMPAC method by
    improving the state-of-the-art results in stochastic SZ-Tetris and in standard
    Tetris with a smaller 10×10 board, by 31% and 84%, respectively, and
    by improving the results for deep Sarsa((lambda)) agents in three Atari 2600
    games by 62% or more. The experiments also show the ability of the OMPAC method
    to adapt the meta-parameters according to the learning progress in different
    tasks.
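
    The outer loop of OMPAC is simple enough to sketch directly from the
    description above. The agent interface (a meta dict of meta-parameters and a
    copy method) and the rule of keeping the top half of the population are our
    own illustrative assumptions.

        # Sketch of the OMPAC outer loop: parallel learners compete on their
        # recent score; survivors' meta-parameters get Gaussian noise.
        import random

        def ompac(make_agent, run_episodes, n_agents=8, generations=20,
                  noise_prob=0.5, noise_std=0.1):
            agents = [make_agent() for _ in range(n_agents)]
            for _ in range(generations):
                scores = [run_episodes(a) for a in agents]  # fixed episode budget
                ranked = sorted(zip(scores, agents), key=lambda p: p[0],
                                reverse=True)
                survivors = [a for _, a in ranked[:n_agents // 2]]
                agents = []
                for parent in survivors:
                    for _ in range(2):  # refill the population
                        child = parent.copy()  # copy() must deep-copy .meta
                        for name, value in child.meta.items():
                            if random.random() < noise_prob:
                                child.meta[name] = value + random.gauss(0, noise_std)
                        agents.append(child)
            return max(agents, key=run_episodes)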

    Strongly-Typed Agents are Guaranteed to Interact Safely

    David Balduzzi
    Comments: 13 pages
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)

    As artificial agents proliferate, it is becoming increasingly important to
    ensure that their interactions with one another are well-behaved. In this
    paper, we formalize a common-sense notion of when algorithms are well-behaved:
    an algorithm is safe if it does no harm. Motivated by recent progress in deep
    learning, we focus on the specific case where agents update their actions
    according to gradient descent. The first result is that gradient descent
    converges to a Nash equilibrium in safe games.

    The paper provides sufficient conditions that guarantee safe interactions.
    The main contribution is to define strongly-typed agents and show they are
    guaranteed to interact safely. A series of examples show that strong-typing
    generalizes certain key features of convexity and is closely related to blind
    source separation. The analysis introduces a new perspective on classical
    multilinear games based on tensor decomposition.

    Bandits with Movement Costs and Adaptive Pricing

    Tomer Koren, Roi Livni, Yishay Mansour
    Subjects: Learning (cs.LG); Computer Science and Game Theory (cs.GT)

    We extend the model of Multi-armed Bandit with unit switching cost to
    incorporate a metric between the actions. We consider the case where the metric
    over the actions can be modeled by a complete binary tree, and the distance
    between two leaves is the size of the subtree of their least common ancestor,
    which abstracts the case that the actions are points on the continuous interval
    ([0,1]) and the switching cost is their distance. In this setting, we give a
    new algorithm that establishes a regret of (widetilde{O}(sqrt{kT} + T/k)),
    where (k) is the number of actions and (T) is the time horizon. When the set of
    actions corresponds to the whole ([0,1]) interval, we can exploit our method for the
    task of bandit learning with Lipschitz loss functions, where our algorithm
    achieves an optimal regret rate of (widetilde{Theta}(T^{2/3})), which is the
    same rate one obtains when there is no penalty for movements. As our main
    application, we use our new algorithm to solve an adaptive pricing problem.
    Specifically, we consider the case of a single seller faced with a stream of
    patient buyers. Each buyer has a private value and a window of time in which
    they are interested in buying, and they buy at the lowest price in the window,
    if it is below their value. We show that with an appropriate discretization of
    the prices, the seller can achieve a regret of (widetilde{O}(T^{2/3}))
    compared to the best fixed price in hindsight, which outperforms the previous
    regret bound of (widetilde{O}(T^{3/4})) for the problem.
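
    As a concrete illustration of the metric, the following sketch computes the
    distance between two leaves of a complete binary tree, assuming leaves are
    indexed 0..k-1 and k is a power of two (our indexing convention, not the
    paper's).

        def tree_distance(i, j, k):
            """Leaf distance = number of leaves under the lowest common
            ancestor of leaves i and j (0 if i == j)."""
            if i == j:
                return 0
            width = 1
            while i != j:  # climb one level at a time
                i //= 2
                j //= 2
                width *= 2  # the subtree doubles in leaf count per level
            return min(width, k)

        # e.g. with k = 8: tree_distance(0, 1, 8) == 2, tree_distance(0, 7, 8) == 8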

    Computationally Efficient Robust Estimation of Sparse Functionals

    Simon S. Du, Sivaraman Balakrishnan, Aarti Singh
    Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Learning (cs.LG)

    Many conventional statistical procedures are extremely sensitive to seemingly
    minor deviations from modeling assumptions. This problem is exacerbated in
    modern high-dimensional settings, where the problem dimension can grow with and
    possibly exceed the sample size. We consider the problem of robust estimation
    of sparse functionals, and provide a computationally and statistically
    efficient algorithm in the high-dimensional setting. Our theory identifies a
    unified set of deterministic conditions under which our algorithm guarantees
    accurate recovery. By further establishing that these deterministic conditions
    hold with high-probability for a wide range of statistical models, our theory
    applies to many problems of considerable interest including sparse mean and
    covariance estimation; sparse linear regression; and sparse generalized linear
    models.

    Bayes-Optimal Entropy Pursuit for Active Choice-Based Preference Learning

    Stephen N. Pallone, Peter I. Frazier, Shane G. Henderson
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    We analyze the problem of learning a single user’s preferences in an active
    learning setting, sequentially and adaptively querying the user over a finite
    time horizon. Learning is conducted via choice-based queries, where the user
    selects her preferred option among a small subset of offered alternatives.
    These queries have been shown to be a robust and efficient way to learn an
    individual’s preferences. We take a parametric approach and model the user’s
    preferences through a linear classifier, using a Bayesian prior to encode our
    current knowledge of this classifier. The rate at which we learn depends on the
    alternatives offered at every time epoch. Under certain noise assumptions, we
    show that the Bayes-optimal policy for maximally reducing entropy of the
    posterior distribution of this linear classifier is a greedy policy, and that
    this policy achieves a linear lower bound when alternatives can be constructed
    from the continuum. Further, we analyze a different metric called
    misclassification error, proving that the performance of the optimal policy
    that minimizes misclassification error is bounded below by a linear function of
    differential entropy. Lastly, we numerically compare the greedy entropy
    reduction policy with a knowledge gradient policy under a number of scenarios,
    examining their performance under both differential entropy and
    misclassification error.
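
    A minimal sketch of the greedy entropy-reduction step, assuming a discrete
    particle approximation of the Bayesian prior and noiseless responses (both
    simplifications of ours; the paper works with a continuous prior and explicit
    noise assumptions):

        import numpy as np

        def entropy(p):
            p = p[p > 0]
            return -(p * np.log(p)).sum()

        def greedy_query(particles, weights, queries):
            """particles: (m, d) candidate classifiers; weights: (m,) prior;
            queries: list of (x, y) pairs of alternatives (NumPy d-vectors)."""
            best, best_gain = None, -np.inf
            for x, y in queries:
                prefers_x = particles @ (x - y) > 0  # which particles choose x
                p_x = weights[prefers_x].sum()
                gain = entropy(weights)  # prior entropy
                for mask, p in ((prefers_x, p_x), (~prefers_x, 1.0 - p_x)):
                    if p > 0:
                        post = np.where(mask, weights, 0.0)
                        gain -= p * entropy(post / p)  # expected posterior entropy
                if gain > best_gain:
                    best, best_gain = (x, y), gain
            return best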

    How ConvNets model Non-linear Transformations

    Dipan K. Pal, Marios Savvides
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    In this paper, we theoretically address three fundamental problems involving
    deep convolutional networks regarding invariance, depth and hierarchy. We
    introduce the paradigm of Transformation Networks (TN) which are a direct
    generalization of Convolutional Networks (ConvNets). Theoretically, we show
    that TNs (and thereby ConvNets) can be invariant to non-linear
    transformations of the input despite pooling over mere local translations. Our
    analysis provides clear insights into the increase in invariance with depth in
    these networks. Deeper networks are able to model much richer classes of
    transformations. We also find that a hierarchical architecture allows the
    network to generate invariance much more efficiently than a non-hierarchical
    network. Our results provide useful insight into these three fundamental
    problems in deep learning using ConvNets.

    Control of Gene Regulatory Networks with Noisy Measurements and Uncertain Inputs

    Mahdi Imani, Ulisses Braga-Neto
    Subjects: Molecular Networks (q-bio.MN); Learning (cs.LG); Machine Learning (stat.ML)

    This paper is concerned with the problem of stochastic control of gene
    regulatory networks (GRNs) observed indirectly through noisy measurements and
    with uncertainty in the intervention inputs. The partial observability of the
    gene states and uncertainty in the intervention process are accounted for by
    modeling GRNs using the partially-observed Boolean dynamical system (POBDS)
    signal model with noisy gene expression measurements. The optimal
    infinite-horizon control strategy for this problem is not attainable in
    general, so we apply reinforcement learning and Gaussian process techniques to
    find a near-optimal solution. The POBDS is first transformed to a
    directly-observed Markov Decision Process in a continuous belief space, and the
    Gaussian process is used for modeling the cost function over the belief and
    intervention spaces. Reinforcement learning is then used to learn the cost
    function from the available gene expression data. In addition, we employ
    sparsification, which enables the control of large partially-observed GRNs. The
    performance of the resulting algorithm is studied through a comprehensive set
    of numerical experiments using synthetic gene expression data generated from a
    melanoma gene regulatory network.
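
    The reduction to a belief MDP rests on a standard Bayes filter; a generic
    sketch is below, with transition and likelihood as abstract stand-ins for the
    POBDS model (enumerating all Boolean states is exponential in the number of
    genes, which is what the sparsification above is meant to tame).

        import numpy as np

        def belief_update(belief, action, observation, transition, likelihood):
            """belief: (S,) prior over the 2^n Boolean states;
            transition(a): (S, S) row-stochastic matrix; likelihood(o): (S,)."""
            predicted = transition(action).T @ belief        # predict step
            posterior = likelihood(observation) * predicted  # correct step
            return posterior / posterior.sum()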

    RNN Decoding of Linear Block Codes

    Eliya Nachmani, Elad Marciano, David Burshtein, Yair Be'ery
    Subjects: Information Theory (cs.IT); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Designing a practical, low complexity, close to optimal, channel decoder for
    powerful algebraic codes with short to moderate block length is an open
    research problem. Recently it has been shown that a feed-forward neural network
    architecture can improve on standard belief propagation decoding, despite the
    large example space. In this paper we introduce a recurrent neural network
    architecture for decoding linear block codes. Our method achieves bit error
    rate results comparable to those of the feed-forward neural network, with
    significantly fewer parameters. We also demonstrate improved performance over
    belief propagation on sparser Tanner graph representations of the codes.
    Furthermore, we demonstrate that the RNN decoder can be used to improve the
    performance or alternatively reduce the computational complexity of the mRRD
    algorithm for low complexity, close to optimal, decoding of short BCH codes.

    Learning Rates for Kernel-Based Expectile Regression

    Muhammad Farooq, Ingo Steinwart
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Conditional expectiles are becoming an increasingly important tool in finance
    as well as in other areas of application. We analyse a support vector machine
    type approach for estimating conditional expectiles and establish learning
    rates that are minimax optimal modulo a logarithmic factor if Gaussian RBF
    kernels are used and the desired expectile is smooth in a Besov sense. As a
    special case, our learning rates improve the best known rates for kernel-based
    least squares regression in this scenario. Key ingredients of our statistical
    analysis are a general calibration inequality for the asymmetric least squares
    loss, a corresponding variance bound as well as an improved entropy number
    bound for Gaussian RBF kernels.
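
    For reference, the asymmetric least squares loss underlying this analysis is
    simple to state in code (a plain NumPy rendering of ours; tau = 0.5 recovers
    ordinary least squares):

        import numpy as np

        def expectile_loss(y, f, tau=0.8):
            """Asymmetric least squares loss at expectile level tau."""
            r = y - f
            w = np.where(r >= 0, tau, 1.0 - tau)  # asymmetric weights
            return np.mean(w * r ** 2)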

    Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning

    Briland Hitaj, Giuseppe Ateniese, Fernando Perez-Cruz
    Comments: 14 pages, 12 figures
    Subjects: Cryptography and Security (cs.CR); Learning (cs.LG); Machine Learning (stat.ML)

    In recent years, a branch of machine learning called Deep Learning has become
    incredibly popular thanks to the ability of a new class of algorithms to model
    and interpret a large quantity of data in a similar way to humans. Properly
    training deep learning models involves collecting a vast amount of users’
    private data, including habits, geographical positions, interests, and much
    more. Another major issue is that it is possible to extract from trained models
    useful information about the training set, and this hinders collaboration among
    distrustful participants or parties that deal with sensitive information. To
    tackle this problem, collaborative deep learning models have recently been
    proposed where parties share only a subset of the parameters in the attempt to
    keep their respective training sets private. Parameters can also be obfuscated
    via differential privacy to make information extraction even more challenging,
    as shown by Shokri and Shmatikov at CCS’15. Unfortunately, we show that any
    privacy-preserving collaborative deep learning is susceptible to a powerful
    attack that we devise in this paper. In particular, we show that a distributed
    or decentralized deep learning approach is fundamentally broken and does not
    protect the training sets of honest participants. The attack we developed
    exploits the real-time nature of the learning process that allows the adversary
    to train a Generative Adversarial Network (GAN) that generates valid samples of
    the targeted training set that was meant to be private. Interestingly, we show
    that differential privacy applied to shared parameters of the model as
    suggested at CCS’15 and CCS’16 is utterly futile. In our generative model
    attack, all techniques adopted to scramble or obfuscate shared parameters in
    collaborative deep learning are rendered ineffective with no possibility of a
    remedy under the threat model considered.

    Sequence Modeling via Segmentations

    Chong Wang, Yining Wang, Po-Sen Huang, Abdelrahman Mohamed, Dengyong Zhou, Li Deng
    Comments: recurrent neural networks, dynamic programming, structured prediction
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Segmental structure is a common pattern in many types of sequences such as
    phrases in human languages. In this paper, we present a probabilistic model for
    sequences via their segmentations. The probability of a segmented sequence is
    calculated as the product of the probabilities of all its segments, where each
    segment is modeled using existing tools such as recurrent neural networks.
    Since the segmentation of a sequence is usually unknown in advance, we sum over
    all valid segmentations to obtain the final probability for the sequence. An
    efficient dynamic programming algorithm is developed for forward and backward
    computations without resorting to any approximation. We demonstrate our
    approach on text segmentation and speech recognition tasks. In addition to
    quantitative results, we also show that our approach can discover meaningful
    segments in their respective application contexts.
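
    The forward computation is a textbook dynamic program once segment
    probabilities are available; a sketch is below, with segment_prob as an
    abstract plug-in (an RNN in the paper) and without the paper's normalization
    details.

        def forward(x, segment_prob, max_len=None):
            """alpha[t] = total probability of all segmentations of x[:t]."""
            n = len(x)
            max_len = max_len or n
            alpha = [0.0] * (n + 1)
            alpha[0] = 1.0  # empty prefix
            for t in range(1, n + 1):
                for s in range(max(0, t - max_len), t):
                    # last segment is x[s:t]; everything before contributes alpha[s]
                    alpha[t] += alpha[s] * segment_prob(x[s:t])
            return alpha[n]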

    Neural Decision Trees

    Randall Balestriero
    Comments: 11 pages
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    In this paper we propose a synergistic melting of neural networks and
    decision trees into a deep hashing neural network (HNN) having a modeling
    capability exponential in its number of neurons. We first derive a soft
    decision tree, named the neural decision tree, allowing the optimization of an
    arbitrary decision function at each split node. We then rewrite this soft space
    partitioning as a new kind of neural network layer, the hashing layer (HL),
    which can be seen as a generalization of the well-known soft-max layer. This HL
    can easily replace the standard last layer of an ANN in any known network
    topology, and thus can be used after a convolutional or recurrent neural
    network, for example. We present the modeling capacity of this deep hashing
    function on small datasets, where one can reach results at least as good as
    standard neural networks while diminishing the number of output neurons. We
    then show that when the number of output neurons is large, the network can
    mitigate the absence of linear decision boundaries by learning, for each
    difficult class, a collection of not necessarily connected sub-regions of the
    space, leading to more flexible decision surfaces. Finally, the HNN can be seen
    as a deep locality-sensitive hashing function which can be trained in a
    supervised or unsupervised setting, as we demonstrate for classification and
    regression problems.


    Information Theory

    Bounds on the reliability of typewriter channels

    M. Dalai, Y. Polyanskiy
    Subjects: Information Theory (cs.IT); Combinatorics (math.CO)

    New lower and upper bounds on the reliability function of typewriter channels
    are given. Our lower bounds improve upon the (multiletter) expurgated bound of
    Gallager, furnishing a new and simple counterexample to a conjecture made in
    1967 by Shannon, Gallager and Berlekamp on its tightness. The only other known
    counterexample is due to Katsman, Tsfasman and Vlăduţ, who used
    algebraic-geometric codes on (q)-ary symmetric channels, (q geq 49). Here we
    prove, by introducing dependence between codewords of a random ensemble, that
    the conjecture is false even for a typewriter channel with (q=4) inputs. In the
    process, we also demonstrate that Lovász's proof of the capacity of the
    pentagon was implicitly contained (but unnoticed!) in the works of Jelinek and
    Gallager on the expurgated bound, done at least ten years before Lovász. In
    the opposite direction, new upper bounds on the reliability function are
    derived for channels with an odd number of inputs by using an adaptation of
    Delsarte's linear programming bound. First we derive a bound based on the
    minimum distance, which combines Lovász's construction for bounding the graph
    capacity with the McEliece-Rodemich-Rumsey-Welch construction for bounding the
    minimum distance of codes in the Hamming space. Then, for the particular case
    of cross-over probability (1/2), we derive an improved bound by also using the
    method of Kalai and Linial to study the spectrum distribution of codes.

    Crosscorrelation of Rudin-Shapiro-Like Polynomials

    Daniel J. Katz, Sangman Lee, Stanislav A. Trunov
    Comments: 30 pages
    Subjects: Information Theory (cs.IT); Complex Variables (math.CV); Number Theory (math.NT)

    We consider the Rudin-Shapiro-like polynomials, whose (L^4) norms on the
    complex unit circle were studied by Borwein and Mossinghoff. The polynomial
    (f(z)=f_0+f_1 z + cdots + f_d z^d) is identified with the sequence
    ((f_0,f_1,ldots,f_d)) of its coefficients. From the (L^4) norm of a
    polynomial, one can easily calculate the autocorrelation merit factor of its
    associated sequence, and conversely. In this paper, we study the
    crosscorrelation properties of pairs of sequences associated to
    Rudin-Shapiro-like polynomials. We find an explicit formula for the
    crosscorrelation merit factor. A computer search is then used to find pairs of
    Rudin-Shapiro-like polynomials whose autocorrelation and crosscorrelation merit
    factors are simultaneously high. Pursley and Sarwate proved a bound that limits
    how good this combined autocorrelation and crosscorrelation performance can be.
    We find infinite families of polynomials whose performance approaches quite
    close to this fundamental limit.
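
    For concreteness, here is one way to compute the crosscorrelations and a merit
    factor for a pair of ±1 sequences. The normalization below (n^2 over the sum
    of squared crosscorrelations across all shifts) is our reading of the standard
    Pursley-Sarwate-style definition and may differ from the paper's exact
    convention.

        def crosscorrelations(f, g):
            """Aperiodic crosscorrelations C(s) = sum_i f[i] * g[i+s], s >= 0."""
            n = len(f)
            return [sum(f[i] * g[i + s] for i in range(n - s)) for s in range(n)]

        def cross_merit_factor(f, g):
            n = len(f)
            ss = sum(c * c for c in crosscorrelations(f, g))       # shifts s >= 0
            ss += sum(c * c for c in crosscorrelations(g, f)[1:])  # shifts s < 0
            return n * n / ss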

    Capacity of the Aperture-Constrained AWGN Free-Space Communication Channel

    Richard J. Barton
    Subjects: Information Theory (cs.IT)

    In this paper, we derive upper and lower bounds as well as a simple
    closed-form approximation for the capacity of the continuous-time, bandlimited,
    additive white Gaussian noise channel in a three-dimensional free-space
    electromagnetic propagation environment subject to constraints on the total
    effective antenna aperture area of the link and a total transmitter power
    constraint. We assume that the communication range is much larger than the
    radius of the sphere containing the antennas at both ends of the link, and we
    show that, in general, the capacity can only be achieved by transmitting
    multiple spatially-multiplexed data streams simultaneously over the channel.
    Furthermore, the lower bound on capacity can be approached asymptotically by
    transmitting the data streams between a pair of physically-realizable
    distributed antenna arrays at either end of the link. A consequence of this
    result is that, in general, communication at close to the maximum achievable
    data rate on a deep-space communication link can be achieved in practice if and
    only if the communication system utilizes spatial multiplexing over a
    distributed MIMO antenna array. Such an approach to deep-space communication
    does not appear to be envisioned currently by any of the international space
    agencies or any commercial space companies. A second consequence is that the
    capacity of a long-range free-space communication link, if properly utilized,
    grows asymptotically as a function of the square root of the received SNR
    rather than only logarithmically in the received SNR.

    Multipath Error Correction in Radio Interferometric Positioning Systems

    Cheng Zhang, Wangdong Qi, Li Wei, Jiang Chang, Yuexin Zhao
    Comments: 5 pages, 5 figures
    Subjects: Information Theory (cs.IT)

    The radio interferometric positioning system (RIPS) is an accurate node
    localization method featuring a novel phase-based ranging process. Multipath is
    the limiting error source for RIPS in ground-deployed scenarios or indoor
    applications. There are four distinct channels involved in the ranging process
    for RIPS. Multipath reflections affect both the phase and amplitude of the
    ranging signal for each channel. By exploiting untapped amplitude information,
    we put forward a scheme to estimate each channel’s multipath profile, which is
    subsequently used to correct the corresponding errors in phase measurements.
    Simulations show that such a scheme is very effective in reducing multipath
    phase errors, which are essentially brought down to the level of receiver noise
    under moderate multipath conditions. It is further demonstrated that ranging
    errors in RIPS are also greatly reduced via the proposed scheme.

    On the Total Energy Efficiency of Cell-Free Massive MIMO

    Hien Quoc Ngo, Le-Nam Tran, Trung Q. Duong, Michail Matthaiou, Erik G. Larsson
    Subjects: Information Theory (cs.IT)

    We consider the cell-free massive multiple-input multiple-output (MIMO)
    downlink, where a very large number of distributed multiple-antenna access
    points (APs) serve many single-antenna users in the same time-frequency
    resource. A simple (distributed) conjugate beamforming scheme is applied at
    each AP via the use of local channel state information (CSI). This CSI is
    acquired through time-division duplex operation and the reception of uplink
    training signals transmitted by the users. We derive a closed-form expression
    for the spectral efficiency taking into account the effects of channel
    estimation errors and power control. This closed-form result enables us to
    analyze the effects of backhaul power consumption, the number of APs, and the
    number of antennas per AP on the total energy efficiency, as well as to design
    an optimal power allocation algorithm. The optimal power allocation algorithm
    aims at maximizing the total energy efficiency, subject to a per-user spectral
    efficiency constraint and a per-AP power constraint. Compared with the case
    without power control, our proposed power allocation scheme can double the
    total energy efficiency. Furthermore, we propose AP selection schemes, in
    which each user chooses a subset of APs, to reduce the power consumption caused
    by the backhaul links. With our proposed AP selection schemes, the total energy
    efficiency increases significantly, especially for large numbers of APs.
    Moreover, under a requirement of good quality-of-service for all users,
    cell-free massive MIMO outperforms the colocated counterpart in terms of energy
    efficiency.
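
    The conjugate beamforming step itself is one line of linear algebra; the
    sketch below shows the per-AP computation from local CSI only, with the
    power-control coefficients eta standing in for the paper's optimized values
    (an assumption of ours).

        import numpy as np

        def conjugate_beamform(H_hat, symbols, eta):
            """H_hat: (n_antennas, n_users) local channel estimate at one AP;
            symbols: (n_users,) data symbols; eta: (n_users,) power weights."""
            weights = np.conj(H_hat) * np.sqrt(eta)  # per-user conjugate weights
            return weights @ symbols                 # signal per AP antenna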

    RNN Decoding of Linear Block Codes

    Eliya Nachmani, Elad Marciano, David Burshtein, Yair Be'ery
    Subjects: Information Theory (cs.IT); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Designing a practical, low complexity, close to optimal, channel decoder for
    powerful algebraic codes with short to moderate block length is an open
    research problem. Recently it has been shown that a feed-forward neural network
    architecture can improve on standard belief propagation decoding, despite the
    large example space. In this paper we introduce a recurrent neural network
    architecture for decoding linear block codes. Our method achieves bit error
    rate results comparable to those of the feed-forward neural network, with
    significantly fewer parameters. We also demonstrate improved performance over
    belief propagation on sparser Tanner graph representations of the codes.
    Furthermore, we demonstrate that the RNN decoder can be used to improve the
    performance or alternatively reduce the computational complexity of the mRRD
    algorithm for low complexity, close to optimal, decoding of short BCH codes.

    Optimal Energy Beamforming under Per-Antenna Power Constraint

    Zahra Rezaei, Ehsan Yazdian, Foroogh S. Tabataba
    Comments: 29 pages
    Subjects: Information Theory (cs.IT)

    Energy beamforming (EB) is a key technique to significantly enhance the
    efficiency of wireless power transfer (WPT). In this paper, we study optimal EB
    under a per-antenna power constraint (PAC), which is more practical than the
    conventional sum-power constraint (SPC) at a multi-antenna energy transmitter
    (ET). We consider a broadcast network, where one multi-antenna ET with PAC
    transfers wireless energy to energy receivers (ERs) which are randomly placed
    within the cell area. First, we consider the sum energy maximization problem
    with PAC without fairness and provide the optimal solution structure for the
    general case. This optimal structure implies that, similar to SPC, sending one
    energy beam is optimal with PAC, which means that the rank of the transmit
    covariance matrix is one. We also derive closed-form solutions for two special
    cases and propose two sub-optimal solutions for the general case, which are
    very close to the optimal numerical results. To account for fairness among the
    ERs, we further formulate the max-min fair problem with PAC and analyze it for
    the special case of two transmit antennas. Simulation results show its
    advantages in comparison to recent works in the literature.

    High Throughput Probabilistic Shaping with Product Distribution Matching

    Georg Böcherer, Patrick Schulte, Fabian Steiner
    Comments: Including 7 diagrams and 5 figures with simulation results
    Subjects: Information Theory (cs.IT)

    Product distribution matching (PDM) is proposed to generate target
    distributions over large alphabets by combining the output of several parallel
    distribution matchers (DMs) with smaller output alphabets. The parallel
    architecture of PDM enables low-complexity and high-throughput implementation.
    PDM is used as a shaping device for probabilistic amplitude shaping (PAS). For
    64-ASK and a spectral efficiency of 4.5 bits per channel use (bpcu), PDM is as
    power efficient as a single full-fledged DM. It is shown how PDM enables PAS
    for parallel channels present in multi-carrier systems like digital subscriber
    line (DSL) and orthogonal frequency-division multiplexing (OFDM). The key
    feature is that PDM shares the DMs for lower bit-levels among different
    sub-carriers, which improves the power efficiency significantly. A
    representative parallel channel example shows that PAS with PDM is 0.93 dB more
    power efficient than conventional uniform signaling and PDM is 0.35 dB more
    power efficient than individual per channel DMs.
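
    The "product" in PDM is literal: parallel matchers emit symbols independently,
    so the composite distribution is the product of their output marginals. A toy
    illustration of ours follows (the internal matching algorithm of each DM is
    not shown):

        import itertools

        def product_distribution(p1, p2):
            """Distribution of the composite symbol (a, b) from two parallel DMs."""
            return {(a, b): pa * pb
                    for (a, pa), (b, pb) in itertools.product(p1.items(),
                                                              p2.items())}

        # e.g. product_distribution({0: 0.7, 1: 0.3}, {0: 0.6, 1: 0.4})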

    Secure Clustered Distributed Storage Against Eavesdroppers

    Beongjun Choi, Jy-yong Sohn, Sung Whan Yoon, Jaekyun Moon
    Comments: 6 pages, accepted at IEEE ICC 2017
    Subjects: Information Theory (cs.IT)

    This paper considers the security issue of practical distributed storage
    systems (DSSs) which consist of multiple clusters of storage nodes. Noticing
    that actual storage nodes constituting a DSS are distributed in multiple
    clusters, two novel eavesdropper models – the node-restricted model and the
    cluster-restricted model – are suggested which reflect the clustered nature of
    DSSs. In the node-restricted model, an eavesdropper cannot access the
    individual nodes, but can eavesdrop on the incoming/outgoing data of (L_c)
    compromised clusters. In the cluster-restricted model, an eavesdropper can
    access a total of (l) individual nodes, but the number of accessible clusters is
    limited to (L_c). We provide an upper bound on the securely storable data for
    each model; for the node-restricted model, a specific network coding scheme
    which achieves the upper bound is obtained, given a mild condition on the node
    storage size.

    On the Optimality of Secret Key Agreement via Omniscience

    Chung Chan, Manuj Mukherjee, Navin Kashyap, Qiaoqiao Zhou
    Subjects: Information Theory (cs.IT)

    For the multiterminal secret key agreement problem under a private source
    model, it is known that the maximum key rate, i.e., the secrecy capacity, can
    be achieved through communication for omniscience, but the omniscience strategy
    can be strictly suboptimal in terms of minimizing the public discussion rate.
    While a single-letter characterization is not known for the minimum discussion
    rate needed for achieving the secrecy capacity, we derive single-letter lower
    and upper bounds that yield some simple conditions for omniscience to be
    discussion-rate optimal. These conditions turn out to be enough to deduce the
    optimality of omniscience for a large class of sources including the
    hypergraphical sources. Through conjectures and examples, we explore other
    source models to which our methods do not easily extend.

    Founsure 1.0: An Erasure Code Library with Efficient Repair and Update Features

    Şuayb Ş. Arslan
    Comments: To be submitted to Elsevier Open access journal SoftwareX, 2017
    Subjects: Information Theory (cs.IT)

    Founsure is an open-source software library, distributed under the LGPLv3
    license, that implements a multi-dimensional graph-based erasure code built
    entirely on fast exclusive-OR (XOR) logic. Its implementation uses compiler
    optimizations to generate the right assembly for the given SIMD-enabled
    architectures. Founsure 1.0 supports a variety of features that should find
    interesting applications in modern data storage, communication, and computer
    network systems, which are becoming increasingly hungry for network bandwidth,
    computational resources, and power. In particular, the Founsure erasure code
    provides a three-dimensional design space (computation complexity, coding
    overhead, and repair bandwidth) to meet the requirements of modern distributed
    data storage and processing systems in which data needs to be protected
    against device/hardware failures.
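
    As a toy illustration of the XOR logic such codes are built on (this is not
    Founsure's actual coding graph), a single XOR parity block repairs any one
    lost data block:

        from functools import reduce

        def xor_blocks(blocks):
            """Bytewise XOR of equal-length byte blocks."""
            return bytes(reduce(lambda a, b: a ^ b, chunk)
                         for chunk in zip(*blocks))

        data = [b"ab", b"cd", b"ef"]
        parity = xor_blocks(data)                          # encode
        repaired = xor_blocks([data[0], data[2], parity])  # repair block 1
        assert repaired == data[1]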

    Bayes-Optimal Entropy Pursuit for Active Choice-Based Preference Learning

    Stephen N. Pallone, Peter I. Frazier, Shane G. Henderson
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    We analyze the problem of learning a single user’s preferences in an active
    learning setting, sequentially and adaptively querying the user over a finite
    time horizon. Learning is conducted via choice-based queries, where the user
    selects her preferred option among a small subset of offered alternatives.
    These queries have been shown to be a robust and efficient way to learn an
    individual’s preferences. We take a parametric approach and model the user’s
    preferences through a linear classifier, using a Bayesian prior to encode our
    current knowledge of this classifier. The rate at which we learn depends on the
    alternatives offered at every time epoch. Under certain noise assumptions, we
    show that the Bayes-optimal policy for maximally reducing entropy of the
    posterior distribution of this linear classifier is a greedy policy, and that
    this policy achieves a linear lower bound when alternatives can be constructed
    from the continuum. Further, we analyze a different metric called
    misclassification error, proving that the performance of the optimal policy
    that minimizes misclassification error is bounded below by a linear function of
    differential entropy. Lastly, we numerically compare the greedy entropy
    reduction policy with a knowledge gradient policy under a number of scenarios,
    examining their performance under both differential entropy and
    misclassification error.

    On Polynomial Time Methods for Exact Low Rank Tensor Completion

    Dong Xia, Ming Yuan
    Comments: 56 pages, 4 figures
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    In this paper, we investigate the sample size requirement for exact recovery
    of a high order tensor of low rank from a subset of its entries. We show that a
    gradient descent algorithm with initial value obtained from a spectral method
    can, in particular, reconstruct a (d×d×d) tensor of multilinear
    ranks ((r,r,r)) with high probability from as few as
    (O(r^{7/2} d^{3/2} log^{7/2} d + r^7 d log^6 d)) entries. In the case when the ranks
    (r=O(1)), our sample size requirement matches those for nuclear norm
    minimization (Yuan and Zhang, 2016a), or alternating least squares assuming
    orthogonal decomposability (Jain and Oh, 2014). Unlike these earlier
    approaches, however, our method is efficient to compute, easy to implement, and
    does not impose extra structures on the tensor. Numerical results are presented
    to further demonstrate the merits of the proposed approach.

    Sample complexity of population recovery

    Yury Polyanskiy, Ananda Theertha Suresh, Yihong Wu
    Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (stat.ML)

    The problem of population recovery refers to estimating a distribution based
    on incomplete or corrupted samples. Consider a random poll of sample size (n)
    conducted on a population of individuals, where each pollee is asked to answer
    (d) binary questions. We consider one of the two polling impediments: (a) in
    lossy population recovery, a pollee may skip each question with probability
    (epsilon); (b) in noisy population recovery, a pollee may lie on each question
    with probability (epsilon). Given (n) lossy or noisy samples, the goal is to
    estimate the probabilities of all (2^d) binary vectors simultaneously within
    accuracy (delta) with high probability.

    This paper settles the sample complexity of population recovery. For the
    lossy model, the optimal sample complexity is
    (widetilde{Theta}(delta^{-2max{frac{epsilon}{1-epsilon},1}})), improving the state of the art by
    Moitra and Saks (2013) in several ways: a lower bound is established, the upper
    bound is improved and the result is dimension-free. Surprisingly, the sample
    complexity undergoes a phase transition from parametric to nonparametric rate
    when (epsilon) exceeds (1/2). For noisy population recovery, the sharp sample
    complexity turns out to be dimension-dependent and scales as
    (exp(Theta(d^{1/3} log^{2/3}(1/delta)))) except for the trivial cases of
    (epsilon=0,1/2) or (1).

    For both models, our estimators simply compute the empirical mean of a
    certain function, which is found by pre-solving a linear program (LP).
    Curiously, the dual LP can be understood as Le Cam’s method for lower-bounding
    the minimax risk, thus establishing the statistical optimality of the proposed
    estimators. The value of the LP is determined by complex-analytic methods.



