
    arXiv Paper Daily: Fri, 2 Jun 2017

    Published by 我爱机器学习 (52ml.net) on 2017-06-02 00:00:00

    Neural and Evolutionary Computing

    Integer Echo State Networks: Hyperdimensional Reservoir Computing

    Denis Kleyko, Edward Paxon Frady, Evgeny Osipov
    Comments: 10 pages, 5 figures
    Subjects: Neural and Evolutionary Computing (cs.NE)

    We propose an integer approximation of Echo State Networks (ESN) based on the
    mathematics of hyperdimensional computing. The reservoir of the proposed
    Integer Echo State Network (intESN) contains only n-bit integers and replaces
    the recurrent matrix multiplication with an efficient cyclic shift operation. Such an
    architecture results in dramatic improvements in memory footprint and
    computational efficiency, with minimal performance loss. Our architecture
    naturally supports the usage of the trained reservoir in symbolic processing
    tasks of analogy making and logical inference.
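
    As a concrete illustration of the update rule sketched above, here is a
    minimal NumPy version of one intESN reservoir step, under our own reading of
    the abstract (the clipping nonlinearity and the bipolar input encoding are
    assumptions, not code from the paper):

        import numpy as np

        def intesn_step(state, inp, kappa=3):
            # One reservoir update: a cyclic shift replaces the recurrent
            # matrix multiplication; clipping keeps entries in [-kappa, kappa].
            shifted = np.roll(state, 1)
            return np.clip(shifted + inp, -kappa, kappa)

        rng = np.random.default_rng(0)
        state = np.zeros(1000, dtype=np.int16)
        for _ in range(20):
            inp = rng.choice([-1, 1], size=1000)  # bipolar encoding of a token
            state = intesn_step(state, inp)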

    Blind nonnegative source separation using biological neural networks

    Cengiz Pehlevan, Sreyas Mohan, Dmitri B. Chklovskii
    Comments: Accepted for publication in Neural Computation
    Subjects: Neurons and Cognition (q-bio.NC); Neural and Evolutionary Computing (cs.NE)

    Blind source separation, i.e. extraction of independent sources from a
    mixture, is an important problem for both artificial and natural signal
    processing. Here, we address a special case of this problem when sources (but
    not the mixing matrix) are known to be nonnegative, for example, due to the
    physical nature of the sources. We search for the solution to this problem that
    can be implemented using biologically plausible neural networks. Specifically,
    we consider the online setting where the dataset is streamed to a neural
    network. The novelty of our approach is that we formulate blind nonnegative
    source separation as a similarity matching problem and derive neural networks
    from the similarity matching objective. Importantly, synaptic weights in our
    networks are updated according to biologically plausible local learning rules.
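
    To give a flavor of what a similarity-matching network with local rules can
    look like, here is a toy sketch in the spirit of this line of work; the
    dynamics, learning rules, and rates in the paper itself differ in detail, so
    treat this as illustrative only:

        import numpy as np

        def nsm_step(x, W, M, lr=0.01, n_iter=50):
            # Run the recurrent dynamics to a fixed point: y = relu(W x - M y),
            # with nonnegative outputs matching the nonnegative-source setting.
            y = np.zeros(W.shape[0])
            for _ in range(n_iter):
                y = np.maximum(0.0, W @ x - M @ y)
            # Local updates: Hebbian feedforward, anti-Hebbian lateral; each
            # weight change uses only the activities of the neurons it connects.
            W += lr * (np.outer(y, x) - W)
            M += lr * (np.outer(y, y) - M)
            np.fill_diagonal(M, 0.0)
            return y, W, M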

    Transfer Learning for Speech Recognition on a Budget

    Julius Kunze, Louis Kirsch, Ilia Kurenkov, Andreas Krug, Jens Johannsmeier, Sebastian Stober
    Comments: Accepted for 2nd ACL Workshop on Representation Learning for NLP
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    End-to-end training of automated speech recognition (ASR) systems requires
    massive data and compute resources. We explore transfer learning based on model
    adaptation as an approach for training ASR models under constrained GPU memory,
    throughput and training data. We conduct several systematic experiments
    adapting a Wav2Letter convolutional neural network originally trained for
    English ASR to the German language. We show that this technique allows faster
    training on consumer-grade resources while requiring less training data in
    order to achieve the same accuracy, thereby lowering the cost of training ASR
    models in other languages. Model introspection revealed that small adaptations
    to the network’s weights were sufficient for good performance, especially for
    inner layers.

    Free energy-based reinforcement learning using a quantum processor

    Anna Levit, Daniel Crawford, Navid Ghadermarzy, Jaspreet S. Oberoi, Ehsan Zahedinejad, Pooya Ronagh
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC); Quantum Physics (quant-ph)

    Recent theoretical and experimental results suggest the possibility of using
    current and near-future quantum hardware in challenging sampling tasks. In this
    paper, we introduce free energy-based reinforcement learning (FERL) as an
    application of quantum hardware. We propose a method for processing a quantum
    annealer’s measured qubit spin configurations in approximating the free energy
    of a quantum Boltzmann machine (QBM). We then apply this method to perform
    reinforcement learning on the grid-world problem using the D-Wave 2000Q quantum
    annealer. The experimental results show that our technique is a promising
    method for harnessing the power of quantum sampling in reinforcement learning
    tasks.


    Computer Vision and Pattern Recognition

    Fader Networks: Manipulating Images by Sliding Attributes

    Guillaume Lample, Neil Zeghidour, Nicolas Usunier, Antoine Bordes, Ludovic Denoyer, Marc'Aurelio Ranzato
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper introduces a new encoder-decoder architecture that is trained to
    reconstruct images by disentangling the salient information of the image and
    the values of attributes directly in the latent space. As a result, after
    training, our model can generate different realistic versions of an input image
    by varying the attribute values. By using continuous attribute values, we can
    choose how much a specific attribute is perceivable in the generated image.
    This property could allow for applications where users can modify an image
    using sliding knobs, like faders on a mixing console, to change the facial
    expression of a portrait, or to update the color of some objects. Compared to
    the state-of-the-art which mostly relies on training adversarial networks in
    pixel space by altering attribute values at train time, our approach results in
    much simpler training schemes and nicely scales to multiple attributes. We
    present evidence that our model can significantly change the perceived value of
    the attributes while preserving the naturalness of images.
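
    To make the "sliding knob" concrete: at test time the image is encoded once
    and decoded repeatedly with different attribute values. A schematic sketch
    (the encoder/decoder are assumed trained modules; all names here are ours,
    not the paper's code):

        import torch

        def slide_attribute(encoder, decoder, image, n_steps=5):
            # Encode once into an attribute-invariant latent, then decode with
            # a swept attribute value to get a continuum of edited images.
            z = encoder(image)
            return [decoder(z, torch.tensor([[alpha]]))
                    for alpha in torch.linspace(0.0, 1.0, n_steps).tolist()]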

    Line Profile Based Segmentation Algorithm for Touching Corn Kernels

    Ali Mahdi, Jun Qin
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Image segmentation of touching objects plays a key role in providing accurate
    classification for computer vision technologies. A new line profile based
    image segmentation algorithm has been developed to provide a robust and
    accurate segmentation of groups of touching corn kernels. The performance of
    the line profile based algorithm has been compared to a watershed based
    image segmentation algorithm. Both algorithms are tested on three different
    patterns of images: isolated kernels, single lines, and randomly distributed
    formations. The experimental results show that the algorithm can segment a
    large number of touching corn kernels efficiently and accurately.

    DiracNets: Training Very Deep Neural Networks Without Skip-Connections

    Sergey Zagoruyko, Nikos Komodakis
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep neural networks with skip-connections, such as ResNet, show excellent
    performance in various image classification benchmarks. It has been observed,
    however, that the initial motivation behind them – training deeper networks –
    does not actually hold true, and the benefits come from increased capacity
    rather than from depth. Motivated by this, and inspired by ResNet, we propose
    a simple Dirac weight parameterization, which allows us to train very deep
    plain networks without skip-connections and achieve nearly the same
    performance. This parameterization has a minor computational cost at training
    time and no cost at all at inference. We are able to achieve 95.5% accuracy
    on CIFAR-10 with a 34-layer deep plain network, surpassing a 1001-layer deep
    ResNet and approaching Wide ResNet. Our parameterization also mostly
    eliminates the need for careful initialization in residual and non-residual
    networks. The code and models for
    our experiments are available at this https URL
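
    The core of the Dirac parameterization can be sketched in a few lines of
    PyTorch: the convolution weight is a learned mixture of a Dirac delta kernel
    (an identity mapping under convolution) and a regular weight tensor. This is
    a simplified reading of the abstract; the released code adds further details
    such as weight normalization:

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class DiracConv2d(nn.Module):
            # Weight parameterized as alpha * I + W: the identity component
            # propagates the input unchanged, so deep plain stacks remain
            # trainable without explicit skip-connections.
            def __init__(self, channels, ksize=3):
                super().__init__()
                self.alpha = nn.Parameter(torch.ones(1))
                self.weight = nn.Parameter(
                    0.01 * torch.randn(channels, channels, ksize, ksize))
                delta = torch.zeros(channels, channels, ksize, ksize)
                for i in range(channels):
                    delta[i, i, ksize // 2, ksize // 2] = 1.0  # Dirac kernel
                self.register_buffer("delta", delta)
                self.pad = ksize // 2

            def forward(self, x):
                w = self.alpha * self.delta + self.weight
                return F.relu(F.conv2d(x, w, padding=self.pad))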

    Deep Mutual Learning

    Ying Zhang, Tao Xiang, Timothy M. Hospedales, Huchuan Lu
    Comments: 10 pages, 4 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Model distillation is an effective and widely used technique to transfer
    knowledge from a teacher to a student network. The typical application is to
    transfer from a powerful large network or ensemble to a small network that is
    better suited to low-memory or fast execution requirements. In this paper, we
    present a deep mutual learning (DML) strategy where, rather than one-way
    transfer between a static pre-defined teacher and a student, an ensemble of
    students learn collaboratively and teach each other throughout the training
    process. Our experiments show that a variety of network architectures benefit
    from mutual learning and achieve compelling results on CIFAR-100 recognition
    and Market-1501 person re-identification benchmarks. Surprisingly, it is
    revealed that no prior powerful teacher network is necessary — mutual learning
    of a collection of simple student networks works, and moreover outperforms
    distillation from a more powerful yet static teacher.
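
    The mutual-learning objective is easy to state: each student minimizes its
    supervised loss plus the KL divergence from each peer's predictions. A
    hedged PyTorch sketch of that objective (our formulation of the idea, not
    the authors' code):

        import torch
        import torch.nn.functional as F

        def dml_losses(logits_list, labels):
            # One loss per student: cross-entropy on the labels plus the
            # average KL divergence from each peer's (detached) predictions.
            losses = []
            for i, logits in enumerate(logits_list):
                loss = F.cross_entropy(logits, labels)
                for j, peer in enumerate(logits_list):
                    if j == i:
                        continue
                    loss += F.kl_div(
                        F.log_softmax(logits, dim=1),
                        F.softmax(peer.detach(), dim=1),
                        reduction="batchmean") / (len(logits_list) - 1)
                losses.append(loss)
            return losses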

    TransFlow: Unsupervised Motion Flow by Joint Geometric and Pixel-level Estimation

    Stefano Alletto, Davide Abati, Simone Calderara, Rita Cucchiara, Luca Rigazio
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We address unsupervised optical flow estimation for ego-centric motion. We
    argue that optical flow can be cast as a geometrical warping between two
    successive video frames and devise a deep architecture to estimate such
    transformation in two stages. First, a dense pixel-level flow is computed with
    a geometric prior imposing strong spatial constraints. Such prior is typical of
    driving scenes, where the point of view is coherent with the vehicle motion. We
    show how such a global transformation can be approximated with a homography
    and how spatial transformer layers can be employed to compute the flow field
    implied by such a transformation. The second stage then refines the
    prediction by feeding it to a second, deeper network. A final reconstruction loss compares the
    warping of frame X(t) with the subsequent frame X(t+1) and guides both
    estimates. The model, which we named TransFlow, performs favorably compared to
    other unsupervised algorithms, and shows better generalization compared to
    supervised methods with a 3x reduction in error on unseen data.
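
    The reconstruction loss that drives unsupervised training can be sketched
    generically: warp frame X(t) by the predicted flow and penalize the
    photometric difference to X(t+1). This is a standard formulation, not the
    paper's exact loss:

        import torch
        import torch.nn.functional as F

        def photometric_loss(frame_t, frame_t1, flow):
            # frame_*: (B, C, H, W); flow: (B, 2, H, W) in pixel units.
            b, _, h, w = frame_t.shape
            ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w),
                                    indexing="ij")
            grid = torch.stack((xs, ys), dim=-1).float().unsqueeze(0)
            grid = grid + flow.permute(0, 2, 3, 1)
            # Normalize sampling coordinates to [-1, 1] for grid_sample.
            gx = 2 * grid[..., 0] / (w - 1) - 1
            gy = 2 * grid[..., 1] / (h - 1) - 1
            warped = F.grid_sample(frame_t, torch.stack((gx, gy), dim=-1),
                                   align_corners=True)
            return (warped - frame_t1).abs().mean()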

    An Effective Approach for Point Clouds Registration Based on the Hard and Soft Assignments

    Congcong Jin, Jihua Zhu, Yaochen Li, Shaoyi Du, Zhongyu Li, Huimin Lu
    Comments: 23 pages, 6 figures, 2 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    For the registration of partially overlapping point clouds, this paper
    proposes an effective approach based on both the hard and soft assignments.
    Given two initially posed clouds, it first establishes the forward
    correspondence for each point in the data shape and calculates the value of a
    binary variable, which indicates whether this point correspondence is
    located in the overlapping areas. Then, it establishes the bilateral
    correspondence and computes bidirectional distances for each point in the
    overlapping areas. Based on the ratio of bidirectional distances, the
    exponential function is selected and utilized to calculate the probability
    value, which can indicate the reliability of the point correspondence.
    Subsequently, both the values of hard and soft assignments are embedded into
    the proposed objective function for registration of partially overlapping point
    clouds and a novel variant of ICP algorithm is proposed to obtain the optimal
    rigid transformation. The proposed approach can achieve good registration of
    point clouds even when their overlap percentage is low. Experimental results
    on public data sets illustrate its superiority over previous approaches in
    accuracy and robustness.
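
    For context, the optimization inside ICP-style methods alternates between
    correspondence/weight estimation and a closed-form weighted rigid fit. The
    weighted fit is standard and looks like this (a generic building block; the
    paper plugs its hard and soft assignment values into an objective of this
    form):

        import numpy as np

        def weighted_rigid_transform(src, dst, w):
            # Minimize sum_i w_i ||R src_i + t - dst_i||^2 in closed form
            # (weighted Procrustes/Kabsch). src, dst: (n, 3); w: (n,).
            w = w / w.sum()
            mu_s = (w[:, None] * src).sum(axis=0)
            mu_d = (w[:, None] * dst).sum(axis=0)
            H = (src - mu_s).T @ (np.diag(w) @ (dst - mu_d))
            U, _, Vt = np.linalg.svd(H)
            D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
            R = Vt.T @ D @ U.T
            t = mu_d - R @ mu_s
            return R, t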

    Depth Structure Preserving Scene Image Generation

    Wendong Zhang, Bingbing Ni, Yichao Yan, Jingwei Xu, Xiaokang Yang
    Comments: 9 pages, 8 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The key to automatically generating natural scene images is properly
    arranging the various spatial elements, especially in the depth direction.
    To this end, we introduce a novel depth structure preserving scene image
    generation network (DSP-GAN), which favors a hierarchical and heterogeneous
    architecture, for the purpose of depth structure preserving scene
    generation. The main trunk of the proposed infrastructure is built on a
    Hawkes point process that models the spatial dependency between different
    depth layers. Within each layer, generative adversarial sub-networks are
    trained collaboratively to generate realistic scene components, conditioned
    on the layer information produced by the point process. We experiment with
    our model on a subset of the SUN dataset with annotated scene images and
    demonstrate that our models are capable of generating depth-realistic
    natural scene images.

    Shape and Positional Geometry of Multi-Object Configurations

    James Damon, Ellen Gasparovic
    Comments: This paper presents material relevant for two and three dimensional images that builds on and makes many references to a previous paper by the authors, arXiv:1402.5517
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In previous work, we introduced a method for modeling a configuration of
    objects in 2D and 3D images using a mathematical “medial/skeletal linking
    structure.” In this paper, we show how these structures allow us to capture
    positional properties of a multi-object configuration in addition to the shape
    properties of the individual objects. In particular, we introduce numerical
    invariants for positional properties which measure the closeness of neighboring
    objects, including identifying the parts of the objects which are close, and
    the “relative significance” of objects compared with the other objects in the
    configuration. Using these numerical measures, we introduce a hierarchical
    ordering and relations between the individual objects, and quantitative
    criteria for identifying subconfigurations. In addition, the invariants provide
    a “proximity matrix” which yields a unique set of weightings measuring overall
    proximity of objects in the configuration. Furthermore, we show that these
    invariants, which are volumetrically defined and involve external regions, may
    be computed via integral formulas in terms of “skeletal linking integrals”
    defined on the internal skeletal structures of the objects.

    Faster Spatially Regularized Correlation Filters for Visual Tracking

    Xiaoxiang Hu, Yujiu Yang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Discriminatively learned correlation filters (DCF) have been widely used in
    the online visual tracking field due to their simplicity and efficiency.
    These methods utilize a periodic assumption on the training samples to
    construct a circulant data matrix, which implicitly increases the training
    samples and reduces both storage and computational complexity. The periodic
    assumption, however, also introduces unwanted boundary effects. Recently,
    Spatially Regularized Correlation Filters (SRDCF) solved this issue by
    penalizing correlation filter coefficients depending on their spatial
    location. However, SRDCF's efficiency dramatically decreases due to the
    breaking of the circulant structure.

    We propose Faster Spatially Regularized Discriminative Correlation Filters
    (FSRDCF) for tracking. The FSRDCF is constructed from ridge regression and
    fully uses the circulant structure of training samples in the spatial
    domain; more importantly, we further exploit the circulant structure of the
    regularization function in the Fourier domain, which allows our problem to
    be solved more directly and efficiently. Experiments are conducted on three
    benchmark datasets: OTB-2013, OTB-2015 and VOT2016. Our approach achieves
    performance equivalent to the baseline tracker SRDCF on all three datasets.
    On OTB-2013 and OTB-2015, our approach obtains more than twice the running
    speed and less than a third of the start-up time of SRDCF. In comparison
    with the state-of-the-art, our approach demonstrates superior performance
    relative to other non-spatial-regularization trackers.
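
    For background, the appeal of circulant structure is that the ridge
    regression underlying a basic DCF has a closed-form, element-wise solution
    in the Fourier domain. A single-channel sketch (the classical formulation
    that SRDCF and FSRDCF build on, not the FSRDCF solver itself):

        import numpy as np

        def train_dcf(x, y, lam=1e-2):
            # Closed-form filter: w_hat = conj(X) * Y / (conj(X) * X + lam),
            # where X, Y are 2D FFTs of the training patch and desired response.
            X, Y = np.fft.fft2(x), np.fft.fft2(y)
            return np.conj(X) * Y / (np.conj(X) * X + lam)

        def detect(w_hat, z):
            # Correlation response over all cyclic shifts of the search patch.
            return np.real(np.fft.ifft2(w_hat * np.fft.fft2(z)))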

    Superhuman Accuracy on the SNEMI3D Connectomics Challenge

    Kisuk Lee, Jonathan Zung, Peter Li, Viren Jain, H. Sebastian Seung
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    For the past decade, convolutional networks have been used for 3D
    reconstruction of neurons from electron microscopic (EM) brain images. Recent
    years have seen great improvements in accuracy, as evidenced by submissions to
    the SNEMI3D benchmark challenge. Here we report the first submission to surpass
    the estimate of human accuracy provided by the SNEMI3D leaderboard. A variant
    of 3D U-Net is trained on a primary task of predicting affinities between
    nearest neighbor voxels, and an auxiliary task of predicting long-range
    affinities. The training data is augmented by simulated image defects. The
    nearest neighbor affinities are used to create an oversegmentation, and then
    supervoxels are greedily agglomerated based on mean affinity. The resulting
    SNEMI3D score exceeds the estimate of human accuracy by a large margin. While
    one should be cautious about extrapolating from the SNEMI3D benchmark to
    real-world accuracy of large-scale neural circuit reconstruction, our result
    inspires optimism that the goal of full automation may be realizable in the
    future.
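
    The agglomeration step described above is essentially greedy graph merging.
    A simplified union-find sketch (the actual pipeline recomputes mean
    affinities between growing segments, which this version omits):

        def greedy_agglomerate(edges, n_supervoxels, threshold=0.5):
            # edges: list of (affinity, u, v); merge in decreasing affinity.
            parent = list(range(n_supervoxels))

            def find(a):
                while parent[a] != a:
                    parent[a] = parent[parent[a]]  # path compression
                    a = parent[a]
                return a

            for aff, u, v in sorted(edges, reverse=True):
                if aff < threshold:
                    break
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[rv] = ru
            return [find(i) for i in range(n_supervoxels)]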

    Blood capillaries and vessels segmentation in optical coherence tomography angiogram using fuzzy C-means and Curvelet transform

    Fariborz Taherkhani
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Blood capillary density and vessel structure in the in vivo brain provide
    information about diseases and hemodynamics in the brain network. Capillary
    density in the animal brain is one of the indices that indicates diabetes:
    capillary density in an animal brain subjected to diabetes diminishes over
    time. Variation in the physical structure and diameter of the vessels as a
    function of time likewise gives us information about hemodynamics in the
    brain network. Segmenting capillaries and vessels can thus help in
    predicting diseases such as diabetes, as well as in monitoring areas of the
    brain that are to be stimulated by optogenetic techniques for controlling
    blood flow in the brain network. In this work, we propose an unsupervised
    learning approach to distinguish capillaries from vessels. The proposed
    method is based on the curvelet transform, which measures the thickness of
    curve singularities in Optical Coherence Tomography (OCT) images. First, we
    enhance the contrast of the image by strengthening faint capillaries and
    softening vessels, suppressing the noise level of the image in the same
    step. This step of the algorithm is performed by modifying curvelet
    coefficients using a non-linear function. Second, we extract a feature
    vector from each pixel of the image using the curvelet transform, and
    finally, we apply a well-tuned fuzzy C-means clustering approach to segment
    capillaries and vessels from the background. Improving segmentation accuracy
    is one of the main motivations of this work; experimental results on test
    images indicate that the proposed method outperforms several existing
    segmentation methods.
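
    The final clustering step is plain fuzzy C-means, which is compact enough
    to sketch in full (the standard algorithm; in the paper the per-pixel
    features come from the curvelet transform):

        import numpy as np

        def fuzzy_cmeans(X, c=2, m=2.0, n_iter=100, seed=0):
            # X: (n_samples, n_features). Returns hard labels and centers.
            rng = np.random.default_rng(seed)
            U = rng.random((len(X), c))
            U /= U.sum(axis=1, keepdims=True)  # fuzzy memberships
            for _ in range(n_iter):
                Um = U ** m
                centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
                d = np.linalg.norm(X[:, None, :] - centers[None, :, :],
                                   axis=2) + 1e-12
                U = 1.0 / d ** (2.0 / (m - 1.0))
                U /= U.sum(axis=1, keepdims=True)
            return U.argmax(axis=1), centers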

    Megapixel Size Image Creation using Generative Adversarial Networks

    Marco Marchesi
    Comments: 3 pages, 4 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Learning (cs.LG)

    Since their appearance, Generative Adversarial Networks (GANs) have received
    a lot of interest in the AI community. In image generation, several projects
    have shown that GANs are able to generate photorealistic images, but the
    results so far did not look adequate for the quality standards of the visual
    media production industry. We present an optimized image generation process
    based on Deep Convolutional Generative Adversarial Networks (DCGANs) to
    create photorealistic high-resolution images (up to 1024×1024 pixels).
    Furthermore, the system was fed with a limited dataset of fewer than two
    thousand images. All these results give further clues about the future
    exploitation of GANs in Computer Graphics and Visual Effects.

    Deep Generative Adversarial Networks for Compressed Sensing Automates MRI

    Morteza Mardani, Enhao Gong, Joseph Y. Cheng, Shreyas Vasanawala, Greg Zaharchuk, Marcus Alley, Neil Thakur, Song Han, William Dally, John M. Pauly, Lei Xing
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    Magnetic resonance image (MRI) reconstruction is a severely ill-posed linear
    inverse task, demanding time and resource intensive computations that can
    substantially trade off accuracy for speed in real-time imaging. In
    addition, state-of-the-art compressed sensing (CS) analytics are not
    cognizant of the image diagnostic quality. To cope with these challenges we
    put forth a novel CS framework that draws on the benefits of generative
    adversarial networks (GAN) to train a (low-dimensional) manifold of
    diagnostic-quality MR images from historical patients. Leveraging a mixture
    of least-squares (LS) GANs and a pixel-wise ℓ1 cost, a deep residual network
    with skip connections is trained as the generator that learns to remove the
    aliasing artifacts by projecting onto the manifold. The LSGAN learns the
    texture details, while the ℓ1 cost controls the high-frequency noise. A
    multilayer convolutional neural network is then jointly trained based on
    diagnostic quality images to discriminate the projection quality. The test
    phase performs feed-forward propagation over the generator network, which
    demands a very low computational overhead. Extensive evaluations are
    performed on a large contrast-enhanced MR dataset of pediatric patients. In
    particular, image ratings by expert radiologists corroborate that GANCS
    retrieves high-contrast images with detailed texture relative to
    conventional CS and pixel-wise schemes. In addition, it offers
    reconstruction in a few milliseconds, two orders of magnitude faster than
    state-of-the-art CS-MRI schemes.
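
    The generator objective the abstract describes is a weighted mix of a
    pixel-wise l1 term and a least-squares adversarial term. A schematic sketch
    (the mixing weight and exact form are our placeholders):

        import torch

        def generator_loss(fake, real, d_fake, lam=0.9):
            # d_fake: discriminator scores on generated images.
            l1 = (fake - real).abs().mean()           # pixel-wise l1 cost
            lsgan = ((d_fake - 1.0) ** 2).mean()      # LSGAN generator term
            return lam * l1 + (1.0 - lam) * lsgan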

    Cross-modal Common Representation Learning by Hybrid Transfer Network

    Xin Huang, Yuxin Peng, Mingkuan Yuan
    Comments: To appear in the proceedings of 26th International Joint Conference on Artificial Intelligence (IJCAI), Melbourne, Australia, Aug. 19-25, 2017
    Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    DNN-based cross-modal retrieval is a research hotspot for retrieving across
    different modalities such as image and text, but existing methods often face
    the challenge of insufficient cross-modal training data. In the single-modal
    scenario, a similar problem is usually relieved by transferring knowledge
    from large-scale auxiliary datasets (such as ImageNet). Knowledge from such
    single-modal datasets is also very useful for cross-modal retrieval, as it
    can provide rich general semantic information that can be shared across
    different modalities. However, it is challenging to transfer useful
    knowledge from a single-modal (e.g., image) source domain to a cross-modal
    (e.g., image/text) target domain. Knowledge in the source domain cannot be
    directly transferred to the two different modalities in the target domain,
    and the inherent cross-modal correlation contained in the target domain
    provides key hints for cross-modal retrieval which should be preserved
    during the transfer process. This paper proposes the Cross-modal Hybrid
    Transfer Network (CHTN) with two subnetworks: a modal-sharing transfer
    subnetwork utilizes the modality shared by the source and target domains as
    a bridge to transfer knowledge to both modalities simultaneously, while a
    layer-sharing correlation subnetwork preserves the inherent cross-modal
    semantic correlation to further adapt to the cross-modal retrieval task.
    Cross-modal data can be converted to a common representation by CHTN for
    retrieval, and comprehensive experiments on 3 datasets show its
    effectiveness.

    Teaching Machines to Describe Images via Natural Language Feedback

    Huan Ling, Sanja Fidler
    Comments: 13 pages
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)

    Robots will eventually be part of every household. It is thus critical to
    enable algorithms to learn from and be guided by non-expert users. In this
    paper, we bring a human in the loop, and enable a human teacher to give
    feedback to a learning agent in the form of natural language. We argue that a
    descriptive sentence can provide a much stronger learning signal than a numeric
    reward in that it can easily point to where the mistakes are and how to correct
    them. We focus on the problem of image captioning in which the quality of the
    output can easily be judged by non-experts. We propose a hierarchical
    phrase-based captioning model trained with policy gradients, and design a
    feedback network that provides reward to the learner by conditioning on the
    human-provided feedback. We show that by exploiting descriptive feedback our
    model learns to perform better than when given independently written human
    captions.

    Putting a Face to the Voice: Fusing Audio and Visual Signals Across a Video to Determine Speakers

    Ken Hoover, Sourish Chaudhuri, Caroline Pantofaru, Malcolm Slaney, Ian Sturdy
    Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)

    In this paper, we present a system that associates faces with voices in a
    video by fusing information from the audio and visual signals. The thesis
    underlying our work is that an extremely simple approach to generating (weak)
    speech clusters can be combined with visual signals to effectively associate
    faces and voices by aggregating statistics across a video. This approach does
    not need any training data specific to this task and leverages the natural
    coherence of information in the audio and visual streams. It is particularly
    applicable to tracking speakers in videos on the web where a priori information
    about the environment (e.g., number of speakers, spatial signals for
    beamforming) is not available. We performed experiments on a real-world dataset
    using this analysis framework to determine the speaker in a video. Given a
    ground truth labeling determined by human rater consensus, our approach had
    ~71% accuracy.


    Artificial Intelligence

    Enhancing workflow-nets with data for trace completion

    Riccardo De Masellis, Chiara Di Francescomarino, Chiara Ghidini, Sergio Tessaris
    Subjects: Artificial Intelligence (cs.AI)

    The growing adoption of IT-systems for modeling and executing (business)
    processes or services has thrust the scientific investigation towards
    techniques and tools which support more complex forms of process analysis. Many
    of them, such as conformance checking, process alignment, mining and
    enhancement, rely on complete observation of past (tracked and logged)
    executions. In many real cases, however, the lack of human or IT-support on all
    the steps of process execution, as well as information hiding and abstraction
    of model and data, result in incomplete log information of both data and
    activities. This paper tackles the issue of automatically repairing traces
    with missing information, notably by considering not only activities but
    also the data manipulated by them. Our technique recasts the problem as a
    reachability problem and provides an encoding in an action language which
    allows virtually any state-of-the-art planner to be used to return
    solutions.

    Grounding Symbols in Multi-Modal Instructions

    Yordan Hristov, Svetlin Penkov, Alex Lascarides, Subramanian Ramamoorthy
    Comments: 9 pages, 8 figures, To appear in the Proceedings of the ACL workshop Language Grounding for Robotics, Vancouver, Canada
    Subjects: Artificial Intelligence (cs.AI)

    As robots begin to cohabit with humans in semi-structured environments, the
    need arises to understand instructions involving rich variability—for
    instance, learning to ground symbols in the physical world. Realistically, this
    task must cope with small datasets consisting of a particular user's
    contextual assignment of meaning to terms. We present a method for
    processing a raw stream
    of cross-modal input—i.e., linguistic instructions, visual perception of a
    scene and a concurrent trace of 3D eye tracking fixations—to produce the
    segmentation of objects with a correspondent association to high-level
    concepts. To test our framework we present experiments in a table-top object
    manipulation scenario. Our results show our model learns the user’s notion of
    colour and shape from a small number of physical demonstrations, generalising
    to identifying physical referents for novel combinations of the words.

    Diversified Top-k Partial MaxSAT Solving

    Junping Zhou, Huanyao Sun, Feifei Ma, Jian Gao, Ke Xu, Minghao Yin
    Subjects: Artificial Intelligence (cs.AI)

    We introduce the diversified top-k partial MaxSAT problem, a combination of
    the partial MaxSAT problem and an enumeration problem. Given a partial
    MaxSAT formula F and a positive integer k, the diversified top-k partial
    MaxSAT problem is to find k maximal solutions for F such that the k maximal
    solutions satisfy the maximum number of soft clauses of F. This problem has
    wide applications, including community detection, sensor placement, motif
    discovery, and combinatorial testing. We prove the problem is NP-hard and
    propose an approach for solving it. The concrete idea of the approach is to
    design an encoding EE which reduces the diversified top-k partial MaxSAT
    problem to the partial MaxSAT problem, and then solve the resulting problem
    with state-of-the-art solvers. In addition, we present an algorithm, MEMKC,
    that exactly solves the diversified top-k partial MaxSAT problem. Through
    several experiments we show that our approach can be successfully applied
    to the problems of interest.

    Descriptions of Objectives and Processes of Mechanical Learning

    Chuyu Xiong
    Subjects: Artificial Intelligence (cs.AI)

    In [1], we introduced mechanical learning and proposed two approaches to it.
    Here, we follow one such approach to describe the objects and processes of
    learning. We discuss two kinds of patterns: objective and subjective.
    Subjective patterns are crucial for a learning machine. We prove that for
    any objective pattern we can find a proper subjective pattern, based upon
    least base patterns, that expresses the objective pattern well. An X-form is
    an algebraic expression for a subjective pattern. The collection of X-forms
    forms the internal representation space, which is the center of a learning
    machine. We discuss learning by teaching and learning without teaching, and
    define data sufficiency by X-forms. We then discuss some learning strategies
    and show that, in each strategy, with sufficient data and certain
    capabilities, a learning machine can indeed learn any pattern (a universal
    learning machine). In the appendix, with this knowledge of learning
    machines, we try to view deep learning from a different angle, i.e., its
    internal representation space and its learning dynamics.

    A Diversified Multi-Start Algorithm for Unconstrained Binary Quadratic Problems Leveraging the Graphics Processor Unit

    Mark W. Lewis
    Comments: Quality solutions quickly obtained for xQx using the GPU to perform matrix multiplication, however improvements to solution intensification are needed
    Subjects: Artificial Intelligence (cs.AI)

    Multi-start algorithms are a common and effective tool for metaheuristic
    searches. In this paper we amplify multi-start capabilities by employing the
    parallel processing power of the graphics processor unit (GPU) to quickly
    generate a diverse starting set of solutions for the Unconstrained Binary
    Quadratic Optimization Problem; these solutions are evaluated and used to
    implement screening methods that select solutions for further optimization.
    This method is implemented as an initial high-quality solution generation
    phase prior to a secondary steepest ascent search, and a comparison of
    results with the best known approaches on benchmark unconstrained binary
    quadratic problems demonstrates that GPU-enabled diversified multi-start
    with screening quickly yields very good results.

    Learning Disentangled Representations with Semi-Supervised Deep Generative Models

    N. Siddharth, Brooks Paige, Jan-Willem Van de Meent, Alban Desmaison, Frank Wood, Noah D. Goodman, Pushmeet Kohli, Philip H.S. Torr
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Variational autoencoders (VAEs) learn representations of data by jointly
    training a probabilistic encoder and decoder network. Typically these models
    encode all features of the data into a single variable. Here we are interested
    in learning disentangled representations that encode distinct aspects of the
    data into separate variables. We propose to learn such representations using
    model architectures that generalize from standard VAEs, employing a general
    graphical model structure in the encoder and decoder. This allows us to train
    partially-specified models that make relatively strong assumptions about a
    subset of interpretable variables and rely on the flexibility of neural
    networks to learn representations for the remaining variables. We further
    define a general objective for semi-supervised learning in this model class,
    which can be approximated using an importance sampling procedure. We evaluate
    our framework’s ability to learn disentangled representations, both by
    qualitative exploration of its generative capacity, and quantitative evaluation
    of its discriminative ability on a variety of models and datasets.

    Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning

    Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Bernhard Schölkopf, Sergey Levine
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

    Off-policy model-free deep reinforcement learning methods using previously
    collected data can improve sample efficiency over on-policy policy gradient
    techniques. On the other hand, on-policy algorithms are often more stable and
    easier to use. This paper examines, both theoretically and empirically,
    approaches to merging on- and off-policy updates for deep reinforcement
    learning. Theoretical results show that off-policy updates with a value
    function estimator can be interpolated with on-policy policy gradient updates
    whilst still satisfying performance bounds. Our analysis uses control variate
    methods to produce a family of policy gradient algorithms, with several
    recently proposed algorithms being special cases of this family. We then
    provide an empirical comparison of these techniques with the remaining
    algorithmic details fixed, and show how different mixing of off-policy gradient
    estimates with on-policy samples contribute to improvements in empirical
    performance. The final algorithm provides a generalization and unification of
    existing deep policy gradient techniques, has theoretical guarantees on the
    bias introduced by off-policy updates, and improves on the state-of-the-art
    model-free deep RL methods on a number of OpenAI Gym continuous control
    benchmarks.
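
    Schematically, the family of algorithms interpolates the two gradient
    estimates with a mixing coefficient that trades the bias of off-policy
    updates against the variance of on-policy ones (our simplification of the
    paper's estimators):

        def interpolated_gradient(g_on, g_off, nu=0.2):
            # nu = 0 recovers the pure on-policy gradient; nu = 1 the pure
            # off-policy one. The paper bounds the bias introduced as nu grows.
            return [(1.0 - nu) * a + nu * b for a, b in zip(g_on, g_off)]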

    Semantic Specialisation of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints

    Nikola Mrkšić, Ivan Vulić, Diarmuid Ó Séaghdha, Ira Leviant, Roi Reichart, Milica Gašić, Anna Korhonen, Steve Young
    Comments: Accepted for publication at TACL (to be presented at EMNLP 2017)
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We present Attract-Repel, an algorithm for improving the semantic quality of
    word vectors by injecting constraints extracted from lexical resources.
    Attract-Repel facilitates the use of constraints from mono- and cross-lingual
    resources, yielding semantically specialised cross-lingual vector spaces. Our
    evaluation shows that the method can make use of existing cross-lingual
    lexicons to construct high-quality vector spaces for a plethora of different
    languages, facilitating semantic transfer from high- to lower-resource ones.
    The effectiveness of our approach is demonstrated with state-of-the-art results
    on semantic similarity datasets in six languages. We next show that
    Attract-Repel-specialised vectors boost performance in the downstream task of
    dialogue state tracking (DST) across multiple languages. Finally, we show that
    cross-lingual vector spaces produced by our algorithm facilitate the training
    of multilingual DST models, which brings further performance improvements.
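
    As an illustration of the injected constraints, the "attract" part of such
    an objective can be written as a max-margin cost that pulls each constraint
    pair together relative to sampled negatives. A schematic PyTorch version
    (our paraphrase of the published objective, with details simplified):

        import torch
        import torch.nn.functional as F

        def attract_term(xl, xr, tl, tr, delta=0.6):
            # (xl, xr): unit-norm vectors of a synonym pair; (tl, tr): their
            # sampled negative examples. Each word should be closer to its
            # partner than to its negative by margin delta.
            sim_pair = (xl * xr).sum(dim=1)
            return (F.relu(delta + (xl * tl).sum(dim=1) - sim_pair) +
                    F.relu(delta + (xr * tr).sum(dim=1) - sim_pair)).mean()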

    Discovering Discrete Latent Topics with Neural Variational Inference

    Yishu Miao, Edward Grefenstette, Phil Blunsom
    Comments: ICML 2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Learning (cs.LG)

    Topic models have been widely explored as probabilistic generative models of
    documents. Traditional inference methods have sought closed-form derivations
    for updating the models, however as the expressiveness of these models grows,
    so does the difficulty of performing fast and accurate inference over their
    parameters. This paper presents alternative neural approaches to topic
    modelling by providing parameterisable distributions over topics which permit
    training by backpropagation in the framework of neural variational inference.
    In addition, with the help of a stick-breaking construction, we propose a
    recurrent network that is able to discover a notionally unbounded number of
    topics, analogous to Bayesian non-parametric topic models. Experimental results
    on the MXM Song Lyrics, 20NewsGroups and Reuters News datasets demonstrate the
    effectiveness and efficiency of these neural topic models.
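
    The stick-breaking construction mentioned above turns a sequence of
    fractions in (0, 1) (e.g., sigmoid outputs of the recurrent network) into
    topic proportions, so the number of topics need not be fixed in advance:

        import numpy as np

        def stick_breaking(fractions):
            # pi_k = f_k * prod_{j<k} (1 - f_j): each fraction breaks off a
            # piece of the remaining stick.
            remaining, probs = 1.0, []
            for f in fractions:
                probs.append(f * remaining)
                remaining *= 1.0 - f
            return np.array(probs)

        print(stick_breaking([0.5, 0.5, 0.5]))  # [0.5, 0.25, 0.125]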

    One button machine for automating feature engineering in relational databases

    Hoang Thanh Lam, Johann-Michael Thiebaut, Mathieu Sinn, Bei Chen, Tiep Mai, Oznur Alkan
    Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)

    Feature engineering is one of the most important and time consuming tasks in
    predictive analytics projects. It involves understanding domain knowledge and
    data exploration to discover relevant hand-crafted features from raw data. In
    this paper, we introduce a system called One Button Machine, or OneBM for
    short, which automates feature discovery in relational databases. OneBM
    automatically performs a key activity of data scientists, namely, joining of
    database tables and applying advanced data transformations to extract useful
    features from data. We validated OneBM in Kaggle competitions, in which it
    achieved performance as good as the top 16% to 24% of data scientists in
    three Kaggle competitions. More importantly, OneBM outperformed the
    state-of-the-art system in a Kaggle competition in terms of prediction
    accuracy and ranking on the Kaggle leaderboard. The results show that OneBM
    can be useful for both data scientists and non-experts. It helps data
    scientists reduce data exploration time, allowing them to trial many ideas
    in a short time. On the other hand, it enables non-experts, who are not
    familiar with data science, to quickly extract value from their data with
    little effort, time and cost.
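
    The "join then transform" activity OneBM automates corresponds to feature
    engineering steps like the following pandas sketch (table and column names
    are invented for illustration; OneBM discovers such features
    automatically):

        import pandas as pd

        orders = pd.DataFrame({"user_id": [1, 1, 2],
                               "amount": [10.0, 20.0, 5.0]})
        users = pd.DataFrame({"user_id": [1, 2]})

        # Follow the foreign key, then aggregate into per-user features.
        feats = (orders.groupby("user_id")["amount"]
                 .agg(["count", "mean", "max"]).reset_index())
        users = users.merge(feats, on="user_id", how="left")
        print(users)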

    Teaching Machines to Describe Images via Natural Language Feedback

    Huan Ling, Sanja Fidler
    Comments: 13 pages
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)

    Robots will eventually be part of every household. It is thus critical to
    enable algorithms to learn from and be guided by non-expert users. In this
    paper, we bring a human in the loop, and enable a human teacher to give
    feedback to a learning agent in the form of natural language. We argue that a
    descriptive sentence can provide a much stronger learning signal than a numeric
    reward in that it can easily point to where the mistakes are and how to correct
    them. We focus on the problem of image captioning in which the quality of the
    output can easily be judged by non-experts. We propose a hierarchical
    phrase-based captioning model trained with policy gradients, and design a
    feedback network that provides reward to the learner by conditioning on the
    human-provided feedback. We show that by exploiting descriptive feedback our
    model learns to perform better than when given independently written human
    captions.

    Free energy-based reinforcement learning using a quantum processor

    Anna Levit, Daniel Crawford, Navid Ghadermarzy, Jaspreet S. Oberoi, Ehsan Zahedinejad, Pooya Ronagh
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC); Quantum Physics (quant-ph)

    Recent theoretical and experimental results suggest the possibility of using
    current and near-future quantum hardware in challenging sampling tasks. In this
    paper, we introduce free energy-based reinforcement learning (FERL) as an
    application of quantum hardware. We propose a method for processing a quantum
    annealer’s measured qubit spin configurations in approximating the free energy
    of a quantum Boltzmann machine (QBM). We then apply this method to perform
    reinforcement learning on the grid-world problem using the D-Wave 2000Q quantum
    annealer. The experimental results show that our technique is a promising
    method for harnessing the power of quantum sampling in reinforcement learning
    tasks.

    The Sample Complexity of Online One-Class Collaborative Filtering

    Reinhard Heckel, Kannan Ramchandran
    Comments: ICML 2017
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)

    We consider the online one-class collaborative filtering (CF) problem that
    consists of recommending items to users over time in an online fashion based on
    positive ratings only. This problem arises when users respond only occasionally
    to a recommendation with a positive rating, and never with a negative one. We
    study the impact of the probability of a user responding to a recommendation,
    p_f, on the sample complexity, i.e., the number of ratings required to make
    'good' recommendations, and ask whether receiving positive and negative
    ratings, instead of positive ratings only, improves the sample complexity. Both
    questions arise in the design of recommender systems. We introduce a simple
    probabilistic user model, and analyze the performance of an online user-based
    CF algorithm. We prove that after an initial cold start phase, where
    recommendations are invested in exploring the user’s preferences, this
    algorithm makes—up to a fraction of the recommendations required for updating
    the user’s preferences—perfect recommendations. The number of ratings
    required for the cold start phase is nearly proportional to 1/p_f, and that for
    updating the user's preferences is essentially independent of p_f. As a
    consequence, we find that receiving positive and negative ratings instead of
    only positive ones improves the number of ratings required for initial
    exploration by a factor of 1/p_f, which can be significant.

    A Learning Based Optimal Human Robot Collaboration with Linear Temporal Logic Constraints

    Bo Wu, Bin Hu, Hai Lin
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

    This paper considers an optimal task allocation problem for human robot
    collaboration in human robot systems with persistent tasks. Such human robot
    systems consist of human operators and intelligent robots collaborating with
    each other to accomplish complex tasks that cannot be done by either party
    alone. The system objective is to maximize the probability of successfully
    executing persistent tasks that are formulated as linear temporal logic
    specifications and minimize the average cost between consecutive visits of a
    particular proposition. This paper proposes to model the human robot
    collaboration under a framework composed of multiple Markov Decision
    Processes (MDPs) with possibly unknown transition probabilities, which
    characterize how human cognitive states, such as human trust and fatigue,
    stochastically change with the robot performance. Under the unknown MDP
    models, an algorithm is developed to learn the model and obtain an optimal
    task allocation policy that minimizes the expected average cost for each
    task cycle and maximizes the probability of satisfying the linear temporal
    logic constraints. Moreover, this paper shows that the difference between
    the optimal policy based on the learned model and that based on the
    underlying ground truth model can be bounded by an arbitrarily small
    constant, with a high confidence level, given sufficient samples. The case
    study of an assembly process demonstrates the
    effectiveness and benefits of our proposed learning based human robot
    collaboration.


    Information Retrieval

    Item-Item Music Recommendations With Side Information

    Özgür Demir, Alexey Rodriguez Yakushev, Rany Keddo, Ursula Kallio
    Subjects: Information Retrieval (cs.IR)

    Online music services have tens of millions of tracks. The content itself is
    broad and covers various musical genres as well as non-musical audio content
    such as radio plays and podcasts. The sheer scale and diversity of content
    makes it difficult for a user to find relevant tracks. Relevant recommendations
    are therefore crucial for a good user experience. Here we present a method to
    compute track-track similarities using collaborative filtering signals with
    side information. On a data set from music streaming service SoundCloud, the
    method here outperforms the widely adopted implicit matrix factorization
    technique. The implementation of our method is open sourced and can be applied
    to related item-item recommendation tasks with side information.

    Discovering Discrete Latent Topics with Neural Variational Inference

    Yishu Miao, Edward Grefenstette, Phil Blunsom
    Comments: ICML 2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Learning (cs.LG)

    Topic models have been widely explored as probabilistic generative models of
    documents. Traditional inference methods have sought closed-form derivations
    for updating the models, however as the expressiveness of these models grows,
    so does the difficulty of performing fast and accurate inference over their
    parameters. This paper presents alternative neural approaches to topic
    modelling by providing parameterisable distributions over topics which permit
    training by backpropagation in the framework of neural variational inference.
    In addition, with the help of a stick-breaking construction, we propose a
    recurrent network that is able to discover a notionally unbounded number of
    topics, analogous to Bayesian non-parametric topic models. Experimental results
    on the MXM Song Lyrics, 20NewsGroups and Reuters News datasets demonstrate the
    effectiveness and efficiency of these neural topic models.

    Deep Learning for Hate Speech Detection in Tweets

    Pinkesh Badjatiya, Shashank Gupta, Manish Gupta, Vasudeva Varma
    Comments: In Proceedings of ACM WWW’17 Companion, Perth, Western Australia, Apr 2017 (WWW’17), 2 pages
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Hate speech detection on Twitter is critical for applications like
    controversial event extraction, building AI chatterbots, content
    recommendation, and sentiment analysis. We define this task as being able to
    classify a tweet as racist, sexist or neither. The complexity of the natural
    language constructs makes this task very challenging. We perform extensive
    experiments with multiple deep learning architectures to learn semantic word
    embeddings to handle this complexity. Our experiments on a benchmark dataset of
    16K annotated tweets show that such deep learning methods outperform
    state-of-the-art char/word n-gram methods by ~18 F1 points.

    Network Capacity Bound for Personalized PageRank in Multimodal Networks

    M.A. Kłopotek, S.T. Wierzchoń, R.A. Kłopotek
    Comments: 28 pages. arXiv admin note: text overlap with arXiv:1702.03734
    Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR); Physics and Society (physics.soc-ph)

    In a former paper, the concept of Bipartite PageRank was introduced and a
    theorem on the limit of authority flowing between nodes for personalized
    PageRank was generalized. In this paper we extend those results to
    multimodal networks. In particular, we introduce a hypergraph type that may
    be used for describing multimodal networks in which a hyperlink connects
    nodes from each of the modalities. We introduce a generalisation of PageRank
    for such graphs and define the respective random walk model that can be used
    for computations. We finally state and prove theorems on the limit of the
    outflow of authority for the cases where individual modalities have
    identical and distinct damping factors.
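
    For reference, the classical personalized PageRank iteration that these
    results generalize to the multimodal setting is the damped power iteration
    below (standard formulation, not the paper's hypergraph variant):

        import numpy as np

        def personalized_pagerank(A, v, d=0.85, n_iter=100):
            # A: adjacency matrix; v: personalization distribution.
            P = A / A.sum(axis=0, keepdims=True)  # column-stochastic
            p = v.copy()
            for _ in range(n_iter):
                p = d * (P @ p) + (1.0 - d) * v   # damped authority flow
            return p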


    Computation and Language

    Morph-fitting: Fine-Tuning Word Vector Spaces with Simple Language-Specific Rules

    Ivan Vulić, Nikola Mrkšić, Roi Reichart, Diarmuid Ó Séaghdha, Steve Young, Anna Korhonen
    Comments: ACL 2017 (Long paper)
    Subjects: Computation and Language (cs.CL)

    Morphologically rich languages accentuate two properties of distributional
    vector space models: 1) the difficulty of inducing accurate representations for
    low-frequency word forms; and 2) insensitivity to distinct lexical relations
    that have similar distributional signatures. These effects are detrimental for
    language understanding systems, which may infer that ‘inexpensive’ is a
    rephrasing for ‘expensive’ or may not associate ‘acquire’ with ‘acquires’. In
    this work, we propose a novel morph-fitting procedure which moves past the use
    of curated semantic lexicons for improving distributional vector spaces.
    Instead, our method injects morphological constraints generated using simple
    language-specific rules, pulling inflectional forms of the same word close
    together and pushing derivational antonyms far apart. In intrinsic evaluation
    over four languages, we show that our approach: 1) improves low-frequency word
    estimates; and 2) boosts the semantic quality of the entire word vector
    collection. Finally, we show that morph-fitted vectors yield large gains in the
    downstream task of dialogue state tracking, highlighting the importance of
    morphology for tackling long-tail phenomena in language understanding tasks.

    Semantic Specialisation of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints

    Nikola Mrkšić, Ivan Vulić, Diarmuid Ó Séaghdha, Ira Leviant, Roi Reichart, Milica Gašić, Anna Korhonen, Steve Young
    Comments: Accepted for publication at TACL (to be presented at EMNLP 2017)
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We present Attract-Repel, an algorithm for improving the semantic quality of
    word vectors by injecting constraints extracted from lexical resources.
    Attract-Repel facilitates the use of constraints from mono- and cross-lingual
    resources, yielding semantically specialised cross-lingual vector spaces. Our
    evaluation shows that the method can make use of existing cross-lingual
    lexicons to construct high-quality vector spaces for a plethora of different
    languages, facilitating semantic transfer from high- to lower-resource ones.
    The effectiveness of our approach is demonstrated with state-of-the-art results
    on semantic similarity datasets in six languages. We next show that
    Attract-Repel-specialised vectors boost performance in the downstream task of
    dialogue state tracking (DST) across multiple languages. Finally, we show that
    cross-lingual vector spaces produced by our algorithm facilitate the training
    of multilingual DST models, which brings further performance improvements.

    Discovering Discrete Latent Topics with Neural Variational Inference

    Yishu Miao, Edward Grefenstette, Phil Blunsom
    Comments: ICML 2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Learning (cs.LG)

    Topic models have been widely explored as probabilistic generative models of
    documents. Traditional inference methods have sought closed-form derivations
    for updating the models, however as the expressiveness of these models grows,
    so does the difficulty of performing fast and accurate inference over their
    parameters. This paper presents alternative neural approaches to topic
    modelling by providing parameterisable distributions over topics which permit
    training by backpropagation in the framework of neural variational inference.
    In addition, with the help of a stick-breaking construction, we propose a
    recurrent network that is able to discover a notionally unbounded number of
    topics, analogous to Bayesian non-parametric topic models. Experimental results
    on the MXM Song Lyrics, 20NewsGroups and Reuters News datasets demonstrate the
    effectiveness and efficiency of these neural topic models.

    Using of heterogeneous corpora for training of an ASR system

    Jan Trmal, Gaurav Kumar, Vimal Manohar, Sanjeev Khudanpur, Matt Post, Paul McNamee
    Subjects: Computation and Language (cs.CL)

    The paper summarizes the development of the LVCSR system built as a part of
    the Pashto speech-translation system at the SCALE (Summer Camp for Applied
    Language Exploration) 2015 workshop on “Speech-to-text-translation for
    low-resource languages”. The Pashto language was chosen as a good “proxy”
    low-resource language, exhibiting multiple phenomena which make the
    development of speech-recognition and speech-to-text-translation systems
    hard.

    Even when the amount of data is seemingly sufficient, given the fact that the
    data originates from multiple sources, the preliminary experiments reveal that
    there is little to no benefit in merging (concatenating) the corpora and more
    elaborate ways of making use of all of the data must be worked out.

    This paper concentrates only on the LVCSR part and presents a range of
    different techniques that were found useful for benefiting from multiple
    different corpora.

    Polish Read Speech Corpus for Speech Tools and Services

    Danijel Koržinek, Krzysztof Marasek, Łukasz Brocki, Krzysztof Wołk
    Subjects: Computation and Language (cs.CL)

    This paper describes the speech processing activities conducted at the Polish
    consortium of the CLARIN project. The purpose of this segment of the project
    was to develop specific tools that would allow for automatic and semi-automatic
    processing of large quantities of acoustic speech data. The tools include the
    following: grapheme-to-phoneme conversion, speech-to-text alignment, voice
    activity detection, speaker diarization, keyword spotting and automatic speech
    transcription. Furthermore, in order to develop these tools, a large
    high-quality studio speech corpus was recorded and released under an open
    license, to encourage development in the area of Polish speech research.
    Another purpose of the corpus was to serve as a reference for studies in
    phonetics and pronunciation. All the tools and resources were released on
    the Polish CLARIN website. This paper discusses the current status and future
    plans for the project.

    Deep Learning for Hate Speech Detection in Tweets

    Pinkesh Badjatiya, Shashank Gupta, Manish Gupta, Vasudeva Varma
    Comments: In Proceedings of ACM WWW’17 Companion, Perth, Western Australia, Apr 2017 (WWW’17), 2 pages
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Hate speech detection on Twitter is critical for applications like
    controversial event extraction, building AI chatterbots, content
    recommendation, and sentiment analysis. We define this task as being able to
    classify a tweet as racist, sexist or neither. The complexity of the natural
    language constructs makes this task very challenging. We perform extensive
    experiments with multiple deep learning architectures to learn semantic word
    embeddings to handle this complexity. Our experiments on a benchmark dataset of
    16K annotated tweets show that such deep learning methods outperform
    state-of-the-art char/word n-gram methods by ~18 F1 points.

    Natural Language Generation for Spoken Dialogue System using RNN Encoder-Decoder Networks

    Van-Khanh Tran, Le-Minh Nguyen
    Comments: has been accepted to appear at CoNLL 2017
    Subjects: Computation and Language (cs.CL)

    Natural language generation (NLG) is a critical component in a spoken
    dialogue system. This paper presents a Recurrent Neural Network based
    Encoder-Decoder architecture, in which an LSTM-based decoder is introduced to
    select and aggregate semantic elements produced by an attention mechanism over the
    input elements, and to produce the required utterances. The proposed generator
    can be jointly trained on both sentence planning and surface realization to
    produce natural language sentences. The proposed model was extensively
    evaluated on four different NLG datasets. The experimental results showed that
    the proposed generators not only consistently outperform the previous methods
    across all the NLG domains but also show an ability to generalize from a new,
    unseen domain and learn from multi-domain datasets.

    Semantic Refinement GRU-based Neural Language Generation for Spoken Dialogue Systems

    Van-Khanh Tran, Le-Minh Nguyen
    Comments: has been accepted to appear at PACLING 2017
    Subjects: Computation and Language (cs.CL)

    Natural language generation (NLG) plays a critical role in spoken dialogue
    systems. This paper presents a new approach to NLG by using recurrent neural
    networks (RNN), in which a gating mechanism is applied before RNN computation.
    This allows the proposed model to generate appropriate sentences. The RNN-based
    generator can be learned from unaligned data by jointly training sentence
    planning and surface realization to produce natural language responses. The
    model was extensively evaluated on four different NLG domains. The results show
    that the proposed generator achieved better performance on all the NLG domains
    compared to previous generators.
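
    The abstract's "gating mechanism applied before RNN computation" can be
    pictured with a small numpy sketch: a sigmoid gate computed from the
    previous hidden state refines the dialogue-act vector before each step.
    This is our illustrative reading, not the paper's exact architecture.

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def gated_rnn_step(h_prev, x_t, d, params):
            W_g, W_h, U_h, V_d = params
            gate = sigmoid(W_g @ h_prev)    # one gate value per semantic slot
            d_refined = gate * d            # suppress already-realised slots
            h = np.tanh(W_h @ x_t + U_h @ h_prev + V_d @ d_refined)
            return h, d_refined

        H, X, S = 8, 8, 5                   # hidden, input, slot dimensions
        rng = np.random.default_rng(0)
        params = (rng.normal(size=(S, H)), rng.normal(size=(H, X)),
                  rng.normal(size=(H, H)), rng.normal(size=(H, S)))
        h, d = gated_rnn_step(np.zeros(H), rng.normal(size=X), np.ones(S), params)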

    Teaching Machines to Describe Images via Natural Language Feedback

    Huan Ling, Sanja Fidler
    Comments: 13 pages
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)

    Robots will eventually be part of every household. It is thus critical to
    enable algorithms to learn from and be guided by non-expert users. In this
    paper, we bring a human in the loop, and enable a human teacher to give
    feedback to a learning agent in the form of natural language. We argue that a
    descriptive sentence can provide a much stronger learning signal than a numeric
    reward in that it can easily point to where the mistakes are and how to correct
    them. We focus on the problem of image captioning in which the quality of the
    output can easily be judged by non-experts. We propose a hierarchical
    phrase-based captioning model trained with policy gradients, and design a
    feedback network that provides reward to the learner by conditioning on the
    human-provided feedback. We show that by exploiting descriptive feedback our
    model learns to perform better than when given independently written human
    captions.

    Transfer Learning for Speech Recognition on a Budget

    Julius Kunze, Louis Kirsch, Ilia Kurenkov, Andreas Krug, Jens Johannsmeier, Sebastian Stober
    Comments: Accepted for 2nd ACL Workshop on Representation Learning for NLP
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    End-to-end training of automated speech recognition (ASR) systems requires
    massive data and compute resources. We explore transfer learning based on model
    adaptation as an approach for training ASR models under constrained GPU memory,
    throughput and training data. We conduct several systematic experiments
    adapting a Wav2Letter convolutional neural network originally trained for
    English ASR to the German language. We show that this technique allows faster
    training on consumer-grade resources while requiring less training data in
    order to achieve the same accuracy, thereby lowering the cost of training ASR
    models in other languages. Model introspection revealed that small adaptations
    to the network’s weights were sufficient for good performance, especially for
    inner layers.
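
    A hedged PyTorch sketch of the model-adaptation idea: freeze the lower
    (inner) layers learned on English and retrain only the top of the network
    for the German grapheme inventory. The attribute `model.layers` and the
    split point are assumptions for illustration, not the paper's recipe.

        import torch.nn as nn

        def adapt_for_german(model, num_german_graphemes, freeze_up_to=8):
            for i, layer in enumerate(model.layers):
                if i < freeze_up_to:             # keep the English features
                    for p in layer.parameters():
                        p.requires_grad = False
            # Replace the output projection for the German grapheme set.
            old = model.layers[-1]
            model.layers[-1] = nn.Conv1d(old.in_channels,
                                         num_german_graphemes, kernel_size=1)
            return model

        # Train only what is left unfrozen, e.g.:
        # opt = torch.optim.Adam(p for p in model.parameters() if p.requires_grad)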

    Learning to Compute Word Embeddings on the Fly

    Dzmitry Bahdanau, Tom Bosc, Stanisław Jastrzębski, Edward Grefenstette, Pascal Vincent, Yoshua Bengio
    Subjects: Learning (cs.LG); Computation and Language (cs.CL)

    Words in natural language follow a Zipfian distribution whereby some words
    are frequent but most are rare. Learning representations for words in the “long
    tail” of this distribution requires enormous amounts of data. Representations
    of rare words trained directly on end-tasks are usually poor, requiring us to
    pre-train embeddings on external data, or treat all rare words as
    out-of-vocabulary words with a unique representation. We provide a method for
    predicting embeddings of rare words on the fly from small amounts of auxiliary
    data with a network trained against the end task. We show that this improves
    results against baselines where embeddings are trained on the end task in a
    reading comprehension task, a recognizing textual entailment task, and in
    language modelling.
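
    A minimal sketch of the on-the-fly idea, assuming the auxiliary data is a
    dictionary definition per rare word and the predictor is a single linear
    map over the mean of the definition's embeddings (the paper's network may
    differ); W would be trained jointly with the end task.

        import numpy as np

        def embed(word, emb_table, definitions, W):
            if word in emb_table:                # frequent word: use its row
                return emb_table[word]
            defn = [emb_table[w] for w in definitions.get(word, [])
                    if w in emb_table]
            if not defn:                         # no auxiliary data at all
                return emb_table["<unk>"]
            return W @ np.mean(defn, axis=0)     # predict embedding on the fly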


    Distributed, Parallel, and Cluster Computing

    A graph model of message passing processes

    Andrew M. Mironov
    Subjects: Logic in Computer Science (cs.LO); Distributed, Parallel, and Cluster Computing (cs.DC)

    In the paper we consider a graph model of message passing processes and
    present a method for the verification of message passing processes. The
    method is illustrated by an example: the verification of a sliding window
    protocol.

    Using GPI-2 for Distributed Memory Parallelization of the Caffe Toolbox to Speed up Deep Neural Network Training

    Martin Kuehn, Janis Keuper, Franz-Josef Pfreundt
    Subjects: Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

    Deep Neural Networks (DNNs) are currently of great interest in research and
    application. The training of these networks is a compute-intensive and
    time-consuming task. To reduce training times to a bearable amount at
    reasonable cost, we extend the popular Caffe toolbox for DNNs with an
    efficient distributed memory communication pattern. To achieve good
    scalability we emphasize the overlap of computation and communication and
    prefer fine-granular synchronization patterns over global barriers. To
    implement these communication patterns we rely on the Global address space
    Programming Interface version 2 (GPI-2) communication library. This
    interface provides a light-weight set of asynchronous one-sided
    communication primitives, supplemented by non-blocking fine-granular data
    synchronization mechanisms. CaffeGPI is the name of our parallel version of
    Caffe. First benchmarks demonstrate better scaling behavior compared with
    other extensions, e.g., the Intel Caffe. Even within a single symmetric
    multiprocessing machine with four graphics processing units, CaffeGPI
    scales better than the standard Caffe toolbox. These first results
    demonstrate that the use of standard High Performance Computing (HPC)
    hardware is a valid cost-saving approach to train large DNNs. I/O is
    another bottleneck for working with DNNs in a standard parallel HPC
    setting, which we will consider in more detail in a forthcoming paper.


    Learning

    Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning

    Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Bernhard Schölkopf, Sergey Levine
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

    Off-policy model-free deep reinforcement learning methods using previously
    collected data can improve sample efficiency over on-policy policy gradient
    techniques. On the other hand, on-policy algorithms are often more stable and
    easier to use. This paper examines, both theoretically and empirically,
    approaches to merging on- and off-policy updates for deep reinforcement
    learning. Theoretical results show that off-policy updates with a value
    function estimator can be interpolated with on-policy policy gradient updates
    whilst still satisfying performance bounds. Our analysis uses control variate
    methods to produce a family of policy gradient algorithms, with several
    recently proposed algorithms being special cases of this family. We then
    provide an empirical comparison of these techniques with the remaining
    algorithmic details fixed, and show how different mixing of off-policy gradient
    estimates with on-policy samples contribute to improvements in empirical
    performance. The final algorithm provides a generalization and unification of
    existing deep policy gradient techniques, has theoretical guarantees on the
    bias introduced by off-policy updates, and improves on the state-of-the-art
    model-free deep RL methods on a number of OpenAI Gym continuous control
    benchmarks.

    Transfer Learning for Speech Recognition on a Budget

    Julius Kunze, Louis Kirsch, Ilia Kurenkov, Andreas Krug, Jens Johannsmeier, Sebastian Stober
    Comments: Accepted for 2nd ACL Workshop on Representation Learning for NLP
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    End-to-end training of automated speech recognition (ASR) systems requires
    massive data and compute resources. We explore transfer learning based on model
    adaptation as an approach for training ASR models under constrained GPU memory,
    throughput and training data. We conduct several systematic experiments
    adapting a Wav2Letter convolutional neural network originally trained for
    English ASR to the German language. We show that this technique allows faster
    training on consumer-grade resources while requiring less training data in
    order to achieve the same accuracy, thereby lowering the cost of training ASR
    models in other languages. Model introspection revealed that small adaptations
    to the network’s weights were sufficient for good performance, especially for
    inner layers.

    Learning to Compute Word Embeddings on the Fly

    Dzmitry Bahdanau, Tom Bosc, Stanisław Jastrzębski, Edward Grefenstette, Pascal Vincent, Yoshua Bengio
    Subjects: Learning (cs.LG); Computation and Language (cs.CL)

    Words in natural language follow a Zipfian distribution whereby some words
    are frequent but most are rare. Learning representations for words in the “long
    tail” of this distribution requires enormous amounts of data. Representations
    of rare words trained directly on end-tasks are usually poor, requiring us to
    pre-train embeddings on external data, or treat all rare words as
    out-of-vocabulary words with a unique representation. We provide a method for
    predicting embeddings of rare words on the fly from small amounts of auxiliary
    data with a network trained against the end task. We show that this improves
    results against baselines where embeddings are trained on the end task in a
    reading comprehension task, a recognizing textual entailment task, and in
    language modelling.

    Krylov Subspace Recycling for Fast Iterative Least-Squares in Machine Learning

    Filip de Roos, Philipp Hennig
    Subjects: Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)

    Solving symmetric positive definite linear problems is a fundamental
    computational task in machine learning. The exact solution, famously, is
    cubically expensive in the size of the matrix. To alleviate this problem, several
    linear-time approximations, such as spectral and inducing-point methods, have
    been suggested and are now in wide use. These are low-rank approximations that
    choose the low-rank space a priori and do not refine it over time. While this
    allows linear cost in the data-set size, it also causes a finite, uncorrected
    approximation error. Authors from numerical linear algebra have explored ways
    to iteratively refine such low-rank approximations, at a cost of a small number
    of matrix-vector multiplications. This idea is particularly interesting in the
    many situations in machine learning where one has to solve a sequence of
    related symmetric positive definite linear problems. From the machine learning
    perspective, such deflation methods can be interpreted as transfer learning of
    a low-rank approximation across a time-series of numerical tasks. We study the
    use of such methods for our field. Our empirical results show that, on
    regression and classification problems of intermediate size, this approach can
    interpolate between low computational cost and numerical precision.
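
    To make the setting concrete: warm-starting a conjugate-gradient solver
    across a sequence of related SPD systems is a much simpler cousin of the
    Krylov subspace recycling studied here (recycling reuses an entire
    deflation subspace, not just the previous solution); the sketch below only
    illustrates the "sequence of related problems" scenario.

        import numpy as np
        from scipy.sparse.linalg import cg

        rng = np.random.default_rng(0)
        B = rng.normal(size=(200, 200))
        A = B @ B.T + 200 * np.eye(200)       # symmetric positive definite
        b = rng.normal(size=200)

        x = np.zeros(200)
        for eps in [1.0, 0.9, 0.8, 0.7]:      # e.g., a regularisation path
            x, info = cg(A + eps * np.eye(200), b, x0=x)  # reuse last solution
            assert info == 0                  # converged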

    Subjective fairness: Fairness is in the eye of the beholder

    Christos Dimitrakakis, Yang Liu, David Parkes, Goran Radanovic
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We analyze different notions of fairness in decision making when the
    underlying model is not known with certainty. We argue that recent notions of
    fairness in machine learning need to be modified to incorporate uncertainties
    about model parameters. We introduce the notion of subjective fairness as
    a suitable candidate for fair Bayesian decision making rules, relate this
    definition with existing ones, and experimentally demonstrate the inherent
    accuracy-fairness tradeoff under this definition.
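
    A toy numeric illustration of the gap the abstract points at, under our
    own stand-in definitions: a decision rule can look fair under a point
    estimate of the model parameters yet unfair in expectation over the
    posterior (the paper's formal definition of subjective fairness differs).

        import numpy as np

        rng = np.random.default_rng(0)
        post_g0 = rng.beta(8, 2, size=1000)   # posterior qualification rate, group 0
        post_g1 = rng.beta(2, 2, size=1000)   # wider posterior, group 1
        accept_rate = 0.6                     # rule accepts 60% of each group

        def violation(q0, q1):
            # gap in P(accept | qualified) if acceptances go to qualified first
            return abs(min(1.0, accept_rate / q0) - min(1.0, accept_rate / q1))

        point = violation(post_g0.mean(), post_g1.mean())
        subjective = np.mean([violation(a, b)
                              for a, b in zip(post_g0, post_g1)])
        print(point, subjective)              # the two scores differ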

    Using GPI-2 for Distributed Memory Parallelization of the Caffe Toolbox to Speed up Deep Neural Network Training

    Martin Kuehn, Janis Keuper, Franz-Josef Pfreundt
    Subjects: Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

    Deep Neural Networks (DNNs) are currently of great interest in research and
    application. The training of these networks is a compute-intensive and
    time-consuming task. To reduce training times to a bearable amount at
    reasonable cost, we extend the popular Caffe toolbox for DNNs with an
    efficient distributed memory communication pattern. To achieve good
    scalability we emphasize the overlap of computation and communication and
    prefer fine-granular synchronization patterns over global barriers. To
    implement these communication patterns we rely on the Global address space
    Programming Interface version 2 (GPI-2) communication library. This
    interface provides a light-weight set of asynchronous one-sided
    communication primitives, supplemented by non-blocking fine-granular data
    synchronization mechanisms. CaffeGPI is the name of our parallel version of
    Caffe. First benchmarks demonstrate better scaling behavior compared with
    other extensions, e.g., the Intel Caffe. Even within a single symmetric
    multiprocessing machine with four graphics processing units, CaffeGPI
    scales better than the standard Caffe toolbox. These first results
    demonstrate that the use of standard High Performance Computing (HPC)
    hardware is a valid cost-saving approach to train large DNNs. I/O is
    another bottleneck for working with DNNs in a standard parallel HPC
    setting, which we will consider in more detail in a forthcoming paper.

    Free energy-based reinforcement learning using a quantum processor

    Anna Levit, Daniel Crawford, Navid Ghadermarzy, Jaspreet S. Oberoi, Ehsan Zahedinejad, Pooya Ronagh
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC); Quantum Physics (quant-ph)

    Recent theoretical and experimental results suggest the possibility of using
    current and near-future quantum hardware in challenging sampling tasks. In this
    paper, we introduce free energy-based reinforcement learning (FERL) as an
    application of quantum hardware. We propose a method for processing a quantum
    annealer’s measured qubit spin configurations in approximating the free energy
    of a quantum Boltzmann machine (QBM). We then apply this method to perform
    reinforcement learning on the grid-world problem using the D-Wave 2000Q quantum
    annealer. The experimental results show that our technique is a promising
    method for harnessing the power of quantum sampling in reinforcement learning
    tasks.

    The Sample Complexity of Online One-Class Collaborative Filtering

    Reinhard Heckel, Kannan Ramchandran
    Comments: ICML 2017
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)

    We consider the online one-class collaborative filtering (CF) problem that
    consists of recommending items to users over time in an online fashion based on
    positive ratings only. This problem arises when users respond only occasionally
    to a recommendation with a positive rating, and never with a negative one. We
    study the impact of the probability of a user responding to a recommendation,
    p_f, on the sample complexity, i.e., the number of ratings required to make
    `good’ recommendations, and ask whether receiving positive and negative
    ratings, instead of positive ratings only, improves the sample complexity. Both
    questions arise in the design of recommender systems. We introduce a simple
    probabilistic user model, and analyze the performance of an online user-based
    CF algorithm. We prove that after an initial cold start phase, where
    recommendations are invested in exploring the user’s preferences, this
    algorithm makes—up to a fraction of the recommendations required for updating
    the user’s preferences—perfect recommendations. The number of ratings
    required for the cold start phase is nearly proportional to 1/p_f, and that for
    updating the user’s preferences is essentially independent of p_f. As a
    consequence, we find that receiving positive and negative ratings instead of
    only positive ones improves the number of ratings required for initial
    exploration by a factor of 1/p_f, which can be significant.

    Learning Time-Efficient Deep Architectures with Budgeted Super Networks

    Tom Veniat, Ludovic Denoyer
    Subjects: Learning (cs.LG)

    Learning neural network architectures is a way to discover new highly
    predictive models. We propose to focus on this problem from a different
    perspective where the goal is to discover architectures efficient in terms of
    both prediction quality and computation cost, e.g., time in milliseconds or
    number of operations. For instance, our approach is able to solve the following
    task: find the best neural network architecture (in a very large set of
    possible architectures) able to predict well in less than 100 milliseconds on
    my mobile phone.

    Our contribution is based on a new family of models called Budgeted Super
    Networks that are learned using reinforcement-learning inspired techniques
    applied to a budgeted learning objective function which includes the
    computation cost during disk/memory operations at inference. We present a set
    of experiments on computer vision problems and show the ability of our method
    to discover efficient architectures in terms of both predictive quality and
    computation time.
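
    The shape of a budgeted objective can be sketched in a few lines: trade
    prediction loss against a computation cost with a hinge at the budget. The
    penalty form is ours for illustration; the paper learns architectures with
    reinforcement-learning inspired techniques on top of such an objective.

        def budgeted_objective(pred_loss, cost_ms, budget_ms, penalty=10.0):
            over = max(0.0, cost_ms - budget_ms)   # pay only above the budget
            return pred_loss + penalty * over

        print(budgeted_objective(0.08, 120.0, budget_ms=100.0))  # too slow
        print(budgeted_objective(0.11, 60.0, budget_ms=100.0))   # within budget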

    Biased Importance Sampling for Deep Neural Network Training

    Angelos Katharopoulos, François Fleuret
    Subjects: Learning (cs.LG)

    Importance sampling has been successfully used to accelerate stochastic
    optimization in many convex problems. However, the lack of an efficient way to
    calculate the importance still hinders its application to Deep Learning. In
    this paper, we show that the loss value can be used as an alternative
    importance metric, and propose a way to efficiently approximate it for a deep
    model, using a small model trained for that purpose in parallel.

    In particular, this method allows us to utilize a biased gradient estimate
    that implicitly optimizes a soft max-loss, and leads to better
    generalization performance. While such a method suffers from a
    prohibitively high variance of
    the gradient estimate when using a standard stochastic optimizer, we show that
    when it is combined with our sampling mechanism, it results in a reliable
    procedure.

    We showcase the generality of our method by testing it on both image
    classification and language modeling tasks using deep convolutional and
    recurrent neural networks. In particular, in the case of CIFAR10 we reach 10%
    classification error 50 epochs faster than when using uniform sampling.
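
    The sampling step itself is compact; a hedged numpy sketch, with the
    per-example losses standing in for the small model's predictions. The
    weights shown would make the estimate unbiased, whereas the paper
    deliberately studies a biased variant that drops them.

        import numpy as np

        def sample_batch_by_loss(predicted_losses, batch_size, rng):
            p = predicted_losses / predicted_losses.sum()
            idx = rng.choice(len(p), size=batch_size, replace=True, p=p)
            weights = 1.0 / (len(p) * p[idx])     # unbiasing weights
            return idx, weights

        rng = np.random.default_rng(0)
        losses = np.array([0.1, 2.0, 0.5, 3.0, 0.2])  # hard examples dominate
        idx, w = sample_batch_by_loss(losses, batch_size=3, rng=rng)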

    Toward Robustness against Label Noise in Training Deep Discriminative Neural Networks

    Arash Vahdat
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Collecting large training datasets, annotated with high quality labels, is a
    costly process. This paper proposes a novel framework for training deep
    convolutional neural networks from noisy labeled datasets. The problem is
    formulated using an undirected graphical model that represents the relationship
    between noisy and clean labels, trained in a semi-supervised setting. In the
    proposed structure, the inference over latent clean labels is tractable and is
    regularized during training using auxiliary sources of information. The
    proposed model is applied to the image labeling problem and is shown to be
    effective in labeling unseen images as well as reducing label noise in training
    on CIFAR-10 and MS COCO datasets.

    Learning Disentangled Representations with Semi-Supervised Deep Generative Models

    N. Siddharth, Brooks Paige, Jan-Willem Van de Meent, Alban Desmaison, Frank Wood, Noah D. Goodman, Pushmeet Kohli, Philip H.S. Torr
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Variational autoencoders (VAEs) learn representations of data by jointly
    training a probabilistic encoder and decoder network. Typically these models
    encode all features of the data into a single variable. Here we are interested
    in learning disentangled representations that encode distinct aspects of the
    data into separate variables. We propose to learn such representations using
    model architectures that generalize from standard VAEs, employing a general
    graphical model structure in the encoder and decoder. This allows us to train
    partially-specified models that make relatively strong assumptions about a
    subset of interpretable variables and rely on the flexibility of neural
    networks to learn representations for the remaining variables. We further
    define a general objective for semi-supervised learning in this model class,
    which can be approximated using an importance sampling procedure. We evaluate
    our framework’s ability to learn disentangled representations, both by
    qualitative exploration of its generative capacity, and quantitative evaluation
    of its discriminative ability on a variety of models and datasets.

    Semantic Specialisation of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints

    Nikola Mrkšić, Ivan Vulić, Diarmuid Ó Séaghdha, Ira Leviant, Roi Reichart, Milica Gašić, Anna Korhonen, Steve Young
    Comments: Accepted for publication at TACL (to be presented at EMNLP 2017)
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We present Attract-Repel, an algorithm for improving the semantic quality of
    word vectors by injecting constraints extracted from lexical resources.
    Attract-Repel facilitates the use of constraints from mono- and cross-lingual
    resources, yielding semantically specialised cross-lingual vector spaces. Our
    evaluation shows that the method can make use of existing cross-lingual
    lexicons to construct high-quality vector spaces for a plethora of different
    languages, facilitating semantic transfer from high- to lower-resource ones.
    The effectiveness of our approach is demonstrated with state-of-the-art results
    on semantic similarity datasets in six languages. We next show that
    Attract-Repel-specialised vectors boost performance in the downstream task of
    dialogue state tracking (DST) across multiple languages. Finally, we show that
    cross-lingual vector spaces produced by our algorithm facilitate the training
    of multilingual DST models, which brings further performance improvements.

    Discovering Discrete Latent Topics with Neural Variational Inference

    Yishu Miao, Edward Grefenstette, Phil Blunsom
    Comments: ICML 2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Learning (cs.LG)

    Topic models have been widely explored as probabilistic generative models of
    documents. Traditional inference methods have sought closed-form derivations
    for updating the models; however, as the expressiveness of these models grows,
    so does the difficulty of performing fast and accurate inference over their
    parameters. This paper presents alternative neural approaches to topic
    modelling by providing parameterisable distributions over topics which permit
    training by backpropagation in the framework of neural variational inference.
    In addition, with the help of a stick-breaking construction, we propose a
    recurrent network that is able to discover a notionally unbounded number of
    topics, analogous to Bayesian non-parametric topic models. Experimental results
    on the MXM Song Lyrics, 20NewsGroups and Reuters News datasets demonstrate the
    effectiveness and efficiency of these neural topic models.

    Stable recovery of the factors from a deep matrix product and application to convolutional networks. Focus on sparsity constraints

    Francois Malgouyres (IMT), Joseph Landsberg (TAMU)
    Comments: arXiv admin note: substantial text overlap with arXiv:1703.08044
    Subjects: Optimization and Control (math.OC); Learning (cs.LG); Statistics Theory (math.ST)

    We study a deep matrix factorization problem. It takes as input a matrix X
    obtained by multiplying K matrices (called factors). Each factor is obtained by
    applying a fixed linear operator to a vector of parameters satisfying a
    sparsity constraint. We provide sharp conditions on the structure of the model
    that guarantee the stable recovery of the factors from the knowledge of X and
    the model for the factors. This is crucial in order to interpret the factors
    and the intermediate features obtained when applying a few factors to a datum.
    When K = 1, the paper provides compressed sensing statements; K = 2 covers
    (for instance) non-negative matrix factorization, dictionary learning,
    low-rank approximation, and phase recovery. The particularity of this paper
    is to extend the
    study to deep problems. As an illustration, we detail the analysis and provide
    (entirely computable) guarantees for the stable recovery of a (non-neural)
    sparse convolutional network.

    Discriminative k-shot learning using probabilistic models

    Matthias Bauer, Mateo Rojas-Carulla, Jakub Bartłomiej Świątkowski, Bernhard Schölkopf, Richard E. Turner
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    This paper introduces a probabilistic framework for k-shot image
    classification. The goal is to generalise from an initial large-scale
    classification task to a separate task comprising new classes and small numbers
    of examples. The new approach not only leverages the feature-based
    representation learned by a neural network from the initial task
    (representational transfer), but also information about the form of the classes
    (concept transfer). The concept information is encapsulated in a probabilistic
    model for the final layer weights of the neural network which then acts as a
    prior when probabilistic k-shot learning is performed. Surprisingly, simple
    probabilistic models and inference schemes outperform many existing k-shot
    learning approaches and compare favourably with the state-of-the-art method in
    terms of error-rate. The new probabilistic methods are also able to accurately
    model uncertainty, leading to well calibrated classifiers, and they are easily
    extensible and flexible, unlike many recent approaches to k-shot learning.

    Supervised Quantile Normalisation

    Marine Le Morvan (CBIO), Jean-Philippe Vert (DMA, CBIO)
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Quantitative Methods (q-bio.QM)

    Quantile normalisation is a popular normalisation method for data subject to
    unwanted variations such as images, speech, or genomic data. It applies a
    monotonic transformation to the feature values of each sample to ensure that
    after normalisation, they follow the same target distribution for each sample.
    Choosing a “good” target distribution remains however largely empirical and
    heuristic, and is usually done independently of the subsequent analysis of
    normalised data. We propose instead to couple the quantile normalisation step
    with the subsequent analysis, and to optimise the target distribution jointly
    with the other parameters in the analysis. We illustrate this principle on the
    problem of estimating a linear model over normalised data, and show that it
    leads to a particular low-rank matrix regression problem that can be solved
    efficiently. We illustrate the potential of our method, which we term SUQUAN,
    on simulated data, images and genomic data, where it outperforms standard
    quantile normalisation.
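
    The normalisation step itself is easy to state in code; SUQUAN's
    contribution is learning the target distribution jointly with the model,
    which the sketch below does not do.

        import numpy as np

        def quantile_normalise(X, target):
            # Within each row, replace sorted values by the sorted target.
            X = np.asarray(X, dtype=float)
            target = np.sort(np.asarray(target, dtype=float))
            ranks = np.argsort(np.argsort(X, axis=1), axis=1)
            return target[ranks]

        X = np.array([[5.0, 1.0, 3.0], [2.0, 9.0, 4.0]])
        print(quantile_normalise(X, target=[0.0, 0.5, 1.0]))
        # [[1.  0.  0.5]          every row now follows the same
        #  [0.  1.  0.5]]         target distribution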

    Cross-modal Common Representation Learning by Hybrid Transfer Network

    Xin Huang, Yuxin Peng, Mingkuan Yuan
    Comments: To appear in the proceedings of 26th International Joint Conference on Artificial Intelligence (IJCAI), Melbourne, Australia, Aug. 19-25, 2017
    Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    DNN-based cross-modal retrieval is a research hotspot for retrieving across
    different modalities, such as image and text, but existing methods often
    face the challenge of insufficient cross-modal training data. In the
    single-modal scenario, a similar problem is usually relieved by
    transferring knowledge from large-scale auxiliary datasets (such as
    ImageNet). Knowledge from such single-modal datasets is also very useful
    for cross-modal retrieval, as it can provide rich general semantic
    information that can be shared across different modalities. However, it is
    challenging to transfer useful knowledge from a single-modal (e.g., image)
    source domain to a cross-modal (e.g., image/text) target domain. Knowledge
    in the source domain cannot be directly transferred to the two different
    modalities in the target domain, and the inherent cross-modal correlation
    contained in the target domain provides key hints for cross-modal retrieval
    that should be preserved during the transfer process. This paper proposes
    the Cross-modal Hybrid Transfer Network (CHTN) with two subnetworks: a
    modal-sharing transfer subnetwork utilizes the modality shared by the
    source and target domains as a bridge to transfer knowledge to both
    modalities simultaneously; a layer-sharing correlation subnetwork preserves
    the inherent cross-modal semantic correlation to further adapt to the
    cross-modal retrieval task. Cross-modal data can be converted to a common
    representation by CHTN for retrieval, and comprehensive experiments on
    three datasets show its effectiveness.

    Scalable Generalized Linear Bandits: Online Computation and Hashing

    Kwang-Sung Jun, Aniruddha Bhargava, Robert Nowak, Rebecca Willett
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Generalized Linear Bandits (GLBs), a natural extension of the stochastic
    linear bandits, have been popular and successful in recent years. However,
    existing GLBs scale poorly with the number of rounds and the number of arms,
    limiting their utility in practice. This paper proposes new, scalable solutions
    to the GLB problem in two respects. First, unlike existing GLBs, whose
    per-time-step space and time complexity grow at least linearly with time (t),
    we propose a new algorithm that performs online computations to enjoy a
    constant space and time complexity. At its heart is a novel Generalized Linear
    extension of the Online-to-confidence-set Conversion (GLOC method) that takes
    any online learning algorithm and turns it into a GLB algorithm. As a
    special case, we apply GLOC to the online Newton step algorithm, which results
    in a low-regret GLB algorithm with much lower time and memory complexity than
    prior work. Second, for the case where the number (N) of arms is very large, we
    propose new algorithms in which each next arm is selected via an inner product
    search. Such methods can be implemented via hashing algorithms (i.e.,
    “hash-amenable”) and result in a time complexity sublinear in (N). While a
    Thompson sampling extension of GLOC is hash-amenable, its regret bound for
    (d)-dimensional arm sets scales with (d^{3/2}), whereas GLOC’s regret bound is
    linear in (d). Towards closing this gap, we propose a new hash-amenable
    algorithm whose regret bound scales with (d^{5/4}). Finally, we propose a fast
    approximate hash-key computation (inner product) that has a better accuracy
    than the state-of-the-art, which can be of independent interest. We conclude
    the paper with preliminary experimental results confirming the merits of our
    methods.

    Lower Bounds on Regret for Noisy Gaussian Process Bandit Optimization

    Jonathan Scarlett, Ilijia Bogunovic, Volkan Cevher
    Comments: Accepted to COLT 2017
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    In this paper, we consider the problem of sequentially optimizing a black-box
    function (f) based on noisy samples and bandit feedback. We assume that (f) is
    smooth in the sense of having a bounded norm in some reproducing kernel Hilbert
    space (RKHS), yielding a commonly-considered non-Bayesian form of Gaussian
    process bandit optimization. We provide algorithm-independent lower bounds on
    the simple regret, measuring the suboptimality of a single point reported after
    (T) rounds, and on the cumulative regret, measuring the sum of regrets over the
    (T) chosen points.

    For the isotropic squared-exponential kernel in (d) dimensions, we find
    that an average simple regret of (epsilon) requires (T =
    Omega(frac{1}{epsilon^2} (log frac{1}{epsilon})^{d/2})), and the average
    cumulative regret is at least (Omega(sqrt{T (log T)^d})), thus matching
    existing upper bounds up to the replacement of (d/2) by (d+O(1)) in both
    cases. For the Matern-(nu) kernel, we give analogous bounds of the form
    (Omega((frac{1}{epsilon})^{2+d/nu})) and (Omega(T^{frac{nu+d}{2nu+d}})),
    and discuss the resulting gaps to the existing upper bounds.

    Megapixel Size Image Creation using Generative Adversarial Networks

    Marco Marchesi
    Comments: 3 pages, 4 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Learning (cs.LG)

    Since its appearance, Generative Adversarial Networks (GANs) have received a
    lot of interest in the AI community. In image generation several projects
    showed how GANs are able to generate photorealistic images but the results so
    far did not look adequate for the quality standard of visual media production
    industry. We present an optimized image generation process based on a Deep
    Convolutional Generative Adversarial Networks (DCGANs), in order to create
    photorealistic high-resolution images (up to 1024×1024 pixels). Furthermore,
    the system was fed with a limited dataset of images, less than two thousand
    images. All these results give more clue about future exploitation of GANs in
    Computer Graphics and Visual Effects.

    Low-Rank Matrix Approximation in the Infinity Norm

    Nicolas Gillis, Yaroslav Shitov
    Comments: 12 pages, 3 tables
    Subjects: Computational Complexity (cs.CC); Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC)

    The low-rank matrix approximation problem with respect to the entry-wise
    (ell_{infty})-norm is the following: given a matrix (M) and a factorization
    rank (r), find a matrix (X) whose rank is at most (r) and that minimizes
    (max_{i,j} |M_{ij} - X_{ij}|). In this paper, we prove that the decision
    variant of this problem for (r=1) is NP-complete using a reduction from the
    problem ‘not all equal 3SAT’. We also analyze several cases when the problem
    can be solved in polynomial time, and propose a simple practical heuristic
    algorithm which we apply on the problem of the recovery of a quantized low-rank
    matrix.
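
    For intuition, a naive heuristic for the (r=1) case on small matrices
    (emphatically not the paper's algorithm): start from the best rank-1
    least-squares fit and refine the Chebyshev objective with a generic
    derivative-free optimiser.

        import numpy as np
        from scipy.optimize import minimize

        def rank1_linf(M):
            M = np.asarray(M, dtype=float)
            m, n = M.shape
            U, s, Vt = np.linalg.svd(M)           # least-squares warm start
            x0 = np.concatenate([U[:, 0] * np.sqrt(s[0]),
                                 Vt[0] * np.sqrt(s[0])])
            obj = lambda x: np.abs(M - np.outer(x[:m], x[m:])).max()
            res = minimize(obj, x0, method="Nelder-Mead",
                           options={"maxiter": 20000})
            return np.outer(res.x[:m], res.x[m:]), res.fun

        X, err = rank1_linf(np.array([[1.0, 2.0], [2.0, 4.2]]))
        print(err)                                # entrywise l_inf error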

    Deep Generative Adversarial Networks for Compressed Sensing Automates MRI

    Morteza Mardani, Enhao Gong, Joseph Y. Cheng, Shreyas Vasanawala, Greg Zaharchuk, Marcus Alley, Neil Thakur, Song Han, William Dally, John M. Pauly, Lei Xing
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    Magnetic resonance image (MRI) reconstruction is a severely ill-posed linear
    inverse task demanding time and resource intensive computations that can
    substantially trade off accuracy for speed in real-time imaging. In
    addition, state-of-the-art compressed sensing (CS) analytics are not
    cognizant of the image diagnostic quality. To cope with these challenges we put
    forth a novel CS framework that permeates benefits from generative adversarial
    networks (GAN) to train a (low-dimensional) manifold of diagnostic-quality MR
    images from historical patients. Leveraging a mixture of least-squares (LS)
    GANs and pixel-wise (ell_1) cost, a deep residual network with skip
    connections is trained as the generator that learns to remove the aliasing
    artifacts by projecting onto the manifold. LSGAN learns the texture
    details, while (ell_1) controls the high-frequency noise. A multilayer
    convolutional neural network is then jointly trained based on diagnostic
    quality images to discriminate the projection quality. The test phase performs
    feed-forward propagation over the generator network that demands a very low
    computational overhead. Extensive evaluations are performed on a large
    contrast-enhanced MR dataset of pediatric patients. In particular, images rated
    based on expert radiologists corroborate that GANCS retrieves high contrast
    images with detailed texture relative to conventional CS, and pixel-wise
    schemes. In addition, it offers reconstruction under a few milliseconds, two
    orders of magnitude faster than state-of-the-art CS-MRI schemes.
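
    The generator's training signal, as described, is a mixture of an LSGAN
    term and a pixel-wise l1 cost; a hedged PyTorch sketch of that mixture
    (the weighting and names are ours):

        import torch
        import torch.nn.functional as F

        def generator_loss(x_hat, x, d_fake, lam=0.5):
            # LSGAN term pushes texture realism; l1 suppresses high-frequency
            # noise, as the abstract describes.
            lsgan = F.mse_loss(d_fake, torch.ones_like(d_fake))
            return lsgan + lam * F.l1_loss(x_hat, x)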

    Machine Learning Based Crackle Detection in Lung Sounds

    Morten Grønnesby, Juan Carlos Aviles Solis, Einar Holsbø, Hasse Melbye, Lars Ailo Bongo
    Subjects: Sound (cs.SD); Learning (cs.LG)

    The stethoscope is a well-known and widely available diagnostic instrument.
    In recent years, many innovative solutions for recording and viewing sounds
    from a stethoscope have become available. However, to fully utilize such
    devices, there is a need for an automated approach for detecting abnormal lung
    sounds, which is better than the existing methods that typically have been
    developed and evaluated using a small and non-diverse dataset.

    We propose a machine learning based approach for detecting crackles in lung
    sounds recorded using a stethoscope in a large health survey. Our method is
    trained and evaluated using 209 files with crackles classified by expert
    listeners. Our analysis pipeline is based on features extracted from small
    windows in audio files. We evaluated several feature extraction methods and
    classifiers. We evaluated the pipeline using a training set of 175 crackle
    windows and 208 normal windows.

    We found and evaluated a 5-dimensional feature vector with four features
    from the time domain and one from the spectral domain. We evaluated several
    classifiers
    and found SVM with a Radial Basis Function Kernel to perform best for our
    5-dimensional feature vector. Our approach had a precision of 86% and recall of
    84% for classifying a crackle in a window, which is more accurate than found in
    studies of health personnel. The low-dimensional feature vector makes the SVM
    very fast. The model can be trained on a regular computer in 1.44 seconds, and
    319 crackles can be classified in 1.08 seconds.

    Our approach detects and visualizes individual crackles in recorded audio
    files. It is accurate, fast, and has low resource requirements. The approach is
    therefore well suited for deployment on smart devices and phones or as a web
    application. It can be used to train health personnel or as part of a
    smartphone application for Bluetooth stethoscopes.
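
    The pipeline is simple enough to sketch end to end with scikit-learn; the
    five features below are our guesses in the spirit of "four time-domain
    plus one spectral", not necessarily the authors' choices.

        import numpy as np
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        def window_features(window, sr):
            spectrum = np.abs(np.fft.rfft(window))
            freqs = np.fft.rfftfreq(len(window), d=1.0 / sr)
            centroid = (freqs * spectrum).sum() / (spectrum.sum() + 1e-12)
            zc_rate = np.mean(np.abs(np.diff(np.sign(window)))) / 2.0
            return np.array([window.var(), np.abs(window).max(),
                             zc_rate, np.mean(np.abs(window)), centroid])

        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
        # clf.fit(X, y) with one window_features(...) row per labelled window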

    Surface Networks

    Ilya Kostrikov, Joan Bruna, Daniele Panozzo, Denis Zorin
    Subjects: Machine Learning (stat.ML); Graphics (cs.GR); Learning (cs.LG)

    We study data-driven representations for three-dimensional triangle meshes,
    which are one of the prevalent objects used to represent 3D geometry. Recent
    works have developed models that exploit the intrinsic geometry of manifolds
    and graphs, namely Graph Neural Networks (GNNs) and their spectral variants,
    which learn from the local metric tensor via the Laplacian operator.

    Despite offering excellent sample complexity and built-in invariances,
    intrinsic geometry alone is invariant to isometric deformations, making it
    unsuitable for many applications. To overcome this limitation, we propose
    several upgrades to GNNs to leverage extrinsic differential geometry properties
    of three-dimensional surfaces, increasing their modeling power. In particular, we
    propose to exploit the Dirac operator, whose spectrum detects principal
    curvature directions — this is in stark contrast with the classical Laplace
    operator, which directly measures mean curvature. We coin the resulting
    model the Surface Network (SN). We demonstrate the efficiency and versatility
    of SNs on two challenging tasks: temporal prediction of mesh deformations under
    non-linear dynamics and generative models using a variational autoencoder
    framework with encoders/decoders given by SNs.

    Better Text Understanding Through Image-To-Text Transfer

    Karol Kurach, Sylvain Gelly, Michal Jastrzebski, Philip Haeusser, Olivier Teytaud, Damien Vincent, Olivier Bousquet
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Generic text embeddings are successfully used in a variety of tasks. However,
    they are often learnt by capturing the co-occurrence structure from pure text
    corpora, resulting in limitations of their ability to generalize. In this
    paper, we explore models that incorporate visual information into the text
    representation. Based on comprehensive ablation studies, we propose a
    conceptually simple, yet well performing architecture. It outperforms previous
    multimodal approaches on a set of well established benchmarks. We also improve
    the state-of-the-art results for image-related text datasets, using orders of
    magnitude less data.


    Information Theory

    Benchmark problems for phase retrieval

    Veit Elser, Ti-Yen Lan
    Comments: 7 pages, 6 figures
    Subjects: Information Theory (cs.IT); Data Analysis, Statistics and Probability (physics.data-an)

    The hardest instances of phase retrieval arise in crystallography, where the
    signal is periodic and comprised of atomic distributions arranged uniformly in
    the unit cell of the crystal. We have constructed a graded set of benchmark
    problems for evaluating algorithms that perform this type of phase retrieval. A
    simple iterative algorithm was used to establish baseline runtimes that
    empirically grow exponentially in the sparsity of the signal autocorrelation.
    We also review the algorithms used by the leading software packages for
    crystallographic phase retrieval.

    More new classes of permutation trinomials over (mathbb{F}_{2^n})

    Yanping Wang, WeiGuo Zhang, Zhengbang Zha
    Comments: 17 pages
    Subjects: Information Theory (cs.IT); Number Theory (math.NT)

    Permutation polynomials over finite fields have wide applications in many
    areas of science and engineering. In this paper, we present six new classes of
    permutation trinomials over (mathbb{F}_{2^n}) which have explicit forms by
    determining the solutions of some equations.
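
    For intuition, checking whether a given trinomial permutes a small field
    is a direct computation; the sketch below brute-forces (mathbb{F}_{2^4})
    with the irreducible polynomial (x^4+x+1). The exponents in the example
    are arbitrary test values, not classes from the paper.

        def gf_mul(a, b, n=4, poly=0b10011):      # multiply in GF(2^n)
            r = 0
            while b:
                if b & 1:
                    r ^= a
                b >>= 1
                a <<= 1
                if a >> n:                        # reduce modulo poly
                    a ^= poly
            return r

        def gf_pow(a, e, n=4, poly=0b10011):
            r = 1
            for _ in range(e):
                r = gf_mul(r, a, n, poly)
            return r

        def is_permutation_trinomial(d1, d2, d3, n=4, poly=0b10011):
            img = {gf_pow(x, d1, n, poly) ^ gf_pow(x, d2, n, poly)
                   ^ gf_pow(x, d3, n, poly) for x in range(2 ** n)}
            return len(img) == 2 ** n             # bijective iff full image

        print(is_permutation_trinomial(4, 2, 1))  # x^4 + x^2 + x over F_16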

    Optimal Slotted ALOHA under Delivery Deadline Constraint for Multiple-Packet Reception

    Yijin Zhang, Yuan-Hsun Lo, Feng Shu, Jun Li
    Subjects: Information Theory (cs.IT)

    This paper considers a time-slotted communication channel that is shared by
    (N) users transmitting to a single receiver under saturated traffic. It is
    assumed that the receiver has the ability of multiple-packet reception
    (MPR) to correctly decode up to (M) ((1 leq M < N)) time-overlapping packets.
    Each user attempts to access the channel following a slotted ALOHA protocol,
    and transmits a packet within a channel slot with a common transmission
    probability. To evaluate the reliability in the scenario that a packet needs to
    be transmitted within a strict delivery deadline (D) ((Dgeq 1)) slots since
    its arrival at the head of queue, we consider the successful delivery
    probability of a packet as performance metric of interest. Generalizing
    previous studies that only focused on the single-packet reception (SPR) channel
    (i.e., (M=1)) or the throughput performance (i.e., (D=1)), we derive the
    optimal transmission probability that maximizes the successful delivery
    probability for any (1 leq M < N) and any (D geq 1). In particular, since
    the previous result on the optimal transmission probability under MPR for
    (D=1) was obtained under an unproved technical condition, the throughput
    maximization problem under MPR is completely addressed for the first time
    in this paper.
    Furthermore, by noting that maximizing the successful delivery probability for
    (D>1) would degrade the throughput performance, we obtain the optimal
    transmission probability subject to a throughput constraint.
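
    Under our reading of the model, the optimal transmission probability can
    be explored numerically: a head-of-queue packet succeeds in a slot if its
    user transmits and at most (M-1) of the other (N-1) saturated users do,
    and it must succeed within (D) slots.

        from math import comb

        def delivery_prob(N, M, D, tau):
            p_slot = tau * sum(comb(N - 1, i) * tau**i
                               * (1 - tau)**(N - 1 - i) for i in range(M))
            return 1.0 - (1.0 - p_slot) ** D      # success within D tries

        N, M, D = 20, 4, 5
        best = max((delivery_prob(N, M, D, t / 1000.0), t / 1000.0)
                   for t in range(1, 1000))
        print(best)           # (max delivery probability, optimal tau)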

    Multi-point Codes from the GGS Curves

    Chuangqiang Hu, Shudi Yang
    Comments: 24 pages. arXiv admin note: text overlap with arXiv:1607.05462
    Subjects: Information Theory (cs.IT)

    This paper is concerned with the construction of algebraic geometric codes
    defined from GGS curves. It is of significant use to describe bases for the
    Riemann-Roch spaces associated with totally ramified places, which enables us
    to study multi-point AG codes. Along this line, we characterize explicitly the
    Weierstrass semigroups and pure gaps. Additionally, we determine the floor of a
    certain type of divisor and investigate the properties of AG codes from GGS
    curves. Finally, we apply these results to find multi-point codes with
    excellent parameters. As one example, a presented code with parameters
    ([216, 190, geqslant 18]) over (mathbb{F}_{64}) yields a new record.

    Energy Harvesting Networks with General Utility Functions: Near Optimal Online Policies

    Ahmed Arafa, Abdulrahman Baknina, Sennur Ulukus
    Comments: To appear in the 2017 IEEE International Symposium on Information Theory. arXiv admin note: text overlap with arXiv:1705.10305
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    We consider online scheduling policies for single-user energy harvesting
    communication systems, where the goal is to characterize online policies that
    maximize the long term average utility, for some general concave and
    monotonically increasing utility function. In our setting, the transmitter
    relies on energy harvested from nature to send its messages to the receiver,
    and is equipped with a finite-sized battery to store its energy. Energy packets
    are independent and identically distributed (i.i.d.) over time slots, and are
    revealed causally to the transmitter. Only the average arrival rate is known a
    priori. We first characterize the optimal solution for the case of Bernoulli
    arrivals. Then, for general i.i.d. arrivals, we first show that fixed fraction
    policies [Shaviv-Ozgur] are within a constant multiplicative gap from the
    optimal solution for all energy arrivals and battery sizes. We then derive a
    set of sufficient conditions on the utility function to guarantee that fixed
    fraction policies are within a constant additive gap as well from the optimal
    solution.
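
    A fixed fraction policy is easy to simulate: each slot, spend a fixed
    fraction of the currently stored energy. The utility and numbers below
    are illustrative; the abstract's results are constant-gap guarantees for
    such policies, not this particular simulation.

        import numpy as np

        def fixed_fraction_avg_utility(arrivals, battery_cap, q):
            b, total = 0.0, 0.0
            for e in arrivals:
                b = min(b + e, battery_cap)       # harvest, clip at capacity
                spend = q * b                     # spend a fixed fraction
                b -= spend
                total += np.log1p(spend)          # concave increasing utility
            return total / len(arrivals)

        rng = np.random.default_rng(0)
        arrivals = rng.exponential(1.0, size=100_000)   # i.i.d. energy packets
        print(fixed_fraction_avg_utility(arrivals, battery_cap=5.0, q=0.3))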

    Modeling and Design of Millimeter-Wave Networks for Highway Vehicular Communication

    Andrea Tassi, Malcolm Egan, Robert J. Piechocki, Andrew Nix
    Subjects: Information Theory (cs.IT); Performance (cs.PF)

    Connected and autonomous vehicles will play a pivotal role in future
    Intelligent Transportation Systems (ITSs) and smart cities, in general.
    High-speed and low-latency wireless communication links will allow
    municipalities to warn vehicles against safety hazards, as well as support
    cloud-driving solutions to drastically reduce traffic jams and air pollution.
    To achieve these goals, vehicles need to be equipped with a wide range of
    sensors generating and exchanging high rate data streams. Recently, millimeter
    wave (mmWave) techniques have been introduced as a means of fulfilling such
    high data rate requirements. In this paper, we model a highway communication
    network and characterize its fundamental link budget metrics. In particular, we
    specifically consider a network where vehicles are served by mmWave Base
    Stations (BSs) deployed alongside the road. To evaluate our highway network, we
    develop a new theoretical model that accounts for a typical scenario where
    heavy vehicles (such as buses and lorries) in slow lanes obstruct Line-of-Sight
    (LOS) paths of vehicles in fast lanes and, hence, act as blockages. Using tools
    from stochastic geometry, we derive approximations for the
    Signal-to-Interference-plus-Noise ratio (SINR) outage probability, as well as
    the probability that a user achieves a target communication rate (rate coverage
    probability). Our analysis provides new design insights for mmWave highway
    communication networks. In particular, we show that smaller antenna beamwidths
    and, unlike bi-dimensional mmWave cellular networks, smaller BS densities do
    not necessarily have a disruptive impact on improving the SINR outage
    probability, and consequently the rate coverage probability.

    Coding Method for Parallel Iterative Linear Solver

    Yaoqing Yang, Pulkit Grover, Soummya Kar
    Comments: submitted
    Subjects: Information Theory (cs.IT)

    Computationally intensive distributed and parallel computing is often
    bottlenecked by a small set of slow workers known as stragglers. In this paper,
    we utilize the emerging idea of “coded computation” to design a novel
    error-correcting-code inspired technique for solving linear inverse problems
    under specific iterative methods in a parallelized implementation affected by
    stragglers. Example applications include inverse problems in machine learning
    on graphs, such as personalized PageRank and sampling on graphs. We provably
    show that our coded-computation technique can reduce the mean-squared error
    under a computational deadline constraint. In fact, the ratio of mean-squared
    error of replication-based and coded techniques diverges to infinity as the
    deadline increases. Our experiments for personalized PageRank performed on real
    systems and real social networks show that this ratio can be as large as
    (10^4). Further, unlike coded-computation techniques proposed thus far, our
    strategy combines outputs of all workers, including the stragglers, to produce
    more accurate estimates at the computational deadline. This also ensures that
    the accuracy degrades “gracefully” in the event that the number of stragglers
    is large.

    Asymptotic Outage Analysis of HARQ-IR over Time-Correlated Nakagami-(m) Fading Channels

    Zheng Shi, Shaodan Ma, Guanghua Yang, Kam-Weng Tam, Minghua Xia
    Subjects: Information Theory (cs.IT)

    In this paper, outage performance of hybrid automatic repeat request with
    incremental redundancy (HARQ-IR) is analyzed. Unlike prior analyses,
    time-correlated Nakagami-(m) fading channel is considered. The outage analysis
    thus involves the probability distribution analysis of a product of multiple
    correlated shifted Gamma random variables and is more challenging than prior
    analyses. Based on the finding of the conditional independence of the received
    signal-to-noise ratios (SNRs), the outage probability is exactly derived by
    using the conditional Mellin transform. Specifically, the outage probability of
    HARQ-IR under time-correlated Nakagami-(m) fading channels can be written as a
    weighted sum of outage probabilities of HARQ-IR over independent Nakagami
    fading channels, where the weightings are determined by a negative multinomial
    distribution. This result enables not only an efficient truncation
    approximation of the outage probability with uniform convergence but also
    asymptotic outage analysis to further extract clear insights which have never
    been discovered for HARQ-IR even under fast fading channels. The asymptotic
    outage probability is then derived in a simple form which clearly quantifies
    the impacts of transmit powers, channel time correlation and information
    transmission rate. It is proved that the asymptotic outage probability is an
    inverse power function of the product of transmission powers in all HARQ
    rounds, an increasing function of the channel time correlation coefficients,
    and a monotonically increasing and convex function of information transmission
    rate. The simple expression of the asymptotic result enables optimal power
    allocation and optimal rate selection of HARQ-IR with low complexity. Finally,
    numerical results are provided to verify our analytical results and justify the
    application of the asymptotic result for optimal system design.
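
    Results of this kind are convenient to sanity-check by Monte Carlo. The
    sketch below is a stand-in with assumptions of my own: it correlates
    unit-power Nakagami-(m) power gains (Gamma variables) across HARQ rounds
    with a Gaussian copula, which is one convenient correlation model and not
    necessarily the one analyzed in the paper, and declares outage when the
    accumulated mutual information stays below the rate (R).

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)

        m, K, R = 2.0, 3, 4.0            # Nakagami shape, HARQ rounds, rate (bits/s/Hz)
        P = np.array([2.0, 2.0, 2.0])    # per-round transmit SNRs
        rho, n_trials = 0.5, 200_000     # adjacent-round correlation, sample size

        # Gaussian copula: correlated normals -> uniforms -> Gamma(m, 1/m) gains.
        Sigma = rho ** np.abs(np.subtract.outer(np.arange(K), np.arange(K)))
        Z = rng.multivariate_normal(np.zeros(K), Sigma, size=n_trials)
        g = stats.gamma.ppf(stats.norm.cdf(Z), a=m, scale=1.0 / m)   # E[g] = 1

        # Outage: accumulated mutual information over K rounds is below R.
        acc_info = np.log2(1.0 + P * g).sum(axis=1)
        print("estimated outage probability:", np.mean(acc_info < R))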

    Optimal repair of Reed-Solomon codes: Achieving the cut-set bound

    Itzhak Tamo, Min Ye, Alexander Barg
    Subjects: Information Theory (cs.IT)

    Coding for distributed storage gives rise to a new set of problems in coding
    theory related to the need of reducing inter-node communication in the system.
    A large number of recent papers addressed the problem of optimizing the total
    amount of information downloaded for repair of a single failed node (the repair
    bandwidth) by accessing information on (d) helper nodes, where (k \le d \le
    n-1). By the so-called cut-set bound (Dimakis et al., 2010), the repair
    bandwidth of an ((n,k=n-r)) MDS code using (d) helper nodes is at least
    (dl/(d+1-k)), where (l) is the size of the node. Also, a number of known
    constructions of MDS array codes meet this bound with equality. In a related
    but separate line of work, Guruswami and Wootters (2016) studied repair of
    Reed-Solomon (RS) codes, showing that these codes can be repaired using a
    smaller bandwidth than under the trivial approach. At the same time, their work
    as well as follow-up papers stopped short of constructing RS codes (or any
    scalar MDS codes) that meet the cut-set bound with equality, which has been an
    open problem in coding theory. In this work we present a solution to this
    problem, constructing RS codes of length (n) over a field of size (q^l), with
    (l = \exp((1+o(1)) n \log n)), that meet the cut-set bound. We also prove an almost
    matching lower bound on (l), showing that the super-exponential scaling is both
    necessary and sufficient for achieving the cut-set bound using linear repair
    schemes. More precisely, we prove that for scalar MDS codes (including the RS
    codes) to meet this bound, the sub-packetization (l) must satisfy (l \ge
    \exp((1+o(1)) k \log k)).
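
    To put numbers to these bounds, the short script below (with illustrative
    parameters of my choosing) evaluates the cut-set repair bandwidth
    (dl/(d+1-k)) against the trivial download of (kl), together with the roughly
    (k^k) sub-packetization that the lower bound forces on scalar MDS codes.

        import math

        n, k, l = 14, 10, 1           # an (n, k) MDS code; bandwidths in units of l
        for d in range(k, n):         # number of helper nodes contacted for repair
            cut_set = d * l / (d + 1 - k)
            print(f"d = {d:2d}: repair bandwidth >= {cut_set:5.2f} "
                  f"(trivial repair downloads {k * l})")

        # Sub-packetization forced on scalar MDS codes, ignoring the o(1) term:
        print("l must be roughly exp(k log k) = k^k =",
              f"{math.exp(k * math.log(k)):.3g}")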

    Low Subpacketization Schemes for Coded Caching

    Li Tang, Aditya Ramamoorthy
    Comments: This paper was presented in part at the 2016 IEEE Workshop on Network Coding and Applications (NetCod) and will be presented in part at the 2017 IEEE International Symposium on Information Theory (ISIT)
    Subjects: Information Theory (cs.IT)

    Coded caching is a technique that generalizes conventional caching and
    promises significant reductions in traffic over caching networks. However, the
    basic coded caching scheme requires that each file hosted in the server be
    partitioned into a large number (i.e., the subpacketization level) of
    non-overlapping subfiles. From a practical perspective, this is problematic as
    it means that prior schemes are only applicable when the size of the files is
    extremely large. In this work, we propose coded caching schemes based on
    combinatorial structures called resolvable designs. These structures can be
    obtained in a natural manner from linear block codes whose generator matrices
    possess certain rank properties. We demonstrate that several schemes with
    subpacketization levels that are exponentially smaller than the basic scheme
    can be obtained.
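
    The severity of the subpacketization bottleneck is easy to see numerically.
    The sketch below evaluates the level of the basic scheme, a binomial
    coefficient C(K, KM/N); the resolvable-design constructions in the paper
    shrink this exponentially, though their exact count is not reproduced here.

        from math import comb

        N = 100                              # files hosted at the server
        for K in (20, 40, 80):               # number of users
            M = N // 4                       # per-user cache size, M/N = 1/4
            t = K * M // N                   # cache replication factor
            # The basic scheme splits each file into C(K, t) subfiles.
            print(f"K = {K:3d}: subpacketization C({K}, {t}) = "
                  f"{float(comb(K, t)):.3e}")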

    Inexact Gradient Projection and Fast Data Driven Compressed Sensing

    Mohammad Golbabaee, Mike E. Davies
    Subjects: Information Theory (cs.IT)

    We study convergence of the iterative projected gradient (IPG) algorithm for
    arbitrary (possibly nonconvex) sets and when both the gradient and projection
    oracles are computed approximately. We consider different notions of
    approximation, for which we show that the Progressive Fixed Precision (PFP) and
    the ((1+\epsilon))-optimal oracles can achieve the same accuracy as the
    exact IPG algorithm. We show that the former scheme is also able to maintain
    the (linear) rate of convergence of the exact algorithm, under the same
    embedding assumption. In contrast, the ((1+\epsilon))-approximate oracle
    requires a stronger embedding condition and moderate compression ratios, and
    it typically slows down convergence. We apply our results to accelerate
    solving a class of data driven compressed sensing problems, where we replace
    iterative exhaustive searches over large datasets by fast approximate nearest
    neighbour search strategies based on the cover tree data structure. For
    datasets with low intrinsic dimensions our proposed algorithm achieves a
    complexity logarithmic in terms of the dataset population as opposed to the
    linear complexity of a brute-force search. Several numerical experiments
    corroborate the behavior predicted by our theoretical analysis.
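
    A minimal sketch of the algorithmic idea, under assumptions of my own: the
    signal model is a finite dataset of exemplars, the projection in each IPG
    step is a nearest-neighbour search, and the inexact oracle is mimicked by
    searching only a random subset of the data (the paper instead uses cover
    trees, which give genuine approximate searches with logarithmic complexity).
    Keeping the current iterate as a fallback candidate stops the approximate
    projection from moving backwards.

        import numpy as np

        rng = np.random.default_rng(2)

        d, n_data, m = 20, 5000, 15
        dataset = rng.standard_normal((n_data, d))     # data-driven signal model
        x_true = dataset[rng.integers(n_data)]         # ground truth is an exemplar
        A = rng.standard_normal((m, d)) / np.sqrt(m)   # compressive measurements
        y = A @ x_true

        def project(v, candidates):
            # "Projection" onto the dataset: nearest neighbour among candidates.
            return candidates[np.argmin(np.linalg.norm(candidates - v, axis=1))]

        x = np.zeros(d)
        for _ in range(50):
            v = x - A.T @ (A @ x - y)                  # gradient step (unit step)
            subset = dataset[rng.choice(n_data, size=1000, replace=False)]
            x = project(v, np.vstack([subset, x[None]]))  # inexact projection
        print("recovery error:", np.linalg.norm(x - x_true))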

    Interference Modeling for Cellular Networks under Beamforming Transmission

    Hussain Elkotby, Mai Vu
    Comments: 32 pages, 9 figures, accepted in IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    We propose analytical models for the interference power distribution in a
    cellular system employing MIMO beamforming in rich and limited scattering
    environments, which capture non line-of-sight signal propagation in the
    microwave and mmWave bands, respectively. Two candidate models are considered:
    the Inverse Gaussian and the Inverse Weibull, both two-parameter heavy-tailed
    distributions. We further propose a mixture of these two distributions as a
    model with three parameters. To estimate the parameters of these distributions,
    three approaches are used: moment matching, individual distribution maximum
    likelihood estimation (MLE), and mixture distribution MLE with a designed
    expectation maximization algorithm. We then introduce simple fitted functions
    for the mixture model parameters as polynomials of the channel path loss
    exponent and shadowing variance. To measure the goodness of these models, the
    information-theoretic metric relative entropy is used to capture the distance
    from the model distribution to a reference one. The interference models are
    tested against data obtained by simulating a cellular network based on
    stochastic geometry. The results show that the three-parameter mixture model
    offers a remarkably good fit to the simulated interference power. The mixture
    model is further used to analyze the capacity of a cellular network employing
    joint transmit and receive beamforming, and the analysis again agrees well
    with simulation.
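
    The fitting pipeline is straightforward to prototype with scipy's built-in
    invgauss and invweibull distributions. Below is a sketch on synthetic
    heavy-tailed samples (a lognormal stand-in of my choosing, rather than
    interference from a stochastic-geometry simulation), with individual MLE
    fits and a single E-step of the mixture's expectation maximization.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(3)

        # Stand-in heavy-tailed "interference power" samples.
        samples = stats.lognorm.rvs(s=1.2, size=20_000, random_state=rng)

        ig = stats.invgauss.fit(samples, floc=0)      # Inverse Gaussian MLE
        iw = stats.invweibull.fit(samples, floc=0)    # Inverse Weibull MLE
        for name, dist, par in [("inverse Gaussian", stats.invgauss, ig),
                                ("inverse Weibull", stats.invweibull, iw)]:
            ll = np.mean(dist.logpdf(samples, *par))
            print(f"{name}: mean log-likelihood = {ll:.4f}")

        # One E-step toward the three-parameter mixture: posterior weight of
        # the Inverse Gaussian component, starting from a 0.5/0.5 mixture.
        p_ig = stats.invgauss.pdf(samples, *ig)
        p_iw = stats.invweibull.pdf(samples, *iw)
        resp = 0.5 * p_ig / (0.5 * p_ig + 0.5 * p_iw)
        print("updated mixture weight:", resp.mean())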

    Lower Bounds on Regret for Noisy Gaussian Process Bandit Optimization

    Jonathan Scarlett, Ilija Bogunovic, Volkan Cevher
    Comments: Accepted to COLT 2017
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    In this paper, we consider the problem of sequentially optimizing a black-box
    function (f) based on noisy samples and bandit feedback. We assume that (f) is
    smooth in the sense of having a bounded norm in some reproducing kernel Hilbert
    space (RKHS), yielding a commonly-considered non-Bayesian form of Gaussian
    process bandit optimization. We provide algorithm-independent lower bounds on
    the simple regret, measuring the suboptimality of a single point reported after
    (T) rounds, and on the cumulative regret, measuring the sum of regrets over the
    (T) chosen points.

    For the isotropic squared-exponential kernel in (d) dimensions, we find that
    an average simple regret of (\epsilon) requires (T =
    \Omega\big(\frac{1}{\epsilon^2} (\log\frac{1}{\epsilon})^{d/2}\big)), and the
    average cumulative regret is at least (\Omega\big(\sqrt{T (\log T)^d}\big)),
    thus matching existing upper bounds up to the replacement of (d/2) by
    (d+O(1)) in both cases. For the Matérn-(\nu) kernel, we give analogous bounds
    of the form (\Omega\big((\frac{1}{\epsilon})^{2+d/\nu}\big)) and
    (\Omega\big(T^{\frac{\nu+d}{2\nu+d}}\big)), and discuss the resulting gaps to
    the existing upper bounds.
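
    As a worked illustration (the bound hides constants, so the absolute numbers
    are only indicative), the simple-regret sample complexity for the
    squared-exponential kernel can be evaluated directly:

        import math

        # T = Omega((1/eps^2) * (log(1/eps))^(d/2)) samples for simple regret eps.
        for d in (1, 3, 6):
            for eps in (0.1, 0.01):
                T = (1 / eps**2) * math.log(1 / eps) ** (d / 2)
                print(f"d = {d}, eps = {eps:5.2f}: T >~ {T:,.0f}")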

    The Sample Complexity of Online One-Class Collaborative Filtering

    Reinhard Heckel, Kannan Ramchandran
    Comments: ICML 2017
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)

    We consider the online one-class collaborative filtering (CF) problem that
    consists of recommending items to users over time in an online fashion based on
    positive ratings only. This problem arises when users respond only occasionally
    to a recommendation with a positive rating, and never with a negative one. We
    study the impact of the probability of a user responding to a recommendation,
    (p_f), on the sample complexity, i.e., the number of ratings required to make
    `good’ recommendations, and ask whether receiving positive and negative
    ratings, instead of positive ratings only, improves the sample complexity. Both
    questions arise in the design of recommender systems. We introduce a simple
    probabilistic user model, and analyze the performance of an online user-based
    CF algorithm. We prove that, after an initial cold start phase in which
    recommendations are invested in exploring the user's preferences, this
    algorithm makes perfect recommendations, apart from the fraction of
    recommendations that must be spent updating the user's preferences. The number
    of ratings required for the cold start phase is nearly proportional to
    (1/p_f), while the number required for updating the user's preferences is
    essentially independent of (p_f). As a consequence, we find that receiving
    positive and negative ratings, instead of only positive ones, reduces the
    number of ratings required for initial exploration by a factor of (1/p_f),
    which can be significant.
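
    A toy version of this explore-then-exploit structure is sketched below, with
    everything beyond the two-phase outline assumed by me: two latent user types
    with disjoint liked items, uniformly random recommendations during the cold
    start, and afterwards a user-based rule that scores items by the positive
    ratings of similar users. Positive feedback is observed only with
    probability (p_f), mirroring the response model above.

        import numpy as np

        rng = np.random.default_rng(4)

        n_users, n_items, p_f = 60, 40, 0.3
        # Two latent user types with disjoint sets of liked items.
        user_type = rng.integers(2, size=n_users)
        likes = np.zeros((n_users, n_items), bool)
        likes[user_type == 0, : n_items // 2] = True
        likes[user_type == 1, n_items // 2 :] = True

        obs = np.zeros((n_users, n_items))      # positive ratings observed so far

        def recommend(u, t, n_explore=15):
            if t < n_explore:                   # cold start: explore at random
                return rng.integers(n_items)
            sims = obs @ obs[u]                 # user-user similarity (positives)
            sims[u] = 0
            return int(np.argmax(sims @ obs))   # items liked by similar users

        hits = 0
        for t in range(60):                     # online recommendation rounds
            for u in range(n_users):
                item = recommend(u, t)
                hits += likes[u, item]
                if likes[u, item] and rng.random() < p_f:   # response w.p. p_f
                    obs[u, item] = 1.0
        print("fraction of good recommendations:", hits / (60 * n_users))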



