IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    arXiv Paper Daily: Wed, 22 Feb 2017

    我爱机器学习(52ml.net)发表于 2017-02-22 00:00:00
    love 0

    Neural and Evolutionary Computing

    Online Representation Learning with Multi-layer Hebbian Networks for Image Classification Tasks

    Yanis Bahroun, Andrea Soltoggio
    Comments: 8 pages, 6 figures
    Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)

    Unsupervised learning allows algorithms to adapt to different data thanks to
    the autonomous discovery of discriminating features during the training. When
    these algorithms are reducible to cost-function minimisation, better
    interpretations of their learning dynamics are possible. Recently, new
    Hebbian-like plasticity, bio-inspired, local and unsupervised learning rules
    for neural networks, have been shown to minimise a cost-function while
    performing online sparse representation learning. However, it is unclear to
    what degree such rules are effective to learn features from images. To
    investigate this point, this study introduces a novel multi-layer Hebbian
    network trained by a rule derived from a non-negative classical
    multidimensional scaling cost-function. The performance is compared to that of
    other fully unsupervised learning algorithms.

    Predicting non-linear dynamics: a stable local learning scheme for recurrent spiking neural networks

    Aditya Gilra, Wulfram Gerstner
    Subjects: Neurons and Cognition (q-bio.NC); Neural and Evolutionary Computing (cs.NE)

    Brains need to predict how our muscles and body react to motor commands. How
    networks of spiking neurons can learn to reproduce these non-linear dynamics,
    using local, online and stable learning rules, is an important, open question.
    Here, we present a supervised learning scheme for the feedforward and recurrent
    connections in a network of heterogeneous spiking neurons. The error in the
    output is fed back through fixed random connections with a negative gain,
    causing the network to follow the desired dynamics, while an online and local
    rule changes the weights; hence we call the scheme FOLLOW (Feedback-based
    Online Local Learning Of Weights) The rule is local in the sense that weight
    changes depend on the presynaptic activity and the error signal projected onto
    the post-synaptic neuron. We provide examples of learning linear, non-linear
    and chaotic dynamics, as well as the dynamics of a two-link arm. Using the
    Lyapunov method, and under reasonable assumptions and approximations, we show
    that FOLLOW learning is uniformly stable, with the error going to zero
    asymptotically.

    Survey of Reasoning using Neural networks

    Amit Sahu
    Comments: 12 pages
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    Reason and inference require process as well as memory skills by humans.
    Neural networks are able to process tasks like image recognition (better than
    humans) but in memory aspects are still limited (by attention mechanism, size).
    Recurrent Neural Network (RNN) and it’s modified version LSTM are able to solve
    small memory contexts, but as context becomes larger than a threshold, it is
    difficult to use them. The Solution is to use large external memory. Still, it
    poses many challenges like, how to train neural networks for discrete memory
    representation, how to describe long term dependencies in sequential data etc.
    Most prominent neural architectures for such tasks are Memory networks:
    inference components combined with long term memory and Neural Turing Machines:
    neural networks using external memory resources. Also, additional techniques
    like attention mechanism, end to end gradient descent on discrete memory
    representation are needed to support these solutions. Preliminary results of
    above neural architectures on simple algorithms (sorting, copying) and Question
    Answering (based on story, dialogs) application are comparable with the state
    of the art. In this paper, I explain these architectures (in general), the
    additional techniques used and the results of their application.


    Computer Vision and Pattern Recognition

    VidLoc: 6-DoF Video-Clip Relocalization

    Ronald Clark, Sen Wang, Andrew Markham, Niki Trigoni, Hongkai Wen
    Comments: Preliminary version
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Machine learning techniques, namely convolutional neural networks (CNN) and
    regression forests, have recently shown great promise in performing 6-DoF
    localization of monocular images. However, in most cases image-sequences,
    rather only single images, are readily available. To this extent, none of the
    proposed learning-based approaches exploit the valuable constraint of temporal
    smoothness, often leading to situations where the per-frame error is larger
    than the camera motion. In this paper we propose a recurrent model for
    performing 6-DoF localization of video-clips. We find that, even by considering
    only short sequences (20 frames), the pose estimates are smoothed and the
    localization error can be drastically reduced. Finally, we consider means of
    obtaining probabilistic pose estimates from our model. We evaluate our method
    on openly-available real-world autonomous driving and indoor localization
    datasets.

    PixelNet: Representation of the pixels, by the pixels, and for the pixels

    Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta. Deva Ramanan
    Comments: Project Page: this http URL arXiv admin note: substantial text overlap with arXiv:1609.06694
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Robotics (cs.RO)

    We explore design principles for general pixel-level prediction problems,
    from low-level edge detection to mid-level surface normal estimation to
    high-level semantic segmentation. Convolutional predictors, such as the
    fully-convolutional network (FCN), have achieved remarkable success by
    exploiting the spatial redundancy of neighboring pixels through convolutional
    processing. Though computationally efficient, we point out that such approaches
    are not statistically efficient during learning precisely because spatial
    redundancy limits the information learned from neighboring pixels. We
    demonstrate that stratified sampling of pixels allows one to (1) add diversity
    during batch updates, speeding up learning; (2) explore complex nonlinear
    predictors, improving accuracy; and (3) efficiently train state-of-the-art
    models tabula rasa (i.e., “from scratch”) for diverse pixel-labeling tasks. Our
    single architecture produces state-of-the-art results for semantic segmentation
    on PASCAL-Context dataset, surface normal estimation on NYUDv2 depth dataset,
    and edge detection on BSDS.

    Crowd Sourcing Image Segmentation with iaSTAPLE

    Dmitrij Schlesinger, Florian Jug, Gene Myers, Carsten Rother, Dagmar Kainmüller
    Comments: Accepted to ISBI2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a novel label fusion technique as well as a crowdsourcing protocol
    to efficiently obtain accurate epithelial cell segmentations from non-expert
    crowd workers. Our label fusion technique simultaneously estimates the true
    segmentation, the performance levels of individual crowd workers, and an image
    segmentation model in the form of a pairwise Markov random field. We term our
    approach image-aware STAPLE (iaSTAPLE) since our image segmentation model
    seamlessly integrates into the well-known and widely used STAPLE approach. In
    an evaluation on a light microscopy dataset containing more than 5000 membrane
    labeled epithelial cells of a fly wing, we show that iaSTAPLE outperforms
    STAPLE in terms of segmentation accuracy as well as in terms of the accuracy of
    estimated crowd worker performance levels, and is able to correctly segment 99%
    of all cells when compared to expert segmentations. These results show that
    iaSTAPLE is a highly useful tool for crowd sourcing image segmentation.

    Traffic Surveillance Camera Calibration by 3D Model Bounding Box Alignment for Accurate Vehicle Speed Measurement

    Jakub Sochor, Roman Juránek, Adam Herout
    Comments: under review in CVIU
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we focus on fully automatic traffic surveillance camera
    calibration which we use for speed measurement of passing vehicles. We improve
    over a recent state-of-the-art camera calibration method for traffic
    surveillance based on two detected vanishing points. More importantly, we
    propose a novel automatic scene scale inference based on matching bounding
    boxes of rendered 3D models of vehicles with detected bounding boxes in the
    image. The proposed method can be used from an arbitrary viewpoint and it has
    no constraints on camera placement. We evaluate our method on recent
    comprehensive dataset for speed measurement BrnoCompSpeed. Experiments show
    that our automatic camera calibration by detected two vanishing points method
    reduces the error by 50% compared to the previous state-of-the-art method. We
    also show that our scene scale inference method is much more precise (mean
    speed measurement error 1.10km/h) outperforming both state of the art automatic
    calibration method (error reduction by 86% — mean error 7.98km/h) and manual
    calibration (error reduction by 19% — mean error 1.35km/h). We also present
    qualitative results of automatic camera calibration method on video sequences
    obtained from real surveillance cameras on various places and under different
    lighting conditions (night, dawn, day).

    BrnoCompSpeed: Review of Traffic Camera Calibration and Comprehensive Dataset for Monocular Speed Measurement

    Jakub Sochor, Roman Juránek, Jakub Špaňhel, Lukáš Maršík, Adam Široký, Adam Herout, Pavel Zemčík
    Comments: under review in IEEE T-ITS
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we focus on traffic camera calibration and visual speed
    measurement from a single monocular camera, which is an important task of
    visual traffic surveillance. Existing methods addressing this problem are hard
    to compare due to lack of a common dataset with reliable ground truth.
    Therefore, it is not clear how the methods compare in various aspects and what
    are the factors affecting their performance. We captured a new dataset of 18
    full-HD videos, each around one hour long, captured at 6 different locations.
    Vehicles in videos (20,865 instances in total) are annotated with precise speed
    measurements from optical gates using LIDAR and verified with several reference
    GPS tracks. We provide the videos and metadata (calibration, lengths of
    features in image, annotations, etc.) for future comparison and evaluation.
    Camera calibration is the most crucial part of the speed measurement;
    therefore, we provide a review of the methods and analyze a recently published
    method for fully automatic camera calibration and vehicle speed measurement and
    report the results on this dataset in detail.

    A Discriminative Event Based Model for Alzheimer's Disease Progression Modeling

    Vikram Venkatraghavan, Esther Bron, Wiro Niessen, Stefan Klein
    Comments: Information Processing in Medical Imaging (IPMI), 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)

    The event-based model (EBM) for data-driven disease progression modeling
    estimates the sequence in which biomarkers for a disease become abnormal. This
    helps in understanding the dynamics of disease progression and facilitates
    early diagnosis by staging patients on a disease progression timeline. Existing
    EBM methods are all generative in nature. In this work we propose a novel
    discriminative approach to EBM, which is shown to be more accurate as well as
    computationally more efficient than existing state-of-the art EBM methods. The
    method first estimates for each subject an approximate ordering of events, by
    ranking the posterior probabilities of individual biomarkers being abnormal.
    Subsequently, the central ordering over all subjects is estimated by fitting a
    generalized Mallows model to these approximate subject-specific orderings based
    on a novel probabilistic Kendall’s Tau distance. To evaluate the accuracy, we
    performed extensive experiments on synthetic data simulating the progression of
    Alzheimer’s disease. Subsequently, the method was applied to the Alzheimer’s
    Disease Neuroimaging Initiative (ADNI) data to estimate the central event
    ordering in the dataset. The experiments benchmark the accuracy of the new
    model under various conditions and compare it with existing state-of-the-art
    EBM methods. The results indicate that discriminative EBM could be a simple and
    elegant approach to disease progression modeling.

    Fast Resampling of 3D Point Clouds via Graphs

    Siheng Chen, Dong Tian, Chen Feng, Anthony Vetro, Jelena Kovačević
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    To reduce cost in storing, processing and visualizing a large-scale point
    cloud, we consider a randomized resampling strategy to select a representative
    subset of points while preserving application-dependent features. The proposed
    strategy is based on graphs, which can represent underlying surfaces and lend
    themselves well to efficient computation. We use a general feature-extraction
    operator to represent application-dependent features and propose a general
    reconstruction error to evaluate the quality of resampling. We obtain a general
    form of optimal resampling distribution by minimizing the reconstruction error.
    The proposed optimal resampling distribution is guaranteed to be shift,
    rotation and scale-invariant in the 3D space. We next specify the
    feature-extraction operator to be a graph filter and study specific resampling
    strategies based on all-pass, low-pass, high-pass graph filtering and graph
    filter banks. We finally apply the proposed methods to three applications:
    large-scale visualization, accurate registration and robust shape modeling. The
    empirical performance validates the effectiveness and efficiency of the
    proposed resampling methods.

    Mimicking Ensemble Learning with Deep Branched Networks

    Byungju Kim, Youngsoo Kim, Yeakang Lee, Junmo Kim
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper proposes a branched residual network for image classification. It
    is known that high-level features of deep neural network are more
    representative than lower-level features. By sharing the low-level features,
    the network can allocate more memory to high-level features. The upper layers
    of our proposed network are branched, so that it mimics the ensemble learning.
    By mimicking ensemble learning with single network, we have achieved better
    performance on ImageNet classification task.

    Object Detection in Videos with Tubelet Proposal Networks

    Kai Kang, Hongsheng Li, Tong Xiao, Wanli Ouyang, Junjie Yan, Xihui Liu, Xiaogang Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Object detection in videos has drawn increasing attention recently with the
    introduction of the large-scale ImageNet VID dataset. Different from object
    detection in static images, temporal information in videos provides vital
    information for object detection. To fully utilize temporal information,
    state-of-the-art methods are therefore based on spatiotemporal tubelets, which
    are essentially sequences of associated bounding boxes across time. However,
    the existing methods have major limitations in generating tubelets in terms of
    quality and efficiency. Motion-based methods are able to obtain dense tubelets,
    but the lengths are generally only several frames, which is not optimal to
    incorporate long-term temporal information. Appearance-based methods, usually
    involving generic object tracking, could generate long tubelets, but are
    usually computational expensive. In this work, we propose a framework for
    object detection in videos, which consists of a novel tubelet proposal network
    to efficiently generate spatiotemporal proposals, and a Long Short-term Memory
    (LSTM) network that incorporates temporal information from tubelet proposals
    for achieving high object detection accuracy in videos. The experiments on the
    large-scale ImageNet VID dataset demonstrate the effectiveness of the proposed
    framework for object detection in videos.

    Just DIAL: DomaIn Alignment Layers for Unsupervised Domain Adaptation

    Fabio Maria Carlucci, Lorenzo Porzi, Barbara Caputo, Elisa Ricci, Samuel Rota Bulò
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The empirical fact that classifiers, trained on given data collections,
    perform poorly when tested on data acquired in different settings is
    theoretically explained in domain adaptation through a shift among
    distributions of the source and target domains. Alleviating the domain shift
    problem, especially in the challenging setting where no labeled data are
    available for the target domain, is paramount for having visual recognition
    systems working in the wild. As the problem stems from a shift among
    distributions, intuitively one should try to align them. In the literature,
    this has resulted in a stream of works attempting to align the feature
    representations learned from the source and target domains. Here we take a
    different route. Rather than introducing regularization terms aiming to promote
    the alignment of the two representations, we act at the distribution level
    through the introduction of emph{DomaIn Alignment Layers} (DIAL), able to
    match the observed source and target data distributions to a reference one.
    Thorough experiments on three different public benchmarks we confirm the power
    of our approach.

    Learning Compact Appearance Representation for Video-based Person Re-Identification

    Wei Zhang, Shengnan Hu, Kan Liu
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents a novel approach for video-based person re-identification
    using multiple Convolutional Neural Networks (CNNs). Unlike previous work, we
    intend to extract a compact yet discriminative appearance representation from
    several frames rather than the whole sequence. Specifically, given a video, the
    representative frames are selected based on the walking profile of consecutive
    frames. A multiple CNN architecture incorporated with feature pooling is
    proposed to learn and compile the features of the selected representative
    frames into a compact description about the pedestrian for identification.
    Experiments are conducted on benchmark datasets to demonstrate the superiority
    of the proposed method over existing person re-identification approaches.

    Visual Tracking by Reinforced Decision Making

    Janghoon Choi, Junseok Kwon, Kyoung Mu Lee
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    One of the major challenges of model-free visual tracking problem has been
    the difficulty originating from the unpredictable and drastic changes in the
    appearance of objects we target to track. Existing methods tackle this problem
    by updating the appearance model on-line in order to adapt to the changes in
    the appearance. Despite the success of these methods however, inaccurate and
    erroneous updates of the appearance model result in a tracker drift. In this
    paper, we introduce a novel visual tracking algorithm based on a template
    selection strategy constructed by deep reinforcement learning methods. The
    tracking algorithm utilizes this strategy to choose the best template for
    tracking a given frame. The template selection strategy is self-learned by
    utilizing a simple policy gradient method on numerous training episodes
    randomly generated from a tracking benchmark dataset. Our proposed
    reinforcement learning framework is generally applicable to other confidence
    map based tracking algorithms. The experiment shows that our tracking algorithm
    effectively decides the best template for visual tracking.

    Weighted Motion Averaging for the Registration of Multi-View Range Scans

    Rui Guo, Jihua Zhu, Yaochen Li, Zhongyu Li, Dapeng Chen, Yongqin Zhang
    Comments: 9 pages, 6 figures, 2tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Multi-view registration is a fundamental but challenging problem in 3D
    reconstruction and robot vision. Although the original motion averaging
    algorithm has been introduced as an effective means to solve the multi-view
    registration problem, it does not consider the reliability and accuracy of each
    relative motion. Accordingly, this paper proposes a novel motion averaging
    algorithm for multi-view registration. Firstly, it utilizes the pair-wise
    registration algorithm to estimate the relative motion and overlapping
    percentage of each scan pair with a certain degree of overlap. With the
    overlapping percentage available, it views the overlapping percentage as the
    corresponding weight of each scan pair and proposes the weight motion averaging
    algorithm, which can pay more attention to reliable and accurate relative
    motions. By treating each relative motion distinctively, more accurate
    registration can be achieved by applying the weighted motion averaging to
    multi-view range scans. Experimental results demonstrate the superiority of our
    proposed approach compared with the state-of-the-art methods in terms of
    accuracy, robustness and efficiency.

    The Power of Sparsity in Convolutional Neural Networks

    Soravit Changpinyo, Mark Sandler, Andrey Zhmoginov
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep convolutional networks are well-known for their high computational and
    memory demands. Given limited resources, how does one design a network that
    balances its size, training time, and prediction accuracy? A surprisingly
    effective approach to trade accuracy for size and speed is to simply reduce the
    number of channels in each convolutional layer by a fixed fraction and retrain
    the network. In many cases this leads to significantly smaller networks with
    only minimal changes to accuracy. In this paper, we take a step further by
    empirically examining a strategy for deactivating connections between filters
    in convolutional layers in a way that allows us to harvest savings both in
    run-time and memory for many network architectures. More specifically, we
    generalize 2D convolution to use a channel-wise sparse connection structure and
    show that this leads to significantly better results than the baseline approach
    for large networks including VGG and Inception V3.

    Learning to Generate Posters of Scientific Papers by Probabilistic Graphical Models

    Yu-ting Qiang, Yanwei Fu, Xiao Yu, Yanwen Guo, Zhi-Hua Zhou, Leonid Sigal
    Comments: 10 pages, submission to IEEE TPAMI. arXiv admin note: text overlap with arXiv:1604.01219
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Human-Computer Interaction (cs.HC); Multimedia (cs.MM)

    Researchers often summarize their work in the form of scientific posters.
    Posters provide a coherent and efficient way to convey core ideas expressed in
    scientific papers. Generating a good scientific poster, however, is a complex
    and time consuming cognitive task, since such posters need to be readable,
    informative, and visually aesthetic. In this paper, for the first time, we
    study the challenging problem of learning to generate posters from scientific
    papers. To this end, a data-driven framework, that utilizes graphical models,
    is proposed. Specifically, given content to display, the key elements of a good
    poster, including attributes of each panel and arrangements of graphical
    elements are learned and inferred from data. During the inference stage, an MAP
    inference framework is employed to incorporate some design principles. In order
    to bridge the gap between panel attributes and the composition within each
    panel, we also propose a recursive page splitting algorithm to generate the
    panel layout for a poster. To learn and validate our model, we collect and
    release a new benchmark dataset, called NJU-Fudan Paper-Poster dataset, which
    consists of scientific papers and corresponding posters with exhaustively
    labelled panels and attributes. Qualitative and quantitative results indicate
    the effectiveness of our approach.

    Efficient Dense Labeling of Human Activity Sequences from Wearables using Fully Convolutional Networks

    Rui Yao, Guosheng Lin, Qinfeng Shi, Damith Ranasinghe
    Comments: 7 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)

    Recognizing human activities in a sequence is a challenging area of research
    in ubiquitous computing. Most approaches use a fixed size sliding window over
    consecutive samples to extract features—either handcrafted or learned
    features—and predict a single label for all samples in the window. Two key
    problems emanate from this approach: i) the samples in one window may not
    always share the same label. Consequently, using one label for all samples
    within a window inevitably lead to loss of information; ii) the testing phase
    is constrained by the window size selected during training while the best
    window size is difficult to tune in practice. We propose an efficient algorithm
    that can predict the label of each sample, which we call dense labeling, in a
    sequence of human activities of arbitrary length using a fully convolutional
    network. In particular, our approach overcomes the problems posed by the
    sliding window step. Additionally, our algorithm learns both the features and
    classifier automatically. We release a new daily activity dataset based on a
    wearable sensor with hospitalized patients. We conduct extensive experiments
    and demonstrate that our proposed approach is able to outperform the
    state-of-the-arts in terms of classification and label misalignment measures on
    three challenging datasets: Opportunity, Hand Gesture, and our new dataset.

    Forest understory trees revealed using sufficiently dense airborne laser scanning point clouds

    Hamid Hamraz, Marco A. Contreras, Jun Zhang
    Comments: arXiv admin note: text overlap with arXiv:1701.00169
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computational Geometry (cs.CG)

    Airborne laser scanning (lidar) point clouds can be process to extract
    tree-level information over large forested landscapes. Existing procedures
    typically detect more than 90% of overstory trees, yet they barely detect 60%
    of understory trees because of reduced number of lidar points penetrating the
    top canopy layer. Although understory trees provide limited financial value,
    they offer habitat for numerous wildlife species and are important for stand
    development. Here we model tree identification accuracy according to point
    cloud density by decomposing lidar point cloud into overstory and multiple
    understory canopy layers, estimating the fraction of points representing the
    different layers, and inspecting tree identification accuracy as a function of
    point density. We show at a density of about 170 pt/m2 understory tree
    identification accuracy likely plateaus, which we regard as the required point
    density for reasonable identification of understory trees. Given the
    advancements of lidar sensor technology, point clouds can feasibly reach the
    required density to enable effective identification of individual understory
    trees, ultimately making remote quantification of forest resources more
    accurate. The layer decomposition methodology can also be adopted for other
    similar remote sensing or advanced imaging applications such as geological
    subsurface modelling or biomedical tissue analysis.

    Developing a comprehensive framework for multimodal feature extraction

    Quinten McNamara, Alejandro de la Vega, Tal Yarkoni
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Learning (cs.LG); Multimedia (cs.MM)

    Feature extraction is a critical component of many applied data science
    workflows. In recent years, rapid advances in artificial intelligence and
    machine learning have led to an explosion of feature extraction tools and
    services that allow data scientists to cheaply and effectively annotate their
    data along a vast array of dimensions—ranging from detecting faces in images
    to analyzing the sentiment expressed in coherent text. Unfortunately, the
    proliferation of powerful feature extraction services has been mirrored by a
    corresponding expansion in the number of distinct interfaces to feature
    extraction services. In a world where nearly every new service has its own API,
    documentation, and/or client library, data scientists who need to combine
    diverse features obtained from multiple sources are often forced to write and
    maintain ever more elaborate feature extraction pipelines. To address this
    challenge, we introduce a new open-source framework for comprehensive
    multimodal feature extraction. Pliers is an open-source Python package that
    supports standardized annotation of diverse data types (video, images, audio,
    and text), and is expressly with both ease-of-use and extensibility in mind.
    Users can apply a wide range of pre-existing feature extraction tools to their
    data in just a few lines of Python code, and can also easily add their own
    custom extractors by writing modular classes. A graph-based API enables rapid
    development of complex feature extraction pipelines that output results in a
    single, standardized format. We describe the package’s architecture, detail its
    major advantages over previous feature extraction toolboxes, and use a sample
    application to a large functional MRI dataset to illustrate how pliers can
    significantly reduce the time and effort required to construct sophisticated
    feature extraction workflows while increasing code clarity and maintainability.

    Derivative Based Focal Plane Array Nonuniformity Correction

    G. Ness, A. Oved, I. Kakon
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents a fast and robust method for fixed pattern noise
    nonuniformity correction of infrared focal plane arrays. The proposed method
    requires neither shutter nor elaborate calibrations and therefore enables a
    real time correction with no interruptions. Based on derivative estimation of
    the fixed pattern noise from pixel sized translations of the focal plane array,
    the proposed method has the advantages of being invariant to the noise
    magnitude and robust to unknown camera and inter-scene movements while being
    virtually transparent to the end-user.

    Online Representation Learning with Multi-layer Hebbian Networks for Image Classification Tasks

    Yanis Bahroun, Andrea Soltoggio
    Comments: 8 pages, 6 figures
    Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)

    Unsupervised learning allows algorithms to adapt to different data thanks to
    the autonomous discovery of discriminating features during the training. When
    these algorithms are reducible to cost-function minimisation, better
    interpretations of their learning dynamics are possible. Recently, new
    Hebbian-like plasticity, bio-inspired, local and unsupervised learning rules
    for neural networks, have been shown to minimise a cost-function while
    performing online sparse representation learning. However, it is unclear to
    what degree such rules are effective to learn features from images. To
    investigate this point, this study introduces a novel multi-layer Hebbian
    network trained by a rule derived from a non-negative classical
    multidimensional scaling cost-function. The performance is compared to that of
    other fully unsupervised learning algorithms.

    A 7.663-TOPS 8.2-W Energy-efficient FPGA Accelerator for Binary Convolutional Neural Networks

    Yixing Li, Zichuan Liu, Kai Xu, Hao Yu, Fengbo Ren
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    FPGA-based hardware accelerators for convolutional neural networks (CNNs)
    have obtained great attentions due to their higher energy efficiency than GPUs.
    However, it is challenging for FPGA-based solutions to achieve a higher
    throughput than GPU counterparts. In this paper, we demonstrate that FPGA
    acceleration can be a superior solution in terms of both throughput and energy
    efficiency when a CNN is trained with binary constraints on weights and
    activations. Specifically, we propose an optimized accelerator architecture
    tailored for bitwise convolution and normalization that features massive
    spatial parallelism with deep pipelines stages. Experiment results show that
    the proposed architecture is 8.3x faster and 75x more energy-efficient than a
    Titan X GPU for processing online individual requests (in small batch size).
    For processing static data (in large batch size), the proposed solution is on a
    par with a Titan X GPU in terms of throughput while delivering 9.5x higher
    energy efficiency.

    Deep Geometric Retrieval

    Y Qian, E Vazquez, B Sengupta
    Subjects: Information Retrieval (cs.IR); Computer Vision and Pattern Recognition (cs.CV)

    Comparing images in order to recommend items from an image-inventory is a
    subject of continued interest. Added with the scalability of deep-learning
    architectures the once `manual’ job of hand-crafting features have been largely
    alleviated, and images can be compared according to features generated from a
    deep convolutional neural network. In this paper, we compare distance metrics
    (and divergences) to rank features generated from a neural network, for
    content-based image retrieval. Specifically, after modelling individual images
    using approximations of mixture models or sparse covariance estimators we
    resort to their information-theoretic and Riemann geometric comparisons. We
    show that using approximations of mixture models enable us to to compute a
    distance measure based on the Wasserstein metric that requires less effort than
    computationally intensive optimal transport plans; finally, an affine invariant
    metric is used to compare the optimal transport metric to its Riemann geometric
    counterpart — we conclude that although expensive, retrieval metric based on
    Wasserstein geometry are more suitable than information theoretic comparison of
    images. In short, we combine GPU scalability in learning deep feature vectors
    with computationally efficient metrics that we foresee being utilized in a
    commercial setting.

    Is Saki #delicious? The Food Perception Gap on Instagram and Its Relation to Health

    Ferda Ofli, Yusuf Aytar, Ingmar Weber, Raggi al Hammouri, Antonio Torralba
    Comments: This is a pre-print of our paper accepted to appear in the Proceedings of 2017 International World Wide Web Conference (WWW’17)
    Subjects: Computers and Society (cs.CY); Computer Vision and Pattern Recognition (cs.CV); Social and Information Networks (cs.SI)

    Food is an integral part of our life and what and how much we eat crucially
    affects our health. Our food choices largely depend on how we perceive certain
    characteristics of food, such as whether it is healthy, delicious or if it
    qualifies as a salad. But these perceptions differ from person to person and
    one person’s “single lettuce leaf” might be another person’s “side salad”.
    Studying how food is perceived in relation to what it actually is typically
    involves a laboratory setup. Here we propose to use recent advances in image
    recognition to tackle this problem. Concretely, we use data for 1.9 million
    images from Instagram from the US to look at systematic differences in how a
    machine would objectively label an image compared to how a human subjectively
    does. We show that this difference, which we call the “perception gap”, relates
    to a number of health outcomes observed at the county level. To the best of our
    knowledge, this is the first time that image recognition is being used to study
    the “misalignment” of how people describe food images vs. what they actually
    depict.

    Projection based advanced motion model for cubic mapping for 360-degree video

    Li Li, Zhu Li, Madhukar Budagavi, Houqiang Li
    Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)

    This paper proposes a novel advanced motion model to handle the irregular
    motion for the cubic map projection of 360-degree video. Since the irregular
    motion is mainly caused by the projection from the sphere to the cube map, we
    first try to project the pixels in both the current picture and reference
    picture from unfolding cube back to the sphere. Then through utilizing the
    characteristic that most of the motions in the sphere are uniform, we can
    derive the relationship between the motion vectors of various pixels in the
    unfold cube. The proposed advanced motion model is implemented in the High
    Efficiency Video Coding reference software. Experimental results demonstrate
    that quite obvious performance improvement can be achieved for the sequences
    with obvious motions.


    Artificial Intelligence

    Delving Deeper into MOOC Student Dropout Prediction

    Jacob Whitehill, Kiran Mohan, Daniel Seaton, Yigal Rosen, Dustin Tingley
    Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

    In order to obtain reliable accuracy estimates for automatic MOOC dropout
    predictors, it is important to train and test them in a manner consistent with
    how they will be used in practice. Yet most prior research on MOOC dropout
    prediction has measured test accuracy on the same course used for training the
    classifier, which can lead to overly optimistic accuracy estimates. In order to
    understand better how accuracy is affected by the training+testing regime, we
    compared the accuracy of a standard dropout prediction architecture
    (clickstream features + logistic regression) across 4 different training
    paradigms. Results suggest that (1) training and testing on the same course
    (“post-hoc”) can overestimate accuracy by several percentage points; (2)
    dropout classifiers trained on proxy labels based on students’ persistence are
    surprisingly competitive with post-hoc training (87.33% versus 90.20% AUC
    averaged over 8 weeks of 40 HarvardX MOOCs); and (3) classifier performance
    does not vary significantly with the academic discipline. Finally, we also
    research new dropout prediction architectures based on deep, fully-connected,
    feed-forward neural networks and find that (4) networks with as many as 5
    hidden layers can statistically significantly increase test accuracy over that
    of logistic regression.

    Towards a Common Implementation of Reinforcement Learning for Multiple Robotic Tasks

    Angel Martínez-Tenor, Juan Antonio Fernández-Madrigal, Ana Cruz-Martín, Javier González-Jiménez
    Comments: 15 pages, 10 figures, 7 tables. To be published in a scientific journal
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO)

    Mobile robots are increasingly being employed for performing complex tasks in
    dynamic environments. Reinforcement learning (RL) methods are recognized to be
    promising for specifying such tasks in a relatively simple manner. However, the
    strong dependency between the learning method and the task to learn is a
    well-known problem that restricts practical implementations of RL in robotics,
    often requiring major modifications of parameters and adding other techniques
    for each particular task. In this paper we present a practical core
    implementation of RL which enables the learning process for multiple robotic
    tasks with minimal per-task tuning or none. Based on value iteration methods,
    this implementation includes a novel approach for action selection, called
    Q-biased softmax regression (QBIASSR), which avoids poor performance of the
    learning process when the robot reaches new unexplored states. Our approach
    takes advantage of the structure of the state space by attending the physical
    variables involved (e.g., distances to obstacles, X,Y,{ heta} pose, etc.),
    thus experienced sets of states may favor the decision-making process of
    unexplored or rarely-explored states. This improvement has a relevant role in
    reducing the tuning of the algorithm for particular tasks. Experiments with
    real and simulated robots, performed with the software framework also
    introduced here, show that our implementation is effectively able to learn
    different robotic tasks without tuning the learning method. Results also
    suggest that the combination of true online SARSA({lambda}) with QBIASSR can
    outperform the existing RL core algorithms in low-dimensional robotic tasks.

    Sample Efficient Policy Search for Optimal Stopping Domains

    Karan Goel, Christoph Dann, Emma Brunskill
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    Arising naturally in many fields, optimal stopping problems consider the
    question of deciding when to stop an observation-generating process. We examine
    the problem of simultaneously learning and planning in such domains, when data
    is collected directly from the environment. We propose GFSE, a simple and
    flexible model-free policy search method that reuses data for sample efficiency
    by leveraging problem structure. We bound the sample complexity of our approach
    to guarantee uniform convergence of policy value estimates, tightening existing
    PAC bounds to achieve logarithmic dependence on horizon length for our setting.
    We also examine the benefit of our method against prevalent model-based and
    model-free approaches on 3 domains taken from diverse fields.

    Beating the World's Best at Super Smash Bros. with Deep Reinforcement Learning

    Vlad Firoiu, William F. Whitney, Joshua B. Tenenbaum
    Comments: Submitted to IJCAI 2017
    Subjects: Artificial Intelligence (cs.AI)

    There has been a recent explosion in the capabilities of game-playing
    artificial intelligence. Many classes of RL tasks, from Atari games to motor
    control to board games, are now solvable by fairly generic algorithms, based on
    deep learning, that learn to play from experience with minimal knowledge of the
    specific domain of interest. In this work, we will investigate the performance
    of these methods on Super Smash Bros. Melee (SSBM), a popular console fighting
    game. The SSBM environment has complex dynamics and partial observability,
    making it challenging for human and machine alike. The multi-player aspect
    poses an additional challenge, as the vast majority of recent advances in RL
    have focused on single-agent environments. Nonetheless, we will show that it is
    possible to train agents that are competitive against and even surpass human
    professionals, a new result for the multi-player video game setting.

    The Dialog State Tracking Challenge with Bayesian Approach

    Quan Nguyen
    Subjects: Artificial Intelligence (cs.AI)

    Generative model has been one of the most common approaches for solving the
    Dialog State Tracking Problem with the capabilities to model the dialog
    hypotheses in an explicit manner. The most important task in such Bayesian
    networks models is constructing the most reliable user models by learning and
    reflecting the training data into the probability distribution of user actions
    conditional on networks states. This paper provides an overall picture of the
    learning process in a Bayesian framework with an emphasize on the
    state-of-the-art theoretical analyses of the Expectation Maximization learning
    algorithm.

    Synthesizing Imperative Programs for Introductory Programming Assignments

    Sunbeom So, Hakjoo Oh
    Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI)

    We present a novel algorithm that synthesizes imperative programs for
    introductory programming courses. Given a set of input-output examples and a
    partial program, our algorithm generates a complete program that is consistent
    with every example. Our key idea is to combine enumerative program synthesis
    and static analysis, which aggressively prunes out a large search space while
    guaranteeing to find, if any, a correct solution. We have implemented our
    algorithm in a tool, called SIMPL, and evaluated it on 30 problems used in
    introductory programming courses. The results show that SIMPL is able to solve
    the benchmark problems in 6.6 seconds on average.

    Survey of Reasoning using Neural networks

    Amit Sahu
    Comments: 12 pages
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    Reason and inference require process as well as memory skills by humans.
    Neural networks are able to process tasks like image recognition (better than
    humans) but in memory aspects are still limited (by attention mechanism, size).
    Recurrent Neural Network (RNN) and it’s modified version LSTM are able to solve
    small memory contexts, but as context becomes larger than a threshold, it is
    difficult to use them. The Solution is to use large external memory. Still, it
    poses many challenges like, how to train neural networks for discrete memory
    representation, how to describe long term dependencies in sequential data etc.
    Most prominent neural architectures for such tasks are Memory networks:
    inference components combined with long term memory and Neural Turing Machines:
    neural networks using external memory resources. Also, additional techniques
    like attention mechanism, end to end gradient descent on discrete memory
    representation are needed to support these solutions. Preliminary results of
    above neural architectures on simple algorithms (sorting, copying) and Question
    Answering (based on story, dialogs) application are comparable with the state
    of the art. In this paper, I explain these architectures (in general), the
    additional techniques used and the results of their application.


    Information Retrieval

    Algorithmes de classification et d'optimisation: participation du LIA/ADOC á DEFT'14

    Luis Adrián Cabrera-Diego, Stéphane Huet, Bassam Jabaian, Alejandro Molina, Juan-Manuel Torres-Moreno, Marc El-Bèze, Barthélémy Durette
    Comments: 8 pages, 3 tables, Conference paper (in French)
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    This year, the DEFT campaign (D’efi Fouilles de Textes) incorporates a task
    which aims at identifying the session in which articles of previous TALN
    conferences were presented. We describe the three statistical systems developed
    at LIA/ADOC for this task. A fusion of these systems enables us to obtain
    interesting results (micro-precision score of 0.76 measured on the test corpus)

    Efficient Social Network Multilingual Classification using Character, POS n-grams and Dynamic Normalization

    Carlos-Emiliano González-Gallardo, Juan-Manuel Torres-Moreno, Azucena Montes Rendón, Gerardo Sierra
    Comments: 8 pages, 6 figures, Conference paper
    Journal-ref: Proceedings of the 8th International Joint Conference on Knowledge
    Discovery, Knowledge Engineering and Knowledge Management, Vol 1: KDIR,
    307-314, 2016, Porto, Portugal
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Social and Information Networks (cs.SI)

    In this paper we describe a dynamic normalization process applied to social
    network multilingual documents (Facebook and Twitter) to improve the
    performance of the Author profiling task for short texts. After the
    normalization process, (n)-grams of characters and n-grams of POS tags are
    obtained to extract all the possible stylistic information encoded in the
    documents (emoticons, character flooding, capital letters, references to other
    users, hyperlinks, hashtags, etc.). Experiments with SVM showed up to 90% of
    performance.

    Deep Geometric Retrieval

    Y Qian, E Vazquez, B Sengupta
    Subjects: Information Retrieval (cs.IR); Computer Vision and Pattern Recognition (cs.CV)

    Comparing images in order to recommend items from an image-inventory is a
    subject of continued interest. Added with the scalability of deep-learning
    architectures the once `manual’ job of hand-crafting features have been largely
    alleviated, and images can be compared according to features generated from a
    deep convolutional neural network. In this paper, we compare distance metrics
    (and divergences) to rank features generated from a neural network, for
    content-based image retrieval. Specifically, after modelling individual images
    using approximations of mixture models or sparse covariance estimators we
    resort to their information-theoretic and Riemann geometric comparisons. We
    show that using approximations of mixture models enable us to to compute a
    distance measure based on the Wasserstein metric that requires less effort than
    computationally intensive optimal transport plans; finally, an affine invariant
    metric is used to compare the optimal transport metric to its Riemann geometric
    counterpart — we conclude that although expensive, retrieval metric based on
    Wasserstein geometry are more suitable than information theoretic comparison of
    images. In short, we combine GPU scalability in learning deep feature vectors
    with computationally efficient metrics that we foresee being utilized in a
    commercial setting.

    SAR: A Semantic Analysis Approach for Recommendation

    Han Xiao, Minlie Huang, Xiaoyan Zhu
    Subjects: Information Retrieval (cs.IR); Learning (cs.LG)

    Recommendation system is a common demand in daily life and matrix completion
    is a widely adopted technique for this task. However, most matrix completion
    methods lack semantic interpretation and usually result in weak-semantic
    recommendations. To this end, this paper proposes a {f S}emantic {f
    A}nalysis approach for {f R}ecommendation systems extbf{(SAR)}, which
    applies a two-level hierarchical generative process that assigns semantic
    properties and categories for user and item. SAR learns semantic
    representations of users/items merely from user ratings on items, which offers
    a new path to recommendation by semantic matching with the learned
    representations. Extensive experiments demonstrate SAR outperforms other
    state-of-the-art baselines substantially.

    MOLIERE: Automatic Biomedical Hypothesis Generation System

    Justin Sybrandt, Michael Shtutman, Ilya Safro
    Subjects: Information Retrieval (cs.IR); Digital Libraries (cs.DL); Social and Information Networks (cs.SI); Quantitative Methods (q-bio.QM); Other Statistics (stat.OT)

    Hypothesis generation is becoming a crucial time-saving technique which
    allows biomedical researchers to quickly discover implicit connections between
    important concepts. Typically, these systems operate on domain-specific
    fractions of public medical data. MOLIERE, in contrast, utilizes information
    from over 24.5 million documents. At the heart of our approach lies a
    multi-modal and multi-relational network of biomedical objects extracted from
    several heterogeneous datasets from the National Center for Biotechnology
    Information (NCBI). These objects include but are not limited to scientific
    papers, keywords, genes, proteins, diseases, and diagnoses. We model hypotheses
    using Latent Dirichlet Allocation applied on abstracts found near shortest
    paths discovered within this network, and demonstrate the effectiveness of
    MOLIERE by performing hypothesis generation on historical data. Our network,
    implementation, and resulting data are all publicly available for the broad
    scientific community.

    Systèmes du LIA à DEFT'13

    Xavier Bost, Ilaria Brunetti, Luis Adrián Cabrera-Diego, Jean-Valère Cossu, Andréa Linhares, Mohamed Morchid, Juan-Manuel Torres-Moreno, Marc El-Bèze, Richard Dufour
    Comments: 12 pages, 3 tables, (Paper in French)
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    The 2013 D’efi de Fouille de Textes (DEFT) campaign is interested in two
    types of language analysis tasks, the document classification and the
    information extraction in the specialized domain of cuisine recipes. We present
    the systems that the LIA has used in DEFT 2013. Our systems show interesting
    results, even though the complexity of the proposed tasks.

    Filtering Tweets for Social Unrest

    Alan Mishler, Kevin Wonus, Wendy Chambers, Michael Bloodgood
    Comments: 7 pages, 8 figures, 3 tables; to appear in Proceedings of the Eleventh IEEE International Conference on Semantic Computing (ICSC 2017), San Diego, California, 2017
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)

    Since the events of the Arab Spring, there has been increased interest in
    using social media to anticipate social unrest. While efforts have been made
    toward automated unrest prediction, we focus on filtering the vast volume of
    tweets to identify tweets relevant to unrest, which can be provided to
    downstream users for further analysis. We train a supervised classifier that is
    able to label Arabic language tweets as relevant to unrest with high
    reliability. We examine the relationship between training data size and
    performance and investigate ways to optimize the model building process while
    minimizing cost. We also explore how confidence thresholds can be set to
    achieve desired levels of performance.

    Robust Phase Retrieval via ADMM with Outliers

    Xue Jiang, H. C. So, X. Liu
    Subjects: Information Theory (cs.IT); Information Retrieval (cs.IR)

    An outlier-resistance phase retrieval algorithm based on alternating
    direction method of multipliers (ADMM) is devised in this letter. Instead of
    the widely used least squares criterion that is only optimal for Gaussian noise
    environment, we adopt the least absolute deviation criterion to enhance the
    robustness against outliers. Considering both intensity- and amplitude-based
    observation models, the framework of ADMM is developed to solve the resulting
    non-differentiable optimization problems. It is demonstrated that the core
    subproblem of ADMM is the proximity operator of the L1-norm, which can be
    computed efficiently by soft-thresholding in each iteration. Simulation results
    are provided to validate the accuracy and efficiency of the proposed approach
    compared to the existing schemes.

    Developing a comprehensive framework for multimodal feature extraction

    Quinten McNamara, Alejandro de la Vega, Tal Yarkoni
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Learning (cs.LG); Multimedia (cs.MM)

    Feature extraction is a critical component of many applied data science
    workflows. In recent years, rapid advances in artificial intelligence and
    machine learning have led to an explosion of feature extraction tools and
    services that allow data scientists to cheaply and effectively annotate their
    data along a vast array of dimensions—ranging from detecting faces in images
    to analyzing the sentiment expressed in coherent text. Unfortunately, the
    proliferation of powerful feature extraction services has been mirrored by a
    corresponding expansion in the number of distinct interfaces to feature
    extraction services. In a world where nearly every new service has its own API,
    documentation, and/or client library, data scientists who need to combine
    diverse features obtained from multiple sources are often forced to write and
    maintain ever more elaborate feature extraction pipelines. To address this
    challenge, we introduce a new open-source framework for comprehensive
    multimodal feature extraction. Pliers is an open-source Python package that
    supports standardized annotation of diverse data types (video, images, audio,
    and text), and is expressly with both ease-of-use and extensibility in mind.
    Users can apply a wide range of pre-existing feature extraction tools to their
    data in just a few lines of Python code, and can also easily add their own
    custom extractors by writing modular classes. A graph-based API enables rapid
    development of complex feature extraction pipelines that output results in a
    single, standardized format. We describe the package’s architecture, detail its
    major advantages over previous feature extraction toolboxes, and use a sample
    application to a large functional MRI dataset to illustrate how pliers can
    significantly reduce the time and effort required to construct sophisticated
    feature extraction workflows while increasing code clarity and maintainability.


    Computation and Language

    Systèmes du LIA à DEFT'13

    Xavier Bost, Ilaria Brunetti, Luis Adrián Cabrera-Diego, Jean-Valère Cossu, Andréa Linhares, Mohamed Morchid, Juan-Manuel Torres-Moreno, Marc El-Bèze, Richard Dufour
    Comments: 12 pages, 3 tables, (Paper in French)
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    The 2013 D’efi de Fouille de Textes (DEFT) campaign is interested in two
    types of language analysis tasks, the document classification and the
    information extraction in the specialized domain of cuisine recipes. We present
    the systems that the LIA has used in DEFT 2013. Our systems show interesting
    results, even though the complexity of the proposed tasks.

    Multi-task Learning with CTC and Segmental CRF for Speech Recognition

    Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith
    Comments: 5 pages, 2 figures, submitted to Interspeech 2017
    Subjects: Computation and Language (cs.CL)

    Segmental conditional random fields (SCRFs) and connectionist temporal
    classification (CTC) are two sequence labeling objectives used for end-to-end
    training of speech recognition models. Both models define the transcription
    probability by marginalizing decisions about latent segmentation alternatives
    to derive a sequence probability: the former uses a globally normalized joint
    model of segment labels and durations, and the latter classifies each frame as
    either an output symbol or a “continuation” of the previous label. In this
    paper, we train a recognition model by optimizing an interpolation between the
    SCRF and CTC losses, where the same recurrent neural network (RNN) encoder used
    for feature extraction for both outputs. We find that this multi-task objective
    improves recognition accuracy when decoding with either the SCRF or CTC models.
    Additionally, we show that CTC can also be used to pretrain the RNN encoder,
    which improves the convergence rate when learning the joint model.

    Hybrid Dialog State Tracker with ASR Features

    Miroslav Vodolán, Rudolf Kadlec, Jan Kleindienst
    Comments: Accepted to EACL 2017
    Subjects: Computation and Language (cs.CL)

    This paper presents a hybrid dialog state tracker enhanced by trainable
    Spoken Language Understanding (SLU) for slot-filling dialog systems. Our
    architecture is inspired by previously proposed neural-network-based
    belief-tracking systems. In addition, we extended some parts of our modular
    architecture with differentiable rules to allow end-to-end training. We
    hypothesize that these rules allow our tracker to generalize better than pure
    machine-learning based systems. For evaluation, we used the Dialog State
    Tracking Challenge (DSTC) 2 dataset – a popular belief tracking testbed with
    dialogs from restaurant information system. To our knowledge, our hybrid
    tracker sets a new state-of-the-art result in three out of four categories
    within the DSTC2.

    Reinforcement Learning Based Argument Component Detection

    Yang Gao, Hao Wang, Chen Zhang, Wei Wang
    Subjects: Computation and Language (cs.CL)

    Argument component detection (ACD) is an important sub-task in argumentation
    mining. ACD aims at detecting and classifying different argument components in
    natural language texts. Historical annotations (HAs) are important features the
    human annotators consider when they manually perform the ACD task. However, HAs
    are largely ignored by existing automatic ACD techniques. Reinforcement
    learning (RL) has proven to be an effective method for using HAs in some
    natural language processing tasks. In this work, we propose a RL-based ACD
    technique, and evaluate its performance on two well-annotated corpora. Results
    suggest that, in terms of classification accuracy, HAs-augmented RL outperforms
    plain RL by at most 17.85%, and outperforms the state-of-the-art supervised
    learning algorithm by at most 11.94%.

    Learning to generate one-sentence biographies from Wikidata

    Andrew Chisholm, Will Radford, Ben Hachey
    Subjects: Computation and Language (cs.CL)

    We investigate the generation of one-sentence Wikipedia biographies from
    facts derived from Wikidata slot-value pairs. We train a recurrent neural
    network sequence-to-sequence model with attention to select facts and generate
    textual summaries. Our model incorporates a novel secondary objective that
    helps ensure it generates sentences that contain the input facts. The model
    achieves a BLEU score of 41, improving significantly upon the vanilla
    sequence-to-sequence model and scoring roughly twice that of a simple template
    baseline. Human preference evaluation suggests the model is nearly as good as
    the Wikipedia reference. Manual analysis explores content selection, suggesting
    the model can trade the ability to infer knowledge against the risk of
    hallucinating incorrect information.

    Filtering Tweets for Social Unrest

    Alan Mishler, Kevin Wonus, Wendy Chambers, Michael Bloodgood
    Comments: 7 pages, 8 figures, 3 tables; to appear in Proceedings of the Eleventh IEEE International Conference on Semantic Computing (ICSC 2017), San Diego, California, 2017
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)

    Since the events of the Arab Spring, there has been increased interest in
    using social media to anticipate social unrest. While efforts have been made
    toward automated unrest prediction, we focus on filtering the vast volume of
    tweets to identify tweets relevant to unrest, which can be provided to
    downstream users for further analysis. We train a supervised classifier that is
    able to label Arabic language tweets as relevant to unrest with high
    reliability. We examine the relationship between training data size and
    performance and investigate ways to optimize the model building process while
    minimizing cost. We also explore how confidence thresholds can be set to
    achieve desired levels of performance.

    Enabling Multi-Source Neural Machine Translation By Concatenating Source Sentences In Multiple Languages

    Raj Dabre, Fabien Cromieres, Sadao Kurohashi
    Comments: Work in progress. Augmented manuscripts to follow
    Subjects: Computation and Language (cs.CL)

    In this paper, we propose a novel and elegant solution to “Multi-Source
    Neural Machine Translation” (MSNMT) which only relies on preprocessing a N-way
    multilingual corpus without modifying the Neural Machine Translation (NMT)
    architecture or training procedure. We simply concatenate the source sentences
    to form a single long multi-source input sentence while keeping the target side
    sentence as it is and train an NMT system using this augmented corpus. We
    evaluate our method in a low resource, general domain setting and show its
    effectiveness (+2 BLEU using 2 source languages and +6 BLEU using 5 source
    languages) along with some insights on how the NMT system leverages
    multilingual information in such a scenario by visualizing attention.

    Algorithmes de classification et d'optimisation: participation du LIA/ADOC á DEFT'14

    Luis Adrián Cabrera-Diego, Stéphane Huet, Bassam Jabaian, Alejandro Molina, Juan-Manuel Torres-Moreno, Marc El-Bèze, Barthélémy Durette
    Comments: 8 pages, 3 tables, Conference paper (in French)
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    This year, the DEFT campaign (D’efi Fouilles de Textes) incorporates a task
    which aims at identifying the session in which articles of previous TALN
    conferences were presented. We describe the three statistical systems developed
    at LIA/ADOC for this task. A fusion of these systems enables us to obtain
    interesting results (micro-precision score of 0.76 measured on the test corpus)

    Efficient Social Network Multilingual Classification using Character, POS n-grams and Dynamic Normalization

    Carlos-Emiliano González-Gallardo, Juan-Manuel Torres-Moreno, Azucena Montes Rendón, Gerardo Sierra
    Comments: 8 pages, 6 figures, Conference paper
    Journal-ref: Proceedings of the 8th International Joint Conference on Knowledge
    Discovery, Knowledge Engineering and Knowledge Management, Vol 1: KDIR,
    307-314, 2016, Porto, Portugal
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Social and Information Networks (cs.SI)

    In this paper we describe a dynamic normalization process applied to social
    network multilingual documents (Facebook and Twitter) to improve the
    performance of the Author profiling task for short texts. After the
    normalization process, (n)-grams of characters and n-grams of POS tags are
    obtained to extract all the possible stylistic information encoded in the
    documents (emoticons, character flooding, capital letters, references to other
    users, hyperlinks, hashtags, etc.). Experiments with SVM showed up to 90% of
    performance.


    Distributed, Parallel, and Cluster Computing

    A 7.663-TOPS 8.2-W Energy-efficient FPGA Accelerator for Binary Convolutional Neural Networks

    Yixing Li, Zichuan Liu, Kai Xu, Hao Yu, Fengbo Ren
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    FPGA-based hardware accelerators for convolutional neural networks (CNNs)
    have obtained great attentions due to their higher energy efficiency than GPUs.
    However, it is challenging for FPGA-based solutions to achieve a higher
    throughput than GPU counterparts. In this paper, we demonstrate that FPGA
    acceleration can be a superior solution in terms of both throughput and energy
    efficiency when a CNN is trained with binary constraints on weights and
    activations. Specifically, we propose an optimized accelerator architecture
    tailored for bitwise convolution and normalization that features massive
    spatial parallelism with deep pipelines stages. Experiment results show that
    the proposed architecture is 8.3x faster and 75x more energy-efficient than a
    Titan X GPU for processing online individual requests (in small batch size).
    For processing static data (in large batch size), the proposed solution is on a
    par with a Titan X GPU in terms of throughput while delivering 9.5x higher
    energy efficiency.

    Edge-Fog Cloud: A Distributed Cloud for Internet of Things Computations

    Nitinder Mohan, Jussi Kangasharju
    Comments: Published in IEEE 2nd Conference on Cloudification of Internet of Things (CIoT) – 2016, Paris, France
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Internet of Things typically involves a significant number of smart sensors
    sensing information from the environment and sharing it to a cloud service for
    processing. Various architectural abstractions, such as Fog and Edge computing,
    have been proposed to localize some of the processing near the sensors and away
    from the central cloud servers. In this paper, we propose Edge-Fog Cloud which
    distributes task processing on the participating cloud resources in the
    network. We develop the Least Processing Cost First (LPCF) method for assigning
    the processing tasks to nodes which provide the optimal processing time and
    near optimal networking costs. We evaluate LPCF in a variety of scenarios and
    demonstrate its effectiveness in finding the processing task assignments.

    Demystifying Fog Computing: Characterizing Architectures, Applications and Abstractions

    Prateeksha Varshney, Yogesh Simmhan
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Internet of Things (IoT) has accelerated the deployment of millions of
    sensors at the edge of the network, through Smart City infrastructure and
    lifestyle devices. Cloud computing platforms are often tasked with handling
    these large volumes and fast streams of data from the edge. Recently, Fog
    computing has emerged as a concept for low-latency and resource-rich processing
    of these observation streams, to complement Edge and Cloud computing. In this
    paper, we review various dimensions of system architecture, application
    characteristics and platform abstractions that are manifest in this Edge, Fog
    and Cloud eco-system. We highlight novel capabilities of the Edge and Fog
    layers, such as physical and application mobility, privacy sensitivity, and a
    nascent runtime environment. IoT application case studies based on first-hand
    experiences across diverse domains drive this categorization. We also highlight
    the gap between the potential and the reality of Fog computing, and identify
    challenges that need to be overcome for the solution to be sustainable.
    Together, our article can help platform and application developers bridge the
    gap that remains in making Fog computing viable.

    A Rollback in the History of Communication-Induced Checkpointing

    Islene C. Garcia, Gustavo M. D. Vieira, Luiz E. Buzato
    Comments: First draft
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    The literature on communication-induced checkpointing presents a family of
    protocols that use logical clocks to control whether forced checkpoints must be
    taken. For many years, HMNR, also called Fully Informed (FI), was the most
    complex and efficient protocol of this family. The Lazy-FI protocol applies a
    lazy strategy that defers the increase of logical clocks, resulting in a
    protocol with better perfomance for distributed systems where processes can
    take basic checkpoints at different, asymmetric, rates. Recently, the Fully
    Informed aNd Efficient (FINE) protocol was proposed using the same control
    structures as FI, but with a stronger and, presumably better,
    checkpoint-inducing condition. FINE and its lazy version, called Lazy-FINE,
    would now be the most efficient checkpointing protocols based on logical
    clocks. This paper reviews this family of protocols, proves a theorem on a
    condition that must be enforced by all stronger versions of FI, and proves that
    both FINE and Lazy-FINE do not guarantee the absence of useless checkpoints. As
    a consequence, FI and Lazy-FI can be rolled back to the position of most
    efficient protocols of this family of index-based checkpointing protocols.

    Computing Influence of a Product through Uncertain Reverse Skyline

    Md. Saiful Islam, Wenny Rahayu, Chengfei Liu, Tarique Anwar, Bela Stantic
    Comments: 12 pages, 3 tables, 12 figures, submitted to SSDBM 2017
    Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

    Understanding the influence of a product is crucially important for making
    informed business decisions. This paper introduces a new type of skyline
    queries, called uncertain reverse skyline, for measuring the influence of a
    probabilistic product in uncertain data settings. More specifically, given a
    dataset of probabilistic products P and a set of customers C, an uncertain
    reverse skyline of a probabilistic product q retrieves all customers c in C
    which include q as one of their preferred products. We present efficient
    pruning ideas and techniques for processing the uncertain reverse skyline query
    of a probabilistic product using R-Tree data index. We also present an
    efficient parallel approach to compute the uncertain reverse skyline and
    influence score of a probabilistic product. Our approach significantly
    outperforms the baseline approach derived from the existing literature. The
    efficiency of our approach is demonstrated by conducting extensive experiments
    with both real and synthetic datasets.

    Review of Apriori Based Algorithms on MapReduce Framework

    Sudhakar Singh, Rakhi Garg, P. K. Mishra
    Comments: 12 pages, 3 figures, ICC-2014
    Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)

    The Apriori algorithm that mines frequent itemsets is one of the most popular
    and widely used data mining algorithms. Now days many algorithms have been
    proposed on parallel and distributed platforms to enhance the performance of
    Apriori algorithm. They differ from each other on the basis of load balancing
    technique, memory system, data decomposition technique and data layout used to
    implement them. The problems with most of the distributed framework are
    overheads of managing distributed system and lack of high level parallel
    programming language. Also with grid computing there is always potential
    chances of node failures which cause multiple re-executions of tasks. These
    problems can be overcome by the MapReduce framework introduced by Google.
    MapReduce is an efficient, scalable and simplified programming model for large
    scale distributed data processing on a large cluster of commodity computers and
    also used in cloud computing. In this paper, we present the overview of
    parallel Apriori algorithm implemented on MapReduce framework. They are
    categorized on the basis of Map and Reduce functions used to implement them
    e.g. 1-phase vs. k-phase, I/O of Mapper, Combiner and Reducer, using
    functionality of Combiner inside Mapper etc. This survey discusses and analyzes
    the various implementations of Apriori on MapReduce framework on the basis of
    their distinguishing characteristics. Moreover, it also includes the advantages
    and limitations of MapReduce framework.


    Learning

    A Unified Optimization View on Generalized Matching Pursuit and Frank-Wolfe

    Francesco Locatello, Rajiv Khanna, Michael Tschannen, Martin Jaggi
    Comments: Accepted for presentation at AISTATS 2017, preliminary version
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Two of the most fundamental prototypes of greedy optimization are the
    matching pursuit and Frank-Wolfe algorithms. In this paper, we take a unified
    view on both classes of methods, leading to the first explicit convergence
    rates of matching pursuit methods in an optimization sense, for general sets of
    atoms. We derive sublinear ((1/t)) convergence for both classes on general
    smooth objectives, and linear convergence on strongly convex objectives, as
    well as a clear correspondence of algorithm variants. Our presented algorithms
    and rates are affine invariant, and do not need any incoherence or sparsity
    assumptions.

    Negative-Unlabeled Tensor Factorization for Location Category Inference from Inaccurate Mobility Data

    Jinfeng Yi, Qi Lei, Wesley Gifford, Ji Liu
    Subjects: Learning (cs.LG)

    Identifying significant location categories visited by mobile phone users is
    the key to a variety of applications. This is an extremely challenging task due
    to the possible deviation between the estimated location coordinate and the
    actual location, which could be on the order of kilometers. Using the collected
    location coordinate as the center and its associated location error as the
    radius, we can draw a location uncertainty circle that may cover multiple
    location categories, especially in densely populated areas. To estimate the
    actual location category more precisely, we propose a novel tensor
    factorization framework, through several key observations including the
    intrinsic correlations between users, to infer the most likely location
    categories within the location uncertainty circle. In addition, the proposed
    algorithm can also predict where users are even when there is no location
    update. In order to efficiently solve the proposed framework, we propose a
    parameter-free and scalable optimization algorithm by effectively exploring the
    sparse and low-rank structure of the tensor. Our empirical studies show that
    the proposed algorithm is both efficient and effective: it can solve problems
    with millions of users and billions of location updates, and also provides
    superior prediction accuracies on real-world location updates and check-in data
    sets.

    Positive-Unlabeled Demand-Aware Recommendation

    Jinfeng Yi, Cho-Jui Hsieh, Kush Varshney, Lijun Zhang, Yao Li
    Subjects: Learning (cs.LG)

    Recommendation for e-commerce with a mix of durable and nondurable goods has
    characteristics that distinguish it from the well-studied media recommendation
    problem. The demand for items is a combined effect of form utility and time
    utility, i.e., a product must both be intrinsically appealing to a consumer and
    the time must be right for purchase. In particular for durable goods, time
    utility is a function of inter-purchase duration within product category
    because consumers are unlikely to purchase two items in the same category in
    close temporal succession. Moreover, purchase data, in contrast to ratings
    data, is implicit with non-purchases not necessarily indicating dislike.
    Together, these issues give rise to the positive-unlabeled demand-aware
    recommendation problem that we pose via joint low-rank tensor completion and
    product category inter-purchase duration vector estimation. We further relax
    this problem and propose a highly scalable alternating minimization approach
    with which we can solve problems with millions of users and items. We also show
    superior prediction accuracies on multiple real-world data sets.

    Fast rates for online learning in Linearly Solvable Markov Decision Processes

    Gergely Neu, Vicenç Gómez
    Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

    We study the problem of online learning in a class of Markov decision
    processes known as linearly solvable MDPs. In the stationary version of this
    problem, a learner interacts with its environment by directly controlling the
    state transitions, attempting to balance a fixed state-dependent cost and a
    certain smooth cost penalizing extreme control inputs. In the current paper, we
    consider an online setting where the state costs may change arbitrarily between
    consecutive rounds, and the learner only observes the costs at the end of each
    respective round. We are interested in constructing algorithms for the learner
    that guarantee small regret against the best stationary control policy chosen
    in full knowledge of the cost sequence. Our main result is showing that the
    smoothness of the control cost enables the simple algorithm of following the
    leader to achieve a regret of order (log^2 T) after (T) rounds, vastly
    improving on the best known regret bound of order (T^{3/4}) for this setting.

    Convolution Aware Initialization

    Armen Aghajanyan
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Initialization of parameters in deep neural networks has been shown to have a
    big impact on the performance of the networks (Mishkin & Matas, 2015). The
    initialization scheme devised by He et al, allowed convolution activations to
    carry a constrained mean which allowed deep networks to be trained effectively
    (He et al., 2015a). Orthogonal initializations and more generally orthogonal
    matrices in standard recurrent networks have been proved to eradicate the
    vanishing and exploding gradient problem (Pascanu et al., 2012). Majority of
    current initialization schemes do not take fully into account the intrinsic
    structure of the convolution operator. This paper introduces a new type of
    initialization built around the duality of the Fourier transform and the
    convolution operator. With Convolution Aware Initialization we noticed not only
    higher accuracy and lower loss, but faster convergence in general. We achieve
    new state of the art on the CIFAR10 dataset, and achieve close to state of the
    art on various other tasks.

    Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection

    Emre Çakır, Giambattista Parascandolo, Toni Heittola, Heikki Huttunen, Tuomas Virtanen
    Comments: Accepted for IEEE Transactions on Audio, Speech and Language Processing, Special Issue on Sound Scene and Event Analysis
    Subjects: Learning (cs.LG); Sound (cs.SD)

    Sound events often occur in unstructured environments where they exhibit wide
    variations in their frequency content and temporal structure. Convolutional
    neural networks (CNN) are able to extract higher level features that are
    invariant to local spectral and temporal variations. Recurrent neural networks
    (RNNs) are powerful in learning the longer term temporal context in the audio
    signals. CNNs and RNNs as classifiers have recently shown improved performances
    over established methods in various sound recognition tasks. We combine these
    two approaches in a Convolutional Recurrent Neural Network (CRNN) and apply it
    on a polyphonic sound event detection task. We compare the performance of the
    proposed CRNN method with CNN, RNN, and other established methods, and observe
    a considerable improvement for four different datasets consisting of everyday
    sound events.

    Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox

    Jialei Wang, Weiran Wang, Nathan Srebro
    Subjects: Learning (cs.LG)

    We present and analyze statistically optimal, communication and memory
    efficient distributed stochastic optimization algorithms with near-linear
    speedups (up to (log)-factors). This improves over prior work which includes
    methods with near-linear speedups but polynomial communication requirements
    (accelerated minibatch SGD) and communication efficient methods which do not
    exhibit any runtime speedups over a naive single-machine approach. We first
    analyze a distributed SVRG variant as a distributed stochastic optimization
    method and show that it can achieve near-linear speedups with logarithmic
    rounds of communication, at the cost of high memory requirements. We then
    present a novel method, stochastic DANE, which trades off memory for
    communication and still allows for optimization with communication which scales
    only logarithmically with the desired accuracy while also being memory
    efficient. Stochastic DANE is based on a minibatch prox procedure, solving a
    non-linearized subproblem on a minibatch at each iteration. We provide a novel
    analysis for this procedure which achieves the statistical optimal rate
    regardless of minibatch size and smoothness, and thus significantly improving
    on prior work.

    Exact tensor completion with sum-of-squares

    Aaron Potechin, David Steurer
    Subjects: Learning (cs.LG); Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Machine Learning (stat.ML)

    We obtain the first polynomial-time algorithm for exact tensor completion
    that improves over the bound implied by reduction to matrix completion. The
    algorithm recovers an unknown 3-tensor with (r) incoherent, orthogonal
    components in (mathbb R^n) from (rcdot ilde O(n^{1.5})) randomly observed
    entries of the tensor. This bound improves over the previous best one of
    (rcdot ilde O(n^{2})) by reduction to exact matrix completion. Our bound
    also matches the best known results for the easier problem of approximate
    tensor completion (Barak & Moitra, 2015).

    Our algorithm and analysis extends seminal results for exact matrix
    completion (Candes & Recht, 2009) to the tensor setting via the sum-of-squares
    method. The main technical challenge is to show that a small number of randomly
    chosen monomials are enough to construct a degree-3 polynomial with a precisely
    planted orthogonal global optima over the sphere and that this fact can be
    certified within the sum-of-squares proof system.

    Survey of Reasoning using Neural networks

    Amit Sahu
    Comments: 12 pages
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    Reason and inference require process as well as memory skills by humans.
    Neural networks are able to process tasks like image recognition (better than
    humans) but in memory aspects are still limited (by attention mechanism, size).
    Recurrent Neural Network (RNN) and it’s modified version LSTM are able to solve
    small memory contexts, but as context becomes larger than a threshold, it is
    difficult to use them. The Solution is to use large external memory. Still, it
    poses many challenges like, how to train neural networks for discrete memory
    representation, how to describe long term dependencies in sequential data etc.
    Most prominent neural architectures for such tasks are Memory networks:
    inference components combined with long term memory and Neural Turing Machines:
    neural networks using external memory resources. Also, additional techniques
    like attention mechanism, end to end gradient descent on discrete memory
    representation are needed to support these solutions. Preliminary results of
    above neural architectures on simple algorithms (sorting, copying) and Question
    Answering (based on story, dialogs) application are comparable with the state
    of the art. In this paper, I explain these architectures (in general), the
    additional techniques used and the results of their application.

    Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization

    Mahdi Soltanolkotabi
    Subjects: Learning (cs.LG); Information Theory (cs.IT); Functional Analysis (math.FA); Optimization and Control (math.OC); Machine Learning (stat.ML)

    This paper concerns the problem of recovering an unknown but structured
    signal (x in R^n) from (m) quadratic measurements of the form
    (y_r=|<a_r,x>|^2) for (r=1,2,…,m). We focus on the under-determined setting
    where the number of measurements is significantly smaller than the dimension of
    the signal ((m<<n)). We formulate the recovery problem as a nonconvex
    optimization problem where prior structural information about the signal is
    enforced through constrains on the optimization variables. We prove that
    projected gradient descent, when initialized in a neighborhood of the desired
    signal, converges to the unknown signal at a linear rate. These results hold
    for any constraint set (convex or nonconvex) providing convergence guarantees
    to the global optimum even when the objective function and constraint set is
    nonconvex. Furthermore, these results hold with a number of measurements that
    is only a constant factor away from the minimal number of measurements required
    to uniquely identify the unknown signal. Our results provide the first provably
    tractable algorithm for this data-poor regime, breaking local sample complexity
    barriers that have emerged in recent literature. In a companion paper we
    demonstrate favorable properties for the optimization problem that may enable
    similar results to continue to hold more globally (over the entire ambient
    space). Collectively these two papers utilize and develop powerful tools for
    uniform convergence of empirical processes that may have broader implications
    for rigorous understanding of constrained nonconvex optimization heuristics.
    The mathematical results in this paper also pave the way for a new generation
    of data-driven phase-less imaging systems that can utilize prior information to
    significantly reduce acquisition time and enhance image reconstruction,
    enabling nano-scale imaging at unprecedented speeds and resolutions.

    On the Consistency of (k)-means++ algorithm

    Mieczysław A. Kłopotek
    Subjects: Learning (cs.LG)

    We prove in this paper that the expected value of the objective function of
    the (k)-means++ algorithm for samples converges to population expected value.
    As (k)-means++, for samples, provides with constant factor approximation for
    (k)-means objectives, such an approximation can be achieved for the population
    with increase of the sample size.

    This result is of potential practical relevance when one is considering using
    subsampling when clustering large data sets (large data bases).

    PixelNet: Representation of the pixels, by the pixels, and for the pixels

    Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta. Deva Ramanan
    Comments: Project Page: this http URL arXiv admin note: substantial text overlap with arXiv:1609.06694
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Robotics (cs.RO)

    We explore design principles for general pixel-level prediction problems,
    from low-level edge detection to mid-level surface normal estimation to
    high-level semantic segmentation. Convolutional predictors, such as the
    fully-convolutional network (FCN), have achieved remarkable success by
    exploiting the spatial redundancy of neighboring pixels through convolutional
    processing. Though computationally efficient, we point out that such approaches
    are not statistically efficient during learning precisely because spatial
    redundancy limits the information learned from neighboring pixels. We
    demonstrate that stratified sampling of pixels allows one to (1) add diversity
    during batch updates, speeding up learning; (2) explore complex nonlinear
    predictors, improving accuracy; and (3) efficiently train state-of-the-art
    models tabula rasa (i.e., “from scratch”) for diverse pixel-labeling tasks. Our
    single architecture produces state-of-the-art results for semantic segmentation
    on PASCAL-Context dataset, surface normal estimation on NYUDv2 depth dataset,
    and edge detection on BSDS.

    A 7.663-TOPS 8.2-W Energy-efficient FPGA Accelerator for Binary Convolutional Neural Networks

    Yixing Li, Zichuan Liu, Kai Xu, Hao Yu, Fengbo Ren
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    FPGA-based hardware accelerators for convolutional neural networks (CNNs)
    have obtained great attentions due to their higher energy efficiency than GPUs.
    However, it is challenging for FPGA-based solutions to achieve a higher
    throughput than GPU counterparts. In this paper, we demonstrate that FPGA
    acceleration can be a superior solution in terms of both throughput and energy
    efficiency when a CNN is trained with binary constraints on weights and
    activations. Specifically, we propose an optimized accelerator architecture
    tailored for bitwise convolution and normalization that features massive
    spatial parallelism with deep pipelines stages. Experiment results show that
    the proposed architecture is 8.3x faster and 75x more energy-efficient than a
    Titan X GPU for processing online individual requests (in small batch size).
    For processing static data (in large batch size), the proposed solution is on a
    par with a Titan X GPU in terms of throughput while delivering 9.5x higher
    energy efficiency.

    Causal Inference on Multivariate Mixed-Type Data by Minimum Description Length

    Alexander Marx, Jilles Vreeken
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Given data over the joint distribution of two univariate or multivariate
    random variables (X) and (Y) of mixed or single type data, we consider the
    problem of inferring the most likely causal direction between (X) and (Y). We
    take an information theoretic approach, from which it follows that first
    describing the data over cause and then that of effect given cause is shorter
    than the reverse direction.

    For practical inference, we propose a score for causal models for mixed type
    data based on the Minimum Description Length (MDL) principle. In particular, we
    model dependencies between (X) and (Y) using classification and regression
    trees. Inferring the optimal model is NP-hard, and hence we propose Crack, a
    fast greedy algorithm to infer the most likely causal direction directly from
    the data.

    Empirical evaluation on synthetic, benchmark, and real world data shows that
    Crack reliably and with high accuracy infers the correct causal direction on
    both univariate and multivariate cause–effect pairs over both single and mixed
    type data.

    Interpreting Outliers: Localized Logistic Regression for Density Ratio Estimation

    Makoto Yamada, Song Liu, Samuel Kaski
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We propose an inlier-based outlier detection method capable of both
    identifying the outliers and explaining why they are outliers, by identifying
    the outlier-specific features. Specifically, we employ an inlier-based outlier
    detection criterion, which uses the ratio of inlier and test probability
    densities as a measure of plausibility of being an outlier. For estimating the
    density ratio function, we propose a localized logistic regression algorithm.
    Thanks to the locality of the model, variable selection can be
    outlier-specific, and will help interpret why points are outliers in a
    high-dimensional space. Through synthetic experiments, we show that the
    proposed algorithm can successfully detect the important features for outliers.
    Moreover, we show that the proposed algorithm tends to outperform existing
    algorithms in benchmark datasets.

    Towards a Common Implementation of Reinforcement Learning for Multiple Robotic Tasks

    Angel Martínez-Tenor, Juan Antonio Fernández-Madrigal, Ana Cruz-Martín, Javier González-Jiménez
    Comments: 15 pages, 10 figures, 7 tables. To be published in a scientific journal
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO)

    Mobile robots are increasingly being employed for performing complex tasks in
    dynamic environments. Reinforcement learning (RL) methods are recognized to be
    promising for specifying such tasks in a relatively simple manner. However, the
    strong dependency between the learning method and the task to learn is a
    well-known problem that restricts practical implementations of RL in robotics,
    often requiring major modifications of parameters and adding other techniques
    for each particular task. In this paper we present a practical core
    implementation of RL which enables the learning process for multiple robotic
    tasks with minimal per-task tuning or none. Based on value iteration methods,
    this implementation includes a novel approach for action selection, called
    Q-biased softmax regression (QBIASSR), which avoids poor performance of the
    learning process when the robot reaches new unexplored states. Our approach
    takes advantage of the structure of the state space by attending the physical
    variables involved (e.g., distances to obstacles, X,Y,{ heta} pose, etc.),
    thus experienced sets of states may favor the decision-making process of
    unexplored or rarely-explored states. This improvement has a relevant role in
    reducing the tuning of the algorithm for particular tasks. Experiments with
    real and simulated robots, performed with the software framework also
    introduced here, show that our implementation is effectively able to learn
    different robotic tasks without tuning the learning method. Results also
    suggest that the combination of true online SARSA({lambda}) with QBIASSR can
    outperform the existing RL core algorithms in low-dimensional robotic tasks.

    On the (Statistical) Detection of Adversarial Examples

    Kathrin Grosse, Praveen Manoharan, Nicolas Papernot, Michael Backes, Patrick McDaniel
    Subjects: Cryptography and Security (cs.CR); Learning (cs.LG); Machine Learning (stat.ML)

    Machine Learning (ML) models are applied in a variety of tasks such as
    network intrusion detection or malware classification. Yet, these models are
    vulnerable to a class of malicious inputs known as adversarial examples. These
    are slightly perturbed inputs that are classified incorrectly by the ML model.
    The mitigation of these adversarial inputs remains an open problem.

    As a step towards a model-agnostic defense against adversarial examples, we
    show that they are not drawn from the same distribution than the original data,
    and can thus be detected using statistical tests. As the number of malicious
    points included in samples presented to the test diminishes, its detection
    confidence decreases. Hence, we introduce a complimentary approach to identify
    specific inputs that are adversarial among sets of inputs flagged by the
    statistical test. Specifically, we augment our ML model with an additional
    output, in which the model is trained to classify all adversarial inputs.

    We evaluate our approach on multiple adversarial example crafting methods
    (including the fast gradient sign and Jacobian-based saliency map methods) with
    several datasets. The statistical test flags sample sets containing adversarial
    inputs with confidence above 80%. Furthermore, our augmented model either
    detects adversarial examples with high accuracy (>80%) or increases the
    adversary’s cost—the perturbation added—by more than 150%. In this way, we
    show that statistical properties of adversarial examples are essential to their
    detection.

    SAR: A Semantic Analysis Approach for Recommendation

    Han Xiao, Minlie Huang, Xiaoyan Zhu
    Subjects: Information Retrieval (cs.IR); Learning (cs.LG)

    Recommendation system is a common demand in daily life and matrix completion
    is a widely adopted technique for this task. However, most matrix completion
    methods lack semantic interpretation and usually result in weak-semantic
    recommendations. To this end, this paper proposes a {f S}emantic {f
    A}nalysis approach for {f R}ecommendation systems extbf{(SAR)}, which
    applies a two-level hierarchical generative process that assigns semantic
    properties and categories for user and item. SAR learns semantic
    representations of users/items merely from user ratings on items, which offers
    a new path to recommendation by semantic matching with the learned
    representations. Extensive experiments demonstrate SAR outperforms other
    state-of-the-art baselines substantially.

    Sample Efficient Policy Search for Optimal Stopping Domains

    Karan Goel, Christoph Dann, Emma Brunskill
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    Arising naturally in many fields, optimal stopping problems consider the
    question of deciding when to stop an observation-generating process. We examine
    the problem of simultaneously learning and planning in such domains, when data
    is collected directly from the environment. We propose GFSE, a simple and
    flexible model-free policy search method that reuses data for sample efficiency
    by leveraging problem structure. We bound the sample complexity of our approach
    to guarantee uniform convergence of policy value estimates, tightening existing
    PAC bounds to achieve logarithmic dependence on horizon length for our setting.
    We also examine the benefit of our method against prevalent model-based and
    model-free approaches on 3 domains taken from diverse fields.

    Determination of hysteresis in finite-state random walks using Bayesian cross validation

    Joshua C. Chang
    Comments: Submitted
    Subjects: Methodology (stat.ME); Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Quantitative Methods (q-bio.QM)

    Consider the problem of modeling hysteresis for finite-state random walks
    using higher-order Markov chains. This Letter introduces a Bayesian framework
    to determine, from data, the number of prior states of recent history upon
    which a trajectory is statistically dependent. The general recommendation is to
    use leave-one-out cross validation, using an easily-computable formula that is
    provided in closed form. Importantly, Bayes factors using flat model priors are
    biased in favor of too-complex a model (more hysteresis) when a large amount of
    data is present and the Akaike information criterion (AIC) is biased in favor
    of too-sparse a model (less hysteresis) when few data are present.

    Filtering Tweets for Social Unrest

    Alan Mishler, Kevin Wonus, Wendy Chambers, Michael Bloodgood
    Comments: 7 pages, 8 figures, 3 tables; to appear in Proceedings of the Eleventh IEEE International Conference on Semantic Computing (ICSC 2017), San Diego, California, 2017
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)

    Since the events of the Arab Spring, there has been increased interest in
    using social media to anticipate social unrest. While efforts have been made
    toward automated unrest prediction, we focus on filtering the vast volume of
    tweets to identify tweets relevant to unrest, which can be provided to
    downstream users for further analysis. We train a supervised classifier that is
    able to label Arabic language tweets as relevant to unrest with high
    reliability. We examine the relationship between training data size and
    performance and investigate ways to optimize the model building process while
    minimizing cost. We also explore how confidence thresholds can be set to
    achieve desired levels of performance.

    Bayesian Boolean Matrix Factorisation

    Tammo Rukat, Chris C. Holmes, Michalis K. Titsias, Christopher Yau
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Numerical Analysis (cs.NA); Genomics (q-bio.GN); Quantitative Methods (q-bio.QM); Methodology (stat.ME)

    Boolean matrix factorisation (BooMF) infers interpretable decompositions of a
    binary data matrix into a pair of low-rank, binary matrices: One containing
    meaningful patterns, the other quantifying how the observations can be
    expressed as a combination of these patterns. We introduce the OrMachine, a
    probabilistic generative model for BooMF and derive a Metropolised Gibbs
    sampler that facilitates very efficient parallel posterior inference. Our
    method outperforms all currently existing approaches for Boolean Matrix
    factorization and completion, as we show on simulated and real world data. This
    is the first method to provide full posterior inference for BooMF which is
    relevant in applications, e.g. for controlling false positive rates in
    collaborative filtering, and crucially it improves the interpretability of the
    inferred patterns. The proposed algorithm scales to large datasets as we
    demonstrate by analysing single cell gene expression data in 1.3 million mouse
    brain cells across 11,000 genes on commodity hardware.

    Developing a comprehensive framework for multimodal feature extraction

    Quinten McNamara, Alejandro de la Vega, Tal Yarkoni
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Learning (cs.LG); Multimedia (cs.MM)

    Feature extraction is a critical component of many applied data science
    workflows. In recent years, rapid advances in artificial intelligence and
    machine learning have led to an explosion of feature extraction tools and
    services that allow data scientists to cheaply and effectively annotate their
    data along a vast array of dimensions—ranging from detecting faces in images
    to analyzing the sentiment expressed in coherent text. Unfortunately, the
    proliferation of powerful feature extraction services has been mirrored by a
    corresponding expansion in the number of distinct interfaces to feature
    extraction services. In a world where nearly every new service has its own API,
    documentation, and/or client library, data scientists who need to combine
    diverse features obtained from multiple sources are often forced to write and
    maintain ever more elaborate feature extraction pipelines. To address this
    challenge, we introduce a new open-source framework for comprehensive
    multimodal feature extraction. Pliers is an open-source Python package that
    supports standardized annotation of diverse data types (video, images, audio,
    and text), and is expressly with both ease-of-use and extensibility in mind.
    Users can apply a wide range of pre-existing feature extraction tools to their
    data in just a few lines of Python code, and can also easily add their own
    custom extractors by writing modular classes. A graph-based API enables rapid
    development of complex feature extraction pipelines that output results in a
    single, standardized format. We describe the package’s architecture, detail its
    major advantages over previous feature extraction toolboxes, and use a sample
    application to a large functional MRI dataset to illustrate how pliers can
    significantly reduce the time and effort required to construct sophisticated
    feature extraction workflows while increasing code clarity and maintainability.


    Information Theory

    A data-driven basis for direct estimation of functionals of distributions

    Alan Wisler, Visar Berisha, Andreas Spanias, Alfred O. Hero
    Comments: Under review for IEEE Transactions on Signal Processing
    Subjects: Information Theory (cs.IT); Machine Learning (stat.ML)

    A number of fundamental quantities in statistical signal processing and
    information theory can be expressed as integral functions of two probability
    density functions. Such quantities are called density functionals as they map
    density functions onto the real line. For example, information divergence
    functions measure the dissimilarity between two probability density functions
    and are particularly useful in a number of applications. Typically, estimating
    these quantities requires complete knowledge of the underlying distribution
    followed by multi-dimensional integration. Existing methods make parametric
    assumptions about the data distribution or use non-parametric density
    estimation followed by high-dimensional integration. In this paper, we propose
    a new alternative. We introduce the concept of “data-driven” basis functions –
    functions of distributions whose value we can estimate given only samples from
    the underlying distributions without requiring distribution fitting or direct
    integration. We derive a new data-driven complete basis that is similar to the
    deterministic Bernstein polynomial basis and develop two methods for performing
    basis expansions of functionals of two distributions. We also show that the new
    basis set allows us to approximate functions of distributions as closely as
    desired. Finally, we evaluate the methodology by developing data driven
    estimators for the Kullback-Leibler divergences and the Hellinger distance and
    by constructing tight data-driven bounds on the Bayes Error Rate.

    Timely CSI Acquisition Exploiting Full Duplex

    Jesus Arnau, Marios Kountouris
    Comments: 6 pages, 4 figures, accepted at IEEE WCNC 2017
    Subjects: Information Theory (cs.IT)

    In this paper, we propose a method for acquiring accurate and timely channel
    state information (CSI) by leveraging full-duplex transmission. Specifically,
    we propose a mobile communication system in which base stations continuously
    transmit a pilot sequence in the uplink frequency band, while terminals use
    self-interference cancellation capabilities to obtain CSI at any time. Our
    proposal outperforms its half-duplex counterpart by at least 50% in terms of
    throughput while ensuring the same (or even lower) outage probability.
    Remarkably, it also outperforms using full duplex for downlink data
    transmission for low values of downlink bandwidth and received power.

    3-Dimensional Optical Orthogonal Codes with Ideal Autocorrelation-Bounds and Optimal Constructions

    Tim L. Alderson
    Subjects: Information Theory (cs.IT)

    Several new constructions of 3-dimensional optical orthogonal codes are
    presented here. In each case the codes have ideal autocorrelation (mathbf{
    lambda_a=0} ), and in all but one case a cross correlation of (
    mathbf{lambda_c=1} ). All codes produced are optimal with respect to the
    applicable Johnson bound either presented or developed here. Thus, on one hand
    the codes are as large as possible, and on the other, the bound(s) are shown to
    be tight. All codes are constructed by using a particular automorphism (a
    Singer cycle) of ( mathbf{ PG(k,q)} ), the finite projective geometry of
    dimension ( k ) over the field of order ( mathbf{q} ), or by using an affine
    analogue in ( AG(k,q) ).

    Several Classes of Permutation Trinomials over (mathbb F_{5^n}) From Niho Exponents

    Gaofei Wu, Nian Li
    Subjects: Information Theory (cs.IT)

    The construction of permutation trinomials over finite fields attracts
    people’s interest recently due to their simple form and some additional
    properties. Motivated by some results on the construction of permutation
    trinomials with Niho exponents, by constructing some new fractional polynomials
    that permute the set of the ((q+1))-th roots of unity in (mathbb F_{q^2}), we
    present several classes of permutation trinomials with Niho exponents over
    (mathbb F_{q^2}), where (q=5^k).

    Phaseless Sampling and Reconstruction of Real-Valued Signals in Shift-Invariant Spaces

    Cheng Cheng, Junzheng Jiang, Qiyu Sun
    Subjects: Information Theory (cs.IT); Numerical Analysis (math.NA)

    Sampling in shift-invariant spaces is a realistic model for signals with
    smooth spectrum. In this paper, we consider phaseless sampling and
    reconstruction of real-valued signals in a shift-invariant space from their
    magnitude measurements on the whole Euclidean space and from their phaseless
    samples taken on a discrete set with finite sampling density. We introduce an
    undirected graph to a signal and use connectivity of the graph to characterize
    whether the signal can be determined, up to a sign, from its magnitude
    measurements on the whole Euclidean space. Under the local complement property
    assumption on a shift-invariant space, we find a discrete set with finite
    sampling density such that signals in the shift-invariant space, that are
    determined from their magnitude measurements on the whole Euclidean space, can
    be reconstructed in a stable way from their phaseless samples taken on that
    discrete set. In this paper, we also propose a reconstruction algorithm which
    provides a suboptimal approximation to the original signal when its noisy
    phaseless samples are available only. Finally, numerical simulations are
    performed to demonstrate the robust reconstruction of box spline signals from
    their noisy phaseless samples.

    Phase Transitions of Spectral Initialization for High-Dimensional Nonconvex Estimation

    Yue M. Lu, Gen Li
    Subjects: Information Theory (cs.IT); Machine Learning (stat.ML)

    We study a spectral initialization method that serves as a key ingredient in
    recent work on using efficient iterative algorithms for estimating signals in
    nonconvex settings. Unlike previous analysis in the literature, which is
    restricted to the phase retrieval setting and which provides only performance
    bounds, we consider arbitrary generalized linear sensing models and present a
    precise asymptotic characterization of the performance of the spectral method
    in the high-dimensional regime. Our analysis reveals a phase transition
    phenomenon that depends on the sampling ratio. When the ratio is below a
    minimum threshold, the estimates given by the spectral method are no better
    than a random guess drawn uniformly from the hypersphere; above a maximum
    threshold, however, the estimates become increasingly aligned with the target
    signal. The computational complexity of the spectral method is also markedly
    different in the two phases. Worked examples and numerical results are provided
    to illustrate and verify the analytical predictions. In particular, simulations
    show that our asymptotic formulas provide accurate predictions even at moderate
    signal dimensions.

    Finite Horizon Energy-Efficient Scheduling with Energy Harvesting Transmitters over Fading Channels

    Baran Tan Bacinoglu, Elif Uysal-Biyikoglu, Can Emre Koksal
    Comments: A version of this paper has been submitted to IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    In this paper, energy-efficient transmission schemes achieving maximal
    throughput over a finite time interval are studied in a problem setting
    including energy harvests, data arrivals and channel variation. The goal is to
    express the offline optimal policy in a way that facilitates a good online
    solution. We express any throughput maximizing energy efficient offline
    schedule (EE-TM-OFF) explicitly in terms of water levels. This allows per-slot
    real-time evaluation of transmit power and rate decisions, using estimates of
    the associated offline water levels. To compute the online power level, we
    construct a stochastic dynamic program that incorporates the offline optimal
    solution as a stochastic process. We introduce the “Immediate Fill” metric
    which provides a lower bound on the efficiency of any online policy with
    respect to the corresponding optimal offline solution. The online algorithms
    obtained this way exhibit performance close to the offline optimal, not only in
    the long run but also in short problem horizons, deeming them suitable for
    practical implementations.

    Compressive Channel Estimation and Multi-user Detection in C-RAN

    Qi He, Tony Q.S. Quek, Zhi Chen, Shaoqian Li
    Comments: 6 pages, 3 figures
    Subjects: Information Theory (cs.IT)

    This paper considers the channel estimation (CE) and multi-user detection
    (MUD) problems in cloud radio access network (C-RAN). Assuming that active
    users are sparse in the network, we solve CE and MUD problems with compressed
    sensing (CS) technology to greatly reduce the long identification pilot
    overhead. A mixed L{2,1}-regularization functional for extended sparse
    group-sparsity recovery is proposed to exploit the inherently sparse property
    existing both in user activities and remote radio heads (RRHs) that active
    users are attached to. Empirical and theoretical guidelines are provided to
    help choosing tuning parameters which have critical effect on the performance
    of the penalty functional. To speed up the processing procedure, based on
    alternating direction method of multipliers and variable splitting strategy, an
    efficient algorithm is formulated which is guaranteed to be convergent.
    Numerical results are provided to illustrate the effectiveness of the proposed
    functional and efficient algorithm.

    Ultra-Reliable Short-Packet Communications with Wireless Energy Transfer

    Onel L. Alcaraz López, Hirley Alves, Richard Demo Souza, Evelio Martín García Fernández
    Comments: In press – IEEE SPL
    Subjects: Information Theory (cs.IT)

    We analyze and optimize a wireless system with energy transfer in the
    downlink and information transfer in the uplink, under quasi-static Nakagami-m
    fading. We consider ultra-reliable communication scenarios representative of
    the fifth-generation of wireless systems, with strict error and latency
    requirements. The error probability and delay are investigated, and an
    approximation for the former is given and validated through simulations. The
    numerical results demonstrate that there are optimum numbers of channels uses
    for both energy and information transfer for a given message length.

    Information-Theoretic Perspectives on Brascamp-Lieb Inequality and Its Reverse

    Jingbo Liu, Thomas A. Courtade, Paul Cuff, Sergio Verdu
    Comments: 57 pages; 6 figures; presented in part in ISIT 2016; submitted
    Subjects: Information Theory (cs.IT)

    We introduce an inequality which may be viewed as a generalization of both
    the Brascamp-Lieb inequality and its reverse (Barthe’s inequality), and prove
    its information-theoretic (i.e. entropic) formulation. This result leads to a
    unified approach to functional inequalities such as the variational formula of
    R’enyi entropy, hypercontractivity and its reverse, strong data processing
    inequalities, and transportation-cost inequalities, whose utility in the proofs
    of various coding theorems has gained growing popularity recently. We show that
    our information-theoretic setting is convenient for proving properties such as
    data processing, tensorization, convexity (Riesz-Thorin interpolation) and
    Gaussian optimality. In particular, we elaborate on a “doubling trick” used by
    Lieb and Geng-Nair to prove several results on Gaussian optimality. Several
    applications are discussed, including a generalization of the Brascamp-Lieb
    inequality involving Gaussian random transformations, the determination of
    Wyner’s common information of vector Gaussian sources, and the achievable rate
    region of certain key generation problems in the case of vector Gaussian
    sources.

    Noise Models in the Nonlinear Spectral Domain for Optical Fibre Communications

    Qun Zhang, Terence H. Chan
    Comments: submitted to IEEE Trans. on IT
    Subjects: Information Theory (cs.IT); Probability (math.PR)

    Existing works on building a soliton transmission system only encode
    information using the imaginary part of the eigenvalue, which fails to make
    full use of the signal degree-of-freedoms. Motivated by this observation, we
    make the first step of encoding information using (discrete) spectral
    amplitudes by proposing analytical noise models for the spectral amplitudes of
    (N)-solitons ((Ngeq 1)). To our best knowledge, this is the first work in
    building an analytical noise model for spectral amplitudes, which leads to many
    interesting information theoretic questions, such as channel capacity analysis,
    and has a potential of increasing the transmission rate. The noise statistics
    of the spectral amplitude of a soliton are also obtained without the Gaussian
    approximation.

    Multi-Agent Reinforcement Learning for Energy Harvesting Two-Hop Communications with Full Cooperation

    Andrea Ortiz, Hussein Al-Shatri, Tobias Weber, Anja Klein
    Subjects: Information Theory (cs.IT)

    We focus on energy harvesting (EH) two-hop communications since they are the
    essential building blocks of more complicated multi-hop networks. The scenario
    consists of three nodes, where an EH transmitter wants to send data to a
    receiver through an EH relay. The harvested energy is used exclusively for data
    transmission and we address the problem of how to efficiently use it. As in
    practical scenarios, we assume only causal knowledge at the EH nodes, i.e., in
    each time interval, the transmitter and the relay know their own current and
    past amounts of incoming energy, battery levels, data buffer levels and channel
    coefficients for their own transmit channels. Our goal is to find transmission
    policies which aim at maximizing the throughput considering that the EH nodes
    fully cooperate with each other to exchange their causal knowledge during a
    signaling phase. We model the problem as a Markov game and propose a
    multi-agent reinforcement learning algorithm to find the transmission policies.
    Furthermore, we show the trade-off between the achievable throughput and the
    signaling required, and provide convergence guarantees for the proposed
    algorithm. Results show that even when the signaling overhead is taken into
    account, the proposed algorithm outperforms other approaches that do not
    consider cooperation among the nodes.

    Robust Phase Retrieval via ADMM with Outliers

    Xue Jiang, H. C. So, X. Liu
    Subjects: Information Theory (cs.IT); Information Retrieval (cs.IR)

    An outlier-resistance phase retrieval algorithm based on alternating
    direction method of multipliers (ADMM) is devised in this letter. Instead of
    the widely used least squares criterion that is only optimal for Gaussian noise
    environment, we adopt the least absolute deviation criterion to enhance the
    robustness against outliers. Considering both intensity- and amplitude-based
    observation models, the framework of ADMM is developed to solve the resulting
    non-differentiable optimization problems. It is demonstrated that the core
    subproblem of ADMM is the proximity operator of the L1-norm, which can be
    computed efficiently by soft-thresholding in each iteration. Simulation results
    are provided to validate the accuracy and efficiency of the proposed approach
    compared to the existing schemes.

    Throughput Optimal Beam Alignment in Millimeter Wave Networks

    Muddassar Hussain, Nicolo Michelusi
    Subjects: Information Theory (cs.IT)

    Millimeter wave communications rely on narrow-beam transmissions to cope with
    the strong signal attenuation at these frequencies, thus demanding precise beam
    alignment between transmitter and receiver. The communication overhead incurred
    to achieve beam alignment may become a severe impairment in mobile networks.
    This paper addresses the problem of optimizing beam alignment acquisition, with
    the goal of maximizing throughput. Specifically, the algorithm jointly
    determines the portion of time devoted to beam alignment acquisition, as well
    as, within this portion of time, the optimal beam search parameters, using the
    framework of Markov decision processes. It is proved that a bisection search
    algorithm is optimal, and that it outperforms exhaustive and iterative search
    algorithms proposed in the literature. The duration of the beam alignment phase
    is optimized so as to maximize the overall throughput. The numerical results
    show that the throughput, optimized with respect to the duration of the beam
    alignment phase, achievable under the exhaustive algorithm is 88.3% lower than
    that achievable under the bisection algorithm. Similarly, the throughput
    achievable by the iterative search algorithm for a division factor of 4 and 8
    is, respectively, 12.8% and 36.4% lower than that achievable by the bisection
    algorithm.

    An Inequality for the Correlation of Two Functions Operating on Symmetric Bivariate Normal Variables

    Ran Hadad, Uri Erez, Yaming Yu
    Subjects: Information Theory (cs.IT)

    An inequality is derived for the correlation of two univariate functions
    operating on symmetric bivariate normal random variables. The inequality is a
    simple consequence of the Cauchy-Schwarz inequality.

    Energy-Efficient Wireless Content Delivery with Proactive Caching

    Samuel O. Somuyiwa, András György, Deniz Gündüz
    Comments: 6 pages, 2 figures. Submitted, under review
    Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Optimization and Control (math.OC)

    We propose an intelligent proactive content caching scheme to reduce the
    energy consumption in wireless downlink. We consider an online social network
    (OSN) setting where new contents are generated over time, and remain
    extit{relevant} to the user for a random lifetime. Contents are downloaded to
    the user equipment (UE) through a time-varying wireless channel at an energy
    cost that depends on the channel state and the number of contents downloaded.
    The user accesses the OSN at random time instants, and consumes all the
    relevant contents. To reduce the energy consumption, we propose
    extit{proactive caching} of contents under favorable channel conditions to a
    finite capacity cache memory. Assuming that the channel quality (or
    equivalently, the cost of downloading data) is memoryless over time slots, we
    show that the optimal caching policy, which may replace contents in the cache
    with shorter remaining lifetime with contents at the server that remain
    relevant longer, has certain threshold structure with respect to the channel
    quality. Since the optimal policy is computationally demanding in practice, we
    introduce a simplified caching scheme and optimize its parameters using policy
    search. We also present two lower bounds on the energy consumption. We
    demonstrate through numerical simulations that the proposed caching scheme
    significantly reduces the energy consumption compared to traditional reactive
    caching tools, and achieves close-to-optimal performance for a wide variety of
    system parameters.

    Exact tensor completion with sum-of-squares

    Aaron Potechin, David Steurer
    Subjects: Learning (cs.LG); Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Machine Learning (stat.ML)

    We obtain the first polynomial-time algorithm for exact tensor completion
    that improves over the bound implied by reduction to matrix completion. The
    algorithm recovers an unknown 3-tensor with (r) incoherent, orthogonal
    components in (mathbb R^n) from (rcdot ilde O(n^{1.5})) randomly observed
    entries of the tensor. This bound improves over the previous best one of
    (rcdot ilde O(n^{2})) by reduction to exact matrix completion. Our bound
    also matches the best known results for the easier problem of approximate
    tensor completion (Barak & Moitra, 2015).

    Our algorithm and analysis extends seminal results for exact matrix
    completion (Candes & Recht, 2009) to the tensor setting via the sum-of-squares
    method. The main technical challenge is to show that a small number of randomly
    chosen monomials are enough to construct a degree-3 polynomial with a precisely
    planted orthogonal global optima over the sphere and that this fact can be
    certified within the sum-of-squares proof system.

    Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization

    Mahdi Soltanolkotabi
    Subjects: Learning (cs.LG); Information Theory (cs.IT); Functional Analysis (math.FA); Optimization and Control (math.OC); Machine Learning (stat.ML)

    This paper concerns the problem of recovering an unknown but structured
    signal (x in R^n) from (m) quadratic measurements of the form
    (y_r=|<a_r,x>|^2) for (r=1,2,…,m). We focus on the under-determined setting
    where the number of measurements is significantly smaller than the dimension of
    the signal ((m<<n)). We formulate the recovery problem as a nonconvex
    optimization problem where prior structural information about the signal is
    enforced through constrains on the optimization variables. We prove that
    projected gradient descent, when initialized in a neighborhood of the desired
    signal, converges to the unknown signal at a linear rate. These results hold
    for any constraint set (convex or nonconvex) providing convergence guarantees
    to the global optimum even when the objective function and constraint set is
    nonconvex. Furthermore, these results hold with a number of measurements that
    is only a constant factor away from the minimal number of measurements required
    to uniquely identify the unknown signal. Our results provide the first provably
    tractable algorithm for this data-poor regime, breaking local sample complexity
    barriers that have emerged in recent literature. In a companion paper we
    demonstrate favorable properties for the optimization problem that may enable
    similar results to continue to hold more globally (over the entire ambient
    space). Collectively these two papers utilize and develop powerful tools for
    uniform convergence of empirical processes that may have broader implications
    for rigorous understanding of constrained nonconvex optimization heuristics.
    The mathematical results in this paper also pave the way for a new generation
    of data-driven phase-less imaging systems that can utilize prior information to
    significantly reduce acquisition time and enhance image reconstruction,
    enabling nano-scale imaging at unprecedented speeds and resolutions.




沪ICP备19023445号-2号
友情链接