
    arXiv Paper Daily: Tue, 18 Apr 2017

    Published by 我爱机器学习 (52ml.net) on 2017-04-18 00:00:00

    Neural and Evolutionary Computing

    Interval Arithmetic and Interval-Aware Operators for Genetic Programming

    Grant Dick
    Comments: Extended version of: Grant Dick. 2017. Revisiting Interval Arithmetic for Regression Problems in Genetic Programming. In Proceedings of the 2017 Annual Conference on Genetic and Evolutionary Computation. ACM. To appear. 8 pages, 10 figures
    Subjects: Neural and Evolutionary Computing (cs.NE)

    Symbolic regression via genetic programming is a flexible approach to machine
    learning that does not require up-front specification of model structure.
    However, traditional approaches to symbolic regression require the use of
    protected operators, which can lead to perverse model characteristics and poor
    generalisation. In this paper, we revisit interval arithmetic as one possible
    solution to allow genetic programming to perform regression using unprotected
    operators. Using standard benchmarks, we show that using interval arithmetic
    within model evaluation does not prevent invalid solutions from entering the
    population, meaning that search performance remains compromised. We extend the
    basic interval arithmetic concept with ‘safe’ search operators that integrate
    interval information into their process, thereby greatly reducing the number of
    invalid solutions produced during search. The resulting algorithms are able to
    more effectively identify good models that generalise well to unseen data. We
    conclude with an analysis of the sensitivity of interval arithmetic-based
    operators with respect to the accuracy of the supplied input feature intervals.
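
    As a rough illustration of the idea (and not the author's implementation),
    the Python sketch below propagates intervals bottom-up through an
    expression tree and flags a subtree as invalid when an unprotected operator
    would be applied outside its domain, e.g. division by an interval that
    contains zero. The tree encoding and helper names are hypothetical.

        # Hedged sketch: interval propagation over a GP expression tree.
        # Nodes are tuples: ('x', i) for input i, ('const', c), or (op, child, ...).

        def interval(node, input_intervals):
            """Return the output interval (lo, hi) of `node`, or None if the
            expression is invalid somewhere within the supplied input intervals."""
            kind = node[0]
            if kind == 'x':
                return input_intervals[node[1]]
            if kind == 'const':
                return (node[1], node[1])
            args = [interval(child, input_intervals) for child in node[1:]]
            if any(a is None for a in args):
                return None
            (a, b), (c, d) = args
            if kind == '+':
                return (a + c, b + d)
            if kind == '-':
                return (a - d, b - c)
            if kind == '*':
                prods = [a * c, a * d, b * c, b * d]
                return (min(prods), max(prods))
            if kind == '/':
                if c <= 0.0 <= d:          # divisor interval straddles zero: invalid
                    return None
                quots = [a / c, a / d, b / c, b / d]
                return (min(quots), max(quots))
            raise ValueError('unknown operator: %s' % kind)

        # Example: (x0 + 1) / x1 with x1 in [-2, 3] is rejected because the
        # divisor interval contains zero.
        tree = ('/', ('+', ('x', 0), ('const', 1.0)), ('x', 1))
        print(interval(tree, {0: (0.0, 1.0), 1: (-2.0, 3.0)}))   # -> None

    An interval-aware crossover or mutation operator in this spirit would only
    accept offspring whose root interval is defined (not None) over the
    training-data intervals.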

    A Sport Tournament Scheduling by Genetic Algorithm with Swapping Method

    Tinnaluk Rutjanisarakul, Thiradet Jiarasuksakun
    Subjects: Neural and Evolutionary Computing (cs.NE)

    A sport tournament scheduling problem can be modelled as the Traveling
    Tournament Problem (TTP). One interesting variant is the mirrored Traveling
    Tournament Problem (mTTP). The objective is to minimize the total number of
    trips, the total travel distance, or both. This research aims to find an
    optimized solution of the mirrored Traveling Tournament Problem with the
    minimum total number of trips. Solutions, consisting of travel and
    scheduling tables, are obtained using a genetic algorithm (GA) with a
    swapping method. The number of trips of all teams in the obtained solutions
    is close to the theoretical lower bound. Moreover, the algorithm generates
    better solutions than known results in most cases.
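
    The abstract does not spell out the swapping method, so purely as a hedged
    illustration, the snippet below shows one common variation move for
    mirrored round-robin schedules: swapping two whole rounds of the first half
    and replaying the swap in the mirrored half so the mTTP structure is
    preserved. The schedule encoding is an assumption.

        import random

        def swap_rounds(schedule, n_rounds_half):
            """schedule: list of rounds; each round is a list of (home, away) pairs.
            The second half mirrors the first (mTTP), so a swap of two first-half
            rounds is repeated at the two mirrored positions."""
            i, j = random.sample(range(n_rounds_half), 2)
            child = [list(r) for r in schedule]                 # copy the parent
            child[i], child[j] = child[j], child[i]             # swap first-half rounds
            mi, mj = i + n_rounds_half, j + n_rounds_half       # mirrored positions
            child[mi], child[mj] = child[mj], child[mi]
            return child

        # Toy 4-team example: 3 first-half rounds, mirrored half with venues flipped.
        first_half = [[(0, 1), (2, 3)], [(0, 2), (1, 3)], [(0, 3), (1, 2)]]
        mirror = [[(a, h) for (h, a) in rnd] for rnd in first_half]
        print(swap_rounds(first_half + mirror, 3)[0])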

    Self-Adaptive Differential Evolution for Bio-Inspired Neuromorphic Collision Avoidance

    Llewyn Salt, David Howard
    Comments: 8 page conference paper. Submitted to GECCO2017. Accepted as 2 page poster
    Subjects: Neural and Evolutionary Computing (cs.NE)

    We present the optimisation of a neuromorphic adaptation of a spiking neural
    network model of the locust Lobula Giant Movement Detector (LGMD), which
    detects looming objects and can be used to facilitate obstacle avoidance in
    robotic applications. Our model is constrained by the parameters of a mixed
    signal analogue-digital neuromorphic device and is driven by the output of a
    neuromorphic vision sensor (DVS). Due to the number of user-defined
    parameters, and the difficulty of finding values that perform well, we
    investigate the use of Differential Evolution (DE) and self-adaptive DE
    (SADE) to find optimal values. We
    demonstrate that these optimisation algorithms are suitable candidates to find
    suitable parameters for an obstacle avoidance system on an unmanned aerial
    vehicle (UAV).

    A Hybrid ACO Algorithm for the Next Release Problem

    He Jiang, Jingyuan Zhang, Jifeng Xuan, Zhilei Ren, Yan Hu
    Comments: 6 pages, 2 figures, Proceedings of 2nd International Conference on Software Engineering and Data Mining (SEDM 2010), 2010
    Subjects: Neural and Evolutionary Computing (cs.NE)

    In this paper, we propose a Hybrid Ant Colony Optimization algorithm (HACO)
    for the Next Release Problem (NRP). NRP, an NP-hard problem in requirements
    engineering, is to balance customer requests, resource constraints, and
    requirement dependencies by requirement selection. Inspired by the successes of
    Ant Colony Optimization algorithms (ACO) for solving NP-hard problems, we
    design our HACO to approximately solve NRP. Similar to traditional ACO
    algorithms, multiple artificial ants are employed to construct new solutions.
    During the solution construction phase, both pheromone trails and
    neighborhood information are taken into account to determine the choices of
    every ant. In addition, a local search (first-found hill climbing) is
    incorporated into HACO to improve the solution quality. Extensive
    experiments on typical NRP test instances show that HACO outperforms the
    existing algorithms (GRASP and simulated annealing) in terms of both
    solution quality and running time.
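
    Below is a minimal sketch of the kind of construction step described above
    (pheromone plus neighborhood/heuristic information), followed by
    first-found hill climbing; the cost arrays, the alpha and beta exponents
    and the helper names are illustrative assumptions rather than the authors'
    code.

        import random

        def construct_solution(pheromone, heuristic, costs, budget, alpha=1.0, beta=2.0):
            """One ant builds a subset of requirements within a cost budget.
            Selection probability ~ pheromone^alpha * heuristic^beta."""
            chosen, spent = [], 0.0
            candidates = set(range(len(costs)))
            while True:
                feasible = [r for r in candidates if spent + costs[r] <= budget]
                if not feasible:
                    return chosen
                weights = [(pheromone[r] ** alpha) * (heuristic[r] ** beta) for r in feasible]
                r = random.choices(feasible, weights=weights, k=1)[0]
                chosen.append(r)
                spent += costs[r]
                candidates.remove(r)

        def first_found_hill_climb(solution, value, neighbours):
            """Accept the first improving neighbour; stop when none improves."""
            improved = True
            while improved:
                improved = False
                for cand in neighbours(solution):
                    if value(cand) > value(solution):
                        solution, improved = cand, True
                        break
            return solution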

    Adversarial and Clean Data Are Not Twins

    Zhitao Gong, Wenlu Wang, Wei-Shinn Ku
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Adversarial attack has cast a shadow on the massive success of deep neural
    networks. Despite being almost visually identical to the clean data, the
    adversarial images can fool deep neural networks into wrong predictions with
    very high confidence. In this paper, however, we show that we can build a
    simple binary classifier separating the adversarial apart from the clean data
    with accuracy over 99%. We also empirically show that the binary classifier is
    robust to a second-round adversarial attack. In other words, it is difficult to
    disguise adversarial samples to bypass the binary classifier. Furthermore,
    we empirically investigate the generalization limitation that lingers over
    all current defensive methods, including the binary classifier approach,
    and we hypothesize that this is the result of an intrinsic property of the
    adversarial crafting algorithms.
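
    As a hedged sketch of the binary "adversarial vs. clean" detection task the
    abstract reports (not the authors' architecture), any binary classifier can
    be trained on crafted and clean examples; below, a scikit-learn logistic
    regression over flattened images. The data arrays are placeholders.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        # Placeholder data standing in for clean images and adversarially perturbed
        # copies (simulated here with sign noise purely for illustration).
        clean = np.random.rand(1000, 28 * 28)
        perturbed = np.clip(clean + 0.1 * np.sign(np.random.randn(*clean.shape)), 0, 1)

        X = np.vstack([clean, perturbed])
        y = np.concatenate([np.zeros(len(clean)), np.ones(len(perturbed))])
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

        detector = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        print('held-out detector accuracy:', detector.score(X_test, y_test))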

    In-Datacenter Performance Analysis of a Tensor Processing Unit

    Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg, John Hu, Robert Hundt, Dan Hurt, Julian Ibarz, Aaron Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, Rahul Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andy Phelps, Jonathan Ross, et al. (19 additional authors not shown)
    Comments: 17 pages, 11 figures, 8 tables. To appear at the 44th International Symposium on Computer Architecture (ISCA), Toronto, Canada, June 24-28, 2017
    Subjects: Hardware Architecture (cs.AR); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Many architects believe that major improvements in cost-energy-performance
    must now come from domain-specific hardware. This paper evaluates a custom
    ASIC—called a Tensor Processing Unit (TPU)—deployed in datacenters since
    2015 that accelerates the inference phase of neural networks (NN). The heart of
    the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak
    throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed
    on-chip memory. The TPU’s deterministic execution model is a better match to
    the 99th-percentile response-time requirement of our NN applications than are
    the time-varying optimizations of CPUs and GPUs (caches, out-of-order
    execution, multithreading, multiprocessing, prefetching, …) that help average
    throughput more than guaranteed latency. The lack of such features helps
    explain why, despite having myriad MACs and a big memory, the TPU is relatively
    small and low power. We compare the TPU to a server-class Intel Haswell CPU and
    an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters.
    Our workload, written in the high-level TensorFlow framework, uses production
    NN applications (MLPs, CNNs, and LSTMs) that represent 95% of our datacenters’
    NN inference demand. Despite low utilization for some applications, the TPU is
    on average about 15X – 30X faster than its contemporary GPU or CPU, with
    TOPS/Watt about 30X – 80X higher. Moreover, using the GPU’s GDDR5 memory in the
    TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and
    200X the CPU.
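
    A quick sanity check of the figures quoted in the abstract: with 65,536
    8-bit MACs and a 92 TOPS peak, and counting a multiply-accumulate as two
    operations, the implied clock rate works out to roughly 700 MHz.

        macs = 65_536                 # 8-bit MAC units in the matrix multiply unit
        peak_ops = 92e12              # 92 TeraOps/second peak, per the abstract
        ops_per_mac = 2               # one multiply plus one accumulate
        implied_clock_hz = peak_ops / (macs * ops_per_mac)
        print(f'{implied_clock_hz / 1e6:.0f} MHz')   # ~702 MHz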

    A fast ILP-based Heuristic for the robust design of Body Wireless Sensor Networks

    Fabio D'Andreagiovanni, Antonella Nardin, Enrico Natalizio
    Comments: This is the authors’ final version of the paper published in G. Squillero and K. Sim (Eds.): EvoApplications 2017, Part I, LNCS 10199, pp. 1-17, 2017. DOI: 10.1007/978-3-319-55849-3_16. The final publication is available at Springer via this http URL
    Journal-ref: EvoApplications 2017, Springer LNCS 10199 (2017) 1-17
    Subjects: Optimization and Control (math.OC); Neural and Evolutionary Computing (cs.NE); Networking and Internet Architecture (cs.NI)

    We consider the problem of optimally designing a body wireless sensor
    network, while taking into account the uncertainty of data generation of
    biosensors. Since the related min-max robustness Integer Linear Programming
    (ILP) problem can be difficult to solve even for state-of-the-art commercial
    optimization solvers, we propose an original heuristic for its solution. The
    heuristic combines deterministic and probabilistic variable fixing strategies,
    guided by the information coming from strengthened linear relaxations of the
    ILP robust model, and includes a very large neighborhood search for reparation
    and improvement of generated solutions, formulated as an ILP problem solved
    exactly. Computational tests on realistic instances show that our heuristic
    finds solutions of much higher quality than a state-of-the-art solver and than
    an effective benchmark heuristic.


    Computer Vision and Pattern Recognition

    End-to-end 3D face reconstruction with deep neural networks

    Pengfei Dou, Shishir K. Shah, Ioannis A. Kakadiaris
    Comments: Accepted to CVPR17
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Monocular 3D facial shape reconstruction from a single 2D facial image has
    been an active research area due to its wide applications. Inspired by the
    success of deep neural networks (DNN), we propose a DNN-based approach for
    End-to-End 3D FAce Reconstruction (UH-E2FAR) from a single 2D image. Different
    from recent works that reconstruct and refine the 3D face in an iterative
    manner using both an RGB image and an initial 3D facial shape rendering, our
    DNN model is end-to-end, and thus the complicated 3D rendering process can be
    avoided. Moreover, we integrate in the DNN architecture two components, namely
    a multi-task loss function and a fusion convolutional neural network (CNN) to
    improve facial expression reconstruction. With the multi-task loss function, 3D
    face reconstruction is divided into neutral 3D facial shape reconstruction and
    expressive 3D facial shape reconstruction. The neutral 3D facial shape is
    class-specific. Therefore, higher layer features are useful. In comparison, the
    expressive 3D facial shape favors lower or intermediate layer features. With
    the fusion-CNN, features from different intermediate layers are fused and
    transformed for predicting the 3D expressive facial shape. Through extensive
    experiments, we demonstrate the superiority of our end-to-end framework in
    improving the accuracy of 3D face reconstruction.

    AMTnet: Action-Micro-Tube regression by end-to-end trainable deep architecture

    Suman Saha, Gurkirt Singh, Fabio Cuzzolin
    Comments: 14 pages, 8 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Dominant approaches to action detection can only provide sub-optimal
    solutions to the problem, as they rely on seeking frame-level detections, to
    later compose them into action tubes in a post-processing step. With this paper
    we radically depart from current practice, and take a first step towards the
    design and implementation of a deep network architecture able to classify and
    regress whole video subsets, so providing a truly optimal solution of the
    action detection problem. In this work, in particular, we propose a novel deep
    net framework able to regress and classify 3D region proposals spanning two
    consecutive video frames, whose core is an evolution of classical region
    proposal networks (RPNs). As such, our 3D-RPN net is able to effectively encode
    the temporal aspect of actions by purely exploiting appearance, as opposed to
    methods which heavily rely on expensive flow maps computed in a parallel
    stream. The proposed model is end-to-end trainable and can be jointly optimised
    for action localisation and classification using a single step of optimisation.
    At test time the network predicts ‘micro-tubes’ spanning two frames, which are
    linked up into complete action tubes via a new algorithm which exploits the
    temporal encoding learned by the network and cuts computation time by 50%.
    Promising results on the J-HMDB-21 and UCF-101 action detection datasets show
    that our model does indeed outperform the state-of-the-art when relying purely
    on appearance.

    Multi-View Image Generation from a Single-View

    Bo Zhao, Xiao Wu, Zhi-Qi Cheng, Hao Liu, Jiashi Feng
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

    This paper addresses a challenging problem — how to generate multi-view
    cloth images from only a single view input. To generate realistic-looking
    images with different views from the input, we propose a new image generation
    model termed VariGANs that combines the strengths of the variational inference
    and the Generative Adversarial Networks (GANs). Our proposed VariGANs model
    generates the target image in a coarse-to-fine manner instead of a single pass
    which suffers from severe artifacts. It first performs variational inference to
    model global appearance of the object (e.g., shape and color) and produce a
    coarse image with a different view. Conditioned on the generated
    low-resolution images, it then performs adversarial learning to fill in
    details and generate images whose details are consistent with the input.
    Extensive experiments
    conducted on two clothing datasets, MVC and DeepFashion, have demonstrated that
    images of a novel view generated by our model are more plausible than those
    generated by existing approaches, in terms of more consistent global appearance
    as well as richer and sharper details.

    An improved ellipsoid fitting algorithm using iterative random projections

    Amit Reza, Anand S. Sengupta
    Comments: Submitted to Applied Mathematics and Computation (Elsevier)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A generalized method for ellipsoid fitting against a minimum set of data
    points is described. The proposed method is numerically stable and applies to a
    wide range of ellipsoidal shapes, including highly elongated and arbitrarily
    oriented ellipsoids. This new method also provides for the retrieval of
    rotational angle and length of semi-axes of the fitted ellipsoids accurately.
    We demonstrate the efficacy of this algorithm on simulated data sets and also
    indicate its potential use in gravitational wave data analysis.

    Gang of GANs: Generative Adversarial Networks with Maximum Margin Ranking

    Felix Juefei-Xu, Vishnu Naresh Boddeti, Marios Savvides
    Comments: 16 pages. 11 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Traditional generative adversarial networks (GAN) and many of its variants
    are trained by minimizing the KL or JS-divergence loss that measures how close
    the generated data distribution is to the true data distribution. A recent
    advance called the WGAN based on Wasserstein distance can improve on the KL and
    JS-divergence based GANs, and alleviate the gradient vanishing, instability,
    and mode collapse issues that are common in the GAN training. In this work, we
    aim at improving on the WGAN by first generalizing its discriminator loss to a
    margin-based one, which leads to a better discriminator, and in turn a better
    generator, and then carrying out a progressive training paradigm involving
    multiple GANs to contribute to the maximum margin ranking loss so that the GAN
    at later stages will improve upon early stages. We call this method Gang of
    GANs (GoGAN). We have shown theoretically that the proposed GoGAN can reduce
    the gap between the true data distribution and the generated data distribution
    by at least half in an optimally trained WGAN. We have also proposed a new way
    of measuring GAN quality which is based on image completion tasks. We have
    evaluated our method on four visual datasets: CelebA, LSUN Bedroom, CIFAR-10,
    and 50K-SSFF, and have seen both visual and quantitative improvement over
    baseline WGAN.

    MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

    Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a class of efficient models called MobileNets for mobile and
    embedded vision applications. MobileNets are based on a streamlined
    architecture that uses depth-wise separable convolutions to build light weight
    deep neural networks. We introduce two simple global hyper-parameters that
    efficiently trade off between latency and accuracy. These hyper-parameters
    allow the model builder to choose the right sized model for their application
    based on the constraints of the problem. We present extensive experiments on
    resource and accuracy tradeoffs and show strong performance compared to other
    popular models on ImageNet classification. We then demonstrate the
    effectiveness of MobileNets across a wide range of applications and use cases
    including object detection, fine-grained classification, face attributes
    and large-scale geo-localization.
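
    As a back-of-the-envelope illustration of why depth-wise separable
    convolutions are cheap (the layer shape below is an arbitrary example, not
    a figure from the paper): a standard k x k convolution costs
    k*k*M*N*Df*Df multiply-adds, whereas a depth-wise k x k followed by a
    point-wise 1 x 1 costs k*k*M*Df*Df + M*N*Df*Df, for M input channels, N
    output channels and a Df x Df feature map.

        def standard_conv_mults(k, m, n, df):
            return k * k * m * n * df * df

        def separable_conv_mults(k, m, n, df):
            return k * k * m * df * df + m * n * df * df

        # Example layer: 3x3 kernel, 256 input and output channels, 14x14 feature map.
        std = standard_conv_mults(3, 256, 256, 14)
        sep = separable_conv_mults(3, 256, 256, 14)
        print(std / sep)   # roughly 8-9x fewer multiply-adds for the separable version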

    Replicator Equation: Applications Revisited

    Tinsae G.Dulecha
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The replicator equation is a simple model of evolution that leads to a
    stable form of Nash equilibrium, the Evolutionarily Stable Strategy (ESS).
    It has been studied in connection with evolutionary game theory and was
    originally developed for symmetric games. Beyond its initial emphasis on
    biological use, evolutionary game theory has expanded well beyond biology,
    into social studies for behavioral analysis, machine learning, computer
    vision, and other fields. Its several applications in machine learning and
    computer vision have drawn my attention, which is the reason for writing
    this extended abstract.
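
    For reference, the replicator equation takes the form
    dx_i/dt = x_i (f_i(x) - f_bar(x)), where x_i is the share of strategy i and
    f_bar is the population-average fitness. The short discrete-time iteration
    below uses a Hawk-Dove payoff matrix (an illustrative choice, not from the
    text) and shows the population converging to an evolutionarily stable
    mixture.

        import numpy as np

        # Hawk-Dove payoffs (V=2, C=3), shifted by a constant so that all fitness
        # values stay positive, which the discrete update below requires.
        A = np.array([[0.0, 2.5],
                      [0.5, 1.5]])
        x = np.array([0.9, 0.1])          # initial shares of Hawk and Dove

        for _ in range(200):
            fitness = A @ x               # f_i(x): expected payoff of each strategy
            avg = x @ fitness             # population-average fitness
            x = x * fitness / avg         # discrete replicator update
        print(x)                          # approaches the ESS mixture (2/3 Hawk, 1/3 Dove)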

    Harvesting Multiple Views for Marker-less 3D Human Pose Annotations

    Georgios Pavlakos, Xiaowei Zhou, Konstantinos G. Derpanis, Kostas Daniilidis
    Comments: CVPR 2017 Camera Ready
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recent advances with Convolutional Networks (ConvNets) have shifted the
    bottleneck for many computer vision tasks to annotated data collection. In this
    paper, we present a geometry-driven approach to automatically collect
    annotations for human pose prediction tasks. Starting from a generic ConvNet
    for 2D human pose, and assuming a multi-view setup, we describe an automatic
    way to collect accurate 3D human pose annotations. We capitalize on constraints
    offered by the 3D geometry of the camera setup and the 3D structure of the
    human body to probabilistically combine per view 2D ConvNet predictions into a
    globally optimal 3D pose. This 3D pose is used as the basis for harvesting
    annotations. The benefit of the annotations produced automatically with our
    approach is demonstrated in two challenging settings: (i) fine-tuning a generic
    ConvNet-based 2D pose predictor to capture the discriminative aspects of a
    subject’s appearance (i.e., “personalization”), and (ii) training a ConvNet from
    scratch for single view 3D human pose prediction without leveraging 3D pose
    groundtruth. The proposed multi-view pose estimator achieves state-of-the-art
    results on standard benchmarks, demonstrating the effectiveness of our method
    in exploiting the available multi-view information.

    AnchorNet: A Weakly Supervised Network to Learn Geometry-sensitive Features For Semantic Matching

    David Novotny, Diane Larlus, Andrea Vedaldi
    Comments: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Despite significant progress of deep learning in recent years,
    state-of-the-art semantic matching methods still rely on legacy features such
    as SIFT or HoG. We argue that the strong invariance properties that are key to
    the success of recent deep architectures on the classification task make them
    unfit for dense correspondence tasks, unless a large amount of supervision is
    used. In this work, we propose a deep network, termed AnchorNet, that produces
    image representations that are well-suited for semantic matching. It relies on
    a set of filters whose response is geometrically consistent across different
    object instances, even in the presence of strong intra-class, scale, or
    viewpoint variations. Trained only with weak image-level labels, the final
    representation successfully captures information about the object structure and
    improves results of state-of-the-art semantic matching methods such as the
    deformable spatial pyramid or the proposal flow methods. We show positive
    results on the cross-instance matching task where different instances of the
    same object category are matched as well as on a new cross-category semantic
    matching task aligning pairs of instances each from a different object class.

    Video Fill In the Blank using LR/RL LSTMs with Spatial-Temporal Attentions

    Amir Mazaheri, Dong Zhang, Mubarak Shah
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Given a video and a description sentence with one missing word (we call it
    the “source sentence”), the Video-Fill-In-the-Blank (VFIB) problem is to find the
    missing word automatically. The contextual information of the sentence, as well
    as visual cues from the video, are important to infer the missing word
    accurately. Since the source sentence is broken into two fragments: the
    sentence’s left fragment (before the blank) and the sentence’s right fragment
    (after the blank), traditional Recurrent Neural Networks cannot encode this
    structure accurately because of many possible variations of the missing word in
    terms of the location and type of the word in the source sentence. For example,
    a missing word can be the first word or be in the middle of the sentence and it
    can be a verb or an adjective. In this paper, we propose a framework to tackle
    the textual encoding: Two separate LSTMs (the LR and RL LSTMs) are employed to
    encode the left and right sentence fragments and a novel structure is
    introduced to combine each fragment with an “external memory” corresponding
    to the opposite fragment. For the visual encoding, end-to-end spatial and temporal
    attention models are employed to select discriminative visual representations
    to find the missing word. In the experiments, we demonstrate the superior
    performance of the proposed method on the challenging VFIB problem.
    Furthermore, we introduce an extended and more generalized version of VFIB,
    which is not limited to a single blank. Our experiments indicate the
    generalization capability of our method in dealing with these more
    realistic scenarios.

    Temporal Action Localization by Structured Maximal Sums

    Zehuan Yuan, Jonathan C. Stroud, Tong Lu, Jia Deng
    Comments: Accepted to CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We address the problem of temporal action localization in videos. We pose
    action localization as a structured prediction over arbitrary-length temporal
    windows, where each window is scored as the sum of frame-wise classification
    scores. Additionally, our model classifies the start, middle, and end of each
    action as separate components, allowing our system to explicitly model each
    action’s temporal evolution and take advantage of informative temporal
    dependencies present in this structure. In this framework, we localize actions
    by searching for the structured maximal sum, a problem for which we develop a
    novel, provably-efficient algorithmic solution. The frame-wise classification
    scores are computed using features from a deep Convolutional Neural Network
    (CNN), which are trained end-to-end to directly optimize for a novel structured
    objective. We evaluate our system on the THUMOS 14 action detection benchmark
    and achieve competitive performance.
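
    One hedged reading of the scoring scheme sketched above: if a window [s, e]
    scores start[s] + sum of middle[t] for t in s..e plus end[e], the best
    single window can be found in linear time with a prefix-sum, Kadane-style
    recursion. The per-frame score arrays below are placeholders, and the
    paper's full algorithm additionally handles multiple actions and structured
    constraints.

        import numpy as np

        def best_window(start, middle, end):
            """Maximise start[s] + middle[s..e].sum() + end[e] over all s <= e."""
            n = len(start)
            prefix = np.concatenate([[0.0], np.cumsum(middle)])   # prefix[t] = sum of middle[0..t-1]
            best, best_span = -np.inf, (0, 0)
            best_head, best_s = -np.inf, 0                         # max over s of start[s] - prefix[s]
            for e in range(n):
                if start[e] - prefix[e] > best_head:               # allow the window to start at e
                    best_head, best_s = start[e] - prefix[e], e
                score = best_head + prefix[e + 1] + end[e]         # add middle[s..e] and end[e]
                if score > best:
                    best, best_span = score, (best_s, e)
            return best, best_span

        # Toy frame-wise scores for a 10-frame clip.
        rng = np.random.default_rng(0)
        print(best_window(rng.normal(size=10), rng.normal(size=10), rng.normal(size=10)))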

    Integrating Scene Text and Visual Appearance for Fine-Grained Image Classification with Convolutional Neural Networks

    Xiang Bai, Mingkun Yang, Pengyuan Lyu, Yongchao Xu
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Text in natural images contains plenty of semantics that are often highly
    relevant to objects or scenes. In this paper, we are concerned with the
    problem of fully exploiting scene text for visual understanding. The basic idea is
    combining word representations and deep visual features into a globally
    trainable deep convolutional neural network. First, the recognized words are
    obtained by a scene text reading system. Then, we combine the word embedding of
    the recognized words and the deep visual features into a single representation,
    which is optimized by a convolutional neural network for fine-grained image
    classification. In our framework, the attention mechanism is adopted to reveal
    the relevance between each recognized word and the given image, which
    reasonably enhances the recognition performance. We have performed experiments
    on two datasets, the Con-Text dataset and the Drink Bottle dataset, which
    are proposed for fine-grained classification of business places and drink
    bottles, respectively. The experimental results consistently demonstrate that the
    proposed method combining textual and visual cues significantly outperforms
    classification with only visual representations. Moreover, we have shown that
    the learned representation improves the retrieval performance on the drink
    bottle images by a large margin, which is potentially useful in product search.

    Deep Learning for Photoacoustic Tomography from Sparse Data

    Stephan Antholzer, Markus Haltmeier, Johannes Schwab
    Comments: 13 pages, 8 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    The development of fast and accurate image reconstruction algorithms is a
    central aspect of computed tomography. In this paper we investigate this issue
    for the sparse data problem of photoacoustic tomography (PAT). We develop
    direct and highly efficient reconstruction algorithms based on deep learning.
    In this approach image reconstruction is performed with a deep convolutional
    neural network (CNN), whose weights are adjusted prior to the actual image
    reconstruction based on a set of training data. Our results demonstrate that
    the proposed deep learning approach reconstructs images with a quality
    comparable to state-of-the-art iterative approaches from sparse data. At
    the same time, the numerical complexity of our approach is much smaller and the
    image reconstruction is performed in a fraction of the time required by
    iterative methods.

    Interpretable 3D Human Action Analysis with Temporal Convolutional Networks

    Tae Soo Kim, Austin Reiter
    Comments: 8 pages, 5 figures, BNMW CVPR 2017 Submission
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The discriminative power of modern deep learning models for 3D human action
    recognition is growing ever so potent. In conjunction with the recent
    resurgence of 3D human action representation with 3D skeletons, the quality and
    the pace of recent progress have been significant. However, the inner workings
    of state-of-the-art learning based methods in 3D human action recognition still
    remain mostly black-box. In this work, we propose to use a new class of models
    known as Temporal Convolutional Neural Networks (TCN) for 3D human action
    recognition. Compared to popular LSTM-based Recurrent Neural Network models,
    given interpretable input such as 3D skeletons, TCN provides us a way to
    explicitly learn readily interpretable spatio-temporal representations for 3D
    human action recognition. We describe our strategy for re-designing the TCN
    with interpretability in mind, and how these characteristics of the model
    are leveraged to construct a powerful 3D activity recognition method. Through this work, we
    wish to take a step towards a spatio-temporal model that is easier to
    understand, explain and interpret. The resulting model, Res-TCN, achieves
    state-of-the-art results on the largest 3D human action recognition dataset,
    NTU-RGBD.

    Recovery of damped exponentials using structured low rank matrix completion

    Arvind Balachandrasekaran, Vincent Magnotta, Mathews Jacob
    Comments: 14 pages, 5 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We introduce a structured low rank matrix completion algorithm to recover a
    series of images from their under-sampled measurements, where the signal along
    the parameter dimension at every pixel is described by a linear combination of
    exponentials. We exploit the exponential behavior of the signal at every pixel,
    along with the spatial smoothness of the exponential parameters to derive an
    annihilation relation in the Fourier domain. This relation translates to a
    low-rank property on a structured matrix constructed from the Fourier samples.
    We enforce the low rank property of the structured matrix as a regularization
    prior to recover the images. Since the direct use of current low rank matrix
    recovery schemes to this problem is associated with high computational
    complexity and memory demand, we adopt an iterative re-weighted least squares
    (IRLS) algorithm, which facilitates the exploitation of the convolutional
    structure of the matrix. Novel approximations involving two dimensional Fast
    Fourier Transforms (FFT) are introduced to drastically reduce the memory demand
    and computational complexity, which facilitates the extension of structured low
    rank methods to large scale three dimensional problems. We demonstrate our
    algorithm in the MR parameter mapping setting and show improvement over the
    state-of-the-art methods.

    Improving Object Detection With One Line of Code

    Navaneeth Bodla, Bharat Singh, Rama Chellappa, Larry S. Davis
    Comments: ICCV 2017 submission
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Non-maximum suppression is an integral part of the object detection pipeline.
    First, it sorts all detection boxes on the basis of their scores. The detection
    box M with the maximum score is selected and all other detection boxes with a
    significant overlap (using a pre-defined threshold) with M are suppressed. This
    process is recursively applied on the remaining boxes. As per the design of the
    algorithm, if an object lies within the predefined overlap threshold, it leads
    to a miss. To this end, we propose Soft-NMS, an algorithm which decays the
    detection scores of all other objects as a continuous function of their overlap
    with M. Hence, no object is eliminated in this process. Soft-NMS obtains
    consistent improvements for the coco-style mAP metric on standard datasets like
    PASCAL VOC 2007 (1.7\% for both R-FCN and Faster-RCNN) and MS-COCO (1.3\% for
    R-FCN and 1.1\% for Faster-RCNN) by just changing the NMS algorithm without any
    additional hyper-parameters. Further, the computational complexity of Soft-NMS
    is the same as traditional NMS and hence it can be efficiently implemented.
    Since Soft-NMS does not require any extra training and is simple to implement,
    it can be easily integrated into any object detection pipeline. Code for
    Soft-NMS is publicly available on GitHub (this http URL).
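
    The rescoring that the title alludes to really can be written in a line or
    two; the sketch below uses a Gaussian-shaped decay as the continuous
    function of overlap (one common choice), with a simple iou() helper and
    plain NumPy box arrays, all of which are illustrative rather than the
    released code.

        import numpy as np

        def iou(box, boxes):
            """IoU of one box against an array of boxes, all in (x1, y1, x2, y2) form."""
            x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
            x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
            inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
            area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
            return inter / (area(box) + area(boxes) - inter)

        def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
            """Greedy NMS in which overlapping scores are decayed instead of discarded."""
            scores = scores.astype(float).copy()
            keep = []
            while scores.max() > score_thresh:
                i = scores.argmax()
                keep.append((boxes[i], scores[i]))
                overlap = iou(boxes[i], boxes)
                scores *= np.exp(-(overlap ** 2) / sigma)   # the "one line": continuous decay
                scores[i] = 0.0                             # never re-select the kept box
            return keep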

    CT Image Reconstruction in a Low Dimensional Manifold

    Wenxiang Cong, Ge Wang, Qingsong Yang, Jiang Hsieh, Jia Li, Rongjie Lai
    Subjects: Medical Physics (physics.med-ph); Computer Vision and Pattern Recognition (cs.CV)

    Regularization methods are commonly used in X-ray CT image reconstruction.
    Different regularization methods reflect the characterization of different
    prior knowledge of images. In a recent work, a new regularization method called
    a low-dimensional manifold model (LDMM) is investigated to characterize the
    low-dimensional patch manifold structure of natural images, where the manifold
    dimensionality characterizes structural information of an image. In this paper,
    we propose a CT image reconstruction method based on the prior knowledge of the
    low-dimensional manifold of CT images. Using the clinical raw projection data
    from GE clinic, we conduct comparisons for the CT image reconstruction among
    the proposed method, the simultaneous algebraic reconstruction technique (SART)
    with the total variation (TV) regularization, and the filtered back projection
    (FBP) method. Results show that the proposed method can successfully recover
    structural details of an imaging object, and achieve higher spatial and
    contrast resolution of the reconstructed image than counterparts of FBP and
    SART with TV.

    Trigger for the SoLid Reactor Antineutrino Experiment

    Lukas On Arnold, for the SoLid collaboration
    Comments: Poster presented at NuPhys2016 (London, 12-14 December 2016). 8 pages, LaTeX, 6 png figures, 1 pdf figure
    Subjects: Instrumentation and Detectors (physics.ins-det); Computer Vision and Pattern Recognition (cs.CV); High Energy Physics – Experiment (hep-ex)

    SoLid, located at SCK-CEN in Mol, Belgium, is a reactor antineutrino
    experiment at a very short baseline of 5.5–10 m, aiming at the search for
    sterile neutrinos and a high-precision measurement of the neutrino energy
    spectrum of Uranium-235. It uses a novel approach based on Lithium-6 sheets and
    PVT cubes as scintillators for tagging the Inverse Beta-Decay products (neutron
    and positron). Being located overground and close to the BR2 research reactor,
    the experiment faces a large amount of backgrounds. Efficient real-time
    background and noise rejection is essential in order to increase the
    signal-background ratio for precise oscillation measurement and decrease data
    production to a rate which can be handled by the online software. Therefore, a
    reliable distinction between the neutrons and background signals is crucial.
    This can be performed online with a dedicated firmware trigger. A peak counting
    algorithm and an algorithm measuring time over threshold have been identified
    as performing well both in terms of efficiency and fake rate, and have been
    implemented on an FPGA.

    Big Universe, Big Data: Machine Learning and Image Analysis for Astronomy

    Jan Kremer, Kristoffer Stensbo-Smidt, Fabian Gieseke, Kim Steenstrup Pedersen, Christian Igel
    Journal-ref: IEEE Intelligent Systems, vol. 32, no. , pp. 16-22, Mar.-Apr. 2017
    Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Astrophysics and cosmology are rich with data. The advent of wide-area
    digital cameras on large aperture telescopes has led to ever more ambitious
    surveys of the sky. Data volumes of entire surveys a decade ago can now be
    acquired in a single night and real-time analysis is often desired. Thus,
    modern astronomy requires big data know-how, in particular it demands highly
    efficient machine learning and image analysis algorithms. But scalability is
    not the only challenge: Astronomy applications touch several current machine
    learning research questions, such as learning from biased data and dealing with
    label and measurement noise. We argue that this makes astronomy a great domain
    for computer science research, as it pushes the boundaries of data analysis. In
    the following, we will present this exciting application area for data
    scientists. We will focus on exemplary results, discuss main challenges, and
    highlight some recent methodological advancements in machine learning and image
    analysis triggered by astronomical applications.

    A learning-based approach for automatic image and video colorization

    Raj Kumar Gupta, Alex Yong-Sang Chia, Deepu Rajan, Huang Zhiyong
    Comments: Computer Graphics International – 2012
    Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we present a color transfer algorithm to colorize a broad
    range of gray images without any user intervention. The algorithm uses a
    machine learning-based approach to automatically colorize grayscale images. The
    algorithm uses the superpixel representation of the reference color images to
    learn the relationship between different image features and their corresponding
    color values. We use this learned information to predict the color value of
    each grayscale image superpixel. As compared to processing individual image
    pixels, our use of superpixels helps us to achieve a much higher degree of
    spatial consistency as well as speeds up the colorization process. The
    predicted color values of the gray-scale image superpixels are used to provide
    a ‘micro-scribble’ at the centroid of the superpixels. These color scribbles
    are refined by using a voting based approach. To generate the final
    colorization result, we use an optimization-based approach to smoothly spread
    the color scribble across all pixels within a superpixel. Experimental results
    on a broad range of images and the comparison with existing state-of-the-art
    colorization methods demonstrate the greater effectiveness of the proposed
    algorithm.

    ShapeWorld – A new test methodology for multimodal language understanding

    Alexander Kuhnle, Ann Copestake
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    We introduce a novel framework for evaluating multimodal deep learning models
    with respect to their language understanding and generalization abilities. In
    this approach, artificial data is automatically generated according to the
    experimenter’s specifications. The content of the data, both during training
    and evaluation, can be controlled in detail, which enables tasks to be created
    that require true generalization abilities, in particular the combination of
    previously introduced concepts in novel ways. We demonstrate the potential of
    our methodology by evaluating various visual question answering models on four
    different tasks, and show how our framework gives us detailed insights into
    their capabilities and limitations. By open-sourcing our framework, we hope to
    stimulate progress in the field of multimodal language understanding.


    Artificial Intelligence

    Morpheo: Traceable Machine Learning on Hidden data

    Mathieu Galtier, Camille Marini
    Comments: whitepaper, 9 pages, 6 figures
    Subjects: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)

    Morpheo is a transparent and secure machine learning platform collecting and
    analysing large datasets. It aims at building state-of-the-art prediction
    models in various fields where data are sensitive. Indeed, it offers strong
    privacy of data and algorithms, by preventing anyone from reading the data,
    apart from the owner and the chosen algorithms. Computations in Morpheo are
    orchestrated by a blockchain infrastructure, thus offering total traceability
    of operations. Morpheo aims at building an attractive economic ecosystem around
    data prediction by channelling crypto-money from prediction requests to useful
    data and algorithm providers. Morpheo is designed to handle multiple data
    sources in a transfer learning approach in order to mutualize knowledge
    acquired from large datasets for applications with smaller but similar
    datasets.

    Probabilistic programs for inferring the goals of autonomous agents

    Marco F. Cusumano-Towner, Alexey Radul, David Wingate, Vikash K. Mansinghka
    Subjects: Artificial Intelligence (cs.AI)

    Intelligent systems sometimes need to infer the probable goals of people,
    cars, and robots, based on partial observations of their motion. This paper
    introduces a class of probabilistic programs for formulating and solving these
    problems. The formulation uses randomized path planning algorithms as the basis
    for probabilistic models of the process by which autonomous agents plan to
    achieve their goals. Because these path planning algorithms do not have
    tractable likelihood functions, new inference algorithms are needed. This paper
    proposes two Monte Carlo techniques for these “likelihood-free” models, one of
    which can use likelihood estimates from neural networks to accelerate
    inference. The paper demonstrates efficacy on three simple examples, each using
    under 50 lines of probabilistic code.

    Pseudorehearsal in actor-critic agents

    Marochko Vladimir, Leonard Johard, Manuel Mazzara
    Subjects: Artificial Intelligence (cs.AI)

    Catastrophic forgetting has a serious impact on reinforcement learning, as
    the data distribution is generally sparse and non-stationary over time. The
    purpose of this study is to investigate whether pseudorehearsal can increase
    performance of an actor-critic agent with neural-network based policy selection
    and function approximation in a pole balancing task and compare different
    pseudorehearsal approaches. We expect that pseudorehearsal assists learning
    even in such very simple problems, given proper initialization of the rehearsal
    parameters.
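
    A hedged sketch of the pseudorehearsal idea itself (not of the authors'
    actor-critic setup): random pseudo-inputs are labelled by the current
    network, and those input-output pairs are rehearsed alongside new data so
    that old behaviour is retained without storing old experience. The tiny
    linear "network", sizes and learning rate are illustrative.

        import numpy as np

        rng = np.random.default_rng(0)
        W = rng.normal(size=(4, 2)) * 0.1                # toy linear function approximator

        def forward(x, W):
            return x @ W

        def sgd_step(W, x, y, lr=0.05):
            grad = x.T @ (forward(x, W) - y) / len(x)    # mean-squared-error gradient
            return W - lr * grad

        # 1) Generate pseudo-items: random inputs labelled by the *current* network.
        pseudo_x = rng.uniform(-1, 1, size=(64, 4))
        pseudo_y = forward(pseudo_x, W)

        # 2) Learn the new task while rehearsing the pseudo-items.
        new_x = rng.uniform(-1, 1, size=(32, 4))
        new_y = rng.normal(size=(32, 2))                 # stand-in for new training targets
        for _ in range(100):
            W = sgd_step(W, new_x, new_y)                # update on new experience
            W = sgd_step(W, pseudo_x, pseudo_y)          # rehearse old input-output behaviour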

    Approximating the Backbone in the Weighted Maximum Satisfiability Problem

    He Jiang, Jifeng Xuan, Yan Hu
    Comments: 14 pages, 1 figure, Proceedings of Advances in Knowledge Discovery and Data Mining 2008 (PAKDD 2008), 2008
    Subjects: Artificial Intelligence (cs.AI)

    The weighted Maximum Satisfiability problem (weighted MAX-SAT) is an
    NP-hard problem with numerous applications arising in artificial
    intelligence. As an efficient tool for heuristic design, the backbone has
    been applied to the design of heuristics for many NP-hard problems. In this
    paper, we investigated the computational complexity of retrieving the
    backbone in weighted MAX-SAT and developed a new algorithm for solving this
    problem. We showed that it is intractable to retrieve the full backbone
    under the assumption that P ≠ NP; moreover, it is intractable to retrieve
    even a fixed fraction of the backbone. We then presented a backbone-guided
    local search (BGLS) with a Walksat operator for weighted MAX-SAT. BGLS
    consists of two phases: the first phase samples backbone information from
    local optima, and the second phase conducts local search under the guidance
    of the backbone. Extensive experimental results on the
    benchmark showed that BGLS outperforms the existing heuristics in both solution
    quality and runtime.
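
    A hedged sketch of the two-phase scheme described above: local optima are
    sampled to estimate how often each variable is assigned True (an
    approximate backbone), and a Walksat-style step is then biased towards
    those frequent assignments. The clause representation and the noise
    parameter are assumptions, not the authors' implementation.

        import random

        def sample_backbone(local_optima):
            """Fraction of sampled local optima in which each variable is True."""
            n = len(local_optima[0])
            return [sum(a[v] for a in local_optima) / len(local_optima) for v in range(n)]

        def backbone_guided_flip(assignment, unsat_clause, freq, p_noise=0.2):
            """Walksat-style step on an unsatisfied clause (literals are +/- 1-based
            variable indices): with probability p_noise flip a random variable,
            otherwise flip the one whose current value disagrees most with the
            estimated backbone frequencies."""
            vars_in_clause = [abs(lit) - 1 for lit in unsat_clause]
            if random.random() < p_noise:
                v = random.choice(vars_in_clause)
            else:
                v = max(vars_in_clause, key=lambda u: abs(assignment[u] - freq[u]))
            assignment[v] = 1 - assignment[v]
            return assignment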

    FML-based Prediction Agent and Its Application to Game of Go

    Chang-Shing Lee, Mei-Hui Wang, Chia-Hsiu Kao, Sheng-Chi Yang, Yusuke Nojima, Ryosuke Saga, Nan Shuo, Naoyuki Kubota
    Comments: 6 pages, 12 figures, Joint 17th World Congress of International Fuzzy Systems Association and 9th International Conference on Soft Computing and Intelligent Systems (IFSA-SCIS 2017), Otsu, Japan, Jun. 27-30, 2017
    Subjects: Artificial Intelligence (cs.AI)

    In this paper, we present a robotic prediction agent including a darkforest
    Go engine, a fuzzy markup language (FML) assessment engine, an FML-based
    decision support engine, and a robot engine for the game of Go. The
    knowledge base and rule base of the FML assessment engine are constructed
    by referring to information from the darkforest Go engine located at NUTN
    and OPU, for example, the number of MCTS simulations and the winning rate
    prediction. The proposed robotic prediction agent first retrieves data from
    the Go competition website, and then the FML assessment engine infers the
    winning possibility based on the information generated by the darkforest Go
    engine. The FML-based decision support engine computes the winning
    possibility based on the partial game situation inferred by the FML
    assessment engine. Finally, the robot engine combines with the
    human-friendly robot partner PALRO, produced by Fujisoft Incorporated, to
    report the game situation to human Go players.
    Experimental results show that the FML-based prediction agent can work
    effectively.

    Online Spatial Concept and Lexical Acquisition with Simultaneous Localization and Mapping

    Akira Taniguchi, Yoshinobu Hagiwara, Tadahiro Taniguchi, Tetsunari Inamura
    Comments: Preprint submitted to 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems. Received March 1, 2017
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Robotics (cs.RO)

    In this paper, we propose an online learning algorithm based on a
    Rao-Blackwellized particle filter for spatial concept acquisition and mapping.
    We have proposed a nonparametric Bayesian spatial concept acquisition model
    (SpCoA). We propose a novel method (SpCoSLAM) integrating SpCoA and FastSLAM in
    the theoretical framework of the Bayesian generative model. The proposed method
    can simultaneously learn place categories and lexicons while incrementally
    generating an environmental map. Furthermore, the proposed method has scene
    image features and a language model added to SpCoA. In the experiments, we
    tested online learning of spatial concepts and environmental maps in a novel
    environment of which the robot did not have a map. Then, we evaluated the
    results of online learning of spatial concepts and lexical acquisition. The
    experimental results demonstrated that the robot was able to more accurately
    learn the relationships between words and the place in the environmental map
    incrementally by using the proposed method.

    The Reactor: A Sample-Efficient Actor-Critic Architecture

    Audrunas Gruslys, Mohammad Gheshlaghi Azar, Marc G. Bellemare, Remi Munos
    Subjects: Artificial Intelligence (cs.AI)

    In this work we present a new reinforcement learning agent, called Reactor
    (for Retrace-actor), based on an off-policy multi-step return actor-critic
    architecture. The agent uses a deep recurrent neural network for function
    approximation. The network outputs a target policy π (the actor), an
    action-value Q-function (the critic) evaluating the current policy π, and
    an estimated behavioral policy μ̂ which we use for off-policy
    correction. The agent maintains a memory buffer filled with past experiences.
    The critic is trained by the multi-step off-policy Retrace algorithm and the
    actor is trained by a novel η-leave-one-out policy gradient estimate
    (which uses both the off-policy corrected return and the estimated Q-function).
    The Reactor is sample-efficient thanks to the use of memory replay, and
    numerically efficient since it uses multi-step returns. Also, both acting and
    learning can be parallelized. We evaluated our algorithm on 57 Atari 2600 games
    and demonstrate that it achieves state-of-the-art performance.

    Larger is Better: The Effect of Learning Rates Enjoyed by Stochastic Optimization with Progressive Variance Reduction

    Fanhua Shang
    Comments: 36 pages, 10 figures. The simple variant of SVRG is much better than the best-known stochastic method, Katyusha
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)

    In this paper, we propose a simple variant of the original stochastic
    variance reduced gradient (SVRG) method, which hereafter we refer to as the
    variance-reduced stochastic gradient descent (VR-SGD). Different from the choices of the
    snapshot point and starting point in SVRG and its proximal variant, Prox-SVRG,
    the two vectors of each epoch in VR-SGD are set to the average and last iterate
    of the previous epoch, respectively. This setting allows us to use much larger
    learning rates or step sizes than SVRG, e.g., 3/(7L) for VR-SGD vs 1/(10L) for
    SVRG, and also makes our convergence analysis more challenging. In fact, a
    larger learning rate enjoyed by VR-SGD means that the variance of its
    stochastic gradient estimator asymptotically approaches zero more rapidly.
    Unlike common stochastic methods such as SVRG and proximal stochastic methods
    such as Prox-SVRG, we design two different update rules for smooth and
    non-smooth objective functions, respectively. In other words, VR-SGD can tackle
    non-smooth and/or non-strongly convex problems directly without using any
    reduction techniques such as quadratic regularizers. Moreover, we analyze the
    convergence properties of VR-SGD for strongly convex problems, which show that
    VR-SGD attains a linear convergence rate. We also provide the convergence
    guarantees of VR-SGD for non-strongly convex problems. Experimental results
    show that the performance of VR-SGD is significantly better than its
    counterparts, SVRG and Prox-SVRG, and it is also much better than the best
    known stochastic method, Katyusha.
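
    A minimal sketch of the variant as the abstract describes it: the snapshot
    point of each epoch is the average of the previous epoch's iterates, the
    starting point is its last iterate, and the usual SVRG-style corrected
    gradient is applied. The toy least-squares objective, data and step size
    are illustrative; the paper's analysis uses step sizes such as 3/(7L).

        import numpy as np

        def vr_sgd(grad_i, full_grad, x0, n, epochs=20, m=None, lr=0.02, rng=None):
            """grad_i(x, i): gradient of component i; full_grad(x): full gradient."""
            rng = rng or np.random.default_rng(0)
            m = m or n
            x, snapshot = x0.copy(), x0.copy()
            for _ in range(epochs):
                mu = full_grad(snapshot)                 # full gradient at the snapshot
                iterates = []
                for _ in range(m):
                    i = rng.integers(n)
                    g = grad_i(x, i) - grad_i(snapshot, i) + mu   # variance-reduced estimator
                    x = x - lr * g
                    iterates.append(x)
                snapshot = np.mean(iterates, axis=0)     # VR-SGD: snapshot = epoch average
                # the next epoch simply starts from the last iterate x
            return x

        # Toy least-squares problem: f(x) = (1/n) * sum_i 0.5 * (a_i . x - b_i)^2
        rng = np.random.default_rng(0)
        A, b = rng.normal(size=(100, 5)), rng.normal(size=100)
        gi = lambda x, i: (A[i] @ x - b[i]) * A[i]
        fg = lambda x: A.T @ (A @ x - b) / len(b)
        print(vr_sgd(gi, fg, np.zeros(5), n=100))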

    Effective Warm Start for the Online Actor-Critic Reinforcement Learning based mHealth Intervention

    Feiyun Zhu, Peng Liao
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Online reinforcement learning (RL) is increasingly popular for the
    personalized mobile health (mHealth) intervention. It is able to personalize
    the type and dose of interventions according to user’s ongoing statuses and
    changing needs. However, at the beginning of online learning, there are usually
    too few samples to support the RL updating, which leads to poor performance. A
    delay in good performance of the online learning algorithms can be especially
    detrimental in the mHealth, where users tend to quickly disengage with apps. To
    address this problem, we propose a new online RL methodology that focuses on an
    effective warm start. The main idea is to make full use of the data accumulated
    and the decision rule achieved in a former study. As a result, we can greatly
    enrich the data size at the beginning of online learning in our method.
    This accelerates the online learning process, enabling new users to achieve
    good performance not only at the beginning of online learning but also throughout the
    whole online learning process. Besides, we use the decision rules achieved
    previously to initialize the parameter in our online RL model for new users. It
    provides a good initialization for the proposed online RL algorithm. Experiment
    results show that promising improvements have been achieved by our method
    compared with the state-of-the-art method.

    A Novel Experimental Platform for In-Vessel Multi-Chemical Molecular Communications

    Nariman Farsad, David Pan, Andrea Goldsmith
    Subjects: Emerging Technologies (cs.ET); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Learning (cs.LG)

    This work presents a new multi-chemical experimental platform for molecular
    communication where the transmitter can release different chemicals. This
    platform is designed to be inexpensive and accessible, and it can be expanded
    to simulate different environments including the cardiovascular system and
    complex network of pipes in industrial complexes and city infrastructures. To
    demonstrate the capabilities of the platform, we implement a time-slotted
    binary communication system where a bit-0 is represented by an acid pulse, a
    bit-1 by a base pulse, and information is carried via pH signals. The channel
    model for this system, which is nonlinear and has long memories, is unknown.
    Therefore, we devise novel detection algorithms that use techniques from
    machine learning and deep learning to train a maximum-likelihood detector.
    Using these algorithms the bit error rate improves by an order of magnitude
    relative to the approach used in previous works. Moreover, our system achieves
    a data rate that is an order of magnitude higher than any of the previous
    molecular communication platforms.

    A Security Monitoring Framework For Virtualization Based HEP Infrastructures

    A. Gomez Ramirez, M. Martinez Pedreira, C. Grigoras, L. Betev, C. Lara, U. Kebschull for the ALICE Collaboration
    Comments: Proceedings of the 22nd International Conference on Computing in High Energy and Nuclear Physics, CHEP 2016, 10-14 October 2016, San Francisco. Submitted to Journal of Physics: Conference Series (JPCS)
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); High Energy Physics – Experiment (hep-ex)

    High Energy Physics (HEP) distributed computing infrastructures require
    automatic tools to monitor, analyze and react to potential security incidents.
    These tools should collect and inspect data such as resource consumption, logs
    and sequence of system calls for detecting anomalies that indicate the presence
    of a malicious agent. They should also be able to perform automated reactions
    to attacks without administrator intervention. We describe a novel framework
    that accomplishes these requirements, with a proof of concept implementation
    for the ALICE experiment at CERN. We show how we achieve a fully virtualized
    environment that improves the security by isolating services and Jobs without a
    significant performance impact. We also describe a collected dataset for
    Machine Learning based Intrusion Prevention and Detection Systems on Grid
    computing. This dataset is composed of resource consumption measurements (such
    as CPU, RAM and network traffic), logfiles from operating system services, and
    system call data collected from production Jobs running in an ALICE Grid test
    site and a big set of malware. This malware was collected from security
    research sites. Based on this dataset, we will proceed to develop Machine
    Learning algorithms able to detect malicious Jobs.

    Learn-Memorize-Recall-Reduce: A Robotic Cloud Computing Paradigm

    Shaoshan Liu, Bolin Ding, Jie Tang, Dawei Sun, Zhe Zhang, Grace Tsai, Jean-Luc Gaudiot
    Comments: 6 pages, 7 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Robotics (cs.RO)

    The rise of robotic applications has led to the generation of a huge volume
    of unstructured data, whereas the current cloud infrastructure was designed to
    process limited amounts of structured data. To address this problem, we propose
    a learn-memorize-recall-reduce paradigm for robotic cloud computing. The
    learning stage converts incoming unstructured data into structured data; the
    memorization stage provides effective storage for the massive amount of data;
    the recall stage provides efficient means to retrieve the raw data; while the
    reduction stage provides means to make sense of this massive amount of
    unstructured data with limited computing resources.

    RACE: Large-scale ReAding Comprehension Dataset From Examinations

    Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, Eduard Hovy
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We present RACE, a new dataset for benchmark evaluation of methods in the
    reading comprehension task. Collected from the English exams for middle and
    high school Chinese students aged 12 to 18, RACE consists of 28,000+ passages
    and nearly 100,000 questions generated by human experts (English instructors),
    and covers a variety of topics carefully designed to evaluate the students’
    ability in understanding and reasoning. In particular, the proportion of
    questions that require reasoning is much larger in RACE than in other
    benchmark datasets for reading comprehension, and there is a significant gap
    between the performance of the state-of-the-art models (43%) and the ceiling
    human performance (95%). We hope this new dataset
    can serve as a valuable resource for research and evaluation in machine
    comprehension. The dataset is freely available at
    this http URL

    ShapeWorld – A new test methodology for multimodal language understanding

    Alexander Kuhnle, Ann Copestake
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    We introduce a novel framework for evaluating multimodal deep learning models
    with respect to their language understanding and generalization abilities. In
    this approach, artificial data is automatically generated according to the
    experimenter’s specifications. The content of the data, both during training
    and evaluation, can be controlled in detail, which enables tasks to be created
    that require true generalization abilities, in particular the combination of
    previously introduced concepts in novel ways. We demonstrate the potential of
    our methodology by evaluating various visual question answering models on four
    different tasks, and show how our framework gives us detailed insights into
    their capabilities and limitations. By open-sourcing our framework, we hope to
    stimulate progress in the field of multimodal language understanding.


    Information Retrieval

    NEXT: A Neural Network Framework for Next POI Recommendation

    Zhiqian Zhang, Chenliang Li, Zhiyong Wu, Aixin Sun, Dengpan Ye, Xiangyang Luo
    Subjects: Information Retrieval (cs.IR)

    The task of next POI recommendation has been studied extensively in recent
    years. However, developing a unified recommendation framework that incorporates
    the multiple factors associated with both POIs and users remains challenging,
    because of the heterogeneous nature of this information. Further, effective
    mechanisms to handle cold-start and to endow the system with interpretability
    are also difficult topics. Inspired by the recent success of neural networks in
    many areas, in this paper, we present a simple but effective neural network
    framework for next POI recommendation, named NEXT. NEXT is a unified framework
    that learns the hidden intent behind a user’s next move by incorporating
    different factors in a unified manner. Specifically, in NEXT, we incorporate
    meta-data information and two kinds of temporal contexts (i.e., time interval
    and visit time). To leverage sequential relations and geographical influence,
    we propose to adopt DeepWalk, a network representation learning technique, to
    encode such knowledge. We evaluate the effectiveness of NEXT against
    state-of-the-art alternatives and neural networks based solutions. Experimental
    results over three publicly available datasets demonstrate that NEXT
    significantly outperforms baselines in real-time next POI recommendation.
    Further experiments demonstrate the superiority of NEXT in handling cold-start.
    More importantly, we show that NEXT provides meaningful explanation of the
    dimensions in hidden intent space.

    Task-Oriented Query Reformulation with Reinforcement Learning

    Rodrigo Nogueira, Kyunghyun Cho
    Subjects: Information Retrieval (cs.IR)

    Search engines play an important role in our everyday lives by assisting us
    in finding the information we need. When we input a complex query, however,
    results are often far from satisfactory. In this work, we introduce a query
    reformulation system based on a neural network that rewrites a query to
    maximize the number of relevant documents returned. We train this neural
    network with reinforcement learning. The actions correspond to selecting terms
    to build a reformulated query, and the reward is the document recall. We
    evaluate our approach on three datasets against strong baselines and show a
    relative improvement of 5-20% in terms of recall. Furthermore, we present a
    simple method to estimate a conservative upper bound on the performance of a
    model in a particular environment, and verify that there is still large room
    for improvement.
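
    As a rough illustration of the training signal, the sketch below (Python;
    search_engine and selected_terms are hypothetical names, not the authors'
    code) computes the recall reward that the term-selection policy would be
    trained to maximize.

        def recall_reward(retrieved_docs, relevant_docs):
            """Fraction of the relevant documents retrieved by the reformulated query."""
            if not relevant_docs:
                return 0.0
            hits = len(set(retrieved_docs) & set(relevant_docs))
            return hits / len(relevant_docs)

        # The RL actions select terms; the reformulated query is scored by recall:
        #   reward = recall_reward(search_engine(" ".join(selected_terms)), relevant_docs)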

    Generic LSH Families for the Angular Distance Based on Johnson-Lindenstrauss Projections and Feature Hashing LSH

    Luis Argerich, Natalia Golmar
    Subjects: Data Structures and Algorithms (cs.DS); Information Retrieval (cs.IR)

    In this paper we propose the creation of generic LSH families for the angular
    distance based on Johnson-Lindenstrauss projections. We show that feature
    hashing is a valid J-L projection and propose two new LSH families based on
    feature hashing. These new LSH families are tested on both synthetic and real
    datasets with very good results and a considerable performance improvement over
    other LSH families. While the theoretical analysis is done for the angular
    distance, these families can also be used in practice for the Euclidean
    distance with excellent results [2]. Our tests using real datasets show that
    the proposed LSH functions work well for the Euclidean distance.
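
    For readers unfamiliar with angular-distance LSH, the sketch below shows the
    standard sign-random-projection (SimHash) construction with hashed +/-1
    projections; it illustrates the general idea only and is not one of the
    specific families proposed in the paper.

        import hashlib

        def _signed_hash(token, seed):
            """Deterministic +/-1 hash of a feature name."""
            h = int(hashlib.md5(f"{seed}:{token}".encode()).hexdigest(), 16)
            return 1.0 if h % 2 == 0 else -1.0

        def simhash_signature(features, n_bits=32):
            """n_bits angular-LSH signature: signs of hashed +/-1 projections."""
            sig = []
            for bit in range(n_bits):
                dot = sum(v * _signed_hash(f, bit) for f, v in features.items())
                sig.append(1 if dot > 0 else 0)
            return sig

        # Vectors at a small angle collide on most bits:
        #   simhash_signature({"cat": 1.0, "dog": 0.5})
        #   simhash_signature({"cat": 0.9, "dog": 0.6})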


    Computation and Language

    Sparse Communication for Distributed Gradient Descent

    Alham Fikri Aji, Kenneth Heafield
    Comments: Submitted to EMNLP 2017
    Subjects: Computation and Language (cs.CL); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

    We make distributed stochastic gradient descent faster by exchanging 99%
    sparse updates instead of dense updates. In data-parallel training, nodes pull
    updated values of the parameters from a sharded server, compute gradients, push
    their gradients to the server, and repeat. These push and pull updates strain
    the network. However, most updates are near zero, so we map the 99% smallest
    updates (by absolute value) to zero then exchange sparse matrices. Even simple
    coordinate and value encoding achieves a 50x reduction in bandwidth. Our
    experiment with neural machine translation on 4 GPUs achieved a 22% speed
    boost without impacting the BLEU score.
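
    A minimal sketch of the sparsification step as described in the abstract
    (illustrative Python, not the authors' implementation): keep the 1% largest
    gradient entries by absolute value and exchange them as coordinate/value
    pairs.

        import numpy as np

        def sparsify(grad, keep_ratio=0.01):
            """Keep the top keep_ratio of gradient entries by magnitude as (index, value) pairs."""
            flat = grad.ravel()
            k = max(1, int(len(flat) * keep_ratio))
            idx = np.argpartition(np.abs(flat), -k)[-k:]   # indices of the k largest |g_i|
            return idx.astype(np.int64), flat[idx]

        def densify(idx, vals, shape):
            """Rebuild a dense gradient from the exchanged sparse update."""
            flat = np.zeros(int(np.prod(shape)), dtype=vals.dtype)
            flat[idx] = vals
            return flat.reshape(shape)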

    Deep Joint Entity Disambiguation with Local Neural Attention

    Octavian-Eugen Ganea, Thomas Hofmann
    Subjects: Computation and Language (cs.CL)

    We propose a novel deep learning model for joint document-level entity
    disambiguation, which leverages learned neural representations. Key components
    are entity embeddings, a neural attention mechanism over local context windows,
    and a differentiable joint inference stage for disambiguation. Our approach
    thereby combines benefits of deep learning with more traditional approaches
    such as graphical models and probabilistic mention-entity maps. Extensive
    experiments show that we are able to obtain competitive or state-of-the-art
    accuracy at moderate computational costs.

    Learning Character-level Compositionality with Visual Features

    Frederick Liu, Han Lu, Chieh Lo, Graham Neubig
    Comments: Accepted to ACL 2017
    Subjects: Computation and Language (cs.CL)

    Previous work has modeled the compositionality of words by creating
    character-level models of meaning, reducing problems of sparsity for rare
    words. However, in many writing systems compositionality has an effect even on
    the character-level: the meaning of a character is derived by the sum of its
    parts. In this paper, we model this effect by creating embeddings for
    characters based on their visual characteristics, creating an image for the
    character and running it through a convolutional neural network to produce a
    visual character embedding. Experiments on a text classification task
    demonstrate that such a model allows for better processing of instances with
    rare characters in languages such as Chinese, Japanese, and Korean. Additionally,
    qualitative analyses demonstrate that our proposed model learns to focus on the
    parts of characters that carry semantic content, resulting in embeddings that
    are coherent in visual space.

    A Neural Architecture for Generating Natural Language Descriptions from Source Code Changes

    Pablo Loyola, Edison Marrese-Taylor, Yutaka Matsuo
    Comments: Accepted at ACL 2017
    Subjects: Computation and Language (cs.CL)

    We propose a model to automatically describe changes introduced in the source
    code of a program using natural language. Our method receives as input a set of
    code commits, which contain both the modifications and the messages introduced
    by users. These two modalities are used to train an encoder-decoder
    architecture. We evaluated our approach on twelve real world open source
    projects from four different programming languages. Quantitative and
    qualitative results showed that the proposed approach can generate feasible and
    semantically sound descriptions not only in standard in-project settings, but
    also in a cross-project setting.

    Towards String-to-Tree Neural Machine Translation

    Roee Aharoni, Yoav Goldberg
    Comments: Accepted as a short paper in ACL 2017
    Subjects: Computation and Language (cs.CL)

    We present a simple method to incorporate syntactic information about the
    target language in a neural machine translation system by translating into
    linearized, lexicalized constituency trees. An experiment on the WMT16
    German-English news translation task resulted in an improved BLEU score when
    compared to a syntax-agnostic NMT baseline trained on the same dataset. An
    analysis of the translations from the syntax-aware system shows that it
    performs more reordering during translation in comparison to the baseline. A
    small-scale human evaluation also showed an advantage to the syntax-aware
    system.
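
    To make the target-side representation concrete, here is a toy linearization
    of a small constituency tree into a token sequence; the exact bracketing and
    lexicalization used in the paper may differ.

        def linearize(tree):
            """tree: (label, children), where children are subtrees or word strings."""
            label, children = tree
            parts = ["(" + label]
            for child in children:
                parts.append(linearize(child) if isinstance(child, tuple) else child)
            parts.append(")")
            return " ".join(parts)

        print(linearize(("S", [("NP", ["the", "cat"]), ("VP", ["sat"])])))
        # -> (S (NP the cat ) (VP sat ) )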

    RACE: Large-scale ReAding Comprehension Dataset From Examinations

    Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, Eduard Hovy
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We present RACE, a new dataset for benchmark evaluation of methods in the
    reading comprehension task. Collected from the English exams for middle and
    high school Chinese students aged 12 to 18, RACE consists of 28,000+ passages
    and nearly 100,000 questions generated by human experts (English instructors),
    and covers a variety of topics carefully designed to evaluate the students’
    ability in understanding and reasoning. In particular, the proportion of
    questions that require reasoning is much larger in RACE than in other
    benchmark datasets for reading comprehension, and there is a significant gap
    between the performance of the state-of-the-art models (43%) and the ceiling
    human performance (95%). We hope this new dataset
    can serve as a valuable resource for research and evaluation in machine
    comprehension. The dataset is freely available at
    this http URL

    Graph Convolutional Encoders for Syntax-aware Neural Machine Translation

    Joost Bastings, Ivan Titov, Wilker Aziz, Diego Marcheggiani, Khalil Sima'an
    Subjects: Computation and Language (cs.CL)

    We present a simple and effective approach to incorporating syntactic
    structure into neural attention-based encoder-decoder models for machine
    translation. We rely on graph-convolutional networks (GCNs), a recent class of
    neural networks developed for modeling graph-structured data. Our GCNs use
    predicted syntactic dependency trees of source sentences to produce
    representations of words (i.e. hidden states of the encoder) that are sensitive
    to their syntactic neighborhoods. GCNs take word representations as input and
    produce word representations as output, so they can easily be incorporated as
    layers into standard encoders (e.g., on top of bidirectional RNNs or
    convolutional neural networks). We evaluate their effectiveness with
    English-German and English-Czech translation experiments for different types of
    encoders and observe substantial improvements over their syntax-agnostic
    versions in all the considered setups.
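
    A minimal numpy sketch of one graph-convolutional layer over a dependency
    graph, in the spirit of the description above (mean aggregation over
    neighbours plus a self-loop; not the authors' exact formulation):

        import numpy as np

        def gcn_layer(H, A, W):
            """H: (n, d_in) word states; A: (n, n) dependency adjacency; W: (d_in, d_out)."""
            A_hat = A + np.eye(A.shape[0])                  # add self-loops
            deg = A_hat.sum(axis=1, keepdims=True)
            return np.maximum(0.0, (A_hat / deg) @ H @ W)   # ReLU(D^-1 (A + I) H W)

        # Stacked on top of an encoder: H1 = gcn_layer(H_enc, A_dep, W1), and so on.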

    MUSE: Modularizing Unsupervised Sense Embeddings

    Guang-He Lee, Yun-Nung Chen
    Subjects: Computation and Language (cs.CL)

    This paper proposes to address the word sense ambiguity issue in an
    unsupervised manner, where word sense representations are learned along with a
    word sense selection mechanism given contexts. Prior work on learning multi-sense
    embeddings suffered from either ambiguity of different-level embeddings or
    inefficient sense selection. The proposed modular framework, MUSE, implements
    flexible modules to optimize distinct mechanisms, achieving the first purely
    sense-level representation learning system with linear-time sense selection. We
    leverage reinforcement learning to enable joint training on the proposed
    modules, and introduce various exploration techniques on sense selection for
    better robustness. The experiments on benchmark data show that the proposed
    approach achieves the state-of-the-art performance on synonym selection as well
    as on contextual word similarities in terms of MaxSimC.

    Neural Paraphrase Identification of Questions with Noisy Pretraining

    Gaurav Singh Tomar, Thyago Duque, Oscar Täckström, Jakob Uszkoreit, Dipanjan Das
    Subjects: Computation and Language (cs.CL)

    We present a solution to the problem of paraphrase identification of
    questions. We focus on a recent dataset of question pairs annotated with binary
    paraphrase labels and show that a variant of the decomposable attention model
    (Parikh et al., 2016) results in accurate performance on this task, while being
    far simpler than many competing neural architectures. Furthermore, when the
    model is pretrained on a noisy dataset of automatically collected question
    paraphrases, it obtains the best reported performance on the dataset.

    Distributional model on a diet: One-shot word learning from text only

    Su Wang, Stephen Roller, Katrin Erk
    Comments: Keywords: Distributional semantics; Lexical semantics; Bayesian models
    Subjects: Computation and Language (cs.CL)

    We test whether distributional models can do one-shot learning of
    definitional properties from text only. Using Bayesian models, we find that
    first learning overarching structure in the known data, regularities in textual
    contexts and in properties, helps one-shot learning, and that individual
    context items can be highly informative.

    Cross-lingual Abstract Meaning Representation Parsing

    Marco Damonte, Shay B. Cohen
    Subjects: Computation and Language (cs.CL)

    Abstract Meaning Representation (AMR) annotation efforts have mostly focused
    on English. In order to train parsers on other languages, we propose a method
    based on annotation projection, which involves exploiting annotations in a
    source language and a parallel corpus of the source language and a target
    language. Using English as the source language, we show promising results for
    Italian, Spanish, German and Chinese as target languages. Besides evaluating
    the target parsers on non-gold datasets, we further propose an evaluation
    method that exploits the English gold annotations and does not require access
    to gold annotations for the target languages. This is achieved by inverting the
    projection process: a new English parser is learned from the target language
    parser and evaluated on the existing English gold standard.

    Neural Extractive Summarization with Side Information

    Shashi Narayan, Nikos Papasarantopoulos, Mirella Lapata, Shay B. Cohen
    Comments: 10 pages
    Subjects: Computation and Language (cs.CL)

    Most extractive summarization methods focus on the main body of the document
    from which sentences need to be extracted. The gist of the document often lies
    in the side information of the document, such as title and image captions.
    These types of side information are often available for newswire articles. We
    propose to explore side information in the context of single-document
    extractive summarization. We develop a framework for single-document
    summarization composed of a hierarchical document encoder and an
    attention-based extractor with attention over side information. We evaluate our
    models on a large-scale news dataset. We show that extractive summarization
    with side information consistently outperforms its counterpart (which does not
    use any side information) in terms of both informativeness and fluency.

    Translation of Patent Sentences with a Large Vocabulary of Technical Terms Using Neural Machine Translation

    Zi Long, Takehito Utsuro, Tomoharu Mitsuhashi, Mikio Yamamoto
    Comments: WAT 2016
    Subjects: Computation and Language (cs.CL)

    Neural machine translation (NMT), a new approach to machine translation, has
    achieved promising results comparable to those of traditional approaches such
    as statistical machine translation (SMT). Despite its recent success, NMT
    cannot handle a larger vocabulary because training complexity and decoding
    complexity proportionally increase with the number of target words. This
    problem becomes even more serious when translating patent documents, which
    contain many technical terms that are observed infrequently. In NMTs, words
    that are out of vocabulary are represented by a single unknown token. In this
    paper, we propose a method that enables NMT to translate patent sentences
    comprising a large vocabulary of technical terms. We train an NMT system on
    bilingual data wherein technical terms are replaced with technical term tokens;
    this allows it to translate most of the source sentences except technical
    terms. Further, we use it as a decoder to translate source sentences with
    technical term tokens and replace the tokens with technical term translations
    using SMT. We also use it to rerank the 1,000-best SMT translations on the
    basis of the average of the SMT score and that of the NMT rescoring of the
    translated sentences with technical term tokens. Our experiments on
    Japanese-Chinese patent sentences show that the proposed NMT system achieves a
    substantial improvement of up to 3.1 BLEU points and 2.3 RIBES points over
    traditional SMT systems and an improvement of approximately 0.6 BLEU points and
    0.8 RIBES points over an equivalent NMT system without our proposed technique.

    Neural Machine Translation Model with a Large Vocabulary Selected by Branching Entropy

    Zi Long, Takehito Utsuro, Tomoharu Mitsuhashi, Mikio Yamamoto
    Comments: EMNLP 2017, under review
    Subjects: Computation and Language (cs.CL)

    Neural machine translation (NMT), a new approach to machine translation, has
    achieved promising results comparable to those of traditional approaches such
    as statistical machine translation (SMT). Despite its recent success, NMT
    cannot handle a larger vocabulary because the training complexity and decoding
    complexity proportionally increase with the number of target words. This
    problem becomes even more serious when translating patent documents, which
    contain many technical terms that are observed infrequently. In this paper, we
    propose to select phrases that contain out-of-vocabulary words using the
    statistical approach of branching entropy. This allows the proposed NMT system
    to be applied to a translation task of any language pair without any
    language-specific knowledge about technical term identification. The selected
    phrases are then replaced with tokens during training and post-translated by
    the phrase translation table of SMT. Evaluation on Japanese-to-Chinese and
    Chinese-to-Japanese patent sentence translation proved the effectiveness of
    phrases selected with branching entropy, where the proposed NMT system achieves
    a substantial improvement over a baseline NMT system without our proposed
    technique.
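
    For illustration, the branching entropy of a context is the entropy of the
    next-token distribution observed after it in the corpus; a small sketch of
    this quantity, assuming simple n-gram counts (not the authors' code):

        import math
        from collections import Counter, defaultdict

        def branching_entropy(tokens, n=2):
            """Entropy of the next-token distribution after every (n-1)-gram context."""
            next_counts = defaultdict(Counter)
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                next_counts[context][tokens[i + n - 1]] += 1
            entropy = {}
            for context, counts in next_counts.items():
                total = sum(counts.values())
                entropy[context] = -sum((c / total) * math.log2(c / total)
                                        for c in counts.values())
            return entropy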

    ShapeWorld – A new test methodology for multimodal language understanding

    Alexander Kuhnle, Ann Copestake
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    We introduce a novel framework for evaluating multimodal deep learning models
    with respect to their language understanding and generalization abilities. In
    this approach, artificial data is automatically generated according to the
    experimenter’s specifications. The content of the data, both during training
    and evaluation, can be controlled in detail, which enables tasks to be created
    that require true generalization abilities, in particular the combination of
    previously introduced concepts in novel ways. We demonstrate the potential of
    our methodology by evaluating various visual question answering models on four
    different tasks, and show how our framework gives us detailed insights into
    their capabilities and limitations. By open-sourcing our framework, we hope to
    stimulate progress in the field of multimodal language understanding.

    Online Spatial Concept and Lexical Acquisition with Simultaneous Localization and Mapping

    Akira Taniguchi, Yoshinobu Hagiwara, Tadahiro Taniguchi, Tetsunari Inamura
    Comments: Preprint submitted to 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems. Received March 1, 2017
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Robotics (cs.RO)

    In this paper, we propose an online learning algorithm based on a
    Rao-Blackwellized particle filter for spatial concept acquisition and mapping.
    We have proposed a nonparametric Bayesian spatial concept acquisition model
    (SpCoA). We propose a novel method (SpCoSLAM) integrating SpCoA and FastSLAM in
    the theoretical framework of the Bayesian generative model. The proposed method
    can simultaneously learn place categories and lexicons while incrementally
    generating an environmental map. Furthermore, the proposed method adds scene
    image features and a language model to SpCoA. In the experiments, we tested
    online learning of spatial concepts and environmental maps in a novel
    environment for which the robot did not have a map. Then, we evaluated the
    results of online learning of spatial concepts and lexical acquisition. The
    experimental results demonstrated that, by using the proposed method, the robot
    was able to incrementally learn the relationships between words and places in
    the environmental map more accurately.


    Distributed, Parallel, and Cluster Computing

    Space-Optimal Majority in Population Protocols

    Dan Alistarh, James Aspnes, Rati Gelashvili
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Population protocols are a model of distributed computing, in which $n$
    agents with limited local state interact randomly, and cooperate to
    collectively compute global predicates. An extensive series of papers, across
    different communities, has examined the computability and complexity
    characteristics of this model. Majority, or consensus, is a central task, in
    which agents need to collectively reach a decision as to which one of two
    states $A$ or $B$ had a higher initial count. Two complexity metrics are
    important: the time that a protocol requires to stabilize to an output
    decision, and the state space size that each agent requires.

    It is known that majority requires $\Omega(\log \log n)$ states per agent to
    allow for poly-logarithmic time stabilization, and that $O(\log^2 n)$ states
    are sufficient. Thus, there is an exponential gap between the upper and lower
    bounds.

    We address this question. We provide a new lower bound of $\Omega(\log n)$
    states for any protocol which stabilizes in $O(n^c)$ time, for any constant
    $c \leq 1$. This result is conditional on basic monotonicity and output
    assumptions, satisfied by all known protocols. Technically, it represents a
    significant departure from previous lower bounds, as it does not rely on the
    existence of dense configurations. Instead, we introduce a new generalized
    surgery technique to prove the existence of incorrect executions which
    contradict the correctness of algorithms that stabilize too fast.

    We give an algorithm for majority which uses $O(\log n)$ states, and
    stabilizes in $O(\log^2 n)$ time. Central to the algorithm is a new leaderless
    phase clock, which allows nodes to synchronize in phases of $\Theta(n \log n)$
    consecutive interactions using $O(\log n)$ states per node. We also build a new
    leader election algorithm with $O(\log n)$ states, which stabilizes in
    $O(\log^2 n)$ time.
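
    For readers new to the model, the sketch below simulates the classic 4-state
    exact-majority population protocol, not the $O(\log n)$-state algorithm of
    this paper: strong opposite opinions cancel into weak states, and surviving
    strong agents convert weak ones to their own opinion.

        import random

        RULES = {
            ("A", "B"): ("a", "b"), ("B", "A"): ("b", "a"),   # strong opposites cancel
            ("A", "b"): ("A", "a"), ("b", "A"): ("a", "A"),   # strong agents convert weak ones
            ("B", "a"): ("B", "b"), ("a", "B"): ("b", "B"),
        }

        def simulate_majority(n_a, n_b, interactions=1_000_000, seed=0):
            rng = random.Random(seed)
            agents = ["A"] * n_a + ["B"] * n_b
            for _ in range(interactions):
                i, j = rng.sample(range(len(agents)), 2)      # random pairwise interaction
                if (agents[i], agents[j]) in RULES:
                    agents[i], agents[j] = RULES[(agents[i], agents[j])]
            return "A" if sum(s in "Aa" for s in agents) > len(agents) / 2 else "B"

        print(simulate_majority(60, 40))   # prints "A" once the protocol has stabilized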

    A Security Monitoring Framework For Virtualization Based HEP Infrastructures

    A. Gomez Ramirez, M. Martinez Pedreira, C. Grigoras, L. Betev, C. Lara, U. Kebschull for the ALICE Collaboration
    Comments: Proceedings of the 22nd International Conference on Computing in High Energy and Nuclear Physics, CHEP 2016, 10-14 October 2016, San Francisco. Submitted to Journal of Physics: Conference Series (JPCS)
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); High Energy Physics – Experiment (hep-ex)

    High Energy Physics (HEP) distributed computing infrastructures require
    automatic tools to monitor, analyze and react to potential security incidents.
    These tools should collect and inspect data such as resource consumption, logs
    and sequence of system calls for detecting anomalies that indicate the presence
    of a malicious agent. They should also be able to perform automated reactions
    to attacks without administrator intervention. We describe a novel framework
    that accomplishes these requirements, with a proof of concept implementation
    for the ALICE experiment at CERN. We show how we achieve a fully virtualized
    environment that improves the security by isolating services and Jobs without a
    significant performance impact. We also describe a collected dataset for
    Machine Learning based Intrusion Prevention and Detection Systems on Grid
    computing. This dataset is composed of resource consumption measurements (such
    as CPU, RAM and network traffic), logfiles from operating system services, and
    system call data collected from production Jobs running in an ALICE Grid test
    site and a big set of malware. This malware was collected from security
    research sites. Based on this dataset, we will proceed to develop Machine
    Learning algorithms able to detect malicious Jobs.

    Learn-Memorize-Recall-Reduce: A Robotic Cloud Computing Paradigm

    Shaoshan Liu, Bolin Ding, Jie Tang, Dawei Sun, Zhe Zhang, Grace Tsai, Jean-Luc Gaudiot
    Comments: 6 pages, 7 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Robotics (cs.RO)

    The rise of robotic applications has led to the generation of a huge volume
    of unstructured data, whereas the current cloud infrastructure was designed to
    process limited amounts of structured data. To address this problem, we propose
    a learn-memorize-recall-reduce paradigm for robotic cloud computing. The
    learning stage converts incoming unstructured data into structured data; the
    memorization stage provides effective storage for the massive amount of data;
    the recall stage provides efficient means to retrieve the raw data; while the
    reduction stage provides means to make sense of this massive amount of
    unstructured data with limited computing resources.

    A novel approach for fast mining frequent itemsets use N-list structure based on MapReduce

    Arkan A. G. Al-Hamodi, Songfeng Lu
    Comments: 11 pages, 10 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB)

    Frequent pattern mining is one of the most significant topics in data
    mining. In recent years, many algorithms have been proposed for mining frequent
    itemsets. A new algorithm, the Prepost algorithm, has been presented for mining
    frequent itemsets based on the N-list data structure. The Prepost algorithm is
    enhanced by implementing a compact PPC-tree with the general tree, and it can
    only find frequent itemsets with the required pre-order and post-order for each
    node. In this chapter, we improve the Prepost algorithm on the Hadoop platform
    (HPrepost), using the MapReduce programming model. The main goal of the
    proposed method is efficient mining of frequent itemsets with less running time
    and memory usage. We have conducted experiments comparing the proposed scheme
    with other algorithms. On dense datasets, which have a large average
    transaction length, HPrepost is more effective than existing frequent-itemset
    algorithms in terms of execution time and memory usage for all minimum-support
    thresholds. Generally, our algorithm outperforms the others in terms of
    runtime and memory usage with small thresholds and large datasets.
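
    As a toy illustration of the MapReduce counting pattern that such an approach
    builds on (a local simulation in Python, not the HPrepost/Hadoop
    implementation):

        from collections import Counter
        from itertools import combinations

        def map_phase(transaction, max_len=2):
            for k in range(1, max_len + 1):
                for itemset in combinations(sorted(set(transaction)), k):
                    yield itemset, 1                      # emit (candidate itemset, 1)

        def reduce_phase(mapped_pairs, min_sup):
            support = Counter()
            for itemset, count in mapped_pairs:
                support[itemset] += count                 # sum supports per itemset
            return {s: c for s, c in support.items() if c >= min_sup}

        transactions = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
        pairs = (p for t in transactions for p in map_phase(t))
        print(reduce_phase(pairs, min_sup=2))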

    Data aggregation routing protocols in wireless sensor networks: a taxonomy

    Saeid Pourroostaei Ardakani
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)

    Routing in Wireless Sensor Network (WSN) aims to interconnect sensor nodes
    via single or multi-hop paths. The routes are established to forward data
    packets from sensor nodes to the sink. Establishing a single path to report
    each data packet results in increasing energy consumption in WSN, hence, data
    aggregation routing is used to combine data packets and consequently reduce the
    number of transmissions. This reduces the routing overhead by eliminating
    redundant and meaningless data. There are two models for data aggregation
    routing in WSN: mobile agent and client/server. This paper describes data
    aggregation routing and then classifies the routing protocols according to the
    network architecture and routing models. The key issues of the data aggregation
    routing models (client/server and mobile agent) are highlighted and discussed.

    User-transparent Distributed TensorFlow

    Abhinav Vishnu, Joseph Manzano, Charles Siegel, Jeff Daily
    Comments: 9 pages, 8 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Deep Learning (DL) algorithms have become the de facto choice for data
    analysis. Several DL implementations — primarily limited to a single compute
    node — such as Caffe, TensorFlow, Theano and Torch have become readily
    available. Distributed DL implementations capable of execution on large scale
    systems are becoming important to address the computational needs of large data
    produced by scientific simulations and experiments. Yet, the adoption of
    distributed DL implementations faces significant impediments: 1) most
    implementations require DL analysts to modify their code significantly — which
    is a show-stopper, 2) several distributed DL implementations are geared towards
    cloud computing systems — which is inadequate for execution on massively
    parallel systems such as supercomputers.

    This work addresses each of these problems. We provide a distributed memory
    DL implementation by incorporating required changes in the TensorFlow runtime
    itself. This dramatically reduces the entry barrier for using a distributed
    TensorFlow implementation. We use Message Passing Interface (MPI) — which
    provides performance portability, especially since MPI specific changes are
    abstracted from users. Lastly — and arguably most importantly — we make our
    implementation available for broader use, under the umbrella of Machine
    Learning Toolkit for Extreme Scale (MaTEx) at this http URL. We refer to our
    implementation as MaTEx-TensorFlow.
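
    The underlying data-parallel pattern is gradient averaging over MPI ranks;
    the sketch below uses the mpi4py bindings and hypothetical helper names, and
    is not the MaTEx-TensorFlow code.

        import numpy as np
        from mpi4py import MPI

        comm = MPI.COMM_WORLD

        def average_gradients(local_grad):
            """Sum a numpy gradient array across all MPI ranks, then divide by the rank count."""
            global_grad = np.empty_like(local_grad)
            comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
            return global_grad / comm.Get_size()

        # Per-rank training loop (data sharded by rank); helpers are hypothetical:
        #   grad = compute_local_gradient(batch)
        #   grad = average_gradients(grad)
        #   apply_update(params, grad)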

    Sparse Communication for Distributed Gradient Descent

    Alham Fikri Aji, Kenneth Heafield
    Comments: Submitted to EMNLP 2017
    Subjects: Computation and Language (cs.CL); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

    We make distributed stochastic gradient descent faster by exchanging 99%
    sparse updates instead of dense updates. In data-parallel training, nodes pull
    updated values of the parameters from a sharded server, compute gradients, push
    their gradients to the server, and repeat. These push and pull updates strain
    the network. However, most updates are near zero, so we map the 99% smallest
    updates (by absolute value) to zero then exchange sparse matrices. Even simple
    coordinate and value encoding achieves a 50x reduction in bandwidth. Our
    experiment with neural machine translation on 4 GPUs achieved a 22% speed
    boost without impacting the BLEU score.

    Morpheo: Traceable Machine Learning on Hidden data

    Mathieu Galtier, Camille Marini
    Comments: whitepaper, 9 pages, 6 figures
    Subjects: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)

    Morpheo is a transparent and secure machine learning platform for collecting
    and analysing large datasets. It aims at building state-of-the-art prediction
    models in various fields where data are sensitive. Indeed, it offers strong
    privacy of data and algorithms, by preventing anyone apart from the owner and
    the chosen algorithms from reading the data. Computations in Morpheo are
    orchestrated by a blockchain infrastructure, thus offering total traceability
    of operations. Morpheo aims at building an attractive economic ecosystem around
    data prediction by channelling crypto-money from prediction requests to useful
    data and algorithms providers. Morpheo is designed to handle multiple data
    sources in a transfer learning approach in order to mutualize knowledge
    acquired from large datasets for applications with smaller but similar
    datasets.

    Design Guidelines for the User-Centred Collaborative Citizen Science Platforms

    Poonam Yadav, John Darlington
    Comments: In Review process
    Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY); Distributed, Parallel, and Cluster Computing (cs.DC); Social and Information Networks (cs.SI)

    Online Citizen Science platforms are good examples of socio-technical systems
    where technology-enabled interactions occur between scientists and the general
    public (volunteers). Citizen Science platforms usually host multiple Citizen
    Science projects, and allow volunteers to choose the ones to participate in.
    Recent work in the area has demonstrated a positive feedback loop between
    participation and learning and creativity in Citizen Science projects, which is
    one of the motivating factors both for scientists and the volunteers. This
    emphasises the importance of creating successful Citizen Science platforms,
    which support this feedback process, and enable enhanced learning and
    creativity to occur through knowledge sharing and diverse participation. In
    this paper, we discuss how scientists’ and volunteers’ motivation and
    participation influence the design of Citizen Science platforms. We present our
    summary as guidelines for designing these platforms as user-inspired
    socio-technical systems. We also present case studies of popular Citizen
    Science platforms, including our own CitizenGrid platform, developed as part of
    the CCL EU project, as well as Zooniverse, World Community Grid, CrowdCrafting
    and EpiCollect+ to see how closely these platforms follow our proposed
    guidelines and how these may be further improved to incorporate the creativity
    enabled by the collective knowledge sharing.


    Learning

    Fast multi-output relevance vector regression

    Youngmin Ha
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    This paper aims to decrease the time complexity of multi-output relevance
    vector regression from O(VM^3) to O(V^3+M^3), where V is the number of output
    dimensions, M is the number of basis functions, and V<M. The experimental
    results demonstrate that the proposed method is more competitive than the
    existing method, with regard to computation time. MATLAB codes are available at
    this http URL

    Larger is Better: The Effect of Learning Rates Enjoyed by Stochastic Optimization with Progressive Variance Reduction

    Fanhua Shang
    Comments: 36 pages, 10 figures. The simple variant of SVRG is much better than the best-known stochastic method, Katyusha
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)

    In this paper, we propose a simple variant of the original stochastic
    variance reduced gradient (SVRG) method, which we hereafter refer to as
    variance-reduced stochastic gradient descent (VR-SGD). Different from the choices of the
    snapshot point and starting point in SVRG and its proximal variant, Prox-SVRG,
    the two vectors of each epoch in VR-SGD are set to the average and last iterate
    of the previous epoch, respectively. This setting allows us to use much larger
    learning rates or step sizes than SVRG, e.g., 3/(7L) for VR-SGD vs 1/(10L) for
    SVRG, and also makes our convergence analysis more challenging. In fact, a
    larger learning rate enjoyed by VR-SGD means that the variance of its
    stochastic gradient estimator asymptotically approaches zero more rapidly.
    Unlike common stochastic methods such as SVRG and proximal stochastic methods
    such as Prox-SVRG, we design two different update rules for smooth and
    non-smooth objective functions, respectively. In other words, VR-SGD can tackle
    non-smooth and/or non-strongly convex problems directly without using any
    reduction techniques such as quadratic regularizers. Moreover, we analyze the
    convergence properties of VR-SGD for strongly convex problems, which show that
    VR-SGD attains a linear convergence rate. We also provide the convergence
    guarantees of VR-SGD for non-strongly convex problems. Experimental results
    show that the performance of VR-SGD is significantly better than its
    counterparts, SVRG and Prox-SVRG, and it is also much better than the best
    known stochastic method, Katyusha.
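
    Based only on the description above, one epoch of VR-SGD for a smooth
    objective might look like the following sketch; the per-sample gradient
    grad_i is a hypothetical helper, and this is not the authors' code.

        import numpy as np

        def vr_sgd_epoch(w_last, prev_iterates, grad_i, n, eta, rng):
            """One VR-SGD epoch: snapshot = average of the previous epoch's iterates,
            starting point = its last iterate; grad_i(w, i) is a per-sample gradient."""
            w_tilde = np.mean(prev_iterates, axis=0)                  # snapshot point
            full_grad = np.mean([grad_i(w_tilde, i) for i in range(n)], axis=0)
            w, iterates = w_last.copy(), []                           # start from last iterate
            for _ in range(n):
                i = rng.integers(n)
                v = grad_i(w, i) - grad_i(w_tilde, i) + full_grad     # variance-reduced gradient
                w = w - eta * v                                       # larger step size allowed
                iterates.append(w.copy())
            return w, iterates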

    Adversarial and Clean Data Are Not Twins

    Zhitao Gong, Wenlu Wang, Wei-Shinn Ku
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Adversarial attack has cast a shadow on the massive success of deep neural
    networks. Despite being almost visually identical to the clean data, the
    adversarial images can fool deep neural networks into wrong predictions with
    very high confidence. In this paper, however, we show that we can build a
    simple binary classifier separating the adversarial apart from the clean data
    with accuracy over 99%. We also empirically show that the binary classifier is
    robust to a second-round adversarial attack. In other words, it is difficult to
    disguise adversarial samples to bypass the binary classifier. Furthermore, we
    empirically investigate the generalization limitation that lingers on all
    current defensive methods, including the binary classifier approach, and we
    hypothesize that this is the result of an intrinsic property of adversarial
    crafting algorithms.

    Introspection: Accelerating Neural Network Training By Learning Weight Evolution

    Abhishek Sinha, Mausoom Sarkar, Aahitagni Mukherjee, Balaji Krishnamurthy
    Subjects: Learning (cs.LG)

    Neural Networks are function approximators that have achieved
    state-of-the-art accuracy in numerous machine learning tasks. In spite of their
    great success in terms of accuracy, their large training time makes it
    difficult to use them for various tasks. In this paper, we explore the idea of
    learning weight evolution pattern from a simple network for accelerating
    training of novel neural networks. We use a neural network to learn the
    training pattern from MNIST classification and utilize it to accelerate
    training of neural networks used for CIFAR-10 and ImageNet classification. Our
    method has a low memory footprint and is computationally efficient. This method
    can also be used with other optimizers to give faster convergence. The results
    indicate a general trend in the weight evolution during training of neural
    networks.

    Deep Relaxation: partial differential equations for optimizing deep neural networks

    Pratik Chaudhari, Adam Oberman, Stanley Osher, Stefano Soatto, Guillaume Carlier
    Subjects: Learning (cs.LG); Analysis of PDEs (math.AP); Optimization and Control (math.OC)

    We establish connections between non-convex optimization methods for training
    deep neural networks (DNNs) and the theory of partial differential equations
    (PDEs). In particular, we focus on relaxation techniques initially developed in
    statistical physics, which we show to be solutions of a nonlinear
    Hamilton-Jacobi-Bellman equation. We employ the underlying stochastic control
    problem to analyze the geometry of the relaxed energy landscape and its
    convergence properties, thereby confirming empirical evidence. This paper opens
    non-convex optimization problems arising in deep learning to ideas from the PDE
    literature. In particular, we show that the non-viscous Hamilton-Jacobi
    equation leads to an elegant algorithm based on the Hopf-Lax formula that
    outperforms state-of-the-art methods. Furthermore, we show that these
    algorithms scale well in practice and can effectively tackle the high
    dimensionality of modern neural networks.

    Effective Warm Start for the Online Actor-Critic Reinforcement Learning based mHealth Intervention

    Feiyun Zhu, Peng Liao
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Online reinforcement learning (RL) is increasingly popular for the
    personalized mobile health (mHealth) intervention. It is able to personalize
    the type and dose of interventions according to user’s ongoing statuses and
    changing needs. However, at the beginning of online learning, there are usually
    too few samples to support the RL updating, which leads to poor performance. A
    delay in good performance of the online learning algorithms can be especially
    detrimental in mHealth, where users tend to quickly disengage from apps. To
    address this problem, we propose a new online RL methodology that focuses on an
    effective warm start. The main idea is to make full use of the data accumulated
    and the decision rule achieved in a former study. As a result, we can greatly
    enrich the data size at the beginning of online learning. This accelerates the
    online learning process so that new users achieve good performance not only at
    the beginning of online learning but also throughout the whole online learning
    process. In addition, we use the decision rules achieved previously to
    initialize the parameters of our online RL model for new users, which provides
    a good initialization for the proposed online RL algorithm. Experimental
    results show that promising improvements have been achieved by our method
    compared with the state-of-the-art method.

    On the Gap Between Strict-Saddles and True Convexity: An Omega(log d) Lower Bound for Eigenvector Approximation

    Max Simchowitz, Ahmed El Alaoui, Benjamin Recht
    Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Combinatorics (math.CO); Machine Learning (stat.ML)

    We prove a query complexity lower bound on rank-one principal component
    analysis (PCA). We consider an oracle model where, given a symmetric matrix
    $M \in \mathbb{R}^{d \times d}$, an algorithm is allowed to make $T$ exact
    queries of the form $w^{(i)} = Mv^{(i)}$ for $i \in \{1,\dots,T\}$, where
    $v^{(i)}$ is drawn from a distribution which depends arbitrarily on the past
    queries and measurements $\{v^{(j)}, w^{(j)}\}_{1 \le j \le i-1}$. We show
    that for a small constant $\epsilon$, any adaptive, randomized algorithm
    which can find a unit vector $\widehat{v}$ for which
    $\widehat{v}^{\top} M \widehat{v} \ge (1-\epsilon)\|M\|$, with even small
    probability, must make $T = \Omega(\log d)$ queries. In addition to settling a
    widely-held folk conjecture, this bound demonstrates a fundamental gap between
    convex optimization and “strict-saddle” non-convex optimization of which PCA is
    a canonical example: in the former, first-order methods can have dimension-free
    iteration complexity, whereas in PCA, the iteration complexity of
    gradient-based methods must necessarily grow with the dimension. Our argument
    proceeds via a reduction to estimating the rank-one spike in a deformed Wigner
    model. We establish lower bounds for this model by developing a “truncated”
    analogue of the $\chi^2$ Bayes-risk lower bound of Chen et al.
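
    The oracle model can be made concrete with power iteration, which accesses M
    only through matrix-vector products; each call to query below counts as one
    oracle query (an illustrative sketch, not a construction from the paper).

        import numpy as np

        def power_iteration(query, d, T, seed=0):
            """Approximate the top eigenvector using T matrix-vector oracle queries."""
            rng = np.random.default_rng(seed)
            v = rng.standard_normal(d)
            v /= np.linalg.norm(v)
            for _ in range(T):
                w = query(v)                 # one oracle query: w = M v
                v = w / np.linalg.norm(w)
            return v

        # Usage with an explicit symmetric matrix M:
        #   v_hat = power_iteration(lambda v: M @ v, d=M.shape[0], T=50)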

    Hierarchic Kernel Recursive Least-Squares

    Hossein Mohamadipanah, Girish Chowdhary
    Subjects: Learning (cs.LG)

    We present a new hierarchic kernel based modeling technique for modeling
    evenly distributed multidimensional datasets that does not rely on input space
    sparsification. The presented method reorganizes the typical single-layer
    kernel based model in a hierarchical structure, such that the weights of a
    kernel model over each dimension are modeled over the adjacent dimension. We
    show that the imposition of the hierarchical structure in the kernel based
    model leads to significant computational speedup and improved modeling accuracy
    (over an order of magnitude in many cases). For instance, the presented method
    is about five times faster and more accurate than Sparsified Kernel Recursive
    Least-Squares in modeling a two-dimensional real-world dataset.

    Sparse Communication for Distributed Gradient Descent

    Alham Fikri Aji, Kenneth Heafield
    Comments: Submitted to EMNLP 2017
    Subjects: Computation and Language (cs.CL); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

    We make distributed stochastic gradient descent faster by exchanging 99%
    sparse updates instead of dense updates. In data-parallel training, nodes pull
    updated values of the parameters from a sharded server, compute gradients, push
    their gradients to the server, and repeat. These push and pull updates strain
    the network. However, most updates are near zero, so we map the 99% smallest
    updates (by absolute value) to zero then exchange sparse matrices. Even simple
    coordinate and value encoding achieves a 50x reduction in bandwidth. Our
    experiment with neural machine translation on 4 GPUs achieved a 22% speed
    boost without impacting the BLEU score.

    Multimodal Prediction and Personalization of Photo Edits with Deep Generative Models

    Ardavan Saeedi, Matthew D. Hoffman, Stephen J. DiVerdi, Asma Ghandeharioun, Matthew J. Johnson, Ryan P. Adams
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Professional-grade software applications are powerful but complicated; expert
    users can achieve impressive results, but novices often struggle to complete
    even basic tasks. Photo editing is a prime example: after loading a photo, the
    user is confronted with an array of cryptic sliders like “clarity”, “temp”, and
    “highlights”. An automatically generated suggestion could help, but there is no
    single “correct” edit for a given image; different experts may make very
    different aesthetic decisions when faced with the same image, and a single
    expert may make different choices depending on the intended
    use of the image (or on a whim). We therefore want a system that can propose
    multiple diverse, high-quality edits while also learning from and adapting to a
    user’s aesthetic preferences. In this work, we develop a statistical model that
    meets these objectives. Our model builds on recent advances in neural network
    generative modeling and scalable inference, and uses hierarchical structure to
    learn editing patterns across many diverse users. Empirically, we find that our
    model outperforms other approaches on this challenging multimodal prediction
    task.

    Bayesian Hybrid Matrix Factorisation for Data Integration

    Thomas Brouwer, Pietro Lió
    Comments: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS 2017)
    Journal-ref: PMLR 54:557-566, 2017
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We introduce a novel Bayesian hybrid matrix factorisation model (HMF) for
    data integration, based on combining multiple matrix factorisation methods,
    that can be used for in- and out-of-matrix prediction of missing values. The
    model is very general and can be used to integrate many datasets across
    different entity types, including repeated experiments, similarity matrices,
    and very sparse datasets. We apply our method on two biological applications,
    and extensively compare it to state-of-the-art machine learning and matrix
    factorisation models. For in-matrix predictions on drug sensitivity datasets we
    obtain consistently better performances than existing methods. This is
    especially the case when we increase the sparsity of the datasets. Furthermore,
    we perform out-of-matrix predictions on methylation and gene expression
    datasets, and obtain the best results on two of the three datasets, especially
    when the predictivity of datasets is high.

    Gang of GANs: Generative Adversarial Networks with Maximum Margin Ranking

    Felix Juefei-Xu, Vishnu Naresh Boddeti, Marios Savvides
    Comments: 16 pages. 11 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Traditional generative adversarial networks (GAN) and many of its variants
    are trained by minimizing the KL or JS-divergence loss that measures how close
    the generated data distribution is from the true data distribution. A recent
    advance called the WGAN based on Wasserstein distance can improve on the KL and
    JS-divergence based GANs, and alleviate the gradient vanishing, instability,
    and mode collapse issues that are common in the GAN training. In this work, we
    aim at improving on the WGAN by first generalizing its discriminator loss to a
    margin-based one, which leads to a better discriminator, and in turn a better
    generator, and then carrying out a progressive training paradigm involving
    multiple GANs to contribute to the maximum margin ranking loss so that the GAN
    at later stages will improve upon early stages. We call this method Gang of
    GANs (GoGAN). We have shown theoretically that the proposed GoGAN can reduce
    the gap between the true data distribution and the generated data distribution
    by at least half in an optimally trained WGAN. We have also proposed a new way
    of measuring GAN quality which is based on image completion tasks. We have
    evaluated our method on four visual datasets: CelebA, LSUN Bedroom, CIFAR-10,
    and 50K-SSFF, and have seen both visual and quantitative improvement over
    baseline WGAN.
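
    A hedged sketch of what a margin-based (hinge-style) discriminator ranking
    loss looks like in general; this is written generically and is not
    necessarily the paper's exact formulation.

        import numpy as np

        def margin_discriminator_loss(d_real, d_fake, margin=1.0):
            """Hinge-style ranking loss: real scores should exceed fake scores by a margin."""
            return np.mean(np.maximum(0.0, margin - (d_real - d_fake)))

        # A WGAN-style critic corresponds to the margin-free objective mean(d_fake - d_real);
        # the margin turns it into an explicit ranking constraint between real and fake scores.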

    A Novel Experimental Platform for In-Vessel Multi-Chemical Molecular Communications

    Nariman Farsad, David Pan, Andrea Goldsmith
    Subjects: Emerging Technologies (cs.ET); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Learning (cs.LG)

    This work presents a new multi-chemical experimental platform for molecular
    communication where the transmitter can release different chemicals. This
    platform is designed to be inexpensive and accessible, and it can be expanded
    to simulate different environments including the cardiovascular system and
    complex network of pipes in industrial complexes and city infrastructures. To
    demonstrate the capabilities of the platform, we implement a time-slotted
    binary communication system where a bit-0 is represented by an acid pulse, a
    bit-1 by a base pulse, and information is carried via pH signals. The channel
    model for this system, which is nonlinear and has a long memory, is unknown.
    Therefore, we devise novel detection algorithms that use techniques from
    machine learning and deep learning to train a maximum-likelihood detector.
    Using these algorithms, the bit error rate improves by an order of magnitude
    relative to the approach used in previous works. Moreover, our system achieves
    a data rate that is an order of magnitude higher than any of the previous
    molecular communication platforms.

    Random Walk Sampling for Big Data over Networks

    Saeed Basirian, Alexander Jung
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    It has been shown recently that graph signals with small total variation can
    be accurately recovered from only a few samples if the sampling set satisfies a
    certain condition, referred to as the network nullspace property. Based on this
    recovery condition, we propose a sampling strategy for smooth graph signals
    based on random walks. Numerical experiments demonstrate the effectiveness of
    this approach for graph signals obtained from a synthetic random graph model as
    well as a real-world dataset.
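
    A minimal sketch of random-walk sampling of a node set (the general idea,
    assuming a connected graph stored as an adjacency dictionary; not the exact
    strategy analysed in the paper):

        import random

        def random_walk_sample(adjacency, n_samples, seed=0):
            """adjacency: dict node -> list of neighbours (connected graph).
            Returns a set of nodes visited by a single random walk."""
            rng = random.Random(seed)
            node = rng.choice(list(adjacency))
            samples = {node}
            while len(samples) < n_samples:        # n_samples must not exceed the graph size
                node = rng.choice(adjacency[node])
                samples.add(node)
            return samples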

    In-Datacenter Performance Analysis of a Tensor Processing Unit

    Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg, John Hu, Robert Hundt, Dan Hurt, Julian Ibarz, Aaron Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, Rahul Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andy Phelps, Jonathan Ross, et al. (19 additional authors not shown)
    Comments: 17 pages, 11 figures, 8 tables. To appear at the 44th International Symposium on Computer Architecture (ISCA), Toronto, Canada, June 24-28, 2017
    Subjects: Hardware Architecture (cs.AR); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Many architects believe that major improvements in cost-energy-performance
    must now come from domain-specific hardware. This paper evaluates a custom
    ASIC—called a Tensor Processing Unit (TPU)—deployed in datacenters since
    2015 that accelerates the inference phase of neural networks (NN). The heart of
    the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak
    throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed
    on-chip memory. The TPU’s deterministic execution model is a better match to
    the 99th-percentile response-time requirement of our NN applications than are
    the time-varying optimizations of CPUs and GPUs (caches, out-of-order
    execution, multithreading, multiprocessing, prefetching, …) that help average
    throughput more than guaranteed latency. The lack of such features helps
    explain why, despite having myriad MACs and a big memory, the TPU is relatively
    small and low power. We compare the TPU to a server-class Intel Haswell CPU and
    an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters.
    Our workload, written in the high-level TensorFlow framework, uses production
    NN applications (MLPs, CNNs, and LSTMs) that represent 95% of our datacenters’
    NN inference demand. Despite low utilization for some applications, the TPU is
    on average about 15X – 30X faster than its contemporary GPU or CPU, with
    TOPS/Watt about 30X – 80X higher. Moreover, using the GPU’s GDDR5 memory in the
    TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and
    200X the CPU.
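
    A quick back-of-envelope check of the quoted peak throughput (assuming the
    700 MHz clock reported in the paper, which is not stated in this abstract):

    # 65,536 MACs x 2 ops per MAC x 700 MHz ~= 92 TOPS
    macs = 256 * 256          # 65,536 8-bit MAC units
    ops_per_mac = 2           # one multiply plus one accumulate
    clock_hz = 700e6          # assumed clock rate
    peak_tops = macs * ops_per_mac * clock_hz / 1e12
    print(f"peak throughput ~ {peak_tops:.1f} TOPS")   # ~91.8 TOPS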

    Deep Learning Based Regression and Multi-class Models for Acute Oral Toxicity Prediction

    Youjun Xu, Jianfeng Pei, Luhua Lai
    Comments: 36 pages, 4 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Quantitative Methods (q-bio.QM)

    The median lethal dose, LD50, is a general indicator of compound acute oral
    toxicity (AOT). Various in silico methods were developed for AOT prediction to
    reduce costs and time. In this study, a deep learning architecture composed of
    multi-layer convolution neural network was used to develop three types of
    high-level predictive models: regression model (deepAOT-R),
    multi-classification (deepAOT-C) model and multitask model (deepAOT-CR) for AOT
    evaluation. These models substantially outperformed previously reported models.
    For the two external datasets containing 1673 (test set I) and 375 (test set II)
    compounds, the R2 and mean absolute error (MAE) of deepAOT-R on test set I
    were 0.864 and 0.195, and the prediction accuracies of deepAOT-C were 95.5%
    and 96.3% on test sets I and II, respectively. The external prediction
    accuracies of deepAOT-CR were 95.0% and 94.1%, while its R2 and MAE on test
    set I were 0.861 and 0.204, respectively. We then performed forward and backward
    exploration of deepAOT models for deep fingerprints, which could support
    shallow machine learning methods more efficiently than traditional fingerprints
    or descriptors. We further performed automatic feature learning, a key essence
    of deep learning, to map the corresponding activation values into fragment
    space and derive AOT-related chemical substructures by reverse mining of the
    features. Our deep learning framework for AOT is generally applicable in
    predicting and exploring other toxicity or property endpoints of chemical
    compounds. The two deepAOT models are freely available at
    this http URL

    Machine Learning and the Future of Realism

    Giles Hooker, Cliff Hooker
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    The preceding three decades have seen the emergence, rise, and proliferation
    of machine learning (ML). From half-recognised beginnings in perceptrons,
    neural nets, and decision trees, algorithms that extract correlations (that is,
    patterns) from a set of data points have broken free from their origin in
    computational cognition to embrace all forms of problem solving, from voice
    recognition to medical diagnosis to automated scientific research and
    driverless cars, and it is now widely opined that the real industrial
    revolution lies less in the mobile phone and the like than in the maturation and
    universal application of ML. Among the consequences just might be the triumph
    of anti-realism over realism.

    RACE: Large-scale ReAding Comprehension Dataset From Examinations

    Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, Eduard Hovy
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We present RACE, a new dataset for benchmark evaluation of methods in the
    reading comprehension task. Collected from the English exams for middle and
    high school Chinese students aged 12 to 18, RACE consists
    of 28,000+ passages and nearly 100,000 questions generated by human experts
    (English instructors), and covers a variety of topics carefully designed for
    evaluating the students’ ability in understanding and reasoning. In
    particular, the proportion of questions that require reasoning is much
    larger in RACE than in other benchmark datasets for reading comprehension,
    and there is a significant gap between the performance of the state-of-the-art
    models (43%) and the ceiling human performance (95%). We hope this new dataset
    can serve as a valuable resource for research and evaluation in machine
    comprehension. The dataset is freely available at
    this http URL

    Deep Learning for Photoacoustic Tomography from Sparse Data

    Stephan Antholzer, Markus Haltmeier, Johannes Schwab
    Comments: 13 pages, 8 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    The development of fast and accurate image reconstruction algorithms is a
    central aspect of computed tomography. In this paper we investigate this issue
    for the sparse data problem of photoacoustic tomography (PAT). We develop
    direct and highly efficient reconstruction algorithms based on deep-learning.
    In this approach image reconstruction is performed with a deep convolutional
    neural network (CNN), whose weights are adjusted prior to the actual image
    reconstruction based on a set of training data. Our results demonstrate that
    the proposed deep learning approach reconstructs images with a quality
    comparable to state-of-the-art iterative approaches from sparse data. At the
    same time, the numerical complexity of our approach is much smaller and the
    image reconstruction is performed in a fraction of the time required by
    iterative methods.

    Asynchronous Parallel Empirical Variance Guided Algorithms for the Thresholding Bandit Problem

    Jie Zhong, Yijun Huang, Ji Liu
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    This paper considers the multi-armed thresholding bandit problem —
    identifying all arms above a predefined threshold via as few pulls (or rounds)
    as possible — proposed by Locatelli et al. [2016] recently. Although the
    proposed algorithm in Locatelli et al. [2016] achieves the optimal round
    complexity in a certain sense, there still remain unsolved issues. This paper
    proposes an asynchronous parallel thresholding algorithm and its parameter-free
    version to improve the efficiency and the applicability. On one hand, the
    proposed two algorithms use the empirical variance to guide which arm to pull
    at each round, and improve the round complexity of the “optimal” algorithm when
    all arms have bounded high order moments. On the other hand, most bandit
    algorithms assume that the reward can be observed immediately after the pull or
    the next decision would not be made before all rewards are observed. Our
    proposed asynchronous parallel algorithms allow making the choice of the next
    pull with unobserved rewards from earlier pulls, which avoids such an
    unrealistic assumption and significantly improves the identification process.
    Our theoretical analysis justifies the effectiveness and the efficiency of
    the proposed asynchronous parallel algorithms. An empirical study is also provided
    to validate the proposed algorithms.
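
    The abstract does not give the exact index the algorithms use, so the
    following is only a generic, sequential (non-parallel) sketch of
    variance-guided thresholding: in each round, pull the arm whose empirical
    mean is hardest to separate from the threshold relative to its empirical
    standard error. The Gaussian rewards and the index itself are illustrative
    assumptions, not the authors' algorithm.

    import numpy as np

    def thresholding_bandit(means, tau, budget, seed=0):
        rng = np.random.default_rng(seed)
        k = len(means)
        counts = np.zeros(k, dtype=int)
        sums = np.zeros(k)
        sq_sums = np.zeros(k)
        for i in range(k):                 # two initial pulls per arm so a
            for _ in range(2):             # variance estimate exists
                r = rng.normal(means[i], 1.0)
                counts[i] += 1
                sums[i] += r
                sq_sums[i] += r * r
        for _ in range(budget - 2 * k):
            mu = sums / counts
            var = np.maximum(sq_sums / counts - mu ** 2, 1e-12)
            index = np.sqrt(counts) * np.abs(mu - tau) / np.sqrt(var)
            i = int(np.argmin(index))      # pull the most ambiguous arm
            r = rng.normal(means[i], 1.0)
            counts[i] += 1
            sums[i] += r
            sq_sums[i] += r * r
        return np.flatnonzero(sums / counts >= tau)   # arms declared above tau

    print(thresholding_bandit(means=[0.1, 0.4, 0.6, 0.9], tau=0.5, budget=2000))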

    A Sample Complexity Measure with Applications to Learning Optimal Auctions

    Vasilis Syrgkanis
    Subjects: Computer Science and Game Theory (cs.GT); Learning (cs.LG); Statistics Theory (math.ST)

    We introduce a new sample complexity measure, which we refer to as
    split-sample growth rate. For any hypothesis class \(H\) and any sample \(S\) of size
    \(m\), the split-sample growth rate \(\hat{\tau}_H(m)\) counts how many different
    hypotheses empirical risk minimization can output on any sub-sample of \(S\) of
    size \(m/2\). We show that the expected generalization error is upper bounded by
    \(O\left(\sqrt{\frac{\log(\hat{\tau}_H(2m))}{m}}\right)\). Our result is enabled
    by a strengthening of the Rademacher complexity analysis of the expected
    generalization error. We show that this sample complexity measure, greatly
    simplifies the analysis of the sample complexity of optimal auction design, for
    many auction classes studied in the literature. Their sample complexity can be
    derived solely by noticing that in these auction classes, ERM on any sample or
    sub-sample will pick parameters that are equal to one of the points in the
    sample.
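
    As a concrete instance of that last remark (a sketch that only uses the
    observation stated in the abstract): if ERM always outputs a parameter equal
    to one of the sample points, then for a sample \(S\) of size \(2m\) every
    sub-sample of size \(m\) can lead ERM to at most the \(2m\) distinct values
    appearing in \(S\), so
    \[
      \hat{\tau}_H(2m) \le 2m
      \quad\Longrightarrow\quad
      \mathbb{E}[\text{generalization error}] = O\!\left(\sqrt{\frac{\log(2m)}{m}}\right).
    \]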


    Information Theory

    Low Complexity Coefficient Selection Algorithms for Compute-and-Forward

    Qinhui Huang, Alister Burr
    Comments: 10 pages, 10 figures
    Subjects: Information Theory (cs.IT)

    Compute-and-Forward (C&F) has been proposed as an efficient strategy to
    reduce the backhaul load in distributed antenna systems. Finding the
    optimal coefficients in C&F has commonly been treated as a shortest vector
    problem (SVP), which is NP-hard. The point of our work, and of Sahraei’s recent
    work, is that the C&F coefficient problem can be much simpler. Due to the
    special structure of C&F, some optimal algorithms with low polynomial complexity
    have recently been developed. However, these methods can be applied to
    real-valued channels and integer-based lattices only. In this paper, we consider
    complex-valued channels with complex-integer-based lattices. For the first time,
    we propose a low polynomial complexity algorithm to find the optimal solution
    for the complex scenario. Then we propose a simple linear search algorithm
    which is conceptually suboptimal, however numerical results show that the
    performance degradation is negligible compared to the optimal method. Both
    algorithms are suitable for lattices over any algebraic integers, and
    significantly outperform the lattice reduction algorithm. The complexity of
    both algorithms is investigated both theoretically and numerically. The
    results show that our proposed algorithms achieve better performance-complexity
    trade-offs compared to the existing algorithms.

    Covert Communication in Wireless Relay Networks

    Jinsong Hu, Shihao Yan, Xiangyun Zhou, Feng Shu, Jiangzhou Wang
    Subjects: Information Theory (cs.IT)

    Covert communication aims to shield the very existence of wireless
    transmissions in order to guarantee strong security in wireless networks. In
    this work, for the first time we examine the possibility and achievable
    performance of covert communication in one-way relay networks. Specifically,
    the relay opportunistically transmits its own information to the destination
    covertly on top of forwarding the source’s message, while the source tries to
    detect this covert transmission to discover the illegitimate usage of the
    resource (e.g., power, spectrum) allocated only for the purpose of forwarding the
    source’s information. The necessary condition under which the relay can transmit
    covertly without being detected is identified and the source’s detection limit
    is derived in terms of the false alarm and miss detection rates. Our analysis
    indicates that boosting the forwarding ability of the relay (e.g., increasing
    its maximum transmit power) also increases its capacity to perform the covert
    communication in terms of achieving a higher effective covert rate subject to
    some specific requirement on the source’s detection performance.

    Capacity-Achieving Input Distributions in Nondispersive Optical Fibers

    Jihad Fahs, Aslan Tchamkerten, Mansoor I. Yousefi
    Comments: 15 pages
    Subjects: Information Theory (cs.IT)

    This paper investigates the discrete-time per-sample model of the
    zero-dispersion optical fiber. It is shown that any capacity-achieving input
    distribution has (continuous) uniform phase and discrete amplitude with a
    finite number of mass points. This result holds when the input is subject to
    general input cost constraints that include peak power constraint and joint
    average and peak power constraint.

    Caching Policy Optimization for D2D Communications by Learning User Preference

    Binqiang Chen, Chenyang Yang
    Comments: Accepted by VTC Spring 2017
    Subjects: Information Theory (cs.IT)

    Cache-enabled device-to-device (D2D) communications can boost network
    throughput. By pre-downloading contents to local caches of users, the content
    requested by a user can be transmitted via D2D links by other users in
    proximity. Prior works optimize the caching policy at users with the knowledge
    of content popularity, defined as the probability distribution of requests for
    every file in a library across all users. However, content popularity cannot
    reflect the interest of each individual user, and thus a popularity-based caching
    policy may not fully capture the performance gain introduced by caching. In
    this paper, we optimize caching policy for cache-enabled D2D by learning user
    preference, defined as the conditional probability distribution of a user’s
    request for a file given that the user sends a request. We first formulate an
    optimization problem with given user preference to maximize the offloading
    probability, which is proved to be NP-hard, and then provide a greedy algorithm to
    find the solution. In order to predict the preference of each individual user,
    we model the user request behavior by probabilistic latent semantic analysis
    (pLSA), and then apply expectation maximization (EM) algorithm to estimate the
    model parameters. Simulation results show that the user preference can be
    learnt quickly. Compared to the popularity-based caching policy, the offloading
    gain achieved by the proposed policy can be remarkably improved even with
    predicted user preference.
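
    The abstract does not reproduce the EM updates, but for the standard pLSA
    model p(f | u) = sum_z p(z | u) p(f | z) they take the familiar form below
    (a generic pLSA sketch fit to a user-by-file request-count matrix, not
    necessarily the authors' exact parameterisation):

    import numpy as np

    def plsa_em(N, num_topics, num_iters=50, seed=0):
        """Fit pLSA to a (users x files) count matrix N and return p(f | u)."""
        rng = np.random.default_rng(seed)
        U, F = N.shape
        p_z_given_u = rng.dirichlet(np.ones(num_topics), size=U)   # (U, Z)
        p_f_given_z = rng.dirichlet(np.ones(F), size=num_topics)   # (Z, F)
        for _ in range(num_iters):
            # E-step: responsibilities P(z | u, f), shape (U, F, Z)
            joint = p_z_given_u[:, None, :] * p_f_given_z.T[None, :, :]
            resp = joint / np.maximum(joint.sum(axis=2, keepdims=True), 1e-12)
            weighted = N[:, :, None] * resp
            # M-step: re-estimate p(z | u) and p(f | z)
            p_z_given_u = weighted.sum(axis=1)
            p_z_given_u /= np.maximum(p_z_given_u.sum(axis=1, keepdims=True), 1e-12)
            p_f_given_z = weighted.sum(axis=0).T
            p_f_given_z /= np.maximum(p_f_given_z.sum(axis=1, keepdims=True), 1e-12)
        return p_z_given_u @ p_f_given_z       # user preference p(f | u)

    # Toy request counts for 3 users and 4 files.
    N = np.array([[5, 0, 1, 0], [0, 4, 0, 2], [3, 0, 2, 0]], dtype=float)
    print(plsa_em(N, num_topics=2).round(2))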

    Linear Transceiver Design for Bidirectional Full-Duplex MIMO OFDM Systems

    Omid Taghizadeh, Vimal Radhakrishnan, Ali Cagatay Cirik, Rudolf Mathar, Lutz Lampe
    Subjects: Information Theory (cs.IT)

    In this paper we address the linear precoding and decoding design problem for
    a bidirectional orthogonal-frequency-division-multiplexing (OFDM) communication
    system, between two multiple-input-multiple-output (MIMO) full-duplex (FD)
    nodes. The effects of hardware distortion, leading to the residual
    self-interference and inter-carrier leakage, as well as the channel state
    information (CSI) error are taken into account. In the first step, the
    operation of a FD MIMO OFDM transceiver is modeled where the explicit impact of
    hardware inaccuracies on the inter-carrier leakage is observed. An alternating
    quadratic convex program (AltQCP) is then provided to obtain a
    minimum-mean-squared-error (MMSE) design for the defined system. Moreover,
    taking into account the impacts of CSI inaccuracy, an alternating
    semi-definite-program (AltSDP) is proposed to obtain a worst-case MMSE design
    under a norm-bounded CSI error. The provided designs are also extended to
    maximize the system sum rate, applying the weighted-MMSE (WMMSE) method. The
    proposed AltQCP and AltSDP algorithms are based on alternating update of the
    optimization variables, with a guaranteed convergence, where in each step the
    sub-problem is solved to optimality. In order to provide further insights, the
    computational complexity of the AltSDP algorithm is analytically obtained in
    relation to the problem dimensions. Moreover, a methodology to obtain the least
    favorable CSI error matrices is developed, by transforming the resulting
    non-convex quadratic problem into a convex problem. Finally, the proposed
    methods are numerically evaluated in terms of the achievable system
    performance, computational complexity, as well as the comparison to other
    approaches in the literature. A significant gain is observed via the
    application of the proposed methods as the hardware inaccuracy, and
    consequently inter-carrier leakage, increases.

    Wireless Communication using Unmanned Aerial Vehicles (UAVs): Optimal Transport Theory for Hover Time Optimization

    Mohammad Mozaffari, Walid Saad, Mehdi Bennis, Merouane Debbah
    Subjects: Information Theory (cs.IT)

    In this paper, the effective use of flight-time constrained unmanned aerial
    vehicles (UAVs) as flying base stations that can provide wireless service to
    ground users is investigated. In particular, a novel framework for optimizing
    the performance of such UAV-based wireless systems in terms of the average
    number of bits (data service) transmitted to users as well as UAVs’ hover
    duration (i.e. flight time) is proposed. In the considered model, UAVs hover
    over a given geographical area to serve ground users that are distributed
    within the area based on an arbitrary spatial distribution function. In this
    case, two practical scenarios are considered. In the first scenario, based on
    the maximum possible hover times of UAVs, the average data service delivered to
    the users under a fair resource allocation scheme is maximized by finding the
    optimal cell partitions associated with the UAVs. Using the mathematical
    framework of optimal transport theory, a gradient-based algorithm is proposed
    for optimally partitioning the geographical area based on the users’
    distribution, hover times, and locations of the UAVs. In the second scenario,
    given the load requirements of ground users, the minimum average hover time
    that the UAVs need for completely servicing their ground users is derived. To
    this end, first, an optimal bandwidth allocation scheme for serving the users
    is proposed. Then, given this optimal bandwidth allocation, the optimal cell
    partitions associated with the UAVs are derived by exploiting the optimal
    transport theory. Results show that our proposed cell partitioning approach
    leads to a significantly higher fairness among the users compared to the
    classical weighted Voronoi diagram. In addition, our results reveal an inherent
    tradeoff between the hover time of UAVs and bandwidth efficiency while serving
    the ground users.

    Network Coding Channel Virtualization Schemes for Satellite Multicast Communications

    Samah A. M. Ghanem, Ala Eddine Gharsellaoui, Daniele Tarchi, Alessandro Vanelli-Coralli
    Subjects: Information Theory (cs.IT)

    In this paper, we propose two novel schemes to solve the problem of finding a
    quasi-optimal number of coded packets to multicast to a set of independent
    wireless receivers experiencing different channel conditions. In particular, we
    propose two network channel virtualization schemes that allow the set of
    intended receivers in a multicast group to be virtualized as one receiver.
    Such an approach allows for a transmission scheme adapted not only to
    per-receiver channel variation over time, but to the network-virtualized
    channel representing all receivers in the multicast group. The first scheme
    capitalizes on a maximum erasure criterion, introduced via the creation of a
    virtual worst-per-receiver-per-slot reference channel for the network. The
    second scheme capitalizes on a maximum completion time criterion by the use of
    the worst performing receiver channel as a virtual reference to the network. We
    apply such schemes to a GEO satellite scenario. We demonstrate the benefits of
    the proposed schemes by comparing them to a per-receiver point-to-point adaptive
    strategy.
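
    As a rough illustration of the first ("maximum erasure") virtualization idea,
    under the simplifying assumption that each receiver's channel in each slot is
    summarised by an erasure indicator: the virtual reference channel takes, in
    every slot, the worst state observed across the receivers of the multicast
    group.

    import numpy as np

    rng = np.random.default_rng(1)
    num_receivers, num_slots = 4, 12
    erasure_prob = np.array([0.10, 0.20, 0.30, 0.15])    # per-receiver (assumed)
    erased = rng.random((num_receivers, num_slots)) < erasure_prob[:, None]

    # Virtual worst-per-receiver-per-slot channel: a slot is erased whenever
    # any receiver in the multicast group is erased in that slot.
    virtual_worst = erased.any(axis=0)
    print("per-receiver erasures:")
    print(erased.astype(int))
    print("virtual worst channel:", virtual_worst.astype(int))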

    Energy Efficient Adaptive Network Coding Schemes for Satellite Communications

    Ala Eddine Gharsellaoui, Samah A. M. Ghanem, Daniele Tarchi, Alessandro Vanelli Coralli
    Comments: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 24 March 2017
    Subjects: Information Theory (cs.IT)

    In this paper, we propose novel energy efficient adaptive network coding and
    modulation schemes for time variant channels. We evaluate such schemes under a
    realistic channel model for open area environments and Geostationary Earth
    Orbit (GEO) satellites. Compared to non-adaptive network coding and adaptive
    rate efficient network-coded schemes for time variant channels, we show that
    our proposed schemes, through physical-layer awareness, can be designed to
    transmit only if a target quality of service (QoS) is achieved. As a result,
    such schemes can provide remarkable energy savings.

    Angle-Domain Doppler Pre-Compensation for High-Mobility OFDM Uplink with a Massive ULA

    Wei Guo, Weile Zhang, Pengcheng Mu, Feifei Gao, Bobin Yao
    Subjects: Information Theory (cs.IT)

    In this paper, we propose a Doppler pre-compensation scheme for high-mobility
    orthogonal frequency division multiplexing (OFDM) uplink, where a high-speed
    terminal transmits signals to the base station (BS). Considering that the
    time-varying multipath channel consists of multiple Doppler frequency offsets
    (DFOs) with different angle of departures (AoDs), we propose to perform DFO
    pre-compensation at the transmitter with a large-scale uniform linear array
    (ULA). The transmitted signal passes through a beamforming network with
    high-spatial resolution to produce multiple parallel branches. Each branch
    transmits a signal towards one direction; thus, it is affected by one dominant DFO
    when passing over the time-varying channel. Therefore, we can pre-compensate the
    DFO for each branch at the transmitter. Theoretical analysis of the
    Doppler spread of the equivalent uplink channel is also conducted. It is found
    that when the number of transmit antennas is sufficiently large, the
    time-variation of the channel can be efficiently suppressed. Therefore, the
    performance will not degrade significantly when conventional
    time-invariant channel estimation and equalization methods are applied at the receiver.
    Simulation results are provided to verify the proposed scheme.

    Quantization Design and Channel Estimation for Massive MIMO Systems with One-Bit ADCs

    Feiyu Wang, Jun Fang, Hongbin Li, Shaoqian Li
    Subjects: Information Theory (cs.IT)

    We consider the problem of channel estimation for uplink multiuser massive
    MIMO systems, where, in order to significantly reduce the hardware cost and
    power consumption, one-bit analog-to-digital converters (ADCs) are used at the
    base station (BS) to quantize the received signal. Channel estimation for
    one-bit massive MIMO systems is challenging due to the severe distortion caused
    by the coarse quantization. It was shown in previous studies that an extremely
    long training sequence is required to attain an acceptable performance. In this
    paper, we study the problem of optimal one-bit quantization design for channel
    estimation in one-bit massive MIMO systems. Our analysis reveals that, if the
    quantization thresholds are optimally devised, using one-bit ADCs can achieve
    an estimation error close to (with an increase by a factor of \(\pi/2\)) that of
    an ideal estimator which has access to the unquantized data. The optimal
    quantization thresholds, however, are dependent on the unknown channel
    parameters. To cope with this difficulty, we propose an adaptive quantization
    (AQ) approach in which the thresholds are adaptively adjusted such that
    they converge to the optimal thresholds, and a random
    quantization (RQ) scheme which randomly generates a set of nonidentical
    thresholds based on some statistical prior knowledge of the channel. Simulation
    results show that our proposed AQ and RQ schemes, owing to their wisely
    devised thresholds, present a significant performance improvement over the
    conventional fixed quantization scheme that uses a fixed (typically zero)
    threshold, and meanwhile achieve a substantial training overhead reduction for
    channel estimation. In particular, even with a moderate number of pilot symbols
    (about 5 times the number of users), the AQ scheme can provide an achievable
    rate close to that of the perfect channel state information (CSI) case.
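
    The AQ scheme itself is defined for the massive MIMO channel estimation
    problem; as a toy scalar analogue only, the stochastic-approximation sketch
    below adapts a one-bit comparator threshold toward the unknown value being
    measured, which is the regime in which one-bit observations lose little
    information (all parameters here are illustrative assumptions, not the
    paper's AQ scheme).

    import numpy as np

    rng = np.random.default_rng(0)
    theta, noise_std = 1.3, 0.5          # unknown parameter and noise level
    threshold = 0.0                      # initial comparator threshold
    for t in range(1, 5001):
        y = theta + noise_std * rng.normal()
        bit = 1.0 if y > threshold else -1.0      # one-bit measurement
        threshold += (1.0 / t) * bit              # Robbins-Monro style update
    print(f"adapted threshold (estimate of theta): {threshold:.3f}")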

    Adaptive Network Coding Schemes for Satellite Communications

    Ala Eddine Gharsellaoui, Samah A. M. Ghanem, Daniele Tarchi, Alessandro Vanelli-Coralli
    Comments: IEEE Advanced Satellite Multimedia Systems Conference and the 14th Signal Processing for Space Communications Workshop (ASMS/SPSC), 2016
    Subjects: Information Theory (cs.IT)

    In this paper, we propose two novel physical layer aware adaptive network
    coding and coded modulation schemes for time variant channels. The proposed
    schemes have been applied to different satellite communications scenarios with
    different Round Trip Times (RTT). Compared to adaptive network coding, and
    classical non-adaptive network coding schemes for time variant channels, as
    benchmarks, the proposed schemes demonstrate that adaptation of packet
    transmission based on the channel variation and corresponding erasures allows
    for significant gains in terms of throughput, delay and energy efficiency. We
    shed light on the trade-off between energy efficiency and delay-throughput
    gains, demonstrating that conservative adaptive approaches that favor less
    transmission under high erasures might cause higher delay and lower throughput
    gains in comparison to non-conservative approaches that favor more transmission
    to account for high erasures.

    Capacity of the Gaussian Two-Pair Two-Way Relay Channel to Within 1/2 Bit

    Xiaojun Yuan, Haiyang Xin, Soung-Chang Liew, Yong Li
    Comments: 28 pages, 6 figures, journal
    Subjects: Information Theory (cs.IT)

    This paper studies the transceiver design of the Gaussian two-pair two-way
    relay channel (TWRC), where two pairs of users exchange information through a
    common relay in a pairwise manner. Our main contribution is to show that the
    capacity of the Gaussian two-pair TWRC is achievable to within 1/2 bit for
    arbitrary channel conditions. In the proof, we develop a hybrid coding scheme
    involving Gaussian random coding, nested lattice coding, superposition coding,
    and network-coded decoding. Further, we present a message-reassembling strategy
    to decouple the coding design for the uplink and downlink phases, so as to
    provide flexibility to fully exploit the channel randomness. Finally,
    sophisticated power allocation at the relay is necessary to approach the
    channel capacity under various channel conditions.

    Advances in Detection and Error Correction for Coherent Optical Communications: Regular, Irregular, and Spatially Coupled LDPC Code Designs

    Laurent Schmalen, Stephan ten Brink, Andreas Leven
    Comments: “Enabling Technologies for High Spectral-efficiency Coherent Optical Communication Networks” edited by X. Zhou and C. Xie, John Wiley & Sons, Inc., April 2016
    Subjects: Information Theory (cs.IT)

    In this chapter, we show how the use of differential coding and the presence
    of phase slips in the transmission channel affect the total achievable
    information rates and capacity of a system. By means of the commonly used QPSK
    modulation, we show that the use of differential coding does not decrease the
    total amount of reliably conveyable information over the channel. It is a
    common misconception that the use of differential coding introduces an
    unavoidable differential loss. This perceived differential loss is rather a
    consequence of simplified differential detection and decoding at the receiver.
    Afterwards, we show how capacity-approaching coding schemes based on LDPC and
    spatially coupled LDPC codes can be constructed by combining iterative
    demodulation and decoding. For this, we first show how to modify the
    differential decoder to account for phase slips and then how to use this
    modified differential decoder to construct good LDPC codes. This construction
    method can serve as a blueprint to construct good and practical LDPC codes for
    other applications with iterative detection, such as higher order modulation
    formats with non-square constellations, multi-dimensional optimized modulation
    formats, turbo equalization to mitigate ISI (e.g., due to nonlinearities) and
    many more. Finally, we introduce the class of spatially coupled (SC)-LDPC
    codes, which are a generalization of LDPC codes with some outstanding
    properties and which can be decoded with a very simple windowed decoder. We
    show that the universal behavior of spatially coupled codes makes them an ideal
    candidate for iterative differential demodulation/detection and decoding.

    Robust Transceiver Design Based on Interference Alignment for Multi-User Multi-Cell MIMO Networks with Channel Uncertainty

    Xianzhong Xie, Helin Yang, Athanasios V. Vasilakos
    Comments: 12 pages, 8 figures
    Subjects: Information Theory (cs.IT)

    In this paper, we first exploit the inter-user interference (IUI) and
    inter-cell interference (ICI) as useful references to develop a robust
    transceiver design based on interference alignment for a downlink multi-user
    multi-cell multiple-input multiple-output (MIMO) interference network under
    channel estimation error. At transmitters, we propose a two-tier transmit
    beamforming strategy: we first obtain the inner beamforming direction and
    allocated power by minimizing the interference leakage as well as maximizing
    the system energy efficiency, respectively. Then, for the outer beamformer
    design, we develop an efficient conjugate gradient Grassmann manifold subspace
    tracking algorithm to minimize the distances between the subspace spanned by
    interference and the interference subspace in the time varying channel. At
    receivers, we propose a practical interference alignment scheme based on the fast
    and robust fast data projection method (FDPM) subspace tracking algorithm, to
    achieve the receive beamformer under channel uncertainty. Numerical results
    show that our proposed robust transceiver design achieves better performance
    compared with some existing methods in terms of the sum rate and the energy
    efficiency.

    Massive MU-MIMO-OFDM Downlink with One-Bit DACs and Linear Precoding

    Sven Jacobsson, Giuseppe Durisi, Mikael Coldrey, Christoph Studer
    Subjects: Information Theory (cs.IT)

    Massive multiuser (MU) multiple-input multiple-output (MIMO) is foreseen to
    be a key technology in future wireless communication systems. In this paper, we
    analyze the downlink performance of an orthogonal frequency division
    multiplexing (OFDM)-based massive MU-MIMO system in which the base station (BS)
    is equipped with 1-bit digital-to-analog converters (DACs). Using Bussgang’s
    theorem, we characterize the performance achievable with linear precoders (such
    as maximal-ratio transmission and zero forcing) in terms of bit error rate
    (BER). Our analysis accounts for the possibility of oversampling the
    time-domain transmit signal before the DACs. We further develop a lower bound
    on the information-theoretic sum-rate throughput achievable with Gaussian
    inputs.

    Our results suggest that the performance achievable with 1-bit DACs in a
    massive MU-MIMO-OFDM downlink is satisfactory provided that the number of BS
    antennas is sufficiently large.
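
    Bussgang’s theorem is what lets the 1-bit DAC nonlinearity be replaced by a
    linear gain plus uncorrelated distortion. As a standalone numerical check
    (not the paper's full MU-MIMO-OFDM analysis), the gain for a 1-bit quantizer
    driven by a zero-mean Gaussian input can be compared with its closed form
    sqrt(2 / (pi * sigma^2)):

    import numpy as np

    rng = np.random.default_rng(0)
    sigma = 0.7
    x = sigma * rng.normal(size=1_000_000)        # Gaussian input
    y = np.sign(x)                                # 1-bit "DAC" output

    # Bussgang decomposition: y = G * x + d, with d uncorrelated with x.
    gain_empirical = np.mean(x * y) / np.mean(x ** 2)
    gain_theory = np.sqrt(2.0 / (np.pi * sigma ** 2))
    distortion = y - gain_empirical * x
    print(f"empirical gain {gain_empirical:.4f} vs theory {gain_theory:.4f}")
    print(f"corr(x, distortion) ~ 0: {np.corrcoef(x, distortion)[0, 1]:.2e}")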

    Energy-Efficient Mobile Cooperative Computing

    Changsheng You, Kaibin Huang
    Comments: Submitted to IEEE Globecom 2017
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    Scavenging the idling computation resources at the enormous number of mobile
    devices will provide a new platform for implementing mobile cloud computing and
    thereby alleviate traffic congestion in core networks. The vision can be
    realized by mobile cooperative computing, referred to simply as co-computing.
    In this paper, we consider a co-computing system where a user offloads
    computation of input-data to a helper. To minimize the user’s energy
    consumption, the helper controls the offloading process based on a predicted
    CPU-idling profile, which reflects time-varying spare computation resources. Assume
    the helper has a large buffer for storing the offloaded data. The derived
    solution for the optimal offloading control has an interesting graphical
    interpretation as follows. In the plane of user’s co-computable bits (by
    offloading) versus time, based on the helper’s CPU-idling profile, a so-called
    offloading feasibility region can be constructed, constraining the range of
    offloaded bits at any time instant. Given the region, the optimal offloading is
    shown to be achieved by the well-known “string-pulling” strategy, graphically
    referring to pulling a string across the region. The solution approach is
    further extended to the case where the helper has a small buffer, via
    constructing an offloading feasibility tunnel based on the helper’s CPU-idling
    profile and buffer size. Furthermore, we prove that the problem of optimal data
    partitioning for offloading and local computing at the user is convex,
    admitting a simple solution using the sub-gradient method.
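
    The offloading feasibility region can be pictured directly from the predicted
    CPU-idling profile. The sketch below only constructs its upper boundary (the
    cumulative number of bits the helper could have finished co-computing by the
    end of each slot); the profile and cycles-per-bit figure are illustrative
    assumptions, and the optimal "string-pulling" control itself is not
    implemented here.

    import numpy as np

    # Predicted idle CPU cycles of the helper in each slot (assumed values).
    idle_cycles_per_slot = np.array([2e8, 0, 5e8, 1e8, 0, 8e8, 3e8])
    cycles_per_bit = 1e3                 # helper cycles needed per offloaded bit

    # Upper boundary of the offloading feasibility region: any feasible
    # schedule's cumulative offloaded bits must stay below this curve.
    max_cocomputable_bits = np.cumsum(idle_cycles_per_slot) / cycles_per_bit
    for t, b in enumerate(max_cocomputable_bits, start=1):
        print(f"end of slot {t}: at most {b:,.0f} bits co-computed")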

    Performance of Energy Harvesting Receivers with Power Optimization

    Zhengwei Ni, Mehul Motani
    Comments: 28 pages
    Subjects: Information Theory (cs.IT)

    The difficulty of modeling energy consumption in communication systems leads
    to challenges in energy harvesting (EH) systems, in which nodes scavenge energy
    from their environment. An EH receiver must harvest enough energy for
    demodulating and decoding. The energy required depends upon factors, such as code
    rate and signal-to-noise ratio, which can be adjusted dynamically. We consider
    a receiver which harvests energy from ambient sources and the transmitter,
    meaning the received signal is used for both EH and information decoding.
    Assuming a generalized function for energy consumption, we maximize the total
    number of information bits decoded, under both average and peak power
    constraints at the transmitter, by carefully optimizing the power used for EH,
    power used for information transmission, fraction of time for EH, and code
    rate. For transmission over a single block, we find there exist problem
    parameters for which either maximizing power for information transmission or
    maximizing power for EH is optimal. In the general case, the optimal solution
    is a tradeoff of the two. For transmission over multiple blocks, we give an
    upper bound on performance and give sufficient and necessary conditions to
    achieve this bound. Finally, we give some numerical examples to illustrate our
    results and analysis.

    RaPro: A Novel 5G Rapid Prototyping System Architecture

    Xi Yang, Zhichao Huang, Bin Han, Senjie Zhang, Chao-Kai Wen, Feifei Gao, Shi Jin
    Comments: accepted by IEEE Wireless Communication Letters
    Subjects: Information Theory (cs.IT)

    We propose a novel fifth-generation (5G) rapid prototyping (RaPro) system
    architecture by combining FPGA-privileged modules from a software-defined radio
    (or FPGA coprocessor) with a high-level programming language for advanced
    algorithms on multi-core general-purpose processors. The proposed system
    architecture exhibits excellent flexibility and scalability in the development
    of a 5G prototyping system. As a proof of concept, a multi-user full-dimension
    multiple-input multiple-output system is established based on the proposed
    architecture. Experimental results demonstrate the superiority of the proposed
    architecture in large-scale antenna and wideband communication systems.

    A Novel Experimental Platform for In-Vessel Multi-Chemical Molecular Communications

    Nariman Farsad, David Pan, Andrea Goldsmith
    Subjects: Emerging Technologies (cs.ET); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Learning (cs.LG)

    This work presents a new multi-chemical experimental platform for molecular
    communication where the transmitter can release different chemicals. This
    platform is designed to be inexpensive and accessible, and it can be expanded
    to simulate different environments including the cardiovascular system and
    complex network of pipes in industrial complexes and city infrastructures. To
    demonstrate the capabilities of the platform, we implement a time-slotted
    binary communication system where a bit-0 is represented by an acid pulse, a
    bit-1 by a base pulse, and information is carried via pH signals. The channel
    model for this system, which is nonlinear and has a long memory, is unknown.
    Therefore, we devise novel detection algorithms that use techniques from
    machine learning and deep learning to train a maximum-likelihood detector.
    Using these algorithms the bit error rate improves by an order of magnitude
    relative to the approach used in previous works. Moreover, our system achieves
    a data rate that is an order of magnitude higher than any of the previous
    molecular communication platforms.

    Mean Square Error of Neural Spike Train Sequence Matching with Optogenetics

    Adam Noel, Dimitrios Makrakis, Andrew W. Eckford
    Comments: 6 pages, 5 figures. Submitted to IEEE Global Communications Conference (IEEE GLOBECOM 2017) in April 2017
    Subjects: Neurons and Cognition (q-bio.NC); Information Theory (cs.IT); Biological Physics (physics.bio-ph)

    Optogenetics is an emerging field of neuroscience where neurons are
    genetically modified to express light-sensitive receptors that enable external
    control over when the neurons fire. Given the prominence of neuronal signaling
    within the brain and throughout the body, optogenetics has significant
    potential to improve the understanding of the nervous system and to develop
    treatments for neurological diseases. This paper uses a simple optogenetic
    model to compare the timing distortion between a randomly-generated target
    spike sequence and an externally-stimulated neuron spike sequence. The
    distortion is measured by filtering each sequence and finding the mean square
    error between the two filter outputs. The expected distortion is derived in
    closed form when the target sequence generation rate is sufficiently low.
    Derivations are verified via simulations.
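
    The distortion measure itself is easy to reproduce: filter both the target
    and the stimulated spike sequence with the same kernel and take the mean
    square error between the two filter outputs. The exponential kernel, rates,
    and time step below are illustrative assumptions, not the paper's parameters.

    import numpy as np

    rng = np.random.default_rng(0)
    dt, T = 1e-3, 2.0                    # 1 ms bins over 2 s
    t = np.arange(0.0, T, dt)

    def poisson_spike_train(rate_hz):
        return (rng.random(t.size) < rate_hz * dt).astype(float)

    target = poisson_spike_train(20.0)
    stimulated = poisson_spike_train(20.0)    # stand-in for the evoked train

    kernel = np.exp(-np.arange(0.0, 0.1, dt) / 0.02)   # exp filter, 20 ms decay
    f_target = np.convolve(target, kernel)[: t.size]
    f_stim = np.convolve(stimulated, kernel)[: t.size]
    mse = np.mean((f_target - f_stim) ** 2)
    print(f"mean square error between filtered spike trains: {mse:.4f}")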

    Structure and Randomness of Continuous-Time Discrete-Event Processes

    S. E. Marzen, J. P. Crutchfield
    Comments: 10 pages, 2 figures; this http URL
    Subjects: Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT); Statistics Theory (math.ST); Chaotic Dynamics (nlin.CD)

    Loosely speaking, the Shannon entropy rate is used to gauge a stochastic
    process’ intrinsic randomness; the statistical complexity gives the cost of
    predicting the process. We calculate, for the first time, the entropy rate and
    statistical complexity of stochastic processes generated by finite unifilar
    hidden semi-Markov models—memoryful, state-dependent versions of renewal
    processes. Calculating these quantities requires introducing novel mathematical
    objects (\(\epsilon\)-machines of hidden semi-Markov processes) and applying new
    information-theoretic methods to stochastic processes.

    Automaton model of protein: dynamics of conformational and functional states

    Andrei Khrennikov, Ekaterina Yurova
    Subjects: Biomolecules (q-bio.BM); Information Theory (cs.IT); Quantum Physics (quant-ph)

    In this conceptual paper we propose to explore the analogy between
    ontic/epistemic description of quantum phenomena and interrelation between
    dynamics of conformational and functional states of proteins. Another new idea
    is to apply theory of automata to model the latter dynamics. In our model
    protein’s behavior is modeled with the aid of two dynamical systems, ontic and
    epistemic, which describe evolution of conformational and functional states of
    proteins, respectively. The epistemic automaton is constructed from the ontic
    automaton on the basis of a functional (observational) equivalence relation on
    the space of ontic states. This is reminiscent of a few approaches to emergent quantum
    mechanics in which a quantum (epistemic) state is treated as representing a
    class of prequantum (ontic) states. This approach does not match the
    standard protein structure-function paradigm. However, it is perfect for
    modeling the behavior of intrinsically disordered proteins. Mathematically, the space
    of a protein’s ontic states (conformational states) is modeled with the aid of
    (p)-adic numbers or more general ultrametric spaces encoding the internal
    hierarchical structure of proteins. The connection with the theory of (p)-adic
    dynamical systems is briefly discussed.

    Randomized detection and detection capacity of multidetector networks

    Ghurumuruhan Ganesan
    Subjects: Probability (math.PR); Information Theory (cs.IT)

    In this paper, we study the following detection problem. There are \(n\)
    detectors randomly placed in the unit square
    \(S = \left[-\frac{1}{2},\frac{1}{2}\right]^2\), assigned to detect the presence of a
    source located at the origin. Time is divided into slots of unit length and
    \(D_i(t) \in \{0,1\}\) represents the (random) decision of the \(i^{\rm th}\)
    detector in time slot \(t\). The location of the source is unknown to the
    detectors, and the goal is to design schemes that use the decisions
    \(\{D_i(t)\}_{i,t}\) and detect the presence of the source in as short a time as
    possible.

    We first determine the minimum achievable detection time \(T_{cap}\) and show
    the existence of randomized detection schemes that have detection times
    arbitrarily close to \(T_{cap}\) for almost all configurations of detectors,
    provided the number of detectors \(n\) is sufficiently large. We call such
    schemes capacity achieving and completely characterize all capacity-achieving
    detection schemes.

    On the Gap Between Strict-Saddles and True Convexity: An Omega(log d) Lower Bound for Eigenvector Approximation

    Max Simchowitz, Ahmed El Alaoui, Benjamin Recht
    Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Combinatorics (math.CO); Machine Learning (stat.ML)

    We prove a query-complexity lower bound on rank-one principal
    component analysis (PCA). We consider an oracle model where, given a symmetric
    matrix \(M \in \mathbb{R}^{d \times d}\), an algorithm is allowed to make \(T\)
    exact queries of the form \(w^{(i)} = Mv^{(i)}\) for \(i \in
    \{1,\dots,T\}\), where \(v^{(i)}\) is drawn from a distribution which depends
    arbitrarily on the past queries and measurements \(\{v^{(j)},w^{(j)}\}_{1 \le j
    \le i-1}\). We show that for a small constant \(\epsilon\), any adaptive,
    randomized algorithm which can find a unit vector \(\widehat{v}\) for which
    \(\widehat{v}^{\top}M\widehat{v} \ge (1-\epsilon)\|M\|\), with even small
    probability, must make \(T = \Omega(\log d)\) queries. In addition to settling a
    widely-held folk conjecture, this bound demonstrates a fundamental gap between
    convex optimization and “strict-saddle” non-convex optimization of which PCA is
    a canonical example: in the former, first-order methods can have dimension-free
    iteration complexity, whereas in PCA, the iteration complexity of
    gradient-based methods must necessarily grow with the dimension. Our argument
    proceeds via a reduction to estimating the rank-one spike in a deformed Wigner
    model. We establish lower bounds for this model by developing a “truncated”
    analogue of the \(\chi^2\) Bayes-risk lower bound of Chen et al.
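
    The oracle model is concrete enough to simulate: every access to \(M\) is an
    exact query \(w = Mv\) with \(v\) chosen adaptively from past responses. The
    sketch below runs plain power iteration inside that model and counts the
    queries needed to reach a \((1-\epsilon)\) Rayleigh quotient on a random
    symmetric matrix; it illustrates the query model only, not the lower-bound
    construction.

    import numpy as np

    rng = np.random.default_rng(0)
    d, eps = 200, 0.1
    A = rng.normal(size=(d, d))
    M = (A + A.T) / 2.0                      # symmetric query matrix
    # Operator norm is computed only to evaluate the stopping rule in this demo.
    op_norm = np.max(np.abs(np.linalg.eigvalsh(M)))

    v = rng.normal(size=d)
    v /= np.linalg.norm(v)
    queries = 0
    for _ in range(10_000):
        w = M @ v                            # one exact query w = M v
        queries += 1
        rayleigh = v @ w                     # v^T M v, reuses the query response
        # abs() covers the case where the dominant eigenvalue is negative.
        if abs(rayleigh) >= (1 - eps) * op_norm:
            break
        v = w / np.linalg.norm(w)
    print(f"power iteration used {queries} adaptive queries")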



