IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    arXiv Paper Daily: Tue, 9 May 2017

    我爱机器学习(52ml.net)发表于 2017-05-09 00:00:00
    love 0

    Neural and Evolutionary Computing

    Developing All-Skyrmion Spiking Neural Network

    Zhezhi He, Deliang Fan
    Subjects: Neural and Evolutionary Computing (cs.NE)

    In this work, we have proposed a revolutionary neuromorphic computing
    methodology to implement All-Skyrmion Spiking Neural Network (AS-SNN). Such
    proposed methodology is based on our finding that skyrmion is a topological
    stable spin texture and its spatiotemporal motion along the magnetic nano-track
    intuitively interprets the pulse signal transmission between two interconnected
    neurons. In such design, spike train in SNN could be encoded as particle-like
    skyrmion train and further processed by the proposed skyrmion-synapse and
    skyrmion-neuron within the same magnetic nano-track to generate output skyrmion
    as post-spike. Then, both pre-neuron spikes and post-neuron spikes are encoded
    as particle-like skyrmions without conversion between charge and spin signals,
    which fundamentally differentiates our proposed design from other hybrid
    Spin-CMOS designs. The system level simulation shows 87.1% inference accuracy
    for handwritten digit recognition task, while the energy dissipation is ~1
    fJ/per spike which is 3 orders smaller in comparison with CMOS based IBM
    TrueNorth system.

    DropIn: Making Reservoir Computing Neural Networks Robust to Missing Inputs by Dropout

    Davide Bacciu, Francesco Crecchi, Davide Morelli
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    The paper presents a novel, principled approach to train recurrent neural
    networks from the Reservoir Computing family that are robust to missing part of
    the input features at prediction time. By building on the ensembling properties
    of Dropout regularization, we propose a methodology, named DropIn, which
    efficiently trains a neural model as a committee machine of subnetworks, each
    capable of predicting with a subset of the original input features. We discuss
    the application of the DropIn methodology in the context of Reservoir Computing
    models and targeting applications characterized by input sources that are
    unreliable or prone to be disconnected, such as in pervasive wireless sensor
    networks and ambient intelligence. We provide an experimental assessment using
    real-world data from such application domains, showing how the Dropin
    methodology allows to maintain predictive performances comparable to those of a
    model without missing features, even when 20\%-50\% of the inputs are not
    available.

    A Design Methodology for Efficient Implementation of Deconvolutional Neural Networks on an FPGA

    Xinyu Zhang, Srinjoy Das, Ojash Neopane, Ken Kreutz-Delgado
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    In recent years deep learning algorithms have shown extremely high
    performance on machine learning tasks such as image classification and speech
    recognition. In support of such applications, various FPGA accelerator
    architectures have been proposed for convolutional neural networks (CNNs) that
    enable high performance for classification tasks at lower power than CPU and
    GPU processors. However, to date, there has been little research on the use of
    FPGA implementations of deconvolutional neural networks (DCNNs). DCNNs, also
    known as generative CNNs, encode high-dimensional probability distributions and
    have been widely used for computer vision applications such as scene
    completion, scene segmentation, image creation, image denoising, and
    super-resolution imaging. We propose an FPGA architecture for deconvolutional
    networks built around an accelerator which effectively handles the complex
    memory access patterns needed to perform strided deconvolutions, and that
    supports convolution as well. We also develop a three-step design optimization
    method that systematically exploits statistical analysis, design space
    exploration and VLSI optimization. To verify our FPGA deconvolutional
    accelerator design methodology we train DCNNs offline on two representative
    datasets using the generative adversarial network method (GAN) run on
    Tensorflow, and then map these DCNNs to an FPGA DCNN-plus-accelerator
    implementation to perform generative inference on a Xilinx Zynq-7000 FPGA. Our
    DCNN implementation achieves a peak performance density of 0.012 GOPs/DSP.

    Learning Distributed Representations of Texts and Entities from Knowledge Base

    Ikuya Yamada, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji
    Comments: Accepted at TACL
    Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    We describe a neural network model that jointly learns distributed
    representations of texts and knowledge base (KB) entities. Given a text in the
    KB, we train our proposed model to predict entities that are relevant to the
    text. Our model is designed to be generic with the ability to address various
    NLP tasks with ease. We train the model using a large corpus of texts and their
    entity annotations extracted from Wikipedia. We evaluated the model on three
    important NLP tasks (i.e., sentence textual similarity, entity linking, and
    factoid question answering) involving both unsupervised and supervised
    settings. As a result, we achieved state-of-the-art results on all three of
    these tasks.


    Computer Vision and Pattern Recognition

    Real-Time User-Guided Image Colorization with Learned Deep Priors

    Richard Zhang, Jun-Yan Zhu, Phillip Isola, Xinyang Geng, Angela S. Lin, Tianhe Yu, Alexei A. Efros
    Comments: Accepted to SIGGRAPH 2017. Project page: this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a deep learning approach for user-guided image colorization. The
    system directly maps a grayscale image, along with sparse, local user “hints”
    to an output colorization with a Convolutional Neural Network (CNN). Rather
    than using hand-defined rules, the network propagates user edits by fusing
    low-level cues along with high-level semantic information, learned from
    large-scale data. We train on a million images, with simulated user inputs. To
    guide the user towards efficient input selection, the system recommends likely
    colors based on the input image and current user inputs. The colorization is
    performed in a single feed-forward pass, enabling real-time use. Even with
    randomly simulated user inputs, we show that the proposed system helps novice
    users quickly create realistic colorizations, and offers large improvements in
    colorization quality with just a minute of use. In addition, we demonstrate
    that the framework can incorporate other user “hints” to the desired
    colorization, showing an application to color histogram transfer. Our code and
    models are available at this https URL

    Light Field Video Capture Using a Learning-Based Hybrid Imaging System

    Ting-Chun Wang, Jun-Yan Zhu, Nima Khademi Kalantari, Alexei A. Efros, Ravi Ramamoorthi
    Comments: ACM Transactions on Graphics (Proceedings of SIGGRAPH 2017)
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

    Light field cameras have many advantages over traditional cameras, as they
    allow the user to change various camera settings after capture. However,
    capturing light fields requires a huge bandwidth to record the data: a modern
    light field camera can only take three images per second. This prevents current
    consumer light field cameras from capturing light field videos. Temporal
    interpolation at such extreme scale (10x, from 3 fps to 30 fps) is infeasible
    as too much information will be entirely missing between adjacent frames.
    Instead, we develop a hybrid imaging system, adding another standard video
    camera to capture the temporal information. Given a 3 fps light field sequence
    and a standard 30 fps 2D video, our system can then generate a full light field
    video at 30 fps. We adopt a learning-based approach, which can be decomposed
    into two steps: spatio-temporal flow estimation and appearance estimation. The
    flow estimation propagates the angular information from the light field
    sequence to the 2D video, so we can warp input images to the target view. The
    appearance estimation then combines these warped images to output the final
    pixels. The whole process is trained end-to-end using convolutional neural
    networks. Experimental results demonstrate that our algorithm outperforms
    current video interpolation methods, enabling consumer light field videography,
    and making applications such as refocusing and parallax view generation
    achievable on videos for the first time.

    You said that?

    Joon Son Chung, Amir Jamaludin, Andrew Zisserman
    Comments: this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a method for generating a video of a talking face. The method
    takes as inputs: (i) one still image of the target face, and (ii) an audio
    speech segment; and outputs a video of the target face lip synched with the
    audio. The method works in real time, and at run time, is applicable to
    previously unseen faces and audio (i.e. not part of the training data).

    To achieve this we propose an encoder-decoder CNN model that uses a joint
    embedding of the face and audio to generate synthesised talking face video
    frames. The model is trained on tens of hours of unlabelled videos.

    We also show results of re-dubbing videos using speech from a different
    person.

    Konzept für Bildanalysen in Hochdurchsatz-Systemen am Beispiel des Zebrabärblings

    Rüdiger Alshut
    Journal-ref: Ph.D. Thesis, Karlsruhe Institute of Technology, KIT Scientific
    Publishing, 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)

    With image-based high-throughput experiments, new challenges arise in both,
    the design of experiments and the automated analysis. To be able to handle the
    massive number of single experiments and the corresponding amount of data, a
    comprehensive concept for the design of experiments and a new evaluation method
    is needed. This work proposes a new method for an optimized experiment layout
    that enables the determination of parameters, adapted for the needs of
    automated image analysis. Furthermore, a catalogue of new image analysis
    modules, especially developed for zebrafish analysis, is presented. The
    combination of both parts offers the user, usually a biologist, an approach for
    high-throughput zebrafish image analysis, which enables the extraction of new
    signals and optimizes the design of experiments. The result is a reduction of
    data amount, redundant information and workload as well as classification
    errors.

    Temporal Segment Networks for Action Recognition in Videos

    Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool
    Comments: 14 pages. An extension of submission at this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep convolutional networks have achieved great success for image
    recognition. However, for action recognition in videos, their advantage over
    traditional methods is not so evident. We present a general and flexible
    video-level framework for learning action models in videos. This method, called
    temporal segment network (TSN), aims to model long-range temporal structures
    with a new segment-based sampling and aggregation module. This unique design
    enables our TSN to efficiently learn action models by using the whole action
    videos. The learned models could be easily adapted for action recognition in
    both trimmed and untrimmed videos with simple average pooling and multi-scale
    temporal window integration, respectively. We also study a series of good
    practices for the instantiation of TSN framework given limited training
    samples. Our approach obtains the state-the-of-art performance on four
    challenging action recognition benchmarks: HMDB51 (71.0%), UCF101 (94.9%),
    THUMOS14 (80.1%), and ActivityNet v1.2 (89.6%). Using the proposed RGB
    difference for motion models, our method can still achieve competitive accuracy
    on UCF101 (91.0%) while running at 340 FPS. Furthermore, based on the temporal
    segment networks, we won the video classification track at the ActivityNet
    challenge 2016 among 24 teams, which demonstrates the effectiveness of TSN and
    the proposed good practices.

    Learning non-maximum suppression

    Jan Hosang, Rodrigo Benenson, Bernt Schiele
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Object detectors have hugely profited from moving towards an end-to-end
    learning paradigm: proposals, features, and the classifier becoming one neural
    network improved results two-fold on general object detection. One
    indispensable component is non-maximum suppression (NMS), a post-processing
    algorithm responsible for merging all detections that belong to the same
    object. The de facto standard NMS algorithm is still fully hand-crafted,
    suspiciously simple, and — being based on greedy clustering with a fixed
    distance threshold — forces a trade-off between recall and precision. We
    propose a new network architecture designed to perform NMS, using only boxes
    and their score. We report experiments for person detection on PETS and for
    general object categories on the COCO dataset. Our approach shows promise
    providing improved localization and occlusion handling.

    Object Detection by Spatio-Temporal Analysis and Tracking of the Detected Objects in a Video with Variable Background

    Kumar S. Ray, Vijayan K. Asari, Soma Chakraborty
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper we propose a novel approach for detecting and tracking objects
    in videos with variable background i.e. videos captured by moving cameras
    without any additional sensor. In a video captured by a moving camera, both the
    background and foreground are changing in each frame of the image sequence. So
    for these videos, modeling a single background with traditional background
    modeling methods is infeasible and thus the detection of actual moving object
    in a variable background is a challenging task. To detect actual moving object
    in this work, spatio-temporal blobs have been generated in each frame by
    spatio-temporal analysis of the image sequence using a three-dimensional Gabor
    filter. Then individual blobs, which are parts of one object are merged using
    Minimum Spanning Tree to form the moving object in the variable background. The
    height, width and four-bin gray-value histogram of the object are calculated as
    its features and an object is tracked in each frame using these features to
    generate the trajectories of the object through the video sequence. In this
    work, problem of data association during tracking is solved by Linear
    Assignment Problem and occlusion is handled by the application of kalman
    filter. The major advantage of our method over most of the existing tracking
    algorithms is that, the proposed method does not require initialization in the
    first frame or training on sample data to perform. Performance of the algorithm
    has been tested on benchmark videos and very satisfactory result has been
    achieved. The performance of the algorithm is also comparable and superior with
    respect to some benchmark algorithms.

    Keeping the Bad Guys Out: Protecting and Vaccinating Deep Learning with JPEG Compression

    Nilaksh Das, Madhuri Shanbhogue, Shang-Tse Chen, Fred Hohman, Li Chen, Michael E. Kounavis, Duen Horng Chau
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)

    Deep neural networks (DNNs) have achieved great success in solving a variety
    of machine learning (ML) problems, especially in the domain of image
    recognition. However, recent research showed that DNNs can be highly vulnerable
    to adversarially generated instances, which look seemingly normal to human
    observers, but completely confuse DNNs. These adversarial samples are crafted
    by adding small perturbations to normal, benign images. Such perturbations,
    while imperceptible to the human eye, are picked up by DNNs and cause them to
    misclassify the manipulated instances with high confidence. In this work, we
    explore and demonstrate how systematic JPEG compression can work as an
    effective pre-processing step in the classification pipeline to counter
    adversarial attacks and dramatically reduce their effects (e.g., Fast Gradient
    Sign Method, DeepFool). An important component of JPEG compression is its
    ability to remove high frequency signal components, inside square blocks of an
    image. Such an operation is equivalent to selective blurring of the image,
    helping remove additive perturbations. Further, we propose an ensemble-based
    technique that can be constructed quickly from a given well-performing DNN, and
    empirically show how such an ensemble that leverages JPEG compression can
    protect a model from multiple types of adversarial attacks, without requiring
    knowledge about the model.

    Multi Resolution LSTM For Long Term Prediction In Neural Activity Video

    Yilin Song, Jonathan Viventi, Yao Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Epileptic seizures are caused by abnormal, overly syn- chronized, electrical
    activity in the brain. The abnor- mal electrical activity manifests as waves,
    propagating across the brain. Accurate prediction of the propagation velocity
    and direction of these waves could enable real- time responsive brain
    stimulation to suppress or prevent the seizures entirely. However, this problem
    is very chal- lenging because the algorithm must be able to predict the neural
    signals in a sufficiently long time horizon to allow enough time for medical
    intervention. We consider how to accomplish long term prediction using a LSTM
    network. To alleviate the vanishing gradient problem, we propose two
    encoder-decoder-predictor structures, both using multi-resolution
    representation. The novel LSTM structure with multi-resolution layers could
    significantly outperform the single-resolution benchmark with similar number of
    parameters. To overcome the blurring effect associated with video prediction in
    the pixel domain using standard mean square error (MSE) loss, we use energy-
    based adversarial training to improve the long-term pre- diction. We
    demonstrate and analyze how a discriminative model with an encoder-decoder
    structure using 3D CNN model improves long term prediction.

    Generative Cooperative Net for Image Generation and Data Augmentation

    Qiangeng Xu, Zengchang Qin, Tao Wan
    Comments: 12 pages, 8 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    How to build a good model for image generation given an abstract concept is a
    fundamental problem in computer vision. In this paper, we explore a generative
    model for the task of generating unseen images with desired features. We
    propose the Generative Cooperative Net (GCN) for image generation. The idea is
    similar to generative adversarial networks except that the generators and
    discriminators are trained to work accordingly. Our experiments on hand-written
    digit generation and facial expression generation show that GCN’s two
    cooperative counterparts (the generator and the classifier) can work together
    nicely and achieve promising results. We also discovered a usage of such
    generative model as an data-augmentation tool. Our experiment of applying this
    method on a recognition task shows that it is very effective comparing to other
    existing methods. It is easy to set up and could help generate a very large
    synthesized dataset.

    A Dual-Source Approach for 3D Human Pose Estimation from a Single Image

    Umar Iqbal, Andreas Doering, Hashim Yasin, Björn Krüger, Andreas Weber, Juergen Gall
    Comments: Submitted to IEEE TPAMI. Extended version of CVPR-2016 paper, arXiv:1509.06720
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this work we address the challenging problem of 3D human pose estimation
    from single images. Recent approaches learn deep neural networks to regress 3D
    pose directly from images. One major challenge for such methods, however, is
    the collection of training data. Specifically, collecting large amounts of
    training data containing unconstrained images annotated with accurate 3D poses
    is infeasible. We therefore propose to use two independent training sources.
    The first source consists of accurate 3D motion capture data, and the second
    source consists of unconstrained images with annotated 2D poses. To integrate
    both sources, we propose a dual-source approach that combines 2D pose
    estimation with efficient 3D pose retrieval. To this end, we first convert the
    motion capture data into a normalized 2D pose space, and separately learn a 2D
    pose estimation model from the image data. During inference, we estimate the 2D
    pose and efficiently retrieve the nearest 3D poses. We then jointly estimate a
    mapping from the 3D pose space to the image and reconstruct the 3D pose. We
    provide a comprehensive evaluation of the proposed method and experimentally
    demonstrate the effectiveness of our approach, even when the skeleton
    structures of the two sources differ substantially.

    Video Processing for Barycenter Trajectory Identification in Diving

    Stefano Frassinelli, Alessandro Niccolai, Riccardo E. Zich
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The aim of this paper is to show a procedure for identify the barycentre of a
    diver by means of video processing. This procedure is aimed to introduce
    quantitative analysis tools and diving performance measurement and therefore in
    diving training. Sport performance analysis is a trend that is growing
    exponentially for all level athletes: it has been applied extensively in some
    sports such as cycling. Sport performance analysis has been applied mainly for
    high level athletes; in order to be used also for middle or low level athletes
    the proposed technique has to be flexible and low cost. Video processing is
    suitable to fulfil both these requirements. In diving, the first analysis that
    has to be done is the barycentre trajectory tracking.

    Face Recognition Machine Vision System Using Eigenfaces

    Fares Jalled
    Comments: 7 pages, 11 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Face Recognition is a common problem in Machine Learning. This technology has
    already been widely used in our lives. For example, Facebook can automatically
    tag people’s faces in images, and also some mobile devices use face recognition
    to protect private security. Face images comes with different background,
    variant illumination, different facial expression and occlusion. There are a
    large number of approaches for the face recognition. Different approaches for
    face recognition have been experimented with specific databases which consist
    of single type, format and composition of image. Doing so, these approaches
    don’t suit with different face databases. One of the basic face recognition
    techniques is eigenface which is quite simple, efficient, and yields generally
    good results in controlled circumstances. So, this paper presents an
    experimental performance comparison of face recognition using Principal
    Component Analysis (PCA) and Normalized Principal Component Analysis (NPCA).
    The experiments are carried out on the ORL (ATT) and Indian face database (IFD)
    which contain variability in expression, pose, and facial details. The results
    obtained for the two methods have been compared by varying the number of
    training images. MATLAB is used for implementing algorithms also.

    Scene Text Eraser

    Toshiki Nakamura, Anna Zhu, Keiji Yanai, Seiichi Uchida
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    The character information in natural scene images contains various personal
    information, such as telephone numbers, home addresses, etc. It is a high risk
    of leakage the information if they are published. In this paper, we proposed a
    scene text erasing method to properly hide the information via an inpainting
    convolutional neural network (CNN) model. The input is a scene text image, and
    the output is expected to be text erased image with all the character regions
    filled up the colors of the surrounding background pixels. This work is
    accomplished by a CNN model through convolution to deconvolution with
    interconnection process. The training samples and the corresponding inpainting
    images are considered as teaching signals for training. To evaluate the text
    erasing performance, the output images are detected by a novel scene text
    detection method. Subsequently, the same measurement on text detection is
    utilized for testing the images in benchmark dataset ICDAR2013. Compared with
    direct text detection way, the scene text erasing process demonstrates a
    drastically decrease on the precision, recall and f-score. That proves the
    effectiveness of proposed method for erasing the text in natural scene images.

    Deep Descriptor Transforming for Image Co-Localization

    Xiu-Shen Wei, Chen-Lin Zhang, Yao Li, Chen-Wei Xie, Jianxin Wu, Chunhua Shen, Zhi-Hua Zhou
    Comments: Accepted by IJCAI 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Reusable model design becomes desirable with the rapid expansion of machine
    learning applications. In this paper, we focus on the reusability of
    pre-trained deep convolutional models. Specifically, different from treating
    pre-trained models as feature extractors, we reveal more treasures beneath
    convolutional layers, i.e., the convolutional activations could act as a
    detector for the common object in the image co-localization problem. We propose
    a simple but effective method, named Deep Descriptor Transforming (DDT), for
    evaluating the correlations of descriptors and then obtaining the
    category-consistent regions, which can accurately locate the common object in a
    set of images. Empirical studies validate the effectiveness of the proposed DDT
    method. On benchmark image co-localization datasets, DDT consistently
    outperforms existing state-of-the-art methods by a large margin. Moreover, DDT
    also demonstrates good generalization ability for unseen categories and
    robustness for dealing with noisy data.

    What Can Help Pedestrian Detection?

    Jiayuan Mao, Tete Xiao, Yuning Jiang, Zhimin Cao
    Comments: Accepted to IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Aggregating extra features has been considered as an effective approach to
    boost traditional pedestrian detection methods. However, there is still a lack
    of studies on whether and how CNN-based pedestrian detectors can benefit from
    these extra features. The first contribution of this paper is exploring this
    issue by aggregating extra features into CNN-based pedestrian detection
    framework. Through extensive experiments, we evaluate the effects of different
    kinds of extra features quantitatively. Moreover, we propose a novel network
    architecture, namely HyperLearner, to jointly learn pedestrian detection as
    well as the given extra feature. By multi-task training, HyperLearner is able
    to utilize the information of given features and improve detection performance
    without extra inputs in inference. The experimental results on multiple
    pedestrian benchmarks validate the effectiveness of the proposed HyperLearner.

    High-Level Concepts for Affective Understanding of Images

    Afsheen Rafaqat Ali, Usman Shahid, Mohsen Ali, Jeffrey Ho
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper aims to bridge the affective gap between image content and the
    emotional response of the viewer it elicits by using High-Level Concepts
    (HLCs). In contrast to previous work that relied solely on low-level features
    or used convolutional neural network (CNN) as a black-box, we use HLCs
    generated by pretrained CNNs in an explicit way to investigate the
    relations/associations between these HLCs and a (small) set of Ekman’s
    emotional classes. As a proof-of-concept, we first propose a linear admixture
    model for modeling these relations, and the resulting computational framework
    allows us to determine the associations between each emotion class and certain
    HLCs (objects and places). This linear model is further extended to a nonlinear
    model using support vector regression (SVR) that aims to predict the viewer’s
    emotional response using both low-level image features and HLCs extracted from
    images. These class-specific regressors are then assembled into a regressor
    ensemble that provide a flexible and effective predictor for predicting
    viewer’s emotional responses from images. Experimental results have
    demonstrated that our results are comparable to existing methods, with a clear
    view of the association between HLCs and emotional classes that is ostensibly
    missing in most existing work.

    ChineseFoodNet: A large-scale Image Dataset for Chinese Food Recognition

    Xin Chen, Hua Zhou, Liang Diao
    Comments: 3 pages, 1 figure
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper introduces a new and challenging large-scale food image dataset
    called “ChineseFoodNet”, which aims to automatically recognizing pictured
    Chinese dishes. We describe how to build the dataset such as category
    selection, data cleaning and labelling.

    Automatic Recognition of Mammal Genera on Camera-Trap Images using Multi-Layer Robust Principal Component Analysis and Mixture Neural Networks

    Jhony-Heriberto Giraldo-Zuluaga, Augusto Salazar, Alexander Gomez, Angélica Diaz-Pulido
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The segmentation and classification of animals from camera-trap images is due
    to the conditions under which the images are taken, a difficult task. This work
    presents a method for classifying and segmenting mammal genera from camera-trap
    images. Our method uses Multi-Layer Robust Principal Component Analysis (RPCA)
    for segmenting, Convolutional Neural Networks (CNNs) for extracting features,
    Least Absolute Shrinkage and Selection Operator (LASSO) for selecting features,
    and Artificial Neural Networks (ANNs) or Support Vector Machines (SVM) for
    classifying mammal genera present in the Colombian forest. We evaluated our
    method with the camera-trap images from the Alexander von Humboldt Biological
    Resources Research Institute. We obtained an accuracy of 92.65% classifying 8
    mammal genera and a False Positive (FP) class, using automatic-segmented
    images. On the other hand, we reached 90.32% of accuracy classifying 10 mammal
    genera, using ground-truth images only. Unlike almost all previous works, we
    confront the animal segmentation and genera classification in the camera-trap
    recognition. This method shows a new approach toward a fully-automatic
    detection of animals from camera-trap images.

    AirDraw: Leveraging Smart Watch Motion Sensors for Mobile Human Computer Interactions

    Seyed A Sajjadi, Danial Moazen, Ani Nahapetian
    Comments: 6 pages, AirDraw, Leveraging Smart Watch Motion Sensors for Mobile Human Computer Interactions : IEEE, CCNC 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)

    Wearable computing is one of the fastest growing technologies today. Smart
    watches are poised to take over at least of half the wearable devices market in
    the near future. Smart watch screen size, however, is a limiting factor for
    growth, as it restricts practical text input. On the other hand, wearable
    devices have some features, such as consistent user interaction and hands-free,
    heads-up operations, which pave the way for gesture recognition methods of text
    entry. This paper proposes a new text input method for smart watches, which
    utilizes motion sensor data and machine learning approaches to detect letters
    written in the air by a user. This method is less computationally intensive and
    less expensive when compared to computer vision approaches. It is also not
    affected by lighting factors, which limit computer vision solutions. The
    AirDraw system prototype developed to test this approach is presented.
    Additionally, experimental results close to 71% accuracy are presented.

    Handwritten Bangla Digit Recognition Using Deep Learning

    Md Zahangir Alom, Paheding Sidike, Tarek M. Taha, Vijayan K. Asari
    Comments: 12 pages, 10 figures, 3 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In spite of the advances in pattern recognition technology, Handwritten
    Bangla Character Recognition (HBCR) (such as alpha-numeric and special
    characters) remains largely unsolved due to the presence of many perplexing
    characters and excessive cursive in Bangla handwriting. Even the best existing
    recognizers do not lead to satisfactory performance for practical applications.
    To improve the performance of Handwritten Bangla Digit Recognition (HBDR), we
    herein present a new approach based on deep neural networks which have recently
    shown excellent performance in many pattern recognition and machine learning
    applications, but has not been throughly attempted for HBDR. We introduce
    Bangla digit recognition techniques based on Deep Belief Network (DBN),
    Convolutional Neural Networks (CNN), CNN with dropout, CNN with dropout and
    Gaussian filters, and CNN with dropout and Gabor filters. These networks have
    the advantage of extracting and using feature information, improving the
    recognition of two dimensional shapes with a high degree of invariance to
    translation, scaling and other pattern distortions. We systematically evaluated
    the performance of our method on publicly available Bangla numeral image
    database named CMATERdb 3.1.1. From experiments, we achieved 98.78% recognition
    rate using the proposed method: CNN with Gabor features and dropout, which
    outperforms the state-of-the-art algorithms for HDBR.

    Large scale digital prostate pathology image analysis combining feature extraction and deep neural network

    Naiyun Zhou, Andrey Fedorov, Fiona Fennessy, Ron Kikinis, Yi Gao
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Histopathological assessments, including surgical resection and core needle
    biopsy, are the standard procedures in the diagnosis of the prostate cancer.
    Current interpretation of the histopathology images includes the determination
    of the tumor area, Gleason grading, and identification of certain
    prognosis-critical features. Such a process is not only tedious, but also prune
    to intra/inter-observe variabilities. Recently, FDA cleared the marketing of
    the first whole slide imaging system for digital pathology. This opens a new
    era for the computer aided prostate image analysis and feature extraction based
    on the digital histopathology images. In this work, we present an analysis
    pipeline that includes localization of the cancer region, grading, area ratio
    of different Gleason grades, and cytological/architectural feature extraction.
    The proposed algorithm combines the human engineered feature extraction as well
    as those learned by the deep neural network. Moreover, the entire pipeline is
    implemented to directly operate on the whole slide images produced by the
    digital scanners and is therefore potentially easy to translate into clinical
    practices. The algorithm is tested on 368 whole slide images from the TCGA data
    set and achieves an overall accuracy of 75% in differentiating Gleason 3+4 with
    4+3 slides.

    Towards Applying the OPRA Theory to Shape Similarity

    Christopher H. Dorr, Reinhard Moratz
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The motivation for using qualitative shape descriptions is as follows:
    qualitative shape descriptions can implicitly act as a schema for measuring the
    similarity of shapes, which has the potential to be cognitively adequate. Then,
    shapes which are similar to each other would also be similar for a pattern
    recognition algorithm. There is substantial work in pattern recognition and
    computer vision dealing with shape similarity. Here with our approach to
    qualitative shape descriptions and shape similarity, the focus is on achieving
    a representation using only simple predicates that a human could even apply
    without computer support.

    TrajectoryNet: An Embedded GPS Trajectory Representation for Point-based Classification Using Recurrent Neural Networks

    Xiang Jiang, Erico N de Souza, Ahmad Pesaranghader, Baifan Hu, Daniel L. Silver, Stan Matwin
    Comments: 16 pages, 6 figures, under review at ECML PKDD 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Understanding and discovering knowledge from GPS (Global Positioning System)
    traces of human activities is an essential topic in mobility-based urban
    computing. We propose TrajectoryNet-a neural network architecture for
    point-based trajectory classification to infer real world human transportation
    modes from GPS traces. To overcome the challenge of capturing the underlying
    latent factors in the low-dimensional and heterogeneous feature space imposed
    by GPS data, we develop a novel representation that embeds the original feature
    space into another space that can be understood as a form of basis expansion.
    We also enrich the feature space via segment-based information and use Maxout
    activations to improve the predictive power of Recurrent Neural Networks
    (RNNs). We achieve over 98% classification accuracy when detecting four types
    of transportation modes, outperforming existing models without additional
    sensory data or location-based prior knowledge.

    Simultaneous Super-Resolution and Cross-Modality Synthesis of 3D Medical Images using Weakly-Supervised Joint Convolutional Sparse Coding

    Yawen Huang, Ling Shao, Alejandro F. Frangi
    Comments: 10 pages, 6 figures. Accepted by CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Magnetic Resonance Imaging (MRI) offers high-resolution emph{in vivo}
    imaging and rich functional and anatomical multimodality tissue contrast. In
    practice, however, there are challenges associated with considerations of
    scanning costs, patient comfort, and scanning time that constrain how much data
    can be acquired in clinical or research studies. In this paper, we explore the
    possibility of generating high-resolution and multimodal images from
    low-resolution single-modality imagery. We propose the weakly-supervised joint
    convolutional sparse coding to simultaneously solve the problems of
    super-resolution (SR) and cross-modality image synthesis. The learning process
    requires only a few registered multimodal image pairs as the training set.
    Additionally, the quality of the joint dictionary learning can be improved
    using a larger set of unpaired images. To combine unpaired data from different
    image resolutions/modalities, a hetero-domain image alignment term is proposed.
    Local image neighborhoods are naturally preserved by operating on the whole
    image domain (as opposed to image patches) and using joint convolutional sparse
    coding. The paired images are enhanced in the joint learning process with
    unpaired data and an additional maximum mean discrepancy term, which minimizes
    the dissimilarity between their feature distributions. Experiments show that
    the proposed method outperforms state-of-the-art techniques on both SR
    reconstruction and simultaneous SR and cross-modality synthesis.

    Deep Visual Attention Prediction

    Wenguan Wang, Jianbing Shen
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep Convolutional Neural Networks (CNNs) have made substantial improvement
    on human attention prediction. There still remains room for improvement over
    deep learning based attention models that do not explicitly deal with
    scale-space feature learning problem. Our method learns to predict human eye
    fixation with view-free scenes based on an end-to-end deep learning
    architecture. The attention model captures hierarchical saliency information
    from deep, coarse layers with global saliency information to shallow, fine
    layers with local saliency response. We base our model on a skip-layer network
    structure, which predicts human attention from multiple convolutional layers
    with various reception fields. Final saliency prediction is achieved via the
    cooperation of those global and local predictions. Our model is learned with a
    deep supervision manner, where supervision is directly fed into multi-level
    layers, instead of previous approaches of providing supervision only at the
    output layer and propagating this supervision back to earlier layers. Our model
    thus incorporates multi-level saliency predictions within a single network,
    which significantly decreases the redundancy of previous approaches of learning
    multiple network streams with different input scales. Extensive experimental
    analysis on various challenging benchmark datasets demonstrate our method
    yields state-of-the-art performance with competitive inference time.

    Context-Aware Trajectory Prediction

    Federico Bartoli, Giuseppe Lisanti, Lamberto Ballan, Alberto Del Bimbo
    Comments: Submitted to BMVC 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Human motion and behaviour in crowded spaces is influenced by several
    factors, such as the dynamics of other moving agents in the scene, as well as
    the static elements that might be perceived as points of attraction or
    obstacles. In this work, we present a new model for human trajectory prediction
    which is able to take advantage of both human-human and human-space
    interactions. The future trajectory of humans, are generated by observing their
    past positions and interactions with the surroundings. To this end, we propose
    a “context-aware” recurrent neural network LSTM model, which can learn and
    predict human motion in crowded spaces such as a sidewalk, a museum or a
    shopping mall. We evaluate our model on a public pedestrian datasets, and we
    contribute a new challenging dataset that collects videos of humans that
    navigate in a (real) crowded space such as a big museum. Results show that our
    approach can predict human trajectories better when compared to previous
    state-of-the-art forecasting models.

    A Study and Comparison of Human and Deep Learning Recognition Performance Under Visual Distortions

    Samuel Dodge, Lina Karam
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep neural networks (DNNs) achieve excellent performance on standard
    classification tasks. However, under image quality distortions such as blur and
    noise, classification accuracy becomes poor. In this work, we compare the
    performance of DNNs with human subjects on distorted images. We show that,
    although DNNs perform better than or on par with humans on good quality images,
    DNN performance is still much lower than human performance on distorted images.
    We additionally find that there is little correlation in errors between DNNs
    and human subjects. This could be an indication that the internal
    representation of images are different between DNNs and the human visual
    system. These comparisons with human performance could be used to guide future
    development of more robust DNNs.

    Image Annotation using Multi-Layer Sparse Coding

    Amara Tariq, Hassan Foroosh
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Automatic annotation of images with descriptive words is a challenging
    problem with vast applications in the areas of image search and retrieval. This
    problem can be viewed as a label-assignment problem by a classifier dealing
    with a very large set of labels, i.e., the vocabulary set. We propose a novel
    annotation method that employs two layers of sparse coding and performs
    coarse-to-fine labeling. Themes extracted from the training data are treated as
    coarse labels. Each theme is a set of training images that share a common
    subject in their visual and textual contents. Our system extracts coarse labels
    for training and test images without requiring any prior knowledge. Vocabulary
    words are the fine labels to be associated with images. Most of the annotation
    methods achieve low recall due to the large number of available fine labels,
    i.e., vocabulary words. These systems also tend to achieve high precision for
    highly frequent words only while relatively rare words are more important for
    search and retrieval purposes. Our system not only outperforms various
    previously proposed annotation systems, but also achieves symmetric response in
    terms of precision and recall. Our system scores and maintains high precision
    for words with a wide range of frequencies. Such behavior is achieved by
    intelligently reducing the number of available fine labels or words for each
    image based on coarse labels assigned to it.

    On human motion prediction using recurrent neural networks

    Julieta Martinez, Michael J. Black, Javier Romero
    Comments: Accepted at CVPR 17
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Human motion modelling is a classical problem at the intersection of graphics
    and computer vision, with applications spanning human-computer interaction,
    motion synthesis, and motion prediction for virtual and augmented reality.
    Following the success of deep learning methods in several computer vision
    tasks, recent work has focused on using deep recurrent neural networks (RNNs)
    to model human motion, with the goal of learning time-dependent representations
    that perform tasks such as short-term motion prediction and long-term human
    motion synthesis. We examine recent work, with a focus on the evaluation
    methodologies commonly used in the literature, and show that, surprisingly,
    state-of-the-art performance can be achieved by a simple baseline that does not
    attempt to model motion at all. We investigate this result, and analyze recent
    RNN methods by looking at the architectures, loss functions, and training
    procedures used in state-of-the-art approaches. We propose three changes to the
    standard RNN models typically used for human motion, which result in a simple
    and scalable RNN architecture that obtains state-of-the-art performance on
    human motion prediction.

    Sparse Representation-based Open Set Recognition

    He Zhang, Vishal M.Patel
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a generalized Sparse Representation- based Classification (SRC)
    algorithm for open set recognition where not all classes presented during
    testing are known during training. The SRC algorithm uses class reconstruction
    errors for classification. As most of the discriminative information for open
    set recognition is hidden in the tail part of the matched and sum of
    non-matched reconstruction error distributions, we model the tail of those two
    error distributions using the statistical Extreme Value Theory (EVT). Then we
    simplify the open set recognition problem into a set of hypothesis testing
    problems. The confidence scores corresponding to the tail distributions of a
    novel test sample are then fused to determine its identity. The effectiveness
    of the proposed method is demonstrated using four publicly available image and
    object classification datasets and it is shown that this method can perform
    significantly better than many competitive open set recognition algorithms.
    Code is public available: this https URL

    Deep Patch Learning for Weakly Supervised Object Classification and Discovery

    Peng Tang, Xinggang Wang, Zilong Huang, Xiang Bai, Wenyu Liu
    Comments: Accepted by Pattern Recognition
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Patch-level image representation is very important for object classification
    and detection, since it is robust to spatial transformation, scale variation,
    and cluttered background. Many existing methods usually require fine-grained
    supervisions (e.g., bounding-box annotations) to learn patch features, which
    requires a great effort to label images may limit their potential applications.
    In this paper, we propose to learn patch features via weak supervisions, i.e.,
    only image-level supervisions. To achieve this goal, we treat images as bags
    and patches as instances to integrate the weakly supervised multiple instance
    learning constraints into deep neural networks. Also, our method integrates the
    traditional multiple stages of weakly supervised object classification and
    discovery into a unified deep convolutional neural network and optimizes the
    network in an end-to-end way. The network processes the two tasks object
    classification and discovery jointly, and shares hierarchical deep features.
    Through this jointly learning strategy, weakly supervised object classification
    and discovery are beneficial to each other. We test the proposed method on the
    challenging PASCAL VOC datasets. The results show that our method can obtain
    state-of-the-art performance on object classification, and very competitive
    results on object discovery, with faster testing speed than competitors.

    Knowledge-Guided Deep Fractal Neural Networks for Human Pose Estimation

    Guanghan Ning, Zhi Zhang, Zhihai He
    Comments: 13 pages, 12 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Human pose estimation using deep neural networks aims to map input images
    with large variations into multiple body keypoints which must satisfy a set of
    geometric constraints and inter-dependency imposed by the human body model.
    This is a very challenging nonlinear manifold learning process in a very high
    dimensional feature space. We believe that the deep neural network, which is
    inherently an algebraic computation system, is not able to capture highly
    sophisticated human knowledge, for example those highly coupled geometric
    characteristics and interdependence between keypoints in human poses. In this
    work, we propose to explore how external knowledge can be effectively
    represented and injected into the deep neural networks to guide its training
    process using learned projections for more accurate and robust human pose
    estimation. Specifically, we use the stacked hourglass network module and
    inception-resnet to construct a fractal network to regress human pose images
    into heatmaps with no explicit graphical modeling. We encode external knowledge
    with visual features which are able to characterize the constraints of human
    body models and evaluate the fitness of intermediate network output. We then
    inject these external features into the neural network using a projection
    matrix learned using an auxiliary cost function. The effectiveness of the
    proposed inception-resnet module and the benefit in guided learning with
    knowledge projection is evaluated on two widely used human pose estimation
    benchmarks. Our approach achieves state-of-the-art performance on both
    datasets. Code is made publicly available.

    DeepCorrect: Correcting DNN models against Image Distortions

    Tejas Borkar, Lina Karam
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In recent years, the widespread use of deep neural networks (DNNs) has
    facilitated great improvements in performance for computer vision tasks like
    image classification and object recognition. In most realistic computer vision
    applications, an input image undergoes some form of image distortion such as
    blur and additive noise during image acquisition or transmission. Deep networks
    trained on pristine images perform poorly when tested on distorted images
    affected by image blur or additive noise. In this paper, we evaluate the effect
    of image distortions like Gaussian blur and additive noise on the outputs of
    pre-trained convolutional filters. We propose a metric to identify the most
    noise susceptible convolutional filters and rank them in order of the highest
    gain in classification accuracy upon correction. In our proposed approach
    called DeepCorrect, we apply small convolutional filter blocks on top of these
    ranked filters and train them to correct the worst noise and blur affected
    filter outputs. Applying DeepCorrect on the CIFAR-100 dataset, we significantly
    improve the robustness of DNNs against distorted images and also outperform the
    alternative approach of fine-tuning networks.

    Face Detection, Bounding Box Aggregation and Pose Estimation for Robust Facial Landmark Localisation in the Wild

    Zhen-Hua Feng, Josef Kittler, Muhammad Awais, Patrik Huber, Xiao-Jun Wu
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a framework for robust face detection and landmark localisation of
    faces in the wild, which has been evaluated as part of `the 2nd Facial Landmark
    Localisation Competition’. The framework has four stages: face detection,
    bounding box aggregation, pose estimation and landmark localisation. To achieve
    a high detection rate, we use two publicly available CNN-based face detectors
    and two proprietary detectors. We aggregate the detected face bounding boxes of
    each input image to reduce false positives and improve face detection accuracy.
    A cascaded shape regressor, trained using faces with a variety of pose
    variations, is then employed for pose estimation and image pre-processing.
    Last, we train the final cascaded shape regressor for fine-grained landmark
    localisation, using a large number of training samples with limited pose
    variations. The experimental results obtained on the 300W and Menpo benchmarks
    demonstrate the superiority of our framework over state-of-the-art methods.

    Cross-label Suppression: A Discriminative and Fast Dictionary Learning with Group Regularization

    Xiudong Wang, Yuantao Gu
    Comments: 36 pages, 12 figures, 11 tables
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    This paper addresses image classification through learning a compact and
    discriminative dictionary efficiently. Given a structured dictionary with each
    atom (columns in the dictionary matrix) related to some label, we propose
    cross-label suppression constraint to enlarge the difference among
    representations for different classes. Meanwhile, we introduce group
    regularization to enforce representations to preserve label properties of
    original samples, meaning the representations for the same class are encouraged
    to be similar. Upon the cross-label suppression, we don’t resort to
    frequently-used (ell_0)-norm or (ell_1)-norm for coding, and obtain
    computational efficiency without losing the discriminative power for
    categorization. Moreover, two simple classification schemes are also developed
    to take full advantage of the learnt dictionary. Extensive experiments on six
    data sets including face recognition, object categorization, scene
    classification, texture recognition and sport action categorization are
    conducted, and the results show that the proposed approach can outperform lots
    of recently presented dictionary algorithms on both recognition accuracy and
    computational efficiency.

    Geometric GAN

    Jae Hyun Lim, Jong Chul Ye
    Subjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Generative Adversarial Nets (GANs) represent an important milestone for
    effective generative models, which has inspired numerous variants seemingly
    different from each other. One of the main contributions of this paper is to
    reveal a unified geometric structure in GAN and its variants. Specifically, we
    show that the adversarial generative model training can be decomposed into
    three geometric steps: separating hyperplane search, discriminator parameter
    update away from the separating hyperplane, and the generator update along the
    normal vector direction of the separating hyperplane. This geometric intuition
    reveals the limitations of the existing approaches and leads us to propose a
    new formulation called geometric GAN using SVM separating hyperplane that
    maximizes the margin. Our theoretical analysis shows that the geometric GAN
    converges to a Nash equilibrium between the discriminator and generator. In
    addition, extensive numerical results show that the superior performance of
    geometric GAN.

    Multimodal Affect Analysis for Product Feedback Assessment

    Amol S Patwardhan, Gerald M Knapp
    Comments: 10 pages, ISERC 2013, IIE Annual Conference. Proceedings. Institute of Industrial Engineers
    Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Consumers often react expressively to products such as food samples, perfume,
    jewelry, sunglasses, and clothing accessories. This research discusses a
    multimodal affect recognition system developed to classify whether a consumer
    likes or dislikes a product tested at a counter or kiosk, by analyzing the
    consumer’s facial expression, body posture, hand gestures, and voice after
    testing the product. A depth-capable camera and microphone system – Kinect for
    Windows – is utilized. An emotion identification engine has been developed to
    analyze the images and voice to determine affective state of the customer. The
    image is segmented using skin color and adaptive threshold. Face, body and
    hands are detected using the Haar cascade classifier. Canny edges are
    identified and the lip, body and hand contours are extracted using spatial
    filtering. Edge count and orientation around the mouth, cheeks, eyes,
    shoulders, fingers and the location of the edges are used as features.
    Classification is done by an emotion template mapping algorithm and training a
    classifier using support vector machines. The real-time performance, accuracy
    and feasibility for multimodal affect recognition in feedback assessment are
    evaluated.


    Artificial Intelligence

    Safe and Nested Subgame Solving for Imperfect-Information Games

    Noam Brown, Tuomas Sandholm
    Subjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)

    Unlike perfect-information games, imperfect-information games cannot be
    solved by decomposing the game into subgames that are solved independently.
    Instead, all decisions must consider the strategy of the game as a whole, and
    more computationally intensive algorithms are used. While it is not possible to
    solve an imperfect-information game exactly through decomposition, it is
    possible to approximate solutions, or improve existing strategies, by solving
    disjoint subgames. This process is referred to as subgame solving. We introduce
    subgame solving techniques that outperform prior methods both in theory and
    practice. We also show how to adapt them, and past subgame solving techniques,
    to respond to opponent actions that are outside the original action
    abstraction; this significantly outperforms the prior state-of-the-art
    approach, action translation. Finally, we show that subgame solving can be
    repeated as the game progresses down the tree, leading to lower exploitability.
    Subgame solving is a key component of Libratus, the first AI to defeat top
    humans in heads-up no-limit Texas hold’em poker.

    Machine Learning with World Knowledge: The Position and Survey

    Yangqiu Song, Dan Roth
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Machine learning has become pervasive in multiple domains, impacting a wide
    variety of applications, such as knowledge discovery and data mining, natural
    language processing, information retrieval, computer vision, social and health
    informatics, ubiquitous computing, etc. Two essential problems of machine
    learning are how to generate features and how to acquire labels for machines to
    learn. Particularly, labeling large amount of data for each domain-specific
    problem can be very time consuming and costly. It has become a key obstacle in
    making learning protocols realistic in applications. In this paper, we will
    discuss how to use the existing general-purpose world knowledge to enhance
    machine learning processes, by enriching the features or reducing the labeling
    work. We start from the comparison of world knowledge with domain-specific
    knowledge, and then introduce three key problems in using world knowledge in
    learning processes, i.e., explicit and implicit feature representation,
    inference for knowledge linking and disambiguation, and learning with direct or
    indirect supervision. Finally we discuss the future directions of this research
    topic.

    Block-Parallel IDA* for GPUs (Extended Manuscript)

    Satoru Horie, Alex Fukunaga
    Subjects: Artificial Intelligence (cs.AI)

    We investigate GPU-based parallelization of Iterative-Deepening A* (IDA*). We
    show that straightforward thread-based parallelization techniques which were
    previously proposed for massively parallel SIMD processors perform poorly due
    to warp divergence and load imbalance. We propose Block-Parallel IDA* (BPIDA*),
    which assigns the search of a subtree to a block (a group of threads with
    access to fast shared memory) rather than a thread. On the 15-puzzle, BPIDA* on
    a NVIDIA GRID K520 with 1536 CUDA cores achieves a speedup of 4.98 compared to
    a highly optimized sequential IDA* implementation on a Xeon E5-2670 core.

    Continuous Experience-aware Language Model for Recommender Systems using Brownian Motion

    Subhabrata Mukherjee, Stephan Guennemann, Gerhard Weikum
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

    Online review communities are dynamic as users join and leave, adopt new
    vocabulary, and adapt to evolving trends. Recent work has shown that
    recommender systems benefit from explicit consideration of user experience.
    However, prior work assumes a fixed number of discrete experience levels,
    whereas in reality users gain experience and mature continuously over time.
    This paper presents a new model that captures the continuous evolution of user
    experience, and the resulting language model in reviews and other posts. Our
    model is unsupervised and combines principles of Geometric Brownian Motion,
    Brownian Motion, and Latent Dirichlet Allocation to trace a smooth temporal
    progression of user experience and language model respectively. We develop
    practical algorithms for estimating the model parameters from data and for
    inference with our model (e.g., to recommend items). Extensive experiments with
    five real-world datasets show that our model not only fits data better than
    discrete-model baselines, but also outperforms state-of-the-art methods for
    predicting item ratings.

    Credible Review Detection with Limited Information using Consistency Analysis

    Subhabrata Mukherjee, Sourav Dutta, Gerhard Weikum
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

    Online reviews provide viewpoints on the strengths and shortcomings of
    products/services, influencing potential customers’ purchasing decisions.
    However, the proliferation of non-credible reviews — either fake (promoting/
    demoting an item), incompetent (involving irrelevant aspects), or biased —
    entails the problem of identifying credible reviews. Prior works involve
    classifiers harnessing rich information about items/users — which might not be
    readily available in several domains — that provide only limited
    interpretability as to why a review is deemed non-credible. This paper presents
    a novel approach to address the above issues. We utilize latent topic models
    leveraging review texts, item ratings, and timestamps to derive consistency
    features without relying on item/user histories, unavailable for “long-tail”
    items/users. We develop models, for computing review credibility scores to
    provide interpretable evidence for non-credible reviews, that are also
    transferable to other domains — addressing the scarcity of labeled data.
    Experiments on real-world datasets demonstrate improvements over
    state-of-the-art baselines.

    Leveraging Joint Interactions for Credibility Analysis in News Communities with Continuous Conditional Random Field

    Subhabrata Mukherjee, Gerhard Weikum
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

    Media seems to have become more partisan, often providing a biased coverage
    of news catering to the interest of specific groups. It is therefore essential
    to identify credible information content that provides an objective narrative
    of an event. News communities such as digg, reddit, or newstrust offer
    recommendations, reviews, quality ratings, and further insights on journalistic
    works. However, there is a complex interaction between different factors in
    such online communities: fairness and style of reporting, language clarity and
    objectivity, topical perspectives (like political viewpoint), expertise and
    bias of community members, and more. This paper presents a model to
    systematically analyze the different interactions in a news community between
    users, news, and sources. We develop a probabilistic graphical model that
    leverages this joint interaction to identify 1) highly credible news articles,
    2) trustworthy news sources, and 3) expert users who perform the role of
    “citizen journalists” in the community. Our method extends CRF models to
    incorporate real-valued ratings, as some communities have very fine-grained
    scales that cannot be easily discretized without losing information. To the
    best of our knowledge, this paper is the first full-fledged analysis of
    credibility, trust, and expertise in news communities.

    A New Medical Diagnosis Method Based on Z-Numbers

    Dong Wu, Xiang Liu, Feng Xue, Hanqing Zheng, Yehang Shou, Wen Jiang
    Comments: 24 pages, 9 figures, 13 tables
    Subjects: Artificial Intelligence (cs.AI)

    How to handle uncertainty in medical diagnosis is an open issue. In this
    paper, a new decision making methodology based on Z-numbers is presented.
    Firstly, the experts’ opinions are represented by Z-numbers. Z-number is an
    ordered pair of fuzzy numbers denoted as Z = (A, B). Then, a new method for
    ranking fuzzy numbers is proposed. And based on the proposed fuzzy number
    ranking method, a novel method is presented to transform the Z-numbers into
    Basic Probability Assignment (BPA). As a result, the information from different
    sources is combined by the Dempster’ combination rule. The final decision
    making is more reasonable due to the advantage of information fusion. Finally,
    two experiments, risk analysis and medical diagnosis, are illustrated to show
    the efficiency of the proposed methodology.

    Experimental results : Reinforcement Learning of POMDPs using Spectral Methods

    Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar
    Comments: 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    We propose a new reinforcement learning algorithm for partially observable
    Markov decision processes (POMDP) based on spectral decomposition methods.
    While spectral methods have been previously employed for consistent learning of
    (passive) latent variable models such as hidden Markov models, POMDPs are more
    challenging since the learner interacts with the environment and possibly
    changes the future observations in the process. We devise a learning algorithm
    running through epochs, in each epoch we employ spectral techniques to learn
    the POMDP parameters from a trajectory generated by a fixed policy. At the end
    of the epoch, an optimization oracle returns the optimal memoryless planning
    policy which maximizes the expected reward based on the estimated POMDP model.
    We prove an order-optimal regret bound with respect to the optimal memoryless
    policy and efficient scaling with respect to the dimensionality of observation
    and action spaces.

    People on Drugs: Credibility of User Statements in Health Communities

    Subhabrata Mukherjee, Gerhard Weikum, Cristian Danescu-Niculescu-Mizil
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

    Online health communities are a valuable source of information for patients
    and physicians. However, such user-generated resources are often plagued by
    inaccuracies and misinformation. In this work we propose a method for
    automatically establishing the credibility of user-generated medical statements
    and the trustworthiness of their authors by exploiting linguistic cues and
    distant supervision from expert sources. To this end we introduce a
    probabilistic graphical model that jointly learns user trustworthiness,
    statement credibility, and language objectivity. We apply this methodology to
    the task of extracting rare or unknown side-effects of medical drugs — this
    being one of the problems where large scale non-expert data has the potential
    to complement expert medical knowledge. We show that our method can reliably
    extract side-effects and filter out false statements, while identifying
    trustworthy users that are likely to contribute valuable medical information.

    Item Recommendation with Evolving User Preferences and Experience

    Subhabrata Mukherjee, Hemank Lamba, Gerhard Weikum
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

    Current recommender systems exploit user and item similarities by
    collaborative filtering. Some advanced methods also consider the temporal
    evolution of item ratings as a global background process. However, all prior
    methods disregard the individual evolution of a user’s experience level and how
    this is expressed in the user’s writing in a review community. In this paper,
    we model the joint evolution of user experience, interest in specific item
    facets, writing style, and rating behavior. This way we can generate individual
    recommendations that take into account the user’s maturity level (e.g.,
    recommending art movies rather than blockbusters for a cinematography expert).
    As only item ratings and review texts are observables, we capture the user’s
    experience and interests in a latent model learned from her reviews, vocabulary
    and writing style. We develop a generative HMM-LDA model to trace user
    evolution, where the Hidden Markov Model (HMM) traces her latent experience
    progressing over time — with solely user reviews and ratings as observables
    over time. The facets of a user’s interest are drawn from a Latent Dirichlet
    Allocation (LDA) model derived from her reviews, as a function of her (again
    latent) experience level. In experiments with five real-world datasets, we show
    that our model improves the rating prediction over state-of-the-art baselines,
    by a substantial margin. We also show, in a use-case study, that our model
    performs well in the assessment of user experience levels.

    Exploring Latent Semantic Factors to Find Useful Product Reviews

    Subhabrata Mukherjee, Kashyap Popat, Gerhard Weikum
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

    Online reviews provided by consumers are a valuable asset for e-Commerce
    platforms, influencing potential consumers in making purchasing decisions.
    However, these reviews are of varying quality, with the useful ones buried deep
    within a heap of non-informative reviews. In this work, we attempt to
    automatically identify review quality in terms of its helpfulness to the end
    consumers. In contrast to previous works in this domain exploiting a variety of
    syntactic and community-level features, we delve deep into the semantics of
    reviews as to what makes them useful, providing interpretable explanation for
    the same. We identify a set of consistency and semantic factors, all from the
    text, ratings, and timestamps of user-generated reviews, making our approach
    generalizable across all communities and domains. We explore review semantics
    in terms of several latent factors like the expertise of its author, his
    judgment about the fine-grained facets of the underlying product, and his
    writing style. These are cast into a Hidden Markov Model — Latent Dirichlet
    Allocation (HMM-LDA) based model to jointly infer: (i) reviewer expertise, (ii)
    item facets, and (iii) review helpfulness. Large-scale experiments on five
    real-world datasets from Amazon show significant improvement over
    state-of-the-art baselines in predicting and ranking useful reviews.

    Metacognitive Learning Approach for Online Tool Condition Monitoring

    Mahardhika Pratama, Eric Dimla, Chow Yin Lai, Edwin Lughofer
    Subjects: Artificial Intelligence (cs.AI)

    As manufacturing processes become increasingly automated, so should tool
    condition monitoring (TCM) as it is impractical to have human workers monitor
    the state of the tools continuously. Tool condition is crucial to ensure the
    good quality of products: Worn tools affect not only the surface quality but
    also the dimensional accuracy, which means higher reject rate of the products.
    Therefore, there is an urgent need to identify tool failures before it occurs
    on the fly. While various versions of intelligent tool condition monitoring
    have been proposed, most of them suffer from a cognitive nature of traditional
    machine learning algorithms. They focus on the how to learn process without
    paying attention to other two crucial issues: what to learn, and when to learn.
    The what to learn and the when to learn provide self regulating mechanisms to
    select the training samples and to determine time instants to train a model. A
    novel tool condition monitoring approach based on a psychologically plausible
    concept, namely the metacognitive scaffolding theory, is proposed and built
    upon a recently published algorithm, recurrent classifier (rClass). The
    learning process consists of three phases: what to learn, how to learn, when to
    learn and makes use of a generalized recurrent network structure as a cognitive
    component. Experimental studies with real-world manufacturing data streams were
    conducted where rClass demonstrated the highest accuracy while retaining the
    lowest complexity over its counterparts.

    PANFIS++: A Generalized Approach to Evolving Learning

    Mahardhika Pratama
    Subjects: Artificial Intelligence (cs.AI)

    The concept of evolving intelligent system (EIS) provides an effective avenue
    for data stream mining because it is capable of coping with two prominent
    issues: online learning and rapidly changing environments. We note at least
    three uncharted territories of existing EISs: data uncertainty, temporal system
    dynamic, redundant data streams. This book chapter aims at delivering a
    concrete solution of this problem with the algorithmic development of a novel
    learning algorithm, namely PANFIS++. PANFIS++ is a generalized version of the
    PANFIS by putting forward three important components: 1) An online active
    learning scenario is developed to overcome redundant data streams. This module
    allows to actively select data streams for the training process, thereby
    expediting execution time and enhancing generalization performance, 2) PANFIS++
    is built upon an interval type-2 fuzzy system environment, which incorporates
    the so-called footprint of uncertainty. This component provides a degree of
    tolerance for data uncertainty. 3) PANFIS++ is structured under a recurrent
    network architecture with a self-feedback loop. This is meant to tackle the
    temporal system dynamic. The efficacy of the PANFIS++ has been numerically
    validated through numerous real-world and synthetic case studies, where it
    delivers the highest predictive accuracy while retaining the lowest complexity.

    Geometric GAN

    Jae Hyun Lim, Jong Chul Ye
    Subjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Generative Adversarial Nets (GANs) represent an important milestone for
    effective generative models, which has inspired numerous variants seemingly
    different from each other. One of the main contributions of this paper is to
    reveal a unified geometric structure in GAN and its variants. Specifically, we
    show that the adversarial generative model training can be decomposed into
    three geometric steps: separating hyperplane search, discriminator parameter
    update away from the separating hyperplane, and the generator update along the
    normal vector direction of the separating hyperplane. This geometric intuition
    reveals the limitations of the existing approaches and leads us to propose a
    new formulation called geometric GAN using SVM separating hyperplane that
    maximizes the margin. Our theoretical analysis shows that the geometric GAN
    converges to a Nash equilibrium between the discriminator and generator. In
    addition, extensive numerical results show that the superior performance of
    geometric GAN.

    Scene Text Eraser

    Toshiki Nakamura, Anna Zhu, Keiji Yanai, Seiichi Uchida
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    The character information in natural scene images contains various personal
    information, such as telephone numbers, home addresses, etc. It is a high risk
    of leakage the information if they are published. In this paper, we proposed a
    scene text erasing method to properly hide the information via an inpainting
    convolutional neural network (CNN) model. The input is a scene text image, and
    the output is expected to be text erased image with all the character regions
    filled up the colors of the surrounding background pixels. This work is
    accomplished by a CNN model through convolution to deconvolution with
    interconnection process. The training samples and the corresponding inpainting
    images are considered as teaching signals for training. To evaluate the text
    erasing performance, the output images are detected by a novel scene text
    detection method. Subsequently, the same measurement on text detection is
    utilized for testing the images in benchmark dataset ICDAR2013. Compared with
    direct text detection way, the scene text erasing process demonstrates a
    drastically decrease on the precision, recall and f-score. That proves the
    effectiveness of proposed method for erasing the text in natural scene images.

    Multimodal Affect Analysis for Product Feedback Assessment

    Amol S Patwardhan, Gerald M Knapp
    Comments: 10 pages, ISERC 2013, IIE Annual Conference. Proceedings. Institute of Industrial Engineers
    Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Consumers often react expressively to products such as food samples, perfume,
    jewelry, sunglasses, and clothing accessories. This research discusses a
    multimodal affect recognition system developed to classify whether a consumer
    likes or dislikes a product tested at a counter or kiosk, by analyzing the
    consumer’s facial expression, body posture, hand gestures, and voice after
    testing the product. A depth-capable camera and microphone system – Kinect for
    Windows – is utilized. An emotion identification engine has been developed to
    analyze the images and voice to determine affective state of the customer. The
    image is segmented using skin color and adaptive threshold. Face, body and
    hands are detected using the Haar cascade classifier. Canny edges are
    identified and the lip, body and hand contours are extracted using spatial
    filtering. Edge count and orientation around the mouth, cheeks, eyes,
    shoulders, fingers and the location of the edges are used as features.
    Classification is done by an emotion template mapping algorithm and training a
    classifier using support vector machines. The real-time performance, accuracy
    and feasibility for multimodal affect recognition in feedback assessment are
    evaluated.

    AirDraw: Leveraging Smart Watch Motion Sensors for Mobile Human Computer Interactions

    Seyed A Sajjadi, Danial Moazen, Ani Nahapetian
    Comments: 6 pages, AirDraw, Leveraging Smart Watch Motion Sensors for Mobile Human Computer Interactions : IEEE, CCNC 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)

    Wearable computing is one of the fastest growing technologies today. Smart
    watches are poised to take over at least of half the wearable devices market in
    the near future. Smart watch screen size, however, is a limiting factor for
    growth, as it restricts practical text input. On the other hand, wearable
    devices have some features, such as consistent user interaction and hands-free,
    heads-up operations, which pave the way for gesture recognition methods of text
    entry. This paper proposes a new text input method for smart watches, which
    utilizes motion sensor data and machine learning approaches to detect letters
    written in the air by a user. This method is less computationally intensive and
    less expensive when compared to computer vision approaches. It is also not
    affected by lighting factors, which limit computer vision solutions. The
    AirDraw system prototype developed to test this approach is presented.
    Additionally, experimental results close to 71% accuracy are presented.

    Finding Bottlenecks: Predicting Student Attrition with Unsupervised Classifier

    Seyed Sajjadi, Bruce Shapiro, Christopher McKinlay, Allen Sarkisyan, Carol Shubin, Efunwande Osoba
    Comments: 7 pages, 10 figures, Finding Bottlenecks: Predicting Student Attrition with Unsupervised Classifiers, IEEE, IntelliSys 2017
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Learning (cs.LG); Applications (stat.AP)

    With pressure to increase graduation rates and reduce time to degree in
    higher education, it is important to identify at-risk students early. Automated
    early warning systems are therefore highly desirable. In this paper, we use
    unsupervised clustering techniques to predict the graduation status of declared
    majors in five departments at California State University Northridge (CSUN),
    based on a minimal number of lower division courses in each major. In addition,
    we use the detected clusters to identify hidden bottleneck courses.

    Metacontrol for Adaptive Imagination-Based Optimization

    Jessica B. Hamrick, Andrew J. Ballard, Razvan Pascanu, Oriol Vinyals, Nicolas Heess, Peter W. Battaglia
    Comments: Published as a conference paper at ICLR 2017
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Many machine learning systems are built to solve the hardest examples of a
    particular task, which often makes them large and expensive to run—especially
    with respect to the easier examples, which might require much less computation.
    For an agent with a limited computational budget, this “one-size-fits-all”
    approach may result in the agent wasting valuable computation on easy examples,
    while not spending enough on hard examples. Rather than learning a single,
    fixed policy for solving all instances of a task, we introduce a metacontroller
    which learns to optimize a sequence of “imagined” internal simulations over
    predictive models of the world in order to construct a more informed, and more
    economical, solution. The metacontroller component is a model-free
    reinforcement learning agent, which decides both how many iterations of the
    optimization procedure to run, as well as which model to consult on each
    iteration. The models (which we call “experts”) can be state transition models,
    action-value functions, or any other mechanism that provides information useful
    for solving the task, and can be learned on-policy or off-policy in parallel
    with the metacontroller. When the metacontroller, controller, and experts were
    trained with “interaction networks” (Battaglia et al., 2016) as expert models,
    our approach was able to solve a challenging decision-making problem under
    complex non-linear dynamics. The metacontroller learned to adapt the amount of
    computation it performed to the difficulty of the task, and learned how to
    choose which experts to consult by factoring in both their reliability and
    individual computational resource costs. This allowed the metacontroller to
    achieve a lower overall cost (task loss plus computational cost) than more
    traditional fixed policy approaches. These results demonstrate that our
    approach is a powerful framework for using…

    TrajectoryNet: An Embedded GPS Trajectory Representation for Point-based Classification Using Recurrent Neural Networks

    Xiang Jiang, Erico N de Souza, Ahmad Pesaranghader, Baifan Hu, Daniel L. Silver, Stan Matwin
    Comments: 16 pages, 6 figures, under review at ECML PKDD 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Understanding and discovering knowledge from GPS (Global Positioning System)
    traces of human activities is an essential topic in mobility-based urban
    computing. We propose TrajectoryNet-a neural network architecture for
    point-based trajectory classification to infer real world human transportation
    modes from GPS traces. To overcome the challenge of capturing the underlying
    latent factors in the low-dimensional and heterogeneous feature space imposed
    by GPS data, we develop a novel representation that embeds the original feature
    space into another space that can be understood as a form of basis expansion.
    We also enrich the feature space via segment-based information and use Maxout
    activations to improve the predictive power of Recurrent Neural Networks
    (RNNs). We achieve over 98% classification accuracy when detecting four types
    of transportation modes, outperforming existing models without additional
    sensory data or location-based prior knowledge.

    Analogical Inference for Multi-Relational Embeddings

    Hanxiao Liu, Yuexin Wu, Yiming Yang
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Large-scale multi-relational embedding refers to the task of learning the
    latent representations for entities and relations in large knowledge graphs. An
    effective and scalable solution for this problem is crucial for the true
    success of knowledge-based inference in a broad range of applications. This
    paper proposes a novel framework for optimizing the latent representations with
    respect to the extit{analogical} properties of the embedded entities and
    relations. By formulating the learning objective in a differentiable fashion,
    our model enjoys both theoretical power and computational scalability, and
    significantly outperformed a large number of representative baseline methods on
    benchmark datasets. Furthermore, the model offers an elegant unification of
    several well-known methods in multi-relational embedding, which can be proven
    to be special instantiations of our framework.


    Information Retrieval

    Continuous Experience-aware Language Model for Recommender Systems using Brownian Motion

    Subhabrata Mukherjee, Stephan Guennemann, Gerhard Weikum
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

    Online review communities are dynamic as users join and leave, adopt new
    vocabulary, and adapt to evolving trends. Recent work has shown that
    recommender systems benefit from explicit consideration of user experience.
    However, prior work assumes a fixed number of discrete experience levels,
    whereas in reality users gain experience and mature continuously over time.
    This paper presents a new model that captures the continuous evolution of user
    experience, and the resulting language model in reviews and other posts. Our
    model is unsupervised and combines principles of Geometric Brownian Motion,
    Brownian Motion, and Latent Dirichlet Allocation to trace a smooth temporal
    progression of user experience and language model respectively. We develop
    practical algorithms for estimating the model parameters from data and for
    inference with our model (e.g., to recommend items). Extensive experiments with
    five real-world datasets show that our model not only fits data better than
    discrete-model baselines, but also outperforms state-of-the-art methods for
    predicting item ratings.

    Credible Review Detection with Limited Information using Consistency Analysis

    Subhabrata Mukherjee, Sourav Dutta, Gerhard Weikum
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

    Online reviews provide viewpoints on the strengths and shortcomings of
    products/services, influencing potential customers’ purchasing decisions.
    However, the proliferation of non-credible reviews — either fake (promoting/
    demoting an item), incompetent (involving irrelevant aspects), or biased —
    entails the problem of identifying credible reviews. Prior works involve
    classifiers harnessing rich information about items/users — which might not be
    readily available in several domains — that provide only limited
    interpretability as to why a review is deemed non-credible. This paper presents
    a novel approach to address the above issues. We utilize latent topic models
    leveraging review texts, item ratings, and timestamps to derive consistency
    features without relying on item/user histories, unavailable for “long-tail”
    items/users. We develop models, for computing review credibility scores to
    provide interpretable evidence for non-credible reviews, that are also
    transferable to other domains — addressing the scarcity of labeled data.
    Experiments on real-world datasets demonstrate improvements over
    state-of-the-art baselines.

    Leveraging Joint Interactions for Credibility Analysis in News Communities with Continuous Conditional Random Field

    Subhabrata Mukherjee, Gerhard Weikum
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

    Media seems to have become more partisan, often providing a biased coverage
    of news catering to the interest of specific groups. It is therefore essential
    to identify credible information content that provides an objective narrative
    of an event. News communities such as digg, reddit, or newstrust offer
    recommendations, reviews, quality ratings, and further insights on journalistic
    works. However, there is a complex interaction between different factors in
    such online communities: fairness and style of reporting, language clarity and
    objectivity, topical perspectives (like political viewpoint), expertise and
    bias of community members, and more. This paper presents a model to
    systematically analyze the different interactions in a news community between
    users, news, and sources. We develop a probabilistic graphical model that
    leverages this joint interaction to identify 1) highly credible news articles,
    2) trustworthy news sources, and 3) expert users who perform the role of
    “citizen journalists” in the community. Our method extends CRF models to
    incorporate real-valued ratings, as some communities have very fine-grained
    scales that cannot be easily discretized without losing information. To the
    best of our knowledge, this paper is the first full-fledged analysis of
    credibility, trust, and expertise in news communities.

    People on Drugs: Credibility of User Statements in Health Communities

    Subhabrata Mukherjee, Gerhard Weikum, Cristian Danescu-Niculescu-Mizil
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

    Online health communities are a valuable source of information for patients
    and physicians. However, such user-generated resources are often plagued by
    inaccuracies and misinformation. In this work we propose a method for
    automatically establishing the credibility of user-generated medical statements
    and the trustworthiness of their authors by exploiting linguistic cues and
    distant supervision from expert sources. To this end we introduce a
    probabilistic graphical model that jointly learns user trustworthiness,
    statement credibility, and language objectivity. We apply this methodology to
    the task of extracting rare or unknown side-effects of medical drugs — this
    being one of the problems where large scale non-expert data has the potential
    to complement expert medical knowledge. We show that our method can reliably
    extract side-effects and filter out false statements, while identifying
    trustworthy users that are likely to contribute valuable medical information.

    Item Recommendation with Evolving User Preferences and Experience

    Subhabrata Mukherjee, Hemank Lamba, Gerhard Weikum
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

    Current recommender systems exploit user and item similarities by
    collaborative filtering. Some advanced methods also consider the temporal
    evolution of item ratings as a global background process. However, all prior
    methods disregard the individual evolution of a user’s experience level and how
    this is expressed in the user’s writing in a review community. In this paper,
    we model the joint evolution of user experience, interest in specific item
    facets, writing style, and rating behavior. This way we can generate individual
    recommendations that take into account the user’s maturity level (e.g.,
    recommending art movies rather than blockbusters for a cinematography expert).
    As only item ratings and review texts are observables, we capture the user’s
    experience and interests in a latent model learned from her reviews, vocabulary
    and writing style. We develop a generative HMM-LDA model to trace user
    evolution, where the Hidden Markov Model (HMM) traces her latent experience
    progressing over time — with solely user reviews and ratings as observables
    over time. The facets of a user’s interest are drawn from a Latent Dirichlet
    Allocation (LDA) model derived from her reviews, as a function of her (again
    latent) experience level. In experiments with five real-world datasets, we show
    that our model improves the rating prediction over state-of-the-art baselines,
    by a substantial margin. We also show, in a use-case study, that our model
    performs well in the assessment of user experience levels.

    Exploring Latent Semantic Factors to Find Useful Product Reviews

    Subhabrata Mukherjee, Kashyap Popat, Gerhard Weikum
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

    Online reviews provided by consumers are a valuable asset for e-Commerce
    platforms, influencing potential consumers in making purchasing decisions.
    However, these reviews are of varying quality, with the useful ones buried deep
    within a heap of non-informative reviews. In this work, we attempt to
    automatically identify review quality in terms of its helpfulness to the end
    consumers. In contrast to previous works in this domain exploiting a variety of
    syntactic and community-level features, we delve deep into the semantics of
    reviews as to what makes them useful, providing interpretable explanation for
    the same. We identify a set of consistency and semantic factors, all from the
    text, ratings, and timestamps of user-generated reviews, making our approach
    generalizable across all communities and domains. We explore review semantics
    in terms of several latent factors like the expertise of its author, his
    judgment about the fine-grained facets of the underlying product, and his
    writing style. These are cast into a Hidden Markov Model — Latent Dirichlet
    Allocation (HMM-LDA) based model to jointly infer: (i) reviewer expertise, (ii)
    item facets, and (iii) review helpfulness. Large-scale experiments on five
    real-world datasets from Amazon show significant improvement over
    state-of-the-art baselines in predicting and ranking useful reviews.


    Computation and Language

    Ontology-Aware Token Embeddings for Prepositional Phrase Attachment

    Pradeep Dasigi, Waleed Ammar, Chris Dyer, Eduard Hovy
    Comments: ACL 2017
    Subjects: Computation and Language (cs.CL)

    Type-level word embeddings use the same set of parameters to represent all
    instances of a word regardless of its context, ignoring the inherent lexical
    ambiguity in language. Instead, we embed semantic concepts (or synsets) as
    defined in WordNet and represent a word token in a particular context by
    estimating a distribution over relevant semantic concepts. We use the new,
    context-sensitive embeddings in a model for predicting prepositional phrase(PP)
    attachments and jointly learn the concept embeddings and model parameters. We
    show that using context-sensitive embeddings improves the accuracy of the PP
    attachment model by 5.4% absolute points, which amounts to a 34.4% relative
    reduction in errors.

    Mnemonic Reader for Machine Comprehension

    Minghao Hu, Yuxing Peng, Xipeng Qiu
    Comments: 9 pages, 4 figures
    Subjects: Computation and Language (cs.CL)

    Recently, several end-to-end neural models have been proposed for machine
    comprehension tasks. Typically, these models use attention mechanisms to
    capture the complicated interaction between the context and the query and then
    point the boundary of answer. To better point the correct answer, we introduce
    the Mnemonic Reader for machine comprehension tasks, which enhance the
    attention reader in two aspects. Firstly, we use a self-alignment attention to
    model the long-distance dependency among context words, and obtain query-aware
    and self-aware contextual representation for each word in the context. Second,
    we use a memory-based query-dependent pointer to predict the answer, which
    integrates both explicit and implicit query information, such as query
    category. Our experimental evaluations show that our model obtains the
    state-of-the-art result on the large-scale machine comprehension benchmarks
    SQuAD.

    Density Estimation for Geolocation via Convolutional Mixture Density Network

    Hayate Iso, Shoko Wakamiya, Eiji Aramaki
    Comments: 8 pages
    Subjects: Computation and Language (cs.CL)

    Nowadays, geographic information related to Twitter is crucially important
    for fine-grained applications. However, the amount of geographic information
    avail- able on Twitter is low, which makes the pursuit of many applications
    challenging. Under such circumstances, estimating the location of a tweet is an
    important goal of the study. Unlike most previous studies that estimate the
    pre-defined district as the classification task, this study employs a
    probability distribution to represent richer information of the tweet, not only
    the location but also its ambiguity. To realize this modeling, we propose the
    convolutional mixture density network (CMDN), which uses text data to estimate
    the mixture model parameters. Experimentally obtained results reveal that CMDN
    achieved the highest prediction performance among the method for predicting the
    exact coordinates. It also provides a quantitative representation of the
    location ambiguity for each tweet that properly works for extracting the
    reliable location estimations.

    Combating Human Trafficking with Deep Multimodal Models

    Edmund Tong, Amir Zadeh, Cara Jones, Louis-Philippe Morency
    Comments: ACL 2017 Long Paper
    Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)

    Human trafficking is a global epidemic affecting millions of people across
    the planet. Sex trafficking, the dominant form of human trafficking, has seen a
    significant rise mostly due to the abundance of escort websites, where human
    traffickers can openly advertise among at-will escort advertisements. In this
    paper, we take a major step in the automatic detection of advertisements
    suspected to pertain to human trafficking. We present a novel dataset called
    Trafficking-10k, with more than 10,000 advertisements annotated for this task.
    The dataset contains two sources of information per advertisement: text and
    images. For the accurate detection of trafficking advertisements, we designed
    and trained a deep multimodal model called the Human Trafficking Deep Network
    (HTDN).

    Generating Memorable Mnemonic Encodings of Numbers

    Vincent Fiorentini, Megan Shao, Julie Medero
    Subjects: Computation and Language (cs.CL)

    The major system is a mnemonic system that can be used to memorize sequences
    of numbers. In this work, we present a method to automatically generate
    sentences that encode a given number. We propose several encoding models and
    compare the most promising ones in a password memorability study. The results
    of the study show that a model combining part-of-speech sentence templates with
    an (n)-gram language model produces the most memorable password
    representations.

    Learning Distributed Representations of Texts and Entities from Knowledge Base

    Ikuya Yamada, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji
    Comments: Accepted at TACL
    Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    We describe a neural network model that jointly learns distributed
    representations of texts and knowledge base (KB) entities. Given a text in the
    KB, we train our proposed model to predict entities that are relevant to the
    text. Our model is designed to be generic with the ability to address various
    NLP tasks with ease. We train the model using a large corpus of texts and their
    entity annotations extracted from Wikipedia. We evaluated the model on three
    important NLP tasks (i.e., sentence textual similarity, entity linking, and
    factoid question answering) involving both unsupervised and supervised
    settings. As a result, we achieved state-of-the-art results on all three of
    these tasks.

    A Generative Model of a Pronunciation Lexicon for Hindi

    Pramod Pandey, Somnath Roy
    Subjects: Computation and Language (cs.CL)

    Voice browser applications in Text-to- Speech (TTS) and Automatic Speech
    Recognition (ASR) systems crucially depend on a pronunciation lexicon. The
    present paper describes the model of pronunciation lexicon of Hindi developed
    to automatically generate the output forms of Hindi at two levels, the
    <phoneme> and the <PS> (PS, in short for Prosodic Structure). The latter level
    involves both syllable-division and stress placement. The paper describes the
    tool developed for generating the two-level outputs of lexica in Hindi.

    Max-Pooling Loss Training of Long Short-Term Memory Networks for Small-Footprint Keyword Spotting

    Ming Sun, Anirudh Raju, George Tucker, Sankaran Panchapagesan, Gengshen Fu, Arindam Mandal, Spyros Matsoukas, Nikko Strom, Shiv Vitaladevuni
    Journal-ref: Spoken Language Technology Workshop (SLT), 2016 IEEE (pp.
    474-480). IEEE
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    We propose a max-pooling based loss function for training Long Short-Term
    Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low
    CPU, memory, and latency requirements. The max-pooling loss training can be
    further guided by initializing with a cross-entropy loss trained network. A
    posterior smoothing based evaluation approach is employed to measure keyword
    spotting performance. Our experimental results show that LSTM models trained
    using cross-entropy loss or max-pooling loss outperform a cross-entropy loss
    trained baseline feed-forward Deep Neural Network (DNN). In addition,
    max-pooling loss trained LSTM with randomly initialized network performs better
    compared to cross-entropy loss trained LSTM. Finally, the max-pooling loss
    trained LSTM initialized with a cross-entropy pre-trained network shows the
    best performance, which yields (67.6\%) relative reduction compared to baseline
    feed-forward DNN in Area Under the Curve (AUC) measure.

    On Using Active Learning and Self-Training when Mining Performance Discussions on Stack Overflow

    Markus Borg, Iben Lennerstad, Rasmus Ros, Elizabeth Bjarnason
    Comments: Preprint of paper accepted for the Proc. of the 21st International Conference on Evaluation and Assessment in Software Engineering, 2017
    Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Learning (cs.LG); Software Engineering (cs.SE)

    Abundant data is the key to successful machine learning. However, supervised
    learning requires annotated data that are often hard to obtain. In a
    classification task with limited resources, Active Learning (AL) promises to
    guide annotators to examples that bring the most value for a classifier. AL can
    be successfully combined with self-training, i.e., extending a training set
    with the unlabelled examples for which a classifier is the most certain. We
    report our experiences on using AL in a systematic manner to train an SVM
    classifier for Stack Overflow posts discussing performance of software
    components. We show that the training examples deemed as the most valuable to
    the classifier are also the most difficult for humans to annotate. Despite
    carefully evolved annotation criteria, we report low inter-rater agreement, but
    we also propose mitigation strategies. Finally, based on one annotator’s work,
    we show that self-training can improve the classification accuracy. We conclude
    the paper by discussing implication for future text miners aspiring to use AL
    and self-training.

    Learning Representations of Emotional Speech with Deep Convolutional Generative Adversarial Networks

    Jonathan Chang, Stefan Scherer
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    Automatically assessing emotional valence in human speech has historically
    been a difficult task for machine learning algorithms. The subtle changes in
    the voice of the speaker that are indicative of positive or negative emotional
    states are often “overshadowed” by voice characteristics relating to emotional
    intensity or emotional activation. In this work we explore a representation
    learning approach that automatically derives discriminative representations of
    emotional speech. In particular, we investigate two machine learning strategies
    to improve classifier performance: (1) utilization of unlabeled data using a
    deep convolutional generative adversarial network (DCGAN), and (2) multitask
    learning. Within our extensive experiments we leverage a multitask annotated
    emotional corpus as well as a large unlabeled meeting corpus (around 100
    hours). Our speaker-independent classification experiments show that in
    particular the use of unlabeled data in our investigations improves performance
    of the classifiers and both fully supervised baseline approaches are
    outperformed considerably. We improve the classification of emotional valence
    on a discrete 5-point scale to 43.88% and on a 3-point scale to 49.80%, which
    is competitive to state-of-the-art performance.

    Supervised Learning of Universal Sentence Representations from Natural Language Inference Data

    Alexis Conneau, Douwe Kiela, Holger Schwenk, Loic Barrault, Antoine Bordes
    Comments: 11 pages, 8 figures
    Subjects: Computation and Language (cs.CL)

    Many modern NLP systems rely on word embeddings, previously trained in an
    unsupervised manner on large corpora, as base features. Efforts to obtain
    embeddings for larger chunks of text, such as sentences, have however not been
    so successful. Several attempts at learning unsupervised representations of
    sentences have not reached satisfactory enough performance to be widely
    adopted. In this paper, we show how universal sentence representations trained
    using the supervised data of the Stanford Natural Language Inference dataset
    can consistently outperform unsupervised methods like SkipThought vectors on a
    wide range of transfer tasks. Much like how computer vision uses ImageNet to
    obtain features, which can then be transferred to other tasks, our work tends
    to indicate the suitability of natural language inference for transfer learning
    to other NLP tasks.

    Continuous Experience-aware Language Model for Recommender Systems using Brownian Motion

    Subhabrata Mukherjee, Stephan Guennemann, Gerhard Weikum
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

    Online review communities are dynamic as users join and leave, adopt new
    vocabulary, and adapt to evolving trends. Recent work has shown that
    recommender systems benefit from explicit consideration of user experience.
    However, prior work assumes a fixed number of discrete experience levels,
    whereas in reality users gain experience and mature continuously over time.
    This paper presents a new model that captures the continuous evolution of user
    experience, and the resulting language model in reviews and other posts. Our
    model is unsupervised and combines principles of Geometric Brownian Motion,
    Brownian Motion, and Latent Dirichlet Allocation to trace a smooth temporal
    progression of user experience and language model respectively. We develop
    practical algorithms for estimating the model parameters from data and for
    inference with our model (e.g., to recommend items). Extensive experiments with
    five real-world datasets show that our model not only fits data better than
    discrete-model baselines, but also outperforms state-of-the-art methods for
    predicting item ratings.

    Credible Review Detection with Limited Information using Consistency Analysis

    Subhabrata Mukherjee, Sourav Dutta, Gerhard Weikum
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

    Online reviews provide viewpoints on the strengths and shortcomings of
    products/services, influencing potential customers’ purchasing decisions.
    However, the proliferation of non-credible reviews — either fake (promoting/
    demoting an item), incompetent (involving irrelevant aspects), or biased —
    entails the problem of identifying credible reviews. Prior works involve
    classifiers harnessing rich information about items/users — which might not be
    readily available in several domains — that provide only limited
    interpretability as to why a review is deemed non-credible. This paper presents
    a novel approach to address the above issues. We utilize latent topic models
    leveraging review texts, item ratings, and timestamps to derive consistency
    features without relying on item/user histories, unavailable for “long-tail”
    items/users. We develop models, for computing review credibility scores to
    provide interpretable evidence for non-credible reviews, that are also
    transferable to other domains — addressing the scarcity of labeled data.
    Experiments on real-world datasets demonstrate improvements over
    state-of-the-art baselines.

    Leveraging Joint Interactions for Credibility Analysis in News Communities with Continuous Conditional Random Field

    Subhabrata Mukherjee, Gerhard Weikum
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

    Media seems to have become more partisan, often providing a biased coverage
    of news catering to the interest of specific groups. It is therefore essential
    to identify credible information content that provides an objective narrative
    of an event. News communities such as digg, reddit, or newstrust offer
    recommendations, reviews, quality ratings, and further insights on journalistic
    works. However, there is a complex interaction between different factors in
    such online communities: fairness and style of reporting, language clarity and
    objectivity, topical perspectives (like political viewpoint), expertise and
    bias of community members, and more. This paper presents a model to
    systematically analyze the different interactions in a news community between
    users, news, and sources. We develop a probabilistic graphical model that
    leverages this joint interaction to identify 1) highly credible news articles,
    2) trustworthy news sources, and 3) expert users who perform the role of
    “citizen journalists” in the community. Our method extends CRF models to
    incorporate real-valued ratings, as some communities have very fine-grained
    scales that cannot be easily discretized without losing information. To the
    best of our knowledge, this paper is the first full-fledged analysis of
    credibility, trust, and expertise in news communities.

    People on Drugs: Credibility of User Statements in Health Communities

    Subhabrata Mukherjee, Gerhard Weikum, Cristian Danescu-Niculescu-Mizil
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

    Online health communities are a valuable source of information for patients
    and physicians. However, such user-generated resources are often plagued by
    inaccuracies and misinformation. In this work we propose a method for
    automatically establishing the credibility of user-generated medical statements
    and the trustworthiness of their authors by exploiting linguistic cues and
    distant supervision from expert sources. To this end we introduce a
    probabilistic graphical model that jointly learns user trustworthiness,
    statement credibility, and language objectivity. We apply this methodology to
    the task of extracting rare or unknown side-effects of medical drugs — this
    being one of the problems where large scale non-expert data has the potential
    to complement expert medical knowledge. We show that our method can reliably
    extract side-effects and filter out false statements, while identifying
    trustworthy users that are likely to contribute valuable medical information.

    Item Recommendation with Evolving User Preferences and Experience

    Subhabrata Mukherjee, Hemank Lamba, Gerhard Weikum
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

    Current recommender systems exploit user and item similarities by
    collaborative filtering. Some advanced methods also consider the temporal
    evolution of item ratings as a global background process. However, all prior
    methods disregard the individual evolution of a user’s experience level and how
    this is expressed in the user’s writing in a review community. In this paper,
    we model the joint evolution of user experience, interest in specific item
    facets, writing style, and rating behavior. This way we can generate individual
    recommendations that take into account the user’s maturity level (e.g.,
    recommending art movies rather than blockbusters for a cinematography expert).
    As only item ratings and review texts are observables, we capture the user’s
    experience and interests in a latent model learned from her reviews, vocabulary
    and writing style. We develop a generative HMM-LDA model to trace user
    evolution, where the Hidden Markov Model (HMM) traces her latent experience
    progressing over time — with solely user reviews and ratings as observables
    over time. The facets of a user’s interest are drawn from a Latent Dirichlet
    Allocation (LDA) model derived from her reviews, as a function of her (again
    latent) experience level. In experiments with five real-world datasets, we show
    that our model improves the rating prediction over state-of-the-art baselines,
    by a substantial margin. We also show, in a use-case study, that our model
    performs well in the assessment of user experience levels.

    Exploring Latent Semantic Factors to Find Useful Product Reviews

    Subhabrata Mukherjee, Kashyap Popat, Gerhard Weikum
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

    Online reviews provided by consumers are a valuable asset for e-Commerce
    platforms, influencing potential consumers in making purchasing decisions.
    However, these reviews are of varying quality, with the useful ones buried deep
    within a heap of non-informative reviews. In this work, we attempt to
    automatically identify review quality in terms of its helpfulness to the end
    consumers. In contrast to previous works in this domain exploiting a variety of
    syntactic and community-level features, we delve deep into the semantics of
    reviews as to what makes them useful, providing interpretable explanation for
    the same. We identify a set of consistency and semantic factors, all from the
    text, ratings, and timestamps of user-generated reviews, making our approach
    generalizable across all communities and domains. We explore review semantics
    in terms of several latent factors like the expertise of its author, his
    judgment about the fine-grained facets of the underlying product, and his
    writing style. These are cast into a Hidden Markov Model — Latent Dirichlet
    Allocation (HMM-LDA) based model to jointly infer: (i) reviewer expertise, (ii)
    item facets, and (iii) review helpfulness. Large-scale experiments on five
    real-world datasets from Amazon show significant improvement over
    state-of-the-art baselines in predicting and ranking useful reviews.

    Analogical Inference for Multi-Relational Embeddings

    Hanxiao Liu, Yuexin Wu, Yiming Yang
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Large-scale multi-relational embedding refers to the task of learning the
    latent representations for entities and relations in large knowledge graphs. An
    effective and scalable solution for this problem is crucial for the true
    success of knowledge-based inference in a broad range of applications. This
    paper proposes a novel framework for optimizing the latent representations with
    respect to the extit{analogical} properties of the embedded entities and
    relations. By formulating the learning objective in a differentiable fashion,
    our model enjoys both theoretical power and computational scalability, and
    significantly outperformed a large number of representative baseline methods on
    benchmark datasets. Furthermore, the model offers an elegant unification of
    several well-known methods in multi-relational embedding, which can be proven
    to be special instantiations of our framework.


    Distributed, Parallel, and Cluster Computing

    TaskUniVerse: A Task-Based Unified Interface for Versatile Parallel Execution

    Afshin Zafari
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Task based parallel programming has shown competitive outcomes in many
    aspects of parallel programming such as efficiency, performance, productivity
    and scalability. Different approaches are used by different software
    development frameworks to provide these outcomes to the programmer, while
    making the underlying hardware architecture transparent to her. However, since
    programs are not portable between these frameworks, using one framework or the
    other is still a vital decision by the programmer whose concerns are
    expandability, adaptivity, maintainability and interoperability of the
    programs. In this work, we propose a unified programming interface that a
    programmer can use for working with different task based parallel frameworks
    transparently. In this approach we abstract the common concepts of task based
    parallel programming and provide them to the programmer in a single programming
    interface uniformly for all frameworks. We have tested the interface by running
    programs which implement matrix operations within frameworks that are optimized
    for shared and distributed memory architectures and accelerators, while the
    cooperation between frameworks is configured externally with no need to modify
    the programs. Further possible extensions of the interface and future potential
    research are also described.

    Lower Bounds for Asymptotic Consensus in Dynamic Networks

    Matthias Függer, Thomas Nowak, Manfred Schwarz
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    In this work we study the performance of asymptotic and approximate consensus
    algorithms in dynamic networks. The asymptotic consensus problem requires a set
    of agents to repeatedly set their outputs such that the outputs converge to a
    common value within the convex hull of initial values. This problem, and the
    related approximate consensus problem, are fundamental building blocks in
    distributed systems where exact consensus among agents is not required, e.g.,
    man-made distributed control systems, and have applications in the analysis of
    natural distributed systems, such as flocking and opinion dynamics. We prove
    new nontrivial lower bounds on the contraction rates of asymptotic consensus
    algorithms, from which we deduce lower bounds on the time complexity of
    approximate consensus algorithms. In particular, the obtained bounds show
    optimality of asymptotic and approximate consensus algorithms presented in
    [Charron-Bost et al., ICALP’16] for certain classes of networks that include
    classical failure assumptions, and confine the search for optimal bounds in the
    general case.

    Central to our lower bound proofs is an extended notion of valency, the set
    of reachable limits of an asymptotic consensus algorithm starting from a given
    configuration. We further relate topological properties of valencies to the
    solvability of exact consensus, shedding some light on the relation of these
    three fundamental problems in dynamic networks.

    Proving Correctness of Concurrent Objects by Validating Linearization Points

    Sathya Peri, Muktikanta Sa, Nandini Singhal
    Comments: submitted to DISC 2017 for review
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    In the recent years, several concurrent data-structures/objects have been
    proposed. These data-structures allow multiple threads/process to operate on
    them concurrently while maintaining consistency. By allowing multiple threads
    to operate on them simultaneously, these structures strive to increase
    parallelism. These structures typically involve the operating threads applying
    different fine-grained synchronization mechanisms. While these concurrent
    structures with fine-grained synchronization mechanisms certainly improve
    parallelism, proving their correctness has turned out to be a challenging task.
    The well accepted criterion for proving correctness of these concurrent objects
    is Linearizability. Showing that these concurrent structures are linearizable
    is non-trivial in many cases. The standard technique to show correctness
    adopted by several researchers and practitioners in this case is to use
    Linearization Points (LPs) – an atomic event within each method between the
    invocation and response events where the effect of the entire method seems to
    have taken place. In many of these cases, the LPs intuitively prove the
    correctness of the object.

    Without a formal proof, it is not clear if these LPs and hence the concurrent
    structure are indeed correct. In this paper, we present a hand-written
    technique for verifying the correctness of Linearization Points of a class of
    commonly used concurrent data-structures. Although the technique is
    hand-written, we believe that it is applicable to wide variety of concurrent
    data-structures. We demonstrate the efficacy of this technique by showing the
    correctness of two commonly used variants of set data-structure based on
    linked-lists: Hand-over-Hand locking and Lazy List. Further, we hope to extend
    this technique for automatic verification in future.

    Flat Parallelization

    Vitaly Aksenov, Petr Kuznetsov
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    There are two intertwined factors that affect performance of concurrent data
    structures: the ability of processes to access the data in parallel and the
    cost of synchronization. It has been observed that for a large class of
    “concurrency-unfriendly” data structures, fine-grained parallelization does not
    pay off: an implementation based on a single global lock outperforms
    fine-grained solutions. The flat combining paradigm exploits this by ensuring
    that a thread holding the global lock sequentially combines requests and then
    executes the combined requests on behalf of concurrent threads.

    In this paper, we propose a synchronization technique that unites flat
    combining and parallel bulk updates borrowed from parallel algorithms designed
    for the PRAM model. The idea is that the combiner thread assigns waiting
    threads to perform concurrent requests in parallel.

    We foresee the technique to help in implementing efficient
    “concurrency-ambivalent” data structures, which can benefit from both
    parallelism and serialization, depending on the operational context. To
    validate the idea, we considered heap-based implementations of a priority
    queue. These data structures exhibit two important features: concurrent remove
    operations are likely to conflict and thus may benefit from combining, while
    concurrent insert operations can often be at least partly applied in parallel
    thus may benefit from parallel batching. We show that the resulting flat
    parallelization algorithm performs well compared to state-of-the-art priority
    queue implementations.

    Towards Reduced Instruction Sets for Synchronization

    Rati Gelashvili, Idit Keidar, Alexander Spiegelman, Roger Wattenhofer
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Contrary to common belief, a recent work by Ellen, Gelashvili, Shavit, and
    Zhu has shown that computability does not require multicore architectures to
    support “strong” synchronization instructions like compare-and-swap, as opposed
    to combinations of “weaker” instructions like decrement and multiply. However,
    this is the status quo, and in turn, most efficient concurrent data-structures
    heavily rely on compare-and-swap (e.g. for swinging pointers and in general,
    conflict resolution).

    We show that this need not be the case, by designing and implementing a
    concurrent linearizable Log data-structure (also known as a History object),
    supporting two operations: append(item), which appends the item to the log, and
    get-log(), which returns the appended items so far, in order. Readers are
    wait-free and writers are lock-free, and this data-structure can be used in a
    lock-free universal construction to implement any concurrent object with a
    given sequential specification. Our implementation uses atomic read, xor,
    decrement, and fetch-and-increment instructions supported on X86 architectures,
    and provides similar performance to a compare-and-swap-based solution on
    today’s hardware. This raises a fundamental question about minimal set of
    synchronization instructions that the architectures have to support.

    Lightweight Robust Framework for Workload Scheduling in Clouds

    Muhammed Abdulazeez, Pawel Garncarek, Dariusz R. Kowalski, Prudence W.H. Wong
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Reliability, security and stability of cloud services without sacrificing too
    much resources have become a desired feature in the area of workload management
    in clouds. The paper proposes and evaluates a lightweight framework for
    scheduling a workload which part could be unreliable. This unreliability could
    be caused by various types of failures or attacks. Our framework for robust
    workload scheduling efficiently combines classic fault-tolerant and security
    tools, such as packet/job scanning, with workload scheduling, and it does not
    use any heavy resource-consuming tools, e.g., cryptography or non-linear
    optimization. More specifically, the framework uses a novel objective function
    to allocate jobs to servers and constantly decides which job to scan based on a
    formula associated with the objective function. We show how to set up the
    objective function and the corresponding scanning procedure to make the system
    provably stable, provided it satisfies a specific stability condition. As a
    result, we show that our framework assures cloud stability even if naive
    scanning-all and scanning-none strategies are not stable. We extend the
    framework to decentralized scheduling and evaluate it under several popular
    routing procedures.

    Epistemic Model Checking of Atomic Commitment Protocols with Byzantine Failures

    Omar Al-Bataineh
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Logic in Computer Science (cs.LO)

    The notion of knowledge-based program introduced by Halpern and Fagin
    provides a useful formalism for designing, analysing, and optimising
    distributed systems. This paper formulates the two phase commit protocol as a
    knowledge-based program and then an iterative process of model checking and
    counter-example guided refinement is followed to find concrete implementations
    of the program for the case of perfect recall semantic in the Byzantine
    failures context with synchronous reliable communication. We model several
    different kinds of Byzantine failures and verify different strategies to fight
    and mitigate them. We address a number of questions that have not been
    considered in the prior literature, viz., under what circumstances a sender can
    know that its transmission has been successful, and under what circumstances an
    agent can know that the coordinator is cheating, and find concrete answers to
    these questions. The paper describes also a methodology based on
    temporal-epistemic model checking technology that can be followed to verify the
    shortest and longest execution time of a distributed protocol and the scenarios
    that lead to them.

    Teaching Concurrent Software Design: A Case Study Using Android

    Konstantin Läufer, George K. Thiruvathukal
    Comments: Submitted to CDER NSF/IEEE-TCPP Curriculum Initiative on Parallel and Distributed Computing – Core Topics for Undergraduates
    Subjects: Software Engineering (cs.SE); Distributed, Parallel, and Cluster Computing (cs.DC)

    In this article, we explore various parallel and distributed computing topics
    from a user-centric software engineering perspective. Specifically, in the
    context of mobile application development, we study the basic building blocks
    of interactive applications in the form of events, timers, and asynchronous
    activities, along with related software modeling, architecture, and design
    topics.

    Pricing of Tiered cloud storage via two-stage, latency-aware bidding

    Yang Zhang, Arnob Ghosh, Vaneet Aggarwal, Tian Lan, Yu Xiang, Yih-Farn Robin Chen
    Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC); Computer Science and Game Theory (cs.GT)

    In this paper, we address a major challenge confronting the Cloud Service
    Providers (CSPs) utilizing a tiered storage architecture – how to maximize
    their overall profit over a variety of storage tiers that offer distinct
    characteristics, as well as file placement and access request scheduling
    policies. To this end, we propose a scheme where the CSP offers a two-stage
    auction process for (a) requesting storage capacity, and (b) requesting
    accesses with latency requirements. Our two-stage bidding scheme provides a
    hybrid storage and access optimization framework with the objective of
    maximizing the CSP’s total net profit over four dimensions: file acceptance
    decision, placement of accepted files, file access decision and access request
    scheduling policy. The proposed optimization is a mixed-integer nonlinear
    program that is hard to solve. We propose an efficient heuristic to relax the
    integer optimization and to solve the resulting nonlinear stochastic programs.
    The algorithm is evaluated under different scenarios and with different storage
    system parameters, and insightful numerical results are reported by comparing
    the proposed approach with other profit-maximization models. We see a profit
    increase of over 60\% of our proposed method compared to other schemes in
    certain simulation scenarios.

    Emptiness Problems for Distributed Automata

    Antti Kuusisto, Fabian Reiter
    Comments: 13 pages, 2 figures
    Subjects: Formal Languages and Automata Theory (cs.FL); Distributed, Parallel, and Cluster Computing (cs.DC)

    We investigate the decidability of the emptiness problem for three classes of
    distributed automata. These devices operate on finite directed graphs, acting
    as networks of identical finite-state machines that communicate in an infinite
    sequence of synchronous rounds. The problem is shown to be decidable in
    LogSpace for a class of forgetful automata, where the nodes see the messages
    received from their neighbors but cannot remember their own state. When
    restricted to the appropriate families of graphs, these forgetful automata are
    equivalent to classical finite word automata, but strictly more expressive than
    finite tree automata. On the other hand, we also show that the emptiness
    problem is undecidable in general. This already holds for two heavily
    restricted classes of distributed automata: those that reject immediately if
    they receive more than one message per round, and those whose state diagram
    must be acyclic except for self-loops.


    Learning

    Cross-label Suppression: A Discriminative and Fast Dictionary Learning with Group Regularization

    Xiudong Wang, Yuantao Gu
    Comments: 36 pages, 12 figures, 11 tables
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    This paper addresses image classification through learning a compact and
    discriminative dictionary efficiently. Given a structured dictionary with each
    atom (columns in the dictionary matrix) related to some label, we propose
    cross-label suppression constraint to enlarge the difference among
    representations for different classes. Meanwhile, we introduce group
    regularization to enforce representations to preserve label properties of
    original samples, meaning the representations for the same class are encouraged
    to be similar. Upon the cross-label suppression, we don’t resort to
    frequently-used (ell_0)-norm or (ell_1)-norm for coding, and obtain
    computational efficiency without losing the discriminative power for
    categorization. Moreover, two simple classification schemes are also developed
    to take full advantage of the learnt dictionary. Extensive experiments on six
    data sets including face recognition, object categorization, scene
    classification, texture recognition and sport action categorization are
    conducted, and the results show that the proposed approach can outperform lots
    of recently presented dictionary algorithms on both recognition accuracy and
    computational efficiency.

    Multiple Imputation Using Deep Denoising Autoencoders

    Lovedeep Gondara, Ke Wang
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Missing data is a well-recognized problem impacting all domains. The current
    state-of-the-art framework to minimize missing data bias is multiple
    imputation, for which the choice of an imputation model remains nontrivial. We
    propose an imputation model based on deep denoising autoencoders for multiple
    imputation. Capable of handling different data types, missingness patterns,
    missingness proportions and distributions. Evaluation on real life datasets
    shows our proposed model outperforms the state-of-the-art methods under varying
    conditions and improves the end of the line analytics.

    Spatiotemporal Recurrent Convolutional Networks for Traffic Prediction in Transportation Networks

    Haiyang Yu, Zhihai Wu, Shuqin Wang, Yunpeng Wang, Xiaolei Ma
    Subjects: Learning (cs.LG)

    Predicting large-scale transportation network traffic has become an important
    and challenging topic in recent decades. Inspired by the domain knowledge of
    motion prediction, in which the future motion of an object can be predicted
    based on previous scenes, we propose a network grid representation method that
    can retain the fine-scale structure of a transportation network. Network-wide
    traffic speeds are converted into a series of static images and input into a
    novel deep architecture, namely, spatiotemporal recurrent convolutional
    networks (SRCNs), for traffic forecasting. The proposed SRCNs inherit the
    advantages of deep convolutional neural networks (DCNNs) and long short-term
    memory (LSTM) neural networks. The spatial dependencies of network-wide traffic
    can be captured by DCNNs, and the temporal dynamics can be learned by LSTMs. An
    experiment on a Beijing transportation network with 278 links demonstrates that
    SRCNs outperform other deep learning-based algorithms in both short-term and
    long-term traffic prediction.

    Metacontrol for Adaptive Imagination-Based Optimization

    Jessica B. Hamrick, Andrew J. Ballard, Razvan Pascanu, Oriol Vinyals, Nicolas Heess, Peter W. Battaglia
    Comments: Published as a conference paper at ICLR 2017
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Many machine learning systems are built to solve the hardest examples of a
    particular task, which often makes them large and expensive to run—especially
    with respect to the easier examples, which might require much less computation.
    For an agent with a limited computational budget, this “one-size-fits-all”
    approach may result in the agent wasting valuable computation on easy examples,
    while not spending enough on hard examples. Rather than learning a single,
    fixed policy for solving all instances of a task, we introduce a metacontroller
    which learns to optimize a sequence of “imagined” internal simulations over
    predictive models of the world in order to construct a more informed, and more
    economical, solution. The metacontroller component is a model-free
    reinforcement learning agent, which decides both how many iterations of the
    optimization procedure to run, as well as which model to consult on each
    iteration. The models (which we call “experts”) can be state transition models,
    action-value functions, or any other mechanism that provides information useful
    for solving the task, and can be learned on-policy or off-policy in parallel
    with the metacontroller. When the metacontroller, controller, and experts were
    trained with “interaction networks” (Battaglia et al., 2016) as expert models,
    our approach was able to solve a challenging decision-making problem under
    complex non-linear dynamics. The metacontroller learned to adapt the amount of
    computation it performed to the difficulty of the task, and learned how to
    choose which experts to consult by factoring in both their reliability and
    individual computational resource costs. This allowed the metacontroller to
    achieve a lower overall cost (task loss plus computational cost) than more
    traditional fixed policy approaches. These results demonstrate that our
    approach is a powerful framework for using…

    DropIn: Making Reservoir Computing Neural Networks Robust to Missing Inputs by Dropout

    Davide Bacciu, Francesco Crecchi, Davide Morelli
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    The paper presents a novel, principled approach to train recurrent neural
    networks from the Reservoir Computing family that are robust to missing part of
    the input features at prediction time. By building on the ensembling properties
    of Dropout regularization, we propose a methodology, named DropIn, which
    efficiently trains a neural model as a committee machine of subnetworks, each
    capable of predicting with a subset of the original input features. We discuss
    the application of the DropIn methodology in the context of Reservoir Computing
    models and targeting applications characterized by input sources that are
    unreliable or prone to be disconnected, such as in pervasive wireless sensor
    networks and ambient intelligence. We provide an experimental assessment using
    real-world data from such application domains, showing how the Dropin
    methodology allows to maintain predictive performances comparable to those of a
    model without missing features, even when 20\%-50\% of the inputs are not
    available.

    A Design Methodology for Efficient Implementation of Deconvolutional Neural Networks on an FPGA

    Xinyu Zhang, Srinjoy Das, Ojash Neopane, Ken Kreutz-Delgado
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    In recent years deep learning algorithms have shown extremely high
    performance on machine learning tasks such as image classification and speech
    recognition. In support of such applications, various FPGA accelerator
    architectures have been proposed for convolutional neural networks (CNNs) that
    enable high performance for classification tasks at lower power than CPU and
    GPU processors. However, to date, there has been little research on the use of
    FPGA implementations of deconvolutional neural networks (DCNNs). DCNNs, also
    known as generative CNNs, encode high-dimensional probability distributions and
    have been widely used for computer vision applications such as scene
    completion, scene segmentation, image creation, image denoising, and
    super-resolution imaging. We propose an FPGA architecture for deconvolutional
    networks built around an accelerator which effectively handles the complex
    memory access patterns needed to perform strided deconvolutions, and that
    supports convolution as well. We also develop a three-step design optimization
    method that systematically exploits statistical analysis, design space
    exploration and VLSI optimization. To verify our FPGA deconvolutional
    accelerator design methodology we train DCNNs offline on two representative
    datasets using the generative adversarial network method (GAN) run on
    Tensorflow, and then map these DCNNs to an FPGA DCNN-plus-accelerator
    implementation to perform generative inference on a Xilinx Zynq-7000 FPGA. Our
    DCNN implementation achieves a peak performance density of 0.012 GOPs/DSP.

    Learning Discriminative Relational Features for Sequence Labeling

    Naveen Nair, Ajay Nagesh, Ganesh Ramakrishnan
    Comments: 13 pages, technical report
    Subjects: Learning (cs.LG)

    Discovering relational structure between input features in sequence labeling
    models has shown to improve their accuracy in several problem settings.
    However, the search space of relational features is exponential in the number
    of basic input features. Consequently, approaches that learn relational
    features, tend to follow a greedy search strategy. In this paper, we study the
    possibility of optimally learning and applying discriminative relational
    features for sequence labeling. For learning features derived from inputs at a
    particular sequence position, we propose a Hierarchical Kernels-based approach
    (referred to as Hierarchical Kernel Learning for Structured Output Spaces –
    StructHKL). This approach optimally and efficiently explores the hierarchical
    structure of the feature space for problems with structured output spaces such
    as sequence labeling. Since the StructHKL approach has limitations in learning
    complex relational features derived from inputs at relative positions, we
    propose two solutions to learn relational features namely, (i) enumerating
    simple component features of complex relational features and discovering their
    compositions using StructHKL and (ii) leveraging relational kernels, that
    compute the similarity between instances implicitly, in the sequence labeling
    problem. We perform extensive empirical evaluation on publicly available
    datasets and record our observations on settings in which certain approaches
    are effective.

    Face Super-Resolution Through Wasserstein GANs

    Zhimin Chen, Yuguang Tong
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Generative adversarial networks (GANs) have received a tremendous amount of
    attention in the past few years, and have inspired applications addressing a
    wide range of problems. Despite its great potential, GANs are difficult to
    train. Recently, a series of papers (Arjovsky & Bottou, 2017a; Arjovsky et al.
    2017b; and Gulrajani et al. 2017) proposed using Wasserstein distance as the
    training objective and promised easy, stable GAN training across architectures
    with minimal hyperparameter tuning. In this paper, we compare the performance
    of Wasserstein distance with other training objectives on a variety of GAN
    architectures in the context of single image super-resolution. Our results
    agree that Wasserstein GAN with gradient penalty (WGAN-GP) provides stable and
    converging GAN training and that Wasserstein distance is an effective metric to
    gauge training progress.

    Analogical Inference for Multi-Relational Embeddings

    Hanxiao Liu, Yuexin Wu, Yiming Yang
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Large-scale multi-relational embedding refers to the task of learning the
    latent representations for entities and relations in large knowledge graphs. An
    effective and scalable solution for this problem is crucial for the true
    success of knowledge-based inference in a broad range of applications. This
    paper proposes a novel framework for optimizing the latent representations with
    respect to the extit{analogical} properties of the embedded entities and
    relations. By formulating the learning objective in a differentiable fashion,
    our model enjoys both theoretical power and computational scalability, and
    significantly outperformed a large number of representative baseline methods on
    benchmark datasets. Furthermore, the model offers an elegant unification of
    several well-known methods in multi-relational embedding, which can be proven
    to be special instantiations of our framework.

    A comprehensive study of batch construction strategies for recurrent neural networks in MXNet

    Patrick Doetsch, Pavel Golik, Hermann Ney
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    In this work we compare different batch construction methods for mini-batch
    training of recurrent neural networks. While popular implementations like
    TensorFlow and MXNet suggest a bucketing approach to improve the
    parallelization capabilities of the recurrent training process, we propose a
    simple ordering strategy that arranges the training sequences in a stochastic
    alternatingly sorted way. We compare our method to sequence bucketing as well
    as various other batch construction strategies on the CHiME-4 noisy speech
    recognition corpus. The experiments show that our alternated sorting approach
    is able to compete both in training time and recognition performance while
    being conceptually simpler to implement.

    Non-negative Matrix Factorization via Archetypal Analysis

    Hamid Javadi, Andrea Montanari
    Comments: 39 pages; 11 pdf figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Given a collection of data points, non-negative matrix factorization (NMF)
    suggests to express them as convex combinations of a small set of `archetypes’
    with non-negative entries. This decomposition is unique only if the true
    archetypes are non-negative and sufficiently sparse (or the weights are
    sufficiently sparse), a regime that is captured by the separability condition
    and its generalizations.

    In this paper, we study an approach to NMF that can be traced back to the
    work of Cutler and Breiman (1994) and does not require the data to be
    separable, while providing a generally unique decomposition. We optimize the
    trade-off between two objectives: we minimize the distance of the data points
    from the convex envelope of the archetypes (which can be interpreted as an
    empirical risk), while minimizing the distance of the archetypes from the
    convex envelope of the data (which can be interpreted as a data-dependent
    regularization). The archetypal analysis method of (Cutler, Breiman, 1994) is
    recovered as the limiting case in which the last term is given infinite weight.

    We introduce a `uniqueness condition’ on the data which is necessary for
    exactly recovering the archetypes from noiseless data. We prove that, under
    uniqueness (plus additional regularity conditions on the geometry of the
    archetypes), our estimator is robust. While our approach requires solving a
    non-convex optimization problem, we find that standard optimization methods
    succeed in finding good solutions both for real and synthetic data.

    Machine Learning with World Knowledge: The Position and Survey

    Yangqiu Song, Dan Roth
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Machine learning has become pervasive in multiple domains, impacting a wide
    variety of applications, such as knowledge discovery and data mining, natural
    language processing, information retrieval, computer vision, social and health
    informatics, ubiquitous computing, etc. Two essential problems of machine
    learning are how to generate features and how to acquire labels for machines to
    learn. Particularly, labeling large amount of data for each domain-specific
    problem can be very time consuming and costly. It has become a key obstacle in
    making learning protocols realistic in applications. In this paper, we will
    discuss how to use the existing general-purpose world knowledge to enhance
    machine learning processes, by enriching the features or reducing the labeling
    work. We start from the comparison of world knowledge with domain-specific
    knowledge, and then introduce three key problems in using world knowledge in
    learning processes, i.e., explicit and implicit feature representation,
    inference for knowledge linking and disambiguation, and learning with direct or
    indirect supervision. Finally we discuss the future directions of this research
    topic.

    Geometric GAN

    Jae Hyun Lim, Jong Chul Ye
    Subjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Generative Adversarial Nets (GANs) represent an important milestone for
    effective generative models, which has inspired numerous variants seemingly
    different from each other. One of the main contributions of this paper is to
    reveal a unified geometric structure in GAN and its variants. Specifically, we
    show that the adversarial generative model training can be decomposed into
    three geometric steps: separating hyperplane search, discriminator parameter
    update away from the separating hyperplane, and the generator update along the
    normal vector direction of the separating hyperplane. This geometric intuition
    reveals the limitations of the existing approaches and leads us to propose a
    new formulation called geometric GAN using SVM separating hyperplane that
    maximizes the margin. Our theoretical analysis shows that the geometric GAN
    converges to a Nash equilibrium between the discriminator and generator. In
    addition, extensive numerical results show that the superior performance of
    geometric GAN.

    Geometry and Dynamics for Markov Chain Monte Carlo

    Alessandro Barp, Francois-Xavier Briol, Anthony D. Kennedy, Mark Girolami
    Comments: Submitted to “Annual Review of Statistics and Its Applications”
    Subjects: Computation (stat.CO); Learning (cs.LG); High Energy Physics – Lattice (hep-lat); Numerical Analysis (math.NA); Machine Learning (stat.ML)

    Markov Chain Monte Carlo methods have revolutionised mathematical computation
    and enabled statistical inference within many previously intractable models. In
    this context, Hamiltonian dynamics have been proposed as an efficient way of
    building chains which can explore probability densities efficiently. The method
    emerges from physics and geometry and these links have been extensively studied
    by a series of authors through the last thirty years. However, there is
    currently a gap between the intuitions and knowledge of users of the
    methodology and our deep understanding of these theoretical foundations. The
    aim of this review is to provide a comprehensive introduction to the geometric
    tools used in Hamiltonian Monte Carlo at a level accessible to statisticians,
    machine learners and other users of the methodology with only a basic
    understanding of Monte Carlo methods. This will be complemented with some
    discussion of the most recent advances in the field which we believe will
    become increasingly relevant to applied scientists.

    Multimodal Affect Analysis for Product Feedback Assessment

    Amol S Patwardhan, Gerald M Knapp
    Comments: 10 pages, ISERC 2013, IIE Annual Conference. Proceedings. Institute of Industrial Engineers
    Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Consumers often react expressively to products such as food samples, perfume,
    jewelry, sunglasses, and clothing accessories. This research discusses a
    multimodal affect recognition system developed to classify whether a consumer
    likes or dislikes a product tested at a counter or kiosk, by analyzing the
    consumer’s facial expression, body posture, hand gestures, and voice after
    testing the product. A depth-capable camera and microphone system – Kinect for
    Windows – is utilized. An emotion identification engine has been developed to
    analyze the images and voice to determine affective state of the customer. The
    image is segmented using skin color and adaptive threshold. Face, body and
    hands are detected using the Haar cascade classifier. Canny edges are
    identified and the lip, body and hand contours are extracted using spatial
    filtering. Edge count and orientation around the mouth, cheeks, eyes,
    shoulders, fingers and the location of the edges are used as features.
    Classification is done by an emotion template mapping algorithm and training a
    classifier using support vector machines. The real-time performance, accuracy
    and feasibility for multimodal affect recognition in feedback assessment are
    evaluated.

    Finding Bottlenecks: Predicting Student Attrition with Unsupervised Classifier

    Seyed Sajjadi, Bruce Shapiro, Christopher McKinlay, Allen Sarkisyan, Carol Shubin, Efunwande Osoba
    Comments: 7 pages, 10 figures, Finding Bottlenecks: Predicting Student Attrition with Unsupervised Classifiers, IEEE, IntelliSys 2017
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Learning (cs.LG); Applications (stat.AP)

    With pressure to increase graduation rates and reduce time to degree in
    higher education, it is important to identify at-risk students early. Automated
    early warning systems are therefore highly desirable. In this paper, we use
    unsupervised clustering techniques to predict the graduation status of declared
    majors in five departments at California State University Northridge (CSUN),
    based on a minimal number of lower division courses in each major. In addition,
    we use the detected clusters to identify hidden bottleneck courses.

    TrajectoryNet: An Embedded GPS Trajectory Representation for Point-based Classification Using Recurrent Neural Networks

    Xiang Jiang, Erico N de Souza, Ahmad Pesaranghader, Baifan Hu, Daniel L. Silver, Stan Matwin
    Comments: 16 pages, 6 figures, under review at ECML PKDD 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Understanding and discovering knowledge from GPS (Global Positioning System)
    traces of human activities is an essential topic in mobility-based urban
    computing. We propose TrajectoryNet-a neural network architecture for
    point-based trajectory classification to infer real world human transportation
    modes from GPS traces. To overcome the challenge of capturing the underlying
    latent factors in the low-dimensional and heterogeneous feature space imposed
    by GPS data, we develop a novel representation that embeds the original feature
    space into another space that can be understood as a form of basis expansion.
    We also enrich the feature space via segment-based information and use Maxout
    activations to improve the predictive power of Recurrent Neural Networks
    (RNNs). We achieve over 98% classification accuracy when detecting four types
    of transportation modes, outperforming existing models without additional
    sensory data or location-based prior knowledge.

    Learning of Gaussian Processes in Distributed and Communication Limited Systems

    Mostafa Tavassolipour, Seyed Abolfazl Motahari, Mohammad-Taghi Manzuri Shalmani
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    It is of fundamental importance to find algorithms obtaining optimal
    performance for learning of statistical models in distributed and communication
    limited systems. Aiming at characterizing the optimal strategies, we consider
    learning of Gaussian Processes (GPs) in distributed systems as a pivotal
    example. We first address a very basic problem: how many bits are required to
    estimate the inner-products of Gaussian vectors across distributed machines?
    Using information theoretic bounds, we obtain an optimal solution for the
    problem which is based on vector quantization. Two suboptimal and more
    practical schemes are also presented as substitute for the vector quantization
    scheme. In particular, it is shown that the performance of one of the practical
    schemes which is called per-symbol quantization is very close to the optimal
    one. Schemes provided for the inner-product calculations are incorporated into
    our proposed distributed learning methods for GPs. Experimental results show
    that with spending few bits per symbol in our communication scheme, our
    proposed methods outperform previous zero rate distributed GP learning schemes
    such as Bayesian Committee Model (BCM) and Product of experts (PoE).

    Performance Limits on the Classification of Kronecker-structured Models

    Ishan Jindal, Matthew Nokleby
    Comments: This is an extended version of a paper that appears in Proc. 2017 IEEE International Symposium on Information Theory, Aachen, Germany, June 2017
    Subjects: Information Theory (cs.IT); Learning (cs.LG); Machine Learning (stat.ML)

    Kronecker-structured (K-S) models recently have been proposed for the
    efficient representation, processing, and classification of multidimensional
    signals such as images and video. Because they are tailored to the
    multi-dimensional structure of the target images, K-S models show improved
    performance in compression and reconstruction over more general (union of)
    subspace models. In this paper, we study the classification performance of
    Kronecker-structured models in two asymptotic regimes. First, we study the
    diversity order, the slope of the error probability as the signal noise power
    goes to zero. We derive an exact expression for the diversity order as a
    function of the signal and subspace dimensions of a K-S model. Next, we study
    the classification capacity, the maximum rate at which the number of classes
    can grow as the signal dimension goes to infinity. We derive upper and lower
    bounds on the prelog factor of the classification capacity. Finally, we
    evaluate the empirical classification performance of K-S models for both the
    synthetic and the real world data, showing that they agree with the diversity
    order analysis.

    Experimental results : Reinforcement Learning of POMDPs using Spectral Methods

    Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar
    Comments: 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    We propose a new reinforcement learning algorithm for partially observable
    Markov decision processes (POMDP) based on spectral decomposition methods.
    While spectral methods have been previously employed for consistent learning of
    (passive) latent variable models such as hidden Markov models, POMDPs are more
    challenging since the learner interacts with the environment and possibly
    changes the future observations in the process. We devise a learning algorithm
    running through epochs, in each epoch we employ spectral techniques to learn
    the POMDP parameters from a trajectory generated by a fixed policy. At the end
    of the epoch, an optimization oracle returns the optimal memoryless planning
    policy which maximizes the expected reward based on the estimated POMDP model.
    We prove an order-optimal regret bound with respect to the optimal memoryless
    policy and efficient scaling with respect to the dimensionality of observation
    and action spaces.

    Nonlinear Information Bottleneck

    Artemy Kolchinsky, Brendan D. Tracey, David H. Wolpert
    Subjects: Information Theory (cs.IT); Learning (cs.LG); Machine Learning (stat.ML)

    Information bottleneck [IB] is a technique for extracting information in some
    ‘input’ random variable that is relevant for predicting some different ‘output’
    random variable. IB works by encoding the input in a compressed ‘bottleneck
    variable’ from which the output can then be accurately decoded. IB can be
    difficult to compute in practice, and has been mainly developed for two limited
    cases: (1) discrete random variables with small state spaces, and (2)
    continuous random variables that are jointly Gaussian distributed (in which
    case the encoding and decoding maps are linear). We propose a method to perform
    IB in more general domains. Our approach can be applied to discrete or
    continuous inputs and outputs, and allows for nonlinear encoding and decoding
    maps. The method uses a novel upper bound on the IB objective, derived using a
    non-parametric estimator of mutual information and a variational approximation.
    We show how to implement the method using neural networks and gradient-based
    optimization, and demonstrate its performance on the MNIST dataset.

    Max-Pooling Loss Training of Long Short-Term Memory Networks for Small-Footprint Keyword Spotting

    Ming Sun, Anirudh Raju, George Tucker, Sankaran Panchapagesan, Gengshen Fu, Arindam Mandal, Spyros Matsoukas, Nikko Strom, Shiv Vitaladevuni
    Journal-ref: Spoken Language Technology Workshop (SLT), 2016 IEEE (pp.
    474-480). IEEE
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    We propose a max-pooling based loss function for training Long Short-Term
    Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low
    CPU, memory, and latency requirements. The max-pooling loss training can be
    further guided by initializing with a cross-entropy loss trained network. A
    posterior smoothing based evaluation approach is employed to measure keyword
    spotting performance. Our experimental results show that LSTM models trained
    using cross-entropy loss or max-pooling loss outperform a cross-entropy loss
    trained baseline feed-forward Deep Neural Network (DNN). In addition,
    max-pooling loss trained LSTM with randomly initialized network performs better
    compared to cross-entropy loss trained LSTM. Finally, the max-pooling loss
    trained LSTM initialized with a cross-entropy pre-trained network shows the
    best performance, which yields (67.6\%) relative reduction compared to baseline
    feed-forward DNN in Area Under the Curve (AUC) measure.

    On Using Active Learning and Self-Training when Mining Performance Discussions on Stack Overflow

    Markus Borg, Iben Lennerstad, Rasmus Ros, Elizabeth Bjarnason
    Comments: Preprint of paper accepted for the Proc. of the 21st International Conference on Evaluation and Assessment in Software Engineering, 2017
    Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Learning (cs.LG); Software Engineering (cs.SE)

    Abundant data is the key to successful machine learning. However, supervised
    learning requires annotated data that are often hard to obtain. In a
    classification task with limited resources, Active Learning (AL) promises to
    guide annotators to examples that bring the most value for a classifier. AL can
    be successfully combined with self-training, i.e., extending a training set
    with the unlabelled examples for which a classifier is the most certain. We
    report our experiences on using AL in a systematic manner to train an SVM
    classifier for Stack Overflow posts discussing performance of software
    components. We show that the training examples deemed as the most valuable to
    the classifier are also the most difficult for humans to annotate. Despite
    carefully evolved annotation criteria, we report low inter-rater agreement, but
    we also propose mitigation strategies. Finally, based on one annotator’s work,
    we show that self-training can improve the classification accuracy. We conclude
    the paper by discussing implication for future text miners aspiring to use AL
    and self-training.

    Learning Representations of Emotional Speech with Deep Convolutional Generative Adversarial Networks

    Jonathan Chang, Stefan Scherer
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    Automatically assessing emotional valence in human speech has historically
    been a difficult task for machine learning algorithms. The subtle changes in
    the voice of the speaker that are indicative of positive or negative emotional
    states are often “overshadowed” by voice characteristics relating to emotional
    intensity or emotional activation. In this work we explore a representation
    learning approach that automatically derives discriminative representations of
    emotional speech. In particular, we investigate two machine learning strategies
    to improve classifier performance: (1) utilization of unlabeled data using a
    deep convolutional generative adversarial network (DCGAN), and (2) multitask
    learning. Within our extensive experiments we leverage a multitask annotated
    emotional corpus as well as a large unlabeled meeting corpus (around 100
    hours). Our speaker-independent classification experiments show that in
    particular the use of unlabeled data in our investigations improves performance
    of the classifiers and both fully supervised baseline approaches are
    outperformed considerably. We improve the classification of emotional valence
    on a discrete 5-point scale to 43.88% and on a 3-point scale to 49.80%, which
    is competitive to state-of-the-art performance.

    Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner

    Tseng-Hung Chen, Yuan-Hong Liao, Ching-Yao Chuang, Wan-Ting Hsu, Jianlong Fu, Min Sun
    Comments: 10 pages, 6 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Impressive image captioning results are achieved in domains with plenty of
    training image and sentence pairs (e.g., MSCOCO). However, transferring to a
    target domain with significant domain shifts but no paired training data
    (referred to as cross-domain image captioning) remains largely unexplored. We
    propose a novel adversarial training procedure to leverage unpaired data in the
    target domain. Two critic networks are introduced to guide the captioner,
    namely domain critic and multi-modal critic. The domain critic assesses whether
    the generated sentences are indistinguishable from sentences in the target
    domain. The multi-modal critic assesses whether an image and its generated
    sentence are a valid pair. During training, the critics and captioner act as
    adversaries — captioner aims to generate indistinguishable sentences, whereas
    critics aim at distinguishing them. The assessment improves the captioner
    through policy gradient updates. During inference, we further propose a novel
    critic-based planning method to select high-quality sentences without
    additional supervision (e.g., tags). To evaluate, we use MSCOCO as the source
    domain and four other datasets (CUB-200-2011, Oxford-102, TGIF, and Flickr30k)
    as the target domains. Our method consistently performs well on all datasets.
    In particular, on CUB-200-2011, we achieve 21.8% CIDEr-D improvement after
    adaptation. Utilizing critics during inference further gives another 4.5%
    boost.


    Information Theory

    Optimally-Tuned Nonparametric Linear Equalization for Massive MU-MIMO Systems

    Ramina Ghods, Charles Jeon, Gulnar Mirza, Arian Maleki, Christoph Studer
    Comments: Will be presented at the 2017 IEEE International Symposium on Information Theory
    Subjects: Information Theory (cs.IT)

    This paper deals with linear equalization in massive multi-user
    multiple-input multiple-output (MU-MIMO) wireless systems. We first provide
    simple conditions on the antenna configuration for which the well-known linear
    minimum mean-square error (L-MMSE) equalizer provides near-optimal spectral
    efficiency, and we analyze its performance in the presence of parameter
    mismatches in the signal and/or noise powers. We then propose a novel,
    optimally-tuned NOnParametric Equalizer (NOPE) for massive MU-MIMO systems,
    which avoids knowledge of the transmit signal and noise powers altogether. We
    show that NOPE achieves the same performance as that of the L-MMSE equalizer in
    the large-antenna limit, and we demonstrate its efficacy in realistic,
    finite-dimensional systems. From a practical perspective, NOPE is
    computationally efficient and avoids dedicated training that is typically
    required for parameter estimation

    On the Achievable Rates of Decentralized Equalization in Massive MU-MIMO Systems

    Charles Jeon, Kaipeng Li, Joseph R. Cavallaro, Christoph Studer
    Comments: Will be presented at the 2017 IEEE International Symposium on Information Theory
    Subjects: Information Theory (cs.IT)

    Massive multi-user (MU) multiple-input multiple-output (MIMO) promises
    significant gains in spectral efficiency compared to traditional, small-scale
    MIMO technology. Linear equalization algorithms, such as zero forcing (ZF) or
    minimum mean-square error (MMSE)-based methods, typically rely on centralized
    processing at the base station (BS), which results in (i) excessively high
    interconnect and chip input/output data rates, and (ii) high computational
    complexity. In this paper, we investigate the achievable rates of decentralized
    equalization that mitigates both of these issues. We consider two distinct BS
    architectures that partition the antenna array into clusters, each associated
    with independent radio-frequency chains and signal processing hardware, and the
    results of each cluster are fused in a feedforward network. For both
    architectures, we consider ZF, MMSE, and a novel, non-linear equalization
    algorithm that builds upon approximate message passing (AMP), and we
    theoretically analyze the achievable rates of these methods. Our results
    demonstrate that decentralized equalization with our AMP-based methods incurs
    no or only a negligible loss in terms of achievable rates compared to that of
    centralized solutions.

    Energy-Throughput Tradeoff in Sustainable Cloud-RAN with Energy Harvesting

    Zhao Chen, Ziru Chen, Lin X. Cai, Yu Cheng
    Comments: Accepted by ICC 2017
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    In this paper, we investigate joint beamforming for energy-throughput
    tradeoff in a sustainable cloud radio access network system, where multiple
    base stations (BSs) powered by independent renewable energy sources will
    collaboratively transmit wireless information and energy to the data receiver
    and the energy receiver simultaneously. In order to obtain the optimal joint
    beamforming design over a finite time horizon, we formulate an optimization
    problem to maximize the throughput of the data receiver while guaranteeing
    sufficient RF charged energy of the energy receiver. Although such problem is
    non-convex, it can be relaxed into a convex form and upper bounded by the
    optimal value of the relaxed problem. We further prove tightness of the upper
    bound by showing the optimal solution to the relaxed problem is rank one.
    Motivated by the optimal solution, an efficient online algorithm is also
    proposed for practical implementation. Finally, extensive simulations are
    performed to verify the superiority of the proposed joint beamforming strategy
    to other beamforming designs.

    Game Theoretic Dynamic Channel Allocation for Frequency-Selective Interference Channels

    Ilai Bistritz, Amir Leshem
    Subjects: Information Theory (cs.IT)

    We consider the problem of distributed channel allocation in large networks
    under the frequency-selective interference channel. Our goal is to design a
    utility function for a non-cooperative game such that all of its pure Nash
    equilibria have close to optimal global performance. Performance is measured by
    the weighted sum of achievable rates. Such a game formulation is very
    attractive as a basis for a fully distributed channel allocation since it
    requires no communication between users. We propose a novel technique to
    analyze the Nash equilibria of a random interference game, determined by the
    random channel gains. Our analysis is asymptotic in the number of players.
    First we present a natural non-cooperative game where the utility of each
    player is his achievable rate. It is shown that, asymptotically in the number
    of players and for strong enough interference, this game exhibits many bad
    equilibria. Then we propose a novel non-cooperative M Frequency-Selective
    Interference Channel Game (M-FSIG), as a slight modification of the former,
    where the utility of each player is artificially limited. We prove that even
    its worst equilibrium has asymptotically optimal weighted sum-rate for any
    interference regime and even for correlated channels. This is based on an order
    statistics analysis of the fading channels that is valid for a broad class of
    fading distributions (including Rayleigh, Rician, m-Nakagami and more). In
    order to exploit these results algorithmically we propose a modified Fictitious
    Play algorithm that can be implemented distributedly. We carry out simulations
    that show its fast convergence to the proven pure Nash equilibria.

    Covariance Matrix Estimation in Massive MIMO

    David Neumann, Michael Joham, Wolfgang Utschick
    Comments: submitted to IEEE Signal Processing Letters
    Subjects: Information Theory (cs.IT)

    Interference during the uplink training phase significantly deteriorates the
    performance of a massive MIMO system. The impact of the interference can be
    reduced by exploiting second order statistics of the channel vectors, e.g., to
    obtain minimum mean squared error estimates of the channel. In practice, the
    channel covariance matrices have to be estimated. The estimation of the
    covariance matrices is also impeded by the interference during the training
    phase. However, the coherence interval of the covariance matrices is larger
    than that of the channel vectors. This allows us to derive methods for accurate
    covariance matrix estimation by appropriate assignment of pilot sequences to
    users in consecutive channel coherence intervals.

    Ultra Reliable UAV Communication Using Altitude and Cooperation Diversity

    Mohammad Mahdi Azari, Fernando Rosas, Kwang-Cheng Chen, Sofie Pollin
    Subjects: Information Theory (cs.IT); Probability (math.PR)

    The use of unmanned aerial vehicles (UAVs) that serve as aerial base stations
    is expected to become predominant in the next decade. However, in order for
    this technology to unfold its full potential it is necessary to develop a
    fundamental understanding of the distinctive features of air-to-ground (A2G)
    links. As a contribution in this direction, this paper proposes a generic
    framework for the analysis and optimization of the A2G systems. In contrast to
    the existing literature, this framework incorporates both height-dependent path
    loss exponent and small-scale fading, and unifies a widely used
    ground-to-ground channel model with that of A2G for analysis of large-scale
    wireless networks. We derive analytical expressions for the optimal UAV height
    that minimizes the outage probability of a given A2G link. Moreover, our
    framework allows us to derive a height-dependent closed-form expression and a
    tight lower bound for the outage probability of an extit{A2G cooperative
    communication} network. Our results suggest that the optimal location of the
    UAVs with respect to the ground nodes does not change by the inclusion of
    ground relays. This enables interesting insights in the deployment of future
    A2G networks, as the system reliability could be adjusted dynamically by adding
    relaying nodes without requiring changes in the position of the corresponding
    UAVs.

    Finite-Blocklength Bounds on the Maximum Coding Rate of Rician Fading Channels with Applications to Pilot-Assisted Transmission

    Johan Östman, Giuseppe Durisi, Erik G. Ström
    Subjects: Information Theory (cs.IT)

    We present nonasymptotic bounds on the maximum coding rate achievable over a
    Rician block-fading channel for a fixed packet size and a fixed packet error
    probability. Our bounds, which apply to the scenario where no a priori channel
    state information is available at the receiver, allow one to quantify the
    tradeoff between the rate gains resulting from the exploitation of
    time-frequency diversity and the rate loss resulting from fast channel
    variations and pilot-symbol overhead.

    The stochastic interpolation method: A simple scheme to prove replica formulas in Bayesian inference

    Jean Barbier, Nicolas Macris
    Subjects: Information Theory (cs.IT); Disordered Systems and Neural Networks (cond-mat.dis-nn)

    In recent years important progress has been achieved towards proving the
    validity of the replica predictions for the (asymptotic) mutual information (or
    “free energy”) in Bayesian inference problems. The proof techniques that have
    emerged appear to be quite general, despite they have been worked out on a
    case-by-case basis. Unfortunately, a common point between all these schemes is
    their relatively high level of technicality. We present a new proof scheme that
    is quite straightforward with respect to the previous ones. We call it the
    stochastic interpolation method because it can be seen as an extension of the
    interpolation method developped by Guerra and Toninelli in the context of spin
    glasses, with a trial “parameter” which becomes a stochastic process. In order
    to illustrate our method we show how to prove the replica formula for three
    non-trivial inference problems. The first one is symmetric rank-one matrix
    estimation (or factorisation), which is the simplest problem considered here
    and the one for which the method is presented in full details. Then we
    generalize to symmetric tensor estimation and random linear estimation. In
    addition, we show how to prove a tight lower bound for the mutual information
    of non-symmetric tensor estimation. We believe that the present method has a
    much wider range of applicability and also sheds new insights on the reasons
    for the validity of replica formulas in Bayesian inference.

    Network Coherence Time Matters – Aligned Image Sets and the Degrees of Freedom of Interference Networks with Finite Precision CSIT and Perfect CSIR

    Arash Gholami Davoodi, Syed Ali Jafar
    Comments: 19 pages, 4 figures
    Subjects: Information Theory (cs.IT)

    This work obtains the first bound that is provably sensitive to network
    coherence time, i.e., coherence time in an interference network where all
    channels experience the same coherence patterns. This is accomplished by a
    novel adaptation of the aligned image sets bound, and settles various open
    problems noted previously by Naderi and Avestimehr and by Gou et al. For
    example, a necessary and sufficient condition is obtained for the optimality of
    1/2 DoF per user in a partially connected interference network where the
    channel state information at the receivers (CSIR) is perfect, the channel state
    information at the transmitters (CSIT) is instantaneous but limited to finite
    precision, and the network coherence time is T_c= 1. The surprising insight
    that emerges is that even with perfect CSIR and instantaneous finite precision
    CSIT, network coherence time matters, i.e., it has a DoF impact.

    Optimizing Pilot Overhead for Ultra-Reliable Short-Packet Transmission

    Mohammadreza Mousaei, Besma Smida
    Comments: To be published on IEEE ICC 2017 Communication Theory Symposium
    Subjects: Information Theory (cs.IT)

    In this paper we optimize the pilot overhead for ultra-reliable short-packet
    transmission and investigate the dependence of this overhead on packet size and
    error probability. In particular, we consider a point-to-point communication in
    which one sensor sends messages to a central node, or base-station, over AWGN
    with Rayleigh fading channel. We formalize the optimization in terms of
    approximate achievable rates at a given block length, pilot length, and error
    probability. This leads to more accurate pilot overhead optimization.
    Simulation results show that it is important to take into account the packet
    size and the error probability when optimizing the pilot overhead.

    Pathwise continuous time spectrum degeneracy at a single point and weak predictability

    Nikolai Dokuchaev
    Subjects: Information Theory (cs.IT); Functional Analysis (math.FA)

    The paper studies properties of continuous time processes in pathwise
    deterministic setting with spectrum degeneracy at a single point where their
    Fourier transforms vanish. It appears that these processes are predictable is
    some weak sense, meaning that convolution integrals over future time can be
    approximated by integrals over past time. In particular, this means that the
    processes with this degeneracy are uniquely defined by their past values on the
    semi-infinite intervals. Corresponding predictors are obtained. The predictors
    feature some robustness with respect to noise contamination.

    Polarization Shift Keying (PolarSK): System Scheme and Performance Analysis

    Jiliang Zhang, Yang Wang, Jie Zhang, Liqin Ding
    Comments: Submitted to IEEE Transactions on Vehicular Technology at 07th, May, 2017
    Subjects: Information Theory (cs.IT)

    Single-Radio-Frequency (RF) Multiple-Input-Multiple-Output (MIMO) systems
    such as the spatial modulation (SM) system and the space shift keying (SSK)
    system have been proposed to pursue a high spectral efficiency while keeping a
    low cost and complexity transceiver design. Currently, polarization domain
    resource has been introduced to the single-RF MIMO system to reduce the size of
    the transmit antenna array and provide 1 bit per channel use (bpcu)
    multiplexing gain. Nevertheless, the polarization domain resource still has the
    potential to provide a higher multiplexing gain in the polarized single-RF MIMO
    system. In this paper, we propose a generalized polarization shift keying
    (PolarSK) modulation scheme for a SIMO system that uses the polarization states
    in the dual-polarized transmit antenna as an information-bearing unit to
    increase the overall spectral efficiency. At the receive end, the maximum
    likelihood (ML) detector is employed to demodulate the received signal. A
    closed form union upper bound on the average bit error probability (ABEP) of
    PolarSK system with the optimum maximum likelihood (ML) receiver is deduced
    under fading channels. To reduce the computational complexity of the receiver,
    a linear successive interference cancellation (SIC) detection algorithm and a
    sphere-decoding (SD) detection algorithm are proposed. On the basis of analytic
    results and simulations, performances of the proposed PolarSK systems in terms
    of computational complexity and ABEP are analyzed. Numerical results show that
    the proposed PolarSK scheme performs better than state of the art
    dual-polarized/uni-polarized SM schemes.

    Joint Trajectory and Communication Design for Multi-UAV Enabled Wireless Networks

    Qingqing Wu, Yong Zeng, Rui Zhang
    Comments: Submitted for possible journal publication. arXiv admin note: substantial text overlap with arXiv:1704.01765
    Subjects: Information Theory (cs.IT); Dynamical Systems (math.DS); Optimization and Control (math.OC)

    Unmanned aerial vehicles (UAVs) have attracted significant interest recently
    in assisting wireless communication due to their high maneuverability, flexible
    deployment, and low cost. This paper considers a multi-UAV enabled wireless
    communication system, where multiple UAV-mounted aerial base stations (BSs) are
    employed to serve a group of users on the ground. To achieve fair performance
    among users, we maximize the minimum throughput over all ground users in the
    downlink communication by optimizing the multiuser communication scheduling and
    association jointly with the UAVs’ trajectory and power control. The formulated
    problem is a mixed integer non-convex optimization problem that is challenging
    to solve. As such, we propose an efficient iterative algorithm for solving it
    by applying the block coordinate descent and successive convex optimization
    techniques. Specifically, the user scheduling and association, UAV trajectory,
    and transmit power are alternately optimized in each iteration. In particular,
    for the non-convex UAV trajectory and transmit power optimization problems, two
    approximate convex optimization problems are solved, respectively. We further
    show that the proposed algorithm is guaranteed to converge to at least a
    locally optimal solution. To speed up the algorithm convergence and achieve
    good throughput, a low-complexity and systematic initialization scheme is also
    proposed for the UAV trajectory design based on the simple circular trajectory
    and the circle packing scheme. Extensive simulation results are provided to
    demonstrate the significant throughput gains of the proposed design as compared
    to other benchmark schemes.

    On the optimality of some group testing algorithms

    Matthew Aldridge
    Comments: To be presented at ISIT 2017. 5 pages, 2 figures
    Subjects: Information Theory (cs.IT); Probability (math.PR)

    We consider Bernoulli nonadaptive group testing with (k = Theta(n^ heta))
    defectives, for ( heta in (0,1)). The practical definite defectives (DD)
    detection algorithm is known to be optimal for ( heta geq 1/2). We give a new
    upper bound on the rate of DD, showing that DD is strictly suboptimal for
    ( heta < 0.41). We also show that the SCOMP algorithm and algorithms based on
    linear programming achieve a rate at least as high as DD, so in particular are
    also optimal for ( heta geq 1/2).

    On Time-Reversal Imaging by Statistical Testing

    D. Ciuonzo
    Comments: Reduced form accepted in IEEE Signal Processing Letters
    Subjects: Information Theory (cs.IT)

    This letter is focused on the design and analysis of computational wideband
    time-reversal imaging algorithms, designed to be adaptive with respect to the
    noise levels pertaining to the frequencies being employed for scene probing.
    These algorithms are based on the concept of cell-by-cell processing and are
    obtained as theoretically-founded decision statistics for testing the
    hypothesis of single-scatterer presence (absence) at a specific location. These
    statistics are also validated in comparison with the maximal invariant
    statistic for the proposed problem.

    Linear Network Coding for Two-Unicast-Z Networks: A Commutative Algebraic Perspective and Fundamental Limits

    Mohammad Fahim, Viveck Cadambe
    Comments: 25 pages, 5 figures, a short version of this paper is published in the Proceedings of The IEEE International Symposium on Information Theory (ISIT), June 2017
    Subjects: Information Theory (cs.IT)

    We consider a two-unicast-Z network over a directed acyclic graph of unit
    capacitated edges; the two-unicast-Z network is a special case of two-unicast
    networks where one of the destinations has apriori side information of the
    unwanted (interfering) message. In this paper, we settle open questions on the
    limits of network coding for two-unicast-Z networks by showing that the
    generalized network sharing bound is not tight, vector linear codes outperform
    scalar linear codes, and non-linear codes outperform linear codes in general.
    We also develop a commutative algebraic approach to deriving linear network
    coding achievability results, and demonstrate our approach by providing an
    alternate proof to the previous result of Wang et. al. regarding feasibility of
    rate (1,1) in the network.

    Codes for Graph Erasures

    Lev Yohananov, Eitan Yaakobi
    Comments: To appear in IEEE International Symposium on Information Theory
    Subjects: Information Theory (cs.IT)

    Motivated by systems where the information is represented by a graph, such as
    neural networks, associative memories, and distributed systems, we present in
    this work a new class of codes, called codes over graphs. Under this paradigm,
    the information is stored on the edges of an undirected graph, and a code over
    graphs is a set of graphs. A node failure is the event where all edges in the
    neighborhood of the failed node have been erased. We say that a code over
    graphs can tolerate (
    ho) node failures if it can correct the erased edges of
    any (
    ho) failed nodes in the graph. While the construction of such codes can
    be easily accomplished by MDS codes, their field size has to be at least
    (O(n^2)), when (n) is the number of nodes in the graph. In this work we present
    several constructions of codes over graphs with smaller field size. In
    particular, we present optimal codes over graphs correcting two node failures
    over the binary field, when the number of nodes in the graph is a prime number.
    We also present a construction of codes over graphs correcting (
    ho) node
    failures for all (
    ho) over a field of size at least ((n+1)/2-1), and show how
    to improve this construction for optimal codes when (
    ho=2,3).

    Performance Limits on the Classification of Kronecker-structured Models

    Ishan Jindal, Matthew Nokleby
    Comments: This is an extended version of a paper that appears in Proc. 2017 IEEE International Symposium on Information Theory, Aachen, Germany, June 2017
    Subjects: Information Theory (cs.IT); Learning (cs.LG); Machine Learning (stat.ML)

    Kronecker-structured (K-S) models recently have been proposed for the
    efficient representation, processing, and classification of multidimensional
    signals such as images and video. Because they are tailored to the
    multi-dimensional structure of the target images, K-S models show improved
    performance in compression and reconstruction over more general (union of)
    subspace models. In this paper, we study the classification performance of
    Kronecker-structured models in two asymptotic regimes. First, we study the
    diversity order, the slope of the error probability as the signal noise power
    goes to zero. We derive an exact expression for the diversity order as a
    function of the signal and subspace dimensions of a K-S model. Next, we study
    the classification capacity, the maximum rate at which the number of classes
    can grow as the signal dimension goes to infinity. We derive upper and lower
    bounds on the prelog factor of the classification capacity. Finally, we
    evaluate the empirical classification performance of K-S models for both the
    synthetic and the real world data, showing that they agree with the diversity
    order analysis.

    Status Updates Over Unreliable Multiaccess Channels

    Sanjit K. Kaul, Roy D. Yates
    Subjects: Information Theory (cs.IT)

    Applications like environmental sensing, and health and activity sensing, are
    supported by networks of devices (nodes) that send periodic packet
    transmissions over the wireless channel to a sink node. We look at simple
    abstractions that capture the following commonalities of such networks (a) the
    nodes send periodically sensed information that is temporal and must be
    delivered in a timely manner, (b) they share a multiple access channel and (c)
    channels between the nodes and the sink are unreliable (packets may be received
    in error) and differ in quality.

    We consider scheduled access and slotted ALOHA-like random access. Under
    scheduled access, nodes take turns and get feedback on whether a transmitted
    packet was received successfully by the sink. During its turn, a node may
    transmit more than once to counter channel uncertainty. For slotted ALOHA-like
    access, each node attempts transmission in every slot with a certain
    probability. For these access mechanisms we derive the age of information
    (AoI), which is a timeliness metric, and arrive at conditions that optimize AoI
    at the sink. We also analyze the case of symmetric updating, in which updates
    from different nodes must have the same AoI. We show that ALOHA-like access,
    while simple, leads to AoI that is worse by a factor of about 2e, in comparison
    to scheduled access.

    Tight Lower Bounds on the Contact Distance Distribution in Poisson Hole Process

    Mustafa A. Kishk, Harpreet S. Dhillon
    Comments: To appear in IEEE Wireless Communications Letters
    Subjects: Information Theory (cs.IT)

    In this letter, we derive new lower bounds on the cumulative distribution
    function (CDF) of the contact distance in the Poisson Hole Process (PHP) for
    two cases: (i) reference point is selected uniformly at random from
    (mathbb{R}^2) independently of the PHP, and (ii) reference point is located at
    the center of a hole selected uniformly at random from the PHP. While one can
    derive upper bounds on the CDF of contact distance by simply ignoring the
    effect of holes, deriving lower bounds is known to be relatively more
    challenging. As a part of our proof, we introduce a tractable way of bounding
    the effect of all the holes in a PHP, which can be used to study other
    properties of a PHP as well.

    Millimeter Wave Channel Estimation via Exploiting Joint Sparse and Low-Rank Structures

    Xingjian Li, Jun Fang, Hongbin Li, Pu Wang
    Subjects: Information Theory (cs.IT)

    We consider the problem of channel estimation for millimeter wave (mmWave)
    systems, where, to minimize the hardware complexity and power consumption, an
    analog transmit beamforming and receive combining structure with only one radio
    frequency (RF) chain at the base station (BS) and mobile station (MS) is
    employed. Most existing works for mmWave channel estimation exploit sparse
    scattering characteristics of the channel. In addition to sparsity, mmWave
    channels may exhibit angular spreads over the angle of arrival (AoA), angle of
    departure (AoD), and elevation domains. In this paper, we show that angular
    spreads give rise to a useful low-rank structure that, along with the sparsity,
    can be simultaneously utilized to reduce the sample complexity, i.e. the number
    of samples needed to successfully recover the mmWave channel. Specifically, to
    effectively leverage the joint sparse and low-rank structure, we develop a
    two-stage compressed sensing method for mmWave channel estimation, where the
    sparse and low-rank properties are respectively utilized in two consecutive
    stages, namely, a matrix completion stage and a sparse recovery stage. Our
    theoretical analysis reveals that the proposed two-stage scheme can achieve a
    lower sample complexity than a direct compressed sensing method that exploits
    only the sparse structure of the mmWave channel. Simulation results are
    provided to corroborate our theoretical results and to show the superiority of
    the proposed two-stage method.

    Density Evolution on a Class of Smeared Random Graphs: A Theoretical Framework for Fast MRI

    Kabir Chandrasekher, Orhan Ocal, Kannan Ramchandran
    Subjects: Information Theory (cs.IT)

    We introduce a new ensemble of random bipartite graphs, which we term the
    `smearing ensemble’, where each left node is connected to some number of
    consecutive right nodes. Such graphs arise naturally in the recovery of sparse
    wavelet coefficients when signal acquisition is in the Fourier domain, such as
    in magnetic resonance imaging (MRI). Graphs from this ensemble exhibit small,
    structured cycles with high probability, rendering current techniques for
    determining iterative decoding thresholds inapplicable. In this paper, we
    develop a theoretical platform to analyze and evaluate the effects of
    smearing-based structure. Despite the existence of these small cycles, we
    derive exact density evolution recurrences for iterative decoding on graphs
    with smear-length two. Further, we give lower bounds on the performance of a
    much larger class from the smearing ensemble, and provide numerical experiments
    showing tight agreement between empirical thresholds and those determined by
    our bounds. Finally, we describe a system architecture to recover sparse
    wavelet representations in the MRI setting, giving explicit thresholds on the
    minimum number of Fourier samples needing to be acquired for the (1)-stage Haar
    wavelet setting. In particular, we show that (K)-sparse (1)-stage Haar wavelet
    coefficients of an (n)-dimensional signal can be recovered using (2.63K)
    Fourier domain samples asymptotically using (mathcal{O}(Klog{K})) operations.

    Nonlinear Information Bottleneck

    Artemy Kolchinsky, Brendan D. Tracey, David H. Wolpert
    Subjects: Information Theory (cs.IT); Learning (cs.LG); Machine Learning (stat.ML)

    Information bottleneck [IB] is a technique for extracting information in some
    ‘input’ random variable that is relevant for predicting some different ‘output’
    random variable. IB works by encoding the input in a compressed ‘bottleneck
    variable’ from which the output can then be accurately decoded. IB can be
    difficult to compute in practice, and has been mainly developed for two limited
    cases: (1) discrete random variables with small state spaces, and (2)
    continuous random variables that are jointly Gaussian distributed (in which
    case the encoding and decoding maps are linear). We propose a method to perform
    IB in more general domains. Our approach can be applied to discrete or
    continuous inputs and outputs, and allows for nonlinear encoding and decoding
    maps. The method uses a novel upper bound on the IB objective, derived using a
    non-parametric estimator of mutual information and a variational approximation.
    We show how to implement the method using neural networks and gradient-based
    optimization, and demonstrate its performance on the MNIST dataset.

    Optimal Power Control and Scheduling for Real-Time and Non-Real-Time Data

    Ahmed Ewaisha, Cihan Tepedelenlioglu
    Comments: Submitted to TVT April 2017. arXiv admin note: substantial text overlap with arXiv:1705.02068
    Subjects: Information Theory (cs.IT)

    We consider a joint scheduling-and-power-allocation problem of a downlink
    cellular system. The system consists of two groups of users: real-time (RT) and
    non-real-time (NRT) users. Given an average power constraint on the base
    station, the problem is to find an algorithm that satisfies the RT hard
    deadline constraint and NRT queue stability constraint. We propose two
    sum-rate-maximizing algorithms that satisfy these constraints as well as
    achieving the system’s capacity region. In both algorithms, the power
    allocation policy has a closed-form expression for the two groups of users.
    However, interestingly, the power policy of the RT users differ in structure
    from that of the NRT users. The first algorithm is optimal for the on-off
    channel model with a polynomial-time scheduling complexity in the number of RT
    users. The second, on the other hand, works for any channel fading model which
    is shown, through simulations, to have an average complexity that is
    close-to-linear. We also show the superiority of the proposed algorithms over
    existing approaches using extensive simulations.

    Transmit Array Interpolation for DOA Estimation via Tensor Decomposition in 2D MIMO Radar

    Ming-Yang Cao, Sergiy A. Vorobyov, Aboulnasr Hassanien
    Comments: 37 pages, 13 figures, Submitted to the IEEE Trans. Signal Processing in December 2016
    Subjects: Information Theory (cs.IT)

    In this paper, we propose a two-dimensional (2D) joint transmit array
    interpolation and beamspace design for planar array mono-static
    multiple-input-multiple-output (MIMO) radar for direction-of-arrival (DOA)
    estimation via tensor modeling. Our underlying idea is to map the transmit
    array to a desired array and suppress the transmit power outside the spatial
    sector of interest. In doing so, the signal-tonoise ratio is improved at the
    receive array. Then, we fold the received data along each dimension into a
    tensorial structure and apply tensor-based methods to obtain DOA estimates. In
    addition, we derive a close-form expression for DOA estimation bias caused by
    interpolation errors and argue for using a specially designed look-up table to
    compensate the bias. The corresponding Cramer-Rao Bound (CRB) is also derived.
    Simulation results are provided to show the performance of the proposed method
    and compare its performance to CRB.

    A Functorial Construction of Quantum Subtheories

    Ivan Contreras, Ali Nabi Duman
    Comments: 19 pages
    Subjects: Mathematical Physics (math-ph); Information Theory (cs.IT); Symplectic Geometry (math.SG)

    We apply the geometric quantization procedure via symplectic groupoids
    proposed by E. Hawkins to the setting of epistemically restricted toy theories
    formalized by Spekkens. In the continuous degrees of freedom, this produces the
    algebraic structure of quadrature quantum subtheories. In the odd-prime finite
    degrees of freedom, we obtain a functor from the Frobenius algebra in
    extbf{Rel} of the toy theories to the Frobenius algebra of stabilizer quantum
    mechanics.

    Directed Information as Privacy Measure in Cloud-based Control

    Takashi Tanaka, Mikael Skoglund, Henrik Sandberg, Karl Henrik Johansson
    Subjects: Optimization and Control (math.OC); Information Theory (cs.IT)

    We consider cloud-based control scenarios in which clients with local control
    tasks outsource their computational or physical duties to a cloud service
    provider. In order to address privacy concerns in such a control architecture,
    we first investigate the issue of finding an appropriate privacy measure for
    clients who desire to keep local state information as private as possible
    during the control operation. Specifically, we justify the use of Kramer’s
    notion of causally conditioned directed information as a measure of privacy
    loss based on an axiomatic argument. Then we propose a methodology to design an
    optimal “privacy filter” that minimizes privacy loss while a given level of
    control performance is guaranteed. We show in particular that the optimal
    privacy filter for cloud-based Linear-Quadratic-Gaussian (LQG) control can be
    synthesized by a Linear-Matrix-Inequality (LMI) algorithm. The trade-off in the
    design is illustrated by a numerical example.

    Applications of some special numbers obtained from a difference equation of degree three

    Cristina Flaut, Diana Savin
    Subjects: Rings and Algebras (math.RA); Information Theory (cs.IT); Combinatorics (math.CO)

    In this paper we present applications of some special numbers obtained from a
    difference equation of degree three, especially in the Coding Theory. As a
    particular case, we obtain the generalized Pell-Fibonacci-Lucas numbers, which
    were extended to the generalized quaternions.

    Learning of Gaussian Processes in Distributed and Communication Limited Systems

    Mostafa Tavassolipour, Seyed Abolfazl Motahari, Mohammad-Taghi Manzuri Shalmani
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    It is of fundamental importance to find algorithms obtaining optimal
    performance for learning of statistical models in distributed and communication
    limited systems. Aiming at characterizing the optimal strategies, we consider
    learning of Gaussian Processes (GPs) in distributed systems as a pivotal
    example. We first address a very basic problem: how many bits are required to
    estimate the inner-products of Gaussian vectors across distributed machines?
    Using information theoretic bounds, we obtain an optimal solution for the
    problem which is based on vector quantization. Two suboptimal and more
    practical schemes are also presented as substitute for the vector quantization
    scheme. In particular, it is shown that the performance of one of the practical
    schemes which is called per-symbol quantization is very close to the optimal
    one. Schemes provided for the inner-product calculations are incorporated into
    our proposed distributed learning methods for GPs. Experimental results show
    that with spending few bits per symbol in our communication scheme, our
    proposed methods outperform previous zero rate distributed GP learning schemes
    such as Bayesian Committee Model (BCM) and Product of experts (PoE).

    Linearized ADMM for Non-convex Non-smooth Optimization with Convergence Analysis

    Qinghua Liu, Xinyue Shen, Yuantao Gu
    Comments: 31 pages
    Subjects: Optimization and Control (math.OC); Information Theory (cs.IT)

    Linearized alternating direction method of multipliers (ADMM) as an extension
    of ADMM has been widely used to solve linearly constrained problems in signal
    processing, machine leaning, communications, and many other fields. Despite its
    broad applications in non-convex optimization, for a great number of non-convex
    and non-smooth objective functions, its theoretical convergence guarantee is
    still an open problem. In this paper, we study the convergence of an existing
    two-block linearized ADMM and a newly proposed multi-block parallel linearized
    ADMM for problems with non-convex and non-smooth objectives. Mathematically, we
    present that the algorithms can converge for a broader class of objective
    functions under less strict assumptions compared with previous works. Our
    proposed algorithm can update coupled variables in parallel and work for
    general non-convex problems, where the traditional ADMM may have difficulties
    in solving subproblems.

    Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval

    John C. Duchi, Feng Ruan
    Comments: 49 pages, 9 figures
    Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Optimization and Control (math.OC)

    We develop procedures, based on minimization of the composition (f(x) =
    h(c(x))) of a convex function (h) and smooth function (c), for solving random
    collections of quadratic equalities, applying our methodology to real-valued
    phase retrieval problems. We show that the prox-linear algorithm we develop can
    solve (robust) phase retrieval problems (even with adversarially faulty
    measurements) with high probability under appropriate random measurement models
    as soon as the number of measurements (m) is a constant factor larger than the
    dimension (n) of the signal to be recovered. The algorithm requires essentially
    no tuning—it requires solution of a sequence of convex problems—and it is
    implementable without any particular assumptions on the measurements taken. We
    provide substantial experiments investigating our methods, indicating the
    practical effectiveness of the procedures and showing that they succeed with
    high probability as soon as (m / n ge 2).




沪ICP备19023445号-2号
友情链接