IT博客汇 | arXiv Paper Daily: Tue, 30 May 2017

arXiv Paper Daily: Tue, 30 May 2017

我爱机器学习(52ml.net)发表于 2017-05-30 00:00:00

Neural and Evolutionary Computing

Deep Complex Networks

Chiheb Trabelsi, Olexa Bilaniuk, Dmitriy Serdyuk, Sandeep Subramanian, João Felipe Santos, Soroush Mehri, Negar Rostamzadeh, Yoshua Bengio, Christopher J Pal
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

At present, the vast majority of building blocks, techniques, and
architectures for deep learning are based on real-valued operations and
representations. However, recent work on recurrent neural networks and older
fundamental theoretical analysis suggests that complex numbers could have a
richer representational capacity and could also facilitate noise-robust memory
retrieval mechanisms. Despite their attractive properties and potential for
opening up entirely new neural architectures, complex-valued deep neural
networks have been marginalized due to the absence of the building blocks
required to design such models. In this work, we provide the key atomic
components for complex-valued deep neural networks and apply them to
convolutional feed-forward networks. More precisely, we rely on complex
convolutions and present algorithms for complex batch-normalization, complex
weight initialization strategies for complex-valued neural nets and we use them
in experiments with end-to-end training schemes. We demonstrate that such
complex-valued models are able to achieve comparable or better performance than
their real-valued counterparts. We test deep complex models on several computer
vision tasks and on music transcription using the MusicNet dataset where we
achieve state of the art performance.

Latent Intention Dialogue Models

Tsung-Hsien Wen, Yishu Miao, Phil Blunsom, Steve Young
Comments: Accepted at ICML 2017
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Developing a dialogue agent that is capable of making autonomous decisions
and communicating by natural language is one of the long-term goals of machine
learning research. Traditional approaches either rely on hand-crafting a small
state-action set for applying reinforcement learning that is not scalable or
constructing deterministic models for learning dialogue sentences that fail to
capture natural conversational variability. In this paper, we propose a Latent
Intention Dialogue Model (LIDM) that employs a discrete latent variable to
learn underlying dialogue intentions in the framework of neural variational
inference. In a goal-oriented dialogue scenario, these latent intentions can be
interpreted as actions guiding the generation of machine responses, which can
be further refined autonomously by reinforcement learning. The experimental
evaluation of LIDM shows that the model out-performs published benchmarks for
both corpus-based and human evaluation, demonstrating the effectiveness of
discrete latent variable models for learning goal-oriented dialogues.

On Multilingual Training of Neural Dependency Parsers

Michał Zapotoczny, Paweł Rychlikowski, Jan Chorowski
Comments: preprint accepted into the TSD2017
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

We show that a recently proposed neural dependency parser can be improved by
joint training on multiple languages from the same family. The parser is
implemented as a deep neural network whose only input is orthographic
representations of words. In order to successfully parse, the network has to
discover how linguistically relevant concepts can be inferred from word
spellings. We analyze the representations of characters and words that are
learned by the network to establish which properties of languages were
accounted for. In particular we show that the parser has approximately learned
to associate Latin characters with their Cyrillic counterparts and that it can
group Polish and Russian words that have a similar grammatical function.
Finally, we evaluate the parser on selected languages from the Universal
Dependencies dataset and show that it is competitive with other recently
proposed state-of-the art methods, while having a simple structure.

Implicit Variational Inference with Kernel Density Ratio Fitting

Jiaxin Shi, Shengyang Sun, Jun Zhu
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Recent progress in variational inference has paid much attention to the
flexibility of variational posteriors. Work has been done to use implicit
distributions, i.e., distributions without tractable likelihoods as the
variational posterior. However, existing methods on implicit posteriors still
face challenges of noisy estimation and can hardly scale to high-dimensional
latent variable models. In this paper, we present an implicit variational
inference approach with kernel density ratio fitting that addresses these
challenges. As far as we know, for the first time implicit variational
inference is successfully applied to Bayesian neural networks, which shows
promising results on both regression and classification tasks.

BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet

Haojin Yang, Martin Fritzsche, Christian Bartz, Christoph Meinel
Comments: 4 pages
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

Binary Neural Networks (BNNs) can drastically reduce memory size and accesses
by applying bit-wise operations instead of standard arithmetic operations.
Therefore it could significantly improve the efficiency and lower the energy
consumption at runtime, which enables the application of state-of-the-art deep
learning models on low power devices. BMXNet is an open-source BNN library
based on MXNet, which supports both XNOR-Networks and Quantized Neural
Networks. The developed BNN layers can be seamlessly applied with other
standard library components and work in both GPU and CPU mode. BMXNet is
maintained and developed by the multimedia research group at Hasso Plattner
Institute and released under Apache license. Extensive experiments validate the
efficiency and effectiveness of our implementation. The BMXNet library, several
sample projects, and a collection of pre-trained binary deep models are
available for download at this https URL

Computer Vision and Pattern Recognition

Feature Incay for Representation Regularization

Yuhui Yuan, Kuiyuan Yang, Chao Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Softmax loss is widely used in deep neural networks for multi-class
classification, where each class is represented by a weight vector, a sample is
represented as a feature vector, and the feature vector has the largest
projection on the weight vector of the correct category when the model
correctly classifies a sample. To ensure generalization, weight decay that
shrinks the weight norm is often used as regularizer. Different from
traditional learning algorithms where features are fixed and only weights are
tunable, features are also tunable as representation learning in deep learning.
Thus, we propose feature incay to also regularize representation learning,
which favors feature vectors with large norm when the samples can be correctly
classified. With the feature incay, feature vectors are further pushed away
from the origin along the direction of their corresponding weight vectors,
which achieves better inter-class separability. In addition, the proposed
feature incay encourages intra-class compactness along the directions of weight
vectors by increasing the small feature norm faster than the large ones.
Empirical results on MNIST, CIFAR10 and CIFAR100 demonstrate feature incay can
improve the generalization ability.

Pose-Aware Person Recognition

Vijay Kumar, Anoop Namboodiri, Manohar Paluri, C V Jawahar
Comments: To appear in CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Person recognition methods that use multiple body regions have shown
significant improvements over traditional face-based recognition. One of the
primary challenges in full-body person recognition is the extreme variation in
pose and view point. In this work, (i) we present an approach that tackles pose
variations utilizing multiple models that are trained on specific poses, and
combined using pose-aware weights during testing. (ii) For learning a person
representation, we propose a network that jointly optimizes a single loss over
multiple body regions. (iii) Finally, we introduce new benchmarks to evaluate
person recognition in diverse scenarios and show significant improvements over
previously proposed approaches on all the benchmarks including the photo album
setting of PIPA.

Beyond Counting: Comparisons of Density Maps for Crowd Analysis Tasks – Counting, Detection, and Tracking

Di Kang, Zheng Ma, Antoni B. Chan
Comments: 14 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

For crowded scenes, the accuracy of object-based computer vision methods
declines when the images are low-resolution and objects have severe occlusions.
Taking counting methods for example, almost all the recent state-of-the-art
counting methods bypass explicit detection and adopt regression-based methods
to directly count the objects of interest. Among regression-based methods,
density map estimation, where the number of objects inside a subregion is the
integral of the density map over that subregion, is especially promising
because it preserves spatial information, which makes it useful for both
counting and localization (detection and tracking). With the power of deep
convolutional neural networks (CNNs) the counting performance has improved
steadily. The goal of this paper is to evaluate density maps generated by
density estimation methods on a variety of crowd analysis tasks, including
counting, detection, and tracking. Most existing CNN methods produce density
maps with resolution that is smaller than the original images, due to the
downsample strides in the convolution/pooling operations. To produce an
original-resolution density map, we also evaluate a classical CNN that uses a
sliding window regressor to predict the density for every pixel in the image.
We also consider a fully convolutional (FCNN) adaptation, with skip connections
from lower convolutional layers to compensate for loss in spatial information
during upsampling. In our experiments, we found that the lower-resolution
density maps sometimes have better counting performance. In contrast, the
original-resolution density maps improved localization tasks, such as detection
and tracking, compared to bilinear upsampling the lower-resolution density
maps. Finally, we also propose several metrics for measuring the quality of a
density map, and relate them to experiment results on counting and
localization.

On the Power Spectral Density Applied to the Analysis of Old Canvases

Francisco J. Simois, Juan J. Murillo-Fuentes
Subjects: Computer Vision and Pattern Recognition (cs.CV); Spectral Theory (math.SP)

A routine task for art historians is painting diagnostics, such as dating or
attribution. Signal processing of the X-ray image of a canvas provides useful
information about its fabric. However, previous methods may fail when very old
and deteriorated artworks or simply canvases of small size are studied. We
present a new framework to analyze and further characterize the paintings from
their radiographs. First, we start from a general analysis of lattices and
provide new unifying results about the theoretical spectra of weaves. Then, we
use these results to infer the main structure of the fabric, like the type of
weave and the thread densities. We propose a practical estimation of these
theoretical results from paintings with the averaged power spectral density
(PSD), which provides a more robust tool. Furthermore, we found that the PSD
provides a fingerprint that characterizes the whole canvas. We search and
discuss some distinctive features we may find in that fingerprint. We apply
these results to several masterpieces of the 17th and 18th centuries from the
Museo Nacional del Prado to show that this approach yields accurate results in
thread counting and is very useful for paintings comparison, even in situations
where previous methods fail.

Towards Metamerism via Foveated Style Transfer

Arturo Deza, Aditya Jonnalagadda, Miguel Eckstein
Comments: Submitted to NIPS 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Given the recent successes of deep learning applied to style transfer and
texture synthesis, we propose a new theoretical framework to construct visual
metamers: extit{a family of perceptually identical, yet physically different
images}. We review work both in neuroscience related to metameric stimuli, as
well as computer vision research in style transfer. We propose our NeuroFovea
metamer model that is based on a mixture of peripheral representations and
style transfer forward-pass algorithms for emph{any} image from the recent
work of Adaptive Instance Normalization (Huang~&~Belongie). Our model is
parametrized by a VGG-Net versus a set of joint statistics of complex wavelet
coefficients which allows us to encode images in high dimensional space and
interpolate between the content and texture information. We empirically show
that human observers discriminate our metamers at a similar rate as the
metamers of Freeman~&~Simoncelli (FS) In addition, our NeuroFovea metamer
model gives us the benefit of near real-time generation which presents a
( imes1000) speed-up compared to previous work. Critically, psychophysical
studies show that both the FS and NeuroFovea metamers are discriminable from
the original images highlighting an important limitation of current metamer
generation methods.

Ensemble of Part Detectors for Simultaneous Classification and Localization

Xiaopeng Zhang, Hongkai Xiong, Weiyao Lin, Qi Tian
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Part-based representation has been proven to be effective for a variety of
visual applications. However, automatic discovery of discriminative parts
without object/part-level annotations is challenging. This paper proposes a
discriminative mid-level representation paradigm based on the responses of a
collection of part detectors, which only requires the image-level labels.
Towards this goal, we first develop a detector-based spectral clustering method
to mine the representative and discriminative mid-level patterns for detector
initialization. The advantage of the proposed pattern mining technology is that
the distance metric based on detectors only focuses on discriminative details,
and a set of such grouped detectors offer an effective way for consistent
pattern mining. Relying on the discovered patterns, we further formulate the
detector learning process as a confidence-loss sparse Multiple Instance
Learning (cls-MIL) task, which considers the diversity of the positive samples,
while avoid drifting away the well localized ones by assigning a confidence
value to each positive sample. The responses of the learned detectors can form
an effective mid-level image representation for both image classification and
object localization. Experiments conducted on benchmark datasets demonstrate
the superiority of our method over existing approaches.

Data Driven Coded Aperture Design for Depth Recovery

Prasan A Shedligeri, Sreyas Mohan, Kaushik Mitra
Comments: 4 pages, 4 figures, ICIP 2017 conference accepted
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Coded Aperture has been used for recovering all in focus image and a layered
depth map simultaneously using a single captured image. The non trivial task of
finding an optimal code has driven researchers to make some simplifying
assumptions over image distribution which may not hold under all practical
scenarios. In this work we propose a data driven approach to find the optimal
code for depth recovery. We also learn a neural network that can efficiently
recover depth map given the coded aperture photograph. We jointly train a
neural network for code selection as well as for depth estimation. We find that
the code learned by the network performs better in all metrics that have been
proposed previously.

Robust Online Matrix Factorization for Dynamic Background Subtraction

Hongwei Yong, Deyu Meng, Wangmeng Zuo, Lei Zhang
Comments: 14 pages, 13 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose an effective online background subtraction method, which can be
robustly applied to practical videos that have variations in both foreground
and background. Different from previous methods which often model the
foreground as Gaussian or Laplacian distributions, we model the foreground for
each frame with a specific mixture of Gaussians (MoG) distribution, which is
updated online frame by frame. Particularly, our MoG model in each frame is
regularized by the learned foreground/background knowledge in previous frames.
This makes our online MoG model highly robust, stable and adaptive to practical
foreground and background variations. The proposed model can be formulated as a
concise probabilistic MAP model, which can be readily solved by EM algorithm.
We further embed an affine transformation operator into the proposed model,
which can be automatically adjusted to fit a wide range of video background
transformations and make the method more robust to camera movements. With using
the sub-sampling technique, the proposed method can be accelerated to execute
more than 250 frames per second on average, meeting the requirement of
real-time background subtraction for practical video processing tasks. The
superiority of the proposed method is substantiated by extensive experiments
implemented on synthetic and real videos, as compared with state-of-the-art
online and offline background subtraction methods.

Conditional CycleGAN for Attribute Guided Face Image Generation

Yongyi Lu, Yu-Wing Tai, Chi-Keung Tang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

State-of-the-art techniques in Generative Adversarial Networks (GANs) such as
cycleGAN is able to learn the mapping of one image domain (X) to another image
domain (Y) using unpaired image data. We extend the cycleGAN to ({it
Conditional}) cycleGAN such that the mapping from (X) to (Y) is subjected to
attribute condition (Z). Using face image generation as an application example,
where (X) is a low resolution face image, (Y) is a high resolution face image,
and (Z) is a set of attributes related to facial appearance (e.g. gender, hair
color, smile), we present our method to incorporate (Z) into the network, such
that the hallucinated high resolution face image (Y’) not only satisfies the
low resolution constrain inherent in (X), but also the attribute condition
prescribed by (Z). Using face feature vector extracted from face verification
network as (Z), we demonstrate the efficacy of our approach on
identity-preserving face image super-resolution. Our approach is general and
applicable to high-quality face image generation where specific facial
attributes can be controlled easily in the automatically generated results.

L1-norm Error Function Robustness and Outlier Regularization

Chris Ding, Bo Jiang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In many real-world applications, data come with corruptions, large errors or
outliers. One popular approach is to use L1-norm function. However, the
robustness of L1-norm function is not well understood so far. In this paper, we
present a new outlier regularization framework to understand and analyze the
robustness of L1-norm function. There are two main features for the proposed
outlier regularization. (1) A key property of outlier regularization is that
how far an outlier lies away from its theoretically predicted value does not
affect the final regularization and analysis results. (2) Another important
feature of outlier regularization is that it has an equivalent continuous
representation that closely relates to L1 function. This provides a new way to
understand and analyze the robustness of L1 function. We apply our outlier
regularization framework to PCA and propose an outlier regularized PCA (ORPCA)
model. Comparing to the trace-normbased robust PCA, ORPCA has several benefits:
(1) It does not suffer singular value suppression. (2) It can retain small high
rank components which help retain fine details of data. (3) ORPCA can be
computed more efficiently.

Dilated Residual Networks

Fisher Yu, Vladlen Koltun, Thomas Funkhouser
Comments: Published at the Conference on Computer Vision and Pattern Recognition (CVPR 2017)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Convolutional networks for image classification progressively reduce
resolution until the image is represented by tiny feature maps in which the
spatial structure of the scene is no longer discernible. Such loss of spatial
acuity can limit image classification accuracy and complicate the transfer of
the model to downstream applications that require detailed scene understanding.
These problems can be alleviated by dilation, which increases the resolution of
output feature maps without reducing the receptive field of individual neurons.
We show that dilated residual networks (DRNs) outperform their non-dilated
counterparts in image classification without increasing the model’s depth or
complexity. We then study gridding artifacts introduced by dilation, develop an
approach to removing these artifacts (`degridding’), and show that this further
increases the performance of DRNs. In addition, we show that the accuracy
advantage of DRNs is further magnified in downstream applications such as
object localization and semantic segmentation.

Multi-channel Weighted Nuclear Norm Minimization for Real Color Image Denoising

Jun Xu, Lei Zhang, David Zhang, Xiangchu Feng
Comments: 9 pages, 4 figures, submitted
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Most of the existing denoising algorithms are developed for grayscale images,
while it is not a trivial work to extend them for color image denoising because
the noise statistics in R, G, B channels can be very different for real noisy
images. In this paper, we propose a multi-channel (MC) optimization model for
real color image denoising under the weighted nuclear norm minimization (WNNM)
framework. We concatenate the RGB patches to make use of the channel
redundancy, and introduce a weight matrix to balance the data fidelity of the
three channels in consideration of their different noise statistics. The
proposed MC-WNNM model does not have an analytical solution. We reformulate it
into a linear equality-constrained problem and solve it with the alternating
direction method of multipliers. Each alternative updating step has closed-form
solution and the convergence can be guaranteed. Extensive experiments on both
synthetic and real noisy image datasets demonstrate the superiority of the
proposed MC-WNNM over state-of-the-art denoising methods.

Continuous Video to Simple Signals for Swimming Stroke Detection with Convolutional Neural Networks

Brandon Victor, Zhen He, Stuart Morgan, Dino Miniutti
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In many sports, it is useful to analyse video of an athlete in competition
for training purposes. In swimming, stroke rate is a common metric used by
coaches; requiring a laborious labelling of each individual stroke. We show
that using a Convolutional Neural Network (CNN) we can automatically detect
discrete events in continuous video (in this case, swimming strokes). We create
a CNN that learns a mapping from a window of frames to a point on a smooth 1D
target signal, with peaks denoting the location of a stroke, evaluated as a
sliding window. To our knowledge this process of training and utilizing a CNN
has not been investigated before; either in sports or fundamental computer
vision research. Most research has been focused on action recognition and using
it to classify many clips in continuous video for action localisation.

In this paper we demonstrate our process works well on the task of detecting
swimming strokes in the wild. However, without modifying the model architecture
or training method, the process is also shown to work equally well on detecting
tennis strokes, implying that this is a general process.

The outputs of our system are surprisingly smooth signals that predict an
arbitrary event at least as accurately as humans (manually evaluated from a
sample of negative results). A number of different architectures are evaluated,
pertaining to slightly different problem formulations and signal targets.

Care about you: towards large-scale human-centric visual relationship detection

Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, Anton van den Hengel
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Visual relationship detection aims to capture interactions between pairs of
objects in images. Relationships between objects and humans represent a
particularly important subset of this problem, with implications for challenges
such as understanding human behaviour, and identifying affordances, amongst
others. In addressing this problem we first construct a large-scale
human-centric visual relationship detection dataset (HCVRD), which provides
many more types of relationship annotation (nearly 10K categories) than the
previous released datasets.

This large label space better reflects the reality of human-object
interactions, but gives rise to a long-tail distribution problem, which in turn
demands a zero-shot approach to labels appearing only in the test set. This is
the first time this issue has been addressed. We propose a webly-supervised
approach to these problems and demonstrate that the proposed model provides a
strong baseline on our HCVRD dataset.

Cross-modal Subspace Learning for Fine-grained Sketch-based Image Retrieval

Peng Xu, Qiyue Yin, Yongye Huang, Yi-Zhe Song, Zhanyu Ma, Liang Wang, Tao Xiang, W. Bastiaan Kleijn, Jun Guo
Comments: Accepted by Neurocomputing
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Sketch-based image retrieval (SBIR) is challenging due to the inherent
domain-gap between sketch and photo. Compared with pixel-perfect depictions of
photos, sketches are iconic renderings of the real world with highly abstract.
Therefore, matching sketch and photo directly using low-level visual clues are
unsufficient, since a common low-level subspace that traverses semantically
across the two modalities is non-trivial to establish. Most existing SBIR
studies do not directly tackle this cross-modal problem. This naturally
motivates us to explore the effectiveness of cross-modal retrieval methods in
SBIR, which have been applied in the image-text matching successfully. In this
paper, we introduce and compare a series of state-of-the-art cross-modal
subspace learning methods and benchmark them on two recently released
fine-grained SBIR datasets. Through thorough examination of the experimental
results, we have demonstrated that the subspace learning can effectively model
the sketch-photo domain-gap. In addition we draw a few key insights to drive
future research.

Vocabulary-informed Extreme Value Learning

Yanwei Fu, HanZe Dong, Yu-feng Ma, Zhengjun Zhang, Xiangyang Xue
Comments: nips 2017 submission
Subjects: Computer Vision and Pattern Recognition (cs.CV); Statistics Theory (math.ST); Machine Learning (stat.ML)

The novel unseen classes can be formulated as the extreme values of known
classes. This inspired the recent works on open-set recognition
cite{Scheirer_2013_TPAMI,Scheirer_2014_TPAMIb,EVM}, which however can have no
way of naming the novel unseen classes. To solve this problem, we propose the
Extreme Value Learning (EVL) formulation to learn the mapping from visual
feature to semantic space. To model the margin and coverage distributions of
each class, the Vocabulary-informed Learning (ViL) is adopted by using vast
open vocabulary in the semantic space. Essentially, by incorporating the EVL
and ViL, we for the first time propose a novel semantic embedding paradigm —
Vocabulary-informed Extreme Value Learning (ViEVL), which embeds the visual
features into semantic space in a probabilistic way. The learned embedding can
be directly used to solve supervised learning, zero-shot and open set
recognition simultaneously. Experiments on two benchmark datasets demonstrate
the effectiveness of proposed frameworks.

Person Depth ReID: Robust Person Re-identification with Commodity Depth Sensors

Nikolaos Karianakis, Zicheng Liu, Yinpeng Chen, Stefano Soatto
Comments: 13 pages, 6 figures, 5 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This work targets person re-identification (ReID) from depth sensors such as
Kinect. Since depth is invariant to illumination and less sensitive than color
to day-by-day appearance changes, a natural question is whether depth is an
effective modality for Person ReID, especially in scenarios where individuals
wear different colored clothes or over a period of several months. We explore
the use of recurrent Deep Neural Networks for learning high-level shape
information from low-resolution depth images. In order to tackle the small
sample size problem, we introduce regularization and a hard temporal attention
unit. The whole model can be trained end to end with a hybrid supervised loss.
We carry out a thorough experimental evaluation of the proposed method on three
person re-identification datasets, which include side views, views from the top
and sequences with varying degree of partial occlusion, pose and viewpoint
variations. To that end, we introduce a new dataset with RGB-D and skeleton
data. In a scenario where subjects are recorded after three months with new
clothes, we demonstrate large performance gains attained using Depth ReID
compared to a state-of-the-art Color ReID. Finally, we show further
improvements using the temporal attention unit in multi-shot setting.

Probabilistic Global Scale Estimation for MonoSLAM Based on Generic Object Detection

Edgar Sucar, Jean-Bernard Hayet
Comments: Int. Workshop on Visual Odometry, CVPR, (July 2017)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper proposes a novel method to estimate the global scale of a 3D
reconstructed model within a Kalman filtering-based monocular SLAM algorithm.
Our Bayesian framework integrates height priors over the detected objects
belonging to a set of broad predefined classes, based on recent advances in
fast generic object detection. Each observation is produced on single frames,
so that we do not need a data association process along video frames. This is
because we associate the height priors with the image region sizes at image
places where map features projections fall within the object detection regions.
We present very promising results of this approach obtained on several
experiments with different object classes.

Abnormality Detection and Localization in Chest X-Rays using Deep Convolutional Neural Networks

Mohammad Tariqul Islam, Md Abdul Aowal, Ahmed Tahseen Minhaz, Khalid Ashraf
Comments: 9 figures, 12 pages, 6 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Chest X-Rays are widely used for diagnosing abnormalities in the heart and
lung area. Automatically detecting these abnormalities with high accuracy could
greatly enhance real world diagnosis processes. Lack of standard publicly
available dataset and benchmark studies, however, makes it difficult to compare
various detection methods. In order to overcome these difficulties, we have
used a publicly available Indiana chest X-Ray dataset and studied the
performance of known deep convolutional network (DCN) architectures on
different abnormalities. We find that the same DCN architecture doesn’t perform
well across all abnormalities. Shallow features or earlier layers consistently
provide higher detection accuracy compared to deep features. We have also found
ensemble models to improve classification significantly compared to single
model. Combining these insight, we report the highest accuracy on chest X-Ray
abnormality detection on this dataset. Our localization experiments using these
trained classifiers show that for spatially spread out abnormalities like
cardiomegaly and pulmonary edema, the network can localize the abnormalities
successfully most of the time. We find that in the cardiomegaly classification
task, the deep learning method improves the accuracy by a staggering 17
percentage point compared to rule based methods. One remarkable result of the
cardiomegaly localization is that the heart and its surrounding region is most
responsible for cardiomegaly detection, in contrast to the rule based models
where the ratio of heart and lung area is used as the measure. We believe that
through deep learning based classification and localization, we will discover
many more interesting features in medical image diagnosis that are not
considered traditionally.

Global hard thresholding algorithms for joint sparse image representation and denoising

Reza Borhani, Jeremy Watt, Aggelos Katsaggelos
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Sparse coding of images is traditionally done by cutting them into small
patches and representing each patch individually over some dictionary given a
pre-determined number of nonzero coefficients to use for each patch. In lack of
a way to effectively distribute a total number (or global budget) of nonzero
coefficients across all patches, current sparse recovery algorithms distribute
the global budget equally across all patches despite the wide range of
differences in structural complexity among them. In this work we propose a new
framework for joint sparse representation and recovery of all image patches
simultaneously. We also present two novel global hard thresholding algorithms,
based on the notion of variable splitting, for solving the joint sparse model.
Experimentation using both synthetic and real data shows effectiveness of the
proposed framework for sparse image representation and denoising tasks.
Additionally, time complexity analysis of the proposed algorithms indicate high
scalability of both algorithms, making them favorable to use on large megapixel
images.

Nearest Neighbour Radial Basis Function Solvers for Deep Neural Networks

Benjamin J. Meyer, Ben Harwood, Tom Drummond
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present a radial basis function solver for convolutional neural networks
that can be directly applied to both image classification and distance metric
learning problems. Our method treats all training features from a deep neural
network as radial basis function centres and computes loss by summing the
influence of a feature’s nearby centres in the embedding space. Having a radial
basis function centred on each training feature is made scalable by treating it
as an approximate nearest neighbour search problem. End-to-end learning of the
network and solver is carried out, mapping high dimensional features into
clusters of the same class. This results in a well formed embedding space,
where semantically related instances are likely to be located near one another,
regardless of whether or not the network was trained on those classes. The same
loss function is used for both the metric learning and classification problems.
We show that our radial basis function solver sets state-of-the-art embedding
results on the Stanford Cars196 and CUB-200-2011 datasets. Additionally, we
show that when used as a classifier, our method outperforms a conventional
softmax classifier on the Caltech-256 object recognition dataset and the
fine-grained recognition dataset CUB-200-2011.

Deep Matching and Validation Network — An End-to-End Solution to Constrained Image Splicing Localization and Detection

Yue Wu, Wael AbdAlmageed, Prem Natarajan
Comments: 9 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)

Image splicing is a very common image manipulation technique that is
sometimes used for malicious purposes. A splicing detec- tion and localization
algorithm usually takes an input image and produces a binary decision
indicating whether the input image has been manipulated, and also a
segmentation mask that corre- sponds to the spliced region. Most existing
splicing detection and localization pipelines suffer from two main
shortcomings: 1) they use handcrafted features that are not robust against
subsequent processing (e.g., compression), and 2) each stage of the pipeline is
usually optimized independently. In this paper we extend the formulation of the
underlying splicing problem to consider two input images, a query image and a
potential donor image. Here the task is to estimate the probability that the
donor image has been used to splice the query image, and obtain the splicing
masks for both the query and donor images. We introduce a novel deep
convolutional neural network architecture, called Deep Matching and Validation
Network (DMVN), which simultaneously localizes and detects image splicing. The
proposed approach does not depend on handcrafted features and uses raw input
images to create deep learned representations. Furthermore, the DMVN is
end-to-end op- timized to produce the probability estimates and the
segmentation masks. Our extensive experiments demonstrate that this approach
outperforms state-of-the-art splicing detection methods by a large margin in
terms of both AUC score and speed.

CASENet: Deep Category-Aware Semantic Edge Detection

Zhiding Yu, Chen Feng, Ming-Yu Liu, Srikumar Ramalingam
Comments: Accepted to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Boundary and edge cues are highly beneficial in improving a wide variety of
vision tasks such as semantic segmentation, object recognition, stereo, and
object proposal generation. Recently, the problem of edge detection has been
revisited and significant progress has been made with deep learning. While
classical edge detection is a challenging binary problem in itself, the
category-aware semantic edge detection by nature is an even more challenging
multi-label problem. We model the problem such that each edge pixel can be
associated with more than one class as they appear in contours or junctions
belonging to two or more semantic classes. To this end, we propose a novel
end-to-end deep semantic edge learning architecture based on ResNet and a new
skip-layer architecture where category-wise edge activations at the top
convolution layer share and are fused with the same set of bottom layer
features. We then propose a multi-label loss function to supervise the fused
activations. We show that our proposed architecture benefits this problem with
better performance, and we outperform the current state-of-the-art semantic
edge detection methods by a large margin on standard data sets such as SBD and
Cityscapes.

Direct Estimation of Regional Wall Thicknesses via Residual Recurrent Neural Network

Wufeng Xue, Ilanit Ben Nachum, Sachin Pandey, James Warrington, Stephanie Leung, Shuo Li
Comments: To appear as an oral paper in IPMI2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Accurate estimation of regional wall thicknesses (RWT) of left ventricular
(LV) myocardium from cardiac MR sequences is of significant importance for
identification and diagnosis of cardiac disease. Existing RWT estimation still
relies on segmentation of LV myocardium, which requires strong prior
information and user interaction. No work has been devoted into direct
estimation of RWT from cardiac MR images due to the diverse shapes and
structures for various subjects and cardiac diseases, as well as the complex
regional deformation of LV myocardium during the systole and diastole phases of
the cardiac cycle. In this paper, we present a newly proposed Residual
Recurrent Neural Network (ResRNN) that fully leverages the spatial and temporal
dynamics of LV myocardium to achieve accurate frame-wise RWT estimation. Our
ResRNN comprises two paths: 1) a feed forward convolution neural network (CNN)
for effective and robust CNN embedding learning of various cardiac images and
preliminary estimation of RWT from each frame itself independently, and 2) a
recurrent neural network (RNN) for further improving the estimation by modeling
spatial and temporal dynamics of LV myocardium. For the RNN path, we design for
cardiac sequences a Circle-RNN to eliminate the effect of null hidden input for
the first time-step. Our ResRNN is capable of obtaining accurate estimation of
cardiac RWT with Mean Absolute Error of 1.44mm (less than 1-pixel error) when
validated on cardiac MR sequences of 145 subjects, evidencing its great
potential in clinical cardiac function assessment.

Towards Visual Ego-motion Learning in Robots

Sudeep Pillai, John J. Leonard
Comments: Conference paper; Submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017, Vancouver CA; 8 pages, 8 figures, 2 tables
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Many model-based Visual Odometry (VO) algorithms have been proposed in the
past decade, often restricted to the type of camera optics, or the underlying
motion manifold observed. We envision robots to be able to learn and perform
these tasks, in a minimally supervised setting, as they gain more experience.
To this end, we propose a fully trainable solution to visual ego-motion
estimation for varied camera optics. We propose a visual ego-motion learning
architecture that maps observed optical flow vectors to an ego-motion density
estimate via a Mixture Density Network (MDN). By modeling the architecture as a
Conditional Variational Autoencoder (C-VAE), our model is able to provide
introspective reasoning and prediction for ego-motion induced scene-flow.
Additionally, our proposed model is especially amenable to bootstrapped
ego-motion learning in robots where the supervision in ego-motion estimation
for a particular camera sensor can be obtained from standard navigation-based
sensor fusion strategies (GPS/INS and wheel-odometry fusion). Through
experiments, we show the utility of our proposed approach in enabling the
concept of self-supervised learning for visual ego-motion estimation in
autonomous robots.

LAP: a Linearize and Project Method for Solving Inverse Problems with Coupled Variables

James Herring, James Nagy, Lars Ruthotto
Comments: 21 pages, 6 figures, 3 tables
Subjects: Numerical Analysis (math.NA); Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (cs.NA); Optimization and Control (math.OC)

Many inverse problems involve two or more sets of variables that represent
different physical quantities but are tightly coupled with each other. For
example, image super-resolution requires joint estimation of image and motion
parameters from noisy measurements. Exploiting this structure is key for
efficiently solving large-scale problems to avoid, e.g., ill-conditioned
optimization problems.

In this paper, we present a new method called Linearize And Project (LAP)
that offers a flexible framework for solving inverse problems with coupled
variables. LAP is most promising for cases when the subproblem corresponding to
one of the variables is considerably easier to solve than the other. LAP is
based on a Gauss-Newton method, and thus after linearizing the residual, it
eliminates one block of variables through projection. Due to the linearization,
the block can be chosen freely and can represent quadratic as well as nonlinear
variables. Further, LAP supports direct, iterative, and hybrid regularization
as well as constraints. Therefore LAP is attractive, e.g., for ill-posed
imaging problems. These traits differentiate LAP from common alternatives for
this type of problems such as variable projection (VarPro) and block coordinate
descent (BCD). Our numerical experiments compare the performance of LAP to BCD
and VarPro using four coupled problems with one quadratic and one nonlinear set
of variables.

BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet

Binary Neural Networks (BNNs) can drastically reduce memory size and accesses
by applying bit-wise operations instead of standard arithmetic operations.
Therefore it could significantly improve the efficiency and lower the energy
consumption at runtime, which enables the application of state-of-the-art deep
learning models on low power devices. BMXNet is an open-source BNN library
based on MXNet, which supports both XNOR-Networks and Quantized Neural
Networks. The developed BNN layers can be seamlessly applied with other
standard library components and work in both GPU and CPU mode. BMXNet is
maintained and developed by the multimedia research group at Hasso Plattner
Institute and released under Apache license. Extensive experiments validate the
efficiency and effectiveness of our implementation. The BMXNet library, several
sample projects, and a collection of pre-trained binary deep models are
available for download at this https URL

PVEs: Position-Velocity Encoders for Unsupervised Learning of Structured State Representations

Rico Jonschkowski, Roland Hafner, Jonathan Scholz, Martin Riedmiller
Comments: Submitted to Robotics: Science and Systems (RSS 2017) Workshop New Frontiers for Deep Learning in Robotics this http URL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

We propose position-velocity encoders (PVEs) which learn—without
supervision—to encode images to positions and velocities of task-relevant
objects. PVEs encode a single image into a low-dimensional position state and
compute the velocity state from finite differences in position. In contrast to
autoencoders, position-velocity encoders are not trained by image
reconstruction, but by making the position-velocity representation consistent
with priors about interacting with the physical world. We applied PVEs to
several simulated control tasks from pixels and achieved promising preliminary
results.

LiDAR-Camera Calibration using 3D-3D Point correspondences

Ankit Dhall, Kunal Chelani, Vishnu Radhakrishnan, K.M. Krishna
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

With the advent of autonomous vehicles, LiDAR and cameras have become an
indispensable combination of sensors. They both provide rich and complementary
data which can be used by various algorithms and machine learning to sense and
make vital inferences about the surroundings. We propose a novel pipeline and
experimental setup to find accurate rigid-body transformation for extrinsically
calibrating a LiDAR and a camera. The pipeling uses 3D-3D point correspondences
in LiDAR and camera frame and gives a closed form solution. We further show the
accuracy of the estimate by fusing point clouds from two stereo cameras which
align perfectly with the rotation and translation estimated by our method,
confirming the accuracy of our method’s estimates both mathematically and
visually. Taking our idea of extrinsic LiDAR-camera calibration forward, we
demonstrate how two cameras with no overlapping field-of-view can also be
calibrated extrinsically using 3D point correspondences. The code has been made
available as open-source software in the form of a ROS package, more
information about which can be sought here:
this https URL .

A Multi-strength Adversarial Training Method to Mitigate Adversarial Attacks

Chang Song, Hsin-Pai Cheng, Chunpeng Wu, Hai Li, Yiran Chen, Qing Wu
Comments: Submitted to NIPS 2017
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Some recent works revealed that deep neural networks (DNNs) are vulnerable to
so-called adversarial attacks where input examples are intentionally perturbed
to fool DNNs. In this work, we revisit the DNN training process that includes
adversarial examples into the training dataset so as to improve DNN’s
resilience to adversarial attacks, namely, adversarial training. Our
experiments show that different adversarial strengths, i.e., perturbation
levels of adversarial examples, have different working zones to resist the
attack. Based on the observation, we propose a multi-strength adversarial
training method (MAT) that combines the adversarial training examples with
different adversarial strengths to defend adversarial attacks. Two training
structures – mixed MAT and parallel MAT – are developed to facilitate the
tradeoffs between training time and memory occupation. Our results show that
MAT can substantially minimize the accuracy degradation of deep learning
systems to adversarial attacks on MNIST, CIFAR-10, CIFAR-100, and SVHN.

Artificial Intelligence

Learning Belief Network Structure From Data under Causal Insufficiency

Mieczysław Kłopotek
Comments: A short version of this paper appeared in [Klopotek:94m] M.A. K{l}opotek: Learning Belief Network Structure From Data under Causal Insufficiency. [in:] F. Bergadano, L.DeRaed Eds.: Machine Learning ECML-94 , Proc. 13th European Conference on Machine Learning, Catania, Italy, 6-8 April 1994, Lecture Notes in Artificial Intelligence 784, Springer-Verlag, 1994, pp. 379-382
Subjects: Artificial Intelligence (cs.AI)

Though a belief network (a representation of the joint probability
distribution, see [3]) and a causal network (a representation of causal
relationships [14]) are intended to mean different things, they are closely
related. Both assume an underlying dag (directed acyclic graph) structure of
relations among variables and if Markov condition and faithfulness condition
[15] are met, then a causal network is in fact a belief network. The difference
comes to appearance when we recover belief network and causal network structure
from data.

A causal network structure may be impossible to recover completely from data
as not all directions of causal links may be uniquely determined [15].
Fortunately, if we deal with causally sufficient sets of variables (that is
whenever significant influence variables are not omitted from observation),
then there exists the possibility to identify the family of belief networks a
causal network belongs to [16]. Regrettably, to our knowledge, a similar result
is not directly known for causally insufficient sets of variables. Spirtes,
Glymour and Scheines developed a CI algorithm to handle this situation, but it
leaves some important questions open.

The big open question is whether or not the bidirectional edges (that is
indications of a common cause) are the only ones necessary to develop a belief
network out of the product of CI, or must there be some other hidden variables
added (e.g. by guessing). This paper is devoted to settling this question.

Automatic White-Box Testing of First-Order Logic Ontologies

Javier Álvez, Montserrat Hermo, Paqui Lucio, German Rigau
Comments: 41 pages
Subjects: Artificial Intelligence (cs.AI)

A long-standing dream of Artificial Intelligence (AI) has pursued to encode
commonsense knowledge into computer programs enabling machines to reason about
our world and problems. This work offers a new practical insight towards the
automatic testing of first-order logic (FOL) ontologies. We introduce a novel
fully automatic white-box testing framework for first-order logic (FOL)
ontologies. The application of the proposed testing method is fully automatic
since a) the automated generation of tests is only guided by the syntax of
axioms and b) the evaluation of tests is performed by automated theorem
provers. Our proposal enables the detection of defective axioms and,
additionally, it also serves to demonstrate the suitability for reasoning
purposes of those formulas included into FOL ontologies. We validate our
proposal by its practical application to different FOL ontologies. In
particular, DOLCE —consisting of around 200 axioms—, FPK (formal proof of
the Kepler conjecture) —which has been derived from the Flyspeck project for
its use in the CADE ATP System Competition CASC-J8—, and Adimen-SUMO —which
is an ontology with more than 7,000 axioms derived from SUMO—. As result, we
have detected several non-trivial defects that were hidden in those ontologies.
Further, we have obtained an improved version of Adimen-SUMO (v2.6) by
correcting all the defects detected during the practical application of our
white-box testing method.

Black-box Testing of First-Order Logic Ontologies Using WordNet

Javier Álvez, Paqui Lucio, German Rigau
Comments: 53 pages
Subjects: Artificial Intelligence (cs.AI)

A long-standing dream of Artificial Intelligence (AI) has pursued to enrich
computer programs with commonsense knowledge enabling machines to reason about
our world. This paper offers a new practical insight towards the automation of
commonsense reasoning with first-order logic (FOL) ontologies. We propose a new
black-box testing methodology of FOL SUMO-based ontologies by exploiting
WordNet and its mapping into SUMO. Our proposal includes a method for the
(semi-)automatic creation of a very large set of tests and a procedure for its
automated evaluation by using automated theorem provers (ATPs). Applying our
testing proposal, we are able to successfully evaluate a) the competency of
several translations of SUMO into FOL and b) the performance of various
automated ATPs. In addition, we are also able to evaluate the resulting set of
tests according to different quality criteria.

Machine Learned Learning Machines

Leigh Sheneman, Arend Hintze
Subjects: Artificial Intelligence (cs.AI)

There are two common approaches for optimizing the performance of a machine:
genetic algorithms and machine learning. A genetic algorithm is applied over
many generations whereas machine learning works by applying feedback until the
system meets a performance threshold. Though these are methods that typically
operate separately, we combine evolutionary adaptation and machine learning
into one approach. Our focus is on machines that can learn during their
lifetime, but instead of equipping them with a machine learning algorithm we
aim to let them evolve their ability to learn by themselves. We use evolvable
networks of probabilistic and deterministic logic gates, known as Markov
Brains, as our computational model organism. The ability of Markov Brains to
learn is augmented by a novel adaptive component that can change its
computational behavior based on feedback. We show that Markov Brains can indeed
evolve to incorporate these feedback gates to improve their adaptability to
variable environments. By combining these two methods, we now also implemented
a computational model that can be used to study the evolution of learning.

Abstract Argumentation / Persuasion / Dynamics

Ryuta Arisaka, Ken Satoh
Subjects: Artificial Intelligence (cs.AI)

The act of persuasion, a key component in rhetoric argumentation, may be
viewed as a dynamics modifier. Such modifiers are well-known in other research
fields: recall dynamic epistemic logic where operators modify possible world
accessibilities, or recall side effects and concurrency in programming
languages. We consider persuasion in abstract argumentation as undertaking a
similar role. We extend Dung’s frameworks with acts of persuasion among agents
into Abstract Persuasion Argumentation (APA), and set forth properties related
to arguments’ admissibilities. We show a way of enriching our basic notion of
admissibility through CTL (computation tree logic) encoding, which also permits
importation of the theoretical results known to the logic into our
argumentation frameworks.

Should Robots be Obedient?

Smitha Milli, Dylan Hadfield-Menell, Anca Dragan, Stuart Russell
Comments: Accepted to IJCAI 2017
Subjects: Artificial Intelligence (cs.AI)

Intuitively, obedience — following the order that a human gives — seems
like a good property for a robot to have. But, we humans are not perfect and we
may give orders that are not best aligned to our preferences. We show that when
a human is not perfectly rational then a robot that tries to infer and act
according to the human’s underlying preferences can always perform better than
a robot that simply follows the human’s literal order. Thus, there is a
tradeoff between the obedience of a robot and the value it can attain for its
owner. We investigate how this tradeoff is impacted by the way the robot infers
the human’s preferences, showing that some methods err more on the side of
obedience than others. We then analyze how performance degrades when the robot
has a misspecified model of the features that the human cares about or the
level of rationality of the human. Finally, we study how robots can start
detecting such model misspecification. Overall, our work suggests that there
might be a middle ground in which robots intelligently decide when to obey
human orders, but err on the side of obedience.

Probabilistic Program Abstractions

Steven Holtzen, Todd Millstein, Guy Van den Broeck
Subjects: Artificial Intelligence (cs.AI)

Abstraction is a fundamental tool for reasoning about complex systems.
Program abstraction has been utilized to great effect for analyzing
deterministic programs. At the heart of pro- gram abstraction is the
relationship between a concrete program, which is difficult to analyze, and an
abstraction, which is more tractable. We generalize non-deterministic program
abstractions to probabilistic program abstractions by explicitly quantifying
the non-deterministic choices made by traditional program abstractions. We
upgrade key theoretical program abstraction insights to the probabilistic
context. Probabilistic program abstractions provide avenues for utilizing
abstraction techniques from the programming languages community to improve the
analysis of probabilistic programs.

Bayesian Unification of Gradient and Bandit-based Learning for Accelerated Global Optimisation

Ole-Christoffer Granmo
Comments: 15th IEEE International Conference on Machine Learning and Applications (ICMLA 2016)
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

Bandit based optimisation has a remarkable advantage over gradient based
approaches due to their global perspective, which eliminates the danger of
getting stuck at local optima. However, for continuous optimisation problems or
problems with a large number of actions, bandit based approaches can be
hindered by slow learning. Gradient based approaches, on the other hand,
navigate quickly in high-dimensional continuous spaces through local
optimisation, following the gradient in fine grained steps. Yet, apart from
being susceptible to local optima, these schemes are less suited for online
learning due to their reliance on extensive trial-and-error before the optimum
can be identified. In this paper, we propose a Bayesian approach that unifies
the above two paradigms in one single framework, with the aim of combining
their advantages. At the heart of our approach we find a stochastic linear
approximation of the function to be optimised, where both the gradient and
values of the function are explicitly captured. This allows us to learn from
both noisy function and gradient observations, and predict these properties
across the action space to support optimisation. We further propose an
accompanying bandit driven exploration scheme that uses Bayesian credible
bounds to trade off exploration against exploitation. Our empirical results
demonstrate that by unifying bandit and gradient based learning, one obtains
consistently improved performance across a wide spectrum of problem
environments. Furthermore, even when gradient feedback is unavailable, the
flexibility of our model, including gradient prediction, still allows us
outperform competing approaches, although with a smaller margin. Due to the
pervasiveness of bandit based optimisation, our scheme opens up for improved
performance both in meta-optimisation and in applications where gradient
related information is readily available.

Inexpensive Cost-Optimized Measurement Proposal for Sequential Model-Based Diagnosis

Patrick Rodler, Wolfgang Schmid, Konstantin Schekotihin
Subjects: Artificial Intelligence (cs.AI)

In this work we present strategies for (optimal) measurement selection in
model-based sequential diagnosis. In particular, assuming a set of leading
diagnoses being given, we show how queries (sets of measurements) can be
computed and optimized along two dimensions: expected number of queries and
cost per query. By means of a suitable decoupling of two optimizations and a
clever search space reduction the computations are done without any inference
engine calls. For the full search space, we give a method requiring only a
polynomial number of inferences and guaranteeing query properties existing
methods cannot provide. Evaluation results using real-world problems indicate
that the new method computes (virtually) optimal queries instantly
independently of the size and complexity of the considered diagnosis problems.

Quadratic Unconstrained Binary Optimization Problem Preprocessing: Theory and Empirical Analysis

Mark Lewis, Fred Glover
Comments: Benchmark problems used are available from the first author
Subjects: Artificial Intelligence (cs.AI)

The Quadratic Unconstrained Binary Optimization problem (QUBO) has become a
unifying model for representing a wide range of combinatorial optimization
problems, and for linking a variety of disciplines that face these problems. A
new class of quantum annealing computer that maps QUBO onto a physical qubit
network structure with specific size and edge density restrictions is
generating a growing interest in ways to transform the underlying QUBO
structure into an equivalent graph having fewer nodes and edges. In this paper
we present rules for reducing the size of the QUBO matrix by identifying
variables whose value at optimality can be predetermined. We verify that the
reductions improve both solution quality and time to solution and, in the case
of metaheuristic methods where optimal solutions cannot be guaranteed, the
quality of solutions obtained within reasonable time limits.

We discuss the general QUBO structural characteristics that can take
advantage of these reduction techniques and perform careful experimental design
and analysis to identify and quantify the specific characteristics most
affecting reduction. The rules make it possible to dramatically improve
solution times on a new set of problems using both the exact Cplex solver and a
tabu search metaheuristic.

Multi-shot ASP solving with clingo

Martin Gebser, Roland Kaminski, Benjamin Kaufmann, Torsten Schaub
Comments: Submitted to TPLP
Subjects: Artificial Intelligence (cs.AI)

We introduce a new flexible paradigm of grounding and solving in Answer Set
Programming (ASP), which we refer to as multi-shot ASP solving, and present its
implementation in the ASP system clingo.

Multi-shot ASP solving features grounding and solving processes that deal
with continuously changing logic programs. In doing so, they remain operative
and accommodate changes in a seamless way. For instance, such processes allow
for advanced forms of search, as in optimization or theory solving, or
interaction with an environment, as in robotics or query-answering. Common to
them is that the problem specification evolves during the reasoning process,
either because data or constraints are added, deleted, or replaced. This
evolutionary aspect adds another dimension to ASP since it brings about state
changing operations. We address this issue by providing an operational
semantics that characterizes grounding and solving processes in multi-shot ASP
solving. This characterization provides a semantic account of grounder and
solver states along with the operations manipulating them.

The operative nature of multi-shot solving avoids redundancies in relaunching
grounder and solver programs and benefits from the solver’s learning
capacities. clingo accomplishes this by complementing ASP’s declarative input
language with control capacities. On the declarative side, a new directive
allows for structuring logic programs into named and parameterizable
subprograms. The grounding and integration of these subprograms into the
solving process is completely modular and fully controllable from the
procedural side. To this end, clingo offers a new application programming
interface that is conveniently accessible via scripting languages.

Contextual Explanation Networks

Maruan Al-Shedivat, Avinava Dubey, Eric P. Xing
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

We introduce contextual explanation networks (CENs)—a class of models that
learn to predict by generating and leveraging intermediate explanations. CENs
combine deep networks with context-specific probabilistic models and construct
explanations in the form of locally-correct hypotheses. Contrary to the
existing post-hoc model-explanation tools, CENs learn to predict and to explain
jointly. Our approach offers two major advantages: (i) for each prediction,
valid instance-specific explanations are generated with no computational
overhead and (ii) prediction via explanation acts as a regularization and
boosts performance in low-resource settings. We prove that local approximations
to the decision boundary of our networks are consistent with the generated
explanations. Our results on image and text classification and survival
analysis tasks demonstrate that CENs can easily match or outperform the
state-of-the-art while offering additional insights behind each prediction,
valuable for decision support.

Towards Visual Ego-motion Learning in Robots

Many model-based Visual Odometry (VO) algorithms have been proposed in the
past decade, often restricted to the type of camera optics, or the underlying
motion manifold observed. We envision robots to be able to learn and perform
these tasks, in a minimally supervised setting, as they gain more experience.
To this end, we propose a fully trainable solution to visual ego-motion
estimation for varied camera optics. We propose a visual ego-motion learning
architecture that maps observed optical flow vectors to an ego-motion density
estimate via a Mixture Density Network (MDN). By modeling the architecture as a
Conditional Variational Autoencoder (C-VAE), our model is able to provide
introspective reasoning and prediction for ego-motion induced scene-flow.
Additionally, our proposed model is especially amenable to bootstrapped
ego-motion learning in robots where the supervision in ego-motion estimation
for a particular camera sensor can be obtained from standard navigation-based
sensor fusion strategies (GPS/INS and wheel-odometry fusion). Through
experiments, we show the utility of our proposed approach in enabling the
concept of self-supervised learning for visual ego-motion estimation in
autonomous robots.

Latent Intention Dialogue Models

Developing a dialogue agent that is capable of making autonomous decisions
and communicating by natural language is one of the long-term goals of machine
learning research. Traditional approaches either rely on hand-crafting a small
state-action set for applying reinforcement learning that is not scalable or
constructing deterministic models for learning dialogue sentences that fail to
capture natural conversational variability. In this paper, we propose a Latent
Intention Dialogue Model (LIDM) that employs a discrete latent variable to
learn underlying dialogue intentions in the framework of neural variational
inference. In a goal-oriented dialogue scenario, these latent intentions can be
interpreted as actions guiding the generation of machine responses, which can
be further refined autonomously by reinforcement learning. The experimental
evaluation of LIDM shows that the model out-performs published benchmarks for
both corpus-based and human evaluation, demonstrating the effectiveness of
discrete latent variable models for learning goal-oriented dialogues.

Mining Process Model Descriptions of Daily Life through Event Abstraction

Niek Tax, Natalia Sidorova, Reinder Haakma, Wil M.P. van der Aalst
Comments: arXiv admin note: substantial text overlap with arXiv:1606.07283
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB)

Process mining techniques focus on extracting insight in processes from event
logs. Process mining has the potential to provide valuable insights in
(un)healthy habits and to contribute to ambient assisted living solutions when
applied on data from smart home environments. However, events recorded in smart
home environments are on the level of sensor triggers, at which process
discovery algorithms produce overgeneralizing process models that allow for too
much behavior and that are difficult to interpret for human experts. We show
that abstracting the events to a higher-level interpretation can enable
discovery of more precise and more comprehensible models. We present a
framework for the extraction of features that can be used for abstraction with
supervised learning methods that is based on the XES IEEE standard for event
logs. This framework can automatically abstract sensor-level events to their
interpretation at the human activity level, after training it on training data
for which both the sensor and human activity events are known. We demonstrate
our abstraction framework on three real-life smart home event logs and show
that the process models that can be discovered after abstraction are more
precise indeed.

Implicit Variational Inference with Kernel Density Ratio Fitting

Jiaxin Shi, Shengyang Sun, Jun Zhu
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Recent progress in variational inference has paid much attention to the
flexibility of variational posteriors. Work has been done to use implicit
distributions, i.e., distributions without tractable likelihoods as the
variational posterior. However, existing methods on implicit posteriors still
face challenges of noisy estimation and can hardly scale to high-dimensional
latent variable models. In this paper, we present an implicit variational
inference approach with kernel density ratio fitting that addresses these
challenges. As far as we know, for the first time implicit variational
inference is successfully applied to Bayesian neural networks, which shows
promising results on both regression and classification tasks.

Role Playing Learning for Socially Concomitant Mobile Robot Navigation

Mingming Li, Rui Jiang, Shuzhi Sam Ge, Tong Heng Lee
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

In this paper, we present the Role Playing Learning (RPL) scheme for a mobile
robot to navigate socially with its human companion in populated environments.
Neural networks (NN) are constructed to parameterize a stochastic policy that
directly maps sensory data collected by the robot to its velocity outputs,
while respecting a set of social norms. An efficient simulative learning
environment is built with maps and pedestrians trajectories collected from a
number of real-world crowd data sets. In each learning iteration, a robot
equipped with the NN policy is created virtually in the learning environment to
play itself as a companied pedestrian and navigate towards a goal in a socially
concomitant manner. Thus, we call this process Role Playing Learning, which is
formulated under a reinforcement learning (RL) framework. The NN policy is
optimized end-to-end using Trust Region Policy Optimization (TRPO), with
consideration of the imperfectness of robot’s sensor measurements. Simulative
and experimental results are provided to demonstrate the efficacy and
superiority of our method.

AMPNet: Asynchronous Model-Parallel Training for Dynamic Neural Networks

Alex Gaunt, Matthew Johnson, Maik Riechert, Daniel Tarlow, Ryota Tomioka, Dimitrios Vytiniotis, Sam Webster
Comments: 17 pages, 13 figures
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)

New types of machine learning hardware in development and entering the market
hold the promise of revolutionizing deep learning in a manner as profound as
GPUs. However, existing software frameworks and training algorithms for deep
learning have yet to evolve to fully leverage the capability of the new wave of
silicon. We already see the limitations of existing algorithms for models that
exploit structured input via complex and instance-dependent control flow, which
prohibits minibatching. We present an {em asynchronous model-parallel} (AMP)
training algorithm that is specifically motivated by training on networks of
interconnected devices. Through an implementation on multi-core CPUs, we show
that AMP training converges to the same accuracy as conventional synchronous
training algorithms in a similar number of epochs, but utilizes the available
hardware more efficiently even for small minibatch sizes, resulting in
significantly shorter overall training times. Our framework opens the door for
scaling up a new class of deep learning models that cannot be efficiently
trained today.

Good Semi-supervised Learning that Requires a Bad GAN

Zihang Dai, Zhilin Yang, Fan Yang, William W. Cohen, Ruslan Salakhutdinov
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

Semi-supervised learning methods based on generative adversarial networks
(GANs) obtained strong empirical results, but it is not clear 1) how the
discriminator benefits from joint training with a generator, and 2) why good
semi-supervised classification performance and a good generator cannot be
obtained at the same time. Theoretically, we show that given the discriminator
objective, good semisupervised learning indeed requires a bad generator, and
propose the definition of a preferred generator. Empirically, we derive a novel
formulation based on our analysis that substantially improves over feature
matching GANs, obtaining state-of-the-art results on multiple benchmark
datasets.

Multiple Source Domain Adaptation with Adversarial Training of Neural Networks

Han Zhao, Shanghang Zhang, Guanhang Wu, João P. Costeira, José M. F. Moura, Geoffrey J. Gordon
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

We propose a new generalization bound for domain adaptation when there are
multiple source domains with labeled instances and one target domain with
unlabeled instances. The new bound has an interesting interpretation and
reduces to an existing bound when there is only one source domain. Compared
with existing bounds, the new bound does not require expert knowledge about the
target distribution, nor the optimal combination rule for multisource domains.
Interestingly, our theory also leads to an efficient implementation using
adversarial neural networks: we show how to interpret it as learning feature
representations that are invariant to the multiple domain shifts while still
being discriminative for the learning task. To this end, we propose two models,
both of which we call multisource domain adversarial networks (MDANs): the
first model optimizes directly our bound, while the second model is a smoothed
approximation of the first one, leading to a more data-efficient and
task-adaptive model. The optimization tasks of both models are minimax saddle
point problems that can be optimized by adversarial training. To demonstrate
the effectiveness of MDANs, we conduct extensive experiments showing superior
adaptation performance on three real-world datasets: sentiment analysis, digit
classification, and vehicle counting.

Information Retrieval

KlusTree: Clustering Answer Trees from Keyword Search on Graphs

Madhulika Mohanty, Maya Ramanath
Comments: 16 pages, 2 Figures
Subjects: Information Retrieval (cs.IR)

Graph structured data on the web is now massive as well as diverse, ranging
from social networks, web graphs to knowledge-bases. Effectively querying this
graph structured data is non-trivial and has led to research in a variety of
directions — structured queries, keyword and natural language queries,
automatic translation of these queries to structured queries, etc. We are
concerned with a class of queries called relationship queries, which are
usually expressed as a set of keywords (each keyword denoting a named entity).
The results returned are a set of ranked trees, each of which denotes
relationships among the various keywords. The result list could consist of
hundreds of answers. The problem of keyword search on graphs has been explored
for over a decade now, but an important aspect that is not as extensively
studied is that of user experience. We propose KlusTree, which presents
clustered results to the users instead of a list of all the results. In our
approach, the result trees are represented using language models and are
clustered using JS divergence as a distance measure. We compare KlusTree with
the well-known approaches based on isomorphism and tree-edit distance based
clustering. The user evaluations show that KlusTree outperforms the other two
in providing better clustering, thereby enriching user experience, revealing
interesting patterns and improving result interpretation by the user.

Computation and Language

Who's to say what's funny? A computer using Language Models and Deep Learning, That's Who!

Xinru Yan, Ted Pedersen
Comments: 3 pages, Appears in the Proceedings of the Women and Underrepresented Minorities in Natural Language Processing Workshop (WiNLP-2017), July 30, 2017, Vancouver, BC
Subjects: Computation and Language (cs.CL)

Humor is a defining characteristic of human beings. Our goal is to develop
methods that automatically detect humorous statements and rank them on a
continuous scale. In this paper we report on results using a Language Model
approach, and outline our plans for using methods from Deep Learning.

Latent Intention Dialogue Models

Developing a dialogue agent that is capable of making autonomous decisions
and communicating by natural language is one of the long-term goals of machine
learning research. Traditional approaches either rely on hand-crafting a small
state-action set for applying reinforcement learning that is not scalable or
constructing deterministic models for learning dialogue sentences that fail to
capture natural conversational variability. In this paper, we propose a Latent
Intention Dialogue Model (LIDM) that employs a discrete latent variable to
learn underlying dialogue intentions in the framework of neural variational
inference. In a goal-oriented dialogue scenario, these latent intentions can be
interpreted as actions guiding the generation of machine responses, which can
be further refined autonomously by reinforcement learning. The experimental
evaluation of LIDM shows that the model out-performs published benchmarks for
both corpus-based and human evaluation, demonstrating the effectiveness of
discrete latent variable models for learning goal-oriented dialogues.

On Multilingual Training of Neural Dependency Parsers

We show that a recently proposed neural dependency parser can be improved by
joint training on multiple languages from the same family. The parser is
implemented as a deep neural network whose only input is orthographic
representations of words. In order to successfully parse, the network has to
discover how linguistically relevant concepts can be inferred from word
spellings. We analyze the representations of characters and words that are
learned by the network to establish which properties of languages were
accounted for. In particular we show that the parser has approximately learned
to associate Latin characters with their Cyrillic counterparts and that it can
group Polish and Russian words that have a similar grammatical function.
Finally, we evaluate the parser on selected languages from the Universal
Dependencies dataset and show that it is competitive with other recently
proposed state-of-the art methods, while having a simple structure.

An Automatic Contextual Analysis and Clustering Classifiers Ensemble approach to Sentiment Analysis

Murtadha Talib AL-Sharuee, Fei Liu, Mahardhika Pratama
Comments: This article is submitted to a journal
Subjects: Computation and Language (cs.CL)

Products reviews are one of the major resources to determine the public
sentiment. The existing literature on reviews sentiment analysis mainly
utilizes supervised paradigm, which needs labeled data to be trained on and
suffers from domain-dependency. This article addresses these issues by
describes a completely automatic approach for sentiment analysis based on
unsupervised ensemble learning. The method consists of two phases. The first
phase is contextual analysis, which has five processes, namely (1) data
preparation; (2) spelling correction; (3) intensifier handling; (4) negation
handling and (5) contrast handling. The second phase comprises the unsupervised
learning approach, which is an ensemble of clustering classifiers using a
majority voting mechanism with different weight schemes. The base classifier of
the ensemble method is a modified k-means algorithm. The base classifier is
modified by extracting initial centroids from the feature set via using
SentWordNet (SWN). We also introduce new sentiment analysis problems of
Australian airlines and home builders which offer potential benchmark problems
in the sentiment analysis field. Our experiments on datasets from different
domains show that contextual analysis and the ensemble phases improve the
clustering performance in term of accuracy, stability and generalization
ability.

Dynamics of core of language vocabulary

Valery D. Solovyev, Vladimir V. Bochkarev, Anna V. Shevlyakova
Comments: This report was presented at the Workshop “Computational linguistics and language science”, Moscow, Russia on April 25, 2016
Subjects: Computation and Language (cs.CL)

Studies of the overall structure of vocabulary and its dynamics became
possible due to creation of diachronic text corpora, especially Google Books
Ngram. This article discusses the question of core change rate and the degree
to which the core words cover the texts. Different periods of the last three
centuries and six main European languages presented in Google Books Ngram are
compared. The main result is high stability of core change rate, which is
analogous to stability of the Swadesh list.

Supervised Complementary Entity Recognition with Augmented Key-value Pairs of Knowledge

Hu Xu, Lei Shu, Philip S. Yu
Subjects: Computation and Language (cs.CL)

Extracting opinion targets is an important task in sentiment analysis on
product reviews and complementary entities (products) are one important type of
opinion targets that may work together with the reviewed product. In this
paper, we address the problem of Complementary Entity Recognition (CER) as a
supervised sequence labeling with the capability of expanding domain knowledge
as key-value pairs from unlabeled reviews, by automatically learning and
enhancing knowledge-based features. We use Conditional Random Field (CRF) as
the base learner and augment CRF with knowledge-based features (called the
Knowledge-based CRF or KCRF for short). We conduct experiments to show that
KCRF effectively improves the performance of supervised CER task.

Subject Specific Stream Classification Preprocessing Algorithm for Twitter Data Stream

Nisansa de Silva, Danaja Maldeniya, Chamilka Wijeratne
Comments: 6 pages
Subjects: Computation and Language (cs.CL)

Micro-blogging service Twitter is a lucrative source for data mining
applications on global sentiment. But due to the omnifariousness of the
subjects mentioned in each data item; it is inefficient to run a data mining
algorithm on the raw data. This paper discusses an algorithm to accurately
classify the entire stream in to a given number of mutually exclusive
collectively exhaustive streams upon each of which the data mining algorithm
can be run separately yielding more relevant results with a high efficiency.

Deep Learning for User Comment Moderation

John Pavlopoulos, Prodromos Malakasiotis, Ion Androutsopoulos
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Experimenting with a new dataset of 1.6M user comments from a Greek news
portal and existing datasets of English Wikipedia comments, we show that an RNN
outperforms the previous state of the art in moderation. A deep,
classification-specific attention mechanism improves further the overall
performance of the RNN. We also compare against a CNN and a word-list baseline,
considering both fully automatic and semi-automatic moderation.

Neural Semantic Parsing by Character-based Translation: Experiments with Abstract Meaning Representations

Rik van Noord, Johan Bos
Comments: In review for CLIN Journal
Subjects: Computation and Language (cs.CL)

We evaluate the character-level translation method for neural semantic
parsing on a large corpus of sentences annotated with Abstract Meaning
Representations (AMRs). Using a seq2seq model, and some trivial preprocessing
and postprocessing of AMRs, we obtain a baseline accuracy of 53.1 (F-score on
AMR-triples). We examine four different approaches to improve this baseline
result: (i) reordering AMR branches to match the word order of the input
sentence increases performance to 58.3; (ii) adding part-of-speech tags
(automatically produced) to the input shows improvement as well (57.2); (iii)
So does the introduction of super characters (conflating frequent sequences of
characters to a single character), reaching 57.4; (iv) adding silver-standard
training data obtained by an off-the-shelf parser yields the biggest
improvement, resulting in an F-score of 64.0. Combining all four techniques
leads to an F-score of 69.0, which is state-of-the-art in AMR parsing. This is
remarkable because of the relatively simplicity of the approach: the only
explicit linguistic knowledge that we use are part-of-speech tags.

The placement of the head that maximizes predictability. An information theoretic approach

Ramon Ferrer-i-Cancho
Subjects: Computation and Language (cs.CL); Adaptation and Self-Organizing Systems (nlin.AO); Physics and Society (physics.soc-ph); Neurons and Cognition (q-bio.NC)

The minimization of the length of syntactic dependencies is a well-stablished
principle of word order and the basis of a mathematical theory of word order.
Here we complete that theory from the perspective of information theory, adding
a competing word order principle: the maximization of predictability of a
target element. These two principles are in conflict: to maximize the
predictability of the head, the head should appear last, which maximizes the
costs with respect to dependency length minimization. The implications of such
a broad theoretical framework to understand the optimality, diversity and
evolution of the six possible orderings of subject, object and verb are
reviewed.

Listen, Interact and Talk: Learning to Speak via Interaction

Haichao Zhang, Haonan Yu, Wei Xu
Subjects: Computation and Language (cs.CL)

One of the long-term goals of artificial intelligence is to build an agent
that can communicate intelligently with human in natural language. Most
existing work on natural language learning relies heavily on training over a
pre-collected dataset with annotated labels, leading to an agent that
essentially captures the statistics of the fixed external training data. As the
training data is essentially a static snapshot representation of the knowledge
from the annotator, the agent trained this way is limited in adaptiveness and
generalization of its behavior. Moreover, this is very different from the
language learning process of humans, where language is acquired during
communication by taking speaking action and learning from the consequences of
speaking action in an interactive manner. This paper presents an interactive
setting for grounded natural language learning, where an agent learns natural
language by interacting with a teacher and learning from feedback, thus
learning and improving language skills while taking part in the conversation.
To achieve this goal, we propose a model which incorporates both imitation and
reinforcement by leveraging jointly sentence and reward feedbacks from the
teacher. Experiments are conducted to validate the effectiveness of the
proposed approach.

Understanding Abuse: A Typology of Abusive Language Detection Subtasks

Zeerak Waseem, Thomas Davidson, Dana Warmsley, Ingmar Weber
Comments: To appear in the proceedings of the 1st Workshop on Abusive Language Online. Please cite that version
Subjects: Computation and Language (cs.CL)

As the body of research on abusive language detection and analysis grows,
there is a need for critical consideration of the relationships between
different subtasks that have been grouped under this label. Based on work on
hate speech, cyberbullying, and online abuse we propose a typology that
captures central similarities and differences between subtasks and we discuss
its implications for data annotation and feature construction. We emphasize the
practical actions that can be taken by researchers to best approach their
abusive language detection subtask of interest.

On the relation between dependency distance, crossing dependencies, and parsing. Comment on "Dependency distance: a new perspective on syntactic patterns in natural languages" by Haitao Liu et al

Carlos Gómez-Rodríguez
Comments: This is a comment on the article “Dependency distance: a new perspective on syntactic patterns in natural languages”, by Haitao Liu et al. Submitted to Physics of Life Reviews
Subjects: Computation and Language (cs.CL)

Liu et al. (2017) provide a comprehensive account of research on dependency
distance in human languages. While the article is a very rich and useful report
on this complex subject, here I will expand on a few specific issues where
research in computational linguistics (specifically natural language
processing) can inform DDM research, and vice versa. These aspects have not
been explored much in the article by Liu et al. or elsewhere, probably due to
the little overlap between both research communities, but they may provide
interesting insights for improving our understanding of the evolution of human
languages, the mechanisms by which the brain processes and understands
language, and the construction of effective computer systems to achieve this
goal.

word2vec Skip-Gram with Negative Sampling is a Weighted Logistic PCA

Andrew J. Landgraf, Jeremy Bellay
Subjects: Computation and Language (cs.CL); Machine Learning (stat.ML)

We show that the skip-gram formulation of word2vec trained with negative
sampling is equivalent to a weighted logistic PCA. This connection allows us to
better understand the objective, compare it to other word embedding methods,
and extend it to higher dimensional models.

Semi-Supervised Model Training for Unbounded Conversational Speech Recognition

Shane Walker, Morten Pedersen, Iroro Orife, Jason Flaks
Subjects: Computation and Language (cs.CL)

For conversational large-vocabulary continuous speech recognition (LVCSR)
tasks, up to about two thousand hours of audio is commonly used to train state
of the art models. Collection of labeled conversational audio however, is
prohibitively expensive, laborious and error-prone. Furthermore, academic
corpora like Fisher English (2004) or Switchboard (1992) are inadequate to
train models with sufficient accuracy in the unbounded space of conversational
speech. These corpora are also timeworn due to dated acoustic telephony
features and the rapid advancement of colloquial vocabulary and idiomatic
speech over the last decades. Utilizing the colossal scale of our unlabeled
telephony dataset, we propose a technique to construct a modern, high quality
conversational speech training corpus on the order of hundreds of millions of
utterances (or tens of thousands of hours) for both acoustic and language model
training. We describe the data collection, selection and training, evaluating
the results of our updated speech recognition system on a test corpus of 7K
manually transcribed utterances. We show relative word error rate (WER)
reductions of {35%, 19%} on {agent, caller} utterances over our seed model and
5% absolute WER improvements over IBM Watson STT on this conversational speech
task.

A Deep Multi-View Learning Framework for City Event Extraction from Twitter Data Streams

Nazli Farajidavar, Sefki Kolozali, Payam Barnaghi
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL)

Cities have been a thriving place for citizens over the centuries due to
their complex infrastructure. The emergence of the Cyber-Physical-Social
Systems (CPSS) and context-aware technologies boost a growing interest in
analysing, extracting and eventually understanding city events which
subsequently can be utilised to leverage the citizen observations of their
cities. In this paper, we investigate the feasibility of using Twitter textual
streams for extracting city events. We propose a hierarchical multi-view deep
learning approach to contextualise citizen observations of various city systems
and services. Our goal has been to build a flexible architecture that can learn
representations useful for tasks, thus avoiding excessive task-specific feature
engineering. We apply our approach on a real-world dataset consisting of event
reports and tweets of over four months from San Francisco Bay Area dataset and
additional datasets collected from London. The results of our evaluations show
that our proposed solution outperforms the existing models and can be used for
extracting city related events with an averaged accuracy of 81% over all
classes. To further evaluate the impact of our Twitter event extraction model,
we have used two sources of authorised reports through collecting road traffic
disruptions data from Transport for London API, and parsing the Time Out London
website for sociocultural events. The analysis showed that 49.5% of the Twitter
traffic comments are reported approximately five hours prior to the authorities
official records. Moreover, we discovered that amongst the scheduled
sociocultural event topics; tweets reporting transportation, cultural and
social events are 31.75% more likely to influence the distribution of the
Twitter comments than sport, weather and crime topics.

Multiplex model of mental lexicon reveals explosive learning in humans

Massimo Stella, Nicole M. Beckage, Markus Brede, Manlio De Domenico
Comments: 11 pages, 4 figures and 1 table
Subjects: Physics and Society (physics.soc-ph); Computation and Language (cs.CL); Social and Information Networks (cs.SI); Adaptation and Self-Organizing Systems (nlin.AO)

Similarities among words affect language acquisition and processing in a
multi-relational way barely accounted for in the literature. We propose a
multiplex network representation of word similarities in a mental lexicon as a
natural framework for investigating large-scale cognitive patterns. Our model
accounts for semantic, taxonomic, and phonological interactions and identifies
a cluster of words of higher frequency, easier to identify, memorise and learn
and with more meanings than expected at random. This cluster emerges around age
7 yr through an explosive transition not reproduced by null models. We relate
this phenomenon to polysemy, i.e. redundancy in word meanings. We show that the
word cluster acts as a core for the lexicon, increasing both its navigability
and robustness to degradation in cognitive impairments. Our findings provide
quantitative confirmation of existing psycholinguistic conjectures about core
structure in the mental lexicon and the importance of integrating
multi-relational word-word interactions in suitable frameworks.

Community Identity and User Engagement in a Multi-Community Landscape

Justine Zhang, William L. Hamilton, Cristian Danescu-Niculescu-Mizil, Dan Jurafsky, Jure Leskovec
Comments: 10 page, 3 figures, To appear in the Proceedings of the 11th International Conference On Web And Social Media, ICWSM 2017; this version has subtle differences with the proceedings version, including an introductory quote
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Computers and Society (cs.CY); Physics and Society (physics.soc-ph)

A community’s identity defines and shapes its internal dynamics. Our current
understanding of this interplay is mostly limited to glimpses gathered from
isolated studies of individual communities. In this work we provide a
systematic exploration of the nature of this relation across a wide variety of
online communities. To this end we introduce a quantitative, language-based
typology reflecting two key aspects of a community’s identity: how distinctive,
and how temporally dynamic it is. By mapping almost 300 Reddit communities into
the landscape induced by this typology, we reveal regularities in how patterns
of user engagement vary with the characteristics of a community.

Our results suggest that the way new and existing users engage with a
community depends strongly and systematically on the nature of the collective
identity it fosters, in ways that are highly consequential to community
maintainers. For example, communities with distinctive and highly dynamic
identities are more likely to retain their users. However, such niche
communities also exhibit much larger acculturation gaps between existing users
and newcomers, which potentially hinder the integration of the latter.

More generally, our methodology reveals differences in how various social
phenomena manifest across communities, and shows that structuring the
multi-community landscape can lead to a better understanding of the systematic
nature of this diversity.

Distributed, Parallel, and Cluster Computing

Increasing the Efficiency of Sparse Matrix-Matrix Multiplication with a 2.5D Algorithm and One-Sided MPI

Alfio Lazzaro, Joost VandeVondele, Juerg Hutter, Ole Schuett
Comments: In Proceedings of PASC ’17, Lugano, Switzerland, June 26-28, 2017, 10 pages, 4 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Matrix-matrix multiplication is a basic operation in linear algebra and an
essential building block for a wide range of algorithms in various scientific
fields. Theory and implementation for the dense, square matrix case are
well-developed. If matrices are sparse, with application-specific sparsity
patterns, the optimal implementation remains an open question. Here, we explore
the performance of communication reducing 2.5D algorithms and one-sided MPI
communication in the context of linear scaling electronic structure theory. In
particular, we extend the DBCSR sparse matrix library, which is the basic
building block for linear scaling electronic structure theory and low scaling
correlated methods in CP2K. The library is specifically designed to efficiently
perform block-sparse matrix-matrix multiplication of matrices with a relatively
large occupation. Here, we compare the performance of the original
implementation based on Cannon’s algorithm and MPI point-to-point
communication, with an implementation based on MPI one-sided communications
(RMA), in both a 2D and a 2.5D approach. The 2.5D approach trades memory and
auxiliary operations for reduced communication, which can lead to a speedup if
communication is dominant. The 2.5D algorithm is somewhat easier to implement
with one-sided communications. A detailed description of the implementation is
provided, also for non ideal processor topologies, since this is important for
actual applications. Given the importance of the precise sparsity pattern, and
even the actual matrix data, which decides the effective fill-in upon
multiplication, the tests are performed within the CP2K package with
application benchmarks. Results show a substantial boost in performance for the
RMA based 2.5D algorithm, up to 1.80x, which is observed to increase with the
number of involved processes in the parallelization.

Dependency-Aware Rollback and Checkpoint-Restart for Distributed Task-Based Runtimes

Kiril Dichev, Herbert Jordan, Konstantinos Tovletoglou, Thomas Heller, Dimitrios S. Nikolopoulos, Georgios Karakonstantis, Charles Gillan
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

With the increase in compute nodes in large compute platforms, a proportional
increase in node failures will follow. Many application-based
checkpoint/restart (C/R) techniques have been proposed for MPI applications to
target the reduced mean time between failures. However, rollback as part of the
recovery remains a dominant cost even in highly optimised MPI applications
employing C/R techniques. Continuing execution past a checkpoint (that is,
reducing rollback) is possible in message-passing runtimes, but extremely
complex to design and implement. Our work focuses on task-based runtimes, where
task dependencies are explicit and message passing is implicit. We see an
opportunity for reducing rollback for such runtimes: we explore task
dependencies in the rollback, which we call dependency-aware rollback. We also
design a new C/R technique, which is influenced by recursive decomposition of
tasks, and combine it with dependency-aware rollback. We expect the
dependency-aware rollback to cancel and recompute less tasks in the presence of
node failures. We describe, implement and validate the proposed protocol in a
simulator, which confirms these expectations. In addition, we consistently
observe faster overall execution time for dependency-aware rollback in the
presence of faults, despite the fact that reduced task cancellation does not
guarantee reduced overall execution time.

Deterministic subgraph detection in broadcast CONGEST

Janne H. Korhonen, Joel Rybicki
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

We present simple deterministic algorithms for subgraph finding and
enumeration in the broadcast CONGEST model of distributed computation:

— For any constant (k), detecting (k)-paths and trees on (k) nodes can be
done in constantly many rounds, and (k)-cycles in (O(n)) rounds.

— On (d)-degenerate graphs, cliques and (4)-cycles can be enumerated in (O(d
+ log n)) rounds, and (5)-cycles in (O(d^2 + log n)) rounds.

In many cases, these bounds are tight up to logarithmic factors. Moreover, we
show that the algorithms for (d)-degenerate graphs can be improved to optimal
complexity (O(d/log n)) and (O(d^2/log n)), respectively, in the supported
CONGEST model, which can be seen as an intermediate model between CONGEST and
the congested clique.

Fully distributed PageRank computation with exponential convergence

Liang Dai, Nikolaos M. Freris
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (cs.SY)

This work studies a fully distributed algorithm for computing the PageRank
vector, which is inspired by the Matching Pursuit and features: 1) fully
distributed 2) expected converges with exponential rate 3) low storage
requirement (two scalar values per page). Illustrative experiments are
conducted to verify the findings.

A Unified Optimization Approach for Sparse Tensor Operations on GPUs

Bangtian Liu, Chengyao Wen, Anand D.Sarwate, Maryam Mehri Dehnavi
Subjects: Mathematical Software (cs.MS); Distributed, Parallel, and Cluster Computing (cs.DC)

Sparse tensors appear in many large-scale applications with multidimensional
and sparse data. While multidimensional sparse data often need to be processed
on manycore processors, attempts to develop highly-optimized GPU-based
implementations of sparse tensor operations are rare. The irregular computation
patterns and sparsity structures as well as the large memory footprints of
sparse tensor operations make such implementations challenging. We leverage the
fact that sparse tensor operations share similar computation patterns to
propose a unified tensor representation called F-COO. Combined with
GPU-specific optimizations, F-COO provides highly-optimized implementations of
sparse tensor computations on GPUs. The performance of the proposed unified
approach is demonstrated for tensor-based kernels such as the Sparse Matricized
Tensor- Times-Khatri-Rao Product (SpMTTKRP) and the Sparse Tensor- Times-Matrix
Multiply (SpTTM) and is used in tensor decomposition algorithms. Compared to
state-of-the-art work we improve the performance of SpTTM and SpMTTKRP up to
3.7 and 30.6 times respectively on NVIDIA Titan-X GPUs. We implement a
CANDECOMP/PARAFAC (CP) decomposition and achieve up to 14.9 times speedup using
the unified method over state-of-the-art libraries on NVIDIA Titan-X GPUs.

Spreading a Confirmed Rumor: A Case for Oscillatory Dynamics

Bartlomiej Dudek, Adrian Kosowski (GANG)
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC); Dynamical Systems (math.DS)

We consider an information spreading problem in which a population of (n)
agents is to determine, through random pairwise interactions, whether an
authoritative rumor source (X) is present in the population or not. The studied
problem is a generalization of the rumor spreading problem, in which we
additionally impose that the rumor should disappear when the rumor source no
longer exists. It is also a generalization of the self-stabilizing broadcasting
problem and has direct application to amplifying trace concentrations in
chemical reaction networks.We show that there exists a protocol such that,
starting from any possible initial state configuration, in the absence of a
rumor source all agents reach a designated “uninformed” state after (O(log^2
n)) rounds w.h.p., whereas in the presence of the rumor source, at any time
after at least (O(log n)) rounds from the moment (X) appears, at least ((1
-varepsilon)n) agents are in an “informed” state with probability (1 –
O(1/n)), where (varepsilon>0) may be arbitrarily fixed. The protocol uses a
constant number of states and its operation relies on an underlying oscillatory
dynamics with a closed limit orbit. On the negative side, we show that any
system which has such an ability to “suppress false rumors” in sub-polynomial
time must either exhibit significant and perpetual variations of opinion over
time in the presence of the rumor source, or use a super-constant number of
states.

AMPNet: Asynchronous Model-Parallel Training for Dynamic Neural Networks

New types of machine learning hardware in development and entering the market
hold the promise of revolutionizing deep learning in a manner as profound as
GPUs. However, existing software frameworks and training algorithms for deep
learning have yet to evolve to fully leverage the capability of the new wave of
silicon. We already see the limitations of existing algorithms for models that
exploit structured input via complex and instance-dependent control flow, which
prohibits minibatching. We present an {em asynchronous model-parallel} (AMP)
training algorithm that is specifically motivated by training on networks of
interconnected devices. Through an implementation on multi-core CPUs, we show
that AMP training converges to the same accuracy as conventional synchronous
training algorithms in a similar number of epochs, but utilizes the available
hardware more efficiently even for small minibatch sizes, resulting in
significantly shorter overall training times. Our framework opens the door for
scaling up a new class of deep learning models that cannot be efficiently
trained today.

Learning

Contextual Explanation Networks

Maruan Al-Shedivat, Avinava Dubey, Eric P. Xing
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

We introduce contextual explanation networks (CENs)—a class of models that
learn to predict by generating and leveraging intermediate explanations. CENs
combine deep networks with context-specific probabilistic models and construct
explanations in the form of locally-correct hypotheses. Contrary to the
existing post-hoc model-explanation tools, CENs learn to predict and to explain
jointly. Our approach offers two major advantages: (i) for each prediction,
valid instance-specific explanations are generated with no computational
overhead and (ii) prediction via explanation acts as a regularization and
boosts performance in low-resource settings. We prove that local approximations
to the decision boundary of our networks are consistent with the generated
explanations. Our results on image and text classification and survival
analysis tasks demonstrate that CENs can easily match or outperform the
state-of-the-art while offering additional insights behind each prediction,
valuable for decision support.

Boltzmann Exploration Done Right

Nicolò Cesa-Bianchi, Claudio Gentile, Gábor Lugosi, Gergely Neu
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Boltzmann exploration is a classic strategy for sequential decision-making
under uncertainty, and is one of the most standard tools in Reinforcement
Learning (RL). Despite its widespread use, there is virtually no theoretical
understanding about the limitations or the actual benefits of this exploration
scheme. Does it drive exploration in a meaningful way? Is it prone to
misidentifying the optimal actions or spending too much time exploring the
suboptimal ones? What is the right tuning for the learning rate? In this paper,
we address several of these questions in the classic setup of stochastic
multi-armed bandits. One of our main results is showing that the Boltzmann
exploration strategy with any monotone learning-rate sequence will induce
suboptimal behavior. As a remedy, we offer a simple non-monotone schedule that
guarantees near-optimal performance, albeit only when given prior access to key
problem parameters that are typically not available in practical situations
(like the time horizon (T) and the suboptimality gap (Delta)). More
importantly, we propose a novel variant that uses different learning rates for
different arms, and achieves a distribution-dependent regret bound of order
(frac{Klog^2 T}{Delta}) and a distribution-independent bound of order
(sqrt{KT}log K) without requiring such prior knowledge. To demonstrate the
flexibility of our technique, we also propose a variant that guarantees the
same performance bounds even if the rewards are heavy-tailed.

Deep Learning for Patient-Specific Kidney Graft Survival Analysis

Margaux Luck, Tristan Sylvain, Héloïse Cardinal, Andrea Lodi, Yoshua Bengio
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

An accurate model of patient-specific kidney graft survival distributions can
help to improve shared-decision making in the treatment and care of patients.
In this paper, we propose a deep learning method that directly models the
survival function instead of estimating the hazard function to predict survival
times for graft patients based on the principle of multi-task learning. By
learning to jointly predict the time of the event, and its rank in the cox
partial log likelihood framework, our deep learning approach outperforms, in
terms of survival time prediction quality and concordance index, other common
methods for survival analysis, including the Cox Proportional Hazards model and
a network trained on the cox partial log-likelihood.

Mining Process Model Descriptions of Daily Life through Event Abstraction

Process mining techniques focus on extracting insight in processes from event
logs. Process mining has the potential to provide valuable insights in
(un)healthy habits and to contribute to ambient assisted living solutions when
applied on data from smart home environments. However, events recorded in smart
home environments are on the level of sensor triggers, at which process
discovery algorithms produce overgeneralizing process models that allow for too
much behavior and that are difficult to interpret for human experts. We show
that abstracting the events to a higher-level interpretation can enable
discovery of more precise and more comprehensible models. We present a
framework for the extraction of features that can be used for abstraction with
supervised learning methods that is based on the XES IEEE standard for event
logs. This framework can automatically abstract sensor-level events to their
interpretation at the human activity level, after training it on training data
for which both the sensor and human activity events are known. We demonstrate
our abstraction framework on three real-life smart home event logs and show
that the process models that can be discovered after abstraction are more
precise indeed.

Kronecker Recurrent Units

Cijo Jose, Moustpaha Cisse, Francois Fleuret
Subjects: Learning (cs.LG)

Our work addresses two important issues with recurrent neural networks: (1)
they are over-parameterized, and (2) the recurrence matrix is ill-conditioned.
The former increases the sample complexity of learning and the training time.
The latter causes the vanishing and exploding gradient problem.

We present a flexible recurrent neural network model called Kronecker
Recurrent Units (KRU). KRU achieves parameter efficiency in RNNs through a
Kronecker factored recurrent matrix. It overcomes the ill-conditioning of the
recurrent matrix by enforcing soft unitary constraints on the factors. Thanks
to the small dimensionality of the factors, maintaining these constraints is
computationally efficient.

Our experimental results on five standard data-sets reveal that KRU can
reduce the number of parameters by three orders of magnitude in the recurrent
weight matrix compared to the existing recurrent models, without trading the
statistical performance.

These results in particular show that while there are advantages in having a
high dimensional recurrent space, the capacity of the recurrent part of the
model can be dramatically reduced.

Distributed Convolutional Sparse Coding

Thomas Moreau, Laurent Oudre, Nicolas Vayatis
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

We consider the problem of building shift-invariant representations for long
signals in the context of distributed processing. We propose an asynchronous
algorithm based on coordinate descent called DICOD to efficiently solve the
(ell_1)-minimization problems involved in convolutional sparse coding. This
algorithm leverages the weak temporal dependency of the convolution to reduce
the interprocess communication to a few local messages. We prove that this
algorithm converges to the optimal solution and that it scales with superlinear
speedup, up to a certain limit. These properties are illustrated with numerical
experiments and our algorithm is compared to the state-of-the-art methods used
for convolutional sparse coding.

Learning Network Structures from Contagion

Adisak Supeesun (Kasetsart University, Bangkok, Thailand), Jittat Fakcharoenphol (Kasetsart University, Bangkok, Thailand)
Journal-ref: Information Processing Letters, Volume 121, May 2017, Pages 11-16
Subjects: Learning (cs.LG); Social and Information Networks (cs.SI)

In 2014, Amin, Heidari, and Kearns proved that tree networks can be learned
by observing only the infected set of vertices of the contagion process under
the independent cascade model, in both the active and passive query models.
They also showed empirically that simple extensions of their algorithms work on
sparse networks. In this work, we focus on the active model. We prove that a
simple modification of Amin et al.’s algorithm works on more general classes of
networks, namely (i) networks with large girth and low path growth rate, and
(ii) networks with bounded degree. This also provides partial theoretical
explanation for Amin et al.’s experiments on sparse networks.

Improving the Expected Improvement Algorithm

Chao Qin, Diego Klabjan, Daniel Russo
Comments: Submitted to NIPS 2017
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

The expected improvement (EI) algorithm is a popular strategy for information
collection in optimization under uncertainty. The algorithm is widely known to
be too greedy, but nevertheless enjoys wide use due to its simplicity and
ability to handle uncertainty and noise in a coherent decision theoretic
framework. To provide rigorous insight into EI, we study its properties in a
simple setting of Bayesian optimization where the domain consists of a finite
grid of points. This is the so-called best-arm identification problem, where
the goal is to allocate measurement effort wisely to confidently identify the
best arm using a small number of measurements. In this framework, one can show
formally that EI is far from optimal. To overcome this shortcoming, we
introduce a simple modification of the expected improvement algorithm.
Surprisingly, this simple change results in an algorithm that is asymptotically
optimal for Gaussian best-arm identification problems, and provably outperforms
standard EI by an order of magnitude.

Learning Data Manifolds with a Cutting Plane Method

SueYeon Chung, Uri Cohen, Haim Sompolinsky, Daniel D. Lee
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

We consider the problem of classifying data manifolds where each manifold
represents invariances that are parameterized by continuous degrees of freedom.
Conventional data augmentation methods rely upon sampling large numbers of
training examples from these manifolds; instead, we propose an iterative
algorithm called M_{CP} based upon a cutting-plane approach that efficiently
solves a quadratic semi-infinite programming problem to find the maximum margin
solution. We provide a proof of convergence as well as a polynomial bound on
the number of iterations required for a desired tolerance in the objective
function. The efficiency and performance of M_{CP} are demonstrated in
high-dimensional simulations and on image manifolds generated from the ImageNet
dataset. Our results indicate that M_{CP} is able to rapidly learn good
classifiers and shows superior generalization performance compared with
conventional maximum margin methods using data augmentation methods.

Convergence Analysis of Two-layer Neural Networks with ReLU Activation

Yuanzhi Li, Yang Yuan
Subjects: Learning (cs.LG)

In recent years, stochastic gradient descent (SGD) based techniques has
become the standard tools for training neural networks. However, formal
theoretical understanding of why SGD can train neural networks in practice is
largely missing.

In this paper, we make progress on understanding this mystery by providing a
convergence analysis for SGD on a rich subset of two-layer feedforward networks
with ReLU activations. This subset is characterized by a special structure
called “identity mapping”. We prove that, if input follows from Gaussian
distribution, with standard (O(1/sqrt{d})) initialization of the weights, SGD
converges to the global minimum in polynomial number of steps. Unlike normal
vanilla networks, the “identity mapping” makes our network asymmetric and thus
the global minimum is unique. To complement our theory, we are also able to
show experimentally that multi-layer networks with this mapping have better
performance compared with normal vanilla networks.

Our convergence theorem differs from traditional non-convex optimization
techniques. We show that SGD converges to optimal in “two phases”: In phase I,
the gradient points to the wrong direction, however, a potential function (g)
gradually decreases. Then in phase II, SGD enters a nice one point convex
region and converges. We also show that the identity mapping is necessary for
convergence, as it moves the initial point to a better place for optimization.
Experiment verifies our claims.

BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet

Binary Neural Networks (BNNs) can drastically reduce memory size and accesses
by applying bit-wise operations instead of standard arithmetic operations.
Therefore it could significantly improve the efficiency and lower the energy
consumption at runtime, which enables the application of state-of-the-art deep
learning models on low power devices. BMXNet is an open-source BNN library
based on MXNet, which supports both XNOR-Networks and Quantized Neural
Networks. The developed BNN layers can be seamlessly applied with other
standard library components and work in both GPU and CPU mode. BMXNet is
maintained and developed by the multimedia research group at Hasso Plattner
Institute and released under Apache license. Extensive experiments validate the
efficiency and effectiveness of our implementation. The BMXNet library, several
sample projects, and a collection of pre-trained binary deep models are
available for download at this https URL

AMPNet: Asynchronous Model-Parallel Training for Dynamic Neural Networks

New types of machine learning hardware in development and entering the market
hold the promise of revolutionizing deep learning in a manner as profound as
GPUs. However, existing software frameworks and training algorithms for deep
learning have yet to evolve to fully leverage the capability of the new wave of
silicon. We already see the limitations of existing algorithms for models that
exploit structured input via complex and instance-dependent control flow, which
prohibits minibatching. We present an {em asynchronous model-parallel} (AMP)
training algorithm that is specifically motivated by training on networks of
interconnected devices. Through an implementation on multi-core CPUs, we show
that AMP training converges to the same accuracy as conventional synchronous
training algorithms in a similar number of epochs, but utilizes the available
hardware more efficiently even for small minibatch sizes, resulting in
significantly shorter overall training times. Our framework opens the door for
scaling up a new class of deep learning models that cannot be efficiently
trained today.

Good Semi-supervised Learning that Requires a Bad GAN

Zihang Dai, Zhilin Yang, Fan Yang, William W. Cohen, Ruslan Salakhutdinov
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

Semi-supervised learning methods based on generative adversarial networks
(GANs) obtained strong empirical results, but it is not clear 1) how the
discriminator benefits from joint training with a generator, and 2) why good
semi-supervised classification performance and a good generator cannot be
obtained at the same time. Theoretically, we show that given the discriminator
objective, good semisupervised learning indeed requires a bad generator, and
propose the definition of a preferred generator. Empirically, we derive a novel
formulation based on our analysis that substantially improves over feature
matching GANs, obtaining state-of-the-art results on multiple benchmark
datasets.

A Multi-strength Adversarial Training Method to Mitigate Adversarial Attacks

Chang Song, Hsin-Pai Cheng, Chunpeng Wu, Hai Li, Yiran Chen, Qing Wu
Comments: Submitted to NIPS 2017
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Some recent works revealed that deep neural networks (DNNs) are vulnerable to
so-called adversarial attacks where input examples are intentionally perturbed
to fool DNNs. In this work, we revisit the DNN training process that includes
adversarial examples into the training dataset so as to improve DNN’s
resilience to adversarial attacks, namely, adversarial training. Our
experiments show that different adversarial strengths, i.e., perturbation
levels of adversarial examples, have different working zones to resist the
attack. Based on the observation, we propose a multi-strength adversarial
training method (MAT) that combines the adversarial training examples with
different adversarial strengths to defend adversarial attacks. Two training
structures – mixed MAT and parallel MAT – are developed to facilitate the
tradeoffs between training time and memory occupation. Our results show that
MAT can substantially minimize the accuracy degradation of deep learning
systems to adversarial attacks on MNIST, CIFAR-10, CIFAR-100, and SVHN.

Multiple Source Domain Adaptation with Adversarial Training of Neural Networks

Han Zhao, Shanghang Zhang, Guanhang Wu, João P. Costeira, José M. F. Moura, Geoffrey J. Gordon
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

We propose a new generalization bound for domain adaptation when there are
multiple source domains with labeled instances and one target domain with
unlabeled instances. The new bound has an interesting interpretation and
reduces to an existing bound when there is only one source domain. Compared
with existing bounds, the new bound does not require expert knowledge about the
target distribution, nor the optimal combination rule for multisource domains.
Interestingly, our theory also leads to an efficient implementation using
adversarial neural networks: we show how to interpret it as learning feature
representations that are invariant to the multiple domain shifts while still
being discriminative for the learning task. To this end, we propose two models,
both of which we call multisource domain adversarial networks (MDANs): the
first model optimizes directly our bound, while the second model is a smoothed
approximation of the first one, leading to a more data-efficient and
task-adaptive model. The optimization tasks of both models are minimax saddle
point problems that can be optimized by adversarial training. To demonstrate
the effectiveness of MDANs, we conduct extensive experiments showing superior
adaptation performance on three real-world datasets: sentiment analysis, digit
classification, and vehicle counting.

Fisher GAN

Youssef Mroueh, Tom Sercu
Comments: submitted to NIPS 2017
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Generative Adversarial Networks (GANs) are powerful models for learning
complex distributions. Stable training of GANs has been addressed in many
recent works which explore different metrics between distributions. In this
paper we introduce Fisher GAN which fits within the Integral Probability
Metrics (IPM) framework for training GANs. Fisher GAN defines a critic with a
data dependent constraint on its second order moments. We show in this paper
that Fisher GAN allows for stable and time efficient training that does not
compromise the capacity of the critic, and does not need data independent
constraints such as weight clipping. We analyze our Fisher IPM theoretically
and provide an algorithm based on Augmented Lagrangian for Fisher GAN. We
validate our claims on both image sample generation and semi-supervised
classification using Fisher GAN.

Fast Single-Class Classification and the Principle of Logit Separation

Gil Keren, Sivan Sabato, Björn Schuller
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We consider neural network training, in applications in which there are many
possible classes, but at test time the task is to identify whether the example
belongs to one specific class, e.g., when searching for photos of a specific
person. We focus on reducing the computational burden at test-time in such
applications. We define the Single Logit Classification (SLC) task: training
the network so that at test time, it would be possible to accurately identify
if the example belongs to a given class, based only on the output logit of the
trained model for this class. We propose a natural principle, the Principle of
Logit Separation (PoLS), as a guideline for choosing and designing losses
suitable for the SLC. We study previously suggested losses and their alignment
with the PoLS. We further derive new batch versions of known losses, and show
that unlike the standard versions, these new versions satisfy the PoLS. Our
experiments show that the losses aligned with the PoLS overwhelmingly
outperform the other losses on the SLC task. Tensorflow code for optimizing the
new batch losses is publicly available in
this https URL

Latent Intention Dialogue Models

Developing a dialogue agent that is capable of making autonomous decisions
and communicating by natural language is one of the long-term goals of machine
learning research. Traditional approaches either rely on hand-crafting a small
state-action set for applying reinforcement learning that is not scalable or
constructing deterministic models for learning dialogue sentences that fail to
capture natural conversational variability. In this paper, we propose a Latent
Intention Dialogue Model (LIDM) that employs a discrete latent variable to
learn underlying dialogue intentions in the framework of neural variational
inference. In a goal-oriented dialogue scenario, these latent intentions can be
interpreted as actions guiding the generation of machine responses, which can
be further refined autonomously by reinforcement learning. The experimental
evaluation of LIDM shows that the model out-performs published benchmarks for
both corpus-based and human evaluation, demonstrating the effectiveness of
discrete latent variable models for learning goal-oriented dialogues.

On Multilingual Training of Neural Dependency Parsers

We show that a recently proposed neural dependency parser can be improved by
joint training on multiple languages from the same family. The parser is
implemented as a deep neural network whose only input is orthographic
representations of words. In order to successfully parse, the network has to
discover how linguistically relevant concepts can be inferred from word
spellings. We analyze the representations of characters and words that are
learned by the network to establish which properties of languages were
accounted for. In particular we show that the parser has approximately learned
to associate Latin characters with their Cyrillic counterparts and that it can
group Polish and Russian words that have a similar grammatical function.
Finally, we evaluate the parser on selected languages from the Universal
Dependencies dataset and show that it is competitive with other recently
proposed state-of-the art methods, while having a simple structure.

Adaptive Classification for Prediction Under a Budget

Feng Nan, Venkatesh Saligrama
Comments: arXiv admin note: substantial text overlap with arXiv:1704.07505
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We propose a novel adaptive approximation approach for test-time
resource-constrained prediction. Given an input instance at test-time, a gating
function identifies a prediction model for the input among a collection of
models. Our objective is to minimize overall average cost without sacrificing
accuracy. We learn gating and prediction models on fully labeled training data
by means of a bottom-up strategy. Our novel bottom-up method first trains a
high-accuracy complex model. Then a low-complexity gating and prediction model
are subsequently learned to adaptively approximate the high-accuracy model in
regions where low-cost models are capable of making highly accurate
predictions. We pose an empirical loss minimization problem with cost
constraints to jointly train gating and prediction models. On a number of
benchmark datasets our method outperforms state-of-the-art achieving higher
accuracy for the same cost.

Tangent Cones to TT Varieties

Benjamin Kutschan
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Algebraic Geometry (math.AG); Numerical Analysis (math.NA)

As already done for the matrix case for example in [Joe Harris, Algebraic
Geometry – A first course, p.256] we give a parametrization of the Bouligand
tangent cone of the variety of tensors of bounded TT rank. We discuss how the
proof generalizes to any binary hierarchical format. The parametrization can be
rewritten as an orthogonal sum of TT tensors. Its retraction onto the variety
is particularly easy to compose. We also give an implicit description of the
tangent cone as the solution of a system of polynomial equations.

On Residual CNN in text-dependent speaker verification task

Egor Malykh, Sergey Novoselov, Oleg Kudashev
Comments: Accepted for Specom 2017
Subjects: Sound (cs.SD); Learning (cs.LG)

Deep learning approaches are still not very common in the speaker
verification field. We investigate the possibility of using deep residual
convolutional neural network with spectrograms as an input features in the
text-dependent speaker verification task. Despite the fact that we were not
able to surpass the baseline system in quality, we achieved a quite good
results for such a new approach getting an 5.23\% ERR on the RSR2015 evaluation
part. Fusion of the baseline and proposed systems outperformed the best
individual system by 18\% relatively.

Implicit Variational Inference with Kernel Density Ratio Fitting

Jiaxin Shi, Shengyang Sun, Jun Zhu
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Recent progress in variational inference has paid much attention to the
flexibility of variational posteriors. Work has been done to use implicit
distributions, i.e., distributions without tractable likelihoods as the
variational posterior. However, existing methods on implicit posteriors still
face challenges of noisy estimation and can hardly scale to high-dimensional
latent variable models. In this paper, we present an implicit variational
inference approach with kernel density ratio fitting that addresses these
challenges. As far as we know, for the first time implicit variational
inference is successfully applied to Bayesian neural networks, which shows
promising results on both regression and classification tasks.

Coreset Construction via Randomized Matrix Multiplication

Jiasen Yang, Agniva Chowdhury, Petros Drineas
Comments: 27 pages, 11 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Coresets are small sets of points that approximate the properties of a larger
point-set. For example, given a compact set (mathcal{S} subseteq
mathbb{R}^d), a coreset could be defined as a (weighted) subset of
(mathcal{S}) that approximates the sum of squared distances from (mathcal{S})
to every linear subspace of (mathbb{R}^d). As such, coresets can be used as a
proxy to the full dataset and provide an important technique to speed up
algorithms for solving problems including principal component analysis, latent
semantic indexing, etc. In this paper, we provide a structural result that
connects the construction of such coresets to approximating matrix products.
This structural result implies a simple, randomized algorithm that constructs
coresets whose sizes are independent of the number and dimensionality of the
input points. The expected size of the resulting coresets yields an improvement
over the state-of-the-art deterministic approach. Finally, we evaluate the
proposed randomized algorithm on synthetic and real data, and demonstrate its
effective performance relative to its deterministic counterpart.

Temporal anomaly detection: calibrating the surprise

Eyal Gutflaish, Aryeh Kontorovich, Sivan Sabato, Ofer Biller, Oded Sofer
Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)

We propose a hybrid approach to temporal anomaly detection in user-database
access data — or more generally, any kind of subject-object co-occurrence
data. Our methodology allows identifying anomalies based on a single stationary
model, instead of requiring a full temporal one, which would be prohibitive in
our setting. We learn our low-rank stationary model from the high-dimensional
training data, and then fit a regression model for predicting the expected
likelihood score of normal access patterns in the future. The disparity between
the predicted and the observed likelihood scores is used to assess the
“surprise”. This approach enables calibration of the anomaly score so that
time-varying normal behavior patterns are not considered anomalous. We provide
a detailed description of the algorithm, including a convergence analysis, and
report encouraging empirical results. One of the datasets we tested is new for
the public domain. It consists of two months’ worth of database access records
from a live system. This dataset will be made publicly available, and is
provided in the supplementary material.

Deep Learning for User Comment Moderation

John Pavlopoulos, Prodromos Malakasiotis, Ion Androutsopoulos
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Experimenting with a new dataset of 1.6M user comments from a Greek news
portal and existing datasets of English Wikipedia comments, we show that an RNN
outperforms the previous state of the art in moderation. A deep,
classification-specific attention mechanism improves further the overall
performance of the RNN. We also compare against a CNN and a word-list baseline,
considering both fully automatic and semi-automatic moderation.

Conditional CycleGAN for Attribute Guided Face Image Generation

Yongyi Lu, Yu-Wing Tai, Chi-Keung Tang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

State-of-the-art techniques in Generative Adversarial Networks (GANs) such as
cycleGAN is able to learn the mapping of one image domain (X) to another image
domain (Y) using unpaired image data. We extend the cycleGAN to ({it
Conditional}) cycleGAN such that the mapping from (X) to (Y) is subjected to
attribute condition (Z). Using face image generation as an application example,
where (X) is a low resolution face image, (Y) is a high resolution face image,
and (Z) is a set of attributes related to facial appearance (e.g. gender, hair
color, smile), we present our method to incorporate (Z) into the network, such
that the hallucinated high resolution face image (Y’) not only satisfies the
low resolution constrain inherent in (X), but also the attribute condition
prescribed by (Z). Using face feature vector extracted from face verification
network as (Z), we demonstrate the efficacy of our approach on
identity-preserving face image super-resolution. Our approach is general and
applicable to high-quality face image generation where specific facial
attributes can be controlled easily in the automatically generated results.

Bayesian Unification of Gradient and Bandit-based Learning for Accelerated Global Optimisation

Ole-Christoffer Granmo
Comments: 15th IEEE International Conference on Machine Learning and Applications (ICMLA 2016)
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

Bandit based optimisation has a remarkable advantage over gradient based
approaches due to their global perspective, which eliminates the danger of
getting stuck at local optima. However, for continuous optimisation problems or
problems with a large number of actions, bandit based approaches can be
hindered by slow learning. Gradient based approaches, on the other hand,
navigate quickly in high-dimensional continuous spaces through local
optimisation, following the gradient in fine grained steps. Yet, apart from
being susceptible to local optima, these schemes are less suited for online
learning due to their reliance on extensive trial-and-error before the optimum
can be identified. In this paper, we propose a Bayesian approach that unifies
the above two paradigms in one single framework, with the aim of combining
their advantages. At the heart of our approach we find a stochastic linear
approximation of the function to be optimised, where both the gradient and
values of the function are explicitly captured. This allows us to learn from
both noisy function and gradient observations, and predict these properties
across the action space to support optimisation. We further propose an
accompanying bandit driven exploration scheme that uses Bayesian credible
bounds to trade off exploration against exploitation. Our empirical results
demonstrate that by unifying bandit and gradient based learning, one obtains
consistently improved performance across a wide spectrum of problem
environments. Furthermore, even when gradient feedback is unavailable, the
flexibility of our model, including gradient prediction, still allows us
outperform competing approaches, although with a smaller margin. Due to the
pervasiveness of bandit based optimisation, our scheme opens up for improved
performance both in meta-optimisation and in applications where gradient
related information is readily available.

Dimensionality reduction for acoustic vehicle classification with spectral clustering

Justin Sunu, Allon G. Percus
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)

Classification of vehicles has broad applications, ranging from traffic flow
management to military target recognition. We propose a method for automated
identification of moving vehicles from roadside audio sensors. We use a
short-time Fourier transform to decompose audio signals, and treat the
frequency signature at each time window as an individual data point to be
classified. Using spectral clustering, we then decrease the dimensionality of
the data sufficiently for K-nearest neighbors to provide accurate vehicle
identification.

Efficient Modeling of Latent Information in Supervised Learning using Gaussian Processes

Zhenwen Dai, Mauricio A. Álvarez, Neil D. Lawrence
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Often in machine learning, data are collected as a combination of multiple
conditions, e.g., the voice recordings of multiple persons, each labeled with
an ID. How could we build a model that captures the latent information related
to these conditions and generalize to a new one with few data? We present a new
model called Latent Variable Multiple Output Gaussian Processes (LVMOGP) and
that allows to jointly model multiple conditions for regression and generalize
to a new condition with a few data points at test time. LVMOGP infers the
posteriors of Gaussian processes together with a latent space representing the
information about different conditions. We derive an efficient variational
inference method for LVMOGP, of which the computational complexity is as low as
sparse Gaussian processes. We show that LVMOGP significantly outperforms
related Gaussian process methods on various tasks with both synthetic and real
data.

Lifelong Generative Modeling

Jason Ramapuram, Magda Gregorova, Alexandros Kalousis
Comments: 14 pages, 9 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Lifelong learning is the problem of learning multiple consecutive tasks in an
online manner and is essential towards the development of intelligent machines
that can adapt to their surroundings. In this work we focus on learning a
lifelong approach to generative modeling whereby we continuously incorporate
newly observed distributions into our model representation. We utilize two
models, aptly named the student and the teacher, in order to aggregate
information about all past distributions without the preservation of any of the
past data or previous models. The teacher is utilized as a form of compressed
memory in order to allow for the student model to learn over the past as well
as present data. We demonstrate why a naive approach to lifelong generative
modeling fails and introduce a regularizer with which we demonstrate learning
across a long range of distributions.

Global hard thresholding algorithms for joint sparse image representation and denoising

Reza Borhani, Jeremy Watt, Aggelos Katsaggelos
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Sparse coding of images is traditionally done by cutting them into small
patches and representing each patch individually over some dictionary given a
pre-determined number of nonzero coefficients to use for each patch. In lack of
a way to effectively distribute a total number (or global budget) of nonzero
coefficients across all patches, current sparse recovery algorithms distribute
the global budget equally across all patches despite the wide range of
differences in structural complexity among them. In this work we propose a new
framework for joint sparse representation and recovery of all image patches
simultaneously. We also present two novel global hard thresholding algorithms,
based on the notion of variable splitting, for solving the joint sparse model.
Experimentation using both synthetic and real data shows effectiveness of the
proposed framework for sparse image representation and denoising tasks.
Additionally, time complexity analysis of the proposed algorithms indicate high
scalability of both algorithms, making them favorable to use on large megapixel
images.

PVEs: Position-Velocity Encoders for Unsupervised Learning of Structured State Representations

We propose position-velocity encoders (PVEs) which learn—without
supervision—to encode images to positions and velocities of task-relevant
objects. PVEs encode a single image into a low-dimensional position state and
compute the velocity state from finite differences in position. In contrast to
autoencoders, position-velocity encoders are not trained by image
reconstruction, but by making the position-velocity representation consistent
with priors about interacting with the physical world. We applied PVEs to
several simulated control tasks from pixels and achieved promising preliminary
results.

Growth-Optimal Portfolio Selection under CVaR Constraints

Guy Uziel, Ran El-Yaniv
Subjects: Mathematical Finance (q-fin.MF); Learning (cs.LG)

Online portfolio selection research has so far focused mainly on minimizing
regret defined in terms of wealth growth. Practical financial decision making,
however, is deeply concerned with both wealth and risk. We consider online
learning of portfolios of stocks whose prices are governed by arbitrary
(unknown) stationary and ergodic processes, where the goal is to maximize
wealth while keeping the conditional value at risk (CVaR) below a desired
threshold. We characterize the asymptomatically optimal risk-adjusted
performance and present an investment strategy whose portfolios are guaranteed
to achieve the asymptotic optimal solution while fulfilling the desired risk
constraint. We also numerically demonstrate and validate the viability of our
method on standard datasets.

Deep Complex Networks

At present, the vast majority of building blocks, techniques, and
architectures for deep learning are based on real-valued operations and
representations. However, recent work on recurrent neural networks and older
fundamental theoretical analysis suggests that complex numbers could have a
richer representational capacity and could also facilitate noise-robust memory
retrieval mechanisms. Despite their attractive properties and potential for
opening up entirely new neural architectures, complex-valued deep neural
networks have been marginalized due to the absence of the building blocks
required to design such models. In this work, we provide the key atomic
components for complex-valued deep neural networks and apply them to
convolutional feed-forward networks. More precisely, we rely on complex
convolutions and present algorithms for complex batch-normalization, complex
weight initialization strategies for complex-valued neural nets and we use them
in experiments with end-to-end training schemes. We demonstrate that such
complex-valued models are able to achieve comparable or better performance than
their real-valued counterparts. We test deep complex models on several computer
vision tasks and on music transcription using the MusicNet dataset where we
achieve state of the art performance.

Stochastic Feedback Control of Systems with Unknown Nonlinear Dynamics

Dan Yu, Mohammadhussein Rafieisakhaei, Suman Chakravorty
Comments: 7 pages, 7 figures, submitted to 56th IEEE Conference on Decision and Control (CDC), 2017
Subjects: Systems and Control (cs.SY); Learning (cs.LG)

This paper studies the stochastic optimal control problem for systems with
unknown dynamics. First, an open-loop deterministic trajectory optimization
problem is solved without knowing the explicit form of the dynamical system.
Next, a Linear Quadratic Gaussian (LQG) controller is designed for the nominal
trajectory-dependent linearized system, such that under a small noise
assumption, the actual states remain close to the optimal trajectory. The
trajectory-dependent linearized system is identified using input-output
experimental data consisting of the impulse responses of the nominal system. A
computational example is given to illustrate the performance of the proposed
approach.

Online Auctions and Multi-scale Online Learning

Sébastien Bubeck, Nikhil R. Devanur, Zhiyi Huang, Rad Niazadeh
Comments: In Proceeding of 18th ACM conference on Economics and Computation (EC 2017)
Subjects: Computer Science and Game Theory (cs.GT); Data Structures and Algorithms (cs.DS); Learning (cs.LG); Machine Learning (stat.ML)

We consider revenue maximization in online auctions and pricing. A seller
sells an identical item in each period to a new buyer, or a new set of buyers.
For the online posted pricing problem, we show regret bounds that scale with
the best fixed price, rather than the range of the values. We also show regret
bounds that are almost scale free, and match the offline sample complexity,
when comparing to a benchmark that requires a lower bound on the market share.
These results are obtained by generalizing the classical learning from experts
and multi-armed bandit problems to their multi-scale versions. In this version,
the reward of each action is in a different range, and the regret w.r.t. a
given action scales with its own range, rather than the maximum range.

Deriving Neural Architectures from Sequence and Graph Kernels

Tao Lei, Wengong Jin, Regina Barzilay, Tommi Jaakkola
Comments: to appear at ICML 2017; includes additional discussions
Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL); Learning (cs.LG)

The design of neural architectures for structured objects is typically guided
by experimental insights rather than a formal process. In this work, we appeal
to kernels over combinatorial structures, such as sequences and graphs, to
derive appropriate neural operations. We introduce a class of deep recurrent
neural operations and formally characterize their associated kernel spaces. Our
recurrent modules compare the input to virtual reference objects (cf. filters
in CNN) via the kernels. Similar to traditional neural operations, these
reference objects are parameterized and directly optimized in end-to-end
training. We empirically evaluate the proposed class of neural architectures on
standard applications such as language modeling and molecular graph regression,
achieving state-of-the-art or competitive results across these applications. We
also draw connections to existing architectures such as LSTMs.

Deep Descriptor Transforming for Image Co-Localization

Xiu-Shen Wei, Chen-Lin Zhang, Yao Li, Chen-Wei Xie, Jianxin Wu, Chunhua Shen, Zhi-Hua Zhou
Comments: Accepted by IJCAI 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Reusable model design becomes desirable with the rapid expansion of machine
learning applications. In this paper, we focus on the reusability of
pre-trained deep convolutional models. Specifically, different from treating
pre-trained models as feature extractors, we reveal more treasures beneath
convolutional layers, i.e., the convolutional activations could act as a
detector for the common object in the image co-localization problem. We propose
a simple but effective method, named Deep Descriptor Transforming (DDT), for
evaluating the correlations of descriptors and then obtaining the
category-consistent regions, which can accurately locate the common object in a
set of images. Empirical studies validate the effectiveness of the proposed DDT
method. On benchmark image co-localization datasets, DDT consistently
outperforms existing state-of-the-art methods by a large margin. Moreover, DDT
also demonstrates good generalization ability for unseen categories and
robustness for dealing with noisy data.

Information Theory

Near Optimal Online Distortion Minimization for Energy Harvesting Nodes

Ahmed Arafa, Sennur Ulukus
Comments: To appear in the 2017 IEEE International Symposium on Information Theory
Subjects: Information Theory (cs.IT)

We consider online scheduling for an energy harvesting communication system
where a sensor node collects samples from a Gaussian source and sends them to a
destination node over a Gaussian channel. The sensor is equipped with a
finite-sized battery that is recharged by an independent and identically
distributed (i.i.d.) energy harvesting process over time. The goal is to
minimize the long term average distortion of the source samples received at the
destination. We study two problems: the first is when sampling is cost-free,
and the second is when there is a sampling cost incurred whenever samples are
collected. We show that fixed fraction policies [Shaviv-Ozgur], in which a
fixed fraction of the battery state is consumed in each time slot, are
near-optimal in the sense that they achieve a long term average distortion that
lies within a constant additive gap from the optimal solution for all energy
arrivals and battery sizes. For the problem with sampling costs, the
transmission policy is bursty; the sensor can collect samples and transmit for
only a portion of the time.

Robustness to unknown error in sparse regularization

Simone Brugiapaglia, Ben Adcock
Subjects: Information Theory (cs.IT); Numerical Analysis (math.NA)

Quadratically-constrained basis pursuit has become a popular device in sparse
regularization; in particular, in the context of compressed sensing. However,
the majority of theoretical error estimates for this regularizer assume an a
priori bound on the noise level, which is usually lacking in practice. In this
paper, we develop stability and robustness estimates which remove this
assumption. First, we introduce an abstract framework and show that robust
instance optimality of any decoder in the noise-aware setting implies stability
and robustness in the noise-blind setting. This is based on certain sup-inf
constants referred to as quotients, strictly related to the quotient property
of compressed sensing. We then apply this theory to prove the robustness of
quadratically-constrained basis pursuit under unknown error in the cases of
random Gaussian matrices and of random matrices with heavy-tailed rows, such as
random sampling matrices from bounded orthonormal systems. We illustrate our
results in several cases of practical importance, including subsampled Fourier
measurements and recovery of sparse polynomial expansions.

Session-Based Cooperation in Cognitive Radio Networks: A Network-Level Approach

Haichuan Ding, Chi Zhang, Xuanheng Li, Jianqing Liu, Miao Pan, Yuguang Fang, Shigang Chen
Subjects: Information Theory (cs.IT)

In cognitive radio networks (CRNs), secondary users (SUs) can proactively
obtain spectrum access opportunities by helping with primary users’ (PUs’) data
transmissions. Currently, such kind of spectrum access is implemented via a
cooperative communications based link-level frame-based cooperative (LLC)
approach where individual SUs independently serve as relays for PUs in order to
gain spectrum access opportunities. Unfortunately, this LLC approach cannot
fully exploit spectrum access opportunities to enhance the throughput of CRNs
and fails to motivate PUs to join the spectrum sharing processes. To address
these challenges, we propose a network-level session-based cooperative (NLC)
approach where SUs are grouped together to cooperate with PUs session by
session, instead of frame by frame as what has been done in existing works, for
spectrum access opportunities of the corresponding group. Thanks to our
group-based session-by-session cooperating strategy, our NLC approach is able
to address all those challenges in the LLC approach. To articulate our NLC
approach, we further develop an NLC scheme under a cognitive capacity
harvesting network (CCHN) architecture. We formulate the cooperative mechanism
design as a cross-layer optimization problem with constraints on primary
session selection, flow routing and link scheduling. To search for solutions to
the optimization problem, we propose an augmented scheduling index ordering
based (SIO-based) algorithm to identify maximal independent sets. Through
extensive simulations, we demonstrate the effectiveness of the proposed NLC
approach and the superiority of the augmented SIO-based algorithm over the
traditional method.

Energy-Efficient Transponder Configuration for FMF-based Elastic Optical Networks

Mohammad Hadi, Mohammad Reza Pakravan
Comments: 4 pages, 4 figures. arXiv admin note: text overlap with arXiv:1705.06891
Subjects: Information Theory (cs.IT)

We propose an energy-efficient procedure for transponder configuration in
FMF-based elastic optical networks in which quality of service and physical
constraints are guaranteed and joint optimization of transmit optical power,
temporal, spatial and spectral variables are addressed. We use geometric
convexification techniques to provide convex representations for quality of
service, transponder power consumption and transponder configuration problem.
Simulation results show that our convex formulation is considerably faster than
its mixed-integer nonlinear counterpart and its ability to optimize transmit
optical power reduces total transponder power consumption up to 32%. We also
analyze the effect of mode coupling and number of available modes on power
consumption of different network elements.

Rate ((n-1)/n) Systematic MDS Convolutional Codes over (GF(2^m))

Ángela Barbero, Øyvind Ytrehus
Subjects: Information Theory (cs.IT)

A systematic convolutional encoder of rate ((n-1)/n) and maximum degree (D)
generates a code of free distance at most ({cal D} = D+2) and, at best, a
column distance profile (CDP) of ([2,3,ldots,{cal D}]). A code is
emph{Maximum Distance Separable} (MDS) if it possesses this CDP. Applied on a
communication channel over which packets are transmitted sequentially and which
loses (erases) packets randomly, such a code allows the recovery from any
pattern of (j) erasures in the first (j) (n)-packet blocks for (j<{cal D}),
with a delay of at most (j) blocks counting from the first erasure. This paper
addresses the problem of finding the largest ({cal D}) for which a systematic
rate ((n-1)/n) code over (GF(2^m)) exists, for given (n) and (m). In
particular, constructions for rates ((2^m-1)/2^m) and ((2^{m-1}-1)/2^{m-1}) are
presented which provide optimum values of ({cal D}) equal to 3 and 4,
respectively. A search algorithm is also developed, which produces new codes
for ({cal D}) for field sizes (2^m leq 2^{14}). Using a complete search
version of the algorithm, the maximum value of ({cal D}), and codes that
achieve it, are determined for all code rates (geq 1/2) and every field size
(GF(2^m)) for (mleq 5) (and for some rates for (m=6)).

Symmetry Group of Ordered Hamming Block Space

Luciano Panek, Nayene Michele Paião Panek
Subjects: Information Theory (cs.IT)

Let (P = ({1,2,ldots,n,leq)) be a poset that is an union of disjoint
chains of the same length and (V=mathbb{F}_q^N) be the space of (N)-tuples
over the finite field (mathbb{F}_q). Let (V_i = mathbb{F}_q^{k_i}), (1 leq i
leq n), be a family of finite-dimensional linear spaces such that
(k_1+k_2+ldots +k_n = N) and let (V = V_1 oplus V_2 oplus ldots oplus V_n)
endow with the poset block metric (d_{(P,pi)}) induced by the poset (P) and
the partition (pi=(k_1,k_2,ldots,k_n)), encompassing both
Niederreiter-Rosenbloom-Tsfasman metric and error-block metric. In this paper,
we give a complete description of group of symmetries of the metric space
((V,d_{(P,pi)})), called the ordered Hammming block space. In particular, we
reobtain the group of symmetries of the Niederreiter-Rosenbloom-Tsfasman space
and obtain the group of symmetries of the error-block metric space.

User Selection and Widely Linear Multiuser Precoding for One-dimensional Signalling

Majid Bavand, Steven D. Blostein
Subjects: Information Theory (cs.IT)

Massive deployment of low data rate Internet of things and ehealth devices
prompts us to develop more practical precoding and user selection techniques
that comply with these requirements. Moreover, it is known that when the data
is real-valued and the observation is complex-valued, widely linear (WL)
estimation can be employed in lieu of linear estimation to improve the
performance. With these motivations, in this paper, we study the transmit
precoding (beamforming) in multiuser multiple-input single-output
communications systems assuming the transmit signal is one-dimensionally
modulated and widely linear estimation is performed at the receivers.
Closed-form solutions for widely linear maximum ratio transmission (MRT), WL
zero-forcing (ZF), WL minimum mean square error (MMSE), and WL maximum signal
to leakage and noise ratio (MSLNR) precoding are obtained. It is shown that
widely linear processing can potentially double the number of simultaneous
users compared to the linear processing of one-dimensionally modulated signals.
Furthermore, to deal with the increasing number of communications devices a
user selection algorithm compatible with widely linear processing of
one-dimensionally modulated signals is proposed. The proposed user selection
algorithm can double the number of simultaneously selected users compared to
conventional user selection methods.

Local Large Deviations, McMillian Theorem for multitype Galton-Watson Processes

Kwabena Doku-Amponsah
Comments: 6 pages
Subjects: Information Theory (cs.IT)

In this article we prove a local large deviation principle (LLDP) for the
critical multitype Galton-Watson process from spectral potential point. We
define the so-called spectral potential (U_{mathbb{Q}}(,cdot,,pi)) for the
Galton-Watson process, where (pi) is normalized eigen vector corresponding to
the leading eigen value (1) of the transition matrix (A(cdot,,cdot)) defined
from ({mathbb{Q}},) the transition kernel. We show that the Kullback action or
the deviation function, (J(pi,
u),) with respect to an empirical offspring
measure, (
u,) is the Legengre dual of (U_{mathbb{Q}}(,cdot,,pi).) From
the LLDP we deduce a full large deviation principle and a weak variant of the
classical McMillian Theorem for the multitype Galton-Watson process. To be
specific, given any empirical offspring measure (varpi,) we show the number of
critical multitype Galton-Watson processes on (n) vertices is approximately
(e^{nlangle H_{varpi},,pi
angle},) where (H_{varpi}) is suitably defined
entropy.

Near-optimal matrix recovery from random linear measurements

Elad Romanov, Matan Gavish
Subjects: Information Theory (cs.IT)

In matrix recovery from random linear measurements, one is interested in
recovering an unknown (M)-by-(N) matrix (X_0) from (n<MN) measurements
(y_i=Tr(A_i^T X_0)) where each (A_i) is an (M)-by-(N) measurement matrix with
i.i.d random entries, (i=1,ldots,n). We present a novel matrix recovery
algorithm, based on approximate message passing, which iteratively applies an
optimal singular value shrinker — a nonconvex nonlinearity tailored
specifically for matrix estimation. Our algorithm often converges exponentially
fast, offering a significant speedup over previously suggested matrix recovery
algorithms, such as iterative solvers for Nuclear Norm Minimization (NNM). It
is well known that there is recovery tradeoff between the information content
of the object (X_0) to be recovered (specifically, its matrix rank (r)) and the
number of linear measurements (n) from which recovery is to be attempted. The
precise tradeoff between (r) and (n), beyond which recovery by a given
algorithm becomes possible, traces the so-called phase transition curve of that
algorithm in the ((r,n)) plane. The phase transition curve of our algorithm is
noticeably better than that of NNM. Interestingly, it is close to the
information-theoretic lower bound for the minimal number of measurements needed
for matrix recovery, making it not only the fastest algorithm known, but also
near-optimal in terms of the matrices it successfully recovers.

Projection Theorems of Divergences and Likelihood Maximization Methods

Atin Gayen, M. Ashok Kumar
Comments: 20 pages
Subjects: Information Theory (cs.IT); Probability (math.PR); Statistics Theory (math.ST)

Projection theorems of divergences enable us to find reverse projection of a
divergence on a specific statistical model as a forward projection of the
divergence on a different but rather “simpler” statistical model, which, in
turn, results in solving a system of linear equations. Reverse projection of
divergences are closely related to various estimation methods such as the
maximum likelihood estimation or its variants in robust statistics. We consider
projection theorems of three parametric families of divergences that are widely
used in robust statistics, namely the Renyi divergences (or the Cressie-Reed
Power Divergences), Density Power Divergences, and the Relative alpha-Entropy
(or the Logarithmic Density Power Divergences). We explore these projection
theorems from the usual likelihood maximization approach and from the principle
of sufficiency. In particular, we show the equivalence of solving the
estimation problems by the projection theorems of the respective divergences
and by directly solving the corresponding estimating equations.

On shortened and punctured cyclic codes

Arti Yardi, Ruud Pellikaan
Subjects: Information Theory (cs.IT)

The problem of identifying whether the family of cyclic codes is
asymptotically good or not is a long-standing open problem in the field of
coding theory. It is known in the literature that some families of cyclic codes
such as BCH codes and Reed-Solomon codes are asymptotically bad, however in
general the answer to this question is not known. A recent result by Nelson and
Van Zwam shows that, all linear codes can be obtained by a sequence of
puncturing and/or shortening of a collection of asymptotically good
codes~cite{Nelson_2015}. In this paper, we prove that any linear code can be
obtained by a sequence of puncturing and/or shortening of some cyclic code.
Therefore the result that all codes can be obtained by shortening and/or
puncturing cyclic codes leaves the possibility open that cyclic codes are
asymptotically good.

Maximizing Indoor Wireless Coverage Using UAVs Equipped with Directional Antennas

Hazim Shakhatreh, Abdallah Khreishah
Comments: 19 pages, 17 figures
Subjects: Information Theory (cs.IT)

Unmanned aerial vehicles (UAVs) can be used to provide wireless coverage
during emergency cases where each UAV serves as an aerial wireless base station
when the cellular network goes down. They can also be used to supplement the
ground base station in order to provide better coverage and higher data rates
for the users. In this paper, we aim to maximize the indoor wireless coverage
using UAVs equipped with directional antennas. We study the case that the UAVs
are using one channel, thus in order to maximize the total indoor wireless
coverage, we avoid any overlapping in their coverage volumes. We present two
methods to place the UAVs; providing wireless coverage from one building side
and from two building sides. In the first method, we utilize circle packing
theory to determine the 3-D locations of the UAVs in a way that the total
coverage area is maximized. In the second method, we place the UAVs in front of
two building sides and efficiently arrange the UAVs in alternating upsidedown
arrangements. We show that the upside-down arrangements problem can be
transformed from 3D to 2D and based on that we present an efficient algorithm
to solve the problem. Our results show that the upside-down arrangements of
UAVs, can improve the maximum total coverage by 100% compared to providing
wireless coverage from one building side.

The Indoor Mobile Coverage Problem Using UAVs

Hazim Shakhatreh, Abdallah Khreishah, Issa Khalil
Comments: 27 pages, 14 figures
Subjects: Information Theory (cs.IT)

Unmanned aerial vehicles (UAVs) can be used as aerial wireless base stations
when cellular networks are not operational due to natural disasters. They can
also be used to supplement the ground base station in order to provide better
coverage and higher data rates for the users. Prior studies on UAV based
wireless coverage typically consider an Air-to-Ground path loss model, which
assumes that the users are outdoor and located on a 2D plane. In this paper, we
propose using UAVs to provide wireless coverage for indoor users inside a
high-rise building. First, we present realistic Outdoor-Indoor path loss models
and describe the tradeoff introduced by these models. Then we study the problem
of efficient placement of a single UAV, where the objective is to minimize the
total transmit power required to cover the entire high-rise building. The
formulated problem is non-convex and is generally difficult to solve. To that
end, we consider three cases of practical interest and provide efficient
solutions to the formulated problem under these cases. Then we study the
problem of minimizing the number of UAVs required to provide wireless coverage
to high rise buildings and prove that this problem is NP-complete. Due to the
intractability of the problem, we use clustering to minimize the number of UAVs
required to cover the indoor users. We demonstrate through simulations that the
method that clusters the building into regular structures and places the UAVs
in each cluster requires 80% more number of UAVs relative to our clustering
algorithm.

Providing Wireless Coverage to High-rise Buildings Using UAVs

Hazim Shakhatreh, Abdallah Khreishah, Bo Ji
Comments: 6 pages, 5 figures
Subjects: Information Theory (cs.IT)

Unmanned aerial vehicles (UAVs) can be used as aerial wireless base stations
when cellular networks go down. Prior studies on UAV-based wireless coverage
typically consider an Air-to-Ground path loss model, which assumes that the
users are outdoor and they are located on a 2D plane. In this paper, we propose
using a single UAV to provide wireless coverage for indoor users inside a
high-rise building under disaster situations (such as earthquakes or floods),
when cellular networks are down. First, we present a realistic Outdoor-Indoor
path loss model and describe the tradeoff introduced by this model. Then, we
study the problem of efficient UAV placement, where the objective is to
minimize the total transmit power required to cover the entire high-rise
building. The formulated problem is non-convex and is generally difficult to
solve. To that end, we consider two cases of practical interest and provide the
efficient solutions to the formulated problem under these cases. In the first
case, we aim to find the minimum transmit power such that an indoor user with
the maximum path loss can be covered. In the second case, we assume that the
locations of indoor users are symmetric across the dimensions of each floor.

Efficient 3D Placement of a UAV Using Particle Swarm Optimization

Hazim Shakhatreh, Abdallah Khreishah, Ayoub Alsarhan, Issa Khalil, Ahmad Sawalmeh, Noor Shamsiah Othman
Comments: 6 pages, 7 figures
Subjects: Information Theory (cs.IT)

Unmanned aerial vehicles (UAVs) can be used as aerial wireless base stations
when cellular networks go down. Prior studies on UAV-based wireless coverage
typically consider an Air-to-Ground path loss model, which assumes that the
users are outdoor and they are located on a 2D plane. In this paper, we propose
using a single UAV to provide wireless coverage for indoor users inside a
high-rise building under disaster situations (such as earthquakes or floods),
when cellular networks are down. We assume that the locations of indoor users
are uniformly distributed in each floor and we propose a particle swarm
optimization algorithm to find an efficient 3D placement of a UAV that
minimizes the total transmit power required to cover the indoor users.

On The Continuous Coverage Problem for a Swarm of UAVs

Hazim Shakhatreh, Abdallah Khreishah, Jacob Chakareski, Haythem Bany Salameh, Issa Khalil
Comments: 6 pages, 6 figures
Subjects: Information Theory (cs.IT)

Unmanned aerial vehicles (UAVs) can be used to provide wireless network and
remote surveillance coverage for disaster-affected areas. During such a
situation, the UAVs need to return periodically to a charging station for
recharging, due to their limited battery capacity. We study the problem of
minimizing the number of UAVs required for a continuous coverage of a given
area, given the recharging requirement. We prove that this problem is
NP-complete. Due to its intractability, we study partitioning the coverage
graph into cycles that start at the charging station. We first characterize the
minimum number of UAVs to cover such a cycle based on the charging time, the
traveling time, and the number of subareas to be covered by the cycle. Based on
this analysis, we then develop an efficient algorithm, the cycles with limited
energy algorithm. The straightforward method to continuously cover a given area
is to split it into N subareas and cover it by N cycles using N additional
UAVs. Our simulation results examine the importance of critical system
parameters: the energy capacity of the UAVs, the number of subareas in the
covered area, and the UAV charging and traveling times.We demonstrate that the
cycles with limited energy algorithm requires 69%-94% fewer additional UAVs
relative to the straightforward method, as the energy capacity of the UAVs is
increased, and 67%-71% fewer additional UAVs, as the number of subareas is
increased.

On the Capacity of Fractal Wireless Networks With Direct Social Interactions

Ying Chen, Rongpeng Li, Zhifeng Zhao, Honggang Zhang
Subjects: Information Theory (cs.IT)

The capacity of a fractal wireless network with direct social interactions is
studied in this paper. Specifically, we mathematically formulate the
self-similarity of a fractal wireless network by a power-law degree
distribution ( P(k) ), and we capture the connection feature between two nodes
with degree ( k_{1} ) and ( k_{2} ) by a joint probability distribution (
P(k_{1},k_{2}) ). It is proved that if the source node communicates with one of
its direct contacts randomly, the maximum capacity is consistent with the
classical result ( Thetaleft(frac{1}{sqrt{nlog n}}
ight) ) achieved by
Kumar cite{Gupta2000The}. On the other hand, if the two nodes with distance (
d ) communicate according to the probability ( d^{-eta} ), the maximum
capacity can reach up to ( Thetaleft(frac{1}{log n}
ight) ), which
exhibits remarkable improvement compared with the well-known result in
cite{Gupta2000The}.

Optimal Transport Theory for Cell Association in UAV-Enabled Cellular Networks

Mohammad Mozaffari, Walid Saad, Mehdi Bennis, Merouane Debbah
Comments: Accepted in IEEE Communications Letters
Subjects: Information Theory (cs.IT)

In this paper, a novel framework for delay-optimal cell association in
unmanned aerial vehicle (UAV)-enabled cellular networks is proposed. In
particular, to minimize the average network delay under any arbitrary spatial
distribution of the ground users, the optimal cell partitions of UAVs and
terrestrial base stations (BSs) are determined. To this end, using the powerful
mathematical tools of optimal transport theory, the existence of the solution
to the optimal cell association problem is proved and the solution space is
completely characterized. The analytical and simulation results show that the
proposed approach yields substantial improvements of the average network delay.

Control and Energy Management System in Microgrids

Hajir Pourbabak, Tao Chen, Bowen Zhang, Wencong Su
Comments: 25 pages, 12 figures, Book Chapter in Clean Energy Microgrids, Editors: Shin’ya Obara, Jorge Morel (2017)
Subjects: Systems and Control (cs.SY); Information Theory (cs.IT)

As a cutting-edge technology, microgrids feature intelligent EMSs and
sophisticated control, which will dramatically change our energy
infrastructure. The modern microgrids are a relatively recent development with
high potential to bring distributed generation, DES devices, controllable
loads, communication infrastructure, and many new technologies into the
mainstream. As a more controllable and intelligent entity, a microgrid has more
growth potential than ever before. However, there are still many open
questions, such as the future business models and economics. What is the
cost-benefit to the end-user? How should we systematically evaluate the
potential benefits and costs of control and energy management in a microgrid?

Maximum Number of Common Zeros of Homogeneous Polynomials over Finite Fields

Peter Beelen, Mrinmoy Datta, Sudhir R. Ghorpade
Comments: 15 pages
Subjects: Algebraic Geometry (math.AG); Information Theory (cs.IT)

About two decades ago, Tsfasman and Boguslavsky conjectured a formula for the
maximum number of common zeros that (r) linearly independent homogeneous
polynomials of degree (d) in (m+1) variables with coefficients in a finite
field with (q) elements can have in the corresponding (m)-dimensional
projective space. Recently, it has been shown by Datta and Ghorpade that this
conjecture is valid if (r) is at most (m+1) and can be invalid otherwise.
Moreover a new conjecture was proposed for many values of (r) beyond (m+1). In
this paper, we prove that this new conjecture holds true for several values of
(r). In particular, this settles the new conjecture completely when (d=3). Our
result also includes the positive result of Datta and Ghorpade as a special
case. Further, we determine the maximum number of zeros in certain cases not
covered by the earlier conjectures and results, namely, the case of (d=q-1) and
of (d=q). All these results are directly applicable to the determination of the
maximum number of points on sections of Veronese varieties by linear
subvarieties of a fixed dimension, and also the determination of generalized
Hamming weights of projective Reed-Muller codes.

Measurement uncertainty relations for position and momentum: Relative entropy formulation

Alberto Barchielli, Matteo Gregoratti, Alessandro Toigo
Comments: 33 pages
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Mathematical Physics (math-ph)

Heisenberg’s uncertainty principle has recently led to general measurement
uncertainty relations for quantum systems: incompatible observables can be
measured jointly or in sequence only with some unavoidable approximation, which
can be quantified in various ways. The relative entropy is the natural
theoretical quantifier of the information loss when a `true’ probability
distribution is replaced by an approximating one. In this paper, we provide a
lower bound for the amount of information that is lost by replacing the
distributions of the sharp position and momentum observables, as they could be
obtained with two separate experiments, by the marginals of any smeared joint
measurement. The bound is obtained by introducing an entropic error function,
and optimizing it over a suitable class of covariant approximate joint
measurements. We fully exploit two cases of target observables: (1)
(n)-dimensional position and momentum vectors; (2) two components of position
and momentum along different directions. In (1), we connect the quantum bound
to the dimension (n); in (2), going from parallel to orthogonal directions, we
show the transition from highly incompatible observables to compatible ones.
For simplicity, we develop the theory only for Gaussian states and
measurements.