IT博客汇 | arXiv Paper Daily: Thu, 13 Apr 2017

arXiv Paper Daily: Thu, 13 Apr 2017

我爱机器学习(52ml.net)发表于 2017-04-13 00:00:00

Neural and Evolutionary Computing

Approximating Optimization Problems using EAs on Scale-Free Networks

Ankit Chauhan, Tobias Friedrich, Francesco Quinzan
Comments: 21 pages, 3 figures, 2 tables and Accepted at GECCO’17
Subjects: Neural and Evolutionary Computing (cs.NE); Data Structures and Algorithms (cs.DS); Social and Information Networks (cs.SI)

It has been experimentally observed that real-world networks follow certain
topological properties, such as small-world, power-law etc. To study these
networks, many random graph models, such as Preferential Attachment, have been
proposed.

In this paper, we consider the deterministic properties which capture
power-law degree distribution and degeneracy. Networks with these properties
are known as scale-free networks in the literature. Many interesting problems
remain NP-hard on scale-free networks. We study the relationship between
scale-free properties and the approximation ratio of some commonly used
evolutionary algorithms.

For the Vertex Cover, we observe experimentally that the (1+1) EA always
gives the better result than a greedy local search, even when it runs for only
(O(n log n)) steps. We give the construction of a scale-free network in which
a multi-objective algorithm and a greedy algorithm obtain optimal solutions,
while the (1+1) EA obtains the worst possible solution with constant
probability.

We prove that for the Dominating Set, Vertex Cover, Connected Dominating Set
and Independent Set, the (1+1) EA obtains constant-factor approximation in
expected run time (O(n log n)) and (O(n^4)) respectively. Whereas, GSEMO gives
even better approximation than (1+1) EA in expected run time (O(n^3)) for
Dominating Set, Vertex Cover and Connected Dominating Set on such networks.

Improving Fitness Functions in Genetic Programming for Classification on Unbalanced Credit Card Datasets

Van Loi Cao, Nhien-An Le-Khac, Miguel Nicolau, Michael ONeill, James McDermott
Subjects: Neural and Evolutionary Computing (cs.NE)

Credit card fraud detection based on machine learning has recently attracted
considerable interest from the research community. One of the most important
tasks in this area is the ability of classifiers to handle the imbalance in
credit card data. In this scenario, classifiers tend to yield poor accuracy on
the fraud class (minority class) despite realizing high overall accuracy. This
is due to the influence of the majority class on traditional training criteria.
In this paper, we aim to apply genetic programming to address this issue by
adapting existing fitness functions. We examine two fitness functions from
previous studies and develop two new fitness functions to evolve GP classifier
with superior accuracy on the minority class and overall. Two UCI credit card
datasets are used to evaluate the effectiveness of the proposed fitness
functions. The results demonstrate that the proposed fitness functions augment
GP classifiers, encouraging fitter solutions on both the minority and the
majority classes.

A Neural Representation of Sketch Drawings

David Ha, Douglas Eck
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)

We present sketch-rnn, a recurrent neural network (RNN) able to construct
stroke-based drawings of common objects. The model is trained on thousands of
crude human-drawn images representing hundreds of classes. We outline a
framework for conditional and unconditional sketch generation, and describe new
robust training methods for generating coherent sketch drawings in a vector
format.

Semantic3D.net: A new Large-scale Point Cloud Classification Benchmark

Timo Hackel, Nikolay Savinov, Lubor Ladicky, Jan D. Wegner, Konrad Schindler, Marc Pollefeys
Comments: Accepted to ISPRS Annals. The benchmark website is available at this http URL . The baseline code is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)

This paper presents a new 3D point cloud classification benchmark data set
with over four billion manually labelled points, meant as input for data-hungry
(deep) learning methods. We also discuss first submissions to the benchmark
that use deep convolutional neural networks (CNNs) as a work horse, which
already show remarkable performance improvements over state-of-the-art. CNNs
have become the de-facto standard for many tasks in computer vision and machine
learning like semantic segmentation or object detection in images, but have no
yet led to a true breakthrough for 3D point cloud labelling tasks due to lack
of training data. With the massive data set presented in this paper, we aim at
closing this data gap to help unleash the full potential of deep learning
methods for 3D labelling tasks. Our semantic3D.net data set consists of dense
point clouds acquired with static terrestrial laser scanners. It contains 8
semantic classes and covers a wide range of urban outdoor scenes: churches,
streets, railroad tracks, squares, villages, soccer fields and castles. We
describe our labelling interface and show that our data set provides more dense
and complete point clouds with much higher overall number of labelled points
compared to those already available to the research community. We further
provide baseline method descriptions and comparison between methods submitted
to our online system. We hope semantic3D.net will pave the way for deep
learning methods in 3D point cloud labelling to learn richer, more general 3D
representations, and first submissions after only a few months indicate that
this might indeed be the case.

Computer Vision and Pattern Recognition

Semantic3D.net: A new Large-scale Point Cloud Classification Benchmark

This paper presents a new 3D point cloud classification benchmark data set
with over four billion manually labelled points, meant as input for data-hungry
(deep) learning methods. We also discuss first submissions to the benchmark
that use deep convolutional neural networks (CNNs) as a work horse, which
already show remarkable performance improvements over state-of-the-art. CNNs
have become the de-facto standard for many tasks in computer vision and machine
learning like semantic segmentation or object detection in images, but have no
yet led to a true breakthrough for 3D point cloud labelling tasks due to lack
of training data. With the massive data set presented in this paper, we aim at
closing this data gap to help unleash the full potential of deep learning
methods for 3D labelling tasks. Our semantic3D.net data set consists of dense
point clouds acquired with static terrestrial laser scanners. It contains 8
semantic classes and covers a wide range of urban outdoor scenes: churches,
streets, railroad tracks, squares, villages, soccer fields and castles. We
describe our labelling interface and show that our data set provides more dense
and complete point clouds with much higher overall number of labelled points
compared to those already available to the research community. We further
provide baseline method descriptions and comparison between methods submitted
to our online system. We hope semantic3D.net will pave the way for deep
learning methods in 3D point cloud labelling to learn richer, more general 3D
representations, and first submissions after only a few months indicate that
this might indeed be the case.

Connecting Look and Feel: Associating the visual and tactile properties of physical materials

Wenzhen Yuan, Shaoxiong Wang, Siyuan Dong, Edward Adelson
Subjects: Computer Vision and Pattern Recognition (cs.CV)

For machines to interact with the physical world, they must understand the
physical properties of objects and materials they encounter. We use fabrics as
an example of a deformable material with a rich set of mechanical properties. A
thin flexible fabric, when draped, tends to look different from a heavy stiff
fabric. It also feels different when touched. Using a collection of 118 fabric
sample, we captured color and depth images of draped fabrics along with tactile
data from a high resolution touch sensor. We then sought to associate the
information from vision and touch by jointly training CNNs across the three
modalities. Through the CNN, each input, regardless of the modality, generates
an embedding vector that records the fabric’s physical property. By comparing
the embeddings, our system is able to look at a fabric image and predict how it
will feel, and vice versa. We also show that a system jointly trained on vision
and touch data can outperform a similar system trained only on visual data when
tested purely with visual inputs.

Attention-Set based Metric Learning for Video Face Recognition

Yibo Hu, Xiang Wu, Ran He
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Face recognition has made great progress with the development of deep
learning. However, video face recognition (VFR) is still an ongoing task due to
various illumination, low-resolution, pose variations and motion blur. Most
existing CNN-based VFR methods only obtain a feature vector from a single image
and simply aggregate the features in a video, which less consider the
correlations of face images in one video. In this paper, we propose a novel
Attention-Set based Metric Learning (ASML) method to measure the statistical
characteristics of image sets. It is a promising and generalized extension of
Maximum Mean Discrepancy with memory attention weighting. First, we define an
effective distance metric on image sets, which explicitly minimizes the
intra-set distance and maximizes the inter-set distance simultaneously. Second,
inspired by Neural Turing Machine, a Memory Attention Weighting is proposed to
adapt set-aware global contents. Then ASML is naturally integrated into CNNs,
resulting in an end-to-end learning scheme. Our method achieves
state-of-the-art performance for the task of video face recognition on the
three widely used benchmarks including YouTubeFace, YouTube Celebrities and
Celebrity-1000.

Ensemble classifier approach in breast cancer detection and malignancy grading- A review

Deepti Ameta
Comments: 10 pages,1 figure,5 tables
Journal-ref: International Journal of Managing Public Sector Information and
Communication Technologies (IJMPICT) Vol. 8, No. 1, March 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The diagnosed cases of Breast cancer is increasing annually and unfortunately
getting converted into a high mortality rate. Cancer, at the early stages, is
hard to detect because the malicious cells show similar properties (density) as
shown by the non-malicious cells. The mortality ratio could have been minimized
if the breast cancer could have been detected in its early stages. But the
current systems have not been able to achieve a fully automatic system which is
not just capable of detecting the breast cancer but also can detect the stage
of it. Estimation of malignancy grading is important in diagnosing the degree
of growth of malicious cells as well as in selecting a proper therapy for the
patient. Therefore, a complete and efficient clinical decision support system
is proposed which is capable of achieving breast cancer malignancy grading
scheme very efficiently. The system is based on Image processing and machine
learning domains. Classification Imbalance problem, a machine learning problem,
occurs when instances of one class is much higher than the instances of the
other class resulting in an inefficient classification of samples and hence a
bad decision support system. Therefore EUSBoost, ensemble based classifier is
proposed which is efficient and is able to outperform other classifiers as it
takes the benefits of both-boosting algorithm with Random Undersampling
techniques. Also comparison of EUSBoost with other techniques is shown in the
paper.

Unsupervised part learning for visual recognition

Ronan Sicre, Yannis Avrithis, Ewa Kijak, Frederic Jurie
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

Part-based image classification aims at representing categories by small sets
of learned discriminative parts, upon which an image representation is built.
Considered as a promising avenue a decade ago, this direction has been
neglected since the advent of deep neural networks. In this context, this paper
brings two contributions: first, it shows that despite the recent success of
end-to-end holistic models, explicit part learning can boosts classification
performance. Second, this work proceeds one step further than recent part-based
models (PBM), focusing on how to learn parts without using any labeled data.
Instead of learning a set of parts per class, as generally done in the PBM
literature, the proposed approach both constructs a partition of a given set of
images into visually similar groups, and subsequently learn a set of
discriminative parts per group in a fully unsupervised fashion. This strategy
opens the door to the use of PBM in new applications for which the notion of
image categories is irrelevant, such as instance-based image retrieval, for
example. We experimentally show that our learned parts can help building
efficient image representations, for classification as well as for indexing
tasks, resulting in performance superior to holistic state-of-the art Deep
Convolutional Neural Networks (DCNN) encoding.

Unsupervised Construction of Human Body Models Using Principles of Organic Computing

Thomas Walther, Rolf P. Würtz
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Unsupervised learning of a generalizable model of the visual appearance of
humans from video data is of major importance for computing systems interacting
naturally with their users and others. We propose a step towards automatic
behavior understanding by integrating principles of Organic Computing into the
posture estimation cycle, thereby relegating the need for human intervention
while simultaneously raising the level of system autonomy. The system extracts
coherent motion from moving upper bodies and autonomously decides about limbs
and their possible spatial relationships. The models from many videos are
integrated into meta-models, which show good generalization to different
individuals, backgrounds, and attire. These models allow robust interpretation
of single video frames without temporal continuity and posture mimicking by an
android robot.

Object proposal generation applying the distance dependent Chinese restaurant process

Mikko Lauri, Simone Frintrop
Comments: To appear at Scandinavian Conference on Image Analysis (SCIA) 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In application domains such as robotics, it is useful to represent the
uncertainty related to the robot’s belief about the state of its environment.
Algorithms that only yield a single “best guess” as a result are not
sufficient. In this paper, we propose object proposal generation based on
non-parametric Bayesian inference that allows quantification of the likelihood
of the proposals. We apply Markov chain Monte Carlo to draw samples of image
segmentations via the distance dependent Chinese restaurant process. Our method
achieves state-of-the-art performance on an indoor object discovery data set,
while additionally providing a likelihood term for each proposal. We show that
the likelihood term can effectively be used to rank proposals according to
their quality.

Dilated Convolutional Neural Networks for Cardiovascular MR Segmentation in Congenital Heart Disease

Jelmer M. Wolterink, Tim Leiner, Max A. Viergever, Ivana Išgum
Journal-ref: RAMBO 2016, HVSMR 2016. LNCS 10129. pp. 95-102
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose an automatic method using dilated convolutional neural networks
(CNNs) for segmentation of the myocardium and blood pool in cardiovascular MR
(CMR) of patients with congenital heart disease (CHD).

Ten training and ten test CMR scans cropped to an ROI around the heart were
provided in the MICCAI 2016 HVSMR challenge. A dilated CNN with a receptive
field of 131×131 voxels was trained for myocardium and blood pool segmentation
in axial, sagittal and coronal image slices. Performance was evaluated within
the HVSMR challenge.

Automatic segmentation of the test scans resulted in Dice indices of
0.80(pm)0.06 and 0.93(pm)0.02, average distances to boundaries of
0.96(pm)0.31 and 0.89(pm)0.24 mm, and Hausdorff distances of 6.13(pm)3.76
and 7.07(pm)3.01 mm for the myocardium and blood pool, respectively.
Segmentation took 41.5(pm)14.7 s per scan.

In conclusion, dilated CNNs trained on a small set of CMR images of CHD
patients showing large anatomical variability provide accurate myocardium and
blood pool segmentations.

Feature Tracking Cardiac Magnetic Resonance via Deep Learning and Spline Optimization

Davis M. Vigneault, Weidi Xie, David A. Bluemke, J. Alison Noble
Comments: Accepted to Functional Imaging and Modeling of the Heart (FIMH) 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Feature tracking Cardiac Magnetic Resonance (CMR) has recently emerged as an
area of interest for quantification of regional cardiac function from balanced,
steady state free precession (SSFP) cine sequences. However, currently
available techniques lack full automation, limiting reproducibility. We propose
a fully automated technique whereby a CMR image sequence is first segmented
with a deep, fully convolutional neural network (CNN) architecture, and
quadratic basis splines are fitted simultaneously across all cardiac frames
using least squares optimization. Experiments are performed using data from 42
patients with hypertrophic cardiomyopathy (HCM) and 21 healthy control
subjects. In terms of segmentation, we compared state-of-the-art CNN
frameworks, U-Net and dilated convolution architectures, with and without
temporal context, using cross validation with three folds. Performance relative
to expert manual segmentation was similar across all networks: pixel accuracy
was ~97%, intersection-over-union (IoU) across all classes was ~87%, and IoU
across foreground classes only was ~85%. Endocardial left ventricular
circumferential strain calculated from the proposed pipeline was significantly
different in control and disease subjects (-25.3% vs -29.1%, p = 0.006), in
agreement with the current clinical literature.

Predictive-Corrective Networks for Action Detection

Achal Dave, Olga Russakovsky, Deva Ramanan
Comments: Accepted to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

While deep feature learning has revolutionized techniques for static-image
understanding, the same does not quite hold for video processing. Architectures
and optimization techniques used for video are largely based off those for
static images, potentially underutilizing rich video information. In this work,
we rethink both the underlying network architecture and the stochastic learning
paradigm for temporal data. To do so, we draw inspiration from classic theory
on linear dynamic systems for modeling time series. By extending such models to
include nonlinear mappings, we derive a series of novel recurrent neural
networks that sequentially make top-down predictions about the future and then
correct those predictions with bottom-up observations. Predictive-corrective
networks have a number of desirable properties: (1) they can adaptively focus
computation on “surprising” frames where predictions require large corrections,
(2) they simplify learning in that only “residual-like” corrective terms need
to be learned over time and (3) they naturally decorrelate an input data stream
in a hierarchical fashion, producing a more reliable signal for learning at
each layer of a network. We provide an extensive analysis of our lightweight
and interpretable framework, and demonstrate that our model is competitive with
the two-stream network on three challenging datasets without the need for
computationally expensive optical flow.

Automatic Discovery, Association Estimation and Learning of Semantic Attributes for a Thousand Categories

Ziad Al-Halah, Rainer Stiefelhagen
Comments: Accepted as a conference paper at CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Attribute-based recognition models, due to their impressive performance and
their ability to generalize well on novel categories, have been widely adopted
for many computer vision applications. However, usually both the attribute
vocabulary and the class-attribute associations have to be provided manually by
domain experts or large number of annotators. This is very costly and not
necessarily optimal regarding recognition performance, and most importantly, it
limits the applicability of attribute-based models to large scale data sets. To
tackle this problem, we propose an end-to-end unsupervised attribute learning
approach. We utilize online text corpora to automatically discover a salient
and discriminative vocabulary that correlates well with the human concept of
semantic attributes. Moreover, we propose a deep convolutional model to
optimize class-attribute associations with a linguistic prior that accounts for
noise and missing data in text. In a thorough evaluation on ImageNet, we
demonstrate that our model is able to efficiently discover and learn semantic
attributes at a large scale. Furthermore, we demonstrate that our model
outperforms the state-of-the-art in zero-shot learning on three data sets:
ImageNet, Animals with Attributes and aPascal/aYahoo. Finally, we enable
attribute-based learning on ImageNet and will share the attributes and
associations for future research.

Instance-Level Salient Object Segmentation

Guanbin Li, Yuan Xie, Liang Lin, Yizhou Yu
Comments: To appear in CVPR2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Image saliency detection has recently witnessed rapid progress due to deep
convolutional neural networks. However, none of the existing methods is able to
identify object instances in the detected salient regions. In this paper, we
present a salient instance segmentation method that produces a saliency mask
with distinct object instance labels for an input image. Our method consists of
three steps, estimating saliency map, detecting salient object contours and
identifying salient object instances. For the first two steps, we propose a
multiscale saliency refinement network, which generates high-quality salient
region masks and salient object contours. Once integrated with multiscale
combinatorial grouping and a MAP-based subset optimization framework, our
method can generate very promising salient object instance segmentation
results. To promote further research and evaluation of salient instance
segmentation, we also construct a new database of 1000 images and their
pixelwise salient instance annotations. Experimental results demonstrate that
our proposed method is capable of achieving state-of-the-art performance on all
public benchmarks for salient region detection as well as on our new dataset
for salient instance segmentation.

Deep Contextual Recurrent Residual Networks for Scene Labeling

T. Hoang Ngan Le, Chi Nhan Duong, Ligong Han, Khoa Luu, Marios Savvides, Dipan Pal
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Designed as extremely deep architectures, deep residual networks which
provide a rich visual representation and offer robust convergence behaviors
have recently achieved exceptional performance in numerous computer vision
problems. Being directly applied to a scene labeling problem, however, they
were limited to capture long-range contextual dependence, which is a critical
aspect. To address this issue, we propose a novel approach, Contextual
Recurrent Residual Networks (CRRN) which is able to simultaneously handle rich
visual representation learning and long-range context modeling within a fully
end-to-end deep network. Furthermore, our proposed end-to-end CRRN is
completely trained from scratch, without using any pre-trained models in
contrast to most existing methods usually fine-tuned from the state-of-the-art
pre-trained models, e.g. VGG-16, ResNet, etc. The experiments are conducted on
four challenging scene labeling datasets, i.e. SiftFlow, CamVid, Stanford
background and SUN datasets, and compared against various state-of-the-art
scene labeling methods.

Reformulating Level Sets as Deep Recurrent Neural Network Approach to Semantic Segmentation

Ngan Le, Kha Gia Quach, Khoa Luu, Marios Savvides, Chenchen Zhu
Comments: 10 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Variational Level Set (LS) has been a widely used method in medical
segmentation. However, it is limited when dealing with multi-instance objects
in the real world. In addition, its segmentation results are quite sensitive to
initial settings and highly depend on the number of iterations. To address
these issues and boost the classic variational LS methods to a new level of the
learnable deep learning approaches, we propose a novel definition of contour
evolution named Recurrent Level Set (RLS)} to employ Gated Recurrent Unit under
the energy minimization of a variational LS functional. The curve deformation
process in RLS is formed as a hidden state evolution procedure and updated by
minimizing an energy functional composed of fitting forces and contour length.
By sharing the convolutional features in a fully end-to-end trainable
framework, we extend RLS to Contextual RLS (CRLS) to address semantic
segmentation in the wild. The experimental results have shown that our proposed
RLS improves both computational time and segmentation accuracy against the
classic variations LS-based method, whereas the fully end-to-end system CRLS
achieves competitive performance compared to the state-of-the-art semantic
segmentation approaches.

Beyond Planar Symmetry: Modeling human perception of reflection and rotation symmetries in the wild

Christopher Funk, Yanxi Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)

Humans take advantage of real world symmetries for various tasks, yet
capturing their superb symmetry perception mechanism into a computational model
remains elusive. Encouraged by a new discovery (CVPR 2016) demonstrating
extremely high inter-person accuracy of human perceived symmetries in the wild,
we have created the first deep-learning neural network for reflection and
rotation symmetry detection (Sym-NET), trained on photos from MS-COCO (Common
Object in COntext) dataset with nearly 11K symmetry-labels from more than 400
human observers. We employ novel methods to convert discrete human labels into
symmetry heatmaps, capture symmetry densely in an image and quantitatively
evaluate Sym-NET against multiple existing computer vision algorithms. Using
the symmetry competition testsets from CVPR 2013 and unseen MS-COCO photos,
Sym-NET comes out as the winner with significantly superior performance over
all other competitors. Beyond mathematically well-defined symmetries on a
plane, Sym-NET demonstrates abilities to identify viewpoint-varied 3D
symmetries, partially occluded symmetrical objects and symmetries at a semantic
level.

Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification

Muhammad Zeshan Afzal, Andreas Kölsch, Sheraz Ahmed, Marcus Liwicki
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present an exhaustive investigation of recent Deep Learning architectures,
algorithms, and strategies for the task of document image classification to
finally reduce the error by more than half. Existing approaches, such as the
DeepDocClassifier, apply standard Convolutional Network architectures with
transfer learning from the object recognition domain. The contribution of the
paper is threefold: First, it investigates recently introduced very deep neural
network architectures (GoogLeNet, VGG, ResNet) using transfer learning (from
real images). Second, it proposes transfer learning from a huge set of document
images, i.e. 400,000 documents. Third, it analyzes the impact of the amount of
training data (document images) and other parameters to the classification
abilities. We use two datasets, the Tobacco-3482 and the large-scale RVL-CDIP
dataset. We achieve an accuracy of 91.13% for the Tobacco-3482 dataset while
earlier approaches reach only 77.6%. Thus, a relative error reduction of more
than 60% is achieved. For the large dataset RVL-CDIP, an accuracy of 90.97% is
achieved, corresponding to a relative error reduction of 11.5%.

Attention-based Extraction of Structured Information from Street View Imagery

Zbigniew Wojna, Alex Gorban, Dar-Shyang Lee, Kevin Murphy, Qian Yu, Yeqing Li, Julian Ibarz
Comments: In submission to ICDAR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present a neural network model – based on CNNs, RNNs and a novel attention
mechanism – which achieves 84.2% accuracy on the challenging French Street Name
Signs (FSNS) dataset, significantly outperforming the previous state of the art
(Smith’16), which achieved 72.46%. Furthermore, our new method is much simpler
and more general than the previous approach. To demonstrate the generality of
our model, we show that it also performs well on an even more challenging
dataset derived from Google Street View, in which the goal is to extract
business names from store fronts. Finally, we study the speed/accuracy tradeoff
that results from using CNN feature extractors of different depths.
Surprisingly, we find that deeper is not always better (in terms of accuracy,
as well as speed). Our resulting model is simple, accurate and fast, allowing
it to be used at scale on a variety of challenging real-world text extraction
problems.

Learning Detection with Diverse Proposals

Samaneh Azadi, Jiashi Feng, Trevor Darrell
Comments: Accepted to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

To predict a set of diverse and informative proposals with enriched
representations, this paper introduces a differentiable Determinantal Point
Process (DPP) layer that is able to augment the object detection architectures.
Most modern object detection architectures, such as Faster R-CNN, learn to
localize objects by minimizing deviations from the ground-truth but ignore
correlation between multiple proposals and object categories. Non-Maximum
Suppression (NMS) as a widely used proposal pruning scheme ignores label- and
instance-level relations between object candidates resulting in multi-labeled
detections. In the multi-class case, NMS selects boxes with the largest
prediction scores ignoring the semantic relation between categories of
potential election. In contrast, our trainable DPP layer, allowing for Learning
Detection with Diverse Proposals (LDDP), considers both label-level contextual
information and spatial layout relationships between proposals without
increasing the number of parameters of the network, and thus improves location
and category specifications of final detected bounding boxes substantially
during both training and inference schemes. Furthermore, we show that LDDP
keeps it superiority over Faster R-CNN even if the number of proposals
generated by LDPP is only ~30% as many as those for Faster R-CNN.

UC Merced Submission to the ActivityNet Challenge 2016

Yi Zhu, Shawn Newsam, Zaikun Xu
Comments: Notebook paper for ActivityNet 2016 challenge, untrimmed video classification track
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

This notebook paper describes our system for the untrimmed classification
task in the ActivityNet challenge 2016. We investigate multiple
state-of-the-art approaches for action recognition in long, untrimmed videos.
We exploit hand-crafted motion boundary histogram features as well feature
activations from deep networks such as VGG16, GoogLeNet, and C3D. These
features are separately fed to linear, one-versus-rest support vector machine
classifiers to produce confidence scores for each action class. These
predictions are then fused along with the softmax scores of the recent
ultra-deep ResNet-101 using weighted averaging.

Creativity: Generating Diverse Questions using Variational Autoencoders

Unnat Jain, Ziyu Zhang, Alexander Schwing
Comments: Accepted to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Generating diverse questions for given images is an important task for
computational education, entertainment and AI assistants. Different from many
conventional prediction techniques is the need for algorithms to generate a
diverse set of plausible questions, which we refer to as “creativity”. In this
paper we propose a creative algorithm for visual question generation which
combines the advantages of variational autoencoders with long short-term memory
networks. We demonstrate that our framework is able to generate a large set of
varying questions given a single input image.

CNN-SLAM: Real-time dense monocular SLAM with learned depth prediction

Keisuke Tateno, Federico Tombari, Iro Laina, Nassir Navab
Comments: 10 pages, 6 figures, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Hawaii, USA, June, 2017. The first two authors contribute equally to this paper
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Given the recent advances in depth prediction from Convolutional Neural
Networks (CNNs), this paper investigates how predicted depth maps from a deep
neural network can be deployed for accurate and dense monocular reconstruction.
We propose a method where CNN-predicted dense depth maps are naturally fused
together with depth measurements obtained from direct monocular SLAM. Our
fusion scheme privileges depth prediction in image locations where monocular
SLAM approaches tend to fail, e.g. along low-textured regions, and vice-versa.
We demonstrate the use of depth prediction for estimating the absolute scale of
the reconstruction, hence overcoming one of the major limitations of monocular
SLAM. Finally, we propose a framework to efficiently fuse semantic labels,
obtained from a single frame, with dense SLAM, yielding semantically coherent
scene reconstruction from a single view. Evaluation results on two benchmark
datasets show the robustness and accuracy of our approach.

Learning Proximal Operators: Using Denoising Networks for Regularizing Inverse Imaging Problems

Tim Meinhardt, Michael Möller, Caner Hazirbas, Daniel Cremers
Subjects: Computer Vision and Pattern Recognition (cs.CV)

While variational methods have been among the most powerful tools for solving
linear inverse problems in imaging, deep (convolutional) neural networks have
recently taken the lead in many challenging benchmarks. A remaining drawback of
deep learning approaches is that they require an expensive retraining whenever
the specific problem, the noise level, noise type, or desired measure of
fidelity changes. On the contrary, variational methods have a plug-and-play
nature as they usually consist of separate data fidelity and regularization
terms. In this paper we study the possibility of replacing the proximal
operator of the regularization used in many convex energy minimization
algorithms by a denoising neural network. The latter therefore serves as an
implicit natural image prior, while the data term can still be chosen
arbitrarily. Using a fixed denoising neural network in exemplary problems of
image deconvolution with different blur kernels and image demosaicking, we
obtain state-of-the-art results. Additionally, we discuss novel results on the
analysis of possible convex optimization algorithms to incorporate the network
into, as well as the choices of algorithm parameters and their relation to the
noise level the neural network is trained on.

Learning Two-Branch Neural Networks for Image-Text Matching Tasks

Liwei Wang, Yin Li, Svetlana Lazebnik
Comments: in submission for TPAMI
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper investigates two-branch neural networks for image-text matching
tasks. We propose two different network structures that produce different
output representations. The first one, referred to as an embedding network,
learns an explicit shared latent embedding space with a maximum-margin ranking
loss and novel neighborhood constraints. The second one, referred to as a
similarity network, fuses the two branches via element-wise product and is
trained with regression loss to directly predict a similarity score. Extensive
experiments show that our two-branch networks achieve high accuracies for
phrase localization on the Flickr30K Entities dataset and for bi-directional
image-sentence retrieval on Flickr30K and MSCOCO datasets.

Deep-FExt: Deep Feature Extraction for Vessel Segmentation and Centerline Prediction

Giles Tetteh, Markus Rempfler, Bjoern H. Menze, Claus Zimmer
Comments: 9 pages
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Feature extraction is a very crucial task in image and pixel (voxel)
classification and regression in biomedical image modeling. In this work we
present a machine learning based feature extraction scheme based on inception
models for pixel classification tasks. We extract features under multi-scale
and multi-layer schemes through convolutional operators. Layers of Fully
Convolutional Network are later stacked on this feature extraction layers and
trained end-to-end for the purpose of classification. We test our model on the
DRIVE and STARE public data sets for the purpose of segmentation and centerline
detection and it out performs most existing hand crafted or deterministic
feature schemes found in literature. We achieve an average maximum Dice of 0.85
on the DRIVE data set which out performs the scores from the second human
annotator of this data set. We also achieve an average maximum Dice of 0.85 and
kappa of 0.84 on the STARE data set. Though these datasets are mainly 2-D we
also propose ways of extending this feature extraction scheme to handle 3-D
datasets.

Artificial Intelligence

Counterexample Guided Inductive Optimization

Rodrigo F. Araujo, Higo F. Albuquerque, Iury V. de Bessa, Lucas C. Cordeiro, Joao Edgar C. Filho
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)

This paper describes three variants of a counterexample guided inductive
optimization (CEGIO) approach based on Satisfiability Modulo Theories (SMT)
solvers. In particular, CEGIO relies on iterative executions to constrain a
verification procedure, in order to perform inductive generalization, based on
counterexamples extracted from SMT solvers. CEGIO is able to successfully
optimize a wide range of functions, including non-linear and non-convex
optimization problems based on SMT solvers, in which data provided by
counterexamples are employed to guide the verification engine, thus reducing
the optimization domain. The present algorithms are evaluated using a large set
of benchmarks typically employed for evaluating optimization techniques.
Experimental results show the efficiency and effectiveness of the proposed
algorithms, which find the optimal solution in all evaluated benchmarks, while
traditional techniques are usually trapped by local minima.

Learning from Demonstrations for Real World Reinforcement Learning

Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

Deep reinforcement learning (RL) has achieved several high profile successes
in difficult control problems. However, these algorithms typically require a
huge amount of data before they reach reasonable performance. In fact, their
performance during learning can be extremely poor. This may be acceptable for a
simulator, but it severely limits the applicability of deep RL to many
real-world tasks, where the agent must learn in the real environment. In this
paper we study a setting where the agent may access data from previous control
of the system. We present an algorithm, Deep Q-learning from Demonstrations
(DQfD), that leverages this data to massively accelerate the learning process
even from relatively small amounts of demonstration data. DQfD works by
combining temporal difference updates with large-margin classification of the
demonstrator’s actions. We show that DQfD has better initial performance than
Deep Q-Networks (DQN) on 40 of 42 Atari games and it receives more average
rewards than DQN on 27 of 42 Atari games. We also demonstrate that DQfD learns
faster than DQN even when given poor demonstration data.

Beliefs in Markov Trees – From Local Computations to Local Valuation

Mieczysław A. Kłopotek
Comments: Preliminary versioin of conference paper: M.A. K{l}opotek: Beliefs in Markov Trees – From Local Computations to Local Valuation. [in:] R. Trappl, Ed.: Cybernetics and Systems Research , Proc. 12th European Meeting on Cybernetics and System Research, Vienna 5-8 April 1994, World Scientific Publishers, Vol.1. pp. 351-358
Subjects: Artificial Intelligence (cs.AI)

This paper is devoted to expressiveness of hypergraphs for which uncertainty
propagation by local computations via Shenoy/Shafer method applies. It is
demonstrated that for this propagation method for a given joint belief
distribution no valuation of hyperedges of a hypergraph may provide with
simpler hypergraph structure than valuation of hyperedges by conditional
distributions. This has vital implication that methods recovering belief
networks from data have no better alternative for finding the simplest
hypergraph structure for belief propagation. A method for recovery
tree-structured belief networks has been developed and specialized for
Dempster-Shafer belief functions

Stigmergy-based modeling to discover urban activity patterns from positioning data

Antonio L. Alfeo, Mario G. C. A. Cimino, Sara Egidi, Bruno Lepri, Alex Pentland, Gigliola Vaglini
Comments: It will be presented at the International Conference on Social, Cultural, and Behavioral Modeling & Prediction and Behavior Representation in Modeling and Simulation, SBP-BRiMS 2017
Subjects: Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)

Positioning data offer a remarkable source of information to analyze crowds
urban dynamics. However, discovering urban activity patterns from the emergent
behavior of crowds involves complex system modeling. An alternative approach is
to adopt computational techniques belonging to the emergent paradigm, which
enables self-organization of data and allows adaptive analysis. Specifically,
our approach is based on stigmergy. By using stigmergy each sample position is
associated with a digital pheromone deposit, which progressively evaporates and
aggregates with other deposits according to their spatiotemporal proximity.
Based on this principle, we exploit positioning data to identify high density
areas (hotspots) and characterize their activity over time. This
characterization allows the comparison of dynamics occurring in different days,
providing a similarity measure exploitable by clustering techniques. Thus, we
cluster days according to their activity behavior, discovering unexpected urban
activity patterns. As a case study, we analyze taxi traces in New York City
during 2015.

Finding Modes by Probabilistic Hypergraphs Shifting

Yang Wang, Lin Wu
Comments: Fixing some minor issues in PAKDD 2014
Subjects: Artificial Intelligence (cs.AI)

In this paper, we develop a novel paradigm, namely hypergraph shift, to find
robust graph modes by probabilistic voting strategy, which are semantically
sound besides the self-cohesiveness requirement in forming graph modes. Unlike
the existing techniques to seek graph modes by shifting vertices based on
pair-wise edges (i.e, an edge with (2) ends), our paradigm is based on shifting
high-order edges (hyperedges) to deliver graph modes. Specifically, we convert
the problem of seeking graph modes as the problem of seeking maximizers of a
novel objective function with the aim to generate good graph modes based on
sifting edges in hypergraphs. As a result, the generated graph modes based on
dense subhypergraphs may more accurately capture the object semantics besides
the self-cohesiveness requirement. We also formally prove that our technique is
always convergent. Extensive empirical studies on synthetic and real world data
sets are conducted on clustering and graph matching. They demonstrate that our
techniques significantly outperform the existing techniques.

CASP Solutions for Planning in Hybrid Domains

Marcello Balduccini, Daniele Magazzeni, Marco Maratea, Emily LeBlanc
Comments: Under consideration in Theory and Practice of Logic Programming (TPLP)
Subjects: Artificial Intelligence (cs.AI)

CASP is an extension of ASP that allows for numerical constraints to be added
in the rules. PDDL+ is an extension of the PDDL standard language of automated
planning for modeling mixed discrete-continuous dynamics.

In this paper, we present CASP solutions for dealing with PDDL+ problems,
i.e., encoding from PDDL+ to CASP, and extensions to the algorithm of the EZCSP
CASP solver in order to solve CASP programs arising from PDDL+ domains. An
experimental analysis, performed on well-known linear and non-linear variants
of PDDL+ domains, involving various configurations of the EZCSP solver, other
CASP solvers, and PDDL+ planners, shows the viability of our solution.

Parallelized Kendall's Tau Coefficient Computation via SIMD Vectorized Sorting On Many-Integrated-Core Processors

Yongchao Liu, Tony Pan, Oded Green, Srinivas Aluru
Comments: 29 pages, 6 figures, 5 tables, submitted to Journal of Parallel and Distributed Computing
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)

Pairwise association measure is an important operation in data analytics.
Kendall’s tau coefficient is one widely used correlation coefficient
identifying non-linear relationships between ordinal variables. In this paper,
we investigated a parallel algorithm accelerating all-pairs Kendall’s tau
coefficient computation via single instruction multiple data (SIMD) vectorized
sorting on Intel Xeon Phis by taking advantage of many processing cores and
512-bit SIMD vector instructions. To facilitate workload balancing and overcome
on-chip memory limitation, we proposed a generic framework for symmetric
all-pairs computation by building provable bijective functions between job
identifier and coordinate space. Performance evaluation demonstrated that our
algorithm on one 5110P Phi achieves two orders-of-magnitude speedups over
16-threaded MATLAB and three orders-of-magnitude speedups over sequential R,
both running on high-end CPUs. Besides, our algorithm exhibited rather good
distributed computing scalability with respect to number of Phis. Source code
and datasets are publicly available at this http URL

Real-time On-Demand Crowd-powered Entity Extraction

Ting-Hao (Kenneth)
Huang, Yun-Nung Chen, Jeffrey P. Bigham
Comments: Accepted by the 5th Edition Of The Collective Intelligence Conference (CI 2017) as an oral presentation. Interface code and data are available at: this https URL
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Output-agreement mechanisms such as ESP Game have been widely used in human
computation to obtain reliable human-generated labels. In this paper, we argue
that a “time-limited” output-agreement mechanism can be used to create a fast
and robust crowd-powered component in interactive systems, particularly
dialogue systems, to extract key information from user utterances on the fly.
Our experiments on Amazon Mechanical Turk using the Airline Travel Information
System (ATIS) dataset showed that the proposed approach achieves high-quality
results with an average response time shorter than 9 seconds.

Unsupervised Event Abstraction using Pattern Abstraction and Local Process Models

Felix Mannhardt, Niek Tax
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Process mining analyzes business processes based on events stored in event
logs. However, some recorded events may correspond to activities on a very low
level of abstraction. When events are recorded on a too low level of
granularity, process discovery methods tend to generate overgeneralizing
process models. Grouping low-level events to higher level activities, i.e.,
event abstraction, can be used to discover better process models. Existing
event abstraction methods are mainly based on common sub-sequences and
clustering techniques. In this paper, we propose to first discover local
process models and then use those models to lift the event log to a higher
level of abstraction. Our conjecture is that process models discovered on the
obtained high-level event log return process models of higher quality: their
fitness and precision scores are more balanced. We show this with preliminary
results on several real-life event logs.

Information Retrieval

Loklak – A Distributed Crawler and Data Harvester for Overcoming Rate Limits

Sudheesh Singanamalla, Michael Peter Christen
Comments: 4 pages, 1 figure
Subjects: Information Retrieval (cs.IR)

Modern social networks have become sources for vast quantities of data.
Having access to such big data can be very useful for various researchers and
data scientists. In this paper we describe Loklak, an open source distributed
peer to peer crawler and scraper for supporting such research on platforms like
Twitter, Weibo and other social networks. Social networks such as Twitter and
Weibo pose various limitations to the user on the rate at which one could
freely collect such data for research. Our crawler enables researchers to
continuously collect data while overcoming the barriers of authentication and
rate limits imposed to provide a repository of open data as a service.

Leveraging Term Banks for Answering Complex Questions: A Case for Sparse Vectors

Peter D. Turney
Comments: Related datasets can be found at this http URL
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Learning (cs.LG)

While open-domain question answering (QA) systems have proven effective for
answering simple questions, they struggle with more complex questions. Our goal
is to answer more complex questions reliably, without incurring a significant
cost in knowledge resource construction to support the QA. One readily
available knowledge resource is a term bank, enumerating the key concepts in a
domain. We have developed an unsupervised learning approach that leverages a
term bank to guide a QA system, by representing the terminological knowledge
with thousands of specialized vector spaces. In experiments with complex
science questions, we show that this approach significantly outperforms several
state-of-the-art QA systems, demonstrating that significant leverage can be
gained from continuous vector representations of domain terminology.

Unsupervised Spatio-Temporal Embeddings for User and Location Modelling

Jing Yang, Carsten Eickhoff
Subjects: Information Retrieval (cs.IR)

Many social network applications depend on robust representations of
spatio-temporal data. In this work, we present an embedding model based on
feed-forward neural networks which transforms social media check-ins into dense
feature vectors encoding geographic, temporal, and functional aspects for
modelling places, neighborhoods, and users. On the basis of this model, we
further propose a Spatio-Temporal Embedding Similarity algorithm (STES) for
location recommendation.

In a range of experiments on real life data collected from Foursquare,
Twitter, and Gowalla, we demonstrate our model’s effectiveness at
characterizing places and people, resulting in a significantly increased
recommendation and classification performance as compared to the state of the
art. Finally, we select eight major cities around the globe and verify the
robustness and generality of our model by porting pre-trained models from one
city to another, and thereby alleviating the need for costly local training.

Unsupervised part learning for visual recognition

Ronan Sicre, Yannis Avrithis, Ewa Kijak, Frederic Jurie
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

Part-based image classification aims at representing categories by small sets
of learned discriminative parts, upon which an image representation is built.
Considered as a promising avenue a decade ago, this direction has been
neglected since the advent of deep neural networks. In this context, this paper
brings two contributions: first, it shows that despite the recent success of
end-to-end holistic models, explicit part learning can boosts classification
performance. Second, this work proceeds one step further than recent part-based
models (PBM), focusing on how to learn parts without using any labeled data.
Instead of learning a set of parts per class, as generally done in the PBM
literature, the proposed approach both constructs a partition of a given set of
images into visually similar groups, and subsequently learn a set of
discriminative parts per group in a fully unsupervised fashion. This strategy
opens the door to the use of PBM in new applications for which the notion of
image categories is irrelevant, such as instance-based image retrieval, for
example. We experimentally show that our learned parts can help building
efficient image representations, for classification as well as for indexing
tasks, resulting in performance superior to holistic state-of-the art Deep
Convolutional Neural Networks (DCNN) encoding.

Computation and Language

Trainable Referring Expression Generation using Overspecification Preferences

Thiago castro Ferreira, Ivandre Paraboni
Comments: 8 pages
Subjects: Computation and Language (cs.CL)

Referring expression generation (REG) models that use speaker-dependent
information require a considerable amount of training data produced by every
individual speaker, or may otherwise perform poorly. In this work we present a
simple REG experiment that allows the use of larger training data sets by
grouping speakers according to their overspecification preferences. Intrinsic
evaluation shows that this method generally outperforms the personalised method
found in previous work.

Representation Stability as a Regularizer for Improved Text Analytics Transfer Learning

Matthew Riemer, Elham Khabiri, Richard Goodwin
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Although neural networks are well suited for sequential transfer learning
tasks, the catastrophic forgetting problem hinders proper integration of prior
knowledge. In this work, we propose a solution to this problem by using a
multi-task objective based on the idea of distillation and a mechanism that
directly penalizes forgetting at the shared representation layer during the
knowledge integration phase of training. We demonstrate our approach on a
Twitter domain sentiment analysis task with sequential knowledge transfer from
four related tasks. We show that our technique outperforms networks fine-tuned
to the target task. Additionally, we show both through empirical evidence and
examples that it does not forget useful knowledge from the source task that is
forgotten during standard fine-tuning. Surprisingly, we find that first
distilling a human made rule based sentiment engine into a recurrent neural
network and then integrating the knowledge with the target task data leads to a
substantial gain in generalization performance. Our experiments demonstrate the
power of multi-source transfer techniques in practical text analytics problems
when paired with distillation. In particular, for the SemEval 2016 Task 4
Subtask A (Nakov et al., 2016) dataset we surpass the state of the art
established during the competition with a comparatively simple model
architecture that is not even competitive when trained on only the labeled task
specific data.

ConceptNet at SemEval-2017 Task 2: Extending Word Embeddings with Multilingual Relational Knowledge

Robert Speer, Joanna Lowry-Duda
Comments: 5 pages, accepted to the SemEval workshop at ACL 2017
Subjects: Computation and Language (cs.CL)

This paper describes Luminoso’s participation in SemEval 2017 Task 2,
“Multilingual and Cross-lingual Semantic Word Similarity”, with a system based
on ConceptNet. ConceptNet is an open, multilingual knowledge graph that focuses
on general knowledge that relates the meanings of words and phrases. Our
submission to SemEval was an update of previous work that builds high-quality,
multilingual word embeddings from a combination of ConceptNet and
distributional semantics. Our system took first place in both subtasks. It
ranked first in 4 out of 5 of the separate languages, and also ranked first in
all 10 of the cross-lingual language pairs.

What do Neural Machine Translation Models Learn about Morphology?

Yonatan Belinkov, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, James Glass
Comments: Accepted to ACL 2017
Subjects: Computation and Language (cs.CL)

Neural machine translation (MT) models obtain state-of-the-art performance
while maintaining a simple, end-to-end architecture. However, little is known
about what these models learn about source and target languages during the
training process. In this work, we analyze the representations learned by
neural MT models at various levels of granularity and empirically evaluate the
quality of the representations for learning morphology through extrinsic
part-of-speech and morphological tagging tasks. We conduct a thorough
investigation along several parameters: word-based vs. character-based
representations, depth of the encoding layer, the identity of the target
language, and encoder vs. decoder representations. Our data-driven,
quantitative evaluation sheds light on important aspects in the neural MT
system and its ability to capture word structure.

A Neural Parametric Singing Synthesizer

Merlijn Blaauw, Jordi Bonada
Comments: Submitted to Interspeech 2017
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Learning (cs.LG)

We present a new model for singing synthesis based on a modified version of
the WaveNet architecture. Instead of modeling raw waveform, we model features
produced by a parametric vocoder that separates the influence of pitch and
timbre. This allows conveniently modifying pitch to match any target melody,
facilitates training on more modest dataset sizes, and significantly reduces
training and generation times. Our model makes frame-wise predictions using
mixture density outputs rather than categorical outputs in order to reduce the
required parameter count. As we found overfitting to be an issue with the
relatively small datasets used in our experiments, we propose a method to
regularize the model and make the autoregressive generation process more robust
to prediction errors. Using a simple multi-stream architecture, harmonic,
aperiodic and voiced/unvoiced components can all be predicted in a coherent
manner. We compare our method to existing parametric statistical and
state-of-the-art concatenative methods using quantitative metrics and a
listening test. While naive implementations of the autoregressive generation
algorithm tend to be inefficient, using a smart algorithm we can greatly speed
up the process and obtain a system that’s competitive in both speed and
quality.

Real-time On-Demand Crowd-powered Entity Extraction

Output-agreement mechanisms such as ESP Game have been widely used in human
computation to obtain reliable human-generated labels. In this paper, we argue
that a “time-limited” output-agreement mechanism can be used to create a fast
and robust crowd-powered component in interactive systems, particularly
dialogue systems, to extract key information from user utterances on the fly.
Our experiments on Amazon Mechanical Turk using the Airline Travel Information
System (ATIS) dataset showed that the proposed approach achieves high-quality
results with an average response time shorter than 9 seconds.

Leveraging Term Banks for Answering Complex Questions: A Case for Sparse Vectors

Peter D. Turney
Comments: Related datasets can be found at this http URL
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Learning (cs.LG)

While open-domain question answering (QA) systems have proven effective for
answering simple questions, they struggle with more complex questions. Our goal
is to answer more complex questions reliably, without incurring a significant
cost in knowledge resource construction to support the QA. One readily
available knowledge resource is a term bank, enumerating the key concepts in a
domain. We have developed an unsupervised learning approach that leverages a
term bank to guide a QA system, by representing the terminological knowledge
with thousands of specialized vector spaces. In experiments with complex
science questions, we show that this approach significantly outperforms several
state-of-the-art QA systems, demonstrating that significant leverage can be
gained from continuous vector representations of domain terminology.

Unsupervised Event Abstraction using Pattern Abstraction and Local Process Models

Felix Mannhardt, Niek Tax
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Process mining analyzes business processes based on events stored in event
logs. However, some recorded events may correspond to activities on a very low
level of abstraction. When events are recorded on a too low level of
granularity, process discovery methods tend to generate overgeneralizing
process models. Grouping low-level events to higher level activities, i.e.,
event abstraction, can be used to discover better process models. Existing
event abstraction methods are mainly based on common sub-sequences and
clustering techniques. In this paper, we propose to first discover local
process models and then use those models to lift the event log to a higher
level of abstraction. Our conjecture is that process models discovered on the
obtained high-level event log return process models of higher quality: their
fitness and precision scores are more balanced. We show this with preliminary
results on several real-life event logs.

Distributed, Parallel, and Cluster Computing

Parallelized Kendall's Tau Coefficient Computation via SIMD Vectorized Sorting On Many-Integrated-Core Processors

Pairwise association measure is an important operation in data analytics.
Kendall’s tau coefficient is one widely used correlation coefficient
identifying non-linear relationships between ordinal variables. In this paper,
we investigated a parallel algorithm accelerating all-pairs Kendall’s tau
coefficient computation via single instruction multiple data (SIMD) vectorized
sorting on Intel Xeon Phis by taking advantage of many processing cores and
512-bit SIMD vector instructions. To facilitate workload balancing and overcome
on-chip memory limitation, we proposed a generic framework for symmetric
all-pairs computation by building provable bijective functions between job
identifier and coordinate space. Performance evaluation demonstrated that our
algorithm on one 5110P Phi achieves two orders-of-magnitude speedups over
16-threaded MATLAB and three orders-of-magnitude speedups over sequential R,
both running on high-end CPUs. Besides, our algorithm exhibited rather good
distributed computing scalability with respect to number of Phis. Source code
and datasets are publicly available at this http URL

NG2C: Pretenuring N-Generational GC for HotSpot Big Data Applications

Rodrigo Bruno, Luís Oliveira, Paulo Ferreira
Comments: Accepted at ISMM’17
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Big Data applications suffer from unpredictable and unacceptably high pause
times due to Garbage Collection (GC). This is the case in latency-sensitive
applications such as on-line credit-card fraud detection, graph-based computing
for analysis on social networks, etc. Such pauses compromise latency
requirements of the whole application stack and result from applications’
aggressive buffering/caching of data, exposing an ill-suited GC design, which
assumes that most objects will die young and does not consider that
applications hold large amounts of middle-lived data in memory.

To avoid such pauses, we propose NG2C, a new GC algorithm that combines
pretenuring with an N-Generational heap. By being able to allocate objects into
different generations, NG2C is able to group objects with similar lifetime
profiles in the same generation. By allocating objects with similar lifetime
profiles close to each other, i.e. in the same generation, we avoid object
promotion (copying between generations) and heap fragmentation (which leads to
heap compactions) both responsible for most of the duration of HotSpot GC pause
times.

NG2C is implemented for the OpenJDK 8 HotSpot Java Virtual Machine, as an
extension of the Garbage First GC. We evaluate NG2C using Cassandra, Lucene,
and GraphChi with three different GCs: Garbage First (G1), Concurrent Mark
Sweep (CMS), and NG2C. Results show that NG2C decreases the worst observable GC
pause time by up to 94.8% for Cassandra, 85.0% for Lucene and 96.45% for
GraphChi, when compared to current collectors (G1 and CMS). In addition, NG2C
has no negative impact on application throughput or memory usage.

Optimal Repair Layering for Erasure-Coded Data Centers: From Theory to Practice

Yuchong Hu, Xiaolu Li, Mi Zhang, Patrick P. C. Lee, Xiaoyang Zhang, Pan Zhou, Dan Feng
Comments: 23 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Repair performance in hierarchical data centers is often bottlenecked by
cross-rack network transfer. Recent theoretical results show that the
cross-rack repair bandwidth can be minimized through repair layering, whose
idea is to partition a repair operation into inner-rack and cross-rack layers.
However, how repair layering should be implemented and deployed in practice
remains an open issue. In this paper, we address this issue by proposing a
practical repair layering framework called DoubleR. We design two families of
practical double regenerating codes (DRC), which not only minimize the
cross-rack repair bandwidth, but also have several practical properties that
improve state-of-the-art regenerating codes. We implement and deploy DoubleR
atop Hadoop Distributed File System (HDFS), and show that DoubleR maintains the
theoretical guarantees of DRC and improves the repair performance of
regenerating codes in both node recovery and degraded read operations.

A Component-Based Dual Decomposition Method for the OPF Problem

Sleiman Mhanna, Gregor Verbic, Archie Chapman
Comments: 8 pages + 4 pages of examples. Submitted to IEEE Transactions on Power Systems
Journal-ref: IEEE Transactions on Power Systems, 2017
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)

This paper proposes a component-based dual decomposition of the optimal power
flow (OPF) problem, where the modified dual function is solved in a distributed
fashion. This paper is the first to conduct extensive numerical analysis
resulting in the identification and tabulation of the algorithmic parameter
settings that are crucial for the convergence of the method on a vast array of
test instances. Moreover, this work provides a deeper insight into the geometry
of the modified Lagrange dual function of the OPF problem and highlights the
conditions that make this function differentiable. Despite the inherent
nonconvexity of the AC OPF problem, this numerical proof of convergence coupled
with the scalability and the privacy preserving nature of the proposed method
brings component-based distributed OPF several steps closer to reality.

Toward a Distributed Knowledge Discovery system for Grid systems

Nhien-An Le-Khac, Lamine Aouad, M-Tahar Kechadi
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

During the last decade or so, we have had a deluge of data from not only
science fields but also industry and commerce fields. Although the amount of
data available to us is constantly increasing, our ability to process it
becomes more and more difficult. Efficient discovery of useful knowledge from
these datasets is therefore becoming a challenge and a massive economic need.
This led to the need of developing large-scale data mining (DM) techniques to
deal with these huge datasets either from science or economic applications. In
this chapter, we present a new DDM system combining dataset-driven and
architecture-driven strategies. Data-driven strategies will consider the size
and heterogeneity of the data, while architecture driven will focus on the
distribution of the datasets. This system is based on a Grid middleware tools
that integrate appropriate large data manipulation operations. Therefore, this
allows more dynamicity and autonomicity during the mining, integrating and
processing phases

Feature Selection Parallel Technique for Remotely Sensed Imagery Classification

Nhien-An Le-Khac, M-Tahar Kechadi, Bo Wu, C. Chen
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Remote sensing research focusing on feature selection has long attracted the
attention of the remote sensing community because feature selection is a
prerequisite for image processing and various applications. Different feature
selection methods have been proposed to improve the classification accuracy.
They vary from basic search techniques to clonal selections, and various
optimal criteria have been investigated. Recently, methods using
dependence-based measures have attracted much attention due to their ability to
deal with very high dimensional datasets. However, these methods are based on
Cramers V test, which has performance issues with large datasets. In this
paper, we propose a parallel approach to improve their performance. We evaluate
our approach on hyper-spectral and high spatial resolution images and compare
it to the proposed methods with a centralized version as preliminary results.
The results are very promising.

Toward a new approach for massive LiDAR data processing

V-H Cao, K-X Chu, Nhien-An Le-Khac, M-T Kechadi, Debra F. Laefer, Linh Truong-Hong
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Laser scanning (also known as Light Detection And Ranging) has been widely
applied in various application. As part of that, aerial laser scanning (ALS)
has been used to collect topographic data points for a large area, which
triggers to million points to be acquired. Furthermore, today, with integrating
full wareform (FWF) technology during ALS data acquisition, all return
information of laser pulse is stored. Thus, ALS data are to be massive and
complexity since the FWF of each laser pulse can be stored up to 256 samples
and density of ALS data is also increasing significantly. Processing LiDAR data
demands heavy operations and the traditional approaches require significant
hardware and running time. On the other hand, researchers have recently
proposed parallel approaches for analysing LiDAR data. These approaches are
normally based on parallel architecture of target systems such as multi-core
processors, GPU, etc. However, there is still missing efficient
approaches/tools supporting the analysis of LiDAR data due to the lack of a
deep study on both library tools and algorithms used in processing this data.
In this paper, we present a comparative study of software libraries and
algorithms to optimise the processing of LiDAR data. We also propose new method
to improve this process with experiments on large LiDAR data. Finally, we
discuss on a parallel solution of our approach where we integrate parallel
computing in processing LiDAR data.

Learning

Determining Song Similarity via Machine Learning Techniques and Tagging Information

Renato L. F. Cunha, Evandro Caldeira, Luciana Fujii
Comments: 6 pages, 2 figures
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

The task of determining item similarity is a crucial one in a recommender
system. This constitutes the base upon which the recommender system will work
to determine which items are more likely to be enjoyed by a user, resulting in
more user engagement. In this paper we tackle the problem of determining song
similarity based solely on song metadata (such as the performer, and song
title) and on tags contributed by users. We evaluate our approach under a
series of different machine learning algorithms. We conclude that tf-idf
achieves better results than Word2Vec to model the dataset to feature vectors.
We also conclude that k-NN models have better performance than SVMs and Linear
Regression for this problem.

MAGAN: Margin Adaptation for Generative Adversarial Networks

Ruohan Wang, Antoine Cully, Hyung Jin Chang, Yiannis Demiris
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

We propose a novel training procedure for Generative Adversarial Networks
(GANs) to improve stability and performance by using an adaptive hinge loss
objective function. We estimate the appropriate hinge loss margin with the
expected energy of the target distribution, and derive both a principled
criterion for updating the margin and an approximate convergence measure. The
resulting training procedure is simple yet robust on a diverse set of datasets.
We evaluate the proposed training procedure on the task of unsupervised image
generation, noting both qualitative and quantitative performance improvements.

Enabling Embedded Inference Engine with ARM Compute Library

Dawei Sun, Shaoshan Liu, Jean-Luc Gaudiot
Comments: 2 pages, 4 figures
Subjects: Learning (cs.LG)

When you need to enable deep learning on low-cost embedded SoCs, is it better
to port an existing deep learning framework or should you build one from
scratch? In this paper, we share our practical experiences of building an
embedded inference engine using ARM Compute Library (ACL). The results show
that, contradictory to conventional wisdoms, for simple models, it takes much
less development time to build an inference engine from scratch compared to
porting existing frameworks. In addition, by utilizing ACL, we managed to build
an inference engine that outperforms TensorFlow by 25%. Our conclusion is that,
on embedded devices, we most likely will use very simple deep learning models
for inference, and with well-developed building blocks such as ACL, it may be
better in both performance and development time to build the engine from
scratch.

Deep Extreme Multi-label Learning

Wenjie Zhang, Liwei Wang, Junchi Yan, Xiangfeng Wang, Hongyuan Zha
Comments: 7 pages, 7 figures
Subjects: Learning (cs.LG)

Extreme multi-label learning or classification has been a practical and
important problem since the boom of big data. The main challenge lies in the
exponential label space which involves 2L possible label sets when the label
dimension L is very large e.g. in millions for Wikipedia labels. This paper is
motivated to better explore the label space by build- ing and modeling an
explicit label graph. In the meanwhile, deep learning has been widely studied
and used in various classification problems includ- ing multi-label
classification, however it has not been sufficiently studied in this extreme
but practi- cal case, where the label space can be as large as in millions. In
this paper, we propose a practical deep embedding method for extreme
multi-label classifi- cation. Our method harvests the ideas of non-linear
embedding and modeling label space with graph priors at the same time.
Extensive experiments on public datasets for XML show that our method per- form
competitively against state-of-the-art result.

Sampling-based speech parameter generation using moment-matching networks

Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari
Comments: Submitted to INTERSPEECH 2017
Subjects: Learning (cs.LG)

This paper presents sampling-based speech parameter generation using
moment-matching networks for Deep Neural Network (DNN)-based speech synthesis.
Although people never produce exactly the same speech even if we try to express
the same linguistic and para-linguistic information, typical statistical speech
synthesis produces completely the same speech, i.e., there is no
inter-utterance variation in synthetic speech. To give synthetic speech natural
inter-utterance variation, this paper builds DNN acoustic models that make it
possible to randomly sample speech parameters. The DNNs are trained so that
they make the moments of generated speech parameters close to those of natural
speech parameters. Since the variation of speech parameters is compressed into
a low-dimensional simple prior noise vector, our algorithm has lower
computation cost than direct sampling of speech parameters. As the first step
towards generating synthetic speech that has natural inter-utterance variation,
this paper investigates whether or not the proposed sampling-based generation
deteriorates synthetic speech quality. In evaluation, we compare speech quality
of conventional maximum likelihood-based generation and proposed sampling-based
generation. The result demonstrates the proposed generation causes no
degradation in speech quality.

Active classification with comparison queries

Daniel M. Kane, Shachar Lovett, Shay Moran, Jiapeng Zhang
Comments: 23 pages (not including references), 1 figure
Subjects: Learning (cs.LG); Computational Geometry (cs.CG)

We study an extension of active learning in which the learning algorithm may
ask the annotator to compare the distances of two examples from the boundary of
their label-class. For example, in a recommendation system application (say for
restaurants), the annotator may be asked whether she liked or disliked a
specific restaurant (a label query); or which one of two restaurants did she
like more (a comparison query).

We focus on the class of half spaces, and show that under natural
assumptions, such as large margin or bounded bit-description of the input
examples, it is possible to reveal all the labels of a sample of size (n) using
approximately (O(log n)) queries. This implies an exponential improvement over
classical active learning, where only label queries are allowed. We complement
these results by showing that if any of these assumptions is removed then, in
the worst case, (Omega(n)) queries are required.

Our results follow from a new general framework of active learning with
additional queries. We identify a combinatorial dimension, called the
emph{inference dimension}, that captures the query complexity when each
additional query is determined by (O(1)) examples (such as comparison queries,
each of which is determined by the two compared examples). Our results for half
spaces follow by bounding the inference dimension in the cases discussed above.

Robustly Learning a Gaussian: Getting Optimal Error, Efficiently

Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, Alistair Stewart
Subjects: Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)

We study the fundamental problem of learning the parameters of a
high-dimensional Gaussian in the presence of noise — where an
(varepsilon)-fraction of our samples were chosen by an adversary. We give
robust estimators that achieve estimation error (O(varepsilon)) in the total
variation distance, which is optimal up to a universal constant that is
independent of the dimension.

In the case where just the mean is unknown, our robustness guarantee is
optimal up to a factor of (sqrt{2}) and the running time is polynomial in (d)
and (1/epsilon). When both the mean and covariance are unknown, the running
time is polynomial in (d) and quasipolynomial in (1/varepsilon). Moreover all
of our algorithms require only a polynomial number of samples. Our work shows
that the same sorts of error guarantees that were established over fifty years
ago in the one-dimensional setting can also be achieved by efficient algorithms
in high-dimensional settings.

Semantic3D.net: A new Large-scale Point Cloud Classification Benchmark

This paper presents a new 3D point cloud classification benchmark data set
with over four billion manually labelled points, meant as input for data-hungry
(deep) learning methods. We also discuss first submissions to the benchmark
that use deep convolutional neural networks (CNNs) as a work horse, which
already show remarkable performance improvements over state-of-the-art. CNNs
have become the de-facto standard for many tasks in computer vision and machine
learning like semantic segmentation or object detection in images, but have no
yet led to a true breakthrough for 3D point cloud labelling tasks due to lack
of training data. With the massive data set presented in this paper, we aim at
closing this data gap to help unleash the full potential of deep learning
methods for 3D labelling tasks. Our semantic3D.net data set consists of dense
point clouds acquired with static terrestrial laser scanners. It contains 8
semantic classes and covers a wide range of urban outdoor scenes: churches,
streets, railroad tracks, squares, villages, soccer fields and castles. We
describe our labelling interface and show that our data set provides more dense
and complete point clouds with much higher overall number of labelled points
compared to those already available to the research community. We further
provide baseline method descriptions and comparison between methods submitted
to our online system. We hope semantic3D.net will pave the way for deep
learning methods in 3D point cloud labelling to learn richer, more general 3D
representations, and first submissions after only a few months indicate that
this might indeed be the case.

Deep Neural Network Based Precursor microRNA Prediction on Eleven Species

Jaya Thomas, Lee Sael
Comments: 6 pages, 2 figures, extended from BigComp2017 short paper
Subjects: Quantitative Methods (q-bio.QM); Learning (cs.LG)

MicroRNA (miRNA) are small non-coding RNAs that regulates the gene expression
at the post-transcriptional level. Determining whether a sequence segment is
miRNA is experimentally challenging. Also, experimental results are sensitive
to the experimental environment. These limitations inspire the development of
computational methods for predicting the miRNAs. We propose a deep learning
based classification model, called DP-miRNA, for predicting precursor miRNA
sequence that contains the miRNA sequence. The feature set based Restricted
Boltzmann Machine method, which we call DP-miRNA, uses 58 features that are
categorized into four groups: sequence features, folding measures, stem-loop
features and statistical feature. We evaluate the performance of the DP-miRNA
on eleven twelve data sets of varying species, including the human. The deep
neural network based classification outperformed support vector machine, neural
network, naive Baye’s classifiers, k-nearest neighbors, random forests, and a
hybrid system combining support vector machine and genetic algorithm.

A Neural Parametric Singing Synthesizer

Merlijn Blaauw, Jordi Bonada
Comments: Submitted to Interspeech 2017
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Learning (cs.LG)

We present a new model for singing synthesis based on a modified version of
the WaveNet architecture. Instead of modeling raw waveform, we model features
produced by a parametric vocoder that separates the influence of pitch and
timbre. This allows conveniently modifying pitch to match any target melody,
facilitates training on more modest dataset sizes, and significantly reduces
training and generation times. Our model makes frame-wise predictions using
mixture density outputs rather than categorical outputs in order to reduce the
required parameter count. As we found overfitting to be an issue with the
relatively small datasets used in our experiments, we propose a method to
regularize the model and make the autoregressive generation process more robust
to prediction errors. Using a simple multi-stream architecture, harmonic,
aperiodic and voiced/unvoiced components can all be predicted in a coherent
manner. We compare our method to existing parametric statistical and
state-of-the-art concatenative methods using quantitative metrics and a
listening test. While naive implementations of the autoregressive generation
algorithm tend to be inefficient, using a smart algorithm we can greatly speed
up the process and obtain a system that’s competitive in both speed and
quality.

A Proof of Orthogonal Double Machine Learning with (Z)-Estimators

Vasilis Syrgkanis
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Statistics Theory (math.ST)

We consider two stage estimation with a non-parametric first stage and a
generalized method of moments second stage, in a simpler setting than
(Chernozhukov et al. 2016). We give an alternative proof of the theorem given
in (Chernozhukov et al. 2016) that orthogonal second stage moments, sample
splitting and (n^{1/4})-consistency of the first stage, imply
(sqrt{n})-consistency and asymptotic normality of second stage estimates. Our
proof is for a variant of their estimator, which is based on the empirical
version of the moment condition (Z-estimator), rather than a minimization of a
norm of the empirical vector of moments (M-estimator). This note is meant
primarily for expository purposes, rather than as a new technical contribution.

Deep-FExt: Deep Feature Extraction for Vessel Segmentation and Centerline Prediction

Giles Tetteh, Markus Rempfler, Bjoern H. Menze, Claus Zimmer
Comments: 9 pages
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Feature extraction is a very crucial task in image and pixel (voxel)
classification and regression in biomedical image modeling. In this work we
present a machine learning based feature extraction scheme based on inception
models for pixel classification tasks. We extract features under multi-scale
and multi-layer schemes through convolutional operators. Layers of Fully
Convolutional Network are later stacked on this feature extraction layers and
trained end-to-end for the purpose of classification. We test our model on the
DRIVE and STARE public data sets for the purpose of segmentation and centerline
detection and it out performs most existing hand crafted or deterministic
feature schemes found in literature. We achieve an average maximum Dice of 0.85
on the DRIVE data set which out performs the scores from the second human
annotator of this data set. We also achieve an average maximum Dice of 0.85 and
kappa of 0.84 on the STARE data set. Though these datasets are mainly 2-D we
also propose ways of extending this feature extraction scheme to handle 3-D
datasets.

Learning from Demonstrations for Real World Reinforcement Learning

Deep reinforcement learning (RL) has achieved several high profile successes
in difficult control problems. However, these algorithms typically require a
huge amount of data before they reach reasonable performance. In fact, their
performance during learning can be extremely poor. This may be acceptable for a
simulator, but it severely limits the applicability of deep RL to many
real-world tasks, where the agent must learn in the real environment. In this
paper we study a setting where the agent may access data from previous control
of the system. We present an algorithm, Deep Q-learning from Demonstrations
(DQfD), that leverages this data to massively accelerate the learning process
even from relatively small amounts of demonstration data. DQfD works by
combining temporal difference updates with large-margin classification of the
demonstrator’s actions. We show that DQfD has better initial performance than
Deep Q-Networks (DQN) on 40 of 42 Atari games and it receives more average
rewards than DQN on 27 of 42 Atari games. We also demonstrate that DQfD learns
faster than DQN even when given poor demonstration data.

Investigation on the use of Hidden-Markov Models in automatic transcription of music

D. Cazau, G. Nuel
Comments: arXiv admin note: text overlap with arXiv:1703.09772
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Sound (cs.SD)

Hidden Markov Models (HMMs) are a ubiquitous tool to model time series data,
and have been widely used in two main tasks of Automatic Music Transcription
(AMT): note segmentation, i.e. identifying the played notes after a multi-pitch
estimation, and sequential post-processing, i.e. correcting note segmentation
using training data. In this paper, we employ the multi-pitch estimation method
called Probabilistic Latent Component Analysis (PLCA), and develop AMT systems
by integrating different HMM-based modules in this framework. For note
segmentation, we use two different twostate on/o? HMMs, including a
higher-order one for duration modeling. For sequential post-processing, we
focused on a musicological modeling of polyphonic harmonic transitions, using a
first- and second-order HMMs whose states are defined through candidate note
mixtures. These different PLCA plus HMM systems have been evaluated
comparatively on two different instrument repertoires, namely the piano (using
the MAPS database) and the marovany zither. Our results show that the use of
HMMs could bring noticeable improvements to transcription results, depending on
the instrument repertoire.

Energy Propagation in Deep Convolutional Neural Networks

Thomas Wiatowski, Philipp Grohs, Helmut Bölcskei
Comments: IEEE Transactions on Information Theory, Apr. 2017, submitted
Subjects: Information Theory (cs.IT); Learning (cs.LG); Functional Analysis (math.FA); Machine Learning (stat.ML)

Many practical machine learning tasks employ very deep convolutional neural
networks. Such large depths pose formidable computational challenges in
training and operating the network. It is therefore important to understand how
many layers are actually needed to have most of the input signal’s features be
contained in the feature vector generated by the network. This question can be
formalized by asking how quickly the energy contained in the feature maps
decays across layers. In addition, it is desirable that none of the input
signal’s features be “lost” in the feature extraction network or, more
formally, we want energy conservation in the sense of the energy contained in
the feature vector being proportional to that of the corresponding input
signal. This paper establishes conditions for energy conservation for a wide
class of deep convolutional neural networks and characterizes corresponding
feature map energy decay rates. Specifically, we consider general scattering
networks, and find that under mild analyticity and high-pass conditions on the
filters (which encompass, inter alia, various constructions of Weyl-Heisenberg
filters, wavelets, ridgelets, ((alpha))-curvelets, and shearlets) the feature
map energy decays at least polynomially fast. For broad families of wavelets
and Weyl-Heisenberg filters, the guaranteed decay rate is shown to be
exponential. Our results yield handy estimates of the number of layers needed
to have at least (((1-varepsilon)cdot 100)\%) of the input signal energy be
contained in the feature vector.

Representation Stability as a Regularizer for Improved Text Analytics Transfer Learning

Matthew Riemer, Elham Khabiri, Richard Goodwin
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Although neural networks are well suited for sequential transfer learning
tasks, the catastrophic forgetting problem hinders proper integration of prior
knowledge. In this work, we propose a solution to this problem by using a
multi-task objective based on the idea of distillation and a mechanism that
directly penalizes forgetting at the shared representation layer during the
knowledge integration phase of training. We demonstrate our approach on a
Twitter domain sentiment analysis task with sequential knowledge transfer from
four related tasks. We show that our technique outperforms networks fine-tuned
to the target task. Additionally, we show both through empirical evidence and
examples that it does not forget useful knowledge from the source task that is
forgotten during standard fine-tuning. Surprisingly, we find that first
distilling a human made rule based sentiment engine into a recurrent neural
network and then integrating the knowledge with the target task data leads to a
substantial gain in generalization performance. Our experiments demonstrate the
power of multi-source transfer techniques in practical text analytics problems
when paired with distillation. In particular, for the SemEval 2016 Task 4
Subtask A (Nakov et al., 2016) dataset we surpass the state of the art
established during the competition with a comparatively simple model
architecture that is not even competitive when trained on only the labeled task
specific data.

Leveraging Term Banks for Answering Complex Questions: A Case for Sparse Vectors

Peter D. Turney
Comments: Related datasets can be found at this http URL
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Learning (cs.LG)

While open-domain question answering (QA) systems have proven effective for
answering simple questions, they struggle with more complex questions. Our goal
is to answer more complex questions reliably, without incurring a significant
cost in knowledge resource construction to support the QA. One readily
available knowledge resource is a term bank, enumerating the key concepts in a
domain. We have developed an unsupervised learning approach that leverages a
term bank to guide a QA system, by representing the terminological knowledge
with thousands of specialized vector spaces. In experiments with complex
science questions, we show that this approach significantly outperforms several
state-of-the-art QA systems, demonstrating that significant leverage can be
gained from continuous vector representations of domain terminology.

A Neural Representation of Sketch Drawings

David Ha, Douglas Eck
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)

We present sketch-rnn, a recurrent neural network (RNN) able to construct
stroke-based drawings of common objects. The model is trained on thousands of
crude human-drawn images representing hundreds of classes. We outline a
framework for conditional and unconditional sketch generation, and describe new
robust training methods for generating coherent sketch drawings in a vector
format.

Personalized Survival Predictions for Cardiac Transplantation via Trees of Predictors

J. Yoon, W. R. Zame, A. Banerjee, M. Cadeiras, A. M. Alaa, M. van der Schaar
Comments: Main manuscript: 20 pages, Supplementary materials: 13 pages, 5 figures, 3 tables. Submitted to Science Translational Medicine
Subjects: Applications (stat.AP); Learning (cs.LG)

Given the limited pool of donor organs, accurate predictions of survival on
the wait list and post transplantation are crucial for cardiac transplantation
decisions and policy. However, current clinical risk scores do not yield
accurate predictions. We develop a new methodology (ToPs, Trees of Predictors)
built on the principle that specific predictors should be used for specific
clusters within the target population. ToPs discovers these specific clusters
of patients and the specific predictor that perform best for each cluster. In
comparison with current clinical risk scoring systems, our method provides
significant improvements in the prediction of survival time on the wait list
and post transplantation. For example, in terms of 3 month survival for
patients who were on the US patient wait list in the period 1985 to 2015, our
method achieves AUC of 0.847, the best commonly used clinical risk score
(MAGGIC) achieves 0.630. In terms of 3 month survival/mortality predictions (in
comparison to MAGGIC), holding specificity at 80.0 percents, our algorithm
correctly predicts survival for 1,228 (26.0 percents more patients out of 4,723
who actually survived, holding sensitivity at 80.0 percents, our algorithm
correctly predicts mortality for 839 (33.0 percents) more patients out of 2,542
who did not survive. Our method achieves similar improvements for other time
horizons and for predictions post transplantation. Therefore, we offer a more
accurate, personalized approach to survival analysis that can benefit patients,
clinicians and policymakers in making clinical decisions and setting clinical
policy. Because risk prediction is widely used in diagnostic and prognostic
clinical decision making across diseases and clinical specialties, the
implications of our methods are far reaching.

Information Theory

From ds-bounds for cyclic codes to true distance for abelian codes

J. J. Bernal, M. Guerreiro, J. J. Simón
Comments: arXiv admin note: text overlap with arXiv:1604.02949
Subjects: Information Theory (cs.IT)

In this paper we develop a technique to extend any bound for the minimum
distance of cyclic codes constructed from its defining sets (ds-bounds) to
abelian (or multivariate) codes through the notion of (mathbb{B})-apparent
distance. We use this technique to improve the searching for new bounds for the
minimum distance of abelian codes. We also study conditions for an abelian code
to verify that its (mathbb{B})-apparent distance reaches its (true) minimum
distance. Then we construct some tables of such codes as an application

Energy Propagation in Deep Convolutional Neural Networks

Many practical machine learning tasks employ very deep convolutional neural
networks. Such large depths pose formidable computational challenges in
training and operating the network. It is therefore important to understand how
many layers are actually needed to have most of the input signal’s features be
contained in the feature vector generated by the network. This question can be
formalized by asking how quickly the energy contained in the feature maps
decays across layers. In addition, it is desirable that none of the input
signal’s features be “lost” in the feature extraction network or, more
formally, we want energy conservation in the sense of the energy contained in
the feature vector being proportional to that of the corresponding input
signal. This paper establishes conditions for energy conservation for a wide
class of deep convolutional neural networks and characterizes corresponding
feature map energy decay rates. Specifically, we consider general scattering
networks, and find that under mild analyticity and high-pass conditions on the
filters (which encompass, inter alia, various constructions of Weyl-Heisenberg
filters, wavelets, ridgelets, ((alpha))-curvelets, and shearlets) the feature
map energy decays at least polynomially fast. For broad families of wavelets
and Weyl-Heisenberg filters, the guaranteed decay rate is shown to be
exponential. Our results yield handy estimates of the number of layers needed
to have at least (((1-varepsilon)cdot 100)\%) of the input signal energy be
contained in the feature vector.

Hybrid Beamforming via the Kronecker Decomposition for the Millimeter-Wave Massive MIMO Systems

Guangxu Zhu, Kaibin Huang, Vincent K. N. Lau, Bin Xia, Xiaofan Li, Sha Zhang
Comments: Submitted to IEEE JSAC Special Issue on Millimeter Wave Communications for Future Mobile Networks, minor revision
Subjects: Information Theory (cs.IT)

Despite its promising performance gain, the realization of mmWave massive
MIMO still faces several practical challenges. In particular, implementing
massive MIMO in the digital domain requires hundreds of RF chains matching the
number of antennas. Furthermore, designing these components to operate at the
mmWave frequencies is challenging and costly. These motivated the recent
development of hybrid-beamforming where MIMO processing is divided for separate
implementation in the analog and digital domains, called the analog and digital
beamforming, respectively. Analog beamforming using a phase array introduces
uni-modulus constraints on the beamforming coefficients, rendering the
conventional MIMO techniques unsuitable and call for new designs. In this
paper, we present a systematic design framework for hybrid beamforming for
multi-cell multiuser massive MIMO systems over mmWave channels characterized by
sparse propagation paths. The framework relies on the decomposition of analog
beamforming vectors and path observation vectors into Kronecker products of
factors being uni-modulus vectors. Exploiting properties of Kronecker mixed
products, different factors of the analog beamformer are designed for either
nulling interference paths or coherently combining data paths. Furthermore, a
channel estimation scheme is designed for enabling the proposed hybrid
beamforming. The scheme estimates the AoA of data and interference paths by
analog beam scanning and data-path gains by analog beam steering. The
performance of the channel estimation scheme is analyzed. In particular, the
AoA spectrum resulting from beam scanning, which displays the magnitude
distribution of paths over the AoA range, is derived in closed-form. It is
shown that the inter-cell interference level diminishes inversely with the
array size, the square root of pilot sequence length and the spatial separation
between paths.

Privacy-Aware Guessing Efficiency

Shahab Asoodeh, Mario Diaz, Fady Alajaji, Tamás Linder
Comments: ISIT 2017
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Statistics Theory (math.ST)

We investigate the problem of guessing a discrete random variable (Y) under a
privacy constraint dictated by another correlated discrete random variable (X),
where both guessing efficiency and privacy are assessed in terms of the
probability of correct guessing. We define (h(P_{XY}, epsilon)) as the maximum
probability of correctly guessing (Y) given an auxiliary random variable (Z),
where the maximization is taken over all (P_{Z|Y}) ensuring that the
probability of correctly guessing (X) given (Z) does not exceed (epsilon). We
show that the map (epsilonmapsto h(P_{XY}, epsilon)) is strictly increasing,
concave, and piecewise linear, which allows us to derive a closed form
expression for (h(P_{XY}, epsilon)) when (X) and (Y) are connected via a
binary-input binary-output channel. For ((X^n, Y^n)) being pairs of independent
and identically distributed binary random vectors, we similarly define
(underline{h}_n(P_{X^nY^n}, epsilon)) under the assumption that (Z^n) is also
a binary vector. Then we obtain a closed form expression for
(underline{h}_n(P_{X^nY^n}, epsilon)) for sufficiently large, but nontrivial
values of (epsilon).

NOMA based Calibration for Large-Scale Spaceborne Antenna Arrays

Yujie Lin, Shuai Wang, Xiangyuan Bu, Chengwen Xing, Jianping An
Comments: 30 Pages, 8 Figures
Subjects: Information Theory (cs.IT)

In the parallel calibration for transmitting phased arrays, the calibration
receiver must separate the signals belonging to different antenna elements to
avoid mutual interference. Existing algorithms encode different antenna
elements’ radiation with orthogonal signature codes, but these algorithms are
far from desired for large-scale spaceborne antenna arrays. Considering the
strictly limited resources on satellites, to improve hardware efficiency of
large-scale spaceborne antenna arrays, in this paper based on non-orthogonal
multiple access (NOMA) we design a series of non-orthogonal signature codes for
different antenna elements by Cyclically Shifting an m-Sequence (CSmS) with
different offsets named as CSmS-NOMA signaling. This scheme can give an elegant
balance between performance and complexity and is very suitable for large-scale
spaceborne antenna arrays. It is shown that no matter how many antenna elements
are to be calibrated simultaneously, CSmS-NOMA signaling needs only one
calibrating waveform generator and one matched filter. Hence it is much more
efficient than the existing fully orthogonal schemes. To evaluate the
achievable calibration accuracy, a unified theoretical framework is developed
based on which the relationship between calibration accuracy and signal to
noise ratio (SNR) has been clearly revealed. Furthermore, a hardware experiment
platform is also built to access the theoretical work. For all the considered
scenarios, it can be concluded that the theoretical, simulated and experimental
results coincide with each other perfectly.

Bayesian Optimal Data Detector for mmWave OFDM System with Low-Resolution ADC

Hanqing Wang, Chao-Kai Wen, Shi Jin
Comments: 32 pages, 12 figures; accepted by IEEE JSAC special issue on millimeter wave communications for future mobile networks
Subjects: Information Theory (cs.IT)

Orthogonal frequency division multiplexing (OFDM) has been widely used in
communication systems operating in the millimeter wave (mmWave) band to combat
frequency-selective fading and achieve multi-Gbps transmissions, such as IEEE
802.15.3c and IEEE 802.11ad. For mmWave systems with ultra high sampling rate
requirements, the use of low-resolution analog-to-digital converters (ADCs)
(i.e., 1-3 bits) ensures an acceptable level of power consumption and system
costs. However, orthogonality among sub-channels in the OFDM system cannot be
maintained because of the severe non-linearity caused by low-resolution ADC,
which renders the design of data detector challenging. In this study, we
develop an efficient algorithm for optimal data detection in the mmWave OFDM
system with low-resolution ADCs. The analytical performance of the proposed
detector is derived and verified to achieve the fundamental limit of the
Bayesian optimal design. On the basis of the derived analytical expression, we
further propose a power allocation (PA) scheme that seeks to minimize the
average symbol error rate. In addition to the optimal data detector, we also
develop a feasible channel estimation method, which can provide high-quality
channel state information without significant pilot overhead. Simulation
results confirm the accuracy of our analysis and illustrate that the
performance of the proposed detector in conjunction with the proposed PA scheme
is close to the optimal performance of the OFDM system with infinite-resolution
ADC.

On Codes over (mathbb{F}_{q}+vmathbb{F}_{q}+v^{2}mathbb{F}_{q})

A. Melakhessou, K. Guenda, T. A. Gulliver, M. Shi, P. Solé
Comments: 19 pages
Subjects: Information Theory (cs.IT); Rings and Algebras (math.RA)

In this paper we investigate linear codes with complementary dual (LCD) codes
and formally self-dual codes over the ring (R=F_{q}+vF_{q}+v^{2}F_{q}),
where (v^{3}=v), for (q) odd. We give conditions on the existence of LCD codes
and present construction of formally self-dual codes over (R). Further, we give
bounds on the minimum distance of LCD codes over (F_q) and extend these to
codes over (R).

Robustly Learning a Gaussian: Getting Optimal Error, Efficiently

We study the fundamental problem of learning the parameters of a
high-dimensional Gaussian in the presence of noise — where an
(varepsilon)-fraction of our samples were chosen by an adversary. We give
robust estimators that achieve estimation error (O(varepsilon)) in the total
variation distance, which is optimal up to a universal constant that is
independent of the dimension.

In the case where just the mean is unknown, our robustness guarantee is
optimal up to a factor of (sqrt{2}) and the running time is polynomial in (d)
and (1/epsilon). When both the mean and covariance are unknown, the running
time is polynomial in (d) and quasipolynomial in (1/varepsilon). Moreover all
of our algorithms require only a polynomial number of samples. Our work shows
that the same sorts of error guarantees that were established over fifty years
ago in the one-dimensional setting can also be achieved by efficient algorithms
in high-dimensional settings.

Dynamic Quadratic Cheap Talk and Signaling Games

Serkan Sarıtaş, Serdar Yüksel, Sinan Gezici
Comments: 30 pages
Subjects: Optimization and Control (math.OC); Information Theory (cs.IT)

Simultaneous (Nash) and sequential (Stackelberg) equilibria of two-player
dynamic quadratic cheap talk and signaling game problems are investigated under
a perfect Bayesian formulation. For the dynamic scalar and multi-dimensional
cheap talk, it is shown that the Nash equilibrium cannot be fully revealing
whereas the Stackelberg equilibrium is always fully revealing. In addition, the
final state Nash equilibria have to be essentially quantized when the source is
scalar, and non-revealing for the multi-dimensional case. In the dynamic
signaling game where the transmission of a Gauss-Markov source over a
memoryless Gaussian channel is considered, affine policies constitute an
invariant subspace under best response maps for both scalar and
multi-dimensional sources under Nash equilibria; however, the Stackelberg
equilibrium policies are always linear for scalar sources but may be non-linear
for multi-dimensional sources. Under the Stackelberg setup, the conditions
under which the equilibrium is non-informative are derived for scalar sources,
and a dynamic programming solution is presented when the encoders are
restricted to be linear for multi-dimensional sources.

Inter-Operator Resource Management for Millimeter Wave, Multi-Hop Backhaul Networks

Omid Semiari, Walid Saad, Mehdi Bennis, Zaher Dawy
Subjects: Computer Science and Game Theory (cs.GT); Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

In this paper, a novel framework is proposed for optimizing the operation and
performance of a large-scale, multi-hop millimeter wave (mmW) backhaul within a
wireless small cell network (SCN) that encompasses multiple mobile network
operators (MNOs). The proposed framework enables the small base stations (SBSs)
to jointly decide on forming the multi-hop, mmW links over backhaul
infrastructure that belongs to multiple, independent MNOs, while properly
allocating resources across those links. In this regard, the problem is
addressed using a novel framework based on matching theory that is composed to
two, highly inter-related stages: a multi-hop network formation stage and a
resource management stage. One unique feature of this framework is that it
jointly accounts for both wireless channel characteristics and economic factors
during both network formation and resource management. The multi-hop network
formation stage is formulated as a one-to-many matching game which is solved
using a novel algorithm, that builds on the so-called deferred acceptance
algorithm and is shown to yield a stable and Pareto optimal multi-hop mmW
backhaul network. Then, a one-to-many matching game is formulated to enable
proper resource allocation across the formed multi-hop network. This game is
then shown to exhibit peer effects and, as such, a novel algorithm is developed
to find a stable and optimal resource management solution that can properly
cope with these peer effects. Simulation results show that the proposed
framework yields substantial gains, in terms of the average sum rate, reaching
up to 27% and 54%, respectively, compared to a non-cooperative scheme in which
inter-operator sharing is not allowed and a random allocation approach. The
results also show that our framework provides insights on how to manage pricing
and the cost of the cooperative mmW backhaul network for the MNOs.