IT博客汇

    arXiv Paper Daily: Tue, 27 Sep 2016

    Published by 我爱机器学习 (52ml.net) on 2016-09-27 00:00:00

    Neural and Evolutionary Computing

    An Ontology of Preference-Based Multiobjective Evolutionary Algorithms

    Longmei Li, Iryna Yevseyeva, Vitor Basto-Fernandes, Heike Trautmann, Ning Jing, Michael Emmerich
    Comments: 19 pages, 7 figures
    Subjects: Neural and Evolutionary Computing (cs.NE)

    User preference integration is of great importance in multiobjective
    optimization, in particular in many-objective optimization. Preferences have
    long been considered in traditional multicriteria decision making (MCDM),
    which is based on mathematical programming; more recently, they have been
    integrated into evolutionary multiobjective optimization (EMO), shifting the
    focus to preferred parts of the Pareto front instead of the whole front. The
    number of publications and results on preference-based multiobjective
    evolutionary algorithms (PMOEAs) has increased rapidly over the past decade.
    There already exists a large variety of preference handling methods and EMO
    methods, which have been combined in various ways. This article proposes to
    use the Web Ontology Language (OWL) to model and systematize the results
    developed in this field. An extensive review of the existing work is
    provided, based on which an ontology is built and instantiated with
    state-of-the-art results. The OWL ontology is made public and open to future
    extension. Moreover, its usage is exemplified for different use cases,
    including introducing new researchers to this knowledge domain, querying for
    methods that match an application problem in engineering optimization,
    checking the existence of combinations of preference models and EMO
    techniques, and discovering opportunities for new research and open research
    questions.

    Multiplicative LSTM for sequence modelling

    Ben Krause, Liang Lu, Iain Murray, Steve Renals
    Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    This paper introduces multiplicative LSTM, a novel hybrid recurrent neural
    network architecture for sequence modelling that combines the long short-term
    memory (LSTM) and multiplicative recurrent neural network architectures.
    Multiplicative LSTM is motivated by the flexibility of having very different
    recurrent transition functions for each possible input, which we argue helps
    make it more expressive for autoregressive density estimation. We show
    empirically that multiplicative LSTM outperforms standard LSTM and deep
    variants for a range of character level modelling tasks. We also found that
    this improvement increases as the complexity of the task scales up. This model
    achieves a validation error of 1.20 bits/character on the Hutter prize dataset
    when combined with dynamic evaluation.
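
    The transition described above can be sketched in a few lines of NumPy (a
    toy illustration, not the authors' implementation; layer sizes and weight
    scales are arbitrary assumptions): the intermediate state m is an
    elementwise product of projections of the input and the previous hidden
    state, and the usual LSTM gates are conditioned on m rather than on the
    hidden state directly.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlstm_step(x, h_prev, c_prev, p):
    """One multiplicative LSTM step (sketch)."""
    # Multiplicative intermediate state: each input effectively selects
    # its own hidden-to-hidden transition.
    m = (p["Wmx"] @ x) * (p["Wmh"] @ h_prev)
    # Standard LSTM gating, conditioned on m instead of h_prev.
    i = sigmoid(p["Wix"] @ x + p["Wim"] @ m)
    f = sigmoid(p["Wfx"] @ x + p["Wfm"] @ m)
    o = sigmoid(p["Wox"] @ x + p["Wom"] @ m)
    g = np.tanh(p["Wgx"] @ x + p["Wgm"] @ m)
    c = f * c_prev + i * g          # cell update
    h = o * np.tanh(c)              # new hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_h = 4, 8
p = {k: rng.normal(scale=0.1, size=(n_h, n_in))
     for k in ("Wmx", "Wix", "Wfx", "Wox", "Wgx")}
p.update({k: rng.normal(scale=0.1, size=(n_h, n_h))
          for k in ("Wmh", "Wim", "Wfm", "Wom", "Wgm")})
h, c = np.zeros(n_h), np.zeros(n_h)
for _ in range(3):
    h, c = mlstm_step(rng.normal(size=n_in), h, c, p)
```

    Because m depends on the current input, each input symbol gets its own
    effective recurrent transition, which is the flexibility the abstract
    refers to.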

    Accurate and Efficient Hyperbolic Tangent Activation Function on FPGA using the DCT Interpolation Filter

    Ahmed M. Abdelsalam, J.M. Pierre Langlois, F. Cheriet
    Comments: 8 pages, 6 figures, 5 tables, submitted for the 25th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (ISFPGA), 22-24 February 2017, California, USA
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Implementing an accurate and fast activation function at low cost is a
    crucial aspect of implementing Deep Neural Networks (DNNs) on FPGAs.
    We propose a high-accuracy approximation approach for the hyperbolic tangent
    activation function of artificial neurons in DNNs. It is based on the Discrete
    Cosine Transform Interpolation Filter (DCTIF). The proposed architecture
    combines simple arithmetic operations on stored samples of the hyperbolic
    tangent function and on input data. The proposed DCTIF implementation achieves
    two orders of magnitude greater precision than previous work while using the
    same or fewer computational resources. Various combinations of DCTIF
    parameters can be chosen to trade off the accuracy and complexity of the
    hyperbolic tangent function. In one case, the proposed architecture
    approximates the hyperbolic tangent activation function with a maximum
    error of 10E-5 while requiring only 1.52
    Kbits memory and 57 LUTs of a Virtex-7 FPGA. We also discuss how the activation
    function accuracy affects the performance of DNNs in terms of their training
    and testing accuracies. We show that a high accuracy approximation can be
    necessary in order to maintain the same DNN training and testing performances
    realized by the exact function.
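
    The sample-plus-arithmetic idea can be illustrated with a stored table of
    tanh samples and simple interpolation. The sketch below uses plain linear
    interpolation as a stand-in for the paper's DCT interpolation filter, and
    the grid spacing is an arbitrary assumption, not a value from the paper:

```python
import numpy as np

# Look-up-table approximation of tanh: store samples on a uniform grid
# and combine them with simple arithmetic on the input.
STEP = 1.0 / 32.0                       # grid spacing (assumed)
GRID = np.arange(0.0, 4.0 + STEP, STEP)
TABLE = np.tanh(GRID)                   # stored samples

def tanh_approx(x):
    s = np.sign(x)                      # tanh is odd: handle |x| only
    x = np.minimum(np.abs(x), 4.0 - 1e-9)   # tanh saturates beyond ~4
    idx = (x / STEP).astype(int)
    frac = x / STEP - idx
    # Linear interpolation between the two nearest stored samples.
    return s * (TABLE[idx] * (1 - frac) + TABLE[idx + 1] * frac)

xs = np.linspace(-4, 4, 10001)
max_err = np.max(np.abs(tanh_approx(xs) - np.tanh(xs)))
```

    Shrinking the grid spacing (or using a higher-order interpolation filter,
    as the paper does) reduces the maximum error at the cost of more stored
    samples or more arithmetic.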

    The RNN-ELM Classifier

    Athanasios Vlontzos
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    In this paper we examine learning methods that combine the Random Neural
    Network, a biologically inspired neural network, with the Extreme Learning
    Machine, achieving state-of-the-art classification performance while
    requiring much shorter training times. The Random Neural Network is an
    integrate-and-fire computational model of a neural network whose
    mathematical structure permits the efficient analysis of large ensembles of
    neurons. An activation function is derived from the RNN and used in an
    Extreme Learning Machine. We compare the performance of this combination
    against the ELM with various activation functions, reduce the input
    dimensionality via PCA, and compare its performance against
    autoencoder-based versions of the RNN-ELM.
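
    The ELM side of the combination is easy to sketch: hidden-layer weights
    are random and fixed, and only the linear output layer is fitted, in
    closed form, by least squares. The sketch below uses a plain sigmoid in
    place of the RNN-derived activation, and the toy problem is illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def elm_fit(X, Y, n_hidden=64):
    W = rng.normal(size=(X.shape[1], n_hidden))   # random, never trained
    b = rng.normal(size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # hidden activations
    beta = np.linalg.pinv(H) @ Y                  # least-squares output layer
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Toy two-class problem: label is the sign of the sum of the inputs.
X = rng.normal(size=(200, 2))
Y = (X.sum(axis=1) > 0).astype(float).reshape(-1, 1)
W, b, beta = elm_fit(X, Y)
acc = np.mean((elm_predict(X, W, b, beta) > 0.5) == (Y > 0.5))
```

    Because only the pseudoinverse solve involves the data, training is a
    single linear-algebra step, which is where the short training times come
    from.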

    Sooner than Expected: Hitting the Wall of Complexity in Evolution

    Thomas Schmickl, Payam Zahadat, Heiko Hamann
    Comments: 24 pages, 15 figures
    Subjects: Neural and Evolutionary Computing (cs.NE)

    In evolutionary robotics an encoding of the control software, which maps
    sensor data (input) to motor control values (output), is shaped by stochastic
    optimization methods to complete a predefined task. This approach is assumed to
    be beneficial compared to standard methods of controller design in those cases
    where no a-priori model is available that could help to optimize performance.
    Also for robots that have to operate in unpredictable environments, an
    evolutionary robotics approach is favorable. We demonstrate here that such a
    model-free approach is not a free lunch, as even simple tasks can represent
    unsolvable barriers for fully open-ended uninformed evolutionary computation
    techniques. We propose here the ‘Wankelmut’ task as an objective for an
    evolutionary approach that starts from scratch without pre-shaped controller
    software or any other informed approach that would force the behavior to be
    evolved in a desired way. Our focal claim is that ‘Wankelmut’ represents the
    simplest set of problems that makes plain-vanilla evolutionary computation
    fail. We demonstrate this by a series of simple standard evolutionary
    approaches using different fitness functions and standard artificial neural
    networks as well as continuous-time recurrent neural networks. All our tested
    approaches failed. We claim that any other evolutionary approach that does
    not per se favor or enforce modularity, and does not freeze or protect
    already evolved functionalities, will also fail. Thus we propose a
    hard-to-pass benchmark and
    make a strong statement for self-complexifying and generative approaches in
    evolutionary computation. We anticipate that defining such a ‘simplest task to
    fail’ is a valuable benchmark for promoting future development in the field of
    artificial intelligence, evolutionary robotics and artificial life.

    Learning by Stimulation Avoidance: A Principle to Control Spiking Neural Networks Dynamics

    Lana Sinapayen, Atsushi Masumori, Takashi Ikegami
    Comments: 17 pages, 11 figures
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Learning based on networks of real neurons, and by extension biologically
    inspired models of neural networks, has yet to find general learning rules
    leading to widespread applications. In this paper, we argue for the existence
    of a principle allowing to steer the dynamics of a biologically inspired neural
    network. Using carefully timed external stimulation, the network can be driven
    towards a desired dynamical state. We term this principle “Learning by
    Stimulation Avoidance” (LSA). We demonstrate through simulation that the
    minimal sufficient conditions leading to LSA in artificial networks are also
    sufficient to reproduce learning results similar to those obtained in
    biological neurons by Shahaf and Marom [1]. We examine the mechanism’s basic
    dynamics in a reduced network, and demonstrate how it scales up to a network of
    100 neurons. We show that LSA has a higher explanatory power than existing
    hypotheses about the response of biological neural networks to external
    stimulation, and can be used as a learning rule for an embodied application:
    learning of wall avoidance by a simulated robot. The surge in popularity of
    artificial neural networks is mostly directed to disembodied models of neurons
    with biologically irrelevant dynamics: to the authors’ knowledge, this is the
    first work demonstrating sensory-motor learning with random spiking networks
    through pure Hebbian learning.


    Computer Vision and Pattern Recognition

    Learning Language-Visual Embedding for Movie Understanding with Natural-Language

    Atousa Torabi, Niket Tandon, Leonid Sigal
    Comments: 13 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Learning a joint language-visual embedding has a number of very appealing
    properties and can result in a variety of practical applications, including
    natural-language image/video annotation and search. In this work, we study
    three different joint language-visual neural network model architectures.
    We evaluate our models on the large-scale LSMDC16 movie dataset for two
    tasks: 1) standard ranking for video annotation and retrieval; 2) our
    proposed movie multiple-choice test. This test facilitates automatic
    evaluation of visual-language models for natural-language video annotation
    based on human activities. In addition to the original Audio Description
    (AD) captions provided as part of LSMDC16, we collected and will make
    available a) manually generated re-phrasings of those captions obtained
    using Amazon MTurk and b) automatically generated human-activity elements
    in “Predicate + Object” (PO) phrases based on “Knowlywood”, an activity
    knowledge-mining model. Our best model achieves Recall@10 of 19.2% on
    annotation and 18.9% on video retrieval for a subset of 1000 samples. For
    the multiple-choice test, our best model achieves an accuracy of 58.11%
    over the whole LSMDC16 public test set.

    Swipe Mosaics from Video

    Malcolm Reynolds, Tom S. F. Haines, Gabriel J. Brostow
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A panoramic image mosaic is an attractive visualization for viewing many
    overlapping photos, but its images must be both captured and processed
    correctly to produce an acceptable composite. We propose Swipe Mosaics, an
    interactive visualization that places the individual video frames on a 2D
    planar map that represents the layout of the physical scene. Compared to
    traditional panoramic mosaics, our capture is easier because the user can both
    translate the camera center and film moving subjects. Processing and display
    degrade gracefully if the footage lacks distinct, overlapping, non-repeating
    texture. Our proposed visual odometry algorithm produces a distribution over
    (x,y) translations for image pairs. Inferring a distribution of possible camera
    motions allows us to better cope with parallax, lack of texture, dynamic
    scenes, and other phenomena that hurt deterministic reconstruction techniques.
    Robustness is obtained by training on synthetic scenes with known camera
    motions. We show that Swipe Mosaics are easy to generate, support a wide range
    of difficult scenes, and are useful for documenting a scene for closer
    inspection.

    Robust Matrix Decomposition for Image Segmentation under Heavy Noises and Uneven Background Intensities

    Garret Vo, Chiwoo Park
    Comments: 13 Pages; 11 Figures; 3 Tables; Submitted to IEEE T-PAMI
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents a robust matrix decomposition approach that automatically
    segments a binary image to foreground regions and background regions under high
    observation noise levels and uneven background intensities. The work is
    motivated by the need of identifying foreground objects in a noisy electron
    microscopic image, but the method can be applied for a general binary
    classification problem. The proposed method models an input image as a matrix
    of image pixel values, and the matrix is represented by a mixture of three
    component matrices of the same size, background, foreground and noise matrices.
    We propose a robust matrix decomposition approach to separate the input matrix
    into the three components through robust singular value decomposition. The
    proposed approach is more robust to high image noises and uneven background
    than the existing matrix-based approaches, which is numerically shown using
    simulated images and five electron microscope images with manually
    annotated ground truth data.
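
    A generic low-rank-plus-sparse split conveys the idea (a standard
    alternating scheme, not the authors' exact robust SVD; the rank and
    threshold below are illustrative): the smooth background is captured by a
    truncated SVD, and the foreground/noise by soft-thresholding the residual.

```python
import numpy as np

def lowrank_sparse_split(M, rank=1, lam=0.5, n_iter=30):
    S = np.zeros_like(M)
    for _ in range(n_iter):
        U, sv, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :rank] * sv[:rank]) @ Vt[:rank]        # low-rank background
        R = M - L
        S = np.sign(R) * np.maximum(np.abs(R) - lam, 0)  # sparse foreground
    return L, S

rng = np.random.default_rng(2)
bg = np.outer(np.linspace(1, 2, 20), np.linspace(1, 2, 20))  # uneven background
fg = np.zeros((20, 20))
fg[5:8, 5:8] = 5.0                                           # bright foreground blob
L, S = lowrank_sparse_split(bg + fg, rank=1, lam=0.5)
```

    The uneven background ends up in the low-rank component L, while the
    foreground object survives in the sparse component S, which is the
    separation the abstract describes.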

    BioLeaf: a professional mobile application to measure foliar damage caused by insect herbivory

    Bruno Machado, Jonatan Orue, Mauro Arruda, Cleidimar Santos, Diogo Sarath, Wesley Goncalves, Gercina Silva, Hemerson Pistori, Antonia Roel, Jose Rodrigues-Jr
    Journal-ref: Computers and Electronics in Agriculture 129: 1. 44-55 (2016)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Soybean is one of the ten largest crops in the world, accounting for
    billion-dollar businesses every year. This crop suffers from insect
    herbivory that costs producers millions. Hence, constant monitoring of the crop
    foliar damage is necessary to guide the application of insecticides. However,
    current methods to measure foliar damage are expensive and dependent on
    laboratory facilities, in some cases, depending on complex devices. To cope
    with these shortcomings, we introduce an image processing methodology to
    measure the foliar damage in soybean leaves. We developed a non-destructive
    imaging method based on two techniques, Otsu segmentation and Bezier curves, to
    estimate the foliar loss in leaves with or without border damage. We
    instantiate our methodology in a mobile application named BioLeaf, which is
    freely distributed for smartphone users. We experimented with real-world leaves
    collected from a soybean crop in Brazil. Our results demonstrated that BioLeaf
    achieves foliar damage quantification with precision comparable to that of
    human specialists. With these results, our proposal might assist soybean
    producers, reducing the time to measure foliar damage, reducing analytical
    costs, and defining a commodity application that is applicable not only to soy,
    but also to different crops such as cotton, bean, potato, coffee, and
    vegetables.
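
    Of the two techniques named above, Otsu segmentation is simple to sketch:
    the grey-level threshold is chosen to maximize the between-class variance
    of the resulting foreground/background split. The synthetic "leaf" below
    is illustrative only:

```python
import numpy as np

def otsu_threshold(img, n_bins=256):
    hist, edges = np.histogram(img, bins=n_bins)
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)                   # class-0 probability per threshold
    w1 = 1.0 - w0
    mu = np.cumsum(p * centers)         # class-0 cumulative mean mass
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        # Between-class variance: (mu_T*w0 - mu)^2 / (w0*w1)
        between = (mu_total * w0 - mu) ** 2 / (w0 * w1)
    between[~np.isfinite(between)] = 0.0
    return centers[np.argmax(between)]

rng = np.random.default_rng(3)
leaf = rng.normal(0.2, 0.05, size=(64, 64))                 # healthy tissue (dark)
leaf[20:40, 20:40] = rng.normal(0.8, 0.05, size=(20, 20))   # damaged area (bright)
t = otsu_threshold(leaf)
damage_fraction = np.mean(leaf > t)
```

    The fraction of pixels above the threshold is a direct estimate of foliar
    damage; the paper's Bezier-curve step additionally handles leaves with
    damaged borders.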

    Super-resolving multiresolution images with band-independent geometry of multispectral pixels

    Nicolas Brodu
    Comments: Source code with a ready-to-use script for super-resolving Sentinel-2 data is available at this http URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A new resolution enhancement method is presented for multispectral and
    multi-resolution images, such as those provided by the Sentinel-2 satellites.
    Starting from the highest resolution bands, band-dependent information
    (reflectance) is separated from information that is common to all bands
    (geometry of scene elements). This model is then applied to unmix
    low-resolution bands, preserving their reflectance, while propagating
    band-independent information to preserve the sub-pixel details. A reference
    implementation is provided, with an application example for super-resolving
    Sentinel-2 data.

    Optimistic and Pessimistic Neural Networks for Scene and Object Recognition

    Rene Grzeszick, Sebastian Sudholt, Gernot A. Fink
    Comments: Submitted to WACV’17
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper the application of uncertainty modeling to convolutional neural
    networks is evaluated. A novel method for adjusting the network’s predictions
    based on uncertainty information is introduced. This allows the network to be
    either optimistic or pessimistic in its prediction scores. The proposed method
    builds on the idea of applying dropout at test time and sampling a predictive
    mean and variance from the network’s output. Besides the methodological
    aspects, implementation details allowing for a fast evaluation are presented.
    Furthermore, a multilabel network architecture is introduced that strongly
    benefits from the presented approach. In the evaluation it will be shown that
    modeling uncertainty allows for improving the performance of a given model
    purely at test time without any further training steps. The evaluation
    considers several applications in the field of computer vision, including
    object classification and detection as well as scene attribute recognition.
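
    The core mechanism, dropout kept active at test time, can be sketched as
    follows (a toy two-layer network with hypothetical weights; the paper's
    actual architectures differ): several stochastic forward passes yield a
    predictive mean and variance, and the score is shifted by the standard
    deviation to be optimistic or pessimistic.

```python
import numpy as np

rng = np.random.default_rng(4)
W1 = rng.normal(size=(10, 32))
W2 = rng.normal(size=(32, 1))

def forward(x, drop=0.5):
    h = np.maximum(x @ W1, 0.0)
    mask = rng.random(h.shape) >= drop    # dropout stays on at test time
    h = h * mask / (1.0 - drop)
    return (h @ W2).ravel()

x = rng.normal(size=(1, 10))
samples = np.array([forward(x)[0] for _ in range(200)])
mean, std = samples.mean(), samples.std()
optimistic, pessimistic = mean + std, mean - std
```

    No retraining is involved: the same weights are reused, and only the
    prediction rule changes, which is how the paper improves a given model
    purely at test time.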

    Deep Structured Features for Semantic Segmentation

    Michael Tschannen, Lukas Cavigelli, Fabian Mentzer, Thomas Wiatowski, Luca Benini
    Comments: 10 pages, 2 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We propose a highly structured neural network architecture for semantic
    segmentation of images that combines i) a Haar wavelet-based tree-like
    convolutional neural network (CNN), ii) a random layer realizing a radial basis
    function kernel approximation, and iii) a linear classifier. While stages i)
    and ii) are completely pre-specified, only the linear classifier is learned
    from data. Thanks to its high degree of structure, our architecture has a very
    small memory footprint and thus fits onto low-power embedded and mobile
    platforms. We apply the proposed architecture to outdoor scene and aerial image
    semantic segmentation and show that the accuracy of our architecture is
    competitive with conventional pixel classification CNNs. Furthermore, we
    demonstrate that the proposed architecture is data efficient in the sense of
    matching the accuracy of pixel classification CNNs when trained on a much
    smaller data set.
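
    Stage ii), the random RBF kernel approximation, can be sketched with
    random Fourier features (a standard construction; the paper's exact random
    layer may differ): a fixed random projection followed by a cosine yields
    features whose inner products approximate the Gaussian kernel, so a linear
    classifier on top behaves like a kernel machine.

```python
import numpy as np

rng = np.random.default_rng(5)

def rbf_features(X, n_features=2000, gamma=0.5):
    # z(x).z(y) ~= exp(-gamma * ||x - y||^2); nothing here is trained.
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = rng.normal(size=(2, 3))
Z = rbf_features(X)
approx = Z[0] @ Z[1]                                  # kernel estimate
exact = np.exp(-0.5 * np.sum((X[0] - X[1]) ** 2))     # true RBF kernel
```

    Because the random layer needs no training and no stored weights beyond a
    seed, it fits the small-memory-footprint argument of the abstract.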

    Linear Support Tensor Machine: Pedestrian Detection in Thermal Infrared Images

    Sujoy Kumar Biswas, Peyman Milanfar
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Pedestrian detection in thermal infrared images poses unique challenges
    because of the low resolution and noisy nature of the images. Here we
    propose a mid-level attribute in the form of a multidimensional template,
    or tensor, using Local Steering Kernels (LSK) as low-level descriptors for
    detecting pedestrians
    in far infrared images. LSK is specifically designed to deal with intrinsic
    image noise and pixel level uncertainty by capturing local image geometry
    succinctly instead of collecting local orientation statistics (e.g., histograms
    in HOG). Our second contribution is the introduction of a new image similarity
    kernel in the popular maximum margin framework of support vector machines that
    results in a relatively short and simple training phase for building a rigid
    pedestrian detector. Our third contribution is to replace the sluggish but de
    facto sliding window based detection methodology with multichannel discrete
    Fourier transform, facilitating very fast and efficient pedestrian
    localization. The experimental studies on publicly available thermal infrared
    images justify our proposals and model assumptions. In addition, the proposed
    work also involves the release of our in-house annotations of pedestrians in
    more than 17000 frames of the OSU Color-Thermal database for the purpose of sharing
    with the research community.

    Visual Fashion-Product Search at SK Planet

    Taewan Kim, Seyeong Kim, Sangil Na, Hayoon Kim, Moonki Kim, Beyeongki Jeon
    Comments: 13 pages, 11 figures, 3 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We build a large-scale visual search system that finds similar product
    images given a fashion item. Defining similarity among arbitrary fashion
    products remains a challenging problem, as there is no exact ground truth.
    To resolve this problem, we define more than 90 fashion-related
    attributes, and combinations of these attributes can represent thousands
    of unique fashion styles. The fashion attributes are one of the
    ingredients used to define semantic similarity among fashion-product
    images. To build our system at scale, these fashion attributes are also
    used to build an inverted indexing scheme. In addition to these fashion
    attributes for semantic similarity, we extract colour and appearance
    features in a region of interest (ROI) of a fashion item for visual
    similarity. By sharing our approach, we hope to encourage active
    discussion on how to apply current computer vision research to the
    e-commerce industry.

    Multiview RGB-D Dataset for Object Instance Detection

    Georgios Georgakis, Md Alimoor Reza, Arsalan Mousavian, Phi-Hung Le, Jana Kosecka
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

    This paper presents a new multi-view RGB-D dataset of nine kitchen scenes,
    each containing several objects in realistic cluttered environments including a
    subset of objects from the BigBird dataset. The viewpoints of the scenes are
    densely sampled and objects in the scenes are annotated with bounding boxes and
    in the 3D point cloud. Also, an approach for detection and recognition is
    presented, which comprises two parts: i) a new multi-view 3D proposal
    generation method and ii) the development of several recognition baselines
    using AlexNet to score our proposals, which is trained either on crops of the
    dataset or on synthetically composited training images. Finally, we compare the
    performance of the object proposals and a detection baseline to the Washington
    RGB-D Scenes (WRGB-D) dataset and demonstrate that our Kitchen scenes dataset
    is more challenging for object detection and recognition. The dataset is
    available at: this http URL

    Joint Rain Detection and Removal via Iterative Region Dependent Multi-Task Learning

    Wenhan Yang, Robby T. Tan, Jiashi Feng, Jiaying Liu, Zongming Guo, Shuicheng Yan
    Comments: 12 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we address a rain removal problem from a single image, even in
    the presence of heavy rain and rain accumulation. Our core ideas lie in our new
    rain image models and a novel deep learning architecture. We first modify the
    commonly used model, which is a linear combination of a rain streak layer and a
    background layer, by adding a binary map that locates rain streak regions.
    Second, we create a model consisting of a component representing rain
    accumulation (where individual streaks cannot be seen, and thus visually
    similar to mist or fog), and another component representing various shapes and
    directions of overlapping rain streaks, which normally happen in heavy rain.
    Based on the first model, we develop a multi-task deep learning architecture
    that learns the binary rain streak map, the appearance of rain streaks, and the
    clean background, which is our ultimate output. The additional binary map is
    critically beneficial, since its loss function can provide additional strong
    information to the network. In many cases, though, rain streaks can be
    dense and large, so to obtain the clean background we need spatial
    contextual information. For this, we utilize dilated convolutions. To handle
    rain accumulation (again, a phenomenon visually similar to mist or fog) and
    various shapes and directions of overlapping rain streaks, we propose an
    iterative information feedback (IIF) network that removes rain streaks and
    clears up the rain accumulation iteratively and progressively. Overall,
    the multi-task learning and iterative information feedback benefit each
    other and constitute a network that is end-to-end trainable. Our extensive
    evaluation on
    real images, particularly on heavy rain, shows the effectiveness of our novel
    models and architecture, outperforming the state-of-the-art methods
    significantly.
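
    The rain image models can be sketched numerically (illustrative values
    only; the blend below is a simplified stand-in for the paper's
    accumulation model): a clean background, a streak layer restricted to a
    binary streak-region map, and a fog-like accumulation term.

```python
import numpy as np

rng = np.random.default_rng(6)

B = rng.uniform(0.2, 0.6, size=(32, 32))       # clean background layer
M = np.zeros((32, 32))
M[:, 10:12] = 1.0                              # binary map of streak regions
S = 0.4 * M * rng.uniform(0.5, 1.0, size=(32, 32))  # streaks live inside M only
alpha, A = 0.2, 0.9                            # accumulation blend (mist-like)
O = (1 - alpha) * (B + S) + alpha * A          # observed rainy image
```

    The binary map M is what the multi-task network additionally predicts; its
    loss gives the network an explicit signal about where streaks are, which
    is the "critically beneficial" extra information the abstract mentions.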

    Deep learning based fence segmentation and removal from an image using a video sequence

    Sankaraganesh Jonna, Krishna K. Nakka, Rajiv R. Sahay
    Comments: ECCV Workshop on Video Segmentation, 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Conventional approaches to image de-fencing use multiple adjacent frames for
    segmentation of fences in the reference image and are limited to restoring
    images of static scenes only. In this paper, we propose a de-fencing algorithm
    for images of dynamic scenes using an occlusion-aware optical flow method. We
    divide the problem of image de-fencing into the tasks of automated fence
    segmentation from a single image, motion estimation under known occlusions and
    fusion of data from multiple frames of a captured video of the scene.
    Specifically, we use a pre-trained convolutional neural network to segment
    fence pixels from a single image. The knowledge of spatial locations of fences
    is used to subsequently estimate optical flow in the occluded frames of the
    video for the final data fusion step. We cast the fence removal problem in an
    optimization framework by modeling the formation of the degraded observations.
    The inverse problem is solved using the fast iterative
    shrinkage-thresholding algorithm (FISTA). Experimental results show the
    effectiveness of the proposed algorithm.
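
    FISTA itself is compact enough to sketch on a generic l1-regularized
    least-squares problem (problem sizes and the regularization weight are
    illustrative, not the paper's actual image-formation model): a gradient
    step, a soft-thresholding step, and a momentum extrapolation.

```python
import numpy as np

def fista(A, y, lam=0.1, n_iter=500):
    # Minimize 0.5*||A x - y||^2 + lam*||x||_1.
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = z = np.zeros(A.shape[1])
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ z - y)
        w = z - grad / L                   # gradient step
        x_new = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0)  # shrinkage
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        z = x_new + ((t - 1) / t_new) * (x_new - x)              # momentum
        x, t = x_new, t_new
    return x

rng = np.random.default_rng(7)
A = rng.normal(size=(40, 100))             # underdetermined system
x_true = np.zeros(100)
x_true[[3, 30, 77]] = [1.0, -2.0, 1.5]     # sparse ground truth
y = A @ x_true
x_hat = fista(A, y, lam=0.01)
```

    The momentum step is what distinguishes FISTA from plain iterative
    shrinkage and gives it its faster convergence rate.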

    Perceptual uniform descriptor and Ranking on manifold: A bridge between image representation and ranking for image retrieval

    Shenglan Liu, Jun Wu, Lin Feng, Yang Liu, Hong Qiao, Wenbo Luo Muxin Sun, Wei Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The incompatibility of image descriptors and ranking is often neglected in
    image retrieval. In this paper, manifold learning and Gestalt psychology
    theory are employed to solve this incompatibility problem. A new holistic
    descriptor called the Perceptual Uniform Descriptor (PUD), based on
    Gestalt psychology, is proposed, which combines color and gradient
    direction to imitate human visual uniformity. Because PUD improves the
    visual uniformity of traditional descriptors, PUD features of images in
    the same class usually lie on one manifold. Thus, we use manifold ranking
    together with PUD to realize image retrieval. Experiments were carried out
    on five benchmark data sets, and the proposed method greatly improves the
    accuracy of image retrieval. Our experimental results on the Ukbench and
    Corel-1K datasets show that the N-S score reached 3.58 (HSV: 3.4) and mAP
    reached 81.77% (ODBTC: 77.9%), respectively, using PUD with only 280
    dimensions. These results are higher than those of other holistic image
    descriptors (and even some local ones) and state-of-the-art retrieval
    methods.
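
    Manifold ranking can be sketched in a few lines (a standard formulation
    with illustrative parameters; the paper applies it to PUD features):
    relevance scores diffuse from the query over a similarity graph, so images
    lying on the query's manifold are promoted even when their raw distance to
    the query is large.

```python
import numpy as np

def manifold_rank(X, query_idx, alpha=0.9, sigma=0.5, n_iter=50):
    # Gaussian similarity graph over all items.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    D = W.sum(axis=1)
    S = W / np.sqrt(np.outer(D, D))        # symmetric normalization
    y = np.zeros(len(X))
    y[query_idx] = 1.0                     # the query seeds the diffusion
    f = y.copy()
    for _ in range(n_iter):
        f = alpha * (S @ f) + (1 - alpha) * y
    return f

# Two well-separated clusters; the query sits in the first one.
rng = np.random.default_rng(8)
X = np.vstack([rng.normal(0, 0.1, size=(10, 2)),
               rng.normal(3, 0.1, size=(10, 2))])
f = manifold_rank(X, query_idx=0)
```

    Every item in the query's cluster ends up ranked above every item in the
    other cluster, which is the manifold-aware behavior the descriptor is
    designed to exploit.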

    Three Tiers Neighborhood Graph and Multi-graph Fusion Ranking for Multi-feature Image Retrieval: A Manifold Aspect

    Shenglan Liu, Muxin Sun, Lin Feng, Yang Liu, Jun Wu
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A single feature is insufficient to describe the content of an image,
    which is a shortcoming of traditional image retrieval. Since one image can
    be described by different features, multi-feature fusion ranking can be
    utilized to improve the ranking list of a query. In this paper, we first
    analyze graph structure and multi-feature fusion re-ranking from a
    manifold perspective. Then, a Three Tiers Neighborhood Graph (TTNG) is
    constructed to re-rank the original ranking list produced by a single
    feature and to enhance its precision. Furthermore, we propose Multi-graph
    Fusion Ranking (MFR) for multi-feature ranking, which considers the
    correlation of all images in multiple neighborhood graphs. Evaluations are
    conducted on the UK-bench, Corel-1K, Corel-10K and Cifar-10 benchmark
    datasets. The experimental results show that TTNG and MFR outperform other
    state-of-the-art methods. For example, we achieve competitive results of
    an N-S score of 3.91 on UK-bench and a precision of 65.00% on Corel-10K.

    DimensionApp : android app to estimate object dimensions

    Suriya Singh, Vijay Kumar
    Comments: Project Report 2014
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this project, we develop an Android app that uses computer vision
    techniques to estimate the dimensions of an object present in the field of
    view. The app, while compact in size, is accurate up to +/- 5 mm and
    robust to touch inputs. We use single-view metrology to compute accurate
    measurements. Unlike previous approaches, our technique does not rely on
    line detection and can be generalized to any object shape easily.

    A Rotation Invariant Latent Factor Model for Moveme Discovery from Static Poses

    Matteo Ruggero Ronchi, Joon Sik Kim, Yisong Yue
    Comments: Long version of the paper accepted at the IEEE ICDM 2016 conference. 10 pages, 9 figures, 1 table. Project page: this http URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We tackle the problem of learning a rotation invariant latent factor model
    when the training data is comprised of lower-dimensional projections of the
    original feature space. The main goal is the discovery of a set of 3-D bases
    poses that can characterize the manifold of primitive human motions, or
    movemes, from a training set of 2-D projected poses obtained from still images
    taken at various camera angles. The proposed technique for basis discovery is
    data-driven rather than hand-designed. The learned representation is rotation
    invariant, and can reconstruct any training instance from multiple viewing
    angles. We apply our method to modeling human poses in sports (via the Leeds
    Sports Dataset), and demonstrate the effectiveness of the learned bases in a
    range of applications such as activity classification, inference of dynamics
    from a single frame, and synthetic representation of movements.

    Lexicon-Free Fingerspelling Recognition from Video: Data, Models, and Signer Adaptation

    Taehwan Kim, Jonathan Keane, Weiran Wang, Hao Tang, Jason Riggle, Gregory Shakhnarovich, Diane Brentari, Karen Livescu
    Comments: arXiv admin note: substantial text overlap with arXiv:1608.08339
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

    We study the problem of recognizing video sequences of fingerspelled letters
    in American Sign Language (ASL). Fingerspelling comprises a significant but
    relatively understudied part of ASL. Recognizing fingerspelling is challenging
    for a number of reasons: It involves quick, small motions that are often highly
    coarticulated; it exhibits significant variation between signers; and there has
    been a dearth of continuous fingerspelling data collected. In this work we
    collect and annotate a new data set of continuous fingerspelling videos,
    compare several types of recognizers, and explore the problem of signer
    variation. Our best-performing models are segmental (semi-Markov) conditional
    random fields using deep neural network-based features. In the signer-dependent
    setting, our recognizers achieve up to about 92% letter accuracy. The
    multi-signer setting is much more challenging, but with neural network
    adaptation we achieve up to 83% letter accuracies in this setting.

    Autonomous Exploration with a Low-Cost Quadrocopter using Semi-Dense Monocular SLAM

    Lukas von Stumberg, Vladyslav Usenko, Jakob Engel, Jörg Stückler, Daniel Cremers
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

    Micro aerial vehicles (MAVs) are strongly limited in their payload and power
    capacity. In order to implement autonomous navigation, algorithms are
    therefore desirable that use sensory equipment that is as small, lightweight,
    and low-power as possible. In this paper, we propose a method for autonomous
    MAV navigation and exploration using a low-cost consumer-grade quadrocopter
    equipped with a monocular camera. Our vision-based navigation system builds on
    LSD-SLAM which estimates the MAV trajectory and a semi-dense reconstruction of
    the environment in real-time. Since LSD-SLAM only determines depth at high
    gradient pixels, texture-less areas are not directly observed. We propose an
    obstacle mapping and exploration approach that takes this property into
    account. In experiments, we demonstrate our vision-based autonomous navigation
    and exploration system with a commercially available Parrot Bebop MAV.

    Low-complexity Image and Video Coding Based on an Approximate Discrete Tchebichef Transform

    P. A. M. Oliveira, R. J. Cintra, F. M. Bayer, S. Kulasekera, A. Madanayake
    Comments: 11 pages, 5 figures, 4 tables
    Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Data Structures and Algorithms (cs.DS); Computation (stat.CO); Methodology (stat.ME)

    Linear transformations are highly relevant for data decorrelation
    applications, such as image and video compression. In that context, the
    discrete Tchebichef transform (DTT) possesses useful coding and decorrelation
    properties. The DTT transform kernel does not depend on the input data, and
    fast algorithms can be developed for real-time applications. However, the DTT
    fast algorithms presented in the literature possess high computational
    complexity. In this work, we introduce a new low-complexity approximation for
    the DTT. The fast algorithm of the proposed transform is multiplication-free
    and requires a reduced number of additions and bit-shifting operations. Image
    and video compression simulations in popular standards show the good
    performance of the proposed transform. Regarding hardware resource
    consumption, an FPGA implementation shows a 43.1% reduction in configurable
    logic blocks, and an ASIC place-and-route realization shows a 57.7% reduction
    in the area-time figure, compared with the 2-D version of the exact DTT.
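
    As a point of reference for the approximation, the exact orthonormal DTT
    kernel can be generated numerically and applied separably to a 2-D block.
    A minimal numpy sketch (the paper's multiplication-free approximation matrix
    itself is not reproduced here):

```python
import numpy as np

def dtt_matrix(N):
    """Exact orthonormal discrete Tchebichef transform matrix, obtained by
    Gram-Schmidt orthonormalization (QR) of the monomial basis 1, x, x^2, ...
    over the uniform grid x = 0..N-1."""
    x = np.arange(N, dtype=float)
    V = np.vander(x, N, increasing=True)   # columns: 1, x, x^2, ...
    Q, _ = np.linalg.qr(V)
    T = Q.T
    signs = np.sign(T[:, 0])               # fix row signs to a common convention
    signs[signs == 0] = 1.0
    return T * signs[:, None]

def dtt2(block, T):
    """Separable 2-D DTT of a square block: T @ block @ T.T."""
    return T @ block @ T.T

T = dtt_matrix(8)
coeffs = dtt2(np.outer(np.arange(8.0), np.ones(8)), T)  # smooth ramp block
```

    Because T is orthogonal, the inverse transform is simply T.T @ coeffs @ T;
    the approximate DTT trades this exactness for a multiplierless kernel.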


    Artificial Intelligence

    Commonsense reasoning, commonsense knowledge, and the SP theory of intelligence

    J Gerard Wolff
    Subjects: Artificial Intelligence (cs.AI)

    This paper describes how the “SP theory of intelligence”, outlined in an
    Appendix, may throw light on aspects of commonsense reasoning (CSR) and
    commonsense knowledge (CSK) (together shortened to CSRK), as discussed in
    another paper by Ernest Davis and Gary Marcus (DM). The SP system has the
    generality needed for CSRK: Turing equivalence; the generality of information
    compression as the foundation for the SP system both in the representation of
    knowledge and in concepts of prediction and probability; the versatility of the
    SP system in the representation of knowledge and in aspects of intelligence
    including forms of reasoning; and the potential of the system for the seamless
    integration of diverse forms of knowledge and diverse aspects of intelligence.
    Several examples described by DM are considered, along with how they may be
    processed in the SP system. Also discussed are current successes in CSR
    (taxonomic reasoning, temporal reasoning, action and change, and qualitative
    reasoning),
    how the SP system may promote seamless integration across these areas, and how
    insights gained from the SP programme of research may yield some potentially
    useful new ways of approaching these topics. The paper considers how the SP
    system may help overcome several challenges in the automation of CSR described
    by DM, and how it meets several of the objectives for research in CSRK that
    they have described.

    Testing Quantum Models of Conjunction Fallacy on the World Wide Web

    Diederik Aerts, Jonito Aerts Arguëlles, Lyneth Beltran, Massimiliano Sassoli de Bianchi, Sandro Sozzo, Tomas Veloz
    Comments: 12 pages
    Subjects: Artificial Intelligence (cs.AI); Quantum Physics (quant-ph)

    The ‘conjunction fallacy’ has been extensively debated by scholars in
    cognitive science and, in recent times, the discussion has been enriched by the
    proposal of modeling the fallacy using the quantum formalism. Two major quantum
    approaches have been put forward: the first assumes that respondents use a
    two-step sequential reasoning and that the fallacy results from the presence of
    ‘question order effects’; the second assumes that respondents evaluate the
    cognitive situation as a whole and that the fallacy results from the ’emergence
    of new meanings’, as an ‘effect of overextension’ in the conceptual
    conjunction. Thus, the question arises whether, and to what extent,
    conjunction fallacies result from ‘order effects’ or, instead, from
    ’emergence effects’. To help clarify this situation, we propose to use the
    World Wide Web as an ‘information space’ that can be interrogated both in a
    sequential and non-sequential way, to test these two quantum approaches. We
    find that ’emergence effects’, and not ‘order effects’, should be considered
    the main cognitive mechanism producing the observed conjunction fallacies.

    Towards Evidence-Based Ontology for Supporting Systematic Literature Review

    Yueming Sun, Ye Yang, He Zhang, Wen Zhang, Qing Wang
    Subjects: Digital Libraries (cs.DL); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

    [Background]: Systematic Literature Review (SLR) has become an important
    software engineering research method but requires tremendous effort. [Aim]:
    This paper proposes an approach that leverages an empirically evolved
    ontology to support automating key SLR activities. [Method]: First, we
    propose an ontology, SLRONT, built on SLR experiences and best practices, as
    a groundwork to capture common terminologies and their relationships during
    SLR processes; second, we present an extended version of SLRONT, the COSONT,
    and instantiate it with the knowledge and concepts extracted from structured
    abstracts. Case studies illustrate the details of applying it for supporting
    SLR steps. [Results]: Results show that by using COSONT we reach the same
    conclusions as with sheer manual work, but the effort involved is
    significantly reduced. [Conclusions]: The approach of using ontology can
    effectively and efficiently support the conduct of systematic literature
    reviews.

    Dropout with Expectation-linear Regularization

    Xuezhe Ma, Yingkai Gao, Zhiting Hu, Yaoliang Yu, Yuntian Deng, Eduard Hovy
    Comments: Under review as a conference paper at ICLR 2017
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Dropout, a simple and effective way to train deep neural networks, has led to
    a number of impressive empirical successes and spawned many recent theoretical
    investigations. However, the gap between dropout’s training and inference
    phases, introduced due to tractability considerations, has largely remained
    under-appreciated. In this work, we first formulate dropout as a tractable
    approximation of some latent variable model, leading to a clean view of
    parameter sharing and enabling further theoretical analysis. Then, we introduce
    (approximate) expectation-linear dropout neural networks, whose inference gap
    we are able to formally characterize. Algorithmically, we show that our
    proposed measure of the inference gap can be used to regularize the standard
    dropout training objective, resulting in an explicit control of the gap.
    Our method is as simple and efficient as standard dropout. We further prove
    upper bounds on the loss in accuracy due to expectation-linearization, and
    describe classes of input distributions that expectation-linearize easily. Experiments
    on three image classification benchmark datasets demonstrate that reducing the
    inference gap can indeed improve the performance consistently.
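
    The training/inference gap the abstract refers to can be made concrete with
    a toy network: average many sampled-dropout forward passes and compare them
    with the single deterministic pass that standard dropout uses at test time.
    A hedged numpy sketch (this is not the authors' regularizer, just the gap
    quantity it targets):

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, W1, W2, W3, mask=None, p=0.5):
    """Two hidden layers; inverted dropout after the first.
    With mask=None this is the deterministic inference pass."""
    h1 = np.maximum(0.0, x @ W1)
    if mask is not None:
        h1 = h1 * mask / p                 # sampled dropout, train-time
    h2 = np.maximum(0.0, h1 @ W2)          # nonlinearity after dropout => gap
    return h2 @ W3

x = rng.normal(size=(16, 10))
W1, W2, W3 = (rng.normal(size=s) * 0.5 for s in [(10, 32), (32, 32), (32, 4)])

# Monte-Carlo estimate of the expected dropout output
mc = np.mean([forward(x, W1, W2, W3, mask=rng.binomial(1, 0.5, size=32))
              for _ in range(2000)], axis=0)
det = forward(x, W1, W2, W3)               # standard dropout inference

# Relative inference gap: the quantity an expectation-linear
# regularizer would drive toward zero
gap = np.linalg.norm(mc - det) / np.linalg.norm(det)
```

    The gap is nonzero exactly because a nonlinearity follows the dropout layer,
    so the expectation does not commute with the network.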

    Pointer Sentinel Mixture Models

    Stephen Merity, Caiming Xiong, James Bradbury, Richard Socher
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    Recent neural network sequence models with softmax classifiers have achieved
    their best language modeling performance only with very large hidden states and
    large vocabularies. Even then they struggle to predict rare or unseen words
    even if the context makes the prediction unambiguous. We introduce the pointer
    sentinel mixture architecture for neural sequence models which has the ability
    to either reproduce a word from the recent context or produce a word from a
    standard softmax classifier. Our pointer sentinel-LSTM model achieves state of
    the art language modeling performance on the Penn Treebank (70.9 perplexity)
    while using far fewer parameters than a standard softmax LSTM. In order to
    evaluate how well language models can exploit longer contexts and deal with
    more realistic vocabularies and larger corpora we also introduce the freely
    available WikiText corpus.
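
    The mixture described above can be sketched in a few lines: a pointer
    distribution over recent context words and a vocabulary softmax are
    combined, with the sentinel's share of the pointer attention acting as the
    gate. A simplified numpy illustration (the sentinel score is fixed to 0
    here; in the model it comes from a learned vector):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def pointer_sentinel(vocab_logits, ptr_scores, context_ids):
    """Mix p_vocab with a pointer over the context. The last attention slot
    is the sentinel; its mass g gates the vocabulary softmax."""
    attn = softmax(np.append(ptr_scores, 0.0))   # context scores + sentinel
    g = attn[-1]
    p = g * softmax(vocab_logits)                # vocabulary component
    for pos, w in enumerate(context_ids):        # scatter pointer mass
        p[w] += attn[pos]
    return p

# Context "a b a" with word ids [0, 1, 0] over a 5-word vocabulary
p = pointer_sentinel(np.zeros(5), np.array([2.0, 0.5, 2.0]), [0, 1, 0])
```

    Words seen in the recent context receive extra probability mass from the
    pointer, which is what lets the model predict rare words the softmax alone
    would miss.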

    Learning by Stimulation Avoidance: A Principle to Control Spiking Neural Networks Dynamics

    Lana Sinapayen, Atsushi Masumori, Takashi Ikegami
    Comments: 17 pages, 11 figures
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Learning based on networks of real neurons, and by extension biologically
    inspired models of neural networks, has yet to find general learning rules
    leading to widespread applications. In this paper, we argue for the existence
    of a principle for steering the dynamics of a biologically inspired neural
    network. Using carefully timed external stimulation, the network can be driven
    towards a desired dynamical state. We term this principle “Learning by
    Stimulation Avoidance” (LSA). We demonstrate through simulation that the
    minimal sufficient conditions leading to LSA in artificial networks are also
    sufficient to reproduce learning results similar to those obtained in
    biological neurons by Shahaf and Marom [1]. We examine the mechanism’s basic
    dynamics in a reduced network, and demonstrate how it scales up to a network of
    100 neurons. We show that LSA has a higher explanatory power than existing
    hypotheses about the response of biological neural networks to external
    stimulation, and can be used as a learning rule for an embodied application:
    learning of wall avoidance by a simulated robot. The surge in popularity of
    artificial neural networks is mostly directed to disembodied models of neurons
    with biologically irrelevant dynamics: to the authors’ knowledge, this is the
    first work demonstrating sensory-motor learning with random spiking networks
    through pure Hebbian learning.

    Predictive modelling of football injuries

    Stylianos Kampakis
    Comments: PhD Thesis submitted and defended successfully at the Department of Computer Science at University College London
    Subjects: Applications (stat.AP); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    The goal of this thesis is to investigate the potential of predictive
    modelling for football injuries. This work was conducted in close collaboration
    with Tottenham Hotspurs FC (THFC), the PGA European tour and the participation
    of Wolverhampton Wanderers (WW).

    Three investigations were conducted:

    1. Predicting the recovery time of football injuries using the UEFA injury
    recordings: The UEFA recordings is a common standard for recording injuries in
    professional football. For this investigation, three datasets of UEFA injury
    recordings were available. Different machine learning algorithms were used in
    order to build a predictive model. The performance of the machine learning
    models is then improved by using feature selection conducted through
    correlation-based subset feature selection and random forests.

    2. Predicting injuries in professional football using exposure records: The
    relationship between exposure (in training hours and match hours) in
    professional football athletes and injury incidence was studied. A common
    problem in football is understanding how the training schedule of an athlete
    can affect the chance of him getting injured. The task was to predict the
    number of days a player can train before he gets injured.

    3. Predicting intrinsic injury incidence using in-training GPS measurements:
    A significant percentage of football injuries can be attributed to overtraining
    and fatigue. GPS data collected during training sessions might provide
    indicators of fatigue, or might be used to detect very intense training
    sessions which can lead to overtraining. This research used GPS data gathered
    during training sessions of the first team of THFC, in order to predict whether
    an injury would take place during a week.


    Computation and Language

    An Unsupervised Probability Model for Speech-to-Translation Alignment of Low-Resource Languages

    Antonios Anastasopoulos, David Chiang, Long Duong
    Comments: accepted at EMNLP 2016
    Subjects: Computation and Language (cs.CL)

    For many low-resource languages, spoken language resources are more likely to
    be annotated with translations than with transcriptions. Translated speech data
    is potentially valuable for documenting endangered languages or for training
    speech translation systems. A first step towards making use of such data would
    be to automatically align spoken words with their translations. We present a
    model that combines Dyer et al.’s reparameterization of IBM Model 2
    (fast-align) and k-means clustering using Dynamic Time Warping as a distance
    metric. The two components are trained jointly using expectation-maximization.
    In an extremely low-resource scenario, our model performs significantly better
    than both a neural model and a strong baseline.
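
    The clustering component described above relies on Dynamic Time Warping as
    its distance metric; a minimal sketch of DTW between two feature sequences
    (the jointly trained EM model itself is not reproduced):

```python
import numpy as np

def dtw(a, b):
    """Dynamic Time Warping distance between two sequences of feature
    vectors (frames x dims), with Euclidean local cost and the standard
    three-way recurrence."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

s = np.array([[0.0], [1.0], [2.0]])
stretched = np.array([[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]])
# A time-stretched copy aligns at zero cost, which plain frame-by-frame
# Euclidean distance cannot do
```

    This invariance to local time stretching is what makes DTW a natural
    distance for clustering spoken word tokens of varying duration.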

    Creating Causal Embeddings for Question Answering with Minimal Supervision

    Rebecca Sharp, Mihai Surdeanu, Peter Jansen, Peter Clark, Michael Hammond
    Comments: To appear in EMNLP 2016
    Subjects: Computation and Language (cs.CL)

    A common model for question answering (QA) is that a good answer is one that
    is closely related to the question, where relatedness is often determined using
    general-purpose lexical models such as word embeddings. We argue that a better
    approach is to look for answers that are related to the question in a relevant
    way, according to the information need of the question, which may be determined
    through task-specific embeddings. With causality as a use case, we implement
    this insight in three steps. First, we generate causal embeddings
    cost-effectively by bootstrapping cause-effect pairs extracted from free text
    using a small set of seed patterns. Second, we train dedicated embeddings over
    this data, by using task-specific contexts, i.e., the context of a cause is its
    effect. Finally, we extend a state-of-the-art reranking approach for QA to
    incorporate these causal embeddings. We evaluate the causal embedding models
    both directly with a causal implication task, and indirectly, in a downstream
    causal QA task using data from Yahoo! Answers. We show that explicitly modeling
    causality improves performance in both tasks. In the QA task our best model
    achieves 37.3% P@1, significantly outperforming a strong baseline by 7.7%
    (relative).

    Toward Socially-Infused Information Extraction: Embedding Authors, Mentions, and Entities

    Yi Yang, Ming-Wei Chang, Jacob Eisenstein
    Comments: Accepted to EMNLP 2016
    Subjects: Computation and Language (cs.CL)

    Entity linking is the task of identifying mentions of entities in text, and
    linking them to entries in a knowledge base. This task is especially difficult
    in microblogs, as there is little additional text to provide disambiguating
    context; rather, authors rely on an implicit common ground of shared knowledge
    with their readers. In this paper, we attempt to capture some of this implicit
    context by exploiting the social network structure in microblogs. We build on
    the theory of homophily, which implies that socially linked individuals share
    interests, and are therefore likely to mention the same sorts of entities. We
    implement this idea by encoding authors, mentions, and entities in a continuous
    vector space, which is constructed so that socially-connected authors have
    similar vector representations. These vectors are incorporated into a neural
    structured prediction model, which captures structural constraints that are
    inherent in the entity linking task. Together, these design decisions yield F1
    improvements of 1%-5% on benchmark datasets, as compared to the previous
    state-of-the-art.

    S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking

    Yi Yang, Ming-Wei Chang
    Comments: Appeared in ACL 2015 proceedings. This is an updated version. More details available in the pdf file
    Subjects: Computation and Language (cs.CL)

    Non-linear models have recently received much attention, as researchers are
    starting to discover the power of statistical and embedding features. However,
    tree-based models are seldom studied in the context of structured learning
    despite their recent success on various classification and ranking tasks. In
    this paper, we propose S-MART, a tree-based structured learning framework based
    on multiple additive regression trees. S-MART is especially suitable for
    handling tasks with dense features, and can be used to learn many different
    structures under various loss functions.

    We apply S-MART to the task of tweet entity linking — a core component of
    tweet information extraction, which aims to identify and link name mentions to
    entities in a knowledge base. A novel inference algorithm is proposed to handle
    the special structure of the task. The experimental results show that S-MART
    significantly outperforms state-of-the-art tweet entity linking systems.

    Lexicon-Free Fingerspelling Recognition from Video: Data, Models, and Signer Adaptation

    Taehwan Kim, Jonathan Keane, Weiran Wang, Hao Tang, Jason Riggle, Gregory Shakhnarovich, Diane Brentari, Karen Livescu
    Comments: arXiv admin note: substantial text overlap with arXiv:1608.08339
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

    We study the problem of recognizing video sequences of fingerspelled letters
    in American Sign Language (ASL). Fingerspelling comprises a significant but
    relatively understudied part of ASL. Recognizing fingerspelling is challenging
    for a number of reasons: It involves quick, small motions that are often highly
    coarticulated; it exhibits significant variation between signers; and there has
    been a dearth of continuous fingerspelling data collected. In this work we
    collect and annotate a new data set of continuous fingerspelling videos,
    compare several types of recognizers, and explore the problem of signer
    variation. Our best-performing models are segmental (semi-Markov) conditional
    random fields using deep neural network-based features. In the signer-dependent
    setting, our recognizers achieve up to about 92% letter accuracy. The
    multi-signer setting is much more challenging, but with neural network
    adaptation we achieve up to 83% letter accuracies in this setting.

    Pointer Sentinel Mixture Models

    Stephen Merity, Caiming Xiong, James Bradbury, Richard Socher
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    Recent neural network sequence models with softmax classifiers have achieved
    their best language modeling performance only with very large hidden states and
    large vocabularies. Even then they struggle to predict rare or unseen words
    even if the context makes the prediction unambiguous. We introduce the pointer
    sentinel mixture architecture for neural sequence models which has the ability
    to either reproduce a word from the recent context or produce a word from a
    standard softmax classifier. Our pointer sentinel-LSTM model achieves state of
    the art language modeling performance on the Penn Treebank (70.9 perplexity)
    while using far fewer parameters than a standard softmax LSTM. In order to
    evaluate how well language models can exploit longer contexts and deal with
    more realistic vocabularies and larger corpora we also introduce the freely
    available WikiText corpus.

    A Factorized Model for Transitive Verbs in Compositional Distributional Semantics

    Lilach Edelstein, Roi Reichart
    Subjects: Computation and Language (cs.CL)

    We present a factorized compositional distributional semantics model for the
    representation of transitive verb constructions. Our model first produces
    (subject, verb) and (verb, object) vector representations based on the
    similarity of the nouns in the construction to each of the nouns in the
    vocabulary and the tendency of these nouns to take the subject and object roles
    of the verb. These vectors are then combined into a final (subject,verb,object)
    representation through simple vector operations. On two established tasks for
    the transitive verb construction our model outperforms recent previous work.

    Lattice-Based Recurrent Neural Network Encoders for Neural Machine Translation

    Jinsong Su, Zhixing Tan, Deyi Xiong, Yang Liu
    Subjects: Computation and Language (cs.CL)

    Neural machine translation (NMT) heavily relies on word level modelling to
    learn semantic representations of input sentences. However, for languages
    without natural word delimiters (e.g., Chinese) where input sentences have to
    be tokenized first, conventional NMT is confronted with two issues: 1) it is
    difficult to find an optimal tokenization granularity for source sentence
    modelling, and 2) errors in 1-best tokenizations may propagate to the encoder
    of NMT. To handle these issues, we propose word-lattice based Recurrent Neural
    Network (RNN) encoders for NMT, which generalize the standard RNN to word
    lattice topology. The proposed encoders take as input a word lattice that
    compactly encodes multiple tokenizations, and learn to generate new hidden
    states from arbitrarily many inputs and hidden states in preceding time steps.
    As such, the word-lattice based encoders not only alleviate the negative impact
    of tokenization errors but also are more expressive and flexible to embed input
    sentences. Experimental results on Chinese-English translation demonstrate
    the superiority of the proposed encoders over the conventional encoder.

    Large-Scale Machine Translation between Arabic and Hebrew: Available Corpora and Initial Results

    Yonatan Belinkov, James Glass
    Comments: SeMaT 2016
    Subjects: Computation and Language (cs.CL)

    Machine translation between Arabic and Hebrew has so far been limited by a
    lack of parallel corpora, despite the political and cultural importance of this
    language pair. Previous work relied on manually-crafted grammars or pivoting
    via English, both of which are unsatisfactory for building a scalable and
    accurate MT system. In this work, we compare standard phrase-based and neural
    systems on Arabic-Hebrew translation. We experiment with tokenization by
    external tools and sub-word modeling by character-level neural models, and show
    that both methods lead to improved translation performance, with a small
    advantage to the neural models.

    The distribution of information content in English sentences

    Shuiyuan Yu, Jin Cong, Junying Liang, Haitao Liu
    Subjects: Computation and Language (cs.CL)

    The sentence is a basic linguistic unit; however, little is known about how
    information content is distributed across different positions of a sentence.
    Based on authentic language data of English, the present study calculated the
    entropy and other entropy-related statistics for different sentence positions.
    The statistics indicate a three-step staircase-shaped distribution pattern,
    with entropy in the initial position lower than the medial positions (positions
    other than the initial and final), the medial positions lower than the final
    position and the medial positions showing no significant difference. The
    results suggest that: (1) the hypotheses of Constant Entropy Rate and Uniform
    Information Density do not hold for the sentence-medial positions; (2) the
    context of a word in a sentence should not be simply defined as all the words
    preceding it in the same sentence; and (3) the contextual information content
    in a sentence does not accumulate incrementally but follows a pattern of “the
    whole is greater than the sum of parts”.
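
    The per-position statistic at the center of the study is straightforward to
    compute: group words by their position index and take the Shannon entropy of
    each position's word distribution. A toy sketch:

```python
import math
from collections import Counter

def positional_entropy(sentences):
    """Shannon entropy (bits) of the word distribution observed at each
    sentence position, over all sentences long enough to have that position."""
    max_len = max(len(s) for s in sentences)
    ents = []
    for pos in range(max_len):
        counts = Counter(s[pos] for s in sentences if len(s) > pos)
        total = sum(counts.values())
        ents.append(-sum(c / total * math.log2(c / total)
                         for c in counts.values()))
    return ents

corpus = [["the", "cat", "sat"], ["the", "dog", "ran"],
          ["a", "cat", "ran"], ["the", "dog", "sat"]]
ents = positional_entropy(corpus)
```

    On this toy corpus the initial position (dominated by "the") has lower
    entropy than the later positions; the paper computes the same kind of
    statistic over authentic English data.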

    Existence of Hierarchies and Human's Pursuit of Top Hierarchy Lead to Power Law

    Shuiyuan Yu, Junying Liang, Haitao Liu
    Subjects: Computation and Language (cs.CL); Physics and Society (physics.soc-ph)

    The power law is ubiquitous in natural and social phenomena, and is
    considered as a universal relationship between the frequency and its rank for
    diverse social systems. However, a general model is still lacking to interpret
    why these seemingly unrelated systems share great similarity. Through a
    detailed analysis of natural language texts and simulation experiments based on
    the proposed ‘Hierarchical Selection Model’, we found that the existence of
    hierarchies and human’s pursuit of top hierarchy lead to the power law.
    Further, the power law is a statistical and emergent performance of
    hierarchies, and it is the universality of hierarchies that contributes to the
    ubiquity of the power law.
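
    The rank-frequency relationship in question is usually checked by fitting a
    line to log(frequency) versus log(rank); a slope near -1 indicates a
    Zipf-like power law. A small sketch (the paper's 'Hierarchical Selection
    Model' simulation itself is not reproduced):

```python
import math
from collections import Counter

def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank); a power law
    appears as a straight line in log-log coordinates."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Synthetic Zipfian data: word r occurs ~100/r times
tokens = [f"w{r}" for r in range(1, 6) for _ in range(100 // r)]
slope = zipf_slope(tokens)   # close to -1
```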

    An Investigation of Recurrent Neural Architectures for Drug Name Recognition

    Raghavendra Chalapathy, Ehsan Zare Borzeshi, Massimo Piccardi
    Comments: Accepted for Oral Presentation at LOUHI 2016 : EMNLP 2016 Workshop – The Seventh International Workshop on Health Text Mining and Information Analysis (LOUHI 2016)
    Subjects: Computation and Language (cs.CL)

    Drug name recognition (DNR) is an essential step in the Pharmacovigilance
    (PV) pipeline. DNR aims to find drug name mentions in unstructured biomedical
    texts and classify them into predefined categories. State-of-the-art DNR
    approaches heavily rely on hand crafted features and domain specific resources
    which are difficult to collect and tune. For this reason, this paper
    investigates the effectiveness of contemporary recurrent neural architectures –
    the Elman and Jordan networks and the bidirectional LSTM with CRF decoding – at
    performing DNR straight from the text. The experimental results achieved on the
    authoritative SemEval-2013 Task 9.1 benchmarks show that the bidirectional
    LSTM-CRF ranks closely to highly-dedicated, hand-crafted systems.

    A Character-level Convolutional Neural Network for Distinguishing Similar Languages and Dialects

    Yonatan Belinkov, James Glass
    Comments: DSL 2016
    Subjects: Computation and Language (cs.CL)

    Discriminating between closely-related language varieties is considered a
    challenging and important task. This paper describes our submission to the DSL
    2016 shared-task, which included two sub-tasks: one on discriminating similar
    languages and one on identifying Arabic dialects. We developed a
    character-level neural network for this task. Given a sequence of characters,
    our model embeds each character in vector space, runs the sequence through
    multiple convolutions with different filter widths, and pools the convolutional
    representations to obtain a hidden vector representation of the text that is
    used for predicting the language or dialect. We primarily focused on the Arabic
    dialect identification task and obtained an F1 score of 0.4834, ranking 6th out
    of 18 participants. We also analyze errors made by our system on the Arabic
    data in some detail, and point to challenges that such an approach faces.

    Distilling an Ensemble of Greedy Dependency Parsers into One MST Parser

    Adhiguna Kuncoro, Miguel Ballesteros, Lingpeng Kong, Chris Dyer, Noah A. Smith
    Comments: 10 pages. To appear at EMNLP 2016
    Subjects: Computation and Language (cs.CL)

    We introduce two first-order graph-based dependency parsers achieving a new
    state of the art. The first is a consensus parser built from an ensemble of
    independently trained greedy LSTM transition-based parsers with different
    random initializations. We cast this approach as minimum Bayes risk decoding
    (under the Hamming cost) and argue that weaker consensus within the ensemble is
    a useful signal of difficulty or ambiguity. The second parser is a
    “distillation” of the ensemble into a single model. We train the distillation
    parser using a structured hinge loss objective with a novel cost that
    incorporates ensemble uncertainty estimates for each possible attachment,
    thereby avoiding the intractable cross-entropy computations required by
    applying standard distillation objectives to problems with structured outputs.
    The first-order distillation parser matches or surpasses the state of the art
    on English, Chinese, and German.
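
    Under Hamming cost, the MBR consensus over head attachments reduces
    (ignoring the tree constraint, which the paper's first-order graph decoder
    enforces) to a per-word plurality vote, with the vote margin serving as the
    ambiguity signal mentioned above. A minimal sketch:

```python
from collections import Counter

def consensus_heads(ensemble):
    """Per-word plurality vote over head predictions from K parsers.
    `ensemble[k][i]` is the head chosen for word i by parser k. Returns the
    voted heads and each vote's margin (a weak margin signals an ambiguous
    attachment). The tree constraint is ignored in this sketch."""
    K, n = len(ensemble), len(ensemble[0])
    heads, margins = [], []
    for i in range(n):
        votes = Counter(parse[i] for parse in ensemble)
        head, count = votes.most_common(1)[0]
        heads.append(head)
        margins.append(count / K)
    return heads, margins

# Three greedy parsers agree on words 0 and 1, disagree on word 2
heads, margins = consensus_heads([[0, 1, 1], [0, 1, 2], [0, 1, 1]])
```

    The per-attachment margins are exactly the kind of ensemble uncertainty
    estimate that the distillation cost in the abstract builds on.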

    Speaker Recognition for Children's Speech

    Saeid Safavi, Maryam Najafian, Abualsoud Hanani, Martin J Russell, Peter Jancovic, Michael J Carey
    Comments: INTERSPEECH 2012, Pages 1836-1839
    Subjects: Sound (cs.SD); Computation and Language (cs.CL)

    This paper presents results on Speaker Recognition (SR) for children’s
    speech, using the OGI Kids corpus and GMM-UBM and GMM-SVM SR systems. Regions
    of the spectrum containing important speaker information for children are
    identified by conducting SR experiments over 21 frequency bands. As for adults,
    the spectrum can be split into four regions, with the first (containing primary
    vocal tract resonance information) and third (corresponding to high frequency
    speech sounds) being most useful for SR. However, the frequencies at which
    these regions occur are from 11% to 38% higher for children. It is also noted
    that subband SR rates are lower for younger children. Finally results are
    presented of SR experiments to identify a child in a class (30 children,
    similar age) and school (288 children, varying ages). Class performance depends
    on age, with accuracy varying from 90% for young children to 99% for older
    children. The identification rate achieved for a child in a school is 81%.


    Distributed, Parallel, and Cluster Computing

    Solving Batched Linear Programs on GPU and Multicore CPU

    Amit Gurung, Rajarshi Ray
    Comments: contains 31 pages in double line spacing, 11 figures with 2 tables. A preliminary work has been accepted in the 8th IEEE International Student Research Symposium on High Performance Computing, HiPC’2015, Bangalore, December 16-19, 2015. this http URL
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Linear Programs (LPs) appear in a large number of applications, and offloading
    them to the GPU is a viable way to gain performance. Existing work on offloading
    and solving an LP on the GPU suggests that performance gains come from
    large-sized LPs (typically 500 constraints and 500 variables or more). In order to gain
    performance from GPU for applications involving small to medium sized LPs, we
    propose batched solving of a large number of LPs in parallel. In this paper, we
    present the design and CUDA implementation of our batched LP solver library,
    keeping memory coalescent access, reduced CPU-GPU memory transfer latency and
    load balancing as the goals. The performance of the batched LP solver is
    compared against sequential solving in the CPU using an open source solver GLPK
    (GNU Linear Programming Kit). The performance is evaluated for three types of
    LPs. The first type is the initial basic solution as feasible, the second type
    is the initial basic solution as infeasible and the third type is the feasible
    region as a Hyperbox. For the first type, we show a maximum speedup of
    $18.3 imes$ when running a batch of $50k$ LPs of size $100$ ($100$ variables,
    $100$ constraints). For the second type, a maximum speedup of $12 imes$ is
    obtained with a batch of $10k$ LPs of size $200$. For the third type, we show a
    significant speedup of $63 imes$ in solving a batch of nearly $4$ million LPs
    of size 5 and $34 imes$ in solving 6 million LPs of size $28$. In addition, we
    show that the open source library for solving linear programs-GLPK, can be
    easily extended to solve many LPs in parallel with multi-threading. The thread
    parallel GLPK implementation runs $9.6 imes$ faster in solving a batch of
    $1e5$ LPs of size $100$, on a $12$-core Intel Xeon processor. We demonstrate
    the application of our batched LP solver in the domain of state-space
    exploration of mathematical models of control systems design.
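
    The batching idea — many independent small LPs dispatched to parallel
    workers — can be sketched in plain Python. The naive vertex-enumeration
    kernel and thread pool below are stand-ins for the paper's CUDA simplex and
    thread-parallel GLPK, not reproductions of them:

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

def solve_lp_2d(c, A, b, eps=1e-9):
    """Naive solver for max c.x s.t. A x <= b, x >= 0 with 2 variables,
    by enumerating constraint-pair intersections (vertices). Assumes the
    LP is bounded; fine only for tiny illustrative problems."""
    rows = [list(r) for r in A] + [[-1.0, 0.0], [0.0, -1.0]]  # x, y >= 0
    rhs = list(b) + [0.0, 0.0]
    best = None
    for (a1, b1), (a2, b2) in itertools.combinations(zip(rows, rhs), 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < eps:
            continue  # parallel constraints: no vertex
        x = (b1 * a2[1] - a1[1] * b2) / det   # Cramer's rule
        y = (a1[0] * b2 - b1 * a2[0]) / det
        if all(r[0] * x + r[1] * y <= rb + eps for r, rb in zip(rows, rhs)):
            val = c[0] * x + c[1] * y
            if best is None or val > best[0]:
                best = (val, (x, y))
    return best  # None if infeasible

def solve_batch(lps, workers=4):
    # Solve many independent small LPs in parallel (the batching idea).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda lp: solve_lp_2d(*lp), lps))

# Batch of identical LPs: max x+y s.t. x <= 1, y <= 1 -> optimum 2 at (1,1).
batch = [([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])] * 8
results = solve_batch(batch)
```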

    Scalable Estimation of Precision Maps in a MapReduce Framework

    Claus Brenner
    Comments: ACM SIGSPATIAL’16, October 31-November 03, 2016, Burlingame, CA, USA
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Robotics (cs.RO)

    This paper presents a large-scale strip adjustment method for LiDAR mobile
    mapping data, yielding highly precise maps. It uses several concepts to achieve
    scalability. First, an efficient graph-based pre-segmentation is used, which
    directly operates on LiDAR scan strip data, rather than on point clouds.
    Second, observation equations are obtained from a dense matching, which is
    formulated in terms of an estimation of a latent map. As a result of this
    formulation, the number of observation equations is not quadratic, but rather
    linear in the number of scan strips. Third, the dynamic Bayes network, which
    results from all observation and condition equations, is partitioned into two
    sub-networks. Consequently, the estimation matrices for all position and
    orientation corrections are linear instead of quadratic in the number of
    unknowns and can be solved very efficiently using an alternating least squares
    approach. It is shown how this approach can be mapped to a standard key/value
    MapReduce implementation, where each of the processing nodes operates
    independently on small chunks of data, leading to essentially linear
    scalability. Results are demonstrated for a dataset of one billion measured
    LiDAR points and 278,000 unknowns, leading to maps with a precision of a few
    millimeters.

    Opportunistic Network Decoupling With Virtual Full-Duplex Operation in Multi-Source Interfering Relay Networks

    Won-Yong Shin, Vien V. Mai, Bang Chul Jung, Hyun Jong Yang
    Comments: 22 pages, 5 figures, To appear in IEEE Transactions on Mobile Computing
    Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)

    We introduce a new achievability scheme, termed opportunistic network
    decoupling (OND), operating in virtual full-duplex mode. In the scheme, a novel
    relay scheduling strategy is utilized in the $K \times N \times K$ channel with
    interfering relays, consisting of $K$ source–destination pairs and $N$
    half-duplex relays in-between them. A subset of relays using alternate relaying
    is opportunistically selected in terms of producing the minimum total
    interference level, thereby resulting in network decoupling. As our main
    result, it is shown that under a certain relay scaling condition, the OND
    protocol achieves $K$ degrees of freedom even in the presence of interfering
    links among relays. Numerical evaluation also validates the performance of the
    proposed OND protocol. Our protocol operates in a fully distributed fashion
    using only local channel state information, and is therefore relatively easy
    to implement.

    Benchmarking SciDB Data Import on HPC Systems

    Siddharth Samsi, Laura Brattain, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Vijay Gadepally, Michael Houle, Matthew Hubbell, Michael Jones, Anna Klein, Peter Michaleas, Lauren Milechin, Julie Mullen, Andrew Prout, Antonio Rosa, Charles Yee, Jeremy Kepner, Albert Reuther
    Comments: 5 pages, 4 figures, IEEE High Performance Extreme Computing (HPEC) 2016, best paper finalist
    Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF); Quantitative Methods (q-bio.QM)

    SciDB is a scalable, computational database management system that uses an
    array model for data storage. The array data model of SciDB makes it ideally
    suited for storing and managing large amounts of imaging data. SciDB is
    designed to support advanced in-database analytics, thus reducing the need for
    extracting data for analysis. It is designed to be massively parallel and can
    run on commodity hardware in a high performance computing (HPC) environment. In
    this paper, we present the performance of SciDB using simulated image data. The
    Dynamic Distributed Dimensional Data Model (D4M) software is used to implement
    the benchmark on a cluster running the MIT SuperCloud software stack. A peak
    performance of 2.2M database inserts per second was achieved on a single node
    of this system. We also show that SciDB and the D4M toolbox provide more
    efficient ways to access random sub-volumes of massive datasets compared to the
    traditional approaches of reading volumetric data from individual files. This
    work describes the D4M and SciDB tools we developed and presents the initial
    performance results. This performance was achieved by using parallel inserts,
    an in-database merging of arrays, and supercomputing techniques such as
    distributed arrays and single-program-multiple-data programming.


    Learning

    Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

    Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, Jeffrey Dean
    Subjects: Learning (cs.LG)

    Neural Machine Translation (NMT) is an end-to-end learning approach for
    automated translation, with the potential to overcome many of the weaknesses of
    conventional phrase-based translation systems. Unfortunately, NMT systems are
    known to be computationally expensive both in training and in translation
    inference. Also, most NMT systems have difficulty with rare words. These issues
    have hindered NMT’s use in practical deployments and services, where both
    accuracy and speed are essential. In this work, we present GNMT, Google’s
    Neural Machine Translation system, which attempts to address many of these
    issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder
    layers using attention and residual connections. To improve parallelism and
    therefore decrease training time, our attention mechanism connects the bottom
    layer of the decoder to the top layer of the encoder. To accelerate the final
    translation speed, we employ low-precision arithmetic during inference
    computations. To improve handling of rare words, we divide words into a limited
    set of common sub-word units (“wordpieces”) for both input and output. This
    method provides a good balance between the flexibility of “character”-delimited
    models and the efficiency of “word”-delimited models, naturally handles
    translation of rare words, and ultimately improves the overall accuracy of the
    system. Our beam search technique employs a length-normalization procedure and
    uses a coverage penalty, which encourages generation of an output sentence that
    is most likely to cover all the words in the source sentence. On the WMT’14
    English-to-French and English-to-German benchmarks, GNMT achieves results
    competitive with the state of the art. Using a human side-by-side evaluation on
    a set of
    isolated simple sentences, it reduces translation errors by an average of 60%
    compared to Google’s phrase-based production system.
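
    The length normalization mentioned above divides a hypothesis's
    log-probability by a penalty that grows with length, so beam search stops
    favoring short outputs. The sketch below uses the penalty form
    lp(Y) = ((5 + |Y|) / 6)^alpha reported in the GNMT paper; the candidate
    scores are made-up numbers:

```python
def length_penalty(length, alpha=0.6):
    # GNMT-style length normalization: lp(Y) = ((5 + |Y|) / 6)^alpha
    return ((5.0 + length) / 6.0) ** alpha

def normalized_score(log_prob, length, alpha=0.6):
    # Without normalization, beam search favors short outputs, since every
    # added token can only decrease the total log-probability.
    return log_prob / length_penalty(length, alpha)

# A longer hypothesis with a worse raw log-prob can win after normalization.
short = normalized_score(-4.0, 4)   # raw log-prob -4.0, 4 tokens
long_ = normalized_score(-4.5, 9)   # raw log-prob -4.5, 9 tokens
```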

    Dropout with Expectation-linear Regularization

    Xuezhe Ma, Yingkai Gao, Zhiting Hu, Yaoliang Yu, Yuntian Deng, Eduard Hovy
    Comments: Under review as a conference paper at ICLR 2017
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Dropout, a simple and effective way to train deep neural networks, has led to
    a number of impressive empirical successes and spawned many recent theoretical
    investigations. However, the gap between dropout’s training and inference
    phases, introduced due to tractability considerations, has largely remained
    under-appreciated. In this work, we first formulate dropout as a tractable
    approximation of some latent variable model, leading to a clean view of
    parameter sharing and enabling further theoretical analysis. Then, we introduce
    (approximate) expectation-linear dropout neural networks, whose inference gap
    we are able to formally characterize. Algorithmically, we show that our
    proposed measure of the inference gap can be used to regularize the standard
    dropout training objective, resulting in an \emph{explicit} control of the gap.
    Our method is as simple and efficient as standard dropout. We further prove
    upper bounds on the loss in accuracy due to expectation-linearization, and
    describe classes of input distributions that expectation-linearize easily. Experiments
    on three image classification benchmark datasets demonstrate that reducing the
    inference gap can indeed improve the performance consistently.
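
    The inference gap in question can be estimated empirically for a toy
    network: compare the Monte Carlo average of dropout-masked outputs with the
    single deterministic forward pass used at test time. A stdlib-only sketch
    with a hypothetical one-unit tanh network (not the paper's setup):

```python
import math
import random

def net(x, w):
    # One tanh unit: the nonlinearity is what makes the expectation of
    # masked outputs differ from the output of the expected (unmasked) input.
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))

def dropout_mask(n, keep, rng):
    # Inverted dropout: keep with prob `keep`, scale kept units by 1/keep.
    return [(1.0 / keep) if rng.random() < keep else 0.0 for _ in range(n)]

def inference_gap(x, w, keep=0.5, samples=20000, seed=0):
    """Monte Carlo estimate of | E_mask[ f(mask * x) ] - f(x) |, i.e. the
    gap between dropout training-time expectation and standard inference."""
    rng = random.Random(seed)
    mc = 0.0
    for _ in range(samples):
        m = dropout_mask(len(x), keep, rng)
        mc += net([mi * xi for mi, xi in zip(m, x)], w)
    mc /= samples
    return abs(mc - net(x, w))

gap = inference_gap([1.0, -0.5, 2.0], [0.3, 0.8, -0.2])
```

    An exactly expectation-linear network would drive this gap to zero; the
    paper's regularizer penalizes it during training.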

    Information-Theoretic Methods for Planning and Learning in Partially Observable Markov Decision Processes

    Roy Fox
    Subjects: Learning (cs.LG)

    Bounded agents are limited by intrinsic constraints on their ability to
    process information that is available in their sensors and memory and choose
    actions and memory updates. In this dissertation, we model these constraints as
    information-rate constraints on communication channels connecting these various
    internal components of the agent.

    We make four major contributions detailed below and many smaller
    contributions detailed in each section. First, we formulate the problem of
    optimizing the agent under both extrinsic and intrinsic constraints and develop
    the main tools for solving it. Second, we identify another reason for the
    challenging convergence properties of the optimization algorithm, which is the
    bifurcation structure of the update operator near phase transitions. Third, we
    study the special case of linear-Gaussian dynamics and quadratic cost (LQG),
    where the optimal solution has a particularly simple and solvable form. Fourth,
    we explore the learning task, where the model of the world dynamics is unknown
    and sample-based updates are used instead.

    Derivative Delay Embedding: Online Modeling of Streaming Time Series

    Zhifei Zhang, Yang Song, Wei Wang, Hairong Qi
    Comments: Accepted by The 25th ACM International Conference on Information and Knowledge Management (CIKM 2016)
    Subjects: Learning (cs.LG)

    The staggering amount of streaming time series coming from the real world
    calls for more efficient and effective online modeling solutions. Most
    existing work on time series modeling makes unrealistic assumptions, such as
    requiring input data of fixed length or good alignment, which demands extra
    effort to segment or normalize the raw streaming data. Approaches that claim
    to be invariant to data length and misalignment are typically too
    time-consuming to model a streaming time series in an online manner. We
    propose a novel and more practical online modeling and classification scheme,
    DDE-MGM, which makes no assumptions about the time series while maintaining
    high efficiency and state-of-the-art performance. The derivative delay
    embedding (DDE) is developed to incrementally transform time series into the
    embedding space, where the intrinsic characteristics of the data are
    preserved as recursive patterns regardless of stream length and misalignment.
    Then, a non-parametric Markov geographic model (MGM) is proposed
    to both model and classify the pattern in an online manner. Experimental
    results demonstrate the effectiveness and superior classification accuracy of
    the proposed DDE-MGM in an online setting as compared to the state-of-the-art.
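
    The embedding step can be illustrated with a classical delay embedding
    built on first differences — a simplified stand-in for the paper's DDE; the
    dimension, delay, and differencing scheme here are illustrative assumptions:

```python
def derivative_delay_embedding(stream, dim=3, delay=2):
    """Map a scalar stream into an embedding space built from its first
    differences (a sketch of the delay-embedding idea; the paper's DDE
    details may differ). Returns one embedded point per sample once the
    window is warm, so it can run incrementally over a stream."""
    diffs = [b - a for a, b in zip(stream, stream[1:])]
    span = (dim - 1) * delay
    points = []
    for t in range(span, len(diffs)):
        points.append(tuple(diffs[t - k * delay] for k in range(dim)))
    return points

# A linear ramp has a constant derivative, so every embedded point coincides,
# regardless of where the stream starts or how long it is.
pts = derivative_delay_embedding([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
```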

    Grounding object perception in a naive agent's sensorimotor experience

    Alban Laflaquière, Nikolas Hemion
    Comments: 7 pages, 4 figures, ICDL-Epirob 2015 conference
    Subjects: Robotics (cs.RO); Learning (cs.LG)

    Artificial object perception usually relies on a priori defined models and
    feature extraction algorithms. We study how the concept of object can be
    grounded in the sensorimotor experience of a naive agent. Without any knowledge
    about itself or the world it is immersed in, the agent explores its
    sensorimotor space and identifies objects as consistent networks of
    sensorimotor transitions, independent from their context. A fundamental drive
    for prediction is assumed to explain the emergence of such networks from a
    developmental standpoint. An algorithm is proposed and tested to illustrate the
    approach.

    Deep Structured Features for Semantic Segmentation

    Michael Tschannen, Lukas Cavigelli, Fabian Mentzer, Thomas Wiatowski, Luca Benini
    Comments: 10 pages, 2 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We propose a highly structured neural network architecture for semantic
    segmentation of images that combines i) a Haar wavelet-based tree-like
    convolutional neural network (CNN), ii) a random layer realizing a radial basis
    function kernel approximation, and iii) a linear classifier. While stages i)
    and ii) are completely pre-specified, only the linear classifier is learned
    from data. Thanks to its high degree of structure, our architecture has a very
    small memory footprint and thus fits onto low-power embedded and mobile
    platforms. We apply the proposed architecture to outdoor scene and aerial image
    semantic segmentation and show that the accuracy of our architecture is
    competitive with conventional pixel classification CNNs. Furthermore, we
    demonstrate that the proposed architecture is data efficient in the sense of
    matching the accuracy of pixel classification CNNs when trained on a much
    smaller data set.

    Random Forest for Malware Classification

    Felan Carlo C. Garcia, Felix P. Muga II
    Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)

    The challenge in countering malware activities involves the correct
    identification and classification of different malware variants. Many malware
    variants incorporate code obfuscation methods that alter their code
    signatures, effectively defeating anti-malware detection techniques that rely
    on static methods and signature databases. In this study, we convert malware
    binaries into images and use Random Forest to classify various malware
    families. The resulting accuracy of 0.9562 demonstrates the effectiveness of
    the method in detecting malware.

    Accurate and Efficient Hyperbolic Tangent Activation Function on FPGA using the DCT Interpolation Filter

    Ahmed M. Abdelsalam, J.M. Pierre Langlois, F. Cheriet
    Comments: 8 pages, 6 figures, 5 tables, submitted for the 25th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (ISFPGA), 22-24 February 2017, California, USA
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Implementing an accurate and fast activation function with low cost is a
    crucial aspect to the implementation of Deep Neural Networks (DNNs) on FPGAs.
    We propose a high-accuracy approximation approach for the hyperbolic tangent
    activation function of artificial neurons in DNNs. It is based on the Discrete
    Cosine Transform Interpolation Filter (DCTIF). The proposed architecture
    combines simple arithmetic operations on stored samples of the hyperbolic
    tangent function and on input data. The proposed DCTIF implementation achieves
    two orders of magnitude greater precision than previous work while using the
    same or fewer computational resources. Various combinations of DCTIF parameters
    can be chosen to trade off the accuracy and complexity of the hyperbolic
    tangent function. In one case, the proposed architecture approximates the
    hyperbolic tangent activation function with a maximum error of $10^{-5}$ while
    requiring only 1.52 Kbits of memory and 57 LUTs of a Virtex-7 FPGA. We also
    discuss how the activation
    function accuracy affects the performance of DNNs in terms of their training
    and testing accuracies. We show that a high accuracy approximation can be
    necessary in order to maintain the same DNN training and testing performances
    realized by the exact function.
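
    A simpler cousin of the DCTIF scheme — stored tanh samples plus
    piecewise-linear interpolation, with odd symmetry and saturation — already
    shows how table size trades off against maximum error (a sketch only; the
    paper's interpolation filter and error figures differ):

```python
import math

def make_tanh_table(x_max=4.0, n=64):
    # Uniformly sampled tanh on [0, x_max]; odd symmetry handles x < 0.
    step = x_max / (n - 1)
    return [math.tanh(i * step) for i in range(n)], step

def tanh_approx(x, table, step, x_max=4.0):
    """Piecewise-linear interpolation between stored samples, saturating
    beyond x_max (where tanh is nearly flat anyway)."""
    s = -1.0 if x < 0 else 1.0
    x = min(abs(x), x_max)
    i = min(int(x / step), len(table) - 2)
    frac = x / step - i
    return s * (table[i] * (1 - frac) + table[i + 1] * frac)

table, step = make_tanh_table()
# Sweep [-5, 5] and record the worst-case approximation error.
max_err = max(abs(tanh_approx(x / 100.0, table, step) - math.tanh(x / 100.0))
              for x in range(-500, 501))
```

    With 64 stored samples the worst-case error of this linear scheme is on the
    order of 1e-3; the point of a higher-order interpolation filter like DCTIF
    is to reach far smaller errors from a comparably small table.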

    The RNN-ELM Classifier

    Athanasios Vlontzos
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    In this paper we examine learning methods combining the Random Neural
    Network, a biologically inspired neural network and the Extreme Learning
    Machine that achieve state of the art classification performance while
    requiring much shorter training time. The Random Neural Network is an
    integrate-and-fire computational model of a neural network whose mathematical
    structure
    permits the efficient analysis of large ensembles of neurons. An activation
    function is derived from the RNN and used in an Extreme Learning Machine. We
    compare the performance of this combination against the ELM with various
    activation functions, reduce the input dimensionality via PCA, and compare its
    performance against autoencoder-based versions of the RNN-ELM.

    Learning by Stimulation Avoidance: A Principle to Control Spiking Neural Networks Dynamics

    Lana Sinapayen, Atsushi Masumori, Takashi Ikegami
    Comments: 17 pages, 11 figures
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Learning based on networks of real neurons, and by extension biologically
    inspired models of neural networks, has yet to find general learning rules
    leading to widespread applications. In this paper, we argue for the existence
    of a principle allowing to steer the dynamics of a biologically inspired neural
    network. Using carefully timed external stimulation, the network can be driven
    towards a desired dynamical state. We term this principle “Learning by
    Stimulation Avoidance” (LSA). We demonstrate through simulation that the
    minimal sufficient conditions leading to LSA in artificial networks are also
    sufficient to reproduce learning results similar to those obtained in
    biological neurons by Shahaf and Marom [1]. We examine the mechanism’s basic
    dynamics in a reduced network, and demonstrate how it scales up to a network of
    100 neurons. We show that LSA has a higher explanatory power than existing
    hypotheses about the response of biological neural networks to external
    stimulation, and can be used as a learning rule for an embodied application:
    learning of wall avoidance by a simulated robot. The surge in popularity of
    artificial neural networks is mostly directed to disembodied models of neurons
    with biologically irrelevant dynamics: to the authors’ knowledge, this is the
    first work demonstrating sensory-motor learning with random spiking networks
    through pure Hebbian learning.

    Dynamic Pricing in High-dimensions

    Adel Javanmard, Hamid Nazerzadeh
    Comments: 32 pages
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We study the pricing problem faced by a firm that sells a large number of
    products, described via a wide range of features, to customers that arrive over
    time. This is motivated in part by the prevalence of online marketplaces that
    allow for real-time pricing. We propose a dynamic policy, called Regularized
    Maximum Likelihood Pricing (RMLP), that obtains asymptotically optimal revenue.
    Our policy leverages the structure (sparsity) of a high-dimensional demand
    space in order to obtain a logarithmic regret compared to the clairvoyant
    policy that knows the parameters of the demand in advance. More specifically,
    the regret of our algorithm is of $O(s_0 \log T (\log d + \log T))$, where $d$
    and $s_0$ correspond to the dimension of the demand space and its sparsity.
    Furthermore, we show that no policy can obtain regret better than $O(s_0 (\log
    d + \log T))$.

    Informative Planning and Online Learning with Sparse Gaussian Processes

    Kai-Chieh Ma, Lantao Liu, Gaurav S. Sukhatme
    Subjects: Robotics (cs.RO); Learning (cs.LG); Machine Learning (stat.ML)

    A big challenge in environmental monitoring is the spatiotemporal variation
    of the phenomena to be observed. To enable persistent sensing and estimation in
    such a setting, it is beneficial to have a time-varying underlying
    environmental model. Here we present a planning and learning method that
    enables an autonomous marine vehicle to perform persistent ocean monitoring
    tasks by learning and refining an environmental model. To alleviate the
    computational bottleneck caused by large-scale data accumulated, we propose a
    framework that iterates between a planning component aimed at collecting the
    most information-rich data, and a sparse Gaussian Process learning component
    where the environmental model and hyperparameters are learned online by taking
    advantage of only a subset of the data that provides the greatest contribution.
    Our simulations with ground-truth ocean data show that the proposed method is
    both accurate and efficient.

    A Tutorial on Distributed (Non-Bayesian) Learning: Problem, Algorithms and Results

    Angelia Nedić, Alex Olshevsky, César A. Uribe
    Comments: Tutorial Presented in CDC2016
    Subjects: Optimization and Control (math.OC); Learning (cs.LG); Multiagent Systems (cs.MA); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

    We overview some results on distributed learning with focus on a family of
    recently proposed algorithms known as non-Bayesian social learning. We consider
    different approaches to the distributed learning problem and its algorithmic
    solutions for the case of finitely many hypotheses. The original centralized
    problem is discussed at first, and then followed by a generalization to the
    distributed setting. The results on convergence and convergence rate are
    presented for both asymptotic and finite time regimes. Various extensions are
    discussed such as those dealing with directed time-varying networks, Nesterov’s
    acceleration technique, and continuum sets of hypotheses.
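
    One canonical non-Bayesian social learning update covered in this setting
    is log-linear: each agent geometrically averages its neighbors' beliefs
    (using the rows of a stochastic weight matrix) and reweights by its private
    likelihood. A small sketch, with a made-up network and likelihoods:

```python
import math

def social_learning_step(beliefs, weights, likelihoods):
    """One round of log-linear non-Bayesian social learning.
    beliefs[i][k]: agent i's belief in hypothesis k;
    weights[i][j]: influence of agent j on agent i (stochastic matrix rows);
    likelihoods[i][k]: likelihood of agent i's latest private signal."""
    n, m = len(beliefs), len(beliefs[0])
    new = []
    for i in range(n):
        logs = [sum(weights[i][j] * math.log(beliefs[j][k]) for j in range(n))
                + math.log(likelihoods[i][k]) for k in range(m)]
        z = max(logs)                       # normalize in log-space
        unnorm = [math.exp(l - z) for l in logs]
        s = sum(unnorm)
        new.append([u / s for u in unnorm])
    return new

# Two agents, two hypotheses; both signals favor hypothesis 0 on average.
beliefs = [[0.5, 0.5], [0.5, 0.5]]
W = [[0.5, 0.5], [0.5, 0.5]]
L = [[0.8, 0.2], [0.6, 0.4]]
for _ in range(10):
    beliefs = social_learning_step(beliefs, W, L)
```

    After a few rounds both agents concentrate on the hypothesis favored by the
    pooled evidence, which is the qualitative convergence behavior the tutorial
    analyzes.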

    A Rotation Invariant Latent Factor Model for Moveme Discovery from Static Poses

    Matteo Ruggero Ronchi, Joon Sik Kim, Yisong Yue
    Comments: Long version of the paper accepted at the IEEE ICDM 2016 conference. 10 pages, 9 figures, 1 table. Project page: this http URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We tackle the problem of learning a rotation invariant latent factor model
    when the training data is comprised of lower-dimensional projections of the
    original feature space. The main goal is the discovery of a set of 3-D bases
    poses that can characterize the manifold of primitive human motions, or
    movemes, from a training set of 2-D projected poses obtained from still images
    taken at various camera angles. The proposed technique for basis discovery is
    data-driven rather than hand-designed. The learned representation is rotation
    invariant, and can reconstruct any training instance from multiple viewing
    angles. We apply our method to modeling human poses in sports (via the Leeds
    Sports Dataset), and demonstrate the effectiveness of the learned bases in a
    range of applications such as activity classification, inference of dynamics
    from a single frame, and synthetic representation of movements.


    Information Theory

    The Capacity of Private Information Retrieval from Coded Databases

    Karim Banawan, Sennur Ulukus
    Comments: Submitted to IEEE Transactions on Information Theory, September 2016
    Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)

    We consider the problem of private information retrieval (PIR) over a
    distributed storage system. The storage system consists of $N$ non-colluding
    databases, each storing a coded version of $M$ messages. In the PIR problem,
    the user wishes to retrieve one of the available messages without revealing the
    message identity to any individual database. We derive the
    information-theoretic capacity of this problem, which is defined as the maximum
    number of bits of the desired message that can be privately retrieved per one
    bit of downloaded information. We show that the PIR capacity in this case is
    $C=\left(1+\frac{K}{N}+\frac{K^2}{N^2}+\cdots+\frac{K^{M-1}}{N^{M-1}}\right)^{-1}=(1+R_c+R_c^2+\cdots+R_c^{M-1})^{-1}=\frac{1-R_c}{1-R_c^M}$,
    where $R_c$ is the rate of the $(N,K)$ code used. The capacity is a function of
    the code rate and the number of messages only, regardless of the explicit
    structure of the storage code. The result implies a fundamental tradeoff
    between the optimal retrieval cost and the storage cost. The result generalizes
    the achievability and converse results for the classical PIR with replicating
    databases to the case of coded databases.
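
    The capacity expression is a finite geometric series in the code rate,
    which is what yields the closed form. A quick numeric sanity check:

```python
def pir_capacity(N, K, M):
    """PIR capacity from the abstract, as a geometric series:
    C = (1 + R + R^2 + ... + R^{M-1})^{-1} with code rate R = K/N."""
    R = K / N
    return 1.0 / sum(R ** m for m in range(M))

def pir_capacity_closed(N, K, M):
    # Equivalent closed form C = (1 - R) / (1 - R^M), valid for R != 1.
    R = K / N
    return (1.0 - R) / (1.0 - R ** M)

# N=4 databases, (4,2) code, M=3 messages: C = (1 + 1/2 + 1/4)^{-1} = 4/7.
c1 = pir_capacity(4, 2, 3)
c2 = pir_capacity_closed(4, 2, 3)
```

    As the abstract notes, only the rate $R_c = K/N$ and the number of messages
    $M$ enter the formula, not the structure of the storage code.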

    Nearest-Neighbor and Contact Distance Distributions for Thomas Cluster Process

    Mehrnaz Afshang, Chiranjib Saha, Harpreet S. Dhillon
    Subjects: Information Theory (cs.IT)

    We characterize the statistics of the nearest-neighbor and contact distance
    distributions for the Thomas cluster process (TCP), a special case of the
    Poisson cluster process. In particular, we derive the cumulative distribution
    function (CDF) of the distance to the nearest point of TCP from a reference
    point for three different cases: (i) reference point is not a part of the point
    process, (ii) it is chosen uniformly at random from the TCP, and (iii) it is a
    randomly chosen point from a cluster chosen uniformly at random from the TCP.
    While the first corresponds to the contact distance distribution, the other two
    provide two different viewpoints for the nearest-neighbor distance
    distribution.
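
    The contact distance (case (i) above) is easy to probe by simulation:
    sample Poisson parents, scatter Gaussian daughters around them, and measure
    the distance from a reference origin to the nearest point. A stdlib sketch
    with illustrative parameters:

```python
import math
import random

def poisson(lam, rng):
    # Knuth's inversion sampler; adequate for the moderate means used here.
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def sample_tcp(rng, lam_parent=0.1, mean_daughters=5, sigma=0.3, half=5.0):
    """One realization of a Thomas cluster process on [-half, half]^2:
    Poisson parents, each with a Poisson number of Gaussian-scattered
    daughters; the daughters are the points of the process."""
    pts = []
    area = (2 * half) ** 2
    for _ in range(poisson(lam_parent * area, rng)):
        px, py = rng.uniform(-half, half), rng.uniform(-half, half)
        for _ in range(poisson(mean_daughters, rng)):
            pts.append((px + rng.gauss(0, sigma), py + rng.gauss(0, sigma)))
    return pts

def contact_distance(rng):
    # Distance from the origin (a reference point NOT in the process) to the
    # nearest point of the realization -- the contact distance of case (i).
    pts = sample_tcp(rng)
    return min(math.hypot(x, y) for x, y in pts) if pts else float("inf")

rng = random.Random(42)
samples = sorted(contact_distance(rng) for _ in range(500))
median_cd = samples[250]  # empirical median of the contact distance
```

    Sorting the samples gives an empirical CDF that can be compared against the
    closed-form CDFs derived in the paper.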

    Power Talk for Multibus DC MicroGrids: Creating and Optimizing Communication Channels

    Marko Angjelichinoski, Cedomir Stefanovic, Petar Popovski
    Comments: Shorter version of a paper accepted for presentation at GLOBECOM 2016
    Subjects: Information Theory (cs.IT)

    We study a communication framework for nonlinear multibus DC MicroGrids based
    on a deliberate modification of the parameters of the primary control and
    termed power talk. We assess the case in which the information is modulated in
    the deviations of reference voltages of the primary control loops and show that
    the outputs of the power talk communication channels can be approximated
    through linear combinations of the respective inputs. We show that the
    coefficients of the linear combinations, representing equivalent channel gains,
    depend on the virtual resistances of the primary control loops, implying that
    they can be modified such that effective received signal-to-noise ratio (SNR)
    is increased. On the other hand, we investigate the constraints that power talk
    incurs on the supplied power deviations. We show that these constraints
    translate into constraints on the reference voltages and virtual resistances
    that are imposed on all units in the system. In this regard, we develop an
    optimization approach to find the set of controllable virtual resistances that
    maximize SNR under the constraints on the supplied power deviations.

    Metrics based on Finite Directed Graphs

    Tuvi Etzion, Marcelo Firer, Roberto Assis Machado
    Subjects: Information Theory (cs.IT); Discrete Mathematics (cs.DM)

    Given a finite directed graph with $n$ vertices, we define a metric $d_G$
    over $\mathbb{F}_q^n$, where the weight of a word is the number of vertices
    that can be reached by a directed path starting at the support of the vector.
    Two canonical forms, which do not affect the metric, are given to each graph.
    Based on these forms we characterize each such metric. We further use these
    forms to prove that two graphs with different canonical forms yield different
    metrics. Efficient algorithms to check if a set of metric weights define a
    metric based on a graph are given. We provide tight bounds on the number of
    metric weights required to reconstruct the metric. Furthermore, we give a
    complete description of the group of linear isometries of the graph metrics and
    a characterization of the graphs for which every linear code admits a
    $G$-canonical decomposition. Considering those graphs, we are able to derive an
    expression of the packing radius of linear codes in such metric spaces.
    Finally, given a directed graph which determines a hierarchical poset, we
    present sufficient and necessary conditions to ensure the validity of the
    MacWilliams Identity and the MacWilliams Extension Property.
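    The weight definition above is easy to prototype: the weight of a word is
    the size of the reachability closure of its support. A minimal sketch in
    Python, assuming the convention that a support vertex reaches itself via
    the empty path:

    ```python
    from collections import deque

    def graph_weight(support, adj):
        """Weight of a word: number of vertices reachable by a directed
        path starting at the word's support (the support itself counts,
        via the empty path)."""
        seen = set(support)
        queue = deque(support)
        while queue:
            u = queue.popleft()
            for v in adj.get(u, []):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        return len(seen)

    def graph_distance(x, y, adj):
        """d_G(x, y) = weight of x - y; the support of the difference is
        simply the set of coordinates where the two words differ."""
        support = [i for i, (a, b) in enumerate(zip(x, y)) if a != b]
        return graph_weight(support, adj)

    # Example: a 3-vertex directed path 0 -> 1 -> 2.
    adj = {0: [1], 1: [2], 2: []}
    print(graph_weight([0], adj))                      # 3: vertex 0 reaches {0, 1, 2}
    print(graph_distance([1, 0, 0], [1, 0, 1], adj))   # 1: support {2} reaches only itself
    ```

    The same code works over any $\mathbb{F}_q$, since only the coordinates
    where the words differ matter.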

    Antenna Selection for MIMO-NOMA Networks

    Yuehua Yu, He Chen, Yonghui Li, Zhiguo Ding, Branka Vucetic
    Comments: Submitted for possible conference publication
    Subjects: Information Theory (cs.IT)

    This paper considers the joint antenna selection (AS) problem for a classical
    two-user non-orthogonal multiple access (NOMA) network where both the base
    station and users are equipped with multiple antennas. Since the
    exhaustive-search-based optimal AS scheme is computationally prohibitive when
    the number of antennas is large, two computationally efficient joint AS
    algorithms, namely max-min-max AS (AIA-AS) and max-max-max AS (A$^3$-AS), are
    proposed to maximize the system sum-rate. Asymptotic closed-form
    expressions for the average sum-rates of AIA-AS and A$^3$-AS are derived
    in the high signal-to-noise ratio (SNR) regime. Numerical results
    demonstrate that both AIA-AS and A$^3$-AS can yield significant performance
    gains over comparable schemes. Furthermore, AIA-AS can provide better user
    fairness, while the A$^3$-AS scheme can achieve the near-optimal sum-rate
    performance.

    Function Computation through a Bidirectional Relay

    Jithin Ravi, Bikash Kumar Dey
    Comments: 32 pages, 6 figures
    Subjects: Information Theory (cs.IT)

    We consider a function computation problem in a three node wireless network.
    Nodes A and B observe two correlated sources $X$ and $Y$ respectively, and want
    to compute a function $f(X,Y)$. To achieve this, nodes A and B send messages to
    a relay node C at rates $R_A$ and $R_B$ respectively. The relay C then
    broadcasts a message to A and B at rate $R_C$. We allow block coding, and study
    the achievable region of rate triples under both zero-error and
    $\epsilon$-error. As a preparation, we first consider a broadcast network from
    the relay to A and B. A and B have side information $X$ and $Y$ respectively.
    The relay node C observes both $X$ and $Y$ and broadcasts an encoded message to
    A and B. We want to obtain the optimal broadcast rate such that A and B can
    recover the function $f(X,Y)$ from the received message and their individual
    side information $X$ and $Y$ respectively. For the special case of $f(X,Y) = X
    \oplus Y$, we show equivalence between $\epsilon$-error and zero-error
    computations; this gives a rate characterization for zero-error XOR
    computation in the broadcast network. As a corollary, this also gives a rate
    characterization for zero-error in the relay network when the support set of
    $p_{XY}$ is full. For the relay network, the zero-error rate region for
    arbitrary functions is characterized in terms of graph coloring of some
    suitably defined probabilistic graphs. We then give a single-letter inner bound
    to this rate region. Further, we extend the graph theoretic ideas to address
    the $\epsilon$-error problem and obtain a single-letter inner bound.

    Performance Comparison of Short-Length Error-Correcting Codes

    J. Van Wonterghem, A. Alloum, J. J. Boutros, M. Moeneclaey
    Comments: 6 pages, 5 figures, paper submitted to the IEEE SCVT 2016 conference
    Subjects: Information Theory (cs.IT)

    We compare the performance of short-length linear binary codes on the binary
    erasure channel and the binary-input Gaussian channel. We use a universal
    decoder that can decode any linear binary block code: Gaussian-elimination
    based Maximum-Likelihood decoder on the erasure channel and probabilistic
    Ordered Statistics Decoder on the Gaussian channel. As such we compare codes
    and not decoders. The word error rate versus the channel parameter is found for
    LDPC, Reed-Muller, Polar, and BCH codes at length 256 bits. BCH codes
    outperform other codes in absence of cyclic redundancy check. Under joint
    decoding, the concatenation of a cyclic redundancy check makes all codes
    perform very close to optimal lower bounds.
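    The erasure-channel side of this comparison is simple to sketch: on the
    BEC, maximum-likelihood decoding of any linear code reduces to solving the
    parity-check equations for the erased positions, which is exactly Gaussian
    elimination over GF(2). A minimal illustration (the parity-check matrix
    and received word below are toy examples, not the length-256 codes from
    the paper):

    ```python
    def ml_erasure_decode(H, y):
        """ML decoding of a binary linear code on the erasure channel:
        solve the parity checks for the erased positions by Gaussian
        elimination over GF(2).  y entries are 0, 1, or None (erasure)."""
        n = len(y)
        erased = [i for i in range(n) if y[i] is None]
        # Augmented system A * x_E = s over GF(2), one row per parity check.
        A = []
        for row in H:
            s = 0
            for i in range(n):
                if y[i] is not None:
                    s ^= row[i] & y[i]
            A.append([row[i] for i in erased] + [s])
        m = len(erased)
        r = 0
        for c in range(m):                      # eliminate column by column
            pivot = next((i for i in range(r, len(A)) if A[i][c]), None)
            if pivot is None:
                return None                     # erasure pattern not uniquely decodable
            A[r], A[pivot] = A[pivot], A[r]
            for i in range(len(A)):
                if i != r and A[i][c]:
                    A[i] = [a ^ b for a, b in zip(A[i], A[r])]
            r += 1
        x = list(y)
        for c in range(m):                      # row c of the reduced system solves x_E[c]
            x[erased[c]] = A[c][-1]
        return x

    # Length-3 repetition code: first bit received, two erasures recovered.
    H = [[1, 1, 0], [1, 0, 1]]
    print(ml_erasure_decode(H, [1, None, None]))  # [1, 1, 1]
    ```

    When the sub-matrix on the erased columns is rank deficient, the decoder
    reports a failure instead of a codeword, matching the ML behaviour on the
    erasure channel.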

    Role of Interference Alignment in Wireless Cellular Network Optimization

    Gokul Sridharan, Siyu Liu, Wei Yu
    Comments: 15 pages, accepted for publication
    Subjects: Information Theory (cs.IT)

    The emergence of interference alignment (IA) as a degrees-of-freedom optimal
    strategy motivates the need to investigate whether IA can be leveraged to aid
    conventional network optimization algorithms that are only capable of finding
    locally optimal solutions. To test the usefulness of IA in this context, this
    paper proposes a two-stage optimization framework for the downlink of a
    $G$-cell multi-antenna network with $K$ users/cell. The first stage of the
    proposed framework focuses on nulling interference from a set of dominant
    interferers using IA, while the second stage optimizes transmit and receive
    beamformers to maximize a network-wide utility using the IA solution as the
    initial condition. Further, this paper establishes a set of new feasibility
    results for partial IA that can be used to guide the number of dominant
    interferers to be nulled in the first stage. Through simulations on specific
    topologies of a cluster of base-stations, it is observed that the impact of IA
    depends on the choice of the utility function and the presence of
    out-of-cluster interference. In the absence of out-of-cluster interference, the
    proposed framework outperforms straightforward optimization when maximizing the
    minimum rate, while providing marginal gains when maximizing sum-rate. However,
    the benefit of IA is greatly diminished in the presence of significant
    out-of-cluster interference.

    Uplink Performance Analysis of Dense Cellular Networks with LoS and NLoS Transmissions

    Tian Ding, Ming Ding, Guoqiang Mao, Zihuai Lin, David Lopez-Perez, Albert Zomaya
    Subjects: Information Theory (cs.IT)

    In this paper, we analyse the coverage probability and the area spectral
    efficiency (ASE) for the uplink (UL) of dense small cell networks (SCNs)
    considering a practical path loss model incorporating both line-of-sight (LoS)
    and non-line-of-sight (NLoS) transmissions. Compared with the existing work, we
    adopt the following novel approaches in our study: (i) we assume a practical
    user association strategy (UAS) based on the smallest path loss, or
    equivalently the strongest received signal strength; (ii) we model the
    positions of both base stations (BSs) and the user equipments (UEs) as two
    independent Homogeneous Poisson point processes (HPPPs); and (iii) the
    correlation of BSs’ and UEs’ positions is considered, thus making our
    analytical results more accurate. The performance impact of LoS and NLoS
    transmissions on the ASE for the UL of dense SCNs is shown to be significant,
    both quantitatively and qualitatively, compared with existing work that does
    not differentiate LoS and NLoS transmissions. In particular, existing work
    predicted that a larger UL power compensation factor would always result in a
    better ASE in the practical range of BS density, i.e., 10^1-10^3 BSs/km^2.
    However, our results show that a smaller UL power compensation factor can
    greatly boost the ASE in the UL of dense SCNs, i.e., 10^2-10^3 BSs/km^2, while
    a larger UL power compensation factor is more suitable for sparse SCNs, i.e.,
    10^1-10^2 BSs/km^2.

    Friendship-based Cooperative Jamming for Secure Communication in Poisson Networks

    Yuanyu Zhang, Yulong Shen, Hua Wang, Xiaohong Jiang
    Subjects: Information Theory (cs.IT)

    Wireless networks with the consideration of social relationships among
    network nodes are highly appealing for lots of important data communication
    services. Ensuring the security of such networks is of great importance to
    facilitate their applications in supporting future social-based services with
    strong security guarantee. This paper explores the physical layer
    security-based secure communication in a finite Poisson network with social
    friendships among nodes, for which a social friendship-based cooperative
    jamming scheme is proposed. The jamming scheme consists of a Local Friendship
    Circle (LFC) and a Long-range Friendship Annulus (LFA), where all legitimate
    nodes in the LFC serve as jammers, but the legitimate nodes in the LFA are
    selected as jammers through three location-based policies. To understand both
    the security and reliability performance of the proposed jamming scheme, we
    first model the sum interference at any location in the network by deriving its
    Laplace transform under two typical path loss scenarios. With the help of the
    interference Laplace transform results, we then derive the exact expression for
    the transmission outage probability (TOP) and determine both the upper and
    lower bounds on the secrecy outage probability (SOP), such that the overall
    outage performances of the proposed jamming scheme can be depicted. Finally, we
    present extensive numerical results to validate the theoretical analysis of TOP
    and SOP and also to illustrate the impacts of the friendship-based cooperative
    jamming on the network performances.

    Semantic Information Measure with Two Types of Probability for Falsification and Confirmation

    Cheguang Lu
    Comments: 27 pages, 8 figures, 10 tables, a list of symbols
    Subjects: Information Theory (cs.IT); Logic (math.LO); Probability (math.PR); Statistics Theory (math.ST)

    Logical Probability (LP) is strictly distinguished from Statistical
    Probability (SP). To measure semantic information or confirm hypotheses, we
    need to use sampling distribution (conditional SP function) to test or confirm
    fuzzy truth function (conditional LP function). The Semantic Information
    Measure (SIM) proposed is compatible with Shannon’s information theory and
    Fisher’s likelihood method. It ensures that the smaller the LP of a
    predicate and the larger the truth value of the proposition, the more
    information there is, so the SIM can serve as Popper’s information
    criterion for falsification or testing. The SIM also allows us to optimize
    the truth value of counterexamples, or degrees of disbelief in a
    hypothesis, to obtain the optimized degree of belief, i.e., the Degree of
    Confirmation (DOC). To explain confirmation,
    this paper 1) provides the calculation method of the DOC of universal
    hypotheses; 2) discusses how to resolve Raven Paradox with new DOC and its
    increment; 3) derives the DOC of rapid HIV tests: DOC of
    test-positive=1-(1-specificity)/sensitivity, which is similar to Likelihood
    Ratio (=sensitivity/(1-specificity)) but has the upper limit 1; 4) discusses
    negative DOC for excessive affirmations, wrong hypotheses, or lies; and 5)
    discusses the DOC of general hypotheses with GPS as example.
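    The test-positive formulas quoted above are straightforward to check
    numerically; the sensitivity and specificity values below are illustrative
    placeholders, not figures from the paper:

    ```python
    def likelihood_ratio(sensitivity, specificity):
        """Classical positive likelihood ratio: unbounded as specificity -> 1."""
        return sensitivity / (1.0 - specificity)

    def doc_test_positive(sensitivity, specificity):
        """Degree of Confirmation of a positive test, as quoted in the
        abstract: 1 - (1 - specificity)/sensitivity, upper-bounded by 1."""
        return 1.0 - (1.0 - specificity) / sensitivity

    # Illustrative rapid-test figures (placeholders, not from the paper).
    sens, spec = 0.997, 0.985
    print(round(likelihood_ratio(sens, spec), 2))   # 66.47
    print(round(doc_test_positive(sens, spec), 4))  # 0.985
    ```

    Unlike the likelihood ratio, the DOC reaches its ceiling of exactly 1 when
    the specificity equals 1, which is the upper-limit property the abstract
    emphasizes.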

    The Exact Rate-Memory Tradeoff for Caching with Uncoded Prefetching

    Qian Yu, Mohammad Ali Maddah-Ali, A. Salman Avestimehr
    Subjects: Information Theory (cs.IT)

    We consider a basic cache network, in which a single server is connected to
    multiple users via a shared bottleneck link. The server has a database of a set
    of files (content). Each user has an isolated memory that can be used to cache
    content in a prefetching phase. In a following delivery phase, each user
    requests a file from the database and the server needs to deliver users’
    demands as efficiently as possible by taking into account their cache contents.
    We focus on an important and commonly used class of prefetching schemes, where
    the caches are filled with uncoded data. We provide the exact characterization
    of the rate-memory tradeoff for this problem, by deriving both the minimum
    average rate (for a uniform file popularity) and the minimum peak rate required
    on the bottleneck link for a given cache size available at each user. In
    particular, we propose a novel caching scheme, which strictly improves the
    state of the art by exploiting commonality among users’ demands. We then
    demonstrate the exact optimality of our proposed scheme through a matching
    converse, by dividing the set of all demands into types, and showing that the
    placement phase in the proposed caching scheme is universally optimal for all
    types. Using these techniques, we also fully characterize the rate-memory
    tradeoff for a decentralized setting, in which users fill out their cache
    content without any coordination.
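    For the centralized setting, the peak-rate side of this tradeoff is
    commonly reported in closed form at the integer cache points M = tN/K.
    The sketch below assumes the binomial-coefficient expression associated
    with this line of work and should be read as an illustration, not a
    restatement of the paper's results:

    ```python
    from math import comb

    def peak_rate_uncoded(K, N, t):
        """Peak delivery rate with uncoded prefetching at cache size
        M = t*N/K (integer t in 0..K).  The second binomial term removes
        multicast messages that become redundant when there are more
        users than files (K > N)."""
        Ne = min(N, K)
        return (comb(K, t + 1) - comb(K - Ne, t + 1)) / comb(K, t)

    # With at least as many files as users this reduces to the classic
    # (K - t)/(t + 1) uncoded-prefetching rate.
    K, N = 4, 4
    for t in range(K + 1):
        assert peak_rate_uncoded(K, N, t) == (K - t) / (t + 1)

    print(peak_rate_uncoded(4, 2, 1))  # 1.25: fewer files than users lowers the rate
    ```

    The correction term is exactly the "commonality among users' demands"
    effect: with fewer files than users, some demands coincide and the
    corresponding multicast messages can be dropped.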

    On Analytical and Geometric Lattice Design Criteria for Wiretap Coset Codes

    Alex Karrila, David Karpuk, Camilla Hollanti
    Comments: 25 pages, 6 figures
    Subjects: Information Theory (cs.IT)

    This paper considers physical layer security and the design of secure coset
    codes for wiretap channels, where information is not to be leaked to an
    eavesdropper having a degraded channel. The eavesdropper’s bounds for correct
    decoding probability and information are first revisited, and a new variant of
    the information bound is derived. The new bound is valid for a general channel
    with any fading and Gaussian noise. From these bounds, it is explicit that both
    the information and probability are upper bounded by the average flatness
    factor, i.e., the expected theta function of the faded lattice related to the
    eavesdropper. Taking the minimization of the average flatness factor as a
    design criterion, simple geometric heuristics to minimize it in the Gaussian
    and Rayleigh fast fading channels are motivated. It is concluded that in the
    Gaussian channel, the security boils down to the sphere packing density of the
    eavesdropper’s lattice, whereas in the Rayleigh fading channel a full-diversity
    well-rounded lattice with a dense sphere packing will provide the best secrecy.
    The proposed criteria are backed up by extensive numerical experiments.

    Simultaneous Spectrum Sensing and Data Reception for Cognitive Spatial Multiplexing Distributed Systems

    Nikolaos I. Miridakis, Theodoros A. Tsiftsis, George C. Alexandropoulos, Merouane Debbah
    Subjects: Information Theory (cs.IT)

    A multi-user cognitive (secondary) radio system is considered, where the
    spatial multiplexing mode of operation is implemented amongst the nodes, under
    the presence of multiple primary transmissions. The secondary receiver carries
    out minimum mean-squared error (MMSE) detection to effectively decode the
    secondary data streams, while it performs spectrum sensing at the remaining
    signal to capture the presence of primary activity or not. New analytical
    closed-form expressions regarding some important system measures are obtained,
    namely, the outage and detection probabilities; the transmission power of the
    secondary nodes; the probability of unexpected interference at the primary
    nodes; and the detection efficiency with the aid of the area under the
    receiver operating characteristic (ROC) curve. The realistic scenarios of
    time-varying channel fading and channel estimation errors are accounted for
    in the derived results. Finally, the presented numerical results verify the
    accuracy of the proposed framework, while some useful engineering insights are
    also revealed, such as the key role of the detection accuracy to the overall
    performance and the impact of transmission power from the secondary nodes to
    the primary system.

    Performance Impact of Idle Mode Capability on Dense Small Cell Networks with LoS and NLoS Transmissions

    Ming Ding, David Lopez-Perez, Guoqiang Mao, Zihuai Lin
    Comments: submitted to IEEE TWC
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    In dense small cell networks (SCNs), a large number of base stations (BSs)
    can be put to idle modes without signal transmission, if there is no active
    user equipment (UE) within their coverage areas. Setting those BSs to idle
    modes can mitigate unnecessary inter-cell interference and reduce energy
    consumption. Such idle mode feature at BSs is referred to as the idle mode
    capability (IMC) and it can largely improve the performance of the
    5th-generation (5G) networks. In this paper, we study the performance impact of
    the BS IMC on dense SCNs. Different from existing work, we consider a
    sophisticated and more realistic path loss model incorporating both
    line-of-sight (LoS) and non-line-of-sight (NLoS) transmissions. Analytical
    results are obtained for the coverage probability, the area spectral efficiency
    and the energy efficiency performance for SCNs with the BS IMC. An upper bound,
    a lower bound and an approximate expression of the density of the non-idle BSs
    are also derived. The performance impact of the IMC on network densification is
    shown to be significant. As the BS density surpasses the UE density, thus
    creating a surplus of BSs, the coverage probability will continuously increase
    toward one, which addresses the critical issue of coverage probability decrease
    caused by the LoS/NLoS transmissions. The results derived from our analysis
    shed valuable new light on the deployment and the operation of future dense
    SCNs in 5G.

    Well-Rounded Lattices for Coset Coding in MIMO Wiretap Channels

    Oliver W. Gnilke, Amaro Barreal, Alex Karrila, Ha Thanh Nguyen Tran, David A. Karpuk, Camilla Hollanti
    Subjects: Information Theory (cs.IT)

    The concept of well-rounded lattices has recently found important
    applications in the setting of a fading single-input single-output (SISO)
    wiretap channel. It has been shown that, under this setup, the property of
    being well-rounded is critical for minimizing the eavesdropper’s probability of
    correct decoding in lower SNR regimes. The superior performance of coset codes
    constructed from well-rounded lattices has been illustrated in several
    simulations.

    In the present article, this work is extended to fading multiple-input
    multiple-output (MIMO) wiretap channels, and similar design criteria as in the
    SISO case are derived. Further, explicit coset codes for Rayleigh fading MIMO
    wiretap channels are designed. In particular, it is shown through extensive
    simulations that sublattices of the well-known Alamouti code and Golden code
    which meet our design criteria perform better than scalar multiples of the code
    lattice for the same parameters.

    Opportunistic Network Decoupling With Virtual Full-Duplex Operation in Multi-Source Interfering Relay Networks

    Won-Yong Shin, Vien V. Mai, Bang Chul Jung, Hyun Jong Yang
    Comments: 22 pages, 5 figures, To appear in IEEE Transactions on Mobile Computing
    Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)

    We introduce a new achievability scheme, termed opportunistic network
    decoupling (OND), operating in virtual full-duplex mode. In the scheme, a novel
    relay scheduling strategy is utilized in the $K \times N \times K$ channel with
    interfering relays, consisting of $K$ source–destination pairs and $N$
    half-duplex relays in-between them. A subset of relays using alternate relaying
    is opportunistically selected in terms of producing the minimum total
    interference level, thereby resulting in network decoupling. As our main
    result, it is shown that under a certain relay scaling condition, the OND
    protocol achieves $K$ degrees of freedom even in the presence of interfering
    links among relays. Numerical evaluation also validates the performance of
    the proposed OND. The protocol operates in a fully distributed fashion
    using only local channel state information, making it relatively easy to
    implement.

    Compressed Hypothesis Testing: To Mix or Not to Mix?

    Myung Cho, Weiyu Xu, Lifeng Lai
    Comments: compressed sensing, hypothesis testing, Chernoff information, anomaly detection, anomalous random variable, quickest detection. arXiv admin note: substantial text overlap with arXiv:1208.2311
    Subjects: Information Theory (cs.IT)

    In this paper, we study the problem of determining $k$ anomalous random
    variables that have different probability distributions from the rest $(n-k)$
    random variables. Instead of sampling each individual random variable
    separately as in the conventional hypothesis testing, we propose to perform
    hypothesis testing using mixed observations that are functions of multiple
    random variables. We characterize the error exponents for correctly identifying
    the $k$ anomalous random variables under fixed time-invariant mixed
    observations, random time-varying mixed observations, and deterministic
    time-varying mixed observations. Our error exponent characterization is through
    the newly introduced notions of inner conditional Chernoff information and
    outer conditional Chernoff information. It is demonstrated that mixed
    observations can strictly improve the error exponents of hypothesis testing,
    over separate observations of individual random variables. We further
    characterize the optimal sensing vector maximizing the error exponents, which
    leads to explicit constructions of the optimal mixed observations in special
    cases of hypothesis testing for Gaussian random variables. These results show
    that mixed observations of random variables can reduce the number of required
    samples in hypothesis testing applications.
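    As a concrete anchor for the error-exponent language above: for two
    equal-variance Gaussian hypotheses the Chernoff information has a simple
    closed form, which the sketch below evaluates by the usual maximization
    over the tilting parameter (this is generic hypothesis-testing background,
    not the paper's inner/outer conditional quantities):

    ```python
    import numpy as np

    def chernoff_information_gauss(mu0, mu1, sigma):
        """Chernoff information between N(mu0, sigma^2) and N(mu1, sigma^2).
        For equal variances the Chernoff coefficient integral has a closed
        form, leaving a 1-D maximization over the tilting parameter lambda;
        the optimum sits at lambda = 1/2."""
        lams = np.linspace(1e-3, 1 - 1e-3, 999)
        vals = lams * (1 - lams) * (mu0 - mu1) ** 2 / (2 * sigma ** 2)
        return float(vals.max())

    # Matches the textbook value (mu0 - mu1)^2 / (8 sigma^2).
    print(round(chernoff_information_gauss(0.0, 1.0, 1.0), 3))  # 0.125
    ```

    The error exponent of the best test between the two hypotheses decays at
    this rate, which is the quantity the mixed-observation design tries to
    enlarge.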

    A Compressed Sampling and Dictionary Learning Framework for WDM-Based Distributed Fiber Sensing

    Christian Weiss, Abdelhak M. Zoubir
    Comments: Submitted on July 30. 2016, to [copyright 2016 Optical Society of America.]. One print or electronic copy may be made for personal use only. Systematic reproduction and distribution, duplication of any material in this paper for a fee or for commercial purposes, or modifications of the content of this paper are prohibited
    Subjects: Methodology (stat.ME); Information Theory (cs.IT)

    We propose a versatile framework that unifies compressed sampling and
    dictionary learning for fiber-optic sensing. It employs a redundant dictionary
    that is generated from a parametric signal model and establishes a relation to
    the physical quantity of interest. Imperfect prior knowledge is considered in
    terms of uncertain local and global parameters. To estimate a sparse
    representation and the dictionary parameters, we present a modified
    alternating-minimization algorithm that is equipped with a pre-processing
    routine to handle strong dictionary coherence. The performance is evaluated by
    simulations and experimental data for a practical system with common core
    architecture.

    Multi-Rate Control over AWGN Channels via Analog Joint Source-Channel Coding

    Anatoly Khina, Gustav M. Pettersson, Victoria Kostina, Babak Hassibi
    Comments: An extended version of a paper to be presented in CDC2016
    Subjects: Systems and Control (cs.SY); Information Theory (cs.IT)

    We consider the problem of controlling an unstable plant over an additive
    white Gaussian noise (AWGN) channel with a transmit power constraint, where the
    signaling rate of communication is larger than the sampling rate (for
    generating observations and applying control inputs) of the underlying plant.
    Such a situation is quite common since sampling is done at a rate that captures
    the dynamics of the plant and which is often much lower than the rate that can
    be communicated. This setting offers the opportunity of improving the system
    performance by employing multiple channel uses to convey a single message
    (output plant observation or control input). Common ways of doing so are
    through either repeating the message, or by quantizing it to a number of bits
    and then transmitting a channel coded version of the bits whose length is
    commensurate with the number of channel uses per sampled message. We argue that
    such “separated source and channel coding” can be suboptimal and propose to
    perform joint source-channel coding. Since the block length is short we obviate
    the need to go to the digital domain altogether and instead consider analog
    joint source-channel coding. For the case where the communication signaling
    rate is twice the sampling rate, we employ the Archimedean bi-spiral-based
    Shannon-Kotel’nikov analog maps to show significant improvement in stability
    margins and linear-quadratic Gaussian (LQG) costs over simple schemes that
    employ repetition.
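    The analog map in question can be sketched quickly: a double Archimedean
    spiral maps one source sample to two channel uses, and decoding picks the
    nearest spiral point. The geometry below is a toy version with made-up
    tuning constants (delta, scale), not the parameters used in the paper:

    ```python
    import numpy as np

    def spiral_encode(x, delta=1.0, scale=6.0):
        """1:2 analog map onto a double Archimedean spiral: the magnitude
        of the source value picks the angle, the sign picks the arm.
        delta and scale are made-up tuning constants for illustration."""
        theta = scale * abs(x)
        r = (delta / np.pi) * theta
        arm = 1.0 if x >= 0 else -1.0
        return np.array([r * np.cos(theta), arm * r * np.sin(theta)])

    def spiral_decode(z, delta=1.0, scale=6.0, grid=None):
        """Approximate minimum-distance decoding: nearest spiral point
        over a fine grid of candidate source values."""
        if grid is None:
            grid = np.linspace(-3, 3, 20001)
        points = np.stack([spiral_encode(x, delta, scale) for x in grid])
        return float(grid[np.argmin(np.sum((points - z) ** 2, axis=1))])

    x = 0.7
    z = spiral_encode(x)          # one sample -> two channel uses
    xhat = spiral_decode(z)       # close to 0.7 in the noiseless case
    print(abs(xhat - x) < 1e-2)   # True
    ```

    Adding AWGN to z before decoding and sweeping the spiral parameters
    reproduces the usual stretch-versus-threshold tradeoff of such maps:
    stretching the spiral reduces small-noise distortion, but makes jumps to
    the wrong arm more likely.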



