IT博客汇 | arXiv Paper Daily: Tue, 18 Oct 2016

arXiv Paper Daily: Tue, 18 Oct 2016

我爱机器学习(52ml.net)发表于 2016-10-18 00:00:00

Neural and Evolutionary Computing

Evolving the Structure of Evolution Strategies

Sander van Rijn, Hao Wang, Matthijs van Leeuwen, Thomas Bäck
Comments: 10 pages. Extended (pre-print) version of SSCI 2016 submission
Subjects: Neural and Evolutionary Computing (cs.NE)

Various variants of the well known Covariance Matrix Adaptation Evolution
Strategy (CMA-ES) have been proposed recently, which improve the empirical
performance of the original algorithm by structural modifications. However, in
practice it is often unclear which variation is best suited to the specific
optimization problem at hand. As one approach to tackle this issue, algorithmic
mechanisms attached to CMA-ES variants are considered and extracted as
functional emph{modules}, allowing for combinations of them. This leads to a
configuration space over ES structures, which enables the exploration of
algorithm structures and paves the way toward novel algorithm generation.
Specifically, eleven modules are incorporated in this framework with two or
three alternative configurations for each module, resulting in $4,608$
algorithms. A self-adaptive Genetic Algorithm (GA) is used to efficiently
evolve effective ES-structures for given classes of optimization problems,
outperforming any classical CMA-ES variants from literature. The proposed
approach is evaluated on noiseless functions from BBOB suite. Furthermore, such
an observation is again confirmed on different function groups and
dimensionality, indicating the feasibility of ES configuration on real-world
problem classes.

Weekly maintenance scheduling using exact and genetic methods

Andrew W. Palmer, Robin Vujanic, Andrew J. Hill, Steven J. Scheding
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

The weekly maintenance schedule specifies when maintenance activities should
be performed on the equipment, taking into account the availability of workers
and maintenance bays, and other operational constraints. The current approach
to generating this schedule is labour intensive and requires coordination
between the maintenance schedulers and operations staff to minimise its impact
on the operation of the mine. This paper presents methods for automatically
generating this schedule from the list of maintenance tasks to be performed,
the availability roster of the maintenance staff, and time windows in which
each piece of equipment is available for maintenance. Both Mixed-Integer Linear
Programming (MILP) and genetic algorithms are evaluated, with the genetic
algorithm shown to significantly outperform the MILP. Two fitness functions for
the genetic algorithm are also examined, with a linear fitness function
outperforming an inverse fitness function by up to 5% for the same calculation
time. The genetic algorithm approach is computationally fast, allowing the
schedule to be rapidly recalculated in response to unexpected delays and
breakdowns.

Multiple Instance Fuzzy Inference Neural Networks

Amine Ben Khalifa, Hichem Frigui
Comments: Submitted to IEEE Transactions On Cybernetics for review
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Systems and Control (cs.SY)

Fuzzy logic is a powerful tool to model knowledge uncertainty, measurements
imprecision, and vagueness. However, there is another type of vagueness that
arises when data have multiple forms of expression that fuzzy logic does not
address quite well. This is the case for multiple instance learning problems
(MIL). In MIL, an object is represented by a collection of instances, called a
bag. A bag is labeled negative if all of its instances are negative, and
positive if at least one of its instances is positive. Positive bags encode
ambiguity since the instances themselves are not labeled. In this paper, we
introduce fuzzy inference systems and neural networks designed to handle bags
of instances as input and capable of learning from ambiguously labeled data.
First, we introduce the Multiple Instance Sugeno style fuzzy inference
(MI-Sugeno) that extends the standard Sugeno style inference to handle
reasoning with multiple instances. Second, we use MI-Sugeno to define and
develop Multiple Instance Adaptive Neuro Fuzzy Inference System (MI-ANFIS). We
expand the architecture of the standard ANFIS to allow reasoning with bags and
derive a learning algorithm using backpropagation to identify the premise and
consequent parameters of the network. The proposed inference system is tested
and validated using synthetic and benchmark datasets suitable for MIL problems.
We also apply the proposed MI-ANFIS to fuse the output of multiple
discrimination algorithms for the purpose of landmine detection using Ground
Penetrating Radar.

Cached Long Short-Term Memory Neural Networks for Document-Level Sentiment Classification

Jiacheng Xu, Danlu Chen, Xipeng Qiu, Xuangjing Huang
Comments: Published as long paper of EMNLP2016
Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

Recently, neural networks have achieved great success on sentiment
classification due to their ability to alleviate feature engineering. However,
one of the remaining challenges is to model long texts in document-level
sentiment classification under a recurrent architecture because of the
deficiency of the memory unit. To address this problem, we present a Cached
Long Short-Term Memory neural networks (CLSTM) to capture the overall semantic
information in long texts. CLSTM introduces a cache mechanism, which divides
memory into several groups with different forgetting rates and thus enables the
network to keep sentiment information better within a recurrent unit. The
proposed CLSTM outperforms the state-of-the-art models on three publicly
available document-level sentiment analysis datasets.

Computer Vision and Pattern Recognition

Rule Extraction Algorithm for Deep Neural Networks: A Review

Tameru Hailesilassie
Comments: 6 pages,2 figures,IEEE Publication format, Keywords- Artificial neural network; Deep neural network; Rule extraction; Decompositional; Pedagogical; Eclectic
Journal-ref: (IJCSIS) International Journal of Computer Science and Information
Security,Vol. 14, No. 7, July 2016, page 371-381
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Despite the highest classification accuracy in wide varieties of application
areas, artificial neural network has one disadvantage. The way this Network
comes to a decision is not easily comprehensible. The lack of explanation
ability reduces the acceptability of neural network in data mining and decision
system. This drawback is the reason why researchers have proposed many rule
extraction algorithms to solve the problem. Recently, Deep Neural Network (DNN)
is achieving a profound result over the standard neural network for
classification and recognition problems. It is a hot machine learning area
proven both useful and innovative. This paper has thoroughly reviewed various
rule extraction algorithms, considering the classification scheme:
decompositional, pedagogical, and eclectics. It also presents the evaluation of
these algorithms based on the neural network structure with which the algorithm
is intended to work. The main contribution of this review is to show that there
is a limited study of rule extraction algorithm from DNN.

Structured Sparse Subspace Clustering: A Joint Affinity Learning and Subspace Clustering Framework

Chun-Guang Li, Chong You, René Vidal
Comments: 13 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Subspace clustering refers to the problem of segmenting data drawn from a
union of subspaces. State of the art approaches for solving this problem follow
a two-stage approach. In the first step, an affinity matrix is learned from the
data using sparse or low-rank minimization techniques. In the second step, the
segmentation is found by applying spectral clustering to this affinity. While
this approach has led to state-of-the-art results in many applications, it is
sub-optimal because it does not exploit the fact that the affinity and the
segmentation depend on each other. In this paper, we propose a joint
optimization framework — Structured Sparse Subspace Clustering (S$^3$C) —
for learning both the affinity and the segmentation. The proposed S$^3$C
framework is based on expressing each data point as a structured sparse linear
combination of all other data points, where the structure is induced by a norm
that depends on the unknown segmentation. Moreover, we extend the proposed
S$^3$C framework into Constrained Structured Sparse Subspace Clustering
(CS$^3$C) in which available partial side-information is incorporated into the
stage of learning the affinity. We show that both the structured sparse
representation and the segmentation can be found via a combination of an
alternating direction method of multipliers with spectral clustering.
Experiments on a synthetic data set, the Extended Yale B data set, the Hopkins
155 motion segmentation database, and three cancer data sets demonstrate the
effectiveness of our approach.

Spatio-temporal Co-Occurrence Characterizations for Human Action Classification

Aznul Qalid Md Sabri, Jacques Boonaert, Erma Rahayu Mohd Faizal Abdullah, Ali Mohammed Mansoor
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The human action classification task is a widely researched topic and is
still an open problem. Many state-of-the-arts approaches involve the usage of
bag-of-video-words with spatio-temporal local features to construct
characterizations for human actions. In order to improve beyond this standard
approach, we investigate the usage of co-occurrences between local features. We
propose the usage of co-occurrences information to characterize human actions.
A trade-off factor is used to define an optimal trade-off between vocabulary
size and classification rate. Next, a spatio-temporal co-occurrence technique
is applied to extract co-occurrence information between labeled local features.
Novel characterizations for human actions are then constructed. These include a
vector quantized correlogram-elements vector, a highly discriminative PCA
(Principal Components Analysis) co-occurrence vector and a Haralick texture
vector. Multi-channel kernel SVM (support vector machine) is utilized for
classification. For evaluation, the well known KTH as well as the challenging
UCF-Sports action datasets are used. We obtained state-of-the-arts
classification performance. We also demonstrated that we are able to fully
utilize co-occurrence information, and improve the standard bag-of-video-words
approach.

Deep Learning Prototype Domains for Person Re-Identification

Arne Schumann, Shaogang Gong, Tobias Schuchert
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Person re-identification (re-id) is the task of matching multiple occurrences
of the same person from different cameras, poses, lighting conditions, and a
multitude of other factors which alter the visual appearance. Typically, this
is achieved by learning either optimal features or matching metrics which are
adapted to specific pairs of camera views dictated by the pairwise labelled
training datasets. In this work, we formulate a deep learning based novel
approach to automatic prototype-domain discovery for domain perceptive
(adaptive) person re-id (rather than camera pair specific learning) for any
camera views scalable to new unseen scenes without training data. We learn a
separate re-id model for each of the discovered prototype-domains and during
model deployment, use the person probe image to select automatically the model
of the closest prototype domain. Our approach requires neither supervised nor
unsupervised domain adaptation learning, i.e. no data available from the target
domains. We evaluate extensively our model under realistic re-id conditions
using automatically detected bounding boxes with low-resolution and partial
occlusion. We show that our approach outperforms most of the state-of-the-art
supervised and unsupervised methods on the latest CUHK-SYSU and PRW benchmarks.

Encoding the Local Connectivity Patterns of fMRI for Cognitive State Classification

Itir Onal Ertugrul, Mete Ozay, Fatos T. Yarman Vural
Comments: 8 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

In this work, we propose a novel framework to encode the local connectivity
patterns of brain, using Fisher Vectors (FV), Vector of Locally Aggregated
Descriptors (VLAD) and Bag-of-Words (BoW) methods. We first obtain local
descriptors, called Mesh Arc Descriptors (MADs) from fMRI data, by forming
local meshes around anatomical regions, and estimating their relationship
within a neighborhood. Then, we extract a dictionary of relationships, called
extit{brain connectivity dictionary} by fitting a generative Gaussian mixture
model (GMM) to a set of MADs, and selecting the codewords at the mean of each
component of the mixture. Codewords represent the connectivity patterns among
anatomical regions. We also encode MADs by VLAD and BoW methods using the
k-Means clustering.

We classify the cognitive states of Human Connectome Project (HCP) task fMRI
dataset, where we train support vector machines (SVM) by the encoded MADs.
Results demonstrate that, FV encoding of MADs can be successfully employed for
classification of cognitive tasks, and outperform the VLAD and BoW
representations. Moreover, we identify the significant Gaussians in mixture
models by computing energy of their corresponding FV parts, and analyze their
effect on classification accuracy. Finally, we suggest a new method to
visualize the codewords of brain connectivity dictionary.

Spatio-Temporal Attention Models for Grounded Video Captioning

Mihai Zanfir, Elisabeta Marinoiu, Cristian Sminchisescu
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Automatic video captioning is challenging due to the complex interactions in
dynamic real scenes. A comprehensive system would ultimately localize and track
the objects, actions and interactions present in a video and generate a
description that relies on temporal localization in order to ground the visual
concepts. However, most existing automatic video captioning systems map from
raw video data to high level textual description, bypassing localization and
recognition, thus discarding potentially valuable information for content
localization and generalization. In this work we present an automatic video
captioning model that combines spatio-temporal attention and image
classification by means of deep neural network structures based on long
short-term memory. The resulting system is demonstrated to produce
state-of-the-art results in the standard YouTube captioning benchmark while
also offering the advantage of localizing the visual concepts (subjects, verbs,
objects), with no grounding supervision, over space and time.

What is the Best Way for Extracting Meaningful Attributes from Pictures?

Liangchen Liu, Arnold Wiliem, Shaokang Chen, Brian C. Lovell
Comments: Submission to Pattern Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Automatic attribute discovery methods have gained in popularity to extract
sets of visual attributes from images or videos for various tasks. Despite
their good performance in some classification tasks, it is difficult to
evaluate whether the attributes discovered by these methods are meaningful and
which methods are the most appropriate to discover attributes for visual
descriptions. In its simplest form, such an evaluation can be performed by
manually verifying whether there is any consistent identifiable visual concept
distinguishing between positive and negative exemplars labelled by an
attribute. This manual checking is tedious, expensive and labour intensive. In
addition, comparisons between different methods could also be problematic as it
is not clear how one could quantitatively decide which attribute is more
meaningful than the others. In this paper, we propose a novel attribute
meaningfulness metric to address this challenging problem. With this metric,
automatic quantitative evaluation can be performed on the attribute sets; thus,
reducing the enormous effort to perform manual evaluation. The proposed metric
is applied to some recent automatic attribute discovery and hashing methods on
four attribute-labelled datasets. To further validate the efficacy of the
proposed method, we conducted a user study. In addition, we also compared our
metric with a semi-supervised attribute discover method using the mixture of
probabilistic PCA. In our evaluation, we gleaned several insights that could be
beneficial in developing new automatic attribute discovery methods.

Real-time Joint Tracking of a Hand Manipulating an Object from RGB-D Input

Srinath Sridhar, Franziska Mueller, Michael Zollhöfer, Dan Casas, Antti Oulasvirta, Christian Theobalt
Comments: Proceedings of ECCV 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Real-time simultaneous tracking of hands manipulating and interacting with
external objects has many potential applications in augmented reality, tangible
computing, and wearable computing. However, due to difficult occlusions, fast
motions, and uniform hand appearance, jointly tracking hand and object pose is
more challenging than tracking either of the two separately. Many previous
approaches resort to complex multi-camera setups to remedy the occlusion
problem and often employ expensive segmentation and optimization steps which
makes real-time tracking impossible. In this paper, we propose a real-time
solution that uses a single commodity RGB-D camera. The core of our approach is
a 3D articulated Gaussian mixture alignment strategy tailored to hand-object
tracking that allows fast pose optimization. The alignment energy uses novel
regularizers to address occlusions and hand-object contacts. For added
robustness, we guide the optimization with discriminative part classification
of the hand and segmentation of the object. We conducted extensive experiments
on several existing datasets and introduce a new annotated hand-object dataset.
Quantitative and qualitative results show the key advantages of our method:
speed, accuracy, and robustness.

Digital Makeup from Internet Images

Asad Khan, Yudong Guo, Ligang Liu
Comments: 10 pages, 11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

We present a novel approach of color transfer between images by exploring
their high-level semantic information. First, we set up a database which
consists of the collection of downloaded images from the internet, which are
segmented automatically by using matting techniques. We then, extract image
foregrounds from both source and multiple target images. Then by using image
matting algorithms, the system extracts the semantic information such as faces,
lips, teeth, eyes, eyebrows, etc., from the extracted foregrounds of the source
image. And, then the color is transferred between corresponding parts with the
same semantic information. Next we get the color transferred result by
seamlessly compositing different parts together using alpha blending. In the
final step, we present an efficient method of color consistency to optimize the
color of a collection of images showing the common scene. The main advantage of
our method over existing techniques is that it does not need face matching, as
one could use more than one target images. It is not restricted to head shot
images as we can also change the color style in the wild. Moreover, our
algorithm does not require to choose the same color style, same pose and image
size between source and target images. Our algorithm is not restricted to
one-to-one image color transfer and can make use of more than one target images
to transfer the color in different parts in the source image. Comparing with
other approaches, our algorithm is much better in color blending in the input
data.

Location Sensitive Deep Convolutional Neural Networks for Segmentation of White Matter Hyperintensities

Mohsen Ghafoorian, Nico Karssemeijer, Tom Heskes, Inge van Uden, Clara Sanchez, Geert Litjens, Frank-Erik. de Leeuw, Bram van Ginneken, Elena Marchiori, Bram Platel
Comments: 13 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The anatomical location of imaging features is of crucial importance for
accurate diagnosis in many medical tasks. Convolutional neural networks (CNN)
have had huge successes in computer vision, but they lack the natural ability
to incorporate the anatomical location in their decision making process,
hindering success in some medical image analysis tasks.

In this paper, to integrate the anatomical location information into the
network, we propose several deep CNN architectures that consider multi-scale
patches or take explicit location features while training. We apply and compare
the proposed architectures for segmentation of white matter hyperintensities in
brain MR images on a large dataset. As a result, we observe that the CNNs that
incorporate location information substantially outperform a conventional
segmentation method with hand-crafted features as well as CNNs that do not
integrate location information. On a test set of 46 scans, the best
configuration of our networks obtained a Dice score of 0.791, compared to 0.797
for an independent human observer. Performance levels of the machine and the
independent human observer were not statistically significantly different
(p-value=0.17).

To Frontalize or Not To Frontalize: Do We Really Need Elaborate Pre-Processing to Improve Face Recognition Performance?

Sandipan Banerjee, Joel Brogan, Janez Krizaj, Aparna Bharati, Brandon RichardWebster, Vitomir Struc, Patrick Flynn, Walter Scheirer
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Automatic face recognition performance has improved remarkably in the last
decade. Much of this success can be attributed to the development of deep
learning techniques like convolutional neural networks (CNNs). But the training
process for CNNs requires a large amount of clean and well-labelled training
data. If a CNN is intended to work with non-frontal face images, should this
training data be diverse in terms of facial poses, or should face images be
frontalized as a pre-processing step? We address this question in this paper.
We evaluate a set of popular facial landmarking and pose frontalization
algorithms to understand their effect on facial recognition performance. We
also introduce a new automatic frontalization scheme that operates over a
single image without the need for a subject-specific 3D model, and perform a
comparative analysis between the new scheme and other methods in the
literature. A CNN trained on face images frontalized using different
pre-processing methods is used to extract features from the Point and Shoot
Challenge (PaSC) video dataset. The verification and identification performance
of the CNN serves to quantify the effectiveness of each landmarking and
frontalization scheme. We find that frontalization, although an intuitive
pre-processing strategy, does not significantly improve face recognition
performance when compared with a simple 2D face alignment.

Beyond Spatial Auto-Regressive Models: Predicting Housing Prices with Satellite Imagery

Archith J. Bency, Swati Rallapalli, Raghu K. Ganti, Mudhakar Srivatsa, B. S. Manjunath
Comments: 10 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

When modeling geo-spatial data, it is critical to capture spatial
correlations for achieving high accuracy. Spatial Auto-Regression (SAR) is a
common tool used to model such data, where the spatial contiguity matrix (W)
encodes the spatial correlations. However, the efficacy of SAR is limited by
two factors. First, it depends on the choice of contiguity matrix, which is
typically not learnt from data, but instead, is assumed to be known apriori.
Second, it assumes that the observations can be explained by linear models. In
this paper, we propose a Convolutional Neural Network (CNN) framework to model
geo-spatial data (specifi- cally housing prices), to learn the spatial
correlations automatically. We show that neighborhood information embedded in
satellite imagery can be leveraged to achieve the desired spatial smoothing. An
additional upside of our framework is the relaxation of linear assumption on
the data. Specific challenges we tackle while implementing our framework
include, (i) how much of the neighborhood is relevant while estimating housing
prices? (ii) what is the right approach to capture multiple resolutions of
satellite imagery? and (iii) what other data-sources can help improve the
estimation of spatial correlations? We demonstrate a marked improvement of 57%
on top of the SAR baseline through the use of features from deep neural
networks for the cities of London, Birmingham and Liverpool.

Recovering the Missing Link: Predicting Class-Attribute Associations for Unsupervised Zero-Shot Learning

Ziad Al-Halah, Makarand Tapaswi, Rainer Stiefelhagen
Comments: Published as a conference paper at CVPR 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Collecting training images for all visual categories is not only expensive
but also impractical. Zero-shot learning (ZSL), especially using attributes,
offers a pragmatic solution to this problem. However, at test time most
attribute-based methods require a full description of attribute associations
for each unseen class. Providing these associations is time consuming and often
requires domain specific knowledge. In this work, we aim to carry out
attribute-based zero-shot classification in an unsupervised manner. We propose
an approach to learn relations that couples class embeddings with their
corresponding attributes. Given only the name of an unseen class, the learned
relationship model is used to automatically predict the class-attribute
associations. Furthermore, our model facilitates transferring attributes across
data sets without additional effort. Integrating knowledge from multiple
sources results in a significant additional improvement in performance. We
evaluate on two public data sets: Animals with Attributes and aPascal/aYahoo.
Our approach outperforms state-of-the-art methods in both predicting
class-attribute associations and unsupervised ZSL by a large margin.

Incremental One-Class Models for Data Classification

Takoua Kefi, Riadh Ksantini, M.Becha Kaaniche, Adel Bouhoula
Comments: 4 pages, accepted in PhD Forum Session of the ECML-PKDD 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper we outline a PhD research plan. This research contributes to
the field of one-class incremental learning and classification in case of
non-stationary environments. The goal of this PhD is to define a new
classification framework able to deal with very small learning dataset at the
beginning of the process and with abilities to adjust itself according to the
variability of the incoming data which create large scale datasets. As a
preliminary work, incremental Covariance-guided One-Class Support Vector
Machine is proposed to deal with sequentially obtained data. It is inspired
from COSVM which put more emphasis on the low variance directions while keeping
the basic formulation of incremental One-Class Support Vector Machine
untouched. The incremental procedure is introduced by controlling the possible
changes of support vectors after the addition of new data points, thanks to the
Karush-Kuhn-Tucker conditions, that have to be maintained on all previously
acquired data. Comparative experimental results with contemporary incremental
and non-incremental one-class classifiers on numerous artificial and real data
sets show that our method results in significantly better classification
performance.

Road Curb Extraction from Mobile LiDAR Point Clouds

Sheng Xu, Ruisheng Wang, Han Zheng
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Automatic extraction of road curbs from uneven, unorganized, noisy and
massive 3D point clouds is a challenging task. Existing methods often project
3D point clouds onto 2D planes to extract curbs. However, the projection causes
loss of 3D information which degrades the performance of the detection. This
paper presents a robust, accurate and efficient method to extract road curbs
from 3D mobile LiDAR point clouds. Our method consists of two steps: 1)
extracting the candidate points of curbs based on the proposed novel energy
function and 2) refining the candidate points using the proposed least cost
path model. We evaluated our method on a large-scale of residential area
(16.7GB, 300 million points) and an urban area (1.07GB, 20 million points)
mobile LiDAR point clouds. Results indicate that the proposed method is
superior to the state-of-the-art methods in terms of robustness, accuracy and
efficiency. The proposed curb extraction method achieved a completeness of
78.62% and a correctness of 83.29%. These experiments demonstrate that the
proposed method is a promising solution to extract road curbs from mobile LiDAR
point clouds.

Deep Learning Ensembles for Melanoma Recognition in Dermoscopy Images

Noel Codella, Quoc-Bao Nguyen, Sharath Pankanti, David Gutman, Brian Helba, Allan Halpern, John R. Smith
Comments: Permission from journal obtained for preprint on arXiv. IBM Journal is an IEEE Journal (this http URL), which approves arXiv as a third party server (this http URL). Document is a PDF from Word, which is required format for IBM Journal
Journal-ref: IBM Journal of Research and Development, vol. 61, no. 4/5, 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Melanoma is the deadliest form of skin cancer. While curable with early
detection, only highly trained specialists are capable of accurately
recognizing the disease. As expertise is in limited supply, automated systems
capable of identifying disease could save lives, reduce unnecessary biopsies,
and reduce costs. Toward this goal, we propose a system that combines recent
developments in deep learning with established machine learning approaches,
creating ensembles of methods that are capable of segmenting skin lesions, as
well as analyzing the detected area and surrounding tissue for melanoma
detection. The system is evaluated using the largest publicly available
benchmark dataset of dermoscopic images, containing 900 training and 379
testing images. New state-of-the-art performance levels are demonstrated,
leading to an improvement in the area under receiver operating characteristic
curve of 7.5% (0.843 vs. 0.783), in average precision of 4% (0.649 vs. 0.624),
and in specificity measured at the clinically relevant 95% sensitivity
operating point 2.9 times higher than the previous state-of-the-art (36.8%
specificity compared to 12.5%). Compared to the average of 8 expert
dermatologists on a subset of 100 test images, the proposed system produces a
higher accuracy (76% vs. 70.5%), and specificity (62% vs. 59%) evaluated at an
equivalent sensitivity (82%).

A Harmonic Mean Linear Discriminant Analysis for Robust Image Classification

Shuai Zheng, Feiping Nie, Chris Ding, Heng Huang
Comments: IEEE 28th International Conference on Tools with Artificial Intelligence, ICTAI 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Linear Discriminant Analysis (LDA) is a widely-used supervised dimensionality
reduction method in computer vision and pattern recognition. In null space
based LDA (NLDA), a well-known LDA extension, between-class distance is
maximized in the null space of the within-class scatter matrix. However, there
are some limitations in NLDA. Firstly, for many data sets, null space of
within-class scatter matrix does not exist, thus NLDA is not applicable to
those datasets. Secondly, NLDA uses arithmetic mean of between-class distances
and gives equal consideration to all between-class distances, which makes
larger between-class distances can dominate the result and thus limits the
performance of NLDA. In this paper, we propose a harmonic mean based Linear
Discriminant Analysis, Multi-Class Discriminant Analysis (MCDA), for image
classification, which minimizes the reciprocal of weighted harmonic mean of
pairwise between-class distance. More importantly, MCDA gives higher priority
to maximize small between-class distances. MCDA can be extended to multi-label
dimension reduction. Results on 7 single-label data sets and 4 multi-label data
sets show that MCDA has consistently better performance than 10 other
single-label approaches and 4 other multi-label approaches in terms of
classification accuracy, macro and micro average F1 score.

Multiple Instance Fuzzy Inference Neural Networks

Fuzzy logic is a powerful tool to model knowledge uncertainty, measurements
imprecision, and vagueness. However, there is another type of vagueness that
arises when data have multiple forms of expression that fuzzy logic does not
address quite well. This is the case for multiple instance learning problems
(MIL). In MIL, an object is represented by a collection of instances, called a
bag. A bag is labeled negative if all of its instances are negative, and
positive if at least one of its instances is positive. Positive bags encode
ambiguity since the instances themselves are not labeled. In this paper, we
introduce fuzzy inference systems and neural networks designed to handle bags
of instances as input and capable of learning from ambiguously labeled data.
First, we introduce the Multiple Instance Sugeno style fuzzy inference
(MI-Sugeno) that extends the standard Sugeno style inference to handle
reasoning with multiple instances. Second, we use MI-Sugeno to define and
develop Multiple Instance Adaptive Neuro Fuzzy Inference System (MI-ANFIS). We
expand the architecture of the standard ANFIS to allow reasoning with bags and
derive a learning algorithm using backpropagation to identify the premise and
consequent parameters of the network. The proposed inference system is tested
and validated using synthetic and benchmark datasets suitable for MIL problems.
We also apply the proposed MI-ANFIS to fuse the output of multiple
discrimination algorithms for the purpose of landmine detection using Ground
Penetrating Radar.

Partial Procedural Geometric Model Fitting for Point Clouds

Zongliang Zhang, Jonathan Li, Yulan Guo, Yangbin Lin, Ming Cheng, Cheng Wang
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)

Geometric model fitting is a fundamental task in computer graphics and
computer vision. However, most geometric model fitting methods are unable to
fit an arbitrary geometric model (e.g. a surface with holes) to incomplete
data, due to that the similarity metrics used in these methods are unable to
measure the rigid partial similarity between arbitrary models. This paper hence
proposes a novel rigid geometric similarity metric, which is able to measure
both the full similarity and the partial similarity between arbitrary geometric
models. The proposed metric enables us to perform partial procedural geometric
model fitting (PPGMF). The task of PPGMF is to search a procedural geometric
model space for the model rigidly similar to a query of non-complete point set.
Models in the procedural model space are generated according to a set of
parametric modeling rules. A typical query is a point cloud. PPGMF is very
useful as it can be used to fit arbitrary geometric models to non-complete
(incomplete, over-complete or hybrid-complete) point cloud data. For example,
most laser scanning data is non-complete due to occlusion. Our PPGMF method
uses Markov chain Monte Carlo technique to optimize the proposed similarity
metric over the model space. To accelerate the optimization process, the method
also employs a novel coarse-to-fine model dividing strategy to reject
dissimilar models in advance. Our method has been demonstrated on a variety of
geometric models and non-complete data. Experimental results show that the
PPGMF method based on the proposed metric is able to fit non-complete data,
while the method based on other metrics is unable. It is also shown that our
method can be accelerated by several times via early rejection.

Artificial Intelligence

Internet of Things Applications: Animal Monitoring with Unmanned Aerial Vehicle

Jun Xu, Gurkan Solmaz, Rouhollah Rahmatizadeh, Damla Turgut, Ladislau Boloni
Comments: 12 pages
Subjects: Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)

In animal monitoring applications, both animal detection and their movement
prediction are major tasks. While a variety of animal monitoring strategies
exist, most of them rely on mounting devices. However, in real world, it is
difficult to find these animals and install mounting devices. In this paper, we
propose an animal monitoring application by utilizing wireless sensor networks
(WSNs) and unmanned aerial vehicle (UAV). The objective of the application is
to detect locations of endangered species in large-scale wildlife areas and
monitor movement of animals without any attached devices. In this application,
sensors deployed throughout the observation area are responsible for gathering
animal information. The UAV flies above the observation area and collects the
information from sensors. To achieve the information efficiently, we propose a
path planning approach for the UAV based on a Markov decision process (MDP)
model. The UAV receives a certain amount of reward from an area if some animals
are detected at that location. We solve the MDP using Q-learning such that the
UAV prefers going to those areas that animals are detected before. Meanwhile,
the UAV explores other areas as well to cover the entire network and detects
changes in the animal positions. We first define the mathematical model
underlying the animal monitoring problem in terms of the value of information
(VoI) and rewards. We propose a network model including clusters of sensor
nodes and a single UAV that acts as a mobile sink and visits the clusters.
Then, one MDP-based path planning approach is designed to maximize the VoI
while reducing message delays. The effectiveness of the proposed approach is
evaluated using two real-world movement datasets of zebras and leopard.
Simulation results show that our approach outperforms greedy, random heuristics
and the path planning based on the traveling salesman problem.

Improvements in Sub-optimal Solving of the $(N^2-1)$-Puzzle via Joint Relocation of Pebbles and its Applications to Rule-based Cooperative Path-Finding

Pavel Surynek, Petr Michalík
Subjects: Artificial Intelligence (cs.AI)

The problem of solving $(n^2-1)$-puzzle and cooperative path-finding (CPF)
sub-optimally by rule based algorithms is addressed in this manuscript. The
task in the puzzle is to rearrange $n^2-1$ pebbles on the square grid of the
size of n x n using one vacant position to a desired goal configuration. An
improvement to the existent polynomial-time algorithm is proposed and
experimentally analyzed. The improved algorithm is trying to move pebbles in a
more efficient way than the original algorithm by grouping them into so-called
snakes and moving them jointly within the snake. An experimental evaluation
showed that the algorithm using snakes produces solutions that are 8% to 9%
shorter than solutions generated by the original algorithm.

The snake-based relocation has been also integrated into rule-based
algorithms for solving the CPF problem sub-optimally, which is a closely
related task. The task in CPF is to relocate a group of abstract robots that
move over an undirected graph to given goal vertices. Robots can move to
unoccupied neighboring vertices and at most one robot can be placed in each
vertex. The $(n^2-1)$-puzzle is a special case of CPF where the underlying
graph is represented by a 4-connected grid and there is only one vacant vertex.
Two major rule-based algorithms for CPF were included in our study – BIBOX and
PUSH-and-SWAP (PUSH-and-ROTATE). Improvements gained by using snakes in the
BIBOX algorithm were stable around 30% in $(n^2-1)$-puzzle solving and up to
50% in CPFs over bi-connected graphs with various ear decompositions and
multiple vacant vertices. In the case of the PUSH-and-SWAP algorithm the
improvement achieved by snakes was around 5% to 8%. However, the improvement
was unstable and hardly predictable in the case of PUSH-and-SWAP.

Fault Detection Engine in Intelligent Predictive Analytics Platform for DCIM

Bodhisattwa Prasad Majumder, Ayan Sengupta, Sajal jain, Parikshit Bhaduri
Comments: Accepted in 4th International Conference on Business Analytics and Intelligence (ICBAI 2016)
Subjects: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

With the advancement of huge data generation and data handling capability,
Machine Learning and Probabilistic modelling enables an immense opportunity to
employ predictive analytics platform in high security critical industries
namely data centers, electricity grids, utilities, airport etc. where downtime
minimization is one of the primary objectives. This paper proposes a novel,
complete architecture of an intelligent predictive analytics platform, Fault
Engine, for huge device network connected with electrical/information flow.
Three unique modules, here proposed, seamlessly integrate with available
technology stack of data handling and connect with middleware to produce online
intelligent prediction in critical failure scenarios. The Markov Failure module
predicts the severity of a failure along with survival probability of a device
at any given instances. The Root Cause Analysis model indicates probable
devices as potential root cause employing Bayesian probability assignment and
topological sort. Finally, a community detection algorithm produces correlated
clusters of device in terms of failure probability which will further narrow
down the search space of finding route cause. The whole Engine has been tested
with different size of network with simulated failure environments and shows
its potential to be scalable in real-time implementation.

Learning and Transfer of Modulated Locomotor Controllers

Nicolas Heess, Greg Wayne, Yuval Tassa, Timothy Lillicrap, Martin Riedmiller, David Silver
Comments: Supplemental video available at this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

We study a novel architecture and training procedure for locomotion tasks. A
high-frequency, low-level “spinal” network with access to proprioceptive
sensors learns sensorimotor primitives by training on simple tasks. This
pre-trained module is fixed and connected to a low-frequency, high-level
“cortical” network, with access to all sensors, which drives behavior by
modulating the inputs to the spinal network. Where a monolithic end-to-end
architecture fails completely, learning with a pre-trained spinal module
succeeds at multiple high-level tasks, and enables the effective exploration
required to learn from sparse rewards. We test our proposed architecture on
three simulated bodies: a 16-dimensional swimming snake, a 20-dimensional
quadruped, and a 54-dimensional humanoid. Our results are illustrated in the
accompanying video at this https URL

Weekly maintenance scheduling using exact and genetic methods

Andrew W. Palmer, Robin Vujanic, Andrew J. Hill, Steven J. Scheding
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

The weekly maintenance schedule specifies when maintenance activities should
be performed on the equipment, taking into account the availability of workers
and maintenance bays, and other operational constraints. The current approach
to generating this schedule is labour intensive and requires coordination
between the maintenance schedulers and operations staff to minimise its impact
on the operation of the mine. This paper presents methods for automatically
generating this schedule from the list of maintenance tasks to be performed,
the availability roster of the maintenance staff, and time windows in which
each piece of equipment is available for maintenance. Both Mixed-Integer Linear
Programming (MILP) and genetic algorithms are evaluated, with the genetic
algorithm shown to significantly outperform the MILP. Two fitness functions for
the genetic algorithm are also examined, with a linear fitness function
outperforming an inverse fitness function by up to 5% for the same calculation
time. The genetic algorithm approach is computationally fast, allowing the
schedule to be rapidly recalculated in response to unexpected delays and
breakdowns.

Wind ramp event prediction with parallelized Gradient Boosted Regression Trees

Saurav Gupta, Nitin Anand Shrivastava, Abbas Khosravi, Bijaya Ketan Panigrahi
Comments: IJCNN 2016
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

Accurate prediction of wind ramp events is critical for ensuring the
reliability and stability of the power systems with high penetration of wind
energy. This paper proposes a classification based approach for estimating the
future class of wind ramp event based on certain thresholds. A parallelized
gradient boosted regression tree based technique has been proposed to
accurately classify the normal as well as rare extreme wind power ramp events.
The model has been validated using wind power data obtained from the National
Renewable Energy Laboratory database. Performance comparison with several
benchmark techniques indicates the superiority of the proposed technique in
terms of superior classification accuracy.

Efficient Rectangular Maximal-Volume Algorithm for Rating Elicitation in Collaborative Filtering

Alexander Fonarev, Alexander Mikhalev, Pavel Serdyukov, Gleb Gusev, Ivan Oseledets
Comments: IEEE International Conference on Data Mining (ICDM) 2016
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Cold start problem in Collaborative Filtering can be solved by asking new
users to rate a small seed set of representative items or by asking
representative users to rate a new item. The question is how to build a seed
set that can give enough preference information for making good
recommendations. One of the most successful approaches, called Representative
Based Matrix Factorization, is based on Maxvol algorithm. Unfortunately, this
approach has one important limitation — a seed set of a particular size
requires a rating matrix factorization of fixed rank that should coincide with
that size. This is not necessarily optimal in the general case. In the current
paper, we introduce a fast algorithm for an analytical generalization of this
approach that we call Rectangular Maxvol. It allows the rank of factorization
to be lower than the required size of the seed set. Moreover, the paper
includes the theoretical analysis of the method’s error, the complexity
analysis of the existing methods and the comparison to the state-of-the-art
approaches.

A Harmonic Mean Linear Discriminant Analysis for Robust Image Classification

Linear Discriminant Analysis (LDA) is a widely-used supervised dimensionality
reduction method in computer vision and pattern recognition. In null space
based LDA (NLDA), a well-known LDA extension, between-class distance is
maximized in the null space of the within-class scatter matrix. However, there
are some limitations in NLDA. Firstly, for many data sets, null space of
within-class scatter matrix does not exist, thus NLDA is not applicable to
those datasets. Secondly, NLDA uses arithmetic mean of between-class distances
and gives equal consideration to all between-class distances, which makes
larger between-class distances can dominate the result and thus limits the
performance of NLDA. In this paper, we propose a harmonic mean based Linear
Discriminant Analysis, Multi-Class Discriminant Analysis (MCDA), for image
classification, which minimizes the reciprocal of weighted harmonic mean of
pairwise between-class distance. More importantly, MCDA gives higher priority
to maximize small between-class distances. MCDA can be extended to multi-label
dimension reduction. Results on 7 single-label data sets and 4 multi-label data
sets show that MCDA has consistently better performance than 10 other
single-label approaches and 4 other multi-label approaches in terms of
classification accuracy, macro and micro average F1 score.

Information Retrieval

Efficient Rectangular Maximal-Volume Algorithm for Rating Elicitation in Collaborative Filtering

Cold start problem in Collaborative Filtering can be solved by asking new
users to rate a small seed set of representative items or by asking
representative users to rate a new item. The question is how to build a seed
set that can give enough preference information for making good
recommendations. One of the most successful approaches, called Representative
Based Matrix Factorization, is based on Maxvol algorithm. Unfortunately, this
approach has one important limitation — a seed set of a particular size
requires a rating matrix factorization of fixed rank that should coincide with
that size. This is not necessarily optimal in the general case. In the current
paper, we introduce a fast algorithm for an analytical generalization of this
approach that we call Rectangular Maxvol. It allows the rank of factorization
to be lower than the required size of the seed set. Moreover, the paper
includes the theoretical analysis of the method’s error, the complexity
analysis of the existing methods and the comparison to the state-of-the-art
approaches.

Term-Class-Max-Support (TCMS): A Simple Text Document Categorization Approach Using Term-Class Relevance Measure

D S Guru, Mahamad Suhil
Comments: 4 Pages, 4 Figures; 2016 Intl. Conference on Advances in Computing, Communications and Informatics (ICACCI), Sept. 21-24, 2016, Jaipur, India
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

In this paper, a simple text categorization method using term-class relevance
measures is proposed. Initially, text documents are processed to extract
significant terms present in them. For every term extracted from a document, we
compute its importance in preserving the content of a class through a novel
term-weighting scheme known as Term_Class Relevance (TCR) measure proposed by
Guru and Suhil (2015) [1]. In this way, for every term, its relevance for all
the classes present in the corpus is computed and stored in the knowledgebase.
During testing, the terms present in the test document are extracted and the
term-class relevance of each term is obtained from the stored knowledgebase. To
achieve quick search of term weights, Btree indexing data structure has been
adapted. Finally, the class which receives maximum support in terms of
term-class relevance is decided to be the class of the given test document. The
proposed method works in logarithmic complexity in testing time and simple to
implement when compared to any other text categorization techniques available
in literature. The experiments conducted on various benchmarking datasets have
revealed that the performance of the proposed method is satisfactory and
encouraging.

Computation and Language

Achieving Human Parity in Conversational Speech Recognition

W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu, G. Zweig
Subjects: Computation and Language (cs.CL)

Conversational speech recognition has served as a flagship speech recognition
task since the release of the DARPA Switchboard corpus in the 1990s. In this
paper, we measure the human error rate on the widely used NIST 2000 test set,
and find that our latest automated system has reached human parity. The error
rate of professional transcriptionists is 5.9% for the Switchboard portion of
the data, in which newly acquainted pairs of people discuss an assigned topic,
and 11.3% for the CallHome portion where friends and family members have
open-ended conversations. In both cases, our automated system establishes a new
state-of-the-art, and edges past the human benchmark. This marks the first time
that human parity has been reported for conversational speech. The key to our
system’s performance is the systematic use of convolutional and LSTM neural
networks, combined with a novel spatial smoothing method and lattice-free MMI
acoustic training.

Pre-Translation for Neural Machine Translation

Jan Niehues, Eunah Cho, Thanh-Le Ha, Alex Waibel
Comments: 9 pages. To appear in COLING 2016
Subjects: Computation and Language (cs.CL)

Recently, the development of neural machine translation (NMT) has
significantly improved the translation quality of automatic machine
translation. While most sentences are more accurate and fluent than
translations by statistical machine translation (SMT)-based systems, in some
cases, the NMT system produces translations that have a completely different
meaning. This is especially the case when rare words occur.

When using statistical machine translation, it has already been shown that
significant gains can be achieved by simplifying the input in a preprocessing
step. A commonly used example is the pre-reordering approach.

In this work, we used phrase-based machine translation to pre-translate the
input into the target language. Then a neural machine translation system
generates the final hypothesis using the pre-translation. Thereby, we use
either only the output of the phrase-based machine translation (PBMT) system or
a combination of the PBMT output and the source sentence.

We evaluate the technique on the English to German translation task. Using
this approach we are able to outperform the PBMT system as well as the baseline
neural MT system by up to 2 BLEU points. We analyzed the influence of the
quality of the initial system on the final result.

Neural Machine Translation Advised by Statistical Machine Translation

Xing Wang, Zhengdong Lu, Zhaopeng Tu, Hang Li, Deyi Xiong, Min Zhang
Comments: submitted to AAAI 2017
Subjects: Computation and Language (cs.CL)

Neural Machine Translation (NMT) is a new approach to machine translation
that has made great progress in recent years. However, recent studies show that
NMT generally produces fluent but inadequate translations (Tu et al. 2016; He
et al. 2016). This is in contrast to conventional Statistical Machine
Translation (SMT), which usually yields adequate but non-fluent translations.
It is natural, therefore, to leverage the advantages of both models for better
translations, and in this work we propose to incorporate SMT model into NMT
framework. More specifically, at each decoding step, SMT offers additional
recommendations of generated words based on the decoding information from NMT
(e.g., the generated partial translation and attention history). Then we employ
an auxiliary classifier to score the SMT recommendations and a gating function
to combine the SMT recommendations with NMT generations, both of which are
jointly trained within the NMT architecture in an end-to-end manner.
Experimental results on Chinese-English translation show that the proposed
approach achieves significant and consistent improvements over state-of the-art
NMT and SMT systems on multiple NIST test sets.

Interactive Attention for Neural Machine Translation

Fandong Meng, Zhengdong Lu, Hang Li, Qun Liu
Comments: Accepted at COLING 2016
Subjects: Computation and Language (cs.CL)

Conventional attention-based Neural Machine Translation (NMT) conducts
dynamic alignment in generating the target sentence. By repeatedly reading the
representation of source sentence, which keeps fixed after generated by the
encoder (Bahdanau et al., 2015), the attention mechanism has greatly enhanced
state-of-the-art NMT. In this paper, we propose a new attention mechanism,
called INTERACTIVE ATTENTION, which models the interaction between the decoder
and the representation of source sentence during translation by both reading
and writing operations. INTERACTIVE ATTENTION can keep track of the interaction
history and therefore improve the translation performance. Experiments on NIST
Chinese-English translation task show that INTERACTIVE ATTENTION can achieve
significant improvements over both the previous attention-based NMT baseline
and some state-of-the-art variants of attention-based NMT (i.e., coverage
models (Tu et al., 2016)). And neural machine translator with our INTERACTIVE
ATTENTION can outperform the open source attention-based NMT system Groundhog
by 4.22 BLEU points and the open source phrase-based system Moses by 3.94 BLEU
points averagely on multiple test sets.

Cached Long Short-Term Memory Neural Networks for Document-Level Sentiment Classification

Jiacheng Xu, Danlu Chen, Xipeng Qiu, Xuangjing Huang
Comments: Published as long paper of EMNLP2016
Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

Recently, neural networks have achieved great success on sentiment
classification due to their ability to alleviate feature engineering. However,
one of the remaining challenges is to model long texts in document-level
sentiment classification under a recurrent architecture because of the
deficiency of the memory unit. To address this problem, we present a Cached
Long Short-Term Memory neural networks (CLSTM) to capture the overall semantic
information in long texts. CLSTM introduces a cache mechanism, which divides
memory into several groups with different forgetting rates and thus enables the
network to keep sentiment information better within a recurrent unit. The
proposed CLSTM outperforms the state-of-the-art models on three publicly
available document-level sentiment analysis datasets.

Translation Quality Estimation using Recurrent Neural Network

Raj Nath Patel, Sasikumar M
Comments: 7 pages, published at First Conference on Machine Translation
Journal-ref: cs.CL.NLP.WMT2016.QE. 2 (2016) 819-824
Subjects: Computation and Language (cs.CL)

This paper describes our submission to the shared task on word/phrase level
Quality Estimation (QE) in the First Conference on Statistical Machine
Translation (WMT16). The objective of the shared task was to predict if the
given word/phrase is a correct/incorrect (OK/BAD) translation in the given
sentence. In this paper, we propose a novel approach for word level Quality
Estimation using Recurrent Neural Network Language Model (RNN-LM) architecture.
RNN-LMs have been found very effective in different Natural Language Processing
(NLP) applications. RNN-LM is mainly used for vector space language modeling
for different NLP problems. For this task, we modify the architecture of
RNN-LM. The modified system predicts a label (OK/BAD) in the slot rather than
predicting the word. The input to the system is a word sequence, similar to the
standard RNN-LM. The approach is language independent and requires only the
translated text for QE. To estimate the phrase level quality, we use the output
of the word level QE system.

Term-Class-Max-Support (TCMS): A Simple Text Document Categorization Approach Using Term-Class Relevance Measure

In this paper, a simple text categorization method using term-class relevance
measures is proposed. Initially, text documents are processed to extract
significant terms present in them. For every term extracted from a document, we
compute its importance in preserving the content of a class through a novel
term-weighting scheme known as Term_Class Relevance (TCR) measure proposed by
Guru and Suhil (2015) [1]. In this way, for every term, its relevance for all
the classes present in the corpus is computed and stored in the knowledgebase.
During testing, the terms present in the test document are extracted and the
term-class relevance of each term is obtained from the stored knowledgebase. To
achieve quick search of term weights, Btree indexing data structure has been
adapted. Finally, the class which receives maximum support in terms of
term-class relevance is decided to be the class of the given test document. The
proposed method works in logarithmic complexity in testing time and simple to
implement when compared to any other text categorization techniques available
in literature. The experiments conducted on various benchmarking datasets have
revealed that the performance of the proposed method is satisfactory and
encouraging.

Generalization of metric classification algorithms for sequences classification and labelling

Roman Samarev, Andrey Vasnetsov, Elizaveta Smelkova
Subjects: Learning (cs.LG); Computation and Language (cs.CL)

The article deals with the issue of modification of metric classification
algorithms. In particular, it studies the algorithm k-Nearest Neighbours for
its application to sequential data. A method of generalization of metric
classification algorithms is proposed. As a part of it, there has been
developed an algorithm for solving the problem of classification and labelling
of sequential data. The advantages of the developed algorithm of classification
in comparison with the existing one are also discussed in the article. There is
a comparison of the effectiveness of the proposed algorithm with the algorithm
of CRF in the task of chunking in the open data set CoNLL2000.

Simultaneous Learning of Trees and Representations for Extreme Classification, with Application to Language Modeling

Yacine Jernite, Anna Choromanska, David Sontag, Yann LeCun
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)

This paper addresses the problem of multi-class classification with an
extremely large number of classes, where the class predictor is learned jointly
with the data representation, as is the case in language modeling problems. The
predictor admits a hierarchical structure, which allows for efficient handling
of settings that deal with a very large number of labels. The predictive power
of the model however can heavily depend on the structure of the tree. We
address this problem with an algorithm for tree construction and training that
is based on a new objective function which favors balanced and easily-separable
node partitions. We describe theoretical properties of this objective function
and show that it gives rise to a boosting algorithm for which we provide a
bound on classification error, i.e. we show that if the objective is weakly
optimized in the internal nodes of the tree, then our algorithm will amplify
this weak advantage to build a tree achieving any desired level of accuracy. We
apply the algorithm to the task of language modeling by re-framing conditional
density estimation as a variant of the hierarchical classification problem. We
empirically demonstrate on text data that the proposed approach leads to
high-quality trees in terms of perplexity and computational running time
compared to its non-hierarchical counterpart.

Distributed, Parallel, and Cluster Computing

Parallel Stream Processing Against Workload Skewness and Variance

Junhua Fang, Rong Zhang, Tom Z.J.Fu, Zhenjie Zhang, Aoying Zhou, Junhua Zhu
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB)

Key-based workload partitioning is a common strategy used in parallel stream
processing engines, enabling effective key-value tuple distribution over worker
threads in a logical operator. While randomized hashing on the keys is capable
of balancing the workload for key-based partitioning when the keys generally
follow a static distribution, it is likely to generate poor balancing
performance when workload variance occurs on the incoming data stream. This
paper presents a new key-based workload partitioning framework, with practical
algorithms to support dynamic workload assignment for stateful operators. The
framework combines hash-based and explicit key-based routing strategies for
workload distribution, which specifies the destination worker threads for a
handful of keys and assigns the other keys with the hashing function. When
short-term distribution fluctuations occur to the incoming data stream, the
system adaptively updates the routing table containing the chosen keys, in
order to rebalance the workload with minimal migration overhead within the
stateful operator. We formulate the rebalance operation as an optimization
problem, with multiple objectives on minimizing state migration costs,
controlling the size of the routing table and breaking workload imbalance among
worker threads. Despite of the NP-hardness nature behind the optimization
formulation, we carefully investigate and justify the heuristics behind key
(re)routing and state migration, to facilitate fast response to workload
variance with ignorable cost to the normal processing in the distributed
system. Empirical studies on synthetic data and real-world stream applications
validate the usefulness of our proposals and prove the huge advantage of our
approaches over state-of-the-art solutions in the literature.

Analysis of the modernization prospects of the WLCG monitoring framework's messaging subsystem

V. Airiian
Comments: 5 pages, XIX International Scientific Conference of Young Scientists and Specialists (AYSS-2015), JINR, Dubna, 16-20 February 2015
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

The purpose of the project is an analysis of the modernization prospects of
the WLCG monitoring framework’s messaging subsystem based on Nagios monitoring
software and Apache ActiveMQ technologies. The modernization process demands
thorough examination of the existing subsystem to determine the vital upgrade
requirements. Thus the work is focused on research of the main underlying
technologies, the existing subsystem’s structure and revision of its design and
used software.

Fault Tolerant Frequent Pattern Mining

Sameh Shohdy, Abhinav Vishnu, Gagan Agrawal
Comments: 10 Pages, High Performance Computing Conference (HIPC 2016)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

FP-Growth algorithm is a Frequent Pattern Min- ing (FPM) algorithm that has
been extensively used to study correlations and patterns in large scale
datasets. While several researchers have designed distributed memory FP-Growth
algorithms, it is pivotal to consider fault tolerant FP-Growth, which can
address the increasing fault rates in large scale systems. In this work, we
propose a novel parallel, algorithm-level fault-tolerant FP-Growth algorithm.
We leverage algorithmic properties and MPI advanced features to guarantee an
O(1) space complexity, achieved by using the dataset memory space itself for
checkpointing. We also propose a recovery algorithm that can use in-memory and
disk-based checkpointing, though in many cases the recovery can be completed
without any disk access, and incurring no memory overhead for checkpointing. We
evaluate our FT algorithm on a large scale InfiniBand cluster with several
large datasets using up to 2K cores. Our evaluation demonstrates excellent
efficiency for checkpointing and recovery in comparison to the disk-based
approach. We have also observed 20x average speed-up in comparison to Spark,
establishing that a well designed algorithm can easily outperform a solution
based on a general fault-tolerant programming model.

Remarks to the paper "Control system for reducing energy consumption in backbone computer network" from "Concurrency and Computation: Practice and Experience" journal

Andrzej Karbowski
Comments: 5 pages. arXiv admin note: substantial text overlap with arXiv:1610.02551
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)

This paper indicates two errors in the formulation of the main optimization
model in the article “Control system for reducing energy consumption in
backbone computer network” by Niewiadomska-Szynkiewicz et al. and shows how to
fix them.

A New Perspective on Randomized Gossip Algorithms

Nicolas Loizou, Peter Richtárik
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT); Optimization and Control (math.OC)

In this short note we propose a new approach for the design and analysis of
randomized gossip algorithms which can be used to solve the average consensus
problem. We show how that Randomized Block Kaczmarz (RBK) method – a method for
solving linear systems – works as gossip algorithm when applied to a special
system encoding the underlying network. The famous pairwise gossip algorithm
arises as a special case. Subsequently, we reveal a hidden duality of
randomized gossip algorithms, with the dual iterative process maintaining a set
of numbers attached to the edges as opposed to nodes of the network. We prove
that RBK obtains a superlinear speedup in the size of the block, and
demonstrate this effect through experiments.

Analysis and Modeling of Social Influence in High Performance Computing Workloads

Shuai Zheng, Zon-Yin Shae, Xiangliang Zhang, Hani Jamjoom, Liana Fong
Comments: The 17th International European Conference on Parallel and Distributed Computing, Euro-Par 2011
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Social and Information Networks (cs.SI)

Social influence among users (e.g., collaboration on a project) creates
bursty behavior in the underlying high performance computing (HPC) workloads.
Using representative HPC and cluster workload logs, this paper identifies,
analyzes, and quantifies the level of social influence across HPC users. We
show the existence of a social graph that is characterized by a pattern of
dominant users and followers. This pattern also follows a power-law
distribution, which is consistent with those observed in mainstream social
networks. Given its potential impact on HPC workloads prediction and
scheduling, we propose a fast-converging, computationally-efficient online
learning algorithm for identifying social groups. Extensive evaluation shows
that our online algorithm can (1) quickly identify the social relationships by
using a small portion of incoming jobs and (2) can efficiently track group
evolution over time.

A Distributed Parallel Algorithm for Minimum Spanning Tree Problem

Artem Mazeev, Alexander Semenov, Alexey Simonov
Comments: 11 pages, 5 figures, 2 tables
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

In this paper we present and evaluate a parallel algorithm for solving a
minimum spanning tree (MST) problem for supercomputers with distributed memory.
The algorithm relies on the relaxation of the message processing order
requirement for one specific message type compared to the original GHS
(Gallager, Humblet, Spira) algorithm. Our algorithm adopts hashing and message
compression optimization techniques as well. To the best of our knowledge, this
is the first parallel implementation of the GHS algorithm that linearly scales
to more than 32 nodes (256 cores) of Infiniband cluster.

Improving Server Utilization in a Distributed Computing Set-up with Independent Clients

Anindya S. Chakrabarti, Diptesh Ghosh
Comments: 16 pages, 8 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

We consider a set-up in which there are multiple servers and multiple clients
in a large distributed computing system. Clients request servers to process
jobs. Servers can only process one job in unit time. There is no coordinating
agent to route client requests to servers, and clients choose servers
independently and simultaneously, and only have access to the outcomes of their
own past requests. If more than one clients choose the same server, then only
one randomly chosen client’s requests will be fulfilled. If some servers do not
receive any request, they remain idle. In this paper, we show that a large
category of strategies are not effective in terms of server utilization. We
devise strategies for clients that improve server utilization of such systems
over those of strategies known in the current literature.

Decentralized Collaborative Learning of Personalized Models over Networks

Paul Vanhaesebrouck, Aurélien Bellet, Marc Tommasi
Subjects: Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (cs.SY); Machine Learning (stat.ML)

We consider a set of learning agents in a collaborative peer-to-peer network,
where each agent learns a personalized model according to its own learning
objective. The question addressed in this paper is: how can agents improve upon
their locally trained model by communicating with other agents that have
similar objectives? We introduce and analyze two asynchronous gossip algorithms
running in a fully decentralized manner. Our first approach, inspired from
label propagation, aims to smooth pre-trained local models over the network
while accounting for the confidence that each agent has in its initial model.
In our second approach, agents jointly learn and propagate their model by
making iterative updates based on both their local dataset and the behavior of
their neighbors. Our algorithm to optimize this challenging objective in a
decentralized way is based on ADMM.

Efficient Random Sampling – Parallel, Vectorized, Cache-Efficient, and Online

Peter Sanders, Sebastian Lamm, Lorenz Hübschle-Schneider, Emanuel Schrade, Carsten Dachsbacher
Subjects: Mathematical Software (cs.MS); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

We consider the problem of sampling $n$ numbers from the range
${1,ldots,N}$ without replacement on modern architectures. The main result
is a simple divide-and-conquer scheme that makes sequential algorithms more
cache efficient and leads to a parallel algorithm running in expected time
$mathcal{O}left(n/p+log p
ight)$ on $p$ processors. The amount of
communication between the processors is very small and independent of the
sample size. We also discuss modifications needed for load balancing, reservoir
sampling, online sampling, sampling with replacement, Bernoulli sampling, and
vectorization on SIMD units or GPUs.

Fault Detection Engine in Intelligent Predictive Analytics Platform for DCIM

With the advancement of huge data generation and data handling capability,
Machine Learning and Probabilistic modelling enables an immense opportunity to
employ predictive analytics platform in high security critical industries
namely data centers, electricity grids, utilities, airport etc. where downtime
minimization is one of the primary objectives. This paper proposes a novel,
complete architecture of an intelligent predictive analytics platform, Fault
Engine, for huge device network connected with electrical/information flow.
Three unique modules, here proposed, seamlessly integrate with available
technology stack of data handling and connect with middleware to produce online
intelligent prediction in critical failure scenarios. The Markov Failure module
predicts the severity of a failure along with survival probability of a device
at any given instances. The Root Cause Analysis model indicates probable
devices as potential root cause employing Bayesian probability assignment and
topological sort. Finally, a community detection algorithm produces correlated
clusters of device in terms of failure probability which will further narrow
down the search space of finding route cause. The whole Engine has been tested
with different size of network with simulated failure environments and shows
its potential to be scalable in real-time implementation.

Learning

Decentralized Collaborative Learning of Personalized Models over Networks

Paul Vanhaesebrouck, Aurélien Bellet, Marc Tommasi
Subjects: Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (cs.SY); Machine Learning (stat.ML)

We consider a set of learning agents in a collaborative peer-to-peer network,
where each agent learns a personalized model according to its own learning
objective. The question addressed in this paper is: how can agents improve upon
their locally trained model by communicating with other agents that have
similar objectives? We introduce and analyze two asynchronous gossip algorithms
running in a fully decentralized manner. Our first approach, inspired from
label propagation, aims to smooth pre-trained local models over the network
while accounting for the confidence that each agent has in its initial model.
In our second approach, agents jointly learn and propagate their model by
making iterative updates based on both their local dataset and the behavior of
their neighbors. Our algorithm to optimize this challenging objective in a
decentralized way is based on ADMM.

Risk-Aware Algorithms for Adversarial Contextual Bandits

Wen Sun, Debadeepta Dey, Ashish Kapoor
Comments: 28 pages
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

In this work we consider adversarial contextual bandits with risk
constraints. At each round, nature prepares a context, a cost for each arm, and
additionally a risk for each arm. The learner leverages the context to pull an
arm and then receives the corresponding cost and risk associated with the
pulled arm. In addition to minimizing the cumulative cost, the learner also
needs to satisfy long-term risk constraints — the average of the cumulative
risk from all pulled arms should not be larger than a pre-defined threshold. To
address this problem, we first study the full information setting where in each
round the learner receives an adversarial convex loss and a convex constraint.
We develop a meta algorithm leveraging online mirror descent for the full
information setting and extend it to contextual bandit with risk constraints
setting using expert advice. Our algorithms can achieve near-optimal regret in
terms of minimizing the total cost, while successfully maintaining a sublinear
growth of cumulative risk constraint violation.

Efficient Metric Learning for the Analysis of Motion Data

Babak Hosseini, Barbara Hammer
Comments: Pre-Print of DSAA2015 conference paper, Presented at the Data Science and Advanced Analytics (DSAA), Paris, France (2015)
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

We investigate metric learning in the context of dynamic time warping (DTW),
the by far most popular dissimilarity measure used for the comparison and
analysis of motion capture data. While metric learning enables a
problem-adapted representation of data, the majority of meth- ods has been
proposed for vectorial data only. In this contribution, we extend the popular
principle offered by the large margin nearest neighbours learner (LMNN) to DTW
by treating the resulting component-wise dissimilarity values as features. We
demonstrate, that this principle greatly enhances the classification accuracy
in several benchmarks. Further, we show that recent auxiliary concepts such as
metric regularisation can be transferred from the vectorial case to
component-wise DTW in a similar way. We illustrate, that metric regularisation
constitutes a crucial prerequisite for the interpretation of the resulting
relevance profiles.

Wind ramp event prediction with parallelized Gradient Boosted Regression Trees

Saurav Gupta, Nitin Anand Shrivastava, Abbas Khosravi, Bijaya Ketan Panigrahi
Comments: IJCNN 2016
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

Accurate prediction of wind ramp events is critical for ensuring the
reliability and stability of the power systems with high penetration of wind
energy. This paper proposes a classification based approach for estimating the
future class of wind ramp event based on certain thresholds. A parallelized
gradient boosted regression tree based technique has been proposed to
accurately classify the normal as well as rare extreme wind power ramp events.
The model has been validated using wind power data obtained from the National
Renewable Energy Laboratory database. Performance comparison with several
benchmark techniques indicates the superiority of the proposed technique in
terms of superior classification accuracy.

Convergence rate of stochastic k-means

Cheng Tang, Claire Monteleoni
Subjects: Learning (cs.LG)

We analyze online and mini-batch k-means variants. Both scale up the widely
used Lloyd ‘s algorithm via stochastic approximation, and have become popular
for large-scale clustering and unsupervised feature learning. We show, for the
first time, that they have global convergence towards local optima at
$O(frac{1}{t})$ rate under general conditions. In addition, we show if the
dataset is clusterable, with suitable initialization, mini-batch k-means
converges to an optimal k-means solution with $O(frac{1}{t})$ convergence rate
with high probability. The k-means objective is non-convex and
non-differentiable: we exploit ideas from non-convex gradient-based
optimization by providing a novel characterization of the trajectory of k-means
algorithm on its solution space, and circumvent its non-differentiability via
geometric insights about k-means update.

Towards K-means-friendly Spaces: Simultaneous Deep Learning and Clustering

Bo Yang, Xiao Fu, Nicholas D. Sidiropoulos, Mingyi Hong
Comments: Main paper: 9 pages; Supplementary material: 3 pages
Subjects: Learning (cs.LG)

Most learning approaches treat dimensionality reduction (DR) and clustering
separately (i.e., sequentially), but recent research has shown that optimizing
the two tasks jointly can substantially improve the performance of both. The
premise behind the latter genre is that the data samples are obtained via
linear transformation of latent representations that are easy to cluster; but
in practice, the transformation from the latent space to the data can be more
complicated. In this work, we assume that this transformation is an unknown and
possibly nonlinear function. To recover the `clustering-friendly’ latent
representations and to better cluster the data, we propose a joint DR and
K-means clustering approach in which DR is accomplished via learning a deep
neural network (DNN). The motivation is to keep the advantages of jointly
optimizing the two tasks, while exploiting the deep neural network’s ability to
approximate any nonlinear function. This way, the proposed approach can work
well for a broad class of generative models. Towards this end, we carefully
design the DNN structure and the associated joint optimization criterion, and
propose an effective and scalable algorithm to handle the formulated
optimization problem. Experiments using five different real datasets are
employed to showcase the effectiveness of the proposed approach.

Similarity Learning for Time Series Classification

Maria-Irina Nicolae, Éric Gaussier, Amaury Habrard, Marc Sebban
Comments: Techreport
Subjects: Learning (cs.LG)

Multivariate time series naturally exist in many fields, like energy,
bioinformatics, signal processing, and finance. Most of these applications need
to be able to compare these structured data. In this context, dynamic time
warping (DTW) is probably the most common comparison measure. However, not much
research effort has been put into improving it by learning. In this paper, we
propose a novel method for learning similarities based on DTW, in order to
improve time series classification. Making use of the uniform stability
framework, we provide the first theoretical guarantees in the form of a
generalization bound for linear classification. The experimental study shows
that the proposed approach is efficient, while yielding sparse classifiers.

Generalization of metric classification algorithms for sequences classification and labelling

Roman Samarev, Andrey Vasnetsov, Elizaveta Smelkova
Subjects: Learning (cs.LG); Computation and Language (cs.CL)

The article deals with the issue of modification of metric classification
algorithms. In particular, it studies the algorithm k-Nearest Neighbours for
its application to sequential data. A method of generalization of metric
classification algorithms is proposed. As a part of it, there has been
developed an algorithm for solving the problem of classification and labelling
of sequential data. The advantages of the developed algorithm of classification
in comparison with the existing one are also discussed in the article. There is
a comparison of the effectiveness of the proposed algorithm with the algorithm
of CRF in the task of chunking in the open data set CoNLL2000.

A Closed Form Solution to Multi-View Low-Rank Regression

Shuai Zheng, Xiao Cai, Chris Ding, Feiping Nie, Heng Huang
Comments: Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI, 2015
Subjects: Learning (cs.LG)

Real life data often includes information from different channels. For
example, in computer vision, we can describe an image using different image
features, such as pixel intensity, color, HOG, GIST feature, SIFT features,
etc.. These different aspects of the same objects are often called multi-view
(or multi-modal) data. Low-rank regression model has been proved to be an
effective learning mechanism by exploring the low-rank structure of real life
data. But previous low-rank regression model only works on single view data. In
this paper, we propose a multi-view low-rank regression model by imposing
low-rank constraints on multi-view regression model. Most importantly, we
provide a closed-form solution to the multi-view low-rank regression model.
Extensive experiments on 4 multi-view datasets show that the multi-view
low-rank regression model outperforms single-view regression model and reveals
that multi-view low-rank structure is very helpful.

A probabilistic model for the numerical solution of initial value problems

Michael Schober, Simo Särkkä, Philipp Hennig
Comments: 18 pages, 5 figures
Subjects: Numerical Analysis (math.NA); Learning (cs.LG); Numerical Analysis (cs.NA); Machine Learning (stat.ML)

Like many numerical methods, solvers for initial value problems (IVPs) on
ordinary differential equations estimate an analytically intractable quantity,
using the results of tractable computations as inputs. This structure is
closely connected to the notion of inference on latent variables in statistics.
We describe a class of algorithms that formulate the solution to an IVP as
inference on a latent path that is a draw from a Gaussian process probability
measure (or equivalently, the solution of a linear stochastic differential
equation). We then show that certain members of this class are identified
exactly with existing generalized linear methods for ODEs, in particular a
number of Runge–Kutta methods and Nordsieck methods. This probabilistic
formulation of classic methods is valuable in two ways: analytically, it
highlights implicit prior assumptions favoring certain approximate solutions to
the IVP over others, and gives a precise meaning to the old observation that
these methods act like filters. Practically, it endows the classic solvers with
`docking points’ for notions of uncertainty and prior information about the
initial value, the value of the ODE itself, and the solution of the problem.

BET on Independence

Kai Zhang
Subjects: Statistics Theory (math.ST); Learning (cs.LG); Computation (stat.CO); Methodology (stat.ME); Machine Learning (stat.ML)

We study the problem of model-free dependence detection. This problem can be
difficult even when the marginal distributions are known. We explain this
difficulty by showing the impossibility to uniformly consistently distinguish
degeneracy from independence with any single test. To make model-free
dependence detection a tractable problem, we introduce the concept of binary
expansion statistics (BEStat) and propose the binary expansion testing (BET)
framework. Through simple mathematics, we convert the dependence detection
problem to a multiple testing problem. Besides being model-free, the BET also
enjoys many other advantages which include (1) invariance to monotone marginal
transformations, (2) clear interpretability of local relationships upon
rejection, and (3) close connections to computing for efficient algorithms. We
illustrate the BET by studying the distribution of the brightest stars in the
night sky.

The Peaking Phenomenon in Semi-supervised Learning

Jesse H. Krijthe, Marco Loog
Comments: 11 pages, 5 figures. S+SSPR 2016, M’erida, Mexico
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

For the supervised least squares classifier, when the number of training
objects is smaller than the dimensionality of the data, adding more data to the
training set may first increase the error rate before decreasing it. This,
possibly counterintuitive, phenomenon is known as peaking. In this work, we
observe that a similar but more pronounced version of this phenomenon also
occurs in the semi-supervised setting, where instead of labeled objects,
unlabeled objects are added to the training set. We explain why the learning
curve has a more steep incline and a more gradual decline in this setting
through simulation studies and by applying an approximation of the learning
curve based on the work by Raudys & Duin.

Lazifying Conditional Gradient Algorithms

Gábor Braun, Sebastian Pokutta, Daniel Zink
Comments: 18 pages and 16 pages of computational results
Subjects: Data Structures and Algorithms (cs.DS); Learning (cs.LG)

Conditional gradient algorithms (also often called Frank-Wolfe algorithms)
are popular due to their simplicity of only requiring a linear optimization
oracle and more recently they also gained significant traction for online
learning. While simple in principle, in many cases the actual implementation of
the linear optimization oracle is costly. We show a general method to lazify
various conditional gradient algorithms, which in actual computations leads to
several orders of magnitude of speedup in wall-clock time. This is achieved by
using a faster separation oracle instead of a linear optimization oracle,
relying only on few linear optimization oracle calls.

Encoding the Local Connectivity Patterns of fMRI for Cognitive State Classification

Itir Onal Ertugrul, Mete Ozay, Fatos T. Yarman Vural
Comments: 8 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

In this work, we propose a novel framework to encode the local connectivity
patterns of brain, using Fisher Vectors (FV), Vector of Locally Aggregated
Descriptors (VLAD) and Bag-of-Words (BoW) methods. We first obtain local
descriptors, called Mesh Arc Descriptors (MADs) from fMRI data, by forming
local meshes around anatomical regions, and estimating their relationship
within a neighborhood. Then, we extract a dictionary of relationships, called
extit{brain connectivity dictionary} by fitting a generative Gaussian mixture
model (GMM) to a set of MADs, and selecting the codewords at the mean of each
component of the mixture. Codewords represent the connectivity patterns among
anatomical regions. We also encode MADs by VLAD and BoW methods using the
k-Means clustering.

We classify the cognitive states of Human Connectome Project (HCP) task fMRI
dataset, where we train support vector machines (SVM) by the encoded MADs.
Results demonstrate that, FV encoding of MADs can be successfully employed for
classification of cognitive tasks, and outperform the VLAD and BoW
representations. Moreover, we identify the significant Gaussians in mixture
models by computing energy of their corresponding FV parts, and analyze their
effect on classification accuracy. Finally, we suggest a new method to
visualize the codewords of brain connectivity dictionary.

Probabilistic Dimensionality Reduction via Structure Learning

Li Wang
Comments: 32 pages, 6 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We propose a novel probabilistic dimensionality reduction framework that can
naturally integrate the generative model and the locality information of data.
Based on this framework, we present a new model, which is able to learn a
smooth skeleton of embedding points in a low-dimensional space from
high-dimensional noisy data. The formulation of the new model can be
equivalently interpreted as two coupled learning problem, i.e., structure
learning and the learning of projection matrix. This interpretation motivates
the learning of the embedding points that can directly form an explicit graph
structure. We develop a new method to learn the embedding points that form a
spanning tree, which is further extended to obtain a discriminative and compact
feature representation for clustering problems. Unlike traditional clustering
methods, we assume that centers of clusters should be close to each other if
they are connected in a learned graph, and other cluster centers should be
distant. This can greatly facilitate data visualization and scientific
discovery in downstream analysis. Extensive experiments are performed that
demonstrate that the proposed framework is able to obtain discriminative
feature representations, and correctly recover the intrinsic structures of
various real-world datasets.

Dynamic Stacked Generalization for Node Classification on Networks

Zhen Han, Alyson Wilson
Comments: 9 pages, 6 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Social and Information Networks (cs.SI); Applications (stat.AP)

We propose a novel stacked generalization (stacking) method as a dynamic
ensemble technique using a pool of heterogeneous classifiers for node label
classification on networks. The proposed method assigns component models a set
of functional coefficients, which can vary smoothly with certain topological
features of a node. Compared to the traditional stacking model, the proposed
method can dynamically adjust the weights of individual models as we move
across the graph and provide a more versatile and significantly more accurate
stacking model for label prediction on a network. We demonstrate the benefits
of the proposed model using both a simulation study and real data analysis.

Sample Efficient Optimization for Learning Controllers for Bipedal Locomotion

Rika Antonova, Akshara Rai, Christopher G. Atkeson
Comments: To appear in International Conference on Humanoid Robots (Humanoids ‘2016), IEEE-RAS. (Rika Antonova and Akshara Rai contributed equally)
Subjects: Robotics (cs.RO); Learning (cs.LG)

Learning policies for bipedal locomotion can be difficult, as experiments are
expensive and simulation does not usually transfer well to hardware. To counter
this, we need al- gorithms that are sample efficient and inherently safe.
Bayesian Optimization is a powerful sample-efficient tool for optimizing
non-convex black-box functions. However, its performance can degrade in higher
dimensions. We develop a distance metric for bipedal locomotion that enhances
the sample-efficiency of Bayesian Optimization and use it to train a 16
dimensional neuromuscular model for planar walking. This distance metric
reflects some basic gait features of healthy walking and helps us quickly
eliminate a majority of unstable controllers. With our approach we can learn
policies for walking in less than 100 trials for a range of challenging
settings. In simulation, we show results on two different costs and on various
terrains including rough ground and ramps, sloping upwards and downwards. We
also perturb our models with unknown inertial disturbances analogous with
differences between simulation and hardware. These results are promising, as
they indicate that this method can potentially be used to learn control
policies on hardware.

An Adaptive Test of Independence with Analytic Kernel Embeddings

Wittawat Jitkrittum, Zoltan Szabo, Arthur Gretton
Comments: 8 pages of main text
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

A new computationally efficient dependence measure, and an adaptive
statistical test of independence, are proposed. The dependence measure is the
difference between analytic embeddings of the joint distribution and the
product of the marginals, evaluated at a finite set of locations (features).
These features are chosen so as to maximize a lower bound on the test power,
resulting in a test that is data-efficient, and that runs in linear time (with
respect to the sample size n). The optimized features can be interpreted as
evidence to reject the null hypothesis, indicating regions in the joint domain
where the joint distribution and the product of the marginals differ most.
Consistency of the independence test is established, for an appropriate choice
of features. In real-world benchmarks, independence tests using the optimized
features perform comparably to the state-of-the-art quadratic-time HSIC test,
and outperform competing O(n) and O(n log n) tests.

Simultaneous Learning of Trees and Representations for Extreme Classification, with Application to Language Modeling

Yacine Jernite, Anna Choromanska, David Sontag, Yann LeCun
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)

This paper addresses the problem of multi-class classification with an
extremely large number of classes, where the class predictor is learned jointly
with the data representation, as is the case in language modeling problems. The
predictor admits a hierarchical structure, which allows for efficient handling
of settings that deal with a very large number of labels. The predictive power
of the model however can heavily depend on the structure of the tree. We
address this problem with an algorithm for tree construction and training that
is based on a new objective function which favors balanced and easily-separable
node partitions. We describe theoretical properties of this objective function
and show that it gives rise to a boosting algorithm for which we provide a
bound on classification error, i.e. we show that if the objective is weakly
optimized in the internal nodes of the tree, then our algorithm will amplify
this weak advantage to build a tree achieving any desired level of accuracy. We
apply the algorithm to the task of language modeling by re-framing conditional
density estimation as a variant of the hierarchical classification problem. We
empirically demonstrate on text data that the proposed approach leads to
high-quality trees in terms of perplexity and computational running time
compared to its non-hierarchical counterpart.

Playing FPS Games with Deep Reinforcement Learning

Guillaume Lample, Devendra Singh Chaplot
Comments: The authors contributed equally to this work
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

Advances in deep reinforcement learning have allowed autonomous agents to
perform well on Atari games, often outperforming humans, using only raw pixels
to make their decisions. However, most of these games take place in 2D
environments that are fully observable to the agent. In this paper, we present
the first architecture to tackle 3D environments in first-person shooter games,
that involve partially observable states. Typically, deep reinforcement
learning methods only utilize visual input for training. We present a method to
augment these models to exploit game feature information such as the presence
of enemies or items, during the training phase. Our model is trained to
simultaneously learn these features along with minimizing a Q-learning
objective, which is shown to dramatically improve the training speed and
performance of our agent. Our architecture is also modularized to allow
different models to be independently trained for different phases of the game.
We show that the proposed architecture substantially outperforms built-in AI
agents of the game as well as humans in deathmatch scenarios.

Information Theory

Achieving Perfect Location Privacy in Wireless Devices Using Anonymization

Zarrin Montazeri, Amir Houmansadr, Hossein Pishro-Nik
Comments: 12 pages, 3 figures
Subjects: Information Theory (cs.IT)

The popularity of mobile devices and location-based services (LBS) has
created great concern regarding the location privacy of their users.
Anonymization is a common technique that is often used to protect the location
privacy of LBS users. Here, we present an information-theoretic approach to
define the notion of perfect location privacy. We show how LBS’s should use the
anonymization method to ensure that their users can achieve perfect location
privacy. First, we assume that a user’s current location is independent from
her past locations. Using this i.i.d model, we show that if the pseudonym of
the user is changed before O(n^(2/r-1)) observations are made by the adversary
for that user, then the user has perfect location privacy. Here, n is the
number of the users in the network and r is the number of all possible
locations that users can go to. Next, we model users’ movements using Markov
chains to better model real-world movement patterns. We show that perfect
location privacy is achievable for a user if the user’s pseudonym is changed
before O(n^(2/|E|-r)) observations are collected by the adversary for the user,
where |E| is the number of edges in the user’s Markov chain model.

Estimation of Received Signal Strength Distribution for Smart Meters with Biased Measurement Data Set

Mathias Rønholt Kielgast, Anders Charly Rasmussen, Mathias Hjorth Laursen, Jimmy Jessen Nielsen, Petar Popovski, Rasmus Krigslund
Comments: Accepted for publication in IEEE Wireless Communications Letters
Subjects: Information Theory (cs.IT)

This letter presents an experimental study and a novel modelling approach of
the wireless channel of smart utility meters placed in basements or sculleries.
The experimental data consist of signal strength measurements of consumption
report packets. Since such packets are only registered if they can be decoded
by the receiver, the part of the signal strength distribution that falls below
the receiver sensitivity threshold is not observable. We combine a Rician
fading model with a bias function that captures the cut-off in the observed
signal strength measurements. Two sets of experimental data are analysed. It is
shown that the proposed method offers an approximation of the distribution of
the signal strength measurements that is better than a na”ive Rician fitting.

Improved bounds for sparse recovery from subsampled random convolutions

Shahar Mendelson, Holger Rauhut, Rachel Ward
Comments: 39 pages
Subjects: Information Theory (cs.IT); Probability (math.PR)

We study the recovery of sparse vectors from subsampled random convolutions
via $ell_1$-minimization. We consider the setup in which both the subsampling
locations as well as the generating vector are chosen at random. For a
subgaussian generator with independent entries, we improve previously known
estimates: if the sparsity $s$ is small enough, i.e.~$s lesssim
sqrt{n/log(n)}$, we show that $m gtrsim s log(en/s)$ measurements are
sufficient to recover $s$-sparse vectors in dimension $n$ with high
probability, matching the well-known condition for recovery from standard
Gaussian measurements. If $s$ is larger, then essentially $m geq s log^2(s)
log(log(s)) log(n)$ measurements are sufficient, again improving over
previous estimates. Moreover, we also provide robustness estimates for
measurement errors that are bounded in $ell_q$ for $q > 2$ — in particular,
allowing the case $q=infty$ which is important for quantized compressive
sensing. All these results are shown via $ell_q$-robust versions of the null
space property and for $q > 2$ they represent the first non-trivial bounds for
structured random matrices. As a crucial ingredient, our approach requires to
lower bound expressions of the form $inf_{v in V_r} | Gamma_v xi|_q$,
where $Gamma_v$ is a set of matrices indexed by unit norm $r$-sparse vectors
and $xi$ is a subgaussian random vector. This involves the combination of
small ball estimates with chaining techniques.

Joint Relay-User Beamforming Design in Full-Duplex Two-Way Relay Channel

Zhigang Wen, Shuai Wang, Xiaoqing Liu, Junwei Zou
Comments: To appear in IEEE Transactions on Vehicular Technology
Subjects: Information Theory (cs.IT)

A full-duplex two-way relay channel with multiple antennas is considered. For
this three-node network, the beamforming design needs to suppress
self-interference. While a traditional way is to apply zero-forcing for
self-interference mitigation, it may harm the desired signals. In this paper, a
design which reserves a fraction of self-interference is proposed by solving a
quality-of-service constrained beamforming design problem. Since the problem is
challenging due to the loop self-interference, a convergence-guaranteed
alternating optimization algorithm is proposed to jointly design the relay-user
beamformers. Numerical results show that the proposed scheme outperforms
zero-forcing method, and achieves a transmit power close to the ideal case.

An interleaver design for polar codes over slow fading channels

Saurabha R Tavildar
Comments: 3 pages, 4 figures
Subjects: Information Theory (cs.IT)

We consider the problem of using polar codes over slow fading wireless
channels. For design, we focus on a parallel slow fading channel with 2 blocks,
and polar codes with rate <= 1/2. Motivated by Arikan’s systematic polar code
construction, we propose an interleaver design for a general polar code. The
interleaver comprises of using the bit reversal of the order of polarized bit
channels. This interleaver is called a diversity interleaver. In addition to
the diversity interleaver, a diversity polar code is proposed to further
increase the diversity gain.

The proposed designs are evaluated via link simulations for AWGN and fading
channels. The simulation results show a performance close to the outage
probability (within 2 dB) and significant gains over using a random
interleaver.

Joint Multiuser Downlink Beamforming and Admission Control for Green Cloud-RANs with Limited Fronthaul Based on Mixed Integer Semi-definite Program

Zhi Yu, Ke Wang, Lei Chen, Hong Ji
Comments: 13 pages, 8 figures. This article has been submitted to IEEE Transactions on Mobile Computing
Subjects: Information Theory (cs.IT)

With the dense deployment of the remote radio heads (RRHs), the huge network
power consumption has become a great challenge for green cloud radio access
networks (Cloud-RANs), and multiuser downlink beamforming has been proposed as
a promising solution. Moreover, the increasing number of mobile users (MUs)
causes that admission control is essential for Cloud-RAN with limited fronthaul
capacity and predefined power budget. In this paper, we consider the problem of
joint multiuser downlink beamforming and admission control (JBAC) to enhance
the admitted MUs in the network and reduce the network power consumption, while
taking into account the Quality of Service requirements of the MUs, the power
budget constraints and fronthaul limitation. It is shown that the JBAC problem
is a mixed integer nonlinear problem, and still non-convex even though the
continuous relaxation is adopted. Therefore, we first transform the JBAC
problem into a Mixed-Integer Semidefinite Program. Then, we propose a bound
improving Branch and Bound algorithm to yield the near-optimal solution. For
practical application, a polynomial-time heuristic algorithm is proposed to
derive the sub-optimal solution. Extensive simulations are conducted with
different system configurations to show the effectiveness of the proposed two
schemes.

Joint Transmit Power and Relay Two-Way Beamforming Optimization for Energy-Harvesting Full-Duplex Communications

Alexander A. Okandeji, Muhammad R. A. Khandaker, Kai-Kit Wong, Zhongbin Zheng
Comments: Accepted for Publication. Globecom 2016
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

This paper studies the joint optimization problem of two-way relay
beamforming, the receiver power splitting (PS) ratio as well as the transmit
power at the sources to maximize the achievable sum-rate of a simultaneous
wireless information and power transfer (SWIPT) system with a full-duplex (FD)
multiple-input multiple-output (MIMO) amplify and forward (AF) relay, assuming
perfect channel state information (CSI). In particular, our contribution is an
iterative algorithm based on the difference of convex programming (DC) and one
dimensional searching to achieve the joint solution. Simulation results are
provided to demonstrate the effectiveness of the proposed algorithm.

Joint CoMP-Cell Selection and Resource Allocation with Fronthaul-Constrained C-RAN

Lei You, Di Yuan
Comments: 6 pages
Subjects: Information Theory (cs.IT)

Cloud-based Radio Access Network (C-RAN) is a promising architecture for
future cellular networks, in which Baseband Units (BBUs) are placed at a
centralized location, with capacity-constrained fronthaul connected to multiple
distributed Remote Radio Units (RRHs) that are far away from the BBUs. The
centralization of signal processing enables the flexibility for coordinated
multi-point transmission (CoMP) to meet high traffic demand of users. We
investigate how to jointly optimize CoMP-cell selection and base station
resource allocation so as to enhance the quality of service (QoS), subject to
the fronthaul capacity constraint in orthogonal frequency-division multiple
access (OFDMA) based C-RAN. The problem is proved to be NP-hard in this paper.
To deal with the computational complexity, we derive a partial optimality
condition as the foundation for designing a cell-selection algorithm. Besides,
we provide a solution method of the optimum of the time-frequency resource
allocation problem without loss of fairness on the QoS enhancement of all
users. The simulations show good performance of the proposed algorithms for
jointly optimizing the cell selection and resource allocation in a C-RAN, with
respect to QoS.

MIMO: State of the Art and the Future in Focus

Mboli Sechang Julius
Comments: This is a simple review on the state of the art of MIMO and the future. It tries to address what Massive MIMO which is the MIMO that is expected to work for 5G networks will look like. Another generation of wireless networks is expected by 2020. It is simply written for educational purpose and not any commercial use. It is intended to help those trying to build knowledge in this area
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

Antennas of transmitters and receivers have been manipulated to increase the
capacity of transmission and reception of signals. Using many elements in
antennas to shape beams and direct nulls in a particular point for optimum
signal transmission and reception has over decades, had tremendous positive
influence in received power and signal to noise ratio (SNR). However, since the
antenna elements manipulation can be done both at base station and device
terminal, it gives rise to an important method of using several antennas to put
and obtain signals to and from space with increased capacity. This principle is
termed Multiple-input and Multiple-output (MIMO). This paper discusses
application of MIMO in the state of the art and next generation of wireless
systems (5G). It also discusses four models of MIMO, SISO, SIMO, MISO and MIMO,
considering three method of combing the signals from multipath propagations,
Selection combining (SC), Equal gain combing (EGC) and maximum ratio combining
(MRC). Spatial diversity and spatial multiplexing are also discussed as form of
MIMO. Finally, Massive or Hyper MIMO which is a new method of increasing
transmission capacity by very large scale for fifth generation of wireless
system is discussed with its challenges and opportunities. Key terms-Diversity
combining techniques, spatial multiplexing, channel state information (CSI).
Massive MIMO

Quantum enhancement of randomness distribution

Raul Garcia-Patron, William Matthews, Andreas Winter
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

The capability of a given channel to communicate information is, a priori,
distinct from its capability to distribute shared randomness. In this article
we define randomness distribution capacities of quantum channels assisted by
forward, back, or two-way classical communication and compare these to the
corresponding communication capacities. With forward assistance or no
assistance, we find that they are equal. We establish the mutual information of
the channel as an upper bound on the two-way assisted randomness distribution
capacity. This implies that all of the capacities are equal for
classical-quantum channels. On the other hand, we show that the back-assisted
randomness distribution capacity of a quantum-classical channels is equal to
its mutual information. This is often strictly greater than the back-assisted
communication capacity. We give an explicit example of such a separation where
the randomness distribution protocol is noiseless.

New Algorithms for Maximizing Cellular Wireless Network Energy Efficiency

Kemal Davaslioglu, Cemil Can Coskun, Ender Ayanoglu
Comments: Information Theory and Applications (ITA) Workshop 2016
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

In this paper, we aim to maximize the energy efficiency of cellular wireless
networks. Specifically, we address the power allocation problem in multi-cell
multi-carrier systems. Considering realistic base station power consumption
models, we formulate a network-wide energy efficiency maximization problem.
Using tools from fractional programming, we cast this problem in the framework
of bi-criterion optimization where rate maximization and power minimization are
weighted accordingly. Interference pricing mechanism is applied to reduce the
intercell interference and to achieve a higher network performance. We
decompose the main problem into subproblems via dual decomposition. These
subproblems are independently solved per sector using limited information
exchange between base stations. We first derive our expressions and present
algorithms for the single-tier networks. Then, we extend our analysis to
two-tier networks where picocell base stations are deployed to improve the
network performance and reduce the link distances. Lastly, we extend our
framework and include the quality-of-service constraints.We obtain closed-form
expressions for the power level updates which are determined by the multi-level
water-filling algorithm, or, as it is sometimes called as, the modified
waterfilling algorithm. Based on our simulation results, we demonstrate that
the proposed algorithms can outperform the benchmark approaches in terms of
energy efficiency by a factor of 2.7.

Uplink-Based Framework for Control Plane Applications in 5G mmWave Cellular Networks

Marco Giordani, Marco Mezzavilla, Sundeep Rangan, Michele Zorzi
Comments: Submitted for publication in IEEE Transactions on Wireless Communications (TWC)
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

The millimeter wave (mmWave) frequencies offer the potential of orders of
magnitude increases in capacity for next-generation cellular systems. However,
links in mmWave networks are susceptible to blockage and may suffer from rapid
variations in quality. Connectivity to multiple cells – in the mmWave and in
the traditional frequencies – is considered essential for robust connectivity.
One of the challenges in supporting multi-connectivity in mmWaves is the
requirement for the network to track the direction of each link in addition to
its power and timing. To address this challenge, this paper proposes a novel
uplink measurement system based on (i) the UE transmitting sounding signals in
directions that sweep the angular space, (ii) the mmWave cells measuring the
instantaneous received signal strength along with its variance to capture the
dynamics and the reliability of a channel/direction and, finally, (iii) a
centralized controller making scheduling decisions based on the mmWave cell
reports and transmitting the decisions either via a mmWave cell or a
conventional LTE cell (when the paths are not available). We argue that the
proposed scheme enables fair and robust cell selection, in addition to
efficient and periodical tracking of the user, in the presence of the channel
variability expected at mmWaves.

A New Perspective on Randomized Gossip Algorithms

Nicolas Loizou, Peter Richtárik
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT); Optimization and Control (math.OC)

In this short note we propose a new approach for the design and analysis of
randomized gossip algorithms which can be used to solve the average consensus
problem. We show how that Randomized Block Kaczmarz (RBK) method – a method for
solving linear systems – works as gossip algorithm when applied to a special
system encoding the underlying network. The famous pairwise gossip algorithm
arises as a special case. Subsequently, we reveal a hidden duality of
randomized gossip algorithms, with the dual iterative process maintaining a set
of numbers attached to the edges as opposed to nodes of the network. We prove
that RBK obtains a superlinear speedup in the size of the block, and
demonstrate this effect through experiments.