Ye Tian, Ran Cheng, Xingyi Zhang, Yaochu Jin
Comments: 20 pages, 12 figures, 4 tables
Subjects: Neural and Evolutionary Computing (cs.NE)
Over the last three decades, a large number of evolutionary algorithms have been developed for solving multiobjective optimization problems. However, an up-to-date and comprehensive software platform is lacking, both for researchers to properly benchmark existing algorithms and for practitioners to apply selected algorithms to their real-world problems. The demand for such a common tool becomes even more urgent when the source code of many proposed algorithms has not been made publicly available. To address these issues, in this paper we develop a MATLAB platform for evolutionary multi-objective optimization, called PlatEMO, which includes more than 50 multi-objective evolutionary algorithms and more than 100 multi-objective test problems, along with several widely used performance indicators. With a user-friendly graphical user interface, PlatEMO enables users to easily compare several evolutionary algorithms at one time and collect statistical results in Excel or LaTeX files. More importantly, PlatEMO is completely open source, so that users are able to develop new algorithms on the basis of it. This paper introduces the main features of PlatEMO and illustrates how to use it for performing comparative experiments, embedding new algorithms, creating new test problems, and developing performance indicators. The source code of PlatEMO is available at: this http URL
Yanghao Li, Naiyan Wang, Jiaying Liu, Xiaodi Hou
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Neural Style Transfer has recently demonstrated very exciting results that have caught attention in both academia and industry. Despite the amazing results, the principle of neural style transfer, especially why the Gram matrices can represent style, remains unclear. In this paper, we propose a novel interpretation of neural style transfer by treating it as a domain adaptation problem. Specifically, we theoretically show that matching the Gram matrices of feature maps is equivalent to minimizing the Maximum Mean Discrepancy (MMD) with a second-order polynomial kernel. Thus, we argue that the essence of neural style transfer is to match the feature distributions between the style images and the generated images. To further support our standpoint, we experiment with several other distribution alignment methods and achieve appealing results. We believe this novel interpretation connects these two important research fields and could inspire future research.
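The claimed equivalence is easy to check numerically: for a feature map with C channels and N spatial positions, the squared Frobenius distance between Gram matrices equals N^2 times the biased squared MMD with the kernel k(x, y) = (x . y)^2. A minimal sketch (variable names are ours, not the paper's):

```python
import numpy as np

C, N = 8, 32                       # channels, spatial positions
rng = np.random.default_rng(0)
Fs = rng.standard_normal((C, N))   # style feature map (columns = positions)
Fg = rng.standard_normal((C, N))   # generated-image feature map

# Gram-matrix style loss: squared Frobenius distance between C x C Grams.
Gs, Gg = Fs @ Fs.T, Fg @ Fg.T
gram_loss = np.sum((Gs - Gg) ** 2)

# Biased squared-MMD estimate with the kernel k(x, y) = (x . y)^2.
k = lambda A, B: (A.T @ B) ** 2
mmd2 = (k(Fs, Fs).sum() + k(Fg, Fg).sum() - 2 * k(Fs, Fg).sum()) / N**2

assert np.isclose(gram_loss, N**2 * mmd2)   # identical up to the N^2 factor
```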
Junting Pan, Cristian Canton, Kevin McGuinness, Noel E. O'Connor, Jordi Torres, Elisa Sayrol, Xavier Giro-i-Nieto
Comments: Our results can be reproduced with the source code and trained models available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We introduce SalGAN, a deep convolutional neural network for visual saliency
prediction trained with adversarial examples. The first stage of the network
consists of a generator model whose weights are learned by back-propagation
computed from a binary cross entropy (BCE) loss over downsampled versions of
the saliency maps. The resulting prediction is processed by a discriminator
network trained to solve a binary classification task between the saliency maps
generated by the generative stage and the ground truth ones. Our experiments show that adversarial training reaches state-of-the-art performance across different metrics when combined with a widely used loss function such as BCE.
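As a rough sketch of the kind of combined objective described above, a generator loss mixing a BCE content term with an adversarial term might look as follows; the weighting parameter alpha, its value, and all names here are our assumptions rather than the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def generator_loss(pred_map, gt_map, disc_on_fake, alpha=0.005):
    """Content loss (BCE against the ground-truth map) plus an adversarial term.

    pred_map:     generator output in [0, 1], shape (B, 1, H, W)
    gt_map:       ground-truth saliency map, same shape
    disc_on_fake: discriminator's probability that the predicted map is real
    alpha:        assumed weight trading content against adversarial loss
    """
    content = F.binary_cross_entropy(pred_map, gt_map)
    adversarial = F.binary_cross_entropy(disc_on_fake,
                                         torch.ones_like(disc_on_fake))
    return alpha * content + adversarial
```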
Monit Shah Singh, Vinaychandran Pondenkandath, Bo Zhou, Paul Lukowicz, Marcus Liwicki
Comments: 8 pages, 8 figures, under review
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Convolutional Neural Networks (CNNs) have become the state-of-the-art in various computer vision tasks, but they are still premature for most sensor data, especially in pervasive and wearable computing. A major reason for this is the limited amount of annotated training data. In this paper, we propose the idea of leveraging the discriminative power of pre-trained deep CNNs on 2-dimensional sensor data by transforming the sensor modality to the visual domain. Using three proposed strategies, 2D sensor output is converted into pressure-distribution images. We then use a pre-trained CNN for transfer learning on the converted image data. We evaluate our method on a gait dataset of floor-surface pressure mapping and obtain a classification accuracy of 87.66%, outperforming conventional machine learning methods by over 10%.
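A hedged sketch of this sensor-to-visual transfer setup, assuming a torchvision backbone, simple min-max normalization, and an illustrative class count (all our choices, not necessarily the paper's):

```python
import numpy as np
import torch
import torchvision

def pressure_to_image(frame: np.ndarray) -> torch.Tensor:
    """Map one 2D pressure frame to the 3-channel input a pre-trained CNN expects."""
    x = (frame - frame.min()) / (np.ptp(frame) + 1e-8)   # min-max scale to [0, 1]
    x = torch.from_numpy(x).float()[None, None]          # (1, 1, H, W)
    x = torch.nn.functional.interpolate(x, size=(224, 224), mode="bilinear")
    return x[0].repeat(3, 1, 1)                          # grayscale -> RGB

# Fine-tune only the classification head of a pre-trained backbone.
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
for p in model.parameters():
    p.requires_grad = False                              # freeze the feature extractor
model.fc = torch.nn.Linear(model.fc.in_features, 10)     # gait classes (assumed)
```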
Wei Lian
Subjects: Computer Vision and Pattern Recognition (cs.CV)
To address the problem of 3D point matching where the poses of two point sets
are unknown, we adapt a recently proposed path-following-based method to use a similarity transformation instead of the original affine transformation. The reduced number of transformation parameters leads to more constrained and desirable matching results. Experimental results demonstrate the better robustness of the proposed method over state-of-the-art methods.
Michal Balazia, Petr Sojka
Comments: 13 pages. Full paper published at the 1st IAPR Workshop on Reproducible Research in Pattern Recognition, Cancun, Mexico, December 2016. Postprint. arXiv admin note: text overlap with arXiv:1609.04392, arXiv:1609.06936
Subjects: Computer Vision and Pattern Recognition (cs.CV)
As a contribution to reproducible research, this paper presents a framework and a database to improve the development, evaluation and comparison of methods for gait recognition from motion capture (MoCap) data. The evaluation framework provides implementation details and source code for state-of-the-art human-interpretable geometric features, as well as for our own approaches, where gait features are learned by a modification of Fisher’s Linear Discriminant Analysis with the Maximum Margin Criterion, and by a combination of Principal Component Analysis and Linear Discriminant Analysis. It includes a description and source code of a mechanism for evaluating four class-separability coefficients of the feature space and four rank-based classifier performance metrics. The framework also contains a tool for learning a custom classifier and for classifying a custom query on a custom gallery. We provide an experimental database, along with source code for its extraction from the general CMU MoCap database.
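The PCA-plus-LDA combination mentioned above can be sketched with scikit-learn; the component counts below are illustrative assumptions, not values from the paper:

```python
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# PCA first discards near-singular directions so LDA's within-class scatter
# is well conditioned; LDA then projects onto class-discriminative axes.
pipeline = make_pipeline(
    PCA(n_components=50),
    LinearDiscriminantAnalysis(n_components=10),
)
# X: (n_samples, n_mocap_features), y: walker identities
# gait_features = pipeline.fit_transform(X, y)
```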
Wei Lian, Lei Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Point matching refers to the process of finding a spatial transformation and correspondences between two sets of points. In this paper, we focus on the case in which there is only partial overlap between the two point sets. Following the approach of the robust point matching method, we model point matching as a mixed linear assignment least-squares problem and show that, after eliminating the transformation variable, the resulting minimization with respect to point correspondences is a concave optimization problem. Furthermore, this problem has the property that the objective function can be converted, via a linear transformation, into a form with few nonlinear terms. Based on these properties, we employ the branch-and-bound (BnB) algorithm to optimize the resulting problem, for which the dimension of the search space is small. To further improve the efficiency of the BnB algorithm, whose bottleneck is the computation of the lower bound, we propose a new lower-bounding scheme that has a k-cardinality linear assignment formulation and can be solved efficiently. Experimental results show that the proposed algorithm outperforms state-of-the-art methods in terms of robustness to disturbances and point matching accuracy.
Zhun Fan, Jiewei Lu, Wenji Li
Comments: 10 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Automated blood vessel segmentation plays an important role in the diagnosis and treatment of various cardiovascular and ophthalmologic diseases. In this paper, a novel unsupervised segmentation algorithm is proposed to extract blood vessels from fundus images. First, an enhanced vessel image is generated by morphological reconstruction and divided into three parts: preliminary vessel regions, background regions, and undetermined regions. Second, a list of region features of blood vessels is defined and used to remove noise regions from the preliminary vessel regions and to extract the skeletons of the blood vessels. An intermediate vessel image is obtained by combining the denoised preliminary vessel regions with the vessel skeletons. After that, a novel hierarchical growth algorithm (HGA) is proposed to label each pixel in the undetermined regions as vessel or not in an incremental way, taking advantage of color and spatial information from the intermediate vessel image and the background regions. Finally, postprocessing is performed to remove nonvessel regions. The proposed algorithm has low computational complexity and outperforms many other state-of-the-art supervised and unsupervised methods in terms of accuracy. It achieves a vessel segmentation accuracy of 96.0%, 95.8% and 95.1% in an average time of 10.72s, 15.74s and 50.71s on images from the three publicly available retinal image datasets DRIVE, STARE, and CHASE_DB1, respectively.
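The morphological-reconstruction enhancement step might be sketched with scikit-image as follows; the seed construction and the h-dome-style background subtraction are common choices and our assumption, not the paper's exact procedure:

```python
import numpy as np
from skimage.morphology import reconstruction

def enhance_vessels(green: np.ndarray, h: float = 0.2) -> np.ndarray:
    """Keep thin bright structures via an h-dome style reconstruction.

    green: green channel of a fundus image as floats in [0, 1]; vessels are
    dark there, so we invert to make them bright. The offset h is assumed.
    """
    inverted = 1.0 - green
    seed = np.clip(inverted - h, 0.0, 1.0)   # seed must lie below the mask
    background = reconstruction(seed, inverted, method="dilation")
    return inverted - background             # bright structures of height < h
```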
Ding Liu, Zhaowen Wang, Nasser Nasrabadi, Thomas Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Single image super-resolution (SR) is an ill-posed problem that aims to recover high-resolution (HR) images from their low-resolution (LR) observations. The crux of this problem lies in learning the complex mapping between low-resolution patches and the corresponding high-resolution patches. Prior art has used either a mixture of simple regression models or a single non-linear neural network for this purpose. This paper proposes learning a mixture of SR inference modules in a unified framework to tackle this problem. Specifically, a number of SR inference modules, each specialized in a different class of local image patterns, are first independently applied to the LR image to obtain various HR estimates, and the resulting HR estimates are adaptively aggregated to form the final HR image. By selecting neural networks as the SR inference modules, the whole procedure can be incorporated into a unified network and optimized jointly. Extensive experiments are conducted to investigate the relation between restoration performance and different network architectures. Compared with other current image SR approaches, our proposed method consistently achieves state-of-the-art restoration results on a wide range of images while allowing more flexible design choices. The source code is available at this http URL
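The adaptive aggregation step can be viewed as a pixel-wise softmax gate over the module outputs; a minimal sketch, with shapes and the gating mechanism assumed by us:

```python
import torch

def aggregate(hr_estimates: torch.Tensor, gate_logits: torch.Tensor) -> torch.Tensor:
    """Blend per-module HR estimates with pixel-wise softmax weights.

    hr_estimates: (B, M, C, H, W) outputs of the M SR inference modules
    gate_logits:  (B, M, 1, H, W) unnormalized per-pixel module scores
    """
    weights = torch.softmax(gate_logits, dim=1)   # convex weights over modules
    return (weights * hr_estimates).sum(dim=1)    # (B, C, H, W) final HR image
```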
Yuki Itoh, Siwei Feng, Marco F. Duarte, Mario Parente
Comments: 15 pages, 11 figures, 4 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper proposes a new hyperspectral unmixing method for nonlinearly mixed
hyperspectral data using a semantic representation in a semi-supervised
fashion, assuming the availability of a spectral reference library. Existing
semi-supervised unmixing algorithms select members from an endmember library
that are present at each of the pixels; most such methods assume a linear
mixing model. However, those methods will fail in the presence of nonlinear
mixing among the observed spectra. To address this issue, we develop an
endmember selection method using a recently proposed semantic spectral
representation obtained via a non-homogeneous hidden Markov chain (NHMC) model
for a wavelet transform of the spectra. The semantic representation can encode
spectrally discriminative features for any observed spectrum and, therefore,
our proposed method can perform endmember selection without any assumption on
the mixing model. Experimental results show that in the presence of
sufficiently nonlinear mixing our proposed method outperforms dictionary-based
sparse unmixing approaches based on linear models.
Zhipeng Jia, Xingyi Huang, Eric I-Chao Chang, Yan Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we develop a new weakly-supervised learning algorithm to learn
to segment cancerous regions in histopathology images. Our work is under a
multiple instance learning framework (MIL) with a new formulation, deep weak
supervision (DWS); we also propose an effective way to introduce constraints to
our neural networks to assist the learning process. The contributions of our
algorithm are threefold: (1) We build an end-to-end learning system that
segments cancerous regions with fully convolutional networks (FCN) in which
image-to-image weakly-supervised learning is performed. (2) We develop a deep weak supervision formulation to exploit multi-scale learning under weak supervision within fully convolutional networks. (3) Constraints on positive instances are introduced in our approach to effectively explore additional weakly-supervised information that is easy to obtain and provides a significant boost to the learning process. The proposed algorithm, abbreviated as DWS-MIL,
is easy to implement and can be trained efficiently. Our system demonstrates
state-of-the-art results on large-scale histopathology image datasets and can
be applied to various applications in medical imaging beyond histopathology
images such as MRI, CT, and ultrasound images.
Dmitry Krotov, John J Hopfield
Subjects: Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)
Deep neural networks (DNN) trained in a supervised way suffer from two known
problems. First, the minima of the objective function used in learning
correspond to data points (also known as rubbish examples or fooling images)
that lack semantic similarity with the training data. Second, a clean input can be changed by a small perturbation, often imperceptible to human vision, so that the resulting deformed input is misclassified by the network. These findings emphasize the differences between the ways DNNs and humans classify patterns, and raise the question of designing learning algorithms that mimic human perception more accurately than existing methods.
Our paper examines these questions within the framework of Dense Associative Memory (DAM) models. These models are defined by an energy function with higher-order (higher than quadratic) interactions between the neurons. We show
that in the limit when the power of the interaction vertex in the energy
function is sufficiently large, these models have the following three
properties. First, the minima of the objective function are free from rubbish
images, so that each minimum is a semantically meaningful pattern. Second,
artificial patterns poised precisely at the decision boundary look ambiguous to
human subjects and share aspects of both classes that are separated by that
decision boundary. Third, adversarial images constructed by models with small
power of the interaction vertex, which are equivalent to DNN with rectified
linear units (ReLU), fail to transfer to and fool the models with higher order
interactions. This opens up the possibility of using higher-order models for detecting and stopping malicious adversarial attacks. The presented results suggest that DAMs with higher-order energy functions are closer to human visual perception than DNNs with ReLUs.
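A sketch of the energy function described above, with stored memories xi_mu and interaction power n; the rectified-polynomial form F(a) = max(a, 0)^n is one variant used in this line of work, and the details here are our assumptions:

```python
import numpy as np

def dam_energy(x, memories, n=20):
    """DAM energy E(x) = -sum_mu F(xi_mu . x) with F(a) = max(a, 0)**n.

    x:        state vector in {-1, +1}^D
    memories: stored patterns xi_mu, shape (M, D)
    n:        interaction power; n = 2 recovers the classical Hopfield energy
    """
    overlaps = memories @ x
    return -np.sum(np.maximum(overlaps, 0.0) ** n)

def update_bit(x, memories, i, n=20):
    """Asynchronous update: set bit i to whichever sign lowers the energy."""
    xp, xm = x.copy(), x.copy()
    xp[i], xm[i] = 1.0, -1.0
    x[i] = 1.0 if dam_energy(xp, memories, n) <= dam_energy(xm, memories, n) else -1.0
    return x
```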
Roni Khardon, Scott Sanner
Subjects: Artificial Intelligence (cs.AI)
Lifted probabilistic inference (Poole, 2003) and symbolic dynamic programming
for lifted stochastic planning (Boutilier et al., 2001) were introduced around
the same time as algorithmic efforts to use abstraction in stochastic systems.
Over the years, these ideas evolved into two distinct lines of research, each
supported by a rich literature. Lifted probabilistic inference focused on
efficient arithmetic operations on template-based graphical models under a
finite domain assumption while symbolic dynamic programming focused on
supporting sequential decision-making in rich quantified logical action models
and on open domain reasoning. Given their common motivation but different focal
points, both lines of research have yielded highly complementary innovations.
In this chapter, we aim to help close the gap between these two research areas
by providing an overview of lifted stochastic planning from the perspective of
probabilistic inference, showing strong connections to other chapters in this
book. This also allows us to define Generalized Lifted Inference as a paradigm
that unifies these areas and elucidates open problems for future research that
can benefit both lifted inference and stochastic planning.
Nithyanand Kota, Abhishek Mishra, Sunil Srinivasa, Xi (Peter) Chen, Pieter Abbeel
Subjects: Artificial Intelligence (cs.AI)
The high variance of unbiased policy-gradient methods such as VPG and REINFORCE is typically mitigated by adding a baseline. However, the baseline fitting itself suffers from underfitting or overfitting. In this paper, we develop a K-fold method for baseline estimation in policy gradient algorithms. The parameter K is a baseline estimation hyperparameter that adjusts the bias-variance trade-off in the baseline estimates. We demonstrate the usefulness of our approach via two state-of-the-art policy gradient algorithms on three MuJoCo locomotion control tasks.
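A minimal sketch of the K-fold idea as we read it: fit the baseline on K-1 folds of the collected data and compute advantages on the held-out fold, so no advantage uses a baseline fitted on its own sample (the function names and the ridge baseline are our choices):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge

def kfold_advantages(states, returns, k=5):
    """Advantage estimates A = R - b(s) with an out-of-fold baseline b.

    states: (N, d) visited states; returns: (N,) empirical returns.
    """
    advantages = np.empty_like(returns)
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True).split(states):
        baseline = Ridge().fit(states[train_idx], returns[train_idx])
        advantages[test_idx] = returns[test_idx] - baseline.predict(states[test_idx])
    return advantages
```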
I. Boulkaibet, T. Marwala, M.I. Friswell, H. Haddad Khodaparast, S. Adhikari
Comments: This article was accepted by the 2017 International Modal Analysis Conference
Subjects: Artificial Intelligence (cs.AI)
In this paper, a non-probabilistic method based on fuzzy logic is used to
update finite element models (FEMs). Model updating techniques use the measured
data to improve the accuracy of numerical models of structures. However, the
measured data are contaminated with experimental noise and the models are
inaccurate due to randomness in the parameters. This kind of aleatory
uncertainty is irreducible, and may decrease the accuracy of the finite element
model updating process. However, uncertainty quantification methods can be used
to identify the uncertainty in the updating parameters. In this paper, the
uncertainties associated with the modal parameters are defined as fuzzy
membership functions, while the model updating procedure is defined as an
optimization problem at each $\alpha$-cut level. To determine the membership
functions of the updated parameters, an objective function is defined and
minimized using two metaheuristic optimization algorithms: ant colony
optimization (ACO) and particle swarm optimization (PSO). A structural example
is used to investigate the accuracy of the fuzzy model updating strategy using
the PSO and ACO algorithms. Furthermore, the results obtained by the fuzzy
finite element model updating are compared with the Bayesian model updating
results.
Daniel Borchmann, Tom Hanika, Sergei Obiedkov
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Learning (cs.LG)
We revisit the notion of probably approximately correct implication bases
from the literature and present a first formulation in the language of formal
concept analysis, with the goal of investigating whether such bases represent a suitable substitute for exact implication bases in practical use cases. To this
end, we quantitatively examine the behavior of probably approximately correct
implication bases on artificial and real-world data sets and compare their
precision and recall with respect to their corresponding exact implication
bases. Using a small example, we also provide qualitative insight that
implications from probably approximately correct bases can still represent
meaningful knowledge from a given data set.
Christoph Hube, Frank Fischer, Robert Jäschke, Gerhard Lauer, Mads Rosendahl Thomsen
Comments: 33 pages, 6 figures, 6 tables
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Among the manifold takes on world literature, it is our goal to contribute to
the discussion from a digital point of view by analyzing the representation of
world literature in Wikipedia with its millions of articles in hundreds of
languages. As a preliminary, we introduce and compare three different
approaches to identify writers on Wikipedia using data from DBpedia, a
community project with the goal of extracting and providing structured
information from Wikipedia. Equipped with our basic set of writers, we analyze
how they are represented throughout the 15 biggest Wikipedia language versions.
We combine intrinsic measures (mostly examining the connectedness of articles)
with extrinsic ones (analyzing how often articles are frequented by readers)
and develop methods to evaluate our results. The better part of our findings
seems to convey a rather conservative, old-fashioned version of world
literature, but a version derived from reproducible facts revealing an implicit
literary canon based on the editing and reading behavior of millions of people.
While some known issues remain to be solved, the introduced methods will help
us build an observatory of world literature to further investigate its
representativeness and biases.
Ryan Cotterell, Hinrich Schütze
Comments: Accepted at TACL
Subjects: Computation and Language (cs.CL)
Much like sentences are composed of words, words themselves are composed of
smaller units. For example, the English word questionably can be analyzed as
question+able+ly. However, this structural decomposition of the word does not
directly give us a semantic representation of the word’s meaning. Since
morphology obeys the principle of compositionality, the semantics of the word
can be systematically derived from the meaning of its parts. In this work, we
propose a novel probabilistic model of word formation that captures both the
analysis of a word w into its constituent segments and the synthesis of the
meaning of w from the meanings of those segments. Our model jointly learns to
segment words into morphemes and compose distributional semantic vectors of
those morphemes. We experiment with the model on English CELEX data and German
DerivBase (Zeller et al., 2013) data. We show that jointly modeling semantics
increases both segmentation accuracy and morpheme F1 by between 3% and 5%.
Additionally, we investigate different models of vector composition, showing
that recurrent neural networks yield an improvement over simple additive
models. Finally, we study the degree to which the representations correspond to
a linguist’s notion of morphological productivity.
Xuezhe Ma, Eduard Hovy
Comments: arXiv admin note: text overlap with arXiv:1603.01354
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
In this paper, we propose a probabilistic parsing model, which defines a
proper conditional probability distribution over non-projective dependency
trees for a given sentence, using neural representations as inputs. The neural
network architecture is based on bi-directional LSTM-CNNs, which benefit from both word- and character-level representations automatically by using a combination of bidirectional LSTMs and CNNs. On top of the neural network, we
introduce a probabilistic structured layer, defining a conditional log-linear
model over non-projective trees. We evaluate our model on 17 different
datasets, across 14 different languages. By exploiting Kirchhoff’s Matrix-Tree
Theorem (Tutte, 1984), the partition functions and marginals can be computed
efficiently, leading to a straightforward end-to-end model training procedure
via back-propagation. Our parser achieves state-of-the-art parsing performance
on 9 datasets.
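The role of the Matrix-Tree Theorem here can be illustrated in a few lines: for edge log-potentials psi, the partition function over non-projective trees rooted at position 0 is the determinant of a Laplacian minor (a sketch in our own notation, not the authors' code):

```python
import numpy as np

def log_partition(psi: np.ndarray) -> float:
    """Log-partition over non-projective trees for edge log-potentials psi.

    psi[h, m] scores the arc head h -> modifier m; position 0 is the root.
    By the Matrix-Tree Theorem, the sum of exp-scores over all spanning
    arborescences rooted at 0 is the determinant of the Laplacian with the
    root's row and column removed.
    """
    s = np.exp(psi)
    np.fill_diagonal(s, 0.0)                 # no self-arcs
    laplacian = np.diag(s.sum(axis=0)) - s   # in-weight sums on the diagonal
    _, logdet = np.linalg.slogdet(laplacian[1:, 1:])
    return logdet
```

Arc marginals then follow by differentiating this log-determinant, which is what makes end-to-end training by back-propagation straightforward.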
Herman Kamper
Comments: PhD thesis, University of Edinburgh, 107 pages, submitted and accepted 2016
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
In settings where only unlabelled speech data is available, zero-resource
speech technology needs to be developed without transcriptions, pronunciation
dictionaries, or language modelling text. There are two central problems in
zero-resource speech processing: (i) finding frame-level feature
representations which make it easier to discriminate between linguistic units
(phones or words), and (ii) segmenting and clustering unlabelled speech into
meaningful units. In this thesis, we argue that a combination of top-down and
bottom-up modelling is advantageous in tackling these two problems.
To address the problem of frame-level representation learning, we present the
correspondence autoencoder (cAE), a neural network trained with weak top-down
supervision from an unsupervised term discovery system. By combining this
top-down supervision with unsupervised bottom-up initialization, the cAE yields
much more discriminative features than previous approaches. We then present our
unsupervised segmental Bayesian model that segments and clusters unlabelled
speech into hypothesized words. By imposing a consistent top-down segmentation
while also using bottom-up knowledge from detected syllable boundaries, our
system outperforms several others on multi-speaker conversational English and
Xitsonga speech data. Finally, we show that the clusters discovered by the
segmental Bayesian model can be made less speaker- and gender-specific by using
features from the cAE instead of traditional acoustic features.
In summary, the different models and systems presented in this thesis show
that both top-down and bottom-up modelling can improve representation learning,
segmentation and clustering of unlabelled speech data.
Amir Hossein Yazdavar, Monireh Ebrahimi, Naomie Salim
Comments: Text mining, Natural language processing, Sentiment analysis, Fuzzy set theory
Subjects: Computation and Language (cs.CL)
With the rapid growth of social media on the web, emotional polarity computation has become a flourishing frontier in the text mining community. However, it is challenging to understand the latest trends and summarize the state or general opinions about products, due to the great diversity and size of social media data, and this creates the need for automated, real-time opinion extraction and mining. Moreover, the bulk of current research has been devoted to studying subjective sentences that contain opinion keywords, while limited work has been reported on objective statements that imply sentiment. In this paper, a fuzzy-based knowledge engineering model is developed for sentiment classification of a special group of such sentences, namely those describing a change or deviation from a desired range or value. Drug reviews are a rich source of such statements. Therefore, in this research, experiments were carried out on patients' reviews of several different cholesterol-lowering drugs to determine their sentiment polarity. The main conclusion of this study is that, in order to increase the accuracy of existing drug opinion mining systems, objective sentences that imply opinion should be taken into account. Our experimental results demonstrate that the proposed model obtains an F1 score of over 72 percent.
Marcus Brandenburger, Christian Cachin, Matthias Lorenz, Rüdiger Kapitza
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Novel hardware-aided trusted execution environments, as provided by Intel’s Software Guard Extensions (SGX), make it possible to execute applications in a secure context that enforces confidentiality and integrity of the application state even when the host system is misbehaving. While this paves the way towards
secure and trustworthy cloud computing, essential system support to protect
persistent application state against rollback and forking attacks is missing.
In this paper we present LCM, a lightweight protocol to establish a collective memory amongst all clients of a remote application to detect integrity and
consistency violations. LCM enables the detection of rollback attacks against
the remote application, enforces the consistency notion of fork-linearizability
and notifies clients about operation stability. The protocol exploits the
trusted execution environment, complements it with simple client-side
operations, and maintains only small, constant storage at the clients. This
simplifies the solution compared to previous approaches, where the clients had
to verify all operations initiated by other clients. We have implemented LCM
and demonstrated its advantages with a key-value store application. The
evaluation shows that it introduces low network and computation overhead; in particular, an LCM-protected key-value store achieves 0.8x to 1.1x of the throughput of an SGX-secured key-value store.
Paul E. McKenney
Comments: 477 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
The purpose of this book is to help you program shared-memory parallel
machines without risking your sanity. We hope that this book’s design
principles will help you avoid at least some parallel-programming pitfalls.
That said, you should think of this book as a foundation on which to build,
rather than as a completed cathedral. Your mission, if you choose to accept it, is
to help make further progress in the exciting field of parallel
programming–progress that will in time render this book obsolete. Parallel
programming is not as hard as some say, and we hope that this book makes your
parallel-programming projects easier and more fun.
In short, where parallel programming once focused on science, research, and
grand-challenge projects, it is quickly becoming an engineering discipline. We
therefore examine specific parallel-programming tasks and describe how to
approach them. In some surprisingly common cases, they can even be automated.
This book is written in the hope that presenting the engineering discipline
underlying successful parallel-programming projects will free a new generation
of parallel hackers from the need to slowly and painstakingly reinvent old
wheels, enabling them to instead focus their energy and creativity on new
frontiers. We sincerely hope that parallel programming brings you at least as much fun, excitement, and challenge as it has brought to us!
Severin Sadjina, Lars T. Kyllingstad, Martin Rindarøy, Stian Skjong, Vilmar Æsøy, Dariusz Eirik Fathi, Vahid Hassani, Trond Johnsen, Jørgen Bremnes Nielsen, Eilif Pedersen
Comments: 17 pages, 9 figures
Subjects: Computational Engineering, Finance, and Science (cs.CE); Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (cs.SY)
Here, we present the concept of an open virtual prototyping framework for
maritime systems and operations that enables its users to develop re-usable
component or subsystem models, and combine them in full-system simulations for
prototyping, verification, training, and performance studies. This framework
consists of a set of guidelines for model coupling, high-level and low-level
coupling interfaces to guarantee interoperability, full-system simulation software, and example models and demonstrators. We discuss the requirements for
such a framework, address the challenges and the possibilities in fulfilling
them, and aim to give a list of best practices for modular and efficient
virtual prototyping and full-system simulation. The context of our work is
within maritime systems and operations, but the issues and solutions we present
here are general enough to be of interest to a much broader audience, both
industrial and scientific.
Audrey Durand, Christian Gagné
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Many real-world applications are characterized by a number of conflicting
performance measures. As optimizing in a multi-objective setting leads to a set
of non-dominated solutions, a preference function is required for selecting the
solution with the appropriate trade-off between the objectives. This preference
function is often unknown, especially when it comes from an expert human user.
However, if we could provide the expert user with a proper estimation for each
action, she would be able to pick her best choice. The question is: how good do
these estimations have to be in order for her choice to remain the same as if
she had access to the exact values? In this paper, we introduce the concept of
preference radius to characterize the robustness of the preference function and
provide guidelines for controlling the quality of estimations in the
multi-objective setting. More specifically, we provide a general formulation of
multi-objective optimization under the bandits setting and the pure exploration
setting with user feedback for articulating the preferences. We show how the
preference radius relates to the optimal gap and how it can be used to analyze
algorithms in the bandits and pure exploration settings. We finally present
experiments in the bandits setting, where we evaluate the impact of noise and
delayed expert user feedback, and in the pure exploration setting, where we
compare multi-objective Thompson sampling with uniform sampling.
Tao Hong
Comments: 4 figures, 2 tables
Subjects: Learning (cs.LG)
An efficient algorithm is proposed in this letter to optimize the Projection Matrix and Sparsifying Dictionary (PMSD) simultaneously on a large training dataset through a stochastic method. A closed-form solution is derived for optimizing the projection matrix with a fixed sparsifying dictionary, and the stochastic method is used to optimize the sparsifying dictionary with a fixed optimized projection matrix on a large training dataset. Benefiting from training on a large dataset, the proposed method yields much better performance in terms of signal recovery accuracy than existing ones. Simulation results on natural images demonstrate its effectiveness and efficiency.
Marco Gori, Marco Maggini, Alessandro Rossi
Subjects: Learning (cs.LG)
We analyze a new approach to machine learning that arises from a modification of classical regularization networks by casting the process in the time dimension, leading to a sort of collapse of dimensionality in the problem of learning the model parameters. This approach allows the definition of an online learning algorithm that progressively accumulates the knowledge provided in the input trajectory. The regularization principle leads to a solution based on a dynamical system that is paired with a procedure to develop a graph structure that stores the input regularities acquired from the temporal evolution. We report an extensive experimental exploration of the behavior of the parameters of the proposed model and an evaluation on an artificial dataset.
Li Liu, Yongzhong Yang, Lakshmi Narasimhan Govindarajan, Shu Wang, Bin Hu, Li Cheng, David S. Rosenblum
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Complex activity recognition is challenging due to the inherent uncertainty
and diversity of performing a complex activity. Normally, each instance of a
complex activity has its own configuration of atomic actions and their temporal
dependencies. We propose in this paper an atomic action-based Bayesian model
that constructs Allen’s interval relation networks to characterize complex
activities with structural varieties in a probabilistic generative way: By
introducing latent variables from the Chinese restaurant process, our approach
is able to capture all possible styles of a particular complex activity as a
unique set of distributions over atomic actions and relations. We also show
that local temporal dependencies can be retained and are globally consistent in
the resulting interval network. Moreover, network structure can be learned from
empirical data. A new dataset of complex hand activities has been constructed and made publicly available; it is much larger than any existing dataset. Empirical evaluations on benchmark datasets as well as our in-house dataset demonstrate the competitiveness of our approach.
Pedro Mercado, Francesco Tudisco, Matthias Hein
Comments: 14 pages, 5 figures. Accepted in Neural Information Processing Systems (NIPS), 2016
Journal-ref: Advances in Neural Information Processing Systems 29,
pp.4421–4429, 2016
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Numerical Analysis (math.NA)
Signed networks allow one to model positive and negative relationships. We analyze existing extensions of spectral clustering to signed networks. It turns out that existing approaches fail to recover the ground-truth clustering in several situations where either the positive or the negative network structure contains no noise. Our analysis shows that these problems arise because existing approaches take some form of arithmetic mean of the Laplacians of the positive and negative parts. As a solution, we propose to use the geometric mean of the Laplacians of the positive and negative parts and show that it outperforms the existing approaches. While the geometric mean of matrices is computationally expensive, we show that the eigenvectors of the geometric mean can be computed efficiently, leading to a numerical scheme for sparse matrices which is of independent interest.
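For dense matrices, the geometric mean of two symmetric positive definite matrices has the standard closed form A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}; a purely illustrative dense sketch (the paper's contribution is precisely to avoid this cubic-cost computation for sparse matrices):

```python
import numpy as np
from scipy.linalg import inv, sqrtm

def geometric_mean(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Geometric mean A # B of two SPD matrices (dense, cubic cost)."""
    a_half = sqrtm(a)
    a_half_inv = inv(a_half)
    inner = sqrtm(a_half_inv @ b @ a_half_inv)
    return np.real(a_half @ inner @ a_half)   # drop numerical imaginary residue

# Spectral clustering on a signed graph would then use the bottom eigenvectors
# of geometric_mean(L_pos, Q_neg) for suitable Laplacians of the positive and
# negative parts (the notation here is ours).
```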
Ali Burak Ünal, Öznur Taştan
Comments: NIPS Workshop on Machine Learning in Computational Biology, Barcelona, Spain, December 10, 2016
Subjects: Computational Engineering, Finance, and Science (cs.CE); Learning (cs.LG)
Characterizing patient somatic mutations through next-generation sequencing technologies opens up possibilities for refining cancer subtypes. However, catalogues of mutations reveal that only a small fraction of genes are altered frequently in patients, while different genomic alterations may perturb the same pathways. We propose a novel clustering procedure that quantifies the similarities of patients based on their mutational profiles on pathways via a novel graph kernel. We represent each KEGG pathway as an undirected graph and, for each patient, assign vertex labels based on her altered genes. The smoothed shortest path graph kernel (smSPK) evaluates each pair of patients by comparing their vertex-labeled pathway graphs. Our clustering procedure involves two steps: first, the smSPK kernel matrix derived for each pathway is input to the kernel k-means algorithm and each pathway is evaluated individually; in the next step, only the successful pathways are combined into a single kernel and input to kernel k-means to stratify patients. Evaluating the procedure on simulated data showed that smSPK clusters patients with up to 88% accuracy. Finally, to identify ovarian cancer patient subgroups, we apply our methodology to The Cancer Genome Atlas ovarian data, which involves 481 patients. The identified subgroups are evaluated through survival analysis. Grouping the patients into four clusters yields patient groups that differ significantly in their survival times ($p$-value $\le 0.005$).
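A sketch of the kernel k-means step that consumes such a kernel matrix, using the standard kernel-space distance expansion (implementation details are our assumptions):

```python
import numpy as np

def kernel_kmeans(K, n_clusters, n_iter=100, seed=0):
    """Cluster items given only their kernel (Gram) matrix K, shape (n, n).

    Uses ||phi(x_i) - c_k||^2 = K_ii - 2*mean_{j in k} K_ij + mean_{j,l in k} K_jl;
    the K_ii term is constant across clusters and can be dropped.
    """
    n = K.shape[0]
    labels = np.random.default_rng(seed).integers(n_clusters, size=n)
    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)
        for k in range(n_clusters):
            members = labels == k
            if not members.any():
                continue          # leave empty clusters unattractive
            dist[:, k] = (-2.0 * K[:, members].mean(axis=1)
                          + K[members][:, members].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```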
Semih Yagli, Yücel Altuğ, Sergio Verdú
Comments: 34 Pages, Submitted to IEEE Transactions on Information Theory
Subjects: Information Theory (cs.IT)
The redundancy for universal lossless compression in Campbell’s setting is characterized as a minimax Rényi divergence, which is shown to be equal to the maximal $\alpha$-mutual information via a generalized redundancy-capacity theorem. Special attention is placed on the analysis of the asymptotics of the minimax Rényi divergence, which is determined up to a term vanishing in the blocklength.
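For reference, the standard definition of the Rényi divergence of order $\alpha$ between distributions P and Q on a finite alphabet (the paper's precise minimax formulation may differ in details):

```latex
% Renyi divergence of order \alpha (\alpha > 0, \alpha \neq 1):
D_\alpha(P \| Q) = \frac{1}{\alpha - 1}
    \log \sum_{x} P(x)^{\alpha} \, Q(x)^{1-\alpha},
\qquad
\lim_{\alpha \to 1} D_\alpha(P \| Q) = D(P \| Q) .
```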
Yuyi Mao, Changsheng You, Jun Zhang, Kaibin Huang, Khaled B. Letaief
Comments: 30 pages, double column, submitted to IEEE Commun. Surveys Tuts
Subjects: Information Theory (cs.IT)
Driven by the visions of Internet of Things and 5G communications, recent
years have seen a paradigm shift in mobile computing, from the centralized
Mobile Cloud Computing towards Mobile Edge Computing (MEC). The main feature of
MEC is to push mobile computing, network control and storage to the network
edges (e.g., base stations and access points) so as to enable
computation-intensive and latency-critical applications at the resource-limited
mobile devices. MEC promises dramatic reduction in latency and mobile energy
consumption, tackling the key challenges for materializing the 5G vision. The
promised gains of MEC have motivated extensive efforts in both academia and
industry on developing the technology. A main thrust of MEC research is to
seamlessly merge the two disciplines of wireless communications and mobile
computing, resulting in a wide range of new designs ranging from techniques for
computation offloading to network architectures. This paper provides a
comprehensive survey of the state-of-the-art MEC research with a focus on joint
radio-and-computational resource management. We also present a research outlook
consisting of a set of promising directions for MEC research, including MEC
system deployment, cache-enabled MEC, mobility management for MEC, green MEC,
as well as privacy-aware MEC. Advancements in these directions will facilitate
the transformation of MEC from theory to practice. Finally, we introduce recent
standardization efforts on MEC as well as some typical MEC application
scenarios.
Harinaivo Andriatahiny, Vololona Harinoro Rakotomalala
Subjects: Information Theory (cs.IT)
We give a description of the weighted Reed-Muller codes over a prime field in
a modular algebra. A description of the homogeneous Reed-Muller codes in the
same ambient space is presented for the binary case. A decoding procedure using
the Landrock-Manz method is developed.
Nuh Aydin, Ajdin Halilovic
Subjects: Information Theory (cs.IT)
Cyclic codes and their various generalizations, such as quasi-twisted (QT)
codes, have a special place in algebraic coding theory. Among other things,
many of the best-known or optimal codes have been obtained from these classes.
In this work we introduce a new generalization of QT codes that we call
multi-twisted (MT) codes and study some of their basic properties. Presenting
several methods of constructing codes in this class and obtaining bounds on the
minimum distances, we show that there exist codes with good parameters in this
class that cannot be obtained as QT or constacyclic codes. This suggests that
considering this larger class in computer searches is promising for
constructing codes with better parameters than the currently best-known linear
codes. Working with this new class of codes motivated us to consider a problem
about binomials over finite fields and to discover a result that is interesting
in its own right.
Ishay Haviv, Michael Langberg, Moshe Schwartz, Eitan Yaakobi
Subjects: Information Theory (cs.IT); Discrete Mathematics (cs.DM); Combinatorics (math.CO)
We prove that there exist non-linear binary cyclic codes that attain the
Gilbert-Varshamov bound.
Peter Harremoës
Comments: 30 pages, 3 figures
Subjects: Information Theory (cs.IT); Statistical Mechanics (cond-mat.stat-mech); Optimization and Control (math.OC)
Logarithmic score and information divergence appear in information theory, statistics, statistical mechanics, and portfolio theory. We demonstrate that all these topics involve some kind of optimization that leads directly to the use of Bregman divergences. If a Bregman divergence also fulfills a sufficiency condition, it must be proportional to information divergence. We demonstrate that sufficiency is equivalent to the apparently weaker notion of locality, and that it is also equivalent to the apparently stronger notion of monotonicity. These sufficiency conditions have quite different relevance in the different areas of application, and often they are not fulfilled. Therefore, sufficiency conditions can be used to explain when results from one area can be transferred directly to another and when one will experience differences.
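For readers unfamiliar with the central object, the Bregman divergence generated by a strictly convex, differentiable function F is defined as follows (a standard definition, not specific to this paper):

```latex
% Bregman divergence generated by a strictly convex, differentiable F:
D_F(x, y) = F(x) - F(y) - \langle \nabla F(y), \, x - y \rangle .
% With F(p) = \sum_i p_i \log p_i on the probability simplex, D_F recovers
% the information (Kullback--Leibler) divergence \sum_i p_i \log(p_i / q_i).
```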
Christos K. Kourtellaris, Charalambos D. Charalambous, Ioannis Tzortzis
Comments: submitted to IEEE Transactions on Information Theory, Paper no. IT-16-0909
Subjects: Information Theory (cs.IT)
We study finite-alphabet channels with unit memory on the previous channel outputs, called UMCO channels. We identify necessary and sufficient conditions to test whether the capacity-achieving channel input distributions with feedback are time-invariant, and whether feedback capacity is characterized by single-letter expressions, similar to those of memoryless channels. The method is based on showing that a certain dynamic programming equation, which in general is a nested optimization problem over the sequence of channel input distributions, reduces to a non-nested optimization problem. Moreover, for UMCO channels, we give a simple expression for the ML error exponent, and we identify sufficient conditions to test whether feedback does not increase capacity. We derive similar results when transmission cost constraints are imposed. We apply the results to a special class of UMCO channels, the Binary State Symmetric Channel (BSSC), with and without transmission cost constraints, to show that the optimization problem of feedback capacity is non-nested, that the capacity-achieving channel input distribution and the corresponding channel output transition probability distribution are time-invariant, and that feedback capacity is characterized by a single-letter formula, precisely as in Shannon's single-letter characterization of the capacity of memoryless channels. We then derive closed-form expressions for the capacity-achieving channel input distribution and the feedback capacity, and use them to evaluate an error exponent for ML decoding.
Gaojie Chen, Justin P. Coon, Marco Di Renzo
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)
We analyze the secrecy outage probability in the downlink for wireless
networks with spatially (Poisson) distributed eavesdroppers (EDs) under the
assumption that the base station employs transmit antenna selection (TAS) to
enhance secrecy performance. We compare the cases where the receiving user
equipment (UE) operates in half-duplex (HD) mode and full-duplex (FD) mode. In
the latter case, the UE simultaneously receives the intended downlink message
and transmits a jamming signal to strengthen secrecy. We investigate two models
of (semi)passive eavesdropping: (1) EDs act independently and (2) EDs collude
to intercept the transmitted message. For both of these models, we obtain
expressions for the secrecy outage probability in the downlink for HD and FD UE
operation. The expressions for HD systems have very accurate approximate or
exact forms in terms of elementary and/or special functions for all path loss
exponents. Those related to the FD systems have exact integral forms for
general path loss exponents, while exact closed forms are given for specific
exponents. A closed-form approximation is also derived for the FD case with
colluding EDs. The resulting analysis shows that the reduction in the secrecy
outage probability is logarithmic in the number of antennas used for TAS and
identifies conditions under which HD operation should be used instead of FD
jamming at the UE. These performance trends and exact relations between system
parameters can be used to develop adaptive power allocation and duplex
operation methods in practice. Examples of such techniques are alluded to
herein.
Peibo Duan, Guoqiang Mao, Shangbo Wang, Changsheng Zhang, Bin Zhang
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
The correlation between traffic observed at two measurement points or traffic stations may be time-varying, attributable to the time-varying speed, which in turn causes variations in the time required to travel between the two points. Based on this observation, in this paper we develop a modified Space-Time Autoregressive Integrated Moving Average (STARIMA) model with time-varying lags for short-term traffic flow prediction. In particular, the temporal lags in the modified STARIMA change with the time-varying speed at different times of the day, or equivalently with the (time-varying) time required to travel between two measurement points. First, a technique is developed to evaluate the temporal lag in the STARIMA model, where the temporal lag is formulated as a function of the spatial lag (spatial distance) and the average speed. Second, an unsupervised classification algorithm based on the ISODATA algorithm is designed to classify different time periods of the day according to the variation of the speed; the classification helps to determine the appropriate time lag to use in the STARIMA model. Finally, a STARIMA-based model with time-varying lags is developed for short-term traffic prediction. Experimental results using real traffic data show that the developed STARIMA-based model with time-varying lags has superior accuracy compared with its counterpart developed using the traditional cross-correlation function and without employing time-varying lags.
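The lag evaluation described above reduces to simple arithmetic: the temporal lag is the travel time between the two stations at the current average speed, rounded to the sampling interval (a sketch with our own variable names and units):

```python
def temporal_lag(distance_m: float, speed_mps: float, step_s: float = 300.0) -> int:
    """Time-varying lag (in sampling steps) between two traffic stations.

    distance_m: spatial distance between the measurement points (meters)
    speed_mps:  average speed for the current time period (meters/second)
    step_s:     sampling interval of the traffic series (seconds)
    """
    travel_time_s = distance_m / max(speed_mps, 1e-6)
    return max(1, int(round(travel_time_s / step_s)))

# Example: stations 5 km apart, 60 km/h average speed, 5-minute samples
# -> a lag of one sampling step.
assert temporal_lag(5000.0, 60 / 3.6) == 1
```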
Zhiyong Zhou, Jun Yu
Subjects: Applications (stat.AP); Information Theory (cs.IT)
In this paper, we consider a soft measure of block sparsity,
$k_\alpha(\mathbf{x}) = \left( \lVert \mathbf{x} \rVert_{2,\alpha} / \lVert \mathbf{x} \rVert_{2,1} \right)^{\frac{\alpha}{1-\alpha}}, \quad \alpha \in [0, \infty],$
and propose a procedure to estimate it by using multivariate isotropic symmetric $\alpha$-stable random projections without sparsity or block sparsity assumptions. The limiting distribution of the estimator is given. Some simulations are conducted to illustrate our theoretical results.
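Computing the measure itself from the block norms is straightforward; a sketch under our reading of the notation, with $\lVert\mathbf{x}\rVert_{2,\alpha}$ taken as the l_alpha norm of the per-block l_2 norms and alpha != 1:

```python
import numpy as np

def block_sparsity(x: np.ndarray, block_size: int, alpha: float) -> float:
    """k_alpha(x) = (||x||_{2,alpha} / ||x||_{2,1})^(alpha / (1 - alpha)).

    x is split into consecutive blocks of length block_size (len(x) is
    assumed divisible by block_size).
    """
    block_l2 = np.linalg.norm(x.reshape(-1, block_size), axis=1)
    mixed_alpha = np.sum(block_l2 ** alpha) ** (1.0 / alpha)
    mixed_one = np.sum(block_l2)
    return float((mixed_alpha / mixed_one) ** (alpha / (1.0 - alpha)))

x = np.zeros(20)
x[:4] = 1.0       # exactly one active block out of five
print(block_sparsity(x, block_size=4, alpha=2.0))   # -> 1.0 (maximally block-sparse);
# the value grows toward the number of blocks as the energy spreads out.
```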
Thibault Lesieur, Florent Krzakala, Lenka Zdeborová
Comments: 64 pages, 12 figures
Subjects: Statistics Theory (math.ST); Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT)
This article is an extended version of previous work of the authors [40, 41]
on low-rank matrix estimation in the presence of constraints on the factors
into which the matrix is factorized. Low-rank matrix factorization is one of
the basic methods used in data analysis for unsupervised learning of relevant
features and other types of dimensionality reduction. We present a framework to
study the constrained low-rank matrix estimation for a general prior on the
factors, and a general output channel through which the matrix is observed. We
draw a parallel with the study of vector-spin glass models, presenting a
unifying way to study a number of problems considered previously in separate
statistical physics works. We present a number of applications for the problem
in data analysis. We derive in detail a general form of the low-rank
approximate message passing (Low-RAMP) algorithm, which is known in statistical
physics as the TAP equations. We thus unify the derivation of the TAP equations
for models as different as the Sherrington-Kirkpatrick model, the restricted
Boltzmann machine, the Hopfield model or vector (XY, Heisenberg and other) spin
glasses. The state evolution of the Low-RAMP algorithm is also derived, and is
equivalent to the replica symmetric solution for the large class of vector-spin
glass models. In the section devoted to results we study in detail phase
diagrams and phase transitions for the Bayes-optimal inference in low-rank
matrix estimation. We present a typology of phase transitions and their
relation to performance of algorithms such as the Low-RAMP or commonly used
spectral methods.