IT博客汇 | arXiv Paper Daily: Mon, 26 Sep 2016

arXiv Paper Daily: Mon, 26 Sep 2016

我爱机器学习(52ml.net)发表于 2016-09-26 00:00:00

Neural and Evolutionary Computing

Multi-Output Artificial Neural Network for Storm Surge Prediction in North Carolina

Anton Bezuglov, Brian Blanton, Reinaldo Santiago
Subjects: Neural and Evolutionary Computing (cs.NE); Atmospheric and Oceanic Physics (physics.ao-ph); Applications (stat.AP)

During hurricane seasons, emergency managers and other decision makers need
accurate and `on-time’ information on potential storm surge impacts. Fully
dynamical computer models, such as the ADCIRC tide, storm surge, and wind-wave
model take several hours to complete a forecast when configured at high spatial
resolution. Additionally, statically meaningful ensembles of high-resolution
models (needed for uncertainty estimation) cannot easily be computed in near
real-time. This paper discusses an artificial neural network model for storm
surge prediction in North Carolina. The network model provides fast, real-time
storm surge estimates at coastal locations in North Carolina. The paper studies
the performance of the neural network model vs. other models on synthetic and
real hurricane data.

Deep Learning in Multi-Layer Architectures of Dense Nuclei

Yonghua Yin, Erol Gelenbe
Comments: 10 pages
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)

In dense clusters of neurons in nuclei, cells may interconnect via
soma-to-soma interactions, in addition to conventional synaptic connections. We
illustrate this idea with a multi-layer architecture (MLA) composed of multiple
clusters of recurrent sub-networks of spiking Random Neural Networks (RNN) with
dense soma-to-soma interactions. We use this RNN-MLA architecture for deep
learning. The inputs to the clusters are normalised by adjusting the external
arrival rates of spikes to each cluster, and then apply this architectures to
learning from multi-channel datasets. We present numerical results based on
both images and sensor based data that show the value of this RNN-MLA for deep
learning.

Regulating Reward Training by Means of Certainty Prediction in a Neural Network-Implemented Pong Game

Matt Oberdorfer, Matt Abuzalaf
Comments: 7 pages, 3 figures
Subjects: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

We present the first reinforcement-learning model to self-improve its
reward-modulated training implemented through a continuously improving
“intuition” neural network. An agent was trained how to play the arcade video
game Pong with two reward-based alternatives, one where the paddle was placed
randomly during training, and a second where the paddle was simultaneously
trained on three additional neural networks such that it could develop a sense
of “certainty” as to how probable its own predicted paddle position will be to
return the ball. If the agent was less than 95% certain to return the ball, the
policy used an intuition neural network to place the paddle. We trained both
architectures for an equivalent number of epochs and tested learning
performance by letting the trained programs play against a near-perfect
opponent. Through this, we found that the reinforcement learning model that
uses an intuition neural network for placing the paddle during reward training
quickly overtakes the simple architecture in its ability to outplay the
near-perfect opponent, additionally outscoring that opponent by an increasingly
wide margin after additional epochs of training.

A Novel Progressive Multi-label Classifier for Classincremental Data

Mihika Dave, Sahil Tapiawala, Meng Joo Er, Rajasekar Venkatesan
Comments: 5 pages, 3 figures, 4 tables
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

In this paper, a progressive learning algorithm for multi-label
classification to learn new labels while retaining the knowledge of previous
labels is designed. New output neurons corresponding to new labels are added
and the neural network connections and parameters are automatically
restructured as if the label has been introduced from the beginning. This work
is the first of the kind in multi-label classifier for class-incremental
learning. It is useful for real-world applications such as robotics where
streaming data are available and the number of labels is often unknown. Based
on the Extreme Learning Machine framework, a novel universal classifier with
plug and play capabilities for progressive multi-label classification is
developed. Experimental results on various benchmark synthetic and real
datasets validate the efficiency and effectiveness of our proposed algorithm.

Computer Vision and Pattern Recognition

Real-time Human Pose Estimation from Video with Convolutional Neural Networks

Marko Linna, Juho Kannala, Esa Rahtu
Comments: 16 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we present a method for real-time multi-person human pose
estimation from video by utilizing convolutional neural networks. Our method is
aimed for use case specific applications, where good accuracy is essential and
variation of the background and poses is limited. This enables us to use a
generic network architecture, which is both accurate and fast. We divide the
problem into two phases: (1) pre-training and (2) finetuning. In pre-training,
the network is learned with highly diverse input data from publicly available
datasets, while in finetuning we train with application specific data, which we
record with Kinect. Our method differs from most of the state-of-the-art
methods in that we consider the whole system, including person detector, pose
estimator and an automatic way to record application specific training material
for finetuning. Our method is considerably faster than many of the
state-of-the-art methods. Our method can be thought of as a replacement for
Kinect, and it can be used for higher level tasks, such as gesture control,
games, person tracking, action recognition and action tracking. We achieved
accuracy of 96.8\% (PCK@0.2) with application specific data.

The face-space duality hypothesis: a computational model

Jonathan Vitale, Mary-Anne Williams, Benjamin Johnston
Comments: in 38th Annual Meeting of the Cognitive Science Society, 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Valentine’s face-space suggests that faces are represented in a psychological
multidimensional space according to their perceived properties. However, the
proposed framework was initially designed as an account of invariant facial
features only, and explanations for dynamic features representation were
neglected. In this paper we propose, develop and evaluate a computational model
for a twofold structure of the face-space, able to unify both identity and
expression representations in a single implemented model. To capture both
invariant and dynamic facial features we introduce the face-space duality
hypothesis and subsequently validate it through a mathematical presentation
using a general approach to dimensionality reduction. Two experiments with real
facial images show that the proposed face-space: (1) supports both identity and
expression recognition, and (2) has a twofold structure anticipated by our
formal argument.

Example-Based Image Synthesis via Randomized Patch-Matching

Yi Ren, Yaniv Romano, Michael Elad
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Image and texture synthesis is a challenging task that has long been drawing
attention in the fields of image processing, graphics, and machine learning.
This problem consists of modelling the desired type of images, either through
training examples or via a parametric modeling, and then generating images that
belong to the same statistical origin.

This work addresses the image synthesis task, focusing on two specific
families of images — handwritten digits and face images. This paper offers two
main contributions. First, we suggest a simple and intuitive algorithm capable
of generating such images in a unified way. The proposed approach taken is
pyramidal, consisting of upscaling and refining the estimated image several
times. For each upscaling stage, the algorithm randomly draws small patches
from a patch database, and merges these to form a coherent and novel image with
high visual quality. The second contribution is a general framework for the
evaluation of the generation performance, which combines three aspects: the
likelihood, the originality and the spread of the synthesized images. We assess
the proposed synthesis scheme and show that the results are similar in nature,
and yet different from the ones found in the training set, suggesting that true
synthesis effect has been obtained.

EgoCap: Egocentric Marker-less Motion Capture with Two Fisheye Cameras

Helge Rhodin, Christian Richardt, Dan Casas, Eldar Insafutdinov, Mohammad Shafiei, Hans-Peter Seidel, Bernt Schiele, Christian Theobalt
Comments: SIGGRAPH Asia 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Marker-based and marker-less optical skeletal motion-capture methods use an
outside-in arrangement of cameras placed around a scene, with viewpoints
converging on the center. They often create discomfort by possibly needed
marker suits, and their recording volume is severely restricted and often
constrained to indoor scenes with controlled backgrounds. Alternative
suit-based systems use several inertial measurement units or an exoskeleton to
capture motion. This makes capturing independent of a confined volume, but
requires substantial, often constraining, and hard to set up body
instrumentation. We therefore propose a new method for real-time, marker-less
and egocentric motion capture which estimates the full-body skeleton pose from
a lightweight stereo pair of fisheye cameras that are attached to a helmet or
virtual reality headset. It combines the strength of a new generative pose
estimation framework for fisheye views with a ConvNet-based body-part detector
trained on a large new dataset. Our inside-in method captures full-body motion
in general indoor and outdoor scenes, and also crowded scenes with many people
in close vicinity. The captured user can freely move around, which enables
reconstruction of larger-scale activities and is particularly useful in virtual
reality to freely roam and interact, while seeing the fully motion-captured
virtual body.

Funnel-Structured Cascade for Multi-View Face Detection with Alignment-Awareness

Shuzhe Wu, Meina Kan, Zhenliang He, Shiguang Shan, Xilin Chen
Comments: Submitted to Neurocomputing (under review). An adapted open source implementation can be found at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Multi-view face detection in open environment is a challenging task due to
diverse variations of face appearances and shapes. Most multi-view face
detectors depend on multiple models and organize them in parallel, pyramid or
tree structure, which compromise between the accuracy and time-cost. Aiming at
a more favorable multi-view face detector, we propose a novel funnel-structured
cascade (FuSt) detection framework. In a coarse-to-fine flavor, our FuSt
consists of, from top to bottom, 1) multiple view-specific fast LAB cascade for
extremely quick face proposal, 2) multiple coarse MLP cascade for further
candidate window verification, and 3) a unified fine MLP cascade with
shape-indexed features for accurate face detection. Compared with other
structures, on the one hand, the proposed one uses multiple computationally
efficient distributed classifiers to propose a small number of candidate
windows but with a high recall of multi-view faces. On the other hand, by using
a unified MLP cascade to examine proposals of all views in a centralized style,
it provides a favorable solution for multi-view face detection with high
accuracy and low time-cost. Besides, the FuSt detector is alignment-aware and
performs a coarse facial part prediction which is beneficial for subsequent
face alignment. Extensive experiments on two challenging datasets, FDDB and
AFW, demonstrate the effectiveness of our FuSt detector in both accuracy and
speed.

EFANNA : An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph

Cong Fu, Deng Cai
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Approximate nearest neighbor (ANN) search is a fundamental problem in many
areas of data mining, machine learning and computer vision. The performance of
traditional hierarchical structure (tree) based methods decreases as the
dimensionality of data grows, while hashing based methods usually lack
efficiency in practice. Recently, the graph based methods have drawn
considerable attention. The main idea is that emph{a neighbor of a neighbor is
also likely to be a neighbor}, which we refer as emph{NN-expansion}. These
methods construct a $k$-nearest neighbor ($k$NN) graph offline. And at online
search stage, these methods find candidate neighbors of a query point in some
way (eg, random selection), and then check the neighbors of these candidate
neighbors for closer ones iteratively. Despite some promising results, there
are mainly two problems with these approaches: 1) These approaches tend to
converge to local optima. 2) Constructing a $k$NN graph is time consuming. We
find that these two problems can be nicely solved when we provide a good
initialization for NN-expansion. In this paper, we propose EFANNA, an extremely
fast approximate nearest neighbor search algorithm based on $k$NN Graph. Efanna
nicely combines the advantages of hierarchical structure based methods and
nearest-neighbor-graph based methods. Extensive experiments have shown that
EFANNA outperforms the state-of-art algorithms both on approximate nearest
neighbor search and approximate nearest neighbor graph construction. To the
best of our knowledge, EFANNA is the fastest algorithm so far both on
approximate nearest neighbor graph construction and approximate nearest
neighbor search. A library EFANNA based on this research is released on Github.

Deep Quality: A Deep No-reference Quality Assessment System

Prajna Paramita Dash, Akshaya Mishra, Alexander Wong
Comments: 2 pages
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)

Image quality assessment (IQA) continues to garner great interest in the
research community, particularly given the tremendous rise in consumer video
capture and streaming. Despite significant research effort in IQA in the past
few decades, the area of no-reference image quality assessment remains a great
challenge and is largely unsolved. In this paper, we propose a novel
no-reference image quality assessment system called Deep Quality, which
leverages the power of deep learning to model the complex relationship between
visual content and the perceived quality. Deep Quality consists of a novel
multi-scale deep convolutional neural network, trained to learn to assess image
quality based on training samples consisting of different distortions and
degradations such as blur, Gaussian noise, and compression artifacts.
Preliminary results using the CSIQ benchmark image quality dataset showed that
Deep Quality was able to achieve strong quality prediction performance (89%
patch-level and 98% image-level prediction accuracy), being able to achieve
similar performance as full-reference IQA methods.

Deep Learning in Multi-Layer Architectures of Dense Nuclei

Yonghua Yin, Erol Gelenbe
Comments: 10 pages
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)

In dense clusters of neurons in nuclei, cells may interconnect via
soma-to-soma interactions, in addition to conventional synaptic connections. We
illustrate this idea with a multi-layer architecture (MLA) composed of multiple
clusters of recurrent sub-networks of spiking Random Neural Networks (RNN) with
dense soma-to-soma interactions. We use this RNN-MLA architecture for deep
learning. The inputs to the clusters are normalised by adjusting the external
arrival rates of spikes to each cluster, and then apply this architectures to
learning from multi-channel datasets. We present numerical results based on
both images and sensor based data that show the value of this RNN-MLA for deep
learning.

Artificial Intelligence

Regulating Reward Training by Means of Certainty Prediction in a Neural Network-Implemented Pong Game

Matt Oberdorfer, Matt Abuzalaf
Comments: 7 pages, 3 figures
Subjects: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

We present the first reinforcement-learning model to self-improve its
reward-modulated training implemented through a continuously improving
“intuition” neural network. An agent was trained how to play the arcade video
game Pong with two reward-based alternatives, one where the paddle was placed
randomly during training, and a second where the paddle was simultaneously
trained on three additional neural networks such that it could develop a sense
of “certainty” as to how probable its own predicted paddle position will be to
return the ball. If the agent was less than 95% certain to return the ball, the
policy used an intuition neural network to place the paddle. We trained both
architectures for an equivalent number of epochs and tested learning
performance by letting the trained programs play against a near-perfect
opponent. Through this, we found that the reinforcement learning model that
uses an intuition neural network for placing the paddle during reward training
quickly overtakes the simple architecture in its ability to outplay the
near-perfect opponent, additionally outscoring that opponent by an increasingly
wide margin after additional epochs of training.

Optimizing positional scoring rules for rank aggregation

Ioannis Caragiannis, Xenophon Chatzigeorgiou, George A. Krimpas, Alexandros A. Voudouris
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC)

Nowadays, several crowdsourcing projects exploit social choice methods for
computing an aggregate ranking of alternatives given individual rankings
provided by workers. Motivated by such systems, we consider a setting where
each worker is asked to rank a fixed (small) number of alternatives and, then,
a positional scoring rule is used to compute the aggregate ranking. Among the
apparently infinite such rules, what is the best one to use? To answer this
question, we assume that we have partial access to an underlying true ranking.
Then, the important optimization problem to be solved is to compute the
positional scoring rule whose outcome, when applied to the profile of
individual rankings, is as close as possible to the part of the underlying true
ranking we know. We study this fundamental problem from a theoretical viewpoint
and present positive and negative complexity results and, furthermore,
complement our theoretical findings with experiments on real-world and
synthetic data.

Discovering Sound Concepts and Acoustic Relations In Text

Anurag Kumar, Bhiksha Raj, Ndapandula Nakashole
Comments: 5 pages
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Learning (cs.LG)

In this paper we describe approaches for discovering acoustic concepts and
relations in text. The first major goal is to be able to identify text phrases
which contain a notion of audibility and can be termed as a sound or an
acoustic concept. We also propose a method to define an acoustic scene through
a set of sound concepts. We use pattern matching and parts of speech tags to
generate sound concepts from large scale text corpora. We use dependency
parsing and LSTM recurrent neural network to predict a set of sound concepts
for a given acoustic scene. These methods are not only helpful in creating an
acoustic knowledge base but also directly help in acoustic event and scene
detection research in a variety of ways.

Towards the bio-personalization of music recommendation systems: A single-sensor EEG biomarker of subjective music preference

Dimitrios A. Adamos (1 and 3), Stavros I. Dimitriadis (2), Nikolaos A. Laskaris (2 and 3), ((1) School of Music Studies, Faculty of Fine Arts, Aristotle University of Thessaloniki, (2) AIIA Lab, Department of Informatics, Aristotle University of Thessaloniki, (3) Neuroinformatics GRoup, Aristotle University of Thessaloniki)
Journal-ref: Information Sciences, Volumes 343 – 344, 20 May 2016, Pages 94 –
108
Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Multimedia (cs.MM)

Recent advances in biosensors technology and mobile electroencephalographic
(EEG) interfaces have opened new application fields for cognitive monitoring. A
computable biomarker for the assessment of spontaneous aesthetic brain
responses during music listening is introduced here. It derives from
well-established measures of cross-frequency coupling (CFC) and quantifies the
music-induced alterations in the dynamic relationships between brain rhythms.
During a stage of exploratory analysis, and using the signals from a suitably
designed experiment, we established the biomarker, which acts on brain
activations recorded over the left prefrontal cortex and focuses on the
functional coupling between high-beta and low-gamma oscillations. Based on data
from an additional experimental paradigm, we validated the introduced biomarker
and showed its relevance for expressing the subjective aesthetic appreciation
of a piece of music. Our approach resulted in an affordable tool that can
promote human-machine interaction and, by serving as a personalized music
annotation strategy, can be potentially integrated into modern flexible music
recommendation systems.

Keywords: Cross-frequency coupling; Human-computer interaction;
Brain-computer interface

Language as a Latent Variable: Discrete Generative Models for Sentence Compression

Yishu Miao, Phil Blunsom
Comments: EMNLP 2016
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In this work we explore deep generative models of text in which the latent
representation of a document is itself drawn from a discrete language model
distribution. We formulate a variational auto-encoder for inference in this
model and apply it to the task of compressing sentences. In this application
the generative model first draws a latent summary sentence from a background
language model, and then subsequently draws the observed sentence conditioned
on this latent summary. In our empirical evaluation we show that generative
formulations of both abstractive and extractive compression yield
state-of-the-art results when trained on a large amount of supervised data.
Further, we explore semi-supervised compression scenarios where we show that it
is possible to achieve performance competitive with previously proposed
supervised models while training on a fraction of the supervised data.

Computation and Language

Incorporating Relation Paths in Neural Relation Extraction

Wenyuan Zeng, Yankai Lin, Zhiyuan Liu, Maosong Sun
Comments: 9 pages, 3 figures, 4 tables
Subjects: Computation and Language (cs.CL)

Distantly supervised relation extraction has been widely used to find novel
relational facts from plain text. To predict the relation between a pair of two
target entities, existing methods solely rely on those direct sentences
containing both entities. In fact, there are also many sentences containing
only one of the target entities, which provide rich and useful information for
relation extraction. To address this issue, we build inference chains between
two target entities via intermediate entities, and propose a path-based neural
relation extraction model to encode the relational semantics from both direct
sentences and inference chains. Experimental results on real-world datasets
show that, our model can make full use of those sentences containing only one
target entity, and achieves significant and consistent improvements on relation
extraction as compared with baselines.

AMR-to-text generation as a Traveling Salesman Problem

Linfeng Song, Yue Zhang, Xiaochang Peng, Zhiguo Wang, Daniel Gildea
Comments: accepted by EMNLP 2016
Subjects: Computation and Language (cs.CL)

The task of AMR-to-text generation is to generate grammatical text that
sustains the semantic meaning for a given AMR graph. We at- tack the task by
first partitioning the AMR graph into smaller fragments, and then generating
the translation for each fragment, before finally deciding the order by solving
an asymmetric generalized traveling salesman problem (AGTSP). A Maximum Entropy
classifier is trained to estimate the traveling costs, and a TSP solver is used
to find the optimized solution. The final model reports a BLEU score of 22.44
on the SemEval-2016 Task8 dataset.

Language as a Latent Variable: Discrete Generative Models for Sentence Compression

Yishu Miao, Phil Blunsom
Comments: EMNLP 2016
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In this work we explore deep generative models of text in which the latent
representation of a document is itself drawn from a discrete language model
distribution. We formulate a variational auto-encoder for inference in this
model and apply it to the task of compressing sentences. In this application
the generative model first draws a latent summary sentence from a background
language model, and then subsequently draws the observed sentence conditioned
on this latent summary. In our empirical evaluation we show that generative
formulations of both abstractive and extractive compression yield
state-of-the-art results when trained on a large amount of supervised data.
Further, we explore semi-supervised compression scenarios where we show that it
is possible to achieve performance competitive with previously proposed
supervised models while training on a fraction of the supervised data.

Deep Multi-Task Learning with Shared Memory

Pengfei Liu, Xipeng Qiu, Xuanjing Huang
Comments: accepted by emnlp2016. arXiv admin note: text overlap with arXiv:1605.05101
Subjects: Computation and Language (cs.CL)

Neural network based models have achieved impressive results on various
specific tasks. However, in previous works, most models are learned separately
based on single-task supervised objectives, which often suffer from
insufficient training data. In this paper, we propose two deep architectures
which can be trained jointly on multiple related tasks. More specifically, we
augment neural model with an external memory, which is shared by several tasks.
Experiments on two groups of text classification tasks show that our proposed
architectures can improve the performance of a task with the help of other
related tasks.

Annotating Derivations: A New Evaluation Strategy and Dataset for Algebra Word Problems

Shyam Upadhyay, Ming-Wei Chang
Subjects: Computation and Language (cs.CL)

We propose a new evaluation for automatic solvers for algebra word problems,
which can identify reasoning mistakes that existing evaluations overlook. Our
proposal is to use derivations for evaluations, which reflect the reasoning
process of the solver by explaining how the equation system was constructed. We
accomplish this by developing an algorithm for checking the equivalence between
two derivations, and showing how derivation annotations can be
semi-automatically added to existing datasets. To make our experiments more
comprehensive, we also annotated DRAW-1K , a new dataset of 1000 general
algebra word problems. In total, our experiments span over 2300 algebra word
problems. We found that the annotated derivation enable a superior evaluation
of automatic solvers than previously used metrics.

Novel stochastic properties of the short-time spectrum for unvoiced pronunciation modeling and synthesis

Xiaodong Zhuang, Nikos E. Mastorakis
Comments: 24 pages, 16 figures, original work
Subjects: Sound (cs.SD); Computation and Language (cs.CL)

Stochastic property of speech signal is a fundamental research topic in
speech analysis and processing. In this paper, multiple levels of randomness in
speech signal are discussed, and the stochastic properties of unvoiced
pronunciation are studied in detail, which has not received sufficient research
attention before. The study is based on the signals of sustained unvoiced
pronunciation captured in the experiments, for which the amplitude and phase
values in the short-time spectrum are studied as random variables. The
statistics of amplitude for each frequency component is studied individually,
based on which a new property of “consistent standard deviation coefficient” is
revealed for the amplitude spectrum of unvoiced pronunciation. The relationship
between the amplitude probability distributions of different frequency
components is further studied, which indicates that all the frequency
components have a common prototype of amplitude probability distribution. As an
adaptive and flexible probability distribution, the Weibull distribution is
adopted to fit the expectation-normalized amplitude spectrum data. The phase
distribution for the short-time spectrum is also studied, and the results show
a uniform distribution. A synthesis method for unvoiced pronunciation is
proposed based on the Weibull distribution of amplitude and uniform
distribution of phase, which is implemented by STFT with artificially generated
short-time spectrum with random amplitude and phase. The synthesis results have
identical quality of auditory perception as the original pronunciation, and
have similar autocorrelation as that of the original signal, which proves the
effectiveness of the proposed stochastic model of short-time spectrum for
unvoiced pronunciation.

Distributed, Parallel, and Cluster Computing

MPI Parallelization of the Resistive Wall Code STARWALL: Report of the EUROfusion High Level Support Team Project JORSTAR

S. Mochalskyy, M. Hoelzl, R. Hatzky
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Large scale plasma instabilities inside a tokamak can be influenced by the
currents flowing in the conducting vessel wall. This involves non linear plasma
dynamics and its interaction with the wall current. In order to study this
problem the code that solves the magneto-hydrodynamic (MHD) equations, called
JOREK, was coupled with the model for the vacuum region and the resistive
conducting structure named STARWALL. The JOREK-STARWALL model has been already
applied to perform simulations of the Vertical Displacement Events (VDEs), the
Resistive Wall Modes (RWMs), and Quiescent H-Mode.

At the beginning of the project it was not possible to resolve the realistic
wall structure with a large number of finite element triangles due to the huge
consumption of memory and wall clock time by STARWALL and the corresponding
coupling routine in JOREK. Moreover, both the STARWALL code and the JOREK
coupling routine are only partially parallelized via OpenMP. The aim of this
project is to implement an MPI parallelization in the model that should allow
to obtain realistic results with high resolution. This project concentrates on
the MPI parallelization of STARWALL. Parallel I/O and the MPI parallelization
of the coupling terms inside JOREK will be addressed in a follow-up project.

Scheduling Under Power and Energy Constraints

Mohammed Haroon Dupty, Pragati Agrawal, Shrisha Rao
Comments: 28 pages
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

Given a system model where machines have distinct speeds and power ratings
but are otherwise compatible, we consider various problems of scheduling under
resource constraints on the system which place the restriction that not all
machines can be run at once. These can be power, energy, or makespan
constraints on the system. Given such constraints, there are problems with
divisible as well as non-divisible jobs. In the setting where there is a
constraint on power, we show that the problem of minimizing makespan for a set
of divisible jobs is NP-hard by reduction to the knapsack problem. We then show
that scheduling to minimize energy with power constraints is also NP-hard. We
then consider scheduling with energy and makespan constraints with divisible
jobs and show that these can be solved in polynomial time, and the problems
with non-divisible jobs are NP-hard. We give exact and approximation algorithms
for these problems as required.

Hydra: Leveraging Functional Slicing for Efficient Distributed SDN Controllers

Yiyang Chang, Ashkan Rezaei, Balajee Vamanan, Jahangir Hasan, Sanjay Rao, T. N. Vijaykumar
Comments: 8 pages
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)

The conventional approach to scaling Software Defined Networking (SDN)
controllers today is to partition switches based on network topology, with each
partition being controlled by a single physical controller, running all SDN
applications. However, topological partitioning is limited by the fact that (i)
performance of latency-sensitive (e.g., monitoring) SDN applications associated
with a given partition may be impacted by co-located compute-intensive (e.g.,
route computation) applications; (ii) simultaneously achieving low convergence
time and response times might be challenging; and (iii) communication between
instances of an application across partitions may increase latencies. To tackle
these issues, in this paper, we explore functional slicing, a complementary
approach to scaling, where multiple SDN applications belonging to the same
topological partition may be placed in physically distinct servers. We present
Hydra, a framework for distributed SDN controllers based on functional slicing.
Hydra chooses partitions based on convergence time as the primary metric, but
places application instances across partitions in a manner that keeps response
times low while considering communication between applications of a partition,
and instances of an application across partitions. Evaluations using the
Floodlight controller show the importance and effectiveness of Hydra in
simultaneously keeping convergence times on failures small, while sustaining
higher throughput per partition and ensuring responsiveness to
latency-sensitive applications.

Learning

Using Neural Network Formalism to Solve Multiple-Instance Problems

Tomas Pevny, Petr Somol
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Many objects in the real world are difficult to describe by a single
numerical vector of a fixed length, whereas describing them by a set of vectors
is more natural. Therefore, Multiple instance learning (MIL) techniques have
been constantly gaining on importance throughout last years. MIL formalism
represents each object (sample) by a set (bag) of feature vectors (instances)
of fixed length where knowledge about objects (e.g., class label) is available
on bag level but not necessarily on instance level. Many standard tools
including supervised classifiers have been already adapted to MIL setting since
the problem got formalized in late nineties. In this work we propose a neural
network (NN) based formalism that intuitively bridges the gap between MIL
problem definition and the vast existing knowledge-base of standard models and
classifiers. We show that the proposed NN formalism is effectively optimizable
by a modified back-propagation algorithm and can reveal unknown patterns inside
bags. Comparison to eight types of classifiers from the prior art on a set of
14 publicly available benchmark datasets confirms the advantages and accuracy
of the proposed solution.

A Novel Progressive Multi-label Classifier for Classincremental Data

Mihika Dave, Sahil Tapiawala, Meng Joo Er, Rajasekar Venkatesan
Comments: 5 pages, 3 figures, 4 tables
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

In this paper, a progressive learning algorithm for multi-label
classification to learn new labels while retaining the knowledge of previous
labels is designed. New output neurons corresponding to new labels are added
and the neural network connections and parameters are automatically
restructured as if the label has been introduced from the beginning. This work
is the first of the kind in multi-label classifier for class-incremental
learning. It is useful for real-world applications such as robotics where
streaming data are available and the number of labels is often unknown. Based
on the Extreme Learning Machine framework, a novel universal classifier with
plug and play capabilities for progressive multi-label classification is
developed. Experimental results on various benchmark synthetic and real
datasets validate the efficiency and effectiveness of our proposed algorithm.

Multilayer Spectral Graph Clustering via Convex Layer Aggregation

Pin-Yu Chen, Alfred O. Hero III
Comments: To appear at IEEE GlobalSIP 2016
Subjects: Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

Multilayer graphs are commonly used for representing different relations
between entities and handling heterogeneous data processing tasks. New
challenges arise in multilayer graph clustering for assigning clusters to a
common multilayer node set and for combining information from each layer. This
paper presents a theoretical framework for multilayer spectral graph clustering
of the nodes via convex layer aggregation. Under a novel multilayer signal plus
noise model, we provide a phase transition analysis that establishes the
existence of a critical value on the noise level that permits reliable cluster
separation. The analysis also specifies analytical upper and lower bounds on
the critical value, where the bounds become exact when the clusters have
identical sizes. Numerical experiments on synthetic multilayer graphs are
conducted to validate the phase transition analysis and study the effect of
layer weights and noise levels on clustering reliability.

Input Convex Neural Networks

Brandon Amos, Lei Xu, J. Zico Kolter
Subjects: Learning (cs.LG); Optimization and Control (math.OC)

This paper presents the input convex neural network architecture. These are
scalar-valued (potentially deep) neural networks with constraints on the
network parameters such that the output of the network is a convex function of
(some of) the inputs. The networks allow for efficient inference via
optimization over some inputs to the network given others, and can be applied
to settings including structured prediction, data imputation, reinforcement
learning, and others. In this paper we lay the basic groundwork for these
models, proposing methods for inference, optimization and learning, and analyze
their representational power. We show that many existing neural network
architectures can be made input-convex with only minor modification, and
develop specialized optimization algorithms tailored to this setting. Finally,
we highlight the performance of the methods on multi-label prediction, image
completion, and reinforcement learning problems, where we show improvement over
the existing state of the art in many cases.

Screening Rules for Convex Problems

Anant Raj, Jakob Olbrich, Bernd Gärtner, Bernhard Schölkopf, Martin Jaggi
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)

We propose a new framework for deriving screening rules for convex
optimization problems. Our approach covers a large class of constrained and
penalized optimization formulations, and works in two steps. First, given any
approximate point, the structure of the objective function and the duality gap
is used to gather information on the optimal solution. In the second step, this
information is used to produce screening rules, i.e. safely identifying
unimportant weight variables of the optimal solution. Our general framework
leads to a large variety of useful existing as well as new screening rules for
many applications. For example, we provide new screening rules for general
simplex and $L_1$-constrained problems, Elastic Net, squared-loss Support
Vector Machines, minimum enclosing ball, as well as structured norm regularized
problems, such as group lasso.

Gated Neural Networks for Option Pricing: Rationality by Design

Yongxin Yang, Yu Zheng, Timothy M. Hospedales
Comments: Technical Report. 7 pages, 5 figures
Subjects: Computational Finance (q-fin.CP); Learning (cs.LG); Pricing of Securities (q-fin.PR)

We propose a neural network approach to price EU call options that
significantly outperforms some existing pricing models and comes with
guarantees that its predictions are economically reasonable. To achieve this,
we introduce a class of gated neural networks that automatically learn to
divide-and-conquer the problem space for robust and accurate pricing. We then
derive instantiations of these networks that are ‘rational by design’ in terms
of naturally encoding a valid call option surface that enforces no arbitrage
principles. This integration of human insight within data-driven learning
provides significantly better generalisation in pricing performance due to the
encoded inductive bias in the learning, guarantees sanity in the model’s
predictions, and provides econometrically useful byproduct such as risk neutral
density.

Discovering Sound Concepts and Acoustic Relations In Text

Anurag Kumar, Bhiksha Raj, Ndapandula Nakashole
Comments: 5 pages
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Learning (cs.LG)

In this paper we describe approaches for discovering acoustic concepts and
relations in text. The first major goal is to be able to identify text phrases
which contain a notion of audibility and can be termed as a sound or an
acoustic concept. We also propose a method to define an acoustic scene through
a set of sound concepts. We use pattern matching and parts of speech tags to
generate sound concepts from large scale text corpora. We use dependency
parsing and LSTM recurrent neural network to predict a set of sound concepts
for a given acoustic scene. These methods are not only helpful in creating an
acoustic knowledge base but also directly help in acoustic event and scene
detection research in a variety of ways.

Constraint-Based Clustering Selection

Toon Van Craenendonck, Hendrik Blockeel
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Semi-supervised clustering methods incorporate a limited amount of
supervision into the clustering process. Typically, this supervision is
provided by the user in the form of pairwise constraints. Existing methods use
such constraints in one of the following ways: they adapt their clustering
procedure, their similarity metric, or both. All of these approaches operate
within the scope of individual clustering algorithms. In contrast, we propose
to use constraints to choose between clusterings generated by very different
unsupervised clustering algorithms, run with different parameter settings. We
empirically show that this simple approach often outperforms existing
semi-supervised clustering methods.

Information Theory

Channel Estimation and Performance Analysis of One-Bit Massive MIMO Systems

Yongzhi Li, Cheng Tao, Gonzalo Seco-Granados, Amine Mezghani, A. Lee Swindlehurst, Liu Liu
Comments: 14 pages, 9 figures, submitted to IEEE Trans. on Signal Processing
Subjects: Information Theory (cs.IT)

This paper considers channel estimation and system performance for the uplink
of a single-cell massive multiple-input multiple-output (MIMO) system. Each
receive antenna of the base station (BS) is assumed to be equipped with a pair
of one-bit analog-to-digital converters (ADCs) to quantize the real and
imaginary part of the received signal. We first obtain the Cramer-Rao lower
bound for channel estimation, and show that the one-bit ADCs cause the
performance of unbiased channel estimators to degrade at high SNR. We then
propose a biased channel estimator based on the Bussgang decomposition, which
reformulates the nonlinear quantizer as a linear function with identical first-
and second-order statistics. The resulting channel estimator works well at high
SNRs and outperforms previously proposed approaches across all SNRs.We then
derive closed-form expressions at low SNR for an approximation of the
achievable rate for the maximal ratio combiner and zero forcing receivers that
takes channel estimation error due to both noise and one bit quantization into
account. The closed-form expressions in turn allow us to obtain insight into
important system design issues such as optimal resource allocation, maximal sum
spectral efficiency, overall energy efficiency, and number of antennas.
Numerical results are presented to verify our analytical results and
demonstrate the benefit of optimizing system performance accordingly.

An extended characterization of a class of optimal three-weight cyclic codes over any finite field

Gerardo Vega
Comments: arXiv admin note: text overlap with arXiv:1508.05077
Subjects: Information Theory (cs.IT)

A characterization of a class of optimal three-weight cyclic codes of
dimension 3 over any finite field was recently presented in [10]. Shortly after
this, a generalization for the sufficient numerical conditions of such
characterization was given in [3]. The main purpose of this work is to show
that the numerical conditions found in [3], are also necessary. As we will see
later, an interesting feature of the present work, in clear contrast with these
two preceding works, is that we use some new and non-conventional methods in
order to achieve our goals. In fact, through these non-conventional methods, we
not only were able to extend the characterization in [10], but also present a
less complex proof of such extended characterization, which avoids the use of
some of the sophisticated –but at the same time complex– theorems, that are
the key arguments of the proofs given in [10] and [3]. Furthermore, we also
find the parameters for the dual code of any cyclic code in our extended
characterization class. In fact, after the analysis of some examples, it seems
that such dual codes always have the same parameters as the best known linear
codes.

On the Non-Existence of Unbiased Estimators in Constrained Estimation Problems

Anelia Somekh-Baruch, Amir Leshem, Venkatesh Saligrama
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT)

We address the problem of existence of unbiased constrained parameter
estimators. We show that if the constrained set of parameters is compact and
the hypothesized distributions are absolutely continuous with respect to one
another, then there exists no unbiased estimator. Weaker conditions for the
absence of unbiased constrained estimators are also specified. We provide
several examples which demonstrate the utility of these conditions.

Signal acquisition via polarization modulation in single photon sources

Mark D. McDonnell, Adrian P. Flitney
Comments: 7 pages, 2 figures, accepted by Physical Review E. This version adds a reference
Journal-ref: Physical Review E 80, 060102(R) (2009)
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

A simple model system is introduced for demonstrating how a single photon
source might be used to transduce classical analog information. The theoretical
scheme results in measurements of analog source samples that are (i) quantized
in the sense of analog-to-digital conversion and (ii) corrupted by random noise
that is solely due to the quantum uncertainty in detecting the polarization
state of each photon. This noise is unavoidable if more than one bit per sample
is to be transmitted, and we show how it may be exploited in a manner inspired
by suprathreshold stochastic resonance. The system is analyzed information
theoretically, as it can be modeled as a noisy optical communication channel,
although unlike classical Poisson channels, the detector’s photon statistics
are binomial. Previous results on binomial channels are adapted to demonstrate
numerically that the classical information capacity, and thus the accuracy of
the transduction, increases logarithmically with the square root of the number
of photons, N. Although the capacity is shown to be reduced when an additional
detector nonideality is present, the logarithmic increase with N remains.