IT博客汇 | arXiv Paper Daily: Fri, 7 Oct 2016

arXiv Paper Daily: Fri, 7 Oct 2016

我爱机器学习(52ml.net)发表于 2016-10-07 00:00:00

Neural and Evolutionary Computing

Sequence-based Sleep Stage Classification using Conditional Neural Fields

Intan Nurma Yulita, Mohamad Ivan Fanany, Aniati Murni Arymurthy
Comments: 14 pages. Submitted to Computational and Mathematical Methods in Medicine (Hindawi Publishin). Article ID 7163687
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

Sleep signals from a polysomnographic database are sequences in nature.
Commonly employed analysis and classification methods, however, ignored this
fact and treated the sleep signals as non-sequence data. Treating the sleep
signals as sequences, this paper compared two powerful unsupervised feature
extractors and three sequence-based classifiers regarding accuracy and
computational (training and testing) time after 10-folds cross-validation. The
compared feature extractors are Deep Belief Networks (DBN) and Fuzzy C-Means
(FCM) clustering. Whereas the compared sequence-based classifiers are Hidden
Markov Models (HMM), Conditional Random Fields (CRF) and its variants, i.e.,
Hidden-state CRF (HCRF) and Latent-Dynamic CRF (LDCRF); and Conditional Neural
Fields (CNF) and its variant (LDCNF). In this study, we use two datasets. The
first dataset is an open (public) polysomnographic dataset downloadable from
the Internet, while the second dataset is our polysomnographic dataset (also
available for download). For the first dataset, the combination of FCM and CNF
gives the highest accuracy (96.75\%) with relatively short training time (0.33
hours). For the second dataset, the combination of DBN and CRF gives the
accuracy of 99.96\% but with 1.02 hours training time, whereas the combination
of DBN and CNF gives slightly less accuracy (99.69\%) but also less computation
time (0.89 hours).

Regularized Dynamic Boltzmann Machine with Delay Pruning for Unsupervised Learning of Temporal Sequences

Sakyasingha Dasgupta, Takayuki Yoshizumi, Takayuki Osogami
Comments: 6 pages, 5 figures, accepted full paper (oral presentation) at ICPR 2016
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

We introduce Delay Pruning, a simple yet powerful technique to regularize
dynamic Boltzmann machines (DyBM). The recently introduced DyBM provides a
particularly structured Boltzmann machine, as a generative model of a
multi-dimensional time-series. This Boltzmann machine can have infinitely many
layers of units but allows exact inference and learning based on its
biologically motivated structure. DyBM uses the idea of conduction delays in
the form of fixed length first-in first-out (FIFO) queues, with a neuron
connected to another via this FIFO queue, and spikes from a pre-synaptic neuron
travel along the queue to the post-synaptic neuron with a constant period of
delay. Here, we present Delay Pruning as a mechanism to prune the lengths of
the FIFO queues (making them zero) by setting some delay lengths to one with a
fixed probability, and finally selecting the best performing model with fixed
delays. The uniqueness of structure and a non-sampling based learning rule in
DyBM, make the application of previously proposed regularization techniques
like Dropout or DropConnect difficult, leading to poor generalization. First,
we evaluate the performance of Delay Pruning to let DyBM learn a
multidimensional temporal sequence generated by a Markov chain. Finally, we
show the effectiveness of delay pruning in learning high dimensional sequences
using the moving MNIST dataset, and compare it with Dropout and DropConnect
methods.

Metaheuristic Algorithms for Convolution Neural Network

L. M. Rasdi Rere, Mohamad Ivan Fanany, Aniati Murni Arymurthy
Comments: Article ID 1537325, 13 pages. Received 29 January 2016; Revised 15 April 2016; Accepted 10 May 2016. Academic Editor: Martin Hagan. in Hindawi Publishing. Computational Intelligence and Neuroscience Volume 2016 (2016)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

A typical modern optimization technique is usually either heuristic or
metaheuristic. This technique has managed to solve some optimization problems
in the research area of science, engineering, and industry. However,
implementation strategy of metaheuristic for accuracy improvement on
convolution neural networks (CNN), a famous deep learning method, is still
rarely investigated. Deep learning relates to a type of machine learning
technique, where its aim is to move closer to the goal of artificial
intelligence of creating a machine that could successfully perform any
intellectual tasks that can be carried out by a human. In this paper, we
propose the implementation strategy of three popular metaheuristic approaches,
that is, simulated annealing, differential evolution, and harmony search, to
optimize CNN. The performances of these metaheuristic methods in optimizing CNN
on classifying MNIST and CIFAR dataset were evaluated and compared.
Furthermore, the proposed methods are also compared with the original CNN.
Although the proposed methods show an increase in the computation time, their
accuracy has also been improved (up to 7.14 percent).

Adaptive Online Sequential ELM for Concept Drift Tackling

Arif Budiman, Mohamad Ivan Fanany, Chan Basaruddin
Comments: Hindawi Publishing. Computational Intelligence and Neuroscience Volume 2016 (2016), Article ID 8091267, 17 pages Received 29 January 2016, Accepted 17 May 2016. Special Issue on “Advances in Neural Networks and Hybrid-Metaheuristics: Theory, Algorithms, and Novel Engineering Applications”. Academic Editor: Stefan Haufe
Journal-ref: Computational Intelligence and Neuroscience Volume 2016 (2016),
Article ID 8091267, 17 pages
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

A machine learning method needs to adapt to over time changes in the
environment. Such changes are known as concept drift. In this paper, we propose
concept drift tackling method as an enhancement of Online Sequential Extreme
Learning Machine (OS-ELM) and Constructive Enhancement OS-ELM (CEOS-ELM) by
adding adaptive capability for classification and regression problem. The
scheme is named as adaptive OS-ELM (AOS-ELM). It is a single classifier scheme
that works well to handle real drift, virtual drift, and hybrid drift. The
AOS-ELM also works well for sudden drift and recurrent context change type. The
scheme is a simple unified method implemented in simple lines of code. We
evaluated AOS-ELM on regression and classification problem by using concept
drift public data set (SEA and STAGGER) and other public data sets such as
MNIST, USPS, and IDS. Experiments show that our method gives higher kappa value
compared to the multiclassifier ELM ensemble. Even though AOS-ELM in practice
does not need hidden nodes increase, we address some issues related to the
increasing of the hidden nodes such as error condition and rank values. We
propose taking the rank of the pseudoinverse matrix as an indicator parameter
to detect underfitting condition.

A New Data Representation Based on Training Data Characteristics to Extract Drug Named-Entity in Medical Text

Sadikin Mujiono, Mohamad Ivan Fanany, Chan Basaruddin
Comments: Hindawi Publishing. Computational Intelligence and Neuroscience Volume 2016 (2016), Article ID 3483528, 24 pages Received 27 May 2016; Revised 8 August 2016; Accepted 18 September 2016. Special Issue on “Smart Data: Where the Big Data Meets the Semantics”. Academic Editor: Trong H. Duong
Journal-ref: Computational Intelligence and Neuroscience Volume 2016 (2016),
Article ID 3483528, 24 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

One essential task in information extraction from the medical corpus is drug
name recognition. Compared with text sources come from other domains, the
medical text is special and has unique characteristics. In addition, the
medical text mining poses more challenges, e.g., more unstructured text, the
fast growing of new terms addition, a wide range of name variation for the same
drug. The mining is even more challenging due to the lack of labeled dataset
sources and external knowledge, as well as multiple token representations for a
single drug name that is more common in the real application setting. Although
many approaches have been proposed to overwhelm the task, some problems
remained with poor F-score performance (less than 0.75). This paper presents a
new treatment in data representation techniques to overcome some of those
challenges. We propose three data representation techniques based on the
characteristics of word distribution and word similarities as a result of word
embedding training. The first technique is evaluated with the standard NN
model, i.e., MLP (Multi-Layer Perceptrons). The second technique involves two
deep network classifiers, i.e., DBN (Deep Belief Networks), and SAE (Stacked
Denoising Encoders). The third technique represents the sentence as a sequence
that is evaluated with a recurrent NN model, i.e., LSTM (Long Short Term
Memory). In extracting the drug name entities, the third technique gives the
best F-score performance compared to the state of the art, with its average
F-score being 0.8645.

Multiple Regularizations Deep Learning for Paddy Growth Stages Classification from LANDSAT-8

Ines Heidieni Ikasari, Vina Ayumi, Mohamad Ivan Fanany, Sidik Mulyono
Comments: 11 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

This study uses remote sensing technology that can provide information about
the condition of the earth’s surface area, fast, and spatially. The study area
was in Karawang District, lying in the Northern part of West Java-Indonesia. We
address a paddy growth stages classification using LANDSAT 8 image data
obtained from multi-sensor remote sensing image taken in October 2015 to August
2016. This study pursues a fast and accurate classification of paddy growth
stages by employing multiple regularizations learning on some deep learning
methods such as DNN (Deep Neural Networks) and 1-D CNN (1-D Convolutional
Neural Networks). The used regularizations are Fast Dropout, Dropout, and Batch
Normalization. To evaluate the effectiveness, we also compared our method with
other machine learning methods such as (Logistic Regression, SVM, Random
Forest, and XGBoost). The data used are seven bands of LANDSAT-8 spectral data
samples that correspond to paddy growth stages data obtained from i-Sky (eye in
the sky) Innovation system. The growth stages are determined based on paddy
crop phenology profile from time series of LANDSAT-8 images. The classification
results show that MLP using multiple regularization Dropout and Batch
Normalization achieves the highest accuracy for this dataset.

Ischemic Stroke Identification Based on EEG and EOG using 1D Convolutional Neural Network and Batch Normalization

Endang Purnama Giri, Mohamad Ivan Fanany, Aniati Murni Arymurthy
Comments: 13 pages. To be published in ICACSIS 2016
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

In 2015, stroke was the number one cause of death in Indonesia. The majority
type of stroke is ischemic. The standard tool for diagnosing stroke is CT-Scan.
For developing countries like Indonesia, the availability of CT-Scan is very
limited and still relatively expensive. Because of the availability, another
device that potential to diagnose stroke in Indonesia is EEG. Ischemic stroke
occurs because of obstruction that can make the cerebral blood flow (CBF) on a
person with stroke has become lower than CBF on a normal person (control) so
that the EEG signal have a deceleration. On this study, we perform the ability
of 1D Convolutional Neural Network (1DCNN) to construct classification model
that can distinguish the EEG and EOG stroke data from EEG and EOG control data.
To accelerate training process our model we use Batch Normalization. Involving
62 person data object and from leave one out the scenario with five times
repetition of measurement we obtain the average of accuracy 0.86 (F-Score
0.861) only at 200 epoch. This result is better than all over shallow and
popular classifiers as the comparator (the best result of accuracy 0.69 and
F-Score 0.72 ). The feature used in our study were only 24 handcrafted feature
with simple feature extraction process.

Combining Generative and Discriminative Neural Networks for Sleep Stages Classification

Endang Purnama Giri, Mohamad Ivan Fanany, Aniati Murni Arymurthy
Comments: Submitted to Computational Intelligence and Neuroscience (Hindawi Publishing). 13 pages
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Sleep stages pattern provides important clues in diagnosing the presence of
sleep disorder. By analyzing sleep stages pattern and extracting its features
from EEG, EOG, and EMG signals, we can classify sleep stages. This study
presents a novel classification model for predicting sleep stages with a high
accuracy. The main idea is to combine the generative capability of Deep Belief
Network (DBN) with a discriminative ability and sequence pattern recognizing
capability of Long Short-term Memory (LSTM). We use DBN that is treated as an
automatic higher level features generator. The input to DBN is 28 “handcrafted”
features as used in previous sleep stages studies. We compared our method with
other techniques which combined DBN with Hidden Markov Model (HMM).In this
study, we exploit the sequence or time series characteristics of sleep dataset.
To the best of our knowledge, most of the present sleep analysis from
polysomnogram relies only on single instanced label (nonsequence) for
classification. In this study, we used two datasets: an open data set that is
treated as a benchmark; the other dataset is our sleep stages dataset
(available for download) to verify the results further. Our experiments showed
that the combination of DBN with LSTM gives better overall accuracy 98.75\%
(Fscore=0.9875) for benchmark dataset and 98.94\% (Fscore=0.9894) for MKG
dataset. This result is better than the state of the art of sleep stages
classification that was 91.31\%.

Computer Vision and Pattern Recognition

Driving in the Matrix: Can Virtual Worlds Replace Human-Generated Annotations for Real World Tasks?

Matthew Johnson-Roberson, Charles Barto, Rounak Mehta, Sharath Nittur Sridhar, Ram Vasudevan
Comments: 8 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Deep learning has rapidly transformed the state of the art algorithms used to
address a variety of problems in computer vision and robotics. These
breakthroughs have however relied upon massive amounts of human annotated
training data. This time-consuming process has begun impeding the progress of
these deep learning efforts. This paper describes a method to incorporate
photo-realistic computer images from a simulation engine to rapidly generate
annotated data that can be used for training of machine learning algorithms. We
demonstrate that a state of the art architecture, which is trained only using
these synthetic annotations, performs better than the identical architecture
trained on human annotated real-world data, when tested on the KITTI data set
for vehicle detection. By training machine learning algorithms on a rich
virtual world, this paper illustrates that real objects in real scenes can be
learned and classified using synthetic data. This approach offers the
possibility of accelerating deep learning’s application to sensor based
classification problems like those that appear in self-driving cars.

PetroSurf3D – A high-resolution 3D Dataset of Rock Art for Surface Segmentation

Georg Poier, Markus Seidl, Matthias Zeppelzauer, Christian Reinbacher, Martin Schaich, Giovanna Bellando, Alberto Marretta, Horst Bischof
Comments: Dataset and more information can be found at this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Ancient rock engravings (so called petroglyphs) represent one of the earliest
surviving artifacts describing life of our ancestors. Recently, modern 3D
scanning techniques found their application in the domain of rock art
documentation by providing high-resolution reconstructions of rock surfaces.
Reconstruction results demonstrate the strengths of novel 3D techniques and
have the potential to replace the traditional (manual) documentation techniques
of archaeologists.

An important analysis task in rock art documentation is the segmentation of
petroglyphs. To foster automation of this tedious step, we present a
high-resolution 3D surface dataset of natural rock surfaces which exhibit
different petroglyphs together with accurate expert ground-truth annotations.
To our knowledge, this dataset is the first public 3D surface dataset which
allows for surface segmentation at sub-millimeter scale. We conduct experiments
with state-of-the-art methods to generate a baseline for the dataset and verify
that the size and variability of the data is sufficient to successfully adopt
even recent data-hungry Convolutional Neural Networks (CNNs). Furthermore, we
experimentally demonstrate that the provided geometric information is key to
successful automatic segmentation and strongly outperforms color-based
segmentation. The introduced dataset represents a novel benchmark for 3D
surface segmentation methods in general and is intended to foster comparability
among different approaches in future.

Metaheuristic Algorithms for Convolution Neural Network

A typical modern optimization technique is usually either heuristic or
metaheuristic. This technique has managed to solve some optimization problems
in the research area of science, engineering, and industry. However,
implementation strategy of metaheuristic for accuracy improvement on
convolution neural networks (CNN), a famous deep learning method, is still
rarely investigated. Deep learning relates to a type of machine learning
technique, where its aim is to move closer to the goal of artificial
intelligence of creating a machine that could successfully perform any
intellectual tasks that can be carried out by a human. In this paper, we
propose the implementation strategy of three popular metaheuristic approaches,
that is, simulated annealing, differential evolution, and harmony search, to
optimize CNN. The performances of these metaheuristic methods in optimizing CNN
on classifying MNIST and CIFAR dataset were evaluated and compared.
Furthermore, the proposed methods are also compared with the original CNN.
Although the proposed methods show an increase in the computation time, their
accuracy has also been improved (up to 7.14 percent).

A Vision-based Indoor Positioning System on Shopping Mall Context

Ziwei Xu, Haitian Zheng, Minjian Pang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

With the help of a map and GPS, outdoor navigation from one spot to another
can be done quickly and well. Unfortunately, inside a shopping mall, where GPS
signal is hardly available, navigation becomes troublesome. In this paper, we
propose an indoor navigation system to address the problem. Unlike most
existing indoor navigation systems, which relies heavily on infrastructures and
pre-labelled maps, our system uses only photos taken by cellphone cameras as
input. We utilize multiple image processing techniques to parse photos of a
mall’s shopping instruction and a construct topological map of the mall. During
navigation, we make use of deep neural networks to extract information from
environment and find out the real-time position of the user. We propose a new
feature fusion method to help automatically identifying shops in a photo.

Do They All Look the Same? Deciphering Chinese, Japanese and Koreans by Fine-Grained Deep Learning

Yu Wang, Haofu Liao, Yang Feng, Xiangyang Xu, Jiebo Luo
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We study to what extend Chinese, Japanese and Korean faces can be classified
and which facial attributes offer the most important cues. First, we propose a
novel way of obtaining large numbers of facial images with nationality labels.
Then we train state-of-the-art neural networks with these labeled images. We
are able to achieve an accuracy of 75.03% in the classification task, with
chances being 33.33% and human accuracy 38.89% . Further, we train multiple
facial attribute classifiers to identify the most distinctive features for each
group. We find that Chinese, Japanese and Koreans do exhibit substantial
differences in certain attributes, such as bangs, smiling, and bushy eyebrows.
Along the way, we uncover several gender-related cross-country patterns as
well. Our work, which complements existing APIs such as Microsoft Cognitive
Services and Face++, could find potential applications in tourism, e-commerce,
social media marketing, criminal justice and even counter-terrorism.

Compressive Imaging with Iterative Forward Models

Hsiou-Yuan Liu, Ulugbek S. Kamilov, Dehong Liu, Hassan Mansour, Petros T. Boufounos
Subjects: Computer Vision and Pattern Recognition (cs.CV); Optics (physics.optics)

We propose a new compressive imaging method for reconstructing 2D or 3D
objects from their scattered wave-field measurements. Our method relies on a
novel, nonlinear measurement model that can account for the multiple scattering
phenomenon, which makes the method preferable in applications where linear
measurement models are inaccurate. We construct the measurement model by
expanding the scattered wave-field with an accelerated-gradient method, which
is guaranteed to converge and is suitable for large-scale problems. We provide
explicit formulas for computing the gradient of our measurement model with
respect to the unknown image, which enables image formation with a sparsity-
driven numerical optimization algorithm. We validate the method both
analytically and with numerical simulations.

Searching Scenes by Abstracting Things

Svetlana Kordumova, Jan C. van Gemert, Cees G. M. Snoek, Arnold W. M. Smeulders
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper we propose to represent a scene as an abstraction of ‘things’.
We start from ‘things’ as generated by modern object proposals, and we
investigate their immediately observable properties: position, size, aspect
ratio and color, and those only. Where the recent successes and excitement of
the field lie in object identification, we represent the scene composition
independent of object identities. We make three contributions in this work.
First, we study simple observable properties of ‘things’, and call it things
syntax. Second, we propose translating the things syntax in linguistic abstract
statements and study their descriptive effect to retrieve scenes. Thirdly, we
propose querying of scenes with abstract block illustrations and study their
effectiveness to discriminate among different types of scenes. The benefit of
abstract statements and block illustrations is that we generate them directly
from the images, without any learning beforehand as in the standard attribute
learning. Surprisingly, we show that even though we use the simplest of
features from ‘things’ layout and no learning at all, we can still retrieve
scenes reasonably well.

Multiple Regularizations Deep Learning for Paddy Growth Stages Classification from LANDSAT-8

Ines Heidieni Ikasari, Vina Ayumi, Mohamad Ivan Fanany, Sidik Mulyono
Comments: 11 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

This study uses remote sensing technology that can provide information about
the condition of the earth’s surface area, fast, and spatially. The study area
was in Karawang District, lying in the Northern part of West Java-Indonesia. We
address a paddy growth stages classification using LANDSAT 8 image data
obtained from multi-sensor remote sensing image taken in October 2015 to August
2016. This study pursues a fast and accurate classification of paddy growth
stages by employing multiple regularizations learning on some deep learning
methods such as DNN (Deep Neural Networks) and 1-D CNN (1-D Convolutional
Neural Networks). The used regularizations are Fast Dropout, Dropout, and Batch
Normalization. To evaluate the effectiveness, we also compared our method with
other machine learning methods such as (Logistic Regression, SVM, Random
Forest, and XGBoost). The data used are seven bands of LANDSAT-8 spectral data
samples that correspond to paddy growth stages data obtained from i-Sky (eye in
the sky) Innovation system. The growth stages are determined based on paddy
crop phenology profile from time series of LANDSAT-8 images. The classification
results show that MLP using multiple regularization Dropout and Batch
Normalization achieves the highest accuracy for this dataset.

PCA-aided Fully Convolutional Networks for Semantic Segmentation of Multi-channel fMRI

Lei Tai, Qiong Ye, Ming Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Semantic segmentation of functional magnetic resonance imaging (fMRI) makes
great sense for pathology diagnosis and decision system of medical robots. The
multi-channel fMRI data provide more information of the pathological features.
But the increased amount of data causes complexity in feature detection. This
paper proposes a principal component analysis (PCA)-aided fully convolutional
network to particularly deal with multi-channel fMRI. We transfer the learned
weights of contemporary classification networks to the segmentation task by
fine-tuning. The experiments results are compared with various methods e.g.
k-NN. A new labelling strategy is proposed to solve the semantic segmentation
problem with unclear boundaries. Even with a small-sized training dataset, the
test results demonstrate that our model outperforms other pathological feature
detection methods. Besides, its forward inference only takes 90 milliseconds
for a single set of fMRI data. To our knowledge, this is the first time to
realize pixel-wise labeling of multi-channel magnetic resonance image using
FCN.

A Deep Spatial Contextual Long-term Recurrent Convolutional Network for Saliency Detection

Nian Liu, Junwei Han
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Traditional saliency models usually adopt hand-crafted image features and
human-designed mechanisms to calculate local or global contrast. In this paper,
we propose a novel computational saliency model, i.e., deep spatial contextual
long-term recurrent convolutional network (DSCLRCN) to predict where people
looks in natural scenes. DSCLRCN first automatically learns saliency related
local features on each image location in parallel. Then, in contrast with most
other deep network based saliency models which infer saliency in local
contexts, DSCLRCN can mimic the cortical lateral inhibition mechanisms in human
visual system to incorporate global contexts to assess the saliency of each
image location by leveraging the deep spatial long short-term memory (DSLSTM)
model. Moreover, we also integrate scene context modulation in DSLSTM for
saliency inference, leading to a novel deep spatial contextual LSTM (DSCLSTM)
model. The whole network can be trained end-to-end and works efficiently when
testing. Experimental results on two benchmark datasets show that DSCLRCN can
achieve state-of-the-art performance on saliency detection. Furthermore, the
proposed DSCLSTM model can significantly boost the saliency detection
performance by incorporating both global spatial interconnections and scene
context modulation, which may uncover novel inspirations for studies on them in
computational saliency models.

Exploiting Depth from Single Monocular Images for Object Detection and Semantic Segmentation

Yuanzhouhan Cao, Chunhua Shen, Heng Tao Shen
Comments: 14 pages. Accepted to IEEE T. Image Processing
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Augmenting RGB data with measured depth has been shown to improve the
performance of a range of tasks in computer vision including object detection
and semantic segmentation. Although depth sensors such as the Microsoft Kinect
have facilitated easy acquisition of such depth information, the vast majority
of images used in vision tasks do not contain depth information. In this paper,
we show that augmenting RGB images with estimated depth can also improve the
accuracy of both object detection and semantic segmentation. Specifically, we
first exploit the recent success of depth estimation from monocular images and
learn a deep depth estimation model. Then we learn deep depth features from the
estimated depth and combine with RGB features for object detection and semantic
segmentation. Additionally, we propose an RGB-D semantic segmentation method
which applies a multi-task training scheme: semantic label prediction and depth
value regression. We test our methods on several datasets and demonstrate that
incorporating information from estimated depth improves the performance of
object detection and semantic segmentation remarkably.

Distortion Varieties

Joe Kileel, Zuzana Kukelova, Tomas Pajdla, Bernd Sturmfels
Comments: 25 pages
Subjects: Algebraic Geometry (math.AG); Computer Vision and Pattern Recognition (cs.CV)

The distortion varieties of a given projective variety are parametrized by
duplicating coordinates and multiplying them with monomials. We study their
degrees and defining equations. Exact formulas are obtained for the case of
one-parameter distortions. These are based on Chow polytopes and Gr”obner
bases. Multi-parameter distortions are studied using tropical geometry. The
motivation for distortion varieties comes from multi-view geometry in computer
vision. Our theory furnishes a new framework for formulating and solving
minimal problems for camera models with image distortion.

Supervision via Competition: Robot Adversaries for Learning Tasks

Lerrel Pinto, James Davidson, Abhinav Gupta
Comments: Submission to ICRA 2017
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

There has been a recent paradigm shift in robotics to data-driven learning
for planning and control. Due to large number of experiences required for
training, most of these approaches use a self-supervised paradigm: using
sensors to measure success/failure. However, in most cases, these sensors
provide weak supervision at best. In this work, we propose an adversarial
learning framework that pits an adversary against the robot learning the task.
In an effort to defeat the adversary, the original robot learns to perform the
task with more robustness leading to overall improved performance. We show that
this adversarial framework forces the the robot to learn a better grasping
model in order to overcome the adversary. By grasping 82% of presented novel
objects compared to 68% without an adversary, we demonstrate the utility of
creating adversaries. We also demonstrate via experiments that having robots in
adversarial setting might be a better learning strategy as compared to having
collaborative multiple robots.

Much Ado About Time: Exhaustive Annotation of Temporal Data

Gunnar A. Sigurdsson, Olga Russakovsky, Ali Farhadi, Ivan Laptev, Abhinav Gupta
Comments: HCOMP 2016 Camera Ready
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV)

Large-scale annotated datasets allow AI systems to learn from and build upon
the knowledge of the crowd. Many crowdsourcing techniques have been developed
for collecting image annotations. These techniques often implicitly rely on the
fact that a new input image takes a negligible amount of time to perceive. In
contrast, we investigate and determine the most cost-effective way of obtaining
high-quality multi-label annotations for temporal data such as videos. Watching
even a short 30-second video clip requires a significant time investment from a
crowd worker; thus, requesting multiple annotations following a single viewing
is an important cost-saving strategy. But how many questions should we ask per
video? We conclude that the optimal strategy is to ask as many questions as
possible in a HIT (up to 52 binary questions after watching a 30-second video
clip in our experiments). We demonstrate that while workers may not correctly
answer all questions, the cost-benefit analysis nevertheless favors consensus
from multiple such cheap-yet-imperfect iterations over more complex
alternatives. When compared with a one-question-per-video baseline, our method
is able to achieve a 10% improvement in recall 76.7% ours versus 66.7%
baseline) at comparable precision (83.8% ours versus 83.0% baseline) in about
half the annotation time (3.8 minutes ours compared to 7.1 minutes baseline).
We demonstrate the effectiveness of our method by collecting multi-label
annotations of 157 human activities on 1,815 videos.

Artificial Intelligence

Adaptive Online Sequential ELM for Concept Drift Tackling

A machine learning method needs to adapt to over time changes in the
environment. Such changes are known as concept drift. In this paper, we propose
concept drift tackling method as an enhancement of Online Sequential Extreme
Learning Machine (OS-ELM) and Constructive Enhancement OS-ELM (CEOS-ELM) by
adding adaptive capability for classification and regression problem. The
scheme is named as adaptive OS-ELM (AOS-ELM). It is a single classifier scheme
that works well to handle real drift, virtual drift, and hybrid drift. The
AOS-ELM also works well for sudden drift and recurrent context change type. The
scheme is a simple unified method implemented in simple lines of code. We
evaluated AOS-ELM on regression and classification problem by using concept
drift public data set (SEA and STAGGER) and other public data sets such as
MNIST, USPS, and IDS. Experiments show that our method gives higher kappa value
compared to the multiclassifier ELM ensemble. Even though AOS-ELM in practice
does not need hidden nodes increase, we address some issues related to the
increasing of the hidden nodes such as error condition and rank values. We
propose taking the rank of the pseudoinverse matrix as an indicator parameter
to detect underfitting condition.

Metaheuristic Algorithms for Convolution Neural Network

A typical modern optimization technique is usually either heuristic or
metaheuristic. This technique has managed to solve some optimization problems
in the research area of science, engineering, and industry. However,
implementation strategy of metaheuristic for accuracy improvement on
convolution neural networks (CNN), a famous deep learning method, is still
rarely investigated. Deep learning relates to a type of machine learning
technique, where its aim is to move closer to the goal of artificial
intelligence of creating a machine that could successfully perform any
intellectual tasks that can be carried out by a human. In this paper, we
propose the implementation strategy of three popular metaheuristic approaches,
that is, simulated annealing, differential evolution, and harmony search, to
optimize CNN. The performances of these metaheuristic methods in optimizing CNN
on classifying MNIST and CIFAR dataset were evaluated and compared.
Furthermore, the proposed methods are also compared with the original CNN.
Although the proposed methods show an increase in the computation time, their
accuracy has also been improved (up to 7.14 percent).

A New Data Representation Based on Training Data Characteristics to Extract Drug Named-Entity in Medical Text

One essential task in information extraction from the medical corpus is drug
name recognition. Compared with text sources come from other domains, the
medical text is special and has unique characteristics. In addition, the
medical text mining poses more challenges, e.g., more unstructured text, the
fast growing of new terms addition, a wide range of name variation for the same
drug. The mining is even more challenging due to the lack of labeled dataset
sources and external knowledge, as well as multiple token representations for a
single drug name that is more common in the real application setting. Although
many approaches have been proposed to overwhelm the task, some problems
remained with poor F-score performance (less than 0.75). This paper presents a
new treatment in data representation techniques to overcome some of those
challenges. We propose three data representation techniques based on the
characteristics of word distribution and word similarities as a result of word
embedding training. The first technique is evaluated with the standard NN
model, i.e., MLP (Multi-Layer Perceptrons). The second technique involves two
deep network classifiers, i.e., DBN (Deep Belief Networks), and SAE (Stacked
Denoising Encoders). The third technique represents the sentence as a sequence
that is evaluated with a recurrent NN model, i.e., LSTM (Long Short Term
Memory). In extracting the drug name entities, the third technique gives the
best F-score performance compared to the state of the art, with its average
F-score being 0.8645.

Parallel Large-Scale Attribute Reduction on Cloud Systems

Junbo Zhang, Tianrui Li, Yi Pan
Comments: 14 pages, 10 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)

The rapid growth of emerging information technologies and application
patterns in modern society, e.g., Internet, Internet of Things, Cloud Computing
and Tri-network Convergence, has caused the advent of the era of big data. Big
data contains huge values, however, mining knowledge from big data is a
tremendously challenging task because of data uncertainty and inconsistency.
Attribute reduction (also known as feature selection) can not only be used as
an effective preprocessing step, but also exploits the data redundancy to
reduce the uncertainty. However, existing solutions are designed 1) either for
a single machine that means the entire data must fit in the main memory and the
parallelism is limited; 2) or for the Hadoop platform which means that the data
have to be loaded into the distributed memory frequently and therefore become
inefficient. In this paper, we overcome these shortcomings for maximum
efficiency possible, and propose a unified framework for Parallel Large-scale
Attribute Reduction, termed PLAR, for big data analysis. PLAR consists of three
components: 1) Granular Computing (GrC)-based initialization: it converts a
decision table (i.e., original data representation) into a granularity
representation which reduces the amount of space and hence can be easily cached
in the distributed memory: 2) model-parallelism: it simultaneously evaluates
all feature candidates and makes attribute reduction highly parallelizable; 3)
data-parallelism: it computes the significance of an attribute in parallel
using a MapReduce-style manner. We implement PLAR with four representative
heuristic feature selection algorithms on Spark, and evaluate them on various
huge datasets, including UCI and astronomical datasets, finding our method’s
advantages beyond existing solutions.

Human Decision-Making under Limited Time

Pedro A. Ortega, Alan A. Stocker
Comments: 9 pages, 4 figures, NIPS Advances in Neural Information Processing Systems 29, 2016
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI)

Subjective expected utility theory assumes that decision-makers possess
unlimited computational resources to reason about their choices; however,
virtually all decisions in everyday life are made under resource constraints –
i.e. decision-makers are bounded in their rationality. Here we experimentally
tested the predictions made by a formalization of bounded rationality based on
ideas from statistical mechanics and information-theory. We systematically
tested human subjects in their ability to solve combinatorial puzzles under
different time limitations. We found that our bounded-rational model accounts
well for the data. The decomposition of the fitted model parameter into the
subjects’ expected utility function and resource parameter provide interesting
insight into the subjects’ information capacity limits. Our results confirm
that humans gradually fall back on their learned prior choice patterns when
confronted with increasing resource limitations.

Information Retrieval

Discriminative Information Retrieval for Knowledge Discovery

Tongfei Chen, Benjamin Van Durme
Subjects: Information Retrieval (cs.IR)

We propose a framework for discriminative Information Retrieval (IR) atop
linguistic features, trained to improve the recall of tasks such as answer
candidate passage retrieval, the initial step in text-based Question Answering
(QA). We formalize this as an instance of linear feature-based IR (Metzler and
Croft, 2007), illustrating how a variety of knowledge discovery tasks are
captured under this approach, leading to a 44% improvement in recall for
candidate triage for QA.

A Robust Framework for Classifying Evolving Document Streams in an Expert-Machine-Crowd Setting

Muhammad Imran, Sanjay Chawla, Carlos Castillo
Comments: Accepted at ICDM 2016
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

An emerging challenge in the online classification of social media data
streams is to keep the categories used for classification up-to-date. In this
paper, we propose an innovative framework based on an Expert-Machine-Crowd
(EMC) triad to help categorize items by continuously identifying novel concepts
in heterogeneous data streams often riddled with outliers. We unify constrained
clustering and outlier detection by formulating a novel optimization problem:
COD-Means. We design an algorithm to solve the COD-Means problem and show that
COD-Means will not only help detect novel categories but also seamlessly
discover human annotation errors and improve the overall quality of the
categorization process. Experiments on diverse real data sets demonstrate that
our approach is both effective and efficient.

Computation and Language

Scalable Machine Translation in Memory Constrained Environments

Paul Baltescu
Comments: Master Thesis
Subjects: Computation and Language (cs.CL)

Machine translation is the discipline concerned with developing automated
tools for translating from one human language to another. Statistical machine
translation (SMT) is the dominant paradigm in this field. In SMT, translations
are generated by means of statistical models whose parameters are learned from
bilingual data. Scalability is a key concern in SMT, as one would like to make
use of as much data as possible to train better translation systems.

In recent years, mobile devices with adequate computing power have become
widely available. Despite being very successful, mobile applications relying on
NLP systems continue to follow a client-server architecture, which is of
limited use because access to internet is often limited and expensive. The goal
of this dissertation is to show how to construct a scalable machine translation
system that can operate with the limited resources available on a mobile
device.

The main challenge for porting translation systems on mobile devices is
memory usage. The amount of memory available on a mobile device is far less
than what is typically available on the server side of a client-server
application. In this thesis, we investigate alternatives for the two components
which prevent standard translation systems from working on mobile devices due
to high memory usage. We show that once these standard components are replaced
with our proposed alternatives, we obtain a scalable translation system that
can work on a device with limited memory.

Toward Automatic Understanding of the Function of Affective Language in Support Groups

Amit Navindgi, Caroline Brun, Cécile Boulard Masson, Scott Nowson
Comments: 9 pages, 1 figure, conference workshop
Subjects: Computation and Language (cs.CL)

Understanding expressions of emotions in support forums has considerable
value and NLP methods are key to automating this. Many approaches
understandably use subjective categories which are more fine-grained than a
straightforward polarity-based spectrum. However, the definition of such
categories is non-trivial and, in fact, we argue for a need to incorporate
communicative elements even beyond subjectivity. To support our position, we
report experiments on a sentiment-labelled corpus of posts taken from a medical
support forum. We argue that not only is a more fine-grained approach to text
analysis important, but simultaneously recognising the social function behind
affective expressions enable a more accurate and valuable level of
understanding.

A New Data Representation Based on Training Data Characteristics to Extract Drug Named-Entity in Medical Text

One essential task in information extraction from the medical corpus is drug
name recognition. Compared with text sources come from other domains, the
medical text is special and has unique characteristics. In addition, the
medical text mining poses more challenges, e.g., more unstructured text, the
fast growing of new terms addition, a wide range of name variation for the same
drug. The mining is even more challenging due to the lack of labeled dataset
sources and external knowledge, as well as multiple token representations for a
single drug name that is more common in the real application setting. Although
many approaches have been proposed to overwhelm the task, some problems
remained with poor F-score performance (less than 0.75). This paper presents a
new treatment in data representation techniques to overcome some of those
challenges. We propose three data representation techniques based on the
characteristics of word distribution and word similarities as a result of word
embedding training. The first technique is evaluated with the standard NN
model, i.e., MLP (Multi-Layer Perceptrons). The second technique involves two
deep network classifiers, i.e., DBN (Deep Belief Networks), and SAE (Stacked
Denoising Encoders). The third technique represents the sentence as a sequence
that is evaluated with a recurrent NN model, i.e., LSTM (Long Short Term
Memory). In extracting the drug name entities, the third technique gives the
best F-score performance compared to the state of the art, with its average
F-score being 0.8645.

Neural-based Noise Filtering from Word Embeddings

Kim Anh Nguyen, Sabine Schulte im Walde, Ngoc Thang Vu
Comments: 9 pages, 4 figures, COLING 2016
Subjects: Computation and Language (cs.CL)

Word embeddings have been demonstrated to benefit NLP tasks impressively.
Yet, there is room for improvement in the vector representations, because
current word embeddings typically contain unnecessary information, i.e., noise.
We propose two novel models to improve word embeddings by unsupervised
learning, in order to yield word denoising embeddings. The word denoising
embeddings are obtained by strengthening salient information and weakening
noise in the original word embeddings, based on a deep feed-forward neural
network filter. Results from benchmark tasks show that the filtered word
denoising embeddings outperform the original word embeddings.

A Robust Framework for Classifying Evolving Document Streams in an Expert-Machine-Crowd Setting

Muhammad Imran, Sanjay Chawla, Carlos Castillo
Comments: Accepted at ICDM 2016
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

An emerging challenge in the online classification of social media data
streams is to keep the categories used for classification up-to-date. In this
paper, we propose an innovative framework based on an Expert-Machine-Crowd
(EMC) triad to help categorize items by continuously identifying novel concepts
in heterogeneous data streams often riddled with outliers. We unify constrained
clustering and outlier detection by formulating a novel optimization problem:
COD-Means. We design an algorithm to solve the COD-Means problem and show that
COD-Means will not only help detect novel categories but also seamlessly
discover human annotation errors and improve the overall quality of the
categorization process. Experiments on diverse real data sets demonstrate that
our approach is both effective and efficient.

Automatic Detection of Small Groups of Persons, Influential Members, Relations and Hierarchy in Written Conversations Using Fuzzy Logic

French Pope III, Rouzbeh A. Shirvani, Mugizi Robert Rwebangira, Mohamed Chouikha, Ayo Taylor, Andres Alarcon Ramirez, Amirsina Torfi
Subjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI)

Nowadays a lot of data is collected in online forums. One of the key tasks is
to determine the social structure of these online groups, for example the
identification of subgroups within a larger group. We will approach the
grouping of individual as a classification problem. The classifier will be
based on fuzzy logic. The input to the classifier will be linguistic features
and degree of relationships (among individuals). The output of the classifiers
are the groupings of individuals. We also incorporate a method that ranks the
members of the detected subgroup to identify the hierarchies in each subgroup.
Data from the HBO television show The Wire is used to analyze the efficacy and
usefulness of fuzzy logic based methods as alternative methods to classical
statistical methods usually used for these problems. The proposed methodology
could detect automatically the most influential members of each organization
The Wire with 90% accuracy.

Generating Simulations of Motion Events from Verbal Descriptions

James Pustejovsky, Nikhil Krishnaswamy
Comments: 11 pages, 5 figures, *SEM workshop, COLING 2014
Subjects: Computation and Language (cs.CL)

In this paper, we describe a computational model for motion events in natural
language that maps from linguistic expressions, through a dynamic event
interpretation, into three-dimensional temporal simulations in a model.
Starting with the model from (Pustejovsky and Moszkowicz, 2011), we analyze
motion events using temporally-traced Labelled Transition Systems. We model the
distinction between path- and manner-motion in an operational semantics, and
further distinguish different types of manner-of-motion verbs in terms of the
mereo-topological relations that hold throughout the process of movement. From
these representations, we generate minimal models, which are realized as
three-dimensional simulations in software developed with the game engine,
Unity. The generated simulations act as a conceptual “debugger” for the
semantics of different motion verbs: that is, by testing for consistency and
informativeness in the model, simulations expose the presuppositions associated
with linguistic expressions and their compositions. Because the model
generation component is still incomplete, this paper focuses on an
implementation which maps directly from linguistic interpretations into the
Unity code snippets that create the simulations.

Distributed, Parallel, and Cluster Computing

Parallel Large-Scale Attribute Reduction on Cloud Systems

Junbo Zhang, Tianrui Li, Yi Pan
Comments: 14 pages, 10 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)

The rapid growth of emerging information technologies and application
patterns in modern society, e.g., Internet, Internet of Things, Cloud Computing
and Tri-network Convergence, has caused the advent of the era of big data. Big
data contains huge values, however, mining knowledge from big data is a
tremendously challenging task because of data uncertainty and inconsistency.
Attribute reduction (also known as feature selection) can not only be used as
an effective preprocessing step, but also exploits the data redundancy to
reduce the uncertainty. However, existing solutions are designed 1) either for
a single machine that means the entire data must fit in the main memory and the
parallelism is limited; 2) or for the Hadoop platform which means that the data
have to be loaded into the distributed memory frequently and therefore become
inefficient. In this paper, we overcome these shortcomings for maximum
efficiency possible, and propose a unified framework for Parallel Large-scale
Attribute Reduction, termed PLAR, for big data analysis. PLAR consists of three
components: 1) Granular Computing (GrC)-based initialization: it converts a
decision table (i.e., original data representation) into a granularity
representation which reduces the amount of space and hence can be easily cached
in the distributed memory: 2) model-parallelism: it simultaneously evaluates
all feature candidates and makes attribute reduction highly parallelizable; 3)
data-parallelism: it computes the significance of an attribute in parallel
using a MapReduce-style manner. We implement PLAR with four representative
heuristic feature selection algorithms on Spark, and evaluate them on various
huge datasets, including UCI and astronomical datasets, finding our method’s
advantages beyond existing solutions.

A Survey and Measurement Study of GPU DVFS on Energy Conservation

Xinxin Mei, Qiang Wang, Xiaowen Chu
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Energy efficiency has become one of the top design criteria for current
computing systems. The dynamic voltage and frequency scaling (DVFS) has been
widely adopted by laptop computers, servers, and mobile devices to conserve
energy, while the GPU DVFS is still at a certain early age. This paper aims at
exploring the impact of GPU DVFS on the application performance and power
consumption, and furthermore, on energy conservation. We survey the
state-of-the-art GPU DVFS characterizations, and then summarize recent research
works on GPU power and performance models. We also conduct real GPU DVFS
experiments on NVIDIA Fermi and Maxwell GPUs. According to our experimental
results, GPU DVFS has significant potential for energy saving. The effect of
scaling core voltage/frequency and memory voltage/frequency depends on not only
the GPU architectures, but also the characteristic of GPU applications.

RedThreads: An Interface for Application-level Fault Detection/Correction through Adaptive Redundant Multithreading

Saurabh Hukerikar, Keita Teranishi, Pedro C. Diniz, Robert F. Lucas
Comments: Submitted to the International Journal of Parallel Programming
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

In the presence of accelerated fault rates, which are projected to be the
norm on future exascale systems, it will become increasingly difficult for
high-performance computing (HPC) applications to accomplish useful computation.
Due to the fault-oblivious nature of current HPC programming paradigms and
execution environments, HPC applications are insufficiently equipped to deal
with errors. We believe that HPC applications should be enabled with
capabilities to actively search for and correct errors in their computations.
The redundant multithreading (RMT) approach offers lightweight replicated
execution streams of program instructions within the context of a single
application process. However, the use of complete redundancy incurs significant
overhead to the application performance.

In this paper we present RedThreads, an interface that provides
application-level fault detection and correction based on RMT, but applies the
thread-level redundancy adaptively. We describe the RedThreads syntax and
semantics, and the supporting compiler infrastructure and runtime system. Our
approach enables application programmers to scope the extent of redundant
computation. Additionally, the runtime system permits the use of RMT to be
dynamically enabled, or disabled, based on the resiliency needs of the
application and the state of the system. Our experimental results demonstrate
how adaptive RMT exploits programmer insight and runtime inference to
dynamically navigate the trade-off space between an application’s resilience
coverage and the associated performance overhead of redundant computation.

Learning

Regularized Dynamic Boltzmann Machine with Delay Pruning for Unsupervised Learning of Temporal Sequences

We introduce Delay Pruning, a simple yet powerful technique to regularize
dynamic Boltzmann machines (DyBM). The recently introduced DyBM provides a
particularly structured Boltzmann machine, as a generative model of a
multi-dimensional time-series. This Boltzmann machine can have infinitely many
layers of units but allows exact inference and learning based on its
biologically motivated structure. DyBM uses the idea of conduction delays in
the form of fixed length first-in first-out (FIFO) queues, with a neuron
connected to another via this FIFO queue, and spikes from a pre-synaptic neuron
travel along the queue to the post-synaptic neuron with a constant period of
delay. Here, we present Delay Pruning as a mechanism to prune the lengths of
the FIFO queues (making them zero) by setting some delay lengths to one with a
fixed probability, and finally selecting the best performing model with fixed
delays. The uniqueness of structure and a non-sampling based learning rule in
DyBM, make the application of previously proposed regularization techniques
like Dropout or DropConnect difficult, leading to poor generalization. First,
we evaluate the performance of Delay Pruning to let DyBM learn a
multidimensional temporal sequence generated by a Markov chain. Finally, we
show the effectiveness of delay pruning in learning high dimensional sequences
using the moving MNIST dataset, and compare it with Dropout and DropConnect
methods.

Active exploration in parameterized reinforcement learning

Mehdi Khamassi, Costas Tzafestas
Comments: Submitted to EWRL2016
Subjects: Learning (cs.LG)

Online model-free reinforcement learning (RL) methods with continuous actions
are playing a prominent role when dealing with real-world applications such as
Robotics. However, when confronted to non-stationary environments, these
methods crucially rely on an exploration-exploitation trade-off which is rarely
dynamically and automatically adjusted to changes in the environment. Here we
propose an active exploration algorithm for RL in structured (parameterized)
continuous action space. This framework deals with a set of discrete actions,
each of which is parameterized with continuous variables. Discrete exploration
is controlled through a Boltzmann softmax function with an inverse temperature
$eta$ parameter. In parallel, a Gaussian exploration is applied to the
continuous action parameters. We apply a meta-learning algorithm based on the
comparison between variations of short-term and long-term reward running
averages to simultaneously tune $eta$ and the width of the Gaussian
distribution from which continuous action parameters are drawn. When applied to
a simple virtual human-robot interaction task, we show that this algorithm
outperforms continuous parameterized RL both without active exploration and
with active exploration based on uncertainty variations measured by a
Kalman-Q-learning algorithm.

Connecting Generative Adversarial Networks and Actor-Critic Methods

David Pfau, Oriol Vinyals
Comments: Submission to NIPS 2016 Workshop on Adversarial Training
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Both generative adversarial networks (GAN) in unsupervised learning and
actor-critic methods in reinforcement learning (RL) have gained a reputation
for being difficult to optimize. Practitioners in both fields have amassed a
large number of strategies to mitigate these instabilities and improve
training. Here we show that GANs can be viewed as actor-critic methods in an
environment where the actor cannot affect the reward. We review the strategies
for stabilizing training for each class of models, both those that generalize
between the two and those that are particular to that model. We also review a
number of extensions to GANs and RL algorithms with even more complicated
information flow. We hope that by highlighting this formal connection we will
encourage both GAN and RL communities to develop general, scalable, and stable
algorithms for multilevel optimization with deep networks, and to draw
inspiration across communities.

Using Non-invertible Data Transformations to Build Adversary-Resistant Deep Neural Networks

Qinglong Wang, Wenbo Guo, Alexander G. Ororbia II, Xinyu Xing, Lin Lin, C. Lee Giles, Xue Liu, Peng Liu, Gang Xiong
Subjects: Learning (cs.LG)

Deep neural networks have proven to be quite effective in a wide variety of
machine learning tasks, ranging from improved speech recognition systems to
advancing the development of autonomous vehicles. However, despite their
superior performance in many applications, these models have been recently
shown to be susceptible to a particular type of attack possible through the
generation of particular synthetic examples referred to as adversarial samples.
These samples are constructed by manipulating real examples from the training
data distribution in order to “fool” the original neural model, resulting in
misclassification (with high confidence) of previously correctly classified
samples. Addressing this weakness is of utmost importance if deep neural
architectures are to be applied to critical applications, such as those in the
domain of cybersecurity. In this paper, we present an analysis of this
fundamental flaw lurking in all neural architectures to uncover limitations of
previously proposed defense mechanisms. More importantly, we present a unifying
framework for protecting deep neural models using a non-invertible data
transformation–developing two adversary-resilient architectures utilizing both
linear and nonlinear dimensionality reduction. Empirical results indicate that
our framework provides better robustness compared to state-of-art solutions
while having negligible degradation in accuracy.

Ischemic Stroke Identification Based on EEG and EOG using 1D Convolutional Neural Network and Batch Normalization

Endang Purnama Giri, Mohamad Ivan Fanany, Aniati Murni Arymurthy
Comments: 13 pages. To be published in ICACSIS 2016
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

In 2015, stroke was the number one cause of death in Indonesia. The majority
type of stroke is ischemic. The standard tool for diagnosing stroke is CT-Scan.
For developing countries like Indonesia, the availability of CT-Scan is very
limited and still relatively expensive. Because of the availability, another
device that potential to diagnose stroke in Indonesia is EEG. Ischemic stroke
occurs because of obstruction that can make the cerebral blood flow (CBF) on a
person with stroke has become lower than CBF on a normal person (control) so
that the EEG signal have a deceleration. On this study, we perform the ability
of 1D Convolutional Neural Network (1DCNN) to construct classification model
that can distinguish the EEG and EOG stroke data from EEG and EOG control data.
To accelerate training process our model we use Batch Normalization. Involving
62 person data object and from leave one out the scenario with five times
repetition of measurement we obtain the average of accuracy 0.86 (F-Score
0.861) only at 200 epoch. This result is better than all over shallow and
popular classifiers as the comparator (the best result of accuracy 0.69 and
F-Score 0.72 ). The feature used in our study were only 24 handcrafted feature
with simple feature extraction process.

Combining Generative and Discriminative Neural Networks for Sleep Stages Classification

Sleep stages pattern provides important clues in diagnosing the presence of
sleep disorder. By analyzing sleep stages pattern and extracting its features
from EEG, EOG, and EMG signals, we can classify sleep stages. This study
presents a novel classification model for predicting sleep stages with a high
accuracy. The main idea is to combine the generative capability of Deep Belief
Network (DBN) with a discriminative ability and sequence pattern recognizing
capability of Long Short-term Memory (LSTM). We use DBN that is treated as an
automatic higher level features generator. The input to DBN is 28 “handcrafted”
features as used in previous sleep stages studies. We compared our method with
other techniques which combined DBN with Hidden Markov Model (HMM).In this
study, we exploit the sequence or time series characteristics of sleep dataset.
To the best of our knowledge, most of the present sleep analysis from
polysomnogram relies only on single instanced label (nonsequence) for
classification. In this study, we used two datasets: an open data set that is
treated as a benchmark; the other dataset is our sleep stages dataset
(available for download) to verify the results further. Our experiments showed
that the combination of DBN with LSTM gives better overall accuracy 98.75\%
(Fscore=0.9875) for benchmark dataset and 98.94\% (Fscore=0.9894) for MKG
dataset. This result is better than the state of the art of sleep stages
classification that was 91.31\%.

A Methodology for Customizing Clinical Tests for Esophageal Cancer based on Patient Preferences

Asis Roy, Sourangshu Bhattacharya, Kalyan Guin
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Tests for Esophageal cancer can be expensive, uncomfortable and can have side
effects. For many patients, we can predict non-existence of disease with 100%
certainty, just using demographics, lifestyle, and medical history information.
Our objective is to devise a general methodology for customizing tests using
user preferences so that expensive or uncomfortable tests can be avoided. We
propose to use classifiers trained from electronic health records (EHR) for
selection of tests. The key idea is to design classifiers with 100% false
normal rates, possibly at the cost higher false abnormals. We compare Naive
Bayes classification (NB), Random Forests (RF), Support Vector Machines (SVM)
and Logistic Regression (LR), and find kernel Logistic regression to be most
suitable for the task. We propose an algorithm for finding the best probability
threshold for kernel LR, based on test set accuracy. Using the proposed
algorithm, we describe schemes for selecting tests, which appear as features in
the automatic classification algorithm, using preferences on costs and
discomfort of the users. We test our methodology with EHRs collected for more
than 3000 patients, as a part of project carried out by a reputed hospital in
Mumbai, India. Kernel SVM and kernel LR with a polynomial kernel of degree 3,
yields an accuracy of 99.8% and sensitivity 100%, without the MP features, i.e.
using only clinical tests. We demonstrate our test selection algorithm using
two case studies, one using cost of clinical tests, and other using
“discomfort” values for clinical tests. We compute the test sets corresponding
to the lowest false abnormals for each criterion described above, using
exhaustive enumeration of 15 clinical tests. The sets turn out to different,
substantiating our claim that one can customize test sets based on user
preferences.

Low-tubal-rank Tensor Completion using Alternating Minimization

Xiao-Yang Liu, Shuchin Aeron, Vaneet Aggarwal, Xiaodong Wang
Subjects: Learning (cs.LG); Information Theory (cs.IT)

The low-tubal-rank tensor model has been recently proposed for real-world
multidimensional data. In this paper, we study the low-tubal-rank tensor
completion problem, i.e., to recover a third-order tensor by observing a subset
of its elements selected uniformly at random. We propose a fast iterative
algorithm, called {em Tubal-Alt-Min}, that is inspired by a similar approach
for low-rank matrix completion. The unknown low-tubal-rank tensor is
represented as the product of two much smaller tensors with the low-tubal-rank
property being automatically incorporated, and Tubal-Alt-Min alternates between
estimating those two tensors using tensor least squares minimization. First, we
note that tensor least squares minimization is different from its matrix
counterpart and nontrivial as the circular convolution operator of the
low-tubal-rank tensor model is intertwined with the sub-sampling operator.
Second, the theoretical performance guarantee is challenging since
Tubal-Alt-Min is iterative and nonconvex in nature. We prove that 1)
Tubal-Alt-Min guarantees exponential convergence to the global optima, and 2)
for an $n imes n imes k$ tensor with tubal-rank $r ll n$, the required
sampling complexity is $O(nr^2k log^3 n)$ and the computational complexity is
$O(n^2rk^2 log^2 n)$. Third, on both synthetic data and real-world video data,
evaluation results show that compared with tensor-nuclear norm minimization
(TNN-ADMM), Tubal-Alt-Min improves the recovery error dramatically (by orders
of magnitude). It is estimated that Tubal-Alt-Min converges at an exponential
rate $10^{-0.4423 ext{Iter}}$ where $ ext{Iter}$ denotes the number of
iterations, which is much faster than TNN-ADMM’s $10^{-0.0332 ext{Iter}}$,
and the running time can be accelerated by more than $5$ times for a $200
imes 200 imes 20$ tensor.

Generalized Inverse Classification

Michael T. Lash, Qihang Lin, W. Nick Street, Jennifer G. Robinson, Jeffrey Ohlmann
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Inverse classification is the process of perturbing an instance in a
meaningful way such that it is more likely to conform to a specific class.
Historical methods that address such a problem are often framed to leverage
only a single classifier, or specific set of classifiers. These works are often
accompanied by naive assumptions. In this work we propose generalized inverse
classification (GIC), which avoids restricting the classification model that
can be used. We incorporate this formulation into a refined framework in which
GIC takes place. Under this framework, GIC operates on features that are
immediately actionable. Each change incurs an individual cost, either linear or
non-linear. Such changes are subjected to occur within a specified level of
cumulative change (budget). Furthermore, our framework incorporates the
estimation of features that change as a consequence of direct actions taken
(indirectly changeable features). To solve such a problem, we propose three
real-valued heuristic-based methods and two sensitivity analysis-based
comparison methods, each of which is evaluated on two freely available
real-world datasets. Our results demonstrate the validity and benefits of our
formulation, framework, and methods.

Polynomial-time Tensor Decompositions with Sum-of-Squares

Tengyu Ma, Jonathan Shi, David Steurer
Comments: to appear in FOCS 2016
Subjects: Data Structures and Algorithms (cs.DS); Learning (cs.LG)

We give new algorithms based on the sum-of-squares method for tensor
decomposition. Our results improve the best known running times from
quasi-polynomial to polynomial for several problems, including decomposing
random overcomplete 3-tensors and learning overcomplete dictionaries with
constant relative sparsity. We also give the first robust analysis for
decomposing overcomplete 4-tensors in the smoothed analysis model. A key
ingredient of our analysis is to establish small spectral gaps in moment
matrices derived from solutions to sum-of-squares relaxations. To enable this
analysis we augment sum-of-squares relaxations with spectral analogs of maximum
entropy constraints.

Efficient L1-Norm Principal-Component Analysis via Bit Flipping

Panos P. Markopoulos, Sandipan Kundu, Shubham Chamadia, Dimitris A. Pados
Subjects: Data Structures and Algorithms (cs.DS); Learning (cs.LG); Machine Learning (stat.ML)

It was shown recently that the $K$ L1-norm principal components (L1-PCs) of a
real-valued data matrix $mathbf X in mathbb R^{D imes N}$ ($N$ data
samples of $D$ dimensions) can be exactly calculated with cost
$mathcal{O}(2^{NK})$ or, when advantageous, $mathcal{O}(N^{dK – K + 1})$
where $d=mathrm{rank}(mathbf X)$, $K<d$ [1],[2]. In applications where
$mathbf X$ is large (e.g., “big” data of large $N$ and/or “heavy” data of
large $d$), these costs are prohibitive. In this work, we present a novel
suboptimal algorithm for the calculation of the $K < d$ L1-PCs of $mathbf X$
of cost $mathcal O(ND mathrm{min} { N,D} + N^2(K^4 + dK^2) + dNK^3)$, which
is comparable to that of standard (L2-norm) PC analysis. Our theoretical and
experimental studies show that the proposed algorithm calculates the exact
optimal L1-PCs with high frequency and achieves higher value in the L1-PC
optimization metric than any known alternative algorithm of comparable
computational cost. The superiority of the calculated L1-PCs over standard
L2-PCs (singular vectors) in characterizing potentially faulty
data/measurements is demonstrated with experiments on data dimensionality
reduction and disease diagnosis from genomic data.

Sequence-based Sleep Stage Classification using Conditional Neural Fields

Sleep signals from a polysomnographic database are sequences in nature.
Commonly employed analysis and classification methods, however, ignored this
fact and treated the sleep signals as non-sequence data. Treating the sleep
signals as sequences, this paper compared two powerful unsupervised feature
extractors and three sequence-based classifiers regarding accuracy and
computational (training and testing) time after 10-folds cross-validation. The
compared feature extractors are Deep Belief Networks (DBN) and Fuzzy C-Means
(FCM) clustering. Whereas the compared sequence-based classifiers are Hidden
Markov Models (HMM), Conditional Random Fields (CRF) and its variants, i.e.,
Hidden-state CRF (HCRF) and Latent-Dynamic CRF (LDCRF); and Conditional Neural
Fields (CNF) and its variant (LDCNF). In this study, we use two datasets. The
first dataset is an open (public) polysomnographic dataset downloadable from
the Internet, while the second dataset is our polysomnographic dataset (also
available for download). For the first dataset, the combination of FCM and CNF
gives the highest accuracy (96.75\%) with relatively short training time (0.33
hours). For the second dataset, the combination of DBN and CRF gives the
accuracy of 99.96\% but with 1.02 hours training time, whereas the combination
of DBN and CNF gives slightly less accuracy (99.69\%) but also less computation
time (0.89 hours).

Adaptive Online Sequential ELM for Concept Drift Tackling

A machine learning method needs to adapt to over time changes in the
environment. Such changes are known as concept drift. In this paper, we propose
concept drift tackling method as an enhancement of Online Sequential Extreme
Learning Machine (OS-ELM) and Constructive Enhancement OS-ELM (CEOS-ELM) by
adding adaptive capability for classification and regression problem. The
scheme is named as adaptive OS-ELM (AOS-ELM). It is a single classifier scheme
that works well to handle real drift, virtual drift, and hybrid drift. The
AOS-ELM also works well for sudden drift and recurrent context change type. The
scheme is a simple unified method implemented in simple lines of code. We
evaluated AOS-ELM on regression and classification problem by using concept
drift public data set (SEA and STAGGER) and other public data sets such as
MNIST, USPS, and IDS. Experiments show that our method gives higher kappa value
compared to the multiclassifier ELM ensemble. Even though AOS-ELM in practice
does not need hidden nodes increase, we address some issues related to the
increasing of the hidden nodes such as error condition and rank values. We
propose taking the rank of the pseudoinverse matrix as an indicator parameter
to detect underfitting condition.

A New Data Representation Based on Training Data Characteristics to Extract Drug Named-Entity in Medical Text

One essential task in information extraction from the medical corpus is drug
name recognition. Compared with text sources come from other domains, the
medical text is special and has unique characteristics. In addition, the
medical text mining poses more challenges, e.g., more unstructured text, the
fast growing of new terms addition, a wide range of name variation for the same
drug. The mining is even more challenging due to the lack of labeled dataset
sources and external knowledge, as well as multiple token representations for a
single drug name that is more common in the real application setting. Although
many approaches have been proposed to overwhelm the task, some problems
remained with poor F-score performance (less than 0.75). This paper presents a
new treatment in data representation techniques to overcome some of those
challenges. We propose three data representation techniques based on the
characteristics of word distribution and word similarities as a result of word
embedding training. The first technique is evaluated with the standard NN
model, i.e., MLP (Multi-Layer Perceptrons). The second technique involves two
deep network classifiers, i.e., DBN (Deep Belief Networks), and SAE (Stacked
Denoising Encoders). The third technique represents the sentence as a sequence
that is evaluated with a recurrent NN model, i.e., LSTM (Long Short Term
Memory). In extracting the drug name entities, the third technique gives the
best F-score performance compared to the state of the art, with its average
F-score being 0.8645.

Sampled Fictitious Play is Hannan Consistent

Zifan Li, Ambuj Tewari
Subjects: Computer Science and Game Theory (cs.GT); Learning (cs.LG); Machine Learning (stat.ML)

Fictitious play is a simple and widely studied adaptive heuristic for playing
repeated games. It is well known that fictitious play fails to be Hannan
consistent. Several variants of fictitious play including regret matching,
generalized regret matching and smooth fictitious play, are known to be Hannan
consistent. In this note, we consider sampled fictitious play: at each round,
the player samples past times and plays the best response to previous moves of
other players at the sampled time points. We show that sampled fictitious play,
using Bernoulli sampling, is Hannan consistent. Unlike several existing Hannan
consistency proofs that rely on concentration of measure results, ours instead
uses anti-concentration results from Littlewood-Offord theory.

Supervision via Competition: Robot Adversaries for Learning Tasks

Lerrel Pinto, James Davidson, Abhinav Gupta
Comments: Submission to ICRA 2017
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

There has been a recent paradigm shift in robotics to data-driven learning
for planning and control. Due to large number of experiences required for
training, most of these approaches use a self-supervised paradigm: using
sensors to measure success/failure. However, in most cases, these sensors
provide weak supervision at best. In this work, we propose an adversarial
learning framework that pits an adversary against the robot learning the task.
In an effort to defeat the adversary, the original robot learns to perform the
task with more robustness leading to overall improved performance. We show that
this adversarial framework forces the the robot to learn a better grasping
model in order to overcome the adversary. By grasping 82% of presented novel
objects compared to 68% without an adversary, we demonstrate the utility of
creating adversaries. We also demonstrate via experiments that having robots in
adversarial setting might be a better learning strategy as compared to having
collaborative multiple robots.

Automatic Sleep Stage Scoring with Single-Channel EEG Using Convolutional Neural Networks

Orestis Tsinalis, Paul M. Matthews, Yike Guo, Stefanos Zafeiriou
Comments: 12 pages
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We used convolutional neural networks (CNNs) for automatic sleep stage
scoring based on single-channel electroencephalography (EEG) to learn
task-specific filters for classification without using prior domain knowledge.
We used an openly available dataset from 20 healthy young adults for evaluation
and applied 20-fold cross-validation. We used class-balanced random sampling
within the stochastic gradient descent (SGD) optimization of the CNN to avoid
skewed performance in favor of the most represented sleep stages. We achieved
high mean F1-score (81%, range 79-83%), mean accuracy across individual sleep
stages (82%, range 80-84%) and overall accuracy (74%, range 71-76%) over all
subjects. By analyzing and visualizing the filters that our CNN learns, we
found that rules learned by the filters correspond to sleep scoring criteria in
the American Academy of Sleep Medicine (AASM) manual that human experts follow.
Our method’s performance is balanced across classes and our results are
comparable to state-of-the-art methods with hand-engineered features. We show
that, without using prior domain knowledge, a CNN can automatically learn to
distinguish among different normal sleep stages.

Information Theory

Geometric decoding of subspace codes with explicit Schubert calculus applied to spread codes

Klara Stokes
Subjects: Information Theory (cs.IT)

This article is about a decoding algorithm for error-correcting subspace
codes. A version of this algorithm was previously described by Rosenthal,
Silberstein and Trautmann. The decoding algorithm requires the code to be
defined as the intersection of the Pl”ucker embedding of the Grassmannian and
an algebraic variety. We call such codes emph{geometric subspace codes}.
Complexity is substantially improved compared to the algorithm by Rosenthal,
Silberstein and Trautmann and connections to finite geometry are given. The
decoding algorithm is applied to Desarguesian spread codes, which are known to
be defined as the intersection of the Pl”ucker embedding of the Grassmannian
with a linear space.

Fundamental properties of solutions to utility maximization problems in wireless networks

Renato Luis Garrido Cavalcante, Slawomir Stanczak
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

We introduce a unified framework for the study of utility maximization
problems in interference-coupled wireless networks. The framework can be
applied to a large class of utilities, but in this study special attention is
devoted to the rate. In more detail, we resort to results from concave
Perron-Frobenius theory to show that, within the class of problems we consider
here, each problem has a unique solution. As a consequence, given any network
utility maximization problem belonging to this class, we can define two
functions that relate the power budget $ar{p}$ of a network to the network
utility and to the energy efficiency achieved by the solution to the given
problem. Among many interesting properties, we prove that these functions are
continuous and monotonic. In addition, we derive bounds revealing that the
solution is characterized by a low and a high power regime. In the low power
regime, the energy efficiency can decrease slowly as the power budget
increases, and the network utility grows linearly at best. In contrast, in the
high power regime, the energy efficiency typically scales as
$mathcal{O}(1/ar{p})$ as $ar{p} oinfty$, and the network utility scales
as $mathcal{O}(1)$. We apply the theoretical findings to a novel weighted rate
maximization problem involving the joint optimization of the uplink power and
the base station assignment. For this novel problem formulation, we also
propose a simple and practical iterative solver.

Quantum Game Theory for Beam Alignment in Millimeter Wave Device-to-Device Communications

Qianqian Zhang, Walid Saad, Mehdi Bennis, Merouane Debbah
Comments: 6 pages, 6 figures
Subjects: Information Theory (cs.IT); Computer Science and Game Theory (cs.GT)

In this paper, the problem of optimized beam alignment for wearable
device-to-device (D2D) communications over millimeter wave (mmW) frequencies is
studied. In particular, a noncooperative game is formulated between wearable
communication pairs that engage in D2D communications. In this game, wearable
devices acting as transmitters autonomously select the directions of their
beams so as to maximize the data rate to their receivers. To solve the game, an
algorithm based on best response dynamics is proposed that allows the
transmitters to reach a Nash equilibrium in a distributed manner. To further
improve the performance of mmW D2D communications, a novel quantum game model
is formulated to enable the wearable devices to exploit new quantum directions
during their beam alignment so as to further enhance their data rate.
Simulation results show that the proposed game-theoretic approach improves the
performance, in terms of data rate, of about 75% compared to a uniform beam
alignment. The results also show that the quantum game model can further yield
up to 20% improvement in data rates, relative to the classical game approach.

Downlink Coordinated Joint Transmission for Mutual Information Accumulation

Amogh Rajanna, Martin Haenggi
Comments: 10 pages, 2 figures, IEEE Wireless Communications Letters (submitted 5 Oct 2016)
Subjects: Information Theory (cs.IT)

In this letter, we propose a new coordinated multipoint (CoMP) technique
based on mutual information (MI) accumulation using rateless codes. Using a
stochastic geometry model for the cellular downlink, we quantify the
performance enhancements in coverage probability and rate due to MI
accumulation. By simulation and analysis, we show that MI accumulation using
rateless codes leads to remarkable improvements in coverage and rate for
general users and specific cell edge users.

Variable-Length Coding with Cost Allowing Non-Vanishing Error Probability

Hideki Yagi, Ryo Nomura
Comments: 7 pages; extended version of a paper accepted by ISITA2016
Subjects: Information Theory (cs.IT)

We derive a general formula of the minimum achievable rate for
fixed-to-variable length coding with a regular cost function by allowing the
error probability up to a constant $varepsilon$. For a fixed-to-variable
length code, we call the set of source sequences that can be decoded without
error the dominant set of source sequences. For any two regular cost functions,
it is revealed that the dominant set of source sequences for a code attaining
the minimum achievable rate with a cost function is also the dominant set for a
code attaining the minimum achievable rate with the other cost function. We
also give a general formula of the second-order minimum achievable rate.

Learning with Finite Memory for Machine Type Communication

Taehyeun Park, Walid Saad
Comments: 6 pages, 3 figures
Subjects: Information Theory (cs.IT)

Machine-type devices (MTDs) will lie at the heart of the Internet of Things
(IoT) system. A key challenge in such a system is sharing network resources
between small MTDs, which have limited memory and computational capabilities.
In this paper, a novel learning emph{with finite memory} framework is proposed
to enable MTDs to effectively learn about each others message state, so as to
properly adapt their transmission parameters. In particular, an IoT system in
which MTDs can transmit both delay tolerant, periodic messages and critical
alarm messages is studied. For this model, the characterization of the
exponentially growing delay for critical alarm messages and the convergence of
the proposed learning framework in an IoT are analyzed. Simulation results show
that the delay of critical alarm messages is significantly reduced up to $94\%$
with very minimal memory requirements. The results also show that the proposed
learning with finite memory framework is very effective in mitigating the
limiting factors of learning that prevent proper learning procedures.

Low-tubal-rank Tensor Completion using Alternating Minimization

Xiao-Yang Liu, Shuchin Aeron, Vaneet Aggarwal, Xiaodong Wang
Subjects: Learning (cs.LG); Information Theory (cs.IT)

The low-tubal-rank tensor model has been recently proposed for real-world
multidimensional data. In this paper, we study the low-tubal-rank tensor
completion problem, i.e., to recover a third-order tensor by observing a subset
of its elements selected uniformly at random. We propose a fast iterative
algorithm, called {em Tubal-Alt-Min}, that is inspired by a similar approach
for low-rank matrix completion. The unknown low-tubal-rank tensor is
represented as the product of two much smaller tensors with the low-tubal-rank
property being automatically incorporated, and Tubal-Alt-Min alternates between
estimating those two tensors using tensor least squares minimization. First, we
note that tensor least squares minimization is different from its matrix
counterpart and nontrivial as the circular convolution operator of the
low-tubal-rank tensor model is intertwined with the sub-sampling operator.
Second, the theoretical performance guarantee is challenging since
Tubal-Alt-Min is iterative and nonconvex in nature. We prove that 1)
Tubal-Alt-Min guarantees exponential convergence to the global optima, and 2)
for an $n imes n imes k$ tensor with tubal-rank $r ll n$, the required
sampling complexity is $O(nr^2k log^3 n)$ and the computational complexity is
$O(n^2rk^2 log^2 n)$. Third, on both synthetic data and real-world video data,
evaluation results show that compared with tensor-nuclear norm minimization
(TNN-ADMM), Tubal-Alt-Min improves the recovery error dramatically (by orders
of magnitude). It is estimated that Tubal-Alt-Min converges at an exponential
rate $10^{-0.4423 ext{Iter}}$ where $ ext{Iter}$ denotes the number of
iterations, which is much faster than TNN-ADMM’s $10^{-0.0332 ext{Iter}}$,
and the running time can be accelerated by more than $5$ times for a $200
imes 200 imes 20$ tensor.