
    arXiv Paper Daily: Fri, 5 May 2017

    我爱机器学习 (52ml.net), published 2017-05-05 00:00:00

    Neural and Evolutionary Computing

    Evolutionary learning of fire fighting strategies

    Martin Kretschmer, Elmar Langetepe
    Subjects: Neural and Evolutionary Computing (cs.NE)

    The dynamic problem of enclosing an expanding fire can be modelled by a
    discrete variant in a grid graph. While the fire expands to all neighbouring
    cells in any time step, the fire fighter is allowed to block (c) cells on
    average outside the fire in the same time interval. It was shown that the
    success of the fire fighter is guaranteed for (c>1.5), but no strategy can
    enclose the fire for (c \leq 1.5). For achieving such a critical threshold, the
    correctness (sometimes even optimality) of strategies and lower bounds have
    been shown by integer programming or by direct but often very sophisticated
    arguments. We investigate whether it is possible to find or to approach such
    a threshold and/or optimal strategies by means of evolutionary algorithms,
    i.e., we simply try to learn successful strategies for different constants
    (c) and examine the outcome. The general idea is that this approach might
    give some insight into the power of evolutionary strategies for similar
    geometrically motivated threshold questions. We investigate the variant of
    protecting a highway, whose threshold is still unknown, and find interesting
    strategic paradigms.

    Keywords: Dynamic environments, fire fighting, evolutionary strategies,
    threshold approximation
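
    As a rough illustration of the search loop the abstract alludes to, the
    following Python sketch runs a (1+1) evolution strategy over a real-valued
    strategy encoding. The fitness function is a hypothetical placeholder
    standing in for a fire-spread simulation on a grid; the encoding, mutation
    scale and budget are assumptions rather than the authors' setup.

    # Minimal (1+1) evolution strategy loop; the fitness function below is a
    # hypothetical stand-in for a fire-spread simulation that would score how
    # well a strategy (blocking c cells per step) contains the fire.
    import random

    def evaluate(strategy):
        # Placeholder fitness: in the paper's setting this would simulate the
        # fire on a grid and return, e.g., the number of burnt cells
        # (lower is better).
        return sum((gene - 0.5) ** 2 for gene in strategy)

    def mutate(strategy, sigma=0.1):
        return [gene + random.gauss(0.0, sigma) for gene in strategy]

    def one_plus_one_es(dim=20, generations=1000, seed=0):
        random.seed(seed)
        parent = [random.random() for _ in range(dim)]
        parent_fitness = evaluate(parent)
        for _ in range(generations):
            child = mutate(parent)
            child_fitness = evaluate(child)
            if child_fitness <= parent_fitness:  # keep the better strategy
                parent, parent_fitness = child, child_fitness
        return parent, parent_fitness

    if __name__ == "__main__":
        best, fitness = one_plus_one_es()
        print("best fitness found:", fitness)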

    Pixel Normalization from Numeric Data as Input to Neural Networks

    Parth Sane, Ravindra Agrawal
    Comments: IEEE WiSPNET 2017 conference in Chennai
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Text-to-image transformation for input to neural networks requires
    intermediate steps. This paper presents a new approach to pixel
    normalization that converts textual data into images suitable as input for
    neural networks. The method can be further improved by a Graphics Processing
    Unit (GPU) implementation to provide a significant speedup in computation time.
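
    The abstract does not spell out the normalization itself, so the following
    Python sketch shows one plausible interpretation: per-row min-max scaling of
    numeric feature vectors to 8-bit pixel intensities, zero-padded and reshaped
    into square grayscale images. The image side length and padding scheme are
    illustrative assumptions.

    # One plausible pixel normalization of numeric rows into 8-bit grayscale
    # "images"; the exact mapping used in the paper is not described in the
    # abstract, so this is only an illustrative assumption.
    import numpy as np

    def rows_to_images(X, side=8):
        X = np.asarray(X, dtype=np.float64)
        lo = X.min(axis=1, keepdims=True)
        hi = X.max(axis=1, keepdims=True)
        scaled = (X - lo) / np.maximum(hi - lo, 1e-12)    # per-row min-max to [0, 1]
        pixels = np.round(scaled * 255).astype(np.uint8)  # quantize to [0, 255]
        n, d = pixels.shape
        pad = (-d) % (side * side)                        # zero-pad up to a square image
        pixels = np.pad(pixels, ((0, 0), (0, pad)))
        return pixels.reshape(n, side, side)

    if __name__ == "__main__":
        images = rows_to_images(np.random.rand(4, 60), side=8)
        print(images.shape, images.dtype)  # (4, 8, 8) uint8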


    Computer Vision and Pattern Recognition

    Recurrent Soft Attention Model for Common Object Recognition

    Liliang Ren, Tong Xiao, Xiaogang Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose the Recurrent Soft Attention Model, which integrates visual
    attention from the original image into an LSTM memory cell through a
    down-sample network. The model recurrently transmits visual attention to the
    memory cells for glimpse mask generation, which is a more natural way to
    integrate and exploit attention in general object detection and recognition
    problems. We test our model under the metric of top-1 accuracy on the
    CIFAR-10 dataset. The experiment shows that our down-sample network and
    feedback mechanism play an effective role within the whole network structure.

    Auto-painter: Cartoon Image Generation from Sketch by Using Conditional Generative Adversarial Networks

    Yifan Liu, Zengchang Qin, Zhenbo Luo, Hua Wang
    Comments: 12 pages, 7 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recently, realistic image generation using deep neural networks has become a
    hot topic in machine learning and computer vision. Images can be generated at
    the pixel level by learning from a large collection of images. Learning to
    generate colorful cartoon images from black-and-white sketches is not only an
    interesting research problem, but also a potential application in digital
    entertainment. In this paper, we investigate the sketch-to-image synthesis
    problem by using conditional generative adversarial networks (cGAN). We propose
    the auto-painter model, which can automatically generate compatible colors for
    a sketch. The new model is not only capable of painting hand-drawn sketches
    with proper colors, but also allows users to indicate preferred colors.
    Experimental results on two sketch datasets show that the auto-painter performs
    better than existing image-to-image methods.

    Edge-based Component-Trees for Multi-Channel Image Segmentation

    Tobias Böttger, Dominik Gutermuth
    Comments: 11 pages, 8 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We introduce the concept of edge-based component-trees for images with an
    arbitrary number of channels. The approach is a natural extension of the
    classical component-tree devoted to gray-scale images. The similar structure
    enables the translation of many gray-level image processing techniques based on
    the component-tree to hyperspectral and color images. As an example
    application, we present an image segmentation approach that extracts Maximally
    Stable Homogeneous Regions (MSHR). The approach is very similar to MSER but
    can be applied to images with an arbitrary number of channels. As opposed to
    MSER, our approach implicitly segments regions that are both lighter and darker
    than their background for gray-scale images and can be used in OCR applications
    where MSER will fail. We introduce a local flooding-based immersion for the
    edge-based component-tree construction which is linear in the number of pixels.
    In the experiments, we show that the runtime scales favorably with an
    increasing number of channels and may improve algorithms which build on MSER.

    Action Tubelet Detector for Spatio-Temporal Action Localization

    Vicky Kalogeiton, Philippe Weinzaepfel, Vittorio Ferrari, Cordelia Schmid
    Comments: 9 pages, 8 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Current state-of-the-art approaches for spatio-temporal action detection rely
    on detections at the frame level that are then linked or tracked across time.
    In this paper, we leverage the temporal continuity of videos instead of
    operating at the frame level. We propose the ACtion Tubelet detector
    (ACT-detector) that takes as input a sequence of frames and outputs tubelets,
    i.e., sequences of bounding boxes with associated scores. In the same way that
    state-of-the-art object detectors rely on anchor boxes, our ACT-detector is
    based on anchor cuboids. We build upon the state-of-the-art SSD framework
    (Single Shot MultiBox Detector). Convolutional features are extracted for each
    frame, while scores and regressions are based on the temporal stacking of these
    features, thus exploiting information from a sequence. Our experimental results
    show that leveraging sequences of frames significantly improves detection
    performance over using individual frames. The gain of our tubelet detector can
    be explained by both more relevant scores and more precise localization. Our
    ACT-detector outperforms state-of-the-art methods for frame-mAP and
    video-mAP on the J-HMDB and UCF-101 datasets, in particular at high overlap
    thresholds.

    A Deep Learning Perspective on the Origin of Facial Expressions

    Ran Breuer, Ron Kimmel
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Facial expressions play a significant role in human communication and
    behavior. Psychologists have long studied the relationship between facial
    expressions and emotions. Paul Ekman et al. devised the Facial Action Coding
    System (FACS) to taxonomize human facial expressions and model their behavior.
    The ability to recognize facial expressions automatically enables novel
    applications in fields like human-computer interaction, social gaming, and
    psychological research. There has been tremendously active research in this
    field, with several recent papers utilizing convolutional neural networks (CNN)
    for feature extraction and inference. In this paper, we employ CNN
    understanding methods to study the relation between the features these
    computational networks are using, the FACS and Action Units (AU). We verify our
    findings on the Extended Cohn-Kanade (CK+), NovaEmotions and FER2013 datasets.
    We apply these models to various tasks and tests using transfer learning,
    including cross-dataset validation and cross-task performance. Finally, we
    exploit the nature of the FER-based CNN models for the detection of
    micro-expressions and achieve state-of-the-art accuracy using a simple
    long short-term memory (LSTM) recurrent neural network (RNN).

    Pixel Normalization from Numeric Data as Input to Neural Networks

    Parth Sane, Ravindra Agrawal
    Comments: IEEE WiSPNET 2017 conference in Chennai
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Text-to-image transformation for input to neural networks requires
    intermediate steps. This paper presents a new approach to pixel
    normalization that converts textual data into images suitable as input for
    neural networks. The method can be further improved by a Graphics Processing
    Unit (GPU) implementation to provide a significant speedup in computation time.

    From Zero-shot Learning to Conventional Supervised Classification: Unseen Visual Data Synthesis

    Yang Long, Li Liu, Ling Shao, Fumin Shen, Guiguang Ding, Jungong Han
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Robust object recognition systems usually rely on powerful feature extraction
    mechanisms from a large number of real images. However, in many realistic
    applications, collecting sufficient images for ever-growing new classes is
    unattainable. In this paper, we propose a new Zero-shot learning (ZSL)
    framework that can synthesise visual features for unseen classes without
    acquiring real images. Using the proposed Unseen Visual Data Synthesis (UVDS)
    algorithm, semantic attributes are effectively utilised as an intermediate clue
    to synthesise unseen visual features at the training stage. Hereafter, ZSL
    recognition is converted into the conventional supervised problem, i.e. the
    synthesised visual features can be straightforwardly fed to typical classifiers
    such as SVM. On four benchmark datasets, we demonstrate the benefit of using
    synthesised unseen data. Extensive experimental results suggest that our
    proposed approach significantly improves the state-of-the-art results.

    Am I Done? Predicting Action Progress in Videos

    Federico Becattini, Tiberio Uricchio, Lamberto Ballan, Lorenzo Seidenari, Alberto Del Bimbo
    Comments: Submitted to BMVC 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper we introduce the problem of predicting action progress in
    untrimmed videos. We argue that this is an extremely important task because, on
    the one hand, it can be valuable for a wide range of applications and, on the
    other hand, it facilitates better action detection results. To solve this
    problem we introduce a novel approach, named ProgressNet, capable of predicting
    when an action takes place in a video, where it is located within the frames,
    and how far it has progressed during its execution. Motivated by the recent
    success obtained from the interaction of Convolutional and Recurrent Neural
    Networks, our model is based on a combination of the well known Faster R-CNN
    framework, to make framewise predictions, and LSTM networks, to estimate action
    progress through time. After introducing two evaluation protocols for the task
    at hand, we demonstrate the capability of our model to effectively predict
    action progress on a subset of 11 classes from UCF-101, all of which exhibit
    strong temporal structure. Moreover, we show that this leads to
    state-of-the-art spatio-temporal localization results.

    Deep 360 Pilot: Learning a Deep Agent for Piloting through 360° Sports Video

    Hou-Ning Hu, Yen-Chen Lin, Ming-Yu Liu, Hsien-Tzu Cheng, Yung-Ju Chang, Min Sun
    Comments: 13 pages, 8 figures, To appear in CVPR 2017 as an Oral paper. The first two authors contributed equally to this work. this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Multimedia (cs.MM)

    Watching a 360° sports video requires a viewer to continuously select a
    viewing angle, either through a sequence of mouse clicks or head movements. To
    relieve the viewer from this “360 piloting” task, we propose “deep 360 pilot”
    — a deep learning-based agent for piloting through 360° sports videos
    automatically. At each frame, the agent observes a panoramic image and has the
    knowledge of previously selected viewing angles. The task of the agent is to
    shift the current viewing angle (i.e. action) to the next preferred one (i.e.,
    goal). We propose to directly learn an online policy of the agent from data. We
    use the policy gradient technique to jointly train our pipeline: by minimizing
    (1) a regression loss measuring the distance between the selected and ground
    truth viewing angles, (2) a smoothness loss encouraging smooth transition in
    viewing angle, and (3) maximizing an expected reward of focusing on a
    foreground object. To evaluate our method, we build a new 360-Sports video
    dataset consisting of five sports domains. We train domain-specific agents and
    achieve the best performance on viewing angle selection accuracy and transition
    smoothness compared to [51] and other baselines.

    Attributes2Classname: A discriminative model for attribute-based unsupervised zero-shot learning

    Berkan Demirel, Ramazan Gokberk Cinbis, Nazli Ikizler Cinbis
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a novel approach for unsupervised zero-shot learning (ZSL) of
    classes based on their names. Most existing unsupervised ZSL methods aim to
    learn a model for directly comparing image features and class names. However,
    this proves to be a difficult task due to the dominance of non-visual semantics
    in the underlying vector-space embeddings of class names. To address this
    issue, we discriminatively learn a word representation such that the
    similarities between class names and combinations of attribute names fall in
    line with the visual similarity. Contrary to traditional zero-shot learning approaches that are
    built upon attribute presence, our approach avoids the laborious
    attribute-class relation annotations for unseen classes. In addition, our
    proposed approach renders text-only training possible, hence, the training can
    be augmented without the need to collect additional image data. The
    experimental results show that our method yields state-of-the-art results for
    unsupervised ZSL on three benchmark datasets.

    Generative Convolutional Networks for Latent Fingerprint Reconstruction

    Jan Svoboda, Federico Monti, Michael M. Bronstein
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Performance of fingerprint recognition depends heavily on the extraction of
    minutiae points. Enhancement of the fingerprint ridge pattern is thus an
    essential pre-processing step that noticeably reduces false positive and
    negative detection rates. A particularly challenging setting is when the
    fingerprint images are corrupted or partially missing. In this work, we apply
    generative convolutional networks to denoise visible minutiae and predict the
    missing parts of the ridge pattern. The proposed enhancement approach is tested
    as a pre-processing step in combination with several standard feature
    extraction methods such as MINDTCT, followed by biometric comparison using MCC
    and BOZORTH3. We evaluate our method on several publicly available latent
    fingerprint datasets captured using different sensors.

    VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera

    Dushyant Mehta, Srinath Sridhar, Oleksandr Sotnychenko, Helge Rhodin, Mohammad Shafiei, Hans-Peter Seidel, Weipeng Xu, Dan Casas, Christian Theobalt
    Comments: Accepted to SIGGRAPH 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

    We present the first real-time method to capture the full global 3D skeletal
    pose of a human in a stable, temporally consistent manner using a single RGB
    camera. Our method combines a new convolutional neural network (CNN) based pose
    regressor with kinematic skeleton fitting. Our novel fully-convolutional pose
    formulation regresses 2D and 3D joint positions jointly in real time and does
    not require tightly cropped input frames. A real-time kinematic skeleton
    fitting method uses the CNN output to yield temporally stable 3D global pose
    reconstructions on the basis of a coherent kinematic skeleton. This makes our
    approach the first monocular RGB method usable in real-time applications such
    as 3D character control—thus far, the only monocular methods for such
    applications employed specialized RGB-D cameras. Our method’s accuracy is
    quantitatively on par with the best offline 3D monocular RGB pose estimation
    methods. Our results are qualitatively comparable to, and sometimes better
    than, results from monocular RGB-D approaches, such as the Kinect. However, we
    show that our approach is more broadly applicable than RGB-D solutions, i.e. it
    works for outdoor scenes, community videos, and low quality commodity RGB
    cameras.

    Toward Open Set Face Recognition

    Manuel Günther, Steve Cruz, Ethan M. Rudd, Terrance E. Boult
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Much research has been conducted on both face identification and face
    verification problems, with greater focus on the latter. Research on face
    identification has mostly focused on using closed-set protocols, which assume
    that all probe images used in evaluation contain identities of subjects that
    are enrolled in the gallery. Real systems, however, where only a fraction of
    probe sample identities are enrolled in the gallery, cannot make this
    closed-set assumption. Instead, they must assume an open set of probe samples
    and be able to reject/ignore those that correspond to unknown identities. In
    this paper, we address the widespread misconception that thresholding
    verification-like scores is sufficient to solve the open-set face
    identification problem, by formulating an open-set face identification protocol
    and evaluating different strategies for assessing similarity. Our open-set
    identification protocol is based on the canonical labeled faces in the wild
    (LFW) dataset. We compare three algorithms for assessing similarity in a deep
    feature space under an open-set protocol: thresholded verification-like scores,
    linear discriminant analysis (LDA) scores, and extreme value machine (EVM)
    output probabilities. Our findings suggest that thresholding similarity
    measures that are open-set by design outperforms verification-like score level
    thresholding.

    Fast k-means based on KNN Graph

    Cheng-Hao Deng, Wan-Lei Zhao
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    In the era of big data, k-means clustering has been widely adopted as a basic
    processing tool in various contexts. However, its computational cost can be
    prohibitively high when the data size and the cluster number are large. It is
    well known that the processing bottleneck of k-means lies in the operation of
    seeking the closest centroid in each iteration. In this paper, a novel solution
    to the scalability issue of k-means is presented. In the proposal, k-means is
    supported by an approximate k-nearest neighbors graph. In each k-means
    iteration, a data sample is only compared to the clusters in which its nearest
    neighbors reside. Since the number of nearest neighbors considered is much
    smaller than k, the processing cost of this step becomes minor and independent
    of k. The processing bottleneck is therefore overcome. Most interestingly, the
    k-nearest neighbor graph is constructed by iteratively calling the fast
    (k)-means itself. Compared with existing fast k-means variants, the proposed
    algorithm achieves speed-ups of hundreds to thousands of times while
    maintaining high clustering quality. When tested on 10 million 512-dimensional
    data points, it takes only 5.2 hours to produce 1 million clusters; to perform
    clustering at the same scale, traditional k-means would take 3 years.
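
    A minimal Python sketch of the assignment idea described above: each point
    is compared only to the centroids of the clusters in which its nearest
    neighbours currently reside. For clarity the graph here is an exact
    brute-force KNN graph; the paper instead builds an approximate graph by
    calling the fast k-means recursively, which is not reproduced.

    # Sketch of a k-means assignment step restricted to a KNN graph: each point
    # is compared only to the centroids of the clusters in which its nearest
    # neighbours currently reside, instead of all k centroids.
    import numpy as np

    def knn_graph(X, n_neighbors=10):
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        np.fill_diagonal(d2, np.inf)
        return np.argsort(d2, axis=1)[:, :n_neighbors]

    def graph_kmeans(X, k=20, n_neighbors=10, iters=10, seed=0):
        rng = np.random.default_rng(seed)
        n = len(X)
        labels = rng.integers(0, k, size=n)
        neighbors = knn_graph(X, n_neighbors)
        for _ in range(iters):
            centroids = np.stack([X[labels == c].mean(0) if np.any(labels == c)
                                  else X[rng.integers(n)] for c in range(k)])
            for i in range(n):
                # candidate clusters = clusters of my neighbours, plus my own
                candidates = np.unique(np.append(labels[neighbors[i]], labels[i]))
                d = ((X[i] - centroids[candidates]) ** 2).sum(-1)
                labels[i] = candidates[np.argmin(d)]
        return labels, centroids

    if __name__ == "__main__":
        X = np.random.default_rng(1).normal(size=(500, 16))
        labels, _ = graph_kmeans(X)
        print("clusters used:", len(np.unique(labels)))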


    Artificial Intelligence

    Semi-supervised model-based clustering with controlled clusters leakage

    Marek Śmieja, Łukasz Struski, Jacek Tabor
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    In this paper, we focus on finding clusters in partially categorized data
    sets. We propose a semi-supervised version of the Gaussian mixture model,
    called C3L, which retrieves natural subgroups of given categories. In contrast
    to other semi-supervised models, C3L is parametrized by a user-defined leakage
    level, which controls the maximal inconsistency between the initial
    categorization and the resulting clustering. Our method can be implemented as a
    module in practical expert systems to detect clusters that combine expert
    knowledge with the true distribution of the data. Moreover, it can be used to
    improve the results of less flexible clustering techniques, such as projection
    pursuit clustering. The paper presents an extensive theoretical analysis of the
    model and a fast algorithm for its efficient optimization. Experimental results
    show that C3L finds a high-quality clustering model, which can be applied to
    discover meaningful groups in partially classified data.

    A Reasoning System for a First-Order Logic of Limited Belief

    Christoph Schwering
    Comments: 22 pages, 0 figures, Twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI-17)
    Subjects: Artificial Intelligence (cs.AI)

    Logics of limited belief aim at enabling computationally feasible reasoning
    in highly expressive representation languages. These languages are often
    dialects of first-order logic with a weaker form of logical entailment that
    keeps reasoning decidable or even tractable. While a number of such logics have
    been proposed in the past, they have tended to remain objects of theoretical
    analysis only, and their practical relevance is very limited. In this paper, we
    aim to go
    beyond the theory. Building on earlier work by Liu, Lakemeyer, and Levesque, we
    develop a logic of limited belief that is highly expressive while remaining
    decidable in the first-order and tractable in the propositional case and
    exhibits some characteristics that make it attractive for an implementation. We
    introduce a reasoning system that employs this logic as representation language
    and present experimental results that showcase the benefit of limited belief.

    Tramp Ship Scheduling Problem with Berth Allocation Considerations and Time-dependent Constraints

    Francisco López-Ramos, Armando Guarnaschelli, José-Fernando Camacho-Vallejo, Laura Hervert-Escobar, Rosa G. González-Ramírez
    Comments: 16 pages, 3 figures, 5 tables, proceedings paper of Mexican International Conference on Artificial Intelligence (MICAI) 2016
    Subjects: Artificial Intelligence (cs.AI)

    This work presents a model for the Tramp Ship Scheduling problem including
    berth allocation considerations, motivated by a real case of a shipping
    company. The aim is to determine the travel schedule for each vessel
    considering multiple docking and multiple time windows at the berths. This work
    is innovative due to the consideration of both spatial and temporal attributes
    during the scheduling process. The resulting model is formulated as a
    mixed-integer linear programming problem, and a heuristic method to deal with
    multiple vessel schedules is also presented. Numerical experimentation is
    performed to highlight the benefits of the proposed approach and the
    applicability of the heuristic. Conclusions and recommendations for further
    research are provided.

    Semi-supervised cross-entropy clustering with information bottleneck constraint

    Marek Śmieja, Bernhard C. Geiger
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    In this paper, we propose a semi-supervised clustering method, CEC-IB, that
    models data with a set of Gaussian distributions and that retrieves clusters
    based on a partial labeling provided by the user (partition-level side
    information). By combining the ideas from cross-entropy clustering (CEC) with
    those from the information bottleneck method (IB), our method trades between
    three conflicting goals: the accuracy with which the data set is modeled, the
    simplicity of the model, and the consistency of the clustering with side
    information. Experiments demonstrate that CEC-IB has a performance comparable
    to Gaussian mixture models (GMM) in a classical semi-supervised scenario, but
    is faster, more robust to noisy labels, automatically determines the optimal
    number of clusters, and performs well when not all classes are present in the
    side information. Moreover, in contrast to other semi-supervised models, it can
    be successfully applied in discovering natural subgroups if the partition-level
    side information is derived from the top levels of a hierarchical clustering.

    Fast k-means based on KNN Graph

    Cheng-Hao Deng, Wan-Lei Zhao
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    In the era of big data, k-means clustering has been widely adopted as a basic
    processing tool in various contexts. However, its computational cost can be
    prohibitively high when the data size and the cluster number are large. It is
    well known that the processing bottleneck of k-means lies in the operation of
    seeking the closest centroid in each iteration. In this paper, a novel solution
    to the scalability issue of k-means is presented. In the proposal, k-means is
    supported by an approximate k-nearest neighbors graph. In each k-means
    iteration, a data sample is only compared to the clusters in which its nearest
    neighbors reside. Since the number of nearest neighbors considered is much
    smaller than k, the processing cost of this step becomes minor and independent
    of k. The processing bottleneck is therefore overcome. Most interestingly, the
    k-nearest neighbor graph is constructed by iteratively calling the fast
    (k)-means itself. Compared with existing fast k-means variants, the proposed
    algorithm achieves speed-ups of hundreds to thousands of times while
    maintaining high clustering quality. When tested on 10 million 512-dimensional
    data points, it takes only 5.2 hours to produce 1 million clusters; to perform
    clustering at the same scale, traditional k-means would take 3 years.

    Of the People: Voting Is More Effective with Representative Candidates

    Yu Cheng, Shaddin Dughmi, David Kempe
    Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI)

    In light of the classic impossibility results of Arrow and of Gibbard and
    Satterthwaite regarding voting with ordinal rules, there has been recent
    interest in characterizing how well common voting rules approximate the social
    optimum. In order to quantify the quality of approximation, it is natural to
    consider the candidates and voters as embedded within a common metric space,
    and to ask how much further the chosen candidate is from the population as
    compared to the socially optimal one. We use this metric preference model to
    explore a fundamental and timely question: does the social welfare of a
    population improve when candidates are representative of the population? If so,
    then by how much, and how does the answer depend on the complexity of the
    metric space?

    We restrict attention to the most fundamental and common social choice
    setting: a population of voters, two independently drawn candidates, and a
    majority rule election. When candidates are not representative of the
    population, it is known that the candidate selected by the majority rule can be
    thrice as far from the population as the socially optimal one. We examine how
    this ratio improves when candidates are drawn independently from the population
    of voters. Our results are two-fold: When the metric is a line, the ratio
    improves from 3 to (4-2\sqrt{2}), roughly 1.1716; this bound is tight. When the
    metric is arbitrary, we show a lower bound of 1.5 and a constant upper bound
    strictly better than 2 on the approximation ratio of the majority rule.

    The positive result depends in part on the assumption that candidates are
    independent and identically distributed. However, we show that independence
    alone is not enough to achieve the upper bound: even when candidates are drawn
    independently, if the population of candidates can be different from the
    voters, then an upper bound of 2 on the approximation is tight.
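
    The following Python sketch is a Monte Carlo illustration of the setup on a
    line metric: voters and two candidates are drawn i.i.d. from the same
    distribution, the majority winner is the candidate closer to more voters,
    and distortion is the winner's total distance to the voters divided by the
    better candidate's. The uniform voter distribution and sample sizes are
    arbitrary choices, and the (4-2\sqrt{2}) bound is a worst case over
    distributions, so this simulation only illustrates the model, it does not
    verify the bound.

    # Monte Carlo illustration of the metric-distortion setup on a line:
    # voters and two candidates are drawn i.i.d. from the same distribution,
    # the majority winner is the candidate closer to more voters, and the
    # distortion is its total distance to voters divided by the better
    # candidate's. The uniform distribution used here is only an example; the
    # 4 - 2*sqrt(2) bound is a worst case over voter distributions.
    import numpy as np

    def election_distortion(n_voters=1001, rng=None):
        rng = rng or np.random.default_rng()
        voters = rng.uniform(0.0, 1.0, size=n_voters)
        a, b = rng.uniform(0.0, 1.0, size=2)  # candidates from the voter distribution
        votes_a = np.sum(np.abs(voters - a) < np.abs(voters - b))
        winner = a if votes_a > n_voters / 2 else b
        social_cost = lambda c: np.abs(voters - c).sum()
        return social_cost(winner) / min(social_cost(a), social_cost(b))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        ratios = [election_distortion(rng=rng) for _ in range(2000)]
        print("mean distortion:", np.mean(ratios))
        print("max distortion :", np.max(ratios))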

    Gait Pattern Recognition Using Accelerometers

    Vahid Alizadeh
    Comments: 6 pages, project report
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Motion ability is one of the most important human properties, and gait is a
    basis of human transitional movement. Gait, as a biometric for recognizing
    human identities, can be captured non-intrusively using wearable or portable
    smart devices. In this study, gait patterns are collected using a wireless
    platform of two sensors located at the chest and right ankle of the subjects.
    The raw data then undergo preprocessing and are segmented into 5-second
    windows. Time- and frequency-domain features are extracted, and the performance
    is evaluated with 5 different classifiers. The Decision Tree (with all
    features) and K-Nearest Neighbors (with 10 selected features) classifiers
    reached accuracies of 99.4% and 100%, respectively.
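
    A small Python sketch of the kind of pipeline described: 5-second
    windowing, simple time- and frequency-domain features, and a
    nearest-neighbour classifier. The sampling rate, feature set and the
    synthetic two-subject accelerometer signals are assumptions; the study's
    actual sensors, features and classifiers are only summarized in the
    abstract.

    # Sketch of the described pipeline on synthetic accelerometer signals:
    # split each recording into 5-second windows, extract simple time- and
    # frequency-domain features per window, and classify with a
    # 1-nearest-neighbour rule. Sampling rate, features and data are assumed.
    import numpy as np

    FS = 50  # assumed sampling rate (Hz)

    def windows(signal, seconds=5, fs=FS):
        step = seconds * fs
        return [signal[i:i + step] for i in range(0, len(signal) - step + 1, step)]

    def features(window):
        spectrum = np.abs(np.fft.rfft(window))
        return np.array([window.mean(), window.std(),
                         window.max() - window.min(),
                         spectrum[1:].argmax() + 1,  # dominant non-DC frequency bin
                         spectrum.sum()])

    def one_nn(train_X, train_y, test_X):
        d2 = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(-1)
        return train_y[d2.argmin(axis=1)]

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        t = np.arange(60 * FS) / FS
        # two synthetic "subjects" with different stride frequencies plus noise
        subjects = [np.sin(2 * np.pi * f * t) + 0.3 * rng.normal(size=t.size)
                    for f in (1.6, 2.1)]
        X = np.array([features(w) for s in subjects for w in windows(s)])
        y = np.array([i for i, s in enumerate(subjects) for _ in windows(s)])
        order = rng.permutation(len(X))
        train, test = order[: len(X) // 2], order[len(X) // 2:]
        accuracy = (one_nn(X[train], y[train], X[test]) == y[test]).mean()
        print("1-NN accuracy on synthetic gait windows:", accuracy)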


    Computation and Language

    A Finite State and Rule-based Akshara to Prosodeme (A2P) Converter in Hindi

    Somnath Roy
    Comments: If you need software (A2P Converter), you have to write for the same at “somnathroy86@gmail.com” or “somnat75_llh@jnu.ac.in”
    Subjects: Computation and Language (cs.CL)

    This article describes a software module called Akshara to Prosodeme (A2P)
    converter in Hindi. It converts an input grapheme into a prosodeme (a sequence
    of phonemes with the specification of syllable boundaries and prosodic labels).
    The software is based on two proposed finite state machines: one for
    syllabification and another for syllable labeling. In addition to that,
    it also uses a set of nonlinear phonological rules proposed for foot formation
    in Hindi, which encompass solutions to schwa-deletion in simple, compound,
    derived and inflected words. The nonlinear phonological rules are based on
    metrical phonology with the provision of recursive foot structure. A software
    module is implemented in Python. Testing of the software for syllabification,
    syllable labeling, schwa deletion and prosodic labeling yields an accuracy of
    more than 99% on a lexicon of 28664 words.

    Probabilistic Typology: Deep Generative Models of Vowel Inventories

    Ryan Cotterell, Jason Eisner
    Comments: ACL 2017
    Subjects: Computation and Language (cs.CL)

    Linguistic typology studies the range of structures present in human
    language. The main goal of the field is to discover which sets of possible
    phenomena are universal, and which are merely frequent. For example, all
    languages have vowels, while most—but not all—languages have an /u/ sound.
    In this paper we present the first probabilistic treatment of a basic question
    in phonological typology: What makes a natural vowel inventory? We introduce a
    series of deep stochastic point processes, and contrast them with previous
    computational, simulation-based approaches. We provide a comprehensive suite of
    experiments on over 200 distinct languages.


    Distributed, Parallel, and Cluster Computing

    Execution Templates: Caching Control Plane Decisions for Strong Scaling of Data Analytics

    Omid Mashayekhi, Hang Qu, Chinmayee Shah, Philip Levis
    Comments: To appear at USENIX ATC 2017
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Control planes of cloud frameworks trade off between scheduling granularity
    and performance. Centralized systems schedule at task granularity, but only
    schedule a few thousand tasks per second. Distributed systems schedule hundreds
    of thousands of tasks per second but changing the schedule is costly.

    We present execution templates, a control plane abstraction that can schedule
    hundreds of thousands of tasks per second while supporting fine-grained,
    per-task scheduling decisions. Execution templates leverage a program’s
    repetitive control flow to cache blocks of frequently-executed tasks. Executing
    a task in a template requires sending a single message. Large-scale scheduling
    changes install new templates, while small changes apply edits to existing
    templates.

    Evaluations of execution templates in Nimbus, a data analytics framework,
    find that they provide the fine-grained scheduling flexibility of centralized
    control planes while matching the strong scaling of distributed ones. Execution
    templates support complex, real-world applications, such as a fluid simulation
    with a triply nested loop and data dependent branches.


    Learning

    Fast k-means based on KNN Graph

    Cheng-Hao Deng, Wan-Lei Zhao
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    In the era of big data, k-means clustering has been widely adopted as a basic
    processing tool in various contexts. However, its computational cost can be
    prohibitively high when the data size and the cluster number are large. It is
    well known that the processing bottleneck of k-means lies in the operation of
    seeking the closest centroid in each iteration. In this paper, a novel solution
    to the scalability issue of k-means is presented. In the proposal, k-means is
    supported by an approximate k-nearest neighbors graph. In each k-means
    iteration, a data sample is only compared to the clusters in which its nearest
    neighbors reside. Since the number of nearest neighbors considered is much
    smaller than k, the processing cost of this step becomes minor and independent
    of k. The processing bottleneck is therefore overcome. Most interestingly, the
    k-nearest neighbor graph is constructed by iteratively calling the fast
    (k)-means itself. Compared with existing fast k-means variants, the proposed
    algorithm achieves speed-ups of hundreds to thousands of times while
    maintaining high clustering quality. When tested on 10 million 512-dimensional
    data points, it takes only 5.2 hours to produce 1 million clusters; to perform
    clustering at the same scale, traditional k-means would take 3 years.

    Optimal Approximation with Sparsely Connected Deep Neural Networks

    Helmut Bölcskei, Philipp Grohs, Gitta Kutyniok, Philipp Petersen
    Subjects: Learning (cs.LG); Functional Analysis (math.FA)

    We derive fundamental lower bounds on the connectivity and the memory
    requirements of deep neural networks guaranteeing uniform approximation rates
    for arbitrary function classes in (L^2(\mathbb{R}^d)). In other words, we
    establish a connection between the complexity of a function class and the
    complexity of deep neural networks approximating functions from this class to
    within a prescribed accuracy.

    Additionally, we prove that our lower bounds are achievable for a broad
    family of function classes. Specifically, all function classes that are
    optimally approximated by a general class of representation systems—so-called
    affine systems—can be approximated by deep neural networks with
    minimal connectivity and memory requirements. Affine systems encompass a wealth
    of representation systems from applied harmonic analysis such as wavelets,
    ridgelets, curvelets, shearlets, (\alpha)-shearlets, and more generally
    (\alpha)-molecules. This result elucidates a remarkable universality property
    of neural networks and shows that they achieve the optimum approximation
    properties of all affine systems combined. As a specific example, we consider
    the class of (1/\alpha)-cartoon-like functions, which is approximated optimally
    by (\alpha)-shearlets.

    We also explain how our results can be extended to the case of functions on
    low-dimensional immersed manifolds.

    Finally, we present numerical experiments demonstrating that the standard
    stochastic gradient descent algorithm generates deep neural networks providing
    close-to-optimal approximation rates at minimal connectivity. Moreover, these
    results show that stochastic gradient descent actually learns approximations
    that are sparse in the representation systems optimally sparsifying the
    function class the network is trained on.

    Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks

    Minsoo Rhu, Mike O'Connor, Niladrish Chatterjee, Jeff Pool, Stephen W. Keckler
    Subjects: Learning (cs.LG); Hardware Architecture (cs.AR)

    Popular deep learning frameworks require users to fine-tune their memory
    usage so that the training data of a deep neural network (DNN) fits within the
    GPU physical memory. Prior work tries to address this restriction by
    virtualizing the memory usage of DNNs, enabling both CPU and GPU memory to be
    utilized for memory allocations. Despite its merits, virtualizing memory can
    incur significant performance overheads when the time needed to copy data back
    and forth from CPU memory is higher than the latency to perform the
    computations required for DNN forward and backward propagation. We introduce a
    high-performance virtualization strategy based on a “compressing DMA engine”
    (cDMA) that drastically reduces the size of the data structures that are
    targeted for CPU-side allocations. The cDMA engine offers an average 2.6x
    (maximum 13.8x) compression ratio by exploiting the sparsity inherent in
    offloaded data, improving the performance of virtualized DNNs by an average 32%
    (maximum 61%).
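
    The compression idea can be illustrated with a short Python sketch of
    zero-value compression: store a one-bit occupancy mask plus the nonzero
    activation values, which is profitable because post-ReLU activations are
    largely zero. The actual cDMA hardware format and its DMA integration are
    not described here; this only conveys why sparsity yields the quoted
    compression ratios.

    # Sketch of zero-value compression for offloaded activation tensors: store
    # a one-bit occupancy mask plus the nonzero values only. Post-ReLU
    # activations are mostly zero, which is the sparsity the cDMA engine
    # exploits; the real hardware compression format is not reproduced here.
    import numpy as np

    def compress(activations):
        flat = activations.ravel()
        mask = flat != 0.0
        return mask, flat[mask]

    def decompress(mask, values, shape):
        flat = np.zeros(mask.size, dtype=values.dtype)
        flat[mask] = values
        return flat.reshape(shape)

    def compression_ratio(activations, bytes_per_value=4):
        mask, values = compress(activations)
        original = activations.size * bytes_per_value
        compressed = mask.size / 8 + values.size * bytes_per_value  # mask packed to bits
        return original / compressed

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        acts = np.maximum(rng.normal(size=(64, 512)), 0).astype(np.float32)  # ReLU-like
        mask, vals = compress(acts)
        assert np.allclose(decompress(mask, vals, acts.shape), acts)
        print("compression ratio: %.2fx" % compression_ratio(acts))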

    Learning with Confident Examples: Rank Pruning for Robust Classification with Noisy Labels

    Curtis G. Northcutt, Tailin Wu, Isaac L. Chuang
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Noisy PN learning is the problem of binary classification when training
    examples may be mislabeled (flipped) uniformly with noise rate (\rho_1) for
    positive examples and (\rho_0) for negative examples. We propose Rank Pruning (RP)
    to solve noisy PN learning and the open problem of estimating the noise rates.
    Unlike prior solutions, RP is efficient and general, requiring O(T) for any
    unrestricted choice of probabilistic classifier with T fitting time. We prove
    RP achieves consistent noise estimation and equivalent empirical risk as
    learning with uncorrupted labels in ideal conditions, and derive closed-form
    solutions when conditions are non-ideal. RP achieves state-of-the-art noise
    rate estimation and F1, error, and AUC-PR on the MNIST and CIFAR datasets,
    regardless of noise rates. To highlight, RP with a CNN classifier can predict
    if a MNIST digit is a “1” or “not 1” with only 0.25% error, and 0.46% error
    across all digits, even when 50% of positive examples are mislabeled and 50% of
    observed positive labels are mislabeled negative examples.
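
    The following Python sketch conveys the flavour of pruning by rank, not the
    paper's exact algorithm: fit a probabilistic classifier on the noisy
    labels, rank examples by how strongly the predicted probability contradicts
    the given label, drop the most contradicted fraction, and refit. The
    pruning fraction and the use of scikit-learn's LogisticRegression are
    illustrative assumptions; the paper's consistent noise-rate estimators and
    reweighting are not reproduced.

    # Simplified sketch in the spirit of Rank Pruning: fit a probabilistic
    # classifier on the noisy labels, rank examples by how strongly the
    # predicted probability contradicts the given label, prune the most
    # contradicted fraction, and refit on the remainder.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def rank_prune_fit(X, noisy_y, prune_frac=0.2):
        base = LogisticRegression(max_iter=1000).fit(X, noisy_y)
        p1 = base.predict_proba(X)[:, 1]                    # P(y = 1 | x)
        disagreement = np.where(noisy_y == 1, 1 - p1, p1)   # label contradiction score
        keep = disagreement <= np.quantile(disagreement, 1 - prune_frac)
        return LogisticRegression(max_iter=1000).fit(X[keep], noisy_y[keep])

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.normal(size=(2000, 5))
        y = (X[:, 0] + 0.3 * rng.normal(size=2000) > 0).astype(int)
        noisy = y.copy()
        flip = rng.random(2000) < 0.3      # flip 30% of the labels uniformly
        noisy[flip] = 1 - noisy[flip]
        model = rank_prune_fit(X, noisy)
        print("clean accuracy after pruning:", (model.predict(X) == y).mean())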

    Semi-supervised model-based clustering with controlled clusters leakage

    Marek Śmieja, Łukasz Struski, Jacek Tabor
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    In this paper, we focus on finding clusters in partially categorized data
    sets. We propose a semi-supervised version of the Gaussian mixture model,
    called C3L, which retrieves natural subgroups of given categories. In contrast
    to other semi-supervised models, C3L is parametrized by a user-defined leakage
    level, which controls the maximal inconsistency between the initial
    categorization and the resulting clustering. Our method can be implemented as a
    module in practical expert systems to detect clusters that combine expert
    knowledge with the true distribution of the data. Moreover, it can be used to
    improve the results of less flexible clustering techniques, such as projection
    pursuit clustering. The paper presents an extensive theoretical analysis of the
    model and a fast algorithm for its efficient optimization. Experimental results
    show that C3L finds a high-quality clustering model, which can be applied to
    discover meaningful groups in partially classified data.

    Near-optimal linear decision trees for k-SUM and related problems

    Daniel M. Kane, Shachar Lovett, Shay Moran
    Comments: 18 pages, 1 figure
    Subjects: Computational Geometry (cs.CG); Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Learning (cs.LG); Combinatorics (math.CO)

    We construct near optimal linear decision trees for a variety of decision
    problems in combinatorics and discrete geometry. For example, for any constant
    (k), we construct linear decision trees that solve the (k)-SUM problem on (n)
    elements using (O(n \log^2 n)) linear queries. Moreover, the queries we use are
    comparison queries, which compare the sums of two (k)-subsets; when viewed as
    linear queries, comparison queries are (2k)-sparse and have only (\{-1,0,1\})
    coefficients. We give similar constructions for sorting sumsets (A+B) and for
    solving the SUBSET-SUM problem, both with optimal number of queries, up to
    poly-logarithmic terms.

    Our constructions are based on the notion of “inference dimension”, recently
    introduced by the authors in the context of active classification with
    comparison queries. This can be viewed as another contribution to the fruitful
    link between machine learning and discrete geometry, which goes back to the
    discovery of the VC dimension.

    Semi-Supervised AUC Optimization based on Positive-Unlabeled Learning

    Tomoya Sakai, Gang Niu, Masashi Sugiyama
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Maximizing the area under the receiver operating characteristic curve (AUC)
    is a standard approach to imbalanced classification. So far, various supervised
    AUC optimization methods have been developed and they are also extended to
    semi-supervised scenarios to cope with small sample problems. However, existing
    semi-supervised AUC optimization methods rely on strong distributional
    assumptions, which are rarely satisfied in real-world problems. In this paper,
    we propose a novel semi-supervised AUC optimization method that does not
    require such restrictive assumptions. We first develop an AUC optimization
    method based only on positive and unlabeled data (PU-AUC) and then extend it to
    semi-supervised learning by combining it with a supervised AUC optimization
    method. We theoretically prove that, without the restrictive distributional
    assumptions, unlabeled data contribute to improving the generalization
    performance in PU and semi-supervised AUC optimization methods. Finally, we
    demonstrate the practical usefulness of the proposed methods through
    experiments.

    Generative Convolutional Networks for Latent Fingerprint Reconstruction

    Jan Svoboda, Federico Monti, Michael M. Bronstein
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Performance of fingerprint recognition depends heavily on the extraction of
    minutiae points. Enhancement of the fingerprint ridge pattern is thus an
    essential pre-processing step that noticeably reduces false positive and
    negative detection rates. A particularly challenging setting is when the
    fingerprint images are corrupted or partially missing. In this work, we apply
    generative convolutional networks to denoise visible minutiae and predict the
    missing parts of the ridge pattern. The proposed enhancement approach is tested
    as a pre-processing step in combination with several standard feature
    extraction methods such as MINDTCT, followed by biometric comparison using MCC
    and BOZORTH3. We evaluate our method on several publicly available latent
    fingerprint datasets captured using different sensors.

    Semi-supervised cross-entropy clustering with information bottleneck constraint

    Marek Śmieja, Bernhard C. Geiger
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    In this paper, we propose a semi-supervised clustering method, CEC-IB, that
    models data with a set of Gaussian distributions and that retrieves clusters
    based on a partial labeling provided by the user (partition-level side
    information). By combining the ideas from cross-entropy clustering (CEC) with
    those from the information bottleneck method (IB), our method trades between
    three conflicting goals: the accuracy with which the data set is modeled, the
    simplicity of the model, and the consistency of the clustering with side
    information. Experiments demonstrate that CEC-IB has a performance comparable
    to Gaussian mixture models (GMM) in a classical semi-supervised scenario, but
    is faster, more robust to noisy labels, automatically determines the optimal
    number of clusters, and performs well when not all classes are present in the
    side information. Moreover, in contrast to other semi-supervised models, it can
    be successfully applied in discovering natural subgroups if the partition-level
    side information is derived from the top levels of a hierarchical clustering.


    Information Theory

    Blind Detection with Polar Codes

    Carlo Condo, Seyyed Ali Hashemi, Warren J. Gross
    Subjects: Information Theory (cs.IT)

    In blind detection, a set of candidates has to be decoded within a strict
    time constraint, to identify which transmissions are directed at the user
    equipment. Blind detection is an operation required by the 3GPP
    LTE/LTE-Advanced standard, and it will be required in the 5th generation
    wireless communication standard (5G) as well. We propose a blind detection
    scheme based on polar codes, where the radio network temporary identifier
    (RNTI) is transmitted instead of some of the frozen bits. A low-complexity
    decoding stage decodes all candidates, selecting a subset that is decoded by a
    high-performance algorithm. Simulation results show good missed detection and
    false alarm rates that meet the system specifications. We also propose an
    early stopping criterion for the second decoding stage that can reduce the
    number of operations performed, improving both average latency and energy
    consumption. The detection speed is analyzed and different system parameter
    combinations are shown to meet the stringent timing requirements, leading to
    various implementation trade-offs.

    PER Approximation for Cross-Layer Optimization under Reliability and Energy Constraints

    Aamir Mahmood, Mikael Gidlund, M M Aftab Hossain
    Subjects: Information Theory (cs.IT)

    The vision of connecting billions of battery operated devices to be used for
    diverse emerging applications calls for a wireless communication system that
    can support stringent reliability and latency requirements. Both reliability
    and energy efficiency are critical for many of these applications that involve
    communication with short packets which undermine the coding gain achievable
    from large packets. In this paper, we first revisit the packet error rate (PER)
    performance of uncoded schemes in block fading channels and derive a simple and
    accurate PER expression. Specifically, we show that the waterfall threshold in
    the PER upper bound in Nakagami-(m) block fading channels is tightly
    approximated by the (m)-th moment of an asymptotic distribution of PER in AWGN
    channel. This PER expression gives an explicit connection between the
    parameters of both the physical and link layers and the PER. We utilize this
    connection for cross-layer design and optimization of communication links. To
    this end, we optimize the signal-to-noise ratio (SNR) and modulation order at
    the physical layer, and the packet length and number of retransmissions at the
    link layer, with respect to distance under the prescribed delay and reliability
    constraints.

    On the Design of Matched Filters for Molecule Counting Receivers

    Vahid Jamali, Arman Ahmadzadeh, Robert Schober
    Comments: To appear in IEEE Communications Letter
    Subjects: Information Theory (cs.IT)

    In this paper, we design matched filters for diffusive molecular
    communication systems taking into account the following impairments:
    signal-dependent diffusion noise, inter-symbol interference (ISI), and external
    interfering molecules. The receiver counts the number of observed molecules
    several times within one symbol interval and employs linear filtering to detect
    the transmitted data. We derive the optimal matched filter by maximizing the
    expected signal-to-interference-plus-noise ratio of the decision variable.
    Moreover, we show that for the special case of an ISI-free channel, the matched
    filter reduces to a simple sum detector and a correlator for the channel
    impulse response for the diffusion noise-limited and (external)
    interference-limited regimes, respectively. Our simulation results reveal that
    the proposed matched filter considerably outperforms the benchmark schemes
    available in the literature, especially when ISI is severe.
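
    As a generic illustration (not the paper's derivation), the following
    Python sketch computes the SINR-maximizing linear filter, proportional to
    (C^{-1} s), for a known signal template (s) and a noise-plus-interference
    covariance (C) over the within-symbol molecule counts, and compares it with
    a simple sum detector; with white noise it reduces to a correlator with the
    template, mirroring the special cases mentioned above. The template and
    covariance used here are toy assumptions, and the paper's signal-dependent
    diffusion noise model is not reproduced.

    # Generic SINR-maximizing linear filter sketch: with a known signal
    # template s and a noise-plus-interference covariance C for the vector of
    # within-symbol molecule counts, the filter w = C^{-1} s (up to scaling)
    # maximizes the output SINR.
    import numpy as np

    def sinr_max_filter(template, cov):
        w = np.linalg.solve(cov, template)   # w proportional to C^{-1} s
        return w / np.linalg.norm(w)

    def output_sinr(w, template, cov):
        return (w @ template) ** 2 / (w @ cov @ w)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        s = np.exp(-np.arange(10) / 3.0)                  # assumed channel impulse response samples
        C = np.diag(0.5 + s) + 0.05 * np.ones((10, 10))   # toy noise + interference covariance
        w = sinr_max_filter(s, C)
        sum_detector = np.ones(10) / np.sqrt(10)
        print("matched filter SINR :", output_sinr(w, s, C))
        print("sum detector SINR   :", output_sinr(sum_detector, s, C))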

    Wireless Channel Modeling Perspectives for Ultra-Reliable Communications

    Patrick C. F. Eggers, Petar Popovski
    Comments: Submitted to IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    Ultra-Reliable Communication (URC) is one of the distinctive features of the
    upcoming 5G wireless communication. The level of reliability, going down to
    packet error rates (PER) of (10^{-9}), should be sufficiently convincing in
    order to remove cables in an industrial setting or provide remote control of
    robots with mission-critical function. In this paper we present elements of
    physical and statistical modeling of the wireless channel that are relevant for
    characterization of the lower tail of the channel Cumulative Distribution
    Function (CDF). There are channel models, such as Two-Wave with Diffuse Power
    (TWDP) or Suzuki, where finding the full CDF is not tractable. We show that,
    for a wide range of channel models, the outage probability at URC levels can be
    approximated by a simple expression, whose exponent depends on the actual
    channel model. Furthermore, it is seen that the two-wave model leads to
    pessimistic predictions of the fading in the region of ultra-reliable
    communications, while the CDFs of models that contain diffuse components have
    slopes that correspond to the slope of a Rayleigh fading. We provide analysis
    of the receive antenna diversity schemes for URC-relevant statistics and obtain
    a new expression for Maximum Ratio Combining (MRC) in Weibull channels.
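
    A small numerical illustration of a lower-tail approximation, for the
    Rayleigh case only: the power gain (|h|^2) is exponentially distributed, so
    the outage probability (P(|h|^2 < x) = 1 - e^{-x} \approx x) for small (x).
    The Python sketch below compares Monte Carlo estimates with the exact CDF
    and the linear approximation; the abstract's general result, whose exponent
    depends on the channel model, is not reproduced.

    # Lower tail of the Rayleigh fading CDF in the ultra-reliable regime:
    # P(|h|^2 < x) = 1 - exp(-x) is approximately x for small x, i.e. the
    # outage probability has a simple linear tail approximation.
    import numpy as np

    def rayleigh_outage(threshold, n_samples=10**7, rng=None):
        rng = rng or np.random.default_rng(0)
        gain = rng.exponential(scale=1.0, size=n_samples)  # |h|^2 for Rayleigh fading
        return np.mean(gain < threshold)

    if __name__ == "__main__":
        for x in (1e-1, 1e-2, 1e-3, 1e-4):
            print("x=%.0e  simulated=%.2e  exact=%.2e  linear approx=%.2e"
                  % (x, rayleigh_outage(x), 1 - np.exp(-x), x))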

    3GPP-inspired HetNet Model using Poisson Cluster Process: Sum-product Functionals and Downlink Coverage

    Chiranjib Saha, Mehrnaz Afshang, Harpreet S. Dhillon
    Comments: Submitted to IEEE Transactions on Communications. A part of this paper appeared in 2017 ITA Workshop. It is available at arXiv:1702.05706
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    The growing complexity of heterogeneous cellular networks (HetNets) has
    necessitated a variety of user and base station (BS) configurations to be
    considered for realistic performance evaluation and system design. This is
    directly reflected in the HetNet simulation models proposed by standardization
    bodies, such as the third generation partnership project (3GPP). Complementary
    to these simulation models, the stochastic geometry-based approach, modeling the
    locations of the users and the K tiers of BSs as independent and homogeneous
    Poisson point processes (PPPs), has gained prominence in the past few years.
    Despite its success in revealing useful insights, this PPP-based K-tier HetNet
    model is not rich enough to capture spatial coupling between user and BS
    locations that exists in real-world HetNet deployments and is included in 3GPP
    simulation models. In this paper, we demonstrate that modeling a fraction of
    users and arbitrary number of BS tiers alternatively with a Poisson cluster
    process (PCP) captures the aforementioned coupling, thus bridging the gap
    between the 3GPP simulation models and the PPP-based analytic model for
    HetNets. We further show that the downlink coverage probability of a typical
    user under maximum signal-to-interference-ratio association can be expressed in
    terms of the sum-product functionals over PPP, PCP, and its associated
    offspring point process, which are all characterized as a part of our analysis.
    We also show that the proposed model converges to the PPP-based HetNet model as
    the cluster size of the PCPs tends to infinity. Finally, we specialize our
    analysis based on general PCPs for Thomas and Matern cluster processes. Special
    instances of the proposed model closely resemble the different configurations
    for BS and user locations considered in 3GPP simulations.

    State-Dependent Gaussian Multiple Access Channels: New Outer Bounds and Capacity Results

    Wei Yang, Yingbin Liang, Shlomo Shamai (Shitz), H. Vincent Poor
    Comments: The material of this paper will be presented in part at the 2017 International Symposium on Information Theory (ISIT)
    Subjects: Information Theory (cs.IT)

    This paper studies a two-user state-dependent Gaussian multiple-access
    channel (MAC) with state noncausally known at one encoder. Two scenarios are
    considered: i) each user wishes to communicate an independent message to the
    common receiver, and ii) the two encoders send a common message to the receiver
    and the non-cognitive encoder (i.e., the encoder that does not know the state)
    sends an independent individual message (this model is also known as the MAC
    with degraded message sets). For both scenarios, new outer bounds on the
    capacity region are derived, which improve uniformly over the best known outer
    bounds. In the first scenario, the two corner points of the capacity region as
    well as the sum rate capacity are established, and it is shown that a
    single-letter solution is adequate to achieve both the corner points and the
    sum rate capacity. Furthermore, the full capacity region is characterized in
    situations in which the sum rate capacity is equal to the capacity of the
    helper problem. The proof exploits the optimal-transportation idea of
    Polyanskiy and Wu (which was used previously to establish an outer bound on the
    capacity region of the interference channel) and the worst-case Gaussian noise
    result for the case in which the input and the noise are dependent.

    Capacity of Burst Noise-Erasure Channels With and Without Feedback and Input Cost

    Lin Song, Fady Alajaji, Tamás Linder
    Comments: Parts of this work will be presented at the 2017 IEEE International Symposium on Information Theory
    Subjects: Information Theory (cs.IT)

    A class of burst noise-erasure channels which incorporate both errors and
    erasures during transmission is studied. The channel, whose output is
    explicitly expressed in terms of its input and a stationary ergodic
    noise-erasure process, is shown to have a so-called “quasi-symmetry” property
    under certain invertibility conditions. As a result, it is proved that a
    uniformly distributed input process maximizes the channel’s block mutual
    information, resulting in a closed-form formula for its non-feedback capacity
    in terms of the noise-erasure entropy rate and the entropy rate of an auxiliary
    erasure process. The feedback channel capacity is also characterized, showing
    that feedback does not increase capacity and generalizing prior related
    results. The capacity-cost function of the channel with and without feedback is
    also investigated. A sequence of finite-letter upper bounds for the
    capacity-cost function without feedback is derived. Finite-letter lower bounds
    for the capacity-cost function with feedback are obtained using a specific
    encoding rule. Based on these bounds, it is demonstrated both numerically and
    analytically that feedback can increase the capacity-cost function for a class
    of channels with Markov noise-erasure processes.

    Fourth-order Tensors with Multidimensional Discrete Transforms

    Xiao-Yang Liu, Xiaodong Wang
    Subjects: Numerical Analysis (cs.NA); Information Theory (cs.IT)

    The big data era is sweeping across areas including data analysis,
    machine/deep learning, signal processing, statistics, scientific computing,
    and cloud computing. The multidimensional nature and huge volume of big data
    place urgent demands on the development of multilinear modeling tools and
    efficient algorithms. In this paper, we build a novel multilinear tensor space
    that supports useful algorithms such as SVD and QR, even though generalizing
    the matrix space to fourth-order tensors was believed to be challenging. Specifically,
    given any multidimensional discrete transform, we show that fourth-order
    tensors are bilinear operators on a space of matrices. First, we take a
    transform-based approach to construct a new tensor space by defining a new
    multiplication operation and tensor products, and accordingly the analogous
    concepts: identity, inverse, transpose, linear combinations, and orthogonality.
    Secondly, we define the (\mathcal{L})-SVD for fourth-order tensors and present
    an efficient algorithm, where the tensor case requires a stronger condition for
    unique decomposition than the matrix case. Thirdly, we define the tensor
    (\mathcal{L})-QR decomposition and propose a Householder QR algorithm to avoid
    the catastrophic cancellation problem associated with the conventional
    Gram-Schmidt process. Finally, we validate our schemes on video compression and
    one-shot face recognition. For video compression, compared with the existing
    tSVD, the proposed (\mathcal{L})-SVD achieves (3\sim 10) dB gains in RSE, while
    the running time is reduced by about (50\%) and (87.5\%), respectively. For
    one-shot face recognition, the recognition rate is increased by about
    (10\% \sim 20\%).
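
    As a rough sketch in the spirit of transform-domain tensor SVDs (not the
    paper's exact (\mathcal{L})-SVD), the Python code below applies a 2-D DFT
    along the last two modes of a fourth-order tensor, takes an ordinary matrix
    SVD of every transform-domain slice, optionally truncates, and inverts the
    transform. The choice of the DFT as the multidimensional transform and the
    truncation rank are assumptions; the paper's uniqueness condition and
    Householder (\mathcal{L})-QR are not reproduced.

    # Rough transform-domain SVD sketch for a fourth-order tensor: 2-D DFT
    # along the last two modes (the assumed multidimensional transform), matrix
    # SVD of each transform-domain slice, optional truncation, inverse DFT.
    import numpy as np

    def l_svd_reconstruct(tensor, rank=None):
        # tensor has shape (n1, n2, n3, n4); transform along the last two modes
        hat = np.fft.fft2(tensor, axes=(2, 3))
        out = np.empty_like(hat)
        n3, n4 = tensor.shape[2], tensor.shape[3]
        for i in range(n3):
            for j in range(n4):
                u, s, vt = np.linalg.svd(hat[:, :, i, j], full_matrices=False)
                if rank is not None:              # optional low-rank truncation
                    u, s, vt = u[:, :rank], s[:rank], vt[:rank]
                out[:, :, i, j] = (u * s) @ vt
        return np.real(np.fft.ifft2(out, axes=(2, 3)))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        T = rng.normal(size=(6, 5, 4, 4))
        full = l_svd_reconstruct(T)           # no truncation: exact reconstruction
        low = l_svd_reconstruct(T, rank=2)    # truncated reconstruction
        print("exact recon error :", np.linalg.norm(full - T))
        print("rank-2 recon error:", np.linalg.norm(low - T))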



